Introduction to Physical Database
by K. Yue
1. Introduction
- Data in databases is stored in storage medium to provide persistence. Two major questions?
- What are the storage media?
- How are the data stored?
- I/O operations are usually the most significant factor for database operation latency, not CPU or memory operations.
- The memory hierarchy distinguishes each level of computer storage by access time. Higher level memory is usually:
- faster in access time,
- lower in volume (size),
- more expensive in cost, and
- likely to be not persistent: volatile memory.
- Relational model: tuples and relations -> File systems: records and files.
- Principle of locality: Things are not distributed randomly. They are concentrated in some local areas.
- More frequently used data should be stored in memory of higher level in the memory hierarchy.
- Data frequently accessed together should be stored close to each other in the storage device, if applicable.
- Caching may be used to enhance performance.
- Considerations of selecting secondary storage devices:
- Speed
- Volume
- Cost
- Reliability
- Availability
- Each type of storage devices has important characteristics that should be considered carefully for uses in database architecture.
2. Physical Database Design
- Block:
- A row (tuple) can be a record in a file system.
- To improve performance, records can be grouped and stored in blocks to maximize the use of the seek operation in a hard disk.
- Blocking factor: the number of records in a block.
- Block size: the number of Bytes in the block in one read operation.
- Files can be sequential files, direct access files, or others.
- Examples of important DB file structures:
- B+-tree: the primary key is used to navigate through an tree index structure to reach linked terminal nodes that store records.
- Fast sequential access through the primary key: O(n/B) read operation, where n is the number of records, and B is the blocking factor.
- Fast random access through the primary key: O(lg (n/B)).
- Hashing: an address is computed from the primary key for storage:
- Fast random access through the primary key: O(1).
- Slow sequential access through the primary key.
- DBMS may allow you to select the physical structures, sometimes known as the storage engines.
3. Denormalization, Partitioning and Clustering
- Denormalization: combining tables into one table for faster access.
- Partitioning: breaking down tables for faster access (as tables are smaller).
- Horizontal partitioning: distributing rows to component tables.
- Vertical partitioning: distributing columns to component tables.
- Clustering: related records from different tables can be stored together in the same disk area.
4. MySQL Indexes
- An index is an access path created to search for records (tuples) more efficiently.
- There is a cost in creating and maintaining an index.
- Cost effectiveness analysis, including profiling, may be used for consideration of index creation.
- In MySQL, indexes can be created when a table is created: syntax.
- An index creates a physical structure to speed up searching with the indexed attributes.
- Benefits: faster search
- Costs: maintaining the index structure