Introduction to Physical DB

Introduction to Physical Database

by K. Yue

1. Introduction

Data in databases is stored in storage medium to provide persistence. Two major questions?
1. What are the storage media?
2. How are the data stored?
I/O operations are usually the most significant factor for database operation latency, not CPU or memory operations.
The memory hierarchy distinguishes each level of computer storage by access time. Higher level memory is usually:
1. faster in access time,
2. lower in volume (size),
3. more expensive in cost, and
4. likely to be not persistent: volatile memory.
Relational model: tuples and relations -> File systems: records and files.
Principle of locality: Things are not distributed randomly. They are concentrated in some local areas.
1. More frequently used data should be stored in memory of higher level in the memory hierarchy.
2. Data frequently accessed together should be stored close to each other in the storage device, if applicable.
3. Caching may be used to enhance performance.
Considerations of selecting secondary storage devices:
1. Speed
2. Volume
3. Cost
4. Reliability
5. Availability
Each type of storage devices has important characteristics that should be considered carefully for uses in database architecture.

2. Physical Database Design

Block:
- A row (tuple) can be a record in a file system.
- To improve performance, records can be grouped and stored in blocks to maximize the use of the seek operation in a hard disk.
- Blocking factor: the number of records in a block.
- Block size: the number of Bytes in the block in one read operation.
Files can be sequential files, direct access files, or others.
Examples of important DB file structures:
1. B+-tree: the primary key is used to navigate through an tree index structure to reach linked terminal nodes that store records.
  1. Fast sequential access through the primary key: O(n/B) read operation, where n is the number of records, and B is the blocking factor.
  2. Fast random access through the primary key: O(lg (n/B)).
2. Hashing: an address is computed from the primary key for storage:
  1. Fast random access through the primary key: O(1).
  2. Slow sequential access through the primary key.
DBMS may allow you to select the physical structures, sometimes known as the storage engines.

3. Denormalization, Partitioning and Clustering

Denormalization: combining tables into one table for faster access.
Partitioning: breaking down tables for faster access (as tables are smaller).
- Horizontal partitioning: distributing rows to component tables.
- Vertical partitioning: distributing columns to component tables.
Clustering: related records from different tables can be stored together in the same disk area.

4. MySQL Indexes

An index is an access path created to search for records (tuples) more efficiently.
There is a cost in creating and maintaining an index.
Cost effectiveness analysis, including profiling, may be used for consideration of index creation.
In MySQL, indexes can be created when a table is created: syntax.
An index creates a physical structure to speed up searching with the indexed attributes.
1. Benefits: faster search
2. Costs: maintaining the index structure