Introduction to Database Design
K. Yue
1. Introduction
Example:
Consider the vastly simplified and poorly-designed relation/table:
Employee_Bad(EmpId, Name, DeptId, MailCode).
Assumptions made:
Thus, EmpId is a candidate key (CK). An instance of Employee_Bad:
EmpId | Name | DeptId | MailCode |
101 | Lady Gaga | D123 | M10 |
122 | Brad Pitts | D123 | M10 |
140 | Lebron James | D123 | M10 |
155 | Narendra Modi | D222 | M21 |
167 | Jennifer Lopez | D222 | M21 |
311 | John Smiths | D300 | M33 |
Problem:
1.1 Update Anomaly:
(a) The mail code of department D123 is updated to M44.
EmpId | Name | DeptId | MailCode |
101 | Lady Gaga | D123 | M10 -> M44 |
122 | Brad Pitts | D123 | M10 -> M44 |
140 | Lebron James | D123 | M10 -> M44 |
155 | Narendra Modi | D222 | M21 |
167 | Jennifer Lopez | D222 | M21 |
311 | John Smiths | D300 | M33 |
SQL solution: it works but is not efficient.
UPDATE Employee
SET MailCode = 'M44'
WHERE DeptId = 'D123';
(b) Jennifer Lopez is reassigned to work for Department D300:
The table may become:
EmpId | Name | DeptId | MailCode |
101 | Lady Gaga | D123 | M10 |
122 | Brad Pitts | D123 | M10 |
140 | Lebron James | D123 | M10 |
155 | Narendra Modi | D222 | M21 |
167 | Jennifer Lopez | D300 | M21 (may not be updated to M33) |
311 | John Smiths | D300 | M33 |
The intuitive SQL command:
UPDATE Employee
SET DeptId = 'D300'
WHERE Name = 'Jennifer Lopez';
will produce inconsistent results, as shown in the table above.
One needs to update both DeptId and MailCode. However,
UPDATE Employee
SET DeptId = 'D300',
MailCode = (SELECT DISTINCT MailCode FROM Employee WHERE DeptId = 'D300')
WHERE Name = 'Jennifer Lopez';
will not work in MySQL as one cannot include a SELECT clause on the same table in the SET clause of an UPDATE statement.
A possible solution using a session variable, @mailCode:
SELECT DISTINCT MailCode INTO @mailCode
FROM Employee WHERE DeptId = 'D300';
UPDATE Employee
SET DeptId = 'D300',
MailCode = @mailCode
WHERE Name = 'Jennifer Lopez';
1.2 Insertion Anomaly:
It is not possible creating a new Department D777, with the mail code M40 but no employee working for it yet. This is because, as the PK of Employee_Bad, EmpId cannot be null.
EmpId | Name | DeptId | MailCode |
101 | Lady Gaga | D123 | M10 |
122 | Brad Pitts | D123 | M10 |
140 | Lebron James | D123 | M10 |
155 | Narendra Modi | D222 | M21 |
167 | Jennifer Lopez | D222 | M21 |
311 | John Smiths | D300 | M33 |
???? | ???? | D777 | M40 (this row cannot be added) |
1.3 Deletion Anomaly
John Smiths no longer works here. Result: the information that M33 is the mail code of Department D300 is also lost.
EmpId | Name | DeptId | MailCode |
101 | Lady Gaga | D123 | M10 |
122 | Brad Pitts | D123 | M10 |
140 | Lebron James | D123 | M10 |
155 | Narendra Modi | D222 | M21 |
167 | Jennifer Lopez | D222 | M21 |
1.4 Decomposition
A standard way of resolving unnecessary redundancy in poorly designed tables is by proper decomposition: breaking down a relation into two or more component relations.
Example: the decomposition of the relation Employee_Bad into two relations:
1. Empolyee(EmpId, Name, DeptId)
EmpId | Name | DeptId |
101 | Lady Gaga | D123 |
122 | Brad Pitts | D123 |
140 | Lebron James | D123 |
155 | Narendra Modi | D222 |
167 | Jennifer Lopez | D222 |
311 | John Smiths | D300 |
2. Department(DeptId, MailCode)
DeptId | MailCode |
D123 | M10 |
D222 | M21 |
D300 | M33 |
To obtain the original relation Employee_Bad(EmpId, Name, DeptId, MailCode) from
Employee(EmpId, Name, DeptId)
Department(DeptId, MailCode)
Relational algebra: using natural join, |x|.
Employee_Bad = Employee |x| Department
This decomposition is said to be a lossless decomposition.
SQL:
SELECT Employee.*, Department.MailCode
FROM Employee INNER JOIN Department ON (Employee.DeptId = Department.DeptId);
or
SELECT *
FROM Employee NATURAL JOIN Department;
The actions of the four use cases that produced anomaly in Employee_Bad:
1. Empolyee(EmpId, Name, DeptId)
EmpId | Name | DeptId |
101 | Lady Gaga | D123 |
122 | Brad Pitts | D123 |
140 | Lebron James | D123 |
155 | Narendra Modi | D222 |
167 | Jennifer Lopez | |
2. Department(DeptId, MailCode)
DeptId | MailCode |
D123 | |
D222 | M21 |
D300 | M33 |
D777 | M40 |
2. Methods for good database designs
Two main tools:
3. Integrity Rules
3.1 Database-Specific Integrity Rules
Examples: some database-specific integrity rules.
3.2 General Integrity Rules
3.2.1 Entity Integrity Rule
Example:
Employee(EmpId, Name, DeptId, Salary)
EmpId | Name | DeptId | Salary |
101 | Lady Gaga | D123 | 55000000 |
122 | Brad Pitts | D123 | 10100000 |
140 | Lebron James | D123 | 50000000 |
155 | Narendra Modi | @: null | @ |
@ | Jennifer Lopez | D222 | 20000000 (should not be able to add this row) |
@ | John Smiths | D300 | 70000 (should not be able to add this row) |
3.2.2 Referential Integrity Rule
Example:
Employee(EmpId, Name, DeptId)
EmpId | Name | DeptId |
101 | Lady Gaga | D123 |
122 | Brad Pitts | D123 |
140 | Lebron James | @ |
155 | Narendra Modi | D222 |
167 | Jennifer Lopez | D222 |
311 | John Smiths | D300 |
Department(DeptId, MailCode)
DeptId | MailCode |
D123 | M10 |
D222 | M21 |
D300 | M33 |
Example:
Employee(EmpId, Name, DeptId)
EmpId | Name | DeptId |
101 | Lady Gaga | D123 |
122 | Brad Pitts | D123 |
140 | Lebron James | D123 |
155 | Narendra Modi | D222 |
167 | Jennifer Lopez | D222 |
311 | John Smiths | D300 |
350 | Bun Yue | D119 (should not be added) |
Department(DeptId, MailCode)
DeptId | MailCode |
D123 | M10 |
D222 | M21 |
D300 | M33 |
Note: