Introduction to Database Design

K. Yue

1. Introduction

Example:

Consider the vastly simplified and poorly-designed relation/table:

Employee_Bad(EmpId, Name, DeptId, MailCode).

Assumptions made:

  1. Every employee has an unique EmpId.
  2. Every employee is represented as a tuple in the Employee relation.
  3. Every employee works for only one department.
  4. Every department has an unique DeptId.
  5. Every department has exactly one mail code, stored in the field MailCode.

Thus, EmpId is a candidate key (CK). An instance of Employee_Bad:

EmpId Name DeptId MailCode
101 Lady Gaga D123 M10
122 Brad Pitts D123 M10
140 Lebron James D123 M10
155 Narendra Modi D222 M21
167 Jennifer Lopez D222 M21
311 John Smiths D300 M33

Problem:

1.1 Update Anomaly:

(a) The mail code of department D123 is updated to M44.

EmpId Name DeptId MailCode
101 Lady Gaga D123 M10 -> M44
122 Brad Pitts D123 M10 -> M44
140 Lebron James D123 M10 -> M44
155 Narendra Modi D222 M21
167 Jennifer Lopez D222 M21
311 John Smiths D300 M33

SQL solution: it works but is not efficient.

UPDATE Employee
SET MailCode = 'M44'
WHERE DeptId = 'D123';

(b) Jennifer Lopez is reassigned to work for Department D300:

The table may become:

EmpId Name DeptId MailCode
101 Lady Gaga D123 M10
122 Brad Pitts D123 M10
140 Lebron James D123 M10
155 Narendra Modi D222 M21
167 Jennifer Lopez D300 M21 (may not be updated to M33)
311 John Smiths D300 M33

The intuitive SQL command:

UPDATE Employee
SET DeptId = 'D300'
WHERE Name = 'Jennifer Lopez';

will produce inconsistent results, as shown in the table above.

One needs to update both DeptId and MailCode. However,

UPDATE Employee
SET DeptId = 'D300',
     MailCode = (SELECT DISTINCT MailCode FROM Employee WHERE DeptId = 'D300')
WHERE Name = 'Jennifer Lopez'
;

will not work in MySQL as one cannot include a SELECT clause on the same table in the SET clause of an UPDATE statement.

A possible solution using a session variable, @mailCode:

SELECT DISTINCT MailCode INTO @mailCode
FROM Employee WHERE DeptId = 'D300';

UPDATE Employee
SET DeptId = 'D300',
     MailCode = @mailCode
WHERE Name = 'Jennifer Lopez';

1.2 Insertion Anomaly:                                            

It is not possible creating a new Department D777, with the mail code M40 but no employee working for it yet. This is because, as the PK of Employee_Bad, EmpId cannot be null.

EmpId Name DeptId MailCode
101 Lady Gaga D123 M10
122 Brad Pitts D123 M10
140 Lebron James D123 M10
155 Narendra Modi D222 M21
167 Jennifer Lopez D222 M21
311 John Smiths D300 M33
???? ???? D777 M40 (this row cannot be added)

1.3 Deletion Anomaly

John Smiths no longer works here. Result: the information that M33 is the mail code of Department D300 is also lost.

EmpId Name DeptId MailCode
101 Lady Gaga D123 M10
122 Brad Pitts D123 M10
140 Lebron James D123 M10
155 Narendra Modi D222 M21
167 Jennifer Lopez D222 M21
311 John Smiths D300 M33

1.4 Decomposition

A standard way of resolving unnecessary redundancy in poorly designed tables is by proper decomposition: breaking down a relation into two or more component relations.

Example: the decomposition of the relation Employee_Bad into two relations:

1. Empolyee(EmpId, Name, DeptId)

EmpId Name DeptId
101 Lady Gaga D123
122 Brad Pitts D123
140 Lebron James D123
155 Narendra Modi D222
167 Jennifer Lopez D222
311 John Smiths D300

2. Department(DeptId, MailCode)

DeptId MailCode
D123 M10
D222 M21
D300 M33

To obtain the original relation Employee_Bad(EmpId, Name, DeptId, MailCode) from

Employee(EmpId, Name, DeptId)
Department(DeptId, MailCode)

Relational algebra: using natural join, |x|.

Employee_Bad = Employee |x| Department

This decomposition is said to be a lossless decomposition.

SQL:           

SELECT Employee.*, Department.MailCode
FROM Employee INNER JOIN Department ON (Employee.DeptId = Department.DeptId);

or

SELECT *
FROM Employee NATURAL JOIN Department;

  1. There is no loss of information: the definition of lossless decomposition.
  2. No previously mentioned unnecessary redundancy and anomaly.

The actions of the four use cases that produced anomaly in Employee_Bad:

1. Empolyee(EmpId, Name, DeptId)

EmpId Name DeptId
101 Lady Gaga D123
122 Brad Pitts D123
140 Lebron James D123
155 Narendra Modi D222
167 Jennifer Lopez D222 D300
311 John Smiths D300

2. Department(DeptId, MailCode)

DeptId MailCode
D123 M10 M44
D222 M21
D300 M33
D777 M40

2. Methods for good database designs

Two main tools:

  1. Integrity Rules:  data constraint rules for avoiding data inconsistency.
  2. Normal Forms:  a set of rules for designing good relation schemas.

3. Integrity Rules

3.1 Database-Specific Integrity Rules

Examples: some database-specific integrity rules.

  1. Student Id should be a seven-digit number.
  2. Date of Birth should be greater than 1900.
  3. The room number of Delta Building should start with a 'D'.
  4. A student cannot take more than 24 credits in any semester.
  5. A student must show proof of a meningitis shot before registration for the first semester.

3.2 General Integrity Rules

3.2.1 Entity Integrity Rule

Example: 

Employee(EmpId, Name, DeptId, Salary)

EmpId Name DeptId Salary
101 Lady Gaga D123 55000000
122 Brad Pitts D123 10100000
140 Lebron James D123 50000000
155 Narendra Modi @: null @
@ Jennifer Lopez D222 20000000 (should not be able to add this row)
@ John Smiths D300 70000 (should not be able to add this row)

3.2.2 Referential Integrity Rule

Example:   

Employee(EmpId, Name, DeptId)

EmpId Name DeptId
101 Lady Gaga D123
122 Brad Pitts D123
140 Lebron James @
155 Narendra Modi D222
167 Jennifer Lopez D222
311 John Smiths D300

Department(DeptId, MailCode)

DeptId MailCode
D123 M10
D222 M21
D300 M33

Example:   

Employee(EmpId, Name, DeptId)

EmpId Name DeptId
101 Lady Gaga D123
122 Brad Pitts D123
140 Lebron James D123
155 Narendra Modi D222
167 Jennifer Lopez D222
311 John Smiths D300
350 Bun Yue D119 (should not be added)

Department(DeptId, MailCode)

DeptId MailCode
D123 M10
D222 M21
D300 M33

Note:

  1. In practical DBMS, pay attention to where the referential integrity rule is enforced.
  2. For example, in MySQL, only the INNODB data engine enforces the referential integrity rule.
  3. If the DBMS does not enforce the referential integrity rule, it will be the task of the DB developers to do so.