Normalization

by K. Yue

1. Normal Forms

Example:

If a relation is in BCNF, then it is also in 3NF, 2NF and 1NF.

If a relation is in 2NF, then

  1. It is in 1NF,
  2. it may or may not be in 3NF, and
  3. it may or may not in BCNF.

If a relations is not in 3NF, then

  1. It is not in BCNF.
  2. It may or may not be in 1NF or 2NF.

Functional Dependencies (FD):

Example

Many to many relationships.

Consider the relation Enrol:

Course Student Grade
C1 S1 A
C1 S2 B
C1 S3 B
C2 S1 A
C2 S4 D

Under reasonable assumptions, there are many to many relationships between the attributes

  1. Course and Student: course may enrol many students; a student may take many courses)
  2. Course and Grade
  3. Student and Grade
  4. {Course, Grade} and Student: both S2 and S3 have a grade of B in Course C1.

However, the relationship between {Course, Student} and Grade is not a many-to-many relationship if we assume that a student can only has one grade for a given course.

Example

Many to one relationships.

For many applications, the relationship between SS# and NAME are many to one.

SS#        ->     NAME
(many)            (one)

Interpretations and terminology:

  1. Many different SS#'s (persons) may have the same NAME.
  2. Given a SS#, there can only be one NAME associated with it (not allowing alias, etc).
  3. There should not be two tuples with the same SS#, but different NAME. 
  4. SS# uniquely determines NAME.
  5. NAME is functionally determined by SS#.
  6. There is a functional dependency SS# --> NAME.
  7. Hence, a functional dependency specifies a many to one relationship between attributes.

For example,

SS# NAME PHONE
123456789 Peter A
123456789 Paul B
222229999 Mary B

is not allowed if we assume SS# -> NAME.

Example

In a university, there may be a many-to-one relationship between {COURSE#, STUDENT#} and GRADE.

Interpretations:

  1. A student may have only one grade for a course.
  2. We say  there is a functional dependency:
  3. Note that under different assumptions, the functional dependency may not be true.
  4. For example, if a student is allowed to retake a course, then he may have two grades for the same course (in different semesters), then COURSE# STUDENT# --> GRADE  is false.
  5. We may actually have COURSE# STUDENT# SEMESTER --> GRADE

Example

In most application, we have

SS# --> NAME             (i.e.  a person has only one SS#.)

However, in a criminal database, several bad guys may use the same fake SS#, and thus

SS# --> NAME  is not true.

Or, if you are dealing with an international data base with many countries.  Each country may has its own SS#.  Two countries may issue the same SS#.  Hence,

SS# --> NAME   is not true.

We may instead have  SS# COUNTRY --> NAME.

FD definition

Example

SS# --> SNAME:

There are no two tuples with the same SS# but different names.

DEPT-NO --> MANAGER-NO:

There are no two tuples with the same DEPT-NO but different MANAGER-NO.  A department has only one manager.

SUPPLIER# PART# DATE --> QUANTITY

There are no two tuples with the same SUPPLIER#, PART# and DATE but different QUANTITY.  That is, any supplier has only one shipment of a part on a given date.

Armstrong's axioms

Example

Let X be CITY STREET, Y be STREET, then Y is a subset of X, and X --> Y or CITY STREET --> STREET. (Reflexivity).

A --> A and B C --> B are trivial.

If EMP-NO  -->  DEPT-NO and DEPT-NO  -->  MANAGER-NO
then EMP-NO  -->  MANAGER-NO.

Interpretation: If

then every employee has only one manager.

Example

Prove the union Rule.

Proof.

(1) X -> Z (given)
(2) X X -> X Z (augmentation of (1) with X)
(3) X -> XZ (simplification of (2))
(4) X -> Y (given)
(5) XZ -> YZ (augmentation of (4) with Z)
(6) X -> YZ (transitivity on (3) and (5))

Exercise

Prove the pseudo transitivity rule.