Normalization

by K. Yue

1. Functional Dependencies

Example:

If a relation is in BCNF, then it is also in 3NF, 2NF and 1NF.

If a relation is in 2NF, then

  1. It is in 1NF,
  2. it may or may not be in 3NF, and
  3. it may or may not in BCNF.

If a relations is not in 3NF, then

  1. It is not in BCNF.
  2. It may or may not be in 1NF or 2NF.

Functional Dependencies (FD):

Example

Many to many relationships.

Consider the relation Enrol:

Course Student Grade
C1 S1 A
C1 S2 B
C1 S3 B
C2 S1 A
C2 S4 D

Under reasonable assumptions, there are many to many relationships between the attributes

  1. Course and Student: course may enrol many students; a student may take many courses)
  2. Course and Grade
  3. Student and Grade
  4. {Course, Grade} and Student: both S2 and S3 have a grade of B in Course C1.

However, the relationship between {Course, Student} and Grade is not a many-to-many relationship if we assume that a student can only has one grade for a given course.

Example

Many to one relationships.

For many applications, the relationship between SS# and NAME are many to one.

SS#        ->     NAME
(many)            (one)

Interpretations and terminology:

  1. Many different SS#'s (persons) may have the same NAME.
  2. Given a SS#, there can only be one NAME associated with it (not allowing alias, etc).
  3. There should not be two tuples with the same SS#, but different NAME. 
  4. SS# uniquely determines NAME.
  5. NAME is functionally determined by SS#.
  6. There is a functional dependency SS# --> NAME.
  7. Hence, a functional dependency specifies a many to one relationship between attributes.

For example,

SS# NAME PHONE
123456789 Peter A
123456789 Paul B
222229999 Mary B

is not allowed if we assume SS# -> NAME.

Example

In a university, there may be a many-to-one relationship between {COURSE#, STUDENT#} and GRADE.

Interpretations:

  1. A student may have only one grade for a course.
  2. We say  there is a functional dependency:
  3. Note that under different assumptions, the functional dependency may not be true.
  4. For example, if a student is allowed to retake a course, then he may have two grades for the same course (in different semesters), then COURSE# STUDENT# --> GRADE  is false.
  5. We may actually have COURSE# STUDENT# SEMESTER --> GRADE

Example

In most application, we have

SS# --> NAME             (i.e.  a person has only one SS#.)

However, in a criminal database, several bad guys may use the same fake SS#, and thus

SS# --> NAME  is not true.

Or, if you are dealing with an international data base with many countries.  Each country may has its own SS#.  Two countries may issue the same SS#.  Hence,

SS# --> NAME   is not true.

We may instead have  SS# COUNTRY --> NAME.

FD definition

Example

SS# --> SNAME:

There are no two tuples with the same SS# but different names.

DEPT-NO --> MANAGER-NO:

There are no two tuples with the same DEPT-NO but different MANAGER-NO.  A department has only one manager.

SUPPLIER# PNUM DATE --> QUANTITY

There are no two tuples with the same SUPPLIER#, PNUM and DATE but different QUANTITY.  That is, any supplier has only one shipment of a part on a given date.

Armstrong's axioms

Example

Let X be CITY STREET, Y be STREET, then Y is a subset of X, and X --> Y or CITY STREET --> STREET. (Reflexivity).

A --> A and B C --> B are trivial.

If EMP-NO  -->  DEPT-NO and DEPT-NO  -->  MANAGER-NO
then EMP-NO  -->  MANAGER-NO.

Interpretation: If

then every employee has only one manager.

Example

Prove the union Rule.

Proof.

(1) X -> Z (given)
(2) X X -> X Z (augmentation of (1) with X)
(3) X -> XZ (simplification of (2))
(4) X -> Y (given)
(5) XZ -> YZ (augmentation of (4) with Z)
(6) X -> YZ (transitivity on (3) and (5))

Exercise

Prove the pseudo transitivity rule.

Keys and Superkeys

Example

In EMPLOYEE(EMP-NO, DEPT-NO, MANAGER-NO) with

EMP-NO --> DEPT-NO and
DEPT-NO --> MANAGER-NO.

By the transitivity axiom, EMP-NO --> MANAGER-NO.
By the union rule, EMP-NO --> EMP-NO DEPT-NO MANAGER-NO.
                       
Hence, EMP-NO is a key of EMPLOYEE(EMP-NO, DEPT-NO, MANAGER-NO).

On the other hand, DEPT-NO is not a key since we do not have DEPT-NO --> EMP-NO.

Furthermore, there are four superkeys:

  1. EMP-NO
  2. EMP-NO, DEPT-NO
  3. EMP-ND, EPT-NO, MANAGER-NO

Closure of Attributes

Example

Cons der:

F = {A-> B, BC -> DA, BD -> C, E-> A, AC -> DE }

We have

A+ = AB
B+ = B
C+ = C
D+ = D
E+ = E
(AB)+ = AB
(AC)+ = ABCDE
(AD+ = ABCDE
(AE)+ = ABE
(BC)+ = ABCDE
(BD)+ = ABCDE
(BE)+ = ABE
(CD)+ = CD
(CE)+ = ABCDE
(DE)+ = ABCDE
(ABC)+ = ABCDE
...
(ABE)+ = ABE
...

Algorithm for finding X+ for a set of FDs F.

X+ <- X
while (there exists a FD P -> Q such that P is a subset of X+ and there are attributes K in Q that is not in X+) {
   X+ <- X+ U Q
}

Closure of a set of functional dependencies

Example

Consider F = { A->B, B->C }

F+ = {
A->A, A->B, A->C, A-> AB, A-> AC, A-> BC, A->ABC,
B->B, B->C, B->BC,
C->C,
AB->A, AB->B, AB->C, AB->AB, AB->AC, AB->BC, AB->ABC,
AC->A, AC->B, AC->C, AC->AB, AC->AB, AC->BC, AC-> ABC,
BC->B, BC->C, BC->BC,
ABC->A, ABC->B, ABC->C, ABC-> AB, ABC-> AC, ABC-> BC, ABC->ABC }

Note that

Equivalence and cover

Example

Consider the F = { A->B, AB->C }.

B is extraneous since for G = { A->B, A->C }, F+ = G+.

Example

In F = {A->B, AB->C, B->C },

A->C is redundant.

In F = {A->B, AB->C, B->C, A->D},

G1 = {A->B, B->C,A->D} is a minimal cover.

G2 = {A->BD, B->C} is a canonical cover.

Exercise:

Consider F: {A->C, BCD->A, C->E, CD-> A, AB->C }

  1. Does F imply BD-> A (i.e. F |= BD -> A)?
  2. F |= AE -> B ?
  3. Give a canonical cover for F?
  4. Show all candidate keys.

Exercise:

Consider F: {AB->CE, BC->D, D->BC, C->E, A->C, A->E}

Find:

Exercise:

Can there be more than one canonical covers for a set of FDs?

2. Normal Forms using Functional Dependencies

First Normal Form

Example

Example:   Consider the following table.  It is not in 1 NF.

DEPT_NO MANAGER_NO EMP_NO NAME
D123 54321

10000, 12000,
13000

Lady Gaga, Eminem, Lebron James

D225 42315 21000, 22000 Rajiv Gandhi, Bill Clinton
D337 33323 31000 John Smithson

The corresponding relation in 1 NF:

DEPT_NO MANAGER_NO EMP_NO NAME
D123 54321 10000 Lady Gaga
D123 54321 12000 Eminem
D123 54321 13000 Lebron James
D225 42315 21000 Rajiv Gandhi
D225 42315 22000 Bill Clinton
D337 33323 31000 John Smithson

Example

Consider the tuple (Empid: 12345, OSSkills: "Windows, Linux, Solaris").

Second Normal Form

Example

The following relation is not in 2NF.  (Assume the number of credits of a given course does not change).  Note the redundancy and anomalies.

Course Credit Student Grade
C1 3 S1 A
C1 3 S2 B
C1 3 S3 B
C2 2 S1 A
C2 2 S4 D

That is, we assume:

Thus,

  1. Course, Student is the only candidate key.
  2. Prime attributes: Course, Student
  3. Non-prime attribute: Credit, Grade.
  4. (1) is a violation of 2NF.

Third Normal Form

Example

DEPT_NO MANAGER_NO EMP_NO NAME
D123 54321 10000 Lady Gaga
D123 54321 12000 Eminem
D123 54321 13000 Lebron James
D225 42315 21000 Rajiv Gandhi
D225 42315 22000 Bill Clinton
D337 33323 31000 John Smithson

Example

Consider the relation

S(SNUM, PNUM, SNAME, QUANTITY) with the following assumptions:

  1. SNUM is unique for every supplier.
  2. SNAME is unique for every supplier.
  3. QUANTITY is the accumulated quantities of a part supplied by a supplier.
  4. A supplier can supply more than one part.
  5. A part can be supplied by more than one supplier.

We have the following non-trivial functional dependencies:

  1. SNUM --> SNAME
  2. SNAME --> SNUM
  3. SNUM PNUM --> QUANTITY
  4. SNAME PNUM --> QUANTITY

Note that SNUM and SNAME are equivalent.

The candidate keys are:

  1. SNUM PNUM
  2. SNAME PNUM

Prime attributes: SNUM, PNUM, SNAME

Non-prime attribute: QUANTITY.

The relation is in 3NF.

Example

Consider the relation R(CITY, STREET, ZIP) with the FDs:

  1. CITY STREET --> ZIP, and
  2. ZIP --> CITY.

There are two candidate keys:

  1. CITY STREET, and
  2. ZIP STREET

Hence, all attributes are prime attributes and the relation is in both 2NF and 3NF.

Note that a relation such as EMPLOYEE(EMP_ID, EMP_NAME, Street, City, Zip, State) is not in 3NF.

This is a classical example you can find in many database textbook. The FDs may not be correct in the United States. See, for example: Why all 5-digit ZIP Code™ lists are obsolete.

BCNF (Boyce-Codd Normal Form)

Example                  

EMPLOYEE(EMP_NO, NAME, DEPT_NO, MANAGER_NO) with

EMP_NO --> NAME
EMP_NO --> DEPT_NO
DEPT_NO --> MANAGER_NO

is not in BCNF.

The functional dependency  DEPT_NO --> MANAGER_NO is

(1)  non-trivial, and
(2)  DEPT_NO is not a superkey.

We can decompose

EMPLOYEE(EMP_NO, NAME, DEPT_NO, MANAGER_NO) into

EMP(EMP_NO, NAME, DEPT_NO) with

EMP_NO --> NAME
EMP_NO --> DEPT

and

DEPARTMENT(DEPT_NO, MANAGER_NO) with

DEPT_NO --> MANAGER_NO

Both relations are in BCNF since

Recall that these are the good relations without the anomalies in the previous example.

Example    

Consider the relation

S(SNUM, PNUM, SNAME, QUANTITY) with the following non-trivial functional dependencies:

  1. SNUM --> SNAME
  2. SNAME --> SNUM
  3. SNUM PNUM --> QUANTITY
  4. SNAME PNUM --> QUANTITY

Note that SNUM and SNAME are equivalent.

The candidate keys are:

  1. SNUM PNUM
  2. SNAME PNUM

Prime attributes: SNUM, PNUM, SNAME

Non-prime attribute: QUANTITY.

S is not in BCNF because, for example, the functional dependency 

SUPP# --> SNAME is

To deal with it, we can decompose S(SUPP#, PART#, SNAME, QUANTITY) into

(1) SUPPLIER(SNUM, SNAME) with  

SNUM --> SNAME
SNAME --> SNUM

with two candidate keys:

  1. SNUM
  2. SNAME

(2) ORDER(SUPP#, PART#, QUANTITY)  with 

SNUM, PNUM --> QUANTITY.

Example 4:  Consider the relation R(A, B, C, D) with

A --> B,  B --> C, C --> A and C --> D.

There are three candidate keys:

Since every left hand side of any non-trivial functional dependency is a superkey,  R is in BCNF.

Motivation of BCNF

Example 

Consider the relation R(CITY, ZIP, STREET)
       
Using the code for the postal office, we have

CITY STREET --> ZIP, and ZIP --> CITY.

Hence, there are two candidate keys:

  1. CITY STREET, and
  2. ZIP STREET

Therefore, R is not in BCNF since in ZIP --> CITY, ZIP is not a superkey.

However, if we decompose R into two relations, each with two attributes, then the functional dependency

CITY STREET --> ZIP is lost

Therefore, we better leave the relation alone.

Third Normal Form Revisited

Example 

For the relation R(CITY, ZIP, STREET)
       
Using the code for the postal office, we have

CITY STREET --> ZIP, and ZIP --> CITY.

Hence, there are two candidate keys:

  1. CITY STREET, and
  2. ZIP STREET

Hence,

Prime attributes: STREET, CITY, ZIP

R is in the 3NF because

Example 

Reconsider the relation

S(SNUM, PNUM, SNAME, QUANTITY) with the following non-trivial functional dependencies:

  1. SNUM --> SNAME
  2. SNAME --> SNUM
  3. SNUM PNUM --> QUANTITY
  4. SNAME PNUM --> QUANTITY

Note that SNUM and SNAME are equivalent.

The candidate keys are:

  1. SNUM PNUM
  2. SNAME PNUM

Prime attributes: SNUM, PNUM, SNAME

Non-prime attribute: QUANTITY.

S is in 3NF because

Example 

Reconsider

EMPLOYEE(EMP_NO, NAME, DEPT_NO, MANAGER_NO) with

EMP_NO --> NAME
EMP_NO --> DEPT_NO
DEPT_NO --> MANAGER_NO

is not in 3NF.

The functional dependency  DEPT_NO --> MANAGER_NO is

(1) non-trivial,
(2) DEPT_NO is not a superkey, and
(3) MANAGER_NO is not a prime attribute.

Normalization Theory Using Functional Dependencies

3. Decomposition

Lossy Decomposition

Example:

Consider the relation EMP(EMP_NO, DEPT, MGR_NO) with 

EMP_NO -->  DEPT
DEPT -->  MGR_NO

Note that we do not have MGR_NO --> DEPT, since one manager can manage more than one departments under the assumptions made for this example.

EMP_NO DEPT MGR_NO
12345 ACCT 90000
12399 HR 90000
30000 ENG 98000

          
The relation is not in BCNF because of the FD

DEPT --> MGR_NO

Suppose we decompose the relation into

EMP1(EMP_NO, MGR_NO)
DEPT(DEPT, MGR_NO)

They are obtained by projections from EMP:

EMP1:                     

EMP_NO MGR_NO
12345 90000
12399 90000
30000 98000

DEPT:

DEPT MGR_NO
ACCT 90000
HR 90000
ENG 98000

If we do not loss any information by the decomposition, we should get the original relation from the natural join.

However,  EMP1 |x| DEPT is     

EMP_NO DEPT MGR_NO
12345 ACCT 90000
12345 HR 90000
12399 ACCT 90000
12399 HR 90000
30000 ENG 98000

         
This is not the same as the original relation EMP. Spurious tuples were incorrectly created.

Hence, the decomposition of EMP(EMP_NO, DEPT, MGR_NO) into
 
EMP1(EMP_NO, MGR_NO) and
DEPT(DEPT, MGR_NO)

is lossy.  It is not a good decomposition.

Lossless Decomposition

Example:

Consider now the following decomposition of EMP(EMP_NO, DEPT, MGR_NO):

EMP2(EMP_NO, DEPT)  and
EMP3(EMP_NO, MGR_NO)

We have EMP2 and EMP3:

EMP2:                     

EMP_NO DEPT
12345 ACCT
12399 HR
30000 ENG

EMP3:

EMP_NO MGR_NO
12345 90000
12399 90000
30000 98000

Hence, EMP2 |x| EMP3:

EMP_NO DEPT MGR_NO
12345 ACCT 90000
12399 HR 90000
30000 ENG 98000

This is exactly the same as the original relation EMP.  Therefore, the decomposition does not loss any information.  It is a lossless decomposition.

Theory of Lossless Decomposition

Example:

Why is the decomposition of EMP(EMP_NO, DEPT, MGR_NO) into

(1) EMP1(EMP_NO, MGR_NO) and DEPT(DEPT, MGR_NO) lossy, and

(2) EMP2(EMP_NO, DEPT) and EMP3(EMP_NO, MGR_NO) lossless?

Theorem: Suppose R(X, Y, Z) is decomposed into R1(X, Y) and R2(X, Z).  X is the set of common attributes in R1 and R2.  The decomposition is lossless if and only if

(a) X --> Y, or
(b) X --> Z.

Example:

In case (1), X is MGR_NO, Y is EMP_NO, Z is DEPT.

None of condition (a) or (b) is satisfied.  Hence, (1) is lossy.

In case (2), X is EMP_NO, Y is DEPT, Z is MGR_NO.

Both conditions (a) and (b) are satisfied.  Hence, (2) is lossless.

Example:

Consider R(A,B,C,D,E) with {A->BC, CD -> E, BA -> C, D->B}.

It is decomposed into R1(A,B), R2(A,C), R3(C,D,E) and R3(B,D).

Step 1. Create a table of 5 columns (number of columns and 4 rows (number of relations). Populate it with b(i,j).

Relation
A
B
C
D
E
R1
b(1,1)
b(1,2)
b(1,3)
b(1,4)
b(1,5)
R2
b(2,1)
b(2,2)
b(2,3)
b(2,4)
b(2,5)
R3
b(3,1)
b(3,2)
b(3,3)
b(3,4)
b(3,5)
R4
b(4,1)
b(4,2)
b(4,3)
b(4,4)
b(4,5)

Step 2. For each relation Ri, set all attribute Aj that appears in Ri from b(i,j) to a(j).

Relation
A
B
C
D
E
R1
a(1)
a(2)
b(1,3)
b(1,4)
b(1,5)
R2
a(1)
b(2,2)
a(3)
b(2,4)
b(2,5)
R3
b(3,1)
b(3,2)
a(3)
a(4)
a(5)
R4
b(4,1)
a(2)
b(4,3)
b(4,4)
a(5)

Step 3. For each FD X-> Y, with two rows have the common X values, for every attribute W in Y:

Applying A-> BC:

Relation
A
B
C
D
E
R1
a(1)
a(2)
a(3)
b(1,4)
b(1,5)
R2
a(1)
a(2)
a(3)
b(2,4)
b(2,5)
R3
b(3,1)
b(3,2)
a(3)
a(4)
a(5)
R4
b(4,1)
a(2)
b(4,3)
b(4,4)
a(5)

Applying CD -> E: no change since no row has the same values in CD.

Relation
A
B
C
D
E
R1
a(1)
a(2)
a(3)
b(1,4)
b(1,5)
R2
a(1)
a(2)
a(3)
b(2,4)
b(2,5)
R3
b(3,1)
b(3,2)
a(3)
a(4)
a(5)
R4
b(4,1)
a(2)
b(4,3)
b(4,4)
a(5)

Applying BA -> C: no change since R1 and R2 already have the same a's value: A(3).

Relation
A
B
C
D
E
R1
a(1)
a(2)
a(3)
b(1,4)
b(1,5)
R2
a(1)
a(2)
a(3)
b(2,4)
b(2,5)
R3
b(3,1)
b(3,2)
a(3)
a(4)
a(5)
R4
b(4,1)
a(2)
b(4,3)
b(4,4)
a(5)

Applying D->B

Relation
A
B
C
D
E
R1
a(1)
a(2)
a(3)
b(1,4)
b(1,5)
R2
a(1)
a(2)
a(3)
b(1,4)
b(2,5)
R3
b(3,1)
b(3,2)
a(3)
a(4)
a(5)
R4
b(4,1)
a(2)
b(4,3)
b(1,4)
a(5)

Since there is no row with only a's, the decomposition is lossy.

Example:

Now suppose that C->DE is also in the FDs. That is, we have:

R(A,B,C,D,E) with {A->BC, CD -> E, BA -> C, D->B, C->DE}.

We will have one more step.

Applying C->DE:

Relation
A
B
C
D
E
R1
a(1)
a(2)
a(3)
a(4)
a(5)
R2
a(1)
a(2)
a(3)
a(4)
a(5)
R3
b(3,1)
b(3,2)
a(3)
a(4)
a(5)
R4
b(4,1)
a(2)
b(4,3)
b(1,4)
a(5)

Now we have two rows with only a's and thus the decomposition is lossless.

Dependency-Preserving Decomposition

Example:                        

For the relation EMP(EMP_NO,DEPT,MGR_NO) with 

EMP_NO -->  DEPT
DEPT -->  MGR_NO,

The decomposition of EMP into

EMP2(EMP_NO, DEPT)  and
EMP3(EMP_NO, MGR_NO)

is lossless, but it does not preserve dependencies:

the FD  DEPT --> MGR_NO

cannot be enforced by any relation after the decomposition.

For example, if we add the information EMP 23000 work in the ACCT department under manager 97000 and are not careful, we may have:

 EMP2:                     

EMP_NO DEPT
12345 ACCT
12399 HR
30000 ENG
2300 ACCT

EMP3:

EMP_NO MGR_NO
12345 90000
12399 90000
30000 98000
23000 97000

The FD  DEPT -->  MGR_NO is violated.

Thus, for the relation EMP(EMP_NO,DEPT,MGR_NO) with 

EMP_NO -->  DEPT
DEPT -->  MGR_NO,

the best decomposition is into

EMP1(EMP_NO, DEPT)  and
DEPT(DEPT, MGR_NO)

It is easy to show that, the decomposition is lossless, preserves dependencies, and that EMP1 and DEPT are both in BCNF.

Algorithm for decomposition in 3NF relations

Example:

Consider R(A,B,C,D,E) with F = {A->BC, CD -> E, BA -> C, D->B}.

Step 1 Find a canonical cover (as opposed to a minimal cover in EN) G for F.

The FD BA->C is redundant.

G = {A->BC, CD -> E, D->B}.

Step 2. For every FD X->Y in G, create a relation with the schema XY and add it to the result D.

Relations created:

R1(A,B,C) with A->BC
R2(C,D,E) with CD->E
R3(B,D) with D->B

It can be seen very easily that R1, R2 and R3 are all in 3NF. Furthermore, all FDs are preserved.

Step 3. If no relation in D contains the key of R, create a new relation with the key of R being the schema and add it to the result D.

There is only one candidate key of R: AD. Since none of R1, R2 and R3 contains A, create the relation

R4(A,D) with no FD

Step 4. Simplify D by removing relations that are redundant (i.e. that its schema is a subset of the schema of another relation).

No action as there is no redundant relation.

Example:

Consider R(A,B,C,D) with {A->BC, BC->D, D->C}

Using the algorithm, the result of decomposition contains two relations:

R1(A,B,C) with {A-> BC} and
R2(B,C,D) with {BC->D, D>C}

R3(C,D) is removed as redundant in the last step of the algorithm.

R1 and R2 are both in 3NF but R2 is not in BCNF.

Example:

Consider the relation R(A, B, C) with A -> B and C -> B.

R is not in 2NF.  It is not possible to decompose R into two relations losslessly while preserving all functional dependencies.

However, it is possible to decompose into three relations losslessly and with all functional dependencies preserved:

R1(A, B),
R2(B, C) and
R3(A, C).

4. Higher Normal Forms

Multivalued Dependencies

Example 1:

Consider the following instance of the relation R(Emp_No, Dept, Skill):

EMP_NO DEPT SKILL
100 D101 PHP
100 D102 PHP
100 D101 MySQL
100 D102 MySQL
200 D101 PHP
300 D103 Graphics
300 D104 Graphics
400 D102 PHP
400 D102 Graphics
400 D102 MySQL

There are no non-trivial functional dependencies.  R is in BCNF.

If the department an employee is working on is independent of the skill that he has, there is redundancy. For example, the fact that employee 100 has the skill PHP is stored twice.

Example 2:

Under the assumption of the previous example, we have

Emp_No ->-> Dept and
Emp_No ->-> Skill.

Suppose we have the following relation instance under a different set of assumptions:

EMP_NO DEPT SKILL
100 D101 PHP
100 D102 PHP
100 D101 MySQL
300 D103 Graphics

Without knowing the underlying assumptions, this instance of the relation implies that the multivalued dependency Emp_No ->-> Dept does not hold true since

Note:

A more precise definition of MVD: 

Example 3:

Consider Example 1 again. In the data modeling, there may be two classes with a many-to-many association.

If one follows the data modeling and the mapping to relation guideline, two relations will be created (instead of one).

This shows the importance of good data modeling.

Some Properties Of Multivalued Dependencies

Fourth Normal Form

Example 4:

The relation R(Emp_No, Dept, Skill) in Example 1 is in BCNF but not in 4NF.  It should be decomposed into:

R1(Emp_No, Dept) and
R2(Emp_No, Skill).

Note that these two relations are created with good data modeling and mapping to relations.

Embedded Multivalued Dependencies

Example 5:

Consider the following relation R(Proj_No, Emp_No, Skill):

PROJ_NO EMP_NO SKILL
P1 E1 PHP
P1 E2 PHP
P1 E1 MySQL
P1 E2 MySQL
P2 E1 Graphics
P2 E3 Graphics
P3 E3 Graphics
P3 E3 PHP
P4 E4 MySQL

For this application, each project (Proj_No) has a number of employees (Emp_No) and each project requires a list of skills (Skill).  We have:

Proj ->-> Emp_No
Proj ->-> Skill.

An employee provides some skills that a project needs.  If an employee has a skill that is not needed by the project (e.g. employee E1 may have the skill of 'Internet'), the skill is not stored in the tuples of project P1, which does not require the skill 'Internet'.

Note that Emp_No ->-> Skill does not hold in R.

R is not in 4NF.

We can decompose the relation into two relations:

R1(Proj_No, Emp_No) and
R2(Proj_No, Skill)

R1:                        

PROJ_NO EMP_NO
P1 E1
P1 E2
P2 E1
P2 E3
P3 E3
P4 E4

R2:

PROJ_NO SKILL
P1 PHP
P1 MySQL
P2 Graphics
P3 Graphics
P3 PHP
P4 MySQL

Both R1 and R2 are now in 4NF.

However, if skill is a multi-valued attribute of an employee, then we should have

Emp_No ->-> Skill

It does not show up in R because it is embedded. If we project R to remove PROJ_NO, this relationship appears.

These embedded MVD's are not enforced by the relations R1 and R2.

These MVD's only display themselves after projection.

We may have the following three classes with three many-to-many associations between each pair of them:

Decomposition of R into R1 and R2 will lose this embedded multivalued dependency. 

Example 6:

There are (trivial) embedded MVD's Emp_No ->-> Skill | φ and Emp_No ->-> Proj_No | φ in R of the previous example.

To remedy the problem, decompose the relation R into:

R1(Proj_No, Emp_No)
R2(Proj_No, Skill) and
R3(Emp_No, Skill).

All relations are in 4NF (5NF too) and the embedded MVD are not lost.

Example 7:

Consider an application with four classes A, B, C and D with primary keys A_ID, B_ID, C_ID and D_ID.  There are many to many binary associations between A and C as well as B and C.  Furthermore, there is a ternary association between A, B and D.

If somebody has not performed a good data modelling, it is possible to come up with a relation R(A_ID, B_ID, C_ID, D_ID).

However, the MVD C_ID ->-> A_ID (or C_ID ->-> B_ID) is not true in R because of the additional attribute DID.  If the attribute D_ID is removed by projection, then the independence between A_ID and B_ID for a given value of C_ID will show up.  Hence, there is a embedded MVD C_ID ->-> A_ID | B_ID in R.

Join Dependencies

Example 8:

In Example 5 with embedded MVD, there is a non-trivial JD of R(Proj_No, Emp_No, Skill):

{{Proj_No, Emp_No}, {Proj_No, Skill}, {Emp_No, Skill}}

Fifth (Project-Join) Normal Form

Example 9:

The relation R(Proj_No, Emp_No, Skill) of Example 3 does not satisfy 5NF.

Domain-Key Normal Form

Example 10:

Consider the relation scheme Enrolment(Course_No, Student_No, Grade)

The key is {Course_No, Student_No}.

KC: Course_No, Student_No -> Grade.

DC:

Domain(Course_No): 001..999 (i.e. In(Course_No, {001..999}
Domain(Grade): {A,B,C,D,F,I,P}
Domain(Student_No): string(1..10)

GC:

One of the GC may be:

if Course_No mod 10 >= 8 then
   (Grade ε {'P', 'F', 'I'}
else
   (Grade ε {'A', 'B', 'C', 'D', 'F', 'I'};
end if;

The relation Enrolment is in PJNF.

However, the relation Enrolment is not in DKNF since the GC is not a natural consequence of the KC and DC.

To solve the problem, we may make the following decomposition:

Pass_Fail_Course_Enrolment(Course_No, Student_No, Grade) with

DC: 

Domain(Course_No): {I | I ε 1..999 and I mod 10 >= 8}
Domain(Grade): {'F', 'I', 'P'}

Regular_Course_Enrolment(Course_No, Student_No, Grade) with

DC:              

Domain(Course_No): {I | I ε 1..999 and I mod 10 < 8}
Domain(Grade): {'A', 'B', 'C', 'D', 'F', 'I'}

Both relations are now in DKNF. However, this is usually not done. Instead, stored procedures may be used to enforce the constraint.

Example 11:

A non-trivial FD is a GC.
A FD with the determinant being a key is a KC.

Hence, if a relation satisfies BCNF, then all its FD can be deducted from KC.

If a relation does not satisfy BCNF, then there is a FD with a non-key determinant.  Thus, it will not satisfy DKNF too.

Example 12:                    

A non-trivial MVD is a GC.
There is no way to express a MVD using KC or DC.

Thus, if a relation is not in 4NF, it is also not in DKNF.