Introduction to Functional Dependency and Normalization
by K. Yue
1. Introduction to Normalization
Example
If a relation R1 is in 4NF, then R1 is also in BCNF, 3NF, 2NF and 1NF. Refer to the diagram below for R1, R2, and R3. NFNF stands for Non-First Normal Form.
If a relation R2 is in 2NF, then
If a relation R3 is not in 3NF, then
The relations R1 and R3 in this example are depicted in the Venn's Diagram:
1.1 General Overview
2. Functional Dependencies (FD)
Example:
A student, {StudentId: 1233457} can be associated with only one GPA: {GPA: 3.00}
However, several students can have the same GPA:
{GPA: 3.00}
StudentId -> GPA (FD)
(many) (one)
On the other hand, a student {StudentId: 1233457} can take many courses: {cid: CSCI4333}, {cid:CSCI 2315}, {cid: CSCI3331},...
Many students can take the same course: {StudentId: 1233457}, {StudentId: 2233490}, {StudentId: 3333457},... take {cid: CSCI4333}.
Under these assumptions
StudentId | GPA | cid | Grade | others ... |
1233457 | 3.27 | CSCI4333 | A | |
1233457 | 3.27 | CSCI2315 | B- | |
1233457 | 3.27 | CSCI3331 | C+ | |
2233490 | 3.41 | CSCI4333 | A- | ... |
... | ||||
7891110 | 3.27 | ... | Comment: OK | |
1233457 | 3.66 | ... | Comment: not allowed |
StudentId ->-> cid (MVD: not covered in this course)
(many) (many)
2.1 Many to one relationships
Example
For many applications, the relationship between SSN and FName are many to one in a relation R(..,SSN, FName, ...)
SSN -> FName
(many) (one)
Assumptions:
Terms:
For example, consider the relation instance:
SSN | FName | PHONE | ... |
123456789 | Peter | 123-456-7890 | |
123456789 | Paul | 713-283-7066 | |
222229999 | Mary | 713-283-7066 |
The relation instance is not allowed if we assume SSN -> FName.
Example
In a university, there may be a many-to-one relationship between {CourseId, StudentId} and {Grade}.
Interpretations:
Note that AB->CD is a shorthand notation for {A,B} -> {C,D}
FD such as AB-> A, AB->B, AB->AB, A->A are trivial. They are always mathematically true, but do not capture any data requirements.
Example:
In most applications, we have
SSN -> FName (i.e. a person has only one SSN.)
However, in a criminal database, several bad guys may use the same fake SSN, and thus
SSN -> FName may not be true.
Or, if you are dealing with an international database with many countries, each country may has its own SSN. Two countries may issue the same SSN. Hence,
SSN -> FName is not true.
We may instead have SSN, CountryId -> Name.
2.2 Definition of FD
Example:
This instance r1 of R violates X->Z.
X | Y | Z |
'A' | 1 | 110 |
'A' | 1 | 123 |
'A' | 1 | 345 |
'B' | 2 | 232 |
'C' | 1 | 110 |
'C' | 2 | 212 |
This instance r2 of R does not violate X->Y.
X | Y | Z |
'A' | 1 | 110 |
'A' | 1 | 123 |
'A' | 1 | 345 |
'B' | 2 | 232 |
'C' | 1 | 110 |
'C' | 1 | 211 |
However, this instance r2 does not prove that X->Y.
In order to have X-> Y, all instances r of R must not violate the condition.
Examples:
DeptId -> ManagerId:
There are no two tuples with the same DeptId but different ManagerId. Meaning: a department can have only one manager.
CourseId, StudentId, Semester -> Grade
There are no two tuples with the same CourseId, StudentId and Semester, but different Grade. Meaning: any student taking a course in a semester has an unique grade. Note that it may not be true for a different university. Instead, the following may be true:
CourseId, StudentId, Year, Semester -> Grade
Example
Consider the following relation:
Supply(SupplierId, SupplierName, ProductId, ProductDesc, Quantity, ArrivalTime)
The relation stores the quantities and arrival times of shipments of products (identified by ProductId) from suppliers (Identified by SupplierId). A supplier may not have a unique name. Furthermore, the product description, ProductDesc, may be the same for two products. A supplier may supply the same product many times, each with a different ArrivalTime.
The functional dependencies (FD) of the relation may be:
SupplierId -> SupplierName
ProductId -> ProductDesc
SuplierId, ProductId, ArrivalTime -> Quantity
Decomposition:
Supplier(SupplierId, SupplierName) {SupplierId -> SupplierName}
Product(ProductId, ProductDesc) {ProductId -> ProductDesc}
Supply(SuplierId, ProductId, ArrivalTime, Quantity) {SuplierId, ProductId, ArrivalTime -> Quantity}
Example (from Spring 2019 HW):
Consider the following relation GO:
GO(GroupId, GroupName, GroupEMail, GroupChairId, GroupChairLName, GroupChairFName, GroupMemberId, GroupMemberMajor)
The relation stores information about student groups, their chair persons and members. Chair persons and members are students with unique student ids (stored as values in GroupChairId and GroupChairLName respectively). GroupId uniquely identifies a group, and a group has a unique name, and an email address (that may not be unique.) For example, three tuples are shown below.
GroupId |
GroupName |
GroupEMail |
GroupChairId |
GroupChairLName |
GroupChairFName |
GroupMemberId |
GroupMemberMajor |
G1 |
Biology |
bio@uhcl.edu |
12345 |
Lee |
Bryan |
23323 |
Biol |
G1 |
Biology |
bio@uhcl.edu |
12345 |
Lee |
Bryan |
24990 |
Biol |
G1 |
Biology |
bio@uhcl.edu |
12345 |
Lee |
Bryan |
38879 |
Phys |
G2 | Physics | phy@uhcl.edu | 23124 | Smith | Jane | 38879 | Phys |
G2 | Physics | phy@uhcl.edu | 23124 | Smith | Jane | 11900 | Chem |
G2 | Physics | phy@uhcl.edu | 23124 | Smith | Jane | 12345 | Biol |
Bryan Lee is the chair student of the group G1 Biology. The first three tuples also store information of three members of group G1
(a) List all applicable functional dependencies. (Make reasonable assumptions if necessary.)
GroupId -> GroupName, GroupEMail, GroupChairId
GroupName -> GroupId
GroupChairId -> GroupChairLName, GroupChairFName
MemberId -> GroupMemberMajor
Assumptions:
(b) What are the candidate keys?
{GroupId, MemberId} and {GroupName, MemberId}
(c) What is the highest normal form? Why?
1NF. For example, GroupId -> GroupEMail violates 2NF.
(d) If the highest normal form is not BCNF, can you decompose the relation GD losslessly into component relations in BCNF while preserving functional dependencies? If yes, how. If no, why?
Note that changes of attribute names in the member tables. For example, StudentLName is more appropriate than ChairLName since a student may not be a chair.