Introduction to the relational model
by K. Yue
1. Introduction
The relational model is based on the concept of a relation.
- Note that practical DBMS do not follow the pure relational model.
- The relational model is set theoretic.
Important definitions:
- An attribute is a name.
- A domain is a set of values an attribute can take.
- The value of an attribute should be atomic (cannot be divided into smaller components with individual meanings): First Normal Form.
- Null may or may not be an acceptable value for an attribute.
- A tuple is a set of mapping of a set of attributes {A1,A2, ...,An} to a set of values {d1,d2, ...,dn} of the domains {D1,D2, ...,Dn}
Example:
A tuple: a set of mapping.
{SID: 123456, SNAME: "Johnson, Gary", MAJOR: "CSCI", GPA: 3.71}
The same tuple:
{SNAME: "Johnson, Gary", MAJOR: "CSCI", SID: 123456, GPA: 3.71}
Other possible representations:
(SID: 123456, SNAME: "Johnson, Gary", MAJOR: "CSCI", GPA: 3.71)
If the order for displaying the attribute names is understood:
(123456, "Johnson, Gary", "CSCI", 3.71)
Consider the very simple database: supply.sql.txt
MariaDB [supply]> select * from supplier;
+------+-------+---------+--------+
| SNUM | SNAME | SCITY | STATUS |
+------+-------+---------+--------+
| S1 | ABC | Dallas | 10 |
| S2 | DEF | Houston | 20 |
| S3 | Go go | Houston | 12 |
| S4 | P&G | Dallas | 2 |
| S5 | Yap | Phoenix | 5 |
| S6 | Yue | Dallas | 1 |
+------+-------+---------+--------+
6 rows in set (0.04 sec)
MariaDB [supply]> select sname, snum, status, scity
-> from supplier
-> order by scity;
+-------+------+--------+---------+
| sname | snum | status | scity |
+-------+------+--------+---------+
| ABC | S1 | 10 | Dallas |
| P&G | S4 | 2 | Dallas |
| Yue | S6 | 1 | Dallas |
| DEF | S2 | 20 | Houston |
| Go go | S3 | 12 | Houston |
| Yap | S5 | 5 | Phoenix |
+-------+------+--------+---------+
6 rows in set (0.00 sec)
- The two results are relations.
- They look different. They have different display views.
- However, the information stored is the same. In the pure relational model, the two relations are equal.
Notes:
- There are difference between the pure relational model and commercial DBMS.
- A tuple may be called a row or a record in a commercial DBMS.
- An attribute may be called a column name or a field.
- The order of the fields within a row is usually defined in a commercial DBMS, such as by SQL.
- It is important to learn both the theory and practice of databases.
- A relation schema is a set of attributes: R(A1,A2, ...,An).
- The degree (or arity) of a relation is the number of attributes in its schema.
Example: Different notations.
Student: {SID, SNAME, MAJOR, GPA}
or
Student(SID, SNAME, MAJOR, GPA)
- A relation schema can be considered as the framework or structure of a relation.
- A relation schema seldom changes.
- A relation, or more specifically, an instance of a relation, is a set of tuples.
Example:
A relation schema:
Student(SID, SNAME, MAJOR, GPA)
An instance of the relation:
SID |
SNAME |
MAJOR |
GPA |
12345 |
Johnson |
MATH |
3.33 |
13526 |
Yue |
CSCI |
2.11 |
20010 |
Smith |
CSCI |
3.94 |
10110 |
Bajaj |
MATH |
3.22 |
11111 |
Ganta |
CSCI |
3.78 |
Another instance of the relation:
SID |
SNAME |
MAJOR |
GPA |
12345 |
Johnson |
MATH |
3.55 |
13526 |
Yue |
CSCI |
3.61 |
10110 |
Bajaj |
MATH |
3.22 |
11111 |
Ganta |
CSCI |
3.98 |
Consider the very simple database: supply.sql.txt
- R: Supplier
- (A1,A2, ...,An): (SNUM, SNAME, SCITY, STATUS)
- A tuple {d1,d2, ...,dn} = {"S1","ABC","Dallas","10"}.
- {D1,D2, ...,Dn} = {VARCHAR(9), VARCHAR(15), VARCHAR(15), INTEGER}
- Degree of Supplier = 4.
Some observations:
- A relation (instance) can change quickly.
- Some important properties of a relation:
- There is no duplicate tuple.
- Because a relation is a set.
- Consequence: it does not directly support 'object identity'. A candidate key is needed.
- The relational model is known as 'value-oriented'.
- A table and a relation is not exactly the same. "Table" is a more generic term.
- The tuples within a relation are unordered.
- Changing the order of showing the tuples does not change the relation.
- The attributes within a relation schema is unordered.
- Changing the display order of the attributes within a relation schema does not change the relation.
Null values:
- Generally used for representing missing information.
- DBSM provides a method to test whether a value is null or not.
- Three possible interpretations:
- Not applicable.
- Missing value.
- No information at all.
Example:
Consider the attribute SpouseName. A Null value may mean:
- Not applicable: the person is not married.
- Missing information: the person is married and has a spouse, but we do not have the name of the spouse.
- No information at all: we do not know whether the person is married or not.
How do we distinguish between the three meanings of the null value in this case?
By using an extra attribute, such as MaritalStatus.
Note that the Boolean type (if supported) actually uses a 3-valued logic.
2. Keys
- A set of attributes K is a candidate key of a relation R if it minimally identifies a tuple at any time:
- Uniqueness: No two distinct tuples of R have the same value of K.
- Minimality: No proper subset of K has the uniqueness property.
- A candidate key (CK) is a property of a relation.
- A relation has one or more candidate keys.
- A set of attributes may be a candidate key for a relation R but not a candidate key of another relation S.
- For any relation instance, the candidate key of a tuple must have an unique value.
Example:
Consider the following valid relation instance:
A |
B |
C |
12345 |
101 |
A |
13098 |
201 |
B |
Can you identify a candidate key?
Consider another valid relation instance.
StudentID |
Course ID |
Grade |
12345 |
101 |
A |
13098 |
201 |
B |
Can you identify a candidate key?
What about this?
StudentID |
Course ID |
Grade |
Some other columns |
12345 |
101 |
A |
|
13098 |
201 |
B |
|
13098 |
101 |
C- |
|
12345 |
201 |
D |
|
- In general, you cannot 'identify' a candidate key from a valid relation instance.
- A set of attributes that are unique for one instance does not guarantee that it is unique for all instances.
- However, you may reject a set of attributes as potential candidate key from a valid relation instance.
Example:
Given the following instance known to be valid:
StudentID |
Course ID |
Grade |
Some other columns |
12345 |
101 |
A |
|
13098 |
201 |
B |
|
13098 |
101 |
C- |
|
12345 |
201 |
D |
|
The following sets of attributes cannot be CK: [1] StudentId, [2] CourseId.
More terms:
- Whether a set of attributes is a candidate key of a relation or not depend on the semantic (meaning) of the relation in modeling the application.
- A primary key is a selected candidate key for a relation in the DBMS. It is used for practical purpose and does not have a special meaning in the theory of the relational model.
- An alternate key is a candidate key that is not the primary key.
- A key is simple if it has only one attribute
- A key is composite if it has more than one attributes.
- Every relation has at least one candidate key.
- An attribute that appears in a candidate key is a prime attribute (or key attribute).
- An attribute that does not appears in any candidate key is a non-prime (non-key) attribute.
Example:
Consider Employee(EmployeeID, EmployeeName)
If we make these assumptions:
- An employee has an unique ID.
- An employee's name is not unique.
We can conclude:
- EmployeeID is a candidate key of Employee.
- {EmployeeID, EmployeeName} is not a candidate key of Employee: it is unique but not minimal. It is a superkey.
- Prime attribute: EmployeeID
- Non-prime attribute: EmployeeName
Example:
Consider R(EmployeeID, DepartmentID)
If we make these assumptions:
- An employee has an unique ID.
- A department has an unique ID.
- An employee is associated with (works for) 0 or 1 department.
- A department is associated with (may have) many employees.
We can conclude that EmployeeID is the only candidate key for R.
Under a different set of assumptions, we may have different candidate keys
Assumptions:
- An employee has an unique ID.
- A department has an unique ID.
- An employee is associated with 0 or 1 department.
- A department is associated with only 0 or 1 employee.
Then we have two candidate keys:(1) EmployeeId and (2) DepartmentID. The relation R may be called DepartmentHead.
- A superkey is a superset of any candidate key.
- A superkey satisfy the uniqueness requirement, but not necessarily the minimality requirement.
- A candidate key is also a superkey.
Exercise:
A relation R has an arity of 5. What are the possible minimum and maximum number of superkeys of R? . What are the possible minimum and maximum number of candidate keys of R?
What about the general case?
Example:
Prove that a relation always has a candidate key.
Proof:
Since a relation is a set of tuples, the relation schema is always unique. If we can find a proper subset of relation schema that is a candidate key, we have found a candidate key. If not, the relation schema is minimal and is thus a candidate key.
- A foreign key of a relation is a set of attributes that is a candidate key in a foreign relation. That foreign relation is sometimes called the parent (referenced, home, or base) table of the foreign key.
- A foreign key may or may not allow null value. It depends on the problem requirements.
Example:
Consider R(A,B,C) with two candidate keys A and B and S(A, D, E, F) with a single candidate key AD. A is a foreign key in S and the base table of A is R.