Conversion of ER to Relations

Converting ER Diagrams to Relation Schema

by K. Yue

1. Introduction

ER models need to be converted to the underlying data model.
For Relational DB, they need to be converted to relational schema.

2. Mapping of ER Diagrams to the Relational Model

The problem is: Source Model --> Target Model.
Source Model: ER
- Basic structures:
  1. entities
  2. relationships
  3. attributes
- Auxiliary structures: cardinality, data types, nullability, inheritance, etc.
- Other RDB-specific features: candidate keys, primary keys, indices, etc.
- Not all ER models are the same: different notations, extensions and guidelines.
Target Model: Relational
- Basic structures:
  1. relations, and
  2. columns.
- Auxiliary structures: rows, candidate keys, primary keys, foreign keys, data types, nullability, indices, constraints, etc.
- Different RDBMS provide different features.
Mapping rules need to be adapted to satisfy different situations.
Rules (of thumbs) for conversion are shown below.
One needs to understand the rationale behind these rules so you can adapt them to create your own mapping rules for a specific project which has a unique set of:
1. requirements
2. source ER models
3. target RDBMS

Entities:

E1. An entity is mapped to a relation.

The relation may use the same name.
Two relations may be merged into a single relation later on.
Basically,
- An entity type -> a relation.
- An entity instance (object) -> a row that stores the property of the entity.

Attributes:

A1: A single-valued attribute of atomic data type (of an entity or relationship) is mapped to an attribute in the corresponding table.

A2: For a multiple-valued attribute A of an entity E or a relationship L that is implemented as a relation R, A is implemented by a new relation L(RId, A), where RId is the primary key of the relation R.

(RId, A) is a candidate key.
RId is a foreign key referencing R(RId).
You may create a surrogate key, such as L_Id, to be used as a simple primary key. If so, there will be two candidate keys: (1) L_Id, (2) (RId, A)
The name of the new relation L should be meaningful, such as with the mentioning of A and (E or L).

A3. A single-value attribute of composite data type can be implemented in two ways.

A3a. If there is a suitable composite data type in the targeted RDBMS, implement it as an attribute with the targeted composite data type. (Note that first normal form is now not satisfied.)
A3b. Otherwise, treat it as a multi-valued attribute and use Rule A2.

A4: For a derived attribute A, there are usually two ways:

A4a. Implemented as an attribute:
- As a virtual computed column.
- As a physically stored column: mechanism such as triggers or stored procedures should be used to ensure the correctness of the derived values.
A4b. Not implemented as an attribute: mechanism such as stored functions or views can be used.

Keys and Constraints

Every relation should have at least one candidate key.

KC1. If a relation R implements an entity E or a relationship L, and E or L has candidate keys K's, set K's as candidate keys in R.

KC2. If a relation R implements an entity E or a relationship L, and E or L has no candidate key, create a surrogate primary key SK for R.

This is needed since every relation must have a candidate key.

KC3. An UNIQUE constraint should be specified for candidate keys. Each attribute of a candidate key should also be set to NOT NULL.

KC4. Select one candidate key as the primary key and set it as the primary key of the relation.

KC5. If the cardinality of a single-valued attribute is specified:

KC5a. If 0 is allowed, add NULL into the column definition. Note that this may not be necessary as NULL is usually the default.
KC5b. If 0 is not allowed, add NOT NULL into the column definition.

KC6. If a constraint in the ER diagram (including those specified in comments and other documentation) can be implemented by using the SQL Data Definition Language (DDL), do so. Otherwise, be mindful that they will need to be implemented somewhere else.

All requirements should be implemented.

Relationships

R1. For a many to one relationship between E₁ (the entity with the one cardinality) and E_m, add a column E1_Id into the relation R_m (which implements E_m).

Assume that R1_Id is the primary key of the relation R₁(which implements E₁).
R1_Id is a foreign key of R_m references R₁(R1_Id).
The name R1_Id may need to be renamed to become more meaningful.
If 0 is not allowed (i.e. 1..1) for E₁, then R1_Id is not null in Rm.
Any single valued attribute of the relationship is mapped to a column in R_m.
If you have composite or multi-valued attributes of the relationship, you should consider promoting the relationship to an entity in your ER model.

R2. For a many to many relationship between entities E_A and E_B, create a new relation R_AB(RA_Id, RB_Id).

(RA_Id, RB_Id) is a candidate key.
RA_Id references R_A(RA_Id).
RB_Id references R_B(RB_Id).
An additional surrogate key, such as RAB_Id, can be created.

R3. For a one to one relationship between entities E_A and E_B, there are several options:

Treat E_A as E₁ and E_B as E_m and apply rule R1.
Treat E_A as E_m and E_B as E₁ and apply rule R1.
Merge the two relations R_A and R_B into one. (In this case, you may want to refactor the ER diagram.)

R4. For any n-ary relationship (n>2), a new relation is needed.

You should consider using only binary relationships.

Example:

We will conduct classroom examples.