Converting ER Diagrams to Relation Schema
by K. Yue
1. Introduction
- ER models need to be converted to the underlying data model.
- For Relational DB, they need to be converted to relational schema.
2. Mapping of ER Diagrams to the Relational Model
- The problem is: Source Model --> Target Model.
- Source Model: ER
- Basic structures:
- entities
- relationships
- attributes
- Auxiliary structures: cardinality, data types, nullability, inheritance, etc.
- Other RDB-specific features: candidate keys, primary keys, indices, etc.
- Not all ER models are the same: different notations, extensions and guidelines.
- Target Model: Relational
- Basic structures:
- relations, and
- columns.
- Auxiliary structures: rows, candidate keys, primary keys, foreign keys, data types, nullability, indices, constraints, etc.
- Different RDBMS provide different features.
- Mapping rules need to be adapted to satisfy different situations.
- Rules (of thumbs) for conversion are shown below.
- One needs to understand the rationale behind these rules so you can adapt them to create your own mapping rules for a specific project which has a unique set of:
- requirements
- source ER models
- target RDBMS
Entities:
E1. An entity is mapped to a relation.
- The relation may use the same name.
- Two relations may be merged into a single relation later on.
- Basically,
- An entity type -> a relation.
- An entity instance (object) -> a row that stores the property of the entity.
Attributes:
A1: A single-valued attribute of atomic data type (of an entity or relationship) is mapped to an attribute in the corresponding table.
A2: For a multiple-valued attribute A of an entity E or a relationship L that is implemented as a relation R, A is implemented by a new relation L(RId, A), where RId is the primary key of the relation R.
- (RId, A) is a candidate key.
- RId is a foreign key referencing R(RId).
- You may create a surrogate key, such as L_Id, to be used as a simple primary key. If so, there will be two candidate keys: (1) L_Id, (2) (RId, A)
- The name of the new relation L should be meaningful, such as with the mentioning of A and (E or L).
A3. A single-value attribute of composite data type can be implemented in two ways.
- A3a. If there is a suitable composite data type in the targeted RDBMS, implement it as an attribute with the targeted composite data type. (Note that first normal form is now not satisfied.)
- A3b. Otherwise, treat it as a multi-valued attribute and use Rule A2.
A4: For a derived attribute A, there are usually two ways:
- A4a. Implemented as an attribute:
- As a virtual computed column.
- As a physically stored column: mechanism such as triggers or stored procedures should be used to ensure the correctness of the derived values.
- A4b. Not implemented as an attribute: mechanism such as stored functions or views can be used.
Keys and Constraints
Every relation should have at least one candidate key.
KC1. If a relation R implements an entity E or a relationship L, and E or L has candidate keys K's, set K's as candidate keys in R.
KC2. If a relation R implements an entity E or a relationship L, and E or L has no candidate key, create a surrogate primary key SK for R.
- This is needed since every relation must have a candidate key.
KC3. An UNIQUE constraint should be specified for candidate keys. Each attribute of a candidate key should also be set to NOT NULL.
KC4. Select one candidate key as the primary key and set it as the primary key of the relation.
KC5. If the cardinality of a single-valued attribute is specified:
- KC5a. If 0 is allowed, add NULL into the column definition. Note that this may not be necessary as NULL is usually the default.
- KC5b. If 0 is not allowed, add NOT NULL into the column definition.
KC6. If a constraint in the ER diagram (including those specified in comments and other documentation) can be implemented by using the SQL Data Definition Language (DDL), do so. Otherwise, be mindful that they will need to be implemented somewhere else.
- All requirements should be implemented.
Relationships
R1. For a many to one relationship between E1 (the entity with the one cardinality) and Em, add a column E1_Id into the relation Rm (which implements Em).
- Assume that R1_Id is the primary key of the relation R1 (which implements E1).
- R1_Id is a foreign key of Rm references R1(R1_Id).
- The name R1_Id may need to be renamed to become more meaningful.
- If 0 is not allowed (i.e. 1..1) for E1, then R1_Id is not null in Rm.
- Any single valued attribute of the relationship is mapped to a column in Rm.
- If you have composite or multi-valued attributes of the relationship, you should consider promoting the relationship to an entity in your ER model.
R2. For a many to many relationship between entities EA and EB, create a new relation RAB(RA_Id, RB_Id).
- (RA_Id, RB_Id) is a candidate key.
- RA_Id references RA(RA_Id).
- RB_Id references RB(RB_Id).
- An additional surrogate key, such as RAB_Id, can be created.
R3. For a one to one relationship between entities EA and EB, there are several options:
- Treat EA as E1 and EB as Em and apply rule R1.
- Treat EA as Em and EB as E1 and apply rule R1.
- Merge the two relations RA and RB into one. (In this case, you may want to refactor the ER diagram.)
R4. For any n-ary relationship (n>2), a new relation is needed.
- You should consider using only binary relationships.
Example:
We will conduct classroom examples.