Mapping UML class diagrams to the relational model

Mapping UML Class Diagrams to the Relational Model

by K. Yue

1. Resources

UML and DB Modeling:

Read Chapter 9, Elmasri.

2. Mapping OO Model to the Relational Model

Once an OO data model is constructed, it needs to be mapped for implementation in the selected database model.
If relational DB is used, the mapping will be from OO (classes, attributes, associations, etc.) to relational schema (relations, attributes, keys, etc.)
Note that the relational model and the OO model are very different.
CASE tools or DB modeling tools may provide varying degrees of facilities for generating relational schema and corresponding SQL statements.
However, it is important to understand the mechanism a tool uses to generate the relational schema and make adjustment if needed.

The problem is: Source Model --> Target Model.

Source Model: UML
1. Basic structures: classes, associations, and attributes.
2. Auxiliary structures: objects, multiplicity, data types, default values, stereotypes, candidate keys, primary keys, nullability, etc.
Target Model: Relational
1. Basic structures: relations, and columns (attributes).
2. Auxiliary structures: rows (tuples), candidate keys, primary keys, data types, nullability, indexes, constraints, etc.
3. Different RDBMS provide different features.

3. Conversion Rules

We present a set of mapping rules below. It is not meant to be complete or universal.
As alternative examples of 'mapping rules': http://web.fe.up.pt/~ssn/2010/lbaw/slides/lbaw-uml2rel.eng.pdf and http://www.agiledata.org/essays/mappingObjects.html#Figure2IncludingShadowInformation.
Do not mechanically follow these rules. Instead, understand the rationale behind the rules and adapt.
All model details should all be implemented in some ways.

Classes

C1. A class C is mapped to a relation RC.

Relations may later be merged in design refinement and performance tuning.
The relation may use the same name as the class.

Attributes

ATT1. (Basic) Include all single-valued attributes (with simple data types) of a class C as the attributes of RC, the relation for the class C.

ATT2. (Basic) For each multi-valued attribute A of the class C, create a new relation RCA containing the attribute A and the primary key, RCId, of the relation RC (which implements the class C).

(RCId, A) is a composite candidate key.
RCId is a foreign key referencing RC(RCId)
A surrogate key, such as RCA_Id, may be created to serve as a simple candidate key, to be selected as the primary key.
The name of RCA should be meaningfully selected.

ATT3. A single-valued attribute of composite data type (such as set, list, array) can be mapped in various ways.

If there is an comparable composite data type in the targeted DBMS, it can be implemented as an attribute of that data type in the relation.
- The relation will no longer be in the first normal form.
- Care should be taken in handling the difference in data type mapping.
Otherwise, regard the attribute as a multi-valued attribute and apply rule ATT2.

ATT4. For a derived attribute A:

It can be implemented and stored as an attribute of the relation.

Mechanism, such as triggers or stored procedures, should be used to ensure data consistency. That is the derived column should be consistent with the data that derives its value.

It may not be stored as a column directly in any relation.
- Mechanism, such as virtual columns, views or stored functions, may be used to provide standard access to the derived attributes.

ATT5. Data type matching should be handled effectively and consistently.

In later modeling phases, one may use SQL data types of the targeted DBMS in the class diagram.
If available, consider using user-defined data types in the targeted DBMS.

ATT6. If the multiplicity of an attribute is specified for the case of 0.

If 0 is allowed, add the NULL specifier in the column definition. (NULL is usually the default)
If 0 is not allowed, add the NOT NULL specifier in the column definition.

ATT7. The default value of an attribute may be directly implemented in SQL DDL.

Keys and Constraints

KC1. If a relation R implements a class C or an association (class) A, and C or A has candidate keys K's, set all K's as candidate keys in R.

KC2. If a relation R implements a class C or an association (class) A, and C or A has no candidate key, create a surrogate candidate key SK for R.

This is needed as every relation must have at least one candidate key.
SK is usually the primary key.

KC3. All candidate keys can be implemented by using the 'unique' and non-null constraint in SQL.

KC4. Select a candidate key as the primary key and set it accordingly in the relation.

KC5. For a stereotype:

Some may be directly implemented in SQL DDL, e.g. PK, CK, unique, etc.
Otherwise, it is necessary to consider where it is implemented.

Example:

Consider the class Member with the following attributes:

Member_Id: <<PK>>
Member_ScreenName <<unique>>
Hobby*
Medal* <<ordered>>

astah_4

What may the relational schema look like?

Member(MemberId, MemberScreenName):
1. CK: [1] MemberId, [2] MemberScreenName
Hobbies(HobbiesId, MemberId, Hobby):
1. CK: [1] HobbiesId, [2] MemberId, Hobby
2. FK: [1] MemberId references Member(MemberId)
3. A surrogate key, HobbiesId, is created as the primary key.
Medals(MedalsId, MemberId, Medal):
1. CK: [1] MedalsId, [2] MemberId, Medal.
2. FK: [1] MemberId references Member(MemberId)
3. A surrogate key, MedalsId, is created as the primary key.

All columns in the table above are not nullable.

Example:

A class Rectangle has three attributes:

Length
Width
\area: derived.

What may the relational schema look like?

One solution:

Rectangle(RectangleId, Length, Width)

with a view Rect define as

select RectangleId, Length, Width, Length * Width as Area
from Rectangle;

Alternatively, using virtual column:

CREATE or replace TABLE rectangle (
width DOUBLE,
height DOUBLE,
area DOUBLE AS (width * height) virtual
);

Associations

A1. (basic) For a many to one association between C1 (the class with the one multiplicity) and Cm, add a column C1_Id into the relation Rm (which implements Cm).

Assume that R1_Id is the primary key of the relation R1.
R1_Id is a foreign key of Rm referencing R1(R1_Id).
The name R1_Id may need to be renamed.
R1_Id is not null in Rm If and only if 0 is not allowed (i.e. 1..1) for C1.
Any single valued attribute of the association is mapped to a column in Rm.
If you have composite or multi-valued attributes of the relationship, you should consider promoting the association to an association class or a class in your UML class diagram.

A2. (basic) For a many to many association between classes CA and CB, create a new relation RAB(RA_Id, RB_Id).

(RA_Id, RB_Id) is a candidate key.
RA_Id references RA(RA_Id) as a foreign key.
RB_Id references RB(RB_Id) as a foreign key.
An additional surrogate key, such as RAB_Id, can be created.

A3. For a one to one association between classes CA and CB, there are several options:

Treat CA as C1 and CB as Cm and apply A1.
Treat CA as Cm and CB as C1 and apply A1.
Merge the two relations RA and RB into one. (In this case, you may want to refactor the class diagram.)

A4. For any n-ary association (n>2), a new relation is needed.

You should consider using binary associations instead.

Example:

Consider the classes Order and OrderItem which have a (1..1) to (0..*) association. The association has an attribute 'packed'.

What may the relational schema look like?

OrderItem(OrderItemId, ..., OrderId, packed):

OrderId is a foreign key referencing Order(OrderId)

Example:

Consider the class User and Account, which has a one to one association.

What may the relational schema look like?

Example:

Consider the tertiary association between the classes Supplier, Part, and Warehouse with an association attribute quantity.

What may the relational schema look like?

Supply(SupplyId, SupplierId, PartId, WarehouseId, Quantity):

CK: [1] SupplyId, [2] SupplierId, PartId, WarehouseId
FK: [1] SupplierId references Supplier(SupplierId), [2] PartId references Part(PartId), [3] WarehouseId references Warehouse(WarehouseId).