Database Modeling

by K. Yue

1.Modeling and Modeling Transformation

Software development is logical modeling (step [1] in the following diagram): constructing an executable logical model (such as using SQL, Java, etc as the logical language.)

Logical modeling spans many phases in the software lifecycle:

  1. Analysis
  2. Design: architectural and component
  3. Implementation (coding)

Furthermore, logical languages are closer to the machine world and optimized according. They are not designed to describe the real-world and captures real world requirement.

Conceptual modeling is introduced as shown in the diagram below:

Advantages of conceptual modeling:

  1. Conceptual modeling languages, such as Unified Modeling Language (UML) or Entity-Relationship (ER) model, are designed to describe real-world problems and can better capture and refine problem requirements.
  2. The process is a n-step process, breaking down the complexity.
  3. Iterative refinement of the conceptual model to provide the right fidelity for different software development phases.

Using Model-Driven Software Engineering (MDSE), software development is the development of a sequence of models, transforming a higher level models to lower-level models in the process.

2. Motivation:

Bureau of Labor Statistics, Computer and Information Technology Job outlook 2021 to 2031:

Older data for comparison:

Bureau of Labor Statistics, Computer and Information Technology Job outlook 2016 to 2026:

job outlook

Average annual wage over all jobs in 2017: $37,600. 2016 to 2026 expected overall growth rate is 7%. The main difference between "software developers" and "computer programmers" are modeling and model transformation.

3. DB Modeling

What are involved in DB modeling?

  1. Understand, capture, and refine the application requirement until it is clear and precise enough for design and implementation.
  2. Communications are crucial. "Do not build your DB application. Build the DB application for the users."
  3. Complexity management (try to compare with problem solving not involving computers).
  4. Three main topics in modeling tools and techniques:
    1. Modeling language: e.g. UML, ER.
    2. Theory and 'Good' practices: e.g. architectural and design patterns.
    3. Modeling process: what process to following in modeling., E.g. Agile.

Understanding the problem domain

How do people solve problems?

  1. Obtain an initial understanding of the problem.
  2. Form a model to capture the understanding.
  3. Refine the model to include necessary details.
  4. Based on the model, devise a solution.
  5. Implement the solution.

The problem of many novices is that they do not spend enough time and effort in steps (1) to (3) (i.e. modeling) and jump to (4) and (5) quickly.

Example:

What can you do to understand the problem and model it? Some good practices:

  1. Start by asking a lot of questions.
  2. Collect and analyze as much documentation as possible.
  3. Study and use the existing system: find out what is currently being used. Use them with the users. Participate in the work process.
  4. Model, write, and document!
  5. Use your own words and avoid the 'copy and paste' trap. (What is the copy and paste culture?)

Some possible generic questions to ask for modeling.

  1. What are the purposes of the application?
  2. What problems will the application solve? What value will it provide?
  3. What are the boundary and context of the applications?
  4. Who are the users?
  5. Who are the domain experts?
  6. What are the available documentation?
  7. How does the existing system work?
  8. What are the workflow processes of the current system?

Prepare your own list!

Two important options for database modeling languages:

  1. Entity Relationship (ER) Modeling and Extended ER (EER) Modeling: more relational database specific.
  2. UML modeling: more general purpose.

Some advantages of ER diagrams:

  1. Easy to understand
  2. Simpler
  3. More specific to relational database modeling
  4. Good collection of theory and best practices
  5. Good vendor and tool support

Some disadvantages of ER diagrams:

  1. Expressiveness
  2. Not concise
  3. lack of standards
  4. Many versions that can be confusing
  5. narrow focus

What do we look for in a modeling language?

  1. Completeness: can it capture all needed information?
  2. Expressiveness: can it capture information in an efficient and easy way for discussion and communication.
  3. Domain support: how well does it support capturing the information in the application domain?
  4. Precision and accuracy: Is it ambiguous?
  5. Conciseness: what is the 'information density'?
  6. Standardization: Are there many competing versions?
  7. Tool support
  8. Community support
  9. Extensibility: can it be extended to satisfy a specific domain?

We will use on UML in this course.

Read: A Badia and D. Lemire. A call to arms: revisiting database design. SIGMOD Record 40, 3 (November 2011), 61-69.