Database Modeling
by K. Yue
1.Modeling and Modeling Transformation
- Why modeling?
- Many DB developers may just directly figure out the relational schema without formal data modeling. What are the problems with that?
- It may work for very simple problems.
- For more complicated problems, it is necessary to capture the detailed requirements before constructing a solution.
- Initial ideas are usually based on needs. Their definitions can be very coarse and ambiguous. Real-world examples from past UHCL capstone projects:
- "I want a system to store and retrieve sale receipts for small business to claim sale tax waiver."
- "A mobile phone application to share and manage disaster recovery information."
- How do we construct a precise, executable computer solution to a vague real-world problem?
Software development is logical modeling (step [1] in the following diagram): constructing an executable logical model (such as using SQL, Java, etc as the logical language.)
Logical modeling spans many phases in the software lifecycle:
- Analysis
- Design: architectural and component
- Implementation (coding)
Furthermore, logical languages are closer to the machine world and optimized according. They are not designed to describe the real-world and captures real world requirement.
Conceptual modeling is introduced as shown in the diagram below:
Advantages of conceptual modeling:
- Conceptual modeling languages, such as Unified Modeling Language (UML) or Entity-Relationship (ER) model, are designed to describe real-world problems and can better capture and refine problem requirements.
- The process is a n-step process, breaking down the complexity.
- Iterative refinement of the conceptual model to provide the right fidelity for different software development phases.
Using Model-Driven Software Engineering (MDSE), software development is the development of a sequence of models, transforming a higher level models to lower-level models in the process.
2. Motivation:
Bureau of Labor Statistics, Computer and Information Technology Job outlook 2021 to 2031:
Older data for comparison:
Bureau of Labor Statistics, Computer and Information Technology Job outlook 2016 to 2026:
Average annual wage over all jobs in 2017: $37,600. 2016 to 2026 expected overall growth rate is 7%. The main difference between "software developers" and "computer programmers" are modeling and model transformation.
3. DB Modeling
What are involved in DB modeling?
- Understand, capture, and refine the application requirement until it is clear and precise enough for design and implementation.
- Communications are crucial. "Do not build your DB application. Build the DB application for the users."
- Complexity management (try to compare with problem solving not involving computers).
- Three main topics in modeling tools and techniques:
- Modeling language: e.g. UML, ER.
- Theory and 'Good' practices: e.g. architectural and design patterns.
- Modeling process: what process to following in modeling., E.g. Agile.
Understanding the problem domain
How do people solve problems?
- Obtain an initial understanding of the problem.
- Form a model to capture the understanding.
- Refine the model to include necessary details.
- Based on the model, devise a solution.
- Implement the solution.
The problem of many novices is that they do not spend enough time and effort in steps (1) to (3) (i.e. modeling) and jump to (4) and (5) quickly.
- Your supervisor/user/customer/domain expert may not have a clear model themselves!
- In most cases, for novice software developers: what you think what the applications should be is NOT as important! Instead, what the domain experts and the users think is crucial.
Example:
- "I need a class enrollment report."
What can you do to understand the problem and model it? Some good practices:
- Start by asking a lot of questions.
- Collect and analyze as much documentation as possible.
- Study and use the existing system: find out what is currently being used. Use them with the users. Participate in the work process.
- Model, write, and document!
- Use your own words and avoid the 'copy and paste' trap. (What is the copy and paste culture?)
Some possible generic questions to ask for modeling.
- What are the purposes of the application?
- What problems will the application solve? What value will it provide?
- What are the boundary and context of the applications?
- Who are the users?
- Who are the domain experts?
- What are the available documentation?
- How does the existing system work?
- What are the workflow processes of the current system?
Prepare your own list!
Two important options for database modeling languages:
- Entity Relationship (ER) Modeling and Extended ER (EER) Modeling: more relational database specific.
- UML modeling: more general purpose.
Some advantages of ER diagrams:
- Easy to understand
- Simpler
- More specific to relational database modeling
- Good collection of theory and best practices
- Good vendor and tool support
Some disadvantages of ER diagrams:
- Expressiveness
- Not concise
- lack of standards
- Many versions that can be confusing
- narrow focus
What do we look for in a modeling language?
- Completeness: can it capture all needed information?
- Expressiveness: can it capture information in an efficient and easy way for discussion and communication.
- Domain support: how well does it support capturing the information in the application domain?
- Precision and accuracy: Is it ambiguous?
- Conciseness: what is the 'information density'?
- Standardization: Are there many competing versions?
- Tool support
- Community support
- Extensibility: can it be extended to satisfy a specific domain?
We will use on UML in this course.
Read: A Badia and D. Lemire. A call to arms: revisiting database design. SIGMOD Record 40, 3 (November 2011), 61-69.