Chapter 2 — Data Models
2.1 What is a Data Model?
A data model is a collection of conceptual tools for describing:
- Data structure — how data is organized
- Data relationships — how pieces of data relate to each other
- Data semantics — what the data means
- Constraints — rules the data must obey
Data Model Comparison
| Model | Core Structure | Strengths | Weaknesses | Example DBMS |
|---|
| Relational | Tables with rows and columns | General-purpose, SQL, well-understood | Not ideal for complex nested data | MySQL, PostgreSQL, Oracle |
| ER (Entity-Relationship) | Entities, attributes, relationships | Intuitive design tool; maps to relational easily | Conceptual only — not executable | Lucidchart, draw.io |
| Object-Oriented | Objects with attributes and methods | Models complex real-world objects naturally | Complex querying; not widely adopted | db4o, Versant |
| Hierarchical | Tree of parent-child records | Fast for known access paths | Rigid; no M:N without duplication | IBM IMS (legacy) |
| Network | Graph of records connected by pointers | M:N without duplication | Complex navigation | IDMS (legacy) |
| Graph | Nodes (entities) + directed edges (relationships) | Naturally models social/network data | Less efficient for tabular queries | Neo4j, Amazon Neptune |
Exam tip: Know the key characteristic of each model and one example DBMS. Hierarchical and network models are legacy — know their disadvantages vs relational.
2.2 Entity-Relationship (ER) Model
The ER model is a conceptual design tool — it helps model real-world requirements before implementation. Proposed by Peter Chen in 1976.
2.2.1 Entities and Entity Sets
An entity is a distinguishable real-world object (a specific student, a specific course).
An entity set is the collection of all entities of the same type (all students, all courses).
Strong vs Weak Entities
| Strong Entity | Weak Entity |
|---|
| Key | Has its own primary key | Has only a partial key — depends on owner entity |
| ER notation | Single rectangle | Double rectangle |
| Identifying relationship | — | Connected to owner by double diamond |
| Existence | Exists independently | Cannot exist without its owner |
| Example | Employee(emp_id, name) | Dependent(dep_name, dob) — partial key is dep_name, depends on Employee |
2.2.2 Attributes and Their Types
| Attribute Type | Definition | ER Notation | Example |
|---|
| Simple (atomic) | Cannot be divided further | Single ellipse | age, salary |
| Composite | Can be divided into sub-attributes | Ellipse with child ellipses | name → {first_name, last_name} |
| Multivalued | Can hold multiple values for one entity | Double ellipse | phone_numbers, skills |
| Derived | Computed from another attribute | Dashed ellipse | age derived from birth_date |
2.2.3 Keys in ER Model
| Key Type | Definition | Example |
|---|
| Super key | Any set of attributes that uniquely identifies an entity | {sid}, {sid, name}, {sid, name, dept} |
| Candidate key | A minimal super key — no proper subset is also a super key | {sid} |
| Primary key | The chosen candidate key, underlined in ER diagram | sid |
| Partial key | Uniquely identifies a weak entity within its owner set — dashed underline | dep_name in Dependent |
2.2.4 Relationships and Mapping Cardinalities
Cardinality Constraints
| Cardinality | Meaning | Example | ER Notation |
|---|
| 1:1 | Each entity in A relates to at most one in B, and vice versa | One person manages at most one department | Arrow on both sides |
| 1:N | One entity in A relates to many in B; each B to at most one A | One department has many employees | Arrow on the "one" side |
| M:N | Entities in A can relate to many in B and vice versa | A student enrolls in many courses; a course has many students | No arrows |
Participation Constraints
| Type | Meaning | ER Notation | Example |
|---|
| Total participation | Every entity MUST participate in at least one relationship | Double line | Every employee MUST belong to a department |
| Partial participation | Some entities may not participate | Single line | Some employees may not manage any project |
Relationship attributes: Attributes that belong to the relationship itself, not either entity. Example: WORKS_ON between Employee and Project has attribute hours.
Exam tip: Always state both cardinality AND participation constraints when describing a relationship in an ER diagram.
2.2.5 Specialization, Generalization, and Aggregation
Specialization (Top-Down): Start with a general entity (supertype), define subtypes that inherit all attributes and add their own.
- Symbol: ISA triangle (△) pointing from subtype to supertype
- Disjoint: entity belongs to at most one subtype (marked "d")
- Overlapping: entity can belong to multiple subtypes (marked "o")
- Total: every supertype must belong to at least one subtype
- Partial: some supertype entities may not belong to any subtype
Generalization (Bottom-Up): Start with multiple specific entity types, observe common attributes, and merge them into a new supertype.
- Example:
Car, Truck, Motorcycle all have vehicle_id, make, model, year → create a Vehicle supertype.
- Same ER diagram result as specialization — difference is only the direction of thought.
Aggregation: Treat a relationship set as a higher-level entity so it can participate in another relationship.
- Why needed? ER model cannot directly have a relationship between a relationship and an entity.
- Example: An employee works on a project (WORKS_ON relationship). A manager supervises that specific employee-project combination. The supervisor relationship links to the aggregated WORKS_ON relationship.
2.3 Relational Model
The relational model organizes data as a collection of relations (tables). Proposed by E.F. Codd (IBM) in 1970.
Key Terminology
| Relational Term | SQL Equivalent | Description |
|---|
| Relation | Table | Two-dimensional table of data |
| Tuple | Row / Record | A single row in the relation |
| Attribute | Column / Field | A named column |
| Domain | Data type + constraints | The set of allowed values for an attribute |
| Degree | Number of columns | Number of attributes |
| Cardinality | Number of rows | Number of tuples |
Properties of a Relation (Formal)
- Each cell contains exactly one atomic value (1NF requirement)
- Each column has a unique name
- All values in a column are from the same domain
- No duplicate tuples (formally — SQL allows duplicates in tables)
- Order of rows is irrelevant (a relation is a set)
- Order of columns is irrelevant (attributes identified by name)
2.3.1 Keys in the Relational Model
| Key Type | Definition | Example |
|---|
| Super key | Set of attributes that uniquely identifies every tuple | {sid}, {sid, name} |
| Candidate key | Minimal super key | {sid} |
| Primary key (PK) | Chosen candidate key; must be unique + NOT NULL | sid INT PRIMARY KEY |
| Foreign key (FK) | Attribute(s) in R that reference the PK of another relation S | Enrollment.sid references Student.sid |
| Composite key | PK made of multiple attributes | (sid, cid) in Enrollment |
2.3.2 Referential Integrity Constraint
A foreign key constraint requires that every non-null FK value must match an existing PK value in the referenced relation.
| Event | RESTRICT (NO ACTION) | CASCADE | SET NULL | SET DEFAULT |
|---|
| Delete referenced row | ❌ Block the delete | ✓ Delete child rows too | Set FK to NULL | Set FK to default |
| Update referenced PK | ❌ Block the update | ✓ Update FK values too | Set FK to NULL | Set FK to default |
Exam example: Delete a department with employees → RESTRICT: error; CASCADE: employees also deleted; SET NULL: employees' dept_id becomes NULL.
2.4 ER Diagram → Relational Schema Mapping Rules
These rules are very frequently tested. Memorize all 6 rules.
-- RULE 1: Strong entity → its own relation; PK = entity's primary key
CREATE TABLE Department (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100) NOT NULL,
location VARCHAR(100)
);
-- RULE 2: 1:N binary relationship → FK on the N-side (many side)
CREATE TABLE Employee (
emp_id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
salary DECIMAL(10,2),
dept_id INT REFERENCES Department(dept_id) ON DELETE SET NULL
-- FK on Employee (N side), references Department (1 side)
);
-- RULE 3: M:N relationship → new bridge/junction relation
-- PK = combination of both entity PKs; NEVER put FK on either entity alone for M:N
CREATE TABLE Project (
proj_id INT PRIMARY KEY,
title VARCHAR(200) NOT NULL,
budget DECIMAL(12,2)
);
CREATE TABLE Works_On (
emp_id INT REFERENCES Employee(emp_id) ON DELETE CASCADE,
proj_id INT REFERENCES Project(proj_id) ON DELETE CASCADE,
hours DECIMAL(5,2), -- relationship attribute
PRIMARY KEY (emp_id, proj_id) -- composite PK
);
-- RULE 4: Weak entity → own relation; PK = owner's PK + partial key
CREATE TABLE Dependent (
emp_id INT REFERENCES Employee(emp_id) ON DELETE CASCADE,
dep_name VARCHAR(100), -- partial key (unique per emp_id)
relationship VARCHAR(50),
dob DATE,
PRIMARY KEY (emp_id, dep_name) -- composite PK: owner PK + partial key
);
-- RULE 5: Multivalued attribute → separate relation
CREATE TABLE Employee_Phone (
emp_id INT REFERENCES Employee(emp_id) ON DELETE CASCADE,
phone VARCHAR(20),
PRIMARY KEY (emp_id, phone) -- composite PK
);
-- RULE 6: 1:1 relationship → FK in either table (prefer total participation side)
ALTER TABLE Department ADD COLUMN manager_id INT REFERENCES Employee(emp_id);
-- RULE 7: Specialization/Generalization → per-level tables (most common approach)
CREATE TABLE Person (pid INT PRIMARY KEY, name VARCHAR(100), age INT);
CREATE TABLE Student_Spec (
pid INT PRIMARY KEY REFERENCES Person(pid), -- PK is also FK
gpa DECIMAL(3,2),
major VARCHAR(50)
);
CREATE TABLE Employee_Spec (
pid INT PRIMARY KEY REFERENCES Person(pid),
salary DECIMAL(10,2),
emp_id VARCHAR(20)
);
Summary of ER → Relational Mapping Rules
| ER Construct | Relational Mapping Rule |
|---|
| Strong entity | Own table; PK = entity PK |
| Weak entity | Own table; PK = owner PK + partial key; FK to owner with CASCADE delete |
| 1:1 relationship | FK in either table (prefer total participation side) |
| 1:N relationship | FK on the N-side (many side) |
| M:N relationship | New bridge/junction table; PK = combination of both entity PKs |
| Multivalued attribute | New separate table; PK = entity PK + attribute value |
| Derived attribute | Usually not stored (computed when needed) |
| Specialization (ISA) | 3 options: per-level tables (most common), per-subtype tables, or single table with discriminator |
Exam trap — M:N: Never store M:N by putting a multivalued FK in one entity. You MUST create a separate junction table with a composite PK.
Exam Quick-Reference — Chapter 2
ER Notation Summary:
- Rectangle = strong entity; Double rectangle = weak entity
- Diamond = relationship; Double diamond = identifying relationship (for weak entity)
- Ellipse = attribute; Double ellipse = multivalued; Dashed ellipse = derived
- Primary key attribute = underlined; Partial key = dashed underline
- Double line = total participation; Single line = partial participation
Key mapping rules (memorize):
- M:N → always a new bridge table
- Weak entity PK = owner's PK + partial key
- 1:N → FK goes on the N (many) side
- Multivalued attribute → separate table
- FK violation actions: RESTRICT · CASCADE · SET NULL · SET DEFAULT