Chapter 4 — Database Constraints and Normalization
4.1 Integrity Constraints
Integrity constraints are rules defined in the database schema to ensure accuracy, consistency, and validity of data. Enforced by the DBMS, not application code.
Types of Integrity Constraints
| Constraint | Scope | Purpose | SQL Example |
|---|
| NOT NULL | Column | Prevents missing values | name VARCHAR(100) NOT NULL |
| UNIQUE | Column / group | No two rows have same value(s) | UNIQUE(email) |
| PRIMARY KEY | Column / group | NOT NULL + UNIQUE; uniquely identifies each row | sid INT PRIMARY KEY |
| FOREIGN KEY | Column | Referential integrity; references another table's PK | REFERENCES Student(sid) |
| CHECK | Column / row | Custom business rule using any boolean expression | CHECK (gpa BETWEEN 0 AND 4) |
| DEFAULT | Column | Specifies value when none provided during INSERT | dept VARCHAR(50) DEFAULT 'CS' |
| ASSERTION | Multi-table | Constraint over multiple tables; rarely supported natively | CREATE ASSERTION ... |
Domain Constraints
A domain constraint specifies the set of valid values for an attribute. Enforced by the data type (INT, VARCHAR, DATE) plus CHECK constraints.
- Example:
salary DECIMAL(10,2) CHECK (salary >= 0) — domain is non-negative decimal numbers
CREATE TABLE Student (
sid INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(200) UNIQUE,
dept VARCHAR(50) DEFAULT 'Undeclared',
gpa DECIMAL(3,2) CHECK (gpa >= 0 AND gpa <= 4.0),
year INT CHECK (year IN (1,2,3,4,5)),
mentor INT REFERENCES Student(sid) ON DELETE SET NULL -- self-ref FK
);
-- Table-level composite constraints
CREATE TABLE Enrollment (
sid INT REFERENCES Student(sid) ON DELETE CASCADE,
cid INT REFERENCES Course(cid) ON DELETE CASCADE,
grade CHAR(2),
CONSTRAINT pk_enroll PRIMARY KEY (sid, cid),
CONSTRAINT valid_grade CHECK (grade IN ('A','B','C','D','F','IP','W','I'))
);
-- ASSERTION (standard SQL; most DBMS implement this via triggers instead)
CREATE ASSERTION salary_constraint
CHECK (NOT EXISTS (
SELECT 1 FROM Employee WHERE salary > 2 * (SELECT AVG(salary) FROM Employee)
));
4.2 Functional Dependencies (FD)
Formal Definition
A functional dependency X → Y (read: "X functionally determines Y") holds on relation schema R if:
For every valid instance of R, whenever two tuples t₁ and t₂ agree on all attributes in X (t₁[X] = t₂[X]), they must also agree on all attributes in Y (t₁[Y] = t₂[Y]).
Intuition: knowing the value of X uniquely determines the value of Y.
Examples:
sid → name (student ID determines name — each student has one name)
cid → title (course ID determines course title)
dept → dean (each department has one dean)
sid, cid → grade (grade depends on both student AND course)
name ↛ sid (name does NOT determine sid — two students can have same name)
Trivial vs Non-trivial FDs
- Trivial FD: Y ⊆ X. Example:
{sid, name} → name (always true, no information content)
- Non-trivial FD: Y ⊄ X. Example:
sid → name (actually carries information)
Armstrong's Axioms (Sound and Complete)
These 3 axioms can derive ALL valid FDs from a given set F.
| Axiom | Formal Rule | Example |
|---|
| Reflexivity | If Y ⊆ X, then X → Y | {sid,name} → {name} |
| Augmentation | If X → Y, then XZ → YZ | sid → name ⟹ {sid,dept} → {name,dept} |
| Transitivity | If X → Y and Y → Z, then X → Z | sid → dept, dept → dean ⟹ sid → dean |
Mnemonic: RAT (Reflexivity, Augmentation, Transitivity)
Derived Rules (proven from Armstrong's Axioms)
| Rule | Statement |
|---|
| Union | If X→Y and X→Z, then X→YZ |
| Decomposition | If X→YZ, then X→Y and X→Z |
| Pseudotransitivity | If X→Y and WY→Z, then WX→Z |
Exam tip: Armstrong's axioms are sound (only derive valid FDs) and complete (can derive ALL valid FDs). Memorize the three axioms.
Attribute Closure Algorithm
The closure of X under FD set F, written X⁺, is the largest set of attributes that X functionally determines.
Uses:
- If X⁺ = all attributes of R → X is a superkey
- If X⁺ = all attributes AND no proper subset of X achieves this → X is a candidate key
Algorithm: AttributeClosure(X, F)
result ← X
REPEAT
FOR EACH FD (A → B) IN F:
IF A ⊆ result:
result ← result ∪ B
UNTIL result does not change
RETURN result
────────────────────────────────────────────────────
Worked Example 1:
Schema: R(A, B, C, D, E)
FDs: F = { A→BC, B→D, CD→E }
Find: {A}⁺
Start: {A}
Apply A→BC: A ⊆ {A} → {A, B, C}
Apply B→D: B ⊆ {A,B,C} → {A, B, C, D}
Apply CD→E: C,D ⊆ {A,B,C,D} → {A, B, C, D, E} ← all attributes!
Fixed point: {A, B, C, D, E}
{A}⁺ = all attributes → A is a SUPERKEY and CANDIDATE KEY ✓
────────────────────────────────────────────────────
Worked Example 2:
Schema: R(A, B, C, D)
FDs: F = { AB→C, C→DE, B→E }
Find: (AB)⁺
Start: {A, B}
Apply AB→C: A,B ⊆ {A,B} → {A, B, C}
Apply C→DE: C ⊆ {A,B,C} → {A, B, C, D, E}
Fixed point: {A, B, C, D, E}
Does {A,B}⁺ = all of R's attributes? No — G not in R, so check against R only.
If R = (A,B,C,D,E) then yes, (AB) is a superkey.
Is it a candidate key? Check if A alone or B alone is a superkey:
{A}⁺ = {A} (A→BC not applicable since B not in result... wait: A→C?
Re-check: only AB→C, not A→C alone → {A}⁺ = {A} only → not a superkey
{B}⁺ = {B,E} only → not a superkey
→ (AB) IS a candidate key ✓
Canonical Cover (Minimal Cover)
The canonical cover F_c of FD set F is an equivalent minimal FD set with:
- No extraneous attributes on any LHS or RHS
- No redundant FDs (none can be derived from the others)
- Each FD has exactly one attribute on the RHS
Steps to compute canonical cover:
- Apply decomposition: split FDs with multiple RHS attributes (A→BC becomes A→B, A→C)
- Remove extraneous LHS attributes: for each FD XA→Y, check if (X)⁺ already includes Y
- Remove redundant FDs: for each FD X→Y, check if Y ∈ (X)⁺ using F_c − {X→Y}
4.3 Normalization
Database normalization eliminates redundancy and prevents anomalies.
The Three Types of Anomalies
| Anomaly | Description | Example in unnormalized table |
|---|
| Update anomaly | Changing a fact requires updating multiple rows | Customer moves city → must update ALL their orders |
| Insert anomaly | Cannot insert certain data without other unrelated data | Cannot record a new product without placing an order |
| Delete anomaly | Deleting one fact accidentally removes other facts | Cancelling last order removes all customer info |
First Normal Form (1NF)
Condition: Every attribute contains only atomic (indivisible) values — no sets, arrays, or repeating groups.
Violations: storing multiple phone numbers in one column; using repeating columns (phone1, phone2, phone3).
Fix: Move multivalued attribute to separate rows or a separate relation.
Second Normal Form (2NF)
Condition: In 1NF AND every non-prime attribute is fully functionally dependent on every candidate key (no non-prime attribute depends on only part of a composite candidate key).
Only relevant when the PK is composite. Single-attribute PK → automatically in 2NF.
Non-prime attribute: an attribute not part of any candidate key.
Example violation:
OrderItem(order_id, product_id, product_name, quantity, unit_price)
- PK = (order_id, product_id)
product_name depends only on product_id (partial dependency ❌)
unit_price depends only on product_id (partial dependency ❌)
quantity depends on (order_id, product_id) together ✓
Fix:
OrderItem(order_id, product_id, quantity)
Product(product_id, product_name, unit_price)
Third Normal Form (3NF)
Condition: In 2NF AND no non-prime attribute transitively depends on any candidate key.
A transitive dependency: A → B and B → C, where B is not a candidate key and C is non-prime.
Example violation:
Employee(emp_id, dept_name, dept_location)
emp_id → dept_name (direct)
dept_name → dept_location (dept determines its location — transitive!)
dept_name is not a candidate key → violates 3NF
Fix:
Employee(emp_id, dept_name)
Department(dept_name, dept_location)
3NF Formal Definition: A relation schema R is in 3NF if, for every non-trivial FD X → A in R, either:
- X is a superkey of R, OR
- A is a prime attribute (member of some candidate key)
Boyce-Codd Normal Form (BCNF)
Condition: For every non-trivial FD X → Y, X is a superkey of R.
BCNF is stricter than 3NF — the "A is prime" exception is removed.
Example — 3NF but NOT BCNF:
CourseAdvisor(student_id, course_id, advisor_id)
- FDs: (student_id, course_id) → advisor_id; advisor_id → course_id
- Candidate keys:
{student_id, course_id} and {student_id, advisor_id}
- FD
advisor_id → course_id: advisor_id is NOT a superkey → violates BCNF ❌
- But
course_id is prime (part of candidate key) → satisfies 3NF ✓
BCNF Fix:
Advisor_Course(advisor_id, course_id)
Student_Advisor(student_id, advisor_id)
- Loss: FD
(student_id, course_id) → advisor_id can no longer be enforced in a single table
3NF vs BCNF Comparison
| 3NF | BCNF |
|---|
| Strictness | Less strict | More strict (every BCNF schema is in 3NF) |
| Lossless decomposition | ✓ Always achievable | ✓ Always achievable |
| Dependency preservation | ✓ Always achievable | ✗ Not always achievable |
| Anomaly elimination | Partial (may have some residual redundancy) | Complete |
| When to use | When preserving ALL FDs is critical | When anomaly-free storage is the priority |
Exam answer — "When would you use 3NF vs BCNF?"
Use BCNF when complete anomaly elimination is the priority. Use 3NF when you must preserve all functional dependencies (e.g., when some FDs are business rules that must be enforced within a single table, since BCNF may lose those FDs). Try BCNF first; fall back to 3NF if critical dependencies are lost.
BCNF Decomposition Algorithm:
result ← {R}
WHILE some relation Ri in result is NOT in BCNF:
FIND non-trivial FD X → Y in Ri where X is NOT a superkey of Ri
DECOMPOSE Ri into:
Ri1 = X ∪ Y ← X is the PK here; contains the violating FD
Ri2 = Ri − Y ← retains X as the join attribute (lossless join key)
REPLACE Ri with Ri1 and Ri2
RETURN result
Note: Ri2 = (Ri − Y) + X (X is kept in both parts for lossless join)
────────────────────────────────────────────────────────
BCNF Example:
R(A, B, C, D), FDs: A→B, B→C
Candidate key: {A, D}
Check A→B: A is not a superkey (A⁺ = {A,B,C} ≠ {A,B,C,D}) → violates BCNF
Decompose:
R1 = A ∪ B = (A, B) ← R1(A, B); A is PK; A→B holds
R2 = (A, C, D) ← keep A; remove B; A,D is still candidate key
Check R1(A,B): A→B; A is superkey of R1 → BCNF ✓
Check R2(A,C,D): Is B→C applicable? B is not in R2 → no BCNF violation ✓
Result: R1(A,B) and R2(A,C,D)
Lost FD: B→C (can no longer be checked in one table)
────────────────────────────────────────────────────────
3NF Synthesis Algorithm (always preserves dependencies):
1. Compute canonical cover Fc of F
2. FOR EACH FD (X → Y) in Fc: CREATE relation Ri = (X ∪ Y), PK = X
3. IF no relation in result contains a candidate key of R: ADD one
4. REMOVE any relation that is a proper subset of another
RETURN result
Lossless Decomposition
Decomposing R into R1 and R2 is lossless if:
R = R1 ⋈ R2 (natural joining R1 and R2 reproduces R exactly — no spurious tuples)
Test for lossless decomposition:
The decomposition is lossless if and only if at least one of the following holds:
- (R1 ∩ R2) → R1 (common attributes form a superkey for R1), OR
- (R1 ∩ R2) → R2 (common attributes form a superkey for R2)
Example:
- R(A, B, C), FD: A→B; Decompose into R1(A, B), R2(A, C)
- R1 ∩ R2 = {A}; A⁺ = {A,B} using A→B → A→(A,B) = R1 ✓ → Lossless!
Dependency Preservation
A decomposition preserves dependencies if every FD in F can be verified within some single Ri (without needing to join relations). If a dependency spans multiple relations, checking it requires a join — expensive and error-prone.
Exam Quick-Reference — Chapter 4
- Armstrong Axioms (RAT): Reflexivity (Y⊆X → X→Y), Augmentation (X→Y → XZ→YZ), Transitivity (X→Y, Y→Z → X→Z)
- Closure algorithm: start with X; add RHS of any FD whose LHS ⊆ current result; stop at fixed point
- 1NF: atomic values, no repeating groups
- 2NF: 1NF + no partial dependencies on composite PK (non-key attr depends on WHOLE PK)
- 3NF: 2NF + no transitive dependencies; for every non-trivial FD X→A: X is superkey OR A is prime
- BCNF: for every non-trivial FD X→Y, X must be a superkey (stricter — no "A is prime" exception)
- BCNF: eliminates all anomalies but may lose FD preservation; 3NF: always preserves FDs, partial anomalies possible
- Lossless test: R1∩R2 must be a superkey for R1 or R2