Chapter 4

Database Constraints and Normalization

6 hours9 marks24 past questionsBack to chapters Open filtered PYQs Previous: Ch 3 Next: Ch 5

Full Syllabus Outline

Every topic from the syllabus data is preserved here.

4.1
Integrity constraints and domain constraints
DE
NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK constraints with SQL syntax. Relational integrity and key constraints. Domain constraints.
4.2
Assertions
DE
CREATE ASSERTION syntax. Using assertions to enforce complex multi-table integrity constraints. Examples.
4.3
Functional dependencies
DEPS
Formal definition of functional dependency. Armstrong's axioms: reflexivity, augmentation, transitivity. Derived rules: union, decomposition, pseudotransitivity. Attribute closure algorithm ({A}+). Canonical cover. Examples and problems.
4.4
Different normal forms (1NF, 2NF, 3NF, BCNF)
DEPS
Definition and significance of database normalization. Update, insertion, and deletion anomalies. 1NF: atomicity of attributes. 2NF: no partial dependencies. 3NF: no transitive dependencies; 3NF decomposition algorithm. BCNF: every determinant is a candidate key; BCNF decomposition algorithm. Lossless-join decomposition. Dependency preservation. Properties and comparison of 3NF and BCNF.

Chapter Notes

Block-based notes with markdown, diagrams, code, and math.

Chapter 4 — Database Constraints and Normalization

4.1 Integrity Constraints

Integrity constraints are rules defined in the database schema to ensure accuracy, consistency, and validity of data. Enforced by the DBMS, not application code.

Types of Integrity Constraints

Constraint	Scope	Purpose	SQL Example
NOT NULL	Column	Prevents missing values	`name VARCHAR(100) NOT NULL`
UNIQUE	Column / group	No two rows have same value(s)	`UNIQUE(email)`
PRIMARY KEY	Column / group	NOT NULL + UNIQUE; uniquely identifies each row	`sid INT PRIMARY KEY`
FOREIGN KEY	Column	Referential integrity; references another table's PK	`REFERENCES Student(sid)`
CHECK	Column / row	Custom business rule using any boolean expression	`CHECK (gpa BETWEEN 0 AND 4)`
DEFAULT	Column	Specifies value when none provided during INSERT	`dept VARCHAR(50) DEFAULT 'CS'`
ASSERTION	Multi-table	Constraint over multiple tables; rarely supported natively	`CREATE ASSERTION ...`

Domain Constraints

A domain constraint specifies the set of valid values for an attribute. Enforced by the data type (INT, VARCHAR, DATE) plus CHECK constraints.

Example: salary DECIMAL(10,2) CHECK (salary >= 0) — domain is non-negative decimal numbers

CREATE TABLE Student (
  sid    INT          PRIMARY KEY,
  name   VARCHAR(100) NOT NULL,
  email  VARCHAR(200) UNIQUE,
  dept   VARCHAR(50)  DEFAULT 'Undeclared',
  gpa    DECIMAL(3,2) CHECK (gpa >= 0 AND gpa <= 4.0),
  year   INT          CHECK (year IN (1,2,3,4,5)),
  mentor INT          REFERENCES Student(sid) ON DELETE SET NULL  -- self-ref FK
);

-- Table-level composite constraints
CREATE TABLE Enrollment (
  sid   INT REFERENCES Student(sid) ON DELETE CASCADE,
  cid   INT REFERENCES Course(cid)  ON DELETE CASCADE,
  grade CHAR(2),
  CONSTRAINT pk_enroll   PRIMARY KEY (sid, cid),
  CONSTRAINT valid_grade CHECK (grade IN ('A','B','C','D','F','IP','W','I'))
);

-- ASSERTION (standard SQL; most DBMS implement this via triggers instead)
CREATE ASSERTION salary_constraint
CHECK (NOT EXISTS (
  SELECT 1 FROM Employee WHERE salary > 2 * (SELECT AVG(salary) FROM Employee)
));

4.2 Functional Dependencies (FD)

Formal Definition

A functional dependency X → Y (read: "X functionally determines Y") holds on relation schema R if:

For every valid instance of R, whenever two tuples t₁ and t₂ agree on all attributes in X (t₁[X] = t₂[X]), they must also agree on all attributes in Y (t₁[Y] = t₂[Y]).

Intuition: knowing the value of X uniquely determines the value of Y.

Examples:

sid → name (student ID determines name — each student has one name)
cid → title (course ID determines course title)
dept → dean (each department has one dean)
sid, cid → grade (grade depends on both student AND course)
name ↛ sid (name does NOT determine sid — two students can have same name)

Trivial vs Non-trivial FDs

Trivial FD: Y ⊆ X. Example: {sid, name} → name (always true, no information content)
Non-trivial FD: Y ⊄ X. Example: sid → name (actually carries information)

Armstrong's Axioms (Sound and Complete)

These 3 axioms can derive ALL valid FDs from a given set F.

Axiom	Formal Rule	Example
Reflexivity	If Y ⊆ X, then X → Y	`{sid,name} → {name}`
Augmentation	If X → Y, then XZ → YZ	`sid → name ⟹ {sid,dept} → {name,dept}`
Transitivity	If X → Y and Y → Z, then X → Z	`sid → dept, dept → dean ⟹ sid → dean`

Mnemonic: RAT (Reflexivity, Augmentation, Transitivity)

Derived Rules (proven from Armstrong's Axioms)

Rule	Statement
Union	If X→Y and X→Z, then X→YZ
Decomposition	If X→YZ, then X→Y and X→Z
Pseudotransitivity	If X→Y and WY→Z, then WX→Z

Exam tip: Armstrong's axioms are sound (only derive valid FDs) and complete (can derive ALL valid FDs). Memorize the three axioms.

Attribute Closure Algorithm

The closure of X under FD set F, written X⁺, is the largest set of attributes that X functionally determines.

Uses:

If X⁺ = all attributes of R → X is a superkey
If X⁺ = all attributes AND no proper subset of X achieves this → X is a candidate key

Algorithm: AttributeClosure(X, F)
  result ← X
  REPEAT
    FOR EACH FD (A → B) IN F:
      IF A ⊆ result:
        result ← result ∪ B
  UNTIL result does not change
  RETURN result

────────────────────────────────────────────────────
Worked Example 1:
  Schema: R(A, B, C, D, E)
  FDs: F = { A→BC, B→D, CD→E }
  Find: {A}⁺

  Start: {A}
  Apply A→BC:  A ⊆ {A}         → {A, B, C}
  Apply B→D:   B ⊆ {A,B,C}     → {A, B, C, D}
  Apply CD→E:  C,D ⊆ {A,B,C,D} → {A, B, C, D, E}  ← all attributes!
  Fixed point: {A, B, C, D, E}

  {A}⁺ = all attributes → A is a SUPERKEY and CANDIDATE KEY ✓

────────────────────────────────────────────────────
Worked Example 2:
  Schema: R(A, B, C, D)
  FDs: F = { AB→C, C→DE, B→E }
  Find: (AB)⁺

  Start: {A, B}
  Apply AB→C:  A,B ⊆ {A,B}     → {A, B, C}
  Apply C→DE:  C ⊆ {A,B,C}     → {A, B, C, D, E}
  Fixed point: {A, B, C, D, E}
  
  Does {A,B}⁺ = all of R's attributes? No — G not in R, so check against R only.
  If R = (A,B,C,D,E) then yes, (AB) is a superkey.
  Is it a candidate key? Check if A alone or B alone is a superkey:
    {A}⁺ = {A} (A→BC not applicable since B not in result... wait: A→C?
    Re-check: only AB→C, not A→C alone → {A}⁺ = {A} only → not a superkey
    {B}⁺ = {B,E} only → not a superkey
  → (AB) IS a candidate key ✓

Canonical Cover (Minimal Cover)

The canonical cover F_c of FD set F is an equivalent minimal FD set with:

No extraneous attributes on any LHS or RHS
No redundant FDs (none can be derived from the others)
Each FD has exactly one attribute on the RHS

Steps to compute canonical cover:

Apply decomposition: split FDs with multiple RHS attributes (A→BC becomes A→B, A→C)
Remove extraneous LHS attributes: for each FD XA→Y, check if (X)⁺ already includes Y
Remove redundant FDs: for each FD X→Y, check if Y ∈ (X)⁺ using F_c − {X→Y}

4.3 Normalization

Database normalization eliminates redundancy and prevents anomalies.

The Three Types of Anomalies

Anomaly	Description	Example in unnormalized table
Update anomaly	Changing a fact requires updating multiple rows	Customer moves city → must update ALL their orders
Insert anomaly	Cannot insert certain data without other unrelated data	Cannot record a new product without placing an order
Delete anomaly	Deleting one fact accidentally removes other facts	Cancelling last order removes all customer info

First Normal Form (1NF)

Condition: Every attribute contains only atomic (indivisible) values — no sets, arrays, or repeating groups.

Violations: storing multiple phone numbers in one column; using repeating columns (phone1, phone2, phone3).
Fix: Move multivalued attribute to separate rows or a separate relation.

Second Normal Form (2NF)

Condition: In 1NF AND every non-prime attribute is fully functionally dependent on every candidate key (no non-prime attribute depends on only part of a composite candidate key).

Only relevant when the PK is composite. Single-attribute PK → automatically in 2NF.

Non-prime attribute: an attribute not part of any candidate key.

Example violation:

OrderItem(order_id, product_id, product_name, quantity, unit_price)

PK = (order_id, product_id)
product_name depends only on product_id (partial dependency ❌)
unit_price depends only on product_id (partial dependency ❌)
quantity depends on (order_id, product_id) together ✓

Fix:

OrderItem(order_id, product_id, quantity)
Product(product_id, product_name, unit_price)

Third Normal Form (3NF)

Condition: In 2NF AND no non-prime attribute transitively depends on any candidate key.

A transitive dependency: A → B and B → C, where B is not a candidate key and C is non-prime.

Example violation:

Employee(emp_id, dept_name, dept_location)

emp_id → dept_name (direct)
dept_name → dept_location (dept determines its location — transitive!)
dept_name is not a candidate key → violates 3NF

Fix:

Employee(emp_id, dept_name)
Department(dept_name, dept_location)

3NF Formal Definition: A relation schema R is in 3NF if, for every non-trivial FD X → A in R, either:

X is a superkey of R, OR
A is a prime attribute (member of some candidate key)

Boyce-Codd Normal Form (BCNF)

Condition: For every non-trivial FD X → Y, X is a superkey of R.

BCNF is stricter than 3NF — the "A is prime" exception is removed.

Example — 3NF but NOT BCNF:

CourseAdvisor(student_id, course_id, advisor_id)

FDs: (student_id, course_id) → advisor_id; advisor_id → course_id
Candidate keys: {student_id, course_id} and {student_id, advisor_id}
FD advisor_id → course_id: advisor_id is NOT a superkey → violates BCNF ❌
But course_id is prime (part of candidate key) → satisfies 3NF ✓

BCNF Fix:

Advisor_Course(advisor_id, course_id)
Student_Advisor(student_id, advisor_id)
Loss: FD (student_id, course_id) → advisor_id can no longer be enforced in a single table

3NF vs BCNF Comparison

	3NF	BCNF
Strictness	Less strict	More strict (every BCNF schema is in 3NF)
Lossless decomposition	✓ Always achievable	✓ Always achievable
Dependency preservation	✓ Always achievable	✗ Not always achievable
Anomaly elimination	Partial (may have some residual redundancy)	Complete
When to use	When preserving ALL FDs is critical	When anomaly-free storage is the priority

Exam answer — "When would you use 3NF vs BCNF?"
Use BCNF when complete anomaly elimination is the priority. Use 3NF when you must preserve all functional dependencies (e.g., when some FDs are business rules that must be enforced within a single table, since BCNF may lose those FDs). Try BCNF first; fall back to 3NF if critical dependencies are lost.

BCNF Decomposition Algorithm:
  result ← {R}
  WHILE some relation Ri in result is NOT in BCNF:
    FIND non-trivial FD X → Y in Ri where X is NOT a superkey of Ri
    DECOMPOSE Ri into:
      Ri1 = X ∪ Y            ← X is the PK here; contains the violating FD
      Ri2 = Ri − Y           ← retains X as the join attribute (lossless join key)
    REPLACE Ri with Ri1 and Ri2
  RETURN result

Note: Ri2 = (Ri − Y) + X (X is kept in both parts for lossless join)

────────────────────────────────────────────────────────
BCNF Example:
  R(A, B, C, D), FDs: A→B, B→C
  Candidate key: {A, D}

  Check A→B: A is not a superkey (A⁺ = {A,B,C} ≠ {A,B,C,D}) → violates BCNF
  Decompose:
    R1 = A ∪ B = (A, B)     ← R1(A, B); A is PK; A→B holds
    R2 = (A, C, D)          ← keep A; remove B; A,D is still candidate key
  
  Check R1(A,B): A→B; A is superkey of R1 → BCNF ✓
  Check R2(A,C,D): Is B→C applicable? B is not in R2 → no BCNF violation ✓

  Result: R1(A,B) and R2(A,C,D)
  Lost FD: B→C (can no longer be checked in one table)

────────────────────────────────────────────────────────
3NF Synthesis Algorithm (always preserves dependencies):
  1. Compute canonical cover Fc of F
  2. FOR EACH FD (X → Y) in Fc: CREATE relation Ri = (X ∪ Y), PK = X
  3. IF no relation in result contains a candidate key of R: ADD one
  4. REMOVE any relation that is a proper subset of another
  RETURN result

Lossless Decomposition

Decomposing R into R1 and R2 is lossless if:

R = R1 ⋈ R2 (natural joining R1 and R2 reproduces R exactly — no spurious tuples)

Test for lossless decomposition: The decomposition is lossless if and only if at least one of the following holds:

(R1 ∩ R2) → R1 (common attributes form a superkey for R1), OR
(R1 ∩ R2) → R2 (common attributes form a superkey for R2)

Example:

R(A, B, C), FD: A→B; Decompose into R1(A, B), R2(A, C)
R1 ∩ R2 = {A}; A⁺ = {A,B} using A→B → A→(A,B) = R1 ✓ → Lossless!

Dependency Preservation

A decomposition preserves dependencies if every FD in F can be verified within some single Ri (without needing to join relations). If a dependency spans multiple relations, checking it requires a join — expensive and error-prone.

Exam Quick-Reference — Chapter 4

Armstrong Axioms (RAT): Reflexivity (Y⊆X → X→Y), Augmentation (X→Y → XZ→YZ), Transitivity (X→Y, Y→Z → X→Z)
Closure algorithm: start with X; add RHS of any FD whose LHS ⊆ current result; stop at fixed point
1NF: atomic values, no repeating groups
2NF: 1NF + no partial dependencies on composite PK (non-key attr depends on WHOLE PK)
3NF: 2NF + no transitive dependencies; for every non-trivial FD X→A: X is superkey OR A is prime
BCNF: for every non-trivial FD X→Y, X must be a superkey (stricter — no "A is prime" exception)
BCNF: eliminates all anomalies but may lose FD preservation; 3NF: always preserves FDs, partial anomalies possible
Lossless test: R1∩R2 must be a superkey for R1 or R2

Solved PYQs

A few past questions with short answers.

CT30104005Chapter 410 marksasked 2x2081 Chaitra R

What is database normalization and why is it needed? Explain 1NF, 2NF, 3NF and BCNF along with examples.

Solution

Normalization: Organizing data to reduce redundancy/anomalies. 1NF: Atomic values. No repeating groups. Fix: Split multi-valued cols. 2NF: 1NF + no partial dependency on composite PK. Fix: Move partially dependent attrs to new table. 3NF: 2NF + no transitive dependency. Fix: Move transitively dependent non-keys out. BCNF: 3NF + every determinant is a superkey. Fix: Decompose on violating FD.

Needed to prevent update/insert/delete anomalies and ensure consistency.

CT30104008Chapter 45 marksasked 2x2080 Chaitra R

What is partial dependency? Define 2NF and 3NF with suitable examples.

Solution

Partial Dependency: Non-key attribute depends on part of a composite PK. 2NF: 1NF + no partial dependencies. Ex: OrderItem(order_id, prod_id, prod_name). prod_name depends only on prod_id. Fix: Split to OrderItem(order_id, prod_id, qty) and Product(prod_id, prod_name). 3NF: 2NF + no transitive dependencies (PK→A→B). Ex: Emp(id, dept, dept_head). dept→dept_head is transitive. Fix: Emp(id, dept), Dept(dept, head).

CT30104019Chapter 48 marksasked 1x2079 Ashwin B

Briefly explain 1NF, 2NF, 3NF and BCNF with suitable illustrations.

Solution

1NF: Atomic. Ex: Student(sid, name, courses='Math,CS') → Split to Student_Course(sid, course). 2NF: No partial. Ex: Order(ord, item, item_name) → Order(ord,item), Item(item,name). 3NF: No transitive. Ex: Emp(id,dept,head) → Emp(id,dept), Dept(dept,head). BCNF: Determinant=Superkey. Ex: Schedule(student,course,instructor) with instructor→course. Decompose to Teaches(instructor,course), Enroll(student,instructor).

More PYQs

Additional practice from this chapter.

Define integrity constraints in DBMS. Give examples of domain constraints with its SQL syntax.
CT301040014 marksasked 1x2082 Kartik B
How do you compute an attribute closure? Explain with a suitable example. Define 3NF along with a suitable example schema. Compare 3NF with BCNF.
CT301040025 marksasked 1x2082 Kartik B
Formally define a functional dependency. Explain the Armstrong's axioms for functional dependencies.
CT301040033 marksasked 1xModel Q
Explain the conditions for BCNF and 3NF. Compare them and explain which normal form you would use in which condition along with proper justification.
CT301040046 marksasked 1xModel Q
What is database normalization and why is it needed? Explain 1NF, 2NF, 3NF and BCNF along with examples.
CT3010400510 marksasked 2x2081 Chaitra R

Database Constraints and Normalization

Integrity constraints and domain constraints

Assertions

Functional dependencies

Different normal forms (1NF, 2NF, 3NF, BCNF)

Chapter 4 — Database Constraints and Normalization

4.1 Integrity Constraints

Types of Integrity Constraints

Domain Constraints

4.2 Functional Dependencies (FD)

Formal Definition

Trivial vs Non-trivial FDs

Armstrong's Axioms (Sound and Complete)

Derived Rules (proven from Armstrong's Axioms)

Attribute Closure Algorithm

Canonical Cover (Minimal Cover)

4.3 Normalization

The Three Types of Anomalies

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

3NF vs BCNF Comparison

Lossless Decomposition

Dependency Preservation

Exam Quick-Reference — Chapter 4