DBMS NotesPYQs

Chapter 4

Database Constraints and Normalization

Full Syllabus Outline

Every topic from the syllabus data is preserved here.

Chapter Notes

Block-based notes with markdown, diagrams, code, and math.

Chapter 4 — Database Constraints and Normalization

4.1 Integrity Constraints

Integrity constraints are rules defined in the database schema to ensure accuracy, consistency, and validity of data. Enforced by the DBMS, not application code.

Types of Integrity Constraints

ConstraintScopePurposeSQL Example
NOT NULLColumnPrevents missing valuesname VARCHAR(100) NOT NULL
UNIQUEColumn / groupNo two rows have same value(s)UNIQUE(email)
PRIMARY KEYColumn / groupNOT NULL + UNIQUE; uniquely identifies each rowsid INT PRIMARY KEY
FOREIGN KEYColumnReferential integrity; references another table's PKREFERENCES Student(sid)
CHECKColumn / rowCustom business rule using any boolean expressionCHECK (gpa BETWEEN 0 AND 4)
DEFAULTColumnSpecifies value when none provided during INSERTdept VARCHAR(50) DEFAULT 'CS'
ASSERTIONMulti-tableConstraint over multiple tables; rarely supported nativelyCREATE ASSERTION ...

Domain Constraints

A domain constraint specifies the set of valid values for an attribute. Enforced by the data type (INT, VARCHAR, DATE) plus CHECK constraints.

  • Example: salary DECIMAL(10,2) CHECK (salary >= 0) — domain is non-negative decimal numbers
CREATE TABLE Student (
  sid    INT          PRIMARY KEY,
  name   VARCHAR(100) NOT NULL,
  email  VARCHAR(200) UNIQUE,
  dept   VARCHAR(50)  DEFAULT 'Undeclared',
  gpa    DECIMAL(3,2) CHECK (gpa >= 0 AND gpa <= 4.0),
  year   INT          CHECK (year IN (1,2,3,4,5)),
  mentor INT          REFERENCES Student(sid) ON DELETE SET NULL  -- self-ref FK
);

-- Table-level composite constraints
CREATE TABLE Enrollment (
  sid   INT REFERENCES Student(sid) ON DELETE CASCADE,
  cid   INT REFERENCES Course(cid)  ON DELETE CASCADE,
  grade CHAR(2),
  CONSTRAINT pk_enroll   PRIMARY KEY (sid, cid),
  CONSTRAINT valid_grade CHECK (grade IN ('A','B','C','D','F','IP','W','I'))
);

-- ASSERTION (standard SQL; most DBMS implement this via triggers instead)
CREATE ASSERTION salary_constraint
CHECK (NOT EXISTS (
  SELECT 1 FROM Employee WHERE salary > 2 * (SELECT AVG(salary) FROM Employee)
));

4.2 Functional Dependencies (FD)

Formal Definition

A functional dependency X → Y (read: "X functionally determines Y") holds on relation schema R if:

For every valid instance of R, whenever two tuples t₁ and t₂ agree on all attributes in X (t₁[X] = t₂[X]), they must also agree on all attributes in Y (t₁[Y] = t₂[Y]).

Intuition: knowing the value of X uniquely determines the value of Y.

Examples:

  • sid → name (student ID determines name — each student has one name)
  • cid → title (course ID determines course title)
  • dept → dean (each department has one dean)
  • sid, cid → grade (grade depends on both student AND course)
  • name ↛ sid (name does NOT determine sid — two students can have same name)

Trivial vs Non-trivial FDs

  • Trivial FD: Y ⊆ X. Example: {sid, name} → name (always true, no information content)
  • Non-trivial FD: Y ⊄ X. Example: sid → name (actually carries information)

Armstrong's Axioms (Sound and Complete)

These 3 axioms can derive ALL valid FDs from a given set F.

AxiomFormal RuleExample
ReflexivityIf Y ⊆ X, then X → Y{sid,name} → {name}
AugmentationIf X → Y, then XZ → YZsid → name ⟹ {sid,dept} → {name,dept}
TransitivityIf X → Y and Y → Z, then X → Zsid → dept, dept → dean ⟹ sid → dean

Mnemonic: RAT (Reflexivity, Augmentation, Transitivity)

Derived Rules (proven from Armstrong's Axioms)

RuleStatement
UnionIf X→Y and X→Z, then X→YZ
DecompositionIf X→YZ, then X→Y and X→Z
PseudotransitivityIf X→Y and WY→Z, then WX→Z

Exam tip: Armstrong's axioms are sound (only derive valid FDs) and complete (can derive ALL valid FDs). Memorize the three axioms.

Attribute Closure Algorithm

The closure of X under FD set F, written X⁺, is the largest set of attributes that X functionally determines.

Uses:

  • If X⁺ = all attributes of R → X is a superkey
  • If X⁺ = all attributes AND no proper subset of X achieves this → X is a candidate key
Algorithm: AttributeClosure(X, F)
  result ← X
  REPEAT
    FOR EACH FD (A → B) IN F:
      IF A ⊆ result:
        result ← result ∪ B
  UNTIL result does not change
  RETURN result

────────────────────────────────────────────────────
Worked Example 1:
  Schema: R(A, B, C, D, E)
  FDs: F = { A→BC, B→D, CD→E }
  Find: {A}⁺

  Start: {A}
  Apply A→BC:  A ⊆ {A}         → {A, B, C}
  Apply B→D:   B ⊆ {A,B,C}     → {A, B, C, D}
  Apply CD→E:  C,D ⊆ {A,B,C,D} → {A, B, C, D, E}  ← all attributes!
  Fixed point: {A, B, C, D, E}

  {A}⁺ = all attributes → A is a SUPERKEY and CANDIDATE KEY ✓

────────────────────────────────────────────────────
Worked Example 2:
  Schema: R(A, B, C, D)
  FDs: F = { AB→C, C→DE, B→E }
  Find: (AB)⁺

  Start: {A, B}
  Apply AB→C:  A,B ⊆ {A,B}     → {A, B, C}
  Apply C→DE:  C ⊆ {A,B,C}     → {A, B, C, D, E}
  Fixed point: {A, B, C, D, E}
  
  Does {A,B}⁺ = all of R's attributes? No — G not in R, so check against R only.
  If R = (A,B,C,D,E) then yes, (AB) is a superkey.
  Is it a candidate key? Check if A alone or B alone is a superkey:
    {A}⁺ = {A} (A→BC not applicable since B not in result... wait: A→C?
    Re-check: only AB→C, not A→C alone → {A}⁺ = {A} only → not a superkey
    {B}⁺ = {B,E} only → not a superkey
  → (AB) IS a candidate key ✓

Canonical Cover (Minimal Cover)

The canonical cover F_c of FD set F is an equivalent minimal FD set with:

  1. No extraneous attributes on any LHS or RHS
  2. No redundant FDs (none can be derived from the others)
  3. Each FD has exactly one attribute on the RHS

Steps to compute canonical cover:

  1. Apply decomposition: split FDs with multiple RHS attributes (A→BC becomes A→B, A→C)
  2. Remove extraneous LHS attributes: for each FD XA→Y, check if (X)⁺ already includes Y
  3. Remove redundant FDs: for each FD X→Y, check if Y ∈ (X)⁺ using F_c − {X→Y}

4.3 Normalization

Database normalization eliminates redundancy and prevents anomalies.

The Three Types of Anomalies

AnomalyDescriptionExample in unnormalized table
Update anomalyChanging a fact requires updating multiple rowsCustomer moves city → must update ALL their orders
Insert anomalyCannot insert certain data without other unrelated dataCannot record a new product without placing an order
Delete anomalyDeleting one fact accidentally removes other factsCancelling last order removes all customer info

First Normal Form (1NF)

Condition: Every attribute contains only atomic (indivisible) values — no sets, arrays, or repeating groups.

Violations: storing multiple phone numbers in one column; using repeating columns (phone1, phone2, phone3).
Fix: Move multivalued attribute to separate rows or a separate relation.


Second Normal Form (2NF)

Condition: In 1NF AND every non-prime attribute is fully functionally dependent on every candidate key (no non-prime attribute depends on only part of a composite candidate key).

Only relevant when the PK is composite. Single-attribute PK → automatically in 2NF.

Non-prime attribute: an attribute not part of any candidate key.

Example violation:

OrderItem(order_id, product_id, product_name, quantity, unit_price)

  • PK = (order_id, product_id)
  • product_name depends only on product_id (partial dependency ❌)
  • unit_price depends only on product_id (partial dependency ❌)
  • quantity depends on (order_id, product_id) together ✓

Fix:

  • OrderItem(order_id, product_id, quantity)
  • Product(product_id, product_name, unit_price)

Third Normal Form (3NF)

Condition: In 2NF AND no non-prime attribute transitively depends on any candidate key.

A transitive dependency: A → B and B → C, where B is not a candidate key and C is non-prime.

Example violation:

Employee(emp_id, dept_name, dept_location)

  • emp_id → dept_name (direct)
  • dept_name → dept_location (dept determines its location — transitive!)
  • dept_name is not a candidate key → violates 3NF

Fix:

  • Employee(emp_id, dept_name)
  • Department(dept_name, dept_location)

3NF Formal Definition: A relation schema R is in 3NF if, for every non-trivial FD X → A in R, either:

  1. X is a superkey of R, OR
  2. A is a prime attribute (member of some candidate key)

Boyce-Codd Normal Form (BCNF)

Condition: For every non-trivial FD X → Y, X is a superkey of R.

BCNF is stricter than 3NF — the "A is prime" exception is removed.

Example — 3NF but NOT BCNF:

CourseAdvisor(student_id, course_id, advisor_id)

  • FDs: (student_id, course_id) → advisor_id; advisor_id → course_id
  • Candidate keys: {student_id, course_id} and {student_id, advisor_id}
  • FD advisor_id → course_id: advisor_id is NOT a superkey → violates BCNF ❌
  • But course_id is prime (part of candidate key) → satisfies 3NF ✓

BCNF Fix:

  • Advisor_Course(advisor_id, course_id)
  • Student_Advisor(student_id, advisor_id)
  • Loss: FD (student_id, course_id) → advisor_id can no longer be enforced in a single table

3NF vs BCNF Comparison

3NFBCNF
StrictnessLess strictMore strict (every BCNF schema is in 3NF)
Lossless decomposition✓ Always achievable✓ Always achievable
Dependency preservation✓ Always achievable✗ Not always achievable
Anomaly eliminationPartial (may have some residual redundancy)Complete
When to useWhen preserving ALL FDs is criticalWhen anomaly-free storage is the priority

Exam answer — "When would you use 3NF vs BCNF?"
Use BCNF when complete anomaly elimination is the priority. Use 3NF when you must preserve all functional dependencies (e.g., when some FDs are business rules that must be enforced within a single table, since BCNF may lose those FDs). Try BCNF first; fall back to 3NF if critical dependencies are lost.

BCNF Decomposition Algorithm:
  result ← {R}
  WHILE some relation Ri in result is NOT in BCNF:
    FIND non-trivial FD X → Y in Ri where X is NOT a superkey of Ri
    DECOMPOSE Ri into:
      Ri1 = X ∪ Y            ← X is the PK here; contains the violating FD
      Ri2 = Ri − Y           ← retains X as the join attribute (lossless join key)
    REPLACE Ri with Ri1 and Ri2
  RETURN result

Note: Ri2 = (Ri − Y) + X (X is kept in both parts for lossless join)

────────────────────────────────────────────────────────
BCNF Example:
  R(A, B, C, D), FDs: A→B, B→C
  Candidate key: {A, D}

  Check A→B: A is not a superkey (A⁺ = {A,B,C} ≠ {A,B,C,D}) → violates BCNF
  Decompose:
    R1 = A ∪ B = (A, B)     ← R1(A, B); A is PK; A→B holds
    R2 = (A, C, D)          ← keep A; remove B; A,D is still candidate key
  
  Check R1(A,B): A→B; A is superkey of R1 → BCNF ✓
  Check R2(A,C,D): Is B→C applicable? B is not in R2 → no BCNF violation ✓

  Result: R1(A,B) and R2(A,C,D)
  Lost FD: B→C (can no longer be checked in one table)

────────────────────────────────────────────────────────
3NF Synthesis Algorithm (always preserves dependencies):
  1. Compute canonical cover Fc of F
  2. FOR EACH FD (X → Y) in Fc: CREATE relation Ri = (X ∪ Y), PK = X
  3. IF no relation in result contains a candidate key of R: ADD one
  4. REMOVE any relation that is a proper subset of another
  RETURN result

Lossless Decomposition

Decomposing R into R1 and R2 is lossless if:

R = R1 ⋈ R2 (natural joining R1 and R2 reproduces R exactly — no spurious tuples)

Test for lossless decomposition: The decomposition is lossless if and only if at least one of the following holds:

  • (R1 ∩ R2) → R1 (common attributes form a superkey for R1), OR
  • (R1 ∩ R2) → R2 (common attributes form a superkey for R2)

Example:

  • R(A, B, C), FD: A→B; Decompose into R1(A, B), R2(A, C)
  • R1 ∩ R2 = {A}; A⁺ = {A,B} using A→B → A→(A,B) = R1 ✓ → Lossless!

Dependency Preservation

A decomposition preserves dependencies if every FD in F can be verified within some single Ri (without needing to join relations). If a dependency spans multiple relations, checking it requires a join — expensive and error-prone.


Exam Quick-Reference — Chapter 4

  • Armstrong Axioms (RAT): Reflexivity (Y⊆X → X→Y), Augmentation (X→Y → XZ→YZ), Transitivity (X→Y, Y→Z → X→Z)
  • Closure algorithm: start with X; add RHS of any FD whose LHS ⊆ current result; stop at fixed point
  • 1NF: atomic values, no repeating groups
  • 2NF: 1NF + no partial dependencies on composite PK (non-key attr depends on WHOLE PK)
  • 3NF: 2NF + no transitive dependencies; for every non-trivial FD X→A: X is superkey OR A is prime
  • BCNF: for every non-trivial FD X→Y, X must be a superkey (stricter — no "A is prime" exception)
  • BCNF: eliminates all anomalies but may lose FD preservation; 3NF: always preserves FDs, partial anomalies possible
  • Lossless test: R1∩R2 must be a superkey for R1 or R2

Solved PYQs

A few past questions with short answers.

CT30104005Chapter 410 marksasked 2x2081 Chaitra R

What is database normalization and why is it needed? Explain 1NF, 2NF, 3NF and BCNF along with examples.

Solution

Normalization: Organizing data to reduce redundancy/anomalies. 1NF: Atomic values. No repeating groups. Fix: Split multi-valued cols. 2NF: 1NF + no partial dependency on composite PK. Fix: Move partially dependent attrs to new table. 3NF: 2NF + no transitive dependency. Fix: Move transitively dependent non-keys out. BCNF: 3NF + every determinant is a superkey. Fix: Decompose on violating FD.

Needed to prevent update/insert/delete anomalies and ensure consistency.

CT30104008Chapter 45 marksasked 2x2080 Chaitra R

What is partial dependency? Define 2NF and 3NF with suitable examples.

Solution

Partial Dependency: Non-key attribute depends on part of a composite PK. 2NF: 1NF + no partial dependencies. Ex: OrderItem(order_id, prod_id, prod_name). prod_name depends only on prod_id. Fix: Split to OrderItem(order_id, prod_id, qty) and Product(prod_id, prod_name). 3NF: 2NF + no transitive dependencies (PK→A→B). Ex: Emp(id, dept, dept_head). dept→dept_head is transitive. Fix: Emp(id, dept), Dept(dept, head).

CT30104019Chapter 48 marksasked 1x2079 Ashwin B

Briefly explain 1NF, 2NF, 3NF and BCNF with suitable illustrations.

Solution

1NF: Atomic. Ex: Student(sid, name, courses='Math,CS') → Split to Student_Course(sid, course). 2NF: No partial. Ex: Order(ord, item, item_name)Order(ord,item), Item(item,name). 3NF: No transitive. Ex: Emp(id,dept,head)Emp(id,dept), Dept(dept,head). BCNF: Determinant=Superkey. Ex: Schedule(student,course,instructor) with instructor→course. Decompose to Teaches(instructor,course), Enroll(student,instructor).

More PYQs

Additional practice from this chapter.