DBMS NotesPYQs

Chapter 9

Advanced Database Concepts

Chapter Notes

Block-based notes with markdown, diagrams, code, and math.

Chapter 9 — Advanced Database Concepts

9.1 Object-Oriented and Object-Relational Databases

Core OO Concepts in Databases

ConceptMeaning
Object identityEach object has a unique OID regardless of its attribute values
EncapsulationInternal state hidden; accessed only via defined methods
InheritanceSubtype inherits all attributes and methods of supertype
PolymorphismSame method name behaves differently for different object types

Object-Relational Model (SQL:1999+)

Extends the relational model with OO features while keeping SQL compatibility.

-- PostgreSQL object-relational extensions

-- User-defined TYPE (composite type)
CREATE TYPE address_t AS (
  street  VARCHAR(200),
  city    VARCHAR(100),
  zipcode VARCHAR(10)
);

-- Use it in a table
CREATE TABLE Customer (
  cid     INT PRIMARY KEY,
  name    VARCHAR(100),
  address address_t          -- composite attribute
);

-- Array type (multivalued attribute)
CREATE TABLE Student_OR (
  sid    INT PRIMARY KEY,
  name   VARCHAR(100),
  phones VARCHAR(20)[]       -- array of phone numbers
);
INSERT INTO Student_OR VALUES (1, 'Alice', ARRAY['9800000001','9800000002']);
SELECT sid, name, phones[1] AS primary_phone FROM Student_OR;

-- Inheritance (table inheritance in PostgreSQL)
CREATE TABLE Person (
  pid  INT PRIMARY KEY,
  name VARCHAR(100),
  age  INT
);
CREATE TABLE Employee_OO () INHERITS (Person);  -- inherits all Person columns
ALTER TABLE Employee_OO ADD COLUMN salary DECIMAL(10,2);
ALTER TABLE Employee_OO ADD COLUMN emp_id INT;

ORM (Object-Relational Mapping)

Maps application-level objects to relational tables automatically.

  • Developer works with objects; ORM generates SQL behind the scenes.
  • Examples: Hibernate (Java), SQLAlchemy (Python), Prisma (Node.js).
  • Reduces boilerplate but can generate inefficient queries (N+1 problem).

9.2 Distributed Databases

A distributed database stores data across multiple geographically separate sites connected by a network, appearing as a single database to the user.

Fragmentation Strategies

Horizontal FragmentationVertical Fragmentation
WhatRows distributed by a predicateColumns distributed by use pattern
ReconstructUNION of all fragmentsJOIN on PK
ExampleNepal orders in Site 1; India orders in Site 2Personal info in Site 1; financial info in Site 2

Replication

TypeDescriptionAdvantage
Full replicationAll sites have the full databaseMaximum availability; fast local reads
Partial replicationOnly some sites have copies of some fragmentsBalance between availability and storage
No replicationEach fragment on exactly one siteNo redundancy overhead

Homogeneous vs Heterogeneous

HomogeneousHeterogeneous
DBMSSame DBMS everywhereDifferent DBMS at different sites
SchemaSameMay differ
ComplexityLowerHigher (schema mapping needed)

9.3 Data Warehousing and OLAP

OLTP vs OLAP

OLTPOLAP
PurposeDay-to-day transactionsAnalysis and reporting
Query typeSimple, short, high frequencyComplex, long, low frequency
DataCurrent, normalizedHistorical, denormalized
UsersClerks, customersAnalysts, managers
Design3NFStar / snowflake schema

Warehouse Schemas

Star schema: central fact table surrounded by denormalized dimension tables.
Snowflake schema: fact table + normalized dimension tables (dimension tables further joined to sub-dimension tables).

OLAP Operations

OperationMeaningExample
Roll-up (aggregation)Move up hierarchy, summarizeMonthly → Quarterly
Drill-downMove down hierarchy, more detailAnnual → Monthly
SliceFix one dimension, 2D resultSales for Q1 only
DiceFix two+ dimensions, sub-cubeSales in Q1, Nepal
Pivot (rotate)Rotate the data cube axesRows ↔ Columns

9.4 NoSQL and Big Data

NoSQL Categories

CategoryStructureUse CaseExamples
Key-ValueSimple KV mapCaching, sessionsRedis, DynamoDB
DocumentJSON/BSON documentsContent mgmt, catalogsMongoDB, CouchDB
Column-familyWide rows, column familiesTime-series, logsCassandra, HBase
GraphNodes + edges + propertiesSocial networks, fraudNeo4j, Amazon Neptune

Big Data — 3 V's (+ 2 more)

VMeaning
VolumeTerabytes to petabytes of data
VelocityData generated and processed at high speed
VarietyStructured, semi-structured, unstructured
VeracityUncertainty and quality of data
ValueBusiness value extracted from data

Hadoop Ecosystem

  • HDFS — distributed file system; splits files into 128 MB blocks, replicates across nodes.
  • MapReduce — parallel processing: Map phase (filter/sort), Reduce phase (aggregate).
  • Hive — SQL-like queries on HDFS.
  • Spark — in-memory distributed processing, much faster than MapReduce.

Exam Quick-Reference

  • Horizontal fragmentation = rows (σ); Vertical fragmentation = columns (π).
  • Reconstruct horizontal: UNION; reconstruct vertical: JOIN on PK.
  • ETL = Extract (from OLTP) → Transform (clean, merge) → Load (into warehouse).
  • OLAP roll-up = aggregate up; drill-down = detail down; slice = fix one dim.
  • NoSQL: key-value (Redis), document (MongoDB), column-family (Cassandra), graph (Neo4j).

Full Syllabus Outline

Every topic from the syllabus data is preserved here.

Solved PYQs

A few past questions with short answers.

Chapter 94 marksasked 1xModel Q

Describe the two primary strategies for distributing data across multiple physical locations: Data Replication and Data Fragmentation, stating a key advantage of each.

Solution

Replication copies data to multiple sites; fragmentation splits data across sites.

  • Replication advantage: improved read availability — any site can serve the data.
  • Fragmentation advantage: locality — queries access the nearest site with the relevant data.
  • Horizontal fragmentation: select rows by predicate (e.g., region = 'Nepal').
  • Vertical fragmentation: select columns by usage pattern (e.g., personal info vs financial info).
Chapter 96 marksasked 3x2082 Kartik B, 2081 Chaitra R, 2080 Ashwin B

Write short notes on: (a) Distributed databases (b) Data warehouses.

Solution

Distributed databases coordinate data across sites; data warehouses support analysis and reporting.

  • Distributed DB: emphasizes consistency, concurrency, and transparency; serves OLTP.
  • Data warehouse: emphasizes historical data, ETL, multidimensional analysis; serves OLAP.
  • Distributed DB: data is current; warehouse: data is historical and integrated.
Chapter 96 marksasked 2x2080 Chaitra R

Write short notes on: (a) Parallel databases (b) Data warehouse.

Solution

Parallel databases use multiple processors/disks simultaneously; data warehouses store historical integrated data.

  • Parallel DB: shared memory, shared disk, or shared nothing architectures.
  • Intra-query parallelism: one query split across processors.
  • Data warehouse: star/snowflake schema, ETL pipeline, OLAP cubes.

More PYQs

Additional practice from this chapter.