Chapter 1

Introduction

3 hours3 marks20 past questionsBack to chapters Open filtered PYQs Next: Ch 2

Full Syllabus Outline

Every topic from the syllabus data is preserved here.

1.1
Application and evolution of database
DE
Definition of data, database and database management systems. Discussion of application and significance of databases and DBMS. History of databases and evolution to recent trends.
1.2
Data abstraction and data independence
DEI
Description of data abstraction levels (physical, logical, and view) along with block diagram and their significance. Principle of physical data independence along with practical significance.
1.3
Schema and instances
DEI
Definition of schema and instances along with practical examples.

Chapter Notes

Block-based notes with markdown, diagrams, code, and math.

Chapter 1 — Introduction to DBMS

1.1 Core Definitions

Term	Definition
Data	Raw, unprocessed facts with no inherent meaning (e.g., 42, "Alice", "2024-01-01")
Information	Processed, organized data that carries meaning and context (e.g., "Alice scored 42 marks on Jan 1 2024")
Database	A large, organized, interrelated collection of data stored persistently and managed as a unit
DBMS	A software system that provides mechanisms to define, construct, manipulate, and share databases among users and applications
Database System	The database itself combined with the DBMS software

Exam tip: The distinction between data and information is a frequent 1-mark question. Data is raw; information has context.

Why Do We Need a DBMS?

Without a DBMS, applications maintain their own data files. This leads to well-known problems:

Problem in File Systems	Concrete Example	DBMS Solution
Data redundancy	Student address stored in Admissions file AND Library file	Single centralized table
Data inconsistency	Student phone updated in one file but not another	One source of truth; all apps share same data
Difficult data access	Need to write a new program for every new query	Declarative SQL — express what, not how
Data isolation	Each department stores files in different formats	Uniform relational schema
Integrity violations	No way to enforce "salary must be positive" in a flat file	CHECK, NOT NULL, FK constraints
Atomicity failures	Power cut during bank transfer leaves A debited but B not credited	Transaction ACID guarantees
Concurrent access anomalies	Two agents book the last seat simultaneously	Locking and concurrency control
Security	File-level permissions are coarse	Fine-grained role-based access control

1.2 Applications and Evolution of Databases

Key Application Domains

Banking & Finance: account management, transactions, loan processing
Airlines: reservation systems, flight scheduling
Universities: student registration, grades, course catalog
Healthcare: patient records, prescriptions, billing
E-Commerce: product catalogs, orders, customer profiles

Evolution Timeline

Era	Technology	Key Feature
1960s	Flat files, IMS (IBM)	Hierarchical model; first structured storage
1970s	Network model (CODASYL); relational model proposed (Codd 1970)	Relational algebra and set-based access
1980s	Relational DBMS go commercial (Oracle, DB2, SQL Server)	SQL standardized (SQL-86, SQL-89)
1990s	Object-oriented extensions; client-server architectures	SQL:1999 adds OO features
2000s	Data warehousing, OLAP, XML databases	Analytical workloads separate from OLTP
2010s	NoSQL (MongoDB, Cassandra, Redis), NewSQL, cloud databases	Horizontal scalability, CAP theorem trade-offs
2020s	AI-augmented optimization, vector databases, cloud-native	LLM integration, real-time analytics

1.3 Three-Level Data Abstraction (ANSI/SPARC Architecture)

The three-level architecture was proposed by ANSI/SPARC in 1975 to achieve data independence — the ability to change one level without affecting higher levels.

Physical Level (Internal Schema)

The lowest level. Describes HOW data is physically stored on disk.

Defines file organization: heap files, sorted files, clustered files
Defines index structures: B+ trees, hash indexes
Describes disk block size, buffer pool management, compression
Managed entirely by the DBA and the storage engine
Application developers and end users have no visibility into this level

Logical Level (Conceptual Schema)

The middle level. Describes WHAT data exists and how entities relate, without any physical storage details.

Defines all tables (relations), their attributes, and data types
Declares all integrity constraints (keys, foreign keys, CHECK)
This is the level where application developers and DBAs work

Student(sid: int PK, name: varchar(100), dept: varchar(50), gpa: decimal(3,2))
Enrollment(sid: int FK→Student, cid: int FK→Course, grade: char(2))
Course(cid: int PK, title: varchar(100), credits: int)

View Level (External Schema)

The highest level. Each user or application sees only a relevant subset of the database.

Multiple external schemas coexist over a single conceptual schema
Provides security: payroll view shows salaries; student view does not
Example: payroll staff see salary; students see only their own transcript

Exam tip: When drawing the 3-level diagram, always show: (1) multiple views at the top, (2) a single conceptual schema in the middle, (3) physical storage at the bottom. This shows the many-to-one-to-one relationship.

1.4 Data Independence

Data independence means changes at one level do not require changes at a higher level.

Physical Data Independence

The ability to modify the physical schema without requiring changes to the logical schema or application programs.

Add an index → no SQL change needed
Reorganize from heap file to sorted file → applications unaffected
Switch from HDD to SSD → schema unchanged
This is the more achievable and more practical form. Modern RDBMS fully support it.

Logical Data Independence

The ability to modify the logical schema without requiring changes to external schemas (views) or application programs.

Add a new column to a table → existing views unaffected
Split one table into two → applications may still use a view that hides the split
This is harder to achieve because views are defined in terms of the logical schema

	Physical Independence	Logical Independence
What changes	Storage layout, indexes	Table structure, relationships
Impact on logical schema	None	View definitions may need updating
Difficulty	Easier to achieve	Harder to achieve

1.5 Schema vs Instance

Concept	Schema	Instance
What it is	Blueprint / structural definition of the database	Actual data stored at a particular point in time
When it changes	Rarely (ALTER TABLE)	Continuously (every INSERT/UPDATE/DELETE)
OOP analogy	Class definition	Object instantiated from that class
SQL analogy	CREATE TABLE statement	Rows in the table
Example	`Student(sid INT PK, name VARCHAR(100), dept VARCHAR(50))`	Row: `(1, 'Asha', 'CS')`

Analogy: Think of schema as a "type" and instance as a "value" of that type — just as an int variable has a type (int) and a value (42).

1.6 DDL · DML · DCL

-- ── DDL — Data Definition Language (define/change schema) ──────────────────
CREATE TABLE Student (
  sid   INT          PRIMARY KEY,
  name  VARCHAR(100) NOT NULL,
  dept  VARCHAR(50)  DEFAULT 'Undeclared',
  gpa   DECIMAL(3,2) CHECK (gpa BETWEEN 0 AND 4)
);
ALTER TABLE Student ADD COLUMN email VARCHAR(200) UNIQUE;
DROP TABLE Student;

-- ── DML — Data Manipulation Language (read and write data) ──────────────────
INSERT INTO Student VALUES (1, 'Asha', 'CS', 3.8);

SELECT name, gpa
FROM   Student
WHERE  dept = 'CS'
ORDER  BY gpa DESC;

UPDATE Student SET dept = 'IT' WHERE sid = 1;
DELETE FROM Student WHERE gpa < 1.0;

-- ── DCL — Data Control Language (access control) ────────────────────────────
GRANT  SELECT, INSERT ON Student TO app_user;
REVOKE INSERT         ON Student FROM app_user;

-- ── TCL — Transaction Control Language ──────────────────────────────────────
BEGIN;
  UPDATE Account SET balance = balance - 100 WHERE id = 1;
  UPDATE Account SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- or ROLLBACK to undo

DDL vs DML vs DCL — Quick Comparison

Category	Full Name	Purpose	Common Commands
DDL	Data Definition Language	Define/modify database structure (schema)	CREATE, ALTER, DROP, TRUNCATE
DML	Data Manipulation Language	Insert, read, update, delete data (instances)	SELECT, INSERT, UPDATE, DELETE
DCL	Data Control Language	Manage access permissions	GRANT, REVOKE
TCL	Transaction Control Language	Manage transactions	BEGIN, COMMIT, ROLLBACK, SAVEPOINT

Exam trap: DDL changes the structure (schema); DML changes the data (instance).

1.7 DBMS Internal Architecture

Exam Quick-Reference — Chapter 1

Data vs Information: data = raw facts; information = processed data with meaning.
Three levels: Physical (HOW stored) → Logical (WHAT exists) → View (WHAT each user sees). Always draw this diagram when asked.
Physical independence: change storage → no logical change. Logical independence: change schema → views still work. Physical is easier.
Schema = blueprint (stable, changed with ALTER TABLE); Instance = snapshot (changes with every DML operation).
DDL = structure (CREATE/ALTER/DROP); DML = data (SELECT/INSERT/UPDATE/DELETE); DCL = permissions (GRANT/REVOKE).
File system disadvantages: redundancy, inconsistency, poor access, isolation, integrity, atomicity, security, concurrency.

Solved PYQs

A few past questions with short answers.

CT30101011Chapter 16 marksasked 2x2080 Chaitra R

Explain the advantages of using Relational Database Management Systems. What are the levels of data abstraction in Database Management system?

Solution

Advantages of RDBMS: Reduced redundancy (normalization), data integrity (constraints), concurrent ACID access, crash recovery (WAL), role-based security, standardized SQL, data independence.

Abstraction Levels: Physical (storage/indexes), Logical (schema/constraints), View (user subsets). Isolates concerns between storage, developers, and end-users.

CT30101016Chapter 16 marksasked 1x2081 Ashwin B

Define database. Explain DDL, DML and DCL with examples.

Solution

Database: Structured, organized data collection. DDL defines structure (CREATE, ALTER, DROP - auto-commit). DML manipulates data (SELECT, INSERT, UPDATE, DELETE - transactional). DCL manages access (GRANT, REVOKE).

-- DDL
CREATE TABLE Student(sid INT PRIMARY KEY, name VARCHAR(100) NOT NULL);
ALTER TABLE Student ADD COLUMN email VARCHAR(100);

-- DML
INSERT INTO Student(sid, name) VALUES (1, 'Asha');
SELECT * FROM Student WHERE sid = 1;
UPDATE Student SET name = 'Asha Sharma' WHERE sid = 1;
DELETE FROM Student WHERE sid = 1;

-- DCL
GRANT SELECT ON Student TO app_user;
REVOKE INSERT ON Student FROM app_user;

CT30101003Chapter 15 marksasked 1x2079 Ashwin B

What do you mean by Schema and Instances? Briefly explain the different level of data abstraction.

Solution

Schema vs Instance:

Schema: The logical blueprint/structure (table definitions, constraints, types). Defined via DDL. Rarely changes.
Instance: The actual data stored at a given moment. Changes with every INSERT/UPDATE/DELETE.

Data Abstraction Levels:

Physical: Storage mechanics (blocks, indexes, hashing).
Logical: Schema, relationships, and integrity rules.
View: User-specific virtual tables.

Abstraction ensures schema design is independent of runtime data and physical storage choices.

More PYQs

Additional practice from this chapter.

Briefly explain the three levels of data abstraction in RDBMS. What is physical data independence?
CT301010014 marksasked 1x2082 Kartik B
What do you mean by data abstraction? Describe the various levels of data abstraction in database management system.
CT301010024 marksasked 1x2080 Ashwin B
What do you mean by Schema and Instances? Briefly explain the different level of data abstraction.
CT301010035 marksasked 1x2079 Ashwin B
Describe the different levels of abstraction in database. Discuss the significance of this abstraction.
CT301010044 marksasked 1x2078 Poush B
Define the terms Data Abstraction and Data Independence. Why they are important in DBMS?
CT301010055 marksasked 1x2078 Chaitra R

Introduction

Application and evolution of database

Data abstraction and data independence

Schema and instances

Chapter 1 — Introduction to DBMS

1.1 Core Definitions

Why Do We Need a DBMS?

1.2 Applications and Evolution of Databases

Key Application Domains

Evolution Timeline

1.3 Three-Level Data Abstraction (ANSI/SPARC Architecture)

Physical Level (Internal Schema)

Logical Level (Conceptual Schema)

View Level (External Schema)

1.4 Data Independence

Physical Data Independence

Logical Data Independence

1.5 Schema vs Instance

1.6 DDL · DML · DCL

DDL vs DML vs DCL — Quick Comparison

1.7 DBMS Internal Architecture

Exam Quick-Reference — Chapter 1