Chapter 1 — Introduction to DBMS
1.1 Core Definitions
| Term | Definition |
|---|
| Data | Raw, unprocessed facts with no inherent meaning (e.g., 42, "Alice", "2024-01-01") |
| Information | Processed, organized data that carries meaning and context (e.g., "Alice scored 42 marks on Jan 1 2024") |
| Database | A large, organized, interrelated collection of data stored persistently and managed as a unit |
| DBMS | A software system that provides mechanisms to define, construct, manipulate, and share databases among users and applications |
| Database System | The database itself combined with the DBMS software |
Exam tip: The distinction between data and information is a frequent 1-mark question. Data is raw; information has context.
Why Do We Need a DBMS?
Without a DBMS, applications maintain their own data files. This leads to well-known problems:
| Problem in File Systems | Concrete Example | DBMS Solution |
|---|
| Data redundancy | Student address stored in Admissions file AND Library file | Single centralized table |
| Data inconsistency | Student phone updated in one file but not another | One source of truth; all apps share same data |
| Difficult data access | Need to write a new program for every new query | Declarative SQL — express what, not how |
| Data isolation | Each department stores files in different formats | Uniform relational schema |
| Integrity violations | No way to enforce "salary must be positive" in a flat file | CHECK, NOT NULL, FK constraints |
| Atomicity failures | Power cut during bank transfer leaves A debited but B not credited | Transaction ACID guarantees |
| Concurrent access anomalies | Two agents book the last seat simultaneously | Locking and concurrency control |
| Security | File-level permissions are coarse | Fine-grained role-based access control |
1.2 Applications and Evolution of Databases
Key Application Domains
- Banking & Finance: account management, transactions, loan processing
- Airlines: reservation systems, flight scheduling
- Universities: student registration, grades, course catalog
- Healthcare: patient records, prescriptions, billing
- E-Commerce: product catalogs, orders, customer profiles
Evolution Timeline
| Era | Technology | Key Feature |
|---|
| 1960s | Flat files, IMS (IBM) | Hierarchical model; first structured storage |
| 1970s | Network model (CODASYL); relational model proposed (Codd 1970) | Relational algebra and set-based access |
| 1980s | Relational DBMS go commercial (Oracle, DB2, SQL Server) | SQL standardized (SQL-86, SQL-89) |
| 1990s | Object-oriented extensions; client-server architectures | SQL:1999 adds OO features |
| 2000s | Data warehousing, OLAP, XML databases | Analytical workloads separate from OLTP |
| 2010s | NoSQL (MongoDB, Cassandra, Redis), NewSQL, cloud databases | Horizontal scalability, CAP theorem trade-offs |
| 2020s | AI-augmented optimization, vector databases, cloud-native | LLM integration, real-time analytics |
Physical Level (Internal Schema)
The lowest level. Describes HOW data is physically stored on disk.
- Defines file organization: heap files, sorted files, clustered files
- Defines index structures: B+ trees, hash indexes
- Describes disk block size, buffer pool management, compression
- Managed entirely by the DBA and the storage engine
- Application developers and end users have no visibility into this level
Logical Level (Conceptual Schema)
The middle level. Describes WHAT data exists and how entities relate, without any physical storage details.
- Defines all tables (relations), their attributes, and data types
- Declares all integrity constraints (keys, foreign keys, CHECK)
- This is the level where application developers and DBAs work
Student(sid: int PK, name: varchar(100), dept: varchar(50), gpa: decimal(3,2))
Enrollment(sid: int FK→Student, cid: int FK→Course, grade: char(2))
Course(cid: int PK, title: varchar(100), credits: int)
View Level (External Schema)
The highest level. Each user or application sees only a relevant subset of the database.
- Multiple external schemas coexist over a single conceptual schema
- Provides security: payroll view shows salaries; student view does not
- Example: payroll staff see salary; students see only their own transcript
Exam tip: When drawing the 3-level diagram, always show: (1) multiple views at the top, (2) a single conceptual schema in the middle, (3) physical storage at the bottom. This shows the many-to-one-to-one relationship.
1.4 Data Independence
Data independence means changes at one level do not require changes at a higher level.
Physical Data Independence
The ability to modify the physical schema without requiring changes to the logical schema or application programs.
- Add an index → no SQL change needed
- Reorganize from heap file to sorted file → applications unaffected
- Switch from HDD to SSD → schema unchanged
- This is the more achievable and more practical form. Modern RDBMS fully support it.
Logical Data Independence
The ability to modify the logical schema without requiring changes to external schemas (views) or application programs.
- Add a new column to a table → existing views unaffected
- Split one table into two → applications may still use a view that hides the split
- This is harder to achieve because views are defined in terms of the logical schema
| Physical Independence | Logical Independence |
|---|
| What changes | Storage layout, indexes | Table structure, relationships |
| Impact on logical schema | None | View definitions may need updating |
| Difficulty | Easier to achieve | Harder to achieve |
1.5 Schema vs Instance
| Concept | Schema | Instance |
|---|
| What it is | Blueprint / structural definition of the database | Actual data stored at a particular point in time |
| When it changes | Rarely (ALTER TABLE) | Continuously (every INSERT/UPDATE/DELETE) |
| OOP analogy | Class definition | Object instantiated from that class |
| SQL analogy | CREATE TABLE statement | Rows in the table |
| Example | Student(sid INT PK, name VARCHAR(100), dept VARCHAR(50)) | Row: (1, 'Asha', 'CS') |
Analogy: Think of schema as a "type" and instance as a "value" of that type — just as an int variable has a type (int) and a value (42).
1.6 DDL · DML · DCL
-- ── DDL — Data Definition Language (define/change schema) ──────────────────
CREATE TABLE Student (
sid INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
dept VARCHAR(50) DEFAULT 'Undeclared',
gpa DECIMAL(3,2) CHECK (gpa BETWEEN 0 AND 4)
);
ALTER TABLE Student ADD COLUMN email VARCHAR(200) UNIQUE;
DROP TABLE Student;
-- ── DML — Data Manipulation Language (read and write data) ──────────────────
INSERT INTO Student VALUES (1, 'Asha', 'CS', 3.8);
SELECT name, gpa
FROM Student
WHERE dept = 'CS'
ORDER BY gpa DESC;
UPDATE Student SET dept = 'IT' WHERE sid = 1;
DELETE FROM Student WHERE gpa < 1.0;
-- ── DCL — Data Control Language (access control) ────────────────────────────
GRANT SELECT, INSERT ON Student TO app_user;
REVOKE INSERT ON Student FROM app_user;
-- ── TCL — Transaction Control Language ──────────────────────────────────────
BEGIN;
UPDATE Account SET balance = balance - 100 WHERE id = 1;
UPDATE Account SET balance = balance + 100 WHERE id = 2;
COMMIT; -- or ROLLBACK to undo