Chapter 3 — Relational Query Languages
3.1 Relational Algebra
Relational algebra is a procedural formal query language — it specifies both what data to retrieve and how (the sequence of operations). SQL is based on relational algebra but is declarative (specifies only what, not how).
Core Operators
| Operator | Symbol | Output | Description |
|---|
| Selection | σ (sigma) | Subset of rows | Filters rows satisfying a predicate |
| Projection | π (pi) | Subset of columns | Keeps only specified columns; removes duplicates |
| Cartesian Product | × | All row combinations | Pairs every tuple of R with every tuple of S |
| Natural Join | ⋈ | Joined relation | Joins on all common attributes; merges duplicate columns |
| Theta Join | ⋈_θ | Joined relation | Cartesian product filtered by predicate θ |
| Union | ∪ | All tuples in R or S | Both relations must be union-compatible |
| Intersection | ∩ | Tuples in both | Both must be union-compatible |
| Set Difference | − | Tuples in R but not S | Both must be union-compatible |
| Rename | ρ (rho) | Renamed relation | Rename relation and/or attributes |
| Assignment | ← | Named result | Store intermediate result for reuse |
Union-compatible = same number of attributes with compatible domains.
Selection: σdept=′CS′∧gpa>3.5(Student) Projection: πname,gpa(σdept=′CS′(Student)) Natural Join: Student⋈Enrollment⋈Course Theta Join: Student⋈Student.sid=Enrollment.sidEnrollment Common Relational Algebra Query Examples
| Query | Relational Algebra Expression |
|---|
| CS students with GPA > 3.5 | σ_{dept='CS' ∧ gpa>3.5}(Student) |
| Names of CS students | π_{name}(σ_{dept='CS'}(Student)) |
| Students enrolled in course 101 | π_{sid}(σ_{cid=101}(Enrollment)) |
| Student names enrolled in course 101 | π_{name}(Student ⋈ σ_{cid=101}(Enrollment)) |
| Students NOT enrolled in any course | π_{sid}(Student) − π_{sid}(Enrollment) |
Optimization tip: Always push σ as close to the base relation as possible, and push π early to eliminate unneeded columns. This reduces intermediate result sizes.
3.2 SQL Reference Schema
CREATE TABLE Student (
sid INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
dept VARCHAR(50),
gpa DECIMAL(3,2) CHECK (gpa BETWEEN 0 AND 4)
);
CREATE TABLE Course (
cid INT PRIMARY KEY,
title VARCHAR(100) NOT NULL,
credits INT,
dept VARCHAR(50)
);
CREATE TABLE Enrollment (
sid INT REFERENCES Student(sid) ON DELETE CASCADE,
cid INT REFERENCES Course(cid) ON DELETE CASCADE,
grade CHAR(2),
year INT,
PRIMARY KEY (sid, cid)
);
CREATE TABLE Instructor (
iid INT PRIMARY KEY,
name VARCHAR(100),
dept VARCHAR(50),
salary DECIMAL(10,2)
);
CREATE TABLE Teaches (
iid INT REFERENCES Instructor(iid),
cid INT REFERENCES Course(cid),
PRIMARY KEY (iid, cid)
);
3.3 SQL SELECT — Full Syntax and Logical Execution Order
SELECT [DISTINCT] column_list | *
FROM table_list | join_expression
[WHERE row_predicate]
[GROUP BY grouping_columns]
[HAVING group_predicate]
[ORDER BY sort_column [ASC | DESC]]
[LIMIT n OFFSET m]
Logical execution order (how the engine evaluates — NOT the writing order):
- FROM + JOIN — identify all rows from all tables
- WHERE — filter individual rows
- GROUP BY — group the filtered rows
- HAVING — filter groups (applied after grouping)
- SELECT — compute output columns (aliases defined here)
- DISTINCT — remove duplicate output rows
- ORDER BY — sort the result
- LIMIT / OFFSET — paginate
Exam traps:
- You CANNOT use a SELECT alias in a WHERE clause (WHERE runs before SELECT).
- You CAN use a SELECT alias in ORDER BY (ORDER BY runs after SELECT).
- Never use aggregate functions in WHERE — use HAVING instead.
- WHERE filters rows (before grouping); HAVING filters groups (after GROUP BY).
-- ── BASIC SELECT + AGGREGATES ────────────────────────────────────────────
SELECT name, gpa
FROM Student
WHERE dept = 'CS' AND gpa > 3.5
ORDER BY gpa DESC;
-- DISTINCT — remove duplicate rows
SELECT DISTINCT dept FROM Student;
-- Aggregate functions: COUNT, AVG, MAX, MIN, SUM
-- COUNT(*): counts all rows including NULLs
-- COUNT(col): counts non-null values only; AVG, MAX, MIN, SUM all ignore NULLs
SELECT
dept,
COUNT(*) AS total_students,
ROUND(AVG(gpa), 2) AS avg_gpa,
MAX(gpa) AS top_gpa,
MIN(gpa) AS lowest_gpa
FROM Student
GROUP BY dept -- group rows by dept first
HAVING AVG(gpa) > 3.0 -- then filter groups where avg GPA > 3.0
ORDER BY avg_gpa DESC;
-- ── JOINS ────────────────────────────────────────────────────────────────
-- INNER JOIN: only matching rows from both tables
SELECT s.name, c.title, e.grade
FROM Student s
JOIN Enrollment e ON s.sid = e.sid
JOIN Course c ON e.cid = c.cid
WHERE c.dept = 'CS';
-- LEFT OUTER JOIN: all rows from left table; NULLs for non-matching right rows
-- Use: find students NOT enrolled (grade will be NULL)
SELECT s.name, e.cid, e.grade
FROM Student s
LEFT JOIN Enrollment e ON s.sid = e.sid;
-- RIGHT OUTER JOIN: all rows from right table
SELECT s.name, c.title
FROM Enrollment e
RIGHT JOIN Course c ON e.cid = c.cid;
-- FULL OUTER JOIN: all rows from both tables
SELECT s.name, c.title
FROM Student s
FULL OUTER JOIN Enrollment e ON s.sid = e.sid
FULL OUTER JOIN Course c ON e.cid = c.cid;
-- CROSS JOIN (Cartesian product): WARNING — |R| × |S| rows
SELECT s.name, c.title FROM Student s CROSS JOIN Course c;
-- NATURAL JOIN: auto-join on same-named columns (fragile — use with care)
SELECT * FROM Student NATURAL JOIN Enrollment;
-- Self-join: list pairs of students in the same department
SELECT a.name AS student1, b.name AS student2, a.dept
FROM Student a
JOIN Student b ON a.dept = b.dept AND a.sid < b.sid;
-- a.sid < b.sid prevents (Alice,Bob) AND (Bob,Alice) both appearing
-- ── SUBQUERIES ───────────────────────────────────────────────────────────
-- Non-correlated: independent of outer query, runs ONCE
SELECT name FROM Student
WHERE sid IN (
SELECT sid FROM Enrollment WHERE grade = 'A'
);
-- Correlated: references outer query row; runs ONCE PER OUTER ROW (slower)
-- Find students with GPA above their own department's average
SELECT name, gpa, dept
FROM Student s
WHERE gpa > (
SELECT AVG(gpa) FROM Student s2 WHERE s2.dept = s.dept
);
-- EXISTS: true if subquery returns at least one row
SELECT name FROM Student s
WHERE EXISTS (
SELECT 1 FROM Enrollment e WHERE e.sid = s.sid AND e.grade = 'A'
);
-- NOT EXISTS: students who never got an A grade
SELECT name FROM Student s
WHERE NOT EXISTS (
SELECT 1 FROM Enrollment e WHERE e.sid = s.sid AND e.grade = 'A'
);
-- Students never enrolled in any course
SELECT name FROM Student s
WHERE NOT EXISTS (SELECT 1 FROM Enrollment e WHERE e.sid = s.sid);
-- ⚠ Prefer NOT EXISTS over NOT IN when NULLs may be present in the subquery result
-- Scalar subquery: subquery returning a single value, usable in SELECT
SELECT name,
(SELECT COUNT(*) FROM Enrollment e WHERE e.sid = s.sid) AS num_courses
FROM Student s;
-- ── SET OPERATIONS ──────────────────────────────────────────────────────
-- Both relations must be union-compatible (same columns, compatible types)
-- UNION: rows in first OR second result; REMOVES duplicates
SELECT sid FROM Student WHERE dept = 'CS'
UNION
SELECT sid FROM Student WHERE dept = 'IT';
-- UNION ALL: keep ALL duplicates (faster — no dedup step)
SELECT sid FROM Student WHERE dept = 'CS'
UNION ALL
SELECT sid FROM Student WHERE gpa > 3.8;
-- INTERSECT: rows in BOTH results
SELECT sid FROM Enrollment WHERE cid = 101
INTERSECT
SELECT sid FROM Enrollment WHERE cid = 102; -- enrolled in BOTH courses
-- EXCEPT (MINUS in Oracle): rows in first but NOT in second
SELECT sid FROM Enrollment WHERE cid = 101
EXCEPT
SELECT sid FROM Enrollment WHERE cid = 102; -- enrolled in 101 but NOT 102
-- ── DML: INSERT · UPDATE · DELETE ──────────────────────────────────────
-- INSERT — single row
INSERT INTO Student (sid, name, dept, gpa) VALUES (10, 'Ravi', 'CS', 3.7);
-- INSERT — multiple rows at once
INSERT INTO Student VALUES
(11, 'Priya', 'IT', 3.5),
(12, 'Sita', 'CS', 3.9);
-- INSERT from SELECT (bulk insert from another query)
INSERT INTO CS_Backup SELECT * FROM Student WHERE dept = 'CS';
-- UPDATE — update rows matching condition
UPDATE Student
SET gpa = LEAST(gpa + 0.1, 4.0) -- LEAST() prevents exceeding 4.0
WHERE dept = 'CS';
-- UPDATE with subquery
UPDATE Student SET dept = 'CS'
WHERE sid IN (SELECT sid FROM Enrollment WHERE cid = 101);
-- DELETE — remove rows
DELETE FROM Student WHERE gpa < 1.5;
-- DELETE using subquery — remove students never enrolled anywhere
DELETE FROM Student
WHERE sid NOT IN (SELECT DISTINCT sid FROM Enrollment);
-- ── VIEWS ────────────────────────────────────────────────────────────────
-- Simple view: stores the query definition, not data
CREATE VIEW CS_Students AS
SELECT sid, name, gpa FROM Student WHERE dept = 'CS';
-- View with join (hides complexity from user)
CREATE VIEW Student_Courses AS
SELECT s.name AS student_name, c.title AS course_title, e.grade
FROM Student s
JOIN Enrollment e ON s.sid = e.sid
JOIN Course c ON e.cid = c.cid;
-- WITH CHECK OPTION: prevents INSERT/UPDATE that violates the view's WHERE
CREATE VIEW High_GPA_Students AS
SELECT * FROM Student WHERE gpa > 3.5
WITH CHECK OPTION;
-- INSERT INTO High_GPA_Students VALUES(99,'X','CS',2.0) → ERROR
-- Materialized view (PostgreSQL): stores the RESULT physically; must be refreshed
CREATE MATERIALIZED VIEW Dept_Stats AS
SELECT dept, COUNT(*) AS num_students, AVG(gpa) AS avg_gpa
FROM Student GROUP BY dept;
REFRESH MATERIALIZED VIEW Dept_Stats; -- update stored result
-- Access control: grant access to view, not base table
GRANT SELECT ON CS_Students TO student_app_user;
DROP VIEW CS_Students;
DROP MATERIALIZED VIEW Dept_Stats;
Views — Key Concepts
| Concept | Regular View | Materialized View |
|---|
| Storage | Stores query definition only | Stores the query result (physically) |
| Freshness | Always current (re-executes on access) | Stale until refreshed (REFRESH command) |
| Performance | Slower for complex queries | Faster (pre-computed result) |
| Use case | Security, simplification, abstraction | Reporting, OLAP, dashboards |
Updatable views: A view supports INSERT/UPDATE/DELETE only if it selects from a single base table with no DISTINCT, GROUP BY, HAVING, aggregates, or UNION.
-- ── TRIGGERS ─────────────────────────────────────────────────────────────
-- Trigger: automatically fires BEFORE or AFTER INSERT/UPDATE/DELETE
-- Use cases: audit logging, enforce complex constraints, auto-compute derived values
-- PostgreSQL: trigger function + trigger binding
-- Example 1: Validate GPA before insert/update
CREATE OR REPLACE FUNCTION validate_gpa()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.gpa < 0 OR NEW.gpa > 4.0 THEN
RAISE EXCEPTION 'GPA must be between 0.0 and 4.0, got %', NEW.gpa;
END IF;
RETURN NEW; -- BEFORE trigger must RETURN NEW (or NULL to cancel the operation)
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_gpa_check
BEFORE INSERT OR UPDATE ON Student
FOR EACH ROW EXECUTE FUNCTION validate_gpa();
-- FOR EACH ROW: fires once per modified row (row-level trigger)
-- FOR EACH STATEMENT: fires once per SQL statement (statement-level trigger)
-- Example 2: Audit trigger — log every deleted student
CREATE TABLE Student_Audit (
deleted_at TIMESTAMP DEFAULT NOW(),
sid INT, name VARCHAR(100), dept VARCHAR(50)
);
CREATE OR REPLACE FUNCTION audit_student_delete()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO Student_Audit(sid, name, dept)
VALUES (OLD.sid, OLD.name, OLD.dept); -- OLD = the deleted row
RETURN OLD;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_student_audit
AFTER DELETE ON Student
FOR EACH ROW EXECUTE FUNCTION audit_student_delete();
-- Trigger timing:
-- BEFORE trigger: can modify NEW or return NULL to cancel operation
-- AFTER trigger: cannot modify NEW; OLD available for DELETE; used for audit
-- ── STORED PROCEDURES ────────────────────────────────────────────────────
CREATE OR REPLACE PROCEDURE enroll_student(
p_sid INT,
p_cid INT,
p_grade CHAR(2) DEFAULT 'IP'
)
LANGUAGE plpgsql AS $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM Student WHERE sid = p_sid) THEN
RAISE EXCEPTION 'Student % not found', p_sid;
END IF;
IF NOT EXISTS (SELECT 1 FROM Course WHERE cid = p_cid) THEN
RAISE EXCEPTION 'Course % not found', p_cid;
END IF;
INSERT INTO Enrollment(sid, cid, grade, year)
VALUES (p_sid, p_cid, p_grade, EXTRACT(YEAR FROM NOW())::INT)
ON CONFLICT (sid, cid) DO UPDATE SET grade = EXCLUDED.grade;
RAISE NOTICE 'Student % enrolled in course %', p_sid, p_cid;
END;
$$;
CALL enroll_student(1, 101, 'A');
-- ── GRANT / REVOKE ────────────────────────────────────────────────────────
CREATE ROLE student_role;
GRANT SELECT ON Student, Course, Enrollment TO student_role;
CREATE USER alice WITH PASSWORD 'pass123';
GRANT student_role TO alice;
-- WITH GRANT OPTION: alice can further grant this privilege to others
GRANT SELECT ON Student TO alice WITH GRANT OPTION;
REVOKE INSERT ON Enrollment FROM alice;
-- CASCADE: also revokes privileges alice granted using WITH GRANT OPTION
REVOKE SELECT ON Student FROM alice CASCADE;
DCL — GRANT and REVOKE Summary
| Command | Purpose | Example |
|---|
GRANT privilege ON object TO user | Give privilege | GRANT SELECT ON Student TO alice |
GRANT ... WITH GRANT OPTION | Give privilege + allow re-granting | GRANT SELECT ON Student TO alice WITH GRANT OPTION |
GRANT role TO user | Assign a role | GRANT faculty_role TO bob |
REVOKE privilege FROM user | Remove privilege | REVOKE INSERT ON Student FROM alice |
REVOKE ... CASCADE | Remove + propagate to dependents | REVOKE SELECT ON Student FROM alice CASCADE |
Authorization graph: A directed graph of granted privileges. If a privilege is revoked with CASCADE, all downstream grants (granted by that user) are also revoked.
Exam Quick-Reference — Chapter 3
- Relational algebra evaluation order: σ first (push down), π next, then ×/⋈
- SQL logical execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
- HAVING vs WHERE: WHERE filters rows before grouping; HAVING filters groups after GROUP BY. Use aggregates only in HAVING.
- Correlated vs Non-correlated subquery: correlated references outer query, runs per outer row; non-correlated runs once
- NOT IN vs NOT EXISTS: NOT EXISTS is safer when NULLs may be present in subquery result
- View vs Materialized View: view = stored query (always fresh); materialized view = stored result (stale until refreshed)
- Trigger: fires automatically on DML events; BEFORE can modify/cancel; AFTER cannot modify NEW
- GRANT WITH GRANT OPTION: user can re-grant; REVOKE CASCADE removes downstream grants