Chapter 7 — Transaction Processing and Concurrency Control
7.1 What is a Transaction?
A transaction is a logical unit of work — a sequence of one or more database operations that is treated as a single, indivisible operation. It either completes entirely (commit) or has no effect at all (abort/rollback).
Practical examples:
- Bank transfer: READ(A), A ← A−100, WRITE(A), READ(B), B ← B+100, WRITE(B)
- Seat reservation: check availability, lock seat, charge card, issue ticket confirmation
Transaction States
State Descriptions
| State | Description |
|---|
| Active | Transaction is executing operations normally (reading and writing) |
| Partially Committed | The last operation has been executed; changes are in the buffer pool but NOT yet flushed to stable disk |
| Committed | Transaction has completed successfully; all changes are durably written to stable storage |
| Failed | Transaction cannot proceed — caused by error, user ROLLBACK, system failure, or deadlock |
| Aborted | Transaction has been rolled back; ALL its effects have been undone; database is as if the transaction never ran |
Exam question: "Briefly describe the four primary states in a database transaction and illustrate with a state diagram."
Draw the diagram above and describe Active, Partially Committed, Committed, and Failed/Aborted.
7.2 ACID Properties
ACID is the cornerstone of database transaction management. All four properties must be guaranteed.
| Property | Formal Definition | Mechanism in DBMS |
|---|
| Atomicity | A transaction is all-or-nothing — ALL operations complete or NONE take effect | Undo logging (rollback journal) |
| Consistency | A transaction takes the DB from one valid state to another, preserving all integrity constraints | Integrity constraints + application logic |
| Isolation | Concurrent transactions appear to execute serially — each transaction is unaware of others | Concurrency control (locking, MVCC) |
| Durability | Once a transaction commits, its changes persist permanently even if the system crashes immediately after | Redo logging + stable storage |
Detailed ACID Example — Bank Transfer of $100 from A to B
T: Transfer $100 from Account A (balance=1000) to Account B (balance=500)
1. READ(A) → A_balance = 1000
2. A ← A − 100 → A_balance = 900
3. WRITE(A) → write A=900 to buffer
4. READ(B) → B_balance = 500
5. B ← B + 100 → B_balance = 600
6. WRITE(B) → write B=600 to buffer
7. COMMIT → flush to stable disk
| ACID Property | How it applies |
|---|
| Atomicity | If crash after step 3 but before step 6, the system must undo WRITE(A) and restore A=1000 |
| Consistency | At every point, total A+B must be preserved: 1500 before = 1500 after |
| Isolation | If T2 reads A while T is between step 3 and step 6, T2 should NOT see A=900 yet |
| Durability | After step 7 (commit), A=900 and B=600 must survive any subsequent crash |
7.3 Concurrent Execution and Anomalies
Multiple transactions execute concurrently to improve throughput. Without control, concurrency causes these anomalies:
| Anomaly | Description | Example |
|---|
| Lost Update | T1 and T2 both read X; T1 writes X; T2 then overwrites T1's change using the stale value it read earlier | Two ticket agents both read "1 seat left", both write "0 seats" — one reservation is lost |
| Dirty Read | T2 reads a value written by T1 before T1 commits; T1 later aborts | T2 reads T1's uncommitted salary update; T1 rolls back; T2 has acted on phantom data |
| Unrepeatable Read | T1 reads X; T2 updates X and commits; T1 reads X again and sees a different value | T1 checks account balance (1000); T2 withdraws and commits (800); T1 re-checks and sees 800 |
| Phantom Read | T1 reads rows matching a predicate; T2 inserts new matching rows; T1 re-reads — sees a different set | T1 counts CS students: 100; T2 admits new CS student; T1 re-counts: 101 |
7.4 Serializability
Schedules
A schedule is an ordering of operations from multiple concurrent transactions.
- Serial schedule: transactions execute one after another with no interleaving — always correct by definition
- Serializable schedule: an interleaved schedule equivalent to some serial schedule — the goal of concurrency control
Conflict Serializability
Conflicting operations: two operations from different transactions on the same data item, where at least one is a WRITE.
| Ti operation | Tj operation | Conflict? |
|---|
| Read(X) | Read(X) | No — both reading |
| Read(X) | Write(X) | Yes — read-write conflict |
| Write(X) | Read(X) | Yes — write-read conflict |
| Write(X) | Write(X) | Yes — write-write conflict |
A schedule is conflict serializable if it can be transformed into a serial schedule by repeatedly swapping adjacent non-conflicting operations.
Precedence (Serialization) Graph Test
Build a directed graph G:
- Nodes = one node per transaction
- Edges = draw edge Ti → Tj if any operation of Ti conflicts with and precedes any operation of Tj
Theorem:
- G has a cycle → schedule is NOT conflict serializable
- G is acyclic → schedule IS conflict serializable; topological order of G gives the equivalent serial schedule
Conflict Serialization Graph — Worked Example
Schedule S (interleaved):
R1(A) R2(A) W1(A) W2(A) R1(B) W1(B)
T1 operations: R1(A), W1(A), R1(B), W1(B)
T2 operations: R2(A), W2(A)
Step 1: Identify ALL conflicting pairs (diff transactions, same item, ≥1 write):
• R1(A) at pos1, W2(A) at pos4 → T1 precedes T2 → draw T1 → T2
• W1(A) at pos3, W2(A) at pos4 → T1 precedes T2 → T1 → T2 (already have it)
• R2(A) at pos2, W1(A) at pos3 → T2 precedes T1 → draw T2 → T1
Step 2: Precedence graph:
T1 → T2 AND T2 → T1 ← CYCLE EXISTS
Conclusion: Schedule S is NOT conflict serializable.
──────────────────────────────────────────────────────────────
Example of a SERIALIZABLE schedule:
Schedule S2:
R1(A) W1(A) R1(B) W1(B) R2(A) W2(A)
Conflicts:
W1(A) at pos2, R2(A) at pos5 → T1 → T2
W1(A) at pos2, W2(A) at pos6 → T1 → T2
R1(A) at pos1, W2(A) at pos6 → T1 → T2
Precedence graph: T1 → T2 only — NO CYCLE
Conclusion: S2 IS conflict serializable.
Equivalent serial schedule: T1 then T2 (from topological sort)
View Serializability
View serializability is weaker (broader) than conflict serializability. Schedule S is view equivalent to serial schedule S' if all three conditions hold:
- Same initial reads: If Ti reads the initial value of data item Q in S, Ti also reads it in S'
- Same dependent reads: If Ti reads the value of Q written by Tj in S, Ti also reads Q written by Tj in S'
- Same final writes: If Ti performs the last write of Q in S, Ti also performs the last write of Q in S'
Key relationship:
- Every conflict-serializable schedule is view-serializable
- Some view-serializable schedules are NOT conflict-serializable (e.g., blind writes — writing without first reading)
- Checking view serializability is NP-hard → DBMS use conflict serializability in practice
7.5 Lock-Based Protocols
Locking is the primary mechanism for ensuring conflict serializability in practice.
Lock Types and Compatibility
| Lock Type | Symbol | Meaning | Compatible with S? | Compatible with X? |
|---|
| Shared (Read) | S | Allows concurrent reads | ✓ Yes | ✗ No |
| Exclusive (Write) | X | No other read or write allowed | ✗ No | ✗ No |
Compatibility Matrix:
| S requested | X requested |
|---|
| S held by another T | ✓ Compatible — grant | ✗ Must wait |
| X held by another T | ✗ Must wait | ✗ Must wait |
Two-Phase Locking (2PL)
Rules:
- Growing phase: A transaction may acquire locks but may NOT release any locks
- Shrinking phase: A transaction may release locks but may NOT acquire any new locks
- The lock point is the moment when the transaction acquires its LAST lock (the peak)
Guarantee: Any schedule produced by 2PL is conflict serializable.
Variations of 2PL
| Variant | Additional Rule | Extra Guarantee |
|---|
| Basic 2PL | Grow then shrink; release locks at any time in shrinking phase | Conflict serializability |
| Strict 2PL | Hold all exclusive (write) locks until COMMIT or ABORT | Prevents cascading rollbacks; recoverable schedules |
| Rigorous 2PL | Hold ALL locks (read + write) until COMMIT or ABORT | Strongest guarantee; most commonly used in practice |
Cascading rollback problem (why Strict 2PL matters):
If T1 writes X and releases its write lock early, T2 reads X (dirty read). If T1 then aborts, T2 must also abort because it read uncommitted data. This cascade can propagate to many transactions — a nightmare for recovery. Strict 2PL prevents this by keeping write locks until commit.
7.6 Deadlock Handling
Deadlock occurs when two or more transactions are each waiting for a lock held by another — circular wait with no progress possible.
Classic example:
- T1 holds lock on A, waiting for lock on B
- T2 holds lock on B, waiting for lock on A
- Neither can proceed → deadlock
Deadlock Prevention Schemes (Timestamp-Based)
Both schemes assign a timestamp to each transaction when it starts. Older timestamp = higher priority.
| Scheme | Ti requests lock held by Tj | Who is killed? | Style |
|---|
| Wait-Die | If Ti is OLDER: Ti waits. If Ti is YOUNGER: Ti dies (aborts). | Younger transaction always dies | Non-preemptive: older never forcibly killed |
| Wound-Wait | If Ti is OLDER: Ti wounds Tj (forces Tj to abort and retry). If Ti is YOUNGER: Ti waits. | Younger transaction may be killed by an older one | Preemptive: older can preempt younger |
Starvation prevention: When a transaction is restarted after abort, it keeps its original timestamp so it gradually becomes the "oldest" and is never killed again.
Mnemonic:
- Wait-Die: OLD waits patiently; YOUNG dies immediately
- Wound-Wait: OLD wounds (attacks) young; YOUNG waits for old
Deadlock Detection: Wait-For Graph
Build a directed graph:
- Nodes = all active transactions
- Edge Ti → Tj = Ti is waiting for a lock currently held by Tj
Periodically check for cycles. If a cycle exists → deadlock detected.
Resolution: Choose a victim (typically the youngest or the one that has done the least work) and abort it. The cycle is broken; other transactions can proceed.
7.7 Multiple Granularity Locking
Instead of always locking at the individual tuple level, allow locking at different levels of the database hierarchy:
Database > Table (Relation) > Page > Tuple (Row)
Why useful? A transaction that scans an entire table can lock the table once (coarse granularity) instead of locking each of potentially millions of tuples individually.
Intention Locks
Before locking a node N at a fine level, you must place an intention lock on all ancestor nodes in the hierarchy (top-down).
| Lock Mode | Meaning |
|---|
| IS (Intention Shared) | Will acquire S lock(s) on some descendant(s) |
| IX (Intention Exclusive) | Will acquire X lock(s) on some descendant(s) |
| S (Shared) | Read the entire subtree (all descendant tuples) |
| SIX (Shared + Intention Exclusive) | Read entire subtree + write some children |
| X (Exclusive) | Write entire subtree |
Multiple granularity protocol:
- Top-down to acquire: acquire IS or IX on all ancestors before locking a node
- Bottom-up to release: release fine-grained locks before releasing coarse-grained locks
Exam Quick-Reference — Chapter 7
- Transaction states: Active → Partially Committed → Committed (happy path); Active → Failed → Aborted (sad path)
- ACID: Atomicity=undo log, Consistency=constraints+app logic, Isolation=locking/MVCC, Durability=redo log+stable storage
- Conflicting operations: different transactions + same data item + at least one write
- Precedence graph: edge Ti→Tj if Ti's operation conflicts with and precedes Tj's. Cycle → NOT serializable. Acyclic → IS serializable.
- 2PL: growing phase (acquire only) → lock point → shrinking phase (release only)
- Strict 2PL: hold write locks until commit — prevents cascading rollbacks
- Deadlock prevention — Wait-Die: old waits, young dies. Wound-Wait: old wounds young, young waits for old
- Deadlock detection: wait-for graph; cycle = deadlock; abort a victim to break the cycle