Introduction to Concurrency and Recovery

Definition

Concurrency is the ability of a system to manage multiple operations at overlapping times, ensuring that shared resources such as memory, files, or database records are accessed safely and efficiently.

Recovery is the set of methods used to restore a system, transaction, or database to a correct and consistent state after a failure, using techniques such as logging, checkpointing, undo, and redo.

Together, concurrency and recovery form the foundation of reliable multi-user systems, where many activities happen simultaneously but still produce correct results even in the presence of failures.

Main Content

1. Concurrency Control

Concurrency control is the mechanism used to coordinate simultaneous access to shared data so that operations do not interfere with each other and cause errors such as lost updates, dirty reads, or inconsistent results.
It ensures correctness by making concurrent execution appear as if transactions were executed in some serial order, even though they may actually overlap in time.

Concurrency is important because multiple users often interact with the same system at the same moment. Without control, one transaction may overwrite another’s changes, read uncommitted data, or produce results that do not reflect any logical sequence of operations.

Common problems in concurrency include:

Lost update

: Two transactions read the same value and both write back changes, causing one update to disappear.

Dirty read

: A transaction reads data written by another transaction that has not yet committed.

Non-repeatable read

: A transaction reads the same record twice and gets different values because another transaction modified it in between.

Phantom read

: A transaction re-executes a query and finds new rows added by another transaction.

To prevent these issues, systems use techniques such as:

Locks

: A transaction locks a data item before using it.
Shared lock for reading.
Exclusive lock for writing.

Two-Phase Locking (2PL)

: A transaction first acquires locks and later releases them, ensuring serializability.

Timestamp ordering

: Each transaction is assigned a timestamp, and operations are ordered according to those timestamps.

Validation-based control

: Transactions are allowed to proceed, but before commit their results are checked for conflicts.

A simple example is an online bank balance update:

Transaction T1 wants to deposit $100.
Transaction T2 wants to withdraw $50.
If both read the original balance at the same time without control, one update may be lost.
With concurrency control, the system ensures the balance is updated correctly and consistently.

2. Transaction Recovery

Transaction recovery is the process of bringing a database back to a consistent state after a failure by using techniques that undo incomplete work and redo committed work if necessary.
It protects data from corruption caused by crashes, hardware failures, software bugs, or power interruptions.

In database systems, a transaction is a unit of work that must follow the ACID properties:

Atomicity

: All actions happen completely or not at all.

Consistency

: The database remains in a valid state.

Isolation

: Concurrent transactions do not interfere incorrectly.

Durability

: Once committed, changes persist even after failure.

Recovery is mainly required because transactions may fail at different stages:

Before commit: partial changes must be undone.
After commit but before data is written permanently: committed changes may need to be redone.
During system crash: the database may contain incomplete or unflushed updates.

Key recovery techniques include:

Logging

The system records transaction actions in a log.
Example log entries: start, update old value, update new value, commit, abort.

Undo

Reverses changes made by uncommitted transactions.
Ensures incomplete transactions do not affect the database.

Redo

Reapplies changes of committed transactions that were not fully saved.
Ensures durability after crash.

Checkpointing

The system periodically saves a stable point in the log.
Recovery can start from the latest checkpoint rather than from the beginning.

Shadow paging

Uses copies of pages so that updates are applied safely before being made permanent.

Example:

Suppose a transfer transaction moves $200 from Account A to Account B.

Debit A by $200.
Credit B by $200.
Commit the transaction.

If the system crashes after debiting A but before crediting B, recovery must undo the debit if the transaction was not committed. If it had committed but the second update was not saved, recovery must redo the missing credit.

3. Concurrency and Recovery Interaction

Concurrency and recovery are closely related because many transactions may be executing at the same time while failures can occur at any point, so both correctness under overlap and correctness after failure must be maintained together.
Recovery mechanisms depend on concurrency rules to interpret transaction states correctly, and concurrency control often supports recovery by ensuring a predictable order of operations.

The interaction between the two can be understood through these ideas:

Isolation supports recovery

If one transaction cannot see uncommitted changes of another, it becomes easier to undo failures safely.

Logging supports both

Logs record the sequence of actions and help reconstruct the correct state.

Commit order matters

A transaction should not be considered permanent until all necessary conditions are met.

Serializability and recoverability

A schedule should be recoverable, meaning transactions commit only after the transactions whose results they depend on have committed.

Important schedule-related concepts:

Recoverable schedule

A transaction commits only after all transactions whose data it read have committed.

Cascadeless schedule

Transactions only read committed data, preventing cascading rollbacks.

Strict schedule

A transaction cannot read or write a data item until the previous transaction that accessed it has committed or aborted.
This is highly desirable because it simplifies recovery.

Example of why interaction matters:

T1 writes a new value.
T2 reads that uncommitted value.
If T1 aborts, then T2 has used invalid data and must also be rolled back.
This is called cascading rollback, and it can be expensive.

Systems try to avoid this by using strict concurrency control and robust recovery mechanisms together.

Working / Process

1. Transaction execution under concurrency control

Multiple transactions begin executing at the same time.
Each transaction requests access to data items.
The concurrency control method grants or denies access based on rules such as locks, timestamps, or validation.
This step prevents interference and ensures that the resulting schedule remains correct.

2. Logging and monitoring during execution

Every important operation is written to a log before or during the actual data change, depending on the logging protocol.
The system records transaction start, updates, commit, and abort information.
Checkpoints may also be taken periodically to reduce recovery time.
This creates a reliable record of what happened, which is essential if failure occurs.

3. Failure detection and recovery actions

If a crash, power failure, or software error occurs, the recovery manager examines the log.
It identifies committed transactions whose effects must be redone and uncommitted transactions whose effects must be undone.
After undo and redo operations, the database is restored to a consistent state.

The process can be visualized as follows:

Transaction Start
       |
       v
Concurrency Control Checks
       |
       v
Read/Write Data
       |
       v
Log Records Written
       |
       v
Commit or Abort
       |
       +----------------------+
                              |
                          Failure?
                              |
                    +---------+---------+
                    |                   |
                   No                  Yes
                    |                   |
                    v                   v
             Normal Completion   Recovery Manager
                                         |
                                         v
                               Undo / Redo / Checkpoint
                                         |
                                         v
                               Consistent Database State

A simple recovery flow example:

T1 updates a record and commits.
T2 updates another record but crashes before commit.
Recovery examines the log.
T1’s changes are redone if needed.
T2’s changes are undone because they were not committed.
Final database state is correct and stable.

Advantages / Applications

Improved performance and resource usage

Concurrency allows many users and processes to work at the same time, maximizing CPU, memory, and disk utilization.
This is especially important in web servers, databases, and cloud platforms where requests arrive continuously.

Data correctness and reliability

Concurrency control prevents conflicts such as lost updates and dirty reads, while recovery ensures the database remains correct after failures.
This is critical in financial systems, airline booking systems, and healthcare records, where errors can have serious consequences.

Fault tolerance and business continuity

Recovery mechanisms help systems continue functioning after crashes by restoring data to a valid state.
Organizations can reduce downtime and avoid permanent data loss through logs, checkpoints, and rollback/redo procedures.

Applications include:

Banking systems

Multiple customers withdraw, deposit, and transfer funds simultaneously.
Recovery ensures balances remain correct even if a failure occurs mid-transaction.

E-commerce platforms

Many users add items to carts, place orders, and make payments concurrently.
Recovery preserves order records and payment consistency after failures.

Reservation systems

Airline and hotel booking systems must prevent double booking.
Concurrency control ensures only one transaction confirms the final seat or room.

Operating systems and multiprocessor systems

Multiple processes and threads share CPU time and system resources.
Synchronization and recovery help avoid race conditions and maintain stability.

Distributed databases and cloud services

Data is often stored across multiple machines.
Concurrency and recovery mechanisms help maintain consistency despite network delays or node failures.

Summary

Concurrency allows multiple operations to run at the same time safely.
Recovery restores the system to a correct state after failure.
Important terms to remember: locks, logging, rollback, redo, checkpoint, serializability, atomicity.