Multiversion schemes

Definition

A multiversion scheme is a concurrency control method in which multiple versions of a data item are stored simultaneously, and transactions read or write versions according to their timestamps, isolation rules, or scheduling policy. The main idea is that readers can access an appropriate committed version without being forced to wait for writers, while writers create new versions rather than overwriting existing ones.

In database theory, this approach is commonly associated with Multiversion Concurrency Control (MVCC). Each update produces a new version with metadata such as creation time, validity range, or transaction identifier. The system uses this metadata to determine which version is visible to which transaction.


Main Content

1. Versions and Visibility

Multiple Versions of the Same Data Item

In a multiversion scheme, a data item such as a bank balance, product price, or student record is not stored as a single value. Instead, the system keeps several versions of it. For example, if a row in a table is updated three times, the database may preserve the old committed versions along with the newest one. This allows earlier transactions to continue seeing a stable view of the data while newer transactions work with updated values.

Example: Suppose an account balance starts at 500. Transaction T1 changes it to 550, and later T2 changes it to 600. Instead of replacing 500 with 550 and then 600 immediately, the system may keep all three versions temporarily, each tagged with information about when it became valid.
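
The balance example can be sketched as an append-only version list. This is only an illustration: the class names, the `created_by` label, and the logical `commit_ts` values (0, 10, 20) are assumptions made for the sketch, not part of any particular database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    value: int
    created_by: str   # hypothetical label of the writing transaction
    commit_ts: int    # hypothetical logical commit time


@dataclass
class VersionedItem:
    versions: List[Version] = field(default_factory=list)

    def install(self, value: int, created_by: str, commit_ts: int) -> None:
        # Append instead of overwriting, so older versions stay readable.
        self.versions.append(Version(value, created_by, commit_ts))

    def latest(self) -> Version:
        # The version with the highest commit time is the current one.
        return max(self.versions, key=lambda v: v.commit_ts)


balance = VersionedItem()
balance.install(500, "initial", commit_ts=0)
balance.install(550, "T1", commit_ts=10)
balance.install(600, "T2", commit_ts=20)

history = [v.value for v in balance.versions]
```

After the three installs, `history` holds all three balances and `balance.latest()` is the version written by T2.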

Version Metadata and Visibility Rules

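Every version usually carries metadata that helps the system decide which transactions can see it. Common metadata includes the timestamp or identifier of the creating transaction, its commit time, and a version number or validity range. Visibility rules compare this metadata with the reading transaction's own start time or timestamp: versions that are still uncommitted, or that were committed after the reader's snapshot was taken, are skipped, and obsolete versions are those that no active transaction can still see.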

Example: Continuing the balance example, a transaction that started after T1 committed but before T2 committed still sees version 550, even though version 600 now exists. Because all of its reads are served as of the same point in time, the transaction gets a consistent snapshot rather than a mix of old and new values.
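
A visibility rule of this kind can be written as a small function. The sketch below assumes one simple rule (a reader sees the newest version committed at or before its snapshot time) and uses hypothetical logical timestamps; real systems use more detailed metadata.

```python
from typing import List, Optional, Tuple

# (commit_ts, value) pairs for one data item; timestamps are hypothetical.
versions: List[Tuple[int, int]] = [(0, 500), (10, 550), (20, 600)]


def visible_value(versions: List[Tuple[int, int]], snapshot_ts: int) -> Optional[int]:
    """Return the newest value committed at or before snapshot_ts, if any."""
    candidates = [(ts, val) for ts, val in versions if ts <= snapshot_ts]
    if not candidates:
        return None  # the item did not exist yet for this reader
    return max(candidates, key=lambda pair: pair[0])[1]
```

With these versions, a reader whose snapshot time is 15 gets 550, while a reader with snapshot time 25 gets 600.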

2. Read-Write Separation and Snapshots

Read-Write Separation

One of the most important ideas in multiversion schemes is that reads and writes do not interfere with each other in the same way they do in single-version systems. Readers can use an older committed version, while writers create a new version without overwriting the one being read. This reduces waiting and improves concurrency.

Example: If one user is reading a product record at the same time another user updates the price, the reader can still access the old committed price until its transaction ends. The writer’s new price becomes visible only when the update is committed according to the scheme’s rules.

Snapshot Consistency

A transaction often works on a snapshot, meaning it sees a logically consistent view of the database as it existed at a particular point in time. This is especially useful for long-running analytical queries because they can read without being disturbed by ongoing updates.

Example: A report-generating transaction may start at 10:00 AM and read the same consistent snapshot throughout execution, even if hundreds of updates occur afterward. This avoids inconsistent results such as counting a row before it is updated and another row after it is updated.
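
The difference between a snapshot read and a mixed read can be shown with two accounts and a transfer that commits while a report is running. This is a minimal sketch; `MVStore`, its method names, and the counter-based clock are assumptions for illustration.

```python
class MVStore:
    """Toy multiversion store: every committed write appends a version."""

    def __init__(self):
        self.clock = 0   # logical commit counter
        self.data = {}   # key -> list of (commit_ts, value)

    def commit_write(self, updates):
        # All updates of one transaction share a single commit timestamp.
        self.clock += 1
        for key, value in updates.items():
            self.data.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the reader's snapshot.
        visible = [(ts, v) for ts, v in self.data[key] if ts <= snapshot_ts]
        return max(visible, key=lambda pair: pair[0])[1]


store = MVStore()
store.commit_write({"a": 100, "b": 100})
report_ts = store.clock                   # the report's snapshot is fixed here

first = store.read("a", report_ts)        # read before the transfer commits
store.commit_write({"a": 50, "b": 150})   # transfer 50 from a to b
second = store.read("b", report_ts)       # still served from the snapshot

snapshot_total = first + second
# A reader without a fixed snapshot would mix old "a" with new "b".
mixed_total = first + store.read("b", store.clock)
```

Here `snapshot_total` is 200, the true combined balance, whereas `mixed_total` is 250 because it counts the transferred amount twice.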

3. Variants and Conflict Handling

Common Types of Multiversion Schemes

There are several variants of multiversion schemes, each with a slightly different strategy for deciding which version is visible and how conflicts are handled. The most common forms are multiversion timestamp ordering and snapshot isolation as implemented by MVCC database engines; some systems also retain versions long-term for historical purposes.

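  • Timestamp-based multiversion ordering: Assigns each transaction a timestamp. A read returns the version with the largest write timestamp not exceeding the reader's timestamp, and a write is rejected if a younger transaction has already read the version it would supersede.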
  • Snapshot isolation / MVCC: Allows each transaction to read from a stable snapshot while writes create new versions.
  • Historical versioning: Keeps older versions for recovery, auditing, or temporal queries.
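
The timestamp-based variant can be sketched as follows, using the usual textbook read and write rules for multiversion timestamp ordering. The class names, the `Rollback` exception, and the choice of timestamp 0 for the initial version are assumptions for this sketch; commit handling and recovery are left out.

```python
class Rollback(Exception):
    """Raised when a write arrives too late and the writer must restart."""


class Version:
    def __init__(self, value, w_ts):
        self.value = value
        self.w_ts = w_ts   # timestamp of the transaction that wrote it
        self.r_ts = w_ts   # largest timestamp that has read it so far


class MVTOItem:
    def __init__(self, initial):
        self.versions = [Version(initial, 0)]

    def _target(self, ts):
        # Version with the largest write timestamp not exceeding ts.
        return max((v for v in self.versions if v.w_ts <= ts),
                   key=lambda v: v.w_ts)

    def read(self, ts):
        v = self._target(ts)
        v.r_ts = max(v.r_ts, ts)   # reads never wait and never fail
        return v.value

    def write(self, ts, value):
        v = self._target(ts)
        if ts < v.r_ts:
            # A younger transaction already read v; this write is too late.
            raise Rollback(f"write at ts={ts} conflicts with read at ts={v.r_ts}")
        if ts == v.w_ts:
            v.value = value        # same transaction overwrites its own version
        else:
            self.versions.append(Version(value, ts))


item = MVTOItem(100)
r1 = item.read(ts=5)
try:
    item.write(ts=3, value=120)
    late_write = "accepted"
except Rollback:
    late_write = "rolled back"
item.write(ts=7, value=130)
r2 = item.read(ts=6)
r3 = item.read(ts=8)
```

The write at timestamp 3 is rolled back because the transaction with timestamp 5 has already read the version it would supersede, while the write at timestamp 7 simply adds a new version that only later readers see.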

Conflict Handling and Serialization

Multiversion schemes do not eliminate conflicts entirely; instead, they manage them in a way that preserves correctness. Write-write conflicts still need special handling: if two concurrent transactions both update the same item starting from the same version, letting both commit would silently discard one of the updates (a lost update). The system may abort one transaction or use ordering rules to decide which update survives.

Example: If T1 and T2 both try to update the same customer address, the database may let both create tentative versions, but only one of them is allowed to commit. Under snapshot isolation, for instance, the first committer wins, and the other transaction is aborted and must retry.
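
One common policy for this situation is a commit-time check, often called first-committer-wins. The sketch below assumes that policy; `SIStore`, `WriteConflict`, the dictionary-based transaction record, and the sample addresses are all hypothetical.

```python
class WriteConflict(Exception):
    """Raised at commit when another transaction updated the same item first."""


class SIStore:
    def __init__(self, initial):
        self.clock = 0
        # key -> list of (commit_ts, value), appended in commit order
        self.committed = {k: [(0, v)] for k, v in initial.items()}

    def begin(self):
        return {"start_ts": self.clock, "writes": {}}

    def write(self, txn, key, value):
        txn["writes"][key] = value   # tentative, private to the transaction

    def commit(self, txn):
        # Validate: nobody may have committed a newer version of any
        # item this transaction wrote since its snapshot was taken.
        for key in txn["writes"]:
            last_commit_ts = self.committed[key][-1][0]
            if last_commit_ts > txn["start_ts"]:
                raise WriteConflict(key)
        self.clock += 1
        for key, value in txn["writes"].items():
            self.committed[key].append((self.clock, value))
        return self.clock


store = SIStore({"address": "12 Old Road"})
t1 = store.begin()
t2 = store.begin()
store.write(t1, "address", "5 New Lane")
store.write(t2, "address", "9 Other Street")

store.commit(t1)
try:
    store.commit(t2)
    t2_outcome = "committed"
except WriteConflict:
    t2_outcome = "aborted"

final_address = store.committed["address"][-1][1]
```

T1 commits first, so T2 fails validation and the stored address remains the one written by T1.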


Working / Process

1. Transaction Starts and Gets a View

  • When a transaction begins, the system assigns it a timestamp, snapshot, or version view.
  • This determines which versions of data it is allowed to see during its execution.
  • The transaction does not necessarily see the most recent committed data; it sees the version that is valid for its snapshot.

2. Reads Use the Appropriate Existing Version

  • When the transaction reads a data item, the system selects the newest version that is visible to that transaction.
  • If multiple versions exist, the system chooses the one that satisfies the visibility rule.
  • This means reads usually do not block even when other transactions are updating the same data.

3. Writes Create New Versions and Commit Rules Apply

  • When a transaction modifies data, a new version is created rather than replacing the old one immediately.
  • The system then checks for conflicts, validates commit conditions, and marks the new version as committed if successful.
  • Older versions may be retained for other active transactions and later cleaned up by garbage collection or version pruning.
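
Version pruning can be sketched with one simple rule: a version may be discarded once no active snapshot can still read it. The function name and the representation of snapshots as plain timestamps are assumptions; production garbage collectors are considerably more involved.

```python
def prune(versions, active_snapshots):
    """Drop versions that no active snapshot can read.

    versions: list of (commit_ts, value), sorted by commit_ts ascending.
    active_snapshots: snapshot timestamps of transactions still running.
    """
    if not active_snapshots:
        return versions[-1:]          # only the newest version is needed
    oldest = min(active_snapshots)
    older = [v for v in versions if v[0] <= oldest]
    newer = [v for v in versions if v[0] > oldest]
    # Keep the one version the oldest snapshot reads, plus everything newer.
    return older[-1:] + newer


chain = [(0, 100), (10, 120), (20, 140)]
kept_with_reader = prune(chain, active_snapshots=[15])
kept_without_readers = prune(chain, active_snapshots=[])
```

With a reader at snapshot time 15, the version committed at time 0 is dropped but the one from time 10 must stay; with no active readers, only the newest version remains.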

Example workflow for a record A:

Initial state:   A1 = 100
T1 reads A1
T2 updates A -> A2 = 120
T1 still sees A1
T2 commits A2
T3 starts later and sees A2

This process shows how different transactions can observe different versions without corrupting consistency.
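
The workflow for record A can be replayed in a few lines. This is a sketch under simplifying assumptions: the `Database` and `Transaction` classes are hypothetical, writes stay private until commit, and conflict checking is deliberately omitted.

```python
class Database:
    def __init__(self, initial):
        self.clock = 0
        # key -> list of (commit_ts, value), appended in commit order
        self.versions = {k: [(0, v)] for k, v in initial.items()}

    def begin(self):
        # Step 1: the transaction gets a snapshot timestamp.
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, db, snapshot_ts):
        self.db = db
        self.snapshot_ts = snapshot_ts
        self.pending = {}   # uncommitted writes, invisible to others

    def read(self, key):
        # Step 2: read own pending write, else the newest visible version.
        if key in self.pending:
            return self.pending[key]
        visible = [(ts, v) for ts, v in self.db.versions[key]
                   if ts <= self.snapshot_ts]
        return visible[-1][1]

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        # Step 3: install new versions (conflict checks omitted here).
        self.db.clock += 1
        for key, value in self.pending.items():
            self.db.versions[key].append((self.db.clock, value))


db = Database({"A": 100})
t1 = db.begin()
t2 = db.begin()
seen = [t1.read("A")]        # T1 reads A1
t2.write("A", 120)           # T2 creates A2, not yet committed
seen.append(t1.read("A"))    # T1 still sees A1
t2.commit()                  # T2 commits A2
t3 = db.begin()
seen.append(t3.read("A"))    # T3 starts later and sees A2
```

The list `seen` ends up as 100, 100, 120, and T1 continues to read 100 even after T2 has committed, because its snapshot was taken earlier.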


Advantages / Applications

Improved concurrency and higher throughput

Since readers do not block writers and writers do not usually block readers, many transactions can run in parallel. This is especially beneficial in systems with heavy read traffic.

Consistent snapshot reads for analytics and reporting

Long-running queries can execute on a stable version of the data, producing reliable reports without being affected by concurrent updates. This is useful in business intelligence, auditing, and decision-support systems.

Use in modern databases, recovery, and temporal data management

Multiversion schemes are used in relational databases, distributed databases, and systems that need historical records. Retained versions also underpin features such as change auditing and time-travel (as-of) queries, and they can assist point-in-time recovery when combined with the system's logging.


Summary

  • Multiversion schemes store more than one version of a data item to improve concurrency.
  • They let transactions read stable versions while updates create new ones.
  • They are widely used in MVCC-based database systems for fast, consistent access.
  • Important terms to remember: version, snapshot, visibility, commit, MVCC.