B+ tree

Comprehensive study notes, diagrams, and exam preparation for B+ tree.

B+ Tree

Definition

A B+ tree is a self-balancing tree data structure in which:

  • All actual data records or record pointers are stored only in the leaf nodes.
  • Internal nodes store only search keys and child pointers.
  • All leaf nodes are linked together in sorted order for fast sequential access.
  • The tree remains height-balanced, meaning all leaf nodes are at the same level.

This structure ensures that searches, insertions, deletions, and range queries can be performed efficiently even for very large datasets.


Main Content

1. Structure of B+ Tree

Internal nodes and leaf nodes

A B+ tree consists of two types of nodes. Internal nodes act as routing nodes that guide the search path, while leaf nodes contain the actual data entries or pointers to records. Internal nodes do not store the complete records, only the keys needed to direct the search.

Ordered leaf level with links

All leaf nodes are arranged in sorted order and connected through linked pointers. This makes B+ trees very efficient for sequential retrieval and range searches. For example, if you need all values from 20 to 50, once the first relevant leaf is found, the rest can be read by following leaf links without restarting the search from the root.

Example structure:

                [30 | 60]
              /      |      \
          [10 | 20] [40 | 50] [70 | 80]
          /   |   \   /  |  \   /  |   \
        L1  L2  L3  L4  L5  L6  L7  L8  L9

Leaf nodes are linked in sorted order:
L1 -> L2 -> L3 -> L4 -> L5 -> L6 -> L7 -> L8 -> L9

In real usage, the leaf nodes would contain actual keys and record pointers, such as:

Leaf: [10, 12, 15] -> [20, 25, 29] -> [30, 35, 39] -> ...

2. Properties of B+ Tree

Balanced height

Every root-to-leaf path has the same length, so the tree remains balanced after insertions and deletions. This guarantees predictable performance.

High branching factor

A node can have many children, so the tree height is usually very small compared to binary trees. Because disk I/O is expensive, fewer levels mean fewer disk accesses and much faster operations.

All data at leaf level

Since only leaf nodes store actual records, searching for an exact key always ends at the leaf level. This simplifies indexing and makes the leaf level the main data layer.

Dynamic growth and shrinkage

The tree adjusts automatically when keys are inserted or removed. Nodes may split, merge, or redistribute keys while maintaining balance and the B+ tree rules.

For example, if a leaf node becomes full during insertion, it is split into two nodes, and a separator key is pushed up to the parent. If deletion causes underflow, keys may be borrowed or nodes may merge.


3. Comparison with B-tree

Data placement difference

In a B-tree, keys and data may be stored in both internal and leaf nodes. In a B+ tree, data is stored only at the leaf level, while internal nodes store only keys.

Search and range query efficiency

B+ trees are often faster for range queries because the leaf nodes are linked. Once the search reaches the first key in the range, the remaining keys can be accessed sequentially. In a B-tree, such sequential traversal is less direct.

Higher fan-out in B+ tree

Because internal nodes in B+ trees store only keys and pointers, they can usually hold more keys than B-tree nodes of the same size. This reduces tree height and improves performance, especially in disk-based storage systems.

Simple comparison:

B-tree:

- Data can appear in internal nodes
- Search may finish at internal node
- Range traversal is less direct

B+ tree:

- Data only in leaf nodes
- Search always ends at leaf
- Leaf nodes are linked for fast range access

A B+ tree is generally preferred in database indexing because of its efficient ordered traversal and strong performance with large datasets.


Working / Process

1. Search operation

  • Start from the root node.
  • Compare the search key with the keys in the internal node.
  • Move to the appropriate child pointer based on the key range.
  • Continue until a leaf node is reached.
  • In the leaf, search for the exact key.
  • If the leaf nodes are linked, range searching becomes easier after the first match.

2. Insertion operation

  • Find the correct leaf node where the new key should be placed.
  • Insert the key in sorted order within the leaf.
  • If the leaf has space, the insertion ends.
  • If the leaf is full, split the leaf into two nodes.
  • Promote the separator key to the parent node.
  • If the parent overflows, it may also split.
  • Repeat the process upward if necessary, possibly creating a new root.

3. Deletion operation

  • Locate the key in the appropriate leaf node.
  • Remove the key from the leaf.
  • If the node still satisfies the minimum occupancy requirement, stop.
  • If the node underflows, borrow a key from a sibling if possible.
  • If borrowing is not possible, merge with a sibling.
  • Update the parent separator keys as required.
  • Continue adjusting upward if the parent also underflows.

Example of insertion split:

Suppose a leaf can hold 3 keys and we insert 40 into:

[10 | 20 | 30]

After insertion:

[10 | 20 | 30 | 40]   -> overflow

Split into:

[10 | 20] -> [30 | 40]

Then a separator key, often 30, is pushed into the parent to guide future searches.


Advantages / Applications

Efficient searching and indexing

B+ trees provide fast key lookup because the tree is balanced and has small height. This is especially useful for large databases where millions of records must be accessed quickly.

Excellent for range queries and sorted data access

The linked leaf nodes allow quick sequential access to records in sorted order. This makes B+ trees ideal for queries like WHERE marks BETWEEN 60 AND 80 or for generating ordered reports.

Widely used in real systems

B+ trees are heavily used in database management systems, file systems, and storage engines. They are common in indexing structures because they minimize disk I/O and support both random and sequential access efficiently.


Summary

  • A B+ tree is a balanced tree used for fast searching and indexing.
  • Internal nodes guide the search, while leaf nodes store the actual data.
  • Leaf nodes are linked together for quick range and sequential access.
  • Important terms to remember: root node, internal node, leaf node, separator key, split, merge, underflow, overflow