Indexing

Definition

An index is a data structure that stores key values from one or more columns of a table along with pointers to the actual rows, allowing faster access to records without scanning the entire table.

In simple words, indexing creates a shortcut to data. Instead of checking each record one by one, the database uses the index to jump quickly to the required location.

Example:
If a student database has millions of records and we want to find a student by roll number, an index on the roll number column allows the database to find the student much faster than a full table scan.

Main Content

1. Basic Concept of Indexing

Purpose of indexing

The primary purpose of indexing is to reduce the time required to search, retrieve, and sort data. Without indexing, the database may need to perform a full table scan, which means checking every row until the required data is found. This becomes inefficient when the table is very large.

How indexing works conceptually

An index is like a separate lookup table that contains selected column values and references to the actual data records. It organizes these values in a way that supports quick searching, often using tree-based structures such as B-trees or hash-based structures.

Example:
If a book has an index at the end, and you want information on “indexing,” you do not read the whole book; you go directly to the relevant page number. Similarly, a database index guides the system directly to the required rows.

2. Types of Indexing

Primary indexing

A primary index is created on a primary key or a unique key. Since primary key values are unique and usually stored in sorted order, this type of indexing makes record retrieval very efficient. It is commonly used in tables where each record must be uniquely identified.

Secondary indexing

A secondary index is created on a non-primary key column. This is useful when queries frequently search based on columns other than the primary key, such as name, department, city, or date. Since these values may repeat, the index often needs to handle multiple matching records.

Clustering and non-clustering indexing

In clustering indexing, the physical order of records in the table is based on the index key, which improves range queries and sequential access. In non-clustering indexing, the physical order of the table is different from the index order, so the index contains pointers to data locations.

Example:
A school database may use a primary index on StudentID, a secondary index on Department, and a clustered index on AdmissionDate if records are stored by admission order.

3. Index Structures and Properties

B-tree and B+ tree indexing

B-tree and B+ tree structures are commonly used because they keep data balanced and allow fast search, insertion, and deletion. In these structures, the tree remains organized even when data changes, which ensures predictable performance.

Hash indexing

Hash indexing uses a hash function to convert a search key into a location where the data is stored. This is very fast for exact-match queries, such as finding a user by ID, but it is not efficient for range queries like “find all students between roll numbers 100 and 200.”

Dense and sparse indexes

In a dense index, every search key value has an index entry. In a sparse index, only some key values are stored, usually one entry per block or page. Dense indexes provide faster lookup but use more space, while sparse indexes use less space but may require extra searching.

ASCII diagram for index lookup:

Index:
A  -> Block 1
C  -> Block 3
F  -> Block 5
K  -> Block 8

Data Blocks:
Block 1: A, A, B
Block 3: C, D, E
Block 5: F, G, H
Block 8: K, L, M

This shows how the index points to the correct block instead of scanning all blocks.

Working / Process

1. Create the index on selected column(s)

The database administrator or database engine chooses one or more columns that are frequently used in search conditions, joins, or sorting. Then an index structure is created for those columns. For example, creating an index on CustomerName helps queries that search customers by name.

2. Store key values with pointers to records

Once the index is created, the database stores the chosen column values in sorted or organized form, along with pointers that tell where the actual records are located in the table. These pointers may refer to rows, pages, or blocks depending on the storage system.

3. Use the index during query execution

When a query is executed, the database checks whether an index can help. If suitable, it searches the index first, finds the relevant pointer, and then accesses the required record directly. This avoids reading the full table and saves time, especially in large datasets.

Example process:
Suppose a query asks for EmployeeID = 205. The database looks up 205 in the index, finds the pointer, and retrieves the exact row immediately instead of searching every employee record.

Advantages / Applications

Faster data retrieval

Indexing significantly reduces the time needed to find records, especially in large databases. This is the most important advantage because it improves overall query performance.

Efficient searching, sorting, and joining

Indexes help with search conditions in WHERE clauses, ordering results using ORDER BY, and combining tables using joins. They are especially useful in systems with frequent read operations.

Useful in real-world applications

Indexing is widely used in banking systems, e-commerce websites, student databases, library management systems, and social media platforms where large volumes of data must be searched quickly.

Example applications:

Searching a product by product ID in online shopping
Finding a patient record in a hospital database
Retrieving a book by ISBN in a library system
Locating an employee by email in an HR system

Summary

Indexing is a method used to speed up data retrieval in databases.
It works by creating a structured shortcut to records.
It is helpful for searching, sorting, and joining large amounts of data.
Important terms to remember: index, primary index, secondary index, clustered index, non-clustered index, dense index, sparse index, B-tree, B+ tree, hash index