Hashing techniques

Definition

Hashing is the process of transforming a key or input message into a fixed-length value using a hash function. The resulting value is called a hash, hash code, or digest, and it is typically used to locate data in a hash table or to verify data integrity.

A good hashing technique aims to:

Produce the same hash for the same input every time
Distribute values uniformly across the available range
Minimize collisions, where different inputs produce the same hash value
Allow fast insertion, searching, and deletion in data structures

For example, if a student ID is used as a key, hashing may convert it into an index in an array so that the student record can be stored and accessed quickly.

Main Content

1. Hash Function

A hash function is the core of any hashing technique. It takes an input key such as a number, string, or record and converts it into a numerical value, usually within a fixed range.
The quality of the hash function strongly affects performance. A good hash function should be deterministic, fast to compute, and capable of distributing keys evenly to reduce clustering and collisions.

A simple example of a hash function for integers is:

h(key) = key mod table_size

If the table size is 10:

h(23) = 23 mod 10 = 3
h(45) = 45 mod 10 = 5
h(33) = 33 mod 10 = 3

Here, 23 and 33 map to the same location, which leads to a collision.

A hash function used in practice may be much more sophisticated. For strings, it may combine character codes using multiplication, addition, and modulus operations to create a more uniform distribution.

Important characteristics of a good hash function:

It should be easy and quick to compute
It should spread keys evenly
It should reduce the chance of two different keys mapping to the same index
It should be stable so that the same key always gives the same result

2. Hash Table

A hash table is a data structure that stores key-value pairs using a hash function to compute the index where each item should be stored.
It provides very fast access in average cases, often close to constant time for searching, inserting, and deleting.

A hash table usually contains:

An array of fixed size
A hash function to compute array positions
A strategy to handle collisions

Example of storage:

If the table size is 7 and the keys are 14, 21, and 28, using key mod 7:

14 mod 7 = 0
21 mod 7 = 0
28 mod 7 = 0

All three keys map to index 0, so collision handling becomes necessary.

A hash table can store:

Simple values, such as names or numbers
Complex records, such as employee details, student marks, or database rows

Why hash tables are powerful:

They provide fast lookups
They are suitable for large data sets
They support efficient membership testing, such as checking whether an item exists

Simple structure:

Index:   0   1   2   3   4   5   6
        -------------------------
Table:  |   |   |   |   |   |   |
        -------------------------

After inserting some values:

Index:   0   1   2   3   4   5   6
        -------------------------
Table:  |14 |   |   |   |   |   |
        -------------------------

3. Collision Handling

A collision occurs when two or more different keys produce the same hash value.
Collision handling is essential because collisions are unavoidable in real systems, especially when the number of keys is larger than the number of available table positions.

There are several common collision handling methods:

a) Chaining

In chaining, each table location stores a list, linked list, or bucket of all items that hash to the same index.

Example: If keys 14, 21, and 28 all hash to index 0, the table entry at index 0 stores all three items in a chain.

Index 0: 14 -> 21 -> 28

Advantages of chaining:

Simple to implement
Works well even when the table is moderately full
Easy to insert new items

b) Open Addressing

In open addressing, when a collision occurs, the algorithm searches for another empty slot in the table according to a probing sequence.

Common probing methods include:

Linear probing

: check the next slot one by one

Quadratic probing

: use a quadratic function to determine the next slot

Double hashing

: use a second hash function to compute the next step

Example of linear probing: If index 3 is occupied, check 4, then 5, then 6, and so on.

Advantages of open addressing:

Uses only the hash table array
Avoids extra linked structures
Can be memory efficient

Disadvantages:

Performance can degrade as the table becomes full
Can cause clustering, especially with linear probing

c) Rehashing / Resizing

When collisions become too frequent or the load factor becomes too high, the table can be expanded and all existing keys are inserted again using a new table size.

This improves distribution and performance but requires extra time during resizing.

Working / Process

1. Choose a hash function and table size

Select a suitable hash function based on the type of data, and decide the size of the hash table. The table size is often chosen as a prime number to improve distribution and reduce clustering.

2. Convert the key into an index

Apply the hash function to the key to get an index in the hash table. For example, if the key is 52 and the table size is 10, then 52 mod 10 = 2.

3. Store or retrieve the item and handle collisions if needed

If the computed slot is empty, insert the item there. If another item is already present, apply the collision handling method such as chaining or probing. For searching, use the same hash function and collision strategy to find the item efficiently.

Example workflow:

Insert key 23 into a table of size 10
Compute 23 mod 10 = 3
Place 23 at index 3
Insert key 33
Compute 33 mod 10 = 3
Since index 3 is occupied, handle the collision using chaining or probing

For a chaining-based table:

Index 3: 23 -> 33

For open addressing with linear probing:

Put 23 at index 3
If 33 also hashes to 3 and index 4 is empty, place 33 at index 4

Advantages / Applications

Hashing provides very fast average-case operations for search, insertion, and deletion, making it highly efficient for large data sets.
It is widely used in dictionaries, symbol tables, caches, database indexing, password storage systems, digital signatures, and file integrity checking.
Hashing helps reduce the time complexity of many tasks from linear time to near constant time in practical use, especially when the hash function is well designed and collisions are managed properly.

Additional major applications include:

Password verification

: storing hashed passwords instead of plain text

Data integrity checking

: detecting whether files have been altered

Database indexing

: quickly locating rows using key-based hashing

Compiler design

: storing identifiers in symbol tables

Caching

: quickly retrieving previously stored results

Distributed systems

: assigning data across servers using consistent hashing

Why hashing is valuable:

Fast access
Efficient memory use in many cases
Useful for exact-match queries
Strong support for security-related tasks when cryptographic hashes are used

Summary

Hashing converts data into a fixed value for fast storage and retrieval.
A hash table uses a hash function to place and find items efficiently.
Collisions are handled using methods like chaining, linear probing, quadratic probing, and double hashing.
Important terms to remember: hash function, hash table, collision, chaining, open addressing, probing, load factor, rehashing, digest, index