Heap Files and Sorted files

Comprehensive study notes, diagrams, and exam preparation for Heap Files and Sorted files.

Heap Files and Sorted Files

Definition

A heap file is a file organization in which records are placed wherever space is available, with no guaranteed order of insertion or sorting.

A sorted file is a file organization in which records are physically stored in sorted order according to a particular attribute, usually a key field such as Roll Number, Employee ID, or Account Number.


Main Content

1. Heap File Organization

No ordering constraint

  • In a heap file, records are stored in the order they arrive, or wherever free space is found. There is no need to maintain sorted sequence.

Efficient insertion

  • Since the system does not need to locate the correct sorted position, insertion is usually very fast and simple.

A heap file is the simplest form of file organization in a database. When a new record is inserted, it is appended to the file or placed in an available empty space in a data page. Because there is no sorting overhead, heap files are widely used in situations where the database receives frequent inserts.

How records are stored

Records may be stored in:

  • contiguous pages when space is available,
  • linked pages with overflow handling,
  • pages with free space lists maintained by the DBMS.

For example, suppose a student database stores records as they come:

Roll No Name Age
105 Asha 20
101 Ravi 21
110 Meena 19

In a heap file, these records may appear in exactly this order, even though the roll numbers are not sorted.

Characteristics of heap files

  • Records are unordered.
  • New records can be added quickly.
  • Searching may require a full scan unless an index exists.
  • Deletion is usually handled by marking the record as deleted or freeing its slot.
  • Updates are easy if the record size does not change significantly.

When heap files are useful

Heap files are ideal when:

  • insert operations are very frequent,
  • the application does not require records to be in any specific order,
  • the system relies on indexes for faster searching rather than physical order.

2. Sorted File Organization

Records stored in order

  • In a sorted file, records are arranged according to a key field, such as student roll number or account number.

Fast key-based retrieval

  • Since records are in sorted order, searching by the sort key can be done more efficiently, often using binary search on pages.

Sorted files are designed to make retrieval efficient for queries on the sorted attribute. For example, if a payroll file is sorted by Employee ID, looking up one employee or a range of employees becomes much faster than scanning an unordered file.

Example of sorted storage

Suppose the following records exist:

Roll No Name Age
101 Ravi 21
105 Asha 20
110 Meena 19

If sorted by Roll No, they are physically stored in ascending order. This is very helpful for:

  • exact-match search,
  • range queries,
  • ordered output such as reports.

Characteristics of sorted files

  • Records are maintained in sorted sequence.
  • Searching is faster than in heap files if the search key matches the sort key.
  • Insertions are slower because records may need to be shifted to maintain order.
  • Deletions can create gaps or require reorganization.
  • Range queries are very efficient.

When sorted files are useful

Sorted files are suitable when:

  • the database frequently searches by the same key,
  • range queries are common,
  • ordered output is needed directly from storage.

3. Comparison Between Heap Files and Sorted Files

Storage order

  • Heap files have no particular order, while sorted files maintain a strict sequence.

Performance trade-off

  • Heap files favor fast insertions; sorted files favor fast searches and range queries.

The main difference between these two file organizations is the balance between insertion speed and search efficiency.

Heap file vs sorted file

Feature Heap File Sorted File
Record order Unordered Ordered by key
Insertion Fast Slow
Search Usually slow without index Faster
Range query Inefficient Efficient
Update cost Low to moderate Can be high
Deletion Simple May need reorganization
Best use case Frequent inserts Frequent searches and ordered retrieval

Practical trade-offs

  • If a university admissions system inserts thousands of new records daily, heap files are efficient.
  • If a bank needs frequent balance and account number lookups, sorted files can improve retrieval.
  • In real DBMSs, these file organizations are often combined with indexing structures for better overall performance.

Diagram for record organization

Heap file:

Page 1:  [105:Asha] [110:Meena]
Page 2:  [101:Ravi] [120:Karan]
Page 3:  [108:Neha] [115:Ali]

Sorted file:

Page 1:  [101:Ravi] [105:Asha]
Page 2:  [108:Neha] [110:Meena]
Page 3:  [115:Ali]  [120:Karan]

This shows that the heap file stores records in no special order, while the sorted file preserves ascending order.


Working / Process

1. Record insertion

  • In a heap file, the DBMS checks for free space in existing pages or creates a new page and places the record there.
  • In a sorted file, the DBMS first finds the correct position for the new record based on the sorting key.
  • If necessary, records are shifted to preserve order.

2. Record retrieval

  • In a heap file, the system may need to scan records one by one unless there is an index.
  • In a sorted file, the system can use efficient search techniques because records are arranged in order.
  • Range retrieval is especially efficient in sorted files because nearby values are stored together.

3. Record deletion and update

  • In a heap file, deletion often means marking a record as deleted or reclaiming its slot.
  • In a sorted file, deleting a record may leave a gap that must be reused later or cleaned through file reorganization.
  • Updates in sorted files can be costly if the key value changes, because the record may need to move to a new position.

Advantages / Applications

Heap files are easy and fast for insertion

  • Since no order must be preserved, data can be written quickly.
  • This is useful in log files, temporary data storage, and high-insert workloads.

Sorted files are efficient for ordered searching

  • Searching by the sort key is faster, and range queries are highly efficient.
  • This makes them useful in reports, ledgers, and systems where records are often accessed in sequence.

Both file types support different database needs

  • Heap files are best when insert speed matters most.
  • Sorted files are best when retrieval by key or ordered output matters most.
  • Database designers choose based on workload patterns and performance requirements.

Summary

  • Heap files store records without any order, making inserts simple and fast.
  • Sorted files store records in key order, making searches and range access more efficient.
  • The choice between them depends on whether the system needs faster insertion or faster retrieval.
  • Important terms to remember: heap file, sorted file, record, key field, insertion, search, range query