Properties and Lossless Join Decomposition

Definition

A decomposition of a relation schema $R$ is the process of splitting $R$ into two or more smaller relation schemas $R_1, R_2, \dots, R_n$ such that the attributes of the smaller schemas together form the attributes of the original schema.

A decomposition is called lossless join decomposition if, for every valid relation instance of $R$ , joining the decomposed relations always produces exactly the original relation, with no loss of information and no extra tuples.

In simple words:

Lossless join

means: $R = R_1 \bowtie R_2 \bowtie \cdots \bowtie R_n$ after decomposition and natural join.
If the join produces extra rows not present in the original relation, the decomposition is lossy.

Main Content

1. First Concept

Properties of Decomposition

Attribute Preservation

: Every attribute of the original relation must appear in at least one of the decomposed relations. If any attribute is missing, the original data cannot be reconstructed properly.

Dependency Preservation

: A good decomposition should preserve the functional dependencies of the original relation so that constraints can be checked without joining relations repeatedly. This reduces complexity and improves efficiency.

Lossless Join Property

: The most important property of decomposition is that when the decomposed relations are joined, they should reproduce the original relation exactly. This avoids spurious tuples and data inconsistency.

Redundancy Reduction

: Decomposition should reduce duplication of data. By splitting a relation into well-designed smaller relations, repeated values are minimized.

Anomaly Prevention

: Proper decomposition helps eliminate insertion, deletion, and update anomalies. For example, if a course and student details are stored in one table, deleting a student row may accidentally remove course information; decomposition avoids this.

Example:

Suppose we have a relation:

$R(StudentID, StudentName, CourseID, CourseName)$

This relation may have redundancy because CourseName repeats for many students enrolled in the same course.

A better decomposition could be:

$R_1(StudentID, StudentName, CourseID)$
$R_2(CourseID, CourseName)$

Here, course details are separated from student details, reducing redundancy.

2. Second Concept

Lossless Join Decomposition

Meaning

: A decomposition is lossless if no information is lost when relations are joined back together. The natural join of decomposed relations must produce exactly the original relation.

Why It Matters

: If decomposition is lossy, joining the tables may create unwanted combinations of tuples. These are called spurious tuples and can lead to wrong query results.

Binary Decomposition Condition

: For decomposition of $R$ into $R_1$ and $R_2$ , the decomposition is lossless if: $(R_1 \cap R_2) \rightarrow R_1 \quad \text{or} \quad (R_1 \cap R_2) \rightarrow R_2$ under the given set of functional dependencies.

This means the common attributes between the two relations must functionally determine one of the relations.

Spurious Tuples Example

: Let:

$R(A, B, C)$

and suppose we decompose it into:

$R_1(A, B)$
$R_2(B, C)$

If B does not determine either A or C, then joining $R_1$ and $R_2$ may create extra tuples.

Example instance of $R$ :

A	B	C
1	x	5
2	x	6

Decomposed relations:

$R_1(A,B)$

A	B
1	x
2	x

$R_2(B,C)$

B	C
x	5
x	6

Natural join gives:

A	B	C
1	x	5
1	x	6
2	x	5
2	x	6

The extra tuples $(1,x,6)$ and $(2,x,5)$ are spurious, so this decomposition is lossy.

ASCII view of the issue:

Original relation R(A,B,C)
      |
      | decomposition
      v
   R1(A,B)      R2(B,C)
      \          /
       \        /
        \      /
       natural join
            |
            v
   Should equal original relation exactly

3. Third Concept

Testing Lossless Join Decomposition

Using Functional Dependencies

: To test whether a decomposition is lossless, we examine the functional dependencies (FDs) that hold on the relation.

For Two Relations

: If $R$ is decomposed into $R_1$ and $R_2$ , and their common attributes $R_1 \cap R_2$ functionally determine either $R_1$ or $R_2$ , then the decomposition is lossless.

For More Than Two Relations

: The test becomes more complex. A common method uses the chase procedure, which checks whether a decomposition preserves the original information by manipulating a table based on the FDs.

Chase Procedure Idea

: Start with a table representing attributes and decomposed relations. Apply the FDs repeatedly. If a row becomes identical to the original full row, the decomposition is lossless.

Practical Importance

: This test is essential during normalization to ensure that splitting tables does not damage the ability to reconstruct the original data.

Example of Binary Lossless Condition:

Suppose:

$R(A, B, C)$
FDs: $A \rightarrow B$
Decomposition:
$R_1(A, B)$
$R_2(A, C)$

Here, the common attribute is A.

Since A → B, A determines all attributes of $R_1$ , so the decomposition is lossless.

If we join $R_1$ and $R_2$ , we get exactly the original tuples.

Working / Process

1. Identify the original relation and its functional dependencies

Determine the relation schema $R$ and list all the FDs that hold on it. These dependencies are essential for deciding whether a decomposition is valid and lossless.

2. Decompose the relation into smaller schemas

Split the large relation into two or more relations based on normalization goals such as reducing redundancy or removing anomalies. Ensure that all original attributes are included across the decomposed relations.

3. Check whether the decomposition is lossless

For a binary decomposition, verify whether the intersection of the two relations functionally determines one of them. For multiple relations, apply the chase test or equivalent method. If the join reproduces exactly the original relation, the decomposition is lossless.

Advantages / Applications

Prevents data loss during reconstruction

A lossless decomposition guarantees that the original information can be obtained exactly by joining the smaller relations.

Avoids spurious tuples

It ensures that no extra or false records are introduced when tables are joined.

Supports normalization and better design

Lossless decomposition is a key requirement in normalization, especially while designing schemas in 3NF and BCNF.

Reduces redundancy and anomalies

Proper decomposition helps minimize repeated data and avoids update, insertion, and deletion anomalies.

Improves database maintenance

Smaller, well-structured tables are easier to manage, query, and enforce constraints on.

Helps enforce integrity constraints

When dependency preservation is also achieved, constraints can be checked more efficiently without reconstructing the original relation.

Summary

Decomposition splits a relation into smaller relations for better design.
Lossless join means the original relation can be reconstructed exactly after joining.
The main goal is to avoid spurious tuples and preserve information.
Important terms to remember: lossless join, lossy decomposition, functional dependency, spurious tuple, natural join, decomposition, dependency preservation