Test for Ratio of Variances - Chi-square Test for Goodness of Fit and Independence of Attributes
Definition
A chi-square test is a statistical hypothesis test that uses the chi-square distribution to compare observed frequencies with expected frequencies, or to test whether two categorical variables are independent.
A test for ratio of variances is a hypothesis test used to compare the variances of two populations. When samples come from normal populations, the ratio of sample variances can be tested using the F-distribution, and the chi-square distribution is fundamental in deriving variance-related inference for a single population.
Main Content
1. Test for Ratio of Variances
Meaning and purpose
- The test for ratio of variances is used to determine whether two populations have the same variance, or whether one population is more variable than the other.
- Variance measures how spread out the data are. Comparing variances is important in quality control, experimental studies, and reliability analysis, and as a preliminary check before methods such as the pooled two-sample t-test, which assumes equal variances.
Hypotheses and test statistic
- The null hypothesis is usually:
H₀: σ₁² = σ₂²
and the alternative may be two-sided or one-sided:
H₁: σ₁² ≠ σ₂², σ₁² > σ₂², or σ₁² < σ₂²
- The test statistic is the ratio of sample variances:
F = s₁² / s₂²
- If the two samples are independent and both populations are normally distributed, this statistic follows an F-distribution with (n₁ - 1, n₂ - 1) degrees of freedom under the null hypothesis.
- In variance theory, the chi-square distribution is also central because for a normal population:
(n - 1)s² / σ² ~ χ²(df = n - 1)
This relationship is the foundation for variance estimation and testing for a single population variance.
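As a rough illustration of how this relationship is used, the sketch below computes the statistic (n - 1)s² / σ₀² for one sample. The measurements and the hypothesized variance σ₀² = 0.04 are made-up values chosen for the illustration, and the function name is hypothetical.

```python
import statistics

def single_variance_statistic(sample, sigma0_sq):
    """Return (chi_square, df) for H0: population variance equals sigma0_sq."""
    n = len(sample)
    s_sq = statistics.variance(sample)  # sample variance with divisor n - 1
    return (n - 1) * s_sq / sigma0_sq, n - 1

# Hypothetical measurements, assumed to come from a normal population.
sample = [10.2, 9.8, 10.1, 9.9, 10.4, 9.6]
chi_square, df = single_variance_statistic(sample, sigma0_sq=0.04)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The statistic is then compared with the chi-square distribution on n - 1 = 5 degrees of freedom, for example against the upper 5% table value of 11.070.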
Example:
Suppose two machine tools produce shafts. We want to know if their variability in shaft diameter is the same. If the sample variance from machine A is 4 and from machine B is 2, then the ratio of sample variances is F = 4/2 = 2. A formal F-test determines whether a ratio this far from 1 is statistically significant or could be due to random sampling variation alone; the answer depends on the two sample sizes through the degrees of freedom.
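The machine example can be sketched in a few lines. The samples below are hypothetical values in arbitrary units, picked only so that the sample variances come out as 4 and 2; the sample size of 5 per machine is likewise an assumption, since the example does not state one.

```python
import statistics

def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator) for F = s_a^2 / s_b^2."""
    s2_a = statistics.variance(sample_a)  # sample variance, divisor n - 1
    s2_b = statistics.variance(sample_b)
    return s2_a / s2_b, len(sample_a) - 1, len(sample_b) - 1

# Hypothetical shaft measurements (arbitrary units).
machine_a = [48.0, 52.0, 48.0, 52.0, 50.0]  # sample variance 4
machine_b = [48.0, 50.0, 52.0, 50.0, 50.0]  # sample variance 2

f_stat, df1, df2 = variance_ratio(machine_a, machine_b)
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

The sketch stops at the statistic: the critical value or p-value still has to come from an F table or a statistics library (SciPy's `scipy.stats.f.sf` is one option, not used here). With only 4 degrees of freedom on each side, standard tables give an upper 5% point of about 6.39, so a ratio of 2 would not be significant for samples this small.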
2. Chi-square Test for Goodness of Fit
Meaning and purpose
- The goodness of fit test checks whether the observed data match a theoretical or expected distribution.
- It is used when data are grouped into categories, such as colors, defect types, customer choices, or dice outcomes.
- The test asks: Do the observed frequencies differ from what we would expect under a stated model?
Hypotheses, formula, and interpretation
- Null hypothesis: H₀: The observed data follow the specified distribution
- Alternative hypothesis: H₁: The observed data do not follow the specified distribution
- The chi-square test statistic is:
χ² = Σ [(O - E)² / E]
where:
  - O = observed frequency
  - E = expected frequency
- Large values of χ² suggest a poor fit between observed and expected frequencies.
- Degrees of freedom are generally:
df = k - 1 - m
where:
  - k = number of categories
  - m = number of parameters estimated from the data
Example:
A fair die should produce each face about equally often. If 60 rolls give results close to 10 occurrences per face, the fit is good. If one face appears far more often than expected, the chi-square statistic becomes large, suggesting the die may not be fair.
Simple visual idea for goodness of fit:
Observed vs Expected
Face 1: O ██████ E ██████
Face 2: O █████ E ██████
Face 3: O ███████ E ██████
Face 4: O ██████ E ██████
Face 5: O █████ E ██████
Face 6: O ███████ E ██████
When observed bars are close to expected bars, the fit is good.
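A minimal sketch of the die example follows. The observed counts are invented for illustration (they are not read off the bars above), and the critical value is the standard table entry for the 5% level with 5 degrees of freedom.

```python
def chi_square_gof(observed, expected):
    """Return the chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 60 rolls of a die (faces 1 to 6).
observed = [8, 12, 9, 11, 10, 10]
n_rolls = sum(observed)
expected = [n_rolls / 6] * 6      # fair die: 10 expected per face

chi_square = chi_square_gof(observed, expected)
df = len(observed) - 1 - 0        # k - 1 - m, with m = 0 parameters estimated

CRITICAL_5PCT_DF5 = 11.070        # chi-square table value, 5% level, df = 5
print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject H0" if chi_square > CRITICAL_5PCT_DF5 else "do not reject H0")
```

For these counts the statistic is about 1.0, far below 11.070, so the data give no evidence against a fair die.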
3. Chi-square Test for Independence of Attributes
Meaning and purpose
- This test determines whether two categorical variables are independent or associated.
- It is widely used in surveys, social science, medicine, marketing, and biology.
- Examples include:
  - gender and preference for a product
  - smoking status and disease occurrence
  - education level and voting behavior
Contingency table and test procedure
- Data are arranged in a contingency table showing frequencies for combinations of categories.
- Null hypothesis: H₀: The attributes are independent
- Alternative hypothesis: H₁: The attributes are associated
- Expected frequency for a cell is calculated by:
E = (row total × column total) / grand total
- The test statistic is:
χ² = Σ [(O - E)² / E]
- Degrees of freedom:
df = (r - 1)(c - 1)
where:
  - r = number of rows
  - c = number of columns
Example:
A survey records whether students prefer online or offline classes and whether they are male or female. If the distribution of preferences differs significantly by gender, then the variables are not independent.
Contingency table example:
| | Online | Offline | Total |
|---|---|---|---|
| Male | O₁₁ | O₁₂ | R₁ |
| Female | O₂₁ | O₂₂ | R₂ |
| Total | C₁ | C₂ | N |
Expected frequency for the first cell:
E₁₁ = (R₁ × C₁) / N
If the observed and expected counts differ greatly across cells, the attributes may be related.
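The calculation for a contingency table can be sketched as below. The survey counts are hypothetical, and the function applies the plain chi-square formula with no continuity correction.

```python
def chi_square_independence(table):
    """Return (chi_square, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total x column total) / grand total, for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected

# Hypothetical survey counts: rows = Male, Female; columns = Online, Offline.
table = [[30, 20],
         [20, 30]]
chi_square, df, expected = chi_square_independence(table)
print(expected)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

Here every expected count is 25, the statistic is 4.0, and df = 1. Since 4.0 exceeds the 5% table value of 3.841 for one degree of freedom, these made-up counts would lead to rejecting independence at that level.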
Working / Process
1. State the hypothesis
- For ratio of variances, state H₀: σ₁² = σ₂² and choose a one-sided or two-sided alternative.
- For goodness of fit, specify the theoretical distribution that the data follow under H₀.
- For independence, name the two categorical variables and take independence as H₀.
2. Compute the test statistic
- For ratio of variances, calculate the sample variances and form their ratio.
- For goodness of fit, compute expected frequencies and use:
χ² = Σ [(O - E)² / E]
- For independence, build the contingency table, compute expected frequencies, and apply the same chi-square formula.
3. Make the decision
- Find the critical value from the chi-square or F distribution at the chosen significance level α, or compute the p-value.
- If the test statistic falls in the rejection region (equivalently, if the p-value is below α), reject the null hypothesis; otherwise, do not reject it.
- Conclude in simple language whether the variances differ, whether the fit is poor, or whether the attributes are associated.
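For the chi-square tests, the p-value route in step 3 can be sketched with the standard library alone. The function below uses the closed-form tail sums that hold for integer degrees of freedom; the name `chi2_sf`, the example statistic of 4.0 with df = 1, and the 5% significance level are choices made for this illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite sum in powers of chi = sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)   # chi^(2r-1) / (1*3*...*(2r-1))
        total += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total

alpha = 0.05                      # assumed significance level
p_value = chi2_sf(4.0, df=1)      # example statistic, not from real data
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

Comparing the p-value with α gives the same decision as comparing the statistic with the tabulated critical value; for instance, `chi2_sf(3.841, 1)` and `chi2_sf(11.070, 5)` both come out close to 0.05.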
Advantages / Applications
- Helps compare the spread of two populations and assess variability in real-world data.
- Useful for testing whether observed categorical data match a theoretical model such as fair dice, genetic ratios, or production standards.
- Widely applied in analyzing relationships between attributes in surveys, experiments, healthcare studies, and market research.
Summary
- The test for ratio of variances compares variability between populations.
- The chi-square test checks goodness of fit and independence of attributes.
- These methods are important for categorical data and variance analysis.
- Important terms to remember: variance, chi-square, observed frequency, expected frequency, contingency table, degrees of freedom, independence.