Test of Goodness of Fit
Definition
A Test of Goodness of Fit is a statistical hypothesis test used to determine how well an observed frequency distribution matches a theoretical or expected probability distribution. It evaluates whether the discrepancies between the observed and expected frequencies can reasonably be attributed to chance or whether the model itself is a poor fit for the population.
Main Content
1. The Chi-Square ($\chi^2$) Statistic
- This is the most common goodness-of-fit test for categorical (count) data. For each category it takes the squared difference between the observed count ($O$) and the expected count ($E$), divides by the expected count, and sums the results.
- It follows the principle that if the observed data closely matches the expected data, the $\chi^2$ value will be small, suggesting a "good fit."
2. Null and Alternative Hypotheses
- Null Hypothesis ($H_0$): The observed data follows the specified theoretical distribution (e.g., "The data fits a normal distribution").
- Alternative Hypothesis ($H_1$): The observed data does not follow the specified theoretical distribution.
3. Degrees of Freedom (df)
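- This represents the number of category counts that remain free to vary once the total sample size and any estimated parameters are fixed.
- It is calculated as $df = k - 1 - m$, where $k$ is the number of categories and $m$ is the number of parameters estimated from the data ($m = 0$ when the distribution is fully specified in advance).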
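As a minimal sketch of the degrees-of-freedom formula above, the helper below applies $df = k - 1 - m$ to two assumed scenarios. The function name `degrees_of_freedom` and both scenarios are hypothetical, introduced here only for illustration.

```python
def degrees_of_freedom(num_categories: int, num_estimated_params: int = 0) -> int:
    """Return df = k - 1 - m for a chi-square goodness-of-fit test."""
    df = num_categories - 1 - num_estimated_params
    if df < 1:
        raise ValueError("the test needs at least one degree of freedom")
    return df


# Fair six-sided die: k = 6 categories, no parameters estimated (m = 0).
df_die = degrees_of_freedom(6)

# Poisson model with the mean estimated from the sample: k = 5, m = 1.
df_poisson = degrees_of_freedom(5, num_estimated_params=1)

print(df_die, df_poisson)  # prints: 5 3
```

Each estimated parameter costs one degree of freedom, which is why the Poisson case ends up with 3 rather than 4.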
Visual Representation of the Fit:
[Plot: expected vs. observed frequencies for categories A, B, and C]
The closer the observed points lie to the expected points, the better the fit.
Working / Process
1. Formulate Hypotheses and Define Expectations
- State the Null Hypothesis ($H_0$) clearly, assuming the data follows a specific model.
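- Calculate the expected frequency for each category under the model being tested: $E_i = n p_i$, where $n$ is the sample size and $p_i$ is the probability the model assigns to category $i$.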
2. Calculate the Chi-Square Statistic
- Use the formula: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
- Subtract the expected value from the observed value, square the result, divide by the expected value, and sum these values across all categories.
3. Compare with Critical Value
- Choose the significance level (usually $\alpha = 0.05$) and determine the Degrees of Freedom.
- If the calculated $\chi^2$ exceeds the critical value from the Chi-Square distribution table, reject the Null Hypothesis; otherwise, fail to reject it.
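The three steps above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not a definitive implementation: the die-roll counts are hypothetical, the names `chi_square_statistic` and `CRITICAL_VALUES_05` are introduced here, and the critical values are hard-coded from a standard Chi-Square table at $\alpha = 0.05$.

```python
# Critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table values).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: face counts from 60 rolls of a die.
observed = [8, 9, 12, 11, 6, 14]
n = sum(observed)

# Step 1: under H0 (fair die) each face has probability 1/6, so E_i = n / 6.
expected = [n / 6] * len(observed)

# Step 2: compute the statistic.
chi2 = chi_square_statistic(observed, expected)

# Step 3: compare with the critical value for df = k - 1 - m (m = 0 here).
df = len(observed) - 1
critical = CRITICAL_VALUES_05[df]
reject_h0 = chi2 > critical

print(f"chi2 = {chi2:.3f}, df = {df}, critical = {critical}, reject H0: {reject_h0}")
```

With these counts the statistic is about 4.2, well below the critical value of 11.070 for 5 degrees of freedom, so the hypothesis of a fair die is not rejected.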
Advantages / Applications
- Used in genetics to determine if observed inheritance patterns match Mendelian ratios.
- Useful in business to check whether customer arrivals follow a Poisson distribution, which in turn informs staffing decisions.
- Helps researchers check whether collected data fits a specific probability distribution before applying more complex parametric tests.
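To make the customer-arrivals application concrete, here is a hedged sketch of a Poisson goodness-of-fit check. All numbers are hypothetical: the interval counts, the sample mean `lam` (assumed to have been estimated from the raw data, so one degree of freedom is lost), and the pooling of the upper tail into a single "4 or more" category.

```python
import math

# Hypothetical data: number of intervals with 0, 1, 2, 3, and 4-or-more arrivals.
observed = [18, 32, 28, 14, 8]
n = sum(observed)

# Assumed sample mean of arrivals per interval, estimated from the raw data.
lam = 1.7


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Probabilities for 0..3 arrivals, with the remaining tail pooled into "4 or more".
probs = [poisson_pmf(k, lam) for k in range(4)]
probs.append(1.0 - sum(probs))

# Expected counts under the model, then the chi-square statistic.
expected = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = k - 1 - m with k = 5 categories and m = 1 estimated parameter.
df = len(observed) - 1 - 1
critical = 7.815  # standard table value for alpha = 0.05, df = 3
reject_h0 = chi2 > critical

print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

Pooling the tail keeps the category probabilities summing to 1. For these assumed counts the statistic falls below the critical value, so the Poisson model is not rejected.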
Summary
- The Test of Goodness of Fit assesses the compatibility between observed sample data and a theoretical probability model.
- It relies on the Chi-Square statistic to quantify the discrepancy between observed and expected frequencies.
- A Chi-Square value above the critical value leads to the rejection of the null hypothesis, implying the model does not fit the data well.
- Important terms: Observed Frequency ($O$), Expected Frequency ($E$), Significance Level ($\alpha$), and Degrees of Freedom (df).