Test of Goodness of Fit

Definition

A Test of Goodness of Fit is a statistical hypothesis test used to determine how well an observed frequency distribution matches a theoretical or expected probability distribution. It evaluates whether the discrepancies between the observed frequencies and the expected frequencies can be attributed to chance (sampling variation) or whether the model itself is a poor fit for the population.


Main Content

1. The Chi-Square ($\chi^2$) Statistic

  • The chi-square test is the most common goodness-of-fit test. Its statistic sums, over all categories, the squared difference between the observed count ($O$) and the expected count ($E$), divided by the expected count.
  • If the observed counts closely match the expected counts, the $\chi^2$ value is small, suggesting a "good fit"; a large value signals a poor fit.

2. Null and Alternative Hypotheses

  • Null Hypothesis ($H_0$): The observed data follows the specified theoretical distribution (e.g., "The data fits a normal distribution").
  • Alternative Hypothesis ($H_1$): The observed data does not follow the specified theoretical distribution.

3. Degrees of Freedom (df)

  • This represents the number of independent values that can vary in the analysis.
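  • It is calculated as $df = k - 1 - m$, where $k$ is the number of categories and $m$ is the number of parameters estimated from the data. One degree of freedom is lost because the counts must sum to the sample size, and one more is lost for each estimated parameter.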
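
As a small illustration of the formula above, the following Python sketch computes the degrees of freedom. The function name `degrees_of_freedom` and the two scenarios are illustrative assumptions, not part of any library or of the original notes.

```python
def degrees_of_freedom(num_categories: int, num_estimated_params: int = 0) -> int:
    """Return df = k - 1 - m for a chi-square goodness-of-fit test."""
    df = num_categories - 1 - num_estimated_params
    if df < 1:
        # With no free categories left, the test cannot be carried out.
        raise ValueError("the test needs at least one degree of freedom")
    return df


# A fair six-sided die: 6 categories, no parameters estimated from the data.
print(degrees_of_freedom(6))      # 5
# A Poisson model over 7 count categories, with the mean estimated from the sample.
print(degrees_of_freedom(7, 1))   # 5
```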

Visual Representation of the Fit (Expected vs Observed):

      |
Exp   |   *   *   *
Obs   |  *  *   *  *
      +--------------
         A   B   C

If the observed points lie close to the expected points in every category, the "Goodness of Fit" is high.

Working / Process

1. Formulate Hypotheses and Define Expectations

  • State the Null Hypothesis ($H_0$) clearly, assuming the data follows a specific model.
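  • Calculate the expected frequency for each category from the model being tested: $E_i = n p_i$, where $n$ is the total number of observations and $p_i$ is the probability the model assigns to category $i$.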
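
To make this step concrete, here is a minimal Python sketch that turns model probabilities into expected frequencies by multiplying each probability by the total number of observations. The helper name `expected_counts`, the 9:3:3:1 ratio, and the sample size of 160 are assumptions chosen for illustration.

```python
def expected_counts(total: int, probabilities: list[float]) -> list[float]:
    """Expected frequency per category: total observations times model probability."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to 1")
    return [total * p for p in probabilities]


# Hypothetical model: a 9:3:3:1 ratio across four categories, 160 observations.
ratio = [9, 3, 3, 1]
probs = [r / sum(ratio) for r in ratio]
print(expected_counts(160, probs))  # [90.0, 30.0, 30.0, 10.0]
```

Note that expected frequencies need not be whole numbers, even though observed counts always are.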

2. Calculate the Chi-Square Statistic

  • Use the formula: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
  • Subtract the expected value from the observed value, square the result, divide by the expected value, and sum these values across all categories.
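
The calculation described above can be sketched in a few lines of Python. The function name `chi_square_statistic` and the die-roll counts are hypothetical and serve only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, so a fair die predicts 10 per face.
observed = [8, 12, 9, 11, 13, 7]
expected = [10] * 6
print(round(chi_square_statistic(observed, expected), 2))  # 2.8
```

Here the squared differences are 4, 4, 1, 1, 9, and 9; each is divided by 10 and the results sum to 2.8.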

3. Compare with Critical Value

  • Determine the significance level (usually $\alpha = 0.05$) and the Degrees of Freedom.
  • If the calculated $\chi^2$ exceeds the critical value from the Chi-Square distribution table, reject the Null Hypothesis; otherwise, fail to reject it (the data are consistent with the model).
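
The decision rule can be sketched as a table lookup. The dictionary below holds the standard upper-tail critical values at $\alpha = 0.05$ for 1 to 6 degrees of freedom; the names `CRITICAL_VALUES_05` and `reject_null`, and the two test statistics, are assumptions made for this example.

```python
# Upper-tail chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def reject_null(chi_square: float, df: int) -> bool:
    """Return True when the statistic exceeds the 5% critical value for this df."""
    if df not in CRITICAL_VALUES_05:
        raise KeyError(f"no tabulated critical value for df={df}")
    return chi_square > CRITICAL_VALUES_05[df]


print(reject_null(2.8, 5))    # False: 2.8 is below 11.070, so H0 is not rejected
print(reject_null(12.5, 5))   # True: 12.5 exceeds 11.070, so H0 is rejected
```

For other significance levels or larger df, the table would need to be extended from a printed chi-square table or a statistics library.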

Advantages / Applications

  • Used in genetics to determine if observed inheritance patterns match Mendelian ratios.
  • Useful in business to analyze if customer arrivals follow a Poisson distribution for staffing requirements.
  • Helps researchers validate if collected data fits a specific probability distribution before applying more complex parametric tests.
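
As an end-to-end sketch of the genetics application, the Python code below tests four phenotype counts against a 9:3:3:1 Mendelian ratio. The counts are the figures commonly quoted for Mendel's dihybrid pea cross and should be treated as illustrative input; the function name `goodness_of_fit` is an assumption, and 7.815 is the tabulated 5% critical value for 3 degrees of freedom.

```python
def goodness_of_fit(observed, probabilities, critical_value, estimated_params=0):
    """Return (chi-square statistic, degrees of freedom, reject H0?)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]          # E_i = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_params          # df = k - 1 - m
    return statistic, df, statistic > critical_value


# Illustrative counts for four phenotypes, tested against a 9:3:3:1 ratio.
observed = [315, 108, 101, 32]
probabilities = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

stat, df, reject = goodness_of_fit(observed, probabilities, critical_value=7.815)
print(round(stat, 2), df, reject)  # 0.47 3 False
```

The statistic is far below the critical value, so the null hypothesis of a 9:3:3:1 ratio is not rejected for these counts.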

Summary

  • The Test of Goodness of Fit assesses the compatibility between observed sample data and a theoretical probability model.
  • It relies on the Chi-Square statistic to quantify the discrepancy between observed and expected frequencies.
  • A Chi-Square value that exceeds the critical value for the chosen $\alpha$ and df leads to the rejection of the null hypothesis, implying the model does not fit the data.
  • Important terms: Observed Frequency ($O$), Expected Frequency ($E$), Significance Level ($\alpha$), and Degrees of Freedom (df).