Chi-square distribution

Definition

The Chi-Square ($\chi^2$) distribution is a fundamental continuous probability distribution used extensively in inferential statistics. If $Z_1, Z_2, \dots, Z_k$ are independent standard normal random variables, then the sum of their squares, $Q = \sum_{i=1}^{k} Z_i^2$, follows a Chi-Square distribution with $k$ degrees of freedom, written $Q \sim \chi^2_k$. Because it is built from squared terms, it takes only non-negative values. Its most common use is in tests of goodness of fit, independence, and homogeneity for categorical data.
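The definition can be checked by simulation. The sketch below is a minimal illustration that uses only Python's standard library: it draws $k$ independent standard normal values with `random.gauss`, squares and sums them, and repeats this many times. The helper names `sample_chi_square` and `simulate`, the seed, and the number of draws are illustrative choices, not part of any library.

```python
import random
import statistics


def sample_chi_square(k, rng):
    """Return one draw of Z_1^2 + ... + Z_k^2 with independent Z_i ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def simulate(k, n_draws=100_000, seed=0):
    """Hypothetical helper: n_draws simulated chi-square values with k df."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sample_chi_square(k, rng) for _ in range(n_draws)]


if __name__ == "__main__":
    k = 4
    draws = simulate(k)
    print("min  :", min(draws))                  # never negative: a sum of squares
    print("mean :", statistics.fmean(draws))     # should be close to k
    print("var  :", statistics.variance(draws))  # should be close to 2k
```

With enough draws, the sample mean settles near $k$ and the sample variance near $2k$, which are the theoretical mean and variance of $\chi^2_k$.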


Main Content

1. The Concept of Degrees of Freedom

  • The shape of the Chi-Square distribution depends entirely on the degrees of freedom ($df$), denoted by $k$.
  • As $df$ increases, the distribution becomes more symmetrical and approaches a normal distribution: for large $k$, $\chi^2_k$ is approximately normal with mean $k$ and variance $2k$.
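The trend toward symmetry can be quantified with standard formulas: a $\chi^2_k$ variable has mean $k$, variance $2k$, and skewness $\sqrt{8/k}$. Since the skewness shrinks toward 0 as $k$ grows, the curve becomes more symmetric. The short sketch below tabulates these values; `chi_square_moments` is a hypothetical helper name.

```python
import math


def chi_square_moments(k):
    """Return (mean, variance, skewness) of a chi-square distribution with k df."""
    if k <= 0:
        raise ValueError("degrees of freedom must be positive")
    return k, 2 * k, math.sqrt(8 / k)


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 30, 100):
        mean, var, skew = chi_square_moments(k)
        # For large k the distribution is roughly Normal(mean=k, sd=sqrt(2k)).
        print(f"df={k:>3}  mean={mean:>3}  variance={var:>3}  "
              f"sd={math.sqrt(var):6.2f}  skewness={skew:.3f}")
```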

2. Properties of the Distribution

  • The curve is always skewed to the right (positively skewed), although the skew becomes less pronounced as $df$ increases.
  • The total area under the curve equals 1, as for any probability density.
  • Chi-Square values are always $\ge 0$, because they are sums of squared (non-negative) terms.
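These properties can be checked numerically from the density. For $k$ degrees of freedom the chi-square density is $f(x) = \dfrac{x^{k/2-1}e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$ for $x > 0$ and $0$ for $x < 0$. The sketch below evaluates it with `math.lgamma` and integrates it with Simpson's rule. The cutoff of 200 and the step count are arbitrary choices that work for small $k \ge 2$ (for $k = 1$ the density is infinite at 0, so this simple rule does not apply), and `chi_square_pdf` and `simpson` are hypothetical helper names.

```python
import math


def chi_square_pdf(x, k):
    """Chi-square density with k degrees of freedom (0 for x < 0)."""
    if x < 0:
        return 0.0
    if x == 0:
        # Value at the origin: infinite for k = 1, 1/2 for k = 2, 0 for k > 2.
        if k == 1:
            return math.inf
        return 0.5 if k == 2 else 0.0
    log_f = (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    return math.exp(log_f)


def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


if __name__ == "__main__":
    for k in (2, 5, 10):
        area = simpson(lambda x: chi_square_pdf(x, k), 0.0, 200.0)
        print(f"df={k:>2}: area under the curve ~ {area:.6f}")
```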

3. Visualizing the Shape

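  • The degrees of freedom change the location of the peak and the length of the right tail.
  • For $df = 1$ and $df = 2$, the density is highest at $\chi^2 = 0$ and decreases from there; for $df = 2$ it is $f(x) = \tfrac{1}{2}e^{-x/2}$.
  • For $df \ge 3$, the density starts at 0, rises to a single peak at $\chi^2 = df - 2$ (for example, at 3 when $df = 5$), and then tails off slowly to the right.
  • As $df$ grows, the peak moves to the right and the curve becomes lower and wider.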
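A rough text-mode picture of the density for $df = 2$ and $df = 5$ can be printed directly from the formula. The sketch below uses only the standard library; `chi_square_pdf`, `numerical_mode`, and `text_sketch` are hypothetical helper names, and the bar scale is an arbitrary choice. It prints one bar per integer $x$ and locates each peak by a grid search, which for $df \ge 2$ should agree with the mode $\max(df - 2, 0)$.

```python
import math


def chi_square_pdf(x, k):
    """Chi-square density for k >= 2 degrees of freedom (0 for x < 0)."""
    if x < 0:
        return 0.0
    if x == 0:
        return 0.5 if k == 2 else 0.0  # value at the origin when k >= 2
    return math.exp((k / 2 - 1) * math.log(x) - x / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))


def numerical_mode(k, upper=50.0, step=0.001):
    """Grid-search the x in [0, upper] where the density is largest."""
    grid = (i * step for i in range(int(upper / step) + 1))
    return max(grid, key=lambda x: chi_square_pdf(x, k))


def text_sketch(k, x_max=12, width=60):
    """One '#' bar per integer x, scaled so that a density of 0.5 fills `width`."""
    lines = []
    for x in range(x_max + 1):
        bar = "#" * round(chi_square_pdf(float(x), k) / 0.5 * width)
        lines.append(f"{x:>3} | {bar}")
    return lines


if __name__ == "__main__":
    for k in (2, 5):
        print(f"df = {k}  (peak near x = {numerical_mode(k):.3f})")
        print("\n".join(text_sketch(k)))
        print()
```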

Working / Process

1. Formulate Hypotheses

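  • State the Null Hypothesis ($H_0$): the data are consistent with the expected frequencies, so any difference between observed and expected counts is due to chance (for example, the sample follows the hypothesized distribution, or the two variables are independent).
  • State the Alternative Hypothesis ($H_1$): the observed frequencies differ from the expected ones by more than chance would explain (the hypothesized distribution does not fit, or the variables are associated).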

2. Calculate the Chi-Square Statistic

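  • Use the formula $\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$, summing over every category (or every cell of a contingency table).
  • $O_i$ is the observed frequency and $E_i$ is the expected frequency under $H_0$. In a goodness-of-fit test, $E_i = n p_i$, where $n$ is the total count and $p_i$ is the hypothesized proportion for category $i$; in a contingency table, $E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}$.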
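As a worked example of the formula, suppose (hypothetically) that a die is rolled 120 times and the six faces appear 18, 22, 20, 25, 15 and 20 times. If the die is fair, every expected frequency is $E_i = 120/6 = 20$, so

$\chi^2 = \frac{(18-20)^2 + (22-20)^2 + (20-20)^2 + (25-20)^2 + (15-20)^2 + (20-20)^2}{20} = \frac{58}{20} = 2.9$.

The Python sketch below performs the same computation; `chi_square_statistic` is a hypothetical helper name and the counts are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = [18, 22, 20, 25, 15, 20]   # hypothetical die-roll counts
    n = sum(observed)                     # 120 rolls in total
    expected = [n / 6] * 6                # fair die: E_i = n * (1/6)
    print(chi_square_statistic(observed, expected))  # 2.9
```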

3. Determine Significance and Conclusion

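  • Compare the calculated $\chi^2$ value with the critical value from the Chi-Square distribution table for the appropriate $df$ and the chosen significance level (e.g., $\alpha = 0.05$). For a goodness-of-fit test with $m$ categories, $df = m - 1$ (minus one more for each parameter estimated from the data); for an $r \times c$ contingency table, $df = (r - 1)(c - 1)$.
  • If the calculated $\chi^2$ exceeds the critical value, reject $H_0$; otherwise, fail to reject it. The test is upper-tailed because only large values of $\chi^2$ indicate a poor match between observed and expected frequencies.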
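When a printed table is not at hand, the critical value and the p-value can be computed from the chi-square CDF. The sketch below is a minimal standard-library version under these assumptions: the CDF is evaluated with the power-series form of the regularized lower incomplete gamma function $P(df/2,\, x/2)$, and the critical value is found by bisection. It is meant for moderate values of $x$ and $df$; the helper names are hypothetical, and in practice a statistics library would normally be used instead.

```python
import math


def chi_square_cdf(x, df):
    """P(X <= x) for X ~ chi-square(df), via the series for P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    s, t = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    # lower incomplete gamma: t^s e^{-t} * sum_{n>=0} t^n / (s (s+1) ... (s+n))
    while term > total * 1e-16:
        term *= t / (s + n)
        total += term
        n += 1
    return min(1.0, math.exp(s * math.log(t) - t - math.lgamma(s)) * total)


def chi_square_critical_value(alpha, df, tol=1e-10):
    """Upper-tail critical value c with P(X <= c) = 1 - alpha, by bisection."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0
    while chi_square_cdf(hi, df) < 1.0 - alpha:  # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi_square_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def decide(statistic, df, alpha=0.05):
    """Return (critical value, p-value, decision) for an upper-tail test."""
    critical = chi_square_critical_value(alpha, df)
    p_value = 1.0 - chi_square_cdf(statistic, df)
    decision = "reject H0" if statistic > critical else "fail to reject H0"
    return critical, p_value, decision


if __name__ == "__main__":
    # Hypothetical example: a statistic of 2.9 with df = 5 at alpha = 0.05.
    critical, p_value, decision = decide(2.9, df=5)
    print(f"critical value = {critical:.3f}, p-value = {p_value:.3f}: {decision}")
```

Comparing the statistic with the critical value and comparing the p-value with $\alpha$ lead to the same decision.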

Advantages / Applications

  • Goodness of Fit Test: Used to determine whether sample frequencies match a hypothesized population distribution.
  • Test of Independence: Used to test whether two categorical variables are associated (e.g., gender and preference for a brand).
  • Homogeneity Test: Used to determine if different populations have the same distribution of a specific variable.
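The sketch below applies the test of independence to a hypothetical 2 × 2 table of gender against brand preference (the counts are made up for illustration). Each expected count is row total × column total / $n$, and the degrees of freedom are $(r - 1)(c - 1)$. The function `independence_test` is a hypothetical helper that computes the plain Pearson statistic without any continuity correction, and the caller supplies the critical value, here the tabulated 3.841 for $df = 1$ at $\alpha = 0.05$. Some statistical packages apply Yates' continuity correction to 2 × 2 tables by default, so their statistic can differ from this one.

```python
def independence_test(table, critical_value):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, df, expected, reject); reject is True when the
    statistic exceeds the supplied critical value.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected, statistic > critical_value


if __name__ == "__main__":
    # Hypothetical counts: rows = gender, columns = preferred brand (A, B).
    table = [[30, 20],
             [20, 30]]
    stat, df, expected, reject = independence_test(table, critical_value=3.841)
    print(f"chi-square = {stat:.3f}, df = {df}, expected = {expected}")
    print("reject H0 (variables associated)" if reject else "fail to reject H0")
```

Here every expected count is 25, each cell contributes $(5^2)/25 = 1$, and the statistic is 4.0, which exceeds 3.841, so $H_0$ of independence would be rejected at the 5% level.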

Summary

The Chi-Square distribution underlies statistical tests that analyze categorical data by comparing observed frequencies with expected frequencies. Its shape is determined by the degrees of freedom, and it is central to hypothesis tests of goodness of fit, independence, and homogeneity.

  • Key Point 1: It is non-negative and right-skewed.
  • Key Point 2: Its shape changes based on the degrees of freedom.
  • Key Point 3: It is widely used in tests of independence and goodness of fit.
  • Important terms to remember: Degrees of Freedom ($df$), Observed Frequency ($O$), Expected Frequency ($E$), Null Hypothesis ($H_0$), and Critical Value.