Applied Statistics: Curve fitting by the method of least squares - fitting of straight lines



Definition

Curve fitting by the method of least squares is the statistical technique of determining the line or curve that best fits a set of data points by minimizing the sum of the squares of the deviations (errors) between the observed values and the values estimated by the model.

For fitting a straight line, the equation is usually written as:

$$y = a + bx$$

where:

  • $x$ = independent variable
  • $y$ = dependent variable
  • $a$ = intercept
  • $b$ = slope

The best-fit values of $a$ and $b$ are obtained so that the sum of squared vertical deviations of the observed points from the line is as small as possible.


Main Content

1. First Concept: Straight Line Fitting

  • A straight line fit is used when the relationship between two variables is nearly linear.
  • The general form of the line is:

$$y = a + bx$$

Here:

  • $a$ tells where the line cuts the $y$-axis.
  • $b$ tells how much $y$ changes when $x$ increases by one unit.

A straight line is suitable when the data points show a roughly upward or downward trend. For example, if the number of hours studied increases and exam marks also increase in a fairly regular pattern, a straight-line model may describe the data well.

Example:
Suppose we have the following data:

x y
1 2
2 3
3 5
4 4

These points may not lie exactly on one line, but a straight line can be fitted to represent the overall trend.

A rough visual idea:

y
|
|         * 
|      *     *
|   *
|________________ x

The goal is not to force every point onto the line, but to find the line that best represents the pattern of all points together.


2. Second Concept: Least Squares Principle

  • The least squares principle says that the best fitting line is the one that minimizes the sum of the squares of the errors.
  • The error for each point is the vertical distance between the observed value and the predicted value from the line.

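If the observed value is $y$ and the estimated value is $\hat{y}$, then the error is:

$$e = y - \hat{y}$$

For a straight line:

$$\hat{y} = a + bx$$

So the error becomes:

$$e = y - (a + bx)$$

The least squares method minimizes:

$$S = \sum e^2 = \sum (y - a - bx)^2$$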

We square the errors because:

  • positive and negative errors should not cancel each other out,
  • larger errors should be penalized more strongly,
  • squaring gives a smooth mathematical function that can be optimized.

Idea behind the method:
Among all possible straight lines, choose the one for which the total squared error is smallest.

This is why the method is called least squares.
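
To make the principle concrete, the short sketch below computes the total squared error for a few candidate lines on the four data points used in these notes. The function name `sum_squared_error` and the first two candidate lines are illustrative choices, not part of the original notes; the third candidate, y = 1.5 + 0.8x, is the least squares line for this data.

```python
# Data points from the example table in these notes.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

def sum_squared_error(a, b, xs, ys):
    """Total of squared vertical deviations from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Candidate lines as (intercept a, slope b); the first two are arbitrary.
candidates = {
    "y = 1 + x": (1.0, 1.0),
    "y = 2 + 0.5x": (2.0, 0.5),
    "y = 1.5 + 0.8x": (1.5, 0.8),
}

for label, (a, b) in candidates.items():
    print(f"{label}: S = {sum_squared_error(a, b, xs, ys):.2f}")
```

Of the three candidates, y = 1.5 + 0.8x gives the smallest total (1.80, against 2.00 and 2.50), which is what the least squares principle selects.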


3. Third Concept: Normal Equations and Calculation of the Best-Fit Line

  • To find the values of $a$ and $b$, we differentiate the sum of squares with respect to these parameters and set the results equal to zero.
  • This gives the normal equations:

$$\sum y = na + b\sum x$$

$$\sum xy = a\sum x + b\sum x^2$$

where:

  • $n$ = number of observations
  • $\sum y$ = sum of all observed $y$ values
  • $\sum x$ = sum of all $x$ values
  • $\sum xy$ = sum of the products of $x$ and $y$
  • $\sum x^2$ = sum of squares of $x$

These two equations are solved simultaneously to obtain $a$ and $b$.
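
The differentiation step can be written out as follows. This is the standard derivation for the line y = a + bx; the symbol S for the sum of squared errors is used here for convenience.

```latex
S = \sum (y - a - bx)^2

\frac{\partial S}{\partial a} = -2 \sum (y - a - bx) = 0
  \quad\Longrightarrow\quad \sum y = na + b \sum x

\frac{\partial S}{\partial b} = -2 \sum x\,(y - a - bx) = 0
  \quad\Longrightarrow\quad \sum xy = a \sum x + b \sum x^2

% Solving the two normal equations simultaneously:
b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2},
\qquad
a = \frac{\sum y - b \sum x}{n}
```

The closed-form expressions in the last step are valid whenever the x values are not all equal, since otherwise the denominator is zero.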

Example calculation:
Suppose data are:

x y x^2 xy
1 2 1 2
2 3 4 6
3 5 9 15
4 4 16 16

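Now compute the totals:

$$n = 4,\quad \sum x = 10,\quad \sum y = 14,\quad \sum x^2 = 30,\quad \sum xy = 39$$

Normal equations:

$$14 = 4a + 10b$$

$$39 = 10a + 30b$$

Solving these gives $b = 0.8$ and $a = 1.5$, so the fitted line is $y = 1.5 + 0.8x$. Once $a$ and $b$ are found, the fitted line can be written and used for prediction.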
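
The same calculation can be sketched in Python. The function name `fit_line` is a hypothetical helper introduced for illustration; it builds the totals from the data and solves the two normal equations by elimination.

```python
# Data from the example table above.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Normal equations:
    #   sum_y  = n*a     + b*sum_x
    #   sum_xy = a*sum_x + b*sum_xx
    # Eliminating a gives the slope; back-substitution gives the intercept.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 1.50, b = 0.80
```

For this data the totals are 10, 14, 30 and 39, so the slope is (4*39 - 10*14) / (4*30 - 10^2) = 16/20 = 0.8 and the intercept is (14 - 0.8*10) / 4 = 1.5.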

A fitted line may look like this:

y
|
|         * 
|      *      *
|    /---------
|  *
|________________ x

The line passes through the general center of the data points; a least squares line always passes through the point of means $(\bar{x}, \bar{y})$.


Working / Process

1. Collect and organize the data

  • Identify the independent variable $x$ and dependent variable $y$.
  • Arrange the observations in tabular form.
  • Compute extra columns if needed, such as $x^2$ and $xy$.

2. Form the normal equations

  • Use the least squares formulas:

$$\sum y = na + b\sum x$$

$$\sum xy = a\sum x + b\sum x^2$$

  • Substitute the totals from the data into these equations.

3. Solve for the line and interpret it

  • Find $a$ and $b$ by solving the two equations.
  • Write the fitted line as $y = a + bx$.
  • Use the line for prediction, trend analysis, or decision-making.
  • Check whether the line fits the data reasonably well by comparing actual and estimated values.
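
The final check, comparing actual and estimated values, might look like the sketch below. It assumes the example data from these notes and the least squares values a = 1.5 and b = 0.8 for that data; the helper name `predict` and the prediction at x = 5 are hypothetical illustrations.

```python
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
a, b = 1.5, 0.8  # least squares intercept and slope for this data

def predict(x, a, b):
    """Estimated y on the line y = a + b*x."""
    return a + b * x

# Residual = actual value minus estimated value, one per observation.
residuals = [y - predict(x, a, b) for x, y in zip(xs, ys)]

print(" x  actual  estimated  residual")
for x, y, r in zip(xs, ys, residuals):
    print(f"{x:2d}  {y:6.1f}  {predict(x, a, b):9.1f}  {r:8.1f}")

# Hypothetical prediction just beyond the observed range of x.
print(f"predicted y at x = 5: {predict(5, a, b):.1f}")
```

Small residuals with no obvious pattern suggest the straight line is a reasonable description of the data. Predictions outside the observed range of x, such as x = 5 here, should be treated with more caution than those inside it.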

Advantages / Applications

  • It gives a simple and practical way to model relationships between variables.
  • It is useful for prediction and forecasting in business, economics, and science.
  • It provides an objective best-fit line by minimizing total squared error.
  • It is widely used in sales forecasting, demand analysis, production planning, engineering measurements, and scientific experiments.
  • It helps identify trends, such as growth, decline, or stability in data.
  • It forms the basis for more advanced statistical techniques like regression analysis.

Summary

  • The least squares method fits the best straight line by minimizing squared errors.
  • The fitted line is usually written as $y = a + bx$.
  • The normal equations are used to find the intercept and slope.
  • Important terms to remember: least squares, regression line, intercept, slope, residual, normal equations.