Basic Statistics: Measures of Central Tendency
Definition
Measures of central tendency are statistical measures that identify a central or representative value in a dataset. They summarize a collection of values by indicating the point around which the data tend to concentrate.
The main measures of central tendency are:
- Mean: the average of all values
- Median: the middle value when data are arranged in order
- Mode: the most frequently occurring value
These measures provide a concise description of the data and are used to compare datasets, interpret patterns, and support decision-making.
Main Content
1. Mean
The mean is the arithmetic average of a set of numbers. It is obtained by adding all the values and dividing by the total number of values.
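Formula: Mean = (sum of all values) ÷ (number of values). In symbols, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the values and $n$ is the number of values.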
Example:
For the data set: 4, 6, 8, 10, 12
- Sum = 4 + 6 + 8 + 10 + 12 = 40
- Number of values = 5
- Mean = 40 ÷ 5 = 8
So, the mean is 8.
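The same calculation can be sketched in Python. This is a minimal illustration, not a prescribed method; the function name `mean` and the variable names are chosen here for the example.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


data = [4, 6, 8, 10, 12]
total = sum(data)    # 40
count = len(data)    # 5
result = mean(data)  # 40 / 5 = 8.0

print(f"Sum = {total}, Number of values = {count}, Mean = {result}")
```

Division with `/` always returns a float in Python 3, so the result is shown as 8.0 rather than 8.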
Important features of the mean:
- It uses every value in the dataset.
- It is easy to calculate and understand.
- It is highly affected by extremely large or extremely small values, called outliers.
Example of sensitivity to outliers:
Data: 5, 6, 7, 8, 100
- Mean = (5 + 6 + 7 + 8 + 100) ÷ 5 = 126 ÷ 5 = 25.2
Although most values are between 5 and 8, the mean becomes 25.2 because of the outlier 100. This shows that the mean may not always represent the “typical” value well when data are skewed.
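One way to see the effect of the outlier is to compute the mean with and without it. The sketch below does this for the same data; removing the value 100 is done here only for illustration and is not a recommendation to delete outliers from real data.

```python
data = [5, 6, 7, 8, 100]

# Illustration only: drop the extreme value to see how much it moves the mean.
without_outlier = [x for x in data if x != 100]

mean_with = sum(data) / len(data)                           # 126 / 5
mean_without = sum(without_outlier) / len(without_outlier)  # 26 / 4

print(f"Mean with the outlier:    {mean_with:.1f}")
print(f"Mean without the outlier: {mean_without:.1f}")
```

A single value moves the mean from 6.5 to 25.2, which is larger than four of the five observations.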
Where the mean is useful:
- In exam scores, income averages, temperature readings, and scientific measurements.
- When data are numerical and not heavily distorted by outliers.
2. Median
The median is the middle value of a dataset when the values are arranged in ascending or descending order. It divides the dataset into two equal halves.
How to find the median:
- Arrange the data in order.
- If the number of observations is odd, the median is the middle value.
- If the number of observations is even, the median is the average of the two middle values.
Example 1: Odd number of values
Data: 3, 5, 7, 9, 11
- Ordered data: 3, 5, 7, 9, 11
- Middle value = 7
Median = 7
Example 2: Even number of values
Data: 4, 6, 8, 10
- Ordered data: 4, 6, 8, 10
- Two middle values = 6 and 8
- Median = (6 + 8) ÷ 2 = 7
Median = 7
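The odd and even cases can be written as a short Python function. This is a sketch of the procedure described above; the name `median` and the variable names are assumptions made for the example.

```python
def median(values):
    """Return the median: the middle value of the sorted data,
    or the average of the two middle values when the count is even."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)  # step 1: arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values


odd_result = median([3, 5, 7, 9, 11])  # middle value is 7
even_result = median([4, 6, 8, 10])    # (6 + 8) / 2 = 7.0

print(odd_result, even_result)
```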
Important features of the median:
- It is not affected much by outliers.
- It is especially useful for skewed data.
- It can be used with ordinal data, where values have a meaningful order but not necessarily equal spacing.
Example showing robustness:
Data: 5, 6, 7, 8, 100
- Ordered data: 5, 6, 7, 8, 100
- Median = 7
Here, the median better represents the center of the data than the mean because it is not distorted by the extreme value 100.
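The comparison can be checked with Python's standard `statistics` module, which provides `mean` and `median` functions. The sketch below applies both to the same data.

```python
import statistics

data = [5, 6, 7, 8, 100]

data_mean = statistics.mean(data)      # pulled upward by the value 100
data_median = statistics.median(data)  # the middle value of the ordered data

print(f"Mean   = {data_mean}")
print(f"Median = {data_median}")
```

The median stays at 7, among the bulk of the values, while the mean is about 25.2.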
Where the median is useful:
- In house prices, income distribution, and other skewed datasets.
- When data contain outliers.
- When the middle position matters more than the arithmetic average.
Visual idea of the median:
Data in order:
5   6   7   8   100
        ↑
      Median
The arrow points to the middle value, which divides the data into two parts.
3. Mode
The mode is the value that occurs most frequently in a dataset. A dataset may have one mode, more than one mode, or no mode at all.
Types of mode:
- Unimodal: one mode
- Bimodal: two modes
- Multimodal: more than two modes
- No mode: no value repeats
Example 1: One mode
Data: 2, 4, 4, 6, 8
- The value 4 appears most often.
Mode = 4
Example 2: Two modes
Data: 1, 2, 2, 3, 3, 4
- Both 2 and 3 occur most frequently.
Mode = 2 and 3
Example 3: No mode
Data: 1, 2, 3, 4, 5
- No value repeats.
Mode = none (the dataset has no mode)
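The three examples can be reproduced with a small Python function built on `collections.Counter`. This is a sketch; the function name `modes` is hypothetical, and returning an empty list when no value repeats is a choice made here to match the "no mode" convention used in this text.

```python
from collections import Counter


def modes(values):
    """Return a list of the most frequent values.
    Returns an empty list when no value repeats (the "no mode" case)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs exactly once
    return [value for value, count in counts.items() if count == highest]


one_mode = modes([2, 4, 4, 6, 8])      # [4]
two_modes = modes([1, 2, 2, 3, 3, 4])  # [2, 3]
no_mode = modes([1, 2, 3, 4, 5])       # []

print(one_mode, two_modes, no_mode)
```

Note that the standard library function `statistics.multimode` follows a different convention: when every value occurs once, it returns all of the values rather than an empty list.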
Important features of the mode:
- It can be used for both numerical and categorical (qualitative) data.
- It identifies the most common or popular value.
- It may not be unique.
- It may not represent the center well if the data are widely spread or irregular.
Where the mode is useful:
- In market research, such as the most preferred product size or color.
- In fashion, transportation, and consumer behavior studies.
- With categorical or discrete data, like the most common blood group or shoe size.
Example with categorical data:
If 20 students choose favorite colors and 8 choose blue, 5 choose red, 4 choose green, and 3 choose yellow, then the mode is blue because it occurs most often.
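The mode of categorical data can be found by counting how often each category occurs. The sketch below rebuilds the 20 responses from the counts given above and uses `collections.Counter` to find the most common one; the variable names are chosen for the example.

```python
from collections import Counter

# Rebuild the 20 responses from the counts in the example.
votes = ["blue"] * 8 + ["red"] * 5 + ["green"] * 4 + ["yellow"] * 3

counts = Counter(votes)
mode_color, mode_count = counts.most_common(1)[0]

print(f"Responses: {len(votes)}")
print(f"Mode: {mode_color} (chosen by {mode_count} students)")
```

No mean or median can be computed here, because colors have no numerical value or natural order.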
Working / Process
1. Collect the data and organize it clearly
- Gather all observations from the sample or population.
- Check whether the data are numerical or categorical.
- Arrange numerical data in ascending order when needed, especially for finding the median.
- Example: Data = 12, 7, 9, 15, 7, 10; in ascending order: 7, 7, 9, 10, 12, 15
2. Choose the appropriate measure of central tendency
- Use the mean when data are fairly symmetric and there are no extreme values.
- Use the median when the data are skewed or contain outliers.
- Use the mode when you need the most frequent value or when data are categorical.
- Example: For house prices, the median is often better than the mean because very expensive houses can distort the average.
3. Calculate and interpret the result
- Mean: add all values and divide by the number of values.
- Median: locate the middle value after ordering the data.
- Mode: identify the most frequent value.
- Then interpret what the result means in context.
- Example: If the median income is $50,000, it means half the people earn less than that and half earn more.
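The steps above can be sketched end to end in Python for the sample data 12, 7, 9, 15, 7, 10, using the standard `statistics` module (`multimode` requires Python 3.8 or later). The variable names are chosen for the example.

```python
import statistics

# Step 1: collect and organize the data.
data = [12, 7, 9, 15, 7, 10]
ordered = sorted(data)  # [7, 7, 9, 10, 12, 15]

# Step 3: calculate each measure.
data_mean = statistics.mean(data)        # 60 / 6 = 10
data_median = statistics.median(data)    # (9 + 10) / 2 = 9.5
data_modes = statistics.multimode(data)  # [7], since 7 occurs twice

print(f"Ordered data: {ordered}")
print(f"Mean = {data_mean}, Median = {data_median}, Mode = {data_modes}")
```

For this dataset the mean (10) and the median (9.5) are close, which suggests there are no strong outliers, so either could serve as the summary value.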
Advantages / Applications
Provides a simple summary of data
- Measures of central tendency reduce a large set of values into one meaningful number, making the data easier to understand and communicate.
Useful for comparison
- They help compare different groups, such as comparing average exam scores between two classes or median salaries across industries.
Widely used in real-life decision-making
- Businesses use them to study sales trends.
- Schools use them to assess student performance.
- Governments use them to analyze income, inflation, and population data.
- Researchers use them to describe experimental results.
Summary
- Measures of central tendency show the center of a dataset.
- The main measures are mean, median, and mode.
- Important terms to remember: mean, median, mode, outlier, skewed data, frequency.