Correlation and Regression – Rank Correlation
Definition
Rank correlation is a measure of association that indicates the strength and direction of the relationship between two variables based on their ranks instead of their actual values. The most common method is Spearman’s rank correlation coefficient, which shows how closely two ranked lists correspond to each other.
If the ranks of two variables are similar, the rank correlation is positive and close to +1. If the ranks are opposite, it is negative and close to -1. If there is no consistent pattern between the ranks, it is near 0.
Main Content
1. Meaning and Nature of Rank Correlation
- Rank correlation is used when data are given in order of preference, merit, size, importance, or performance rather than in exact measurements.
- It is a non-parametric method: it does not require the data to follow a normal distribution, only that the observations can be put in order.
Rank correlation is particularly important because real-world data are not always precise. For example:
- A teacher may rank students from 1st to 20th based on performance.
- A company may rank employees based on productivity.
- Consumers may rank brands according to preference.
In such cases, we are interested in whether the ranking by one method agrees with the ranking by another method.
Example:
Suppose five students are ranked by mathematics and science:
| Student | Math Rank | Science Rank |
|---|---|---|
| A | 1 | 2 |
| B | 2 | 1 |
| C | 3 | 3 |
| D | 4 | 5 |
| E | 5 | 4 |
These ranks are very close to each other, so the rank correlation is high and positive.
2. Spearman’s Rank Correlation Coefficient
- Spearman’s rank correlation coefficient, usually denoted by ρ (rho) or $r_s$, is the most widely used measure of rank correlation.
- It measures the degree to which the relationship between two variables can be described using a monotonic function, meaning that as one rank increases, the other tends to increase or decrease consistently.
The formula is:

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

Where:
- $d$ = difference between the ranks of each pair
- $d^2$ = square of the rank difference
- $n$ = number of paired observations
The value of $r_s$ lies between -1 and +1:
- +1 means perfect agreement in ranks
- 0 means no rank relationship
- -1 means perfect inverse ranking
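The formula can be sketched in a few lines of Python. This is a minimal illustration, assuming the inputs are already ranks with no ties; the function name `spearman_rs` is a hypothetical choice made for this example, not part of any library.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's r_s from two equal-length lists of ranks (assumes no ties)."""
    if len(rank_x) != len(rank_y):
        raise ValueError("rank lists must have the same length")
    n = len(rank_x)
    if n < 2:
        raise ValueError("need at least two paired observations")
    # Sum of squared rank differences, the "sum of d^2" term
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Math and science ranks of students A to E from the table in section 1
math_ranks = [1, 2, 3, 4, 5]
science_ranks = [2, 1, 3, 5, 4]
print(spearman_rs(math_ranks, science_ranks))  # 0.8
```

For the five students in section 1 the sum of squared differences is 4, which gives 0.8 and matches the description of that ranking as high and positive.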
Interpretation:
- If two judges rank contestants almost the same, the coefficient will be close to +1.
- If one judge ranks contestants in exactly the reverse order of the other, the coefficient will be -1.
Mini example:
If ranks of four objects are:
| Object | Rank X | Rank Y | d | d² |
|---|---|---|---|---|
| 1 | 1 | 2 | -1 | 1 |
| 2 | 2 | 1 | 1 | 1 |
| 3 | 3 | 4 | -1 | 1 |
| 4 | 4 | 3 | 1 | 1 |
Here $\sum d^2 = 4$ and $n = 4$, so $r_s = 1 - \frac{6 \times 4}{4(16 - 1)} = 1 - \frac{24}{60} = 0.6$. This indicates a moderate positive rank relationship.
3. Ranking, Ties, and Interpretation of Results
- In many datasets, two or more values may be equal. These are called ties, and they must be handled carefully while assigning ranks.
- When ties occur, each tied value is usually assigned the average of the ranks the tied values would otherwise have occupied.
Example of ties:
If two students tie for 2nd and 3rd position, both are given rank $(2 + 3)/2 = 2.5$.
Why ties matter:
- Ties affect the computation of rank differences.
- In such cases, the simple Spearman formula may need correction.
- Ignoring ties can lead to inaccurate results.
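The averaging rule for ties can be sketched as follows. This is an illustrative helper under stated assumptions: the name `average_ranks`, the `highest_first` parameter, and the sample marks are all hypothetical and chosen only for this example.

```python
def average_ranks(values, highest_first=True):
    """Rank values (1 = first place), giving tied values the mean of the positions they span."""
    # Indices of the values in ranking order
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest_first)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions are 1-based, so the group spans start+1 .. end+1
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


scores = [90, 85, 85, 70]  # hypothetical marks; two students tie for 2nd and 3rd
print(average_ranks(scores))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied students share rank 2.5, and the next student still receives rank 4, so the ranks keep the same total as they would have without ties.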
Interpretation of correlation values:
- 0.80 to 1.00: Very strong positive rank correlation
- 0.50 to 0.79: Moderate positive rank correlation
- 0.20 to 0.49: Weak positive rank correlation
- 0.00 to 0.19: Very weak or negligible correlation
- Negative values show opposite ranking directions
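The bands above can be expressed as a small lookup. This is only a sketch: the function name `describe_rs` is hypothetical, the band edges are treated as thresholds, and applying the same bands to negative values with the direction flipped is an assumption made for this example.

```python
def describe_rs(rs):
    """Map a Spearman coefficient to the verbal bands listed above."""
    if not -1 <= rs <= 1:
        raise ValueError("r_s must lie between -1 and +1")
    strength = abs(rs)
    if strength >= 0.80:
        label = "very strong"
    elif strength >= 0.50:
        label = "moderate"
    elif strength >= 0.20:
        label = "weak"
    else:
        return "very weak or negligible"
    # Assumption: negative values reuse the same bands with the direction flipped
    direction = "positive" if rs > 0 else "negative"
    return f"{label} {direction}"


print(describe_rs(0.6))    # moderate positive
print(describe_rs(-0.85))  # very strong negative
print(describe_rs(0.1))    # very weak or negligible
```

Such labels are a convention for reading results, not a property of the coefficient itself.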
Simple relation view:
High agreement: X ranks 1,2,3,4,5 Y ranks 1,2,3,4,5
No clear pattern: X ranks 1,2,3,4,5 Y ranks 3,1,5,2,4
Opposite order: X ranks 1,2,3,4,5 Y ranks 5,4,3,2,1
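As a rough check, the three patterns above can be put through the standard no-ties Spearman formula, $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, with $n = 5$ and $d$ taken as the X rank minus the Y rank:

$$\text{High agreement: } \sum d^2 = 0, \quad r_s = 1 - \frac{6 \times 0}{5(25 - 1)} = 1$$

$$\text{No clear pattern: } \sum d^2 = 4 + 1 + 4 + 4 + 1 = 14, \quad r_s = 1 - \frac{84}{120} = 0.3$$

$$\text{Opposite order: } \sum d^2 = 16 + 4 + 0 + 4 + 16 = 40, \quad r_s = 1 - \frac{240}{120} = -1$$

The middle pattern gives a weak positive value rather than exactly 0, well short of the agreement in the first pattern.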
Working / Process
1. Assign ranks to both variables
- Arrange each set of data in order.
- Give rank 1 to the highest or lowest value depending on the context, but use the same rule for both variables.
- If there are ties, assign average ranks.
2. Find the rank differences
- For each pair of observations, subtract one rank from the other: $d = R_X - R_Y$, where $R_X$ and $R_Y$ are the two ranks of the same observation.
- Square each difference: $d^2$
- Sum all the squared differences: $\sum d^2$
3. Apply the formula and interpret
- Substitute the values into Spearman’s formula: $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
- Interpret the result:
  - Near +1: strong agreement
  - Near 0: little or no rank relationship
  - Near -1: strong reverse relationship
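The three steps can be traced from raw values to the coefficient in a short script. This is a sketch under stated assumptions: the marks are hypothetical illustrative data, rank 1 is given to the highest value, and the helper `simple_ranks` (a name invented here) assumes no tied values.

```python
def simple_ranks(values):
    """Rank 1 = highest value. Assumes all values are distinct (no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


# Hypothetical marks of five candidates in two tests (illustrative data only)
test_a = [78, 65, 90, 52, 70]
test_b = [74, 60, 85, 58, 80]

# Step 1: assign ranks, using the same rule for both variables
ranks_a = simple_ranks(test_a)  # [2, 4, 1, 5, 3]
ranks_b = simple_ranks(test_b)  # [3, 4, 1, 5, 2]

# Step 2: rank differences, their squares, and the sum of squares
d = [ra - rb for ra, rb in zip(ranks_a, ranks_b)]  # [-1, 0, 0, 0, 1]
d_squared = [di ** 2 for di in d]
sum_d2 = sum(d_squared)  # 2

# Step 3: substitute into Spearman's formula
n = len(test_a)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rs)  # 0.9
```

A value of 0.9 is near +1, so the two tests rank these candidates in strong agreement.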
Worked Example
Suppose ranks of six candidates in two tests are:
| Candidate | Rank in Test A | Rank in Test B | d | d² |
|---|---|---|---|---|
| 1 | 1 | 2 | -1 | 1 |
| 2 | 2 | 1 | 1 | 1 |
| 3 | 3 | 3 | 0 | 0 |
| 4 | 4 | 5 | -1 | 1 |
| 5 | 5 | 4 | 1 | 1 |
| 6 | 6 | 6 | 0 | 0 |
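Here $\sum d^2 = 4$ and $n = 6$, so $r_s = 1 - \frac{6 \times 4}{6(36 - 1)} = 1 - \frac{24}{210} \approx 0.886$. This shows a very strong positive rank correlation.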
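The arithmetic for this table can be checked with exact fractions. This is a small verification sketch using Python's standard `fractions` module; the variable names are chosen for this example only.

```python
from fractions import Fraction

# Ranks of the six candidates from the table above
rank_a = [1, 2, 3, 4, 5, 6]
rank_b = [2, 1, 3, 5, 4, 6]

n = len(rank_a)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))

# Exact value of 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rs = 1 - Fraction(6 * sum_d2, n * (n ** 2 - 1))

print(sum_d2)               # 4
print(rs)                   # 31/35
print(round(float(rs), 4))  # 0.8857
```

Working in fractions avoids rounding until the final step, which helps when comparing hand calculations.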
Advantages / Applications
- Useful for ordinal data: Rank correlation works well when data are in ranks, preferences, or categories that have order but not exact measurements.
- Simple and practical: It is easy to understand and apply, especially in education, psychology, business, and sports ranking situations.
- Works without strict assumptions: Since it is non-parametric, it can be used when data do not meet the assumptions required for Pearson’s correlation.
Common applications:
- Comparing exam results from two different evaluators
- Measuring agreement between judges in competitions
- Studying customer preference rankings
- Comparing rankings of employees, products, or universities
- Analyzing relationships in survey responses where responses are ordered
Summary
- Rank correlation measures the relationship between two ranked lists.
- Spearman’s coefficient is the most common rank correlation method.
- It is useful for ordinal data and situations involving ties or ordered preferences.
- Important terms to remember: rank correlation, Spearman’s rank correlation coefficient, tied ranks, non-parametric data, positive correlation, negative correlation