Correlation and Regression – Rank Correlation

Definition

Rank correlation is a measure of association that indicates the strength and direction of the relationship between two variables based on their ranks instead of their actual values. The most common method is Spearman’s rank correlation coefficient, which shows how closely two ranked lists correspond to each other.

If the ranks of two variables are similar, the rank correlation is positive and close to +1. If the ranks are opposite, it is negative and close to -1. If there is no consistent pattern between the ranks, it is near 0.

Main Content

1. Meaning and Nature of Rank Correlation

Rank correlation is used when data are given in order of preference, merit, size, importance, or performance rather than in exact measurements.
It is a non-parametric method, which means it does not require the data to follow a normal distribution or satisfy strict mathematical assumptions.

Rank correlation is particularly important because real-world data are not always precise. For example:

A teacher may rank students from 1st to 20th based on performance.
A company may rank employees based on productivity.
Consumers may rank brands according to preference.

In such cases, we are interested in whether the ranking by one method agrees with the ranking by another method.

Example:

Suppose five students are ranked by mathematics and science:

Student	Math Rank	Science Rank
A	1	2
B	2	1
C	3	3
D	4	5
E	5	4

These ranks are very close to each other, so the rank correlation is high and positive.

2. Spearman’s Rank Correlation Coefficient

Spearman’s rank correlation coefficient, usually denoted by ρ (rho) or rs, is the most widely used measure of rank correlation.
It measures the degree to which the relationship between two variables can be described using a monotonic function, meaning that as one rank increases, the other tends to increase or decrease consistently.

The formula is:

$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$

Where:

$d$ = difference between the ranks of each pair
$d^2$ = square of the rank difference
$n$ = number of paired observations

The value of $r_s$ lies between -1 and +1:

+1 means perfect agreement in ranks
0 means no rank relationship
-1 means perfect inverse ranking

Interpretation:

If two judges rank contestants almost the same, the coefficient will be close to +1.
If one judge ranks contestants in exactly the reverse order of the other, the coefficient will be -1.
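
The reverse-order case can be checked directly from the formula. The short derivation below is a sketch that assumes no ties and labels the contestants $i = 1, 2, \dots, n$ by the first judge's rank, so the second judge gives contestant $i$ the rank $n + 1 - i$ (the index $i$ is introduced here only for illustration):

$d_i = i - (n + 1 - i) = 2i - (n + 1)$

$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left(2i - (n + 1)\right)^2 = \frac{n(n^2 - 1)}{3}$

$r_s = 1 - \frac{6}{n(n^2 - 1)} \cdot \frac{n(n^2 - 1)}{3} = 1 - 2 = -1$

For instance, with $n = 5$ the differences are $-4, -2, 0, 2, 4$, so $\sum d^2 = 40 = \frac{5(24)}{3}$.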

Mini example:

If ranks of four objects are:

Object	Rank X	Rank Y	d	d²
1	1	2	-1	1
2	2	1	1	1
3	3	4	-1	1
4	4	3	1	1

$\sum d^2 = 4,\quad n=4$

$r_s = 1 - \frac{6(4)}{4(4^2-1)} = 1 - \frac{24}{60} = 0.6$

This indicates a moderately strong positive rank relationship.
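
The same calculation can be sketched in Python. This is a minimal illustration, assuming both inputs are already untied ranks from 1 to n; the function name `spearman_from_ranks` is made up for this example and is not part of any library.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Simple Spearman formula; assumes both lists hold untied ranks 1..n."""
    if len(rank_x) != len(rank_y):
        raise ValueError("rank lists must have the same length")
    n = len(rank_x)
    if n < 2:
        raise ValueError("need at least two paired observations")
    # Sum of squared rank differences, i.e. sum(d^2)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Ranks from the four-object example above
r_s = spearman_from_ranks([1, 2, 3, 4], [2, 1, 4, 3])
print(round(r_s, 4))  # 0.6
```

Passing identical rank lists returns 1, and passing one list with its exact reverse returns -1, which matches the interpretation given earlier.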

3. Ranking, Ties, and Interpretation of Results

In many datasets, two or more values may be equal. These are called ties, and they must be handled carefully while assigning ranks.
When ties occur, each tied value is usually assigned the average of the ranks they would have occupied.

Example of ties:

If two students tie for 2nd and 3rd position, both may be given rank:

$\frac{2+3}{2} = 2.5$

Why ties matter:

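Ties affect the computation of rank differences, because tied items receive fractional ranks such as 2.5.
In such cases, the simple Spearman formula is only approximate. A tie correction is needed, or the coefficient can be computed as Pearson’s correlation applied to the average ranks.
Ignoring ties can lead to inaccurate results, especially when many values are tied.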
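
One common way to handle ties is to assign average ranks and then compute Pearson's correlation on those ranks. The sketch below illustrates that approach under stated assumptions: rank 1 goes to the highest value, the helper names (`average_ranks`, `pearson`, `spearman_with_ties`) are hypothetical, and the scores are invented for the example.

```python
from math import sqrt


def average_ranks(values, highest_first=True):
    """Rank values (1 = highest by default); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest_first)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # average of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman_with_ties(x, y):
    """Spearman coefficient computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: two students tie on 85 and share rank 2.5
scores = [90, 85, 85, 70]
print(average_ranks(scores))  # [1.0, 2.5, 2.5, 4.0]
```

When there are no ties, this route gives the same value as the simple formula, so it can serve as a general-purpose fallback.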

Interpretation of correlation values:

0.80 to 1.00: Very strong positive rank correlation
0.50 to 0.79: Moderate positive rank correlation
0.20 to 0.49: Weak positive rank correlation
0.00 to 0.19: Very weak or negligible correlation
Negative values show opposite ranking directions
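
These bands can be turned into a small lookup, sketched below. Two assumptions go beyond the list above: each band's lower limit is treated as a cutoff (so a value such as 0.795 falls in the moderate band), and negative coefficients are described by applying the same bands to their magnitude. The function name `describe_rank_correlation` is hypothetical.

```python
def describe_rank_correlation(r_s):
    """Map a coefficient to the verbal bands above (lower limits used as cutoffs)."""
    if not -1 <= r_s <= 1:
        raise ValueError("r_s must lie between -1 and +1")
    size = abs(r_s)
    if size >= 0.80:
        strength = "very strong"
    elif size >= 0.50:
        strength = "moderate"
    elif size >= 0.20:
        strength = "weak"
    else:
        return "very weak or negligible"
    # Assumption: negative values reuse the same bands, with the direction flipped
    direction = "positive" if r_s > 0 else "negative"
    return f"{strength} {direction}"


print(describe_rank_correlation(0.6))     # moderate positive
print(describe_rank_correlation(0.8857))  # very strong positive
print(describe_rank_correlation(-0.3))    # weak negative
```

Such labels are conventions rather than fixed rules, so the cutoffs may reasonably differ between textbooks.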

Simple relation view:

High agreement:    X ranks 1,2,3,4,5   Y ranks 1,2,3,4,5   ($r_s = 1$)
No clear pattern:   X ranks 1,2,3,4,5   Y ranks 3,1,5,2,4   ($\sum d^2 = 14$, $r_s = 0.3$, only weakly positive)
Opposite order:     X ranks 1,2,3,4,5   Y ranks 5,4,3,2,1   ($r_s = -1$)

Working / Process

1. Assign ranks to both variables

Arrange each set of data in order.
Give rank 1 to the highest or lowest value depending on the context, but use the same rule for both variables.
If there are ties, assign average ranks.

2. Find the rank differences

For each pair of observations, subtract one rank from the other: $d = R_x - R_y$
Square each difference: $d^2$
Sum all the squared differences: $\sum d^2$

3. Apply the formula and interpret

Substitute the values into Spearman’s formula: $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
Interpret the result:
- Near +1: strong agreement
- Near 0: little or no rank relationship
- Near -1: strong reverse relationship
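
The three steps can be sketched end to end in Python, starting from raw scores rather than ready-made ranks. This is an illustration only: it assumes no ties, gives rank 1 to the highest value, and uses invented marks; `spearman_steps` is a hypothetical name.

```python
def spearman_steps(x, y):
    """Follow the three steps for untied data; rank 1 goes to the highest value."""
    if len(x) != len(y):
        raise ValueError("both variables need the same number of observations")
    if len(x) < 2:
        raise ValueError("need at least two paired observations")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("ties present: assign average ranks instead")

    def rank(values):
        ordered = sorted(values, reverse=True)
        return [ordered.index(v) + 1 for v in values]

    # Step 1: assign ranks, using the same rule for both variables
    rx, ry = rank(x), rank(y)
    # Step 2: rank differences d = Rx - Ry and their squares
    d = [a - b for a, b in zip(rx, ry)]
    d2 = [v ** 2 for v in d]
    # Step 3: substitute into Spearman's formula
    n = len(x)
    r_s = 1 - 6 * sum(d2) / (n * (n ** 2 - 1))
    return rx, ry, d, d2, r_s


# Hypothetical marks for five students in two subjects
maths = [88, 75, 92, 60, 70]
science = [70, 82, 90, 58, 66]
rx, ry, d, d2, r_s = spearman_steps(maths, science)
print(rx)             # [2, 3, 1, 5, 4]
print(ry)             # [3, 2, 1, 5, 4]
print(d, d2)          # [-1, 1, 0, 0, 0] [1, 1, 0, 0, 0]
print(round(r_s, 4))  # 0.9
```

Returning the intermediate lists makes it easy to lay the result out as a table with columns for the ranks, d, and d², as in the worked example that follows.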

Worked Example

Suppose ranks of six candidates in two tests are:

Candidate	Rank in Test A	Rank in Test B	d	d²
1	1	2	-1	1
2	2	1	1	1
3	3	3	0	0
4	4	5	-1	1
5	5	4	1	1
6	6	6	0	0

$\sum d^2 = 4,\quad n = 6$

$r_s = 1 - \frac{6(4)}{6(6^2-1)} = 1 - \frac{24}{210} = 1 - 0.1143 = 0.8857$

This shows a very strong positive rank correlation.
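
As a quick arithmetic check, the same substitution can be done with exact fractions so that no rounding enters until the final step. This is just a sketch using the d² column from the table above.

```python
from fractions import Fraction

# Squared rank differences from the six-candidate table above
d_squared = [1, 1, 0, 1, 1, 0]
n = len(d_squared)

# Exact value of 1 - 6 * sum(d^2) / (n * (n^2 - 1))
r_s = 1 - Fraction(6 * sum(d_squared), n * (n ** 2 - 1))
print(r_s)                   # 31/35
print(round(float(r_s), 4))  # 0.8857
```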

Advantages / Applications

Useful for ordinal data: Rank correlation works well when data are in ranks, preferences, or categories that have order but not exact measurements.

Simple and practical: It is easy to understand and apply, especially in education, psychology, business, and sports ranking situations.

Works without strict assumptions: Since it is non-parametric, it can be used when data do not meet the assumptions required for Pearson’s correlation.

Common applications:

Comparing exam results from two different evaluators
Measuring agreement between judges in competitions
Studying customer preference rankings
Comparing rankings of employees, products, or universities
Analyzing relationships in survey responses where responses are ordered

Summary

Rank correlation measures the relationship between two ranked lists.
Spearman’s coefficient is the most common rank correlation method.
It is useful for ordinal data and ordered preferences, and it can handle ties by assigning average ranks.
Important terms to remember: rank correlation, Spearman’s rank correlation coefficient, tied ranks, non-parametric method, positive correlation, negative correlation