Percentile Ranking

In Simple Ranking, we observed that while simple ranking is indeed a simple way to measure position, it has drawbacks that limit its usefulness in certain situations. One is that simple rankings are hard to compare across datasets; another is that they are hard to calculate when the dataset is large.

With percentile ranking, the idea is to express rank in terms of percentiles rather than absolute rank orders. This makes comparisons across datasets easier and allows us to handle larger datasets.
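
To make the idea concrete, here is a minimal Python sketch that computes a percentile rank directly from a list of values. It assumes one common convention, "the percentage of values strictly below x"; other conventions also count half of the values tied with x, so treat this as one reasonable definition rather than the only one. The function name `percentile_rank` and the scores are hypothetical, chosen for illustration.

```python
def percentile_rank(data, x):
    """Percentage of values in `data` strictly less than `x`.

    Assumed convention: ties with `x` are not counted as "below".
    """
    below = sum(1 for value in data if value < x)
    return 100.0 * below / len(data)


# Hypothetical exam scores, for illustration only.
scores = [55, 62, 70, 71, 78, 83, 85, 90, 94, 97]
print(percentile_rank(scores, 85))  # 6 of 10 scores are below 85 -> 60.0
```

Because the result is a percentage of the dataset, it does not depend on how many values the dataset happens to contain.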

Let's consider an example.

Example: Height of an American woman

Consider the problem of determining how tall an American woman is, relative to other American women. It's not practical to rank order all American women by height. But if we have some idea of how American women's heights are distributed across the range of possible heights (i.e., non-negative heights), then we can determine that a given American woman is, say, taller than 86% of other American women; in other words, she is at the 86th percentile. Of course, this requires that we have some sense of that distribution. We'll examine that when we look at shape.
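
The key point is that we never rank individual women; we only need a model of the distribution. The sketch below assumes, purely for illustration, that heights follow a normal distribution with a hypothetical mean of 162 cm and standard deviation of 7 cm (these are not real survey figures, and the normal shape itself is an assumption). It uses `statistics.NormalDist` from Python's standard library (3.8+), whose `cdf` method gives the fraction of the distribution at or below a value.

```python
from statistics import NormalDist

# ASSUMED, illustrative parameters (not real survey data).
MEAN_CM = 162.0
SD_CM = 7.0


def percentile_rank_from_model(x, mean, sd):
    """Percentile rank of x under an assumed normal distribution."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(x)


rank = percentile_rank_from_model(169.0, MEAN_CM, SD_CM)
print(f"{rank:.1f}")  # one standard deviation above the mean -> 84.1
```

Swapping in a different distribution model changes only the `cdf` call; the idea of reading a percentile rank off the distribution stays the same.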

Note that knowing this percentage allows us to compare the woman in question to people in other populations. For example, if a given American man is taller than 78% of American men, then we might deem the woman to be "taller" in a relative sense (86th percentile versus 78th), even if the man is taller than the woman in absolute terms. Similarly, we could compare the American woman to a Danish woman, a Danish man, or even a redwood tree if we wanted to. Admittedly, comparing her to a redwood tree would be unusual, but percentile rankings offer that flexibility.
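
Here is a small sketch of that kind of cross-population comparison. The two height samples are hypothetical and deliberately different in size (10 women, 15 men); percentile rank uses the same "strictly below" convention as before, which is an assumption, not the only option.

```python
def percentile_rank(data, x):
    """Percentage of values in `data` strictly less than `x`."""
    return 100.0 * sum(1 for v in data if v < x) / len(data)


# Hypothetical height samples in cm (illustrative only; sizes differ on purpose).
women = [152, 155, 158, 160, 161, 163, 165, 167, 170, 174]
men = [165, 168, 170, 172, 174, 175, 176, 178, 180, 182,
       184, 186, 188, 190, 193]

woman_cm, man_cm = 170, 182
woman_pr = percentile_rank(women, woman_cm)  # 8 of 10 below -> 80.0
man_pr = percentile_rank(men, man_cm)        # 9 of 15 below -> 60.0

print(woman_pr, man_pr)  # 80.0 60.0
print("relatively taller:", "woman" if woman_pr > man_pr else "man")
```

The man is taller in absolute terms (182 cm versus 170 cm), but the woman ranks higher within her own population, and the differing sample sizes do not get in the way of the comparison.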

Quartiles and deciles

Besides percentiles, it's often useful to work with coarser groupings. One common approach is to divide the data into four equal parts called quartiles; indeed, we saw this earlier when we studied the interquartile range. Another fairly common approach is to divide the data into ten equal parts called deciles. There are several competing conventions for computing the exact boundaries between quartiles and between deciles (they differ mainly in how they interpolate between data points), so normally we let software do the calculation.
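
As one example of letting software do it, Python's standard library provides `statistics.quantiles` (Python 3.8+), which returns the cut points that divide data into `n` groups. Its `method` parameter accepts `"exclusive"` (the default) or `"inclusive"`, and the two give different boundaries on the same data, which illustrates why conventions matter. The data below is a hypothetical set of the integers 1 through 20.

```python
import statistics

# Hypothetical data set, for illustration only: 1, 2, ..., 20.
data = list(range(1, 21))

quartiles = statistics.quantiles(data, n=4)  # 3 cut points, default "exclusive"
quartiles_incl = statistics.quantiles(data, n=4, method="inclusive")
deciles = statistics.quantiles(data, n=10)   # 9 cut points

print(quartiles)       # [5.25, 10.5, 15.75]
print(quartiles_incl)  # [5.75, 10.5, 15.25]
print(deciles)         # [2.1, 4.2, 6.3, 8.4, 10.5, 12.6, 14.7, 16.8, 18.9]
```

Both methods agree on the median (10.5) here but disagree on the outer quartile boundaries; other software packages may use yet another convention.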

Strengths of percentile ranking

Easy to understand. Like simple ranking, percentile ranking is common in everyday life.

Scale-invariant. Percentages allow us to disregard the size of the dataset. Because percentiles are normalized to a 0-100% scale, we can compare elements across datasets, as with the example above.

Supports larger datasets and datasets with unknown or irrelevant sizes. Percentiles are practical when the dataset is too large for simple ranking, or when its size is unknown or doesn't matter.

Weaknesses of percentile ranking

Requires knowledge of the distribution of actual values across the range of possible values. We have to know (or at least estimate) how the values are distributed in order to calculate a percentile rank.
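
In practice, "estimate" often means looking at a random sample instead of the whole population. The sketch below uses a synthetic, hypothetical population (the integers 0 through 99,999) so that the exact answer is known, then estimates the same percentile rank from a random sample of 2,000 values. The sample size and seed are arbitrary choices for illustration.

```python
import random


def percentile_rank(data, x):
    """Percentage of values in `data` strictly less than `x`."""
    return 100.0 * sum(1 for v in data if v < x) / len(data)


# Synthetic stand-in for a large population (hypothetical, for illustration).
population = list(range(100_000))
x = 86_000

rng = random.Random(42)                   # fixed seed so the run is repeatable
sample = rng.sample(population, k=2_000)  # only this sample is "observed"

exact = percentile_rank(population, x)    # 86.0, needs the whole population
estimate = percentile_rank(sample, x)     # close to 86, from the sample alone
print(exact, round(estimate, 1))
```

The estimate will not match the exact value perfectly, but it typically lands within a percentage point or two, without ever ranking the full population.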

Exercises

Exercise 1. Use the Internet to research how much annual household income was required to be in the top 1% of United States households in 2020. What about the top 10%? The top 50%?