Interquartile Range (IQR)
Remember the range? The range is a quick way to get a sense for the spread of a dataset. But it has a weakness, which is that it's highly sensitive to outliers. In other words, the range is not robust.
Fortunately, there's a modified, robust version of the range called the interquartile range, or IQR. It is just the difference between the third and first quartiles:
Definition. Given a dataset X, the interquartile range (IQR) is given by:
$$\mathrm{IQR}(X) = Q_3 - Q_1$$
where Q1 and Q3 are the first and third quartiles of X.
Calculating the IQR involves the following steps:
- Sort the dataset.
- Find Q3, also known as the "third quartile". This is "the" value such that 75% of the data are lower than this number. I use scare quotes here because this value isn't generally unique.
- Find Q1, or the "first quartile". This is "the" value such that 25% of the data are lower than this number. Again, Q1 isn't generally unique.
- The IQR is just the difference between Q3 and Q1.
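The four steps above can be sketched in Python. This is a minimal sketch under one assumption: quartiles are computed with the "exclusive" rule (the approach used in the chemistry example, and the one Excel's `QUARTILE.EXC` follows), which puts quantile p at 1-based position p(n + 1) in the sorted data and interpolates linearly between neighbors. The helper names `quartile_exclusive` and `iqr` are hypothetical, and other quartile conventions give slightly different answers.

```python
def quartile_exclusive(sorted_data, p):
    """Hypothetical helper: p-th quantile (0 < p < 1) of already-sorted data
    using the 'exclusive' rule, position h = p * (n + 1), 1-based."""
    n = len(sorted_data)
    h = p * (n + 1)
    if h < 1 or h > n:
        # The exclusive rule has no answer when h falls outside the data.
        raise ValueError("not enough data for the exclusive method")
    lo = int(h)      # 1-based index of the value at or just below position h
    frac = h - lo    # fraction of the way from sorted_data[lo-1] to sorted_data[lo]
    if frac == 0:
        return sorted_data[lo - 1]
    return sorted_data[lo - 1] + frac * (sorted_data[lo] - sorted_data[lo - 1])


def iqr(data):
    ordered = sorted(data)                   # Step 1: sort the dataset
    q3 = quartile_exclusive(ordered, 0.75)   # Step 2: find Q3
    q1 = quartile_exclusive(ordered, 0.25)   # Step 3: find Q1
    return q3 - q1                           # Step 4: IQR = Q3 - Q1


# The chemistry test scores from the example.
scores = [83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83]
print(iqr(scores))  # 26.75
```

With 14 scores, Q1 lands at position 3.75 (three quarters of the way from 55 to 61) and Q3 at position 11.25 (a quarter of the way from 86 to 87).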
As it happens, computing quartiles is surprisingly complicated. I've always used statistical software to calculate them, and never knew how complicated it is until I looked it up to write this section. For now, the gymnastics involved in calculating quartiles are a distraction from the main IQR concept, so in the example below I'll simply give you Q1 and Q3. We'll return to quartiles later in the section on percentiles and quartiles.
Example: Test scores for a reasonably challenging chemistry test
Recall the chemistry test scores:
83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83
As I mentioned above, the Q1 and Q3 calculations are complicated. In practice you'll use statistics software to find them. So here I'll just tell you that Q1 is 59.5 and Q3 is 86.25. (I'm using the so-called "exclusive" approach; see Calculating the Interquartile Range in Excel for info on how to perform the calculation in Excel.) So the IQR is 86.25 - 59.5 = 26.75.
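Excel isn't the only option. As one example (not the tool used for the numbers above), Python's standard library function `statistics.quantiles` defaults to the same "exclusive" method; with `n=4` it returns the three quartile cut points Q1, the median, and Q3, and it reproduces the values quoted above.

```python
import statistics

scores = [83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83]

# method="exclusive" is already the default; it is spelled out for clarity.
q1, median, q3 = statistics.quantiles(scores, n=4, method="exclusive")
print(q1, q3, q3 - q1)  # 59.5 86.25 26.75
```

Passing `method="inclusive"` instead switches to a different quartile convention and gives different Q1 and Q3 values for the same scores.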
Strengths of the interquartile range as a measure of spread
The IQR is like the range, but robust. That is, outliers in the dataset don't impact the IQR much. This is the main reason we use the IQR.
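A quick way to see this robustness is to corrupt a single value. The sketch below reuses the chemistry scores and the "exclusive" quartiles from the example above, and replaces the top score of 98 with a hypothetical data-entry error of 980. The range jumps from 60 to 942, while the IQR stays at 26.75, because Q1 and Q3 depend only on values near the 25th and 75th percentiles. The helper names `iqr` and `data_range` are illustrative.

```python
import statistics

def iqr(data):
    # Exclusive-method quartiles, matching the chemistry example.
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1

def data_range(data):
    return max(data) - min(data)

scores = [83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83]
# Hypothetical typo: the 98 gets recorded as 980.
corrupted = [980 if x == 98 else x for x in scores]

print(data_range(scores), data_range(corrupted))  # 60 942
print(iqr(scores), iqr(corrupted))                # 26.75 26.75
```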
The IQR is conceptually simple. Though the quartile calculation is complicated, the IQR concept is simple. It's just the range of the middle 50% of the dataset.
Weaknesses of the interquartile range as a measure of spread
The IQR is highly scale-dependent. Like the range, the IQR measurement very much depends on the size of the values in the dataset. It's hard to compare IQRs across datasets that use different scales.
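To illustrate the scale dependence, the sketch below multiplies every chemistry score by 10 (a hypothetical change of units, as if the same test were scored on a ten-times-larger point scale). The students' standing relative to one another hasn't changed, yet the IQR grows tenfold, from 26.75 to 267.5. Quartiles are again computed with the "exclusive" method.

```python
import statistics

def iqr(data):
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1

scores = [83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83]
rescaled = [10 * x for x in scores]  # same students, hypothetical 10x point scale

print(iqr(scores), iqr(rescaled))  # 26.75 267.5
```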
The IQR doesn't fully incorporate the dataset's values. Like the range, the IQR focuses on the endpoints of a group (here, the middle half of the data) and accounts for the other data values only indirectly. For example, in many cases you can change individual data points and the IQR won't change at all. This is great for outliers but arguably less great for non-outlier data.
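Here is a small illustration with the chemistry scores and "exclusive" quartiles. Raising the score of 73 to a hypothetical 80 leaves the IQR untouched, because with 14 values the exclusive Q1 and Q3 are built only from the 3rd, 4th, 11th, and 12th smallest values. For contrast, the sample standard deviation (`statistics.stdev`), which uses every value, does change.

```python
import statistics

def iqr(data):
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1

scores = [83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83]
# Hypothetical regrade: one mid-range score goes from 73 to 80.
regraded = [80 if x == 73 else x for x in scores]

print(iqr(scores), iqr(regraded))  # 26.75 26.75
# The sample standard deviation uses every value, so it does move.
print(statistics.stdev(scores), statistics.stdev(regraded))
```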
The IQR throws away a lot of data. The point of throwing out the upper and lower quarters of the data is to remove outliers, but that's arguably more data than we need to throw out. (Those quarters can't be full of outliers, because then they wouldn't be outliers.)
Accordingly, the interdecile range is an alternative to the IQR. With the interdecile range, we throw out the top and bottom 10% of the data, and take the range of the middle 80%. The IQR is more common than the interdecile range, but it's still good to know about it.
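The interdecile range is the 90th percentile minus the 10th percentile. A minimal sketch, assuming the same "exclusive" interpolation used for the quartiles in the chemistry example (other percentile conventions give different numbers, especially for a dataset this small): `statistics.quantiles` with `n=10` returns the nine decile cut points, and the first and last of those are the 10th and 90th percentiles. The name `interdecile_range` is illustrative.

```python
import statistics

def interdecile_range(data):
    deciles = statistics.quantiles(data, n=10, method="exclusive")  # D1 .. D9
    return deciles[-1] - deciles[0]  # 90th percentile minus 10th percentile

scores = [83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83]
print(interdecile_range(scores))  # 56.0
```

With only 14 scores, each 10% tail is less than two values, so this interdecile range (56.0) sits close to the full range of 60; the difference from the IQR matters more on larger datasets.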
Note that boxplots are a useful way to visualize the IQR.