Boxplots

A boxplot is a visualization that allows us to see multiple aspects of a dataset at once, including its median, quartiles, spread, and outliers.

Here's a boxplot of annual precipitation in Central Park for the years 1869-2019:

Boxplot of annual precipitation in Central Park for the years 1869-2019

In the boxplot above, we see a box with a heavy line through it, some "whiskers" above and below the box (boxplots are sometimes called "box and whisker plots"), and finally some open circles above the upper whisker. Here's what these represent:

  • The heavy line in the middle of the box represents the median, which here happens to be 44.5.
  • The upper bound of the box is the third quartile, and the lower bound is the first quartile. Thus the distance between them is the interquartile range (IQR).
  • In this boxplot, each whisker extends from the box to the most extreme data point that lies within 1.5 · IQR of the box, so a whisker can be shorter than 1.5 · IQR.
  • The open circles are outliers (here, anything beyond the whiskers).

Note that in the boxplot above, we can see the maximum value because it happens to be an outlier: it's the topmost open circle. There are no outliers below the box, so the end of the lower whisker is the minimum value.
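To make the pieces concrete, here is a minimal sketch in R that computes them for a tiny made-up vector (these numbers are an assumption for illustration, not the Central Park data). It uses boxplot.stats, the helper that supplies the numbers boxplot draws. Note that R computes the box edges as "hinges", which can differ slightly from the quartiles that quantile reports on some datasets.

```r
# Made-up illustrative values, NOT the Central Park precipitation data.
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 30)

# boxplot.stats() returns the numbers that boxplot() would draw.
s <- boxplot.stats(x)   # default coef = 1.5

# s$stats holds, in order: lower whisker end, bottom of box,
# median, top of box, upper whisker end.
box <- s$stats
iqr <- box[4] - box[2]  # height of the box

# Fences: the farthest a whisker is allowed to reach.
lower_fence <- box[2] - 1.5 * iqr
upper_fence <- box[4] + 1.5 * iqr

print(box)                          # 2 4 6 8 9
print(c(lower_fence, upper_fence))  # -2 14
print(s$out)                        # 30
```

The upper fence is 14, but the upper whisker stops at 9 because that is the largest data point inside the fence. The value 30 lies beyond the fence, so it would be drawn as an open circle.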

Oh, speaking of boxplots and outliers...

Example: Boyfriend

"Boyfriend." Used with kind permission from xkcd.

I couldn't resist. :)

It happens that while there is a standard practice around what the box and the heavy line represent, there's no standard for the whiskers. In the boxplot above, I used the boxplot function in R, which by default draws each whisker out to the most extreme data point within 1.5 · IQR of the box. The multiplier is configurable through the range argument, and if we set it to 0, the whiskers instead extend to the maximum and minimum values, like this:

Alternative boxplot of annual precipitation in Central Park for the years 1869-2019
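Here is a rough sketch of how the two styles compare in R. The vector is made up for illustration (it is not the Central Park data). The range argument is the whisker multiplier described above.

```r
# Made-up illustrative values, NOT the Central Park precipitation data.
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 30)

# Draw both versions side by side for comparison.
par(mfrow = c(1, 2))
default_plot <- boxplot(x, main = "range = 1.5 (default)")
minmax_plot  <- boxplot(x, range = 0, main = "range = 0")

# boxplot() invisibly returns the numbers it drew.
print(default_plot$stats)  # whisker ends, box edges, and median
print(default_plot$out)    # points drawn as open circles
print(minmax_plot$stats)   # whisker ends are now min(x) and max(x)
print(minmax_plot$out)     # empty: nothing is flagged as an outlier
```

With range = 0, the value 30 is no longer drawn as an open circle. Instead, the upper whisker stretches all the way up to it.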

Exercises

Exercise 1. Using the precipitation data, create a boxplot for the July column.