[Figure: House for sale]

Central Tendency

Suppose that we have a dataset of housing prices for a certain area:


(All prices are in thousands of US dollars.)

Suppose further that we'd like to come up with a single number that summarizes the data in this dataset. How do we proceed?

Before we answer that question, let's think of some numbers that are definitely not reasonable choices. We can see that, with a single exception, all of the prices fall in the $400Ks or $500Ks. The single exception is $605K. So $350K would be too low, and $1.8M would be much too high. Just looking at the data, we'd probably expect the number to be somewhere in the $500Ks.

Intuitively, we have a range of values, and we can think of this range in a geometric way, as numbers on a number line. Like this:

[Figure: House prices plotted on a number line]

This visualization helps us see the way the housing prices are distributed across the range. To choose a single number to represent this set, it would be most reasonable to choose the "center" of this dataset in some sense. But where exactly is that center?

One (flawed) way to calculate the center would be to take the midpoint between the top and the bottom of the range, a quantity sometimes called the midrange. The top of the range is $605K and the bottom of the range is $400K. This approach puts the midpoint at (400 + 605) / 2 = $502.5K:

[Figure: House prices on a number line, with the midpoint of the range ($502.5K) marked]

Looking at the number line, though, this proposed center seems problematic. Most of the prices are higher than the midpoint. So while this isn't a horrible first attempt, it does seem a bit too low to be a good summary of the dataset. Part of the problem is that the only data points we've accounted for are the endpoints of the range. A better measure probably incorporates all data points, at least for numerical data.
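To make the weakness concrete, here is a minimal Python sketch of the midpoint calculation. The function name `midrange` is our own choice, and the two five-value lists are hypothetical illustrations, not the lesson's dataset; only the endpoints 400 and 605 come from the text. The sketch suggests that any two datasets sharing the same endpoints get the same midpoint, no matter where the rest of the prices sit.

```python
def midrange(prices):
    """Return the midpoint between the smallest and largest value."""
    if not prices:
        raise ValueError("midrange of an empty dataset is undefined")
    return (min(prices) + max(prices)) / 2


# Endpoints quoted in the text, in thousands of US dollars.
print(midrange([400, 605]))  # 502.5

# Hypothetical interior values (NOT the lesson's dataset), chosen only to
# show that the midpoint ignores everything except the two endpoints.
clustered_high = [400, 560, 575, 590, 605]
clustered_low = [400, 410, 425, 440, 605]
print(midrange(clustered_high) == midrange(clustered_low))  # True
```

Because the result never changes when the interior values move, this calculation cannot reflect where most of the prices actually lie.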

As it happens, there are multiple common approaches to defining the "center" or "location" of a dataset. In the next few sections, we'll explore these different approaches to measuring central tendency.


Exercise 1. Before proceeding further in the course, see if you can think of other ways we might measure central tendency. (You'll have a chance to see if you were right in the following sections.)