Scatterplots

In a scatterplot, we visualize the relationship between two numerical variables in a dataset. The idea is to assign one of the variables to the x-axis and the other to the y-axis, and then plot one point for each observation in the dataset.

Suppose, for example, that we have a set of hygrometer data, consisting of temperature and relative humidity measurements taken over time. We saw in the previous section that the relative humidity tends to drop as the temperature increases, and vice versa:

Several days of hygrometer data

How can we visualize the relationship between the temperature and relative humidity?

First, let's look at a small subset of the data:

Interval              Humidity.Average   Temperature.Average
11/13/2020 12:00 AM   60.3720            19.8415
11/13/2020 12:06 AM   60.6801            18.7828
11/13/2020 12:12 AM   60.0155            20.1079
11/13/2020 12:18 AM   60.5750            18.8529
11/13/2020 12:24 AM   60.5309            19.5025
11/13/2020 12:30 AM   60.3386            19.0101
11/13/2020 12:36 AM   60.3915            19.6573
11/13/2020 12:42 AM   60.5946            18.8577
11/13/2020 12:48 AM   60.3363            19.6353
11/13/2020 12:54 AM   60.5608            18.5713

In the dataset above we can see that there are three variables: a timestamp, an average relative humidity and an average temperature. The values under humidity and temperature above are too close together to see any obvious patterns, but if we plot the humidity against the temperature for the whole dataset, we get the following:

Relative humidity vs. temperature

The scatterplot above makes it clear that there is indeed a relationship between these two variables, albeit perhaps a fairly loose one. We can see that the relative humidity tends to be higher when the temperature is lower, and as the temperature gets higher, the values spread further into the lower part of the range.

Smaller datasets are easy to plot by hand. For a larger dataset, you'll typically use a software package to generate a scatterplot, since it would be tedious to plot the individual data points manually. See the tech tutorials below for information on using Excel, Python and R to generate scatterplots.
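
As a rough illustration of the software route, here is a minimal Python sketch that plots the ten sample rows shown earlier, with temperature on the x-axis and relative humidity on the y-axis. It assumes the matplotlib package is installed. The names SAMPLE, split_columns and plot_scatter, along with the output filename, are made up for this example and are not part of the original dataset or tutorials.

```python
# The ten sample rows from the table earlier in this section,
# stored as (average relative humidity, average temperature) pairs.
SAMPLE = [
    (60.3720, 19.8415),
    (60.6801, 18.7828),
    (60.0155, 20.1079),
    (60.5750, 18.8529),
    (60.5309, 19.5025),
    (60.3386, 19.0101),
    (60.3915, 19.6573),
    (60.5946, 18.8577),
    (60.3363, 19.6353),
    (60.5608, 18.5713),
]


def split_columns(rows):
    """Return (temperatures, humidities) as two parallel lists.

    Temperature comes first because it goes on the x-axis.
    """
    humidities = [humidity for humidity, _ in rows]
    temperatures = [temperature for _, temperature in rows]
    return temperatures, humidities


def plot_scatter(rows, path="humidity_vs_temperature.png"):
    """Save a scatterplot of humidity (y) against temperature (x)."""
    # Imported here so the data handling above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file; no display window needed
    import matplotlib.pyplot as plt

    x, y = split_columns(rows)
    fig, ax = plt.subplots()
    ax.scatter(x, y)  # one marker per (temperature, humidity) pair
    ax.set_xlabel("Average temperature")
    ax.set_ylabel("Average relative humidity")
    ax.set_title("Relative humidity vs. temperature")
    fig.savefig(path)
    plt.close(fig)
```

Calling plot_scatter(SAMPLE) writes the plot to an image file. With only ten points from a single hour, the picture will be much sparser than the full scatterplot above, so treat it as a way to check the mechanics rather than to study the relationship.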

Exercises

Exercise 1. Create a scatterplot for the variables Age and Weight in the following dataset:

Name         Age   Height   Weight (lbs)
Ivo          51    6'2"     205
Lorelei      23    5'9"     138
Beatrix      48    5'6"     129
Persephone   12    5'4"     104
Pandora      30    5'5"     142

Exercise 2. Use the same dataset to create a scatterplot for the variables Height and Weight.