Introduction to Descriptive Statistics
With descriptive statistics, our goal is to summarize a dataset in a way that's meaningful and easy to understand. This is important because most real-world datasets are too complex to understand simply by looking at the raw data. The dataset may be large, or it may have many variables whose interrelationships are not obvious.
There are different ways we can summarize a dataset. One is to use numerical quantities that capture something interesting and meaningful about the data. Another is to use plots to help us visualize the data. It's usually helpful to do both.
Our first example shows how to summarize a dataset in a numerical way.
Example: Mean height of a group of American men
Suppose that we have the following dataset, containing the heights of a group of American men. Heights are measured in inches:
74, 69, 74, 71, 69, 67, 68, 68, 67, 63, 71, 71, 64, 69, 69, 67, 72, 63, 71, 71, 64
Even with a dataset this small, it's not immediately obvious what we're looking at here. As a group, are these men roughly average height? Are they taller than average? Shorter?
We can answer that question by calculating the mean (average) height. Don't worry if you don't know how to do that; we'll get to it shortly. For now, take my word for it that the mean height for this group of men is 68.67 inches, which puts them slightly under the average height for American men over 20, which is 69.1 inches (about 5 feet, 9 inches). (source)
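For readers who would like to check that figure, here is a minimal Python sketch. It assumes the standard-library `statistics` module; the variable names are just illustrative, and the 69.1-inch national figure is simply the value quoted above.

```python
from statistics import mean

# Heights (in inches) of the men in the example dataset.
heights = [74, 69, 74, 71, 69, 67, 68, 68, 67, 63, 71,
           71, 64, 69, 69, 67, 72, 63, 71, 71, 64]

# The mean is the sum of the values divided by how many there are.
mean_height = mean(heights)

# National average for American men over 20, as quoted in the text.
national_mean = 69.1

print(f"Number of men: {len(heights)}")
print(f"Mean height: {mean_height:.2f} inches")
print(f"Difference from national average: {national_mean - mean_height:.2f} inches")
```

The 21 heights sum to 1442 inches, so the mean is 1442 / 21, or about 68.67 inches.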
In the next example, we take a visual approach toward summarizing our dataset.
Example: Visualizing the distribution of heights
In the previous example, we calculated a single number to characterize the average height of the men in the dataset. But that number alone doesn't tell us whether everybody is roughly that height, or whether some men are very tall and others are very short. A visualization can help us with this. The following bar chart shows us how many men there are for each height represented in the dataset:
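The counts behind a chart like this can be tallied directly from the data. The following is a rough sketch rather than the code used to produce the chart: it assumes Python's standard-library `collections.Counter` and prints a simple text version, with one `#` per man at each height.

```python
from collections import Counter

# Heights (in inches) of the men in the example dataset.
heights = [74, 69, 74, 71, 69, 67, 68, 68, 67, 63, 71,
           71, 64, 69, 69, 67, 72, 63, 71, 71, 64]

# Count how many men there are at each height.
counts = Counter(heights)

# Print one row per height, shortest first, as a text bar chart.
for height in sorted(counts):
    bar = "#" * counts[height]
    print(f"{height} in | {bar} ({counts[height]})")
```

Run on this dataset, the tally shows that the most common height is 71 inches (five men), followed by 69 inches (four men), with the remaining men spread between 63 and 74 inches.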
In each of the previous examples, we took a dataset and summarized it to make it easier to understand. There are many techniques for doing so, and this is what we'll explore in our study of descriptive statistics.