Frequency Distributions

With categorical data, the first step toward understanding how the values are distributed is to count the values in each category. A frequency table is simply the result of placing the categories and their respective counts in a table.
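To make the counting step concrete, here is a minimal sketch in Python using the standard library's collections.Counter. The category labels and the frequency_table helper are invented for illustration; they are not drawn from any dataset in this course.

```python
from collections import Counter

# Hypothetical category labels, invented for illustration only.
observations = ["red", "blue", "red", "green", "blue", "red"]


def frequency_table(values):
    """Return (category, count) pairs, largest count first."""
    return Counter(values).most_common()


# Print the table as two aligned columns: category, then count.
for category, count in frequency_table(observations):
    print(f"{category:<8}{count}")
```

Each row pairs one category with the number of times it appears, which is all a frequency table is.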

Here are a couple of examples.

Example: Crime by category, San Francisco (2016)

Earlier in the course, in the section on datasets, we considered a dataset for crime in San Francisco in 2016. The dataset has a Category variable. One question we might ask is how many crimes there were, broken down by category. This is just a matter of counting the crimes and picking a way to display that information. There are lots of tools to make this easy. Here I'll use Excel's pivot table feature to do it.

Here's a tabular presentation of the breakdown by category, in descending order by count. This type of view is called a frequency table because it shows the frequency (count) for each category:

A frequency table of crimes by category in San Francisco, 2016

The dataset contains 39 distinct categories, so I've included only the top several in the table. This is already useful, though. We can see that larceny/theft is the largest category by far, and that the next two are general catch-all categories ("other offenses" and "non-criminal").
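If you'd rather script this than use a pivot table, the same "count, sort descending, keep the top several" idea might look like the sketch below. It assumes the data is available as CSV with a Category column. The sample rows are made up to stand in for the real file, so the counts here say nothing about the actual dataset, and top_categories is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# A tiny made-up sample standing in for the real file. The rows are
# invented and do not reflect the actual San Francisco counts.
SAMPLE_CSV = """Category
LARCENY/THEFT
OTHER OFFENSES
LARCENY/THEFT
NON-CRIMINAL
LARCENY/THEFT
OTHER OFFENSES
"""


def top_categories(csv_file, column, n):
    """Count the values in one column and return the n most frequent."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[column] for row in reader)
    return counts.most_common(n)


# Keep only the two largest categories from the sample.
top_two = top_categories(io.StringIO(SAMPLE_CSV), "Category", 2)
for category, count in top_two:
    print(f"{category:<16}{count}")
```

To run this against a real file, you would pass an open file object in place of the io.StringIO wrapper.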

Creating a frequency table is an example of descriptive statistics because it summarizes a dataset in a way that makes some of its features easier to understand. In this case, the frequency table helps us to understand how many crimes there are in each category.

Example: Crime by day of week, San Francisco (2016)

We can do the same kind of analysis on the same dataset using a different variable. Let's see what the counts look like when broken down by day of week. First, here's the frequency table:

A frequency table of crimes by day of week in San Francisco, 2016
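Days of the week have a natural order, so one reasonable choice is to list the counts in calendar order rather than by descending count. Here is a small Python sketch of that idea. The list of observed days is made up for illustration, and day_of_week_table is a hypothetical helper name.

```python
from collections import Counter

# Calendar order used for the rows of the table.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Made-up observations, invented for illustration only.
days_observed = ["Friday", "Monday", "Friday", "Sunday",
                 "Wednesday", "Friday", "Monday"]


def day_of_week_table(values):
    """Return (day, count) pairs in calendar order."""
    counts = Counter(values)
    # A Counter returns 0 for a missing key, so days with no
    # records still get a row in the table.
    return [(day, counts[day]) for day in DAYS]


for day, count in day_of_week_table(days_observed):
    print(f"{day:<10}{count}")
```

Sorting by count works well when the categories have no inherent order, as with crime category. When they do, keeping that order usually makes patterns across the week easier to spot.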

Exercises

Exercise 1. Create a sample frequency table for country of origin. Feel free to choose your own countries and counts.

Exercise 2. Describe a scenario in which you might use a frequency table of the sort you created in Exercise 1.

Exercise 3. Use Excel's pivot table feature and the San Francisco 2016 crime dataset to generate a frequency table for the District variable.