With categorical data, the first step to understanding its shape is to count the values in each category. A frequency table is simply the result of placing the categories and their respective counts in a table.
Here are a couple of examples.
Example: Crime by category, San Francisco (2016)
Earlier in the course, in the section on datasets, we considered a dataset for crime in San Francisco in 2016. The dataset has a Category variable. One question we might ask is how many crimes there were, broken down by category. This is just a matter of counting the crimes and picking a way to display that information. There are lots of tools to make this easy. Here I'll use Excel's pivot table feature to do it.
Here's a tabular presentation of the breakdown by category, in descending order by count. This type of view is called a frequency table because it shows the frequency (count) for each category:
The dataset contains 39 distinct categories, so I've included only the top several in the table. This is already useful, though. We can see that larceny/theft is the largest category by far, and that the next two are general catch-all categories ("other offenses" and "non-criminal").
Creating a frequency table is an example of descriptive statistics because it summarizes a dataset in a way that makes some of its features easier to understand. In this case, the frequency table helps us to understand how many crimes there are in each category.
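A pivot table is only one way to get these counts. As a rough sketch of the same idea in Python, the snippet below counts category values and lists them in descending order by count. The sample values are made up for illustration and are not the actual San Francisco figures, and the frequency_table helper is a hypothetical name, not part of any library.

```python
from collections import Counter

# Hypothetical sample of Category values (NOT the real 2016 dataset).
categories = [
    "LARCENY/THEFT", "ASSAULT", "LARCENY/THEFT", "NON-CRIMINAL",
    "OTHER OFFENSES", "LARCENY/THEFT", "ASSAULT", "OTHER OFFENSES",
    "LARCENY/THEFT", "VANDALISM",
]


def frequency_table(values, top=None):
    """Return (category, count) pairs in descending order by count.

    If `top` is given, only that many of the largest categories are kept.
    """
    return Counter(values).most_common(top)


# Keep only the three largest categories, as one might for a long table.
table = frequency_table(categories, top=3)

for category, count in table:
    print(f"{category:<16}{count:>5}")
```

Passing top=None returns every category, which is handy when the variable has only a few distinct values.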
Example: Crime by day of week, San Francisco (2016)
We can do the same kind of analysis on the same dataset using a different variable. Let's see what the counts look like when broken down by day of week. First, here's the frequency table:
Exercise 1. Create a sample frequency table for country of origin. Feel free to choose your own countries and counts.
Exercise 2. Describe a scenario in which you might use a frequency table of the sort you created in Exercise 1.