# Frequency Distributions

With categorical data, the first step to understanding its
shape is to count the values in each category. A *frequency table* is
simply the result of placing the categories and their respective counts in a table.

Here are a couple of examples.

## Example: Crime by category, San Francisco (2016)

Earlier in the course, in the section on datasets, we considered a
dataset for crime in San Francisco in 2016. The dataset has a `Category`

variable. One question we
might ask is how many crimes there were, broken down by category. This is just a matter of counting the crimes
and picking a way to display that information. There are lots of tools to make this easy. Here I'll use
Excel's pivot table feature to do it.

Here's a tabular presentation of the breakdown by category, in descending order by count. This type of
view is called a *frequency table* because it shows the frequency (count) for each category:

The dataset contains 39 distinct categories, so I've included only the top several in the table. This is already useful, though. We can see that larceny/theft is the largest category by far, with the following two categories being general catch-all categories ("other offenses" and "non-criminal").

Creating a frequency table is an example of descriptive statistics because it's a way of summarizing a dataset in a way that makes some of its features easier to understand. In this case, the frequency table helps us to understand how many crimes there are in each category.

## Example: Crime by day of week, San Francisco (2016)

We can do the same kind of analysis on the same dataset using a different variable. Let's see what the counts look like when broken down by day of week. First, here's the frequency table:

## Exercises

**Exercise 1.** Create a sample frequency table for country of origin. Feel free to choose your
own countries and counts.

**Exercise 2.** Describe a scenario in which you might use a frequency table of the sort you
created in Exercise 1.

**Exercise 3.** Use Excel's pivot table
feature and the
San Francisco 2016 crime dataset
to generate a frequency table for the `District` variable.