Bar charts are our first approach to visualizing the shape of data. We need one categorical variable and one numerical variable, with each category paired with a numerical value. The numerical values can be counts, as we saw with frequency tables, but they don't have to be: any numerical value will do.
Let's look at two examples: one involving a categorical variable and associated counts, and another involving a categorical variable and non-count numerical values.
Example: A bar chart for count data
Let's return to the San Francisco crime by day of week count data from the previous section on frequency distributions. Recall the frequency table:
We can visualize this with a bar chart:
Note that there's a natural ordering to the categories, so we've arranged them accordingly. Sometimes there's no meaningful ordering, in which case the arrangement can be arbitrary (e.g., alphabetical).
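To make the ordering point concrete, here is a minimal sketch in Python that arranges day-of-week counts in their natural order and draws a rough text bar chart. The counts are hypothetical placeholders, not the San Francisco figures, and the `text_bar_chart` helper, the `DAY_ORDER` list, and the Monday-first ordering are assumptions made for illustration. In practice you would more likely use a plotting library.

```python
# Hypothetical counts for illustration only; these are NOT the San Francisco data.
counts = {
    "Friday": 134, "Monday": 120, "Saturday": 131, "Sunday": 115,
    "Thursday": 121, "Tuesday": 118, "Wednesday": 122,
}

# The natural ordering of the categories (Monday-first is an assumption).
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def text_bar_chart(labels, values, width=40):
    """Return one line per category, with bar length proportional to its value."""
    largest = max(values)
    label_width = max(len(label) for label in labels)
    lines = []
    for label, value in zip(labels, values):
        # Scale so that the largest value fills the full width.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


# Arrange the categories in their natural order rather than dictionary order.
ordered_labels = [day for day in DAY_ORDER if day in counts]
ordered_values = [counts[day] for day in ordered_labels]

chart_lines = text_bar_chart(ordered_labels, ordered_values)
for line in chart_lines:
    print(line)
```

Because the bars are scaled against the largest count, small differences between days can be hard to see in a text rendering. A graphical chart shows them more clearly.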
It's interesting to see the data in this way. There's a small but noticeable uptick in crime on Friday and Saturday, and a slight dip on Sunday. Visualizing the shape allows us to see things like this and speculate on the possible causes. For example, maybe more people are out drinking on Friday and Saturday night, and perhaps there are more assaults or other violent crimes. Of course, we would have to explore the data further to decide whether there's any evidence for this.
Example: A bar chart for non-count numerical data
In the previous example we considered crime counts associated with each day of the week. But we can look at non-count numerical data too. For example, suppose we want to plot the heights in inches of people in a family. First, here are the heights:
And here's the corresponding bar chart:
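As a rough sketch of the same idea in code, the snippet below pairs each family member (the categorical variable) with a height in inches (a non-count numerical value). The names and heights are invented for illustration and are not the values from the example above. Since family members have no natural ordering, the sketch shows two arbitrary but reasonable arrangements: alphabetical and tallest-first.

```python
# Hypothetical family members and heights in inches; invented for illustration.
heights = {"Mom": 65, "Dad": 70, "Alex": 58, "Sam": 48}

# With no natural ordering, any arrangement is acceptable. Two common choices:
by_name = sorted(heights.items())  # alphabetical by label
by_height = sorted(heights.items(), key=lambda item: item[1], reverse=True)

# Draw one '#' per 2 inches so the bars fit comfortably on a line.
for name, inches in by_height:
    bar = "#" * (inches // 2)
    print(f"{name:<5} | {bar} {inches}")
```

Sorting by value makes it easy to see who is tallest at a glance, while alphabetical order makes it easier to look up a particular person.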
Exercise 1 (advanced). Explore the hypothesis that the higher counts on Friday and Saturday are related to increased violent crime. How might you do this?