Introduction to Inferential Statistics

With descriptive statistics, we have a dataset and we want to describe it in a way that's easy to understand and captures its essential features. Inferential statistics, on the other hand, is about analyzing a dataset to draw conclusions about the real world. These conclusions are generally probable as opposed to certain, since inferential statistics involves investigating a limited dataset and drawing conclusions that go beyond the data at hand. But for many applications, probable conclusions are just fine, and statistics gives us ways to determine just how probable our conclusions are.

As you might guess, inferential statistics is based on probability theory. That's why we investigated the basics of probability theory before jumping into inferential statistics.

Let's consider a couple of examples of inferential statistics.

Example: Predicting election results

One common application of inferential statistics is to use polling and other data to predict election results. A couple of particularly impressive examples of this are Nate Silver's correct predictions of the outcomes in 49 of the 50 states in the 2008 U.S. Presidential election, followed up by his correct predictions for all 50 states in the 2012 U.S. Presidential election.

Nate Silver, statistician who successfully predicted state-by-state outcomes in the 2008 and 2012 U.S. Presidential elections

Example: Deciding whether bookings are abnormally low

Another example involves real-time e-commerce site monitoring. I previously worked for a web-based travel company, and one of my responsibilities was to raise alerts when bookings (sales) were lower than expected. Such drops could happen for various reasons, but usually the cause was something broken on the website.

We used various techniques to do this, and one of them is called hypothesis testing. The idea is to state a hypothesis and then run a test to see whether the data suggest rejecting it. For bookings drops, the hypothesis is, roughly, "everything is fine with bookings". We then look at the current bookings count and use probability theory to decide whether it suggests otherwise. That way we know whether we need to raise an alert for somebody to fix the site.
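To make the idea concrete, here is a minimal sketch of such a test in Python. It is an illustration only, not the method the travel company actually used. It assumes that, when everything is fine, the bookings count in a time window follows a Poisson distribution with a known expected value. The function names, the expected count of 100, and the 1% threshold (alpha) are all hypothetical choices made for this example.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Return P(X <= k) for X ~ Poisson(mean), with k >= 0."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) computed from P(X = i - 1)
        total += term
    return total


def bookings_abnormally_low(observed: int, expected: float,
                            alpha: float = 0.01) -> bool:
    """Decide whether to reject "everything is fine with bookings".

    Assumption: under that hypothesis, the count is Poisson(expected).
    The p-value is the probability of seeing a count this low or lower.
    """
    p_value = poisson_cdf(observed, expected)
    return p_value < alpha  # True means: raise an alert


# Hypothetical numbers: we expect 100 bookings in this window.
print(bookings_abnormally_low(observed=95, expected=100.0))  # False
print(bookings_abnormally_low(observed=60, expected=100.0))  # True
```

A count of 95 is well within ordinary random variation around 100, so the test does not reject the hypothesis. A count of 60 would be extremely unlikely if everything were fine, so the test rejects it and an alert is raised. The conclusion is still only probable: with a 1% threshold, we accept that roughly 1 in 100 windows will trigger a false alarm even when nothing is broken.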


Exercise 1. Can you think of any other examples of using data to draw probable conclusions about the real world?