What is Statistics?

We'll kick things off by answering the question, "What is statistics?" First, it will help to establish some context: humankind's relatively recent discovery of the power of data.

The big data explosion

It's no secret that modern life is awash with data. E-commerce sites collect data on who's buying what. Climate scientists collect data on CO2 emissions. Automobile manufacturers run wind tunnel tests and capture data on vehicle aerodynamics. Financial data feeds high-frequency stock trading algorithms. The list goes on and on.

The big data revolution is underway. Photo by Rodolphe Courtier

Interestingly, this wasn't always the case. The Internet and the web are both fairly recent developments, and the big data explosion has really just begun. Much of the technology we use to collect and process data has appeared only in recent decades.

But it's not just about technology. The simple insight that data can help us understand the world is recent, as are the techniques for extracting that understanding from the data. As Aileen Nielsen notes:

Medicine got a surprisingly slow start to thinking about the mathematics of predicting the future, despite the fact that prognoses are an essential part of medical practice. This was the case for many reasons. Statistics and a probabilistic way of thinking about the world are recent phenomena, and these disciplines were not available for many centuries even as the practice of medicine developed. Also, most doctors practiced in isolation, without easy professional communication and without a formal recordkeeping infrastructure for patient or population health. Hence, even if physicians in earlier times had been trained as statistical thinkers, they likely wouldn't have had reasonable data from which to draw conclusions.

Aileen Nielsen, Practical Time Series Analysis (O'Reilly), page 2.

As we came to understand the power of data, and as we developed techniques for understanding and using it, we began to collect more of it. And now data is everywhere.

What is statistics, then?

I could look up an official dictionary definition, but that's too boring. Instead I'll just describe statistics in my own words.

Statistics is the mathematical study of concepts, methods and techniques for working with data. One aspect of "working with data" is simply being able to describe it, whether numerically or graphically. This is called descriptive statistics. Another concern in statistics is using data to draw inferences about the real-world processes that generated the data. This is called inferential statistics. This course covers both descriptive and inferential statistics.

That may sound a little abstract, but don't worry. We'll look at these ideas in more detail in just a bit, with examples. The main takeaway at this point is that statistics gives us precise mathematical tools for understanding data.
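
As a small preview, here is a rough sketch in Python of the two branches described above. The sample values are made up purely for illustration. The interval uses a normal approximation, which is a simplification for a sample this small (a t-based interval would be somewhat wider). Only Python's standard statistics and math modules are used.

```python
import math
import statistics

# Hypothetical sample: ten made-up daily order counts (illustration only).
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

# Descriptive statistics: summarize the data we actually have.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: use the sample to say something about the
# process that generated it. Here, an approximate 95% confidence
# interval for the underlying mean, assuming a normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)   # roughly 1.96
std_error = stdev / math.sqrt(len(sample))   # standard error of the mean
ci = (mean - z * std_error, mean + z * std_error)

print(f"mean={mean}, median={median}, stdev={stdev:.2f}")
print(f"approximate 95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The first three numbers only summarize the ten values we happen to have. The interval goes a step further and makes a claim, under stated assumptions, about the process that produced them.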

Before we start exploring statistics proper, let's take a quick detour to discuss why somebody might want to study statistics in the first place.

Exercises

Exercise 1. Can you think of examples of statistics in everyday life?

Exercise 2. What's the relationship between the rise of "big data" and the utility of statistics?