# Introduction to Probability

Life is filled with uncertainty:

- How long will it take me to get to the office this morning?
- If I drive 5 mph over the speed limit, will I get a speeding ticket?
- Will my kid get accepted by his first-choice university?
- What will the price of Bitcoin be a year from now?
- Will Social Security still be around when I'm ready to retire?
- Which team will win the coin toss?

This is obviously a very partial list. Just about everything in life is uncertain. Perhaps Benjamin Franklin said it best:

> Our new Constitution is now established, and has an appearance that promises permanency; but in this world nothing can be said to be certain, except death and taxes.
>
> Benjamin Franklin, in a letter to Jean-Baptiste Le Roy, 1789

One of the key goals in probability theory is to determine just how likely or unlikely various events are. Sometimes it's pretty easy, like determining the probability of flipping "heads" on a fair coin. Other times it's outrageously difficult, like knowing the price of Bitcoin a year from now.

When we talk about "determining the probability", we're talking about assigning a numerical value to how likely it is that various events will occur, or that some statement is true. These values are real numbers between 0 and 1, inclusive. A probability of 1 means that the event is "certain" and a probability of 0 means that it's "impossible". (I've put scare quotes around those because "certain" and "impossible" aren't exactly correct here, but they're close enough for right now.) A probability of 0.5 means that there's a 50/50 chance that the event will occur.

Probability is foundational to inferential statistics, which is why we're studying it. This makes sense if you think about it: inferential statistics is about reasoning under uncertainty, and probability lets us attach numerical measurements to uncertain events to support that reasoning.

To set the stage for probability theory, it will help to say a few words about the history of the subject and the competing interpretations surrounding the notion of probability itself. This will allow us to better appreciate the motivations driving the subsequent development of the theory.

## A brief history

Probability theory has its historical roots in cryptography (the study of secret codes) and especially in "games of chance" (gambling). The earliest known contributors were Arab mathematicians between the 8th and 13th centuries, including Al-Khalil, Al-Kindi and Ibn Adlan. Later pioneers of probability theory include Cardano, Fermat, Pascal, Huygens and Laplace. Laplace in particular was a key figure in the development of classical probability theory. Later still, the Soviet mathematician Kolmogorov developed a more modern framework for probability theory, and it's the one we use today.

Let's briefly examine both classical and modern probability theory.

### Classical probability theory

*Figure: Pierre-Simon Laplace, a major figure in the development of classical probability theory.*

As I noted above, classical theory is largely due to Pierre-Simon Laplace. It's the more straightforward of the two probability frameworks, but also the more limited. Classical theory makes the following assumptions:

- When performing a probabilistic experiment, there are finitely many outcomes.
- Each of the finitely many outcomes is "equiprobable"; i.e., equally likely to occur.
- Therefore, determining the probability of a given event is simply a matter of counting the number of outcomes in the event and dividing by the total number of possible outcomes.
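
The counting rule above can be sketched in a few lines of Python. This is only an illustration: the two-dice experiment and the helper name `classical_probability` are my own choices for the example, and the function is only valid under the classical assumptions (finitely many outcomes, all equally likely).

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, sample_space):
    """Classical (Laplacian) probability: favorable outcomes / total outcomes.

    Assumes the sample space is finite and every outcome is equally likely.
    """
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


# Hypothetical experiment: roll two fair six-sided dice.
two_dice = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

# Event: the two dice sum to 7.
p_sum_is_7 = classical_probability(lambda roll: sum(roll) == 7, two_dice)
print(p_sum_is_7)  # 1/6
```

Using `Fraction` keeps the answer exact, which mirrors the "count and divide" nature of the classical approach.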

This is an extremely simple approach, and yet it's quite powerful and useful in a wide range of real-world settings. Indeed, we use it all the time today. Yet the assumptions impose some important limitations that a full-blown theory of probability needs to overcome:

- In many applications, there are infinitely many possible outcomes. Consider for example the "experiment" of measuring my commute time to the office. The number of outcomes is infinite. (It could take me 10 minutes, or 10.1 minutes, or 10.113 minutes, or 392,291 minutes, or...) Classical theory doesn't have a way to handle this.
- In many applications, the outcomes aren't equiprobable. For instance, consider an experiment where I flip a biased coin. Classical theory has nothing to say here.

These limitations led to the search for a framework that could overcome them. That brings us to modern probability theory.

### Modern probability theory

*Figure: Andrey Kolmogorov, the father of modern probability theory.*

The modern theory of probability is due to the Soviet mathematician Andrey Kolmogorov. It's the framework we use today, and the one we'll use in this course, even if we keep things pretty basic. Kolmogorov's treatment extends classical theory to handle experiments with infinitely many outcomes by building on an area of mathematics known as *measure theory*. His approach doesn't require outcomes to be equiprobable, and so can easily handle biased coins and many other cases.
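
To see what dropping the equiprobability assumption looks like for a finite experiment, here is a minimal sketch. The 70/30 bias and the helper name `probability` are assumptions made up for the example. The idea is that each outcome carries its own weight, the weights total 1, and the probability of an event is the sum of the weights of its outcomes.

```python
from fractions import Fraction

# Hypothetical biased coin: heads is assumed to come up 70% of the time.
biased_coin = {"H": Fraction(7, 10), "T": Fraction(3, 10)}


def probability(event, weights):
    """Sum the weights of the outcomes that belong to the event."""
    return sum((w for outcome, w in weights.items() if outcome in event), Fraction(0))


# The weights over all outcomes must total 1.
total = sum(biased_coin.values())
print(total)  # 1

# Classical counting would say 1 outcome out of 2, which ignores the bias.
classical_guess = Fraction(1, len(biased_coin))
print(classical_guess)  # 1/2

# Weighting the outcomes gives the answer we actually want.
p_heads = probability({"H"}, biased_coin)
print(p_heads)  # 7/10
```

When all the weights happen to be equal, this reduces to the classical counting rule.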

We won't study measure theory in this course, as it's too advanced for a first course in probability. But there's really no way to escape some of the core measure-theoretic ideas. For example, if there are infinitely many possible commute times for me to get to the office, then how can we determine the probability that it takes me exactly 10.113 minutes? (Answer: the probability is 0, even though it's entirely possible that it takes me 10.113 minutes to get there. Say what?) How can we determine the probability that it takes me between 10 and 20 minutes? Kolmogorov's framework helps us answer such questions.
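
Here is a rough sketch of the commute-time puzzle. It rests on a big assumption that is mine, not part of the theory: that the commute time is uniformly distributed between 5 and 45 minutes. Under that toy model, the probability of an interval is just its length divided by the total length, and a single point has length 0.

```python
def uniform_interval_probability(a, b, low=5.0, high=45.0):
    """P(a <= T <= b) when T is uniformly distributed on [low, high].

    The uniform model and the 5-to-45-minute range are illustrative assumptions.
    """
    # Clip the interval to the range of possible commute times.
    a, b = max(a, low), min(b, high)
    return max(b - a, 0.0) / (high - low)


p_10_to_20 = uniform_interval_probability(10, 20)
print(p_10_to_20)  # 0.25

# Intervals shrinking around 10.113 minutes get smaller and smaller probabilities...
for half_width in (1.0, 0.1, 0.001):
    p = uniform_interval_probability(10.113 - half_width, 10.113 + half_width)
    print(half_width, p)

# ...and the single point 10.113 is an interval of length 0, so its probability is 0.
p_exactly = uniform_interval_probability(10.113, 10.113)
print(p_exactly)  # 0.0
```

A real commute is unlikely to be uniform, but the same reasoning applies to other continuous models: intervals get positive probability, individual points get 0.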

That's enough to give you a flavor of the history. I know you probably weren't expecting a history lesson here, but that's what you get when you show up to my probability and statistics course.

There's one more thing I want to touch on before we move on to the actual mathematics, and that's interpretations of probability.

## Interpretations of probability

In a previous life I studied Philosophy, and one of the things we learned in Philosophy of Science is that
the concept of probability isn't exactly nailed down. There are different interpretations of just what
probability actually *is*. They fall under two categories: *physical* and *evidential*.
For the philosophically minded among you, the physical (or *objective*) interpretations are:

- frequentist interpretation
- propensity interpretation

And here are the evidential (or *Bayesian*) interpretations:

- classical (Laplacian) interpretation and the principle of indifference
- subjective interpretation
- epistemic/inductive interpretation
- logical interpretation

The different interpretations are interesting from a philosophical perspective, and people still debate their respective merits today. Indeed, the split between the frequentist and Bayesian approaches to statistics traces back to differing conceptions of probability. We won't delve into this in this course, however. The reason is that both the classical and modern theories are based on sets of axioms that render the philosophical trappings irrelevant from the perspective of developing the theory itself. This was a major achievement that allowed researchers to make forward progress on the mathematics, even if the philosophy was (and is) still contested.

Well, that was likely more than everything you ever wanted to know about the history and philosophy of probability theory. Now let's jump into the mathematics. Our first topic will be to understand the mathematical framework for modeling experiments.