# Variance

Our next measure of spread is *variance*. The calculation for the
variance is similar to the calculation for the average deviation,
except that we sum up squared deviations instead of summing up absolute deviations.

**Definition.** Given a numerical dataset
\(X = \{X_{i}\}\), the *population variance* is given by:

\[
\sigma_X^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu_X)^2,
\]

where \(\mu_X\) is the population mean and \(N\) is the population size.
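The definition above translates directly into code. Here's a minimal sketch in Python (the helper name `population_variance` is ours, not a standard library function):

```python
def population_variance(data):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(data)
    mu = sum(data) / n  # population mean
    # Sum the squared deviations, then divide by N (not N - 1).
    return sum((x - mu) ** 2 for x in data) / n

# A small sanity check: mean is 5, squared deviations sum to 32, 32 / 8 = 4.
print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # → 4.0
```

Note the division by \(N\) on the last line of the function; the sample variance we'll meet later divides by \(N - 1\) instead.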

## An important note...

There are actually two different ways to compute the variance of a given set of data. Right now we'll ignore
that subtlety. Later when we get into inferential statistics, we'll consider the problem of using a sample of
data to estimate the corresponding variance in the population at large. At that time we'll learn about the
*sample variance*, which involves a small tweak to the formula above to improve the quality of the
estimates it makes.

Don't worry if this doesn't make much sense right now. We'll come back to it later in the course.

Let's compute the variance for the test score datasets from the previous sections.

## Example: Test scores for a reasonably challenging chemistry test

Here are our test scores for a chemistry test:

83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83

To calculate the variance, first we need to calculate the mean. The mean is \(\mu = \frac{1017}{14} \approx 72.64\).

Now we need to sum up each squared difference (or "squared deviation") between a value and the mean:

| \(X_i\) | \(\mu\) | \(X_i-\mu\) | \((X_i-\mu)^2\) |
|---|---|---|---|
| 83 | 72.64 | 10.36 | 107.27 |
| 87 | 72.64 | 14.36 | 206.13 |
| 61 | 72.64 | -11.64 | 135.56 |
| 92 | 72.64 | 19.36 | 374.70 |
| 38 | 72.64 | -34.64 | 1200.13 |
| 78 | 72.64 | 5.36 | 28.70 |
| 73 | 72.64 | 0.36 | 0.13 |
| 55 | 72.64 | -17.64 | 311.27 |
| 98 | 72.64 | 25.36 | 642.98 |
| 74 | 72.64 | 1.36 | 1.84 |
| 86 | 72.64 | 13.36 | 178.41 |
| 69 | 72.64 | -3.64 | 13.27 |
| 40 | 72.64 | -32.64 | 1065.56 |
| 83 | 72.64 | 10.36 | 107.27 |
| **Total** | | | **4373.21** |

(The deviations are shown rounded to two decimal places, but the squared deviations are computed from the unrounded values — which is why, for example, the first row shows 107.27 rather than \(10.36^2 = 107.33\).)

The sum of the squared deviations is approximately 4,373.21. To get the variance, we divide by \(N = 14\), yielding \(\sigma^2 \approx 312.37\).
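We can double-check this worked example with Python's standard library: `statistics.pvariance` computes exactly this population variance (dividing by \(N\)):

```python
import statistics

scores = [83, 87, 61, 92, 38, 78, 73, 55, 98, 74, 86, 69, 40, 83]

mu = statistics.mean(scores)        # 1017 / 14 ≈ 72.64
var = statistics.pvariance(scores)  # population variance: divides by N

print(round(mu, 2))   # → 72.64
print(round(var, 2))  # → 312.37
```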

## Example: Test scores for an easy math test

Here are the test scores for an easy math test:

100, 100, 93, 92, 95, 98, 100, 100, 100, 95, 94, 88, 92

Here the dataset mean is \(\mu = \frac{1247}{13} \approx 95.92\). As in the last section, I'll leave it as an exercise for you to work through the calculations. But the end result is \(\sigma^2 \approx 14.99\). This corresponds with our intuition that the spread for this easy math test is lower than the spread for the chemistry test.
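If you'd like to check your hand calculations afterward, the same standard-library call verifies these figures:

```python
import statistics

scores = [100, 100, 93, 92, 95, 98, 100, 100, 100, 95, 94, 88, 92]

print(round(statistics.mean(scores), 2))       # → 95.92
print(round(statistics.pvariance(scores), 2))  # → 14.99
```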

## Understanding the variance

We noted earlier that the formula for the variance is quite similar to the one for the average deviation, differing only in its use of squared deviations instead of absolute deviations. Both approaches transform each deviation into a nonnegative value. But squaring is interesting because it reduces the influence of very small deviations (i.e., deviations strictly between 0 and 1 in absolute value) and amplifies the influence of larger deviations (i.e., deviations greater than 1 in absolute value). So it's typical to find the variance significantly larger than the average deviation, which is what we see in the examples above. The standard deviation corrects for this to some extent, as we'll see in the following section.
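A quick numerical check makes this squashing/boosting behavior concrete:

```python
# Squaring shrinks a deviation when 0 < |d| < 1 and amplifies it when |d| > 1.
for d in [0.1, 0.5, 1.0, 2.0, 5.0, 20.0]:
    print(f"deviation {d:>5} -> squared {d * d:>6}")
```

Deviations of 0.1 and 0.5 shrink to 0.01 and 0.25, while 5 and 20 balloon to 25 and 400 — so a handful of large deviations can dominate the variance.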

## Strengths of the variance as a measure of spread

**Incorporates all data values.** The variance
incorporates all data values into the final result.

**Can improve signal quality.** Using squared
deviations essentially boosts signal and limits noise. In other words, squaring boosts important deviations
and squashes unimportant deviations.

## Weaknesses of the variance as a measure of spread

**The calculation is less intuitive.** The way we
calculate the variance is less intuitive than the way we calculate the average deviation. Why does the
variance choose 1 as the cutoff point for boosting vs. reducing the signal? Why do we square the deviation as
opposed to applying some larger even power to get even more boosting/reduction? These seem like arbitrary
choices.

**Scale is generally significantly larger than the
original scale.** The scale of the average deviation is the same as the scale of the original data.
With the variance, the scale is typically larger than that of the original data—sometimes orders of
magnitude larger. This makes the variance harder to interpret, which again points to its being less
intuitive.

**The variance is sensitive to outliers.** As
with the other measures of spread we've seen so far, a single outlier in the dataset will dramatically
change the measurement. In other words, the variance is not robust.

## Exercises

**Exercise 1.** Compute the variance of the following data:

202, 102, 285, 98

Compare it to the average deviation you computed in the previous section for the same dataset. Which is larger? Is this what you expected?

**Exercise 2.** Compute the variance of the following data:

185, 245, 205, 215, 3829, 190

Compare it to the average deviation you computed in the previous section for the same dataset. Which is larger? Is this what you expected? How does the outlier impact the variance?