Skewness

Another way to think about the shape of a distribution is to look at its symmetry about its mean. Some common situations are as follows:

  • a distribution with a longer left tail has left (negative) skew
  • a distribution with a longer right tail has right (positive) skew
  • a distribution that's symmetric about its mean has zero skew

Here are some plots to help you visualize these cases:

[Figure: Three types of skewness]

Here are some examples of variables for each type of skewness:

Left-skewed:
  • Age at death from natural causes (cancer, heart disease, etc.)
  • Age at retirement
  • Scores on an easy test

Zero skew:
  • IQ scores
  • Shoe size
  • Men's height
  • Blood pressure
  • Daily returns for a given stock

Right-skewed:
  • Age at death from trauma (accident, murder, suicide, etc.)
  • House prices
  • Annual household income
  • Social media follower count

Compare these examples to the plots above and you should see the correspondence.

There are multiple flavors of the skewness concept, and even more ways of measuring it. We'll focus on the most common one, called Pearson's moment coefficient of skewness, or simply the moment coefficient of skewness. It has a fancy name, but if you understand how to measure variance, it's not much of a leap to understand skewness. With variance we take the average squared distance from the mean. With skewness we do something similar: we essentially take the average cubed z-score (i.e., the standardized, signed distance from the mean).

Let's look at the formulas.

Measuring skewness

There are separate formulas for measuring population skewness and sample skewness.

Definition. Population skewness is given by

\[ Skew(X) = \frac{1}{N}\sum _{i=1}^N \left( \frac{X_i-\mu_X}{\sigma_X} \right)^3 \]

where \(\mu_X\) is the population mean, \(\sigma_X\) is the population standard deviation and \(N\) is the population size.

Definition. Sample skewness is given by

\[ b_1 = \frac{\frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})^3} {\left[\frac{1}{n-1} \sum_{i=1}^n (x_i-\bar{x})^2\right]^{3/2}} \]

where \(\bar{x}\) is the sample mean and \(n\) is the sample size.

These formulas are a bit scary-looking, but it's all just standard algebra.
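
If it helps to see the formulas as code, here's a minimal sketch in Python (the function names are just illustrative, not from any particular library):

    import math

    def population_skewness(values):
        """Average cubed z-score, using the population mean and
        population standard deviation (divide by N, not N - 1)."""
        n = len(values)
        mu = sum(values) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
        return sum(((x - mu) / sigma) ** 3 for x in values) / n

    def sample_skewness(values):
        """Sample skewness b_1: the mean cubed deviation divided by
        the cube of the sample standard deviation (n - 1 denominator)."""
        n = len(values)
        xbar = sum(values) / n
        m3 = sum((x - xbar) ** 3 for x in values) / n
        s2 = sum((x - xbar) ** 2 for x in values) / (n - 1)
        return m3 / s2 ** 1.5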

Interpreting skewness

To help us interpret skewness, let's kick things off with an example.

Example: Test scores for a very difficult test

Here are the scores for a very difficult test that nearly everybody failed:

20, 32, 38, 44, 42, 38, 40, 36, 15, 19, 25, 26, 30, 74, 40, 28, 28, 56

First, here's a histogram of the test scores:

[Figure: Histogram of test scores]

To calculate the skewness, first let's note that we're dealing with a population and not a sample, since we're measuring every test score we care about. So we'll use the population skewness formula. Therefore we need to calculate the population mean and population standard deviation. The mean is \(\mu = \frac{631}{18} \approx 35.06\), and the standard deviation is \(\sigma \approx 13.6523\).

Now we need to cube each z-score (i.e., each standardized deviation from the mean) and sum the results:

\(X_i\)    \((X_i-\mu)/\sigma\)    \(((X_i-\mu)/\sigma)^3\)
20         -1.1028                 -1.3411
32         -0.2238                 -0.0112
38          0.2157                  0.0100
44          0.6552                  0.2812
42          0.5087                  0.1316
38          0.2157                  0.0100
40          0.3622                  0.0475
36          0.0692                  0.0003
15         -1.4690                 -3.1700
19         -1.1760                 -1.6265
25         -0.7365                 -0.3996
26         -0.6633                 -0.2918
30         -0.3703                 -0.0508
74          2.8526                 23.2123
40          0.3622                  0.0475
28         -0.5168                 -0.1380
28         -0.5168                 -0.1380
56          1.5341                  3.6107
Total:                             20.1839

The sum of the cubed z-scores is approximately 20.1839. To get the skewness, we divide by \(N = 18\), yielding \(Skew(scores) \approx 1.1213\). This is in line with what we see in the histogram: there is a clear right (positive) skew.
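
If you'd like to verify this without building the table by hand, here's a quick Python sketch that mirrors the population formula directly (any tiny discrepancy against the table's total comes from rounding the z-scores to four decimal places):

    import math

    scores = [20, 32, 38, 44, 42, 38, 40, 36, 15,
              19, 25, 26, 30, 74, 40, 28, 28, 56]

    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    skew = sum(((x - mu) / sigma) ** 3 for x in scores) / n

    print(f"mean={mu:.4f}, sd={sigma:.4f}, skew={skew:.4f}")
    # mean=35.0556, sd=13.6523, skew=1.1213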

OK, so now let's talk about what's going on with skewness. Like other measures of shape, we're looking at how data are distributed across the range of values. And in this case, it's kind of like the variance in that we care about how far away the data are from the mean. But there are some important differences:

  • With variance, we're averaging squared raw distances. With skewness, we're averaging cubed, standardized distances. That is, skewness averages cubed z-scores.
  • Because variance deals with squares, it doesn't care whether values are greater or less than the mean; it cares only about the distance, since squares are always non-negative. With skewness, on the other hand, we're dealing with cubes, which preserve the sign of the value being cubed. So skewness cares about direction: whether the skewness is positive, negative or zero says something about which sorts of values tend to be larger vs. smaller than the mean. And that's part of how skewness allows us to tell the difference between right- and left-skewed data.
  • Another effect of dealing with cubes is that cubes "stretch" or "compress" distances even more than squares do. (Recall that variance does the same sort of stretching and compressing relative to the average deviation.) If a value's absolute z-score is less than 1, cubing compresses it, and does so more than squaring does. If a value's absolute z-score is greater than 1, cubing amplifies it, again more than squaring does.
  • This "stretching" or "amplifying" that cubes do means that a small number of more extreme points in the distribution have disproportionate weight or "leverage". In the test scores example above, notice that almost all of the scores are failing, but then there's the one score of 74, which has a z-score of 2.8526. Cubing that z-score gives us 23.2123, which is far larger in magnitude than any other cubed z-score in the table. That means it has a disproportionate influence on the overall average (i.e., on the overall skewness), which is why we end up with a positive skewness. In other words, tail values really matter here. (The sketch just after this list makes this concrete.)
  • Finally, because skewness deals with standardized distances (i.e., z-scores) instead of raw distances, we can more readily compare skewness across datasets.
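
To make that leverage point concrete, here's a small sketch (restating the population formula so the snippet stands alone) that recomputes the skewness of the test scores with the single extreme score of 74 removed:

    import math

    def population_skewness(values):
        n = len(values)
        mu = sum(values) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
        return sum(((x - mu) / sigma) ** 3 for x in values) / n

    scores = [20, 32, 38, 44, 42, 38, 40, 36, 15,
              19, 25, 26, 30, 74, 40, 28, 28, 56]

    print(population_skewness(scores))                          # ~1.12
    print(population_skewness([x for x in scores if x != 74]))  # ~0.25

One extreme point accounts for most of the skewness: dropping it leaves the distribution only mildly right-skewed (the 56 still pulls it positive).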

Applications of skewness

Sample skewness can be a useful signal for testing the normality of a distribution, since a sample drawn from a normal distribution should have skewness close to zero. See also kurtosis.
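
For instance, here's a rough sketch using SciPy (assuming NumPy and SciPy are installed): scipy.stats.skewtest tests the null hypothesis that a sample was drawn from a normally distributed population, based on the sample's skewness (it requires at least 8 observations):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=42)

    # One sample from a normal distribution (skewness near zero) and
    # one from an exponential distribution (clearly right-skewed).
    normal_sample = rng.normal(loc=50, scale=10, size=500)
    skewed_sample = rng.exponential(scale=10, size=500)

    for name, sample in (("normal", normal_sample),
                         ("exponential", skewed_sample)):
        statistic, p_value = stats.skewtest(sample)
        print(f"{name}: skewness={stats.skew(sample):+.3f}, p={p_value:.4f}")

A small p-value is evidence against normality, at least with respect to skewness; the normal sample should yield a comfortably large p-value.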

That does it for our coverage of skewness. Next up is kurtosis, which is a logical extension of skewness.

Exercises

Exercise 1. Consider the small population of scores { 2, 3, 2, 4, 3, 15 }. Calculate the skewness.

Exercise 2. Create a small dataset (maybe \(N = 10\) or so) and try to make it right-skewed. Afterward, measure its skewness to verify the right skew.