Skewness
Another way to think about the shape of a distribution is to look at its symmetry about its mean. Some common situations are as follows:
- a distribution with a longer left tail has left (negative) skew
- a distribution with a longer right tail has right (positive) skew
- a distribution that's symmetric about its mean has zero skew
Here are some plots to help you visualize these cases:
Here are some examples of variables for each type of skewness:
| Left-skewed | Zero skew | Right-skewed |
| --- | --- | --- |
Compare the examples above to the plots just before them, and you should see a correspondence.
There are multiple flavors of the skewness concept, and even more ways of measuring it. We'll focus on the most common one, which is called Pearson's moment coefficient of skewness, or simply the moment coefficient of skewness. It has a fancy name, but if you understand how to measure variance, then it's not too much of a leap to understand skewness. With variance we take the average squared distance from the mean. With skewness we do something similar: we're essentially taking the average cubed z-score (i.e., standardized directed distance from the mean).
Let's look at the formulas.
Measuring skewness
There are separate formulas for measuring population skewness and sample skewness.
Definition. Population skewness is given by
\[
\operatorname{Skew}(X) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - \mu_X}{\sigma_X} \right)^3,
\]
where \(\mu_X\) is the population mean, \(\sigma_X\) is the population standard deviation and \(N\) is the population size.
Definition. Sample skewness is given by
\[
\operatorname{Skew}(x) = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3,
\]
where \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation and \(n\) is the sample size.
These formulas are a bit scary looking, but it's all just standard algebra.
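To make the formulas concrete, here's a minimal Python sketch of both calculations using only the standard library (the function names are mine; the sample version shown is the common adjusted form with the \(\frac{n}{(n-1)(n-2)}\) factor):

```python
from statistics import mean, pstdev, stdev

def population_skewness(data):
    """Average cubed z-score over the full population (divide by N)."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (divide by N)
    n = len(data)
    return sum(((x - mu) / sigma) ** 3 for x in data) / n

def sample_skewness(data):
    """Adjusted sample skewness, using the sample standard deviation."""
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divide by n - 1)
    n = len(data)
    return (n / ((n - 1) * (n - 2))) * sum(((x - xbar) / s) ** 3 for x in data)
```

Note that the only structural differences are which standard deviation is used and the leading factor; both versions are still averaging cubed standardized distances from the mean.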
Interpreting skewness
To help us interpret skewness, let's kick things off with an example.
Example: Test scores for a very difficult test
Here are the scores for a very difficult test that nearly everybody failed:
20, 32, 38, 44, 42, 38, 40, 36, 15, 19, 25, 26, 30, 74, 40, 28, 28, 56
First, here's a histogram of the test scores:
To calculate the skewness, first let's note that we're dealing with a population and not a sample, since we're measuring every test score we care about. So we'll use the population skewness formula. Therefore we need to calculate the population mean and population standard deviation. The mean is \(\mu = \frac{631}{18} \approx 35.06\), and the standard deviation is \(\sigma \approx 13.6523\).
Now we need to sum the cube of each standardized, mean-adjusted value (i.e., each z-score):
| \(X_i\) | \((X_i - \mu) / \sigma\) | \(((X_i - \mu)/\sigma)^3\) |
| --- | --- | --- |
| 20 | -1.1028 | -1.3411 |
| 32 | -0.2238 | -0.0112 |
| 38 | 0.2157 | 0.0100 |
| 44 | 0.6552 | 0.2812 |
| 42 | 0.5087 | 0.1316 |
| 38 | 0.2157 | 0.0100 |
| 40 | 0.3622 | 0.0475 |
| 36 | 0.0692 | 0.0003 |
| 15 | -1.4690 | -3.1700 |
| 19 | -1.1760 | -1.6265 |
| 25 | -0.7365 | -0.3996 |
| 26 | -0.6633 | -0.2918 |
| 30 | -0.3703 | -0.0508 |
| 74 | 2.8526 | 23.2123 |
| 40 | 0.3622 | 0.0475 |
| 28 | -0.5168 | -0.1380 |
| 28 | -0.5168 | -0.1380 |
| 56 | 1.5341 | 3.6107 |
| **Total:** | | 20.1839 |
The sum of the cubed deviations is approximately 20.1839. To get the skewness, we divide by \(N = 18\), yielding \(Skew(scores) \approx 1.1213\). This is in line with what we see in the histogram, since there is a clear right (positive) skew.
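The whole calculation above can be reproduced in a few lines of Python (a sketch; the variable names are mine):

```python
scores = [20, 32, 38, 44, 42, 38, 40, 36, 15,
          19, 25, 26, 30, 74, 40, 28, 28, 56]

n = len(scores)
mu = sum(scores) / n                                     # population mean
sigma = (sum((x - mu) ** 2 for x in scores) / n) ** 0.5  # population std dev

cubed_z = [((x - mu) / sigma) ** 3 for x in scores]      # one entry per table row
skew = sum(cubed_z) / n                                  # average cubed z-score

print(round(mu, 2), round(sigma, 4), round(skew, 4))
```

Running this should agree with the hand calculation: a mean near 35.06, a standard deviation near 13.6523 and a skewness near 1.1213 (small differences in the last decimal place come from rounding in the table).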
OK, so now let's talk about what's going on with skewness. Like other measures of shape, we're looking at how data are distributed across the range of values. And in this case, it's kind of like the variance in that we care about how far away the data are from the mean. But there are some important differences:
- With variance, we're averaging squared raw distances. With skewness, we're averaging cubed, standardized distances. That is, skewness averages cubed z-scores.
- Because variance is dealing with squares, it doesn't care about whether values are greater or less than the mean—it cares only about the distance. (Squares are always non-negative.) With skewness, on the other hand, we're dealing with cubes, which preserve the sign of the value being cubed. So skewness cares about direction—whether the skewness is positive, negative or zero says something about which sorts of value tend to be larger vs. smaller than the mean. And that's part of how skewness allows us to tell the difference between right- and left-skewed data.
- Another effect of skewness dealing with cubes is that cubes "stretch" or "compress" distances even more than squares do. (Recall that variance does the same sort of stretching and compressing relative to the average deviation.) If a value's absolute distance from the mean is less than 1, then cubing will compress it, and do so more than squaring does. If a value's absolute distance from the mean is greater than 1, then cubing amplifies it, again more than squaring does.
- This "stretching" or "amplifying" that cubes do means that a small number of more extreme points in the distribution have disproportionate weight or "leverage". In the test scores example above, notice that almost all of the scores are failing, but then there's the one score of 74, which has a z-score of 2.8526. Cubing that z-score gives us 23.2123, which is far larger than all the rest of the cubed z-scores in the table. That means it has disproportionate influence on the overall average (i.e., on the overall skewness), which is why we end up with a positive skewness. In other words, tail values really matter here.
- Finally, because skewness deals with standardized distances (i.e., z-scores) instead of raw distances, we can more readily compare skewness across datasets.
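To see the leverage of tail values concretely, here's a quick experiment (a sketch, with my own helper function) comparing the test scores' skewness with and without the single outlying score of 74:

```python
def population_skewness(data):
    """Average cubed z-score, using population mean and std dev."""
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    return sum(((x - mu) / sigma) ** 3 for x in data) / n

scores = [20, 32, 38, 44, 42, 38, 40, 36, 15,
          19, 25, 26, 30, 74, 40, 28, 28, 56]
without_74 = [x for x in scores if x != 74]

print(population_skewness(scores))      # strongly positive
print(population_skewness(without_74))  # much closer to zero
```

Removing that one point drops the skewness from above 1 to a fairly modest positive value, which illustrates just how much influence a single tail value can exert.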
Applications of skewness
Sample skewness can be a useful signal for testing the normality of a distribution, since a sample drawn from a normal distribution should have skewness close to zero. See also kurtosis.
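As a rough illustration of that idea (a sketch, not a formal normality test; the function name is mine and the sample skewness formula is the common adjusted form), here's what sample skewness looks like for a large sample drawn from a normal distribution:

```python
import random

def sample_skewness(data):
    """Adjusted sample skewness, using the sample standard deviation."""
    n = len(data)
    xbar = sum(data) / n
    s = (sum((x - xbar) ** 2 for x in data) / (n - 1)) ** 0.5
    return (n / ((n - 1) * (n - 2))) * sum(((x - xbar) / s) ** 3 for x in data)

random.seed(42)  # fixed seed so the result is reproducible
normal_sample = [random.gauss(0, 1) for _ in range(10_000)]
print(sample_skewness(normal_sample))  # should be close to zero
```

A sample skewness far from zero is a signal (though not proof) that the data did not come from a normal distribution; formal tests of normality build on exactly this kind of statistic.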
That does it for our coverage of skewness. Next up is kurtosis, which is a logical extension of skewness.
Exercises
Exercise 1. Consider the small population of scores { 2, 3, 2, 4, 3, 15 }. Calculate the skewness.
Exercise 2. Create a small dataset (maybe \(N = 10\) or so) and try to make it right-skewed. Afterward, measure its skewness to verify the right skew.