Calculate Skewness in Python

Your goal

You need to calculate the skewness of numerical data in Python.

Step-by-step tutorial

We can use the skew function from SciPy's scipy.stats module to calculate skewness. Let's use the age column from the Titanic dataset. Although this column is semantically numeric, it contains the string ? in some rows, so pandas reads it as an object column. We first coerce it to a numeric column with pd.to_numeric and errors="coerce", which converts the ? strings to NaN.

>>> import pandas as pd
>>> from scipy.stats import skew
>>> titanic_df = pd.read_csv("titanic-full.csv")
>>> age = titanic_df["age"]
>>> age
0         29
1         29
2         29
3         29
4         29
        ...
1304    14.5
1305       ?
1306    26.5
1307      27
1308      29
Name: age, Length: 1309, dtype: object
>>> age = pd.to_numeric(age, errors="coerce")
>>> age
0       29.0
1       29.0
2       29.0
3       29.0
4       29.0
        ...
1304    14.5
1305     NaN
1306    26.5
1307    27.0
1308    29.0
Name: age, Length: 1309, dtype: float64
>>> skew(age, nan_policy="omit")
masked_array(data=0.39385636,
             mask=False,
       fill_value=1e+20)

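Note that we passed nan_policy="omit" so that the NaNs are excluded from the calculation. With the default, nan_policy="propagate", skew returns nan whenever the input contains a NaN. The positive result (about 0.39) indicates that the age distribution has a somewhat longer tail on the right. Depending on your SciPy version, the result may be displayed as a plain float rather than a masked_array.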
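To make the calculation less of a black box, here is a minimal pure-Python sketch of the quantity that skew reports with its default settings, assuming the default bias=True: the third central moment divided by the second central moment raised to the power 1.5, with NaNs dropped first. The function name sample_skewness and the ages list are made up for illustration and are not part of SciPy or the Titanic dataset.

```python
import math


def sample_skewness(values):
    """Biased sample skewness (m3 / m2**1.5), ignoring NaN entries."""
    # Drop NaNs first, mirroring what nan_policy="omit" does.
    data = [x for x in values if not math.isnan(x)]
    n = len(data)
    mean = sum(data) / n
    # Second and third central moments, dividing by n (not n - 1).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    # Undefined (division by zero) if all values are identical.
    return m3 / m2 ** 1.5


# Hypothetical values, including one missing entry.
ages = [22.0, 38.0, 26.0, float("nan"), 35.0, 54.0, 2.0]
print(sample_skewness(ages))
```

A symmetric sample such as [1.0, 2.0, 3.0] gives 0, while a sample with a long right tail gives a positive value. In practice, prefer skew from scipy.stats; this sketch only illustrates the arithmetic.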