Calculate Kurtosis in Python

Your goal

You need to calculate the kurtosis of numerical data in Python.

Step-by-step tutorial

We can use the SciPy library to calculate kurtosis. Let's use the age column from the Titanic dataset. Although this column is semantically numeric, it contains the string ? in some rows, so pandas reads it as an object column. We need to coerce it into a numeric column, which converts the ? strings to NaN.

The SciPy kurtosis function can calculate either Fisher's excess kurtosis (a normal distribution scores 0) or Pearson's non-excess kurtosis (a normal distribution scores 3). The default is Fisher's version; pass fisher=False to get Pearson's.
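To make the difference between the two definitions concrete, here is a minimal sketch on a small made-up sample (the values in x are illustrative and not taken from the Titanic data). It compares SciPy's two results and checks them against the moment formula, assuming the default biased estimator (bias=True), where kurtosis is the fourth central moment divided by the squared variance.

```python
import numpy as np
from scipy.stats import kurtosis

# Illustrative sample only; not from the Titanic dataset.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Fisher's definition (the default): a normal distribution scores 0.
fisher_value = kurtosis(x)

# Pearson's definition: a normal distribution scores 3.
pearson_value = kurtosis(x, fisher=False)

# Manual check using biased (1/n) central moments.
deviations = x - x.mean()
m2 = np.mean(deviations ** 2)  # second central moment (variance)
m4 = np.mean(deviations ** 4)  # fourth central moment
manual_pearson = m4 / m2 ** 2

print(fisher_value, pearson_value, manual_pearson)
```

The two SciPy results differ by exactly 3, and the manual value should match the Pearson result up to floating-point rounding.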

>>> import pandas as pd
>>> from scipy.stats import kurtosis
>>> titanic_df = pd.read_csv("titanic-full.csv")
>>> age = titanic_df["age"]
>>> age
0         29
1         29
2         29
3         29
4         29
        ...
1304    14.5
1305       ?
1306    26.5
1307      27
1308      29
Name: age, Length: 1309, dtype: object
>>> age = pd.to_numeric(age, errors="coerce")
>>> age
0       29.0
1       29.0
2       29.0
3       29.0
4       29.0
        ...
1304    14.5
1305     NaN
1306    26.5
1307    27.0
1308    29.0
Name: age, Length: 1309, dtype: float64
>>> kurtosis(age, nan_policy="omit")
0.14452422741436566

Note that we used nan_policy="omit" to exclude the NaNs from the calculation. With the default, nan_policy="propagate", any NaN in the input makes the result nan.
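The effect of nan_policy can be seen on a small made-up array (the values below are illustrative only). This sketch compares the default behavior with nan_policy="omit" and with dropping the NaNs by hand before the call.

```python
import numpy as np
from scipy.stats import kurtosis

# Illustrative values only; the last entry is a missing value.
values = np.array([1.0, 2.0, 2.0, 3.0, 8.0, np.nan])

# Default nan_policy="propagate": a single NaN makes the result nan.
propagated = kurtosis(values)

# nan_policy="omit": NaNs are ignored in the calculation.
omitted = kurtosis(values, nan_policy="omit")

# Equivalent approach: filter out the NaNs first, then call kurtosis.
dropped_first = kurtosis(values[~np.isnan(values)])

print(propagated, float(omitted), float(dropped_first))
```

The omitted and dropped_first results should agree, so either approach works; nan_policy="omit" simply saves the filtering step.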