Calculate Covariance in Python

Your goal

You need to calculate the sample covariance between two numerical variables in Python. Your variables are stored in a Pandas DataFrame.

Step-by-step tutorial

In Python, the two major libraries for computing covariance are Pandas and NumPy. Pandas' DataFrame.cov() and NumPy's numpy.cov() both return a covariance matrix rather than a single covariance, so you'll need to pick the value you want out of the matrix.

We'll use Pandas since the data is already in a Pandas DataFrame.
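
To see how a covariance matrix is laid out before working with the real data, here is a minimal sketch on a toy DataFrame. The column names x and y and their values are made up for illustration and are not part of the precipitation file. It assumes the default behavior of DataFrame.cov(), which normalizes by N - 1.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 9.0]})

# DataFrame.cov() returns a square DataFrame labeled by column name.
cov_matrix = toy.cov()

# Diagonal entries hold each column's sample variance.
var_x = cov_matrix.loc["x", "x"]

# Off-diagonal entries hold the covariance between two columns.
# The matrix is symmetric, so ("x", "y") and ("y", "x") agree.
cov_xy = cov_matrix.loc["x", "y"]

# The same covariance computed by hand with the N - 1 denominator.
n = len(toy)
dev_x = toy["x"] - toy["x"].mean()
dev_y = toy["y"] - toy["y"].mean()
manual_cov_xy = (dev_x * dev_y).sum() / (n - 1)

print(cov_matrix)
print(cov_xy, manual_cov_xy)
```

Because the matrix is symmetric, it does not matter which of the two off-diagonal cells you read.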

>>> import pandas as pd
>>> precip = pd.read_csv("precip-central-park.csv")
>>> precip
     YEAR   JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC  ANNUAL
0    1869  2.53  6.87  4.61  1.39  4.15  4.40  3.20  1.76  2.81  6.48  2.03  5.02   45.25
1    1870  4.41  2.83  3.33  5.11  1.83  2.82  3.76  3.07  2.52  4.97  2.42  2.18   39.25
2    1871  2.07  2.72  5.54  3.03  4.04  7.05  5.57  5.60  2.34  7.50  3.56  2.24   51.26
3    1872  1.88  1.29  3.74  2.29  2.68  2.93  7.83  6.29  2.95  3.35  4.08  3.18   42.49
4    1873  5.34  3.80  2.09  4.16  3.69  1.28  4.61  9.56  3.14  2.73  4.63  2.96   47.99
..    ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...     ...
146  2015  5.23  2.04  4.72  2.08  1.86  4.79  3.98  2.35  3.28  3.91  2.01  4.72   40.97
147  2016  4.41  4.40  1.17  1.61  3.75  2.60  7.02  1.97  2.79  4.15  5.41  2.89   42.17
148  2017  4.83  2.48  5.25  3.84  6.38  4.76  4.19  3.34  2.00  4.18  1.58  2.21   45.04
149  2018  2.18  5.83  5.17  5.78  3.53  3.11  7.45  8.59  6.19  3.59  7.62  6.51   65.55
150  2019  3.58  3.14  3.87  4.55  6.82  5.46  5.77  3.70  0.95  6.15  1.95  7.09   53.03

[151 rows x 14 columns]
>>> jan_jul = precip[["JAN", "JUL"]]
>>> jan_jul
      JAN   JUL
0    2.53  3.20
1    4.41  3.76
2    2.07  5.57
3    1.88  7.83
4    5.34  4.61
..    ...   ...
146  5.23  3.98
147  4.41  7.02
148  4.83  4.19
149  2.18  7.45
150  3.58  5.77

[151 rows x 2 columns]
>>> jan_jul.cov()
          JAN       JUL
JAN  2.664147 -0.536097
JUL -0.536097  5.044624
>>> jan_jul.cov()["JAN"]["JUL"]
-0.5360971169977929
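
If you only need the single number, there are other routes to the same value. The sketch below compares three of them on a small DataFrame built from the first five JAN and JUL rows printed above (1869 to 1873), so its result will differ from the full-data covariance. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Only the first five rows shown in the tutorial output, not the full file.
df = pd.DataFrame(
    {
        "JAN": [2.53, 4.41, 2.07, 1.88, 5.34],
        "JUL": [3.20, 3.76, 5.57, 7.83, 4.61],
    }
)

# 1. Build the covariance matrix, then select one cell by label.
from_matrix = df.cov().loc["JAN", "JUL"]

# 2. Series.cov() returns the covariance with another Series directly.
from_series = df["JAN"].cov(df["JUL"])

# 3. numpy.cov() on two 1-D arrays returns a 2x2 matrix;
#    element [0, 1] is the covariance between the two inputs.
from_numpy = np.cov(df["JAN"].to_numpy(), df["JUL"].to_numpy())[0, 1]

print(from_matrix, from_series, from_numpy)
```

All three use the N - 1 denominator by default, so they should agree up to floating-point rounding.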