Calculate the Median in R

Your goal

You need to calculate the median of a numeric dataset in R.

Step-by-step tutorial

The right approach depends on the data structure that holds your values. Let's cover three common cases: a plain vector, a data frame, and a tibble. In each of them we'll use the median function from the stats package, which is attached by default in every R session.

Vector data

Here's the approach for plain vector data:

> data <- c(8, 6, 7, 5, 3, 0, 9)
> median(data)
[1] 6
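
Two situations often come up with vector data: an even number of values, and missing values. The sketch below uses made-up vectors (even and with_na are illustrative names, not part of the example above) to show how median handles each one.

```r
# Even number of values: median() averages the two middle values.
even <- c(8, 6, 7, 5, 3, 0)
even_median <- median(even)   # sorted: 0 3 5 6 7 8, middle pair is 5 and 6
print(even_median)            # 5.5

# A missing value makes the result NA unless na.rm = TRUE is set.
with_na <- c(8, 6, 7, NA, 3, 0, 9)
na_median <- median(with_na)
print(na_median)              # NA

# na.rm = TRUE drops the NA before the median is computed.
clean_median <- median(with_na, na.rm = TRUE)  # sorted: 0 3 6 7 8 9
print(clean_median)           # 6.5
```

If your data may contain NA values, passing na.rm = TRUE is usually the simplest fix, provided that silently dropping the missing observations is acceptable for your analysis.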

Data frame

For a standard data frame, we extract a single column as a vector and run median on it:

> precip <- read.csv("precip-central-park.csv")
> head(precip)
  YEAR  JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC ANNUAL
1 1869 2.53 6.87 4.61 1.39 4.15 4.40 3.20 1.76 2.81 6.48 2.03 5.02  45.25
2 1870 4.41 2.83 3.33 5.11 1.83 2.82 3.76 3.07 2.52 4.97 2.42 2.18  39.25
3 1871 2.07 2.72 5.54 3.03 4.04 7.05 5.57 5.60 2.34 7.50 3.56 2.24  51.26
4 1872 1.88 1.29 3.74 2.29 2.68 2.93 7.83 6.29 2.95 3.35 4.08 3.18  42.49
5 1873 5.34 3.80 2.09 4.16 3.69 1.28 4.61 9.56 3.14 2.73 4.63 2.96  47.99
6 1874 5.33 2.04 2.12 8.77 2.24 2.78 5.06 2.43 8.24 1.70 2.30 2.82  45.83
> median(precip[, "ANNUAL"])
[1] 44.55
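
The same idea extends to several columns at once. As a minimal sketch, the code below rebuilds only the six rows printed by head above (with three of the columns) in a data frame called precip_head, a name chosen here for illustration. Because it holds six rows rather than the full file, its medians differ from the 44.55 computed on the complete dataset.

```r
# First six rows of the table shown above, limited to three columns.
# precip_head is an illustrative stand-in for the full CSV file.
precip_head <- data.frame(
  YEAR   = c(1869, 1870, 1871, 1872, 1873, 1874),
  JAN    = c(2.53, 4.41, 2.07, 1.88, 5.34, 5.33),
  ANNUAL = c(45.25, 39.25, 51.26, 42.49, 47.99, 45.83)
)

# $ extracts one column as a plain numeric vector.
annual_median <- median(precip_head$ANNUAL)
print(annual_median)   # 45.54 for these six rows

# sapply() applies median to each selected column and
# returns a named numeric vector, one entry per column.
col_medians <- sapply(precip_head[, c("JAN", "ANNUAL")], median)
print(col_medians)     # JAN 3.47, ANNUAL 45.54
```

Selecting the columns explicitly keeps non-measurement columns such as YEAR out of the result.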

Tibble

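A tibble behaves differently from a standard data frame: single-bracket indexing such as precip[, "ANNUAL"] returns a one-column tibble rather than a vector, and median does not accept a data frame. To get a column median, we extract the column as a vector with double-bracket notation (the $ operator works as well):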

> library(readr)
> precip <- read_csv("precip-central-park.csv")
Parsed with column specification:
cols(
  YEAR = col_double(),
  JAN = col_double(),
  FEB = col_double(),
  MAR = col_double(),
  APR = col_double(),
  MAY = col_double(),
  JUN = col_double(),
  JUL = col_double(),
  AUG = col_double(),
  SEP = col_double(),
  OCT = col_double(),
  NOV = col_double(),
  DEC = col_double(),
  ANNUAL = col_double()
)
> head(precip)
# A tibble: 6 x 14
   YEAR   JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG   SEP   OCT   NOV   DEC
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  1869  2.53  6.87  4.61  1.39  4.15  4.4   3.2   1.76  2.81  6.48  2.03  5.02
2  1870  4.41  2.83  3.33  5.11  1.83  2.82  3.76  3.07  2.52  4.97  2.42  2.18
3  1871  2.07  2.72  5.54  3.03  4.04  7.05  5.57  5.6   2.34  7.5   3.56  2.24
4  1872  1.88  1.29  3.74  2.29  2.68  2.93  7.83  6.29  2.95  3.35  4.08  3.18
5  1873  5.34  3.8   2.09  4.16  3.69  1.28  4.61  9.56  3.14  2.73  4.63  2.96
6  1874  5.33  2.04  2.12  8.77  2.24  2.78  5.06  2.43  8.24  1.7   2.3   2.82
# … with 1 more variable: ANNUAL <dbl>
> median(precip[["ANNUAL"]])
[1] 44.55
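
To see why the bracket style matters, here is a small sketch that assumes the tibble package is installed. It builds a tibble called precip_tbl (an illustrative name) from the six rows printed above, so its median differs from the full-file result.

```r
library(tibble)

# Six-row stand-in for the full dataset; precip_tbl is an illustrative name.
precip_tbl <- tibble(
  YEAR   = c(1869, 1870, 1871, 1872, 1873, 1874),
  ANNUAL = c(45.25, 39.25, 51.26, 42.49, 47.99, 45.83)
)

# Single-bracket indexing on a tibble keeps the result a tibble.
single <- precip_tbl[, "ANNUAL"]
print(is.data.frame(single))   # TRUE

# median() refuses data frames, so this call raises an error;
# tryCatch() captures the condition instead of stopping the script.
attempt <- tryCatch(median(single), error = function(e) e)
print(inherits(attempt, "error"))  # TRUE

# Double brackets and $ both return a plain numeric vector.
bracket_median <- median(precip_tbl[["ANNUAL"]])
dollar_median  <- median(precip_tbl$ANNUAL)
print(bracket_median)          # 45.54 for these six rows
```

Both [[ ]] and $ give the same result here; [[ ]] is the more convenient choice when the column name is stored in a variable.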