# Variables

In the previous section, Datasets, we noted that a dataset's columns represent variables. There are different types of variable, and the approach to visualizing and analyzing them depends upon the type. Here's a common scheme for classifying variables:

This section offers an overview of the types of variables in the diagram above.

## Numerical variables

*Numerical variables* are variables whose values are, well, numbers. These could be counts,
temperatures, birth orders, test scores and so forth. We further divide numerical variables into two
categories: discrete numerical variables and continuous numerical variables.

### Discrete variables

*Discrete variables* are numerical variables whose possible values have "gaps" between them. For
example, counts are discrete because they take values like 0, 1, 2, ..., and the numbers in between,
like 0.5 or 0.62, are not possible values. Birth years are another example of a discrete variable.

### Continuous variables

A *continuous variable* is a numerical variable whose possible values are "gapless": for any pair of
values, there's always some possible value in between them. Examples would be distances, temperatures,
durations and so forth.
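
One way to make the discrete/continuous distinction concrete is to ask whether the midpoint between two possible values is itself a possible value. The sketch below uses made-up numbers and two hypothetical helper functions, `is_valid_count` and `is_valid_distance`, which are illustrative assumptions rather than part of any library. Note also that Python floats only approximate truly continuous values.

```python
# Illustrative values only; these numbers are invented for the example.
counts = [0, 1, 2, 3]        # discrete: a count must be a whole number
distances_km = [1.0, 2.0]    # continuous: any non-negative value is possible


def midpoint(a, b):
    """Return the value halfway between a and b."""
    return (a + b) / 2


def is_valid_count(x):
    """Hypothetical check: a count is a non-negative whole number."""
    return x >= 0 and float(x).is_integer()


def is_valid_distance(x):
    """Hypothetical check: a distance is any non-negative number."""
    return x >= 0


count_mid = midpoint(counts[0], counts[1])               # halfway between 0 and 1
dist_mid = midpoint(distances_km[0], distances_km[1])    # halfway between 1.0 and 2.0

print(count_mid, is_valid_count(count_mid))      # the midpoint falls in a "gap"
print(dist_mid, is_valid_distance(dist_mid))     # the midpoint is still a distance
```

The midpoint of two counts lands in a gap and fails the check, while the midpoint of two distances is just another distance.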

## Categorical variables

The second general type of variable is the *categorical variable*, a variable whose values are
categories or labels. The values are usually text, such as color names like `BLACK`, `RED` and so
forth. There are two types of categorical variable: nominal variables and ordinal variables.

### Nominal variables

A *nominal variable* is a categorical variable whose values are categories or labels that have no
particular order. The color names that we just mentioned are an example since there's no specific order to
the colors. A variable whose values are fruit names is another example.

### Ordinal variables

An *ordinal variable* is a categorical variable whose values are ordered. For example, a level
variable with values `HIGH`, `MEDIUM` and `LOW` would be an ordinal variable. T-shirt sizes like
`XL`, `L`, `M`, `S` and `XS` are another example since they are ordered by size.
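
In code, the practical difference between nominal and ordinal values is that an ordinal variable carries an explicit ordering that plain text does not know about. Here is a minimal sketch, assuming a hand-written `SIZE_ORDER` list and a made-up `shirts` sample (both are illustrative names, not from any dataset):

```python
# The ordering is part of the variable's definition; we have to state it ourselves.
SIZE_ORDER = ["XS", "S", "M", "L", "XL"]
RANK = {size: position for position, size in enumerate(SIZE_ORDER)}

# Invented sample of T-shirt sizes, for illustration only.
shirts = ["M", "XL", "XS", "L", "S", "M"]

# Sorting as plain text uses alphabetical order, which ignores the size ordering.
alphabetical = sorted(shirts)

# Sorting by rank respects the ordinal structure.
by_size = sorted(shirts, key=RANK.__getitem__)

# Ordered comparisons such as "largest" only make sense via the rank.
largest = max(shirts, key=RANK.__getitem__)

print(alphabetical)  # ['L', 'M', 'M', 'S', 'XL', 'XS']
print(by_size)       # ['XS', 'S', 'M', 'M', 'L', 'XL']
print(largest)       # XL
```

For a nominal variable such as color names there is no equivalent of `SIZE_ORDER` to write down, so sorting or taking a maximum has no meaningful interpretation.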

## Borderline cases

Not every variable fits neatly into the scheme we describe above. Sometimes it's more of a judgment call.
Take for instance a `Year` variable. Years are numbers, and so in many contexts it might make sense
to treat them as numerical variables. On the other hand, they sometimes behave more like labels. For example,
we might want to count celebrity deaths per year. In that case there's little reason to assume some deeper
numerical relationship between the year and the number of deaths: the year is just a label for a certain
period of time, and a certain number of celebrity deaths occurred during that time. In this context, it
makes more sense to treat the year as categorical data.
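
Counting per year can be sketched as grouping by a label. The example below uses an invented list, `death_years`, purely for illustration; it is not real data. The year acts only as a dictionary key, and converting it to text changes nothing about the result.

```python
from collections import Counter

# Made-up records: one entry per event, holding only the year it occurred.
death_years = [2016, 2014, 2016, 2015, 2016, 2014]

# Treat each year as a label and count how many records carry it.
deaths_per_year = Counter(death_years)

# The same counts result if the years are stored as text labels,
# which shows that no arithmetic on the year is involved.
deaths_per_label = Counter(str(year) for year in death_years)

print(deaths_per_year[2016])   # 3
print(deaths_per_label["2016"])  # 3
```

Nothing here adds, averages or otherwise does arithmetic on the years, which is the sense in which the variable is behaving categorically.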

That wraps up our quick tour of datasets and variables. In the next section we'll look at a key piece of context around the datasets we study: populations vs. samples.

## Exercises

**Exercise 1.** Consider geographical data that includes `Latitude` and
`Longitude` variables. How would you classify these?

**Exercise 2.** Consider a user feedback scale that includes values `AWESOME`, `GOOD`, `OK`,
`POOR` and `TERRIBLE`. What kind of variable is this?

**Exercise 3.** Suppose you have an employee dataset that includes a variable
`EmployeeID`, with values like 16087 or 32104. What kind of variable is this? Explain.

**Exercise 4.** What kind of variable is a US ZIP code (i.e., a postal code)? Explain.