# Mapping Categorical Data in pandas

Unlike R, Python has no built-in factor type for categorical data. Factors in R are stored as vectors of integer values, each of which can carry a label.

If our data is in a Series or DataFrame, we can convert these categories to numbers by calling the pandas Series `astype` method with `'category'` and then reading the `.cat.codes` attribute.

#### Nominal Categories

Nominal categories are unordered, e.g. colours, sex, nationality.

In the example below we convert the `vertebrates` column of the `df` DataFrame into its individual categories.

By default the categories are sorted alphabetically, which is why Amphibian is represented by 0 in the example below.

```
import pandas as pd
```

```
df = pd.DataFrame({'vertebrates': ['Bird', 'Bird', 'Mammal', 'Fish', 'Amphibian', 'Reptile', 'Mammal']})
df.vertebrates.astype("category").cat.codes
```
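To keep track of which integer stands for which label, the categories can be read back from the `.cat` accessor. The following is a minimal sketch; the names `vertebrate_cat` and `code_to_label` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'vertebrates': ['Bird', 'Bird', 'Mammal', 'Fish', 'Amphibian', 'Reptile', 'Mammal']})

# Convert once and reuse the categorical Series (hypothetical name)
vertebrate_cat = df.vertebrates.astype("category")

# The categories are stored in sorted order; a label's position is its code
code_to_label = dict(enumerate(vertebrate_cat.cat.categories))
print(code_to_label)

# The integer codes line up with the original rows
print(vertebrate_cat.cat.codes.tolist())
```

Keeping a dictionary like this alongside the codes makes it possible to translate the numbers back into labels later.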

You can also pass the list of vertebrate types in explicitly, so that you have a record of which label matches each code.

Any value missing from this list will be represented by -1.

```
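vertebrate_types = ['Mammal', 'Reptile', 'Bird', 'Amphibian', 'Fish']
# Newer pandas versions take the categories via CategoricalDtype rather than astype keywords
df.vertebrates.astype(pd.CategoricalDtype(categories=vertebrate_types)).cat.codes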
```
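As a quick check of the -1 behaviour, the sketch below deliberately leaves 'Fish' out of the list. The shortened list `known_types` is an assumption made for this illustration, as is the use of `pd.CategoricalDtype` to hold the categories.

```python
import pandas as pd

df = pd.DataFrame({'vertebrates': ['Bird', 'Bird', 'Mammal', 'Fish', 'Amphibian', 'Reptile', 'Mammal']})

# Hypothetical list that omits 'Fish' on purpose
known_types = ['Mammal', 'Reptile', 'Bird', 'Amphibian']
codes = df.vertebrates.astype(pd.CategoricalDtype(categories=known_types)).cat.codes

# Codes follow each label's position in known_types; 'Fish' has no position
print(codes.tolist())
```

The row containing 'Fish' comes back as -1, while every other row gets the index of its label in the list.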

However, there is no inherent relationship between these categories, so it doesn’t necessarily make sense to store them as different numbers on the same scale.

If we want to separate the distinct values out into boolean columns, as we would for data science models such as linear regression, we can use `pd.get_dummies`.

```
pd.get_dummies(df, columns=['vertebrates'])
```
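To make the result concrete, the sketch below inspects the columns that `pd.get_dummies` creates. The name `dummies` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'vertebrates': ['Bird', 'Bird', 'Mammal', 'Fish', 'Amphibian', 'Reptile', 'Mammal']})

dummies = pd.get_dummies(df, columns=['vertebrates'])

# One indicator column per distinct value, named <column>_<value>
print(dummies.columns.tolist())

# Each row has exactly one indicator set
print(dummies.sum(axis=1).tolist())
```

Because every row belongs to exactly one category, the indicators in each row sum to one.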

#### Ordinal Categories

Ordinal categories are ordered, e.g. school grades, price ranges, salary bands.

For ordinal categorical data, you set `ordered=True` on the categorical dtype that is passed to the `astype` method.

```
ordered_satisfaction = ['Very Unhappy', 'Unhappy', 'Neutral', 'Happy', 'Very Happy']
df = pd.DataFrame({'satisfaction':['Mad', 'Happy', 'Unhappy', 'Neutral']})
```

We can have the output categories as text, with NaN for any value that is not in the list of categories:

```
# ordered and categories are set on CategoricalDtype in newer pandas versions
df.satisfaction.astype(
    pd.CategoricalDtype(categories=ordered_satisfaction, ordered=True)
)
```

Or the output categories as numbers that map to the ordered categories. The number -1 is given to any value that is not in the list of categories.

```
# ordered and categories are set on CategoricalDtype in newer pandas versions
df.satisfaction.astype(
    pd.CategoricalDtype(categories=ordered_satisfaction, ordered=True)
).cat.codes
```
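One practical benefit of an ordered categorical is that comparisons and sorting follow the declared order rather than the alphabet. A minimal sketch, using a hypothetical `responses` Series that contains only valid categories (the names `satisfaction_dtype`, `responses` and `above_neutral` are assumptions for this example):

```python
import pandas as pd

ordered_satisfaction = ['Very Unhappy', 'Unhappy', 'Neutral', 'Happy', 'Very Happy']
satisfaction_dtype = pd.CategoricalDtype(categories=ordered_satisfaction, ordered=True)

# Hypothetical responses, all drawn from the declared categories
responses = pd.Series(['Happy', 'Unhappy', 'Neutral', 'Very Happy']).astype(satisfaction_dtype)

# Comparison against a label uses the declared order
above_neutral = responses > 'Neutral'
print(above_neutral.tolist())

# min, max and sorting also respect the declared order
print(responses.min(), responses.max())
print(responses.sort_values().tolist())
```

An unordered categorical would raise an error on the comparison, so `ordered=True` is what makes these operations meaningful.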