# Resampling time series data with pandas

In this post, we’ll be going through an example of resampling time series data using pandas.

We’re going to track a self-driving car at 15-minute intervals over a year and create weekly and yearly summaries.

Let’s start by importing some dependencies:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# The old pandas 'display.mpl_style' option no longer exists; use a matplotlib style sheet instead
plt.style.use('ggplot')
%matplotlib inline
```

We’ll be tracking this self-driving car that travels at an average speed between 0 and 60 mph, all day long, all year long.

For each 15-minute period we have the average speed in miles per hour, the distance travelled in miles and the cumulative distance travelled.

Our time series is set to be the index of a pandas DataFrame.

```
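# Named date_index to avoid shadowing Python's built-in range
date_index = pd.date_range('2015-01-01', '2015-12-31', freq='15min')
df = pd.DataFrame(index=date_index)
# Average speed in miles per hour (randint's upper bound is exclusive, so values are 0-59)
df['speed'] = np.random.randint(low=0, high=60, size=len(df.index))
# Distance in miles (speed * 0.25 hours, since each period is 15 minutes)
df['distance'] = df['speed'] * 0.25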
# Cumulative distance travelled
df['cumulative_distance'] = df.distance.cumsum()
```

Let’s take a look at our data:

```
df.head()
```
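Before plotting, it can help to confirm what the index actually looks like, since `resample` relies on it being a `DatetimeIndex`. The following is a minimal sketch that rebuilds the same index and inspects it; the variable names (`index`, `n_rows` and so on) are just for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the same 15-minute index used in this post
index = pd.date_range('2015-01-01', '2015-12-31', freq='15min')
df = pd.DataFrame(index=index)
df['speed'] = np.random.randint(low=0, high=60, size=len(df.index))

# resample() needs a datetime-like index such as DatetimeIndex
is_datetime_index = isinstance(df.index, pd.DatetimeIndex)

# 364 days * 96 quarter-hours per day, plus the final midnight timestamp
n_rows = len(df)
first_stamp, last_stamp = df.index[0], df.index[-1]
print(is_datetime_index, n_rows, first_stamp, last_stamp)
```

Note that the range stops at midnight at the start of 31 December, so the last day of the year is represented by a single timestamp.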

Now, let’s try and plot this data:

```
# Set the figure size before creating the figure so that it applies to this plot
plt.rcParams['figure.figsize'] = 12, 5
fig, ax1 = plt.subplots()
# Second y-axis sharing the same x-axis
ax2 = ax1.twinx()
ax1.plot(df.index, df['speed'], 'g-')
ax2.plot(df.index, df['distance'], 'b-')
ax1.set_xlabel('Date')
ax1.set_ylabel('Speed', color='g')
ax2.set_ylabel('Distance', color='b')
plt.show()
```

Oh dear… Not very pretty, far too many data points.

Let’s start resampling, beginning with a weekly summary.

The `resample` method in pandas is similar to its `groupby` method, as you are essentially grouping by a certain time span. You then specify how the values in each time span should be combined.
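To make the comparison with `groupby` concrete, here is a small sketch on two weeks of made-up speed values (the data and variable names are illustrative, not part of the car example). It resamples by week and also groups by week with `pd.Grouper`, then checks that both give the same result.

```python
import numpy as np
import pandas as pd

# Two weeks of synthetic 15-minute speed readings (illustrative data)
index = pd.date_range('2015-01-01', periods=96 * 14, freq='15min')
speed = pd.Series(np.arange(len(index)) % 60, index=index, name='speed')

# Weekly mean via resample
weekly_resample = speed.resample('W').mean()

# The same grouping expressed with groupby and a time-based Grouper
weekly_groupby = speed.groupby(pd.Grouper(freq='W')).mean()

same_index = weekly_resample.index.equals(weekly_groupby.index)
same_values = np.allclose(weekly_resample.values, weekly_groupby.values)
print(same_index, same_values)
```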

So we’ll start with resampling the speed of our car:

`df.speed.resample()` will be used to resample the speed column of our DataFrame. The `'W'` indicates we want to resample by week. At the bottom of this post is a summary of different time frames. `mean()` is used to indicate we want the mean speed during this period.

With distance, we want the sum of the distances over the week to see how far the car travelled during that week, so in that case we use `sum()`.

With cumulative distance we just want to take the last value as it’s a running cumulative total, so in that case we use `last()`.

```
weekly_summary = pd.DataFrame()
weekly_summary['speed'] = df.speed.resample('W').mean()
weekly_summary['distance'] = df.distance.resample('W').sum()
weekly_summary['cumulative_distance'] = df.cumulative_distance.resample('W').last()
# Select only whole weeks (each weekly bin is labelled by the Sunday that ends it)
weekly_summary = weekly_summary.truncate(before='2015-01-05', after='2015-12-27')
weekly_summary.head()
```
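As an alternative sketch, the three columns can be summarised in one call by passing a dictionary to `agg` on the resampled DataFrame. This version also picks out whole weeks by counting samples (a full week has 7 × 96 = 672 fifteen-minute readings) rather than hard-coding dates. The names `weekly_agg`, `samples_per_week` and `whole_weeks` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the data from this post
index = pd.date_range('2015-01-01', '2015-12-31', freq='15min')
df = pd.DataFrame(index=index)
df['speed'] = np.random.randint(low=0, high=60, size=len(df.index))
df['distance'] = df['speed'] * 0.25
df['cumulative_distance'] = df['distance'].cumsum()

# One aggregation per column, applied to weekly bins
weekly_agg = df.resample('W').agg({
    'speed': 'mean',
    'distance': 'sum',
    'cumulative_distance': 'last',
})

# Keep only the weeks that contain a full 7 days of 15-minute samples
samples_per_week = df['speed'].resample('W').count()
whole_weeks = weekly_agg[samples_per_week == 7 * 96]
print(whole_weeks.index[0], whole_weeks.index[-1], len(whole_weeks))
```

The first and last whole weeks end on 2015-01-11 and 2015-12-27, which matches the range kept by `truncate` above.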

Now we have weekly summary data. Let’s have a look at our plots now.

```
# Set the figure size before creating the figure so that it applies to this plot
plt.rcParams['figure.figsize'] = 12, 5
fig, ax1 = plt.subplots()
# Second y-axis sharing the same x-axis
ax2 = ax1.twinx()
ax1.plot(weekly_summary.index, weekly_summary['speed'], 'g-')
ax2.plot(weekly_summary.index, weekly_summary['distance'], 'b-')
ax1.set_xlabel('Date')
ax1.set_ylabel('Speed', color='g')
ax2.set_ylabel('Distance', color='b')
plt.show()
```

**Much** better.

We can do the same thing for an annual summary:

```
annual_summary = pd.DataFrame()
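# 'AS' is year-start frequency (recent pandas versions prefer the alias 'YS')
annual_summary['speed'] = df.speed.resample('AS').mean()
# Sum the distance column (not speed) to get the total miles travelled
annual_summary['distance'] = df.distance.resample('AS').sum()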
annual_summary['cumulative_distance'] = df.cumulative_distance.resample('AS').last()
annual_summary
```
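A quick sanity check on the annual figures: because all the data falls in one year, the summed distance should equal the final cumulative distance. The sketch below rebuilds the data and verifies this. It uses `pd.offsets.YearBegin()`, the offset object behind the year-start alias, so that the snippet does not depend on which alias spelling your pandas version accepts.

```python
import numpy as np
import pandas as pd

# Rebuild the data from this post
index = pd.date_range('2015-01-01', '2015-12-31', freq='15min')
df = pd.DataFrame(index=index)
df['speed'] = np.random.randint(low=0, high=60, size=len(df.index))
df['distance'] = df['speed'] * 0.25
df['cumulative_distance'] = df['distance'].cumsum()

# Year-start bins, expressed as an offset object instead of a string alias
year_start = pd.offsets.YearBegin()
annual_distance = df['distance'].resample(year_start).sum()
annual_cumulative = df['cumulative_distance'].resample(year_start).last()

# The total for the year should match the last running total
totals_match = np.isclose(annual_distance.iloc[0], annual_cumulative.iloc[0])
print(annual_distance.index[0], totals_match)
```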

### Upsampling data

How about if we wanted 5 minute data from our 15 minute data?

In this case we would want to forward fill our speed data; for this we can use `ffill()` or its alias `pad` (deprecated in recent pandas versions). Our distance and cumulative_distance columns could then be recalculated from these values.

If we wanted to fill with the next value, rather than the previous value, we could use backward fill, `bfill()`.

```
five_minutely_data = pd.DataFrame()
five_minutely_data['speed'] = df.speed.resample('5min').ffill()
# 5 minutes is 1/12 of an hour
five_minutely_data['distance'] = five_minutely_data['speed'] * (1 / 12)
five_minutely_data['cumulative_distance'] = five_minutely_data.distance.cumsum()
```

```
five_minutely_data.head()
```
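The difference between forward and backward fill is easiest to see on a tiny series. This sketch uses three made-up speed readings (not the car data) and upsamples them from 15 minutes to 5 minutes both ways.

```python
import pandas as pd

# Three illustrative readings, 15 minutes apart
index = pd.date_range('2015-01-01 00:00', periods=3, freq='15min')
speed = pd.Series([10, 40, 25], index=index)

# Forward fill: each new 5-minute slot takes the previous known value
forward = speed.resample('5min').ffill()

# Backward fill: each new 5-minute slot takes the next known value
backward = speed.resample('5min').bfill()

print(forward.tolist())   # [10, 10, 10, 40, 40, 40, 25]
print(backward.tolist())  # [10, 40, 40, 40, 25, 25, 25]
```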

### Resampling options

pandas comes with many built-in options for resampling, and you can even define your own methods.
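As a sketch of defining your own method, the example below passes a plain Python function to `agg` on a resampled series. The function `speed_range` and the random data are hypothetical, chosen only to illustrate the idea: it computes the spread between the fastest and slowest reading in each day.

```python
import numpy as np
import pandas as pd

# Three days of illustrative 15-minute speed readings
rng = np.random.default_rng(0)
index = pd.date_range('2015-01-01', periods=96 * 3, freq='15min')
speed = pd.Series(rng.integers(0, 60, size=len(index)), index=index)


def speed_range(values):
    """Custom aggregation: spread between the highest and lowest reading."""
    return values.max() - values.min()


# The function receives the values of each daily bin in turn
daily_range = speed.resample('D').agg(speed_range)
print(daily_range)
```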

In terms of date ranges, the following is a table for common time period options when resampling a time series:

Alias | Description |
---|---|
B | Business day |
D | Calendar day |
W | Weekly |
M | Month end |
Q | Quarter end |
A | Year end |
BA | Business year end |
AS | Year start |
H | Hourly frequency |
T, min | Minutely frequency |
S | Secondly frequency |
L, ms | Millisecond frequency |
U, us | Microsecond frequency |
N, ns | Nanosecond frequency |
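Aliases can be combined with a multiple, as with the `'15min'` and `'5min'` used earlier, and the weekly alias can be anchored to a particular weekday. The sketch below shows a few variations on made-up data; it sticks to `'min'`, `'D'` and `'W'` because some of the other spellings in the table have been renamed in newer pandas releases.

```python
import numpy as np
import pandas as pd

# Two weeks of illustrative 15-minute speed readings
index = pd.date_range('2015-01-01', periods=96 * 14, freq='15min')
speed = pd.Series(np.arange(len(index)) % 60, index=index)

# A multiple of an alias: 60-minute bins, each holding four readings
hourly = speed.resample('60min').mean()

# Calendar-day bins
daily = speed.resample('D').mean()

# Weekly bins anchored so that each week ends on a Monday
weekly_mon = speed.resample('W-MON').mean()

print(len(hourly), len(daily), weekly_mon.index.day_name().tolist())
```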

These are some of the common methods you might use for resampling:

Method | Description |
---|---|
bfill | Backward fill |
count | Count of values |
ffill | Forward fill |
first | First valid data value |
last | Last valid data value |
max | Maximum data value |
mean | Mean of values in time range |
median | Median of values in time range |
min | Minimum data value |
nunique | Number of unique values |
ohlc | Opening value, highest value, lowest value, closing value |
pad | Same as forward fill |
std | Standard deviation of values |
sum | Sum of values |
var | Variance of values |
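Most of these return one value per time span, but `ohlc` returns four columns. Here is a minimal sketch on two days of steadily increasing made-up values, so the open, high, low and close of each day are easy to verify by eye.

```python
import numpy as np
import pandas as pd

# Two days of illustrative readings that simply count upwards from 0
index = pd.date_range('2015-01-01', periods=96 * 2, freq='15min')
speed = pd.Series(np.arange(len(index)), index=index)

# One row per day with open, high, low and close columns
daily_ohlc = speed.resample('D').ohlc()

print(list(daily_ohlc.columns))     # ['open', 'high', 'low', 'close']
print(daily_ohlc.iloc[0].tolist())  # [0, 95, 0, 95]
print(daily_ohlc.iloc[1].tolist())  # [96, 191, 96, 191]
```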