Creating a Timeline graphic using R and ggplot2

  • Post author:
  • Post category:R

Creating a Timeline graphic using R and ggplot2

In this post we’re going to be using R and ggplot2 to create a project timeline with milestones and milestone statuses.

The finished product will look like this:

Let’s start by importing libraries that we’ll be using, we’ll only need ggplot2, scales, and lubridate for this task.

In [1]:
library(ggplot2)
library(scales)
library(lubridate)

Now let’s load in some data

In [2]:
df <- read.csv('milestones.csv')
df
month year milestone status
6 2017 Milestone 1 Complete
7 2017 Milestone 2 Complete
10 2017 Milestone 3 Complete
12 2017 Milestone 4 Complete
1 2018 Milestone 5 Complete
1 2018 Milestone 6 Complete
2 2018 Milestone 7 Complete
5 2018 Milestone 8 Complete
6 2018 Milestone 9 On Target
6 2018 Milestone 10 On Target
9 2018 Milestone 11 At Risk
11 2018 Milestone 12 On Target
12 2018 Milestone 13 On Target
12 2018 Milestone 14 On Target
12 2018 Milestone 15 At Risk
4 2019 Milestone 16 Critical
7 2019 Milestone 17 On Target
7 2019 Milestone 18 On Target
9 2019 Milestone 19 On Target
10 2019 Milestone 20 At Risk
10 2019 Milestone 21 On Target
12 2019 Milestone 22 Critical

The first thing we’ll do is define a date for each of these rows as the 1st of the month.

In [3]:
df$date <- with(df, ymd(sprintf('%04d%02d%02d', year, month, 1)))
df <- df[with(df, order(date)), ]
head(df)
month year milestone status date
6 2017 Milestone 1 Complete 2017-06-01
7 2017 Milestone 2 Complete 2017-07-01
10 2017 Milestone 3 Complete 2017-10-01
12 2017 Milestone 4 Complete 2017-12-01
1 2018 Milestone 5 Complete 2018-01-01
1 2018 Milestone 6 Complete 2018-01-01

Next we’ll convert the status to an ordinal categorical variable, in order of criticality ranging from “Complete” to “Critical”. We’ll also define some hexadecimal colour values to associate with these statuses.

In [4]:
status_levels <- c("Complete", "On Target", "At Risk", "Critical")
status_colors <- c("#0070C0", "#00B050", "#FFC000", "#C00000")

df$status <- factor(df$status, levels=status_levels, ordered=TRUE)

In our timeline, we want to vary the height and direction of the lines, because otherwise the text for our milestones will clash.

We need to assign the lines and the heights for milestones within the same month to be the same, so we only change the height and position values.

We then order our data frame by date and status, so that the most critical status is plotted last and the colours displayed are for the most critical milestone status.

In [5]:
positions <- c(0.5, -0.5, 1.0, -1.0, 1.5, -1.5)
directions <- c(1, -1)

line_pos <- data.frame(
    "date"=unique(df$date),
    "position"=rep(positions, length.out=length(unique(df$date))),
    "direction"=rep(directions, length.out=length(unique(df$date)))
)

df <- merge(x=df, y=line_pos, by="date", all = TRUE)
df <- df[with(df, order(date, status)), ]

head(df)
date month year milestone status position direction
2017-06-01 6 2017 Milestone 1 Complete 0.5 1
2017-07-01 7 2017 Milestone 2 Complete -0.5 -1
2017-10-01 10 2017 Milestone 3 Complete 1.0 1
2017-12-01 12 2017 Milestone 4 Complete -1.0 -1
2018-01-01 1 2018 Milestone 5 Complete 1.5 1
2018-01-01 1 2018 Milestone 6 Complete 1.5 1

If there are multiple milestones for a given month, we need to slightly alter their positions (slightly higher if above our timeline and slightly lower if below our timeline).

We can do a cumulative count of individual dates to check if we have multiple milestones for a given month.

In [6]:
text_offset <- 0.05

df$month_count <- ave(df$date==df$date, df$date, FUN=cumsum)
df$text_position <- (df$month_count * text_offset * df$direction) + df$position
head(df)
date month year milestone status position direction month_count text_position
2017-06-01 6 2017 Milestone 1 Complete 0.5 1 1 0.55
2017-07-01 7 2017 Milestone 2 Complete -0.5 -1 1 -0.55
2017-10-01 10 2017 Milestone 3 Complete 1.0 1 1 1.05
2017-12-01 12 2017 Milestone 4 Complete -1.0 -1 1 -1.05
2018-01-01 1 2018 Milestone 5 Complete 1.5 1 1 1.55
2018-01-01 1 2018 Milestone 6 Complete 1.5 1 2 1.60

Because we want to display all months on our timelines, not just the months we have events for, we’ll create a data frame containing all of our months.

We’ll start 2 months before the first milestone and end 2 months after the last milestone for a little bit of a buffer.

In [7]:
month_buffer <- 2

month_date_range <- seq(min(df$date) - months(month_buffer), max(df$date) + months(month_buffer), by='month')
month_format <- format(month_date_range, '%b')
month_df <- data.frame(month_date_range, month_format)

We do the same for the years that we also want to display.

We’ll only display years for which there is a December/January crossover, this is what our intersect line is doing.

In [8]:
year_date_range <- seq(min(df$date) - months(month_buffer), max(df$date) + months(month_buffer), by='year')
year_date_range <- as.Date(
    intersect(
        ceiling_date(year_date_range, unit="year"),
        floor_date(year_date_range, unit="year")
    ),  origin = "1970-01-01"
)
year_format <- format(year_date_range, '%Y')
year_df <- data.frame(year_date_range, year_format)

Now that we’ve got our data in a state ready to be plotted, we can put together our plot.

In [9]:
#### PLOT ####

timeline_plot<-ggplot(df,aes(x=date,y=0, col=status, label=milestone))
timeline_plot<-timeline_plot+labs(col="Milestones")
timeline_plot<-timeline_plot+scale_color_manual(values=status_colors, labels=status_levels, drop = FALSE)
timeline_plot<-timeline_plot+theme_classic()

# Plot horizontal black line for timeline
timeline_plot<-timeline_plot+geom_hline(yintercept=0, 
                color = "black", size=0.3)

# Plot vertical segment lines for milestones
timeline_plot<-timeline_plot+geom_segment(data=df[df$month_count == 1,], aes(y=position,yend=0,xend=date), color='black', size=0.2)

# Plot scatter points at zero and date
timeline_plot<-timeline_plot+geom_point(aes(y=0), size=3)

# Don't show axes, appropriately position legend
timeline_plot<-timeline_plot+theme(axis.line.y=element_blank(),
                 axis.text.y=element_blank(),
                 axis.title.x=element_blank(),
                 axis.title.y=element_blank(),
                 axis.ticks.y=element_blank(),
                 axis.text.x =element_blank(),
                 axis.ticks.x =element_blank(),
                 axis.line.x =element_blank(),
                 legend.position = "bottom"
                )

# Show text for each month
timeline_plot<-timeline_plot+geom_text(data=month_df, aes(x=month_date_range,y=-0.1,label=month_format),size=2.5,vjust=0.5, color='black', angle=90)
# Show year text
timeline_plot<-timeline_plot+geom_text(data=year_df, aes(x=year_date_range,y=-0.2,label=year_format, fontface="bold"),size=2.5, color='black')
# Show text for each milestone
timeline_plot<-timeline_plot+geom_text(aes(y=text_position,label=milestone),size=2.5)
print(timeline_plot)