Creating a risk matrix graphic using R and ggplot2

In this post we’re going to look at creating a risk matrix using R and ggplot2 to visualise the time impact, probability, and cost of various tasks.

The finished product will look like this:

Let’s start with our imports, for this post we’ll need ggplot2 and scales:

In [1]:

library(ggplot2)
library(scales)

Let’s bring in our dataset of tasks that are rated for their time impact, probability and cost between 1 (least severe) to 5 (most severe):

In [2]:

dataset <- read.csv("risk_matrix_data.csv")

head(dataset)

task_id	time_impact	probability	cost
id1	1	3	2
id2	1	2	1
id3	3	5	3
id4	1	3	2
id5	3	5	4
id6	4	5	2

Our plot will run between 0 and 420, with each section increasing in size towards the more severe risks as this kind of graphic is often intended to show the most severe risks so we may have many in our top right corner.

Below we define the boundaries and calculate the midpoints for each of the sections.

In [3]:

min_1 <- 0
min_2 <- 30
min_3 <- 90
min_4 <- 180
min_5 <- 300
max_plot = 420

mid_1 <- ((min_2 - min_1) / 2) + min_1
mid_2 <- ((min_3 - min_2) / 2) + min_2
mid_3 <- ((min_4 - min_3) / 2) + min_3
mid_4 <- ((min_5 - min_4) / 2) + min_4
mid_5 <- ((max_plot - min_5) / 2) + min_5

mid_values <- data.frame("val"= c(1, 2, 3, 4, 5), "midpoint"=c(mid_1, mid_2, mid_3, mid_4, mid_5))

We now calculate the position on the x axis time_impact_pos as the midpoint of the section the risk belongs to on the x axis.

We then do the same for the position on the y axis probability_pos.

In [4]:

dataset <- merge(x=dataset, y=mid_values, by.x="time_impact", by.y="val")
names(dataset)[names(dataset) == 'midpoint'] <- 'time_impact_pos'

dataset <- merge(x=dataset, y=mid_values, by.x="probability", by.y="val")
names(dataset)[names(dataset) == 'midpoint'] <- 'probability_pos'

We can have multiple risks within the same section and so if we have additional risks in the same section, we’d like to offset the position of the risk so there’s not a clash.

We start by calculating which section number each risk falls into.

We can then calculate which number risk that risk is within the section using a cumulative sum for each section.

If the time impact (x) is greater than the probability (y), this section is likely to be a wide, short section so we want to offset the position to the left/right.

If the time impact (x) is less than the probability (y), this section is likely to be a narrow, tall section so we want to offset the position to the top/bottom.

We define offsets to accommodate up to 7 risks within the same section.

In [5]:

dataset$section_no <- dataset$time_impact + (dataset$probability - 1) * 5

dataset <- dataset[order(dataset$section_no,-dataset$cost),]

dataset$section_no_count <- ave(dataset$section_no==dataset$section_no, dataset$section_no, FUN=cumsum)

position_offset <- 30

# Initialise offsets to be 0
dataset$time_impact_pos_offset <- 0
dataset$probability_pos_offset <- 0

# If there are 2 risks in the same section,
# and x < y, move y up
dataset$probability_pos_offset[
    (dataset$section_no_count == 2) &
    (dataset$time_impact <= dataset$probability)
] <- position_offset

# If there are 2 risks in the same section,
# and x > y, move x up
dataset$time_impact_pos_offset[
    (dataset$section_no_count == 2) &
    (dataset$time_impact > dataset$probability)
] <- position_offset

# If there are 3 risks in the same section,
# and x < y, move y down for our 3rd risk
dataset$probability_pos_offset[
    (dataset$section_no_count == 3) &
    (dataset$time_impact <= dataset$probability)
] <- -position_offset

# If there are 3 risks in the same section,
# and x > y, move y down for our 3rd risk
dataset$time_impact_pos_offset[
    (dataset$section_no_count == 3) &
    (dataset$time_impact > dataset$probability)
] <- -position_offset

# Move our risk in perpendicular axis for 4th and 5th risks
dataset$probability_pos_offset[
    (dataset$section_no_count == 4) &
    (dataset$time_impact > dataset$probability)
] <- position_offset
dataset$time_impact_pos_offset[
    (dataset$section_no_count == 4) &
    (dataset$time_impact <= dataset$probability)
] <- position_offset
dataset$probability_pos_offset[
    (dataset$section_no_count == 5) &
    (dataset$time_impact > dataset$probability)
] <- -position_offset
dataset$time_impact_pos_offset[
    (dataset$section_no_count == 5) &
    (dataset$time_impact <= dataset$probability)
] <- -position_offset

# Offset both x and y simultaneously for 6th and 7 risks
dataset$probability_pos_offset[dataset$section_no_count == 6] <- position_offset
dataset$time_impact_pos_offset[dataset$section_no_count == 6] <- position_offset
dataset$probability_pos_offset[dataset$section_no_count == 7] <- -position_offset
dataset$time_impact_pos_offset[dataset$section_no_count == 7] <- -position_offset

Because our cost value is categorical, we make it an ordered factor and give meaningful names for the costs.

These different cost levels will correspond to different background colours. We need to have a text colour that is appropriate for these background colours so if the cost is "Low" (light green) or "Medium" (yellow), then we need to have black text so it’s legible.

In [6]:

dataset$cost <- factor(dataset$cost, ordered=TRUE)

factor_values <- c("V. Low", "Low", "Medium", "High", "V. High")

levels(dataset$cost) <- c("V. Low", "Low", "Medium", "High", "V. High")

dataset$text_color <- "white"
dataset$text_color[dataset$cost=="Low" | dataset$cost=="Medium"] <- "black"

We can now set up our plot.

A large chunk of this code is defining the background fills for each of the different sections.

I’ve commented each section of code to illustrate what’s going on.

In [7]:

# Define some colours
dark_green <- '#009900'
light_green <- '#A4C93F'
yellow <- '#FFFD54'
orange <- '#F19C38'
red <- '#EA3224'

# Initialise plot
# Our x and y values are the x and y positions
# offset by our offsets to prevent points colliding
p <- ggplot(dataset,
            aes(x = time_impact_pos + time_impact_pos_offset,
                y = probability_pos + probability_pos_offset,
                fill = cost
               )
           )
# Remove default values
p<-p+theme_classic()

# Label x and y axes
p <- p + xlab('Time Impact')
p <- p + ylab('Probability')

# Scale x axis between 0 and 420
# Define breaks for our axis ticks
# Label ticks with proper values
# Don't expand axis past min and max points
p <- p + scale_x_continuous(
    limits=c(min_1, max_plot),
    breaks = c(mid_1, mid_2, mid_3, mid_4, mid_5),
    labels = factor_values,
    expand=c(0, 0)
)

# Copy for y axis
p <- p + scale_y_continuous(
    limits=c(min_1, max_plot),
    breaks = c(mid_1, mid_2, mid_3, mid_4, mid_5),
    labels = factor_values,
    expand=c(0, 0)
)


### Backgrounds for each section are rectangles
#Row 1 Rects
p <- p + geom_rect(xmin=min_1, xmax=min_2, ymin=min_1, ymax=min_2, fill=dark_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_2, xmax=min_3, ymin=min_1, ymax=min_2, fill=dark_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_3, xmax=min_4, ymin=min_1, ymax=min_2, fill=light_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_4, xmax=min_5, ymin=min_1, ymax=min_2, fill=yellow, size=0.5, color='black')
p <- p + geom_rect(xmin=min_5, xmax=max_plot, ymin=min_1, ymax=min_2, fill=yellow, size=0.5, color='black')
#Row 2 Rects
p <- p + geom_rect(xmin=min_1, xmax=min_2, ymin=min_2, ymax=min_3, fill=dark_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_2, xmax=min_3, ymin=min_2, ymax=min_3, fill=light_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_3, xmax=min_4, ymin=min_2, ymax=min_3, fill=yellow, size=0.5, color='black')
p <- p + geom_rect(xmin=min_4, xmax=min_5, ymin=min_2, ymax=min_3, fill=yellow, size=0.5, color='black')
p <- p + geom_rect(xmin=min_5, xmax=max_plot, ymin=min_2, ymax=min_3, fill=orange, size=0.5, color='black')
#Row 3 Rects
p <- p + geom_rect(xmin=min_1, xmax=min_2, ymin=min_3, ymax=min_4, fill=dark_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_2, xmax=min_3, ymin=min_3, ymax=min_4, fill=light_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_3, xmax=min_4, ymin=min_3, ymax=min_4, fill=yellow, size=0.5, color='black')
p <- p + geom_rect(xmin=min_4, xmax=min_5, ymin=min_3, ymax=min_4, fill=orange, size=0.5, color='black')
p <- p + geom_rect(xmin=min_5, xmax=max_plot, ymin=min_3, ymax=min_4, fill=orange, size=0.5, color='black')
#Row 4 Rects
p <- p + geom_rect(xmin=min_1, xmax=min_2, ymin=min_4, ymax=min_5, fill=dark_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_2, xmax=min_3, ymin=min_4, ymax=min_5, fill=light_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_3, xmax=min_4, ymin=min_4, ymax=min_5, fill=yellow, size=0.5, color='black')
p <- p + geom_rect(xmin=min_4, xmax=min_5, ymin=min_4, ymax=min_5, fill=orange, size=0.5, color='black')
p <- p + geom_rect(xmin=min_5, xmax=max_plot, ymin=min_4, ymax=min_5, fill=red, size=0.5, color='black')
#Row 5 Rects
p <- p + geom_rect(xmin=min_1, xmax=min_2, ymin=min_5, ymax=max_plot, fill=light_green, size=0.5, color='black')
p <- p + geom_rect(xmin=min_2, xmax=min_3, ymin=min_5, ymax=max_plot, fill=yellow, size=0.5, color='black')
p <- p + geom_rect(xmin=min_3, xmax=min_4, ymin=min_5, ymax=max_plot, fill=yellow, size=0.5, color='black')
p <- p + geom_rect(xmin=min_4, xmax=min_5, ymin=min_5, ymax=max_plot, fill=red, size=0.5, color='black')
p <- p + geom_rect(xmin=min_5, xmax=max_plot, ymin=min_5, ymax=max_plot, fill=red, size=0.5, color='black')

# Plot scatter chart
# Shape is a bordered circle
# Size needs to be large enough to allow for id to be legible
p <- p + geom_point(shape=21, size=10)

# Fill points with colours based on cost values
fill_colors <- c("V. Low" = dark_green, "Low" = light_green, "Medium" = yellow, "High" = orange, "V. High" = red)
p <- p + scale_fill_manual(values = fill_colors, name="Cost")

# Fill the points with the ID labels
p <- p + geom_text(aes(label=task_id), size=3, hjust=0.5, vjust=0.5, color=dataset$text_color)

# Put the legend at the bottom
p <- p +theme(
    legend.position = "bottom"
)

# Show the plot
print(p)