Forest Plot (with Horizontal Bands)

Forest plots are often used in clinical trial reports to show differences in the estimated treatment effect(s) across various patient subgroups. See, for example a review. The page on Clinical Trials Safety Graphics includes a SAS code for a forest plot that depicts the hazard ratios for various patient subgroups (this web page has links to other interesting clinical trial graphics, as well as links to notes on best practices for graphs, and is worth checking out).

The purpose of this blog post is to create the same forest plot using R.

It should be possible to create such a graphic from first principles, using either base R graphics or using the ggplot2 package such as posted here. However, there is a contributed package forestplot that makes it very easy to make forest plots interspersed with tables – we just need to supply the right arguments to the forestplot function in the package. Also, one question that arose was how easy it would be to get the horizontal grey bands for alternate rows in the forest plot. This too was not very difficult. Following is a short explanation of the entire process, as well as the relevant R code:

First, we store the data for the plot, ForestPlotData in any convenient file format and then read it into a dataframe in R:

workdir <- ""C:\\Path\\To\\Relevant\\Directory""
datafile <- file.path(workdir,"ForestPlotData.csv")
data <- read.csv(datafile, stringsAsFactors=FALSE)

Then format the data a bit so that the column labels and columns match the required graphical output:

## Labels defining subgroups are a little indented!
subgps <- c(4,5,8,9,12,13,16,17,20,21,24,25,28,29,32,33)
data$Variable[subgps] <- paste("  ",data$Variable[subgps]) 
## Combine the count and percent column
np <- ifelse(!$Count), paste(data$Count," (",data$Percent,")",sep=""), NA)
## The rest of the columns in the table. 
tabletext <- cbind(c("Subgroup","\n",data$Variable), 
                    c("No. of Patients (%)","\n",np), 
                    c("4-Yr Cum. Event Rate\n PCI","\n",data$PCI.Group), 
                    c("4-Yr Cum. Event Rate\n Medical Therapy","\n",data$Medical.Therapy.Group), 
                    c("P Value","\n",data$P.Value))

Finally, include the forestplot R package and call the forestplot function with appropriate arguments.

The way I got around to creating the horizontal band at every alternate row was by using settings for a very thick transparent line in the hrzl_lines argument! See below. The col=”#99999922″ option gives the light grey color to the line as well as sets it to be transparent.

A graphics device (here, a png file) with appropriate dimensions is first opened and the forest plot is saved to the device.

png(file.path(workdir,"Figures\\Forestplot.png"),width=960, height=640)
forestplot(labeltext=tabletext, graph.pos=3, 
           lower=c(NA,NA,data$Low), upper=c(NA,NA,data$High),
           title="Hazard Ratio",
           xlab="     <---PCI Better---    ---Medical Therapy Better--->",
           hrzl_lines=list("3" = gpar(lwd=1, col="#99999922"), 
                          "7" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922"),
                          "15" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922"),
                          "23" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922"),
                          "31" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922")),
                              xlab=gpar(cex = 1.2),
                              title=gpar(cex = 1.2)),
           col=fpColors(box="black", lines="black", zero = "gray50"),
           zero=1, cex=0.9, lineheight = "auto", boxsize=0.5, colgap=unit(6,"mm"),
 , ci.vertices=TRUE, ci.vertices.height = 0.4)

Here is the resulting forest plot:


Waterfall plots – what and how?

Waterfall plots” are nowadays often used in oncology clinical trials for a graphical representation of the quantitative response of each subject to treatment. For an informative article explaining waterfall plots see Understanding Waterfall Plots.

In this post, we illustrate the creation of waterfall plots in R.

In a typical waterfall plot, the x-axis serves as the baseline value of the response variable. For each subject, vertical bars are drawn from the baseline, either in the positive or negative direction to depict the change from baseline in the response for the subject. The y-axis thus represents the change from baseline in the response, usually expressed as a percentage, for e.g., percent change in the size of the tumor or percent change in some marker level. Most importantly, in a waterfall plot, the bars are ordered in the decreasing order of the percent change values.

Though waterfall plots have gained popularity in oncology, they can be used for data visualization in other clinical trials as well, where the response is expressed as a change from baseline.


Instead of a tumor growth dataset, we illustrate creation of waterfall plots for the visual depiction of a quality of life data. A quality of life dataset, dataqol2 is available with the R package QoLR.

dataqol2$id <- as.factor(dataqol2$id)
dataqol2$time <- as.factor(dataqol2$time)
dataqol2$arm <- as.factor(dataqol2$arm)

dataqol2 contains longitudinal data on scores for 2 quality of life measures (QoL and pain) for 60 subjects. In the case of QoL, higher scores are better since they imply better quality of life, and for pain, lower scores are better since they imply a decrease in pain. Each subject has these scores recorded at baseline (time = 0) and then at a maximum of 5 more time points post baseline. ‘arm’ represents the treatment arm to which the subjects were assigned. The dataset is in long format.

The rest of this post is on creating a waterfall plot in R for the QoL response variable.

Creating a waterfall plot using the barplot function in base R

The waterfall plot is basically an ‘ordered bar chart’, where each bar represents the change from baseline response measure for the corresponding subject.

As the first step, it would be helpful if we change the format of the dataset from ‘long’ to ‘wide’. We use the reshape function to do this. Also, we retain only the QoL scores, but not the pain scores:

qol2.wide <- reshape(dataqol2, v.names="QoL", idvar = "id", timevar = "time", direction = "wide", drop=c("date","pain"))

For each subject, we then find the best (largest) QoL score value post baseline, compute the best percentage change from baseline and order the dataframe in the decreasing order of the best percentage changes. We also remove subjects with missing percent change values:

qol2.wide$bestQoL <- apply(qol2.wide[,5:9], 1 ,function(x) ifelse(sum(! == 0, NA, max(x,na.rm=TRUE)))
qol2.wide$bestQoL.PerChb <- ((qol2.wide$bestQoL-qol2.wide$QoL.0)/qol2.wide$QoL.0)*100

o <- order(qol2.wide$bestQoL.PerChb,decreasing=TRUE,na.last=NA)
qol2.wide <- qol2.wide[o,]

Create the waterfall plot… Finally!

barplot(qol2.wide$bestQoL.PerChb, col="blue", border="blue", space=0.5, ylim=c(-100,100),
main = "Waterfall plot for changes in QoL scores", ylab="Change from baseline (%) in QoL score",
cex.axis=1.2, cex.lab=1.4)


Since we are depicting changes in quality of life scores, the higher the bar is in the positive direction, the better the improvement in the quality of life. So, the above figure shows that, for most subjects, there was improvement in the quality of life post baseline.

We can also color the bars differently by treatment arm, and include a legend. I used the choose_palette() function from the excellent colorspace R package to get some nice colors.

col <- ifelse(qol2.wide$arm == 0, "#BC5A42", "#009296")
barplot(qol2.wide$bestQoL.PerChb, col=col, border=col, space=0.5, ylim=c(-100,100),
main = "Waterfall plot for changes in QoL scores", ylab="Change from baseline (%) in QoL score",
cex.axis=1.2, cex.lab=1.4, legend.text=c(0,1),
args.legend=list(title="Treatment arm", fill=c("#BC5A42","#009296"), border=NA, cex=0.9))


Treatment arm 1 is associated with the largest post baseline increases in the quality of life score. Since waterfall plots are basically bar charts, they can be colored by other relevant subject attributes as well.

The above is a solution to creating waterfall plots using base R graphics function barplot. It is my aim to simultaneously also develop a solution using the ggplot2 package (and in the process, develop expertise in ggplot2). So here it is…

Creating a waterfall plot using ggplot2

We use the previously created qol2.wide dataframe, but in ggplot2, we also need an x variable. So:

x <- 1:nrow(qol2.wide)

Next we specify some plot settings, we color bars differently by treatment arm and allow the default colors of ggplot2, since I think they are quite nice. We also want to remove the x-axis, and put sensible limits for the y-axis:

b <- ggplot(qol2.wide, aes(x=x, y=bestQoL.PerChb, fill=arm, color=arm)) +
scale_fill_discrete(name="Treatment\narm") + scale_color_discrete(guide="none") +
labs(list(title = "Waterfall plot for changes in QoL scores", x = NULL, y = "Change from baseline (%) in QoL score")) +
theme_classic() %+replace%
theme(axis.line.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.title.y = element_text(face="bold",angle=90)) +
coord_cartesian(ylim = c(-100,100))

Finally, the actual bars are drawn using geom_bar(), and we specify the width of the bars and the space between bars. We specify stat="identity" because we want the heights of the bars to represent actual values in the data. See ?geom_bar

 b <- b + geom_bar(stat="identity", width=0.7, position = position_dodge(width=0.4))


Update: Readers pointed out about a ‘waterfall chart’ in finance that seems to be somewhat different than the graphic discussed in this post, and they seem to use the word ‘chart’ instead of ‘plot’. Here is some info, that also includes some R code for the waterfall chart used in finance. Here is yet another plot referred to as ‘waterfall plot’ that seems to be used to display spectra.

Waterfall seems to be quite a popular name for plots!