Forest Plot (with Horizontal Bands)

Forest plots are often used in clinical trial reports to show differences in the estimated treatment effect(s) across various patient subgroups. See, for example a review. The page on Clinical Trials Safety Graphics includes a SAS code for a forest plot that depicts the hazard ratios for various patient subgroups (this web page has links to other interesting clinical trial graphics, as well as links to notes on best practices for graphs, and is worth checking out).

The purpose of this blog post is to create the same forest plot using R.

It should be possible to create such a graphic from first principles, using either base R graphics or using the ggplot2 package such as posted here. However, there is a contributed package forestplot that makes it very easy to make forest plots interspersed with tables – we just need to supply the right arguments to the forestplot function in the package. Also, one question that arose was how easy it would be to get the horizontal grey bands for alternate rows in the forest plot. This too was not very difficult. Following is a short explanation of the entire process, as well as the relevant R code:

First, we store the data for the plot, ForestPlotData in any convenient file format and then read it into a dataframe in R:

workdir <- ""C:\\Path\\To\\Relevant\\Directory""
datafile <- file.path(workdir,"ForestPlotData.csv")
data <- read.csv(datafile, stringsAsFactors=FALSE)

Then format the data a bit so that the column labels and columns match the required graphical output:

## Labels defining subgroups are a little indented!
subgps <- c(4,5,8,9,12,13,16,17,20,21,24,25,28,29,32,33)
data$Variable[subgps] <- paste("  ",data$Variable[subgps]) 
 
## Combine the count and percent column
np <- ifelse(!is.na(data$Count), paste(data$Count," (",data$Percent,")",sep=""), NA)
 
## The rest of the columns in the table. 
tabletext <- cbind(c("Subgroup","\n",data$Variable), 
                    c("No. of Patients (%)","\n",np), 
                    c("4-Yr Cum. Event Rate\n PCI","\n",data$PCI.Group), 
                    c("4-Yr Cum. Event Rate\n Medical Therapy","\n",data$Medical.Therapy.Group), 
                    c("P Value","\n",data$P.Value))

Finally, include the forestplot R package and call the forestplot function with appropriate arguments.

The way I got around to creating the horizontal band at every alternate row was by using settings for a very thick transparent line in the hrzl_lines argument! See below. The col=”#99999922″ option gives the light grey color to the line as well as sets it to be transparent.

A graphics device (here, a png file) with appropriate dimensions is first opened and the forest plot is saved to the device.

library(forestplot)
png(file.path(workdir,"Figures\\Forestplot.png"),width=960, height=640)
forestplot(labeltext=tabletext, graph.pos=3, 
           mean=c(NA,NA,data$Point.Estimate), 
           lower=c(NA,NA,data$Low), upper=c(NA,NA,data$High),
           title="Hazard Ratio",
           xlab="     <---PCI Better---    ---Medical Therapy Better--->",
           hrzl_lines=list("3" = gpar(lwd=1, col="#99999922"), 
                          "7" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922"),
                          "15" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922"),
                          "23" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922"),
                          "31" = gpar(lwd=60, lineend="butt", columns=c(2:6), col="#99999922")),
           txt_gp=fpTxtGp(label=gpar(cex=1.25),
                              ticks=gpar(cex=1.1),
                              xlab=gpar(cex = 1.2),
                              title=gpar(cex = 1.2)),
           col=fpColors(box="black", lines="black", zero = "gray50"),
           zero=1, cex=0.9, lineheight = "auto", boxsize=0.5, colgap=unit(6,"mm"),
           lwd.ci=2, ci.vertices=TRUE, ci.vertices.height = 0.4)
dev.off()

Here is the resulting forest plot:

Advertisements

14 thoughts on “Forest Plot (with Horizontal Bands)

  1. Hi Jyothi! Thanks for such a great R package. I use it a lot at my company actually. I have a naïve question because I am a novice R user – how do you add a footnote to the plot? I just want to add the date and time as well as the path indicating the location of the source documents. I guess it would take up multiple lines (3?) Thank you!

    Like

    • Hi! Thanks!, But I’m not the author of the forestplot package 🙂 I just used the package.
      With respect to your question, it does look like the forestplot function does not have an argument to support footnotes. However, since it is based on grid graphics, without making too much of a change in the rest of the plot alignment, we can probably add two lines of text by including the following code just before the dev.off() statement:

      grid.text(paste(“Date & time:”,date()), 0.01, 0.03, just=0, gp=gpar(cex=1))
      grid.text(paste(“Source:”,”C:\\Path\\to\\source_file_location\\”), 0.01, 0.01, just=0, gp=gpar(cex=1))

      The above should put the time stamp and file location below the plot. Hope this helps.

      Like

  2. Hello I cannot obtain your same graphs: I received a message after running png(file.path(workdir,”Figures\\Forestplot.png”),width=960, height=640)

    of Error in png(file.path(workdir, “Figures\\Forestplot.png”), width = 960, : unable to start png() device.

    When I add type=”cairo” line is recognized however later:

    Error in grid.Call(L_convert, x, as.integer(whatfrom), as.integer(whatto), : could not open file

    Could you please provide an explanation?

    Thanks

    Like

    • Hi! Thank you for trying the code.
      The plot actually gets saved to a sub-directory named “Figures” within “workdir”. So if this directory is not present, the error is thrown. Unfortunately, I missed allowing the program to create the sub-directory if it did not exist. My mistake! Can you first try creating a “Figures” sub-directory within your “workdir” and then try again?

      Like

  3. I was trying to create similar plot. I still could not understand your data structure. How you had arranged your data. Could you please share your data. Thanks

    Like

    • Hi! The link to the dataset is given in the post itself as an MS Excel file. You may however have to save the file locally in .csv format before trying the read.csv() option.

      Like

  4. Hi

    I have a quick question that I tried to follow the instruction:

    data$Variable[subgps] <- paste(" ",data$Variable[subgps])

    But I got the following error message. Please advise. Many thanks in advance.
    ———————————————————————————–
    Error in `$<-.data.frame`(`*tmp*`, "Variable", value = c("Overall", NA, :
    replacement has 33 rows, data has 1

    YS

    Like

  5. Hi Jyothi,

    Many thanks for your post. I have a question, how do you italicise or made bold the column names and the subheadings for the subgroups (age, sex, race etc)?

    BW,

    Amanda

    Like

  6. Hey
    I like to reproduce your plot.
    Unfortunately I have some errors ….
    can you please help me ?
    Warning message:
    In is.na(data$Count) :
    is.na() applied to non-(list or vector) of type ‘NULL’
    Warning message:
    In cbind(c(“Subgroup”, “\n”, data$Variable), c(“No. of Patients (%)”, :
    number of rows of result is not a multiple of vector length (arg 2)
    Error in forestplot.default(labeltext = tabletext, graph.pos = 3, mean = c(NA, :
    You have provided 2 rows in your mean arguement while the labels have 35 rows

    Like

  7. Thank You Jyothi,

    Your syntax is very helpful for me.

    Only I have one concern, hope you should solve my issue.
    My concern is, On my graph the point estimate point plotted on interval line and that is very big.
    so can you help me please how would I decrease their size?

    Like

  8. Only one suggestion. Your forest plot is amazing, and it’s exactly what I was looking for.
    However, to be PERFECT, the size of the squares should be proportional to the number of patients in that analysis. Actually, in a meta-analysis, they are related to the inverse of variance, which in turn, is related to the sample size.
    To be MORE THAN PERFECT, and now I know that I’m asking too much, it should be able to create the forest plot using the raw data. The same data you used to perform the Cox regression, instead of reorganizing the data in another sheet.
    For what it’s worth, people have added a ggplot2-based function to create a forest plot for the cox model in the survminer package (https://www.r-bloggers.com/forest-plot-with-horizontal-bands/).
    However, their forest plot is still wrong because, I believe, they were not able to perform the analyses by each subgroup.

    Congrats!

    Like

    • Thank you so much for your nice comments!

      The default value of boxsize parameter should make the size of the square inversely proportional to the precision of the estimate. Getting the plot from raw data might be a little more difficult and I can try to do that in a future post!

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s