A DIY HTML R Transcript Viewer

This is a fun and very useful little script I wrote today that I want to share. It runs an R script and generates an HTML transcript of the results, with any plots included as images. So, say you have a little R script:

# http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=63

palette(rainbow(12, s = 0.6, v = 0.75))
quartz()
stars(mtcars[, 1:7], len = 0.8, key.loc = c(12, 1.5),
      main = "Motor Trend Cars", draw.segments = TRUE)


And you want to run this and share the output as HTML. Well, running this command:

rout2html tiny-example.R


will evaluate the script in R, build a very pretty HTML file and include all the images. Check it out!

I’ll get to the fun stuff of how this is implemented soon, but first a little about why this is useful. Have you heard the song Baby Got Stats?


Well, if you’re a born-again convert to the command line like me, the first word that’s going to jump out at you from this song is “logfile”. Logfile? A logfile means you are working interactively in a GUI and will later go back through the logfile to extract the relevant results. This is not a good idea, because it means your results are not easily reproducible. That’s bad for you if you need to change something or want to double-check your calculations, and it’s bad for anyone else who wants to check or review your results. A better approach is to write a script and revise it until it gives you the output you are looking for. That’s reproducible, testable, documentable and easily shareable.

The problem with GUI apps is that they encourage a logfile approach rather than a batchfile approach. When you fire up the R GUI, the most natural thing to do is start typing. Resist! Use the GUI as a scratchpad or for interactive help. Instead, open your favourite text editor, type up a script (your text editor’s syntax highlighting is going to be MUCH nicer than the R GUI’s), run it using rout2html and view the results in your web browser. Make changes to the script, run the command again, refresh your browser. Now iterate. This approach works in Windows too! (Well, in principle it works in Windows. For now rout2html is a bash script. I’ll look into writing a Windows .bat file or making it system-independent.)
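To shorten that edit, run, refresh loop, you could wrap rout2html in a small helper that reruns it only when the script has changed. This is a minimal sketch, not part of rout2html: rebuild_if_stale is a hypothetical name, and it assumes the final page is named the way the rout2html script below names it (tiny-example.R-with-images.html for tiny-example.R).

```bash
#!/bin/bash
# rebuild_if_stale SCRIPT COMMAND...
# Hypothetical helper (not part of rout2html): runs "COMMAND... SCRIPT" only when
# SCRIPT is newer than SCRIPT-with-images.html, the file rout2html writes.
rebuild_if_stale() {
    local script=$1
    shift
    local html="${script}-with-images.html"
    # -nt compares modification times (bash test builtin).
    if [ ! -e "$html" ] || [ "$script" -nt "$html" ]; then
        "$@" "$script"
    fi
}

# Example: poll every 2 seconds and rebuild after each save.
# while true; do rebuild_if_stale tiny-example.R rout2html; sleep 2; done
```

Refreshing the browser is still up to you; the helper only saves you from retyping the command.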

There are other applications for rout2html. You can use it to run an R script automatically on a remote machine and view the results as HTML, for instance if you wanted to analyze the performance of a web application. (Check out splunk if you need to parse server logs.) And if for some reason you wanted to use R but couldn’t run a GUI, as on a remote server, you could use rout2html to generate HTML transcripts and view them in a web browser.

So, how does this work? Well, rout2html is a bash script which basically does 3 things:

  • Runs your R script via R CMD BATCH.
  • Highlights the resulting .Rout file using Pygments.
  • Looks for special image markers in the HTML file and replaces them with HTML img tags.

Unfortunately, the R console transcript highlighter has only just been submitted to Pygments, so it’s not available in the latest released version. I’ll update this post if my patch gets accepted. In the meantime, you’ll need to download the Pygments source, apply my patch and run setup.py install or setup.py develop.

To get images to show up, your R script needs a call to quartz() or one of the other redefined graphics devices (X11(), pdf() or jpeg()) before each plot. If you’ve been working with the GUI a lot, you probably have that in there already. Before it runs your script, rout2html runs a script called redefine-graphics-devices.R, which replaces those devices:

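# Redirect every supported graphics device to a PNG file with a random name.
# Arguments such as width and height are passed straight to png(),
# which reads them as pixels.
quartz <- function(...) {
    png(filename = random_filename_generator("png"), ...)
}

X11 <- function(...) {
    quartz(...)
}

# A filename given to pdf() (argument "file") or jpeg() (argument "filename")
# is accepted and ignored, so calls that name it don't clash with png().
pdf <- function(file = NULL, ...) {
    quartz(...)
}

jpeg <- function(filename = NULL, ...) {
    quartz(...)
}

# Print a marker such as "png_filename:123456789.123457.png" to the transcript
# so that rout2html can turn it into an img tag.
random_filename_generator <- function(file_extension) {
    random_filename <- paste(runif(1, 100000, 1000000000), file_extension, sep = ".")
    cat(file_extension, "_filename:", random_filename, "\n", sep = "")
    return(random_filename)
}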

This hijacks the graphics devices so that they all write a PNG file with a randomly generated filename, and so that the R transcript contains a reference to that filename. The third step in rout2html looks for these filename references and converts them to HTML img tags, so the final HTML displays the images. The nice thing about this is that, as long as you explicitly open a graphics device before you start plotting to it, you don’t have to change a thing. You can switch between running the script in the GUI via source(), via R CMD BATCH yourself, or via rout2html, and it will work in all three cases. If you have specified a filename for a PDF or JPEG device, it will be ignored when run via rout2html.

The one exception is the PNG device. Because png() itself is not redefined (for now), if you call png() directly you’ll need to cat the filename yourself in the same format that random_filename_generator() uses, e.g. png_filename:my-custom-filename.png, followed by a newline. An img tag should then be created for you. The filename may contain only letters, digits, dots and hyphens, since that is all the regexp in rout2html matches.

png("i-want-this-filename.png")
plot(c(1,2,3))
cat("png_filename:i-want-this-filename.png\n")

Take a look at the output.
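If you want to try the third step on its own, here is a minimal sketch of the same substitution using sed -E instead of Ruby (a swapped-in tool, not what rout2html itself uses). Like the Ruby regexp, it assumes Pygments wraps each line of R output in <span class="go"> and </span>; add_img_tags is a hypothetical name.

```bash
#!/bin/bash
# add_img_tags: hypothetical sed-based equivalent of rout2html's Ruby one-liner.
# Reads HTML on stdin and replaces each highlighted output line of the form
#   <span class="go">png_filename:NAME.png</span>   (or jpg)
# with <img src="NAME.png" />. All other lines pass through unchanged.
# The filename may contain only letters, digits, dots and hyphens.
add_img_tags() {
    sed -E 's#<span class="go">(png|jpg)_filename:([0-9.A-Za-z-]+\.(png|jpg))</span>#<img src="\2" />#g'
}

printf '%s\n' '<span class="go">png_filename:i-want-this-filename.png</span>' | add_img_tags
# prints: <img src="i-want-this-filename.png" />
```

Using # as the sed delimiter avoids escaping the slashes in </span> and />.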

So, in order to run this you need a modified Pygments, the rout2html script somewhere on your PATH, and the redefine-graphics-devices.R script in the same directory as your R script. rout2html looks for redefine-graphics-devices.R in the current directory, so run it from that directory. My plan is to eventually implement this as an R package + an executable script, so that R can look after making redefine-graphics-devices available. For now, if you are skilled enough to get this working, you are probably skilled enough to modify it for your needs. All HTML and PNG files will be created in the directory with your R script and redefine-graphics-devices.R; the final page is named after your script, e.g. tiny-example.R-with-images.html. You are welcome to move the HTML and PNG files after they are generated, but keep them together, since the image tags reference only the filename. The CSS is bundled into the header of the HTML. If you wanted to rewrite this script to use a syntax highlighting library other than Pygments, then in addition to changing the pygmentize command, you would need to change the Ruby regexp to match what that library’s output lines look like.

#!/bin/bash
# rout2html: run an R script with R CMD BATCH, highlight the transcript
# with Pygments and turn image markers into img tags.
# Usage: rout2html script.R [args-for-script]

# Save any old workspace.
if [ -e ".RData.xyztmp" ]
then
    echo "A file named .RData.xyztmp already exists. Please remove this before proceeding."
    exit 1
fi

if [ -e ".RData" ]
then
    mv .RData .RData.xyztmp
fi

# Execute redefine-graphics-devices.R and save the results in a fresh workspace.
# We don't want an outfile here.
R CMD BATCH --no-restore --save redefine-graphics-devices.R /dev/null

# Now run the R script passed in as $1,
# which will have access to anything defined in our new workspace.
# $2 is passed to the R script as an argument;
# it is ignored unless your script looks for it with commandArgs(),
# but it gives you the option to pass args to your script.
R CMD BATCH --restore --no-save "--args $2" "$1" "${1}out"
rm -f .RData

# Syntax highlight and create a standalone HTML file (-O full) using Pygments.
pygmentize -o "$1.html" -O full "${1}out"

# Now look for png_filename: or jpg_filename: and convert to HTML img tags.
# (The replacement ends in "/>": an escaped "\/" would be copied literally by Ruby.)
RUBY_CMD="gsub(/<span class=\"go\">(png|jpg)_filename:([0-9\.A-Za-z-]+\.(png|jpg))<\/span>/, '<img src=\"\2\" />')"
ruby -pe "$RUBY_CMD" < "$1.html" > "$1-with-images.html"

# Restore any old workspace.
if [ -e ".RData.xyztmp" ]
then
    mv .RData.xyztmp .RData
fi
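Because a missing piece tends to surface only as a confusing error halfway through, you might run a preflight check before the first run. This is a hypothetical helper, not part of rout2html: it only checks that the commands the script calls are on your PATH and that redefine-graphics-devices.R is in the current directory. It cannot tell whether your Pygments includes the patched R console lexer.

```bash
#!/bin/bash
# check_rout2html_prereqs: hypothetical preflight check (not part of rout2html).
# Returns 0 if R, pygmentize and ruby are on PATH and
# ./redefine-graphics-devices.R exists; otherwise reports what is missing
# on stderr and returns 1.
check_rout2html_prereqs() {
    local missing=0
    local cmd
    for cmd in R pygmentize ruby; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            missing=1
        fi
    done
    if [ ! -f redefine-graphics-devices.R ]; then
        echo "missing file: ./redefine-graphics-devices.R" >&2
        missing=1
    fi
    return "$missing"
}

# Example:
# check_rout2html_prereqs && rout2html tiny-example.R
```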



Pádraig Brady 23 Sep 2009

Cool, you've just prompted me to look at R finally.

`sudo yum install R` ... 108MB ? holy hard disks!

Ah sure we'll install it anyway and have a look.

Hmm, it would be nice, I think, if R colourized its own terminal display, i.e. had all that parsing logic internal to R, and then one could use something like the following to convert to HTML: http://www.pixelbeat.org/scripts/ansi2html.sh

Ana Nelson 24 Sep 2009

Hi, Pádraig,

Thanks for your comment. I would love to see more syntax highlighting within R itself, as you suggest. There is some in the OSX GUI and I think error messages are highlighted in some versions of the terminal interface. It might be possible to do this using Pygments, as in this example where Pygments was used to add syntax highlighting to a Qt window: http://lateral.netmanagers.com.ar/weblog/2009/09/21.html#BB831

I don't necessarily agree with going to ANSI and then to HTML (or LaTeX or another output format). I think you would lose a lot of information this way: in the Pygments HTML you apply CSS styles to classes based on the original token type, whereas if you went to ANSI and then HTML you would only know the "color" of a token. However, your script looks interesting and I'll keep it in mind for other uses!