R on Rails with RSRuby

A few weeks ago, I removed Gruff, GLE and ZiYa from my primary Rails application. I had finally managed to implement all of my graphing needs using a single tool, R. I still have a special fondness for Gruff; it was the first software project I ever submitted a patch to. I was so green at the time that Geoffrey Grosenbach had to explain, very patiently, that you should (a) generate a diff file instead of sending a zip of everything and (b) work from the Subversion trunk instead of a very out-of-date numbered version of the gem. Gruff definitely has its place, but it’s very limited in the types of graph it can produce. Then there’s the little matter of its ImageMagick dependency (shudder).

GLE (Graphics Layout Engine) is an interesting package which I enjoyed exploring, but it is a “graphics” rather than a “graphing” package, so again I was faced with the prospect of implementing from scratch any slightly unusual type of graph I might want to use in the future. ZiYa is an open source, Flash-based graphing package which at first I absolutely adored, but printing Flash-based images turned out to be a lost cause. Working in a very small team, I need my HTML to function for both screen display and printing, and my page layouts need to stay precisely aligned across multiple browsers and platforms. That’s hard enough to do with PNG files; it’s no fun at all when your Flash pie chart prints as a microscopic dot in a sea of white space in one browser, while in another browser it comes out so large that it is truncated to a square completely filling its allocation of pixels.

No. I needed a graphing package which was mature and robust, where I could be totally confident in the geometry underlying the graphs and where I wouldn’t have to write basic plot types myself. But I also needed the flexibility and power to customize graphs so that I could maximize their impact and their ability to communicate information. I needed R.

I have been an R user for several years, and I was already using R on a daily basis at work, but in order to use it to automate the production of graphics I needed to be able to integrate it with Rails.

I’m not sure how I overlooked the RSRuby gem for as long as I did, but it probably had to do with the fact that RubyForge’s site search has a three-letter minimum and “R” is only one letter long. (Searching for and tagging with “R” is my least favourite part of working with R. For example, I once tried subscribing to http://del.icio.us/tag/r, but it’s basically just a fat-finger feed. Fortunately, Google is smart enough to rank r-project.org at the top if you search for the letter “r”, so that’s something.) Anyway, once I came across RSRuby, I made up my mind to commit to R as my only graphics platform.

R can produce just about any type of graph you might be familiar with, and many more that will probably be completely alien to you unless you have a degree in statistics. The one downside of using R, especially in a business context, is that some of the less “correct” forms of statistical visualization which are nonetheless popular with business audiences (like, say, a pie chart) are relatively difficult to produce. This is a minor inconvenience and easily overcome, either by switching to a better graph type or by tweaking the appearance of the one your customers expect. Also, if you simply call R’s default plot() function you will get an image which is, frankly, rather ugly. However, making R graphs pretty and even Tufte-esque is very straightforward, and since R is a proper language you can do this DRYly.

I had once tried writing my own bridge between Ruby and R following the example of the R TextMate plugin. Basically, the R TextMate plugin passes commands to the R command-line tool and then reads back the results printed to STDOUT. I probably could have gotten this to work eventually, but the approach taken by RSRuby is much more sensible: R is designed to be embeddable, and RSRuby uses C to embed an R interpreter within Ruby. The RSRuby code is surprisingly short, and as C code goes it is very readable. Alex Gutteridge has written a very helpful user manual, and where that runs out, the source itself, both C and Ruby, is clear and well commented. When I first started using RSRuby I had a little trouble making sense of the conversion system, which governs how R objects are converted into Ruby objects, but as I gained familiarity I came to appreciate the flexibility it gives you. There are also plenty of examples in the unit tests to get you started, as well as some very helpful examples in the literate documentation.

My workflow for producing dynamic graphs with RSRuby and R goes something like this. I design my graphs in the R GUI, working out any calculations that may be required and fine-tuning the look and feel. Once I have written a function which produces a graph, I add it to one of several R files kept in an R/ directory within the lib/ directory of my Rails application. (I like that CLOC reports the number of lines of R code separately from the number of lines of Ruby code this way.) These R files are evaluated just once, when my Rails application starts, and are then stored in RSRuby’s cache (which means that if I change one of these R functions I need to restart my application server). I have a Ruby unit test for each R function; these tests ensure that no exceptions are raised when the functions are called, and they also let me visually inspect the image files that are produced. Finally, I have helper methods which call the appropriate R function using RSRuby.instance.name_of_function, which corresponds to an R function named name.of.function that is (hopefully) already defined in RSRuby’s cache. The helper method opens an R graphics device (I usually use PDF), calls the plotting function, closes the device, and finally returns an image_tag with the filename of the generated image. I simply call the helper method in my view. The images are saved in a graphs/ directory within public/. (I have never liked R’s native PNG output on OS X, so I produce PDFs and then use Ghostscript (gs) to convert them to PNGs.)
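The helper workflow above can be sketched in Ruby roughly as follows. This is a minimal sketch under stated assumptions, not the actual code behind this post: the module name RGraphs, its method names, the 6 x 4 inch PDF size and the 150 dpi resolution are all hypothetical, and `r` stands for RSRuby.instance (assumed to respond to eval_R and to forward unknown method calls to R functions). The Ghostscript flags are standard gs command-line options.

```ruby
# Hypothetical sketch of the graph-helper workflow; not RSRuby's own API.
# Assumption: `r` behaves like RSRuby.instance, i.e. it responds to
# eval_R(code_string) and forwards unknown method calls to R functions.
module RGraphs
  # Mirrors the naming convention described above: a Ruby call such as
  # name_of_function refers to the R function name.of.function.
  def self.r_name(ruby_name)
    ruby_name.to_s.tr("_", ".")
  end

  # Source every .R file in lib/R once, e.g. when the application boots.
  # String#inspect yields a double-quoted literal that R also accepts for
  # ordinary paths (paths with unusual escape sequences are not handled).
  def self.load_r_library(r, dir)
    Dir.glob(File.join(dir, "*.R")).sort.each do |path|
      r.eval_R("source(#{path.inspect})")
    end
  end

  # Ghostscript argv that rasterizes a single-page PDF to a 24-bit PNG.
  def self.gs_command(pdf, png, dpi: 150)
    ["gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-sDEVICE=png16m",
     "-r#{dpi}", "-sOutputFile=#{png}", pdf]
  end

  # Open a PDF device, call the plotting function, always close the
  # device, convert the PDF to PNG and return a path for image_tag.
  # `runner` executes the gs argv; it is injectable so tests can stub it.
  def self.render(r, function, basename, out_dir, *args,
                  runner: ->(cmd) { system(*cmd) })
    pdf = File.join(out_dir, "#{basename}.pdf")
    png = File.join(out_dir, "#{basename}.png")
    r.eval_R("pdf(#{pdf.inspect}, width = 6, height = 4)")
    begin
      r.public_send(function, *args)
    ensure
      r.eval_R("dev.off()")
    end
    runner.call(gs_command(pdf, png)) or raise "gs failed for #{pdf}"
    "/graphs/#{basename}.png"
  end
end
```

In a view helper this might be called as image_tag(RGraphs.render(RSRuby.instance, :sales_by_month, "sales", graphs_dir, data)), where sales.by.month is a hypothetical R function and graphs_dir points at public/graphs. Closing the device in an ensure block means a plotting error does not leave an R graphics device open for the next request.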

So, the first very obvious limitation of this approach is that I am caching all of these images in the public directory. I don’t need to restrict access to any of these generated images, so the public directory is fine for me. If you do need to restrict access to images, please research the correct way to do it on the Mongrel mailing list, since I have seen discussion threads indicating that the “obvious” way, i.e. send_data or send_file, is not recommended. Hopefully that’s common knowledge, but I mention it just in case. Sadly, though, that’s something of a premature concern anyway. Perhaps I should have mentioned this earlier, but there are some issues with R and Rails which pretty much preclude doing what I have just described in any sort of public-facing, security-conscious scenario. At least for the moment, IMHO.

Some developers, myself included, find that R embedded in Ruby via RSRuby has a tendency to segfault. If I try embedding R in a DRb server, it segfaults instantly and very consistently. Until recently I was getting almost daily segfaults in my Rails application. For some reason this has improved lately, perhaps since I upgraded to Leopard, but that might just be because a development spurt has me restarting my web server more frequently, so the situation which reliably brought on a segfault (requesting a page first thing in the morning after several hours of inactivity) hasn’t occurred as often. I suspect the segfaults are related to garbage collection, either Ruby’s, R’s or the interaction between the two, but this is just a hunch. I am hoping to play around with DTrace now that I have upgraded to Leopard, and perhaps I will learn more about what’s happening that way, but I’m not a C programmer and my time is very limited, so this might not happen for a while. If you do want to play with R and Rails, make sure you are using RSRuby 0.5 or higher, since one potentially serious issue was addressed in that release.
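If you want an application to refuse to start against an older RSRuby, a check along these lines could help. This is a hypothetical sketch: the method names are made up, and it assumes RSRuby was installed and activated as a RubyGem, since Gem.loaded_specs only lists gems that RubyGems has activated.

```ruby
require "rubygems"

# True when the given RSRuby version string is 0.5 or newer.
# Gem::Version compares segment by segment, so "0.10" counts as newer than "0.5".
def rsruby_recent_enough?(version)
  Gem::Version.new(version) >= Gem::Version.new("0.5")
end

# Version of the activated rsruby gem, or nil if RubyGems has not
# activated it (for example, before `require "rsruby"` has run).
def loaded_rsruby_version
  spec = Gem.loaded_specs["rsruby"]
  spec && spec.version.to_s
end

# Example guard, e.g. run once after require "rsruby" at boot time.
def check_rsruby!
  version = loaded_rsruby_version
  return if version && rsruby_recent_enough?(version)
  raise "RSRuby 0.5 or higher is required (found #{version || 'none'})"
end
```

Failing loudly at boot is arguably preferable to discovering an old binding through a segfault in the middle of a request.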

To sum up, I use R with Rails successfully and very happily, but under controlled circumstances and not in a public-facing or uptime-critical situation. I don’t have data on speed or memory usage, other than to say that I regularly run reports with dozens or even hundreds of different images, and graphics rendering is not the most time-consuming part of the task. What matters most to me is that I can easily prototype in R, and my library of graph types can be used in any environment that supports R. I am not locking myself into a Rails-only or Ruby-only package. Also, although I haven’t even touched on it here, my Rails and Ruby applications have access to the rest of R’s extensive statistical and mathematical functionality and its huge package library. I would, of course, be very interested to hear from anyone who is using R with Ruby and/or Rails, especially regarding the segfault issue, but also just to hear what other people are doing. I’m very grateful to Alex Gutteridge (and the developers of RSRuby’s Python and Perl ancestors) for giving me the chance to combine my two favourite languages.


Comments

Jay Gallivan 16 May 2008

I’ve been using RoR for about a year, and I have done some work with bindings to gnuplot and the GNU Scientific Library. But R is certainly attractive.

In a very minimalist way I’ve gotten some calls to RSRuby working, but it seems very unstable and certainly provides a very un-Rubyish interface. I’ve also had trouble with the app’s behavior under Mongrel. All in all, it’s too hard for me to make use of at 0.5, but I’d really like to see this go further.

I’m going to try working with R scripts and calling them via system().

Anonymous 30 May 2008

The latest R supports a Cairo backend that makes the plots look much nicer.