Redirecting Java Output Streams in JRuby (and ANTLR Unit Testing)

Okay, I am going to break my own golden rule and blog about some code I’ve finished writing just this very minute. I’m not going to use it and refactor it for weeks before I expose it to the tender mercies of the interweb. No doubt I will wake up screaming at 3am, cringe with humiliation and wish in vain I could retract this hideous code from the permanent archive to which I now commit it. But, tough. I’m just too happy and must share the joy. If you want to skip ahead, the source is all here.

Why am I so happy? Well, about 90% of my happiness is down to the fact that I conquered a small corner of Java today. After hours of bitter frustration, a seemingly endless war of attrition against an enemy with infinite stamina, I was victorious. I successfully traversed the great Maze of Javadoc, with its myriad false doors, dead ends and those trap doors that open in the floor and dump you back to where you were 30 minutes ago, disoriented and dizzy. In the end I escaped by collecting all the pieces of the great Talisman of Buffered (there were only 2 this time) and putting them together in the correct order. They glowed with an eerie green light, my code compiled and I was free. As you wander the Maze of Javadoc there are many false pieces to mislead you. Perhaps this StringWriter is a true piece of the Talisman? But when you put false pieces together you risk summoning the dreaded Dragon, who will scold you with his fearsome forked tongue. Many dangers await the unwary.

Hmm... that sounds like much more fun than it was. Maybe someone can turn that into a video game. Every time I have to work with Java I end up describing it in adversarial terms. There is a thrill to winning, I will admit, but I much prefer the harmonious yoga of Ruby to the bare-knuckle boxing of Java.

What I actually accomplished today was to use Ruby’s test/unit as a unit testing framework for ANTLR grammars (via JRuby). ANTLR is a parser generator, and a fantastic tool for writing Domain Specific Languages. (Forget clever tricks with blocks in Ruby, if you want to write DSLs ANTLR is definitely your friend.) The latest update of gUnit was giving me a bit of trouble, and it occurred to me that I might be able to get test/unit to work without too much effort. (sigh) Yeah, go ahead and laugh. Famous last words. It turned out to be worth it, since I not only got test/unit working, I also drastically improved the way I was interfacing with my ANTLR-generated Java classes AND solved the problem of JRuby writing things to the console. And now I am actually testing my parser the way I will be using it in my applications.

What proved to be the biggest headache was redirecting Java’s System.err and System.out. By default, ANTLR recovers from errors while lexing and parsing. Error messages are written to Java’s System.err, but an exception isn’t raised. In JRuby, these error messages are written out to the console. There’s no obvious way to get at them, and so there’s no way to test whether a certain expression is being matched correctly by your parser. It’s possible to change this behaviour in your ANTLR grammar so an Exception does get raised (which you can then detect and handle in Ruby), but this involves adding about 20 lines of mostly boilerplate code to your grammar file. This clutters up the grammar, and also defeats the purpose of a general purpose unit testing framework. We don’t want to be limited to testing (and using) grammars which raise exceptions.

I first attempted to redirect Ruby’s $stdout and $stderr, thinking that JRuby would use these for Java’s output and error messages, but this isn’t the case. Java’s streams are separate from Ruby’s. Here is an example which illustrates that separation, and demonstrates how to temporarily capture Java’s System.out stream and access what was written to it later.

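require 'java'

# java_import replaces include_class, which was deprecated and later
# removed in newer JRuby releases.
java_import 'java.io.PrintStream'
java_import 'java.io.ByteArrayOutputStream'
java_import 'java.lang.System'

System.out.println "Hello from Java!"
puts "Hello from Ruby!"

# Remember the real stream so it can be restored later.
sys_out_stream = System.out

# Point System.out at an in-memory buffer.
my_output_stream = ByteArrayOutputStream.new
System.setOut(PrintStream.new(my_output_stream))

# This line goes into the buffer, not to the console.
System.out.println "Static typing rules!"
puts my_output_stream.toString.gsub("Static", "Dynamic")

# Put the original stream back.
System.setOut(sys_out_stream)

System.out.println "Hey! That's not what I said!"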

The output is


> Hello from Java!
> Hello from Ruby!
> Dynamic typing rules!
> Hey! That's not what I said!

I found this example to be very helpful in figuring this out.
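One weakness of the save, swap, restore sequence in the example above is that the real stream never comes back if something raises in between. Here is a minimal sketch of the same pattern wrapped in a block with an ensure clause. To keep it runnable on any Ruby it is shown on Ruby’s own $stderr; the helper name capture_stderr is made up for illustration, and under JRuby I would expect the System.err / System.setErr calls to slot into the same three places, though that variant is an untested assumption.

```ruby
require 'stringio'

# Hypothetical helper (not part of the original source): runs the block with
# $stderr pointed at an in-memory buffer and returns whatever was written.
def capture_stderr
  original = $stderr        # save the real stream
  buffer = StringIO.new
  $stderr = buffer          # swap in the buffer
  yield
  buffer.string             # hand back the captured text
ensure
  # Runs even if the block raises, so the real stream is always restored.
  $stderr = original
end

captured = capture_stderr { $stderr.puts "something went wrong" }
puts "captured: #{captured}"
```

Because the restore lives in ensure, a failing test cannot leave later tests writing into a buffer nobody reads.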

Going back to ANTLR now, I wanted to recreate the functionality present in gUnit, but in a more Ruby-like syntax. My tests look like this:


def test_variable
  assert_invalid_input(:variable, "int x")
  assert_valid_input(:variable, "int x;")
end

def test_program
  assert_valid_input(:program,
  %{
    char c;
    int x;
  })

  parser = SimpleC.new

  parser.program %{
    char c;
    int x;
    void bar(int x);
    int foo(int y, char d) {
      int i;
      for (i=0; i<3; i=i+1) {
        x=3;
        y=5;
      }
    }
  }

  assert_equal ["bar is a declaration", "foo is a definition"], parser.output
end

and they are similar in intent to the gUnit examples presented on the gUnit ANTLR wiki page. The grammar in question is the LL-star example grammar from the ANTLR book.

There are also two helper functions in test_simple_c.rb:


def assert_valid_input(method, input_string)
  assert_nothing_raised {SimpleC.new.send(method, input_string)}
end

def assert_invalid_input(method, input_string)
  assert_raise(AntlrError) {SimpleC.new.send(method, input_string)}
end

SimpleC is a Ruby class which wraps the Java parser and lexer to make them more amenable to testing. Check out the source if you are interested. It uses method_missing and handles all the Java buffer chaining needed to turn a string into a lexer, then into a token stream, then into a parser. I wrote it just to get these tests working, but it’s going to be my new general purpose ANTLR wrapper, so it will no doubt see several changes. At the moment there is no sanity checking or graceful handling of method names which aren’t actual parser rules, so beware.
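To give a feel for the method_missing idea without needing JRuby or a generated parser, here is a rough sketch of the shape such a wrapper might take. It is not the real SimpleC: FakeParser is a made-up stand-in for the ANTLR-generated Java class, ParserWrapperSketch and its rule list are hypothetical, and AntlrError is redefined here only so the snippet runs on its own. It also shows one possible way to add the missing sanity check for unknown rule names.

```ruby
# Stand-in for the exception the tests expect (assumption: the real one
# is defined in the author's source).
class AntlrError < StandardError; end

# Fake parser standing in for the generated Java parser (assumption).
# It records an error when a declaration lacks a trailing semicolon.
class FakeParser
  attr_reader :errors

  def initialize(input)
    @input = input
    @errors = []
  end

  def variable
    @errors << "missing ';'" unless @input.strip.end_with?(";")
  end
end

class ParserWrapperSketch
  def initialize(parser_class, rule_names)
    @parser_class = parser_class
    @rule_names = rule_names
  end

  # Treat a known rule name as a method: build a parser for the input,
  # invoke the rule, and raise if the parser recorded any errors.
  def method_missing(rule, *args)
    return super unless @rule_names.include?(rule)
    parser = @parser_class.new(args.first)
    parser.public_send(rule)
    raise AntlrError, parser.errors.join("\n") unless parser.errors.empty?
    self
  end

  # Keeps respond_to? honest about which rule names are accepted.
  def respond_to_missing?(rule, include_private = false)
    @rule_names.include?(rule) || super
  end
end

wrapper = ParserWrapperSketch.new(FakeParser, [:variable])
wrapper.variable("int x;")   # accepted quietly
begin
  wrapper.variable("int x")
rescue AntlrError => e
  puts "rejected: #{e.message}"
end
```

Falling through to super for anything outside the rule list means a typo in a rule name fails with an ordinary NoMethodError instead of a confusing error from deeper in the parser.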