coverage – Pervasive Code

Techniques for Exhaustively Testing a Rails App

Jamie Flournoy — Sat, 23 Aug 2008 22:55:12 +0000

I subscribe to dozens of tech blogs (including but not limited to Ruby and Rails ones), and although I’ve seen quite a lot of commentary about TDD and BDD, I’m not sold on either of them yet. TDD is interesting, but BDD seems like a waste of time. But I am completely sold on automated testing in general. I write lots and lots of tests and use rcov to make sure that my tests cover all of the code.

Getting 100% coverage isn’t easy. In general it means you definitely can’t just write a single test case per method and declare victory when it passes. When the number of possible combinations of inputs (method arguments and/or mock objects) and expected outputs (return values, exceptions, and side effects) becomes large, then the potential for copy-and-paste errors in your test code becomes large, and legibility becomes an issue. This is the point at which I find the recent fascination with writing tests in a near-natural-language DSL to be a distraction. It’s orthogonal to the problem I’m dealing with, which is how to comprehensively test the code. In other words, the problem is not making a small number of tests more readable, but concisely expressing a large number of input/output combinations and a large number of different tests, and making it all readable.

I assume that there are others working on this problem too, so I’ll describe some of the things I’ve come up with, and hopefully if you have any good ideas you’ll post them in the comments section.

Test Data Iterators

If you express the set of inputs and outputs as a data structure, such as a Hash of input => expected output pairs, or an Array of [input, expected output] Arrays, you can easily do test_data.each do |input,output| ... end and drive tests that way.
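Here is a minimal sketch of the single-method version. It assumes a hypothetical frobnicate method (upcase, then reverse) and a hypothetical check helper that stands in for assert_equal, so the snippet runs without a test framework. Because check raises on the first mismatch, one failing pair ends the loop before the later pairs are checked.

```ruby
# Hypothetical method under test: upcases and then reverses a string.
def frobnicate(s)
  s.upcase.reverse
end

# Input => expected output pairs as a Hash...
TEST_DATA = {
  "abc" => "CBA",
  ""    => "",
  "a b" => "B A"
}

# ...or the same pairs as an Array of [input, expected output] Arrays.
TEST_DATA_ARRAY = TEST_DATA.to_a

# Hypothetical stand-in for assert_equal: raises on a mismatch, so the
# first failing pair ends the loop and later pairs go unchecked.
def check(expected, actual)
  return if expected == actual
  raise "expected #{expected.inspect}, got #{actual.inspect}"
end

# One "test" that drives every pair through the method under test.
TEST_DATA.each do |input, output|
  check(output, frobnicate(input))
end

# The Array form iterates the same way.
TEST_DATA_ARRAY.each do |input, output|
  check(output, frobnicate(input))
end
```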

(I considered pulling the data itself out of the test code and loading it from a YAML or CSV file or even including a data-only Ruby script, but that seemed like pointless added complexity. If I were working with a professional tester who didn’t want to touch Ruby code, I might change that to make it easier to dump from an Excel spreadsheet into a text file that the tests would use.)

One problem with this approach is that you may have a test_frobnicate method that invokes frobnicate dozens of times. This means that this one test can take a while to run if the tested method takes any amount of time, which is problematic if you just want to re-test that one input/output combo until it works. Your edit/test/debug loop should be ultra fast, and re-testing successful pairs gets in the way of testing the pair you’re not sure about.

More importantly, trying all those combinations in one test method means that if an assertion fails, the failure masks the outcome of the remaining combinations. If there are 25 combinations and combination #2 fails, then a Test::Unit::AssertionFailedError is thrown, and the other 23 of them go unchecked. That’s not ideal, since it can be useful in debugging if you can see that 24/25 failed vs. 1/25 failed.

To solve this, I changed the code so that it iterates over the test data inside the class definition, generating a new test method for each input/output pair:

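class_eval do
  # one test method per input/output pair; an index keeps the generated
  # names unique and stable (input.hash can collide, and on modern
  # Rubies String#hash is randomized per process)
  test_data_hash.each_with_index do |(input, output), i|
    define_method("test_frobnicator_#{i}") do
      check_frobnicator(input, output)
    end
  end
end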

That gives you a bunch of test methods that each run the test once with different input and expected output values. Autotest integrates well with this technique, since it will not re-run the test methods which previously passed until after you have fixed all of the ones which failed.
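The following self-contained sketch shows why per-pair methods help. The frobnicate method, the data (which includes one deliberately wrong pair), and the run_tests helper are all hypothetical. run_tests plays the role Test::Unit would: it calls each generated test method separately, so the failing pair doesn't hide the results of the others.

```ruby
# Hypothetical method under test: upcases and then reverses a string.
def frobnicate(s)
  s.upcase.reverse
end

class FrobnicatorTest
  TEST_DATA = {
    "abc"  => "CBA",
    "xyz"  => "ZYX",
    "oops" => "wrong on purpose" # deliberately failing pair
  }

  def check_frobnicator(input, output)
    actual = frobnicate(input)
    return if actual == output
    raise "#{input.inspect}: expected #{output.inspect}, got #{actual.inspect}"
  end

  # One test method per pair, generated while the class body is evaluated.
  TEST_DATA.each_with_index do |(input, output), i|
    define_method("test_frobnicator_#{i}") do
      check_frobnicator(input, output)
    end
  end
end

# Hypothetical mini runner: calls each test method on its own, so one
# failure does not stop the remaining pairs from being checked.
def run_tests(klass)
  klass.instance_methods(false).grep(/\Atest_/).sort.map do |name|
    begin
      klass.new.send(name)
      [name, :pass]
    rescue RuntimeError
      [name, :fail]
    end
  end
end

results = run_tests(FrobnicatorTest)
```

Here results should contain two passes and one failure. The single-loop version would stop at the first failure instead.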

Test Setup Blocks

I also use a with_x_y_z do...end idiom for mock setup when possible. So a test might contain:

def check_index(inputs, expected_outputs)
  if inputs[:logged_in]
    with_fake_login{|user| get :index, inputs[:params] }
  else
    get :index, inputs[:params]
  end
  
  # ... test outputs against expected output
end

Since this idiom uses blocks, you can nest them to create complicated setup scenarios, pass arguments to the with_blah method to control the behavior of the mock objects, and so on. (You could do this with regular set_up_blah methods but I think this is easier to read.)
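The following sketch shows what such helpers might look like outside Rails. SESSION, User, with_fake_login, and with_flash are hypothetical stand-ins; a real functional test would stub authentication in the controller's session instead. The ensure clauses undo each setup even if the block raises, and that is what makes nesting safe.

```ruby
# Hypothetical stand-in for the session a controller test would see.
SESSION = {}

User = Struct.new(:login, :admin)

# Set up a fake logged-in user, yield it, and always clean up afterwards.
def with_fake_login(login = "quentin", admin: false)
  user = User.new(login, admin)
  SESSION[:user] = user
  yield user
ensure
  SESSION.delete(:user)
end

# A second, independent setup helper so the two can be nested.
def with_flash(message)
  SESSION[:flash] = message
  yield
ensure
  SESSION.delete(:flash)
end

seen = nil
with_fake_login("alice", admin: true) do |user|
  with_flash("Welcome back") do
    # Inside both blocks, both pieces of setup are visible.
    seen = [user.login, SESSION[:user].admin, SESSION[:flash]]
  end
end
```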

Calculated Outputs

Despite their shortcomings, I still use Fixtures (with the PreloadFixtures plugin) and I write unit tests for my models. That means that I can then assume that the models’ behavior is correct in other tests (provided that those unit tests pass). So I try not to put any data that’s already expressed in fixture files into my test method inputs or expected outputs. Instead, I ask for the fixture by its label (widgets(:doodad)) and then use its properties as needed in assertions within the current test. This reduces the size of the input/output data, which makes it easier to read and reduces the amount of effort required to add more combinations later.
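Here is a plain-Ruby sketch of the idea. A hypothetical WIDGET_FIXTURES hash and widgets(label) accessor stand in for Rails fixtures, and price_label is a hypothetical method under test. The test data holds only fixture labels. The expected strings are derived from each fixture's attributes, so changing a price in the fixture doesn't require editing the test.

```ruby
Widget = Struct.new(:name, :price_cents)

# Hypothetical stand-in for Rails fixtures; normally this lives in widgets.yml.
WIDGET_FIXTURES = {
  doodad: Widget.new("Doodad", 1999),
  gizmo:  Widget.new("Gizmo", 500)
}

def widgets(label)
  WIDGET_FIXTURES.fetch(label)
end

# Hypothetical code under test: formats a widget's price for display.
def price_label(widget)
  format("%s: $%.2f", widget.name, widget.price_cents / 100.0)
end

# Test data names fixtures by label only; the expected output is computed
# from the fixture's own properties with integer math, not retyped.
TEST_LABELS = [:doodad, :gizmo]

results = TEST_LABELS.map do |label|
  w = widgets(label)
  expected = "#{w.name}: $#{w.price_cents / 100}.#{format('%02d', w.price_cents % 100)}"
  [label, price_label(w) == expected]
end
```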

Testing Views, Model Associations, and Model Validation

As Bruce Eckel says, “If it’s not tested, it’s broken.” Or as Peter Drucker said, “What gets measured gets managed.”

I test all of these. I’ve read the arguments of those who say that none of these are necessary, and I disagree. The argument that testing associations and validations is tantamount to testing Rails itself is just incorrect. Rails’ own unit tests cannot possibly test the validations and associations expressed in every Rails application’s model classes, so Rails application developers still have to do this.

Associations

I can’t tell you how many times I’ve gotten has_one and belongs_to backwards and only found out when ActiveRecord told me that the column didn’t exist, because I told it to look at the table on the wrong side of the association. I feel stupid when I see that I made that mistake, but writing that test takes seconds and once I see the failure it’s easy to fix. Better to catch it and fix it now rather than have a user find it, right?

Validations

The same goes for model validations. Yes, I know that validates_format_of has been tested, but my regexes for username format and email addresses need testing too.
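The following sketch tests the regexes themselves. It assumes hypothetical USERNAME_FORMAT and EMAIL_FORMAT patterns of the kind you might pass to validates_format_of. Each case pairs a sample with whether it should match. The newline case is there because ^ and $ match at line boundaries in Ruby, so \A and \z are the safer anchors.

```ruby
# Hypothetical format rules: the point is to test your regexes,
# not validates_format_of itself.
USERNAME_FORMAT = /\A[a-z][a-z0-9_]{2,15}\z/
EMAIL_FORMAT    = /\A[^@\s]+@[^@\s]+\.[a-z]{2,}\z/i

# [regex, sample, should it match?]
CASES = [
  [USERNAME_FORMAT, "jamie",              true],
  [USERNAME_FORMAT, "jf",                 false], # too short
  [USERNAME_FORMAT, "9lives",             false], # must start with a letter
  [USERNAME_FORMAT, "jamie\nadmin",       false], # \z rejects embedded newlines
  [EMAIL_FORMAT,    "jamie@example.com",  true],
  [EMAIL_FORMAT,    "jamie@example",      false], # no top-level domain
  [EMAIL_FORMAT,    "ja mie@example.com", false]  # whitespace
]

# Any case whose actual match result differs from the expectation.
failures = CASES.reject { |re, sample, ok| re.match?(sample) == ok }
```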

Views

Views should be tested only if you care what your users see.

Somewhere there is probably an organization with no users, where inputs matter but outputs can be anything. Maybe the goal of the project is to burn cash rather than to ship something good. They don’t need to test their views.

I agree that views should be dumbed down as much as possible using helpers, but that’s not enough to assure that they work. If you’re not testing views with automated tests, then how do you test them? With a browser? Every time you change anything, you’re going to exhaustively check every combination of inputs and outputs to make sure that all the dynamic goodies are in the right place? I doubt it. Obviously you do have to check views in a browser to make sure that purely visual aspects are correct, but there are things you can check automatically. You can do quite a lot with assert_select, and for client-side tests, Selenium can do pretty much anything. (It’s probably not worth the effort to have Selenium use the DOM to see that the browser applied CSS in exactly the way that you wanted it to, but if you wanted to, you could.)

I don’t know how to do something similar to coverage testing for views, but a good start is to just check for the effects of everything that the controller stored in instance variables. Maybe those should be displayed as-is in the resulting document, or maybe they should be truncated, or processed by a helper into a different representation.

In any case, you don’t have to test with the same set of inputs as the functional test for the controller; just do white box testing for the minimal set of combinations that should exercise all of the permutations of the view code (which should be a very small number). Testing the helpers separately can cut down on the number of permutations of inputs and outputs also. So if you have a helper that creates a navigation breadcrumb visual element from an array of strings, you could just test that in isolation with 0, 1, 2, and 3 elements, and then check for one version of breadcrumbs when looking for it in a view.
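The following sketch tests a breadcrumb helper in isolation with 0, 1, 2, and 3 elements. The breadcrumbs method and its markup are hypothetical, while ERB::Util.html_escape comes from Ruby's standard library. The checks count crumbs and separators rather than comparing whole strings, so cosmetic markup changes don't break the test.

```ruby
require "erb"

# Hypothetical helper: turns an array of strings into a breadcrumb trail.
def breadcrumbs(names)
  return "" if names.empty?
  items = names.map { |n| "<span class='crumb'>#{ERB::Util.html_escape(n)}</span>" }
  "<div class='breadcrumbs'>#{items.join(' &gt; ')}</div>"
end

# Exercise the helper in isolation with 0, 1, 2, and 3 elements.
SAMPLES = [[], ["Home"], ["Home", "Widgets"], ["Home", "Widgets", "Doodad & Co"]]

# For each sample: [number of names, crumbs rendered, separators rendered].
results = SAMPLES.map do |names|
  html = breadcrumbs(names)
  [names.size, html.scan("<span class='crumb'>").size, html.scan(" &gt; ").size]
end
```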

I’m particularly inspired by the claim that Sebastian Delmont of StreetEasy made in the May 31, 2006 episode of the Ruby on Rails Podcast about having no human approval step between a successful test suite run and deployment to production. That’s pretty darn bold, but if you stop and think about it, today’s tools make it feasible to automate any “white glove test” that a real person would do before approving a release for production. I’m not that brave yet, but I like the idea of a condition that checks whether the changed code has anything to do with a view (ERB, JavaScript, CSS, etc.) and, if not, just rubber-stamps and deploys it. Clearly a change that could require a manual test in multiple browsers wouldn’t be a wise thing to autodeploy, but something like fixing a bug deep in the code shouldn’t require a human to approve it, especially if the autodeploy condition depends on that chunk of code having 100% test coverage following the bug fix.
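The following sketches such an autodeploy gate. The path patterns, the auto_deployable? method, and the coverage_by_file hash are all assumptions for illustration, not part of any existing tool. coverage_by_file maps each changed file to its percent coverage, for example as parsed from a coverage report.

```ruby
# Hypothetical rule for "a human should look at this in a browser".
VIEW_PATTERNS = [
  %r{\Aapp/views/},
  %r{\Apublic/(javascripts|stylesheets)/},
  /\.(erb|rhtml|js|css)\z/
]

def touches_view?(path)
  VIEW_PATTERNS.any? { |re| re.match?(path) }
end

# Deploy automatically only if no view-related file changed and every
# changed Ruby file is at 100% coverage. coverage_by_file is an assumed
# input mapping file path => percent covered.
def auto_deployable?(changed_paths, coverage_by_file)
  return false if changed_paths.any? { |p| touches_view?(p) }
  changed_paths.grep(/\.rb\z/).all? { |p| coverage_by_file.fetch(p, 0.0) >= 100.0 }
end

cov = { "app/models/widget.rb" => 100.0, "lib/pricing.rb" => 92.5 }
auto_deployable?(["app/models/widget.rb"], cov)                                    # => true
auto_deployable?(["lib/pricing.rb"], cov)                                          # => false, not fully covered
auto_deployable?(["app/models/widget.rb", "app/views/widgets/show.html.erb"], cov) # => false, touches a view
```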

Your Tips?

I’m always looking for ways to test code with less effort on my part. Please let me know if you have ideas or suggestions.

Making Rcov measure your whole Rails app, even if tests miss entire source files

Jamie Flournoy — Fri, 16 May 2008 22:42:42 +0000

I’ve seen a few Rake tasks for Rcov that work OK, but which fail in an interesting way (if you care about coverage): they give your coverage metrics an unexpected boost if you have 0% coverage in one or more source files.

Huh? Exactly. If you have 500 source files, and your test suite only requires one of them, then you get a free ride on those 499 files that have 0% coverage. Theoretically you could get 100% coverage in your report even though 499 source files are not touched at all. D’oh!

The reason for this is that rcov isn’t responsible for finding all of your source files. It just measures what portion of the files you loaded were executed. The Rake tasks that people have written just kick off the test code, which has no need to load all of your application’s source files either. The tests load whatever they need, and so that’s all rcov is aware of. But that’s not answering the question you thought you were asking, which is “how much of my application is being tested?”

You have to explicitly say that you want to see coverage numbers for all the files your tests need, plus all of the files that your tests did not touch. Put this in your test/test_helper.rb:

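# require the entire app if we're running under coverage testing,
# so we measure 0% covered files in the report
coverage_testing_active = defined?(Rcov)
if coverage_testing_active
  # everything under app/ and lib/, minus directories we don't want loaded
  all_app_files = Dir.glob('{app,lib}/**/*.rb').grep(
      /^(?!lib\/(scheduled_tasks|template_optimizer)\/)/)
  # load ApplicationController first, since every other controller subclasses it
  all_app_files.unshift('app/controllers/application.rb')
  # relative paths rely on '.' being in $LOAD_PATH (true in Ruby 1.8)
  all_app_files.each{|rb| require rb}
end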

You’ll probably want to customize that regexp to weed out any .rb files that you don’t want to load during the test suite (or which aren’t your own code).
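To check a customized exclusion pattern before trusting it, you can run it against a list of paths. This sketch uses a hypothetical hard-coded list in place of Dir.glob output. Writing the pattern as %r{...} avoids escaping the slashes. The reject/start_with? version is an equivalent that some may find easier to read.

```ruby
# Hypothetical file list standing in for Dir.glob('{app,lib}/**/*.rb').
paths = %w[
  app/models/widget.rb
  app/controllers/widgets_controller.rb
  lib/pricing.rb
  lib/scheduled_tasks/nightly.rb
  lib/template_optimizer/optimizer.rb
]

# Negative lookahead anchored at the start: matches (keeps) every path
# that does NOT begin with one of the excluded lib/ subdirectories.
KEEP = %r{\A(?!lib/(scheduled_tasks|template_optimizer)/)}
kept = paths.grep(KEEP)

# Equivalent without a lookahead.
kept_too = paths.reject do |p|
  p.start_with?("lib/scheduled_tasks/", "lib/template_optimizer/")
end
```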

Because of all this require-ing, you might want to apply the snippet in my prior post, which eliminates duplicate required source files if they’re under your RAILS_ROOT. Otherwise you may see evidence of repeated loading of the same source files.

With this snippet in place, you’ll still never see 0% coverage in a file. If there are zero lines of executable code in the file, it will not be listed. But you’ll see really low figures (4% etc.) for files that are loaded but not executed. Hopefully that will help you see where you’ve completely overlooked some code in your tests that users might still be able to get to.

RCOV C0 line coverage more generous than EMMA’s C1 line coverage

Jamie Flournoy — Wed, 11 Jul 2007 19:26:01 +0000

Coverage tests in Ruby (with rcov) are less strict than in Java (with EMMA), so watch out – 100% coverage is easy to attain but not as meaningful.

For Java code coverage, I like EMMA. Clover looks really nice and all, but seriously, $250 a seat? Yeesh. Talk about poor product pricing – I will never buy that product, nor seriously consider buying it, because of the outrageous price. I mean, a single license of IntelliJ is $1 cheaper. That’s like charging $300 for a better Ant. C’mon.

Anyway, EMMA is very conservative about coverage estimates. Consider this single line of Java code:

int x = 1; if (x > 1){throw new RuntimeException();}

When you run it, the assignment will execute, as will the comparison, but the block containing throw will never be reached. So, the line is not fully covered. EMMA looks at bytecodes and maps those back to line numbers, and will mark this line as partially covered.

It’s maddening sometimes, but it’s correct; it’s your problem to figure out how to force all those darned IOExceptions that you know can’t ever happen, or to give up and let some apocalyptic error handling code not be covered. I used to shoot for 85-90% coverage, and just live with some uncovered wacky code when I couldn’t find a reasonable way to trigger hideous errors.

(There’s a little voice in my head that says, “Come on, you slacker, you should make it possible to dynamically break the database configuration and remount the root filesystem as read-only during the test suite so that the error handlers can all be 100% covered!” But that little voice never ships anything, he just sits around and writes more and more paranoid code — no Ariane 5 disaster on my watch! — and gets asymptotically closer to 100% coverage.)

Okay, now the Ruby version:

x = 1; raise RuntimeError if x > 1

Same story, prettier code (cuz it’s Ruby; duh, of course it’s prettier): assignment gets executed, condition is executed, no exception is raised. Rcov says it’s 100% covered.

The difference is C0 coverage (Rcov) vs. C1 coverage (EMMA). EMMA only counts a line as 100% covered if all of the bytecodes compiled from it were executed. Rcov counts a line as 100% covered if execution visited that line at all.

(EMMA uses “basic blocks” instead of bytecodes, so in obscure failure cases the reported line coverage figure is even lower, but that doesn’t affect the conditions for what 100% line coverage requires.)

I’m not sure if it’s feasible to get C1 coverage on the regular Ruby VM, but it would be nice to have. In the meantime, just be aware that 100% C0 coverage is easy to attain with rcov, but doesn’t mean you’re done testing everything.

Of course, 100% C1 coverage doesn’t mean you’re done testing everything either, but it’s closer to meaning that than 100% C0 coverage is.
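For what it's worth, Ruby versions released well after this post (2.5 and later) ship a standard-library Coverage module that can count branches as well as lines. This minimal sketch assumes Ruby 2.5+ and measures the Ruby one-liner above. Line coverage reports line 1 as executed, while branch coverage shows that the raise branch was never taken. The temporary file name is arbitrary.

```ruby
require "coverage"
require "tmpdir"

# Only code compiled after Coverage.start is measured, so write the
# one-liner to a temporary file and require it afterwards.
dir  = Dir.mktmpdir
path = File.join(dir, "c0_vs_c1.rb")
File.write(path, "x = 1; raise RuntimeError if x > 1\n")

Coverage.start(lines: true, branches: true)
require path
# Coverage.result returns the collected data and stops measuring.
_, data = Coverage.result.find { |file, _| file.end_with?("c0_vs_c1.rb") }

line_hits = data[:lines].first            # times line 1 ran
branches  = data[:branches].values.first  # {[:then, ...] => n, [:else, ...] => n}
taken     = branches.select { |_, n| n > 0 }.keys.map(&:first)
missed    = branches.select { |_, n| n.zero? }.keys.map(&:first)

puts "line hits: #{line_hits}, branches taken: #{taken}, never taken: #{missed}"
```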

In fact, it’s good to over-test code that’s already covered once. On any given project in the real world, there is code that will succeed with some values and fail with others. (print x/y, for example.) Just testing with one set of values isn’t enough. I like to create an array of inputs and expected outputs and use Array.each to iterate over them and pass them into a block that calls the code under test and then compares the actual outputs to the expected outputs with assert_equal. It means that I’m following DRY (rather than cutting and pasting several lines of test code and then editing a teeny bit of each pasted chunk… eww) but still testing many different ways.
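The following sketches that pattern for the x/y example. It assumes a hypothetical ratio method and a local assert_equal stand-in (Test::Unit provides the real one). As a convention of this sketch only, an exception class in the expected column means the call should raise that exception.

```ruby
# Local stand-in for Test::Unit's assert_equal so this runs standalone.
def assert_equal(expected, actual, msg = nil)
  return if expected == actual
  raise "#{msg}: expected #{expected.inspect}, got #{actual.inspect}"
end

# Hypothetical code under test: the "x/y" example.
def ratio(x, y)
  x / y
end

# [inputs, expected output]
CASES = [
  [[10, 2], 5],
  [[7, 2], 3],               # Integer#/ rounds down...
  [[-7, 2], -4],             # ...toward negative infinity
  [[1, 0], ZeroDivisionError]
]

CASES.each do |(x, y), expected|
  actual =
    begin
      ratio(x, y)
    rescue ZeroDivisionError => e
      e.class
    end
  assert_equal(expected, actual, "ratio(#{x}, #{y})")
end
```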

But, even with a C0 coverage tool, at least you know what hasn’t been touched at all, and you can use your big smart programmer brain to think of a few ways to torture that code. Just remember that you probably should be aiming for an unmeasurable way-over-100% imaginary coverage target, beating the heck out of your fanciest code with many, many different inputs and then moving on to the next chunk of uncovered code only when the last chunk is thoroughly proven to not suck.

Good testing!

Ruby First Impressions: Backup Scripting

Jamie Flournoy — Sun, 04 Mar 2007 06:09:06 +0000

I started programming in Ruby this week, and so far I like it a lot. From my initial use of Ruby as a backup automation scripting language, here are my thoughts.

You might be wondering, why am I working on backup scripting now? Don’t I have some big project I’m supposed to be working on 24/7? Yes, and actually this work is in the critical path of that project.

My super fast laptop is still away being repaired for a video problem, so I’ve taken a major hit in terms of the resources of my main computer: 90% less MHz, 36% less display area, 50% less memory. In the meantime, I’ve been avoiding tasks that need a lot of CPU or graphics performance and instead working on things that are easier on my old desktop computer.

This week, I decided that I would pause working on the design and implementation of my startup project, until I had really sorted out my server backup and monitoring situation.

I have a few servers at home and hosted in a data center, and there are nightly backups in place, but they’re done using a hodgepodge of different programs. I’ve got some Bash scripts that invoke rsync every night, some Perl scripts that use cp -rp every night and once a week for different sets of data, and rsnapshot, a backup program I found recently that is clever but not quite what I wanted. I had previously evaluated faubackup, but right after I started using it, I experienced filesystem corruption on the backup drive, which was pretty scary. (Also, faubackup doesn’t do remote backups, so I would have used rsync anyway.) Both rsnapshot and faubackup are designed to use hard links to save space, so in theory they’re pretty similar. Finally, my Macs use Carbon Copy Cloner, which is a GUI for psync that adds some extra steps to make the target volume bootable. Windows virtual machines in VMware are backed up as one giant VM disk file, which is wasteful but better than nothing.

Ideally, I’d like to duplicate much of what rsnapshot does, which really means using rsync + cp -al to do the heavy lifting, and using a high level wrapper script to manage scheduling and the backup archive. Also, I want something that I haven’t seen done well yet, which is detailed status notification: how much is being backed up, how much is changing daily, how much is being ignored on each remote system by the backup process, etc. And I want a daily summary email telling me that things are being backed up successfully.
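Here is a simplified sketch of the rsync + cp -al rotation. It is not rsnapshot's actual code. rotation_commands and the daily.N directory names are hypothetical, and the method only builds argument arrays and prints them, so nothing on disk is touched. The trick is that cp -al makes a hard-linked copy of the newest snapshot, and rsync then writes changed files as new files. Unchanged files are therefore shared between snapshots, while each snapshot still looks complete.

```ruby
require "shellwords"

# Hypothetical rotation plan: drop the oldest snapshot, shift the rest,
# hard-link copy daily.0 to daily.1, then rsync fresh data into daily.0.
def rotation_commands(source, archive, keep: 3)
  snap = ->(i) { File.join(archive, "daily.#{i}") }
  cmds = []
  cmds << ["rm", "-rf", snap.(keep - 1)]
  (keep - 2).downto(1) { |i| cmds << ["mv", snap.(i), snap.(i + 1)] }
  cmds << ["cp", "-al", snap.(0), snap.(1)]
  cmds << ["rsync", "-a", "--delete", "#{source}/", "#{snap.(0)}/"]
  cmds
end

plan = rotation_commands("/home", "/backup/host1", keep: 3)
plan.each { |argv| puts Shellwords.join(argv) }
```

A real script would also need to handle the first run, when daily.0 doesn't exist yet.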

So, I’m writing a bunch of Ruby classes, which represent data output from du, df, and find, and which do the text processing and math to provide totals and percentages in a convenient form for high level code. That high level code will interpret configuration data and remote status information, initiate backups, manage the archived backup data, and send summary emails.

On to Ruby impressions. As far as documentation goes, O’Reilly’s Ruby in a Nutshell is unacceptable because of poor proofreading; the examples don’t work, and the explanations are obviously wrong. I got Programming Ruby a.k.a. the Pickaxe book yesterday at the wonderful Stacey’s Bookstore, and I like it much better. I’ve also found a whole lot of good Ruby sample code online, not limited to just Rails examples as I had expected. It seems like there are a whole lot of people learning and falling in love with this language, and blogging and creating helpful web sites about it.

Regarding the language itself, it feels like Perl, except object oriented for real, and legible. I think I’m probably pretty typical of new Ruby programmers in that I love the way blocks, iterators, and specifically collect and inject work. Things like figuring out the total amount of free disk space on a remote server, not counting a predefined set of filesystem types (procfs, tmpfs, usbfs, etc.), take a half dozen lines of (admittedly fancy, but readable) code. I shudder to imagine how much Java code this would require.
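Here is a sketch of that calculation with collect and inject. It runs against a hypothetical sample of GNU df -kT output rather than a live server. The column positions assume that format, and IGNORED_TYPES is an illustrative list.

```ruby
# Hypothetical sample of `df -kT` output (filesystem, type, 1K-blocks,
# used, available, use%, mount point).
DF_OUTPUT = <<~DF
  Filesystem     Type     1K-blocks     Used Available Use% Mounted on
  /dev/sda1      ext3      19228276  5523652  12727880  31% /
  tmpfs          tmpfs       517048        0    517048   0% /lib/init/rw
  proc           proc             0        0         0    - /proc
  udev           tmpfs        10240       52     10188   1% /dev
  /dev/sdb1      ext3     240362656 81234560 146918096  36% /backup
DF

IGNORED_TYPES = %w[proc procfs tmpfs usbfs sysfs devpts]

# Skip the header, split rows into columns, drop pseudo filesystems,
# collect the "Available" column, and total it with inject.
rows    = DF_OUTPUT.lines.drop(1).map(&:split)
real    = rows.reject { |cols| IGNORED_TYPES.include?(cols[1]) }
free_kb = real.collect { |cols| cols[4].to_i }.inject(0) { |sum, kb| sum + kb }
```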

As for Java, I’ve been programming in it (overlapping with shorter periods of Cold Fusion and PHP work) for 10 years, and scripting in Perl for 11. One thing I loved about Java was the QA first pass you get from static types and compilation: dumb mistakes are caught immediately, before you even try to run the code. Perl always felt expressive but risky, and Ruby feels even more expressive, but still risky in the same way.

The Extreme Programming folks figured out years ago that the only way to write working code in a modern dynamic scripting language (defined for the purposes of this sentence as one that uses dynamic types and is not compiled) is to write a ton of automated tests. Basically you get to omit the code that does casting and type declarations and local variable initialization from your code, but you are forced to write a bunch of explicit test code in return. This would seem like a pointless trade-off, except for the fact that you have to write all that test code for statically typed, compiled languages anyway. So really the question is whether you got to working, debugged code faster with compilation and static type checking, or without it. Although I do like the super fast feedback loop of incremental compilation that you get from IntelliJ or Eclipse with Java, I’m pretty sure the answer is that “without it” is still faster. It just feels really scary, until you’ve written tests.

To eliminate the scary feeling, I started using Test::Unit. It’s pretty nice, but it is kind of disconcerting that when you define a TestCase subclass and require it in your test script, it seems to run the tests on its own without being explicitly told to do so. Aside from that potentially irrelevant oddity, it’s working well for me so far.
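The self-running behavior comes from an at_exit hook: loading Test::Unit registers a handler that runs the defined test cases when the script finishes. This sketch imitates that mechanism with a hypothetical MiniCase class that registers subclasses through Ruby's inherited hook. It is not Test::Unit's actual code, and the real library's bookkeeping differs.

```ruby
class MiniCase
  @registry = []

  class << self
    attr_reader :registry

    # Every subclass registers itself the moment it is defined.
    def inherited(subclass)
      super
      MiniCase.registry << subclass
    end
  end

  # Run every test_* method of every registered subclass.
  def self.run_all
    MiniCase.registry.each do |klass|
      klass.public_instance_methods(false).grep(/\Atest_/).sort.each do |name|
        klass.new.public_send(name)
        puts "#{klass}##{name}: ok"
      end
    end
  end
end

# "Loading the framework" schedules the run for when the script ends,
# which is why nobody has to call it explicitly.
at_exit { MiniCase.run_all }

class ArithmeticTest < MiniCase
  def test_addition
    raise "bad math" unless 1 + 1 == 2
  end
end
```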

A related issue is code coverage. Code coverage tools measure which lines of source code are executed during one or more runs of your program, and typically they also generate reports with nice features like bar graphs and highlighted source code displays that show you red areas for code you didn’t execute and green for code you did. That information tells you (with reasonable, but not perfect accuracy) how complete your automated test suite is.

So, as you code, you put TODO comments all through it for things to be written, and you write tests for code that you just wrote to make sure it works, and then you run the test suite with a coverage tool enabled and look at the coverage report to see what you forgot to test. Then when you’re super close to, or at 100% coverage, you attack those TODOs.

In Java, I used JUnit and EMMA as my unit test framework and coverage tool. Those were installed by hand, by me googling for them and then downloading the .zip files and expanding them by hand. I then had to write a bunch of custom Ant build.xml code to enable EMMA’s coverage instrumentation to be added to the compiled Java code. Then I had to write some more Ant build.xml code to generate the coverage report, and I had to do a whole lot of super tricky Ant nonsense to separate the build sequences that led to coverage reports from the ones that led to deployable production code, since the actual compiled code was different. That effort, in total, probably took me somewhere between 1 and 2 days, including all the wrong turns and reading and futzing I had to do to get it to work elegantly, which I eventually did.

Part of the reason for the hassle is that Java is not open source, nor is a usable open source implementation available. The open source community doesn’t really fully embrace Java by packaging it up nicely and helping users to distribute it, because they cannot legally do so.

Sun’s death grip on Java means extra work for Java developers. Getting a hold of all the code and getting it to work together is tedious and requires that you click through license agreements on Sun’s web site; no Linux distribution includes all of this stuff, and Maven, which is cool but (the last time I looked) very poorly documented, is the only thing that seems to even try and pull all of these things together, but it requires that you manually install it on top of Java which itself must be manually installed. Setting up Maven, and configuring it to manage the dependencies of each of your Java projects is no picnic, either. Perl has CPAN and Ruby has Gems and most Linux distributions have great open source package management systems now (and even Mac OS X has MacPorts), but Java has bupkis. You want to assemble an application, you either go out on a limb and try to get your organization to adopt Maven, or you do it all by hand, with lots of trial and error.

So, bear that in mind when I tell you this: for Ruby on Ubuntu Linux, here’s what was required, after Googling for “ruby coverage” and seeing that there was a thing called rcov:

sudo apt-get install rcov
rcov test.rb

Seriously.

I didn’t even look and see if ‘rcov’ was the package name Ubuntu had adopted. I just took a guess, and it worked, and then I read that one web page about usage to find that it’s just “rcov [your Ruby script here]”. Rcov runs the program you give it as a command line argument, and automatically creates a coverage report, quite quickly. It might have taken me one whole minute from “what’s out there” to “hey nice coverage report”.

That is an example of my experience with Ruby thus far. It’s just too easy to be true; I keep waiting for the gigantic gotcha, like performance (fast so far, but so far my code is trivial) or some hideous feature that’s missing.

In the meantime (until I find something not to like), Ruby seems really excellent, and I might not be able to go back to happily programming in Perl or Java or PHP again if this keeps up. :)