Ruby First Impressions: Backup Scripting

I started programming in Ruby this week, and so far I like it a lot. From my initial use of Ruby as a backup automation scripting language, here are my thoughts.

You might be wondering, why am I working on backup scripting now? Don’t I have some big project I’m supposed to be working on 24/7? Yes, and actually this work is in the critical path of that project.

My super fast laptop is still away being repaired for a video problem, so I’ve taken a major hit in terms of the resources of my main computer: 90% less MHz, 36% less display area, 50% less memory. In the meantime, I’ve been avoiding tasks that need a lot of CPU or graphics performance and instead working on things that are easier on my old desktop computer.

This week, I decided that I would pause working on the design and implementation of my startup project, until I had really sorted out my server backup and monitoring situation.

I have a few servers at home and hosted in a data center, and there are nightly backups in place, but this is done using a hodgepodge of different programs. I’ve got some Bash scripts that invoke rsync every night, some Perl scripts that use cp -rp every night and once a week for different sets of data, and a backup program I found recently that is clever but not quite what I wanted, called rsnapshot. I had previously evaluated faubackup, but right after I started using it, I experienced filesystem corruption on the backup drive, which was pretty scary. (Also, faubackup doesn’t do remote backups, so I would have used rsync anyway.) Both rsnapshot and faubackup are designed to use hard links to save space, so in theory they’re pretty similar. Finally, my Macs use Carbon Copy Cloner, which is a GUI for psync that adds some extra steps to make the target volume bootable. Windows virtual machines in VMware are handled as one giant VM disk file, which is wasteful but better than nothing.

Ideally, I’d like to duplicate much of what rsnapshot does, which really means using rsync + cp -al to do the heavy lifting, and using a high level wrapper script to manage scheduling and the backup archive. Also, I want something that I haven’t seen done well yet, which is detailed status notification: how much is being backed up, how much is changing daily, how much is being ignored on each remote system by the backup process, etc. And I want a daily summary email telling me that things are being backed up successfully.
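A minimal sketch of that rsync + cp -al rotation, written as a planner that only builds the commands rather than running them. The method name, the paths, and the retention count are hypothetical; the "daily.N" directory names follow rsnapshot's convention. It assumes GNU cp (for -al) and glosses over details such as the very first run, when no snapshots exist yet.

```ruby
# Build (but do not run) the commands for one rsnapshot-style rotation.
# All names and paths here are illustrative assumptions.
def rotation_plan(source, archive, keep)
  snap = lambda { |n| File.join(archive, "daily.#{n}") }
  plan = []
  plan << ["rm", "-rf", snap.call(keep - 1)]                 # drop the oldest snapshot
  (keep - 2).downto(1) do |n|                                 # shift daily.1 -> daily.2, etc.
    plan << ["mv", snap.call(n), snap.call(n + 1)]
  end
  plan << ["cp", "-al", snap.call(0), snap.call(1)]          # hard-link copy of the newest
  plan << ["rsync", "-a", "--delete", source, snap.call(0)]  # then update the newest in place
  plan
end

rotation_plan("server:/home/", "/backup/server", 3).each do |cmd|
  puts cmd.join(" ")
end
```

Because rsync writes a changed file to a new inode instead of modifying it in place, the hard-linked copies in the older snapshots keep their old contents, and unchanged files cost no extra disk space.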

So, I’m writing a bunch of Ruby classes, which represent data output from du, df, and find, and which do the text processing and math to provide totals and percentages in a convenient form for high level code. That high level code will interpret configuration data and remote status information, initiate backups, manage the archived backup data, and send summary emails.
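To give a flavor of what such a class might look like, here is a small sketch that wraps the output of du -sk, which prints a size in kilobytes, a tab, and a path on each line. The class name, method names, and sample data are all hypothetical, not the actual classes from this project.

```ruby
# Hypothetical wrapper around `du -sk dir1 dir2 ...` output,
# where each line is "<kilobytes>\t<path>".
class DiskUsage
  Entry = Struct.new(:path, :kilobytes)

  attr_reader :entries

  def initialize(du_output)
    @entries = du_output.lines.collect do |line|
      size, path = line.chomp.split("\t", 2)
      Entry.new(path, size.to_i)
    end
  end

  # Sum of all reported sizes, in kilobytes.
  def total_kilobytes
    entries.inject(0) { |sum, e| sum + e.kilobytes }
  end

  # Share of the total taken by one path, as a percentage with one decimal.
  def percent_of_total(path)
    entry = entries.find { |e| e.path == path }
    return 0.0 if entry.nil? || total_kilobytes.zero?
    (100.0 * entry.kilobytes / total_kilobytes).round(1)
  end
end

usage = DiskUsage.new("300\t/home/alice\n100\t/home/bob\n")
puts usage.total_kilobytes
puts usage.percent_of_total("/home/alice")
```

The high level code then only has to ask for totals and percentages, without knowing anything about the text format du happens to use.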

On to Ruby impressions. As far as documentation goes, O’Reilly’s Ruby in a Nutshell is unacceptable because of poor proofreading; the examples don’t work, and the explanations are obviously wrong. I got Programming Ruby a.k.a. the Pickaxe book yesterday at the wonderful Stacey’s Bookstore, and I like it much better. I’ve also found a whole lot of good Ruby sample code online, not limited to just Rails examples as I had expected. It seems like there are a whole lot of people learning and falling in love with this language, and blogging and creating helpful web sites about it.

Regarding the language itself, it feels like Perl, except object oriented for real, and legible. I think I’m probably pretty typical of new Ruby programmers in that I love the way blocks, iterators, and specifically collect and inject work. Things like figuring out the total amount of free disk space on a remote server, not counting a predefined set of filesystem types (procfs, tmpfs, usbfs, etc.), take a half dozen lines of (admittedly fancy, but readable) code. I shudder to imagine how much Java code this would require.
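The free disk space example might look roughly like this. It is a sketch, assuming output shaped like GNU df -PT (one filesystem per line, type in the second column, available 1K blocks in the fifth); the sample data, constant names, and method name are made up, and mount points containing spaces would need more careful parsing.

```ruby
# Hypothetical sample of `df -PT` output; sizes are in 1K blocks.
SAMPLE_DF = <<~OUT
  Filesystem     Type  1024-blocks    Used Available Capacity Mounted on
  /dev/sda1      ext3     10000000 6000000   4000000      60% /
  tmpfs          tmpfs     1000000       0   1000000       0% /dev/shm
  /dev/sdb1      ext3     50000000 2000000  48000000       4% /backup
  proc           procfs          0       0         0       0% /proc
OUT

# Filesystem types that should not count toward free space.
IGNORED_TYPES = %w[procfs tmpfs usbfs]

# Total available kilobytes, skipping the header line and ignored types.
def free_kilobytes(df_output, ignored = IGNORED_TYPES)
  df_output.lines.drop(1)
           .collect { |line| line.split }
           .reject  { |fields| ignored.include?(fields[1]) }
           .inject(0) { |sum, fields| sum + fields[4].to_i }
end

puts free_kilobytes(SAMPLE_DF)  # prints 52000000
```

Here collect turns each line into an array of fields, reject drops the pseudo-filesystems, and inject folds what is left into a single total.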

Speaking of which, I’ve been a Java programmer (overlapping with shorter periods of ColdFusion and PHP work) for 10 years, and a Perl scripter for 11. One thing I loved about Java was the QA first pass you get from static types and compilation: dumb mistakes are caught immediately, before you even try to run the code. Perl always felt expressive but risky, and Ruby feels even more expressive, but still risky in the same way.

The Extreme Programming folks figured out years ago that the only way to write working code in a modern dynamic scripting language (defined for the purposes of this sentence as one that uses dynamic types and is not compiled) is to write a ton of automated tests. Basically you get to omit the code that does casting and type declarations and local variable initialization from your code, but you are forced to write a bunch of explicit test code in return. This would seem like a pointless trade-off, except for the fact that you have to write all that test code for statically typed, compiled languages anyway. So really the question is whether you get to working, debugged code faster with compilation and static type checking, or without it. Although I do like the super fast feedback loop of incremental compilation that you get from IntelliJ or Eclipse with Java, I’m pretty sure the answer is that “without it” is still faster. It just feels really scary, until you’ve written tests.

To eliminate the scary feeling, I started using Test::Unit. It’s pretty nice, but it is kind of disconcerting that when you define a TestCase subclass and require it in your test script, it seems to just run the tests on its own without being explicitly told to do so. Aside from that potentially irrelevant oddity, it’s working well for me so far.
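For reference, a minimal Test::Unit script looks something like the sketch below. The percent_used helper is a hypothetical stand-in for the code under test. As far as I can tell, the tests run "on their own" because requiring test/unit registers an at_exit hook, which finds every TestCase subclass and runs it when the script finishes.

```ruby
require 'test/unit'

# Hypothetical helper under test: percentage of a filesystem in use.
def percent_used(used, total)
  return 0.0 if total.zero?
  (100.0 * used / total).round(1)
end

class PercentUsedTest < Test::Unit::TestCase
  def test_typical_filesystem
    assert_equal 60.0, percent_used(6_000_000, 10_000_000)
  end

  # Pseudo-filesystems can report zero total blocks; avoid dividing by zero.
  def test_empty_pseudo_filesystem
    assert_equal 0.0, percent_used(0, 0)
  end
end

# No explicit runner call here: the tests are run when the script exits.
```

Running the file with plain "ruby" is enough; every method whose name starts with test_ is picked up automatically.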

A related issue is code coverage. Code coverage tools measure which lines of source code are executed during one or more runs of your program, and typically they also generate reports with nice features like bar graphs and highlighted source code displays that show you red areas for code you didn’t execute and green for code you did. That information tells you (with reasonable, but not perfect accuracy) how complete your automated test suite is.

So, as you code, you put TODO comments all through it for things to be written, and you write tests for code that you just wrote to make sure it works, and then you run the test suite with a coverage tool enabled and look at the coverage report to see what you forgot to test. Then when you’re super close to, or at 100% coverage, you attack those TODOs.

In Java, I used JUnit and EMMA as my unit test framework and coverage tool. Those were installed by hand, by me googling for them and then downloading the .zip files and expanding them by hand. I then had to write a bunch of custom Ant build.xml code to enable EMMA’s coverage instrumentation to be added to the compiled Java code. Then I had to write some more Ant build.xml code to generate the coverage report, and I had to do a whole lot of super tricky Ant nonsense to separate the build sequences that led to coverage reports from the ones that led to deployable production code, since the actual compiled code was different. That effort, in total, probably took me somewhere between 1 and 2 days, including all the wrong turns and reading and futzing I had to do to get it to work elegantly, which I eventually did.

Part of the reason for the hassle is that Java is not open source, nor is a usable open source implementation available. The open source community doesn’t really fully embrace Java by packaging it up nicely and helping users to distribute it, because they cannot legally do so.

Sun’s death grip on Java means extra work for Java developers. Getting hold of all the code and getting it to work together is tedious and requires that you click through license agreements on Sun’s web site; no Linux distribution includes all of this stuff. Maven, which is cool but (the last time I looked) very poorly documented, is the only thing that seems to even try to pull all of these things together, but you have to install it manually on top of Java, which itself must be installed manually. Setting up Maven, and configuring it to manage the dependencies of each of your Java projects, is no picnic, either. Perl has CPAN and Ruby has Gems and most Linux distributions have great open source package management systems now (and even Mac OS X has MacPorts), but Java has bupkis. You want to assemble an application, you either go out on a limb and try to get your organization to adopt Maven, or you do it all by hand, with lots of trial and error.

So, bear that in mind when I tell you this: for Ruby on Ubuntu Linux, here’s what was required, after Googling for “ruby coverage” and seeing that there was a thing called rcov:

Seriously.

I didn’t even look and see if ‘rcov’ was the package name Ubuntu had adopted. I just took a guess, and it worked, and then I read that one web page about usage to find that it’s just “rcov [your Ruby script here]”. Rcov runs the program you give it as a command line argument, and automatically creates a coverage report, quite quickly. It might have taken me one whole minute from “what’s out there” to “hey nice coverage report”.

That is an example of my experience with Ruby thus far. It’s just too easy to be true; I keep waiting for the gigantic gotcha, like performance (fast so far, but so far my code is trivial) or some hideous feature that’s missing.

In the meantime (until I find something not to like), Ruby seems really excellent, and I might not be able to go back to happily programming in Perl or Java or PHP again if this keeps up. :)

4 thoughts on “Ruby First Impressions: Backup Scripting”

  1. Yes, the GPLing of Java is great news, and will have a very positive effect on Java’s adoption by the open source community.

    But it’s not actually there yet – announcements have been made, but only some of it is available today, and they admit that there are non-open-source binaries that will be included. The community will likely replace that quickly, but that’s another few weeks or months between Sun offering a download (99% open source, 1% not?), and having the whole enchilada be open source. That distinction matters to people like the Debian group, gNewSense, etc. (Some distros, like Ubuntu, may choose a more pragmatic path and distribute now with non-free binaries included, similar to the way they consider non-free binary drivers to be something that people should have the option to install.)

    In addition, the open-source community will also have to look over the code and try to decide how patent-encumbered the code is. This is the same struggle that the Mono folks have – if the implementation code is free, but the spec and/or code are covered by existing patents, the developer and end-user are still locked in the trunk.

    So, I predict it’ll be a few years (3-5?) before all of the non-free junk is scrubbed out of Java and it is declared actually open source, 100%.

  2. Quite interesting. I am a beginner and want to write just a home backup script (like rar and copy x directory to y mapped drive, label it with the date and time, and maybe append a log file), then add on as I learn what I can do.
    As a beginner I like the concept behind Ruby. I have tried Perl and a few others in the past, yet still struggle.
    I have found quite a bit of enthusiasm for Ruby and am glad to read your forthright comments.

  3. How goes the backup project? Were you able to build the system that you were after? I’m hoping to pick up some scraps!
