perl – Pervasive Code

Capacity vs. Scalability

Jamie Flournoy — Wed, 14 Nov 2007 00:23:58 +0000

In I still donâ€t get the fascination with Ruby on Rails, Andy Davidson writes:
Scaling does not mean â€œAllows you to throw money at the problemâ€, it means â€œCan deal with workloadâ€. He goes on to recommend mod_perl instead of Rails.

I’m not interested whether he likes Rails or not. Lots of people hate Rails, and I don’t care. I’m not going to make a big deal about the fact that he’s comparing a runtime architecture (Apache + mod_perl) with a framework (Ruby on Rails).

Those are insignificant compared to his claim that scalability means “Can deal with workload”. Actually, that’s a description of capacity.

Scalability is a very distinct concept from capacity. Scalability is not a true/false property of a system; there are degrees of scalability, which can be represented in a 2D graph of # of simultaneous requests that you can service with an acceptable response time (X axis), plotted against the resources required to service those requests (on the Y axis). The function f in the y=f(x) equation that is behind that graph is how scalable your application is.

If it’s a straight line, that’s quite good: “linear scalability”. More requests cost the same amount per request as the ones you’re getting now. Double your customers, double your net profits.

If it curves down away from a straight line, that’s even better than linear scalability: you’ve attained an economy of scale, so twice as many requests costs less than twice as much as the amount you’re paying now.

If it curves up away from a straight line, that’s bad, because more load means a greater cost per request. Each new customer makes you less money than the last one. Eventually you will grow to a point where you lose money and your business fails. This is what people are referring to when they say something won’t scale. Linear (or better) scalability curves are what people mean when they say something will scale.

In the worst case, the upward curve is asymptotic to a vertical line. In other words, at some number N of simultaneous requests coming in, you “hit a wall”, and no amount of extra resources will help you. “Allows you to throw money at the problem”, as Mr. Davidson puts it, actually describes all three curves, except for this worst case of curving upward asymptotically. But as long as you don’t hit a wall, “Can deal with workload” is satisfied. The more interesting questions, though, are how much it costs you to add capacity, and whether there’s a certain number of requests above which you start to make or lose money.

Of course, the ideal curves are not what you see in practice. In reality you buy resources in chunks, such as a server or a specific plan of bandwidth, power, and rack space from your colocation provider. The graph looks more like a staircase, and going from N customers to N+1 customers means you have to spend $thousands on new hardware. Each of those chunks represents a certain amount of capacity. Capacity is just a measure of how large each chunk is, or of the largest value of X that your server cluster can support without more resources.

But you can’t just extrapolate from the fact that a single server S will support, say, 100 requests per second, that 100 of them will support 10,000 requests per second. If only it worked that way, capacity planning would be really easy. Sadly, architecting a web application for linear scalability is hard. (It’s doable and the approach is fairly well documented, but it’s not easy.)

I think it’s worth pointing out something now which should be obvious: the slope of a straight line doesn’t change its curvature. If you’re paying a silly amount for each request because you’re using an inefficient architecture that scales linearly, but you’re making an even larger silly amount from your customers, you’re still going to be in business if you have 10x as many customers. You may be leaving money on the table due to inefficient use of resources, but you’re not ruined.

If you’re lucky enough to be in that situation, you can probably hire one or two sysadmin/developer ninjas to optimize your app and change the slope of your line downward. Alternatively, you might decide to increase your profits by just buying more servers and advertising. You could even do both.

Likewise, if you’re losing an average of $5 per customer visit regardless of how many customers you have (with that hard-to-attain linear scalability again), then adding a bunch of servers isn’t going to help you. Sun, Compaq, and Dell sold a ton of server hardware in the late 1990s to companies that didn’t understand this.

In a more realistic scenario, you might be paying a lot for servers but not have much revenue. Improving your application’s efficiency would reduce the cost of your resources somewhat, and you might change from losing money to making money. That’s great, but if your curve bent upward before, it still bends upward, and if you were gonna hit a wall before, you still will. The right decision might be to worry about that later, once you’re profitable. At that time you could afford to change your architecture to scale better. Or, you might choose to invest in the future and focus about scalability improvements and growing your customer base now, losing money now but making piles of cash later when you eventually improve your efficiency.

In conclusion, to tie the two terms together: scalability is a measure of how cost-effectively you can grow (or shrink) your capacity.

And to tie this topic to my ongoing claim that trading runtime language performance for developer productivity is generally a good idea for web apps:

Language performance does not affect whether an application scales or not. It is a coefficient to the cost of capacity.

The cost of capacity affects the slope of your curve, but not the curvature. That’s important. Your architecture and application design are what affect the curvature of your scalability. You need to pay attention to both: the curvature of your scalability function and the cost of capacity will tell you where to invest your developer and sysadmin resources for the best return on investment.

Evaluating Future Web Application Technologies

Jamie Flournoy — Mon, 12 Nov 2007 23:22:30 +0000

Technical Architecture is a Form of Investing. I’m reminded of this sort of thinking because of recent news from RubyConf 2007.

First, IronRuby joins Ruby.NET in providing a Ruby runtime on .NET. They’re at different stages of completeness, and building on different .NET runtimes (DLR vs. the regular CLR), but the important point is that Microsoft is investing in dynamic languages. Is it ready for production today? Probably not. But keep an eye on Ruby, Python, and JavaScript if you’re a .NET developer.

Second, JRuby 1.1b1 has been released and as expected is considerably faster (see item #5 in this link) than the standard “MRI” runtime. JRuby joins Jython and Rhino in providing a JVM-based runtime for a dynamic language, with features designed to help developers mix and match the dynamic language code with Java code.

See the trend here? Python, Ruby, and JavaScript are emerging as the dynamic languages of the future for .NET and Java developers.

The hard work done by Sun and Microsoft to make their VMs work well is being leveraged by the next wave of languages. Threads, high performance I/O, memory management, and portability are all features that are quite expensive to get right, and the .NET and Java platforms have pretty much achieved that at this point. (Piggybacking newer, higher-level languages on these mature runtimes means that you get a mature new language runtime faster than if each language’s runtime were built from scratch and painstakingly debugged in isolation from the others.)

There are still some hurdles (performance, type safety fears, lack of mass market acceptance, ECMAScript 4 standardization and adoption, etc.), but in 2 or 3 years, things are going to change dramatically in the web application development world. The seeds of this change are already sown, and it’s just a matter of time. Threads, SQL, OOP, and garbage collection are all features of web application architectures that were initially controversial, but have now met with general acceptance. Dynamic languages are clearly the next step.

Obviously, Java and C# are far from dead, and in 10 years people will still be coding in Java and C#, because as with other languages like C and assembly, the newest and highest-level language isn’t automatically right for every project. But if you’re building web applications, most of what your code does falls into the categories of string manipulation, collection operations, or file and socket I/O. Image processing, crypto, full text search, and other CPU-heavy, byte-twiddling features may be part of your application, but you’re not writing the image scaler, RC4 cipher, or inverted index yourself; those are done in a library, probably written in C, and you’re just calling it. So your needs are likely to be similar to the sweet spot of dynamic languages: maximum expressivity and the fancy features to let you write clever code, making you productive and making the code as clean and elegant as possible. In other words, they put developer productivity first (lower labor cost and shorter development schedules) at the expense of runtime performance. Since hardware gets cheaper over time but code gets uglier over time, this is probably the right choice to make for most web application projects.

Another interesting benefit of layering instead of starting over is that the integration between dynamic languages and Java or CLR languages is much nicer than managed vs. unmanaged code in .NET or, even worse, JNI in Java. That is, it won’t be a bloody mess to mix and match code, from a technical feasibility standpoint. This matters, because The Big Rewrite is among the Things You Should Never Do. But little bitty rewrites are fine, especially if you have a thorough test suite to help you avoid breaking things. (By the way, dynamic languages are great for writing automated tests.)

Which of these three (JavaScript, Python, or Ruby) is going to be dominant 5 years from now? I don’t think any of them will be. The dynamic language community is fragmented, and the various vendors and big sponsors of these three languages are fairly entrenched already. Microsoft is investing in all three; Google has standardized on Python and JavaScript to the exclusion of Ruby; Sun has hired the JRuby team; Mozilla is heavily invested in JavaScript; Adobe supports JavaScript in AIR but not Ruby or Python, etc.

In fact, if you encapsulated the glue code sufficiently well, you could mix and match JavaScript, Python, and Ruby in your application, and port your hideous hydra between the JVM and .NET. You would be wasting a lot of effort since the three languages are largely similar, but you could do it. Alternatively, you could create a portability layer between the DLR and the JVM a la WxWindows, and write-once-debug-everywhere in a more productive language than Java.

These are all repugnant ideas, but only because as I write this and as you read this, we probably realize that to attempt this today would be a huge task. But what about in 2010? Probably not so gross. What about an application that could be executed on Silverlight, AIR, Firefox, SWT, and Mono, unmodified? How about a mobile app that runs on smartphones regardless of the runtime (.NET vs. J2SE)? Not gross at all, and not unthinkable if your app is written in JavaScript using some kind of portability layer that doesn’t exist yet.

In the longer term, JavaScript (a.k.a. ECMAScript 4) is likely to become extremely popular. As far as I know it’s not quite a perfect fit for Steve Yegge’s The Next Big Language, but it’s the closest thing there is, and it has two critical advantages over Ruby and Python that will make it successful: C-family syntax (which makes development tools cheaper to build) and effectively unanimous buy-in from vendors and developers.

So, what about the other dynamic languages that people are using in large numbers today? What’s going to happen to ActionScript, CFScript, PHP, and VB?

ActionScript and CFScript are pretty close to JavaScript by design; I’ve read that ActionScript 3 is actually compliant with the ECMAScript 4 draft specfication. It’s pretty clear that Adobe is betting on JavaScript. In the near future (2 or 3 years) I predict that Adobe will rev its products and support ECMAScript 4 across the board.

PHP and VB.NET/VBScript will hang around for a long time because they’re approachable and already very popular, but they’ve already peaked, and will steadily decline as developers switch to C# (on the .NET side) and Rails (on the Linux side), and then JavaScript as soon as a serious web app framework and an ISP-friendly runtime exist. Microsoft will keep investing in VB to keep customers happy; Yahoo will keep investing in PHP because it is so heavily invested in PHP already; new developers will find PHP to be an easy starting point for light duty web development, with tons of documentation and free applications that they can download and hack. But PHP will not inherit the kingdom from C# or Java, and the languages which do achieve mainstream success after C# and Java will do everything that PHP does language-wise, and the market momentum around those languages will make them better than PHP at what PHP does. Developers will ask themselves why they would write the client side and server side in two different languages, especially when the server-side language is more expressive and has better portability and libraries. That’s not true yet, but it will be in a couple of years. In 5 years or so PHP and VBScript will go the way of Perl CGIs: still used, but by a community a tenth of the size it is today.

What about the new Java-based dynamic language, Groovy? Groovy is interesting, but it’s too late. The Java mainstream of vendors and developers only recently managed to convince the world of “serious” C++ developers that automatic garbage collection and JIT compiled bytecodes can actually work in a high traffic context. The next battle, to promote the dynamic language features that Java lacks but which Groovy brings, will take years to fight. Once a developer makes a decision to not use standard Java, Groovy is on a more or less level playing field with the JVM-hosted versions of Python, JavaScript, and Ruby, but each of those languages has far greater adoption than Groovy, and each of them has greater opportunity for leverage on other runtimes than Groovy. For a Java developer, once the door is opened to other languages, the only advantage Groovy has is that its syntax is familiar. Compare this to JavaScript which web developers also need to know how to use; why learn a third language (Groovy) in addition to Java and JavaScript? Over time the simplicity of coding and debugging in JavaScript on client and server, together with dynamic-language productivity, will overcome the momentum of the Java standard, and web developers using server-side Java now will gradually replace it with JavaScript on the JVM. Conservative attitudes in the mainstream Java community (including Fortune 500 companies and the many offshore development firms that write code for them) will make this take quite a while – probably 5 years before JavaScript becomes a common part of the architectures that currently use J2EE, and 10 years before Java goes the way of COBOL (maintained forever but not used for new projects).

So in conclusion, keeping an eye on the future value of a technology, including who’s investing in it and who’s talking about investing in it, is critical to making your own investments today. In five years you’re not going to be using the same technology stack that you are today, and your project’s success and your own salary will be tied in large part to how well you invested today.

ActiveRecord: the Visual Basic of Object Relational Mappers

Jamie Flournoy — Fri, 05 Oct 2007 02:07:53 +0000

I’ve been working with Ruby on Rails intensively for several months, and I’ve finally found a place where Rails can’t readily be extended to do what I want. It’s ActiveRecord, which is probably the most controversial part of Rails.

I’m reminded of a James Gosling quote disparaging Microsoft tools, particularly Visual Basic: “The easy stuff is easy, but the hard stuff is impossible.” There’s a parallel between VB and Rails in this instance, in that if you only let yourself use the high level tools, the hard stuff is impossible, but the designers specifically tell you to do the hard stuff using a lower level toolset. The controversy that surrounds “X can’t do everything, therefore it sucks” should really be focusing on the feasibility of going through that trapdoor to do things “the hard way”. This is what Delphi did, which is why so many folks chose it over VB; it made the hard stuff easier.

Here’s the task I need to accomplish, for which ActiveRecord is not well suited: complex queries involving SQL functions and multiple-table joins. I want to join a few tables together, order by a SQL function, include with each result row the result of a SQL function that operates on each row, and have all that come back as a graph of high-level objects.

Despite my attempts to use plugins, extend and/or fix bugs in those plugins, and to dig through the ActiveRecord source to figure out what the documentation won’t tell me, I was unable to get it to work. Most of the parts of what I wanted was possible: acts_as_tsearch cleverly weaves SQL functions into a high-level ActiveRecord::Base.find calls; paginating_find provides a very convenient pagination API on top of ActiveRecord::Base.find, and ActiveRecord includes some clever association tricks such as automatic many-to-many relationships (has_and_belongs_to_many), eager loading of associated records using a join (via the :include option to ActiveRecord::Base.find), and a fairly low-level :joins option that lets you add tables to a ‘find’ query which can be used in your :conditions. Problem is, they don’t all work together in a fancy way.

Really, the issue in this case is related to the design choices that went into ActiveRecord.

Some ORMs (object-relational mappers) are designed in a modular fashion: there is a part that helps you describe the relationships between your model objects, a part that helps you construct queries, and a part that does the storage and retrieval. Sometimes there’s another part that uses your description of object relationships to create an empty database with the appropriate data model, or that looks at an existing database and creates an object model that matches it. Sometimes there’s an import/export tool for bulk data loading or dumping as well.

ActiveRecord has the first three functions integrated (which has benefits and drawbacks compared to a more modular approach), has a very isolated schema manipulation module, and has a somewhat isolated data loader tool.

The relationships are explicitly declared in source code using associations: has_one, has_many, belongs_to, and has_and_belongs_to_many. These are pretty fancy and provide some convenience features that make the associations appear as object collections, such that changing the collection and saving it turns into insert/delete/update activity in the database.

Query construction is basically tied to the objects themselves, in a way that greatly simplifies star-join queries, but which handles only the simplest joins across multiple tables, and is barely able to handle self-referential joins at all. So, you can easily load an object (or group of similar objects) and associated objects, but OLAP-style queries (“what are the top 5 states where customers are located who have bought classical CDs within 2 weeks of their release using American Express and had them shipped as gifts via UPS 3-day Select?”) are impossible. Oddly, views, functions, and stored procedures could bridge the gap between real-world data models and ActiveRecord’s limited set of association types, but they are not supported either.

The storage and retrieval code is inseparable from the query code, and so it is not possible to examine and modify the final SQL before it is executed, nor is it possible to provide an arbitrary query and have the results be parsed into an object graph based on the associations you have defined. The code that would allow these features appears to exist and be sufficiently well designed to allow this with a fairly small amount of changes to ActiveRecord. However, it is currently (as of Rails 1.2.3, which is the current release) not part of the documented API and is declared private.

There is a limited facility for constructing simple objects from arbitrary SQL, in find_by_sql. This loses essentially all of the high level functionality of the find method; most notably, it isn’t possible to use find_by_sql results to instantiate an object graph, rather than a flat array of objects (similar to the eager loading feature in the regular find method).

ActiveRecord has fairly good high-level schema creation functionality (“migrations”). Though it lacks concepts for all but the basic database objects, support can be added for foreign key constraints (I kid you not, they aren’t supported by Rails itself!) and views. There’s also a simple way to execute arbitrary SQL. Migrations aren’t technically that amazing, but rather they’re a helpful organizational approach to what can be a really hairy problem: defining a schema and then applying changes to live databases while keeping track of what changes you’ve already applied.

Finally, there is a test data loading facility called Fixtures. The common opinion of Fixtures seems to be that they are broken by design and should be avoided. The main issue I’ve found with them is that the implementation ignores the kind of database design elements that any book on SQL would recommend, such as foreign keys and check constraints. I managed to circumvent this with a combination of a plugin and some customization, described in detail in my previous post, Rails, Fixtures, the Test DB, and Test::Unit. With those changes, all test fixture data is preloaded in the right order (so constraints aren’t violated) before any tests run, and any data alterations within tests are rolled back automatically by Rails.

A secondary issue with Fixtures is that they go directly from YAML text files to SQL INSERT statements, bypassing the ActiveRecord Model classes. ActiveRecord does pretty much rule out any fancy mapping between database tables and objects, so that’s not a problem, but this model-skipping fixture loading implementation means that any code in your model object (validations, before_save filters, etc.) will not be executed when loading fixtures. So fixtures do not work well with the otherwise pervasive Rails design rule of “put all the intelligence in the application”.

Still, despite the commonly-held disdain for using fixtures at all, I find that they can be tamed. In fact I’ve even created a base data facility for loading the fundamental data set that needs to be in the live database (e.g. initial admin user info). My approach is basically to alter fixture behavior to treat it as essentially a bulk data loading tool, and to do the extra housekeeping after loading to make up for the fact that the ActiveRecord model code was bypassed.

As far as I know, there is no bulk data dumping functionality in Rails.

So, to summarize, of the five main ORM features, here’s how ActiveRecord stacks up:

Describing Relationships: Easy to understand and use, with lots of slick functionality
Querying: Easy to understand and use, but limited to simple join structures, and not possible to customize query building or rewrite SQL before execution
Storage and Retrieval: Very easy to use, but only within the limits of the query builder’s features
Schema manipulation: Easy to understand and use; limited in functionality but readily extensible; solid third party plugins are available for missing schema objects
Bulk Loading and Dumping: Loading is badly designed and implemented, but fixable with some effort; dumping is not offered

Okay, so it definitely makes the easy stuff easy. But what about the rest?

As I observed before, ActiveRecord is not designed as a set of modules that you use to assemble a solution that fits your needs. That’s more of the Java approach to design, and it trades flexibility for convenience. It can be a major pain to assemble a working system out of all of those abstract Java APIs, which are sometimes so comically over-patternized as to draw mockery such as the hilarious “Are Javalanders Happy?” code snippet from Execution in the Kingdom of Nouns. Rails makes the opposite trade-off: sacrifice flexibility and gain a very approachable API.

Unfortunately, the Java approach (too abstract to readily use, but extremely flexible) is easily wrapped with a simpler, more convenient, less customizable API. The Rails approach isn’t internally componentized (have a look at ActiveRecord’s activerecord/base.rb source file in its 2,165-line glory, almost all of which is one class), so if you want to fiddle with its internal behavior, you can’t. So with Rails, it’s all or nothing: high level slickness for simple requirements, or hand-written SQL and hand-coded results mapping for your complex requirements.

As I said at the beginning, though, the key question is not how comprehensive the high level feature set is. More important is the question of how painful things are when you drop down to a lower level for a greater degree of control.

It would be nice if there were a middle level of complexity, between the high-level ‘find’ method and ‘has_xxx’ associations, and raw SQL. There isn’t. I think that the reason there isn’t one is that there is still a persistent belief among many Rails core team members and community members that databases should be stupid: just a persistent hash. Once upon a time I worked that way myself: I didn’t have access to or skill with a SQL RDBMS, and so I solved all of my persistence problems with DBM files, which (using Perl’s Tie::Hash class) are conceptually just persistent hashtables. miniSQL was little more than a SQL query parser on top of that sort of storage engine, and MySQL originally was pretty similar. But big databases have all sorts of useful features that address complicated persistence requirements in a fairly elegant way.

Given that Ruby fans like the idea of domain specific languages, which let you work in a super high level language customized to the problem domain, it’s surprising that Rails groupthink is that SQL is bad. It’s actually a very high level language, and allows a well written database to do some pretty amazing optimization on the fly because it provides a strong layer of abstraction between what you requested and how the storage engine provides it.

No, it’s not dynamic, nor is it pure relational perfection, but it’s pretty darn good. Pre- and post-event validations and arbitrary callbacks to user-specified code, functions providing behavior on top of data… these are all things that Ruby and Rails fans hold in high regard when provided by Ruby and Rails, but which are considered a bad idea at the database layer. As I discussed at length in Rails and the notion of Stupid Databases Being a Good Idea, this is a philosophy rooted in DRY, but it has some major flaws.

Mainly, there is the issue that some things must be done in the data tier, and trying to put them in the application tier doesn’t work. The best example that comes to mind is full text search. Satisfying queries is the database’s job, period. It’s just hideously slow to try and do an inner join in the application across a network link to a database. If you find yourself doing this, that’s a pretty good sign that your architecture is broken. But some queries are too complicated for ActiveRecord, so sometimes you must choose between a series of high level queries whose results are intersected in application code (easy to understand, but extremely inefficient), or hand coded SQL.

Well, SQL is fast and is a high level domain-specific language, so it isn’t actually a bad tool for the job. The problem is that this approach (the trapdoor to the lower level API) is regarded differently by different people. Some see it as a common and reasonable approach to complex requirements; others see it as a bad evil scary thing that should be avoided at all costs, a kludge and a design mistake.

As a result, the low level option in Rails is anemic. It’s there, but you’re not supposed to use it. Ruby’s ActiveRecord Makes Dropping to Raw SQL a Royal Pain (Probably on Purpose) notes that there are no bind variables allowed in ActiveRecord. You may be saying, “No, wait a minute, I’ve used them, that can’t be right.” That’s what I thought. Look at the source; the bind variable functionality is actually a high level feature built on top of drivers that don’t have that feature. Whatever you did at the high level, it’s going to the driver as a single string. Okay, it’s nice that they added that feature, especially since it provides a single point of testing and verification for safe escaping. But that functionality (in sanitize_sql) is not part of the public API. Fortunately that same article provides a workaround that makes sanitize_sql accessible, so you can use bind variables in your hand coded SQL code, and pretend that the driver supports them. But that’s not likely to work forever.

The key problem with ActiveRecord is its least common denominator feature set, based around the least featureful of all popular SQL databases: MySQL. Years ago, MySQL AB (the vendor of the MySQL database) took a strong philosophical stand against pretty much any advanced database features (which their product lacked, and which competing products had), but lately they’ve softened and added those features that they claimed nobody really needed. In the meantime, Rails has been designed with minimal expectations for database sophistication; therefore, the limited functionality of ActiveRecord is fairly complete, assuming you’re using a database with similarly limited functionality.

Triggers, stored procedures, functions, data integrity constraints, nested transactions, and views are all examples of unsupported database functionality. Try and use them via ActiveRecord’s high level API, and you will quickly see how fragile and inflexible ActiveRecord really is. If you shouldn’t need those features in your database, then you shouldn’t need anything that ActiveRecord doesn’t already provide, so it shouldn’t matter that you can’t extend ActiveRecord.

Truly, these are features that you need only in a few small cases in your application, so looking at individual queries they’re needed rarely (which is not the same thing as “never”). But looking at whether you need one or more of them in a given application, they’re needed more often than not. The pain of using hand coded SQL makes this worse: some tricky things could be done either using a view or stored procedure, or using a really slick dynamic SQL statement. Making all of those options painful means that even a clever developer can’t use anything in their bag of tricks to craft an elegant solution.

Unfortunately, non-trivial web applications need things like full text search, complex associations between persistent objects, non-trival summary information about associated objects, and complex reports, and ActiveRecord fails at all of these. These are not just things that big dumb ancient companies that like using Object COBOL think they need; Amazon and eBay need them too.

The acts_as_tsearch plugin is a good case study of ActiveRecord’s design flaws. TSearch2 is the standard PostgreSQL full text search engine, and it’s pretty good in my opinion. It’s also pretty straightforward to use. Unfortunately for developers using Rails, TSearch2 uses SQL functions (mainly to_tsquery and rank_cd). The acts_as_tsearch plugin tries to inject SQL into ActiveRecord’s queries via the high-level find interface, but ultimately fails as soon as you use the :joins or :include options. The problem is that ActiveRecord has a very simplistic idea of how queries and joins work, and so if you need to inject SQL functions to get the job done (as is necessary in TSearch2 queries), too bad. (See also issues 7 and 8 in acts_as_tsearch, in which I describe and attempt to clean up the mess that results when you use find_by_tsearch in non-trivial ways.)

A fellow Rails developer asked me in all seriousness why I wasn’t abandoning the full text search functionality of TSearch2 and just using a completely separate, redundant database product designed exclusively for full text search. Seriously, that is considered the “easy” approach: one database for full text search, and another for ACID/OLTP/CRUD. Honestly if I were going to go down that road I would try hard to just abandon the SQL RDMBS and put everything in the other database, since Lucene and its imitators are capable of far more than just find-text-in-document queries. The pain of duplicating everything, using two query languages, two document representations (in addition to the object representation in Ruby) and writing application-tier query correlation makes the double-DB approach seem very unwise.

It makes far more sense to me to use the SQL RDMBS’s full text search facility, even if there’s a 2x or 3x read performance penalty, because the conceptual simplicity of having one powerful storage tier (instead of two halves cobbled together) eliminates a ton of ugliness in the application, and the SQL RDBMS is going to get clustered for reads anyway. Nevertheless, even if I’m wrong about this case (putting search in the SQL RDBMS instead of in a separate server), there are other cases for needing a smart database that gives you exactly the results you need and lets you push data logic into the data tier.

So, what do I suggest? Abandon Rails? Nope. I still like Ruby a lot, and find Rails very useful. I just think that ActiveRecord needs to support the low-level and middle-level abstractions better.

Specifically, supporting bind variables (either by exposing that sanitize_sql function, or better yet by making drivers and connection adapters support bind variables for real) would make the find_by_sql, select_all, and exec approaches to low-level SQL query execution less painful.

More difficult, and substantially more valuable, would be refactoring ActiveRecord::Base to split it up in the way I described above: association descriptions and unmarshalling code separate from query building code separate from SQL execution and result retrieval code. All of this could remain hidden for most users under the same old slick high-level API, but for advanced requirements, the ability to fiddle with the SQL and still use the built in high-level unmarshalling code to create object graphs from flat result sets would be very powerful, and useful.

I looked at one alternative to ActiveRecord, called Sequel, which overlaps with ActiveRecord only partially. It is a query builder and lazy result proxy, which is actually what I thought ActiveRecord would do when I first started working with Rails. The proxy design means that you can either keep adding constraints or start fetching results, from the same Dataset class. This seems like a pretty good approach, though I haven’t really looked closely to make sure it would fit what ActiveRecord needs.

What Sequel lacks, though, is the unmarshalling side: turning a 2-dimensional (rows of columns) result set into a complex object graph (customers with orders with order lines with products from suppliers stored in warehouses), with user-controlled eager or lazy loading behavior. Ruby is well-suited to a design that would allow user-specified code (i.e., a block) to decompose each row into the object graph associated with that row, leaving the remaining associations on those objects to be lazily provided via future queries.

So, I think there is hope for ActiveRecord, definitely. I considered the idea of rolling a minimal Hibernate clone, or some other sort of challenger to ActiveRecord, but I don’t that ActiveRecord is broken beyond repair. I think the shortest path to a badass Ruby ORM is through improvements (refactoring and abstraction) to ActiveRecord.

So, if you’ve read this far, you probably care about these issues. Here’s my call to action: Please help me make ActiveRecord less like VB and more like Delphi. Who else is interested in helping me with this effort? Are there alternatives that I’ve missed, or components that could be integrated into ActiveRecord to make it better?

“Ruby faster than Python and Perl!” ORLY?

Jamie Flournoy — Thu, 16 Aug 2007 23:05:46 +0000

Ruby faster than Python and Perl! cries the headline. This is based on a benchmark that tests i = i + 1 in a loop, so it’s a particularly useless benchmark, even in a world of benchmarks designed to test unrealistic scenarios that make the benchmark author’s product look good.

But wait! A commenter accuses the poster of cheating! (On a benchmark? No!)
>Ummmâ€¦. Why did you test Ruby with less data than you tested Python and Perl? You cheated.

As it turns out, the “microbenchmark” scripts for different languages have differing loop counts, so the total run time is super extra especially meaningless as a way to compare language performance.

When I was in 5th grade (age 9) learning AppleSoft BASIC on the Apple ][+ in math class, we wrote programs that did this:

10 X = 1
20 PRINT X
30 X = X +1
40 GOTO 20

And we would race each other, starting at the same moment and seeing whose column of increasing numbers on the screen scrolled faster. This was of course stupid because all the computers in the lab were chip-for-chip identical to each other, and probably were all made on the same production run on a single day.

We learned that we could cheat by adding a number larger than one in each loop iteration, which was quickly detected and outlawed. Far more cleverly, someone figured out that you could do something like this:

20 PRINT X: X = X + 1: GOTO 20

The same algorithm yielded better performance if it was all written on one line of code. I could be misremembering (it has been 25 years and I don’t have a ][+ handy to verify this on) but it was something like that. Anyway, we learned that the same language runtime on identical hardware using the same algorithm could be made to run faster or slower using simple formatting changes.

So does it make sense to compare different languages this way, which may mean favoring one language’s idiomatic code structure while hitting a weak spot of another? This is a common, and in my opinion valid, critique of apples-vs-oranges benchmarks: how do we know that the performance difference isn’t due to naive coding or configuration on one side and expert tuning on the other side? For that matter, do we know that the benchmark design isn’t selected specifically to highlight exceptionally high performance in one area of a product, to the exclusion of embarassingly slow areas that the benchmark designer would prefer that you not consider?

Thus I claim that this benchmark is approximately as valuable as my 5th grade silly hacks. X=X+1, change the number of iterations to suit your bias, or perhaps just don’t bother making them the same because it’s meaningless anyway. (Z=X*Y and a matrix multiplication are other parts of this benchmark, but they too are so trivial in concept and implementation as to be equally pointless.)

I’m going to guess that the author of the blog post didn’t notice the different in loop iterations, or was looking at the per-second values rather than the total run time. But if we’re looking at average performance over time, then how long does it take for the performance to stabilize? Stabi-whatchamaha? Ask Zed Shaw: look at his list of pet peeves, #3.

Do we know that 0.142 seconds is enough to measure “language performance” (really, it’s the performance of a particular runtime environment being measured) including stuff like garbage collection and JIT compilation overhead? If one language’s runtime waits for N iterations before JIT-compiling the code, whereas another runtime waits for 5N, how many total iterations do you need to minimize the effect of that?

What happens in JRuby, Ruby2C, etc.? The poster says the tests were run on a MacBook Pro – what architecture (PPC vs. Intel) were these language runtimes compiled for, with what compiler, blah blah. GCC versions, optimized for certain CPU models, etc. This stuff can make a big difference in CPU benchmarks, which is why proper benchmarks include things like this in their configuration information.

Or are you measuring small script execution time, and >1s runtimes are meaningless for your needs, in which case Java seems painfully slow and Bash lightning fast?

What the heck is being measured by this “microbenchmark”? Language fanboy gullibility?

For a less awful benchmark, have a look at the Computer Language Benchmarks Game: “What fun! Can you manipulate the multipliers and weights to make your favourite language the best programming language in the Benchmarks Game?” At least they realize how not-terribly-useful synthetic CPU benchmarks of language runtimes are.

Anyway, for most applications, if you’re choosing your language based on runtime performance, you’re choosing very poorly. If you’re choosing your language based on a really awful “microbenchmark” comparable in accuracy to the first toy hack of a room full of 9-year-olds, well…

Impressions of Ruby on Rails from an ex J2EE developer (me)

Jamie Flournoy — Fri, 06 Jul 2007 05:20:15 +0000

A friend who is working primarily in the J2EE technology world (as I was, until early 2006) asked me for a how’s-it-going with respect to Ruby and Rails.

The short version:
– Ruby is fun to program in, as you’ve probably heard
– Rails is over-hyped, but it’s still quite good (definitely not perfect)
– I like the productivity of Ruby on Rails but I wouldn’t call it a silver bullet by any means
– Ruby performance was bad and is getting less bad, and can even be good if you do what the experts say
– The real gem (har har) in the Ruby and Rails space is the community itself

Now for the longer version:

It’s hard for me to be objective since I only started programming in Ruby in February and started using Rails in April, but the language and framework don’t strike me as tedious or kludgy at all. I’m definitely enjoying Ruby and making use of its fancier language features, all of which seem to actually work as advertised without any hideous gotchas when you actually try to do something slick.

I particularly enjoy the way Ruby’s collections allow you to write in the style of a functional language; have a look at the Array, Enumerable, and Hash classes to see what I mean. It’s possible to write extremely compact code that morphs a collection into something very different, but without the inscrutability of Perl.

Compared to my experience with mainstream Java and J2EE, I certainly don’t miss Tomcat, Ant, Struts, or JSP Expression Language. (Instead I’m using Mongrel, Rake, Rails, and ERB.) That’s not Java’s fault, and could be alleviated by just picking better tools and frameworks at the start of a project. I don’t miss Hibernate but I don’t have any gripes about it either.

There are features in Java 1.5 that are helpful in alleviating some of the annoyances of programming in Java, but because these are new features, the vast majority of library code in the wild doesn’t use them (similar to the problems with OO being bolted onto Perl and PHP). Also there are political and business deployment roadblocks to using the latest technology: just because you can download an official release of something and it works perfectly for your code, that doesn’t mean that you necessarily have permission to deploy that technology stack onto production servers. Big organizations take years to officially support new releases, so you may be stuck developing against 3 year old products and APIs.

I’m not trying to troll for Java vs Ruby comment flames here. My point is just that Ruby has had features for a relatively long time that Java is just recently adopting (or simulating), and that’s important for more than just petty bragging rights; a feature that’s unavailable to you for your own org’s political or financial reasons might as well not be there at all. (Or maybe it’s worse, if it makes you bitter about your own work environment: “I have to write all this ugly stupid code because Bob is afraid to upgrade the server… grr.”)

I can remember quite a few times when I painted myself into a corner in Java and had to write lots of type casting / checking / converting code, or had to try and work around single inheritance using composition and lots of boilerplate proxy methods, or really wished java.lang.String weren’t final so I could subclass it, etc. Ruby feels a lot more like Perl or JavaScript than Java when it comes to being able to do whatever you want when you’re sure you know what you’re doing, and being able to write weird fancy code that magically makes everything that calls it become dead simple.

I used to be horrified at the idea of messing with somebody else’s classes (as is done by Hash.slice from will_paginate, which I swiped all by itself since I don’t really need will_paginate), but I’m finding that the apocalypse I feared just isn’t there. If you do something wacky, like add a new method to String, the world doesn’t collapse. Or maybe it does, and you decide not to do it that particular way; just do something else. ActiveSupport adds all kinds of stuff to base classes like String, and it’s OK. Example: ActiveSupport::CoreExtensions::String::Inflections.

I’m still adjusting to using a dynamic language: not uber-checking arguments and state all the way down a method call stack (oh no! is it nil?), not locking down everything so it can only work the way I planned, etc. In most cases, the Ruby equivalent of a NullPointerException is fine. I think this is because the built-in error message includes the type and method you were trying to call, so you can usually just read that and see what you did wrong. The only time you’d want to prevent that is if you bury that error deep in your own code, so that the caller would have no idea of what they did that led to it. I’m finding that I’m OK with writing less paranoid code, because the failure will still happen, and be suitably clear in its cause, without my explicitly checking for it.

An issue with Ruby and Rails is that the official documentation is limited. It’s imperative to have 2 or 3 books on hand at all times, in addition to the RDoc Rails documentation. I own three of the Pragmatic Programmer books: Programming Ruby (the “pickaxe book”), Agile Web Development with Rails, 2nd Edition, and Rails Recipes. Just looking at generated API documentation won’t tell you how you’re supposed to use libraries.

The reverse way of looking at that need for documentation is symbol lookup. Rails does some pre-including of library and framework files before loading the application file you’re working on, so unlike Java, you don’t have a mandatory list of imports at the top that tell you where every class you’re working with came from. There are some symbols and functions you can call from instances of particular classes, and they came from somewhere but it’s not obvious where, at least when you’re starting. A good IDE with “find declaration” could fix this, but initially it’s daunting: “what are *all* the options for this method?” “Where is this method defined, even?” etc. This is where the books are less useful and API documentation can help.

I’ve also subscribed to a couple dozen RSS feeds that help me keep up with what the really slick Ruby and Rails people are using, writing, struggling with, etc. Chief among them is probably Err the Blog. The Ruby on Rails podcast is helpful also.

As for productivity, here are 4 specific time-savers I’ve found:

1) TextMate, which is very slick. There are Windows equivalents, though from what I’ve heard, they’re not 100% as good. Point is, a text editor that understands Ruby is critical, and if it helps you jump around in a big tree of files, that’s very helpful.

I’m an Emacs devotee, and TextMate is far better for Ruby and Rails. It’s hard to explain, but let’s just say that TextMate is much deeper and more featureful than it appears at first glance.

I looked at NetBeans which is pretty close to being a very smart Ruby & Rails IDE, but it was still suffering an identity crisis (am I a Java IDE? or a multi-language thing?) that cluttered the UI badly. This is a hot area of innovation right now so any week now TextMate might be eclipsed (or Eclipsed, har har) by something more full-featured. There are a few other options but I didn’t see anything that led me to believe a multi-hour or multi-day eval would be worth it.

2) Locomotive. It’s like MAMP or WAMP or any of those all-in-one application stack installers, except it gives you a working Rails environment. I set up PostgreSQL separately thru MacPorts (a cousin of the FreeBSD ports project), which was somewhat tedious but decently documented.

Getting all the Ruby and Rails stuff compiled from source is no picnic, and frankly unnecessary. On Ubuntu there’s a gotcha: the Ubuntu people and the RubyGems people can’t agree on a file layout, so Ubuntu’s rubygems package sucks. It’s better to install Ruby using apt-get and then install gems manually from the tarball.

3) Rails Plugins. There’s a lot of stuff being built on top of Rails and then improved day by day, and svn-over-WebDAV is the popular sync protocol. There’s a tool called Piston that helps you manage this. Anyway there’s a lot of stuff that is hard to do and commonly needed which is available in plugin form.

4) ZenTest, and more specifically, Autotest. It watches your source tree and runs only the tests in your test suite that need to be run based on what has changed; it’s the dynamic-language version of the incremental compilation that’s present in any good Java IDE.

Ad Hoc Software Planning with Graphviz

Jamie Flournoy — Mon, 19 Mar 2007 07:07:17 +0000

I’ve been playing around with Graphviz this weekend. I first used it a few years ago with a Perl script that sorta kinda knew Cold Fusion and JavaScript syntax and could output the Graphviz .dot file format, as a means of visualizing all the dependencies between source code files in a project with no compilation phase and no automated tests. It helped me answer a few questions that I had about the code: What should I write tests for first? What should I leave alone, because breaking it breaks a bunch of pages? What pages do I need to test in order to make sure that changes to a deeply-buried chunk of included code didn’t break anything? Having a simple tool that draws graphs of nodes in a fairly clean form can be really useful.

This time, I’m trying to visualize the dependencies among not-yet-written features of my current software project. I have an ERD, so I can use foreign key relationships to determine hard dependencies between tables, but I want to understand functionality in terms of larger functional chunks. Call ’em modules, components, or whatever; I want to know what needs to be built first, and how much of it I really have to build before I can build stuff on top of it and get a minimal system assembled. Ideally I’d like to be able to put together a core set of functionality and then launch, leaving the prioritization of what to do next until later. At that point I should have user feedback, some experience with the low level details of the app, some issues in an issue tracker etc. to guide my decisions.

There’s a Mac port of Graphviz that’s particularly nifty because it watches the input .dot file and updates the already-displayed drawing immediately when that file changes. My first thought was to use the subgraph clustering feature of “dot” to group tables from the ERD into modules, but that didn’t work because dot doesn’t do cluster-to-cluster edges. Instead it connects nodes, meaning tables, which resulted in a really amazingly ugly graph without clustering, and in an endless “Rendering…” message with clustering. I think the problem I gave it to solve (how to lay out the graph while still grouping all of those clusters of nodes in little rectangular boxes) was too complicated, and I don’t feel like gradually building up to that stage to see what the problem was or waiting hours to find out how long it’d actually take. So, I’ll just manually define functional clumps, and map out dependencies, and then try and define for each of them the minimum needed functionality to support the stuff that depends on it.

It would sort of be cool to have an uber-graphing database modeler / class browser / IDE / documentation tool, but so far every product I’ve seen that tries to do that is an unspeakable usability horror that makes you do all sorts of obtrusive stuff to your code and which runs slowly and has lots of bugs and is astonishingly expensive if you want the grown-up version. It’s way past the point of diminishing returns to try and find a tool that will model the database and classes and modules and requirements and keep it all in sync. Better to just get 80% of the way there by hand, make a decision, and then abandon the diagram and move on.

I’m also trying to get to a point where I can create a fairly good effort estimate for each feature in isolation, so that I can start playing the “what has to get cut to meet a given launch date” game. That’s kind of like The Planning Game, except that I have a “big picture” in mind and I’m just trying to take chunks of it and fit them into a short time frame. The Planning Game from XP seems to be more about distrusting the claim of an accurate understanding of the big picture, and just adding functionality that makes sense with each successive iteration.

I disagree with that point of view, but I think I have enough big picture detail figured out that I can avoid painful and tedious rework later. I know what I think the full implementation of each part will look like, so I think I can avoid doing things that will cause problems later.

The next question is whether I should try and create a “big estimate” based on what I think I want to build, given the “big picture” design I have in mind. I think that’s moot at this time; what I need is a definition of a minimal set of useful functionality, and an estimate for that. The estimate for later features can come later. I suppose that with a huge amount of money involved and lots of anxious investors, I’d need to spend more time on accurate estimation and project planning and less time on actual implementation, but that probably also would depend on investor wisdom. In other words, a reasonable idea of the big picture should be enough for smart developers, and a steady succession of new releases that add functionality should be enough for smart investors. Only if the financial bet is a really risky one would anyone need a serious estimate of where things will be in 2 years, and really, that’s almost impossible to predict anyway, and the prediction process itself may decrease the chances of success.

So, it’s time to carve out a chunk of work (using Graphviz to help me carve intelligently), make sure it fits a reasonably short time frame, and build it. Exciting!

Ruby First Impressions: Backup Scripting

Jamie Flournoy — Sun, 04 Mar 2007 06:09:06 +0000

I started programming in Ruby this week, and so far I like it a lot. From my initial use of Ruby as a backup automation scripting language, here are my thoughts.

You might be wondering, why am I working on backup scripting now? Don’t I have some big project I’m supposed to be working on 24/7? Yes, and actually this work is in the critical path of that project.

My super fast laptop is still away being repaired for a video problem, so I’ve taken a major hit in terms of the resources of my main computer: 90% less MHz, 36% less display area, 50% less memory. In the meantime, I’ve been avoiding tasks that need a lot of CPU or graphics performance and instead working on things that are easier on my old desktop computer.

This week, I decided that I would pause working on the design and implementation of my startup project, until I had really sorted out my server backup and monitoring situation.

I have a few servers at home and hosted in a data center, and there are nightly backups in place, but this is done using a hodgepodge of different programs. I’ve got some Bash scripts that invoke rsync every night, some Perl scripts that use cp -rp every night and once a week for different sets of data, and a backup program I found recently that is clever but not quite what I wanted, called rsnapshot. I had previously evaluated faubackup, but right after I started using it, I experienced filesystem corruption on the backup drive, which was pretty scary. (Also, faubackup doesn’t do remote backups, so I would have used rsync anyway.) Both rsnapshot and faubackup are designed to use hard links to save space, so in theory they’re pretty similar. Finally, my Macs use Carbon Copy Cloner, which is a GUI for psync that adds some extra steps to make the target volume bootable. Windows virtual machines in VMWare are handled as one giant VM disk file, which is wasteful but better than nothing.

Ideally, I’d like to duplicate much of what rsnapshot does, which really means using rsync + cp -al to do the heavy lifting, and using a high level wrapper script to manage scheduling and the backup archive. Also, I want something that I haven’t seen done well yet, which is detailed status notification: how much is being backed up, how much is changing daily, how much is being ignored on each remote system by the backup process, etc. And I want a daily summary email telling me that things are being backed up successfully.

So, I’m writing a bunch of Ruby classes, which represent data output from du, df, and find, and which do the text processing and math to provide totals and percentages in a convenient form for high level code. That high level code will interpret configuration data and remote status information, initiate backups, manage the archived backup data, and send summary emails.

On to Ruby impressions. As far as documentation, O’Reilly’s Ruby in a Nutshell is unacceptable because of poor proofreading; the examples don’t work, and the explanations are obviously wrong. I got Programming Ruby a.k.a. the Pickaxe book yesterday at the wonderful Stacey’s Bookstore, and I like it much better. I’ve also found a whole lot of good Ruby sample code online, not limited to just Rails examples as I had expected. It seems like there are a whole lot of people learning and falling in love with this language, and blogging and creating helpful web sites about it.

Regarding the language itself, it feels like Perl, except object oriented for real, and legible. I think I’m probably pretty typical of new Ruby programmers in that I love the way blocks, iterators, and specifically collect and inject work. Things like figuring out the total amount of free disk space on a remote server, not counting a predefined set of filesystem types (procfs, tmpfs, usbfs, etc.), take a half dozen lines of (admittedly fancy, but readable) code. I shudder to imagine how much Java code this would require.

Speaking of which, I’ve been a Java programmer (overlapping with shorter periods of Cold Fusion and PHP work) for 10 years, and a Perl scripter for 11. One thing I loved about Java was the QA first pass you get from static types and compilation: dumb mistakes are caught immediately, before you even try to run the code. Perl always felt expressive but risky, and Ruby feels even more expressive, but still risky in the same way.

The Extreme Programming folks figured out years ago that the only way to write working code in a modern dynamic scripting language (defined for the purposes of this sentence as one that uses dynamic types and is not compiled) is to write a ton of automated tests. Basically you get to omit the code that does casting and type declarations and local variable initialization from your code, but you are forced to write a bunch of explicit test code in return. This would seem like a pointless trade-off, except for the fact that you have to write all that test code for statically typed, compiled languages anyway. So really the question is whether you got to working, debugged code faster with compilation and static type checking, or without it. Although I do like the super fast feedback loop of incremental compilation that you get from IntelliJ or Eclipse with Java, I’m pretty sure the answer is that “without it” is still faster. It just feels really scary, until you’ve written tests.

To eliminate the scary feeling, I started using Test::Unit. It’s is pretty nice, but it is kind of disconcerting that when you define a TestCase subclass and require it in your test script, it seems to just run the tests on its own without being explicitly told to do so. Aside from that potentially irrelevant oddity, it’s working well for me so far.

A related issue is code coverage. Code coverage tools measure which lines of source code are executed during one or more runs of your program, and typically they also generate reports with nice features like bar graphs and highlighted source code displays that show you red areas for code you didn’t execute and green for code you did. That information tells you (with reasonable, but not perfect accuracy) how complete your automated test suite is.

So, as you code, you put TODO comments all through it for things to be written, and you write tests for code that you just wrote to make sure it works, and then you run the test suite with a coverage tool enabled and look at the coverage report to see what you forgot to test. Then when you’re super close to, or at 100% coverage, you attack those TODOs.

In Java, I used JUnit and EMMA as my unit test framework and coverage tool. Those were installed by hand, by me googling for them and then downloading the .zip files and expanding them by hand. I then had to write a bunch of custom Ant build.xml code to enable EMMA’s coverage instrumentation to be added to the compiled Java code. Then I had to write some more Ant build.xml code to generate the coverage report, and I had to do a whole lot of super tricky Ant nonsense to separate the build sequences that led to coverage reports from the ones that led to deployable production code, since the actual compiled code was different. That effort, in total, probably took me somewhere between 1 and 2 days, including all the wrong turns and reading and futzing I had to do to get it to work elegantly, which I eventually did.

Part of the reason for the hassle is that Java is not open source, nor is a usable open source implementation available. The open source community doesn’t really fully embrace Java by packaging it up nicely and helping users to distribute it, because they cannot legally do so.

Sun’s death grip on Java means extra work for Java developers. Getting a hold of all the code and getting it to work together is tedious and requires that you click through license agreements on Sun’s web site; no Linux distribution includes all of this stuff, and Maven, which is cool but (the last time I looked) very poorly documented, is the only thing that seems to even try and pull all of these things together, but it requires that you manually install it on top of Java which itself must be manually installed. Setting up Maven, and configuring it to manage the dependencies of each of your Java projects is no picnic, either. Perl has CPAN and Ruby has Gems and most Linux distributions have great open source package management systems now (and even Mac OS X has MacPorts), but Java has bupkis. You want to assemble an application, you either go out on a limb and try to get your organization to adopt Maven, or you do it all by hand, with lots of trial and error.

So, bear that in mind when I tell you this: for Ruby on Ubuntu Linux, here’s what was required, after Googling for “ruby coverage” and seeing that there was a thing called rcov:

sudo apt-get install rcov
rcov test.tb

Seriously.

I didn’t even look and see if ‘rcov’ was the package name Ubuntu had adopted. I just took a guess, and it worked, and then I read that one web page about usage to find that it’s just “rcov [your Ruby script here]”. Rcov runs the program you give it as a command line argument, and automatically creates a coverage report, quite quickly. It might have taken me one whole minute from “what’s out there” to “hey nice coverage report”.

That is an example of my experience with Ruby thus far. It’s just too easy to be true; I keep waiting for the gigantic gotcha, like performance (fast so far, but so far my code is trivial) or some hideous feature that’s missing.

In the meantime (until I find something not to like), Ruby seems really excellent, and I might not be able to go back to happily programming in Perl or Java or PHP again if this keeps up. :)