I’ve been working with Ruby on Rails intensively for several months, and I’ve finally found a place where Rails can’t readily be extended to do what I want. It’s ActiveRecord, which is probably the most controversial part of Rails.
I’m reminded of a James Gosling quote disparaging Microsoft tools, particularly Visual Basic: “The easy stuff is easy, but the hard stuff is impossible.” There’s a parallel between VB and Rails in this instance, in that if you only let yourself use the high level tools, the hard stuff is impossible, but the designers specifically tell you to do the hard stuff using a lower level toolset. The controversy that surrounds “X can’t do everything, therefore it sucks” should really be focusing on the feasibility of going through that trapdoor to do things “the hard way”. This is what Delphi did, which is why so many folks chose it over VB; it made the hard stuff easier.
Here’s the task I need to accomplish, for which ActiveRecord is not well suited: complex queries involving SQL functions and multiple-table joins. I want to join a few tables together, order by a SQL function, include with each result row the result of a SQL function that operates on each row, and have all that come back as a graph of high-level objects.
Despite my attempts to use plugins, extend and/or fix bugs in those plugins, and to dig through the ActiveRecord source to figure out what the documentation won’t tell me, I was unable to get it to work. Most of the parts of what I wanted was possible: acts_as_tsearch cleverly weaves SQL functions into a high-level ActiveRecord::Base.find calls; paginating_find provides a very convenient pagination API on top of ActiveRecord::Base.find, and ActiveRecord includes some clever association tricks such as automatic many-to-many relationships (has_and_belongs_to_many), eager loading of associated records using a join (via the :include option to ActiveRecord::Base.find), and a fairly low-level :joins option that lets you add tables to a ‘find’ query which can be used in your :conditions. Problem is, they don’t all work together in a fancy way.
Really, the issue in this case is related to the design choices that went into ActiveRecord.
Some ORMs (object-relational mappers) are designed in a modular fashion: there is a part that helps you describe the relationships between your model objects, a part that helps you construct queries, and a part that does the storage and retrieval. Sometimes there’s another part that uses your description of object relationships to create an empty database with the appropriate data model, or that looks at an existing database and creates an object model that matches it. Sometimes there’s an import/export tool for bulk data loading or dumping as well.
ActiveRecord has the first three functions integrated (which has benefits and drawbacks compared to a more modular approach), has a very isolated schema manipulation module, and has a somewhat isolated data loader tool.
The relationships are explicitly declared in source code using associations: has_one, has_many, belongs_to, and has_and_belongs_to_many. These are pretty fancy and provide some convenience features that make the associations appear as object collections, such that changing the collection and saving it turns into insert/delete/update activity in the database.
Query construction is basically tied to the objects themselves, in a way that greatly simplifies star-join queries, but which handles only the simplest joins across multiple tables, and is barely able to handle self-referential joins at all. So, you can easily load an object (or group of similar objects) and associated objects, but OLAP-style queries (“what are the top 5 states where customers are located who have bought classical CDs within 2 weeks of their release using American Express and had them shipped as gifts via UPS 3-day Select?”) are impossible. Oddly, views, functions, and stored procedures could bridge the gap between real-world data models and ActiveRecord’s limited set of association types, but they are not supported either.
The storage and retrieval code is inseparable from the query code, and so it is not possible to examine and modify the final SQL before it is executed, nor is it possible to provide an arbitrary query and have the results be parsed into an object graph based on the associations you have defined. The code that would allow these features appears to exist and be sufficiently well designed to allow this with a fairly small amount of changes to ActiveRecord. However, it is currently (as of Rails 1.2.3, which is the current release) not part of the documented API and is declared private.
There is a limited facility for constructing simple objects from arbitrary SQL, in find_by_sql. This loses essentially all of the high level functionality of the find method; most notably, it isn’t possible to use find_by_sql results to instantiate an object graph, rather than a flat array of objects (similar to the eager loading feature in the regular find method).
ActiveRecord has fairly good high-level schema creation functionality (“migrations”). Though it lacks concepts for all but the basic database objects, support can be added for foreign key constraints (I kid you not, they aren’t supported by Rails itself!) and views. There’s also a simple way to execute arbitrary SQL. Migrations aren’t technically that amazing, but rather they’re a helpful organizational approach to what can be a really hairy problem: defining a schema and then applying changes to live databases while keeping track of what changes you’ve already applied.
Finally, there is a test data loading facility called Fixtures. The common opinion of Fixtures seems to be that they are broken by design and should be avoided. The main issue I’ve found with them is that the implementation ignores the kind of database design elements that any book on SQL would recommend, such as foreign keys and check constraints. I managed to circumvent this with a combination of a plugin and some customization, described in detail in my previous post, Rails, Fixtures, the Test DB, and Test::Unit. With those changes, all test fixture data is preloaded in the right order (so constraints aren’t violated) before any tests run, and any data alterations within tests are rolled back automatically by Rails.
A secondary issue with Fixtures is that they go directly from YAML text files to SQL INSERT statements, bypassing the ActiveRecord Model classes. ActiveRecord does pretty much rule out any fancy mapping between database tables and objects, so that’s not a problem, but this model-skipping fixture loading implementation means that any code in your model object (validations, before_save filters, etc.) will not be executed when loading fixtures. So fixtures do not work well with the otherwise pervasive Rails design rule of “put all the intelligence in the application”.
Still, despite the commonly-held disdain for using fixtures at all, I find that they can be tamed. In fact I’ve even created a base data facility for loading the fundamental data set that needs to be in the live database (e.g. initial admin user info). My approach is basically to alter fixture behavior to treat it as essentially a bulk data loading tool, and to do the extra housekeeping after loading to make up for the fact that the ActiveRecord model code was bypassed.
As far as I know, there is no bulk data dumping functionality in Rails.
So, to summarize, of the five main ORM features, here’s how ActiveRecord stacks up:
- Describing Relationships: Easy to understand and use, with lots of slick functionality
- Querying: Easy to understand and use, but limited to simple join structures, and not possible to customize query building or rewrite SQL before execution
- Storage and Retrieval: Very easy to use, but only within the limits of the query builder’s features
- Schema manipulation: Easy to understand and use; limited in functionality but readily extensible; solid third party plugins are available for missing schema objects
- Bulk Loading and Dumping: Loading is badly designed and implemented, but fixable with some effort; dumping is not offered
Okay, so it definitely makes the easy stuff easy. But what about the rest?
As I observed before, ActiveRecord is not designed as a set of modules that you use to assemble a solution that fits your needs. That’s more of the Java approach to design, and it trades flexibility for convenience. It can be a major pain to assemble a working system out of all of those abstract Java APIs, which are sometimes so comically over-patternized as to draw mockery such as the hilarious “Are Javalanders Happy?” code snippet from Execution in the Kingdom of Nouns. Rails makes the opposite trade-off: sacrifice flexibility and gain a very approachable API.
Unfortunately, the Java approach (too abstract to readily use, but extremely flexible) is easily wrapped with a simpler, more convenient, less customizable API. The Rails approach isn’t internally componentized (have a look at ActiveRecord’s activerecord/base.rb source file in its 2,165-line glory, almost all of which is one class), so if you want to fiddle with its internal behavior, you can’t. So with Rails, it’s all or nothing: high level slickness for simple requirements, or hand-written SQL and hand-coded results mapping for your complex requirements.
As I said at the beginning, though, the key question is not how comprehensive the high level feature set is. More important is the question of how painful things are when you drop down to a lower level for a greater degree of control.
It would be nice if there were a middle level of complexity, between the high-level ‘find’ method and ‘has_xxx’ associations, and raw SQL. There isn’t. I think that the reason there isn’t one is that there is still a persistent belief among many Rails core team members and community members that databases should be stupid: just a persistent hash. Once upon a time I worked that way myself: I didn’t have access to or skill with a SQL RDBMS, and so I solved all of my persistence problems with DBM files, which (using Perl’s Tie::Hash class) are conceptually just persistent hashtables. miniSQL was little more than a SQL query parser on top of that sort of storage engine, and MySQL originally was pretty similar. But big databases have all sorts of useful features that address complicated persistence requirements in a fairly elegant way.
Given that Ruby fans like the idea of domain specific languages, which let you work in a super high level language customized to the problem domain, it’s surprising that Rails groupthink is that SQL is bad. It’s actually a very high level language, and allows a well written database to do some pretty amazing optimization on the fly because it provides a strong layer of abstraction between what you requested and how the storage engine provides it.
No, it’s not dynamic, nor is it pure relational perfection, but it’s pretty darn good. Pre- and post-event validations and arbitrary callbacks to user-specified code, functions providing behavior on top of data… these are all things that Ruby and Rails fans hold in high regard when provided by Ruby and Rails, but which are considered a bad idea at the database layer. As I discussed at length in Rails and the notion of Stupid Databases Being a Good Idea, this is a philosophy rooted in DRY, but it has some major flaws.
Mainly, there is the issue that some things must be done in the data tier, and trying to put them in the application tier doesn’t work. The best example that comes to mind is full text search. Satisfying queries is the database’s job, period. It’s just hideously slow to try and do an inner join in the application across a network link to a database. If you find yourself doing this, that’s a pretty good sign that your architecture is broken. But some queries are too complicated for ActiveRecord, so sometimes you must choose between a series of high level queries whose results are intersected in application code (easy to understand, but extremely inefficient), or hand coded SQL.
Well, SQL is fast and is a high level domain-specific language, so it isn’t actually a bad tool for the job. The problem is that this approach (the trapdoor to the lower level API) is regarded differently by different people. Some see it as a common and reasonable approach to complex requirements; others see it as a bad evil scary thing that should be avoided at all costs, a kludge and a design mistake.
As a result, the low level option in Rails is anemic. It’s there, but you’re not supposed to use it. Ruby’s ActiveRecord Makes Dropping to Raw SQL a Royal Pain (Probably on Purpose) notes that there are no bind variables allowed in ActiveRecord. You may be saying, “No, wait a minute, I’ve used them, that can’t be right.” That’s what I thought. Look at the source; the bind variable functionality is actually a high level feature built on top of drivers that don’t have that feature. Whatever you did at the high level, it’s going to the driver as a single string. Okay, it’s nice that they added that feature, especially since it provides a single point of testing and verification for safe escaping. But that functionality (in sanitize_sql) is not part of the public API. Fortunately that same article provides a workaround that makes sanitize_sql accessible, so you can use bind variables in your hand coded SQL code, and pretend that the driver supports them. But that’s not likely to work forever.
The key problem with ActiveRecord is its least common denominator feature set, based around the least featureful of all popular SQL databases: MySQL. Years ago, MySQL AB (the vendor of the MySQL database) took a strong philosophical stand against pretty much any advanced database features (which their product lacked, and which competing products had), but lately they’ve softened and added those features that they claimed nobody really needed. In the meantime, Rails has been designed with minimal expectations for database sophistication; therefore, the limited functionality of ActiveRecord is fairly complete, assuming you’re using a database with similarly limited functionality.
Triggers, stored procedures, functions, data integrity constraints, nested transactions, and views are all examples of unsupported database functionality. Try and use them via ActiveRecord’s high level API, and you will quickly see how fragile and inflexible ActiveRecord really is. If you shouldn’t need those features in your database, then you shouldn’t need anything that ActiveRecord doesn’t already provide, so it shouldn’t matter that you can’t extend ActiveRecord.
Truly, these are features that you need only in a few small cases in your application, so looking at individual queries they’re needed rarely (which is not the same thing as “never”). But looking at whether you need one or more of them in a given application, they’re needed more often than not. The pain of using hand coded SQL makes this worse: some tricky things could be done either using a view or stored procedure, or using a really slick dynamic SQL statement. Making all of those options painful means that even a clever developer can’t use anything in their bag of tricks to craft an elegant solution.
Unfortunately, non-trivial web applications need things like full text search, complex associations between persistent objects, non-trival summary information about associated objects, and complex reports, and ActiveRecord fails at all of these. These are not just things that big dumb ancient companies that like using Object COBOL think they need; Amazon and eBay need them too.
The acts_as_tsearch plugin is a good case study of ActiveRecord’s design flaws. TSearch2 is the standard PostgreSQL full text search engine, and it’s pretty good in my opinion. It’s also pretty straightforward to use. Unfortunately for developers using Rails, TSearch2 uses SQL functions (mainly to_tsquery and rank_cd). The acts_as_tsearch plugin tries to inject SQL into ActiveRecord’s queries via the high-level find interface, but ultimately fails as soon as you use the :joins or :include options. The problem is that ActiveRecord has a very simplistic idea of how queries and joins work, and so if you need to inject SQL functions to get the job done (as is necessary in TSearch2 queries), too bad. (See also issues 7 and 8 in acts_as_tsearch, in which I describe and attempt to clean up the mess that results when you use find_by_tsearch in non-trivial ways.)
A fellow Rails developer asked me in all seriousness why I wasn’t abandoning the full text search functionality of TSearch2 and just using a completely separate, redundant database product designed exclusively for full text search. Seriously, that is considered the “easy” approach: one database for full text search, and another for ACID/OLTP/CRUD. Honestly if I were going to go down that road I would try hard to just abandon the SQL RDMBS and put everything in the other database, since Lucene and its imitators are capable of far more than just find-text-in-document queries. The pain of duplicating everything, using two query languages, two document representations (in addition to the object representation in Ruby) and writing application-tier query correlation makes the double-DB approach seem very unwise.
It makes far more sense to me to use the SQL RDMBS’s full text search facility, even if there’s a 2x or 3x read performance penalty, because the conceptual simplicity of having one powerful storage tier (instead of two halves cobbled together) eliminates a ton of ugliness in the application, and the SQL RDBMS is going to get clustered for reads anyway. Nevertheless, even if I’m wrong about this case (putting search in the SQL RDBMS instead of in a separate server), there are other cases for needing a smart database that gives you exactly the results you need and lets you push data logic into the data tier.
So, what do I suggest? Abandon Rails? Nope. I still like Ruby a lot, and find Rails very useful. I just think that ActiveRecord needs to support the low-level and middle-level abstractions better.
Specifically, supporting bind variables (either by exposing that sanitize_sql function, or better yet by making drivers and connection adapters support bind variables for real) would make the find_by_sql, select_all, and exec approaches to low-level SQL query execution less painful.
More difficult, and substantially more valuable, would be refactoring ActiveRecord::Base to split it up in the way I described above: association descriptions and unmarshalling code separate from query building code separate from SQL execution and result retrieval code. All of this could remain hidden for most users under the same old slick high-level API, but for advanced requirements, the ability to fiddle with the SQL and still use the built in high-level unmarshalling code to create object graphs from flat result sets would be very powerful, and useful.
I looked at one alternative to ActiveRecord, called Sequel, which overlaps with ActiveRecord only partially. It is a query builder and lazy result proxy, which is actually what I thought ActiveRecord would do when I first started working with Rails. The proxy design means that you can either keep adding constraints or start fetching results, from the same Dataset class. This seems like a pretty good approach, though I haven’t really looked closely to make sure it would fit what ActiveRecord needs.
What Sequel lacks, though, is the unmarshalling side: turning a 2-dimensional (rows of columns) result set into a complex object graph (customers with orders with order lines with products from suppliers stored in warehouses), with user-controlled eager or lazy loading behavior. Ruby is well-suited to a design that would allow user-specified code (i.e., a block) to decompose each row into the object graph associated with that row, leaving the remaining associations on those objects to be lazily provided via future queries.
So, I think there is hope for ActiveRecord, definitely. I considered the idea of rolling a minimal Hibernate clone, or some other sort of challenger to ActiveRecord, but I don’t that ActiveRecord is broken beyond repair. I think the shortest path to a badass Ruby ORM is through improvements (refactoring and abstraction) to ActiveRecord.
So, if you’ve read this far, you probably care about these issues. Here’s my call to action: Please help me make ActiveRecord less like VB and more like Delphi. Who else is interested in helping me with this effort? Are there alternatives that I’ve missed, or components that could be integrated into ActiveRecord to make it better?