Rails Migration Antipatterns and How To Fix Them

Migrations are one of the best features of Rails. Although some folks prefer pure SQL to the Rails migration DSL, I don’t know of anyone who dislikes the idea of a versioned schema that can evolve in a controlled and repeatable fashion.

But because the concept of database migrations is such a powerful one, it’s tempting to jam any old change that affects the database into a new migration and run rake db:migrate to make it happen. I’ve been guilty of a bit of this in the past, and I’ve joined some projects that did other ugly things in migrations. In the process I’ve learned the hard way that there are some things you must never do in a migration or they will come back to haunt you later. Here they are.

Antipattern: Require the Database to Exist Already

In other words, the antipattern is for the first migration to depend on some tables and maybe even some data already being in the database.

I know that the original Rails blog video shows DHH using a MySQL admin tool to create the blog database interactively, but really you should be using migrations to create the schema programmatically from scratch.

If you’re already working on a project that didn’t do that, you can run rake db:schema:dump and look at db/schema.rb; it contains code that you can insert into a new migration to create the same schema in your development environment. If you’re using DB features that the design philosophy of ActiveRecord doesn’t agree with, such as triggers, and the schema.rb dump doesn’t include them (or if you just think the migration DSL is ugly and you like SQL DDL better), you can do a mysqldump / pg_dump / whateverdump and wrap a migration around the loading of that SQL file.
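Here is a rough sketch of wrapping a migration around a SQL dump. The file name starting_schema.sql, the SqlDump helper, and the LoadStartingSchema class are all made up for illustration. The migration is written in the self.up / self.down style this article assumes; newer Rails versions want a versioned superclass such as ActiveRecord::Migration[7.0] and instance-level up/down methods. Whether a single execute call accepts a multi-statement dump depends on your database adapter, so treat this as a starting point rather than a recipe.

```ruby
# Hypothetical helper: reads a SQL dump and hands it to anything that
# responds to #execute (in Rails, the ActiveRecord connection).
module SqlDump
  def self.load_into(connection, path)
    connection.execute(File.read(path))
  end
end

# The guard only exists so the helper above can be loaded outside Rails.
if defined?(ActiveRecord::Migration)
  class LoadStartingSchema < ActiveRecord::Migration
    # Assumed location: db/starting_schema.sql, one directory above db/migrate.
    DUMP = File.expand_path("../../starting_schema.sql", __FILE__)

    def self.up
      SqlDump.load_into(ActiveRecord::Base.connection, DUMP)
    end

    def self.down
      # There is nothing sensible to roll back to before the first schema.
      raise ActiveRecord::IrreversibleMigration
    end
  end
end
```

Keeping the file reading in a small helper means you can check it against a fake connection before pointing it at a real database.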

If you have a hybrid (you have to start with an old db dump and then migrate it so it becomes current), that’s gross, and you have a couple of options which are both pretty ugly. But they will work, and when you’re done the ugliness will be gone.

You could fight your way back to the oldest schema version by debugging the self.down methods and running rake db:rollback repeatedly until you can create a 00001_starting_db_schema.rb migration. Or you could just blow away all the migrations and reuse the highest schema version for a new migration that contains the output of a current rake db:schema:dump. Which one you choose depends on how many copies of the database are out there with old schemas that would need to be brought up to date.

Clearing out db/migrate and replacing it all with a single migration is cleaner, but if your production database is 5 migrations out of date you obviously can’t do that. You can still collapse everything production has already run into one big-bang migration (as the oldest) and keep the 5 pending schema changes after it. If you do it right, you can just deploy the new code and run rake db:migrate and everything will be fine. If not, well, you were testing it on a backup of the production database, right? :)
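To decide which migrations can be folded into the big-bang migration and which have to stay, something like the following could help. It is only a sketch: plan_collapse is a made-up helper, and it assumes that production has run every migration up to its current version with no gaps.

```ruby
# Splits migration files into those production has already run (safe to fold
# into one baseline migration) and those still pending there (kept untouched).
# The baseline reuses the highest version production already has, so
# production will skip it while an empty database will run it first.
def plan_collapse(migration_files, production_version)
  versioned = migration_files.map do |path|
    [File.basename(path)[/\A\d+/].to_i, path] # leading digits are the version
  end.sort

  applied, pending = versioned.partition do |version, _path|
    version <= production_version
  end

  {
    baseline_version: applied.empty? ? nil : applied.last.first,
    fold_into_baseline: applied.map(&:last),
    keep_as_is: pending.map(&:last)
  }
end
```

You might call it as plan_collapse(Dir["db/migrate/*.rb"], 20) where 20 stands for whatever version your production database reports.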

Antipattern: Only Work Correctly With the Production Data

What’s wrong with developers just making dumps of the production database and loading them locally?

First of all, it means that all schema changes have to start at the production database and work backwards to developers’ sandboxed development environments. Hopefully this strikes you as a very stupid workflow.

Secondly, maybe your users don’t all want to get a message that says “test message foo bar sdfasdfasd bloopity bloop” when you’re testing your new alert system. Should you really be putting their data (passwords, contact info, etc.) at the mercy of your crummy new code?

You should be able to immediately generate an empty, clean database for development. rake db:drop; rake db:create; rake db:migrate should do this; rake db:reset should give the same result, only faster, since it loads db/schema.rb instead of running each migration in sequence.

You should also be able to immediately generate any essential base data such as the initial admin user. The SeedFu plugin does a good job here.

If you need some additional fake data to fiddle around with in your development environment, the Populator gem is handy for mass-inserting a bunch of faux data, especially in conjunction with Faker.

Note that the migrations should neither depend on nor contain actual data. They should just change the data model.

Antipattern: Clean Up That Only Works on Production Data

This is really a subset of the previous item but it’s worth considering as a special case.

If you want to fix some data that got slightly corrupted by some bad code that has been replaced, migrations aren’t a terrible way to accomplish that.

It’s not really what migrations are for, and a one-off rake task can do it just as well, but if you really want to, you can get away with it under one condition: you have to make your cleanup migration code succeed even if the database is empty (such as when a developer has just run rake db:reset; rake db:migrate).
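As an illustration of a cleanup that survives an empty database, here is a sketch with invented details: the users and posts tables, the 'admin' login, and the class and module names are all assumptions. The point is the early return when the row the cleanup depends on isn’t there. The migration uses the self.up / self.down style this article assumes; newer Rails versions want a versioned superclass and instance methods.

```ruby
# Hypothetical cleanup: hand posts that lost their owner to the admin user.
# Works against anything that responds to #select_value and #execute.
module ReassignOrphanedPosts
  def self.run(connection)
    admin_id = connection.select_value("SELECT id FROM users WHERE login = 'admin'")
    # On a freshly created database there is no admin yet, so do nothing.
    return :skipped if admin_id.nil?

    connection.execute(
      "UPDATE posts SET user_id = #{Integer(admin_id)} WHERE user_id IS NULL"
    )
    :updated
  end
end

# The guard only exists so the module above can be loaded outside Rails.
if defined?(ActiveRecord::Migration)
  class ReassignOrphanedPostsToAdmin < ActiveRecord::Migration
    def self.up
      ReassignOrphanedPosts.run(ActiveRecord::Base.connection)
    end

    def self.down
      # Nothing to undo: the original (broken) ownership is not recoverable.
    end
  end
end
```

A version that looked up the admin and immediately used its id would blow up on a developer’s empty database, which is exactly the failure this section warns about.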

Antipattern: Load Data

The SeedFu plugin is good for initial, mandatory data. The Populator and machinist gems are good for synthetic test data. Delete db/fixtures and everything in it. Fixtures are evil.

Wrap a rake task around the “get my development database ready” concept. This task should start with the “get my empty production database ready” task (or some subset of that which is appropriate for developer use).
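One possible shape for those two tasks, as a sketch for something like lib/tasks/dev.rake. The task names db:bootstrap, dev:setup and dev:populate are invented here, and db:seed_fu is assumed to be the seeding task your project has; substitute whatever your project actually uses.

```ruby
require "rake" # harmless inside a Rails app, needed when loaded on its own

namespace :db do
  desc "Prepare an empty database: create it, migrate it, load essential base data"
  task :bootstrap => ["db:create", "db:migrate", "db:seed_fu"]
end

namespace :dev do
  desc "Rebuild the development database from nothing and fill it with fake data"
  task :setup => ["db:drop", "db:bootstrap", "dev:populate"]

  desc "Insert fake data for development (placeholder body)"
  task :populate => :environment do
    # Calls to Populator / Faker would go here.
  end
end
```

Rake runs prerequisites in the order listed, so the developer task reuses the production-style bootstrap and only adds the drop at the start and the fake data at the end.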

If you need to load arbitrary data now and then, write an importer. Do this as a rake task, or a web UI to a bulk data importer feature. Better yet, make a web UI in your admin area which is just a wrapper around the rake task that bulk imports data. Then delegate the bulk importing to your customers so your admins can do real admin work. But don’t load data in a migration.

Antipattern: Use Rails Models in the Migration

Models evolve, but old migrations don’t change (nor should they). When you wrote a migration that used a model, it ran against the model code as it was at the time. A year later the model has evolved, the new validations on first_name and last_name fail because the column used to be full_name, and that old, unchanged migration has stopped working. It depended on something that did change, incompatibly.

For rockstar points, in your continuous integration environment you should run rake db:drop; rake db:create; rake db:migrate to make sure that this can never happen.

But if it has already happened, rip out the model code and replace it with Rails DSL code, with execute statements containing raw SQL code, or (if you feel like a Ruby rockstar) declare new, stripped down model classes inside your migration class that will act as stand-ins for the limited needs of the migration. See Migrating with Models for more on how to do this last trick.
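The stand-in model trick might look roughly like this for the full_name example above. The class names, the FullName helper, and the column types are assumptions made for the sketch, and it uses the self.up / self.down style this article assumes (newer Rails versions want a versioned superclass such as ActiveRecord::Migration[7.0] and instance methods).

```ruby
# Hypothetical helper, kept free of model code so it can be checked alone.
module FullName
  # "Ada Lovelace" becomes ["Ada", "Lovelace"]; one word leaves the last name empty.
  def self.split(full_name)
    first, last = full_name.to_s.strip.split(/\s+/, 2)
    [first.to_s, last.to_s]
  end
end

# The guard only exists so the helper above can be loaded outside Rails.
if defined?(ActiveRecord::Base)
  class SplitUserFullName < ActiveRecord::Migration
    # Stand-in model: it knows its table and nothing else, so validations or
    # callbacks added to app/models/user.rb later can never break this file.
    class User < ActiveRecord::Base
      self.table_name = "users"
    end

    def self.up
      add_column :users, :first_name, :string
      add_column :users, :last_name, :string
      User.reset_column_information # pick up the columns just added

      User.find_each do |user|
        user.first_name, user.last_name = FullName.split(user.full_name)
        user.save!
      end

      remove_column :users, :full_name
    end

    def self.down
      add_column :users, :full_name, :string
      User.reset_column_information

      User.find_each do |user|
        user.full_name = [user.first_name, user.last_name].join(" ").strip
        user.save!
      end

      remove_column :users, :first_name
      remove_column :users, :last_name
    end
  end
end
```

Inside the migration, User resolves to the nested stand-in rather than the application model, and on an empty database the find_each loops simply do nothing.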

Conclusion

You should always be able to do this in every Rails environment that your application has: rake db:drop; rake db:create; rake db:migrate; rake db:reset

At this point you should then be able to run rake db:test:prepare and then rake spec or rake test or whatever and have it work.

If any part of that process fails, you are missing out on the benefits of using Rails migrations.

5 thoughts on “Rails Migration Antipatterns and How To Fix Them”

  1. From rails source code.

    # This file is auto-generated from the current state of the database. Instead of editing this file,
    # please use the migrations feature of Active Record to incrementally modify your database, and
    # then regenerate this schema definition.
    #
    # Note that this schema.rb definition is the authoritative source for your database schema. If you need
    # to create the application database on another system, you should be using db:schema:load, not running
    # all the migrations from scratch. The latter is a flawed and unsustainable approach (the more migrations
    # you’ll amass, the slower it’ll run and the greater likelihood for issues).
    #
    # It’s strongly recommended to check this file into your version control system.

  2. Daniel,

    You didn’t actually add any commentary with that excerpt so I can only guess at your point. Perhaps you’re saying that you agree with that comment?

    I suppose I agree with it too, IF you aren’t using any database features that would be omitted by db:schema:dump (schema.rb) or db:structure:dump (schema.sql). And that’s a pretty big “if”. I even capitalized it! :)

    The nice thing about migrations is that even if you dare to use Rails on a project where those opinionated-software decisions don’t happen to fit 100% with your project, you can still use migrations to build the DB from scratch or incrementally in a reliable and repeatable fashion. Just throw a few raw SQL DDL statements in a migration and put the reverse transformation in the down method, and you’re all set.

  3. Daniel is absolutely correct. You should be using schema:load or structure:load to create the DB, not running all the migrations since the beginning of time. You say “IF you aren’t using any database features that would be omitted by db:schema:dump (schema.rb) or db:structure:dump (schema.sql)”, but you seem to be unaware that while schema.rb does omit certain things, structure.sql will *always* guarantee a complete dump, with no feature omitted. (I don’t recommend using the SQL schema file unless you absolutely have to, but it is a 100% effective last resort.)

    In other words, there’s never any need to run all migrations from zero. Never do it; use the schema file (Ruby or SQL) instead.
