Making Rails Better – Fixing Architecture Flaws in Active Record

ActiveRecord is a funny thing. On the outside it looks great: it neatly maps relational data to Ruby objects and provides an easy-to-use API via its domain-specific language. But on the inside, it contains two surprising architecture flaws that make it difficult to extend and hurt its performance.

The Vietnam of Computer Science

Mapping objects to relational data turns out to be quite tricky. There are so many failed object-relational mapping systems that the whole field has been called the Vietnam of Computer Science. The problem is that objects and tables don’t map cleanly to each other: the more you try to automate the process, the more complex your code becomes, and sooner or later your system becomes too hard and too slow to use.

The approach that I think works best, and the one ActiveRecord follows, is to keep things relatively simple. A record in a table is mapped to an object, and any related tables are mapped to associations that contain one or more objects (depending on whether the relation is one-to-one or one-to-many). And that’s it; anything past that risks descending into the morass of failed object-relational mappings.
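
As a rough illustration of that mapping, here is a plain-Ruby sketch (not how ActiveRecord is implemented). The `parents` and `children` tables, their columns, and the `parent_from_rows` helper are all hypothetical; rows are modeled as hashes keyed by column name, roughly the way a database driver might return them.

```ruby
# One object per row of a hypothetical "parents" table; children is the
# one-to-many association.
Parent = Struct.new(:id, :name, :children)
# One object per row of a hypothetical "children" table; parent_id is the foreign key.
Child = Struct.new(:id, :parent_id, :name)

# Hypothetical helper: map one parent row plus candidate child rows to objects.
def parent_from_rows(parent_row, child_rows)
  children = child_rows
    .select { |row| row["parent_id"] == parent_row["id"] }
    .map { |row| Child.new(row["id"], row["parent_id"], row["name"]) }
  Parent.new(parent_row["id"], parent_row["name"], children)
end

parent = parent_from_rows(
  { "id" => 1, "name" => "Alice" },
  [
    { "id" => 10, "parent_id" => 1, "name" => "Bob" },
    { "id" => 11, "parent_id" => 2, "name" => "Carol" }
  ]
)
puts parent.children.map(&:name).inspect  # prints ["Bob"]
```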

ActiveRecord gets bonus points because it makes it easy to define such mappings via its Domain Specific Language (DSL): the familiar methods has_one, has_many, etc.
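
Macros like these are ordinary class methods that generate instance methods when the class body runs. The sketch below assumes a hypothetical `MiniRecord` base class that only keeps associated objects in memory; it shows the general Ruby technique, not ActiveRecord’s actual code.

```ruby
# Hypothetical base class: has_many/has_one are class methods that define
# accessor methods on the subclass. This is NOT ActiveRecord's implementation.
class MiniRecord
  def self.has_many(name)
    define_method(name) do
      # Lazily create an empty collection the first time it is read.
      @associations ||= {}
      @associations[name] ||= []
    end
  end

  def self.has_one(name)
    define_method(name) do
      @associations ||= {}
      @associations[name]
    end
    define_method("#{name}=") do |value|
      @associations ||= {}
      @associations[name] = value
    end
  end
end

class Parent < MiniRecord
  has_many :children
  has_one :address
end

parent = Parent.new
parent.children << "Bob"
parent.address = "1 Main St"
puts parent.children.inspect  # prints ["Bob"]
```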

Botching Columns

But underneath its exterior, ActiveRecord has a couple of architecture flaws in the way that it handles columns (attributes).

The first issue is that ActiveRecord botches its implementation of columns. Reading and writing data from a database requires converting the data from its textual representation (provided by the database’s client APIs) to and from Ruby objects. Let’s look at how Rails does it for PostgreSQL; a quick glance at some code shows the problem:

def translate_field_type(field_type)
  case field_type
    when /\[\]$/i  then 'string'
    when /^timestamp/i    then 'datetime'
    when /^real|^money/i  then 'float'
    when /^interval/i     then 'string'
    when /^(?:point|lseg|box|"?path"?|polygon|circle)/i  then 'string'
    when /^bytea/i        then 'binary'
    else field_type       # Pass through standard types.
  end
end

def default_value(value)
  # Boolean types
  return "t" if value =~ /true/i
  return "f" if value =~ /false/i

  # Char/String/Bytea type values
  return $1 if value =~ /^'(.*)'::(bpchar|text|character varying|bytea)$/

  # Numeric values
  return value if value =~ /^-?[0-9]+(\.[0-9]*)?/

  # Fixed dates / times
  return $1 if value =~ /^'(.+)'::(date|timestamp)/

  # Anything else is blank, some user type, or some function
  # and we can't know the value of that, so return nil.
  return nil
end

Having large case statements in an object-oriented language is a sure sign your design is flawed. The fundamental problem is that the implementation above is not extensible – you can’t easily add your own field types.
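
For comparison, here is a hedged sketch of a table-driven alternative to translate_field_type. The `FieldTypeRegistry` class and its `register`/`translate` methods are hypothetical names. Rules are tried in the order they were registered, like the branches of the original case statement, and anything unmatched passes through unchanged; the `tsvector` rule stands in for a user-defined type added without touching the adapter’s source.

```ruby
# Hypothetical table-driven replacement for translate_field_type.
class FieldTypeRegistry
  def initialize
    @rules = []
  end

  # Append a rule; rules are tried in registration order, like case branches.
  def register(pattern, type)
    @rules << [pattern, type]
    self
  end

  # Return the type of the first matching rule, or pass the name through.
  def translate(field_type)
    rule = @rules.find { |pattern, _type| pattern.match?(field_type) }
    rule ? rule.last : field_type
  end
end

registry = FieldTypeRegistry.new
registry.register(/\[\]$/i, "string")
        .register(/^timestamp/i, "datetime")
        .register(/^real|^money/i, "float")
        .register(/^bytea/i, "binary")

# A user-defined mapping, added without editing the adapter (hypothetical).
registry.register(/^tsvector/i, "text")

puts registry.translate("timestamp without time zone")  # prints datetime
puts registry.translate("tsvector")                     # prints text
puts registry.translate("integer")                      # prints integer
```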

You could argue that extensibility was not a design goal of ActiveRecord, but that would be silly. Even if you agreed that ActiveRecord should support only a few data types (which I don’t), there are still enough differences between databases that an extensible system would clean up the internals of ActiveRecord and get rid of the grungy code above.

And more importantly, it would let users add their own data types. And that is important. For example, with MapBuzz we need to support Postgres’s geometry types, and we would also like to support its full-text search types. Overriding Rails to support them is an exercise in annoyance, since it requires overriding various core methods in the PostgreSQL adapter.

The way this should have been implemented is by introducing a Column object. The column object’s API would be simple: it would have a serialize and a deserialize method. Note that ActiveRecord does indeed have a column object, but its implementation is very weird. For example:
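
A minimal sketch of such a Column API, assuming the convention that `deserialize` turns database text into a Ruby object and `serialize` does the reverse. `Column`, `IntegerColumn` and `BooleanColumn` are illustrative names, not ActiveRecord classes, and the boolean column assumes PostgreSQL’s `t`/`f` text format.

```ruby
# Hypothetical column API: each column object converts one database type.
#   deserialize: database text -> Ruby object
#   serialize:   Ruby object   -> database text
class Column
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Default behaviour: the value is plain text.
  def deserialize(text)
    text
  end

  def serialize(value)
    value.nil? ? nil : value.to_s
  end
end

class IntegerColumn < Column
  def deserialize(text)
    text.nil? ? nil : Integer(text, 10)
  end
end

class BooleanColumn < Column
  # Assumes PostgreSQL's text format for booleans: "t" / "f".
  def deserialize(text)
    return nil if text.nil?
    text == "t"
  end

  def serialize(value)
    return nil if value.nil?
    value ? "t" : "f"
  end
end

age = IntegerColumn.new("age")
admin = BooleanColumn.new("admin")
p age.deserialize("42")    # prints 42
p admin.deserialize("t")   # prints true
p admin.serialize(false)   # prints "f"
```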

def klass
  case type
    when :integer       then Fixnum
    when :float         then Float
    when :decimal       then BigDecimal
    when :datetime      then Time
    when :date          then Date
    when :timestamp     then Time
    when :time          then Time
    when :text, :string then String
    when :binary        then String
    when :boolean       then Object
  end
end

This code is clearly trying to be much too clever. Keep it simple, stupid! There should be a TimeColumn, a FloatColumn, and so on. That way, a developer can add their own column types; for us, that would be a GeomColumn.
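
Here is one way the one-class-per-type idea might look, as a sketch with hypothetical class names. Each class simply responds to `deserialize` and `serialize` (duck typing, so no shared base class is required). `GeomColumn` here only handles PostgreSQL’s built-in `point` type, whose text form is `(x,y)`; real PostGIS geometry values use a different representation and would need more work.

```ruby
require "time"

# Hypothetical one-class-per-type columns. Each responds to
# deserialize (database text -> Ruby) and serialize (Ruby -> database text).
class FloatColumn
  def deserialize(text)
    text && Float(text)
  end

  def serialize(value)
    value && value.to_s
  end
end

class TimeColumn
  # Time.parse comes from the "time" standard library; text without a
  # zone is interpreted as local time.
  def deserialize(text)
    text && Time.parse(text)
  end

  def serialize(value)
    value && value.strftime("%Y-%m-%d %H:%M:%S")
  end
end

# A user-defined type. Only PostgreSQL's point text form "(x,y)" is handled.
Point = Struct.new(:x, :y)

class GeomColumn
  def deserialize(text)
    return nil if text.nil?
    x, y = text.delete("()").split(",").map { |s| Float(s) }
    Point.new(x, y)
  end

  def serialize(point)
    point && "(#{point.x},#{point.y})"
  end
end

geom = GeomColumn.new
point = geom.deserialize("(1.5,2)")
p point.x                   # prints 1.5
puts geom.serialize(point)  # prints (1.5,2.0)
```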

Attributes

The second issue, which is related, is the way that column values are handled. ActiveRecord stores data read from a database in a hash table called attributes. But surprisingly, the attributes hash table is also used to store Ruby objects. Thus a value in the attributes hash table may be either a Ruby object (already converted) or the text returned from the database (not yet converted).

This is a horrible design for two main reasons.

First, it means that every time an attribute is accessed, there has to be code that checks whether it is still in string format. If it is, the data must be converted to a Ruby object, which causes a performance hit.
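
The sketch below illustrates that cost under a simplified assumption: a hypothetical `MixedRecord` whose attributes hash starts out holding raw strings, with a hard-coded table of column types. Every read has to inspect the value before returning it, and because nothing is cached, an unconverted value is converted again on each read.

```ruby
# Hypothetical record whose attributes hash mixes raw database text and
# Ruby objects, as described above. Column types are hard-coded here.
class MixedRecord
  COLUMN_TYPES = { "name" => :string, "age" => :integer }.freeze

  def initialize(row)
    @attributes = row.dup  # starts out holding raw database text
  end

  def read_attribute(name)
    value = @attributes[name]
    # This check runs on every read; the result is not cached, so a raw
    # string is converted again each time it is read.
    if value.is_a?(String) && COLUMN_TYPES[name] == :integer
      value = Integer(value, 10)
    end
    value
  end

  def write_attribute(name, value)
    @attributes[name] = value  # now a Ruby object, not database text
  end
end

record = MixedRecord.new({ "name" => "Bob", "age" => "42" })
p record.read_attribute("age")   # prints 42 (converted from "42")
record.write_attribute("age", 43)
p record.read_attribute("age")   # prints 43 (already an Integer)
```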

Second, it means that ActiveRecord cannot keep track of which attributes have changed and which have not. That matters, because it means that ActiveRecord updates every column even when just one column changes. Besides being a performance hit, it means that ActiveRecord will corrupt your database if you are not careful. That happens when a table contains a column type that ActiveRecord is not familiar with, a good example being a tsvector column in PostgreSQL. ActiveRecord will attempt to update it with the wrong value even though the column hasn’t changed at all.
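
With change tracking, an update could be limited to the columns that actually changed, leaving an unfamiliar column such as a tsvector alone. The `partial_update_sql` helper below is hypothetical, and its naive string quoting is for illustration only; real code should rely on the driver’s quoting or bind parameters.

```ruby
# Hypothetical helper: build an UPDATE that touches only changed columns.
# The quoting (doubling single quotes) is naive and for illustration only.
def partial_update_sql(table, id, original, current)
  changed = current.keys.select { |column| current[column] != original[column] }
  return nil if changed.empty?

  assignments = changed.map do |column|
    "#{column} = '#{current[column].to_s.gsub("'", "''")}'"
  end
  "UPDATE #{table} SET #{assignments.join(', ')} WHERE id = #{Integer(id)}"
end

# search_vector stands in for a tsvector column ActiveRecord doesn't understand.
original = { "name" => "Bob", "age" => "42", "search_vector" => "'bob':1" }
current  = original.merge("age" => "43")
puts partial_update_sql("people", 1, original, current)
# prints UPDATE people SET age = '43' WHERE id = 1
```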

So what’s a better solution? A pure object-oriented solution would introduce a Field object with four fields: the raw value (from the database), the converted value (the Ruby object), a reference to the column object that knows how to convert the field, and a changed flag.
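
A sketch of such a Field object, with hypothetical names. Besides the four parts above, it keeps one extra flag recording whether the conversion has already happened, so reads convert the raw text only once. Whether a field counts as changed is decided by comparing against the original database value, so writing the original value back clears the flag.

```ruby
# Hypothetical Field object: raw value, converted value, column, changed flag.
class Field
  attr_reader :raw, :column

  def initialize(raw, column)
    @raw = raw          # text as returned by the database
    @column = column    # any object responding to deserialize
    @value = nil        # Ruby object, filled in on first read
    @converted = false  # extra flag: has @value been computed yet?
    @changed = false
  end

  def value
    unless @converted
      @value = @column.deserialize(@raw)
      @converted = true
    end
    @value
  end

  def value=(new_value)
    @value = new_value
    @converted = true
    # Compare against the original database value, not the previous write.
    @changed = new_value != @column.deserialize(@raw)
  end

  def changed?
    @changed
  end
end

# Hypothetical minimal column for the example.
class IntegerColumn
  def deserialize(text)
    text && Integer(text, 10)
  end
end

field = Field.new("42", IntegerColumn.new)
p field.value      # prints 42
field.value = 43
p field.changed?   # prints true
field.value = 42
p field.changed?   # prints false
```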

But that’s pretty heavyweight, since you’re introducing an extra object per field per record. An alternative solution would be to introduce three hash tables per record: one to hold the raw values, one to hold the converted values and one to hold the changed flags. You would also want to store references to the record’s columns, most likely on the class itself (so if your table is called parents, store the column information on the Parent class).
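
A sketch of the three-hash layout, again with hypothetical names. Column objects are stored once on the class (here via a class-level `columns` accessor), while each record keeps its raw values, its lazily converted values, and its changed flags in separate hashes.

```ruby
# Hypothetical minimal column used by the sketch.
class IntegerColumn
  def deserialize(text)
    text && Integer(text, 10)
  end
end

class Record
  # Column objects live on the class, shared by every record of that class.
  class << self
    attr_accessor :columns
  end

  def initialize(raw_row)
    @raw = raw_row   # hash 1: text from the database
    @values = {}     # hash 2: converted Ruby objects, filled lazily
    @changed = {}    # hash 3: column name => true once written
  end

  def [](name)
    return @values[name] if @values.key?(name)
    @values[name] = self.class.columns.fetch(name).deserialize(@raw[name])
  end

  def []=(name, value)
    @values[name] = value
    @changed[name] = true
  end

  def changed_columns
    @changed.keys
  end
end

# Table "parents" maps to class Parent, which holds the column information.
class Parent < Record
  self.columns = { "id" => IntegerColumn.new, "age" => IntegerColumn.new }
end

parent = Parent.new({ "id" => "1", "age" => "42" })
p parent["age"]            # prints 42
parent["age"] = 43
p parent.changed_columns   # prints ["age"]
```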

Fixing Active Record

The good news is that the Rails team is looking at these issues. In particular, Michael Koziarski has recently posted a patch that introduces the concept of a separate hash table to store converted values. So check out the patch, and be sure to offer Michael your comments!

  1. John
    August 13, 2007

    Yep, give it a few more versions and maybe AR will begin to measure up to some of the Perl ORMs from two years ago 😉

  2. Jon
    August 13, 2007

    Would this break portability? One of the ‘cool’ things about rails seems to be that you can move it to another DB without breaking anything, the migrations should just automagically create the schema and you’re ready to roll.

    That being said, breaking portability in order to use advanced DB features seems worth it to me. Migrations are cool, but quite simple in what they can do. (But to support SQLite, they sort of have to be.)

  3. Charlie Savage
    August 13, 2007

    Hi Jon,

    No, by default it wouldn’t break portability; it’s just a better way of packaging ActiveRecord’s existing code.

    Of course, if you added your own custom data type, then it would be up to you to port it across databases if you needed database portability. But in reality, I think database portability is way overrated. It’s hard enough getting one database set up right, let alone several.
    Reply
  4. wilsonm@yahoo.com
    August 13, 2007

    Having real bind variables would also be a big win. They probably account for a larger amount of time spent waiting than anything else.

    Does anyone know if/when they will be available?

  5. Charlie Savage
    August 13, 2007

    Wilsonm,

    That’s a good point; it’s another place where ActiveRecord could be significantly improved.

  6. August 14, 2007

    ActiveRecord is the largest library in Rails. Any increased efficiencies in the library would definitely help Rails applications in general.

    I’d love to see a prototype that uses some of these ideas. It might break some third-party plugins but could be well worth the work to update them.

  7. Crescent Fresh
    August 15, 2007

    So how would one go about mapping a db field to a column object (TimeColumn, FloatColumn, etc)? Would there still not be a need for a big case statement? Eg

    def klass
      case type
      when :float then FloatColumn
      when :datetime then TimeColumn
      # …
      end
    end

  8. Charlie Savage
    August 15, 2007

    Hi Crescent Fresh,

    It depends on which point you mean. What I mean is:

    1. Have a hash table per adapter class that is keyed on column type (a string, defined by the current database) with values being the correct Ruby column class for that type (you of course have to implement these classes). So:

    mappings = {
      'varchar'  => StringColumn,
      'integer'  => IntegerColumn,
      'geometry' => GeometryColumn
    }

    2. At startup, read data dictionary info from database to get list of columns (Rails of course already does this).

    3. For each column, look up the Ruby column class in the hash table set up in #1. Create an instance of that Ruby column class. Store it in another hash table, called columns, keyed on the column name. So each table has a columns hash table.

    You’ve now avoided all case statements, so the code will be faster. And it’s easy to extend: a developer could just add a new mapping to the hash table defined in part #1.
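
    The three steps might be sketched in Ruby as follows. `PostgresAdapter`, `register_column_type`, `build_columns` and the column classes are hypothetical, and the data dictionary query from step 2 is replaced by a hard-coded list of (name, type) pairs. Falling back to `StringColumn` for unknown types is an assumption of this sketch, not something the steps above specify.

```ruby
# Hypothetical column classes; real ones would also convert values.
class Column
  attr_reader :name

  def initialize(name)
    @name = name
  end
end
class StringColumn < Column; end
class IntegerColumn < Column; end
class GeometryColumn < Column; end

class PostgresAdapter
  # Step 1: database type name => Ruby column class, one table per adapter.
  COLUMN_TYPES = {
    "varchar" => StringColumn,
    "integer" => IntegerColumn
  }

  # Extension point (hypothetical): users register their own types.
  def self.register_column_type(db_type, klass)
    COLUMN_TYPES[db_type] = klass
  end

  # Steps 2 and 3: given (name, type) pairs from the data dictionary,
  # build the per-table columns hash keyed on column name.
  # Unknown types fall back to StringColumn (an assumption of this sketch).
  def self.build_columns(dictionary_rows)
    dictionary_rows.each_with_object({}) do |(name, db_type), columns|
      klass = COLUMN_TYPES.fetch(db_type, StringColumn)
      columns[name] = klass.new(name)
    end
  end
end

PostgresAdapter.register_column_type("geometry", GeometryColumn)

# Stand-in for rows read from the database's catalog at startup.
rows = [["id", "integer"], ["title", "varchar"], ["location", "geometry"]]
columns = PostgresAdapter.build_columns(rows)
p columns["location"].class   # prints GeometryColumn
```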

    Does that help?

  9. Crescent Fresh
    August 16, 2007

    So you’ve replaced case statements with a hash table, correct? And this allows easily and elegantly adding custom column types. Agreed. Are hash table lookups really faster than case/switch statements though (in any language)? Less code, sure, but faster?

  10. Charlie Savage
    August 16, 2007

    CrescentFresh,

    Sort of. Use a hash table to create the right column objects, and then just use inheritance/duck typing from there. Each column would have an api like:

    def serialize
    end

    def unserialize
    end

    As for hash lookups versus case statements in an interpreted language like Ruby, it’s hard to tell. It depends on the length of the case statement and what code it executes. Either way, I wouldn’t imagine one to be much faster than the other. The equation changes for compiled languages, of course, but that’s not the issue here.
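
    One way to settle the question for a given Ruby version is simply to measure. The sketch below uses Ruby’s standard benchmark library to time a case statement against an equivalent hash lookup; the method names and type list are made up for the example, and the timings will vary with the Ruby version, the number of branches and the machine, so no particular result is implied.

```ruby
require "benchmark"

# Hypothetical lookups mapping a column type symbol to a Ruby class.
def class_by_case(type)
  case type
  when :integer then Integer
  when :float then Float
  when :datetime, :timestamp, :time then Time
  when :text, :string, :binary then String
  end
end

CLASS_BY_TYPE = {
  integer: Integer, float: Float,
  datetime: Time, timestamp: Time, time: Time,
  text: String, string: String, binary: String
}.freeze

def class_by_hash(type)
  CLASS_BY_TYPE[type]
end

TYPES = CLASS_BY_TYPE.keys
N = 100_000

# Prints user/system/total/real times for each approach.
Benchmark.bm(6) do |x|
  x.report("case") { N.times { TYPES.each { |t| class_by_case(t) } } }
  x.report("hash") { N.times { TYPES.each { |t| class_by_hash(t) } } }
end
```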
