Making Rails Better – Fixing Architecture Flaws in Active Record

ActiveRecord is a funny thing. On the outside it looks great – it neatly maps relational data to Ruby objects and provides an easy-to-use API via its domain-specific language. But on the inside, it contains two surprising architecture flaws that make it difficult to extend and hurt performance.

The Vietnam of Computer Science

Mapping objects to relational data turns out to be quite tricky. There are so many failed object-relational mapping systems that the whole field has been called the Vietnam of Computer Science. The problem is that objects and tables don’t map cleanly to each other. The more you try to automate the process, the more complex your code becomes, and sooner or later your system becomes too hard and too slow to use.

The approach that I think works best, and that ActiveRecord follows, is to keep things relatively simple. A record in a table is mapped to an object, and any related tables are mapped to associations that contain one or more objects (depending on whether the relation is one to one or one to many). And that’s it. Anything past that risks descending into the morass of failed object-relational mappings.

ActiveRecord gets bonus points because it makes it easy to define such mappings via its Domain Specific Language (DSL) – the familiar methods has_one, has_many, etc.

Botching Columns

But underneath its exterior, ActiveRecord has a couple of architecture flaws in the way that it handles columns (attributes).

The first issue is that ActiveRecord botches its implementation of columns. Reading and writing data from a database requires converting between the textual representation provided by the database’s client APIs and Ruby objects. A quick glance at how Rails does it for Postgresql shows the problem:

def translate_field_type(field_type)
  case field_type
    when /\[\]$/i  then 'string'
    when /^timestamp/i    then 'datetime'
    when /^real|^money/i  then 'float'
    when /^interval/i     then 'string'
    when /^(?:point|lseg|box|"?path"?|polygon|circle)/i  then 'string'
    when /^bytea/i        then 'binary'
    else field_type       # Pass through standard types.
  end
end

def default_value(value)
  # Boolean types
  return "t" if value =~ /true/i
  return "f" if value =~ /false/i

  # Char/String/Bytea type values
  return $1 if value =~ /^'(.*)'::(bpchar|text|character varying|bytea)$/

  # Numeric values
  return value if value =~ /^-?[0-9]+(\.[0-9]*)?/

  # Fixed dates / times
  return $1 if value =~ /^'(.+)'::(date|timestamp)/

  # Anything else is blank, some user type, or some function
  # and we can't know the value of that, so return nil.
  return nil
end

Having large case statements in an object-oriented language is a sure sign your design is flawed. The fundamental problem is that the implementation above is not extensible – you can’t easily add your own field types.

You could argue that extensibility was not a design goal of ActiveRecord, but that would be silly. Even if you agreed that ActiveRecord should only support a few limited data types (which I don’t), there are still enough differences between databases that having an extensible system would clean up the internals of ActiveRecord and get rid of the grungy code above.

And more importantly, it would let users add their own data types. And that is important. For example, with MapBuzz we need to support Postgres’s geometry types and we would also like to support its full text search types. Overriding Rails to support them is an exercise in annoyance since it requires overriding various core methods in the Postgresql adapter.

The way this should have been implemented is by introducing a Column object. The column object’s API would be simple – it would have a serialize and a deserialize method. Note that ActiveRecord does indeed have a column object, but its implementation is very weird. For example:

def klass
  case type
    when :integer       then Fixnum
    when :float         then Float
    when :decimal       then BigDecimal
    when :datetime      then Time
    when :date          then Date
    when :timestamp     then Time
    when :time          then Time
    when :text, :string then String
    when :binary        then String
    when :boolean       then Object
  end
end

This code is clearly trying to be much too clever. Keep it simple, stupid! There should be a TimeColumn, a FloatColumn, etc. That way, a developer can add their own column types – for us, that would be a GeomColumn.
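To make the idea concrete, here is a rough sketch of what such column classes might look like. None of this is ActiveRecord code: the class layout, the Point struct and the "POINT(x y)" text format are assumptions for illustration. I have also assumed that deserialize turns database text into a Ruby object and serialize does the reverse.

```ruby
require 'time'

# Hypothetical base class: each column type knows how to convert between
# the database's text form and a Ruby object.
class Column
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Ruby object -> text for the database.
  def serialize(value)
    value.nil? ? nil : value.to_s
  end

  # Text from the database -> Ruby object (plain strings pass through).
  def deserialize(text)
    text
  end
end

class FloatColumn < Column
  def deserialize(text)
    text.nil? ? nil : Float(text)
  end
end

class TimeColumn < Column
  FORMAT = '%Y-%m-%d %H:%M:%S' # assumed text format, local time

  def serialize(value)
    value.nil? ? nil : value.strftime(FORMAT)
  end

  def deserialize(text)
    text.nil? ? nil : Time.strptime(text, FORMAT)
  end
end

# A user-defined type, added without touching any adapter code.
# Point and the "POINT(x y)" text form are illustrative assumptions.
Point = Struct.new(:x, :y)

class GeomColumn < Column
  POINT_PATTERN = /\APOINT\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\z/

  def serialize(point)
    point.nil? ? nil : "POINT(#{point.x} #{point.y})"
  end

  def deserialize(text)
    return nil if text.nil?

    match = POINT_PATTERN.match(text)
    raise ArgumentError, "unsupported geometry: #{text.inspect}" unless match

    Point.new(Float(match[1]), Float(match[2]))
  end
end
```

The point of the design is that supporting a new type means adding a subclass, not editing a case statement buried in an adapter.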

Attributes

The second issue, which is related, is the way that column values are handled. ActiveRecord stores data read from a database in a hash table called attributes. But surprisingly, the attributes hash table is also used to store Ruby objects. Thus the data stored in the attributes hash table may either be a Ruby object (in serialized format) or the text returned from the database (unserialized).

This is a horrible design for two main reasons.

First, it means that every time an attribute is accessed there has to be code to check whether it is in string format or not. If it is, the data must be converted to Ruby, which causes a performance hit.

Second, it means that ActiveRecord cannot keep track of which attributes have changed and which have not. That’s important, because it means that ActiveRecord updates every column even when just one column changes. Besides being a performance hit, it means that ActiveRecord will corrupt your database if you are not careful. That happens when a table contains a column type that ActiveRecord is not familiar with – a good example being a tsvector field in Postgresql. ActiveRecord will attempt to update it using the wrong value even though the column hasn’t changed at all.
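Both problems are easy to see in a stripped-down illustration. This is not ActiveRecord's actual code, just a simplified model of a single hash that holds both raw text and Ruby objects. The helper names read_date and columns_to_update are made up for the example.

```ruby
require 'date'

# One hash mixes raw database text with Ruby objects set by the application.
attributes = {
  'name'         => 'Ann',        # text from the database
  'born_on'      => '2001-02-03', # text from the database
  'search_index' => "'ann':1"     # text for a type the mapper does not understand
}
attributes['born_on'] = Date.new(2001, 2, 4) # now a Ruby object

# Problem 1: every read has to check the format and convert if needed.
# A value that is still text gets parsed again on each call.
def read_date(attributes, name)
  value = attributes[name]
  value.is_a?(String) ? Date.parse(value) : value
end

# Problem 2: nothing records which keys were modified, so an update has to
# write every column back, including ones it cannot convert correctly.
def columns_to_update(attributes)
  attributes.keys
end

puts read_date(attributes, 'born_on')
puts columns_to_update(attributes).inspect
```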

So what’s a better solution? A pure object-oriented solution would introduce a Field object, which has four fields – the raw value (from the database), the serialized value (the Ruby object), a reference to the column object which knows how to serialize/deserialize the field, and a changed flag.
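A minimal sketch of such a Field object might look like the following. The class and method names are my own assumptions, and IntegerColumn is a stand-in column type defined only so the example runs. The sketch also keeps one extra internal flag so that a converted nil is not converted again.

```ruby
# Stand-in column type for the example (hypothetical, not ActiveRecord's).
class IntegerColumn
  def deserialize(text)
    text.nil? ? nil : Integer(text, 10)
  end

  def serialize(value)
    value.nil? ? nil : value.to_s
  end
end

class Field
  attr_reader :raw, :column

  def initialize(column, raw)
    @column  = column # knows how to convert this field
    @raw     = raw    # text exactly as it came from the database
    @value   = nil    # the Ruby object, filled in on first access
    @loaded  = false  # extra flag: has @value been computed yet?
    @changed = false
  end

  # Convert the raw text once, then reuse the result.
  def value
    unless @loaded
      @value  = @column.deserialize(@raw)
      @loaded = true
    end
    @value
  end

  # Assigning a different value marks the field as changed.
  def value=(new_value)
    @changed ||= (new_value != value)
    @value  = new_value
    @loaded = true
  end

  def changed?
    @changed
  end
end

age = Field.new(IntegerColumn.new, '42')
age.value = 43
puts age.changed? # the raw text '42' is still available via age.raw
```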

But that’s pretty heavy-weight since you’re introducing an extra object per field per record. An alternate solution would be to introduce three hash tables per record – one to hold the raw values, one to hold the serialized values and one to hold the changed flags. You would also want to store references to the record’s columns, most likely on the class itself (so if your table is called parents, then store the column information on the Parent class).
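Here is a rough sketch of the three-hash approach, with the column information stored on the class. Everything in it is an assumption made for illustration: the Record base class, the column declaration method, changes_for_update and the two small column types. It is not how ActiveRecord is written.

```ruby
# Stand-in column types for the example (hypothetical).
class StringColumn
  def deserialize(text)
    text
  end

  def serialize(value)
    value.nil? ? nil : value.to_s
  end
end

class IntegerColumn < StringColumn
  def deserialize(text)
    text.nil? ? nil : Integer(text, 10)
  end
end

class Record
  # Column information lives on the class and is shared by all records.
  def self.columns
    @columns ||= {}
  end

  def self.column(name, column)
    columns[name] = column
  end

  def initialize(raw_values)
    @raw     = raw_values # text from the database, never modified
    @values  = {}         # converted Ruby objects, filled in lazily
    @changed = {}         # column name => true once modified
  end

  def [](name)
    unless @values.key?(name)
      @values[name] = self.class.columns.fetch(name).deserialize(@raw[name])
    end
    @values[name]
  end

  def []=(name, value)
    @changed[name] = true unless self[name] == value
    @values[name] = value
  end

  # Only modified columns are converted back to text for an UPDATE.
  def changes_for_update
    @changed.keys.each_with_object({}) do |name, result|
      result[name] = self.class.columns.fetch(name).serialize(@values[name])
    end
  end
end

class Parent < Record
  column 'id',   IntegerColumn.new
  column 'name', StringColumn.new
end

# 'search_index' has no registered column, yet it is never touched.
parent = Parent.new('id' => '7', 'name' => 'Ann', 'search_index' => "'ann':1")
parent['name'] = 'Bea'
puts parent.changes_for_update.inspect
```

Because untouched columns are never converted and never written back, a column whose type the mapper does not understand is simply left alone.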

Fixing Active Record

The good news is that the Rails team is looking at these issues. In particular, Michael Koziarski has recently posted a patch that introduces the concept of a separate hash table to store serialized values. So check out the patch, and be sure to offer Michael your comments!
