Making Rails Go Vroom

Last week I showed how to profile a Rails application using ruby-prof. The post was driven by our desire to improve MapBuzz’s map rendering speed. Our nightly log analysis showed that rendering speed had significantly degraded over the last few weeks. Since the whole point of MapBuzz is to share maps with others, we needed to find out what was taking so long.

A day of profiling and we had our answer. With a few simple changes we were able to reduce rendering time from 7.8 seconds to 0.92 seconds (note that these times were gathered while profiling the application, so the real times are 2 to 3 times lower).

What we found should be generally applicable to all Rails applications, so I’d like to share the lessons here.

If you’re interested in the nitty-gritty details of each change, keep reading!

A Bit of Background

MapBuzz renders maps in a two step process. First, the html page is created and shown to the user. Second, the browser makes an Ajax request to get the features that should be shown on the map. The reason for having a two step process is that it allows users to zoom and pan around the map, without having to reload the whole page.

Log analysis revealed that the Ajax request was taking a long time – 1 to 2 seconds, with a big standard deviation. Our goal is to reduce the average time to less than half a second and to significantly reduce the standard deviation.

If you look at the request URI, you’ll see it’s an Atom feed of features and not an HTML page. That has performance implications.

First, Atom feeds tend to have lots of links. In our case, each entry has 10 to 15 links. We limit the number of entries to 50, so that means there are roughly 600 links per page.

Second, we generate Atom feeds in a modular fashion – each entry is created from 10 partials. There is a partial for the user, one for ratings, one for content, one for icons, one for geometries, etc. Why did we do it this way? Simple – to keep things DRY. Having a set of partials creates an extremely modular system – we can mix and match them as needed depending on the context of the request.

Setup

Testing was done with two machines. The first was my laptop, a Dell Latitude D610 running Windows, Ruby 1.8.4, and Rails 1.2.3. The second is our staging server, a fairly high-end Dell desktop running Fedora Core 6 with a copy of our production database (which is about 10 GB). Profiling was done with ruby-prof 0.5.1 (of course), using the Rails plugin I described previously. Note that the logging level was set to :debug, which will show up in the test results.

Baseline

To start we measured the baseline performance of our test setup. It was ugly indeed:

Total: 7.828
%self     total     self     wait    child    calls  name
24.11      1.95     1.89     0.00     0.06     3605  Kernel#clone
13.15      1.04     1.03     0.02     0.00      569  IO#write
 7.78      4.13     0.61     0.00     3.52      441  <Module::Benchmark>#
                                                     realtime-1
 7.41      0.58     0.58     0.00     0.00      125  PGconn#exec

One bit of solace is that profiling an application slows it down 2 to 3 times, but that still leaves us with roughly 4 seconds per request. So let’s start digging through the results.

Don’t Use ActiveRecord#attributes

The first thing that jumps out is the huge amount of time Kernel#clone takes. A look at the call graph shows that the caller is clone_attribute_value.

And a bit more digging reveals this custom code in our application:

def rating
  # Convert cached rating from BigDecimal to float,
  # otherwise strange rounding errors happen
  attributes['rating'].to_f
end

The problem is that when you call ActiveRecord#attributes it returns a copy of the attributes, thus generating all the clone calls. Uuggh. I think this is a result of ActiveRecord’s flawed attribute implementation, but that’s a topic for another post.

What we want is access to the untyped, original value of the attribute. In theory you are supposed to use the auto generated method, rating_before_type_cast, for this. This has the advantage of skipping the clone call, but it relies on method_missing, which has some overhead (we did not measure how much). Or, you could use read_attribute directly, which also skips the clone call. This is usually your best choice.
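
To see where all those clone calls come from, here is a minimal stand-in that mimics the copying behavior. FakeRecord is a hypothetical class for illustration, not the real ActiveRecord::Base:

```ruby
# FakeRecord is a toy stand-in for ActiveRecord::Base, just to show the
# difference between #attributes (copies) and #read_attribute (no copy).
class FakeRecord
  def initialize(attrs)
    @attributes = attrs
  end

  # Like AR's #attributes: builds and returns a *copy* of the hash,
  # cloning every value -- one Kernel#clone per attribute, per call.
  def attributes
    @attributes.inject({}) do |copy, (name, value)|
      copy[name] = value.clone
      copy
    end
  end

  # Like AR's #read_attribute: hands back the stored value directly.
  def read_attribute(name)
    @attributes[name.to_s]
  end
end

record = FakeRecord.new('rating' => '4.2')
record.read_attribute('rating')  # the original object, no clone
record.attributes['rating']      # a fresh clone on every call
```

In the rating method above, switching from attributes['rating'] to read_attribute('rating') eliminates the clone while returning the same value.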

For performance critical code, you may wish to read the untyped attribute directly (see the discussion about time below for an example). That can be done using the method read_attribute_before_type_cast, except that it is private. However, with Ruby, that’s easily solved:

module ActiveRecord
  class Base
    public :read_attribute_before_type_cast
  end
end

Lesson – Don’t use attributes

Gain – 24%

Use include

Let’s look at our updated results:

Total: 3.39

 %self     total     self     wait    child    calls  name
 16.73      0.57     0.57     0.00     0.00      569  IO#write
 12.83      0.44     0.44     0.00     0.00      125  PGconn#exec
  8.76      1.77     0.30     0.00     1.48      441  <Module::Benchmark>#
	                                                    realtime-1
  5.99      0.22     0.20     0.00     0.01       40  PGconn#query

What stands out are the 125 calls to PGconn#exec – each one is a separate query. Looks like we forgot to specify a table or two to be eager loaded via ActiveRecord#find’s :include option.

Note there is one downside to using :include – you lose the ability to use the :select option. For us that is important – and can be worked around by using the rails select_parser I’ve blogged about before.
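
As a sketch of the fix (Item, :user, and :icon are hypothetical stand-ins for whatever models and associations the partials actually touch):

```ruby
# Tell find up front which associations the rendering code will need.
# Without :include, rendering 50 entries fires a separate SELECT per
# associated row; with it, Rails 1.2 eager-loads them in one big query.
find_options = {
  :limit   => 50,
  :include => [:user, :icon]
}
# Inside Rails this would be something like:
#   @items = Item.find(:all, find_options)
```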

Lesson – Get your :includes right

Gain – 12%

cache_template_loading=true

After fixing attributes and includes, here are our current timings:

Total: 2.594

 %self     total     self     wait    child    calls  name
 12.64      1.14     0.33     0.00     0.81      361  <Module::Benchmark>#
                                                      realtime-1
 12.61      0.33     0.33     0.00     0.00      409  IO#write
  4.90      0.13     0.13     0.00     0.00      403  <Class::File>#mtime
  4.86      0.16     0.13     0.00     0.03    30250  Hash#[]
  3.62      0.09     0.09     0.00     0.00        5  PGconn#exec

A couple of things jump out. First, using Benchmark#realtime is fairly time consuming.
Second, logging also takes its toll as seen in the times for IO#write (remember we have
logging set to :debug).

However, the #mtime call looks suspicious. Some more digging through Rails shows all 403 calls come from
ActionView::Base#compile_template?. Let’s take a look:

def compile_template?(template, file_name, local_assigns)
  method_key    = file_name || template
  render_symbol = @@method_names[method_key]

  if @@compile_time[render_symbol] &&
     supports_local_assigns?(render_symbol, local_assigns)
    if file_name && !@@cache_template_loading
      @@compile_time[render_symbol] < File.mtime(file_name) ||
        (File.symlink?(file_name) &&
        (@@compile_time[render_symbol] < File.lstat(file_name).mtime))
    end
  else
    true
  end
end

Remember I mentioned that it takes roughly 500 partial calls to generate the output? This is where it bites us –
by default Rails checks the timestamp of cached templates before running them. In a production environment that
is totally unnecessary. The solution is simple – just update your production.rb file like this:

# Don't check view timestamps!
config.action_view.cache_template_loading = true

And while we are at it, we’ll change the debug level to :info to reduce its impact on the results.

Lesson – Set cache_template_loading to true

Gain – 5%

Don’t use url_for

We’ve managed to reduce the request time from 7.8 to 2.5 seconds. Not bad. But we still have more work to do.


Total: 2.531

 %self     total     self     wait    child    calls  name
 13.59      1.91     0.34     0.00     1.56       45  <Module::Benchmark>#
                                                      realtime
 12.25      0.31     0.31     0.00     0.00      409  IO#write
  4.98      0.81     0.13     0.00     0.69      361  <Module::Benchmark>#
                                                      realtime-1
  4.94      0.13     0.13     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasManyThroughAssociation#
                                                      construct_sql
  3.71      0.09     0.09     0.00     0.00        5  PGconn#exec
  3.04      0.08     0.08     0.00     0.00    29847  Hash#[]
  3.00      0.11     0.08     0.00     0.03     2097  Hash#each
  1.86      0.28     0.05     0.00     0.23      322  ActionController::Routing::
                                                      RouteSet#generate

Of particular interest is ActionController::Routing::RouteSet#generate. Although it only runs for 0.05 seconds, its total time, including children, is 0.28. Thus creating 322 URLs takes 11% of the request time.

People have previously mentioned how slow url_for is – well, it’s true. Instead of using url_for, just create your URLs manually using string mashing. For example, if you want an absolute URI, then do this:

"#{request.protocol}#{request.host_with_port}/controller/#{map.id}"
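
When the same URL shape shows up in several partials, the string mashing can live in a small helper. A sketch, with req as a stub for the real request object and /controller as an illustrative path segment:

```ruby
require 'ostruct'

# Build the absolute URI by hand -- no route generation involved.
def map_url(req, map_id)
  "#{req.protocol}#{req.host_with_port}/controller/#{map_id}"
end

# Stub request exposing only the two methods we use.
req = OpenStruct.new(:protocol => 'http://',
                     :host_with_port => 'www.mapbuzz.com')
map_url(req, 42)  # => "http://www.mapbuzz.com/controller/42"
```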

Lesson – Don’t use url_for or link_to

Gain – 11%

Don’t let Rails parse timestamps

After removing url_for, we’ve smashed the 2 second barrier:

Total: 1.391

 %self     total     self     wait    child    calls  name
  9.06      0.13     0.13     0.00     0.00      726  String#sub!
  8.99      0.13     0.13     0.00     0.00       40  ActionView::Base::CompiledTemplates#
	                                                    _run_ratom_47app47views47geometry47_geometry46ratom
  6.69      0.09     0.09     0.00     0.00        5  PGconn#exec
  5.54      0.08     0.08     0.00     0.00      741  Item#id
  5.54      0.30     0.08     0.00     0.22      121  <Class::Date>#_parse

Next up is <Class::Date>#_parse, whose total time, including children, is a whopping 21% of the time. Ouch.
If you take a look at the code (it’s in the standard Ruby library), you can see it’s quite complicated
since it tries to parse a variety of date formats. In addition, it makes use of rational numbers, which are none too fast.

Taking a look at a graph profile, we see that <Class::Date>#_parse is called from <Class::ActiveRecord::ConnectionAdapters::Column>#string_to_time.
Thus Rails uses it to convert timestamps received from the database into Ruby objects.

As I detailed in a recent post, the problem is that Ruby’s DateTime implementation is extremely slow compared to its Time implementation. The solution is to avoid DateTimes entirely and use custom time parsing code. Fortunately, our database can output time in ISO 8601 format, which Time can quickly parse.
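
A quick sanity check of the two parsing paths (the timestamp here is made up):

```ruby
require 'time'  # adds Time.iso8601
require 'date'  # DateTime, which parses via Date._parse

iso = "2007-07-18T12:34:56Z"

fast = Time.iso8601(iso)    # C-backed Time, cheap to construct
slow = DateTime.parse(iso)  # goes through Date._parse and rationals

fast.year  # => 2007
slow.year  # => 2007 -- same answer, very different cost
```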

So, any place we access times, we simply roll our own attribute readers and writers like this (note that we don’t use read_attribute!):

def created_on
  @created_on ||= Time.iso8601(read_attribute_before_type_cast('created_on'))
end

def created_on=(value)
  write_attribute(:created_on, value)
  @created_on = nil
end

def updated_on
  @updated_on ||= Time.iso8601(read_attribute_before_type_cast('updated_on'))
end

Lesson – Don’t let Rails parse timestamps – do it yourself

Gain – 20%

Don’t symbolize keys

We’ve almost hit the 1 second barrier:

Total: 1.156

 %self     total     self     wait    child    calls  name
  8.13      0.11     0.09     0.00     0.01      642  Kernel#send-3
  8.13      0.09     0.09     0.00     0.00        5  PGconn#exec
  6.75      0.08     0.08     0.00     0.00     7497  String#concat
  4.15      0.05     0.05     0.00     0.00      817  Hash#each
  4.07      0.05     0.05     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasAndBelongsToManyAssociation#
                                                      construct_sql
  2.77      0.03     0.03     0.00     0.00     6360  String#match
  2.77      0.06     0.03     0.00     0.03       47  Array#each_index
  2.77      0.03     0.03     0.00     0.00      280  IconType#size
  2.60      0.03     0.03     0.00     0.00     4309  Array#[]
		

At this point, most of the easy wins are gone. However, there is at least one left. Although it’s not shown in the flat profile, the graph profile shows that the method ActiveSupport::CoreExtensions::Hash::Keys#symbolize_keys takes 2.77% of the time including its children. Additional rummaging through the call graph shows that the calls come from ActionView::Base#compile_and_render_template:

def compile_and_render_template(extension, template = nil,
                                file_path = nil, local_assigns = {})
  # convert string keys to symbols if requested
  local_assigns = local_assigns.symbolize_keys if
    @@local_assigns_support_string_keys

A bit of research shows that local_assigns_support_string_keys is deprecated and slated for removal from Rails! Sweet – that means we can save an additional 3% simply by adding this line to environment.rb:

config.action_view.local_assigns_support_string_keys = false

If only all optimizations were so simple.

Lesson – Don’t symbolize keys

Gain – 3%

Wrapping Up

So what’s our final time? Let’s see:

Total: 0.922

 %self     total     self     wait    child    calls  name
 10.30      0.10     0.10     0.00     0.00        5  PGconn#exec
  3.47      0.17     0.03     0.00     0.14      312  Array#each-2
  3.47      0.03     0.03     0.00     0.00      212  ActiveRecord::ConnectionAdapters::
                                                      Quoting#quote
  3.47      0.03     0.03     0.00     0.00    19455  Hash#[]
  3.47      0.36     0.03     0.00     0.33      360  ActionView::Base#render-2
  3.36      0.06     0.03     0.00     0.03       47  Array#each_index
  3.36      0.03     0.03     0.00     0.00      299  ActiveRecord::Associations::
                                                      AssociationProxy#initialize
  3.36      0.14     0.03     0.00     0.11      705  ActiveRecord::Associations::
                                                      AssociationProxy#method_missing

Not bad – we reduced the request time from 7.8 to 0.92 seconds with a day’s worth of work. Obviously there is still much to be done – we need to run the code through our performance monitoring suite and make sure that the average time holds across a number of requests and that the standard deviation is in line. But we’ve at least made a good start.

  1. Paul
    July 18, 2007

    Great write up. Nice to see that there is a lot of room for speeding things up with a bit of work.

    Of course everyone’s app will be different but some key bits of info everyone can apply.

  2. tom
    July 18, 2007

    Hmm… seems like Rails optimizations often boil down to removing syntactical sugar and bringing things out of the cozy Rails nest.

  3. Nick Kallen
    July 18, 2007

    Using include can vastly slow things down depending on the context. The problem is Rails has to generate these very sophisticated queries–the `t0_c0` stuff–and extract that data out into the different models. I’ve done lots of profiling on :include and these numbers are misleading. It really depends on context.

  4. Charlie Savage
    July 18, 2007

    Hey Nick,

    Thanks for the comment. I’d like to hear more about your findings with :include. I haven’t seen them slow down our application, but I’m sure there are cases where it could happen.

    But are you sure the slowdown really comes from generating queries? That code is conceptually simple and its implementation isn’t that complex (I had to dive through it for the select parse plugin).

    It seems more likely to me that the slowdown you see would happen either because 1) the query gets complex enough that database performance suffers or 2) parsing of the results gets complicated enough it slows down the query.

    Or is this not what you saw?

  5. Charlie Savage
    July 18, 2007

    Hi Paul,

    I definitely agree that everyone’s application is different – what we see is most likely not the same thing that others see. So grab a copy of ruby-prof and find out for sure (or DTrace if on Solaris).

    Charlie

  6. Charlie Savage
    July 18, 2007

    Hey Tom,

    Suppose it depends on what you mean by syntactical sugar. Is url_for syntactical sugar? I’d say no – it’s more a Rails API. So it’s more about knowing which APIs to avoid – or as you say, getting out of the cozy Rails nest when necessary.

  7. July 18, 2007

    If you’re interested in helping out with an improved implementation of Active Record’s attributes, you should drop us a line on the rails core list.

    Some of those issues, especially the repeated-parsing of dates etc, would be nice to fix for 2.0.

    As for url_for, how were you using it? Named routes or hash arguments?

  8. Charlie Savage
    July 18, 2007

    Hey Michael – I’ve been meaning to write up my thoughts on ActiveRecord and attributes for a while now….

    I’ll definitely drop a line to the core list when I do – hopefully they’ll add some value.

    As far as routes, we just use 5 regular routes, not named routes. I can post them if you’d like – but it’s pretty simple stuff.

  9. Michael Koziarski
    July 19, 2007

    Optimising regular old routes will be challenging, however I do think that there’s the possibility of significantly increasing the speed of named routes. After all, they don’t have to search to find the possible matches, there’s only one. So each of the foo_url() methods could instead just be a string mashing method like you use above.

    We’re definitely interested in hearing any thoughts from people who’ve taken the time to profile their applications and have some ideas / patches to speed things up.

  10. July 19, 2007

    Great tips! We’ll definitely look into this in the future. Thanks for sharing… 🙂

  11. July 19, 2007

    Excellent tips here … bookmarked your site and look forward to more brain-pickin’! 😀

  12. Jim Cropcho
    July 19, 2007

    Great post!

    In regards to your “Don’t Use ActiveRecord:attributes” section:

    Would it be useful to redefine ActiveRecord’s attr_reader method, i.e.:

    def attr_reader(attrib)
    read_attribute_before_type_cast attrib
    end

    so that the changes are made across the application for accessor methods, or does this not optimize like your example?

  14. Charlie Savage
    July 19, 2007

    Hi Jim,

    I think it’s probably not a good idea to redefine attr_reader since it’s highly likely that it will break your application. Also, you would have to do the typecasting yourself.

    I haven’t looked at Rails’ autogenerated methods for a while, but it’s possible they cache the values anyway. In our case, we were directly accessing attributes in a couple places, and that turned out to be too slow.

  15. July 20, 2007

    nice article.

    we did some of our optimisations lately for our very complex app. and were able to reduce queries by 90%.

    hope you would agree, optimisation is left for the end phase. when you really need them for performance.

    the syntactic sugar is what makes the crazy ideas come to life faster.

    will test your finding myself and report the gains.

    Senthil

  16. Charlie Savage
    July 20, 2007

    Hi Senthil,

    As the old adage goes – “don’t prematurely optimize.” I have to say I’m of two minds about that. On the one hand, spending a lot of time upfront doing micro-optimizations like the ones I write about is a waste of time. On the other hand, if your architecture cannot support the performance you need, and you don’t discover that until the very end, then you are screwed.

    Looking forward to hearing your results – accurate, published performance information helps us all!

  17. July 20, 2007

    The most important thing in this article that is also the most easily missed is BENCHMARKING!

    Don’t blindly apply any of the techniques here until you have profiled or benchmarked your application to see what the bottleneck is. The ruby-prof plugin is awesome for this purpose.

    Applying the techniques mentioned here only gave me a 5% speed boost, but using the ruby-prof plugin helped identify and fix a separate bottleneck that gave me a 200% speed boost. If I didn’t use ruby-prof, I would never have found that other bottleneck.

  18. Charlie Savage
    July 20, 2007

    Hi Topfunky,

    Absolutely. We focused on this particular request because our log files showed it was dreadfully slow.

    ruby-prof is the last step in any performance improvement project. First you have to find out what requests are slow (use your log files). Second, you have to figure out why they are slow (is it processing the request, rendering the results, querying the database, etc.). And finally, if you’re sure the problem is in your Rails code (or Rails), then break out ruby-prof to see what it is.

    I’m not surprised you only gained 5% from doing the same things we did. Our use case was a bit unusual – lots of partials with lots of URIs with lots of datetimes.

    Curious how you gained your 200%. Any lessons for the rest of us?

  19. Binal
    July 23, 2007

    Hey, your post helped me in identifying the show stoppers and optimizing my rails app too 🙂

  20. thies
    July 26, 2007

    hmm, just a quick hack – Rails uses Date._parse to parse the strings returned from the database. In my case (MySQL) the strings are either of format
    YYYY-MM-DD
    or
    YYYY-MM-DD HH:II:SS

    so i wrote this tiny patch to the Date class which takes care of all the common cases without having to touch my models:

    class Date
      class << self
        alias :_slow_parse :_parse

        def _parse(string, comp=false)
          if string =~ /^(\d+)-(\d+)-(\d+)$/
            return {:year=>$1.to_i, :mon=>$2.to_i, :mday=>$3.to_i}
          elsif string =~ /^(\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)$/
            return {:year=>$1.to_i, :mon=>$2.to_i,
                    :mday=>$3.to_i, :hour=>$4.to_i,
                    :min=>$5.to_i, :sec=>$6.to_i}
          else
            a = _slow_parse(string, comp)
            puts "Date::_parse slow #{string} -> #{a.inspect}"
            return a
          end
        end
      end
    end

    what do you think?

    thies

  21. Charlie Savage
    July 26, 2007

    Binal – Find anything worth sharing with others?

  22. Charlie Savage
    July 26, 2007

    Hi Thies,

    Yeah – using regexes is the approach that the Time class takes, so that looks good to me. Maybe you can reuse it directly, or are your date formats slightly different?

    As far as overriding _parse, I think that’s the right solution if you have a lot of dates. Having to add custom attributes each time would be a pain. In our case, we only have 2 of them, so it wasn’t a big deal.

  23. rick
    August 1, 2007

    I don’t think read_attribute actually calls #attribute as you claim, it pulls the value from @attributes like #read_attribute_before_type_cast. The generated methods don’t cache the type casted attributes though, so multiple calls to #created_at would call the slow Column#string_to_time method each time.

  24. Charlie Savage
    August 2, 2007

    Hi Rick,

    I double checked, and you are right. read_attribute does do the type cast (from string to whatever) but you generally want that. Thanks for the correction and I’ll update the blog.

  25. Bob Hutchison
    August 3, 2007

    Nice article… it prompted me to check my own code out, and work out how to get the plugin to work with old versions of Rails. Post is here: http://recursive.ca/hutch/2007/08/03/398/ if anyone is interested.

  26. Charlie Savage
    August 3, 2007

    Hi Bob,

    Thanks for the link. Good point on the old versions of rails. Suppose the plugin could just use alias. Or maybe define alias_chain if needed.

    If you have the time, could you add a patch to the ruby-prof project on RubyForge and I’ll include it in the next release?

  27. Hardy
    April 1, 2008

    Good stuff – keep on optimizing!
    Interesting topic.

