How to Profile Your Rails Application

Yesterday I mentioned that ruby-prof now supports profiling rails applications. Lets see how that works.

But a word of caution before beginning. Like all web applications, Rails apps are complex. They encompass many pieces – clients on remote machines, network connections, any number of application servers and database servers. Getting the whole ensemble to work can be tricky – let alone making it perform well.

To avoid drowning in complexity (and data), its crucial to use a rigorous approach to analyzing your application’s performance. For a good example of how to approach the problem, take a look at Zed Shaw’s analysis of Ruby/Odeum’s performance.

Ok – enough lecturing – onto the good stuff.

Installation and Setup

Assume that after much hard work, you’ve identified a request that takes too long (where of course you have to define what too long is). In addition, you know that the culprit is your Rails code – not the client, not the network and not the database.The trick is to find out what is taking so long – and there is where ruby-prof helps.

First install ruby-prof by opening a command line and typing:

gem install ruby-prof

Next, copy ruby-prof’s Rails plugin to your Rails plugin directory:

cp rails_plugin rails_app/vendor/plugins

Finally create a new Rails configuration with the following setting:

config.cache_classes = true

If you don’t cache classes, then the time Rails spends reloading and compiling your models and controllers will overwhelm anything else your code does. And obviously try to make your test setup as similar as possible to your production environment – use a copy of the same database, use the same web server, etc.

Profiling a Request

Once you’ve completed your setup, and started your Rails server, make a single request. Obviously for real testing you’ll need to make many requests, but let’s start with one to see how ruby-prof works.

To make this as realistic as possible, I’m going to show some old profiling data for MapBuzz, which is a social mapping site I started that let’s you see what’s going on in your neighborhood.

One of the things that is important to us is how long it takes to render the home page – so let’s make a request to the home page. Once the request is complete, the results are output in the Rails log file.

Completed in 0.71900 (1 reqs/sec)
Rendering: 0.37600 (52%) | DB: 0.18700 (26%)
200 OK [http://localhost/]

Thread ID: 175635930
Total: 0.719

 %self     total     self     wait    child    calls  name
 26.15      0.19     0.19     0.00     0.00       23  PGconn#exec
 26.01      0.19     0.19     0.00     0.00       27  PGresult#each
  4.45      0.03     0.03     0.00     0.00     1403  String#match
  4.45      0.03     0.03     0.00     0.00      171  IO#write
  2.23      0.02     0.02     0.00     0.00      872  String#to_s
  2.23      0.02     0.02     0.00     0.00      467  String#gsub
  2.23      0.03     0.02     0.00     0.02       76  ManyParser#_parse
  2.23      0.05     0.02     0.00     0.03       60  MonitorMixin
                                                      #synchronize
  2.23      0.02     0.02     0.00     0.00       60  Parsers#sequence
  2.23      0.02     0.02     0.00     0.00     1640  ParseContext#eof
  2.23      0.02     0.02     0.00     0.00       31  ActionView::
                                                      Partials
                                                      #add_object_to_
                                                      local_assigns!
  2.23      0.02     0.02     0.00     0.00       11  ActionView::
                                                      Helpers::
                                                      UrlHelper#link_to
  2.23      0.02     0.02     0.00     0.00      243  Object#add_error
  2.23      0.02     0.02     0.00     0.00      175  Kernel#clone

The first line shows the overall time as measured by the benchmark class used by Rails. Below that, you’ll see a flat profile generated by ruby-prof that shows the slowest methods. By default methods that take more than 1% of the time will be printed – for brevity’s sake I’ve culled it to 2%.

A quick scan of the results shows some interesting things. First, the most time is spent in PGconn#exec, which is used to query the database via a C extension. Most likely there isn’t much to optimize there on the Rails side, although there certainly may be room for optimization on the database side.

Next, PGconn#each is likely looping over the query results. Its time seems excessive, but we’ll worry about that later since we notice there are 1,403 calls to String#match. That seems overly excessive – what is causing that?

Digging Deeper

Flat profiles are great for providing an overview of where time is spent in your program, but they leave out a lot of useful information. In this case, we’d like to know what code is calling String#match. To do that we need information provided by graph profiles.

To create a graph profile, open up vendor/ruby-prof/lib/profiling.rb, and edit it like this:

module ActionController #:nodoc:
  module Profiling #:nodoc:
    def perform_action_with_profiling
      if RubyProf.running? or
        ...
      else
        ...
       # Example for Graph html printer
        printer = RubyProf::GraphHtmlPrinter.new(result)
        File.open('request.html', 'w') do |file|
          printer.print(file, {:min_percent => 1,
                               :print_file => true})
        end
      end
    end
  end
end

This will create a new file, request.html, that shows a graph profile of the request. Rerun the request, and then open up request.html.

In our case we are interested in the callers of String#match, so let’s take a look:

%Total
%Self
Total
Self
Wait
Child
Calls
Name
Line

0.00
0.00
0.00
0.00

1/1403
ActionController::AbstractRequest#accepts
65

0.00
0.00
0.00
0.00

195/1403
Array#each-1
103

0.00
0.00
0.00
0.00

1/1403
ActionController::Rescue#local_machine?
10

0.00
0.00
0.00
0.00

232/1403
Array#each-2
103

0.00
0.00
0.00
0.00

1/1403
ApplicationController#sanitize_status_codes
118

0.00
0.00
0.00
0.00

8/1403

ActiveRecord::ConnectionAdapters::

PostgreSQLAdapter#translate_field_type

55

0.00
0.00
0.00
0.00

232/1403
Array#each-3
103

0.00
0.00
0.00
0.00

11/1403
Geos::GeometryFactory#create_geom_from_hex
39

0.02
0.02
0.00
0.00

380/1403
Array#each-4
103

0.02
0.02
0.00
0.00

342/1403
Array#each-5
103

4.45%
4.45%
0.03
0.03
0.00
0.00
1403
String#match
39

0.00
0.00
0.00
0.00

1403/1429
Regexp#match
118

If you haven’t seen a graph profile before, you’re probably feeling a bit overwhelmed. To help out a bit, I’ve written some documentation here. However, the quick summary is that the method of interest is shown in bold. Methods above it are the method’s callers, while the methods below are the methods callees.

Notice the calls Array#each, Array#each-1, Array#each-2, etc? The dash and number indicate that a method has been called recursively. Thus Array#each-1 means that Array#each was called and then it in turn called Array#each either directly or indirectly.

Going back to our investigation, we see that Array#each-4 called String#match the most times. So the next step would be to click the Array#each-4 hyperlink and see where it is called from.

Visualizing a Profile

Once you create your own graph profile, you’ll quickly realize how valuable the html hyperlinks are for navigating through the mass of information. But wouldn’t it be nice to visualize the profile? Glad you asked!

ruby-prof 0.5.0 can now output call tree information thanks to a patch from Carl Shimer. To create a call out put format, once again you’ll have to modify profiling.rb:

module ActionController #:nodoc:
  module Profiling #:nodoc:
    def perform_action_with_profiling
      if RubyProf.running? or
        ...
      else
        ...
       # Example for Graph html printer
        printer = RubyProf::CallTreePrinter.new(result)
        File.open('callgrind.out', 'w') do |file|
          printer.print(file, {:min_percent => 1,
                               :print_file => true})
        end
      end
    end
  end
end

Once you’ve created the output file, open it with KCachegrind (sorry, this is Linux only, although there is a port to Windows which I have not tried using). KCachegrind take a bit of work to get used to, but once you do you can create visualizations that show you what your program is doing. In this case, here is a picture that shows String#match’s callers.

Call Graph

Wrapping Up

Remember – most of the time you won’t need ruby-prof. It provides much too much information to be used as a general way of finding performance bottlenecks – instead analyze your web log files, Rails log files and database log files.

But once you know the problem is in your Rails code, ruby-prof is invaluable in pointing out where it is.

  1. Gary Murakami
    July 12, 2007

    You wrote, “Obviously for real testing you’ll need to make many requests, …” This is my current wish. With many requests, it appears that mongrel spawns a new thread for each request, resulting in output from ruby-prof for each request/thread and loss of aggregate behavior over many requests. Any hints and remedies would be welcome, e.g., is there a way to make mongrel use the same thread?

    Reply
  2. Tim
    July 12, 2007

    Um.. where is the rails plugin?

    ruby-prof/rails_plugin doesn’t seem to exist

    Reply
  3. Charlie Savage –
    July 13, 2007

    Tim,

    It should be installed under:

    gems\ruby-prof-0.5.0\lib\ruby-prof\rails_plugin

    I think maybe I’ll change it to be directly under the top ruby-prof directory in a future release.

    Reply
  4. Charlie Savage –
    July 13, 2007

    Hi Gary,

    Yeah, ruby-prof doesn’t maintain aggregate information. And I’m not sure how mongrel does threading, haven’t ever checked.

    I’m not sure though how useful it would be though for ruby-prof to aggregate data. I think profiling one request is fine for getting a feel for timings. Usually what I do is tail the rails log file, and then do one request at a time and scan the results to make sure they are consistent.

    Now, to validate that your changes have improved performance, you definitely want to gather average times and standard deviations before and after any changes. And to do that, you should run a fair number of tests (and if I remember my statistics correctly, there are methodologies that tell you just how many tests that is).

    Hope this helps a bit – good luck!

    Reply
  5. Tim
    July 19, 2007

    I don’t think the plugin was included in 0.5.1

    Reply
  6. Charlie Savage –
    July 19, 2007

    Hi Tim,

    Wow – you’re right – how embarrassing. I just uploaded 0.5.2 which fixes the issue – hopefully it will be showing up soon on RubyForge.

    Reply
  7. July 30, 2007

    Following the above directions, I’m trying to profile some actions that are notoriously slow on our site, but I can’t access them because I keep getting LoadError’s. Any clue what might cause this?

    Reply
  8. Location of Ruby Gem dir on Mac OS X
    March 18, 2008

    After amany Google searches, I eventually found the location of the Gems dir on OS X 10.5 Leopard. It’s /Library/Ruby/Gems/1.8/gems

    So to copy the plugin, you do:

    cp -r /Library/Ruby/Gems/1.8/gems/ruby-prof-0.6.0/rails_plugin /path/to/railsapp/vendor/plugins/

    Reply
  9. March 19, 2008

    Note also the importance of profiling the *whole app* — so start the profiling within mongrel or somewhere that is global, so you can see if the asset manager is eating your lunch or what is. Good luck!

    Reply
  10. March 19, 2008

    Note that if you google for it, kcachegrind appears available for mac os and for windows, too. Thanks!

    Reply
  11. Neel
    June 26, 2008

    While I a m having many request at a time I can see only one Thread ID information on request.html, How to view information related to multiple requests.

    Reply

Leave a Reply

Your email address will not be published.

Top