ruby-prof 0.4.0 with call graphs

Update: ruby-prof 0.5.0 is now available and has significantly more features than 0.4.0, including better threading support, rails support, call tree output, etc.

After porting ruby-prof to Windows a couple of weeks ago, Shugo
, ruby-prof’s author, and I started working together. We’re
happy to announce the release of ruby-prof 0.4.0, which is chock
full of new features:

  • Addition of call graph profiles similar to GProf
  • Improved speed – overhead is now as low as 15% for some code, although
    you should generally expect around 50%
  • Full support for multiple threads
  • New cross-referenced html reports
  • New ruby-prof script that makes it easy to profile your programs
    without modifying them
  • Vastly improved documentation
  • Detection of recursive calls and
    call cycles
  • Support for windows

Best of all, ruby-prof is now distributed as a gem so it’s as easy to
install as:

gem install ruby-prof

If you’re on Windows, the gem includes a pre-built windows binary. If
you’re on Linux or Unix, the binary will be automatically built on intallation.

Graph Profiles

My favorite new feature – and the raison d’être for this
release, is the addition of call graphs. Here is the source code for the following examples.

Most profiling tools for Ruby1
output flat profiles, which are useful for identifying which methods
take the longest. They are concise and easy to understand.
Let’s take a look at an example to
see what I mean.

For short programs, flat profiles work well. But for
long programs, it’s really helpful to understand more about the context
in which a method is called. For example, which methods called the one
we’re interested in? Which methods did it call?

A quick word of warning – call graphs can be overwhelming if you haven’t
seen one before. Let’s venture forth and take a look at one that
shows the same results as the flat profile above.
Quite a bit different, isn’t it? If you haven’t used a call graph before
I’ve put together a little tutorial here.
There are also a number of excellent examples on the Web – search Google
for “gprof call graph tutorial.”


The thing that got me started working on ruby-prof was that the
built in Ruby profiler is beyond slow (and doesn’t support call graphs).
So how does ruby-prof compare?

We spent a good deal of time profiling ruby-prof, using oprofile on
Linux and Rational PurifyPlus on
Windows. When we started, ruby-prof added over 100% overhead to a program.
Thus, if a program took 10 seconds to run the profile run would take at
least 20. That’s not too bad, but there was clearly room for improvement.

Internally ruby-prof maintains two main data structures. The first is
a stack that keeps track of the current call sequence. The second is a
hash table with one entry per method profiled per thread.
The slowest part was looking up the method information in the hash table.
This happens each time a method is entered or exited.

To give you some idea of the number of times this happens, take a look
at the the Fixnum# method
in the graph profile above. For a program that lasted only 8 seconds, Fixnum#
got called a whopping 250,000 times (and these results are from last week
before our optimization work, so the program really only takes around 3
or 4 seconds to run).

The key to speeding up ruby-prof was to reduce these lookups as much as
possible. This was done by a combination of caching to avoid lookups,
simplyifing the hash key to make lookups faster,
and making the method tracking logic as simple as possible.

The result are quite encouraging. On “normal” programs, like Rails
apps, ruby-prof’s overhead is now less than 50% (and in fact, when I profile
our unit tests it is more in the range of 10% to 20%). For
programs that stress profilers, like the prime test above or a
factorial method, the overhead is somewhere in the 50% to 80% range.

Next time – some results from profiling Rails.

1The latest release of Mauricio Fernandez excellent rcov extension looks like it now has this functionality

Leave a Reply

Your email address will not be published. Required fields are marked *