<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>cfis : Category ruby-prof, everything about ruby-prof</title>
    <link>http://cfis.savagexi.com</link>
    <atom:link rel="self" type="application/rss+xml" href="http://cfis.savagexi.com/category/ruby-prof.rss"/>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>Charlie Savage's Blog</description>
    <item>
      <title>Profiling Your Rails Application - Take Two</title>
      <description>&lt;p&gt;Last year I wrote about how to &lt;a href="http://cfis.savagexi.com/2007/07/10/how-to-profile-your-rails-application"&gt;profile&lt;/a&gt; your Rails application, which is a lot harder than it seems. It's not so much the profiling itself - it's easy enough to create one-off results. Instead, it's coming up with a reproducible process that lets you measure performance changes over time.&lt;/p&gt;
&lt;p&gt;Some things that don't work over the long term:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;Insert profiling code into your application code&lt;/li&gt;
    &lt;li&gt;Use unit tests for profiling&lt;/li&gt;
    &lt;li&gt;Use functional tests for profiling&lt;/li&gt;
    &lt;li&gt;Use integration tests for profiling&lt;/li&gt;
    &lt;li&gt;Modify the standard Rails environments (test, development, production) for profiling&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the latest version of &lt;a href="http://cfis.savagexi.com/2008/11/12/ruby-prof-0-7-0"&gt;ruby-prof&lt;/a&gt; introduces a new approach to profiling  		your Ruby or Rails code that is heavily based on the excellent work &lt;a href="http://bitsweat.net/"&gt;Jeremy&lt;/a&gt;  		has done on the request profiler included in newer versions of Rails.&lt;/p&gt;
&lt;p&gt;The basic idea is to extend Ruby's Test::Unit library so that individual test cases are profiled by including a new RubyProf::Test module. When you include this module, ruby-prof will run each test once as a warm-up and then ten more times to gather profiling data (using another new feature of the 0.7.0 release, the ability to pause and resume a profiling run). Profile data is then output for each test.&lt;/p&gt;
&lt;p&gt;Let's look at an example:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;class&lt;/span&gt;&lt;/span&gt; ExampleTest &lt;span style="color: rgb(153, 0, 0);"&gt;&amp;lt;&lt;/span&gt; Test&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;Unit&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;TestCase&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;include&lt;/span&gt;&lt;/span&gt; RubyProf&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;Test&lt;br /&gt;  &lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;def&lt;/span&gt;&lt;/span&gt; test_stuff&lt;br /&gt;    puts &lt;span style="color: rgb(255, 0, 0);"&gt;&amp;quot;Test method&amp;quot;&lt;/span&gt;&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;The line &lt;tt&gt;&lt;span style="font-weight: bold;"&gt; 		&lt;span style="color: rgb(0, 0, 255);"&gt;include&lt;/span&gt;&lt;/span&gt; RubyProf&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;Test&lt;/tt&gt;  		turns the test case into a profiling test case. The same approach  		could be used for hooking into other testing frameworks - all patches are  		of course welcome!&lt;/p&gt;
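&lt;p&gt;The warm-up-then-measure pattern described above can be sketched in plain Ruby. This is only an illustration, not ruby-prof's implementation: it uses Benchmark from the standard library instead of a profiler, and the method name measure_with_warmup is hypothetical.&lt;/p&gt;

```ruby
require 'benchmark'

# Runs the block once as an untimed warm-up, then `runs` more times,
# returning the wall-clock time (in seconds) of each measured run.
# Hypothetical helper; ruby-prof gathers full profile data instead.
def measure_with_warmup(runs = 10)
  yield # warm-up run, result discarded
  Array.new(runs) { Benchmark.realtime { yield } }
end

timings = measure_with_warmup { (1..10_000).map { |i| i * i } }
mean = timings.sum / timings.size
puts "measured runs: #{timings.size}, mean: #{mean.round(6)} s"
```

&lt;p&gt;Discarding the first run keeps one-time costs, such as loading code or filling caches, out of the measurements.&lt;/p&gt;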
&lt;h3&gt;Using a Profile Environment for Rails&lt;/h3&gt;
&lt;p&gt;Now let's talk about profiling Rails. There are two main issues that make it harder than it seems.&lt;/p&gt;
&lt;p&gt;First, to get any useful data you need to profile a Rails app using the production environment settings in conjunction with a test database.  		Using the development environment doesn't work because the time it takes  		Rails to reload classes on each request drowns out any useful information.&lt;/p&gt;
&lt;p&gt;Second, how should profile tests be written and where should they go?&lt;/p&gt;
&lt;p&gt;The solution I've adopted is to use functional-like tests that use a PROFILE environment, and place them in a directory called test/profile.&lt;/p&gt;
&lt;p&gt;Let's look at another example:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 128);"&gt;require&lt;/span&gt;&lt;/span&gt; File&lt;span style="color: rgb(153, 0, 0);"&gt;.&lt;/span&gt;dirname&lt;span style="color: rgb(153, 0, 0);"&gt;(&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;__FILE__&lt;/span&gt;&lt;/span&gt;&lt;span style="color: rgb(153, 0, 0);"&gt;)&lt;/span&gt; &lt;span style="color: rgb(153, 0, 0);"&gt;+&lt;/span&gt; &lt;span style="color: rgb(255, 0, 0);"&gt;'/../profile_test_helper'&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;class&lt;/span&gt;&lt;/span&gt; MyControllerTest &lt;span style="color: rgb(153, 0, 0);"&gt;&amp;lt;&lt;/span&gt; Test&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;Unit&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;TestCase&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;include&lt;/span&gt;&lt;/span&gt; RubyProf&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;Test&lt;br /&gt;&lt;br /&gt;  fixtures&lt;span style="color: rgb(153, 0, 0);"&gt; :my_fixture&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;def&lt;/span&gt;&lt;/span&gt; setup&lt;br /&gt;    &lt;span style="color: rgb(0, 153, 0);"&gt;@controller&lt;/span&gt; &lt;span style="color: rgb(153, 0, 0);"&gt;=&lt;/span&gt; MyController&lt;span style="color: rgb(153, 0, 0);"&gt;.&lt;/span&gt;new&lt;br /&gt;    &lt;span style="color: rgb(0, 153, 0);"&gt;@request&lt;/span&gt;    &lt;span style="color: rgb(153, 0, 0);"&gt;=&lt;/span&gt; ActionController&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;TestRequest&lt;span style="color: rgb(153, 0, 0);"&gt;.&lt;/span&gt;new&lt;br /&gt;    &lt;span style="color: rgb(0, 153, 0);"&gt;@response&lt;/span&gt;   &lt;span style="color: rgb(153, 0, 
0);"&gt;=&lt;/span&gt; ActionController&lt;span style="color: rgb(153, 0, 0);"&gt;::&lt;/span&gt;TestResponse&lt;span style="color: rgb(153, 0, 0);"&gt;.&lt;/span&gt;new&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;def&lt;/span&gt;&lt;/span&gt; test_get&lt;br /&gt;    get&lt;span style="color: rgb(153, 0, 0);"&gt;(&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;:index&lt;/span&gt;&lt;/span&gt;&lt;span style="color: rgb(153, 0, 0);"&gt;)&lt;/span&gt;&lt;br /&gt;  &lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="color: rgb(0, 0, 255);"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;The only differences between a functional test and a profile test are the inclusion of the RubyProf::Test module and the loading of profile_test_helper.rb. profile_test_helper.rb is unfortunately needed because the standard test_helper.rb file Rails uses loads the TEST environment. Hopefully future versions of Rails will fix this by allowing greater flexibility in specifying a test environment.&lt;/p&gt;
&lt;p&gt;So to get started with profiling your Rails application:&lt;/p&gt;
&lt;ol&gt;
    &lt;li&gt;Copy profile_test_helper.rb from the ruby-prof distribution to your  		  rails test directory&lt;/li&gt;
    &lt;li&gt;Modify profile_test_helper.rb as needed to set ruby-prof's output  		  directory&lt;/li&gt;
    &lt;li&gt;Create a profile.rb file in the environments directory&lt;/li&gt;
    &lt;li&gt;Update your database.yml file to include a profile database (just map it to your test database)&lt;/li&gt;
    &lt;li&gt;Create a new directory test/profile&lt;/li&gt;
    &lt;li&gt;Start writing profiling tests that look similar to the above example&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And now you'll have reproducible profiling test cases.&lt;/p&gt;
&lt;p&gt;So what's missing? A way of keeping track of how your application's performance changes over time. A quick hack is to use source control to keep profile test results around. A more sophisticated solution would be to use ruby-prof's API to dump profile results into a database and then put a nice web front end onto it. Any takers?&lt;/p&gt;

</description>
      <pubDate>Thu, 13 Nov 2008 11:11:00 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:e7a00427-bd00-4c50-8066-fdf41ef25854</guid>
      <comments>http://cfis.savagexi.com/2008/11/13/profiling-your-rails-application-take-two#comments</comments>
      <category>ruby-prof</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=504</trackback:ping>
      <link>http://cfis.savagexi.com/2008/11/13/profiling-your-rails-application-take-two</link>
    </item>
    <item>
      <title>ruby-prof 0.7.0</title>
      <description>&lt;p&gt;I'm happy to announce the release of 		&lt;a href="http://rubyforge.org/projects/ruby-prof"&gt;ruby-prof&lt;/a&gt; 0.7.0, the  		superfast, open-source, Ruby profiler that helps you find bottlenecks in  		your Ruby code. This release was a joint effort, with major contributions  		from &lt;a href="http://bitsweat.net/"&gt;Jeremy Kemper&lt;/a&gt; (aka bitsweat) of  		Rails fame and &lt;a href="http://www.linkedin.com/pub/0/567/a2"&gt;Hin Boen&lt;/a&gt;  		from CodeGear. There are two major new features in this release, as well  		as a number of smaller enhancements and bug fixes. For a full list of changes,  		take a look at the 		&lt;a href="http://rubyforge.org/forum/forum.php?forum_id=28366"&gt;release notes&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The first major new feature is improved Rails profiling, which I'll talk  		about in a separate &lt;a href="http://cfis.savagexi.com/2008/11/13/profiling-your-rails-application-take-two"&gt;post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The second major feature is significant internal changes that make it easier to integrate ruby-prof with IDEs. ruby-prof is already being used by Aptana's &lt;a href="http://www.aptana.com/rails/"&gt;RadRails&lt;/a&gt; and has been integrated into the next version of CodeGear's &lt;a href="http://www.codegear.com/products/3rdrail"&gt;3rd Rail&lt;/a&gt;. As part of this work, Hin has built a user interface for ruby-prof that lets a user inspect individual methods to see how much time they took as well as how they were called.&lt;/p&gt;
&lt;p&gt;There was one big problem, though: previous versions of ruby-prof only kept track of aggregate data. This made it impossible for Hin to create the user interface he wanted. For example, look at this call sequence:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;     A&lt;br /&gt;    / \&lt;br /&gt;   B   K&lt;br /&gt;  / \   \&lt;br /&gt; C   D   B&lt;br /&gt;        / \&lt;br /&gt;       C   D&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;With earlier versions of ruby-prof, there was no way to tell what percent  		of the time spent in method C was a result of the A -&amp;gt; B -&amp;gt; C call sequence  		versus the A -&amp;gt; K -&amp;gt; B -&amp;gt; C call sequence. &lt;br /&gt;
&lt;br /&gt;
Or take another example:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;  A    K&lt;br /&gt;  |    |&lt;br /&gt;  B    B&lt;br /&gt;  |    |&lt;br /&gt;  C    D	&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;In this case, if you tried to reconstruct the call sequence from ruby-prof  		you would end up with this incorrect result:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;     A    K&lt;br /&gt;     |  /&lt;br /&gt;     B   &lt;br /&gt;    / \ &lt;br /&gt;  C    D	&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;So working with Hin, I rearchitected ruby-prof to keep track of full call sequences. Most likely you won't notice any difference - the changes will only affect you if you use ruby-prof's API to present results in a custom way. In that case, you'll have to update your code, which should only take a few minutes (to see the API in use, take a look at the various printer classes that ship with ruby-prof).&lt;/p&gt;
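&lt;p&gt;To make the difference concrete, here is a small plain-Ruby sketch of the two bookkeeping styles. It does not use ruby-prof's data structures; the sample timings, the SAMPLES constant and both method names are made up for illustration.&lt;/p&gt;

```ruby
# Hypothetical samples: each entry is a full call path plus the time
# (in seconds) spent in the last method of that path.
SAMPLES = [
  { path: %w[A B C],   time: 0.30 },
  { path: %w[A K B C], time: 0.10 },
  { path: %w[A B D],   time: 0.20 },
  { path: %w[A K B D], time: 0.05 }
].freeze

# Aggregate view: time is keyed by method name only, so the chain of
# callers that led to the method is lost.
def aggregate_by_method(samples)
  totals = Hash.new(0.0)
  samples.each { |s| totals[s[:path].last] += s[:time] }
  totals
end

# Call-sequence view: time is keyed by the full path, so every chain of
# callers stays distinct.
def aggregate_by_path(samples)
  totals = Hash.new(0.0)
  samples.each { |s| totals[s[:path].join(' -> ')] += s[:time] }
  totals
end

puts aggregate_by_method(SAMPLES).inspect
puts aggregate_by_path(SAMPLES).inspect
```

&lt;p&gt;In the first hash there is a single entry for C, so the split between the two call sequences cannot be recovered. In the second hash each sequence keeps its own entry.&lt;/p&gt;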
&lt;p&gt;Enjoy, and all feedback is welcome.&lt;/p&gt;

</description>
      <pubDate>Wed, 12 Nov 2008 09:41:00 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:8201cb27-ae0d-4362-9e4c-f229565d12fb</guid>
      <comments>http://cfis.savagexi.com/2008/11/12/ruby-prof-0-7-0#comments</comments>
      <category>ruby-prof</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=503</trackback:ping>
      <link>http://cfis.savagexi.com/2008/11/12/ruby-prof-0-7-0</link>
    </item>
    <item>
      <title>Must Read Rails Performance Article</title>
      <description>        &lt;p&gt;A couple of days ago, Alex Dymo from &lt;a href="http://www.pluron.com/corporate"&gt;Pluron&lt;/a&gt; sent me an email describing some of the great work he has done optimizing the performance of their online project management software &lt;a href="http://www.acunote.com/promo"&gt;Acunote&lt;/a&gt;. His great insight was that their performance problems were caused by allocating too much memory, thus forcing Ruby's garbage collector to run frequently, ruining performance. &lt;/p&gt;
        &lt;p&gt;Using a patched version of ruby and &lt;a href="http://ruby-prof.rubyforge.org/"&gt;ruby-prof&lt;/a&gt;, Alex was able to more than double performance  (with hints of more to come) and reduced memory consumption by 75%, or 750MB (yes - that is Megabytes). Alex does a wonderful job of documenting his approach with a series of blog posts &lt;a href="http://blog.pluron.com/2008/01/guerrillas-way.html"&gt;here&lt;/a&gt; and &lt;a href="http://blog.pluron.com/2008/01/ruby-on-rails-i.html"&gt;here&lt;/a&gt;.&lt;/p&gt;
        &lt;p&gt;The main culprit was Rails' handling of attributes, which is dreadfully designed (an obvious case of the simplest solution to a problem being the wrong solution - something I've been meaning to blog about for almost a year now). But he also implicated Ruby's built-in benchmarking module.&lt;/p&gt;
        &lt;p&gt;Even better, Alex provided patches to ruby core (already accepted), Rails (already accepted) and ruby-prof. We'll also gladly accept his patches, and since it's about time for a ruby-prof refresh, we'll spin out a new release as soon as we can. More to follow.&lt;/p&gt;


</description>
      <pubDate>Sat, 02 Feb 2008 15:58:00 -0700</pubDate>
      <guid isPermaLink="false">urn:uuid:49271e68-f14c-48bd-b3a7-326d14372fee</guid>
      <comments>http://cfis.savagexi.com/2008/02/02/must-read-rails-performance-article#comments</comments>
      <category>ruby-prof</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=488</trackback:ping>
      <link>http://cfis.savagexi.com/2008/02/02/must-read-rails-performance-article</link>
    </item>
    <item>
      <title>Making Rails Go Vroom</title>
      <description>      &lt;p&gt;Last week I &lt;a href="http://cfis.savagexi.com/articles/2007/07/10/how-to-profile-your-rails-application"&gt;showed&lt;/a&gt; how to profile a Rails application using &lt;a href="http://rubyforge.org/projects/ruby-prof/"&gt;ruby-prof&lt;/a&gt;. The post was driven by our desire to improve &lt;a href="http://www.mapbuzz.com"&gt;MapBuzz&lt;/a&gt;'s map rendering speed. Our nightly log analysis showed that rendering speed had significantly degraded over the last few weeks. Since the whole point of MapBuzz is to share maps with others, we needed to find out what was taking so long.&lt;/p&gt;
      &lt;p&gt;A day of profiling and we had our answer. With a few simple changes we were able to reduce rendering time from 7.8 seconds to 0.92 seconds (note these times were gathered while profiling the application, so the real times are 2 to 3 times lower). &lt;/p&gt;
      &lt;p&gt;The lessons we learned should be generally applicable to all Rails applications, so I'd like to share them:&lt;/p&gt;
      &lt;ul&gt;
        &lt;li&gt;Don't use &lt;a href="#attributes"&gt;ActiveRecord#attributes&lt;/a&gt;&lt;/li&gt;
        &lt;li&gt;Get your &lt;a href="#includes"&gt;:includes&lt;/a&gt; right&lt;/li&gt;
        &lt;li&gt;Don't check template timestamps ( &lt;a href="#cache_template_loading"&gt;cache_template_loading&lt;/a&gt; = true) &lt;/li&gt;
        &lt;li&gt;Don't use &lt;a href="#url_for"&gt;url_for&lt;/a&gt; &lt;/li&gt;
        &lt;li&gt;Don't let Rails parse &lt;a href="#timestamps"&gt;timestamps&lt;/a&gt; &lt;/li&gt;
        &lt;li&gt;Don't &lt;a href="#symbolize"&gt;symbolize&lt;/a&gt; keys (local_assigns_support_string_keys = false) &lt;/li&gt;
      &lt;/ul&gt;
      &lt;p&gt;If you are interested in the nitty-gritty details of each change, keep reading! &lt;/p&gt;
      &lt;h3&gt; A Bit of Background &lt;/h3&gt;
      &lt;p&gt;&lt;a href="http://www.mapbuzz.com"&gt;MapBuzz&lt;/a&gt; renders maps in a two step process. First, the html page is created and shown to the user. Second, the browser makes an Ajax request to get the features that should be shown on the map. The reason for having a two step process is that it allows users to zoom and pan around the map, without having to reload the whole page.&lt;/p&gt;
      &lt;p&gt;Log analysis revealed that the Ajax &lt;a href="http://localhost:3000/map_feature/396"&gt;request&lt;/a&gt; was taking a long time - 1 to 2 seconds, with a big standard deviation. Our goal is to reduce the average time to less than half a second and to significantly reduce the standard deviation.&lt;/p&gt;
      &lt;p&gt;If you look at the request &lt;a href="http://localhost:3000/map_feature/396"&gt;URI&lt;/a&gt;, you'll see it's an Atom feed of features and not an HTML page. That has performance implications.&lt;/p&gt;
      &lt;p&gt; First, Atom feeds tend to have lots of links. In our case, each entry has 10 to 15 links. We limit the number of entries to 50, so that means there are roughly 600 links per page. &lt;/p&gt;
      &lt;p&gt;Second, we generate Atom feeds in a  modular fashion - each entry is created from 10 partials. There is a partial for the user, one for ratings, one for content, one for icons, one for geometries, etc. Why did we do it this way? Simple - to keep things &lt;a href="http://en.wikipedia.org/wiki/Don%27t_repeat_yourself"&gt;DRY&lt;/a&gt;. Having a set of partials  creates an extremely modular system - we can mix and match them as needed depending on the context of the request.&lt;/p&gt;
      &lt;h3&gt;Setup&lt;/h3&gt;
      &lt;p&gt;Testing was done with two machines. The first was my laptop, a Dell Latitude D610 running Windows, Ruby 1.8.4, and Rails 1.2.3. The second was our staging server, a fairly high-end Dell desktop running Fedora Core 6 with a copy of our production database (which is about 10GB). Profiling was done with ruby-prof 0.5.1 (of course), using the Rails plugin I described &lt;a href="http://cfis.savagexi.com/articles/2007/07/10/how-to-profile-your-rails-application"&gt;previously&lt;/a&gt;. Note that the logging level was set to :debug, which will show up in the test results. &lt;/p&gt;
      &lt;h3&gt;Baseline&lt;/h3&gt;
      &lt;p&gt;To start we measured the baseline performance of our test setup. It was ugly indeed:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;Total: 7.828&lt;br /&gt;
%self     total     self     wait    child    calls  name
24.11      1.95     1.89     0.00     0.06     3605  Kernel#clone
13.15      1.04     1.03     0.02     0.00      569  IO#write
 7.78      4.13     0.61     0.00     3.52      441  &amp;lt;Module::Benchmark&amp;gt;#
                                                     realtime-1
 7.41      0.58     0.58     0.00     0.00      125  PGconn#exec&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;One bit of solace is that profiling an application will slow it down 2 to 3 times, but that still leaves us with roughly 4 seconds per request. So let's start digging through the results.&lt;/p&gt;
      &lt;h3&gt;&lt;a name="attributes"&gt;Don't Use ActiveRecord#attributes&lt;/a&gt;&lt;/h3&gt;
      &lt;p&gt;The first thing that jumps out is the huge amount of time Kernel#clone takes. A look at the call graph shows that the caller is clone_attribute_value.&lt;/p&gt;
      &lt;p&gt;And a bit more digging reveals this custom code in our application:&lt;/p&gt;
       &lt;pre&gt;&lt;tt&gt;def rating
  # Convert cached rating from BigDecimal to float,
  # otherwise strange rounding errors happen
  attributes['rating'].to_f
end&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;The problem is that when you call ActiveRecord#attributes it returns a &lt;em&gt;copy&lt;/em&gt; of the attributes, thus generating all the clone calls. Uuggh. I think this is a result of ActiveRecord's flawed attribute implementation, but that is for another blog.&lt;/p&gt;
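      &lt;p&gt;The cost of copying can be illustrated without Rails. The class below is a hypothetical stand-in, not ActiveRecord code: its attributes method clones every value on each call, as described above, while its read_attribute method returns one stored value without copying (and, unlike the real method, without type casting).&lt;/p&gt;

```ruby
# Hypothetical stand-in for a model; NOT the real ActiveRecord code.
class FakeRecord
  attr_reader :clone_count

  def initialize(attrs)
    @attributes = attrs
    @clone_count = 0
  end

  # Mimics the copying behaviour: every call clones every attribute value.
  def attributes
    copy = {}
    @attributes.each do |name, value|
      @clone_count += 1
      copy[name] = value.clone
    end
    copy
  end

  # Returns a single stored value without copying anything.
  def read_attribute(name)
    @attributes[name]
  end
end

data = { 'rating' => '4.5', 'title' => 'Map', 'owner' => 'cfis' }

via_attributes = FakeRecord.new(data)
100.times { via_attributes.attributes['rating'].to_f }
puts "clones via attributes:     #{via_attributes.clone_count}"

via_reader = FakeRecord.new(data)
100.times { via_reader.read_attribute('rating').to_f }
puts "clones via read_attribute: #{via_reader.clone_count}"
```

      &lt;p&gt;The number of clones grows with the number of columns times the number of calls, which is why a one-line accessor can dominate a profile.&lt;/p&gt;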
      &lt;p&gt;What we want is access to the untyped, original value of the attribute. In theory you are supposed to use the auto generated method, rating_before_type_cast, for this. This has the advantage of skipping the clone call, but it relies on method_missing, which has some overhead (we did not measure how much). Or, you could use read_attribute directly, which also skips the clone call. This is usually your best choice. &lt;/p&gt;
      &lt;p&gt;For performance critical code, you may wish to read the untyped attribute directly (see the discussion about time below for an example). That can be done using the method read_attribute_before_type_cast, except that it is private. However, with Ruby, that's easily solved:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;module ActiveRecord
  class Base
    public :read_attribute_before_type_cast
  end
end&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;&lt;strong&gt;Lesson &lt;/strong&gt; - Don't use attributes&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Gain&lt;/strong&gt; - 24%&lt;/p&gt;
      &lt;h3&gt;&lt;a name="includes"&gt;Use include&lt;/a&gt;&lt;/h3&gt;
      &lt;p&gt;Let's look at our updated results: &lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;Total: 3.39

 %self     total     self     wait    child    calls  name
 16.73      0.57     0.57     0.00     0.00      569  IO#write
 12.83      0.44     0.44     0.00     0.00      125  PGconn#exec
  8.76      1.77     0.30     0.00     1.48      441  &amp;lt;Module::Benchmark&amp;gt;#
	                                                    realtime-1
  5.99      0.22     0.20     0.00     0.01       40  PGconn#query&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;What stands out are the 125 calls to PGconn#exec - each one is a separate query. Looks like we forgot to specify a table or two to be eager loaded via ActiveRecord#find's &lt;a href="http://api.rubyonrails.com/classes/ActiveRecord/Associations/ClassMethods.html"&gt;:include&lt;/a&gt; option.&lt;/p&gt;
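      &lt;p&gt;The effect of a missing eager load can be sketched with a fake database that only counts queries. Everything here is hypothetical (FakeDb, the author data, the method names); it shows the one-query-per-row pattern versus a single batched query, not how ActiveRecord implements :include.&lt;/p&gt;

```ruby
# Hypothetical in-memory stand-in for a database; it only counts queries.
class FakeDb
  attr_reader :query_count

  def initialize(authors)
    @authors = authors
    @query_count = 0
  end

  # One query per author, as happens when an association is loaded lazily.
  def find_author(id)
    @query_count += 1
    @authors[id]
  end

  # One query for all requested authors, as eager loading would issue.
  def find_authors(ids)
    @query_count += 1
    @authors.select { |id, _name| ids.include?(id) }
  end
end

AUTHORS = { 1 => 'alice', 2 => 'bob', 3 => 'carol' }.freeze
FEATURE_AUTHOR_IDS = [1, 2, 3, 1, 2, 3].freeze

# Looks up the author separately for each feature.
def lazy_query_count
  db = FakeDb.new(AUTHORS)
  FEATURE_AUTHOR_IDS.each { |id| db.find_author(id) }
  db.query_count
end

# Loads all authors up front, then reads them from memory.
def eager_query_count
  db = FakeDb.new(AUTHORS)
  authors = db.find_authors(FEATURE_AUTHOR_IDS.uniq)
  FEATURE_AUTHOR_IDS.each { |id| authors.fetch(id) }
  db.query_count
end

puts "lazy:  #{lazy_query_count} queries"
puts "eager: #{eager_query_count} queries"
```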
      &lt;p&gt;Note there is one downside to using :include - you lose the ability to use the :select option. For us that is important - and can be worked around by using the rails &lt;a href="http://cfis.savagexi.com/articles/2007/02/13/select-meets-include-or-a-pitch-for-rparsec"&gt;select_parser&lt;/a&gt; I've blogged about before.&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Lesson &lt;/strong&gt; - Get your :includes right&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Gain&lt;/strong&gt; - 12% &lt;/p&gt;
      &lt;h3&gt;&lt;a name="cache_template_loading"&gt;cache_template_loading=true&lt;/a&gt;&lt;/h3&gt;
      &lt;p&gt;After fixing attributes and includes, here are our current timings:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;Total: 2.594

 %self     total     self     wait    child    calls  name
 12.64      1.14     0.33     0.00     0.81      361  &amp;lt;Module::Benchmark&amp;gt;#
                                                      realtime-1
 12.61      0.33     0.33     0.00     0.00      409  IO#write
  4.90      0.13     0.13     0.00     0.00      403  &amp;lt;Class::File&amp;gt;#mtime
  4.86      0.16     0.13     0.00     0.03    30250  Hash#[]
  3.62      0.09     0.09     0.00     0.00        5  PGconn#exec&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;A couple of things jump out.  First, using Benchmark#realtime is fairly time consuming. 
        Second, logging also takes its toll as seen in the times for IO#write (remember we have 
        logging set to :debug).&lt;/p&gt;
      &lt;p&gt;However, the #mtime call looks suspicious.  Some more digging through Rails shows all 403 calls come from 
        ActionView::Base#compile_template?.  Let's take a look:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;def compile_template?(template, file_name, local_assigns)
  method_key    = file_name || template
  render_symbol = @@method_names[method_key]

  if @@compile_time[render_symbol] &amp;amp;&amp;amp;
	   supports_local_assigns?(render_symbol, local_assigns)
    if file_name &amp;amp;&amp;amp; !@@cache_template_loading 
      @@compile_time[render_symbol] &amp;lt; File.mtime(file_name) ||
        (File.symlink?(file_name) &amp;amp;&amp;amp; 
        (@@compile_time[render_symbol] &amp;lt; File.lstat(file_name).mtime))
    end
  else
    true
  end
end&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;Remember  I mentioned that it takes roughly 500 partial calls to generate the output? This is where it bites us -
        by default Rails checks the timestamp of cached templates before running them.  In a production environment that
        is totally unnecessary.  The solution is simple - just update your production.rb file like this:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;# Don't check view timestamps!
config.action_view.cache_template_loading = true&lt;/tt&gt;&lt;/pre&gt;
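      &lt;p&gt;To see why the setting matters, here is a much simplified sketch of a template cache that checks a file's timestamp on every render unless told not to. The TemplateCache class and the stat_calls_for helper are hypothetical and are not Rails internals; they only count how often File.mtime is called.&lt;/p&gt;

```ruby
require 'tempfile'

# Hypothetical, simplified template cache; not the Rails implementation.
class TemplateCache
  attr_reader :stat_calls

  def initialize(cache_template_loading)
    @cache_template_loading = cache_template_loading
    @compile_time = {}
    @stat_calls = 0
  end

  # Returns true when the template must be compiled again.
  def compile?(file_name)
    compiled_at = @compile_time[file_name]
    return true if compiled_at.nil?
    return false if @cache_template_loading

    @stat_calls += 1 # one file system check per render
    File.mtime(file_name) > compiled_at
  end

  def render(file_name)
    @compile_time[file_name] = Time.now if compile?(file_name)
  end
end

# Renders one temporary file `renders` times and reports the mtime checks.
def stat_calls_for(cache_template_loading, renders)
  Tempfile.create('partial') do |file|
    cache = TemplateCache.new(cache_template_loading)
    renders.times { cache.render(file.path) }
    cache.stat_calls
  end
end

puts "checks with cache_template_loading = false: #{stat_calls_for(false, 500)}"
puts "checks with cache_template_loading = true:  #{stat_calls_for(true, 500)}"
```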
      &lt;p&gt;And while we are at it, we'll change the logging level to :info to reduce its impact on the results.&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Lesson &lt;/strong&gt; - Set cache_template_loading to true&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Gain&lt;/strong&gt; - 5% &lt;/p&gt;
      &lt;h3&gt;&lt;a name="url_for"&gt;Don't use url_for&lt;/a&gt;&lt;/h3&gt;
      &lt;p&gt;We've managed to reduce the request time from 7.8 to 2.5 seconds. Not bad. But we still have more work to do.&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;
Total: 2.531

 %self     total     self     wait    child    calls  name
 13.59      1.91     0.34     0.00     1.56       45  &amp;lt;Module::Benchmark&amp;gt;#
                                                      realtime
 12.25      0.31     0.31     0.00     0.00      409  IO#write
  4.98      0.81     0.13     0.00     0.69      361  &amp;lt;Module::Benchmark&amp;gt;#
                                                      realtime-1
  4.94      0.13     0.13     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasManyThroughAssociation#
                                                      construct_sql
  3.71      0.09     0.09     0.00     0.00        5  PGconn#exec
  3.04      0.08     0.08     0.00     0.00    29847  Hash#[]
  3.00      0.11     0.08     0.00     0.03     2097  Hash#each
  1.86      0.28     0.05     0.00     0.23      322  ActionController::Routing::
                                                      RouteSet#generate
&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;Of particular interest is ActionController::Routing::RouteSet#generate. Although its self time is only 0.05 seconds, its total time, including children, is 0.28 seconds. Thus creating 322 URLs takes 11% of the request time.&lt;/p&gt;
      &lt;p&gt;People have &lt;a href="http://metaatem.net/2006/06/23/railsconf-panel-rails-application-optimization-techniques-tools-with-stefan-kaes"&gt;previously&lt;/a&gt; &lt;a href="http://scottstuff.net/blog/articles/2005/10/17/benchmarking-typo"&gt;mentioned&lt;/a&gt; how slow url_for is - well, it's true. Instead of using url_for, just create your URLs manually using string mashing. For example, if you want an absolute URI, then do this: &lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;&amp;quot;#{request.protocol}#{request.host_with_port}/controller/#{map.id}&amp;quot;&lt;/tt&gt;&lt;/pre&gt;
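      &lt;p&gt;Outside of a controller the same idea can be tried with stand-in objects. In this sketch FakeRequest, FakeMap, the map_url helper and the "maps" path segment are all assumptions made for illustration; only the interpolation pattern comes from the line above.&lt;/p&gt;

```ruby
# Hypothetical stand-ins for the Rails request object and a model.
FakeRequest = Struct.new(:protocol, :host_with_port)
FakeMap = Struct.new(:id)

# Builds an absolute URL by string interpolation instead of url_for.
# The "maps" segment is an assumed route, not one from the application.
def map_url(request, map)
  "#{request.protocol}#{request.host_with_port}/maps/#{map.id}"
end

request = FakeRequest.new('http://', 'www.example.com:3000')
puts map_url(request, FakeMap.new(396))
```

      &lt;p&gt;The trade-off is that hand-built URLs have to be kept in sync with the routes by hand, so it may be worth keeping them in one helper like this.&lt;/p&gt;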
      &lt;p&gt;&lt;strong&gt;Lesson &lt;/strong&gt; - Don't use url_for or link_to&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Gain&lt;/strong&gt; - 11% &lt;/p&gt;
      &lt;h3&gt;&lt;a name="timestamps"&gt;Don't let Rails parse timestamps &lt;/a&gt;&lt;/h3&gt;
      &lt;p&gt;After removing url_for, we've smashed the 2 second barrier:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;Total: 1.391

 %self     total     self     wait    child    calls  name
  9.06      0.13     0.13     0.00     0.00      726  String#sub!
  8.99      0.13     0.13     0.00     0.00       40  ActionView::Base::CompiledTemplates#
	                                                    _run_ratom_47app47views47geometry47_geometry46ratom
  6.69      0.09     0.09     0.00     0.00        5  PGconn#exec
  5.54      0.08     0.08     0.00     0.00      741  Item#id
  5.54      0.30     0.08     0.00     0.22      121  &amp;lt;Class::Date&amp;gt;#_parse&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;Next up is &amp;lt;Class::Date&amp;gt;#_parse, whose total time, including children, is a whopping 21% of the time.  Ouch.
        If you take a look at the code (it's in the standard Ruby library), you can see it's quite complicated 
        since it tries to parse a variety of date formats. In addition, it makes use of rational numbers, which are none too fast.&lt;/p&gt;
      &lt;p&gt;Taking a look at a graph profile, we see that &amp;lt;Class::Date&amp;gt;#_parse is called from &amp;lt;Class::ActiveRecord::ConnectionAdapters::Column&amp;gt;#string_to_time. 
        Thus Rails uses it to convert timestamps received from the database into Ruby objects.&lt;/p&gt;
      &lt;p&gt;As I detailed in a recent &lt;a href="http://cfis.savagexi.com/articles/2007/07/13/tick-tick-tick"&gt;post&lt;/a&gt;, the problem is that Ruby's DateTime implementation is extremely slow compared to its Time implementation. The solution is to avoid DateTime entirely and use custom time parsing code. Fortunately, our database can output time in ISO 8601 format, which Time can quickly parse.&lt;/p&gt;
      &lt;p&gt;So, any place we access times, we simply roll our own attribute readers and writers like this (note that we don't use &lt;a href="#attributes"&gt;read_attribute&lt;/a&gt;!).&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;# Parse the raw ISO 8601 string once and cache the result
def created_on
  @created_on ||= Time.iso8601(read_attribute_before_type_cast('created_on'))
end
  
# Clear the cached value whenever the attribute is written
def created_on=(value)
  write_attribute(:created_on, value)
  @created_on = nil
end
  
def updated_on
  @updated_on ||= Time.iso8601(read_attribute_before_type_cast('updated_on'))
end
&lt;/tt&gt;&lt;/pre&gt;
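      &lt;p&gt;The two parsing routes can be compared on their own with a short script. This is a rough sketch: the sample timestamp and the two helper names are made up, and the relative timings will depend on the Ruby version, so no particular result is claimed here.&lt;/p&gt;

```ruby
require 'time'
require 'date'
require 'benchmark'

# Hypothetical ISO 8601 value, as the database might return it.
STAMP = '2007-07-13T10:20:30Z'

# Strict ISO 8601 parsing straight into a Time.
def parse_with_time(stamp)
  Time.iso8601(stamp)
end

# General-purpose parsing via DateTime, converted to a Time afterwards.
def parse_with_datetime(stamp)
  DateTime.parse(stamp).to_time
end

runs = 10_000
time_secs = Benchmark.realtime { runs.times { parse_with_time(STAMP) } }
datetime_secs = Benchmark.realtime { runs.times { parse_with_datetime(STAMP) } }

puts "Time.iso8601:   #{time_secs.round(4)} s for #{runs} parses"
puts "DateTime.parse: #{datetime_secs.round(4)} s for #{runs} parses"
```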
      &lt;p&gt;&lt;strong&gt;Lesson &lt;/strong&gt; - Don't let Rails parse timestamps - do it yourself&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Gain&lt;/strong&gt; - 20% &lt;/p&gt;
      &lt;h3&gt;&lt;a name="symbolize"&gt;Don't symbolize keys&lt;/a&gt;&lt;/h3&gt;
      &lt;p&gt;We've almost hit the 1 second barrier: &lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;Total: 1.156

 %self     total     self     wait    child    calls  name
  8.13      0.11     0.09     0.00     0.01      642  Kernel#send-3
  8.13      0.09     0.09     0.00     0.00        5  PGconn#exec
  6.75      0.08     0.08     0.00     0.00     7497  String#concat
  4.15      0.05     0.05     0.00     0.00      817  Hash#each
  4.07      0.05     0.05     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasAndBelongsToManyAssociation#
                                                      construct_sql
  2.77      0.03     0.03     0.00     0.00     6360  String#match
  2.77      0.06     0.03     0.00     0.03       47  Array#each_index
  2.77      0.03     0.03     0.00     0.00      280  IconType#size
  2.60      0.03     0.03     0.00     0.00     4309  Array#[]
		
&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;At this point, most of the easy wins are gone. However, there is at least one left. Although it's not shown in the flat profile, the graph profile shows that the method ActiveSupport::CoreExtensions::Hash::Keys#symbolize_keys takes 2.77% of the time including its children. Additional rummaging through the call graph shows that the calls come from ActionView#compile_and_render_template: &lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;def compile_and_render_template(extension, template = nil, 
			        file_path = nil, local_assigns = {})
  # convert string keys to symbols if requested
  local_assigns = local_assigns.symbolize_keys if 
	          @@local_assigns_support_string_keys
&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;A bit of research shows that local_assigns_support_string_keys is deprecated and slated for removal from Rails! Sweet - that means we can save an additional 3% simply by adding this line to environment.rb:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;config.action_view.local_assigns_support_string_keys = false&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;If only all optimizations were so simple.&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Lesson &lt;/strong&gt; - Don't symbolize keys&lt;/p&gt;
      &lt;p&gt;&lt;strong&gt;Gain&lt;/strong&gt; - 3% &lt;/p&gt;
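      &lt;p&gt;Why does such a small method show up at all? The sketch below is a rough plain-Ruby stand-in for what symbolizing keys involves. The method name symbolize_keys_sketch is hypothetical and this is not the ActiveSupport source; it only shows that every call builds a brand new hash and converts each key, which adds up when it happens on every template render.&lt;/p&gt;

```ruby
# Rough stand-in for symbolizing hash keys, written in plain Ruby.
# The method name is hypothetical; it only illustrates the work involved.
def symbolize_keys_sketch(hash)
  hash.each_with_object({}) do |(key, value), result|
    # Every call allocates a new hash and converts each key that can be converted.
    result[key.respond_to?(:to_sym) ? key.to_sym : key] = value
  end
end

# Made-up local assigns with a mix of string and symbol keys.
local_assigns = { 'title' => 'Home', :user_id => 42 }
p symbolize_keys_sketch(local_assigns)
```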
      &lt;h3&gt;Wrapping Up&lt;/h3&gt;
      &lt;p&gt;So what's our final time? Let's see: &lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;Total: 0.922

 %self     total     self     wait    child    calls  name
 10.30      0.10     0.10     0.00     0.00        5  PGconn#exec
  3.47      0.17     0.03     0.00     0.14      312  Array#each-2
  3.47      0.03     0.03     0.00     0.00      212  ActiveRecord::ConnectionAdapters::
                                                      Quoting#quote
  3.47      0.03     0.03     0.00     0.00    19455  Hash#[]
  3.47      0.36     0.03     0.00     0.33      360  ActionView::Base#render-2
  3.36      0.06     0.03     0.00     0.03       47  Array#each_index
  3.36      0.03     0.03     0.00     0.00      299  ActiveRecord::Associations::
                                                      AssociationProxy#initialize
  3.36      0.14     0.03     0.00     0.11      705  ActiveRecord::Associations::
                                                      AssociationProxy#method_missing
&lt;/tt&gt;&lt;/pre&gt;
      &lt;p&gt;Not bad - we reduced the request time from 7.8 to 0.92 seconds with a day's worth of work. Obviously there is still much to be done - we need to run the code through our performance monitoring suite and make sure that the average time holds across a number of requests and that the standard deviation is in line. But we've at least made a good start.&lt;/p&gt;


</description>
      <pubDate>Wed, 18 Jul 2007 11:12:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:625d0ad1-6c6c-4d96-ba57-d721b422b5e7</guid>
      <comments>http://cfis.savagexi.com/2007/07/18/making-rails-go-vroom#comments</comments>
      <category>Programming</category>
      <category>Rails</category>
      <category>Ruby</category>
      <category>ruby-prof</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=452</trackback:ping>
      <link>http://cfis.savagexi.com/2007/07/18/making-rails-go-vroom</link>
    </item>
    <item>
      <title>How to Profile Your Rails Application</title>
      <description>	&lt;p&gt;&lt;a href="http://cfis.savagexi.com/articles/2007/07/09/announcing-ruby-prof-0-5-0"&gt;Yesterday&lt;/a&gt; I mentioned that ruby-prof now supports profiling rails applications. Let's see how that works.&lt;/p&gt;
			&lt;p&gt;But a word of caution  before beginning. Like all web applications, Rails apps are complex. They encompass many pieces -  clients on remote machines, network connections, any number of application servers and database servers. Getting the whole ensemble to work can be tricky - let alone making it perform well. &lt;/p&gt;
			&lt;p&gt;To avoid drowning in complexity (and data), it's crucial to use a rigorous approach to analyzing your application's performance. For a good example of how to approach the problem, take a look at Zed Shaw's &lt;a href="http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html"&gt;analysis&lt;/a&gt; of Ruby/Odeum's performance. &lt;/p&gt;
			&lt;p&gt;Ok - enough lecturing - on to the good stuff. &lt;/p&gt;
			&lt;h3&gt;Installation and Setup &lt;/h3&gt;
			&lt;p&gt;Assume that after much hard work, you've identified a request that takes too long (where of course you have to define what too long is). In addition, you know that the culprit is your Rails code - not the client, not the network and not the database. The trick is to find out what is taking so long - and this is where ruby-prof helps.&lt;/p&gt;
			&lt;p&gt;First install ruby-prof by opening a command line and typing:&lt;/p&gt;
			&lt;pre&gt;&lt;tt&gt;gem install ruby-prof&lt;/tt&gt;&lt;/pre&gt;
			&lt;p&gt;Next, copy ruby-prof's Rails plugin to your Rails plugin directory:&lt;/p&gt;
			&lt;pre&gt;&lt;tt&gt;cp -r rails_plugin rails_app/vendor/plugins&lt;/tt&gt;&lt;/pre&gt;
			&lt;p&gt;Finally create a new Rails configuration with the following setting:&lt;/p&gt;
      &lt;pre&gt;&lt;tt&gt;config.cache_classes = true&lt;/tt&gt;&lt;/pre&gt;
			&lt;p&gt;If you don't cache classes, then the time Rails spends reloading and compiling your models and controllers will overwhelm anything else your code does. And obviously try to make your test setup as similar as possible to your production environment - use a copy of the same database, use the same web server, etc. &lt;/p&gt;
			&lt;h3&gt;Profiling a Request&lt;/h3&gt;
			&lt;p&gt;Once you've completed your setup, and started your Rails server, make a single request. Obviously for real testing you'll need to make many requests, but let's start with one  to see how ruby-prof works.&lt;/p&gt;
			&lt;p&gt;To make this as realistic as possible, I'm going to show some old profiling data for &lt;a href="http://www.mapbuzz.com"&gt;MapBuzz&lt;/a&gt;, which is a social mapping site I started that lets you see what's going on in your neighborhood.&lt;/p&gt;
			&lt;p&gt;One of the things that is important to us is how long it takes to render the home page - so let's make a request to the home page. Once the request is complete,  the results are output in the Rails log file.&lt;/p&gt;
			&lt;pre&gt;&lt;tt&gt;Completed in 0.71900 (1 reqs/sec)
Rendering: 0.37600 (52%) | DB: 0.18700 (26%)
200 OK [http://localhost/]

Thread ID: 175635930
Total: 0.719

 %self     total     self     wait    child    calls  name
 26.15      0.19     0.19     0.00     0.00       23  PGconn#exec
 26.01      0.19     0.19     0.00     0.00       27  PGresult#each
  4.45      0.03     0.03     0.00     0.00     1403  String#match
  4.45      0.03     0.03     0.00     0.00      171  IO#write
  2.23      0.02     0.02     0.00     0.00      872  String#to_s
  2.23      0.02     0.02     0.00     0.00      467  String#gsub
  2.23      0.03     0.02     0.00     0.02       76  ManyParser#_parse
  2.23      0.05     0.02     0.00     0.03       60  MonitorMixin
                                                      #synchronize
  2.23      0.02     0.02     0.00     0.00       60  Parsers#sequence
  2.23      0.02     0.02     0.00     0.00     1640  ParseContext#eof
  2.23      0.02     0.02     0.00     0.00       31  ActionView::
                                                      Partials
                                                      #add_object_to_
                                                      local_assigns!
  2.23      0.02     0.02     0.00     0.00       11  ActionView::
                                                      Helpers::
                                                      UrlHelper#link_to
  2.23      0.02     0.02     0.00     0.00      243  Object#add_error
  2.23      0.02     0.02     0.00     0.00      175  Kernel#clone
&lt;/tt&gt;&lt;/pre&gt;
			&lt;p&gt;The first line shows the overall time as measured by the &lt;a href="http://www.ruby-doc.org/stdlib/libdoc/benchmark/rdoc/index.html"&gt;benchmark&lt;/a&gt; class used by Rails. Below that, you'll see a flat profile generated by ruby-prof that shows the slowest methods. By default, methods that take more than 1% of the time are printed - for brevity's sake I've trimmed the list to those above 2%.&lt;/p&gt;
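			&lt;p&gt;If the columns are new to you, the sketch below shows how I read them, using the MonitorMixin#synchronize row from the output above. The Row struct and the two helper names are invented for this example and are not part of ruby-prof. It assumes that child time is whatever remains of the total after the method's own time and its wait time. Keep in mind that the printed columns are already rounded, so recomputing a percentage from them will not exactly match the %self column.&lt;/p&gt;

```ruby
# One row of the flat profile above, copied as displayed (already rounded).
Row = Struct.new(:name, :total, :self_time, :wait)

row = Row.new('MonitorMixin#synchronize', 0.05, 0.02, 0.00)

# Time spent in callees: what is left after the method's own time and waiting.
def child_time(row)
  (row.total - row.self_time - row.wait).round(2)
end

# Share of the whole request spent inside the method itself.
# Computed from rounded inputs, so it only approximates the printed %self.
def percent_self(row, request_total)
  (row.self_time / request_total * 100).round(2)
end

puts child_time(row)
puts percent_self(row, 0.719)
```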
			&lt;p&gt;A quick scan of the results shows some interesting things. First, the most time is spent in PGconn#exec, which is used to query the database  via a C extension. Most likely there isn't much to optimize there on the Rails side, although there certainly may be room for  optimization on the database side.&lt;/p&gt;
			&lt;p&gt;Next, PGresult#each is likely looping over the query results. Its time seems excessive, but we'll worry about that later since we notice there are 1,403 calls to String#match. That seems excessive - what is causing that?&lt;/p&gt;
			&lt;h3&gt;Digging Deeper&lt;/h3&gt;
			&lt;p&gt;Flat profiles are great for providing an overview of where time is spent in your program, but they leave out a lot of useful information. In this case, we'd like to know what code is calling String#match. To do that we need information provided by graph profiles. &lt;/p&gt;
			&lt;p&gt;To create a graph profile, open up vendor/ruby-prof/lib/profiling.rb, and edit it like this:&lt;/p&gt;
			&lt;pre&gt;&lt;tt&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;module&lt;/span&gt;&lt;/span&gt; ActionController &lt;span style="font-style: italic"&gt;&lt;span style="color: #9A1900"&gt;#:nodoc:&lt;/span&gt;&lt;/span&gt;&lt;span class="style1"&gt;
  &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;module&lt;/span&gt;&lt;/span&gt; Profiling &lt;span style="font-style: italic"&gt;&lt;span style="color: #9A1900"&gt;#:nodoc:&lt;/span&gt;&lt;/span&gt;&lt;span class="style2"&gt;
    &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;def&lt;/span&gt;&lt;/span&gt; perform_action_with_profiling
      &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;if&lt;/span&gt;&lt;/span&gt; RubyProf&lt;span style="color: #990000"&gt;.&lt;/span&gt;running&lt;span style="color: #990000"&gt;?&lt;/span&gt; &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;or&lt;/span&gt;&lt;/span&gt;&lt;span class="style2"&gt;
        ...
      &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;else&lt;/span&gt;&lt;/span&gt;
        ...
       &lt;span style="font-style: italic"&gt;&lt;span style="color: #9A1900"&gt;# Example for Graph html printer&lt;/span&gt;&lt;/span&gt;
        printer &lt;span style="color: #990000"&gt;=&lt;/span&gt; RubyProf&lt;span style="color: #990000"&gt;::&lt;/span&gt;GraphHtmlPrinter&lt;span style="color: #990000"&gt;.&lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #000000"&gt;new&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #990000"&gt;(&lt;/span&gt;result&lt;span style="color: #990000"&gt;)&lt;/span&gt;
        File&lt;span style="color: #990000"&gt;.&lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #000000"&gt;open&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #990000"&gt;(&lt;/span&gt;&lt;span style="color: #FF0000"&gt;'request.html'&lt;/span&gt;&lt;span style="color: #990000"&gt;,&lt;/span&gt; &lt;span style="color: #FF0000"&gt;'w'&lt;/span&gt;&lt;span style="color: #990000"&gt;)&lt;/span&gt; &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;do&lt;/span&gt;&lt;/span&gt; &lt;span style="color: #990000"&gt;|&lt;/span&gt;file&lt;span style="color: #990000"&gt;|&lt;/span&gt;
          printer&lt;span style="color: #990000"&gt;.&lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #000000"&gt;print&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #990000"&gt;(&lt;/span&gt;file&lt;span style="color: #990000"&gt;,&lt;/span&gt; &lt;span style="color: #FF0000"&gt;{&lt;/span&gt;&lt;span style="color: #990000"&gt;:&lt;/span&gt;min_percent &lt;span style="color: #990000"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color: #993399"&gt;1&lt;/span&gt;&lt;span style="color: #990000"&gt;,&lt;/span&gt;
                               &lt;span style="color: #990000"&gt;:&lt;/span&gt;print_file &lt;span style="color: #990000"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;true&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #FF0000"&gt;}&lt;/span&gt;&lt;span style="color: #990000"&gt;)&lt;/span&gt;
        &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;span class="style1"&gt;
      &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;
    &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;
  &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;
&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;
&lt;/tt&gt;&lt;/pre&gt;

			&lt;p&gt;This will create a new file, request.html, that shows a graph profile of the request. Rerun the request, and then open up request.html.&lt;/p&gt;
			&lt;p&gt;In our case we are interested in the callers of String#match, so let's take a look: &lt;/p&gt;
			&lt;table class="profile"&gt;
        &lt;tr&gt;
          &lt;th&gt;  %Total&lt;/th&gt;
          &lt;th&gt;   %Self&lt;/th&gt;
          &lt;th&gt;     Total&lt;/th&gt;
          &lt;th&gt;      Self&lt;/th&gt;
          &lt;th&gt;      Wait&lt;/th&gt;
          &lt;th&gt;       Child&lt;/th&gt;
          &lt;th&gt;               Calls&lt;/th&gt;
          &lt;th&gt;Name&lt;/th&gt;
          &lt;th&gt;Line&lt;/th&gt;
        &lt;/tr&gt;
							          &lt;!-- Create divider row --&gt;
            &lt;tr class="break"&gt;&lt;td colspan="8"&gt;&lt;/td&gt;&lt;/tr&gt;
        
          
            &lt;!-- Parents --&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;              1/1403&lt;/td&gt;
                &lt;td&gt;ActionController::AbstractRequest#accepts&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/mapbuzz/web/trunk/vendor/plugins/content_negotiation/lib/request_changes.rb#line=65"&gt;65&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;            195/1403&lt;/td&gt;
                &lt;td&gt;&lt;a href="#Array_each-1_175635930"&gt;Array#each-1&lt;/a&gt;&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/Development/ruby/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/validations.rb#line=103"&gt;103&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;              1/1403&lt;/td&gt;
                &lt;td&gt;ActionController::Rescue#local_machine?&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/mapbuzz/web/trunk/app/controllers/application.rb#line=10"&gt;10&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;            232/1403&lt;/td&gt;
                &lt;td&gt;&lt;a href="#Array_each-2_175635930"&gt;Array#each-2&lt;/a&gt;&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/Development/ruby/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/validations.rb#line=103"&gt;103&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;              1/1403&lt;/td&gt;
                &lt;td&gt;ApplicationController#sanitize_status_codes&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/mapbuzz/web/trunk/app/controllers/application.rb#line=118"&gt;118&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;              8/1403&lt;/td&gt;
                &lt;td&gt;&lt;p&gt;ActiveRecord::ConnectionAdapters::&lt;/p&gt;
                &lt;p&gt;PostgreSQLAdapter#translate_field_type&lt;/p&gt;&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/mapbuzz/web/trunk/vendor/plugins/geom/lib/active_record_extensions.rb#line=55"&gt;55&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;            232/1403&lt;/td&gt;
                &lt;td&gt;&lt;a href="#Array_each-3_175635930"&gt;Array#each-3&lt;/a&gt;&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/Development/ruby/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/callbacks.rb#line=103"&gt;103&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;             11/1403&lt;/td&gt;
                &lt;td&gt;Geos::GeometryFactory#create_geom_from_hex&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/mapbuzz/web/trunk/vendor/plugins/geom/lib/geos_rails.rb#line=39"&gt;39&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.02&lt;/td&gt;
                &lt;td&gt;      0.02&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;            380/1403&lt;/td&gt;
                &lt;td&gt;&lt;a href="#Array_each-4_175635930"&gt;Array#each-4&lt;/a&gt;&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/Development/ruby/lib/ruby/gems/1.8/gems/activerecord-1.15.3/lib/active_record/validations.rb#line=103"&gt;103&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.02&lt;/td&gt;
                &lt;td&gt;      0.02&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;            342/1403&lt;/td&gt;
                &lt;td&gt;&lt;a href="#Array_each-5_175635930"&gt;Array#each-5&lt;/a&gt;&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/Development/ruby/lib/ruby/gems/1.8/gems/rparsec-0.4.1/rparsec/parsers.rb#line=103"&gt;103&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
            

            &lt;tr class="method"&gt;
              &lt;td&gt;   4.45%&lt;/td&gt;
              &lt;td&gt;   4.45%&lt;/td&gt;
              &lt;td&gt;      0.03&lt;/td&gt;
              &lt;td&gt;      0.03&lt;/td&gt;
              &lt;td&gt;      0.00&lt;/td&gt;
              &lt;td&gt;      0.00&lt;/td&gt;
              &lt;td&gt;                1403&lt;/td&gt;
              &lt;td&gt;&lt;a name="String_match_175635930"&gt;String#match&lt;/a&gt;&lt;/td&gt;
              &lt;td&gt;&lt;a href="file://C:/mapbuzz/web/trunk/vendor/plugins/geom/lib/geos_rails.rb#line=39"&gt;39&lt;/a&gt;&lt;/td&gt;
            &lt;/tr&gt;

            &lt;!-- Children --&gt;
             
              &lt;tr&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;&amp;nbsp;&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                &lt;td&gt;      0.00&lt;/td&gt;
                
                &lt;td&gt;           1403/1429&lt;/td&gt;
                &lt;td&gt;Regexp#match&lt;/td&gt;
                &lt;td&gt;&lt;a href="file://C:/mapbuzz/web/trunk/vendor/plugins/geom/lib/geos_rails.rb#line=118"&gt;118&lt;/a&gt;&lt;/td&gt;
              &lt;/tr&gt;
            
            &lt;!-- Create divider row --&gt;
            &lt;tr class="break"&gt;&lt;td colspan="8"&gt;&lt;/td&gt;&lt;/tr&gt;
			&lt;/table&gt;
			&lt;p&gt;&amp;nbsp; &lt;/p&gt;
			&lt;p&gt;If you haven't seen a graph profile before, you're probably feeling a bit overwhelmed. To help out a bit, I've written some documentation &lt;a href="http://ruby-prof.rubyforge.org/graph.txt"&gt;here&lt;/a&gt;. However, the quick summary is that the method of interest is shown in bold. Methods above it are the method's callers, while the methods below it are the method's callees.&lt;/p&gt;
			&lt;p&gt;Notice the calls Array#each, Array#each-1, Array#each-2, etc? The dash and number indicate that a method has been called recursively. Thus Array#each-1 means that Array#each was called and then it in turn called Array#each either directly or indirectly.&lt;/p&gt;
			&lt;p&gt;Going back to our investigation, we see that Array#each-4 called String#match the most times. So  the next step would be to click the Array#each-4 hyperlink and see where it is called from.&lt;/p&gt;
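			&lt;p&gt;The same reading can be checked mechanically. The sketch below copies the call counts from the parent rows of the table above into a plain hash (the CALLERS constant is just a name chosen for this example), confirms that they add up to the 1403 calls reported for String#match and picks out the biggest caller.&lt;/p&gt;

```ruby
# Call counts copied from the parent rows of the String#match graph profile.
CALLERS = {
  'ActionController::AbstractRequest#accepts' => 1,
  'Array#each-1' => 195,
  'ActionController::Rescue#local_machine?' => 1,
  'Array#each-2' => 232,
  'ApplicationController#sanitize_status_codes' => 1,
  'ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#translate_field_type' => 8,
  'Array#each-3' => 232,
  'Geos::GeometryFactory#create_geom_from_hex' => 11,
  'Array#each-4' => 380,
  'Array#each-5' => 342
}

# The parent rows should account for every call to the method of interest.
total_calls = CALLERS.values.sum

# Find the caller responsible for the most calls and its share of the total.
top_caller, top_calls = CALLERS.max_by { |_name, calls| calls }
share = (top_calls * 100.0 / total_calls).round(1)

puts "#{top_caller}: #{top_calls} of #{total_calls} calls (#{share}%)"
```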
			&lt;h3&gt;Visualizing a Profile&lt;/h3&gt;
			&lt;p&gt;Once you create your own graph profile, you'll quickly realize how valuable the html hyperlinks are for navigating through the mass of information. But wouldn't it be nice to visualize the profile? Glad you asked!&lt;/p&gt;
			&lt;p&gt;ruby-prof 0.5.0 can now output call tree information thanks to a patch from Carl Shimer. To create call tree output, once again you'll have to modify profiling.rb:&lt;/p&gt;
			&lt;pre&gt;&lt;tt&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;module&lt;/span&gt;&lt;/span&gt; ActionController &lt;span style="font-style: italic"&gt;&lt;span style="color: #9A1900"&gt;#:nodoc:&lt;/span&gt;&lt;/span&gt;&lt;span class="style1"&gt;
  &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;module&lt;/span&gt;&lt;/span&gt; Profiling &lt;span style="font-style: italic"&gt;&lt;span style="color: #9A1900"&gt;#:nodoc:&lt;/span&gt;&lt;/span&gt;&lt;span class="style2"&gt;
    &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;def&lt;/span&gt;&lt;/span&gt; perform_action_with_profiling
      &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;if&lt;/span&gt;&lt;/span&gt; RubyProf&lt;span style="color: #990000"&gt;.&lt;/span&gt;running&lt;span style="color: #990000"&gt;?&lt;/span&gt; &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;or&lt;/span&gt;&lt;/span&gt;&lt;span class="style2"&gt;
        ...
      &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;else&lt;/span&gt;&lt;/span&gt;
        ...
       &lt;span style="font-style: italic"&gt;&lt;span style="color: #9A1900"&gt;# Example for call tree printer&lt;/span&gt;&lt;/span&gt;
        printer &lt;span style="color: #990000"&gt;=&lt;/span&gt; RubyProf&lt;span style="color: #990000"&gt;::&lt;/span&gt;CallTreePrinter&lt;span style="color: #990000"&gt;.&lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #000000"&gt;new&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #990000"&gt;(&lt;/span&gt;result&lt;span style="color: #990000"&gt;)&lt;/span&gt;
        File&lt;span style="color: #990000"&gt;.&lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #000000"&gt;open&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #990000"&gt;(&lt;/span&gt;&lt;span style="color: #FF0000"&gt;'callgrind.out'&lt;/span&gt;&lt;span style="color: #990000"&gt;,&lt;/span&gt; &lt;span style="color: #FF0000"&gt;'w'&lt;/span&gt;&lt;span style="color: #990000"&gt;)&lt;/span&gt; &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;do&lt;/span&gt;&lt;/span&gt; &lt;span style="color: #990000"&gt;|&lt;/span&gt;file&lt;span style="color: #990000"&gt;|&lt;/span&gt;
          printer&lt;span style="color: #990000"&gt;.&lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #000000"&gt;print&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #990000"&gt;(&lt;/span&gt;file&lt;span style="color: #990000"&gt;,&lt;/span&gt; &lt;span style="color: #FF0000"&gt;{&lt;/span&gt;&lt;span style="color: #990000"&gt;:&lt;/span&gt;min_percent &lt;span style="color: #990000"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color: #993399"&gt;1&lt;/span&gt;&lt;span style="color: #990000"&gt;,&lt;/span&gt;
                               &lt;span style="color: #990000"&gt;:&lt;/span&gt;print_file &lt;span style="color: #990000"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;true&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #FF0000"&gt;}&lt;/span&gt;&lt;span style="color: #990000"&gt;)&lt;/span&gt;
        &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;span class="style1"&gt;
      &lt;/span&gt;&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;
    &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;
  &lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;
&lt;span style="font-weight: bold"&gt;&lt;span style="color: #0000FF"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;Once you've created the output file, open it with &lt;a href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi"&gt;KCachegrind&lt;/a&gt; (sorry, this is Linux only, although there is a port to Windows which I have not tried using). KCachegrind takes a bit of work to get used to, but once you do you can create visualizations that show you what your program is doing. In this case, here is a picture that shows String#match's callers. &lt;/p&gt;
&lt;p&gt;&lt;img src="http://cfis.savagexi.com/files/call_graph.png" alt="Call Graph" width="519" height="1010" longdesc="Call Graph" /&gt;&lt;/p&gt;
&lt;h3&gt;Wrapping Up&lt;/h3&gt;
&lt;p&gt;Remember - most of the time you won't need ruby-prof. It provides much too much information to be used as a general way of finding performance bottlenecks - instead analyze your web log files, Rails log files and database log files. &lt;/p&gt;
&lt;p&gt;But once you know the problem is in your Rails code, ruby-prof is invaluable in pointing out where it is.&lt;/p&gt;


</description>
      <pubDate>Tue, 10 Jul 2007 10:58:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:f7d6c706-2232-4fb3-b548-9940181ea9dc</guid>
      <comments>http://cfis.savagexi.com/2007/07/10/how-to-profile-your-rails-application#comments</comments>
      <category>Rails</category>
      <category>ruby-prof</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=446</trackback:ping>
      <link>http://cfis.savagexi.com/2007/07/10/how-to-profile-your-rails-application</link>
    </item>
    <item>
      <title>Announcing ruby-prof 0.5.0</title>
      <description>&lt;p class="extended"&gt;Make sure to grab the latest version of ruby-prof, currently 0.5.2, which includes some performance tweaks and bug fixes.&lt;/p&gt;
&lt;p&gt;A &lt;a href="http://cfis.savagexi.com/articles/2006/06/21/ruby-prof-0-4-0-with-call-graphs"&gt;year&lt;/a&gt; has passed since the last release of &lt;a href="http://rubyforge.org/projects/ruby-prof"&gt;ruby-prof&lt;/a&gt;. If you haven't used ruby-prof, it's a superfast, open-source profiler for Ruby that shows you &lt;a href="http://ruby-prof.rubyforge.org/graph.html"&gt;where&lt;/a&gt; your program is slow.&lt;/p&gt;
			&lt;p&gt;Over the last few months the list of bugs and enhancements on Ruby Forge has been accumulating at an embarrassing rate. Not to mention, I need to profile our rails &lt;a href="http://www.mapbuzz.com"&gt;app&lt;/a&gt;.&lt;/p&gt;
		  &lt;p&gt;So it's time for another release.
			And this isn't just a minor bug fix release - it's a major update that adds lots of new stuff. To install it, open a command prompt and type:&lt;/p&gt;
			&lt;pre&gt;&lt;tt&gt;gem install ruby-prof&lt;/tt&gt;&lt;/pre&gt;
			&lt;p&gt;Then pick the appropriate &lt;a href="http://rubygems.org/read/book/1"&gt;gem&lt;/a&gt; for your platform.&lt;/p&gt;
			&lt;h3&gt;Multithreaded Applications &lt;/h3&gt;
		  &lt;p&gt;The most important change is support for multi-threaded applications. Previously, ruby-prof correctly recognized the threads in a program and their independent call stacks, but it got the thread times wrong. ruby-prof would incorrectly add a child thread's time to a parent thread's time, making it hard to figure out where the time in a program was spent. Now, ruby-prof keeps track of a thread's &amp;quot;wait time,&amp;quot; which is the time it spends waiting for other threads. Thanks to Sylvain Joyeux for a patch that got the ball rolling on better thread support.&lt;/p&gt;
		  &lt;h3&gt;Measurement Modes &lt;/h3&gt;
		  &lt;p&gt;Sylvain also submitted a second patch that surprised me. Instead of tracking method times, why not track object allocations? That seemed reasonable, but he implemented it as another type of clock mode (ruby-prof supports 3 different ways of measuring time - process time, wall time and cpu time). &lt;/p&gt;
		  &lt;p&gt;At first I was hesitant about the patch - shoehorning object allocations into clock modes didn't make much sense to me. But after thinking about it a bit, I decided Sylvain was onto something. Instead of thinking about &amp;quot;clock modes&amp;quot; I started thinking about &amp;quot;measurement modes&amp;quot; (catchy, isn't it?). And once you have that perspective, then it makes sense for ruby-prof to measure all sorts of things in a running program - time, object allocations, memory usage, etc. &lt;/p&gt;
		  &lt;p&gt;So with a bit of refactoring, ruby-prof now supports &amp;quot;measurement modes.&amp;quot; They are process time, wall time, cpu time and object allocations. And now it's easy to add more if anyone wants to submit a patch!&lt;/p&gt;
		  &lt;p&gt;I should note there is one downside to the object allocations mode - it only works if you have a &lt;a href="http://rubyforge.org/tracker/index.php?func=detail&amp;amp;aid=11497&amp;amp;group_id=426&amp;amp;atid=1700"&gt;patched&lt;/a&gt; Ruby interpreter (if you don't, ruby-prof works fine; you just don't get access to allocation information). &lt;/p&gt;
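		  &lt;p&gt;To make the &amp;quot;measurement mode&amp;quot; idea concrete, here is a minimal sketch in plain Ruby. It is not ruby-prof's API: the MEASURE_MODES table and the measure helper are hypothetical names, and the allocation counter assumes a modern Ruby where GC.stat(:total_allocated_objects) is available without a patched interpreter. The point is only that every mode boils down to reading a number before and after the code runs.&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;
```ruby
# Each "measurement mode" is just a way to read a number that only grows.
# (Hypothetical table for illustration, not part of ruby-prof.)
MEASURE_MODES = {
  wall_time: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
  process_time: -> { t = Process.times; t.utime + t.stime },
  allocations: -> { GC.stat(:total_allocated_objects) }
}.freeze

# Run the block and report how much the chosen quantity grew.
def measure(mode)
  read = MEASURE_MODES.fetch(mode)
  before = read.call
  yield
  read.call - before
end

puts "wall time:   #{measure(:wall_time) { sleep 0.01 }}"
puts "allocations: #{measure(:allocations) { Array.new(100) { 'x' * 10 } }}"
```
&lt;/tt&gt;&lt;/pre&gt;
		  &lt;p&gt;Adding another mode in this sketch means adding one more entry to the table, which is the property that made the refactoring attractive.&lt;/p&gt;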
		  &lt;h3&gt;Rails &lt;/h3&gt;
		  &lt;p&gt;What use would a Ruby profiler be if it didn't work with Rails? ruby-prof now provides some hooks to make it easier to profile your Rails app. But I'll leave those for a future blog post.&lt;/p&gt;
		  &lt;h3&gt;Call Graph Format &lt;/h3&gt;
		  &lt;p&gt;Carl Shimer implemented support for the call graph format which is used by &lt;a href="http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindCalltreeFormat"&gt;KCachegrind&lt;/a&gt; to visualize calling information.&lt;/p&gt;
		  &lt;h3&gt;Recursive Methods &lt;/h3&gt;
		  &lt;p&gt;Finally, ruby-prof has much improved support for recursive method calls as explained in the updated readme file.&lt;/p&gt;
		  &lt;h3&gt;Bug Fixes &lt;/h3&gt;
		  &lt;p&gt;And last, this release fixes some nasty bugs:&lt;/p&gt;
			&lt;p&gt;Unknown Singleton Object - This was a tough one since I didn't understand how to reproduce it. But luckily Matthew Fallshaw submitted a reproducible test case, and from there it was just a matter of reading the translated version of the &lt;a href="http://rhg.rubyforge.org/"&gt;Ruby Hacking Guide&lt;/a&gt; to unravel the mysteries of Ruby's singleton classes.&lt;/p&gt;
			&lt;p&gt;64-bit support - The 0.4.x releases used ints to store pointers instead of longs, which meant they didn't compile and/or work on 64-bit machines. This is now fixed thanks to a patch submitted by Diego 'Flameeyes' Petten&#242;.&lt;/p&gt;
			&lt;p&gt;IRB support - Previously you couldn't start ruby-prof in a method and then exit that method. Which meant that ruby-prof couldn't be used in IRB. That restriction has now been removed.&lt;/p&gt;
			&lt;p&gt;Sort order - The sort order in generated reports was wrong. Graph reports now sort based on a method's total percent time while flat reports use a method's self percent time.&lt;/p&gt;
			&lt;p&gt;So give version 0.5.0 a try and see how you like it!&lt;/p&gt;


</description>
      <pubDate>Mon, 09 Jul 2007 00:56:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:f6bc1364-114a-43c3-9426-286d4d7b51dc</guid>
      <comments>http://cfis.savagexi.com/2007/07/09/announcing-ruby-prof-0-5-0#comments</comments>
      <category>Ruby</category>
      <category>ruby-prof</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=445</trackback:ping>
      <link>http://cfis.savagexi.com/2007/07/09/announcing-ruby-prof-0-5-0</link>
    </item>
    <item>
      <title>Profiling Ruby Code</title>
      <description> &lt;p&gt;&lt;a href="http://on-ruby.blogspot.com/"&gt;Pat Eyler&lt;/a&gt; has written a nice set
      of articles about profiling Ruby code. He &lt;a href="http://on-ruby.blogspot.com/2006/08/profile-and-ruby-prof.html#links"&gt;shows&lt;/a&gt; how
      to use the built in Ruby profiler as well as ruby-prof (my personal
      &lt;a href="http://cfis.savagexi.com/articles/2006/06/21/ruby-prof-0-4-0-with-call-graphs"&gt;favorite&lt;/a&gt; :).
      He talks a bit about &lt;a href="http://ruby-prof.rubyforge.org/graph.html"&gt;call
      graphs&lt;/a&gt;, so if you're not familiar with them, his &lt;a href="http://on-ruby.blogspot.com/2006/08/ruby-prof-and-call-graphs.html#links"&gt;article&lt;/a&gt;      is
      a good place to start.&lt;/p&gt;

</description>
      <pubDate>Wed, 16 Aug 2006 00:00:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:e6c26abf-e909-451d-bff4-3651bb4ee991</guid>
      <comments>http://cfis.savagexi.com/2006/08/16/profiling-ruby-code#comments</comments>
      <category>Ruby</category>
      <category>ruby-prof</category>
      <category>Tools</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=371</trackback:ping>
      <link>http://cfis.savagexi.com/2006/08/16/profiling-ruby-code</link>
    </item>
    <item>
      <title>ruby-prof 0.4.0 with call graphs</title>
      <description>&lt;p class="extended"&gt;Update: &lt;a href="http://cfis.savagexi.com/articles/2007/07/09/announcing-ruby-prof-0-5-0"&gt;ruby-prof 0.5.0&lt;/a&gt; is now available and has significantly more features than 0.4.0, including better threading support, rails support, call tree output, etc.&lt;/p&gt;
        &lt;p&gt;After porting &lt;a href="http://ruby-prof.rubyforge.org/"&gt;ruby-prof&lt;/a&gt; to Windows a couple of weeks ago, &lt;a href="http://shugo.net/"&gt;Shugo
            Maeda&lt;/a&gt;, ruby-prof's author, and I started working together. We're
             happy to announce the release of ruby-prof 0.4.0, which is chock
          full of new features:&lt;/p&gt;
        &lt;ul&gt;
          &lt;li&gt; Addition of call graph profiles similar to GProf&lt;/li&gt;
          &lt;li&gt; Improved speed - overhead is now as low as 15% for some code, although
            you should generally expect around 50% &lt;/li&gt;
          &lt;li&gt; Full support for multiple threads&lt;/li&gt;
          &lt;li&gt; New cross-referenced html reports&lt;/li&gt;
          &lt;li&gt; New ruby-prof script that makes it easy to profile your programs
            without modifying them &lt;/li&gt;
          &lt;li&gt; Vastly improved documentation&lt;/li&gt;
          &lt;li&gt; Detection of recursive calls and
            call cycles&lt;/li&gt;
          &lt;li&gt; Support for windows&lt;/li&gt;
          &lt;/ul&gt;
        &lt;p&gt;&amp;nbsp;&lt;/p&gt;
        &lt;p&gt;Best of all, ruby-prof is now distributed as a gem so it's as easy to
          install as:&lt;/p&gt;
        &lt;pre&gt;&lt;tt&gt;gem install ruby-prof&lt;/tt&gt;&lt;/pre&gt;
					
        &lt;p&gt;If you're on Windows, the gem includes a pre-built windows binary. If
          you're on Linux or Unix, the binary will be automatically built on installation.&lt;/p&gt;
        &lt;h3&gt;Graph Profiles &lt;/h3&gt;
        &lt;p&gt;My favorite new feature - and the raison d&#8217;&#234;tre for this
          release - is the addition of call graphs.   &lt;a href="http://cfis.savagexi.com/pages/prime.rb"&gt;Here&lt;/a&gt; is the source code for the following examples.&lt;/p&gt;

&lt;p&gt;Most profiling tools for Ruby&lt;a href="#rp_footnote_1"&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;
          output flat profiles, which are useful for identifying which methods
          take the longest. They are concise and easy to understand.
          Let's take a  look at an  &lt;a href="http://ruby-prof.rubyforge.org/flat.txt"&gt;example&lt;/a&gt; to
          see what I mean.&lt;/p&gt;
        &lt;p&gt;For short programs, flat profiles work well. But for
          long programs, it's really helpful to understand more about the context
          in which a method is called. For example, which methods called the one
          we're interested in? Which methods did it call? &lt;/p&gt;
        &lt;p&gt; A quick word of warning - call graphs can be overwhelming if you haven't
          seen one before. Let's venture forth and take a look at  &lt;a href="http://ruby-prof.rubyforge.org/graph.html"&gt;one&lt;/a&gt; that
          shows the same results as the flat profile above.
          Quite a bit different, isn't it? If you haven't used a call graph before
          I've put together a little tutorial &lt;a href="http://ruby-prof.rubyforge.org/graph.txt"&gt;here&lt;/a&gt;.
          There are also a number of excellent examples on the Web - search Google
          for &amp;quot;gprof call graph tutorial.&amp;quot; &lt;/p&gt;
        &lt;h3&gt;Speed&lt;/h3&gt;
        &lt;p&gt;The  thing that got me started working on ruby-prof was that the
          built in Ruby profiler is beyond slow (and doesn't support call graphs).
          So how does ruby-prof compare?&lt;/p&gt;
        &lt;p&gt;We spent a good deal of time profiling ruby-prof, using &lt;a href="http://oprofile.sourceforge.net/news/"&gt;oprofile&lt;/a&gt; on
          Linux and Rational &lt;a href="http://www-306.ibm.com/software/awdtools/purifyplus/win/"&gt;PurifyPlus&lt;/a&gt; on
          Windows. When we started, ruby-prof added over 100% overhead to a program.
          Thus, if a program took 10 seconds to run the profile run would take at
          least 20. That's not too bad, but there was clearly room for improvement. &lt;/p&gt;
        &lt;p&gt;Internally ruby-prof maintains two main data structures. The first is
          a stack that keeps track of the current call sequence. The second is a
          hash table with one entry per method profiled per thread.
          The slowest part was looking up the method information in the hash table.
          This happens each time a method is entered or exited.&lt;/p&gt;
        &lt;p&gt;To give you some idea of the number of times this happens, take a look
          at the &lt;a href="http://ruby-prof.rubyforge.org/graph.html#Fixnum____21277412"&gt;Fixnum#&lt;/a&gt; method
          in the graph profile above. For a program that lasted only 8 seconds, Fixnum#
          got called a whopping 250,000 times (and these results are from last week
          before our optimization work, so the program really only takes around 3
          or 4 seconds to run). &lt;/p&gt;
        &lt;p&gt;The key to speeding up ruby-prof was to reduce these lookups as much as
          possible. This was done by a combination of caching to avoid lookups, 
          simplifying the hash key to make lookups faster,
          and making the method tracking logic as simple as possible. &lt;/p&gt;
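        &lt;p&gt;The two data structures described above can be sketched in a few lines of Ruby. This is an illustration only, not ruby-prof's actual C implementation: TinyProfiler, MethodInfo, enter and leave are hypothetical names, and timestamps are passed in by hand so the example is deterministic. Note how each stack frame keeps a reference to its method record, so leaving a method needs no second hash lookup, which is one simple form of the caching idea.&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;
```ruby
# One record per (thread, method) pair. Hypothetical names for illustration.
MethodInfo = Struct.new(:name, :calls, :total_time)

class TinyProfiler
  def initialize
    # Per-thread call stacks: thread id maps to an array of [info, start time].
    @stacks = Hash.new { |hash, thread_id| hash[thread_id] = [] }
    # The hash table of method records, keyed by [thread id, method name].
    @methods = {}
  end

  # Called when a method is entered: one hash lookup, then a push.
  def enter(thread_id, name, now)
    info = @methods[[thread_id, name]] ||= MethodInfo.new(name, 0, 0.0)
    info.calls += 1
    @stacks[thread_id].push([info, now])
  end

  # Called when a method exits: the frame already holds the record,
  # so no hash lookup is needed here.
  def leave(thread_id, now)
    info, started = @stacks[thread_id].pop
    info.total_time += now - started
  end

  # Method records, slowest first.
  def report
    @methods.values.sort_by { |info| -info.total_time }
  end
end

profiler = TinyProfiler.new
profiler.enter(1, "outer", 0.0)
profiler.enter(1, "inner", 1.0)
profiler.leave(1, 3.0)
profiler.leave(1, 4.0)
profiler.report.each do |info|
  puts "#{info.name}: #{info.calls} call(s), #{info.total_time}s total"
end
```
&lt;/tt&gt;&lt;/pre&gt;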
        &lt;p&gt;The results are quite encouraging. On &amp;quot;normal&amp;quot; programs, like Rails
          apps, ruby-prof's overhead is now less than 50% (and in fact, when I profile
          our unit tests it is more in the range of 10% to 20%). For
          programs that stress profilers, like the prime test above or a
          factorial method, the overhead is somewhere in the 50% to 80% range.&lt;/p&gt;
        &lt;p&gt;Next time - some results from profiling Rails. &lt;/p&gt;
        &lt;p&gt;&lt;sup&gt;&lt;a id="rp_footnote_1"&gt;1&lt;/a&gt;&lt;/sup&gt;The latest release of Mauricio Fernandez's excellent &lt;a href="http://eigenclass.org/hiki.rb?rcov"&gt;rcov&lt;/a&gt; extension looks like it now has this functionality.&lt;/p&gt;


</description>
      <pubDate>Wed, 21 Jun 2006 13:06:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:e63ec33f-c49a-4112-a735-9e453f78dfea</guid>
      <comments>http://cfis.savagexi.com/2006/06/21/ruby-prof-0-4-0-with-call-graphs#comments</comments>
      <category>Ruby</category>
      <category>ruby-prof</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=226</trackback:ping>
      <link>http://cfis.savagexi.com/2006/06/21/ruby-prof-0-4-0-with-call-graphs</link>
    </item>
    <item>
      <title>Porting ruby-prof to Windows</title>
      <description>  &lt;p&gt;Yesterday I wanted to profile some methods  I'm using on a Rails controller.
    To get a feel for profiling Ruby code I put together a test
    case, added a
    &amp;quot;require 'prof'&amp;quot; to the top of the file and eagerly waited for the
    results. And waited, and waited, and waited. Thinking I did something
    wrong, I ran the code without a profiler - it took about 2 seconds. With the
    profiler it took so long I gave up. And this
    was on a dual core Pentium D processor with 1 GB of memory running Fedora Core
    5.&lt;/p&gt;
  &lt;p&gt;Time for some investigation. It turns out this is a well known &lt;a href="http://blog.zenspider.com/archives/2005/04/space_vs_time.html"&gt;problem&lt;/a&gt; -
    the built-in Ruby profiler, which is written in Ruby, is so slow as to be useless.
    I came across two alternatives - &lt;a href="http://raa.ruby-lang.org/project/ruby-prof/"&gt;ruby-prof,&lt;/a&gt; a
    C extension written by Shugo Maeda, and &lt;a href="http://blog.zenspider.com/archives/2005/04/space_vs_time.html"&gt;ZenProfile&lt;/a&gt;, an inline C extension done by Ryan Davis.&lt;/p&gt;
  &lt;p&gt;I went with ruby-prof. On Linux it was easy enough to download, build and
    install and it worked like a charm. But I do most of my work on my laptop which
    runs Windows XP. So I opened up MingW and built and installed the extension on
    Windows (that's not quite true, I had to hack the C a bit, more info below). But when I ran the test script I was met with a program fail message
    saying that the stack was empty. Ugh.&lt;/p&gt;
  &lt;p&gt;Since I find it impossible to debug extensions on Windows  with MingW I
    fired up Visual Studio 2005, rebuilt the extension, and tried again. Same issue.&lt;/p&gt;
  &lt;p&gt;Digging deeper, it turns out the profilers (as well as the wonderful &lt;a href="http://eigenclass.org/hiki.rb?rcov"&gt;rcov&lt;/a&gt; project)
    work by registering a callback with Kernel::set_trace_func. When Ruby
    executes a line of code, enters a new Ruby or C method, or exits a Ruby or C
    method, the callback is activated.&lt;/p&gt;
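  &lt;p&gt;Here is a small sketch of such a callback in plain Ruby. Kernel#set_trace_func still exists in current Ruby, although its documentation marks it obsolete in favor of TracePoint. The trace_events helper and the greet method are made-up names for this example; the handler keeps only method entry and exit events and ignores line events.&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;
```ruby
# Record the call/return events Ruby reports while a block runs.
# (trace_events is a hypothetical helper, not part of any profiler.)
def trace_events
  events = []
  handler = proc do |event, _file, _line, id, _binding, _klass|
    # Keep method entry/exit events only; "line" and others are skipped.
    events.push([event, id]) if %w[call return c-call c-return].include?(event)
  end
  set_trace_func(handler)
  yield
  events
ensure
  # Always uninstall the callback, even if the block raises.
  set_trace_func(nil)
end

def greet
  "hello"
end

events = trace_events { greet }
events.each { |event, id| puts "#{event} #{id}" }
```
&lt;/tt&gt;&lt;/pre&gt;
  &lt;p&gt;Because the callback is installed and removed from inside method calls, the recorded events for set_trace_func itself are not guaranteed to come in matched pairs.&lt;/p&gt;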
  &lt;p&gt;The problem is that ruby-prof assumes that each call into a method is matched
    by a return - and if it's not then the failure I see is triggered. To understand
    the problem, let's look at a super simple test case:&lt;/p&gt;
  &lt;pre&gt;&lt;tt&gt;&lt;span class="preproc"&gt;require&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;'profiler'&lt;/span&gt;
&lt;/tt&gt;&lt;/pre&gt;
  &lt;p&gt;I said it was simple - didn't I!  Here's the trace from Linux:&lt;/p&gt;
  &lt;pre&gt;&lt;tt&gt;return start_profile
call print_profile
call stop_profile
c-call set_trace_func
&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;start_profile is the method that installs the set_trace_func callback - so it makes
  sense that the first thing we see is the return from that method. Once the program
  is done, the profiler calls print_profile, which calls stop_profile, which
  calls set_trace_func, which uninstalls the callback. So the method enters and
  returns do not balance.&lt;/p&gt;
&lt;p&gt;Although the method names ruby-prof uses are slightly different, the problem
  remains the same. ruby-prof hacks through it by pushing and popping
  extra items on its stack to counterweigh the imbalanced method calls.  Thus it's
  hard-coded to a specific sequence of method calls. So why doesn't it work on Windows?&lt;/p&gt;
&lt;p&gt; A quick trace running our test program on Windows shows the problem:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;return start_profile
return require
call print_profile
call stop_profile
c-call set_trace_func
&lt;/tt&gt;&lt;/pre&gt;
There is an extra &amp;quot;return require&amp;quot;, which is generated by RubyGems.
The trace grows further if you run the program in &lt;a href="http://www.ruby-ide.com/ruby/ruby_ide_and_ruby_editor.php"&gt;Arachno&lt;/a&gt;, which uses a modified version of Ruby to support
its fantastic debugger (it's fast enough that I always run Rails under the debugger
so I can set breakpoints at key places - definitely go check it out):
&lt;pre&gt;&lt;tt&gt;c-return set_trace_func 
return start_profile 
c-return require__ 
return require 
c-return require__ 
return require 
call print_profile 
call stop_profile 
c-call set_trace_func 
&lt;/tt&gt;&lt;/pre&gt;

&lt;p&gt;It quickly becomes clear that assuming a balanced stack is a bad idea.  If you look
  at the built-in Ruby profiler, it doesn't make such an assumption.
&lt;/p&gt;
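&lt;p&gt;One way to avoid the assumption is to treat a return with no matching call as something to note and skip rather than as a fatal error. The sketch below is only an illustration of that idea, not the actual ruby-prof patch; TolerantStack and its methods are hypothetical names. It replays the Windows trace shown earlier.&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;
```ruby
# Tracks open calls without assuming every return has a matching call.
# (Hypothetical class for illustration.)
class TolerantStack
  attr_reader :frames, :unmatched_returns

  def initialize
    @frames = []
    @unmatched_returns = 0
  end

  # event is one of the strings reported by the trace callback.
  def record(event, name)
    case event
    when "call", "c-call"
      @frames.push(name)
    when "return", "c-return"
      if @frames.last == name
        @frames.pop
      else
        # A return for a method we never saw entered: count it and move on.
        @unmatched_returns += 1
      end
    end
  end
end

# The Windows trace from the post.
trace = [
  ["return", "start_profile"],
  ["return", "require"],
  ["call", "print_profile"],
  ["call", "stop_profile"],
  ["c-call", "set_trace_func"]
]

stack = TolerantStack.new
trace.each { |event, name| stack.record(event, name) }
puts "unmatched returns: #{stack.unmatched_returns}"
puts "frames still open: #{stack.frames.join(', ')}"
```
&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;With this trace the two leading returns are skipped and three calls are left open at the end, and neither case stops the run.&lt;/p&gt;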
&lt;p class="extended"&gt;These changes have been merged into ruby-prof-0.4.0 which is now available as a RubyGem.&lt;/p&gt;
&lt;p&gt;So, I've patched ruby-prof to remove this assumption and to make it compile on
  Windows. I'll submit the patch to Shugo Maeda, but in the meantime,  I've provided
  Windows binaries for anyone who wants to use the profiler on Windows.  To install:&lt;/p&gt;
&lt;p&gt;1. Download the windows extension, prof.xo, and put it in your ruby\lib\ruby\site_ruby\1.8\i386-msvcrt
  directory.&lt;/p&gt;
&lt;p&gt;2. Download unprof.rb and put it in your ruby\lib\ruby\site_ruby\1.8 directory.&lt;/p&gt;
&lt;p&gt;3. To use the profiler simply require 'unprof' at the top of the file&lt;/p&gt;
&lt;p&gt;One thing to note about my changes. The self-time for the &amp;quot;toplevel&amp;quot; method will
  always show &amp;quot;0&amp;quot;.  It looks like the Ruby profiler does the same thing, so I think
  this is ok.&lt;/p&gt;
&lt;h2&gt; Assembly Hacking&lt;/h2&gt;
&lt;p&gt;This section is for anyone who's interested in some lower level details - feel
  free to skip it.&lt;/p&gt;
&lt;p&gt;Getting ruby-prof to compile on Windows required a few of the usual changes.
  For example, making sure that the extension's initialization method is properly
  exported using __declspec(dllexport), etc.&lt;/p&gt;
&lt;p&gt;However, ruby-prof provides an extra twist.  It can measure time in several ways including using some low-level functionality provided by more
  recent Pentium and PowerPC processors. To access this information it uses this
  inline assembly call:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;
&lt;span class="keyword"&gt;static&lt;/span&gt;&lt;span class="normal"&gt; prof_clock_t&lt;/span&gt;
&lt;span class="function"&gt;cpu_get_clock&lt;/span&gt;&lt;span class="symbol"&gt;()&lt;/span&gt;
&lt;span class="cbracket"&gt;{&lt;/span&gt;
&lt;span class="preproc"&gt;#if&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="function"&gt;defined&lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="normal"&gt;__i386__&lt;/span&gt;&lt;span class="symbol"&gt;)&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="type"&gt;unsigned&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="type"&gt;long&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="type"&gt;long&lt;/span&gt;&lt;span class="normal"&gt; x&lt;/span&gt;&lt;span class="symbol"&gt;;&lt;/span&gt;
&lt;span class="normal"&gt;    __asm__ &lt;/span&gt;&lt;span class="function"&gt;__volatile__ &lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="string"&gt;&amp;quot;rdtsc&amp;quot;&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="symbol"&gt;:&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;&amp;quot;=A&amp;quot;&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="normal"&gt;x&lt;/span&gt;&lt;span class="symbol"&gt;));&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="keyword"&gt;return&lt;/span&gt;&lt;span class="normal"&gt; x&lt;/span&gt;&lt;span class="symbol"&gt;;&lt;/span&gt;
&lt;span class="preproc"&gt;#elif&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="function"&gt;defined&lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="normal"&gt;__powerpc__&lt;/span&gt;&lt;span class="symbol"&gt;)&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="symbol"&gt;||&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="function"&gt;defined&lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="normal"&gt;__ppc__&lt;/span&gt;&lt;span class="symbol"&gt;)&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="type"&gt;unsigned&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="type"&gt;long&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="type"&gt;long&lt;/span&gt;&lt;span class="normal"&gt; x&lt;/span&gt;&lt;span class="symbol"&gt;,&lt;/span&gt;&lt;span class="normal"&gt; y&lt;/span&gt;&lt;span class="symbol"&gt;;&lt;/span&gt;

&lt;span class="normal"&gt;    __asm__ &lt;/span&gt;&lt;span class="function"&gt;__volatile__ &lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="string"&gt;&amp;quot;&lt;/span&gt;&lt;span class="specialchar"&gt;\n&lt;/span&gt;&lt;span class="string"&gt;\&lt;/span&gt;
&lt;span class="string"&gt;1:	mftbu   %1&lt;/span&gt;&lt;span class="specialchar"&gt;\n&lt;/span&gt;&lt;span class="string"&gt;\&lt;/span&gt;
&lt;span class="string"&gt;	mftb    %L0&lt;/span&gt;&lt;span class="specialchar"&gt;\n&lt;/span&gt;&lt;span class="string"&gt;\&lt;/span&gt;
&lt;span class="string"&gt;	mftbu   %0&lt;/span&gt;&lt;span class="specialchar"&gt;\n&lt;/span&gt;&lt;span class="string"&gt;\&lt;/span&gt;
&lt;span class="string"&gt;	cmpw    %0,%1&lt;/span&gt;&lt;span class="specialchar"&gt;\n&lt;/span&gt;&lt;span class="string"&gt;\&lt;/span&gt;
&lt;span class="string"&gt;	bne-    1b&amp;quot;&lt;/span&gt;
&lt;span class="normal"&gt;	&lt;/span&gt;&lt;span class="symbol"&gt;:&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;&amp;quot;=r&amp;quot;&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="normal"&gt;x&lt;/span&gt;&lt;span class="symbol"&gt;),&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;&amp;quot;=r&amp;quot;&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="symbol"&gt;(&lt;/span&gt;&lt;span class="normal"&gt;y&lt;/span&gt;&lt;span class="symbol"&gt;));&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="keyword"&gt;return&lt;/span&gt;&lt;span class="normal"&gt; x&lt;/span&gt;&lt;span class="symbol"&gt;;&lt;/span&gt;
&lt;span class="preproc"&gt;#endif&lt;/span&gt;
&lt;span class="cbracket"&gt;}&lt;/span&gt;
&lt;/tt&gt;&lt;/pre&gt;


&lt;p&gt;For x86 chips, what it does is execute the &lt;a href="http://en.wikipedia.org/wiki/RDTSC"&gt;rdtsc&lt;/a&gt; instruction, which returns
  the processor's running count of clock cycles.   So if you call cpu_get_clock,
  wait 1 second, and call cpu_get_clock again, you can calculate the chip's clock
  frequency. Using this information, you can time method calls. For instance,
  if the chip's frequency is 500 MHz and a method takes 250,000,000 cycles to complete,
  you can calculate it took 0.5 seconds.&lt;/p&gt;
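&lt;p&gt;The arithmetic is simple enough to spell out in Ruby. The two helpers below are hypothetical names for this illustration; Ruby has no direct access to the cycle counter, so the counter readings are passed in as plain numbers.&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;
```ruby
# Convert a cycle count to seconds given the clock frequency in Hz.
def cycles_to_seconds(cycles, frequency_hz)
  cycles.to_f / frequency_hz
end

# Estimate the frequency from two counter readings taken interval_s apart.
def estimate_frequency(first_reading, second_reading, interval_s)
  (second_reading - first_reading) / interval_s.to_f
end

# The example from the text: 250,000,000 cycles at 500 MHz.
puts cycles_to_seconds(250_000_000, 500_000_000)      # prints 0.5

# Two made-up readings one second apart give the frequency in Hz.
puts estimate_frequency(1_000_000, 501_000_000, 1.0)  # prints 500000000.0
```
&lt;/tt&gt;&lt;/pre&gt;
&lt;p&gt;The conversion is only as good as the frequency estimate: if the real clock rate differs from the calibrated one, every reported time is off by the same ratio.&lt;/p&gt;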
&lt;p&gt;This of course won't work with Visual C++ because it uses its own syntax for
  inline assembly calls. In this case there are a couple of ways of porting this code.
  Newer versions of Visual C++ support compiler intrinsics, and there is one for
  &lt;a href="http://msdn2.microsoft.com/en-us/library/twchhe95.aspx"&gt;rdtsc&lt;/a&gt;.  However,
  I thought it would be better to use inline assembly to support older versions.
   Here's the code:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;&lt;span class="keyword"&gt;static&lt;/span&gt;&lt;span class="normal"&gt; prof_clock_t&lt;/span&gt;
&lt;span class="function"&gt;&lt;/span&gt;&lt;span class="function"&gt;cpu_get_clock&lt;/span&gt;&lt;span class="symbol"&gt;()&lt;/span&gt;
&lt;span class="cbracket"&gt;{&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="type"&gt;prof_clock_t&lt;/span&gt;&lt;span class="normal"&gt; cycles &lt;/span&gt;&lt;span class="symbol"&gt;=&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="number"&gt;0&lt;/span&gt;&lt;span class="symbol"&gt;;&lt;/span&gt;

&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="keyword"&gt;__asm&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="cbracket"&gt;{&lt;/span&gt;
&lt;span class="normal"&gt;        rdtsc&lt;/span&gt;
&lt;span class="normal"&gt;        mov DWORD PTR cycles&lt;/span&gt;&lt;span class="symbol"&gt;,&lt;/span&gt;&lt;span class="normal"&gt; eax&lt;/span&gt;
&lt;span class="normal"&gt;        mov DWORD PTR &lt;/span&gt;&lt;span class="symbol"&gt;[&lt;/span&gt;&lt;span class="normal"&gt;cycles &lt;/span&gt;&lt;span class="symbol"&gt;+&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="number"&gt;4&lt;/span&gt;&lt;span class="symbol"&gt;],&lt;/span&gt;&lt;span class="normal"&gt; edx&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="cbracket"&gt;}&lt;/span&gt;
&lt;span class="normal"&gt;    &lt;/span&gt;&lt;span class="keyword"&gt;return&lt;/span&gt;&lt;span class="normal"&gt; cycles&lt;/span&gt;&lt;span class="symbol"&gt;;&lt;/span&gt;
&lt;span class="cbracket"&gt;}&lt;/span&gt;
&lt;/tt&gt;&lt;/pre&gt;
To use this timing method you have to specifically enable it by including the
following line in your Ruby code:

&lt;pre&gt;&lt;tt&gt;&lt;span class="normal"&gt;ENV&lt;/span&gt;&lt;span class="symbol"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"RUBY_PROF_CLOCK_MODE"&lt;/span&gt;&lt;span class="symbol"&gt;]&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="symbol"&gt;=&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;"cpu"&lt;/span&gt;
&lt;span class="preproc"&gt;require&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;'unprof'&lt;/span&gt;
&lt;/tt&gt;&lt;/pre&gt;
 &lt;p&gt;However, I can't say this works very well. The calculated
   frequency for my chip is always different. I don't know why - my best guess is that
   it's a Pentium M with Intel's SpeedStep technology so the clock frequency varies
   to save power.  However, I'm usually plugged in so I don't think that's it. Note
   you can tell ruby-prof your clock frequency like this:&lt;/p&gt;
&lt;pre&gt;&lt;tt&gt;&lt;span class="normal"&gt;ENV&lt;/span&gt;&lt;span class="symbol"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"RUBY_PROF_CLOCK_MODE"&lt;/span&gt;&lt;span class="symbol"&gt;]&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="symbol"&gt;=&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;"cpu"&lt;/span&gt;
ENV&lt;span class="symbol"&gt;[&lt;/span&gt;&lt;span class="string"&gt;"RUBY_PROF_CPU_FREQUENCY"&lt;/span&gt;&lt;span class="symbol"&gt;]&lt;/span&gt;&lt;span class="symbol"&gt;=&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;"466000000"&lt;/span&gt;
&lt;span class="preproc"&gt;require&lt;/span&gt;&lt;span class="normal"&gt; &lt;/span&gt;&lt;span class="string"&gt;'unprof'&lt;/span&gt;&lt;/tt&gt;&lt;/pre&gt; 
&lt;p&gt;So my recommendation is just use the default ruby-prof timing method - it does
  the job perfectly well.&lt;/p&gt;

&lt;div class="extended"&gt;These changes have been merged into &lt;a href="http://rubyforge.org/projects/ruby-prof/"&gt;ruby-prof-0.4.0&lt;/a&gt; so I've taken them offline&lt;/div&gt;

</description>
      <pubDate>Fri, 09 Jun 2006 13:14:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:fe6a3c87-36f8-4f7b-a014-e6da1fea7ce4</guid>
      <comments>http://cfis.savagexi.com/2006/06/09/porting-ruby-perf-to-windows#comments</comments>
      <category>Rails</category>
      <category>Ruby</category>
      <category>ruby-prof</category>
      <category>Tools</category>
      <trackback:ping>http://cfis.savagexi.com/trackbacks?article_id=221</trackback:ping>
      <link>http://cfis.savagexi.com/2006/06/09/porting-ruby-perf-to-windows</link>
    </item>
  </channel>
</rss>
