Reusing Rails Controllers is a Good Thing

Posted by Charlie Mon, 11 Feb 2008 18:11:00 GMT

If there were such a thing as the Ten Commandments of programming, code reuse would surely be included. Now you're probably thinking I've lost my mind, because any good developer knows that code reuse is a pipe dream. But that's because you are thinking on the macro level and not the micro level. At the micro level, code reuse has altogether more pleasant acronym, DRY, or do not repeat yourself.

The tussle in the Rails community over components illustrated this tension well. On one extreme, advocates dreamed of creating plug-and-play components that could be reused across multiple applications. On the other extreme, detractors sneered at code reuse as a hopeless endeavor and vowed to strip out all traces of components from the the Rails 2.0 release. Left out in the cold was the view that reusing code, specifically controllers, within a single application was a good thing.

Reusing Controllers

Rails is built using the Model-View-Controller pattern, which is designed to segment an application into controllers, models and views. Rails also adds in the concept of filters, which are pieces of code that run before and after controllers.

The Rails community encourages reuse of models, filters and views, but seems to actively discourage the reuse of controllers if the Ruby on Rails book is any indication:

When Rails was initially released, it came with a system for creating components. Unfortunately, the implementation of components left a lot to be desired: performance was poor, and there were unanticipated side effects. As a result,
components are being phased out.

Instead, the common wisdom now is to synthesize component-like functionality using a combination of before filters and partials. Use the before filter to set up the context for the partial, and then render the fragment you want using a regular render :partial call.

Like much conventional wisdom, this advice is hogwash.

Filters + Partials != Controllers

The problem with just using filters and partials is that it only applies to a subset of web applications. A good example, and of course the one used in the Ruby on Rails book, is a shopping website. The focal point of most shopping websites is a shopping cart. The point of the application is to make it easy for users to add things to the cart, modify the cart and hopefully buy the contents of the cart. Since the cart plays such a crucial role, it often make sense to have a filter that setups the cart so each controller has easy access to it. A nice side affect of this approach, is that views also have access to the cart, as described in the quote above.

But for many other types of applications, filters and partials can't make up for controllers. For example, take a look at the Boulder community on MapBuzz. The top-left side of the page is rendered by the community controller, the comments on the bottom left by a comment controller and the map listings on the right are by a map browser controller. If you log-in, then a couple of additional tabs are added to the page, each rendered by its own controller.

This type of composition is quite common in Web 2.0 applications. Take most social networking sites - they'll mix together news feeds, discussion boards, friends/friends lists, pictures,etc., in a variety of different ways depending on the current page.

The design problem is that any given controller can be called in two different contexts:

  • When the whole page is rendered via a Browser page refresh
  • When just the controller is rendered via an Ajax call

Trying to do this with just filters and helpers is a non-starter, because you end up with one big controller that needs to run different filters depending on the context of the call.

The better approach is to divide your controllers into logical units, and then have a separate page controller for the entire page. When the page is rendered, the page controllers should delegate rendering the various sub-parts (for example, tabs) of the page to the appropriate controller. When just one of the sub-parts of the page needs to be rerendered, due to an Ajax call, then you directly call the appropriate controller.

Performance

In Rails, a controller or view can call another controller using the much maligned render_component method. Part of the problem is the method is misnamed. It no longer has anything to do with rendering components - instead its used to invoke another controller. Therefore, it would be more appropriately named render_controller, call_controller, invoke_controller, etc.

Assuming you agree with my so far, reading the Rails documentation for render_component with certainly give you pause:

Components should be used with care. They‘re significantly slower than simply splitting reusable parts into partials and conceptually more complicated. Don‘t use components as a way of separating concerns inside a single application. Instead, reserve components to those rare cases where you truly have reusable view and controller elements that can be employed across many applications at once.

So to repeat: Components are a special-purpose approach that can often be replaced with better use of partials and filters.

Undoubtedly this was true once upon a time. Is it still? There is one way to find - run a test. I created a new Rails application using the built-in generators and then added the following simple code:

controller/main_controller.rb

class MainController < ApplicationController
  def get_without_controller
    a = 1
  end
  
  def get_with_controller
    a = 1
  end
end

controller/sidebar_controller.rb

class SidebarController < ApplicationController
  def get
    render(:partial => 'sidebar/content')
  end
end

views/main/get_without_controller.html.erb

<p>Some fun content goes here</p>
<div class="sidebar">
  <%= render(:partial => 'sidebar/content') %>
</div>

views/main/get_with_controller.html.erb

<p>Some fun content goes here</p>
<div class="sidebar">
  <%= render_component(:controller => SidebarController,
                       :action => 'get') %>
</div>

views/sidebar/_content.html.erb

<p>Hi there</p>

There are two paths through this application:

  • GET '/main/get_without_controller.rb'
  • GET '/main/get_with_controller.rb'

In case its not obvious, get_without_controller users render(:partial) to include the sidebar content while get_with_controller uses render_component. Using both benchmark and ruby-prof, I ran each method 100 times using a souped up integration test (more about that in a future post). The results, using Rails 2.02 on Ruby 1.8.4 on WindowsXP on a Pentium M laptop (about 3 years old) are:

Method 100 Requests (s) 1 Request (s)
get_without_controller 0.30 0.0030
get_with_controller 0.45 0.0045

 

So using components is 50% slower, but the overhead is a miniscule 0.0015 seconds per request. That overhead is obviously lost in a real application. Of course you have to be careful when using render_component to not try and do to much per HTTP request - but the same is true using filters and partials.

DRY Up Your Controllers

In truth, render_component is the most primitive way imaginable of reusing controllers. But it does let you to DRY up your Rails application by letting you create more cohesive controllers that can be reused within a single website. For most websites you won't need this functionality, but when you do, there isn't a substitute for it and don't let anyone browbeat you into thinking there is.

Posted in ,  | 13 comments | 2 trackbacks

Rails and Postgresql - Eliminate Hundreds of Thousands of Queries a Day

Posted by Charlie Fri, 08 Feb 2008 18:16:00 GMT

Update - It turns out that Rails does cache column data dictionary queries (which is what you would expect), but not for :has_and_belongs_to_many associations (HABTMA). I know those are "old fashioned," but they fit our data model perfectly in a couple of places. So be warned - using just a couple of HABTMA associations will generate a huge number of data dictionary queries.

As part of monitoring the performance of MapBuzz, we run a nifty little program called PgFouine to analyze the postgresql log files every night. PgFouine summarizes the most common queries and slowest queries. Here is our data from yesterday:

Most frequent queries (N)

Rank Times executed Total duration Av. duration (s) Query
1 140,115 2m42s 0.00 SELECT a.attname, format_type(a.atttypid, a.atttypmod), d.adsrc, a.attnotnull FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = ''::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum;

Besides being the most run query, this was also the seventh most time consuming query.

I was shocked the first time I saw this many months ago - my first guess was that we were not caching classes in our production mode. But it turned out that wasn't the problem. Its just Rails being silly - every time it loops over a model's columns it strikes up a conversation with the database. That happens a fair bit - when you use :include to add additional tables to a query, when you use dynamic finders (find_by_x), when you use relationships setup by :has_many, :has_and_belongs_to, etc. And this isn't the only place Rails is wasteful - it constantly queries the database for table names and indices - it just happens those queries don't run nearly as often.

Rails Plugin

Clearly there is no reason to do this in a production environment, and in truth, I don't see much reason to do it in a development environment either. So yesterday I finally go around to patching Rails and submitting a bug report. The patch caches data dictionary queries for the Postgresql adapter. After loading the patch, Rails still supports the ability to add tables to your database at runtime, but no longer supports adding or removing columns from a table at runtime or recycling table names. If these things are important to you, the patch also provide a flush_dd_cache method that flushes the query cache. An obvious alternative solution is to add a cache_dd_info class variable to ActiveRecord::Base, which would be off by default but on in production. However, I'm skeptical there is a need for such a flag.

On a per request basis, you won't see much performance gain from the patch in your Rails application servers, but it will remove needless load from your database. And while you are waiting for Rails to be patched (if it is patched), feel free to download the Rails plugin we are using to solve the problem. Note the plugin also fixes two other ActiveRecord bugs, which are its incorrect handling of Postgresql schemas and its ignoring of views.

Posted in  | 5 comments | no trackbacks

Making Rails Better - Fixing Architecture Flaws in Active Record

Posted by Charlie Sun, 12 Aug 2007 02:19:00 GMT

ActiveRecord is a funny thing. On the outside it looks great - it neatly maps relational data to Ruby objects and provides an easy to use API via its domain specific language. But on the inside, it contains two suprising architecture flaws that make it difficult to extend and negatively impact performance.

The Vietnam of Computer Science

Mapping object to relational data turns out to be quite tricky. There are so many failed object-relational mapping systems that the whole field has been called the Vietnam of Computer Science. The problem is that objects and tables don't map cleanly to each other, and the more you try to automate the process the more complex your code becomes, and sooner or later your system becomes too hard and too slow to use.

The approach that I think works best, and ActiveRecord follows, is too keep things relatively simple. An record in a table is mapped to an object, and any related tables are mapped to associations that contain one or more object (depending if the relation is one to one or one to many). And that's it, anything past that risks descending into the morass of failed object-relational mappings.

ActiveRecord gets bonus points because it makes it easy to define such mappings via its Domain Specific Language (DSL) - the familiar methods :has_one, :has_many, etc.

Botching Columns

But underneath its exterior, ActiveRecord has a couple of architecture flaws in the way that it handles columns (attributes).

The first issue is that ActiveRecord botches its implementation of columns. Reading and writing data from a database requires converting the data from its textual representation (provided by the database's client APIs) to and from Ruby objects. Let's look at how Rails does it for Postgresql:

A quick glance at some code shows the problem:

def translate_field_type(field_type)
  case field_type
    when /\[\]$/i  then 'string'
    when /^timestamp/i    then 'datetime'
    when /^real|^money/i  then 'float'
    when /^interval/i     then 'string'
    when /^(?:point|lseg|box|"?path"?|polygon|circle)/i  then 'string'
    when /^bytea/i        then 'binary'
    else field_type       # Pass through standard types.
  end
end

def default_value(value)
  # Boolean types
  return "t" if value =~ /true/i
  return "f" if value =~ /false/i

  # Char/String/Bytea type values
  return $1 if value =~ /^'(.*)'::(bpchar|text|character varying|bytea)$/

  # Numeric values
  return value if value =~ /^-?[0-9]+(\.[0-9]*)?/

  # Fixed dates / times
  return $1 if value =~ /^'(.+)'::(date|timestamp)/

  # Anything else is blank, some user type, or some function
  # and we can't know the value of that, so return nil.
  return nil
end

Having large case statements in an object-oriented language is a sure sign your design is flawed. The fundamental problem is that the implementation above is not extensible - you can't easily add your own field types.

You could argue that that extensibility was not a design goal of ActiveRecord, but that would be silly. Even if you agreed that ActiveRecord should only support a few limited data types (which I don't) there are still enough differences between databases that having an extensible system would clean up the internals of ActiveRecord and get rid of the grungy code above.

And more importantly, it would let users add their own data types. And that is important. For example, with MapBuzz we need to support Postgres's geometry types and we would also like to support its full text search types. Overriding Rails to support them is an exercise in annoyance since it requires overriding various core methods in the Postgresql adapter.

The way this should have been implemented is introducting a Column object. The column object's api would be simple - it would have a serialize and derialize method. Note that ActiveRecord does indeed have a column object, but its very weird implementation. For example:

def klass
  case type
    when :integer       then Fixnum
    when :float         then Float
    when :decimal       then BigDecimal
    when :datetime      then Time
    when :date          then Date
    when :timestamp     then Time
    when :time          then Time
    when :text, :string then String
    when :binary        then String
    when :boolean       then Object
  end
end

This code is clearly trying to be much too clever. Keep it simple stupid! There should be a TimeColumn, FloatColumn, etc. That way, a developer can add their own column types - so for us a GeomColumn.

Attributes

The second issue, which is related, is the way that column values are handled. ActiveRecord stores data read from a database in a hash table called attributes. But suprisingly, the attributes hash table is also used to store Ruby objects. Thus the data stored in the attributes hash table may either be a Ruby object (in serialized format) or the text returned from the database (unserialized).

This is a horrible design for two main reasons.

First, it means that every time an attribute is accessed there has to be code to check to see if its its in string format not. If it is, the data must be converted to Ruby, which causes a performance hit.

Second, it means that ActiveRecord cannot keep track of which attributes have changed and which have not. That's important, because it means that ActiveRecord updates every column even when just one column changes. Besides being a performance hit, it means that ActiveRecord will corrupt your database if you are not careful. That happens when a table contains a column type that ActiveRecord is not familiar with - a good example being a ts_vector field in Postgresql. ActiveRecord will attempt to update it using the wrong value although the column hasn't changed at all.

So what's a better solution? A pure object-oriented solution would introduce a Field object, which has four fields - the raw value (from the database), the serialized value (the ruby object), a reference to the column object which knows how to serialize/deserialize the field and a changed column.

But that's pretty heavy-weight since you're introducing an extra object per field per record. An alternate solution would be too introduce three hash tables per record - one to hold the raw values, one to hold the serialized values and one to hold the changed flag. You would also want to store references to the records columns, most likely on the class itself (so if your table is called parents, then store the column information on the Parent class).

Fixing Active Record

The good news is that the Rails team is looking at these issues. In particular, Michael Koziarski has recently posted a patch that introduces the concept of a separate hash table to store serialized values. So check out the patch, and be sure to offer Michael your comments!

Posted in  | 10 comments | no trackbacks

Making Rails Go Vroom

Posted by Charlie Wed, 18 Jul 2007 17:12:00 GMT

Last week I showed how to profile a Rails application using ruby-prof. The post was driven by our desire to improve MapBuzz's map rendering speed. Our nightly log analysis showed that rendering speed has significantly degraded over the last few weeks. Since the whole point of MapBuzz is to share maps with others, we needed to find out what was taking so long.

A day of profiling and we had our answer. With a few simple changes we were able to reduce rendering time from 7.8 seconds to 0.92 seconds (note these times are gathered while profiling the application, so the real times are 2 to 3 times less).

What we found should be generally applicable to all Rails applications, so I'd like to share them:

If you interested in the nitty-gritty details of each change, keep reading!

A Bit of Background

MapBuzz renders maps in a two step process. First, the html page is created and shown to the user. Second, the browser makes an Ajax request to get the features that should be shown on the map. The reason for having a two step process is that it allows users to zoom and pan around the map, without having to reload the whole page.

Log analysis revealed that the Ajax request was taking a long time - 1 to 2 seconds, with a big standard deviation. Our goal is to reduce the average time to less than half a second and to significantly reduce the standard deviation.

If you look at the request URI, you'll see its an Atom feed of features and not an HTML page. That has performance implications.

First, Atom feeds tend to have lots of links. In our case, each entry has 10 to 15 links. We limit the number of entries to 50, so that means there are roughly 600 link per page.

Second, we generate Atom feeds in a modular fashion - each entry is created from 10 partials. There is a partial for the user, one for ratings, one for content, one for icons, one for geometries, etc. Why did we do it this way? Simple - to keep things DRY. Having a set of partials creates an extremely modular system - we can mix and match them as needed depending on the context of the request.

Setup

Testing was done with two machines. The first was my laptop, which is a Dell Latitude D610 running windows, Ruby 1.8.4, and Rails 1.2.3. The second is our staging server, which is fairly high end Dell Desktop running Fedora Core 6 with a copy of our production database (which is about 10Gb). Profiling was done with ruby-prof 0.5.1 (of course), using the Rails plugin I described previously. Note that the logging level was set to :debug, which will show up in the test results.

Baseline

To start we measured the baseline performance of our test setup. It was ugly indeed:

Total: 7.828
%self total self wait child calls name 24.11 1.95 1.89 0.00 0.06 3605 Kernel#clone 13.15 1.04 1.03 0.02 0.00 569 IO#write 7.78 4.13 0.61 0.00 3.52 441 <Module::Benchmark># realtime-1 7.41 0.58 0.58 0.00 0.00 125 PGconn#exec

One bit of solace is that profiling an application will slow it down 2 to 3 times, but that sill leaves us with a 4 seconds per request. So let's start digging through the results.

Don't Use ActiveRecord:attributes

The first thing that jumps out is the huge amount of time Kernel#clone takes. A look at the call graph shows that the caller is clone_attribute_value.

And a bit more digging reveals this custom code in our application:
def rating
  # Convert cached rating from BigDecimal to float,
  # otherwise strange rounding errors happen
  attributes['rating'].to_f
end

The problem is that when you call ActiveRecord#attributes it returns a copy of the attributes, thus generating all the clone calls. Uuggh. I think this is a result of ActiveRecord's flawed attribute implementation, but that is for another blog.

What we want is access to the untyped, original value of the attribute. In theory you are supposed to use the auto generated method, rating_before_type_cast, for this. This has the advantage of skipping the clone call, but it relies on method_missing, which has some overhead (we did not measure how much). Or, you could use read_attribute directly, which also skips the clone call. This is usually your best choice.

For performance critical code, you may wish to read the untyped attribute directly (see the discussion about time below for an example). That can be done using the method read_attribute_before_type_cast, except that it is private. However, with Ruby, that's easily solved:

module ActiveRecord
  class Base
    public :read_attribute_before_type_cast
  end
end

Lesson - Don't use attributes

Gain - 24%

Use include

Let's look at our updated results:

Total: 3.39

 %self     total     self     wait    child    calls  name
 16.73      0.57     0.57     0.00     0.00      569  IO#write
 12.83      0.44     0.44     0.00     0.00      125  PGconn#exec
  8.76      1.77     0.30     0.00     1.48      441  <Module::Benchmark>#
	                                                    realtime-1
  5.99      0.22     0.20     0.00     0.01       40  PGconn#query

What stands out are the 125 calls to PGconn#exec - each one is a separate query. Looks like we forgot to specify a table or two to be eager loaded via ActiveRecord#find's :include option.

Note there is one downside to using :include - you lose the ability to use the :select option. For us that is important - and can be worked around by using the rails select_parser I've blogged about before.

Lesson - Get your :includes right

Gain - 12%

cache_template_loading=true

After fixing attributes and includes, here is our current timings:

Total: 2.594

 %self     total     self     wait    child    calls  name
 12.64      1.14     0.33     0.00     0.81      361  <Module::Benchmark>#
                                                      realtime-1
 12.61      0.33     0.33     0.00     0.00      409  IO#write
  4.90      0.13     0.13     0.00     0.00      403  <Class::File>#mtime
  4.86      0.16     0.13     0.00     0.03    30250  Hash#[]
  3.62      0.09     0.09     0.00     0.00        5  PGconn#exec

A couple of things jump out. First, using Benchmark#realtime is fairly time consuming. Second, logging also takes its toll as seen in the times for IO#write (remember we have logging set to :debug).

However, the #mtime call looks suspicious. Some more digging through Rails shows all 403 calls come from ActionView::Base#compile_template?. Let's take a look:

def compile_template?(template, file_name, local_assigns)
  method_key    = file_name || template
  render_symbol = @@method_names[method_key]

  if @@compile_time[render_symbol] &&
	   supports_local_assigns?(render_symbol, local_assigns)
    if file_name && !@@cache_template_loading 
      @@compile_time[render_symbol] < File.mtime(file_name) ||
        (File.symlink?(file_name) && 
        (@@compile_time[render_symbol] < File.lstat(file_name).mtime))
    end
  else
    true
  end
end

Remember I mentioned that it takes roughly 500 partial calls to generate the output? This is where it bites us - by default Rails checks the timestamp of cached templates before running them. In a production environment that is totally unnecessary. The solution is simple - just update your production.rb file like this:

# Don't check view timestamps!
config.action_view.cache_template_loading = true

And while we are at it, we'll change the debug level to :info to reduce its impact on the results.

Lesson - Set cache_template_loading to true

Gain - 5%

Don't use url_for

We've managed to reduce the request time from 7.8 to 2.5 seconds. Not bad. But we still have more work to do.


Total: 2.531

 %self     total     self     wait    child    calls  name
 13.59      1.91     0.34     0.00     1.56       45  <Module::Benchmark>#
                                                      realtime
 12.25      0.31     0.31     0.00     0.00      409  IO#write
  4.98      0.81     0.13     0.00     0.69      361  <Module::Benchmark>#
                                                      realtime-1
  4.94      0.13     0.13     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasManyThroughAssociation#
                                                      construct_sql
  3.71      0.09     0.09     0.00     0.00        5  PGconn#exec
  3.04      0.08     0.08     0.00     0.00    29847  Hash#[]
  3.00      0.11     0.08     0.00     0.03     2097  Hash#each
  1.86      0.28     0.05     0.00     0.23      322  ActionController::Routing::
                                                      RouteSet#generate

Of particular interest is the ActionController::Routing::RouteSet#generate. Although it only runs for 0.05 seconds, its total time, including children, is 0.28. Thus creating 322 urls takes 11% of the request time.

People have previously mentioned how slow url_for is - well, its true. Instead of using url_for, just create your urls manually using string mashing. For example, if you want an absolute uri, then do this:

"#{request.protocol}#{request.host_with_port}/controller/#{map.id}"

Lesson - Don't use url_for or link_to

Gain - 11%

Don't let Rails parse timestamps

After removing url_for, we've smashed the 2 second barrier:

Total: 1.391

 %self     total     self     wait    child    calls  name
  9.06      0.13     0.13     0.00     0.00      726  String#sub!
  8.99      0.13     0.13     0.00     0.00       40  ActionView::Base::CompiledTemplates#
	                                                    _run_ratom_47app47views47geometry47_geometry46ratom
  6.69      0.09     0.09     0.00     0.00        5  PGconn#exec
  5.54      0.08     0.08     0.00     0.00      741  Item#id
  5.54      0.30     0.08     0.00     0.22      121  <Class::Date>#_parse

Next up is <Class::Date>#_parse, whose total time, including children, is a whopping 21% of the time. Ouch. If you take a look at the code (its in the standard ruby library), you can see its quite complicated since it tries to parse a variety of date formats. In addition, it makes use of rational numbers, which are none too fast.

Taking a look at a graph profile, we see that <Class::Date>#_parse is called from <Class::ActiveRecord::ConnectionAdapters::Column>#string_to_time. Thus Rails uses it to convert timestamps received from the database into Ruby objects.

As I detailed in a recent post, the problem is that Ruby's DateTime implementation is extremely slow compared to its Time implementation. The solution is to avoid DateTime's entirely and use custom time parsing code. Fortunately, our database can output time in ISO 861 format, which Time can quickly parse.

So, any place we access times, we simply roll our own attribute readers and writers like this (note that we don't use read_attribute!).

def created_on
  @created_on || Time.iso8601(read_attribute_before_type_cast('created_on'))
end
  
def created_on=(value)
  write_attribute(:created_on, value)
  @created_on = nil
end
  
def updated_on
  @updated_on || Time.iso8601(read_attribute_before_type_cast('created_on'))
end

Lesson - Don't let Rails parse timestamps - do it yourself

Gain - 20%

Don't symbolize keys

We've almost hit the 1 second barrier:

Total: 1.156

 %self     total     self     wait    child    calls  name
  8.13      0.11     0.09     0.00     0.01      642  Kernel#send-3
  8.13      0.09     0.09     0.00     0.00        5  PGconn#exec
  6.75      0.08     0.08     0.00     0.00     7497  String#concat
  4.15      0.05     0.05     0.00     0.00      817  Hash#each
  4.07      0.05     0.05     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasAndBelongsToManyAssociation#
                                                      construct_sql
  2.77      0.03     0.03     0.00     0.00     6360  String#match
  2.77      0.06     0.03     0.00     0.03       47  Array#each_index
  2.77      0.03     0.03     0.00     0.00      280  IconType#size
  2.60      0.03     0.03     0.00     0.00     4309  Array#[]
		

At this point, most of the easy wins are gone. However, there is at least one left. Although its not shown in the flat profile, the graph profile shows that the method ActiveSupport::CoreExtensions::Hash::Keys#symbolize_keys takes 2.77% of the time including its children. Additional rummaging through the call graph shows that the calls come from ActionView#compile_and_render_template:

def compile_and_render_template(extension, template = nil, 
			        file_path = nil, local_assigns = {})
  # convert string keys to symbols if requested
  local_assigns = local_assigns.symbolize_keys if 
	          @@local_assigns_support_string_keys

A bit of research shows that local_assigns_support_string_keys is deprecated and slated for removal from Rails! Sweet - that means we can save an additional 3% simply by adding this line to environment.rb:

config.action_view.local_assigns_support_string_keys = false

If only all optimizations were so simple.

Lesson - Don't symbolize keys

Gain - 3%

Wrapping Up

So what's our final time? Let's see:

Total: 0.922

 %self     total     self     wait    child    calls  name
 10.30      0.10     0.10     0.00     0.00        5  PGconn#exec
  3.47      0.17     0.03     0.00     0.14      312  Array#each-2
  3.47      0.03     0.03     0.00     0.00      212  ActiveRecord::ConnectionAdapters::
                                                      Quoting#quote
  3.47      0.03     0.03     0.00     0.00    19455  Hash#[]
  3.47      0.36     0.03     0.00     0.33      360  ActionView::Base#render-2
  3.36      0.06     0.03     0.00     0.03       47  Array#each_index
  3.36      0.03     0.03     0.00     0.00      299  ActiveRecord::Associations::
                                                      AssociationProxy#initialize
  3.36      0.14     0.03     0.00     0.11      705  ActiveRecord::Associations::
                                                      AssociationProxy#method_missing

Not bad - we reduced the request time from 7.8 to 0.92 seconds with a day's worth of work. Obviously there is still much to be done - we need to run the code through our performance monitoring suite and make sure that the average time holds across a number of requests and that the standard deviation is in line. But we've at least made a good start.

Posted in , , ,  | 27 comments | 7 trackbacks

Tick, tick, tick

Posted by Charlie Fri, 13 Jul 2007 17:05:00 GMT

While analyzing Rails performance, I stumbled upon the fact that parsing timestamps in ActiveRecord is very slow (I'll post more about that in an upcoming blog).

A bit of googling revealed that Charlie Bowman discovered this a year an a half ago. He has a nice post comparing the performance of Ruby's Time and DateTime classes. Sadly he didn't use ruby-prof to come up with his results, but we'll forgive him for that :)

Now you might reasonably assume that the Time class deals with, er, time and the DateTime class deals with date times, which we'll refer to as timestamps. Well, like me, you'd be wrong. Totally wrong.

It turns out that Time and DateTime both deal with timestamps. But they are implemented completely differently.

Time is a c extension that wraps your operating system's time functionality. It stores times as seconds from January 1st, 1970. Its biggest upside is performance. Its biggest downside is its limited range - times before 1970 are "out of range." I have to admit that doesn't make any sense to me since no other language I've used suffers from this problem and I'm fairly sure Windows and Linux don't either. But no matter.

DateTime is a pure ruby implementation of time that let's you store a much wider range of dates. It also supports both the Julian and Gregorian calendars - on the off hand chance you need such a thing.

Rails takes the safe route and converts timestamps to DateTimes. But there is a performance penalty to pay. Below is simple little script showing how much time it takes to parse a timestamp using 3 approaches - DateTime.parse, Time.parse and Time.xmlschema.

require 'date'
require 'ruby-prof'

# Use xml schema format
now = '2007-07-12T23:36:25.609375-06:00'

result = RubyProf.profile do 
  1000.times do
    time = DateTime.parse(now)
  end
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})

result = RubyProf.profile do 
  1000.times do
    time = Time.parse(now)
  end
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})


result = RubyProf.profile do 
  1000.times do
    time = Time.xmlschema(now)
  end
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})

And the results:


Total: 3.86

 %self     total     self     wait    child    calls  name
 14.15      0.83     0.55     0.00     0.28    19000  Integer#gcd
 10.91      2.16     0.42     0.00     1.74    19000  <Class::Rational>#reduce
  8.06      0.37     0.31     0.00     0.06    27000  Rational#initialize
  7.33      0.28     0.28     0.00     0.00     6000  String#sub!
  5.70      0.74     0.22     0.00     0.52    27000  Class#new
  4.87      0.94     0.19     0.00     0.75     1000  <Class::Date>#_parse
  3.68      0.14     0.14     0.00     0.00    29000  <Class::Object>#allocate
  3.65      2.30     0.14     0.00     2.16    19000  Object#Rational
  3.26      0.86     0.13     0.00     0.74    27000  <Class::Rational>#new!
  3.24      0.79     0.13     0.00     0.67     4000  Rational#+

---------------
Total: 1.359

 %self     total     self     wait    child    calls  name
 32.16      0.44     0.44     0.00     0.00     6000  String#sub!
  6.92      0.95     0.09     0.00     0.86     1000  <Class::Date>#_parse
  6.84      0.36     0.09     0.00     0.27     1000  <Class::Time>#make_time
  5.81      0.23     0.08     0.00     0.15     1000  Rational#/
  5.74      0.08     0.08     0.00     0.00     5000  Rational#initialize
  4.71      0.10     0.06     0.00     0.03     1000  <Class::Time>#apply_offset
  4.49      0.09     0.06     0.00     0.03     3000  Integer#gcd
  3.46      0.23     0.05     0.00     0.19     3000  <Class::Rational>#reduce
  3.46      1.36     0.05     0.00     1.31     1000  <Class::Time>#parse

---------------
Thread ID: 21259270
Total: 0.297

 %self     total     self     wait    child    calls  name
 37.04      0.14     0.11     0.00     0.03     1000  <Class::Time>#zone_offset
 15.82      0.30     0.05     0.00     0.25     1000  <Class::Time>#xmlschema
 10.44      0.08     0.03     0.00     0.05     1000  <Class::Time>#apply_offset
  5.39      0.02     0.02     0.00     0.00     1000  <Class::Time>#utc
  5.39      0.03     0.02     0.00     0.02     1000  <Class::Time>#month_days
  5.39      0.02     0.02     0.00     0.00     3000  Fixnum#*
  5.39      0.02     0.02     0.00     0.00     1000  Fixnum#-
  5.05      0.01     0.01     0.00     0.00     4000  Fixnum#divmod
  5.05      0.01     0.01     0.00     0.00     1000  String#upcase
  5.05      0.01     0.01     0.00     0.00     8000  String#to_i

DateTime.parse is slow because its a complicated piece of code that relies heavily on rational numbers. Time.parse is roughly 3 times faster, although if you look under the hood, it actually uses part of DateTime.parse. That seems a bit strange to me, but I didn't spend the time to investigate further.

The real winner here is Time.xmlschema which is a over 10 times faster than DateTime.parse. Why is it so much faster? Instead of a mashing a bunch of strings, it uses a regular expression to parse timestamps. Thus it trades flexibility for performance.

When talking to a database there is no need for flexibility - your database most likely outputs time in the exact same way all the time. So if you heavily use dates in your Rails application, then roll your own timestamp parsing.

Posted in ,  | 3 comments | 1 trackback

How to Profile Your Rails Application

Posted by Charlie Tue, 10 Jul 2007 16:58:00 GMT

Yesterday I mentioned that ruby-prof now supports profiling rails applications. Lets see how that works.

But a word of caution before beginning. Like all web applications, Rails apps are complex. They encompass many pieces - clients on remote machines, network connections, any number of application servers and database servers. Getting the whole ensemble to work can be tricky - let alone making it perform well.

To avoid drowning in complexity (and data), its crucial to use a rigorous approach to analyzing your application's performance. For a good example of how to approach the problem, take a look at Zed Shaw's analysis of Ruby/Odeum's performance.

Ok - enough lecturing - onto the good stuff.

Installation and Setup

Assume that after much hard work, you've identified a request that takes too long (where of course you have to define what too long is). In addition, you know that the culprit is your Rails code - not the client, not the network and not the database.The trick is to find out what is taking so long - and there is where ruby-prof helps.

First install ruby-prof by opening a command line and typing:

gem install ruby-prof

Next, copy ruby-prof's Rails plugin to your Rails plugin directory:

cp rails_plugin rails_app/vendor/plugins

Finally create a new Rails configuration with the following setting:

config.cache_classes = true

If you don't cache classes, then the time Rails spends reloading and compiling your models and controllers will overwhelm anything else your code does. And obviously try to make your test setup as similar as possible to your production environment - use a copy of the same database, use the same web server, etc.

Profiling a Request

Once you've completed your setup, and started your Rails server, make a single request. Obviously for real testing you'll need to make many requests, but let's start with one to see how ruby-prof works.

To make this as realistic as possible, I'm going to show some old profiling data for MapBuzz, which is a social mapping site I started that let's you see what's going on in your neighborhood.

One of the things that is important to us is how long it takes to render the home page - so let's make a request to the home page. Once the request is complete, the results are output in the Rails log file.

Completed in 0.71900 (1 reqs/sec)
Rendering: 0.37600 (52%) | DB: 0.18700 (26%)
200 OK [http://localhost/]

Thread ID: 175635930
Total: 0.719

 %self     total     self     wait    child    calls  name
 26.15      0.19     0.19     0.00     0.00       23  PGconn#exec
 26.01      0.19     0.19     0.00     0.00       27  PGresult#each
  4.45      0.03     0.03     0.00     0.00     1403  String#match
  4.45      0.03     0.03     0.00     0.00      171  IO#write
  2.23      0.02     0.02     0.00     0.00      872  String#to_s
  2.23      0.02     0.02     0.00     0.00      467  String#gsub
  2.23      0.03     0.02     0.00     0.02       76  ManyParser#_parse
  2.23      0.05     0.02     0.00     0.03       60  MonitorMixin
                                                      #synchronize
  2.23      0.02     0.02     0.00     0.00       60  Parsers#sequence
  2.23      0.02     0.02     0.00     0.00     1640  ParseContext#eof
  2.23      0.02     0.02     0.00     0.00       31  ActionView::
                                                      Partials
                                                      #add_object_to_
                                                      local_assigns!
  2.23      0.02     0.02     0.00     0.00       11  ActionView::
                                                      Helpers::
                                                      UrlHelper#link_to
  2.23      0.02     0.02     0.00     0.00      243  Object#add_error
  2.23      0.02     0.02     0.00     0.00      175  Kernel#clone

The first line shows the overall time as measured by the benchmark class used by Rails. Below that, you'll see a flat profile generated by ruby-prof that shows the slowest methods. By default methods that take more than 1% of the time will be printed - for brevity's sake I've culled it to 2%.

A quick scan of the results shows some interesting things. First, the most time is spent in PGconn#exec, which is used to query the database via a C extension. Most likely there isn't much to optimize there on the Rails side, although there certainly may be room for optimization on the database side.

Next, PGconn#each is likely looping over the query results. Its time seems excessive, but we'll worry about that later since we notice there are 1,403 calls to String#match. That seems overly excessive - what is causing that?

Digging Deeper

Flat profiles are great for providing an overview of where time is spent in your program, but they leave out a lot of useful information. In this case, we'd like to know what code is calling String#match. To do that we need information provided by graph profiles.

To create a graph profile, open up vendor/ruby-prof/lib/profiling.rb, and edit it like this:

module ActionController #:nodoc:
  module Profiling #:nodoc:
    def perform_action_with_profiling
      if RubyProf.running? or
        ...
      else
        ...
       # Example for Graph html printer
        printer = RubyProf::GraphHtmlPrinter.new(result)
        File.open('request.html', 'w') do |file|
          printer.print(file, {:min_percent => 1,
                               :print_file => true})
        end
      end
    end
  end
end

This will create a new file, request.html, that shows a graph profile of the request. Rerun the request, and then open up request.html.

In our case we are interested in the callers of String#match, so let's take a look:

%Total %Self Total Self Wait Child Calls Name Line
    0.00 0.00 0.00 0.00 1/1403 ActionController::AbstractRequest#accepts 65
    0.00 0.00 0.00 0.00 195/1403 Array#each-1 103
    0.00 0.00 0.00 0.00 1/1403 ActionController::Rescue#local_machine? 10
    0.00 0.00 0.00 0.00 232/1403 Array#each-2 103
    0.00 0.00 0.00 0.00 1/1403 ApplicationController#sanitize_status_codes 118
    0.00 0.00 0.00 0.00 8/1403

ActiveRecord::ConnectionAdapters::

PostgreSQLAdapter#translate_field_type

55
    0.00 0.00 0.00 0.00 232/1403 Array#each-3 103
    0.00 0.00 0.00 0.00 11/1403 Geos::GeometryFactory#create_geom_from_hex 39
    0.02 0.02 0.00 0.00 380/1403 Array#each-4 103
    0.02 0.02 0.00 0.00 342/1403 Array#each-5 103
4.45% 4.45% 0.03 0.03 0.00 0.00 1403 String#match 39
    0.00 0.00 0.00 0.00 1403/1429 Regexp#match 118

 

If you haven't seen a graph profile before, you're probably feeling a bit overwhelmed. To help out a bit, I've written some documentation here. However, the quick summary is that the method of interest is shown in bold. Methods above it are the method's callers, while the methods below are the methods callees.

Notice the calls Array#each, Array#each-1, Array#each-2, etc? The dash and number indicate that a method has been called recursively. Thus Array#each-1 means that Array#each was called and then it in turn called Array#each either directly or indirectly.

Going back to our investigation, we see that Array#each-4 called String#match the most times. So the next step would be to click the Array#each-4 hyperlink and see where it is called from.

Visualizing a Profile

Once you create your own graph profile, you'll quickly realize how valuable the html hyperlinks are for navigating through the mass of information. But wouldn't it be nice to visualize the profile? Glad you asked!

ruby-prof 0.5.0 can now output call tree information thanks to a patch from Carl Shimer. To create a call out put format, once again you'll have to modify profiling.rb:

module ActionController #:nodoc:
  module Profiling #:nodoc:
    def perform_action_with_profiling
      if RubyProf.running? or
        ...
      else
        ...
       # Example for Graph html printer
        printer = RubyProf::CallTreePrinter.new(result)
        File.open('callgrind.out', 'w') do |file|
          printer.print(file, {:min_percent => 1,
                               :print_file => true})
        end
      end
    end
  end
end

Once you've created the output file, open it with KCachegrind (sorry, this is Linux only, although there is a port to Windows which I have not tried using). KCachegrind take a bit of work to get used to, but once you do you can create visualizations that show you what your program is doing. In this case, here is a picture that shows String#match's callers.

Call Graph

Wrapping Up

Remember - most of the time you won't need ruby-prof. It provides much too much information to be used as a general way of finding performance bottlenecks - instead analyze your web log files, Rails log files and database log files.

But once you know the problem is in your Rails code, ruby-prof is invaluable in pointing out where it is.

Posted in ,  | 10 comments | 4 trackbacks

:select meets :include (or a pitch for rparsec)

Posted by Charlie Tue, 13 Feb 2007 20:45:00 GMT

One reason developers like ActiveRecord is that it automatically generates the SQL needed for querying a relational database. Obviously, there are various knobs for controlling the generated SQL. You can use the :select option to modify the fields returned in a result a set. You improve performance can use the :include option to specify that related tables are loaded via joins.

Unfortunately, :select and :include don't work together. The reason is that when you specify :include ActiveRecord renames every field returned in the query to avoid name conflicts. ActiveRecord then uses those field names to build a graph of ActiveRecord objects.

Supporting :select with :include requires a full-fledged SQL SELECT parser. This might not be obvious at first sight - it seems like you could just use regular expressions. But remember that the SELECT clause of a query can be quite complex. For example:

SELECT field1, 3+4 AS field2, 3/sum(max(3,2))
FROM some_table
JOIN some_other_table 
ON some_table.id = some_other_table.some_table_id

Not to mention that some database systems let you nest a SELECT inside of a SELECT.

Parser Combinators

There are a many ways of creating parsers, with the most famous being lex and yacc. To use lex and yacc you first specify the parsing rules using EBNF or equivalent. You then run lex and yacc to create a parser in your target language, which is generally C. However, there is a Ruby implementation of lex and yacc, called racc, which you can use to generate a parser in Ruby.

Another approach is to create your parser by hand. This might sound like a silly idea, but it has a couple of advantages. First, it teaches you how parsers work. Second, its simpler since you only have to know one language. Third, it lets you take advantage of the abilities of the language you are using.

One way to approach the problem is to use parser combinators. The idea came out of functional languages, so some of the papers on the web are a bit dense. But the basic idea is to create a number of different parsers, each that does a specific task, and combine them using the Composite pattern.

For example, let's say I want to parse a simple SQL case statement:

SELECT
  CASE field1 < 100
    WHEN true THEN 'small'
    ELSE 'big'
  END

Here is some example Ruby code that does it:

# Parse a single when expression
when_clause = sequence(keyword[:when], lazy_expr, 
              (operator[':'] | keyword['then']),
              lazy_expr) do |_,cond,_,val|
  [cond, val]
end

# Parse the else statement
default_case = (keyword[:else] >> lazy_expr).optional

# Parse a case expression
case_expr = sequence(keyword[:case], lazy_expr,
               when_clause.many, default_case,
               keyword[:end]) do |val, whens, default, _|
  CaseExpr.new(val, whens, default)
end

Notice how readable it is. Working from the top down:

  • A when clause is a sequence of the keyword WHEN, followed by some expression, followed by : or the keyword THEN followed by an expression
  • The default clause is optional and consists of the keyword ELSE which proceeds some expression
  • A case statement is made up of the keyword CASE, followed by some expression, followed by any number of when clauses, a default clause and finally the keyword END.

The end result is a CaseExpr object that contains the value of the expression, an array of the when statements and a default else statement.

Not too bad is it? Note that everything is a parser. Look again at the variable default_case. It points at an optional parser that contains a sequence parser (created by >>). The sequence parser contains a keyword parser and an expression parser (which of course is a complicated parser in its own right).

rparsec

You could build your own parser combinator framework for Ruby, but luckily Ben Yu has already created one called rparsec. Although the documentation says rparsec is a port of Haskell's parsec library, it sure doesn't look like it to me (although my Haskell skills are minimal at best). Instead, it looks like a port of the Java combinator parser framework described by Steven Metsker in his book Building Parsers with Java.

Steven's book is a fantastic introduction to parser combinators - a must read if you're interested in understanding parser combinators. It takes you step-by-step through creating your own parser and the pitfalls you will encounter along the way. And if you subscribe to Safari you can read it online.

A Rails Plugin

To use the select parser with Rails:

  1. Install the rparsec gem
  2. Install the rails plugin

Good luck and happy parsing!

Posted in ,  | no comments | no trackbacks

A Community for REST

Posted by Charlie Thu, 27 Jul 2006 20:51:00 GMT

One of the great ironies of the Web is that a huge number of technical people refuse to believe what is right in front of their eyes. The Web is the world's most diverse and powerful information system – one that has massively scaled from a single user at its inception to hundreds of millions today and billions tomorrow. REST is deceptively simple - its based on three fundamental design decisions:

  • Agreement on a global addressing scheme, URIs, to identify resources.
  • Agreement on a limited, generic application interface to manipulate resources. HTTP provides this generic interface via its “verbs," which include GET, POST, PUT, DELETE, etc.
  • Agreement on common formats, particularly HTML, to represent resources. Over time additional formats have been added, including JPEG, GIF, PNG, SVG, etc.

The Siren Call

Yet these lessons have been ignored time and time again. People just can't get past the siren call of building distributed systems using distributed objects. Let's count the failures:

  • DCOM
  • CORBA
  • XML RPC
  • SOAP
  • Web Services
  • Java RMI
  • JNI

And hundreds of home grown solutions that will never make it outside the Enterprise walls. Like I wrote last spring, every developer I've ever met who has written a distributed computing system always write their own RPC mechanism. I know, I've done it.

REST proponents have been struggling to get their message across for years. A the height of the Web Services myopia, REST was portrayed as an overly simplistic solution that most certainly was not adequate for building Enterprise applications.

Five years later the tide has turned - the evangelization of REST believers is starting to pay off. And they are about to be joined by a new champion - the Rails community.

A New Champion

It looks to me that the Rails community is about to become the first main-stream web development community to fully embrace REST.

Over the last year, the Rails community has been learning Web architecture best practices at light speed. Watching this has been quite refreshing - there is surprisingly little "not invented here syndrome." Some of the fundamental changes have included:

  • RPC style controllers to REST style controllers
  • Closely related, the move to CRUD interfaces
  • Changing deployment from Apache/cgi to Lighttpd/fastcgi to http load balancing with Apache/Mongrel
  • Some support for content negotiation to allow the reuse of controllers for creating multiple types of representations

As discussed above, its no surprise that Rails started down the wrong path on these architectural issues. What is a surprise is how quickly the Rails community has discovered its mistakes and starting fixing them. The ability to rapidly adopt is partially due to the community's youth, which frees it from the shackles of a backwards compatibility required by an installed base. But its also a characteristic of the community.

At the same time, Rails has been quickly grabbing mind share and market share. Adoption hasn't been driven by technical superiority - if David Heinemeier Hansson used Python I think we'd be talking about Python on Rails today. Instead, adoption has been driven by one part marketing, one part a unified community (Rails is the way to develop Web applications if you use Ruby) and one part luck. Web Services have almost run their course, J2EE is creaking under its own weight and .NET has not taken over the world. Now that the Web is hip again, there is a vacuum that is ripe to be filled by a new platform. PHP has clearly filled most of this void, but Rails has grabbed itself a niche that is growing quickly.

The combination of the Rails marketing juggernaut, its growing market share and the community's move toward REST all point to a new champion for the REST architectural style. It will be interesting to see if this final nudge pushes the REST style architecture over the tipping point so its finally appreciated by the majority of the technical community as opposed to a few enlightened souls.

Posted in ,  | 4 comments | no trackbacks

Porting ruby-prof to Windows

Posted by Charlie Fri, 09 Jun 2006 19:14:00 GMT

Yesterday I wanted to profile some methods I'm using on a Rails controller. To get a feel for profiling Ruby code I put together a test case, added a "require 'prof'" to the top of the file and eagerly waited for the results. And waited, and waited, and waited. Thinking I did something wrong, I ran the code without a profiler - it took about 2 seconds. With the profiler it took so long I gave up. And this was on a dual core pentium D processor with 1 Gig of memory running Fedora Core 5.

Time for some investigation. It turns out this is a well known problem - the built-in Ruby profiler, which is written in Ruby, is so slow as to be useless. I came across two alternatives - ruby-prof, a C extension written by Shugo Maeda, and ZenProfile, an inline C exension done by Ryan Davis.

I went with ruby-prof. On Linux it was easy enough to download, build and install and it worked like a charm. But I do most of my work on my laptop which runs Windows XP. So I opened up MingW and built and installed the extension on Windows (that's not quite true, I had to hack the C a bit, more info below). But when I ran the test script I was met with a program fail message saying that the stack was empty. Ugh.

Since I find it impossible to debug extensions on Windows with MingW I fired up Visual Studio 2005, rebuilt the extension, and tried again. Same issue.

Digging deeper, it turns out the profilers (as well as the wonderful rcov project) work by registering a callback with Kernel::set_trace_func. When Ruby executes a line of code, enters a new Ruby or C method, or exists a Ruby or C method, the callback is activated.

The problem is that ruby-prof assumes that each call into a method is matched by a return - and if its not then the failure I see is triggered. To understand the problem, let's look at a super simple test case:

require 'profiler'

I said it was simple - didn't I! Here's the trace from Linux:

return start_profile
call print_profile
call stop_profile
c-call set_trace_func

Start_profile is the method that installs the set_trace callback - so it makes sens that the first thing we see is returning from that method. Once the program is done, the profiler calls print_profile, which calls stop_profile, which calls set_trac_func which uninstalls the callback. So the method enters and returns do not balance.

Although the method names ruby-prof uses are slightly different, the problem remains the same. ruby-prof hacks through it by pushing and popping extra items on its stack to counterweigh the imbalanced method calls. Thus its hard-coded to a specific sequence of method calls. So why doesn't it work on Windows?

A quick trace running our test program on Windows shows the problem:

return start_profile
return require
call print_profile
call stop_profile
c-call set_trace_func
There is an extra "return require" which is being generated by Ruby gems. And if you run the program in Arachno, which uses a modified version of Ruby to supports its fantastic debugger (its fast enough that I always run Rails under the debugger so I can set breakpoints at key places - definitely go check it out).
c-return set_trace_func 
return start_profile 
c-return require__ 
return require 
c-return require__ 
return require 
call print_profile 
call stop_profile 
c-call set_trace_func 

It quickly becomes clear that assuming a balanced stack is a bad idea. If you look at the built in Ruby profile it doesn't make such an assumption.

These changes have been merged into ruby-prof-0.4.0 which is now available as a RubyGem.

So, I've patched ruby-prof to remove this assumption and to make it compile on Windows. I'll submit the patch to Shugo Maeda, but in the meantime, I've provided windows binaries for anyone who wants to use the profiler on windows. To install:

1. Download the windows extension, prof.xo, and put it in your ruby\lib\ruby\site_ruby\1.8\i386-msvcrt directory.

2. Download unprof.rb and put it in your ruby\lib\ruby\site_ruby\1.8 directory.

3. To use the profiler simply require 'unprof' at the top of the file

One thing to note about my changes. The self-time for the "toplevel" method will always show "0". Its looks like the Ruby profiler does the same thing, so I think this is ok.

Assembly Hacking

This section is for anyone who's interested in some lower level details - feel free to skip it.

Getting ruby-prof to compile on windows required a few of the usual changes. For example, making sure that the extension's initialization method is property exported using __declspec(dllexport), etc.

However, ruby-prof provides an extra twist. It can measure time in several ways including using some low-level functionality provided by more recent Pentium and PowerPC processors. To access this information it uses this inline assembly call:


static prof_clock_t
cpu_get_clock()
{
#if defined(__i386__)
    unsigned long long x;
    __asm__ __volatile__ ("rdtsc" : "=A" (x));
    return x;
#elif defined(__powerpc__) || defined(__ppc__)
    unsigned long long x, y;

    __asm__ __volatile__ ("\n\
1:	mftbu   %1\n\
	mftb    %L0\n\
	mftbu   %0\n\
	cmpw    %0,%1\n\
	bne-    1b"
	: "=r" (x), "=r" (y));
    return x;
#endif
}

For x86 chips, what it does is call the rdtsc assembly function which returns the number of clock cycles that have been executed. So if you call get_cpu_clock, wait 1 second, and call get_cpu_clock again, you can calculate the chip's clock frequency. Using this information, you can time method calls. For instance, if the chip's frequency is 500Mhz and a method takes 250,000,000 cycles to complete, you can calculate it took 0.5 seconds.

This of course won't work with Visual C++ because it uses its own syntax for inline assembly calls. In this case there are couple ways of porting this code. Newer versions of Visual C++ support compiler intrinsics, and there is one for rdtsc. However, I thought it would be better to use inline assembly to support older versions. Here's the code:

static prof_clock_t
cpu_get_clock()
{
    prof_clock_t cycles = 0;

    __asm
    {
        rdtsc
        mov DWORD PTR cycles, eax
        mov DWORD PTR [cycles + 4], edx
    }
    return cycles;
}
To use this timing method you have to specifically enable it by including the following line in your ruby code.
ENV["RUBY_PROF_CLOCK_MODE"] = "cpu"
require 'unprof'

However, I can't say this works very well. The calculated frequency for my chip is always different. I don't know why - my best guess is that its a Pentium M with Intel's speed step technology so the clock frequency varies to save power. However, I'm usually plugged in so I don't think that's it. Note you can tell ruby-prof your click frequency like this:

ENV["RUBY_PROF_CLOCK_MODE"] = "cpu"
ENV["RUBY_PROF_CPU_FREQUENCY"]= "466000000"
require 'unprof'

So my recommendation is just use the default ruby-prof timing method - it does the job perfectly well.

These changes have been merged into ruby-prof-0.4.0 so I've taken them offline

Posted in , , ,  | 4 comments | no trackbacks

Updated Rails Plugins

Posted by Charlie Thu, 13 Apr 2006 19:18:00 GMT

Tom Fakes was kind enough to point out that the Rails Rest controller and content negotiation plugins I posted a couple of weeks ago can no longer be downloaded. They unfortunately were deleted when my server crashed last week.

So I've uploaded new versions of the plugins that work with Rails 1.1.1.

If you're interesetd in the Rest controller, I'd recommend looking at this one on RubyForge. It was written by Dan Kubb and then modified based on feedback from Peter and myself. This post to the microformats rest mailing list provides a bit of the technical background.

Posted in ,  | no comments | no trackbacks

Older posts: 1 2 3