Building Ruby 1.9.1 on Windows

Posted by Charlie Sun, 29 Mar 2009 04:29:00 GMT

As noted else where, ruby 1.9.1 hasn't exactly bounded out of the gate. That's not particularly surprising, considering 1.9.1 has been available for only a couple of months and requires changes to existing code. In addition, there are a number of incompatible gems, giving rise to the isitrub19y website as a clearing house of information. So despite the great efforts from the Rails team, the rest of the community is still lagging behind.

That's particularly true on Windows, where a new one-click installer isn't yet available. According to the latest market share stats from Net Applications, Windows controls 88% of the desktop market. I have no idea how many Ruby installations exist, and how they are divided by operating system. But looking at RubyForge, by far and away the most popular download of all times is the Windows one-click installer with over 3 million downloads.

Luis Lavena has taken over stewardship of the one-click installer, and clearly needs a bit of help. So although I have very little free time, I offered to pitch in as I could. While Luis is concentrating on putting together a new version of the one-click installer using Mingw and msys, I thought I could help out by putting 1.9.1 through its paces on Windows.

My basic approach was to simply start with the basics:

  • Build ruby with Visual Studio 2008
  • Build the default extensions and libraries Ruby uses (zlib, iconv, openssl, etc)
  • Run Ruby's unit tests

That was almost a month ago. Thirty-nine patches later (I have no doubt Nobu is getting sick of me), I just about have Ruby 1.9.1's test suite running on Windows. There a still a few remaining issues, in particular a couple of imap tests that hang.

As for Visual Studio, I'm using it for two reasons. First, it has a lights-out debugger that makes it much easier to track down and fix problems. Second, its lets you compile instrumented executable and libraries that can detect incorrect API usage, heap corruption, stack corruption and mismatched calling conventions.

It quickly became obvious that no one had ever done that with Ruby, because it turned up a whole host of issues. For example, the dl extension used the cdecl calling convention to call the Windows API instead of stdcall. Or that there were a set of memory leaks in printf/sprintf.

The other thing that was bothersome was the huge number of compiler warnings generated by building Ruby. See for your self - and then realize the original list doesn't include any of the warnings generated by building Ruby's extensions. Cleaning up the warnings took a number of patches, but at this point most of them have been fixed. And all credit to Nobu for working through my patches, fixing them and applying them since my knowledge of the Ruby runtime is fairly limited, thereby causing most of my patches to not be quite right.

Anyway, since its not all that obvious how to build Ruby on Windows (with Visual Studio or Mingw), I'll see if I can put together a few posts that describe how to do it for anyone who wants to roll their own.

Posted in  | 9 comments | no trackbacks

libxml-ruby 1.1.3 - Boosting Performance

Posted by Charlie Sun, 22 Mar 2009 05:19:00 GMT

I'm happy to announce the release of libxml-ruby 1.1.3. Besides including the usual assortment of new features and bug fixes, this release also includes a speed boost of roughly 10% to 20%.

This resulted from RubyInside's recent post summarizing the performance of Ruby parsers. As expected, libxml-ruby blew away Hpricot and REXML in pure parsing speed (which of course is a simplistic view of what is important in an xml processor, but nevertheless still important). But it consistently finished a bit behind Nokogiri.

I was a bit surprised by that since libxml-ruby and Nokogiri use the libxml2 library as their parsing engine. Since the specific test cases almost exclusively tested parsing, the two extensions should have identical run times.

Since the times were different, then the obvious conclusion was that the two extensions were using different libxml2 APIs or using different settings. I suspected the second, but when investigating performance you never know beforehand.

Not to bore everyone with the nitty-gritty details of using libxml2, but when looking into the first test, parsing an in-memory string, it didn't look there was much difference in API calls.

For libxml-ruby:

xmlCreateMemoryParserCtxt
xmlParseDocument

For Nokogiri:

xmlReadMemory
-> xmlCreateMemoryParserCtxt
-> xmlDoRead
-> xmlParseDocument

So that didn't solve the mystery.

The next possibility was xmlDoRead was modifying the libxml2 parser context. Now a libxml2 parser context is a beast of a thing - for those brave souls who want to take a peek, its defined in libxml2's online documentation.

Working through the options one-by-one, I finally found the culprit, an obscure field in the structure:

int	dictNames	: Use dictionary names for the tree

What this setting controls is whether libxml2 uses a dictionary to cache strings it has previously parsed. Caching strings makes a big difference, so by default it should be enabled. That is now the case with libxml-ruby 1.1.3 and higher.

Rerunning the published benchmarks now shows libxml-ruby and Nokogiri to have equivalent performance. If you run the tests yourself, beware though. The order in which the extensions are tested changes the results. Whichever extension is tested first will always be faster, at least on my Fedora 10 box. I assume that's because the first parser has more memory available to it when the test begins and therefore invokes Ruby's garbage collector a few times less.

Posted in  | 6 comments | no trackbacks

libxml-ruby reaches 1.0!

Posted by Charlie Wed, 11 Mar 2009 05:52:00 GMT

A mere seven years after its inception, libxml-ruby has finally reached version 1.0. libxml-ruby provides ruby, via the libxml2 libary, the super fast, feature rich xml parser that is has sorely lacked.

Last year I posted about the resurrection of the project, and since then we've made enormous progress. The 1.0 release marks the culmination of this work, and comes with tons of goodies:

  • Ruby 1.9.1 support
  • Out of the box support for OS X 10.5 and MacPorts
  • Greatly expanded documentation
  • Much better test coverage
  • A nice, clean API that makes it easy to do simple things, but provides all the power of libxml2 if you need it

Not to mention that libxml-ruby is blindingly fast and incredibly feature rich (see my post from last year for all the details), making it the choice for a number of high-traffic websites.

So give them a try - its as easy to install as:

gem install libxml-ruby

And if you feel like polishing your ruby, xml, or C skills, come join the community!

Posted in  | 5 comments | no trackbacks

Resurrecting libxml-ruby

Posted by Charlie Wed, 16 Jul 2008 16:38:00 GMT

There is general discontent with the state of XML processing in Ruby - see for example here or here. An obvious solution is to use libxml. However that has been a non-starter since the libxml Ruby bindings have historically caused numerous segementation faults, don't run on Windows and recently lost their current maintainer, Dan Janowski. Making it even more frustrating is that Dan had spent the last year rearchitecting the bindings, successfully fixing the segmentation faults.

Since MapBuzz heavily depends on libxml, it seemed time to step in and contribute. Over the last two weeks I've added support for Windows, cleaned out the bug database and patch list, resolved the few remaining segmentation issues, greatly improved the RDocs and refactored large portions of the code base to conform with modern Ruby extension standards.

After iterating through a couple of releases over the last two weeks, the Ruby libxml community is happy to announce the availability of version 0.8.0, which we believe is ready for prime time. It offers a great combination of speed, functionality and conformance (libxml passes all 1800+ tests in the OASIS XML Tests Suite).

So give it a try - its as easy to install as:

gem install libxml-ruby

If you're on Windows there may be an extra step if you haven't already installed libxml2. If not, then the libxml-ruby distribution includes a prebuilt libxml2 dll in the libxml-ruby/mingw directory. Copy the dll to libxml-ruby/lib, your Ruby bin directory, or somewhere on your path (basically put it someplace where Windows can find it).

Undoubtedly there are still some bugs left, so please report anything you find, so we can fix them in future releases.

Blindingly Fast

The major reason people consider using libxml-ruby is performance. Here are the results from running (on my laptop) a few simple benchmarks that have recently been blogged about on the Web (you can find them in the benchmark directory of the libxml distribution).

From Zack Chandler:

              user     system      total        real
libxml    0.032000   0.000000   0.032000 (  0.031000)
Hpricot   0.640000   0.031000   0.671000 (  0.890000)
REXML     1.813000   0.047000   1.860000 (  2.031000)
From Stephen Bannasch:
              user     system      total        real
libxml    0.641000   0.031000   0.672000 (  0.672000)
hpricot   5.359000   0.062000   5.421000 (  5.516000)
rexml    22.859000   0.047000  22.906000 ( 23.203000)

From Andreas Meingast:

LIBXML THROUGHPUT:
	10.2570516817665 MB/s
	10.2570830340359 MB/s
	12.6992253283934 MB/s
  10.2570516817665 MB/s
	8.51116888387252 MB/s
	10.2570830340359 MB/s

HPRICOT THROUGHPUT:
	0.211597647822036 MB/s
	0.202390771964726 MB/s
	0.180272812529665 MB/s
	0.198474511420818 MB/s
	0.198474499681793 MB/s
  0.180925089981179 MB/s

REXML THROUGHPUT:
	0.130301425548982 MB/s
	0.131630590068325 MB/s
	0.128316078417727 MB/s
	0.125203555921636 MB/s
	0.120181872867636 MB/s
	0.115330940074107 MB/s

I can't vouch for the appropriateness of the tests, but they show libxml clocking in at 10x hpricot and 30x to 60x REXML. I'd be happy to accept additional tests or more appropriate tests if you have any.

An Embarrassment of Riches

In addition to performance, the libxml-ruby bindings provide impressive coverage of libxml's functionality. Goodies include:

  • SAX
  • DOM
  • XMLReader (streaming interface)
  • XPath
  • XPointer
  • XML Schema
  • DTDs
  • XSLT (split into the libxslt-ruby bindings)

Now, your first reaction might be that SAX, DOM and XPath are all you need, but validating parsers make it a whole lot easier to sanitize user contributed content on web sites. And the XMLReader offers a clever way of combining the DOM's ease of use (well, ok, compared to SAX at least) with SAX's memory and speed advantages.

Better yet, most of this functionality is exposed via an easy-to-use, Ruby like API. There are still of course some warts lurking in the code, where libxml's C api leaks through to Ruby, but they are being removed one by one. And for those of you who aren't C hackers, much of this work can be done in good old Ruby.

A Long History

For such a useful, and full-featured library, the libxml-ruby bindings have a star-crossed history. Out of curiosity, I went back and traced their lineage. Sean Chittenden originally wrote them back in 2002. At the start of 2005, Trans Onoma adopted the project after Sean had moved on, and at the end of 2005 the bindings found their current home on Ruby Forge. At that point Ross Bamford took over maintenance and worked on the bindings for roughly a year, until early 2007, when then the bindings again became unmaintained. Dan Janowski picked up the ball in 2007 and completely overhauled the binding's memory model. Sadly, Dan had to give up active support this spring.

But on the bright side, Trans, Dan and Sean are all once again active on the mailing list, providing valuable experience and insight. From my point of view, with the renewed push towards a production quality release, and bringing in new users, the libxml-ruby community is as healthy as it has been in a long while.

Posted in  | 30 comments | 2 trackbacks

Fun With Here Documents

Posted by Charlie Thu, 07 Feb 2008 21:07:00 GMT

Today I was fixing up some inefficiencies in Rail's Postgresql adapter (more about that in a later post), and came across this strange looking code:

def tables(name = nil)
  schemas = schema_search_path.split(/,/).map { |p| quote(p) }.join(',')
  query(<<-SQL, name).map { |row| row[0] }
        SELECT tablename
        FROM pg_tables
        WHERE schemaname IN (#{schemas})
        SQL
end

The code is passing a here document, which is basically a long string, as a parameter to the query method. The here document is demarked by the string "SQL."

What took me by surprise is the here document is defined after the method call, which is a code construct I've never seen before. Making sure my memory wasn't failing me, I double checked my copy of The Pick Axe book and verified it never mentions this feature.

Hats off to Jamis, who added this neat little trick to Rails in revision 2317, way back in September 2005, and to Anthony who blogged about it last month. And finally, the Ruby Wikibook has a great tutorial about here documents. Who knew that you can have multiple here document parameters per method call?

Posted in  | 3 comments | no trackbacks

An Update on the Ruby and Python GEOS Bindings

Posted by Charlie Mon, 10 Sep 2007 20:03:00 GMT

Its been over a year since I blogged about GEOS, an open source project that provides rich functionality for analyzing and manipulating geometries and is a key part of PostGIS.

Sandros and Mateusz spent a ton of time refactoring and improving GEOS last year, and it looks like version 3.0 will finally be released this fall. As part of the 3.0 release, I've completely rebuilt the Ruby and Python bindings to use GEOS's C API. In theory this should decouple the bindings somewhat from GEOS releases - we'll see if that works out in practice.

Over the last couple of weeks, Mark Cave Ayland and I have spent time polishing off some rough edges. We managed to:

  • Fix various autoconf build issues
  • Get GEOS building with both older and newer versions of MingW and Msys on Windows

In addition, I added VC++ project files for the Python and Ruby bindings and updated the Ruby bindings to be more "Ruby" like.

Long story short - building GEOS and the Ruby/Python bindings should now be easier and more fool proof. If you're in the GNU world, its as simple as:

./configure --enable-python --enable-ruby

If you're in the Microsoft world, just open the VC++ solution in the source tree (thanks Mateusz), tweak the locations of your Ruby and Python installations, and hit the compile button.

So if you are using, or planning to use, the bindings I'd recommend grabbing the latest from SVN. Once you've built everything, then take a look at the test scripts I've written in the swig/ruby/test and swig/python/test directories. They should provide enough examples to get you started.

Feedback is of course welcome, particularly from Python users. Since I use Ruby day in and day out, I know the right Rubyisms to put into the bindings. But I don't know Python well enough to know the right Pythonisms.

Posted in ,  | no comments | no trackbacks

Making Rails Go Vroom

Posted by Charlie Wed, 18 Jul 2007 17:12:00 GMT

Last week I showed how to profile a Rails application using ruby-prof. The post was driven by our desire to improve MapBuzz's map rendering speed. Our nightly log analysis showed that rendering speed has significantly degraded over the last few weeks. Since the whole point of MapBuzz is to share maps with others, we needed to find out what was taking so long.

A day of profiling and we had our answer. With a few simple changes we were able to reduce rendering time from 7.8 seconds to 0.92 seconds (note these times are gathered while profiling the application, so the real times are 2 to 3 times less).

What we found should be generally applicable to all Rails applications, so I'd like to share them:

If you interested in the nitty-gritty details of each change, keep reading!

A Bit of Background

MapBuzz renders maps in a two step process. First, the html page is created and shown to the user. Second, the browser makes an Ajax request to get the features that should be shown on the map. The reason for having a two step process is that it allows users to zoom and pan around the map, without having to reload the whole page.

Log analysis revealed that the Ajax request was taking a long time - 1 to 2 seconds, with a big standard deviation. Our goal is to reduce the average time to less than half a second and to significantly reduce the standard deviation.

If you look at the request URI, you'll see its an Atom feed of features and not an HTML page. That has performance implications.

First, Atom feeds tend to have lots of links. In our case, each entry has 10 to 15 links. We limit the number of entries to 50, so that means there are roughly 600 link per page.

Second, we generate Atom feeds in a modular fashion - each entry is created from 10 partials. There is a partial for the user, one for ratings, one for content, one for icons, one for geometries, etc. Why did we do it this way? Simple - to keep things DRY. Having a set of partials creates an extremely modular system - we can mix and match them as needed depending on the context of the request.

Setup

Testing was done with two machines. The first was my laptop, which is a Dell Latitude D610 running windows, Ruby 1.8.4, and Rails 1.2.3. The second is our staging server, which is fairly high end Dell Desktop running Fedora Core 6 with a copy of our production database (which is about 10Gb). Profiling was done with ruby-prof 0.5.1 (of course), using the Rails plugin I described previously. Note that the logging level was set to :debug, which will show up in the test results.

Baseline

To start we measured the baseline performance of our test setup. It was ugly indeed:

Total: 7.828
%self total self wait child calls name 24.11 1.95 1.89 0.00 0.06 3605 Kernel#clone 13.15 1.04 1.03 0.02 0.00 569 IO#write 7.78 4.13 0.61 0.00 3.52 441 <Module::Benchmark># realtime-1 7.41 0.58 0.58 0.00 0.00 125 PGconn#exec

One bit of solace is that profiling an application will slow it down 2 to 3 times, but that sill leaves us with a 4 seconds per request. So let's start digging through the results.

Don't Use ActiveRecord:attributes

The first thing that jumps out is the huge amount of time Kernel#clone takes. A look at the call graph shows that the caller is clone_attribute_value.

And a bit more digging reveals this custom code in our application:
def rating
  # Convert cached rating from BigDecimal to float,
  # otherwise strange rounding errors happen
  attributes['rating'].to_f
end

The problem is that when you call ActiveRecord#attributes it returns a copy of the attributes, thus generating all the clone calls. Uuggh. I think this is a result of ActiveRecord's flawed attribute implementation, but that is for another blog.

What we want is access to the untyped, original value of the attribute. In theory you are supposed to use the auto generated method, rating_before_type_cast, for this. This has the advantage of skipping the clone call, but it relies on method_missing, which has some overhead (we did not measure how much). Or, you could use read_attribute directly, which also skips the clone call. This is usually your best choice.

For performance critical code, you may wish to read the untyped attribute directly (see the discussion about time below for an example). That can be done using the method read_attribute_before_type_cast, except that it is private. However, with Ruby, that's easily solved:

module ActiveRecord
  class Base
    public :read_attribute_before_type_cast
  end
end

Lesson - Don't use attributes

Gain - 24%

Use include

Let's look at our updated results:

Total: 3.39

 %self     total     self     wait    child    calls  name
 16.73      0.57     0.57     0.00     0.00      569  IO#write
 12.83      0.44     0.44     0.00     0.00      125  PGconn#exec
  8.76      1.77     0.30     0.00     1.48      441  <Module::Benchmark>#
	                                                    realtime-1
  5.99      0.22     0.20     0.00     0.01       40  PGconn#query

What stands out are the 125 calls to PGconn#exec - each one is a separate query. Looks like we forgot to specify a table or two to be eager loaded via ActiveRecord#find's :include option.

Note there is one downside to using :include - you lose the ability to use the :select option. For us that is important - and can be worked around by using the rails select_parser I've blogged about before.

Lesson - Get your :includes right

Gain - 12%

cache_template_loading=true

After fixing attributes and includes, here is our current timings:

Total: 2.594

 %self     total     self     wait    child    calls  name
 12.64      1.14     0.33     0.00     0.81      361  <Module::Benchmark>#
                                                      realtime-1
 12.61      0.33     0.33     0.00     0.00      409  IO#write
  4.90      0.13     0.13     0.00     0.00      403  <Class::File>#mtime
  4.86      0.16     0.13     0.00     0.03    30250  Hash#[]
  3.62      0.09     0.09     0.00     0.00        5  PGconn#exec

A couple of things jump out. First, using Benchmark#realtime is fairly time consuming. Second, logging also takes its toll as seen in the times for IO#write (remember we have logging set to :debug).

However, the #mtime call looks suspicious. Some more digging through Rails shows all 403 calls come from ActionView::Base#compile_template?. Let's take a look:

def compile_template?(template, file_name, local_assigns)
  method_key    = file_name || template
  render_symbol = @@method_names[method_key]

  if @@compile_time[render_symbol] &&
	   supports_local_assigns?(render_symbol, local_assigns)
    if file_name && !@@cache_template_loading 
      @@compile_time[render_symbol] < File.mtime(file_name) ||
        (File.symlink?(file_name) && 
        (@@compile_time[render_symbol] < File.lstat(file_name).mtime))
    end
  else
    true
  end
end

Remember I mentioned that it takes roughly 500 partial calls to generate the output? This is where it bites us - by default Rails checks the timestamp of cached templates before running them. In a production environment that is totally unnecessary. The solution is simple - just update your production.rb file like this:

# Don't check view timestamps!
config.action_view.cache_template_loading = true

And while we are at it, we'll change the debug level to :info to reduce its impact on the results.

Lesson - Set cache_template_loading to true

Gain - 5%

Don't use url_for

We've managed to reduce the request time from 7.8 to 2.5 seconds. Not bad. But we still have more work to do.


Total: 2.531

 %self     total     self     wait    child    calls  name
 13.59      1.91     0.34     0.00     1.56       45  <Module::Benchmark>#
                                                      realtime
 12.25      0.31     0.31     0.00     0.00      409  IO#write
  4.98      0.81     0.13     0.00     0.69      361  <Module::Benchmark>#
                                                      realtime-1
  4.94      0.13     0.13     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasManyThroughAssociation#
                                                      construct_sql
  3.71      0.09     0.09     0.00     0.00        5  PGconn#exec
  3.04      0.08     0.08     0.00     0.00    29847  Hash#[]
  3.00      0.11     0.08     0.00     0.03     2097  Hash#each
  1.86      0.28     0.05     0.00     0.23      322  ActionController::Routing::
                                                      RouteSet#generate

Of particular interest is the ActionController::Routing::RouteSet#generate. Although it only runs for 0.05 seconds, its total time, including children, is 0.28. Thus creating 322 urls takes 11% of the request time.

People have previously mentioned how slow url_for is - well, its true. Instead of using url_for, just create your urls manually using string mashing. For example, if you want an absolute uri, then do this:

"#{request.protocol}#{request.host_with_port}/controller/#{map.id}"

Lesson - Don't use url_for or link_to

Gain - 11%

Don't let Rails parse timestamps

After removing url_for, we've smashed the 2 second barrier:

Total: 1.391

 %self     total     self     wait    child    calls  name
  9.06      0.13     0.13     0.00     0.00      726  String#sub!
  8.99      0.13     0.13     0.00     0.00       40  ActionView::Base::CompiledTemplates#
	                                                    _run_ratom_47app47views47geometry47_geometry46ratom
  6.69      0.09     0.09     0.00     0.00        5  PGconn#exec
  5.54      0.08     0.08     0.00     0.00      741  Item#id
  5.54      0.30     0.08     0.00     0.22      121  <Class::Date>#_parse

Next up is <Class::Date>#_parse, whose total time, including children, is a whopping 21% of the time. Ouch. If you take a look at the code (its in the standard ruby library), you can see its quite complicated since it tries to parse a variety of date formats. In addition, it makes use of rational numbers, which are none too fast.

Taking a look at a graph profile, we see that <Class::Date>#_parse is called from <Class::ActiveRecord::ConnectionAdapters::Column>#string_to_time. Thus Rails uses it to convert timestamps received from the database into Ruby objects.

As I detailed in a recent post, the problem is that Ruby's DateTime implementation is extremely slow compared to its Time implementation. The solution is to avoid DateTime's entirely and use custom time parsing code. Fortunately, our database can output time in ISO 861 format, which Time can quickly parse.

So, any place we access times, we simply roll our own attribute readers and writers like this (note that we don't use read_attribute!).

def created_on
  @created_on || Time.iso8601(read_attribute_before_type_cast('created_on'))
end
  
def created_on=(value)
  write_attribute(:created_on, value)
  @created_on = nil
end
  
def updated_on
  @updated_on || Time.iso8601(read_attribute_before_type_cast('created_on'))
end

Lesson - Don't let Rails parse timestamps - do it yourself

Gain - 20%

Don't symbolize keys

We've almost hit the 1 second barrier:

Total: 1.156

 %self     total     self     wait    child    calls  name
  8.13      0.11     0.09     0.00     0.01      642  Kernel#send-3
  8.13      0.09     0.09     0.00     0.00        5  PGconn#exec
  6.75      0.08     0.08     0.00     0.00     7497  String#concat
  4.15      0.05     0.05     0.00     0.00      817  Hash#each
  4.07      0.05     0.05     0.00     0.00       40  ActiveRecord::Associations::
                                                      HasAndBelongsToManyAssociation#
                                                      construct_sql
  2.77      0.03     0.03     0.00     0.00     6360  String#match
  2.77      0.06     0.03     0.00     0.03       47  Array#each_index
  2.77      0.03     0.03     0.00     0.00      280  IconType#size
  2.60      0.03     0.03     0.00     0.00     4309  Array#[]
		

At this point, most of the easy wins are gone. However, there is at least one left. Although its not shown in the flat profile, the graph profile shows that the method ActiveSupport::CoreExtensions::Hash::Keys#symbolize_keys takes 2.77% of the time including its children. Additional rummaging through the call graph shows that the calls come from ActionView#compile_and_render_template:

def compile_and_render_template(extension, template = nil, 
			        file_path = nil, local_assigns = {})
  # convert string keys to symbols if requested
  local_assigns = local_assigns.symbolize_keys if 
	          @@local_assigns_support_string_keys

A bit of research shows that local_assigns_support_string_keys is deprecated and slated for removal from Rails! Sweet - that means we can save an additional 3% simply by adding this line to environment.rb:

config.action_view.local_assigns_support_string_keys = false

If only all optimizations were so simple.

Lesson - Don't symbolize keys

Gain - 3%

Wrapping Up

So what's our final time? Let's see:

Total: 0.922

 %self     total     self     wait    child    calls  name
 10.30      0.10     0.10     0.00     0.00        5  PGconn#exec
  3.47      0.17     0.03     0.00     0.14      312  Array#each-2
  3.47      0.03     0.03     0.00     0.00      212  ActiveRecord::ConnectionAdapters::
                                                      Quoting#quote
  3.47      0.03     0.03     0.00     0.00    19455  Hash#[]
  3.47      0.36     0.03     0.00     0.33      360  ActionView::Base#render-2
  3.36      0.06     0.03     0.00     0.03       47  Array#each_index
  3.36      0.03     0.03     0.00     0.00      299  ActiveRecord::Associations::
                                                      AssociationProxy#initialize
  3.36      0.14     0.03     0.00     0.11      705  ActiveRecord::Associations::
                                                      AssociationProxy#method_missing

Not bad - we reduced the request time from 7.8 to 0.92 seconds with a day's worth of work. Obviously there is still much to be done - we need to run the code through our performance monitoring suite and make sure that the average time holds across a number of requests and that the standard deviation is in line. But we've at least made a good start.

Posted in , , ,  | 27 comments | 7 trackbacks

Tick, tick, tick

Posted by Charlie Fri, 13 Jul 2007 17:05:00 GMT

While analyzing Rails performance, I stumbled upon the fact that parsing timestamps in ActiveRecord is very slow (I'll post more about that in an upcoming blog).

A bit of googling revealed that Charlie Bowman discovered this a year an a half ago. He has a nice post comparing the performance of Ruby's Time and DateTime classes. Sadly he didn't use ruby-prof to come up with his results, but we'll forgive him for that :)

Now you might reasonably assume that the Time class deals with, er, time and the DateTime class deals with date times, which we'll refer to as timestamps. Well, like me, you'd be wrong. Totally wrong.

It turns out that Time and DateTime both deal with timestamps. But they are implemented completely differently.

Time is a c extension that wraps your operating system's time functionality. It stores times as seconds from January 1st, 1970. Its biggest upside is performance. Its biggest downside is its limited range - times before 1970 are "out of range." I have to admit that doesn't make any sense to me since no other language I've used suffers from this problem and I'm fairly sure Windows and Linux don't either. But no matter.

DateTime is a pure ruby implementation of time that let's you store a much wider range of dates. It also supports both the Julian and Gregorian calendars - on the off hand chance you need such a thing.

Rails takes the safe route and converts timestamps to DateTimes. But there is a performance penalty to pay. Below is simple little script showing how much time it takes to parse a timestamp using 3 approaches - DateTime.parse, Time.parse and Time.xmlschema.

require 'date'
require 'ruby-prof'

# Use xml schema format
now = '2007-07-12T23:36:25.609375-06:00'

result = RubyProf.profile do 
  1000.times do
    time = DateTime.parse(now)
  end
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})

result = RubyProf.profile do 
  1000.times do
    time = Time.parse(now)
  end
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})


result = RubyProf.profile do 
  1000.times do
    time = Time.xmlschema(now)
  end
end

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})

And the results:


Total: 3.86

 %self     total     self     wait    child    calls  name
 14.15      0.83     0.55     0.00     0.28    19000  Integer#gcd
 10.91      2.16     0.42     0.00     1.74    19000  <Class::Rational>#reduce
  8.06      0.37     0.31     0.00     0.06    27000  Rational#initialize
  7.33      0.28     0.28     0.00     0.00     6000  String#sub!
  5.70      0.74     0.22     0.00     0.52    27000  Class#new
  4.87      0.94     0.19     0.00     0.75     1000  <Class::Date>#_parse
  3.68      0.14     0.14     0.00     0.00    29000  <Class::Object>#allocate
  3.65      2.30     0.14     0.00     2.16    19000  Object#Rational
  3.26      0.86     0.13     0.00     0.74    27000  <Class::Rational>#new!
  3.24      0.79     0.13     0.00     0.67     4000  Rational#+

---------------
Total: 1.359

 %self     total     self     wait    child    calls  name
 32.16      0.44     0.44     0.00     0.00     6000  String#sub!
  6.92      0.95     0.09     0.00     0.86     1000  <Class::Date>#_parse
  6.84      0.36     0.09     0.00     0.27     1000  <Class::Time>#make_time
  5.81      0.23     0.08     0.00     0.15     1000  Rational#/
  5.74      0.08     0.08     0.00     0.00     5000  Rational#initialize
  4.71      0.10     0.06     0.00     0.03     1000  <Class::Time>#apply_offset
  4.49      0.09     0.06     0.00     0.03     3000  Integer#gcd
  3.46      0.23     0.05     0.00     0.19     3000  <Class::Rational>#reduce
  3.46      1.36     0.05     0.00     1.31     1000  <Class::Time>#parse

---------------
Thread ID: 21259270
Total: 0.297

 %self     total     self     wait    child    calls  name
 37.04      0.14     0.11     0.00     0.03     1000  <Class::Time>#zone_offset
 15.82      0.30     0.05     0.00     0.25     1000  <Class::Time>#xmlschema
 10.44      0.08     0.03     0.00     0.05     1000  <Class::Time>#apply_offset
  5.39      0.02     0.02     0.00     0.00     1000  <Class::Time>#utc
  5.39      0.03     0.02     0.00     0.02     1000  <Class::Time>#month_days
  5.39      0.02     0.02     0.00     0.00     3000  Fixnum#*
  5.39      0.02     0.02     0.00     0.00     1000  Fixnum#-
  5.05      0.01     0.01     0.00     0.00     4000  Fixnum#divmod
  5.05      0.01     0.01     0.00     0.00     1000  String#upcase
  5.05      0.01     0.01     0.00     0.00     8000  String#to_i

DateTime.parse is slow because its a complicated piece of code that relies heavily on rational numbers. Time.parse is roughly 3 times faster, although if you look under the hood, it actually uses part of DateTime.parse. That seems a bit strange to me, but I didn't spend the time to investigate further.

The real winner here is Time.xmlschema which is a over 10 times faster than DateTime.parse. Why is it so much faster? Instead of a mashing a bunch of strings, it uses a regular expression to parse timestamps. Thus it trades flexibility for performance.

When talking to a database there is no need for flexibility - your database most likely outputs time in the exact same way all the time. So if you heavily use dates in your Rails application, then roll your own timestamp parsing.

Posted in ,  | 3 comments | 1 trackback

Announcing ruby-prof 0.5.0

Posted by Charlie Mon, 09 Jul 2007 06:56:00 GMT

Make sure to grab the latest version of ruby-prof, currently 0.5.2, which includes some performance tweaks and bug fixes.

A year has passed since the last release of ruby-prof. If you haven't used ruby-prof, its a superfast, open-source, profiler for Ruby that shows you where your program is slow.

Over the last few months the list of bugs and enhancements on Ruby Forge has been accumulating at an embarassing rate. And not to mention, I need to profile our rails app.

So its time for another release. And this isn't just a minor bug fix release - its a major update that adds lots of new stuff. To install it, open a command prompt and type:

gem install ruby-prof

Then pick the appropriate gem for your platform.

Multithreaded Applications

The most important change is support for multi-threaded applications. Previously, ruby-prof correctly recognized the threads in a program and their independent call stacks, but it got the thread times wrong. ruby-prof would incorrectly add a child thread's time to a parent thread's time, making hard to figure out where the time in a program was spent. Now, ruby-prof keeps track of a thread's "wait time," which is the time it spends waiting for other threads. Thanks to Sylvain Joyeux for a patch that got the ball rolling on better thread support.

Measurement Modes

Sylvain also submitted a second patch that suprised me. Instead of tracking method times, why not track object allocations? That seemed reasonable, but he implemented it as another type of clock mode (ruby-prof supports 3 different ways of measuring time - process time, wall time and cpu time).

At first I was hesitant about the patch - shoehorning object allocations into clock modes didn't make much sense to me. But after thinking about it a bit, I decided Sylvain was onto something. Instead of thinking about "clock modes" I started thinking about "measurement modes" (catchy, isn't it?). And once you have that perspective, then it makes sense for ruby-prof to measure all sorts of things in a running program - time, object allocations, memory usage, etc.

So with a bit of refactoring, ruby-prof's now supports for "measure modes." They are process time, wall time, cpu time an object allocation. And now its easy to add more if anyone wants to submit a patch!

I should note there is one downside to the object allocations mode - it only works if you have a patched Ruby interpreter (if you don't ruby-prof works fine, you just don't get access to allocation information).

Rails

What use would a Ruby profiler be if it didn't work with Rails? ruby-prof now provides some hooks to make easier to profile your rails app. But I'll leave those for a future blog post.

Call Graph Format

Carl Shimer implemented support for the call graph format which is used by KCachegrind to visualize calling information.

Recursive Methods

Finally, ruby-prof has much improved support for recursive method calls as explained in the updated readme file.

Bug Fixes

And last, this release fixes some nasty bugs:

Unknown Singleton Object - This was a tough one since I didn't understand how to reproduce it. But luckily Matthew Fallshaw submitted a reproducable test case, and from there it was just a matter of reading the translated version of the Ruby Hacking Guide to unravel the mysteries of Ruby's singleton classes.

64-bit support - The 0.4.x releases used ints to store pointers instead of longs, which meant they didn't compile and/or work on 64 bit machines. This is now fixed thanks to a patch submitted by Diego 'Flameeyes' Pettenò.

IRB support - Previously you couldn't start ruby-prof in a method and then exit that method. Which meant that ruby-prof couldn't be used in IRB. That restriction has now been removed.

sort order - The sort orderan in generated reports was wrong. Graph reports now sort based on a method's total percent time while flat reports use a method's self percent time.

So give version 0.5.0 a try and see how you like it!

Posted in ,  | 2 comments | 2 trackbacks

:select meets :include (or a pitch for rparsec)

Posted by Charlie Tue, 13 Feb 2007 20:45:00 GMT

One reason developers like ActiveRecord is that it automatically generates the SQL needed for querying a relational database. Obviously, there are various knobs for controlling the generated SQL. You can use the :select option to modify the fields returned in a result a set. You improve performance can use the :include option to specify that related tables are loaded via joins.

Unfortunately, :select and :include don't work together. The reason is that when you specify :include ActiveRecord renames every field returned in the query to avoid name conflicts. ActiveRecord then uses those field names to build a graph of ActiveRecord objects.

Supporting :select with :include requires a full-fledged SQL SELECT parser. This might not be obvious at first sight - it seems like you could just use regular expressions. But remember that the SELECT clause of a query can be quite complex. For example:

SELECT field1, 3+4 AS field2, 3/sum(max(3,2))
FROM some_table
JOIN some_other_table 
ON some_table.id = some_other_table.some_table_id

Not to mention that some database systems let you nest a SELECT inside of a SELECT.

Parser Combinators

There are a many ways of creating parsers, with the most famous being lex and yacc. To use lex and yacc you first specify the parsing rules using EBNF or equivalent. You then run lex and yacc to create a parser in your target language, which is generally C. However, there is a Ruby implementation of lex and yacc, called racc, which you can use to generate a parser in Ruby.

Another approach is to create your parser by hand. This might sound like a silly idea, but it has a couple of advantages. First, it teaches you how parsers work. Second, its simpler since you only have to know one language. Third, it lets you take advantage of the abilities of the language you are using.

One way to approach the problem is to use parser combinators. The idea came out of functional languages, so some of the papers on the web are a bit dense. But the basic idea is to create a number of different parsers, each that does a specific task, and combine them using the Composite pattern.

For example, let's say I want to parse a simple SQL case statement:

SELECT
  CASE field1 < 100
    WHEN true THEN 'small'
    ELSE 'big'
  END

Here is some example Ruby code that does it:

# Parse a single when expression
when_clause = sequence(keyword[:when], lazy_expr, 
              (operator[':'] | keyword['then']),
              lazy_expr) do |_,cond,_,val|
  [cond, val]
end

# Parse the else statement
default_case = (keyword[:else] >> lazy_expr).optional

# Parse a case expression
case_expr = sequence(keyword[:case], lazy_expr,
               when_clause.many, default_case,
               keyword[:end]) do |val, whens, default, _|
  CaseExpr.new(val, whens, default)
end

Notice how readable it is. Working from the top down:

  • A when clause is a sequence of the keyword WHEN, followed by some expression, followed by : or the keyword THEN followed by an expression
  • The default clause is optional and consists of the keyword ELSE which proceeds some expression
  • A case statement is made up of the keyword CASE, followed by some expression, followed by any number of when clauses, a default clause and finally the keyword END.

The end result is a CaseExpr object that contains the value of the expression, an array of the when statements and a default else statement.

Not too bad is it? Note that everything is a parser. Look again at the variable default_case. It points at an optional parser that contains a sequence parser (created by >>). The sequence parser contains a keyword parser and an expression parser (which of course is a complicated parser in its own right).

rparsec

You could build your own parser combinator framework for Ruby, but luckily Ben Yu has already created one called rparsec. Although the documentation says rparsec is a port of Haskell's parsec library, it sure doesn't look like it to me (although my Haskell skills are minimal at best). Instead, it looks like a port of the Java combinator parser framework described by Steven Metsker in his book Building Parsers with Java.

Steven's book is a fantastic introduction to parser combinators - a must read if you're interested in understanding parser combinators. It takes you step-by-step through creating your own parser and the pitfalls you will encounter along the way. And if you subscribe to Safari you can read it online.

A Rails Plugin

To use the select parser with Rails:

  1. Install the rparsec gem
  2. Install the rails plugin

Good luck and happy parsing!

Posted in ,  | no comments | no trackbacks