Posted by Charlie
Sun, 29 Mar 2009 04:29:00 GMT
As noted else where, ruby 1.9.1 hasn't exactly bounded out of the gate. That's not particularly surprising, considering 1.9.1 has been available for only a couple of months and requires changes to existing code. In addition, there are a number of incompatible gems, giving rise to the isitrub19y website as a clearing house of information. So despite the great efforts from the Rails team, the rest of the community is still lagging behind.
That's particularly true on Windows, where a new one-click installer isn't yet available. According to the latest market share stats from Net Applications, Windows controls 88% of the desktop market. I have no idea how many Ruby installations exist, and how they are divided by operating system. But looking at RubyForge, by far and away the most popular download of all times is the Windows one-click installer with over 3 million downloads.
Luis Lavena has taken over stewardship of the one-click installer, and clearly needs a bit of help. So although I have very little free time, I offered to pitch in as I could. While Luis is concentrating on putting together a new version of the one-click installer using Mingw and msys, I thought I could help out by putting 1.9.1 through its paces on Windows.
My basic approach was to simply start with the basics:
- Build ruby with Visual Studio 2008
- Build the default extensions and libraries Ruby uses (zlib, iconv, openssl, etc)
- Run Ruby's unit tests
That was almost a month ago. Thirty-nine patches later (I have no doubt Nobu is getting sick of me), I just about have Ruby 1.9.1's test suite running on Windows. There a still a few remaining issues, in particular a couple of imap tests that hang.
As for Visual Studio, I'm using it for two reasons. First, it has a lights-out debugger that makes it much easier to track down and fix problems. Second, its lets you compile instrumented executable and libraries that can detect incorrect API usage, heap corruption, stack corruption and mismatched calling conventions.
It quickly became obvious that no one had ever done that with Ruby, because it turned up a whole host of issues. For example, the dl extension used the cdecl calling convention to call the Windows API instead of stdcall. Or that there were a set of memory leaks in printf/sprintf.
The other thing that was bothersome was the huge number of compiler warnings generated by building Ruby. See for your self - and then realize the original list doesn't include any of the warnings generated by building Ruby's extensions. Cleaning up the warnings took a number of patches, but at this point most of them have been fixed. And all credit to Nobu for working through my patches, fixing them and applying them since my knowledge of the Ruby runtime is fairly limited, thereby causing most of my patches to not be quite right.
Anyway, since its not all that obvious how to build Ruby on Windows (with Visual Studio or Mingw), I'll see if I can put together a few posts that describe how to do it for anyone who wants to roll their own.
Posted in Ruby | 9 comments | no trackbacks
Posted by Charlie
Sun, 22 Mar 2009 05:19:00 GMT
I'm happy to announce the release of libxml-ruby 1.1.3. Besides including the usual assortment of new features and bug fixes, this release also includes a speed boost of roughly 10% to 20%.
This resulted from RubyInside's recent post summarizing the performance of Ruby parsers. As expected, libxml-ruby blew away Hpricot and REXML in pure parsing speed (which of course is a simplistic view of what is important in an xml processor, but nevertheless still important). But it consistently finished a bit behind Nokogiri.
I was a bit surprised by that since libxml-ruby and Nokogiri use the libxml2 library as their parsing engine. Since the specific test cases almost exclusively tested parsing, the two extensions should have identical run times.
Since the times were different, then the obvious conclusion was that the two extensions were using different libxml2 APIs or using different settings. I suspected the second, but when investigating performance you never know beforehand.
Not to bore everyone with the nitty-gritty details of using libxml2, but when looking into the first test, parsing an in-memory string, it didn't look there was much difference in API calls.
For libxml-ruby:
xmlCreateMemoryParserCtxt
xmlParseDocument
For Nokogiri:
xmlReadMemory
-> xmlCreateMemoryParserCtxt
-> xmlDoRead
-> xmlParseDocument
So that didn't solve the mystery.
The next possibility was xmlDoRead was modifying the libxml2 parser context. Now a libxml2 parser context is a beast of a thing - for those brave souls who want to take a peek, its defined in libxml2's online documentation.
Working through the options one-by-one, I finally found the culprit, an obscure field in the structure:
int dictNames : Use dictionary names for the tree
What this setting controls is whether libxml2 uses a dictionary to cache strings it has previously parsed. Caching strings makes a big difference, so by default it should be enabled. That is now the case with libxml-ruby 1.1.3 and higher.
Rerunning the published benchmarks now shows libxml-ruby and Nokogiri to have equivalent performance. If you run the tests yourself, beware though. The order in which the extensions are tested changes the results. Whichever extension is tested first will always be faster, at least on my Fedora 10 box. I assume that's because the first parser has more memory available to it when the test begins and therefore invokes Ruby's garbage collector a few times less.
Posted in Ruby | 6 comments | no trackbacks
Posted by Charlie
Wed, 11 Mar 2009 05:52:00 GMT
A mere seven years after its inception, libxml-ruby has finally reached version 1.0.
libxml-ruby provides ruby, via the
libxml2 libary, the super fast,
feature rich xml parser that is has sorely lacked.
Last year I
posted about the resurrection of the project, and since then
we've made enormous progress. The 1.0 release marks the culmination
of this work, and comes with tons of goodies:
- Ruby 1.9.1 support
- Out of the box support for OS X 10.5 and MacPorts
- Greatly expanded
documentation
- Much better test coverage
- A nice, clean API that makes it easy to do simple things, but
provides all the power of libxml2 if you need it
Not to mention that libxml-ruby is blindingly fast and
incredibly feature rich (see my
post from last year for all the details), making it the choice
for a number of high-traffic websites.
So give them a try - its as easy to install as:
gem install libxml-ruby
And if you feel like polishing your ruby, xml, or C skills, come
join the community!
Posted in Ruby | 5 comments | no trackbacks
Posted by Charlie
Wed, 16 Jul 2008 16:38:00 GMT
There is general discontent with the state of XML processing in Ruby - see for example here or here. An obvious solution is to use libxml. However that has been a non-starter since the libxml Ruby bindings have historically caused numerous segementation faults, don't run on Windows and recently lost their current maintainer, Dan Janowski. Making it even more frustrating is that Dan had spent the last year rearchitecting the bindings, successfully fixing the segmentation faults.
Since MapBuzz heavily depends on libxml, it seemed time to step in and contribute. Over the last two weeks I've added support for Windows, cleaned out the bug database and patch list, resolved the few remaining segmentation issues, greatly improved the RDocs and refactored large portions of the code base to conform with modern Ruby extension standards.
After iterating through a couple of releases over the last two weeks, the Ruby libxml community is happy to announce the availability of version 0.8.0, which we believe is ready for prime time. It offers a great combination of speed, functionality and conformance (libxml passes all 1800+ tests in the OASIS XML Tests Suite).
So give it a try - its as easy to install as:
gem install libxml-ruby
If you're on Windows there may be an extra step if you haven't already installed libxml2. If not, then the libxml-ruby distribution includes a prebuilt libxml2 dll in the libxml-ruby/mingw directory. Copy the dll to libxml-ruby/lib, your Ruby bin directory, or somewhere on your path (basically put it someplace where Windows can find it).
Undoubtedly there are still some bugs left, so please report anything you find, so we can fix them in future releases.
Blindingly Fast
The major reason people consider using libxml-ruby is performance. Here are the results from running (on my laptop) a few simple benchmarks that have recently been blogged about on the Web (you can find them in the benchmark directory of the libxml distribution).
From Zack Chandler:
user system total real
libxml 0.032000 0.000000 0.032000 ( 0.031000)
Hpricot 0.640000 0.031000 0.671000 ( 0.890000)
REXML 1.813000 0.047000 1.860000 ( 2.031000)
From
Stephen Bannasch:
user system total real
libxml 0.641000 0.031000 0.672000 ( 0.672000)
hpricot 5.359000 0.062000 5.421000 ( 5.516000)
rexml 22.859000 0.047000 22.906000 ( 23.203000)
From Andreas Meingast:
LIBXML THROUGHPUT:
10.2570516817665 MB/s
10.2570830340359 MB/s
12.6992253283934 MB/s
10.2570516817665 MB/s
8.51116888387252 MB/s
10.2570830340359 MB/s
HPRICOT THROUGHPUT:
0.211597647822036 MB/s
0.202390771964726 MB/s
0.180272812529665 MB/s
0.198474511420818 MB/s
0.198474499681793 MB/s
0.180925089981179 MB/s
REXML THROUGHPUT:
0.130301425548982 MB/s
0.131630590068325 MB/s
0.128316078417727 MB/s
0.125203555921636 MB/s
0.120181872867636 MB/s
0.115330940074107 MB/s
I can't vouch for the appropriateness of the tests, but they show libxml clocking in at 10x hpricot and 30x to 60x REXML. I'd be happy to accept additional tests or more appropriate tests if you have any.
An Embarrassment of Riches
In addition to performance, the libxml-ruby bindings provide impressive coverage of libxml's functionality. Goodies include:
- SAX
- DOM
- XMLReader (streaming interface)
- XPath
- XPointer
- XML Schema
- DTDs
- XSLT (split into the libxslt-ruby bindings)
Now, your first reaction might be that SAX, DOM and XPath are all you need, but validating parsers make it a whole lot easier to sanitize user contributed content on web sites. And the XMLReader offers a clever way of combining the DOM's ease of use (well, ok, compared to SAX at least) with SAX's memory and speed advantages.
Better yet, most of this functionality is exposed via an easy-to-use, Ruby like API. There are still of course some warts lurking in the code, where libxml's C api leaks through to Ruby, but they are being removed one by one. And for those of you who aren't C hackers, much of this work can be done in good old Ruby.
A Long History
For such a useful, and full-featured library, the libxml-ruby bindings have a star-crossed history. Out of curiosity, I went back and traced their lineage. Sean Chittenden originally wrote them back in 2002. At the start of 2005, Trans Onoma adopted the project after Sean had moved on, and at the end of 2005 the bindings found their current home on Ruby Forge. At that point Ross Bamford took over maintenance and worked on the bindings for roughly a year, until early 2007, when then the bindings again became unmaintained. Dan Janowski picked up the ball in 2007 and completely overhauled the binding's memory model. Sadly, Dan had to give up active support this spring.
But on the bright side, Trans, Dan and Sean are all once again active on the mailing list, providing valuable experience and insight. From my point of view, with the renewed push towards a production quality release, and bringing in new users, the libxml-ruby community is as healthy as it has been in a long while.
Posted in Ruby | 30 comments | 2 trackbacks
Posted by Charlie
Thu, 07 Feb 2008 21:07:00 GMT
Today I was fixing up some inefficiencies in Rail's Postgresql adapter (more about that in a later post), and came across this strange looking code:
def tables(name = nil)
schemas = schema_search_path.split(/,/).map { |p| quote(p) }.join(',')
query(<<-SQL, name).map { |row| row[0] }
SELECT tablename
FROM pg_tables
WHERE schemaname IN (#{schemas})
SQL
end
The code is passing a here document, which is basically a long string, as a parameter to the query method. The here document is demarked by the string "SQL."
What took me by surprise is the here document is defined after the method call, which is a code construct I've never seen before. Making sure my memory wasn't failing me, I double checked my copy of The Pick Axe book and verified it never mentions this feature.
Hats off to Jamis, who added this neat little trick to Rails in revision 2317, way back in September 2005, and to Anthony who blogged about it last month. And finally, the Ruby Wikibook has a great tutorial about here documents. Who knew that you can have multiple here document parameters per method call?
Posted in Ruby | 3 comments | no trackbacks
Posted by Charlie
Mon, 10 Sep 2007 20:03:00 GMT
Its been over a year since I blogged about GEOS, an open source project that provides rich functionality for analyzing
and manipulating geometries and is a key part of PostGIS.
Sandros and Mateusz spent a ton of time refactoring and improving GEOS last year, and it looks like version 3.0 will finally be released this fall. As part of the 3.0 release, I've completely rebuilt the Ruby and Python bindings to use GEOS's C API. In theory this should decouple the bindings somewhat from GEOS releases - we'll see if that works out in practice.
Over the last couple of weeks, Mark Cave Ayland and I have spent time polishing off some rough edges. We managed to:
- Fix various autoconf build issues
- Get GEOS building with both older and newer versions of MingW and Msys on Windows
In addition, I added VC++ project files for the Python and Ruby bindings and updated the Ruby bindings to be more "Ruby" like.
Long story short - building GEOS and the Ruby/Python bindings should now be easier and more fool proof. If you're in the GNU world, its as simple as:
./configure --enable-python --enable-ruby
If you're in the Microsoft world, just open the VC++ solution in the source tree (thanks Mateusz), tweak the locations of your Ruby and Python installations, and hit the compile button.
So if you are using, or planning to use, the bindings I'd recommend grabbing the latest from SVN. Once you've built everything, then take a look at the test scripts I've written in the swig/ruby/test and swig/python/test directories. They should provide enough examples to get you started.
Feedback is of course welcome, particularly from Python users.
Since I use Ruby day in and day out, I know the right Rubyisms
to put into the bindings. But I don't know Python well enough to know the right Pythonisms.
Posted in GIS, Ruby | no comments | no trackbacks
Posted by Charlie
Wed, 18 Jul 2007 17:12:00 GMT
Last week I showed how to profile a Rails application using ruby-prof. The post was driven by our desire to improve MapBuzz's map rendering speed. Our nightly log analysis showed that rendering speed has significantly degraded over the last few weeks. Since the whole point of MapBuzz is to share maps with others, we needed to find out what was taking so long.
A day of profiling and we had our answer. With a few simple changes we were able to reduce rendering time from 7.8 seconds to 0.92 seconds (note these times are gathered while profiling the application, so the real times are 2 to 3 times less).
What we found should be generally applicable to all Rails applications, so I'd like to share them:
If you interested in the nitty-gritty details of each change, keep reading!
A Bit of Background
MapBuzz renders maps in a two step process. First, the html page is created and shown to the user. Second, the browser makes an Ajax request to get the features that should be shown on the map. The reason for having a two step process is that it allows users to zoom and pan around the map, without having to reload the whole page.
Log analysis revealed that the Ajax request was taking a long time - 1 to 2 seconds, with a big standard deviation. Our goal is to reduce the average time to less than half a second and to significantly reduce the standard deviation.
If you look at the request URI, you'll see its an Atom feed of features and not an HTML page. That has performance implications.
First, Atom feeds tend to have lots of links. In our case, each entry has 10 to 15 links. We limit the number of entries to 50, so that means there are roughly 600 link per page.
Second, we generate Atom feeds in a modular fashion - each entry is created from 10 partials. There is a partial for the user, one for ratings, one for content, one for icons, one for geometries, etc. Why did we do it this way? Simple - to keep things DRY. Having a set of partials creates an extremely modular system - we can mix and match them as needed depending on the context of the request.
Setup
Testing was done with two machines. The first was my laptop, which is a Dell Latitude D610 running windows, Ruby 1.8.4, and Rails 1.2.3. The second is our staging server, which is fairly high end Dell Desktop running Fedora Core 6 with a copy of our production database (which is about 10Gb). Profiling was done with ruby-prof 0.5.1 (of course), using the Rails plugin I described previously. Note that the logging level was set to :debug, which will show up in the test results.
Baseline
To start we measured the baseline performance of our test setup. It was ugly indeed:
Total: 7.828
%self total self wait child calls name
24.11 1.95 1.89 0.00 0.06 3605 Kernel#clone
13.15 1.04 1.03 0.02 0.00 569 IO#write
7.78 4.13 0.61 0.00 3.52 441 <Module::Benchmark>#
realtime-1
7.41 0.58 0.58 0.00 0.00 125 PGconn#exec
One bit of solace is that profiling an application will slow it down 2 to 3 times, but that sill leaves us with a 4 seconds per request. So let's start digging through the results.
The first thing that jumps out is the huge amount of time Kernel#clone takes. A look at the call graph shows that the caller is clone_attribute_value.
And a bit more digging reveals this custom code in our application:
def rating
# Convert cached rating from BigDecimal to float,
# otherwise strange rounding errors happen
attributes['rating'].to_f
end
The problem is that when you call ActiveRecord#attributes it returns a copy of the attributes, thus generating all the clone calls. Uuggh. I think this is a result of ActiveRecord's flawed attribute implementation, but that is for another blog.
What we want is access to the untyped, original value of the attribute. In theory you are supposed to use the auto generated method, rating_before_type_cast, for this. This has the advantage of skipping the clone call, but it relies on method_missing, which has some overhead (we did not measure how much). Or, you could use read_attribute directly, which also skips the clone call. This is usually your best choice.
For performance critical code, you may wish to read the untyped attribute directly (see the discussion about time below for an example). That can be done using the method read_attribute_before_type_cast, except that it is private. However, with Ruby, that's easily solved:
module ActiveRecord
class Base
public :read_attribute_before_type_cast
end
end
Lesson - Don't use attributes
Gain - 24%
Let's look at our updated results:
Total: 3.39
%self total self wait child calls name
16.73 0.57 0.57 0.00 0.00 569 IO#write
12.83 0.44 0.44 0.00 0.00 125 PGconn#exec
8.76 1.77 0.30 0.00 1.48 441 <Module::Benchmark>#
realtime-1
5.99 0.22 0.20 0.00 0.01 40 PGconn#query
What stands out are the 125 calls to PGconn#exec - each one is a separate query. Looks like we forgot to specify a table or two to be eager loaded via ActiveRecord#find's :include option.
Note there is one downside to using :include - you lose the ability to use the :select option. For us that is important - and can be worked around by using the rails select_parser I've blogged about before.
Lesson - Get your :includes right
Gain - 12%
After fixing attributes and includes, here is our current timings:
Total: 2.594
%self total self wait child calls name
12.64 1.14 0.33 0.00 0.81 361 <Module::Benchmark>#
realtime-1
12.61 0.33 0.33 0.00 0.00 409 IO#write
4.90 0.13 0.13 0.00 0.00 403 <Class::File>#mtime
4.86 0.16 0.13 0.00 0.03 30250 Hash#[]
3.62 0.09 0.09 0.00 0.00 5 PGconn#exec
A couple of things jump out. First, using Benchmark#realtime is fairly time consuming.
Second, logging also takes its toll as seen in the times for IO#write (remember we have
logging set to :debug).
However, the #mtime call looks suspicious. Some more digging through Rails shows all 403 calls come from
ActionView::Base#compile_template?. Let's take a look:
def compile_template?(template, file_name, local_assigns)
method_key = file_name || template
render_symbol = @@method_names[method_key]
if @@compile_time[render_symbol] &&
supports_local_assigns?(render_symbol, local_assigns)
if file_name && !@@cache_template_loading
@@compile_time[render_symbol] < File.mtime(file_name) ||
(File.symlink?(file_name) &&
(@@compile_time[render_symbol] < File.lstat(file_name).mtime))
end
else
true
end
end
Remember I mentioned that it takes roughly 500 partial calls to generate the output? This is where it bites us -
by default Rails checks the timestamp of cached templates before running them. In a production environment that
is totally unnecessary. The solution is simple - just update your production.rb file like this:
# Don't check view timestamps!
config.action_view.cache_template_loading = true
And while we are at it, we'll change the debug level to :info to reduce its impact on the results.
Lesson - Set cache_template_loading to true
Gain - 5%
We've managed to reduce the request time from 7.8 to 2.5 seconds. Not bad. But we still have more work to do.
Total: 2.531
%self total self wait child calls name
13.59 1.91 0.34 0.00 1.56 45 <Module::Benchmark>#
realtime
12.25 0.31 0.31 0.00 0.00 409 IO#write
4.98 0.81 0.13 0.00 0.69 361 <Module::Benchmark>#
realtime-1
4.94 0.13 0.13 0.00 0.00 40 ActiveRecord::Associations::
HasManyThroughAssociation#
construct_sql
3.71 0.09 0.09 0.00 0.00 5 PGconn#exec
3.04 0.08 0.08 0.00 0.00 29847 Hash#[]
3.00 0.11 0.08 0.00 0.03 2097 Hash#each
1.86 0.28 0.05 0.00 0.23 322 ActionController::Routing::
RouteSet#generate
Of particular interest is the ActionController::Routing::RouteSet#generate. Although it only runs for 0.05 seconds, its total time, including children, is 0.28. Thus creating 322 urls takes 11% of the request time.
People have previously mentioned how slow url_for is - well, its true. Instead of using url_for, just create your urls manually using string mashing. For example, if you want an absolute uri, then do this:
"#{request.protocol}#{request.host_with_port}/controller/#{map.id}"
Lesson - Don't use url_for or link_to
Gain - 11%
After removing url_for, we've smashed the 2 second barrier:
Total: 1.391
%self total self wait child calls name
9.06 0.13 0.13 0.00 0.00 726 String#sub!
8.99 0.13 0.13 0.00 0.00 40 ActionView::Base::CompiledTemplates#
_run_ratom_47app47views47geometry47_geometry46ratom
6.69 0.09 0.09 0.00 0.00 5 PGconn#exec
5.54 0.08 0.08 0.00 0.00 741 Item#id
5.54 0.30 0.08 0.00 0.22 121 <Class::Date>#_parse
Next up is <Class::Date>#_parse, whose total time, including children, is a whopping 21% of the time. Ouch.
If you take a look at the code (its in the standard ruby library), you can see its quite complicated
since it tries to parse a variety of date formats. In addition, it makes use of rational numbers, which are none too fast.
Taking a look at a graph profile, we see that <Class::Date>#_parse is called from <Class::ActiveRecord::ConnectionAdapters::Column>#string_to_time.
Thus Rails uses it to convert timestamps received from the database into Ruby objects.
As I detailed in a recent post, the problem is that Ruby's DateTime implementation is extremely slow compared to its Time implementation. The solution is to avoid DateTime's entirely and use custom time parsing code. Fortunately, our database can output time in ISO 861 format, which Time can quickly parse.
So, any place we access times, we simply roll our own attribute readers and writers like this (note that we don't use read_attribute!).
def created_on
@created_on || Time.iso8601(read_attribute_before_type_cast('created_on'))
end
def created_on=(value)
write_attribute(:created_on, value)
@created_on = nil
end
def updated_on
@updated_on || Time.iso8601(read_attribute_before_type_cast('created_on'))
end
Lesson - Don't let Rails parse timestamps - do it yourself
Gain - 20%
We've almost hit the 1 second barrier:
Total: 1.156
%self total self wait child calls name
8.13 0.11 0.09 0.00 0.01 642 Kernel#send-3
8.13 0.09 0.09 0.00 0.00 5 PGconn#exec
6.75 0.08 0.08 0.00 0.00 7497 String#concat
4.15 0.05 0.05 0.00 0.00 817 Hash#each
4.07 0.05 0.05 0.00 0.00 40 ActiveRecord::Associations::
HasAndBelongsToManyAssociation#
construct_sql
2.77 0.03 0.03 0.00 0.00 6360 String#match
2.77 0.06 0.03 0.00 0.03 47 Array#each_index
2.77 0.03 0.03 0.00 0.00 280 IconType#size
2.60 0.03 0.03 0.00 0.00 4309 Array#[]
At this point, most of the easy wins are gone. However, there is at least one left. Although its not shown in the flat profile, the graph profile shows that the method ActiveSupport::CoreExtensions::Hash::Keys#symbolize_keys takes 2.77% of the time including its children. Additional rummaging through the call graph shows that the calls come from ActionView#compile_and_render_template:
def compile_and_render_template(extension, template = nil,
file_path = nil, local_assigns = {})
# convert string keys to symbols if requested
local_assigns = local_assigns.symbolize_keys if
@@local_assigns_support_string_keys
A bit of research shows that local_assigns_support_string_keys is deprecated and slated for removal from Rails! Sweet - that means we can save an additional 3% simply by adding this line to environment.rb:
config.action_view.local_assigns_support_string_keys = false
If only all optimizations were so simple.
Lesson - Don't symbolize keys
Gain - 3%
Wrapping Up
So what's our final time? Let's see:
Total: 0.922
%self total self wait child calls name
10.30 0.10 0.10 0.00 0.00 5 PGconn#exec
3.47 0.17 0.03 0.00 0.14 312 Array#each-2
3.47 0.03 0.03 0.00 0.00 212 ActiveRecord::ConnectionAdapters::
Quoting#quote
3.47 0.03 0.03 0.00 0.00 19455 Hash#[]
3.47 0.36 0.03 0.00 0.33 360 ActionView::Base#render-2
3.36 0.06 0.03 0.00 0.03 47 Array#each_index
3.36 0.03 0.03 0.00 0.00 299 ActiveRecord::Associations::
AssociationProxy#initialize
3.36 0.14 0.03 0.00 0.11 705 ActiveRecord::Associations::
AssociationProxy#method_missing
Not bad - we reduced the request time from 7.8 to 0.92 seconds with a day's worth of work. Obviously there is still much to be done - we need to run the code through our performance monitoring suite and make sure that the average time holds across a number of requests and that the standard deviation is in line. But we've at least made a good start.
Posted in Programming, Rails, Ruby, ruby-prof | 27 comments | 7 trackbacks
Posted by Charlie
Fri, 13 Jul 2007 17:05:00 GMT
While analyzing Rails performance, I stumbled upon the fact that parsing timestamps in ActiveRecord is very slow (I'll post more about that in an upcoming blog).
A bit of googling revealed that Charlie Bowman discovered this a year an a half ago. He has a nice post comparing the performance of Ruby's Time and DateTime classes. Sadly he didn't use ruby-prof to come up with his results, but we'll forgive him for that :)
Now you might reasonably assume that the Time class deals with, er, time and the DateTime class deals with date times, which we'll refer to as timestamps. Well, like me, you'd be wrong. Totally wrong.
It turns out that Time and DateTime both deal with timestamps. But they are implemented completely differently.
Time is a c extension that wraps your operating system's time functionality. It stores times as seconds from January 1st, 1970. Its biggest upside is performance. Its biggest downside is its limited range - times before 1970 are "out of range." I have to admit that doesn't make any sense to me since no other language I've used suffers from this problem and I'm fairly sure Windows and Linux don't either. But no matter.
DateTime is a pure ruby implementation of time that let's you store a much wider range of dates. It also supports both the Julian and Gregorian calendars - on the off hand chance you need such a thing.
Rails takes the safe route and converts timestamps to DateTimes. But there is a performance penalty to pay. Below is simple little script showing how much time it takes to parse a timestamp using 3 approaches - DateTime.parse, Time.parse and Time.xmlschema.
require 'date'
require 'ruby-prof'
# Use xml schema format
now = '2007-07-12T23:36:25.609375-06:00'
result = RubyProf.profile do
1000.times do
time = DateTime.parse(now)
end
end
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})
result = RubyProf.profile do
1000.times do
time = Time.parse(now)
end
end
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})
result = RubyProf.profile do
1000.times do
time = Time.xmlschema(now)
end
end
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, {:min_percent => 1})
And the results:
Total: 3.86
%self total self wait child calls name
14.15 0.83 0.55 0.00 0.28 19000 Integer#gcd
10.91 2.16 0.42 0.00 1.74 19000 <Class::Rational>#reduce
8.06 0.37 0.31 0.00 0.06 27000 Rational#initialize
7.33 0.28 0.28 0.00 0.00 6000 String#sub!
5.70 0.74 0.22 0.00 0.52 27000 Class#new
4.87 0.94 0.19 0.00 0.75 1000 <Class::Date>#_parse
3.68 0.14 0.14 0.00 0.00 29000 <Class::Object>#allocate
3.65 2.30 0.14 0.00 2.16 19000 Object#Rational
3.26 0.86 0.13 0.00 0.74 27000 <Class::Rational>#new!
3.24 0.79 0.13 0.00 0.67 4000 Rational#+
---------------
Total: 1.359
%self total self wait child calls name
32.16 0.44 0.44 0.00 0.00 6000 String#sub!
6.92 0.95 0.09 0.00 0.86 1000 <Class::Date>#_parse
6.84 0.36 0.09 0.00 0.27 1000 <Class::Time>#make_time
5.81 0.23 0.08 0.00 0.15 1000 Rational#/
5.74 0.08 0.08 0.00 0.00 5000 Rational#initialize
4.71 0.10 0.06 0.00 0.03 1000 <Class::Time>#apply_offset
4.49 0.09 0.06 0.00 0.03 3000 Integer#gcd
3.46 0.23 0.05 0.00 0.19 3000 <Class::Rational>#reduce
3.46 1.36 0.05 0.00 1.31 1000 <Class::Time>#parse
---------------
Thread ID: 21259270
Total: 0.297
%self total self wait child calls name
37.04 0.14 0.11 0.00 0.03 1000 <Class::Time>#zone_offset
15.82 0.30 0.05 0.00 0.25 1000 <Class::Time>#xmlschema
10.44 0.08 0.03 0.00 0.05 1000 <Class::Time>#apply_offset
5.39 0.02 0.02 0.00 0.00 1000 <Class::Time>#utc
5.39 0.03 0.02 0.00 0.02 1000 <Class::Time>#month_days
5.39 0.02 0.02 0.00 0.00 3000 Fixnum#*
5.39 0.02 0.02 0.00 0.00 1000 Fixnum#-
5.05 0.01 0.01 0.00 0.00 4000 Fixnum#divmod
5.05 0.01 0.01 0.00 0.00 1000 String#upcase
5.05 0.01 0.01 0.00 0.00 8000 String#to_i
DateTime.parse is slow because its a complicated piece of code that relies heavily on rational numbers. Time.parse is roughly 3 times faster, although if you look under the hood, it actually uses part of DateTime.parse. That seems a bit strange to me, but I didn't spend the time to investigate further.
The real winner here is Time.xmlschema which is a over 10 times faster than DateTime.parse. Why is it so much faster? Instead of a mashing a bunch of strings, it uses a regular expression to parse timestamps. Thus it trades flexibility for performance.
When talking to a database there is no need for flexibility - your database most likely outputs time in the exact same way all the time. So if you heavily use dates in your Rails application, then roll your own timestamp parsing.
Posted in Rails, Ruby | 3 comments | 1 trackback
Posted by Charlie
Mon, 09 Jul 2007 06:56:00 GMT
Make sure to grab the latest version of ruby-prof, currently 0.5.2, which includes some performance tweaks and bug fixes.
A year has passed since the last release of ruby-prof. If you haven't used ruby-prof, its a superfast, open-source, profiler for Ruby that shows you where your program is slow.
Over the last few months the list of bugs and enhancements on Ruby Forge has been accumulating at an embarassing rate. And not to mention, I need to profile our rails app.
So its time for another release.
And this isn't just a minor bug fix release - its a major update that adds lots of new stuff. To install it, open a command prompt and type:
gem install ruby-prof
Then pick the appropriate gem for your platform.
Multithreaded Applications
The most important change is support for multi-threaded applications. Previously, ruby-prof correctly recognized the threads in a program and their independent call stacks, but it got the thread times wrong. ruby-prof would incorrectly add a child thread's time to a parent thread's time, making hard to figure out where the time in a program was spent. Now, ruby-prof keeps track of a thread's "wait time," which is the time it spends waiting for other threads. Thanks to Sylvain Joyeux for a patch that got the ball rolling on better thread support.
Measurement Modes
Sylvain also submitted a second patch that suprised me. Instead of tracking method times, why not track object allocations? That seemed reasonable, but he implemented it as another type of clock mode (ruby-prof supports 3 different ways of measuring time - process time, wall time and cpu time).
At first I was hesitant about the patch - shoehorning object allocations into clock modes didn't make much sense to me. But after thinking about it a bit, I decided Sylvain was onto something. Instead of thinking about "clock modes" I started thinking about "measurement modes" (catchy, isn't it?). And once you have that perspective, then it makes sense for ruby-prof to measure all sorts of things in a running program - time, object allocations, memory usage, etc.
So with a bit of refactoring, ruby-prof's now supports for "measure modes." They are process time, wall time, cpu time an object allocation. And now its easy to add more if anyone wants to submit a patch!
I should note there is one downside to the object allocations mode - it only works if you have a patched Ruby interpreter (if you don't ruby-prof works fine, you just don't get access to allocation information).
Rails
What use would a Ruby profiler be if it didn't work with Rails? ruby-prof now provides some hooks to make easier to profile your rails app. But I'll leave those for a future blog post.
Call Graph Format
Carl Shimer implemented support for the call graph format which is used by KCachegrind to visualize calling information.
Recursive Methods
Finally, ruby-prof has much improved support for recursive method calls as explained in the updated readme file.
Bug Fixes
And last, this release fixes some nasty bugs:
Unknown Singleton Object - This was a tough one since I didn't understand how to reproduce it. But luckily Matthew Fallshaw submitted a reproducable test case, and from there it was just a matter of reading the translated version of the Ruby Hacking Guide to unravel the mysteries of Ruby's singleton classes.
64-bit support - The 0.4.x releases used ints to store pointers instead of longs, which meant they didn't compile and/or work on 64 bit machines. This is now fixed thanks to a patch submitted by Diego 'Flameeyes' Pettenò.
IRB support - Previously you couldn't start ruby-prof in a method and then exit that method. Which meant that ruby-prof couldn't be used in IRB. That restriction has now been removed.
sort order - The sort orderan in generated reports was wrong. Graph reports now sort based on a method's total percent time while flat reports use a method's self percent time.
So give version 0.5.0 a try and see how you like it!
Posted in Ruby, ruby-prof | 2 comments | 2 trackbacks
Posted by Charlie
Tue, 13 Feb 2007 20:45:00 GMT
One reason developers like ActiveRecord is that it automatically generates the SQL needed for querying a relational database. Obviously, there are various knobs for controlling the generated SQL. You can use the :select option to modify the fields returned in a result a set. You improve performance can use the :include option to specify that related tables are loaded via joins.
Unfortunately, :select and :include don't work together. The reason is that when you specify :include ActiveRecord renames every field returned in the query to avoid name conflicts. ActiveRecord then uses those field names to build a graph of ActiveRecord objects.
Supporting :select with :include requires a full-fledged SQL SELECT parser. This might not be obvious at first sight - it seems like you could just use regular expressions. But remember that the SELECT clause of a query can be quite complex. For example:
SELECT field1, 3+4 AS field2, 3/sum(max(3,2))
FROM some_table
JOIN some_other_table
ON some_table.id = some_other_table.some_table_id
Not to mention that some database systems let you nest a SELECT inside of a SELECT.
Parser Combinators
There are a many ways of creating parsers, with the most famous being lex and yacc. To use lex and yacc you first specify the parsing rules using EBNF or equivalent. You then run lex and yacc to create a parser in your target language, which is generally C. However, there is a Ruby implementation of lex and yacc, called racc, which you can use to generate a parser in Ruby.
Another approach is to create your parser by hand. This might sound like a silly idea, but it has a couple of advantages. First, it teaches you how parsers work. Second, its simpler since you only have to know one language. Third, it lets you take advantage of the abilities of the language you are using.
One way to approach the problem is to use parser combinators. The idea came out of functional languages, so some of the papers on the web are a bit dense. But the basic idea is to create a number of different parsers, each that does a specific task, and combine them using the Composite pattern.
For example, let's say I want to parse a simple SQL case statement:
SELECT
CASE field1 < 100
WHEN true THEN 'small'
ELSE 'big'
END
Here is some example Ruby code that does it:
when_clause = sequence(keyword[:when], lazy_expr,
(operator[':'] | keyword['then']),
lazy_expr) do |_,cond,_,val|
[cond, val]
end
default_case = (keyword[:else] >> lazy_expr).optional
case_expr = sequence(keyword[:case], lazy_expr,
when_clause.many, default_case,
keyword[:end]) do |val, whens, default, _|
CaseExpr.new(val, whens, default)
end
Notice how readable it is. Working from the top down:
- A when clause is a sequence of the keyword WHEN, followed by some expression, followed by : or the keyword THEN followed by an expression
- The default clause is optional and consists of the keyword ELSE which proceeds some expression
- A case statement is made up of the keyword CASE, followed by some expression, followed by any number of when clauses, a default clause and finally the keyword END.
The end result is a CaseExpr object that contains the value of the expression, an array of the when statements and a default else statement.
Not too bad is it? Note that everything is a parser. Look again at the variable default_case. It points at an optional parser that contains a sequence parser (created by >>). The sequence parser contains a keyword parser and an expression parser (which of course is a complicated parser in its own right).
rparsec
You could build your own parser combinator framework for Ruby, but luckily Ben Yu has already created one called rparsec. Although the documentation says rparsec is a port of Haskell's parsec library, it sure doesn't look like it to me (although my Haskell skills are minimal at best). Instead, it looks like a port of the Java combinator parser framework described by Steven Metsker in his book Building Parsers with Java.
Steven's book is a fantastic introduction to parser combinators - a must read if you're interested in understanding parser combinators. It takes you step-by-step through creating your own parser and the pitfalls you will encounter along the way. And if you subscribe to Safari you can read it online.
A Rails Plugin
To use the select parser with Rails:
- Install the rparsec gem
- Install the rails plugin
Good luck and happy parsing!
Posted in Rails, Ruby | no comments | no trackbacks