Posted by Charlie
Sun, 29 Mar 2009 04:29:00 GMT
As noted elsewhere, ruby 1.9.1 hasn't exactly bounded out of the gate. That's not particularly surprising, considering 1.9.1 has been available for only a couple of months and requires changes to existing code. In addition, there are a number of incompatible gems, giving rise to the isitruby19 website as a clearinghouse of information. So despite the great efforts from the Rails team, the rest of the community is still lagging behind.
That's particularly true on Windows, where a new one-click installer isn't yet available. According to the latest market share stats from Net Applications, Windows controls 88% of the desktop market. I have no idea how many Ruby installations exist, or how they are divided by operating system. But looking at RubyForge, far and away the most popular download of all time is the Windows one-click installer with over 3 million downloads.
Luis Lavena has taken over stewardship of the one-click installer, and clearly needs a bit of help. So although I have very little free time, I offered to pitch in as I could. While Luis is concentrating on putting together a new version of the one-click installer using Mingw and msys, I thought I could help out by putting 1.9.1 through its paces on Windows.
My approach was simply to start with the basics:
- Build ruby with Visual Studio 2008
- Build the default extensions and libraries Ruby uses (zlib, iconv, openssl, etc)
- Run Ruby's unit tests
That was almost a month ago. Thirty-nine patches later (I have no doubt Nobu is getting sick of me), I just about have Ruby 1.9.1's test suite running on Windows. There are still a few remaining issues, in particular a couple of imap tests that hang.
As for Visual Studio, I'm using it for two reasons. First, it has a lights-out debugger that makes it much easier to track down and fix problems. Second, it lets you compile instrumented executables and libraries that can detect incorrect API usage, heap corruption, stack corruption and mismatched calling conventions.
It quickly became obvious that no one had ever done that with Ruby, because it turned up a whole host of issues. For example, the dl extension used the cdecl calling convention to call the Windows API instead of stdcall. Another example was a set of memory leaks in printf/sprintf.
The other thing that was bothersome was the huge number of compiler warnings generated by building Ruby. See for yourself - and then realize the original list doesn't include any of the warnings generated by building Ruby's extensions. Cleaning up the warnings took a number of patches, but at this point most of them have been fixed. And all credit to Nobu for working through my patches, fixing them and applying them. My knowledge of the Ruby runtime is fairly limited, so most of my patches weren't quite right.
Anyway, since it's not all that obvious how to build Ruby on Windows (with Visual Studio or Mingw), I'll see if I can put together a few posts that describe how to do it for anyone who wants to roll their own.
Posted in Ruby | 9 comments | no trackbacks
Posted by Charlie
Sun, 22 Mar 2009 05:19:00 GMT
I'm happy to announce the release of libxml-ruby 1.1.3. Besides including the usual assortment of new features and bug fixes, this release also includes a speed boost of roughly 10% to 20%.
This resulted from RubyInside's recent post summarizing the performance of Ruby parsers. As expected, libxml-ruby blew away Hpricot and REXML in pure parsing speed (which of course is a simplistic view of what is important in an xml processor, but nevertheless still important). But it consistently finished a bit behind Nokogiri.
I was a bit surprised by that since libxml-ruby and Nokogiri both use the libxml2 library as their parsing engine. Since the specific test cases almost exclusively tested parsing, the two extensions should have identical run times.
Since the times were different, the obvious conclusion was that the two extensions were using different libxml2 APIs or using different settings. I suspected the second, but when investigating performance you never know beforehand.
Not to bore everyone with the nitty-gritty details of using libxml2, but when looking into the first test, parsing an in-memory string, it didn't look like there was much difference in API calls.
For libxml-ruby:
xmlCreateMemoryParserCtxt
xmlParseDocument
For Nokogiri:
xmlReadMemory
-> xmlCreateMemoryParserCtxt
-> xmlDoRead
-> xmlParseDocument
So that didn't solve the mystery.
The next possibility was that xmlDoRead was modifying the libxml2 parser context. Now a libxml2 parser context is a beast of a thing - for those brave souls who want to take a peek, it's defined in libxml2's online documentation.
Working through the options one-by-one, I finally found the culprit, an obscure field in the structure:
int dictNames : Use dictionary names for the tree
What this setting controls is whether libxml2 uses a dictionary to cache strings it has previously parsed. Caching strings makes a big difference, so by default it should be enabled. That is now the case with libxml-ruby 1.1.3 and higher.
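To make the idea of a name dictionary concrete, here is a minimal sketch in plain Ruby. It is only an illustration of the caching technique, not libxml2's actual implementation; the NameDictionary class and its methods are names I made up for this example.

```ruby
# A toy name dictionary (hypothetical, for illustration only):
# each distinct name is stored once and the same object is handed back
# every time that name is seen again.
class NameDictionary
  def initialize
    @names = {}
  end

  # Returns the single shared, frozen copy of the given name.
  def lookup(name)
    @names[name] ||= name.dup.freeze
  end

  # Number of distinct strings actually stored.
  def size
    @names.size
  end
end

DICT = NameDictionary.new

# Pretend these are element names encountered while parsing a document.
TAGS = %w[item title item title item].map { |name| DICT.lookup(name) }

puts "#{TAGS.size} names parsed, #{DICT.size} strings stored"
```

With repetitive documents, which most xml is, a parser that works this way allocates far fewer strings, and that is where I'd expect the savings to come from.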
Rerunning the published benchmarks now shows libxml-ruby and Nokogiri to have equivalent performance. If you run the tests yourself, beware though. The order in which the extensions are tested changes the results. Whichever extension is tested first will always be faster, at least on my Fedora 10 box. I assume that's because the first parser has more memory available to it when the test begins and therefore invokes Ruby's garbage collector a few times less.
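If you want to reduce the ordering effect, one option is to run the candidates in every possible order and force a garbage collection before each measurement. The sketch below uses only Ruby's standard Benchmark module. The two lambdas are hypothetical stand-ins for the real parsers, and since my garbage collector explanation is only an assumption, treat this as an experiment rather than a guaranteed fix.

```ruby
require 'benchmark'

# Hypothetical stand-ins for the parsers under test.
CANDIDATES = {
  "parser_a" => -> { Array.new(5_000) { |i| i.to_s } },
  "parser_b" => -> { Array.new(5_000) { |i| i.to_s * 2 } }
}

# Times every candidate once per ordering, so no candidate always goes first.
def run_in_every_order(candidates)
  totals = Hash.new { |hash, name| hash[name] = [] }
  candidates.keys.permutation.each do |order|
    order.each do |name|
      GC.start # begin each measurement from a freshly collected heap
      totals[name] << Benchmark.realtime { candidates[name].call }
    end
  end
  totals
end

RESULTS = run_in_every_order(CANDIDATES)
RESULTS.each { |name, times| puts "#{name}: #{times.inspect}" }
```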
Posted in Ruby | 6 comments | no trackbacks
Posted by Charlie
Wed, 11 Mar 2009 05:52:00 GMT
A mere seven years after its inception, libxml-ruby has finally reached version 1.0.
libxml-ruby provides ruby, via the
libxml2 library, the super fast,
feature rich xml parser that it has sorely lacked.
Last year I
posted about the resurrection of the project, and since then
we've made enormous progress. The 1.0 release marks the culmination
of this work, and comes with tons of goodies:
- Ruby 1.9.1 support
- Out of the box support for OS X 10.5 and MacPorts
- Greatly expanded
documentation
- Much better test coverage
- A nice, clean API that makes it easy to do simple things, but
provides all the power of libxml2 if you need it
Not to mention that libxml-ruby is blindingly fast and
incredibly feature rich (see my
post from last year for all the details), making it the choice
for a number of high-traffic websites.
So give it a try - it's as easy to install as:
gem install libxml-ruby
And if you feel like polishing your ruby, xml, or C skills, come
join the community!
Posted in Ruby | 5 comments | no trackbacks
Posted by Charlie
Sun, 08 Mar 2009 01:20:00 GMT
One of the great new features of the upcoming postgresql 8.4 release is the addition of window functions. Previously limited to enterprise databases such as Oracle and DB2, they open up a whole new world of functionality to sql queries.
Window functions are one of the more obscure parts of the sql standard, so you may never have heard of them. In a nutshell, they let you perform calculations based on the current record and its set of related records. This turns out to be quite useful. A good place to find out more information is the postgresql documentation, which does a good job of explaining some of the more common use cases.
Workplace of the Future
My introduction to window functions was almost five years ago while doing a project for Ubisense. Ubisense sells indoor tracking systems, based on ultra-wideband, that can locate tags within 6 inches.
For one of our projects, we worked on Cisco's Connected Workspace. The Connected Workspace was designed to see if office space could be laid out in a way to increase worker happiness and productivity. To do this, Cisco took all the cubicles out of the main floor of one of its buildings and replaced them with a fairly radical design. Roughly half of the floor was made into a large open space with individual and group desks. The remainder of the floor was split between a large kitchen with a really nice eating room and offices that ranged in size from 1 to 12 people. Here is a picture of the main floor area (courtesy Cisco Systems):

For a few more pictures, check out Cisco's presentation.
The idea was that employees could sit wherever they wanted, there were no assigned seats. If employees needed to collaborate they could work in the open areas, if they needed privacy they could grab one of the smaller offices and if they needed to do a conference call they could grab one of the larger offices.
The other impetus behind the experiment was financial. Cisco has a huge campus in Santa Clara with hundreds of buildings, each costing millions of dollars to maintain. Was it possible to pack more people into each building and maintain, or improve, their happiness and productivity?
The Experiment
Ubisense was hired to figure out how well the different parts of the Connected Workspace were utilized. By giving each employee a tag, the system anonymously kept track of each time someone entered or left a room. This aggregate data could then be used to gain insight into the effectiveness of the new floor plan:
- Did employees spend time in the open area?
- If so, in which parts of the open area (it was divided into 5 subdivisions)?
- How much were the individual offices being used? Were there too many or too few?
- What about the larger conference room?
- How much was the kitchen and eating area utilized?
To do this, I hooked into Ubisense's platform API to monitor each time a tag entered or left a room. That information was then entered into an Oracle database (without any user information, so the data was totally anonymous). Thus the Oracle table consisted of millions of rows of data - with each row representing a tag entering a room or leaving a room. For example, here is a simplified view of the data:
| tag_id | room_id | event | time |
| 1 | Conference #1 | Enter | 10:00am |
| 2 | Office #2 | Enter | 11:15am |
| 2 | Office #2 | Leave | 11:20am |
| 1 | Conference #1 | Leave | 11:30am |
Window Functions to the Rescue
The next trick was to analyze the data to answer the questions I posed above. To do that required figuring out how much time each tag spent in each room. So something like this:
| room_id | enter | leave | duration |
| Conference #1 | 10:00am | 11:30am | 1 hour 30 min |
| Office #2 | 11:15am | 11:20am | 5 min |
Obviously you could write a script in the language of your choice to process the raw data and populate this new table. But that adds another level of complexity to the system and makes it hard to do ad hoc queries.
And this is where window functions are so useful. Using window functions, you can implement the basic algorithm fully in sql:
- Sort the data by tag_id, room_id and id so that room enter records for a tag are directly followed by room exit records
- Select the room exit records
- Use the lag window function to pull the previous record, which is the room enter record, and then subtract the two times to get the duration
- Wrap this query up in a view, let's call it room_usage, that can serve as the basis for ad hoc queries or reports.
Without window functions, item #3 is impossible with sql because there is no way to relate a record to its surrounding records (i.e., a window).
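To show what the lag step is doing, here is a rough sketch of the same algorithm in plain Ruby rather than sql. It is an illustration only: the sample rows, the id column values and the room_usage method are made up to mirror the simplified tables above, and times are stored as minutes since midnight to keep the arithmetic simple.

```ruby
# Hypothetical sample rows mirroring the simplified table above.
# Times are minutes since midnight (600 = 10:00am).
EVENTS = [
  { id: 1, tag_id: 1, room_id: "Conference #1", event: "Enter", time: 600 },
  { id: 2, tag_id: 2, room_id: "Office #2",     event: "Enter", time: 675 },
  { id: 3, tag_id: 2, room_id: "Office #2",     event: "Leave", time: 680 },
  { id: 4, tag_id: 1, room_id: "Conference #1", event: "Leave", time: 690 }
]

def room_usage(events)
  # Step 1: sort so each enter record is directly followed by its leave record.
  sorted = events.sort_by { |e| [e[:tag_id], e[:room_id], e[:id]] }

  # each_cons(2) pairs every row with the row before it, which is the
  # role the lag window function plays in the sql version.
  pairs = sorted.each_cons(2).select do |prev, cur|
    # Step 2: keep only leave records whose previous row is the matching enter.
    cur[:event] == "Leave" && prev[:event] == "Enter" &&
      prev[:tag_id] == cur[:tag_id] && prev[:room_id] == cur[:room_id]
  end

  # Step 3: subtract the two times to get the duration in minutes.
  pairs.map do |prev, cur|
    { room_id: cur[:room_id], enter: prev[:time], leave: cur[:time],
      duration: cur[:time] - prev[:time] }
  end
end

USAGE = room_usage(EVENTS)
USAGE.each { |row| puts row.inspect }
```

The point of doing this in the database instead is that the pairing happens inside a single query, so the result can be wrapped in a view and queried like any other table.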
And thus window functions provide a great new data analysis tool, which postgresql will make available to everyone at no cost.
Posted in Technology | 2 comments | no trackbacks
Posted by Charlie
Thu, 13 Nov 2008 18:11:00 GMT
Last year I wrote about how to profile your Rails application, which is a lot harder than it seems. It's not so much the profiling itself - it's easy enough to create one-off results. Instead, it's coming up with a reproducible process that lets you measure performance changes over time.
Some things that don't work over the long term:
- Insert profiling code into your application code
- Use unit tests for profiling
- Use functional tests for profiling
- Use integration tests for profiling
- Modify standard rails environments (test, development, production) for profiling
So the latest version of ruby-prof introduces a new approach to profiling your Ruby or Rails code that is heavily based on the excellent work Jeremy has done on the request profiler included in newer versions of Rails.
The basic idea is to extend Ruby's Test::Unit library so that individual test cases are profiled by including a new RubyProf::Test module. When you include this module, ruby-prof will run each test once as a warm up and then ten more times to gather profiling data (using another new feature of the 0.7.0 release, the ability to pause and resume a profiling run). Profile data is then output for each test.
Let's look at an example:
class ExampleTest < Test::Unit::TestCase
  include RubyProf::Test

  def test_stuff
    puts "Test method"
  end
end
The line include RubyProf::Test turns the test case into a profiling test case. The same approach could be used for hooking into other testing frameworks - all patches are of course welcome!
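If the warm up and repeat idea is unfamiliar, here is a minimal sketch of it using only Ruby's standard Benchmark module. This is not how ruby-prof implements it internally; measure_with_warmup is a hypothetical helper that just times the block instead of profiling it.

```ruby
require 'benchmark'

# Hypothetical helper, not part of ruby-prof: run the block once as a
# warm up (result discarded), then time a fixed number of measured runs.
def measure_with_warmup(runs = 10)
  yield # warm-up pass, so one-time costs don't skew the numbers
  Array.new(runs) { Benchmark.realtime { yield } }
end

TIMINGS = measure_with_warmup { (1..10_000).reduce(:+) }
puts "runs: #{TIMINGS.size}, mean: #{TIMINGS.sum / TIMINGS.size} seconds"
```

The warm up pass absorbs one-time costs such as loading classes, so the measured runs reflect steady-state behavior.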
Using a Profile Environment for Rails
Now let's talk about profiling Rails. There are two main issues that make it harder than it seems.
First, to get any useful data you need to profile a Rails app using the production environment settings in conjunction with a test database. Using the development environment doesn't work because the time it takes Rails to reload classes on each request drowns out any useful information.
Second, how should profile tests be written and where should they go?
The solution I've adopted is to use functional-like tests that use a PROFILE environment, and place them in a directory called test/profile.
Let's look at another example:
require File.dirname(__FILE__) + '/../profile_test_helper'

class MyControllerTest < Test::Unit::TestCase
  include RubyProf::Test

  fixtures :my_fixture

  def setup
    @controller = MyController.new
    @request = ActionController::TestRequest.new
    @response = ActionController::TestResponse.new
  end

  def test_get
    get(:index)
  end
end
The only differences between a functional test and a profile test are the inclusion of the RubyProf::Test module and the loading of profile_test_helper.rb. profile_test_helper is unfortunately needed because the standard test_helper.rb file Rails uses loads the TEST environment. Hopefully future versions of Rails will fix this by allowing greater flexibility in specifying a test environment.
So to get started with profiling your Rails application:
- Copy profile_test_helper.rb from the ruby-prof distribution to your rails test directory
- Modify profile_test_helper.rb as needed to set ruby-prof's output directory
- Create a profile.rb file in the environments directory
- Update your database.yml file to include a profile database (just map it to your test database)
- Create a new directory test/profile
- Start writing profiling tests that look similar to the above example
And now you'll have reproducible profiling test cases.
So what's missing? A way of keeping track of how your application's performance changes over time. A quick hack is to use source control to keep profile test results around. A more sophisticated solution would be to use ruby-prof's API to dump profile results into a database and then put a nice web front end onto it. Any takers?
Posted in ruby-prof | no comments | no trackbacks
Posted by Charlie
Wed, 12 Nov 2008 16:41:00 GMT
I'm happy to announce the release of ruby-prof 0.7.0, the superfast, open-source, Ruby profiler that helps you find bottlenecks in your Ruby code. This release was a joint effort, with major contributions from Jeremy Kemper (aka bitsweat) of Rails fame and Hin Boen from CodeGear. There are two major new features in this release, as well as a number of smaller enhancements and bug fixes. For a full list of changes, take a look at the release notes.
The first major new feature is improved Rails profiling, which I'll talk about in a separate post.
The second major feature is significant internal changes that make it easier to integrate ruby-prof with IDEs. ruby-prof is already being used by Aptana's RadRails and has been integrated into the next version of CodeGear's 3rd Rail. As part of this work, Hin has built a user interface for ruby-prof that lets a user inspect individual methods to see how much time they took as well as how they were called.
One big problem, though: previous versions of ruby-prof only kept track of aggregate data. This made it impossible for Hin to create the user interface he wanted. For example, look at this call sequence:
      A
     / \
    B   K
   / \   \
  C   D   B
         / \
        C   D
With earlier versions of ruby-prof, there was no way to tell what percent of the time spent in method C was a result of the A -> B -> C call sequence versus the A -> K -> B -> C call sequence.
Or take another example:
A   K
|   |
B   B
|   |
C   D
In this case, if you tried to reconstruct the call sequence from ruby-prof you would end up with this incorrect result:
  A   K
  |  /
  B
 / \
C   D
So working with Hin, I rearchitected ruby-prof to keep track of full call sequences. Most likely you won't notice any difference - the changes will only affect you if you use ruby-prof's api to present results in a custom way. In that case, you'll have to update your code, which should only take a few minutes (to see the api in use, take a look at the various printer classes that ship with ruby-prof).
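To see why aggregate data isn't enough, here is a small sketch in plain Ruby based on the second example above. It doesn't use ruby-prof's api at all. The constants and the reconstruct method are hypothetical names, and the "aggregate" view is simplified to a list of caller/callee edges.

```ruby
# Two hypothetical call sequences, matching the second diagram above.
CALLS = [%w[A B C], %w[K B D]]

# Aggregate view: only caller -> callee edges are kept.
EDGES = CALLS.flat_map { |path| path.each_cons(2).to_a }.uniq

# Every call path that the edge list alone would allow from a given root.
def reconstruct(edges, root)
  children = edges.select { |from, _| from == root }.map { |_, to| to }
  return [[root]] if children.empty?
  children.flat_map do |child|
    reconstruct(edges, child).map { |rest| [root] + rest }
  end
end

RECONSTRUCTED = %w[A K].flat_map { |root| reconstruct(EDGES, root) }

puts "actual paths:        #{CALLS.map { |p| p.join(' -> ') }.inspect}"
puts "paths edges suggest: #{RECONSTRUCTED.map { |p| p.join(' -> ') }.inspect}"
```

The edge list suggests four call paths when only two ever happened, which is exactly the ambiguity that storing full call sequences removes.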
Enjoy, and all feedback is welcome.
Posted in ruby-prof | 3 comments | no trackbacks
Posted by Charlie
Tue, 11 Nov 2008 18:15:00 GMT
No doubt this post is two weeks past its prime, but better late than never, right? On Sunday, October 26th, Yue and I headed downtown to check out Barack Obama's campaign rally at Civic Center Park in the heart of Denver.
By the time we arrived the place was packed. In fact, it turned out that over 100,000 of our closest friends were there, making it Obama's largest crowd in the United States up to that point. I think it was the second largest crowd I've ever been in, surpassed only by watching fireworks on the 4th of July from the National Mall in Washington, DC.
Wanting to at least be able to see Obama, we managed to be the last two people allowed into the center of the park (versus around the periphery). Once inside, we squirreled our way about half way to the stage. As you can see in the pictures, that wasn't all that close, but close enough to catch a glimpse of Obama. The first picture is looking west towards Denver's City Hall. If you're interested, click the picture to get a bigger version, then find the tree in the center of the stage under the Colorado flag, and look right 3.5 columns to see Obama working the crowd:
And here's the view looking back east, towards the Colorado State Capitol (supposedly Denver is the only city where the city hall faces the state Capitol building, but I've never verified if that's true).

For the most part Obama stuck to his standard stump speech, but what struck me was its optimism and focus on working together. Nice words no doubt, but it was a nice change of pace from typical campaign bashing.
On the way out, there was a table for all the poor souls who lost their keys and cell phones that day:

And a couple pictures of Yue for good measure, that show what a beautiful day it was:


Update: If you can't see Obama, check out Paul's post. Paul also points out the snipers that are visible on top of City Hall. And they weren't the only ones, there were plenty more to the north where Denver's taller buildings are located.
no comments | no trackbacks
Posted by Charlie
Tue, 11 Nov 2008 03:24:00 GMT
Rafting down the Grand Canyon has been on my todo list for a long time. Over the years, I've rafted or canoed the James, Potomac, Rappahannock, Shenandoah, Arkansas, Taylor (well that's mostly a creek), Green and Colorado rivers. But never the Grand Canyon.
So this summer it was time. Seven of us - Haydon, Dave, Natasha, Brian, Lauren, Yue and myself - took the plunge and paddled down the upper part of the Grand Canyon from Marble Canyon to the Bright Angel Trail. We went with Outdoors Unlimited, which I highly recommend. They not only provided equipment, but also knowledgeable guides and great food.
Here's all of us by a grotto a few hundred feet above the Colorado River (from left to right is Dave, Haydon, Yue, myself, Brian, Natasha and Lauren - click the picture for a bigger version):

There are several different types of trips you can take down the canyon - we opted for a paddle trip. A paddle trip is just like it sounds - you get to paddle your way down the canyon using yellow, rubber boats, with six people per boat plus a guide. Depending on how much you like thrills, the best seats in the boat are the front two, where you get really wet from waves breaking over the bow when you hit a big rapid.
Yue, who isn't much of a camper, was a good sport about the whole thing once she discovered she could sleep in a tent versus sleeping under the stars. Here is our campground from the third night, with yours truly sleeping outside:

It took five days and ninety-five miles to get to our drop-off point - the deepest part of the canyon at the bottom of the Bright Angel Trail. From there it's an eight-mile hike, and 4,380 feet up, to get to the visitor center on the South Rim. I had hiked the very top bit of the trail twenty years ago, but hadn't been back since.
It's an absolutely beautiful trail, surprisingly cool on the bottom half (well, we did start at 7am) as it climbs up along a small creek. Here is what it looks like once you've hiked about two miles and reached the top of the inner gorge - the picture is looking south with the Great Unconformity in the foreground and the towering south wall in the background:

Dave took the prize, hiking up in an amazing time of 2:45, followed by Lauren and Natasha at 3:15, and the rest of us at 3:45 (which by the way I was quite proud of).
Here's the obligatory picture from the top:

And here's Lauren, Natasha, Haydon and Brian enjoying some well-deserved ice cream.

And some of our better pictures:








2 comments | no trackbacks
Posted by Charlie
Wed, 27 Aug 2008 18:55:00 GMT
One of the projects we've been working on for MapBuzz the last few weeks is building an interactive map that shows all the events going on in Denver during the Democratic National Convention. Users can pick the event type and date they are interested in, and the map refreshes with icons for relevant events. By clicking on a given event, the user can see exactly where and when the event is taking place. I think the map turned out pretty well - it's a good example of a mashup pulling data from different sources. In this case, base maps from Google, event information from Zvents, and all rendering/styling/page from MapBuzz.
It did clarify my thinking on a few points. First, Rails' built-in page caching is really limited - it ignores query parameters and only works for html. So we had to hack around that, more info coming in a later post. Second, for building mashups xml really is superior to JSON simply because it supports namespaces (for all their pain points, namespaces really do facilitate merging of data from multiple sources). Third, when you need it, xslt is invaluable. Zvents serves its data using RSS, but our client only supports Atom. The simple solution was a quick xsl transformation to convert Zvents' RSS feed over to Atom using libxslt (and thus MapBuzz's contribution back to the Ruby community to get the libxml and libxslt bindings back into good shape).
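To illustrate the namespace point, here is a small sketch using REXML from Ruby's standard distribution (not the libxml bindings). The feed snippet and the "ev" event vocabulary with its example.com namespace are made up for this example. Both sources define an element called title, and the namespaces keep them apart.

```ruby
require 'rexml/document'

# Hypothetical merged entry: an Atom title plus a title from a made-up
# event vocabulary, distinguished only by namespace.
FEED_XML = <<~DOC
  <entry xmlns="http://www.w3.org/2005/Atom"
         xmlns:ev="http://example.com/ns/events">
    <title>Convention kickoff</title>
    <ev:title>Opening night</ev:title>
  </entry>
DOC

# Prefixes used in the XPath expressions below, mapped to namespace URIs.
NS = {
  "atom" => "http://www.w3.org/2005/Atom",
  "ev"   => "http://example.com/ns/events"
}

doc = REXML::Document.new(FEED_XML)
ATOM_TITLE  = REXML::XPath.first(doc, "/atom:entry/atom:title", NS).text
EVENT_TITLE = REXML::XPath.first(doc, "/atom:entry/ev:title", NS).text

puts "atom title:  #{ATOM_TITLE}"
puts "event title: #{EVENT_TITLE}"
```

With JSON, the two title keys would simply collide unless every source agreed on a naming scheme up front.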
Posted in Design | 2 comments | no trackbacks
Posted by Charlie
Wed, 16 Jul 2008 16:38:00 GMT
There is general discontent with the state of XML processing in Ruby - see for example here or here. An obvious solution is to use libxml. However that has been a non-starter since the libxml Ruby bindings have historically caused numerous segmentation faults, don't run on Windows and recently lost their current maintainer, Dan Janowski. Making it even more frustrating is that Dan had spent the last year rearchitecting the bindings, successfully fixing the segmentation faults.
Since MapBuzz heavily depends on libxml, it seemed time to step in and contribute. Over the last two weeks I've added support for Windows, cleaned out the bug database and patch list, resolved the few remaining segmentation issues, greatly improved the RDocs and refactored large portions of the code base to conform with modern Ruby extension standards.
After iterating through a couple of releases over the last two weeks, the Ruby libxml community is happy to announce the availability of version 0.8.0, which we believe is ready for prime time. It offers a great combination of speed, functionality and conformance (libxml passes all 1800+ tests in the OASIS XML Test Suite).
So give it a try - it's as easy to install as:
gem install libxml-ruby
If you're on Windows and haven't already installed libxml2, there is an extra step. The libxml-ruby distribution includes a prebuilt libxml2 dll in the libxml-ruby/mingw directory. Copy the dll to libxml-ruby/lib, your Ruby bin directory, or somewhere on your path (basically put it someplace where Windows can find it).
Undoubtedly there are still some bugs left, so please report anything you find so we can fix it in future releases.
Blindingly Fast
The major reason people consider using libxml-ruby is performance. Here are the results from running (on my laptop) a few simple benchmarks that have recently been blogged about on the Web (you can find them in the benchmark directory of the libxml distribution).
From Zack Chandler:
              user     system      total        real
libxml    0.032000   0.000000   0.032000 (  0.031000)
Hpricot   0.640000   0.031000   0.671000 (  0.890000)
REXML     1.813000   0.047000   1.860000 (  2.031000)
From Stephen Bannasch:
               user     system      total        real
libxml     0.641000   0.031000   0.672000 (  0.672000)
hpricot    5.359000   0.062000   5.421000 (  5.516000)
rexml     22.859000   0.047000  22.906000 ( 23.203000)
From Andreas Meingast:
LIBXML THROUGHPUT:
10.2570516817665 MB/s
10.2570830340359 MB/s
12.6992253283934 MB/s
10.2570516817665 MB/s
8.51116888387252 MB/s
10.2570830340359 MB/s
HPRICOT THROUGHPUT:
0.211597647822036 MB/s
0.202390771964726 MB/s
0.180272812529665 MB/s
0.198474511420818 MB/s
0.198474499681793 MB/s
0.180925089981179 MB/s
REXML THROUGHPUT:
0.130301425548982 MB/s
0.131630590068325 MB/s
0.128316078417727 MB/s
0.125203555921636 MB/s
0.120181872867636 MB/s
0.115330940074107 MB/s
I can't vouch for the appropriateness of the tests, but they show libxml clocking in at 10x hpricot and 30x to 60x REXML. I'd be happy to accept additional tests or more appropriate tests if you have any.
An Embarrassment of Riches
In addition to performance, the libxml-ruby bindings provide impressive coverage of libxml's functionality. Goodies include:
- SAX
- DOM
- XMLReader (streaming interface)
- XPath
- XPointer
- XML Schema
- DTDs
- XSLT (split into the libxslt-ruby bindings)
Now, your first reaction might be that SAX, DOM and XPath are all you need, but validating parsers make it a whole lot easier to sanitize user contributed content on web sites. And the XMLReader offers a clever way of combining the DOM's ease of use (well, ok, compared to SAX at least) with SAX's memory and speed advantages.
Better yet, most of this functionality is exposed via an easy-to-use, Ruby-like API. There are still of course some warts lurking in the code, where libxml's C api leaks through to Ruby, but they are being removed one by one. And for those of you who aren't C hackers, much of this work can be done in good old Ruby.
A Long History
For such a useful and full-featured library, the libxml-ruby bindings have a star-crossed history. Out of curiosity, I went back and traced their lineage. Sean Chittenden originally wrote them back in 2002. At the start of 2005, Trans Onoma adopted the project after Sean had moved on, and at the end of 2005 the bindings found their current home on RubyForge. At that point Ross Bamford took over maintenance and worked on the bindings for roughly a year, until early 2007, when the bindings again became unmaintained. Dan Janowski picked up the ball in 2007 and completely overhauled the bindings' memory model. Sadly, Dan had to give up active support this spring.
But on the bright side, Trans, Dan and Sean are all once again active on the mailing list, providing valuable experience and insight. From my point of view, with the renewed push towards a production quality release, and bringing in new users, the libxml-ruby community is as healthy as it has been in a long while.
Posted in Ruby | 30 comments | 2 trackbacks