Posted by Charlie
Thu, 10 Apr 2008 23:38:00 GMT
Hearing about the latest and greatest Facebook application reminds me of reading FastCompany or Business 2.0, before it went bust. The typical story goes like this - a couple of ninja developers spend every waking moment for a year coding up the Web's next killer site. Launched with little fanfare, the website quickly goes viral, and starts to generate huge amounts of traffic. In no time, the company is worth millions of dollars (with or without revenue), and everyone is rich and happy.
The rise of Facebook hyper-charges this narrative. Facebook has dramatically compressed the time between an idea and fortune and fame. We've left "web years" in the dust by moving into the parallel universe of "Facebook time." This go-round, the prototypical story goes like this - Stanford student whips up a Facebook app over the weekend and releases it on Monday. By the end of the week it has half a million users and by the end of the month the student flips the application, neatly paying for this year's tuition.
Just as some of the women you see on the covers of women's magazines really do exist, some of these stories are surely true. But for the rest of us, success is not so easy. It usually comes the way it has always come, through insight, perseverance, hard work and a bit of luck.
Which gets us to the point of this blog - telling the technical story behind the development of MapNotes - MapBuzz's first Facebook application. MapNotes is a simple application - it makes it easy to put PostIt notes on a map and share them with friends.
Our naive thinking behind MapNotes was to quickly develop a Facebook application to get familiar with the platform, and then roll out a series of applications quickly thereafter. We soon discovered that a number of technical gremlins lay behind MapNotes' apparent simplicity. Above all was our desire to deeply integrate MapBuzz with Facebook, so that what users do on MapBuzz is available on Facebook and what users do on Facebook is available on MapBuzz.
Although there is plenty of information on the Web about developing one-off Facebook applications, precious little exists describing the challenges of integrating a destination website with Facebook and how to support multiple Facebook applications.
To share what we've learned, I'm kicking off a new series of posts over the coming weeks that will dive deep into the technical details of developing for Facebook. I'll link to the articles from this page, so consider it both an introduction and Table of Contents rolled into one. Some things I'll cover are:
- Login and installation
- Session management
- Account and link management
- Asynchronous processing
- Notifications and newsfeeds
- Using Ajax
If you're more into usability issues and nice looking screenshots, then head over to the MapBuzz blog for more info. Enjoy!
Posted in MapBuzz, Facebook | no comments | no trackbacks
Posted by Charlie
Sun, 30 Mar 2008 19:06:00 GMT
It's a funny thing when things that you wish would happen do happen, but then you realize they aren't so great after all. Take for example the change this year in daylight savings time. Not being a morning person, I love daylight savings time - it means an extra hour of sun every day. And if it was up to me, I would abolish standard time.
But now that daylight savings time has been moved up a month, I've found it disappointing. Why? Because it is still winter in Denver. The extra hour of sunlight a day is just a tease - it's still chilly outside and it's still snowing. At least when it's dark and cold outside I don't feel like I'm missing out by being locked up inside working.
Posted in Colorado | no comments | no trackbacks
Posted by Charlie
Mon, 11 Feb 2008 18:11:00 GMT
If there were such a thing as the Ten Commandments of programming, code reuse would surely be included. Now you're probably thinking I've lost my mind, because any good developer knows that code reuse is a pipe dream. But that's because you are thinking on the macro level and not the micro level. At the micro level, code reuse has an altogether more pleasant acronym, DRY, or don't repeat yourself.
The tussle in the Rails community over components illustrated this tension well. On one extreme, advocates dreamed of creating plug-and-play components that could be reused across multiple applications. On the other extreme, detractors sneered at code reuse as a hopeless endeavor and vowed to strip out all traces of components from the Rails 2.0 release. Left out in the cold was the view that reusing code, specifically controllers, within a single application was a good thing.
Reusing Controllers
Rails is built using the Model-View-Controller pattern, which is designed to segment an application into controllers, models and views. Rails also adds in the concept of filters, which are pieces of code that run before and after controllers.
The Rails community encourages reuse of models, filters and views, but seems to actively discourage the reuse of controllers if the Ruby on Rails book is any indication:
When Rails was initially released, it came with a system for creating components. Unfortunately, the implementation of components left a lot to be desired: performance was poor, and there were unanticipated side effects. As a result, components are being phased out.
Instead, the common wisdom now is to synthesize component-like functionality using a combination of before filters and partials. Use the before filter to set up the context for the partial, and then render the fragment you want using a regular render :partial call.
Like much conventional wisdom, this advice is hogwash.
Filters + Partials != Controllers
The problem with just using filters and partials is that it only applies to a subset of web applications. A good example, and of course the one used in the Ruby on Rails book, is a shopping website. The focal point of most shopping websites is a shopping cart. The point of the application is to make it easy for users to add things to the cart, modify the cart and hopefully buy the contents of the cart. Since the cart plays such a crucial role, it often makes sense to have a filter that sets up the cart so each controller has easy access to it. A nice side effect of this approach is that views also have access to the cart, as described in the quote above.
But for many other types of applications, filters and partials can't make up for controllers. For example, take a look at the Boulder community on MapBuzz. The top-left side of the page is rendered by the community controller, the comments on the bottom left by a comment controller and the map listings on the right by a map browser controller. If you log in, then a couple of additional tabs are added to the page, each rendered by its own controller.
This type of composition is quite common in Web 2.0 applications. Take most social networking sites - they'll mix together news feeds, discussion boards, friends/friends lists, pictures, etc., in a variety of different ways depending on the current page.
The design problem is that any given controller can be called in two different contexts:
- When the whole page is rendered via a Browser page refresh
- When just the controller is rendered via an Ajax call
Trying to do this with just filters and helpers is a non-starter, because you end up with one big controller that needs to run different filters depending on the context of the call.
The better approach is to divide your controllers into logical units, and then have a separate page controller for the entire page. When the page is rendered, the page controller should delegate rendering the various sub-parts of the page (for example, tabs) to the appropriate controllers. When just one of the sub-parts of the page needs to be rerendered, due to an Ajax call, you call the appropriate controller directly.
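As a rough illustration of that split, here is a minimal sketch in plain Ruby rather than Rails. The class names (CommunityPanel, CommentsPanel, PageController) and the render_page and render_panel methods are hypothetical stand-ins, not MapBuzz code or Rails API. The only point is that the same sub-controller serves both the full page and a single Ajax refresh.

```ruby
# Plain-Ruby sketch; no Rails required. All names here are hypothetical.

# Each sub-controller knows how to render exactly one part of the page.
class CommunityPanel
  def render(params)
    "<div id=\"community\">Community #{params[:community]}</div>"
  end
end

class CommentsPanel
  def render(params)
    "<div id=\"comments\">Comments for #{params[:community]}</div>"
  end
end

# The page controller owns the layout and delegates each sub-part.
class PageController
  PANELS = { 'community' => CommunityPanel, 'comments' => CommentsPanel }

  # Full page refresh: render every panel and stitch the results together.
  def render_page(params)
    PANELS.values.map { |panel| panel.new.render(params) }.join("\n")
  end

  # Ajax call: render just the one panel that needs refreshing.
  def render_panel(name, params)
    PANELS.fetch(name).new.render(params)
  end
end

page = PageController.new
full_page = page.render_page(:community => 'Boulder')
ajax_part = page.render_panel('comments', :community => 'Boulder')
puts full_page
puts ajax_part
```

In Rails itself the delegation step is what render_component does; the sketch only shows the shape of the design, not how Rails dispatches the call.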
Performance
In Rails, a controller or view can call another controller using the much maligned render_component method. Part of the problem is the method is misnamed. It no longer has anything to do with rendering components - instead it's used to invoke another controller. Therefore, it would be more appropriately named render_controller, call_controller, invoke_controller, etc.
Assuming you agree with me so far, reading the Rails documentation for render_component will certainly give you pause:
Components should be used with care. They're significantly slower than simply splitting reusable parts into partials and conceptually more complicated. Don't use components as a way of separating concerns inside a single application. Instead, reserve components to those rare cases where you truly have reusable view and controller elements that can be employed across many applications at once.
So to repeat: Components are a special-purpose approach that can often be replaced with better use of partials and filters.
Undoubtedly this was true once upon a time. Is it still? There is one way to find out - run a test. I created a new Rails application using the built-in generators and then added the following simple code:
controller/main_controller.rb
class MainController < ApplicationController
  def get_without_controller
    a = 1
  end
  def get_with_controller
    a = 1
  end
end
controller/sidebar_controller.rb
class SidebarController < ApplicationController
  def get
    render(:partial => 'sidebar/content')
  end
end
views/main/get_without_controller.html.erb
<p>Some fun content goes here</p>
<div class="sidebar">
  <%= render(:partial => 'sidebar/content') %>
</div>
views/main/get_with_controller.html.erb
<p>Some fun content goes here</p>
<div class="sidebar">
  <%= render_component(:controller => SidebarController,
                       :action => 'get') %>
</div>
views/sidebar/_content.html.erb
<p>Hi there</p>
There are two paths through this application:
- GET '/main/get_without_controller.rb'
- GET '/main/get_with_controller.rb'
In case it's not obvious, get_without_controller uses render(:partial) to include the sidebar content while get_with_controller uses render_component. Using both benchmark and ruby-prof, I ran each method 100 times using a souped-up integration test (more about that in a future post). The results, using Rails 2.0.2 on Ruby 1.8.4 on Windows XP on a Pentium M laptop (about 3 years old), are:
| Method | 100 Requests (s) | 1 Request (s) |
| get_without_controller | 0.30 | 0.0030 |
| get_with_controller | 0.45 | 0.0045 |
So using components is 50% slower, but the overhead is a minuscule 0.0015 seconds per request. That overhead is obviously lost in a real application. Of course you have to be careful when using render_component not to try to do too much per HTTP request - but the same is true using filters and partials.
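The same kind of comparison can be reproduced in miniature with Ruby's standard Benchmark module. This sketch is not the integration test used above: the two lambdas are hypothetical stand-ins for the two request paths, and time_requests is a made-up helper. The last few lines simply redo the arithmetic from the table.

```ruby
require 'benchmark'

# Hypothetical stand-ins for the two request paths.
direct  = lambda { "<p>Hi there</p>" }
wrapped = lambda { direct.call.dup }   # one extra layer of indirection

# Time n calls of a callable and return the total elapsed seconds.
def time_requests(callable, n = 100)
  Benchmark.realtime { n.times { callable.call } }
end

puts "direct:  %.6f s" % time_requests(direct)
puts "wrapped: %.6f s" % time_requests(wrapped)

# The arithmetic behind the table, using the measured totals.
without_total = 0.30   # seconds for 100 requests
with_total    = 0.45   # seconds for 100 requests
overhead_per_request = (with_total - without_total) / 100
slowdown_percent     = (with_total / without_total - 1) * 100
puts "overhead per request: %.4f s" % overhead_per_request
puts "slowdown: %.0f%%" % slowdown_percent
```

Timings from a toy like this vary from run to run and machine to machine, so treat them as a way to check relative cost, not as numbers to quote.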
DRY Up Your Controllers
In truth, render_component is the most primitive way imaginable of reusing controllers. But it does let you DRY up your Rails application by letting you create more cohesive controllers that can be reused within a single website. For most websites you won't need this functionality, but when you do, there isn't a substitute for it, and don't let anyone browbeat you into thinking there is.
Posted in Design, Rails | 13 comments | 2 trackbacks
Posted by Charlie
Fri, 08 Feb 2008 18:16:00 GMT
Update - It turns out that Rails does cache column data dictionary queries (which is what you would expect), but not for :has_and_belongs_to_many associations (HABTMA). I know those are "old fashioned," but they fit our data model perfectly in a couple of places. So be warned - using just a couple of HABTMA associations will generate a huge number of data dictionary queries.
As part of monitoring the performance of MapBuzz, we run a nifty little program called PgFouine to analyze the postgresql log files every night. PgFouine summarizes the most common queries and slowest queries. Here is our data from yesterday:
Most frequent queries (N)
| Rank | Times executed | Total duration | Av. duration (s) | Query |
| 1 | 140,115 | 2m42s | 0.00 | SELECT a.attname, format_type(a.atttypid, a.atttypmod), d.adsrc, a.attnotnull FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = ''::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum; |
Besides being the most run query, this was also the seventh most time consuming query.
I was shocked the first time I saw this many months ago - my first guess was that we were not caching classes in our production mode. But it turned out that wasn't the problem. It's just Rails being silly - every time it loops over a model's columns it strikes up a conversation with the database. That happens a fair bit - when you use :include to add additional tables to a query, when you use dynamic finders (find_by_x), when you use relationships set up by :has_many, :has_and_belongs_to_many, etc. And this isn't the only place Rails is wasteful - it constantly queries the database for table names and indices - it just happens those queries don't run nearly as often.
Rails Plugin
Clearly there is no reason to do this in a production environment, and in truth, I don't see much reason to do it in a development environment either. So yesterday I finally got around to patching Rails and submitting a bug report. The patch caches data dictionary queries for the PostgreSQL adapter. After loading the patch, Rails still supports the ability to add tables to your database at runtime, but no longer supports adding or removing columns from a table at runtime or recycling table names. If these things are important to you, the patch also provides a flush_dd_cache method that flushes the query cache. An obvious alternative solution is to add a cache_dd_info class variable to ActiveRecord::Base, which would be off by default but on in production. However, I'm skeptical there is a need for such a flag.
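The idea behind the patch can be sketched in a few lines of plain Ruby. This is not the patch itself: FakeConnection and CachingAdapter are hypothetical stand-ins for the database connection and the adapter, and only the flush_dd_cache name is taken from the description above.

```ruby
# Hypothetical stand-in for a database connection that counts round trips.
class FakeConnection
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  # Pretend to run the pg_attribute query for a table.
  def fetch_columns(table_name)
    @query_count += 1
    ["id", "name"].map { |column| "#{table_name}.#{column}" }
  end
end

# Hypothetical adapter that memoizes column lookups per table name.
class CachingAdapter
  def initialize(connection)
    @connection = connection
    @columns_cache = {}
  end

  # Hit the database only the first time a table's columns are requested.
  def columns(table_name)
    @columns_cache[table_name] ||= @connection.fetch_columns(table_name)
  end

  # Throw the cache away, e.g. after adding or removing a column at runtime.
  def flush_dd_cache
    @columns_cache.clear
  end
end

connection = FakeConnection.new
adapter = CachingAdapter.new(connection)
1000.times { adapter.columns("maps") }
queries_before_flush = connection.query_count
adapter.flush_dd_cache
adapter.columns("maps")
queries_after_flush = connection.query_count
puts "queries before flush: #{queries_before_flush}, after: #{queries_after_flush}"
```

The trade-off is the one described above: once the columns are cached, schema changes made at runtime are invisible until the cache is flushed.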
On a per request basis, you won't see much performance gain from the patch in your Rails application servers, but it will remove needless load from your database. And while you are waiting for Rails to be patched (if it is patched), feel free to download the Rails plugin we are using to solve the problem. Note the plugin also fixes two other ActiveRecord bugs, which are its incorrect handling of Postgresql schemas and its ignoring of views.
Posted in Rails | 5 comments | no trackbacks
Posted by Charlie
Thu, 07 Feb 2008 21:07:00 GMT
Today I was fixing up some inefficiencies in Rails' PostgreSQL adapter (more about that in a later post), and came across this strange looking code:
def tables(name = nil)
  schemas = schema_search_path.split(/,/).map { |p| quote(p) }.join(',')
  query(<<-SQL, name).map { |row| row[0] }
    SELECT tablename
    FROM pg_tables
    WHERE schemaname IN (#{schemas})
  SQL
end
The code is passing a here document, which is basically a long string, as a parameter to the query method. The here document is delimited by the string "SQL".
What took me by surprise is the here document is defined after the method call, which is a code construct I've never seen before. Making sure my memory wasn't failing me, I double checked my copy of The Pick Axe book and verified it never mentions this feature.
Hats off to Jamis, who added this neat little trick to Rails in revision 2317, way back in September 2005, and to Anthony who blogged about it last month. And finally, the Ruby Wikibook has a great tutorial about here documents. Who knew that you can have multiple here document parameters per method call?
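To see the multiple-parameter case in action, here is a small sketch. The combine method is a hypothetical helper invented for the example; the here document syntax itself is standard Ruby.

```ruby
# Hypothetical helper that just joins its two string arguments.
def combine(first, second)
  first + second
end

# Both here documents are opened on the call line. Their bodies follow
# in the same order, each closed by its own terminator.
result = combine(<<-FIRST, <<-SECOND)
  SELECT tablename
FIRST
  FROM pg_tables
SECOND

puts result
```

Each body keeps its own leading whitespace and trailing newline, so the two strings arrive at the method as separate, complete lines.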
Posted in Ruby | 3 comments | no trackbacks
Posted by Charlie
Sun, 03 Feb 2008 22:57:00 GMT
Shugo and I are happy to announce the release of ruby-prof 0.6.0. If you haven't used ruby-prof, it's a superfast, open-source profiler for Ruby that shows you where your program is slow.
The big news is that this release, thanks to Shugo, supports Ruby 1.9. And there is talk about merging ruby-prof into Ruby itself, but we'll have to wait and see if that really happens.
This release also includes experimental support for memory profiling, added by Alexander Dymo, which I'll talk about in more detail below. And of course it includes a number of bug fixes, almost all of which were reported and fixed by the community. Special thanks goes out to Sylvain Joyeux, Michael Granger, Makoto Kuwata and Dan Fitch.
The best way to get started with ruby-prof is to look at the readme file which is online at RubyForge.
Memory Profiling
The 0.5.0 release introduced the concept of measurement modes, which offer a way of extending ruby-prof to track all sorts of metrics about a running program, including time, object allocations, memory usage, etc. Developers can implement custom measurement modes by simply implementing a measurement function that tracks the metric of interest, and ruby-prof takes care of all the rest.
The idea was inspired by Sylvain Joyeux, who used it in the 0.5.0 release to track object allocations in a running program. In this release, Alexander Dymo added support for tracking memory allocations as I discussed in my last post.
The only disadvantage of these two measurement modes is that they require patched Ruby interpreters, making them unavailable to most Ruby developers. Tracking memory usage is particularly important, and there has been some great work done by Evan Weaver with BleakHouse, Eric Hodel with mem_inspect and Alexander Dymo. I'm not partial to any of these approaches, outside the fact they should be implemented in C for performance reasons. But I would like to see a single solution which could be merged into Ruby itself, opening it up to all Ruby developers.
I've asked Alexander to start up a discussion on ruby-core, so we'll see where it leads.
no comments | no trackbacks
Posted by Charlie
Sat, 02 Feb 2008 22:58:00 GMT
A couple of days ago, Alex Dymo from Pluron sent me an email describing some of the great work he has done optimizing the performance of their online project management software Acunote. His great insight was that their performance problems were caused by allocating too much memory, thus forcing Ruby's garbage collector to run frequently, ruining performance.
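To get a feel for how quickly allocations pile up, here is a small sketch. It does not use Alex's patched interpreter or ruby-prof; it assumes a stock Ruby 2.2 or later, where GC.stat(:total_allocated_objects) is available, and the allocations helper is a hypothetical name made up for the example.

```ruby
# Count how many objects a block allocates, using GC.stat (Ruby 2.2+).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

piece = "x"

# += builds a brand-new String on every iteration.
concat_allocs = allocations do
  s = String.new
  1000.times { s += piece }
end

# << appends to the existing String in place.
append_allocs = allocations do
  s = String.new
  1000.times { s << piece }
end

puts "+= allocated #{concat_allocs} objects"
puts "<< allocated #{append_allocs} objects"
```

Every one of those short-lived strings is work for the garbage collector later, which is the effect described above.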
Using a patched version of ruby and ruby-prof, Alex was able to more than double performance (with hints of more to come) and reduced memory consumption by 75%, or 750MB (yes - that is Megabytes). Alex does a wonderful job of documenting his approach with a series of blog posts here and here.
The main culprit was Rails' handling of attributes, which is dreadfully designed (an obvious case of the simplest solution to a problem being the wrong solution - something I've been meaning to blog about for almost a year now). But he also implicated Ruby's built-in benchmarking module.
Even better, Alex provided patches to ruby core (already accepted), Rails (already accepted) and ruby-prof. We'll also gladly accept his patches, and since it's about time for a ruby-prof refresh, we'll spin out a new release as soon as we can. More to follow.
Posted in ruby-prof | 5 comments | no trackbacks
Posted by Charlie
Fri, 01 Feb 2008 08:22:00 GMT
Maybe it's just me, but what I want from a JavaScript library seems to be diverging from what Prototype provides. What I want, in order of importance, is:
- A cross-browser API that hides some of the major differences between Internet Explorer and standards compliant browsers
- Unobtrusive
- As small as possible, and if that's not possible, then at least modular
- A selector API
Where Prototype really falls down is on points two and three - it's very obtrusive, it's getting larger by the day and it isn't modular.
Accepting Something for What It Is
Prototype's greatest sin is its disdain for JavaScript. You can see this disdain shine through in a number of ways.
First, Prototype originated as part of Rails, which provides helpers that use Ruby code to generate JavaScript. If programs could talk, Rails would be saying "Let me take care of this for you since you certainly don't want to dirty your hands with JavaScript."
Second, Prototype wastes over 200 lines of code (about 5%) duplicating Ruby's Enumerable API in JavaScript, for no obvious reason except that the developers prefer Ruby's way of doing things. The problem is that Ruby's Enumerable API is based on one of the core features of Ruby - its elegant use of anonymous functions (called blocks) to apply snippets of code to a sequence of items. JavaScript has first-class anonymous functions, but it doesn't have the language support for using them as iterators. As a result, Prototype's JavaScript code doesn't look natural because it is working outside the design strengths of JavaScript. And more importantly, it forces Prototype into using exceptions as an iteration signaling method, which is a nasty hack.
For example, let's look at the any method. In Ruby, any? returns true if an item in a list matches some criteria. Thus to find if any number in an array is odd you would write this:
[2, 4, 6, 8, 11].any? do |value|
  value.odd? # odd? comes from Rails here; Ruby 1.8.7 and later build it in
end
In my view, porting any to JavaScript is of dubious value at best. But let's look at the contortions that Prototype has to go through to do it:
any: function(iterator, context) {
  iterator = iterator ? iterator.bind(context) : Prototype.K;
  var result = false;
  this.each(function(value, index) {
    if (result = !!iterator(value, index))
      throw $break;
  });
  return result;
}
each: function(iterator, context) {
  var index = 0;
  iterator = iterator.bind(context);
  try {
    this._each(function(value) {
      iterator(value, index++);
    });
  } catch (e) {
    if (e != $break) throw e;
  }
  return this;
}
_each: function(iterator) {
  for (var i = 0, length = this.length; i < length; i++)
    iterator(this[i]);
}
The any method calls each, which calls _each, which then calls your method. And since JavaScript doesn't support returning values from an anonymous function used as an iterator (there is no yield keyword like in Ruby), the any method is forced to throw an exception (see $break) to signal that an element has been found. That might seem like a small offense until you are trying to debug JavaScript code using Venkman and keep getting interrupted by meaningless exceptions (which happens if you've asked Venkman to stop at all errors and exceptions).
More examples of trying to make JavaScript more like Ruby abound:
- The addition of a Class object that introduces an initialize function, instead of just accepting JavaScript's combined constructor/initializer idiom
- A number of useless additions to the String class (methods like succ, times, etc) - 100 plus lines of code
- A number of useless additions to the Array class (methods like succ, times, etc) - a bit less than 100 lines of code
The end result is that over 10% of Prototype is wasted trying to add Ruby-like features to JavaScript that don't fit well, simply because the Prototype designers prefer Ruby's idioms over JavaScript's idioms. The obvious problem is that Prototype is a JavaScript library, not a Ruby library.
Lay Off My Prototypes
Prototype also fails miserably on the unobtrusive test. In its first version, Prototype added methods to JavaScript's Object prototype - which is a big no-no. Not learning from its past mistakes, the latest version of Prototype has this gem:
(function() {
  var element = this.Element;
  this.Element = function(tagName, attributes) {
    attributes = attributes || { };
    tagName = tagName.toLowerCase();
    var cache = Element.cache;
    if (Prototype.Browser.IE && attributes.name) {
      tagName = '<' + tagName + ' name="' + attributes.name + '">';
      delete attributes.name;
      return Element.writeAttribute(document.createElement(tagName), attributes);
    }
    if (!cache[tagName]) cache[tagName] = Element.extend(document.createElement(tagName));
    return Element.writeAttribute(cache[tagName].cloneNode(false), attributes);
  };
  Object.extend(this.Element, element || { });
}).call(window);
Take a good, long look at this method. It replaces a browser's built in Element object, which is used to represent elements in a DOM tree, with an Element function. Replacing a core browser object is nuts. Especially for the ridiculously small payoff. Instead of writing this:
var element = document.createElement('foo')
element.id = 7
This change lets you write this:
var element = new Element('foo', {id: 7})
And how many times does Prototype use this function? A measly 4 times! And to add insult to injury, the code as written is broken because it breaks the prototype chain. The last line in the function should be:
this.Element.prototype = element.prototype
Without this line, any custom extensions you've made to the Element object are lost. Trust me, it took a good long time to debug why our code no longer worked.
Time for a Diet
Finally, Prototype is getting bigger with every release. Version 1.5 weighs in at 3,396 lines of code while version 1.6 is 4,307 lines, a 27% increase. I'm sure the additional code is useful, but I'm also sure there are great swaths of Prototype that I don't need. Unfortunately, Prototype doesn't provide a mechanism to package up only the parts of it you want. When the library was smaller, that was a reasonable decision. But as Prototype continues to grow, there will come a point where its benefits are outweighed by its weight (and for me I've passed that point).
So What Next
The last few years have been JavaScript's golden years, marked by an amazing outpouring of experimentation and creativity that has led to a number of great JavaScript libraries. A huge benefit of this work is revealing the pain points, beyond cross-browser compatibility issues, of working with JavaScript. These issues include the lack of a Selector API, better iterators, better chaining of DOM methods, wordy method names (getElementById), etc.
Of course each library takes its own approach to solving these problems, and with that comes a downside - lock-in. For large JavaScript projects, switching between libraries is a boring, tedious, time-consuming undertaking. That is the reason we've remained with Prototype for as long as we have, and why we will continue to do so for a bit longer while we plan our migration to a new library.
Posted in JavaScript | 15 comments | 1 trackback
Posted by Charlie
Tue, 29 Jan 2008 18:04:00 GMT
Time for a rant.
A few years ago I decided it was finally time to learn Linux after having used DOS, Windows, and the Mac OS for years. My plan of attack was to run my own domain - savagexi.com, complete with a website, blogs, mail server, DNS server and DHCP server. And if I ever found the time, MythTV.
Back then Fedora seemed like the best choice, and every year or so I upgrade the servers in the basement to the latest version. Upgrading Fedora always sucks, but my experience over the weekend warrants a big, resounding F.
When working on my own machines, I tend to go beyond flying by the seat-of-my-pants to wanton recklessness. There's nothing quite like a nasty error message (disk failure, missing partition, broken boot loader, misconfigured X server etc.) to focus the mind and learn how things really work. Over the years, my reckless attitude has cost me only once, when a disk drive that was part of a Logical Volume gave up its soul when it screeched to a dreadful halt. And even then, I almost managed to rescue the data I needed off the remaining disk, finding out five minutes too late what I should have done instead of what I did. Since then, I've eschewed LVM and gone with nice, simple RAID 1 arrays (which means having 2 disks that mirror each other so if one breaks you can get your data from the other one) to at least provide a modicum of redundancy.
The impetus for upgrading this time around was spam. I've always heard how wonderful greylisting is, and after one too many emails about navigating the love canal with confidence, it was time to take action. But of course I ran into a roadblock - setting up greylisting on Fedora 6 using a program called PostGrey didn't work because it conflicted with SELinux (see, I'm a glutton for punishment, using SELinux on a home network). Of course that took some doing to figure out, since Fedora 6 doesn't bother to actually log a message about the problem. So after reading the Fedora 8 release notes about how PostGrey and SELinux are best of buddies, I decided it was worth the pain to upgrade the email server.
From past experience, I was under no illusion it would be easy. But little did I suspect just how dreadful it would be. I decided to do the upgrade using a network install since I don't have a DVD burner (yeah, yeah), which means the bytes are downloaded on demand across the Internet. It actually works pretty well if you pick a fast mirror, such as Facebook. But when things go wrong you have to stop the installation, reboot the machine, Google around a bit, fix whatever the problem is, start the installation over and redownload the bytes. Remember the stop-reboot-fix-install sequence; I must have done it twenty times.
Day 1
Attempt #1. Things got off to a rousing start with Anaconda, the Fedora installer, complaining that the disk partitions on the two drives in the machine had to be labeled. Of course Anaconda should have just fixed the problem itself, but no, it is a remarkably unhelpful program.
Attempt #2. So stop-reboot-google around-fix the problem - and try again. This time Anaconda bitched about not finding any valid partitions, or in English, it couldn't read the 2 hard-drives on the machine and thus couldn't update them. Since I had just rebooted the machine, it stretched the imagination that Anaconda could be so dumb. But either way, back to the stop-reboot-fix-reboot-start cycle. Except this time there was no fix, since the machine booted just fine.
Attempt #3. Try again. This time I gave in to Anaconda when it offered the choice of wiping the drives clean, and hit the next button. I then quickly decided that was a bad move, and hit the back button. No luck. Although the installation hadn't started yet (I was on a screen that was asking me some question I don't remember), when I rebooted the machine I was greeted by the message GRUB. Mind you, not a grub prompt, just four capital letters that spelled GRUB. Ugh.
So it was now time to dig out the Fedora 6 rescue disk and run it. It couldn't find any partitions either, and dumped me at a command prompt. From there I could run the ever exciting program fdisk, which lets you manage the partitions on your disk. fdisk is a nice, easy to use program, but it's living on the edge - one false move and you can easily delete your data. From fdisk I noted that the machine had two hard-drives, the first was 80GB and the second 60GB. I also saw that the first drive (80GB) no longer had a partition table thanks to Anaconda. Working backwards, I recreated its partitions. That was easy to do, since the two drives are part of a RAID array and thus I assumed the first partition on the first disk should be 60GB.
Attempt #4. Reboot and .... get greeted by the ever-friendly GRUB message again.
Attempt #5. Reboot, but this time I hit the F12 key to open the Boot menu. I then noticed that the last choice in the boot menu was to start a utility disk, which miraculously opened to a grub prompt (it wasn't until the next day I figured out how to run grub from the rescue disk, although I suspected it was possible). Of course I don't know diddly about Grub, so it took another 30 minutes of Googling to figure out how to fix the problem (basically reinstall grub on the drive).
Attempt #6. This time, Anaconda had the decency to recognize my partitions and even offered me a chance to upgrade them. Hooray. Pushing my luck, I hit the next button, and watched Anaconda check the dependencies for all installed packages. 5%, 10%, 15%, 25%, 26%...and then nothing. Of course.
More Googling, and finally enlightenment. Turns out I was hardly the first to run into this show-stopper bug. If that wasn't bad enough, the bug was still open 2 months after it was reported, and none of the mirrors had been updated (there has been a respin of the Fedora 8 CDs, but it's hardly useful if I can't get to it). So I read through the whole thread, and in one of the comments a Fedora developer had posted a link to an "update image" on his website. After a bit of research, I figured out what an update image is and how to use it.
Attempt #7. If at first you don't succeed, try, try again. This time Anaconda got past the dependency checker, and amazingly enough finished. Success was near at hand. NOT.
Attempt #8. Reboot the machine and watch in horror as the dreaded GRUB message rears its ugly head.
So back to the rescue disk - which of course can't mount any partitions (wtf?) and spits me out to a Linux prompt. Back to fdisk. And once again enlightenment - Disk #1 had once again lost its partition table. Fix it. Boy this is getting tedious.
Attempt #9. Surely things are fixed by now. Reboot. And then watch in amazement as the computer tries to load Fedora Core 6, spits out pages and pages of errors, and unceremoniously dumps me to a login prompt. Of course the login prompt doesn't work. WTF?
Ah - my favorite pastime, loading the rescue disk. Try fdisk again, everything looks ok. So the next obvious suspect is the RAID array, which must be broken somehow. Go read about mdadm, which is the Linux program for creating and managing software RAID arrays. Using the wonders of Google, I found a very helpful article that explains how to rescue your RAID array. Following the instructions, I remount the array and discover that only Disk #2 is available. And then it dawns on me - somehow Anaconda only updated Disk #2, thus leaving Disk #1 with Fedora 6 in a very broken state. So a bit more Googling, and I learn how to add Disk #1 back into the array. And then nothing. Hmm. More Googling - how exactly do you know what a RAID array is doing?
That didn't take long, and I stare in wonderment as something actually goes right - mdadm is happily resyncing Disk #1 with Disk #2 and says it will be done in a bit over an hour. At this point it's 3:30 am, so I call it a day.
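For anyone asking the same question: the post doesn't record which command supplied the answer, but on Linux software RAID the usual place to look is the /proc/mdstat file, which shows a progress line while a mirror rebuilds. Below is a minimal Python sketch that pulls the percentage and estimated time out of that text. The sample content, device names and numbers are made up for illustration, and the pattern only assumes the common "recovery = N% ... finish=Nmin" layout.

```python
import re
from typing import Optional

# Sample text in the shape /proc/mdstat typically has while a mirror rebuilds.
# Device names and numbers are invented for this example.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      58613056 blocks [2/1] [_U]
      [=>...................]  recovery =  8.5% (4990720/58613056) finish=64.2min speed=13905K/sec

unused devices: <none>
"""

# Matches the progress line for either a recovery or a resync.
PROGRESS = re.compile(
    r"(?P<action>recovery|resync)\s*=\s*(?P<percent>[\d.]+)%"
    r".*?finish=(?P<finish>[\d.]+)min"
)


def rebuild_progress(mdstat_text: str) -> Optional[dict]:
    """Return the first rebuild/resync progress entry, or None if nothing is rebuilding."""
    match = PROGRESS.search(mdstat_text)
    if match is None:
        return None
    return {
        "action": match.group("action"),
        "percent": float(match.group("percent")),
        "minutes_left": float(match.group("finish")),
    }


if __name__ == "__main__":
    # On a real machine, read the live file instead of the sample:
    #   with open("/proc/mdstat") as f:
    #       text = f.read()
    print(rebuild_progress(SAMPLE_MDSTAT))
```

If the function returns None, no rebuild is in progress, which is either very good news or very bad news depending on whether the array shows both disks.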
Day 2
Attempt #10. After a good night's sleep, it was time for more fun. The RAID array had successfully fixed itself overnight, so crossing my fingers I rebooted the machine. My heart sank when I was greeted with lines and lines of warnings about disk overflow errors. But wait, those were for the extra partition on Disk #1 (remember only the first 60GB are used in the RAID array, leaving 20GB free). Once the cruft had cleared, the machine managed to boot all the way to the Fedora 8 welcome screen. Hallelujah! Of course a fair bit was broken, including the DNS server, which meant at least a few hours in BIND hell (BIND and I simply don't get along). But first things first.
However, I was worried about the disk overflow errors. For some reason, the kernel thought the 20GB partition was smaller than it really was. A bit of Googling turned up a couple of potential causes and solutions, but none worked. So back to fdisk. I figured the best course of action was to just delete the 2nd partition and recreate it.
Attempt #11. After recreating the problem partition, it was time to reboot the machine. And of course back to my old friend GRUB. I have no idea how I ended up back there, but clearly old flings die slowly. But at this point I was an old hand at moving on, and rescue disk in hand, it was time to work some magic at the grub prompt. And to be on the safe side, I Googled around a bit more to see if somehow I had mistakenly configured GRUB with RAID and could kick this habit once and for all. Fortunately, I turned up this gem of an article and promptly changed things around based on its recommendations.
Attempt #12. And finally, one day later, a clean boot to Fedora 8 (minus of course BIND being unhappy).
Denouement
It beats me how any normal person manages to maintain their own Linux system - I only succeed through sheer determination and stubbornness. I realize that Fedora recommends a clean install with each new version, but to do that without losing your personal data and system configuration takes knowledge and effort beyond almost anyone who lives on this planet, including myself. So overall - I give Fedora an F for its horribly broken upgrade program.
And of course the kicker - PostGrey still doesn't work with SELinux on Fedora 8. But at least in FC8 it's polite enough to actually log an error. So anyone for creating and compiling their own policy files? Ah, I feel another rant coming along about SELinux.
Posted in Tools | 18 comments | no trackbacks
Posted by Charlie
Tue, 29 Jan 2008 03:06:00 GMT
For MapBuzz, we use a popular open source project called Trac for managing our bugs, feature requests, release schedules, etc. As long as you don't have complex requirements, Trac is pretty good - it's a lot more pleasant to use than expensive commercial products such as Rational ClearQuest.
Unlike ClearQuest, Trac is designed to live on the Web. But living on the Web can be dangerous - in recent months our database was getting overwhelmed by spam. Cleaning it out was becoming a tedious, daily chore.
After trying a variety of countermeasures over a period of a few months, I finally gave up and handed it over to Anders (and do take a look at the very cool URI he has). It took him about one minute to diagnose the problem - spammers weren't coming in through the front door, they were coming in through the back door. I had assumed that spammers were using Trac's web interface to further their nefarious causes, but instead they were using our automated email ticket submission system. The way that works is when an error is generated, either on a MapBuzz client or server, an email with all the relevant information is sent to trac@mapbuzz.com. Bugs submitted that way are easy to spot - we use the imaginative names "MapBuzz Client Error" or "MapBuzz Server Error" for them.
The solution was obvious - only let computers from within the mapbuzz domain email tickets. But figuring out how to do it was another thing. The problem with not having a full-time admin is that there is always a huge startup cost in fixing IT problems as you rack your brain trying to remember how some complex piece of software works. In this case it was Postfix, and after an hour of rummaging through the manuals, we finally discovered the right incantation. Undoubtedly there are other ways to do this, and probably better ways, but we added the following line to the file roleaccount_exceptions:
# Only allow sending to trac from local domain
trac@mapbuzz.com permit_mynetworks,reject
Or in English, only machines on MapBuzz's own networks (the ones Postfix knows about through its mynetworks setting) can send tickets to Trac; everyone else is rejected. And voilà - no more spam!
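To make the rule's logic concrete, here is a small Python model of how that one table entry is evaluated. It is not Postfix code, just a sketch of the decision: the restrictions are tried left to right, permit_mynetworks says yes to clients inside the trusted networks, and reject catches whatever is left. The network ranges in MYNETWORKS are hypothetical placeholders (the real values aren't given in this post), and the function name decide is invented for the example. A table like this normally only takes effect once main.cf points at it, typically with a check_recipient_access entry, and after the lookup file has been rebuilt with postmap.

```python
import ipaddress

# Hypothetical stand-in for Postfix's mynetworks; the real values are not in the post.
MYNETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
]

# Mirrors the single entry added to roleaccount_exceptions.
ACCESS_TABLE = {"trac@mapbuzz.com": ["permit_mynetworks", "reject"]}


def decide(client_ip: str, recipient: str) -> str:
    """Return 'permit', 'reject', or 'dunno' (no opinion; later rules decide)."""
    restrictions = ACCESS_TABLE.get(recipient.lower())
    if restrictions is None:
        # Recipient is not listed, so this table has nothing to say.
        return "dunno"
    address = ipaddress.ip_address(client_ip)
    # Restrictions are evaluated left to right; the first one that matches wins.
    for restriction in restrictions:
        if restriction == "permit_mynetworks":
            if any(address in network for network in MYNETWORKS):
                return "permit"
        elif restriction == "reject":
            return "reject"
    return "dunno"


if __name__ == "__main__":
    print(decide("192.0.2.10", "trac@mapbuzz.com"))
    print(decide("203.0.113.5", "trac@mapbuzz.com"))
    print(decide("203.0.113.5", "someone@mapbuzz.com"))
```

Note that the rule keys on where the connection comes from, not on what the sender claims to be, which is why it holds up against spammers who forge a mapbuzz.com From address.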
Posted in Tools | 1 comment | no trackbacks