Rails – Simple Asynchronous Processing

Sooner or later, most large websites have to bite the bullet and implement some form of asynchronous processing to deal with long-running tasks. For example, with MapBuzz we have several long-running tasks:

  • Importing data
  • Batch geocoding
  • Emailing event notifications to users

If you’re developing a Facebook application, moving long-running tasks to a background process or thread is critical since Facebook times out requests to your server within ten to twelve seconds.

So Many Choices

Having decided you need asynchronous processing, the next question is how to do it. And this is where things get complicated – there are a myriad of approaches, each applicable to certain problem domains. Let’s look at some possibilities:

  • Spawn/Fork – Create processes on demand to perform background tasks
  • Distributed Objects – Use a distributed object protocol (RMI, CORBA, DCOM, DRb, etc.) to communicate with another process that performs background tasks
  • Job Queue – Persist tasks in shared files or databases and execute them using background processes
  • Message Processing – Send messages to another process via a message bus

In the Ruby world, there are a number of implementations of each approach – a few examples are the Spawn plugin (spawn/fork), BackgrounDRb (distributed objects over DRb), Bj/BackgroundJob (job queue), and ActiveMessaging (message processing).

Not surprisingly, most of these solutions are designed to work with Rails, since there’s no need to speed up processing if it’s just another machine on the other end instead of an impatient human.

Selecting the best one for your application depends entirely on your use cases. Having said that, it’s still possible to reach some broad conclusions. Spawning or forking processes makes it impossible to offload processing to additional machines, so you’ll quickly run into scalability limits. Distributed objects solve that problem, but experience has shown that distributed object protocols are very brittle because they bind clients and servers so tightly together – thus I would never use them. Job queues are more reliable because tasks are represented in a standard format (usually text based, such as XML) that is persisted to files or database tables. Message queues are similar, but add significantly more functionality such as message routing, transformation, prioritization, etc.

For many websites, a job queue is the best solution. Job queues are relatively lightweight and let you distribute processing across multiple machines. However, the Ruby-based solutions listed above require installing and managing additional software, as well as writing the job-processing code itself. They also make it harder to develop and test software, since you now have to debug multiple processes at once.

A Simple HTTP Based Solution

So what’s a simpler solution? Reuse what you already have. Most Rails applications are deployed as multiple instances, distributed across one or more machines, each embedding an HTTP server (Mongrel, Thin, Ebb) to handle requests. Thus we already have our background processes and an easy way to communicate with them – HTTP (of course!). And if you’re using Mongrel or a proxy server (Pound, Lighttpd, Nginx, Apache, etc.), then you also get a built-in request queue.

In other words:

simple background queue = HTTP + Load Balancer + Rails instances 

Besides simplicity, a big advantage of this approach is that background tasks run within the Rails environment, giving you access to ActiveRecord, your models, etc.

Worker Plugin

Enter a new Rails plugin called worker (yeah, the name leaves something to be desired). Let’s look at an example:

class ImportController < ApplicationController
  # Add support for using workers
  include Worker

  # Incoming user requests are handled by this resource
  resource :geodata do
    def post
      read_file(params)
    end
  end

  # This resource handles requests in a worker process
  resource :process do
    def post
      # Long-running import work goes here (omitted for brevity)
    end
  end

  private

  def read_file(params)
    # file_name and @map are set by validation code omitted for brevity
    worker_params = {:file_name => file_name,
                     :tags => params['tags'],
                     :controller => 'import',
                     :resource => 'process',
                     :map_id => @map.id}

    # Create worker request
    create_worker.post(worker_params)
  end
end

So how does this work? A user POSTs a file to http://myserver/import/geodata. That action does various checks (deleted for brevity) and then sends a request to http://myserver/import/process, which runs in a separate Rails instance. Although this controller delegates back to itself (in a separate process), it could call any controller it wishes.
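
The example above leaves the body of the :process resource empty. A hypothetical sketch of what might go there, using the parameters sent by read_file, could look like the following – GeodataImporter and the features association are placeholders for application code, not part of the plugin:

# Hypothetical worker-side implementation – GeodataImporter and
# map.features are placeholders for application code not shown here
resource :process do
  def post
    map = Map.find(params[:map_id])
    importer = GeodataImporter.new(params[:file_name], params[:tags])
    importer.each_feature do |attributes|
      map.features.create!(attributes)   # runs inside the worker Rails instance
    end
  end
end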

The worker plugin will pass a session key, if available, to the background process. That turns out to be very useful, since it allows the foreground and background processes to share session information if you’re storing sessions in memcached or a database. That means you can use the same authentication and authorization mechanisms in the background process as you do in the foreground process.
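
For example, a Rails 2.x application that keeps sessions in memcached might configure it roughly like this (the session key and secret are placeholders):

# config/environment.rb (Rails 2.x) – illustrative only
config.action_controller.session_store = :mem_cache_store
config.action_controller.session = {
  :session_key => '_myapp_session',        # placeholder
  :secret      => 'a long, random secret'  # placeholder
}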

In addition, all worker requests are signed with an MD5 hash to verify that no one in the middle is spoofing requests.
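
The plugin’s exact signing scheme isn’t spelled out here, but the idea is straightforward. A minimal sketch, assuming a shared secret known to both the foreground and worker instances, might look like this:

require 'digest/md5'

# Minimal request-signing sketch – not the plugin's actual implementation.
# SHARED_SECRET is a hypothetical value configured on both sides.
SHARED_SECRET = 'change_me'

def sign(params)
  payload = params.sort_by { |key, value| key.to_s }.
                   map { |key, value| "#{key}=#{value}" }.join('&')
  Digest::MD5.hexdigest(payload + SHARED_SECRET)
end

def verified?(params, signature)
  sign(params) == signature
end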

Environments and Configuration

By default, Rails applications use three environments – test, development, and production. Each environment is quite different, which affects how you want to use worker processes. To deal with these differences, the worker plugin uses a strategy pattern to invoke requests.

In the test environment, there are no background Rails instances running. More importantly, you need to be able to check that worker requests complete correctly. Thus you want worker requests to happen synchronously and within the test process. This is the Worker::Controller strategy, which works much like Rails’ render_component functionality. To set this up, add the following lines to your test environment file:

config.after_initialize do
  Worker::Config.strategy = Worker::Controller
end 
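
As a quick illustration of why this is convenient, a functional test can assert on a worker request’s results immediately, because the request runs in-process. The action, parameters, and assertions below are hypothetical – exact routing depends on how the plugin exposes resources:

# Hypothetical functional test – names and parameters are placeholders
class ImportControllerTest < ActionController::TestCase
  def test_import_runs_synchronously_in_tests
    post :geodata, :file_name => 'points.csv', :tags => 'trails'
    assert_response :success
    # With Worker::Controller the worker request has already completed
    # in-process, so its side effects can be checked right away.
    assert_not_nil assigns(:map)
  end
end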

In development mode, you have one Rails instance running. In this case, you want worker requests to happen asynchronously, but routed back to that single development instance via a second HTTP request. This is the Worker::HttpAsync strategy. To set this up, add the following lines to your development environment file:

config.after_initialize do
  Worker::Config.strategy = Worker::HttpAsync
end 

Note that this assumes your development process is running on the standard port 3000.
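
If your development server runs on a different host or port, the same options hash used for production below should presumably work here as well:

config.after_initialize do
  Worker::Config.strategy = Worker::HttpAsync
  # Only needed when the development server is not on localhost:3000
  Worker::HttpAsync.options = {:host => 'localhost', :port => '3001'}
end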

Finally, in production mode, you’ll have multiple Rails instances running. To be on the safe side, some of these instances should be dedicated to fulfilling only worker requests. The easiest way to do this is to put them on an internally accessible host and port, say 8500, that outside clients cannot reach. Thus the port, and perhaps the IP address, of the user-facing Rails instances will differ from those of the worker instances. To set this up, add an additional line to your config file that globally sets the host and port of the workers. Note this assumes there is either a single worker or a pool of workers at the given host and port.

config.after_initialize do
  Worker::Config.strategy = Worker::HttpAsync 
  Worker::HttpAsync.options = {:host => 'some_other_host',
                               :port => '8500'}
end 

The Code

We’re releasing the worker plugin under the MIT license. If there is sufficient interest, we’re happy to set up a RubyForge project.

  1. Jim Cropcho
    June 17, 2008

    Thanks for getting this up; I’m definitely checking out the worker plugin.

  2. aleco
    June 17, 2008

    This sounds like a very interesting plugin. Are you planning to add scheduled tasks that are not triggered by a controller action?

    PS: Personally, I’d strongly prefer GitHub over RubyForge.

  3. Jim Cropcho
    June 17, 2008

    I noticed that the .zip doesn’t include any tests. Has any testing been done on the plugin yet? If not, I suppose writing them would be a first step, should a formal project be created.

  4. Charlie Savage
    June 17, 2008

    Aleco – GitHub is also fine if that’s what people want.

  5. Charlie Savage
    June 17, 2008

    Jim – Sorry, there aren’t any direct tests with the plugin. Having said that, we’ve been running the code in production for the last couple of months. In addition, it is tested indirectly through our unit and functional test suites (the tests verify that certain background requests get completed correctly).

    So I do feel fairly confident about the code – although I’m sure there are some lurking bugs. Adding some direct tests would be a good thing.

  6. Noel
    June 18, 2008

    The instructions for the test and development environment file modifications are the same — is this correct?

  7. Thomas Kadauke
    June 18, 2008

    Hey there,

    Here is a completely different approach to background processing, that can be used with any of the above mentioned choices:

    http://devblog.imedo.de/2008/6/18/running-ruby-blocks-in-the-background

    Instead of implementing a complete background task communication protocol, this solution builds on top of any communication protocol to execute code in a background process. Plus, it is the only solution that I know of, that has DRY error recovery support.

  8. Charlie Savage
    June 19, 2008

    Good catch, Noel – the test and development modes use different workers. Test mode uses an in-process worker (similar to render_component), while development mode uses an out-of-process worker (well, sort of – it’s the same Rails instance, but it’s done via a second HTTP request).

  9. Thomas Kadauke
    June 19, 2008

    Hey there,

    I’m not sure if my last comment made it all the way through. Anyway, here it is:

    Here is a completely different approach to background processing, that can be used with any of the above mentioned choices:

    http://devblog.imedo.de/2008/6/18/running-ruby-blocks-in-the-background

    Instead of implementing a complete background task communication protocol, this solution builds on top of any communication protocol to execute code in a background process. Plus, it is the only solution that I know of, that has DRY error recovery support.

  10. Charlie Savage
    June 19, 2008

    Hi Thomas,

    Thanks for the link. Looks like you’ve done some great work – the Imedo solution of passing blocks is quite interesting; I’ve never seen anything like that before.

    By passing blocks, it’s a Ruby-centric take on distributed objects. As you could probably guess from the article, I’m not a fan of such solutions, since I think they bind processes too tightly together and are tough to debug. Having said that, they can work well on a local network.

    As for working with any communication protocol, I think that’s over-engineering. I know it seems like a good idea, but every time I’ve tried it the extra complexity has never been worth the flexibility. Is someone really going to switch communication protocols? And of course, you still need at least one communication protocol anyway, so now a user needs to install two things.

    Error recovery is definitely a good point, since it’s non-obvious what the best way is for a background process to report an error. So I’m curious – what do you mean by DRY error recovery support?

    And as for DRY generally – one of the things I like about the solution we’re using is that everything is a Rails instance. Thus there is no need to implement a bunch of supporting code in background instances that duplicates what you already have in your foreground instances.

  11. niko
    June 20, 2008

    I really like the approach, and I got it working… basically. But I had to set

    session :cookie_only => false

    in the controller. Otherwise I’d get a SessionFixationAttempt error on the process method. I’m working with Rails 2.1.

    Now the bad thing is, with :cookie_only set to false, the :process action can’t add flash notices. I’d really like the user to be notified about the progress…

    What am I doing wrong?

    Niko.

    PS: +1 for github 😀

  12. niko
    June 20, 2008

    Please forget about the last comment… of course the flash notice isn’t added until the process action finishes.

  13. Charlie Savage
    June 20, 2008

    Hey Niko – good point that I forgot to mention. You have to tell Rails that you’re passing the session id as a URL parameter.

    As far as notifying a user about progress – you can’t really, since this is an asynchronous background process. What we do instead is send a message to the user’s “Inbox” telling them when the action has completed. You could also send an email, SMS, etc.

  14. Thomas Kadauke
    June 23, 2008

    @Charlie: Our background processing plugin does not implement any communication protocol (as ActiveMessaging does, for example), but it relies on the user to use an existing one, or implement a new one. In the case of ActiveMessaging, that is a single line of code. It’s really not that complicated, just a matter of how the code block is sent to the background process.

    The advantage of being able to switch communication protocols on the fly is that you get error recovery for free. Shipped with the plugin are fallback handlers that execute the code block in-process, write it to disk, or discard it altogether.

    The DRY part is that, to use the fallback handlers, you still only need to write

    background do
      # your code
    end
    

    If it fails to connect to your background process, the code block is automatically sent to a fallback handler (which can be configured in environment.rb).

  15. Charlie Savage
    June 23, 2008

    Thomas – Thanks for the comments. I understand that your background process can use various communication protocols to send blocks to background processes. What’s involved on the receiving end, though, to support the various protocols?

    As far as error handling goes, is it really the case that if sending a message via ActiveMessaging fails, rolling over to some other protocol is going to work? That seems unlikely to me, but what are you seeing in practice?

