Rails – Simple Asynchronous Processing

Sooner or later, most large websites have to bite the bullet and implement some form of asynchronous processing to deal with long-running tasks. For example, with MapBuzz we have several long-running tasks:

  • Importing data
  • Batch geocoding
  • Emailing event notifications to users

If you’re developing a Facebook application, moving long-running tasks to a background process or thread is critical since Facebook times out requests to your server within ten to twelve seconds.

So Many Choices

Having decided you need asynchronous processing, the next question is how to do it. And this is where things get complicated – there are a myriad of approaches, each applicable to certain problem domains. Let’s look at some possibilities:

  • Spawn/Fork – Create processes on demand to perform background tasks
  • Distributed Objects – Use a distributed object protocol (RMI, CORBA, DCOM, DRb, etc.) to communicate with another process that performs background tasks
  • Job Queue – Persist tasks in shared files or databases and execute them using background processes
  • Message Processing – Send messages to another process via a message bus

In the Ruby world, there are a number of implementations of each approach – a few examples are the Spawn plugin (spawn/fork), BackgrounDRb (distributed objects over DRb), Bj/BackgroundJob (job queue), and ActiveMessaging (message processing).

Not surprisingly, most of these solutions are designed to work with Rails, since there’s no need to speed up processing if it’s just another machine on the other end instead of an impatient human.

Selecting the best one for your application depends entirely on your use cases. Having said that, it’s still possible to reach some broad conclusions. Spawning or forking processes makes it impossible to offload processing to additional machines, so you’ll quickly run into scalability limits. Distributed objects solve that problem, but experience has shown that distributed object protocols are very brittle because they bind clients and servers so tightly together – thus I would never use them. Job queues are more reliable because tasks are represented in a standard format (usually text based, such as XML) that is persisted to files or database tables. Message queues are similar, but add significantly more functionality such as message routing, transformation, prioritization, etc.

For many websites, a job queue is the best solution. Job queues are relatively lightweight and let you distribute processing across multiple machines. However, the Ruby-based solutions listed above require installing and managing additional software, as well as writing the job-processing code itself. They also make it harder to develop and test software, since you now have to debug multiple processes at once.

A Simple HTTP Based Solution

So what’s a simpler solution? Reuse what you already have. Most Rails applications are deployed as multiple instances, distributed across one or more machines, each embedding an HTTP server (Mongrel, Thin, Ebb) to handle requests. Thus we already have our background processes and an easy way to communicate with them – HTTP (of course!). And if you’re using Mongrel or a proxy server (Pound, Lighttpd, Nginx, Apache, etc.), then you also get a built-in request queue.

In other words:

simple background queue = HTTP + Load Balancer + Rails instances 

Besides simplicity, a big advantage of this approach is that background tasks run within the Rails environment, giving you access to ActiveRecord, your models, etc.

Worker Plugin

Enter a new Rails plugin called worker (yeah, the name leaves something to be desired). Let’s look at an example:

class ImportController < ApplicationController
  # Add support for using workers
  include Worker

  # Incoming user requests are handled by this resource
  resource :geodata do
    def post
      read_file(params)
    end
  end

  # This resource handles requests in a worker process
  resource :process do
    def post
      # Long-running import work goes here (omitted for brevity)
    end
  end

  private

  def read_file(params)
    # file_name and @map are set by validation code omitted for brevity
    worker_params = {:file_name => file_name,
                     :tags => params['tags'],
                     :controller => 'import',
                     :resource => 'process',
                     :map_id => @map.id}

    # Create worker request
    create_worker.post(worker_params)
  end
end

So how does this work? A user POSTs a file to http://myserver/import/geodata. That action does various checks (deleted for brevity) and then sends a request to http://myserver/import/process, which runs in a separate Rails instance. Although this controller delegates back to itself (in a separate process), it could call any controller it wishes.
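
The example above leaves the body of the :process resource empty. A hypothetical sketch of what might go there, using the parameters sent by read_file, could look like the following – GeodataImporter and the features association are placeholders for application code, not part of the plugin:

# Hypothetical worker-side implementation – GeodataImporter and
# map.features are placeholders for application code not shown here
resource :process do
  def post
    map = Map.find(params[:map_id])
    importer = GeodataImporter.new(params[:file_name], params[:tags])
    importer.each_feature do |attributes|
      map.features.create!(attributes)   # runs inside the worker Rails instance
    end
  end
end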

The worker plugin will pass a session key, if available, to the background process. That turns out to be very useful, since it allows the foreground and background processes to share session information if you’re storing sessions in memcached or a database. That means you can use the same authentication and authorization mechanisms in the background process as you do in the foreground process.
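
For example, a Rails 2.x application that keeps sessions in memcached might configure it roughly like this (the session key and secret are placeholders):

# config/environment.rb (Rails 2.x) – illustrative only
config.action_controller.session_store = :mem_cache_store
config.action_controller.session = {
  :session_key => '_myapp_session',        # placeholder
  :secret      => 'a long, random secret'  # placeholder
}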

In addition, all worker requests are signed with an MD5 hash to verify that no one in the middle is spoofing requests.
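
The plugin’s exact signing scheme isn’t spelled out here, but the idea is straightforward. A minimal sketch, assuming a shared secret known to both the foreground and worker instances, might look like this:

require 'digest/md5'

# Minimal request-signing sketch – not the plugin's actual implementation.
# SHARED_SECRET is a hypothetical value configured on both sides.
SHARED_SECRET = 'change_me'

def sign(params)
  payload = params.sort_by { |key, value| key.to_s }.
                   map { |key, value| "#{key}=#{value}" }.join('&')
  Digest::MD5.hexdigest(payload + SHARED_SECRET)
end

def verified?(params, signature)
  sign(params) == signature
end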

Environments and Configuration

By default, Rails applications use three environments – test, development, and production. Each environment is quite different, which affects how you want to use worker processes. To deal with these differences, the worker plugin uses a strategy pattern to invoke requests.

In the test environment, there are no background Rails instances running. More importantly, you need to be able to check that worker requests complete correctly. Thus you want worker requests to happen synchronously and within the test process. This is the Worker::Controller strategy, which works much like Rails’ render_component functionality. To set this up, add the following lines to your test environment file:

config.after_initialize do
  Worker::Config.strategy = Worker::Controller
end 
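
As a quick illustration of why this is convenient, a functional test can assert on a worker request’s results immediately, because the request runs in-process. The action, parameters, and assertions below are hypothetical – exact routing depends on how the plugin exposes resources:

# Hypothetical functional test – names and parameters are placeholders
class ImportControllerTest < ActionController::TestCase
  def test_import_runs_synchronously_in_tests
    post :geodata, :file_name => 'points.csv', :tags => 'trails'
    assert_response :success
    # With Worker::Controller the worker request has already completed
    # in-process, so its side effects can be checked right away.
    assert_not_nil assigns(:map)
  end
end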

In development mode, you have one Rails instance running. In this case, you want worker requests to happen asynchronously, but routed back to that single development instance via a second HTTP request. This is the Worker::HttpAsync strategy. To set this up, add the following lines to your development environment file:

config.after_initialize do
  Worker::Config.strategy = Worker::HttpAsync
end 

Note that this assumes your development process is running on the standard port 3000.
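
If your development server runs on a different host or port, the same options hash used for production below should presumably work here as well:

config.after_initialize do
  Worker::Config.strategy = Worker::HttpAsync
  # Only needed when the development server is not on localhost:3000
  Worker::HttpAsync.options = {:host => 'localhost', :port => '3001'}
end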

Finally, in production mode, you’ll have multiple Rails instances running. To be on the safe side, some of these instances should be dedicated to fulfilling only worker requests. The easiest way to do this is to put them on an internally accessible host and port, say 8500, that outside clients cannot reach. Thus the port, and perhaps the IP address, of the user-facing Rails instances will differ from those of the worker instances. To set this up, add an additional line to your config file that globally sets the host and port of the workers. Note this assumes there is either a single worker or a pool of workers at the given host and port.

config.after_initialize do
  Worker::Config.strategy = Worker::HttpAsync 
  Worker::HttpAsync.options = {:host => 'some_other_host',
                               :port => '8500'}
end 

The Code

We’re releasing the worker plugin under the MIT license. If there is sufficient interest, we’re happy to set up a RubyForge project.

  1. Jim Cropcho
    June 17, 2008

    Thanks for getting this up; I’m definitely checking out the worker plugin.

  2. aleco
    June 17, 2008

    This sounds like a very interesting plugin. Are you planning to add scheduled tasks that are not triggered by a controller action?

    PS: Personally, I’d strongly prefer GitHub over RubyForge.

  3. Jim Cropcho
    June 17, 2008

    I noticed that the .zip doesn’t include any tests. Has any testing been done on the plugin yet? If not, I suppose writing them would be a first step, should a formal project be created.

  4. Charlie Savage
    June 17, 2008

    Aleco – GitHub is also fine if that’s what people want.

  5. Charlie Savage
    June 17, 2008

    Jim – Sorry, there aren’t any direct tests with the plugin. Having said that, we’ve been running the code in production for the last couple of months. In addition, it is tested indirectly through our unit and functional test suites (the tests verify that certain background requests get completed correctly).

    So I do feel fairly confident about the code – although I’m sure there are some lurking bugs. Adding some direct tests would be a good thing.

  6. Noel
    June 18, 2008

    The instructions for the test and development environment file modifications are the same — is this correct?

  7. Thomas Kadauke
    June 18, 2008

    Hey there,

    Here is a completely different approach to background processing, that can be used with any of the above mentioned choices:

    http://devblog.imedo.de/2008/6/18/running-ruby-blocks-in-the-background

    Instead of implementing a complete background task communication protocol, this solution builds on top of any communication protocol to execute code in a background process. Plus, it is the only solution that I know of, that has DRY error recovery support.

  8. Charlie Savage
    June 19, 2008

    Good catch, Noel – the test and development modes use different workers. Test mode uses an in-process worker (similar to render_component), while development mode uses an out-of-process worker (well, sort of – it’s the same Rails instance, but it’s done via a second HTTP request).

  9. Thomas Kadauke
    June 19, 2008

    Hey there,

    I’m not sure if my last comment made it all the way through. Anyway, here it is:

    Here is a completely different approach to background processing, that can be used with any of the above mentioned choices:

    http://devblog.imedo.de/2008/6/18/running-ruby-blocks-in-the-background

    Instead of implementing a complete background task communication protocol, this solution builds on top of any communication protocol to execute code in a background process. Plus, it is the only solution that I know of, that has DRY error recovery support.

  10. Charlie Savage
    June 19, 2008

    Hi Thomas,

    Thanks for the link. Looks like you’ve done some great work – the Imedo solution of passing blocks is quite interesting; I’ve never seen anything like that before.

    By passing blocks, it’s a Ruby-centric take on distributed objects. As you could probably guess from the article, I’m not a fan of such solutions, since I think they bind processes too tightly together and are tough to debug. Having said that, they can work well on a local network.

    As for working with any communication protocol, I think that’s over-engineering. I know it seems like a good idea, but every time I’ve tried it the extra complexity has never been worth the flexibility. Is someone really going to switch communication protocols? And of course, you still need at least one communication protocol anyway, so now a user needs to install two things.

    Error recovery is definitely a good point, since it’s non-obvious what the best way is for a background process to report an error. So I’m curious – what do you mean by DRY error recovery support?

    And as for DRY generally – one of the things I like about the solution we’re using is that everything is a Rails instance. Thus there is no need to implement a bunch of supporting code in background instances that duplicates what you already have in your foreground instances.

  11. niko
    June 20, 2008

    I really like the approach, and I got it working… basically. But I had to set

    session :cookie_only => false

    in the controller. Otherwise I’d get a SessionFixationAttempt error on the process method. I’m working with Rails 2.1.

    Now the bad thing is, with :cookie_only set to false, the :process action can’t add flash notices. I’d really like the user to be notified about the progress…

    What am I doing wrong?

    Niko.

    PS: +1 for github 😀

  12. niko
    June 20, 2008

    Please forget about the last comment… of course the flash notice isn’t added until the process action finishes.

  13. Charlie Savage
    June 20, 2008

    Hey Niko – good point that I forgot to mention. You have to tell Rails that you’re passing the session id as a URL parameter.

    As far as notifying a user about progress – you can’t really, since this is an asynchronous background process. What we do instead is send a message to the user’s “Inbox” telling them when the action has completed. You could also send an email, SMS, etc.

  14. Thomas Kadauke
    June 23, 2008

    @Charlie: Our background processing plugin does not implement any communication protocol (as ActiveMessaging does, for example), but it relies on the user to use an existing one, or implement a new one. In the case of ActiveMessaging, that is a single line of code. It’s really not that complicated, just a matter of how the code block is sent to the background process.

    The advantage of being able to switch communication protocols on the fly is that you get error recovery for free. Shipped with the plugin are fallback handlers that execute the code block in-process, write it to disk, or discard it altogether.

    The DRY part is that, to use the fallback handlers, you still only need to write

    background do
      # your code
    end
    

    If it fails to connect to your background process, the code block is automatically sent to a fallback handler (which can be configured in environment.rb).

  15. Charlie Savage
    June 23, 2008

    Thomas – Thanks for the comments. I understand that your background process can use various communication protocols to send blocks to background processes. What’s involved on the receiving end, though, to support the various protocols?

    As far as error handling goes, is it really the case that if sending a message via ActiveMessaging fails, rolling over to some other protocol is going to work? That seems unlikely to me, but what are you seeing in practice?

