Rails and PostgreSQL – Eliminate Hundreds of Thousands of Queries a Day

Update – It turns out that Rails does cache the results of column data dictionary queries (as you would expect), but not for :has_and_belongs_to_many associations (HABTMA). I know those are “old fashioned,” but they fit our data model perfectly in a couple of places. So be warned: using just a couple of HABTMA associations will generate a huge number of data dictionary queries.

As part of monitoring MapBuzz’s performance, we run a nifty little program called PgFouine that analyzes our PostgreSQL log files every night. PgFouine summarizes the most frequent queries and the slowest queries. Here is our data from yesterday:

Most frequent queries (N)

Rank  Times executed  Total duration  Average duration (s)
1     140,115         2m42s           0.00

Query:
    SELECT a.attname, format_type(a.atttypid, a.atttypmod), d.adsrc, a.attnotnull
    FROM pg_attribute a
    LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
    WHERE a.attrelid = ''::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum;

Besides being the most frequently run query, this was also the seventh most time-consuming one.
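The report's totals work out to roughly a millisecond per call, which explains why PgFouine rounds the average to 0.00 s. Each individual query is cheap, and the cost comes entirely from the volume. Here is the arithmetic as a quick Ruby check, with 2m42s converted by hand to 162 seconds:

```ruby
# Figures copied from the PgFouine report above.
executions    = 140_115
total_seconds = 2 * 60 + 42 # 2m42s

# Average cost of a single data dictionary query, in milliseconds.
average_ms = total_seconds.to_f / executions * 1000
puts format("average: %.2f ms per query", average_ms)
# prints "average: 1.16 ms per query"
```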

I was shocked the first time I saw this many months ago. My first guess was that we were not caching classes in production mode, but that turned out not to be the problem. It’s just Rails being silly: every time it loops over a model’s columns, it strikes up a conversation with the database. That happens a fair bit. It happens when you use :include to add tables to a query, when you use dynamic finders (find_by_x), and when you use associations set up by :has_many, :has_and_belongs_to_many, etc. And this isn’t the only place Rails is wasteful. It also repeatedly queries the database for table names and indices, although those queries run far less often.
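To see why per-lookup querying adds up, here is a minimal simulation. `CountingConnection` is a hypothetical stand-in and not an ActiveRecord class. Its `columns` method only counts how often it is asked to run the catalog query. The code compares asking for the same table's columns a thousand times with and without a simple per-table Hash cache:

```ruby
# Hypothetical stand-in for a database connection that only counts
# how many data dictionary queries it has been asked to run.
class CountingConnection
  attr_reader :dd_queries

  def initialize
    @dd_queries = 0
  end

  # Pretend to run the pg_attribute query and return column names.
  def columns(_table_name)
    @dd_queries += 1
    %w[id name created_at]
  end
end

# Uncached: every lookup goes back to the database.
uncached = CountingConnection.new
1_000.times { uncached.columns("maps") }

# Cached: the Hash default block runs the query once per table, then reuses it.
cached_conn  = CountingConnection.new
column_cache = Hash.new { |cache, table| cache[table] = cached_conn.columns(table) }
1_000.times { column_cache["maps"] }

puts "uncached query count: #{uncached.dd_queries}"  # 1000
puts "cached query count:   #{cached_conn.dd_queries}" # 1
```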

Rails Plugin

Clearly there is no reason to do this in a production environment, and in truth, I don’t see much reason to do it in a development environment either. So yesterday I finally got around to patching Rails and submitting a bug report. The patch caches data dictionary queries for the PostgreSQL adapter. With the patch loaded, Rails still supports adding tables to your database at runtime, but it no longer supports adding or removing columns from a table at runtime or recycling table names. If those things are important to you, the patch also provides a flush_dd_cache method that clears the cache. An obvious alternative is to add a cache_dd_info class variable to ActiveRecord::Base that would be off by default but on in production. However, I’m skeptical there is a need for such a flag.
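The plugin’s code isn’t reproduced in this post, so what follows is only a sketch of the idea. It assumes an adapter whose `columns(table_name, name = nil)` method runs the catalog query. `ColumnCache` and `FakePostgresAdapter` are hypothetical names; only `flush_dd_cache` comes from the patch description. The sketch wraps the original method with Ruby’s `Module#prepend`, which arrived in Ruby 2.0, so a 2008-era plugin would have used an alias-based technique instead:

```ruby
# Hypothetical caching layer in the spirit of the patch described above.
module ColumnCache
  # Return cached column data if present; otherwise run the real query once.
  # Bare `super` forwards table_name and name to the wrapped method.
  def columns(table_name, name = nil)
    @dd_cache ||= {}
    @dd_cache[table_name] ||= super
  end

  # Forget all cached column data, e.g. after altering tables at runtime.
  def flush_dd_cache
    @dd_cache = {}
  end
end

# Hypothetical stand-in for the PostgreSQL adapter; counts catalog queries.
class FakePostgresAdapter
  attr_reader :dd_queries

  def initialize
    @dd_queries = 0
  end

  def columns(table_name, name = nil)
    @dd_queries += 1
    %w[id name created_at]
  end

  prepend ColumnCache
end

adapter = FakePostgresAdapter.new
3.times { adapter.columns("maps") }
adapter.columns("users")
puts adapter.dd_queries # 2 (one query per table)

adapter.flush_dd_cache  # the next lookup goes back to the database
adapter.columns("maps")
puts adapter.dd_queries # 3
```

The trade-off mentioned above follows directly from this design. Once a table’s columns are cached, a column added at runtime stays invisible until `flush_dd_cache` is called.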

On a per-request basis, you won’t see much performance gain from the patch on your Rails application servers, but it will remove needless load from your database. And while you are waiting for Rails to be patched (if it ever is), feel free to download the Rails plugin we are using to solve the problem. Note that the plugin also fixes two other ActiveRecord bugs: its incorrect handling of PostgreSQL schemas and its ignoring of views.

  1. Charlie Savage –
    February 8, 2008

    Hi Rick,

    Ah, interesting, I see what you mean. I did a bit more digging, and it looks like the issue is coming from HasAndBelongsToManyAssociation. I updated the bug report with the stack trace. I’m not that familiar with the internals of ActiveRecord, but it looks to me like the column caching code might be better placed in the base connection adapter class instead of ActiveRecord::Base, which would solve this issue.

  2. Koz
    February 8, 2008

    We also have Base.reset_column_information.

    Perhaps it’s worth investigating moving all the caching into the adapters for 2.1; at present, the output of Base.inspect will repeatedly query the database too.

  3. Charlie Savage –
    February 8, 2008

    Hi Koz,

    I just took a quick look, and it doesn’t seem that hard. I’d approach it by renaming the various columns methods on the concrete adapters (PostgreSQL, MySQL, etc.) to read_columns. Then I’d add a columns method on AbstractAdapter that checks a cache and, if nothing is found, calls read_columns.

    It would also be good to remove the column caching on ActiveRecord::Base, but there is one hangup: the loop over column names that sets the primary key (which only ActiveRecord::Base knows). Not sure what to do about that.

    Anyway, hope this helps.

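The adapter-level approach proposed in the thread amounts to a template method. The shared base class owns the cache, and each concrete adapter only knows how to read columns from its own database. Because every caller goes through the adapter, HABTM associations would benefit too. Below is a minimal sketch using simplified stand-in classes. `AbstractAdapter` and `PostgreSQLAdapter` here are not ActiveRecord’s real classes, and `clear_columns_cache` is a hypothetical name:

```ruby
# Simplified stand-in for the shared adapter base class.
class AbstractAdapter
  # Shared entry point: consult the cache, fall back to read_columns.
  def columns(table_name)
    columns_cache[table_name] ||= read_columns(table_name)
  end

  # Hypothetical reset hook: drop one table's entry, or everything.
  def clear_columns_cache(table_name = nil)
    table_name ? columns_cache.delete(table_name) : columns_cache.clear
  end

  private

  def columns_cache
    @columns_cache ||= {}
  end

  # Concrete adapters override this with their own catalog query.
  def read_columns(_table_name)
    raise NotImplementedError, "#{self.class} must implement read_columns"
  end
end

# Simplified stand-in for a concrete adapter.
class PostgreSQLAdapter < AbstractAdapter
  attr_reader :catalog_queries

  def initialize
    @catalog_queries = 0
  end

  private

  # Would run the pg_attribute query from the report; here it just counts calls.
  def read_columns(_table_name)
    @catalog_queries += 1
    %w[id name created_at]
  end
end

adapter = PostgreSQLAdapter.new
5.times { adapter.columns("maps") }
puts adapter.catalog_queries # 1

adapter.clear_columns_cache("maps")
adapter.columns("maps")
puts adapter.catalog_queries # 2
```

The primary-key hangup raised above is not addressed here. In this sketch the adapter caches only raw column data and leaves model-specific work such as primary key handling to the model class.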
