:select meets :include (or a pitch for rparsec)

One reason developers like ActiveRecord is that it automatically generates the SQL needed for querying a relational database. Obviously, there are various knobs for controlling the generated SQL. You can use the :select option to modify the fields returned in a result set. To improve performance, you can use the :include option to specify that related tables are loaded via joins.

Unfortunately, :select and :include don’t work together. The reason is that when you specify :include, ActiveRecord renames every field returned in the query to avoid name conflicts. ActiveRecord then uses those field names to build a graph of ActiveRecord objects.
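To make the renaming concrete, here is a simplified sketch of the kind of aliasing involved. It is not ActiveRecord's code. It only mimics the naming scheme older ActiveRecord versions used for eager loading (t0_r0, t0_r1, t1_r0, ...), where the first number identifies the table and the second the column. The helper aliased_select_list is hypothetical.

```ruby
# Hypothetical illustration of eager-loading column aliases, mimicking the
# t<table index>_r<column index> naming used by older ActiveRecord versions.
# Each entry is [table_name, [column names]]; index 0 is the base table.
def aliased_select_list(tables)
  tables.each_with_index.flat_map do |(table, columns), t|
    columns.each_with_index.map do |column, r|
      "#{table}.#{column} AS t#{t}_r#{r}"
    end
  end
end

tables = [
  ["posts",    %w[id title]],
  ["comments", %w[id post_id body]]
]

puts aliased_select_list(tables).join(",\n")
```

Because the object graph is rebuilt from aliases like these, an arbitrary hand-written :select list can't simply be passed through. Something has to understand each expression in it first.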

Supporting :select with :include requires a full-fledged SQL SELECT parser. This might not be obvious at first sight – it seems like you could just use regular expressions. But remember that the SELECT clause of a query can be quite complex. For example:

SELECT field1, 3+4 AS field2, 3/sum(max(3,2))
FROM some_table
JOIN some_other_table 
ON some_table.id = some_other_table.some_table_id

Not to mention that some database systems let you nest a SELECT inside of a SELECT.
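Here is a small Ruby sketch showing why string splitting or regular expressions fall short, even on the example above. split_top_level is a hypothetical helper and is not part of ActiveRecord or rparsec.

```ruby
select_list = "field1, 3+4 AS field2, 3/sum(max(3,2))"

# Naive approach: split on every comma. This breaks max(3,2) in two:
# ["field1", "3+4 AS field2", "3/sum(max(3", "2))"]
naive = select_list.split(",").map(&:strip)
p naive

# Slightly better (hypothetical helper): split only on commas that sit
# outside any parentheses, by tracking the current nesting depth.
def split_top_level(list)
  items = []
  depth = 0
  current = +""
  list.each_char do |ch|
    depth += 1 if ch == "("
    depth -= 1 if ch == ")"
    if ch == "," && depth.zero?
      items << current.strip
      current = +""
    else
      current << ch
    end
  end
  items << current.strip unless current.strip.empty?
  items
end

# ["field1", "3+4 AS field2", "3/sum(max(3,2))"]
p split_top_level(select_list)
```

Tracking parenthesis depth fixes this particular case, but it still splits a string literal such as 'a,b' in two. It also tells you nothing about which items carry an AS alias or what each expression is made of. That kind of structure only comes from a real parser.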

Parser Combinators

There are many ways of creating parsers, with the most famous being lex and yacc. To use lex and yacc, you first describe the tokens (for lex) as regular expressions and the grammar rules (for yacc) in a BNF-like notation. You then run lex and yacc to generate a parser in your target language, which is generally C. Ruby has racc, a yacc-style parser generator, which you can use to generate a parser in Ruby.

Another approach is to create your parser by hand. This might sound like a silly idea, but it has a few advantages. First, it teaches you how parsers work. Second, it’s simpler, since you only have to know one language. Third, it lets you take advantage of the features of the language you are using.
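To show what "by hand" looks like, here is a minimal recursive-descent sketch in Ruby for the kind of arithmetic and function-call expressions in the SELECT example above (3+4, 3/sum(max(3,2))). ExprParser and its grammar are hypothetical simplifications, not full SQL. It does not handle AS aliases, string literals, or comparison operators.

```ruby
# Hypothetical recursive-descent parser: one method per grammar rule.
# Grammar (a simplified assumption, not full SQL):
#   expr   := term   (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")" | IDENT [ "(" expr ("," expr)* ")" ]
class ExprParser
  def initialize(source)
    # Tokens: integers, identifiers, and single-character punctuation.
    @tokens = source.scan(/\d+|\w+|[-+*\/(),]/)
    @pos = 0
  end

  # Parse the whole input into a nested-array syntax tree.
  def parse
    tree = expr
    raise ArgumentError, "unexpected #{peek.inspect}" if peek
    tree
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def expect(token)
    raise ArgumentError, "expected #{token.inspect}, got #{peek.inspect}" unless peek == token
    advance
  end

  # Left-associative: 1-2-3 parses as ((1-2)-3).
  def expr
    node = term
    node = [advance.to_sym, node, term] while %w[+ -].include?(peek)
    node
  end

  def term
    node = factor
    node = [advance.to_sym, node, factor] while %w[* /].include?(peek)
    node
  end

  def factor
    token = advance
    case token
    when /\A\d+\z/ then token.to_i
    when "("
      node = expr
      expect(")")
      node
    when /\A\w+\z/
      return [:column, token] unless peek == "("
      advance # consume "("
      args = [expr]
      while peek == ","
        advance
        args << expr
      end
      expect(")")
      [:call, token, args]
    else
      raise ArgumentError, "unexpected #{token.inspect}"
    end
  end
end

p ExprParser.new("3/sum(max(3,2))").parse
```

Each grammar rule maps to one plain Ruby method, which is the "only one language" advantage in practice. The trade-off is that you handle precedence, lookahead and error reporting yourself.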

One way to approach the problem is to use parser combinators. The idea came out of functional languages, so some of the papers on the web are a bit dense. But the basic idea is to create a number of small parsers, each of which does a specific task, and combine them using the Composite pattern.
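Here is a minimal sketch of that idea in plain Ruby. It is not rparsec's implementation, and the class names (Parser, Token, Sequence, Alternative, Many, Optional) are hypothetical. Leaf parsers match single tokens. Composite parsers hold child parsers and delegate to them, which is the Composite pattern.

```ruby
# Minimal parser-combinator sketch (not rparsec). Every parser responds to
# parse(tokens, pos) and returns [value, new_pos] on success or nil on failure.
class Parser
  # Run self, then other, keeping other's result.
  def >>(other)
    Sequence.new(self, other)
  end
  # Try self; if it fails, try other from the same position.
  def |(other)
    Alternative.new(self, other)
  end
  def many
    Many.new(self)
  end
  def optional
    Optional.new(self)
  end
end

# Leaf: matches a single token that satisfies the predicate.
class Token < Parser
  def initialize(&predicate)
    @predicate = predicate
  end
  def parse(tokens, pos)
    tok = tokens[pos]
    tok && @predicate.call(tok) ? [tok, pos + 1] : nil
  end
end

# Composite: runs its children in order; the block combines their results
# (by default the last child's result is kept).
class Sequence < Parser
  def initialize(*parsers, &combine)
    @parsers = parsers
    @combine = combine || ->(*values) { values.last }
  end
  def parse(tokens, pos)
    values = []
    @parsers.each do |parser|
      result = parser.parse(tokens, pos)
      return nil unless result
      values << result[0]
      pos = result[1]
    end
    [@combine.call(*values), pos]
  end
end

# Composite: the first child that succeeds wins.
class Alternative < Parser
  def initialize(first, second)
    @first = first
    @second = second
  end
  def parse(tokens, pos)
    @first.parse(tokens, pos) || @second.parse(tokens, pos)
  end
end

# Composite: repeats its child while it succeeds and consumes input.
class Many < Parser
  def initialize(parser)
    @parser = parser
  end
  def parse(tokens, pos)
    values = []
    while (result = @parser.parse(tokens, pos)) && result[1] > pos
      values << result[0]
      pos = result[1]
    end
    [values, pos]
  end
end

# Composite: yields nil without consuming input if its child fails.
class Optional < Parser
  def initialize(parser)
    @parser = parser
  end
  def parse(tokens, pos)
    @parser.parse(tokens, pos) || [nil, pos]
  end
end

def keyword(word)
  Token.new { |tok| tok.casecmp?(word.to_s) }
end

ident = Token.new { |tok| tok.match?(/\A[a-z_]\w*\z/i) }
comma = Token.new { |tok| tok == "," }
# An optional "ELSE <ident>" clause.
default_case = (keyword(:else) >> ident).optional
# A comma-separated list of identifiers, collected into an array.
list = Sequence.new(ident, (comma >> ident).many) { |first, rest| [first] + rest }
```

Leaves and composites share one parse interface, so any parser can be plugged in wherever another is expected. That shared interface is what makes the combinators composable.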

For example, let’s say I want to parse a simple SQL CASE expression:

SELECT CASE field1 < 100
  WHEN true THEN 'small'
  ELSE 'big'
END

Here is some example Ruby code that does it:

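# Assumes keyword, operator, lazy_expr and CaseExpr are defined
# elsewhere (not shown). lazy_expr is a lazily evaluated reference to
# the full expression parser, so the grammar can refer to itself.

# Parse a single WHEN clause into a [condition, value] pair
when_clause = sequence(keyword[:when], lazy_expr,
                       (operator[':'] | keyword[:then]),
                       lazy_expr) do |_, cond, _, val|
  [cond, val]
end

# Parse the optional ELSE clause
default_case = (keyword[:else] >> lazy_expr).optional

# Parse a CASE expression; the block receives one result per parser
case_expr = sequence(keyword[:case], lazy_expr,
                     when_clause.many, default_case,
                     keyword[:end]) do |_, val, whens, default, _|
  CaseExpr.new(val, whens, default)
end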

Notice how readable it is. Working from the top down:

  • A when clause is a sequence of the keyword WHEN, followed by some expression, followed by : or the keyword THEN, followed by an expression
  • The default clause is optional and consists of the keyword ELSE, which precedes some expression
  • A case expression is made up of the keyword CASE, followed by some expression, followed by any number of when clauses, an optional default clause and finally the keyword END.

The end result is a CaseExpr object that contains the value of the CASE expression, an array of [condition, value] pairs (one per when clause), and the default ELSE expression (nil if there is no ELSE).

Not too bad, is it? Note that everything is a parser. Look again at the variable default_case. It points at an optional parser that contains a sequence parser (created by >>). The sequence parser contains a keyword parser and an expression parser (which of course is a complicated parser in its own right).
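To see the same grammar run without rparsec, here is a dependency-free sketch that mirrors the code above using plain Ruby lambdas. It is not rparsec's API. seq, either, optional, many, kw, EXPR and the Struct-based CaseExpr are hypothetical stand-ins. EXPR simply grabs a run of non-keyword tokens, where a real parser would plug in its full expression parser (lazy_expr).

```ruby
# Dependency-free sketch (not rparsec) of the CASE grammar. A parser is a
# lambda taking (tokens, pos) and returning [value, new_pos] or nil.

# Hypothetical stand-in for the article's CaseExpr class.
CaseExpr = Struct.new(:value, :whens, :default)

# Tokens that end an expression in this simplified grammar.
RESERVED = %w[CASE WHEN THEN ELSE END :].freeze

# Match a single keyword or operator token, case-insensitively.
def kw(word)
  ->(toks, pos) { toks[pos]&.casecmp?(word) ? [toks[pos], pos + 1] : nil }
end

# Stand-in for lazy_expr: one or more non-reserved tokens, joined back
# into a string. A real parser would build an expression tree here.
EXPR = lambda do |toks, pos|
  start = pos
  pos += 1 while toks[pos] && !RESERVED.include?(toks[pos].upcase)
  pos > start ? [toks[start...pos].join(" "), pos] : nil
end

# Run parsers in order; `return` exits run_all as soon as one fails.
def run_all(parsers, toks, pos)
  values = parsers.map do |parser|
    result = parser.call(toks, pos) or return nil
    pos = result[1]
    result[0]
  end
  [values, pos]
end

def seq(*parsers, &combine)
  lambda do |toks, pos|
    result = run_all(parsers, toks, pos)
    result && [combine.call(*result[0]), result[1]]
  end
end

def either(first, second)
  ->(toks, pos) { first.call(toks, pos) || second.call(toks, pos) }
end

def optional(parser)
  ->(toks, pos) { parser.call(toks, pos) || [nil, pos] }
end

# Repeat while the parser succeeds and consumes input.
def many(parser)
  lambda do |toks, pos|
    values = []
    while (result = parser.call(toks, pos)) && result[1] > pos
      values << result[0]
      pos = result[1]
    end
    [values, pos]
  end
end

when_clause = seq(kw("WHEN"), EXPR, either(kw(":"), kw("THEN")), EXPR) do |_, cond, _, val|
  [cond, val]
end
default_case = optional(seq(kw("ELSE"), EXPR) { |_, val| val })
case_expr = seq(kw("CASE"), EXPR, many(when_clause), default_case, kw("END")) do |_, val, whens, default, _|
  CaseExpr.new(val, whens, default)
end

tokens = "CASE field1 < 100 WHEN true THEN 'small' ELSE 'big' END".split
result, _pos = case_expr.call(tokens, 0)
p result
```

On the example's tokens, this yields a CaseExpr whose value is "field1 < 100", whose whens are [["true", "'small'"]], and whose default is "'big'". Dropping the trailing END makes the whole parse fail, because the END keyword is required.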


You could build your own parser combinator framework for Ruby, but luckily Ben Yu has already created one called rparsec. The documentation says rparsec is a port of Haskell’s Parsec library, but it sure doesn’t look like it to me (admittedly, my Haskell skills are minimal at best). Instead, it looks like a port of the Java combinator parser framework described by Steven Metsker in his book Building Parsers with Java.

Steven’s book is a fantastic introduction to parser combinators and a must-read if you want to understand how they work. It takes you step by step through creating your own parser and the pitfalls you will encounter along the way. If you subscribe to Safari, you can read it online.

A Rails Plugin

To use the select parser with Rails:

  1. Install the rparsec gem
  2. Install the Rails plugin

Good luck and happy parsing!
