On Long Transactions

It was with great interest that I read that PostGIS 1.1.3 supports long transactions. Except there is one problem – if you dig into the documentation you’ll see it does no such thing.

Instead, PostGIS supports record level locking as defined in OGC’s Web
Feature Service (WFS) specification. According to the spec (section 10, page 34):

The purpose of the LockFeature operation is to expose a long term feature
locking mechanism to ensure consistency. The lock is considered long term because network latency would make feature locks last relatively longer than native commercial database locks.

This has nothing to do with long transactions, and really should be called
something like “record locking”

So What is a Long Transaction?

The term long transaction came out of the GIS industry to describe updates that take days or weeks or months to complete. It was coined to highlight the difference between normal database transactions, or “short transactions,” that take milliseconds to complete.

Most of what we do on computers are long transactions – writing documents,
creating spreadsheets, drawing graphics, writing new software, etc.

In the GIS world, long transactions are crucial for modeling the world. For example,
imagine a developer wants to build a new subdivision. Part of the required work is to design the subdivsion’s networks – roads, water pipes, sewer pipes, electrical lines and phone lines. Another part is to lay out the parcels – where the houses will
go. Creating these designs can take months, and it is often necessary to create
several different designs to find the optimal one.

While this works is being done, you want it to be isolated from other users so as to not disturb their work.

Versioned Databases

Two naive ways of implementing long transactions are:

Both of these approaches were tried in the industry, and unsurprisingly, failed. The problem is that they don’t scale in multi-user systems. Before long, users start stepping on each other toes and the whole system grinds to a halt.

Instead, what is needed is an approach that allows users to create their own “version” of the database, work on it as long as needed, and once its done, merge it back into the main database. If you are a developer, this should sound awfully familiar. Its the exact same functionality that branches in source control systems provide.

Implementations

Smallworld was the first commercial implementation of a GIS that had a versioned database that supported long transactions. Later, Oracle, working with Smallworld, introduced a similar technology in Oracle 9i called which they called Workspace
Manager. ESRI, the largest GIS vendor, also now supports long transactions.

Unfortunately, Postgresql/PostGIS does not support versioned databases. And
the new locking functionality it provides is almost useless because it won’t scale in multi-user environments. Of course, the PostGIS developers are just implementing a poorly thought out part of the WFS specification.

  1. August 7, 2006

    Looking into this new feature was on my todo list. I appreciate your evaluation. PostgreSQL doesn’t appear to be close to fully supporting versioning. There are several third party tools for database replication — Slony, for example — which might be useful for creating branches, but nothing that obviously merges databases. That’s the real nut, yes?

    By coincidence, I’m also reading “Data and Reality”. I can’t remember whether you put me onto it earlier this year, or whether it was a recommendation from Bill de h├ôra. Good stuff, and fun.

    Reply
  2. Charlie
    August 8, 2006

    Hey Sean,

    Thanks for the comment. Yeah, Postgresql doesn’t support versioning and I don’t see any indications on the mailing lists that adding it has even been discussed.

    Versioning is a really powerful technology for some types of applications – GIS, genomic applications, document management, etc.

    But its not as useful for online transaction processing (OLTP) or online analytical processing (OLAP), the core markets for relational databases.

    Oracle needed versioning for Oracle Spatial, but I’d guess it required deep changes to the database (I wasn’t involved in that project, it was Smallworld’s Cambridge developers who were involved). I’d also guess the same is true for Postgresql – so its nothing the PostGIS developers could hack onto postgresql.

    As far as replication, that’s a different thing. A versioned database stores diffs between branches and manages merging changes upwards and downwards (from the branch up to the trunk, from the trunk down to the branch). Also remember that versions can form a tree any number of levels deep and you may wish to keep versions forever.

    Replication, on the other hand, copies the entire database and is designed for keeping two different copies of the database in sync.

    For comparison, think about branches in SVN (i.e, a versioned database) versus resyncing your local copy with the master copy (more like replication) – two very different things.

    Last, glad you’re liking Data and Reality. Its a great book!

    Reply

Leave a Reply

Your email address will not be published.

Top