Technical Paper 9 – The Why and the How of the Long Transaction

by Richard G. Newell

Abstract

The recent literature on GIS technology has seen the emergence of a new set of
terminology including “long transaction”, “version management”,
“check-out”, “check-in”, “seamless mapbase” and so on.
As is common with new terminology, few people understand what
these things mean, why they are important and what they are for. This paper attempts
to explain what a long transaction is, why it is necessary and how it is managed
in a GIS.

Introduction

Any user of a GIS, mapping system, CAD system, word processor or indeed any system
which involves updating data over a significant period of time is in fact engaged
in a long transaction. Contrast this with the user of a commercial DBMS application
such as banking or airline reservations. In such an application, a user may spend
a few seconds filling in an input screen, which then updates the system in a
transaction that lasts a small fraction of a second.

What is the difference between these two uses of a computer system? Why is it
that the short transaction mechanisms implemented in all of today’s commercial
DBMSs do not satisfy the requirements of design-type systems?

Long transaction applications in a GIS involve the modification of data that is
relatively static. These include:

Data conversion
Map-base and asset management
Analysis which produces large amounts of intermediate results
Design studies with multiple alternative designs

Short transaction examples occur in the everyday operational use of a system.
These might include:

Vehicle tracking
Customer service bureau
Fault logging
Emergency planning

Drawing office managers using paper drawings are forced into implementing a long
transaction mechanism, known as drawings management. It is impractical to allow
two draftsmen to have the same drawing out for update at the same time, so there
is only one master copy of every drawing and only one draftsman at a time can modify it.

Any other user can have read-only access to a copy of a drawing, in which case
the information he has may be out of date, and this is usually deemed acceptable.
In fact, it is not uncommon in organisations such as local authorities for there
to be multiple copies of the same set of maps, all independently updated and maintained.
Although this is considered much less acceptable, these organisations put up with
it because there is no alternative when the maps are on paper.

Sheet-Based or Tiled Systems

The early digital mapping systems were based on CAD systems, in which the mapbase
was held as a collection of CAD drawings. The methods of managing such a system
where multiple users wish to access and update the mapbase are based on the manual
drawing office approach. Indeed, there is a market for document management systems,
in which an intelligent drawing register is held in a database alongside the drawings
to record the status of each drawing. The advantage of these sheet-based digital
mapping systems is their simplicity; their disadvantage is that handling any object
which lies across a tile boundary becomes exceedingly cumbersome.
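To make the sheet-based approach concrete, here is a minimal Python sketch of a
drawing register. The class and method names are hypothetical and do not describe
any particular document management product; the sketch only shows exclusive
check-out of sheets, and why an object lying across a tile boundary forces the user
to acquire every sheet it touches, failing outright if any one of them is already out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class DrawingRegister:
    """Hypothetical drawing register: one master copy per sheet, one editor at a time."""

    holders: Dict[str, Optional[str]] = field(default_factory=dict)

    def add_sheet(self, sheet: str) -> None:
        self.holders[sheet] = None  # None means nobody has the sheet out

    def check_out(self, sheet: str, user: str) -> bool:
        # Exclusive access: refuse if a draftsman already holds the sheet.
        if self.holders[sheet] is not None:
            return False
        self.holders[sheet] = user
        return True

    def check_in(self, sheet: str, user: str) -> None:
        if self.holders[sheet] != user:
            raise PermissionError(f"{user} does not hold sheet {sheet}")
        self.holders[sheet] = None

    def check_out_all(self, sheets: Iterable[str], user: str) -> bool:
        # An object crossing a tile boundary needs every sheet it touches.
        # All-or-nothing: release the sheets already taken if one is busy.
        taken = []
        for sheet in sorted(sheets):
            if not self.check_out(sheet, user):
                for t in taken:
                    self.check_in(t, user)
                return False
            taken.append(sheet)
        return True


register = DrawingRegister()
for s in ("A1", "A2"):
    register.add_sheet(s)
register.check_out("A2", "bob")                     # bob is editing sheet A2
ok = register.check_out_all({"A1", "A2"}, "alice")  # alice's pipe crosses A1/A2
```

Here alice cannot start work at all until bob returns A2, and the rollback is
needed so that A1 is not left checked out to nobody's benefit.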

However, as digital mapping systems tried to evolve into truly seamless databases,
it was found that the information needed to store the relationships between parts
of objects on adjacent map-sheets was not nearly so easy to manage as the strictly
partitioned map-base of the earlier sheet-based systems. Indeed, so much code was
required in these map-management systems that more modern systems took a radically
different approach, abandoning the concepts of sheets and tiles altogether to
implement a truly seamless database.

This appeared attractive because it allowed implementors to move from a file-based
system to an implementation based on a database. It was now feasible to hold
all of the map data in a commercial relational database management system (RDBMS),
much of whose functionality is required to handle the large data volumes involved
in a GIS.

Commercial Relational Database Management Systems

The commercial relational database vendors have invested man-centuries of effort
in producing very robust systems which ensure the safety and integrity of data
at all times. They also include rich facilities for designing and building data
models, an aspect now widely recognised as the most important point of departure
in building a GIS. However, vendors who build their systems on such engines have
to overcome three problems which the database vendors do not address:

Spatial modelling and queries
Performance of spatial queries
Long transaction handling

On the first two of these, the database vendors and the standards organisations
are beginning to make progress. Indeed, provided the RDBMS provides facilities
to control data clustering, it is not too hard to obtain adequate spatial
performance. However, on the issue of long transactions, there is little sign yet
that the vendors are doing anything. This may be because the GIS market is considered
small compared to the total DBMS market and so does not get the attention it
deserves; it may also be because long transaction support requires fundamental
changes to existing approaches.

Commercial DBMSs are designed to handle short transactions and to maximize transaction
throughput. In theory, one could use the short transaction mechanism to handle
multiple users of a GIS, but it is not effective, for the following reasons.

Commercial DBMS vendors adopt one of two approaches to controlling concurrent
transactions, known as the pessimistic approach and the optimistic approach. In the
pessimistic approach, the system requests locks on all records that are to be updated
before commencing a transaction, so all other users are locked out of these records
until the transaction is closed and the locks are released. The problem is that if
many users hold locks for long periods of time, the system becomes unusable because
other users are denied access to the locked records.
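The pessimistic scheme can be illustrated with a toy in-memory store. This is a
hypothetical Python sketch, not the locking implementation of any real DBMS: locks
are taken on every record before the transaction begins and released only when it
is closed, so a transaction that stays open for days blocks everyone who needs the
same records for days.

```python
import threading
from typing import Dict, Iterable


class PessimisticStore:
    """Hypothetical toy store that locks records before a transaction starts."""

    def __init__(self, records: Dict[str, str]):
        self.records = dict(records)
        self.lock_owner: Dict[str, str] = {}  # record key -> transaction holding the lock
        self._mutex = threading.Lock()        # protects the lock table itself

    def begin(self, txn: str, keys: Iterable[str]) -> bool:
        """Lock every record the transaction will update, or lock nothing."""
        wanted = set(keys)
        with self._mutex:
            if any(self.lock_owner.get(k, txn) != txn for k in wanted):
                return False  # another transaction holds a lock: caller must wait
            for k in wanted:
                self.lock_owner[k] = txn
            return True

    def write(self, txn: str, key: str, value: str) -> None:
        if self.lock_owner.get(key) != txn:
            raise PermissionError(f"{txn} has not locked {key}")
        self.records[key] = value

    def commit(self, txn: str) -> None:
        """Closing the transaction releases all of its locks."""
        with self._mutex:
            for k in [k for k, owner in self.lock_owner.items() if owner == txn]:
                del self.lock_owner[k]


store = PessimisticStore({"pipe-7": "cast iron", "valve-3": "open"})
store.begin("alice", ["pipe-7"])                         # alice starts a week-long edit
blocked = not store.begin("bob", ["pipe-7", "valve-3"])  # bob is refused until alice commits
```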

In the optimistic approach, each user carries on updating records within the privacy
of his own transaction, and if a conflict is detected at the time the transaction
is closed, he may well lose all of the work that he has just completed. For a
small amount of work this is deemed acceptable, but if somebody has been
working for days or weeks it certainly is not.
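The optimistic scheme can be sketched in the same toy style, assuming
(hypothetically) that each record carries a version number which is bumped on every
committed write. Updates are buffered privately, and at commit the transaction
checks that nothing it read has changed since; if something has, the buffered work
is thrown away, which is exactly the loss described above. The sketch is
single-threaded and only updates existing records; a real engine would make the
check and the apply a single atomic step.

```python
from typing import Dict, Tuple


class OptimisticStore:
    """Hypothetical toy store using per-record version numbers."""

    def __init__(self, records: Dict[str, str]):
        # value plus version number; the version is bumped on every committed write
        self.records: Dict[str, Tuple[str, int]] = {k: (v, 0) for k, v in records.items()}

    def begin(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    def __init__(self, store: OptimisticStore):
        self.store = store
        self.read_versions: Dict[str, int] = {}  # version of each record when first read
        self.writes: Dict[str, str] = {}         # private, uncommitted updates

    def read(self, key: str) -> str:
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.records[key]
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key: str, value: str) -> None:
        self.read(key)  # remember which version this update is based on
        self.writes[key] = value

    def commit(self) -> bool:
        # Conflict check: has anyone committed a newer version of a record we read?
        for key, seen in self.read_versions.items():
            if self.store.records[key][1] != seen:
                self.writes.clear()  # all of the private work is lost
                return False
        for key, value in self.writes.items():
            _, version = self.store.records[key]
            self.store.records[key] = (value, version + 1)
        return True


store = OptimisticStore({"parcel-12": "boundary v0"})
alice, bob = store.begin(), store.begin()
alice.write("parcel-12", "a week of edits")
bob.write("parcel-12", "a quick fix")
bob.commit()               # bob closes first and succeeds
lost = not alice.commit()  # alice's conflicting work is discarded
```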

In either method, should there be a system failure while a transaction is open,
then all work is lost. Thus these methods can handle transactions which are open
for a few seconds or minutes, but certainly not those which last for hours or days.

To get round this problem, GIS vendors who base their systems on commercial
RDBMSs avoid using the short transaction mechanism completely and instead implement
a long transaction mechanism of their own called “check-out”.

Check-out and check-in

In a system which employs check-out, the user who wishes to update the database
asks the system to copy the part of the database he wishes to work on into a
single-user database. Whether the single-user database is proprietary or a
commercial RDBMS does not matter, as it only holds temporary data.
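A check-out by area might look like the following Python sketch. The Feature
record, the bounding-box test and the check_out function are illustrative
assumptions; the paper does not say how any product selects the data to copy. Each
copied feature keeps the version number it had in the master database, so that a
later check-in can detect changes made by other users in the meantime.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class Feature:
    """Hypothetical map feature; immutable, so edits replace whole entries."""
    fid: str
    bbox: BBox
    value: str
    version: int  # bumped by the master database on every committed change


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def check_out(master: Dict[str, Feature], area: BBox) -> Dict[str, Feature]:
    """Copy every feature touching `area` into a private, single-user working set.

    The result is a new dictionary, so local edits (which replace entries in it)
    never touch the master. Each feature keeps its version number.
    """
    return {fid: f for fid, f in master.items() if overlaps(f.bbox, area)}


master = {
    "pipe-7": Feature("pipe-7", (0.0, 0.0, 12.0, 1.0), "cast iron", 3),
    "valve-3": Feature("valve-3", (15.0, 0.0, 16.0, 1.0), "open", 1),
}
# pipe-7 crosses the edge of the area and is copied; valve-3 lies outside it.
local = check_out(master, (0.0, 0.0, 10.0, 10.0))
```

Selection here is purely by area: if valve-3 belonged to pipe-7 it would still be
left behind, which is the relationship problem noted among the disadvantages of
check-out.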

One advantage that stems from doing this is that the checked out database can
be held on the local disk of a workstation, and so the user puts no load at all
on the database server or the network while he is working. The only work that the
server has to deal with is checking out data and then later checking in the updates.

However, there are a number of disadvantages of using check-out:

Check-out may take a long time
The user is restricted to a subset of the database
The handling of alternatives is cumbersome
It is difficult to maintain relationships between checked-out data and the rest of the database

From the vendor’s point of view, the biggest disadvantage of check-out is that
making it work effectively requires an enormous amount of development effort,
so the gains made by relying on the R&D resources of the DBMS vendors are
lost in overcoming the limitations of the short transaction.

A further problem is that check-out faces the same choice as the two short
transaction mechanisms: either one locks the data intended for update, or one
employs a system of conflict resolution on check-in. In the former case, the system
suffers the same problems as pessimistic locking. In the latter case one can at
least salvage most of the work in the event that a conflict is detected. However,
conflict resolution is not a trivial matter.
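Conflict detection on check-in can be sketched as follows, assuming
(hypothetically) that the master database keeps a version number per feature and
that the user's check-out recorded those numbers. Any feature whose master version
has moved on since check-out is reported as a conflict; every other edit is
applied, so most of the work is salvaged. A feature created during check-out
conflicts only if somebody else has created one with the same id. Deciding what to
do about each conflict, the non-trivial part, is deliberately left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: feature id -> (value, version number).
Snapshot = Dict[str, Tuple[str, int]]


def check_in(master: Snapshot, base: Snapshot, edits: Dict[str, Optional[str]]) -> List[str]:
    """Apply checked-out edits to the master, returning the ids in conflict.

    base:  the checked-out features as they stood at check-out time.
    edits: new value per feature id; None means the user deleted the feature.
    A feature conflicts if its master version differs from the checked-out
    one, i.e. somebody else has checked in a change to it in the meantime.
    """
    conflicts: List[str] = []
    for fid, new_value in edits.items():
        base_version = base[fid][1] if fid in base else None
        master_version = master[fid][1] if fid in master else None
        if base_version != master_version:
            conflicts.append(fid)  # left for the user to resolve
            continue
        if new_value is None:
            master.pop(fid, None)
        else:
            master[fid] = (new_value, (master_version or 0) + 1)
    return conflicts
```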

Given that problems remain in using check-out to overcome the long transaction
problem, one has to ask what the alternatives are. Either one waits for the
commercial DBMS vendors to provide long transaction support (and indeed some
of the early implementations of object-oriented database management systems claim
to address this issue), or the GIS vendor implements his own long transaction
mechanism.

The lack of long transaction support is the most serious shortcoming of today’s
commercial RDBMSs for the support of GIS.

One of the most powerful approaches to handling long transactions is to implement
a mechanism for version management deep in the database engine itself.

Version Management

A version managed database is capable of holding any number of versions of the
whole database without replicating data that is common between versions. Thus all
users can see the whole database at all times, subject to any changes made within
the privacy of their own versions.
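One way to hold many versions without replicating common data is for each version
to store only the records changed within it and to read everything else through
from its parent. The Python sketch below is a hypothetical illustration of that
idea, not a description of how any product stores versions. A global change stamp
lets a version read its parent as the parent stood when the version was created,
so later changes in the parent stay invisible until they are explicitly merged.

```python
import itertools
from typing import Dict, List, Optional, Tuple

_DELETED = object()  # tombstone: the record was deleted in this version


class Version:
    """One version of a hypothetical version-managed database.

    A version stores only the records changed within it; every other record
    is read through from its parent, as the parent stood when this version
    was created. Data common to several versions is therefore stored once.
    """

    _clock = itertools.count(1)  # global change stamp shared by all versions

    def __init__(self, parent: Optional["Version"] = None):
        self.parent = parent
        # Parent changes stamped after this point are invisible to this version.
        self.branch_stamp = next(Version._clock)
        self.history: Dict[str, List[Tuple[int, object]]] = {}

    def write(self, key: str, value: object) -> None:
        self.history.setdefault(key, []).append((next(Version._clock), value))

    def delete(self, key: str) -> None:
        self.write(key, _DELETED)

    def read(self, key: str, as_of: float = float("inf")) -> Optional[object]:
        # The newest local change visible at `as_of` wins.
        for stamp, value in reversed(self.history.get(key, [])):
            if stamp <= as_of:
                return None if value is _DELETED else value
        if self.parent is None:
            return None  # the record does not exist
        # Otherwise read the parent as it was when this version was created.
        return self.parent.read(key, min(as_of, self.branch_stamp))


root = Version()
root.write("pipe-7", "cast iron")
alice = Version(root)         # alice's long transaction starts here
alice.write("pipe-7", "PVC")  # visible only in alice's version
```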

A long transaction commences with the creation of a new version from an existing
version. At the start, the new version will look identical in all respects to the
parent version from which it was created. However, as the user proceeds in modifying
the database, the database stores the effects of his changes, but no other user
operating in a different version can see these changes. The user of course works
within a sequence of short transactions, each of which can be committed at any
time. Thus the database can store persistently the results of a long transaction
at all stages in its evolution. Intermediate commit stages may sometimes be given
a name, in which case they are known as checkpoints.
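The relationship between the short transactions inside a version and the long
transaction as a whole can be sketched like this. Class and method names are
hypothetical, and simulate_failure merely models a crash by discarding the
uncommitted short transaction: everything up to the last commit survives, and a
named checkpoint lets the user recover the state of the long transaction at an
earlier stage.

```python
from typing import Dict, List, Optional


class LongTransaction:
    """Hypothetical long transaction made of many short, committed ones."""

    def __init__(self, base: Dict[str, str]):
        self.base = dict(base)                   # the version's starting state
        self.commits: List[Dict[str, str]] = []  # persistent: survives a failure
        self.pending: Dict[str, str] = {}        # the current short transaction
        self.checkpoints: Dict[str, int] = {}    # checkpoint name -> commit count

    def write(self, key: str, value: str) -> None:
        self.pending[key] = value

    def commit(self) -> None:
        """End the current short transaction, making its changes persistent."""
        if self.pending:
            self.commits.append(self.pending)
            self.pending = {}

    def checkpoint(self, name: str) -> None:
        """Commit, then give the resulting stage a name."""
        self.commit()
        self.checkpoints[name] = len(self.commits)

    def state(self, checkpoint: Optional[str] = None) -> Dict[str, str]:
        """Replay committed changes, up to a named checkpoint if one is given."""
        upto = len(self.commits) if checkpoint is None else self.checkpoints[checkpoint]
        result = dict(self.base)
        for changes in self.commits[:upto]:
            result.update(changes)
        return result

    def simulate_failure(self) -> None:
        """Model a crash: only the uncommitted short transaction is lost."""
        self.pending = {}
```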

A long transaction is closed by first merging into the user’s version any changes
that other users have made to the parent version since it was created, and then
posting the combined changes back up to the parent. The step of merging the
parent’s changes is where conflicts may be detected and dealt with.
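The merge step can be treated as a three-way comparison between the parent as it
was when the version was created, the parent as it is now, and the user's version.
The functions below are a minimal Python sketch under that assumption, with records
held in plain dictionaries; they are not the merge algorithm of any particular
system. A change made on only one side is taken; a record changed differently on
both sides is reported as a conflict, and the user's value is kept until someone
resolves it.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, Optional[str]]  # hypothetical: record id -> value


def merge(base: State, parent: State, child: State) -> Tuple[State, List[str]]:
    """Three-way merge of the parent's changes into the child version.

    base: the parent as it stood when the child version was created.
    Returns the merged state and the ids changed differently on both sides.
    """
    merged = dict(child)
    conflicts: List[str] = []
    for key in sorted(set(base) | set(parent) | set(child)):
        b, p, c = base.get(key), parent.get(key), child.get(key)
        if p == b or p == c:
            continue               # parent unchanged, or both made the same change
        if c == b:
            merged[key] = p        # only the parent changed: take its value
        else:
            conflicts.append(key)  # both changed it differently: needs a decision
    # A value of None means the record is absent (e.g. deleted in the parent).
    return {k: v for k, v in merged.items() if v is not None}, conflicts


def post(parent: State, merged: State) -> None:
    """Post the merged result back up to the parent."""
    parent.clear()
    parent.update(merged)


base = {"pipe-7": "cast iron", "valve-3": "open"}
parent = {"pipe-7": "cast iron", "valve-3": "closed"}  # posted by another user
child = {"pipe-7": "PVC", "valve-3": "open"}           # this user's version
merged, conflicts = merge(base, parent, child)
if not conflicts:
    post(parent, merged)
```

Posting would normally wait until the conflict list is empty, for example after
the user has decided each conflicting record by hand.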

As in the case of check-out, version management also minimizes the load on the
database server by maximizing the utilization of the workstation, which allows many
workstations to be served by one server with good performance. This contrasts with
the situation in most commercial DBMSs, in which query processing for all users is
carried out by the database server, giving it an enormous workload in a large system.

Version management has many advantages over both map-management and check-out
in handling the long transaction issue:

No delay before commencing update
Always access to the whole database by all users
Simultaneous alternatives can be handled

One of the disadvantages of version management is that it is extremely difficult
to implement on top of a commercial DBMS, so the vendor is forced either into
the compromise of check-out or into implementing his own database engine which
supports the concept. However, one of the most difficult aspects of database
implementation is handling short transactions efficiently; since this is not
required for most GIS applications, it is much easier to build such an engine
robustly and with good performance.

Short Transactions in a GIS

There is much data in corporate DBMSs which needs to be accessed from a GIS. Most
of this data is maintained in a commercial DBMS in a short transaction environment.
Short transactions are important where it is essential for all users of the system
to see the most up-to-date version of the database. GIS access to such data is
typically for read purposes only and thus a simple interface mechanism will normally
suffice. It is of course desirable from the user’s point of view to hide differences
in the user interface between the GIS and the external DBMS. Where it is also
desirable to maintain short transaction data via the GIS user interface, it makes
a lot of sense to use a commercially available database engine.

Summary

All multiple-user GIS systems maintain their data by using a long transaction
mechanism. This paper has traced a progression from simple mapping systems which
maintain multiple map-sheets using a document management approach, through systems
which try to maintain continuity between map-sheets by means of an extension of
document management known as map management, to truly seamless GIS systems which
need a different approach. Two approaches available in the marketplace, namely
check-out and version management, have been explored, and the drawbacks and benefits
of all the approaches described. The conclusion is reached that the most elegant
and powerful solution is version management, and that the lack of support for it
in today’s commercial RDBMSs is a major drawback to using these systems to underpin
a GIS. Thus today’s vendors who wish to bring the benefits of version management
to their customers must implement it themselves.

CFIS
