Lost in Abstraction – What Went Wrong with GML

Joel Spolsky wrote a classic post back in 2001 titled Don’t Let Architecture Astronauts Scare You, which warns that the more abstract a solution to a problem is, the less useful it becomes. Abstraction is a funny thing – just the right amount gives you the insight to solve previously unsolvable problems, but too much obscures what you are trying to accomplish and leads you astray.

Atom is a great example of a standard that gets the abstraction level right – as I wrote the other day, its simple model of collections containing entries turns out to be a very useful way to view the world.

In contrast, the Geography Markup Language (GML), which is the GIS industry’s primary data exchange format, is a standard that gets it wrong. Even if you know nothing about GIS or GML, it’s a good example of trying to solve a problem by adding one too many levels of abstraction and thereby solving nothing.

What’s Your View of the World?

GML sets out to solve an awfully difficult problem – how can different systems exchange geographic information (if you want to be hip, go with spatial information instead)?

The funny thing is the geometry part is fairly easy – put a few smart people in a room and they can come up with some reasonable way of describing points, lines, polygons and how they may or may not connect (you’d say roads connect with one another, but probably not to your front yard).

No, the hard bit is trying to describe the world. When you work with geographic information you immediately come face-to-face with the realization you’re trying to model the real world. Of course that’s true for any computer system, such as your company’s payroll system, but somehow dealing with the physicality of the world really drives the point home.

Thus the real problem GML tries to solve is how can your computer system and my computer system exchange data about the world in a meaningful way? In my opinion that’s an unsolvable problem, because the way your database models the world is different from the way mine does. There is a fantastic book on the subject, called Data and Reality, that every programmer, architect and modeler should read. I wrote about it last year, and to quote the post:

Take a simple term – say a street. What is it? Have you ever been on a street that ends, and then starts up again in a few blocks? Is that the same street? How about a street whose name changes – is it still the same street? Does a street include boulevards, highways, freeways? Does a street cross city, county, state boundaries? Do you and I live on the same street, although I live in the United States and you live in Canada?

A Good Start – Take #1

GML 1 took a fairly simplistic way of solving the problem. You could define collections of features, with each feature having “normal” properties and geometric properties. For example, a road might look like this:

<Feature typeName="Road">
  <property typeName="classification">motorway</property>
  <property typeName="number" type="integer">11</property>
  <geometricProperty typeName="linearGeometry">
    <LineString srsName="EPSG:4326">
      <coordinates>
        0.0,100.0 100.0,0.0
      </coordinates>
    </LineString>
  </geometricProperty>
</Feature>

Having implemented the first commercial support of GML (in the Smallworld Internet Application Server), I can say that GML 1 was easy to implement and gave us a good way to exchange data between clients and servers.
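To give a feel for why this style was easy to consume, here is a minimal sketch in Python of a generic reader for the road example above. It is not the Smallworld implementation. It assumes the fragment is closed out as a complete document with the coordinates wrapped in a `<coordinates>` element, and it handles only LineString geometry. The function name `read_feature` and the shape of the returned dictionary are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The road example, closed out so it parses. The <coordinates> wrapper and
# the closing tags are assumptions made for this sketch.
ROAD_XML = """
<Feature typeName="Road">
  <property typeName="classification">motorway</property>
  <property typeName="number" type="integer">11</property>
  <geometricProperty typeName="linearGeometry">
    <LineString srsName="EPSG:4326">
      <coordinates>0.0,100.0 100.0,0.0</coordinates>
    </LineString>
  </geometricProperty>
</Feature>
"""


def read_feature(xml_text):
    """Read one feature without knowing in advance that it is a road."""
    feature = ET.fromstring(xml_text)

    # Ordinary properties: the name and (optional) type travel with the data.
    properties = {}
    for prop in feature.findall("property"):
        value = prop.text
        if prop.get("type") == "integer":
            value = int(value)
        properties[prop.get("typeName")] = value

    # Geometric properties: this sketch only understands LineString.
    geometries = {}
    for geom_prop in feature.findall("geometricProperty"):
        line = geom_prop.find("LineString")
        if line is None:
            continue
        coords = [
            tuple(float(n) for n in pair.split(","))
            for pair in line.findtext("coordinates").split()
        ]
        geometries[geom_prop.get("typeName")] = {
            "srs": line.get("srsName"),
            "coordinates": coords,
        }

    return {
        "type": feature.get("typeName"),
        "properties": properties,
        "geometries": geometries,
    }


road = read_feature(ROAD_XML)
print(road["type"], road["properties"], road["geometries"])
```

The point of the sketch is that the element names are fixed by the format, so the same few lines would read a river or a parcel just as well as a road. Only the `typeName` values change.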

A Standard Goes Awry – Take #2

And then things fell apart. GML 2 abandoned its RDF-like flavor and replaced it with…well…nothing. Instead, GML moved up a level of abstraction.

Say what? Instead of saying “this is how you encode a road” it now says “this is how you construct a model that will let you encode a road.” Thus GML turned itself into a modeling language. If you’re a programmer, a good analogy is UML, the Unified Modeling Language. UML provides you the tools for modeling computer programs, which you then implement in some programming language. GML provides you the tools to model the world in the way you see fit, and then implement it via XML. To do this, GML uses XML Schema as its modeling language and makes use of every last obscure feature, including such gems as substitution groups.

What is a Client to Do?

So how does this work in the real world? Let’s say I want to model the road I showed above. First I have to create an XML Schema that defines a Road Feature, which must inherit from the appropriate abstract GML classes. Then I write out the road in an XML document that validates against the schema I just wrote.

But think of the poor client – what is it supposed to do with the XML document it just received from my server? Well – there are three obvious approaches.

First choice – hard-code the client to know exactly how I described a road.

Second choice – build a hugely complex piece of software that can read in the XML schema, generate an implementation of it on the fly, then load in the XML data. This is in fact possible – and exactly what I did when implementing our GML 2.0 support. However, after all that work, you still need a human to map the incoming data to your data model. In theory, being able to declare the mapping in a configuration file should be more flexible/powerful than coding the transformation by hand (like in the first choice). And maybe that’s true in some programming languages. But if you’re using a dynamic language like Ruby or Python, then the advantage is much less clear-cut. Having taken this approach once, I would not do it again.

Third choice – everyone agrees on a standard way of exchanging information and then hard-codes their system to use it. This is in fact the solution that GML is headed towards by creating “GML Profiles.” As the GML Simple Features Profile states:

The GML specification declares a large number of XML elements and attributes meant to support a wide variety of capabilities…With such a wide scope, interoperability can only be achieved by defining profiles of GML that deal with a restricted subset of GML capabilities.

At this point you might be asking why exactly you went through the whole effort of creating a custom XML schema, or even of using GML at all. Good question – I don’t have a good answer.
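A rough Python sketch may make the client’s predicament more concrete. Two hypothetical servers describe the same road using their own schemas. Everything in the `ex:` and `hw:` namespaces, along with the names `MAPPINGS` and `load_road`, is invented for illustration; only the `gml:LineString` and `gml:coordinates` elements and the GML namespace URI come from GML 2. The mapping table stands in for the second choice above (a declared mapping rather than hand-written code), and schema processing is left out entirely.

```python
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml}"  # GML 2 namespace

# Hypothetical server A: calls it a Road with a centerLine.
SERVER_A = """
<ex:Road xmlns:ex="http://example.com/a" xmlns:gml="http://www.opengis.net/gml">
  <ex:classification>motorway</ex:classification>
  <ex:number>11</ex:number>
  <ex:centerLine>
    <gml:LineString srsName="EPSG:4326">
      <gml:coordinates>0.0,100.0 100.0,0.0</gml:coordinates>
    </gml:LineString>
  </ex:centerLine>
</ex:Road>
"""

# Hypothetical server B: same road, different view of the world.
SERVER_B = """
<hw:Highway xmlns:hw="http://example.com/b" xmlns:gml="http://www.opengis.net/gml">
  <hw:kind>motorway</hw:kind>
  <hw:routeNo>11</hw:routeNo>
  <hw:path>
    <gml:LineString srsName="EPSG:4326">
      <gml:coordinates>0.0,100.0 100.0,0.0</gml:coordinates>
    </gml:LineString>
  </hw:path>
</hw:Highway>
"""

# A human still has to write one of these entries per source schema.
MAPPINGS = {
    "{http://example.com/a}Road": {
        "classification": "{http://example.com/a}classification",
        "number": "{http://example.com/a}number",
        "geometry": "{http://example.com/a}centerLine",
    },
    "{http://example.com/b}Highway": {
        "classification": "{http://example.com/b}kind",
        "number": "{http://example.com/b}routeNo",
        "geometry": "{http://example.com/b}path",
    },
}


def load_road(xml_text):
    """Translate a feature from a known source into our own road model."""
    root = ET.fromstring(xml_text)
    mapping = MAPPINGS.get(root.tag)
    if mapping is None:
        # Well-formed XML, but nobody has told this client what it means.
        raise ValueError(f"no mapping for feature type {root.tag}")
    coords_text = root.find(mapping["geometry"]).findtext(
        f"{GML}LineString/{GML}coordinates"
    )
    return {
        "classification": root.findtext(mapping["classification"]),
        "number": int(root.findtext(mapping["number"])),
        "coordinates": [
            tuple(float(n) for n in pair.split(","))
            for pair in coords_text.split()
        ],
    }


print(load_road(SERVER_A) == load_road(SERVER_B))
```

The only part both documents share is the geometry. Everything else the client knows about a road lives in `MAPPINGS`, which someone wrote by hand for each source. A third server with a third schema gets a `ValueError` until a person adds another entry, whether that entry is a table row as here or a hand-coded function as in the first choice.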

And This Solves What?

After seven years, the GML spec now weighs in at 548 pages with a 24-page index. Its numbing complexity has resulted in the development of several GML profiles, with the “Simple” one alone adding over 100 more pages.

In my view, the fundamental premise of GML is wrong. The ability to create custom data models is an anti-feature that makes integration between different computer systems impossible because it assumes that those systems can actually understand the data. Computer systems have no such intelligence – they only understand what someone has programmed them to understand.

To hit the sweet spot you must come up with a standard, simple format that every system can use. As Tim Bray, the father of XML, states:

The value of a markup language is proportional approximately to the square of the number of different software implementations that can process it. I could argue this from theory but would prefer to do so by example: HTML. RSS. PDF.

If you insist upon adding extensions, then at least come up with a standard base format like Atom has. GML offers no such simple standard, and as a result its adoption in the world is much less than it could have been, or should have been.

Instead, mindshare has now switched to KML, a simpler and easier-to-understand format. KML is hardly perfect – you can’t define your own properties (as far as I can see) and it commits the cardinal sin of mixing presentation and data (didn’t we finally get past that with HTML and CSS?). But this game is over: KML has won.
