Posted by Charlie
Wed, 11 Jul 2007 18:47:00 GMT
Chris Tweedie posted an interesting reply to my rant about GIS standards, taking me to task for "WMS bashing" and arguing that WMS is a useful Enterprise standard. Perhaps - except I clearly stated in my original article that I was only talking about standards used on the web and not in the Enterprise.
Nevertheless, Chris does a good job of pointing out the flaws of tiled mapping systems. It's the usual list of suspects - they take a lot of disk space, aren't standardized, don't support arbitrary scales and don't support customized styles.
Having spent four years of my life writing a WMS server, the Smallworld Internet Application Server (SIAS), I'm all too familiar with these issues. But in the end it all boils down to a fundamental conflict - map styling versus performance.
The Importance of Style
Even the slightest hint that you're limiting styling options is enough to send customers into a rage. And no, I'm not kidding.
This is important stuff - so important that there are laws in Germany, and probably other countries, saying exactly how maps should be rendered. I remember long, agitated discussions with German customers about how SIAS rendered circles. Our circles weren't perfect; they were slightly off for reasons I no longer remember and in ways no one would ever notice. But it made our system non-compliant under German law, so it had to be fixed.
Styling in Smallworld is totally customizable. It can vary based on any number of things - here is just a small subset:
- Object type - interstates are blue, main roads are red
- Object attribute - Steel pipes are gray, copper pipes are yellow
- Scale - A pipe is green at less than 1:10,000, blue at 1:10,000 and greater
- Area - Roads in England are not rendered the same as in the United States
- And of course the kicker - users could override the rendering methods themselves and do whatever they pleased.
Smallworld then adds the concept of "drawing applications" (yeah, horrible name), which basically means that an engineer wants to see one set of styles, while a customer service representative wants to see another.
So when building SIAS, one of the absolute requirements - meet it or don't bother building a product - was faithfully reproducing the styles users had set up in their Smallworld databases.
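To make the rule list above a little more concrete, here is a rough sketch in Python of how rule-based style lookup might work. It is not Smallworld's or SIAS's actual API: `StyleRule`, `resolve_colour` and the feature dictionaries are all hypothetical, and I'm assuming "less than 1:10,000" means a scale denominator below 10,000.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StyleRule:
    """One hypothetical styling rule: a predicate plus the colour it assigns."""
    matches: Callable[[dict, int], bool]  # (feature, scale denominator) -> bool
    colour: str


def resolve_colour(rules, feature, scale_denominator, default="black"):
    """Return the colour of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(feature, scale_denominator):
            return rule.colour
    return default


# Rules borrowed from the list above. Order matters: most specific first.
# Assumption: "less than 1:10,000" is read as a denominator below 10,000.
RULES = [
    StyleRule(lambda f, s: f["type"] == "pipe" and f.get("material") == "steel", "gray"),
    StyleRule(lambda f, s: f["type"] == "pipe" and f.get("material") == "copper", "yellow"),
    StyleRule(lambda f, s: f["type"] == "pipe" and s < 10_000, "green"),
    StyleRule(lambda f, s: f["type"] == "pipe", "blue"),
    StyleRule(lambda f, s: f["type"] == "interstate", "blue"),
    StyleRule(lambda f, s: f["type"] == "main_road", "red"),
]

steel_pipe_colour = resolve_colour(RULES, {"type": "pipe", "material": "steel"}, 5_000)
```

A drawing application would then simply be a different rule list handed to the same lookup, which is also why every extra degree of styling freedom makes the rendered result harder to cache.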
The Importance of Performance
At the same time, producing a beautifully rendered map counts for diddly if users have to wait a minute for it to show up in their browser.
Now Smallworld's rendering system is fast - after fifteen years of optimization it is the fastest GIS rendering system I've ever seen (it used to blow away ESRI, maybe it still does). And it had a clever caching system that made it possible to keep nearby geographic features in memory, making panning and zooming operations lightning fast for desktop clients.
This is really important - one of the things most people don't appreciate is how much data it takes to render a map. For example, say you want to make a detailed map of Manhattan. It will include tens of thousands of street segments, parcels, buildings, etc. All of which have to be fetched from the database. Then you have to look up the styles for each feature, and finally, render the map. So local, in-memory, caching is key - even today.
But this system breaks down on the Intranet or Web. A SIAS server may support tens or hundreds of users - each interested in a different geographic area. Thus you're almost guaranteed to break the cache, resulting in expensive queries back to the database.
One obvious solution is tying clients to servers based on their bounding box, but we never went down that route because we didn't see a way of doing it in a generic manner that would work out of the box.
So to support arbitrary styles and scales, and have decent performance, requires a lot of hardware. A whole lot of hardware. And that introduces a whole host of other issues - managing the hardware, client sessions, updates, etc. All of which are solvable, but it still takes time to get right.
And the Twain Shall Meet
So is there a middle ground? I think so - if you're willing to throw away arbitrary scales and don't mind using lots of disk space.
From talking to customers over the years, I don't think arbitrary scales are as important as people think. What is important is what features are displayed at different scales and how they are styled. For example, roads should be visible when the scale is less than 1:20,000 and be drawn in red.
In a typical Smallworld setup, users would create roughly 10 of these rules, which were called display scales. Other GIS systems have similar concepts. Thus, these display scales become the basis for your tiled zoom levels - although you would probably want to make sure you have 15 to 20 of them.
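As a rough illustration of how a fixed list of display scales could stand in for arbitrary scales, the Python sketch below snaps any requested scale to a zoom level. The scale values and the name `zoom_level_for` are made up for the example; a real setup would use whatever display scales the users had configured.

```python
import bisect

# Hypothetical display scales as scale denominators (1:N), sorted ascending.
DISPLAY_SCALES = [500, 1_000, 2_500, 5_000, 10_000, 20_000,
                  50_000, 100_000, 250_000, 1_000_000]


def zoom_level_for(scale_denominator):
    """Return the index of the first display scale that covers the request.

    Requests beyond the last display scale are clamped to the last level.
    """
    index = bisect.bisect_left(DISPLAY_SCALES, scale_denominator)
    return min(index, len(DISPLAY_SCALES) - 1)


level = zoom_level_for(7_500)       # falls between 1:5,000 and 1:10,000
snapped = DISPLAY_SCALES[level]     # the scale that would actually be rendered
```

The client can still ask for 1:7,500, but what gets rendered (and cached) is one of a small, known set of scales.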
So what are the downsides? There are myriad:
- Users can't change map styling on the fly (and if you really want this, then you're a power user and should just install a desktop client)
- Updates to the database invalidate tiles, so you need a process to determine which tiles have been invalidated and then regenerate them
- In versioned databases, like Smallworld, you can really only support one version (unless you have *lots* of disk space and time - a typical Smallworld database would have hundreds or thousands of alternatives).
And the advantage? You move map rendering out of the main code path. For an Intranet or Web client, I think it's a no-brainer: you have to do it if you want a scalable system.
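The tile invalidation step mentioned in the downsides is mostly bookkeeping. Here is a minimal Python sketch, assuming square tiles laid out on a regular grid from a fixed origin and an update described by its bounding box. The function names and the tile sizes are hypothetical.

```python
import math


def tiles_touched(bbox, tile_size, origin=(0.0, 0.0)):
    """Return the (column, row) tiles that intersect bbox at one zoom level.

    bbox is (min_x, min_y, max_x, max_y) in map units; tile_size is the
    width and height of one tile in the same units.
    """
    min_x, min_y, max_x, max_y = bbox
    origin_x, origin_y = origin
    first_col = math.floor((min_x - origin_x) / tile_size)
    last_col = math.floor((max_x - origin_x) / tile_size)
    first_row = math.floor((min_y - origin_y) / tile_size)
    last_row = math.floor((max_y - origin_y) / tile_size)
    return {(col, row)
            for col in range(first_col, last_col + 1)
            for row in range(first_row, last_row + 1)}


def invalidated_tiles(bbox, tile_size_by_zoom):
    """Collect (zoom, column, row) for every zoom level the update touches."""
    return {(zoom, col, row)
            for zoom, size in tile_size_by_zoom.items()
            for col, row in tiles_touched(bbox, size)}


# An edited feature whose bounding box straddles four tiles at zoom level 1.
stale = invalidated_tiles((150, 50, 250, 150), {0: 1000, 1: 100})
```

This errs on the side of regenerating too much: a box whose edge sits exactly on a tile boundary also marks the neighbouring tile as stale. It also ignores the fact that a styled feature (a thick line, a label) can spill outside its geometric bounding box.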
A Toy Standard
As you might have guessed by now, I think WMS is a fatally flawed standard. It's a "toy" standard - it's great if you have a few users but it's extremely difficult to scale - whether you are on the Web or in the Enterprise.
It's difficult to scale because it doesn't constrain the problem. It imagines a world of instant map rendering where any client can request any bounding box, any scale, any coordinate system and any styling. Such a world does not exist today. Maybe it will in five or ten years, but that doesn't help us now.
The obvious solution is to constrain the problem. If this seems like a horrible thing to do then just think of the Web. There is only one way to address things (URIs), there are only a few actions you can perform (HTTP has a handful of verbs), there is no central authority (and thus you get broken links), etc.
In the map rendering world, the constraints are fairly clear - fixed scales, fixed bounding boxes (ie, tiles) and fixed styling via pre-defined style groups. If you're willing to make those three simplifications, then you can create a Web Mapping Standard that really works.
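To show what those three constraints buy you, here is a small Python sketch in which every valid request maps to exactly one cacheable path. This is not an existing standard: the URL layout, the style group names and the number of zoom levels are all assumptions made for the example.

```python
# Hypothetical pre-defined style groups and zoom levels.
STYLE_GROUPS = {"engineer", "customer_service"}
ZOOM_LEVELS = range(0, 18)


def tile_path(style_group, zoom, col, row):
    """Map a constrained tile request to the single path that serves it."""
    if style_group not in STYLE_GROUPS:
        raise ValueError(f"unknown style group: {style_group!r}")
    if zoom not in ZOOM_LEVELS:
        raise ValueError(f"unsupported zoom level: {zoom!r}")
    if col < 0 or row < 0:
        raise ValueError("tile column and row must be non-negative")
    return f"/tiles/{style_group}/{zoom}/{col}/{row}.png"


path = tile_path("engineer", 12, 654, 1583)
```

Because the set of possible requests is finite and every one has a stable address, the tiles can sit on disk or behind an ordinary HTTP cache, and the renderer only runs when a tile is missing or stale.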
Posted in GIS, Smallworld, Web | 8 comments | no trackbacks
Posted by Charlie
Tue, 08 Aug 2006 08:03:00 GMT
For those interested in the history of GIS: in the late 1980s and early 1990s,
the founders of Smallworld laid out their vision for the future of GIS in a
series of technical papers. I recently noticed the papers are no longer online,
so I fished them out of the Internet Archive Wayback Machine and have posted
them on my site.
Fifteen years later, it's interesting to reread the papers and see how these
ideas changed the industry. The best known of the articles is Ten Difficult
Problems in Building a GIS, by Richard Newell. The key points are:
- Spatial data should be stored in seamless databases, not tiled systems
- Spatial databases should support huge amounts of data
- Spatial databases should be versioned to enable long transactions
- Topology should be supported
- Vector and raster data should be supported
- Interaction with spatial data should be done via a dynamic, object-oriented
language (in the same way Ruby and ActiveRecord work in Ruby on Rails)
These ideas were so far ahead of their time that they propelled Smallworld
into a hundred million dollar a year company and an IPO on Nasdaq a mere
six years after its founding in 1990. They also created an extraordinarily
loyal user base. Once you used it, you never wanted to go back. Just like Mac
users knew their machines were vastly superior to Wintel boxes, Smallworld
users knew their software was light years ahead of anything ESRI, or anyone
else, offered.
When I started at Smallworld in 1997, no other GIS system had these features.
ESRI was struggling to overcome its ancient ArcInfo/ArcGIS tile-based
technology that used AML and Avenue for customization. We beat them hands down
in every technical benchmark. When we lost a deal, it was always for political
reasons.
Only recently have these ideas entered the mainstream:
- Spatial data is stored in relational databases such as Oracle or PostGIS
- Terabyte-size GIS databases are common
- Smallworld, Oracle and ESRI support versioned databases
- Google Maps has trained users to think something is wrong if they can't see
vector data overlaid on top of raster data
- Perl, Python and Ruby show the productivity gains provided by
object-oriented, dynamic languages
But even today, there are very few environments that combine all these elements
together. And Smallworld still has some unique features. For example, it has
the concept of worlds. In most GIS systems there is one world - the outside
world where you see map data. But let's say your map shows a building. Often it
is useful to click on the building and go inside of it - you've entered the
building's world, which has its own coordinate system and bounds. Once inside
the building, you may want to open a switch box and see how fibers connect to
each other. And then you might want to know where a particular fiber leads and
which customers will be impacted if it gets turned off.
Another thing Smallworld excels at is speed - it was built when network
connections were painfully slow. Thus the system does some very clever caching
at the client, resulting in near instantaneous response times, even when
working against a terabyte-sized database. All this happens under the hood; the
user doesn't have to know anything about it. And it even works across dial-up
lines, which of course were the norm back in the early 90's.
So if you have a few minutes, it's definitely worth your time to look through
these papers.
Posted in GIS, Smallworld | 1 comment | no trackbacks
Posted by Charlie
Sun, 06 Aug 2006 03:12:00 GMT
It was with great interest that I read that PostGIS 1.1.3 supports long
transactions. Except there is one problem - if you dig into the documentation
you'll see it does no such thing.
Instead, PostGIS supports record-level locking as defined in OGC's Web Feature
Service (WFS) specification. According to the spec (section 10, page 34):
The purpose of the LockFeature operation is to expose a long term feature
locking mechanism to ensure consistency. The lock is considered long term because
network latency would make feature locks last relatively longer than native
commercial database locks.
This has nothing to do with long transactions, and really should be called
something like "record locking."
So What is a Long Transaction?
The term long transaction came out of the GIS industry to describe updates that
take days, weeks or months to complete. It was coined to highlight the
difference from normal database transactions, or "short transactions," that
take milliseconds to complete. Most of what we do on computers is a long
transaction - writing documents, creating spreadsheets, drawing graphics,
writing new software, etc.
In the GIS world, long transactions are crucial for modeling the world. For
example, imagine a developer wants to build a new subdivision. Part of the
required work is to design the subdivision's networks - roads, water pipes,
sewer pipes, electrical lines and phone lines. Another part is to lay out the
parcels - where the houses will go. Creating these designs can take months, and
it is often necessary to create several different designs to find the optimal
one.
While this work is being done, you want it to be isolated from other users so
as to not disturb their work.
Versioned Databases
Two naive ways of implementing long transactions are:
Both of these approaches were tried in the industry and, unsurprisingly,
failed. The problem is that they don't scale in multi-user systems. Before
long, users start stepping on each other's toes and the whole system grinds to
a halt.
Instead, what is needed is an approach that allows users to create their own
"version" of the database, work on it as long as needed, and once it's done,
merge it back into the main database. If you are a developer, this should sound
awfully familiar. It's the exact same functionality that branches in source
control systems provide.
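For developers who think better in code, here is a toy Python sketch of the idea. It is only an illustration of branching and merging, not how Smallworld, Oracle or ESRI actually implement versioning. The `Version` class and its method names are hypothetical, and the sketch skips the hard part: detecting and resolving conflicts when two versions change the same record.

```python
class Version:
    """A hypothetical database version that records only its own changes."""

    def __init__(self, parent=None):
        self.parent = parent
        self.changes = {}  # key -> value written in this version

    def get(self, key):
        """Look in this version first, then fall back along the parent chain."""
        version = self
        while version is not None:
            if key in version.changes:
                return version.changes[key]
            version = version.parent
        raise KeyError(key)

    def put(self, key, value):
        """Write to this version only; other versions are untouched."""
        self.changes[key] = value

    def branch(self):
        """Start a long transaction: a child version that sees this one."""
        return Version(parent=self)

    def merge_into_parent(self):
        """Finish the long transaction by posting its changes to the parent."""
        self.parent.changes.update(self.changes)
        self.changes = {}


main = Version()
main.put("road_1", "gravel")

design = main.branch()           # weeks of subdivision design happen here
design.put("road_1", "asphalt")
design.put("pipe_7", "copper")

seen_by_others = main.get("road_1")   # main is unaffected until the merge
design.merge_into_parent()
```

Nothing is locked while the design is in progress. Other users keep reading and writing the main version and only see the new roads and pipes once the design is merged.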
Implementations
Smallworld was the first commercial implementation of a GIS that had a
versioned database that supported long transactions. Later, Oracle, working
with Smallworld, introduced a similar technology in Oracle 9i, which they
called Workspace Manager. ESRI, the largest GIS vendor, also now supports long
transactions.
Unfortunately, PostgreSQL/PostGIS does not support versioned databases. And the
new locking functionality it provides is almost useless because it won't scale
in multi-user environments. Of course, the PostGIS developers are just
implementing a poorly thought-out part of the WFS specification.
Posted in GIS, Modeling, Smallworld | 2 comments | no trackbacks
Posted by Charlie
Thu, 20 Jul 2006 07:55:00 GMT
A great debate in linguistics is how much language influences thought. In the
world of computer science, I am a firm believer that it does. Paul Graham,
amongst many others, has nicely argued the yes side of the debate.
Many developers start off in statically typed languages, as I did. I learned
to program using Pascal, did a bit of Assembly and then settled in with Pascal
via Delphi.
The first dynamic language I used was Magik - it was quite a shock. Since no
one has ever heard of Magik: it's a proprietary language used in the Smallworld
GIS system that is quite similar to Ruby.
Magik had awful tools: no static type checking, no debugger, no GUI building
tools, etc. And no compiler, at least not in the sense I was used to.
First Impressions Can Be Misleading
Needless to say, my first impressions were less than enthusiastic. Everyone
kept telling me the environment was so much more productive, but I didn't buy
it. I could churn out Object Pascal almost as fast, and Delphi's blazingly fast
compiler and great tools made up for any difference. I suppose people were
comparing it to C++, where at the time you might as well have gone off and had
several cups of coffee between each edit-compile-test cycle. Or maybe to toy
languages like AML (another proprietary language from the GIS world).
Anyway, for beginners, the environment was just awful. The library
documentation was non-existent - if you wanted to know what a method did, you
went and read the source code. No sir, no fancy online hyper-linked
context-sensitive help files.
And the final nail in the coffin: the Magik IDE was an albatross called Emacs.
Emacs drove me to such distraction I went off and wrote my own IDE (sorry, ten
years haven't changed my mind about Emacs, but I sure like VIM).
But I was paid to write Magik, so I wrote Magik. After a rough start, things
started looking up a bit. It sure was
nice not worrying about memory allocation and deallocation. And having an interactive
console, where you could poke around inside a running program, that sure was
neat. And then there were the truly weird things - like dynamically
loading classes. Even better, you could replace methods in an existing
class by simply redefining them in another file (we called it reopening
classes, the term today is monkey patching). And you could pass functions as parameters via the use of procs which
were closures - although I didn't know that at the time.
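I don't have any Magik handy, so here is roughly what those two "weird things" look like in Python, which behaves much the same way here. The `Pipe` class and the function names are made up for the example.

```python
class Pipe:
    def __init__(self, material):
        self.material = material

    def describe(self):
        return "a pipe"


def describe_with_material(self):
    """Replacement method, defined outside the class (say, in another file)."""
    return f"a {self.material} pipe"


pipe = Pipe("copper")
before = pipe.describe()

# "Reopen" the class: existing instances pick up the new method immediately.
Pipe.describe = describe_with_material
after = pipe.describe()


def make_counter():
    """Return a function that remembers `count` between calls: a closure."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
first, second = counter(), counter()
```

The object was created before the patch, yet it answers with the new behaviour afterwards, because method lookup happens on the class at call time.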
Stuck With Complexity
Almost ten years later, I find myself mostly programming in Ruby, JavaScript
and C. Yikes, what happened? I'm as surprised as anyone.
My take is that contrary to popular wisdom,
a good language gets out of your way and lets you do what you need to. This
is quite counterintuitive. Computer programs are pinnacles of brittle complexity
- one tiny mistake in millions of lines of code brings the whole edifice crashing
down. The natural inclination is to make the walls of that edifice as thick and
strong as possible. Java is a great example of this line of thought; you can
see examples of it throughout its design:
- Use of static type checking
- Polymorphism only through inheritance of classes or interfaces
- Final classes
- The forced use of exception specifications
- The forced handling of exception specifications
- Difficulty in modifying code at runtime
- Strong encapsulation
- Clunky reflection
These things make the language less malleable. In return, the payoff should
be more robust programs. But do you really get that? My experience is no, but
I would love to hear about any references to studies or research that can provide
a definitive answer either way.
Trusting my experience, I don't believe programs written in Java (or C++,
etc.) are on average more robust than programs written in Python, Ruby, Smalltalk,
Perl, etc. So what has the loss of malleability cost you? Once again, my experience
tells me quite a lot.
Given a reasonably sized program, I can guarantee you a few things:
- It contains bugs
- It's used in ways the designers and developers never imagined
- Its execution environment is constantly changing
If you're stuck with a brittle edifice of complexity you don't want it to be
a fortress complete with ten-foot walls and surrounded by a serpent-filled
moat. What you want is a building with an open floor plan where you can nudge
a wall here, add one there and remove one over there.
In more concrete terms, if code is buggy then you want to be able to write up
a patch, throw it in a directory somewhere, and have the application load it
automatically, replacing the invalid code. Or, closely related, you want to
provide a simple mechanism to add in new functionality, just like Selenium does
via its user-extensions.js file.
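Here is a bare-bones Python sketch of the "throw a patch in a directory" idea. It is an illustration only: `load_patches`, the `Invoice` class and the patch file name are invented, and running every file found in a directory is only sensible if you trust whoever can write to it.

```python
import tempfile
from pathlib import Path


class Invoice:
    def __init__(self, items):
        self.items = items

    def total(self):
        return sum(self.items) + 1  # deliberate bug: off by one


def load_patches(directory, namespace):
    """Run every .py file in directory, in name order, against namespace.

    Patch files can redefine methods on any class found in namespace.
    Assumption: the directory is trusted, since its files are executed as-is.
    """
    loaded = []
    for path in sorted(Path(directory).glob("*.py")):
        exec(compile(path.read_text(), str(path), "exec"), namespace)
        loaded.append(path.name)
    return loaded


# Simulate shipping a fix without touching the original source.
with tempfile.TemporaryDirectory() as patch_dir:
    Path(patch_dir, "001_fix_total.py").write_text(
        "def total(self):\n"
        "    return sum(self.items)\n"
        "Invoice.total = total\n"
    )
    loaded = load_patches(patch_dir, {"Invoice": Invoice})

fixed_total = Invoice([10, 20]).total()
```

The application calls `load_patches` once at startup. After that, a fix is a small file dropped on the server rather than a new release.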
Maybe you need to graft on a major piece of new functionality, such as adding
support for serializing objects to JSON.
One approach is to open up the base Object class and add a new method, toJSON,
just as Rails 1.1 did.
Or let's say you find yourself typing in the same boilerplate code over and
over. Why not write a method that tells the language to do this for you? Ruby
and Rails are filled with this type of metaprogramming, just as Magik is, and
so of course is the granddaddy of the technique, Lisp. Soon you're on the road
to creating your own domain-specific languages, one of the hot topics du jour.
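As a small taste of what that looks like, here is a Python sketch (Ruby would do this with something like attr_accessor) where a decorator writes the repetitive constructor and printing code for you. The `with_fields` name and the `Pipe` class are made up for the example.

```python
def with_fields(*names):
    """Class decorator that generates __init__ and __repr__ for named fields."""

    def decorate(cls):
        def __init__(self, *values):
            if len(values) != len(names):
                raise TypeError(f"expected {len(names)} values, got {len(values)}")
            for name, value in zip(names, values):
                setattr(self, name, value)

        def __repr__(self):
            args = ", ".join(f"{name}={getattr(self, name)!r}" for name in names)
            return f"{cls.__name__}({args})"

        # Attach the generated methods to the class being decorated.
        cls.__init__ = __init__
        cls.__repr__ = __repr__
        return cls

    return decorate


@with_fields("material", "diameter")
class Pipe:
    pass


pipe = Pipe("steel", 200)
text = repr(pipe)
```

The one-line declaration above `Pipe` reads almost like a sentence in a tiny language for describing records, which is how domain-specific languages tend to start.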
Or maybe you need to retrofit Object X so that it can be processed by Method
A which expects to be passed an Object A. For some reason, Object X cannot inherit
from Object A. So instead you leverage duck
typing to add the needed methods to Object X.
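A minimal Python sketch of that retrofit, with invented class names: `total_length` plays the role of Method A, `Road` is Object A and `Cable` is Object X.

```python
class Road:
    def __init__(self, length_m):
        self.length_m = length_m

    def length(self):
        return self.length_m


def total_length(features):
    """Method A: written with Road in mind, but it only ever calls length()."""
    return sum(feature.length() for feature in features)


class Cable:
    """Object X: comes from elsewhere and cannot inherit from Road."""

    def __init__(self, metres):
        self.metres = metres


# Retrofit: give Cable the one method total_length() actually needs.
Cable.length = lambda self: self.metres

combined = total_length([Road(120), Cable(30)])
```

No common base class and no interface declaration: if it has a `length()` method, it's good enough.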
These things are easy to do in some languages, hard in others, and impossible
in still others.
Really, I Know What I'm Doing
Mastering a skill requires mastering its tools - be it construction, sword fighting,
cooking, bike riding, flying, etc. The techniques above are some of the sharp
tools of programming. You can use them to quickly make mincemeat
out of your problem - or, on not so good days, mincemeat out of your fingers.
But when you make dinner tonight, I'm guessing you're not reaching for the dullest
knife in the drawer.
Posted in Design, JavaScript, Magik, Ruby, Smallworld | 5 comments | 1 trackback