Posted by Charlie
Mon, 10 Sep 2007 20:03:00 GMT
It's been over a year since I blogged about GEOS, an open source project that provides rich functionality for analyzing and manipulating geometries and is a key part of PostGIS.
Sandros and Mateusz spent a ton of time refactoring and improving GEOS last year, and it looks like version 3.0 will finally be released this fall. As part of the 3.0 release, I've completely rebuilt the Ruby and Python bindings to use GEOS's C API. In theory this should decouple the bindings somewhat from GEOS releases - we'll see if that works out in practice.
Over the last couple of weeks, Mark Cave-Ayland and I have spent time polishing off some rough edges. We managed to:
- Fix various autoconf build issues
- Get GEOS building with both older and newer versions of MinGW and MSYS on Windows
In addition, I added VC++ project files for the Python and Ruby bindings and updated the Ruby bindings to be more Ruby-like.
Long story short - building GEOS and the Ruby/Python bindings should now be easier and more foolproof. If you're in the GNU world, it's as simple as:
./configure --enable-python --enable-ruby
If you're in the Microsoft world, just open the VC++ solution in the source tree (thanks Mateusz), tweak the locations of your Ruby and Python installations, and hit the compile button.
So if you are using, or planning to use, the bindings I'd recommend grabbing the latest from SVN. Once you've built everything, then take a look at the test scripts I've written in the swig/ruby/test and swig/python/test directories. They should provide enough examples to get you started.
Feedback is of course welcome, particularly from Python users. Since I use Ruby day in and day out, I know the right Rubyisms to put into the bindings. But I don't know Python well enough to know the right Pythonisms.
Posted in GIS, Ruby | no comments | no trackbacks
Posted by Charlie
Wed, 11 Jul 2007 18:47:00 GMT
Chris Tweedie posted an interesting reply to my rant about GIS standards, taking me to task for "WMS bashing" and arguing that WMS is a useful Enterprise standard. Perhaps - except I clearly stated in my original article that I was only talking about standards used on the web and not in the Enterprise.
Nevertheless, Chris does a good job of pointing out the flaws of tiled mapping systems. It's the usual litany - they take a lot of disk space, aren't standardized, don't support arbitrary scales and don't support customized styles.
Having spent four years of my life writing a WMS server, the Smallworld Internet Application Server (SIAS), I'm all too familiar with these issues. But in the end it all boils down to a fundamental conflict - map styling versus performance.
The Importance of Style
Even the slightest hint that you're limiting styling options is enough to send customers into a rage. And no, I'm not kidding.
This is important stuff - so important that there are laws in Germany, and probably other countries, saying exactly how maps should be rendered. I remember long, agitated discussions with German customers about how SIAS rendered circles. Our circles weren't perfect - they were slightly off for reasons I no longer remember and in ways no one would ever notice. But it made our system non-compliant under German law, so it had to be fixed.
Styling in Smallworld is totally customizable. It can vary based on any number of things - here is just a small subset:
- Object type - interstates are blue, main roads are red
- Object attribute - Steel pipes are gray, copper pipes are yellow
- Scale - A pipe is green at less than 1:10,000, blue for 1:10,000 and greater
- Area - Roads in England are not rendered the same as in the United States
- And of course the kicker - users could override the rendering methods themselves and do whatever they pleased.
Smallworld then adds the concept of "drawing applications" (yeah, horrible name), which basically means that an engineer wants to see one set of styles, while a customer service representative wants to see another.
So when building SIAS, one of the absolute requirements - meet it or don't bother building a product - was faithfully reproducing the styles users had set up in their Smallworld databases.
The Importance of Performance
At the same time, producing a beautifully rendered map counts for diddly if users have to wait a minute for it to show up in their browser.
Now Smallworld's rendering system is fast - after fifteen years of optimization it is the fastest GIS rendering system I've ever seen (it used to blow away ESRI, maybe it still does). And it had a clever caching system that made it possible to keep nearby geographic features in memory, making panning and zooming operations lightning fast for desktop clients.
This is really important - one of the things most people don't appreciate is how much data it takes to render a map. For example, say you want to make a detailed map of Manhattan. It will include tens of thousands of street segments, parcels, buildings, etc. All of which have to be fetched from the database. Then you have to look up the styles for each feature, and finally, render the map. So local, in-memory, caching is key - even today.
But this system breaks down on the Intranet or Web. A SIAS server may support tens or hundreds of users - each interested in a different geographic area. Thus you're almost guaranteed to break the cache, resulting in expensive queries back to the database.
One obvious solution is tying clients to servers based on their bounding box, but we never went down that route because we didn't see a way of doing it in a generic manner that would work out of the box.
So to support arbitrary styles and scales, and have decent performance, requires a lot of hardware. A whole lot of hardware. And that introduces a whole host of other issues - managing the hardware, client sessions, updates, etc. All of which are solvable, but it still takes time to get right.
And the Twain Shall Meet
So is there a middle ground? I think so - if you're willing to throw away arbitrary scales and don't mind using lots of disk space.
From talking to customers over the years, I don't think arbitrary scales are as important as people think. What is important is which features are displayed at different scales and how they are styled. For example, roads should be visible when the scale is less than 1:20,000 and be drawn in red.
In a typical Smallworld setup, users would create roughly 10 of these rules, which were called display scales. Other GIS systems have similar concepts. Thus, these display scales become the basis for your tiled zoom levels - although you would probably want to make sure you have 15 to 20 of them.
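To make the idea of display scales concrete, here is a minimal Python sketch of a rule lookup. It is not how Smallworld stores its rules; the `DisplayRule` class, the `style_for` function and the way I read "less than 1:20,000" (as a scale denominator below 20,000) are all assumptions for illustration. The road and pipe values come from the examples in this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DisplayRule:
    """Hypothetical display-scale rule: a style for one feature type
    over a half-open range of scale denominators [min, max)."""
    feature_type: str
    min_denominator: float
    max_denominator: float
    color: str


# Assumed rules, taken from the examples in the text.
RULES: List[DisplayRule] = [
    DisplayRule("road", 0, 20_000, "red"),
    DisplayRule("pipe", 0, 10_000, "green"),
    DisplayRule("pipe", 10_000, float("inf"), "blue"),
]


def style_for(feature_type: str, scale_denominator: float) -> Optional[str]:
    """Return the color for a feature at a scale of 1:scale_denominator,
    or None if the feature is not drawn at that scale."""
    for rule in RULES:
        if (rule.feature_type == feature_type
                and rule.min_denominator <= scale_denominator < rule.max_denominator):
            return rule.color
    return None
```

With a fixed set of zoom levels, a tile renderer only ever needs to evaluate these rules at the handful of scale denominators it actually serves.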
So what are the downsides? There are myriad:
- Users can't change map styling on the fly (and if you really want this, then you're a power user and should just install a desktop client)
- Updates to the database invalidate tiles, so you need a process to determine which tiles have been invalidated and then regenerate them
- In versioned databases, like Smallworld, you can really only support one version (unless you have *lots* of disk space and time - a typical Smallworld database would have hundreds or thousands of alternatives).
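The tile invalidation step in the list above can be sketched in a few lines of Python. This assumes a simple square grid in map units with a known origin and tile size per zoom level; the function name `invalidated_tiles` and its parameters are hypothetical, not part of any product mentioned here.

```python
import math
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def invalidated_tiles(bbox: BBox, tile_size: float,
                      origin: Tuple[float, float] = (0.0, 0.0)) -> List[Tuple[int, int]]:
    """Return the (col, row) indices of every tile that intersects the
    bounding box of an updated feature, for one zoom level.

    The grid is assumed to start at `origin` with square tiles that are
    `tile_size` map units wide. A box that touches a tile edge also marks
    the neighbouring tile, which is conservative but safe.
    """
    min_x, min_y, max_x, max_y = bbox
    origin_x, origin_y = origin
    first_col = math.floor((min_x - origin_x) / tile_size)
    last_col = math.floor((max_x - origin_x) / tile_size)
    first_row = math.floor((min_y - origin_y) / tile_size)
    last_row = math.floor((max_y - origin_y) / tile_size)
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

You would run this once per zoom level for both the old and the new geometry of the changed feature, then queue the resulting tiles for regeneration.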
And the advantage? You move map rendering out of the main code path. For an Intranet or Web client, I think it's a no-brainer - you have to do it if you want a scalable system.
A Toy Standard
As you might have guessed by now, I think WMS is a fatally flawed standard. It's a "toy" standard - it's great if you have a few users but it's extremely difficult to scale - whether you are on the Web or in the Enterprise.
It's difficult to scale because it doesn't constrain the problem. It imagines a world of instant map rendering where any client can request any bounding box, any scale, any coordinate system and any styling. Such a world does not exist today. Maybe it will in five or ten years, but that doesn't help us now.
The obvious solution is to constrain the problem. If this seems like a horrible thing to do then just think of the Web. There is only one way to address things (URIs), there are only a few actions you can perform (HTTP has a handful of verbs), there is no central authority (and thus you get broken links), etc.
In the map rendering world, the constraints are fairly clear - fixed scales, fixed bounding boxes (i.e., tiles) and fixed styling via pre-defined style groups. If you're willing to make those three simplifications, then you can create a Web Mapping Standard that really works.
Posted in GIS, Smallworld, Web | 8 comments | no trackbacks
Posted by Charlie
Fri, 29 Jun 2007 15:59:00 GMT
Some of my recent posts could be interpreted as veiled criticism of the Open Geospatial Consortium (OGC). But in truth, I've been very impressed by how OGC has reinvented itself over the last six or seven years. So I thought I'd post about my experiences with OGC - and of course give you my spin!
Back when I was with Smallworld and leading the development of the Smallworld Internet Application Server, I was Smallworld's, and then General Electric's, representative to the OGC. It was hardly a plum assignment - I was the only one who wanted it - and it took plenty of cajoling to get reluctant management buy-in.
Now to be clear, I was hardly a mover and shaker in OGC. I attended the meetings, spoke up every so often, drove Smallworld's participation in testbeds (see below) and helped write a discussion paper. But it sure was an interesting experience, and I quickly figured out that making standards is really hard.
The Early Years
Smallworld was an early participant in OGC, but eventually gave up on the process as OGC developed a series of standards that went nowhere - like Simple Features for SQL (the most successful), CORBA and COM. These standards never made any sense for two main reasons. First, they require a fundamental rewrite to support, and no vendor has the stomach for that. Second, they are based on a faulty assumption that distributed object protocols actually work.
The combination of suffering through the inevitable, and interminable squabbles about technical minutiae, with the full realization it was all for naught, was too much to take for the Smallworld representatives before me and they eventually stopped going.
The Web Testbed Years
I got involved with OGC right after the first Web Mapping Testbed, so around 2000/2001. The Web Mapping Testbed was a brilliant idea - sitting around in a room writing standards wasn't working out. So the OGC decided to create six-month testbeds, with each testbed focused on solving some problem that a large OGC member had (with the member funding some of the cost of the testbed). By the end of the six months you needed a rough spec, and much more importantly, a working implementation. That set off a torrent of innovation - and gave birth to all of the important OGC specs used today, including WMS, GML, WFS, etc.
But back in 2000, none of that existed. As we wrote SIAS, we sure wanted some standard, any standard, to follow. And thus I pushed Smallworld and GE to get back into the OGC process. On our part, we implemented full support for WMS (including SVG at a later release!) and were one of the first companies to support GML.
But it still took lots of cajoling to convince anyone it was a good idea. I used to give talks about OGC at our user conferences, and I remember sitting on panels at GITA with titles such as "Why Use OGC Standards."
More interesting was the reaction of our customers. Smallworld/GE dominated the utility and telecoms business at the time. Utilities and telecoms are some of the most conservative organizations around - telling them they could now share their data on the Web was enough to send them into epileptic shock.
US customers were particularly uninterested. But European customers were different, particularly German customers. Many of our German customers were actually small organizations, with a service area limited to a city or two. They were huge supporters of OGC standards, and that's where we made most of our progress.
And What About the Web?
A quick caveat before continuing - since leaving GE for Ubisense and later MapBuzz, I haven't been involved in OGC. So beware - some of these thoughts may be wrong.
As I've talked about in a previous post, I don't think the OGC standards have succeeded on the Web. I find nothing surprising about it - there are a couple of good reasons for it.
First, the OGC testbeds are designed to solve hard problems for large organizations. For example, the last one I took part in modeled a disaster response to a series of tornadoes that touched down in the Washington DC metro area. The goal was for the federal, state and scores of local governments to effectively share information in real-time to manage the emergency response. Thus the demos were all about combining data from multiple sources - the latest satellite imagery, reconnaissance missions flown by drones in real-time, street vector data, parcel data, etc. If something 1/100 as effective as these demos had been used in New Orleans after Hurricane Katrina, a lot of people would have been spared a lot of misery.
To get this to work you need complex standards - things like GML, SLD, WCS and WFS. But this was certainly not the Web. Sure the "Web Services" moniker was thrown around all the time, but these were SOAP services combined with sophisticated Java clients, and very few browsers in sight.
Second, the OGC membership is a combination of academics, leading companies in the industry, and large organizations like Lockheed Martin and the Federal Government. Thus, the organization is geared towards writing standards that play in that world.
The Role of Google
It seems like a great coup to me that Google submitted KML to the OGC for standardization.
The big question in my mind is how will Google change OGC? The Web is in Google's DNA. Will it be able to use its knowledge, and power in web mapping, to nudge the OGC to a more web centric view? Time obviously will tell, but it sure seems like a great time to be part of OGC and watch the technical minutiae fly.
Posted in GIS, Technology | 3 comments | no trackbacks
Posted by Charlie
Thu, 28 Jun 2007 04:02:00 GMT
Let's face facts - the stable of GIS Web standards is suboptimal. To show you what I mean, let's think about the common things you'd want to do with web mapping and see if there is a successful standard that you can leverage.
We'll define success as the widespread use of the standard on the web - there are thousands, or tens of thousands, of working examples (on a web nearing billions of users, that doesn't seem too much to ask). Of course a standard may be wildly successful in another domain, such as Enterprise software, but that doesn't count as the web.
Rendering Maps
Let's start with the most obvious thing - making maps. This is the domain of Web Map Service (WMS) and Styled Layer Descriptor (SLD).
It doesn't take long to figure out why no mainstream mapping site actually uses WMS. Its design makes it difficult to cache results, and unfortunately rendering maps ain't quick or easy - there is a reason Google/Microsoft/Yahoo use cached tiles. The biggest WMS server I know of is Terraserver, and it seems to have kept Microsoft researchers busy happily writing papers about how to scale it.
Instead, everyone seems to have come up with the same approach for web mapping - use pre-rendered tiles that use the Mercator projection. Or, if you have a desktop client (Google Earth, Virtual Earth), then the engineering is difficult enough that you use your own proprietary algorithms/encodings to actually make the system scalable enough to work.
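For readers who haven't seen it, here is a rough Python sketch of how a longitude/latitude pair maps to a pre-rendered Mercator tile. It assumes the common power-of-two grid (one tile at zoom 0, four at zoom 1, and so on, with row 0 at the top); each site uses its own variation of this scheme, so treat the function name `lonlat_to_tile` and the exact numbering as assumptions rather than anyone's official API.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the Mercator tile containing a point.

    Assumes a 2**zoom by 2**zoom grid covering the whole world, with
    x increasing eastwards from longitude -180 and y increasing
    southwards from the top of the map.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    # asinh(tan(lat)) is the Mercator y coordinate in radians.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edges still land in a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

Because every client asks for the same fixed set of tiles, each (zoom, x, y) triple can be rendered once and served from disk or a cache, which is exactly what an arbitrary WMS bounding box prevents.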
And as far as styling maps - have you actually read the SLD standard or used it? I didn't think so.
So we'll say no success in the standards world here. Which raises an interesting question - is there any room or need for a web map tiling standard? (And the Web Coverage Service, WCS, doesn't count since it supports arbitrary bounding boxes like WMS.)
Adding Points to a Map
To provide some hooks into Google Earth, Google offers KML, which lets developers add custom information to maps. Behind the might of Google, KML has gained a large market share. Since Google has recently submitted KML to the OGC for standardization, we'll call this a standards win.
Specifying Locations
This is the world of GeoRss, which lets you specify points/lines/polygons. GeoRss has clawed its way into importance due to its simplicity and, one would assume, developer fatigue with reading the GML spec. GeoRss is fine for what it does, but this is the most basic level possible, akin to a first-grade reading level. But we'll call it a win.
Sharing Data
This is the world of Web Feature Service (WFS) and Geography Markup Language (GML). I've previously blogged about why I think GML is too complex for the web, and since WFS depends on GML, the same arguments apply. Just to add some spice to the party, WFS adds its own proprietary (is it fair to call a standard proprietary?) XML query language for reasons that have never been obvious to me (XQuery anyone?).
So do you see these standards used on the Web? I sure don't. Instead I see people using RSS and Atom with GeoRSS, or shoehorning feature information into KML. There is even talk of shoehorning GML into KML.
In my view this is the biggest gap in GIS web standards - something, someday is going to fill it in. If you are a standards wonk, and want to make a difference, I'd say start here.
If it was up to me, such a standard would be designed as an Atom extension that provides a super-simple way of including feature property values. And it would use GeoRSS for geometries.
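As a rough illustration of what such an Atom extension might look like, here is a Python sketch that builds an entry with a GeoRSS point and a flat list of property values. The Atom and GeoRSS namespaces are the real ones; the property namespace `http://example.org/feature-properties`, the `property` element and the `build_entry` function are made up for this example, and a real Atom entry would also need `id` and `updated` elements.

```python
import xml.etree.ElementTree as ET
from typing import Dict

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
# Hypothetical namespace for feature properties; no such standard exists.
PROPS = "http://example.org/feature-properties"


def build_entry(title: str, lat: float, lon: float,
                properties: Dict[str, object]) -> ET.Element:
    """Build an Atom entry carrying a GeoRSS point and simple properties."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    # GeoRSS Simple encodes a point as "latitude longitude".
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = f"{lat} {lon}"
    for name, value in properties.items():
        prop = ET.SubElement(entry, f"{{{PROPS}}}property", {"name": name})
        prop.text = str(value)
    return entry


entry = build_entry("M11", 51.5, 0.1, {"classification": "motorway", "number": 11})
xml_text = ET.tostring(entry, encoding="unicode")
```

A client that knows nothing about the property namespace can still plot the point; one that does can show the name/value pairs next to it.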
Remembering Context
There is a remarkable amount of state involved when looking at a map. The most obvious thing is where you are looking - but you also have to remember the scale, the layers that are on or off, the projection, the styles in use, etc. This is the world of the Web Map Context (WMC) standard. Since I've never implemented WMC, I don't have an opinion on its technical merits. But going back to our measure of success, is it used on the web? Not as far as I can see.
De Jure versus De Facto?
So standards strike out on map rendering and sharing geographic features, but have succeeded in specifying locations and custom map content. And there is an interesting pattern here - the de jure standards have failed, while the de facto standards have succeeded. Perhaps more about that in another post.
But after the incredible amount of energy and time spent developing these standards, it feels like precious little success.
Posted in GIS, Web | 13 comments | 4 trackbacks
Posted by Charlie
Tue, 19 Jun 2007 08:00:00 GMT
Last Saturday I had the pleasure of attending the first FRUGOS unconference - which was a meeting of 20 or so open source GIS enthusiasts who live along Colorado's Front Range. Brian and Sean organized, while Tom played host. Peter and Sean both had nice writeups of the conference - so I won't repeat what they've said.
Instead, here is a list of attendees and what they do. The list is taken from my notes, so I'm sure I've messed up some details and missed a person or two. If so, let me know, and I'll fix any mistakes.
- Gregor Allensworth-Mosheh
- Gregor is one of the people behind HostGIS and gave a couple of interesting talks - both of which I sadly missed, but Peter has the scoop on his blog.
- Norman Barker
- Norman, visiting from England, is one of the main developers of Hibernate support for PostGIS.
- Peter Batty
- Take a look at Peter's nice summary of the unconference. Peter and I have worked together for a number of years - including at Smallworld, GE and Ubisense. More recently Peter was the CTO of Intergraph, but is now looking for a new gig. Check out his blog at http://geothought.blogspot.com/.
- Tom Churchill
- Tom hosted the meeting and is the founder of Churchill Navigation, which makes extremely cool software for the next generation of personal navigation devices.
- G Hussain Chinoy
- Hussain is an active developer on NASA's open source WorldWind project. Besides giving a great demo of both the .NET and Java versions of WorldWind, he also provided a fascinating glimpse into the politics of WorldWind, NASA and open source.
- Scott Davis
- Scott recently started his own consulting company, and has just finished Pragmatic GIS. Can't wait to get my copy! His blog is at http://www.davisworld.org/blojsom/blog/.
- Tom Gehring
- Tom worked a number of years on IBM mainframes, and recently decided to change careers and get involved with GIS [Not sure if I have spelled Tom's last name correctly].
- Randy George
- Randy is with CadMaps and has worked extensively with vector map technologies such as SVG, VML, and more recently, XAML. Check out the Cadmaps blog at http://www.cadmaps.com/gisblog.htm.
- Sean Gillies
- Sean is one of the main organizers of FRUGOS, and works as a web developer for the University of North Carolina on their ancient world project. It sounds like a great project - mapping whatever information they can find about ancient Greece and Rome. Sean is a big Python user and has one of the best known GIS blogs - http://zcologia.com/news.
- Chris Haller
- Chris works part time at the University of Colorado medical center and part time at PlaceMatters. In his free time he works on a social mapping site called iCommunityTv that combines maps and multimedia. Check it out at http://blog.eparticipation.com/.
- Chris Helm
- Another Chris who is at the University of Colorado. Chris works with Bruce on GLIMS, which is a database of the world's glaciers based on reflections from a radiometer. GLIMS is a big PostgreSQL/PostGIS database with a MapServer front end. Output is done via OGR or KML.
- Dan Moore
- Dan is a Web developer and has done a fair bit of work with Google Maps.
- Jim Olsten
- Jim has worked extensively with GIS for a variety of projects including NEPA impact studies, etc.
- Trent Pigeno
- Trent is a GIS/web developer, and works with Brian at the Timoney Group. Trent recently traveled to South America (I think it was Chile), where he did some volunteer GIS work, before returning to the States [Not sure if I have spelled Trent's last name correctly].
- Bruce Raup
- Bruce works at the National Snow and Ice Data Center in Boulder. He is the technical lead for the Global Land Ice Measurements from Space (GLIMS) database and is a heavy user of GRASS, OGR and his own Perl scripts.
- Charlie Savage
- Well you already know me since you're reading this blog.
- John Spinney
- John works for OpenWave, which is one of the main providers of software and browsers that run on mobile phones. John's blog is at http://www.maperture.net/.
- Brian Timoney
- Brian is one of the main organizers of FRUGOS, and runs the Timoney Group in Denver, which does map consulting work based on open source software, with a focus on the petroleum industry. One of the interesting things Brian mentioned was that a number of their customers want to use Google Earth as a document management system.
- Bill Thorp
- Bill works for the National Park Service in Fort Collins, and is involved with cataloging and managing their numerous web services. You can see a nice picture of Bill at http://science.nature.nps.gov/im/contactsim/index.cfm - just scroll down a bit.
- Eric Weisbender
- Eric has the honor of being listed last! Eric is a GIS specialist for Western Area Power Administration, which is part of the Department of Energy. He's a strong proponent of open source software, including PostGIS, OGR, Hibernate, etc.
Posted in Cartography, GIS | 2 comments | no trackbacks
Posted by Charlie
Mon, 04 Jun 2007 03:42:00 GMT
As we add custom styles for MapBuzz, an obvious question is how to style GeoRss points. In particular, I would like to specify two images/icons for each Atom entry that has a GeoRss point - a normal image and a hover image for mouse overs.
I'm wondering if there is any community consensus on how to do this? Doing a bit of research, I found a discussion about this on the GeoRss mailing list in January. A good starting point is a post by Christopher Schmidt who talked about reusing KML, while Mikel Maron asked if reusing CSS was more appropriate.
I agree that styling information shouldn't be added to GeoRss and that reusing CSS is a good choice. However, CSS doesn't work for points when you want to represent them with an image/symbol. Based on its HTML heritage, CSS considers images to be markup and not presentation and thus does not support changing an image's src attribute. The closest it gets is supporting background images, but that seems like the wrong solution for this problem.
Thus, we need to find another solution. Some ideas I've pondered include:
1. Use KML as Chris suggested. It would look something like this:
<Style id="highlightPlacemark">
<IconStyle>
<Icon>
<href>http://maps.google.com/mapfiles/kml/paddle/red-stars.png</href>
</Icon>
</IconStyle>
</Style>
<Style id="normalPlacemark">
<IconStyle>
<Icon>
<href>http://maps.google.com/mapfiles/kml/paddle/wht-blank.png</href>
</Icon>
</IconStyle>
</Style>
<StyleMap id="exampleStyleMap">
<Pair>
<key>normal</key>
<styleUrl>#normalPlacemark</styleUrl>
</Pair>
<Pair>
<key>highlight</key>
<styleUrl>#highlightPlacemark</styleUrl>
</Pair>
</StyleMap>
The obvious downside to this is how verbose it is - which is fine for KML but doesn't fit the GeoRss philosophy of keeping things simple.
2. Reuse atom's icon element:
<atom:icon>http://www.mapbuzz.com/images/marker.gif</atom:icon>
<atom:icon pseudo-class="hover">
http://www.mapbuzz.com/images/marker_hover.gif</atom:icon>
The downsides to this approach are:
- atom:icon is defined only at the feed level.
- we have to introduce a custom attribute, which I called pseudo-class to match CSS's terminology.
- If Atom ever supports icon at the entry level the semantics likely will be a bit different.
- atom:icon does not specify widths or heights, which is important to support SVG symbols.
3. Reuse XHTML's img element:
<xhtml:img src="http://www.mapbuzz.com/images/marker.gif"
height="32" width="32"/>
<xhtml:img src="http://www.mapbuzz.com/images/marker.gif"
height="32" width="32" alt="hover"/>
The advantage to this approach is that Atom's content element already allows mixing in of XHTML, so there is some precedent. It also supports image sizes and we could hijack alt to specify different image types.
4. Reuse SVG's image element:
<svg:image xlink:href="http://www.mapbuzz.com/images/marker.gif"
x="100" y="100" height="32" width="32"/>
<svg:image xlink:href="http://www.mapbuzz.com/images/marker.gif"
x="100" y="100" height="32" width="32" pseudo-class="hover"/>
An SVG image introduces a funny twist - it lets you specify x and y values. I could see this being confused with the x/y values in a GeoRss point. Alternatively, it could be helpful to precisely position this image. SVG images also support a number of style related attributes, such as opacity, which could be helpful.
Currently, option #4 looks like the best choice to me, but I'm wondering what other people think.
Posted in GIS | no comments | no trackbacks
Posted by Charlie
Wed, 09 May 2007 22:03:00 GMT
It's not every day someone takes the time to write me an open letter - I have to say it's kind of fun. Brian added some additional thoughts to our ongoing conversation about GML. In truth, this is where blogging breaks down a bit - it would be much easier to sit down in a room for an hour and have a great in-depth technical discussion (of course, then our discussion wouldn't be available for the whole world to see, which is a significant downside).
Since it's a bit hard sifting through where things stand in a long discussion, let me recap the points I think we agree on:
- GML is a toolkit that provides rules for translating your proprietary data model into XML
- Having translated your data model into GML/XML, it is then necessary to code both clients and servers to understand it
Where we disagree is whether this is a good idea or not.
I see at least three very different use cases here:
- I want to share within my own organization
- I want to share with a preselected set of outside organizations
- I want to share with the world
I'll agree with Brian that for the first two use cases, GML 2 (and 3) provides a workable solution (although I think GML 1 was a better solution and that the overhead of GML 2 is prohibitive).
It's item #3 though that really matters. One of the things that makes the Web different is that Metcalfe's Law (and Reed's Law) become predominant - the value of something grows the more people use it. Which leads me to the conclusion that everyone has to agree to a shared data model and format. Otherwise you end up with thousands of one-off data integrations, which do nothing to solve the general problem.
There are obvious downsides to agreeing to a general data model - it will always be a lowest common denominator and won't work for many complex integrations that live in the realm of the first two use cases. But there is an obvious upside - it is the only thing that has any chance of working out on the web. If you don't agree, then please show me a real-life example that disproves it.
So where does that leave us? I believe that GML as it is formulated has no chance of success out on the Web because it's simply not designed for it. The obvious consequence is the emergence of the Atom / GeoRSS combination and KML. And truth be told, those standards solve the problem of rendering maps made up of multiple geographic data sources well enough.
What they don't solve is exchanging attribute data between systems. And this leads right into the hornet's nest of the Semantic Web and data modeling - no one has ever come up with a solution to this problem and I doubt anyone ever will.
So faced with that daunting task - why not try the simplest thing that could possibly work - which ironically was more or less GML 1:
<Feature typeName="Road">
<description>M11</description>
<property typeName="classification">motorway</property>
<property typeName="number" type="integer">11</property>
<geometricProperty typeName="linearGeometry">
<LineString srsName="EPSG:4326">
<coordinates>
0.0,100.0 100.0,0.0
</coordinates>
</LineString>
</geometricProperty>
</Feature>
In today's world, I'd modify this a bit and start with Atom, add in GeoRSS, and then add in a new namespace that encodes properties like above. And I'd stick the same stuff in the KML metadata tag.
Now, I don't expect this to do diddly-squat for machine to machine integration. What I do expect it to do is make it easy for clients to show a nice property browser to users when they mouse over a feature on a map. And for the web, that's good enough since it all comes down to humans in the end anyway.
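To show how little a client needs to do with this kind of encoding, here is a small Python sketch that turns the GML 1 style feature above into a flat dictionary that a property browser could display. The `feature_properties` function is my own illustration; the only typing it understands is the `type="integer"` attribute from the example, and the geometry is ignored.

```python
import xml.etree.ElementTree as ET
from typing import Dict

FEATURE_XML = """<Feature typeName="Road">
<description>M11</description>
<property typeName="classification">motorway</property>
<property typeName="number" type="integer">11</property>
<geometricProperty typeName="linearGeometry">
<LineString srsName="EPSG:4326">
<coordinates>
0.0,100.0 100.0,0.0
</coordinates>
</LineString>
</geometricProperty>
</Feature>"""


def feature_properties(xml_text: str) -> Dict[str, object]:
    """Collect the non-geometric properties of a feature for display."""
    feature = ET.fromstring(xml_text)
    props: Dict[str, object] = {
        "typeName": feature.get("typeName"),
        "description": feature.findtext("description"),
    }
    for prop in feature.findall("property"):
        value: object = prop.text
        # Only the integer type from the example is converted.
        if prop.get("type") == "integer":
            value = int(prop.text)
        props[prop.get("typeName")] = value
    return props


properties = feature_properties(FEATURE_XML)
```

The result is just names and values, which is all a mouse-over popup needs.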
Posted in GIS, Modeling, Web | 5 comments | no trackbacks
Posted by Charlie
Tue, 01 May 2007 19:13:00 GMT
Joel Spolsky wrote a classic post back in 2001 titled Don't Let Architecture Astronauts Scare You, which warns that the more abstract a solution to a problem is the less useful it becomes. Abstraction is a funny thing - just the right amount gives you the insight to solve previously unsolvable problems but too much obscures what you are trying to accomplish thereby leading you astray.
Atom is a great example of a standard that gets the abstraction level right - as I wrote the other day, its simple model of collections containing entries turns out to be a very useful way to view the world.
In contrast, the Geography Markup Language (GML), which is the GIS industry's primary data exchange format, is a standard that gets it wrong. Even if you know nothing about GIS or GML, it's a good example of trying to solve a problem by adding one too many levels of abstraction, thereby solving nothing.
What's Your View of the World?
GML sets out to solve an awfully difficult problem - how can different systems exchange geographic information (if you want to be hip, go with spatial information instead)?
The funny thing is the geometry part is fairly easy - put a few smart people in a room and they can come up with some reasonable way of describing points, lines, polygons and how they may or may not connect (you'd say roads connect with one another, but probably not to your front yard).
No, the hard bit is trying to describe the world. When you work with geographic information you immediately come face-to-face with the realization you're trying to model the real world. Of course that's true for any computer system, such as your company's payroll system, but somehow dealing with the physicality of the world really drives the point home.
Thus the real problem GML tries to solve is how can your computer system and my computer system exchange data about the world in a meaningful way? In my opinion that's an unsolvable problem, because the way your database models the world is different than mine. There is a fantastic book on the subject, called Data and Reality, that every programmer, architect and modeler should read. I wrote about it last year, and to quote the post:
Take a simple term - say a street. What is it? Have you ever been on a street that ends, and then starts up in a few blocks? Is that the same street? How about a street whose name changes - is it still the same street? Does a street include boulevards, highways, freeways? Does a street cross city, county, state boundaries? Do you and I live on the same street, although I live in the United States and you live in Canada?
A Good Start - Take #1
GML 1 took a fairly simplistic way of solving the problem. You could define collections of features, with each feature having "normal" properties and geometric properties. For example, a road might look like this:
<Feature typeName="Road">
<description>M11</description>
<property typeName="classification">motorway</property>
<property typeName="number" type="integer">11</property>
<geometricProperty typeName="linearGeometry">
<LineString srsName="EPSG:4326">
<coordinates>
0.0,100.0 100.0,0.0
</coordinates>
</LineString>
</geometricProperty>
</Feature>
Having implemented the first commercial support of GML (in the Smallworld Internet Application Server), I can say that GML 1 was easy to implement and gave us a good way to exchange data between clients and servers.
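To give a feel for why a fixed encoding like this is easy to consume, here is a minimal Python sketch that reads the road above with one generic function. It assumes the element and attribute names exactly as they appear in the example (I have not checked them against the GML 1 spec here), and read_feature and CASTS are names made up for illustration.

```python
import xml.etree.ElementTree as ET

ROAD = """
<Feature typeName="Road">
  <description>M11</description>
  <property typeName="classification">motorway</property>
  <property typeName="number" type="integer">11</property>
  <geometricProperty typeName="linearGeometry">
    <LineString srsName="EPSG:4326">
      <coordinates>0.0,100.0 100.0,0.0</coordinates>
    </LineString>
  </geometricProperty>
</Feature>
"""

# Only the type used in the example is handled; anything else stays a string.
CASTS = {"integer": int}

def read_feature(xml_text):
    """Return (feature type, properties, geometries) for any feature of this shape."""
    feature = ET.fromstring(xml_text.strip())
    props = {}
    for prop in feature.findall("property"):
        cast = CASTS.get(prop.get("type"), str)
        props[prop.get("typeName")] = cast(prop.text)
    geometries = {}
    for geom in feature.findall("geometricProperty"):
        line = geom.find("LineString")  # the sketch only handles LineString
        coords = [tuple(float(v) for v in pair.split(","))
                  for pair in line.findtext("coordinates").split()]
        geometries[geom.get("typeName")] = (line.get("srsName"), coords)
    return feature.get("typeName"), props, geometries

kind, props, geometries = read_feature(ROAD)
```

Because every feature uses the same property and geometricProperty elements, the same function works for a road, a pipe or a parcel without a schema in sight.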
A Standard Goes Awry - Take #2
And then things fell apart. GML 2 abandoned its RDF-like flavor and replaced it with...well...nothing. Instead, GML moved up a level of abstraction.
Say what? Instead of saying "this is how you encode a road" it now says "this is how you construct a model that will let you encode a road." Thus GML turned itself into a modeling language. If you're a programmer, a good analogy is UML, the Unified Modeling Language. UML provides you the tools for modeling computer programs, which you then implement in some programming language. GML provides you the tools to model the world in the way you see fit, and then implement it via XML. To do this, GML uses XML Schema as its modeling language and makes use of every last obscure feature, including such gems as substitution groups.
What is a Client to Do?
So how does this work in the real world? Let's say I want to model the road I showed above. First I have to create an XML Schema that defines a Road Feature, which must inherit from the appropriate abstract GML classes. Then I write out the road in an XML document that validates against the schema I just wrote.
But think of the poor client - what is it supposed to do with the XML document it just received from my server? Well - there are three obvious approaches.
First choice - hard-code the client to know exactly how I described a road.
Second choice - build a hugely complex piece of software that can read in the XML schema, generate an implementation of it on the fly, then load in the XML data. This is in fact possible - and exactly what I did when implementing our GML 2.0 support. However, after all that work, you still need a human to map the incoming data to your data model. In theory, being able to declare the mapping in a configuration file should be more flexible/powerful than coding the transformation by hand (like in the first choice). And maybe that's true in some programming languages. But if you're using a dynamic language like Ruby or Python, then the advantage is much less clear-cut. Having taken this approach once, I would not do it again.
Third choice - Everyone agrees on a standard way of exchanging information and everyone then hard-codes their system to use it. This in fact is the solution that GML is headed towards by creating "GML Profiles." As the GML Simple Features Profile states:
The GML specification declares a large number of XML elements and attributes meant to
support a wide variety of capabilities...With such a wide scope, interoperability can only be achieved by defining
profiles of GML that deal with a restricted subset of GML capabilities.
At this point you might be asking why exactly you went through the whole effort of creating a custom XML Schema or even using GML. Good question - I don't have a good answer.
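The trade-off between the first two choices can be sketched in a few lines of Python. Everything here is hypothetical: the incoming field names follow the road example, while the local field names, MAPPING, apply_mapping and road_from_feature are invented for illustration and are not taken from any GML toolkit.

```python
# Incoming feature as a client might hold it after parsing (field names from the road example).
incoming = {"classification": "motorway", "number": "11", "description": "M11"}

# Second choice: a declarative mapping, interpreted by generic code.
# Each entry reads: local field -> (incoming field, converter).
MAPPING = {
    "road_class": ("classification", str.upper),
    "route_no": ("number", int),
    "label": ("description", str),
}

def apply_mapping(feature, mapping):
    """Build a local record by applying each declared converter."""
    return {local: convert(feature[remote])
            for local, (remote, convert) in mapping.items()}

# First choice: the same transformation coded by hand.
def road_from_feature(feature):
    """Build the same local record with explicit code."""
    return {
        "road_class": feature["classification"].upper(),
        "route_no": int(feature["number"]),
        "label": feature["description"],
    }

declared = apply_mapping(incoming, MAPPING)
hand_coded = road_from_feature(incoming)
```

Both produce the same record, and in both cases a human had to decide that "classification" means "road_class". In a dynamic language the mapping table is not noticeably shorter or more flexible than the function.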
And This Solves What?
After seven years, the GML spec now weighs in at 548 pages with a 24 page index. Its numbing complexity has resulted in the development of several GML profiles, with the "Simple" one coming in at over 100 more pages.
In my view, the fundamental premise of GML is wrong. The ability to create custom data models is an anti-feature that makes integration between different computer systems impossible because it assumes that those systems can actually understand the data. Computer systems have no such intelligence - they only understand what someone has programmed them to understand.
To hit the sweet spot you must come up with a standard, simple format that every system can use. As Tim Bray, the father of XML, states:
The value of a markup language is proportional approximately to the square of the number of different software implementations that can process it. I could argue this from theory but would prefer to do so by example: HTML. RSS. PDF.
If you insist upon adding extensions, then at least come up with a standard base format like Atom has. GML offers no such simple standard, and as a result its adoption in the world is much less than it could have been, or should have been.
Instead, mindshare has now switched to KML, a simpler and easier to understand format. KML is hardly perfect - you can't define your own properties (as far as I can see) and it commits the cardinal sin of mixing presentation and data (didn't we finally get past that with HTML and CSS?). But this game is over: KML has won.
Posted in GIS | 6 comments | 2 trackbacks
Posted by Charlie
Fri, 09 Feb 2007 17:56:00 GMT
With a headline like that, this better be good, right?
I'm excited to announce the speakers for the Mapping Applications on the Web seminar on March 4th at GITA's annual conference in San Antonio.
It's an all-star team, including:
Peter Batty, Vice President of Intergraph. Peter's presentation is titled "The disruption of geospatial technology" and will cover the radical changes that have reshaped the industry over the last few years and how traditional GIS vendors are adapting to the changed geospatial world.
Geoff Zeiss, Director of Technology at Autodesk. Geoff, who has a great blog, will talk about open source geospatial software and open standards, and how Web 2.0 technologies can be used by enterprises and utilities to improve their operations.
Bill Gail, Microsoft, Virtual Earth team. Bill will offer a peek behind the curtain of Virtual Earth by talking about the challenges of managing such massive amounts of data. He'll also provide his insight on what is coming next from Microsoft, and how he thinks the new geospatial Internet will be used in the future.
And for my presentation, I'll talk about social mapping - using the web to create mashups, share data and create communities of users.
Best of all, the last 40 minutes of the seminar will be a question and answer session. So if you're attending GITA, make sure to sign up for the seminar and come prepared with lots of questions!
Posted in GIS | no comments | no trackbacks
Posted by Charlie
Tue, 08 Aug 2006 08:03:00 GMT
For those interested in the history of GIS, in the late 1980's and early
1990's, the founders of Smallworld laid out their vision for the future of
GIS in a series of technical papers. I recently noticed the papers are no
longer online, so I fished them out of the Internet
Archive WayBack Machine and have posted them
on my site.
Fifteen years later, it's interesting to reread the papers, and see how these
ideas changed the industry. The best known of the articles is Ten
Difficult Problems in Building a GIS, by Richard Newell. The key points are:
- Spatial data should be stored in seamless databases, not tiled systems
- Spatial databases should support huge amounts of data
- Spatial databases should be versioned to enable long transactions
- Topology should be supported
- Vector and raster data should be supported
- Interaction with spatial data should be done via a dynamic, object-oriented language (in the same way Ruby and ActiveRecord work in Ruby on Rails)
These ideas were so far ahead of their time that they propelled Smallworld
into a hundred million dollar a year company and an IPO on Nasdaq a mere
six years after its founding in 1990. They also created an extraordinarily
loyal user base. Once you used it, you never wanted to go back. Just like Mac
users knew their machines were vastly superior to Wintel boxes, Smallworld
users knew their software was light years ahead of anything ESRI, or anyone
else, offered.
When I started at Smallworld in 1997, no other GIS system had these features.
ESRI was struggling to overcome its ancient ArcInfo/ArcView tile-based
technology that used AML and Avenue for customization. We beat them hands down
in every technical benchmark. When we lost a deal, it was always for political
reasons.
Only recently have these ideas entered the mainstream:
- Spatial data is stored in relational databases such as Oracle or PostGIS
- Terabyte-size GIS databases are common
- Smallworld, Oracle and ESRI support versioned databases
- Google Maps has trained users to think something is wrong if they can't see vector data overlaid on top of raster data
- Perl, Python and Ruby show the productivity gains provided by object-oriented, dynamic languages
But even today, there are very few environments that combine all these elements
together. And Smallworld still
has some unique features. For example, it has the concept of worlds. In most
GIS systems there is one world - the outside world where you see map data.
But let's say your map shows a building. Often times it is useful to click
on the building and go inside of it - you've entered the building's world which
has its own coordinate system and bounds. Once inside the building, you may
want to open a switch box and see how fibers connect to each other. And then
you might want to know where a particular fiber leads and which customers will
be impacted if it gets turned off.
Another thing Smallworld excels at is speed - it was built when network connections
were painfully slow. Thus the system does some very clever caching at
the client, resulting in near instantaneous response times, even when working
against a terabyte-sized database. All this happens under the hood; the user
doesn't have to know anything about it. And it even works across dial-up lines,
which of course were the norm back in the early 90's.
So if you have a few minutes, it's definitely worth your time to look through
these papers.
Posted in GIS, Smallworld | 1 comment | no trackbacks