Posted by Charlie
Fri, 29 Jun 2007 15:59:00 GMT
Some of my recent posts could be interpreted as veiled criticism of the Open Geospatial Consortium (OGC). But in truth, I've been very impressed by how OGC has reinvented itself over the last six or seven years. So I thought I'd post about my experiences with OGC - and of course give you my spin!
Back when I was with Smallworld and leading the development of the Smallworld Internet Application Server, I was Smallworld's, and then General Electric's, representative to the OGC. It was hardly a plum assignment - I was the only one who wanted it - and it took plenty of cajoling to get reluctant management buy-in.
Now to be clear, I was hardly a mover and shaker in OGC. I attended the meetings, spoke up every so often, drove Smallworld's participation in testbeds (see below) and helped write a discussion paper. But it sure was an interesting experience, and I quickly figured out that making standards is really hard.
The Early Years
Smallworld was an early participant in OGC, but eventually gave up on the process as OGC developed a series of standards that went nowhere - like Simple Features for SQL (the most successful), CORBA and COM. These standards never made any sense for two main reasons. First, they require a fundamental rewrite to support, and no vendor has the stomach for that. Second, they are based on a faulty assumption that distributed object protocols actually work.
The combination of suffering through the inevitable and interminable squabbles about technical minutiae, with the full realization it was all for naught, was too much for the Smallworld representatives before me, and they eventually stopped going.
The Web Testbed Years
I got involved with OGC right after the first Web Mapping Testbed, so around 2000/2001. The Web Mapping Testbed was a brilliant idea - sitting around in a room writing standards wasn't working out. So the OGC decided to create six-month testbeds, with each testbed focused on solving some problem that a large OGC member had (with the member funding some of the cost of the testbed). By the end of the six months you needed a rough spec, and much more importantly, a working implementation. That set off a torrent of innovation - and gave birth to all of the important OGC specs used today, including WMS, GML, WFS, etc.
But back in 2000, none of that existed. As we wrote SIAS, we sure wanted some standard, any standard, to follow. And thus I pushed Smallworld and GE to get back into the OGC process. For our part, we implemented full support for WMS (including SVG in a later release!) and were one of the first companies to support GML.
But it still took lots of cajoling to convince anyone it was a good idea. I used to give talks about OGC at our user conferences, and I remember sitting on panels at GITA with titles such as "Why Use OGC Standards."
More interesting was the reaction of our customers. Smallworld/GE dominated the utility and telecoms business at the time. Utilities and telecoms are some of the most conservative organizations around - telling them they could now share their data on the Web was enough to send them into epileptic shock.
US customers were particularly uninterested. But European customers were different, particularly German customers. Many of our German customers were actually small organizations, with a service area limited to a city or two. They were huge supporters of OGC standards, and that's where we made most of our progress.
And What About the Web?
A quick caveat before continuing - since leaving GE for Ubisense and later MapBuzz, I haven't been involved in OGC. So beware - some of these thoughts may be wrong.
As I've talked about in a previous post, I don't think the OGC standards have succeeded on the Web. I find nothing surprising about it - there are a couple of good reasons for it.
First, the OGC testbeds are designed to solve hard problems for large organizations. For example, the last one I took part in modeled a disaster response to a series of tornadoes that touched down in the Washington DC metro area. The goal was for the federal, state and scores of local governments to effectively share information in real-time to manage the emergency response. Thus the demos were all about combining data from multiple sources - the latest satellite imagery, reconnaissance missions flown by drones in real-time, street vector data, parcel data, etc. If something 1/100 as effective as these demos had been used in New Orleans after Hurricane Katrina, a lot of people would have been spared a lot of misery.
To get this to work you need complex standards - things like GML, SLD, WCS and WFS. But this was certainly not the Web. Sure, the "Web Services" moniker was thrown around all the time, but these were SOAP services combined with sophisticated Java clients, and very few browsers in sight.
Second, the OGC membership is a combination of academics, leading companies in the industry, and large organizations like Lockheed Martin and the Federal Government. Thus, the organization is geared towards writing standards that play in that world.
The Role of Google
It seems like a great coup to me that Google submitted KML to the OGC for standardization.
The big question in my mind is how will Google change OGC? The Web is in Google's DNA. Will it be able to use its knowledge and power in web mapping to nudge the OGC to a more web-centric view? Time obviously will tell, but it sure seems like a great time to be part of OGC and watch the technical minutiae fly.
Posted in GIS, Technology | 3 comments | no trackbacks
Posted by Charlie
Thu, 28 Jun 2007 02:04:00 GMT
Last week Landon Blake vented his frustration at the Open Geospatial Consortium by claiming that poorly designed standards create obstacles to innovation and collaboration. This touched off an interesting discussion on the always interesting Geowanking mailing list.
I don't buy Landon's argument about collaboration - and in fact I think his blog post shows the opposite. But innovation is an entirely different matter. I'm with Landon on this one - I think standards can stifle innovation. How? Well, I see two ways.
First, some standards are so broadly defined that there is little hope of interoperability. Why would you come up with such a standard? I think Landon gets right to the heart of the matter:
If you were a proprietary software company working to design a software standard that would dominate your particular industry, would you tweak it to your competitive advantage, or to someone else’s competitive advantage? Come on now, be honest with me…
Not to pick on GML again, but I think it's a good example of such a standard. As a vendor, I can go implement GML support in my product, check the "supports OGC standards" tick box, and yet provide no interoperability with other vendors' implementations. Cynical? Damn right. Does it happen? Of course.
A second problem arises from "blessed" standards which are too complex or too poorly designed. A good example is XML Schema. Because the W3C put its stamp of approval on XML Schema, it's the main game in town. It goes back to that old saying "You won't be fired for buying IBM" - well you won't be fired for implementing or using XML Schema.
However, there are better alternatives such as RelaxNG. Now better is obviously in the eye of the beholder, but if you ask a number of technical people which standard they prefer, RelaxNG will win hands down.
Yet its adoption is painfully slow despite the fact that it was authored by one of the foremost experts on markup languages and has the support of the digerati. If RelaxNG can't topple Goliath, then woe to those that try to topple entrenched standards with less illustrious supporters.
Posted in Technology | no comments | no trackbacks
Posted by Charlie
Thu, 08 Feb 2007 08:58:00 GMT
Web sites based on user generated content need to ...well... let users generate content (obviously you read this blog for its insights!).
Except that opens a huge can of worms. Some users will want to deface content that other users have created. Others will try and upload content that covers your web site in platypuses - just to prove they can. Still others will try and upload malicious content so they can steal other users' personal information, such as passwords.
Let's look at one small part of the problem - letting users add content to a site. Say you have a nice little form that looks something like this (note this form does not work):
Soon your first user wanders by - and is aghast at the primitiveness of your solution. How do I make things bold? Color? Links? Pictures?
At this point a bright idea pops into your head - why not write a nice, simple, little formatting language? Maybe we could make italic text like ''this''. And bold text like '''this'''. Thankfully, you soon come to your senses and realize this has all been done before - just pick your favorite between MediaWiki, Textile, Markdown, and scores of others.
Then again, why bother making users learn some strange new markup language? What about a nice editor - something like this. Then again, if you hate your users, you could try something like this. Either way, you happily code up your nice editor and wait for good things to happen.
When Bad Things Happen
Before you know it, you're running the next MySpace, and you're barely off the phone with Yahoo when Google comes calling. But then disaster strikes - one of your valued customers, Sammy, has decided to befriend every other member. Your website crashes under the load, Yahoo and Google decide you're an incompetent lout and they shower their millions on your evil competitor.
What went wrong? You got burned by a classic example of cross-site scripting. Sammy embedded JavaScript into the content he uploaded for his profile. When a user, Sally, looked at it, the embedded JavaScript was executed, causing Sally to unknowingly become Sammy's friend. And then when Sue looked at Sally's profile, she became Sammy's friend. And soon everyone loved Sammy.
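Here's a minimal sketch of what went wrong, in Ruby. The profile text, the addFriend call and the two render helpers are all made up for illustration; the point is only the difference between pasting stored text into a page verbatim and escaping it first.

```ruby
require 'erb'

# Hypothetical profile text submitted by Sammy; the script body is a stand-in.
SAMMY_PROFILE = %q{Hi, I'm Sammy! <script>addFriend('sammy')</script>}

# Unsafe: the stored text is pasted into the page verbatim, so the browser
# sees a real <script> element and runs it for every visitor.
def render_unsafe(profile)
  "<div class=\"profile\">#{profile}</div>"
end

# Safer: escape the HTML metacharacters so the text is displayed, not executed.
def render_escaped(profile)
  "<div class=\"profile\">#{ERB::Util.html_escape(profile)}</div>"
end

puts render_unsafe(SAMMY_PROFILE)
puts render_escaped(SAMMY_PROFILE)
```

Escaping everything only helps if you're willing to throw away the formatting your nice editor produces, which is why the rest of this post is about sanitizing HTML rather than escaping it.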
Don't Trust Your Users
The sad moral of the story is don't trust your users. You have to sanitize everything uploaded to your web site. Your nice little editor produces HTML - which you blithely accept and store right into the database.
Although a bit ashamed at your gaffe, you figure any fool can clean up a bit of HTML. You code up a baroque regular expression, that only you can understand, and update your web site. You figure Google will be back begging tomorrow.
Tomorrow dawns, and now everyone is friends of Sally. Eeek gads - what went wrong this time? After a bit of searching you come across a sketchy looking site that has pages and pages of examples on how to defeat your primitive defences.
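To see why the regular expression approach falls over, here's a deliberately naive filter along with three well-known ways around it. The naive_clean helper and its pattern are my own stand-ins, not the baroque expression from the story.

```ruby
# A deliberately naive filter: strip anything that looks like a <script> element.
def naive_clean(html)
  html.gsub(/<script.*?>.*?<\/script>/m, '')
end

# The filter handles the textbook case...
puts naive_clean("<b>hi</b><script>evil()</script>")

# ...but the pattern is case sensitive, so this passes through untouched.
upper_case = "<SCRIPT>evil()</SCRIPT>"
puts naive_clean(upper_case)

# No <script> element at all: the JavaScript hides in an event handler attribute.
event_handler = %q{<img src="x.png" onerror="evil()" />}
puts naive_clean(event_handler)

# Removing the inner, empty script elements stitches the outer pieces
# back together into a working script element.
nested = "<scr<script></script>ipt>evil()</scr<script></script>ipt>"
puts naive_clean(nested)
```

Every patch to the pattern invites another trick, which is the argument for parsing and validating the markup instead of pattern matching on it.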
Despairing, you take the day off and ponder becoming a rock star, or if that fails, a real estate agent.
Friends Abound
The next day, everyone has decided to become friends with Sue. Your heart warms as you watch all your users getting to know each other. That feeling lasts through your first cup of coffee, when you get slapped by a lawsuit from some weird sounding European country you've never heard of claiming that you've willfully exposed personal details about your users without their consent. You figure it's time to buckle down and solve this problem once and for all.
A bit of searching on Google turns up a bewildering number of choices. You find one site though, HTML Purifier, that offers a nice comparison of the options. By now it's lunch, and you call up your buddy Bill to grab some food.
XHTML Basic
You tell Bill about your woes. He calls you an idiot and says that you have to buy him lunch. He then points out that your fancy little editor generates XHTML, right? So why don't you use libxml to parse the XHTML, and tell it to validate it against XHTML Basic. Seeing the bewildered look in your eye, Bill sighs deeply, and tries again.
Look, he says, trying to validate HTML leads right into the morass of tag soup where browsers do their best to render whatever you throw at them. Although that sounds nice, in reality it leaves a vast attack space for someone to slip in malicious content, oftentimes using invalid HTML.
If you switch to XHTML, you immediately eliminate that problem. Continuing, Bill also points out that you can reuse all the work that has gone into building the XML tool chain. And even better, he explains, the W3C has kindly spent the last five years breaking XHTML into different modules. Each module is rigorously defined via a DTD.
They have also been kind enough to define XHTML Basic, which combines a subset of XHTML modules to create a simplified version of XHTML that can run on small devices such as PDAs and cellphones. It eliminates most of the nasty XHTML elements - with a tweak here and there you can get rid of them all. And while you are at it, it's probably best to eliminate all the predefined character entities (you do use UTF-8, don't you?).
So all you have to do is take the XHTML Basic DTD and validate user input against it. You say that sounds awfully difficult. Bill laughs, and quickly writes a few lines of Ruby code on a napkin:
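require 'xml/libxml_so'  # libxml-ruby as of 2007; later releases of the gem changed the require path

# Returns true when the markup validates against the DTD.
def verify(html)
  # Load the DTD to validate against (swap in the XHTML Basic DTD file here)
  dtd = XML::Dtd.new("public", 'xhtml1-transitional.dtd')
  # Parse the markup into a document first; malformed XML raises an error
  document = XML::Parser.string(html).parse
  document.validate(dtd)
end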
You stare incredulously - that's it? Bill replies - not quite. DTDs can't verify the contents of attribute values, so you still have to make sure there aren't any nasty JavaScript fragments lurking in them. For example:
<img src="javascript:alert('you have been hacked')" />
He also mentions that you could have libxml validate against a Relax NG schema instead, which can constrain attribute values as well.
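The attribute check Bill is talking about might look something like the sketch below. This is my own guess at one reasonable approach, not what any particular library does: the safe_url? helper and the list of allowed schemes are invented for illustration, and it assumes you hand it the attribute value after the XML parser has decoded any entities.

```ruby
# Schemes we are willing to accept in src and href attributes.
# This list is an assumption; adjust it to suit your site.
ALLOWED_SCHEMES = %w[http https mailto].freeze

# Returns true when an attribute value is a relative URL or uses an allowed scheme.
def safe_url?(value)
  # Browsers skip leading whitespace and drop tabs and newlines inside a URL,
  # so remove whitespace and control characters before looking for a scheme.
  cleaned = value.gsub(/[\x00-\x20]/, '')
  scheme = cleaned[/\A([a-zA-Z][a-zA-Z0-9+.\-]*):/, 1]
  return true if scheme.nil? # relative URL such as "images/cat.png"
  ALLOWED_SCHEMES.include?(scheme.downcase)
end

puts safe_url?("javascript:alert('you have been hacked')")
puts safe_url?("http://example.com/cat.png")
```

Checking against a short list of schemes you accept is safer than trying to list the ones you reject, since a list of bad schemes is the sort of thing that sketchy site will happily find a way around.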
And voila, you've successfully plugged at least one security hole in your web site. Undoubtedly there are many more to be found.
Update - It's definitely worth reading the great comment from Ambush Commander, who is the author of HTML Purifier, an HTML sanitization library. I should have been more clear that XHTML Basic is a good starting point since it removes a large portion of XHTML that you don't want to support. However, it still includes dangerous elements like <script> and <object>, so clearly you have to remove those. Anyway, it's worth checking out his comment and my response.
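To make that concrete, here's a rough sketch of an element whitelist check. To keep it self-contained I've used REXML rather than libxml, and the ALLOWED_ELEMENTS list and both helper names are hypothetical, picked for illustration; a real list would need more thought.

```ruby
require 'rexml/document'

# Elements we are willing to accept from users (an assumed, illustrative list).
ALLOWED_ELEMENTS = %w[p br b i em strong ul ol li a img].freeze

# Collects the names of all descendant elements, depth first.
def element_names(element, names = [])
  element.each_element do |child|
    names << child.name
    element_names(child, names)
  end
  names
end

# Returns the names of elements in the fragment that are not on the list.
def disallowed_elements(fragment)
  # Wrap the fragment so it has a single root element, as XML requires.
  doc = REXML::Document.new("<body>#{fragment}</body>")
  element_names(doc.root).uniq - ALLOWED_ELEMENTS
end

sample = %q{<p>Hello <b>world</b><script>evil()</script></p><object data="x"/>}
puts disallowed_elements(sample).inspect
```

An empty result means every element in the fragment is on the list. Anything else gets rejected or stripped before it goes near the database.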
Posted in Design, Ruby, Technology | 5 comments | 1 trackback
Posted by Charlie
Sat, 19 Aug 2006 08:33:00 GMT
I was struck by Nicholas Carr's post about Wikipedia dominating search results. Try it for yourself - google a few topics off the top of your head. Odds are the first page of results will include a link to Wikipedia.
I was surprised by this at first - how has such centralization risen out of the vastness of the Web? But upon reflection, it seems to me a natural consequence of increasing returns.
Diminishing Returns
If you're not familiar with increasing returns, it's an economic theory used to model knowledge-based economies. Traditional economics is based on diminishing returns, where each additional unit of a good or service has less value than the one that preceded it. Say you build cars - as you build more and more cars your costs will increase - raw materials will become more expensive, labor costs will go up, you'll have to buy land to build new factories, etc. At some point it's not worth your time to build new cars. Diminishing returns is a powerful model for describing the part of the economy that deals with rival goods - goods that can only be consumed by one person (if I buy the last red car on the lot you cannot).
Increasing Returns and Making Gorillas
In contrast, with increasing returns the value of a good or service increases as more people use it. This causes positive feedback mechanisms to kick in that reinforce the use of the good or service to the detriment of other goods or services. The end result is that a market becomes dominated by a single good or service.
Increasing returns are used to model knowledge-based markets which are based on non-rival goods (goods that many people can share). The software industry is full of examples, including Microsoft (Windows and Office), Oracle (databases), SAP (ERP software), etc. Geoffrey Moore noted the dominance of these firms, which he called gorillas, in his 1998 book The Gorilla Game.
The Web is also full of examples - Yahoo, Amazon, Google, etc. And for businesses that can leverage network effects, where customers are enabled to directly interact with each other, growth can be truly spectacular as witnessed by eBay, CraigsList, MySpace and Skype. And now Wikipedia.
What is surprising, at least to me, is that open source methodologies can create gorillas. In the software world, you have to look no further than BIND or Apache. But it also happens online - two of the organizations mentioned above, CraigsList and Wikipedia, have been built by the users themselves. As users entered more and more content, they were able to attract more and more users. Those users in turn created more content - creating a powerful feedback mechanism that catapulted CraigsList and Wikipedia into the very top tier of web sites.
Dethroning Gorillas
Dethroning a gorilla is hard. To do it you have to offer a product that is vastly superior, otherwise there is no hope of convincing users to pay the costs of changing.
And that usually means you have to leverage a technology revolution. For instance, Microsoft beating out IBM by betting on the PC, Microsoft beating out WordPerfect and Lotus by betting on Windows, Google beating out Microsoft (at least online) by betting on the Web, CraigsList beating out newspaper classifieds by using the Web, etc.
But open source provides a second surprise here. If you can't ride a technology revolution, then open source appears to be the only viable way of attacking a gorilla. The obvious example is Linux versus Windows, but others abound - MySQL/PostgreSQL versus Oracle, CVS/SVN versus a slew of commercial products, JBoss versus WebSphere, Open Office versus Microsoft Office, etc. An established gorilla can crush commercial competitors by any number of means - undercutting them on price, colluding against them, copying their functionality or just buying them outright.
But those techniques don't work against open source projects. That means an open source project has as much time as it needs to establish itself, get a few users, and start leveraging increasing returns. Which leads to an interesting question - over a long period of time, can a commercial entity compete against open source projects? More concretely, can Microsoft maintain the domination of Windows and Office over the next ten years, assuming that some technological revolution doesn't come along and make the whole experiment moot? If the answer is no, it strikes me that someone has an awfully interesting Economics thesis to write in the future.
Posted in Technology | no comments | no trackbacks