One of the great new features of the upcoming postgresql 8.4 release is the addition of window functions. Previously limited to enterprise databases such as Oracle and DB2, they open up a whole new world of functionality to sql queries.
Window functions are one of the more obscure parts of the sql standard, so you may never have heard of them. In a nutshell, they let you perform calculations based on the current record and its set of related records. This turns out to be quite useful. A good place to find out more information is the postgresql documentation, which does a good job of explaining some of the more common use cases.
Workplace of the Future
My introduction to window functions was almost five years ago while doing a project for Ubisense. Ubisense sells indoor tracking systems, based on ultra-wideband, that can locate tags within 6 inches.
For one of projects, we worked on Cisco’s Connected Workspace. The Connected Workspace was designed to see if office space could be laid out in a way to increase worker happiness and productivity. To do this, Cisco took all the cubicles out of the main floor of one of its buildings and replaced it with a fairly radical design. Roughly
half of the floor was made into a a large open open space with individual and group desks. The remainder of the floor was split between a large kitchen with a really nice eating room and offices that ranged in size from 1 to 12 people. Here is a picture of the main floor area (courtesy Cisco Systems):
For a few more pictures, check out Cisco’s presentation.
The idea was that employees could sit wherever they wanted, there were no assigned seats. If employees needed to collaborate they could work in the open areas, if they needed privacy they could grab one of the smaller offices and if they needed to do a conference call they could grab one of the larger offices.
The other impetus behind the experiment was financial. Cisco has a huge campus in Santa Clara hundreds of buildings, each costing millions of dollars to maintain. Was it possible to pack more people into each building and maintain, or improve, their hapiness and productivity?
The Experiment
Ubisense was hired to figure out how well the different parts of the connect workspace were utilized. By giving each employee a tag, the system anonymously keep track of each time someone entered or left a room. This aggregate data could then be used to gain insight into the effectiveness of the new floor plan:
- Did employees spend time in the open area?
- If so, in which parts of the open area (it was divided into 5 subdivisons)?
- How much were the individual offices being used? Were there too many or too few?
- What about the larger conference room?
- How much was the kitchen and eating area utilized?
To do this, I hooked into Ubisense’s platform API to monitor each time a tag entered or left a room. That information was then entered into a Oracle database (without any user information, so the data was totally anonymous). Thus the Oracle table consisted of millions of rows of data – with each row representing an tag entering a room or leaving a room. For example, here is a simplified view of the data:
tag_id
room_id
event
time
1
Conference #1
Enter
10:00am
2
Office #2
Enter
11:15am
2
Office #2
Leave
11:20am
1
Conference #1
Leave
11:30am
Window Functions to the Rescue
The next trick was to analyze the data to answer the questions I posed above. To do that required figuring out how much time each tag spent in each room. So something like this:
room_id
enter
leave
duration
Conference #1
10:00am
11:30am
1 hour 30 min
Office #2
11:15am
11:20am
5 min
OObviously you could write a script in the language of your choice to process the raw data and populate this new table. But that adds another level of complexity to the system and makes it hard to do add-hoc queries.
And this is where window functions are so useful. Using window functions, you can implement the basic algorithmfully in:
- Sort the data by tag_id, room_id and id so that room enter records for a tag are directly followed by room exit records
- Select the room exit records
- Use theUse the lag window function to pull the previous record, which is the room enter record, and then subtract the two times to get the duration
- Wrap this query up in a view, let’s call it room_usage, that can serve as the basis for add-hoc queries or reports.
Without window functions, item #3 is impossible with sql because there is no way to relate a record to its surrounding records (ie., a window).
And thus window functions provide a great new data analysis tools which postgresql will make available to everyone at no-cost.