by Peter M. Batty
Abstract
Data modelling for AM/FM is more complex than for traditional applications for
a variety of reasons, in particular because of the need to model spatial and topological
relationships. This paper examines the general differences between data modelling
for AM/FM and other applications, and looks at a range of common modelling issues
in utility applications, including the efficient handling of various types of tracing
and network analysis, outage management, and generation and maintenance of schematics.
Introduction
There are some significant differences between data modelling for traditional
applications and GIS or AM/FM, which arise from the need to model spatial and topological
relationships. Utility applications have some additional complications compared
to many GIS applications, due to the need to model complex networks, and these
complications are not well handled by traditional GIS data models (the term GIS
data model is used in two different contexts in this paper: one is the system data
model used by the GIS vendor to model spatial and topological aspects of data stored
in the system, and the other is the user-specific data model developed on top of
the core system for a specific application).
This paper starts by discussing the general differences between data modelling
for GIS and other applications, from a user application perspective. It goes on
to look at a variety of modelling issues relevant to utility applications, primarily
from a system data model point of view. In particular a number of non-traditional
modelling techniques which offer benefits for utility applications are examined.
A more detailed look is then taken at a fairly complex application which is common
to most utilities, outage management, considering how the system data model features
which have been discussed can be applied in practice, and what sort of trade-offs
need to be considered in the design. Finally, the creation and maintenance of the
user-specific data model is discussed, and in particular the use of CASE tools
is examined.
Why Is GIS Data Modelling Different?
A traditional data modelling exercise goes through two major stages: the production
of a logical model, and then a physical model. The logical model represents a model
of the real world and is completely independent of how the system is physically
implemented. The physical model represents the actual data structures (typically
tables) which are used to implement the logical model. By far the most common way
of representing a data model is using some form of entity-relationship diagram
in which the objects (or entities) which need to be modelled are displayed as boxes,
and the relationships between them are represented by lines.
Dangers in Trying to Represent Spatial Relationships
A data model for a traditional application explicitly records all relationships
between entities which may be of interest to the application. This provides a big
trap for the unwary person producing a GIS logical data model, because in a typical
GIS application there are many relationships which will be derived implicitly from
the spatial characteristics of objects. For example, in a utility application you
might want to know which facilities are in a given work area, which are in a given
tax area, which cables are underneath roads, and so on. In a traditional logical
model this would result in defining relationships between work area and all facilities,
between tax area and all facilities, between cable and road, and so on. It is possible
to very quickly end up with a logical model in which almost everything is related
to everything, like the following:
[ Figure not available ]
It should be fairly obvious that this is not a very helpful model, although the
author has seen a number of logical models like this. The second variant of the
not terribly useful logical model is the following, which again the author has
seen in real situations:
[ Figure not available ]
In this case the designer has realised that the previous logical model is not
useful, and has tried to show that objects are related by their location. While
this model represents a little better what we are trying to model, it is still
not very helpful. Firstly, objects are in the majority of cases not related because
they have exactly the same location, but because their locations have some more
complex relationship between them, such as one crosses the other, one is inside
the other, or one is within 100 feet of the other. This could in theory be modelled
by defining suitable relationships between locations, but in practice it is totally
impractical to maintain explicit relationships between all locations. Secondly,
since all objects are shown as having a relationship to location on this diagram,
this again clutters things up (if non-spatial relationships were also shown on
this diagram it would be very confusing). If we understand that any object may
have a spatial component to it, and that through this it can be related to any
other object with a spatial component, then it is generally clearer not to try
to show this on our logical model.
Although the previous two models were somewhat simplified examples, as a general
rule it makes sense to omit spatial relationships from a logical model. This is
not an absolute rule though. In particular there are some relationships which could
be derived spatially but which could also be regarded as more explicit. A common
example is a hierarchical aggregation of areas. For example, in a utility there
is usually a hierarchy of organizational areas, such as local office areas, which
are aggregated into districts, which are aggregated into regions (use of the terms
district, division and region, and their position in the hierarchy, seems to vary
from one utility to another). Since a district is always the aggregation of a number
of local office areas, this is quite a strong relationship and it may be useful
to show this on the logical model (whether this relationship is modelled explicitly
in the GIS is a separate implementation decision which will be discussed later).
Representation of Topological Relationships
Similar issues apply to the representation of topological relationships: which
objects can be connected to which other objects? As with spatial relationships,
topological relationships can be quite extensive, especially in a utility application,
and showing them explicitly may clutter up the data model diagram. However, there
is a better case for explicitly showing topological relationships than spatial
relationships. The main reason is that in a utility application there are usually
definite rules which are associated with topological relationships, for example
a cable can be connected to a transformer but not to a gas main. This is not generally
the case with spatial relationships, for example you would typically not prohibit
a cable from crossing a gas main (there may be some such spatial constraints, but
they are normally far fewer, and too complex to represent on a data model diagram).
It is very important to make sure that topological relationships (or connectivity
rules) are documented in some form, but this does not necessarily need to be done
by showing them on the data model diagram. There are various other options, such
as creating a separate entity-relationship type diagram just showing topological
relationships, using a matrix or table to show which pairs of objects can be connected,
or just listing the valid connections for an object with the rest of the documentation
about that object.
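The matrix or table option can be sketched in a few lines of Python. This is only an illustration: the object type names and the `can_connect` helper are assumptions made for the example, and a real model would list every network object type.

```python
# Hypothetical high-level connectivity table: each entry is an unordered
# pair of object types that may be connected to each other.
VALID_CONNECTIONS = {
    frozenset({"cable", "transformer"}),
    frozenset({"cable", "switch"}),
    frozenset({"cable"}),               # cable to cable
    frozenset({"gas_main", "valve"}),
    frozenset({"gas_main"}),            # gas main to gas main
}


def can_connect(type_a: str, type_b: str) -> bool:
    """Return True if the high-level rules allow the two types to be connected."""
    return frozenset({type_a, type_b}) in VALID_CONNECTIONS
```

Storing unordered pairs means the table does not need a separate entry for each direction, so `can_connect("transformer", "cable")` and `can_connect("cable", "transformer")` give the same answer.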
A diagrammatic or tabular representation of connectivity rules can generally only
convey high level information about the rules. Rules can be quite complex, so it
is often necessary to supplement a diagram with some additional description.
For example, a rule might be that it is only valid to connect two primary electrical
conductors if the phases of one are a subset of the other – so a cable with phase
BC could be connected to a cable with phase B, but not to a cable with phase AB.
Another rule might be that it is not possible to connect two gas mains with different
diameters unless they have a suitable fitting in between them.
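The phase rule is an example of a constraint that is easier to state in code than on a diagram. A minimal Python sketch, assuming phases are recorded as strings such as "BC" (the function name and this representation are assumptions for illustration):

```python
def phases_compatible(phases_a: str, phases_b: str) -> bool:
    """Two primary conductors may be connected only if the phases of one
    are a subset of the phases of the other."""
    a, b = set(phases_a), set(phases_b)
    return a <= b or b <= a
```

With this definition, "BC" and "B" are compatible because {B} is a subset of {B, C}, whereas "BC" and "AB" are not, since neither set contains the other.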
Some people might debate whether defining these sorts of complex rules is part
of the data model or part of the application. There is a strong argument for including
these as part of the data model, since enforcing them is fundamental to the integrity
of the database. Also, when using an object-oriented approach to design and development,
which is widely recognized as being of benefit in complex applications like GIS,
the behaviour of an object is an important part of its definition and when taking
this approach, connectivity rules should definitely be regarded as part of the
data model.
In summary, it is difficult to give a hard and fast rule for how to document connectivity
rules. In many cases it may be appropriate to display the high level rules on the
data model diagram (ignoring complex constraints), if this can be done without
detracting from the clarity of the diagram. Whether this is practical depends on
the number of topological relationships and the number of explicit relationships
which need to be shown on the diagram. In any case, it is likely that additional
documentation will be required for each object to explain more complex aspects
of the rules.
Constraints Imposed by the System Data Model
We have already mentioned that we are really discussing two separate types of
data model in this paper: the application-specific or user data model, which has
been the main topic of discussion so far, and the system data model which is provided
by the GIS vendor.
It is the system data model which handles fundamental issues such as how spatial
data and topological relationships are modelled. When implementing a GIS, you typically
cannot change the system data model. This puts constraints on the way that you
do data modelling, which at times may hinder you but at other times (hopefully
most of the time!) should help you. It helps you because the GIS vendor has implemented
a model and functionality which handles the most complex fundamental aspects of
the system. It also gives you a frame of reference in which to think about how
to specify spatial and topological relationships, which as we have seen already
can be difficult to do if you start with a blank sheet of paper. For example, most
systems use some variant on the traditional GIS model in which an object can be
a point, line or area. As you define the objects in your data model, you categorise
each one as a point, line or area, which makes the modelling process much simpler
than if you did not have this framework to work within.
However, while this simplification generally helps you, there is a danger that
the system data model may not be sufficiently rich to handle the complexity of
what you want to model. The experience of this author is that the traditional point-line-area
model has a number of shortcomings, especially for modelling utility networks.
The next section of this paper looks at various aspects of the system data model,
and in particular looks at some examples of non-traditional modelling approaches
which offer benefits for utility applications.
System Data Model Issues
This section considers various aspects of the system data model provided by the
GIS vendor which are important in being able to model utility networks effectively.
Sheet-based Versus Continuous Models
At a fundamental level, it is very important that the database is seamless, so
that an object like a cable is always stored as a single object regardless of how
long it is, and it does not have to be split into multiple objects because it
crosses arbitrary map sheet or tile boundaries. This significantly simplifies application
development and data maintenance.
Most systems now provide this capability at a basic level. However, it is important
to consider how you can access these objects as a user or application developer.
Because of the very large data volumes in typical GIS databases, most systems only
allow you to work on a subset of the database at one time for analysis or update
purposes. Being able to access the whole database at once without constraints offers
significant advantages for many applications, such as network analysis and outage
management.
Providing this seamless access to a database which may be tens or even hundreds
of gigabytes in size, and obtaining good performance, is obviously not a simple
task. There are two key issues in achieving this. The first is a good spatial indexing
mechanism. The most common approach is to use some form of quadtree index. There
is extensive literature on this topic – for example see Samet, 1990. The second
key issue is the underlying DBMS architecture. The server-oriented architecture
used by standard commercial DBMSs like Oracle, Ingres and Sybase is fundamentally
unsuitable for providing the required performance in a networked environment with
many users. A client-oriented DBMS architecture can provide an order of magnitude
better performance for this sort of application. This DBMS architecture is actually
far more important in achieving good performance in multi-user networked environments
than the spatial indexing mechanism used, but it has received far less attention
in the literature. See Newell and Batty, 1993, for more details on this topic.
In summary, the important things to look for in a system data model in this area
are that it is seamless, that it allows unconstrained access to the whole database
for update or analysis, and that it delivers good performance in a production environment.
Spatial Object Versus Spatial Attribute Model
As we mentioned earlier, most system data models are based on some variant of
the point-line-area model, in which each object belongs to one of these categories.
Some systems have extended this model, for example by adding a “control point
feature” which is like a point, but has two connection nodes. This is suitable
for modelling certain electrical objects like transformers and simple switches.
However, the basic approach is still that each object has a (single) spatial type.
Each object will then have a number of alpha-numeric attributes defined for it
(size, material, equipment number, etc.).
A much more flexible model, especially for utility applications, can be obtained
by looking at the spatial aspects of an object in a different way. Instead of insisting
that an object has a single spatial type, we can allow an object to have multiple
spatial attributes, each of which has a spatial type such as point, line or area.
This simple step of moving spatial information from an object level to an attribute
level gives lots of new modelling possibilities. We will look at some examples
to illustrate this.
Depending on the application, you may wish to regard a road either as a line or
as an area. If doing some kind of route analysis, you will be interested in tracing
along its centreline. If looking at access to properties along the road, you will
be interested in the right of way area associated with the road. With the traditional
spatial object model you would need to model these two things as separate objects,
and typically you would need to write some specific code to create and maintain
the relationship between these two objects. With the spatial attribute model, you
can simply give the road two spatial attributes, a centreline which is linear,
and a right of way which is an area.
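The road example can be sketched as a data structure. The following Python is a minimal illustration of the spatial attribute idea, not any vendor's actual model; the class names, field names and coordinates are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Geometry:
    kind: str              # "point", "line" or "area"
    coords: List[Point]


@dataclass
class GisObject:
    object_type: str
    # Alphanumeric attributes (size, material, equipment number, ...).
    attributes: Dict[str, object] = field(default_factory=dict)
    # Spatial attributes: any number, each with its own spatial type.
    geometries: Dict[str, Geometry] = field(default_factory=dict)


# One road object carrying both a linear and an area geometry.
road = GisObject(
    "road",
    attributes={"name": "High Street"},
    geometries={
        "centreline": Geometry("line", [(0.0, 0.0), (100.0, 0.0)]),
        "right_of_way": Geometry(
            "area", [(0.0, -5.0), (100.0, -5.0), (100.0, 5.0), (0.0, 5.0)]
        ),
    },
)
```

Because both geometries belong to the same object, no extra code is needed to keep two separate road records related to each other.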
A very common requirement in utility applications is to be able to display an
object at a location which is offset from the location where it really exists.
For example, many electric utilities display transformers offset from the cable
to which they are attached. This can be very simply handled by the spatial attribute
model: one spatial attribute can be used to store the actual location, and another
spatial attribute can store the location where its picture is to be displayed.
This applies to many situations, for example where multiple conductors are running
through the same duct, and you may want to display each conductor offset by a
different amount.
Another area in which the spatial attribute model greatly simplifies modelling
is in the handling of multiple representations of the same object. It is particularly
common in utility applications for the same object to appear on multiple different
types of map, including various schematics. Again this can be handled simply by
defining multiple spatial attributes on an object: one which represents the actual
location, and additional ones which represent the position of the object in each
type of schematic representation. There are additional modelling techniques which
can be useful in handling schematics, such as the use of multiple worlds: we will
return to this subject later.
Basic Network Topology Modelling
The way in which network topology is modelled is obviously of fundamental importance
to utility applications. With the traditional model, a linear object typically
has two nodes, one at each end. Connected objects are defined as those sharing
a node, so other objects can only be connected to the end of a linear object. If
a new service line needs to be connected in the middle of a cable, for example,
the cable must be split into two separate cables to model the connectivity correctly.
This can lead to having to split something which is really a single object into
many different objects, typically replicating the attributes on every instance,
which causes problems in terms of data storage, data maintenance and performance.
The following diagram shows the sort of situation we are talking about:
[ Figure (drawing) not available ]
These issues can be overcome by using a two level linear network model. With this
approach, every high level linear object – we call this a chain – is made up of
one or more (continuous) low level linear objects – we call these links. In the
above drawing, the main (secondary) cable geometry would be a single chain consisting
of nine links, and each service cable geometry would be a chain with just a single
link. The links define the connectivity, but the chains are the spatial attributes
associated with an object – so we can just have a single secondary cable object
with one set of attributes, rather than having to create nine secondary cable objects.
This approach obviously allows points to be connected in the middle of a chain
too, for example a single primary conductor could have many transformer connection
points along its length.
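The two level model can be sketched as follows in Python. The class names, the integer node identifiers and the `split_link` helper are assumptions made for illustration; the point is only that connectivity lives in the links while the attributes are stored once, on the object that owns the chain.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    start_node: int
    end_node: int


@dataclass
class Chain:
    links: List[Link] = field(default_factory=list)

    def nodes(self) -> List[int]:
        """All nodes along the chain in order, including intermediate ones."""
        if not self.links:
            return []
        return [self.links[0].start_node] + [l.end_node for l in self.links]


@dataclass
class Cable:
    cable_type: str
    material: str
    route: Chain   # the spatial attribute; connectivity lives in its links


def split_link(chain: Chain, index: int, new_node: int) -> None:
    """Add a connection point mid-chain by splitting one link into two."""
    old = chain.links[index]
    chain.links[index:index + 1] = [
        Link(old.start_node, new_node),
        Link(new_node, old.end_node),
    ]


secondary = Cable("secondary", "aluminium", Chain([Link(1, 2)]))
split_link(secondary.route, 0, 10)   # a service line connects at new node 10
```

After the split there are two links but still only one `Cable` object, so the attributes are not replicated.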
Modelling Complex Network Topology
Tracing through a linear network is a fundamental requirement for many utility
applications. Common requirements for controlling a trace include stopping at
specified objects, possibly qualified by attribute, for example stopping at all
open valves or switches. Another important requirement for electrical networks
in particular is being able to do directional tracing – upstream or downstream.
For simple networks, these requirements can be met by the traditional linear network
model consisting of links and nodes, where a link runs between two nodes, and two
links are connected if they share a common node. However, utility networks often
include objects whose connectivity cannot be easily modelled using this simple
model, in particular various types of switching or control facilities. For example,
consider the following diagram of a transfer switch:
[ Figure (diagram) not available ]
This switch has three connections, one input and two outputs. The switch can be
in one of two positions. In position 1, shown in the drawing, current from the
input goes out on output 1, and in position 2 it goes out on output 2.
There are several things which can help us produce an elegant solution to the
problem of modelling these complex objects. The first is the spatial attribute
model: we could model our transfer switch with three point geometries (spatial
attributes), to represent the input connection and the two output connections.
The second is that we need some way of defining the behaviour of this object within
a trace – the trace needs to know that if it reaches the input connection point,
then if the position attribute of the transfer switch is equal to “Position
1” then it should continue tracing from the output 1 connection point, and
otherwise it should continue tracing from the output 2 connection point.
This is a situation where object-oriented programming is very useful. With conventional
procedural programming, we would need to modify our tracing code each time we wanted
to handle a special case object like this, which makes it impossible to create
a general purpose trace routine, and causes support and maintenance problems. In
an object-oriented system, we define the trace behaviour on each object, so that
the general trace code does not need to be modified and new object behaviour can
be introduced very easily. For example, our trace code could be set up to check
whether any object it hit had a special method defined called trace_outputs, and
if it did then it would call this method to get a list of nodes from which it should continue
the trace (a method is similar to a function in a procedural programming language
– see Batty, 1993, for more information on object-oriented programming in GIS).
The following is an example of what this method would look like for the transfer
switch:
    method transfer_switch.trace_outputs(trace_input)
        if trace_input = input_connection then
            if self.position = 1 then
                return {output_connection1}
            else
                return {output_connection2}
            endif
        elif trace_input = output_connection1 and self.position = 1 then
            return {input_connection}
        elif trace_input = output_connection2 and self.position = 2 then
            return {input_connection}
        else
            return {}
        endif
    endmethod
This method can contain any kind of programming logic, so the behaviour can be
extremely sophisticated if necessary. This gives us an elegant way of handling
the transfer switch.
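To show how a general purpose trace routine might use such a method without knowing anything about transfer switches, here is a minimal Python sketch. It is not the author's implementation: the node identifiers, the `nodes()` helper and the breadth-first strategy are assumptions for illustration. The trace follows ordinary links, and whenever it reaches a node owned by an object that defines `trace_outputs`, it asks that object where to continue.

```python
from collections import deque


class TransferSwitch:
    """One input and two outputs; `position` selects the live output."""

    def __init__(self, input_node, output1, output2, position=1):
        self.input_node, self.output1, self.output2 = input_node, output1, output2
        self.position = position

    def nodes(self):
        return (self.input_node, self.output1, self.output2)

    def trace_outputs(self, trace_input):
        if trace_input == self.input_node:
            return [self.output1 if self.position == 1 else self.output2]
        if trace_input == self.output1 and self.position == 1:
            return [self.input_node]
        if trace_input == self.output2 and self.position == 2:
            return [self.input_node]
        return []


def trace(start, links, objects):
    """Return every node reachable from `start`.

    `links` is a list of (node, node) pairs; `objects` are special objects
    whose behaviour is obtained by calling their own trace_outputs method.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    owner = {n: obj for obj in objects for n in obj.nodes()}

    visited, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        following = set(neighbours.get(node, ()))
        obj = owner.get(node)
        if obj is not None and hasattr(obj, "trace_outputs"):
            following.update(obj.trace_outputs(node))
        for n in following - visited:
            visited.add(n)
            queue.append(n)
    return visited


switch = TransferSwitch("in", "out1", "out2", position=1)
links = [("source", "in"), ("out1", "load_a"), ("out2", "load_b")]
reached = trace("source", links, [switch])
```

With the switch in position 1 the trace from `"source"` reaches `"load_a"` but not `"load_b"`. Adding a new kind of device only requires a new class with its own `trace_outputs`; the `trace` function itself is unchanged.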
The transfer switch is a fairly simple example – we also need to model more complex
devices such as the following switch cabinet (this particular example is an S&C
model PMH9):
This is displayed as a single object on the map, but internally it contains four
switches which can be operated independently. Each switch controls current on three
phases (A, B and C). The left hand diagram shows all three phases combined, and
the right hand diagram shows the phases separately. The two switches on the left
hand side are group operated switches, which means that they are either open on
all three phases or closed on all three phases, and the two switches on the right
hand side are fuse switches, which can be independently open or closed on each
phase. The trace behaviour we want needs to recognise the positions of all these
switches and derive the correct output for a given input appropriately.
The behaviour of the PMH9 switch cabinet is significantly more complex than that
of the transfer switch, so we really need something more to help us model that.
For this we will introduce another couple of new concepts: multiple worlds and
hypernodes.
[ Figure (diagram) not available ]
A good approach to this problem is to model the internal structure of the switch
cabinet as a separate set of GIS objects with their own attributes and topology
– in this case we need to model busbars, fuse switches and group operated switches.
We will lay these out in a schematic representation as in the diagram above (the
simpler left hand representation is sufficient providing that we store separate
attributes for the switch position on each phase, and that the tracing function
used can stop based on complex predicates involving these attributes). An issue
we need to resolve is where to place these objects – they provide more detail than
we really want in our main geographic database. This is one area where the concept
of multiple worlds is useful. A world is an independent coordinate system within
the same database. Many different worlds can be created, and it is possible for
a world to be related to an object. In this case we create a new world which we
think of as being owned by the switch cabinet. We then create the objects representing
the internals of the switch cabinet in its internals world. These can be created
from a list of standard templates, or objects can be created and edited individually.
These objects are connected using ordinary topological rules, and normal trace
constraints will apply, like not going through open switches.
The one thing which is still missing is a link between the cables which are connected
to the switch cabinet, which exist in the main GIS world, and the internal switches
and busbars, which exist in a separate world belonging to the switch cabinet. This
is where the hypernode comes in. A hypernode is an object which has two point
geometry attributes, and special tracing behaviour defined on it, similar to that
which we defined on the transfer switch. In this case the special behaviour is
quite simple – it just says that if the trace hits a point belonging to a hypernode,
then it should continue tracing from the other point belonging to the hypernode.
In this way a hypernode can be used to make a trace jump (“through hyperspace” –
hence the name!) from one point to another. The two points (or “ends”) of a hypernode
can be in different worlds. Hence we can add hypernodes
which connect the cables coming into the switch cabinet to the appropriate connection
points in the internal model. The nice thing about this approach is that we do
not have to define any special trace behaviour on any objects, even though we are
modelling some very complex behaviour. We simply specify that the switch cabinet
is an object which has internals, and all the special behaviour we need is already
defined on the hypernode, which is a standard system object. For a more detailed
discussion of the use of multiple worlds, see Newell and Doe, 1994.
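The hypernode behaviour can be illustrated with a small Python sketch. Everything here is an assumption made for the example: nodes are written as (world, name) pairs, the cabinet internals are reduced to two switches on a single busbar (far simpler than a real PMH9), and an open switch is represented by leaving its link out of the trace.

```python
from collections import deque


class Hypernode:
    """Two point geometries, possibly in different worlds. A trace that
    reaches one end continues from the other."""

    def __init__(self, end_a, end_b):
        self.end_a, self.end_b = end_a, end_b

    def nodes(self):
        return (self.end_a, self.end_b)

    def trace_outputs(self, trace_input):
        return [self.end_b if trace_input == self.end_a else self.end_a]


def trace(start, links, objects, open_links=()):
    """Return all nodes reachable from `start`, not passing open links."""
    neighbours = {}
    for link in links:
        if link in open_links:
            continue
        a, b = link
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    owner = {n: obj for obj in objects for n in obj.nodes()}

    visited, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        following = set(neighbours.get(node, ()))
        if node in owner:
            following.update(owner[node].trace_outputs(node))
        for n in following - visited:
            visited.add(n)
            queue.append(n)
    return visited


# Main geographic world: two cables ending at the switch cabinet.
FEEDER, C1 = ("main", "feeder"), ("main", "cable_1_end")
C2, LOAD = ("main", "cable_2_end"), ("main", "load")
# Internals world owned by the cabinet: two switches on one busbar.
T1, BUS, T2 = ("cabinet", "terminal_1"), ("cabinet", "busbar"), ("cabinet", "terminal_2")

switch_1, switch_2 = (T1, BUS), (BUS, T2)
links = [(FEEDER, C1), (C2, LOAD), switch_1, switch_2]
hypernodes = [Hypernode(C1, T1), Hypernode(C2, T2)]

all_closed = trace(FEEDER, links, hypernodes)
switch_2_open = trace(FEEDER, links, hypernodes, open_links={switch_2})
```

With both switches closed the trace from the feeder reaches the load by passing through the cabinet's internals world. With `switch_2` open it stops at the busbar, even though no special behaviour was defined on the cabinet itself.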
Schematics
We have already mentioned that it is a common requirement in utility applications
for an object to have multiple representations, appearing not only on one or more
types of geographical map, but also potentially on various types of schematic diagram.
Several of the modelling techniques which have already been discussed are very
useful in handling schematics. Multiple spatial attributes can be used to store
the location of the object in each schematic. The geometry for each different schematic
can be stored in a different world, to provide a clean separation between each
schematic and the geographic representation.
In some types of schematic there may not be a one to one correspondence between
objects in the geographic world and objects in a schematic. For example, many cables
shown in the geographic world may be combined into a single line section in a schematic.
The spatial attribute model is not sufficient to handle this case: we need to be
able to handle (explicit) relationships between objects. The ability to define
and maintain explicit relationships of various types (such as one to many and
many to many) is important for many aspects of data modelling. The system should
be able to automatically enforce rules relating to relationships, like referential
integrity, or more complex rules. For example, in the case of a schematic line
section which is related to multiple cables there must be a mechanism for ensuring
that the schematic is updated appropriately if any of the cables are modified.
These data modelling requirements can be met using a DBMS feature known as a trigger,
which is discussed in the next section.
Maintaining a Complex Model
In a GIS there are often complex relationships which need to be maintained, and
complex rules which need to be validated whenever certain objects are updated.
It is difficult to consistently validate rules via specific code in the application,
because we need to ensure that validation is always done, whichever mechanism is
used to update the record. For example, whether the record is created or updated
by a data translator, or via one of a number of interactive menus, we always want
to make sure that the same validation is done. This can be implemented by the use
of a DBMS which supports triggers. A trigger is a function (or method in object-oriented
terms) which is invoked whenever a specified object or attribute is inserted, updated
or deleted. Ideally, it should be possible to invoke the full range of GIS functions
from within a trigger, and it should be possible to cause the current transaction
to be rolled back if an invalid condition is found within a trigger.
Triggers can be used for a wide variety of functions. At a very simple level,
for example, a trigger could be defined to create a given annotation at a standard
offset from a certain type of point whenever the point was inserted or updated.
A trigger could also be used to implement complex connectivity rules, for example
checking that the phases of two connected cables are compatible, and returning
an error condition which will roll back the current transaction if they are not.
A trigger could also implement more complex functionality such as updating an associated
schematic geometry when the geographical representation of a record is updated.
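The trigger mechanism can be sketched with a toy in-memory store in Python. This is not how any particular DBMS exposes triggers; the `Store` class, its `on` and `insert` methods, and the record layout are assumptions made to show the idea that validation is attached to the data, so every update path runs it, and that a failed check rolls the change back.

```python
import copy


class ValidationError(Exception):
    pass


class Store:
    """Toy record store with insert triggers and rollback on failure."""

    def __init__(self):
        self.records = {}
        self.triggers = {}   # (object_type, event) -> list of functions

    def on(self, object_type, event, func):
        self.triggers.setdefault((object_type, event), []).append(func)

    def insert(self, key, object_type, **attrs):
        snapshot = copy.deepcopy(self.records)
        record = {"type": object_type, **attrs}
        self.records[key] = record
        try:
            for func in self.triggers.get((object_type, "insert"), []):
                func(self, record)
        except ValidationError:
            self.records = snapshot   # roll back the transaction
            raise


def check_phases(store, record):
    """Trigger: a cable's phases must be compatible with the cable it joins."""
    other_key = record.get("connected_to")
    if other_key is None:
        return
    a = set(record["phases"])
    b = set(store.records[other_key]["phases"])
    if not (a <= b or b <= a):
        raise ValidationError("incompatible phases")


store = Store()
store.on("cable", "insert", check_phases)
store.insert("c1", "cable", phases="BC")
store.insert("c2", "cable", phases="B", connected_to="c1")
try:
    store.insert("c3", "cable", phases="AB", connected_to="c1")
    rolled_back = False
except ValidationError:
    rolled_back = True
```

The third insert fails the phase check, so `c3` never appears in the store. A data translator and an interactive editor calling `insert` would both get the same validation without containing any validation code themselves.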
An Application Example
This section looks briefly at some design issues involved in a common utility
application, outage management, as an example of the sort of trade-offs which need
to be considered when designing a data model.
Outage Management
We will consider the design of an outage management system for a radial electricity
network. The basic idea of this application is to record calls from customers whose
power is out and from this information predict which device is most probably causing
the outage. The customer calls may be entered into a separate system by the telephone
operators, and then passed on to the GIS for outage analysis. In a hierarchical
network, several customers will be fed from a single transformer. A number of transformers
will typically be on a section of network which is isolated by a fuse switch (i.e.
if that fuse switch is out, all customers served from all transformers downstream
of that fuse switch will be out). Further up the hierarchy there will be other
devices such as reclosers which are possible causes of an outage.
In order to predict which device or devices are the likely cause of an outage,
we need to look at the pattern of calls. If we receive several calls from customers
served by the same transformer, then we would predict that the transformer was
the probable cause of the outage. However, if we had predicted several transformers
beneath the same fuse switch, then we would change our prediction to say that it
was most likely that the fuse switch was out rather than all of the individual
transformers. Exactly how many devices need to be predicted before a device further
upstream is predicted depends on the type of device upstream and a range of other
factors. A detailed discussion of all the design requirements is beyond the scope
of this paper. However, it is sufficient to know that for any predictable device
on the network, we need to be able to efficiently identify the switchable devices
immediately downstream of that device, the transformers directly fed from this
device, and if the device is predicted as being out we need to be able to calculate
the number of customers and the total load (kVA) downstream of that device.
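The escalation of a prediction from transformers to an upstream device can be sketched in Python. The hierarchy, the customer assignments and the fixed threshold of two are all assumptions for illustration. As noted above, the real rule depends on the type of upstream device and other factors, and for brevity this sketch predicts a transformer from a single call.

```python
from collections import Counter

# Hypothetical hierarchy: each device maps to the device immediately upstream.
UPSTREAM = {"T1": "F1", "T2": "F1", "T3": "F2", "F1": "R1", "F2": "R1"}
# Hypothetical assignment of customers to transformers.
SERVED_BY = {"cust1": "T1", "cust2": "T1", "cust3": "T2", "cust4": "T3"}


def predict(calls, threshold=2):
    """Return the set of devices predicted to be out for the given calls.

    When `threshold` or more predicted devices share the same upstream
    device, they are replaced by that upstream device.
    """
    predicted = {SERVED_BY[c] for c in calls}
    changed = True
    while changed:
        changed = False
        counts = Counter(UPSTREAM[d] for d in predicted if d in UPSTREAM)
        for parent, n in counts.items():
            if n >= threshold:
                predicted = {d for d in predicted if UPSTREAM.get(d) != parent}
                predicted.add(parent)
                changed = True
                break
    return predicted
```

Calls from `cust1` and `cust2` predict only transformer `T1`, while calls from `cust1` and `cust3` (on different transformers under the same fuse) predict fuse `F1` instead.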
To meet the requirement of being able to efficiently identify the immediately
downstream predictable devices and transformers from a given device, we have several
options. We could dynamically trace downstream each time we needed this information.
This could potentially involve tracing downstream for some distance, which could
be a performance issue. A second possibility is that we could construct a separate
network, using a similar approach to that which was discussed for schematics, which
contained only the devices we were interested in for outage management, connected
by linear objects which could be formed from an aggregation of several cables in
the detailed geographic network. We would still have to do a trace each time we
needed the downstream devices, using the “outage network”, but performance should
be better as the network is simpler. However, creating a separate network has a
data storage overhead, and we would need to write some application code (probably
a set of triggers) to create and maintain the second network automatically. A third
option is that we could maintain a set of explicit relationships which models the
hierarchy of predictable devices. This would again have some storage overhead,
and would need some triggers to be written to maintain the hierarchy, but would
probably give the best performance of any of the approaches.
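The difference between the first and third options can be sketched as follows.
The network here is a plain dictionary of downstream connections, and the names
(trace_immediate_downstream, build_hierarchy, the PREDICTABLE set and the example
devices) are hypothetical; a real system would trace its own topology and would
keep the stored hierarchy up to date with triggers rather than rebuilding it.

```python
from collections import deque

PREDICTABLE = {"fuse_switch", "recloser"}  # assumed set of predictable device kinds


def trace_immediate_downstream(start, downstream, kind):
    """Option 1: trace downstream from `start` on demand.

    Each branch of the trace stops at the first predictable device or
    transformer it meets. Returns (predictable devices, transformers).
    """
    devices, transformers = [], []
    queue = deque(downstream.get(start, []))
    seen = set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if kind[node] in PREDICTABLE:
            devices.append(node)        # do not trace past this device
        elif kind[node] == "transformer":
            transformers.append(node)   # customers below are not traced here
        else:
            queue.extend(downstream.get(node, []))  # cable: keep tracing
    return devices, transformers


def build_hierarchy(downstream, kind):
    """Option 3: store the result of the trace as explicit relationships."""
    return {
        node: trace_immediate_downstream(node, downstream, kind)
        for node, k in kind.items()
        if k in PREDICTABLE
    }


# Detailed network: devices connected by individual cables c1..c3.
kind = {
    "R1": "recloser", "F1": "fuse_switch",
    "c1": "cable", "c2": "cable", "c3": "cable",
    "T0": "transformer", "T1": "transformer", "T2": "transformer",
}
downstream = {
    "R1": ["c1"], "c1": ["F1", "c2"], "c2": ["T0"],
    "F1": ["c3"], "c3": ["T1", "T2"],
}

on_demand = trace_immediate_downstream("R1", downstream, kind)
hierarchy = build_hierarchy(downstream, kind)
```

The on-demand trace costs a traversal for every query, whereas the stored hierarchy
answers the same question with a single lookup, at the price of extra storage and
of the code needed to keep it consistent when the network is edited.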
So we have (at least) three possible approaches to this problem. The simplest
one in terms of the data model and application development requirements is likely
to be the least efficient in terms of performance. This is a case where prototyping
is very useful to try the different approaches. This application was recently implemented
by the author, and the first approach which was tried as a prototype was the first
option described, which was most attractive because of its simplicity. The performance
obtained with this approach was tested and found to be good, so it was decided
that it was not worth prototyping the other options. This sort of situation occurs
quite frequently: you have a choice between deriving complex relationships between
objects on the fly and creating more explicit relationships, which give better
performance when querying the relationship but have overheads in terms of data
storage, application development and performance of updates. You need to evaluate
the pros and cons of the different options on a case-by-case basis, and this may
often involve prototyping some of the options.
Managing The Data Model
This section briefly discusses technology which can help in the development and
maintenance of a data model, in particular the use of CASE technology.
Problems in Designing and Maintaining a Data Model
One of the largest costs in most large GIS implementations is the cost of customising
the system to meet an organisation’s specific requirements. In turn, designing
and maintaining the data model for the GIS is typically one of the most significant
elements of this customisation. Data modelling is a complex task for most applications,
but this is particularly true for GIS as we have seen already. GIS projects typically
have quite long life cycles, and the technology is relatively new to most users,
which means that at the outset of the project they typically do not realise the
full capabilities of the system. Both of these things contribute to the fact that
requirements are very likely to change during the course of the project, and these
often require the data model to change. Also as new applications are developed
and added to the system, there may well be requirements for further changes to
the data model. Since GIS projects involve capturing large amounts of data, it
is critical that these changes to the data model can be made without losing any
data which is already stored in the system.
With most traditional GIS software, it has been very difficult to address these
issues. Typically the design of the data model takes a long time and is difficult
to subsequently change. This usually means that a very long period of time is spent
at the beginning of the project doing requirements analysis and data model design
to try to make sure that it is exactly right (which of course it never will be)
before any other work begins, as it is so difficult to make changes subsequently.
This section briefly discusses how a CASE tool can be used to address these issues,
and how this in turn radically changes the way in which one can approach the problem
of customising a GIS.
What is a CASE Tool?
The acronym CASE stands for Computer Aided Software Engineering, and it is used
to describe a variety of computer-based tools which can be used to assist in the
design and development of computer programs. Such tools have been developed for
various purposes, including the analysis and documentation of procedures and data
flows, and the design and documentation of a data model. It is the latter function
which we consider here: the use of a tool to define a graphical representation
of a data model, in the form of an entity-relationship diagram of the kind
discussed earlier. By clicking on individual objects it is possible to define
more detailed information about them, such as what attributes they have, what the
types of these attributes are, and so on. With some CASE tools, it is possible
to automatically generate code which will create the data structures which have
been designed.
Compatibility Between the CASE Tool and the DBMS
For doing a high level conceptual design, you do not necessarily need a close
correspondence between the CASE tool being used and the DBMS which will eventually
be used to implement the actual system. However, if the CASE tool is to be used
to do the physical database design and to actually generate code, then clearly
a much closer link is required between the CASE tool and the DBMS and application
development environment used to implement the actual system.
This requirement can really be split into two. The first is that the CASE tool
supports all the data modelling concepts supported by the DBMS: all its datatypes,
all the types of relationships it supports, and other concepts which we have discussed
such as triggers. In particular for GIS the CASE tool needs to understand the way
in which spatial information and relationships are stored in the DBMS. The CASE
tool may extend to covering aspects of the user interface of the application, such
as which fields are visible to the user and what sort of interface is used to edit
objects and their individual fields.
The second requirement is that the CASE tool must be able to generate code in
an appropriate format which will implement the data model which has been designed.
Provided that common data modelling concepts are supported, it is obviously technically
possible for one CASE tool to output code in multiple formats suitable for different
DBMSs.
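As a rough illustration of the second requirement, the sketch below generates table
definitions for two SQL dialects from one neutral description of the data model.
The MODEL dictionary, the TYPE_MAP table and the generate_ddl function are assumptions
made for this example and do not describe any particular CASE tool; only alphanumeric
datatypes are covered, so spatial datatypes and relationships are outside its scope.

```python
import sqlite3

# Neutral datatype names mapped to the column types of each target dialect.
TYPE_MAP = {
    "sqlite": {"int": "INTEGER", "string": "TEXT", "float": "REAL"},
    "postgresql": {"int": "INTEGER", "string": "VARCHAR(80)",
                   "float": "DOUBLE PRECISION"},
}

# A one-table data model: table name -> {attribute name: neutral datatype}.
MODEL = {
    "transformer": {"id": "int", "name": "string", "rating_kva": "float"},
}


def generate_ddl(model, dialect):
    """Return one CREATE TABLE statement per table for the chosen dialect."""
    types = TYPE_MAP[dialect]
    statements = []
    for table, attributes in model.items():
        columns = ",\n  ".join(
            f"{name} {types[datatype]}" for name, datatype in attributes.items()
        )
        statements.append(f"CREATE TABLE {table} (\n  {columns}\n);")
    return statements


# Check that the SQLite output is accepted by an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
for statement in generate_ddl(MODEL, "sqlite"):
    conn.execute(statement)
columns = [row[1] for row in conn.execute("PRAGMA table_info(transformer)")]

postgresql_ddl = generate_ddl(MODEL, "postgresql")
```

The same model description drives both outputs, so supporting a further DBMS in
this sketch would only mean adding another entry to the type mapping.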
These requirements are to a certain extent independent of each other. If the first
one is met but not the second then it is at least possible to use the CASE tool
to do the physical design for the system, but the code to implement this then has
to be written manually. On the other hand, it is possible to have a CASE tool which
supports a subset of the concepts supported by the DBMS (with GIS, for example,
a CASE tool which supports alphanumeric datatypes but not spatial datatypes), which
could be made to output code in an appropriate format for the DBMS, but further
development work would then be required in the DBMS environment (outside the CASE
tool) to incorporate any of these additional concepts in the application. In this
situation it is likely to be much more difficult to usefully maintain the data
model using the CASE tool after its initial creation, since the CASE tool does
not know about any of the changes which have been made to the data model within
the DBMS environment. Clearly, the CASE tool is much more useful if it meets both
of these requirements in full.
Maintenance of the Data Model
While the requirements in the previous section demand a very close link between
the CASE tool and the DBMS, the biggest benefits the author has found from the
use of a CASE tool come from taking this integration one stage further and providing
the ability to update the data model of an existing DBMS which is already populated
with data, without losing any existing data. This is particularly important in
GIS projects since, as mentioned earlier, they tend to last for a long time, requirements
are particularly prone to change during the course of the project, and typically
the database will contain large amounts of data when these changes have to be made.
It is highly desirable to be able to make data model changes on a test version
of the database so that they can be tested before applying them to the production
version. Ideally one would like to be able to do this without replicating data
in the master database.
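The workflow of testing a data model change before applying it to a populated
production database can be sketched with SQLite from the Python standard library.
The table, the column being added and the apply_change function are hypothetical.
Note that this sketch takes a full copy of the data for the test, which is exactly
the replication one would ideally like to avoid; it illustrates only the order of
the steps and the fact that existing rows survive the change.

```python
import sqlite3


def apply_change(conn):
    """The data model change under test: add an attribute to an existing table."""
    conn.execute("ALTER TABLE transformer ADD COLUMN rating_kva REAL DEFAULT 0.0")
    conn.commit()


# A populated "production" database (in memory for the purposes of the sketch).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE transformer (id INTEGER PRIMARY KEY, name TEXT)")
production.executemany(
    "INSERT INTO transformer (name) VALUES (?)", [("T1",), ("T2",)]
)
production.commit()

# Step 1: copy production into a test database and apply the change there.
test_db = sqlite3.connect(":memory:")
production.backup(test_db)
apply_change(test_db)
test_rows = test_db.execute(
    "SELECT id, name, rating_kva FROM transformer ORDER BY id"
).fetchall()

# Step 2: once the test version is verified, apply the same change to production.
apply_change(production)
production_rows = production.execute(
    "SELECT id, name, rating_kva FROM transformer ORDER BY id"
).fetchall()
```

In both databases the rows captured before the change are still present afterwards,
with the new attribute filled in from its default value.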
Use of an Incremental Development Methodology
If suitable tools are available to allow the data model of a populated database
to be easily changed, this can significantly change the approach which is taken
to an application development project. Instead of taking the traditional approach
of trying to completely design the data model at the beginning of the project,
which typically takes a very long time, it is possible to start with a much simpler
core data model and develop it over time in parallel with the development of application
prototypes. This allows benefits to be delivered to users much more quickly. For
a more detailed discussion of the use of CASE tools with GIS, see Kendrick and
Batty, 1994.
Conclusion
This paper has covered a range of issues relating to GIS and AM/FM data modelling.
We started by considering how best to represent spatial and topological relationships
when designing an application data model, a topic which is not specifically covered
by any of the common approaches to data model design. We then considered how new features
in the GIS system data model could help the user model certain things more easily.
From this perspective it is important that GIS vendors continue to look at enhancing
their system data models rather than just continuing to use the simple point-line-area
spatial object model which is still the most commonly used approach. It is also
important for users to consider the system data model of any system which they
are evaluating. Finally, we discussed CASE technology which can simplify the creation
and maintenance of a data model, and which in doing so can radically change the
approach which is taken to a GIS development project, by allowing the data model
to be developed incrementally rather than having to completely design it at the
beginning of the project.
References
Batty, P.M., 1993: Object-Orientation – some objectivity please!: Proceedings of
GIS 93 Conference, Birmingham, UK.
Kendrick, G., and Batty, P.M., 1994: Use of an Integrated CASE Tool for GIS Customisation:
Proceedings of EGIS 94.
Newell, R.G., and Batty, P.M., 1994: GIS databases are different: Proceedings
of AM/FM Conference XVII, pp 279-288.
Newell, R.G., and Doe, M., 1994: Discrete Geometry with Seamless Topology in a
GIS.
Samet, H., 1990: The design and analysis of spatial data structures: Addison-Wesley,
Reading, Massachusetts, 493 p.