
17-09-2008

The Growing Need for Spatial ETL
On Data, Technologies and Convergence
 

 

Spatial ETL (extracting, transforming, and loading) tools have been around for over a decade. Yet only in the past few years has their true strategic value emerged in geospatial initiatives around the globe. Now more than ever, it’s the data that’s pushing the capabilities of Spatial ETL tools into a whole new dimension. This article describes how the future of these critical data moving tools is being shaped by emerging data model, source, and format requirements alongside the trends in web and traditional IT technologies.

 

By Don Murray


Figure 1. The number of formats that Spatial ETL tools have had to add support for continues to increase as demonstrated by this timeline of supported formats in FME, a Spatial ETL tool built by Safe Software.

 
Format Translation

When many people think about Spatial ETL, they usually think about translating data between formats. Common responses are “Spatial ETL is used to move data from format X to format Y!” Certainly, as figure 1 indicates, Spatial ETL tools must support an ever increasing number of formats in order to continue to meet the diverse and evolving needs of organizations around the world.

While format support is most definitely required for moving data (if you can’t read or write data, then you can’t apply the full power of Spatial ETL), Spatial ETL is much more than just format translation technology. This point was really driven home to the people at Safe Software by a recent poll of its users. It showed that the number one use of the company’s Spatial ETL tool was to convert data from ESRI Shape to ESRI Shape! For MapInfo users, the number one destination was MapInfo, and the same pattern was found for AutoCAD and Microstation users! This was a very humbling discovery.

After having put a great deal of energy into Spatial ETL since 1993, Safe Software envisioned that the number one use of Safe’s technology would be something more dramatic. With their latest addition of support for AutoCAD Map Object Data, it is now no surprise that the greatest excitement about the enhancement is from AutoCAD Map 3D users who again want to simply go from AutoCAD Map Object Data to AutoCAD Map Object Data.


Figure 2. Traditionally, Spatial ETL tools simply extracted, transformed, and loaded data from one format to another. Today’s Spatial ETL tools must be able to provide full data transformation capabilities including format translation, data model transformation, and data integration, along with full distribution capabilities. They must also be able to support a wide variety of data types, from CAD to vector to 3D and so on.

 

The Backbone of Spatial ETL

So if today, Spatial ETL isn’t just about format translation, then what is it about? As a customer survey distinctly indicated, Spatial ETL is about reconciling data model differences. This reconciliation between the source data model and the destination data model is performed by the transformation capabilities in Spatial ETL. These transformations can be as simple as a coordinate reprojection or more sophisticated, such as combining multiple data sources while changing both attribution and geometry structures.
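To make the idea of a data model transformation concrete, here is a minimal Python sketch that combines the two kinds of change mentioned above: a coordinate reprojection and a change in attribution. It is an illustration only, not how FME or any other Spatial ETL tool works internally. The attribute names, the `ATTRIBUTE_MAP` table and the feature layout are assumptions made up for this example; the reprojection uses the standard spherical Web Mercator formula.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by spherical Web Mercator

# Hypothetical mapping from source attribute names to destination attribute names.
ATTRIBUTE_MAP = {"RD_NAME": "road_name", "LANES": "lane_count"}


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Reproject a WGS 84 longitude/latitude pair to spherical Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return (x, y)


def transform_feature(feature):
    """Rename mapped attributes, drop unmapped ones, and reproject the coordinates."""
    attributes = {
        ATTRIBUTE_MAP[name]: value
        for name, value in feature["attributes"].items()
        if name in ATTRIBUTE_MAP
    }
    coordinates = [lonlat_to_web_mercator(lon, lat) for lon, lat in feature["coordinates"]]
    return {"attributes": attributes, "coordinates": coordinates}


# Illustrative source feature in the (assumed) source data model.
source = {
    "attributes": {"RD_NAME": "Main St", "LANES": 2, "INTERNAL_ID": 17},
    "coordinates": [(0.0, 0.0), (180.0, 0.0)],
}
result = transform_feature(source)
```

The source feature is left untouched; the destination feature is built separately, in the structure the receiving application expects.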

Ultimately, users need to be able to use data with the tools of their choice, and these tools need to be able to consume data in both the correct format and the appropriate data model. Spatial ETL enables this effective communication of spatial information so users can leverage the power of their spatial data assets.

With the proliferation of not just new formats but also a growing list of applications and new spatial data types, such as 3D and BIM, users have a greater need than ever before to get data into the format and data model they require so that they can immediately use it. This is one of the key values of Spatial ETL; it is a non-intrusive technology which allows the data to remain where it is and does not require it to be physically moved in order to be transformed. This is a sharp contrast to other approaches in which physical data models are dictated to organizations that want to participate in a data sharing project. Spatial ETL makes it easy to access data where it is, in the structure it is already in, by the applications each user prefers.


Figure 3. ETL tools are facilitating the convergence between traditional IT and GIS.


Convergence

In the early days of Spatial ETL, the tools were limited to one type of spatial data. Some focused on vector formats, which encapsulated GIS and CAD, while others focused on raster formats. Today’s Spatial ETL tools must answer to all worlds, as convergence is becoming a common requirement at multiple levels.

First, there’s a convergence of different spatial data types because state-of-the-art spatial/GIS systems now support multiple types of spatial data. For example, Oracle and ESRI database technologies now support vector, raster, tabular, point cloud and 3D data types at the database level. With these database technologies comes a whole new set of applications that can, for the first time, exploit multiple types of spatial data.

Populating these databases is a challenge for users since they must retrieve data from traditional spatial sources and push it into these new databases in order to effectively exploit the new set of spatial tools available to them. To satisfy this requirement, users need a Spatial ETL solution that supports multiple types of spatial data in order to be capable of extracting data from multiple sources, transforming it into the organization’s chosen data model, and loading it into the database (see Figure 2).

Secondly, there’s a convergence between traditional IT and what could historically be considered the mapping or GIS department. With the advent of Google Earth, users are now exposed to spatial data in ways that only a few years ago were impossible. The support for spatial data types within corporate databases such as Oracle and ESRI Geodatabase further facilitates this convergence, as now there is a single data store for all corporate data. This has begun to blur the traditional line between mapping and IT applications. Nowadays, standard IT systems incorporate maps as a new way for users to visualize and analyze their data, while GIS and mapping tools are beginning to leverage standard IT data, thereby providing extra value to organizations.

As the job of traditional ETL tools is to enable data sharing between disparate IT applications, traditional ETL vendors such as Informatica, IBM, and Microsoft are starting to team up with leading Spatial ETL vendors to provide users with complete ETL systems that can move both traditional and spatial data between systems with ease (see figure 3). This convergence of IT and GIS technology is indeed enabling organizations to perform analysis and visualization as never before.  

Constructive Solid Geometries (CSG): CSG enables a model with very complicated geometry to be built out of relatively simple objects. CSG models are typically used in 3D systems for BIM and CAD. The illustrations below are from Wikipedia and show how a very complex object can be constructed from very simple objects.

 

New Data Types

The geospatial industry has traditionally focused on exterior spaces such as countries, counties, cities, and parcels. Until recently, geospatial data would end at the building footprint and would not contain true 3D geometries, only elevation, or 2½D, data. In the past year, the geospatial community’s interest in Building Information Model (BIM) data has increased substantially. This is reflected in the fact that Oracle and ESRI have extended their databases to support this growing requirement for 3D data storage.

BIM and 3D data have historically been the domain of the Architecture, Engineering and Construction (AEC) community. The AEC community has great experience working with BIM models and is able to create entire 3D models of building construction projects, greatly improving the efficiency of the construction process. Convergence between the world of BIM (interior) data and the world of GIS (exterior) data promises a marriage that will open up a whole new set of opportunities. Imagine making a city’s entire infrastructure, not just the exterior infrastructure, available at the fingertips of its occupants, builders, and emergency responders! This marriage can revolutionize emergency response and improve many other facets of city living. For example, a firefighter will know exactly how much hose is required to reach a specific area within a building on fire. This potentially life-saving information is a direct result of combining the power of GIS and BIM. Google and Microsoft have also inspired people’s imaginations by building 3D views of cities. Currently, these models are exterior views of buildings which construct a model of the “cityscape.” The future possibilities for convergence are clear: complete cities with both “interior” and “exterior” information available in an integrated environment.

The key to making the marriage of BIM, 3D and GIS work is the data model transformation capabilities of Spatial ETL tools. This adds a whole new dimension to the development process for Spatial ETL tools like FME. Embracing a new data type requires Spatial ETL vendors to first identify leading data sources and targets to ensure the most significant impact in the new market. In its first foray into 3D, Safe Software found that one of the best sources of 3D building data is a format called IFC (Industry Foundation Classes). This format is an open specification developed by the International Alliance for Interoperability (IAI) to facilitate interoperability in the building industry. According to Safe Software, the best data targets are the leading databases with which they were already familiar, such as Oracle and ESRI Geodatabase, and Adobe PDF for its impressive support for 3D data.

When Safe Software began developing support for 3D data, they first analyzed the 3D models of several different Geospatial systems so that they had some level of confidence in the new 3D data model that they had created within their technology. Since they have added support for new data types before, they were not surprised to find out that some of their initial IFC to PDF translation tests were missing data!

It turns out that IFC supports a 3D concept called Constructive Solid Geometries (CSG, see box section) which the other systems don’t support. The challenge was to resolve this substantial data model conflict so that data could be moved from a system that supports CSG to systems that do not. To be effective, the solution also had to ensure that CSG objects were not lost when moving data between two CSG-capable systems.

This geometric requirement is yet another example of the data model inconsistencies that must be resolved by Spatial ETL tools in order to achieve effective communication from one system to another. It also illustrates that the data model transformation which occurs in Spatial ETL tools is often not just about attribution, but also about geometric representation.

These conflicts are nothing new to Spatial ETL. Another example, albeit simpler, that Safe Software has addressed within the 2D market space is resolving the movement of data from systems that support arcs to those that only support straight line segments.
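The arc example can be sketched in a few lines of Python. The function below replaces a circular arc with straight segments, choosing the number of segments so that no chord strays from the true arc by more than a given tolerance. This is a generic geometric approach offered for illustration; it is not a description of Safe Software’s implementation, and the function name and parameters are assumptions.

```python
import math


def stroke_arc(center, radius, start_angle, end_angle, tolerance):
    """Approximate a counter-clockwise circular arc with straight segments.

    Angles are in radians. `tolerance` is the largest allowed gap between
    a chord and the true arc, in the same units as `radius`.
    """
    sweep = end_angle - start_angle
    if sweep <= 0:
        raise ValueError("end_angle must be greater than start_angle")
    if not 0 < tolerance < radius:
        raise ValueError("tolerance must be between 0 and radius")
    # A chord spanning angle t deviates from the arc by radius * (1 - cos(t / 2)).
    max_step = 2 * math.acos(1 - tolerance / radius)
    segments = max(1, math.ceil(sweep / max_step))
    step = sweep / segments
    cx, cy = center
    return [
        (cx + radius * math.cos(start_angle + i * step),
         cy + radius * math.sin(start_angle + i * step))
        for i in range(segments + 1)
    ]


# A quarter circle of radius 100 stroked to within 0.5 units of the true arc.
points = stroke_arc((0.0, 0.0), 100.0, 0.0, math.pi / 2, tolerance=0.5)
```

A tighter tolerance yields more vertices, so the destination system trades fidelity against data volume. Going the other way, from segments back to arcs, is much harder, which is why such conflicts are best resolved deliberately rather than by accident.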


Boolean union: The merger of two objects into one.


Boolean difference: The subtraction of one object from another.

 

Boolean intersection: The portion common to both objects.
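The three Boolean operations illustrated above can be sketched in Python as a small tree of objects. In this simplified illustration a solid is described only by whether it contains a given point; real CSG systems such as IFC carry far richer definitions, and all class names here are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sphere:
    center: tuple
    radius: float

    def contains(self, p):
        return sum((a - b) ** 2 for a, b in zip(p, self.center)) <= self.radius ** 2


@dataclass(frozen=True)
class Box:
    lower: tuple
    upper: tuple

    def contains(self, p):
        return all(lo <= v <= hi for lo, v, hi in zip(self.lower, p, self.upper))


@dataclass(frozen=True)
class Union:
    a: object
    b: object

    def contains(self, p):
        # Merger of two objects: the point is in either one.
        return self.a.contains(p) or self.b.contains(p)


@dataclass(frozen=True)
class Difference:
    a: object
    b: object

    def contains(self, p):
        # Subtraction: the point is in the first object but not the second.
        return self.a.contains(p) and not self.b.contains(p)


@dataclass(frozen=True)
class Intersection:
    a: object
    b: object

    def contains(self, p):
        # Common portion: the point is in both objects.
        return self.a.contains(p) and self.b.contains(p)


# A cube with rounded corners and a spherical cavity at its centre.
rounded_cube = Intersection(Box((-1, -1, -1), (1, 1, 1)), Sphere((0, 0, 0), 1.3))
solid = Difference(rounded_cube, Sphere((0, 0, 0), 0.5))
```

The complex object is never stored as explicit surfaces; it exists only as a recipe of simple objects and operations. A system without CSG support needs that recipe evaluated into explicit geometry, a step this sketch does not attempt.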

 

Extending Spatial ETL to the Web

No discussion surrounding Spatial ETL would be complete without addressing the role that Spatial ETL plays within upcoming Spatial Data Infrastructure (SDI) projects. As already expressed, the role of Spatial ETL is always about getting data from one or more data stores into a form that can immediately be used. Traditional Spatial ETL has typically been a batch process which is run periodically. While there have been instances where it has been used to transform spatial data in a live process, these projects have been infrequent because of the burden they imposed on the GIS department.

Web service technologies and new Spatial ETL server solutions have now come together, making dynamic, or on-the-fly, Spatial ETL available for the first time. Dynamic Spatial ETL enables web services to serve data to users in a data model that is totally different from the data model of the underlying data. It is even possible for a single web site to provide different user communities with different views of the data through the power of dynamic Spatial ETL. This is fundamental for an SDI project to be effective, as different user communities require different views of the world and have different priorities of what they need to see.

Until this technology emerged, SDI initiatives were reminiscent of the Model T Ford, which customers could have in any color they wanted as long as it was black. But as we learned with traditional Spatial ETL: when it comes to data, one size doesn’t fit all.

The need for specific data models for distinct communities is best demonstrated by the INSPIRE project in the European Union. This project has the challenge of building a pan-European SDI that will serve users in multiple countries. To be effective, the INSPIRE SDI must be able to serve the same data to users who speak different languages, and thus must also serve the data in different languages. If ever there was a need for a single system to remodel data on-the-fly, then this is it.
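As a rough illustration of on-the-fly remodeling, the Python sketch below stores each feature once and builds a community-specific view of it at request time, renaming fields and translating coded values. The stored schema, the view definitions and the sample records are all invented for this example; they are not INSPIRE data specifications.

```python
# Stored once, in the data provider's own data model (illustrative records).
STORED_FEATURES = [
    {"id": 1, "type": "river", "length_km": 12.5},
    {"id": 2, "type": "lake", "length_km": 3.0},
]

# Hypothetical view definitions: how each community wants fields named
# and coded values expressed.
VIEWS = {
    "en": {
        "fields": {"type": "feature_type", "length_km": "length_km"},
        "values": {"river": "river", "lake": "lake"},
    },
    "de": {
        "fields": {"type": "objektart", "length_km": "laenge_km"},
        "values": {"river": "Fluss", "lake": "See"},
    },
}


def remodel(feature, view):
    """Build a new feature in the view's data model; the stored feature is not modified."""
    remodelled = {"id": feature["id"]}
    for source_field, target_field in view["fields"].items():
        value = feature[source_field]
        if isinstance(value, str):
            # Translate coded text values when the view defines a translation.
            value = view["values"].get(value, value)
        remodelled[target_field] = value
    return remodelled


def serve(community):
    """Return every stored feature remodelled for one user community."""
    view = VIEWS[community]
    return [remodel(feature, view) for feature in STORED_FEATURES]


german_view = serve("de")
```

Because the remodeling happens per request, adding another language or community means adding a view definition, not duplicating the underlying data.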

While there are standards that all Spatial ETL servers must support, a proliferation of web protocols, or formats, is also occurring. Here the winning approach is once again for Spatial ETL tools to support as many different web formats as possible, for example Open Geospatial Consortium (OGC) protocols, GeoRSS, GeoJSON, KML, and GML.
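To show what supporting several web formats can look like in practice, the following Python sketch writes the same point feature as GeoJSON and as KML through a simple table of writers. It is a minimal example using only the standard library; the function names and the `WRITERS` table are assumptions, and a production tool would cover far more of each format.

```python
import json
from xml.sax.saxutils import escape


def to_geojson(name, lon, lat):
    """Serialize a point as a GeoJSON Feature (coordinates are longitude, latitude)."""
    return json.dumps({
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    })


def to_kml(name, lon, lat):
    """Serialize the same point as a KML Placemark."""
    return (
        '<kml xmlns="http://www.opengis.net/kml/2.2">'
        f"<Placemark><name>{escape(name)}</name>"
        f"<Point><coordinates>{lon},{lat}</coordinates></Point>"
        "</Placemark></kml>"
    )


# Supporting another web format means registering another writer here.
WRITERS = {"geojson": to_geojson, "kml": to_kml}


def write_feature(fmt, name, lon, lat):
    """Write one point feature in the requested web format."""
    return WRITERS[fmt](name, lon, lat)


geojson_text = write_feature("geojson", "A & B", -123.1, 49.25)
kml_text = write_feature("kml", "A & B", -123.1, 49.25)
```

Keeping the feature independent of any one output format is what lets a tool add formats as the market shifts, without touching the data or the transformation logic.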

A Spatial ETL vendor’s role is not to try to predict which formats are going to succeed in the marketplace, but rather to support as many of the leading technologies as possible, thereby allowing the market to decide. When it comes to web technologies, advancements move very quickly and are incredibly exciting to watch.

While one major use of Spatial ETL server technologies is to make data available to web applications and users, conversely, web technologies can be used as their own source for spatial data and services. An example of this is MapQuest’s recent release of a new web API that provides users with a set of routing and mapping capabilities. There are many other new web services being released all the time. In fact, the OGC has just announced its Web Processing Service (WPS) standard which will enable more and more web services to become available.

 

The Future for Spatial ETL

Throughout the industry, we are seeing an explosion of data sources in a wide variety of areas, from new data types to databases and web services. At the same time, we are seeing a great increase in the number of applications that require access to that data. With this proliferation of applications and data, the need for Spatial ETL keeps growing because, more than ever, it is all about the data.

 

Don Murray (don.murray@safe.com) is President of Safe Software.

Have a look at www.safe.com




