Our society faces major challenges such as climate change and loss of ecosystems. To address these challenges, we have to find optimised, sustainable solutions. In this process, data is indispensable.
Thanks to Copernicus, INSPIRE, and the Open Data Directive, more and more geodata has become publicly available. However, this data is merely the tip of the proverbial iceberg. Large swathes of relevant data remain hidden from view due to security concerns and legal obligations such as the GDPR. The new European Data Strategy aims to make this pool of data more accessible by introducing data spaces, a concept that has already proven successful in the automotive industry. This article explains the differences between existing spatial data infrastructures (SDIs) and data spaces. It describes which issues data spaces can resolve and what needs to be kept in mind when implementing them.
The automotive industry is good at finding efficient solutions to incredibly complex issues. Its products need to comply with a myriad of legal and economic requirements at every stage of elaborate research, development, and production chains. To make these processes more cost-effective and secure, car manufacturers and their suppliers exchange data.
Originally, companies had only limited and often flawed insight into their supply chains. To remedy the issues that kept cropping up due to this lack of transparency, the Catena-X Data Space was created. This data space allows all participating organisations to exchange their data on a single, secure platform. The contributing organisations jointly decide who is allowed access to which data, and for what purpose.
Due to the success of this approach, the new European Data Strategy builds upon the concept of data spaces to unlock the hidden potential of closed data in other sectors. Data spaces are set to support several strategic sectors, such as Agriculture, Environment, Energy, Finance, Healthcare, Manufacturing, Mobility, and Public Authorities. In addition, a data space to support the implementation of the Green Deal is already in the works. Every data space will contain a combination of public and proprietary data from both companies and governmental organisations.
What is a data space?
Governance is central to the data space concept. It consists of a set of rules and standards that establish roles and their corresponding access levels within a data space, as well as the technical implementation of those rules. For example, data providers can allow their data to be used within a training pool for AI models, but severely limit the export of that data outside of the data space. Common technical standards will have to be agreed upon as well, particularly data models such as INSPIRE, XPlanung, or 3A/NAS for the Geospatial and Environmental sectors.
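The governance example above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the role names, the actions, and the `UsagePolicy` and `DataSpaceRules` classes are hypothetical and do not come from any data space standard or connector.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical usage policy a provider attaches to one data set."""
    allowed_purposes: frozenset
    allow_export: bool = False  # export out of the data space is off by default


@dataclass
class DataSpaceRules:
    """Hypothetical rule set: maps each role to the actions it may perform."""
    role_permissions: dict = field(default_factory=dict)

    def is_allowed(self, role, action, purpose, policy):
        # 1. The role must be permitted to perform the action at all.
        if action not in self.role_permissions.get(role, set()):
            return False
        # 2. Export is additionally controlled by the provider's policy.
        if action == "export":
            return policy.allow_export
        # 3. Any other use must match a purpose the provider agreed to.
        return purpose in policy.allowed_purposes


rules = DataSpaceRules(role_permissions={
    "provider": {"read", "export"},
    "model_trainer": {"read"},
})
parcel_policy = UsagePolicy(allowed_purposes=frozenset({"ai_training"}))

can_train = rules.is_allowed("model_trainer", "read", "ai_training", parcel_policy)
can_market = rules.is_allowed("model_trainer", "read", "marketing", parcel_policy)
can_export = rules.is_allowed("provider", "export", "ai_training", parcel_policy)
```

In this sketch the data may be read for AI training, but not for an unrelated purpose, and it may not leave the data space even when the requesting role is generally allowed to export.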
As with SDIs, a data space is a distributed architecture, and its source data is heterogeneous. Every contributing organisation can host its data in whatever manner it desires, be it on premises or in the cloud. Controlled access to that data can be securely managed through an adapter such as the Eclipse Dataspace Connector.
All data sets within a data space are interoperable. That does not mean that all data needs to conform to the same format or schema, but rather that data sets can be integrated and harmonised automatically as required. This relies on matching and mapping technology, such as annotations (“This is a parcel.”). ETL tools like hale»studio and hale»connect can use these annotations to automatically prepare data for processors in different parts of the data space.
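How annotations enable automatic harmonisation can be illustrated with a minimal sketch. It is not the hale»studio or hale»connect API; the provider names, field names, and concept identifiers such as `parcel.id` are assumptions made purely for this example.

```python
# Hypothetical annotations: each provider tags its own field names
# with a shared concept ("this field is the parcel identifier").
ANNOTATIONS = {
    "provider_a": {"flurstueck_id": "parcel.id", "flaeche_m2": "parcel.area_m2"},
    "provider_b": {"parcel_no": "parcel.id", "area_ha": "parcel.area_ha"},
}


def harmonise(provider, record):
    """Map one source record to a common target structure via its annotations."""
    result = {}
    for field_name, value in record.items():
        concept = ANNOTATIONS[provider].get(field_name)
        if concept == "parcel.id":
            result["id"] = str(value)
        elif concept == "parcel.area_m2":
            result["area_m2"] = float(value)
        elif concept == "parcel.area_ha":
            # Convert hectares to square metres for the common schema.
            result["area_m2"] = float(value) * 10_000
        # Fields without an annotation are not passed on.
    return result


record_a = harmonise("provider_a", {"flurstueck_id": 4711, "flaeche_m2": 1250})
record_b = harmonise("provider_b", {"parcel_no": "B-17", "area_ha": 0.5, "owner": "n/a"})
```

Both providers keep their own schema. Only the annotations tell the processing step which source field corresponds to which shared concept, so both records end up with the same `id` and `area_m2` keys.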
Such processing services are another integral part of the data space. Communal rules dictate how these services are allowed to access and use the data, for example whether or not data is allowed to be temporarily cached during processing. Trust plays an important part in this. Starting in 2022, a formal certification programme for processing services will be available. Once a service is certified, all participants in the data space can be certain that this service will do exactly what it claims to do.
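A simple admission check shows how such communal rules could be enforced before a service is allowed to touch any data. The sketch below is an assumption for illustration only: `ProcessingRules`, `ServiceDescription`, and the two rule flags are hypothetical names, not part of an actual certification scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingRules:
    """Hypothetical communal rules for processing services in one data space."""
    require_certification: bool = True
    allow_temporary_cache: bool = False


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical self-description a processing service submits."""
    name: str
    certified: bool
    uses_temporary_cache: bool


def violations(service, rules):
    """Return the list of rules the service would break (empty means admitted)."""
    problems = []
    if rules.require_certification and not service.certified:
        problems.append("service is not certified")
    if service.uses_temporary_cache and not rules.allow_temporary_cache:
        problems.append("temporary caching is not permitted")
    return problems


rules = ProcessingRules()
trusted = ServiceDescription("growth-model", certified=True, uses_temporary_cache=False)
untrusted = ServiceDescription("fast-tiler", certified=False, uses_temporary_cache=True)

trusted_problems = violations(trusted, rules)
untrusted_problems = violations(untrusted, rules)
```

Returning a list of violations rather than a plain yes or no makes it easier for a service operator to see which rule has to be met before the service is admitted.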
Which issues do data spaces solve?
Creating a data space only makes sense for a concrete use case in which critical data gaps can be closed by using previously inaccessible data. These data gaps need to be defined and thoroughly documented.
Such a data gap also exists in scenarios where there is data available, but not in sufficient quantity to train a useful AI model. Within the security of the data space, a much greater amount of training data can be made available. Since only the final AI model will be exported out of the data space, the confidentiality of the training data is guaranteed.
There is another problem data spaces solve. It is common practice for modern platforms to siphon off and sell large amounts of data without any input from the subjects of said data, be it companies or private citizens. Within a data space, rules can be established to secure data sovereignty. They can also be set up to provide a more balanced division of the value generated through data, or to provide a “pay as you go” usage model that encompasses only the data a use case requires.
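A “pay as you go” model can be pictured as a usage log that is settled per data provider. The following sketch rests on invented assumptions: the provider names, data set names, and prices are hypothetical, and amounts are kept in whole cents and rounded down per log entry to avoid floating-point issues.

```python
from collections import defaultdict

# Hypothetical prices in cents per 1,000 records used.
PRICE_PER_1000_RECORDS_CENTS = {"sensor_readings": 50, "site_mapping": 200}

# Hypothetical usage log: (provider, data set, number of records actually used).
usage_log = [
    ("forest_owner_a", "sensor_readings", 4000),
    ("forest_owner_b", "site_mapping", 1500),
    ("forest_owner_a", "sensor_readings", 1000),
]


def settle(log, prices):
    """Sum up what each provider is owed, based only on the data actually used."""
    owed = defaultdict(int)
    for provider, data_set, records in log:
        # Integer arithmetic; each entry is rounded down to whole cents.
        owed[provider] += records * prices[data_set] // 1000
    return dict(owed)


settlement = settle(usage_log, PRICE_PER_1000_RECORDS_CENTS)
```

Because the settlement is derived from logged usage, a consumer pays only for the records a use case required, and each provider can trace how its share was calculated.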
To ensure data sovereignty, the data space has to be built upon hardware, software, and operating systems that have been designed and secured to allow for it. Therefore, the data spaces’ infrastructure is being created in collaboration with GAIA-X, Europe’s distributed cloud platform.
What does this mean in the real world?
Our lighthouse project FutureForest.ai is a great way to illustrate the usefulness of data spaces. In this project, wetransform collaborates with TU Munich and TU Berlin, as well as several German state forests and forest research institutes, to create a data space for forestry data. All these organisations provide access to their data within the data space so that better decisions can be made on adapting forests to climate change.
The FF.ai data space combines public data, such as elevation models and land cover maps, with private data, such as sensor data and detailed information from location mapping. The forest owners contribute their data and, in return, benefit from better decision-making models.
This last decade has allowed us to make great strides in terms of spatial data accessibility, chiefly through open data initiatives. Unfortunately, a lack of attention paid to organisational frameworks and data usage conditions still hampers progress. Data spaces provide the infrastructure to solve this issue.
More than projects – the Environmental Data Spaces Community
It is still early days for the implementation and use of data spaces in the geo and GIS community. Many projects are being launched, both nationally and internationally. To create a network between all the different parties currently involved in these projects and provide more developmental continuity, wetransform has established the Environmental Data Spaces Community. Together with several partners, and building on the framework laid out by the International Data Spaces Association, which sets the standards for data spaces, wetransform supports the creation of diverse data ecosystems. The goal is to make environmental data accessible and usable inside a secure data space that protects data sovereignty.
Author: Thorsten Reitz, gis.Business 2/2022