Geospatial Queries on Data Collection Using a Common Provenance Model

Journal

Publisher

MDPI
DOI: 10.3390/ijgi10030139

Keywords

provenance; lineage; graph; data queries; metadata

Funding

  1. Catalan Government [SGR2017 1690]
  2. European Union [641762-2, 776740, 689443]
  3. Spanish MCIU Ministry through the NEWFORLAND project (MCIU/AEI/ERDF, EU) [RTI2018-099397-B-C21/22]
  4. ICREA Academia Excellence in Research Grant

This paper discusses the use of lineage information for geospatial data. It proposes an extension of process step lineage descriptions and introduces a system that provides lineage information as a service. Including functionality and algorithm descriptions in lineage yields high-level information that is independent of software details and allows processing steps to be transformed into reusable workflows.
Lineage information is the part of the metadata that describes what, when, who, how, and where geospatial data were generated. If it is well presented and queryable, lineage becomes very useful for inferring data quality, tracing error sources, and increasing trust in geospatial information. In addition, if the lineage of a collection of datasets can be related and presented together, then datasets, process chains, and methodologies can be compared. This paper proposes extending process step lineage descriptions into four explicit levels of abstraction (process run, tool, algorithm, and functionality). Including functionality and algorithm descriptions as part of lineage provides high-level information that is independent of the details of the software used. It therefore becomes possible to transform lineage metadata that initially documents specific processing steps into a reusable workflow that describes a set of operations as a processing chain. This paper presents a system that provides lineage information as a service in a distributed environment. The system is complemented by an integrated provenance web application that can visualize and query a provenance graph composed of the lineage of a collection of datasets. The International Organization for Standardization (ISO) 19115 standards family and the World Wide Web Consortium (W3C) provenance initiative (W3C PROV) were combined to integrate the provenance of a collection of datasets. To represent lineage elements, the ISO 19115-2 lineage class names were chosen because they name the geospatial objects involved more precisely. The relationship naming conventions of W3C PROV are used to represent the relationships among these elements. The elements and relationships are presented in a queryable graph.
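As a rough illustration of the model described in the abstract, the following Python sketch stores each process step at the four levels of abstraction and links datasets to steps with W3C PROV relation names (`wasGeneratedBy`, `used`). It is not the paper's implementation: the dataset names, run identifiers, tool names, and the `ProcessStep` structure are hypothetical, and the in-memory edge list merely stands in for the queryable provenance graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessStep:
    """One process step described at four levels of abstraction."""
    run: str            # process run: one concrete execution (hypothetical id)
    tool: str           # software that executed the run (hypothetical name)
    algorithm: str      # algorithm the tool implements
    functionality: str  # software-independent description of the operation


# Hypothetical process steps for a small collection of datasets.
STEPS = {
    "run-1": ProcessStep("run-1", "tool-A", "dark object subtraction",
                         "atmospheric correction"),
    "run-2": ProcessStep("run-2", "tool-B", "NDVI", "vegetation index"),
}

# Edges are (subject, relation, object); relation names follow W3C PROV.
EDGES = [
    ("ndvi.tif", "wasGeneratedBy", "run-2"),
    ("run-2", "used", "reflectance.tif"),
    ("reflectance.tif", "wasGeneratedBy", "run-1"),
    ("run-1", "used", "scene_raw.tif"),
]


def objects(subject, relation):
    """Return every object linked from `subject` by `relation`."""
    return [o for s, r, o in EDGES if s == subject and r == relation]


def upstream_sources(dataset):
    """List all datasets that `dataset` transitively derives from."""
    found, stack = [], [dataset]
    while stack:
        current = stack.pop()
        for run in objects(current, "wasGeneratedBy"):
            for source in objects(run, "used"):
                if source not in found:
                    found.append(source)
                    stack.append(source)
    return found


def workflow(dataset, _seen=None):
    """Return the functionalities that produce `dataset`, earliest first.

    Dropping the run, tool, and algorithm levels leaves a
    software-independent processing chain.
    """
    seen = set() if _seen is None else _seen
    ordered = []
    for run in objects(dataset, "wasGeneratedBy"):
        if run in seen:
            continue
        seen.add(run)
        # Visit the inputs first so earlier operations come first.
        for source in objects(run, "used"):
            ordered.extend(workflow(source, seen))
        ordered.append(STEPS[run].functionality)
    return ordered
```

Under these assumptions, `upstream_sources("ndvi.tif")` traces the inputs behind a dataset, which is the kind of query that supports error tracing, while `workflow("ndvi.tif")` keeps only the functionality level and so yields a chain of operations that could be reused with different software.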
