Data flow, standardization, and quality control


Contact
brenner.silva [ at ] awi.de

Abstract

Earth system cyberinfrastructures include three types of data services: repositories, collections, and federations. These services arrange data by their purpose, level of integration, and governance. For instance, registered data of uniform measurements fulfill the goal of publication but do not necessarily play a part in a data flow system. The data repository provides the first, and a high, level of integration, which depends strongly on the standardization of incoming data. Applications within the Digital Earth showcases connect repositories and federated databases to the end user, the scientist. One example is the Observation to Archive and Analysis (O2A) framework, which is operational and continuously developed at the Alfred Wegener Institute, Bremerhaven. O2A uses OGC standards and a representational state transfer (REST) architecture, in which both data and interface operations are openly available. A data repository is one component of the O2A framework, and much of its functionality, for instance near real-time monitoring, depends on the standardization of incoming data. In this context, we develop a modular approach that provides standardization and quality control for monitoring the ingested data. Two modules are under development: first, the driver module, which transforms tabular data into a standardized format; second, the quality control module, which runs quality tests on the ingested data. Both modules rely on the sensor operator and the data scientist, two actors who interact with both ends of the ingest component of the O2A framework (http://data.awi.de/o2a-doc). The result is harmonized data from multiple sources, accessible at the end point, that is, the web service of the data repository (https://dashboard.awi.de/data-xxl/). Here we present the concepts and current development aimed at enhanced monitoring and scientific workflows, with a special focus on the driver and quality control modules.
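
The division of labour between the two modules can be illustrated with a minimal sketch. It is not the O2A implementation: the column names, the parameter mapping, the valid ranges, the flag labels, and the function names `drive`, `range_test`, and `quality_control` are all hypothetical, and the input timestamps are assumed to be naive ISO 8601 strings in UTC. The driver step turns a tabular file into uniform records, and the quality control step attaches a flag from a simple range test to each record.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical mapping from raw column headers to standardized names and units.
COLUMN_MAP = {
    "temp_degC": ("temperature", "degC"),
    "sal_psu": ("salinity", "PSU"),
}

# Hypothetical plausible ranges used by the range test.
VALID_RANGE = {
    "temperature": (-2.5, 40.0),
    "salinity": (0.0, 42.0),
}


def drive(raw_text, time_column="time"):
    """Driver step: convert tabular text into (time, parameter, value, unit) records."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Assumption: timestamps are naive ISO 8601 strings recorded in UTC.
        timestamp = datetime.fromisoformat(row[time_column]).replace(tzinfo=timezone.utc)
        for column, (parameter, unit) in COLUMN_MAP.items():
            cell = (row.get(column) or "").strip()
            records.append({
                "time": timestamp,
                "parameter": parameter,
                "value": float(cell) if cell else None,
                "unit": unit,
            })
    return records


def range_test(record):
    """Quality test: flag a record as 'missing', 'good', or 'bad' (hypothetical labels)."""
    value = record["value"]
    if value is None:
        return "missing"
    low, high = VALID_RANGE[record["parameter"]]
    return "good" if low <= value <= high else "bad"


def quality_control(records):
    """Quality control step: return copies of the records with a flag attached."""
    return [dict(record, flag=range_test(record)) for record in records]


# Example input: two rows, the second with an implausible temperature and no salinity.
raw = (
    "time,temp_degC,sal_psu\n"
    "2021-01-01T00:00:00,4.2,34.9\n"
    "2021-01-01T00:10:00,99.0,\n"
)
flagged = quality_control(drive(raw))
for record in flagged:
    print(record["time"].isoformat(), record["parameter"], record["value"], record["flag"])
```

Keeping the two steps separate mirrors the modular approach described above: the driver only needs to know the layout of an incoming file, while the quality tests operate on the standardized records regardless of their source.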



Item Type
Conference (Talk)
Authors
Divisions
Primary Division
Programs
Primary Topic
Helmholtz Cross Cutting Activity (2021-2027)
Publication Status
Published
Event Details
5th Data Science Symposium.
Eprint ID
53695
DOI 10.5281/zenodo.4546067

Cite as
Silva, B., Software Engineering Team and Computing and Data Centre (2021): Data flow, standardization, and quality control, 5th Data Science Symposium. doi: 10.5281/zenodo.4546067


Download
Dataflow_Driver_Quality.pdf




Geographical region
N/A

Research Platforms
N/A

Campaigns
N/A

