Data flow, standardization, and quality control
Earth system cyberinfrastructures include three types of data services: repositories, collections, and federations. These services organize data according to their purpose, level of integration, and governance. For instance, a registered collection of uniform measurements fulfills the goal of publication but does not necessarily take part in a data flow system. The data repository provides the first and highest level of integration, which strongly depends on the standardization of incoming data. Applications within the Digital Earth showcases connect repositories and federated databases to the end user, the scientist.

One example is the Observation to Archive and Analysis (O2A) framework, which is operational and under continuous development at the Alfred-Wegener-Institute, Bremerhaven. O2A uses OGC standards and a representational state transfer (REST) architecture, in which both data and interface operations are openly available. A data repository is one component of the O2A framework, and much of its functionality, for instance near real-time monitoring, depends on the standardization of the incoming data.

In this context, we are developing a modular approach that provides standardization and quality control for monitoring of the ingested data. Two modules are under development: the driver module, which transforms tabular data into a standardized format, and the quality control module, which runs quality tests on the ingested data. Both modules rely on the sensor operator and the data scientist, two actors who interact with both ends of the ingest component of the O2A framework (http://data.awi.de/o2a-doc). The result is harmonized data from multiple sources, accessible at the end point, i.e., the web service of the data repository (https://dashboard.awi.de/data-xxl/). Here we focus on the concepts and current developments that aim at enhanced monitoring and scientific workflows, with a special focus on the driver and quality control modules.
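To illustrate the role of the driver module, the following minimal sketch maps raw, sensor-specific tabular output onto a standardized layout (UTC ISO 8601 timestamps and harmonized parameter names). The column names, delimiter, units, and mapping are illustrative assumptions, not the actual O2A driver specification.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical mapping from raw sensor column names to standardized parameter names.
COLUMN_MAP = {
    "temp": "temperature_degC",
    "sal": "salinity_psu",
}


def standardize(raw_csv):
    """Convert raw, sensor-specific CSV into a list of standardized records."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv), delimiter=";"):
        # Normalize the timestamp to UTC ISO 8601 (assumed raw format: DD.MM.YYYY HH:MM).
        stamp = datetime.strptime(row["datetime"], "%d.%m.%Y %H:%M")
        record = {"datetime": stamp.replace(tzinfo=timezone.utc).isoformat()}
        # Rename the measured parameters and cast the values to floats.
        for raw_name, std_name in COLUMN_MAP.items():
            record[std_name] = float(row[raw_name])
        records.append(record)
    return records


if __name__ == "__main__":
    raw = "datetime;temp;sal\n01.06.2020 12:00;4.2;34.1\n01.06.2020 12:30;4.3;34.0\n"
    for rec in standardize(raw):
        print(rec)
```

In such a design, only the driver is sensor specific; everything downstream of it operates on one uniform record layout.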
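The quality control module can be sketched in the same spirit as a per-observation test that assigns a quality flag. The valid ranges and the flag scheme (1 = good, 4 = bad) below are illustrative assumptions rather than the actual O2A quality-control configuration.

```python
# Hypothetical valid ranges per standardized parameter.
VALID_RANGES = {
    "temperature_degC": (-2.0, 40.0),
    "salinity_psu": (0.0, 42.0),
}

GOOD, BAD = 1, 4  # assumed flag values


def range_test(record):
    """Return a quality flag for every tested parameter present in a record."""
    flags = {}
    for name, (lower, upper) in VALID_RANGES.items():
        if name not in record:
            continue
        flags[name] = GOOD if lower <= record[name] <= upper else BAD
    return flags


if __name__ == "__main__":
    rec = {
        "datetime": "2020-06-01T12:00:00+00:00",
        "temperature_degC": 4.2,
        "salinity_psu": 55.0,
    }
    print(range_test(rec))  # salinity is out of range and receives the "bad" flag
```

Flagging rather than discarding suspicious values keeps the ingested time series complete while still allowing dashboards and downstream analyses to filter by quality.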