Towards an interoperable digital ecosystem in Earth System Science research
Earth System Science (ESS) relies on the availability of data from a wide variety of sources spanning different disciplines. Data sources are accordingly rich and diverse, including observatories, satellites, measurement campaigns, model simulations, case studies, laboratory experiments, and citizen science. At the same time, practices of professional research data management (RDM) differ significantly among disciplines. Enabling a free flow of data in the sense of the FAIR criteria poses many well-known challenges, among them data quality assurance, unique digital identifiers, and access to and integration of data repositories. The Helmholtz DataHub Earth&Environment addresses digitalization in ESS by developing a federated data infrastructure. Existing RDM practices at seven centers of the Helmholtz Association, which work together in a joint research program within the Research Field Earth and Environment (RF E&E), are being harmonized and integrated. The vision is to establish a digital research ecosystem that fosters digitalization in the geosciences and environmental sciences. Common metadata standards, digital object identifiers for samples, instruments, and datasets, and defined roles for data sharing are central to this effort. The various data-generating infrastructures are registered digitally so that metadata can be collected as early as possible and enriched throughout the research cycle. Joint RDM across several institutions relies on professional practices of distributed software development. In addition to operating cross-center software development teams, the solutions build on modular software design. For example, a generic framework has been developed that allows tools for domain-specific data exploration to be built quickly and in a distributed manner. Other tools incorporate automated quality control in data streams.
Software is developed following the guiding principles of open and reusable research software. A suite of views supports different user perspectives, such as monitoring data flows from sensor to archive or publishing data in quality-assured repositories. In addition, high-level data products are provided for stakeholders and knowledge transfer (for examples, see https://datahub.erde-und-umwelt.de). Tools for integrated data analysis, e.g. AI approaches for marine litter detection, can be implemented on top of the existing software stack. This initiative does not exist in isolation: it is part of a long-term strategy embedded within national (e.g. NFDI) and international (e.g. EOSC, RDA) initiatives.