Intensive research in the earth sciences over the past decades has produced a wealth of literature, data and material collections. So far, literature, data and sample collections have been kept separate. Information technology and the internet, in particular the new cyberinfrastructures for the earth sciences, offer ways to interlink the three, creating the potential for new interpretations of data and materials beyond those already published in the literature. To achieve this, technical, editorial and custodial issues need to be resolved. A key element is the use of persistent identifiers for literature, data and sample collection objects. Past experience has shown that URLs are transient, but systems of persistent identifiers (e.g. handle.net, DOI, URN) already exist and can be used to reference these objects. The project "Publication and citation of scientific primary data" (STD-DOI) demonstrates in prototype form how these requirements can be met and implements a system for the publication of scientific data that is open to the scientific community in any field. The project uses persistent identifiers (DOI, handle.net and URN) to identify datasets available in digital form. In addition, the data publications may be included in the catalogue of the German National Library of Science and Technology (TIB). Data at finer granularity are identified only by generic handle.net IDs, not by DOIs. Ideally, literature would itself reference the materials used and the data derived from them. Since this is not yet common practice, repositories that publish data and track sample material record in their databases the literature based on these data and samples. In the STD-DOI project, the metadata profile includes identifiers of related material, e.g. literature interpreting the data, related datasets, or samples from which the data were derived.
These metadata can then be used to create ontologies interlinking literature, data and samples. The challenge ahead is to interlink literature, data and samples with as little editorial work as possible. Keeping the effort small is essential if the back catalogue of existing works is to be indexed. A key technology for this task is the automatic generation of ontologies by text mining applications. These ontologies can be combined with ontologies generated from reference lists and from metadata. This presentation will look at existing systems for data publication (STD-DOI project, http://www.std-doi.de), sample identification (SESAR project, http://www.geosamples.org) and the management of interconnected literature, data publications and sample collections (TaxonConcept, http://taxonconcept.stratigraphy.net), and at how these systems can be used to enable new discoveries in the earth sciences.
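As an illustration of how related-material identifiers in dataset metadata could be turned into links between literature, data and samples, the following is a minimal Python sketch. It is not the STD-DOI metadata profile: the record layout, the relation names (`isInterpretedBy`, `isDerivedFrom`) and all identifiers are hypothetical placeholders chosen for the example.

```python
from collections import defaultdict

# Hypothetical dataset records. Identifiers and relation names are
# placeholders for illustration, not values from any real repository.
RECORDS = [
    {
        "id": "doi:10.0000/example-dataset",
        "related": [
            ("isInterpretedBy", "doi:10.0000/example-article"),
            ("isDerivedFrom", "sample:EXAMPLE-001"),
        ],
    },
    {
        "id": "doi:10.0000/example-dataset-2",
        "related": [("isDerivedFrom", "sample:EXAMPLE-001")],
    },
]


def build_triples(records):
    """Flatten records into (subject, predicate, object) triples."""
    return [(r["id"], pred, obj) for r in records for pred, obj in r["related"]]


def index_by_object(triples):
    """Map each referenced identifier to the (subject, predicate) pairs citing it."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[obj].add((subject, predicate))
    return index


def datasets_for_sample(index, sample_id):
    """Return, in sorted order, the datasets recorded as derived from a sample."""
    return sorted(
        subject
        for subject, predicate in index.get(sample_id, ())
        if predicate == "isDerivedFrom"
    )


triples = build_triples(RECORDS)
index = index_by_object(triples)
print(datasets_for_sample(index, "sample:EXAMPLE-001"))
```

Storing the links as plain triples keeps the sketch close to how ontology tools represent relations, so links mined from text or from reference lists could be merged into the same list.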