Refining data–data and data–model vegetation comparisons using the Earth mover's distance (EMD)


Contact
ulrike.herzschuh [ at ] awii.de

Abstract

Comparing temporal and spatial vegetation changes between reconstructions or between reconstructions and model simulations requires carefully selecting an appropriate evaluation metric. A common way of comparing reconstructed and simulated vegetation changes involves measuring the agreement between pollen- or model-derived unary vegetation estimates, such as the biome or plant functional type (PFT) with the highest affinity score. While this approach of summarising the vegetation signal into unary vegetation estimates performs well in general, it overlooks the details of the underlying vegetation structure. This structure can nevertheless influence conclusions, since minor variations in pollen percentages can change which biome or PFT has the highest affinity score (i.e. change the unary vegetation estimate). To overcome this limitation, we propose using the Earth mover's distance (EMD) to quantify the mismatch between vegetation distributions such as biome or PFT affinity scores. The EMD circumvents the issue of summarising the data into unary biome or PFT estimates by considering the entire range of biome or PFT affinity scores to calculate a distance between the compared entities. In addition, each type of mismatch can be given a specific weight to account for case-specific ecological distances: for example, reconstructing a temperate forest instead of a boreal forest is ecologically more coherent than reconstructing a temperate forest instead of a desert. We also introduce two EMD-based statistical tests that determine (1) if the similarity of two samples is significantly better than a random association given a particular context and (2) if the pairing between two datasets is better than might be expected by chance.
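As a rough illustration of the idea, the sketch below computes an EMD between two vectors of biome affinity scores with a user-supplied matrix of ecological distances. It is written in Python and is not the paleotools implementation: the function name `emd`, the three biomes, the affinity scores and the cost values are all hypothetical and chosen only to show how two samples with different unary estimates can still be very close in EMD terms. The transport problem is solved here with a small successive-shortest-path min-cost flow in exact rational arithmetic, which is adequate for a handful of biomes but is only one of several possible solvers.

```python
from fractions import Fraction


def _frac(x):
    # Go through str so that 0.1 becomes exactly 1/10, not its binary approximation.
    return Fraction(str(x))


def emd(p, q, cost):
    """EMD between two non-negative score vectors p and q.

    cost[i][j] is the assumed ecological distance between categories i and j.
    Both vectors are normalised to sum to 1 before mass is transported.
    """
    p = [_frac(x) for x in p]
    q = [_frac(x) for x in q]
    sum_p, sum_q = sum(p), sum(q)
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    n, m = len(p), len(q)
    src, snk, size = n + m, n + m + 1, n + m + 2
    # Residual graph: edges[k] = [head, capacity, cost]; k and k ^ 1 are paired.
    edges, adj = [], [[] for _ in range(size)]

    def add_edge(u, v, cap, c):
        adj[u].append(len(edges))
        edges.append([v, cap, c])
        adj[v].append(len(edges))
        edges.append([u, Fraction(0), -c])

    for i in range(n):
        add_edge(src, i, p[i], Fraction(0))
        for j in range(m):
            add_edge(i, n + j, Fraction(1), _frac(cost[i][j]))
    for j in range(m):
        add_edge(n + j, snk, q[j], Fraction(0))

    total = Fraction(0)
    while True:
        # Bellman-Ford: cheapest path from source to sink in the residual graph.
        dist, prev = [None] * size, [None] * size
        dist[src] = Fraction(0)
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] is None:
                    continue
                for k in adj[u]:
                    v, cap, c = edges[k]
                    if cap > 0 and (dist[v] is None or dist[u] + c < dist[v]):
                        dist[v], prev[v], changed = dist[u] + c, k, True
            if not changed:
                break
        if dist[snk] is None:
            break  # all mass has been moved
        # Find the bottleneck capacity along the path, then push that much mass.
        path, v = [], snk
        while v != src:
            path.append(prev[v])
            v = edges[prev[v] ^ 1][0]
        push = min(edges[k][1] for k in path)
        for k in path:
            edges[k][1] -= push
            edges[k ^ 1][1] += push
        total += push * dist[snk]
    return float(total)


def dominant(scores):
    """Index of the highest score, i.e. the unary estimate."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: three biomes and made-up ecological distances.
biomes = ["boreal forest", "temperate forest", "desert"]
cost = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]
sample_a = [0.48, 0.46, 0.06]  # unary estimate: boreal forest
sample_b = [0.45, 0.49, 0.06]  # unary estimate: temperate forest
sample_c = [0.06, 0.46, 0.48]  # unary estimate: desert

print(biomes[dominant(sample_a)], "vs", biomes[dominant(sample_b)],
      "EMD =", emd(sample_a, sample_b, cost))
print(biomes[dominant(sample_a)], "vs", biomes[dominant(sample_c)],
      "EMD =", emd(sample_a, sample_c, cost))
```

With these made-up numbers, samples a and b disagree on the dominant biome although only 3 % of the mass has to move between two ecologically close biomes, whereas samples a and c require moving 42 % of the mass across the largest ecological distance.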
To illustrate the potential and advantages of the EMD and of the tests in vegetation comparison studies, we reproduce case studies based on previously published simulated and reconstructed biome changes for Europe. In case study 1, we use the EMD to refine interpretations of past vegetation changes: flickering unary estimates, which give an impression of high vegetation instability, can correspond to gradual vegetation changes with low EMD values between contiguous samples. In case study 2, we reproduce data–model comparisons for five specific time slices to identify those that are statistically more robust than a random agreement while accounting for the underlying vegetation structure of each pollen sample. The EMD and the statistical tests are included in the paleotools R package (https://github.com/mchevalier2/paleotools, last access: 3 May 2023).
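The abstract does not spell out how the tests are constructed, so the following Python sketch shows only one plausible reading of a "better than chance" pairing test: a permutation test that compares the mean distance of the observed pairing with the mean distance obtained after randomly re-pairing the samples. The function names `emd_ordered` and `pairing_test`, the data, and the simplification that biomes sit at unit spacing along a single gradient (so that the EMD reduces to a sum of absolute cumulative differences) are all assumptions made for illustration, not the procedure of the paper or of paleotools.

```python
import random


def emd_ordered(p, q):
    """EMD for categories ordered on a line with unit spacing.

    In this special case the EMD equals the sum of absolute differences
    between the two cumulative distributions.
    """
    sum_p, sum_q = sum(p), sum(q)
    cum_p = cum_q = total = 0.0
    for x, y in zip(p, q):
        cum_p += x / sum_p
        cum_q += y / sum_q
        total += abs(cum_p - cum_q)
    return total


def pairing_test(recon, model, distance, n_perm=999, seed=0):
    """Permutation test: is the observed pairing closer than random pairings?

    Returns the observed mean distance and a one-sided p-value, i.e. the
    share of pairings (observed one included) that are at least as close.
    """
    rng = random.Random(seed)
    n = len(recon)
    observed = sum(distance(r, m) for r, m in zip(recon, model)) / n
    hits = 0
    for _ in range(n_perm):
        shuffled = model[:]
        rng.shuffle(shuffled)  # break the pairing, keep both datasets intact
        stat = sum(distance(r, m) for r, m in zip(recon, shuffled)) / n
        if stat <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical affinity scores for six sites and three ordered biomes.
recon = [[0.80, 0.15, 0.05], [0.60, 0.30, 0.10], [0.40, 0.40, 0.20],
         [0.20, 0.50, 0.30], [0.10, 0.30, 0.60], [0.05, 0.15, 0.80]]
model = [[0.75, 0.20, 0.05], [0.55, 0.35, 0.10], [0.45, 0.35, 0.20],
         [0.25, 0.45, 0.30], [0.10, 0.35, 0.55], [0.05, 0.20, 0.75]]

observed, p_value = pairing_test(recon, model, emd_ordered)
print("mean EMD of observed pairing:", round(observed, 3))
print("permutation p-value:", p_value)
```

Passing the distance as a function keeps the test independent of how the EMD is computed, so a weighted EMD with case-specific ecological distances could be substituted without changing the permutation logic.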



Item Type
Article
Publication Status
Published online
Eprint ID
58464
DOI 10.5194/cp-19-1043-2023

Cite as
Chevalier, M., Dallmeyer, A., Weitzel, N., Li, C., Baudouin, J. P., Herzschuh, U., Cao, X. and Hense, A. (2023): Refining data–data and data–model vegetation comparisons using the Earth mover's distance (EMD), Climate of the Past, 19 (5), pp. 1043-1060. doi: 10.5194/cp-19-1043-2023




