On the reliability of acoustic annotations and automatic detections of Antarctic blue whale calls under different acoustic conditions
Evaluation of the performance of computer-based algorithms for automatically detecting mammalian vocalizations often relies on comparisons between detector outputs and a reference data set, generally obtained by manual annotation of acoustic recordings. To explore the reproducibility of such annotations, inter- and intra-analyst variability in manually annotated Antarctic blue whale (ABW) Z-calls is investigated using annotations made by two analysts in acoustic data from two ocean basins, representing different scenarios in terms of call abundance and background noise. The manual annotations exhibit strong inter- and intra-analyst variability, with less than 50% agreement between analysts. This variability is mainly caused by the difficulty of reliably and reproducibly distinguishing single calls within an ABW chorus composed of overlapping distant calls. Furthermore, the performance of two automated detectors, based on spectrogram correlation and on a subspace-detection strategy, respectively, is evaluated by comparing detector outputs to a "conservative" manually annotated reference data set comprising only those events on which both analysts agree. This study highlights the need for a standardized approach to human annotation and automatic detection, including a quantitative description of their performance, to improve the comparability of acoustic data, which is particularly relevant in the context of collaborative approaches to collecting and analyzing large passive acoustic data sets.
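To illustrate the spectrogram-correlation strategy named in the abstract, the sketch below shows a minimal Python detector that cross-correlates a two-dimensional call kernel with a log-power spectrogram and thresholds the resulting score. Every parameter here (sample rate, frequency band, the crude 26 to 19 Hz Z-call contour, kernel duration, and the detection threshold) is an illustrative assumption, not a setting reported in the study.

```python
# Minimal sketch of a spectrogram-correlation detector for ABW Z-calls.
# All parameters (sample rate, band, kernel contour, threshold) are
# illustrative assumptions, not the settings used in the study.
import numpy as np
from scipy.signal import spectrogram, correlate

def detect_z_calls(audio, fs=250, threshold=0.6):
    """Return candidate detection times (s) via 2-D spectrogram correlation."""
    # Spectrogram with resolution suited to low-frequency ABW calls.
    f, t, sxx = spectrogram(audio, fs=fs, nperseg=512, noverlap=384)
    band = (f >= 15) & (f <= 30)           # restrict to the Z-call band
    s = np.log10(sxx[band] + 1e-12)        # log power, numerically stabilized

    # Synthetic Z-call kernel: a tonal unit near 26 Hz sweeping down to 19 Hz.
    # A real detector would derive this kernel from measured call contours.
    fb = f[band]
    n_cols = int(round(8.0 / (t[1] - t[0])))   # assume ~8 s kernel duration
    kernel = np.zeros((band.sum(), n_cols))
    contour = np.linspace(26.5, 19.0, n_cols)  # crude frequency contour
    for j, fc in enumerate(contour):
        kernel[np.argmin(np.abs(fb - fc)), j] = 1.0

    # Normalized cross-correlation along the time axis.
    s0 = (s - s.mean()) / (s.std() + 1e-12)
    k0 = (kernel - kernel.mean()) / (kernel.std() + 1e-12)
    score = correlate(s0, k0, mode="valid").ravel() / kernel.size

    # Simple peak picking: spectrogram columns whose score exceeds threshold.
    hits = np.flatnonzero(score > threshold)
    return t[hits] if hits.size else np.array([])
```

In practice, such detector outputs would then be matched against the "conservative" reference annotations (events on which both analysts agree, within some time tolerance) to estimate precision and recall, which is the evaluation the abstract describes.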