Event recognition in marine seismological data using Random Forest machine learning classifier
Automatic detection of seismic events in ocean bottom seismometer (OBS) data is difficult because noise levels are elevated compared with land recordings. Popular deep-learning approaches that work well with earthquakes recorded on land perform poorly in a marine setting. Adapting them to OBS data requires catalogues containing hundreds of thousands of labelled event examples, which currently do not exist, especially for signals other than earthquakes. The usual routine therefore involves standard amplitude-based detection methods and manual processing to obtain events of interest. We present here the first attempt to use a Random Forest supervised machine learning classifier on marine seismological data to automate catalogue screening and event recognition among different signals [i.e. earthquakes, short duration events (SDEs) and marine noise sources]. The detection approach uses the short-term average/long-term average method, enhanced by a kurtosis-based picker for more precise recognition of event onsets. The subsequent machine learning method uses a previously published set of signal features (waveform-, frequency- and spectrum-based) that has been applied successfully to the recognition of different classes of events in land seismological data. Our workflow uses a small subset of manually selected signals for the initial training procedure; we then iteratively evaluate and refine the model using subsequent OBS stations within a single deployment in the eastern Fram Strait, between Greenland and Svalbard. We find that this feature set is well suited to discriminating between the different classes of events during the training step. During manual verification of the automatic detection results, we find that the resulting earthquake catalogue contains a large number of noise examples, but that almost all events of interest are properly captured.
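As an illustration of the short-term average/long-term average (STA/LTA) detection step mentioned above, the following is a minimal sketch rather than the workflow actually used in this study. The function names `sta_lta` and `first_trigger`, the window lengths, the threshold and the synthetic test signal are all assumptions chosen for the example, and the kurtosis-based onset refinement is not included.

```python
import math
import random

def sta_lta(signal, n_sta, n_lta):
    """STA/LTA ratio of squared amplitudes using trailing windows.

    Assumes n_sta <= n_lta. Samples before the first full long-term
    window are left at 0.0.
    """
    energy = [x * x for x in signal]
    # Prefix sums so that each window mean is an O(1) difference.
    csum = [0.0]
    for e in energy:
        csum.append(csum[-1] + e)
    ratio = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = (csum[i + 1] - csum[i + 1 - n_sta]) / n_sta
        lta = (csum[i + 1] - csum[i + 1 - n_lta]) / n_lta
        ratio[i] = sta / lta if lta > 0 else 0.0
    return ratio

def first_trigger(ratio, threshold):
    """Index of the first sample whose ratio exceeds the threshold, or None."""
    for i, r in enumerate(ratio):
        if r > threshold:
            return i
    return None

# Synthetic example (hypothetical values): weak Gaussian noise with a
# sinusoidal burst starting at sample 600.
random.seed(0)
signal = [random.gauss(0.0, 0.1) for _ in range(1000)]
for k in range(100):
    signal[600 + k] += math.sin(2 * math.pi * k / 10)

ratio = sta_lta(signal, n_sta=20, n_lta=200)
onset = first_trigger(ratio, threshold=4.0)
print("first trigger at sample", onset)
```

In practice the window lengths and threshold would need tuning to the noise conditions of each station, and a trigger of this kind only gives a coarse onset that a picker can then refine.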
By providing increasingly large sets of noise examples, we see an improvement in the quality of the obtained catalogues. Our final model reaches an average accuracy of 87 per cent in discriminating between the classes, comparable to classification results for data from land. We find that, among the features used, the most important for separating the different classes of events are those related to the kurtosis of the signal envelope at different frequencies, the frequency with the highest energy and the overall signal duration. We illustrate the implementation of the approach using the temporal and spatial distribution of SDEs as a case study. We use recordings from six OBSs deployed between 2019 and 2020 off the west-Svalbard coast to investigate the potential link between SDEs and fluid dynamics, and we discuss the robustness of the approach by analysing SDE intensity, periodicity and distance to seepage sites in relation to other published studies on SDEs.
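To make the three feature families named above more concrete, the sketch below computes simplified analogues of them for a single waveform. It is not the published feature set: the rectified signal stands in for a true envelope (an assumption made to keep the example dependency-free), no band-pass filtering is applied, the dominant frequency comes from a naive discrete Fourier transform, and the function names and the synthetic damped sinusoid are hypothetical.

```python
import cmath
import math

def kurtosis(values):
    """Pearson kurtosis: fourth standardised moment (3 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0

def dominant_frequency(signal, fs):
    """Frequency in Hz of the strongest non-DC bin of a naive O(n^2) DFT."""
    n = len(signal)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n

def simple_features(signal, fs):
    """Three illustrative features for one event window sampled at fs Hz."""
    rectified = [abs(x) for x in signal]  # crude envelope proxy (assumption)
    return {
        "envelope_kurtosis": kurtosis(rectified),
        "dominant_frequency_hz": dominant_frequency(signal, fs),
        "duration_s": len(signal) / fs,
    }

# Synthetic impulsive event (hypothetical): a 10 Hz sinusoid with an
# exponentially decaying amplitude, sampled at 100 Hz for 2 s.
fs = 100.0
event = [math.exp(-(i / fs) / 0.3) * math.sin(2 * math.pi * 10.0 * i / fs)
         for i in range(200)]
feats = simple_features(event, fs)
print(feats)
```

Feature vectors of this kind, one per detected window, are what a supervised classifier such as a Random Forest would take as input; an impulsive event concentrates its energy in a short interval, which is why envelope kurtosis tends to be high for such signals.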