Prediction and interpretation of pathogenic bacteria occurrence at a recreational beach using data-driven algorithms

Published in Ecological Informatics, 2023

Recommended citation: Jang, J., Abbas, A., Kim, H., Rhee, C., Shin, S. G., Chun, J. A., ... & Cho, K. H. (2023). Prediction and interpretation of pathogenic bacteria occurrence at a recreational beach using data-driven algorithms. Ecological Informatics, 102370. https://authors.elsevier.com/c/1i5qV5c6cL2WXM

This work compares conventional tree-based machine learning algorithms with deep learning architectures (IA-LSTM and TFT) in terms of prediction accuracy, interpretability, and computational efficiency. The case study is the occurrence of pathogenic bacteria in beach waters in South Korea. The predictors are hydrological and climatic factors that affect pathogenic bacteria occurrence in beach waters. The results show that tree-based algorithms achieve high prediction performance but lag behind in interpretation, especially local (per-sample) interpretation. This shortcoming can be partially overcome with model-agnostic interpretation methods such as SHAP. The deep learning algorithms take longer to train, and model-agnostic interpretation methods are difficult to apply to them. However, the attention maps from these architectures (IA-LSTM and TFT) provide detailed information about the important predictors for each sample.
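To illustrate what local interpretation means here, the following is a minimal sketch of the Shapley values that SHAP approximates, computed exactly for a single sample. It is not the paper's code: `toy_model`, the three feature names, the thresholds, and the baseline values are all hypothetical stand-ins for a trained tree-based model and its predictors.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample x, relative to a baseline sample."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep the sample's values; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical stand-in for a trained tree-based model (assumed rules).
    rainfall, temperature, salinity = z
    score = 0.0
    if rainfall > 10.0:
        score += 2.0
    if temperature > 25.0 and salinity < 30.0:
        score += 1.0
    return score


sample = [20.0, 28.0, 25.0]    # hypothetical rainfall, temperature, salinity
baseline = [0.0, 20.0, 33.0]   # hypothetical reference conditions
phi = shapley_values(toy_model, sample, baseline)
for name, contribution in zip(["rainfall", "temperature", "salinity"], phi):
    print(f"{name}: {contribution:.2f}")
```

The per-feature contributions sum to the difference between the model output for the sample and for the baseline, which is what makes the attribution local to that sample. Exact enumeration grows exponentially with the number of predictors, so practical tools rely on approximations or tree-specific algorithms instead.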

Download paper here