A Bibliometric-Systematic Literature Review (B-SLR) of Machine Learning-Based Water Quality Prediction: Trends, Gaps, and Future Directions.

JA Muñoz-Alegría, J Núñez, R Oyarzún, CA Chávez, JL Arumí, L Rodríguez-López

Keywords: water quality, machine learning, explainable artificial intelligence (XAI), bibliometric-systematic literature review (B-SLR), topic modeling.

Abstract

Predicting the quality of freshwater, both surface water and groundwater, is essential for the sustainable management of water resources. This study collected 1822 articles from the Scopus database (2000-2024) and filtered them using Topic Modeling to create the study corpus. The B-SLR analysis identified exponential growth in scientific publications since 2020, indicating that this field has reached a stage of maturity. The results showed that the predominant techniques for predicting surface water and groundwater quality fall into three main categories: i) ensemble models, with Bagging and Boosting representing 43.07% and 25.91% respectively, notably Random Forest (RF), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGB) along with their optimized variants; ii) deep neural networks such as Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), which excel at modeling complex temporal dynamics; and iii) traditional algorithms like Artificial Neural Network (ANN), Support Vector Machines (SVM), and Decision Tree (DT), which remain widely used. Current trends point towards the use of hybrid and explainable architectures, with increased application of interpretability techniques. The review also identified emerging approaches: Generative Adversarial Network (GAN) and Group Method of Data Handling (GMDH) for data-scarce contexts, Transfer Learning for knowledge reuse, and Transformer architectures that outperform LSTM in time series prediction tasks. Furthermore, the most studied water bodies (e.g., rivers, aquifers) and the most commonly used water quality indicators (e.g., WQI, EWQI, dissolved oxygen, nitrates) were identified. The B-SLR and Topic Modeling methodology provided a more robust, reproducible, and comprehensive overview of AI/ML/DL models for freshwater quality prediction, facilitating the identification of thematic patterns and research opportunities.

More information

Journal title: Water (MDP)
Publication date: 2025
First page: 1
Last page: 46
Language: English
Notes: WoS