Version 1
Received: 14 June 2023 / Approved: 14 June 2023 / Online: 14 June 2023 (08:40:50 CEST)
How to cite:
Tadesse, K. B.; Dinka, M. O. Water Quality Class Modeling Using Machine Learning Algorithms at Roodeplaat Dam, South Africa. Preprints 2023, 2023061016. https://doi.org/10.20944/preprints202306.1016.v1
APA Style
Tadesse, K. B., & Dinka, M. O. (2023). Water Quality Class Modeling Using Machine Learning Algorithms at Roodeplaat Dam, South Africa. Preprints. https://doi.org/10.20944/preprints202306.1016.v1
Chicago/Turabian Style
Tadesse, K. B. and Megersa Olumana Dinka. 2023. "Water Quality Class Modeling Using Machine Learning Algorithms at Roodeplaat Dam, South Africa" Preprints. https://doi.org/10.20944/preprints202306.1016.v1
Abstract
Water pollution is a common problem for dams situated within urban or agricultural catchments. It can harm the aquatic ecosystem and impair drinking, recreational, and other uses of the water. In this study, the drinking water quality class of the Roodeplaat Dam, South Africa, which faces pollution problems, was modeled using machine learning algorithms in Python (Jupyter Notebook 6.0.0). Eleven monthly water quality parameters recorded at five sampling stations from January 1981 to September 2017 were used to train and test the models. Five machine learning algorithms, Gaussian Naïve Bayes (GNB), K-nearest neighbors (KNN), Decision Tree (DT), Support Vector Machines (SVM), and Linear Regression (LR), were evaluated at test sizes of 20%, 25%, 30%, and 40% to classify the water into five classes (Excellent to Very bad). The analysis showed that the dam water falls into only three of these classes: good, medium, and bad. The prediction accuracies, from highest to lowest, were 96.39%, 96.17%, 92.25%, 90.20%, and 54.19% for KNN, DT, SVM, GNB, and LR, respectively. Therefore, KNN at a test size of 30% was recommended for accurately classifying the water quality of Roodeplaat Dam. Hence, machine learning algorithms can be used to identify the water quality class before the water is treated and distributed for drinking.
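The evaluation protocol described in the abstract (fit a classifier, then score it on held-out data at several test sizes) can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library with a hand-written K-nearest-neighbors classifier, and it runs on synthetic, well-separated data instead of the Roodeplaat Dam records. The function names, the value k = 5, the random seed, and the cluster layout are all assumptions made for illustration; only the three class labels, the eleven features, and the four test sizes come from the abstract.

```python
import random
from collections import Counter
from math import dist

CLASSES = ["good", "medium", "bad"]     # the three classes reported in the abstract
N_FEATURES = 11                         # eleven water quality parameters
TEST_SIZES = [0.20, 0.25, 0.30, 0.40]   # test fractions named in the abstract


def make_synthetic_samples(n_per_class=60, seed=0):
    """Placeholder data (assumption): one well-separated Gaussian cluster per class."""
    rng = random.Random(seed)
    samples = []
    for idx, label in enumerate(CLASSES):
        centre = 10.0 * idx
        for _ in range(n_per_class):
            features = [rng.gauss(centre, 1.0) for _ in range(N_FEATURES)]
            samples.append((features, label))
    return samples


def split(samples, test_size, seed=0):
    """Shuffle a copy of the samples and return (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, features, k=5):
    """Majority vote among the k training samples closest in Euclidean distance."""
    nearest = sorted(train, key=lambda s: dist(s[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test samples whose predicted class matches the true class."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def evaluate_over_test_sizes(samples, k=5):
    """Return {test_size: accuracy} for every test size in TEST_SIZES."""
    results = {}
    for test_size in TEST_SIZES:
        train, test = split(samples, test_size)
        results[test_size] = accuracy(train, test, k)
    return results


if __name__ == "__main__":
    for size, acc in evaluate_over_test_sizes(make_synthetic_samples()).items():
        print(f"test size {size:.0%}: KNN accuracy {acc:.2%}")
```

Because the synthetic clusters are far apart, the accuracies printed here say nothing about the real dataset; the sketch only shows how a single classifier can be scored repeatedly at different train/test proportions before the results are compared.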
Keywords
Decision Tree; linear regression; Naïve Bayes; Python; Support Vector Machine
Subject
Environmental and Earth Sciences, Water Science and Technology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.