Forecasting of Water Quality Index Using Long Short-Term Memory (LSTM) Networks
---------------------------------------------------------------------------------------------------------------------------------------
Submitted: 15-10-2021 Revised: 26-10-2021 Accepted: 28-10-2021
---------------------------------------------------------------------------------------------------------------------------------------
ABSTRACT
Investigating the quality of water is crucial for preventing outbreaks of waterborne disease, as well as for determining the suitability of water for road construction, agriculture, and fishery. The main focus of this paper is to build a water quality forecasting model from water quality parameters using Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN) models. This study uses six years of water quality index data from the Kaggle online database. The data comprise measurements of four parameters, namely pH, water temperature, water conductivity, and ORP, which affect the water quality index. To assess the performance of the developed LSTM model, benchmarked against the RNN, the metrics used are the regression coefficient and the Root Mean Squared Error. The results show that the LSTM outperformed the RNN in predicting the water quality index (WQI).
Keywords: Water Quality, LSTM, RNN, Regression, Model, Forecasting

I. INTRODUCTION
Water is crucial for all types of human and physical activities, such as agriculture, construction, and transportation. The nature of water helps in controlling biotic diversity, vitality, and the rate of succession. The impacts of unclean water are broad, affecting not just aquatic life but also a large share of human life and sustainability [1]. The integrity of water quality strongly influences human wellbeing, the fishery economy, and agricultural activities. The management of water resources is therefore pivotal in order to improve the quality of water and to be sure that the water consumed meets the standard of the approved agency for water quality [2].
Water plays a crucial role in our day-to-day life, and the quality of water in a region strongly influences the sustainable development of nearby industrial, agricultural and other anthropogenic activities. Natural water resources such as groundwater and surface water have always been the least expensive and most widely accessible sources of freshwater. However, these resources are likely to become contaminated because of various factors, including human, industrial and commercial activities as well as natural processes. In addition, poor sanitation infrastructure and lack of awareness also contribute greatly to the contamination of drinking water. A considerable proportion of water pollutants have long-term negative effects on water quality, posing a hazard to human wellbeing [3].
Poor water quality affects both environmental activities and human wellbeing. Additionally, contaminated water can cause waterborne diseases and also affects child mortality. According to the United Nations, waterborne infections cause the death of 1.5 million children every year. The World Health Organization states that more than 3.4 million individuals die every year because of water-related diseases. It is therefore essential to devise novel methodologies and techniques for monitoring deteriorating water quality and for forecasting future water quality patterns. To carry out useful and efficient water quality analysis and to forecast water quality patterns, it is important to incorporate a
DOI: 10.35629/5252-031010071017 Impact Factor value 7.429 | ISO 9001: 2008 Certified Journal Page 1007
International Journal of Advances in Engineering and Management (IJAEM)
Volume 3, Issue 10 Oct 2021, pp: 1007-1017 www.ijaem.net ISSN: 2395-5252
temporal dimension to the analysis, so that the seasonal variation of water quality is addressed. Different approaches have been proposed and applied for analysing and monitoring water quality and for time series analysis.

1.1 Water Quality Parameters
1.1.1 pH
pH is the measure of the degree of acidity or alkalinity of a water solution. The pH scale is logarithmic, with a range of 0 to 14 and a neutral point of 7. A water solution above 7 is alkaline (basic), while a water solution below 7 is acidic. Temperature and pH are inversely related: as the temperature of water increases, its pH value decreases, and vice versa.
1.1.2 Temperature of Water
Temperature is another important water quality parameter that needs to be considered alongside other water properties. Temperature has an effect on the conductivity and pH of water.
1.1.3 Water Conductivity
Water conductivity is the ability of a water solution to conduct electricity, which depends on its ionic strength; the typical unit of measurement is micro-Siemens per centimetre. The conductivity of water increases as the concentration of dissolved ions increases. An increase in the conductivity of water may signify that the water is polluted, for example by sewage leaks or chemical waste flowing into it. Water conductivity is directly related to salinity: the conductivity of water increases as the salinity increases.
1.1.4 Oxidation and Reduction Potential (ORP)
Oxidation-reduction potential is the measure of the oxidizing power of a water solution, that is, the ability of a water solution to sanitize itself. The more oxidizers present, the higher the ORP value. Similarly, the lower the ORP, the more reducers are present.
This paper focuses on the use of LSTM to forecast water quality parameters using a dataset from the Kaggle database. These parameters include physical, biological and chemical factors which affect water quality. Machine learning procedures are used to anticipate the future water quality patterns of a specific region with the help of historical water quality data. An LSTM model is used to build a methodology for practical water quality forecasting and analysis. The performance of the LSTM and RNN models for water quality forecasting is compared. The results verify the efficiency of the proposed model.

II. LITERATURE REVIEW
The accuracy of different machine learning models for the classification of water quality was studied by [4]. The classification results show that the MLP and IBk classifiers outperformed the others, with a regression coefficient of 91.57% for the Kinta river dataset used to train the models. Similar research was carried out by [5], where a series of analyses investigated different water quality parameters collected from an online database, together with the use of principal component analysis and an artificial neural network (ANN) to predict water quality index parameters. The main result shows that the ANN was able to predict the quality of water for different uses. Another crucial aspect of water quality investigation is monitoring using the internet of things (IoT) and other remote sensing technologies. To prevent outbreaks of waterborne disease through improved water quality measurement, the use of IoT was proposed by [2] to improve the quality of water in the Fiji Islands. A similar Sigfox-based IoT monitoring approach with water quality prediction was investigated by [6]. The monitoring platform leverages low power wide area network technologies to transmit sensed water quality parameters, which are used to train a deep learning algorithm for the prediction of water quality.

III. METHODOLOGY
Machine learning is an aspect of Artificial Intelligence (AI) that gives a machine the capacity to learn and improve automatically from experience without being explicitly programmed. It centres on the development of computer programs that can access data and use it to learn for themselves. The process of learning starts with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples provided. The main aim is to enable computers to learn automatically, without human intervention or assistance, and to adjust their actions accordingly. The methodology used in this research involves machine learning with training and testing data from the Kaggle online data repository. The theoretical background of the methodology is as follows:
3.1 Long Short-Term Memory (LSTM)
Recurrent neural networks (RNNs) are networks with loops in them, enabling information to persist, as shown in Figure 1. When the gap between the relevant information and the place where it is needed is small, RNNs can learn to use the past information [7]. Unfortunately, as the gap grows, RNNs become unable to learn to connect the information.
LSTMs are an improved type of RNN, capable of learning long-term dependencies. Remembering information for long periods is practically their default behaviour. LSTMs also have a chain-like structure, but the repeating module has a different structure from that of standard RNNs [8]. Rather than having a single neural network layer, there are four, interacting in a special way. The key to the LSTM is the cell state. The cell state is somewhat like a conveyor belt. It runs straight down the whole chain, with only some minor linear interactions. It is very easy for information to flow along the cell state, and changes to it are carefully controlled by structures called gates, as represented in Figure 2.
LSTM networks are well suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series [9]. They were developed to deal with the exploding and vanishing gradient problems that can be encountered when training traditional RNNs. The activation function of the LSTM gates is often the logistic function. The weights of these connections, which need to be learned during training, determine how the gates operate.
An RNN using LSTM units can be trained in a supervised fashion on a set of training sequences, using an optimization algorithm such as gradient descent combined with backpropagation through time to compute the gradients needed during the optimization process, in order to change the weights. Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation, as illustrated in Figure 3.
The sigmoid layer outputs numbers between 0 and 1, describing how much of each component should be let through. A value of 0 means "let nothing through", while a value of 1 means "let everything through". An LSTM has three of these gates to protect and control the cell state, as shown in Figure 4.
The first step in the LSTM is to decide what information to discard from the cell state. This decision is made by a sigmoid layer called the "forget gate" layer. It looks at $h_{t-1}$ and $x_t$ and outputs a number between 0 and 1 for each element of the cell state $C_{t-1}$. A value of 1 means "completely keep this" and 0 means "completely discard this".
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \qquad (1)$$
$f_t$ is the value of the forget gate at time step $t$. $W_f$ is the weight between the forget gate and the input layer. $h_{t-1}$ is the output of the previous memory block. $x_t$ is the input vector. $b_f$ is the bias vector.
The next step is to decide what new information to store in the cell state. This has two parts. First, a sigmoid layer, called the input gate layer, decides which values should be updated. Next, a tanh layer creates a vector of new candidate values, $C'_t$, which could be added to the state.
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \qquad (2)$$
$$C'_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \qquad (3)$$
$i_t$ is the value of the input gate. $W_i$ is the weight between the input gate and the input layer. $b_i$ is the bias vector. $W_c$ is the weight between the input gate and the hidden layer, and $b_c$ is the corresponding bias vector.
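As an illustration only, one LSTM time step can be sketched in plain Python by combining the gate equations (1) to (3) with the cell-state update and output equations (4) to (6) of this section. The function names, the dictionary layout of the parameters and the use of nested lists for matrices are assumptions made for this sketch; they are not taken from the paper's implementation.

```python
import math

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W . v + b for a matrix W (list of rows) and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params holds Wf, Wi, Wc, Wo and bf, bi, bc, bo."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]       # Eq. (1)
    i_t = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]       # Eq. (2)
    c_cand = [math.tanh(a) for a in affine(params["Wc"], z, params["bc"])]  # Eq. (3)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]                 # Eq. (4)
    o_t = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]       # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                      # Eq. (6)
    return h_t, c_t
```

Each weight matrix has one row per hidden unit and one column per element of the concatenated vector $[h_{t-1}, x_t]$, so a network with 2 hidden units and 1 input feature uses 2 x 3 matrices. Calling `lstm_step` repeatedly along a sequence, passing the returned `h_t` and `c_t` into the next call, is what carries memory forward through time.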
After that, we update the old cell state, $C_{t-1}$, into the new cell state, $C_t$. We multiply the old state by $f_t$, forgetting the things we decided to forget earlier. Then we add $i_t * C'_t$. These are the new candidate values, scaled by how much we decided to update each state value.
$$C_t = f_t * C_{t-1} + i_t * C'_t \qquad (4)$$
$C_t$ is the memory from the current block. $C_{t-1}$ is the memory of the previous block.
Finally, we have to decide what to output. This output is based on the cell state, but is a filtered version of it. First, we run a sigmoid layer which decides what parts of the cell state to output. Then, we put the cell state through tanh (to push the values into the range -1 to 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \qquad (5)$$
$$h_t = o_t * \tanh(C_t) \qquad (6)$$
$o_t$ is the value of the output gate. $W_o$ is the weight between the hidden layer and the output gate. $b_o$ is the bias vector. $h_t$ is the output of the current block.
Thus, this single unit makes its decision by considering the current input, the previous output and the previous memory. It then produces a new output and updates its memory [10].

3.2 Data Collection
This step is essential because the quality and quantity of the data gathered directly determine how good the predictive model can be. The data for this study were gathered from Kaggle, an online data repository supporting the acquisition, handling and long-term storage of water quality data from around the world. Four parameters were chosen for this study, i.e. temperature, pH, dissolved oxygen (DO) and turbidity. A time interval of 15 minutes was used to carry out the prediction process with this time series, which includes the date/time, the parameters and their measurements, along with their measurement units.

3.3 Model Training
The training dataset contains 1,339 rows, which is 80% of the total dataset. The remaining part of the dataset is used for evaluating the model. The testing dataset contains 263 rows, which is 20% of the original dataset. The LSTM model was trained on the dataset to steadily improve the model's ability to predict the water quality index. The model's predictions are compared with the outputs it should produce, and the weights and biases are adjusted so that the predictions become progressively more accurate. This process is repeated to train the model. Each cycle of modifying the weights and biases is called a "training step". The LSTM model is trained for 250 epochs with the Adam solver and a batch size of 1, as seen in Table 1.

3.4 Model Evaluation
Evaluation tests the proposed LSTM model against data that has never been used for training. This helps to ascertain how the model may perform on data that it has not yet seen, and is intended to be representative of how the model may perform in practice. We use the Mean Squared Error (MSE), the Root Mean Squared Error (RMSE) and the Coefficient of Determination ($R^2$) to assess the performance of our models.

3.4.1 Mean Squared Error (MSE)
MSE is the average of the squares of the errors, that is, the average squared difference between the estimated values and the observed values. It is a risk function, corresponding to the expected value of the squared error loss. It measures the quality of a predictor (i.e. a function mapping arbitrary inputs to a sample of values of some random variable). If $Y'$ is a vector of $n$ predictions generated from a sample of $n$ data points and $Y$ is the vector of observed values of the variable being predicted, then the within-sample MSE of the predictor is computed as:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - Y'_i)^2 \qquad (7)$$

3.4.2 Root Mean Squared Error (RMSE)
The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between the values (sample or population values) predicted by a model or an estimator and the values observed. The RMSD represents the square root of the second sample moment of the differences between forecasted values and observed values, or the quadratic mean of these differences. RMSD is a measure of accuracy used to compare the errors of different models for a particular dataset, and not between datasets, as it is scale-dependent. RMSD is the square root of the average of squared errors.
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - Y'_i)^2} \qquad (8)$$

3.4.3 Regression Coefficient
The coefficient of determination, denoted $R^2$ or $r^2$ and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
A data set has $n$ values denoted $y_1, \ldots, y_n$ (collectively known as $y_i$ or as a vector $y = [y_1, \ldots, y_n]^T$), each associated with a predicted (or modelled) value $f_1, \ldots, f_n$ (known as $f_i$, or sometimes $\hat{y}_i$, as a vector $f$).
Define the residuals as $e_i = y_i - f_i$ (forming a vector $e$).
If $\bar{y}$ is the mean of the observed data:
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \qquad (9)$$
then the variability of the data set can be measured using three sums of squares:
• The total sum of squares (proportional to the variance of the data)
$$SS_{tot} = \sum_i (y_i - \bar{y})^2 \qquad (10)$$
• The regression sum of squares, also called the explained sum of squares
$$SS_{reg} = \sum_i (f_i - \bar{y})^2 \qquad (11)$$
• The sum of squares of residuals, also called the residual sum of squares
$$SS_{res} = \sum_i (y_i - f_i)^2 = \sum_i e_i^2 \qquad (12)$$
The most general definition of the coefficient of determination is
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \qquad (13)$$
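The three evaluation metrics of Section 3.4 can be computed directly from their definitions in Eqs. (7), (8) and (13). The following is a minimal sketch in plain Python; the function names are chosen here for illustration, and the inputs are assumed to be two equally long sequences of observed and predicted values.

```python
import math

def mse(observed, predicted):
    """Mean squared error, Eq. (7)."""
    n = len(observed)
    return sum((y - f) ** 2 for y, f in zip(observed, predicted)) / n

def rmse(observed, predicted):
    """Root mean squared error, Eq. (8)."""
    return math.sqrt(mse(observed, predicted))

def r_squared(observed, predicted):
    """Coefficient of determination, Eq. (13).

    Undefined (division by zero) when all observed values are equal,
    because the total sum of squares is then zero.
    """
    mean_y = sum(observed) / len(observed)                            # Eq. (9)
    ss_tot = sum((y - mean_y) ** 2 for y in observed)                 # Eq. (10)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))   # Eq. (12)
    return 1.0 - ss_res / ss_tot
```

Because RMSE is expressed in the same units as the predicted variable, it is only comparable between models evaluated on the same dataset, whereas $R^2$ is dimensionless.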
Table 1: LSTM architecture and set options for the training model

Model Training Parameters      Values
Solver                         Adam
Maximum number of epochs       250
Gradient threshold             1
Initial learn rate             0.005
Learn rate schedule            piecewise
Learn rate drop period         120
Learn rate drop factor         0.25
Verbose                        0
Number of features             1
Number of responses            1
Number of hidden units         150
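To make the learning-rate settings in Table 1 concrete, the sketch below computes the rate in effect at a given epoch. It assumes the usual meaning of a piecewise schedule, namely that the rate is multiplied by the drop factor each time the drop period has elapsed; the paper does not spell this out, so the rule and the function name are assumptions.

```python
def piecewise_learn_rate(epoch, initial_rate=0.005, drop_period=120, drop_factor=0.25):
    """Learning rate for a 1-based epoch number under a step-wise schedule.

    Assumption: the rate is multiplied by drop_factor after every
    drop_period completed epochs.
    """
    completed_drops = (epoch - 1) // drop_period
    return initial_rate * drop_factor ** completed_drops
```

Under this assumption, the 250 training epochs fall into three phases: 0.005 for epochs 1 to 120, 0.00125 for epochs 121 to 240, and 0.0003125 for epochs 241 to 250.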
IV. RESULTS
The basic line plots for all four parameters required for predicting water quality are shown in Figure 5. The temperature plot shows a variation within the range of 0 to 30 degrees Celsius. In the line plot of pH, the values are in the range of 6.7 to 7.3. This shows that the pH of the water in the Kaggle dataset is within the ideal range (6.5 to 7.5). Similar trends and variations were observed for dissolved oxygen and turbidity.
The model was trained and used for prediction of the water quality index. The proposed LSTM model performs better than the model used to benchmark it. The performance of the two models was calculated using three metrics, $R^2$, MSE and RMSE, as seen in Table 2, where the LSTM performs better than the conventional RNN model.
The performance of a regression algorithm, such as the one employed in the LSTM models presented here, is measured by its root mean square error (RMSE). This indicates how close the forecast values are to the observed values.
The LSTM network was trained with the training options specified in Table 1. Figure 6 shows the training trend of the LSTM model in terms of the RMSE and the loss function.
The forecast of the water quality values for multiple time steps into the future is shown in Figure 7. A predict-and-update-state function is used to predict the time steps one at a time and to update the network state at each prediction. For each prediction, the previous prediction is used as the input to the function. The test data were standardized using the same parameters as the training data. The figures show that the LSTM model can accurately predict the water quality, which can guide scientists and engineers on the use of the water for their various applications.
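The procedure described above can be sketched as follows. This is a simplified illustration, not the authors' code: `step_fn` is a hypothetical one-step predictor that takes the current state and an input and returns a prediction together with the updated state, `toy_step` is a made-up stand-in for a trained network, and the sample standard deviation is an assumed choice for the scaling.

```python
import statistics

def fit_standardizer(train_values):
    """Estimate the scaling parameters from the training series only."""
    return statistics.mean(train_values), statistics.stdev(train_values)

def standardize(values, mean, std):
    """Scale any series (training or test) with the training parameters."""
    return [(v - mean) / std for v in values]

def closed_loop_forecast(step_fn, state, first_input, n_steps):
    """Forecast n_steps ahead, feeding each prediction back as the next input."""
    predictions = []
    x = first_input
    for _ in range(n_steps):
        y_hat, state = step_fn(state, x)  # predict and update the state
        predictions.append(y_hat)
        x = y_hat                         # previous prediction becomes the input
    return predictions

def toy_step(state, x):
    """Hypothetical stand-in for a trained network: damped copy of the input."""
    return 0.9 * x, state + 1
```

Reusing the training mean and standard deviation for the test data keeps information from the test period out of the preprocessing. Because every forecast step depends on the previous forecast, errors in this mode can accumulate over the forecast horizon.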
Figure 7: Plot of the training time series with the forecasted values
The RMSE values for all the models, expressed as error functions, are provided in the error plot (appearing as a subplot) for each model. A lower RMSE, that is, a smaller distance from the line of best fit, indicates a more accurate prediction. Consequently, RMSE is a measure of the prediction error, which by inference indicates a measure of anomaly [11].
To make predictions on a new sequence, the network state is reset using resetState. Resetting the network state prevents previous predictions from affecting the predictions on the new data. The network state is reset and then initialized by predicting on the training data. Figure 8 shows the comparison of the forecast with the observed values, with a higher RMSE of 11.01, while Figure 9 shows the comparison of the forecast with the observed values, with an RMSE of 5.24. Here, the LSTM prediction model is more accurate, owing to the reduced RMSE and the fact that the model was updated with the observed values instead of the predicted values.
In the plots of the respective updated forecasts, the RMSE is visibly improved. This is because, during the updated forecast, the forecast values of the previous 10% have been replaced with the observed values. It therefore suffices to conclude that the performance of the LSTM models can be reliably assessed based on the RMSE.
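The reset, initialization and updated-forecast steps described above can be illustrated with a small sketch. As before, `step_fn` is a hypothetical one-step predictor returning a prediction and a new state, and `toy_step` is a made-up exponential-smoothing stand-in for a trained network; neither is taken from the paper.

```python
def warm_up(step_fn, initial_state, history):
    """Reset to initial_state, then run over the training series to initialize the state."""
    state = initial_state
    prediction = None
    for x in history:
        prediction, state = step_fn(state, x)
    return prediction, state

def open_loop_forecast(step_fn, state, observed_inputs):
    """Predict one step ahead at each time step, updating the state with the
    observed value instead of the previous prediction."""
    predictions = []
    for x_obs in observed_inputs:
        y_hat, state = step_fn(state, x_obs)
        predictions.append(y_hat)
    return predictions

def toy_step(state, x):
    """Hypothetical stand-in for a trained network: exponential smoothing."""
    new_state = 0.5 * state + 0.5 * x
    return new_state, new_state
```

Starting every new sequence from the same initial state keeps earlier forecasts from leaking into it. Feeding observed values at each step stops errors from compounding, which is consistent with the lower RMSE reported for the updated forecast.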
accurately predict the water quality index, with better performance than the RNN based on $R^2$ and RMSE. Considering the disadvantage of the long training cycles experienced with the LSTM structure, a more efficient memory block can be designed in future research work.

REFERENCES
[1] O. Elijah et al., “A concept paper on smart river monitoring system for sustainability in river,” Int. J. Integr. Eng., vol. 10, no. 7, pp. 130–139, 2018, doi: 10.30880/ijie.2018.10.07.012.
[2] A. N. Prasad, K. A. Mamun, F. R. Islam, and H. Haqva, “Smart water quality monitoring system,” in 2nd Asia-Pacific World Congress on Computer Science and Engineering, APWC on CSE 2015, 2016, pp. 1–6, doi: 10.1109/APWCCSE.2015.7476234.
[3] O. Elijah et al., “Application of UAV and Low Power Wide Area Communication Technology for Monitoring of River Water Quality,” in 2018 2nd International Conference on Smart Sensors and Application, ICSSA 2018, 2018, pp. 105–110, doi: 10.1109/ICSSA.2018.8535994.
[4] R. Rosly et al., “The Study on the Accuracy of Classifiers for Water Quality Application,” Int. J. u- e- Serv. Sci. Technol., vol. 8, no. 3, pp. 145–154, 2015, doi: 10.14257/ijunesst.2015.8.3.13.
[5] T. K. Anyachebelu, “Prediction of a Water Quality Index using Online Sensor Data,” University of Bedfordshire, 2019.
[6] P. Boccadoro, V. Daniele, P. Di Gennaro, D. Lofù, and P. Tedeschi, “Water Quality Prediction on a Sigfox-compliant IoT Device: The Road Ahead of Water,” pp. 1–13, 2020, [Online]. Available: http://arxiv.org/abs/2007.13436.
[7] W. C. Wong, E. Chee, J. Li, and X. Wang, “Recurrent Neural Network-Based Model Predictive Control for Continuous Pharmaceutical Manufacturing,” 2018, doi: 10.3390/math6110242.
[8] P. Boccadoro, I. Student, V. Daniele, P. Di Gennaro, and D. L. Ieee, “Water Quality Prediction on a Sigfox-compliant IoT Device: The Road Ahead of WaterS,” arXiv Prepr. arXiv2007.13436, no. July, 2020.
[9] Z. Jianfeng, Y. Zhu, X. Zhang, M. Ye, and J. Yang, “Developing a Long Short-Term Memory (LSTM) based model for predicting water table depth in agricultural areas,” J. Hydrol., vol. 561, no. April, pp. 918–929, 2018, doi: 10.1016/j.jhydrol.2018.04.065.
[10] P. K. Kashyap, S. Kumar, A. Jaiswal, M. Prasad, and A. H. Gandomi, “Towards Precision Agriculture: IoT-Enabled Intelligent Irrigation Systems Using Deep Learning Neural Network,” IEEE Sens. J., vol. 21, no. 16, pp. 17479–17491, 2021, doi: 10.1109/JSEN.2021.3069266.
[11] A. O. Otuoze, M. W. Mustafa, I. E. Sofimieari, A. M. Dobi, A. H. Sule, A. E. Abioye, et al., “Electricity theft detection framework based on universal prediction algorithm,” Indones. J. Electr. Eng. Comput. Sci., vol. 15, 2019, doi: 10.11591/ijeecs.v15.i2.pp758-768.