A Study On Rainfall Prediction Techniques: December 2021


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/357448842

A STUDY ON RAINFALL PREDICTION TECHNIQUES

Article · December 2021


1 author:

C K Gomathy
Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya University


All content following this page was uploaded by C K Gomathy on 31 December 2021.



International Journal of Scientific Research in Engineering and Management
Volume: 05 Issue: 10 | Oct - ISSN: 2582-

A STUDY ON RAINFALL PREDICTION TECHNIQUES


Dr. C K Gomathy, ANNAPAREDDY BALA NARASIMHA REDDY, ARAVAPALLI PAVAN KUMAR,
AILE LOKESH
Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya, Kanchipuram

ABSTRACT:
Rainfall prediction is important because heavy rainfall can lead to many disasters. Prediction helps
people take preventive measures, so it must be accurate. There are two types of
prediction: short-term and long-term rainfall prediction. Short-term prediction mostly
gives accurate results; the main challenge is to build a model for long-term rainfall prediction. Heavy
precipitation prediction is a major problem for meteorological departments because it is closely
associated with the economy and human life. It is a cause of natural disasters such as floods and droughts,
which are encountered by people across the world every year. The accuracy of rainfall forecasting
is of great importance for countries like India, whose economy is largely dependent on agriculture. Because of the
dynamic nature of the atmosphere, statistical techniques fail to provide good accuracy for
precipitation forecasting. Prediction of precipitation using machine learning techniques may use
regression. The intention of this project is to offer non-experts easy access to the techniques and approaches used
in the field of precipitation prediction and to provide a comparative study of the various machine
learning techniques.
Keywords: Rainfall Prediction, Disaster Management, Rainfall Measure, Rainfall Accurate Results

I. INTRODUCTION
Rainfall forecasting is very important because heavy and irregular rainfall can have many impacts, such as
destruction of crops and farms and damage to property. A better forecasting model is therefore essential for early
warning that can minimize risks to life and property and help manage agricultural farms better.
Such prediction mainly helps farmers and also allows water resources to be utilized efficiently. Rainfall prediction is
a challenging task, and the results should be accurate. There are many hardware devices for predicting rainfall
from weather conditions such as temperature, humidity, and pressure. These traditional methods cannot work in
an efficient way, so we use machine learning techniques to produce more accurate results. This is done by
analyzing historical rainfall data and predicting the rainfall for future seasons. We can apply
techniques such as classification and regression according to the requirements, and we can calculate the
error between the actual and predicted values as well as the accuracy. Different techniques produce different
accuracies, so it is important to choose the right algorithm and model it according to the requirements.

Regression analysis:
Regression analysis deals with the dependence of one variable (called the dependent variable) on one or
more other variables (called independent variables). It is useful for estimating or predicting the
mean value of the former in terms of known or fixed values of the latter. For example, if the salary of
a person is based on his or her experience, then experience is the independent variable and salary is the
dependent variable. Simple linear regression defines the relationship between a single dependent variable and
a single independent variable. The equation below is the general form of simple linear regression:
y = β0 + β1x + ε, where β0 and β1 are parameters and ε is a probabilistic error term. Regression analysis
is a vital tool for modeling and analyzing data. It is used for predictive analysis, such as forecasting
rainfall or weather and predicting trends in business, finance, and marketing. It can also be used for correcting
errors and providing quantitative support.
The advantages of regression analysis are:
1. It is a powerful technique for testing the relationship between one dependent variable and many independent
variables.
2. It allows researchers to control extraneous factors.
3. Regression assesses the cumulative effect of multiple factors.
4. It also helps to obtain a measure of error, using the regression line as a base for estimations.
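
As an illustration of the simple linear regression form y = β0 + β1x + ε, the following minimal sketch estimates β0 and β1 by ordinary least squares. The experience and salary figures are hypothetical values chosen only to mirror the salary example in the text; the function name is likewise an assumption made for this sketch.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Hypothetical data: years of experience (independent) and salary (dependent).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

beta0, beta1 = fit_simple_linear_regression(experience, salary)
predicted = beta0 + beta1 * 6  # predicted salary for 6 years of experience
print(beta0, beta1, predicted)
```

Because the toy data lie exactly on a line, the error term ε is zero here; with real rainfall data the residuals would be non-zero.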

II. LITERATURE REVIEW


Thirumalai, Chandrasegar, et al. [1] discuss the amount of rainfall in past years according to the crop seasons and
predict the rainfall for future years. The crop seasons are Rabi, Kharif, and Zaid. The linear regression method is
applied for early prediction. Here, Rabi and Kharif were taken as variables: if one variable is given, the other
can be predicted using linear regression. The standard deviation and mean were also calculated for future
prediction of crop seasons. This implementation can be used by farmers to get an idea of which crop to
harvest according to the crop seasons.

Geetha, A., and G. M. Nasira [2] implement a model that predicts
weather conditions such as rainfall, fog, thunderstorms, and cyclones, which helps people take
preventive measures. Data mining techniques were used, and a data mining tool named RapidMiner was used to
model the decision trees. The data set is that of Trivandrum, with attributes such as day, temperature, dew point,
and pressure. The dataset is divided into a training set and a testing set, and the decision tree algorithm is applied. The
accuracy is calculated by comparing actual and predicted values. The accuracy is 80.67, and to achieve a higher
value the work can be extended by applying soft computing techniques such as fuzzy logic and genetic algorithms.

Parmar, Aakash, Kinjal Mistree, and Mithila Sompura [3] discuss the different methods used for rainfall
prediction in weather forecasting, along with their limitations. Various neural network algorithms used
for prediction are discussed step by step, and the paper categorizes the approaches and algorithms used for
rainfall prediction by various researchers. The background work covers machine learning models such as the
ARIMA model, the artificial neural network and its types (Back-Propagation Neural Network, Cascade Forward
Back Propagation Network, Layer Recurrent Network), the Self-Organizing Map, and the Support Vector Machine.
The surveyed approaches to rainfall prediction are categorized in a table.

Dash, Yajnaseni, Saroj K. Mishra, and Bijaya K.
Panigrahi [4] applied artificial intelligence techniques, namely Artificial Neural Network (ANN), Extreme Learning
Machine (ELM), and K-Nearest Neighbor (KNN), to the prediction of summer monsoon and post-monsoon
rainfall. The dataset is the time series data of Kerala from 1871 to 2016, taken from the Indian Institute of
Tropical Meteorology (IITM). The data is pre-processed and normalized. Next, the
data is divided into training and testing sets: the data up to 2010 is the training set, and the data from 2011 to
2016 is the test set. The above-mentioned algorithms were applied, and their performance was calculated
using MAE, RMSE, and MASE. The ELM algorithm gave more accurate results than the others.

Singh, Gurpreet, and Deepak Kumar state that many machine learning algorithms have been applied to the prediction
of rainfall. They use a hybrid approach that combines two techniques: Random Forest and
Gradient Boosting are paired with machine learning techniques such as AdaBoost, K-Nearest Neighbor (KNN), Support
Vector Machine (SVM), and Neural Network (NN). These were applied to the rainfall data of North
Carolina from 2007 to 2017, and performance was evaluated with the metrics F-score,
precision, accuracy, and recall. Finally, eight hybrid models were proposed, and Gradient Boosting-AdaBoost
was the superior model, exhibiting good results.

Kar, Kaveri, Neelima Thakur, and Prerika Sanghvi used
the fuzzy logic approach for the prediction of rainfall from temperature data in a geographic location. The
fuzzy model was applied, but because of other climatic factors the prediction is not accurate, so they
considered other influencing factors such as humidity and also analyzed the advantages of the fuzzy system over other
techniques.

Sardeshpande, Kaushik D., and Vijaya R. Thool [7] used artificial neural networks, namely back
propagation (BPNN), radial basis function (RBFNN), and generalized regression (GRNN), on rainfall data from
India, mainly the Nanded district, Maharashtra. The data is normalized between 0 and 1, the
algorithms are applied, and their performance is calculated and compared. BPNN and RBFNN
gave good results compared to GRNN.

Chen, Binghong, et al. [8] focus on non-linear machine learning
approaches, such as the gradient boosting decision tree model and deep neural networks (DNN), for short-term prediction of
rainfall. These algorithms were built on Alibaba Cloud, data was collected from different sites, and
effectiveness was measured using the classification metrics AUC, F1 score, precision, and accuracy and the
regression metrics RMSE and correlation. It was observed that DNN showed better results than ECData.

Moon, Seung-Hyun, et al. [9] implement an early warning system (EWS) that produces a signal when a
threshold limit is reached, giving a warning 3 hours in advance. This was done using machine learning
techniques. South Korean data from 2007 to 2012 was used, performance was measured by several criteria, and
a confusion matrix was produced.

III. PROPOSED METHOD


The predictive model is used to predict precipitation. The first step is converting the data into the
correct format for conducting experiments, then analyzing the data and observing variation in the
patterns of rainfall. We predict the rainfall by separating the dataset into a training set and a testing set; we then
apply different machine learning approaches (MLR, SVR, etc.) and statistical techniques, and compare and
analyze the various approaches used. With the help of these approaches, we attempt to minimize
the error.
Dataset Description:
The dataset [10] consists of rainfall measurements from 1901 to 2015 for each state.
• The data consists of 19 attributes (individual months, annual, and combinations of 3 consecutive
months) for 36 subdivisions.
• The data is available only from 1950 to 2015 for some of the subdivisions.
• The attributes are the amount of rainfall measured in mm.
As the dataset is very large, feature reduction is performed to improve accuracy and reduce
computation time and storage. Principal Component Analysis (PCA) is a technique for extracting
the important variables from a large set of variables. It extracts a low-dimensional set of features with the aim of
capturing the maximum amount of information. With fewer variables, visualization also becomes more meaningful.
PCA is performed by computing the covariance matrix and obtaining its eigenvalues. In our dataset, PCA reduced
the attributes to only the rainfall data for combinations of three consecutive months and the annual
data for every subdivision.

Techniques used:

Multiple Linear Regression:
Multiple regression tries to model the relationship between two or more explanatory variables and a response by
fitting a linear equation to observed data. It is simply an extension of simple linear regression.
The general form of the multivariable linear regression model is: y = α + β1x1 + β2x2 + … + βkxk + ε, where y is the
dependent variable, x1, x2, …, xk are independent variables, and α and β1, …, βk are coefficients. Multiple regression can
model more complex relationships that arise from several features together. It should be
used in cases where one particular variable is not evident enough to map the relationship between the
independent and the dependent variable.
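
To make the multiple regression form concrete, the sketch below fits y = α + β1x1 + β2x2 + β3x3 by least squares with NumPy. The data are synthetic and noise-free, and the coefficient values (10.0, 0.5, 0.2, -0.1) as well as the helper name `fit_mlr` are assumptions chosen for illustration, not values from the paper's dataset.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of y = alpha + X @ beta; returns (alpha, beta)."""
    # Prepend a column of ones so the intercept alpha is estimated with beta.
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(0)
# Hypothetical predictors: rainfall (mm) in three earlier periods.
X = rng.uniform(0.0, 300.0, size=(50, 3))
# Hypothetical noise-free response built from known coefficients.
y = 10.0 + 0.5 * X[:, 0] + 0.2 * X[:, 1] - 0.1 * X[:, 2]

alpha, beta = fit_mlr(X, y)
print(alpha, beta)
```

Since the response was generated without noise, the fit recovers the chosen coefficients up to floating-point error; on measured rainfall data the estimates would only approximate the underlying relationship.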

Support Vector Regression:

In machine learning and data science, Support Vector Regression (SVR) is often mentioned together with the Support
Vector Machine (SVM), but SVR is a bit different from SVM. As the name
suggests, SVR is a regression algorithm, so we can use it for working with continuous values instead of
classification, which is the task of SVM. Support Vector Machines support linear and nonlinear regression, which we
refer to as Support Vector Regression. Instead of trying to fit the largest possible street between two classes while
limiting margin violations, Support Vector Regression tries to fit as many instances as possible on the street
while limiting margin violations. The width of the street is controlled by the hyperparameter epsilon.

Kernel - the function used to map low-dimensional data into higher-dimensional data.
Hyperplane - in SVM this is the separation line between the data classes; in SVR it is
the line that helps us predict the continuous target value.
Boundary line - in SVM, the lines on either side of the hyperplane that create a margin. The support vectors can be
on the boundary lines or outside them. The boundary lines separate the two classes; in SVR the concept is the same.
Vectors - the data points that are closest to the boundary; their distance from it is minimum.
SVR performs linear regression in a higher-dimensional space. We can think of SVR as if each data point in the
training set represents its own dimension. When we evaluate the kernel between a test point and a point in the training
set, the resulting value gives the coordinate of the test point in that dimension. The vector k that we get when we
evaluate the kernel between the test point and all points in the training set is the representation of the test point
in the higher-dimensional space. The equation of the hyperplane is wx + b = 0, and the two equations of the boundary
lines are wx + b = +e and wx + b = -e, where e denotes epsilon. A point that satisfies our SVR therefore obeys
-e <= y - wx - b <= +e.
SVR has a different regression goal from linear regression: in linear regression we try to
minimize the error between the prediction and the data, whereas in SVR the goal is to make sure that the error does not
exceed the threshold.
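
Two ideas from this section can be sketched in a few lines: the epsilon-insensitive error, which is zero for any point inside the band of half-width epsilon around the prediction, and the kernel vector k for a test point. The RBF kernel, the gamma value, the function names, and all numbers below are assumptions made for illustration; this is not a full SVR solver.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-point loss: zero inside the epsilon band, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) between two points."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_vector(test_point, training_points, gamma=0.5):
    """Kernel value between the test point and every training point."""
    return [rbf_kernel(test_point, p, gamma) for p in training_points]


# Hypothetical targets and predictions (e.g. rainfall in arbitrary units).
y_true = [10.0, 12.0, 15.0]
y_pred = [10.5, 14.0, 15.0]
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0)

# Hypothetical two-dimensional training points and a test point.
training_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
k = kernel_vector((0.0, 0.0), training_points)
print(losses, k)
```

Only the second prediction, whose residual of 2.0 exceeds epsilon, contributes a non-zero loss. The vector `k` has one coordinate per training point, matching the description of the test point's representation in the higher-dimensional space.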

Lasso Regression:
Lasso stands for Least Absolute Shrinkage and Selection Operator. Lasso regression works by introducing a bias
term, but instead of squaring the slope (as ridge regression does), the absolute value of the slope is added as a
penalty term.

Lasso regression: Min (sum of squared residuals + α * |slope|). Here α * |slope| is the penalty term.

As alpha increases, the slope of the regression line is reduced and the line becomes
more horizontal, so the model becomes less sensitive to variations in the independent variable. Lasso
regression helps to reduce overfitting, and it is particularly useful for feature selection. It can be useful if
we have several independent variables that are useless.
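
The effect of alpha on the slope can be checked directly for a single predictor. For the objective stated above, sum of squared residuals + α * |slope|, with an unpenalized intercept, the minimizing slope has a closed form based on soft-thresholding. The sketch below assumes that single-feature setting; the data and function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it is within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_slope(x, y, alpha):
    """Slope minimising sum((y - b - slope * x)^2) + alpha * |slope|."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Centring the data removes the (unpenalised) intercept from the problem.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return soft_threshold(sxy, alpha / 2.0) / sxx


# Hypothetical data lying on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

slopes = [lasso_slope(x, y, alpha) for alpha in (0.0, 10.0, 20.0, 40.0)]
print(slopes)
```

With alpha = 0 the ordinary least-squares slope of 2.0 is recovered; larger alpha values flatten the line, and a sufficiently large alpha sets the slope exactly to zero, which is the behaviour that makes Lasso useful for discarding uninformative variables.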

Fig 1. Rainfall Prediction Model

Algorithm: Rainfall prediction
Input: Rainfall data set
Output: Accuracy/error of the prediction
Step 1: Import the rainfall data set CSV file.
Step 2: Fill the missing values with the mean value of the data.
Step 3: Feature scaling: scale the data to a fixed scale.
Step 4: Feature reduction: PCA is used to reduce the data.
Step 5: The data is divided into a training set (70%) and a testing set (30%).
Step 6: The Multiple Linear Regression, Support Vector Regression, and Lasso Regression algorithms are applied,
and the Mean Absolute Error and r2 score are calculated.
Step 7: Scatter plots of predicted against testing data are plotted for the applied models, the
errors are compared, and the best model among them is selected.
Step 8: Display the results.
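
The steps above can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the rainfall CSV is not available here, so Step 1 uses synthetic data as a stand-in; only the multiple linear regression model from Step 6 is fitted; the number of retained components (3) is an arbitrary choice; and the plotting in Step 7 is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1 (stand-in): synthetic data replaces the rainfall CSV file.
n_samples = 200
features = rng.gamma(shape=2.0, scale=50.0, size=(n_samples, 4))  # hypothetical seasonal totals (mm)
target = features.sum(axis=1) + rng.normal(0.0, 5.0, n_samples)   # hypothetical annual total (mm)
features[rng.random(features.shape) < 0.05] = np.nan              # inject missing values

# Step 2: fill missing values with the column mean.
col_means = np.nanmean(features, axis=0)
features = np.where(np.isnan(features), col_means, features)

# Step 3: scale every feature to zero mean and unit variance.
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

# Step 4: PCA via the eigen-decomposition of the covariance matrix.
cov = np.cov(scaled, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
components = eigvecs[:, order[:3]]       # keep 3 components (assumed)
reduced = scaled @ components

# Step 5: 70% training set, 30% testing set.
idx = rng.permutation(n_samples)
n_train = n_samples * 7 // 10
train, test = idx[:n_train], idx[n_train:]


def add_intercept(X):
    """Prepend a column of ones for the intercept term."""
    return np.column_stack([np.ones(len(X)), X])


# Step 6: multiple linear regression, then MAE and r2 score on the test set.
coef, *_ = np.linalg.lstsq(add_intercept(reduced[train]), target[train], rcond=None)
pred = add_intercept(reduced[test]) @ coef
mae = np.mean(np.abs(target[test] - pred))
ss_res = np.sum((target[test] - pred) ** 2)
ss_tot = np.sum((target[test] - target[test].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Step 8: display the results.
print(f"MAE = {mae:.2f}, r2 = {r2:.3f}")
```

Following the listed step order, scaling and PCA are computed on the full data before the split. In practice it is usually safer to estimate the scaling statistics and principal components on the training set only and then apply them to the testing set, so that no test-set information reaches the model.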

IV. EXPERIMENTAL RESULTS

Rainfall data from 1901 to 2015 is collected, studied, and plotted to understand the rainfall
in various regions. The histograms below are plotted for the monthly, annual, and
three-consecutive-month rainfall data. It is observed that there is a rise in the volume of rainfall (Y-axis) in the
months of July, August, and September.

Fig 2. Histograms of the monthly, annual, and three-consecutive-month rainfall data

The plot below is the line graph of the amount of rainfall over the years; it shows a high
volume of rainfall in the 1950s.

Fig 3. Line graph for distribution of rainfall from the year 1901-2015.

The bar graph below shows the amount of rainfall for all months in the subdivisions; it is observed that
the volume of rainfall is reasonably good in Eastern India in the months of March, April, and May.

Fig 4. Bar graphs for the amount of rainfall in all subdivisions, monthly

After the analysis of the data, pre-processing techniques and the regression models (MLR, SVR, and
Lasso) are applied, and a scatter plot is plotted.

Fig 5. Scatter plot between the predictions and testing set

Then, for each regression model, the MAE and r2 score are calculated and compared, and a graph is plotted.
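
For reference, the two metrics can be computed as follows. This is a small sketch using the standard definitions of mean absolute error and the coefficient of determination; the observed and predicted values are hypothetical and are not results from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted rainfall (mm).
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae, r2)
```

A lower MAE and an r2 score closer to 1 both indicate a better fit, which is the basis on which the models are compared.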

Fig 6. Comparison among applied models

Fig 7: Input processing Flow

V. CONCLUSION
This project concentrated on the estimation of rainfall. We find that SVR is a valuable and flexible
technique, helping the user to deal with the limitations pertaining to distributional properties of the underlying
variables, the geometry of the data, and the common problem of model overfitting. The choice of kernel function is
critical for SVR modeling. We recommend that beginners use the linear and RBF kernels for linear and non-linear
relationships, respectively. We observe that SVR is superior to MLR as a prediction method. MLR cannot
capture the non-linearity in a data set, and SVR becomes useful in such situations. We also
compute the Mean Absolute Error (MAE) for both the MLR and SVR models to evaluate their performance.
Finally, we compare the performance of the SLR, SVR, and tuned SVR models. As expected, the tuned SVR model gives
the best prediction.

VI. REFERENCES
[1] Thirumalai, Chandrasegar, et al. "Heuristic prediction of rainfall using machine learning techniques."
2017 International Conference on Trends in Electronics and Informatics (ICEI). IEEE, 2017.
[2] Geetha, A., and G. M. Nasira. "Data mining for meteorological applications: Decision trees for
modeling rainfall prediction." 2014 IEEE International Conference on Computational Intelligence and
Computing Research. IEEE, 2014.
[3] Parmar, Aakash, Kinjal Mistree, and Mithila Sompura. "Machine learning techniques for rainfall
prediction: A review." 2017 International Conference on Innovations in Information, Embedded and
Communication Systems. 2017.
[4] Dash, Yajnaseni, Saroj K. Mishra, and Bijaya K. Panigrahi. "Rainfall prediction for the Kerala state of
India using artificial intelligence approaches." Computers & Electrical Engineering 70 (2018): 66-73.
[5] Dr. C K Gomathy, Article: An Effective Innovation Technology in Enhancing Teaching and Learning of
Knowledge Using ICT Methods, International Journal of Contemporary Research in Computer Science
and Technology (IJCRCST), E-ISSN: 2395-5325, Volume 3, Issue 4, P.No-10-13, April 2017.

[6] Dr. C K Gomathy, Article: A Semantic Quality of Web Service Information Retrieval Techniques Using
Bin Rank, International Journal of Scientific Research in Computer Science Engineering and Information
Technology (IJSRCSEIT), Volume 3, Issue 1, ISSN: 2456-3307, P.No-1563-1578, February 2018.
[7] Dr. C K Gomathy, Article: A Web Based Platform Comparison by an Exploratory Experiment Searching
For Emergent Platform Properties, IAETSD Journal for Advanced Research in Applied Sciences, Volume
5, Issue 3, P.No-213-220, ISSN: 2394-8442, March 2018.
[8] Dr. C K Gomathy, Article: A Study on the Effect of Digital Literacy and Information Management,
IAETSD Journal for Advanced Research in Applied Sciences, Volume 7, Issue 3, P.No-51-57, ISSN:
2279-543X, March 2018.
[9] Dr. C K Gomathy, A. V. Sripadh Kaustthub, K. Banuprakash, Article: An Effect of Big Data Analytics on
Enhancing Automated Aviation, International Journal of Contemporary Research in Computer Science
and Technology (IJCRCST), E-ISSN: 2395-5325, Volume 4, Issue 3, P.No-1-7, March 2018.
[10] Dr. C K Gomathy, Article: A Semantic Quality of Web Service Information Retrieval Techniques Using
Bin Rank A Cloud Monitoring Framework Perform in Web Services, International Journal of Scientific
Research in Computer Science Engineering and Information Technology (IJSRCSEIT), Volume 3, Issue 5,
ISSN: 2456-3307, May 2018.
[11] Dr. C K Gomathy, Article: Supply chain-Impact of importance and Technology in Software Release
Management, International Journal of Scientific Research in Computer Science Engineering and
Information Technology (IJSRCSEIT), Volume 3, Issue 6, ISSN: 2456-3307, P.No-1-4, July 2018.

Author’s Profile:

1. ANNAPAREDDY BALA NARASIMHA REDDY, Student, B.E. Computer Science and Engineering, Sri
Chandrasekharendra Saraswathi Viswa Mahavidyalaya (Deemed to be University), Enathur, Kanchipuram, India. His
areas of interest are Internet of Things and big data analytics.

2. ARAVAPALLI PAVAN KUMAR, Student, B.E. Computer Science and Engineering, Sri Chandrasekharendra
Saraswathi Viswa Mahavidyalaya (Deemed to be University), Enathur, Kanchipuram, India. His areas of interest
are Internet of Things and big data analytics.

3. AILE LOKESH, Student, B.E. Computer Science and Engineering, Sri Chandrasekharendra Saraswathi Viswa
Mahavidyalaya (Deemed to be University), Enathur, Kanchipuram, India. His areas of interest are Internet of
Things and big data analytics.

4. Dr. C K Gomathy is Assistant Professor in Computer Science and Engineering at Sri Chandrasekharendra
Saraswathi Viswa Mahavidyalaya (Deemed to be University), Enathur, Kanchipuram, India. Her areas of interest are
Software Engineering, Big Data Analytics, Web Services, Knowledge Management, and IoT.

© 2021, IJSREM | www.ijsrem.com | Page 15
