
“WEATHER PREDICTION”

MAJOR PROJECT

Submitted by ASHWIN.J
(18MCA038)
Under the Guidance of
Dr. B. UMA MAHESWARI M.Sc., MCA., M.Phil., Ph.D.
Assistant Professor,

Department of Computer Applications.

In partial fulfillment of the requirements for the award of the degree of


MASTER OF COMPUTER APPLICATIONS
of Bharathiar University

DEPARTMENT OF COMPUTER APPLICATIONS


PSG COLLEGE OF ARTS & SCIENCE
An Autonomous College, Affiliated to Bharathiar University
Accredited with ‘A’ Grade by NAAC (3rd Cycle)
College with Potential for Excellence (Status Awarded by the UGC)
Star College Status Awarded by DBT-MST
An ISO 9001:2015 Certified Institution
Coimbatore - 641 014
APRIL 2021
DEPARTMENT OF COMPUTER APPLICATIONS
PSG COLLEGE OF ARTS & SCIENCE
An Autonomous College, Affiliated to Bharathiar University
Accredited with ‘A’ Grade by NAAC (3rd Cycle)
College with Potential for Excellence (Status Awarded by the UGC)
Star College Status Awarded by DBT-MST
An ISO 9001:2015 Certified Institution
Civil Aerodrome Post
Coimbatore - 641 014
APRIL 2021

CERTIFICATE
This is to certify that this Project work entitled “WEATHER
PREDICTION” is a bonafide record of work done by ASHWIN J
(18MCA038) in partial fulfillment of the requirements for the award of the Degree
of Master of Computer Applications of Bharathiar University.

Dr. B. UMA MAHESWARI M.Sc., MCA., M.Phil., Ph.D.          Dr. R. SUDHA MCA., M.Phil., Ph.D.
Faculty Guide                                             Head of the Department

Submitted for Viva-Voce Examination held on 09.04.2021

Dr. B. UMA MAHESWARI M.Sc., MCA., M.Phil., Ph.D.

Internal Examiner                                         External Examiner


DECLARATION

I, ASHWIN J (18MCA038), hereby declare that this Project work


entitled “WEATHER PREDICTION” is submitted to PSG College of Arts
and Science (Autonomous), Coimbatore, in partial fulfillment of the requirements for the award of
Master of Computer Applications, and is a record of original work done by me
under the supervision and guidance of Dr. B. UMA MAHESWARI M.Sc.,
MCA., M.Phil., Ph.D., Assistant Professor, Department of Computer
Applications, PSG College of Arts and Science, Coimbatore.

This Project work has not been submitted by me for the award of any
other Degree/Diploma/Associateship/Fellowship or any other similar degree
to any other university.

Place: Coimbatore                                         ASHWIN J
Date: 08.04.2021                                          (18MCA038)


ACKNOWLEDGEMENT

With great gratitude, I would like to acknowledge the help of those who
contributed with their valuable suggestions and timely assistance to complete
this work.

First and foremost, I would like to extend my heartfelt gratitude and


place my sincere thanks to Thiru. L. GOPALAKRISHNAN, Trustee, PSG &
SONS Charities, Coimbatore, for providing all sorts of support and the necessary
facilities throughout the course.

I express my deep sense of gratitude to the Secretary, Dr. T. KANNAIAN
M.Sc., M.Tech., Ph.D., for permitting me to undertake this work.

I thank our Principal, Dr. D. BRINDHA M.Sc., M.Phil., Ph.D.,
M.A (Yoga)., for her support and constant inspiration throughout the
course of the project, and I would also like to thank our Vice Principal,
Dr. A. ANGURAJ M.Sc., M.Phil., Ph.D., for his support.

I owe my deepest gratitude to Dr. R. SUDHA MCA., M.Phil., Ph.D.,
Head of the Department, for her counsel and for encouraging me to pursue new
goals and ideas.

My sincere thanks to Dr. B. UMA MAHESWARI M.Sc., MCA.,
M.Phil., Ph.D., for her valuable suggestions, support and guidance as my
internal guide, without which my work would not have reached its present
form.

I thank Mr. R. Ramkumar MCA, for his valuable suggestions,
support and guidance as my external guide, without which my work would not
have reached its present form.
Last but not least, I am greatly indebted to my parents and friends for their kind
co-operation in each and every step I took in this project.

DATE: 02.04.2021
CHENNAI

Mr. J. ASHWIN (3rd MCA)


REG.No. 18MCA038
PSG College of Arts and Science
Coimbatore.

TO WHOMSOEVER IT MAY CONCERN

This is to certify that Mr. J. ASHWIN (REG NO: 18MCA038), a final-year MCA student
at PSG COLLEGE OF ARTS & SCIENCE, COIMBATORE, has successfully
completed the PROJECT entitled “WEATHER PREDICTION” in the “Machine
Learning” department of our organization during the period JANUARY 2021 to MARCH
2021.

WE WISH HIM ALL THE BEST FOR HIS FUTURE

For Shiash Info Solutions Private Limited

Ashwini Kanniyappan
Manager – Human Resources

Shiash Info Solutions Private Limited


#51, Level 4, Tower A, Rattha TEK Meadows, Old Mahabalipuram
Road, Sholinganallur, Chennai – 600 119, Tamil Nadu
India | +914466255681 | [email protected]
SYNOPSIS:

Weather forecasting has gained the attention of many researchers from various
research communities due to its effect on human life across the globe. The emergence of
machine learning techniques in the last decade, coupled with the wide
availability of massive weather observation data and the advent of information
and computer technology, has motivated many researchers to explore hidden
hierarchical patterns in large volumes of weather data for weather
forecasting. This study investigates machine learning techniques for weather
forecasting. Using machine learning algorithms, we can evaluate the prediction
accuracy achieved on these datasets.
TABLE OF CONTENTS

1. Introduction
   1.1 Project Overview
2. System Specification
   2.1 Software Requirement
   2.2 Hardware Requirement
3. System Analysis
   3.1 Existing System
   3.2 Proposed System
4. System Design and Development
   4.1 System Flow Diagram
   4.2 Data Collection
5. Modules
   5.1 Dataset Selection
   5.2 Feature Selection
   5.3 Normalization
   5.4 Machine Learning
   5.5 Data Preprocessing
   5.6 Data Visualization
6. Implementation of Algorithm
   6.1 Linear Regression
7. Testing
   7.1 Testing and Implementation
8. Future Enhancement
9. Conclusion
Bibliography
Appendices
1. INTRODUCTION:

1.1. PROJECT OVERVIEW

Weather is an important aspect of a person’s life, as it can tell us when it will rain
and when it will be sunny. Weather forecasting is the attempt by meteorologists to predict
the weather conditions that may be expected at some future time. The climatic
parameters include temperature, pressure, humidity, dew point, rainfall, precipitation,
wind speed, and the size of the dataset. Here, only the parameters temperature, pressure,
humidity, dew point, precipitation and rainfall are considered for the experimental
analysis.
2. SYSTEM SPECIFICATION:

2.1. Software Requirement

The software used in our project is:

Python 3.7: Python is an interpreted, high-level, general-purpose programming language. Its
formatting is visually uncluttered, and it often uses English keywords where other
languages use punctuation. It provides a vast collection of libraries for data mining and
prediction.

Jupyter Notebook / Spyder / PyCharm: Spyder is an open-source, cross-platform integrated
development environment (IDE) for scientific programming in the Python language.
It integrates with a number of prominent packages as well as other open-source software.

NumPy: NumPy was used for numerical computation on the arrays of weather data.

Pandas: Pandas was used for data preprocessing and statistical analysis of the data.

Matplotlib: Matplotlib was used for the graphical representation of our predictions.

2.2. Hardware Requirement

Operating System : Windows OS
Processor        : i3 or higher
RAM              : 8 GB or higher
IDE              : Anaconda
3. SYSTEM ANALYSIS:

3.1. Existing System

It was not until the invention of the electric telegraph in 1835 that the modern age of
weather forecasting began. Before that, the fastest that distant weather reports could
travel was around 160 kilometres per day (100 mi/day), though 60–120 kilometres per day
(40–75 mi/day) was more typical, whether by land or by sea. By the late 1840s, the
telegraph allowed reports of weather conditions from a wide area to be received almost
instantaneously, allowing forecasts to be made from knowledge of weather conditions
further upwind.

The two men credited with the birth of forecasting as a science were Francis Beaufort and
Robert FitzRoy, officers of the Royal Navy. Both were influential men in British naval and
governmental circles, and though ridiculed in the press at the time, their work gained
scientific credence, was accepted by the Royal Navy, and formed the basis for all of
today's weather forecasting knowledge.

Beaufort developed the Wind Force Scale and Weather Notation coding, which he was
to use in his journals for the remainder of his life. He also promoted the development of
reliable tide tables around British shores and, with his friend FitzRoy, expanded weather
record-keeping at 200 British coastguard stations.

Robert FitzRoy was appointed in 1854 as chief of a new department within the Board of
Trade to deal with the collection of weather data at sea as a service to mariners. This was
the forerunner of the modern Meteorological Office. All ship captains were tasked with
collating data on the weather and computing it, with the use of tested instruments that
were loaned for this purpose.

3.2. Proposed System

To predict the weather, a massive amount of data is fed into an algorithm that uses deep
learning techniques to learn from it and then make predictions based on the past data.
The trained ML model works with a physics-free approach to the forecasting process: it
has been designed to learn from atmospheric examples daily, without any prior physical
model fed into the system. The underlying convolutional neural network (CNN), ‘U-Net’,
which comprises a sequence of layers (sets of mathematical operations), takes the input
satellite imagery and transforms it into output images. The layers of the network are
arranged in an encoding phase, which progressively decreases the resolution of the
images; a separate decoding phase has been added to expand the low-resolution images
back out. To start with, the engineering team trained the model by feeding it historical
data collected in the US from 2017 to 2019, and then compared it to three baseline
models: High-Resolution Rapid Refresh (HRRR), an Optical Flow (OF) algorithm, and a
persistence model. According to the researchers, Google's artificial intelligence
outperformed all the traditional methods when compared using precision and recall
graphs. The model treats weather prediction as an image-to-image translation problem,
leveraging state-of-the-art CNNs. Moving forward, this mechanism and its ML model can
be refined to produce more accurate forecasts.
4. SYSTEM DESIGN AND DEVELOPMENT:

The model was designed using Python 3 with the help of open-source repositories. We use
NumPy, Matplotlib, Seaborn, pandas, scikit-learn, etc.
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as
masked arrays and matrices), and an assortment of routines for fast operations on arrays,
including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete
Fourier transforms, basic linear algebra, basic statistical operations, random simulation
and much more.
At the core of the NumPy package is the ndarray object. This encapsulates n-
dimensional arrays of homogeneous data types, with many operations performed
in compiled code for performance. There are several important differences between
NumPy arrays and the standard Python sequences:
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow
dynamically). Changing the size of an ndarray will create a new array and delete the
original.
The elements in a NumPy array are all required to be of the same data type, and thus
will be the same size in memory. The exception: one can have arrays of (Python,
including NumPy) objects, thereby allowing for arrays of different sized elements.
NumPy arrays facilitate advanced mathematical and other types of operations on large
numbers of data. Typically, such operations are executed more efficiently and with less
code than is possible using Python’s built-in sequences.
A growing plethora of scientific and mathematical Python-based packages are using
NumPy arrays; though these typically support Python-sequence input, they convert such
input to NumPy arrays prior to processing, and they often output NumPy arrays. In other
words, in order to efficiently use much (perhaps even most) of today’s
scientific/mathematical Python-based software, just knowing how to use Python’s built-
in sequence types is insufficient - one also needs to know how to use NumPy arrays.
The points about sequence size and speed are particularly important in scientific
computing. As a simple example, consider the case of multiplying each element in a 1-D
sequence with the corresponding element in another sequence of the same length. If the
data are stored in two Python lists, a and b, we could iterate over each element:
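A minimal sketch of that comparison (adapted from the example in the NumPy documentation; the lists here are illustrative):

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# pure-Python element-wise multiplication: loop over every index
c = [a[i] * b[i] for i in range(len(a))]

# the equivalent NumPy operation, executed in compiled code
import numpy as np
c = np.array(a) * np.array(b)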
seaborn is a Python data visualization library based on matplotlib. It provides a high-
level interface for drawing attractive and informative statistical graphics.

4.1. System flow diagram

4.2. Data collection

The dataset contains the following attributes: Precip Type, Temperature (C), Apparent
Temperature (C), Humidity, Wind Speed (km/h), Wind Bearing (degrees), Visibility (km),
Loud Cover, Pressure (millibars), and Daily Summary.
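As an illustrative sketch of loading such a dataset with pandas (the filename 'weatherHistory.csv' is an assumption, not fixed by this report):

import pandas as pd

# load the downloaded CSV and inspect its shape and attribute names
df = pd.read_csv('weatherHistory.csv')
print(df.shape)
print(df.columns)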

Feature             Correlation with meantempm

maxpressurem_1      -0.519699
maxpressurem_2      -0.425666
maxpressurem_3      -0.408902
meanpressurem_1     -0.365682
meanpressurem_2     -0.269896
meanpressurem_3     -0.263008
minpressurem_1      -0.201003
minhumidity_1       -0.148602
minhumidity_2       -0.143211
minhumidity_3       -0.118564
minpressurem_2      -0.104455
minpressurem_3      -0.102955
precipm_2            0.084394
precipm_1            0.086617
precipm_3            0.098684
maxhumidity_1        0.132466
maxhumidity_2        0.151358
maxhumidity_3        0.167035
maxdewptm_3          0.829230
maxtempm_3           0.832974
mindewptm_3          0.833546
meandewptm_3         0.834251
mintempm_3           0.836340
maxdewptm_2          0.839893
meandewptm_2         0.848907
mindewptm_2          0.852760
mintempm_2           0.854320
meantempm_3          0.855662
maxtempm_2           0.863906
meantempm_2          0.881221
maxdewptm_1          0.887235
meandewptm_1         0.896681
mindewptm_1          0.899000
mintempm_1           0.905423
maxtempm_1           0.923787
meantempm_1          0.937563
mintempm             0.973122
maxtempm             0.976328
meantempm            1.000000
5. MODULES:

The steps involved in preprocessing are:

5.1. Dataset Selection

In dataset selection, we choose a dataset suited to the algorithm. Such datasets are
usually published by companies or prepared for research purposes, are mostly open
source, and can be downloaded from websites such as Google Dataset Search, Kaggle, etc.

5.2. Feature Selection

The data we have collected contain many unwanted attributes that are not
needed in our project. Hence, we use only the attributes we need.
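For illustration only (the column names below are hypothetical, not prescribed by this report), keeping just the needed attributes in pandas might look like:

# keep only the attributes required for the analysis
needed = ['Temperature (C)', 'Humidity', 'Pressure (millibars)']
df = df[needed]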

5.3. Normalization

The data collected from the internet should first be normalized. Normalization
refers to rescaling real-valued numeric attributes into the range 0 to 1. After the data
are filtered, they are normalized.
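A minimal sketch of such rescaling, assuming scikit-learn's MinMaxScaler and hypothetical column names:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# hypothetical numeric columns from a weather dataset
df = pd.DataFrame({'temperature': [21.0, 35.5, 28.2], 'humidity': [45.0, 90.0, 60.0]})
scaler = MinMaxScaler()  # rescales each column into the range [0, 1]
df[['temperature', 'humidity']] = scaler.fit_transform(df[['temperature', 'humidity']])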

5.4. Machine Learning

Training a model is the process of iteratively improving your prediction equation
by looping through the dataset multiple times, each time updating the weight and bias
values in the direction indicated by the slope of the cost function (the gradient). Training
is complete when we reach an acceptable error threshold, or when subsequent training
iterations fail to reduce the cost.
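A minimal sketch of this loop for a single-feature linear model (illustrative only; not the exact training routine used in this project):

import numpy as np

def train(x, y, lr=0.01, epochs=1000):
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = w * x + b
        dw = (2.0 / n) * np.sum((y_pred - y) * x)  # slope of the cost w.r.t. the weight
        db = (2.0 / n) * np.sum(y_pred - y)        # slope of the cost w.r.t. the bias
        w -= lr * dw                               # step against the gradient
        b -= lr * db
    return w, b

w, b = train(np.array([1.0, 2.0, 3.0, 4.0]), np.array([3.0, 5.0, 7.0, 9.0]))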

5.5. Data Preprocessing

When we talk about data, we usually think of large datasets with a huge
number of rows and columns. While that is a likely scenario, it is not always the case:
data can come in many different forms, such as structured tables, images, audio files,
and videos.
Machines don’t understand free text, image, or video data as it is; they understand 1s and
0s. So it won’t be good enough to put on a slideshow of all our images and expect our
machine learning model to be trained just by that!
In any machine learning process, data preprocessing is the step in which the data get
transformed, or encoded, to bring them to a state that the machine can easily parse. In
other words, the features of the data can then be easily interpreted by the algorithm.
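As a small illustration of such encoding (the column and categories are hypothetical), pandas can one-hot encode a categorical attribute so the algorithm sees only numbers:

import pandas as pd

df = pd.DataFrame({'precip_type': ['rain', 'snow', 'rain', 'none']})
encoded = pd.get_dummies(df, columns=['precip_type'])  # one indicator column per category
print(encoded)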

5.6. Data visualization


Data visualization is the graphical representation of information and data. By
using visual elements like charts, graphs, and maps, data visualization tools provide an
accessible way to see and understand trends, outliers, and patterns in data.
In the world of Big Data, data visualization tools and technologies are essential to
analyze massive amounts of information and make data-driven decisions. Our eyes are
drawn to colors and patterns. We can quickly identify red from blue, square from circle.
Our culture is visual, including everything from art and advertisements to TV and
movies.

Correlation Value Interpretation

0.8 - 1.0 Very Strong

0.6 - 0.8 Strong

0.4 - 0.6 Moderate

0.2 - 0.4 Weak

0.0 - 0.2 Very Weak

To assess the correlation in this data I will call the corr() method of the Pandas
DataFrame object. Chained to this corr() method call I can then select the column of
interest ("meantempm") and again chain another method call sort_values() on the
resulting Pandas Series object. This will output the correlation values from most
negatively correlated to the most positively correlated.
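A sketch of that call chain, assuming df is the DataFrame of derived features built in the appendix:

# correlation of every column with meantempm, sorted ascending
df.corr()['meantempm'].sort_values()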

6. IMPLEMENTATION OF ALGORITHM:

6.1. Linear Regression


Regression is a method of modelling a target value based on independent
predictors. This method is mostly used for forecasting and for finding cause-and-effect
relationships between variables. Regression techniques differ mainly in the
number of independent variables and the type of relationship between the independent
and dependent variables.
Linear regression applies a set of assumptions, primarily regarding linear
relationships, and numerical techniques to predict an outcome (Y, a.k.a. the dependent
variable) based on one or more predictors (the X's, or independent variables), with the
end goal of establishing a model (a mathematical formula) that predicts outcomes given
only the predictor values, with some amount of uncertainty.
The generalized formula for a Linear Regression model is:

ŷ = β0 + β1x1 + β2x2 + … + βp-1xp-1 + Ε

where:
ŷ is the predicted outcome variable (the dependent variable)
xj are the predictor variables (independent variables) for j = 1, 2, ..., p-1
β0 is the intercept, or the value of ŷ when each xj equals zero
βj is the change in ŷ for a one-unit change in the corresponding xj
Ε is a random error term associated with the difference between the predicted ŷi value
and the actual yi value
That last term in the equation is a very important one. The
most basic form of building a Linear Regression model relies on an algorithm known as
Ordinary Least Squares, which finds the combination of βj values that minimizes
the Ε term.
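As an illustration of Ordinary Least Squares itself (a toy example, separate from the project's model), NumPy can solve for the β values directly:

import numpy as np

# hypothetical design matrix X (leading column of ones for the intercept) and target y
X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = np.array([2.1, 3.9, 6.2, 7.8])

beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
print(beta)  # [intercept, slope] minimizing the squared error term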

7. TESTING:

7.1. TESTING AND IMPLEMENTATION

Looking at the histogram of the values for maxhumidity, the data exhibit quite a bit of
negative skew. I will want to keep this in mind when selecting prediction models and
evaluating the strength of the impact of max humidities. Many of the underlying
statistical methods assume that the data are normally distributed. For now I will leave
these values alone, but it will be good to keep this in mind and retain a certain amount of
skepticism about them.

This plot exhibits another interesting feature. From this plot, the data are multimodal,
which leads me to believe that there are two very different sets of environmental
circumstances apparent in the data. I am hesitant to remove these values since I know
that the temperature swings in this area of the country can be quite extreme, especially
between seasons of the year. I am also aware that removing these low values might
discard some explanatory usefulness, so once again I will remain skeptical about them.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=2/12, random_state=0)

# fit the linear model on the training subset, then predict for the test subset
# so that y_pred is defined before scoring
regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

r_squared = r2_score(y_test, y_pred) * 100  # R^2 expressed as a percentage

8. CONCLUSION:

In this project, I demonstrated how to use the Linear Regression machine
learning algorithm to predict future mean weather temperatures based on the data
collected in the prior project. I demonstrated how to use the statsmodels library to select
statistically significant predictors based on sound statistical methods. I then utilized
this information to fit a prediction model on a training subset using Scikit-
Learn's LinearRegression class. Using this fitted model I could then predict the expected
values from the inputs of a testing subset and evaluate the accuracy of the prediction,
which turned out to be reasonable.

BIBLIOGRAPHY

Reference website

• www.kaggle.com
• www.tutorialpoint.com
• ieeexplore.ieee.org
• semanticscholar.org

APPENDICES:

Screenshots

Collection of datasets

Implementation of the program in the portal

Importing of weather forecasting and prediction

Output data
Sample coding

from datetime import datetime, timedelta

import time

from collections import namedtuple

import pandas as pd

import requests

import matplotlib.pyplot as plt

API_KEY = '7052ad35e3c73564'

BASE_URL = "http://api.wunderground.com/api/{}/history_{}/q/NE/Lincoln.json"

target_date = datetime(2016, 5, 16)

features = ["date", "meantempm", "meandewptm", "meanpressurem", "maxhumidity", "minhumidity",
            "maxtempm", "mintempm", "maxdewptm", "mindewptm", "maxpressurem", "minpressurem", "precipm"]

DailySummary = namedtuple("DailySummary", features)

def extract_weather_data(url, api_key, target_date, days):
    records = []
    for _ in range(days):
        request = url.format(api_key, target_date.strftime('%Y%m%d'))
        response = requests.get(request)
        if response.status_code == 200:
            data = response.json()['history']['dailysummary'][0]
            records.append(DailySummary(
                date=target_date,
                meantempm=data['meantempm'],
                meandewptm=data['meandewptm'],
                meanpressurem=data['meanpressurem'],
                maxhumidity=data['maxhumidity'],
                minhumidity=data['minhumidity'],
                maxtempm=data['maxtempm'],
                mintempm=data['mintempm'],
                maxdewptm=data['maxdewptm'],
                mindewptm=data['mindewptm'],
                maxpressurem=data['maxpressurem'],
                minpressurem=data['minpressurem'],
                precipm=data['precipm']))
        # pause between requests to respect the API rate limit
        time.sleep(6)
        target_date += timedelta(days=1)
    return records

records = extract_weather_data(BASE_URL, API_KEY, target_date, 500)

# if you closed your terminal or Jupyter Notebook, reinitialize your imports and

# variables first and remember to set your target_date to datetime(2016, 5, 16)

records += extract_weather_data(BASE_URL, API_KEY, target_date, 500)

df = pd.DataFrame(records, columns=features).set_index('date')

tmp = df[['meantempm', 'meandewptm']].head(10)

# 1 day prior

N=1

# target measurement of mean temperature

feature = 'meantempm'

# total number of rows

rows = tmp.shape[0]

# a list representing the Nth prior measurements of feature

# notice that the front of the list needs to be padded with N

# None values to maintain a consistent row count for each N

nth_prior_measurements = [None]*N + [tmp[feature][i-N] for i in range(N, rows)]

# make a new column name of feature_N and add to DataFrame

col_name = "{}_{}".format(feature, N)

tmp[col_name] = nth_prior_measurements

tmp

def derive_nth_day_feature(df, feature, N):
    rows = df.shape[0]
    nth_prior_measurements = [None]*N + [df[feature][i-N] for i in range(N, rows)]
    col_name = "{}_{}".format(feature, N)
    df[col_name] = nth_prior_measurements

for feature in features:
    if feature != 'date':
        for N in range(1, 4):
            derive_nth_day_feature(df, feature, N)

df.columns

Index(['meantempm', 'meandewptm', 'meanpressurem', 'maxhumidity',
       'minhumidity', 'maxtempm', 'mintempm', 'maxdewptm', 'mindewptm',
       'maxpressurem', 'minpressurem', 'precipm', 'meantempm_1', 'meantempm_2',
       'meantempm_3', 'meandewptm_1', 'meandewptm_2', 'meandewptm_3',
       'meanpressurem_1', 'meanpressurem_2', 'meanpressurem_3',
       'maxhumidity_1', 'maxhumidity_2', 'maxhumidity_3', 'minhumidity_1',
       'minhumidity_2', 'minhumidity_3', 'maxtempm_1', 'maxtempm_2',
       'maxtempm_3', 'mintempm_1', 'mintempm_2', 'mintempm_3', 'maxdewptm_1',
       'maxdewptm_2', 'maxdewptm_3', 'mindewptm_1', 'mindewptm_2',
       'mindewptm_3', 'maxpressurem_1', 'maxpressurem_2', 'maxpressurem_3',
       'minpressurem_1', 'minpressurem_2', 'minpressurem_3', 'precipm_1',
       'precipm_2', 'precipm_3'],
      dtype='object')

# make a list of original features without meantempm, mintempm, and maxtempm
to_remove = [feature
             for feature in features
             if feature not in ['meantempm', 'mintempm', 'maxtempm']]

# make a list of columns to keep
to_keep = [col for col in df.columns if col not in to_remove]

# select only the columns in to_keep and assign to df
df = df[to_keep]

df.columns

Index(['meantempm', 'maxtempm', 'mintempm', 'meantempm_1', 'meantempm_2',
       'meantempm_3', 'meandewptm_1', 'meandewptm_2', 'meandewptm_3',
       'meanpressurem_1', 'meanpressurem_2', 'meanpressurem_3',
       'maxhumidity_1', 'maxhumidity_2', 'maxhumidity_3', 'minhumidity_1',
       'minhumidity_2', 'minhumidity_3', 'maxtempm_1', 'maxtempm_2',
       'maxtempm_3', 'mintempm_1', 'mintempm_2', 'mintempm_3', 'maxdewptm_1',
       'maxdewptm_2', 'maxdewptm_3', 'mindewptm_1', 'mindewptm_2',
       'mindewptm_3', 'maxpressurem_1', 'maxpressurem_2', 'maxpressurem_3',
       'minpressurem_1', 'minpressurem_2', 'minpressurem_3', 'precipm_1',
       'precipm_2', 'precipm_3'],
      dtype='object')

df.info()

<class 'pandas.core.frame.DataFrame'>

DatetimeIndex: 1000 entries, 2015-01-01 to 2017-09-27

Data columns (total 39 columns):

meantempm 1000 non-null object

maxtempm 1000 non-null object

mintempm 1000 non-null object

meantempm_1 999 non-null object

meantempm_2 998 non-null object

meantempm_3 997 non-null object

meandewptm_1 999 non-null object

meandewptm_2 998 non-null object

meandewptm_3 997 non-null object

meanpressurem_1 999 non-null object

meanpressurem_2 998 non-null object

meanpressurem_3 997 non-null object

maxhumidity_1 999 non-null object

maxhumidity_2 998 non-null object

maxhumidity_3 997 non-null object

minhumidity_1 999 non-null object

minhumidity_2 998 non-null object

minhumidity_3 997 non-null object

maxtempm_1 999 non-null object

maxtempm_2 998 non-null object

maxtempm_3 997 non-null object

mintempm_1 999 non-null object

mintempm_2 998 non-null object

mintempm_3 997 non-null object

maxdewptm_1 999 non-null object

maxdewptm_2 998 non-null object

maxdewptm_3 997 non-null object

mindewptm_1 999 non-null object

mindewptm_2 998 non-null object

mindewptm_3 997 non-null object

maxpressurem_1 999 non-null object

maxpressurem_2 998 non-null object

maxpressurem_3 997 non-null object

minpressurem_1 999 non-null object

minpressurem_2 998 non-null object

minpressurem_3 997 non-null object

precipm_1 999 non-null object

precipm_2 998 non-null object

precipm_3 997 non-null object

dtypes: object(39)

memory usage: 312.5+ KB

df = df.apply(pd.to_numeric, errors='coerce')

df.info()

<class 'pandas.core.frame.DataFrame'>

DatetimeIndex: 1000 entries, 2015-01-01 to 2017-09-27

Data columns (total 39 columns):

meantempm 1000 non-null int64

maxtempm 1000 non-null int64

mintempm 1000 non-null int64

meantempm_1 999 non-null float64

meantempm_2 998 non-null float64

meantempm_3 997 non-null float64

meandewptm_1 999 non-null float64

meandewptm_2 998 non-null float64

meandewptm_3 997 non-null float64

meanpressurem_1 999 non-null float64

meanpressurem_2 998 non-null float64

meanpressurem_3 997 non-null float64

maxhumidity_1 999 non-null float64

maxhumidity_2 998 non-null float64

maxhumidity_3 997 non-null float64

minhumidity_1 999 non-null float64

minhumidity_2 998 non-null float64

minhumidity_3 997 non-null float64

maxtempm_1 999 non-null float64

maxtempm_2 998 non-null float64

maxtempm_3 997 non-null float64

mintempm_1 999 non-null float64

mintempm_2 998 non-null float64

mintempm_3 997 non-null float64

maxdewptm_1 999 non-null float64

maxdewptm_2 998 non-null float64

maxdewptm_3 997 non-null float64

mindewptm_1 999 non-null float64

mindewptm_2 998 non-null float64

mindewptm_3 997 non-null float64

maxpressurem_1 999 non-null float64

maxpressurem_2 998 non-null float64

maxpressurem_3 997 non-null float64

minpressurem_1 999 non-null float64

minpressurem_2 998 non-null float64

minpressurem_3 997 non-null float64

precipm_1 889 non-null float64

precipm_2 889 non-null float64

precipm_3 888 non-null float64

dtypes: float64(36), int64(3)

memory usage: 312.5 KB

# Call describe on df and transpose it due to the large number of columns

spread = df.describe().T

# precalculate interquartile range for ease of use in next calculation

IQR = spread['75%'] - spread['25%']

# create an outliers column which is either 3 IQRs below the first quartile or

# 3 IQRs above the third quartile

spread['outliers'] = (spread['min']<(spread['25%']-(3*IQR)))|(spread['max'] > (spread['75%']+3*IQR))

# just display the features containing extreme outliers

spread.loc[spread.outliers, :]  # .loc replaces the deprecated .ix indexer

%matplotlib inline

plt.rcParams['figure.figsize'] = [14, 8]

df.maxhumidity_1.hist()

plt.title('Distribution of maxhumidity_1')

plt.xlabel('maxhumidity_1')

plt.show()

df.minpressurem_1.hist()

plt.title('Distribution of minpressurem_1')

plt.xlabel('minpressurem_1')

plt.show()

# iterate over the precip columns
for precip_col in ['precipm_1', 'precipm_2', 'precipm_3']:
    # create a boolean array of values representing nans
    missing_vals = pd.isnull(df[precip_col])
    # fill missing precipitation values with 0, using .loc to avoid chained assignment
    df.loc[missing_vals, precip_col] = 0

df = df.dropna()
