Email Classification: Roll No-41463 (LP-3)
Email Classification
Classify emails using a binary classification method. Email spam detection has two
states: a) Normal State (Not Spam) and b) Abnormal State (Spam). Use K-Nearest Neighbors
and Support Vector Machine for classification, and analyze their performance.
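The training cells for the two classifiers are not shown in this export, so the following is only a minimal sketch of how K-Nearest Neighbors and a Support Vector Machine could be fitted and compared with scikit-learn. It runs on synthetic data from `make_classification` as a stand-in for the email word counts. The hyperparameters (`n_neighbors=5`, `kernel="rbf"`, `C=1.0`), the 80/20 split, and the helper `fit_and_score` are assumptions for illustration, not the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the word-count matrix (assumption: not the real emails.csv).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Hold out 20% of the rows for evaluation (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def fit_and_score(model):
    """Hypothetical helper: fit on the training split, return test accuracy."""
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


knn_acc = fit_and_score(KNeighborsClassifier(n_neighbors=5))
svm_acc = fit_and_score(SVC(kernel="rbf", C=1.0))

print("KNN accuracy:", knn_acc)
print("SVM accuracy:", svm_acc)
```

Using the same train/test split for both models keeps the comparison fair, since any difference in accuracy then comes from the classifier rather than from the sampled rows.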
In [2]: import pandas as pd  # needed if not already imported in an earlier cell
        df = pd.read_csv("emails.csv")
        df.head()
Out[2]:
   Email No.  the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay  infrastructu
0  Email 1      0   0    1    0    0   0    2    0    0  ...         0    0       0    0
1  Email 2      8  13   24    6    6   2  102    1   27  ...         0    0       0    0
2  Email 3      0   0    1    0    0   0    8    0    0  ...         0    0       0    0
3  Email 4      0   5   22    0    5   1   51    2   10  ...         0    0       0    0
4  Email 5      7   6   17    1    5   2   57    0    9  ...         0    0       0    0
Out[3]:
      Email No.   the  to  ect  and  for  of    a  you  hou  ...  connevey  jay  valued  lay  infrastru
5167  Email 5168    2   2    2    3    0   0   32    0    0  ...         0    0       0    0
5168  Email 5169   35  27   11    2    6   5  151    4    3  ...         0    0       0    0
5169  Email 5170    0   0    1    1    0   0   11    0    0  ...         0    0       0    0
5170  Email 5171    2   7    1    0    2   1   28    2    0  ...         0    0       0    0
5171  Email 5172   22  24    5    1    6   5  148    8    2  ...         0    0       0    0
In [4]: df.info()
<class 'pandas.core.frame.DataFrame'>
In [5]: df.describe()
Out[5]:
the to ect and for of
the 0
to 0
ect 0
and 0
for 0
of 0
a 0
you 0
hou 0
in 0
on 0
is 0
this 0
enron 0
i 0
be 0
that 0
will 0
have 0
with 0
your 0
at 0
we 0
s 0
are 0
it 0
by 0
com 0
as 0
..
decisions 0
produced 0
ended 0
greatest 0
degree 0
solmonson 0
imbalances 0
fall 0
fear 0
hate 0
fight 0
reallocated 0
debt 0
reform 0
australia 0
plain 0
prompt 0
remains 0
ifhsc 0
enhancements 0
connevey 0
jay 0
valued 0
lay 0
infrastructure 0
military 0
allowing 0
ff 0
dry 0
Prediction 0
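In [7]: # Features: word-count columns at positions 1..3000 (position 0 is "Email No.")
        x = df.iloc[:, 1:3001]
        # Labels: the last column ("Prediction"), as a NumPy array
        y = df.iloc[:, -1].values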
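The slice above relies on positional indexing: `iloc[:, 1:3001]` keeps columns at positions 1 through 3000 and drops position 0, while `iloc[:, -1]` takes the last column. The sketch below reproduces that pattern on a tiny frame with the same layout (ID column, word counts, label). The frame `toy` and its `Prediction` values are made up for illustration; only the three word-count columns echo the first rows shown earlier.

```python
import pandas as pd

# Tiny hypothetical frame: ID column, three word-count columns, then the label.
toy = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "the": [0, 8, 0],
    "to": [0, 13, 0],
    "ect": [1, 24, 1],
    "Prediction": [0, 0, 1],  # made-up labels, for illustration only
})

# Everything except the ID column and the label column is a feature.
n_features = toy.shape[1] - 2

x_toy = toy.iloc[:, 1:1 + n_features]  # same pattern as df.iloc[:, 1:3001]
y_toy = toy.iloc[:, -1].values         # last column as a NumPy array

print(list(x_toy.columns))
print(y_toy)
```

Deriving the upper bound from `shape[1]` instead of hard-coding it is one way to avoid accidentally including the label column if the number of word columns changes.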
Analyzing Performance
MSE: 0.12560386473429952
MAE: 0.12560386473429952
RMSE: 0.3544063553807966
R2 Score: 0.40780091899790494
Analyzing Performance
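In [12]: import numpy as np
         from sklearn import metrics
         from sklearn.metrics import mean_squared_error, mean_absolute_error, accuracy_score
         # Error metrics and accuracy on the held-out test set
         print("MSE: ", mean_squared_error(y_test, y_pred))
         print("MAE: ", mean_absolute_error(y_test, y_pred))
         print("RMSE: ", np.sqrt(mean_squared_error(y_test, y_pred)))
         print("R2 Score: ", metrics.r2_score(y_test, y_pred))
         print("Accuracy Score for KNN: ", accuracy_score(y_test, y_pred))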
MSE: 0.07149758454106281
MAE: 0.07149758454106281
RMSE: 0.2673903224521464
R2 Score: 0.6629020615834228
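In both result blocks the MSE and MAE are identical. That is expected if the labels and predictions are both 0/1 (an assumption about the `Prediction` column): every error is then 0 or ±1, so squaring and taking the absolute value give the same number, and both equal the misclassification rate. Under that assumption accuracy is simply 1 minus the MSE, which would put the two runs at roughly 0.874 and 0.929. The sketch below checks the identity on made-up arrays using NumPy only.

```python
import numpy as np

# Made-up 0/1 labels and predictions, for illustration only.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_hat = np.array([0, 1, 0, 0, 1, 1, 0, 1])

errors = y_true - y_hat              # each entry is -1, 0, or 1
mse = np.mean(errors ** 2)           # mean squared error
mae = np.mean(np.abs(errors))        # mean absolute error
rmse = np.sqrt(mse)                  # root mean squared error
accuracy = np.mean(y_true == y_hat)  # fraction of correct predictions

print("MSE:", mse)
print("MAE:", mae)
print("RMSE:", rmse)
print("Accuracy:", accuracy)
```

For a classifier, accuracy (or a confusion matrix) is usually the more direct summary; MSE, MAE, and RMSE carry no extra information here beyond the error rate.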