MLA Lab 6:-Implementation of Decision Tree
Theory:-
Decision trees are a popular machine learning algorithm used for both
classification and regression tasks. They operate by recursively partitioning the
input space into regions, with each partition corresponding to a decision based
on the values of input features. Here's a concise overview:
Splitting Criteria: At each node, a decision tree chooses the split that most
reduces a splitting criterion, such as Gini impurity or entropy for
classification, or mean squared error for regression. These criteria quantify
the impurity or uncertainty of the samples that reach a node.
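The two classification criteria can be computed by hand, which makes it easier to see what the tree is minimising. The following is a minimal sketch, not the scikit-learn implementation; the helper names `gini_impurity`, `entropy` and `weighted_impurity` and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_impurity(left, right, impurity=gini_impurity):
    """Impurity of a split: size-weighted average of the two child impurities."""
    n = len(left) + len(right)
    return (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)

# Hypothetical node with two samples of each class
parent = ['acc', 'acc', 'unacc', 'unacc']
print(gini_impurity(parent))                                  # 0.5
print(entropy(parent))                                        # 1.0
print(weighted_impurity(['acc', 'acc'], ['unacc', 'unacc']))  # 0.0
```

A split that sends each class to its own child drives the weighted impurity to zero, which is why the tree would prefer it over any split that leaves the children mixed.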
Pruning: Decision trees can suffer from overfitting, especially when they grow
too deep. Pruning techniques help to prevent overfitting by removing nodes
that do not significantly improve the tree's performance on a validation set.
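One way to prune in scikit-learn is cost-complexity pruning through the `ccp_alpha` parameter. The sketch below assumes synthetic data from `make_classification` (not the lab's car evaluation data) and picks the pruning strength by accuracy on a held-out validation split; it is one possible procedure, not the only one.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only (assumption, not the lab data)
X, y = make_classification(n_samples=600, n_features=10, n_informative=4, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths derived from the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_fit, y_fit)

best_alpha, best_score, best_tree = 0.0, -1.0, None
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative values from rounding
    candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_fit, y_fit)
    score = candidate.score(X_val, y_val)
    if score >= best_score:  # on ties prefer the larger alpha, i.e. the smaller tree
        best_alpha, best_score, best_tree = alpha, score, candidate

unpruned = DecisionTreeClassifier(random_state=0).fit(X_fit, y_fit)
print("unpruned leaves:", unpruned.get_n_leaves())
print("pruned leaves:  ", best_tree.get_n_leaves(), "at ccp_alpha =", best_alpha)
```

Larger values of `ccp_alpha` remove more of the tree, so scanning the candidates in ascending order and keeping the last best score favours the simplest tree among equally accurate ones.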
Handling Missing Values: Depending on the implementation, decision trees can
handle missing values either by ignoring them when candidate splits are
evaluated or by imputing them beforehand, for example with the median or the
most frequent value of the feature.
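The imputation route can be sketched with a scikit-learn pipeline that fills the gaps before the tree is fitted. The toy array below is made up for illustration, and median imputation is an assumed choice; most-frequent imputation would be used the same way for categorical codes.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy numeric features with missing entries (np.nan); values are made up for illustration
X = np.array([[1.0, 2.0], [np.nan, 1.0], [3.0, np.nan],
              [4.0, 4.0], [np.nan, 5.0], [6.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Replace each missing value with the column median, then fit the tree
model = make_pipeline(
    SimpleImputer(strategy="median"),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)

# New rows with missing values go through the same imputer before prediction
print(model.predict(np.array([[np.nan, 1.5], [5.0, np.nan]])))
```

Keeping the imputer inside the pipeline means the medians are learned from the training data only and reused unchanged at prediction time.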
Code(Python):-
"""MLA_lAB6_a254ipynb
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
# list the input files (only relevant when the notebook runs on Kaggle)
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
import warnings
warnings.filterwarnings('ignore')
data = '/content/car_evaluation.csv'
df = pd.read_csv(data, header=None)
"""Now, I will explore the data to gain insights about it."""
df.shape
"""We can see that there are 1728 instances and 7 variables in the data
set."""
df.head()
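# column names follow the attribute list of the UCI Car Evaluation data set
col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
df.columns = col_names
col_names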
"""We can see that the column names are renamed. Now, the columns have
meaningful names."""
df.info()
# frequency counts of every column
for col in df.columns:
    print(df[col].value_counts())
"""We can see that `doors` and `persons` are categorical in nature. So, I
will treat them as categorical variables."""
df['class'].value_counts()
df.isnull().sum()
X = df.drop(['class'], axis=1)
y = df['class']
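from sklearn.model_selection import train_test_split
# test_size and random_state are assumed values; the original split settings are not shown
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
X_train.shape, X_test.shape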
X_train.dtypes
X_train.head()
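import category_encoders as ce
# assumption: ordinal encoding of all feature columns with the category_encoders package
encoder = ce.OrdinalEncoder(cols=list(X_train.columns))
X_train = encoder.fit_transform(X_train)
X_test = encoder.transform(X_test)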
X_train.head()
X_test.head()
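# import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# Gini criterion; max_depth and random_state are assumed values, the original settings are not shown
clf_gini = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
clf_gini.fit(X_train, y_train)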
y_pred_gini = clf_gini.predict(X_test)
y_pred_train_gini = clf_gini.predict(X_train)
y_pred_train_gini
plt.figure(figsize=(12,8))
tree.plot_tree(clf_gini.fit(X_train, y_train))
import graphviz
# class_names must list one name per class, not the whole label column
dot_data = tree.export_graphviz(clf_gini, out_file=None,
                                feature_names=X_train.columns,
                                class_names=list(clf_gini.classes_),
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph
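# entropy criterion; max_depth and random_state are assumed values, the original settings are not shown
clf_en = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
clf_en.fit(X_train, y_train)
y_pred_en = clf_en.predict(X_test)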
y_pred_train_en = clf_en.predict(X_train)
y_pred_train_en
plt.figure(figsize=(12,8))
tree.plot_tree(clf_en.fit(X_train, y_train))
# class_names must list one name per class, not the whole label column
dot_data = tree.export_graphviz(clf_en, out_file=None,
                                feature_names=X_train.columns,
                                class_names=list(clf_en.classes_),
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph
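from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test, y_pred_en)
print('Confusion matrix\n\n', cm)
print(classification_report(y_test, y_pred_en))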
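The training-set predictions computed above are most useful when their accuracy is compared with the test-set accuracy, since a large gap would suggest overfitting. The following is a rough sketch of that check; the helper `train_test_accuracy` and the stand-in data are assumptions so that the block runs on its own, and in the lab the fitted `clf_en` or `clf_gini` with the car evaluation splits would be passed instead.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def train_test_accuracy(clf, X_train, y_train, X_test, y_test):
    """Return (train accuracy, test accuracy) for an already fitted classifier."""
    return (accuracy_score(y_train, clf.predict(X_train)),
            accuracy_score(y_test, clf.predict(X_test)))

# Stand-in data (assumption) so that the sketch is self-contained
X_demo, y_demo = make_classification(n_samples=500, n_features=6,
                                     n_informative=3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.33, random_state=42)
clf_demo = DecisionTreeClassifier(criterion='entropy', max_depth=3,
                                  random_state=0).fit(Xtr, ytr)

train_acc, test_acc = train_test_accuracy(clf_demo, Xtr, ytr, Xte, yte)
print(f'Training-set accuracy: {train_acc:.4f}')
print(f'Test-set accuracy:     {test_acc:.4f}')
```

If the two numbers are close, the depth limit is probably keeping the tree from memorising the training data; a much higher training accuracy would be a reason to prune or reduce the depth.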
Output:-
Previewing dataset:-
X_train.head():-
X_test.head():-