MLA Lab 6:- Implementation of Decision Tree


Name: Tushar Patil
Roll No: A254
Batch: B

Theory:-
Decision trees are a popular machine learning algorithm used for both
classification and regression tasks. They operate by recursively partitioning the
input space into regions, with each partition corresponding to a decision based
on the values of input features. Here's a concise overview:

Splitting Criteria: Decision trees make decisions based on splitting criteria, such
as Gini impurity for classification or mean squared error for regression. These
criteria quantify the impurity or uncertainty in a dataset.
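For instance, the Gini impurity of a node is 1 - sum_k(p_k^2), where p_k is the proportion of class k in the node. A minimal sketch, not part of the lab code (the function name gini_impurity is illustrative):

import numpy as np

def gini_impurity(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(['acc', 'acc', 'acc']))             # 0.0 (pure node)
print(gini_impurity(['acc', 'unacc', 'acc', 'unacc']))  # 0.5 (50/50 split)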

Tree Construction: Decision trees are constructed recursively. At each step,
the algorithm selects the best feature and corresponding threshold to split the
data into two or more subsets. This process continues until a stopping criterion
is met, such as reaching a maximum depth or minimum number of samples in
a node.
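To make the selection step concrete, here is a rough sketch (illustrative only, not scikit-learn's actual implementation) that scans one numeric feature for the threshold minimizing the size-weighted Gini impurity of the two children, reusing the gini_impurity helper sketched above:

import numpy as np

def best_split(feature, y):
    # try each observed value as a threshold; keep the one whose two
    # children have the lowest size-weighted Gini impurity
    best_thr, best_score = None, float('inf')
    for thr in np.unique(feature):
        left, right = y[feature <= thr], y[feature > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * gini_impurity(left) +
                 len(right) * gini_impurity(right)) / len(y)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

feature = np.array([1, 2, 3, 4])
y = np.array(['unacc', 'unacc', 'acc', 'acc'])
print(best_split(feature, y))  # (2, 0.0): a perfect split at <= 2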

Pruning: Decision trees can suffer from overfitting, especially when they grow
too deep. Pruning techniques help to prevent overfitting by removing nodes
that do not significantly improve the tree's performance on a validation set.
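In scikit-learn this is exposed as cost-complexity pruning. A minimal sketch on a toy dataset (the iris data and the validation-score selection are illustrative, not part of this lab):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# candidate pruning strengths (ccp_alpha) computed from the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# keep the alpha whose pruned tree scores best on held-out data
best_alpha = max(path.ccp_alphas,
                 key=lambda a: DecisionTreeClassifier(ccp_alpha=a, random_state=0)
                               .fit(X_tr, y_tr).score(X_val, y_val))
print('best ccp_alpha:', best_alpha)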

Tree Interpretability: One of the main advantages of decision trees is their
interpretability. The resulting tree structure can be easily visualized and
understood, making it valuable for explaining the decision-making process to
stakeholders.
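For example, scikit-learn can dump the learned rules as plain text (a small illustration on the iris toy dataset, separate from the lab code, which uses plot_tree and Graphviz below):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# nested if/else rules that a non-specialist can read
print(export_text(clf, feature_names=list(iris.feature_names)))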
Handling Categorical Features: Decision trees can split directly on categorical
features by grouping their distinct categories, but many implementations
(including scikit-learn's) require categorical features to be encoded as
numbers first.
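A minimal sketch of one-hot encoding with pandas (this lab instead uses ordinal encoding via category_encoders below; the toy column is illustrative):

import pandas as pd

toy = pd.DataFrame({'safety': ['low', 'med', 'high', 'med']})

# one-hot encoding: each category becomes its own 0/1 indicator column
print(pd.get_dummies(toy, columns=['safety']))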

Ensemble Methods: Decision trees can be combined into ensemble methods
like Random Forests or Gradient Boosted Trees, which often result in improved
performance by aggregating the predictions of multiple trees.
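As a quick illustration of the gain, a random forest can be dropped in wherever a single tree is used (the iris toy data and cv=5 setting are illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# averaging many decorrelated trees usually beats one tree
print(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean())
print(cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                      X, y, cv=5).mean())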

Scalability: While decision trees are efficient for small to medium-sized
datasets, they may not scale well to very large datasets, since training cost
grows with both the number of samples and the number of candidate splits
evaluated at each node.

Handling Missing Values: Depending on the implementation, decision trees can
handle missing values natively (for example, by sending them down a default
branch or using surrogate splits) or may require them to be imputed before
training.
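scikit-learn's classic trees reject NaN inputs (recent versions add some native support), so a common route is imputation before fitting. A minimal sketch with an illustrative toy matrix:

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 3.0]])

# fill each missing entry with its column mean before training a tree
X_filled = SimpleImputer(strategy='mean').fit_transform(X)
print(X_filled)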

Handling Imbalanced Classes: Decision trees can be biased towards the
majority class in imbalanced datasets. Techniques such as class weights or
resampling can be employed to mitigate this issue.
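In scikit-learn, class weighting is a one-line change via class_weight; a minimal sketch on an illustrative 90/10 synthetic problem:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# 90/10 imbalanced toy problem
X, y = make_classification(weights=[0.9, 0.1], random_state=0)

# 'balanced' reweights samples inversely to class frequency, so
# minority-class mistakes cost more when evaluating splits
clf = DecisionTreeClassifier(class_weight='balanced', random_state=0).fit(X, y)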

Hyperparameter Tuning: Decision trees have hyperparameters that can be
tuned to optimize performance, such as maximum depth, minimum samples
per leaf, and maximum features considered for splitting.
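A minimal grid-search sketch over exactly these hyperparameters (the grid values and iris toy data are illustrative, not part of this lab):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# exhaustively try each combination with 5-fold cross-validation
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid={'max_depth': [2, 3, 5, None],
                                'min_samples_leaf': [1, 5, 10],
                                'max_features': [None, 'sqrt']},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)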

Code(Python):-
"""MLA_lAB6_a254ipynb

Automatically generated by Colaboratory.

Original file is located at


https://colab.research.google.com/drive/1rd0tGaEJq0VTrq4QvnCeWS-tpCzqOi9B
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import os
# Kaggle boilerplate: list any input files (harmless on Colab, where
# /kaggle/input does not exist)
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

import warnings

warnings.filterwarnings('ignore')

"""# **8. Import dataset** <a class="anchor" id="8"></a>

[Table of Contents](#0.1)
"""

data = '/content/car_evaluation.csv'

df = pd.read_csv(data, header=None)

"""# **9. Exploratory data analysis** <a class="anchor" id="9"></a>

[Table of Contents](#0.1)

Now, I will explore the data to gain insights about the data.
"""

df.shape

"""We can see that there are 1728 instances and 7 variables in the data
set."""

df.head()

col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

df.columns = col_names

col_names

# let's again preview the dataset
df.head()

"""We can see that the column names are renamed. Now, the columns have
meaningful names."""

df.info()

col_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']

for col in col_names:
    print(df[col].value_counts())

"""We can see that the `doors` and `persons` are categorical in nature. So, I
will treat them as categorical variables.

### Explore `class` variable


"""

df['class'].value_counts()

df.isnull().sum()

X = df.drop(['class'], axis=1)

y = df['class']

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

X_train.shape, X_test.shape

X_train.dtypes

X_train.head()

# install and import category encoders
!pip install category_encoders
import category_encoders as ce

encoder = ce.OrdinalEncoder(cols=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'])

X_train = encoder.fit_transform(X_train)

X_test = encoder.transform(X_test)

X_train.head()

X_test.head()

# import DecisionTreeClassifier

from sklearn.tree import DecisionTreeClassifier

# instantiate the DecisionTreeClassifier model with criterion gini index
clf_gini = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)

# fit the model
clf_gini.fit(X_train, y_train)

y_pred_gini = clf_gini.predict(X_test)

from sklearn.metrics import accuracy_score

print('Model accuracy score with criterion gini index: {0:0.4f}'
      .format(accuracy_score(y_test, y_pred_gini)))

y_pred_train_gini = clf_gini.predict(X_train)

y_pred_train_gini

print('Training-set accuracy score: {0:0.4f}'
      .format(accuracy_score(y_train, y_pred_train_gini)))

# print the scores on training and test set

print('Training set score: {:.4f}'.format(clf_gini.score(X_train, y_train)))

print('Test set score: {:.4f}'.format(clf_gini.score(X_test, y_test)))

plt.figure(figsize=(12,8))

from sklearn import tree

# the classifier is already fitted, so plot it directly
tree.plot_tree(clf_gini)

import graphviz
dot_data = tree.export_graphviz(clf_gini, out_file=None,
                                feature_names=X_train.columns,
                                class_names=clf_gini.classes_,
                                filled=True, rounded=True,
                                special_characters=True)

graph = graphviz.Source(dot_data)

graph

# instantiate the DecisionTreeClassifier model with criterion entropy
clf_en = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)

# fit the model
clf_en.fit(X_train, y_train)

y_pred_en = clf_en.predict(X_test)

from sklearn.metrics import accuracy_score

print('Model accuracy score with criterion entropy: {0:0.4f}'
      .format(accuracy_score(y_test, y_pred_en)))

y_pred_train_en = clf_en.predict(X_train)

y_pred_train_en

print('Training-set accuracy score: {0:0.4f}'
      .format(accuracy_score(y_train, y_pred_train_en)))

# print the scores on training and test set

print('Training set score: {:.4f}'.format(clf_en.score(X_train, y_train)))

print('Test set score: {:.4f}'.format(clf_en.score(X_test, y_test)))

plt.figure(figsize=(12,8))

# tree and graphviz are already imported above
tree.plot_tree(clf_en)

dot_data = tree.export_graphviz(clf_en, out_file=None,
                                feature_names=X_train.columns,
                                class_names=clf_en.classes_,
                                filled=True, rounded=True,
                                special_characters=True)

graph = graphviz.Source(dot_data)

graph

# Print the Confusion Matrix and slice it into four pieces

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred_en)

print('Confusion matrix\n\n', cm)

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred_en))

Output:-

(Output images omitted: previews of the training and test sets from
X_train.head() and X_test.head().)
