Data Mining Journal 4 Kashan


Bahria University, Karachi Campus

LAB EXPERIMENT NO. 4

LIST OF TASKS
TASK NO.  OBJECTIVE

1  Using Python, implement the Decision Tree algorithm on the Diabetes dataset to predict the chances of diabetes in a person. Visualize the results of the model in the form of a confusion matrix using matplotlib and seaborn.

2  Using KNIME, implement Task # 01.

3  Using Python, perform parameter tuning to optimize the Decision Tree performance and compare the results with Task # 1.

Date: ___________

Kashan Riaz 02-131212-075 Data mining Journal


Task No. 1: Diabetes dataset decision tree in Python
Solution:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Load the dataset
data = pd.read_csv("diabetes.csv")

# Display the first few rows of the dataset
print(data.head())

# Splitting the data into features (X) and target variable (y)
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Splitting the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initializing the Decision Tree classifier
clf = DecisionTreeClassifier(criterion='entropy', splitter='best')

# Training the classifier
clf.fit(X_train, y_train)

# Predicting on the test set
y_pred = clf.predict(X_test)

# Calculating accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Creating a confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plotting the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", cbar=False)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.title('Confusion Matrix')
plt.show()
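
As an optional extension (not part of the original task), a classification report can complement the accuracy and confusion matrix with per-class precision and recall. The snippet below is a minimal sketch that assumes y_test and y_pred from the code above are still in scope, and that Outcome values 0/1 correspond to "no diabetes"/"diabetes".

# Optional: per-class precision and recall to complement the confusion matrix.
# Assumes y_test and y_pred from the code above, and that Outcome 0 = no
# diabetes and 1 = diabetes.
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, target_names=["No diabetes", "Diabetes"]))
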
Task No. 2: KNIME workflow
Solution:
 Preprocessing and decision trees.
 Data cleaning, data transformation, etc.
 The correlation matrix shows which columns are most strongly correlated.
 Decision trees can be built in KNIME.
 The following workflow is used for decision trees (a rough Python equivalent is sketched after this list).


 Assign different colors to the class values using the Color Manager node.


 Keep the partitioning at 80 percent (80% train, 20% test).

 The Scorer node view shows the accuracy.


 Decision Tree Learner view.
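
For illustration only, the following is a rough Python sketch of what the KNIME workflow above does (correlation view, 80/20 partitioning, Decision Tree Learner/Predictor, Scorer); it is not part of the KNIME task and assumes the same diabetes.csv file used in Task 1.

# Illustration only: rough Python equivalent of the KNIME workflow above.
# Assumes the same "diabetes.csv" file used in Task 1.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("diabetes.csv")

# Correlation matrix: shows which columns are most correlated
plt.figure(figsize=(8, 6))
sns.heatmap(data.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()

# Partitioning: 80% train, 20% test
X = data.drop('Outcome', axis=1)
y = data['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Decision Tree Learner + Predictor
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Scorer: accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
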





Task No. 3: Optimize the decision tree
Solution:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Load the dataset
data = pd.read_csv("diabetes.csv")

# Splitting the data into features (X) and target variable (y)
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Splitting the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the parameter grid to search
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 5, 10, 15, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize the Decision Tree classifier
clf = DecisionTreeClassifier(random_state=42)

# Initialize GridSearchCV
grid_search = GridSearchCV(clf, param_grid, cv=5)

# Perform GridSearchCV
grid_search.fit(X_train, y_train)

# Get the best parameters
best_params = grid_search.best_params_
print("Best Parameters:", best_params)

# Predicting on the test set using the best model
best_clf = grid_search.best_estimator_
y_pred_tuned = best_clf.predict(X_test)



# Calculating accuracy
accuracy_tuned = accuracy_score(y_test, y_pred_tuned)
print("Accuracy (Tuned Model):", accuracy_tuned)

# Creating a confusion matrix for the tuned model
cm_tuned = confusion_matrix(y_test, y_pred_tuned)

# Plotting the confusion matrix for the tuned model
plt.figure(figsize=(8, 6))
sns.heatmap(cm_tuned, annot=True, fmt="d", cmap="Blues", cbar=False)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.title('Confusion Matrix (Tuned Model)')
plt.show()

The accuracy increased from about 75% to 78% after hyperparameter tuning.
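
To make the comparison with Task 1 explicit, one possible addition (assuming the variables from the Task 3 code above are still in scope) is to retrain a default tree on the same split and print both accuracies together; note that the exact numbers can vary slightly between runs because the Task 1 tree does not fix random_state.

# Optional comparison sketch: baseline (Task 1 settings) vs. tuned model.
# Assumes X_train, X_test, y_train, y_test, accuracy_tuned and grid_search
# from the code above are still defined.
baseline_clf = DecisionTreeClassifier(criterion='entropy', splitter='best')
baseline_clf.fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline_clf.predict(X_test))

print("Accuracy (Baseline Model):", baseline_accuracy)
print("Accuracy (Tuned Model):", accuracy_tuned)
print("Best cross-validation score during tuning:", grid_search.best_score_)
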

