Python Programming
1. Installation of Python -:
a. Install Python IDLE.
b. Add to Path variable.
c. Go to CMD.
d. python --version.
e. pip --version.
f. pip install numpy.
g. Check if installed -pip show numpy.
h. pip install pandas.
i. pip show pandas.
Output -:
Enter a number: 54
54 is not a prime number.
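The listing that produced the output above is not included here. A minimal sketch of a prime check that yields a message of the same form, assuming trial division up to the square root (the names `is_prime` and `describe` are illustrative, not taken from the original program):

```python
def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def describe(n):
    """Build the message reported for a given number."""
    if is_prime(n):
        return f"{n} is a prime number."
    return f"{n} is not a prime number."


# The interactive version would read the number with input(); 54 is used here
# so that the sketch runs without a prompt.
print(describe(54))
```

Stopping at the square root is enough because any factor larger than it must pair with a factor smaller than it.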
Output -:
Enter the first string: Hello
Enter the second string: World
Concatenated String: Hello World
ASSIGNMENT -02
1. Write a program to add two matrices manually.
Source Code -:
def get_matrix(size):
    matrix = []
    print("Enter the elements row-wise:")
    for i in range(size):
        row = []
        for j in range(size):
            row.append(int(input(f"Element [{i}][{j}]: ")))
        matrix.append(row)
    return matrix

def add_matrices(matrix1, matrix2):
    size = len(matrix1)
    result = [[0 for _ in range(size)] for _ in range(size)]
    for i in range(size):
        for j in range(size):
            result[i][j] = matrix1[i][j] + matrix2[i][j]
    return result

def print_matrix(matrix):
    for row in matrix:
        print(row)

size = int(input("Enter the size of the square matrices: "))
print("Enter elements for the first matrix:")
matrix1 = get_matrix(size)
print("Enter elements for the second matrix:")
matrix2 = get_matrix(size)
result = add_matrices(matrix1, matrix2)
print("The sum of the matrices is:")
print_matrix(result)
Output -:
Enter the size of the square matrices: 2
Enter elements for the first matrix:
Enter the elements row-wise:
Element [0][0]: 1
Element [0][1]: 2
Element [1][0]: 3
Element [1][1]: 4
Enter elements for the second matrix:
Enter the elements row-wise:
Element [0][0]: 5
Element [0][1]: 6
Element [1][0]: 7
Element [1][1]: 8
The sum of the matrices is:
[6, 8]
[10, 12]
Output -:
Enter the number of rows for the first matrix: 2
Enter the number of columns for the first matrix: 2
Enter the elements of a 2x2 matrix row by row:
1 2
3 4
Enter the number of rows for the second matrix: 2
Enter the number of columns for the second matrix: 2
Enter the elements of a 2x2 matrix row by row:
4 5
6 7
Resultant Matrix:
[16, 19]
[36, 43]
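The source listing for this output is not included. A minimal sketch of manual matrix multiplication that reproduces the resultant matrix, taking the two 2x2 inputs as [[1, 2], [3, 4]] and [[4, 5], [6, 7]] (the function name `multiply_matrices` is an illustrative choice, and the matrices are hard-coded instead of read with input()):

```python
def multiply_matrices(a, b):
    """Multiply an m x n matrix by an n x p matrix, both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Entry (i, j) is the dot product of row i of a and column j of b.
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]
    return result


first = [[1, 2], [3, 4]]
second = [[4, 5], [6, 7]]
product = multiply_matrices(first, second)
print("Resultant Matrix:")
for row in product:
    print(row)
```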
Output -:
[9, 16, 21]
[24, 25, 24]
[21, 16, 9]
ASSIGNMENT -03
1. Program -: Checking the installed versions of Python libraries -:
a. sys b. scipy c. numpy d. matplotlib e. pandas f. sklearn
a. sys
Source code -:
import sys
print('Python:{}'.format(sys.version))
Output -:
Python:3.12.6
b. scipy
Source code -:
import scipy
print('Scipy:{}'.format(scipy.__version__))
Output -:
Scipy:1.13.1
c. numpy
Source code -:
import numpy
print('Numpy:{}'.format(numpy.__version__))
Output -:
Numpy:1.26.4
d. matplotlib
Source code -:
import matplotlib
print('Matplotlib:{}'.format(matplotlib.__version__))
Output -:
Matplotlib:3.9.1
e. pandas
Source Code -:
import pandas
print('Pandas:{}'.format(pandas.__version__))
Output -:
Pandas:2.2.2
f. sklearn
Source Code -:
import sklearn
print('Sklearn:{}'.format(sklearn.__version__))
Output -:
Sklearn:1.5.1
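The six checks above repeat one pattern, so they can be folded into a single loop. A minimal sketch, assuming each third-party library exposes a `__version__` attribute (`sys` does not, so its version is read from `sys.version`); the helper name `library_version` is hypothetical:

```python
import importlib
import sys


def library_version(name):
    """Return the version string of an importable module, or None if it is missing."""
    if name == "sys":
        return sys.version.split()[0]
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


for name in ["sys", "scipy", "numpy", "matplotlib", "pandas", "sklearn"]:
    version = library_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

A library that is not installed is reported instead of stopping the script with an ImportError.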
Output -:
2D Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Output -:
Matrix 1:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Matrix 2:
[[9 8 7]
 [6 5 4]
 [3 2 1]]
Output -:
Matrix A:
[[1 2]
 [3 4]]
Matrix B:
[[5 6]
 [7 8]]
Matrix A * Matrix B:
[[19 22]
 [43 50]]
Output -:
Numpy array:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
COO representation:
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
(3, 3) 1.0
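COO (coordinate) format keeps only the non-zero entries of a matrix, each stored with its row and column index. A pure-Python sketch of the idea, not the scipy.sparse implementation (the helper name `to_coo` is hypothetical):

```python
def to_coo(dense):
    """Return (row, col, value) triples for the non-zero entries of a dense matrix."""
    return [
        (i, j, value)
        for i, row in enumerate(dense)
        for j, value in enumerate(row)
        if value != 0
    ]


# 4x4 identity matrix, matching the array shown in the output above.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
for i, j, value in to_coo(identity):
    print(f"({i}, {j})\t{value}")
```

For the identity matrix only the four diagonal entries are stored instead of all sixteen values.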
ASSIGNMENT -04
1. Write a program to implement a KNN model and to plot the first and second features of the
iris dataset.
Source Code -:
import numpy as np
import pandas as pd
import mglearn
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
#generate dataset
X,y=mglearn.datasets.make_forge()
#plot dataset
mglearn.discrete_scatter(X[:,0], X[:, 1], y)
plt.legend(["class 0", "class 1"], loc=4)
plt.xlabel("first feature:")
plt.ylabel("2nd feature:")
print(X,y)
print("X.shape:{}".format(X.shape))
plt.show()
Output -:
[[ 9.96346605 4.59676542]
[11.0329545 -0.16816717]
[11.54155807 5.21116083]
[ 8.69289001 1.54322016]
[ 8.1062269 4.28695977]
[ 8.30988863 4.80623966]
[11.93027136 4.64866327]
[ 9.67284681 -0.20283165]
[ 8.34810316 5.13415623]
[ 8.67494727 4.47573059]
[ 9.17748385 5.09283177]
[10.24028948 2.45544401]
[ 8.68937095 1.48709629]
[ 8.92229526 -0.63993225]
[ 9.49123469 4.33224792]
[ 9.25694192 5.13284858]
[ 7.99815287 4.8525051 ]
[ 8.18378052 1.29564214]
[ 8.7337095 2.49162431]
[ 9.32298256 5.09840649]
[10.06393839 0.99078055]
[ 9.50048972 -0.26430318]
[ 8.34468785 1.63824349]
[ 9.50169345 1.93824624]
[ 9.15072323 5.49832246]
[11.563957 1.3389402 ]] [1 0 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 0 0 1 0 0 0 0 1 0]
X.shape:(26, 2)
2. Write a program to characterise the dataset, including its keys, shape, classes and
features.
Source Code -:
import mglearn
import matplotlib.pyplot as plt
X,y=mglearn.datasets.make_wave(n_samples=40)
plt.plot(X, y, 'o')
plt.ylim(-3, 3)
plt.xlabel("Feature")
plt.ylabel("Target")
plt.show()
Output -:
3. Write a program to characterise the breast cancer dataset, including its keys, shape,
classes and features.
Source Code -:
import numpy as np
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
print("cancer.keys(): \n{}".format(cancer.keys()))
print("Shape of cancer data: {}".format(cancer.data.shape))
print("Sample count per class: \n{}".format(
    {n: v for n, v in zip(cancer.target_names, np.bincount(cancer.target))}))
print("Feature name: \n{}".format(cancer.feature_names))
Output -:
cancer.keys():
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename',
'data_module'])
Shape of cancer data: (569, 30)
Sample count per class:
{'malignant': 212, 'benign': 357}
Feature name:
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
'mean smoothness' 'mean compactness' 'mean concavity'
'mean concave points' 'mean symmetry' 'mean fractal dimension'
'radius error' 'texture error' 'perimeter error' 'area error'
'smoothness error' 'compactness error' 'concavity error'
'concave points error' 'symmetry error' 'fractal dimension error'
'worst radius' 'worst texture' 'worst perimeter' 'worst area'
'worst smoothness' 'worst compactness' 'worst concavity'
'worst concave points' 'worst symmetry' 'worst fractal dimension']
4. Write a program to show how KNN classification works with different numbers of
neighbours.
Source Code -:
import mglearn
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt
housing = fetch_california_housing()
print("Data shape: {}".format(housing.data.shape))
X,y = mglearn.datasets.load_extended_boston()
print("X.shape:{}".format(X.shape))
#mglearn.plots.plot_knn_classification(n_neighbors=1)
mglearn.plots.plot_knn_classification(n_neighbors=3)
plt.show()
Output -:
Data shape: (20640, 8)
X.shape:(506, 104)
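The effect of `n_neighbors` can also be seen without any plotting. A from-scratch sketch of KNN classification by majority vote, using hypothetical toy points (not the mglearn data) and an illustrative function name `knn_predict`:

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one class-1 point sits on the edge of the class-0 group.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.2, 1.6), (6.0, 5.0), (7.0, 6.0)]
labels = [0, 0, 0, 1, 1, 1]
query = (2.3, 1.5)

for k in (1, 3):
    print(f"n_neighbors={k}: predicted class {knn_predict(points, labels, query, k)}")
```

With one neighbour the query takes the label of the single closest point (class 1); with three neighbours the two nearby class-0 points outvote it.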
5. Write a program to plot training accuracy, test accuracy and regression on the breast
cancer dataset.
Source Code -:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import mglearn
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=66)
training_accuracy = []
test_accuracy = []
neighbors_settings = range(1, 11)
for n_neighbors in neighbors_settings:
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    training_accuracy.append(clf.score(X_train, y_train))
    test_accuracy.append(clf.score(X_test, y_test))
plt.plot(neighbors_settings, training_accuracy, label="Training accuracy")
plt.plot(neighbors_settings, test_accuracy, label="Test accuracy")
plt.ylabel("Accuracy")
plt.xlabel("n_neighbors")
plt.legend()
mglearn.plots.plot_knn_regression(n_neighbors=1)
mglearn.plots.plot_knn_regression(n_neighbors=3)
plt.show()
Output -:
6. Write a program to implement KNN Regressor and also plot it.
Source Code -:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import mglearn
X, y = mglearn.datasets.make_wave(n_samples=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
plt.plot(X_train, y_train, 'o')
plt.plot(X_test, y_test, '+')
plt.ylim(-3, 3)
plt.xlabel("Feature")
plt.ylabel("Target")
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_train)
print("Test Set Predictions:\n", reg.predict(X_test))
print("Test Set R^2: {:.2f}".format(reg.score(X_test, y_test)))
plt.show()
Output -:
Test Set Predictions:
[-0.05396539 0.35686046 1.13671923 -1.89415682 -1.13881398 -1.63113382
0.35686046 0.91241374 -0.44680446 -1.13881398]
Test Set R^2: 0.83
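KNN regression predicts the mean target of the k nearest training points. A from-scratch sketch for a single feature, using hypothetical values instead of the make_wave samples (the function name `knn_regress` is illustrative):

```python
def knn_regress(train_x, train_y, query, k):
    """Predict a target as the mean target of the k nearest one-dimensional points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical one-feature training data.
train_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
train_y = [-1.5, -0.8, 0.2, 0.9, 1.4]

print(knn_regress(train_x, train_y, query=0.4, k=1))
print(knn_regress(train_x, train_y, query=0.4, k=3))
```

With k=1 the prediction copies the target of the closest point; with k=3 it is smoothed by averaging over the three closest points.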
7. Write a program to implement a KNN Regressor, plot it, and report its train and test
scores.
Source Code -:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import mglearn
X, y = mglearn.datasets.make_wave(n_samples=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
plt.plot(X_train, y_train, 'o')
plt.plot(X_test, y_test, '+')
plt.ylim(-3, 3)
plt.xlabel("Feature")
plt.ylabel("Target")
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_train)
print("Test Set Predictions:\n", reg.predict(X_test))
print("Test Set R^2: {:.2f}".format(reg.score(X_test, y_test)))
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
line = np.linspace(-3, 3, 1000).reshape(-1, 1)
for n_neighbors, ax in zip([1, 3, 9], axes):
    reg = KNeighborsRegressor(n_neighbors=n_neighbors)
    reg.fit(X_train, y_train)
    ax.plot(line, reg.predict(line))
    ax.plot(X_train, y_train, 'o', c=mglearn.cm2(0), markersize=8)
    ax.plot(X_test, y_test, '+', c=mglearn.cm2(1), markersize=8)
    ax.set_title(
        f"n_neighbors = {n_neighbors}\n"
        f"Train score: {reg.score(X_train, y_train):.2f}\n"
        f"Test score: {reg.score(X_test, y_test):.2f}")
    ax.set_ylabel("Target")
axes[0].legend(["Model prediction", "Training data/target", "Test data/target"], loc="best")
plt.show()
Output -:
Test Set Predictions:
[-0.05396539 0.35686046 1.13671923 -1.89415682 -1.13881398 -1.63113382
0.35686046 0.91241374 -0.44680446 -1.13881398]
Test Set R^2: 0.83
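The train and test scores reported by `reg.score` are R^2 values (coefficient of determination). A minimal sketch of that formula on hypothetical numbers (the function name `r_squared` is illustrative and is not the scikit-learn function):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and three sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)                 # predictions equal the targets
constant = r_squared(y_true, [2.5] * 4)             # always predicting the mean
partial = r_squared(y_true, [1.5, 1.5, 3.5, 3.5])   # predictions off by 0.5 each
print(perfect, constant, partial)
```

A score of 1 means the predictions match exactly, and a score of 0 means the model does no better than always predicting the mean target.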
ASSIGNMENT -05
1. Write a program using Gaussian NB to find the accuracy score, confusion matrix, actual value,
predicted value and F1 score.
Source Code -:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             ConfusionMatrixDisplay, f1_score)
import matplotlib.pyplot as plt
X, y = make_classification(
    n_features=6,
    n_classes=3,
    n_samples=800,
    n_informative=2,
    random_state=1,
    n_clusters_per_class=1
)
plt.scatter(X[:, 0], X[:, 1], c=y, marker="*")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=125
)
model = GaussianNB()
model.fit(X_train, y_train)
predicted = model.predict([X_test[6]])
print("Actual Value:", y_test[6])
print("Predicted Value:", predicted[0])
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_pred, y_test)
f1 = f1_score(y_pred, y_test, average="weighted")
print("Accuracy:", accuracy)
print("F1 Score:", f1)
labels = [0, 1, 2]
cm = confusion_matrix(y_test, y_pred, labels=labels)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot()
plt.show()
Output -:
Actual Value: 0
Predicted Value: 0
Accuracy: 0.8484848484848485
F1 Score: 0.8491119695890328
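To make the reported metrics concrete, the sketch below computes a confusion matrix, accuracy and weighted F1 by hand on a small hypothetical set of labels (not the GaussianNB predictions). Rows are actual classes and columns are predicted classes, the same layout scikit-learn uses for `confusion_matrix(y_true, y_pred)`. The weighted F1 is weighted by the number of true samples per class, so unlike accuracy it is sensitive to which list is passed as the true labels. All function names here are illustrative.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count samples per (actual, predicted) pair; rows are actual classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy_from(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def weighted_f1_from(matrix):
    """Per-class F1 averaged with weights equal to each class's true sample count."""
    total = sum(sum(row) for row in matrix)
    score = 0.0
    for c in range(len(matrix)):
        tp = matrix[c][c]
        support = sum(matrix[c])                       # samples whose actual class is c
        predicted = sum(row[c] for row in matrix)      # samples predicted as c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * support / total
    return score


# Hypothetical labels for six samples and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_counts(y_true, y_pred, labels=[0, 1, 2])
print(cm)
print("Accuracy:", accuracy_from(cm))
print("F1 Score:", weighted_f1_from(cm))
```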
ASSIGNMENT -06
1. Write a program to implement SVM on a dataset and also plot it.
Source Code -:
from sklearn.datasets import make_blobs
import mglearn
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVC
import numpy as np
X, y = make_blobs(centers=4, random_state=8)
y=y%2
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
linear_svm = LinearSVC().fit(X, y)
mglearn.plots.plot_2d_separator(linear_svm, X)
X_new = np.hstack([X, X[:,1:]**2])
figure = plt.figure()
ax = figure.add_subplot(projection='3d', elev=-152, azim=-26)
mask = y ==0
ax.scatter(X_new[mask, 0], X_new[mask, 1], X_new[mask, 2], c='b',label='Class 0' , s=60)
ax.scatter(X_new[~mask, 0], X_new[~mask, 1], X_new[~mask, 2], c='r',
marker='^',label='Class 1' , s=60)
ax.set_xlabel("Feature 0")
ax.set_ylabel("Feature 1")
ax.set_zlabel("Feature 1 **2")
ax.legend()
plt.show()
Output -:
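The listing above appends the square of the second feature as a third column before plotting in 3D. A small sketch of why that helps, using hypothetical points in which class 1 lies on both sides of class 0 along feature 1 (the helper names `expand` and `separable_by_threshold` are illustrative):

```python
def expand(points):
    """Append the square of the second feature, mirroring np.hstack([X, X[:, 1:] ** 2])."""
    return [(x0, x1, x1 ** 2) for x0, x1 in points]


def separable_by_threshold(values, labels):
    """Check whether one threshold on a single feature splits the two classes."""
    class0 = [v for v, label in zip(values, labels) if label == 0]
    class1 = [v for v, label in zip(values, labels) if label == 1]
    return max(class0) < min(class1) or max(class1) < min(class0)


# Hypothetical points, not the make_blobs data.
points = [(0.0, -3.0), (1.0, -2.5), (0.5, -0.5), (1.5, 0.4), (0.2, 2.6), (1.1, 3.1)]
labels = [1, 1, 0, 0, 1, 1]
expanded = expand(points)

on_feature_1 = separable_by_threshold([p[1] for p in points], labels)
on_squared = separable_by_threshold([p[2] for p in expanded], labels)
print("Separable on feature 1:", on_feature_1)
print("Separable on feature 1 ** 2:", on_squared)
```

No single cut on feature 1 separates these classes, but a single cut on its square does, which is what lets a linear model work in the expanded space.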
ASSIGNMENT -07
1. Write a program to implement KNN Classifier by taking any dataset.
Source Code -:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
url = "https://raw.githubusercontent.com/plotly/datasets/master/timeseries.csv"
dataset = pd.read_csv(url)
print(dataset.head())
X = dataset.drop(columns=['Date', 'G'])
Y = dataset['G']
Y_binned = pd.cut(Y, bins=3, labels=[0, 1, 2])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y_binned, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, Y_train)
X_new = pd.DataFrame([[5, 2, 3, 1, 4, 6]], columns=X.columns)
prediction = knn.predict(X_new)
print(f"Prediction: {prediction}")
y_pred = knn.predict(X_test)
print(f"Test set predictions: \n{y_pred}")
print(f"Test set score: {knn.score(X_test, Y_test):.2f}")
Output -:
Date A B C D E F G
0 2008-03-18 24.68 164.93 114.73 26.27 19.21 28.87 63.44
1 2008-03-19 24.18 164.89 114.75 26.22 19.07 27.76 59.98
2 2008-03-20 23.99 164.63 115.04 25.78 19.01 27.04 59.61
3 2008-03-25 24.14 163.92 114.85 27.41 19.61 27.84 59.41
4 2008-03-26 24.44 163.45 114.84 26.86 19.53 28.02 60.09
Prediction: [1]
Test set predictions:
[1 0 1]
Test set score: 1.00
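The listing turns the continuous column G into three classes with `pd.cut(Y, bins=3, labels=[0, 1, 2])`. A simplified pure-Python analogue of equal-width, right-closed binning is sketched below. It is an approximation of the idea, not the pandas implementation, and it assumes the values are not all equal. Applied to the five printed rows only, the labels will differ from those computed on the full column, because the bin edges depend on the overall minimum and maximum.

```python
import math


def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width, right-closed intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    labels = []
    for value in values:
        index = math.ceil((value - low) / width) - 1
        labels.append(min(max(index, 0), bins - 1))  # clamp the two end points
    return labels


# First five values of column G from the printed head of the dataset.
g_values = [63.44, 59.98, 59.61, 59.41, 60.09]
print(equal_width_bins(g_values, bins=3))
```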
ASSIGNMENT -08
1. Write a program to implement SVM on a dataset and also plot it in 3D with the hyperplane.
Source Code -:
from sklearn.datasets import make_blobs
import mglearn
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVC
import numpy as np
X, y = make_blobs(centers=4, random_state=8)
y=y%2
mglearn.discrete_scatter(X[:, 0], X[:, 1], y)
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
linear_svm = LinearSVC().fit(X, y)
mglearn.plots.plot_2d_separator(linear_svm, X)
X_new = np.hstack([X, X[:,1:]**2])
figure = plt.figure()
ax = figure.add_subplot(projection='3d', elev=-152, azim=-26)
mask = y ==0
ax.scatter(X_new[mask, 0], X_new[mask, 1], X_new[mask, 2], c='b',label='Class 0' , s=60)
ax.scatter(X_new[~mask, 0], X_new[~mask, 1], X_new[~mask, 2], c='r',
marker='^',label='Class 1' , s=60)
print(mask)
print(~mask)
ax.set_xlabel("Feature 0")
ax.set_ylabel("Feature 1")
ax.set_zlabel("Feature 1 **2")
linear_svm_3d = LinearSVC().fit(X_new, y)
coef, intercept = linear_svm_3d.coef_.ravel(), linear_svm_3d.intercept_
figure = plt.figure()
ax = figure.add_subplot(projection='3d', elev=-152, azim=-26)
xx = np.linspace(X_new[:, 0].min()-2, X_new[:, 0].max() +2, 50)
yy = np.linspace(X_new[:, 1].min()-2, X_new[:, 1].max() +2, 50)
XX, YY = np.meshgrid(xx, yy)
ZZ = (coef[0] * XX + coef[1] * YY + intercept) / -coef[2]
ax.plot_surface(XX, YY, ZZ, rstride=8, cstride=8, alpha=0.3)
ax.scatter(X_new[mask, 0], X_new[mask, 1], X_new[mask, 2], c='b',label='Class 0' , s=60)
ax.scatter(X_new[~mask, 0], X_new[~mask, 1], X_new[~mask, 2], c='r',
marker='^',label='Class 1' , s=60)
ax.set_xlabel("Feature 0")
ax.set_ylabel("Feature 1")
ax.set_zlabel("Feature 1 **2")
ax.legend()
plt.show()
Output -:
[False True False False False True True False False False True True
False False False True True False False True True False True True
False True True False True False False False False False True True
False True False True False False True True True True False True
False False False True True True False True True True True False
False True False True True False False False True True False True
True False True True False True False False True True False True
False True True True True False False True False True True False
False False False False]
[ True False True True True False False True True True False False
True True True False False True True False False True False False
True False False True False True True True True True False False
True False True False True True False False False False True False
True True True False False False True False False False False True
True False True False False True True True False False True False
False True False False True False True True False False True False
True False False False False True True False True False False True
True True True True]
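The surface drawn in the listing is the set of points where the linear model's score is zero, solved for the third coordinate as in the `ZZ = ...` line. A sketch of that relationship with hypothetical coefficients (not the values fitted by LinearSVC); the function names are illustrative:

```python
def plane_height(coef, intercept, x, y):
    """Solve coef[0]*x + coef[1]*y + coef[2]*z + intercept = 0 for z."""
    return (coef[0] * x + coef[1] * y + intercept) / -coef[2]


def decision_value(coef, intercept, point):
    """Signed score of a 3-D point; its sign tells which side of the plane it is on."""
    return sum(c * p for c, p in zip(coef, point)) + intercept


# Hypothetical coefficients and intercept.
coef = [0.1, -0.3, 0.2]
intercept = -1.0

z = plane_height(coef, intercept, x=2.0, y=1.0)
on_plane = decision_value(coef, intercept, (2.0, 1.0, z))
above = decision_value(coef, intercept, (2.0, 1.0, z + 1.0))
print(z, on_plane, above)
```

A point on the plane scores (numerically) zero, and moving it along the third axis changes the sign of the score, which is how the two classes are told apart.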
ASSIGNMENT -09
1. Write a program to implement a Decision Tree Classifier in Python.
Source Code -:
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree
cancer = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)
'''
tree=DecisionTreeClassifier(random_state=0)
tree.fit(X_train,Y_train)
print("Accuracy on training set: {:.3f}".format(tree.score(X_train,Y_train)))
print("Accuracy on test set: {:.3f}".format(tree.score(X_test,Y_test)))
'''
tree=DecisionTreeClassifier(max_depth=3,random_state=0)
tree.fit(X_train,Y_train)
print("Accuracy on training set: {:.3f}".format(tree.score(X_train,Y_train)))
print("Accuracy on test set: {:.3f}".format(tree.score(X_test,Y_test)))
from sklearn.tree import export_graphviz
export_graphviz(tree, out_file="tree.dot", class_names=["malignant", "benign"],
                feature_names=cancer.feature_names, impurity=False, filled=True)
import graphviz
with open("tree.dot") as f:
    dot_graph = f.read()
print(dot_graph)
plt.figure(figsize=(12, 8))
plot_tree(tree,
          filled=True,
          feature_names=cancer.feature_names,
          class_names=["malignant", "benign"],
          rounded=True,
          fontsize=10)
plt.title("Decision Tree for Breast Cancer Classification")
plt.show()
Output -:
Accuracy on training set: 0.977
Accuracy on test set: 0.944
digraph Tree {
node [shape=box, style="filled", color="black", fontname="helvetica"] ;
edge [fontname="helvetica"] ;
0 [label="worst radius <= 16.795\nsamples = 426\nvalue = [159, 267]\nclass = benign",
fillcolor="#afd7f4"] ;
1 [label="worst concave points <= 0.136\nsamples = 284\nvalue = [25, 259]\nclass =
benign", fillcolor="#4ca6e8"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="radius error <= 1.048\nsamples = 252\nvalue = [4, 248]\nclass = benign",
fillcolor="#3c9fe5"] ;
1 -> 2 ;
3 [label="samples = 251\nvalue = [3, 248]\nclass = benign", fillcolor="#3b9ee5"] ;
2 -> 3 ;
4 [label="samples = 1\nvalue = [1, 0]\nclass = malignant", fillcolor="#e58139"] ;
2 -> 4 ;
5 [label="worst texture <= 25.62\nsamples = 32\nvalue = [21, 11]\nclass = malignant",
fillcolor="#f3c3a1"] ;
1 -> 5 ;
6 [label="samples = 12\nvalue = [3, 9]\nclass = benign", fillcolor="#7bbeee"] ;
5 -> 6 ;
7 [label="samples = 20\nvalue = [18, 2]\nclass = malignant", fillcolor="#e88f4f"] ;
5 -> 7 ;
8 [label="texture error <= 0.473\nsamples = 142\nvalue = [134, 8]\nclass = malignant",
fillcolor="#e78945"] ;
0 -> 8 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
9 [label="samples = 5\nvalue = [0, 5]\nclass = benign", fillcolor="#399de5"] ;
8 -> 9 ;
10 [label="worst concavity <= 0.191\nsamples = 137\nvalue = [134, 3]\nclass =
malignant", fillcolor="#e6843d"] ;
8 -> 10 ;
11 [label="samples = 5\nvalue = [2, 3]\nclass = benign", fillcolor="#bddef6"] ;
10 -> 11 ;
12 [label="samples = 132\nvalue = [132, 0]\nclass = malignant", fillcolor="#e58139"] ;
10 -> 12 ; }
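The printed graph can be read as nested if/else rules, and the leaf counts are enough to re-derive the training accuracy. The sketch below copies the thresholds and `value` fields from the output above; the function name `predict` and the sample dictionaries are hypothetical, and a left branch is taken when the condition in the node label is true.

```python
# (malignant, benign) training counts at each leaf, copied from the "value" fields
# of the leaf nodes 3, 4, 6, 7, 9, 11 and 12 in the printed graph.
leaf_counts = [(3, 248), (1, 0), (3, 9), (18, 2), (0, 5), (2, 3), (132, 0)]

total = sum(malignant + benign for malignant, benign in leaf_counts)
correct = sum(max(counts) for counts in leaf_counts)  # each leaf predicts its majority class
print(f"Training samples: {total}")
print(f"Accuracy on training set: {correct / total:.3f}")


def predict(sample):
    """Follow the printed splits for one sample given as a dict of feature values."""
    if sample["worst radius"] <= 16.795:
        if sample["worst concave points"] <= 0.136:
            return "benign" if sample["radius error"] <= 1.048 else "malignant"
        return "benign" if sample["worst texture"] <= 25.62 else "malignant"
    if sample["texture error"] <= 0.473:
        return "benign"
    return "benign" if sample["worst concavity"] <= 0.191 else "malignant"


# Hypothetical feature values, chosen only to exercise two different paths.
print(predict({"worst radius": 20.0, "texture error": 1.0, "worst concavity": 0.4}))
print(predict({"worst radius": 12.0, "worst concave points": 0.1, "radius error": 0.3}))
```

The leaf counts add up to the 426 training samples at the root, and the majority-class counts reproduce the reported training accuracy of 0.977.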
ASSIGNMENT -10
1. Write a program to implement K-Means clustering using Python. Given dataset is: A1(2,10);
A2(2,5); A3(8,4); A4(5,8); A5(7,5); A6(6,4); A7(1,2); A8(4,9).
Source code -:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
data = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])
k=3
kmeans = KMeans(n_clusters=k, random_state=0)
kmeans.fit(data)
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='rainbow', marker='o',
            label='Data points')
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, c='black', marker='X', label='Centroids')
plt.title(f"K-means Clustering with k={k}")
plt.xlabel('X Coordinate')
plt.ylabel('Y Coordinate')
plt.legend()
plt.show()
print("Cluster Centroids:\n", centroids)
print("Cluster Labels:\n", labels)
Output -:
Cluster Centroids:
[[7. 4.33333333]
[3.66666667 9. ]
[1.5 3.5 ]]
Cluster Labels:
[1 2 0 1 0 0 2 1]
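The same clustering can be traced by hand with Lloyd's algorithm. The sketch below assumes A1, A4 and A7 as the starting centroids; that choice is an assumption made for illustration, since `KMeans` in the listing selects its own initial centroids. The function name `kmeans` is illustrative.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
# Assumed starting centroids: A1, A4 and A7.
labels, centroids = kmeans(points, [(2, 10), (5, 8), (1, 2)])
print("Cluster Labels:", labels)
print("Cluster Centroids:", centroids)
```

With this start the procedure groups {A1, A4, A8}, {A3, A5, A6} and {A2, A7}, the same partition and centroids as in the output above; only the numbering of the clusters differs.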
ASSIGNMENT -11
1. Write a program to implement Ward’s Algorithm without using linkage and dendrogram.
Given dataset in (x,y): 1(4,4); 2(8,4); 3(15,8); 4(24,4); 5(24,12).
Source Code -:
import numpy as np
import matplotlib.pyplot as plt
def get_data_points():
    n = int(input("Enter the number of data points: "))
    data = []
    print("Enter the coordinates of each data point (x y):")
    for _ in range(n):
        x, y = map(float, input().split())
        data.append((x, y))
    return np.array(data)
data = get_data_points()
def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))
n = len(data)
distance_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        distance_matrix[i, j] = euclidean_distance(data[i], data[j])
        distance_matrix[j, i] = distance_matrix[i, j]
clusters = [[i] for i in range(n)]
positions = np.arange(n)
def ward_distance(c1, c2):
    combined_cluster = np.vstack((data[c1], data[c2]))
    mean_combined = np.mean(combined_cluster, axis=0)
    variance = np.sum((combined_cluster - mean_combined) ** 2)
    return variance
merge_history = []
heights = []
while len(clusters) > 1:
    min_distance = float('inf')
    clusters_to_merge = (None, None)
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            dist = ward_distance(clusters[i], clusters[j])
            if dist < min_distance:
                min_distance = dist
                clusters_to_merge = (i, j)
                points_i = [f"({data[p][0]}, {data[p][1]})" for p in clusters[i]]
                points_j = [f"({data[p][0]}, {data[p][1]})" for p in clusters[j]]
                print(f"Distance between clusters {points_i} and {points_j}: {dist}")
    i, j = clusters_to_merge
    new_cluster = clusters[i] + clusters[j]
    clusters = [clusters[k] for k in range(len(clusters)) if k not in (i, j)]
    clusters.append(new_cluster)
    new_position = (positions[i] + positions[j]) / 2
    positions = np.delete(positions, [i, j])
    positions = np.append(positions, new_position)
    merge_history.append((i, j))
    heights.append(min_distance)
def plot_dendrogram(merge_history, heights):
    plt.figure(figsize=(12, 6))
    current_positions = np.arange(n)
    colors = plt.cm.viridis(np.linspace(0, 1, len(merge_history)))
    for idx, (merge, height) in enumerate(zip(merge_history, heights)):
        i, j = merge
        plt.plot([current_positions[i], current_positions[i]], [0, height], color=colors[idx])
        plt.plot([current_positions[j], current_positions[j]], [0, height], color=colors[idx])
        plt.plot([current_positions[i], current_positions[j]], [height, height], color=colors[idx])
        new_position = (current_positions[i] + current_positions[j]) / 2
        current_positions = np.delete(current_positions, [i, j])
        current_positions = np.append(current_positions, new_position)
    for idx, pos in enumerate(np.arange(n)):
        plt.text(pos, -0.5, f'({data[idx][0]}, {data[idx][1]})',
                 ha='center', va='top', fontsize=12, color='red')
    plt.title("Dendrogram")
    plt.xlabel("Data Points")
    plt.ylabel("Distance")
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()
plot_dendrogram(merge_history, heights)
Output -:
Enter the number of data points: 5
Enter the coordinates of each data point (x y):
4 4
8 4
15 8
24 4
24 12
Distance between clusters ['(4.0, 4.0)'] and ['(8.0, 4.0)']: 8.0
Distance between clusters ['(15.0, 8.0)'] and ['(24.0, 4.0)']: 48.5
Distance between clusters ['(24.0, 4.0)'] and ['(24.0, 12.0)']: 32.0
Distance between clusters ['(15.0, 8.0)'] and ['(4.0, 4.0)', '(8.0, 4.0)']: 72.66666666666666
Distance between clusters ['(24.0, 4.0)', '(24.0, 12.0)'] and ['(15.0, 8.0)', '(4.0, 4.0)', '(8.0,
4.0)']: 383.2
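The `ward_distance` function in the listing scores a candidate merge by the total squared deviation of the merged cluster. Ward's method is more commonly stated in terms of the increase in within-cluster sum of squares caused by a merge, which for clusters A and B equals |A||B| / (|A| + |B|) times the squared distance between their centroids. The sketch below compares the two quantities on the clusters that exist before the third merge; it is an illustration with hypothetical helper names, not a replacement for the program above.

```python
def centroid(points):
    """Mean position of a cluster given as a list of coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    center = centroid(points)
    return sum((x - c) ** 2 for point in points for x, c in zip(point, center))


def ward_increase(a, b):
    """Increase in total within-cluster SSE caused by merging clusters a and b."""
    gap = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * gap


left = [(4.0, 4.0), (8.0, 4.0)]
middle = [(15.0, 8.0)]
right = [(24.0, 4.0), (24.0, 12.0)]

for name, other in (("left", left), ("right", right)):
    merged = sse(middle + other)
    increase = ward_increase(middle, other)
    print(f"middle + {name}: merged SSE = {merged:.2f}, increase = {increase:.2f}")
```

On this data the two criteria rank the candidates differently: the merged-cluster score is smaller for joining (15, 8) with the left pair (72.67 against 86.00), while the increase is smaller for joining it with the right pair (54.00 against 64.67).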