Fundamentals of Data Science Students


MISRIMAL NAVAJEE MUNOTH JAIN

ENGINEERING COLLEGE
(Managed By Tamil Nadu Educational and Medical Trust)
Thoraipakkam, Chennai – 600097.

DEPARTMENT
OF
COMPUTER SCIENCE AND ENGINEERING

DATA SCIENCE LABORATORY (CS3361)


RECORD NOTE BOOK

Register Number

BONAFIDE CERTIFICATE
This is to certify that this is a bonafide record of work done
by……………………………………………………… of B.E Computer Science
and Engineering in the DATA SCIENCE LABORATORY (CS3361) during the
Academic year 2023-2024.

Staff In-Charge Head of the Department

Submitted for the University Practical Examination held on _____________________

Internal Examiner External Examiner


MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING COLLEGE,
CHENNAI – 97

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VISION

Producing competent Computer Engineers with a strong background in the latest trends and
technology to achieve academic excellence and to become pioneers in software and hardware
products with an ethical approach to serve society.

MISSION

To provide quality education in Computer Science and Engineering with state-of-the-art
facilities.

To provide a learning ambience that helps students enhance their problem-solving skills and
inculcates in them the habit of continuous learning in their domain of interest.

To serve society by providing insightful solutions to real-world problems by employing the
latest trends in computing technology with strict adherence to professional and ethical
responsibilities.

CS3361 - DATA SCIENCE LABORATORY SYLLABUS

COURSE OBJECTIVES:
1. To understand the Python libraries for data science.
2. To understand the basic Statistical and Probability measures for data science.
3. To learn descriptive analytics on the benchmark data sets.
4. To apply correlation and regression analytics on standard data sets.
5. To present and interpret data using visualization packages in Python.

LIST OF EXERCISES:

1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analyses for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap

COURSE OUTCOMES:
At the end of this course, the students will be able to:

1: Make use of the Python libraries for data science.
2: Make use of the basic Statistical and Probability measures for data science.
3: Perform descriptive analytics on the benchmark data sets.
4: Perform correlation and regression analytics on standard data sets
5: Present and interpret data using visualization packages in Python.
INDEX

EX. NO    DATE    TITLE    PG. NO    MARKS    SIGN

1.     Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
       Pandas packages.
2.     Working with Numpy arrays
3.     Working with Pandas data frames
4.     Reading data from text files, Excel and the web and exploring various commands for
       doing descriptive analytics on the Iris data set.
5a.    Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
       Skewness and Kurtosis.
5b.    Bivariate analysis: Linear and logistic regression modeling
5c.    Multiple Regression analysis
5d.    Also compare the results of the above analysis for the two data sets
6a.    Normal curves
6b.    Density and contour plots
6c.    Correlation and scatter plots
6d.    Histograms
6e.    Three dimensional plotting
7.     Visualizing Geographic Data with Basemap


Exp. no: 1
Date:
DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND
PANDAS PACKAGES

AIM:

ALGORITHM:
Step 1: Start
Step 2: Download Python 3.8 or higher and get-pip.py
Step 3: Install Python with the "Add python.exe to PATH" option selected
Step 4: Run get-pip.py with Python in the terminal (cmd) to install pip
Step 5: Enter command in terminal(cmd)
a. python -m pip install --upgrade pip
b. python -m pip install numpy scipy jupyter statsmodels pandas
Step 6: Stop
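
Once Step 5 has finished, the installation can be checked from Python itself. The following is a minimal sketch, assuming Python 3.8 or higher (for importlib.metadata); the helper name installed_versions is illustrative and not part of any of the packages.

```python
import importlib.metadata

# Packages named in Step 5b of the algorithm.
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]


def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as "not installed" can be installed again with the pip command from Step 5b.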

SOURCE CODE:
OUTPUT:
RESULT:
Exp. no: 2
Date: WORKING WITH NUMPY ARRAYS

AIM

ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of array
Step4: Stop

PROGRAM

import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT

Array is of type: <class 'numpy.ndarray'>


No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32
Program to Perform Array Slicing

a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])

OUTPUT
[[1 2 3]
[3 4 5]
[4 5 6]]
After slicing
[[3 4 5]
[4 5 6]]

Program to Perform Array Slicing


# array to begin with
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:' )
print(a)
# this returns array of items in the second column
print('The items in the second column are:' )
print(a[...,1])
print('\n' )
# Now we will slice all items from the second row
print ('The items in the second row are:' )
print(a[1,...])
print('\n' )
# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:' )
print(a[...,1:])
OUTPUT:
Our array is:
[[1 2 3]
[3 4 5]
[4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
[4 5]
[5 6]]
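
For a two-dimensional array such as the one above, the ellipsis forms select the same elements as the more common colon forms. The sketch below only illustrates that equivalence; the variable names are chosen here for readability.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

second_column = a[:, 1]     # same selection as a[..., 1]
second_row = a[1, :]        # same selection as a[1, ...]
from_column_1 = a[:, 1:]    # same selection as a[..., 1:]

print(second_column)
print(second_row)
print(from_column_1)
```

The ellipsis becomes more useful for arrays with more than two dimensions, where it stands for as many full slices as are needed.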

RESULT:
Exp. no: 3
Date: WORKING WITH PANDAS DATA FRAMES

AIM:

ALGORITHM:

Step 1:Start
Step 2:import pandas package with an alias name as pd
Step 3:Write the data in the form of a dictionary and store it in the variable 'data'
Step 4:assign variable 't' with pd.DataFrame(data)
Step 5:increment the index value of 't' by 1
Step 6:print the value 't'
Step 7:Stop.

SOURCE CODE:
import pandas as pd
data = {"Name": ["Ram", "Subash", "Rahul", "Arun", "Deepak"],
        "Age": [24, 25, 24, 26, 25],
        "CGPA": [9.5, 9.3, 9.0, 8.5, 0.88]}
t=pd.DataFrame(data)
t.index+=1
print(t)
OUTPUT:
Name Age CGPA
1 Ram 24 9.50
2 Subash 25 9.30
3 Rahul 24 9.00
4 Arun 26 8.50
5 Deepak 25 0.88

RESULT:
Exp. no: 4
Date:
READING DATA FROM FILES AND EXPLORING VARIOUS COMMANDS FOR DOING DESCRIPTIVE
ANALYSIS ON IRIS DATASET

AIM:

ALGORITHM:

Step 1: Start
Step 2: Import pandas, numpy, matplotlib.pyplot and seaborn, and import load_iris from
sklearn.datasets
Step 3: Call sns.set() to apply the default seaborn theme
Step 4: Assign iris_data = pd.read_csv() with the path of the Iris CSV file
Step 5: Print iris_data.head()
Step 6: Print iris_data.describe()
Step 7: Draw sns.countplot(x='species', data=iris_data) and call plt.show()
Step 8: Draw sns.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris_data)
Step 9: Place the legend with plt.legend(bbox_to_anchor=(1, 1), loc=1)
Step 10: Display the plot with plt.show()
Step 11: Stop

SOURCE CODE:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

sns.set()
# Raw string so the backslashes in the Windows path are not treated as escapes
iris_data = pd.read_csv(r"D:\Downloads\cse\IRIS.csv")
print(iris_data.head())
print("*********************Descriptive Analysis****************************")
print(iris_data.describe())
# SPECIES COUNT
sns.countplot(x='species', data=iris_data)
plt.show()
# COMPARING PETAL LENGTH AND PETAL WIDTH
sns.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris_data)
plt.legend(bbox_to_anchor=(1, 1), loc=1)
plt.show()

OUTPUT:

[Figure: species count plot]
[Figure: scatter plot of petal length against petal width, coloured by species]

RESULT:
Exp. no: 5a
Date:
UNIVARIATE ANALYSIS: FREQUENCY, MEAN, MEDIAN, MODE, VARIANCE, STANDARD DEVIATION,
SKEWNESS AND KURTOSIS

AIM

ALGORITHM
Mean
Sum all the values in the dataset.
Divide the sum by the number of values in the dataset.
Median
Sort the dataset in ascending order.
If the number of observations is odd, the median is the middle value.
If the number of observations is even, the median is the average of the two middle values.
Mode
Count the occurrences of each unique value in the dataset.
The mode is the value(s) with the highest frequency.
Variance
Calculate the mean of the dataset.
Subtract the mean from each data point, square the result, and sum all the squared differences.
Divide the sum by n - 1 for the sample variance (the quantity statistics.variance returns), or by n
for the population variance.
Standard Deviation
Calculate the variance.
Take the square root of the variance.
Skewness
Calculate the mean and the population standard deviation (divisor n) of the dataset.
For each data point, subtract the mean and divide by the standard deviation.
Cube each of these standardized values.
Skewness is the mean of the cubed values (equivalently, the third central moment divided by the
cubed standard deviation).
Kurtosis
Calculate the mean and the population standard deviation (divisor n) of the dataset.
For each data point, subtract the mean and divide by the standard deviation.
Raise each of these standardized values to the fourth power.
Kurtosis is the mean of these values (equivalently, the fourth central moment divided by the fourth
power of the standard deviation); subtract 3 to obtain the excess kurtosis that scipy.stats.kurtosis
reports by default.
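
The variance, skewness and kurtosis steps above can be sketched in plain Python. This is a minimal illustration using population moments (divisor n), which is the convention behind scipy's bias=True; the function name population_moments is hypothetical, and 3 is subtracted from the kurtosis to give the excess (Fisher) value that scipy.stats.kurtosis reports by default.

```python
import math


def population_moments(values):
    """Return mean, variance, std, skewness and excess kurtosis (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: mean of the squared deviations from the mean.
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    # Standardize each value, then average the third and fourth powers.
    z = [(v - mean) / std for v in values]
    skewness = sum(s ** 3 for s in z) / n
    excess_kurtosis = sum(s ** 4 for s in z) / n - 3
    return mean, variance, std, skewness, excess_kurtosis


# Same data set as the skewness/kurtosis part of the program in this experiment.
dataset = [88, 85, 82, 97, 67, 77, 74, 86, 81, 95, 77, 88, 85, 76, 81]
mean, variance, std, skewness, excess_kurtosis = population_moments(dataset)
print(skewness)
print(excess_kurtosis)
```

Note that statistics.variance and statistics.stdev divide by n - 1 instead, so their results differ slightly from the population values computed here.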
SOURCE CODE:

import statistics

# initializing list
li = [1, 2, 3, 3, 2, 2, 2, 1]
# using mean() to calculate the average of the list elements
print("The average of list values is : ", end="")
print(statistics.mean(li))

# working of median() on various ranges of data sets
from statistics import median
from fractions import Fraction as fr

# tuple of positive integer numbers
data1 = (2, 3, 4, 5, 7, 9, 11)
# tuple of floating point values
data2 = (2.4, 5.1, 6.7, 8.9)
# tuple of fractional numbers
data3 = (fr(1, 2), fr(44, 12), fr(10, 3), fr(2, 3))
# tuple of negative integers
data4 = (-5, -1, -12, -19, -3)
# tuple of positive and negative integers
data5 = (-1, -2, -3, -4, 4, 3, 2, 1)

# Printing the median of the above data sets
print("Median of data-set 1 is %s" % (median(data1)))
print("Median of data-set 2 is %s" % (median(data2)))
print("Median of data-set 3 is %s" % (median(data3)))
print("Median of data-set 4 is %s" % (median(data4)))
print("Median of data-set 5 is %s" % (median(data5)))


# working of mode() on various data types
from statistics import mode
from fractions import Fraction as fr

# tuple of positive integer numbers
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)
# tuple of floating point values
data2 = (2.4, 1.3, 1.3, 1.3, 2.4, 4.6)
# tuple of fractional numbers
data3 = (fr(1, 2), fr(1, 2), fr(10, 3), fr(2, 3))
# tuple of negative integers
data4 = (-1, -2, -2, -2, -7, -7, -9)
# tuple of strings
data5 = ("red", "blue", "black", "blue", "black", "black", "brown")

# Printing the mode of the above data sets
print("Mode of data set 1 is %s" % (mode(data1)))
print("Mode of data set 2 is %s" % (mode(data2)))
print("Mode of data set 3 is %s" % (mode(data3)))
print("Mode of data set 4 is %s" % (mode(data4)))
print("Mode of data set 5 is %s" % (mode(data5)))

# working of variance() (sample variance, divisor n - 1)
from statistics import variance
from fractions import Fraction as fr

# positive integers, spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# positive and negative numbers, spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))
# floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the variance of each sample
print("Variance of Sample1 is %s " % (variance(sample1)))
print("Variance of Sample2 is %s " % (variance(sample2)))
print("Variance of Sample3 is %s " % (variance(sample3)))
print("Variance of Sample4 is %s " % (variance(sample4)))
print("Variance of Sample5 is %s " % (variance(sample5)))

# working of stdev() (sample standard deviation)
from statistics import stdev

# positive integers, spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# positive and negative numbers, spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# floating point values
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the standard deviation of each sample
print("The Standard Deviation of Sample1 is %s" % (stdev(sample1)))
print("The Standard Deviation of Sample2 is %s" % (stdev(sample2)))
print("The Standard Deviation of Sample3 is %s" % (stdev(sample3)))
print("The Standard Deviation of Sample4 is %s" % (stdev(sample4)))

from scipy.stats import skew, kurtosis

# Creating a dataset
dataset = [88, 85, 82, 97, 67, 77, 74, 86,
           81, 95, 77, 88, 85, 76, 81]
# Calculate the skewness (bias=True uses population moments)
print(skew(dataset, axis=0, bias=True))
# Calculate the excess kurtosis (Fisher definition is the default)
print(kurtosis(dataset, axis=0, bias=True))

OUTPUT:

The average of list values is : 2


Median of data-set 1 is 5
Median of data-set 2 is 5.9
Median of data-set 3 is 2
Median of data-set 4 is -5
Median of data-set 5 is 0.0
Mode of data set 1 is 5
Mode of data set 2 is 1.3
Mode of data set 3 is 1/2
Mode of data set 4 is -2
Mode of data set 5 is black
Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006
The Standard Deviation of Sample1 is 3.9761191895520196
The Standard Deviation of Sample2 is 1.8708286933869707
The Standard Deviation of Sample3 is 7.8182478855559445
The Standard Deviation of Sample4 is 0.41967844833872525
0.029331688766181797
-0.29271198374234686

RESULT:
Exp. no: 5b
Date:
BIVARIATE ANALYSIS: LINEAR AND LOGISTIC REGRESSION MODELING

AIM

ALGORITHM

estimate_coef Function:
Determine the number of observations n and calculate the mean of x and y.
Calculate the cross-deviation SS_xy = sum(x*y) - n * mean(x) * mean(y).
Calculate the deviation about x, SS_xx = sum(x*x) - n * mean(x) * mean(x).
Calculate the slope (b_1) as SS_xy / SS_xx.
Calculate the intercept (b_0) using the formula b_0 = mean(y) - b_1 * mean(x).
Return the tuple (b_0, b_1).
plot_regression_line Function:
Scatter plot the actual data points using Matplotlib.
Calculate the predicted response vector y_pred using the regression coefficients.
Plot the regression line using Matplotlib.
Display the plot with labels for the x and y axes.
main Function:
Define the observations/data (x and y arrays).
Call the estimate_coef function to obtain the regression coefficients.
Print the estimated coefficients.
Call the plot_regression_line function to visualize the regression line.
Main Program Execution:
If the script is run as the main program (the if __name__ == "__main__": block), execute the
main function.
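
The closed-form coefficients described above can be cross-checked against NumPy's built-in least-squares polynomial fit. This is a small sketch that reuses the sample observations from this experiment's program; it is an independent check, not part of the recorded program.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# A degree-1 polyfit returns the coefficients highest power first: [slope, intercept].
b_1, b_0 = np.polyfit(x, y, 1)
print(f"b_0 = {b_0:.4f}, b_1 = {b_1:.4f}")
```

Both approaches minimise the same sum of squared residuals, so the two sets of coefficients should agree up to floating-point rounding.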
SOURCE CODE:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
OUTPUT:

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697

RESULT:
Exp. no: 5 c
Date: MULTIPLE REGRESSION ANALYSIS

AIM

ALGORITHM

Import NumPy for numerical operations.
Import LinearRegression from scikit-learn for linear regression modeling.
Define a 2D array x containing the input features.
Define a 1D array y containing the output (response) values.
Create a LinearRegression model using LinearRegression().
Fit the model to the data using the fit method.
Use the score method to calculate the coefficient of determination (R-squared) of the model on the
given data.
Print the intercept and coefficients of the linear regression model.
Use the model to predict the response for the input features (x).
Create new input features (x_new).
Use the model to predict the corresponding responses (y_new) for the new input features.
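
What the fit and score steps compute can be reproduced with NumPy alone. The sketch below is an assumption-labelled cross-check rather than scikit-learn's internal implementation: it solves the same ordinary least-squares problem with np.linalg.lstsq on the data used in this experiment, and the variable names (design, weights) are illustrative.

```python
import numpy as np

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5],
              [35, 11], [45, 15], [55, 34], [60, 35]], dtype=float)
y = np.array([4, 5, 20, 14, 32, 22, 38, 43], dtype=float)

# Prepend a column of ones so that the first fitted weight is the intercept.
design = np.column_stack([np.ones(len(x)), x])
weights, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
intercept, coefficients = weights[0], weights[1:]

# R-squared = 1 - (residual sum of squares / total sum of squares).
residuals = y - design @ weights
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print("intercept:", intercept)
print("coefficients:", coefficients)
print("coefficient of determination:", r_squared)
```

The printed values should agree with the intercept_, coef_ and score reported by LinearRegression up to floating-point rounding.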

SOURCE CODE:

import numpy as np
from sklearn.linear_model import LinearRegression

# input features (two columns) and output values
x = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
x, y = np.array(x), np.array(y)

# fit the model and report its quality and parameters
model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")

# predictions for the training inputs
y_pred = model.predict(x)
print(f"predicted response:\n{y_pred}")

# predictions for new inputs
x_new = np.arange(10).reshape((-1, 2))
y_new = model.predict(x_new)

OUTPUT:

coefficient of determination: 0.8615939258756776
intercept: 5.52257927519819
coefficients: [0.44706965 0.25502548]
predicted response:
[ 5.77760476  8.012953   12.73867497 17.9744479  23.97529728 29.4660957
 38.78227633 41.27265006]

RESULT:
Exp. no: 5 d
Date:
COMPARE THE RESULTS OF ANALYSIS FOR THE TWO DATA SETS.

AIM

ALGORITHM

Step 1: Start
Step 2: Import pandas
Step 3: Assign df1 = pd.read_csv("compardata1.csv")
Step 4: Assign df2=pd.read_csv("compardata2.csv")
Step 5: Assign c_RESULT = df1[df1.apply(tuple, 1).isin (df2.apply(tuple,1))]
print(c_RESULT)
Step 6: Assign c_RESULT1 = pd.merge(df1, df2)
Step 7: print c_RESULT1
Step 8: Stop
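
The comparison steps above can be tried without the CSV files. In the following sketch the two small data frames are hypothetical stand-ins for compardata1.csv and compardata2.csv (their columns and values are made up), and an outer merge with indicator=True is added as one possible way to see which rows appear in only one of the data sets.

```python
import pandas as pd

# Hypothetical stand-ins for compardata1.csv and compardata2.csv.
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10, 20, 30]})
df2 = pd.DataFrame({"id": [2, 3, 4], "score": [20, 35, 40]})

# An inner merge on all shared columns keeps only rows present in both frames.
common = pd.merge(df1, df2)

# An outer merge with indicator=True adds a "_merge" column naming each row's origin.
comparison = pd.merge(df1, df2, how="outer", indicator=True)
only_in_df1 = comparison[comparison["_merge"] == "left_only"]

print(common)
print(only_in_df1)
```

Here only the row (2, 20) is common to both frames, because the rows with id 3 differ in their score.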

SOURCE CODE:

import pandas as pd
df1 = pd.read_csv("compardata1.csv")
df2=pd.read_csv("compardata2.csv")
c_RESULT = df1[df1.apply(tuple, 1).isin (df2.apply(tuple,1))]
print(c_RESULT)
c_RESULT1 = pd.merge(df1, df2)
print(c_RESULT1)
RESULT:
Exp. no: 6a
Date: NORMAL CURVES

AIM:

ALGORITHM:

Step 1: Start
Step 2: Import the matplotlib.pyplot (plt), numpy (np) and math packages
Step 3: Assign x = np.arange(0, math.pi*2, 0.05)
Step 4: Assign y = np.sin(x)
Step 5: Plot the graph using plt.plot(x, y)
Step 6: Give labels to the x and y axes and a title to the plot
Step 7: Show the plot using the show() function
Step 8: Stop
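
The program recorded for this experiment plots a sine wave. To draw a normal (Gaussian) curve, as the title suggests, the y values can instead be taken from the normal probability density function. The sketch below computes those values with the standard library only; the function name normal_pdf and the range [-4, 4] are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


# Sample the standard normal curve on [-4, 4] in steps of 0.05.
x = [i * 0.05 for i in range(-80, 81)]
y = [normal_pdf(v) for v in x]
print(max(y))
```

The lists x and y can be passed to plt.plot(x, y) in place of the sine values to obtain the bell-shaped curve.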

SOURCE CODE:
from matplotlib import pyplot as plt
import numpy as np
import math
x = np.arange(0, math.pi*2, 0.05)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel("angle")
plt.ylabel("sine")
plt.title('sine wave')
plt.show()
OUTPUT:

RESULT:
Exp. no: 6b
Date: DENSITY AND CONTOUR PLOTS

AIM:

ALGORITHM:

• Use np.meshgrid to create a 2D grid (X, Y) from the 1D arrays feature_x and feature_y.
• Define a function Z that computes values based on the grid points (X, Y).
• Z = np.cos(X / 2) + np.sin(Y / 4).
• Use plt.contour to create contour lines based on the values of Z at different (X, Y) points
• Set the title of the plot using plt.title.
• Set labels for the x and y axes using plt.xlabel and plt.ylabel.
• Use plt.show to display the contour plot.
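
It can help to inspect what np.meshgrid produces before plotting. This short sketch uses the same feature_x and feature_y ranges as the program for this experiment and prints the shapes of the resulting grids.

```python
import numpy as np

feature_x = np.arange(0, 50, 2)   # 25 x-values: 0, 2, ..., 48
feature_y = np.arange(0, 50, 3)   # 17 y-values: 0, 3, ..., 48

# X repeats feature_x along each row; Y repeats feature_y down each column.
X, Y = np.meshgrid(feature_x, feature_y)
Z = np.cos(X / 2) + np.sin(Y / 4)

print(X.shape, Y.shape, Z.shape)
```

Each of the three arrays has one row per y-value and one column per x-value, which is the layout that contour() expects.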

SOURCE CODE:

import matplotlib.pyplot as plt
import numpy as np

feature_x = np.arange(0, 50, 2)
feature_y = np.arange(0, 50, 3)

# Creating 2-D grid of features
[X, Y] = np.meshgrid(feature_x, feature_y)

fig, ax = plt.subplots(1, 1)

Z = np.cos(X / 2) + np.sin(Y / 4)

# plots contour lines
ax.contour(X, Y, Z)

ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')

plt.show()
OUTPUT:

ALGORITHM:

• Import the required libraries: pandas for data manipulation, seaborn for data visualization, and
numpy for numerical operations.
• Import matplotlib.pyplot for additional customization of the plot.
• Generate or load the data that you want to visualize.
• If your data is not already in a pandas DataFrame, create one.
• Use seaborn.kdeplot to create a kernel density plot for the variable in the DataFrame.
• Customize the appearance by specifying optional parameters such as fill (called shade in older
seaborn releases), color, etc.
• Use seaborn.set_style, seaborn.set_palette, and other styling functions to customize the
appearance of the plot.
• Use plt.xlabel, plt.ylabel, and plt.title to add labels and a title to the plot.
• Use plt.show to display the density plot.
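
A kernel density estimate is an average of small bell curves, one centred on each observation. The sketch below shows that idea with NumPy only. It is not seaborn's implementation: seaborn chooses a bandwidth automatically, whereas the fixed bandwidth of 8.0, the function name gaussian_kde_1d and the randomly generated marks are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of samples at each grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every sample.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the estimate integrates to 1.
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


rng = np.random.default_rng(123)
marks = rng.integers(35, 101, size=100)   # 100 made-up marks between 35 and 100
grid = np.linspace(0, 140, 561)
density = gaussian_kde_1d(marks, grid, bandwidth=8.0)

# The area under a density curve should be close to 1.
area = np.sum(density) * (grid[1] - grid[0])
print(area)
```

A smaller bandwidth gives a bumpier curve that follows individual observations, and a larger one gives a smoother curve.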

SOURCE CODE:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Set the plot style before drawing so that it applies to this figure
sns.set(font_scale=1.2)
sns.set_style("darkgrid")
sns.set_palette("pastel")

# Generate sample data
np.random.seed(123)
data = np.random.randint(35, 101, 100)

# Create data frame
df = pd.DataFrame({'Marks': data})

# Create density plot (fill replaces the deprecated shade parameter)
sns.kdeplot(df['Marks'], fill=True, color='blue')

# Add axis labels and plot title
plt.xlabel("Marks Scored")
plt.ylabel("Density")
plt.title("Density Plot of Student Marks")
plt.show()

RESULT:
Exp. no: 6c
Date: CORRELATION AND SCATTER PLOTS

AIM:

ALGORITHM:
Step 1: Start
Step 2: Import the pandas (pd), matplotlib.pyplot (plt) and seaborn (sns) packages
Step 3: Read the dataset, store it in the variable df and increment the index by 1
Step 4: Print the first rows of the dataset using the df.head() function
Step 5: Print the correlations of the dataset using the df.corr(method='pearson') function
Step 6: Plot the scatter plot using sns.scatterplot(x=df.Pregnancies, y=df.Glucose, data=df)
Step 7: Show the plots using the show() function
Step 8: Stop
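
The Pearson coefficient that df.corr(method='pearson') reports for each pair of columns can be computed by hand. In this sketch the function name pearson_r is illustrative and the age and glucose values are made up; they are not taken from the diabetes data set.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# Made-up illustrative values.
age = [21, 25, 32, 40, 51, 60]
glucose = [85, 90, 110, 105, 130, 150]
r = pearson_r(age, glucose)
print(r)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.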

SOURCE CODE:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Importing dataset (raw string keeps the Windows backslashes literal)
df = pd.read_csv(r"D:\Downloads\cse\diabetes.csv")
df.index += 1
print(df.head())

# Correlation
correlations = df.corr(method='pearson')
print("Correlations of attributes in the data:\n", correlations)

# SCATTER PLOT
sns.scatterplot(x=df.Pregnancies, y=df.Glucose, data=df)
plt.show()
OUTPUT:

RESULT:
Exp. no: 6d HISTOGRAMS
Date:

AIM:

ALGORITHM:

Step 1: Start
Step 2: Import pandas and matplotlib.pyplot
Step 3: Read the dataset and increment its index by 1
Step 4: Plot a histogram of each numeric column using the hist() function and show it using the
show() function
Step 5: Stop.
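
A histogram divides the value range into equal-width bins and counts the observations in each one. The sketch below shows that counting step with np.histogram on a small made-up sample, using three bins so that the result is easy to check by hand.

```python
import numpy as np

values = [1, 2, 2, 3, 3, 3, 4]   # made-up sample

# Three equal-width bins over the range [1, 4]; the last bin includes its right edge.
counts, edges = np.histogram(values, bins=3)

print(counts)
print(edges)
```

The bins are [1, 2), [2, 3) and [3, 4], so the value 4 is counted together with the three 3s in the last bin.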

SOURCE-CODE:
import pandas as pd
import matplotlib.pyplot as plt

# Importing dataset (raw string keeps the Windows backslashes literal)
df = pd.read_csv(r"D:\Downloads\cse\diabetes.csv")
df.index += 1
# HISTOGRAM
df.hist()
plt.show()
OUTPUT:

RESULT:
Exp. no: 6e
Date: THREE-DIMENSIONAL PLOTTING

AIM:

ALGORITHM:
Step 1: Start
Step 2: import pandas ,matplotlib and mplot3d from mpl_toolkits
Step 3: read and store the dataset in the variable df
Step 4: increment the index of variable df by 1
Step 5: print df.head()
Step 6: assign variable x=df.Age , y=df.Pregnancies and z =df.DiabetesPedigreeFunction
Step 7: Plot the data using the figure(),axes(),get_cmap() functions of matplotlib and plot the
3D scatter plot using the scatter3D() function
Step 8: Set the label for x,y,z axes using set_(axes)label() function
Step 9: give the title of the plot using title() function
Step 10: show the plot using show() function
Step 11: Stop

SOURCE CODE:

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

# Importing dataset (raw string keeps the Windows backslashes literal)
df = pd.read_csv(r"D:\Downloads\cse\diabetes.csv")
df.index += 1
print(df.head())
x = df.Age
y = df.Pregnancies
z = df.DiabetesPedigreeFunction

# THREE-DIMENSIONAL PLOTTING
fig = plt.figure(figsize=(10, 7))
ax = plt.axes(projection="3d")
my_cmap = plt.get_cmap('hsv')
sctt = ax.scatter3D(x, y, z, alpha=0.8, c=(x + y + z), cmap=my_cmap, marker='*')
ax.set_xlabel('X-age')
ax.set_ylabel('Y-Pregnancies')
ax.set_zlabel('Z-DiabetesPedigreeFunction')
plt.title("3D scatter plot")
plt.show()

OUTPUT:

RESULT :
Exp. no: 7
Date: VISUALIZING GEOGRAPHIC DATA WITH BASEMAP

AIM:

ALGORITHM:

Step 1: Start
Step 2: Import Basemap from mpl_toolkits.basemap and import matplotlib.pyplot
Step 3: Create the Basemap object m and define sample longitude and latitude coordinates.
Step 4: Use the m object to convert longitude and latitude coordinates to map coordinates.
Step 5: Assign the title of the map using the "title()" method.
Step 6: Display the map using the "show()" method.
Step 7: Stop.

SOURCE CODE:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

# Create a map using Basemap
m = Basemap(projection='mill', llcrnrlat=-60, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')

# Draw coastlines and countries
m.drawcoastlines()
m.drawcountries()

# Draw parallels and meridians
m.drawparallels(range(-90, 91, 30), labels=[1, 0, 0, 0])
m.drawmeridians(range(-180, 181, 60), labels=[0, 0, 0, 1])

# Sample data points (longitude, latitude)
lon = [0, 45, -60, 150, -30]
lat = [30, -20, 60, -10, 45]

# Convert latitude and longitude to map coordinates
x, y = m(lon, lat)

# Plot data points on the map
m.scatter(x, y, marker='o', color='red', label='Data Points')

# Add a title and legend
plt.title('Geographic Data Visualization with Basemap')
plt.legend()
# Show the map
plt.show()

OUTPUT:

RESULT :
ADDITIONAL PROGRAMS

Write a NumPy program to create a null vector of size 10 and update sixth value to 11
PROGRAM

# Importing the NumPy library with an alias 'np'
import numpy as np
# Creating a NumPy array 'x' filled with zeros of length 10
x = np.zeros(10)
# Printing the initial array 'x' filled with zeros
print(x)
# Modifying the sixth value (index 5 with zero-based indexing) of the array to 11
print("Update sixth value to 11")
x[5] = 11
# Printing the updated array 'x' after modifying the sixth value
print(x)

OUTPUT
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Update sixth value to 11
[ 0.  0.  0.  0.  0. 11.  0.  0.  0.  0.]

Write a NumPy program to convert an array to a float type


PROGRAM

# Importing the NumPy library with an alias 'np'
import numpy as np
# Defining a Python list 'a' containing integers
a = [1, 2, 3, 4]
# Printing the original array 'a'
print("Original array")
print(a)
# Converting the list 'a' to a NumPy array of type float
# (np.asfarray() was removed in NumPy 2.0, so np.asarray() with dtype=float is used)
x = np.asarray(a, dtype=float)
# Printing the array 'x' after converting to a float type
print("Array converted to a float type:")
print(x)

OUTPUT

Original array
[1, 2, 3, 4]
Array converted to a float type:
[1. 2. 3. 4.]
Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10

PROGRAM
# Importing the NumPy library with an alias 'np'
import numpy as np

# Creating a NumPy array 'x' using arange() from 2 to 11 and reshaping it into a 3x3 matrix
x = np.arange(2, 11).reshape(3, 3)

# Printing the resulting 3x3 matrix 'x'


print(x)

OUTPUT

[[ 2 3 4]
[ 5 6 7]
[ 8 9 10]]

Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy array
PROGRAM
# Importing the NumPy library with an alias 'np'
import numpy as np

# Creating a Python list 'l' containing floating-point numbers
l = [12.23, 13.32, 100, 36.32]

# Printing the original Python list
print("Original List:", l)

# Creating a NumPy array 'a' from the Python list 'l'
a = np.array(l)

# Printing the one-dimensional NumPy array 'a'
print("One-dimensional NumPy array: ", a)

OUTPUT

Original List: [12.23, 13.32, 100, 36.32]


One-dimensional NumPy array: [ 12.23 13.32 100. 36.32]
Write a NumPy program to create an empty and a full array

PROGRAM

# Importing the NumPy library with an alias 'np'
import numpy as np
# Creating an empty array of shape (3, 4) using np.empty()
x = np.empty((3, 4))
# Printing the empty array 'x'
print(x)
# Creating a filled array of shape (3, 3) with all elements as 6 using np.full()
y = np.full((3, 3), 6)
# Printing the filled array 'y'
print(y)

OUTPUT

[[4.65855215e-310 0.00000000e+000 2.10077583e-312 6.79038654e-313]


[2.22809558e-312 2.14321575e-312 2.35541533e-312 6.79038654e-313]
[2.22809558e-312 2.14321575e-312 2.46151512e-312 2.41907520e-312]]
[[6 6 6]
[6 6 6]
[6 6 6]]

Write a NumPy program to convert a list and tuple into arrays

PROGRAM

# Importing the NumPy library with an alias 'np'
import numpy as np

# Creating a Python list
my_list = [1, 2, 3, 4, 5, 6, 7, 8]

# Converting the Python list to a NumPy array using np.asarray() and printing the result
print("List to array: ")
print(np.asarray(my_list))

# Creating a Python tuple containing two lists
my_tuple = ([8, 4, 6], [1, 2, 3])

# Converting the Python tuple to a NumPy array using np.asarray() and printing the result
print("Tuple to array: ")
print(np.asarray(my_tuple))
OUTPUT

List to array:
[1 2 3 4 5 6 7 8]
Tuple to array:
[[8 4 6]
[1 2 3]]

Write a NumPy program to find the real and imaginary parts of an array of complex
numbers

PROGRAM

# Importing the NumPy library with an alias 'np'
import numpy as np

# Calculating square root of a complex number
x = np.sqrt([1 + 0j])

# Calculating square root of another complex number
y = np.sqrt([0 + 1j])

# Printing the original array 'x' and 'y'
print("Original array:x ", x)
print("Original array:y ", y)

# Printing the real part of the array 'x' and 'y'
print("Real part of the array:")
print(x.real)
print(y.real)

# Printing the imaginary part of the array 'x' and 'y'
print("Imaginary part of the array:")
print(x.imag)
print(y.imag)
OUTPUT

Original array:x [1.+0.j]


Original array:y [0.70710678+0.70710678j]
Real part of the array:
[1.]
[0.70710678]
Imaginary part of the array:
[0.]
[0.70710678]
Write a NumPy program to merge three given NumPy arrays of same shape
PROGRAM

import numpy as np
# Shape (2, 2, 1) matches the sample output shown below; the values are random on each run
arr1 = np.random.random(size=(2, 2, 1))
arr2 = np.random.random(size=(2, 2, 1))
arr3 = np.random.random(size=(2, 2, 1))
print("Original arrays:")
print(arr1)
print(arr2)
print(arr3)
result = np.concatenate((arr1, arr2, arr3), axis=-1)
print("\nAfter concatenate:")
print(result)

OUTPUT
Original arrays:
[[[0.23424822]
[0.51175253]]

[[0.57232915]
[0.22516223]]]

[[[0.01776688]
[0.40250687]]

[[0.10133723]
[0.67184758]]]

[[[0.22401405]
[0.28253877]]

[[0.23720417]
[0.09512562]]]

After concatenate:
[[[0.23424822 0.01776688 0.22401405]
[0.51175253 0.40250687 0.28253877]]

[[0.57232915 0.10133723 0.23720417]


[0.22516223 0.67184758 0.09512562]]]
Write a NumPy program to add a border (filled with 0's) around an existing array

PROGRAM
# Importing the NumPy library with an alias 'np'
import numpy as np

# Creating a 3x3 NumPy array filled with ones
x = np.ones((3, 3))

# Printing the original array 'x'
print("Original array:")
print(x)

# Padding 'x' with a one-element border of 0s using np.pad, leaving the 1s inside
print("0 on the border and 1 inside in the array")
x = np.pad(x, pad_width=1, mode='constant', constant_values=0)

# Printing the padded array 'x' with 0s on the border and 1s inside
print(x)

OUTPUT

Original array:
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]

0 on the border and 1 inside in the array

[[0. 0. 0. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 0. 0. 0.]]

Write a NumPy program to append values to the end of an array

PROGRAM

# Importing the NumPy library with an alias 'np'
import numpy as np

# Creating a Python list
x = [10, 20, 30]

# Printing the original array
print("Original array:")
print(x)

# Appending values to the end of the array using np.append() and assigning the result back to 'x'
# (without an axis argument, np.append() flattens its inputs into one dimension)
x = np.append(x, [[40, 50, 60], [70, 80, 90]])

# Printing the array after appending values
print("After append values to the end of the array:")
print(x)

OUTPUT
Original array:
[10, 20, 30]
After append values to the end of the array:
[10 20 30 40 50 60 70 80 90]

Write a NumPy program to search the index of a given array in another given array.

Original NumPy array:


[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Searched array:
[4 5 6]
Index of the searched array in the original array: [1]

PROGRAM

import numpy as np
np_array = np.array([[1,2,3], [4,5,6] , [7,8,9], [10, 11, 12]])
test_array = np.array([4,5,6])
print("Original Numpy array:")
print(np_array)
print("Searched array:")
print(test_array)
print("Index of the searched array in the original array:")
print(np.where((np_array == test_array).all(1))[0])

OUTPUT

Original Numpy array:


[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Searched array:
[4 5 6]
Index of the searched array in the original array:
[1]
