Fundamentals of Data Science Students

MISRIMAL NAVAJEE MUNOTH JAIN
ENGINEERING COLLEGE
(Managed By Tamil Nadu Educational and Medical Trust)
Thoraipakkam, Chennai – 600097.
DEPARTMENT
OF
COMPUTER SCIENCE AND ENGINEERING
DATA SCIENCE LABORATORY (CS3361)

RECORD NOTE BOOK
MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING COLLEGE
(Managed By Tamil Nadu Educational and Medical Trust)
Thoraipakkam, Chennai – 600097.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Register Number
BONAFIDE CERTIFICATE
This is to certify that this is a bonafide record of work done
by……………………………………………………… of B.E Computer Science
and Engineering in the DATA SCIENCE LABORATORY (CS3361) during the
Academic year 2023-2024.
Staff In-Charge Head of the Department
Submitted for the University Practical Examination held on _____________________
Internal Examiner External Examiner

MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING COLLEGE,
CHENNAI – 97
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
VISION
Producing competent Computer Engineers with a strong background in the latest trends and
technology to achieve academic excellence and to become pioneers in software and hardware
products with an ethical approach to serve society.
MISSION
To provide quality education in Computer Science and Engineering with state-of-the-art
facilities.
To provide a learning environment that helps the students to enhance problem-solving skills and to
inculcate in them the habit of continuous learning in their domain of interest.
To serve society by providing insightful solutions to real-world problems by employing the
latest trends of computing technology with strict adherence to professional and ethical
responsibilities.
CS3361 - DATA SCIENCE LABORATORY SYLLABUS
COURSE OBJECTIVES:
1. To understand the Python libraries for data science
2. To understand the basic Statistical and Probability measures for data science.
3. To learn descriptive analytics on the benchmark data sets.
4. To apply correlation and regression analytics on standard data sets.
5. To present and interpret data using visualization packages in Python.
LIST OF EXERCISES:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
COURSE OUTCOMES:
At the end of this course, the students will be able to:
1: Make use of the Python libraries for data science

2: Make use of the basic Statistical and Probability measures for data science.
3: Perform descriptive analytics on the benchmark data sets.
4: Perform correlation and regression analytics on standard data sets
5: Present and interpret data using visualization packages in Python.
INDEX
EX. NO.   DATE   TITLE   PG. NO.   MARKS   SIGN
1.   Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
2.   Working with NumPy arrays
3.   Working with Pandas data frames
4.   Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.
5a.  Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.
5b.  Bivariate analysis: Linear and logistic regression modeling
5c.  Multiple regression analysis
5d.  Also compare the results of the above analysis for the two data sets
6a.  Normal curves
6b.  Density and contour plots
6c.  Correlation and scatter plots
6d.  Histograms
6e.  Three-dimensional plotting
7.   Visualizing Geographic Data with Basemap

Exp. no: 1
Date: DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES.
AIM:
ALGORITHM:
Step 1: Start
Step 2: Download Python 3.8 or higher and the get-pip.py script
Step 3: Run the Python installer with the "Add python.exe to PATH" option selected
Step 4: Install pip by running get-pip.py from the terminal (cmd): python get-pip.py
Step 5: Enter the following commands in the terminal (cmd):
a. python -m pip install --upgrade pip
b. python -m pip install numpy scipy jupyter statsmodels pandas
Step 6: Stop
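Once Step 5 finishes, it helps to confirm that every package is actually installed. The following is a minimal sketch, not part of the prescribed procedure, that asks the standard-library module importlib.metadata (available from Python 3.8) for the installed version of each package; the helper name check_packages is illustrative.

```python
import importlib.metadata

# Distribution names installed by the pip command in Step 5
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]


def check_packages(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        print(f"{name:<12} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be installed again with python -m pip install followed by its name.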
SOURCE CODE:
OUTPUT:
RESULT:
Exp. no: 2
Date: WORKING WITH NUMPY ARRAYS
AIM
ALGORITHM
Step 1: Start
Step 2: Import the numpy module
Step 3: Create an array and print its basic attributes (type, number of dimensions, shape, size and element type), then demonstrate array slicing
Step 4: Stop
PROGRAM
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT
Array is of type: <class 'numpy.ndarray'>

No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32
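Step 3 also mentions array operations, which the program above does not show. A minimal sketch of a few common ones (elementwise arithmetic, broadcasting, reductions along an axis, reshaping and transposing), reusing the array from the program above; the variable names are illustrative.

```python
import numpy as np

# Same 2x3 array as the program above
arr = np.array([[1, 2, 3], [4, 2, 5]])

doubled = arr * 2                       # elementwise: every element times 2
shifted = arr + np.array([10, 20, 30])  # broadcasting: the row is added to each row of arr
col_sums = arr.sum(axis=0)              # sum down each column
row_max = arr.max(axis=1)               # largest value in each row
reshaped = arr.reshape(3, 2)            # same 6 elements laid out as 3 rows x 2 columns
transposed = arr.T                      # rows become columns

for name, value in [("doubled", doubled), ("shifted", shifted),
                    ("col_sums", col_sums), ("row_max", row_max),
                    ("reshaped", reshaped), ("transposed", transposed)]:
    print(name)
    print(value)
```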
Program to Perform Array Slicing
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])
OUTPUT
[[1 2 3]
[3 4 5]
[4 5 6]]
After slicing
[[3 4 5]
[4 5 6]]
Program to Perform Array Slicing with Ellipsis (...)
import numpy as np
# array to begin with
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print('Our array is:')
print(a)
# ... stands for "all remaining axes": here it selects every row of column 1
print('The items in the second column are:')
print(a[..., 1])
print('\n')
# all items from the second row (row index 1)
print('The items in the second row are:')
print(a[1, ...])
print('\n')
# all rows, columns from index 1 onwards
print('The items column 1 onwards are:')
print(a[..., 1:])
OUTPUT:
Our array is:
[[1 2 3]
[3 4 5]
[4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
[4 5]
[5 6]]
RESULT:
Exp. no: 3
Date: WORKING WITH PANDAS DATA FRAMES
AIM:
ALGORITHM:
Step 1: Start
Step 2: Import the pandas package with the alias pd
Step 3: Write the data as a dictionary and store it in the variable 'data'
Step 4: Create a DataFrame from it: t = pd.DataFrame(data)
Step 5: Increment the index of 't' by 1 so that the rows are numbered from 1
Step 6: Print 't'
Step 7: Stop.
SOURCE CODE:
import pandas as pd
data = {"Name": ["Ram", "Subash", "Rahul", "Arun", "Deepak"],
        "Age": [24, 25, 24, 26, 25],
        "CGPA": [9.5, 9.3, 9.0, 8.5, 0.88]}
t = pd.DataFrame(data)
# Shift the default 0-based index so the rows are numbered from 1
t.index += 1
print(t)
OUTPUT:
Name Age CGPA
1 Ram 24 9.50
2 Subash 25 9.30
3 Rahul 24 9.00
4 Arun 26 8.50
5 Deepak 25 0.88
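A few further DataFrame operations on the same records, as a minimal sketch. Because the program shifts the index to start at 1, it is worth seeing the difference between label-based .loc and position-based .iloc; the variable names below are illustrative.

```python
import pandas as pd

# Same records as the program above
data = {"Name": ["Ram", "Subash", "Rahul", "Arun", "Deepak"],
        "Age": [24, 25, 24, 26, 25],
        "CGPA": [9.5, 9.3, 9.0, 8.5, 0.88]}
t = pd.DataFrame(data)
t.index += 1                            # labels now run 1..5

first_by_label = t.loc[1, "Name"]       # .loc uses index labels
first_by_position = t.iloc[0]["Name"]   # .iloc uses 0-based positions
ages = t["Age"]                         # a single column is a Series
toppers = t[t["CGPA"] >= 9.0]           # boolean filtering keeps matching rows
ranked = t.sort_values("CGPA", ascending=False)
mean_cgpa_by_age = t.groupby("Age")["CGPA"].mean()

print(first_by_label, first_by_position)
print(toppers)
print(ranked)
print(mean_cgpa_by_age)
print(t.describe())                     # count, mean, std, min, quartiles, max
```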
RESULT:
Exp. no: 4
Date: READING DATA FROM FILES AND EXPLORING VARIOUS COMMANDS FOR DOING DESCRIPTIVE ANALYSIS ON IRIS DATASET
AIM:
ALGORITHM:
Step 1: Start
Step 2: Import pandas, numpy, matplotlib.pyplot and seaborn
Step 3: Apply seaborn's default plot style with sns.set()
Step 4: Read the CSV file into iris_data using pd.read_csv()
Step 5: Print the first five rows with iris_data.head()
Step 6: Print summary statistics with iris_data.describe()
Step 7: Plot the number of samples per species with sns.countplot(x='species', data=iris_data) and display it with plt.show()
Step 8: Plot petal length against petal width, coloured by species, with sns.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris_data)
Step 9: Place the legend with plt.legend(bbox_to_anchor=(1, 1), loc=1)
Step 10: Display the plot with plt.show()
Step 11: Stop
SOURCE CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
# Raw string so the backslashes in the Windows path are not read as escape sequences
iris_data = pd.read_csv(r"D:\Downloads\cse\IRIS.csv")
print(iris_data.head())
print("*********************Descriptive Analysis****************************")
print(iris_data.describe())
# SPECIES COUNT
sns.countplot(x='species', data=iris_data)
plt.show()
# COMPARING PETAL LENGTH AND PETAL WIDTH
sns.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris_data)
plt.legend(bbox_to_anchor=(1, 1), loc=1)
plt.show()
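The program above reads only a CSV file from a fixed local path. A minimal sketch of further descriptive commands, reading a few rows in the same layout as IRIS.csv from an in-memory text buffer (io.StringIO) so it runs without the file; the rows are included only for illustration. The same pd.read_csv call also accepts a URL for data on the web, and pd.read_excel reads .xlsx workbooks (it needs the openpyxl package).

```python
import io

import pandas as pd

# A few rows in the same layout as IRIS.csv, for illustration only
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# read_csv accepts any file-like object, so a string buffer stands in for a text file
iris = pd.read_csv(io.StringIO(csv_text))

print(iris.shape)                          # (rows, columns)
print(iris.dtypes)                         # type of each column
print(iris["species"].value_counts())      # frequency of each class
print(iris.groupby("species").mean())      # per-class means of the numeric columns
print(iris.select_dtypes("number").agg(["min", "max", "median", "std"]))
```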
OUTPUT:
[Figure: species count (bar chart)]
[Figure: petal length vs. petal width, coloured by species (scatter plot)]
RESULT:
Exp. no: 5a
Date: UNIVARIATE ANALYSIS: FREQUENCY, MEAN, MEDIAN, MODE, VARIANCE, STANDARD DEVIATION, SKEWNESS AND KURTOSIS
AIM
ALGORITHM
Frequency
Count how many times each distinct value occurs in the dataset.
Mean
Sum all the values in the dataset.
Divide the sum by the number of values in the dataset.
Median
Sort the dataset in ascending order.
If the number of observations is odd, the median is the middle value.
If the number of observations is even, the median is the average of the two middle values.
Mode
Count the occurrences of each unique value in the dataset.
The mode is the value(s) with the highest frequency.
Variance
Calculate the mean of the dataset.
Subtract the mean from each data point, square the result, and sum all the squared differences.
Divide the sum by the number of data points n (population variance), or by n - 1 for the sample variance, which is what statistics.variance() returns.
Standard Deviation
Calculate the variance.
Take the square root of the variance (statistics.stdev() uses the sample variance).
Skewness
Calculate the mean and the population standard deviation of the dataset.
For each data point, subtract the mean and divide by the standard deviation.
Skewness is the mean of the cubes of these standardized values.
Kurtosis
Calculate the mean and the population standard deviation of the dataset.
For each data point, subtract the mean and divide by the standard deviation.
Kurtosis is the mean of the fourth powers of these standardized values. scipy.stats.kurtosis() subtracts 3 from this by default (excess kurtosis), so a normal distribution scores 0.
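The steps above can be sketched directly in plain Python, which makes it easy to check the library results printed by the program below. This is a minimal sketch, not a replacement for the statistics module or scipy.stats, and the function names are illustrative. variance() and std_dev() compute the population versions by default (dividing by n, as in the steps) and the sample versions with sample=True; skewness() and kurtosis() use population moments, matching scipy's bias=True.

```python
import math
from collections import Counter


def frequency(xs):
    """Map each distinct value to the number of times it occurs."""
    return dict(Counter(xs))


def mean(xs):
    return sum(xs) / len(xs)


def median(xs):
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """All values that share the highest frequency."""
    counts = Counter(xs)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


def variance(xs, sample=False):
    m = mean(xs)
    squared_dev = sum((x - m) ** 2 for x in xs)
    return squared_dev / (len(xs) - 1 if sample else len(xs))


def std_dev(xs, sample=False):
    return math.sqrt(variance(xs, sample))


def skewness(xs):
    """Mean of cubed z-scores (population standard deviation)."""
    m, s = mean(xs), std_dev(xs)
    return mean([((x - m) / s) ** 3 for x in xs])


def kurtosis(xs, excess=True):
    """Mean of z-scores to the 4th power; subtract 3 for excess kurtosis."""
    m, s = mean(xs), std_dev(xs)
    k = mean([((x - m) / s) ** 4 for x in xs])
    return k - 3 if excess else k


dataset = [88, 85, 82, 97, 67, 77, 74, 86, 81, 95, 77, 88, 85, 76, 81]
print(frequency(dataset))
print(mean(dataset), median(dataset), modes(dataset))
print(variance(dataset), std_dev(dataset))
print(skewness(dataset), kurtosis(dataset))
```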
SOURCE CODE:
import statistics
from fractions import Fraction as fr
from statistics import median, mode, variance, stdev
from scipy.stats import skew, kurtosis

# MEAN
li = [1, 2, 3, 3, 2, 2, 2, 1]
# mean() returns the average of the list elements
print("The average of list values is : ", end="")
print(statistics.mean(li))

# MEDIAN on various kinds of data
data1 = (2, 3, 4, 5, 7, 9, 11)                        # positive integers
data2 = (2.4, 5.1, 6.7, 8.9)                          # floating point values
data3 = (fr(1, 2), fr(44, 12), fr(10, 3), fr(2, 3))   # fractions
data4 = (-5, -1, -12, -19, -3)                        # negative integers
data5 = (-1, -2, -3, -4, 4, 3, 2, 1)                  # positive and negative integers
print("Median of data-set 1 is % s" % (median(data1)))
print("Median of data-set 2 is % s" % (median(data2)))
print("Median of data-set 3 is % s" % (median(data3)))
print("Median of data-set 4 is % s" % (median(data4)))
print("Median of data-set 5 is % s" % (median(data5)))

# MODE on various kinds of data
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)          # positive integers
data2 = (2.4, 1.3, 1.3, 1.3, 2.4, 4.6)                # floating point values
data3 = (fr(1, 2), fr(1, 2), fr(10, 3), fr(2, 3))     # fractions
data4 = (-1, -2, -2, -2, -7, -7, -9)                  # negative integers
data5 = ("red", "blue", "black", "blue", "black", "black", "brown")  # strings
print("Mode of data set 1 is % s" % (mode(data1)))
print("Mode of data set 2 is % s" % (mode(data2)))
print("Mode of data set 3 is % s" % (mode(data3)))
print("Mode of data set 4 is % s" % (mode(data4)))
print("Mode of data set 5 is % s" % (mode(data5)))

# VARIANCE (sample variance: divides by n - 1)
sample1 = (1, 2, 5, 4, 8, 9, 12)                      # spread apart, but not very much
sample2 = (-2, -4, -3, -1, -5, -6)                    # negative integers
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)                # spread apart considerably
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))  # fractions
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)                 # floating point values
print("Variance of Sample1 is % s " % (variance(sample1)))
print("Variance of Sample4 is % s " % (variance(sample4)))

# STANDARD DEVIATION (sample standard deviation)
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)                 # floating point values
print("The Standard Deviation of Sample1 is % s" % (stdev(sample1)))
print("The Standard Deviation of Sample2 is % s" % (stdev(sample2)))
print("The Standard Deviation of Sample3 is % s" % (stdev(sample3)))
print("The Standard Deviation of Sample4 is % s" % (stdev(sample4)))

# SKEWNESS AND KURTOSIS (bias=True uses population moments)
dataset = [88, 85, 82, 97, 67, 77, 74, 86,
           81, 95, 77, 88, 85, 76, 81]
print(skew(dataset, axis=0, bias=True))
# kurtosis() returns excess (Fisher) kurtosis by default
print(kurtosis(dataset, axis=0, bias=True))
OUTPUT:
The average of list values is : 2

Median of data-set 1 is 5
Median of data-set 2 is 5.9
Median of data-set 3 is 2
Median of data-set 4 is -5
Median of data-set 5 is 0.0
Mode of data set 1 is 5
Mode of data set 2 is 1.3
Mode of data set 3 is 1/2
Mode of data set 4 is -2
Mode of data set 5 is black
Variance of Sample1 is 15.80952380952381
Variance of Sample4 is 1/45
The Standard Deviation of Sample1 is 3.9761191895520196
0.029331688766181797
-0.29271198374234686
RESULT:
Exp. no: 5b
Date: BIVARIATE ANALYSIS: LINEAR AND LOGISTIC REGRESSION MODELING
AIM
ALGORITHM
estimate_coef Function:
Calculate the number of observations n and the means of x and y.
Compute the cross-deviation SS_xy = sum(x*y) - n*mean(x)*mean(y).
Compute the deviation about x SS_xx = sum(x*x) - n*mean(x)*mean(x).
Calculate the slope (b_1) as SS_xy / SS_xx.
Calculate the intercept (b_0) using the formula b_0 =
mean(y) - b_1 * mean(x).
Return the tuple (b_0, b_1).
plot_regression_line Function:
Scatter plot the actual data points using Matplotlib.
Calculate the predicted response vector y_pred using the
regression coefficients.
Plot the regression line using Matplotlib.
Display the plot with labels for the x and y axes.
main Function:
Define the observations/data (x and y arrays).
Call the estimate_coef function to obtain the regression
coefficients.
Print the estimated coefficients.
Call the plot_regression_line function to visualize the
regression line.
Main Program Execution:
If the script is run as the main program (if __name__ ==
"__main__": block):
Execute the main function.
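The closed-form coefficients from estimate_coef can be cross-checked against NumPy's least-squares polynomial fit, np.polyfit, which returns the highest-degree coefficient first (slope, then intercept, for degree 1). A minimal sketch using the same x and y as the main function:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Closed-form estimates, following the estimate_coef steps
n = x.size
ss_xy = np.sum(x * y) - n * x.mean() * y.mean()
ss_xx = np.sum(x * x) - n * x.mean() ** 2
b_1 = ss_xy / ss_xx
b_0 = y.mean() - b_1 * x.mean()

# Degree-1 polynomial fit: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(b_0, b_1)
print(intercept, slope)
```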
SOURCE CODE:
import numpy as np
import matplotlib.pyplot as plt


def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)


def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()


def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)


if __name__ == "__main__":
    main()
OUTPUT:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
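The exercise also asks for logistic regression, which the program above does not cover. A minimal sketch of a one-feature logistic regression, P(y = 1 | x) = 1 / (1 + exp(-(b_0 + b_1*x))), fitted by Newton's method (also called iteratively reweighted least squares) written out in NumPy so each step is visible; in practice a library such as scikit-learn or statsmodels would normally be used. The data are hypothetical (hours studied against pass = 1 / fail = 0), not taken from the diabetes data sets, and the function names are illustrative.

```python
import numpy as np


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b_0 + b_1*x))) by Newton's method."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])  # column of ones for b_0
    b = np.zeros(2)
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))      # current predicted probabilities
        gradient = X.T @ (y - p)                # gradient of the log-likelihood
        weights = p * (1.0 - p)
        hessian = X.T @ (X * weights[:, None])  # negative Hessian (positive definite)
        b = b + np.linalg.solve(hessian, gradient)
    return b


def predict_proba(b, x):
    return 1.0 / (1.0 + np.exp(-(b[0] + b[1] * np.asarray(x, dtype=float))))


# Hypothetical data: hours studied (x) and pass = 1 / fail = 0 (y)
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

b = fit_logistic(x, y)
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
print("P(pass) at 1 h and 4.5 h:", predict_proba(b, [1.0, 4.5]))
```

The decision boundary, where the predicted probability is 0.5, lies at x = -b_0 / b_1.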
RESULT:
Exp. no: 5c
Date: MULTIPLE REGRESSION ANALYSIS
AIM
ALGORITHM
Import NumPy for numerical operations.
Import LinearRegression from scikit-learn for linear regression modeling.
Define a 2D array x containing the input features (one row per observation).
Define a 1D array y containing the output (response) values.
Create a LinearRegression model using LinearRegression().
Fit the model to the data using the fit method.
Use the score method to calculate the coefficient of determination (R-squared) of the model on the given
data.
Print the intercept and coefficients of the linear regression model.
Use the model to predict the response for the input features (x).
Create new input features (x_new).
Use the model to predict the corresponding responses (y_new) for the new input features.
SOURCE CODE:
import numpy as np
from sklearn.linear_model import LinearRegression
x = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
x, y = np.array(x), np.array(y)
model = LinearRegression().fit(x, y)
r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")
y_pred = model.predict(x)
print(f"predicted response:\n{y_pred}")
x_new = np.arange(10).reshape((-1, 2))
y_new = model.predict(x_new)
OUTPUT:
coefficient of determination: 0.8615939258756776

intercept: 5.52257927519819
coefficients: [0.44706965 0.25502548]
predicted response:
[ 5.77760476 8.012953 12.73867497 17.9744479 23.97529728 29.4660957
38.78227633 41.27265006]
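scikit-learn's LinearRegression solves an ordinary least-squares problem. As a cross-check, a minimal sketch that solves the same problem with np.linalg.lstsq on the data above and recomputes R² from its definition, R² = 1 - SS_res / SS_tot; the results should agree with the printed intercept, coefficients and coefficient of determination up to rounding.

```python
import numpy as np

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])

# Prepend a column of ones so the first coefficient is the intercept
X = np.column_stack([np.ones(len(x)), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_pred = X @ beta
ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_sq = 1 - ss_res / ss_tot

print("intercept:", beta[0])
print("coefficients:", beta[1:])
print("coefficient of determination:", r_sq)
```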
RESULT:
Exp. no: 5d
Date: COMPARE THE RESULTS OF ANALYSIS FOR THE TWO DATA SETS.
AIM
ALGORITHM
Step 1: Start
Step 2: Import pandas
Step 3: Read the first file into df1 with pd.read_csv("compardata1.csv")
Step 4: Read the second file into df2 with pd.read_csv("compardata2.csv")
Step 5: Select the rows of df1 that also appear, as complete rows, in df2 and print them
Step 6: Merge df1 and df2 on their common columns (an inner join) and print the result
Step 7: Stop
SOURCE CODE:
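import pandas as pd
df1 = pd.read_csv("compardata1.csv")
df2 = pd.read_csv("compardata2.csv")
# Rows of df1 that also occur, as complete rows, in df2
c_result = df1[df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]
print(c_result)
# Inner join on all columns the two files have in common
c_result1 = pd.merge(df1, df2)
print(c_result1)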
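The program above finds rows common to both files; it does not compare the statistics computed in Exercises 5a to 5c. A minimal sketch of a side-by-side comparison of univariate statistics, assuming two DataFrames with the same numeric columns. df1 and df2 here are small hypothetical stand-ins for the two diabetes data sets; in practice they would come from pd.read_csv as above.

```python
import pandas as pd

# Hypothetical stand-ins for the two data sets; in practice use pd.read_csv(...)
df1 = pd.DataFrame({"Glucose": [90, 100, 110, 120, 130],
                    "BMI": [20, 22, 24, 26, 28]})
df2 = pd.DataFrame({"Glucose": [95, 105, 120, 135, 145],
                    "BMI": [21, 23, 26, 29, 31]})

STATS = ["mean", "median", "std", "skew", "kurt"]


def summarize(df):
    """One row per statistic, one column per variable."""
    return df.agg(STATS)


# Side-by-side table with a two-level column index: (data set, variable)
summary = pd.concat({"dataset_1": summarize(df1), "dataset_2": summarize(df2)}, axis=1)
# Positive values mean the statistic is larger in data set 1
difference = summarize(df1) - summarize(df2)

print(summary)
print(difference)
```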
RESULT:
Exp. no: 6a
Date: NORMAL CURVES
AIM:
ALGORITHM:
Step 1: Start
Step 2: Import matplotlib.pyplot (as plt), numpy (as np) and math
Step 3: Assign x = np.arange(0, math.pi*2, 0.05)
Step 4: Assign y = np.sin(x)
Step 5: Plot the graph using plt.plot(x, y)
Step 6: Give labels to the x and y axes and a title to the plot
Step 7: Show the plot using the show() function
Step 8: Stop
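The steps above plot a sine wave, which is not a normal curve. A normal (Gaussian) curve is the density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)), where mu is the mean and sigma the standard deviation. A minimal sketch that evaluates this formula with NumPy and plots a few curves; the helper name normal_pdf and the chosen (mu, sigma) pairs are illustrative, and scipy.stats.norm.pdf computes the same density.

```python
import numpy as np


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    x = np.asarray(x, dtype=float)
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


x = np.linspace(-5, 5, 1001)

if __name__ == "__main__":
    import matplotlib.pyplot as plt

    # Shifting mu moves the peak; a larger sigma makes the curve lower and wider
    for mu, sigma in [(0, 1), (0, 2), (1, 0.5)]:
        plt.plot(x, normal_pdf(x, mu, sigma), label=f"mu={mu}, sigma={sigma}")
    plt.xlabel("x")
    plt.ylabel("density")
    plt.title("Normal curves")
    plt.legend()
    plt.show()
```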
SOURCE CODE:
from matplotlib import pyplot as plt
import numpy as np
import math
x = np.arange(0, math.pi*2, 0.05)
y = np.sin(x)
plt.plot(x,y)
plt.xlabel("angle")
plt.ylabel("sine")
plt.title('sine wave')
plt.show()
OUTPUT:
RESULT:
Exp. no: 6b
Date: DENSITY AND CONTOUR PLOTS
AIM:
ALGORITHM:
• Create the 1D arrays feature_x and feature_y with np.arange.
• Use np.meshgrid to build the 2D grids X and Y from them.
• Compute Z = np.cos(X / 2) + np.sin(Y / 4) at every grid point.
• Use ax.contour to draw contour lines of Z over the (X, Y) grid.
• Set the title of the plot using ax.set_title.
• Set labels for the x and y axes using ax.set_xlabel and ax.set_ylabel.
• Use plt.show to display the contour plot.
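To see what np.meshgrid produces for the contour plot, a minimal sketch with the same feature_x and feature_y; it prints the grid shapes instead of plotting.

```python
import numpy as np

feature_x = np.arange(0, 50, 2)   # 0, 2, ..., 48 -> 25 values
feature_y = np.arange(0, 50, 3)   # 0, 3, ..., 48 -> 17 values
X, Y = np.meshgrid(feature_x, feature_y)

# Default "xy" indexing: one row per y value, one column per x value
print(X.shape, Y.shape)           # (17, 25) (17, 25)
print(X[0, :5])                   # each row of X repeats feature_x
print(Y[:5, 0])                   # each column of Y repeats feature_y

Z = np.cos(X / 2) + np.sin(Y / 4) # evaluated at every grid point at once
print(Z.shape)                    # (17, 25)
```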
SOURCE CODE:

import numpy as np
import matplotlib.pyplot as plt
feature_x = np.arange(0, 50, 2)
feature_y = np.arange(0, 50, 3)
# Creating 2-D grid of features
[X, Y] = np.meshgrid(feature_x, feature_y)
fig, ax = plt.subplots(1, 1)
Z = np.cos(X / 2) + np.sin(Y / 4)
# plots contour lines
ax.contour(X, Y, Z)
ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')
plt.show()
OUTPUT:
ALGORITHM:
• Import the required libraries: pandas for data manipulation, seaborn for data visualization, and
numpy for numerical operations.
• Import matplotlib.pyplot for additional customization of the plot.
• Generate or load the data that you want to visualize.
• If your data is not already in a pandas DataFrame, create one.
• Use seaborn.set, seaborn.set_style and seaborn.set_palette to choose the font scale, background style and colour palette; call them before plotting, because they only affect plots drawn afterwards.
• Use seaborn.kdeplot to create a kernel density plot of the column in the DataFrame.
• Customize the appearance with optional parameters such as fill and color.
• Use plt.xlabel, plt.ylabel, and plt.title to add labels and a title to the plot.
• Use plt.show to display the density plot.
SOURCE CODE:
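import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Set the style first; these settings apply only to plots drawn afterwards
# (sns.set resets the style, so it must come before set_style and set_palette)
sns.set(font_scale=1.2)
sns.set_style("darkgrid")
sns.set_palette("pastel")
# Generate sample data
np.random.seed(123)
data = np.random.randint(35, 101, 100)
# Create data frame
df = pd.DataFrame({'Marks': data})
# Create density plot (fill=True replaces the deprecated shade=True)
sns.kdeplot(df['Marks'], fill=True, color='blue')
# Add axis labels and plot title
plt.xlabel("Marks Scored")
plt.ylabel("Density")
plt.title("Density Plot of Student Marks")
plt.show()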
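seaborn.kdeplot draws a kernel density estimate: a smooth curve built by placing a small Gaussian "bump" on every observation and averaging them. A minimal NumPy sketch of that idea, assuming a Gaussian kernel and Scott's rule of thumb for the bandwidth (h = sample standard deviation * n^(-1/5)); seaborn's own bandwidth handling (its bw_method and bw_adjust parameters) may give a somewhat different curve. The function name gaussian_kde_1d is illustrative, and the marks are randomly generated as in the program above.

```python
import numpy as np


def gaussian_kde_1d(data, grid, bandwidth=None):
    """Average of Gaussian bumps centred on each data point, evaluated on grid."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Scott's rule of thumb: h = sample std * n^(-1/5)
        bandwidth = data.std(ddof=1) * data.size ** (-1 / 5)
    z = (grid[:, None] - data[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return bumps.mean(axis=1) / bandwidth


rng = np.random.default_rng(123)
marks = rng.integers(35, 101, 100)      # random marks between 35 and 100
grid = np.linspace(0, 140, 1401)
density = gaussian_kde_1d(marks, grid)
print(grid[np.argmax(density)])         # location of the highest peak
```

Because every bump integrates to 1 and the bumps are averaged, the resulting curve also integrates to 1, which is why the y-axis of a density plot is labelled "Density".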
RESULT:
Exp. no: 6c
Date: CORRELATION AND SCATTER PLOTS
AIM:
ALGORITHM:
Step 1:Start
Step 2:Import pandas(pd),matplotlib.pyplot(plt) and seaborn(sns) packages
Step 3:Read the dataset and store it in variable df and increment the value of index by 1
Step 4:Print the head value of the dataset using df.head() function
Step 5:Print the correlations of the dataset using df.corr(method='pearson') function
Step 6: Plot Pregnancies against Glucose with sns.scatterplot(x='Pregnancies', y='Glucose', data=df)
Step 7:show the plots using show() function
Step 8:Stop
SOURCE CODE:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Importing Dataset
df = pd.read_csv(r"D:\Downloads\cse\diabetes.csv")
df.index += 1
print(df.head())
# Correlation
correlations = df.corr(method='pearson')
print("Correlations of attributes in the data:\n", correlations)
# SCATTER PLOT
sns.scatterplot(x='Pregnancies', y='Glucose', data=df)
plt.show()
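df.corr(method='pearson') computes, for every pair of columns, Pearson's r: the covariance of the two columns divided by the product of their standard deviations, ranging from -1 (perfect negative linear relation) to +1 (perfect positive linear relation). A minimal sketch of that formula for two columns, checked against np.corrcoef; the numbers are hypothetical, not taken from diabetes.csv, and the helper name pearson_r is illustrative.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# Hypothetical values, not taken from diabetes.csv
age = [21, 25, 33, 41, 50, 58]
glucose = [90, 95, 110, 120, 135, 150]

print(pearson_r(age, glucose))
print(np.corrcoef(age, glucose)[0, 1])   # NumPy's built-in, for comparison
```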
OUTPUT:
RESULT:
Exp. no: 6d
Date: HISTOGRAMS
AIM:
ALGORITHM:
Step 1: Start
Step 2: import pandas and matplotlib
Step 3: read the dataset and increment the index value of it by 1.
Step 4: Plot a histogram of each column of the dataset using the hist() function and display it using the
show() function.
Step 5: Stop.
SOURCE CODE:
import pandas as pd
import matplotlib.pyplot as plt
# Importing Dataset
df = pd.read_csv(r"D:\Downloads\cse\diabetes.csv")
df.index += 1
# HISTOGRAM: one subplot per numeric column
df.hist()
plt.show()
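df.hist() draws one histogram per numeric column, using 10 equal-width bins by default. A minimal sketch of the counting behind each panel, using np.histogram on a hypothetical age column and printing a text histogram instead of drawing one:

```python
import numpy as np

# Hypothetical ages, standing in for one column of diabetes.csv
ages = np.array([21, 22, 22, 25, 29, 31, 33, 35, 38, 41, 45, 50, 52, 58, 63, 67, 70, 81])

# Ten equal-width bins spanning the minimum to the maximum value
counts, edges = np.histogram(ages, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f}: {'#' * int(count)}")
```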
OUTPUT:
RESULT:
Exp. no: 6e
Date: THREE-DIMENSIONAL PLOTTING
AIM:
ALGORITHM:
Step 1: Start
Step 2: import pandas ,matplotlib and mplot3d from mpl_toolkits
Step 3: read and store the dataset in the variable df
Step 4: increment the index of variable df by 1
Step 5: print df.head()
Step 6: assign variable x=df.Age , y=df.Pregnancies and z =df.DiabetesPedigreeFunction
Step 7: Create the figure and 3D axes with figure() and axes(projection="3d"), choose a colormap with get_cmap(), and draw the
3D scatter plot with the scatter3D() function
Step 8: Label the x, y and z axes with set_xlabel(), set_ylabel() and set_zlabel()
Step 9: give the title of the plot using title() function
Step 10: show the plot using show() function
Step 11: Stop
SOURCE CODE:
import pandas as pd
import matplotlib.pyplot as plt
# Registers the "3d" projection (only required on older Matplotlib versions)
from mpl_toolkits import mplot3d
# Importing Dataset
df = pd.read_csv(r"D:\Downloads\cse\diabetes.csv")
df.index += 1
print(df.head())
x = df.Age
y = df.Pregnancies
z = df.DiabetesPedigreeFunction
# THREE-DIMENSIONAL PLOTTING
fig = plt.figure(figsize=(10, 7))
ax = plt.axes(projection="3d")
my_cmap = plt.get_cmap('hsv')
# Colour each point by x + y + z using the hsv colormap
sctt = ax.scatter3D(x, y, z, alpha=0.8, c=(x + y + z), cmap=my_cmap, marker='*')
ax.set_xlabel('X-age')
ax.set_ylabel('Y-Pregnancies')
ax.set_zlabel('Z-DiabetesPedigreeFunction')
plt.title("3D scatter plot")
plt.show()
OUTPUT:
RESULT:
Exp. no: 7
Date: VISUALIZING GEOGRAPHIC DATA WITH BASEMAP
AIM:
ALGORITHM:
Step 1: Start
Step 2: Import Basemap from mpl_toolkits.basemap and import matplotlib.pyplot
Step 3: Create a Basemap object m (Miller projection) and draw the coastlines, country borders, parallels and meridians
Step 4: Define sample longitude and latitude coordinates
Step 5: Use the m object to convert the longitude and latitude coordinates to map coordinates and plot them with m.scatter()
Step 6: Set the title of the map using plt.title() and add a legend
Step 7: Display the map using plt.show()
Step 8: Stop.
SOURCE CODE:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
# Create a map using Basemap (Miller projection, latitudes -60 to 90)
m = Basemap(projection='mill', llcrnrlat=-60, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
# Draw coastlines and countries
m.drawcoastlines()
m.drawcountries()
# Draw parallels and meridians
m.drawparallels(range(-90, 91, 30), labels=[1, 0, 0, 0])
m.drawmeridians(range(-180, 181, 60), labels=[0, 0, 0, 1])
# Sample data points (longitude, latitude)
lon = [0, 45, -60, 150, -30]
lat = [30, -20, 60, -10, 45]
# Convert longitude and latitude to map coordinates
x, y = m(lon, lat)
# Plot data points on the map
m.scatter(x, y, marker='o', color='red', label='Data Points')
# Add a title and legend
plt.title('Geographic Data Visualization with Basemap')
plt.legend()
# Show the map
plt.show()
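The call m(lon, lat) converts degrees into projected map coordinates in metres. For the Miller cylindrical projection ('mill') the textbook forward formulas are x = R * lambda and y = 1.25 * R * ln(tan(pi/4 + 0.4 * phi)), with longitude lambda and latitude phi in radians. The sketch below evaluates those formulas with NumPy to show what the conversion does in principle; it is not Basemap's implementation, and Basemap's own numbers will differ (for example in the Earth radius and the coordinate origin it uses). The function name miller_xy and the radius value are assumptions.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def miller_xy(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Forward Miller cylindrical projection: degrees -> metres (origin at lon=0, lat=0)."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    x = radius * lon
    y = 1.25 * radius * np.log(np.tan(np.pi / 4 + 0.4 * lat))
    return x, y


# Same sample points as the program above
lon = [0, 45, -60, 150, -30]
lat = [30, -20, 60, -10, 45]
x, y = miller_xy(lon, lat)
for lo, la, xi, yi in zip(lon, lat, x, y):
    print(f"({lo:4d}, {la:4d}) -> x = {xi / 1000:8.0f} km, y = {yi / 1000:8.0f} km")
```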
OUTPUT:
RESULT:
ADDITIONAL PROGRAMS
Write a NumPy program to create a null vector of size 10 and update sixth value to 11
PROGRAM
# Importing the NumPy library with an alias 'np'

import numpy as np
# Creating a NumPy array 'x' filled with zeros of length 10
x = np.zeros(10)
# Printing the initial array 'x' filled with zeros
print(x)
# Modifying the sixth value (index 5, because indexing starts at 0) of the array to 11
print("Update sixth value to 11")
x[5] = 11
# Printing the updated array 'x' after modifying the sixth value
print(x)
OUTPUT
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Update sixth value to 11
[ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]
Write a NumPy program to convert an array to a float type

PROGRAM

import numpy as np
# Defining a Python list 'a' containing integers
a = [1, 2, 3, 4]
# Printing the original array 'a'
print("Original array")
print(a)
# Converting the list 'a' to a NumPy array of type float (np.asfarray was removed in NumPy 2.0)
x = np.asarray(a, dtype=float)
# Printing the array 'x' after converting to a float type
print("Array converted to a float type:")
print(x)
OUTPUT
Original array
[1, 2, 3, 4]
Array converted to a float type:
[1. 2. 3. 4.]
Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10
PROGRAM
import numpy as np
# Creating a NumPy array 'x' using arange() from 2 up to (but not including) 11 and reshaping it into a 3x3 matrix
x = np.arange(2, 11).reshape(3, 3)
# Printing the resulting 3x3 matrix 'x'

print(x)
OUTPUT
[[ 2 3 4]
[ 5 6 7]
[ 8 9 10]]
Write a NumPy program to convert a list of numeric value into a one-dimensional

NumPy array
PROGRAM
import numpy as np
# Creating a Python list 'l' containing floating-point numbers

l = [12.23, 13.32, 100, 36.32]
# Printing the original Python list

print("Original List:", l)
# Creating a NumPy array 'a' from the Python list 'l'

a = np.array(l)
# Printing the one-dimensional NumPy array 'a'

print("One-dimensional NumPy array: ", a)
OUTPUT
Original List: [12.23, 13.32, 100, 36.32]

One-dimensional NumPy array: [ 12.23 13.32 100. 36.32]
Write a NumPy program to create an empty and a full array
PROGRAM

import numpy as np
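# Creating an uninitialized array of shape (3, 4) using np.empty(); its contents are whatever
# values happen to be in memory, so the printed numbers differ from run to run
x = np.empty((3, 4))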
# Printing the empty array 'x'
print(x)
# Creating a filled array of shape (3, 3) with all elements as 6 using np.full()
y = np.full((3, 3), 6)
# Printing the filled array 'y'
print(y)
OUTPUT
[[4.65855215e-310 0.00000000e+000 2.10077583e-312 6.79038654e-313]

[2.22809558e-312 2.14321575e-312 2.35541533e-312 6.79038654e-313]
[2.22809558e-312 2.14321575e-312 2.46151512e-312 2.41907520e-312]]
[[6 6 6]
[6 6 6]
[6 6 6]]
Write a NumPy program to convert a list and tuple into arrays
PROGRAM

import numpy as np
# Creating a Python list

my_list = [1, 2, 3, 4, 5, 6, 7, 8]
# Printing a message indicating the conversion of the list to an array using np.asarray() function
print("List to array: ")
# Converting the Python list to a NumPy array using np.asarray() and printing the resulting array
print(np.asarray(my_list))
# Creating a Python tuple containing two lists

my_tuple = ([8, 4, 6], [1, 2, 3])
# Printing a message indicating the conversion of the tuple to an array using np.asarray() function
print("Tuple to array: ")
# Converting the Python tuple to a NumPy array using np.asarray() and printing the resulting array
print(np.asarray(my_tuple))
OUTPUT
List to array:
[1 2 3 4 5 6 7 8]
Tuple to array:
[[8 4 6]
[1 2 3]]
Write a NumPy program to find the real and imaginary parts of an array of complex
numbers
PROGRAM

import numpy as np
# Calculating square root of a complex number

x = np.sqrt([1 + 0j])
# Calculating square root of another complex number

y = np.sqrt([0 + 1j])
# Printing the original array 'x' and 'y'

print("Original array:x ", x)
print("Original array:y ", y)
# Printing the real part of the array 'x' and 'y'

print("Real part of the array:")
print(x.real)
print(y.real)
# Printing the imaginary part of the array 'x' and 'y'

print("Imaginary part of the array:")
print(x.imag)
print(y.imag)
OUTPUT
Original array:x [1.+0.j]

Original array:y [0.70710678+0.70710678j]
Real part of the array:
[1.]
[0.70710678]
Imaginary part of the array:
[0.]
[0.70710678]
Write a NumPy program to merge three given NumPy arrays of same shape
PROGRAM
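import numpy as np
# Three random arrays of the same shape (2, 2, 1)
arr1 = np.random.random(size=(2, 2, 1))
arr2 = np.random.random(size=(2, 2, 1))
arr3 = np.random.random(size=(2, 2, 1))
print("Original arrays:")
print(arr1)
print(arr2)
print(arr3)
# Join along the last axis, giving an array of shape (2, 2, 3)
result = np.concatenate((arr1, arr2, arr3), axis=-1)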
print("\nAfter concatenate:")
print(result)
OUTPUT
Original arrays:
[[[0.23424822]
[0.51175253]]
[[0.57232915]
[0.22516223]]]
[[[0.01776688]
[0.40250687]]
[[0.10133723]
[0.67184758]]]
[[[0.22401405]
[0.28253877]]
[[0.23720417]
[0.09512562]]]
After concatenate:
[[[0.23424822 0.01776688 0.22401405]
[0.51175253 0.40250687 0.28253877]]
[[0.57232915 0.10133723 0.23720417]

[0.22516223 0.67184758 0.09512562]]]
Write a NumPy program to add a border (filled with 0's) around an existing array
PROGRAM
import numpy as np
# Creating a 3x3 NumPy array filled with ones

x = np.ones((3, 3))
# Printing the original array 'x'

print("Original array:")
print(x)
# Modifying the array 'x' to set 0s on the border and 1s inside the array using the np.pad function
print("0 on the border and 1 inside in the array")
x = np.pad(x, pad_width=1, mode='constant', constant_values=0)
# Printing the modified array 'x' with 0s on the border and 1s inside
print(x)
OUTPUT
Original array:
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
0 on the border and 1 inside in the array
[[0. 0. 0. 0. 0.]
[0. 1. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[0. 0. 0. 0. 0.]]
Write a NumPy program to append values to the end of an array
PROGRAM

import numpy as np
# Creating a Python list

x = [10, 20, 30]
# Printing a message indicating the original array

print("Original array:")
# Printing the original array

print(x)
# Appending values to the end of the array using np.append(); without an axis argument both inputs are flattened, so the result is 1-D
x = np.append(x, [[40, 50, 60], [70, 80, 90]])
# Printing a message indicating the array after appending values

print("After append values to the end of the array:")
# Printing the array after appending values

print(x)
OUTPUT
Original array:
[10, 20, 30]
After append values to the end of the array:
[10 20 30 40 50 60 70 80 90]
Write a NumPy program to search the index of a given array in another given array.
Original NumPy array:

[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Searched array:
[4 5 6]
Index of the searched array in the original array: [1]
PROGRAM
import numpy as np
np_array = np.array([[1,2,3], [4,5,6] , [7,8,9], [10, 11, 12]])
test_array = np.array([4,5,6])
print("Original Numpy array:")
print(np_array)
print("Searched array:")
print(test_array)
print("Index of the searched array in the original array:")
print(np.where((np_array == test_array).all(1))[0])
OUTPUT
Original Numpy array:

[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Searched array:
[4 5 6]
Index of the searched array in the original array:
[1]
