Dec 23 py

Indian Maritime University
(A Central University, Govt of India)
End Semester Examinations – December 2023
Programme Name: B Tech (ME)
Semester: III
Subject Code: UG11T4305
Subject Name: Statistics and Data Analysis Using Python and R
Date:
Max Marks: 70
Duration: 03 Hrs
Pass Marks: 35
Answer Key
Section A
MCQs: All questions are compulsory. (10×01 mark = 10 Marks)
1. d. a.title()
2. b. 2
3. a. Categorical data
4. c. 3
5. d. DataFrame
6. c. Using the ‘def’ keyword
7. b. type()
8. a. A multi-dimensional array object
9. a. 15
10. c. To visualize the spread and central tendency of numerical data
Section B
Answer all the Questions. (05×02 marks=10 Marks)
11. Explain tuple unpacking in Python with an example. 2 Marks

Answer:
Tuple unpacking is a Python feature that assigns the elements of a tuple to separate variables in a single statement. The variables are written on the left-hand side of the assignment operator (=) and the tuple to be unpacked on the right-hand side, e.g. a, b, c = (1, 2, 3). The number of variables on the left-hand side must match the number of elements in the tuple; otherwise Python raises a ValueError. The only exception is a starred variable (e.g. *rest), which collects any remaining elements into a list.
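A short illustration of tuple unpacking, using made-up values. It shows the basic one-to-one assignment, the common variable-swap idiom, and the starred form that relaxes the "counts must match" rule.

```python
# A tuple holding three values
point = (3, 4, 5)

# Tuple unpacking: each variable on the left receives one element, in order
x, y, z = point
print(x, y, z)  # 3 4 5

# Swapping two variables: the right-hand side builds a tuple first
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# A starred name collects the leftover elements into a list
first, *rest = (10, 20, 30, 40)
print(first, rest)  # 10 [20, 30, 40]
```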
12. What is a NumPy array, and how does it differ from a Python list? 2 Marks
Answer: Definition-1 Mark , Comparison with list- 1 Mark (1 or 2 points)
• The homogeneous multidimensional array (ndarray) is the main object of NumPy.
• It is a table of elements, all of the same type, indexed by a tuple of non-negative integers.
• The dimensions are called axes in NumPy.
• NumPy's array class is ndarray; arrays are usually created with the np.array() function.
• Types of NumPy arrays:
1. One-dimensional array: a linear array, i.e. a single row of elements.
2. Multidimensional array: data is stored in tabular form (rows and columns, or more axes).
NumPy arrays provide several advantages over Python lists:
• Homogeneous data type: all elements of a NumPy array have the same data type, whereas a Python list can contain elements of different types. This homogeneity allows more compact memory storage and faster mathematical operations.
• Efficient computation: NumPy arrays are designed for numerical and scientific computing. Their core operations are implemented in C and optimized for performance, so operations on NumPy arrays are typically much faster than equivalent Python loops over lists.
• Multidimensional support: NumPy arrays can have multiple dimensions, such as 1D arrays (vectors), 2D arrays (matrices), and higher-dimensional arrays. This is essential for working with complex data such as images, time series, and numerical simulations.
• Broadcasting: NumPy supports broadcasting, which allows element-wise operations between arrays of different but compatible shapes (for example, adding a 1D row to every row of a 2D array). This simplifies many mathematical operations and removes the need for explicit loops.
• Array-oriented functions: NumPy provides a wide range of mathematical functions that apply directly to entire arrays without explicit loops, which leads to concise and efficient code.
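A minimal sketch contrasting a Python list with a NumPy array, assuming NumPy is installed. The values are arbitrary; the point is that the same operator behaves differently, that arrays coerce to one dtype, and that broadcasting replaces an explicit loop.

```python
import numpy as np

nums_list = [1, 2, 3, 4]
nums_array = np.array([1, 2, 3, 4])

# '*' on a list repeats it; on an ndarray it multiplies element-wise
print(nums_list * 2)    # [1, 2, 3, 4, 1, 2, 3, 4]
print(nums_array * 2)   # [2 4 6 8]

# A list may mix types; np.array converts everything to one common dtype
mixed_list = [1, "two", 3.0]
print(mixed_list)
mixed_array = np.array([1, 2.5, 3])
print(mixed_array.dtype)  # float64

# Broadcasting: a (2, 3) array plus a length-3 row, with no explicit loop
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
print(matrix + row)
```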
13. Explain range() function with example. 2 Marks
Answer: Definition-1 Mark, Example- 1 Mark
The Python range() function returns a sequence of integers within a given range. It is most commonly used to repeat a for loop a fixed number of times, or to loop over a sequence of numbers (for example, list indices).
Syntax: range(start, stop, step)
Parameters:
● start: [optional] first value of the sequence (default 0)
● stop: the value at which the sequence ends; stop itself is excluded
● step: [optional] integer difference between consecutive numbers in the sequence (default 1; it cannot be 0)
Return: a range object that represents the sequence of numbers
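A few example calls covering the one-, two- and three-argument forms (the numbers are arbitrary). list() is used only to make the generated values visible.

```python
# range(stop): starts at 0, step 1, stop is excluded
print(list(range(5)))          # [0, 1, 2, 3, 4]

# range(start, stop)
print(list(range(2, 6)))       # [2, 3, 4, 5]

# range(start, stop, step); a negative step counts down
print(list(range(1, 10, 3)))   # [1, 4, 7]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]

# Typical use: driving a for loop a fixed number of times
for i in range(3):
    print("Iteration", i)
```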
14. Define "kurtosis" in statistics. 2 Marks
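Answer:
Kurtosis is a statistical measure that quantifies the "tailedness" of a probability distribution, i.e. how heavy its tails are (how often extreme values occur) compared with a normal distribution. There are three common types of kurtosis:
● Leptokurtic: heavier tails than a normal distribution (excess kurtosis > 0).
● Mesokurtic: tails similar to a normal distribution (kurtosis = 3, excess kurtosis = 0).
● Platykurtic: lighter tails than a normal distribution (excess kurtosis < 0).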
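To make the definition concrete, here is a small sketch that computes the moment-based (population) excess kurtosis, m4 / m2² − 3, so that a normal distribution scores 0. The two datasets and the helper names are hypothetical; library estimators with bias correction (e.g. sample kurtosis) give somewhat different values.

```python
def excess_kurtosis(data):
    """Population (moment-based) excess kurtosis: m4 / m2**2 - 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


def classify(k):
    # Compare with 0, the excess kurtosis of a normal distribution
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                           # evenly spread, light tails
heavy_tailed = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]  # a few extreme values

for name, data in [("flat", flat), ("heavy_tailed", heavy_tailed)]:
    k = excess_kurtosis(data)
    print(name, round(k, 2), classify(k))
```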
15. Write an R program to print the result of subtracting two vectors x = (1, 2, 3, 4, 5) and y = (10, 20, 30, 40, 50). 2 Marks
Answer:
x = c(1, 2, 3, 4, 5) (1/2 mark)
y = c(10, 20, 30, 40, 50) (1/2 mark)
z = x - y (1/2 mark)
print(z) (1/2 mark)
Output: [1]  -9 -18 -27 -36 -45
Section C
Answer any 5 of the following 7 questions. (05×10 marks=50 Marks)
16.
a) Compare and contrast the list and set data structures in Python with examples. 5 Marks
Answer: Any 5 points, 1 mark each
List | Set
Lists are ordered. | Sets are unordered.
Lists are mutable. | Sets are mutable, but can store only immutable (hashable) elements.
A list is an indexed sequence. | A set is not indexed.
A list allows duplicate elements. | A set does not allow duplicate elements.
Elements can be accessed by their position. | Positional access to elements is not allowed.
None can be stored multiple times. | None can be stored only once.
A list is written with [ ]. | A non-empty set is written with { }.
Example: [6, 7, 8, 9, 10] | Example: {6, 7, 8, 9, 10}
An empty list is created with: l = [] | An empty set is created with: a = set() (because {} creates an empty dictionary)
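A short sketch demonstrating the points in the comparison, using arbitrary values. The commented-out lines show operations that a set rejects.

```python
nums_list = [6, 7, 8, 8, 9, None, None]
nums_set = {6, 7, 8, 8, 9, None, None}

print(nums_list)      # order and duplicates are kept
print(len(nums_set))  # 5: duplicates (including None) are stored once
print(nums_list[0])   # 6: lists support positional indexing

nums_list[0] = 60     # lists are mutable
nums_set.add(10)      # sets are mutable too
# nums_set[0] would raise TypeError: 'set' object is not subscriptable
# nums_set.add([1, 2]) would raise TypeError: unhashable type: 'list'

print(type({}))       # <class 'dict'>: {} is an empty dict, not a set
print(type(set()))    # <class 'set'>
```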
b) Write Python code to open a file, read its contents, write any sentence, and
print them to the console. 5 Marks
Answer: The answer should use the open() function in both read and write modes.
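One possible answer, sketched under the assumption that any writable location is acceptable; the file name sample.txt is hypothetical and is placed in the system temporary directory. The with statement closes the file automatically after each step.

```python
import os
import tempfile

# Hypothetical file location; any writable path works
path = os.path.join(tempfile.gettempdir(), "sample.txt")

# 'w' (write) mode creates the file, or empties it if it already exists
with open(path, "w", encoding="utf-8") as f:
    f.write("Python makes file handling simple.\n")

# 'a' (append) mode adds text to the end without erasing existing contents
with open(path, "a", encoding="utf-8") as f:
    f.write("This sentence was appended.\n")

# 'r' (read) mode; read() returns the whole contents as one string
with open(path, "r", encoding="utf-8") as f:
    contents = f.read()

print(contents)
```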
17.
a) List some common types of plots that Matplotlib can create, and explain any two of them briefly. 5 Marks
Answer: Listing the plots: 1 mark; explanation of any 2 plots: 2 marks each
● Common types of plots that Matplotlib can create:
1. Line plot
2. Scatter plot
3. Bar plot
4. Histogram
5. Pie plot
6. Area plot
1. Line Plot:
A line plot joins data points, each given by an x-value and a y-value, with straight line segments. Line plots are the simplest way of representing data. In Matplotlib, the plot() function draws them.
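Example:
import matplotlib.pyplot as pyplot
pyplot.plot([1, 2, 3, 5, 6], [1, 2, 3, 4, 6])
# Axis limits are [xmin, xmax, ymin, ymax]; the y-range must cover the data (1 to 6)
pyplot.axis([0, 7, 0, 10])
# Display the chart
pyplot.show()
Program Output: [figure: line plot]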
2. Bar Plot:
Bar plots use vertical or horizontal rectangular bars to compare values across categories, or across a period represented on the other axis (usually the x-axis). A bar can also be split into segments to show how several values make up a total. The longer a bar, the greater the value it represents. In Matplotlib, the bar() function draws vertical bars and barh() draws horizontal bars.
Example:
import matplotlib.pyplot as pyplot
pyplot.bar([0.25, 2.25, 3.25, 5.25, 7.25], [300, 400, 200, 600, 700],
           label="Carpenter", color='b', width=0.5)
pyplot.bar([0.75, 1.75, 2.75, 3.75, 4.75], [50, 30, 20, 50, 60],
           label="Plumber", color='g', width=0.5)
pyplot.legend()
pyplot.xlabel('Days')
pyplot.ylabel('Wage')
pyplot.title('Details')
# Display the chart
pyplot.show()
Program Output: [figure: bar chart]
3. Scatter Plot:
Scatter plots (also called XY plots) are used to compare variables and to see whether, and how, a dependent variable is related to an independent variable. Each observation is drawn as a point at its (x, y) position, so clusters and trends become visible. In Matplotlib, the scatter() function draws them.
Example:
import matplotlib.pyplot as pyplot
x1 = [1, 2.5, 3, 4.5, 5, 6.5, 7]
y1 = [1, 2, 3, 2, 1, 3, 4]
x2 = [8, 8.5, 9, 9.5, 10, 10.5, 11]
y2 = [3, 3.5, 3.7, 4, 4.5, 5, 5.2]
pyplot.scatter(x1, y1, label='high bp low heartrate', color='c')
pyplot.scatter(x2, y2, label='low bp high heartrate', color='g')
pyplot.title('Smart Band Data Report')
pyplot.xlabel('x')
pyplot.ylabel('y')
pyplot.legend()
# Display the chart
pyplot.show()
Program Output: [figure: scatter plot]
4. Pie Plot:
A pie plot is a circular graph divided into segments (slices), each representing one item or category. Data analysts use it to show percentages or proportions, since the size of each slice is proportional to its share of the total. In Matplotlib, the pie() function draws it.
Example:
import matplotlib.pyplot as pyplot
sizes = [12, 25, 50, 36, 19]  # avoid the name 'slice', which shadows a built-in
activities = ['NLP', 'Neural Network', 'Data analytics', 'Quantum Computing',
              'Machine Learning']
cols = ['r', 'b', 'c', 'g', 'orange']
pyplot.pie(sizes,
           labels=activities,
           colors=cols,
           startangle=90,
           shadow=True,
           explode=(0, 0.1, 0, 0, 0),  # pull the second slice out
           autopct='%1.1f%%')          # show each share as a percentage
pyplot.title('Training Subjects')
pyplot.show()
Program Output: [figure: pie chart]
b) Explain the dictionary data type in Python. How do you create, access, and modify dictionary elements? 5 Marks
Answer: Dictionary explanation: 2 marks; create, access, and modify dictionary elements: 1 mark each.
Dictionary:
• A Python dictionary is a collection of elements stored as key-value pairs. Since Python 3.7, dictionaries preserve insertion order.
• The data is stored as key: value pairs.
• This data structure is mutable.
• Keys must be unique and immutable (hashable), e.g. numbers, strings, or tuples.
• Values can be of any type, including integer, list, and tuple.
• Eg.
d = {1: 500, 'name': 'Akash', 'roll': 120, 'city': 'Pune', 0: 100, 2: 'Hello'}
Create, access, and modify dictionary elements:
● Create: use curly braces {} containing key-value pairs, with each key separated from its value by a colon (:)
my_dict = {"name": "John", "age": 30, "city": "New York"}
● Access: whereas lists and tuples are indexed by position, a dictionary is indexed by key. Put the key inside square brackets [], e.g. my_dict["name"]. The get() method returns None (or a given default) instead of raising a KeyError when the key is missing.
● Modify: dictionaries are mutable, so we can add new elements or change the value of existing elements with the assignment operator, e.g. my_dict["age"] = 31.
● If the key is already present, its value is updated; if it is not present, a new key: value pair is added to the dictionary.
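A compact sketch of the create, access and modify steps described above; the student record and its values are hypothetical sample data.

```python
# Create
student = {"name": "John", "age": 30, "city": "New York"}

# Access with a key in square brackets, or with get() to avoid a KeyError
print(student["name"])              # John
print(student.get("grade", "N/A"))  # N/A: key absent, so the default is returned

# Modify: assigning to an existing key updates its value
student["age"] = 31

# Assigning to a new key adds a new key-value pair
student["grade"] = "A"

# Remove a pair with del (pop() also works and returns the value)
del student["city"]

print(student)  # {'name': 'John', 'age': 31, 'grade': 'A'}
```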
18.
a) Develop a Python program that implements a function to calculate the factorial of a given number. 5 Marks
Answer:
Any correct logic may be used to calculate the factorial of a number, but it must be implemented inside a function.
Program:
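One acceptable program, sketched here with an iterative function and a recursive alternative; the function names and the fixed input value are illustrative choices, not prescribed by the key.

```python
def factorial(n):
    """Return n! for a non-negative integer n, computed iteratively."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("factorial is defined only for non-negative integers")
    result = 1
    for i in range(2, n + 1):  # multiply 2 * 3 * ... * n; 0! and 1! stay 1
        result *= i
    return result


def factorial_recursive(n):
    """Recursive alternative: 0! = 1 and n! = n * (n - 1)!."""
    return 1 if n == 0 else n * factorial_recursive(n - 1)


num = 5  # an exam answer might instead read int(input("Enter a number: "))
print(f"Factorial of {num} is {factorial(num)}")  # Factorial of 5 is 120
```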
b) The joint probability distribution of two random variables X and Y is given by:
y \ x | -2 | 4
1 | 0.1 | 0.1
-3 | 0.2 | 0.4
5 | 0.1 | 0.1
(i) Evaluate the marginal distribution of Y.
(ii) Examine whether X and Y are independent.
(iii) Find P(Y = 5 | X = 4). (1 + 2 + 2 marks)
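Answer:
(i) Marginal distribution of Y (sum across each row of the table):
y | 1 | -3 | 5
h(y) | 0.2 | 0.6 | 0.2
(1 Mark)
(ii) The marginal distribution of X (sum down each column) is g(-2) = 0.1 + 0.2 + 0.1 = 0.4 and g(4) = 0.1 + 0.4 + 0.1 = 0.6. Since P(X = -2, Y = 1) = 0.1 ≠ P(X = -2) · P(Y = 1) = 0.4 × 0.2 = 0.08, X and Y are not independent. (2 Marks)
(iii) P(Y = 5 | X = 4) = f(4, 5) / g(4) = 0.1 / 0.6 = 1/6 (2 Marks)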
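The three parts can be checked with a few lines of exact arithmetic. This sketch stores the table as a dictionary keyed by (x, y) and uses fractions.Fraction so that the independence test compares exact values rather than rounded floats.

```python
from fractions import Fraction as F

# Joint pmf f(x, y) from the table, stored as exact fractions
joint = {
    (-2, 1): F(1, 10), (4, 1): F(1, 10),
    (-2, -3): F(2, 10), (4, -3): F(4, 10),
    (-2, 5): F(1, 10), (4, 5): F(1, 10),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals: g(x) sums over y, h(y) sums over x
g = {x: sum(joint[(x, y)] for y in ys) for x in xs}
h = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Independent only if f(x, y) == g(x) * h(y) for every cell of the table
independent = all(joint[(x, y)] == g[x] * h[y] for x in xs for y in ys)

# Conditional probability P(Y = 5 | X = 4) = f(4, 5) / g(4)
p_y5_given_x4 = joint[(4, 5)] / g[4]

print("g(x):", {k: str(v) for k, v in g.items()})
print("h(y):", {k: str(v) for k, v in h.items()})
print("independent:", independent)
print("P(Y=5 | X=4) =", p_y5_given_x4)
```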
19.
a) Explain different NumPy attributes. 5 Marks
Answer: 1 mark each
Individual examples can be given, or one program demonstrating all the attributes.
1. shape:
The shape attribute returns a tuple giving the size of the NumPy array along each dimension. For a 1D array it is a one-element tuple holding the number of elements, e.g. (3,); for a 2D array it holds the number of rows and columns; and so on.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # Output: (2, 3) - 2 rows, 3 columns
2. dtype:
The dtype attribute returns the data type of the elements in the NumPy
array.
import numpy as np
arr = np.array([1, 2, 3])
print(arr.dtype) # Output: int64 (platform dependent; int32 on some systems, e.g. Windows with NumPy 1.x)
3. size:
The size attribute returns the total number of elements in the array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.size) # Output: 6
4. ndim:
The ndim attribute returns the number of dimensions (axes) in the array.
import numpy as np
arr = np.array([1, 2, 3])
print(arr.ndim) # Output: 1 (1D array)
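5. type():
The built-in Python function type() (not an ndarray attribute) returns the type of the array object, e.g. type(arr) gives <class 'numpy.ndarray'>.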
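As the marking note allows, the attributes can also be shown in one program. This sketch assumes NumPy is installed; itemsize (bytes per element) is included as one further ndarray attribute. The exact dtype and itemsize depend on the platform's default integer type, so those comments give typical values only.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print("shape:", arr.shape)        # (2, 3)
print("ndim:", arr.ndim)          # 2
print("size:", arr.size)          # 6
print("dtype:", arr.dtype)        # platform default integer, e.g. int64
print("itemsize:", arr.itemsize)  # bytes per element, e.g. 8 for int64
print("type:", type(arr))         # <class 'numpy.ndarray'>

# A 1-D array's shape is still a tuple, with a single entry
vec = np.array([1, 2, 3])
print(vec.shape, vec.ndim)        # (3,) 1
```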
b) Write the output of the following R program.
print("New vector using seq() function-")
v = seq(3, 15, by = 3)
print("Original vector:")
print(v)
print("Check which values are even:\n")
is_even <- v %% 2 == 0
print(is_even)
5 Marks
Answer:
New vector using seq() function- (1 mark)
Original vector:
3 6 9 12 15 (1 mark)
Check which values are even: (1 mark)
FALSE TRUE FALSE TRUE FALSE (2 mark)
20.
a. Suppose you have a dataset representing the test scores (out of 100) of a
group of students in a math class. The scores are as follows: 85, 92, 78, 88, 95,
90, 82, and 89. Calculate the mean and standard deviation of these test scores
5 Marks
Answer:
Mean (μ) = (Sum of all values) / (Number of values) (1 mark)
Mean (μ) = (85 + 92 + 78 + 88 + 95 + 90 + 82 + 89) / 8 = 699 / 8 = 87.375
The mean test score is 87.375. (1 mark)
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ (1 mark)
$\sum (x_i - \mu)^2 = 5.640625 + 21.390625 + 87.890625 + 0.390625 + 58.140625 + 6.890625 + 28.890625 + 2.640625 = 211.875$ (1 mark)
$\sigma = \sqrt{211.875 / 8} = \sqrt{26.484} \approx 5.15$
So, the standard deviation of the test scores is approximately 5.15. (Dividing by N - 1 = 7 instead gives the sample standard deviation, approximately 5.50.) (1 mark)
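The arithmetic can be checked with Python's standard statistics module: pstdev divides by N (the population formula used above), while stdev divides by N - 1.

```python
import statistics

scores = [85, 92, 78, 88, 95, 90, 82, 89]

mean = statistics.mean(scores)               # 699 / 8
sq_dev = sum((x - mean) ** 2 for x in scores)
pop_sd = statistics.pstdev(scores)           # divides by N
sample_sd = statistics.stdev(scores)         # divides by N - 1

print("mean:", mean)                         # 87.375
print("sum of squared deviations:", sq_dev)  # 211.875
print("population SD:", round(pop_sd, 2))    # 5.15
print("sample SD:", round(sample_sd, 2))     # 5.5
```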
b. Explain the key data structures in Pandas. 5 Marks
Answer: Series: 2.5 marks, DataFrame: 2.5 marks
Pandas is a popular Python library for data manipulation and analysis. It provides two key data structures for working with structured data: Series and DataFrame. Both are built on top of NumPy arrays and offer powerful, flexible ways to work with data:
1. Series:
A Series is a one-dimensional labeled array. It can hold data of any type (integers, floats, strings, etc.) and is similar to a column in a spreadsheet or a single variable in statistics.
Each element in a Series is associated with a label (its index), which is used for data retrieval and alignment.
You can create a Series from a list, a NumPy array, or a dictionary.
import pandas as pd
# a simple character list
list6 = ['g', 'e', 'e', 'k', 's']
# create a Series from the character list
res = pd.Series(list6)
print(res)
Output:
0    g
1    e
2    e
3    k
4    s
dtype: object
2. DataFrame:
A DataFrame is a two-dimensional labeled data structure, similar to a spreadsheet or a SQL table. It consists of rows and columns, and each column can have a different data type.
The DataFrame is the most commonly used Pandas data structure and suits a wide range of data analysis tasks.
You can create a DataFrame from a variety of sources, including dictionaries, NumPy arrays, CSV files, Excel files, and more.
Example:
import pandas as pd
# list of strings
l = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
# calling the DataFrame constructor on the list
df = pd.DataFrame(l)
print(df)  # display(df) also works, but only inside Jupyter/IPython
Output:
        0
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks
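Since both structures can also be built from dictionaries, here is a brief sketch of that route, assuming pandas is installed. The student names and marks are hypothetical sample data.

```python
import pandas as pd

# Series from a dict: the keys become the index labels
marks = pd.Series({"Asha": 82, "Ravi": 74, "Meera": 91})
print(marks["Ravi"])  # label-based access

# DataFrame from a dict of lists: each key becomes a column
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],
    "marks": [82, 74, 91],
    "passed": [True, True, True],
})
print(df.dtypes)           # each column keeps its own dtype
print(df["marks"].mean())  # a single column is itself a Series
print(df.shape)            # (3, 3)
```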
21.
a. Write Python statements that create an empty list, an empty tuple, an empty
set, an empty dictionary, and an empty string. 5 marks
Answer: 1 mark each
an empty list:
l = []
an empty tuple:
t = ()
an empty set ({} would create a dictionary instead):
empty_set = set()
an empty dictionary:
d = {}
an empty string:
s = ""
b. There are 50 students in a class. The regression equation of marks in Python programming (X) on marks in Mathematics (Y) is 3Y – 5X + 180 = 0. The mean mark in Mathematics is 44, and the variance of marks in Python is 9/16th of the variance of marks in Mathematics. Find the mean marks in Python programming and the coefficient of correlation between the marks in the two subjects.
5 Marks
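Answer:
n = 50, $\bar{Y} = 44$
$\sigma_x^2 = \frac{9}{16}\sigma_y^2 \;\Rightarrow\; \frac{\sigma_x}{\sigma_y} = \frac{3}{4}$ (1 Mark)
The given regression equation of X on Y is 3Y – 5X + 180 = 0, i.e.
$X = \frac{3}{5}Y + 36$
so $b_{XY} = r\,\frac{\sigma_x}{\sigma_y} = \frac{3}{5} \;\Rightarrow\; r = \frac{3}{5} \cdot \frac{4}{3} = 0.8$ (2 Marks)
Since the regression line passes through the point $(\bar{X}, \bar{Y})$,
$\bar{X} = \frac{3}{5}(44) + 36 = 62.4$ (2 Marks)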
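A quick exact-arithmetic check of the two results, using fractions.Fraction; the slope 3/5 and intercept 36 come from rearranging the given regression equation.

```python
from fractions import Fraction as F

mean_y = F(44)
sd_ratio = F(3, 4)   # sigma_x / sigma_y = sqrt(9/16)

# 3Y - 5X + 180 = 0 rearranged to X = (3/5) Y + 36
b_xy = F(3, 5)       # slope of the regression line of X on Y
intercept = F(36)

# b_xy = r * sigma_x / sigma_y  =>  r = b_xy / (sigma_x / sigma_y)
r = b_xy / sd_ratio

# The regression line passes through (mean_x, mean_y)
mean_x = b_xy * mean_y + intercept

print("r =", r, "=", float(r))       # r = 4/5 = 0.8
print("mean of X =", float(mean_x))  # 62.4
```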
22.
a) Compare and contrast the use of central tendency measures and dispersion
measures in Exploratory Data Analysis. 5 marks
Answer:
● Central tendency measures (mean, median, mode) summarize where the data tends to cluster, while dispersion measures (range, variance, standard deviation, interquartile range) describe how data points are spread out around that central value.
● Central tendency measures provide a single value that represents the center of the data, whereas dispersion measures provide insight into the variability of the data.
● The choice between central tendency and dispersion measures depends
on the nature of the data and the specific goals of the analysis. In many
cases, both types of measures are used together to provide a more
comprehensive understanding of the data distribution.
● Central tendency measures are appropriate for answering questions like
"What is the typical value?" or "Where does the data tend to center?"
while dispersion measures help answer questions like "How spread out are
the data points?" or "What is the variability within the data?"
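A small sketch of why both kinds of measure are needed: two hypothetical datasets share the same mean and median, so central tendency alone cannot tell them apart, while range and standard deviation show very different spreads.

```python
import statistics

# Two hypothetical datasets with the same centre but different spread
class_a = [48, 49, 50, 51, 52]
class_b = [30, 40, 50, 60, 70]

for name, data in [("class_a", class_a), ("class_b", class_b)]:
    print(
        name,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "range:", max(data) - min(data),
        "pstdev:", round(statistics.pstdev(data), 2),
    )
```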
b) Explain for and while loops with syntax, flowchart and examples. 5 marks
Answer: while loop: 2.5 Marks, for loop: 2.5 Marks
While loop:
A Python while loop executes a block of statements repeatedly as long as a given condition is true. When the condition becomes false, execution continues with the first statement after the loop.
Syntax of while loop:
while expression:
    statements
Sample Example:
# Python program to illustrate a while loop
count = 0
while count < 3:
    count = count + 1
    print("Python")
Output:
Python
Python
Python
for loop:
A Python for loop is used for sequential traversal, i.e. for iterating over an iterable such as a string, tuple, list, set or dictionary.
The loop continues until the last item in the sequence has been processed. The body of the for loop is separated from the rest of the code by indentation.
Syntax of for loop:
for val in sequence:
    loop body
Flowchart of for loop: [figure: flowchart of for loop]
Sample Example:
# Python program to illustrate iterating over a list
l = [10, 20, 30]
for i in l:
    print(i)
Output:
10
20
30
