Dec 23 py

The document outlines the end semester examination details for the B Tech (ME) program at Indian Maritime University, including subject code, name, date, maximum marks, and pass marks. It contains an answer key for multiple-choice questions, descriptive questions, and programming tasks related to statistics and data analysis using Python and R. The document also includes various questions on Python programming, data structures, and statistical concepts.
Indian Maritime University


(A Central University, Govt of India)
End Semester Examinations – December 2023
Programme Name: B Tech (ME)
Semester: III
Subject Code: UG11T4305
Subject Name: Statistics and Data Analysis Using Python and R

Date:                Max Marks: 70

Duration: 03 Hrs     Pass Marks: 35

Answer Key
Section A

MCQs–All Questions are Compulsory. (10×01 mark = 10 Marks)

1. d. a.title()

2. b. 2

3. a. Categorical data

4. c. 3

5. d. DataFrame

6. c. Using the 'def' keyword

7. b. type()

8. a. A multi-dimensional array object

9. a. 15

10. c. To visualize the spread and central tendency of numerical data

Section B

Answer all the Questions. (05×02 marks=10 Marks)

11. Explain tuple unpacking in Python with an example. 2 Marks

Answer:

Tuple unpacking is a feature of Python that allows you to assign the elements of
a tuple to separate variables in a single statement. This is achieved by writing
the target variables on the left of the assignment operator (=) and the tuple to
unpack on the right. The number of variables on the left-hand side of the
assignment operator must match the number of elements in the tuple.
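
The question asks for an example, so a minimal sketch follows. The names point, x, y, z, a and b are illustrative only and not part of the original key.

```python
# Unpack a 3-element tuple into three variables in one statement.
point = (3, 4, 5)
x, y, z = point
print(x, y, z)  # 3 4 5

# Unpacking also gives a compact way to swap two variables.
x, y = y, x
print(x, y)  # 4 3

# A mismatch between the number of names and elements raises ValueError.
try:
    a, b = point
except ValueError as err:
    print("Unpacking failed:", err)
```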

12. What is a NumPy array, and how does it differ from a Python list? 2 Marks

Answer: Definition-1 Mark , Comparison with list- 1 Mark (1 or 2 points)

• The homogeneous multidimensional array is the main object of NumPy.

• It is basically a table of elements which are all of the same type and
indexed by a tuple of non-negative integers.

• The dimensions are called axes in NumPy.

• NumPy's array class is called ndarray; it is also known by the alias array.

• Types of NumPy arrays:

1. One-Dimensional Array:

A one-dimensional array is a linear sequence of elements.

2. Multidimensional Array:
Data in multidimensional arrays are stored in tabular (row and column) form.

NumPy arrays provide several advantages over Python lists:

Homogeneous Data Type: In a NumPy array, all elements are of the same
data type, unlike Python lists, which can contain elements of different
data types. This homogeneity allows for more efficient memory storage
and mathematical operations.

Efficient Computation: NumPy arrays are designed for numerical and
scientific computing. They are implemented in C and optimized for
performance, making operations on NumPy arrays significantly faster
than equivalent operations on Python lists.

Multidimensional Support: NumPy arrays can have multiple dimensions,
such as 1D arrays (vectors), 2D arrays (matrices), and even higher-
dimensional arrays. This capability is essential for working with complex
data, such as images, time series, and numerical simulations.

Broadcasting: NumPy arrays support broadcasting, which allows for
element-wise operations between arrays of different shapes and sizes.
This simplifies many mathematical operations and reduces the need for
explicit loops.

Array-Oriented Functions: NumPy provides a wide range of mathematical
functions and operations that can be applied directly to entire arrays
without the need for explicit loops. This promotes a more concise and
efficient coding style.
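
As a brief illustration of the comparison above, the sketch below contrasts a list with an array. The variable names and values are assumptions chosen only for the example.

```python
import numpy as np

prices = [10, 20, 30]       # Python list: elements may be of mixed types
arr = np.array(prices)      # NumPy array: a single dtype for all elements

# Multiplying a list repeats it; multiplying an array scales each element.
print(prices * 2)           # [10, 20, 30, 10, 20, 30]
print(arr * 2)              # [20 40 60]

# Broadcasting: the 1D array is added to every row of the 2D array.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
shifted = matrix + arr
print(shifted)
print(matrix.shape, matrix.ndim)  # (2, 3) 2
```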

13. Explain range() function with example. 2 Marks

Answer: Definition-1 Mark, Example- 1 Mark

The Python range() function returns a sequence of numbers in a given
range. Its most common use is to repeat a loop a fixed number of times or
to iterate over a sequence of numbers.

Syntax: range(start, stop, step)

Parameters:
● start: [optional] first value of the sequence (default 0)

● stop: the value at which the sequence ends; stop itself is not included

● step: [optional] integer difference between consecutive numbers in the
sequence (default 1)
Return: a range object that represents the sequence of numbers
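
The marking scheme asks for an example; one possible example is sketched below. The variable total is illustrative.

```python
# range(stop): starts at 0 and stops before 5.
print(list(range(5)))          # [0, 1, 2, 3, 4]

# range(start, stop): starts at 2 and stops before 6.
print(list(range(2, 6)))       # [2, 3, 4, 5]

# range(start, stop, step): a negative step counts downwards.
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]

# Typical use in a loop: add the numbers 1 to 3.
total = 0
for i in range(1, 4):
    total += i
print(total)  # 6
```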

14. Define "kurtosis" in statistics. 2 Marks

Answer:

Kurtosis is a statistical measure that quantifies the "tailedness" of a
probability distribution, that is, how heavy its tails are compared with
those of a normal distribution. There are three common types of
kurtosis: leptokurtic (heavier tails than the normal distribution),
mesokurtic (tails similar to the normal distribution) and platykurtic
(lighter tails).
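
A minimal sketch of how kurtosis can be computed, assuming the population definition of excess kurtosis (fourth central moment divided by the squared variance, minus 3). The function name excess_kurtosis and the sample data are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

sample = [1, 2, 3, 4, 5]
print(round(excess_kurtosis(sample), 2))  # -1.3
```

A negative result such as this one indicates a platykurtic shape, a value near 0 a mesokurtic shape, and a positive value a leptokurtic shape.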

15. Write an R program to print the result of subtracting two vectors

x = (1, 2, 3, 4, 5) and y = (10, 20, 30, 40, 50). 2 Marks

Answer:

x = c(1, 2, 3, 4, 5) (1/2 mark)

y = c(10, 20, 30, 40, 50) (1/2 mark)


z=x-y (1/2 mark)
print(z) (1/2 mark)
Section C

Answer any 5 of the following 7 questions. (05×10 marks=50 Marks)

16.
a) Compare and contrast the list and set data structures in Python with
examples.
5 Marks
Answer: Any 5 points - 1 mark each

List                                  | Set
Lists are ordered.                    | Sets are unordered.
Lists are mutable.                    | Sets are mutable but can store only immutable (hashable) elements.
The list is an indexed sequence.      | The set is a non-indexed collection.
The list allows duplicate elements.   | The set doesn't allow duplicate elements.
Elements can be accessed by position. | Position access to elements is not allowed.
Multiple null (None) elements can be  | A null (None) element can be stored only once.
stored.                               |
We can represent a list by [ ]        | We can represent a set by { }
Example: [6, 7, 8, 9, 10]             | Example: {6, 7, 8, 9, 10}
To create an empty list, we use:      | To create an empty set, we use:
l=[]                                  | a=set()

b) Write Python code to open a file, read its contents, write any sentence, and
print them to the console. 5 Marks
Answer: Answer should contain open() function in read and write mode.
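
One possible answer is sketched below. The file name sample_notes.txt, its location in the system temporary directory, and the sentences written are assumptions made so that the sketch runs anywhere.

```python
import os
import tempfile

# Hypothetical file path inside the system temporary directory.
path = os.path.join(tempfile.gettempdir(), "sample_notes.txt")

# Create the file with some starting text (write mode).
with open(path, "w") as f:
    f.write("First line\n")

# Open the file in read mode and read its contents.
with open(path, "r") as f:
    original = f.read()

# Open the file in append mode and write a sentence.
with open(path, "a") as f:
    f.write("Python makes file handling simple.\n")

# Read the file again and print everything to the console.
with open(path, "r") as f:
    contents = f.read()
print(contents)
```

The with statement closes the file automatically, so no explicit close() call is needed.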
17.

a) List out some common types of plots that Matplotlib can create, and explain
any two of them in brief. 5 Marks
Answer:List out plots-1 mark, explanation of any 2 plots-2 marks each
● Common types of plots that Matplotlib can create:
1. Line plot
2. Scatter plot
3. Bar plot
4. Histogram
5. Pie plot
6. Area plot
Line Plot
Line plots are drawn by connecting consecutive data points, each given by an
x-axis and a y-axis value, with straight lines. Line plots are the simplest form
of representing data. In Matplotlib, the plot() function creates them.

Example:
import matplotlib.pyplot as pyplot

pyplot.plot([1, 2, 3, 5, 6], [1, 2, 3, 4, 6])

# Axis limits as [xmin, xmax, ymin, ymax], chosen so that all points are visible
pyplot.axis([0, 7, 0, 7])

# Display the chart
pyplot.show()

Program Output:
Bar Plot
Bar plots are vertical or horizontal rectangular graphs used to compare data,
for example to gauge changes over a period represented on the other axis
(mostly the X-axis). Each bar can show one value or several values stacked in
proportion. The longer a bar is, the greater the value it holds.
In Matplotlib, we use the bar() or barh() function to create it.

Example:
import matplotlib.pyplot as pyplot

pyplot.bar([0.25, 2.25, 3.25, 5.25, 7.25], [300, 400, 200, 600, 700],
           label="Carpenter", color='b', width=0.5)
pyplot.bar([0.75, 1.75, 2.75, 3.75, 4.75], [50, 30, 20, 50, 60],
           label="Plumber", color='g', width=0.5)
pyplot.legend()
pyplot.xlabel('Days')
pyplot.ylabel('Wage')
pyplot.title('Details')

# Display the chart
pyplot.show()
Program Output:
Scatter Plot
Scatter plots (previously called XY plots) are used to compare data variables
and to examine the relationship between a dependent and an independent
variable. The data is shown as a collection of points, each positioned by
one variable (x) against the other (y). In Matplotlib, the scatter() function
creates them.

Example:
import matplotlib.pyplot as pyplot
x1 = [1, 2.5,3,4.5,5,6.5,7]
y1 = [1,2, 3, 2, 1, 3, 4]
x2=[8, 8.5, 9, 9.5, 10, 10.5, 11]
y2=[3,3.5, 3.7, 4,4.5, 5, 5.2]
pyplot.scatter(x1, y1, label = 'high bp low heartrate', color='c')
pyplot.scatter(x2,y2,label='low bp high heartrate',color='g')
pyplot.title('Smart Band Data Report')
pyplot.xlabel('x')
pyplot.ylabel('y')
pyplot.legend()
# Print the chart
pyplot.show()
Program Output:
Pie Plot
A pie plot is a circular graph in which the data is represented as
segments, or slices, of a pie. Data analysts use pie plots to show
percentage or proportional data, where each slice represents an item or
data category. In Matplotlib, the pie() function creates it.

Example:
import matplotlib.pyplot as pyplot

# Named "sizes" rather than "slice" to avoid shadowing the built-in slice type
sizes = [12, 25, 50, 36, 19]
activities = ['NLP', 'Neural Network', 'Data analytics', 'Quantum Computing',
              'Machine Learning']
cols = ['r', 'b', 'c', 'g', 'orange']
pyplot.pie(sizes,
           labels=activities,
           colors=cols,
           startangle=90,
           shadow=True,
           explode=(0, 0.1, 0, 0, 0),  # pull the second slice out slightly
           autopct='%1.1f%%')
pyplot.title('Training Subjects')
pyplot.show()
Program Output:
b) Explain Dictionary data type in Python. How to create, access, and modify
dictionary elements?
5 Marks
Answer:Dictionary explanation: 2 marks, create, access, and modify
dictionary elements: 1 mark each.

Dictionary:
• A Python dictionary is a collection of elements stored as key-value pairs.
• Since Python 3.7, a dictionary preserves the insertion order of its keys
(earlier versions treated it as unordered).
• This data structure is mutable.
• Keys must be unique and of an immutable (hashable) type, such as a number,
string, or tuple.
• Values can be of any type, including integer, list, and tuple.
• E.g.
d = {1: 500, 'name': 'Akash', 'roll': 120, 'city': 'Pune', 0: 100, 2: 'Hello'}

Create, access, and modify dictionary elements:

● You can create a dictionary using curly braces {} and specifying key-value
pairs separated by colons:
my_dict = {"name": "John", "age": 30, "city": "New York"}

● While other data types use positional indexing to access values, a dictionary
uses keys. A key is written inside square brackets [], for example
my_dict["name"].

● Dictionaries are mutable. We can add new elements or change the value
of existing elements using an assignment operator.
● If the key is already present, then the existing value gets updated. In case
the key is not present, a new (key: value) pair is added to the dictionary.
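
The three operations can be sketched together as follows. The "country" key and its value are illustrative additions to the dictionary used in the answer.

```python
# Create
my_dict = {"name": "John", "age": 30, "city": "New York"}

# Access
print(my_dict["name"])         # John
print(my_dict.get("country"))  # None (get() avoids a KeyError for a missing key)

# Modify
my_dict["age"] = 31            # key exists: the value is updated
my_dict["country"] = "USA"     # key is new: the pair is added
del my_dict["city"]            # remove a key-value pair

print(my_dict)
```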

18.
a) Develop a Python program that implements a function that calculates the
factorial of a given number. 5 Marks
Answer:

Any logic that is applicable can be used to calculate the factorial of a number,
but it should be in a function only.

Program:
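
A minimal sketch of one acceptable program, assuming an iterative approach (a recursive function would be equally valid). The function name factorial is illustrative.

```python
def factorial(n):
    """Return n! for a non-negative integer n (iterative version)."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5))  # 120
print(factorial(0))  # 1
```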

b) The joint probability distribution of two random variables X and Y is given by

y \ x |  -2  |  4
  1   | 0.1  | 0.1
 -3   | 0.2  | 0.4
  5   | 0.1  | 0.1

(i) Evaluate the marginal distribution of Y.
(ii) Examine whether X and Y are independent.
(iii) Find P(Y=5 | X=4). (1+2+2 marks)
Answer:

i) The marginal distribution of Y is

y    |  1  | -3  |  5
h(y) | 0.2 | 0.6 | 0.2

(1 Mark)
ii) The marginal distribution of X is g(-2) = 0.4 and g(4) = 0.6. As
f(-2, 1) = 0.1 ≠ g(-2)·h(1) = 0.4 × 0.2 = 0.08, X and Y are not independent.
(2 marks)
iii) P(Y=5 | X=4) = f(4, 5) / g(4) = 0.1 / 0.6 = 1/6 (2 Marks)
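
The three results can be cross-checked with a short sketch. It uses exact fractions to avoid floating-point rounding; the dictionary names joint, g and h are assumptions that mirror the notation f, g and h in the answer.

```python
from fractions import Fraction

# Joint probabilities f(x, y), copied from the table in the question.
joint = {
    (-2, 1): Fraction(1, 10), (4, 1): Fraction(1, 10),
    (-2, -3): Fraction(2, 10), (4, -3): Fraction(4, 10),
    (-2, 5): Fraction(1, 10), (4, 5): Fraction(1, 10),
}

# Marginals: sum the joint probabilities over the other variable.
g = {}  # marginal distribution of X
h = {}  # marginal distribution of Y
for (x, y), p in joint.items():
    g[x] = g.get(x, 0) + p
    h[y] = h.get(y, 0) + p

# Independence requires f(x, y) == g(x) * h(y) for every cell.
independent = all(p == g[x] * h[y] for (x, y), p in joint.items())

# Conditional probability P(Y=5 | X=4) = f(4, 5) / g(4).
conditional = joint[(4, 5)] / g[4]

print(h)
print(independent)  # False
print(conditional)  # 1/6
```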

19.
a) Explain different NumPy attributes. 5 Marks

Answer: 1 mark each


Individual examples can be given, or one program demonstrating attributes.

1. shape:

The shape attribute returns a tuple representing the dimensions (size) of the
NumPy array. For a 1D array, it returns a one-element tuple with the number of
elements, such as (3,); for a 2D array, it returns a tuple with the number of
rows and columns, and so on.
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])


print(arr.shape) # Output: (2, 3) - 2 rows, 3 columns

2. dtype:
The dtype attribute returns the data type of the elements in the NumPy
array.

import numpy as np
arr = np.array([1, 2, 3])
print(arr.dtype) # Output: int64 (platform-dependent; may be int32)

3. size:
The size attribute returns the total number of elements in the array.

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.size) # Output: 6

4. ndim:
The ndim attribute returns the number of dimensions (axes) in the array.

import numpy as np
arr = np.array([1, 2, 3])
print(arr.ndim) # Output: 1 (1D array)

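5. type():
The built-in type() function (not an array attribute) returns the type of the
array object; for a NumPy array, type(arr) is numpy.ndarray.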

b) Write an output of the following R program


print("New vector using seq() function-")
v = seq(3, 15, by= 3)
print("Original vector:")
print(v)
print("Check which values are even:\n")
is_even <- v %% 2 == 0
print(is_even)
5 Marks
Answer:

New vector using seq() function- (1 mark)
Original vector:
3 6 9 12 15 (1 mark)
Check which values are even: (1 mark)
FALSE TRUE FALSE TRUE FALSE (2 mark)

20.

a. Suppose you have a dataset representing the test scores (out of 100) of a
group of students in a math class. The scores are as follows: 85, 92, 78, 88, 95,
90, 82, and 89. Calculate the mean and standard deviation of these test scores
5 Marks

Answer:---

Mean (μ) = (Sum of all values) / (Number of values) (1 mark)

Mean (μ) = (85 + 92 + 78 + 88 + 95 + 90 + 82 + 89) / 8 = 699 / 8 = 87.375

The mean test score is 87.375. (1 mark)

Standard Deviation (σ) = √[Σ(xi − μ)² / N] (1 mark)

Σ(xi − μ)² = 5.64 + 21.39 + 87.89 + 0.39 + 58.14 + 6.89 + 28.89 + 2.64 ≈ 211.88 (1 mark)

Standard Deviation (σ) = √(211.88 / 8) = √26.48 ≈ 5.15

So, the (population) standard deviation of the test scores is approximately 5.15. (1 mark)
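
The arithmetic can be cross-checked with Python's standard statistics module. This is a sketch only; pstdev() divides by N as in the formula used in this answer, while stdev() divides by N − 1 and is shown for comparison.

```python
import statistics

scores = [85, 92, 78, 88, 95, 90, 82, 89]

mean = statistics.mean(scores)
pop_sd = statistics.pstdev(scores)    # population SD: divides by N
sample_sd = statistics.stdev(scores)  # sample SD: divides by N - 1

print(mean)                 # 87.375
print(round(pop_sd, 2))     # 5.15
print(round(sample_sd, 2))  # 5.5
```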

b. Explain the key data structures in Pandas. 5 Marks

Answer:Series: 2.5 marks, Dataframe: 2.5 marks

Pandas is a popular Python library for data manipulation and analysis. It provides
two main key data structures for working with structured data: Series and
DataFrame. These data structures are built on top of NumPy arrays and offer
powerful and flexible ways to work with data. Here's an explanation of each:

1.Series:

A Series is essentially a one-dimensional labeled array. It can hold data of any
type (integers, floats, strings, etc.) and is similar to a column in a spreadsheet or
a single variable in statistics.
Each element in a Series is associated with a label or index, which can be used
for data retrieval and alignment.

You can create a Series from a list, NumPy array, or dictionary.

import pandas as pd

# a simple char list
list6 = ['g', 'e', 'e', 'k', 's']

# create a Series from the char list
res = pd.Series(list6)

print(res)

Output:

2. DataFrame:

A DataFrame is a two-dimensional labeled data structure, similar to a
spreadsheet or a SQL table. It consists of rows and columns, where each column
can have a different data type.

DataFrame is the most commonly used Pandas data structure, and it's suitable
for a wide range of data analysis tasks.

You can create a DataFrame from a variety of sources, including dictionaries,
NumPy arrays, CSV files, Excel files, and more.

Example:

import pandas as pd

# list of strings
l = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list
df = pd.DataFrame(l)

# print() works in any Python session; display() exists only in IPython/Jupyter
print(df)

Output:
21.

a. Write Python statements that create an empty list, an empty tuple, an empty
set, an empty dictionary, and an empty string. 5 marks

Answer:1 Mark each

empty list:

l=[]

an empty tuple:

t=()

an empty set:

empty_set=set()

an empty dictionary:

d={}

an empty string:

s=""

b. There are 50 students in a class. The regression equation of marks in Python
programming (X) on marks in Mathematics (Y) is 3Y – 5X + 180 = 0. The mean
marks in Mathematics is 44, and the variance of marks in Python is (9/16)th of
the variance of marks in Mathematics. Find the mean marks in Python programming
and the coefficient of correlation between marks in the two subjects.
5 Marks

Answer:

n = 50, Ȳ = 44
σx² = (9/16) σy²  ⟹  σx / σy = 3/4 ____________ 1M
The given regression equation of X on Y is 3Y – 5X + 180 = 0, that is,
X = (3/5) Y + 36.
So b_XY = r (σx / σy) = 3/5, which gives
r = (3/5) / (3/4) = 0.8. _____________ 2M
The regression line passes through the point (X̄, Ȳ), so

X̄ = (3/5)(44) + 36 = 62.4 ________ 2M
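
The two results can be verified numerically. The sketch below assumes only the quantities stated in the question; the variable names are illustrative.

```python
from fractions import Fraction
import math

mean_y = 44                       # mean marks in Mathematics
variance_ratio = Fraction(9, 16)  # var(X) / var(Y), from the question

# 3Y - 5X + 180 = 0  =>  X = (3/5) Y + 36
slope_x_on_y = Fraction(3, 5)     # regression coefficient b_XY
intercept = 36

sd_ratio = math.sqrt(variance_ratio)  # sigma_x / sigma_y = 0.75
r = float(slope_x_on_y) / sd_ratio    # from b_XY = r * sigma_x / sigma_y
mean_x = float(slope_x_on_y * mean_y + intercept)

print(round(r, 2))       # 0.8
print(round(mean_x, 1))  # 62.4
```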

22.

a) Compare and contrast the use of central tendency measures and dispersion
measures in Exploratory Data Analysis. 5 marks

Answer:

● Central tendency measures focus on summarizing where the data tends to
cluster, while dispersion measures focus on how data points are spread
out around the central value.
● Central tendency measures provide a single value that represents the
center of the data, whereas dispersion measures provide insights into the
variability of the data.
● The choice between central tendency and dispersion measures depends
on the nature of the data and the specific goals of the analysis. In many
cases, both types of measures are used together to provide a more
comprehensive understanding of the data distribution.
● Central tendency measures are appropriate for answering questions like
"What is the typical value?" or "Where does the data tend to center?"
while dispersion measures help answer questions like "How spread out are
the data points?" or "What is the variability within the data?"
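
To illustrate why both kinds of measure are used together, the sketch below compares two hypothetical score lists (class_a and class_b are invented for this example) that share the same mean but differ greatly in spread.

```python
import statistics

class_a = [48, 49, 50, 51, 52]  # hypothetical scores, tightly clustered
class_b = [10, 30, 50, 70, 90]  # hypothetical scores, widely spread

for name, data in (("A", class_a), ("B", class_b)):
    centre = statistics.mean(data)      # central tendency
    spread = statistics.pstdev(data)    # dispersion (population SD)
    value_range = max(data) - min(data) # dispersion (range)
    print(name, centre, round(spread, 2), value_range)
```

Both lists have a mean of 50, so the central tendency alone cannot distinguish them; the standard deviation and range show how different they are.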
b) Explain for and while loop with syntax, flowchart & examples. 5 marks

Answer:while loop:2.5 Marks, for loop: 2.5 Marks

While loop:

Python while loop is used to execute a block of statements repeatedly as long
as a given condition is true. When the condition becomes false, the line
immediately after the loop in the program is executed.

Syntax of while loop:

while expression:
    statements

Sample Example :
# Python program to illustrate while loop
count = 0
while count < 3:
    count = count + 1
    print("Python")

Output :
Python
Python
Python

for loop :
Python For loop is used for sequential traversal i.e. it is used for iterating over an
iterable like String, Tuple, List, Set or Dictionary.
Loop continues until we reach the last item in the sequence. The body of for loop
is separated from the rest of the code using indentation.

Syntax of for loop :

for val in sequence:
    loop body
Flowchart of for loop: [flowchart figure]

Sample Example :
# Python program to illustrate iterating over a list
l = [10, 20, 30]
for i in l:
    print(i)
Output :
10
20
30
