Assignment 1


Q1) Identify the data type for the following:

Activity Data Type
Number of beatings from Wife Discrete
Results of rolling a die Discrete
Weight of a person Continuous
Weight of Gold Continuous
Distance between two places Continuous
Length of a leaf Continuous
Dog's weight Continuous
Blue Color Discrete
Number of kids Discrete
Number of tickets in Indian railways Discrete
Number of times married Discrete
Gender (Male or Female) Discrete

Q2) Identify the data type of each item below as one of the following:
Nominal, Ordinal, Interval, Ratio.
Data Data Type
Gender Nominal
High School Class Ranking Ordinal
Celsius Temperature Interval
Weight Ratio
Hair Color Nominal
Socioeconomic Status Ordinal
Fahrenheit Temperature Interval
Height Ratio
Type of living accommodation Nominal
Level of Agreement Ordinal
IQ (Intelligence Scale) Interval
Sales Figures Ratio
Blood Group Nominal
Time of Day Interval
Time on a Clock with Hands Interval
Number of Children Ratio
Religious Preference Nominal
Barometer Pressure Ratio
SAT Scores Interval
Years of Education Ratio

Q3) Three Coins are tossed, find the probability that two heads and one tail are
obtained?
ANS: Sample space S = {HHH, HHT, HTH, HTT, TTT, TTH, THT, THH}, so n(S) = 8
Getting two heads and one tail: {HHT, HTH, THH}, so there are 3 favorable outcomes
P(two heads and one tail) = number of favorable outcomes / total number of outcomes
= 3/8 = 0.375
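The same count can be checked by brute force. This short sketch (not part of the original answer) enumerates all 2^3 equally likely toss sequences with itertools.product and keeps exact fractions:

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely sequences of three coin tosses
outcomes = list(product("HT", repeat=3))
# Favorable: exactly two heads (and therefore exactly one tail)
favorable = [o for o in outcomes if o.count("H") == 2]
p = Fraction(len(favorable), len(outcomes))
print(p, float(p))  # 3/8 0.375
```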
Q4) Two Dice are rolled, find the probability that sum is
a) Equal to 1
b) Less than or equal to 4
c) Sum is divisible by 2 and 3
ANS: Rolling two dice gives 6 × 6 = 36 equally likely outcomes.
a) The smallest possible sum is 2, so the probability of getting a sum equal to 1 = 0/36 = 0
b) Outcomes with sum less than or equal to 4 = {(1,1), (2,1), (3,1), (1,2), (2,2), (1,3)}
Probability = 6/36 = 1/6 ≈ 0.167
c) A sum divisible by both 2 and 3 is divisible by 6, i.e. a sum of 6 or 12 = {(1,5), (2,4), (3,3), (4,2), (5,1), (6,6)}
Probability = 6/36 = 1/6 ≈ 0.167
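As a cross-check, a small sketch (our own, not from the original) enumerates the 36 ordered outcomes and counts those whose sum satisfies each condition; the helper name prob is illustrative:

```python
from fractions import Fraction
from itertools import product

# 36 ordered outcomes (first die, second die)
rolls = list(product(range(1, 7), repeat=2))

def prob(condition):
    """Exact probability that the sum of the two dice satisfies `condition`."""
    hits = sum(1 for r in rolls if condition(sum(r)))
    return Fraction(hits, len(rolls))

p_a = prob(lambda s: s == 1)                      # a) sum equal to 1
p_b = prob(lambda s: s <= 4)                      # b) sum <= 4
p_c = prob(lambda s: s % 2 == 0 and s % 3 == 0)   # c) divisible by 2 and 3
print(p_a, p_b, p_c)  # 0 1/6 1/6
```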

Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at
random. What is the probability that none of the balls drawn is blue?
ANS: The bag contains 2 red, 3 green and 2 blue balls, so n = 2 + 3 + 2 = 7 balls.
N(S) = 7C2 = (7 × 6)/(2 × 1) = 21 ways to draw two balls
N(A) = 5C2 = (5 × 4)/(2 × 1) = 10 ways to draw both balls from the 5 non-blue (red or green) balls
P(none of the balls drawn is blue) = N(A)/N(S) = 10/21 ≈ 0.476
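The combination counts can be reproduced with math.comb (Python 3.8+). A minimal sketch of the calculation above:

```python
from fractions import Fraction
from math import comb

total = comb(7, 2)     # any 2 of the 7 balls
non_blue = comb(5, 2)  # both balls drawn from the 2 red + 3 green
p = Fraction(non_blue, total)
print(p, round(float(p), 4))  # 10/21 0.4762
```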

Q6) Calculate the Expected number of candies for a randomly selected child
Below are the probabilities of count of candies for children (ignoring the nature of
the child-Generalized view)
CHILD Candies count Probability
A 1 0.015
B 4 0.20
C 3 0.65
D 5 0.005
E 6 0.01
F 2 0.120
Child A – probability of having 1 candy = 0.015.
Child B – probability of having 4 candies = 0.20
ANS: Expected value = sum(probability * count)
(1*0.015 + 4*0.20 +3*0.65 + 5*0.005 + 6*0.01 + 2*0.120) = 3.090
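The expected value is just the probability-weighted sum of the counts. A short sketch that also checks the probabilities form a valid distribution (they sum to 1):

```python
import math

counts = [1, 4, 3, 5, 6, 2]                    # candies for children A..F
probs = [0.015, 0.20, 0.65, 0.005, 0.01, 0.120]

# A valid probability distribution must sum to 1
assert math.isclose(sum(probs), 1.0)

# E(X) = sum of x * P(x)
expected = sum(c * p for c, p in zip(counts, probs))
print(round(expected, 3))  # 3.09
```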
Q7) For Points, Score and Weigh, calculate the Mean, Median, Mode, Variance, Standard Deviation and Range,
and comment on the values / draw inferences.
Use Q7.csv file
ANS:
Column Mean Median Mode Variance Standard Deviation Range
Points 3.596563 3.69 3.92 0.285881 0.534679 2.17
Score 3.21725 3.435 3.44 0.957379 0.978457 3.911
Weigh 17.84875 17.6 17.02 3.193166 1.786943 8.4

Comments:
Points: Not a normal distribution. Skewness is close to 0, as the data is spread evenly around the mean, but the kurtosis is negative (flatter than a normal curve). No outliers.
Score: Roughly normal, with positive skewness and positive kurtosis. Has outliers at the upper extreme.
Weigh: Roughly normal, with positive skewness and positive kurtosis. Has outliers at the upper extreme.

import pandas as pd
import matplotlib.pyplot as plt

# Load the Q7 dataset
data = pd.read_csv('Q7.csv')

# One boxplot per column, side by side
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, col in zip(axes, ['Points', 'Score', 'Weigh']):
    ax.boxplot(data[col])
    ax.set_title(col)
plt.show()
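The summary table can also be produced without pandas. This is a minimal sketch using Python's built-in statistics module; the helper name summarize is hypothetical. statistics.variance and statistics.stdev divide by n − 1, the same convention as pandas' default .var() and .std(), and statistics.multimode returns every most-frequent value, which matters when a column has more than one mode:

```python
import statistics

def summarize(values):
    """Mean, median, mode(s), sample variance, sample SD and range of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),    # all most-frequent values
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std": statistics.stdev(values),           # sample standard deviation
        "range": max(values) - min(values),
    }

# Hypothetical usage after loading Q7.csv with pandas: summarize(list(data['Points']))
print(summarize([2.0, 3.0, 3.0, 5.0]))
```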

Q8) Calculate Expected Value for the problem below


a) The weights (X) of patients at a clinic (in pounds), are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected
Value of the Weight of that patient?
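ANS: Each patient is equally likely to be chosen, so P(X = x) = 1/9 for every weight.
Expected value = Σ x·P(x) = sum(weights)/9 = 1308/9 ≈ 145.3333
Using Python:
import numpy as np
data = np.array([108, 110, 123, 134, 135, 145, 167, 187, 199])
data.mean()  # equal probabilities of 1/9, so E(X) is the plain mean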
Q9) Calculate skewness and kurtosis and draw inferences on the following data:
a) Cars speed and distance (use Q9_a.csv)
b) SP and Weight (WT) (use Q9_b.csv)
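ANS: Q9 a)
Car Speed
Skewness -0.117510
Kurtosis -0.50899

Distance
Skewness 0.806895
Kurtosis 0.405053

Inference: Speed is nearly symmetric (skewness close to 0) and slightly flatter than a normal distribution (negative kurtosis). Distance is right-skewed (positive skewness) with slightly heavier tails than a normal distribution (positive kurtosis).
import pandas as pd
import matplotlib.pyplot as plt
data1 = pd.read_csv('Q9_a.csv')
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].boxplot(data1.speed)
axes[0].set_title('Speed')
axes[1].boxplot(data1.dist)
axes[1].set_title('Distance')
plt.show()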
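Q9 b)
SP
Skewness 1.61145
Kurtosis 2.977329

Weight (WT)
Skewness -0.64753
Kurtosis 0.950291

Inference: SP is strongly right-skewed with heavy tails (large positive kurtosis), so it has a long tail of high values. WT is moderately left-skewed with somewhat heavier tails than a normal distribution.
import pandas as pd
import matplotlib.pyplot as plt
data2 = pd.read_csv('Q9_b.csv')
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].boxplot(data2.SP)
axes[0].set_title('SP')
axes[1].boxplot(data2.WT)
axes[1].set_title('WT')
plt.show()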
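To show what the skewness and kurtosis numbers measure, here is a dependency-free sketch of the bias-adjusted estimators, which should agree with pandas' Series.skew() and Series.kurt() (excess kurtosis, so a normal distribution gives 0). The function names are our own, not a library API:

```python
import math

def _central_moments(xs):
    """Return n and the 2nd, 3rd and 4th central moments (divided by n)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return n, m2, m3, m4

def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    n, m2, m3, _ = _central_moments(xs)
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurtosis(xs):
    """Bias-adjusted excess kurtosis G2 (needs n >= 4); 0 for a normal distribution."""
    n, m2, _, m4 = _central_moments(xs)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

# Hypothetical usage with the Q9 data: sample_skewness(list(data1.speed))
print(sample_skewness([1, 2, 3, 4, 10]), sample_excess_kurtosis([1, 2, 3, 4, 10]))
```

A long right tail (one large value, as in the example) pushes skewness above 0; evenly spread data gives negative excess kurtosis.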

Q10) Draw inferences about the following boxplot & histogram


Ans: The data is positively skewed, as it has a longer tail on the right. It will have negative kurtosis, as the data is spread from 100 to 400 rather than concentrated about the mean. The mean weight will range between 50 and 100.

Ans: The data has outliers at the upper extreme. More values are concentrated near the lower extreme, which means the data has positive skewness and a longer tail on the right. The kurtosis will also be positive, as the data is concentrated near the mean.
Q11) Suppose we want to estimate the average weight of an adult male in
Mexico. We draw a random sample of 2,000 men from a population of
3,000,000 men and weigh them. We find that the average person in our
sample weighs 200 pounds, and the standard deviation of the sample is 30
pounds. Calculate the 94%, 98% and 96% confidence intervals.

ANS: With n = 2000 the normal approximation applies, and the standard error of the mean is 30/√2000 ≈ 0.6708 pounds. Using Python (from scipy import stats):

Average weight of an adult male in Mexico, 94% CI = (198.738325292158, 201.261674707842)
stats.norm.interval(0.94, 200, 30/(2000**0.5))

Average weight of an adult male in Mexico, 98% CI = (198.43943840429978, 201.56056159570022)
stats.norm.interval(0.98, 200, 30/(2000**0.5))

Average weight of an adult male in Mexico, 96% CI = (198.62230334813333, 201.37769665186667)
stats.norm.interval(0.96, 200, 30/(2000**0.5))
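The scipy calls above compute mean ± z × SE. The same intervals can be sketched with the standard library's statistics.NormalDist, which makes the formula explicit; the helper name z_interval is hypothetical:

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Two-sided normal confidence interval for a mean: mean +/- z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)                          # standard error of the mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # two-sided critical value
    return mean - z * se, mean + z * se

for level in (0.94, 0.98, 0.96):
    lo, hi = z_interval(200, 30, 2000, level)
    print(f"{level:.0%}: ({lo:.4f}, {hi:.4f})")
```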

Q12) Below are the scores obtained by a student in tests

34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation.
ANS:
Mean 41
Median 40.5
Standard deviation 5.052663829
Variance 25.52941176 (sample variance, dividing by n − 1 = 17)
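These values can be reproduced with Python's built-in statistics module. Note that statistics.variance and statistics.stdev use the sample (n − 1) formulas, which is what the figures above assume:

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)          # 41
median = statistics.median(scores)      # 40.5, average of the 9th and 10th values
variance = statistics.variance(scores)  # sample variance, divides by n - 1
sd = statistics.stdev(scores)           # sample standard deviation
print(mean, median, round(variance, 4), round(sd, 4))
```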

2) What can we say about the student marks?


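ANS: plt.boxplot(x)  # x = the list of scores above
The data is positively skewed: the mean (41) is greater than the median (40.5), and the boxplot shows a long upper tail, with the high scores 49 and 56 plotted as outliers.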
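Why 49 and 56 show up as outliers can be checked with Tukey's 1.5 × IQR rule, which is what boxplot whiskers use by default. This sketch uses statistics.quantiles with method="inclusive", which interpolates linearly like numpy's default percentile method:

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

# Quartiles by linear interpolation between order statistics
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
outliers = [s for s in scores if s < lower or s > upper]
print(q1, q3, upper, outliers)  # 38.25 41.75 47.0 [49, 56]
```

Both flagged points lie above the upper limit, which is consistent with the positive skew.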

Q13) What is the nature of skewness when mean, median of data are equal?
ANS: When the mean and the median are equal, the distribution is typically symmetric and has zero skewness. If the distribution is both symmetric and unimodal, then mean = median = mode.

Q14) What is the nature of skewness when mean > median ?


ANS:  If the mean is greater than the median, the distribution is positively skewed.
Q15) What is the nature of skewness when median > mean?
ANS: If the mean is less than the median, the distribution is negatively skewed.
Q16) What does a positive kurtosis value indicate about the data?
ANS: A positive excess kurtosis (kurtosis greater than 3, i.e. excess kurtosis greater than 0) indicates that the distribution is more peaked and has thicker tails than a normal distribution.

Q17) What does a negative kurtosis value indicate about the data?


ANS: A negative excess kurtosis (kurtosis less than 3, i.e. excess kurtosis less than 0) indicates that the distribution is flatter and has thinner tails than a normal distribution.

Q18) Answer the below questions using the below boxplot visualization.

What can we say about the distribution of the data?


Ans: The distribution is asymmetric and left-skewed. The median is much closer to the third quartile than to the first quartile, which means the distribution is left-skewed.

What is the nature of skewness of the data?


Ans: Negative (left) skewness.
What will be the IQR of the data (approximately)?
ANS: IQR = Q3 − Q1 ≈ 18 − 10 = 8

Q19) Comment on the below Boxplot visualizations?

Draw an inference from the distribution of data for Boxplot 1 with respect to Boxplot 2.
ANS: In both boxes, the vertical line that represents the median is equally close to the first quartile and the third quartile, which means both distributions are symmetrical and have no skew.

Q 20) Calculate probability from the given dataset for the below cases
Dataset: Cars.csv
Calculate the probability of MPG of Cars for the below cases.
MPG <- Cars$MPG
ANS: (Python, assuming from scipy import stats and cars = pd.read_csv('Cars.csv'))
a. P(MPG>38) = 0.3475
1 - stats.norm.cdf(38, cars.MPG.mean(), cars.MPG.std())
b. P(MPG<40) = 0.7293
stats.norm.cdf(40, cars.MPG.mean(), cars.MPG.std())
c. P(20<MPG<50) ≈ 0.899
stats.norm.cdf(50, cars.MPG.mean(), cars.MPG.std()) - stats.norm.cdf(20, cars.MPG.mean(), cars.MPG.std())
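The three cases reduce to three rules for a normal model: an upper tail is 1 minus the CDF, a lower tail is the CDF, and an interval is a difference of CDFs (so it can never exceed 1). A minimal sketch with the standard library's NormalDist; the function name mpg_probabilities is hypothetical, and the mean and SD must be supplied from the Cars.csv MPG column:

```python
from statistics import NormalDist

def mpg_probabilities(mean, sd):
    """Normal-approximation probabilities for Q20, given the sample mean and SD of MPG."""
    mpg = NormalDist(mu=mean, sigma=sd)
    return {
        "P(MPG>38)": 1 - mpg.cdf(38),               # upper tail: complement of the CDF
        "P(MPG<40)": mpg.cdf(40),                   # lower tail: the CDF itself
        "P(20<MPG<50)": mpg.cdf(50) - mpg.cdf(20),  # interval: difference of CDFs
    }

# Hypothetical usage: mpg_probabilities(cars.MPG.mean(), cars.MPG.std())
```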
Q 21) Check whether the data follows normal distribution
a) Check whether the MPG of Cars follows Normal Distribution
Dataset: Cars.csv
ANS: From the plot and the summary values, the MPG data is fairly symmetrical, i.e.
approximately normally distributed.
b) Check Whether the Adipose Tissue (AT) and Waist Circumference(Waist)
from wc-at data set follows Normal Distribution
Dataset: wc-at.csv
ANS: For AT, mean > median and the right whisker is longer than the left whisker, so the data is
positively skewed.

For WC, mean > median, both whiskers are of the same length and the median is
slightly shifted towards the left, so the data is fairly symmetrically distributed.

Q 22) Calculate the Z scores of 90% confidence interval, 94% confidence interval, 60% confidence interval
ANS: Z score for 90% = 1.6449
Z score for 94% = 1.8808
Z score for 60% = 0.8416
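For a two-sided interval at confidence level c, the z score is the standard normal quantile at (1 + c)/2, i.e. it leaves (1 − c)/2 in each tail. A short sketch with the standard library:

```python
from statistics import NormalDist

# Two-sided critical values: quantile at (1 + c) / 2
z_scores = {c: NormalDist().inv_cdf((1 + c) / 2) for c in (0.90, 0.94, 0.60)}
for c, z in z_scores.items():
    print(f"{c:.0%}: z = {z:.4f}")
```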

Q 23) Calculate the t scores of 95% confidence interval, 96% confidence interval, 99% confidence interval for sample size of 25
ANS: With n = 25, the degrees of freedom are df = n − 1 = 24.
T score for 95% = 2.064
T score for 96% = 2.172
T score for 99% = 2.797
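The t scores follow the same two-sided rule as the z scores, using the t distribution with n − 1 degrees of freedom. A sketch with scipy.stats.t.ppf (the percent-point, or inverse CDF, function):

```python
from scipy import stats

n = 25
df = n - 1  # degrees of freedom

# Two-sided critical values: t quantile at (1 + c) / 2
t_scores = {c: stats.t.ppf((1 + c) / 2, df) for c in (0.95, 0.96, 0.99)}
for c, t in t_scores.items():
    print(f"{c:.0%}: t = {t:.4f}")
```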

Q 24) A Government company claims that an average light bulb lasts 270
days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs
last an average of 260 days, with a standard deviation of 90 days. If the
CEO's claim were true, what is the probability that 18 randomly selected
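bulbs would have an average life of no more than 260 days?

Hint: R code pt(tscore, df), where df = degrees of freedom

ANS: t score = (260 − 270) / (90/√18) = −0.47140, with df = 18 − 1 = 17
p_value = stats.t.cdf(-0.4714, df=17) ≈ 0.3217 (from scipy import stats; the Python equivalent of R's pt(-0.4714, 17))
So, if the claim were true, there is about a 32% chance that 18 randomly selected bulbs would average 260 days or less.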
