Probability and Statistics (Tutorial 2)
Sample Variance:
Definition: If the $n$ observations in a sample are denoted by $x_1, x_2, \ldots, x_n$, the
sample variance is
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)} $$
Note: The standard deviation $s = \sqrt{s^2}$ is the positive square root of the variance.
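The two forms of the variance formula are algebraically identical. The short derivation below is a sketch of why, using only the definition $\bar{x} = \frac{1}{n}\sum x_i$; all sums run over $i = 1, \ldots, n$.

```latex
\begin{aligned}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x} \sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
     \qquad \text{since } \sum x_i = n\bar{x} \\
  &= \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n}
   = \frac{n \sum x_i^2 - \left( \sum x_i \right)^2}{n} \\[4pt]
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
  &= \frac{n \sum x_i^2 - \left( \sum x_i \right)^2}{n(n - 1)}
\end{aligned}
```

The second form needs only the totals $\sum x_i$ and $\sum x_i^2$, which is why it is convenient for hand calculation.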
Sample Range:
Definition: If the $n$ observations in a sample are denoted by $x_1, x_2, \ldots, x_n$,
the sample range is
$$ r = \max(x_i) - \min(x_i) $$
Coefficient of Variation:
Definition: The coefficient of variation (CV) is a statistical measure of the
relative dispersion of data points in a data series around the mean.
$$ \text{C.V.} = \frac{s}{\bar{x}} \times 100\% $$
Example (1):
Samples of 13 tall buildings from each of two cities are listed below. Find the
variance (using both formulas) and the standard deviation for each city, then
determine which data set is more variable.
Houston : 75, 71, 64, 56, 53, 55, 47, 55, 52, 50, 50, 50, 47
Pittsburgh: 64, 54, 40, 32, 46, 44, 42, 41, 40, 40, 34, 32, 30
Solution: For Houston,
$$ \bar{X} = \frac{\sum X}{n} = \frac{725}{13} = 55.77 $$

    n      X    X − X̄    (X − X̄)²      X²
    1     75    19.23     369.79    5625
    2     71    15.23     231.95    5041
    3     64     8.23      67.73    4096
    4     56     0.23       0.05    3136
    5     53    -2.77       7.67    2809
    6     55    -0.77       0.59    3025
    7     47    -8.77      76.91    2209
    8     55    -0.77       0.59    3025
    9     52    -3.77      14.21    2704
   10     50    -5.77      33.29    2500
   11     50    -5.77      33.29    2500
   12     50    -5.77      33.29    2500
   13     47    -8.77      76.91    2209
 Total   725      -       946.27   41379

Variance, using the definition formula:
$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{946.27}{12} = 78.86 $$
OR, using the computational formula:
$$ s^2 = \frac{n \sum X^2 - \left( \sum X \right)^2}{n(n - 1)} = \frac{13(41379) - (725)^2}{13(12)} = 78.86 $$
Standard deviation:
$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{s^2} = \sqrt{78.86} = 8.88 $$
Solution: For Pittsburgh,
$$ \bar{X} = \frac{\sum X}{n} = \frac{539}{13} = 41.46 $$

    n      X    X − X̄    (X − X̄)²      X²
    1     64    22.54     508.05    4096
    2     54    12.54     157.25    2916
    3     40    -1.46       2.13    1600
    4     32    -9.46      89.49    1024
    5     46     4.54      20.61    2116
    6     44     2.54       6.45    1936
    7     42     0.54       0.29    1764
    8     41    -0.46       0.21    1681
    9     40    -1.46       2.13    1600
   10     40    -1.46       2.13    1600
   11     34    -7.46      55.65    1156
   12     32    -9.46      89.49    1024
   13     30   -11.46     131.33     900
 Total   539      -      1065.21   23413

Variance, using the definition formula:
$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{1065.21}{12} = 88.77 $$
OR, using the computational formula:
$$ s^2 = \frac{n \sum X^2 - \left( \sum X \right)^2}{n(n - 1)} = \frac{13(23413) - (539)^2}{13(12)} = 88.77 $$
Standard deviation:
$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{s^2} = \sqrt{88.77} = 9.42 $$
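Solution: Coefficient of variation
$$ \text{C.V.} = \frac{s}{\bar{X}} \times 100\% $$
$$ \text{C.V.}_{\text{Houston}} = \frac{8.88}{55.77} \times 100\% = 15.92\% $$
$$ \text{C.V.}_{\text{Pittsburgh}} = \frac{9.42}{41.46} \times 100\% = 22.72\% $$
Since $\text{C.V.}_{\text{Pittsburgh}} > \text{C.V.}_{\text{Houston}}$,
∴ Pittsburgh is more variable than Houston.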
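The hand calculation in Example (1) can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names (`variance_definition`, `variance_shortcut`, `coefficient_of_variation`) are illustrative choices, not part of the tutorial.

```python
from math import sqrt

HOUSTON = [75, 71, 64, 56, 53, 55, 47, 55, 52, 50, 50, 50, 47]
PITTSBURGH = [64, 54, 40, 32, 46, 44, 42, 41, 40, 40, 34, 32, 30]


def variance_definition(xs):
    """Sample variance as sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Sample variance as (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


def coefficient_of_variation(xs):
    """Standard deviation as a percentage of the mean."""
    mean = sum(xs) / len(xs)
    return sqrt(variance_shortcut(xs)) / mean * 100


for name, data in (("Houston", HOUSTON), ("Pittsburgh", PITTSBURGH)):
    var = variance_shortcut(data)
    print(name,
          "variance:", round(var, 2),
          "std dev:", round(sqrt(var), 2),
          "C.V. %:", round(coefficient_of_variation(data), 2))
```

Both variance functions agree up to floating-point error. Because the code works from the unrounded mean, a result can differ from the hand-rounded table in the last decimal place.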
3) Measures of Position
Percentiles
I) Given a data set of size $n$, the percentile rank of a certain value $X$ is
$$ \text{Percentile} = \frac{(\text{no. of values below } X) + 0.5}{n} \times 100\% $$
II) Given a data set of size $n$, the value $X$ corresponding to the $P$-th
percentile is located in the sorted data at
$$ \text{Rank of } X = c = \frac{n \cdot P}{100} $$
Example (2):
The average weekly earnings in dollars for various industries are listed below.
804 736 659 489 777 623 597 524 228 555
a) Find the percentile ranks of 777, 623, and 555.
b) Which values correspond to the 40th and 65th percentiles?
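Solution:
➢ By sorting data:
228 489 524 555 597 623 659 736 777 804
a) $$ \text{Percentile} = \frac{(\text{no. of values below } X) + 0.5}{\text{total no. of values}} \times 100\% $$
✓ For 555 ➔ $P = \frac{3 + 0.5}{10} \times 100\% = 35\%$ ➔ $P_{35}$
✓ For 623 ➔ $P = \frac{5 + 0.5}{10} \times 100\% = 55\%$ ➔ $P_{55}$
✓ For 777 ➔ $P = \frac{8 + 0.5}{10} \times 100\% = 85\%$ ➔ $P_{85}$
b) $P_{40} = ?$ ➔ $c = \frac{n \cdot P}{100} = \frac{10(40)}{100} = 4$ (integer)
Under the usual textbook convention for an integer $c$, the percentile is taken halfway between the 4th and 5th sorted values, which gives $P_{40} = \frac{555 + 597}{2} = 576$.
$P_{65} = ?$ ➔ $c = \frac{10(65)}{100} = 6.5$ (not an integer), which rounds up to 7, so the percentile is the 7th sorted value.
✓ ∴ $P_{65} = 659$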
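Both percentile calculations in Example (2) can be sketched in Python. The function names are illustrative. The handling of the rank c = n·P/100 is an assumption based on the common textbook convention (average the c-th and (c+1)-th values when c is an integer, otherwise round c up); other percentile conventions exist and give slightly different values.

```python
import math

EARNINGS = [804, 736, 659, 489, 777, 623, 597, 524, 228, 555]


def percentile_rank(data, x):
    """Percentile rank of x: (values below x + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)


def value_at_percentile(data, p):
    """Value at the p-th percentile, using rank c = n * p / 100.

    Assumed convention: if c is an integer, average the c-th and
    (c + 1)-th sorted values; otherwise round c up and take that value.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    c = len(ordered) * p / 100
    if c == int(c):
        c = int(c)
        return (ordered[c - 1] + ordered[c]) / 2
    return ordered[math.ceil(c) - 1]


for x in (555, 623, 777):
    print(x, "has percentile rank", percentile_rank(EARNINGS, x))
for p in (40, 65):
    print("P%d =" % p, value_at_percentile(EARNINGS, p))
```

Multiplying by 100 before dividing by n keeps the percentile ranks free of small floating-point artifacts for data sets like this one.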
Example (6): Check each data set for outliers.
a) 88, 72, 97, 84, 86, 85, 100
b) 145, 119, 122, 118, 125, 116
Solution:
a) 88, 72, 97, 84, 86, 85, 100
Sort the data: 72, 84, 85, 86, 88, 97, 100
$Q_1 = 84$, $Q_3 = 97$
$IQR = Q_3 - Q_1 = 97 - 84 = 13$
$(Q_1 - 1.5\,IQR,\; Q_3 + 1.5\,IQR) = (84 - 1.5 \times 13,\; 97 + 1.5 \times 13) = (64.5,\; 116.5)$
Since all data fall in $(64.5,\; 116.5)$, there are no outliers.
b) 145, 119, 122, 118, 125, 116
Sort the data: 116, 118, 119, 122, 125, 145
$Q_2 = \frac{119 + 122}{2} = 120.5$
$Q_1 = 118$, $Q_3 = 125$
$IQR = Q_3 - Q_1 = 125 - 118 = 7$
$(Q_1 - 1.5\,IQR,\; Q_3 + 1.5\,IQR) = (118 - 1.5 \times 7,\; 125 + 1.5 \times 7) = (107.5,\; 135.5)$
Since 145 lies above 135.5, the value 145 is an outlier.
Note: Any value outside the range $(Q_1 - 1.5\,IQR,\; Q_3 + 1.5\,IQR)$ is called an outlier.
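The outlier check in Example (6) can be sketched as follows. The quartile method is an assumption chosen to match the worked values: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out when n is odd. The function names are illustrative.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def find_outliers(data):
    """Return the fences (low, high) and the values lying outside them."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (low, high), [x for x in data if x < low or x > high]


SET_A = [88, 72, 97, 84, 86, 85, 100]
SET_B = [145, 119, 122, 118, 125, 116]

print(find_outliers(SET_A))  # fences (64.5, 116.5), no outliers
print(find_outliers(SET_B))  # fences (107.5, 135.5), outlier 145
```

Statistical software often uses interpolated quartiles, so library routines may report slightly different fences for the same data.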
Example (7): Construct a boxplot for the following data.
30 34 29 30 34 29 31
33 34 27 30 27 34 32
n = 14
Solution :
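A boxplot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes that summary for the Example (7) data. It assumes the median-of-halves quartile method, and the function name `five_number_summary` is illustrative.

```python
from statistics import median

DATA = [30, 34, 29, 30, 34, 29, 31,
        33, 34, 27, 30, 27, 34, 32]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


print(five_number_summary(DATA))
```

Under this quartile convention the box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum. For this data set Q3 equals the maximum, so the upper whisker has zero length.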
Example (1):
15 19 26 28 31 35 38 38 39 41
47 48 53 53 54 61 62 67 68 76
79 86 88 89 98
Solution:

 Stem (tens)   Leaf (ones)
      1        5 9
      2        6 8
      3        1 5 8 8 9
      4        1 7 8
      5        3 3 4
      6        1 2 7 8
      7        6 9
      8        6 8 9
      9        8

Since $n = 25$, the median is the value with rank $\frac{25 + 1}{2} = 13$.
Therefore, the median = 53.
Since the values 38 and 53 each occur twice, more often than any other value,
the data are bimodal, with modes 38 and 53.
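The stem and leaf table, the median, and the modes for this data set can be reproduced with a short Python sketch. The helper name `stem_and_leaf` is illustrative, and it assumes two-digit non-negative integers so that the stem is the tens digit and the leaf is the ones digit.

```python
from collections import defaultdict
from statistics import median, multimode

VALUES = [15, 19, 26, 28, 31, 35, 38, 38, 39, 41,
          47, 48, 53, 53, 54, 61, 62, 67, 68, 76,
          79, 86, 88, 89, 98]


def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    return dict(plot)


for stem, leaves in stem_and_leaf(VALUES).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

print("median:", median(VALUES))    # 13th of 25 sorted values
print("modes:", multimode(VALUES))  # every value tied for most frequent
```

`statistics.multimode` returns all values tied for the highest frequency, which suits bimodal data like this.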
Skewness of Data
(i) Symmetric: If the mode = the median = the mean.
(ii) Left or Negative Skewed: If the mean < the median < the mode.
(iii) Right or Positive Skewed: If the mode < the median < the mean.
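The comparison of mode, median, and mean can be sketched in Python. It uses the usual rule of thumb that a right (positively) skewed data set has mode < median < mean and a left (negatively) skewed one has mean < median < mode. The function name `classify_skew` and the small test lists are illustrative; the Houston data come from Example (1).

```python
from statistics import mean, median, multimode


def classify_skew(data):
    """Classify skewness by comparing mode, median, and mean.

    Returns "undetermined" when the mode is not unique or when the
    three measures do not fall in one of the three orderings.
    """
    modes = multimode(data)
    if len(modes) != 1:
        return "undetermined"
    mode, med, avg = modes[0], median(data), mean(data)
    if mode == med == avg:
        return "symmetric"
    if mode < med < avg:
        return "right (positive) skewed"
    if mode > med > avg:
        return "left (negative) skewed"
    return "undetermined"


HOUSTON = [75, 71, 64, 56, 53, 55, 47, 55, 52, 50, 50, 50, 47]

# Houston: mode 50 < median 53 < mean 55.77
print(classify_skew(HOUSTON))
```

This is only a rule of thumb: many real data sets have tied modes or measures that do not line up in either order, in which case the sketch reports "undetermined".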