Probability and Statistics (Tutorial 2)


Probability And Statistics

Eng./ Mohammed Abdulmonem Borg


2) Measures of Variation
Population Variance:
Definition: When the population is finite and consists of $N$ values $x_1, x_2, \dots, x_N$ with mean $\mu$, we may define the population variance as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample Variance:
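Definition: If the $n$ observations in a sample are denoted by $x_1, x_2, \dots, x_n$ and their mean by $\bar{x}$, the sample variance is
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{n\left(\sum_{i=1}^{n} x_i^2\right) - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$$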

Note: The Standard Deviation is the positive square root of the variance
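
The definitions above can be sketched in Python as follows. This is a minimal illustration, not part of the original tutorial: the function names and the small data list are assumptions chosen for the example. It shows that the definition formula and the shortcut formula for the sample variance give the same result.

```python
import math

def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Definition formula: squared deviations from the sample mean over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut formula: [n * sum(x^2) - (sum x)^2] / [n * (n - 1)]."""
    n = len(values)
    return (n * sum(x * x for x in values) - sum(values) ** 2) / (n * (n - 1))

def sample_std(values):
    """Standard deviation: positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Illustrative values only (hypothetical, not taken from the tutorial).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))
print(sample_variance(data), sample_variance_shortcut(data))
print(sample_std(data))
```

The shortcut formula avoids computing the mean first, which is convenient by hand, but both forms describe the same quantity.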
Sample Range:
Definition: If the $n$ observations in a sample are denoted by $x_1, x_2, \dots, x_n$, the sample range is
$$r = \max(x_i) - \min(x_i)$$

Coefficient of Variation:
Definition: The coefficient of variation (C.V) is a statistical measure of the relative dispersion of the data around the mean. It expresses the standard deviation $S$ as a percentage of the mean $\bar{X}$:
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
Example (1):
Data for a sample of 13 tall buildings in each of two cities are listed below. Find the variance (using both formulas) and the standard deviation for each city, then determine which data set is more variable.

• Houston: 75, 71, 64, 56, 53, 55, 47, 55, 52, 50, 50, 50, 47
• Pittsburgh: 64, 54, 40, 32, 46, 44, 42, 41, 40, 40, 34, 32, 30

Solution: For Houston
$$\bar{X} = \frac{\sum X}{n} = \frac{725}{13} = 55.77$$

n       X      X − X̄     (X − X̄)²    X²
1       75     19.23     369.79      5625
2       71     15.23     231.95      5041
3       64     8.23      67.73       4096
4       56     0.23      0.05        3136
5       53     -2.77     7.67        2809
6       55     -0.77     0.59        3025
7       47     -8.77     76.91       2209
8       55     -0.77     0.59        3025
9       52     -3.77     14.21       2704
10      50     -5.77     33.29       2500
11      50     -5.77     33.29       2500
12      50     -5.77     33.29       2500
13      47     -8.77     76.91       2209
Total   725    -         946.27      41379

• Variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n-1} = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n-1)}$$
$$S^2 = \frac{\sum (X - \bar{X})^2}{n-1} = \frac{946.27}{12} = 78.86$$
OR
$$S^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n-1)} = \frac{13(41379) - (725)^2}{13(12)} = 78.86$$

• Standard deviation:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} = \sqrt{S^2}$$
$$S = \sqrt{78.86} = 8.88$$
Solution: For Pittsburgh
$$\bar{X} = \frac{\sum X}{n} = \frac{539}{13} = 41.46$$

n       X      X − X̄     (X − X̄)²    X²
1       64     22.54     508.05      4096
2       54     12.54     157.25      2916
3       40     -1.46     2.13        1600
4       32     -9.46     89.49       1024
5       46     4.54      20.61       2116
6       44     2.54      6.45        1936
7       42     0.54      0.29        1764
8       41     -0.46     0.21        1681
9       40     -1.46     2.13        1600
10      40     -1.46     2.13        1600
11      34     -7.46     55.65       1156
12      32     -9.46     89.49       1024
13      30     -11.46    131.33      900
Total   539    -         1065.21     23413

• Variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n-1} = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n-1)}$$
$$S^2 = \frac{\sum (X - \bar{X})^2}{n-1} = \frac{1065.21}{12} = 88.77$$
OR
$$S^2 = \frac{n\left(\sum X^2\right) - \left(\sum X\right)^2}{n(n-1)} = \frac{13(23413) - (539)^2}{13(12)} = 88.77$$

• Standard deviation:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} = \sqrt{S^2}$$
$$S = \sqrt{88.77} = 9.42$$
Solution (continued):
• Coefficient of variation:
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
$$C.V_{Houston} = \frac{8.88}{55.77} \times 100\% = 15.92\%$$
$$C.V_{Pittsburgh} = \frac{9.42}{41.46} \times 100\% = 22.72\%$$
Since $C.V_{Pittsburgh} > C.V_{Houston}$,
∴ Pittsburgh is more variable than Houston.
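
The hand calculation for Example (1) can be cross-checked with Python's standard `statistics` module. This is only a sketch: the helper name `coefficient_of_variation` is an assumption made for this illustration, while `statistics.mean`, `statistics.variance` and `statistics.stdev` are standard-library functions (the last two use the sample formulas with $n - 1$).

```python
import statistics

houston = [75, 71, 64, 56, 53, 55, 47, 55, 52, 50, 50, 50, 47]
pittsburgh = [64, 54, 40, 32, 46, 44, 42, 41, 40, 40, 34, 32, 30]

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

for name, values in (("Houston", houston), ("Pittsburgh", pittsburgh)):
    print(
        name,
        round(statistics.variance(values), 2),   # sample variance
        round(statistics.stdev(values), 2),      # sample standard deviation
        round(coefficient_of_variation(values), 2),
    )

# The city with the larger C.V is the more variable one.
more_variable = max(
    ("Houston", houston), ("Pittsburgh", pittsburgh),
    key=lambda pair: coefficient_of_variation(pair[1]),
)[0]
print(more_variable)
```

Because the code keeps full precision throughout, its variances can differ from the hand-rounded table values in the last decimal place. The comparison of the two coefficients of variation is unaffected.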
3) Measures of Position
Percentiles
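I) Given a data set of size $n$, the percentile rank of a certain value $X$ is
$$\text{Percentile} = \frac{(\text{no. of values below } X) + 0.5}{n} \times 100\%$$

II) Given a data set of size $n$, to find the value $X$ corresponding to a certain percentile $P$, sort the data and compute
$$\text{Rank of } X = c = \frac{n \cdot P}{100}$$
If $c$ is an integer, $X$ is the average of the $c$-th and $(c+1)$-th values. If $c$ is not an integer, round it up to the next integer and take the value at that position.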
Example (2):
The average weekly earnings in dollars for various industries are listed below.
• 804 736 659 489 777 623 597 524 228 555
a) Find the percentile rank of 777, 623 and 555.
b) What values correspond to the 40th and 65th percentiles?
Solution:

➢ By sorting the data:
228 489 524 555 597 623 659 736 777 804

a) $$\text{Percentile} = \frac{(\text{no. of values below } X) + 0.5}{\text{total no. of values}} \times 100\%$$

✓ For 555 ➔ $P = \frac{3 + 0.5}{10} \times 100\% = 35\%$ ➔ $P_{35}$
✓ For 623 ➔ $P = \frac{5 + 0.5}{10} \times 100\% = 55\%$ ➔ $P_{55}$
✓ For 777 ➔ $P = \frac{8 + 0.5}{10} \times 100\% = 85\%$ ➔ $P_{85}$
b) $P_{40} = ?$ ➔ $c = \frac{n \cdot P}{100} = \frac{10(40)}{100} = 4$ (integer)

✓ ∴ $P_{40} = \frac{\text{4th value} + \text{5th value}}{2} = \frac{555 + 597}{2} = 576$

$P_{65} = ?$ ➔ $c = \frac{n \cdot P}{100} = \frac{10(65)}{100} = 6.5$ (non-integer), rounded up to 7

✓ ∴ $P_{65}$ = 7th value = 659
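
Both percentile procedures used in Example (2) can be sketched in Python. The function names are assumptions made for this illustration, and `percentile_value` assumes an integer percentile with 0 < p < 100 so that the position rule stays inside the data.

```python
def percentile_rank(data, x):
    """Percentile rank of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)

def percentile_value(data, p):
    """Value at the p-th percentile using the position c = n * p / 100.

    Assumes p is an integer with 0 < p < 100.
    """
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * p, 100)
    if remainder == 0:
        # c is an integer: average the c-th and (c+1)-th values (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # c is not an integer: round up and take the value at that position.
    return ordered[whole]

earnings = [804, 736, 659, 489, 777, 623, 597, 524, 228, 555]

for x in (555, 623, 777):
    print(x, percentile_rank(earnings, x))

for p in (40, 65):
    print(p, percentile_value(earnings, p))
```

Using `divmod` on integers avoids testing a floating-point number for being whole, which is why the sketch restricts p to integers.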
Example (6): Check each data set for outliers.
a) 88, 72, 97, 84, 86, 85, 100
b) 145, 119, 122, 118, 125, 116
Solution:
a) 88, 72, 97, 84, 86, 85, 100
• Sort the data: 72, 84, 85, 86, 88, 97, 100

$Q_1 = 84$
$Q_3 = 97$
$IQR = 97 - 84 = 13$

$$(Q_1 - 1.5\,IQR,\ Q_3 + 1.5\,IQR) = (84 - 1.5 \times 13,\ 97 + 1.5 \times 13) = (64.5,\ 116.5)$$

Since all data fall in $(64.5,\ 116.5)$, there are no outliers.
b) 145, 119, 122, 118, 125, 116
• Sort the data: 116, 118, 119, 122, 125, 145

$Q_2 = \frac{119 + 122}{2} = 120.5$
$Q_1 = 118$
$Q_3 = 125$
$IQR = 125 - 118 = 7$

$$(Q_1 - 1.5\,IQR,\ Q_3 + 1.5\,IQR) = (118 - 1.5 \times 7,\ 125 + 1.5 \times 7) = (107.5,\ 135.5)$$

Since 145 falls outside $(107.5,\ 135.5)$, 145 is an outlier.
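
The outlier check in Example (6) can be sketched in Python. The function names are assumptions made for this illustration. The quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when $n$ is odd), which matches the hand calculation here but is not the only quartile convention in use.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2          # size of each half, middle value excluded if n is odd
    q1 = statistics.median(ordered[:half])
    q2 = statistics.median(ordered)
    q3 = statistics.median(ordered[-half:])
    return q1, q2, q3

def find_outliers(data):
    """Return the interval (Q1 - 1.5 IQR, Q3 + 1.5 IQR) and the values outside it."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (low, high), [x for x in data if x < low or x > high]

set_a = [88, 72, 97, 84, 86, 85, 100]
set_b = [145, 119, 122, 118, 125, 116]

print(find_outliers(set_a))
print(find_outliers(set_b))
```

Library routines for quantiles often interpolate between values, so they can give slightly different quartiles and therefore different fences than this method.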
Box Plots
Box Plots:

Note: Any value outside the interval $(Q_1 - 1.5\,IQR,\ Q_3 + 1.5\,IQR)$ is called an outlier.
Example (7): Construct a boxplot for the following data.
30 34 29 30 34 29 31
33 34 27 30 27 34 32

Solution: $n = 14$

• Sort the data in order: 27 27 29 29 30 30 30 31 32 33 34 34 34 34

$Q_2 = \frac{30 + 31}{2} = 30.5$
$Q_1 = 29$
$Q_3 = 34$
$IQR = 34 - 29 = 5$

[Figure: box plot on a horizontal scale from 25 to 35, with the values 27, 29, 30.5 and 34 marked.]
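
The numbers needed to draw the box plot in Example (7) form the five-number summary (minimum, $Q_1$, $Q_2$, $Q_3$, maximum). A minimal Python sketch follows. The function name is an assumption made for this illustration, and the quartiles are again computed as medians of the lower and upper halves of the sorted data.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q2 = statistics.median(ordered)
    q3 = statistics.median(ordered[-half:])
    return ordered[0], q1, q2, q3, ordered[-1]

values = [30, 34, 29, 30, 34, 29, 31,
          33, 34, 27, 30, 27, 34, 32]

summary = five_number_summary(values)
print(summary)
```

For this data set the maximum equals $Q_3$, so the box extends all the way to the largest value and there is no separate right whisker.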
Stem and Leaf Plot
Stem and Leaf Plot:
A stem and leaf plot is a data plot that uses part of each data value as the stem and part of the data value as the leaf to form groups or classes.
Example (1):
The number of visitors to the Historic Museum for 25 randomly selected hours is
shown. Construct a stem and leaf plot for the data
15 53 48 19 38 86 63 98 79 38
62 89 67 39 26 28 35 54 88 76
31 47 53 41 68
Find also the mode and the median.
Solution: Rearrange the data in ascending order

15 19 26 28 31 35 38 38 39 41
47 48 53 53 54 62 63 67 68 76
79 86 88 89 98

Stem (tens) | Leaf (ones)
1           | 5 9
2           | 6 8
3           | 1 5 8 8 9
4           | 1 7 8
5           | 3 3 4
6           | 2 3 7 8
7           | 6 9
8           | 6 8 9
9           | 8

Since $n = 25$, the median is the value with rank $\frac{25 + 1}{2} = 13$.
Therefore, the median = 53.

Since the values 38 and 53 each occur twice, more often than any other value, the data are bimodal, and the modes are 38 and 53.
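
The construction of the stem and leaf plot can be sketched in Python. The function name is an assumption made for this illustration, and the values are copied from the problem statement of the example. `statistics.multimode` is a standard-library function available from Python 3.8.

```python
import statistics
from collections import defaultdict

visitors = [15, 53, 48, 19, 38, 86, 63, 98, 79, 38,
            62, 89, 67, 39, 26, 28, 35, 54, 88, 76,
            31, 47, 53, 41, 68]

def stem_and_leaf(data):
    """Group two-digit values by tens digit (stem) and list the ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(visitors)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))

median = statistics.median(visitors)            # middle value of the sorted data
modes = sorted(statistics.multimode(visitors))  # all values with the highest frequency
print(median, modes)
```

Sorting before splitting each value keeps the leaves of every stem in ascending order, so the median and the repeated values can be read directly from the plot.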
Skewness of Data
Skewness of data:
(i) Symmetric: If the mode = the median = the mean.

(ii) Left or Negative Skewed: If the mean < the median < the mode.

(iii) Right or Positive Skewed: If the mode < the median < the mean.
