LECTURE 7

Measures of Variation

Measures of Dispersion or Variability


Measures of dispersion describe the spread of the data. They include the range,
interquartile range, standard deviation and variance.

Range and Interquartile Range


The range is given as the smallest and largest observations. This is the simplest measure
of variability. Note that in statistics (unlike physics) a range is given by two numbers, not by
the difference between the smallest and largest. For some data it is very useful, because one
would want to know these numbers, for example the ages of the youngest and oldest
participants in a sample. If outliers are present it may give a distorted impression
of the variability of the data, since only two observations are included in the estimate.
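A minimal Python sketch of reporting the range the way this lecture defines it, as the pair (smallest, largest) rather than their difference. The function name `data_range` and the participant ages are hypothetical, chosen only for illustration.

```python
def data_range(values):
    """Return the range as (smallest, largest), following the statistical convention above."""
    if not values:
        raise ValueError("need at least one observation")
    return min(values), max(values)


# Hypothetical ages of study participants (illustrative data only)
ages = [34, 29, 41, 57, 23, 38]
print(data_range(ages))  # (23, 57)
```

Reporting both numbers keeps the information about where the data sit, which a single difference (57 - 23 = 34) would lose.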

Quartiles and Interquartile Range


The quartiles, namely the lower quartile, the median and the upper quartile, divide the
data into four equal parts; that is, there will be approximately equal numbers of
observations in the four sections (and exactly equal if the sample size is divisible by four
and the measures are all distinct). Note that there are in fact only three quartiles and
these are points, not proportions. It is a common misuse of language to refer to being ‘in
the top quartile’. Instead one should refer to being ‘in the top quarter’ or ‘above the top
quartile’. However, the meaning of the first statement is clear and so the distinction is
really only useful to display a superior knowledge of statistics! The quartiles are calculated
in a similar way to the median: first arrange the data in size order and determine the
median, using the method described above. Now split the data in two (the lower half and
upper half, based on the median).

The first quartile is the middle observation of the lower half, and the third quartile is the
middle observation of the upper half. This process is demonstrated in the example below.
The interquartile range is a useful measure of variability and is given by the lower and
upper quartiles. The interquartile range is robust to outliers and, whatever the
distribution of the data, we know that about 50% of observations (the middle half) lie within
the interquartile range.

Example Calculation of the Quartiles


Suppose we had 18 birth weights arranged in increasing order.
1.51, 1.53, 1.55, 1.55, 1.79, 1.81, 2.10, 2.15, 2.18,
2.22, 2.35, 2.37, 2.40, 2.40, 2.45, 2.78, 2.81, 2.85.
The median is the average of the 9th and 10th observations: (2.18 + 2.22)/2 = 2.20 kg. The
first half of the data has 9 observations, so the first quartile is the 5th observation, namely
1.79 kg. Similarly, the 3rd quartile is the 5th observation in the upper half of the
data, or the 14th observation, namely 2.40 kg. Hence the interquartile range is 1.79 to
2.40 kg.
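The median-of-halves procedure above can be sketched in Python as follows. The helper names `median` and `quartiles` are hypothetical, and the handling of an odd number of observations (leaving the median out of both halves) is an assumption, since the lecture only works through an even-sized sample and textbooks differ on this point.

```python
def median(sorted_values):
    """Median of a list that is already sorted in increasing order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (lower quartile, median, upper quartile) by the median-of-halves method.

    Assumption: when n is odd, the median observation is left out of both halves,
    so other software may give slightly different quartiles for odd n.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


birth_weights = [1.51, 1.53, 1.55, 1.55, 1.79, 1.81, 2.10, 2.15, 2.18,
                 2.22, 2.35, 2.37, 2.40, 2.40, 2.45, 2.78, 2.81, 2.85]
q1, q2, q3 = quartiles(birth_weights)
print(q1, round(q2, 2), q3)  # 1.79 2.2 2.4
```

For the 18 birth weights this reproduces the worked values: a lower quartile of 1.79 kg, a median of 2.20 kg and an upper quartile of 2.40 kg.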

Standard Deviation and Variance


The standard deviation of a sample (s) is calculated as follows:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

The expression is interpreted as follows: from each individual observation x_i subtract
the mean (x̄), then square this difference. Next, add up all n squared differences.
This sum is then divided by (n-1). The result is known as the sample variance, s².
The variance is expressed in squared units, so we take the square root to return to the
original units, which gives the standard deviation, s.

Suppose we had 18 birth weights arranged in increasing order.


1.51, 1.53, 1.55, 1.55, 1.79, 1.81, 2.10, 2.15, 2.18,
2.22, 2.35, 2.37, 2.40, 2.40, 2.45, 2.78, 2.81, 2.85.

Find the sample variance and standard deviation.
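One way to check a hand calculation is a short Python sketch that applies the sample-variance formula directly: subtract the mean, square, sum, divide by (n-1), then take the square root. The function name `sample_variance` is hypothetical; `statistics.variance` from Python's standard library is used only as a cross-check, since it also divides by n - 1.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


birth_weights = [1.51, 1.53, 1.55, 1.55, 1.79, 1.81, 2.10, 2.15, 2.18,
                 2.22, 2.35, 2.37, 2.40, 2.40, 2.45, 2.78, 2.81, 2.85]

s2 = sample_variance(birth_weights)  # in kg squared
s = math.sqrt(s2)                    # back in kg
print(round(s2, 4), round(s, 4))     # 0.1996 0.4467

# Cross-check against the standard library implementation
assert math.isclose(s2, statistics.variance(birth_weights))
```

For these birth weights the mean is 38.80/18 ≈ 2.156 kg, giving s² ≈ 0.1996 kg² and s ≈ 0.447 kg.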

Summary:
What are the 4 main measures of variability?
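Range: The smallest and largest values (in everyday use, often the difference between them)
Interquartile range: The range of the middle half of a distribution
Standard Deviation: Roughly, the typical distance of observations from the mean (the square root of the variance)
Variance: Average of squared distances from the mean (for a sample, the sum of squared distances divided by n-1).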

Exercises:
1. Find sample variance and the standard deviation for the following data set: 1245, 1255, 1654,
1547, 1787, 1989, 1878, 2011, 2145, 2545, 2656.

2. You survey households in your area to find the average rent they are paying. Find the sample
variance and the standard deviation from the following data in pesos: 1550, 1700, 900, 850, 1000,
950.

3. Find the range, interquartile range, sample variance and standard deviation of the following data:
132, 144, 895, 441, 623, 325, 366, 412, 530, 332, 225, 239, 661, 754, 354, 554, 874, 771, 664, 334,
198, 178, 213, 423, 324, 133, 534, 654, 541, 367, 569, 756, 881, 159.
LECTURE 8

Probability and Statistics


Probability and statistics are two important branches of mathematics. Probability is all
about chance, whereas statistics is about how we collect and handle data using different
techniques.

What is Probability?
Probability measures how likely the outcome of a random event is: it tells us the extent
to which an event is likely to happen. For example, when we flip a coin, what is the
chance of getting a head? The answer depends on the possible outcomes. Here there are
two equally likely outcomes, head or tail, so the probability of getting a head is ½.
In short, probability is a measure of the likelihood of an event. The formula for
probability is given by:
P(E) = Number of Favourable Outcomes/Number of total outcomes
P(E) = n(E)/n(S)

Here,
n(E) = Number of outcomes favourable to event E
n(S) = Total number of possible outcomes
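The formula P(E) = n(E)/n(S) can be sketched in Python by counting outcomes in a finite sample space of equally likely outcomes. The helper name `classical_probability` is hypothetical, and exact fractions (`fractions.Fraction`) are used so the answers match hand-worked fractions.

```python
from fractions import Fraction


def classical_probability(sample_space, is_favourable):
    """P(E) = n(E) / n(S), assuming every outcome in the sample space is equally likely."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if is_favourable(o)]
    return Fraction(len(favourable), len(outcomes))


coin = ["head", "tail"]
print(classical_probability(coin, lambda o: o == "head"))  # 1/2

die = range(1, 7)
print(classical_probability(die, lambda o: o % 2 == 0))    # 1/2 (even score on one die)
```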

Workshop
1. There are 18 tickets marked with the numbers 1 to 18. What is the probability of selecting
a ticket with the following property:
a) an even number
b) a number divisible by 3
c) a prime number
d) a number divisible by 6
2. Determine the probability of the following results when throwing two dice (a red one
and a blue one):
a) the sum equals 8
b) the sum is divisible by 5
c) the sum is even

What is Statistics?
Statistics is the study of the collection, analysis, interpretation, presentation, and
organization of data. It provides methods for collecting and summarizing data, with
applications on both small and large scales: whether it is the study of a country's
population or its economy, statistics is used for the analysis. Statistics has a
huge scope in many fields such as sociology, psychology, geology, weather forecasting,
etc. The data collected for analysis can be quantitative or qualitative. Quantitative
data are of two types: discrete and continuous. Discrete data take separate, countable
values (for example, the number of children in a family), whereas continuous data can
take any value within a range (for example, height or weight). There are many terms
and formulas used in this subject.

Terms Used in Probability and Statistics


Several terms are used in probability and statistics, such as:
• Random Experiment
• Sample Space
• Random variables
• Expected Value
• Independence
• Variance
• Mean
Standard probability definition
Let a random experiment meet the following conditions:
• the number of possible outcomes is finite
• all outcomes have the same chance to occur
• no two outcomes can occur at the same time

Then the probability of an event A is

$$P(A) = \frac{m}{n}$$

where n = number of all possible outcomes and m = number of outcomes favorable
to the event A.
It always holds that 0 ≤ P(A) ≤ 1.
Probability of an impossible event: P(A) = 0
Probability of a sure event: P(A) = 1

Random Experiment
An experiment whose result cannot be predicted until it is observed is called a random
experiment. For example, when we throw a die, the result is uncertain to us: we can get
any number from 1 to 6. Hence, this experiment is random.

Sample Space
A sample space is the set of all possible results or outcomes of a random experiment.
For example, if we throw a die, the sample space for this experiment consists of all the
possible outcomes of the throw:

Sample Space = {1, 2, 3, 4, 5, 6}

Random Variables
The variables which denote the possible outcomes of a random experiment are called
random variables. They are of two types:
• Discrete Random Variables
• Continuous Random Variables
Discrete random variables take only distinct, countable values, whereas continuous
random variables can take any value in an interval, so they have infinitely many
(uncountably many) possible values.

Independent Event
When the occurrence of one event has no impact on the probability of another event,
the two events are termed independent of each other. For example, if you flip a coin and
at the same time throw a die, the event of getting a ‘head’ is independent of the event of
getting a 6 on the die.
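A standard way to check independence is the product rule: events A and B are independent when P(A and B) = P(A) · P(B). The Python sketch below enumerates the 12 equally likely (coin, die) pairs, assuming a fair coin and a fair die; the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Flipping a coin and throwing a die together: 2 x 6 = 12 equally likely pairs
space = list(product(["H", "T"], range(1, 7)))


def prob(event):
    """Fraction of the equally likely (coin, die) pairs for which `event` is true."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


p_head = prob(lambda o: o[0] == "H")
p_six = prob(lambda o: o[1] == 6)
p_both = prob(lambda o: o[0] == "H" and o[1] == 6)

print(p_head, p_six, p_both)     # 1/2 1/6 1/12
print(p_both == p_head * p_six)  # True, so the two events are independent
```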

Mean
The mean of a random variable is the average of its possible values, each weighted by
its probability. In simple terms, it is the long-run average of the outcomes when the
random experiment is repeated again and again, n times. It is also called the expectation
of a random variable.

Expected Value
Expected value is the mean of a random variable. It is the value we expect on average
when a random experiment is repeated many times. It is also called expectation, mathematical
expectation or first moment. For example, if we roll a fair die having six faces, then the
expected value is the average of all the possible outcomes, (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
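As a sketch, the expected value of a discrete random variable can be computed as E[X] = Σ x · P(X = x). The Python below assumes a fair die, so each face has probability 1/6; the function name `expected_value` is hypothetical.

```python
from fractions import Fraction

# Fair six-sided die (assumption): each face has probability 1/6
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def expected_value(pmf):
    """E[X] = sum of x * P(X = x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())


print(expected_value(pmf))         # 7/2
print(float(expected_value(pmf)))  # 3.5
```

Weighting by probability matters once outcomes are not equally likely; for a fair die it reduces to the simple average given above.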

Variance
The variance tells us how the values of the random variable are spread around the mean.
It is the expected squared distance from the mean: Var(X) = E[(X − μ)²], where μ is the mean.
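The variance of a discrete random variable is the probability-weighted average of the squared distances from the mean, Var(X) = E[(X − μ)²]. A minimal Python sketch for a fair die (assumed fair; the function name `variance` is hypothetical):

```python
from fractions import Fraction

# Fair six-sided die (assumption): each face has probability 1/6
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def variance(pmf):
    """Var(X) = E[(X - mu)^2]: probability-weighted average squared distance from the mean."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


print(variance(pmf))  # 35/12
```

Here μ = 3.5, and the squared distances 6.25, 2.25 and 0.25 each occur twice, giving 17.5/6 = 35/12 ≈ 2.92.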

Exercises

Example 1: A bucket contains 5 blue, 4 green and 5 red balls. Sudheer is asked to pick
2 balls at random from the bucket without replacement, and then to pick one more ball.
What is the probability that he picks 2 green balls and then 1 blue ball?

Solution: Total number of balls = 14

Probability of drawing

1 green ball = 4/14

another green ball = 3/13 (13 balls remain, 3 of them green)

1 blue ball = 5/12 (12 balls remain, 5 of them blue)


Probability of picking 2 green balls and then 1 blue ball = 4/14 × 3/13 × 5/12 = 60/2184 = 5/182.
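The multiplication above can be checked by brute force. This Python sketch lists every ordered way of drawing 3 of the 14 balls without replacement and counts the sequences green, green, blue, reading the question in the same order as the worked solution.

```python
from fractions import Fraction
from itertools import permutations

balls = ["blue"] * 5 + ["green"] * 4 + ["red"] * 5  # the 14 balls in the bucket

# Every ordered draw of 3 distinct balls (by position) without replacement: 14 * 13 * 12
draws = list(permutations(range(len(balls)), 3))
favourable = sum(1 for d in draws
                 if [balls[i] for i in d] == ["green", "green", "blue"])
p = Fraction(favourable, len(draws))

print(len(draws), favourable)  # 2184 60
print(p)                       # 5/182
```

Counting 4 × 3 × 5 = 60 favourable sequences out of 14 × 13 × 12 = 2184 gives the same 5/182 as multiplying the step-by-step probabilities.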

Example 2: A bowl contains 3 red, 2 black and 5 green marbles. If Ram chooses a marble at
random, what is the probability that it is not black?

Solution: Total number of marbles = 10

Red and green marbles = 3 + 5 = 8

Find the number of marbles that are not black and divide by the total number of
marbles.

So P(not black) = (number of red or green marbles)/(total number of marbles)

= 8 /10

= 4/5
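The same answer also follows from the complement rule, P(not A) = 1 − P(A). A short Python sketch computing it both ways (the variable names are hypothetical):

```python
from fractions import Fraction

marbles = {"red": 3, "black": 2, "green": 5}
total = sum(marbles.values())  # 10 marbles in the bowl

# Direct count of non-black marbles, and 1 minus the probability of black
p_not_black_direct = Fraction(marbles["red"] + marbles["green"], total)
p_not_black_complement = 1 - Fraction(marbles["black"], total)

print(p_not_black_direct, p_not_black_complement)  # 4/5 4/5
```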

Probability Distribution

A probability distribution is a mathematical function that describes the probability of
different possible values of a variable. Probability distributions are often depicted using
graphs or probability tables.

Example: Probability distribution

We can describe the probability distribution of one coin flip using a probability table:

Outcome       Heads   Tails
Probability   0.5     0.5

Common probability distributions include the binomial distribution, Poisson distribution,
and uniform distribution. Certain types of probability distributions are used in hypothesis
testing, including the standard normal distribution, the F distribution, and Student’s t
distribution.

What is a probability distribution?

A probability distribution is an idealized frequency distribution.

A frequency distribution describes a specific sample or dataset. It’s the number of times
each possible value of a variable occurs in the dataset. The number of times a value
occurs in a sample depends on its probability of occurrence. Probability is a number
between 0 and 1 that says how likely something is to occur:

• 0 means it’s impossible.
• 1 means it’s certain.

The higher the probability of a value, the higher its frequency in a sample. More
specifically, the probability of a value is its relative frequency in an infinitely large sample.
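The link between probability and relative frequency can be illustrated with a small simulation: as more throws of a fair die are simulated, the share of sixes should drift towards 1/6. The Python sketch below uses the standard `random` module with a fixed seed for reproducibility; the function name `relative_frequency` and the chosen sample sizes are hypothetical.

```python
import random


def relative_frequency(face, trials, seed=0):
    """Fraction of simulated fair-die throws that show `face`."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == face)
    return hits / trials


for n in (60, 600, 60_000):
    print(n, relative_frequency(6, n))
# The printed frequencies should move towards 1/6 (about 0.1667) as n grows.
```

No finite simulation reaches the idealized value exactly, which is why the text describes probability distributions as theoretical.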

Infinitely large samples are impossible in real life, so probability distributions are
theoretical. They’re idealized versions of frequency distributions that aim to describe the
population the sample was drawn from.

Probability distributions are used to describe the populations of real-life variables, like
coin tosses or the weight of chicken eggs. They’re also used in hypothesis testing to
determine p values.

Example: Probability distributions are idealized frequency distributions

Imagine that an egg farmer wants to know the probability of an egg from her farm being a
certain size. The farmer weighs 100 random eggs and describes their frequency distribution
using a histogram:

[Figure: histogram of the weights of 100 eggs]

She can get a rough idea of the probability of different egg sizes directly from this
frequency distribution. For example, she can see that there’s a high probability of an egg
being around 1.9 oz., and there’s a low probability of an egg being bigger than 2.1 oz.

Suppose the farmer wants more precise probability estimates. One option is to improve
her estimates by weighing many more eggs.
A better option is to recognize that egg size appears to follow a common probability
distribution called a normal distribution. The farmer can make an idealized version of the
egg weight distribution by assuming the weights are normally distributed:

[Figure: idealized (normal) distribution of egg weights]

Since normal distributions are well understood by statisticians, the farmer can calculate
precise probability estimates, even with a relatively small sample size.
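Once a normal distribution is assumed, probabilities come from its cumulative distribution function. The Python sketch below uses `statistics.NormalDist` from the standard library; the mean of 1.9 oz and standard deviation of 0.1 oz are hypothetical values chosen for illustration, not figures from the farmer's data.

```python
from statistics import NormalDist

# Hypothetical parameters (assumption): mean 1.9 oz, standard deviation 0.1 oz
egg_weight = NormalDist(mu=1.9, sigma=0.1)

p_bigger = 1 - egg_weight.cdf(2.1)                  # P(weight > 2.1 oz)
p_near = egg_weight.cdf(2.0) - egg_weight.cdf(1.8)  # P(1.8 oz < weight < 2.0 oz)

print(round(p_bigger, 4))  # 0.0228
print(round(p_near, 4))    # 0.6827
```

In practice the two parameters would be estimated from the sample (for example its mean and standard deviation) before computing such probabilities.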

Random Variable

A Random Variable is a set of possible values from a random experiment.

Example: Tossing a coin: we could get Heads or Tails.

Let's give them the values Heads=0 and Tails=1 and we have a Random Variable "X":

In short: X = {0, 1}

Note: We could choose Heads=100 and Tails=150 or other values if we want! It is our
choice.

So:

• We have an experiment (such as tossing a coin)
• We give values to each event
• The set of values is a Random Variable

A Random Variable has a whole set of values and it could take on any of those values,
randomly.

Example: X = {0, 1, 2, 3}
• X could be 0, 1, 2, or 3 randomly.
• And they might each have a different probability.

Capital Letters

We use a capital letter, like X or Y, to avoid confusion with the Algebra type of variable.

Sample Space

A Random Variable's set of values is the Sample Space.

Example: Throw a die once

Random Variable X = "The score shown on the top face".

X could be 1, 2, 3, 4, 5 or 6

So the Sample Space is {1, 2, 3, 4, 5, 6}

Probability

We can show the probability of any one value using this style:

P(X = value) = probability of that value

Example (continued): Throw a die once

X = {1, 2, 3, 4, 5, 6}

In this case they are all equally likely, so the probability of any one is 1/6:

P(X = 1) = 1/6
P(X = 2) = 1/6
P(X = 3) = 1/6
P(X = 4) = 1/6
P(X = 5) = 1/6
P(X = 6) = 1/6

Note that the sum of the probabilities = 1, as it should be.
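In code, a discrete random variable's probabilities can be stored as a mapping from each value to P(X = value). This Python sketch assumes a fair die; the name `pmf` (probability mass function) is a conventional label chosen here, not taken from the lecture.

```python
from fractions import Fraction

# X = "the score shown on the top face" of one fair die (fairness assumed)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

print(pmf[3])             # 1/6, i.e. P(X = 3)
print(sum(pmf.values()))  # 1, as it should be
```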

Workshops:

Example: How many heads when we toss 3 coins?

X = "The number of Heads" is the Random Variable.

In this case, there could be 0 Heads (if all the coins land Tails up), 1 Head, 2 Heads or 3
Heads.

So the Sample Space = {0, 1, 2, 3}


But this time the outcomes are NOT all equally likely. The three coins can land in eight
possible ways:

Coins   X = Number of Heads
HHH     3
HHT     2
HTH     2
HTT     1
THH     2
THT     1
TTH     1
TTT     0

Looking at the table we see just 1 case of Three Heads, but 3 cases of Two Heads, 3
cases of One Head, and 1 case of Zero Heads. So:

P(X = 3) = 1/8

P(X = 2) = 3/8

P(X = 1) = 3/8

P(X = 0) = 1/8
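The table of eight outcomes can be generated rather than written out by hand. This Python sketch lists all 2³ = 8 ways three fair coins can land with `itertools.product`, counts the heads in each, and turns the counts into probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely ways three fair coins can land
outcomes = list(product("HT", repeat=3))
heads_counts = Counter(o.count("H") for o in outcomes)

pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(heads_counts.items())}
for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

The loop prints P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8 and P(X = 3) = 1/8, matching the counts read off the table.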

Example: Two dice are tossed.

The Random Variable is X = "The sum of the scores on the two dice".
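Let's make a table of all possible values (rows: first die, columns: second die):

      1   2   3   4   5   6
  1   2   3   4   5   6   7
  2   3   4   5   6   7   8
  3   4   5   6   7   8   9
  4   5   6   7   8   9  10
  5   6   7   8   9  10  11
  6   7   8   9  10  11  12

There are 6 × 6 = 36 possible outcomes, and the Sample Space (the set of possible sums of
the scores on the two dice) is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}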

Let's count how often each value occurs, and work out the probabilities:

• 2 occurs just once, so P(X = 2) = 1/36
• 3 occurs twice, so P(X = 3) = 2/36 = 1/18
• 4 occurs three times, so P(X = 4) = 3/36 = 1/12
• 5 occurs four times, so P(X = 5) = 4/36 = 1/9
• 6 occurs five times, so P(X = 6) = 5/36
• 7 occurs six times, so P(X = 7) = 6/36 = 1/6
• 8 occurs five times, so P(X = 8) = 5/36
• 9 occurs four times, so P(X = 9) = 4/36 = 1/9
• 10 occurs three times, so P(X = 10) = 3/36 = 1/12
• 11 occurs twice, so P(X = 11) = 2/36 = 1/18
• 12 occurs just once, so P(X = 12) = 1/36
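The counting above can be sketched in Python by enumerating all 36 equally likely (first die, second die) pairs, assuming fair dice, and tallying the sums.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = sum of the scores on two fair dice; 6 x 6 = 36 equally likely outcomes
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(count, 36) for x, count in sorted(sums.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

`Fraction` reduces automatically, so for example 2/36 is printed as 1/18, as in the list above.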

A Range of Values

We could also calculate the probability that a Random Variable takes on a range of
values.

Example (continued): What is the probability that the sum of the scores is 5, 6, 7 or 8?

In other words: What is P(5 ≤ X ≤ 8)?

P(5 ≤ X ≤ 8) = P(X=5) + P(X=6) + P(X=7) + P(X=8)
             = (4 + 5 + 6 + 5)/36
             = 20/36
             = 5/9
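The probability of a range of values is the sum of the probabilities of each value in that range. A minimal Python sketch for the two-dice sum, assuming fair dice; the helper name `prob_between` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Probabilities of the sum of two fair dice
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(count, 36) for x, count in sums.items()}


def prob_between(pmf, low, high):
    """P(low <= X <= high): add the probabilities of every value in the range."""
    return sum(p for x, p in pmf.items() if low <= x <= high)


print(prob_between(pmf, 5, 8))  # 5/9
```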

Solving

We can also solve a Random Variable equation.

Example (continued): If P(X=x) = 1/12, what is the value of x?

Looking through the list above we find:

• P(X=4) = 1/12, and
• P(X=10) = 1/12

So there are two solutions: x = 4 or x = 10

Notice the different uses of X and x:

• X is the Random Variable "The sum of the scores on the two dice".
• x is a value that X can take.
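Solving P(X = x) = 1/12 amounts to searching the probability table for every value x with that probability. A Python sketch for the two-dice sum, assuming fair dice; as in the text, capital `X` is the random variable and lower-case `x` is a value it can take.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Probabilities of X = sum of two fair dice, keyed by each value x
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(count, 36) for x, count in sorted(sums.items())}

solutions = [x for x, p in pmf.items() if p == Fraction(1, 12)]
print(solutions)  # [4, 10]
```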
