2. Normal Distribution
Dhanya N.M.
What is normal?
🞅 Since each normally distributed variable has its own mean and standard deviation,
the shape and location of these curves will vary.
🞅 In practical applications, then, you would have to have a table of areas under the
curve for each variable.
🞅 To simplify this situation, statisticians use what is called the standard normal distribution.
Empirical rule
Finding Areas Under the Standard Normal Distribution Curve
🞅 For the solution of problems using the standard normal distribution, a two-step procedure
is recommended with the use of the Procedure Table shown.
🞅 Step 1 Draw the normal distribution curve and shade the area.
🞅 Step 2 Find the appropriate figure in the Procedure Table and follow the directions given.
Find the area to the left of z = 1.99
Find the area to the right of z = -1.16
Find the area between z = 1.68 and z = 1.37
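As a cross-check on the table lookups, the three areas can be sketched with Python's standard-library `statistics.NormalDist`. The variable names are illustrative and not from the slides; the z values are used exactly as printed above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area to the left of z = 1.99 is the cumulative probability at 1.99.
area_left = Z.cdf(1.99)

# Area to the right of z = -1.16 is the complement of the left area.
area_right = 1 - Z.cdf(-1.16)

# Area between two z values is the difference of the two left areas.
area_between = Z.cdf(1.68) - Z.cdf(1.37)

print(round(area_left, 4), round(area_right, 4), round(area_between, 4))
```

Each result corresponds to the shaded region drawn in Step 1; the table lookup in Step 2 should agree to about four decimal places.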
Find probabilities
Answers
🞅 0.1587
🞅 0.9335
🞅 0.3085
Find probability
Answers
Problem
🞅 It is known that IQ scores form a normal distribution with μ = 100 and σ =15.
Given this information, what is the probability of randomly selecting an individual
with an IQ score less than 120?
🞅 Given this information, what proportion of the cars are traveling between 55 and 65 miles
per hour? Using probability notation, we can express the problem as p(55 < X < 65).
Answer
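A minimal sketch of the IQ question, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative. It converts the score to a z-score and then reads off the area to the left.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # parameters given in the problem

# z-score of an IQ of 120: how many standard deviations above the mean.
z = (120 - iq.mean) / iq.stdev

# p(X < 120) is the area under the curve to the left of 120.
p_below_120 = iq.cdf(120)

print(f"z = {z:.2f}, p(X < 120) = {p_below_120:.4f}")
```

The z-score comes out near 1.33, so the probability matches the table entry for z = 1.33 (roughly 0.91).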
Holiday Spending
🞅 A survey by the National Retail Federation found that women spend on average $146.21
for the Christmas holidays. Assume the standard deviation is $29.44. Find the
percentage of women who spend less than $160.00. Assume the variable is normally
distributed.
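One possible way to check this by computer, sketched with the standard-library `statistics.NormalDist` (names are illustrative). A table lookup rounds z to two decimals, so the hand answer may differ slightly in the last digit.

```python
from statistics import NormalDist

spending = NormalDist(mu=146.21, sigma=29.44)  # values from the problem

# Percentage of women spending less than $160.00.
pct_below_160 = 100 * spending.cdf(160.00)

print(f"{pct_below_160:.1f}% spend less than $160.00")
```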
Emergency Call Response Time
🞅 The American Automobile Association reports that the average time it takes to respond
to an emergency call is 25 minutes. Assume the variable is approximately normally
distributed and the standard deviation is 4.5 minutes. If 80 calls are randomly selected,
approximately how many will be responded to in less than 15 minutes?
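This problem has two stages: find the probability for one call, then scale it by the number of calls. A hedged sketch using the standard-library `statistics.NormalDist` (variable names are illustrative):

```python
from statistics import NormalDist

response = NormalDist(mu=25, sigma=4.5)  # minutes, from the problem
n_calls = 80

# Probability that a single call is answered in under 15 minutes.
p_under_15 = response.cdf(15)

# Expected number of such calls among 80 randomly selected calls.
expected_calls = n_calls * p_under_15

print(f"p = {p_under_15:.4f}, expected calls = {expected_calls:.2f}")
```

Rounding the expected count to a whole number gives approximately 1 call.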
Police Academy Qualifications
🞅 To qualify for a police academy, candidates must score in the top 10% on a general
abilities test. The test has a mean of 200 and a standard deviation of 20. Find the
lowest possible score to qualify. Assume the test scores are normally distributed.
Answer
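This problem runs in the opposite direction: the area is known and the score is wanted. A minimal sketch, assuming the standard-library `statistics.NormalDist` and its `inv_cdf` method; rounding the cutoff up to a whole score is an assumption about how scores are reported.

```python
from math import ceil
from statistics import NormalDist

test_scores = NormalDist(mu=200, sigma=20)  # values from the problem

# Top 10% means 90% of the area lies to the left of the cutoff.
cutoff = test_scores.inv_cdf(0.90)

# Assumption: scores are whole numbers, so round up to stay in the top 10%.
lowest_qualifying_score = ceil(cutoff)

print(f"cutoff = {cutoff:.2f}, lowest qualifying score = {lowest_qualifying_score}")
```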
Systolic Blood Pressure
🞅 For a medical study, a researcher wishes to select people in the middle 60% of the
population based on blood pressure. If the mean systolic blood pressure is 120 and
the standard deviation is 8, find the upper and lower readings that would qualify
people to participate in the study.
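The middle 60% leaves 20% in each tail, so the two readings are the 20th and 80th percentiles. A sketch under that reading of the problem, using the standard-library `statistics.NormalDist` (names are illustrative):

```python
from statistics import NormalDist

pressure = NormalDist(mu=120, sigma=8)  # values from the problem

# Middle 60% -> 20% of the area below the lower reading, 20% above the upper.
lower = pressure.inv_cdf(0.20)
upper = pressure.inv_cdf(0.80)

print(f"readings between {lower:.2f} and {upper:.2f} qualify")
```

Because the curve is symmetric, the two readings sit the same distance on either side of the mean of 120.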
The Normal Approximation to the Binomial Distribution
🞅 A normal distribution is often used to solve problems that involve the binomial
distribution since when n is large (say, 100), the calculations are too difficult to do by
hand using the binomial distribution.
🞅 Recall that a binomial distribution has the following characteristics:
🞅 1. There must be a fixed number of trials.
🞅 2. The outcome of each trial must be independent.
🞅 3. Each experiment can have only two outcomes or outcomes that can be reduced
to two outcomes.
🞅 4. The probability of a success must remain the same for each trial.
Correction for continuity
🞅 In addition to the previous condition of np ≥ 5 and nq ≥ 5, a correction for continuity
may be used in the normal approximation.
🞅 A correction for continuity is a correction employed when a continuous distribution is
used to approximate a discrete distribution.
Reading While Driving
🞅 A magazine reported that 6% of American drivers read the newspaper while driving. If
300 drivers are selected at random, find the probability that exactly 25 say they read
the newspaper while driving.
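A hedged sketch of the normal approximation with the correction for continuity, using only the Python standard library. "Exactly 25" is treated as the interval from 24.5 to 25.5 on the continuous curve; the exact binomial value is computed alongside purely for comparison and is not part of the slide's method.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 300, 0.06, 25  # values from the problem
q = 1 - p

# Rule of thumb from the slides: np >= 5 and nq >= 5.
conditions_met = n * p >= 5 and n * q >= 5

# Approximating normal curve: mean np, standard deviation sqrt(npq).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

# Correction for continuity: P(X = 25) ~ P(24.5 < X < 25.5).
p_approx = approx.cdf(k + 0.5) - approx.cdf(k - 0.5)

# Exact binomial probability, for comparison only.
p_exact = comb(n, k) * p**k * q**(n - k)

print(conditions_met, round(p_approx, 4), round(p_exact, 4))
```

The two values will not agree exactly, since the binomial here is somewhat skewed, but they should be close.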
Taxonomy of Probability Distributions
Discrete probability distributions
🞅 Binomial distribution
🞅 Multinomial distribution
🞅 Poisson distribution
🞅 Hypergeometric distribution
🞅 The normal distribution is also known as the Gaussian distribution.
🞅 It is named after Carl Friedrich Gauss.
🞅 Simple predictive models are usually the most widely used, because they can be explained
and are well understood.
🞅 The normal distribution is simple, and that simplicity makes it extremely popular.
What Does Probability Distribution Mean?
Let us build up the idea step by step.
🞅 Consider the predictive models we might be interested in building in our
data science projects.
🞅 To predict a variable accurately, the first task is to understand the
underlying behavior of the target variable.
🞅 We first determine the possible outcomes of the target variable and whether
those outcomes are discrete (distinct, countable values) or continuous (any
value within a range).
🞅 For the sake of simplicity, if we are estimating the behaviour of a die,
the first step is to know that it can take any whole value from 1 to 6
(discrete).
🞅 The next step is to assign probabilities to the events (values). If a value
cannot occur, it is assigned a probability of 0%.
🞅 The higher the probability, the more likely it is for the event to occur.
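The die example can be written down as a small discrete probability distribution. This is a sketch that assumes a fair six-sided die; the fairness assumption and the helper name `prob` are illustrative and not from the slides.

```python
from fractions import Fraction

# Assumption: a fair die, so each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(value):
    """Probability of a value; values that cannot occur get probability 0."""
    return die.get(value, Fraction(0))

total = sum(die.values())  # probabilities of all possible outcomes

print(prob(3), prob(7), total)
```

The probabilities of all possible outcomes sum to 1 (100%), and an impossible value such as 7 has probability 0.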
🞅 The idea rests on a theorem: when a large number of independent random variables are
added together, the distribution of their sum is very close to normal.
🞅 The height of a person is a random variable that depends on many other random variables,
such as the amount of nutrition a person consumes, the environment they live in, their
genetics and so on, so its distribution ends up being very close to normal.
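A small simulation can illustrate this. The sketch below adds 30 uniform random numbers many times and checks how much of the result falls within one standard deviation of the mean; the choice of uniform variables, the sample sizes, and the seed are all assumptions made for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

N_TERMS = 30        # how many random variables are added together
N_SAMPLES = 20_000  # how many sums are generated

# Each sum adds N_TERMS independent uniform(0, 1) values, none of them normal.
sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_SAMPLES)]

mean = fmean(sums)
sd = pstdev(sums)

# For a normal distribution about 68% of values lie within 1 standard deviation.
within_1sd = sum(abs(s - mean) <= sd for s in sums) / N_SAMPLES

print(f"mean = {mean:.2f}, sd = {sd:.2f}, within 1 sd = {within_1sd:.3f}")
```

Even though each individual term is uniformly distributed, the share of sums within one standard deviation should land close to the 68% expected of a normal curve.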
🞅 The bell-shaped curve above has a mean of 100 and a standard deviation of 1.
🞅 The mean is the center of the curve. It is the highest point of the curve, because values
are most concentrated around the mean.
🞅 The curve is symmetric: there are equal numbers of points on each side of the mean.
🞅 The total area under the curve is the total probability of all of the values that the
variable can take.
🞅 The total area under the curve is therefore 100%.
Characteristics
🞅 Approximately 68.2% of all of the points are within 1 standard deviation of the mean
(from -1 to +1).
🞅 About 95.4% of all of the points are within 2 standard deviations of the mean.
🞅 About 99.7% of all of the points are within 3 standard deviations of the mean.
🞅 This allows us to estimate easily how volatile a variable is and, given a confidence
level, what its likely value is going to be.
🞅 For instance, in the gray bell-shaped curve above, there is a 68.2% chance that
the value of the variable will be between 99 and 101.
🞅 Imagine the confidence you can now have when making future decisions with that
information!
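These percentages can be reproduced numerically. A minimal sketch with the standard-library `statistics.NormalDist`; the curve with mean 100 and standard deviation 1 is the one described on the slide, and the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of the area within k standard deviations of the mean, k = 1, 2, 3.
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

# Same idea on the slide's curve: mean 100, standard deviation 1.
curve = NormalDist(mu=100, sigma=1)
p_99_to_101 = curve.cdf(101) - curve.cdf(99)

for k, share in coverage.items():
    print(f"within {k} sd: {100 * share:.1f}%")
print(f"p(99 < X < 101) = {p_99_to_101:.3f}")
```

The one-standard-deviation share is the same on both curves, because the rule depends only on distance from the mean measured in standard deviations.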
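Normal Probability Distribution Function
🞅 The probability density function of the normal distribution with mean μ and standard
deviation σ is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
🞅 The probability density function gives the relative likelihood of a continuous random
variable taking a value near x. The probability that the variable falls within a range is
the area under the curve over that range.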
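The density of a normal distribution with mean μ and standard deviation σ can be evaluated directly. The sketch below writes out the standard formula in a hypothetical helper `normal_pdf` and compares it with the standard library's `NormalDist.pdf`; the mean of 100 and standard deviation of 1 are taken from the curve described earlier.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Height of the curve at its center (the mean).
density_at_mean = normal_pdf(100, mu=100, sigma=1)

# The standard library gives the same value.
library_value = NormalDist(mu=100, sigma=1).pdf(100)

print(round(density_at_mean, 4), round(library_value, 4))
```

Note that the value returned is a density, not a probability: for a small enough σ it can exceed 1, while the area under the whole curve always stays at 1.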
🞅 What is even more fascinating is that once you add together a large number
of independent random variables, even ones with differing distributions, the new
variable ends up with an approximately normal distribution. This is known
as the Central Limit Theorem.
🞅 Variables that are normally distributed remain normally distributed under
addition and scaling.
For instance, if A and B are two independent variables with normal distributions and c is a constant, then:
🞅 c × A is normally distributed
🞅 A + B is normally distributed
(the product A × B of two normal variables is, in general, not normal)
🞅 As a result, it is simple to forecast such a variable and find the
probability of it falling within a range of values, because its
probability distribution function is well known.
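The addition property can be sketched with the standard-library `statistics.NormalDist`, whose `+` operator combines two distributions on the assumption that they are independent. The parameters of A and B below are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical independent normal variables.
A = NormalDist(mu=10, sigma=3)
B = NormalDist(mu=20, sigma=4)

# Sum of independent normals: means add, variances add.
total = A + B

# Scaling by a constant keeps the variable normal as well.
doubled = A * 2

# Probability that A + B falls within a range of values.
p_25_to_35 = total.cdf(35) - total.cdf(25)

print(total, doubled, round(p_25_to_35, 4))
```

The standard deviation of the sum is sqrt(3² + 4²) = 5, not 3 + 4, because it is the variances that add.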
What If The Sample Distribution Is Not Normal?
🞅 Once we gather a sample for a variable, we can compute the Z-score by linearly
transforming the sample using the formula below:
🞅 Calculate the mean
🞅 Calculate the standard deviation
🞅 For each value x, compute Z using: $z = \frac{x - \text{mean}}{\text{standard deviation}}$
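The three steps can be sketched as follows, using z = (x - mean) / standard deviation. The sample values are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample one (`stdev`) is an assumption; either is a defensible choice depending on context.

```python
from statistics import fmean, pstdev

# Hypothetical sample, for illustration only.
sample = [52.0, 61.0, 47.0, 58.0, 70.0, 66.0]

mean = fmean(sample)  # step 1: calculate the mean
sd = pstdev(sample)   # step 2: calculate the standard deviation

# Step 3: linearly transform each value into a Z-score.
z_scores = [(x - mean) / sd for x in sample]

print([round(z, 2) for z in z_scores])
```

After the transformation the values have mean 0 and standard deviation 1. Because the transformation is linear, it changes the location and scale of the data but not the shape of its distribution.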
2. Using Box-Cox Transformation
🞅 The population distribution of SAT scores is normal with a mean of μ = 500 and a
standard deviation of 100. Given this information about the population and the
known proportions for a normal distribution, we can determine the probabilities
associated with specific samples. For example, what is the probability of randomly
selecting an individual from this population who has an SAT score greater than 700?
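A minimal sketch of this upper-tail question, assuming the standard-library `statistics.NormalDist` (variable names are illustrative). A score of 700 is two standard deviations above the mean, so the answer is the area to the right of z = 2.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # values from the problem

# z-score of 700.
z = (700 - sat.mean) / sat.stdev

# p(X > 700) is the area to the right, the complement of the left area.
p_above_700 = 1 - sat.cdf(700)

print(f"z = {z:.2f}, p(X > 700) = {p_above_700:.4f}")
```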