Session 6
Probability Distributions
Contents
Introduction
6.1 The Bernoulli distribution
6.2 Binomial distribution
6.3 Poisson distribution
6.4 Normal distribution
Summary
Learning Outcomes
Introduction
Health professionals very often use the term ‘probability’ in their day-to-day
work to express the level of uncertainty of health events. Most of the
information related to patient diagnosis and outcome of treatment is
uncertain, so we cannot predict outcomes in health care with 100%
assurance. For example, a cardiothoracic surgeon may say that a patient
has an 80% chance of surviving if he/she undergoes Coronary Artery Bypass
Graft (CABG) surgery. A nurse may say that about one fourth of her
patients would not follow the medical advice given to them. Further, it is
observed that most of the data collected in health sciences research follow
some specific patterns. A pre-specified pattern that a data set follows is
known as the distribution of the data set. Such data sets are said to have
parametric distributions. However, there are data sets that do not follow any
pre-specified pattern, and such data sets are said to have non-parametric
distributions. In Session 1, we learned that a random variable is a variable
whose possible values are numerical outcomes of a random phenomenon, and
that there are two types of random variables, discrete and continuous.
Discrete random variables can take only a finite (or countable) number of
distinct values, for example, the number of emergency care admissions per
week in a base hospital. The list of probabilities associated with each of the
possible values of a discrete random variable is called the probability
function (or probability distribution) of that variable. A function that
provides the probability of occurrence of each possible outcome of a discrete
variable is called the probability mass function.
A continuous random variable, on the other hand, can take an infinite
number of possible values. In health sciences, continuous random variables
are usually health measurements such as weight and blood glucose level.
Such variables are usually defined over an interval of values, and their
probabilities are represented by the area under a curve. A function that
describes this curve, so that the area under it over any interval gives the
probability that the continuous variable falls in that interval, is called the
probability density function.
6.1 The Bernoulli distribution
$$P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$
These outcomes are often described as a ‘success’ (yes), denoted by 1,
and a ‘failure’ (no), denoted by 0. Suppose a study followed 1000 smokers
to identify those who developed lung cancer in recent years and found that
90 of them developed lung cancer. Then the probability that a smoker in the
sample developed lung cancer is 90 / 1000 = 0.09.
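The two-outcome rule in the smoker example can be sketched in Python. This is only an illustration: the function name `bernoulli_pmf` and the variable `p_cancer` are made up for this sketch, and the value of p is the sample proportion 90 / 1000 quoted in the text.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Return P(X = x) for a Bernoulli variable with success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if x == 1:
        return p          # 'success' (yes)
    if x == 0:
        return 1.0 - p    # 'failure' (no)
    return 0.0            # a Bernoulli variable takes no other values


# Sample proportion from the smoker example: 90 cases out of 1000 smokers.
p_cancer = 90 / 1000

print(f"P(X = 1) = {bernoulli_pmf(1, p_cancer):.2f}")
print(f"P(X = 0) = {bernoulli_pmf(0, p_cancer):.2f}")
```

The two probabilities always add up to 1, because a smoker in the sample either developed lung cancer or did not.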
6.2 Binomial distribution
Suppose that in a cancer hospital, health records indicate that 80% of the
patients suffering from esophageal cancer would eventually die of it. What is
the probability that, out of 6 randomly selected esophageal cancer patients,
a. Four (4) patients will recover?
Here n = 6, x = 4, p = 0.20 (recovery, a success) and q = 1 - p = 0.80 (failure).
The probability that 4 patients will recover is equal to
$$P(X = 4) = \binom{6}{4} (0.20)^4 (0.80)^2 = 15 \times 0.0016 \times 0.64 \approx 0.0154$$
or about 1.5%.
The probability that more than 4 esophageal cancer patients will recover =
P(X = 5) + P(X = 6) = 0.001536 + 0.000064 = 0.0016, or about 0.16%
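The binomial calculations for the esophageal cancer example can be checked with a short Python sketch. It uses the values stated in the example (n = 6, p = 0.20 for recovery). The helper names `binomial_pmf` and `binomial_upper_tail` are illustrative and do not come from the text.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_upper_tail(x: int, n: int, p: float) -> float:
    """P(X > x): add up the pmf over x + 1, ..., n."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1, n + 1))


# Values from the example: 6 patients, each recovering with probability 0.20.
n, p = 6, 0.20

p_exactly_4 = binomial_pmf(4, n, p)
p_more_than_4 = binomial_upper_tail(4, n, p)

print(f"P(exactly 4 recover)   = {p_exactly_4:.4f}")
print(f"P(more than 4 recover) = {p_more_than_4:.4f}")
```

Summing the pmf term by term is practical here because n is small. For large n, a statistical library would usually be the more convenient choice.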
6.3 Poisson distribution
Here µ = 3
a. For one or more cases, we can work out the probability as 1 minus the
probability of zero cases:
$$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-3} \approx 0.95$$
Thus, the probability of registering 2 or more but fewer than 5 cases in a
week is approximately 62%.
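The figures in this example can be reproduced with a minimal Python sketch. It assumes that the weekly number of cases follows a Poisson distribution with mean µ = 3, an assumption that is consistent with the 62% quoted above. The function name `poisson_pmf` is illustrative only.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


mu = 3  # average number of cases per week, as given in the example

# One or more cases: 1 minus the probability of zero cases.
p_one_or_more = 1 - poisson_pmf(0, mu)

# Two or more but fewer than five cases: P(X = 2) + P(X = 3) + P(X = 4).
p_two_to_four = sum(poisson_pmf(k, mu) for k in (2, 3, 4))

print(f"P(X >= 1)     = {p_one_or_more:.4f}")
print(f"P(2 <= X < 5) = {p_two_to_four:.4f}")
```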
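6.4 Normal distribution
The probability density function of the normal distribution is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where X is the normal random variable, µ is the mean and σ is the standard
deviation.
The value of π is approximately 3.1416 and the value of e is approximately
2.718.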
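The symbols above belong to the standard normal density formula, f(x) = exp(-(x - µ)² / (2σ²)) / (σ√(2π)). The sketch below evaluates that formula directly and compares it with `statistics.NormalDist` from the Python standard library. The mean and standard deviation used (120 and 10) are hypothetical values chosen only for illustration and do not come from the text.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# Hypothetical parameters, for illustration only.
mu, sigma = 120.0, 10.0

by_formula = normal_pdf(125.0, mu, sigma)
by_library = NormalDist(mu, sigma).pdf(125.0)

print(f"formula: {by_formula:.6f}")
print(f"library: {by_library:.6f}")
```

The value returned is the height of the curve at x, not a probability. Probabilities for a continuous variable come from areas under the curve.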
The normal distribution is a continuous probability distribution. Continuous
variables can take a very large (in principle infinite) number of values. We
can draw a frequency distribution curve for a normally distributed variable,
and this curve is generally referred to as the normal curve.
Properties of the normal distribution
• The total area under the normal curve is equal to 1 (or 100%).
• The probability that X is greater than the value “a” equals the area
under the normal curve bounded by “a” and plus infinity (as
indicated by the non-shaded area in Figure 6.1).
• Also, the probability that X is less than “a” equals the area under the
normal curve bounded by “a” and minus infinity (as indicated by the
shaded area in Figure 6.1).
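The area properties listed above can be illustrated with `statistics.NormalDist` from the Python standard library, whose `cdf` method returns the area under the curve to the left of a value. This is a sketch under assumed values: the standard normal (mean 0, standard deviation 1) and the cut-off a = 1 are hypothetical choices, not figures from the text.

```python
from statistics import NormalDist

# Hypothetical choice: the standard normal curve and a cut-off of a = 1.
curve = NormalDist(mu=0.0, sigma=1.0)
a = 1.0

p_less_than_a = curve.cdf(a)              # area from minus infinity up to a
p_greater_than_a = 1.0 - p_less_than_a    # remaining area from a to plus infinity

print(f"P(X < a) = {p_less_than_a:.4f}")
print(f"P(X > a) = {p_greater_than_a:.4f}")
print(f"total    = {p_less_than_a + p_greater_than_a:.4f}")
```

Because the total area is 1, only one of the two areas needs to be computed; the other follows by subtraction.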
Activity 6.1
2. Suppose in a particular MOH office, 30% of the mothers are anemic. What is the
probability that out of 10 randomly selected mothers,
(a) 3 mothers are non-anemic?
(b) 5 or more mothers are equal to its mean.
Summary
Learning Outcomes
At the end of the lesson you should be able to,
Review Questions
3. A nursing home owner knows that, on average, 10 elders per year are
admitted to the nursing home for long-term care.
a) The variable in this example is normally distributed.
b) The mean of the variable is 5.
c) The variance of this variable is 25.
d) The variable in this example is continuous.
e) The standard deviation of the variable in this example is greater than its
mean.