Statistics Lecture Notes
LESSON ONE: INTRODUCTION
b) Ordinal scale
Items are not only grouped into categories but they are also ranked into some order. Therefore in an
ordinal scale, numerals are used to represent relative position or order among the values of the
variables.
The use of ordinal scale implies a statement of ‘greater than’ or ‘less than’ (equality is also
acceptable) without being able to state how much greater or less. The real difference between ranks
1 and 2 may be more or less than the difference between ranks 5 and 6.
Since the numbers of this scale have only a rank meaning, the appropriate measure of central
tendency is the median. A percentile or quartile measure is used for measuring dispersion.
Correlations are restricted to various rank order methods. Measures of statistical significance are
restricted to non-parametric methods.
c) Interval scale
Numerals assigned to each measure are ranked in order and the intervals between them are equal.
Hence numerals used represent quantity and some mathematical operations would yield
meaningful values.
However, the zero point is not meaningful, i.e. interval scales have an arbitrary zero and it is not
possible to determine for them what may be called an absolute zero or the unique origin.
The primary limitation of the interval scale is the lack of a true zero; it does not have the capacity to
measure the complete absence of a trait or characteristic.
The Fahrenheit scale is an example of an interval scale. One can say that an increase in temperature
from 30° to 40° involves the same increase in temperature as an increase from 60° to 70°, but one
cannot say that the temperature of 60° is twice as warm as the temperature of 30°, because both
numbers depend on the fact that the zero of the scale is set arbitrarily and does not represent a
complete absence of temperature. The ratio of the two temperatures, 30° and 60°, means nothing
because zero is an arbitrary point.
Interval scales provide more powerful measurement than ordinal scales since the interval scale
incorporates the concept of equality of intervals.
As such, more powerful statistical measures can be used with interval scales. The mean is the
appropriate measure of central tendency, while the standard deviation is the most widely used
measure of dispersion.
Product moment correlation techniques are appropriate and the generally used tests for statistical
significance are the 't' test and 'F' test.
d) Ratio scale
Ratio scales have an absolute or true zero of measurement, e.g. the zero point on a centimetre
scale indicates the complete absence of length or height. But an absolute zero of temperature is
theoretically unattainable and it remains a concept existing only in the scientist's mind.
The ratio scale represents the actual amounts of variables. Measures of physical dimensions such as
weight, height and distance are examples.
All statistical techniques are usable with ratio scales and all mathematical operations (including
multiplication and division) can be used.
Geometric and harmonic means can be used as measures of central tendency and coefficients of
variation may also be calculated.
LESSON TWO: DATA COLLECTION, ORGANIZATION AND PRESENTATION
2.1 Introduction
Data refers to any information or facts collected for reference or analysis.
There are two types of data: secondary data and primary data.
Secondary Data
It is data that has been gathered earlier for some other purpose. In contrast, the data that are
collected first hand by someone specifically for the purpose of facilitating the study are
known as primary data.
e.g. the demographic statistics collected every ten years are primary data for the registrar
of persons, but the same statistics used by anyone else would be secondary data for that
individual.
Advantages of secondary data
i) It is far more economical as the cost of collecting original data is saved.
ii) Use of secondary data is time saving.
Disadvantages of secondary data
i) One does not always know how accurate the secondary data are.
ii) The secondary data might be outdated.
The largest value is 73 and the smallest is 65. Hence, the range is 73 – 65 = 8 inches.
Frequency Distribution
Ungrouped data
In forming an array a value is repeated as many times as it appears. The number of times a
value appears in the listing is referred to as its frequency. In giving the frequency of a value,
we answer the question, “How frequently does the value occur in the listing?”
When the data is arranged in tabular form by giving its frequencies, the table is called a
frequency table. The arrangement itself is called a frequency distribution.
Quite often it is useful to give relative frequencies instead of actual frequencies. The relative
frequency of any observation is obtained by dividing the actual frequency of the observation
by the total frequency (sum of all frequencies).
If the relative frequencies are multiplied by 100 and expressed as a percentage, we get the
percentage frequency distribution.
An advantage of expressing frequencies as percentages is that one can then compare
frequency distributions of two sets of data.
Example:
The following data were obtained when a die was tossed 30 times. Construct a frequency
table.
1 2 4 2 2 6 3 5 6 3
3 1 3 1 3 4 5 3 5 3
5 1 6 3 1 2 4 2 4 4
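A quick Python sketch (not part of the original notes) tallies the 30 rolls and adds the relative and percentage frequencies defined above:

```python
from collections import Counter

# Outcomes of tossing a die 30 times (data from the example above)
rolls = [1, 2, 4, 2, 2, 6, 3, 5, 6, 3,
         3, 1, 3, 1, 3, 4, 5, 3, 5, 3,
         5, 1, 6, 3, 1, 2, 4, 2, 4, 4]

freq = Counter(rolls)          # value -> frequency
n = sum(freq.values())         # total frequency (30)

print("Value  Freq  Relative  Percent")
for value in sorted(freq):
    f = freq[value]
    # relative frequency = f / total; percentage frequency = 100 * f / total
    print(f"{value:>5}  {f:>4}  {f / n:>8.3f}  {100 * f / n:>6.1f}%")
```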
Grouped Data
When dealing with a huge mass of data and when the observed values consist of too many
distinct values, it is preferable to divide the entire range of values and group the data into
classes.
e.g. if we are interested in the distribution of ages of people, we could form the classes
0 – 19, 20 – 39, 40 – 59, 60 – 79 and 80 – 99. A class such as 40 – 59 represents all the
people with ages between 40 and 59 years inclusive.
When data are arranged in this way, they are called grouped data. The number of
individuals in a class is called the class frequency.
The following set of steps is suggested for forming a frequency distribution from the raw data:
i) Range
Scan through the raw data and find the smallest and the largest value. The largest
value minus the smallest value gives the range.
ii) Number of classes
Decide on a suitable number of classes. This could be anywhere from six to twenty.
iii) Class size
Divide the range by the number of classes. Round this figure to a convenient value to
obtain the class size and form the classes.
iv) Frequency
Find the number of observations in each class.
Example
The following data gives the amounts (in dollars) spent on groceries by 40 housewives during a
week.
22 12 9 8 33 32 30 33 8 11
21 16 12 15 37 30 16 22 12 24
18 25 37 16 25 28 25 18 9 28
25 28 26 15 12 35 38 16 24 31
Construct a frequency distribution using seven classes.
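The four steps above can be sketched in Python; rounding the class size up to 5 and using the minimum value as the first lower boundary are my own illustrative choices:

```python
import math

# Weekly grocery spending (dollars) for 40 housewives, from the example above
data = [22, 12,  9,  8, 33, 32, 30, 33,  8, 11,
        21, 16, 12, 15, 37, 30, 16, 22, 12, 24,
        18, 25, 37, 16, 25, 28, 25, 18,  9, 28,
        25, 28, 26, 15, 12, 35, 38, 16, 24, 31]

k = 7                                  # step ii: chosen number of classes
rng = max(data) - min(data)            # step i: range = 38 - 8 = 30
width = math.ceil(rng / k)             # step iii: class size, rounded up to 5

# step iv: count the observations falling in each class
lower = min(data)
for i in range(k):
    lo = lower + i * width
    hi = lo + width                    # upper boundary (exclusive here)
    f = sum(lo <= x < hi for x in data)
    print(f"{lo:>2} - {hi - 1:>2}: {f}")
```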
NB: The upper boundary of one class is the lower boundary of the next.
When the data is grouped, the cumulative frequency distribution gives the total frequency of
all the values less than the upper boundary of a given class.
Example
Find the cumulative frequency distribution for the grouped data given below:
Class Frequency Cumulative frequency (cf)
5 – 19 4 4
20 – 34 12 16
35 – 49 15 31
50 – 64 16 47
65 – 79 22 69
80 – 94 11 80
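The running totals in the table can be checked with a short Python sketch:

```python
from itertools import accumulate

# Class frequencies from the example above (classes 5-19, 20-34, ..., 80-94)
freqs = [4, 12, 15, 16, 22, 11]

# The cumulative frequency of a class is the running total of all
# frequencies up to and including that class
cf = list(accumulate(freqs))
print(cf)   # [4, 16, 31, 47, 69, 80]
```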
3.1 Introduction
A measure of central tendency, also called a measure of location or an average, is a single
value within the range of the data that is used to represent all the values in the series.
Indirect method
X̄ = P.M. + ΣDx/n
where P.M. = provisional mean, Dx = deviations from P.M., ΣDx = the sum of deviations from P.M.
Grouped series
Direct method
X̄ = Σxf/n, where f = frequencies, n = number of items
Indirect method
X̄ = P.M. + ΣfDx/n
NB: For a grouped frequency distribution the value of X is taken as the mid point of each class.
Examples
1. The monthly sales of ABC stores for a period of 6 months were as follows:
37,000, 48,000, 84,000, 73,000, 35,000, 53,000. Calculate the mean monthly sales.
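Example 1 can be checked with a short Python sketch (illustrative; the provisional mean of 50,000 is an arbitrary choice) showing that the direct and indirect methods agree:

```python
# Direct method: the mean is the sum of the values over the number of items.
# Indirect (provisional mean) method: pick any convenient value P.M.,
# take deviations Dx = x - P.M., then  mean = P.M. + sum(Dx)/n.
# Both must agree; the indirect method just eases hand computation.

values = [37000, 48000, 84000, 73000, 35000, 53000]   # sales from Example 1
n = len(values)

direct = sum(values) / n

pm = 50000                                # provisional mean (any guess works)
deviations = [x - pm for x in values]
indirect = pm + sum(deviations) / n

print(direct, indirect)   # both 55000.0
```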
Interpolation Formula
Steps
1. Construct the less than cumulative frequency distribution
2. Find N/2, where N = Σf
Grade A B C D E
No of students (f) 10 15 67 50 21
No of students 2 7 15 30 20 4 1
Second Quartile: Q2 = LQ2 + (h/fQ2)·(2N/4 − C)
Third Quartile: Q3 = LQ3 + (h/fQ3)·(3N/4 − C)
In general, the three quartiles can be computed for grouped data by the formula
Qi = LQi + (h/fQi)·(iN/4 − C)
where LQi = lower boundary of the ith quartile class, h = class size, fQi = frequency of the ith quartile class
N = Total frequency
C = Cumulative frequency of the class preceding the ith quartile class.
Computation of the Deciles
Di = LDi + (h/fDi)·(iN/10 − C)
N = Total frequency
C = Cumulative frequency of the class preceding the ith decile class.
Computation of the Percentiles
Pi = LPi + (h/fPi)·(iN/100 − C)
N = Total frequency
C = Cumulative frequency of the class preceding the ith percentile class.
NB: Analogous to the graphical method of estimating the median, the quartiles, deciles and
percentiles of a grouped frequency distribution can be estimated using the cumulative
frequency curve (ogive curve).
Examples
1. Find the 1st , 2nd and 3rd quartiles for the following data
13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 25, 16, 17
2. Given below is the number of families in a locality according to their monthly expenditure
Monthly expenditure No. of families
140 - 150 17
150 - 160 29
160 - 170 42
170 - 180 72
180 - 190 84
190 – 200 107
200 – 210 49
210 – 220 34
220 – 230 31
230 – 240 16
240 – 250 12
Calculate:
i) All the quartiles
ii) 7th decile
iii) 90th percentile
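A sketch of the interpolation formula applied to the table in question 2; the function and variable names are my own, and each quantile is found by locating the class containing the iN/4 (or iN/10, iN/100) observation:

```python
# Interpolation formula for grouped data:
#   value = L + (h / f) * (p * N - C)
# where p = i/4 for quartiles, i/10 for deciles, i/100 for percentiles,
# L = lower boundary of the class containing the p*N-th observation,
# h = class size, f = its frequency, C = cumulative frequency before it.

def grouped_quantile(boundaries, freqs, p):
    """boundaries: class lower boundaries plus the final upper boundary."""
    n = sum(freqs)
    target = p * n
    cum = 0
    for i, f in enumerate(freqs):
        if cum + f >= target:
            lower = boundaries[i]
            h = boundaries[i + 1] - boundaries[i]
            return lower + (h / f) * (target - cum)
        cum += f
    return boundaries[-1]

# Monthly expenditure table from question 2 above
bounds = [140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250]
freqs  = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

print(grouped_quantile(bounds, freqs, 1 / 4))     # Q1
print(grouped_quantile(bounds, freqs, 7 / 10))    # 7th decile
print(grouped_quantile(bounds, freqs, 90 / 100))  # 90th percentile
```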
Interpolation Formula
Mode = L + h·(fm − f1)/(2fm − f1 − f2)
where L = lower boundary of the modal class, h = class size, fm = frequency of the modal class, and f1 and f2 = frequencies of the classes before and after it.
Equivalently, Mode = L + [D1/(D1 + D2)]·i, where i is the class size, D1 = f1 − f0 and D2 = f1 − f2, with f0, f1 and f2 now denoting the frequencies of the preceding, modal and following classes respectively.
Examples
1. Find the mode for the data below
a) 1, 2, 3, 4, 5, 6; Solution: The mode does not exist
b) 7, 8, 3, 8, 6, 10, 8 Solution: Mode = 8; This is a uni-modal distribution
c) 29,30,60,13,30,7,2,7 Solution: Modes are 30 and 7; This is a bi-modal distribution
d)
X 4 5 6 7 8 9 10
F 2 5 21 18 9 2 1
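Python's statistics module can verify the modes above; this is an illustrative check, not part of the original notes:

```python
from statistics import multimode

# multimode returns every value with the highest frequency, so it covers
# the uni-modal and bi-modal cases from the examples above
print(multimode([7, 8, 3, 8, 6, 10, 8]))           # [8]
print(multimode([29, 30, 60, 13, 30, 7, 2, 7]))    # [30, 7]

# Part (d): for a discrete frequency table the mode is the X with largest f
xs = [4, 5, 6, 7, 8, 9, 10]
fs = [2, 5, 21, 18, 9, 2, 1]
print(xs[fs.index(max(fs))])                       # 6
```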
In either case the median will be about one third as far away from the mean as the mode is. This means
that Mode = 3·Median − 2·Mean, approximately.
The geometric mean of n values is
G.M. = ⁿ√(x1·x2·…·xn) = (x1·x2·…·xn)^(1/n)
Taking logarithms,
log G.M. = (1/n)·(log x1 + log x2 + … + log xn) = (1/n)·Σ log xi
For a frequency distribution,
G.M. = (x1^f1 · x2^f2 · … · xn^fn)^(1/N)
log G.M. = (1/N)·(f1 log x1 + f2 log x2 + … + fn log xn) = (1/N)·Σ fi log xi
where N = Σf
Grouped data
H.M. = Σf / Σ(f/x) = (f1 + f2 + … + fn) / (f1/x1 + f2/x2 + … + fn/xn)
Examples
1. Calculate the Harmonic mean of the following data
11, 13, 15, 16, 19, 22, 13, 20
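The logarithm formula for the geometric mean and the reciprocal formula for the harmonic mean can be checked against Python's built-in implementations (an illustrative sketch):

```python
from statistics import geometric_mean, harmonic_mean
from math import log, exp

data = [11, 13, 15, 16, 19, 22, 13, 20]   # data from Example 1 above

# Harmonic mean: n divided by the sum of reciprocals
hm = len(data) / sum(1 / x for x in data)
assert abs(hm - harmonic_mean(data)) < 1e-9

# Geometric mean via logarithms: log G.M. = (1/n) * sum(log x)
gm = exp(sum(log(x) for x in data) / len(data))
assert abs(gm - geometric_mean(data)) < 1e-9

print(f"H.M. = {hm:.3f}, G.M. = {gm:.3f}")
```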
3.5 Exercise
1. What are the requirements of a good average? Compare the mean, the median and the mode
in the light of these requirements.
2. Find the mean, median and mode for the following set of data
i) 3, 5, 2, 6, 5, 9, 5, 2, 8 and 6
ii) 51.6, 48.7, 50.3, 49.5 and 48.9
3. The following data pertain to marks obtained by 120 students in their final examination in
mathematics:
Marks Number of Students
30 -39 1
40 – 49 3
50 – 59 11
60 – 69 21
70 – 79 43
80 -89 32
90 - 99 9
Total 120
Calculate the mode and the median.
4. Suppose we are given the following series:
Class interval: 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency:      6    12   22   37   17   8    5
4.1 Introduction
Dispersion refers to the degree to which numerical data tends to spread about an average
value. It is the extent of the scatteredness of items around a measure of central tendency.
The measures of dispersion are also referred to as measures of variation or measures of
spread.
Limitations
It is not based on each and every value of the distribution
It is subject to fluctuations of considerable magnitude from sample to sample
It cannot be computed in case of open-ended distributions
It does not explain or indicate anything about the character of the distribution within the
two extreme observations.
2. Calculate the average deviation from the mean for the following
Sales (thousands) 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of days (f) 3 6 11 3 2
Merits of Mean Deviation
1. It is easy to compute and understand
2. It uses all the data
3. It is less affected by the extreme values
4. Since deviations are taken from a central value, comparison about formation of different
distributions can easily be made.
5. It shows the significance of an average in the distribution
Demerits
1. Ignores algebraic signs while taking the deviations
2. Cannot be computed for distributions with open-ended classes
3. Rarely used in sociological studies
Variance for ungrouped data
σ² = Σ(x − x̄)²/n
where Σ(x − x̄)² = the sum of the squares of the deviations from the arithmetic mean and n = the number of items
Variance for grouped data
σ² = Σf(x − x̄)²/Σf
Computing the standard deviation
Standard deviation for ungrouped data
σ = √[Σ(x − x̄)²/n]
Standard deviation for grouped data
σ = √[Σf(x − x̄)²/Σf]
NB: The computation of σ² can be simplified by using the following version of the formula:
σ² = Σfx²/Σf − (Σfx/Σf)²
Examples
1. Find the standard deviation of the wages of the following ten workers working in a factory
Worker A B C D E F G H I J
Weekly wages 1320 1310 1315 1322 1326 1340 1325 1321 1320 1331
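A sketch of the variance and standard deviation formulas applied to the ten wages, checked against the library value:

```python
from statistics import pstdev, fmean

wages = [1320, 1310, 1315, 1322, 1326, 1340, 1325, 1321, 1320, 1331]

mean = fmean(wages)                                   # 1323.0
var = sum((x - mean) ** 2 for x in wages) / len(wages)
sd = var ** 0.5

assert abs(sd - pstdev(wages)) < 1e-9                 # matches the library value
print(f"mean = {mean}, variance = {var}, s.d. = {sd:.3f}")
```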
The combined mean of two sets of N1 and N2 observations is given by
X̄12 = (N1X̄1 + N2X̄2)/(N1 + N2)
Combined standard deviation of two series is given by
σ12 = √[(N1σ1² + N2σ2² + N1d1² + N2d2²)/(N1 + N2)]
where σ12 = combined standard deviation
d1 = X̄1 − X̄12; d2 = X̄2 − X̄12
NB: The above formula can be extended to find the standard deviation of three or more
groups. For example, the combined standard deviation of three groups would be
σ123 = √[(N1σ1² + N2σ2² + N3σ3² + N1d1² + N2d2² + N3d3²)/(N1 + N2 + N3)]
Example
1. The number of workers employed, the mean wage per week and the standard deviation in each
branch of a company are given below. Calculate the mean wages and standard deviation of all
workers taken together for the factory.
Coefficient of Variation
The measures of dispersion which are expressed in terms of the original units of the
observations are termed as absolute measures. Such measures are not suitable for comparing
the variability of two distributions which are not expressed in the same units of measurements.
Therefore it is better to use relative measure of dispersion obtained as ratios or percentages and
are thus pure numbers independent of the unit of measurement.
Standard deviation is an absolute measure of dispersion and a relative measure based on the
standard deviation is called the coefficient of variation. It is a pure number and suitable for
comparing the variability, homogeneity or uniformity of two or more distributions. It is given as
a percentage and calculated as
Coefficient of variation (CV) = (Standard deviation/Mean) × 100
The lower the C.V the more consistent or stable the distribution is since the less the variability.
Example
Over a period of 3 months the daily number of components produced by two comparable
machines was measured, giving the following statistics
Machine A: mean = 242.8; Standard deviation = 20.5
Machine B: mean = 281.3; Standard deviation = 23.0
Which machine has less variability in its performance?
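The comparison reduces to two coefficient-of-variation calculations; a minimal sketch:

```python
# Coefficient of variation: (standard deviation / mean) * 100.
# The machine with the lower CV is the more consistent performer.

def cv(mean, sd):
    return sd / mean * 100

cv_a = cv(242.8, 20.5)   # machine A
cv_b = cv(281.3, 23.0)   # machine B
print(f"A: {cv_a:.2f}%  B: {cv_b:.2f}%")
# B has the smaller relative variability even though its s.d. is larger
```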
SKp = 3(Mean − Median)/Standard deviation (Karl Pearson's coefficient of skewness)
Kurtosis refers to the degree of flatness or peakedness of a frequency curve. The degree of
peakedness of a distribution is measured relative to the peakedness of the normal distribution.
If a distribution is more peaked than the normal curve, it is called Leptokurtic; if it is more flat-
topped than the normal curve, it is called platykurtic or flat-topped. The normal curve is itself
known as Mesokurtic.
[Figure: frequency curves illustrating kurtosis — a leptokurtic curve, the mesokurtic (normal) curve and a platykurtic curve]
4.6 Activities
1. The following table indicates the marks obtained by students in a statistics test.
Marks Number of students
0 – 20 5
20 – 40 7
40 – 60 -
60 – 80 8
80 – 100 7
The arithmetic mean for the class was 52.5 marks. You are required to determine the value
of:
i) The missing frequency
ii) The median mark
iii) The modal mark
iv) The standard deviation
v) The coefficient of skewness
2. From the prices of the shares X and Y given below, state which share is more stable in value
and which one you would invest in, and why.
X: 55 54 52 53 56 58 52 50 51 49
Y: 108 107 105 105 106 107 104 103 104 101
3. An analysis of the monthly wages paid to workers of two firms A and B belonging to the
same industry gives the following results:
Firm A Firm B
No. of wage earners 586 648
Average monthly wage 52.5 47.5
Standard deviation 10 11
Compute the combined standard deviation.
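The combined mean and combined standard deviation formulas can be sketched as a small function and applied to the two firms:

```python
from math import sqrt

# Combined mean and standard deviation of two groups, following the
# formulas in this lesson (d1, d2 are deviations of the group means
# from the combined mean)

def combined(n1, m1, s1, n2, m2, s2):
    m12 = (n1 * m1 + n2 * m2) / (n1 + n2)
    d1, d2 = m1 - m12, m2 - m12
    var = (n1 * s1**2 + n2 * s2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2)
    return m12, sqrt(var)

# Firm A: 586 workers, mean wage 52.5, s.d. 10
# Firm B: 648 workers, mean wage 47.5, s.d. 11
mean12, sd12 = combined(586, 52.5, 10, 648, 47.5, 11)
print(f"combined mean = {mean12:.2f}, combined s.d. = {sd12:.2f}")
```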
LESSON FIVE: PROBABILITY DISTRIBUTIONS
5.1 Introduction
Probability is the likelihood or chance that a particular event will occur.
In probability and statistics the term experiment refers to any procedure that gives rise to a
collection of outcomes which cannot be predetermined.
In tossing coins, the possible outcomes are as follows:
Tossing 1 coin: {H, T}
Tossing 2 coins: {HH, HT, TH, TT}
Tossing 3 coins: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
EXAMPLE
Let the set of all outcomes (sample space) in the experiment of tossing two coins be
S = {HH, HT, TH, TT}. Then
A = {HT, TH} is the event of getting just one head/tail
B = {HH, HT, TH} is the event of getting at least one head
∅ = {} is the impossible event
S = {HH, HT, TH, TT} is the sure event
An elementary event or simple event is an event containing only one point of the sample
space, e.g. in the toss of two coins, the following are elementary events:
Let X denote the number of smokers among the three students chosen. Then:
Simple event in S Random variable X
SSS 3
SSN 2
SNS 2
SNN 1
NSS 2
NSN 1
NNS 1
NNN 0
The probability distribution of a random variable can be described by listing all the values that
the random variable can take together with the corresponding probabilities. Such a listing is called a
probability distribution or probability mass function of the random variable.
Example
Suppose X represents the number of heads in a random experiment of tossing three coins.
The sample space is:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
The probability distribution of the random variable X defined as the “number of heads” is
x   P(X = x)
0   1/8
1   3/8
2   3/8
3   1/8
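The distribution above can be reproduced by enumerating the eight equally likely outcomes (an illustrative sketch):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of tossing three coins and
# count heads in each to recover the distribution tabulated above
outcomes = list(product("HT", repeat=3))
counts = Counter(o.count("H") for o in outcomes)

for x in sorted(counts):
    p = Fraction(counts[x], len(outcomes))
    print(f"P(X = {x}) = {p}")   # 1/8, 3/8, 3/8, 1/8
```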
In general, suppose X is a random variable that assumes the values x1, x2, …, xk. If we
represent the probability that X assumes the value xi by P(X = xi), then the probability function
can be given in the form of a table as
X    P(x)
x1   P(x1)
x2   P(x2)
.    .
.    .
.    .
xk   P(xk)
     Sum = 1
The sum of the probabilities, i.e. Σ(i=1..k) P(xi) = P(x1) + P(x2) + … + P(xk), is one.
Example
The number of telephone calls received in an office between 9 – 10 am has the probability
distribution as shown below:
Number of calls (X) Probability, P(x)
0 0.05
1 0.20
2 0.25
3 0.20
4 0.10
5 0.15
6 0.05
variable with respective probabilities p(x1), p(x2), …, p(xn), then its mean (also called the
expected value) is given by
μ = x1·p(x1) + x2·p(x2) + … + xn·p(xn) = Σ(i=1..n) xi·p(xi)
The positive square root of the variance is called the standard deviation of the random
variable. The variance is commonly denoted as σ², hence the standard deviation equals σ.
Example
Suppose we are given the following data relating to the breakdown of a machine in a certain
company during a given week, where x represents the number of breakdowns of the machine and
P(x) represents the probability value of x.
x 0 1 2 3 4
P(x) 0.12 0.20 0.25 0.30 0.13
Find the mean and the variance of the number of breakdowns per week for this machine
NB: The computation of σ² can be simplified by using the following version of the formula:
σ² = Σx²·P(x) − μ²
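A sketch applying the mean formula and this shortcut to the machine-breakdown example above:

```python
# Mean and variance of a discrete random variable:
#   mu = sum(x * P(x)),  var = sum(x^2 * P(x)) - mu^2 (shortcut form)

xs = [0, 1, 2, 3, 4]
ps = [0.12, 0.20, 0.25, 0.30, 0.13]

mu = sum(x * p for x, p in zip(xs, ps))
var = sum(x**2 * p for x, p in zip(xs, ps)) - mu**2

print(f"mean = {mu}, variance = {var:.4f}")
```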
NB:
i) The curve is symmetrical w.r.t the vertical axis through zero
ii) It is strongly recommended that we sketch the curves and identify the areas under the
curve and the values along the horizontal axis.
EXAMPLES
1. Find the value of c in each of the following:
a) P(0 ≤ z ≤ c) = 0.3944
b) P(z ≤ c) = 0.8238
c) P(1 ≤ z ≤ c) = 0.1525
d) P(−c ≤ z ≤ c) = 0.8164
Having considered areas under the standard normal curve, we now consider the general case
of a normal distribution with any mean μ and any standard deviation σ, where σ > 0.
If X is a normal random variable with mean μ and standard deviation σ, then X can be
converted into a standard normal variable z by setting z = (X − μ)/σ
EXAMPLE 6
Suppose X has a normal distribution with μ = 30 and σ = 4. Find
a) P(30 < X < 35) b) P(X > 40) c) P(X < 22)
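Python's statistics.NormalDist can check Example 6; the inequality directions shown (P(30 < X < 35), P(X > 40), P(X < 22)) are my reading of the question:

```python
from statistics import NormalDist

# Sketch for Example 6, assuming the probabilities asked for are
# P(30 < X < 35), P(X > 40) and P(X < 22); X ~ N(mu=30, sigma=4)
X = NormalDist(mu=30, sigma=4)

p_a = X.cdf(35) - X.cdf(30)   # z from 0 to 1.25  -> about 0.3944
p_b = 1 - X.cdf(40)           # z above 2.5       -> about 0.0062
p_c = X.cdf(22)               # z below -2        -> about 0.0228

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```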
5.6 Activities
1. A salesman who sells cars for General Motors claims that he sells the largest number of cars
on Saturday. He has the following probability distribution for the number of cars he expects
to sell on a particular Saturday.
No. of cars (x) Probability P(x)
0 .1
1 .2
2 .3
3 .3
4 .1
Total 1.0
i) On a typical Saturday, how many cars does the salesman expect to sell?
ii) What is the variance of the distribution?
2. In a recent survey, 90% of the homes in a city were found to have colored TVs. In a sample
of nine homes, what is the probability that:
i. All nine have colored TVs?
ii. Less than five have colored TVs?
iii. More than five have colored TVs?
iv. At least seven homes have colored TVs?
3. The life times of electric components manufactured by Raman Industries Ltd are normally
distributed with mean of 2500 hours and standard deviation of 600 hours. If the daily production is
500 components, how many are expected to have a life time of:
i) Less than 2600 hours
ii) Between 2350 hours and 2580 hours
iii) More than 2380 hours
LESSON SIX: SAMPLING AND SAMPLING DISTRIBUTIONS
6.1 Introduction
The field of inferential or inductive statistics is concerned with studying facts about populations.
Specifically, the interest is in learning about the population parameters. This is accomplished by
picking a sample and computing the values of the appropriate statistics.
A parameter is a numerical descriptive measure of a population. Because it is based on the
observations in the population, its value is almost always unknown.
A Sample statistic is a numerical descriptive measure of a sample. It is calculated from the
observations in the sample.
NB: The term statistic refers to a sample quantity and the term parameter refers to a population
quantity.
Sampling is the process of selecting a sample from a population.
b) Non-probability sampling
It is used when a researcher is not interested in selecting a sample that is representative of the
population.
i) Purposive Sampling
It allows the researcher to use cases that have the required information with respect to the
objectives of his or her study e.g. educational level, age group, religious sect etc.
ii) Quota Sampling
The researcher purposively selects subjects to fit the quotas identified, e.g. Gender: male or
female; Class level: graduate or undergraduate; Religion: Muslim, Protestant, Catholic,
Jewish; Socio-economic class: upper, middle or lower.
iii) Snow ball sampling
It is used when the population that possesses the characteristics under study is not well
known and can be best located through referral networks. Initial subjects are identified who
in turn identify others. Commonly used in drug cultures, teenage gang activities, Mungiki
sect, insider trading, Mau Mau etc.
iv) Convenience or Accidental Sampling
Involves selecting cases or units of observation as they become available to the researcher
e.g. asking a question to the radio listeners, roommates or neighbours.
6.3 Reasons for Sampling
We obtain a sample rather than a complete enumeration (a census) of the population for many
reasons. There are six main reasons for sampling in lieu of the census.
i) Economy: Directly observing only a portion of the population requires fewer resources than a
census.
ii) The Time factor: A sample may provide an investigator with needed information quickly
iii) The very large populations: Many populations about which inferences must be made are quite
large and sample evidence may be the only way to obtain information.
iv) Partly inaccessible populations: Some populations contain elementary units so difficult to
observe that they are in a sense inaccessible e.g. in determining consumer attitudes not all of the
users of a product can be queried.
v) The Destructive nature of the observation: Sometimes the very act of observing the desired
characteristics of the elementary unit destroys it for the use intended. Classical examples of this
occur in quality control
vi) Accuracy and sampling: A sample may be more accurate than a census. A sloppily conducted
census can provide less reliable information than a carefully obtained sample.
Sampling error: It comprises the difference between the sample and the population that are due
solely to the particular elementary units that happen to have been selected.
There are two basic causes for sampling error.
One is chance: bad luck may result in untypical choices. Unusual elementary units do
exist, and there is always a possibility that an abnormally large number of them will be
chosen. The main protection against this type of error is to use a large enough sample.
Another cause of sampling error is sampling bias. This is the tendency to favor the selection
of elementary units that have particular characteristics. Sampling bias is usually the result of
a poor sampling plan.
Non-sampling error
The other main cause of unrepresentative samples is non-sampling error. This type can occur
whether a census or a sample is being used.
A non-sampling error is an error that results solely from the manner in which the observations
are made. The simplest example of non-sampling error is inaccurate physical measurement due
to faulty instruments or poor procedures. Consider the observation of human weights: no two
answers will be of equal reliability.
mean and variance of the sampling distribution of x̄ are given by μx̄ = μ and σ²x̄ = σ²/n.
When random samples of size n are drawn without replacement from a finite population of size N that
has a mean μ and a variance σ², the mean and the variance of the sampling distribution of x̄ are
given by
μx̄ = μ and σ²x̄ = (σ²/n)·(N − n)/(N − 1)
If the population size is large compared to the sample size, σ²x̄ ≈ σ²/n, approximately.
The standard deviation of the sampling distribution of x̄ is commonly known as the standard error of
the mean. It is σ/√n when sampling with replacement. For a sample drawn without replacement from a
finite population of size N, the standard error of the mean is (σ/√n)·√[(N − n)/(N − 1)].
In the latter case it is approximately σ/√n if the population is very large compared to the sample
size. In our discussion, we shall assume that the population is large enough that σ²/n can be taken
as the value of σ²x̄ even when sampling without replacement.
The standard error of the mean then depends on two quantities, σ² and n. It will be large if
σ² is large, i.e. if the scatter in the parent population is large. On the other hand, the standard
error will be small if the sample size n is large, since with a larger sample we can get more
information about the population mean and consequently less scatter of the sample mean
about μ.
The variance of the parent population is usually not under the experimenter’s control. Therefore
one sure way of reducing the standard error of the mean is by picking a large sample – the larger
the better.
So far we have concerned ourselves with two parameters of the sampling distribution of
x̄, namely μx̄ and σ²x̄. We now turn our attention to the distribution itself.
The probability distribution of x will very much depend on the distribution of the sampled
population.
Note that if n, the sample size, is large, the distribution of x̄ is close to a normal
distribution, of course with mean μ and variance σ²/n. The statement of this result is contained
in the central limit theorem.
Central limit theorem: when sampling from a population with mean μ and variance σ², the sample
mean x̄ is close to having a normal distribution with mean μ and variance σ²/n, provided the
sample size is large.
n
The central limit theorem tells us that the shape of the distribution is approximately normal. We
already know that if the population has mean μ and variance σ², then μx̄ = μ and σ²x̄ = σ²/n.
Converting to the z scale, we can give an alternate version of the central limit theorem.
When the sample size is large, the distribution of (x̄ − μ)/(σ/√n) is close to that of a standard
normal variable z.
(Recall that to convert to the z scale the rule is: subtract the mean and divide by the standard
deviation of the r.v. in question.)
Since the central limit theorem applies if the sample size is large, a natural question is, how large
is large enough?
This will depend on the nature of the sampled population
If the parent population is normally distributed, then the distribution of x̄ is normal for
any sample size.
If the parent population has a symmetric distribution, the approximation to the normal
distribution will be reached for a moderately small sample size, as low as 10.
In most instances, the tendency towards normality is so strong that the approximation is fairly
satisfactory with a sample size of about 30.
Example 1
The records of the Department of Health, Education and Welfare show that the mean expenditure
incurred by a student during 2010 was $5000 and the standard deviation of the expenditure was
$800. Find the approximate probability that the mean expenditure of 64 students picked at
random was
a) More than $4820
b) Between $4800 and $5120
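A sketch of Example 1 using the central limit theorem; the standard error is 800/√64 = 100:

```python
from statistics import NormalDist

# By the central limit theorem the mean expenditure of n = 64 students
# is approximately normal with mean 5000 and standard error 800/sqrt(64)
xbar = NormalDist(mu=5000, sigma=800 / 64 ** 0.5)   # standard error = 100

p_a = 1 - xbar.cdf(4820)               # P(mean > 4820),  z = -1.8
p_b = xbar.cdf(5120) - xbar.cdf(4800)  # P(4800 < mean < 5120), z in (-2, 1.2)

print(round(p_a, 4), round(p_b, 4))
```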
Example 2
The length of life (in hours) of a certain type of electric bulb is a random variable with a mean
life of 500 hours and a standard deviation of 35 hours.
What is the approximate probability that a random sample of 49 bulbs will have a mean life
between 488 and 505 hours?
very close to 0 or 1) and if n is large, then the distribution of the sample proportion x/n is
approximately normal.
Similarly, S² is an estimator of σ² and s² = Σ(i=1..n)(xi − x̄)²/(n − 1) is its estimate computed
from a set of data x1, x2, …, xn. Also, if X represents the number of successes in a sample of n,
then X/n is an estimator of P, and if in a particular sample there are x successes, then x/n is an
estimate of P.
The major limitation of a point estimate is that it fails to indicate how close it is to the
quantity it is supposed to estimate. In other words, a point estimate does not give any idea
about the reliability or precision of the method of estimation used.
Interval Estimation
Another method of estimating parameters is called the method of Interval Estimation or
Confidence Interval.
It involves computing two points and constructing an interval within which the parameter lies
with a specified degree of confidence. In constructing the end points of the interval, all of the
factors, namely, the point estimate, the population variance, and the sample size, are brought
into play.
When we find a point estimate, we certainly do not expect that it will exactly equal the
parameter value on the dot. Also, if we take two samples from the same population, we do not
expect the two estimates computed from these samples to be exactly equal. This is due to the
sampling error involved. Thus, the method of point estimation has some drawbacks.
7.4 Confidence Intervals for Population Mean when the Population
Variance is Known.
If the population has a normal distribution and σ is known, then a (1 − α)100 percent confidence interval for μ is given by x̄ ± z(α/2)·σ/√n.
Example 2:
A random sample of 16 fully grown turkeys had a mean weight of 20.8 kgs. If we can assume
from past experience that σ = 2.8 kgs, construct confidence intervals for μ, the true mean weight,
with the following confidence coefficients.
a) 90%
b) 95%
c) 98%
differ from μ by more than a pre-assigned quantity e is n = (z(α/2)·σ/e)².
e
Example
A population has a normal distribution with variance 225. Find how large a sample must be
drawn in order to be 95% confident that the sample mean will not differ from the population
mean by more than 2 units.
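A sketch of the sample-size formula for this exercise (σ = √225 = 15, e = 2, 95% confidence):

```python
from math import ceil
from statistics import NormalDist

# n = (z_(alpha/2) * sigma / e)^2, rounded up to the next whole number
sigma = 225 ** 0.5                   # population s.d. = 15
e = 2                                # maximum allowed error
z = NormalDist().inv_cdf(0.975)      # z for 95% confidence, about 1.96

n = (z * sigma / e) ** 2
print(ceil(n))                       # required sample size
```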
7.6 Confidence Interval for Population Mean When the Population
Variance is Unknown
A (1 − α)100 percent confidence interval for μ when the population is normally distributed and
σ is unknown is given by x̄ ± t(n−1, α/2)·S/√n.
Note that t(n−1, α/2) will be very close to z(α/2) if n is 30 or more. In that case, the above
confidence interval is practically the same as the one based on the normal distribution.
Example 1
When 16 cigarettes of a particular brand were tested in a laboratory for the amount of nicotine
content, it was found that their mean content was 18.3 mg with S =1.8mg.
Set a 90 percent confidence interval for the mean nicotine content in the population of
cigarettes of this brand. (Assume that the amount of nicotine in the cigarette is normally
distributed).
Example 2
In order to estimate the amount of time (in minutes) that a teller spends on a customer, a bank manager decided to observe 64 customers picked at random. The amount of time the teller spent on each customer was recorded. It was found that the sample mean was 3.2 minutes with S² = 1.44. Find a 98% confidence interval for the mean amount of time μ.
Example 3
The following data represent the amount of sugar consumed (in pounds) in a household during
five randomly picked weeks: 3.8, 4.5, 5.2, 4.0 and 5.5. Construct a 90% confidence interval for
the true mean consumption . (Assume a normal distribution for the amount of sugar consumed)
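Example 3 can be verified numerically. The sketch below uses the sample standard deviation S and the t distribution with n − 1 = 4 d.f.; t_{4,0.05} = 2.132 is the assumed table value.

```python
import math
import statistics

# 90% CI for mean sugar consumption from the five weekly observations;
# t_{4, 0.05} = 2.132 is an assumed table value (n - 1 = 4 d.f.).
data = [3.8, 4.5, 5.2, 4.0, 5.5]
n = len(data)
x_bar = statistics.mean(data)        # sample mean, 4.6
s = statistics.stdev(data)           # sample standard deviation S
t = 2.132
margin = t * s / math.sqrt(n)
lo, hi = x_bar - margin, x_bar + margin
print(f"({lo:.3f}, {hi:.3f})")
```

`statistics.stdev` divides by n − 1, which is the S required by the t interval; `statistics.pstdev` (dividing by n) would be wrong here.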
LESSON EIGHT: HYPOTHESIS TESTING
8.0 Introduction
A statistical hypothesis is a statement, assertion or claim about the nature of a population.
Hypothesis testing is a procedure based on sample evidence and probability theory to
determine whether the hypothesis is a reasonable statement.
b) H_A : μ > μ0
c) H_A : μ < μ0, where μ0 is a given specific value.
H_A : μ > μ0 with μ0 = 450
The critical value is C = μ0 + z_α σ/√n,
where n is the sample size, σ is the population standard deviation, which is assumed known, and z_α is the value on the z scale such that the area in the right tail is α.
b) A left-tailed test
Suppose the null and alternative hypotheses are given as
H0 : μ = μ0
HA : μ < μ0
Once again, the alternative hypothesis is one-sided (less than, <). We reject H0 for smaller values of x̄, leading to the rejection of H0 if the value falls in the left tail of the distribution of x̄ as shown below. This gives a one-tailed test that is specifically a left-tailed test.
[Figure: rejection region of area α in the left tail of the distribution of x̄, shown on both the x̄ and z scales. Actions: Reject H0 / Do not reject H0.]
The critical value C is given by C = μ0 − z_α σ/√n.
The decision rule is given as:
Reject H0 if x̄ < μ0 − z_α σ/√n, or equivalently reject H0 if Z = (x̄ − μ0)/(σ/√n) < −z_α.
c) A Two-Tailed test
A test leads to a two-tailed test if the alternative hypothesis is two-sided.
Consider the following example. Suppose a machine is adjusted to manufacture bolts to the specification of 1-inch diameter, and we state the null and alternative hypotheses as
H0 : μ = 1
HA : μ ≠ 1
If the sample mean of the diameters is too far off on either side of 1, we would favor rejecting H0. If the value of x̄ falls in either tail of the distribution of x̄, we reject H0.
The rejection region with α = 0.05 is distributed as α/2 = 0.025 at each tail.
[Figure: rejection regions of area α/2 in each tail of the distribution of x̄, with the critical values marked at −z_{α/2} and z_{α/2} on the z scale.]
The critical values are C1 = μ0 − z_{α/2} σ/√n and C2 = μ0 + z_{α/2} σ/√n.
The decision rule is formulated as follows:
Reject H0 if x̄ < μ0 − z_{α/2} σ/√n or x̄ > μ0 + z_{α/2} σ/√n, or equivalently reject H0 if Z = (x̄ − μ0)/(σ/√n) is less than −z_{α/2} or greater than z_{α/2}.
In summary:
Alternative hypothesis | Reject H0 if the computed value of Z is
μ > μ0 | greater than z_α
μ < μ0 | less than −z_α
μ ≠ μ0 | less than −z_{α/2} or greater than z_{α/2}
Example 1
After taking a refresher course, a salesman found that his sales (in dollars) on 9 random days
were 1280, 1250, 990, 1100, 880, 1300, 1100, 950 and 1050. Does the sample indicate that the
refresher course had the desired effect, in that his mean sale is now more than 1000 dollars?
Assume σ = 100, and the probability of erroneously saying that the refresher course is beneficial should not exceed 0.01. Also assume that the sales are normally distributed.
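Example 1 is a right-tailed z test of H0: μ = 1000 against HA: μ > 1000 with σ known. The check below uses z_{0.01} = 2.326 as the assumed table value.

```python
import math
import statistics

# Right-tailed z test: H0: mu = 1000 vs HA: mu > 1000, sigma = 100 known,
# alpha = 0.01; z_{0.01} = 2.326 is an assumed table value.
sales = [1280, 1250, 990, 1100, 880, 1300, 1100, 950, 1050]
n = len(sales)
x_bar = statistics.mean(sales)                 # 1100
z = (x_bar - 1000) / (100 / math.sqrt(n))      # test statistic
reject = z > 2.326
print(z, reject)
```

Since z = 3.0 exceeds 2.326, H0 is rejected: the data support the claim that the mean sale now exceeds 1000 dollars.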
Example 2
An IQ test was administered to 9 students and their mean IQ was found to be 95. Assuming the
population variance is 144, is it true that the mean IQ in the population is less than 100?
Use α = 0.15, and assume that IQ is normally distributed.
Example 3
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5kgs. From past experience, the standard deviation of the amount filled is known to be
0.15kgs.
To check if the machine is in control, a random sample of 16 bags was weighed and the mean weight was found to be 5.1 kg. At the 5% level of significance, is there evidence to believe that the machine is out of control? [Assume a normal distribution for the amount of sugar filled in a bag.]
8.6.2 Test of Hypothesis for the Population Mean when the Population Variance is
Unknown and the Sample is Small
In the case where σ was known, we used the test statistic Z = (x̄ − μ0)/(σ/√n).
Since σ is not known, we use its estimate S. Hence the appropriate test statistic is
T = (x̄ − μ0)/(S/√n)
At this point we need the added assumption that the population is normally distributed,
especially if n is small. Since, under this assumption, the statistic T has student’s t distribution
with n – 1 d.f, we get the decision rules given in the following table, depending upon the
particular alternative hypothesis
Example 4
A car salesman claims that a particular make of car would give a mean mileage of greater than 20 miles per litre. To test the claim, a field experiment was conducted where 10 cars were each run on one litre of petrol. The results (in miles) were 23, 18, 22, 19, 19, 22, 18, 18, 24, 22.
Do the data corroborate the salesman's claim? Use α = 0.05 and assume a normal distribution for mileage per litre.
Example 5
A home economist claims that if a person is put on a certain diet, it will lead to a reduction of his or her weight. The following data record the weights (in pounds) of five people before and after the diet. Do the data support the claim at the 5% level of significance?
Person number 1 2 3 4 5
Before the diet 175 168 140 130 150
After the diet 170 169 133 132 143
Example 6
An auto dealer believes that his new model will give mean trouble-free service of at least 12,000
miles. In a simulated test with 4 cars, the following numbers of trouble-free miles were
obtained: 11,000, 12,000, 11,800 and 11,200
Do these data refute the dealer's claim? Use α = 0.05. [Assume a normal distribution.]
Example 7
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5 kg. To check if the machine is in control, six bags were picked at random and their weights
were found to be 5.3, 5.2, 4.8, 5.2, 4.8 and 5.3.
At the 5% level of significance, is there evidence to believe that the machine is not in control?
[Assume a normal distribution for the weight of a bag]
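Example 7 is a two-tailed one-sample t test of H0: μ = 5 against HA: μ ≠ 5. A numerical check, with t_{5,0.025} = 2.571 as the assumed table value:

```python
import math
import statistics

# Two-tailed t test: H0: mu = 5 vs HA: mu != 5, alpha = 0.05;
# t_{5, 0.025} = 2.571 is an assumed table value (n - 1 = 5 d.f.).
weights = [5.3, 5.2, 4.8, 5.2, 4.8, 5.3]
n = len(weights)
x_bar = statistics.mean(weights)               # 5.1
s = statistics.stdev(weights)                  # sample standard deviation S
t_stat = (x_bar - 5) / (s / math.sqrt(n))
reject = abs(t_stat) > 2.571
print(t_stat, reject)
```

The statistic is about 1.04, well inside the acceptance region, so there is no evidence that the machine is out of control.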
Furthermore, if the sample is large, the shape of the distribution of x/n is approximately normal. Consequently, under the null hypothesis, which postulates that the population proportion is p0, x/n has a distribution that is approximately normal with mean p0 and standard deviation √(p0(1 − p0)/n).
We now have a situation analogous to the one where we tested hypotheses regarding the population mean when σ² was known. The role of x̄ is played by x/n, that of μ0 by p0, and that of σ/√n by √(p0(1 − p0)/n). The test statistic is therefore
Z = (x/n − p0) / √(p0(1 − p0)/n)
The table below gives the 3 cases based on the nature of the alternative hypothesis:
Alternative hypothesis | Reject H0 if the computed value of Z is
p > p0 | greater than z_α
p < p0 | less than −z_α
p ≠ p0 | less than −z_{α/2} or greater than z_{α/2}
Example 1
A machine is known to produce 30% defective tubes. After repairing the machine, it was found that it produced 22 defective tubes in the first run of 100. Is it true that after the repair the proportion of defective tubes is reduced? Use α = 0.01.
Example 2
The proportion of Kenyans who traveled abroad last year was 20%. To find the attitude of
people on foreign travel this year, 100 people were interviewed. Of these 15 said they would
travel and the remaining 85 said they would not. Is there any basis to believe that the attitude has
changed from last year? Use α = 0.10.
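Example 1 is a left-tailed test of H0: p = 0.30 against HA: p < 0.30. A numerical check, taking −z_{0.01} = −2.326 as the assumed table value:

```python
import math

# Left-tailed proportion test: H0: p = 0.30 vs HA: p < 0.30, alpha = 0.01;
# -z_{0.01} = -2.326 is an assumed table value.
p0 = 0.30
n = 100
p_hat = 22 / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
reject = z < -2.326
print(z, reject)
```

The statistic is about −1.75, which is not below −2.326, so at α = 0.01 the data do not establish that the repair reduced the proportion of defectives.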
The test statistic is
Z = (X̄ − Ȳ) / √(σ1²/m + σ2²/n)
The decision rules for the various forms of the alternative hypothesis are given in the table below.
Alternative hypothesis | Reject H0 if the computed value of Z is
μ1 > μ2 | greater than z_α
μ1 < μ2 | less than −z_α
μ1 ≠ μ2 | less than −z_{α/2} or greater than z_{α/2}
surveys it is known that the variance of weight in Kenya is σ1² = 100 and in the U.S. it is σ2² = 169.
Is it true that there is a significant difference between the mean weights in the two places? Use α = 0.05. [Assume that the weights are normally distributed.]
Example 2
In order to compare two brands of cigarettes, brand A and brand B, for their nicotine content, a
sample of 60 was inspected from brand A and a sample of 40 from brand B. The results of the
tests were summarized as follows.
Brand A: x̄ = 15.4, S1² = 3
At the 5% level of significance, do the two brands differ in their mean nicotine content?
8.7.2 Difference in Population Means when the Variances are unknown but are assumed
equal
The following test procedure is particularly suited for the case when small independent
samples are drawn from normally distributed populations both having the same variance.
We are interested in testing the null hypothesis H0 : μ1 = μ2.
When the variances are known, we used the statistic
Z = (X̄ − Ȳ) / √(σ1²/m + σ2²/n)
But we are given that the variances are equal. So suppose σ1² = σ2² and let σ² represent the common value. The above test statistic then reduces to
(X̄ − Ȳ) / (σ √(1/m + 1/n))
Since σ is not known, we shall use its pooled estimator S_p, where
S_p² = [(m − 1)S1² + (n − 1)S2²] / (m + n − 2)
Therefore, the test statistic appropriate for carrying out the test of H0 is
T = (X̄ − Ȳ) / (S_p √(1/m + 1/n))
The test procedure for the various forms of the alternative hypothesis is given in the table below.
Alternative hypothesis | Reject H0 if the computed value of T is
μ1 > μ2 | greater than t_{m+n−2,α}
μ1 < μ2 | less than −t_{m+n−2,α}
μ1 ≠ μ2 | less than −t_{m+n−2,α/2} or greater than t_{m+n−2,α/2}
Example 3
A nitrogen fertilizer was used on 10 plots and the mean yield per plot was found to be x 82.5
with an estimate S1 of the population standard deviation of yield per plot equal to 10kg. On the
other hand, 15 plots treated with phosphate fertilizer gave a mean yield y 90.5 kg per plot with
an estimate S 2 of the standard deviation of yield per plot equal to 20kg. At the 5% level of
significance are the two fertilizers significantly different?
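Example 3 can be worked through with the pooled-variance formulas above. The check below uses t_{23,0.025} = 2.069 as the assumed table value for the two-tailed test.

```python
import math

# Two-tailed pooled t test for the fertilizer example;
# t_{23, 0.025} = 2.069 is an assumed table value (m + n - 2 = 23 d.f.).
m, x_bar, s1 = 10, 82.5, 10.0     # nitrogen plots
n, y_bar, s2 = 15, 90.5, 20.0     # phosphate plots
sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)   # pooled variance
t_stat = (x_bar - y_bar) / (math.sqrt(sp2) * math.sqrt(1/m + 1/n))
reject = abs(t_stat) > 2.069
print(t_stat, reject)
```

Despite the 8 kg difference in sample means, |T| ≈ 1.17 < 2.069, so at the 5% level the two fertilizers are not significantly different.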
9.0 Introduction
This lesson covers the tests of goodness of fit, tests of independence and tests of
homogeneity
If n items are picked independently from such a population, this leads to the binomial distribution.
A generalization of this is when the population can be broken into more than two mutually exclusive
categories. For example, a coin could land heads, tails or on edge; when a die is rolled it could land showing any one of the six faces; a person might be a Democrat, a Republican, or an independent; a person might be an A, B, O or AB blood type, and so on.
If n independent observations are made from such a population, we get a generalized concept of the
binomial distribution called the Multinomial distribution.
With our background of the last section, we are equipped to test the following null hypothesis
Ho: The Proportion of Democrats in the U.S is 0.60 (implying the proportion of non-
Democrats is 0.40)
In this section we consider how to test a null hypothesis of the following type.
Ho: In the U.S, the proportion of Democrats is 0.55, the proportion of Republicans is 0.35,
and the proportion of independents is 0.10.
To test the above hypothesis, suppose we interview 1000 people picked at random. On the basis of
the stipulated null hypothesis, we would expect 550 Democrats, 350 Republicans and 100
independents.
If we actually observe 568 Democrats, 342 Republicans and 90 independents in this sample, we
might be quite willing to go along with the null hypothesis.
On the other hand, if the sample yields 460 Democrats, 400 Republicans and 140 independent, we
would be reluctant to accept Ho.
Thus in the final analysis, the statistical test will have to be based on how good a fit or closeness
there is between the observed numbers and the numbers that one would expect from the
hypothesized distribution.
Tests of this type which determine whether the sample data are in conformity with the
hypothesized distribution are called tests of goodness of fit, since they literally test how good the fit
is.
The test criterion is provided by a statistic χ² whose value for any sample is given by
χ² = Σ_{i=1}^{6} (O_i − E_i)² / E_i
where O_i represents the observed frequency of the face marked i on the die and E_i the corresponding expected frequency obtained by assuming that the null hypothesis is true.
Example:
It is believed that the proportions of people with A, B, O and AB blood types in the population are, respectively, 0.4, 0.2, 0.3 and 0.1. When 400 randomly picked people were examined, the observed numbers of each type were 148, 96, 106 and 50.
At the 5% level of significance, test the hypothesis that these data bear out the stated belief.
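The blood-type example can be computed directly from the χ² formula. The check below uses χ²_{3,0.05} = 7.815 as the assumed table value (K − 1 = 3 d.f.).

```python
# Goodness-of-fit test for the blood-type example at alpha = 0.05;
# chi-square table value 7.815 for 3 d.f. is assumed.
observed = [148, 96, 106, 50]
proportions = [0.4, 0.2, 0.3, 0.1]
n = sum(observed)                                  # 400
expected = [n * p for p in proportions]            # [160, 80, 120, 40]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
reject = chi2 > 7.815
print(chi2, reject)
```

Here χ² ≈ 8.23 > 7.815, so at the 5% level the data do not bear out the stated belief about the blood-type proportions.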
Summary:
1. The population is divided into K categories (classes) C1, C2,…, Ck
2. The null hypothesis stipulates that the probability that an individual belongs to category C1 is P1, that it belongs to category C2 is P2, and so on.
3. To test this hypothesis, a random sample of n individuals is picked. The observed frequencies of the
categories are recorded as O1, O2,…,OK.
4. If the null hypothesis is true, then the expected frequencies E1, E2,…,Ek are obtained as E_i = nP_i.
5. Compute the statistic
χ² = (O1 − E1)²/E1 + (O2 − E2)²/E2 + … + (Ok − Ek)²/Ek
6. If none of the expected frequencies is less than 5, the distribution of χ² can be approximated very closely by a chi-square distribution. Since there are K categories, the number of d.f associated with the chi-square is K − 1.
7. The critical region for a given level of significance will therefore consist of the right tail of the chi-
square distribution with K – 1 d.f.
The decision rule is:
Reject Ho if the computed χ² value is greater than the table value χ²_{k−1,α}.
Note:
The distribution of the statistic χ² employed here is only approximately chi-square. It should not be used if one or more of the expected frequencies is less than 5.
9.3 Test of Independence
In the previous section, we have observed only one characteristic on any individual e.g. in classifying
an individual as A, B, O or AB blood type, we observed the characteristic “blood type”.
Here we are interested in observing more than one variable on each individual and finding if there
exists a relationship between these variables. For example: for each person we might observe both
blood type and eye color and investigate if these characteristics are related in any way.
In short, our goal is to test whether two attributes observed on members of a population are
independent.
As a first step, we pick a sample of size n and classify the data in a two way table on the basis of the
two variables. Such a table is called a contingency table, since it alludes to whether the distribution
according to one variable is contingent on the distribution of the other. If there are r rows and c
columns, it is referred to as an “r by c” contingency table.
The test statistic is χ² = Σ (O − E)²/E, where the sum extends over all the cells of the table. The decision rule at level of significance α is: Reject Ho if the computed χ² value is greater than the table value χ²_{(r−1)(c−1),α}.
Example:
In a certain community, 360 randomly picked people were classified according to their age group
and political leaning. The data is presented below:
Political Age group
leaning 20-35 36-50 Over 50 Total
Conservative 10 40 10 60
Moderate 80 85 45 210
Liberal 30 25 35 90
Total 120 150 90 360
Test the hypothesis that a person's age and political leaning are not related. Use α = 0.05.
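The expected frequency for each cell is (row total × column total)/grand total, and the statistic sums (O − E)²/E over all nine cells. A check of this example, with χ²_{4,0.05} = 9.488 as the assumed table value ((r − 1)(c − 1) = 4 d.f.):

```python
# Test of independence for the age-group vs political-leaning table;
# chi-square table value 9.488 for 4 d.f. at alpha = 0.05 is assumed.
table = [[10, 40, 10],    # Conservative
         [80, 85, 45],    # Moderate
         [30, 25, 35]]    # Liberal
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)                            # 360
chi2 = sum((table[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
           / (row_totals[i] * col_totals[j] / grand)
           for i in range(3) for j in range(3))
reject = chi2 > 9.488
print(chi2, reject)
```

With χ² ≈ 29.35 far above 9.488, the hypothesis of no relationship between age and political leaning is rejected.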
The test statistic is again χ² = Σ (O − E)²/E, computed over all the cells of the table, with (r − 1)(c − 1) degrees of freedom.
Example:
In order to investigate whether the distribution of blood types in Europe is the same as in the U.S., information was collected on 200 randomly picked people in Europe and 300 people in the U.S. From the data provided below, is it true that the distributions of blood types in Europe and the U.S. are significantly different?
Location
Blood type Europe U.S Total
A 95 125 220
B 50 70 120
O 45 90 135
AB 10 15 25
Total 200 300 500
LESSON TEN: ANALYSIS OF VARIANCE
10.1 Introduction
In case we are not able to make these assumptions in a particular problem, the analysis of
variance technique should not be used. In such cases, we should consider using a “non-
parametric (distribution-free) technique”.
i.e. the arithmetic means of the populations from which the k samples are randomly drawn are equal to one another.
The steps involved in carrying out the analysis are:
The degrees of freedom will be one less than the number of samples, i.e. if there are 4 samples, then the degrees of freedom will be 4 − 1 = 3. In general v = k − 1, where k = number of samples.
ii) Take the deviations of the various observations in a sample from the mean values of the
respective samples
iii) Square these deviations and obtain the total which gives the sum of squares within the
samples.
iv) Divide the total obtained in step (iii) by the degrees of freedom; the d.f is obtained by deducting the number of samples from the total number of observations, i.e. v = n − k, where n refers to the total number of observations and k to the number of samples.
Calculate the F-Ratio
Calculate the F-ratio as follows:
F = (variance between the samples) / (variance within the samples)
F is always computed with the variance between the sample means as the numerator and the variance within the samples as the denominator. The denominator is computed by combining the variances within the k samples into a single measure.
Compare the computed value of F
Compare the calculated value of F with the table value of F for the given d.f at a certain critical level
(generally we take 5% level of significance).
If the calculated value of F is greater than the table value of F, it indicates that the difference in
sample means is significant,
i.e. it could not have arisen due to fluctuations of random sampling or, in other words, the
samples do not come from the same population.
On the other hand, if the calculated value of F is less than the table value, the difference is not
significant and hence could have arisen due to fluctuations of random sampling.
Example
As head of a department of a consumers’ research organization, you have the responsibility for testing
and comparing lifetimes of four brands of electric bulbs. Suppose you test the lifetime of three electric
bulbs of each of the four brands.
The data is shown below, each entry representing the lifetime of an electric bulb, measured in
hundreds of hours.
Brand
A B C D
20 25 24 23
19 23 20 20
21 21 22 20
Can we infer that the mean lifetime of the four brands of electric bulbs are equal?
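This example can be computed directly from the definitions of the between-samples and within-samples sums of squares; F_{3,8;0.05} ≈ 4.07 is the assumed table value.

```python
# One-way ANOVA for the four bulb brands, computed from the definitions;
# the F table value 4.07 for (3, 8) d.f. at alpha = 0.05 is assumed.
groups = {"A": [20, 19, 21], "B": [25, 23, 21],
          "C": [24, 20, 22], "D": [23, 20, 20]}
all_obs = [x for g in groups.values() for x in g]
N = len(all_obs)                       # 12 observations in total
k = len(groups)                        # 4 samples
grand_mean = sum(all_obs) / N          # 21.5
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
          for g in groups.values())    # between-samples SS
ssw = sum((x - sum(g) / len(g)) ** 2
          for g in groups.values() for x in g)   # within-samples SS
msb = ssb / (k - 1)
msw = ssw / (N - k)
F = msb / msw
print(ssb, ssw, F)
```

Since F = 5/3 ≈ 1.67 is below 4.07, the data are consistent with equal mean lifetimes for the four brands.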
To use the ANOVA table, it is convenient to use the following short-cut computational formulas:
Between samples sum of squares: SSB = Σ_{j=1}^{k} (T_j²/n_j) − T²/N
Within samples sum of squares: SSW = Σ_{j=1}^{k} Σ_{i=1}^{n_j} X_ij² − Σ_{j=1}^{k} (T_j²/n_j)
Total sum of squares: SST = Σ_{j=1}^{k} Σ_{i=1}^{n_j} X_ij² − T²/N
where T_j is the total of the observations in the j-th sample, n_j is the size of the j-th sample, T is the grand total of all the observations and N is the total number of observations.
The format for the ANOVA table using the computational formulas is shown below:
Source | Sum of squares | d.f | Mean square | F
Between samples | SSB | k − 1 | MSB = SSB/(k − 1) | F = MSB/MSW
Within samples | SSW | n − k | MSW = SSW/(n − k) |
Total | SST | n − 1 | |
Example
Consider the above example.
In order to use the computational formulas, the following four quantities must be computed:
Σ_{j=1}^{k} Σ_{i=1}^{n_j} X_ij², the sample totals T_j, Σ_{j=1}^{k} (T_j²/n_j), and T²/N.
LESSON ELEVEN: REGRESSION AND CORRELATION ANALYSIS
11.1 Introduction
Correlation analysis is a statistical tool used to ascertain the association between two variables while
regression analysis is used to determine the nature and extent of relationship between variables.
This lesson explains the methods used in studying correlation and regression.
Scatter Diagram
It helps to illustrate diagrammatically any relationship that may exist between two variables.
The following diagrams indicate various degrees of correlation.
[Diagram to be drawn]
Examples
1. Draw a scatter diagram from the following data
Supply (x) 4 5 8 9 10 12 15
Demand (y) 3 4 6 5 7 8 11
Example
The following data refers to exam marks vs hours of study for a sample of 8 candidates that sat a
statistics exam
Use the formula r = 1 − 6Σd² / (n(n² − 1))
Example
Two managers are asked to rank a group of employees in order of potential to eventually become
top managers. The rankings are as follows:
Employees ranking by manager I Ranking by manager II
A 10 9
B 2 4
C 1 2
D 4 3
E 3 1
F 6 5
G 5 6
H 8 8
I 7 7
J 9 10
Calculate the coefficient of rank correlation and comment on the value.
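The rank correlation for the two managers' rankings follows directly from the formula r = 1 − 6Σd²/(n(n² − 1)):

```python
# Spearman rank correlation for the two managers' rankings of
# employees A through J.
rank1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]    # manager I
rank2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]    # manager II
n = len(rank1)
d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
r = 1 - 6 * d2 / (n * (n**2 - 1))
print(d2, r)
```

With Σd² = 14 and r ≈ 0.92, the two managers agree closely in their assessment of the employees' potential.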
Example
Calculate the rank correlation Coefficient for the following data of marks of 2 tests given to
candidates for a clerical job
Preliminary Test 92 89 87 86 83 77 71 63 53 50
Final test 86 83 91 77 68 85 52 82 37 57
The adjustment consists of adding (m³ − m)/12 to the value of Σd² for each set of tied ranks, where m stands for the number of items whose ranks are tied.
Example
An examination of eight applicants for a clerical post was taken by a firm. From the marks
obtained by the applicants in the accounting and statistics papers, compute the Rank coefficient
of correlation.
Applicant A B C D E F G H
Marks in accounting 15 20 28 12 40 60 20 80
Marks in statistics 40 30 50 30 20 10 30 60
Merits of the Rank method
It is simpler to understand and easier to apply compared to the Karl Pearson’s method.
Where the data are of qualitative nature like honesty, efficiency, intelligence etc, the method
can be used with great advantage.
It is the only method that can be used where we are given the ranks and not the actual values.
Limitations
The method cannot be used for finding out correlation in a grouped frequency distribution.
Where the number of observations exceeds 30, the calculations become quite tedious and
require a lot of time.
The test statistic to carry out the test is t = r√(n − 2) / √(1 − r²).
If H0 is true, then this statistic has the Student's t distribution with n − 2 degrees of freedom.
Example
Consider the previous example on Exam marks Vs hours of study where we obtained r = 0.88
and r2 = 0.77 based on a sample with n = 10. Test the hypothesis that the population correlation
coefficient is zero at the 5% level.
Types of Regression
Simple linear regression: Involves a relationship between two variables only.
Multiple regression: Analyses or considers the relationship between three or more variables.
In regression analysis, an attempt is made to determine a line (curve) which best fits the given pairs of data. In the case of a linear relationship, a line with the equation Y = a + bX, where a and b are constants to be determined, is fitted. The constants a and b are determined such that
S = Σ(Y − a − bX)²
is a minimum.
With the use of differential calculus, S is minimized for a and b which satisfy the following two normal equations:
ΣY = na + bΣX
ΣXY = aΣX + bΣX²
Solving for the intercept gives
â = (1/n)(ΣY − b̂ΣX) = Ȳ − b̂X̄
Example
The following data give the observations on weekly income and expenditure on food for five
households.
Weekly Income (£) 240 270 300 330 360
Expenditure on food(£) 200 220 240 245 250
a) Plot the data on a scatter diagram
b) Determine the least squares regression line of expenditure on weekly income.
c) Using the equation in (b), estimate the expenditure on food for someone having a weekly
income of £380.
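Parts (b) and (c) can be computed from the normal equations. The sketch below takes the fourth income as 330, consistent with the 30-unit spacing of the other incomes (the printed "30" appears to be a typo).

```python
# Least-squares regression of food expenditure on weekly income.
# The fourth income is assumed to be 330 (the table's "30" looks garbled).
x = [240, 270, 300, 330, 360]     # weekly income
y = [200, 220, 240, 245, 250]     # expenditure on food
n = len(x)
x_bar = sum(x) / n                # 300
y_bar = sum(y) / n                # 231
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) \
    / (sum(xi**2 for xi in x) - n * x_bar**2)   # slope
a = y_bar - b * x_bar                           # intercept
estimate_380 = a + b * 380                      # part (c) prediction
print(a, b, estimate_380)
```

The fitted line is Y ≈ 106 + 0.417X, predicting roughly £264 of food expenditure for a £380 weekly income.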
11.8 Activities
1. For the following results showing marks obtained by 15 students, calculate the Rank
correlation
Marks in 50 50 40 39 38 37 36 35 34 33 32 31 30 29 28
Maths
Marks in 50 49 51 52 43 47 42 40 44 40 30 41 32 33 31
English
2. The following data gives the aptitude test scores and productivity indices of 10 workers selected at
random.
This is a mathematical technique that deals with the optimization of a linear function of variables, known as the objective function, subject to a set of linear inequalities known as constraints. The objective function may be profit, revenue, contribution or cost. The constraints may be imposed by different resources such as labour, finance, materials, machines, market, technology etc. By linearity is meant a mathematical expression in which all the expressions among the variables are linear (when plotted you obtain a straight line).
The objective function, which describes the primary purpose of the formulation -
to maximize some return (profit) or to minimize some cost.
The constraint set, which is a system of inequalities under which optimization is
to be accomplished.
a) Linearity - costs, revenues or any physical properties which form the basis of the
problem vary in direct proportion (linearly) with the quantities or number of
components produced.
b) Divisibility - quantities, revenues and costs are infinitely divisible i.e. any
fraction or decimal answer is valid.
c) Certainty – the technique makes no allowance for uncertainty in the estimate
made, although the evaluation of dual values indicates the sensitivity of the
solution to marginal uncertainty in constraint values.
d) Positive solutions – non-negativity constraints are introduced to ensure only
positive values are considered.
e) Interdependence between demand for products is ignored; products may be
complementary or a substitute for one another.
f) Time factors are ignored. All production is assumed to be instantaneous
ADVANTAGES OF LP
1. In certain types of problems such as inventory control management, Chemical
Engineering design, dynamic programming may be the only technique that can solve
the problems.
2. It helps in attaining the optimum use of productive factors. Linear programming
indicates how a manager can utilize his productive factors most effectively by a better
selection and distribution of these elements. E.g. more efficient use of manpower and
machines can be obtained by use of linear programming.
3. Most problems requiring multistage, multi period or sequential decision process are
solved using this type of programming.
4. Because of its wide range, it is applicable to linear or non-linear problems, discrete or
continuous variables, deterministic or stochastic problems.
5. The mathematical techniques used can be adapted to the computer.
6. Better and more successful decisions
LIMITATIONS OF L.P
1. Each problem has to be modelled according to its own constraints and
requirements. This requires great experience and ingenuity.
2. The number of state variables has to be kept low to prevent complicated
calculations.
3. It treats all relationships as linear, i.e. if the direct cost of producing 10 units is sh. 100, then for 20 units it is assumed to be sh. 200. This may not always be the case in practice.
4. All the parameters in the linear programming model are assumed to be known
with certainty which is not possible in real situation.
a) Graphical methods
b) Simplex method
Whichever the method to be adopted, the first step is to formulate the linear
programming problems using the following steps:
Example 1:
A manufacturer has two products P1 and P2 both of which are produced in two steps by
machines M1 and M2. The process times per hundred for the products on the machines
are:
Product | Machine M1 | Machine M2 | Profit per hundred
P1 | 4 | 5 | 10
P2 | 5 | 2 | 5
(The available machine hours, 100 on M1 and 80 on M2, give the constraints used in the solution.)
The manufacturer is in a market upswing and can sell as much as he can produce of
both the products. Formulate the mathematical model and determine the optimal
product mix.
Solutions:
[Graph: the constraints plotted on the x1–x2 plane, with the feasible region bounded by the corner points A(0,0), B(25,0), C(10,12) and D(0,16)]
Considering the points of intersection, their coordinates and testing them using the objective function, the optimum occurs at B(25, 0):
Product P1 = 25
Product P2 = 0
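The corner-point evaluation can be sketched in a few lines, taking the corner points as printed in the notes:

```python
# Evaluate Z = 10*x1 + 5*x2 at each corner point of the feasible region
# (corner coordinates as listed in the notes).
corners = {"A": (0, 0), "B": (25, 0), "C": (10, 12), "D": (0, 16)}
z = {name: 10 * x1 + 5 * x2 for name, (x1, x2) in corners.items()}
best = max(z, key=z.get)       # corner with the largest objective value
print(z, best)
```

Among the listed corners, B(25, 0) gives the largest value Z = 250, matching the product mix stated above.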
Z = 10x1 + 5x2
Identify the biggest number in the Z row (10). This gives the column of interest.
Divide the quantity (solution) values by the corresponding elements in the identified column:
100/4 = 25
80/5 = 16
The smallest of the answers obtained is 16, which identifies the row of interest.
The point where the identified column and the identified row meet gives the pivot element (5).
Step 5: Make the pivot element 1 (by dividing the row containing the pivot element by the value of the pivot element) and give the identified row a new identity (the identity of the identified column). Then redraw the simplex tableau with the updated rows.
Old row S2: 5 2 0 1 80
New row X1 (S2 ÷ 5): 1 0.4 0 0.2 16
X1 × 4: 4 1.6 0 0.8 64
New row S1 (old S1 − X1 × 4): 0 3.4 1 −0.8 36
Old row Z: 10 5 0 0 0
X1 × 10: 10 4 0 2 160
New row Z (old Z − X1 × 10): 0 1 0 −2 −160
Since not all the elements in the Z row are negative or zero, the optimal solution has not been reached. Go to step 8.
a) Pivot element:
Column identified: X2
Ratios: 36/3.4 = 10.6 and 16/0.4 = 40
The smaller ratio (10.6) identifies the S1 row, so the pivot element is 3.4.
New row X2 (S1 row ÷ 3.4): 0 1 0.29 −0.24 10.6
Updating the remaining rows gives the optimal solution:
Product P1 = 16 − 0.4 × 10.6 = 11.8
Product P2 = 10.6
DUALITY
Every linear program has an opposite program called the dual program. The initially formulated program is called the primal program. The relationship between the primal and dual programs is that the optimal objective value is the same, and the solution of one can be deduced from the other.
Procedure for determining dual program from primal is:
a) Maximum primal implies minimum dual and vice versa
b) Less or equal to (≤) primal implies greater or equal to (≥) dual and vice versa.
c) The number of variables in the dual program equals the number of constraints in the primal and vice versa.
d) The right hand side of dual constraints inequalities are objective co-efficient in primal
program and vice versa.
e) Constraint coefficients in the dual program are the transpose of the matrix of constraint
co-efficient in the primal.
f) Non-negativity conditions do not change.
Example 1:
Given primal program:
Max, Z = 4 x1 + 2x2 +5x3
Subject to: x1 + 2x2 - x3 ≤ 20 …………………….y1
4 x1 + 8x2 +11x3 ≤ 28 ……………….y2
6 x1 + x2 + 8x3 ≤ 32 ………………....y3
And x1, x2, x3 ≥ 0
Required:
Obtain the dual program
Solution:
Constraints coefficient matrix
1 2 -1
4 8 11
6 1 8
Transposing the above matrix:
1 4 6
2 8 1
-1 11 8
Dual program;
Min, Z = 20y1 + 28y2 + 32y3
Subject to: y1 + 4y2 - 6y3 ≥ 4
2y1 + 8y2 +y3 ≥ 2
-1y1 + 11y2 + 8y3≥5
And y1, y2, y3 ≥ 0
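The mechanical part of the procedure, transposing the constraint matrix and swapping the objective coefficients with the right-hand sides, can be sketched as a small helper (the function name is illustrative):

```python
# Build the dual of a "maximize, all <=" primal: transpose the constraint
# matrix and swap the objective coefficients with the right-hand sides.
def dual_of_max_primal(c, A, b):
    """Return (objective, constraint matrix, rhs) of the minimizing dual."""
    A_T = [list(col) for col in zip(*A)]   # transpose of constraint matrix
    return b, A_T, c                       # dual objective = primal RHS

# Example 1's primal: Max Z = 4x1 + 2x2 + 5x3
c = [4, 2, 5]
A = [[1, 2, -1], [4, 8, 11], [6, 1, 8]]
b = [20, 28, 32]
dual_c, dual_A, dual_b = dual_of_max_primal(c, A, b)
print(dual_c, dual_A, dual_b)
```

The output reproduces the dual above: minimize 20y1 + 28y2 + 32y3 subject to the transposed constraints with right-hand sides 4, 2, 5.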
Example 2:
Given primal program:
Min, Z = 5x1 + 8x2
Subject to: 2x1 + 3x2 ≥ 5 …………………….y1
4 x1 + 10x2 ≥ 19… ……………….y2
x1 + 12x2 ≥ 24… ……………….y3
And x1, x2 ≥ 0
Required:
Obtain the dual program
Dual program:
Max, Z = 5 y1 + 19y2 + 24y3
Subject to: 2y1 + 4y2 + y3 ≤ 5
3y1 + 10y2 + 12y3 ≤ 8
And y1, y2, y3 ≥ 0
NOTE:
The solution to the dual can be deduced from the solution to the primal using the simplex method. The procedure involves associating the values in the Z-row of the optimal primal tableau with the dual variables, where the first slack variable is associated with the first dual variable, the second slack variable with the second dual variable, and so on.
Example 3:
Suppose you have primal program as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4
x1 + 2x2 ≤ 5
And x1, x2 ≥ 0
After performing all steps involved in simplex method, the optimal (last) tableau is:
SENSITIVITY ANALYSIS
This involves determining the effect that various changes to the primal program would have on the current solution to the program. It is also called post-optimality analysis.
The various changes that can occur in linear programming problem include:
a) Changes in the coefficient of the objective program.
b) Changes in the availability of resources or the right hand side of the inequalities.
c) Changes in the coefficient of the constraints.
d) Addition of new constraints.
Example 4:
Suppose we have a formulated linear program model as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4 ……………………………R1
x1 + 2x2 ≤ 5 ……………………………R2
And x1, x2 ≥ 0
Also suppose we are given the optimal solution (after solving using the simplex or graphical method) as: x1 = 1, x2 = 2 and Z = 8.
a) Suppose the 1st constraint (R1) increases by 20% and the 2nd constraint (R2) increases by 10%. Perform the sensitivity analysis to find the new solution and check whether it is a feasible solution.
Solution:
The new solution is given as:
New basic variable values = (inverse of the constraint coefficient matrix) × (new right-hand side)
But the inverse of a matrix = (1/determinant) × adjoint
The matrix of the coefficients of the constraints for the problem above is
2 1
1 2
Determinant = (2)(2) − (1)(1) = 4 − 1 = 3
Adjoint = transpose of the cofactor matrix
Cofactor matrix:
2 -1
-1 2
Transposing the cofactor matrix gives the adjoint:
2 -1
-1 2
Inverse = (1/3) ×
2 -1
-1 2
New right hand side:
New R1 = 4 + (20/100 × 4) = 4.8
New R2 = 5 + (10/100 × 5) = 5.5
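The remaining step, multiplying the inverse by the new right-hand side, can be finished numerically:

```python
# New basic solution x_new = A_inv * b_new, where
# A_inv = (1/3) * [[2, -1], [-1, 2]] from the computation above.
inv = [[2/3, -1/3], [-1/3, 2/3]]
b_new = [4.8, 5.5]
x1 = inv[0][0] * b_new[0] + inv[0][1] * b_new[1]
x2 = inv[1][0] * b_new[0] + inv[1][1] * b_new[1]
feasible = x1 >= 0 and x2 >= 0      # both variables must stay non-negative
z_new = 2 * x1 + 3 * x2             # objective of the original program
print(x1, x2, feasible, z_new)
```

Both x1 ≈ 1.37 and x2 ≈ 2.07 are non-negative, so the new solution remains feasible, with a new objective value of about 8.93.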
REVISION EXERCISE:
1) Using the information given in Example 4 above, determine the new optimal solution by
performing the sensitivity analysis when:
a) R1 increases by 10% and R2 decreases by 20%.
b) R1 remain 4 and R2 increases by 30%
c) R1 reduces by 2 units and R2 increase by 3 units.
Here, the base period is fixed and prices of subsequent years are expressed as relatives
of the prices of the base year. A price relative is price of an item in one year relative to
another year i.e.
P1/P0 ×100
Where; P1 = price of current year
P0 = price of base year
Example:
From the following data, compute price index number by taking 2002 as base year.
Year 2002 2003 2004 2005 2006 2007
Price of sugar/Kg 8 10 12.5 18 22 25
Solution
Year | Price of sugar/Kg | Price index (P1/P0 × 100)
2002 | 8 | 8/8 × 100 = 100
2003 | 10 | 10/8 × 100 = 125
2004 | 12.5 | 12.5/8 × 100 = 156.25
2005 | 18 | 18/8 × 100 = 225
2006 | 22 | 22/8 × 100 = 275
2007 | 25 | 25/8 × 100 = 312.5
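The fixed-base relatives above can be generated in one line each from P1/P0 × 100:

```python
# Fixed-base price relatives with 2002 (price 8) as the base year.
prices = {2002: 8, 2003: 10, 2004: 12.5, 2005: 18, 2006: 22, 2007: 25}
base = prices[2002]
index = {year: p / base * 100 for year, p in prices.items()}
print(index)
```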
In this method, the base is not fixed and it changes from year to year. The price of the
previous period is taken as the base period. This method shows whether the rate of
change is rising, falling or constant as well as the extent of change from year to year.
Price index number = (price of the current year)/ (price of previous year) × 100
Example;
Construct the chain base index numbers from the following data.
Year 2002 2003 2004 2005 2006 2007
Price 120 125 140 150 135 160
(Shs)
Solution
Year    Prices (Shs)    Chain base index number
2002    120             -
2003    125             125/120 × 100 = 104.17
2004    140             140/125 × 100 = 112.00
2005    150             150/140 × 100 = 107.14
2006    135             135/150 × 100 = 90.00
2007    160             160/135 × 100 = 118.52

Weighted index numbers:
If all the commodities selected do not have equal importance for consumers, a weighted
system is adopted and appropriate weights are assigned to the different commodities. An
index is called a weighted aggregate index when it is constructed for an aggregate of items
(prices) that have been weighted in some way (by corresponding quantities produced,
consumed or sold), so as to reflect their importance.
The important formulae for constructing weighted index numbers include:
i) Laspeyres Method (L) - The base year quantities are taken as weights. The method tries
to answer the question "what is the change in aggregate value of the base period list of
goods when valued at given period prices?"
P01 = (∑P1q0 / ∑P0q0) × 100
N.B. In the Laspeyres index the weights (q0) are the base year quantities and do not change
from one year to the next, unlike the Paasche index which requires continuous use of new
quantity weights for each period considered.
ii) Paasche Method (P) - The current year quantities are taken as weights.
P01 = (∑P1q1 / ∑P0q1) × 100
iii) Fisher's Ideal Method - Taken as the geometric mean of the Laspeyres and Paasche
indices.
P01 = √(L × P)
iv) Marshall-Edgeworth Method - The current year as well as base year prices and
quantities are considered.
P01 = (∑P1q0 + ∑P1q1) / (∑P0q0 + ∑P0q1) × 100
    = (5580 + 7730) / (2810 + 3815) × 100
    = 13310/6625 × 100 = 200.9
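The four weighted formulas can be sketched in code. The (p0, q0, p1, q1) rows below are hypothetical figures for illustration only, not data from the notes:

```python
def laspeyres(rows):
    """Laspeyres index: base year quantities (q0) as weights."""
    return sum(p1 * q0 for p0, q0, p1, q1 in rows) / \
           sum(p0 * q0 for p0, q0, p1, q1 in rows) * 100

def paasche(rows):
    """Paasche index: current year quantities (q1) as weights."""
    return sum(p1 * q1 for p0, q0, p1, q1 in rows) / \
           sum(p0 * q1 for p0, q0, p1, q1 in rows) * 100

def fisher(rows):
    """Fisher's ideal index: geometric mean of Laspeyres and Paasche."""
    return (laspeyres(rows) * paasche(rows)) ** 0.5

def marshall_edgeworth(rows):
    """Marshall-Edgeworth index: base and current quantities combined."""
    num = sum(p1 * (q0 + q1) for p0, q0, p1, q1 in rows)
    den = sum(p0 * (q0 + q1) for p0, q0, p1, q1 in rows)
    return num / den * 100

# (p0, q0, p1, q1) for four commodities -- made-up figures
data = [(5, 10, 7, 9), (8, 6, 9, 7), (3, 20, 4, 18), (10, 4, 12, 5)]
print(round(laspeyres(data), 2), round(paasche(data), 2),
      round(fisher(data), 2), round(marshall_edgeworth(data), 2))
```

By construction Fisher's index always lies between the Laspeyres and Paasche values.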
REVISION QUESTIONS:
1) Explain uses and limitations of index numbers
2) Given below is a table of four commodities with the corresponding prices and quantities
over the years (2012 and 2013)
TIME
Bread 5 5 7 6.5
Calculate:
a) Laspeyre’s price index
b) Paasche price index
c) Fishers price index
DECISION THEORY
Decision making is at the core of businesses and the lives of each person. Some
decisions are major and not made often, while others are minor and made often. Success
in business or in life depends on the decisions made; therefore, what is involved in
good decision making is crucial. Decision theory is an analytical and systematic
approach to the study of decision making.
It’s important to distinguish between a good decision and a bad decision. A good
decision:
Is based on logic
Is made after considering all available data and alternatives
Applies appropriate quantitative techniques
A bad decision misses at least one of these components.
Even though a good decision occasionally does not result in a favourable outcome, it is
still a good decision, because applied in the long term it results in successful outcomes. A
bad decision may sometimes, by luck, result in a favourable outcome, but it is nonetheless
still a bad decision.
There are six steps involved in taking any decision, irrespective of how major or minor
it is, such as taking a trip to town or investing two million shillings.
a) Clearly define the problem at hand (for example, whether or not to produce a new
product x).
b) List the possible alternatives (strategies or courses of action) which the decision maker
can choose from. For example, product x can be produced in a large plant, a small
plant or some other alternative. Not producing at all, that is doing nothing, is an
important alternative. All important alternatives must be considered.
c) Identify the possible outcomes. Outcomes that the decision maker has no control
over are termed states of nature. Since the product is for sale, the possible
outcomes are the kinds of demand for the product that will exist in the market: the
product might have high demand or it might have low demand. The full range of
outcomes, both pessimistic and optimistic, has to be considered.
d) List the payoffs or profits of each combination of alternatives and outcomes. Clearly
not all decisions can be evaluated on the basis of profit, but a way to measure the
benefits from different alternatives and outcomes has to be found. Such payoffs are
termed conditional values. The payoffs are most easily compared when presented in a
payoff matrix, also termed a payoff table or decision table (see Table 1).
e) Select one of the mathematical decision theory models
f) Apply model to make the decision.
Table1: pay off table (matrix) showing conditional values for a manufacturer
                           State of nature
Strategy or alternative    Favourable market    Unfavourable market
Construct large plant      200,000              -180,000
Construct small plant      100,000              -20,000
Do nothing                 0                    0
Decision Making Environment for managers:
Managers make decision in environments which can be grouped into four states:
Certainty
Risk
Uncertainty
Conflict / Game theory
Both decision theory and game theory have the objective of assisting the decision maker
by providing a structure to enable the evaluation of information of the relative
likelihood of different outcomes so that the best course of action can be identified.
a) Environment of Certainty
Certainty exists if all the information required to make a decision is known and available.
This is a case of perfect information. Assuming certainty for a problem where not all the
information is known with certainty often provides a reasonable approximation of the
optimal solution. Here it is known for sure which state of nature will occur, and the
models used to recommend the best course of action are deterministic models.
b) Environment of Risk
A condition of risk exists if perfect information is not available but the probabilities of
certain outcomes can be estimated. Therefore, decision making under risk relies heavily
on probability theory. Various stochastic methods, such as queuing theory, have been
developed for decision making under conditions of risk. In a risk situation the different
outcomes available to the decision maker have known probabilities which can be
expressed in a probability distribution or function.
The expected monetary value (EMV) method is the most popular method of decision
making under risk. EMV is the weighted sum of the possible payoffs for each alternative.
In this environment it is not known exactly which state of nature will occur; however,
there is sufficient information to estimate the chances of occurrence of the various states
of nature. The models used to recommend the best course of action are probabilistic
(stochastic) models.
These include:
i) Maximise expected monetary value
ii) Minimise expected opportunity loss
In either case use the formula: Expected value = Σ (payoff × corresponding probability)
E(X) = Σ X P(X)
Example:
James M is a manager who is contemplating putting up a plant, which could be large or
small. The market demand is likely to be either favourable or unfavourable. If James
constructs a large plant and the market is favourable he is likely to get a profit of 200,000,
but if the market demand is unfavourable he makes a loss of 180,000. If he constructs a
small plant, under a favourable market he gets a profit of 100,000 but under an
unfavourable market he makes a loss of 20,000. Further, James believes the favourable
and unfavourable markets are equally likely. Represent the above information in a
decision table and advise the management on what plant to put up, based on expected
monetary value and expected opportunity loss.
Solution:
Decision table:
                           State of nature
Strategy or alternative    Favourable market (0.5)    Unfavourable market (0.5)
Construct large plant      200,000                    -180,000
Construct small plant      100,000                    -20,000
No plant                   0                          0
Maximise expected monetary value:
Large plant: 200,000 (0.5) + -180,000 (0.5) = 100,000 – 90,000 = 10,000
Small plant: 100,000 (0.5) + -20,000 (0.5) = 50,000 – 10,000 = 40,000
No plant: 0 (0.5) + 0 (0.5) = 0
The decision is to put up the small plant, as it maximises the expected monetary value.
Opportunity loss:
This is the amount one would lose by not taking the best alternative; it is also called the
amount of regret. To obtain the regret table, for each state of nature we take the
difference between the payoff of the best alternative for that state and the payoff of each
alternative, i.e.
Opportunity loss table/ regret table:
Options        Favourable market              Unfavourable market
Large plant    200,000 - 200,000 = 0          0 - (-180,000) = 180,000
Small plant    200,000 - 100,000 = 100,000    0 - (-20,000) = 20,000
No plant       200,000 - 0 = 200,000          0 - 0 = 0
Expected opportunity loss;
Large plant: 0 (0.5) + 180,000 (0.5) = 90,000
Small plant: 100,000 (0.5) + 20,000 (0.5) = 60,000
No plant: 200,000 (0.5) + 0 (0.5) = 100,000
The decision is to put up the small plant, as it minimises the expected opportunity loss.
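James M's calculation can be sketched in a few lines; the payoffs and the 0.5/0.5 probabilities are taken from the example above:

```python
probs = [0.5, 0.5]                       # favourable, unfavourable
payoffs = {
    "large plant": [200_000, -180_000],
    "small plant": [100_000, -20_000],
    "no plant":    [0, 0],
}

def emv(pay):
    """Expected monetary value: sum of payoff x probability."""
    return sum(x * p for x, p in zip(pay, probs))

def regret_table(payoffs):
    """Regret = best payoff in each state of nature minus actual payoff."""
    best = [max(col) for col in zip(*payoffs.values())]
    return {alt: [b - x for b, x in zip(best, pay)]
            for alt, pay in payoffs.items()}

best_emv = max(payoffs, key=lambda a: emv(payoffs[a]))
regrets = regret_table(payoffs)
best_eol = min(regrets, key=lambda a: emv(regrets[a]))
print(best_emv, emv(payoffs[best_emv]))   # small plant 40000.0
print(best_eol)                           # small plant
```

Both criteria agree here: maximising EMV and minimising expected opportunity loss always select the same alternative.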
c) Environment of Uncertainty
This refers to situations where more than one outcome can result from any single
decision. Several methods are used to make decisions in circumstances where only the
payoffs are known and the likelihood of each state of nature is not known.
a) Maximin Method
This criterion is based on the 'conservative approach' of assuming that the worst possible
outcome is going to happen. The decision maker considers each strategy, locates the
minimum payoff for each, and then selects the alternative which maximizes the minimum
payoff.
Illustration
Rank the products A, B and C by applying the maximin rule, using the following payoff
table showing the potential profits and losses expected to arise from launching the three
products under three market conditions.
Table 1
Ranking by the MAXIMIN rule = BAC
b) MAXIMAX method
This method is based on 'extreme optimism': the decision maker selects the strategy
which corresponds to the maximum of the maximum payoffs for each strategy.
Illustration
Using the above example
Max. profits (row maxima):
Product A    +8
Product B    +12
Product C    +16
Illustration
Regret table in £ 000’s
Boom Steady state Recession Mini regret row
condition maxima
Product A 8 5 22 22
Product B 18 0 0 18
Product C 0 6 38 38
A regret table (table 2) is constructed based on the payoff table. The regret is the
‘opportunity loss’ from taking one decision given that a certain contingency occurs in
our example whether there is boom steady state or recession
The ranking using MINIMAX regret method = BAC
d) The Expected Monetary Value Method
The expected payoff (profit) associated with a given combination of act and event is
obtained by multiplying the payoff for that act and event combination by the probability
of occurrence of the given event. The expected monetary value (EMV) of an act is the
sum of all expected conditional profits associated with that act.
Example
A manager has a choice between
i. A risky contract promising shs 7 million with probability 0.6 and shs 4 million
with probability 0.4 and
ii. A diversified portfolio consisting of two contracts with independent outcomes
each promising Shs 3.5 million with probability 0.6 and shs 2 million with
probability 0.4
Can you arrive at the decision using EMV method?
Solution
The conditional payoff table for the problem may be constructed as below
(shillings in millions):

Event    Probability    Conditional payoff        Expected payoff
                        Contract    Portfolio     Contract    Portfolio
E1       0.6            7           3.5           4.2         2.1
E2       0.4            4           2             1.6         0.8
EMV                                               5.8         2.9
Using the EMV method, the manager should go in for the risky contract, which will yield
him a higher expected monetary value of shs 5.8 million.
                          Economic condition
Investment opportunity    1 (£)     2 (£)     3 (£)
A                         5000      7000      3000
B                         -2000     10000     6000
C                         4000      4000      4000
Solution
                          Economic condition
Investment opportunity    1 (£)     2 (£)     3 (£)     Minimum (£)    Maximum (£)
A                         5000      7000      3000      3000           7000
B                         -2000     10000     6000      -2000          10000
C                         4000      4000      4000      4000           4000
i. Using the Maximin rule Highest minimum = £ 4000
Choose investment C
ii. Using the Maximax rule Highest maximum = £ 10000
Choose investment B
iii. Using the Minimax regret rule

        1       2       3       Maximum regret
A       0       3000    3000    3000
B       7000    0       0       7000
C       1000    6000    2000    6000

Lowest maximum regret = £3000, so choose investment A.
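The three uncertainty criteria can be sketched as small functions and applied to the investment table above:

```python
payoffs = {
    "A": [5000, 7000, 3000],
    "B": [-2000, 10000, 6000],
    "C": [4000, 4000, 4000],
}

def maximin(p):
    """Pick the alternative with the best worst-case payoff."""
    return max(p, key=lambda a: min(p[a]))

def maximax(p):
    """Pick the alternative with the best best-case payoff."""
    return max(p, key=lambda a: max(p[a]))

def minimax_regret(p):
    """Pick the alternative whose maximum regret is smallest."""
    best = [max(col) for col in zip(*p.values())]       # best per condition
    regret = {a: max(b - x for b, x in zip(best, row))
              for a, row in p.items()}
    return min(regret, key=regret.get)

print(maximin(payoffs), maximax(payoffs), minimax_regret(payoffs))   # C B A
```

The three rules can (and here do) recommend three different investments, which is why the choice of criterion reflects the decision maker's attitude to risk.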
NOTE: When a competitive situation meets all the criteria above we call it a game. Game
theory can be applied in only a few real-life competitive situations, because all the rules
are difficult to satisfy at the same time in a given situation.
DEFINITION OF TERMS:
Game: It is an activity between two or more persons involving actions by each one of
them according to a set of rules which results in some gain for each. If in a game the
actions are determined by skills, it is called game of strategy but if they are determined
by chance it is termed as a game of chance.
Strategy: It is the total pattern of choices employed by any player; a complete plan of
action specifying precisely what the player will do under every possible future
contingency that might occur during the play of the game. The two types of strategies are:
a) Pure strategy - a situation where each player in the game adopts a single strategy
as an optimal strategy. Here the value of the game is the same for both players.
b) Mixed strategy - a player adopts a mixture of strategies if the game is played
many times. In this case each player uses a combination of strategies, and each
player keeps guessing which course of action the other player will select on a
particular occasion. Thus there is a probabilistic situation, and the objective of
each player is to maximize expected gains or to minimize expected losses.
Example
Two players X and Y have two alternatives each. They show their choices by pressing
two types of buttons in front of them but they cannot see the opponents move. It is
assumed that both players have equal intelligence and both intend to win the game.
This sort of simple game can be illustrated in tabular form as follows:
            Player Y
Player X    Button r           Button t
Button m    X wins 2 points    X wins 3 points
Button n    Y wins 2 points    X wins 1 point

The game is biased against Y, because if player X presses button 'm' he will always win.
Hence Y will be forced to press button 'r' to cut down his losses.
Alternative example
            Player Y
Player X    Button r           Button t
Button m    X wins 3 points    Y wins 4 points
Button n    Y wins 2 points    X wins 1 point

In this case X will not be able to press button 'm' (or button 'n') all the time in order to win.
Similarly, Y will not be able to press button 'r' or button 't' all the time. In such a situation
each player will exercise his choice for part of the time, based on probability.
Here 3, -4, -2 and 1 are the known payoffs (to X), and the game has been represented in
the form of a matrix. When games are expressed in this fashion the resulting matrix is
commonly known as the PAYOFF MATRIX.
STRATEGY:
This refers to the total pattern of choices employed by any player. A strategy may be pure
or mixed.
In a pure strategy, player X will play one row all of the time and player Y will play
one of the columns all of the time.
In a mixed strategy, player X will play each of his rows a certain portion of the time
and player Y will play each of his columns a certain portion of the time.
In this game X cannot win, so he should adopt the first row strategy in order to minimize
his losses.
This decision rule is known as the 'maximin strategy', i.e. X chooses the highest of these
minimum payoffs.
Thus player Y will make the best of the situation by playing his 2nd column, which is a
'minimax strategy'.
This game is also a game of pure strategy and the value of the game is -1 (a win of 1 point
per game to Y). Using matrix notation, the solution is shown below:

                        Player Y
                                            Row minimum
              3    -1     4     2           -1
Player X     -1    -3    -7     0           -7
              4    -7     3    -9           -9

Column maximum:  4   -1    4    2

The maximum of the row minima (-1) equals the minimum of the column maxima (-1),
so the saddle point is at row 1, column 2.
Saddle point also gives the value of such a game. In a game having a saddle point, the
optimum strategy for both players is to play the row or column containing the saddle
point.
Note: if in a game there is no saddle point the players will resort to what is known as
mixed strategies.
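The saddle point test (largest row minimum versus smallest column maximum) can be sketched as code, using the 3×4 payoff matrix from the example above:

```python
def saddle_value(matrix):
    """Return the game value if a saddle point exists, else None."""
    maximin = max(min(row) for row in matrix)           # X's guaranteed floor
    minimax = min(max(col) for col in zip(*matrix))     # Y's guaranteed ceiling
    return maximin if maximin == minimax else None

game = [[3, -1, 4, 2],
        [-1, -3, -7, 0],
        [4, -7, 3, -9]]
print(saddle_value(game))            # -1: Y wins 1 point per play

no_saddle = [[1, 4], [5, 3]]
print(saddle_value(no_saddle))       # None: mixed strategies are needed
```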
b) Mixed Strategies
Example
Find the optimum strategies and the value of the game from the following pay off matrix
concerning two person game
Player Y
1 4
Player X
5 3
In this game there is no saddle point.
Let Q be the proportion of time player X spends playing his 1st row, and 1-Q the
proportion of time he spends playing his 2nd row.
Similarly, let R be the proportion of time player Y spends playing his 1st column, and 1-R
the proportion of time he spends playing his 2nd column.
The following matrix shows this strategy
                  Player Y
                  R       1-R
Player X    Q     1       4
            1-Q   5       3
X’s strategy
X would like to divide his play between his rows in such a way that his expected winnings
(or losses) when Y plays the 1st column equal his expected winnings (or losses) when Y
plays the 2nd column.
Column 1
Points    Proportion played    Expected winnings
1         Q                    Q
5         1-Q                  5(1-Q)

Column 2
Points    Proportion played    Expected winnings
4         Q                    4Q
3         1-Q                  3(1-Q)

Equating the two expected winnings: Q + 5(1-Q) = 4Q + 3(1-Q), i.e. 5 - 4Q = 3 + Q, so
Q = 2/5. Thus X plays his 1st row 2/5 of the time and his 2nd row 3/5 of the time. A
similar calculation for Y gives R = 1/5.
A shortcut (the method of oddments) gives the same result:
Step I
Subtract the two numbers in each row and write the difference (ignoring the sign) against
the row; do the same for each column.
Step II
Interchange each of these pairs of subtracted numbers found in Step I:

1   4  |  2
5   3  |  3
    1     4

Thus player X plays his two rows in the ratio 2:3, and player Y plays his columns in the
ratio 1:4. This is the same result as calculated before.
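The oddments shortcut can be written as a small function; a sketch assuming a 2×2 game with no saddle point, using exact fractions:

```python
from fractions import Fraction

def mixed_2x2(m):
    """Return (row oddments, column oddments, game value) for a 2x2 game."""
    (a, b), (c, d) = m
    row_odd = (abs(c - d), abs(a - b))   # row differences, interchanged
    col_odd = (abs(b - d), abs(a - c))   # column differences, interchanged
    q = Fraction(row_odd[0], row_odd[0] + row_odd[1])   # share of row 1
    r = Fraction(col_odd[0], col_odd[0] + col_odd[1])   # share of column 1
    value = (a * q * r + b * q * (1 - r)
             + c * (1 - q) * r + d * (1 - q) * (1 - r))
    return row_odd, col_odd, value

rows, cols, value = mixed_2x2([[1, 4], [5, 3]])
print(rows, cols, value)   # (2, 3) (1, 4) 17/5
```

For the game above this gives rows in the ratio 2:3, columns in the ratio 1:4, and a game value of 17/5 = 3.4 to X.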
DOMINANCE
The idea of a dominated strategy is useful for reducing the size of the payoff table.
Rules of dominance:
i. If all the elements in a column are greater than or equal to the corresponding
elements in another column, then that column is dominated.
ii. Similarly, if all the elements in a row are less than or equal to the corresponding
elements in another row, then that row is dominated.
Dominated rows and columns may be deleted, which reduces the size of the game to a
2 by 2 game.
N.B. Always look for dominance and saddle points first when solving a game problem.
Example:
Determine the optimum strategies and the value of the game from the following 2 x m pay
off matrix game for X and Y
Y
6 3 1 0 3
X
3 2 4 2 1
Here columns I, II and III are each dominated by column IV (every element is greater
than or equal to the corresponding element of column IV), hence Y will not play these
columns.
So the game is reduced to a 2×2 matrix, and it can be solved using the methods already
discussed.
Y
0   3
X
2   1
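The dominance reduction can be automated; this sketch repeatedly deletes dominated rows and columns (using the two rules above) and serves as a useful check on reductions done by hand:

```python
def reduce_by_dominance(matrix):
    """Delete dominated rows/columns until none remain; return the rest."""
    rows = list(matrix)
    changed = True
    while changed:
        changed = False
        # delete a dominated row: every element <= another row (bad for X)
        for i in range(len(rows)):
            if any(j != i and all(x <= y for x, y in zip(rows[i], rows[j]))
                   for j in range(len(rows))):
                del rows[i]
                changed = True
                break
        # delete a dominated column: every element >= another column (bad for Y)
        cols = list(zip(*rows))
        for i in range(len(cols)):
            if any(j != i and all(x >= y for x, y in zip(cols[i], cols[j]))
                   for j in range(len(cols))):
                del cols[i]
                rows = [list(t) for t in zip(*cols)]
                changed = True
                break
    return rows

game = [[6, 3, 1, 0, 3],
        [3, 2, 4, 2, 1]]
print(reduce_by_dominance(game))   # [[0, 3], [2, 1]]
```

Running it on the 2×5 game above deletes three columns and leaves a 2×2 game, which can then be solved by the saddle point test or the oddments method.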