Business Statistics Material
Chapter One
Introduction to Business Statistics
1.1 Introduction
Nowadays most executives and other decision makers base their decisions on research findings. Most
research, in whatever area of study, requires data in order to generate information that supports the
decision-making process. Data are the raw material of research. Moreover, the quality of the collected data
largely determines the precision of the results obtained from an investigation. Therefore, it is
extremely important to know the basics of data collection.
We encounter the term statistics frequently in everyday language. It has two meanings. In the more
common usage, statistics refers to numerical information. Examples include the average salary of instructors at
Oda Bultum University, the average number of cars sold per week, the percentage of students attending the
Cooperatives department, the number of car accidents that occurred over the last five years, and the number of deaths
due to HIV/AIDS in Chiro town last year. In these examples a statistic is a number or a percentage.
Other examples: the mean time spent waiting for service at Dashen Bank is 10 minutes; a typical
Toyota automobile in Ethiopia travels 15,000 kilometers per year.
Each of the above is an example of a statistic. A collection of more than one such figure is called statistics (plural).
Statistics can appear in graphic form as well as in sentence form. The subject of statistics has a much broader
meaning than just collecting and publishing numerical information.
For numerical facts to be called Statistics they should be comparable either period wise or region wise, or in
reference to some other means of comparison.
As an example, suppose that the marketing head of a supermarket in Hawassa wants to know the average
expenditure of households in the city, among other things, so as to revise his marketing strategy. To achieve his
objective, the head collects expenditure data from a sample of 1000 households selected using stratification.
Moreover, the head uses the interview approach to gather the required information. Thus, the data collected by
the marketing head are Statistics, as they fulfill all the requirements of the definition.
Limitations of statistics
The fact that Statistics is applicable in almost all fields of study does not make it perfect. Of course,
no science is perfect. Statistical methods, too, have their own limitations. The following are
the major limitations:
i. Statistics does not deal with individual items
This means that Statistics deals only with aggregates of facts, and no importance is attached to individual
items. For instance, the age of a single student in a given class in a given year is not Statistical data. In contrast,
the ages of all students in a given class in a given year form an aggregate and hence can be considered
data. Likewise, the semester GPAs of a single student over 4 semesters form Statistical data. In short,
Statistical methods are suited only to those problems or situations where group characteristics are desired to be
studied.
ii. Statistics deals only with quantitatively expressed items
Another limitation of Statistics is that it deals only with subjects of inquiry that can be
quantitatively measured and numerically expressed. Accordingly, qualitative characteristics such as health,
poverty, honesty and intelligence are not directly suitable for Statistical analysis. Problems involving such
qualitative variables are instead treated in Statistics indirectly. For example, the variable health may be studied through the
death rate, which is a quantitative variable. However, these are only indirect methods.
iv. Statistical results are not universally true
As is often said, Statistical results are true only on the average. That is, the results obtained from Statistical
data analysis are not necessarily true for each member or item within the data analyzed. Statistical
statements or conclusions are not generally true of individuals, but are applicable to the majority
of cases.
v. Statistics is liable to be misused
Misuses of Statistics, unfortunately, are probably as common as valid uses of Statistics. In reality, Statistical
methods can be properly used only by experienced or trained people, as it requires skill to draw sensible conclusions
from data. It is this limitation that hinders the mass popularity of such a useful and
applicable science.
1.5 Classification of Statistics
The study of statistics is usually divided into two categories: descriptive statistics and inferential statistics.
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
Descriptive statistics is a branch of statistics devoted to the accurate representation of a mass of data with graphs
and summary measures. It deals with collecting, summarizing and simplifying data to draw meaningful
conclusions. It does not attempt to use samples to predict the parameters of a population. It does not look beyond
the data at hand, but rather concentrates on how best to understand and present these data. Measures of central
tendency, dispersion, skewness and kurtosis are examples of descriptive statistics. The data can be presented
using tools such as graphs, tables, means, modes and medians.
Examples:
Calculating the average age of students at Oda Bultum University from graduating class students;
Recording the grades of second-year cooperatives students for the previous semester and then finding the
average of these grades;
Drawing graphs that show the differences among the brands of cars sold in the year 2014.
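The summary measures named above can be computed in a few lines. The following is a minimal sketch using Python's standard `statistics` module; the grade values are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical semester grades for five students (illustrative data only).
grades = [70, 85, 85, 90, 60]

mean_grade = statistics.mean(grades)      # arithmetic average (central tendency)
median_grade = statistics.median(grades)  # middle value of the sorted data
mode_grade = statistics.mode(grades)      # most frequent value
spread = statistics.stdev(grades)         # sample standard deviation (dispersion)

print(mean_grade, median_grade, mode_grade)  # 78 85 85
```

All four results describe only the data at hand; nothing is inferred about a larger population.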
Inferential Statistics
Another facet of statistics is inferential statistics, also called statistical inference or inductive statistics.
Our main concern in inferential statistics is finding out something about a population based on a sample
taken from that population. For example, based on a sample survey by the Ethiopian Reporter newspaper, only 60%
of young people in Ethiopia prefer to drink Coca-Cola. Since this is an inference about the population (all young
people in Ethiopia) based on sample data, we refer to it as inferential statistics.
If the Ethiopian Economic Association reports that the domestic product of Ethiopia this year is 120 million tons,
this is descriptive statistics. But if the association predicts that the domestic product will double after 10 years
based on the present information, this is inferential statistics.
Inferential Statistics: the methods used to find out something about a population, based on a sample.
Note the words “population” and “sample” in the definition of inferential statistics. We often make reference
to the population living in Ethiopia. However, in statistics the word population has a broader meaning. A
population may consist of individuals such as all the students enrolled at Hawassa University, all second year
cooperative students, or all the prisoners at Kaliti Prison. A population may also consist of a group of
measurements, such as all the heights of the cooperatives students in Awada campus. Thus, a population in the
statistical sense of the word does not necessarily refer to people.
Population can be finite (limited in its size) or infinite (unrestricted). In a finite population, observations are
countable, at least in theory. In contrast, an infinite population is indefinitely large: the observations cannot be
counted, even in theory. To infer something about a population, we usually take a sample from the population.
A Sample is a portion, or part, of the population of interest. Parameter: It is a measurable characteristic of the
population, or a numerical result obtained by measuring the population. Statistic: It is a measurable
characteristic of the sample. In short, it is a sample result.
a) Examples: color, beauty, sex, location. Qualitative variables are also called categorical variables. A
categorical variable is a variable with categorical data.
Hence we have two types of data: qualitative (categorical) and quantitative data.
Data that can be grouped by specific categories are referred to as categorical data. Categorical data use
either the nominal or ordinal scale of measurement.
Data that use numerical values to indicate how much or how many are referred to as quantitative data.
Quantitative data are obtained using either the interval or ratio scale of measurement.
Quantitative variables can be further classified as
Discrete variables, and
Continuous variables
a) Discrete variables are variables whose values are counts; all discrete variables have gaps in their
scale of measurement. Discrete variables take on only a finite or countable set of values. Typically, discrete variables
result from counting. Examples: number of students, family size (number of household members), number of
pages of a book, number of people visiting a bank on any day, and number of cars in a parking lot. The number of cars
sold on any day must be 0, 1, 2, …; it cannot be between 0 and 1 or between 1 and 2.
b) Continuous variables are variables that can have any value within an interval. Continuous variables
take on an infinite number of values. Typically, continuous variables result from measuring.
Examples: weight, length, volume, temperature and elevation.
Chapter Two
Sampling and Sampling Distributions
Statistics is a science of inference. It is the science of making general conclusions about an entire group (the
population) based on information obtained from a small group, or sample. In statistics we are interested in
obtaining information about a total collection of elements, which we refer to as the population. For instance,
the population might be all the residents of a given state, or all the television sets produced last year by a particular
manufacturer. In such cases, we try to learn about the population by choosing a subgroup of its elements. This
subgroup of the population is called a sample.
2.1. Sampling Theory
Sampling theory is the study of the relationships between a population and the samples drawn from it.
A sample is a part of the population from which it is selected.
The process of selecting a sample is known as sampling. Complete enumeration, popularly known
as a census, may not be feasible, either because of limited time or because of the high cost involved.
2.1.1. Basic Definitions
Sampling: - The selection of some part of an aggregate or totality, on the basis of which a
judgment or inference about the aggregate or totality is made.
Statistic: - A measurable characteristic of the sample. It is a sample result.
Parameter: - A measurable characteristic of the population. It is a
population result.
Sampling frame: - A listing of the items that make up the population, from which the sample is to be chosen.
Sampling design: - A definite plan for obtaining a sample from the sampling frame.
2.1.2. The need for samples
It is often not feasible to study the entire population. Some of the major reasons why sampling is necessary are
listed as follows;
A. The destructive nature of certain tests. Many experiments, especially in quality control, require
destroying the items tested. Consider the following tests:
Testing wine or coffee
Testing strength of light bulbs
Blood test for a patient
Without sampling, the wine taster would have to drink all the wine and every light bulb produced would have to be
destroyed, so nothing would remain for sale; likewise, all of the patient's blood would have to be
drawn, and the patient would die. Here a sample is a must.
B. The physical impossibility of checking all items in the population.
The populations of fish, birds and other wildlife are large and are constantly moving, being born and dying.
There is no mechanism to contact every item or individual member of the population.
C. The cost of studying all the items in a population is often prohibitive.
Public opinion polls and consumer testing organizations usually contact only a few families out of millions. Consider
a multinational corporation with 50 million customers worldwide. Suppose the company plans a market
survey and takes a sample of 2000 of the 50 million customers. If it costs 20 br. per customer to mail the survey and tabulate the
response, the sample of 2000 costs 40,000 br., whereas the same survey of all 50 million customers would cost about one billion br.
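The cost comparison can be checked with simple arithmetic. This sketch assumes the 20 br. figure is the cost per contacted customer, which is the reading that yields the one billion br. total stated in the text.

```python
# Figures from the example; the per-customer reading of "20 br." is an assumption.
COST_PER_RESPONSE_BR = 20
SAMPLE_SIZE = 2_000
POPULATION_SIZE = 50_000_000

sample_cost = COST_PER_RESPONSE_BR * SAMPLE_SIZE      # cost of surveying the sample
census_cost = COST_PER_RESPONSE_BR * POPULATION_SIZE  # cost of surveying everyone

print(sample_cost)   # 40000
print(census_cost)   # 1000000000
```

Under this assumption the census costs 25,000 times as much as the sample.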
There are $\binom{N}{n}$ (also written $^{N}C_{n}$) distinct possible samples in the case of sampling without replacement; the chance of selecting
each one of them is $1/\binom{N}{n}$.
There are $N^{n}$ possible samples in the case of sampling with replacement; the chance of selecting each one of
them is $1/N^{n}$.
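As a quick numerical illustration of these counts, the sketch below uses `math.comb` from Python's standard library. The function name `count_samples` and the small values N = 5, n = 2 are illustrative choices, not part of the original notes; the with-replacement count assumes ordered draws.

```python
import math

def count_samples(N, n):
    """Return (number of samples without replacement, number with replacement)."""
    without_replacement = math.comb(N, n)  # N choose n: unordered, no repeats
    with_replacement = N ** n              # ordered draws, repeats allowed
    return without_replacement, with_replacement

without, with_repl = count_samples(5, 2)
print(without, with_repl)            # 10 25
print(1 / without, 1 / with_repl)    # chance of any one sample: 0.1 and 0.04
```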
i) Lottery method
Example: If we want to take a sample of 25 persons out of a population of 150, the procedure is to write the
names of all the 150 persons on separate slips of papers, fold these slips, mix them thoroughly and then make a
blindfold selection of 25 slips without replacement.
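The lottery method corresponds to drawing without replacement from a list of names. A minimal sketch with Python's `random.sample` follows; the placeholder names and the `seed` argument (used only to make the draw repeatable) are assumptions for illustration.

```python
import random

def lottery_sample(names, k, seed=None):
    """Draw k distinct names, like pulling k folded slips from a well-mixed box."""
    rng = random.Random(seed)
    return rng.sample(names, k)  # sampling without replacement

# Placeholder names standing in for the 150 persons in the example.
population = [f"person_{i}" for i in range(1, 151)]
chosen = lottery_sample(population, 25, seed=1)
print(len(chosen), len(set(chosen)))  # 25 25 (no person is drawn twice)
```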
These numbers are very widely used in all the sampling techniques and have proved to be quite reliable as
regards accuracy and representativeness.
A systematic random sample should not be used if there is a predetermined pattern in the population, for example when values are listed
in ascending or descending order.
The procedure starts by determining the first element to be included in the sample: select a unit i
at random from the first k units of the frame, where k = N/n is the sampling interval. The second unit is the (i+k)th element of the
frame. In total, a sample of size n is drawn from the population of size N by taking the ith, (i+k)th, (i+2k)th, …, (i+(n-1)k)th
elements of the population.
Example:
Suppose that N = 20 and we want to select a sample of size 4, so that k = N/n =20/4 = 5.
The first element in the sample is selected at random from the first 5 units, say the 3rd, which is the random
start. Then every 5th unit is selected, and the sample contains the 3rd, 8th, 13th and 18th units of the
population.
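The selection rule in this example can be sketched in Python as follows. The function name `systematic_sample` is illustrative, and the sketch assumes N is an exact multiple of n so that the interval k = N/n is a whole number.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the 1-based positions chosen by systematic sampling.

    Assumes N is divisible by n, so the interval k = N // n is exact.
    """
    k = N // n
    if start is None:
        # Random start i chosen from the first k units.
        start = random.Random(seed).randint(1, k)
    return [start + j * k for j in range(n)]

print(systematic_sample(20, 4, start=3))  # [3, 8, 13, 18]
```

Passing `start=3` reproduces the example; omitting it picks the random start from units 1 to k.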
C. Stratified Random Sample
A population is first divided into subgroups called strata, and a sample is selected from each stratum. A stratified
sample can be
- proportional to the population, or
- non-proportional.
Stratified sampling has the advantage, in some cases, of more accurately reflecting the characteristics of the
population than simple random or systematic random sampling.
Proportionate stratified sample: The size of the sample selected from each subgroup is proportional to
the size of that subgroup in the entire population.
Disproportionate stratified sample: The size of the sample selected from each subgroup is
disproportional to the size of that subgroup in the population. Here, equal numbers of elements are
selected from each stratum regardless of how the stratum is represented in the population.
Example: Consider a study of the advertising expenditure of 352 large companies. Profitability percentage is used to
stratify this population, and a sample of 50 companies is to be selected.
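Proportional allocation for such a study might be sketched as below. The stratum sizes are hypothetical: the example does not list them, and they are chosen here only so that they total 352. The largest-remainder rounding step is likewise one possible way to make the allocations sum exactly to the sample size, not a method stated in these notes.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample of size n across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations sum to exactly n.
    """
    total = sum(stratum_sizes)
    # Whole-number part of each stratum's proportional share.
    allocation = [n * size // total for size in stratum_sizes]
    remainders = [n * size % total for size in stratum_sizes]
    leftover = n - sum(allocation)
    # Give the remaining units to the strata with the largest fractional parts.
    order = sorted(range(len(stratum_sizes)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        allocation[i] += 1
    return allocation

# Hypothetical stratum sizes (number of companies per profitability band).
hypothetical_strata = [8, 35, 189, 115, 5]
allocation = proportional_allocation(hypothetical_strata, 50)
print(allocation, sum(allocation))  # [1, 5, 27, 16, 1] 50
```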
Convenience sampling is a non-probability sampling technique. As the name implies, the sample is identified
primarily by convenience.
Elements are included in the sample without pre-specified or known probabilities of being selected.
B. Judgment Sampling
One additional non probability sampling technique is judgment sampling. In this approach, the person most
knowledgeable on the subject of the study selects elements of the population that he or she feels are most
representative of the population.
Hence, you get the opinions of preselected experts in the subject matter. Although the experts may be well
informed, you cannot generalize their results to the population.
C. Quota sampling
In this technique, quotas are set up according to given criteria, but the sample within each prescribed quota is selected
by the personal judgment of the investigator. It is suitable for market and public opinion surveys where stratification
is very difficult.
However, it may lack representativeness, as the interviewer may select samples that are convenient for him with regard
to location and sample unit.
It is a combination of the judgment and stratified sampling methods, so it enjoys the merits of both.
Generally, non probability samples can have certain advantages, such as convenience, speed, and low cost. However,
their lack of accuracy due to selection bias and the fact that the results cannot be used for statistical inference more
than offset these advantages.
Chapter Three
Hypothesis Testing
3.1. Basic concepts
Hypothesis is a statement about the value of a population parameter developed for the purpose of testing;
more generally, a hypothesis is an assertion or tentative solution.
Hypothesis testing is a procedure, based on sample evidence and probability theory, used to determine
whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and
should be rejected.
There are two types of hypothesis:
1) The null hypothesis: - is an assertion that a population parameter assumes a fixed value. It always
includes the equality sign, and is denoted by H0. The null hypothesis is often established in such a way
that it states "nothing is different" from what it is supposed to be, is claimed to be, or has been in the
past.
2) The alternative hypothesis: - describes what you will conclude if you reject the null hypothesis. It is
written as H1 and is read "H sub-one". It is also referred to as the research hypothesis. The alternative
hypothesis is accepted if the sample data provide statistically significant evidence that the null
hypothesis is false.
Level of significance is the risk we assume of rejecting the null hypothesis when it is actually true.
The level of significance is designated by the Greek letter alpha, α; it is also referred to as the level of
risk.
Step III: Find the Test statistic
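There are many test statistics: Z (the normal distribution), the Student t, F, and $\chi^2$ (chi-square).
Test statistic: A value, determined from sample information, used to reject or not to reject the null
hypothesis.
The standard normal deviate, the Z distribution, is used as the test statistic when the sample size is large, $n \ge 30$.
In hypothesis testing the test statistic Z is computed by
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
where $\bar{X}$ is the sample mean, $\mu$ is the hypothesized population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size.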
Step IV: Determine the decision rule
A decision rule is a statement of the conditions under which the null hypothesis is rejected and the
conditions under which it is not rejected.
The critical value separates the critical region from the noncritical region. The symbol for critical
value is C.V.
The critical or rejection region is the range of values of the test value that indicates that there is a
significant difference and that the null hypothesis should be rejected.
The noncritical or non-rejection region is the range of values of the test value that indicates that
the difference was probably due to chance and that the null hypothesis should not be rejected.
Step V: Take a sample and make a decision
At this step a decision is made to reject or not to reject the null hypothesis.
5.3. Type I and type II errors (concepts)
In hypothesis testing, there are two possible kinds of errors called type I error and type II error.
1. Type I error: - is the error committed in rejecting the null hypothesis while it is actually true. The
probability of type I error is denoted by α and is called the level of significance, i.e., the level of
significance is the probability of rejecting the null hypothesis when it is actually true.
The level of significance is also referred to as the level of risk. This may be a more appropriate
term because it is the risk you take of rejecting the null hypothesis when it is really true.
There is no unique level of significance; it depends up on the choice of the researcher.
The researcher must decide on the level of significance before formulating a decision rule and
collecting sample data.
There are two commonly used levels of significance: 0.05 and 0.01.
2. Type II error: - is the error that is committed in accepting the null hypothesis when it is actually false.
The probability of type II error is designated by the Greek letter beta (β).
We often refer to these two possible errors as the alpha error, α, and the beta error, β. Alpha (α) is
the probability of making a Type I error, and beta (β) is the probability of making a Type II error.
Notice that there are two possibilities for a correct decision and two possibilities for an incorrect
decision.
Possible outcomes of a hypothesis test
                      H0 true             H0 false
Reject H0             Error (Type I)      Correct decision
Do not reject H0      Correct decision    Error (Type II)
Finding the critical and noncritical regions for α = 0.01 (right-tailed test): the critical value is +2.33, and the critical region is the area to its right.
For a two-tailed test, the critical region must be split into two equal parts. If α = 0.01, then one-half of the
area, or 0.005, must be to the right of the mean and one-half must be to the left of the mean.
[Figure: two-tailed test with α = 0.01. The critical regions lie below -2.58 and above +2.58; the noncritical region lies between them.]
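The critical values quoted above (+2.33 for a right-tailed test and ±2.58 for a two-tailed test at α = 0.01) can be reproduced with `statistics.NormalDist` from Python's standard library. The helper name `critical_values` and its `tail` argument are illustrative, not part of the original notes.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return the z critical value(s) for a 'left', 'right' or 'two' tailed test."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)
    if tail == "left":
        return (z.inv_cdf(alpha),)
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)  # alpha is split equally between tails
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right' or 'two'")

print([round(v, 2) for v in critical_values(0.01, "right")])  # [2.33]
print([round(v, 2) for v in critical_values(0.01, "two")])    # [-2.58, 2.58]
```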
$\text{Test value} = \dfrac{(\text{observed value}) - (\text{expected value})}{\text{standard error}}$
The observed value is the statistic (such as the mean) that is computed from the sample data.
The expected value is the parameter (such as the mean) that you would expect to obtain if the null
hypothesis were true; in other words, the hypothesized value.
The denominator is the standard error of the statistic being tested (in this case, the standard error of the
mean).
The z test is defined formally as follows.
The z test is a statistical test for the mean of a population. It can be used when $n \ge 30$, or when the
population is normally distributed and $\sigma$ is known. The formula for the z test is
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{\$43{,}260 - \$42{,}000}{\$5230 / \sqrt{30}} = 1.32$
Step 4 Make the decision. Since the test value, +1.32, is less than the critical value, +1.65, and is not in the
critical region, the decision is to not reject the null hypothesis.
Step 5 Summarize the results. There is not enough evidence to support the claim that assistant professors earn
more on average than $42,000 per year.
P- Value method
Step 1 State the hypotheses and identify the claim.
H0: µ = $42,000 and H1: µ > $42,000 (claim)
Step 2 Compute the test value.
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{\$43{,}260 - \$42{,}000}{\$5230 / \sqrt{30}} = 1.32$
Step 3 Find the P-value. Find the corresponding area under the normal distribution for z = 1.32. It is 0.4066.
Subtract this value from 0.5 to find the area in the right tail: 0.5 - 0.4066 = 0.0934.
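The same right-tail area can be obtained without a table. This minimal sketch uses the cumulative distribution function of the standard normal from Python's `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

z = 1.32  # test value from Step 2

# Right-tailed P-value: area under the standard normal curve to the right of z.
p_value = 1 - NormalDist().cdf(z)

print(round(p_value, 4))  # 0.0934
```

Since 0.0934 is greater than α = 0.05, the P-value method leads to the same decision as the traditional method: do not reject the null hypothesis.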
Example 2: A researcher claims that the average wind speed in a certain city is 8 miles per hour. A sample of 32
days has an average wind speed of 8.2 miles per hour. The standard deviation of the population is
0.6 mile per hour. At α = 0.05, is there enough evidence to reject the claim? Use the P-value
method.
Solution
Step 1 State the hypotheses and identify the claim.
H0: µ = 8 (claim) and H1: µ ≠ 8
Step 2 Compute the test value.
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{8.2 - 8}{0.6 / \sqrt{32}} = 1.89$
Step 3 Find the P-value. Find the corresponding area for z = 1.89. The cumulative area is 0.9706 (0.5 + 0.4706 = 0.9706). Subtract
this value from 1.0000: 1.0000 - 0.9706 = 0.0294.
Since this is a two-tailed test, the area of 0.0294 must be doubled to get the P-value. 2(0.0294) = 0.0588
Step 4 Make the decision. The decision is to not reject the null hypothesis, since the P-value is greater than
0.05.
Step 5 Summarize the results. There is not enough evidence to reject the claim that the average wind speed is 8
miles per hour.
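The whole wind-speed example can be sketched as a short two-tailed z test. The function name `z_test_two_tailed` is illustrative. Because the code keeps the unrounded z (about 1.886) instead of the table value 1.89, its P-value comes out near 0.059 rather than exactly 0.0588; the decision is the same.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n):
    """Return (z, p_value) for a two-tailed z test of H0: mu = mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test value
    one_tail = 1 - NormalDist().cdf(abs(z))           # area beyond |z| in one tail
    return z, 2 * one_tail                            # doubled for two tails

z, p = z_test_two_tailed(sample_mean=8.2, mu0=8, sigma=0.6, n=32)
alpha = 0.05
print(round(z, 2))   # 1.89
print(p > alpha)     # True, so the null hypothesis is not rejected
```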