Average: Sagni D. 1


CHAPTER ONE

1. Introduction
1.1. Definitions and classification of Statistics
Definitions: We can define statistics in two ways (senses).
In the plural sense: statistics are the raw data themselves (Numerical facts), like statistics
of births, statistics of deaths, statistics of students, statistics of imports and exports, etc.
In the singular sense: statistics is concerned with scientific methods for collecting,
organizing, summarizing, presenting and analyzing data as well as deriving valid
conclusions and making reasonable decisions on the basis of this analysis.
Classifications: Depending on how data can be used, statistics is sometimes divided into two
main areas or branches.
1. Descriptive Statistics:
Is concerned with summary calculations, graphs, charts and tables.
Generally characterizes or describes a set of data elements by graphically displaying the
information or describing its central tendencies and how it is distributed.
In descriptive statistics our objective is to describe a group of data that we have "in hand",
i.e. data that are accessible to us.
We are not interested in data that we have not gathered.
Example: the following data refers to the number of malaria patients who have been treated in
Nekemte Hospital from 1986 to 1990 E. C. 3645; 4568; 5432; 6751; 7369
If we calculate the average number of malaria patients from 1986 to 1990 as
$\text{Average} = \frac{1}{5}(3645 + 4568 + 5432 + 6751 + 7369) = 5553$, then our work belongs to the domain of
descriptive statistics.
If we say that there was an increase of 3724 patients (7369 - 3645) from 1986 to 1990, then again this belongs to
the domain of descriptive statistics.
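The two descriptive summaries in this example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes the five counts are listed in chronological order for the years 1986 to 1990 E.C.

```python
# Yearly malaria patient counts from the Nekemte Hospital example.
# Assumption: the five values correspond to 1986..1990 E.C. in order.
patients = {1986: 3645, 1987: 4568, 1988: 5432, 1989: 6751, 1990: 7369}

# Descriptive summary 1: the average number of patients per year.
average = sum(patients.values()) / len(patients)

# Descriptive summary 2: the change between the first and the last year.
increase = patients[1990] - patients[1986]

print(f"Average patients per year: {average:.0f}")
print(f"Change from 1986 to 1990: {increase}")
```

Both numbers describe only the data in hand; nothing is claimed about other years or other hospitals.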
2. Inferential Statistics: consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions. Statistical techniques based on probability theory are required.
Example 2: Suppose we want to have an idea about the percentage of illiterates in our country.
We take a sample from the population and find the proportion of illiterates in the sample. This
sample proportion with the help of probability enables us to make some inferences about the
population proportion. This study belongs to inferential statistics.
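As a rough illustration of how a sample proportion leads to a statement about the population, the sketch below computes an approximate 95% interval using the normal approximation. The sample figures (1000 people, 380 of them illiterate) are hypothetical and are not taken from the text, and the function name `proportion_interval` is an illustrative choice.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Sample proportion and an approximate interval for the population
    proportion (normal approximation; z=1.96 gives roughly 95%)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * standard_error, p_hat + z * standard_error

# Hypothetical sample: 1000 people surveyed, 380 found to be illiterate.
p_hat, low, high = proportion_interval(380, 1000)
print(f"Sample proportion: {p_hat:.3f}")
print(f"Approximate 95% interval: ({low:.3f}, {high:.3f})")
```

The interval is the inferential step: it uses probability theory to say something about the population, not just about the sample.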
1.2. Stages in Statistical Investigation
Before we deal with statistical investigation, let us see what statistical data mean. Not every
set of numerical data can be considered statistical data; it must satisfy the following criteria:
The data must be aggregate of facts
They must be affected to a marked extent by a multiplicity of causes
They must be estimated according to reasonable standards of accuracy
The data must be collected in a systematic manner for predefined purpose
The data should be placed in relation to each other
A statistician should be involved at all the different stages of a statistical investigation. This
includes formulating the problem, and then collecting, organizing and classifying, presenting,
analyzing and interpreting statistical data. Let's see each stage in detail.
I. Formulating the problem: research must start from a problem. At this stage
the investigator must be sure to understand the problem and then formulate it in statistical
terms. Clarify the objectives very carefully. Ask as many questions as necessary because "An
approximate answer to the right question is worth a great deal more than a precise answer to
the wrong question." (the first golden rule of applied mathematics)
Therefore, the first stage in any statistical investigation should be to:
Get a clear understanding of the physical background to the situation under study;
Clarify the objectives;
Formulate the objective in statistical terms
II. Proper collection of data: in order to draw valid conclusions, it is important to have "good" data.
Data are gathered with the aim of meeting predetermined objectives. In other words, the data must
provide answers to problems. The data themselves form the foundation of statistical analyses and
hence must be carefully and accurately collected. In section 1.6 we will see the
methods of data collection.
III. Organization and classification of data: in this stage the collected data are organized in a
systematic manner. That means the data must be placed in relation to each other. The
classification or sorting of data is, by itself, a kind of organization of data.
IV. Presentation of data: The purpose of putting the organized data in graphs, charts and tables
is two-fold. First, it is a visual way to look at the data and see what happened and make
interpretations. Second, it is usually the best way to show the data to others. Reading lots of
numbers in the text puts people to sleep and does little to convey information.
V. Analysis of data: is the process of looking at and summarizing data with the intent to extract
useful information and develop conclusions. Data analysis is closely related to data mining,
but data mining tends to focus on larger data sets, with less emphasis on making inferences,
and often uses data that were originally collected for a different purpose. In this stage different
types of inferential statistical methods are applied, for instance hypothesis tests such as the
$\chi^2$ test of association.
VI. Interpretation of data: interpretation means drawing valid conclusions from data, which form
the basis of decision making. Correct interpretation requires a high degree of skill and
experience.
Note that: Analysis and interpretation of data are two sides of the same coin.

1.3. Definition of Some Terms


In this section, we will define those terms which will be used most frequently. These are:
Data: are the values (measurements or observations) that the variables can assume, or the facts or
figures from which conclusions can be drawn.
Data set: Facts or figures collected for a particular study. Each value in the data set is called a data
value or datum.

Raw Data: Data sheets are where the data are originally recorded. Original data are called raw
data. Data sheets are often hand drawn, but they can also be printouts from spreadsheet or database
programs such as Microsoft Excel.
Population: The totality of all subjects with certain common characteristics that are
being studied in a specified time and place.
Sample: Is a portion of a population which is selected using some sampling technique.
A sample must be representative of the population, so it must be selected by one of the
established sampling techniques.
Sampling: Is the process of selecting units (e.g., people, households, organizations) from a
population of interest so that by studying the sample we may fairly generalize our results back to
the population from which they were chosen. There are two types of sampling techniques namely
random sampling technique and non-random sampling technique.
Sample size: The number of elements or observations to be included in the sample.
Parameter: Any measure computed from the data of a population.
Example: Population mean $\mu$ and population standard deviation $\sigma$
Statistic: Any measure computed from the sample.
Example: Sample mean $\bar{x}$ and sample standard deviation $S$
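The difference between a parameter and a statistic can be sketched in Python with the standard `statistics` module. Purely for illustration, the five yearly patient counts from section 1.1 are treated here as a complete population (an assumption of this sketch), and a random sample of three is drawn from it.

```python
import random
import statistics

# Assumption for illustration: these five values ARE the whole population.
population = [3645, 4568, 5432, 6751, 7369]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics: computed from a sample drawn from the population.
rng = random.Random(0)                  # fixed seed so the run is repeatable
sample = rng.sample(population, 3)
x_bar = statistics.mean(sample)         # sample mean
s = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)

print(f"Parameters: mu = {mu}, sigma = {sigma:.2f}")
print(f"Statistics from sample {sample}: x_bar = {x_bar:.2f}, S = {s:.2f}")
```

Note that `pstdev` divides by the population size while `stdev` divides by one less than the sample size, which is why the two functions are kept separate.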
Survey: A collection of quantitative information about members of a population when no special
control is exercised over any of the factors influencing the variable of interest.
Sample survey: A survey that includes only a portion of the population.
Census: A collection of information about every member of a population.
A sample survey has the following advantages over a census:
It saves time and cost.
It avoids wastage of material.
It can give greater accuracy.
Variable: A variable is a characteristic or attribute that can assume different values. Variables
whose values are determined by chance are called random variables. Variables are often
specified according to their type and intended use, and hence variables can be classified into two
types, namely qualitative and quantitative variables.
A quantitative variable is naturally measured as a number for which meaningful arithmetic
operations make sense. Examples: Height, age, crop yield, GPA, salary, temperature, area,
air pollution index (measured in parts per million), etc.
Qualitative variable: Any variable that is not quantitative is qualitative. Qualitative
variables take a value that is one of several possible categories. As naturally measured,
qualitative variables have no numerical meaning. Examples: Hair color, gender, field of
study, marital status, political affiliation, status of disease infection.
Quantitative variables can be classified as discrete or continuous. Discrete variables
can assume only certain numerical values; that is, there are gaps between the possible values, such as
0, 1, 2, ... The set of possible values may be countably finite or countably infinite. Examples: the number
of students in a classroom, the number of children in a family. Continuous variables can take any value
within a specified interval, given a fine enough measuring device; there are no gaps between possible
values. They are obtained by measuring. For example, height is a continuous variable: no matter how close
the heights of two people are, we can find another person whose height falls somewhere between the two.
1.4. Applications, uses and limitations of statistics
I. Applications of statistics
Apart from helping to elicit an intelligent assessment from a body of figures and facts, statistics
is an indispensable tool for any scientific enquiry, from the stage of planning the enquiry to the
stage of drawing conclusions. It applies to almost all sciences: pure and applied, physical, natural,
biological, medical, agricultural and engineering. It also finds applications in social and
management sciences, in commerce, business and industry.
It is used in almost all fields of human endeavor.
Almost all human beings encounter numerical facts in their daily life.
It is applicable in processes such as the development of certain drugs and measuring the extent of environmental pollution.
It is used in industry, especially in the area of quality control.
II. Uses of Statistics
Statistics presents facts in the form of numerical data.
It condenses and summarizes a mass of data into a few presentable and precise figures.
It facilitates comparison of data.
It helps in formulating and testing hypotheses.
It helps in predicting future trends.
It helps in formulating policies.
III. Limitations of Statistics
Statistics with all its wide application in every sphere of human activity has its own limitation.
Some of them are given below
Statistics is not suitable for the study of qualitative phenomena: Since statistics is
basically a science that deals with sets of numerical data, it is applicable only to the study of
those subjects of enquiry which can be expressed in terms of quantitative measurements. As
a matter of fact, qualitative phenomena like honesty, poverty, beauty, intelligence, etc.
cannot be expressed numerically, and statistical analysis cannot be directly applied to
them. Nevertheless, statistical techniques may be applied indirectly
by first reducing the qualitative expressions to quantitative terms. For example, the
intelligence of a group of students can be studied on the basis of their marks in a particular
examination.
Statistics does not study individuals: Statistics does not give any specific importance to the
individual items; in fact it deals with an aggregate of objects. Individual items, when they are
taken individually do not constitute any statistical data and do not serve any purpose for any
statistical enquiry.
Statistical laws are not exact: It is well known that mathematical and physical sciences are
exact. But statistical laws are not exact and statistical laws are only approximations.
Statistical conclusions are not universally true. They are true only on an average.
Statistics may be misused: Statistics must be used only by experts; otherwise,
statistical methods are the most dangerous tools in the hands of the inexpert. The use of
statistical tools by inexperienced and untrained persons might lead to wrong conclusions.
Statistics can be easily misused by quoting wrong figures. As King aptly says,
"statistics are like clay of which one can make a God or Devil as one pleases."
Statistics is only one of the methods of studying a problem: Statistical methods do not
provide a complete solution to a problem, because problems have to be studied taking the
background of the country's culture, philosophy or religion into consideration. Thus the
statistical study should be supplemented by other evidence.

1.5. Scales of measurement


Normally, when one hears the term measurement, one may think in terms of measuring the
length of something (e.g. the length of a piece of wood) or measuring a quantity of something
(e.g. a cup of flour). This represents a limited use of the term measurement. In statistics, the term
measurement is used more broadly and is more appropriately termed scales of measurement.
Scales of measurement refer to ways in which variables or numbers are defined and categorized.
Each scale of measurement has certain properties which in turn determine the appropriateness for
use of certain statistical analyses. The four scales of measurement are nominal, ordinal, interval,
and ratio.
1. Nominal Scales
Nominal scales possess the following properties:
• Level of measurement which classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data.
• No arithmetic or relational operations can be applied.
• No quantitative information is conveyed.
• Thus it only gives names or labels to the various categories.
Examples:
• Political party preference (Republican, Democrat, or Other)
• Sex (Male or Female)
• Marital status (married, single, widowed, divorced)
• Country code
• Regional differentiation of Ethiopia
2. Ordinal Scales
Ordinal scales are measurement systems that possess the following properties:
• Level of measurement which classifies data into categories that can be ranked; however,
precise differences between the ranks do not exist.
• Arithmetic operations are not applicable, but relational operations are applicable.
• Ordering is the sole property of the ordinal scale.
Examples:
• Letter grades (A, B, C, D, F)
• Rating scales (Excellent, Very good, Good, Fair, Poor)
• Military status
3. Interval Scales
Interval scales are measurement systems that possess the following properties:

• Level of measurement which classifies data that can be ranked and in which differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
• All arithmetic operations except division are applicable.
• Relational operations are also possible.
Examples: IQ, temperature in °F.
4. Ratio Scales
Ratio scales are measurement systems that possess the following properties:
• Level of measurement which classifies data that can be ranked, in which differences are
meaningful, and in which there is a true zero. True ratios exist between the different units of measure.
• All arithmetic and relational operations are applicable.
Examples:
• Weight
• Height
• Number of students
• Age
Use of levels of measurement
• Helps you decide how to interpret the data from the variable.
• Helps you decide what statistical analysis is appropriate for the values that were assigned.
For example, if a measurement is nominal, then you know that you should never average the data
values.
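How the scale of measurement limits the choice of summary can be sketched as follows. The mapping used here (mode for nominal, median for ordinal, mean for interval and ratio data) is a common convention and an assumption of this sketch rather than something stated above; the function name `summarize` and the sample values are illustrative.

```python
import statistics

def summarize(values, scale, order=None):
    """Return a measure of centre that is meaningful for the given scale.

    scale: "nominal", "ordinal", "interval" or "ratio".
    order: for ordinal data, the categories listed from lowest to highest.
    """
    if scale == "nominal":
        # Only counting is meaningful, so report the most frequent label.
        return statistics.mode(values)
    if scale == "ordinal":
        # Ranking is meaningful: sort by rank and take the (lower) middle value.
        ranks = sorted(order.index(v) for v in values)
        return order[statistics.median_low(ranks)]
    if scale in ("interval", "ratio"):
        # Differences are meaningful, so averaging makes sense.
        return statistics.mean(values)
    raise ValueError(f"unknown scale: {scale}")

# Hypothetical values, chosen only to exercise each branch.
print(summarize(["married", "single", "married"], "nominal"))
print(summarize(["A", "C", "B", "B", "D"], "ordinal", order=["F", "D", "C", "B", "A"]))
print(summarize([60, 70, 80], "ratio"))
```

Asking for a mean of marital status would have no sensible answer, which is exactly why the nominal branch never averages.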

CHAPTER TWO
2. Methods of data collection and Organization
Once it is decided what type of study is to be made, it becomes necessary to collect information
about the study concerned, mostly in the form of data. In order to draw valid conclusions from
data, the information has to be collected in a systematic manner. Whatever the quality of the sampling
and analysis methods, a haphazardly collected data set is unlikely to produce valuable and
generalizable information.
Types of Data:- There are two types (sources) of data.
1) Primary data
Primary data are first-hand information collected, compiled and published by an
organization for some purpose. They are the most original data in character and have not
undergone any sort of statistical treatment.
They are collected by conducting a survey to meet the specific needs of the problem at
hand.
Example: Population census reports are primary data because they are collected, compiled and
published by the population census organization.
2) Secondary data
Secondary data are second-hand information which has already been collected by someone
(an organization) for some purpose and is available for the present study. Secondary data
are not pure in character and have undergone some treatment at least once.
They are taken from already available published or unpublished sources.
2.1 Methods of collection
There are three major methods of data collection:
1. Self-administered questionnaire
2. Direct investigation: measurement (observation) of the subject and interviewing (face-to-face, telephone, ...)
3. The use of documentary sources
1. Self-administered questionnaire
The questionnaire is the main data collection instrument in a formal sample survey. Before
examining the steps in designing a questionnaire, we need to review the types of questions used
in questionnaires. Depending on the amount of freedom given to the respondent in offering
responses, there are two basic types of questions that can be used in questionnaires: open-ended
questions and closed-ended questions.
The type of question to use is determined by the form of response wanted, the nature of
the respondents and their ability to answer the questions.
Open-ended questions:- allow the respondent to answer freely in his or her own words.
Example: What do you think are the reasons for a high drop-out rate of village health
committee members?
Closed-ended questions:- A predetermined list of alternative responses is presented to the
respondent for checking the appropriate one(s). This implies that the respondent's answers are
restricted in some way to a limited range of alternatives.
Advantage
It is the cheapest method and can be conducted by a single researcher.
Questionnaires can be sent to a wide geographical area.
There is no interviewer variability.
Disadvantage
Low response rate
No assurance that the questionnaires were answered by the right person.
A mailed questionnaire is not suitable for an illiterate community.
2. Direct investigation
i) Measurement or/and observation
Data can be obtained through direct observation or measurement, which provides accurate
information but is expensive and inconvenient.
E.g.: land area measurement, animal weight gain, physical examination, direct
observation of work.
ii) Interview
a) Face-to-Face interview
Advantage:-
Interviewers can observe the surroundings and can use nonverbal communication and
visual aids.
The interviewer can help the respondent if he/she has difficulty in understanding the
questions.
The respondent is likely to answer all the questions personally.
Disadvantage:-
Cost is high
Interviewer bias is also high
Untrained interviewers may distort the meaning of the questions
b) Telephone Interview
Advantage:-
It is less expensive in time and money compared to face-to-face interviews
Relatively high response rate
It reaches people who would not open their doors to an interviewer but might be willing to
talk on the telephone
Disadvantage:-
Unrepresentative of the groups which do not have telephones
Unlisted telephone numbers are excluded from the study.
The respondent may be substituted by another person
3. The use of documentary source
Extracting information from existing records.
It is much less expensive than the other two methods.
It is difficult to get the information needed when records are compiled in an unstandardized
manner.
Example: hospital records, professional institutes, official statistics, ...
2.2. Methods of Data Organization
This topic introduces tabular and graphical methods commonly used to summarize both
qualitative and quantitative data. Tabular and graphical summaries of data can be obtained in
annual reports, newspaper articles and research studies. Everyone is exposed to these types of
presentations, so it is important to understand how they are prepared and how they will be
interpreted.
Modern statistical software packages provide extensive capabilities for summarizing data and
preparing graphical presentations. MINITAB, SPSS and STATA are three packages that are
widely available.
2.2.1 Editing of Data.
After collecting the data from either a primary or a secondary source, the next step is editing.
Editing means examining the collected data to discover any errors and mistakes before
presenting them. It has to be decided beforehand what degree of accuracy is wanted and what extent
of error can be tolerated in the inquiry. The editing of secondary data is simpler than that of
primary data.

2.2.2. Classification of Data
The process of arranging data into homogeneous groups or classes according to some common
characteristics present in the data is called classification. For example, in the process of sorting
letters in a post office, the letters are classified according to regions and further arranged
according to zones, cities, etc.
Bases of Classification:- There are four important bases of classification:
(1) Qualitative Base:- When the data are classified according to some quality or attributes such
as sex, religion, literacy, intelligence etc…
(2) Quantitative Base:- When the data are classified by quantitative characteristics like heights,
weights, ages, income etc…
(3) Geographical Base:- When the data are classified by geographical regions or location, like
states, provinces, cities, countries etc…
(4) Chronological or Temporal Base:- When the data are classified or arranged by their time of
occurrence, such as years, months, weeks, days etc… For Example: Time series data.
2.2.3 Tabulation of Data
The process of placing classified data into tabular form is known as tabulation. A table is a
systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements whereas columns are vertical arrangements.
2.2.4 Frequency distribution
A frequency distribution is the organization of raw data in table form, using classes and
frequencies. There are three basic types of frequency distributions, and there are specific
procedures for constructing each type. The three types are categorical, ungrouped and grouped
frequency distributions.
The reasons for constructing a frequency distribution are as follows
To organize the data in a meaningful, intelligible way.
To enable the reader to determine the nature or shape of the distribution
To facilitate computational procedures for measures of average and spread
To enable the researcher to draw charts and graphs for the presentation of data
To enable the reader to make comparisons between different data sets
2.2.4.1. Categorical Frequency Distribution:- The categorical frequency distribution is used
for data which can be placed in specific categories, such as nominal or ordinal level data. For
example, data such as political affiliation, religious affiliation, or major field of
study would use a categorical frequency distribution.
The major components of a categorical frequency distribution are class, tally and frequency.
Moreover, even though percentage is not normally a part of a frequency distribution, it will be added
since it is used in certain types of graphical presentations, such as the pie graph.
Example 2.1: Twenty-five army inductees were given a blood test to determine their blood type.
The data set is given as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
Solution:
Class    Tally         Frequency    Percent
A        /////         5            20
B        ///// //      7            28
O        ///// ////    9            36
AB       ////          4            16
Total                  25           100
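The tallying in Example 2.1 can be reproduced with Python's `collections.Counter`. This is a minimal sketch using the 25 blood types from the example; the variable names are illustrative.

```python
from collections import Counter

# The 25 blood types from Example 2.1, read row by row.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)   # frequency of each class
total = len(blood_types)        # total number of observations

print(f"{'Class':<6}{'Tally':<12}{'Frequency':>10}{'Percent':>9}")
for cls in ["A", "B", "O", "AB"]:
    f = counts[cls]
    percent = 100 * f / total
    # One stroke per observation stands in for the hand-drawn tally.
    print(f"{cls:<6}{'/' * f:<12}{f:>10}{percent:>9.0f}")
```

The percent column is simply each frequency divided by the total of 25 and multiplied by 100.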

2.2.4.2 Ungrouped Frequency Distribution


When the data are numerical instead of categorical, the range of the data is small and each class is
only one unit wide, the distribution is called an ungrouped frequency distribution.
The major components of this type of frequency distribution are class, tally, frequency and
cumulative frequency. The steps are almost the same as those for a categorical frequency
distribution.
Cumulative frequencies are used to show how many values are accumulated up to and including
a specific class.
Example 2.2: The following data represent the number of days of sick leave taken by each of 50
workers of a company over the last 6 weeks.
2 0 0 5 8 3 4 1 0 0 7 1
7 1 5 4 0 4 0 1 8 9 7 0
1 7 2 5 5 4 3 3 0 0 2 5
1 3 0 2 4 5 0 5 7 5 1 1
0 2
A. Construct ungrouped frequency distribution
B. How many workers had at least 1 day of sick leave?
C. How many workers had between 3 and 5 days of sick leave?

Solution:
A. Since this data set contains only a relatively small number of distinct or different
values, it is convenient to represent it in a frequency table which presents each distinct
value along with its frequency of occurrence.
Value (days)    Frequency    Cumulative frequency
0               12           12
1               8            20
2               5            25
3               4            29
4               5            34
5               8            42
7               5            47
8               2            49
9               1            50
B. Since 12 of the 50 workers had no days of sick leave, the answer is 50 - 12 = 38.
C. The answer is the sum of the frequencies for the values 3, 4 and 5, that is 4 + 5 + 8 = 17.
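The three parts of Example 2.2 can be worked through in Python as a check. This sketch uses the 50 sick-leave values from the example; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Days of sick leave for the 50 workers in Example 2.2, row by row.
sick_days = [
    2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1,
    7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
    1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5,
    1, 3, 0, 2, 4, 5, 0, 5, 7, 5, 1, 1,
    0, 2,
]

# A. Ungrouped frequency distribution with cumulative frequencies.
freq = Counter(sick_days)
values = sorted(freq)                                   # distinct values observed
cumulative = list(accumulate(freq[v] for v in values))

print(f"{'Days':<6}{'Frequency':>10}{'Cumulative':>12}")
for value, cum in zip(values, cumulative):
    print(f"{value:<6}{freq[value]:>10}{cum:>12}")

# B. Workers with at least 1 day of sick leave.
at_least_one = sum(f for v, f in freq.items() if v >= 1)

# C. Workers with between 3 and 5 days of sick leave (inclusive).
three_to_five = sum(freq[v] for v in (3, 4, 5))

print("At least 1 day:", at_least_one)
print("Between 3 and 5 days:", three_to_five)
```

The last cumulative frequency equals the number of workers, which is a quick way to confirm that no observation was missed.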
2.2.4.3. Grouped Frequency Distribution
When the range of the data is large, the data must be grouped into classes that are each more than
one unit wide. Some basic terms that are frequently used when dealing with
frequency distributions are the following:
Lower class limits are the smallest numbers that can belong to the different classes.
Upper class limits are the largest numbers that can belong to the different classes.
Unit of measurement (U) is the distance between two possible consecutive measures. It is
usually taken as 1, 0.1, 0.01, 0.001, ...
Class boundaries are the numbers used to separate classes, but without the gaps created by
class limits.
Class marks are the averages of the lower and upper class limits (or class boundaries) of each class.
Class width is the difference between two consecutive lower class limits, two consecutive
lower class boundaries, or any two consecutive class marks.
Cumulative frequency is the number of observations less than or equal to (or greater than or
equal to) a specific value.
Cumulative frequency above is the total frequency of all values greater than or equal to
the lower class boundary of a given class.
Cumulative frequency below is the total frequency of all values less than or equal to
the upper class boundary of a given class.
Cumulative frequency distribution (CFD) is the tabular arrangement of class intervals
together with their corresponding cumulative frequencies. It can be of the "more than" or "less than"
type, depending on the cumulative frequency used.
Relative frequency (rf) is the frequency of a class divided by the total frequency.
Relative cumulative frequency (rcf) is the cumulative frequency divided by the total
frequency.

Relative frequency distribution enables us to understand the distribution of the data and to
compare different sets of data.
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.
3. The classes must be all-inclusive or exhaustive. This means that all data values must
be included.
4. The classes must be continuous. There should be no gaps in a frequency distribution.
5. The classes must be equal in width. One exception occurs when a distribution is open-ended,
i.e. it has no specific beginning or end values.
Steps for constructing frequency distribution
1. Find the highest and the lowest values.
2. Compute the range: $R = \text{Highest value} - \text{Lowest value}$.
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$K = 1 + 3.322 \log_{10} n$, where $K$ is the number of classes and $n$ is the total number of
observations.
4. Find the class width by dividing the range by the number of classes: $W = R / K$.
Note that: Round the answer up to the nearest whole number if there is a remainder. For
instance, a computed width of 5.5 is rounded up to 6.
5. Select the starting point as the lowest class limit. This is usually the lowest score
(observation). Add the width to that score to get the lower class limit of the next class.
Keep adding until you reach the desired number of classes calculated in step 3.
6. Find the upper class limits: subtract the unit of measurement from the lower class limit of
the second class to get the upper limit of the first class. Then add the width to
each upper class limit to get all upper class limits.
7. Find the class boundaries: $\text{LCB} = \text{LCL} - \frac{U}{2}$ and $\text{UCB} = \text{UCL} + \frac{U}{2}$.
In short, the lower class boundary of each class equals the upper class boundary of the class before it.
8. Tally the data.


9. Write numeric values for the tallies in the frequency column.
10. Find cumulative frequency.
11. Find relative frequency or/and relative cumulative frequency.
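The construction steps can be sketched as a small Python function. It assumes whole-number data with a unit of measurement of 1, uses Sturges' rule in its common form K = 1 + 3.322 log10(n) with K rounded up, and adds one safeguard that is not part of the steps above: an extra class is appended if the highest value would otherwise be left out. The function name `grouped_frequency` and the dictionary keys are illustrative.

```python
import math

def grouped_frequency(data, unit=1):
    """Build a grouped frequency distribution (sketch; assumes integer
    data and a whole-number unit of measurement)."""
    low, high = min(data), max(data)                     # step 1
    data_range = high - low                              # step 2
    k = math.ceil(1 + 3.322 * math.log10(len(data)))     # step 3, rounded up
    width = max(unit, math.ceil(data_range / k))         # step 4, rounded up

    rows = []
    lower = low
    # Steps 5-7: class limits and boundaries. The second condition is a
    # safeguard of this sketch so that the highest value is always covered.
    while len(rows) < k or rows[-1]["ucl"] < high:
        upper = lower + width - unit
        rows.append({
            "lcl": lower, "ucl": upper,
            "lcb": lower - unit / 2, "ucb": upper + unit / 2,
            "freq": 0,
        })
        lower += width

    # Steps 8-9: tally each observation into its class.
    for x in data:
        for row in rows:
            if row["lcl"] <= x <= row["ucl"]:
                row["freq"] += 1
                break
    return rows

# Usage with the 20 observations of Example 2.3.
scores = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
          31, 22, 27, 19, 22, 23, 26, 39, 34, 27]
table = grouped_frequency(scores)
for row in table:
    print(f"{row['lcl']}-{row['ucl']}  {row['lcb']}-{row['ucb']}  {row['freq']}")
```

Rounding both the number of classes and the width upward is a design choice that keeps the classes wide enough to cover the whole range.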
Example 2.3: Consider the following set of data and construct the frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
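Steps
1. Highest value = 39 and lowest value = 6, so $R = 39 - 6 = 33$.
2. $K = 1 + 3.322 \log_{10} 20 \approx 5.32$, rounded up to 6 classes, and $W = R/K = 33/6 = 5.5$, rounded up to 6.
3. Select the starting point. Take the minimum, which is 6, then add the width 6 to it to get the LCL of the
next class, and so on.
6 12 18 24 30 36
4. Upper class limits. Since the unit of measurement is 1, $12 - 1 = 11$ is the UCL of the
first class. Adding the width 6 repeatedly gives the rest.
11 17 23 29 35 41
5. Complete the frequency distribution as
Class limits    Class boundaries    Frequency
6-11            5.5-11.5            2
12-17           11.5-17.5           2
18-23           17.5-23.5           7
24-29           23.5-29.5           4
30-35           29.5-35.5           3
36-41           35.5-41.5           2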
6. Find the cumulative frequency of example 2.3.
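One way to carry out this last step is sketched below. It starts from the class frequencies of Example 2.3 and derives the "less than" and "more than" cumulative frequencies together with the relative and relative cumulative frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the table of Example 2.3.
classes = ["6-11", "12-17", "18-23", "24-29", "30-35", "36-41"]
freq = [2, 2, 7, 4, 3, 2]
n = sum(freq)

# "Less than" type: running total from the first class downwards.
less_than = list(accumulate(freq))

# "More than" type: running total from the last class upwards.
more_than = list(accumulate(reversed(freq)))[::-1]

# Relative frequency and relative cumulative frequency ("less than" type).
rf = [f / n for f in freq]
rcf = [c / n for c in less_than]

print(f"{'Class':<8}{'f':>4}{'<= cf':>7}{'>= cf':>7}{'rf':>7}{'rcf':>7}")
for row in zip(classes, freq, less_than, more_than, rf, rcf):
    print(f"{row[0]:<8}{row[1]:>4}{row[2]:>7}{row[3]:>7}{row[4]:>7.2f}{row[5]:>7.2f}")
```

The final "less than" value and the first "more than" value both equal the total number of observations, and the last relative cumulative frequency is 1.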

2.5. Diagrammatic and Graphical Presentation of Data
The tabular presentation of statistical data that we have discussed above does not always prove to
be interesting to a layman. Too many figures are often confusing and fail to convey the message
effectively.
2.5.1. Diagrammatic display of data: Bar charts, Pie-chart, Cartograms
These are techniques for presenting data in visual displays using geometric figures and pictures.
Importance
They have greater attraction.
They facilitate comparison.
They are easily understandable.
Diagrams are appropriate for presenting discrete data.
I. Pie chart
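A pie chart is a circular diagram in which the area of each sector of the circle represents one component.
The angle of each sector is proportional to the value it represents:
$\text{Angle of sector} = \frac{\text{Value of the component}}{\text{Total}} \times 360^\circ$
These angles are marked in the circle by means of a protractor to show the different components. The
sectors are usually arranged anti-clockwise.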
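As a small illustration of how the sector angles are obtained, the sketch below converts the blood-type frequencies of Example 2.1 into angles, with each angle being the component's share of the total multiplied by 360 degrees. Reusing that data set here is a choice made for this sketch.

```python
# Frequencies from Example 2.1 (blood types of 25 army inductees).
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}
total = sum(frequencies.values())

# Angle of each sector = (value of the component / total) * 360 degrees.
angles = {label: value * 360 / total for label, value in frequencies.items()}

for label, angle in angles.items():
    print(f"{label:<3} {frequencies[label]:>2} of {total}  ->  {angle:.1f} degrees")
```

The angles add up to 360 degrees, which is a useful check before drawing the sectors with a protractor.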
Example 2.4: The following table gives the details of the monthly budget of a family. Represent
these figures by a suitable diagram.

Solution: The necessary computations are given below:

II. Bar Charts


The bar graph (simple bar chart, multiple bar chart and stratified or stacked bar chart) uses vertical or
horizontal bars to represent the frequencies of a distribution. When we draw a bar chart, we have to
consider the following two points:
Make the bars the same width.
Make the units on the axis that is used for the frequency equal in size.
A. A simple bar chart is used to represent data involving only one variable classified on a spatial,
quantitative or temporal basis. In a simple bar chart, we make bars of equal width but variable
length, i.e. the magnitude of a quantity is represented by the height or length of the bar.
The following steps are taken in drawing a simple bar diagram:
Draw two perpendicular lines, one horizontal and the other vertical, at an appropriate
place on the paper.
Take the basis of classification along the horizontal line (X-axis) and the observed variable
along the vertical line (Y-axis), or vice versa.
Mark bars of equal breadth for each class and leave a gap of equal breadth, or not less than half the
breadth, between two classes.
Finally, mark the values of the given variable to prepare the required bars.
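The idea of a simple bar chart, equal bar width with length proportional to the value, can be sketched in plain text without any plotting library. The function name `text_bar_chart`, the `#` symbol and the reuse of the Example 2.1 frequencies are all illustrative choices of this sketch.

```python
def text_bar_chart(data, scale=1, symbol="#"):
    """Return a horizontal bar chart as text.

    Each bar occupies one text row (equal width); its length is the value
    divided by `scale`, so lengths are proportional to the values.
    """
    label_width = max(len(str(label)) for label in data)
    lines = []
    for label, value in data.items():
        bar = symbol * round(value / scale)
        lines.append(f"{str(label):<{label_width}} | {bar} {value}")
    return "\n".join(lines)

# Frequencies from Example 2.1, used here only as sample input.
blood_type_counts = {"A": 5, "B": 7, "O": 9, "AB": 4}
chart = text_bar_chart(blood_type_counts)
print(chart)
```

Here the classification runs along the vertical direction and the observed variable along the horizontal one, which is the "vice versa" layout mentioned in the steps.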

Example 2.5: Draw simple bar diagram to represent the profits of a bank for 5 years.

B. Multiple bar charts are used when two or more sets of inter-related data are represented (a multiple
bar diagram facilitates comparison between more than one phenomenon). The technique of the
simple bar chart is used to draw this diagram, but the difference is that we use different
shades, colors, or dots to distinguish between the different phenomena.
Example 2.6: Draw a multiple bar chart to represent the import and export of Canada (values in
$) for the years 1991 to 1995.

C. A stratified (stacked or component) bar chart is used to represent data in which the total magnitude is divided into different components. In this diagram, we first make simple bars for each class, taking the total magnitude in that class, and then divide these simple bars into parts in the ratio of the various components. This type of diagram shows the variation in the different components within each class as well as between different classes. The sub-divided bar diagram is also known as a component bar chart or stacked chart.
Example 2.7: The table below shows the quantity, in hundred kg, of wheat, barley and oats produced on a certain farm during the years 1991 to 1994. Draw a stratified bar chart.

Solution: To make the component bar chart, first of all we have to take year wise total
production. The required diagram is given below:

2.5.2. Graphical presentation of data: Histogram, Frequency Polygon, Ogive Curves


Statistical graphs can be used to describe a data set or to analyze it. Graphs are also useful in getting the audience's attention in a publication or a spoken presentation.
They can be used to discuss an issue, reinforce a critical point, or summarize a data set. They can also be used to discover a trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive.

(1). Histogram
A histogram is a special type of bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps).
We can construct a histogram after we have first completed a frequency distribution table for a data set. The horizontal axis is reserved for the class boundaries.

Example2.8: Take the data in example 2.3.


[Figure: histogram; vertical axis: Frequency (0 to 7); horizontal axis: Class boundaries (5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5)]

A relative frequency histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.

(2). Frequency Polygon


A frequency polygon uses line segments connected to points located directly above the class midpoint values. The heights of the points correspond to the class frequencies, and the line segments are extended to the left and right so that the graph begins and ends on the horizontal axis, at the points where the previous and next class midpoints would be located.
Example 2.9: Take the data in example 2.3.
[Figure: frequency polygon; vertical axis: frequency (0 to 7); horizontal axis: Midpoints (2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5)]

(3). Ogive Graph

An Ogive (pronounced "oh-jive") is a line that depicts cumulative frequencies, just as the cumulative frequency distribution lists cumulative frequencies. Note that the Ogive uses class boundaries along the horizontal scale, and the graph begins with the lower boundary of the first class and ends with the upper boundary of the last class. An Ogive is useful for determining the number of values below some particular value. There are two types of Ogive, namely the less than Ogive and the more than Ogive. The difference is that the less than Ogive uses the less than cumulative frequency on the vertical axis, while the more than Ogive uses the more than cumulative frequency.

Example 2.10: Take the data in example 2.3 and draw less than and more than Ogive

[Figure: less than Ogive and more than Ogive; vertical axis: cumulative frequency (0 to 20); horizontal axis: Class Boundaries (5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5)]

CHAPTER THREE
3. Measures of Central Tendency
3.1 Introduction
Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures.
• The tendency of statistical data to concentrate around a certain value is called central tendency, and the various methods of determining the value around which the data tend to concentrate are called measures of central tendency. One of the most important objectives of statistical analysis is to get one single value that describes the characteristics of the entire data. Such a value is called the central value or average.
• When we want to make comparisons between groups of numbers, it is good to have a single value that is considered to be a good representative of each group. This single value is called the average of the group.
• Averages are also called measures of central tendency.
• An average which is representative is called a typical average, and an average which is not representative and has only a theoretical value is called a descriptive average.
Characteristics of a good measure of central tendency (a typical average should possess the following):
• It should be rigidly defined, which means that it should have a definite value.
• It should be based on all the observations under investigation.
• It should not be affected by extreme observations.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by fluctuations of sampling, i.e. it should be stable from sample to sample.
• It should be easy to calculate and simple to understand.
• It should be unique and always exist.
Note: No single measure satisfies all of the above conditions.
The Summation Notation:
Let X1, X2, X3, …, XN be a set of measurements, where N is the total number of observations and Xi is the ith observation.
Very often in statistics an algebraic expression of the form X1+X2+X3+...+XN is used in a formula to compute a statistic. It is tedious to write such an expression repeatedly, so mathematicians have developed a shorthand notation to represent a sum of scores, called the summation notation.
The symbol $\sum_{i=1}^{N} X_i$ is mathematical shorthand for X1+X2+X3+...+XN:
$$\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N$$
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the numbers."
Example: Suppose the following were the scores on the first homework assignment for five students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where N=5, the summation could be written:
$$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 = 5 + 7 + 7 + 6 + 8 = 33$$
The "i=1" at the bottom of the summation notation tells where to begin the sequence of summation. If the expression were written with "i=2", the summation would start with the second number in the set. For example:
$$\sum_{i=2}^{5} X_i = X_2 + X_3 + X_4 + X_5 = 7 + 7 + 6 + 8 = 28$$

The "N" in the upper part of the summation notation tells where to end the sequence of summation. If there were only three scores, then the summation and example would be:
$$\sum_{i=1}^{3} X_i = X_1 + X_2 + X_3 = 5 + 7 + 7 = 19$$
Sometimes, when the summation notation must be written many times in an expression, as in a proof, a shorthand for the shorthand is employed: when the summation sign "∑" is used without additional notation, "i=1" and "N" are assumed.
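The role of the lower and upper limits can be sketched in Python with the five homework scores used above. The helper `summation` is a hypothetical name introduced for this sketch; it takes 1-based, inclusive limits to mirror the notation.

```python
scores = [5, 7, 7, 6, 8]  # X1..X5 from the homework example

def summation(x, start=1, end=None):
    """Sum x_start + ... + x_end using 1-based, inclusive indices."""
    if end is None:
        end = len(x)  # default upper limit is N
    return sum(x[start - 1:end])

total_all = summation(scores)            # i = 1 to 5
total_from_2 = summation(scores, 2)      # i = 2 to 5
total_first_3 = summation(scores, 1, 3)  # i = 1 to 3
print(total_all, total_from_2, total_first_3)  # 33 28 19
```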

PROPERTIES OF SUMMATION
1. $\sum_{i=1}^{n} k = nk$, where k is any constant
2. $\sum_{i=1}^{n} kX_i = k\sum_{i=1}^{n} X_i$, where k is any constant
3. $\sum_{i=1}^{n} (a + bX_i) = na + b\sum_{i=1}^{n} X_i$, where a and b are any constants
4. $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$
5. $\sum_{i=1}^{N} X_i Y_i = X_1Y_1 + X_2Y_2 + \cdots + X_NY_N$

Example 3.1: Considering the following data, determine

X    Y
5    6
7    7
7    8
6    7
8    8

a) $\sum_{i=1}^{5} X_i$   b) $\sum_{i=1}^{5} Y_i$   c) $\sum_{i=1}^{5} 11$   d) $\sum_{i=1}^{5} (X_i + Y_i)$   e) $\sum_{i=1}^{5} (X_i - Y_i)$
f) $\sum_{i=1}^{5} X_i^2$   g) $\sum_{i=1}^{5} X_iY_i$   h) $\sum_{i=1}^{5} X_i + \sum_{i=1}^{5} Y_i$   i) $\sum_{i=1}^{5} X_i \sum_{i=1}^{5} Y_i$
3.2 Types of measures of central tendency
There are several different measures of central tendency; each has its own advantages and disadvantages.
• The Mean
• The Median
• The Mode
The choice among these averages depends upon which best fits the property under discussion.
3.5. The Mean
There are three types of mean, each suitable for a particular type of data. They are
a) Arithmetic mean
b) Geometric mean
c) Harmonic mean
3.5.1. The Arithmetic Mean:
- It is divided into two types, i.e. the simple arithmetic mean and the weighted arithmetic mean.
1) Simple Arithmetic Mean:
Two methods exist for both ungrouped and grouped data: the direct method and the indirect method.
a) Direct method
- The mean is defined as the sum of the magnitudes of the items divided by the number of items.
The mean of X1, X2, X3, …, Xn is denoted by A.M or $\bar{X}$ and is given by:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
When the data are arranged in the form of a frequency distribution, i.e. there are k variate values such that a value $X_i$ has frequency $f_i$ (i = 1, 2, …, k), the arithmetic mean is
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.

• Arithmetic Mean for Grouped Data


If data are given in the form of a continuous frequency distribution, the arithmetic mean is obtained as follows:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i Y_i}{\sum_{i=1}^{k} f_i}$$
where $Y_i$ = the class mark of the ith class, $f_i$ = the frequency of the ith class and k = the number of classes.

Example 3.2:
1) The daily cash earnings of 15 workers working in different industries are as follows:
11.63, 8.22, 12.56, 12.14, 29.23, 18.23, 11.49, 11.30, 17.00, 9.16, 8.64, 27.56, 8.23, 19.77, 12.81. Find the average daily earnings of a worker.
2) The distribution of age at first marriage of 130 males was as given below.
Age in years (X): 18 19 20 21 22 23 24 25 26 27 28 29
No. of males (f): 2 1 4 8 10 12 17 19 18 14 13 12

Compute the average age of males at first marriage.


3) Calculate the mean for the following age distribution.

Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
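A minimal Python sketch of the direct method applied to the three parts of Example 3.2 follows. The function names are assumptions of this sketch, and for the grouped data the class mark is taken as the midpoint of the class limits.

```python
def mean(values):
    """Simple arithmetic mean of raw data."""
    return sum(values) / len(values)

def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# 1) Raw data: daily earnings of 15 workers
earnings = [11.63, 8.22, 12.56, 12.14, 29.23, 18.23, 11.49, 11.30,
            17.00, 9.16, 8.64, 27.56, 8.23, 19.77, 12.81]
mean_earnings = mean(earnings)

# 2) Discrete frequency distribution: age at first marriage
ages = list(range(18, 30))
males = [2, 1, 4, 8, 10, 12, 17, 19, 18, 14, 13, 12]
mean_age = frequency_mean(ages, males)

# 3) Grouped data: class marks are the midpoints of the class limits
class_limits = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
class_freqs = [35, 23, 15, 12, 9, 6]
class_marks = [(lo + hi) / 2 for lo, hi in class_limits]
mean_grouped = frequency_mean(class_marks, class_freqs)

print(round(mean_earnings, 2), round(mean_age, 2), round(mean_grouped, 2))
```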
b) Indirect Method
 Coding of data: - a linear transformation of data may be regarded as coding. In coding we
shift the origin and change the scale.
The effect of coding on the mean is given below.
1) If we subtract an arbitrary constant from each observation, the mean is also reduced by that constant.
2) If we divide each observation of a set by an arbitrary constant, the mean is divided by the same constant.
Note: In the case of addition or multiplication, the mean is increased by, or multiplied by, the constant instead.
The original data are transformed using an assumed mean (working mean) denoted by A. If $x_i$ denotes the original value, then

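$$d_i = x_i - A \;\Rightarrow\; x_i = d_i + A$$
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \text{and since } x_i = A + d_i, \quad \bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n} \quad \text{(Show!)}$$
• When the data are arranged or given in the form of a frequency distribution
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{\sum f_i (d_i + A)}{n} = A + \frac{\sum f_i d_i}{n}$$
• For grouped data
$$d_i = \frac{x_i - A}{w} \;\Rightarrow\; x_i = A + w d_i \quad \text{(Show!)}$$
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{\sum f_i (A + w d_i)}{n} = A + w\,\frac{\sum f_i d_i}{n}$$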
Example 3.3: 1) Suppose the deviations of the observations from an assumed mean of 7 are 1, -1, -2, -2, 0, -3, -2, 2, 0, -3.
a) Find the true mean.
b) Find the original observations.
2) Find the mean of the marks obtained by 51 students, using A = 48.5 and w = 10:
xi    28.5   38.5   48.5   58.5   68.5
fi    4      12     15     13     7
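The indirect (coding) method can be checked numerically. The sketch below works through both parts of Example 3.3; the function name `coded_mean` is an assumption of this sketch.

```python
# Part 1: deviations from the assumed mean A = 7
A = 7
deviations = [1, -1, -2, -2, 0, -3, -2, 2, 0, -3]
true_mean = A + sum(deviations) / len(deviations)
observations = [A + d for d in deviations]  # x_i = A + d_i

def coded_mean(x, f, assumed_mean, width):
    """Mean via d_i = (x_i - A) / w, then A + w * sum(f*d) / n."""
    d = [(xi - assumed_mean) / width for xi in x]
    n = sum(f)
    return assumed_mean + width * sum(fi * di for fi, di in zip(f, d)) / n

# Part 2: marks of 51 students with A = 48.5 and w = 10
marks = [28.5, 38.5, 48.5, 58.5, 68.5]
freqs = [4, 12, 15, 13, 7]
marks_mean = coded_mean(marks, freqs, 48.5, 10)

print(true_mean, observations, round(marks_mean, 2))
```

The coded result agrees with the direct computation of sum(f·x)/n, which is the point of the method: the coding changes the arithmetic, not the answer.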
Special properties of Arithmetic mean
1. The sum of the deviations of a set of items from their mean is always zero, i.e.
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
$$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2, \quad A \neq \bar{X}$$
3. If $\bar{X}_1$ is the mean of $n_1$ observations, $\bar{X}_2$ is the mean of $n_2$ observations, …, and $\bar{X}_k$ is the mean of $n_k$ observations, then the mean of all the observations in all groups, often called the combined mean, is given by:
$$\bar{X}_c = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}$$
4. If a wrong figure has been used when calculating the mean, the correct mean can be obtained without repeating the whole process using:
$$\text{Corrected mean} = \text{Wrong mean} + \frac{\text{Correct value} - \text{Wrong value}}{n}$$
where n is the total number of observations.
5. The effect of transforming the original series on the mean:
a) If a constant k is added to (or subtracted from) every observation, then the new mean will be the old mean plus (or minus) k, i.e. $\bar{X}_{new} = \bar{X}_{old} \pm k$.
b) If every observation is multiplied by a constant k, then the new mean will be k times the old mean, i.e. $\bar{X}_{new} = k\,\bar{X}_{old}$.
Example 3.4:

1) In a class there are 30 females and 70 males. If the females averaged 60 in an examination and the males averaged 72, find the mean for the entire class.
2) The average weight of 10 students was calculated to be 65 kg. Later it was discovered that one weight was misread as 40 kg instead of 80 kg. Calculate the correct average weight.
3) The mean of n Tetracycline capsules X1, X2, …, Xn is known to be 12 gm. A new set of capsules of another drug is obtained by the linear transformation Yi = 2Xi – 0.5 (i = 1, 2, …, n). What will be the mean of the new set of capsules?
4) The mean of a set of numbers is 500.
a. If 10 is added to each of the numbers in the set, what will be the mean of the new set?
b. If each of the numbers in the set is multiplied by -5, what will be the mean of the new set?
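The special properties of the arithmetic mean can be applied to Example 3.4 as in the following sketch. The function names are hypothetical; the formulas are the combined-mean, corrected-mean and transformation properties stated above.

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Adjust a mean after one misread observation is corrected."""
    return wrong_mean + (correct_value - wrong_value) / n

class_mean = combined_mean([30, 70], [60, 72])   # question 1
weight_mean = corrected_mean(65, 10, 40, 80)     # question 2
capsule_mean = 2 * 12 - 0.5                      # question 3: mean of Y = 2X - 0.5
shifted_mean = 500 + 10                          # question 4a
scaled_mean = 500 * -5                           # question 4b

print(class_mean, weight_mean, capsule_mean, shifted_mean, scaled_mean)
```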
2) Weighted Mean
- When the data items are to be given different degrees of importance, a weighted mean is appropriate.
- Weights are assigned to each item in proportion to its relative importance.
- Let X1, X2, …, Xn be the values of the items of a series and W1, W2, …, Wn their corresponding weights; then the weighted mean, denoted $\bar{X}_w$, is defined as:
$$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$$

Example 3.5:
1. A student obtained the following marks in his examinations: English 60, Biology 75, Physics 59 and Chemistry 55, together with a mark of 63 in one further subject. Find the student's weighted mean if the marks 60, 75, 63, 59 and 55 are allotted the weights 1, 2, 1, 3 and 3 respectively.
Solution:
$$\bar{X}_w = \frac{\sum X_i W_i}{\sum W_i} = \frac{60(1) + 75(2) + 63(1) + 59(3) + 55(3)}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$$
2. A teacher allots weights of 2 to homework, 3 to the mid-exam and 5 to the final exam. If a student scores 90, 50 and 60 for HW, MID and FIN respectively, what is his/her academic performance?
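A short Python sketch of the weighted mean, applied to both parts of Example 3.5. The marks 60, 75, 63, 59, 55 are the ones that appear in the worked solution; the function name is an assumption of this sketch.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(x*w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Part 1: marks and weights as used in the worked solution
student_mean = weighted_mean([60, 75, 63, 59, 55], [1, 2, 1, 3, 3])

# Part 2: homework, mid-exam and final exam with weights 2, 3 and 5
performance = weighted_mean([90, 50, 60], [2, 3, 5])

print(student_mean, performance)  # 61.5 63.0
```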
Merits and Demerits of Arithmetic Mean
Merits:
• It is rigidly defined.
• It is based on all observation.
• It is suitable for further mathematical treatment.
• It is stable average, i.e. it is not affected by fluctuations of sampling to some extent.
• It is easy to calculate and simple to understand.
Demerits:
• It is affected by extreme observations.
• It cannot be used in the case of open end classes.
• It cannot be determined by the method of inspection.
• It cannot be used when dealing with qualitative characteristics, such as intelligence, honesty, beauty.
• It can be a number which does not exist in the series.
• Sometimes it leads to wrong conclusion if the details of the data from which it is obtained are not
available.
• It gives high weight to high extreme values and less weight to low extreme values.
3.5.2. Geometric Mean (G.M)

There are particular types of data for which the geometric mean is important because it gives a good mean value. If the variate values are measured as ratios, proportions or percentages, the geometric mean gives a better measure of central tendency than the other means.
The G.M of N variate values is the Nth root of their product. Like the arithmetic mean, it depends on all the observations. It is affected by extreme values, but not to the same extent as the arithmetic mean. However, it has one great drawback: it cannot be calculated if any one or more of the values are zero or negative.
Suppose X1, X2, …, XN are N variate values; then the G.M is given by
$$G = \sqrt[N]{X_1 X_2 \cdots X_N}$$
In case X1, X2, …, Xk have the corresponding frequencies f1, f2, …, fk, then
$$G = \sqrt[N]{X_1^{f_1} X_2^{f_2} \cdots X_k^{f_k}}, \quad \text{where } N = \sum_{i=1}^{k} f_i$$
In the case of grouped data, the mid-values of the class intervals are taken as Xi.
In terms of logarithms, log G is the average of the log Xi values, and the formula for the geometric mean is
$$G = \text{antilog}\left(\frac{1}{N}\sum_{i=1}^{N} \log_{10} X_i\right), \quad i = 1, 2, \ldots, N$$
In the case of a frequency distribution where each Xi occurs fi times (i = 1, 2, …, k)
$$G = \text{antilog}\left(\frac{1}{N}\sum_{i=1}^{k} f_i \log_{10} X_i\right), \quad \text{where } N = \sum_{i=1}^{k} f_i$$
That is, we first compute the average of the logarithms and then take the antilog to obtain the G.M.
Note: The geometric mean is less affected by extreme values than is the arithmetic mean and is useful as
a measure of central tendency for some positively skewed distributions.
Example 1: Find the geometric mean of 2, 4 and 8.
Solution: $GM = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$
Example 2: The price of a commodity increased by 5% from 1995 to 1996, by 8% from 1997 to 1998 and by 77% from 1999 to 2000. What is the average yearly price increase?
Solution: Let Y0 be the price in the original year (1995).
$Y_1 = Y_0 + 0.05Y_0 = 1.05Y_0 \Rightarrow Y_1/Y_0 = 1.05$
$Y_2 = Y_1 + 0.08Y_1 = 1.08Y_1 \Rightarrow Y_2/Y_1 = 1.08$
$Y_3 = Y_2 + 0.77Y_2 = 1.77Y_2 \Rightarrow Y_3/Y_2 = 1.77$
Thus $GM = \sqrt[3]{1.05 \times 1.08 \times 1.77} = \sqrt[3]{2.00718} \approx 1.26$
Therefore the average price increase was about 26%.
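The two geometric-mean examples can be reproduced with the logarithmic form of the formula. This is a sketch using natural logarithms instead of base-10 logarithms (the base does not affect the result); the function name is an assumption.

```python
import math

def geometric_mean(values):
    """Geometric mean via logarithms; defined only for positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

gm_small = geometric_mean([2, 4, 8])               # Example 1
growth = geometric_mean([1.05, 1.08, 1.77])        # Example 2: yearly price ratios
avg_increase_percent = (growth - 1) * 100

print(round(gm_small, 4), round(growth, 4), round(avg_increase_percent, 1))
```

Working with the price ratios (1.05, 1.08, 1.77) rather than the percentages themselves is what makes the geometric mean the right tool here: the ratios multiply from year to year.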
3.5.3. Harmonic Mean (H.M)
H.M is the inverse of the arithmetic mean of the reciprocals of the observations of a set. It is a suitable measure of central tendency when the data pertain to speeds, rates and times.
Let X1, X2, …, XN be N variate values in a set; then the harmonic mean is
$$H = \frac{1}{\frac{1}{N}\sum_{i=1}^{N} \frac{1}{X_i}}, \quad i = 1, 2, \ldots, N$$
If the data are arranged in the form of a frequency distribution in which an observation Xi has frequency fi (i = 1, 2, …, k), the harmonic mean is given by
$$H = \frac{1}{\frac{1}{N}\sum_{i=1}^{k} \frac{f_i}{X_i}}, \quad \text{where } N = \sum_{i=1}^{k} f_i$$
- It fulfills almost all the properties of a good measure of central tendency, except that it cannot be calculated when any observation is zero. Its main advantage is that it gives more weight to small values and less weight to large values.
Example 3.6:
1) A man travels from A.A to Awasa by car and takes four hours to cover the whole distance. In the first hour he maintains a speed of 50 km/h, in the second hour 64 km/h, in the third 80 km/h and in the fourth hour he travels at a speed of 55 km/h. Find the average speed of the motorist.
2) The price of a commodity increased by 5%, 8% and 77% in three consecutive years. What is the average yearly price increase?
3) The arithmetic mean of two numbers is 13 and their geometric mean is 12. Find
a) The numbers
b) H.M
4) Prove the following theorem:
a. If x1 and x2 are two observed values, the geometric mean of their arithmetic mean and harmonic mean is equal to the geometric mean of the numbers x1 and x2.
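Questions 3 and 4 of Example 3.6 can be checked numerically. The sketch below recovers the two numbers from A.M = 13 and G.M = 12 (their sum is 26 and their product is 144, so they are the roots of t² − 26t + 144 = 0) and then verifies the theorem for that pair. It is a numerical check under these assumptions, not a proof.

```python
import math

def harmonic_mean(values):
    """Harmonic mean: N divided by the sum of reciprocals."""
    return len(values) / sum(1 / v for v in values)

# A.M = 13 gives x1 + x2 = 26; G.M = 12 gives x1 * x2 = 144
s, p = 26, 144
disc = math.sqrt(s * s - 4 * p)
x1, x2 = (s - disc) / 2, (s + disc) / 2

am = (x1 + x2) / 2
gm = math.sqrt(x1 * x2)
hm = harmonic_mean([x1, x2])

print(x1, x2, round(hm, 4))
# Theorem of question 4: sqrt(A.M * H.M) equals the G.M of x1 and x2
print(math.isclose(math.sqrt(am * hm), gm))
```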
3.6. The Mode
• The mode is the value which occurs most frequently in a set of values.
• The mode may not exist, and even if it does exist, it may not be unique.
• In the case of a discrete distribution, the value having the maximum frequency is the modal value.
• If, in a set of observed values, all values occur once or an equal number of times, there is no mode.
Examples:
1. Find the mode of 5, 3, 5, 8, and 9
Mode =5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
It is a bimodal Data: 8 and 9
3. Find the mode of 4, 12, 3, 6, and 7.
No mode for this data.
- The mode of a set of numbers X1, X2, …Xn is usually denoted by X̂ .
Mode for Grouped data.
If data are given in the form of a continuous frequency distribution, the mode is defined as:
$$\hat{X} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W, \quad \Delta_1 = f_{mo} - f_1, \quad \Delta_2 = f_{mo} - f_2$$
Where: $\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$f_{mo}$ = the frequency of the modal class
$f_1$ = the frequency of the class preceding the modal class
$f_2$ = the frequency of the class succeeding the modal class
W = the size of the modal class
Note: The modal class is the class with the highest frequency.
Example 3.7: The following is the distribution of the size of certain farms selected at random
from a district. Calculate the mode of the distribution.
Size of farms    No. of farms
5-15             8
15-25            12
25-35            17
35-45            29
45-55            31
55-65            5
65-75            3
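The grouped-data mode formula can be applied to the farm-size distribution of Example 3.7 as sketched below. The function name is an assumption, and a neighbouring frequency of zero is assumed when the modal class is the first or last class.

```python
def grouped_mode(boundaries, freqs):
    """Mode of a continuous frequency distribution.

    boundaries: list of (lower, upper) class boundaries
    freqs: class frequencies in the same order
    """
    m = freqs.index(max(freqs))                       # index of the modal class
    f_mo = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                 # preceding class
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0    # succeeding class
    lower, upper = boundaries[m]
    d1, d2 = f_mo - f1, f_mo - f2
    return lower + d1 / (d1 + d2) * (upper - lower)

farm_classes = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
farm_counts = [8, 12, 17, 29, 31, 5, 3]
farm_mode = grouped_mode(farm_classes, farm_counts)
print(round(farm_mode, 2))
```

Here the modal class is 45-55 (frequency 31), so Δ1 = 31 − 29 = 2 and Δ2 = 31 − 5 = 26, and the mode lies close to the lower boundary of the class.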
Merits and Demerits of Mode
Merits:
It is not affected by extreme observations.
Easy to calculate and simple to understand.
It can be calculated for distribution with open end class.
Can be used for qualitative data as well.
Demerits:
It is not rigidly defined.
It is not based on all observations
It is not suitable for further mathematical treatment.
It is not stable average, i.e. it is affected by fluctuations of sampling to some extent.
Often its value is not unique.
3.7. The Median
- In a distribution, the median is the value of the variable which divides it into two equal halves.
- In an ordered series of data, the median is the observation lying exactly in the middle of the series. It is the middle-most value in the sense that the number of values less than the median is equal to the number of values greater than it.
- If X1, X2, …, Xn are the observations, then the numbers arranged in ascending order will be X[1], X[2], …, X[n], where X[i] is the ith smallest value:

X[1] ≤ X[2] ≤ … ≤ X[n]

- The median is denoted by $\tilde{X}$.
Median for ungrouped data.
$$\tilde{X} = \begin{cases} X_{[(n+1)/2]}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(X_{[n/2]} + X_{[n/2+1]}\right), & \text{if } n \text{ is even} \end{cases}$$

Example 3.8: 1) Actual waiting time for the first job on the selected sample of nine people
having different field of specializations was given below.
Waiting time ( in month ):11.6, 11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8
Calculate the median of the waiting time?
2) The export of agricultural products in million dollars from a country during eight quarters in
1974 and 1975 was, 29.7, 16.6, 2.3, 14.1, 36.6, 18.7, 3.5, 21.3.
Find the median of the given set of values?
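The ungrouped median rule (middle value for odd n, average of the two middle values for even n) can be sketched in Python and applied to both data sets of Example 3.8. The function name is an assumption of this sketch.

```python
def median(values):
    """Median of raw data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # X[(n+1)/2] in 1-based terms
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of X[n/2] and X[n/2 + 1]

waiting_time = [11.6, 11.3, 10.7, 18.0, 3.3, 9.2, 8.3, 3.8, 6.8]   # n = 9 (odd)
exports = [29.7, 16.6, 2.3, 14.1, 36.6, 18.7, 3.5, 21.3]           # n = 8 (even)

median_waiting = median(waiting_time)
median_exports = median(exports)
print(median_waiting, round(median_exports, 2))
```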
Median for grouped data.
- If data are given in the form of a continuous frequency distribution, the median is defined as:
$$\tilde{X} = L_{med} + \frac{W}{f_{med}}\left(\frac{n}{2} - f_c\right)$$
Where: $L_{med}$ = the lower class boundary of the median class
$f_{med}$ = the frequency of the median class
$f_c$ = the cumulative frequency (less than type) of the class preceding the median class
W = the size of the median class
n = the total number of observations
Note: The median class is the class with the smallest cumulative frequency (less than type) greater than or equal to n/2.
Example 3.9: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
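The grouped-data median formula can be applied to the distribution of Example 3.9 as sketched below. The sketch assumes integer class limits separated by a gap of 1, so that class boundaries are obtained by subtracting 0.5 from each lower limit; the function name is hypothetical.

```python
def grouped_median(limits, freqs):
    """Median of a grouped distribution given integer class limits."""
    n = sum(freqs)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for (lo, hi), f in zip(limits, freqs):
        if cumulative + f >= n / 2:   # first class whose cumulative frequency reaches n/2
            lower_boundary = lo - 0.5
            width = hi - lo + 1
            return lower_boundary + width / f * (n / 2 - cumulative)
        cumulative += f

class_limits = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
class_freqs = [7, 10, 22, 15, 12, 6, 3]
dist_median = grouped_median(class_limits, class_freqs)
print(round(dist_median, 2))
```

With n = 75, n/2 = 37.5 and the cumulative frequencies 7, 17, 39, the median class is 50-54 with lower boundary 49.5.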

Merits and Demerits of Median


Merits:
• Median is a positional average and hence not influenced by extreme observations.
• Can be calculated in the case of open end intervals.
• Median can be located even if the data are incomplete.
Demerits:
• It is not a good representative of data if the number of items is small.
• It is not amenable to further algebraic treatment.
• It is susceptible to sampling fluctuations.
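Empirical relationship between the mean ($\bar{X}$), the mode ($\hat{X}$) and the median ($\tilde{X}$):
• $\bar{X} = \hat{X} = \tilde{X}$, for a symmetrical distribution
• $\bar{X} - \hat{X} = 3(\bar{X} - \tilde{X})$, for a unimodal, moderately skewed (asymmetrical) frequency distribution.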

CHAPTER FOUR
4. Measures of Variation (Dispersion)
4.1 Introduction
Consider the following two sets of scores:
Set 1: 40, 50, 60, 60, 40, 50
Set 2: 0, 100, 25, 75, 80, 20
Both these sets have the same mean (50), but the second set is a lot more widely dispersed
("scattered") than the first.

[Figure: Set 1 and Set 2]
A measure of central tendency alone does not adequately describe a set of observations unless all the observations are the same. So we need some additional information, such as:
1) The extent to which the items in a particular distribution are scattered around the central tendency, i.e. a measure of dispersion.
2) The direction of the scatter, i.e. whether more items are attracted towards higher or lower values: a measure of skewness.
3) The extent to which the distribution is more peaked or more flat-topped than the normal distribution, i.e. a measure of kurtosis.
Definition:
• The scatter or spread of the items of a distribution is known as dispersion or variation. In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.
• Measures of dispersion are statistical measures which provide ways of measuring the extent to which data are dispersed or spread out.
A good measure of variation should possess the following properties:
• It should be easy to compute and understand.
• It should be based on all observations.
• It should be uniquely defined.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by extreme values.
Absolute and relative measures
Measures of dispersion may be either absolute or relative
1. Absolute measures of dispersion (AMD): An absolute measure is expressed in the same unit in which the original data are given, such as kilograms, tonnes, etc. These measures are suitable for comparing the variability in two distributions having variables expressed in the same units and of the same average size. They are not suitable for comparing the variability in two distributions having variables expressed in different units.
2. Relative measures of dispersion (RMD): When one desires to compare the dispersion in two sets of data, comparing the two AMDs may lead to fallacious results. It may be that the two variables involved are measured in different units. For example, we may wish to know, for a certain population, whether serum cholesterol levels, measured in milligrams per 100 ml, are more variable than body weight, measured in kilograms.
Furthermore, although the same unit of measurement is used, the two measures of central tendency (means) may be quite different. If we compare the AMD of the weights of first grade children with the AMD of the weights of high school freshmen, we may find that the latter AMD is numerically larger than the former because the weights themselves are larger, not because the variability is greater.
What is needed in situations like these is a measure of relative variation rather than absolute variation. A relative measure is the ratio of an absolute measure of dispersion to an appropriate average, such as the coefficient of standard deviation or the coefficient of mean deviation.
4.2 Types of Measures of Dispersion
Various measures of dispersions are in use. The most commonly used measures of dispersions
are:
Absolute measures        Relative measures
Range                    Relative range
Quartile deviation       Coefficient of quartile deviation
Mean deviation           Coefficient of mean deviation
Variance                 Coefficient of variation
Standard deviation       Standard scores
4.2.1 The Range (R)
The range is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the range
of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture
of the scores. The following two distributions have the same range, 13, yet appear to differ
greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
For ungrouped data:
$R = x_{max} - x_{min}$, where $x_{max}$ = the maximum value and $x_{min}$ = the minimum value
For grouped data:
$R = x_{max} - x_{min}$, where $x_{max}$ = the UCB of the last class and $x_{min}$ = the LCB of the first class
Relative Range (RR)
It is also sometimes called the coefficient of range and is given by:
For ungrouped data: $RR = \dfrac{X_{max} - X_{min}}{X_{max} + X_{min}}$
For grouped data: $RR = \dfrac{UCB_{last} - LCB_{first}}{UCB_{last} + LCB_{first}}$
Merits and Demerits of range
Merits:
It is rigidly defined.
It is easy to calculate and simple to understand.
Demerits:
It is not based on all observation.
It is highly affected by extreme observations.
It is affected by fluctuation in sampling.
It is not liable to further algebraic treatment.
It cannot be computed in the case of open end distribution.
It is very sensitive to the size of the sample.
Example 1:
For the raw data 5, 6, 8, 4, 5, 3, 9, 8, 7, 3, 5, 6, 8, 11:
R = 11 - 3 = 8
Coefficient of range = (11 - 3)/(11 + 3) = 8/14 = 0.57
Example 2:
Height (in)        Number of students
Less than 59.5     0
Less than 62.5     5
Less than 65.5     23
Less than 68.5     65
Less than 71.5     92
Less than 74.5     100
R = 74.5 - 56.5 = 18
Coefficient of range = (74.5 - 56.5)/(74.5 + 56.5) = 18/131 = 0.137
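The range and the coefficient of range for the raw data of Example 1 can be computed with a short Python sketch; the function names are assumptions of this sketch.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative range: (max - min) / (max + min)."""
    return (max(values) - min(values)) / (max(values) + min(values))

raw = [5, 6, 8, 4, 5, 3, 9, 8, 7, 3, 5, 6, 8, 11]
r = value_range(raw)
rr = coefficient_of_range(raw)
print(r, round(rr, 2))  # 8 0.57

# Grouped case, using the extreme boundaries quoted in Example 2
rr_grouped = (74.5 - 56.5) / (74.5 + 56.5)
print(round(rr_grouped, 3))
```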
Example 4.1: 1) Find R and RR, and then identify which data set is more dispersed.
a) For the monthly income of 10 workers Xi: 347, 420, 500, 600, 696, 710, 835, 850, and 900.
b) For the following age distribution.
Class frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
2. If the range and relative range of a series are 4 and 0.25 respectively. Then what is the value
of:
a) Smallest observation b) Largest observation
The Mean Deviation:
The mean deviation or the average deviation is defined as the mean of the absolute deviations of the observations from some suitable average, which may be the arithmetic mean, the median or the mode. The difference $(X - \text{average})$ is called a deviation, and when we ignore the negative sign, this deviation is written as $|X - \text{average}|$ and is read as mod deviation. The mean of these mod or absolute deviations is called the mean deviation or the mean absolute deviation.
Thus for sample data in which the suitable average is the mean $\bar{X}$, the mean deviation (M.D) is given by the relation:
$$M.D = \frac{\sum |X - \bar{X}|}{n}$$
For a frequency distribution, the mean deviation is given by
$$M.D = \frac{\sum f|X - \bar{X}|}{\sum f}$$
When the mean deviation is calculated about the median, the formula becomes
$$M.D = \frac{\sum |X - \tilde{X}|}{n}$$
The mean deviation about the mode is
$$M.D = \frac{\sum |X - \hat{X}|}{n}$$
For population data, the mean deviation about the population mean is
$$M.D = \frac{\sum |X - \mu|}{N}$$
The mean deviation is a better measure of absolute dispersion than the range and the quartile deviation. A drawback of the mean deviation is that it uses the absolute deviations $|X - \text{average}|$, which does not seem logical. The reason for using them is that $\sum (X - \bar{X})$ is always equal to zero. Even if we use the median or the mode in place of $\bar{X}$, the summation $\sum (X - \tilde{X})$ or $\sum (X - \hat{X})$ will be zero or approximately zero, with the result that the mean deviation would always be either zero or close to zero. Thus the mean deviation can only be defined on the absolute deviations.
The mean deviation is based on all the observations, a property which is not possessed by the range and the quartile deviation. The formula of the mean deviation gives a mathematical impression that it is a better way of measuring the variation in the data. Any suitable average among the mean, median or mode can be used in its calculation, but the value of the mean deviation is minimum if the deviations are taken from the median. A serious drawback of the mean deviation is that it cannot be used in statistical inference.
Coefficient of the Mean Deviation:
A relative measure of dispersion based on the mean deviation is called the coefficient of the mean deviation or the coefficient of dispersion. It is defined as the ratio of the mean deviation to the average used in the calculation of the mean deviation. Thus
$$\text{Coefficient of M.D} = \frac{M.D}{\text{average used (mean, median or mode)}}$$

Example:
Calculate the mean deviation from (1) the arithmetic mean, (2) the median and (3) the mode for the marks obtained by nine students given below, and show that the mean deviation from the median is minimum.
Marks (out of 25): 7, 4, 10, 9, 15, 12, 7, 9, 7
Solution:
After arranging the observations in ascending order, we get
Marks: 4, 7, 7, 7, 9, 9, 10, 12, 15
Mean = 80/9 ≈ 8.89
Median = 9 (the fifth of the nine ordered values)
Mode = 7 (since 7 is repeated the maximum number of times)
Summing the absolute deviations of the marks from each average gives:
M.D from the mean = 21.11/9 ≈ 2.35
M.D from the median = 21/9 ≈ 2.33
M.D from the mode = 23/9 ≈ 2.56
From the above calculations, it is clear that the mean deviation from the median has the least value.
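The three mean deviations for the marks of the nine students can be checked with Python's standard `statistics` module. The helper `mean_deviation` is a hypothetical name introduced for this sketch.

```python
from statistics import mean, median, mode

marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]

def mean_deviation(values, center):
    """Mean of the absolute deviations of the values from a chosen average."""
    return sum(abs(x - center) for x in values) / len(values)

md_mean = mean_deviation(marks, mean(marks))
md_median = mean_deviation(marks, median(marks))
md_mode = mean_deviation(marks, mode(marks))

print(round(md_mean, 2), round(md_median, 2), round(md_mode, 2))
# The deviation taken from the median is the smallest of the three
print(md_median <= md_mean and md_median <= md_mode)
```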
Example:
Calculate the mean deviation from mean and its coefficients from the following data.
[Table: Size of Items; Frequency]

Solution:
The necessary calculation is given below:
[Table: calculation by Size of Items, with Total row]

4.2.2 The Variance


Population Variance
If we divide the sum of the squared deviations from the population mean by the number of values in the population, we get the population variance. This variance is the "average squared deviation from the mean".
$$\text{Population variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \quad i = 1, 2, 3, \ldots, N$$
Sample Variance
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of statistics is to estimate
the corresponding parameter. This formula has the problem that the estimated value isn't the
same as the parameter. To counteract this, the sum of the squares of the deviations is divided by
one less than the sample size.

$$\text{Sample Variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
i.e. the sample variance, denoted by $s^2$, of a set of n observed values having a mean $\bar{x}$ is the
sum of the squared deviations divided by $n - 1$.
The following steps are used to calculate the sample variance:
1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data is a sample, divide the number (from step 4 above) by the number of
observations minus one, i.e., n-1 (where n is equal to the number of observations in the data
set).
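The five steps can be followed literally in code. The sketch below is a minimal illustration, not a library routine: the function names are assumptions, and the data are the sample 5, 17, 12, 10, 8 used in the examples of this section. It also evaluates the short cut formula $s^2 = (\sum x_i^2 - n\bar{x}^2)/(n-1)$ to show that both routes agree.

```python
import math

def sample_variance(data):
    """Sample variance computed by the five steps listed above."""
    n = len(data)
    x_bar = sum(data) / n                     # step 1: arithmetic mean
    deviations = [x - x_bar for x in data]    # step 2: differences from the mean
    squared = [d * d for d in deviations]     # step 3: square the differences
    total = sum(squared)                      # step 4: sum the squared differences
    return total / (n - 1)                    # step 5: divide by n - 1

def sample_variance_shortcut(data):
    """Same quantity from the short cut formula (sum of x^2 minus n * mean^2)."""
    n = len(data)
    x_bar = sum(data) / n
    return (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)

data = [5, 17, 12, 10, 8]
s2 = sample_variance(data)
s2_shortcut = sample_variance_shortcut(data)
s = math.sqrt(s2)   # sample standard deviation, back in the original units

print(round(s2, 2), round(s2_shortcut, 2), round(s, 2))
```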
For the case of frequency distribution it is expressed as:
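$$s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$$
where $x_i$ is the value (or class mark) of the i-th class, $f_i$ is its frequency, k is the number of
classes and $n = \sum f_i$.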
We usually use the following short cut formula.
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}, \quad \text{for raw data}$$
$$s^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2}{n - 1}, \quad \text{for a frequency distribution, where } \sum f_i = n$$
Standard Deviation

There is a problem with variances. Recall that the deviations were squared. That means that the
units were also squared. To get the units back the same as the original data values, the square
root must be taken.
$$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2}$$
$$\text{Sample standard deviation} = s = \sqrt{s^2}$$
Examples: Find the variance and standard deviation of the following sample data
1. 5, 17, 12, 10,8
2. The data is given in the form of frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
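For the grouped data in example 2, one possible way to organise the calculation is sketched below. It assumes, as is usual for grouped data, that every observation in a class is represented by the class midpoint; the variable names are illustrative only.

```python
import math

# (lower limit, upper limit, frequency) for each class of example 2
classes = [(40, 44, 7), (45, 49, 10), (50, 54, 22), (55, 59, 15),
           (60, 64, 12), (65, 69, 6), (70, 74, 3)]

# Assumption: each class is represented by its midpoint (class mark)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                                   # total frequency
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
grouped_var = sum(f * (x - grouped_mean) ** 2
                  for f, x in zip(freqs, midpoints)) / (n - 1)   # divide by n - 1
grouped_sd = math.sqrt(grouped_var)

print(n, grouped_mean, round(grouped_var, 2), round(grouped_sd, 2))
```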
Coefficient of Variation (C.V)
• Is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
• The distribution having less C.V is said to be less variable or more consistent.
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to
the same industry gives the following results
Value Firm A Firm B
Mean wage 52.5 47.5
Median wage 50.5 45.5
Variance 100 121
In which firm A or B is there greater variability in individual wages?
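Solutions: Calculate the coefficient of variation for both firms. The standard deviations are
$S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$.
$$C.V_A = \frac{S_A}{\bar{X}_A} \times 100\% = \frac{10}{52.5} \times 100\% = 19.05\%$$
$$C.V_B = \frac{S_B}{\bar{X}_B} \times 100\% = \frac{11}{47.5} \times 100\% = 23.16\%$$
Since $C.V_A < C.V_B$, there is greater variability in individual wages in firm B.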
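The same comparison can be written as a small sketch in Python. The function name `coefficient_of_variation` is an assumption made for illustration; the inputs are the means and variances from the table for firms A and B.

```python
import math

def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

# The table gives variances, so take square roots to get standard deviations
cv_a = coefficient_of_variation(math.sqrt(100), 52.5)
cv_b = coefficient_of_variation(math.sqrt(121), 47.5)

more_variable = "B" if cv_b > cv_a else "A"
print(round(cv_a, 2), round(cv_b, 2), more_variable)
```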
2. A meteorologist interested in the consistency of temperatures in three cities during a given
week collected the following data. The temperatures for the five days of the week in the three
cities were
City 1    25  24  23  26  17
City 2    22  21  24  22  20
City 3    32  27  35  24  28
Which city has the most consistent temperature, based on these data?
(Exercise)
4.2.3 Standard Scores (Z-scores)
If X is a measurement from a distribution with mean $\bar{X}$ and standard deviation S, then its
value in standard units is
$$Z_i = \frac{X_i - \mu}{\sigma}, \quad \text{for the population}$$
$$Z_i = \frac{X_i - \bar{x}}{s}, \quad \text{for the sample}$$
Z gives the deviation from the mean in units of standard deviation.
Z gives the number of standard deviations a particular observation lies above or below the
mean.
It is used to compare two observations coming from different groups.
Examples:
1. Two sections were given introduction to statistics examinations. The following information
was given.
Value Section 1 Section 2
Mean 78 90
Stand. deviation 6 5

Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking
who performed better?
Solutions:
Calculate the standard score of both students.
$$Z_1 = \frac{X_1 - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2 \qquad Z_2 = \frac{X_2 - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$$
Therefore student A performed better relative to his section, because the score of student A is two
standard deviations above the mean score of his section, while the score of student B is only
one standard deviation above the mean score of his section.
2. Two groups of people were trained to perform a certain task and tested to find out which
group is faster to learn the task. For the two groups the following information was given:
Value Group one Group two
Mean 10.4 min 11.9 min
Stand.dev. 1.2 min 1.3 min
Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose a person A from group one takes 9.2 minutes while person B from group two
takes 9.3 minutes. Who was faster in performing the task? Why?
Solutions:
a) Use the coefficient of variation.
$$C.V_1 = \frac{S_1}{\bar{X}_1} \times 100\% = \frac{1.2}{10.4} \times 100\% = 11.54\%$$
$$C.V_2 = \frac{S_2}{\bar{X}_2} \times 100\% = \frac{1.3}{11.9} \times 100\% = 10.92\%$$
Since $C.V_2 < C.V_1$, group 2 is more consistent.
b) Calculate the standard scores of A and B.
$$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1$$
$$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2$$
Therefore person B is faster, because the time taken by person B is two standard deviations shorter
than the average time taken by group 2, while the time taken by person A is only one standard
deviation shorter than the average time taken by group 1.
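Both worked examples of this section reduce to the same one-line computation. The following sketch (the function name `z_score` is an assumption) recomputes the four standard scores.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Example 1: examination scores in two sections
z_student_a = z_score(90, 78, 6)
z_student_b = z_score(95, 90, 5)

# Example 2: completion times in minutes (a lower z means a faster person)
z_person_a = z_score(9.2, 10.4, 1.2)
z_person_b = z_score(9.3, 11.9, 1.3)

print(z_student_a, z_student_b, round(z_person_a, 2), round(z_person_b, 2))
```

For scores, the larger z indicates the better relative performance; for completion times, the more negative z indicates the faster person relative to the group.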

4.3 Skewness
Skewness is the degree of asymmetry or departure from symmetry of a distribution.
A skewed frequency distribution is one that is not symmetrical.
Skewness is concerned with the shape of the curve not size.
If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to
the right of the central maximum than to the left, the distribution is said to be skewed to the
right or said to have positive skewness. If it has a longer tail to the left of the central
maximum than to the right, it is said to be skewed to the left or said to have negative
skewness.
For a moderately skewed distribution, the following relation holds among the three
commonly used measures of central tendency.
$$\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$$
Measures of Skewness

-Denoted by $sk$ or $\alpha_3$
-There are various measures of skewness.
1. The Pearsonian coefficient of skewness
$$\alpha_3 = sk = \frac{\text{mean} - \text{mode}}{\text{std. dev}} = \frac{\bar{x} - \hat{x}}{s}$$

The shape of the curve is determined by the value of sk:

If sk > 0, then the distribution is positively skewed.
If sk = 0, then the distribution is symmetric.
If sk < 0, then the distribution is negatively skewed.
Remark:
In a positively skewed distribution, smaller observations are more frequent than larger
observations. i.e. the majority of the observations have a value below an average.
In a negatively skewed distribution, smaller observations are less frequent than larger
observations. i.e. the majority of the observations have a value above an average.
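As a small illustration, the Pearsonian coefficient and the sign rule can be coded as follows. The function names are assumptions, and the figures used (mean 32, mode 30.5, standard deviation 10) are those of the first example in this section.

```python
def pearson_skewness(mean, mode, std_dev):
    """Pearsonian coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev

def describe_skewness(sk):
    """Translate the sign of sk into the shape of the distribution."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"

sk = pearson_skewness(32, 30.5, 10)
shape = describe_skewness(sk)
print(sk, shape)
```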

Examples:
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are 32,
30.5 and 10 respectively. What is the shape of the curve representing the distribution?
2. Some characteristics of the annual family income distribution (in Birr) in two regions are as
follows:
Region   Mean   Median   Standard Deviation
A        6250   5100     960
B        6980   5500     940
a) Calculate the coefficient of skewness for each region.
b) For which region is the income distribution more skewed? Give your interpretation for
this region.
c) For which region is the income more consistent?
Solutions: (exercise)
3. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5. If
the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and the
probable mode of the distribution. (exercise)
4. The sum of fifteen observations, whose mode is 8, was found to be 150 with coefficient
of variation of 20%
a. Calculate the pearsonian coefficient of skewness and give appropriate conclusion.
b. Are smaller values more or less frequent than bigger values for this distribution?
c. If a constant k was added on each observation, what will be the new pearsonian
coefficient of skewness? Show your steps. What do you conclude from this?

4.4 Kurtosis
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal
distribution. A distribution having a relatively high peak is called leptokurtic. If a curve
representing a distribution is flat topped, it is called platykurtic. The normal distribution, which is
neither very high peaked nor flat topped, is called mesokurtic.
Measures of kurtosis
The moment coefficient of kurtosis:
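• Denoted by $\alpha_4$ and given by
$$\alpha_4 = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$$
where $M_4$ is the fourth moment about the mean,
$M_2$ is the second moment about the mean (the variance),
$M_3$ is the third moment about the mean (used in the moment coefficient of skewness), and
$\sigma$ is the population standard deviation.
The peakedness depends on the value of $\alpha_4$.
If $\alpha_4 > 3$ then the curve is leptokurtic.
If $\alpha_4 = 3$ then the curve is mesokurtic.
If $\alpha_4 < 3$ then the curve is platykurtic.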
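The classification rule can be sketched in Python as below. The function names are assumptions; the moment coefficient of skewness is taken as $M_3 / M_2^{3/2}$, and the central moments $M_2 = 16$, $M_3 = -60$, $M_4 = 162$ are those of the worked example in this section.

```python
def moment_skewness(m2, m3):
    """Moment coefficient of skewness: M3 / M2^(3/2)."""
    return m3 / m2 ** 1.5

def moment_kurtosis(m2, m4):
    """Moment coefficient of kurtosis: M4 / M2^2."""
    return m4 / m2 ** 2

def kurtosis_shape(alpha4):
    """Compare alpha4 with 3, the value for the normal distribution."""
    if alpha4 > 3:
        return "leptokurtic"
    if alpha4 < 3:
        return "platykurtic"
    return "mesokurtic"

m2, m3, m4 = 16, -60, 162
alpha3 = moment_skewness(m2, m3)
alpha4 = moment_kurtosis(m2, m4)
print(round(alpha3, 2), round(alpha4, 2), kurtosis_shape(alpha4))
```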

Examples:
1. If the first four central moments of a distribution are:
M1=0, M2=16, M3=-60, M4=162
a) Compute a measure of skewness
b) Compute a measure of kurtosis and give your interpretation.

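Solutions:
a) $$Sk = \frac{M_3}{M_2^{3/2}} = \frac{-60}{16^{3/2}} = \frac{-60}{64} = -0.94 < 0$$
Therefore the distribution is negatively skewed.
b) $$\alpha_4 = \frac{M_4}{M_2^2} = \frac{162}{16^2} = \frac{162}{256} = 0.63 < 3$$
Therefore the curve is platykurtic.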

5. Elementary probability
5.1. Introduction
• Probability theory is the foundation upon which the logic of inference is built.
• It helps us to cope with uncertainty.
• In general, probability is the chance of an outcome of an experiment. It is the measure of
how likely an outcome is to occur.
5.2. Definitions of some probability terms
1. Experiment: Any process of observation or measurement, or any process which
generates well-defined outcomes.
2. Probability Experiment (Random Experiment): It is an experiment that can be
repeated any number of times under similar conditions and for which it is possible to
enumerate all the possible outcomes without being able to predict an individual outcome.
Example: If a fair coin is tossed three times, it is possible to enumerate all eight
possible sequences of head (H) and tail (T). But it is not possible to predict which
sequence will occur on any occasion.
3. Outcome: The result of a single trial of a random experiment
4. Sample Space(S): Set of all possible outcomes of a probability experiment.
Example 1: The sample space for three tosses of a coin is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Example 2: Recording the gender of children of two-child families.
S = {bb, bg, gb, gg}. An event B may be: B = “children of both genders.” Then B = {bg, gb}.
Sample space can be
Countable (finite or infinite)
Uncountable
5. Event (Sample Point): It is a subset of the sample space. It is a statement about one or
more outcomes of a random experiment. Events are denoted by capital letters A, B, C, ...
For example, the event that there are exactly two heads in three tosses of a coin
consists of the three points HTH, HHT and THH.
Remark: If S (the sample space) has n members, then there are exactly $2^n$ subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an Event: the complement of an event A means the non-occurrence of A
and is denoted by $A'$, $A^c$ or $\bar{A}$. It contains those points of the sample space which
don't belong to A.
8. Elementary (simple) Event: an event having only a single element or sample point.
9. Mutually Exclusive (Disjoint) Events: Two events which cannot happen at the same
time.
10. Independent Events: Two events are said to be independent if the occurrence of one
does not affect the probability of the other occurring.
11. Dependent Events: Two events are dependent if the first event affects the outcome or
occurrence of the second event in a way the probability is changed.
5.3 Counting Techniques
The number of outcomes of the random experiment or number of cases favorable to an event can be
determined by using mathematical methods (multiplication rule, addition rule, permutation and
combinations) without direct enumeration.

Addition rule
If there are k procedures and the i-th procedure may be performed in $n_i$ ways, $i = 1, 2, \ldots, k$,
then the number of ways in which we may perform procedure 1 or procedure 2 or … procedure k is
given by
$$n_1 + n_2 + \cdots + n_k$$
assuming that no two procedures may be performed together.

Example:
1. Suppose that we are planning a trip and are deciding between bus or train transportation. If there
are 3 bus routes and 2 train routes, how many different routes are available for the trip?
Solution: There are 3 bus routes and 2 train routes. Thus there are 3 + 2 = 5 routes available for the trip.

Multiplication rule

In a sequence of n events in which the first one has $k_1$ possibilities, the second one has $k_2$, the
third one has $k_3$, and so on, the total number of possibilities of the sequence will be
$$k_1 \times k_2 \times k_3 \times \cdots \times k_n$$

Example:

1. An instructor gives a six question multiple choice examinations. There are four possible
responses for each question. How many answer keys can be made?
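2. A product is assembled in three stages. At the first stage there are five assembly lines, at the
second stage there are 6 assembly lines and at the third stage there are 10 assembly
lines. In how many different ways may the product be routed through the assembly process?
Solution:
1. Each of the six questions can be answered in 4 ways, so $4 \times 4 \times 4 \times 4 \times 4 \times 4 = 4^6 = 4096$
answer keys can be made.
2. In total, the product can be routed through the assembly process in $5 \times 6 \times 10 = 300$ ways.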

Permutations

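A permutation is an arrangement of n objects in a specific order. The number of arrangements of n
different objects taken all together is given by
$$n! = n(n-1)(n-2)\cdots 2 \cdot 1$$
The number of arrangements of n different objects taken r at a time is $_nP_r = \frac{n!}{(n-r)!}$.
The number of arrangements of n different objects in a circle is given by
$$(n-1)!$$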

Example:

1. Suppose that a photographer wants to arrange 4 people in a row for a photograph. In how
many different ways can the arrangement be done?
2. How many different 5-letter permutations can be formed from the letters in the word
DISCOVER?
Solution:
1. The number of arrangements of 4 people in a row is given by $4! = 4 \times 3 \times 2 \times 1 = 24$.
2. $$_8P_5 = \frac{8!}{(8-5)!} = 6720$$

Combination

A selection of distinct objects without regard to order is called a combination. The difference between a
permutation and a combination is that in a combination, the order or arrangement of the objects is not
important; by contrast, order is important in a permutation.

Example 1: How many different committees of 3 people can be chosen to work on a special project
from a group of 9 people?

$$_9C_3 = \frac{9!}{3!\,6!} = 84$$

Example 2: In a club there are 7 women and 5 men. A committee of 3 women and 2 men is to be chosen.
How many different possibilities are there?

Solution: $_7C_3 \times {}_5C_2 = 35 \times 10 = 350$
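The counting rules of this section map directly onto Python's standard `math` module (`math.factorial`, `math.perm` and `math.comb`, available from Python 3.8). The sketch below recomputes the worked examples; the variable names are illustrative only.

```python
import math

# Multiplication rule
answer_keys = 4 ** 6           # six questions, four possible responses each
assembly_routes = 5 * 6 * 10   # three stages with 5, 6 and 10 assembly lines

# Permutations: order matters
row_of_four = math.factorial(4)        # 4 people arranged in a row
five_letter_words = math.perm(8, 5)    # 5 letters taken from the 8 in DISCOVER

# Combinations: order does not matter
committees = math.comb(9, 3)                          # 3 people chosen from 9
club_committees = math.comb(7, 3) * math.comb(5, 2)   # 3 of 7 women and 2 of 5 men

print(answer_keys, assembly_routes, row_of_four,
      five_letter_words, committees, club_committees)
```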


Exercise

A committee of 5 people must be selected from 5 men and 8 women. By how many ways can the
selection be done if there are at least 3 women in the committee?

Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. With each event A we
associate a real number P(A), called the probability of A, which satisfies the following properties
called axioms of probability or postulates of probability.
a) $0 \le P(A) \le 1$
b) $P(S) = 1$
c) If A and B are mutually exclusive events, the probability that one or the other occurs
equals the sum of the two probabilities, i.e. $P(A \cup B) = P(A) + P(B)$
d) For any event A, $P(A) \ge 0$
e) $P(\varnothing) = 0$
f) For any events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
g) $P(A') = 1 - P(A)$

5.5. Conditional Probability and Independence

Conditional Events: If the occurrence of one event has an effect on the occurrence of the other event,
then the two events are conditional or dependent events.

Conditional Probability
Let A and B be two events such that P(A) > 0. Denote by P(B|A) the probability of B given that A has
occurred. Since A is known to have occurred, it becomes the new sample space replacing the original S.
From this we are led to the definition

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0, \quad \text{or} \quad P(A \cap B) = P(B \mid A)\,P(A)$$

Similarly,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0, \quad \text{or} \quad P(A \cap B) = P(A \mid B)\,P(B)$$

The above definition implies that the probability that both A and B occur is equal to the probability that A
occurs times the probability that B occurs given that A has occurred. We call P(B|A) the conditional
probability of B given A, i.e., the probability that B will occur given that A has occurred. It is easy to
show that conditional probability satisfies the axioms of probability.

Remark:
1) 2) and
3) $P(A' \mid B) = 1 - P(A \mid B)$
4) $P(B' \mid A) = 1 - P(B \mid A)$
5) if and are mutually exclusive
6) For three events

4) Generalization of multiplication theorem, for events we have

Examples

1. The probability that it is Friday and that a student is absent is 0.03. Since there are 5 school days in a
week, the probability that it is Friday is 0.2. What is the probability that a student is absent given
that today is Friday?

Solution:

$$P(\text{Absent} \mid \text{Friday}) = \frac{P(\text{Friday and Absent})}{P(\text{Friday})} = \frac{0.03}{0.2} = 0.15$$

2. A jar contains black and white marbles. Two marbles are chosen without replacement. The
probability of selecting a black marble and then a white marble is 0.34, and the probability of
selecting a black marble on the first draw is 0.47. What is the probability of selecting white marble
on the second draw, given that the first marble drawn was black?

Solution:

$$P(\text{White} \mid \text{Black}) = \frac{P(\text{Black and White})}{P(\text{Black})} = \frac{0.34}{0.47} = 0.72$$
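Both examples apply the definition $P(B \mid A) = P(A \cap B) / P(A)$ directly. A minimal sketch follows; the function name and the guard against a zero-probability condition are assumptions added for illustration.

```python
def conditional_probability(p_joint, p_condition):
    """P(B | A) = P(A and B) / P(A); only defined when P(A) > 0."""
    if p_condition <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition

# Example 1: absent given that it is Friday
p_absent_given_friday = conditional_probability(0.03, 0.2)

# Example 2: white on the second draw given black on the first
p_white_given_black = conditional_probability(0.34, 0.47)

print(round(p_absent_given_friday, 2), round(p_white_given_black, 2))
```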

CHAPTER SIX
Probability Distribution
6.1 Introduction
In chapter one, a variable was defined as a characteristic or attribute that takes different values
for different elements. In this chapter we define variables that relate to a random (probability)
experiment. Therefore, this chapter focuses on the definition and discussion of random variables,
the probability distributions of random variables and their types.
6.2 Definition of random variables and probability Distribution
Random variable: a numerical-valued function defined on the sample space. It is a function
that assigns a real number to each possible outcome of a random experiment. Generally,
random variables are denoted by capital letters and the values of random variables are denoted
by small letters.
Example: Consider an experiment of tossing a fair coin three times. Let the random variable X
be the number of heads in three tosses. Then X assumes only four values: 0, 1, 2, 3.
Random variables are of two types:
1. Discrete random variables are variables which can assume only a specific number of values.
These are variables that assign countable values like 0, 1, 2…
Examples:
• Number of children in a family.
• Number of car accidents per week.
• Number of defective items in a given company.
• Number of bacteria per two cubic centimeter of water.
2. Continuous random variables are variables that can assume all values between any two
given values. They are variables that take values in a certain defined interval.
Examples:
• Height of students at a certain college.
• Mark of a student.
• Lifetime of light bulbs.
• Length of time required to complete a given training.
Probability distribution: consists of the values a random variable can assume and the
corresponding probabilities of those values; equivalently, it is a function that assigns a probability
to each value of the random variable. Therefore, it is nothing but the pairing of the values of a
random variable with their corresponding probabilities.
A probability distribution can be discrete or continuous. This classification depends on whether the
possible values of the random variable are discrete or continuous.
A) Discrete probability distribution: a formula, a table, a graph or other device used to
specify all possible values of a discrete random variable (R.V), say X, along with their
respective probabilities. It is simply a pairing of the values of a discrete random variable with
their probabilities.
Example 1: Suppose we toss a coin three times. The sample space is
S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}, and suppose X is the number of heads.

A. Assign a value for the random variable.
B. Construct the probability distribution of X.
Solution:

A. Once the random variable X is defined as the number of heads, X = 0, 1, 2 or 3.
B.
Number of heads, X         0     1     2     3
Probability P(X = x_i)     1/8   3/8   3/8   1/8

We can check that $\sum P(X = x_i) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1$
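The distribution in the table can also be obtained by listing the sample space and counting heads. The sketch below assumes a fair coin, so that all eight sequences are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely sequences of H and T for three tosses
outcomes = list(product("HT", repeat=3))

# The random variable X maps each outcome to its number of heads
counts = Counter(seq.count("H") for seq in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
print("total:", sum(distribution.values()))
```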

Two requirements for a discrete probability distribution

1. The sum of the probabilities of all the events in the sample space must equal 1; that
is, $\sum P(X) = 1$
2. The probability of each event in the sample space must be between 0 and 1 inclusive;
that is, $0 \le P(X) \le 1$
B) Continuous probability distribution

Definition: It is a pairing of the values of a continuous random variable with their corresponding
probabilities.

The non-negative function f(x) of a continuous random variable X is said to be a valid
probability density function if it satisfies the following two requirements:

i) $f(x) \ge 0$, for all x

ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$


6.3 Expectation and Variance of Random Variables


1. Let a discrete random variable X assume the values $X_1, X_2, \ldots, X_n$ with the probabilities
$P(X_1), P(X_2), \ldots, P(X_n)$ respectively. Then the expected value of X, denoted by E(X), is
defined as:
$$E(X) = X_1 P(X_1) + X_2 P(X_2) + \cdots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i)$$

2. Let X be a continuous random variable assuming values in the interval (a, b) such that
$\int_a^b f(x)\,dx = 1$. Then
$$E(X) = \int_a^b x f(x)\,dx$$

If X is a given random variable, the expected value of X is its mean, i.e. Mean of X = E(X).
3. The variance of X is given by:
$$\text{Var}(X) = E(X^2) - (E(X))^2$$
where
$$E(X^2) = \begin{cases} \sum_{i=1}^{n} X_i^2 P(X_i), & \text{if X is discrete} \\ \int_x x^2 f(x)\,dx, & \text{if X is continuous} \end{cases}$$
Example: Find the expected value of the following random variable

X            0      1      2      3      4
P(X = x)     0.18   0.34   0.23   0.21   0.04

Here X takes the countable values 0, 1, 2, 3, 4.

Solution:
$$E(X) = \sum_{i=1}^{5} X_i P(X = X_i) = X_1 P(X = X_1) + X_2 P(X = X_2) + \cdots + X_5 P(X = X_5)$$
$$= 0(0.18) + 1(0.34) + 2(0.23) + 3(0.21) + 4(0.04) = 1.59$$

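If X is a random variable with mean μ, then the variance of X, denoted by Var(X), is defined by
$$\text{Var}(X) = E[(X - \mu)^2]$$

An alternative formula for Var(X) is derived as follows:
$$E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$$

That is,
$$\text{Var}(X) = E(X^2) - (E(X))^2$$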
Example:
What is the expected value and Variance of a random variable X obtained by tossing a coin three
times where X is the number of heads?
Solution: The probability distribution of this experiment becomes
Number of heads, X         0     1     2     3
Probability P(X = x_i)     1/8   3/8   3/8   1/8

$$E(X) = \sum_{i} X_i P(X = X_i) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 0 + 3/8 + 6/8 + 3/8 = 3/2 = 1.5$$

$$E(X^2) = \sum_{i} X_i^2 P(X = X_i) = 0(1/8) + 1(3/8) + 4(3/8) + 9(1/8) = 3$$

The variance of the random variable X is $\text{Var}(X) = E(X^2) - (E(X))^2 = 3 - (1.5)^2 = 0.75$
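The two discrete examples of this section can be recomputed with a few lines of Python. This is a sketch under the assumption that a distribution is stored as a dictionary mapping each value of X to its probability; the function names are illustrative.

```python
from fractions import Fraction

def expectation(dist):
    """E(X): sum of x * P(X = x) over all values of X."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - (E(X))^2."""
    mu = expectation(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2

# Number of heads in three tosses of a fair coin
heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# The five-value distribution from the first example of this section
table = {0: 0.18, 1: 0.34, 2: 0.23, 3: 0.21, 4: 0.04}

print(expectation(heads), variance(heads))
print(round(expectation(table), 2))
```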


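Exercise: Let X be a continuous R.V with density
$$f(x) = \begin{cases} \frac{1}{2}x, & 0 < x < 2 \\ 0, & \text{otherwise} \end{cases}$$

Then find a) P(1 < X < 1.5) b) E(X) c) Var(X)

Solution:

a. $$P(1 < X < 1.5) = \int_1^{1.5} \frac{1}{2}x\,dx = \frac{1}{4}\left(1.5^2 - 1^2\right) = 0.3125$$

b. The expectations are taken over the whole range 0 < x < 2 on which the density is positive:
$$E(X) = \int_0^2 \frac{1}{2}x^2\,dx = \frac{1}{6}\left(2^3 - 0\right) = \frac{4}{3} \quad \text{and} \quad E(X^2) = \int_0^2 \frac{1}{2}x^3\,dx = \frac{1}{8}\left(2^4 - 0\right) = 2$$

c. $$\text{Var}(X) = E(X^2) - (E(X))^2 = 2 - \left(\frac{4}{3}\right)^2 = \frac{2}{9} \approx 0.2222$$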
6.3.1 Common Discrete Probability Distributions
1. Binomial Distribution
A binomial experiment is a probability experiment that satisfies the following four requirements
called assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only one of the two possible mutually exclusive outcomes, success or a
failure.
3. The probability of a success must remain the same for each trial.
4. The outcomes of each trial must be independent.
Examples of binomial experiments
• Tossing a coin 20 times to see how many tails occur.
• Asking 200 people if they watch BBC news.
• Registering a newly produced product as defective or non defective.
Definition: The outcomes of the binomial experiment and the corresponding probability of these
outcomes are called Binomial Distribution.
Let p = probability of success and q = 1 - p = probability of failure on any given trial.
Then the probability of getting x successes in n trials is given by the binomial probability distribution
$$P(X = x) = \begin{cases} \binom{n}{x} p^x q^{n-x}, & x = 0, 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

This can be written as X ~ Bin(n, p).
When using the binomial formula to solve problems, we have to identify three things:
• The number of trials (n)
• The probability of a success on any one trial (P) and
• The number of successes desired (X).
If X is a binomial random variable with parameters n and p, then $E(X) = np$ and $\text{Var}(X) = npq$.
Example: Five fair coins are flipped. If the outcomes are assumed independent, find the
probability that the number of heads obtained is 3, 4, or 5.

If we let X equal the number of heads (successes), then X is a binomial random variable with
parameters n = 5 and p = 1/2. Hence,

$$P(X = 3) = \binom{5}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^2 = \frac{10}{32}$$

$$P(X = 4) = \binom{5}{4}\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right)^1 = \frac{5}{32}$$

$$P(X = 5) = \binom{5}{5}\left(\frac{1}{2}\right)^5\left(\frac{1}{2}\right)^0 = \frac{1}{32}$$

4. The probability that a patient contracting IB will recover from the disease under medical
treatment is 0.6. Out of 15 patients contracting the disease:

a) What is the probability that exactly 10 will recover?

b) What is the expected number of patients who will recover?

c) What is the variance of the number of patients who will recover?

Assume that the patients are subjected to the same medical treatment.
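The binomial formula translates almost symbol for symbol into code. The sketch below (the function name `binomial_pmf` is an assumption) reproduces the five-coin example with exact fractions and evaluates the mean np and variance npq for the same n and p.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, Fraction(1, 2)   # five fair coins, success = head

coin_probs = {x: binomial_pmf(x, n, p) for x in (3, 4, 5)}
p_three_or_more = sum(coin_probs.values())

mean_heads = n * p               # E(X) = np
var_heads = n * p * (1 - p)      # Var(X) = npq

print(coin_probs, p_three_or_more, mean_heads, var_heads)
```

Using `Fraction` keeps the probabilities exact, so 10/32 is shown in lowest terms as 5/16.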

2. Poisson Distribution
A random variable X is said to have a Poisson distribution if its probability distribution is given
by:
$$P(X = x) = \begin{cases} \dfrac{\lambda^x e^{-\lambda}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
where λ is the average number of occurrences of an event in a unit length of interval or distance
and x is the number of occurrences in a Poisson process.
The Poisson distribution depends only on the average number of occurrences per unit interval of
time or space.
The Poisson distribution is used as a distribution of rare events, such as:
o Number of misprints per page.
o Natural disasters like earthquakes.
o Accidents.
o Hereditary.
o Arrivals.
The process that gives rise to such events is called a Poisson process. If X is a Poisson random
variable with parameter λ, then E(X) = λ and Var(X) = λ. For the Poisson probability distribution, the
expected value and variance are equal.

Example 1: If X is a Poisson random variable with parameter λ = 2, find P(X = 0).

Solution:

$$P(X = 0) = \frac{e^{-2}\, 2^0}{0!}$$
Using the fact that $2^0 = 1$ and $0! = 1$, we obtain $P(X = 0) = e^{-2} = 0.135$

Examples

1. Suppose the average number of accidents occurring weekly on a particular high way is
equal to 1.2. Approximate the probability that there is at least one accident this week.

Solution: If X denotes the number of accidents that will occur this week, then X is
approximately a Poisson random variable with mean value λ = 1.2. The desired
probability is obtained as follows.

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{e^{-1.2}(1.2)^0}{0!} = 1 - e^{-1.2} = 0.6988$$

Therefore, there is approximately a 70% chance that there will be at least one accident this
week.

2. If 1.6 accidents can be expected at an intersection on any given day, what is the probability
that there will be 3 accidents on any given day?

Solution: The probability that there will be 3 accidents on any given day is

$$P(X = 3) = \frac{e^{-1.6}(1.6)^3}{3!} = 0.683\,e^{-1.6} \approx 0.138$$

3. A sales firm receives, on average, 3 calls per hour on its toll-free number. For any given hour,
find the probability that it will receive the following.
a. At most 3 calls
b. At least 3 calls
c. Five or more calls
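The Poisson probabilities in the worked examples above can be verified with the sketch below. The function name `poisson_pmf` is an assumption; only `exp` and `factorial` from Python's `math` module are used.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x! for x = 0, 1, 2, ..."""
    return lam ** x * exp(-lam) / factorial(x)

p_zero = poisson_pmf(0, 2)                 # lambda = 2, no occurrence
p_at_least_one = 1 - poisson_pmf(0, 1.2)   # at least one accident in the week
p_three = poisson_pmf(3, 1.6)              # exactly three accidents in a day

print(round(p_zero, 3), round(p_at_least_one, 4), round(p_three, 3))
```

Probabilities such as "at most 3" or "five or more" follow the same pattern: sum `poisson_pmf` over the relevant values of x, or subtract such a sum from 1.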
6.3.2 Common Continuous Probability Distributions
1. Normal Distribution
Every continuous random variable X has a curve associated with it. This curve, formally
known as a probability density function, can be used to obtain probabilities associated with the
random variable. This is accomplished as follows. Consider any two points a and b, where a is
less than b. The probability that X assumes a value that lies between a and b is equal to the
area under the curve between a and b. That is,
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

Since X must assume some value, it follows that the total area under the density curve must
equal 1. Also, the area under the graph of the probability density function between points
a and b is the same regardless of whether the end points a and b themselves are included.
That is,
$$P(a \le X \le b) = P(a < X < b)$$

Normal Random Variables

The most important type of random variable is the normal random variable. The probability
density function of a normal random variable X is determined by two parameters: the expected
value and the standard deviation of X. We designate these values as μ and σ, respectively.

$$\mu = E[X] \quad \text{and} \quad \sigma = SD(X)$$

The normal probability density function is a bell-shaped density curve that is symmetric about
the value μ; its variability is measured by σ. The larger σ is, the more variability there is in the
curve.

Since the probability density function of a normal random variable is symmetric about its
expected value μ, it follows that X is equally likely to be on either side of μ. That is,
$$P(X < \mu) = P(X > \mu) = \frac{1}{2}$$

Moreover, a random variable X is said to have a normal distribution if its probability density
function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad \text{where } -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$
$\mu = E(X)$ and $\sigma^2 = \text{Variance}(X)$ are the parameters of the normal distribution.

Properties of Normal Distribution:

• It is bell shaped, symmetrical about its mean, and mesokurtic. The maximum
ordinate is at x = μ and is given by $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.
• It is asymptotic to the x-axis, i.e., it extends indefinitely in either direction from the mean.
• It is a continuous distribution, i.e., there are no gaps or holes.
• It is a family of curves, i.e., every unique pair of mean and standard deviation defines a
different normal distribution. Thus, the normal distribution is completely described by two
parameters: mean and standard deviation.
• The total area under the curve is 1, i.e. $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and the area of the
distribution on each side of the mean is 0.5.
• It is unimodal, i.e., values mound up only in the center of the curve.
• Median = Mean = Mode = μ, located at the center of the distribution.
• The probability that a random variable will have a value between any two points is equal
to the area under the curve between those points.
Note: To facilitate the use of the normal distribution, the following distribution, known as the
standard normal distribution, was derived by using the transformation
$$Z = \frac{X - \mu}{\sigma}, \qquad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$
i.e. if $X \sim N(\mu, \sigma^2)$ then $Z \sim N(0, 1)$.
Properties of the Standard Normal Distribution

Same as a normal distribution, but


• Mean is zero
• Variance is one
• Standard Deviation is one

Areas under the standard normal distribution curve have been tabulated in various ways. The most
common ones are the areas between Z = 0 and a positive value of Z.
Given a normally distributed random variable X with mean μ and standard deviation σ,
$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$$

Example 1: Find the area under the standard normal distribution curve between z = 0 and z = 2.34.

Solution: Draw the area as follows:

[Figure: standard normal curve with the area between 0 and 2.34 marked]

Since the table gives the area between 0 and any value to the right of 0, one need only look up the
value z = 2.34 in the table. Find 2.3 in the left column and 0.04 in the top row. The value where the
column and row meet in the table is the answer, 0.4904.

z      0.00   0.01   0.02   0.03   0.04   …
0.0
0.1
0.2
…
2.2
2.3                               0.4904
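When a table is not at hand, the same areas can be computed from the error function, since the standard normal cumulative distribution satisfies $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch below is an illustration only; the function names are assumptions, and the mean 10 and standard deviation 2 in the last call are hypothetical values chosen so that the upper limit standardizes to z = 2.34.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2."""
    return standard_normal_cdf(z2) - standard_normal_cdf(z1)

def normal_probability(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), via the transformation Z = (X - mu) / sigma."""
    return area_between((a - mu) / sigma, (b - mu) / sigma)

table_value = area_between(0, 2.34)   # the table entry for z = 2.34
print(round(table_value, 4))

# Hypothetical X with mean 10 and standard deviation 2
print(round(normal_probability(10, 14.68, 10, 2), 4))
```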

Example 2: Find

A.
B.

Solution:

A. Draw the area as follows:

[Figure: standard normal curve with z = 0 and z = 1.50 marked]

B. Draw the required area as follows:

[Figure: standard normal curve with z = 0 and z = 0.8 marked]

Example 6.17: Find

A.
B.

Solution:

A. Draw the graph as follows:

[Figure: standard normal curve with z = 0, z = 1 and z = 2 marked]

B. Draw the graph as follows:

[Figure: standard normal curve with z = -1.5, z = 0 and z = 2.5 marked]

Since , due to symmetric property of normal distribution

2. Student’s t Distribution

It is often the case that one wants to calculate the size of sample needed to obtain a certain level
of confidence in survey results. Unfortunately, this calculation requires prior knowledge of the
population standard deviation (σ). Realistically, σ is unknown. Often a preliminary sample will
be taken so that a reasonable estimate of this critical population parameter can be made. If
such a preliminary sample is not taken, but confidence intervals for the population mean are to
be constructed using an unknown σ, then the distribution known as the Student t distribution can
be used. In addition, as long as the sample size is large enough, most datasets can be
described by the standard normal distribution. But when the sample size is small, statisticians rely
on the distribution of the t statistic (also known as the t score), whose value is given by:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, μ is the population mean, s is the standard deviation of the sample,
and n is the sample size.

The distribution of the t statistic is called the t distribution or the Student t distribution. The
particular form of the t distribution is determined by its degrees of freedom (df). The degrees of
freedom refer to the number of independent observations in a set of data. When estimating a
mean score or a proportion from a single sample, the number of independent observations is
equal to the sample size minus one. The t distribution can be used with any statistic having a
bell-shaped distribution (i.e., approximately normal).

The t distribution has the following properties:

- The mean of the distribution is equal to 0.
- The variance is greater than one, but approaches one from above as the sample size
  increases (σ² = 1 for the standard normal distribution).
- With infinite degrees of freedom, the t distribution is the same as the standard normal
  distribution.
- The t distribution is similar to the standard normal distribution in the following ways:
  - It is bell-shaped.
  - It is symmetric about the mean.
  - The mean, median, and mode are equal to zero and located at the center of the
    distribution.
  - The curve never touches the x axis.
- The t distribution differs from the standard normal distribution in the following ways:
  - The variance is greater than one.
  - The t distribution is actually a family of curves based on the concept of degrees
    of freedom, which is related to sample size.
  - As the sample size increases, the t distribution approaches the standard normal
    distribution.
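
The statement that the variance approaches one from above can be illustrated with the standard result that a t distribution with df degrees of freedom has variance df / (df - 2) when df > 2. This formula is not stated in these notes; the sketch below assumes it, and the function name `t_variance` is illustrative.

```python
def t_variance(df):
    """Variance of Student's t with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("the variance is finite only for df > 2")
    return df / (df - 2)

# The variance stays above 1 and shrinks toward 1 as the degrees of freedom grow.
for df in (3, 5, 10, 30, 100, 1000):
    print(df, round(t_variance(df), 4))
```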
3. Chi-square Distribution ($\chi^2$ Distribution)

The square of a standard normal variable is called a chi-square variate with one degree of
freedom. Thus if X is a random variable following a normal distribution with mean μ and standard
deviation σ, then $Z = \frac{X - \mu}{\sigma}$ is a standard normal variate, and
$Z^2 = \left(\frac{X - \mu}{\sigma}\right)^2$ is a chi-square variate with 1 degree of freedom.

If $X_1, X_2, \ldots, X_n$ are independent random variables following normal distributions with means
$\mu_1, \mu_2, \ldots, \mu_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$ respectively, then the variate

$$\chi^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2,$$

which is the sum of the squares of n independent standard normal variates, follows a chi-square
distribution with n degrees of freedom.
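
A small simulation can make this construction concrete. The sketch below is only an illustration: the means, standard deviations, seed and number of draws are arbitrary choices, and it relies on the standard fact (not stated above) that a chi-square variate with n degrees of freedom has mean n.

```python
import random

def chi_square_draw(mus, sigmas, rng):
    """One draw of sum(((X_i - mu_i) / sigma_i) ** 2) with X_i ~ N(mu_i, sigma_i^2)."""
    total = 0.0
    for mu, sigma in zip(mus, sigmas):
        x = rng.gauss(mu, sigma)
        total += ((x - mu) / sigma) ** 2
    return total

rng = random.Random(2024)            # fixed seed so the sketch is repeatable
mus = [10.0, -3.0, 0.5, 7.0]         # arbitrary illustrative means
sigmas = [2.0, 1.0, 0.1, 5.0]        # arbitrary illustrative standard deviations

draws = [chi_square_draw(mus, sigmas, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 2))         # should be close to n = 4
```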

CHAPTER SEVEN
Sampling and sampling distribution of the mean
7.1 Introduction

Statistical data are obtained by investigating the characteristics of the elements of a population.
Depending on how many units of the population are covered, data can be collected in two
ways: complete enumeration (census survey), in which data are collected from every
element of the population, and sampling (sample survey), in which only a portion of the
population is studied in order to learn about the characteristics of the whole population. In most
cases it is neither possible nor feasible to investigate every element of the population. This could
be due to resource constraints (funds and time) or to other reasons such as the nature of the
population. To handle such problems, a sample survey is an alternative and feasible method of
data gathering. Thus, a sample survey is a method of collecting statistical data from sample
elements to provide information that is relevant to researchers, planners and policy makers.

7.2 The concept of sampling

Sampling is that part of statistical practice concerned with the selection of individual
observations intended to yield some knowledge about a population of concern, especially for the
purposes of statistical inference. Before discussing the specific types of
sampling methods, it is valuable to be acquainted with the following terms:

1. A (statistical) population: is the complete set of possible measurements for which
inferences are to be made. In other words, it is the totality of all subjects under study within
a specified space and time.
Examples
- Population of trees under specified climatic conditions
- Population of animals fed a certain type of diet
- Population of farms having a certain type of natural fertility
- Population of households, etc
- The population could be finite or infinite
- There are two ways of investigation: Census and sample survey.
2. Census: a complete enumeration of the population.
Example: Ethiopian Population and House Census
3. Sample: A portion or part of the population selected using some statistical technique. Most of
the time a sample is selected in order to investigate the characteristics of the population
from which it is drawn.
In practice, most of the time we don't conduct a census; instead we conduct a sample survey.
4. Parameter: A characteristic or measure obtained from a population. In other words, it
is a population value: a numerical expression that summarizes a characteristic of the
population. Examples: population mean, population proportion, etc.
5. Statistic: Characteristic or measure obtained from a sample. We stated that parameter is
summary obtained from the population. Therefore, statistic is summary obtained from the
sample.
6. Sampling: Sampling is a statistical process in which one can select and examine sample
units and provide statistical information by involving variety of techniques instead of
considering the whole population units. In other words, it is a process that allows the
investigator to obtain accurate information from a sample and relate that information to the
population characteristic without examining every unit of that population.

Sampling can be done either with replacement or without replacement.

Sampling with replacement (swr): in this case, a unit is selected from a population with a
known probability and, after its characteristic(s) are recorded, it is returned to the population
before the next selection is made. Thus, in this method the population size remains constant at
each selection, the probability at each draw remains the same, and a unit has a chance of being
selected more than once. There are $N^n$ possible samples of size n from a
population of N units. That is, one can select a sample of n out of N in $N^n$ ways.
Sampling without replacement (swor): in this selection procedure, once a unit is selected from a
population of size N, it is not returned to the population. Thus, for each subsequent selection, the
population size is reduced by one. There are $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ possible samples of size n from a
population of N units. That is, n units can be selected out of N in $\binom{N}{n}$ ways.
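
The two counts can be verified by listing the samples for a small case. This is a minimal sketch; the five population values are illustrative, and ordered draws are counted for sampling with replacement while unordered selections are counted for sampling without replacement.

```python
from itertools import combinations, product
from math import comb

population = [1, 3, 5, 7, 9]   # N = 5 illustrative units
n = 2

# Ordered draws in which the same unit may appear more than once: N ** n of them.
with_replacement = list(product(population, repeat=n))

# Unordered selections in which no unit repeats: "N choose n" of them.
without_replacement = list(combinations(population, n))

print(len(with_replacement))     # 25
print(len(without_replacement))  # 10
```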
Sampling unit: the ultimate unit to be sampled or elements of the population to be sampled.
Examples:
If somebody studies the socio-economic status of households, the household is the sampling unit.
If one studies the performance of freshman students in some college, the student is the sampling unit.

Sample size: The number of sampling units which are selected from a population.

The sample size depends on a number of considerations which are as follows.

a) The purpose for which the sample is drawn.


b) The type of population from which the sample is to be drawn.
c) Availability of technical people or equipment needed.
d) Resources allotted for the study in terms of time and money.
e) Precision required.

Study Unit is the unit on which measurements are performed and from which information is
collected.

Sampling Fraction is the ratio of the number of units in the sample to the number of units in
the source population (n/N). Its reciprocal, N/n, is the Sampling Interval.

Sampling frame is the list of all the units in the source population from which a sample is to
be taken.

Examples:
List of households. List of students in the registrar office.
Errors in sample survey:

There are two types of errors:

a) Sampling error:
It is the discrepancy between the population value and the sample value that arises because
the sample is not a perfect representation of the population. This type of error is
committed because a sample is used; it does not occur in complete enumeration.
It may also arise from the use of inappropriate sampling techniques. This error can be
decreased by increasing the sample size.
b) Non-sampling errors: are errors due to procedural bias, such as:
Incorrect responses (called response or observational error).
Measurement error or lack of precision in definitions.
Errors at different stages of processing the data, such as editing and tabulating.
This type of error can occur even if the data are collected from the whole population.

Reasons for Sampling (advantages of conducting a sample survey rather than a census)

Reduced cost: The finances required to cover the whole population can hardly be made
available.
Greater speed: Studying the whole population requires so much time that the
study is often outdated by the time it is complete.
Greater accuracy: Complete enumeration (census study) introduces many errors which are
reduced or eliminated by sampling.
The only option when the population is infinite: In case the population is infinite or
consists of an uncountable number of units, studying all of it is impossible.
Because of the above considerations, in practice we take a sample and draw conclusions about
population values such as the population mean and population variance, known as parameters of the
population.
Sometimes taking a census makes more sense than using a sample. Some of the reasons
include:
- Universality
- Detailedness
- Non-representativeness

7.3 Sampling Techniques
Depending on chance of the units of the population included in sample, sampling can be
divided into two.
A) Probability Sampling
B) Non-probability Sampling
A. Random Sampling or probability sampling.
A probability sampling scheme is one in which every unit in the population has a known
nonzero probability of being sampled and the process involves random selection. Probability
sampling includes: Simple Random Sampling, Systematic Sampling, Stratified Sampling,
Cluster Sampling or Multistage Sampling.
1. Simple Random Sampling:
It is a method of selecting items from a population such that every possible sample of
specific size has an equal chance of being selected. In this case, sampling may be with or
without replacement. Or all elements in the population have the same pre-assigned non
zero probability to be included in to the sample.
This could be accomplished by writing each study units name or code on a slip of paper
and selecting adequate number of them using Lottery Method. It can also be done by
assigning a number to each sampling unit then samples are selected using Table of
Random Numbers or Computer application.
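
As one possible "computer application" of this method, the sketch below draws a simple random sample with Python's standard `random` module. The sampling frame of unit codes 1 to 50, the sample size and the seed are hypothetical choices made only for illustration.

```python
import random

# Hypothetical sampling frame: unit codes 1..50 (not taken from the text).
frame = list(range(1, 51))
rng = random.Random(7)   # fixed seed only so that the sketch is repeatable

# Without replacement: every subset of 10 distinct units is equally likely.
srs_without = rng.sample(frame, k=10)

# With replacement: each of the 10 draws is made from the full frame.
srs_with = rng.choices(frame, k=10)

print(sorted(srs_without))
print(sorted(srs_with))
```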
2. Stratified Random Sampling:
The population will be divided in to non-overlapping and exhaustive groups called strata.
A separate sample is taken from each stratum using Simple or Systematic Random
Sampling techniques.
Elements within the same stratum should be more or less homogeneous, while elements in
different strata should differ.
It is applied if the population is heterogeneous.
The main advantage is that it improves the representativeness of the sample and allows
reasonable comparison among strata. The major limitation is that it requires a separate sampling
frame for each stratum.
Some of the criteria basis for stratification is: Characteristics of the population (Sex, Age,
ethnic origin and Occupation, etc.) and Geographical
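
A stratified draw can be sketched as a separate simple random sample within each stratum. The example below assumes the same sampling fraction in every stratum, which is one common choice and not something these notes prescribe; the strata names, sizes and the function `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of round(fraction * size) units from each stratum."""
    sample = {}
    for name, units in strata.items():
        k = max(1, round(fraction * len(units)))   # at least one unit per stratum
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical frame with one list of unit labels per stratum.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}

picked = stratified_sample(strata, fraction=0.1, rng=random.Random(1))
print({name: len(units) for name, units in picked.items()})  # {'urban': 6, 'rural': 4}
```

Note that the function needs the list of units in every stratum, which mirrors the limitation mentioned above: a separate sampling frame is required for each stratum.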
3. Cluster Sampling:
The population is divided into non-overlapping groups called clusters.
Assuming the groups are homogeneous among each other, cluster sampling selects a few
groups (clusters) from the population as Primary Sampling Units (PSU); the required
information is then collected from all elements, the Secondary Sampling Units (SSU), within each
selected group.
Clusters are formed in such a way that elements within a cluster are heterogeneous, i.e.
observations in each cluster should be more or less dissimilar.
The major advantage of Cluster Sampling is that it doesn't require a sampling frame of
the SSU. It is adequate to have a sampling frame for the PSU (i.e. clusters). Its major
limitation is that it relies on the assumption of homogeneity among clusters, which
rarely holds in the real world.
4. Systematic Sampling:

A Systematic Random Sampling: This method selects units at a fixed interval throughout
the sampling frame after a random start. The steps needed to obtain a systematic random
sample are:

1. Number the units in the population from 1 to N.
2. Decide on the sample size n that you need.
3. Calculate the sampling interval k = N/n.
4. Randomly select an integer j between 1 and k (1 ≤ j ≤ k).
5. Select the j-th unit first, and then the (j + k)-th, (j + 2k)-th, etc. units until the required
sample size is reached.
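
The steps above can be sketched as follows. This is a minimal illustration that assumes N is an exact multiple of n so that k = N/n is an integer; the function name `systematic_sample` and the values N = 100, n = 10 are hypothetical.

```python
import random

def systematic_sample(N, n, rng):
    """Return the 1-based unit numbers j, j + k, j + 2k, ... for k = N // n."""
    k = N // n                 # sampling interval (assumes N is a multiple of n)
    j = rng.randint(1, k)      # random start with 1 <= j <= k
    return [j + i * k for i in range(n)]

units = systematic_sample(N=100, n=10, rng=random.Random(3))
print(units)
```

Only the starting unit is random; every later unit is fixed by the interval, which is why the method is quick to carry out.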

The general advantage of Systematic Random Sampling is that it is easier and less
time consuming to perform. In some situations it can also be conducted without a sampling
frame.

B. Non Random Sampling or non probability sampling.


It is a sampling technique in which the choice of individuals for a sample depends on the
basis of convenience, personal choice or interest.
It is any sampling method where some elements of the population have no chance of
selection or where the probability of selection cannot be accurately determined.
In Non-probability sampling, the sample is less likely to be representative of the
population, thus information about the relationship between sample and population is
limited, making it difficult to extrapolate from the sample to the population.
Non-probability sampling is used when there is no sampling frame to conduct probability
sampling, or when it is impossible to conduct probability sampling due to economical and
feasibility factors.
Non-probability sampling can be Purposive, Convenience, Quota or Snowball
Sampling.
a) Judgmental or Purposive Sampling: The researcher chooses the sample based on who
he/she thinks would be appropriate for the study. Samples are taken based on previous
knowledge of the population (from which the samples are taken) and the specific
purpose of the study or investigation. Researchers use their personal judgment in
selecting the sample(s).
b) Convenience Sampling: The selection of units from the population is based on easy
availability and/or accessibility.
c) Quota Sampling: It starts with systematically setting a "quota" to represent subgroups
of a population. Then data are collected to meet the predefined quota.
d) Snowball Sampling: The researcher begins by identifying someone who meets the
inclusion criteria of the study. Then the study subject is asked to recommend
others whom he/she may know and who also meet the criteria.
7.4 Sampling Distribution

Because a statistic such as $\bar{x}$ varies from sample to sample, it is a random variable. As such,
a statistic has a probability distribution associated with it. In order to make probability
statements regarding a sample statistic, we need to know its probability distribution.
That is to say, we need to know the shape, center and spread of the sample
statistic's distribution.

The sampling distribution of a statistic is the probability distribution of all possible values of the
statistic computed from samples of size n.

There are commonly three properties of interest of a given sampling distribution:

- Its mean
- Its variance
- Its functional form
7.4.1 Sampling Distribution of the sample mean
The sampling distribution of the sample mean is a theoretical probability distribution that shows the
functional relationship between the possible values of a sample mean based on samples of
size n and the probability associated with each value, for all possible samples of size n drawn from
that particular population.
Steps for the construction of Sampling Distribution of the mean
1. From a finite population of size N, randomly draw all possible samples of size n
2. Calculate the mean for each sample.
3. Summarize the mean obtained in step 2 in terms of frequency distribution or relative
frequency distribution.
Example: Suppose we have a population of size N = 5, consisting of the ages of five children: 1,
3, 5, 7 and 9.

Population mean: $\mu = \frac{\sum X_i}{N} = \frac{1 + 3 + 5 + 7 + 9}{5} = \frac{25}{5} = 5$

Population variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(1-5)^2 + (3-5)^2 + (5-5)^2 + (7-5)^2 + (9-5)^2}{5} = \frac{40}{5} = 8$

The standard deviation is σ = 2.828427. In most situations we never know the population
values µ and σ, but we estimate them from sample values.

Example: Take samples of size 2 without replacement and construct the sampling distribution of the
sample mean.

There are $\binom{N}{n} = \binom{5}{2} = 10$ possible samples of size 2, as shown below.
Sample No    Sample    Mean ($\bar{x}$)
1            1, 3      2
2            1, 5      3
3            1, 7      4
4            1, 9      5
5            3, 5      4
6            3, 7      5
7            3, 9      6
8            5, 7      6
9            5, 9      7
10           7, 9      8

For instance, $\bar{x}_1 = \frac{1 + 3}{2} = 2$, $\bar{x}_2 = \frac{1 + 5}{2} = 3$, etc.

Sampling is random, so each sample has the same probability $1 / \binom{N}{n} = \frac{1}{10}$ of being selected.

$\bar{x}$        f     Probability
2        1     1/10
3        1     1/10
4        2     2/10
5        2     2/10
6        2     2/10
7        1     1/10
8        1     1/10
Total    10    1.0

This is the sampling distribution of $\bar{x}$.


Remark:
1. In general, if sampling is with replacement, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$.
2. If sampling is without replacement, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$.
3. In any case the sample mean is an unbiased estimator of the population mean,
i.e. $\mu_{\bar{x}} = \mu \Rightarrow E(\bar{x}) = \mu$ (show).

- Sampling may be from a normally distributed population or from a non-normally distributed
population.
- When sampling is from a normally distributed population, the distribution of $\bar{x}$ will possess
the following properties:
1. The distribution of $\bar{x}$ will be normal.
2. The mean of $\bar{x}$ is equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.
3. The variance of $\bar{x}$ is equal to the population variance divided by the sample size, i.e.
$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$.

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \Rightarrow Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
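
The variance formulas for the sample mean can be checked on the five-children population by listing every sample of size 2. The sketch below is an illustration using exact fractions; it treats samples without replacement as unordered selections and samples with replacement as ordered draws, and the helper name `mean_and_variance` is not from the notes.

```python
from fractions import Fraction
from itertools import combinations, product

population = [1, 3, 5, 7, 9]
N, n = len(population), 2

def mean_and_variance(values):
    """Exact mean and variance (divisor = number of values) of a list of numbers."""
    m = Fraction(sum(values), len(values))
    v = sum((Fraction(x) - m) ** 2 for x in values) / len(values)
    return m, v

mu, sigma2 = mean_and_variance(population)

# Sample means over all samples without and with replacement.
means_wor = [Fraction(sum(s), n) for s in combinations(population, n)]
means_wr = [Fraction(sum(s), n) for s in product(population, repeat=n)]

mean_wor, var_wor = mean_and_variance(means_wor)
mean_wr, var_wr = mean_and_variance(means_wr)

print(mu, sigma2)          # 5 8
print(mean_wor, var_wor)   # 5 3
print(mean_wr, var_wr)     # 5 4
```

Without replacement the variance is 3 = (σ²/n)(N − n)/(N − 1), with replacement it is 4 = σ²/n, and in both cases the mean of the sample means equals the population mean 5.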
7.5 Central Limit Theorem

Suppose a random variable X has population mean μ and standard deviation σ and that a random
sample of size n is taken from this population. Then the sampling distribution of $\bar{x}$ becomes
approximately normal with mean $\mu_{\bar{x}} = \mu$ and variance $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$ as the sample size n increases
($n \ge 30$).

Simply stated: For any population, regardless of its shape, as the sample size increases, the shape
of the sampling distribution of the sample mean, $\bar{x}$, becomes more normal.
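
The theorem can be illustrated by simulation. The sketch below draws repeated samples of size 30 from a strongly skewed population; the exponential population (mean 1, variance 1), the seed and the number of repetitions are arbitrary choices for illustration and do not come from the notes.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(11)   # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1 and variance 1."""
    return fmean(rng.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]

print(round(fmean(means), 3))       # close to mu = 1
print(round(pvariance(means), 4))   # close to sigma^2 / n = 1/30
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from normal.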

Exercise:
1. If the uric acid values in normal adult males are approximately normally distributed with
mean 5.7 mg and standard deviation 1 mg, find the probability that a sample of size 9 will
yield a mean:
i. greater than 6
ii. between 5 and 6
iii. less than 5.2
2. Suppose that for all students who sat an examination in a particular year the mean score was 450 with
a standard deviation of 120, and that 400 of the students who took the test during that year were selected at random.
a) Determine the standard error of the mean.
b) What is the probability that their scores have a mean
i) greater than 456
ii) between 440 and 460
3. In 2000, as reported by ACT Research Service, the mean ACT Math score
was μ = 20.7. If ACT Math scores are normally distributed with σ = 5, answer the following questions.

(a) Describe the sampling distribution of the sample mean, $\bar{x}$.

(b) What is the probability that a randomly selected student has an ACT Math score less than 18?

(c) What is the probability that a random sample of 10 ACT test takers had a mean math score of 18 or
less?

CHAPTER EIGHT
Statistical Estimation and Hypothesis testing
8.1 Introduction
Inference is the process of making interpretations or drawing conclusions about the
totality of the population from sample data. There are two ways through which inference can be made:
- Statistical estimation
- Statistical hypothesis testing
8.2 Statistical Estimation:
This is one way of making inference about the population parameter where the investigator does
not have any prior notion about the values or characteristics of the population parameter.
There are two ways of estimation:
i. Point Estimation: makes a reasonable guess of the unknown value of a designated population
quantity, e.g., the population mean. It is a single value computed from sample information that
is used to estimate a parameter. The best point estimate of the population mean μ is the
sample mean $\bar{X}$.

ii. Interval estimation: It is the procedure that results in an interval of values as an estimate for a
parameter, that is, an interval that contains the likely values of the parameter. It deals with
identifying the upper and lower limits of a parameter.
Confidence interval for Population Mean
A confidence interval is a specific interval estimate of a parameter determined by using data
obtained from a sample and the specific confidence level of the estimate. The confidence level is
the probability that the value of the parameter falls within the range specified by the confidence
interval surrounding the statistic. There are different conditions to be considered to construct
confidence intervals of the population mean, μ.

Condition-1: Assume that $\sigma^2$ is known and the population distribution is approximately normal.
Consider samples of size n drawn from a population whose mean is μ and standard deviation
is σ. The sampling distribution of $\bar{X}$ will have mean $\mu_{\bar{X}} = \mu$ and standard
deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, and approaches a normal distribution as n gets large.
This allows us to use the normal distribution curve for computing confidence intervals.

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

$$\Rightarrow \mu = \bar{X} \pm Z \frac{\sigma}{\sqrt{n}} = \bar{X} \pm \varepsilon, \quad \text{where } \varepsilon = Z \frac{\sigma}{\sqrt{n}} \text{ is a measure of error.}$$

To obtain the value of Z, we have to attach this to a theory of chance. That is, there is an area of
size $1 - \alpha$ such that:

$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha$$

where α is the probability that the parameter lies outside the interval, and $Z_{\alpha/2}$ is the value of
the standard normal variable to the right of which $\alpha/2$ of the probability lies, i.e.
$P(Z > Z_{\alpha/2}) = \alpha/2$.

$$\Rightarrow P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$

$$\Rightarrow P\left(\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

If the population has a normal distribution and σ is known, then a $(1 - \alpha)100\%$
confidence interval for μ is given by $\left(\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)$.

Note: When (as is often the case) we don't know the population standard deviation σ and n is
large ($n \ge 30$), we can approximate it by the sample standard deviation S, and obtain the
following approximation of the $(1 - \alpha)100\%$ confidence interval for μ:

$$\left(\bar{X} - Z_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2} \frac{S}{\sqrt{n}}\right)$$

where $Z_{\alpha/2}$ is the Z-value with an area of α/2 to its right (obtained from a table).
Example 8.1: A random sample of 900 workers showed an average height of 67 inches with a
standard deviation of 5 inches.
a) Find a 95% confidence interval of the mean height of all workers
b) Find a 99% confidence interval of the mean height of all workers
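Solution:

a) $\bar{X} = 67$, S = 5, n = 900
$(1 - \alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
$\Rightarrow Z_{\alpha/2} = Z_{0.025} = 1.96$, from the table.

The required interval will be:

$$\left(\bar{X} - Z_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2} \frac{S}{\sqrt{n}}\right) = \left(67 - 1.96 \times \frac{5}{30},\ 67 + 1.96 \times \frac{5}{30}\right) = (66.673,\ 67.327)$$

This indicates that we are 95 percent confident that the true mean lies between 66.673
and 67.327 inclusive.

b) $(1 - \alpha)100\% = 99\% \Rightarrow 1 - \alpha = 0.99$
$\Rightarrow \alpha = 0.01 \Rightarrow \alpha/2 = 0.005$
$\Rightarrow Z_{\alpha/2} = Z_{0.005} = 2.58$, from the table.
The required interval will be:

$$\left(\bar{X} - Z_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2} \frac{S}{\sqrt{n}}\right) = \left(67 - 2.58 \times \frac{5}{30},\ 67 + 2.58 \times \frac{5}{30}\right) = (66.57,\ 67.43)$$

Interpretation of this confidence interval is left for students.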
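
The two intervals of Example 8.1 can be reproduced with a few lines of code. This is a minimal sketch in which the critical value is computed from the standard normal distribution instead of being read from a table, so the 99% limits may differ from the tabled answer in the third decimal place; the function name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Return the limits mean -/+ Z_{alpha/2} * sd / sqrt(n) as (lower, upper)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # Z_{alpha/2}
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Example 8.1: sample mean 67, S = 5, n = 900.
for level in (0.95, 0.99):
    low, high = z_interval(mean=67, sd=5, n=900, confidence=level)
    print(level, round(low, 3), round(high, 3))
```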
Example 8.2: Suppose we want to estimate a 95% confidence interval for the average quarterly
returns of all fixed-income funds in Ethiopia. We draw a sample of 100 observations and
calculate the sample mean to be 0.05 and the standard deviation 0.03. We assume that those
returns are normally distributed with known variance.
Solution: $\bar{X} = 0.05$, σ = 0.03, n = 100
$(1 - \alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
$\Rightarrow Z_{\alpha/2} = Z_{0.025} = 1.96$, from the table.
The confidence interval is:

$$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 0.05 \pm 1.96 \times \frac{0.03}{10} = (0.04412,\ 0.05588)$$
Condition-2: If the population variance $\sigma^2$ is not known, n is small (n < 30) and the population is
normal:
In this case, the standard deviation σ is replaced by the estimated standard deviation S. Since
$S/\sqrt{n}$ is only an estimate of the true standard error $\sigma/\sqrt{n}$, the standardized
sample mean no longer follows the standard normal distribution. Instead, it follows the
t-distribution. The t-distribution is described by its degrees of freedom. For a sample of size n,
the t-distribution will have n-1 degrees of freedom. As the sample size n increases, the
t-distribution becomes closer to the normal distribution.

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$ has a t distribution with n-1 degrees of freedom.

The value of $t_{\alpha/2}$ can be obtained from a table with an area of $\alpha/2$ to the right with n-1
degrees of freedom.

Therefore, the $(1 - \alpha)100\%$ confidence interval for μ when the population is normally
distributed and σ is not known is given by $\left(\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}\right)$.


Example 8.3: A Drug Company is testing a new drug which is supposed to reduce blood
pressure. From the six people who are used as subjects, it is found that the average drop in blood
pressure is 2.28 points, with a standard deviation of 0.95 points. What is the 95% confidence
interval for the mean change in pressure?
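Solution:
$\bar{X} = 2.28$, S = 0.95, n = 6
$(1 - \alpha)100\% = 95\% \Rightarrow 1 - \alpha = 0.95$
$\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
$\Rightarrow t_{\alpha/2} = t_{0.025} = 2.571$, from the table, with df = 5.
The required interval will be:

$$\left(\bar{X} - t_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2} \frac{S}{\sqrt{n}}\right) = \left(2.28 - 2.571 \times \frac{0.95}{\sqrt{6}},\ 2.28 + 2.571 \times \frac{0.95}{\sqrt{6}}\right) = (1.28,\ 3.28)$$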
Properties of best estimator
The following are some qualities of an estimator
1. Unbiased Estimator: An estimator is said to be unbiased if the mean of its sampling
distribution is equal to the population parameter. An unbiased estimator is one whose expected
value is the parameter of the population under consideration, i.e. $E(\hat{\theta}) = \theta$.
2. Consistent Estimator: An estimator which gets closer to the value of the parameter as the
sample size increases, i.e. $\hat{\theta}$ gets closer to θ as the sample size increases.
3. Relatively Efficient Estimator: Suppose we have two estimators that have the same mean but
different variances. Then the estimator with the smaller variance is said to be more efficient.
8.4 Statistical Hypothesis Testing
A statistical hypothesis test is a method of making statistical decisions using experimental data.
Hypothesis Testing is a common method of drawing inferences about a population based on
statistical evidence from a sample.
Definitions:
Statistical hypothesis is an assertion, statement, or claim about the population whose
plausibility is to be evaluated on the basis of the sample data.

Test statistic is a statistic whose value serves to determine whether to reject or not reject the
hypothesis being tested. There are two types of statistical hypotheses for each situation: the null
hypothesis and the alternative hypothesis.
a. Null hypothesis is a claim or statement about a population parameter that is usually assumed
to be true from the very beginning until it is declared false. It is a statistical hypothesis that
states a hypothesis of equality or the hypothesis of no difference between a parameter and a
specific value. It is usually denoted by $H_0$.

b. Alternative hypothesis is a claim or statement about a population parameter that will be true
if the null hypothesis is false. It is a statistical hypothesis that states a hypothesis of
difference between a parameter and a specific value. It is usually denoted by $H_1$ or $H_A$.

Types and size of errors:

There are two types of errors in hypothesis testing.
Type I error is rejecting the null hypothesis when it is actually true. The significance level (α)
can be interpreted as the probability of rejecting the null hypothesis when it is actually true.
α = P(type I error) = level of significance
Type II error occurs when a false null hypothesis is not rejected. The null hypothesis is actually
false, but we wrongly conclude not to reject it. β represents the probability that H0 is not
rejected when it is actually false.
β = P(type II error)
The power of a test (1 − β) is the probability of correctly rejecting a false null hypothesis.
Note: The two types of errors that occur in hypothesis testing depend on each other.
The following table gives a summary of possible results of any hypothesis test:

                        Actual situation (condition)
Decision                H0 is true          H0 is false
Do not reject H0        Correct decision    Type II error
Reject H0               Type I error        Correct decision
General steps in hypothesis testing:
1. State the appropriate hypothesis
2. Select the level of significance, α
3. Select an appropriate test statistic
4. Identify the critical region
5. Compute the test value
6. Make the decision
7. Conclusion

Introduction to Statistics and Probability
8.4.1 Hypothesis tests about a population mean, μ
Suppose that the hypothesized value of μ is denoted by $\mu_0$; then one can formulate two-sided (1)
and one-sided (2 and 3) hypotheses as follows:
1. $H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$
2. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$
Condition-1: If the population standard deviation σ is known and sampling is from a normal
distribution: The formula for the test statistic is:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

After specifying α we have the following test criteria corresponding to the above three
hypotheses.

Null hypothesis    Alternative hypothesis    Decision rule: reject H0 if
μ = μ0             μ ≠ μ0                    $|Z_{cal}| > Z_{\alpha/2}$
μ = μ0             μ > μ0                    $Z_{cal} > Z_{\alpha}$
μ = μ0             μ < μ0                    $Z_{cal} < -Z_{\alpha}$

Note: When we don't know the population standard deviation σ and n is large ($n \ge 30$), we can
approximate it by the sample standard deviation S, and obtain the following test statistic:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim N(0, 1)$$

The decision rule is the same as in condition-1.
Condition-2: When the population standard deviation, σ, is unknown, the population is
normally or approximately normally distributed, and the sample size is small (n < 30).
The formula for the test statistic is

$$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{(n-1)}$$

After specifying α we have the following test criteria corresponding to the above three
hypotheses.

Null hypothesis    Alternative hypothesis    Decision rule: reject H0 if
μ = μ0             μ ≠ μ0                    $|t_{cal}| > t_{\alpha/2}$
μ = μ0             μ > μ0                    $t_{cal} > t_{\alpha}$
μ = μ0             μ < μ0                    $t_{cal} < -t_{\alpha}$
Example 8.5: The Telecommunication provides telephone service in an area. According to the
company's records, the average length of all calls placed was 12.5 minutes. A sample of 150
such calls placed through this Corporation produced a mean length of 13 minutes with a standard
deviation of 2.6 minutes. Can you conclude that the mean length of all current calls is different
from 12.5 minutes? Use the 0.05 level of significance and assume that the distribution of all calls
is normal.
Solution: Let μ = population mean
1. State the null and alternative hypothesis:
$H_0: \mu = 12.5$ (The mean length of all current calls is 12.5 minutes)
$H_1: \mu \ne 12.5$ (The mean length of all current calls is different from 12.5 minutes).
2. Select the level of significance, α = 0.05 (given)
3. Select an appropriate test statistic. The Z-statistic is appropriate because the sample size is
large.
4. Identify the critical region:
Here we have two critical regions since we have a two-tailed hypothesis. The critical region is
$|Z_{cal}| > Z_{0.025} = 1.96$
5. Compute the test value. Given that $\bar{X} = 13$, S = 2.6, n = 150:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{13 - 12.5}{2.6 / \sqrt{150}} = \frac{0.5}{0.2123} = 2.36$$

6. Decision: Reject H0, since $Z_{cal} = 2.36 > Z_{0.025} = 1.96$
7. Conclusion: At the 5% level of significance, we have evidence to say that the average length
of all such calls is not equal to 12.50 minutes.
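
The test in Example 8.5 can be sketched in code as follows. The function name `two_sided_z_test` is illustrative, and the critical value is computed from the standard normal distribution instead of a table. Carrying the division through without rounding the standard error gives a test value of about 2.36, which leads to the same decision.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sd, n, alpha):
    """Return (z_cal, z_crit, reject) for H0: mu = mu0 against H1: mu != mu0."""
    z_cal = (sample_mean - mu0) / (sd / sqrt(n))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # Z_{alpha/2}
    return z_cal, z_crit, abs(z_cal) > z_crit

# Example 8.5: sample mean 13, hypothesized mean 12.5, S = 2.6, n = 150.
z_cal, z_crit, reject = two_sided_z_test(13, 12.5, 2.6, 150, 0.05)
print(round(z_cal, 2), round(z_crit, 2), reject)  # 2.36 1.96 True
```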
Example 8.6: Ten individuals are chosen at random from a population and their heights are found
to be, in inches, 63, 63, 65, 66, 67, 68, 69, 70, 71 and 71. The average height of the population
is said to be 66 inches. In the light of the data, can we conclude that the average height is
decreasing, i.e., that it is less than 66 inches? (Use $\alpha = 0.05$ and assume the normality of
the population.)
Solution: Let $\mu$ = population mean and $\mu_0 = 66$
1. State the null and alternative hypotheses: $H_0: \mu = 66$ VS $H_1: \mu < 66$
2. Select the level of significance, $\alpha = 0.05$ (given)
3. Select an appropriate test statistic: the t-statistic is appropriate because the population
standard deviation is unknown and the sample size is small.
4. Critical region: $t_{0.05,9} = 1.8331$. Reject H0 if $t_{cal} < -1.8331$, i.e., the rejection
region is $(-\infty, -1.8331)$.
5. Compute the test value:
$\bar{X} = \dfrac{\sum_{i=1}^{10} X_i}{10} = 67.3$, $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = 3.02$, $n = 10$
$t_{cal} = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} = \dfrac{67.3 - 66}{3.02/\sqrt{10}} = 1.361$
6. Decision: Do not reject H0, since $t_{cal}$ is not in the rejection region.
7. Conclusion: At the 5% level of significance, we have no evidence to say that the average
height of an individual is less than 66 inches.
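As a cross-check of Example 8.6, the sample mean, sample standard deviation and t statistic can be computed directly from the ten heights. This is a minimal sketch using Python's standard library; the critical value 1.8331 is copied from the text rather than computed, and the variable names are illustrative.

```python
import math
import statistics

heights = [63, 63, 65, 66, 67, 68, 69, 70, 71, 71]
mu0 = 66          # hypothesized population mean
t_crit = 1.8331   # t(0.05, 9), the tabulated value quoted in the text

n = len(heights)
x_bar = statistics.mean(heights)
s = statistics.stdev(heights)   # sample standard deviation (divisor n - 1)

t_cal = (x_bar - mu0) / (s / math.sqrt(n))

# Left-tailed test: reject H0 only if t_cal falls below -t_crit.
reject_h0 = t_cal < -t_crit

print(f"mean = {x_bar:.1f}, S = {s:.2f}, t_cal = {t_cal:.3f}, reject H0: {reject_h0}")
```

Because the sample mean lies above 66, the statistic is positive and cannot fall in a left-tailed rejection region.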
Example 8.7: An authority from a district power station of the town told reporters recently that
the average monthly electric bill of households in Addis Ababa is not more than Birr 100. A
random sample of 400 households from the city produces a mean bill of Birr 105 with a standard
deviation of Birr 40. Test the claim of the authority at the 5% level of significance.
Solution:
1. State the null and alternative hypotheses: $H_0: \mu \le 100$ (claim) VS $H_1: \mu > 100$
2. Select the level of significance, $\alpha = 0.05$ (given)
3. Select an appropriate test statistic. The Z-statistic is appropriate because the sample size is
large, even though the population is not assumed to be normal.
4. Critical region: reject H0 if $Z_{cal} > Z_{\alpha} = Z_{0.05} = 1.645$;
$(-\infty, 1.645)$ is the acceptance region for the null hypothesis.
5. Compute the test value: $Z_{cal} = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}} = \dfrac{105 - 100}{40/\sqrt{400}} = 2.5$
6. Decision: Reject H0, since $Z_{cal} = 2.5$ is not in the acceptance region.
7. Conclusion: At the 5% level of significance, we have evidence that the claim of the authority
is not correct.
8.5 Test of Association
It is also possible to apply hypothesis testing on categorical data.
Suppose we have a population consisting of observations having two attributes or qualitative
characteristics, say A and B. Suppose A has r mutually exclusive and exhaustive classes and B
has c mutually exclusive and exhaustive classes. The entire set of data can be represented using
an r × c contingency table.
                          B
A        B1     B2     ...   Bj     ...   Bc     Total
A1       O11    O12    ...   O1j    ...   O1c    R1
...
Ai       Oi1    Oi2    ...   Oij    ...   Oic    Ri
...
Ar       Or1    Or2    ...   Orj    ...   Orc    Rr
Total    C1     C2     ...   Cj     ...   Cc     n
The chi-square test is used to test the hypothesis of independence of two attributes.
The statistic is given by $\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2$ with $(r-1)(c-1)$ degrees of freedom.
Where $O_{ij}$ = the number of units that belong to category i of A and j of B.
$e_{ij}$ = expected frequency of units that belong to category i of A and j of B, given by
$e_{ij} = \dfrac{R_i \times C_j}{n}$, where $R_i$ = the i-th row total, $C_j$ = the j-th column total, and n = number of observations.
Remark: $\sum_{i=1}^{r} \sum_{j=1}^{c} O_{ij} = \sum_{i=1}^{r} \sum_{j=1}^{c} e_{ij} = n$
The null and alternative hypotheses may be stated as:
H0: There is no association between A and B.
H1: not H0 (There is an association between A and B).
Decision Rule:
Reject H0 (independence) at the α level of significance if the calculated value of $\chi^2$ exceeds
the tabulated value with degrees of freedom equal to (r-1)(c-1).
Example 8.10: A researcher is interested in assessing the effect of literacy on family planning
use. Accordingly, he collected data and tabulated the findings in the following manner. Can we say
there is an association between educational status and family planning use?
FP Use      Educational Status          Total
            Illiterate    Literate
Yes         63            49            112
No          15            33            48
Total       78            82            160
Solution: H0: There is no association between educational status and FP use.
H1: not H0.
$e_{ij} = \dfrac{R_i \times C_j}{n}$, where $R_i$ = the i-th row total, $C_j$ = the j-th column total, and n = number of observations.
$e_{11} = \dfrac{112 \times 78}{160} = 54.6$, $e_{12} = \dfrac{112 \times 82}{160} = 57.4$, $e_{21} = \dfrac{48 \times 78}{160} = 23.4$, $e_{22} = \dfrac{48 \times 82}{160} = 24.6$
$\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}} = \dfrac{(63 - 54.6)^2}{54.6} + \dfrac{(49 - 57.4)^2}{57.4} + \dfrac{(15 - 23.4)^2}{23.4} + \dfrac{(33 - 24.6)^2}{24.6} = 8.405$
Decision Rule:
Reject H0 (independence) at the 0.05 level of significance if the calculated value $\chi^2_{cal} = 8.405$
exceeds the tabulated value $\chi^2_{0.05}(1) = 3.841$. Since 8.405 > 3.841, H0 is rejected and we
conclude that there is an association between educational level and family planning use.
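The expected frequencies and the chi-square statistic of Example 8.10 can be reproduced with a short sketch in plain Python. The helper name chi_square_independence is hypothetical (it is not from the text or from any library), and the critical value 3.841 is the tabulated value quoted in the example.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, expected table, degrees of freedom)
    for an r x c table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # e_ij = (row total i) * (column total j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, expected, df


# Rows: FP use (Yes, No); columns: educational status (Illiterate, Literate).
observed = [[63, 49], [15, 33]]
chi2_cal, expected, df = chi_square_independence(observed)

chi2_crit = 3.841  # chi-square(0.05, 1), tabulated value quoted in the text
reject_h0 = chi2_cal > chi2_crit

print(f"chi2_cal = {chi2_cal:.3f}, df = {df}, reject H0: {reject_h0}")
```

The same function accepts any r × c table, but the critical value has to be looked up again for the new degrees of freedom.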

CHAPTER NINE
Simple Linear Regression and Correlation
9.1. Introduction
In previous chapters we have been dealing with a single variable only. However, it is often of
interest to see whether there is a relationship between two or more variables. For example, we may
be interested in examining the relationship between the variables in one of the following pairs of
variables:
Price and demand of a commodity
Level of education and monthly income
Height and weight of individuals
Amount of fertilizer used and crop yield
In each of the above examples we may be interested in knowing the extent (degree) of the
relationship between the variables, or we may want to predict values of one variable from known
values of the other variable. The main objective of this chapter is to discover and measure the
association or relation between two variables; that is, to determine how the variables change
together. This can be done by employing the methods of regression and correlation.
9.2. REGRESSION
Regression is the estimation or prediction of values of one variable from known values of
one or more other variables. The variable whose value is to be estimated or predicted is known as
the dependent (predicted) variable, while the variables whose values are used to determine the
value of the dependent variable are called independent (predictor) variables.
Regression can be classified into two types according to the number of variables. If there are
only two variables (one dependent and one independent), the regression is called simple
regression. If more than two variables are involved, the regression is known as multiple
regression.
Regression analysis is a statistical technique that can be used to develop a mathematical
equation showing how variables are related. Our objective is therefore to establish a
mathematical relationship which could help us in determining the value of the dependent
variable. The graph obtained from such a relationship could be a straight line or a curve
(non-linear). If the relationship is linear, the regression is called linear regression. If the
relationship is non-linear, the regression is known as non-linear regression. In this chapter we
will deal with the type of regression involving only two variables and having a linear
relationship, i.e., simple linear regression.
Simple Linear Regression: A regression analysis involving only two variables and having a linear
relationship.
The regression equation
A regression equation is a mathematical equation that defines the relationship between two
variables. Let X and Y be two variables of interest such that X is the independent variable and Y
is the dependent variable. Suppose we have data of n observations for each of the two variables.
The data, given in ordered pairs, can be presented as: $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$.
The regression equation, called the regression of Y on X, is given by:
$Y = \alpha + \beta X + \varepsilon$
Where Y = dependent variable and X = independent variable,
$\alpha$ = regression constant and $\beta$ = regression slope,
$\varepsilon$ = random disturbance term.
The parameters $\alpha$ and $\beta$ are unknown population parameters and the above equation is the
population regression equation. The parameters $\alpha$ and $\beta$ can be estimated by several methods.

One of these methods is the least squares method. The method of least squares estimates
the parameters $\alpha$ and $\beta$ by minimizing the sum of squared errors.
The above regression model is estimated by:
$\hat{Y} = a + bX$
Where: a is the value of Y when X = 0, i.e., the Y-intercept;
b is the regression slope (regression coefficient), and it indicates the change in Y for a unit
change in X.
a and b are obtained by minimizing $SSE = \sum \varepsilon_i^2 = \sum (Y_i - \hat{Y}_i)^2$
Where $Y_i$ = observed value and $\hat{Y}_i$ = expected value = $a + bX_i$.
This method is known as ordinary least squares (OLS).
Minimizing $SSE = \sum \varepsilon_i^2$ gives
$b = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2}$
$a = \bar{Y} - b\bar{X}$
The regression slope (b)
The regression slope (b), also called the regression coefficient, indicates the amount by which
the variable Y changes for a unit change in X.
o If b is positive, we say there is a direct relationship between the two variables.
o If b is negative, we say there is an inverse relationship between the two variables.
o If b is zero, we say there is no linear relationship between the two variables.
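The least squares formulas for a and b translate directly into code. The sketch below is a minimal illustration in plain Python: the function names are hypothetical and the five data points are invented purely to show that the deviation form and the computational form of the slope agree.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the fitted line Y-hat = a + b*X,
    using the computational (sums) form of the slope."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n   # a = Y-bar - b * X-bar
    return a, b


def slope_from_deviations(xs, ys):
    """Slope via the deviation form: sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical data, chosen only to illustrate the formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = ols_fit(xs, ys)
b_dev = slope_from_deviations(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}, slope from deviations = {b_dev:.2f}")
```

Here b is positive, so by the rule above the invented data show a direct relationship between X and Y.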
9.3. CORRELATION
Correlation is a statistical measure used to determine the degree of association
(relationship) between two or more variables. By 'association', we mean the tendency of the
variables to move together. Note that correlation measures the degree as well as the direction of
the relationship between two or more variables. A correlation analysis involving only two
variables and having a linear relationship is called Simple Linear Correlation.
Suppose the relationship between two variables can be approximated by a straight line. Then a
measure of correlation known as the Pearsonian coefficient of correlation can be used to measure
the degree of relationship between the two variables.
The Pearson coefficient of correlation, denoted by $r_{xy}$, is given by:
$r_{xy} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}}$
Properties of Pearsonian coefficient of correlation
o The value of $r_{xy}$ is always between -1 and 1. That is, $-1 \le r_{xy} \le 1$
o Adding a constant to each value of X and Y, as well as multiplying each value by a
positive constant, does not affect the value of $r_{xy}$
o It measures only the strength of association between variables, i.e., it does not measure
the cause and effect relationship
If 0 < rxy < 1 we say there is a positive correlation between the variables.
If -1 < rxy < 0 we say there is a negative correlation between the variables.
If rxy =1 we say there is a perfect positive correlation between the variables.
If rxy =0 we say there is no linear correlation between the variables.
If rxy = -1 we say there is a perfect negative correlation between the variables.
Covariance
The covariance between two variables, say X and Y, measures the joint variation in the two
variables, i.e., it measures the way in which the values of X and Y vary together.
The covariance between two variables X and Y ($S_{xy}$) is given by:
$S_{xy} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n(n-1)}$
The coefficient of determination ($R^2$)
It is the percentage of the variation in Y that can be explained by the regression equation. $R^2$
is the explained variation divided by the total variation, i.e.,
$R^2 = \dfrac{\text{Explained variation}}{\text{Total variation}} \times 100\%$.
It is the square of $r_{xy}$. That is, $R^2 = r_{xy}^2 \times 100\%$.
Relationship between the regression slope, correlation coefficient, covariance and variances
1. $b = \dfrac{S_{xy}}{S_x^2}$
2. $r = \dfrac{S_{xy}}{S_x S_y}$
3. $b = r\dfrac{S_y}{S_x}$ and $r = b\dfrac{S_x}{S_y}$
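These relationships follow from the computational formulas for b and for the covariance. A short derivation is sketched here under the assumption that $S_x^2$ and $S_y^2$ denote sample variances with divisor n - 1, which is consistent with the divisor n(n - 1) in the covariance formula:

```latex
\begin{align*}
S_x^2 &= \frac{\sum (X_i - \bar{X})^2}{n-1}
       = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n-1)}, \\[4pt]
\frac{S_{xy}}{S_x^2}
      &= \frac{\left[n\sum X_i Y_i - \sum X_i \sum Y_i\right] / \left[n(n-1)\right]}
              {\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right] / \left[n(n-1)\right]}
       = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2}
       = b, \\[4pt]
r\,\frac{S_y}{S_x}
      &= \frac{S_{xy}}{S_x S_y}\cdot\frac{S_y}{S_x}
       = \frac{S_{xy}}{S_x^2}
       = b .
\end{align*}
```

The common factor n(n - 1) cancels in the second line, which is why b can be computed from the raw sums without first computing the variance or the covariance.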
Examples
1. The systolic blood pressures of 10 men of various ages are tabulated as follows:
Age(X) 37 35 41 43 42 50 49 54 60 65
Systolic Blood pressure (Y) 110 117 125 130 138 146 148 150 154 160
a) Find the regression equation of Y on X
b) Find the value of systolic blood pressure for an age of 55
c) Find the correlation coefficient
d) Find the covariance
e) Find the coefficient of determination
Solution:
Xi      Yi      Xi²     XiYi     Yi²
37      110     1369    4070     12100
35      117     1225    4095     13689
41      125     1681    5125     15625
43      130     1849    5590     16900
42      138     1764    5796     19044
50      146     2500    7300     21316
49      148     2401    7252     21904
54      150     2916    8100     22500
60      154     3600    9240     23716
65      160     4225    10400    25600
$\sum X_i = 476$, $\sum Y_i = 1378$, $\sum X_i^2 = 23530$, $\sum X_i Y_i = 66968$, $\sum Y_i^2 = 192394$
a) Regression of Y on X
$b = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2} = \dfrac{10(66968) - 476 \times 1378}{10(23530) - 476^2} = \dfrac{13752}{8724} = 1.576$
$a = \bar{Y} - b\bar{X} = 137.8 - \dfrac{13752}{8724}(47.6) = 62.766$
Therefore the regression line of Y on X is given by: $\hat{Y} = 62.766 + 1.576X$
Since b is positive, there is a direct relationship between age and systolic blood pressure.
b) X = 55 gives $\hat{Y} = 62.766 + 1.576(55) = 149.446 \approx 149$. Therefore the value of systolic
blood pressure that corresponds to an age of 55 is 149.
c) Correlation coefficient ($r_{xy}$)
$r_{xy} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}} = \dfrac{13752}{\sqrt{8724}\,\sqrt{25056}} = \dfrac{13752}{93.4 \times 158.3} = 0.93$
Since $0 < r_{xy} < 1$, there is a positive correlation between age and systolic blood pressure.
d) Covariance
$S_{xy} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n(n-1)} = \dfrac{13752}{10 \times 9} = 152.8$
e) $R^2 = r_{xy}^2 \times 100\% = 0.93^2 \times 100\% = 86.49\%$.
Interpretation: 86.49% of the variation in Y is explained by the regression equation.
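The quantities asked for in Example 1 can be recomputed from the raw data with a short sketch in plain Python. The variable names are illustrative; nothing is rounded until the results are printed, so the last digits can differ slightly from a hand calculation that rounds r to 0.93 before squaring.

```python
import math

ages = [37, 35, 41, 43, 42, 50, 49, 54, 60, 65]
pressures = [110, 117, 125, 130, 138, 146, 148, 150, 154, 160]

n = len(ages)
sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# Building blocks shared by the slope, correlation and covariance formulas.
sxy_num = n * sum_xy - sum_x * sum_y
sxx_num = n * sum_x2 - sum_x ** 2
syy_num = n * sum_y2 - sum_y ** 2

b = sxy_num / sxx_num                        # regression slope
a = sum_y / n - b * sum_x / n                # intercept: Y-bar - b * X-bar
predicted_at_55 = a + b * 55                 # fitted value for age 55
r = sxy_num / math.sqrt(sxx_num * syy_num)   # Pearson correlation
cov_xy = sxy_num / (n * (n - 1))             # sample covariance
r_squared_pct = r ** 2 * 100                 # coefficient of determination (%)

print(f"b = {b:.3f}, a = {a:.3f}, Y-hat(55) = {predicted_at_55:.1f}")
print(f"r = {r:.2f}, S_xy = {cov_xy:.1f}, R^2 = {r_squared_pct:.1f}%")
```

Keeping the three building blocks separate makes it easy to see that b, r and the covariance all share the same numerator.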


2. The following summary data give the scores of 12 students in the mid-exam (X) and final exam (Y).
$\sum X_i = 687$, $\sum Y_i = 741$, $\sum X_i^2 = 45591$, $\sum X_i Y_i = 48407$, $\sum Y_i^2 = 52525$
a) Find the regression equation of Y on X
b) Predict the final exam score if the mid-exam score is 85
c) Find the correlation coefficient
d) Determine the proportion of the total variation in Y which is explained by the regression
equation
Solution:
a) Regression of Y on X
$b = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - (\sum X_i)^2} = \dfrac{12(48407) - 687 \times 741}{12(45591) - 687^2} = \dfrac{71817}{75123} = 0.956$
$a = \bar{Y} - b\bar{X} = 61.75 - 0.956(57.25) = 7.019$
Therefore the regression line of Y on X is given by: $\hat{Y} = 7.019 + 0.956X$
Since b is positive, there is a direct relationship between mid-exam and final exam scores.
b) X = 85 gives $\hat{Y} = 7.019 + 0.956(85) = 88.28$. Therefore the final exam score that corresponds
to a mid-exam score of 85 is 88.28.
c) Correlation coefficient ($r_{xy}$)
$r_{xy} = \dfrac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}} = \dfrac{12(48407) - 687(741)}{\sqrt{\left[12(45591) - 687^2\right]\left[12(52525) - 741^2\right]}} = \dfrac{71817}{\sqrt{75123 \times 81219}} = 0.9194$
Since $0 < r_{xy} < 1$, there is a positive correlation between mid-exam score and final exam score.
d) $R^2 = r_{xy}^2 \times 100\% = 0.9194^2 \times 100\% = 84.53\%$.
Interpretation: 84.53% of the variation in Y is explained by the regression equation.
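When only summary totals are available, as in Example 2, the same formulas can be applied to the sums directly. The following is a minimal sketch in plain Python; the function name regression_from_sums is a hypothetical helper, not something defined in the text.

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy, sum_y2):
    """Return (a, b, r) computed from summary totals only."""
    sxy = n * sum_xy - sum_x * sum_y
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    b = sxy / sxx
    a = sum_y / n - b * sum_x / n
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Summary totals for the 12 students in Example 2.
a, b, r = regression_from_sums(
    n=12, sum_x=687, sum_y=741, sum_x2=45591, sum_xy=48407, sum_y2=52525
)
final_at_85 = a + b * 85      # predicted final score for a mid-exam score of 85
r_squared_pct = r ** 2 * 100  # share of variation in Y explained, in percent

print(f"a = {a:.3f}, b = {b:.3f}, Y-hat(85) = {final_at_85:.2f}")
print(f"r = {r:.4f}, R^2 = {r_squared_pct:.2f}%")
```

The function needs the number of observations n as well as the five totals; without n neither the means nor the slope can be recovered.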


3. The coefficient of correlation between two variables X and Y is 0.64. Their covariance is 16
and the standard deviation of X is 3.
a) Find the standard deviation of Y
b) Find the regression coefficient
Solution:
Given: $r_{xy} = 0.64$, $S_{xy} = 16$, $S_x = 3$
a) $r = \dfrac{S_{xy}}{S_x S_y} \Rightarrow S_y = \dfrac{S_{xy}}{r S_x} = \dfrac{16}{0.64 \times 3} = 8.33$
b) $b = \dfrac{S_{xy}}{S_x^2} = \dfrac{16}{9} = 1.78$
Exercises
1. The following data show the study hours spent (X) on a Statistics course and the examination
scores (Y) for a sample of 8 students.
Study Hours (X) 20 16 34 23 27 32 18 22
Score (Y) 64 61 84 70 88 92 72 77
a) Find the regression equation of Y on X
b) Find the value of Exam score for a study hour of 25
c) Find the correlation coefficient
d) Find the covariance
e) Find the coefficient of determination (determine the proportion of the total variation in
Y which is explained by the regression equation).
2. The following summary data give the values of variable (X) and variable (Y).
$\sum X_i = 3000$, $\sum Y_i = 260$, $\sum X_i^2 = 9245$, $\sum X_i Y_i = 8047.5$, $\sum Y_i^2 = 775$
a) Find the regression equation of Y on X
b) Find the correlation coefficient
c) Find the covariance
3. The covariance between two variables X and Y is 33.89. The standard deviations of X and Y
are 16.63 and 7.15, respectively.
a) Find the regression coefficient
b) Find the correlation coefficient
