Modules in Stat101
Objectives
History of Statistics
It is widely believed that statistics started with the beginning of man’s existence.
As early as 3000 B.C., the population was recorded in Babylonia and in China. Almost
five thousand years ago, the Sumerians counted their citizens for taxation purposes,
and at various times later, the Egyptians conducted inquiries into the occupations
of the people. In biblical times, censuses were undertaken by Moses in 1491 B.C. and
by David in 1017 B.C. The Athenians and other classical Greeks took censuses in times of
stress, carefully counting the adult male citizens in wartime and the general populace
when the food supply was endangered. The Romans registered adult males and their
property for military and administrative purposes. The sixth King of Rome from 578 to
534 B.C., Servius Tullius, was given credit for instituting the gathering of population
data. Thus the earliest uses of statistics were population records kept for
occupational inquiries, taxation, military power and administrative purposes.
In England, William the Conqueror required the compilation of information on
population and resources. This compilation, the “Domesday Book,” is the first
landmark in British statistics. In the Middle Ages, registrations of land ownership and
manpower for wars were made. In the 13th century, tax lists of the parish included the
registration of those who were subject to tax.
It was Achenwall (1719 – 1772) who first introduced the word “Statistik,” which was
later popularized as “statistics” in the books of Zimmerman and Sinclair.
European mathematicians and gamblers suspected that games of chance such as
rolled dice, playing cards and tossed coins followed certain laws of probability.
Gerolamo Cardano (1501 – 1576), an Italian mathematician, physician and gambler
who wrote “Liber de Ludo Aleae” (The Book on Chance and Games), is considered
the first known to have studied the principles of probability.
In the 18th century, statistics was used in the study entitled “Political Arrangement
of the Modern States of the Known World.” The description of the work was at first
verbal. Gradually, an increasing proportion of numerical data was used in the
description of the work.
In the 19th century, the Belgian astronomer Lambert Adolphe Quetelet applied the
theory of probability to anthropological measurements and extended the same
principle to the physiological, physical, and chemical fields. He established a central
commission for statistics, which became a model for similar organizations in other
countries.
Sir Francis Galton (1822-1911), an English scientist and cousin of Charles
Darwin, developed the use of percentiles and the correlation method and was an early
proponent of statistical analysis as applied to mental and behavioral phenomena. Karl
Pearson, a British applied mathematician and philosopher of science, was one of the
major developers of the science of statistics. He originated such basic statistical concepts
and procedures as the standard deviation, the random walk and the chi-square test.
Sir Ronald Aylmer Fisher (1890-1962), a British statistician, was the most
prominent figure in the field of statistics in the twentieth century. The F-test, named
after Fisher, is used in the analysis of variance (ANOVA) in inferential statistics. He
started investigations on experimental design, randomization and mathematical statistics.
Shortly before the Second World War, the number of applications of statistical
methods in the social sciences began to increase. The number of surveys of all kinds
increased, and the need to interpret data in mathematics, business and the social
sciences made it necessary for workers to have at least a basic understanding of
statistics.
Importance
The uses of statistics as a science have been evident both in the past and at present,
particularly in empirical studies. Among its uses and contributions are the following:
Statistics aids in the decision making in the actual condition of a field;
Statistics summarizes or describes data about the characteristic of a group; and
Statistics helps to forecast or predict future outcomes, make inferences, and
helps in comparing or establishing relationships.
Fields of Statistics
Inferential statistics – pertains to the methods dealing with making
inferences, estimates or predictions about a large set of data using the information
gathered. Commonly used inferential statistical tools are hypothesis tests using the
z-test, t-test, paired t-test and F-test, simple linear correlation, analysis of
variance (ANOVA), chi-square, regression and time series analysis.
Examples:
1. Is there a significant relationship between the grades of BSHRM in
Mathematics and English?
2. Is there any reason to doubt that hypertension is dependent on smoking?
3. Determining whether there is significant relationship between job
satisfaction and performance of BSHRM graduates in DMMMSU – MLUC.
4. Determining whether there is significant difference between On The Job
training Service Delivery of BSHRM students in San Fernando City, La
Union.
Definition of Terms
Variable – a characteristic or attribute of persons or objects which assumes
different values (numerical) or labels (qualitative).
Example: Age, Gender, Educational Attainment
Qualitative variable – characteristics expressed in words or statements
describing the categories into which units of observation are classified. Yields
categorical values.
Examples: Religious affiliation, Status of Appointment
Quantitative variable – characteristics expressed in numerals or numbers
describing the categories. Yields numerical values.
Examples: Weight, height, scores in CET
Discrete - quantitative variables that take a countable number of possible
values.
Examples: No. of Unemployed Graduates in the Philippines, Type “O”
blood donors.
Continuous - quantitative variables that can take any value within a range and
are obtained by numerical measurement.
Examples: Height of a person, weight of 10 persons.
Nominal – the crudest form of measurement. Responses are classified by
labels or names that have no ordering, and differences between them
cannot be measured.
Example: Market survey about a product:
20-like; 40-dislike; 10-no opinion
Ordinal – an improvement on the nominal level. Rankings or ordered responses
fall under this level.
Example: Performance Rating of Faculty Members in DMMMSU - MLUC:
Outstanding – Very Satisfactory – Moderately Satisfactory
Interval – possesses the properties of the nominal and ordinal levels and has
equal intervals between values, yet has no absolute zero.
Example: Sample IQ of BSHRM students in DMMMSU – MLUC.
Ratio – possesses all the properties of the nominal, ordinal, and interval but
contrary to interval, it has a starting point or absolute zero.
Example: Sample weight of babies (in kilograms)
Exercise 1
Name: Score:
Course & Year: Date:
Chapter 2. Collection of Data
Objectives
Research Methodologies
Once a research question has been determined the next step is to identify which
method will be appropriate and effective.
There are two sources of data. Primary data collection uses surveys, experiments
or direct observations. Secondary data collection (data mining) may be conducted by
collecting information from a diverse source of documents or electronically stored
information.
Document – This method identifies trends in leisure research and practice.
Participants keep diaries and journals, and the researcher conducts content analysis of
studies, reports and diaries.
Advantages:
1. It can get comprehensive and historical information.
2. It can yield an impression of how strategy operates without interrupting the
strategy.
3. The information already exists.
Challenges:
1. It often takes a lot of time.
2. Information can be incomplete.
3. Need to be clear about what you are looking for.
4. Data is restricted to what already exists.
4. Diaries – A record with discrete entries arranged by date, reporting on what
happened over the course of the day.
5. Content Analysis – Methodology in the social sciences for studying the content of
communication. It is the study of recorded human communication
such as books, websites, printings and laws.
Survey – this technique is used to quickly and easily get a lot of information
from people in a non-threatening way.
Advantages:
1. Respondents can complete it anonymously.
2. Inexpensive to administer.
3. Easy to compare and to analyze.
4. Can be administered to many people.
5. Factual and relevant information can be obtained using this method.
6. Can get a full range and depth of information.
7. Can be flexible.
Challenges:
1. Doesn’t always get the full story.
2. Might not get careful feedback.
3. Question wording can bias respondent’s answer.
4. Sometimes yield inaccurate information.
5. Can take a lot of time.
6. Can be costly.
7. Interviewer can bias the response.
Assess resources such as time, accessibility of respondents, and money factors
before conducting a research.
Determine the sample size needed. Lynch formula is the basic approach to
determine the appropriate sample.
$$n = \frac{N Z^{2}\, p(1 - p)}{N d^{2} + Z^{2}\, p(1 - p)}$$
where:
n = the sample required
N = the actual population
p = the largest possible proportion (0.50)
d = sampling error
z = the value of the normal variable (1.96)
for reliability level of 0.95
Prepare the materials that will be needed. If questionnaires will be used, their
validity (truthfulness of the instrument) and reliability (consistency of the
instrument) should be tested.
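The Lynch formula given above can be checked with a few lines of Python. This is only a minimal sketch: the function name `lynch_sample_size` and its default arguments are illustrative assumptions, and the result is rounded to the nearest whole respondent, which appears to be what the worked examples in this chapter do.

```python
def lynch_sample_size(N, z=1.96, p=0.50, d=0.05):
    """Sample size n from the Lynch formula (hypothetical helper name).

    N: actual population, z: value of the normal variable,
    p: largest possible proportion, d: sampling error.
    """
    numerator = N * z**2 * p * (1 - p)
    denominator = N * d**2 + z**2 * p * (1 - p)
    # Rounding to the nearest whole number is an assumption of this sketch.
    return round(numerator / denominator)


if __name__ == "__main__":
    for population in (3435, 8815):
        print(population, lynch_sample_size(population))
```

With N = 3435 and N = 8815 the sketch returns 346 and 368, the sample sizes used in the worked examples of this chapter.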
Sampling Techniques
Random Sampling - method of selecting a sample size (n) from a population (N)
where all possible combinations of size (n) have an equal chance of being selected as the
sample.
1. Lottery sampling – this constitutes the principle of a raffle system where
everybody is given the chance to be chosen as respondent.
Example: How will you select 5 winners out of 50 to come and see the
fight of Manny Pacquiao against Mayweather?
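One way to carry out the lottery in the example is to let a computer draw the names. The sketch below assumes the 50 entrants are simply numbered 1 to 50; the function name `lottery_sample` and the optional `seed` argument are illustrative, not part of the module.

```python
import random


def lottery_sample(population, k, seed=None):
    """Draw k distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(list(population), k)


if __name__ == "__main__":
    entrants = range(1, 51)  # assumption: entrants are numbered 1 to 50
    print(lottery_sample(entrants, 5))
```

Because the draw is made without replacement, no entrant can win twice.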
Based on the information, how can Pablo and his selected members pick
sample units proportionate to the aforementioned number of members of each
organization, fraternity and sorority?
Steps:
1.1 Determine the distribution of the population and its percentage
1.2 Use the Lynch formula to determine the sample size needed in the study.
$$n = \frac{N Z^{2}\, p(1 - p)}{N d^{2} + Z^{2}\, p(1 - p)}
= \frac{3435\,(1.96)^{2}(0.50)(0.50)}{3435\,(0.05)^{2} + (1.96)^{2}(0.50)(0.50)}
\approx 346$$
Beta Sigma with 675, UI with 970, SI with 1400, TGP / S with 365, and APO
with 200 members.
However, in North – La Union Campus, the SAS Coordinator only
provided them the copy of 4 accredited fraternities and sororities. SI with 500
members, UI with 650, Beta Sigma with 240, and TGP with 170 members.
Based on the information, how can Pablo and his selected members pick
sample units proportionate to the aforementioned number of members of each
fraternity and sorority in the different campuses?
Steps:
a. Determine the distribution of the population and its percentage
b. Use the Lynch formula to determine the sample size needed in the
study.
$$n = \frac{N Z^{2}\, p(1 - p)}{N d^{2} + Z^{2}\, p(1 - p)}
= \frac{8815\,(1.96)^{2}(0.50)(0.50)}{8815\,(0.05)^{2} + (1.96)^{2}(0.50)(0.50)}
\approx 368$$
Campus            Group       Computation    Sample
                  UI          66 * 0.42        28
                  BS          66 * 0.15        10
                  TGP         66 * 0.11         7
                  Total       66 (0.18)        66
Mid La Union      APO         144 * 0.09       13
                  SI          144 * 0.35       50
                  BS          144 * 0.21       30
                  UI          144 * 0.24       35
                  TGP/S       144 * 0.11       16
                  Total       144 (0.39)      144
South La Union    De Molay    158 * 0.05        8
                  BS          158 * 0.18       28
                  UI          158 * 0.25       40
                  SI          158 * 0.37       58
                  TGP / S     158 * 0.10       16
                  APO         158 * 0.05        8
                  Total       158 (0.43)      158
Grand Total                                   368
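The proportional allocation in the table can be sketched as follows, using the North La Union membership counts quoted earlier (SI 500, UI 650, Beta Sigma 240, TGP 170) and the campus sample of 66. The function name `proportional_allocation` is hypothetical, and rounding each share to two decimals before multiplying is an assumption made to mirror the table.

```python
def proportional_allocation(sizes, sample_size):
    """Allocate sample_size across groups in proportion to their sizes."""
    total = sum(sizes.values())
    allocation = {}
    for group, size in sizes.items():
        share = round(size / total, 2)  # e.g. 650 / 1560 -> 0.42
        allocation[group] = round(sample_size * share)
    return allocation


if __name__ == "__main__":
    north = {"SI": 500, "UI": 650, "BS": 240, "TGP": 170}
    result = proportional_allocation(north, 66)
    print(result, "total =", sum(result.values()))
```

Rounded shares do not always add up to the target, so the total should be checked and adjusted by hand when it is off by one.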
Exercise 2
Name: Score:
Course & Year: Date:
A. Identify whether the statement is a qualitative research or quantitative research. Give
also the appropriate data collection technique and specific type that can be utilized.
1. A Company Case Study on LUELCO Multi – Purpose Cooperative
Objectives
Discover the different ways of presenting data.
Identify the essential parts of a frequency distribution.
Learn to construct frequency distribution.
Identify and present data using Microsoft Excel.
Textual Presentation
The presentation is in narrative or paragraph form. The data are within the text
of the paragraph. This form may not get the immediate interest of the reader.
Examples:
1. Latest Survey conducted by the National Statistics Office revealed that
there were already 2.9 million unemployed graduates accumulated from 2005 –
2007.
2. The continuing decrease in the price of crude oil and gasoline in the
Philippines is simply the effect of the low price of these products in the world
market.
Tabular Presentation
Parts of a Table:
1. Table heading – it consists of a table number and a brief title.
2. Stubs – it consists of the classifications or categories that are found on the
left side of the body of the table.
3. Box head – it identifies what is contained in each column.
4. Body – it is the main part of the table and contains the substance of the
table.
5. Footnote – It is used for citations and references.
Types:
1. General or Reference Table – a repository of information and the main
purpose is to present data in such a way that individual items may easily be
found by a reader.
Example:
Table 1. College of Arts and Management Faculty Profile
as to Educational Attainment
                   Males   Females   Total
Ph.D., Ed.D.         0        5        5
Master’s Degree      3        9       12
BS/AB                2       11       13
Total                5       25       30
a. What percent of the total population (both males and females) obtained a
master’s degree? 40%
b. What percent of the male population holds a BS/AB degree?
40%
c. How many faculty members (males and females) hold the dominant educational
attainment in the College of Arts and Management, and what percentage of the
faculty do they represent? 13 & 43.33%
2. Summary or text table – usually small in size and designed to guide the reader
in analyzing the data.
Example:
Table 2. Population of Students in the College of Arts
and Management for S/Y 2007-2008
Programs 1st Year 2nd Year 3rd Year 4th Year Total
BSM 120 100 70 68 358
AB 70 60 50 40 220
1. Determine the range by obtaining the difference between the highest score and the
lowest score.
2. Determine the ideal number of class intervals or categories, somewhere
between 5 and 20.
3. Determine the appropriate class size or class width by dividing the range by the
desired number of categories.
4. Write the class intervals starting from the lowest lower limit, provided that it is
divisible by the class width.
5. Determine the class frequencies after tabulation, referring to the tally column.
6. Assign a representative for each category by computing the class mark, which is
obtained by dividing the sum of the upper and lower limits of the class interval by 2.
Example
A. Construct frequency distribution. The following scores are the results in Statistics
examination.
88 77 72 85 90 20 25 60 45 77
50 62 76 56 42 24 21 40 41 78
61 67 35 29 78 87 84 90 64 58
79 74 69 66 61 68 56 51 48 39
27 81 75 71 67 63 57 53 68 85
80 75 70 63 57 52 49 44 33 23
Steps:
1. Range = 90 – 20 = 70
2. No. of Class Intervals = 7
3. i = 70 / 7 = 10
a. How many of the students and its percentage got a score below 59.5?
26 students & 43%
b. What percent of the total population obtained the highest frequency?
21.67%
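The tally for this example can be reproduced with a short Python sketch. It assumes the classes start at 20 with a class width of 10, as in the steps above; because the highest score, 90, opens a new class, the sketch prints eight rows. The function name `frequency_distribution` is illustrative only.

```python
scores = [
    88, 77, 72, 85, 90, 20, 25, 60, 45, 77,
    50, 62, 76, 56, 42, 24, 21, 40, 41, 78,
    61, 67, 35, 29, 78, 87, 84, 90, 64, 58,
    79, 74, 69, 66, 61, 68, 56, 51, 48, 39,
    27, 81, 75, 71, 67, 63, 57, 53, 68, 85,
    80, 75, 70, 63, 57, 52, 49, 44, 33, 23,
]


def frequency_distribution(data, start, width):
    """Return rows of (lower, upper, f, class mark, <cf) for integer scores."""
    rows, lower, cumulative = [], start, 0
    while lower <= max(data):
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)
        cumulative += f
        rows.append((lower, upper, f, (lower + upper) / 2, cumulative))
        lower += width
    return rows


if __name__ == "__main__":
    n = len(scores)
    for lower, upper, f, mark, cf in frequency_distribution(scores, 20, 10):
        print(f"{lower}-{upper}  f={f:2d}  M={mark}  <cf={cf:2d}  %={100 * cf / n:.2f}")
```

The cumulative frequency of the 50-59 class answers question (a), and the largest class frequency divided by 60 answers question (b).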
Exercise 3
Name: Score:
Course & Year: Date:
A. The following shows the list of transportation means of sample 50 BSBA students in
going to DMMMSU – MLUC.
25 40 42 25 39 43 36
28 36 48 34 46 39 49
29 44 36 25 39 40 53
32 35 29 35 40 48 37
11 – 30.
CI CB Tally f M <cf %(<cf)
Graphical Presentation
The most convenient and popular way of describing data is using graphical
presentation.
Advantages:
• It is easier to understand and interpret data when they are presented
graphically than when using words or a frequency table.
• It can present data in a simple and clear way.
• It can illustrate the important aspects of the data.
This leads to better analysis and presentation of the data. In this topic, we will
discuss the approach for the most commonly used graphical methods such as bar
charts, histograms, frequency polygons, pie chart and xy scatter diagram.
Classifications
Bar charts
Bar charts are used when comparing the values of multiple variables. Bar charts
are presented using vertical or horizontal bars. Bars may be drawn separately from
each other. It is important that the width of each bar be the same to avoid giving
a misleading impression. Table 1 presents the population of staff in DMMMSU.
Figure 1 is an example of a simple bar chart. The bars reflect the actual
magnitude of the frequency of each item, so frequencies can be compared by
comparing the heights of the bars on the chart. The chart shows that DMMMSU – MLUC
has the most staff and SRDI has the fewest.
Table 2 presents the population of male and female staff in five operating units in
DMMMSU.
Table 2. Number of Male and Female Staff in DMMMSU
Operating Agency Male Female N
NLUC 90 120 210
MLUC 100 200 300
SLUC 70 100 170
SRDI 40 60 100
Apiculture 50 70 120
Total 350 550 900
Histograms
Histograms or column bar charts are common ways of presenting frequency in a
number of categories. Commonly used graphical presentation methods also include the
frequency polygon and ogive. Histograms portray an unequal width frequency
distribution table for further statistical use. The bars appear in a histogram where the
classes are marked on the x axis and the class frequencies on the y axis. It is important
to note that a bar chart does not have x-axis units. The histogram is constructed by
creating x-axis units of equal size and these should correspond to the frequency table.
Based on the histogram in Figure 3, we can conclude the following:
1. The lowest score got by the students in the test was 20 and the highest was 90.
2. The class with the highest frequency is 60 – 69, with 13 observations, while the
class with the lowest frequency is 30 – 39, with 3 observations.
3. The majority (72%) of the students passed, while 28% failed.
Pie Chart
This presentation is best used when the total categories are between 2 to 6. A
pie chart shows the proportional size of items that make up a data series to the sum of
the items. It always shows only one data series and is useful when you want to
emphasize a significant element. To make small slices easier to see, you can group them
together as one item in a pie chart and then break down that item in a smaller pie or
bar chart next to the main chart. Based on Table 2, the chart shows that DMMMSU – MLUC
has the most male staff among the five operating agencies and SRDI has the
fewest.
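The size of each pie slice follows directly from the proportions. The sketch below takes the male staff counts from Table 2 and converts each to a percentage of the total and a central angle out of 360 degrees; the function name `pie_slices` is an illustrative assumption.

```python
male_staff = {"NLUC": 90, "MLUC": 100, "SLUC": 70, "SRDI": 40, "Apiculture": 50}


def pie_slices(counts):
    """Return {category: (percent of total, central angle in degrees)}."""
    total = sum(counts.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in counts.items()
    }


if __name__ == "__main__":
    for name, (percent, angle) in pie_slices(male_staff).items():
        print(f"{name:<11}{percent:6.2f}%  {angle:6.1f} degrees")
```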
Other classifications:
Area – an area chart emphasizes the magnitude of change over time. By displaying
the sum of the plotted values, an area chart also shows the relationship of parts to a
whole. See Table 3.
XY (Scatter) – xy (scatter) chart either shows the relationships among the numeric
values in several data series or plots two groups of numbers as one series of xy
coordinates. This chart shows uneven intervals – or clusters – of data and is
commonly used for scientific data. When you arrange your data, place x values in
one row or column, and then enter corresponding y values in the adjacent rows or
columns. See Table 9.
Doughnut – like a pie chart, a doughnut chart shows the relationship of parts to a
whole, but it can contain more than one data series. Each ring of the doughnut
chart represents a data series. See Table 10.
Stock – the high – low- close chart is often used to illustrate stock prices. This chart
can also be used for scientific data, for example, to indicate temperature changes.
You must organize your data in the correct order to create this and other stock
charts. See Table 11.
Bubble – a bubble chart is a type of xy (scatter) chart. The size of the data marker
indicates the value of a third variable. To arrange your data, place the x values in
one row or column, and enter corresponding y values and bubble sizes in the
adjacent rows or columns. See Table 12.
Radar - in a radar chart, each category has its own value axis radiating from the
center point. Lines connect all the values in the same series. A radar chart
compares the aggregate value of a number of data series. See Table 13.
Surface – a surface chart is useful when you want to find optimum combinations
between two sets of data. As in a topographic map, colors, and patterns indicate
areas that are in the same range of values. See Table 14.
Cone, Cylinder and Pyramid – the cone, cylinder and pyramid data markers can
lend a dramatic effect to 3-D column and bar charts. See Table 15, 16, & 17.
Exercise 4
Name: Score:
Course & Year: Date:
A. The following data have been obtained for the number of customers arriving per hour
in a sample of 30 supermarkets.
52 29 26 29 32 26
62 49 40 25 25 31
44 24 27 38 73 32
27 32 28 30 57 34
39 37 28 24 51 50
b. Give at least three conclusions that can be drawn from the presentation.
b.1
b.2
b.3
Travel Expenses Php3,000,000
Materials Php2,500,000
Other Expenses Php1,500,000
Total Php60,000,000
Objectives
The study of the descriptive statistics is not complete without the inclusion of
the concept of a measure of the “Central Location.” It is the tendency of the
observations to converge at a point or at the center of a frequency distribution.
There are three measures of central location widely used in descriptive statistics;
the mean, median, and mode, each of which has its own appropriate use in describing
the sample or population under study.
Mean
Of the three measures, the mean is the most basic to higher statistical computation,
since it varies less from sample to sample. It is more reliable because all the data
in the distribution are used in computing it.
Characteristics:
1. It is the average value in the given distribution.
2. It serves as the balance point in the distribution.
3. It is the most sensitive measure of location.
4. It is always affected by extreme values.
5. It is used when data is categorized as interval or ratio.
Computations:
1. Ungrouped Data
$$\bar{x} = \frac{\sum_{i=1}^{n} X_i}{N}$$
Where:
$\bar{x}$ = the unknown mean.
$X_i$ = scores or observations.
$\sum_{i=1}^{n} X_i$ = summation or total of items or observations.
$N$ = number of observations (population).
2. Grouped Data
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i M_i}{N} \qquad \text{(Long Method)}$$
Where:
$\bar{x}$ = the unknown mean.
$f_i$ = number of items or observations (frequency) in each category.
$M_i$ = midpoint or representative of each category.
$\sum_{i=1}^{n} f_i M_i$ = summation or total of the product of frequency and
midpoint for each category.
$N$ = number of population.
Alternative Formula:
$$\bar{x} = x' + \left(\frac{\sum_{i=1}^{n} f_i d_i'}{N}\right) i \qquad \text{(Coded Deviation Method)}$$
Where:
$\bar{x}$ = the unknown mean.
$x'$ = the assumed mean.
$f_i$ = number of items or observations.
$d_i'$ = coded deviation of each category from the assumed mean.
$\sum_{i=1}^{n} f_i d_i'$ = summation or total of the product of the frequency and
coded deviation of each category.
$i$ = class size or class width.
$N$ = number of population.
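Both grouped-data formulas can be compared in a short Python sketch. The class marks (27 to 52) and frequencies (6, 2, 9, 6, 4, 1) are those of the grouped example used elsewhere in this module, where ΣfM = 1051 and N = 28; the function names, and the choice of 37 as assumed mean with class width 5, are taken as working assumptions.

```python
def grouped_mean_long(marks, freqs):
    """Long method: sum of f * M divided by N."""
    return sum(f * m for f, m in zip(freqs, marks)) / sum(freqs)


def grouped_mean_coded(marks, freqs, assumed_mean, width):
    """Coded deviation method: x' + (sum of f * d' / N) * i."""
    coded = [(m - assumed_mean) / width for m in marks]  # d' for each class
    total_fd = sum(f * d for f, d in zip(freqs, coded))
    return assumed_mean + total_fd / sum(freqs) * width


if __name__ == "__main__":
    marks = [27, 32, 37, 42, 47, 52]
    freqs = [6, 2, 9, 6, 4, 1]
    print(round(grouped_mean_long(marks, freqs), 2))
    print(round(grouped_mean_coded(marks, freqs, assumed_mean=37, width=5), 2))
```

The two methods agree; the coded deviation method only saves arithmetic when the work is done by hand.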
Weighted Mean
There are some instances when some values or items are taken with greater
importance than others. Computation of the average in this situation makes use of
weights. Sometimes it is applicable when data are presented by ranking.
1. For Grouped and Ungrouped Data
$$\bar{x}_w = \frac{\sum_{i=1}^{n} W_i X_i}{W_t}$$
Where:
$\bar{x}_w$ = the unknown weighted mean.
$W_i$ = corresponding weights.
$X_i$ = scores or observations.
$\sum_{i=1}^{n} W_i X_i$ = summation of the product of weights times observations.
$W_t$ = total of the weights.
Examples
Interpretation: The average or typical number of incorrect answers on a true – false
competency test obtained by the 15 sample students is 3.
2. What is the average length of lives of 10 sample car batteries that lasted with the
following (in years): 1.5, 1.7, 2.1, 2.3, 3, 3.5, 5.3, 5.7, 6.2, & 6.6?
Computation: (Using the Formula)
$$\bar{x} = \frac{\sum_{i=1}^{n} X_i}{N} = \frac{37.9}{10} = 3.79$$
Computation: (Using ES PLUS CASIO)
mode, stat, 1 – var, 1.5 =, 1.7 =, 2.1 =, 2.3 =, 3 =, 3.5 =, 5.3 =, 5.7 =,
6.2 =, 6.6, AC, shift, 1, var, x , then =. The mean is the same as 3.79.
Computation: (Using MS EXCEL)
select a vacant cell, type =AVERAGE(, highlight the scores, type ), then press Enter.
The mean is the same as 3.79.
Interpretation: The mean life of the 10 sample car batteries is 3.79 years.
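The same computation can also be checked in Python with the standard `statistics` module. This is offered only as an additional cross-check alongside the calculator and Excel routes; the variable name `battery_lives` is illustrative.

```python
import statistics

# Lives (in years) of the 10 sample car batteries from the example
battery_lives = [1.5, 1.7, 2.1, 2.3, 3, 3.5, 5.3, 5.7, 6.2, 6.6]

mean_life = statistics.mean(battery_lives)

if __name__ == "__main__":
    print(round(mean_life, 2))
```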
3. The following are the responses of the 60 students (two sections) in statistics class
about the level of effectiveness of the programmed material in statistics after the
experiment.
Level of Effectiveness
Very Effective (5)   Effective (4)   Moderate (3)   Slight (2)   Negligible (1)
        14                27              10             7             2
Computation: (Using the formula)
$$\bar{x}_w = \frac{\sum_{i=1}^{n} W_i X_i}{W_t} = \frac{224}{60} = 3.73$$
Computation: (Using the ES PLUS CASIO)
mode, stat, 1 var, shift, mode, arrow down, stat, 1, for x column (5 =, 4 =, 3 =,
2 =, 1 =), for freq column (14 =, 27 =, 10 =, 7 =, 2 =), AC, shift, 1, var, x , then =.
The weighted mean is the same as 3.73.
Interpretation: The result could be analyzed that the programmed material in statistics
is somewhat close to be effective and favorable to the majority of the students.
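The weighted mean in this example can be verified with a few lines of Python. The sketch treats the number of responses at each level as the weight of that rating; the function name `weighted_mean` is an illustrative assumption.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the total of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    ratings = [5, 4, 3, 2, 1]         # Very Effective ... Negligible
    responses = [14, 27, 10, 7, 2]    # number of students per rating
    print(round(weighted_mean(ratings, responses), 2))
```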
$$\bar{x} = x' + \left(\frac{\sum_{i=1}^{n} f_i d_i'}{N}\right) i
= 37 + \left(\frac{3}{28}\right)(5) = 37.54 \text{ or } 38 \qquad \text{(Coded Deviation Method)}$$
mode, stat, 1 var, shift, mode, arrow down, stat, 1, for x column (27 =, 32 =, 37 =,
42 =, 47 =, 52 =), for freq column (6 =, 2 =, 9 =, 6 =, 4 =, 1 =), AC, shift, 1, var,
x̄, then =. The mean is the same as 37.54 or 38.
Interpretation: The teller machine can accommodate and process an average of
38 transactions in a day.
Median
Unlike the mean, the median is not easily affected by extreme values, since only
the middle term or terms of the values, arranged in increasing or decreasing order, are
considered in the computation. For research studies associated with ordinal data, the
median is an applicable and stable measure to use.
Characteristics:
1. It divides the distribution into two equal parts.
2. It is not amenable for further computation since middle terms are being
considered.
3. It is not affected by extreme values.
4. It is used when data is categorized as ordinal or ranking.
Computation:
1. Ungrouped Data
1. a When n is an odd number
$$\tilde{x} = x_{\frac{1+n}{2}}$$
Where:
$\tilde{x}$ = the unknown median.
$n$ = the number of items or observations.
Examples
A. Solve what is being asked for each number.
1. A food inspector examined a random sample of 5 cans of a certain brand of canned
goods to determine percent of impurities. The following data were recorded, 1.2, 1.8,
0.8, 1.3, and 1.8. Find the median.
Computation:
$$\tilde{x} = x_{\frac{1+5}{2}} = x_3$$
Interpretation: After arranging the observations in ascending order
(0.8, 1.2, 1.3, 1.8, 1.8), the third item, 1.3, is the median.
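The position rule for an odd number of observations can be sketched in Python as below. The helper name `median_odd` is hypothetical; the built-in `statistics.median` is shown only as a cross-check.

```python
import statistics


def median_odd(data):
    """Median for an odd number of observations: the ((1 + n) / 2)-th item."""
    if len(data) % 2 == 0:
        raise ValueError("this sketch only covers an odd number of observations")
    ordered = sorted(data)
    position = (1 + len(ordered)) // 2  # 1-based position of the middle item
    return ordered[position - 1]


if __name__ == "__main__":
    impurities = [1.2, 1.8, 0.8, 1.3, 1.8]
    print(median_odd(impurities), statistics.median(impurities))
```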
Mode
The mode, although easy to compute, is seldom used because of its unstable
characteristic. However, it is a more appropriate measure of central location for data
which call for a nominal scale, as a measure of popularity.
Characteristics:
1. Computation depends on frequency of occurrence, that is, on the value that
appears most frequently in the distribution.
2. It is appropriate to use when the distribution is bimodal.
3. It is used when data is categorized as nominal level of measurement.
Computation:
1. Ungrouped Data
Select the score that appears most frequent in the distribution.
2. Grouped Data
$$\hat{x} = L_{CB} + \left(\frac{d_1}{d_1 + d_2}\right) i$$
Where:
$\hat{x}$ = the unknown mode.
$L_{CB}$ = lower class boundary of the modal class.
$d_1$ = difference of the frequency of the modal class minus the frequency
of the class that precedes it.
$d_2$ = difference of the frequency of the modal class minus the frequency
of the class that follows it.
$i$ = class size or class width.
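The grouped-data formula for the mode can be sketched as follows. The data are the class intervals 25–29 to 50–54 with frequencies 6, 2, 9, 6, 4, 1 used in the grouped examples of this module. The function name `grouped_mode` is hypothetical, and the lower class boundary is assumed to lie 0.5 below the lower class limit, which holds for integer-valued classes.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: L_CB + d1 / (d1 + d2) * i."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # first modal class
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    lower_boundary = lower_limits[k] - 0.5  # assumption: integer class limits
    return lower_boundary + d1 * width / (d1 + d2)


if __name__ == "__main__":
    lower_limits = [25, 30, 35, 40, 45, 50]
    freqs = [6, 2, 9, 6, 4, 1]
    print(grouped_mode(lower_limits, freqs, width=5))
```

Here the modal class is 35–39, so d1 = 9 − 2 = 7 and d2 = 9 − 6 = 3, giving 34.5 + (7/10)(5) = 38.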
Examples
Exercise 5
Name: Score:
Course & Year: Date:
35 – 39 2
40 – 44 2
45 – 49 4
50 – 54 4
55- 59 8
60 – 64 3
65 – 69 5
70 – 74 7
75 – 79 6
80 – 84 1
85 – 89 5
90 – 94 1
95 – 99 1
N = 50
1 – 10. Compute the mean value using the formula method or ES PLUS CASIO and
interpret the result.
Computation:
Interpretation:
Interpretation:
21 – 30. Compute the value of the mode and interpret the result.
Computation:
Interpretation:
B. Apply the appropriate measure to describe the level of satisfaction of 100 faculty
members in DMMMSU in terms of promotion and interpret the result as well. The following
data were recorded:
Level of Satisfaction
5 4 3 2 1
15 20 42 14 9
31 – 40. Compute the level of satisfaction of 100 faculty members in DMMMSU in terms
of promotion.
Computation:
Interpretation:
Quantiles
Quantiles are an extension of the median concept. They give a complete
breakdown of the distribution into percentiles, deciles and quartiles.
Classifications:
1. Percentile – It is the value that divides the distribution into 100 equal
parts.
2. Decile – It is the value that divides the distribution into 10 equal parts.
3. Quartile – It is the value that divides the distribution into 4 equal parts.
Examples
A. Using the same example of efficiency ratings of faculty members in DMMMSU – MLUC
as reflected on Table 20. Find the following:
1. Q3
Computation:
$$Q_3 = L_{CB} + \left(\frac{\frac{3N}{4} - PS}{f_{Q_3}}\right) i
= 74.5 + \left(\frac{37.5 - 36}{6}\right)(5) = 75.75 \text{ or } 76$$
Interpretation: Seventy-five percent of the faculty members obtained a rating of
75.75 (or 76) or lower.
2. Top management decides to promote the upper 5% of the distribution according to
efficiency rating. What is the lowest efficiency rating included in the promotion?
Computation:
$$P_{96} = 84.5 + \left(\frac{48 - 43}{5}\right)(5) = 89.5 \text{ or } 90$$
Interpretation: The lowest efficiency rating of a faculty member that will be included in
the promotion is 89.5 or 90. Thus, a faculty member will be promoted if his/her rating
is 90 and above.
3. However, due to cost – cutting measures, the company decided to retrench the lowest
10% of the distribution. The group is believed to have the least contribution in the
production of the company. What is the highest efficiency rating included in this range?
Computation:
$$D_1 = 39.5 + \left(\frac{5 - 3}{2}\right)(5) = 44.5 \text{ or } 44$$
Interpretation: The highest efficiency rating of a faculty member included in the lowest
10% performer and will be retrenched is 44.5 or 44. Thus, a faculty member will be
terminated if his/her rating is 44 and below.
4. What is the percentile rank of an employee with an efficiency rating of 80?
Computation:
$$80 = 79.5 + \left(\frac{0.5j - 42}{1}\right)(5), \qquad j = 84.2$$
Interpretation: The corresponding percentile rank of an employee with an efficiency
rating of 80 is 84.2.
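The quantile computations above all use the same interpolation, which can be sketched in Python. The function names are hypothetical; the inputs (lower class boundary, partial sum PS below the class, class frequency, class width 5 and N = 50) are the figures quoted in the worked computations.

```python
def grouped_quantile(position, lower_boundary, partial_sum, freq, width):
    """Value at a cumulative position: L_CB + (position - PS) / f * i."""
    return lower_boundary + (position - partial_sum) / freq * width


def percentile_rank(score, lower_boundary, partial_sum, freq, width, n):
    """Percent of cases at or below a score, by the same interpolation."""
    cases_below = partial_sum + (score - lower_boundary) / width * freq
    return 100 * cases_below / n


if __name__ == "__main__":
    n = 50
    print(grouped_quantile(3 * n / 4, 74.5, 36, 6, 5))   # third quartile
    print(grouped_quantile(n / 10, 39.5, 3, 2, 5))       # first decile
    print(round(percentile_rank(80, 79.5, 42, 1, 5, n), 1))
```

The percentile rank is simply the quantile formula solved for the position and expressed as a percentage of N.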
Exercise 6
Name: Score:
Course & Year: Date:
A. Using the same example of efficiency ratings of faculty members in DMMMSU – MLUC
as reflected on Table 20. Find the following:
1 – 5. P55
Computation:
Interpretation:
6 – 10. Q3
Computation:
Interpretation:
Interpretation:
16 – 20. What is the corresponding rank of the employee if his/her efficiency rating is
72?
Computation:
Interpretation:
Objectives
Determine to use the measures of variability.
Compute the measures of skewness and kurtosis.
Range
The range of a set of data is the simplest of the measures of dispersion and
could easily be solved by simply getting the difference between the highest value and
the lowest value in a distribution.
Characteristics:
1. Easiest to compute and to understand.
2. It is dependent only upon the two extreme values.
3. It provides the least satisfactory conclusion about the population.
Computations:
1. Ungrouped Data
R = H.S – L.S
2. Grouped Data
R = upper limit of the highest class interval – lower limit of the lowest class interval
Examples
Quartile Deviation
1. $$Q_3 = L_{CB} + \left(\frac{\frac{3N}{4} - PS}{f_{Q_3}}\right) i$$
2. $$Q_1 = L_{CB} + \left(\frac{\frac{N}{4} - PS}{f_{Q_1}}\right) i$$
Examples
The mean deviation, or more accurately the mean absolute deviation, is defined as the
average of the absolute deviations from the mean. Generally, the dispersion of a set of
data is said to be small if the values are close to the mean, and large if the values are
scattered widely about the mean.
Characteristics:
1. It is based on the deviation of each score or representative from the mean.
2. It is a more stable measure of dispersion than the range and QD, for it
describes the dispersion of all the scores in the distribution from the mean.
Computations:
1. Ungrouped Data
$$MAD = \frac{\sum |X - \bar{X}|}{N}$$
Where:
MAD = Mean absolute deviation.
$X$ = Observed scores.
$\bar{X}$ = value of the mean.
$N$ = Number of cases.
2. Grouped Data
$$MAD = \frac{\sum f\,|X - \bar{X}|}{N}$$
Where:
MAD = Mean absolute deviation.
$f$ = Observed frequency.
$X$ = Representative of each category.
$\bar{X}$ = Value of the mean.
$N$ = Number of cases.
Examples
X      $X - \bar{X}$    $|X - \bar{X}|$
39     9.14      9.14
31     1.14      1.14
27     -2.86     2.86
34     4.14      4.14
39     9.14      9.14
19     -10.86    10.86
20     -9.86     9.86
$\sum |X - \bar{X}|$ = 47.14
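CI         f    M    fM    $X - \bar{X}$    $|X - \bar{X}|$    $f|X - \bar{X}|$
25 – 29    6    27   162   -10.54    10.54    63.24
30 – 34    2    32   64    -5.54     5.54     11.08
35 – 39    9    37   333   -0.54     0.54     4.86
40 – 44    6    42   252   4.46      4.46     26.76
45 – 49    4    47   188   9.46      9.46     37.84
50 – 54    1    52   52    14.46     14.46    14.46
N = 28     $\sum fM$ = 1051     $\sum f|X - \bar{X}|$ = 158.24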
Interpretation: On average, the scores deviate from the mean of 37.54 by 5.65.
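The grouped computation above can be checked with a short script. This is a minimal sketch, assuming the class frequencies and midpoints of the grouped table used in this section (class intervals 25 – 29 through 50 – 54); the variable names are illustrative only.

```python
# Grouped mean absolute deviation: MAD = sum(f * |M - mean|) / N
freqs = [6, 2, 9, 6, 4, 1]            # f, class frequencies
midpoints = [27, 32, 37, 42, 47, 52]  # M, class representatives

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
mad = sum(f * abs(m - mean) for f, m in zip(freqs, midpoints)) / n

print("mean =", round(mean, 2))
print("MAD  =", round(mad, 2))
```

Rounded to two decimals, the script should agree with the mean and the mean absolute deviation quoted in the interpretation.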
Variance and standard deviation are the measures of dispersion that are carried
into higher statistics. They are applied in making inferences about the
consistency of the population or sample to be studied.
Characteristics:
1. Either of the two measures can be applied for statistical inferences.
2. It is the dispersion of each observation relative to the mean of the set of
scores.
Computations:
1. Ungrouped Data
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$   (sample standard deviation)
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$   (sample variance)
Where:
s = sample standard deviation.
x = observed score.
$\bar{x}$ = the mean value.
n = number of cases.
2. Grouped Data
$s = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{n - 1}}$   (sample standard deviation)
$s^2 = \dfrac{\sum f (x - \bar{x})^2}{n - 1}$   (sample variance)
Where:
s = sample standard deviation
f = class frequency
x = midpoint or representative of each category
$\bar{x}$ = the mean value
n = the number of cases
Examples
X      $X - \bar{X}$    $(x - \bar{x})^2$
2      -3.43     11.76
3      -2.43     5.90
4      -1.43     2.04
5      -0.43     0.18
6      0.57      0.32
8      2.57      6.60
10     4.57      20.88
$\sum x$ = 38
$\bar{x}$ = 5.43
$\sum (x - \bar{x})^2$ = 47.71
CI         f    M    fM    $X - \bar{X}$    $(x - \bar{x})^2$    $f(x - \bar{x})^2$
25 – 29    6    27   162   -10.54    111.09    666.55
30 – 34    2    32   64    -5.54     30.69     61.38
35 – 39    9    37   333   -0.54     0.29      2.62
40 – 44    6    42   252   4.46      19.89     119.35
45 – 49    4    47   188   9.46      89.49     357.97
50 – 54    1    52   52    14.46     209.09    209.09
N = 28     $\sum fM$ = 1051     $\sum f(x - \bar{x})^2$ = 1416.96
Computation: (Using the formula)
$s = \sqrt{\dfrac{1416.96}{27}} = 7.24$
$s^2 = 52.48$
Computation: (Using ES PLUS CASIO)
Mode, stat, 1 – var, shift, mode, arrow down, stat, 1, for x column (-10.54 =,
-5.54 =, -0.54 =, 4.46 =, 9.46 =, 14.46 =), for freq column (6 =, 2 =, 9 =, 6 =,
4 =, 1 =), AC, shift, 1, var, sx, then =. The standard deviation is the same as 7.24.
Taking the square, the variance is the same as 52.48.
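As a cross-check on both the formula and the calculator procedure, the grouped variance and standard deviation can also be sketched in Python. The frequencies and midpoints are those of the grouped table above; the variable names are assumptions made for illustration.

```python
import math

# Grouped sample variance: s^2 = sum(f * (M - mean)^2) / (n - 1)
freqs = [6, 2, 9, 6, 4, 1]            # f, class frequencies
midpoints = [27, 32, 37, 42, 47, 52]  # M, class representatives

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))

variance = sum_sq / (n - 1)   # sample variance
sd = math.sqrt(variance)      # sample standard deviation

print("variance =", round(variance, 2))
print("sd       =", round(sd, 2))
```

The divisor is n – 1 because the formulas in this section are for a sample, not a population.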
Exercise 7
Name: Score:
Course & Year: Date:
A. The following are the IQ scores of 8 selected BSHRM students and 8 BSBA students in
the College of Arts and Management:
BSBA BSHRM
73 70
80 77
81 81
83 83
85 83
90 84
92 90
95 91
1 – 15. Which of the two groups has the more consistent distribution of IQ scores? Apply the
appropriate measure to compare the IQ scores using the formula or ES PLUS CASIO.
Computation:
Interpretation:
B. The distribution below summarizes the results of the 50-item test of 60 students in
Statistics.
Number of Correct Answers frequency
6 - 10 1
11 – 15 5
16 – 20 8
21 – 25 8
26 – 30 8
31 – 35 12
36 – 40 9
41 – 45 4
46 – 50 5
16 – 25. Compute for the variance and standard deviation using the formula or ES
PLUS CASIO and interpret the result as well.
Computation:
Interpretation:
Skewness
Skewness can be computed using the Pearsonian Coefficient of Skewness formula:
$SK = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
Interpretation:
SK = 0 (Perfectly Symmetrical)
SK > 0 (Frequency polygon is skewed to the right)
SK < 0 (Frequency polygon is skewed to the left)
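A minimal sketch of the Pearsonian coefficient in Python follows. It reuses the seven ungrouped scores from the variance example (2, 3, 4, 5, 6, 8, 10) purely for illustration; that choice of data, and the variable names, are assumptions and not part of the original example.

```python
import statistics

# Pearsonian coefficient of skewness: SK = 3 * (mean - median) / s
scores = [2, 3, 4, 5, 6, 8, 10]

mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)   # sample standard deviation (n - 1 divisor)

sk = 3 * (mean - median) / sd

if sk > 0:
    shape = "skewed to the right"
elif sk < 0:
    shape = "skewed to the left"
else:
    shape = "perfectly symmetrical"

print(round(sk, 3), shape)
```

Because the mean of these scores is larger than the median, SK is positive and the distribution is skewed to the right.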
Example
Kurtosis
Types of Kurtosis
Formulas:
1. Ungrouped Data
$K = \dfrac{\sum (x - \bar{x})^4}{n s^4}$
Where:
x = score
$\bar{x}$ = mean value
n = sample size
s = standard deviation
2. Grouped Data
$K = \dfrac{\sum f (x - \bar{x})^4}{n s^4}$
Where:
x = representative of each category
$\bar{x}$ = mean value.
n = sample size
s = standard deviation
Examples
1. With reference to the student who was investigating the effect of synthetic fertilizer on
the growth of peanut seedlings, compute the kurtosis and analyze the result.
Computation:
$K = \dfrac{\sum (x - \bar{x})^4}{n s^4} = \dfrac{882}{8(88.92)} = 1.24$
Interpretation: The value indicates that the distribution is platykurtic; that is,
the effects of synthetic fertilizer on the growth of the peanut seedlings are more
dispersed or scattered from one another.
$K = \dfrac{\sum f (x - \bar{x})^4}{n s^4} = \dfrac{154061.1}{28(2747.605)} = 2.002$
Interpretation: The distribution is platykurtic; therefore, the numbers of
transactions that can be made in a day are more dispersed or scattered
from one another.
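The grouped kurtosis formula can be sketched the same way. The snippet below assumes the grouped table used earlier for the variance (frequencies 6, 2, 9, 6, 4, 1 with midpoints 27 to 52), since its sample size and variance match the figures in the second computation. It also assumes the usual reference value of 3 for a mesokurtic (normal) curve under this formula.

```python
# Grouped kurtosis: K = sum(f * (M - mean)^4) / (n * s^4)
freqs = [6, 2, 9, 6, 4, 1]
midpoints = [27, 32, 37, 42, 47, 52]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
fourth = sum(f * (m - mean) ** 4 for f, m in zip(freqs, midpoints))

k = fourth / (n * variance ** 2)   # s^4 is the variance squared

# Assumed classification rule: K < 3 platykurtic, K = 3 mesokurtic, K > 3 leptokurtic
kind = "platykurtic" if k < 3 else ("leptokurtic" if k > 3 else "mesokurtic")
print(round(k, 2), kind)
```

Small differences from the hand computation come from rounding s to 7.24 before raising it to the fourth power.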
Exercise 8
Name: Score:
Course & Year: Date:
1 - 10. With reference to the IQ’s of 8 selected students in the BSHRM and 8 in the BSBA
Department, compute the skewness and analyze the result.
Computation:
Interpretation:
Interpretation:
Objectives
Explain and differentiate the concepts of permutations and combinations, and
solve problems involving them.
Explain and understand the theoretical and experimental concepts of
probability.
Enumerate and explain the characteristics of a normal curve.
Transform a raw score into a standard score (z – score).
Determine the areas under the normal curve.
Multiplicative Rule
Listing and counting the elements of a sample space is appropriate for simple
experiments. For more complicated experiments we can make use of the Multiplicative
Counting Rule.
Fundamental Principle: If a thing can be done in any of m different ways and then a
second thing can be done in any of n different ways, it follows that the total number of
different ways in which both things can be done is m times n.
Examples
S1 = {A, B, C}
S2 = {X, Y, Z, W}
Outcomes: {AX, AY, AZ, AW, BX, BY, BZ, BW, CX, CY, CZ, CW}
2. How many two – digit numbers can be made with the digits 2, 4, 6, and 8 if (a)
repetitions are allowed; (b) no digit is to be repeated in a number?
Computation: (Using the Listing Method)
(a) There are four choices for the tens’ digit, and after a choice is made, there
remain four choices for the units’ digit since repetitions of digits are allowed. Therefore,
the total number of possibilities is 4 x 4 = 16.
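The multiplicative rule can be verified by listing. The sketch below uses Python's standard `itertools` module to enumerate the outcomes of both examples; the variable names are illustrative assumptions.

```python
from itertools import permutations, product

# Example 1: every pairing of one element of S1 with one element of S2 (3 x 4 outcomes)
s1 = "ABC"
s2 = "XYZW"
pairs = [a + b for a, b in product(s1, s2)]

# Example 2: two-digit numbers from the digits 2, 4, 6, 8
digits = "2468"
with_repetition = ["".join(p) for p in product(digits, repeat=2)]   # 4 x 4
without_repetition = ["".join(p) for p in permutations(digits, 2)]  # 4 x 3

print(len(pairs), len(with_repetition), len(without_repetition))
```

Without repetition the second digit has only three choices left, so the count drops from 4 x 4 to 4 x 3.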
Permutation
Examples
Vacant cell, =, permut, (, 6, 5, ), then enter. The total number of permutation is the
same as 720.
2. In how many ways can 6 BSBA students be seated in a room which has 11 chairs?
Since only 6 chairs are to be occupied, we need to find the number of
permutations of 11 things taken 6 at a time.
Computation: (Using the Formula)
$P(11, 6) = \dfrac{11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(11 - 6)!} = 332,640$
Computation: (Using the ES PLUS CASIO)
11, shift, x, 6, then =. The total number of permutation is the same as 332,640.
The formulas for permutations were derived on the assumption that the set of n
things, or objects, includes objects that are all different. The formulas do not apply if
some of the objects are alike. The word WEDNESDAY, for example, has 9 letters, but we
cannot make 9! distinct permutations using the 9 letters at a time because some letters
are alike (E and D each appear twice).
The number of permutations of n things taken all at a time, where n1 of the
objects are alike and the others distinct is:
$P = \dfrac{n!}{n_1!}$ or $P = \dfrac{n!}{n_1!\, n_2!\, n_3! \cdots}$
Examples
$P = \dfrac{n!}{n_1!\, n_2!\, n_3! \cdots} = \dfrac{9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2!\, 2!\, 4!} = 3780$
Computation: (Using the ES PLUS CASIO)
9, shift, x-1, ÷, (, 2, shift, x-1, x, 2, shift, x-1, x, 4, shift, x-1,), then =.
The total number of permutation is the same as 3,780.
2. In how many ways is it possible for 12 customers to buy 5 cans of sardines, 5 cans of
corned beef, and 2 cans of meatloaf if each customer gets 1 can?
Computation: (Using the Formula)
If the cans were all different there would be 12! ways in which each customer
could get 1 can. But since there are groups of 5, 5, and 2 like cans, we have:
$P = \dfrac{n!}{n_1!\, n_2!\, n_3! \cdots} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2!\, 5!\, 5!} = 16,632$
Computation: (Using the ES PLUS CASIO)
12, shift, x-1, ÷, (, 2, shift, x-1, x, 5, shift, x-1, x, 5, shift, x-1,), then =.
The total number of permutation is the same as 16,632.
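The permutation formulas in this section can be sketched in Python with the standard `math` module (`math.perm` requires Python 3.8 or later). The helper name `arrangements` is hypothetical, and the letters S, C and M simply stand for the three kinds of cans.

```python
import math
from collections import Counter

def arrangements(items):
    """Distinct orderings when some items are alike: n! / (n1! * n2! * ...)."""
    total = math.factorial(len(items))
    for count in Counter(items).values():
        total //= math.factorial(count)
    return total

# 6 students seated in 11 chairs: P(11, 6)
seatings = math.perm(11, 6)

# 12 cans: 5 sardines (S), 5 corned beef (C), 2 meatloaf (M)
cans = arrangements("S" * 5 + "C" * 5 + "M" * 2)

# WEDNESDAY: 9 letters, with E and D each appearing twice
wednesday = arrangements("WEDNESDAY")

print(seatings, cans, wednesday)
```

Counting the repeated items with `Counter` avoids having to enter n1, n2, n3 by hand.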
Combination
Examples
$C(18, 3) = \dfrac{n!}{r!\,(n - r)!} = \dfrac{18!}{3!\,(18 - 3)!} = \dfrac{18!}{3!\, 15!} = \dfrac{18 \cdot 17 \cdot 16}{3!} = 816$
2. In how many ways can 5 mathematics teachers be employed from 6 male applicants
and 4 female applicants in DMMMSU – MLUC if (a) 3 are to be men; (b) 3 or 4 are to be
men?
Computation: (Using the Formula)
(a) The 3 men can be had in C (6, 3) different ways. The remaining 2 can be had
from the female applicants in C (4, 2) ways. Hence, the total number of ways is
$C(6, 3) \cdot C(4, 2) = \dfrac{6!}{3!\,(6 - 3)!} \cdot \dfrac{4!}{2!\,(4 - 2)!} = 120$
Computation: (Using ES PLUS CASIO)
6, shift, ÷, 3, x, 4, shift, ÷, 2, then =. The total number of combination
is the same as 120.
(b) To find the number of ways of filling the vacancies with either 3 or 4 men
among those employed, we add the number of ways of filling the vacancies with 3 men
and 2 women to the number of ways of employing 4 men and 1 woman. That is, we
have:
Computation: (Using the formula)
$C(6, 3) \cdot C(4, 2) + C(6, 4) \cdot C(4, 1) = 120 + 60 = 180$
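The combination examples can be checked with `math.comb` from the Python standard library (Python 3.8 or later). This is only a sketch; the variable names are assumptions.

```python
from math import comb

# C(18, 3)
c_18_3 = comb(18, 3)

# (a) 3 men out of 6 and 2 women out of 4
three_men = comb(6, 3) * comb(4, 2)

# (b) 3 men and 2 women, or 4 men and 1 woman
four_men = comb(6, 4) * comb(4, 1)
three_or_four_men = three_men + four_men

print(c_18_3, three_men, four_men, three_or_four_men)
```

The two cases in (b) cannot happen together, so their counts are added; within each case the choices of men and women are multiplied.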
Exercise 9
Name: Score:
Course & Year: Date:
A. Identify what principle is applicable on combinatorics and solve using the formula or
ES PLUS CASIO.
1 - 5. In how many different ways can the letters of the word 'CORPORATION' be
arranged so that the vowels always come together?
Computation:
6 - 10. A license plate has 3 letters and 3 digits in that order. A witness to a hit and run
accident saw the first 2 letters and the last digit. If the letters and digits can be
repeated, how many license plates must be checked by the police to find the culprit?
Computation:
11 - 15. In a box there are 5 black pens, 3 white pens and 4 red pens. In how many
ways can 2 black pens, 2 white pens and 2 red pens be chosen?
Computation:
16 - 20. Twenty people meet in a room and each shakes hands with all the others. If all
of them will have to shake hands once again before leaving, how many handshakes will
there be?
Computation:
21 - 25. In how many ways can an animal trainer arrange 5 lions and 4 tigers in a row
so that no two lions are together?
Computation:
26 - 30. How many 3-digit numbers can be formed from the digits 2, 3, 5, 6, 7 and 9,
which are divisible by 5 and none of the digits is repeated?
Computation:
Probability
The theory of probability, which received its first impetus from games of chance,
has been highly developed and now has wide and important applications in the fields of
insurance, annuities, and other social sciences.
In our study of probability, we indicate precisely those conditions which favor
the happening of an event and those which oppose the happening.
Probability of an Event
2. If four cards are to be removed from a standard deck of playing cards, find the
probability that (a) all four are face cards; and (b) all four are spades.
Computation: (Using the Formula)
(a) There are 12 face cards and consequently, C (12, 4) ways of getting four face
cards. The total number of ways of getting four cards from the deck is given by C (52,
4). Therefore:
$P(\text{4 face cards}) = \dfrac{C(12, 4)}{C(52, 4)} = \dfrac{99}{54,145}$
(b)
$P(\text{4 spades}) = \dfrac{C(13, 4)}{C(52, 4)} = \dfrac{11}{4,165}$
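Both card probabilities can be reproduced exactly with Python's `fractions` module, which reduces each ratio to lowest terms. This is a minimal sketch with illustrative variable names.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 4)   # ways of drawing 4 cards from 52

p_face = Fraction(comb(12, 4), total_hands)    # all four are face cards
p_spades = Fraction(comb(13, 4), total_hands)  # all four are spades

print(p_face, p_spades)
```

Using `Fraction` instead of floating-point division keeps the answers in the same reduced form as the worked solution.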
Probability in the Union of Events
The probability of the union of two events covers all the elements in E1 or E2 and is
denoted by P(E1 ∪ E2).
Example: If A = {u, v, w, x, y, z} and B = {a, v, y, z}, then
A ∪ B = {a, u, v, w, x, y, z}
Theorem 1: If E1 and E2 are any events in a sample space S, then
$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$
Corollary 1: If E1 and E2 are disjoint sets, then E1 and E2 are said to be mutually
exclusive. For this case, n(E1 ∩ E2) = 0, then
$P(E_1 \cup E_2) = P(E_1) + P(E_2)$
Disjoint Sets – Two sets which have no common element.
The probability of the intersection of two events covers all the elements which are
common to E1 and E2 and is denoted by P(E1 ∩ E2).
Example: Using the same sets A and B as in the union example,
A ∩ B = {v, y, z}
Theorem 2: If E1 and E2 are dependent events in the sample space S, then
$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2 \mid E_1)$
Corollary 2: If E1 and E2 are independent events, then
$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2)$
Examples
2. A group of boys are to compete in a foot race. The probability that runner A will win
is 1/7, and the probability that runner B will win is ¼. Find the probability that A or B
will win.
Computation:
E1 = event that runner A will win
E2 = event that runner B will win. These are mutually exclusive events,
therefore:
$P(E_1 \cup E_2) = P(E_1) + P(E_2) = \dfrac{1}{7} + \dfrac{1}{4} = \dfrac{11}{28}$
3. One card is drawn from a deck of 52 cards and then a second is drawn. Find the
probability that the first card is a spade and the second card is a club if (a) first card is
replaced before the second card is drawn; and (b) the first is not replaced.
Computation:
Let E1 be the set of 13 spades.
Let E2 be the set of 13 clubs. Therefore:
(b) If the first card is not replaced:
$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2 \mid E_1) = \dfrac{1}{4} \cdot \dfrac{13}{51} = \dfrac{13}{204}$
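The addition and multiplication rules used in Examples 2 and 3 can be sketched with exact fractions. The with-replacement case is included here as an illustration of Corollary 2 (independent events); the variable names are assumptions.

```python
from fractions import Fraction

# Example 2: mutually exclusive events, so P(E1 or E2) = P(E1) + P(E2)
p_a_wins = Fraction(1, 7)
p_b_wins = Fraction(1, 4)
p_a_or_b = p_a_wins + p_b_wins

# Example 3: first card a spade, second card a club
p_spade = Fraction(13, 52)

# (a) with replacement: the draws are independent
p_with_replacement = p_spade * Fraction(13, 52)

# (b) without replacement: 13 clubs remain among 51 cards
p_without_replacement = p_spade * Fraction(13, 51)

print(p_a_or_b, p_with_replacement, p_without_replacement)
```

The only difference between (a) and (b) is the denominator of the second factor: 52 cards if the first card is returned, 51 if it is not.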
This type of probability considers experiments in which the outcome of
any trial is independent of the outcome of any other trial.
Theorem 3: If p is the probability that an event will occur in a single trial of an
experiment and q is the probability that the event will fail to occur, then the probability
that the event will occur exactly r times in n trials is:
$C(n, r)\, p^r q^{n - r}$
Examples
1. In 4 throws of a die, what is the probability of (a) exactly 2 aces; (b) 2 or more aces?
Computation:
(a) The probability of one ace on a single throw is 1/6 and of failure is 5/6.
Hence, the probability of two aces is:
$C(n, r)\, p^r q^{n - r} = C(4, 2) \left(\dfrac{1}{6}\right)^2 \left(\dfrac{5}{6}\right)^2 = \dfrac{25}{216}$
(b) The probability of 2, 3, or 4 aces is equal to the sum of the separate
probabilities of those events. Hence, we have:
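$\sum C(n, r)\, p^r q^{n - r} = C(4, 2) \left(\dfrac{1}{6}\right)^2 \left(\dfrac{5}{6}\right)^2 + C(4, 3) \left(\dfrac{1}{6}\right)^3 \left(\dfrac{5}{6}\right) + \left(\dfrac{1}{6}\right)^4 = \dfrac{171}{6^4} = \dfrac{171}{1296}$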
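Theorem 3 translates directly into a small function. The sketch below uses exact fractions so the results can be compared with the worked answers; the function name `binomial_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials: C(n, r) p^r q^(n - r)."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

p_ace = Fraction(1, 6)   # probability of an ace on a single throw

# (a) exactly 2 aces in 4 throws
exactly_two = binomial_probability(4, 2, p_ace)

# (b) 2 or more aces: add the probabilities for r = 2, 3 and 4
two_or_more = sum(binomial_probability(4, r, p_ace) for r in range(2, 5))

print(exactly_two, two_or_more)
```

`Fraction` prints the second answer in lowest terms, which is the same value as 171/1296.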
One can determine the chance or probability using the concept of a Venn diagram,
a diagram that shows all possible logical relations between sets.
Example
1. Business Survey: A survey of 800 small business firms in a certain city indicate that
250 own photo-copiers, 420 own fax machines, and 180 own both photo-copiers and
fax machines.
Illustration and Computation:
a) Draw a Venn diagram illustrating the events P={owns photocopier} and F={owns
fax machine}.
[Venn diagram: photocopier only (P) = 70; both P and F = 180; fax machine only (F) = 240; neither = 310]
b) How many businesses in the survey own either a photocopier or a fax machine?
$n(P \cup F) = 70 + 180 + 240 = 490$
c) What is the probability that a randomly selected business owns either a
photocopier or a fax machine?
$P(P \cup F) = \dfrac{490}{800} = .6125$
d) How many businesses in the survey own neither a photocopier nor a fax
machine?
$n(\text{NOT}(P \cup F)) = 800 - 490 = 310$
e) What is the probability that a randomly selected business owns neither a
photocopier nor a fax machine?
$P(\text{NOT}(P \cup F)) = \dfrac{310}{800} = .3875$
f) How many businesses in the survey own a photocopier but not a fax machine?
$n(P \cap F') = 70$
g) What is the probability that a randomly selected business owns a photocopier but
not a fax machine?
$P(P \cap F') = \dfrac{70}{800} = .0875$
$P(P \mid F) = \dfrac{P(P \cap F)}{P(F)} = \dfrac{180/800}{420/800} = \dfrac{180}{420} = 3/7$ or 0.4286
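The counts in the Venn diagram and the probabilities derived from them can be sketched in a few lines. The variable names are illustrative assumptions; the four input figures come from the survey described above.

```python
from fractions import Fraction

total = 800        # firms surveyed
photocopier = 250  # own a photocopier
fax = 420          # own a fax machine
both = 180         # own both

only_photocopier = photocopier - both             # P but not F
only_fax = fax - both                             # F but not P
either = only_photocopier + both + only_fax       # P or F
neither = total - either

p_either = either / total
p_neither = neither / total
p_only_photocopier = only_photocopier / total
p_photocopier_given_fax = Fraction(both, fax)     # P(P | F)

print(only_photocopier, only_fax, either, neither)
print(p_either, p_neither, p_only_photocopier, p_photocopier_given_fax)
```

The conditional probability divides by the number of fax owners rather than by 800, because the selection is restricted to firms that own a fax machine.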
Exercise 10
Name: Score:
Course & Year: Date:
1 - 6. If one card is removed at random from a standard deck of 52 playing cards, find
the probability that the card will be (a) one of the face cards; (b) one of the black suits;
and (c) a red jack.
Computation:
7 - 10. A baby places the wooden digits 1, 2, 3, 4, 5, 6 in a row. What is the probability
that the number thus formed is (a) less than 400,000, (b) more than 200,000?
Computation:
11 – 12. The probabilities that teams A, B, and C will win the conference championship
are ½ , 1/5, and 1/8. Find the probability that one of them will win the title.
Computation:
13 - 16. One box contains 3 black balls and 4 white balls and another box contains 6
black balls and 5 white balls. If one ball is drawn from each box, find the probability
that both balls will be (a) black; and (b) white.
Computation:
17 – 20. A coin is tossed 5 times. Find the probability of (a) exactly 2 heads; (b) exactly
3 heads; and (c) fewer than 2 heads.
Computation:
Consider the problem. Customer service: A cable television company has 8000
subscribers in a suburban community. The company offers two premium channels,
HBO and SHOWTIME. If 2450 subscribers receive HBO and 1950 receive SHOWTIME
and 5150 do not receive any premium channel, find the following:
23 – 24. What is the probability that a randomly selected subscriber receives both HBO
and SHOWTIME?
27 – 28. What is the probability that a randomly selected subscriber receives HBO but
not SHOWTIME?
29 – 30. What is the probability a randomly selected Showtime subscriber also receives
HBO?
Normal Distribution
It is symmetrical about the mean, median and mode, since all of these
measures are centrally located: mean = median = mode.
It is asymptotic to the horizontal line: the two tails of the curve tend to
approach the horizontal line but never intersect it.
Like any geometrical figure, the area of the normal curve is also 100%, with
50% on each side of the mean.
4. Look up the unknown area under the normal curve for the computed standard score in
the Tabulated Areas of the Normal Probability Distribution.
Examples
c. P (z > 1) = .5 - .3413 = .1587 or 15.87%
2. The salaries of MBA graduates who entered the field of marketing services averaged
approximately P45,000 with a standard deviation of P2,250. If these salaries were
normally distributed, what proportion of MBA graduates who entered marketing
services had salaries in excess of P47,500, which is the average salary for those
graduates entering the field of brand/product management?
Computation:
a. $z = \dfrac{x - \mu}{\sigma} = \dfrac{47,500 - 45,000}{2250} = 1.11$
b. [Normal curve with the area to the right of z = 1.11 shaded]
c. P (z > 1.11) = .5 - .3665 = .1335 or 13.35%
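The table lookup can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). In this sketch the z-score is rounded to two decimals to mirror the printed normal table; the variable names are assumptions.

```python
from statistics import NormalDist

mean_salary = 45_000
sd_salary = 2_250
cutoff = 47_500

# Standard score, rounded to two decimals as in the printed table
z = round((cutoff - mean_salary) / sd_salary, 2)

standard_normal = NormalDist(mu=0, sigma=1)
area_0_to_z = standard_normal.cdf(z) - 0.5   # area between the mean and z
upper_tail = 0.5 - area_0_to_z               # P(Z > z)

print(z, round(area_0_to_z, 4), round(upper_tail, 4))
```

The area between the mean and z corresponds to the tabulated value, and subtracting it from .5 gives the proportion of salaries above the cutoff.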
Exercise 10
Name: Score:
Course & Year: Date:
a. b.
c.
11 – 20. In a mathematics examination the average grade was 83 and the standard
deviation was 5. All students with grades from 87 to 90 received a grade of B. If the
grades are approximately normally distributed and 15 students received a B grade, how
many students took the examination?
Computation:
a. b.
c.
Chapter VII. Hypothesis Testing
Objectives
Introduction
Statement of a Hypothesis
There are two types of statistical hypotheses for each situation: the null
hypothesis and the alternative hypothesis.
a. The null hypothesis, symbolized by H0, is a statistical hypothesis that
states that there is no difference between a parameter and a specific value, or
that there is no difference between two parameters.
b. The alternative hypothesis, symbolized by H1, is a statistical
hypothesis that states the existence of a difference between a parameter and a
specific value, or states that there is a difference between two parameters.
1. A medical researcher is interested in finding out whether a new medication will have
any undesirable side effects. The researcher is particularly concerned with the pulse
rate of the patients who take the medication. Will the pulse rate increase, decrease, or
remain unchanged after a patient takes the medication? Since the researcher knows
that the mean pulse rate for the population under study is 82 beats per minute, the
hypotheses for this situation are
H0: µ = 82 (The new medication will not have any undesirable side effects on the
pulse rate of the patients.)
H1: µ ≠ 82 (The new medication will have undesirable side effects on the pulse
rate of the patients.)
Type of test: two – tailed (possible side effects of the medicine could be to raise
or lower the pulse rate.)
2. A chemist invents an additive to increase the life of an automobile battery. The mean
lifetime of the automobile battery is 36 months.
H0: µ = 36 (An invented additive will not increase the life of an automobile
battery.)
H1: µ > 36 (An invented additive will increase the life of an automobile battery.)
Type of test: One – tailed (This test is called right-tailed, since the interest is in
an increase only.)
3. A researcher thinks that if expectant mothers use vitamin pills, the birth weight of
the babies will increase. The average birth weight of the population is 8.6 pounds.
H0: µ = 8.6 lbs (The use of vitamin pills of expectant mothers will not increase the
weight of the babies.)
H1: µ > 8.6 lbs (The use of vitamin pills of expectant mothers will increase the
weight of the babies.)
Type of Test: One - tailed
4. A psychologist feels that playing soft music during a test will change the results of
the test. The psychologist is not sure whether the grades will be higher or lower. In the
past, the mean of the scores was 73.
H0: µ = 73 (Playing soft music during a test will not change the results of the test.)
H1: µ ≠ 73 (Playing soft music during a test will change the results of the test.)
Type of Test: Two - tailed
>                              <
Is greater than                Is less than
Is increased                   Is decreased or reduced from
≥                              ≤
Is greater than or equal to    Is less than or equal to
Is at least                    Is at most
=                              ≠
Is equal to                    Is not equal to
Has not changed from           Has changed from
Exercise 11
Name: Score:
Course & Year: Date:
A. Formulate the null and alternative hypothesis and identify the type of test.
1 - 3. The manufacturer of a certain brand of cigarettes claims that the average nicotine
content does not exceed 2.5 mg. State the H0 and H1 in testing and determine what type
of test to be utilized.
Hypotheses:
Hypothetical Question:
H o:
H1 :
Type of Test:
4 - 6. Test the hypothesis that there exists significant difference between the
performance of BSBA and BSOA students in Statistics against that it does not.
Hypotheses:
H o:
H1 :
Type of Test:
7 - 9. A real estate agent claims that 60% of all private residences being built today are
3 – bedroom homes. To test this claim, a large sample of new residences is inspected;
the proportion of these homes with 3 bedrooms is recorded and used as our test
statistic.
Hypotheses:
H o:
H1 :
Type of Test:
H o:
H1 :
Type of Test:
13 – 15. A group of educators would like to find out whether programmed materials are
more effective than traditional method in teaching – learning process.
Hypotheses:
H o:
H1 :
Type of Test:
After stating the hypotheses, the researcher’s next step is to design the study.
The researcher selects the correct statistical test, chooses an appropriate level of
significance, and formulates a plan for conducting the study.
Statistical Test
A statistical test uses the data obtained from a sample to make a decision about
whether or not the null hypothesis should be rejected.
The numerical value obtained from a statistical test is called the test value.
                    H0 True             H0 False
Reject H0           Type I Error        Correct Decision
Do not reject H0    Correct Decision    Type II Error
A type I error occurs if one rejects the null hypothesis when it is true.
A type II error occurs if one does not reject the null hypothesis when it is false.
Vocabulary List
Acceptance region – region of values of the test statistic for which the null
hypothesis is not rejected.
Level of Significance – maximum probability of committing a Type I error, that is,
the chance that a true null hypothesis will be rejected.
One - tailed test – the rejection region is located at one extreme of the range of
values; used when the alternative hypothesis is directional.
Two – tailed test – the rejection region is located at the two extremes of the range
of values; used when the alternative hypothesis is non – directional.
Testing Hypothesis Concerning Mean/s
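H1: $\mu < \mu_0$, $\mu > \mu_0$, or $\mu \neq \mu_0$
1. $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ if $n \geq 30$ (z test)
2. $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ if $n < 30$ (t test), with $v = n - 1$
If $\mu < \mu_0$, then the critical region is $z < -z_\alpha$ in z – test and $t < -t_\alpha$ in t – test, which lies on the left tail of the distribution. If $\mu > \mu_0$, then the critical region is $z > z_\alpha$ in z – test and $t > t_\alpha$ in t – test, which lies on the right tail.
If $\mu \neq \mu_0$, then the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$ in z – test and
$t < -t_{\alpha/2}$ and $t > t_{\alpha/2}$ in t – test which lies on each tail of the distribution.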
α/2
Note:
z – test is used when the population standard deviation is known and the sample
size is large (n ≥ 30).
t – test is used when the population standard deviation is unknown, so that the sample
standard deviation is used instead, and the sample size is small (n < 30).
Example
1. In a recent survey of utility workers in both the public and private sectors in Region 1, it was
found that the average monthly net income of utility workers is Php5,500. Suppose
a researcher wants to test this figure by taking a random sample of 250 utility workers
in Region 1 to determine whether the monthly income has changed. Suppose further
that the average net monthly income of the 250 utility workers is found to be Php6,700
with a population standard deviation of Php1,550. Does this seem to indicate that the average
net monthly income after the survey is already greater than Php5,500? Use a 0.05 level of
significance.
Illustrative Testing:
a) H0: The average net monthly income of utility workers in Region 1 after the
survey is still Php5,500. µ = Php5,500
b) H1: The average net monthly income of utility workers in Region 1 after the
survey is already greater than Php5,500. µ > Php5,500
c) α = 0.05 (one – tailed test)
d) Critical Region: z > 1.645
e) Computation: $\bar{x}$ = Php6,700; µ = Php5,500; σ = Php1,550; n = 250
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{(6,700 - 5,500)\sqrt{250}}{1,550} = 12.24$
f) Decision: Reject H0 and accept H1 and conclude that the mean net monthly
income after the survey is already greater than Php5,500.
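The steps of this one-tailed z test can be sketched in Python. The critical value is obtained from `statistics.NormalDist.inv_cdf` instead of a printed table; the variable names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

x_bar = 6700   # sample mean (Php)
mu_0 = 5500    # hypothesized mean (Php)
sigma = 1550   # population standard deviation (Php)
n = 250        # sample size
alpha = 0.05   # level of significance, right-tailed test

# Test statistic: z = (x_bar - mu_0) / (sigma / sqrt(n))
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Right-tail critical value z_alpha
z_critical = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z > z_critical
print(round(z, 2), round(z_critical, 3), reject_h0)
```

The computed z lies far to the right of the critical value, so the decision to reject H0 does not depend on rounding.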
2. In a certain study, it was found out that the average time required by workers to
complete a certain task was 44.3 minutes. A group of 20 workers was randomly chosen
to undergo a special training for one month. After the training, it was observed that the
workers’ average time to complete the task was 38 minutes with a standard deviation of
6.5 minutes. Can it be concluded at 95% confidence that the special training facilitated
the completion of the task?
Illustrative Testing:
a) H0: The special training did not facilitate the completion of the task, and the
average time is still 44.3 minutes. µ = 44.3 minutes
b) H1: The special training facilitated the completion of the task and it is now
less than 44.3 minutes. µ < 44.3 minutes
c) α = 0.05 (one – tailed test)
d) Critical Region: v = 20 – 1 = 19; t < -1.729
e) Computation: $\bar{x}$ = 38 min; µ = 44.3 min; s = 6.5 min; n = 20
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n - 1}} = \dfrac{(38 - 44.3)\sqrt{19}}{6.5} = -4.22$
f) Decision: Reject H0 and accept H1 and conclude that the special training
facilitated the completion of the task.
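H1: $\mu_1 < \mu_2$, $\mu_1 > \mu_2$, or $\mu_1 \neq \mu_2$
1. $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$ (independent large samples)
2. $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ (independent small samples)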
Establishing the Critical Region
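If $\mu_1 < \mu_2$ with corresponding level of significance α, then the critical region is
$z < -z_\alpha$ in z – test and $t < -t_\alpha$ in t – test which lies on the left tail of the
distribution.
If $\mu_1 \neq \mu_2$ with corresponding level of significance α, then the critical
region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$ in z – test and $t < -t_{\alpha/2}$ and $t > t_{\alpha/2}$ in t – test.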
Example
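[Figure: t distribution with the critical values -1.725 and 1.725 marked]
e) Computation: $\bar{x}_1$ = 85; $\bar{x}_2$ = 81; s1 = 4; s2 = 5; n1 = 12; n2 = 10
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ (independent small samples)
$= \dfrac{(85 - 81) - 0}{\sqrt{\dfrac{16}{12} + \dfrac{25}{10}}} = 2.04$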
f) Decision: Reject H0 and accept H1 and conclude that the conventional
classroom procedure is still more effective than the programmed material based on the computed
value.
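The test statistic for two independent small samples can be sketched as follows. The summary statistics are those listed in the computation step; the critical value 1.725 is taken from the example as given (df = 20) rather than computed, and the function name is hypothetical.

```python
import math

def two_sample_t(x1, x2, s1, s2, n1, n2, d0=0.0):
    """t = ((x1 - x2) - d0) / sqrt(s1^2 / n1 + s2^2 / n2)"""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1 - x2) - d0) / standard_error

t = two_sample_t(x1=85, x2=81, s1=4, s2=5, n1=12, n2=10)

t_critical = 1.725          # tabulated value used in the example
reject_h0 = t > t_critical

print(round(t, 2), reject_h0)
```

Passing `d0` as a parameter keeps the function usable when the hypothesized difference between the two means is not zero.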
Illustrative Testing:
a) H0: There is no significant difference between the level of effectiveness of the
Performance Appraisal System as perceived by the supervisors and rank and
file.
b) H1: There is significant difference between the level of effectiveness of the
Performance Appraisal System as perceived by the supervisors and rank and
file.
c) α = .05 (two – tailed test)
d) Critical Value: t > 2.228 and t < -2.228; df = 10; α = .05
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{3.8117 - 3.3483}{\sqrt{\dfrac{.5177^2}{6} + \dfrac{.5355^2}{6}}} = 1.52$
f) Decision: Since the computed value is less than the tabulated value and falls
within the acceptance region, accept the H0 and reject the H1 and conclude that
there is no significant difference between the level of effectiveness of the
Performance Appraisal System as perceived by the supervisors and rank and
file.
$t = \dfrac{\bar{d} - \mu_{d_0}}{\sqrt{s_d^2 / n}}$
where:
$\bar{d}$ = sample mean difference
$\mu_{d_0}$ = hypothesized population mean difference
Example
1. Test the hypothesis that the diet was successful after 12 weeks if the weights of 9
obese women before and after 12 weeks on a very low calorie diet were as follows:
Illustrative Testing:
a) H0: The very low calorie diet was not successful after 12 weeks for 9 obese
women.
b) H1: The very low calorie diet was successful after 12 weeks for 9 obese women.
c) α = 0.05 (one – tailed test)
d) Critical Region: v = 9 – 1 = 8; t < -1.86
e) Computation: (Using the Formula)
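$\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{34.0 + 25.5 + \cdots + 14.3}{9} = 22.59$
$s_d^2 = \dfrac{\sum (d_i - \bar{d})^2}{n - 1} = \dfrac{(34.0 - 22.59)^2 + (25.5 - 22.59)^2 + \cdots + (14.3 - 22.59)^2}{8} = 28.3$
$t = \dfrac{22.59 - 0}{\sqrt{\dfrac{28.30}{9}}} = 12.74$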
Computation: (Using MS EXCEL)
open ms excel, input after weights column, input before weights column,
select data, data analysis, t – test (paired two samples), highlight
variable 1 (after column), highlight variable 2 (before column),
alpha at 0.05, select output range, then enter.
f) Decision: Since -12.7395 falls in the rejection region, we reject H0 and accept
H1 and conclude that the very low calorie diet was successful after 12 weeks.
Analysis of Variance
Analysis of Variance is a technique in inferential analysis designed
to test whether or not more than 2 samples (groups) are significantly different from each
other. To test this claim, one – way or two – way ANOVA can be applied.
It minimizes the time and effort when computing and testing more than two
samples.
T-test has statistical limitation.
The interaction effects between and among the variables can be measured.
Objectives
$F = \dfrac{(\bar{X}_i - \bar{X}_{ii})^2}{MS_W \left(\dfrac{1}{N_i} + \dfrac{1}{N_{ii}}\right)(k - 1)}$
Steps:
1. Specify pairs of means to be computed.
2. Compute for Scheffe Test.
3. Compute for the test statistic.
4. Compare the ratio and make interpretation.
Example
1. Four groups of 6 students each, taken from the Statistics classes of the four programs of the
College of Arts and Management, are subjected to an experiment in which each group is
taught using one of four teaching methods. The grades of the students are tabulated
at the end of 1 month of the experiment. At the 0.05 level of significance, test the hypothesis that
there is no significant difference in the average grade gains among the four groups of
students using the four methods of teaching against the alternative that there is a significant
difference among the four methods.
Methods
Student   Group 1      Group 2      Group 3      Group 4
          Method A     Method B     Method C     Method D
1         78           95           97           80
2         82           85           89           86
3         85           85           88           80
4         79           92           90           75
5         80           82           90           80
6         82           90           80           82
Illustrative Testing:
a) H0: There is no significant difference in the average grade gains among the
four groups of students using the four methods of teaching.
b) H1: There is significant difference in the average grade gains among the four
groups of students using the four methods of teaching.
c) α = 0.05 (one – tailed test)
d. Computation: (Using the Formula)
Methods
Stud   Group 1    Group 2    Group 3    Group 4
       A (X11)    B (X21)    C (X31)    D (X41)    $X_{11}^2$    $X_{21}^2$    $X_{31}^2$    $X_{41}^2$
1      78         95         97         80         6084      9025      9409      6400
2      82         85         89         86         6724      7225      7921      7396
3      85         85         88         80         7225      7225      7744      6400
4      79         92         90         75         6241      8464      8100      5625
5      80         82         90         80         6400      6724      8100      6400
6      82         90         80         82         6724      8100      6400      6724
Σ      486        529        534        483        39,398    46,763    47,674    38,945
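e. $SS_{TOT} = \sum X_t^2 - \dfrac{(\sum X_t)^2}{N_t} = 172,780 - 172,042.7 = 737.33$
f. $SS_{BET} = \dfrac{(\sum X_{11})^2}{N_1} + \dfrac{(\sum X_{21})^2}{N_2} + \dfrac{(\sum X_{31})^2}{N_3} + \dfrac{(\sum X_{41})^2}{N_4} - \dfrac{(\sum X_T)^2}{N_T}$
$= \dfrac{486^2}{6} + \dfrac{529^2}{6} + \dfrac{534^2}{6} + \dfrac{483^2}{6} - \dfrac{2032^2}{24} = 371$
g. $SS_W = SS_{TOT} - SS_{BET} = 737.33 - 371 = 366.33$
h. Degrees of Freedom = 4 – 1 = 3 (between)
= 24 – 4 = 20 (within)
i. $MS_{BET} = \dfrac{SS_{BET}}{df_{BET}} = \dfrac{371}{3} = 123.67$
$MS_W = \dfrac{SS_W}{df_W} = \dfrac{366.33}{20} = 18.32$
j. $F = \dfrac{MS_{BET}}{MS_W} = \dfrac{123.67}{18.32} = 6.75$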
F – Tabulated Value = 3.10
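The sums of squares, mean squares and F ratio above can be reproduced with plain Python. This is a minimal sketch using the grades from the table of methods; the tabulated value 3.10 is copied from the example as given, and the variable names are assumptions.

```python
# One-way ANOVA computed from raw scores
groups = {
    "A": [78, 82, 85, 79, 80, 82],
    "B": [95, 85, 85, 92, 82, 90],
    "C": [97, 89, 88, 90, 90, 80],
    "D": [80, 86, 80, 75, 80, 82],
}

all_scores = [x for scores in groups.values() for x in scores]
n_total = len(all_scores)
k = len(groups)
correction = sum(all_scores) ** 2 / n_total            # (sum X_T)^2 / N_T

ss_total = sum(x ** 2 for x in all_scores) - correction
ss_between = sum(sum(s) ** 2 / len(s) for s in groups.values()) - correction
ss_within = ss_total - ss_between

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

f_tabulated = 3.10   # tabulated F at the 0.05 level with (3, 20) degrees of freedom
print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2))
print(round(ms_between, 2), round(ms_within, 2), round(f_ratio, 2), f_ratio > f_tabulated)
```

Because the computed F exceeds the tabulated value, at least one pair of methods differs, which is what motivates the post hoc comparison that follows.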
m. Application of Post Hoc Multiple Comparison or Post ANOVA Scheffé Test.
Given:
$\bar{x}_{11}$ = 81        MSW = 18.32
$\bar{x}_{21}$ = 88.17     K – 1 = 3
$\bar{x}_{31}$ = 89
$\bar{x}_{41}$ = 80.5
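2. a. Methods A and B:
$F_a = \dfrac{(\bar{X}_i - \bar{X}_{ii})^2}{MS_W \left(\dfrac{1}{N_i} + \dfrac{1}{N_{ii}}\right)(k - 1)} = \dfrac{(81 - 88.17)^2}{18.32 \left(\dfrac{1}{6} + \dfrac{1}{6}\right)(3)} = 2.80$
b. Methods B and C:
$F_b = \dfrac{(88.17 - 89)^2}{18.32 \left(\dfrac{1}{6} + \dfrac{1}{6}\right)(3)} = .04$
c. Methods C and D:
$F_c = \dfrac{(89 - 80.5)^2}{18.32 \left(\dfrac{1}{6} + \dfrac{1}{6}\right)(3)} = 3.94$
d. Methods A and C:
$F_d = \dfrac{(81 - 89)^2}{18.32 \left(\dfrac{1}{6} + \dfrac{1}{6}\right)(3)} = 3.49$
e. Methods A and D:
$F_e = \dfrac{(81 - 80.5)^2}{18.32 \left(\dfrac{1}{6} + \dfrac{1}{6}\right)(3)} = .013$
f. Methods B and D:
$F_f = \dfrac{(88.17 - 80.5)^2}{18.32 \left(\dfrac{1}{6} + \dfrac{1}{6}\right)(3)} = 3.21$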
3. F – Test
F k 1 df
3 3.10
3.01
4. a. Since the computed Scheffé value $F_a$ (2.80) is less than the F-test value (3.01), there is no significant difference between Methods A and B.
b. Since the computed Scheffé value $F_b$ (.04) is less than the F-test value (3.01), there is no significant difference between Methods B and C.
c. Since the computed Scheffé value $F_c$ (3.94) is greater than the F-test value (3.01), there is a significant difference between Methods C and D.
d. Since the computed Scheffé value $F_d$ (3.49) is greater than the F-test value (3.01), there is a significant difference between Methods A and C.
e. Since the computed Scheffé value $F_e$ (.013) is less than the F-test value (3.01), there is no significant difference between Methods A and D.
f. Since the computed Scheffé value $F_f$ (3.21) is greater than the F-test value (3.01), there is a significant difference between Methods B and D.
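The six pairwise comparisons can be checked with a short script. This is a minimal sketch, assuming the statistic is the squared mean difference divided by $MS_W (1/N_i + 1/N_{ii})(k - 1)$, as in the computations above; the variable names are illustrative, and the comparison value 3.01 is simply the one used in the text.

```python
from itertools import combinations

means = {"A": 81.0, "B": 88.17, "C": 89.0, "D": 80.5}  # group means from the text
ms_w = 18.32      # within-groups mean square
n = 6             # observations per group
k = len(means)    # number of groups
critical = 3.01   # comparison value used in the text

def scheffe_f(mean_i, mean_j):
    """Squared mean difference over MS_W * (1/n + 1/n) * (k - 1)."""
    return (mean_i - mean_j) ** 2 / (ms_w * (1 / n + 1 / n) * (k - 1))

# One statistic per pair of methods, keyed as "A-B", "A-C", ...
results = {
    f"{a}-{b}": scheffe_f(means[a], means[b])
    for a, b in combinations(means, 2)
}
significant = sorted(pair for pair, f in results.items() if f > critical)

for pair, f in results.items():
    label = "significant" if f > critical else "not significant"
    print(pair, round(f, 2), label)
```

Under these assumptions the pairs flagged as significant are A-C, B-D, and C-D, which agrees with the decisions listed above.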
SUMMARY
Groups Count Sum Average Variance
Method A 6 486 81 6.4
Method B 6 529 88.16667 24.56667
Method C 6 534 89 29.6
Method D 6 483 80.5 12.7
ANOVA
Source of Variation SS df MS f P-value f crit
Between Groups 371 3 123.6667 6.751592 0.002509 3.098391
Within Groups 366.3333 20 18.31667
Total 737.3333 23
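The one-way ANOVA quantities above can also be reproduced outside Excel. The following is a minimal sketch in plain Python, assuming the four score columns of the data table correspond to Methods A to D; the variable names are illustrative.

```python
# Scores for the four methods, read column-wise from the data table.
groups = {
    "A": [78, 82, 85, 79, 80, 82],
    "B": [95, 85, 85, 92, 82, 90],
    "C": [97, 89, 88, 90, 90, 80],
    "D": [80, 86, 80, 75, 80, 82],
}

all_scores = [x for scores in groups.values() for x in scores]
n_total = len(all_scores)
correction = sum(all_scores) ** 2 / n_total           # (sum of all X)^2 / N_T

ss_tot = sum(x * x for x in all_scores) - correction  # total sum of squares
ss_bet = sum(sum(s) ** 2 / len(s) for s in groups.values()) - correction
ss_w = ss_tot - ss_bet                                # within-groups sum of squares

df_bet = len(groups) - 1
df_w = n_total - len(groups)
f_ratio = (ss_bet / df_bet) / (ss_w / df_w)

print(f"SS_TOT={ss_tot:.2f} SS_BET={ss_bet:.2f} SS_W={ss_w:.2f} F={f_ratio:.2f}")
```

The printed values should agree with the hand computation and with the Excel summary (F of about 6.75).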
ANOVA for a single factor may be extended to more than one independent
variable. As in one-way analysis of variance, the variability of the data may come
from sources such as error and each independent variable. The effects produced by each
independent variable are called main effects, while the effect produced by the combination
of the variables is called interaction.
Example
Illustrative Solution:
a. H0: There is no interaction between Study Habit (Factor A) and Memory
Retention (Factor B).
H1: There is an interaction between Study Habit (Factor A) and Memory
Retention (Factor B).
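b. $\alpha$ = 0.05 (one-tailed test)
c. $SS_{TOT} = \Sigma X^2 - \dfrac{(\Sigma X_t)^2}{N_t} = 4161 - 4013.63 = 147.37$
d. $SS_{ROWS} = \dfrac{165^2}{15} + \dfrac{182^2}{15} - \dfrac{347^2}{30} = 1815 + 2208.27 - 4013.63 = 9.63$
e. $SS_{COLUMNS} = \dfrac{136^2}{10} + \dfrac{110^2}{10} + \dfrac{101^2}{10} - \dfrac{347^2}{30} = 1849.6 + 1210 + 1020.1 - 4013.63 = 66.07$
f. $SS_{ERROR} = 4161 - \left(\dfrac{64^2}{5} + \dfrac{55^2}{5} + \dfrac{46^2}{5} + \dfrac{72^2}{5} + \dfrac{55^2}{5} + \dfrac{55^2}{5}\right) = 4161 - 4094.2 = 66.8$
g. $SS_{INTERACTION} = SS_{TOT} - SS_{ROWS} - SS_{COLUMNS} - SS_{ERROR} = 147.37 - 9.63 - 66.07 - 66.8 = 4.87$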
h. Degrees of Freedom:
dfrows = (r - 1), (N – ab) = (2 - 1), (30 – 6) = 1, 24
dfcolumns = (c – 1), (N – ab) = (3 – 1), (30 – 6) = 2, 24
dferror = N – ab = 30 – 6 = 24
dfinteraction = (r – 1) . (c – 1), (N –ab)
= (2 – 1) . (3 – 1), (30 – 6)
= (2, 24)
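i. $MS_{ROWS} = \dfrac{SS_{ROWS}}{a - 1} = \dfrac{9.63}{2 - 1} = 9.63$
$MS_{COLUMNS} = \dfrac{SS_{COLUMNS}}{b - 1} = \dfrac{66.07}{3 - 1} = 33.04$
$MS_{INTERACTION} = \dfrac{SS_{INTERACTION}}{(a - 1)(b - 1)} = \dfrac{4.87}{2} = 2.44$
$MS_{ERROR} = \dfrac{SS_{ERROR}}{N - ab} = \dfrac{66.8}{24} = 2.78$
j. $f_{ROWS} = \dfrac{MS_{ROWS}}{MS_{ERROR}} = \dfrac{9.63}{2.78} = 3.46$
$f_{COLUMNS} = \dfrac{MS_{COLUMNS}}{MS_{ERROR}} = \dfrac{33.04}{2.78} = 11.88$
$f_{INTERACTION} = \dfrac{MS_{INTERACTION}}{MS_{ERROR}} = \dfrac{2.44}{2.78} = 0.88$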
k. Summary Table
Source of        Degrees of   Sum of    Mean      F-Ratio   F-Tabular   Decision    Interpretation
Variation        Freedom      Squares   Squares             Value
Factor A         1            9.63      9.63      3.46      4.26        Accept H0   Not Significant
Factor B         2            66.07     33.04     11.88     3.40        Reject H0   Significant
Interaction AB   2            4.87      2.44      0.88      3.40        Accept H0   No interaction
Error            24           66.8      2.78
Total            29           147.37
l. Decision: For the study habit with music, the computed F value of 3.46 is lower than
the tabular F value of 4.26, which leads to the acceptance of the null hypothesis. Hence,
there is no significant difference among the three with-music populations in their mean
memory retention. It can be stated that studying with music has no effect on
memory retention.
For the study habit without music, the computed F value of 11.88 exceeds the tabular
F value of 3.40, which leads to the rejection of the null hypothesis. Therefore, a
significant difference exists among the three without-music populations in their
memory retention. It can be stated that studying without music has an effect on
memory retention.
For the interaction, since the computed F value (0.88) is lower than the tabulated F
value (3.40), the null hypothesis is accepted. This means that there is no
interaction between study habit and memory retention. The result implies
that a student with high memory retention while studying with music
will also have high retention without music.
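The two-way ANOVA arithmetic above can be sketched from the summary figures alone. The code below assumes the six cell sums (64, 55, 46 in the first row and 72, 55, 55 in the second), five scores per cell, and the sum of squared scores 4161, all taken from the computations in steps c to f; the variable names are illustrative.

```python
# Cell sums: rows are the levels of Factor A, columns the levels of Factor B.
cell_sums = [[64, 55, 46],
             [72, 55, 55]]
n_cell = 5       # scores per cell
sum_sq = 4161    # sum of squared scores over all 30 observations (from the text)

rows, cols = len(cell_sums), len(cell_sums[0])
n_total = rows * cols * n_cell
grand = sum(map(sum, cell_sums))
correction = grand ** 2 / n_total

ss_tot = sum_sq - correction
ss_rows = sum(sum(r) ** 2 for r in cell_sums) / (cols * n_cell) - correction
ss_cols = sum(sum(c) ** 2 for c in zip(*cell_sums)) / (rows * n_cell) - correction
ss_err = sum_sq - sum(x ** 2 for r in cell_sums for x in r) / n_cell
ss_int = ss_tot - ss_rows - ss_cols - ss_err

ms_err = ss_err / (n_total - rows * cols)
f_rows = ss_rows / (rows - 1) / ms_err
f_cols = ss_cols / (cols - 1) / ms_err
f_int = ss_int / ((rows - 1) * (cols - 1)) / ms_err

print(round(f_rows, 2), round(f_cols, 2), round(f_int, 2))
```

Because the script carries full precision instead of rounding each mean square first, the last two F ratios can differ from the hand-computed 11.88 and 0.88 in the second decimal place; the decisions are unchanged.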
Exercise 12
Name: Score:
Course & Year: Date:
A. Test systematically the corresponding hypothesis. Use data analysis of Microsoft Excel
to test the following conjectures.
1 - 10. Test the hypothesis that the average number of suicide victims in Iraq is 33
every month if a random sample of 9 suicide events tallied victims 29, 30, 32, 34, 35,
37, 39, & 42. Use a 0.05 level of significance and assume that the distribution of
victims is normal.
11 - 20. Test the conjecture that the performance of the experimental group (EG) is better than
that of the control group (CG) using the programmed material in statistics at the 0.05 level of
significance.
Students   EG   CG
1          26   22
2          17   19
3          20   14
4          19   18
5          19   11
6          24   16
7          23   15
8          25   20
21 - 30. In a study conducted by graduating students in the College of Arts and
Management in DMMMSU-MLUC, the following data were recorded to determine the
level of effectiveness of the computerized registration during enrollment, in minutes per
transaction.
Before (Min/trans.)   After (Min/trans.)
25                    17
20                    12
30                    12
28                    20
25                    18
19                    17
32                    18
20                    15
31 - 40. Test the conjecture that there is a significant difference on the level of
effectiveness of the Performance Appraisal System as to Educational Attainment as
perceived by the three groups.
4 3.00 3.08 3.05
5 3.89 4.23 4.48
6 3.61 4.23 4.67
41 - 50. Apply the two – way ANOVA to the following data. Use a 0.05 level of
significance.
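$H_1: \sigma^2 < \sigma_0^2$, $\sigma^2 > \sigma_0^2$, or $\sigma^2 \neq \sigma_0^2$
$\chi^2 = \dfrac{(n - 1) s^2}{\sigma_0^2}$
Where:
n = sample size
$s^2$ = sample variance
$\sigma_0^2$ = specified value of the population variance
If $\sigma^2 > \sigma_0^2$, then the critical region is $\chi^2 > \chi^2_{\alpha}$, which lies on the right tail of the
distribution.
If $\sigma^2 < \sigma_0^2$, then the critical region is $\chi^2 < \chi^2_{1-\alpha}$, which lies on the left tail of the
distribution.
If $\sigma^2 \neq \sigma_0^2$, then the critical region is $\chi^2 < \chi^2_{1-\alpha/2}$ and $\chi^2 > \chi^2_{\alpha/2}$, which lies on
each tail of the distribution.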
Example
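[Figure: chi-squared distribution with the critical region to the right of 36.415]
e) Computation: $s^2 = 2.03$; n = 25
$\chi^2 = \dfrac{(n - 1) s^2}{\sigma_0^2} = \dfrac{(24)(2.03)}{1.15} = 42.37$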
f) Decision: Reject H0 and accept H1, and conclude that the machine is out of
control, since the computed value of 42.37 exceeds the critical value of 36.415.
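The test statistic for a single variance is a one-line computation. A minimal sketch, assuming the hypothesized variance 1.15 and the critical value 36.415 that appear in the example; the function name is illustrative.

```python
def chi_square_variance(n, sample_var, hypothesized_var):
    """Return (n - 1) * s^2 / sigma_0^2 for a one-sample variance test."""
    return (n - 1) * sample_var / hypothesized_var

chi2 = chi_square_variance(n=25, sample_var=2.03, hypothesized_var=1.15)
critical = 36.415            # right-tail critical value quoted in the example
reject_h0 = chi2 > critical  # right-tailed decision rule

print(round(chi2, 2), reject_h0)
```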
In this case, the objective is to test the equality or uniformity of the
variances $\sigma_1^2$ and $\sigma_2^2$ of two populations. We shall test the null hypothesis against one
of the usual alternatives.
$H_0: \sigma_1^2 = \sigma_2^2$
For independent random samples of sizes $n_1$ and $n_2$, respectively, from the two
populations, the f-test is given by:
$f = \dfrac{s_1^2}{s_2^2}$
Where:
$s_1^2$ = variance of the sample from the first population
$s_2^2$ = variance of the sample from the second population
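If $\sigma_1^2 > \sigma_2^2$, then the critical region is $f > f_{\alpha}(v_1, v_2)$, which lies on the right
tail of the distribution.
If $\sigma_1^2 < \sigma_2^2$, then the critical region is $f < f_{1-\alpha}(v_1, v_2)$, which lies on the left tail of
the distribution.
If $\sigma_1^2 \neq \sigma_2^2$, then the critical region is $f < f_{1-\alpha/2}(v_1, v_2)$ and $f > f_{\alpha/2}(v_1, v_2)$, which lies on
each tail of the distribution.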
Example
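[Figure: F distribution with critical regions below 0.36 and above 2.91]
$f > f_{\alpha/2}(v_1, v_2) = f_{0.05}(12, 10) = 2.91$
Theorem:
$f_{1-\alpha/2}(v_1, v_2) = \dfrac{1}{f_{\alpha/2}(v_2, v_1)}$
Therefore:
$f < f_{0.95}(v_1, v_2) = \dfrac{1}{f_{0.05}(10, 12)} = 0.36$
e) Computation: $s_1^2 = 16$; $s_2^2 = 25$
$f = \dfrac{s_1^2}{s_2^2} = \dfrac{16}{25} = 0.64$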
f.) Decision: Accept H0 and conclude that scores taken from the conventional
classroom procedure are consistent or do not differ from the scores taken from the
programmed material.
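The decision in this example can be checked by comparing the variance ratio with the two critical values. A minimal sketch, assuming the critical values 0.36 and 2.91 quoted in the example; the function name is illustrative.

```python
def variance_ratio(s1_sq, s2_sq):
    """Return the f statistic s1^2 / s2^2."""
    return s1_sq / s2_sq

f = variance_ratio(16, 25)
lower, upper = 0.36, 2.91                # two-tailed critical values from the example
reject_h0 = f < lower or f > upper       # reject only if f falls in either tail

print(f, reject_h0)
```

Since 0.64 lies between the two critical values, the null hypothesis of equal variances is not rejected.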
Exercise 13
Name: Score:
Course & Year: Date:
1 - 10. A researcher would like to determine whether smoking at an early age
shortens a person's life span. Past experience indicates that a person who starts
smoking early lives a mean of 55 years with a variance of 3.5 years.
Suppose a sample of 27 subjects was studied and the variance of their life spans
was found to be 2.9 years. Does this indicate that early smoking reduces
the variance of a person's life span? Use a 0.05 level of significance.
Illustrative Testing:
a. Ho:
b. H1:
e. Computations:
f. Decision:
b. H1:
e. Computations:
f. Decision:
Testing Hypothesis Concerning Proportions
We shall consider the problem of testing the hypothesis that the proportion of
successes in a binomial experiment equals some specified values.
Consider testing the null hypothesis H0 that the population proportion $p = p_0$ against
the usual alternatives:
$H_0: p = p_0$
$H_1: p < p_0$, $p > p_0$, or $p \neq p_0$
Assuming that the sample is large enough for the normal approximation to the
binomial to hold, the z-value for testing $p = p_0$ is given by:
$z = \dfrac{x - n p_0}{\sqrt{n p_0 q_0}}$
Where:
x = number of successes in the sample
n = sample size
$p_0$ = hypothesized proportion of successes
$q_0$ = hypothesized proportion of failures ($q_0 = 1 - p_0$)
If $p < p_0$, then the critical region is $z < -z_{\alpha}$, which lies on the left tail of the
distribution.
If $p \neq p_0$, then the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$, which lies on
each tail of the distribution.
1. In a survey of drug users conducted by PDEA, 18 out of 423 (4.26%) were found to be
HIV positive. Can we conclude that fewer than 5% of the drug users in the sampled
population are HIV positive? Use 0.05 level of significance.
Illustrative Testing:
a) H0: The proportion of the sampled population who are HIV positive is equal to .05
or 5%. p = .05
b) H1: The proportion of the sampled population who are HIV positive is less than .05
or 5%. p < .05
c) $\alpha$ = 0.05 (one-tailed test)
[Figure: standard normal curve with the critical region to the left of -1.645]
e) Computation: x = 18; n = 423; $p_0$ = 0.05; $q_0$ = 0.95
$z = \dfrac{x - n p_0}{\sqrt{n p_0 q_0}} = \dfrac{18 - (423)(0.05)}{\sqrt{(423)(0.05)(0.95)}} = -0.70$
f.) Decision: Accept H0 and reject H1, since z = -0.70 does not fall in the critical
region z < -1.645. We cannot conclude that fewer than 5% of the drug users in the
sampled population are HIV positive.
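As a check on the one-proportion test, the z statistic can be evaluated with the hypothesized proportion from H0, $p_0$ = 0.05. This is a minimal sketch; the function name is illustrative, and -1.645 is the left-tail critical value for a 0.05 level of significance.

```python
from math import sqrt

def proportion_z(x, n, p0):
    """z statistic for H0: p = p0, given x successes in n trials."""
    q0 = 1 - p0
    return (x - n * p0) / sqrt(n * p0 * q0)

z = proportion_z(x=18, n=423, p0=0.05)
reject_h0 = z < -1.645   # left-tailed test at the 0.05 level

print(round(z, 2), reject_h0)
```

With these inputs z is about -0.70, which is not below -1.645, so H0 is not rejected.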
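For a left-tailed alternative, the critical region is $z < -z_{\alpha}$, which lies on the left tail of the distribution.
For a two-tailed alternative, the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$, which lies on each tail of the distribution.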
Example
1. A poll is taken among the residents of San Fernando City, La Union to
compare the level of acceptability of the proposal on Charter Change in the two
barangays being sampled. In Barangay Catbangen, 80 out of 150 favored
the proposal, and in Brgy. Santiago, 90 out of 200 favored the proposal.
Would you agree that the proportion of the vote taken from Brgy. Santiago
favoring the Charter Change is higher than the proportion taken from Brgy. Catbangen?
Use a 0.05 level of significance to test the hypothesis.
Illustrative Testing:
a) H0: The proportions of the votes taken from the 2 barangays being sampled do
not differ from one another. $p_1 = p_2$
b) H1: The proportion of the vote taken from Brgy. Santiago is higher than the
proportion of the vote taken from Brgy. Catbangen. p1 p2
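c) $\alpha$ = 0.05 (one-tailed test)
d) Critical Region: z > 1.645
$z = \dfrac{P_1 - P_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
[Figure: standard normal curve with the critical region to the right of 1.645]
e) Computation: $P_1$ = 80/150; $P_2$ = 90/200; $n_1$ = 150; $n_2$ = 200
$p = \dfrac{80 + 90}{150 + 200} = 0.49$; q = 0.51
$z = \dfrac{0.53 - 0.45}{\sqrt{(0.49)(0.51)\left(\dfrac{1}{150} + \dfrac{1}{200}\right)}} = 1.48$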
f.) Decision: Accept H0 and reject H1, and conclude that the proportions of the
votes of the 2 barangays being sampled do not differ from one another.
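The pooled two-proportion statistic can be sketched as follows. The code assumes the counts from the example (80 of 150 and 90 of 200) and carries full precision, so its result differs slightly from the hand computation, which rounds the proportions to 0.53, 0.45, and 0.49 before dividing; the function name is illustrative.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)   # pooled estimate of the common proportion
    q = 1 - p
    return (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))

z = two_proportion_z(80, 150, 90, 200)
reject_h0 = z > 1.645           # critical region stated in step d)

print(round(z, 2), reject_h0)
```

At full precision z is about 1.54 rather than 1.48; either value is below 1.645, so the decision is the same.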
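$\chi^2 = \sum_i \dfrac{(o_i - e_i)^2}{e_i}$
Where:
$o_i$ = observed frequencies
$e_i$ = expected frequencies
If $p_1, p_2, \ldots, p_k$ are not all equal, then the critical region is $\chi^2 > \chi^2_{\alpha}$, which
lies on the right tail of the distribution.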
Example
1. In a shop, a set of data was collected to determine whether or not the proportion of
defectives produced by workers was the same for the day, evening, or midnight shift
worked. The following data were collected.
                 SHIFT
                 Day   Evening   Midnight
Defectives       45    55        70
Nondefectives    905   890       870
[Figure: chi-squared distribution with the critical region to the right of 7.378]
e) Computation:
1. $e_1 = \dfrac{(950)(170)}{2835} = 57.0$
2. $e_2 = \dfrac{(945)(170)}{2835} = 56.67$
3. $e_3 = \dfrac{(940)(170)}{2835} = 56.37$
4. $e_4 = \dfrac{(950)(2665)}{2835} = 893.03$
5. $e_5 = \dfrac{(945)(2665)}{2835} = 888.33$
6. $e_6 = \dfrac{(940)(2665)}{2835} = 883.63$
                 SHIFT                                       Total
                 Day            Evening        Midnight
Defectives       45 (57.0)      55 (56.67)     70 (56.37)     170
Non-defectives   905 (893.03)   890 (888.33)   870 (883.63)   2665
Total            950            945            940            2835
$\chi^2 = \dfrac{(45 - 57)^2}{57.0} + \dfrac{(55 - 56.67)^2}{56.67} + \dfrac{(70 - 56.37)^2}{56.37} + \dfrac{(905 - 893.03)^2}{893.03} + \dfrac{(890 - 888.33)^2}{888.33} + \dfrac{(870 - 883.63)^2}{883.63} = 6.29$
f.) Decision: Accept H0 and reject H1 and conclude that it would certainly be
dangerous to say that the proportions of defectives produced by the workers are the
same for all shifts.
Exercise 14
Name: Score:
Course & Year: Date:
a. Ho:
b. H1:
e. Computation:
f. Decision:
Illustrative Testing:
a. Ho:
b. H1:
e. Computation:
f. Decision:
Can we conclude that the proportions of parents who favor placing sex
education in the secondary level are not the same for these 5 provinces? Use a 0.05
level of significance.
Illustrative Testing:
a. Ho:
b. H1:
e. Computations:
f. Decision:
When testing the null hypothesis that the population correlation coefficient is
zero, the following t-test should be used:
$t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$ with $v = n - 2$
Where:
r = Pearson's r correlation coefficient
n = number of respondents (number of sample pairs)
$r = \dfrac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}$ (Pearson r)
Establishing the Critical Region
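If $\rho > 0$, then the critical region is $t > t_{\alpha}$, which lies on the right tail of the
distribution.
If $\rho < 0$, then the critical region is $t < -t_{\alpha}$, which lies on the left tail
of the distribution.
If $\rho \neq 0$, then the critical region is $t < -t_{\alpha/2}$ and $t > t_{\alpha/2}$, which lies
on each tail of the distribution.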
1. A team of social psychologists has developed a scale that purports to measure social
isolation. The scores made on the scale by 15 respondents are correlated with their
scores on an index revealing the degrees of prejudice felt towards minority groups. They
obtain a Pearson’s r of 0.60. May they conclude that the obtained correlation is not
likely to have been drawn from a population in which the true correlation is zero? Use a
0.05 level of significance.
Illustrative Testing:
a) H0: The population correlation coefficient from which this sample was drawn
equals 0. $\rho = 0$
b) H1: The population correlation coefficient from which this sample was drawn
does not equal zero. $\rho \neq 0$
c) $\alpha$ = 0.05 (two-tailed test)
d) Critical Region: t > 2.16 and t < -2.16; v = 15 - 2 = 13
[Figure: t distribution with critical regions below -2.16 and above 2.16]
e) Computation: r = 0.60; n = 15
$t = \dfrac{0.60\sqrt{15 - 2}}{\sqrt{1 - 0.60^2}} = 2.70$
f.) Decision: Reject H0 and accept H1, and conclude that there is a significant relationship between the
social isolation of the respondents and the degree of prejudice felt toward minority groups.
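The t statistic for this example follows directly from r and n. A minimal sketch, assuming the two-tailed critical value 2.16 for 13 degrees of freedom stated in step d); the function name is illustrative.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t = correlation_t(r=0.60, n=15)
reject_h0 = abs(t) > 2.16   # two-tailed test, v = 13

print(round(t, 2), reject_h0)
```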
When testing the null hypothesis that the population correlation coefficient equals
a specified value other than zero, the following z-test should be used:
$z = \dfrac{z_r - Z_r}{\dfrac{1}{\sqrt{n - 3}}}$
Where:
$z_r$ = the transformed value of the sample r
$Z_r$ = the transformed value of the population correlation
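For a left-tailed alternative, the critical region is $z < -z_{\alpha}$, which lies on the left tail of the distribution.
For a two-tailed alternative, the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$, which lies on each tail of the distribution.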
Example
1. Using the same example, let us test H0 that the population correlation from which the
sample was drawn is 0.25.
Illustrative Solution:
a) H0: The population correlation coefficient from which this sample was drawn
equals 0.25. $\rho = 0.25$
b) H1: The population correlation coefficient from which this sample was drawn
does not equal 0.25. $\rho \neq 0.25$
c) $\alpha$ = 0.05 (two-tailed test)
d) Critical Region: z > 1.96 and z < -1.96
[Figure: standard normal curve with critical regions below -1.96 and above 1.96]
If $r_s < r$ we accept H0
If $r_s \geq r$ we reject H0
Example
1. Test the hypothesis that there is a strong relation of the social class backgrounds of
husbands and wives prior to marriage. Use a 0.01 level of significance.
Wife’s Rank (x) Husband’s Rank (y) D D2
1 4 -3 9
2 2 0 0
3 9 -6 36
4 1 3 9
5 7 -2 4
6 10 -4 16
7 8 -1 1
8 13 -5 25
9 5 4 16
10 3 7 49
11 11 0 0
12 6 6 36
13 12 1 1
14 15 -1 1
15 14 1 1
$\Sigma D = 0$   $\Sigma D^2 = 204$
Illustrative Testing:
a) H0: The population value of the Spearman correlation coefficient is 0; that is,
there is no strong relation between the social class of husbands and wives prior to marriage.
$\rho = 0$
b) H1: The true population correlation coefficient is greater than zero. $\rho > 0$
c) $\alpha$ = 0.01 (one-tailed test)
d) Critical Region: $r_{s(0.01)} > 0.623$
[Figure: critical region to the right of 0.623]
e) Computation: n = 15; $\Sigma D^2 = 204$
$r_s = 1 - \dfrac{6\Sigma D^2}{n(n^2 - 1)} = 1 - \dfrac{6(204)}{15(224)} = 0.64$
f.) Decision: Reject H0 and Accept H1 and conclude that since population value of
rs is greater than zero, there is a strong relation of the social class of husbands and
wives prior to marriage.
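The rank correlation in this example can be recomputed from the two rank columns. A minimal sketch, assuming no tied ranks (as in the table) and the critical value 0.623 from step d); the function and variable names are illustrative.

```python
wife_rank = list(range(1, 16))
husband_rank = [4, 2, 9, 1, 7, 10, 8, 13, 5, 3, 11, 6, 12, 15, 14]

def spearman_rs(x_ranks, y_ranks):
    """Spearman rank correlation: 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

sum_d2 = sum((x - y) ** 2 for x, y in zip(wife_rank, husband_rank))
rs = spearman_rs(wife_rank, husband_rank)
reject_h0 = rs > 0.623   # one-tailed critical value at the 0.01 level

print(sum_d2, round(rs, 2), reject_h0)
```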
Exercise 15
Name: Score:
Course & Year: Date:
b. H1:
e. Computation:
f. Decision:
16 - 30. A consumer panel tests 9 brands of microwave ovens for over-all quality. The
ranks assigned by the panel and the suggested retail prices are as follows:
Manufacturer   Panel Rating   Suggested Price in Php
A 6 4800
B 9 3950
C 2 5750
D 8 5500
E 5 5100
F 1 5450
G 7 4000
H 4 4650
I 3 4200
Is there a significant relationship between the quality and the price of a
microwave oven? Use a 0.05 level of significance.
Illustrative Testing:
a. Ho:
b. H1:
e. Computation:
f. Decision:
$\chi^2 = \sum_{i=1}^{k} \dfrac{(o_i - e_i)^2}{e_i}$
Where:
$\chi^2$ = value of a random variable whose distribution is very close to the chi-squared
distribution
$o_i$ = observed frequencies
$e_i$ = expected frequencies
Analysis
Example
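[Figure: chi-squared distribution with the critical region to the right of 15.086]
e) Computation:
Observed & Expected Frequencies of 180 Tosses of a Die
X          1    2    3    4    5    6
Observed   28   36   36   30   27   23
Expected   30   30   30   30   30   30
$\chi^2 = \dfrac{(28 - 30)^2}{30} + \dfrac{(36 - 30)^2}{30} + \dfrac{(36 - 30)^2}{30} + \dfrac{(30 - 30)^2}{30} + \dfrac{(27 - 30)^2}{30} + \dfrac{(23 - 30)^2}{30} = 4.47$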
f.) Decision: Accept H0 and reject H1, and conclude that there is insufficient
evidence to conclude that the die is not balanced.
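The goodness-of-fit statistic for the die example can be verified in a few lines. A minimal sketch, assuming equal expected frequencies under H0 and the critical value 15.086 shown for the example; the variable names are illustrative.

```python
observed = [28, 36, 36, 30, 27, 23]

# Under H0 each face is equally likely, so every expected count is total / 6.
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 15.086            # right-tail critical value quoted in the example
reject_h0 = chi2 > critical

print(round(chi2, 2), reject_h0)
```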
$\chi^2 = \sum_{i=1}^{k} \dfrac{(o_i - e_i)^2}{e_i}$
Where:
$\chi^2$ = value of a random variable whose distribution is very close to the chi-squared
distribution
$o_i$ = observed frequencies
$e_i$ = expected frequencies
To compute the expected frequencies:
Expected frequency = (column total × row total) / grand total
Analysis
Large value of the chi-squared distribution leads to the rejection of the null
hypothesis and therefore classifications of variables are dependent.
Small value of the chi-squared distribution leads to the acceptance of the
null hypothesis and therefore, classifications of variables are independent.
Example
1. In an experiment to study the dependence of hypertension on smoking habits, the
following data were taken on 150 individuals.
                   Non-smokers   Moderate Smokers   Heavy Smokers   Total
Hypertension       21 (26.07)    28 (27.2)          36 (31.73)      85
No hypertension    25 (19.93)    20 (20.8)          20 (24.27)      65
Total              46            48                 56              150
Test the hypothesis that the presence or absence of hypertension is independent
of smoking habits. Use a 0.05 level of significance.
Illustrative Testing:
a) H0: The presence or absence of hypertension is independent of smoking
habits.
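b) H1: The presence or absence of hypertension is dependent on smoking habits.
c) $\alpha$ = 0.05 (one-tailed test)
[Figure: chi-squared distribution with the critical region to the right of 5.991]
e) Computation:
1. $e_1 = \dfrac{(46)(85)}{150} = 26.07$
2. $e_2 = \dfrac{(48)(85)}{150} = 27.2$
3. $e_3 = \dfrac{(56)(85)}{150} = 31.73$
4. $e_4 = \dfrac{(46)(65)}{150} = 19.93$
5. $e_5 = \dfrac{(48)(65)}{150} = 20.8$
6. $e_6 = \dfrac{(56)(65)}{150} = 24.27$
$\chi^2 = \dfrac{(21 - 26.07)^2}{26.07} + \dfrac{(28 - 27.2)^2}{27.2} + \dfrac{(36 - 31.73)^2}{31.73} + \dfrac{(25 - 19.93)^2}{19.93} + \dfrac{(20 - 20.8)^2}{20.8} + \dfrac{(20 - 24.27)^2}{24.27} = 3.65$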
f.) Decision: Accept H0 and reject H1 and conclude that the presence or absence
of hypertension is independent of smoking habits.
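The expected frequencies and the chi-squared statistic for a contingency table can be computed together. The sketch below assumes each expected frequency is row total times column total divided by the grand total, which is what the six computations in this example use; the function name is illustrative.

```python
def chi_square_independence(observed):
    """Return (expected table, chi-squared statistic) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count for each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, chi2

observed = [[21, 28, 36],    # hypertension
            [25, 20, 20]]    # no hypertension
expected, chi2 = chi_square_independence(observed)
reject_h0 = chi2 > 5.991     # critical value for 2 degrees of freedom at 0.05

print([round(e, 2) for e in expected[0]], round(chi2, 2), reject_h0)
```

The same function works for any table with more rows or columns; only the critical value changes with the degrees of freedom, (rows - 1)(columns - 1).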
In this section, we test the hypothesis that the population proportions within
each row are the same. We are basically interested in determining whether three or
more categories or classifications of variables are homogenous.
To test homogeneity of three or more variables of classifications we use:
$\chi^2 = \sum_{i=1}^{k} \dfrac{(o_i - e_i)^2}{e_i}$
To compute the expected frequencies:
Expected frequency = (column total × row total) / grand total
Analysis
A large value of the chi-squared statistic leads to the rejection of the null
hypothesis, and therefore the classifications of variables are heterogeneous.
A small value of the chi-squared statistic leads to the acceptance of the
null hypothesis, and therefore the classifications of variables are homogeneous.
Example
1. A random sample of 200 married men, all retired, were classified according to
education and number of children.
Number of Children
Education 0-3 4-7 Over 7 Total
Elementary 14 37 32 83
Secondary 19 42 17 78
College 12 17 10 39
Total 45 96 59 200
Test the hypothesis, at the 0.05 level of significance that the size of a family is
independent of the level of education attained by the father.
Illustrative Solution:
a) H0: The size of a family is independent of the level of education attained by the
father.
b) H1: The size of a family is dependent of the level of education attained by the
father.
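c) $\alpha$ = 0.05 (one-tailed test)
[Figure: chi-squared distribution with the critical region to the right of 9.488]
e) Computation:
1. $e_1 = \dfrac{(45)(83)}{200} = 18.675$
2. $e_2 = \dfrac{(96)(83)}{200} = 39.84$
3. $e_3 = \dfrac{(59)(83)}{200} = 24.485$
4. $e_4 = \dfrac{(45)(78)}{200} = 17.55$
5. $e_5 = \dfrac{(96)(78)}{200} = 37.44$
6. $e_6 = \dfrac{(59)(78)}{200} = 23.01$
7. $e_7 = \dfrac{(45)(39)}{200} = 8.775$
8. $e_8 = \dfrac{(96)(39)}{200} = 18.72$
9. $e_9 = \dfrac{(59)(39)}{200} = 11.505$
$\chi^2 = \dfrac{(14 - 18.675)^2}{18.675} + \dfrac{(37 - 39.84)^2}{39.84} + \dfrac{(32 - 24.485)^2}{24.485} + \dfrac{(19 - 17.55)^2}{17.55} + \dfrac{(42 - 37.44)^2}{37.44} + \dfrac{(17 - 23.01)^2}{23.01} + \dfrac{(12 - 8.775)^2}{8.775} + \dfrac{(17 - 18.72)^2}{18.72} + \dfrac{(10 - 11.505)^2}{11.505} = 7.48$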
f.) Decision: Accept H0 and reject H1 and conclude that the size of a family is
independent of the level of education attained by the father.
Exercise 16
Name: Score:
Course & Year: Date:
X 1 2 3 4 5
Grade 83 85 80 90 82
Test the hypothesis, at 0.05 level of significance that the distribution is uniform.
Illustrative Testing:
a. Ho:
b. H1:
e. Computation:
f. Decision:
11 - 20. A random sample of 100 students in the College of Arts and Management are
classified according to gender and the number of hours they watched television during a
week:
Gender
Male Female Total
Over 30 hours 15 30 45
Under 30 hours 35 20 55
Total 50 50 100
Use a 0.05 level of significance to test the hypothesis that the time spent
watching television is independent of whether the viewer is male or female.
Illustrative Testing:
a. Ho:
b. H1:
e. Computation:
f. Decision:
21 - 30. According to a study of the National Statistics Office, widows live longer than
widowers. Consider the following survival data collected on 100 widows and 100
widowers following the death of a spouse:
Status
Years Lived Widow Widower Total
Less than 5 26 29 55
5 to 10 40 40 80
More than 10 34 31 65
Total 100 100 200
Can we conclude at the 0.05 level of significance that the proportions of widows
and widowers are equal with respect to the different time periods that a spouse survives
after the death of his or her mate?
Illustrative Testing:
a. Ho:
b. H1:
e. Computation:
f. Decision:
Objectives
Figure 2. Perfect Negative Relationship
X 1 2 3 4 5 6 7
Y 12 10 8 6 4 2 0
In real life situations, however, the relationship between variables is not perfect.
Figure 3 illustrates a high positive relationship, while figure 4 shows a high negative
relationship.
Figure 3. Very High Positive Relationship
X 1 2 3 4 5 6 7
Y 2 2 4 8 8 10 11
Pearson’s r Formula
$r = \dfrac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}$ (Pearson r)
Where:
r = Pearson product moment correlation
x = independent variable
y = dependent variable
n = total number of paired observations
Example
1. Compute the correlation between the heights (x) in feet and weights (y) in kilogram of
10 sample students in DMMMSU – MLUC.
Height
5.2 4.5 4.11 5.7 5.8 6.2 4.8 5.5 6 5.4
(x)
Weight
40 42.5 55 70 75 60 80 62 63 54
(y)
5.8 75 33.64 5625 435
6.2 60 38.44 3600 372
4.8 80 23.04 6400 384
5.5 62 30.25 3844 341
6 63 36 3969 378
5.4 54 29.16 2916 291.6
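$\Sigma x = 53.21$   $\Sigma y = 601.5$
$\Sigma x^2 = 287.20$   $\Sigma y^2 = 37,685.25$   $\Sigma xy = 3225.9$
$(\Sigma x)^2 = 2831.3$   $(\Sigma y)^2 = 361,802.3$
$r = \dfrac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{[n\Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}} = \dfrac{10(3225.9) - (53.21)(601.5)}{\sqrt{[10(287.20) - 2831.3][10(37685.25) - 361802.3]}} = 0.32$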
Correlation 0.323428854
pearson r 0.323428854
Interpretation:
For the data on the heights and weights of 10 sample students in DMMMSU –
MLUC, the computed value of r is 0.32. The value implies that weight has a low positive
correlation with height since r is positive. It can also be perceived that as a person
becomes taller, there is a small tendency to increase the weight.
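The correlation for the height and weight data can be recomputed from the raw pairs with the same computational formula. A minimal sketch in plain Python; the function and variable names are illustrative.

```python
from math import sqrt

heights = [5.2, 4.5, 4.11, 5.7, 5.8, 6.2, 4.8, 5.5, 6, 5.4]
weights = [40, 42.5, 55, 70, 75, 60, 80, 62, 63, 54]

def pearson_r(x, y):
    """Pearson product moment correlation using the raw-score formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(heights, weights)
print(round(r, 4))
```

The result should agree with the spreadsheet value of about 0.3234 reported above.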
Linear Regression
Methods
1. Graphical Method – this method consists of plotting the points corresponding to the
paired values of X and Y on the rectangular coordinate system. It provides a rough
estimate.
Example
1. Plot the points corresponding to the paired values of the age of adults (X) and peak
heart rate (Y) on the rectangular coordinate system and draw the trend line.
Age 10 20 20 25 30 30 30 40 45 50
PHR 210 200 195 195 190 180 185 180 170 ?
Interpretation:
Peak heart rate tends to decrease as the age increases.
2. Regression Formula – a fairly accurate estimate when the values of any two
variables are given. It makes use of the equation $\hat{y} = a + bX$.
$a = \dfrac{\Sigma Y \Sigma X^2 - \Sigma X \Sigma XY}{n\Sigma X^2 - (\Sigma X)^2}$
$b = \dfrac{n\Sigma XY - \Sigma X \Sigma Y}{n\Sigma X^2 - (\Sigma X)^2}$
Where:
a = intercept
b = slope of the line fitted to the sample
$\hat{y}$ = estimated value of the dependent variable
X = observed value of the independent variable
Example
1. Compute the regression line of the above example and interpret the result. Make a
prediction on the peak heart rate if the age of a person is 25 years old.
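$a = \dfrac{\Sigma Y \Sigma X^2 - \Sigma X \Sigma XY}{n\Sigma X^2 - (\Sigma X)^2} = \dfrac{(1870)(10,350) - (300)(54,625)}{10(10,350) - 90,000} = 219.78$
$b = \dfrac{n\Sigma XY - \Sigma X \Sigma Y}{n\Sigma X^2 - (\Sigma X)^2} = \dfrac{10(54,625) - (300)(1870)}{10(10,350) - 90,000} = -1.09$
Regression equation:
$\hat{y} = 219.78 - 1.09X$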
(Using MS EXCEL)
open ms excel, input age observations, input phr observations, select data,
data analysis, regression analysis, OK, highlight phr observations, place
cursor to x range, highlight age observations, set confidence level at 95,
select and assign output range, then press OK.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.970793994
R Square 0.942440979
Adjusted R Square 0.935246101
Standard Error 3.507597574
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 1611.574074 1611.574 130.9878 3.07301E-06
Residual 8 98.42592593 12.30324
Total 9 1710
Coefficients Standard Error t Stat P-value
Intercept 219.77 3.071 71.560 1.62E-12
X Variable 1 -1.09 0.095 -11.445 3.07E-06
Interpretation:
Peak heart rate tends to decrease as age increases, as illustrated by the scatter
diagram above. The peak heart rate that an individual can reach during
intensive exercise decreases by an estimated 1.09 for each one-year increase in
age.
Prediction:
If the age of the person is 25 years old, we can predict the peak heart rate of the
person using the regression equation:
$\hat{y} = 219.78 - 1.09X = 219.78 - 1.09(25) = 192.53$ or 192
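The intercept, slope, and prediction can be reproduced from the summary sums used in the hand computation. A minimal sketch, assuming n = 10, ΣX = 300, ΣY = 1870, ΣX² = 10,350, and ΣXY = 54,625 as given above; the function name `predict` is illustrative.

```python
n = 10
sum_x, sum_y = 300, 1870
sum_x2, sum_xy = 10_350, 54_625

denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope

def predict(age):
    """Estimated peak heart rate from the fitted line y = a + b * age."""
    return a + b * age

print(round(a, 2), round(b, 2), round(predict(25), 2))
```

Because the script keeps the unrounded slope, its prediction for age 25 is about 192.46, slightly below the 192.53 obtained with the rounded coefficients.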
Exercise 17
Name: Score:
Course & Year: Date:
JP JS
2.75 2.55
2.50 2.35
4.00 3.75
3.25 4.25
3.10 4.10
4.25 3.34
4.10 2.86
3.05 2.90
3.45 2.75
2.45 3.46
Illustrative Testing:
a. Ho:
b. H1:
e. Computation:
f. Decision:
Use the scatter diagram and compute the regression equation and interpret as
well.
Predict what the grade of the student will be, using the regression equation, if
the time he or she spent studying was 25.
Illustrative Testing:
a. Ho:
b. H1:
e. Computation:
f. Decision:
References