Modules in Stat101


Prepared by:

Mark Anthony C. Ochoa


Instructor
General Overview
This compilation is a printed instructional material in Fundamentals of Statistics
for the students of the College of Arts and Management of the Don Mariano Marcos
Memorial State University, Mid – La Union Campus, developed by the instructor in
charge of the subject. It was developed primarily to remedy the difficulties students
encounter in mathematical and analytical disciplines. In addition, this instructional
material is expected to enhance classroom teaching and learning and to improve the
performance of the students.

This compilation is presented with the following components: specific objectives
for each topic, a discussion of the content with accompanying illustrative examples,
and exercises for each skill area or lesson. At the end of every topic, a chapter test
will also be given to the students.

Indeed, the compilation is quite similar to a worktext or module because it
includes all relevant topics and compiled exercises that are well organized and
presented. It also requires the learners to answer the chapter tests given at the end of
every topic presentation.

Learners in this discipline will be given opportunities to demonstrate their
understanding and skill in each lesson through instructional tasks that are challenging
and realistic. The success of this instructional material will depend on the sincerity
and sense of responsibility of the students and, equally, on the dedication and
commitment of the proponent, who is also their comrade in the pursuit of quality
education.

Table of Contents

Title
General Overview
Table of Contents
Chapter 1. Introductory Concepts
Chapter 2. Collection of Data
Chapter 3. Presentation of Data
Chapter 4. Measures of Central Tendency
Chapter 5. Measures of Variability
Chapter 6. Combinatorics and Probabilities
Chapter 7. Hypothesis Testing
Chapter 8. The Chi-Square Analysis
Chapter 9. Linear Correlation and Regression Analysis
References

Chapter 1. Introductory Concepts

Objectives

• Relate and discuss the historical concept and development of Statistics.
• Explain the importance and uses of statistics.
• Describe and differentiate the fields of statistics.
• Enumerate and differentiate the different kinds of variables / data.
• Categorize variables according to level of measurement: nominal, ordinal,
interval, and ratio.

History of Statistics

It is believed that statistics started with the beginning of man's existence.
As early as 3000 B.C., population was recorded in Babylonia and in China. Almost
five thousand years ago, the Sumerians counted their citizens for taxation purposes,
and at various later times the Egyptians conducted inquiries into the occupations
of the people. In biblical times, censuses were undertaken by Moses in 1491 B.C. and
by David in 1017 B.C. The Athenians and other classical Greeks took censuses in times
of stress, carefully counting the adult male citizens in wartime and the general
population when the food supply was endangered. The Romans registered adult males
and their property for military and administrative purposes. Servius Tullius, the sixth
King of Rome (578 to 534 B.C.), is credited with instituting the gathering of population
data. Thus, in early times statistics consisted mainly of population records used for
occupational inquiries, taxation, military strength, and administration.
In England, William the Conqueror required the compilation of information on
population and resources. This compilation, the "Domesday Book," is the first
landmark in British statistics. In the Middle Ages, land ownership and manpower for
wars were registered. In the 13th century, the tax lists of the parish included the
registration of those who were subject to tax.
It was Gottfried Achenwall (1719 – 1772) who first introduced the word "Statistik";
the English word "statistics" was later popularized in the books of Zimmermann and
Sinclair.
European mathematicians and gamblers suspected that games of chance such as
rolling dice, playing cards, and tossing coins followed certain laws of probability.
Gerolamo Cardano (1501 – 1576), an Italian mathematician, physician, and gambler,
wrote "Liber de Ludo Aleae" (The Book on Games of Chance) and is considered the first
known to study the principles of probability.
In the 18th century, statistics was used in the study entitled "Political Arrangement
of the Modern States of the Known World." The description in this work was at first
verbal; gradually, an increasing proportion of numerical data was used.
In the 19th century, the Belgian astronomer Lambert Adolphe Quetelet applied the
theory of probability to anthropological measurements and extended the same
principle to the physiological, physical, and chemical fields. He established a central
commission for statistics, which became a model for similar organizations in other
countries.
Sir Francis Galton (1822-1911), an English scientist and cousin of Charles
Darwin, developed the use of percentiles and the correlation method and was an early
proponent of statistical analysis as applied to mental and behavioral phenomena. Karl
Pearson, a British applied mathematician and philosopher of science, was one of the
major developers of the science of statistics. He originated such basic statistical
concepts and procedures as the standard deviation, the random walk, and the
chi-square test.
Sir Ronald Aylmer Fisher (1890-1962), a British statistician, was the most
prominent figure in the field of statistics in the twentieth century. The F-test, named in
his honor, is used in the analysis of variance (ANOVA) in inferential statistics. He
initiated investigations in experimental design, randomization, and mathematical
statistics.
Shortly before the Second World War, the number of applications of statistical
methods in the social sciences began to increase. The number of surveys of all kinds
increased, and the need to interpret data in mathematics, business and the social
sciences made it necessary for workers to have at least a basic understanding of
statistics.

Importance

The uses of statistics as a science were evident in the past and remain evident
today, especially in empirical studies. Among its uses and contributions are the
following:
• Statistics aids decision making based on the actual conditions in a field;
• Statistics summarizes or describes data about the characteristics of a group; and
• Statistics helps forecast or predict future outcomes, make inferences, and
compare or establish relationships.

In education, statistical techniques and methods are used to obtain information
on enrollment, finance, physical facilities, dropout rates, proficiency levels, tests, and
many others.
In management, statistics is used in decision making and in varied aspects
such as organizational behavior, labor relations, human resource management, and
performance assessment and evaluation for the improvement of personnel relations.
In economics, statistics determines trends, helps financial analysts make
investment decisions, and determines the potential of an investment, including the
inventory turnover, the ratio of cash flow to total assets, and the ratio of inventory
values to current liabilities.
Statistical designs of experiments are valuable in medicine and the physical
sciences, where they are used to evaluate the causes and effects of the factors that
affect experiments.
Psychologists and sociologists can understand human beings better through
systematic and personality tests or through observations.
Even in sports, statistics is essential for summarizing a game. Commentators
usually give statistical analyses as an evaluation of what was done during the game
and of what ought to be done in the next one.
Statistics has had a great impact on our history and on our present condition. It
makes work easier to understand and more reliable in delivering services to people
and to the other fields of science.

Fields of Statistics

• Descriptive statistics – is concerned with gathering, classifying, presenting, and
summarizing data to describe group characteristics. The most commonly used
summarizing values to describe group characteristics of data are frequency counts and
percentages, measures of central tendency, measures of variability, and skewness and
kurtosis.
Examples:
1. Average Score in a College Entrance Examination
2. Deviation of CAM Student Scores in Algebra
3. Distribution of Faculty Members in terms of Academic Rank
4. Managerial Satisfaction of DMMMSU – MLUC Administrators

• Inferential statistics – pertains to the methods of making inferences, estimates,
or predictions about a large set of data (the population) using the information
gathered from a part of it (a sample). Commonly used inferential statistical tools are
hypothesis testing using the z-test, t-test, paired t-test, F-test, simple linear
correlation, analysis of variance (ANOVA), chi-square, regression, and time series
analysis.
Examples:
1. Is there a significant relationship between the grades of BSHRM students in
Mathematics and English?
2. Is there any reason to doubt that hypertension is dependent on smoking?
3. Determining whether there is a significant relationship between job
satisfaction and performance of BSHRM graduates in DMMMSU – MLUC.
4. Determining whether there is a significant difference in the On-the-Job
Training Service Delivery of BSHRM students in San Fernando City, La
Union.

Definition of Terms

• Statistics – the science and art of gathering, organizing, presenting, analyzing,
and interpreting data.
• Population – the complete collection of objects such as scores, people,
measurements, etc.
Example: A group of fifty 4th Year BSHRM students attended a bartending
seminar.
• Sample – a subset of the population; only part of the whole set of scores or
respondents.
Example: Only 30 selected 4th Year BSHRM students were advised to
attend the bartending seminar.
• Process data – data resulting from a combination of resources, such as time and
other materials, needed to form an object.
Example: The process of making cement, pandesuelo, a compilation, etc.
• Parameter – a numerical measurement describing some characteristic of a
population.
Example: Assume 40 of the fifty 4th Year BSHRM students were present
during the bartending seminar. Then 80% is the parameter describing the
characteristic of the population who attended.
• Statistic – a numerical measurement describing some characteristic of a
sample.
Example: Assume 21 of the 30 selected 4th Year BSHRM students were
very attentive. Then 70% is the statistic describing the attentiveness of the
sample.
• Data – collections of facts or observations that may be classified as raw, grouped,
primary, and secondary.
Example: Trend in Enrollment, Number of Board Passers
• Raw data – data presented in an array or in their original form.
Example: 20, 25, 45, 30, 16, 27, 19, 32 are the scores of 8 selected
BSHRM students in statistics.
• Grouped data – data arranged in a frequency table.
Example: Frequency Count of Faculty Members along Educational
Attainment
• Primary data – data in their original form, obtained firsthand.
Example: Autobiography, diaries, personal data sheet forms, etc.
• Secondary data – data that have been replicated or copied from another source.
Example: Copied Preparation, Citations
• Variable – a characteristic or attribute of persons or objects that assumes
different values (numerical) or labels (qualitative).
Example: Age, Gender, Educational Attainment
• Qualitative variable – characteristics expressed in words or statements
describing the categories into which units of observation are classified. Yields
categorical values.
Examples: Religious affiliation, Status of Appointment
• Quantitative variable – characteristics expressed in numerals or numbers.
Yields numerical values.
Examples: Weight, height, scores in CET
• Discrete – quantitative variables that can take only a countable number of
possible values.
Examples: No. of Unemployed Graduates in the Philippines, No. of Type "O"
blood donors
• Continuous – quantitative variables that are obtained by measurement and can
take any value within a range.
Examples: Height of a person, weight of 10 persons
• Nominal – the crudest form of measurement. It applies to responses that are
labels or names, with no ordering and no measurable differences between
categories.
Example: Market survey about a product:
20-like; 40-dislike; 10-no opinion
• Ordinal – an improvement on the nominal level. Rankings or ranked responses
fall under this level.
Example: Performance Rating of Faculty Members in DMMMSU – MLUC:
Outstanding – Very Satisfactory – Moderately Satisfactory
• Interval – possesses the properties of the nominal and ordinal levels, and
differences between values are meaningful, yet it has no absolute zero (so ratios of
values are not meaningful).
Example: Sample IQ of BSHRM students in DMMMSU – MLUC
• Ratio – possesses all the properties of the nominal, ordinal, and interval levels,
but unlike the interval level, it has a true starting point, or absolute zero.
Example: Sample weight of babies (in kilograms)
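The parameter and statistic in the examples above are simple proportions of a count to a total. The short Python sketch below reproduces that arithmetic using only the counts stated in the examples (40 of 50 students present; 21 of 30 selected students attentive); the helper name `proportion_percent` is hypothetical, chosen for illustration.

```python
def proportion_percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Parameter: computed from the whole population (40 of the 50 students were present).
parameter = proportion_percent(40, 50)

# Statistic: computed from a sample only (21 of the 30 selected students were attentive).
statistic = proportion_percent(21, 30)

print(f"Parameter (population): {parameter:.0f}%")
print(f"Statistic (sample): {statistic:.0f}%")
```

The same formula serves both; what makes one a parameter and the other a statistic is only whether the counts come from the whole population or from a sample.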

Exercise 1
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.


A. A certain electric manufacturing company makes electric wiring, which it sells to
contractors in the construction industry. Approximately 800 electric contractors purchase
wire annually. The Director of Marketing wants to determine the electric contractors'
satisfaction with the wire. He developed a satisfaction scale that yields a satisfaction score
between 10 and 50 for participant responses. In a random sample, 25 of the 900
contractors were asked to complete a satisfaction survey. The satisfaction scores for the
25 participants are averaged to produce a mean satisfaction score.
1 - 2. What is the population for this study?
3 - 4. What is the sample for this study?
5 - 6. What is the parameter for this study?
7 - 8. What is the statistic for this study?

B. Identify whether each statement involves descriptive or inferential statistics.


9 - 10. Relationship between monetary benefits and job satisfaction.
11 - 12. Difference between the performance of experimental group and
control group in statistics.
13 - 14. Distribution of enrollment in DMMMSU – MLUC.
15 - 16. Means of transportation used among students studying in the
College of Arts and Management.
17 – 18. Level of effectiveness of the pair-think-share strategy among CAM
students.
19 – 20. Effects of remedial sessions on the performance of students in
CAM.

C. Identify whether each is a discrete or continuous quantitative variable.


21. Distribution of defective machines in DMMMSU-MLUC.
22. Number of criminal violations committed in a year.
23. Effectiveness rating of compilation in statistics.
24. Sales of Pay – Per – View fight between Miguel Cotto and Manny
Pacquiao in dollars.
25. Weather condition in Baguio City, Philippines.
26. Number of Tourism Spots in San Fernando City affected by Global
Financial Crisis.

D. Determine which of the four levels of measurement is most appropriate.


27. Weights of a sample of newborn babies.
28. Brands of expensive cars displayed in a car show.
29. Top 10 world billionaires for the year 2007 according to the Forbes survey.
30. Sample salaries of new BSHRM graduates in the different hotels
and restaurants in San Fernando City, La Union.

Chapter 2. Collection of Data

Objectives

• Identify, explain, and differentiate quantitative and qualitative methods of data
gathering.
• Discuss the steps in conducting a research study.
• Identify, explain, and differentiate sampling techniques for acquiring the target
respondents.

Research Methodologies

Once a research question has been determined, the next step is to identify which
method will be appropriate and effective.

• Quantitative research – this includes methods such as experiments, which use
random treatment assignments; quasi-experiments, which use nonrandomized
treatments; and surveys, which may be cross-sectional or longitudinal.
• Qualitative research – this includes ethnographies, which are observations of
groups; grounded theory, which uses multi-staged data collection; phenomenological
studies, which study subjects over a period of time by developing relationships
with them and report findings based on research experiences; and case studies,
which use various data to investigate the subject over time and by activity.

Data Collection Techniques

There are two sources of data. Primary data collection uses surveys, experiments,
or direct observations. Secondary data collection (data mining) may be conducted by
collecting information from diverse sources of documents or electronically stored
information.
• Document – this method identifies trends in leisure research and practice.
Participants keep diaries and journals, and the researcher conducts content analysis of
studies, reports, and diaries.
Advantages:
1. It can get comprehensive and historical information.
2. It can yield an impression of how a strategy operates without interrupting the
strategy.
3. The information already exists.
Challenges:
1. It often takes a lot of time.
2. Information can be incomplete.
3. The researcher needs to be clear about what to look for.
4. Data are restricted to what already exists.

Types of Documents and Their Nature

1. Historical Research – the process of systematically examining past events to
communicate an understanding of them.
2. Literature Review – the process of reading, analyzing, evaluating, and
summarizing scholarly materials about a specific topic.
3. Meta-Analysis – a statistical procedure that integrates the results of several
independent studies considered to be combinable. It may explain heterogeneity
between the results of individual studies.
4. Diaries – a record with discrete entries arranged by date, reporting on what
happened over the course of the day.
5. Content Analysis – a methodology in the social sciences for studying the
content of communication. It is the study of recorded human communication such as
books, websites, printings, and laws.

• Observation – this method gathers accurate information about how a strategy
actually operates, particularly about processes.
Advantages:
1. View operations of a strategy as they are actually occurring.
2. Can adapt to events as they occur.
Challenges:
1. Can be difficult to interpret seen behaviors.
2. Can be complex to categorize observations.
3. Can influence behaviors of strategy participants.
4. Can be expensive.

Types of Observation and Their Nature

1. Interpretive – an observation that does not have just one correct answer; any
answer that includes support from the text may have some degree of "correctness."
2. Ethnographic – a multi-format, unpublished group of materials gathered and
organized by an anthropologist, folklorist, ethnomusicologist, or other cultural
researcher to document human life and traditions.
3. Participant Observer – a form of sociological research methodology in which
the researcher takes on a role in the social situation under observation.
4. Case Study – a descriptive or explanatory analysis of a person, group, or event.
An explanatory case study is used to explore causation in order to find underlying
principles. A case study is used to fully understand or depict stakeholders'
experiences in a strategy and to conduct a comprehensive examination through
cross-comparisons of cases.

• Survey – this technique is used to get a lot of information from people quickly,
easily, and in a non-threatening way.
Advantages:
1. Can be completed anonymously.
2. Inexpensive to administer.
3. Easy to compare and analyze.
4. Can be administered to many people.
5. Factual and relevant information can be obtained using this method.
6. Can get a full range and depth of information.
7. Can be flexible.
Challenges:
1. Doesn't always get the full story.
2. Might not get careful feedback.
3. Question wording can bias respondents' answers.
4. Sometimes yields inaccurate information.
5. Can take a lot of time.
6. Can be costly.
7. Interviewer can bias the response.

• Experimental – this method obtains information under controlled conditions.
Subjects may be randomly assigned to various tests and experiences and then
assessed via observation or standardized scales.
Advantages:
1. It is systematic and scientific.
2. It is innovative.
3. The researcher can manipulate one or more variables, control others, and
measure any change in the remaining variables.
Challenges:
1. It requires a thorough understanding of experimental research designs.
2. It takes time to undergo the process.
3. It is sensitive.

Samples of Qualitative and Quantitative Research:

1. Design and Analysis of the Student Strengths Index for Non Traditional CAM
Students
2. Attitude, Knowledge and Experience of Staff Nurses on Prioritizing Comfort
Measures in Care of the Dying Patient in Region 1.
3. Effects of Smoking and Drinking Liquor on Teen Pregnancy.
4. An Autobiography of Health: A study of Health and Identity in San Fernando
City, La Union
5. Emotional Intelligence in Pre – Registration Among BSBA Students.
6. Re-imposition of Death Penalty on First Degree Murder Incidents in the
Philippines.
7. The Material Culture of Childbirth in the Medieval Period and the Sixteenth Century.
8. Marriage Practices among Igorot, Ilocano and Ilonggo.
9. Poverty Alleviation in Region 1.
10. Effectiveness of Programmed Materials as an Alternative Learning Material in
Statistics.
11. Importance of Theory of Job Performance and Job Satisfaction.
12. Comparative Study on the Results of Effectiveness of Learning Materials to
SLC, UCC and DMMMSU-MLUC Students.
13. Review on the Content and Face Validity of Learning Materials in Statistics.
14. Case Study on the Management and Financial Aspects of DMMMSU
Cooperative.

Plan to Conduct a Research Study

• Think of and decide on a title.

Characteristics of a Good Research Problem/Title:
1. Interesting
2. Relevant
3. Novel
4. Measurable
5. Time Bound
6. Ethical
Example: "Impacts of the Global Financial Crisis on SME Industries in
Region 1 for Calendar Year 2010 - 2012"

• Next, estimate the size of the target population. As much as possible, use the
actual population count obtained through the registration method.
• Assess resources such as time, accessibility of respondents, and money before
conducting the research.
• Determine the sample size needed. The Lynch formula is the basic approach to
determining the appropriate sample size.

x p 1  p 
2
NZ
n 
N d 2  Z 2 p  1  p  

 

where:
n = the required sample size
N = the actual population size
p = the largest possible proportion (0.50)
d = the sampling error (margin of error)
Z = the value of the standard normal variable (1.96)
for a reliability (confidence) level of 0.95

• Apply the appropriate sampling technique in selecting the sample needed from
overlapping or non-overlapping groups of respondents.

• Prepare the materials that will be needed. If questionnaires will be used, their
validity (truthfulness of the instrument) and reliability (consistency of the
instrument) should be tested.
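The Lynch formula can be evaluated with a few lines of Python. This is a minimal sketch, assuming d = 0.05 as in this module's worked examples and rounding the result to the nearest whole respondent; that rounding convention is inferred from the worked examples (345.52 becomes 346 and 368.12 becomes 368) rather than stated in the text. The function name `lynch_sample_size` is hypothetical.

```python
import math


def lynch_sample_size(N: int, d: float = 0.05, Z: float = 1.96, p: float = 0.50) -> int:
    """Required sample size n from the Lynch formula as given in this module:

        n = N * Z^2 * p(1 - p) / (N * d^2 + Z^2 * p(1 - p))

    N is the population size, d the sampling error, Z the standard normal value
    for the chosen reliability level, and p the assumed proportion.
    """
    if N <= 0:
        raise ValueError("population size N must be positive")
    zpq = Z ** 2 * p * (1 - p)          # the term Z^2 p(1 - p) appears in both parts
    n = N * zpq / (N * d ** 2 + zpq)
    return math.floor(n + 0.5)          # round half up to a whole respondent


print(lynch_sample_size(3435))  # 346 (Mid La Union population, stratified example)
print(lynch_sample_size(8815))  # 368 (three-campus population, cluster example)
```

Some texts always round a computed sample size up so that the precision target is never missed; with that convention the second example would give 369 instead of 368.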

Sampling Techniques

Sampling requires statistical approaches or designs to determine the appropriate
sample to be taken from a non-overlapping or overlapping population.

• Random Sampling – a method of selecting a sample of size n from a population of
size N in which all possible combinations of size n have an equal chance of being
selected as the sample.
1. Lottery sampling – this follows the principle of a raffle system, where
everybody is given an equal chance to be chosen as a respondent.
Example: How will you select 5 winners out of 50 to come and see the
fight of Manny Pacquiao against Mayweather?

2. Table of Random Numbers – this makes use of a table of random numbers in
selecting the members of the sample. The selection of respondents is left entirely to
the given table of random numbers.
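Lottery sampling amounts to drawing names from a box without putting them back. A minimal Python sketch of the Pacquiao raffle example, assuming the 50 entrants are simply numbered 1 to 50 (the numbering and the function name `lottery_sample` are illustrative assumptions), uses the standard library's `random.Random.sample`, which draws distinct items so that every set of 5 is equally likely:

```python
import random


def lottery_sample(population, n, seed=None):
    """Draw n distinct members at random, like drawing names from a raffle box.

    random.Random.sample picks without replacement, so every combination of
    size n has the same chance of being selected. A fixed seed makes the draw
    reproducible, which helps when the selection has to be documented.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


entrants = list(range(1, 51))          # hypothetical entry numbers 1 to 50
winners = lottery_sample(entrants, 5, seed=2024)
print(sorted(winners))
```

Leaving `seed` as `None` gives a fresh draw each time, which is closer to an actual raffle.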
• Systematic Sampling – this method involves an ordering scheme or selection
representing the population.
1. Stratified sampling – the sample is taken from a non-overlapping population
whose parts are homogeneous in nature. One department can be considered as one
category or stratum.
Example: Project TGP
Talek Pablo, president of TGP/S Fraternity, and his selected members
would like to study the effectiveness of accredited school fraternities and sororities
in the learning process in DMMMSU – MLUC in partial fulfillment of their
tertiary-level requirement.
The SAS Coordinator provided them a list of the only 5 accredited
fraternities and sororities in the campus: APO with 320 members, SI with 1,200
members, BS with 710 members, UI with 830, and TGP/S with 375.

Based on the information, how can Pablo and his selected members pick
sample units proportionate to the number of members of each fraternity and
sorority?

Steps:
1.1 Determine the distribution of the population and its proportions.

Distribution of Population and Its Proportions

Fraternity / Sorority   Frequency   Proportion
APO                     320         0.09
SI                      1,200       0.35
BS                      710         0.21
UI                      830         0.24
TGP/S                   375         0.11
Total                   3,435       1.00

1.2 Use the Lynch formula to determine the sample size needed in the study.

$$n = \frac{N Z^2 \, p(1 - p)}{N d^2 + Z^2 \, p(1 - p)} = \frac{3435(1.96)^2(0.50)(0.50)}{3435(0.05)^2 + (1.96)^2(0.50)(0.50)} = \frac{3298.97}{9.5479} \approx 345.52 \approx 346$$

1.3 Determine the distribution of the sample for each fraternity / sorority
proportionate to the actual population.

Distribution of Sample for each Fraternity / Sorority

Fraternity / Sorority   Proportionality   Sample Size
APO                     346 * 0.09        31
SI                      346 * 0.35        121
BS                      346 * 0.21        73
UI                      346 * 0.24        83
TGP/S                   346 * 0.11        38
Total                                     346
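The proportional allocation in steps 1.1 to 1.3 can be sketched in Python. This minimal sketch follows the tables above: each stratum's proportion is first rounded to two decimals, then multiplied by the sample size and rounded half up. The helper names `round_half_up` and `proportional_allocation` are hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest whole number, with .5 going up (as in hand computation)."""
    return math.floor(x + 0.5)


def proportional_allocation(strata: dict[str, int], n: int) -> dict[str, int]:
    """Split a sample of size n among strata in proportion to their sizes."""
    total = sum(strata.values())
    allocation = {}
    for name, size in strata.items():
        share = round(size / total, 2)               # e.g. 320 / 3435 -> 0.09 (step 1.1)
        allocation[name] = round_half_up(n * share)  # e.g. 346 * 0.09 -> 31 (step 1.3)
    return allocation


members = {"APO": 320, "SI": 1200, "BS": 710, "UI": 830, "TGP/S": 375}
sample = proportional_allocation(members, 346)
for org, size in sample.items():
    print(f"{org:<6} {size:>4}")
print(f"{'Total':<6} {sum(sample.values()):>4}")
```

Because each share is rounded on its own, the sizes are not guaranteed to add up to n for other data, so the final total is worth checking; with the unrounded proportions, this same example would give 347 in total.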

2. Cluster sampling – the sample is taken from an overlapping population that is
heterogeneous in nature. One college may be considered as one cluster.

Example: Project TGP in a Wide Scope

Let us say Talek Pablo, president of TGP/S Fraternity, and his selected
members would like to study the effectiveness of accredited school fraternities and
sororities in the learning process in DMMMSU in partial fulfillment of their
tertiary-level requirement.
The SAS Coordinator of the Mid La Union Campus provided them a list of
the only 5 accredited fraternities and sororities in the campus: APO with 320
members, SI with 1,200 members, BS with 710 members, UI with 830, and
TGP/S with 375.
In the South La Union Campus, the SAS Coordinator provided them a list
of 6 accredited fraternities and sororities: De Molay with 210 members, Beta
Sigma with 675, UI with 970, SI with 1,400, TGP / S with 365, and APO
with 200 members.
In the North La Union Campus, however, the SAS Coordinator provided
them a list of only 4 accredited fraternities and sororities: SI with 500
members, UI with 650, Beta Sigma with 240, and TGP with 170 members.
Based on the information, how can Pablo and his selected members pick
sample units proportionate to the number of members of each fraternity and
sorority in the different campuses?

Steps:
a. Determine the distribution of the population and its proportions.

Distribution of Population and Its Proportions

Campus           Fraternity / Sorority   Frequency   Proportion
North La Union   SI                      500         0.32
                 UI                      650         0.42
                 BS                      240         0.15
                 TGP                     170         0.11
                 Total                   1,560       1.00
Mid La Union     APO                     320         0.09
                 SI                      1,200       0.35
                 BS                      710         0.21
                 UI                      830         0.24
                 TGP/S                   375         0.11
                 Total                   3,435       1.00
South La Union   De Molay                210         0.05
                 BS                      675         0.18
                 UI                      970         0.25
                 SI                      1,400       0.37
                 TGP / S                 365         0.10
                 APO                     200         0.05
                 Total                   3,820       1.00
Grand Total                              8,815

b. Use the Lynch formula to determine the sample size needed in the
study.

$$n = \frac{N Z^2 \, p(1 - p)}{N d^2 + Z^2 \, p(1 - p)} = \frac{8815(1.96)^2(0.50)(0.50)}{8815(0.05)^2 + (1.96)^2(0.50)(0.50)} = \frac{8465.93}{22.9979} \approx 368.12 \approx 368$$

c. Determine the distribution of the sample for each fraternity / sorority
proportionate to the actual population. First divide the 368 respondents among the
campuses in proportion to their populations (1,560, 3,435, and 3,820 out of 8,815, or
about 0.18, 0.39, and 0.43), then divide each campus's share among its fraternities
and sororities.

Distribution of Sample for each Fraternity / Sorority

Campus           Fraternity / Sorority   Proportionality   Sample Size
North La Union   SI                      66 * 0.32         21
                 UI                      66 * 0.42         28
                 BS                      66 * 0.15         10
                 TGP                     66 * 0.11         7
                 Total                   368 * 0.18        66
Mid La Union     APO                     144 * 0.09        13
                 SI                      144 * 0.35        50
                 BS                      144 * 0.21        30
                 UI                      144 * 0.24        35
                 TGP/S                   144 * 0.11        16
                 Total                   368 * 0.39        144
South La Union   De Molay                158 * 0.05        8
                 BS                      158 * 0.18        28
                 UI                      158 * 0.25        40
                 SI                      158 * 0.37        58
                 TGP / S                 158 * 0.10        16
                 APO                     158 * 0.05        8
                 Total                   368 * 0.43        158
Grand Total                                                368
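The two-stage allocation in steps a to c can be sketched the same way: the 368 respondents are first shared among the campuses, and each campus's share is then divided among its organizations. This is a minimal Python sketch under the same rounding assumptions as the tables (two-decimal proportions, half-up rounding); the helper names `round_half_up` and `allocate` are hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest whole number, with .5 going up."""
    return math.floor(x + 0.5)


def allocate(groups: dict[str, int], n: int) -> dict[str, int]:
    """Share n among groups by size, using two-decimal proportions as the tables do."""
    total = sum(groups.values())
    return {g: round_half_up(n * round(size / total, 2)) for g, size in groups.items()}


campuses = {
    "North La Union": {"SI": 500, "UI": 650, "BS": 240, "TGP": 170},
    "Mid La Union": {"APO": 320, "SI": 1200, "BS": 710, "UI": 830, "TGP/S": 375},
    "South La Union": {"De Molay": 210, "BS": 675, "UI": 970, "SI": 1400,
                       "TGP / S": 365, "APO": 200},
}

# Stage 1: divide the total sample of 368 among campuses by campus population.
campus_sizes = {campus: sum(orgs.values()) for campus, orgs in campuses.items()}
campus_n = allocate(campus_sizes, 368)

# Stage 2: divide each campus's share among its fraternities and sororities.
plan = {campus: allocate(orgs, campus_n[campus]) for campus, orgs in campuses.items()}

for campus, orgs in plan.items():
    print(campus, campus_n[campus], orgs)
```

Computing the campus shares first and then reusing the same allocation rule inside each campus mirrors the two tables in steps a and c.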

3. Multi-stage sampling – this makes use of geographical bases and is
considered a wide selection of samples.
Example: Awareness and Acceptability of the Proposed Amendments to
the Constitution: A National Study

• Non-Random Sampling – this is a form of non-probability sampling that makes
use of personal judgment in selecting samples.
1. Purposive sampling – in this method, the sample is chosen to suit the purpose
of the researcher.
Example: Assume that an agent of a certain company is to submit a
quarterly market report about a product to the manager. To satisfy his/her boss
and in hopes of a promotion, he/she decides to consider only the customers'
perceptions.
2. Quota sampling – the sample is selected based on a personal, assumed
estimate.
Example: A group of researchers would like to determine the level of
popularity of amateur basketball league players, specifically those of the NCAA and
UAAP, in Region 1. The group prefers to consider the city or town with the largest
population and assumes that it is also proportional to the actual number of those
who watch the sports event.
3. Convenience sampling – the selection of the sample is influenced by the
researcher's personal convenience.
Example: Assume a group of students is conducting research about the
effectiveness of mobile cellular phones in daily living at DMMMSU – MLUC.
Instead of giving all the students a chance to be chosen as respondents, the group
asked for the perceptions of those whose numbers are registered in their respective
mobile phones.

Exercise 2
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Identify whether each title describes qualitative or quantitative research. Also give
the appropriate data collection technique and the specific type that can be used.
1. A Company Case Study on LUELCO Multi – Purpose Cooperative

2. Philippines Colonization: Its Socio-Political Impact at Present

3. Effectiveness of Programmed Materials in Statistics

4. Birth and Death Practices among Isolated Communities in Region 1

5. Acceptability of Learning Materials in Statistics as Perceived by the Pool of Experts

6. Effectiveness of Bilingualism in Teaching Mathematics to UP Diliman and
DMMMSU-MLUC Students: A Comparative Study

7. Organizational Climate and Job Satisfaction among Employees in DMMMSU – MLUC

8. Teleconferencing on the Latest Trends and Issues in Teaching Mathematics

9. Demographic Status Among Unmarried Couples in San Fernando City, La Union

10. Understanding the Laws of Motion

B. Make a sampling design.

The researcher would like to conduct a study to determine the effectiveness of
bilingualism in teaching mathematics to students in DMMMSU-MLUC. The registrar
provided a copy of the number of students enrolled in the different programs offered
in the campus.

Table 1. Distribution of Programs in terms of Population

College   Program   N     %    n
CAM       BSBA      350
          BSHRM     400
          AB        150
          BSOM      250
COE       BSME      500
          BSEE      320
CTED      BSE       420
          BSIE      340
CIT       IIT       450
COT       BSEMT     650
          BSFT      380
          BSGT      260
          BSIT      200
Total

Chapter 3. Presentation of Data

Objectives

• Discover the different ways of presenting data.
• Identify the essential parts of a frequency distribution.
• Learn to construct a frequency distribution.
• Identify and present data using Microsoft Excel.

Textual Presentation

The presentation is in narrative or paragraph form, with the data embedded in the text of the paragraph. This form may not immediately catch the interest of the reader.
Examples:
1. The latest survey conducted by the National Statistics Office revealed that 2.9 million unemployed graduates had accumulated from 2005 to 2007.
2. The continuing decrease in the prices of crude oil and gasoline in the Philippines is simply the effect of the low prices of these products in the world market.

Tabular Presentation

Tabular presentation involves the classification and tabulation of data.

1. Tabulation – the process of condensing classified data and arranging them in a table. Through this process, the data can easily be understood and compared.
2. Classification – the process of grouping similar items or variables from the mass of data based on characteristics such as age, height, religious affiliation, and gender.

Parts of a Table:
1. Table heading – it consists of the table number and a brief title.
2. Stubs – the classifications or categories found on the left side of the body of the table.
3. Box head – it identifies what is contained in each column.
4. Body – the main part of the table, which contains the substance of the table.
5. Footnote – it is used for citations and references.

Types:
1. General or Reference Table – a repository of information whose main purpose is to present data so that individual items can easily be found by the reader.
Example:
Table 1. College of Arts and Management Faculty Profile
As to Educational Attainment
                   Males   Females   Total
Ph.D., Ed.D.         0        5        5
Master’s Degree      3        9       12
BS/AB                2       11       13
Total                5       25       30
a. What percentage of the total faculty population, both males and females, obtained a master’s degree? 40%
b. What percentage of the male faculty members hold a BS/AB degree? 40%
c. Which educational attainment is dominant in the College of Arts and Management faculty profile, and how many faculty members (males and females) and what percentage does it represent? BS/AB: 13 & 43.33%

2. Summary or text table – usually small in size and designed to guide the reader in analyzing the data.
Example:
Table 2. Population of Students in the College of Arts
and Management for S/Y 2007-2008
Programs   1st Year   2nd Year   3rd Year   4th Year   Total
BSM           120        100         70         68      358
AB             70         60         50         40      220
BSHRM         150        110         85         75      420
BSOM          150        120         80         50      400
Total         490        390        285        233     1398
a. Among the given programs, which has the largest total population from first year to fourth year, and what percentage of the total does it represent?
BSHRM & 30.04%
b. Which has the smallest population, and what percentage of the total does it represent?
AB & 15.74%
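The percentage questions above are ratios of a row total to the grand total. A minimal Python sketch, assuming the Table 2 counts are typed in by hand as a dictionary (the names `table2` and `percent_of_total` are illustrative only), reproduces the answers to questions a and b; the same ratio applied to Table 1 gives 12/30 = 40% for master’s degree holders.

```python
# Table 2 counts: program -> enrollment from 1st year to 4th year
table2 = {
    "BSM": [120, 100, 70, 68],
    "AB": [70, 60, 50, 40],
    "BSHRM": [150, 110, 85, 75],
    "BSOM": [150, 120, 80, 50],
}

def percent_of_total(table):
    """Return each row's share of the grand total, in percent (2 decimals)."""
    row_totals = {name: sum(counts) for name, counts in table.items()}
    grand_total = sum(row_totals.values())
    return {name: round(100 * t / grand_total, 2) for name, t in row_totals.items()}

shares = percent_of_total(table2)
largest = max(shares, key=shares.get)
smallest = min(shares, key=shares.get)
print(largest, shares[largest])    # BSHRM 30.04
print(smallest, shares[smallest])  # AB 15.74
```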

• Frequency Distribution – refers to the tabular arrangement of data by classes or categories together with their corresponding class frequencies.
Essential parts:
1. Class Frequency (f) – the number of observations or items belonging to a category.
2. Class Interval (CI) – a grouping or category defined by a lower limit and an upper limit.
3. Class Width (i) – the size of each class interval, obtained by dividing the range by the desired number of class intervals.
4. Class Boundaries (CB) – more precise expressions of the class limits, situated midway between the upper limit of one interval and the lower limit of the next. For whole-number data, they are obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit (e.g., 19.5 – 29.5 for the class 20 – 29).
5. Class Mark (M) – the representative value (midpoint) of the corresponding class interval.
6. Less Than Cumulative Frequency (<cf) – the total frequency of all values less than the upper class boundary of a given class interval.
7. Relative Cumulative Frequency – enables us to read off the percentage of observations falling below certain specified values or categories.

Steps in Constructing Frequency Distribution

1. Determine the range by obtaining the difference between the highest score and the lowest score.
2. Decide on the number of class intervals or categories desired, ideally somewhere between 5 and 20.
3. Determine the appropriate class size or class width by dividing the range by the desired number of categories.
4. Write the class intervals, starting with a lowest lower limit that is divisible by the class width.
5. Tally the data and determine the class frequencies from the tally column.
6. Assign a representative value to each category by computing the class mark, obtained by dividing the sum of the upper and lower limits of the class interval by 2.

Example

A. Construct frequency distribution. The following scores are the results in Statistics
examination.
88 77 72 85 90 20 25 60 45 77
50 62 76 56 42 24 21 40 41 78
61 67 35 29 78 87 84 90 64 58
79 74 69 66 61 68 56 51 48 39
27 81 75 71 67 63 57 53 68 85
80 75 70 63 57 52 49 44 33 23
Steps:
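1. Range = 90 – 20 = 70
2. No. of class intervals = 7
3. i = 70 / 7 = 10
Starting at 20 (which is divisible by 10), seven intervals of width 10 reach only 89, so an eighth interval (90 – 99) is needed to include the highest score, 90.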

CI        CB             Tally            f    M      <cf   %(<cf)
20 – 29   19.5 – 29.5    IIIII-II         7    24.5     7    12%
30 – 39   29.5 – 39.5    III              3    34.5    10    17%
40 – 49   39.5 – 49.5    IIIII-II         7    44.5    17    28%
50 – 59   49.5 – 59.5    IIIII-IIII       9    54.5    26    43%
60 – 69   59.5 – 69.5    IIIII-IIIII-III 13    64.5    39    65%
70 – 79   69.5 – 79.5    IIIII-IIIII-II  12    74.5    51    85%
80 – 89   79.5 – 89.5    IIIII-II         7    84.5    58    97%
90 – 99   89.5 – 99.5    II               2    94.5    60   100%
                                      N = 60

a. How many students got a score below 59.5, and what percentage of the class is this?
26 students & 43%
b. What percentage of the students fall in the class with the highest frequency (60 – 69)?
21.67%
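The six steps can be sketched in Python as follows. This is an illustrative sketch, not a prescribed procedure: it assumes whole-number scores, rounds the class width up with `math.ceil` when the range does not divide evenly, and keeps adding classes until the highest score is covered, which is why the 60 scores above produce eight classes rather than seven. The function name `frequency_distribution` is hypothetical.

```python
import math

scores = [
    88, 77, 72, 85, 90, 20, 25, 60, 45, 77,
    50, 62, 76, 56, 42, 24, 21, 40, 41, 78,
    61, 67, 35, 29, 78, 87, 84, 90, 64, 58,
    79, 74, 69, 66, 61, 68, 56, 51, 48, 39,
    27, 81, 75, 71, 67, 63, 57, 53, 68, 85,
    80, 75, 70, 63, 57, 52, 49, 44, 33, 23,
]

def frequency_distribution(data, num_classes):
    """Build a frequency table for whole-number data (steps 1 to 6)."""
    data_range = max(data) - min(data)              # step 1: range
    width = math.ceil(data_range / num_classes)     # steps 2-3: class width, rounded up
    lower = (min(data) // width) * width            # step 4: lowest limit divisible by width
    rows, cumulative = [], 0
    while lower <= max(data):                       # add classes until the maximum is covered
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)  # step 5: tally and count
        cumulative += f
        rows.append({
            "CI": (lower, upper),
            "CB": (lower - 0.5, upper + 0.5),       # class boundaries
            "f": f,
            "M": (lower + upper) / 2,               # step 6: class mark
            "<cf": cumulative,
            "%(<cf)": round(100 * cumulative / len(data)),
        })
        lower += width
    return rows

for row in frequency_distribution(scores, 7):
    print(row)
```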

Exercise 3

Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. The following list shows the means of transportation used by a sample of 50 BSBA students in going to DMMMSU – MLUC.

Bus Tricycle Bus Bus Motorcycle
Tricycle Tricycle Car Motorcycle Bus
Bus Jeepney Tricycle Tricycle Bus
Car Jeepney Car Tricycle Jeepney
Jeepney Bus Tricycle Bus Bus
Bus Tricycle Jeepney Car Jeepney
Jeepney Tricycle Tricycle Bus Bus
Jeepney Jeepney Tricycle Tricycle Tricycle
Bus Jeepney Jeepney Car Bus
Tricycle Bus Bus Jeepney Jeepney

1 – 2. What means of transportation is most common among the students in going to DMMMSU – MLUC?
3 – 4. What means of transportation is least common among the students in going to DMMMSU – MLUC?
5 – 6. What percentage of the students prefer neither the bus nor the jeepney?
7 – 8. What percentage of the students prefer the motorcycle or the tricycle?
9 – 10. What percentage of the students prefer neither the car nor the tricycle?

B. Construct a frequency distribution of the Number of Transactions Made by a Teller Machine in 28 Days and interpret the result.

25 40 42 25 39 43 36
28 36 48 34 46 39 49
29 44 36 25 39 40 53
32 35 29 35 40 48 37

11 – 30.
CI CB Tally f M <cf %(<cf)

Graphical Presentation

The most convenient and popular way of describing data is graphical presentation.
Advantages:
• It is easier to understand and interpret data when they are presented graphically than when they are described in words or in a frequency table.
• It can present data in a simple and clear way.
• It can illustrate the important aspects of the data.
This leads to better analysis and presentation of the data. This topic discusses the approach for the most commonly used graphical methods, such as bar charts, histograms, frequency polygons, pie charts, and xy scatter diagrams.

Classifications

• Bar charts
Bar charts are used when comparing the values of multiple variables or categories. They are presented using vertical or horizontal bars, which may be drawn separately from each other. All bars should have the same width to avoid giving misleading information. Table 1 presents the population of staff in DMMMSU.

Table 1. Number of Staff in DMMMSU

Operating Agency     N       %
NLUC                210    23.3
MLUC                300    33.3
SLUC                170    18.9
SRDI                100    11.1
Apiculture          120    13.3
Total               900   100.0

Figure 1 is an example of a simple bar chart. The bars reflect the actual frequency of each item, so frequencies can be compared by comparing the heights of the bars. The chart shows that DMMMSU – MLUC has the most staff and SRDI has the fewest.

Figure 1. Number of Staff in DMMMSU
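The idea that bar length reflects frequency can be sketched without any plotting library. The following Python sketch, assuming one `#` character per 10 staff members (an arbitrary scale chosen only for illustration), prints a horizontal bar chart of Table 1; every bar has the same thickness (one text line), and only its length varies. The name `text_bar_chart` is hypothetical.

```python
staff = {"NLUC": 210, "MLUC": 300, "SLUC": 170, "SRDI": 100, "Apiculture": 120}

def text_bar_chart(counts, unit=10):
    """Return one line per category; each '#' represents `unit` items."""
    label_width = max(len(name) for name in counts)
    lines = []
    for name, n in counts.items():
        bar = "#" * round(n / unit)            # length proportional to the frequency
        lines.append(f"{name:<{label_width}} | {bar} {n}")
    return lines

print("\n".join(text_bar_chart(staff)))
```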

Table 2 presents the population of male and female staff in the five operating agencies in DMMMSU.
Table 2. Number of Male and Female Staff in DMMMSU
Operating Agency   Male   Female    N
NLUC                 90     120    210
MLUC                100     200    300
SLUC                 70     100    170
SRDI                 40      60    100
Apiculture           50      70    120
Total               350     550    900

Figure 2 presents a multiple bar chart that shows the comparative magnitudes of each component: the number of male and female staff in the five operating agencies in DMMMSU. There are more female than male staff in each of the five operating agencies. Moreover, DMMMSU – MLUC has more male staff and more female staff than any other operating agency.
Figure 2. Number of Male and Female Staff in DMMMSU

• Histograms
Histograms, which resemble column bar charts, are a common way of presenting the frequencies in a number of categories; other commonly used graphical methods for frequency distributions include the frequency polygon and the ogive. A histogram portrays a frequency distribution table for further statistical use: the classes are marked on the x-axis and the class frequencies on the y-axis, and the bars are drawn next to each other without gaps. Note that a bar chart does not have x-axis units, whereas a histogram is constructed by creating x-axis units of equal size that correspond to the class intervals of the frequency table.

Figure 3 shows the histogram of the frequency table of scores of BSBA students in a 100-item statistics examination.
88 77 72 85 90 20 25 60 45 77
50 62 76 56 42 24 21 40 41 78
61 67 35 29 78 87 84 90 64 58
79 74 69 66 61 68 56 51 48 39
27 81 75 71 67 63 57 53 68 85
80 75 70 63 57 52 49 44 33 23

Based on the histogram in Figure 3, we can conclude the following:
1. The lowest score obtained by the students in the test was 20 and the highest was 90.
2. The class with the highest frequency is 60 – 69, with 13 observations, while the class with the lowest frequency is 90 – 99, with only 2 observations.
3. Taking 50 as the passing score, the majority, or 72%, of the students passed, while 28% (17 of 60) failed.

Figure 3. Scores of BSBA Students in a 100 – Item Statistics Examination

• Frequency Polygon (Line Graph)
Frequency polygons are less commonly used than histograms. They consist of line segments drawn by connecting the points whose coordinates are the class midpoints (class marks) and the corresponding class frequencies. The construction of a frequency polygon is illustrated in Figure 4.

Figure 4. Scores of BSBA Students in a 100 – Item Statistics Examination
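The points plotted in a frequency polygon can be listed directly from the frequency distribution of the 60 scores. In the Python sketch below, closing the polygon with zero-frequency class marks one class width beyond each end is a common convention assumed here, not something stated above; the function name `polygon_points` is hypothetical.

```python
# Class marks and frequencies from the frequency distribution of the 60 scores
class_marks = [24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [7, 3, 7, 9, 13, 12, 7, 2]

def polygon_points(marks, freqs, width):
    """Return (x, y) vertices: (class mark, frequency) for each class, closed
    with zero-frequency points one class width below the first class and
    above the last class."""
    points = list(zip(marks, freqs))
    return [(marks[0] - width, 0)] + points + [(marks[-1] + width, 0)]

pts = polygon_points(class_marks, frequencies, 10)
print(pts)
```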

• Pie Chart
This presentation is best used when the number of categories is between 2 and 6. A pie chart shows the size of each item in a data series in proportion to the sum of the items. It always shows only one data series and is useful when you want to emphasize a significant element. To make small slices easier to see, you can group them together as one item in the pie chart and then break down that item in a smaller pie or bar chart next to the main chart. Figure 5 shows that DMMMSU – MLUC has the largest number of male staff among the five operating agencies, while SRDI has the smallest (40).

Figure 5. Population of Male Staff in DMMMSU
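Each slice of a pie chart is drawn with a central angle equal to its proportion of the total times 360 degrees. A minimal Python sketch, assuming the male-staff counts from Table 2 (the name `pie_slices` is illustrative), computes the percentage and angle for each slice:

```python
male_staff = {"NLUC": 90, "MLUC": 100, "SLUC": 70, "SRDI": 40, "Apiculture": 50}

def pie_slices(counts):
    """Return (percent, central angle in degrees) for each category."""
    total = sum(counts.values())
    return {
        name: (round(100 * n / total, 2), round(360 * n / total, 2))
        for name, n in counts.items()
    }

for name, (pct, angle) in pie_slices(male_staff).items():
    print(f"{name}: {pct}% -> {angle} degrees")
```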

Other classifications:

• Area – an area chart emphasizes the magnitude of change over time. By displaying the sum of the plotted values, an area chart also shows the relationship of parts to a whole. See Table 3.

• XY (Scatter) – an xy (scatter) chart either shows the relationships among the numeric values in several data series or plots two groups of numbers as one series of xy coordinates. This chart shows uneven intervals (or clusters) of data and is commonly used for scientific data. When you arrange your data, place the x values in one row or column, and then enter the corresponding y values in the adjacent rows or columns. See Table 9.

• Doughnut – like a pie chart, a doughnut chart shows the relationship of parts to a whole, but it can contain more than one data series. Each ring of the doughnut chart represents a data series. See Table 10.

• Stock – the high-low-close chart is often used to illustrate stock prices. This chart can also be used for scientific data, for example, to indicate temperature changes. You must organize your data in the correct order to create this and other stock charts. See Table 11.

• Bubble – a bubble chart is a type of xy (scatter) chart in which the size of the data marker indicates the value of a third variable. To arrange your data, place the x values in one row or column, and enter the corresponding y values and bubble sizes in the adjacent rows or columns. See Table 12.

• Radar – in a radar chart, each category has its own value axis radiating from the center point. Lines connect all the values in the same series. A radar chart compares the aggregate values of a number of data series. See Table 13.

• Surface – a surface chart is useful when you want to find optimum combinations between two sets of data. As in a topographic map, colors and patterns indicate areas that are in the same range of values. See Table 14.

• Cone, Cylinder and Pyramid – the cone, cylinder, and pyramid data markers can lend a dramatic effect to 3-D column and bar charts. See Tables 15, 16, and 17.
Exercise 4
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. The following data have been obtained for the number of customers arriving per hour
in a sample of 30 supermarkets.

52 29 26 29 32 26
62 49 40 25 25 31
44 24 27 38 73 32
27 32 28 30 57 34
39 37 28 24 51 50

a. Construct a histogram or line graph.

b. Give at least three conclusions that can be drawn from the presentation.
b.1

b.2

b.3

B. Construct the appropriate graphical presentation for the information below. Summarize and interpret the result at the back of this paper.

Table 1. Annual Budget Allocation of MACO Company
for Calendar Year 2013 – ‘14
Component              Budget
Salaries and Wages     Php30,000,000
Electricity            Php10,000,000
Miscellaneous          Php8,000,000
Maintenance            Php5,000,000
Travel Expenses        Php3,000,000
Materials              Php2,500,000
Other Expenses         Php1,500,000
Total                  Php60,000,000

Chapter IV. Measures of Central Location

Objectives

• Discuss and differentiate the characteristics, uses, and limitations of the measures of location (mean, median, and mode).
• Compute and interpret the different measures of central tendency for ungrouped and grouped data.
• Determine when to apply the appropriate measure of central tendency.
• Differentiate and compute the three classifications of quantiles or fractiles.

Measures of Central Tendency

The study of descriptive statistics is not complete without the concept of a measure of “Central Location,” which is the tendency of the observations to converge at a point, or at the center, of a frequency distribution. There are three measures of central location widely used in descriptive statistics: the mean, the median, and the mode, each of which has its own appropriate use in describing the sample or population under study.

Mean

Of the three measures, the mean is the basis of higher statistical computation, since it varies least from sample to sample. It is more reliable because all the data in the distribution are used in computing it.
Characteristics:
1. It is the average value in the given distribution.
2. It serves as the balance point of the distribution.
3. It is the most sensitive measure of location.
4. It is always affected by extreme values.
5. It is used when the data are measured on an interval or ratio scale.

Computations:
1. Ungrouped Data
$$\bar{x} = \frac{\sum_{i=1}^{N} X_i}{N}$$
Where:
$\bar{x}$ = the unknown mean.
$X_i$ = scores or observations.
$\sum_{i=1}^{N} X_i$ = summation or total of the items or observations.
$N$ = number of items or observations.
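2. Grouped Data
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i M_i}{N} \quad \text{(Long Method)}$$
Where:
$\bar{x}$ = the unknown mean.
$f_i$ = frequency (number of items or observations) of each category.
$M_i$ = midpoint or representative (class mark) of each category.
$\sum_{i=1}^{n} f_i M_i$ = summation or total of the product of the frequency and the midpoint of each category.
$n$ = number of categories (class intervals).
$N$ = total number of observations, $N = \sum_{i=1}^{n} f_i$.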

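Alternative Formula:
$$\bar{x} = x' + \left(\frac{\sum_{i=1}^{n} f_i d_i'}{N}\right) i \quad \text{(Coded Deviation Method)}$$
Where:
$\bar{x}$ = the unknown mean.
$x'$ = the assumed mean (the class mark of a chosen class).
$f_i$ = frequency of each category.
$d_i'$ = coded deviation of each category from the class of the assumed mean (0 for that class; −1, −2, … for the classes below it; 1, 2, … for the classes above it).
$\sum_{i=1}^{n} f_i d_i'$ = summation of the product of the frequency and the coded deviation of each category.
$i$ = class width.
$N$ = total number of observations.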

Weighted Mean

There are some instances when some values or items are given greater importance than others. Computing the average in this situation makes use of weights. The weighted mean is also applicable when data are presented as rankings or rating-scale responses.
1. For Grouped and Ungrouped Data
$$\bar{x}_w = \frac{\sum_{i=1}^{n} W_i X_i}{W_t}$$
Where:
$\bar{x}_w$ = the unknown weighted mean.
$W_i$ = corresponding weights.
$X_i$ = scores or observations.
$\sum_{i=1}^{n} W_i X_i$ = summation of the product of the weights and the observations.
$W_t$ = total of the weights, $W_t = \sum_{i=1}^{n} W_i$.
Examples

A. Solve what is being asked for each number.


1. The number of incorrect answers on a true – false competency test for a random sample of 15 students was recorded as follows: 2, 1, 3, 5, 7, 3, 1, 4, 2, 2, 6, 1, 3, 2, & 4. Find the mean number of incorrect answers on the test.
Computation: (Using the Formula)
$$\bar{x} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{46}{15} \approx 3.1 \text{ or } 3$$
Computation: (Using ES PLUS CASIO)
Mode, stat, 1 – var, 2 =, 1 =, 3 =, 5 =, 7 =, 3 =, 1 =, 4 =, 2 =, 2 =, 6 =, 1 =, 3 =, 2 =, 4 =, AC, shift, 1, var, x̄, then =. The mean is the same, 3.1.
Computation: (Using MS EXCEL)
Select a vacant cell, type =AVERAGE(, highlight the scores, type ), and press Enter. The mean is the same, 3.1.

Interpretation: The average (typical) number of incorrect answers on the true – false competency test among the 15 sample students is about 3.

2. What is the average life of 10 sample car batteries that lasted as follows (in years): 1.5, 1.7, 2.1, 2.3, 3, 3.5, 5.3, 5.7, 6.2, & 6.6?
Computation: (Using the Formula)
$$\bar{x} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{37.9}{10} = 3.79$$
Computation: (Using ES PLUS CASIO)
mode, stat, 1 – var, 1.5 =, 1.7 =, 2.1 =, 2.3 =, 3 =, 3.5 =, 5.3 =, 5.7 =, 6.2 =, 6.6 =, AC, shift, 1, var, x̄, then =. The mean is the same, 3.79.
Computation: (Using MS EXCEL)
Select a vacant cell, type =AVERAGE(, highlight the scores, type ), and press Enter. The mean is the same, 3.79.

Interpretation: The mean life of the 10 sample car batteries is 3.79 years.
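The formula computations in both examples can be checked with a short Python sketch. `statistics.mean` from the standard library gives the same result; the helper `ungrouped_mean` is a hypothetical name written out only to mirror the formula.

```python
from statistics import mean

incorrect_answers = [2, 1, 3, 5, 7, 3, 1, 4, 2, 2, 6, 1, 3, 2, 4]
battery_life_years = [1.5, 1.7, 2.1, 2.3, 3, 3.5, 5.3, 5.7, 6.2, 6.6]

def ungrouped_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

print(round(ungrouped_mean(incorrect_answers), 2))   # 3.07
print(round(ungrouped_mean(battery_life_years), 2))  # 3.79
print(round(mean(battery_life_years), 2))            # 3.79 (standard-library check)
```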

3. The following are the responses of 60 students (two sections) in a statistics class about the level of effectiveness of the programmed material in statistics after the experiment.

Level of Effectiveness
Very Effective (5)   Effective (4)   Moderate (3)   Slight (2)   Negligible (1)
       14                 27              10             7             2
Computation: (Using the Formula)
$$\bar{x}_w = \frac{\sum_{i=1}^{n} W_i X_i}{W_t} = \frac{14(5) + 27(4) + 10(3) + 7(2) + 2(1)}{60} = \frac{224}{60} \approx 3.73$$
Computation: (Using ES PLUS CASIO)
mode, stat, 1 – var, shift, mode, arrow down, stat, 1 (to turn on the frequency column), for x column (5 =, 4 =, 3 =, 2 =, 1 =), for freq column (14 =, 27 =, 10 =, 7 =, 2 =), AC, shift, 1, var, x̄, then =. The weighted mean is the same, 3.73.

Interpretation: The weighted mean of 3.73 is close to 4 (Effective), which suggests that the majority of the students perceived the programmed material in statistics as effective and favorable.
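The weighted mean can be sketched in Python, assuming the scale values 5 to 1 are the observations and the response counts are their weights (the names below are illustrative):

```python
scale_values = [5, 4, 3, 2, 1]   # Very Effective ... Negligible
responses = [14, 27, 10, 7, 2]   # number of students choosing each level (the weights)

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(round(weighted_mean(scale_values, responses), 2))  # 3.73
```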

4. With reference to the table of Number of Transactions Made by Teller Machine in 28 days, find the mean and interpret the result.

Table 18. Number of Transactions Made by Teller Machine in 28 Days

CI        CB            Tally        f    M    fM    d'    fd'
25 – 29   24.5 – 29.5   IIIII-I      6   27   162   -2    -12
30 – 34   29.5 – 34.5   II           2   32    64   -1     -2
35 – 39   34.5 – 39.5   IIIII-IIII   9   37   333    0      0
40 – 44   39.5 – 44.5   IIIII-I      6   42   252    1      6
45 – 49   44.5 – 49.5   IIII         4   47   188    2      8
50 – 54   49.5 – 54.5   I            1   52    52    3      3
                                N = 28   ΣfM = 1051   Σfd' = 3

Computations: (Using the Formula)
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i M_i}{N} = \frac{1051}{28} \approx 37.54 \text{ or } 38 \quad \text{(Long Method)}$$
$$\bar{x} = x' + \left(\frac{\sum_{i=1}^{n} f_i d_i'}{N}\right) i = 37 + \left(\frac{3}{28}\right) 5 \approx 37.54 \text{ or } 38 \quad \text{(Coded Deviation Method)}$$
Here the assumed mean $x' = 37$ is the class mark of the 35 – 39 class, and the class width is $i = 5$.

Computation: (Using ES PLUS CASIO)
mode, stat, 1 – var, shift, mode, arrow down, stat, 1, for x column (27 =, 32 =, 37 =, 42 =, 47 =, 52 =), for freq column (6 =, 2 =, 9 =, 6 =, 4 =, 1 =), AC, shift, 1, var, x̄, then =. The mean is the same, 37.54 or 38.

Interpretation: The teller machine processes an average of about 38 transactions per day.
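Both methods can be sketched in Python, assuming the class marks and frequencies of Table 18 are entered as lists. The coded deviations d' are derived here from the class marks as (M − x')/i, which assumes the assumed mean x' is itself one of the class marks; the function names are hypothetical.

```python
# Table 18: class marks (M) and frequencies (f)
marks = [27, 32, 37, 42, 47, 52]
freqs = [6, 2, 9, 6, 4, 1]

def grouped_mean_long(marks, freqs):
    """Long method: sum(f * M) / N."""
    return sum(f * m for f, m in zip(freqs, marks)) / sum(freqs)

def grouped_mean_coded(marks, freqs, assumed_mean, width):
    """Coded deviation method: x' + (sum(f * d') / N) * i,
    where d' counts class steps away from the class of the assumed mean."""
    coded = [round((m - assumed_mean) / width) for m in marks]
    return assumed_mean + sum(f * d for f, d in zip(freqs, coded)) / sum(freqs) * width

print(round(grouped_mean_long(marks, freqs), 2))          # 37.54
print(round(grouped_mean_coded(marks, freqs, 37, 5), 2))  # 37.54
```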

Median

Unlike the mean, the median is not easily affected by extreme values, since only the middle value or values of the data, arranged in increasing or decreasing order, are considered in the computation. For research studies involving ordinal data, the median is an applicable and stable measure to use.
Characteristics:
1. It divides the distribution into two equal parts.
2. It is not amenable to further computation since only the middle terms are considered.
3. It is not affected by extreme values.
4. It is used when the data are ordinal or in the form of rankings.

Computation:
1. Ungrouped Data (arrange the observations in increasing or decreasing order first)
1.a When N is an odd number
$$\tilde{x} = X_{\frac{N+1}{2}}$$
1.b When N is an even number
$$\tilde{x} = \frac{X_{\frac{N}{2}} + X_{\frac{N+2}{2}}}{2}$$
Where:
$\tilde{x}$ = the unknown median.
$X_k$ = the $k$th observation in the ordered data.
$N$ = the number of items or observations.
2. Grouped Data
$$\tilde{x} = L_{CB} + \left(\frac{\frac{N}{2} - PS}{f_{mc}}\right) i$$
Where:
$\tilde{x}$ = the unknown median.
$L_{CB}$ = lower class boundary of the median class, where the (N/2)th item is found.
$\frac{N}{2}$ = position used to locate the median class.
$PS$ = partial sum (cumulative frequency) of the frequencies before the median class.
$f_{mc}$ = frequency of the median class.
$i$ = class width.

Examples

A. Solve what is being asked for each number.
1. A food inspector examined a random sample of 5 cans of a certain brand of canned goods to determine the percentage of impurities. The following data were recorded: 1.2, 1.8, 0.8, 1.3, and 1.8. Find the median.

Computation:
$$\tilde{x} = X_{\frac{N+1}{2}} = X_{\frac{5+1}{2}} = X_3$$
Interpretation: After arranging the data in ascending order (0.8, 1.2, 1.3, 1.8, 1.8), the third observation, 1.3, is the median. Thus, the median percentage of impurities is 1.3%.
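The odd and even cases of the ungrouped median can be sketched in Python; `statistics.median` from the standard library follows the same rule, and the helper `ungrouped_median` is a hypothetical name used only to mirror the formulas.

```python
from statistics import median

impurities = [1.2, 1.8, 0.8, 1.3, 1.8]

def ungrouped_median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the (N + 1)/2-th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the N/2-th and (N + 2)/2-th items

print(ungrouped_median(impurities))  # 1.3
print(median(impurities))            # 1.3 (standard-library check)
```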

2. With reference to the table of Number of Transactions Made by Teller Machine in 28 days, find the median and interpret the result.

Table 19. Number of Transactions Made by Teller Machine in 28 Days

CI        CB            Tally        f    M    fM    d'   <cf
25 – 29   24.5 – 29.5   IIIII-I      6   27   162   -2    6
30 – 34   29.5 – 34.5   II           2   32    64   -1    8   (PS)
35 – 39   34.5 – 39.5   IIIII-IIII   9   37   333    0   17   (MC)
40 – 44   39.5 – 44.5   IIIII-I      6   42   252    1   23
45 – 49   44.5 – 49.5   IIII         4   47   188    2   27
50 – 54   49.5 – 54.5   I            1   52    52    3   28
                                N = 28   ΣfM = 1051

Computation: (Using the Formula)
N/2 = 28/2 = 14, so the median class (MC) is 35 – 39, the first class whose <cf (17) reaches 14; the partial sum before it is PS = 8.
$$\tilde{x} = L_{CB} + \left(\frac{\frac{N}{2} - PS}{f_{mc}}\right) i = 34.5 + \left(\frac{14 - 8}{9}\right) 5 \approx 37.83 \text{ or } 38$$
Interpretation: On half of the 28 days, the teller machine processed fewer than about 38 (37.83) transactions, and on the other half, more than that.
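The grouped-median formula can be sketched in Python, assuming equal class widths, classes listed in increasing order, and the lower class boundaries and frequencies of Table 19 (`grouped_median` is a hypothetical helper):

```python
# Table 19: lower class boundaries and frequencies, class width i = 5
lower_boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [6, 2, 9, 6, 4, 1]

def grouped_median(lower_boundaries, freqs, width):
    """Median = L_CB + ((N/2 - PS) / f_mc) * i for grouped data."""
    half = sum(freqs) / 2
    partial_sum = 0                      # cumulative frequency before the current class
    for lcb, f in zip(lower_boundaries, freqs):
        if partial_sum + f >= half:      # first class whose <cf reaches N/2
            return lcb + (half - partial_sum) / f * width
        partial_sum += f
    raise ValueError("frequencies are empty")

print(round(grouped_median(lower_boundaries, freqs, 5), 2))  # 37.83
```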

Mode

The mode, although easy to compute, is seldom used because it is unstable. However, it is the more appropriate measure of central location for data on a nominal scale, where it serves as a measure of popularity.
Characteristics:
1. It is the value that occurs most frequently in the distribution.
2. A distribution may have more than one mode; the mode is appropriate for describing a bimodal distribution (one with two modes).
3. It is used when the data are measured at the nominal level.

Computation:
1. Ungrouped Data
Select the score that appears most frequently in the distribution.
2. Grouped Data
$$\hat{x} = L_{CB} + \left(\frac{d_1}{d_1 + d_2}\right) i$$
Where:
$\hat{x}$ = the unknown mode.
$L_{CB}$ = lower class boundary of the modal class (the class with the highest frequency).
$d_1$ = frequency of the modal class minus the frequency of the class that precedes it.
$d_2$ = frequency of the modal class minus the frequency of the class that follows it.
$i$ = class width.

Examples

A. Solve what is being asked for each number.


1. A manufacturer would like to determine which adult shoe size sold best during the month of June. The following sizes were recorded for 10 sample adults: 4.5, 5.5, 6, 6, 7.5, 8, 8, 8, 9, & 10. Which shoe size sold best?
Computation and Interpretation:
Two sizes repeat in the distribution: size 6 appears twice and size 8 appears three times. Since only size 8 has the highest frequency, the distribution has a single mode and is not bimodal. The shoe size that appears most frequently, and therefore sold best during the month of June, is size 8.
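Python's standard library provides `statistics.multimode` (Python 3.8 and later), which returns every value tied for the highest frequency; a result with two values signals a bimodal distribution. A short sketch with the shoe-size data, using `collections.Counter` to show the counts:

```python
from collections import Counter
from statistics import multimode

shoe_sizes = [4.5, 5.5, 6, 6, 7.5, 8, 8, 8, 9, 10]

counts = Counter(shoe_sizes)
print(counts.most_common(2))       # [(8, 3), (6, 2)]
print(multimode(shoe_sizes))       # [8]: a single mode, so the data are not bimodal
print(multimode([6, 6, 8, 8, 9]))  # [6, 8]: two values tie, so these data are bimodal
```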

2. With reference to the table of Number of Transactions Made by Teller Machine in 28 days, find the mode and interpret the result.

CI        CB            Tally        f    M    fM    d'   <cf
25 – 29   24.5 – 29.5   IIIII-I      6   27   162   -2    6
30 – 34   29.5 – 34.5   II           2   32    64   -1    8
35 – 39   34.5 – 39.5   IIIII-IIII   9   37   333    0   17   (MoC)
40 – 44   39.5 – 44.5   IIIII-I      6   42   252    1   23
45 – 49   44.5 – 49.5   IIII         4   47   188    2   27
50 – 54   49.5 – 54.5   I            1   52    52    3   28
                                N = 28   ΣfM = 1051
Computation:
$$\hat{x} = L_{CB} + \left(\frac{d_1}{d_1 + d_2}\right) i = 34.5 + \left(\frac{9 - 2}{(9 - 2) + (9 - 6)}\right) 5 = 34.5 + \left(\frac{7}{10}\right) 5 = 38$$
Interpretation: The most typical (modal) number of transactions processed by the teller machine in a day is about 38.
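The grouped-mode formula can be sketched in Python, assuming a single modal class (ties are not handled) and treating a missing neighboring class at either end of the table as having frequency 0, which is an assumption not stated above. `grouped_mode` is a hypothetical helper.

```python
lower_boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [6, 2, 9, 6, 4, 1]

def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L_CB + (d1 / (d1 + d2)) * i, using the class with the highest frequency."""
    k = freqs.index(max(freqs))                        # modal class (first one if tied)
    before = freqs[k - 1] if k > 0 else 0              # frequency of the preceding class
    after = freqs[k + 1] if k + 1 < len(freqs) else 0  # frequency of the following class
    d1, d2 = freqs[k] - before, freqs[k] - after
    return lower_boundaries[k] + d1 / (d1 + d2) * width

print(round(grouped_mode(lower_boundaries, freqs, 5), 2))  # 38.0
```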

Exercise 5
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Below is the table of efficiency ratings of 50 faculty members in DMMMSU – MLUC. Compute the three measures of location.

Table 20. Efficiency Ratings of 50 Faculty Members in DMMMSU – MLUC
CI        CB    f    M    fM    d'    fd'    <cf
30 – 34         1
35 – 39         2
40 – 44         2
45 – 49         4
50 – 54         4
55 – 59         8
60 – 64         3
65 – 69         5
70 – 74         7
75 – 79         6
80 – 84         1
85 – 89         5
90 – 94         1
95 – 99         1
            N = 50

1 – 10. Compute the mean value using the formula method or ES PLUS CASIO and
interpret the result.
Computation:

Interpretation:

11 - 20. What is the value of the median? Interpret the result.


Computation:

Interpretation:

21 – 30. Compute the value of the mode and interpret the result.
Computation:

Interpretation:

B. Apply the appropriate measure to describe the level of satisfaction of 100 faculty
members in DMMMSU in terms of promotion and interpret the result as well. The following
data were recorded:

Level of Satisfaction
5 4 3 2 1
15 20 42 14 9

31 – 40. Compute the level of satisfaction of 100 faculty members in DMMMSU in terms
of promotion.
Computation:

Interpretation:

Quantiles

Quantiles are an extension of the median concept. They give a complete breakdown of the distribution into percentiles, deciles, and quartiles.
Classifications:
1. Percentile – the values that divide the distribution into 100 equal parts.
2. Decile – the values that divide the distribution into 10 equal parts.
3. Quartile – the values that divide the distribution into 4 equal parts.

Examples

A. Using the same data on efficiency ratings of faculty members in DMMMSU – MLUC shown in Table 20, find the following:
1. Q3
Computation: 3N/4 = 3(50)/4 = 37.5, so Q3 lies in the 75 – 79 class (<cf = 42), with PS = 36 and f = 6.
$$Q_3 = L_{CB} + \left(\frac{\frac{3N}{4} - PS}{f_{Q_3}}\right) i = 74.5 + \left(\frac{37.5 - 36}{6}\right) 5 = 75.75 \text{ or } 76$$
Interpretation: 75% of the faculty members obtained efficiency ratings of 75.75 (about 76) or lower.
2. Top management decides to promote the upper 5% of the distribution according to efficiency rating. What is the lowest efficiency rating included in the promotion?
Computation: The upper 5% begins at the 95th percentile. 95N/100 = 95(50)/100 = 47.5, so P95 lies in the 85 – 89 class (<cf = 48), with PS = 43 and f = 5.
$$P_{95} = L_{CB} + \left(\frac{\frac{95N}{100} - PS}{f_{P_{95}}}\right) i = 84.5 + \left(\frac{47.5 - 43}{5}\right) 5 = 89$$
Interpretation: The lowest efficiency rating of a faculty member that will be included in the promotion is 89. Thus, a faculty member will be promoted if his/her rating is 89 and above.
3. However, due to cost-cutting measures, the company decided to retrench the lowest 10% of the distribution, the group believed to contribute least to the company's production. What is the highest efficiency rating included in this range?
Computation: N/10 = 50/10 = 5, so D1 lies in the 40 – 44 class (<cf = 5), with PS = 3 and f = 2.
$$D_1 = L_{CB} + \left(\frac{\frac{N}{10} - PS}{f_{D_1}}\right) i = 39.5 + \left(\frac{5 - 3}{2}\right) 5 = 44.5$$
Interpretation: The highest efficiency rating included in the lowest 10% of performers, who will be retrenched, is 44.5. Since ratings are whole numbers, a faculty member will be terminated if his/her rating is 44 and below.
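4. What is the percentile rank of an employee with an efficiency rating of 80?
Computation: A rating of 80 falls in the 80 – 84 class, with L_CB = 79.5, PS = 42, and f = 1. Setting $P_k = 80$ and $kN/100 = 0.5k$ in the percentile formula and solving for k:
$$80 = 79.5 + \left(\frac{0.5k - 42}{1}\right) 5 \;\Rightarrow\; 0.5k - 42 = 0.1 \;\Rightarrow\; k = 84.2$$
Interpretation: The percentile rank of an employee with an efficiency rating of 80 is 84.2; that is, about 84.2% of the faculty members have lower ratings.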
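The quartile, decile, and percentile formulas above differ only in the fraction of N used (3/4, 1/10, 95/100), so one helper can cover all of them. A minimal Python sketch, assuming the Table 20 frequencies and equal class widths of 5; `grouped_quantile` and `percentile_rank` are hypothetical names, and `percentile_rank` simply solves the percentile formula for k.

```python
# Table 20: lower class boundaries 29.5, 34.5, ..., 94.5 and class frequencies
WIDTH = 5
lower_boundaries = [29.5 + WIDTH * k for k in range(14)]
freqs = [1, 2, 2, 4, 4, 8, 3, 5, 7, 6, 1, 5, 1, 1]

def grouped_quantile(fraction):
    """L_CB + ((fraction * N - PS) / f) * i, e.g. fraction = 3/4 for Q3."""
    target = fraction * sum(freqs)
    partial_sum = 0                              # <cf before the current class
    for lcb, f in zip(lower_boundaries, freqs):
        if partial_sum + f >= target:            # class containing the target item
            return lcb + (target - partial_sum) / f * WIDTH
        partial_sum += f
    raise ValueError("fraction must be between 0 and 1")

def percentile_rank(score):
    """Solve score = L_CB + ((k * N / 100 - PS) / f) * i for k."""
    partial_sum = 0
    for lcb, f in zip(lower_boundaries, freqs):
        if lcb <= score < lcb + WIDTH:
            return 100 * (partial_sum + (score - lcb) / WIDTH * f) / sum(freqs)
        partial_sum += f
    raise ValueError("score lies outside the table")

print(round(grouped_quantile(3 / 4), 2))     # Q3  = 75.75
print(round(grouped_quantile(95 / 100), 2))  # P95 = 89.0
print(round(grouped_quantile(1 / 10), 2))    # D1  = 44.5
print(round(percentile_rank(80), 1))         # 84.2
```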
Exercise 6
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Using the same example of efficiency ratings of faculty members in DMMMSU – MLUC
as reflected on Table 20. Find the following:

1 – 5. P55

Computation:

Interpretation:

6 – 10. Q3
Computation:

Interpretation:

11 – 15. The company decided to let employees who have been observed to contribute least, classified as the lowest 35% of the distribution, undergo a seminar-workshop on enhancing productivity output. What range of efficiency ratings will be included in the seminar-workshop?
Computation:

Interpretation:

16 – 20. What is the percentile rank of an employee whose efficiency rating is 72?
Computation:

Interpretation:

Chapter V. Measures of Variation / Variability

Objectives

• Compare and contrast the different measures of variability and dispersion.
• Compute the measures of dispersion of grouped and raw data with ease.
• Determine when to use each measure of variability.
• Compute the measures of skewness and kurtosis.

Measures of Variation / Variability

Measures of variability describe the extent to which individual items are scattered about the average or point of central location. They give additional information on whether the observed group is homogeneous or heterogeneous with respect to a given distribution. They include several measures of dispersion for interval, ordinal, and nominal data.

Range

The range of a set of data is the simplest of the measures of dispersion and can easily be computed by getting the difference between the highest value and the lowest value in a distribution.
Characteristics:
1. It is the easiest to compute and to understand.
2. It depends only on the two extreme values.
3. It provides the least satisfactory conclusion about the population.
Computations:
1. Ungrouped Data
R = HS – LS (highest score minus lowest score)
2. Grouped Data
R = UL(HCI) – LL(LCI) (upper limit of the highest class interval minus the lower limit of the lowest class interval)
Examples

A. Solve what is being asked for each number.


1. Given: 15, 24, 34, 56, 75, 48, 39, & 60.
Computation:
R = 75 – 15 = 60
Interpretation: The difference between the highest value and the lowest value is 60.
2. With reference to Table 19, find the range.
Computation:
R = 54 – 25 = 29
Interpretation: The difference between the highest and the lowest number of transactions made in a day is 29.
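Both versions of the range can be sketched in Python, assuming the grouped range uses the class limits of Table 19 as described above (`data_range` and `class_limits` are illustrative names):

```python
def data_range(values):
    """Ungrouped data: highest value minus lowest value."""
    return max(values) - min(values)

# Grouped data: class limits of Table 19, from the lowest to the highest class interval
class_limits = [(25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
grouped_range = class_limits[-1][1] - class_limits[0][0]

print(data_range([15, 24, 34, 56, 75, 48, 39, 60]))  # 60
print(grouped_range)                                 # 29
```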

Quartile Deviation

Quartile deviation, or the semi-interquartile range, measures the dispersion in the middle half of the distribution.
Characteristics:
1. It measures dispersion in the middle half of the distribution.
2. It is more stable as a measure of dispersion than the range.
Computations:
1. Ungrouped Data
$$QD = \frac{Q_3 - Q_1}{2}$$
2. Grouped Data
$$QD = \frac{Q_3 - Q_1}{2}$$
where
$$Q_3 = L_{CB} + \left(\frac{\frac{3N}{4} - PS}{f_{Q_3}}\right) i \qquad Q_1 = L_{CB} + \left(\frac{\frac{N}{4} - PS}{f_{Q_1}}\right) i$$

Examples

A. Solve what is being asked for each number.


1. Consider the temperature readings in San Fernando City, La Union in a year: 20, 27, 28, 32, 34, 36, 38, & 40. Compute the quartile deviation.
Computation: (Using the Formula)
With N = 8 ordered readings, Q1 is taken as the (N/4)th = 2nd reading, 27, and Q3 as the (3N/4)th = 6th reading, 36.
QD = (36 – 27)/2 = 4.5
Interpretation: The quartile deviation, half the distance between Q1 and Q3, is 4.5. It describes the spread of the middle half of the distribution only and does not reflect the variability of the whole distribution.
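Textbooks and software locate quartiles of raw data in different ways. The Python sketch below assumes the positional convention used in the example (Q1 is the (N/4)th ordered value and Q3 the (3N/4)th; rounding a fractional position up is an added assumption for N not divisible by 4) and contrasts it with the interpolating default of the standard-library `statistics.quantiles`, which gives a different QD for the same readings. `quartile_by_position` is a hypothetical helper.

```python
import math
from statistics import quantiles

temperatures = [20, 27, 28, 32, 34, 36, 38, 40]

def quartile_by_position(values, k):
    """Q_k taken as the ceil(k*N/4)-th value of the sorted data (assumed convention)."""
    ordered = sorted(values)
    position = math.ceil(k * len(ordered) / 4)   # 1-based position
    return ordered[position - 1]

q1 = quartile_by_position(temperatures, 1)   # 27
q3 = quartile_by_position(temperatures, 3)   # 36
print((q3 - q1) / 2)                         # 4.5

# An interpolating convention gives a different answer for the same data.
q1_i, _, q3_i = quantiles(temperatures, n=4)  # default method="exclusive": 27.25, 37.5
print((q3_i - q1_i) / 2)                      # 5.125
```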

2. With reference to the Table of Number of Transactions made by Teller Machine in 28


days. Compute the quartile deviation.
Computation: (Using the Formula)
$$Q_3 = 39.5 + \left(\frac{\frac{3(28)}{4} - 17}{6}\right)(5) = 42.83$$
$$Q_1 = 29.5 + \left(\frac{\frac{28}{4} - 6}{2}\right)(5) = 32$$
$$QD = \frac{42.83 - 32}{2} = 5.415$$
Interpretation: It could be analyzed that the first and third quartiles lie, on average, 5.415 transactions from the median. This describes the variability of the middle half of the distribution only; it does not reflect the variability of the whole distribution.
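Both quartile-deviation computations can be sketched in Python as follows. This is a minimal sketch under stated assumptions: for raw scores it uses the (N/4)th and (3N/4)th observation convention of the temperature example (valid when N is a multiple of 4), and for grouped data it assumes whole-number class limits, so each lower class boundary is the lower limit minus 0.5. The function names are hypothetical.

```python
def ungrouped_quartile_deviation(scores):
    """QD for raw scores, taking Q1 and Q3 as the (N/4)th and (3N/4)th
    ordered observations. Assumes N is a multiple of 4."""
    data = sorted(scores)
    n = len(data)
    q1 = data[n // 4 - 1]        # (N/4)th observation, 1-based
    q3 = data[3 * n // 4 - 1]    # (3N/4)th observation, 1-based
    return (q3 - q1) / 2


def grouped_quartile(k, lower_limits, freqs, width):
    """k-th quartile (k = 1, 2, 3) of grouped data by interpolation:
    Q_k = L_CB + ((k*N/4 - PS) / f_Q) * i
    """
    n = sum(freqs)
    position = k * n / 4
    cumulative = 0                         # PS: frequency of the classes below
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            lower_boundary = lower - 0.5   # whole-number class limits assumed
            return lower_boundary + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("quartile position lies beyond the data")


lower_limits = [25, 30, 35, 40, 45, 50]
freqs = [6, 2, 9, 6, 4, 1]
q1 = grouped_quartile(1, lower_limits, freqs, width=5)
q3 = grouped_quartile(3, lower_limits, freqs, width=5)
qd = (q3 - q1) / 2

print(ungrouped_quartile_deviation([20, 27, 28, 32, 34, 36, 38, 40]))  # 4.5
print(round(q3, 2), round(q1, 2), round(qd, 2))                         # 42.83 32.0 5.42
```

Carrying Q3 unrounded gives QD ≈ 5.42; the 5.415 in the text comes from rounding Q3 to 42.83 before subtracting.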

Mean Absolute Deviation

The mean deviation, or more precisely the mean absolute deviation, is defined as the average of the absolute deviations from the mean. Generally, the dispersion of a set of data is said to be small if the values are close to the mean, and large if the values are widely scattered about the mean.
Characteristics:
1. It uses the deviation of each score (or class representative) from the mean.
2. It is more stable as a measure of dispersion than the range and QD, for it describes the dispersion of all the scores in the distribution from the mean.

Computations:
1. Ungrouped Data
$$MAD = \frac{\sum |X - \bar{X}|}{N}$$
Where:
MAD = Mean absolute deviation.
X = Observed scores.
X̄ = value of the mean.
N = Number of cases.
2. Grouped Data
$$MAD = \frac{\sum f\,|X - \bar{X}|}{N}$$
Where:
MAD = Mean absolute deviation.
f = Observed frequency.
X = Representative of each category.
X̄ = Value of the mean.
N = Number of cases.
Examples

A. Solve what is being asked for each number.


1. Find the mean absolute deviation of the given scores in a statistics class: 39, 31, 27,
34, 39, 19, & 20.
Computation: (Using the Formula)
a. First, compute the value of the mean.
X̄ = 209 / 7
= 29.86

b. Set up the given data in a table.

X      X - X̄     |X - X̄|
39      9.14      9.14
31      1.14      1.14
27     -2.86      2.86
34      4.14      4.14
39      9.14      9.14
19    -10.86     10.86
20     -9.86      9.86
              Σ|X - X̄| = 47.14

MAD = 47.14 / 7 = 6.73


Interpretation: It could be analyzed that, on average, the scores differ from the mean value of 29.86 by 6.73.

2. With reference to the Table of Number of Transactions made by Teller Machine in 28 days.

Computation: (Using the Formula)


a. Compute the value of the mean.
X̄ = ΣfM / N = 1051 / 28 = 37.54

b. Set up the frequency distribution.

CI         f    M     fM     X - X̄    |X - X̄|   f|X - X̄|
25 – 29    6    27    162    -10.54    10.54     63.24
30 – 34    2    32     64     -5.54     5.54     11.08
35 – 39    9    37    333     -0.54     0.54      4.86
40 – 44    6    42    252      4.46     4.46     26.76
45 – 49    4    47    188      9.46     9.46     37.84
50 – 54    1    52     52     14.46    14.46     14.46
N = 28          ΣfM = 1051                    Σf|X - X̄| = 158.24

MAD = 158.24 / 28 = 5.65


Computation: (Using ES PLUS CASIO)
Mode, stat, 1 – var, shift, mode, arrow down, stat, 1; then enter the absolute deviations in the x column (10.54 =, 5.54 =, 0.54 =, 4.46 =, 9.46 =, 14.46 =) and the frequencies in the freq column (6 =, 2 =, 9 =, 6 =, 4 =, 1 =), AC, shift, 1, var, x̄, then =. The MAD is the same as 5.65.

Interpretation: It could be analyzed that, on average, the number of transactions deviates from the mean of 37.54 by 5.65.
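The two MAD formulas can be checked with a short Python sketch. It is a minimal illustration, assuming grouped data are given as class midpoints with their frequencies; the function names are hypothetical.

```python
def mad_ungrouped(scores):
    """Mean absolute deviation: sum of |X - mean| divided by N."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


def mad_grouped(midpoints, freqs):
    """Grouped MAD: sum of f * |X - mean| divided by N, X = class midpoint."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n


statistics_scores = [39, 31, 27, 34, 39, 19, 20]
teller_midpoints = [27, 32, 37, 42, 47, 52]
teller_freqs = [6, 2, 9, 6, 4, 1]

print(round(mad_ungrouped(statistics_scores), 2))               # 6.73
print(round(mad_grouped(teller_midpoints, teller_freqs), 2))    # 5.65
```

Because the mean is kept unrounded, small differences from hand-rounded table entries cannot accumulate.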

Standard Deviation & Variance

Variance and standard deviation are the measures of dispersion applied in higher (inferential) statistics. They can be applied in making inferences about the consistency of the population or sample being studied.
Characteristics:
1. Either of the two measures can be applied for statistical inference.
2. Both measure the dispersion of each observation relative to the mean of the set of scores.

Computations:
1. Ungrouped Data
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{(sample standard deviation)}$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{(sample variance)}$$
Where:
s = sample standard deviation.
x = observed score.
x̄ = the mean value.
n = number of cases.

2. Grouped Data
$$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} \quad \text{(sample standard deviation)}$$
$$s^2 = \frac{\sum f (x - \bar{x})^2}{n - 1} \quad \text{(sample variance)}$$
Where:
s = sample standard deviation
f = class frequency
x = midpoint or representative of each category
x̄ = the mean value
n = the number of cases

Examples

A. Solve what is being asked for each number.


1. A student was investigating the effect of synthetic fertilizer on the growth of peanut seedlings. A random sample of those seedlings yielded the heights (in inches) listed below. Find the variance and standard deviation.

X      X - X̄    (X - X̄)²
2      -3.43     11.76
3      -2.43      5.90
4      -1.43      2.04
5      -0.43      0.18
6       0.57      0.32
8       2.57      6.60
10      4.57     20.88
ΣX = 38
X̄ = 5.43
Σ(X - X̄)² = 47.71 (computed from the unrounded deviations; the rounded entries above sum to 47.68)

Computation: (Using the Formula)
$$s = \sqrt{\frac{47.71}{7 - 1}} = 2.82$$
$$s^2 = 7.95$$
Computation: (Using ES PLUS CASIO)
Mode, stat, 1 –var, 2 =, 3 =, 4 =, 5 =, 6 =, 8 =, 10 =, AC, shift, 1, var, sx, then =.
The standard deviation is the same as 2.82. Taking the square, the variance
is the same as 7.95.
Computation: (Using MS EXCEL)
Select a vacant cell, type =STDEV(, highlight the scores, type ), then press Enter.
The standard deviation is the same as 2.82.

Interpretation: It could be analyzed and perceived that the heights of the seedlings deviate from the mean by about 2.82 inches.

2. With reference to the Table of Number of Transactions Made by Teller Machine in 28 days, find the standard deviation and variance.

CI         f    M     fM     X - X̄    (X - X̄)²   f(X - X̄)²
25 – 29    6    27    162    -10.54    111.09     666.55
30 – 34    2    32     64     -5.54     30.69      61.38
35 – 39    9    37    333     -0.54      0.29       2.62
40 – 44    6    42    252      4.46     19.89     119.35
45 – 49    4    47    188      9.46     89.49     357.97
50 – 54    1    52     52     14.46    209.09     209.09
N = 28          ΣfM = 1051                    Σf(X - X̄)² = 1416.96

Computation: (Using the formula)
$$s = \sqrt{\frac{1416.96}{27}} = 7.24$$
$$s^2 = 52.48$$
Computation: (Using ES PLUS CASIO)
Mode, stat, 1 – var, shift, mode, arrow down, stat, 1, for x column (-10.54 =,
-5.54 =, -0.54 =, 4.46 =, 9.46 =, 14.46 =), for freq column (6 =, 2 =, 9 =, 6 =,
4 =, 1 =), AC, shift, 1, var, sx, then =. The standard deviation is the same as 7.24.
Taking the square, the variance is the same as 52.48.

Interpretation: It could be analyzed and perceived that the number of transactions made by the teller machine deviates from the mean by about 7.24 transactions a day.
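The sample variance and standard deviation can be cross-checked in Python. This sketch uses the standard-library `statistics.variance` and `statistics.stdev`, which divide by n − 1 like the formulas above; the grouped helper `grouped_sample_variance` is a hypothetical name that treats each class midpoint as repeated f times.

```python
import math
import statistics


def grouped_sample_variance(midpoints, freqs):
    """Sample variance of grouped data: sum of f*(x - mean)^2 over (n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    squares = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return squares / (n - 1)


heights = [2, 3, 4, 5, 6, 8, 10]
# statistics.variance / stdev use the n - 1 (sample) divisor, like the text.
print(round(statistics.variance(heights), 2), round(statistics.stdev(heights), 2))  # 7.95 2.82

var_t = grouped_sample_variance([27, 32, 37, 42, 47, 52], [6, 2, 9, 6, 4, 1])
print(round(var_t, 2), round(math.sqrt(var_t), 2))                                  # 52.48 7.24
```

For a population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by N instead.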

Exercise 7
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. The following are the IQ scores of 8 selected BSHRM students and 8 BSBA students in
the College of Arts and Management:

BSBA    BSHRM
73      70
80      77
81      81
83      83
85      83
90      84
92      90
95      91

1 – 15. Which of the two groups has the more consistent distribution of IQ scores? Apply the appropriate measure of dispersion to compare the IQ scores using the formula or ES PLUS CASIO.
Computation:

Interpretation:

B. The distribution below summarizes the results of the 50-item test of 60 students in
Statistics.
Number of Correct Answers    Frequency
6 – 10                       1
11 – 15                      5
16 – 20                      8
21 – 25                      8
26 – 30                      8
31 – 35                      12
36 – 40                      9
41 – 45                      4
46 – 50                      5

16 – 25. Compute for the variance and standard deviation using the formula or ES
PLUS CASIO and interpret the result as well.
Computation:

Interpretation:

Skewness

It refers to the symmetry or asymmetry of a frequency distribution. A frequency distribution is positively skewed if its peak lies on the left and its tail extends farther to the right of the mode than it does to the left, and negatively skewed if its peak lies on the right and its tail extends farther to the left.

Skewness can be computed using the Pearsonian Coefficient of Skewness formula:
$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
Interpretation:
SK = 0 (Perfectly Symmetrical)
SK > 0 (Frequency polygon is skewed to the right)
SK < 0 (Frequency polygon is skewed to the left)

Example

1. With reference to the Table of Number of Transactions Made by Teller Machine in 28 days, compute the skewness.
Computation:
$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \frac{3(37.54 - 37.83)}{7.24} = -0.12$$
Interpretation: The negative value shows that the distribution of the number of transactions made by the teller machine is skewed to the left and is therefore negatively skewed. The mean is lower than the median and the mode.
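The skewness example can be reproduced end to end in Python. This is a minimal sketch, assuming whole-number class limits (so midpoints are lower limit + (width − 1)/2 and lower class boundaries are lower limit − 0.5); `grouped_stats` and `pearson_skewness` are hypothetical helper names. The median uses the same interpolation idea as the quartile formula, with N/2 in place of N/4.

```python
import math


def grouped_stats(lower_limits, freqs, width):
    """Mean, median and sample standard deviation of grouped data."""
    n = sum(freqs)
    midpoints = [lower + (width - 1) / 2 for lower in lower_limits]
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    sd = math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1))
    # Median class: first class whose cumulative frequency reaches N/2.
    half, cumulative = n / 2, 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            median = (lower - 0.5) + (half - cumulative) / f * width
            break
        cumulative += f
    return mean, median, sd


def pearson_skewness(mean, median, sd):
    """Pearsonian coefficient of skewness: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd


mean, median, sd = grouped_stats([25, 30, 35, 40, 45, 50], [6, 2, 9, 6, 4, 1], 5)
sk = pearson_skewness(mean, median, sd)
print(round(mean, 2), round(median, 2), round(sd, 2), round(sk, 2))  # 37.54 37.83 7.24 -0.12
```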

Kurtosis

This refers to the flatness or peakedness of one distribution in relation to another.

Types of Kurtosis

1. Leptokurtic – the distribution is more peaked than the normal distribution. K > 3
2. Mesokurtic – the peakedness of the distribution is the same as that of the normal distribution. K = 3
3. Platykurtic – the distribution is less peaked (flatter) than the normal distribution. K < 3

Formulas:
1. Ungrouped Data
$$K = \frac{\sum (x - \bar{x})^4}{n s^4}$$
Where:
x = score            n = sample size
x̄ = mean value       s = standard deviation

2. Grouped Data
$$K = \frac{\sum f (x - \bar{x})^4}{n s^4}$$
Where:
f = class frequency
x = representative of each category
x̄ = mean value.
n = sample size
s = standard deviation

Examples

1. With reference to the student who was investigating the effect of synthetic fertilizer on the growth of peanut seedlings, compute the kurtosis and analyze the result.
Computation:
$$K = \frac{\sum (x - \bar{x})^4}{n s^4} = \frac{657.72}{7(2.82)^4} = \frac{657.72}{7(63.24)} = 1.49$$
Interpretation: The value could be analyzed that the distribution is platykurtic (K < 3) and that the heights of the peanut seedlings grown with synthetic fertilizer are more dispersed or scattered from one another.

2. With reference to the Table of Number of Transactions Made by Teller Machine in 28 days, compute kurtosis and analyze the result.
Computation:
$$K = \frac{\sum f (x - \bar{x})^4}{n s^4} = \frac{154061.1}{28(7.24)^4} = \frac{154061.1}{28(2747.605)} = 2.002$$
Interpretation: It could be analyzed that the distribution is platykurtic (K < 3), and therefore the numbers of transactions made in a day are more dispersed or scattered from one another.
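The kurtosis formula above, with s the sample standard deviation, can be sketched in Python as below. This is a minimal sketch; `kurtosis` and `kurtosis_type` are hypothetical names, and omitting the frequencies treats the data as ungrouped. It returns about 1.49 for the seven seedling heights and about 2.00 for the teller-machine table (the 2.002 in the text uses s rounded to 7.24); both are below 3, so both are platykurtic.

```python
import math


def kurtosis(values, freqs=None):
    """K = sum f*(x - mean)^4 / (n * s^4), with s the sample standard deviation.

    With freqs omitted, every value has frequency 1 (ungrouped data)."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    s = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1))
    fourth = sum(f * (x - mean) ** 4 for x, f in zip(values, freqs))
    return fourth / (n * s ** 4)


def kurtosis_type(k):
    """Classify K against 3, the value for a normal distribution."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"


k_seedlings = kurtosis([2, 3, 4, 5, 6, 8, 10])
k_teller = kurtosis([27, 32, 37, 42, 47, 52], [6, 2, 9, 6, 4, 1])
print(round(k_seedlings, 2), kurtosis_type(k_seedlings))  # 1.49 platykurtic
print(round(k_teller, 2), kurtosis_type(k_teller))        # 2.0 platykurtic
```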

Exercise 8
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

1 - 10. With reference to the IQ’s of 8 selected students in the BSHRM and 8 in the BSBA
Department, compute the skewness and analyze the result.
Computation:

Interpretation:

11 - 20. With reference to the Table of Efficiency Ratings of 50 Faculty Members in DMMMSU-MLUC, compute the kurtosis and analyze the result.
Computation:

Interpretation:

Chapter VI. Combinatorics and Probabilities

Objectives

• Explain and differentiate the concepts of permutations and combinations, and solve problems involving them.
• Explain and understand the theoretical and experimental concepts of probability.
• Enumerate and explain the characteristics of a normal curve.
• Transform a raw score into a standard score (z – score).
• Determine the areas under the normal curve.

Permutations, Combinations and Probability

Permutations, combinations, and probability have a strong recreational appeal to many people. In addition to this interest, they have important practical and theoretical applications.

Experiment, Sample Space & Elements

An activity or process that has a number of outcomes is called an experiment. Hence, tossing a coin, throwing a die, spinning a wheel, drawing a card, games of chance and other related activities are typical examples of experiments.
The set or collection of all possible outcomes of an experiment is called the sample space. The sample space of an experiment consists of elements, or sample points, such that to each outcome of the experiment there corresponds exactly one element in the sample space. The sample space is denoted by S.
Example: The possible outcomes of tossing a coin once.
Experiment = tossing a coin once.
Sample space = {Head, Tail} is the set of all possible outcomes.
Elements = Head or tail are the elements in the sample space.

Multiplicative Rule

Listing and counting the elements of a sample space is practical for simple experiments. For more complicated experiments we can make use of the Multiplicative Counting Rule.
Fundamental Principle: If one thing can be done in any of m different ways and, after it is done, a second thing can be done in any of n different ways, then the two things can be done in succession in m × n different ways.

Examples

A. Solve each of the following for each number.


1. Suppose a governor from the College of Arts and Management will be chosen from
three candidates and a vice – governor from four other candidates. Let us determine the
number of different ways the two positions can be filled.
Computation: (Using the Listing Method)
There are three ways of filling the office of governor, and with each one of these
possible ways, there are four choices for vice – governor. In all, there are 3 x 4 = 12
different ways of making two choices.

Let S1 = {A, B, C} represent the governor position and
S2 = {X, Y, Z, W} represent the vice – governor position. Hence:

Positions             Possible outcomes
S1 = {A, B, C}        AX, AY, AZ, AW
S2 = {X, Y, Z, W}     BX, BY, BZ, BW
                      CX, CY, CZ, CW

2. How many two – digit numbers can be made with the digits 2, 4, 6, and 8 if (a)
repetitions are allowed; (b) no digit is to be repeated in a number?
Computation: (Using the Listing Method)
(a) There are four choices for the tens’ digit, and after a choice is made, there remain four choices for the units’ digit, since repetitions of digits are allowed. Therefore, the total number of possibilities is 4 x 4 = 16.

Digits Possible two digit outcomes


2 (22, 24, 26, 28)
4 (42, 44, 46, 48)
6 (62, 64, 66, 68)
8 (82, 84, 86, 88)
(b) If repetitions of digits are not allowed, there are four choices for the tens’ digit
and only three for the units’ digit, making 4 x 3 = 12.

Digits Possible two digit outcomes


2 (24, 26, 28)
4 (42, 46, 48)
6 (62, 64, 68)
8 (82, 84, 86)
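The listings above can be generated mechanically, which makes the Fundamental Principle easy to check. This is a minimal sketch using the standard-library `itertools.product` (choices made independently, repetition allowed) and `itertools.permutations` (ordered choices without repetition); the variable names are illustrative.

```python
from itertools import permutations, product

governors = ["A", "B", "C"]
vice_governors = ["X", "Y", "Z", "W"]
# Fundamental principle: every governor can be paired with every vice-governor.
slates = [g + v for g, v in product(governors, vice_governors)]

digits = "2468"
# (a) repetition allowed: ordered pairs drawn from the digits with replacement.
with_repetition = [a + b for a, b in product(digits, repeat=2)]
# (b) no repetition: ordered pairs of two different digits.
without_repetition = [a + b for a, b in permutations(digits, 2)]

print(len(slates), len(with_repetition), len(without_repetition))  # 12 16 12
print(slates[:4])                                                   # ['AX', 'AY', 'AZ', 'AW']
```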

Permutation

It is an arrangement of objects in a definite order. The letters a, b, and c, for example, have the following possible permutations or arrangements:
abc, acb, bac, bca, cab, and cba
There are six permutations. We can determine the number of permutations, however, without writing a list. The number of permutations of n things taken r at a time is given by the formula:
$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1) \qquad \text{or} \qquad P(n, r) = \frac{n!}{(n-r)!}$$

Examples

A. Solve each of the following for each number.


1. How many permutations can be made from the letters in the word “Sunday” if (a) 5
letters are used at a time; and (b) 4 letters are used.
(a) We want to find the number of permutations of 6 things or letters taken 5 at a time, therefore:
Computation: (Using the Formula)
$$P(6, 5) = \frac{6!}{(6-5)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{1!} = 720$$
Computation: (Using the ES PLUS CASIO)
6, shift, x, 5, then =. The total number of permutation is the same as 720.

Computation: (Using MS Excel)

Vacant cell, =, permut, (, 6, 5, ), then enter. The total number of permutation is the
same as 720.

(b) We want to find the number of permutations of 6 things or letters taken 4 at a time, therefore:
Computation: (Using the Formula)
$$P(6, 4) = \frac{6!}{(6-4)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2!} = 360$$
Computation: (Using the ES PLUS CASIO)
6, shift, x, 4, then =. The total number of permutation is the same as 360.

Computation: (Using MS Excel)


Vacant cell, =, permut, (, 6, 4, ), then enter. The total number of permutation is the
same as 360.

2. In how many ways can 6 BSBA students be seated in a room which has 11 chairs?
Since only 6 chairs are to be occupied, we need to find the number of
permutations of 11 things taken 6 at a time.
Computation: (Using the Formula)
$$P(11, 6) = \frac{11!}{(11-6)!} = \frac{11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{5!} = 332{,}640$$
Computation: (Using the ES PLUS CASIO)
11, shift, x, 6, then =. The total number of permutation is the same as 332,640.

Computation: (Using MS Excel)


Vacant cell, =, permut, (, 11, 6, ), then enter. The total number of permutation is
the same as 332,640.
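The permutation formula can also be evaluated in Python. This minimal sketch defines a hypothetical helper `perm` straight from n!/(n − r)!, compares it with the standard-library `math.perm` (Python 3.8 or later), and brute-forces the 4-letter arrangements of SUNDAY with `itertools.permutations`.

```python
import math
from itertools import permutations


def perm(n, r):
    """Permutations of n things taken r at a time: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)


print(perm(6, 5), perm(6, 4), perm(11, 6))            # 720 360 332640
# math.perm (Python 3.8+) computes the same quantity.
print(math.perm(6, 5), math.perm(6, 4), math.perm(11, 6))
# Brute-force check: all 4-letter arrangements of the 6 distinct letters of SUNDAY.
print(len(list(permutations("SUNDAY", 4))))           # 360
```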

Permutations of Things Not All Different

The formulas of permutations were derived on the assumption that the set of n things, or objects, consists of objects that are all different. The formulas do not apply if some of the objects are alike. The word WEDNESDAY, for example, has 9 letters, but we cannot make 9! different permutations using all 9 letters at a time because some letters are alike (E and D each appear twice).
The number of permutations of n things taken all at a time, where n1 of the objects are alike and the others distinct, is given by the first formula; when there are several groups of n1, n2, n3, … alike objects, the second formula applies:
$$P = \frac{n!}{n_1!} \qquad \text{or} \qquad P = \frac{n!}{n_1!\, n_2!\, n_3! \cdots}$$

Examples

A. Solve each of the following for each number.


1. How many permutations can be made with the 9 letters in the word TENNESSEE?
Computation: (Using the Formula)
There are 2 S’s, 4 E’s, and 2 N’s. Therefore:
$$P = \frac{n!}{n_1!\, n_2!\, n_3! \cdots} = \frac{9!}{2!\, 2!\, 4!} = \frac{362{,}880}{96} = 3780$$
Computation: (Using the ES PLUS CASIO)
9, shift, x-1, ÷, (, 2, shift, x-1, x, 2, shift, x-1, x, 4, shift, x-1,), then =.
The total number of permutation is the same as 3,780.

2. In how many ways is it possible for 12 customers to buy 5 cans of sardines, 5 cans of corned beef, and 2 cans of meatloaf if each customer gets 1 can?
Computation: (Using the Formula)
If the cans were all different, there would be 12! ways in which each customer could get 1 can. But since there are groups of 5, 5, and 2 like cans, we have:
$$P = \frac{n!}{n_1!\, n_2!\, n_3! \cdots} = \frac{12!}{5!\, 5!\, 2!} = 16{,}632$$
Computation: (Using the ES PLUS CASIO)
12, shift, x-1, ÷, (, 2, shift, x-1, x, 5, shift, x-1, x, 5, shift, x-1,), then =.
The total number of permutation is the same as 16,632.
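The formula for permutations of things not all different can be sketched in Python by counting the repeated items automatically. This is a minimal sketch; `distinct_arrangements` is a hypothetical helper built on the standard-library `collections.Counter` and `math.factorial`. A small word is brute-forced with `itertools.permutations` as a sanity check.

```python
import math
from collections import Counter
from itertools import permutations


def distinct_arrangements(items):
    """n! / (n1! n2! ...), where n1, n2, ... count the repeated items."""
    denominator = 1
    for count in Counter(items).values():
        denominator *= math.factorial(count)
    return math.factorial(len(items)) // denominator


cans = ["sardines"] * 5 + ["corned beef"] * 5 + ["meatloaf"] * 2
print(distinct_arrangements("TENNESSEE"))   # 3780
print(distinct_arrangements(cans))          # 16632
# Small brute-force check: the distinct orderings of "NEE" are NEE, ENE, EEN.
print(distinct_arrangements("NEE"), len(set(permutations("NEE"))))  # 3 3
```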

Combination

A selection of part or all of a set of objects is called a combination. A combination differs from a permutation in the sense that it does not involve the order of selection or the arrangement of the members. The letters a, b, c, and d taken 3 at a time have the combinations
abc abd acd bcd
Changing the order of one of these combinations does not make a new combination. Thus abc and bca are two permutations of a single combination. If C(n, r) stands for the number of combinations of n things taken r at a time (r ≤ n), then:
$$C(n, r) = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r!} \qquad \text{or} \qquad C(n, r) = \frac{n!}{r!\,(n-r)!}$$

Examples

A. Solve each of the following for each number.


1. In how many ways can a committee be selected from 18 persons if the committee is
to have (a) 3 members; (b) 14 members?
Computation: (Using the Formula)
(a) We wish to find the number of combinations of 18 things taken 3 at a time. Therefore:
$$C(18, 3) = \frac{18!}{3!\,(18-3)!} = \frac{18!}{3!\,15!} = \frac{18 \cdot 17 \cdot 16}{3!} = 816$$

Computation: (Using ES PLUS CASIO)


18, shift, ÷, 3, then =. The total number of combination is the same as 816.

Computation: (Using MS Excel)


Vacant cell, =, combin, (, 18, 3, ), then enter. The total number of combination is
the same as 816.

(b) We wish to find the number of combinations of 18 things taken 14 at a time. Therefore:
Computation: (Using the Formula)
$$C(18, 14) = \frac{18!}{14!\,(18-14)!} = \frac{18!}{14!\,4!} = \frac{18 \cdot 17 \cdot 16 \cdot 15}{4!} = 3060$$

Computation: (Using ES PLUS CASIO)


18, shift, ÷, 14, then =. The total number of combination is the same as 3060.

Computation: (Using MS Excel)


Vacant cell, =, combin, (, 18, 14, ), then enter. The total number of combination is
the same as 3060.

2. In how many ways can 5 mathematics teachers be employed from 6 male applicants
and 4 female applicants in DMMMSU – MLUC if (a) 3 are to be men; (b) 3 or 4 are to be
men?
Computation: (Using the Formula)
(a) The 3 men can be chosen in C(6, 3) different ways. The remaining 2 can be chosen from the female applicants in C(4, 2) ways. Hence, the total number of ways is
$$C(6, 3) \cdot C(4, 2) = \frac{6!}{3!\,(6-3)!} \cdot \frac{4!}{2!\,(4-2)!} = 20 \cdot 6 = 120$$
Computation: (Using ES PLUS CASIO)
6, shift, ÷, 3, x, 4, shift, ÷, 2, then =. The total number of combination
is the same as 120.

(b) To find the number of ways of filling the vacancies with either 3 or 4 men among those employed, we add the number of ways of employing 3 men and 2 women to the number of ways of employing 4 men and 1 woman. That is, we have:
Computation: (Using the formula)
$$C(6, 3) \cdot C(4, 2) + C(6, 4) \cdot C(4, 1) = 120 + 60 = 180$$

Computation: (Using ES PLUS CASIO)


6, shift, ÷, 3, x, 4, shift, ÷, 2, +, 6, shift, ÷, 4, x, 4, shift, ÷, 1, then =.
The total number of combination is the same as 180.
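The combination examples can be verified with the standard-library `math.comb` (Python 3.8 or later), which computes n!/(r!(n − r)!). This is a minimal sketch; the variable names are illustrative.

```python
from math import comb

print(comb(18, 3), comb(18, 14))            # 816 3060
# Choosing 14 members to serve is the same as choosing the 4 who do not.
print(comb(18, 14) == comb(18, 4))          # True

exactly_3_men = comb(6, 3) * comb(4, 2)     # 3 of 6 men and 2 of 4 women
three_or_four_men = exactly_3_men + comb(6, 4) * comb(4, 1)
print(exactly_3_men, three_or_four_men)     # 120 180
```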

Exercise 9

Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Identify what principle is applicable on combinatorics and solve using the formula or
ES PLUS CASIO.
1 - 5. In how many different ways can the letters of the word 'CORPORATION' be
arranged so that the vowels always come together?
Computation:

6 - 10. A license plate has 3 letters and 3 digits in that order. A witness to a hit and run
accident saw the first 2 letters and the last digit. If the letters and digits can be
repeated, how many license plates must be checked by the police to find the culprit?
Computation:

11 - 15. In a box there are 5 black pens, 3 white pens and 4 red pens. In how many ways can 2 black pens, 2 white pens and 2 red pens be chosen?
Computation:

16 - 20. Twenty people meet in a room and each shakes hands with all the others. If all
of them will have to shake hands once again before leaving, how many handshakes will
there be?
Computation:

21 - 25. In how many ways can an animal trainer arrange 5 lions and 4 tigers in a row
so that no two lions are together?
Computation:

26 - 30. How many 3-digit numbers can be formed from the digits 2, 3, 5, 6, 7 and 9,
which are divisible by 5 and none of the digits is repeated?
Computation:

Probability

The theory of probability, which received its first impetus from games of chance, has been highly developed and now has wide and important applications in the fields of insurance, annuities, and the social sciences.
In our study of probability, we indicate precisely those conditions which favor
the happening of an event and those which oppose the happening.
Probability of an Event

By definition, if P(E) denotes the probability of an event E, n(E) the number of elements in the event, and n(S) the number of elements in the sample space, then, when all outcomes in S are equally likely,
$$P(E) = \frac{n(E)}{n(S)}$$
Examples

A. Solve each of the following for each number.


1. If six 1 peso coins are tossed simultaneously, find the probability that (a) all fall
heads up; (b) three fall heads up and three fall tails up.
Computation: (Using the Formula)
(a) There are two ways for a coin to fall, head or tail, and by the Fundamental Principle, six coins can fall in 2⁶ = 64 ways. All heads can occur in only one way. Therefore:
$$P(E) = \frac{n(E)}{n(S)} = \frac{1}{64}$$
(b) A particular three of the coins can fall heads and the remaining coins fall tails in only one way. But the three coins with heads up can be chosen in C(6, 3) = 20 ways. Therefore:
Computation: (Using the Formula)
$$P(E) = \frac{n(E)}{n(S)} = \frac{20}{64} = \frac{5}{16}$$

2. If four cards are to be removed from a standard deck of playing cards, find the probability that (a) all four are face cards; and (b) all four are spades.
Computation: (Using the Formula)
(a) There are 12 face cards and consequently C(12, 4) ways of getting four face cards. The total number of ways of getting four cards from the deck is given by C(52, 4). Therefore:
$$P(\text{4 face cards}) = \frac{C(12, 4)}{C(52, 4)} = \frac{495}{270{,}725} = \frac{99}{54{,}145}$$
(b) There are 13 spades and consequently C(13, 4) ways of getting four spades. Therefore:
$$P(\text{4 spades}) = \frac{C(13, 4)}{C(52, 4)} = \frac{715}{270{,}725} = \frac{11}{4{,}165}$$
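The classical definition P(E) = n(E)/n(S) can be evaluated exactly with Python's `fractions.Fraction`, which reduces results to lowest terms automatically. This is a minimal sketch; `classical_probability` is a hypothetical helper, and the counts come from `math.comb` as in the worked examples.

```python
from fractions import Fraction
from math import comb


def classical_probability(favourable, possible):
    """P(E) = n(E) / n(S), kept as an exact fraction."""
    return Fraction(favourable, possible)


p_all_heads = classical_probability(1, 2 ** 6)
p_three_heads = classical_probability(comb(6, 3), 2 ** 6)
p_four_face = classical_probability(comb(12, 4), comb(52, 4))
p_four_spades = classical_probability(comb(13, 4), comb(52, 4))

print(p_all_heads, p_three_heads)    # 1/64 5/16
print(p_four_face, p_four_spades)    # 99/54145 11/4165
```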
Probability in the Union of Events

The union of two events E1 and E2, written E1 ∪ E2, includes all the elements that belong to E1, to E2, or to both; its probability is denoted P(E1 ∪ E2).
Example: If A = {u, v, w, x, y, z} and B = {a, v, y, z}, then
A ∪ B = {a, u, v, w, x, y, z}
Theorem 1: If E1 and E2 are any events in a sample space S, then
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$$
Corollary 1: If E1 and E2 are disjoint sets, then E1 and E2 are said to be mutually exclusive. For this case, n(E1 ∩ E2) = 0, then
$$P(E_1 \cup E_2) = P(E_1) + P(E_2)$$
Disjoint Sets – Two sets which have no common element.

Probability in the Intersection of Events

The intersection of two events E1 and E2, written E1 ∩ E2, includes all the elements which are common to E1 and E2; its probability is denoted P(E1 ∩ E2).
Example: Using the same sets A and B as in the union example,
A ∩ B = {v, y, z}

Theorem 2: If E1 and E2 are dependent events in the sample space S, then
$$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2 \mid E_1)$$
where P(E2 | E1) is the probability of E2 given that E1 has occurred.
Corollary 2: If E1 and E2 are independent events, then
$$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2)$$
Examples

A. Solve each of the following.


1. A card is drawn from a deck of 52 cards. Find the probability that the card is a queen
or a heart.
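Computation:
Let E1 = set of 4 queens.
Let E2 = set of 13 hearts. One card, the queen of hearts, belongs to both sets; therefore:
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$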

2. A group of boys are to compete in a foot race. The probability that runner A will win
is 1/7, and the probability that runner B will win is ¼. Find the probability that A or B
will win.
Computation:
E1 = event that runner A will win
E2 = event that runner B will win. Only one runner can win, so these are mutually exclusive events; therefore:
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) = \frac{1}{7} + \frac{1}{4} = \frac{11}{28}$$
3. One card is drawn from a deck of 52 cards and then a second is drawn. Find the probability that the first card is a spade and the second card is a club if (a) the first card is replaced before the second card is drawn; and (b) the first card is not replaced.
Computation:
Let E1 be the set of 13 spades.
Let E2 be the set of 13 clubs. Therefore:

(a) If the first card is replaced, the two draws are independent:
$$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2) = \frac{13}{52} \cdot \frac{13}{52} = \frac{1}{16}$$

(b) If the first card is not replaced, 51 cards, including all 13 clubs, remain for the second draw:
$$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2 \mid E_1) = \frac{1}{4} \cdot \frac{13}{51} = \frac{13}{204}$$
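The union and intersection examples can be confirmed by brute-force counting over a simulated deck. This is a minimal sketch: the (rank, suit) card encoding is an assumption made for illustration, the union is counted directly and compared with Theorem 1, and case (b) enumerates every ordered pair of two different cards with `itertools.permutations`.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))            # 52 (rank, suit) cards

# Example 1: queen or heart, by counting the union directly.
queens = {card for card in deck if card[0] == "Q"}
hearts = {card for card in deck if card[1] == "hearts"}
p_union = Fraction(len(queens | hearts), len(deck))
# Theorem 1 gives the same value: P(E1) + P(E2) - P(E1 and E2).
p_theorem = (Fraction(len(queens), 52) + Fraction(len(hearts), 52)
             - Fraction(len(queens & hearts), 52))

# Example 2: mutually exclusive events, so the probabilities simply add.
p_a_or_b = Fraction(1, 7) + Fraction(1, 4)

# Example 3(a): with replacement the two draws are independent.
p_replacement = Fraction(13, 52) * Fraction(13, 52)
# Example 3(b): enumerate ordered pairs of two different cards (no replacement).
pairs = list(permutations(deck, 2))           # 52 * 51 ordered draws
spade_then_club = sum(1 for first, second in pairs
                      if first[1] == "spades" and second[1] == "clubs")
p_no_replacement = Fraction(spade_then_club, len(pairs))

print(p_union, p_theorem, p_a_or_b)           # 4/13 4/13 11/28
print(p_replacement, p_no_replacement)        # 1/16 13/204
```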

Probabilities in Repeated Trials

This type of probability considers experiments in which the outcome of any trial is independent of the outcome of any other trial.
Theorem 3: If p is the probability that an event will occur in a single trial of an experiment and q = 1 − p is the probability that the event will fail to occur, then the probability that the event will occur exactly r times in n trials is:
$$C(n, r)\, p^r q^{n-r}$$

Examples

1. In 4 throws of a die, what is the probability of (a) exactly 2 aces; (b) 2 or more aces?
Computation:
(a) The probability of an ace on a single throw is 1/6 and of failure is 5/6. Hence, the probability of exactly two aces is:
$$C(4, 2)\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^2 = 6 \cdot \frac{25}{1296} = \frac{25}{216}$$
(b) The probability of 2, 3, or 4 aces is equal to the sum of the separate probabilities of those events. Hence, we have:
$$C(4, 2)\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^2 + C(4, 3)\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right) + \left(\frac{1}{6}\right)^4 = \frac{150 + 20 + 1}{6^4} = \frac{171}{1296}$$
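Theorem 3 translates directly into a short Python function. This minimal sketch uses `fractions.Fraction` for exact answers and `math.comb` for C(n, r); the name `binomial_probability` is a hypothetical helper. Note that the exact result for part (b), 171/1296, prints in lowest terms as 19/144.

```python
from fractions import Fraction
from math import comb


def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials: C(n, r) p^r q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


p_ace = Fraction(1, 6)
exactly_two = binomial_probability(4, 2, p_ace)
two_or_more = sum(binomial_probability(4, r, p_ace) for r in range(2, 5))

print(exactly_two)    # 25/216
print(two_or_more)    # 19/144, i.e. 171/1296 in lowest terms
```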

Probability Using Venn Diagram

One can determine the chance or probability using the concept of a Venn diagram, a diagram that shows all possible logical relations between two sets.

Example

1. Business Survey: A survey of 800 small business firms in a certain city indicates that 250 own photocopiers, 420 own fax machines, and 180 own both photocopiers and fax machines.
Illustration and Computation:
a) Draw a Venn diagram illustrating the events P = {owns photocopier} and F = {owns fax machine}.

[Venn diagram: circles P and F; P only = 70, both P and F = 180, F only = 240, outside both circles = 310]

b) How many businesses in the survey own either a photocopier or a fax machine?
$$n(P \cup F) = 70 + 180 + 240 = 490$$
c) What is the probability that a randomly selected business owns either a photocopier or a fax machine?
$$P(P \cup F) = \frac{490}{800} = .6125$$
d) How many businesses in the survey own neither a photocopier nor a fax machine?
$$n(\text{NOT}(P \cup F)) = 800 - 490 = 310$$
e) What is the probability that a randomly selected business owns neither a photocopier nor a fax machine?
$$P(\text{NOT}(P \cup F)) = \frac{310}{800} = .3875$$
f) How many businesses in the survey own a photocopier but not a fax machine?
$$n(P \cap \text{NOT } F) = 70$$
g) What is the probability that a randomly selected business owns a photocopier but not a fax machine?
$$P(P \cap \text{NOT } F) = \frac{70}{800} = .0875$$
h) What is the conditional probability that a randomly selected business owns a photocopier, given that it owns a fax machine?
$$P(P \mid F) = \frac{P(P \cap F)}{P(F)} = \frac{180/800}{420/800} = \frac{180}{420} = \frac{3}{7} \text{ or } 0.4286$$
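The Venn-diagram regions and probabilities follow from the three survey counts alone. This minimal sketch derives each region by subtraction and keeps probabilities as exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

total, photocopier, fax, both = 800, 250, 420, 180

only_photocopier = photocopier - both                # P only
only_fax = fax - both                                # F only
either = only_photocopier + both + only_fax          # P or F
neither = total - either                             # outside both circles

p_either = Fraction(either, total)
p_neither = Fraction(neither, total)
p_photocopier_not_fax = Fraction(only_photocopier, total)
# Conditional probability P(P | F) = n(P and F) / n(F).
p_photocopier_given_fax = Fraction(both, fax)

print(only_photocopier, only_fax, either, neither)                       # 70 240 490 310
print(float(p_either), float(p_neither), float(p_photocopier_not_fax))   # 0.6125 0.3875 0.0875
print(p_photocopier_given_fax, round(float(p_photocopier_given_fax), 4)) # 3/7 0.4286
```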

Exercise 10
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

1 - 6. If one card is removed at random from a standard deck of 52 playing cards, find the probability that the card will be (a) one of the face cards; (b) one of the black suits; and (c) a red jack.
Computation:

7 - 10. A baby places the wooden digits 1, 2, 3, 4, 5, 6 in a row. What is the probability
that the number thus formed is (a) less than 400,000, (b) more than 200,000?
Computation:

11 – 12. The probabilities that teams A, B, and C will win the conference championship
are ½ , 1/5, and 1/8. Find the probability that one of them will win the title.
Computation:

13 - 16. One box contains 3 black balls and 4 white balls and another box contains 6
black balls and 5 white balls. If one ball is drawn from each box, find the probability
that both balls will be (a) black; and (b) white.
Computation:

17 – 20. A coin is tossed 5 times. Find the probability of (a) exactly 2 heads; (b) exactly
3 heads; and (c) fewer than 2 heads.

Computation:

Consider the problem. Customer service: A cable television company has 8000
subscribers in a suburban community. The company offers two premium channels,
HBO and SHOWTIME. If 2450 subscribers receive HBO and 1950 receive SHOWTIME
and 5150 do not receive any premium channel, find the following:

21 – 22. How many subscribe to receive both HBO and SHOWTIME?

23 – 24. What is the probability that a randomly selected subscriber receives both HBO
and SHOWTIME?

25 – 26. How many subscribe to HBO but not to SHOWTIME?

27 – 28. What is the probability that a randomly selected subscriber receives HBO but not SHOWTIME?

29 – 30. What is the probability a randomly selected Showtime subscriber also receives
HBO?

Normal Distribution

One of the assumptions in statistical estimation and hypothesis testing problems concerning means, variances, proportions and correlation coefficients is the condition of normality.

Characteristics of a Normal Curve

• It is symmetrical about its center, where the mean, median and mode all coincide.

[Figure: bell-shaped normal curve with mean = median = mode at the center]

• It is asymptotic to the horizontal axis: the two tails of the curve approach the horizontal axis but never intersect it.
• The total area under the normal curve is 100% (a probability of 1).

[Figure: normal curve with 50% of the area on each side of z = 0]

Steps in Finding the Area of a Normal Curve

1. Compute the z – score or standard score if it is unknown:
$$z = \frac{x - \mu}{\sigma}$$
Where:
z = the unknown standard score
x = score
μ = mean value
σ = standard deviation
2. Convert the normal distribution of x into the standardized normal distribution, which has a mean of zero and a standard deviation of 1.
3. Shade the required area under the normal curve; the tabulated areas are always measured from z = 0.
4. Look up the area for the computed standard score in the Tabulated Areas of the Normal Probability Distribution.

Examples

A. Solve each of the following.


1. Studies show that gasoline use for compact cars sold in the Philippines is normally distributed with a mean use of 30.5 miles per gallon (mpg) and a standard deviation of 4.5 mpg. What percentage of compacts obtains 35 or more miles per gallon?
Computation:
a.
$$z = \frac{x - \mu}{\sigma} = \frac{35 - 30.5}{4.5} = 1.0$$
b. [Figure: normal curve marked at z = 0 and z = 1.0]
c. P(z ≥ 1) = .5 - .3413 = .1587 or 15.87%

2. The salaries of MBA graduates who entered the field of marketing services averaged approximately P45,000 with a standard deviation of P2,250. If these salaries were normally distributed, what proportion of MBA graduates who entered marketing services had salaries in excess of P47,500, which is the average salary for those graduates entering the field of brand/product management?
Computation:
a.
$$z = \frac{x - \mu}{\sigma} = \frac{47{,}500 - 45{,}000}{2250} = 1.11$$
b. [Figure: normal curve marked at z = 0 and z = 1.11]
c. P(z > 1.11) = .5 - .3665 = .1335 or 13.35%

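3. Refer to Example 1. If a manufacturer wants to develop a compact car that outperforms 95% of the current compacts in fuel economy, what must the gasoline usage rate be for the new car?
Computation:
The new car must lie above the 95th percentile, which corresponds to z = 1.645 (the area from 0 to 1.645 is .45).
$$z = \frac{x - \mu}{\sigma}$$
$$1.645 = \frac{x - 30.5}{4.5}$$
$$x = 30.5 + 1.645(4.5) = 37.9$$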
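The three normal-curve examples can be cross-checked in Python with the standard-library `statistics.NormalDist` (Python 3.8 or later), whose `cdf` gives the area to the left of a value and whose `inv_cdf` gives the value for a given left-tail area. This is a minimal sketch; `z_score` is a hypothetical helper. Because the library does not round z to two decimals, Example 2 comes out near 0.133 rather than the table-based .1335.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standard score z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Example 1: P(x >= 35) when mu = 30.5 mpg and sigma = 4.5 mpg.
mpg = NormalDist(mu=30.5, sigma=4.5)
print(z_score(35, 30.5, 4.5), round(1 - mpg.cdf(35), 4))  # 1.0 0.1587

# Example 2: P(x > 47,500) when mu = 45,000 and sigma = 2,250.
salary = NormalDist(mu=45_000, sigma=2_250)
print(round(z_score(47_500, 45_000, 2_250), 2), round(1 - salary.cdf(47_500), 4))

# Example 3: the mpg value that exceeds 95% of compacts (95th percentile).
print(round(mpg.inv_cdf(0.95), 1))                        # 37.9
```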

Exercise 10
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Solve each of the following:


1 -10. On an examination, the average grade was 86 and the standard deviation was 7.
If 17% of the class is given A’s, and the grades are curved to follow a normal
distribution, what is the lowest possible A and the highest possible B?
Computation:

a. b.

c.

11 – 20. In a mathematics examination the average grade was 83 and the standard
deviation was 5. All students with grades from 87 to 90 received a grade of B. If the
grades are approximately normally distributed and 15 students received a B grade, how
many students took the examination?
Computation:

a. b.

c.

Chapter VII. Hypothesis Testing

Objectives

• Understand the definitions used in hypothesis testing.
• State the null and alternative hypotheses.
• Explain the difference between Type I and Type II errors and the power of a test.
• Find critical values for the test statistics.
• State the six steps used in hypothesis testing.
• Test mean/s for small and large samples using the z-test, t-test and paired t-test.
• Test proportion/s using the z-test.
• Test variance/s or standard deviations using the chi-square test and F-test.
• Test the correlation coefficient using the t-test and z-test.

Introduction

• Statistical hypothesis testing is a decision-making process for evaluating claims about a population.
• In hypothesis testing, the researcher must define the population under study,
state the particular hypotheses that will be investigated, give the significance level,
select a sample from the population, collect the data, perform the calculations required
for the statistical test, and reach a conclusion.

Methods to Test Hypotheses

The three methods used to test hypotheses are:
• The traditional method.
• The P-value method.
• The confidence interval method.

Statement of a Hypothesis

• There are two types of statistical hypotheses for each situation: the null
hypothesis and the alternative hypothesis.
a. The null hypothesis, symbolized by H0, is a statistical hypothesis that
states that there is no difference between a parameter and a specific value, or
that there is no difference between two parameters.
b. The alternative hypothesis, symbolized by H1, is a statistical
hypothesis that states the existence of a difference between a parameter and a
specific value, or states that there is a difference between two parameters.

Formulation of Null & Alternative Hypothesis

1. A medical researcher is interested in finding out whether a new medication will have
any undesirable side effects. The researcher is particularly concerned with the pulse
rate of the patients who take the medication. Will the pulse rate increase, decrease, or
remain unchanged after a patient takes the medication? Since the researcher knows
that the mean pulse rate for the population under study is 82 beats per minute, the
hypotheses for this situation are
H0: µ = 82 (The new medication will not have any undesirable side effects on the
pulse rate of the patients.)
H1: µ ≠ 82 (The new medication will have undesirable side effects on the pulse
rate of the patients.)
Type of test: Two-tailed (possible side effects of the medication could be to raise
or lower the pulse rate.)

2. A chemist invents an additive to increase the life of an automobile battery. The mean
lifetime of the automobile battery is 36 months.
H0: µ = 36 (An invented additive will not increase the life of an automobile
battery.)
H1: µ > 36 (An invented additive will increase the life of an automobile battery.)
Type of test: One – tailed (This test is called right-tailed, since the interest is in
an increase only.)

3. A researcher thinks that if expectant mothers use vitamin pills, the birth weight of
the babies will increase. The average birth weight of the population is 8.6 pounds.
H0: µ = 8.6 lbs (The use of vitamin pills by expectant mothers will not increase the
birth weight of the babies.)
H1: µ > 8.6 lbs (The use of vitamin pills by expectant mothers will increase the
birth weight of the babies.)
Type of Test: One-tailed (right-tailed)

4. A psychologist feels that playing soft music during a test will change the results of
the test. The psychologist is not sure whether the grades will be higher or lower. In the
past, the mean of the scores was 73.
H0: µ = 73 (Playing soft music during a test will not change the results of the test.)
H1: µ ≠ 73 (Playing soft music during a test will change the result of the test.)
Type of Test: Two - tailed

Hypothesis-Testing Common Phrases

Symbol   Common phrases
>        is greater than; is increased
<        is less than; is decreased or reduced from
≥        is greater than or equal to; is at least
≤        is less than or equal to; is at most
=        is equal to; has not changed from
≠        is not equal to; has changed from

Exercise 11
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Formulate the null and alternative hypothesis and identify the type of test.
1 - 3. The manufacturer of a certain brand of cigarettes claims that the average nicotine
content does not exceed 2.5 mg. State H0 and H1 and determine what type of test to use.
Hypotheses:
Hypothetical Question:
H0:
H1:
Type of Test:

4 - 6. Test the hypothesis that there exists a significant difference between the
performance of BSBA and BSOA students in Statistics against the hypothesis that it does not.
Hypotheses:

H0:
H1:
Type of Test:

7 - 9. A real estate agent claims that 60% of all private residences being built today are
3 – bedroom homes. To test this claim, a large sample of new residences is inspected;
the proportion of these homes with 3 bedrooms is recorded and used as our test
statistic.
Hypotheses:

H0:
H1:
Type of Test:

10 - 12. The researcher would like to determine whether hypertension is dependent on
smoking habit.
Hypotheses:

H0:
H1:
Type of Test:
Type of Test:

13 – 15. A group of educators would like to find out whether programmed materials are
more effective than the traditional method in the teaching-learning process.
Hypotheses:

H0:
H1:
Type of Test:

Design of the Study

• After stating the hypotheses, the researcher's next step is to design the study.
The researcher selects the correct statistical test, chooses an appropriate level of
significance, and formulates a plan for conducting the study.

Statistical Test

• A statistical test uses the data obtained from a sample to make a decision about
whether or not the null hypothesis should be rejected.
• The numerical value obtained from a statistical test is called the test value.

Possible Outcomes of a Hypothesis Test

                   H0 True            H0 False
Reject H0          Type I error       Correct decision
Do not reject H0   Correct decision   Type II error

Summary of Possible Outcomes

• A Type I error occurs if one rejects the null hypothesis when it is true.
• A Type II error occurs if one accepts (fails to reject) the null hypothesis when it is false.

Steps in Hypothesis Testing

1. Formulate the null hypothesis.
2. Formulate the alternative hypothesis.
3. Assign the level of significance and identify what type of test will be used.
4. Identify the critical region/s and the critical value/s.
5. Test the hypothesis using the appropriate statistical tool.
6. Make the decision (reject or do not reject H0) and state the conclusion, bearing in mind the risk of a Type I or Type II error.

Vocabulary List

• Hypothesis – a statement or tentative theory which aims to explain facts about the real world.
• Statistical Hypothesis – an assertion or conjecture concerning one or more populations.
• Type I Error – rejection of the null hypothesis when it is true.
• Type II Error – acceptance of the null hypothesis when it is false.
• Critical Value – the tabulated value that separates the critical region from the acceptance
region; if the test value falls beyond it, the null hypothesis is rejected and the alternative
hypothesis is accepted.
• Critical Region – the rejection region, that is, the set of test values for which the null
hypothesis is rejected.
• Acceptance Region – the region of test values for which the null hypothesis is accepted (not rejected).
• Level of Significance – the probability of committing a Type I error, that is, the chance that
the null hypothesis will be rejected when it is true.
• One-tailed Test – the rejection region is located at one extreme of the range of values; used
when the alternative hypothesis is directional.
• Two-tailed Test – the rejection region is located at both extremes of the range of values; used
when the alternative hypothesis is non-directional.
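The definition of the level of significance as the probability of a Type I error can be illustrated by simulation. The sketch below repeatedly samples from a population in which H0 is true. It then applies a two-tailed z-test at α = 0.05 and counts how often H0 is wrongly rejected. The population values (μ0 = 50, σ = 10, n = 36) and the function name `type_i_error_rate` are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=50.0, sigma=10.0, n=36, alpha=0.05, trials=10_000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # The population mean really is mu0, so H0 is true in every trial
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # each rejection here is a Type I error
    return rejections / trials

print(type_i_error_rate())  # close to 0.05
```

The observed rejection rate settles near α, which is exactly what "chance that the null hypothesis will be rejected when it is true" means.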
Testing Hypothesis Concerning Mean/s

In this section, the objective is to test hypotheses concerning one population
mean or the difference between two population means.

One Population Mean

The problem of testing the hypothesis H0 that the population mean $\mu = \mu_0$ against the
usual alternatives is given by:
• H0: $\mu = \mu_0$
• H1: $\mu > \mu_0$, $\mu < \mu_0$, or $\mu \neq \mu_0$

Assuming that the distribution of the population being sampled is at least
approximately normal, the z-test or t-test value for testing $\mu = \mu_0$ is given by:

1. $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ if $n \geq 30$ (z-test)
2. $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ if $n < 30$ (t-test), with $v = n - 1$ degrees of freedom
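The two formulas can be written as small functions. This is a minimal sketch; the function names are hypothetical. The usage lines plug in the figures from the two worked examples that follow. The t value uses $s/\sqrt{n}$ exactly as in formula 2 and comes out at about −4.33.

```python
import math

def one_sample_z(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)); sigma known, large sample."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def one_sample_t(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), returned with v = n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

print(round(one_sample_z(6700, 5500, 1550, 250), 2))  # 12.24
t, v = one_sample_t(38, 44.3, 6.5, 20)
print(round(t, 2), v)  # -4.33 19
```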

Establishing the Critical Region

• If $\mu > \mu_0$, then the critical region is $z > z_\alpha$ in the z-test and $t > t_\alpha$ in the t-test,
which lies on the right tail of the distribution.
• If $\mu < \mu_0$, then the critical region is $z < -z_\alpha$ in the z-test and $t < -t_\alpha$ in the t-test,
which lies on the left tail of the distribution.
• If $\mu \neq \mu_0$, then the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$ in the z-test and
$t < -t_{\alpha/2}$ and $t > t_{\alpha/2}$ in the t-test, which lies on each tail of the distribution.
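For the z-test, the three critical-region rules can be expressed as a small decision helper. This is a sketch with a hypothetical name, `z_critical_region`. The critical values come from `statistics.NormalDist` instead of the printed table. Python's standard library has no t distribution, so for the t-test the $t_\alpha$ values would still be read from the t table.

```python
from statistics import NormalDist

def z_critical_region(alpha, tail):
    """Return a function telling whether a z value falls in the critical region."""
    std = NormalDist()
    if tail == "right":   # H1: mu > mu0
        z_a = std.inv_cdf(1 - alpha)
        return lambda z: z > z_a
    if tail == "left":    # H1: mu < mu0
        z_a = std.inv_cdf(1 - alpha)
        return lambda z: z < -z_a
    if tail == "two":     # H1: mu != mu0, alpha split between both tails
        z_half = std.inv_cdf(1 - alpha / 2)
        return lambda z: z < -z_half or z > z_half
    raise ValueError("tail must be 'right', 'left' or 'two'")

in_region = z_critical_region(0.05, "right")  # boundary at about 1.645
print(in_region(12.24), in_region(1.50))      # True False
```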

Note:
• The z-test is used when the population standard deviation σ is known and the sample
is large ($n \geq 30$).
• The t-test is used when σ is unknown and is estimated by the sample standard deviation s,
and the sample is small ($n < 30$).

Example

1. In a recent survey of utility workers, both public and private, in Region 1, it was
found that the average monthly net income of utility workers is Php5,500. Suppose
a researcher wants to test this figure by taking a random sample of 250 utility workers
in Region 1 to determine whether the monthly income has changed. Suppose further
that the average net monthly income of the 250 utility workers is found to be Php6,700,
with a population standard deviation of Php1,550. Does this indicate that the average
net monthly income after the survey is already greater than Php5,500? Use a 0.05 level of
significance.
Illustrative Testing:
a) H0: The average net monthly income of utility workers in Region 1 after the
survey is still Php5,500. µ = Php5,500
b) H1: The average net monthly income of utility workers in Region 1 after the
survey is already greater than Php5,500. µ > Php5,500
c) α = 0.05 (one-tailed test)
d) Critical Region: z > 1.645
[Figure: standard normal curve with the critical region to the right of z = 1.645]
e) Computation: $\bar{x}$ = Php6,700; µ = Php5,500; σ = Php1,550; n = 250
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{(6{,}700 - 5{,}500)\sqrt{250}}{1{,}550} = 12.24$
f) Decision: Reject H0 and accept H1 and conclude that the mean net monthly
income after the survey is already greater than Php5,500.

2. In a certain study, it was found that the average time required by workers to
complete a certain task was 44.3 minutes. A group of 20 workers was randomly chosen
to undergo special training for one month. After the training, it was observed that the
workers' average time to complete the task was 38 minutes with a standard deviation of
6.5 minutes. Can it be concluded at 95% confidence that the special training facilitated
the completion of the task?
Illustrative Testing:
a) H0: The special training did not facilitate the completion of the task; the average
time is still 44.3 minutes. µ = 44.3 minutes
b) H1: The special training facilitated the completion of the task; the average time is
now less than 44.3 minutes. µ < 44.3 minutes
c) α = 0.05 (one-tailed test)
d) Critical Region: v = 20 – 1 = 19; t < -1.729
[Figure: t distribution with the critical region to the left of t = -1.729]
e) Computation: $\bar{x}$ = 38 min; µ = 44.3 min; s = 6.5 min; n = 20
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{(38 - 44.3)\sqrt{20}}{6.5} = -4.33$
f) Decision: Reject H0 and accept H1 and conclude that the special training
facilitated the completion of the task.

Two Population Means

Hypothesis testing involving the difference between two population means is
used to determine whether or not it is reasonable to conclude that the two population
means are unequal.
The problem of testing the hypothesis H0 that $\mu_1 = \mu_2$ against the
usual alternatives is given by:
• H0: $\mu_1 = \mu_2$
• H1: $\mu_1 > \mu_2$, $\mu_1 < \mu_2$, or $\mu_1 \neq \mu_2$

Assuming that the distributions of the populations being sampled are at least
approximately normal, the z-test or t-test value for testing $\mu_1 - \mu_2 = d_0$ is given by:

1. $z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ (independent large samples)
2. $t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ (independent small samples)

where $d_0$ is the hypothesized difference $\mu_1 - \mu_2$ ($d_0 = 0$ when testing $\mu_1 = \mu_2$).
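Both two-sample formulas share the same shape, so one function can serve as a sketch of either. Pass population σ values for the z-test and sample s values for the t-test. The name `two_sample_stat` is hypothetical. The usage lines reproduce the classroom-versus-programmed-materials value (2.04) and the supervisors-versus-rank-and-file value (1.52) worked out in the examples below.

```python
import math

def two_sample_stat(x1, x2, sd1, sd2, n1, n2, d0=0.0):
    """((x1 - x2) - d0) / sqrt(sd1^2/n1 + sd2^2/n2) for independent samples."""
    return ((x1 - x2) - d0) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

print(round(two_sample_stat(85, 81, 4, 5, 12, 10), 2))                   # 2.04
print(round(two_sample_stat(3.8117, 3.3483, 0.5177, 0.5355, 6, 6), 2))   # 1.52
```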
Establishing the Critical Region

• If $\mu_1 > \mu_2$, then the critical region is $z > z_\alpha$ in the z-test and $t > t_\alpha$ in the t-test,
which lies on the right tail of the distribution.
• If $\mu_1 < \mu_2$ with corresponding level of significance α, then the critical region is
$z < -z_\alpha$ in the z-test and $t < -t_\alpha$ in the t-test, which lies on the left tail of the distribution.
• If $\mu_1 \neq \mu_2$ with corresponding level of significance α, then the critical region is
$z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$ in the z-test and $t < -t_{\alpha/2}$ and $t > t_{\alpha/2}$ in the t-test,
which lies on each tail of the distribution.

Example

1. A course in mathematics is taught to 12 students by the conventional classroom
procedure. A second group of 10 students was given the same course by means of
programmed materials. At the end of the semester, the same examination was given to
each group. The 12 students meeting in the classroom made an average grade of 85
with a standard deviation of 4, while the 10 students using the programmed materials
made an average of 81 with a standard deviation of 5. Test the hypothesis that the two
methods of learning are equally effective, using a 0.10 level of significance. Assume the populations
to be approximately normal with equal variances.
Illustrative Testing:
a) H0: The two methods of learning do not differ from one another and are, hence, equally
effective in learning. $\mu_1 = \mu_2$
b) H1: There is a significant difference between the conventional classroom
procedure and programmed materials in the learning process. $\mu_1 \neq \mu_2$
c) α = 0.10 (two-tailed test)
d) Critical Region: v = 12 + 10 – 2 = 20; t > 1.725 and t < -1.725
[Figure: t distribution with critical regions beyond t = -1.725 and t = 1.725]
e) Computation: $\bar{x}_1$ = 85; $\bar{x}_2$ = 81; s1 = 4; s2 = 5; n1 = 12; n2 = 10; d0 = 0
$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{(85 - 81) - 0}{\sqrt{16/12 + 25/10}} = 2.04$ (independent small samples)
f) Decision: Reject H0 and accept H1 and conclude that the two methods differ significantly.
Since the classroom group obtained the higher mean grade (85 versus 81), the conventional
classroom procedure appears more effective than the programmed materials.

2. A research study was undertaken to determine the level of effectiveness of the
Performance Appraisal System (PAS) of a certain agency as perceived by the supervisors and
rank and file. The perceptions of the employees relative to the level of effectiveness of
the PAS were compared according to the respondents' educational attainment. The data
gathered are shown below.

Indicators                          Supervisors   Descriptive   Rank & File   Descriptive
                                    (AWM)         Rating        (AWM)         Rating
A. Basis for Employee Devt.
1. Promotions                       3.53          VE            3.02          ME
2. Incentives & Rewards             3.54          VE            2.93          ME
3. Training & Scholarship Grants    3.71          VE            3.11          ME
4. Administrative Discipline        3.20          ME            2.96          ME
B. Measuring Employee Performance
1. Work-related Activities          4.45          VE            4.02          VE
2. Behavioral Dimensions            4.44          VE            4.05          VE
Overall                             3.82          VE            3.35          VE

Based on the table, is there a significant difference between the level of
effectiveness of the Performance Appraisal System as perceived by the supervisors and
the rank and file?

Illustrative Testing:
a) H0: There is no significant difference between the level of effectiveness of the
Performance Appraisal System as perceived by the supervisors and rank and
file.
b) H1: There is significant difference between the level of effectiveness of the
Performance Appraisal System as perceived by the supervisors and rank and
file.
c) α = 0.05 (two-tailed test)
d) Critical Region: df = 6 + 6 – 2 = 10; t > 2.228 and t < -2.228
[Figure: t distribution with critical regions beyond t = -2.228 and t = 2.228]

e) Computation: (Using the Formula)

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{3.8117 - 3.3483}{\sqrt{(0.5177)^2/6 + (0.5355)^2/6}} = 1.52$

Computation: (Using MS EXCEL)


Open MS Excel (with the Analysis ToolPak add-in enabled) and enter the supervisors' ratings in one
column and the rank-and-file ratings in another. Select Data > Data Analysis > t-Test: Two-Sample
Assuming Equal Variances. Highlight Variable 1 (the supervisors' column) and Variable 2 (the
rank-and-file column), set Alpha to 0.05, select an output range, then press Enter.

t-Test: Two-Sample Assuming Equal Variances


Supervisors Rank and File
Mean 3.811667 3.348333
Variance 0.267977 0.286777
Observations 6 6
Pooled Variance 0.277377
Hypothesized Mean Difference 0
df 10
t Stat 1.523769
P(T<=t) one-tail 0.079273
t Critical one-tail 1.812461
P(T<=t) two-tail 0.158545
t Critical two-tail 2.228139
* Significant at alpha < 0.05

f) Decision: Since the computed value (1.52) is less than the tabulated value (2.228) and falls
within the acceptance region, accept H0 and reject H1, and conclude that
there is no significant difference between the level of effectiveness of the
Performance Appraisal System as perceived by the supervisors and the rank and
file.
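The Excel output above assumes equal variances and therefore pools the two sample variances. A minimal sketch of that pooled computation, working from the raw ratings in the table, is shown below. It uses Python's `statistics.variance`, which divides by n − 1 like Excel. The function name `pooled_t` is hypothetical. Because both groups have 6 observations, the pooled and unpooled formulas give the same t here.

```python
import math
from statistics import mean, variance

supervisors = [3.53, 3.54, 3.71, 3.20, 4.45, 4.44]
rank_and_file = [3.02, 2.93, 3.11, 2.96, 4.02, 4.05]

def pooled_t(a, b):
    """Two-sample t assuming equal variances; returns (t, pooled variance, df)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, n1 + n2 - 2

t, sp2, df = pooled_t(supervisors, rank_and_file)
print(round(t, 4), round(sp2, 6), df)  # matches the Excel t Stat, Pooled Variance and df
```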

The Paired t - test

The objective in paired comparison tests is to eliminate a maximum number of
sources of extraneous variation by making the pairs similar with respect to as many
variables as possible. The test value is given by:

$t = \frac{\bar{d} - \mu_{d_0}}{\sqrt{s_d^2/n}}$, with $v = n - 1$ degrees of freedom

where:
$\bar{d}$ = sample mean difference
$\mu_{d_0}$ = hypothesized population mean difference
$s_d^2$ = variance of the sample differences
n = number of sample differences

Example

1. Test the hypothesis that the diet was successful after 12 weeks if the weights of 9
obese women before and after 12 weeks on a very low calorie diet were as follows:

Before   After   Difference (After – Before)
117.3    83.3    -34.0
111.4    85.9    -25.5
98.6     75.8    -22.8
104.3    82.9    -21.4
105.4    82.3    -23.1
100.4    77.7    -22.7
81.7     62.7    -19.0
89.5     69.0    -20.5
78.2     63.9    -14.3

Illustrative Testing:
a) H0: The very low calorie diet was not successful after 12 weeks for the 9 obese
women. $\mu_d = 0$
b) H1: The very low calorie diet was successful after 12 weeks for the 9 obese women. $\mu_d < 0$
c) α = 0.05 (one-tailed test)
d) Critical Region: v = 9 – 1 = 8; t < -1.86
[Figure: t distribution with the critical region to the left of t = -1.86]
e) Computation: (Using the Formula)
$\bar{d} = \frac{\sum d_i}{n} = \frac{(-34.0) + (-25.5) + \cdots + (-14.3)}{9} = -22.59$
$s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1} = \frac{(-34.0 + 22.59)^2 + (-25.5 + 22.59)^2 + \cdots + (-14.3 + 22.59)^2}{8} = 28.30$
$t = \frac{-22.59 - 0}{\sqrt{28.30/9}} = -12.74$
Computation: (Using MS EXCEL)
Open MS Excel and enter the after weights in one column and the before weights in another.
Select Data > Data Analysis > t-Test: Paired Two Sample for Means. Highlight Variable 1 (the
after column) and Variable 2 (the before column), set Alpha to 0.05, select an output range,
then press Enter.

t-Test: Paired Two Sample for Means


After Before
Mean 75.9444444 98.53333
Variance 76.7252778 172.505
Observations 9 9
Pearson Correlation 0.96020245
Hypothesized Mean Difference 0
df 8
t Stat -12.739511
P(T<=t) one-tail 6.7875E-07
t Critical one-tail 1.85954803
P(T<=t) two-tail 1.3575E-06
t Critical two-tail 2.30600413
* Significant at alpha < 0.05

f) Decision: Since -12.74 falls in the rejection region (t < -1.86), we reject H0 and accept
H1 and conclude that the very low calorie diet was successful after 12 weeks.
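The paired computation can be reproduced from the raw weights with a short sketch. It forms the differences (after − before), then the mean difference, the variance of the differences (n − 1 divisor, via `statistics.variance`), and finally the paired t value from the formula above. The function name `paired_t` is hypothetical.

```python
import math
from statistics import mean, variance

before = [117.3, 111.4, 98.6, 104.3, 105.4, 100.4, 81.7, 89.5, 78.2]
after = [83.3, 85.9, 75.8, 82.9, 82.3, 77.7, 62.7, 69.0, 63.9]

def paired_t(first, second, mu_d0=0.0):
    """t = (dbar - mu_d0) / sqrt(s_d^2 / n), with d = second - first."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    dbar, sd2 = mean(d), variance(d)  # variance() divides by n - 1
    return dbar, sd2, (dbar - mu_d0) / math.sqrt(sd2 / n)

dbar, sd2, t = paired_t(before, after)
print(round(dbar, 2), round(sd2, 2), round(t, 2))  # -22.59 28.3 -12.74
```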

Analysis of Variance

Analysis of Variance is a technique in inferential analysis designed to test whether
or not the means of more than two samples (groups) differ significantly from each
other. To test this claim, one-way or two-way ANOVA can be applied.

Advantages of Using F-Test Over the T-test

• It minimizes the time and effort of computing and testing when there are more than two
samples.
• Testing every pair of samples with separate t-tests increases the overall chance of committing a Type I error.
• The interaction effects between and among the variables can be measured.
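The statistical limitation of the t-test approach can be made concrete. With k groups there are k(k − 1)/2 pairwise t-tests. If those tests were independent, which is only a rough approximation since pairwise comparisons share data, the chance of at least one Type I error would be 1 − (1 − α)^m. The sketch below evaluates this formula. The function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(k_groups, alpha=0.05):
    """Number of pairwise tests and P(at least one Type I error), assuming independence."""
    m = comb(k_groups, 2)  # number of pairwise t-tests
    return m, 1 - (1 - alpha) ** m

print(familywise_error(4))  # 6 tests, overall error rate about 0.265
```

For four groups, the overall risk is roughly 26%, far above the 5% intended for a single test. This is one reason a single F-test is preferred.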

Assumptions Made For the F-Test

• Random selection of subjects from normal populations with equal variances.
• Samples are independent.
• Data being analyzed must be interval.

Objectives

• Discuss concepts of analysis of variance.
• Define and discuss the applicability of One-Way ANOVA.
• Define and discuss the applicability of Two-Way ANOVA.
• Differentiate One-Way from Two-Way ANOVA.

One – Way ANOVA (Two or More Population Means)

To test the equality of several means, researchers use a procedure known as
the analysis of variance. One-factor analysis of variance, in which only one factor is studied
as the independent variable, is a procedure that uses a set of calculations on several
variances to test the hypothesis that several populations have the same mean.

Steps in Testing One – Way ANOVA

a. Formulate the null hypothesis.
b. Formulate the alternative hypothesis.
c. Set the level of significance.
d. Prepare the worksheet.
e. Find $SS_{TOT}$ (Sum of Squares Total).
f. Find $SS_{BET}$ (Sum of Squares Between).
g. Find $SS_W$ (Sum of Squares Within).
h. Find the degrees of freedom.
i. Find the variance estimates (mean squares).
j. Find the F-ratio and determine the significance of F (refer to the table of F-ratios).
k. Formulate the decision and interpret.
l. Construct an ANOVA Summary Table.
m. Should there be a significant difference in the means, apply a Post Hoc
Multiple Comparison Test or Post ANOVA Scheffé Test:

$F = \frac{(\bar{X}_i - \bar{X}_{ii})^2}{MS_W \left(\frac{1}{N_i} + \frac{1}{N_{ii}}\right)(k - 1)}$

Steps:
1. Specify the pairs of means to be compared.
2. Compute the Scheffé test value F for each pair.
3. Determine the critical F value for comparison.
4. Compare the ratios and make the interpretation.

Example

1. Four groups of 6 students in a Statistics class, each taken from one of the four programs of the
College of Arts and Management, are subjected to an experiment in which each group is
taught by one of four teaching methods. The grades of the students are tabulated
at the end of 1 month of the experiment. At 0.05 level of significance, test the hypothesis that
there is no significant difference in the average grade gains among the four groups of
students using the four methods of teaching against the alternative that there is a significant difference
among the four methods.
Student   Group 1    Group 2    Group 3    Group 4
          Method A   Method B   Method C   Method D
1         78         95         97         80
2         82         85         89         86
3         85         85         88         80
4         79         92         90         75
5         80         82         90         80
6         82         90         80         82

Illustrative Testing:
a) H0: There is no significant difference in the average grade gains among the
four groups of students using the four methods of teaching.
b) H1: There is significant difference in the average grade gains among the four
groups of students using the four methods of teaching.
c) α = 0.05 (one-tailed test)
d. Computation: (Using the Formula)

Student   A (X11)   B (X21)   C (X31)   D (X41)   X11²     X21²     X31²     X41²
1         78        95        97        80        6,084    9,025    9,409    6,400
2         82        85        89        86        6,724    7,225    7,921    7,396
3         85        85        88        80        7,225    7,225    7,744    6,400
4         79        92        90        75        6,241    8,464    8,100    5,625
5         80        82        90        80        6,400    6,724    8,100    6,400
6         82        90        80        82        6,724    8,100    6,400    6,724
Total     486       529       534       483       39,398   46,763   47,674   38,945

e. $SS_{TOT} = \Sigma X_t^2 - \frac{(\Sigma X_t)^2}{N_t} = 172{,}780 - \frac{(2{,}032)^2}{24} = 172{,}780 - 172{,}042.67 = 737.33$
f. $SS_{BET} = \frac{(\Sigma X_{11})^2}{N_1} + \frac{(\Sigma X_{21})^2}{N_2} + \frac{(\Sigma X_{31})^2}{N_3} + \frac{(\Sigma X_{41})^2}{N_4} - \frac{(\Sigma X_T)^2}{N_T}$
$= \frac{486^2}{6} + \frac{529^2}{6} + \frac{534^2}{6} + \frac{483^2}{6} - \frac{2{,}032^2}{24} = 371$
g. $SS_W = SS_{TOT} - SS_{BET} = 737.33 - 371 = 366.33$
h. Degrees of Freedom = 4 – 1 = 3 (between)
= 24 – 4 = 20 (within)
i. $MS_{BET} = \frac{SS_{BET}}{df_{BET}} = \frac{371}{3} = 123.67$
$MS_W = \frac{SS_W}{df_W} = \frac{366.33}{20} = 18.32$
j. $F = \frac{MS_{BET}}{MS_W} = \frac{123.67}{18.32} = 6.75$
F – Tabulated Value = 3.10

k. Decision: Reject H0 and accept H1 and conclude that there is a significant
difference in the average grade gains among the four groups of students using the four
methods of teaching.
l. Summary Table for the One-Factor ANOVA of the Mean Grade Gain
of the Four Sample Groups Using Four Different Methods

Sources of       Sum of    Degrees of   Mean      Computed   Tabular    Decision    Interpretation
Variation        Squares   Freedom      Squares   f          f (0.05)
Between Column   371       3            123.67    6.75       3.10       Reject H0   Significant
Within Column    366.33    20           18.32
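The worksheet arithmetic in steps e to j can be checked with a short sketch that follows the same formulas. Each sum of squares is computed from the correction term $(\Sigma X)^2/N$, then the mean squares and F. The function name `one_way_anova` is hypothetical. The sketch only computes the test value; the critical value (3.10) is still read from the F table.

```python
# Grades of the four groups in the one-way ANOVA example
groups = {
    "Method A": [78, 82, 85, 79, 80, 82],
    "Method B": [95, 85, 85, 92, 82, 90],
    "Method C": [97, 89, 88, 90, 90, 80],
    "Method D": [80, 86, 80, 75, 80, 82],
}

def one_way_anova(samples):
    """Return SS_TOT, SS_BET, SS_W, MS_BET, MS_W and F for a list of samples."""
    all_x = [x for s in samples for x in s]
    n_total, k = len(all_x), len(samples)
    cf = sum(all_x) ** 2 / n_total                     # (sum X)^2 / N
    ss_tot = sum(x * x for x in all_x) - cf
    ss_bet = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ss_w = ss_tot - ss_bet
    ms_bet = ss_bet / (k - 1)                          # df between = k - 1
    ms_w = ss_w / (n_total - k)                        # df within = N - k
    return ss_tot, ss_bet, ss_w, ms_bet, ms_w, ms_bet / ms_w

ss_tot, ss_bet, ss_w, ms_bet, ms_w, f = one_way_anova(list(groups.values()))
print(f"SS_TOT={ss_tot:.2f} SS_BET={ss_bet:.2f} SS_W={ss_w:.2f} F={f:.2f}")
# SS_TOT=737.33 SS_BET=371.00 SS_W=366.33 F=6.75
```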

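m. Application of Post Hoc Multiple Comparison or Post ANOVA Scheffé Test.

Given:
$\bar{x}_{11}$ = 81      MSW = 18.32
$\bar{x}_{21}$ = 88.17   k – 1 = 3
$\bar{x}_{31}$ = 89
$\bar{x}_{41}$ = 80.5

1. Pairs of means:
a. $\bar{x}_{11}$ & $\bar{x}_{21}$      d. $\bar{x}_{11}$ & $\bar{x}_{31}$
b. $\bar{x}_{21}$ & $\bar{x}_{31}$      e. $\bar{x}_{11}$ & $\bar{x}_{41}$
c. $\bar{x}_{31}$ & $\bar{x}_{41}$      f. $\bar{x}_{21}$ & $\bar{x}_{41}$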

2. $F_a = \frac{(\bar{X}_i - \bar{X}_{ii})^2}{MS_W\left(\frac{1}{N_i} + \frac{1}{N_{ii}}\right)(k - 1)} = \frac{(81 - 88.17)^2}{18.32\left(\frac{1}{6} + \frac{1}{6}\right)(3)} = 2.80$

$F_b = \frac{(88.17 - 89)^2}{18.32\left(\frac{1}{6} + \frac{1}{6}\right)(3)} = 0.04$

$F_c = \frac{(89 - 80.5)^2}{18.32\left(\frac{1}{6} + \frac{1}{6}\right)(3)} = 3.94$

$F_d = \frac{(81 - 89)^2}{18.32\left(\frac{1}{6} + \frac{1}{6}\right)(3)} = 3.49$

$F_e = \frac{(81 - 80.5)^2}{18.32\left(\frac{1}{6} + \frac{1}{6}\right)(3)} = 0.013$

$F_f = \frac{(88.17 - 80.5)^2}{18.32\left(\frac{1}{6} + \frac{1}{6}\right)(3)} = 3.21$
3. F – Test
Each F value above has already been divided by (k – 1), so it is compared directly with
the tabular F value used in the ANOVA, $F_{0.05}(3, 20) = 3.10$. (Equivalently, the undivided
ratio could be compared with $(k - 1)F = 3(3.10) = 9.30$.)

4. a. Since the computed value for the Fa Scheffé test (2.80) is less than the F-test value (3.10),
there is no significant difference between methods A and B.
b. Since the computed value for the Fb Scheffé test (.04) is less than the F-test value (3.10),
there is no significant difference between methods B and C.
c. Since the computed value for the Fc Scheffé test (3.94) is greater than the F-test value
(3.10), there is a significant difference between methods C and D.
d. Since the computed value for the Fd Scheffé test (3.49) is greater than the F-test value
(3.10), there is a significant difference between methods A and C.
e. Since the computed value for the Fe Scheffé test (.013) is less than the F-test value (3.10),
there is no significant difference between methods A and D.
f. Since the computed value for the Ff Scheffé test (3.21) is greater than the F-test value
(3.10), there is a significant difference between methods B and D.
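The six Scheffé comparisons follow the same pattern, so a loop can compute them all. This is a sketch under the assumptions of this example: the group means come from the worksheet totals, MSW = 366.33/20, k = 4, and each ratio is compared with the tabular F(0.05; 3, 20) = 3.10, which has to be read from the F table. The helper name `scheffe_f` is hypothetical. It implements the formula given earlier, including the division by (k − 1).

```python
from itertools import combinations

means = {"A": 486 / 6, "B": 529 / 6, "C": 534 / 6, "D": 483 / 6}
n_per_group = 6
ms_within = 366.33 / 20  # MSW = SSW / dfW
k = 4                    # number of groups
F_TABLE = 3.10           # tabular F(0.05; 3, 20)

def scheffe_f(mean_i, mean_ii, n_i, n_ii, msw, k_groups):
    """(mean_i - mean_ii)^2 / [MSW * (1/n_i + 1/n_ii) * (k - 1)]."""
    return (mean_i - mean_ii) ** 2 / (msw * (1 / n_i + 1 / n_ii) * (k_groups - 1))

results = {}
for a, b in combinations(means, 2):
    f = scheffe_f(means[a], means[b], n_per_group, n_per_group, ms_within, k)
    results[(a, b)] = f
    verdict = "significant" if f > F_TABLE else "not significant"
    print(f"{a} vs {b}: F = {f:.2f} ({verdict})")
```

The printed ratios match the hand computations (2.80, 3.49, 0.01, 0.04, 3.21 and 3.94). Only the A–C, B–D and C–D pairs exceed 3.10.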

Computation: (Using MS EXCEL)


Open MS Excel and enter the Method A, Method B, Method C and Method D grades in four
adjacent columns. Select Data > Data Analysis > Anova: Single Factor, highlight all numerical
data for the four methods, set Alpha to 0.05, select an output range, then press Enter.

SUMMARY
Groups Count Sum Average Variance
Method A 6 486 81 6.4
Method B 6 529 88.16667 24.56667
Method C 6 534 89 29.6
Method D 6 483 80.5 12.7
ANOVA
Source of Variation SS df MS f P-value f crit
Between Groups 371 3 123.6667 6.751592 0.002509 3.098391
Within Groups 366.3333 20 18.31667
Total 737.3333 23

Two – Way ANOVA (Two or More Population Means)

ANOVA for a single factor may be extended to more than one independent
variable. Similar to one-way analysis of variance, the variability of the data may be caused
by such sources as error and each independent variable. The effects produced by each
independent variable are called main effects, while the effect produced by the combination of the
variables is called interaction.

Steps in Testing Two – Way ANOVA

a. Formulate the null and alternative hypotheses.
b. Set up the level of significance.
c. Compute $SS_{TOT}$.
d. Compute $SS_{ROWS}$.
e. Compute $SS_{COLUMNS}$.
f. Compute $SS_{ERROR}$.
g. Compute $SS_{INTERACTION}$.
h. Compute the degrees of freedom for rows, columns, interaction and error.
i. Compute the mean sum of squares.
j. Calculate the value of F.
k. Construct a Two-Way ANOVA Summary Table.
l. Formulate the decision and interpret.

There is no interaction if the null hypothesis is accepted.
There is an interaction if the null hypothesis is rejected.

Example

1. Compare the effects of study habits on memory retention of the students at
0.05 level of significance. Each variable has different categories: study habit
is classified as with music or without music, while memory
retention can be classified as high, moderate, or low retention.

Factor A –       Factor B – Memory Retention
Study Habit      High           Moderate       Low            Total Row
                 X11    X11²    X21    X21²    X31    X31²    XT     XT²
With Music       10     100     9      81      8      64      27     245
                 15     225     12     144     10     100     37     469
                 14     196     11     121     7      49      32     366
                 13     169     10     100     10     100     33     369
                 12     144     13     169     11     121     36     434
Sub Total        64     834     55     615     46     434     165    1,883
Without Music    16     256     12     144     12     144     40     544
                 17     289     10     100     10     100     37     489
                 12     144     11     121     11     121     34     386
                 13     169     13     169     12     144     38     482
                 14     196     9      81      10     100     33     377
Sub Total        72     1,054   55     615     55     609     182    2,278
Total Column     136    1,888   110    1,230   101    1,043   347    4,161

Illustrative Solution:
a. H0: There is no interaction between Study Habit (Factor A) and Memory
Retention (Factor B).
H1: There is an interaction between Study Habit (Factor A) and Memory
Retention (Factor B).
b. α = 0.05 (one-tailed test)

Computation: (Using the Formula)

c. $SS_{TOT} = \Sigma X_t^2 - \frac{(\Sigma X_t)^2}{N_t} = 4{,}161 - 4{,}013.63 = 147.37$
d. $SS_{ROWS} = \frac{165^2}{15} + \frac{182^2}{15} - \frac{347^2}{30} = 1{,}815 + 2{,}208.27 - 4{,}013.63 = 9.63$
e. $SS_{COLUMNS} = \frac{136^2}{10} + \frac{110^2}{10} + \frac{101^2}{10} - \frac{347^2}{30} = 1{,}849.6 + 1{,}210 + 1{,}020.1 - 4{,}013.63 = 66.07$
f. $SS_{ERROR} = 4{,}161 - \left(\frac{64^2}{5} + \frac{55^2}{5} + \frac{46^2}{5} + \frac{72^2}{5} + \frac{55^2}{5} + \frac{55^2}{5}\right) = 4{,}161 - 4{,}094.2 = 66.8$
g. $SS_{INTERACTION} = SS_{TOT} - SS_{ROWS} - SS_{COLUMNS} - SS_{ERROR} = 147.37 - 9.63 - 66.07 - 66.8 = 4.87$
h. Degrees of Freedom:
df_rows = (r – 1), (N – ab) = (2 – 1), (30 – 6) = 1, 24
df_columns = (c – 1), (N – ab) = (3 – 1), (30 – 6) = 2, 24
df_error = N – ab = 30 – 6 = 24
df_interaction = (r – 1)(c – 1), (N – ab) = (2 – 1)(3 – 1), (30 – 6) = 2, 24
i. $MS_{ROWS} = \frac{SS_{ROWS}}{a - 1} = \frac{9.63}{2 - 1} = 9.63$
$MS_{COLUMNS} = \frac{SS_{COLUMNS}}{b - 1} = \frac{66.07}{3 - 1} = 33.04$
$MS_{INTERACTION} = \frac{SS_{INTERACTION}}{(a - 1)(b - 1)} = \frac{4.87}{2} = 2.44$
$MS_{ERROR} = \frac{SS_{ERROR}}{N - ab} = \frac{66.8}{24} = 2.78$
j. $f_{ROWS} = \frac{MS_{ROWS}}{MS_{ERROR}} = \frac{9.63}{2.78} = 3.46$
$f_{COLUMNS} = \frac{MS_{COLUMNS}}{MS_{ERROR}} = \frac{33.04}{2.78} = 11.88$
$f_{INTERACTION} = \frac{MS_{INTERACTION}}{MS_{ERROR}} = \frac{2.44}{2.78} = 0.88$

k. Summary Table

Source of        Degrees of   Sum of    Mean      F-      F-Tabular   Decision    Interpretation
Variation        Freedom      Squares   Squares   Ratio   Value
Factor A         1            9.63      9.63      3.46    4.26        Accept H0   Not Significant
Factor B         2            66.07     33.04     11.88   3.40        Reject H0   Significant
Interaction AB   2            4.87      2.44      0.88    3.40        Accept H0   No interaction
Error            24           66.8      2.78
Total            29           147.37
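The two-way computations in steps c to j can be reproduced with a sketch that assumes a balanced design, meaning every cell has the same number of scores (5 here). The nested-dictionary layout and the function name `two_way_anova` are hypothetical choices. The sums of squares follow the formulas above, with $SS_{INTERACTION}$ obtained by subtraction.

```python
# Balanced design assumed: every cell has the same number of scores.
cells = {
    "With Music": {"High": [10, 15, 14, 13, 12],
                   "Moderate": [9, 12, 11, 10, 13],
                   "Low": [8, 10, 7, 10, 11]},
    "Without Music": {"High": [16, 17, 12, 13, 14],
                      "Moderate": [12, 10, 11, 13, 9],
                      "Low": [12, 10, 11, 12, 10]},
}

def two_way_anova(cells):
    """Return {source: (SS, F)} for rows, columns, interaction and error."""
    rows = list(cells)
    cols = list(cells[rows[0]])
    a, b = len(rows), len(cols)
    all_x = [x for r in rows for c in cols for x in cells[r][c]]
    n = len(all_x)
    sum_sq = sum(x * x for x in all_x)
    cf = sum(all_x) ** 2 / n  # correction factor (sum X)^2 / N
    row_sums = [sum(sum(cells[r][c]) for c in cols) for r in rows]
    col_sums = [sum(sum(cells[r][c]) for r in rows) for c in cols]
    ss_tot = sum_sq - cf
    ss_rows = sum(s * s for s in row_sums) / (n / a) - cf
    ss_cols = sum(s * s for s in col_sums) / (n / b) - cf
    cell_term = sum(sum(cells[r][c]) ** 2 / len(cells[r][c]) for r in rows for c in cols)
    ss_err = sum_sq - cell_term
    ss_int = ss_tot - ss_rows - ss_cols - ss_err
    ms_err = ss_err / (n - a * b)
    return {
        "Factor A (rows)": (ss_rows, ss_rows / (a - 1) / ms_err),
        "Factor B (columns)": (ss_cols, ss_cols / (b - 1) / ms_err),
        "Interaction AB": (ss_int, ss_int / ((a - 1) * (b - 1)) / ms_err),
        "Error": (ss_err, None),
    }

for source, (ss, f) in two_way_anova(cells).items():
    print(source, round(ss, 2), "" if f is None else round(f, 2))
```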

Computation: (Using MS EXCEL)


Open MS Excel and leave cell A1 blank. Type "With Music" in A2 and "Without Music" in A7, and
enter the High, Moderate and Low entries in columns B, C and D. Select Data > Data Analysis >
Anova: Two-Factor With Replication, highlight all numerical and non-numerical data, enter 5 as
the rows per sample, set Alpha to 0.05, select an output range, then press Enter.

Anova: Two-Factor With Replication


SUMMARY High Moderate Low Total
With Music
Count 5 5 5 15
Sum 64 55 46 165
Average 12.8 11 9.2 11
Variance 3.7 2.5 2.7 4.857143
Without Music
Count 5 5 5 15
Sum 72 55 55 182
Average 14.4 11 11 12.13333
Variance 4.3 2.5 1 4.980952
Total
Count 10 10 10
Sum 136 110 101
Average 13.6 11 10.1
Variance 4.266667 2.222222222 2.544444
ANOVA
Source of Variation SS df MS f P-value f crit
Sample 9.633333 1 9.633333 3.461078 0.075127 7.822871
Columns 66.06667 2 33.03333 11.86826 0.000261 5.613591
Interaction 4.866667 2 2.433333 0.874251 0.430042 5.613591
Within 66.8 24 2.783333
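Total 147.3667 29

Note: The F crit values in this output (7.82 and 5.61) are the critical values for α = 0.01. At
α = 0.05 the critical values are 4.26 and 3.40, which are the tabular values used in the summary table.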

l. Decision: For Factor A (study habit), the F-computed value of 3.46 is lower than
the tabular F-value of 4.26, so the null hypothesis is accepted. Hence,
there is no significant difference in mean memory retention between students who
study with music and those who study without music.
For Factor B (memory retention), the F-computed value of 11.88 exceeds the tabular
F-value of 3.40; thus, the null hypothesis is rejected. Therefore, a significant
difference exists among the high, moderate and low memory retention groups.
For the interaction, since the F-computed value (0.88) is lower than the F-tabulated
value (3.40), the null hypothesis is accepted. This means that there is no
interaction between study habit and memory retention. The result implies that
a student with high memory retention when studying with music
will also have high retention without music.

Exercise 12
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Test the corresponding hypotheses systematically. Use the Data Analysis tool of Microsoft Excel
to test the following conjectures.

1 - 10. Test the hypothesis that the average number of suicide victims in Iraq is 33
every month if a random sample of 9 suicide events tallied victims 29, 30, 32, 34, 35,
37, 39, & 42. Use a 0.05 level of significance and assume that the distribution of
victims is normal.

11 – 20. Test the conjecture that the performance of the experimental group (EG) is better than that of the control group (CG) using the programmed material in statistics at the 0.05 level of significance.

Students EG CG
1 26 22
2 17 19
3 20 14
4 19 18
5 19 11
6 24 16
7 23 15
8 25 20

21 - 30. In a study conducted by graduating students in the College of Arts and Management in DMMMSU-MLUC, the following data were recorded to determine the level of effectiveness of the computerized registration during enrollment, in minutes per transaction.

Before (min/trans.) After (min/trans.)
25 17
20 12
30 12
28 20
25 18
19 17
32 18
20 15

Assuming the populations to be normally distributed, is there sufficient evidence, at the 0.05 level of significance, to say that computerized registration influences the length of time to finish enrollment registration?

31 - 40. Test the conjecture that there is a significant difference in the level of effectiveness of the Performance Appraisal System as to Educational Attainment as perceived by the three groups.

Indicators Group 1 Group 2 Group 3
1 3.10 3.28 3.18
2 3.61 3.26 2.57
3 4.17 3.40 3.19
4 3.00 3.08 3.05
5 3.89 4.23 4.48
6 3.61 4.23 4.67

41 - 50. Apply the two – way ANOVA to the following data. Use a 0.05 level of significance.

Factor A – Subjects    Factor B – Classifications of Achievers
English                70 75 85
                       72 80 88
                       76 85 92
                       75 82 95
Math                   70 80 86
                       72 82 90
                       76 84 88
                       70 85 92

Testing Hypothesis Concerning Variance/s

In this section, we are concerned with testing hypotheses about a population variance or standard deviation, and about the equality (uniformity) of two population variances.

One Population Variance

The problem is to test the hypothesis H0 that the population variance $\sigma^2 = \sigma_0^2$ against one of the usual alternatives:
• $H_0: \sigma^2 = \sigma_0^2$
• $H_1: \sigma^2 < \sigma_0^2$, $\sigma^2 > \sigma_0^2$, or $\sigma^2 \neq \sigma_0^2$

Assuming that the distribution of the population being sampled is at least approximately normal, the chi – square value for testing $\sigma^2 = \sigma_0^2$ is given by:

$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}, \qquad v = n - 1$$

Where:
n = sample size
$s^2$ = sample variance
$\sigma_0^2$ = specified value of the population variance

Establishing the Critical Region

• If $H_1: \sigma^2 > \sigma_0^2$, then the critical region is $\chi^2 > \chi^2_{\alpha}$, which lies on the right tail of the distribution.
• If $H_1: \sigma^2 < \sigma_0^2$, then the critical region is $\chi^2 < \chi^2_{1-\alpha}$, which lies on the left tail of the distribution.
• If $H_1: \sigma^2 \neq \sigma_0^2$, then the critical region is $\chi^2 < \chi^2_{1-\alpha/2}$ and $\chi^2 > \chi^2_{\alpha/2}$, which lies on each tail of the distribution.

Example

1. A soft-drink dispensing machine is said to be out of control if the variance of the contents exceeds 1.15 deciliters. If a random sample of 25 drinks from this machine has a variance of 2.03 deciliters, does this indicate at the 0.05 level of significance that the machine is out of control? Assume that the contents are approximately normally distributed.
Illustrative Testing:
a) H0: The variance of the contents of the soft-drink dispensing machine does not exceed 1.15 deciliters. $\sigma^2 = 1.15$
b) H1: The variance of the contents of the soft-drink dispensing machine exceeds 1.15 deciliters. $\sigma^2 > 1.15$
c) α = 0.05 (one – tailed test)
d) Critical Region: $\chi^2 > 36.415$, where v = 25 – 1 = 24
e) Computation: $s^2 = 2.03$; n = 25
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(24)(2.03)}{1.15} = 42.37$$
f) Decision: Reject H0 and accept H1, since the computed value $\chi^2 = 42.37$ exceeds the critical value 36.415, and conclude that the machine is out of control.
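The one-sample chi-square statistic can be sketched in a few lines of Python. The critical value 36.415 is the tabled $\chi^2_{0.05}$ for v = 24, typed in rather than computed, so the sketch needs only the standard library.

```python
# One-sample chi-square test for a variance (soft-drink machine example).
def chi_square_variance(n: int, s2: float, sigma0_2: float) -> float:
    """Return chi^2 = (n - 1) * s^2 / sigma0^2, which has v = n - 1 degrees of freedom."""
    return (n - 1) * s2 / sigma0_2

chi2 = chi_square_variance(n=25, s2=2.03, sigma0_2=1.15)
critical = 36.415  # chi^2_{0.05} with v = 24, read from a chi-square table
print(f"chi^2 = {chi2:.2f}")
print("reject H0" if chi2 > critical else "accept H0")
```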

Two Population Variances

In this case, the objective is to test the equality or uniformity of the two variances $\sigma_1^2$ and $\sigma_2^2$ of two populations. We shall test the null hypothesis against one of the usual alternatives:
• $H_0: \sigma_1^2 = \sigma_2^2$
• $H_1: \sigma_1^2 < \sigma_2^2$, $\sigma_1^2 > \sigma_2^2$, or $\sigma_1^2 \neq \sigma_2^2$

For independent random samples of sizes $n_1$ and $n_2$, respectively, from the two populations, the f – test is given by:

$$f = \frac{s_1^2}{s_2^2}, \qquad v_1 = n_1 - 1, \quad v_2 = n_2 - 1$$

Where:
$s_1^2$ = variance of the sample drawn from the first population
$s_2^2$ = variance of the sample drawn from the second population

Establishing the Critical Region

• If $H_1: \sigma_1^2 > \sigma_2^2$, then the critical region is $f > f_{\alpha}(v_1, v_2)$, which lies on the right tail of the distribution.
• If $H_1: \sigma_1^2 < \sigma_2^2$, then the critical region is $f < f_{1-\alpha}(v_1, v_2)$, which lies on the left tail of the distribution.
• If $H_1: \sigma_1^2 \neq \sigma_2^2$, then the critical region is $f < f_{1-\alpha/2}(v_1, v_2)$ and $f > f_{\alpha/2}(v_1, v_2)$, which lies on each tail of the distribution.

Example

1. A class in mathematics is taught to 13 students by the conventional classroom procedure. A second group of 11 students was given the same course by means of programmed materials. At the end of the semester, the same examination was given to each group. The 13 students meeting in the classroom made an average of 85 with a standard deviation of 4, while the 11 students using programmed materials had a standard deviation of 5. Test the hypothesis that the unknown variances are equal at the 0.10 level of significance.
Illustrative Testing:
a) H0: The consistency of scores using the conventional classroom procedure does not differ from that of scores using the programmed materials. $\sigma_1^2 = \sigma_2^2$
b) H1: The consistency of scores using the conventional classroom procedure differs from that of scores using the programmed materials. $\sigma_1^2 \neq \sigma_2^2$
c) α = 0.10 (two – tailed test)
d) Critical Regions: $f < 0.36$ and $f > 2.91$, where $v_1 = 13 - 1 = 12$; $v_2 = 11 - 1 = 10$

$$f_{\alpha/2}(v_1, v_2) = f_{0.05}(12, 10) = 2.91$$

Theorem:
$$f_{1-\alpha}(v_1, v_2) = \frac{1}{f_{\alpha}(v_2, v_1)}$$
Therefore:
$$f_{0.95}(12, 10) = \frac{1}{f_{0.05}(10, 12)} = \frac{1}{2.75} = 0.36$$

e) Computation: $s_1^2 = 16$; $s_2^2 = 25$
$$f = \frac{s_1^2}{s_2^2} = \frac{16}{25} = 0.64$$
f.) Decision: Accept H0, since f = 0.64 lies between 0.36 and 2.91, and conclude that the scores taken from the conventional classroom procedure do not differ in consistency from the scores taken from the programmed materials.
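The F ratio and the reciprocal rule for the lower critical value can be checked in Python. The two table values, $f_{0.05}(12, 10) = 2.91$ and $f_{0.05}(10, 12) = 2.75$, are typed in from an F table. They are not computed here.

```python
# Two-sample F test for equality of variances (conventional vs. programmed class).
def f_statistic(s1_2: float, s2_2: float) -> float:
    """f = s1^2 / s2^2, with v1 = n1 - 1 and v2 = n2 - 1 degrees of freedom."""
    return s1_2 / s2_2

s1, s2 = 4, 5                  # sample standard deviations from the example
f = f_statistic(s1 ** 2, s2 ** 2)

# Critical values (alpha/2 = 0.05 in each tail), read from an F table:
upper = 2.91                   # f_0.05(12, 10)
lower = 1 / 2.75               # f_0.95(12, 10) = 1 / f_0.05(10, 12)
print(f"f = {f:.2f}; critical region: f < {lower:.2f} or f > {upper:.2f}")
print("reject H0" if (f < lower or f > upper) else "accept H0")
```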

Exercise 13
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Test the following hypothesis systematically.

1 - 10. A researcher would like to determine whether starting to smoke at an early age shortens the life span of a person. Past experience indicates that a person who started smoking early lives a mean of 55 years with a variance of 3.5 years. Suppose a sample of 27 subjects has been studied and the variance of their life span is found to be 2.9 years. Does this indicate that early smoking reduces the variance of the life span of a person? Use a 0.05 level of significance.
Illustrative Testing:
a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computations:

f. Decision:

11 - 20. A study was conducted to compare the variability of the age of unexpected pregnancy cases among students in DMMMSU – MLUC and SLC in San Fernando City, La Union. Records show that in DMMMSU – MLUC the variance of the age of 10 unexpectedly pregnant students is 2.33 years, and 1.2 years for 8 students in SLC. Does this indicate that the age of unexpected pregnancy cases in SLC is more variable, or that more students from SLC than from DMMMSU-MLUC become pregnant at an early age? Use a 0.05 level of significance.
Illustrative Testing:
a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computations:

f. Decision:

Testing Hypothesis Concerning Proportions

Tests of hypotheses concerning proportions are required in many areas. Politicians, manufacturing firms, managers, and even gamblers depend on knowledge of the proportion of outcomes that they consider favorable.

One Population Proportion

We shall consider the problem of testing the hypothesis that the proportion of successes in a binomial experiment equals some specified value.
The problem is to test the hypothesis H0 that the population proportion $p = p_0$ against one of the usual alternatives:
• $H_0: p = p_0$
• $H_1: p < p_0$, $p > p_0$, or $p \neq p_0$
Assuming that the sample is large enough for the binomial distribution to be approximately normal, the z - value for testing $p = p_0$ is given by:
$$z = \frac{x - np_0}{\sqrt{np_0q_0}}$$
Where:
x = number of successes in the sample
n = sample size
$p_0$ = hypothesized proportion of successes
$q_0 = 1 - p_0$ = hypothesized proportion of failures

Establishing the Critical Region

• If $H_1: p > p_0$, then the critical region is $z > z_{\alpha}$, which lies on the right tail of the distribution.
• If $H_1: p < p_0$, then the critical region is $z < -z_{\alpha}$, which lies on the left tail of the distribution.
• If $H_1: p \neq p_0$, then the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$, which lies on each tail of the distribution.



Example

1. In a survey of drug users conducted by PDEA, 18 out of 423 (4.26%) were found to be
HIV positive. Can we conclude that fewer than 5% of the drug users in the sampled
population are HIV positive? Use 0.05 level of significance.
Illustrative Testing:
a) H0: The proportion of the sampled population who are HIV positive is equal to 0.05 or 5%. p = 0.05
b) H1: The proportion of the sampled population who are HIV positive is less than 0.05 or 5%. p < 0.05
c) α = 0.05 (one – tailed test)
d) Critical Region: z < -1.645
e) Computation: x = 18; n = 423; $p_0 = 0.05$; $q_0 = 0.95$
$$z = \frac{x - np_0}{\sqrt{np_0q_0}} = \frac{18 - 423(0.05)}{\sqrt{423(0.05)(0.95)}} = \frac{-3.15}{4.48} = -0.70$$
f.) Decision: Accept H0 and reject H1, since z = -0.70 is not less than -1.645, and conclude that there is no sufficient evidence that fewer than 5% of the drug users in the sampled population are HIV positive.
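A short Python sketch of the one-proportion z test uses the hypothesized proportion $p_0 = 0.05$ from H0, not the sample's 4.26%. In place of a z table it uses the standard-library `statistics.NormalDist`, which supplies the left-tail critical value and a one-tailed P-value. The P-value is an extra illustration and is not part of the original procedure.

```python
from statistics import NormalDist

def one_proportion_z(x: int, n: int, p0: float) -> float:
    """z = (x - n*p0) / sqrt(n*p0*q0): normal approximation to the binomial."""
    q0 = 1 - p0
    return (x - n * p0) / (n * p0 * q0) ** 0.5

z = one_proportion_z(x=18, n=423, p0=0.05)
z_crit = NormalDist().inv_cdf(0.05)   # left-tail critical value, about -1.645
p_value = NormalDist().cdf(z)         # one-tailed P-value for H1: p < 0.05
print(f"z = {z:.2f}, critical z = {z_crit:.3f}, P-value = {p_value:.3f}")
print("reject H0" if z < z_crit else "accept H0")
```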

Two Population Proportions

The problem is to test the hypothesis H0 that the population proportions $p_1 = p_2$ against one of the usual alternatives:
• $H_0: p_1 = p_2$
• $H_1: p_1 < p_2$, $p_1 > p_2$, or $p_1 \neq p_2$
Assuming that both samples are large enough for the normal approximation to the binomial to apply, the z - value for testing $p_1 = p_2$ is given by:
$$z = \frac{P_1 - P_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where $P_1 = x_1/n_1$ and $P_2 = x_2/n_2$ are the sample proportions, and
$$p = \frac{x_1 + x_2}{n_1 + n_2}, \qquad q = 1 - p \qquad \text{(pooled estimate of the proportion)}$$
Establishing the Critical Region

• If $H_1: p_1 > p_2$ with corresponding level of significance α, then the critical region is $z > z_{\alpha}$, which lies on the right tail of the distribution.
• If $H_1: p_1 < p_2$ with corresponding level of significance α, then the critical region is $z < -z_{\alpha}$, which lies on the left tail of the distribution.
• If $H_1: p_1 \neq p_2$ with corresponding level of significance α, then the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$, which lies on each tail of the distribution.

Example

1. A poll is taken among the residents of San Fernando City, La Union to determine and compare the level of acceptability of the proposal on Charter Change in the two barangays being sampled. In Barangay Catbangen, 80 out of 150 residents favor the proposal, while 90 out of 200 favor it in Brgy. Santiago. Would you agree that the proportion of votes in Brgy. Santiago favoring the Charter Change is higher than the proportion in Brgy. Catbangen? Use a 0.05 level of significance to test the hypothesis.
Illustrative Testing:
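Let $p_1$ and $p_2$ be the true proportions favoring the proposal in Brgy. Santiago and Brgy. Catbangen, respectively.
a) H0: The proportions of the votes taken from the 2 barangays being sampled do not differ from one another. $p_1 = p_2$
b) H1: The proportion of the vote taken from Brgy. Santiago is higher than the proportion of the vote taken from Brgy. Catbangen. $p_1 > p_2$
c) α = 0.05 (one – tailed test)
d) Critical Region: z > 1.645
e) Computation: $P_1 = 90/200 = 0.45$; $P_2 = 80/150 = 0.533$; $n_1 = 200$; $n_2 = 150$
$$p = \frac{90 + 80}{200 + 150} = 0.486, \qquad q = 0.514$$
$$z = \frac{P_1 - P_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.45 - 0.533}{\sqrt{(0.486)(0.514)\left(\frac{1}{200} + \frac{1}{150}\right)}} = -1.54$$
f.) Decision: Accept H0 and reject H1, since z = -1.54 does not exceed 1.645, and conclude that there is no sufficient evidence that the proportion of votes favoring the Charter Change is higher in Brgy. Santiago than in Brgy. Catbangen. In fact, the sample proportion in Brgy. Santiago is the lower one.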
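The pooled two-proportion z statistic can be sketched in Python as follows. Here sample 1 is taken to be Brgy. Santiago (90 of 200) and sample 2 Brgy. Catbangen (80 of 150), so the question "is Santiago higher?" becomes the right-tailed H1: $p_1 > p_2$ with critical value 1.645. The labelling is a choice made for this sketch.

```python
from math import sqrt

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic: (P1 - P2) / sqrt(p*q*(1/n1 + 1/n2))."""
    P1, P2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)       # pooled estimate of the common proportion
    q = 1 - p
    return (P1 - P2) / sqrt(p * q * (1 / n1 + 1 / n2))

# Sample 1 = Brgy. Santiago, sample 2 = Brgy. Catbangen, so H1 is p1 > p2.
z = two_proportion_z(x1=90, n1=200, x2=80, n2=150)
print(f"z = {z:.2f}")
print("reject H0" if z > 1.645 else "accept H0")
```

The negative z shows that the Santiago sample proportion is actually below Catbangen's. A right-tailed test therefore cannot reject H0, whatever the exact critical value.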

Testing Hypothesis Concerning Several Proportions

The chi – squared test for homogeneity can also be used to extend the test for a difference between two proportions to a test for differences among k proportions. Hence, we are interested in testing the null hypothesis that:
• $H_0: p_1 = p_2 = \cdots = p_k$
• $H_1$: $p_1, p_2, \ldots, p_k$ are not all equal.
To perform this test, the procedure is identical to the test for homogeneity or the test for independence:
• To solve for the expected frequency of each cell:
Expected frequency = (column total × row total) / grand total
• Chi – squared statistic:
$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$
Where:
$o_i$ = observed frequencies
$e_i$ = expected frequencies

Establishing the Critical Region

• If $p_1, p_2, \ldots, p_k$ are not all equal, then the critical region is $\chi^2 > \chi^2_{\alpha}$ with $v = k - 1$ degrees of freedom, which lies on the right tail of the distribution.

Example

1. In a shop, a set of data was collected to determine whether or not the proportion of
defectives produced by workers was the same for the day, evening, or midnight shift
worked. The following data were collected.
                 SHIFT
                 Day   Evening   Midnight
Defectives        45      55        70
Nondefectives    905     890       870

Use a 0.025 level of significance to determine if the proportion of defectives is the same for all three shifts.
Illustrative Testing:
Let p1, p2, and p3 represent the true proportions of defectives for the day,
evening, and night shifts, respectively.
a) H0: There is no significant difference in the true proportions of defectives for the day, evening, and night shifts. $p_1 = p_2 = p_3$
b) H1: There is a significant difference in the true proportions of defectives for the day, evening, and night shifts. $p_1, p_2, p_3$ are not all equal.
c) α = 0.025 (one – tailed test)

d) Critical Region: $\chi^2 > 7.378$ for $v = k - 1 = 3 - 1 = 2$

e) Computation:

1. $e_1 = \frac{(950)(170)}{2835} = 56.97$    4. $e_4 = \frac{(950)(2665)}{2835} = 893.03$
2. $e_2 = \frac{(945)(170)}{2835} = 56.67$    5. $e_5 = \frac{(945)(2665)}{2835} = 888.33$
3. $e_3 = \frac{(940)(170)}{2835} = 56.37$    6. $e_6 = \frac{(940)(2665)}{2835} = 883.63$

                 SHIFT
                 Day             Evening         Midnight        Total
Defectives       45 (56.97)      55 (56.67)      70 (56.37)      170
Non-defectives   905 (893.03)    890 (888.33)    870 (883.63)    2665
Total            950             945             940             2835

$$\chi^2 = \frac{(45 - 56.97)^2}{56.97} + \frac{(55 - 56.67)^2}{56.67} + \frac{(70 - 56.37)^2}{56.37} + \frac{(905 - 893.03)^2}{893.03} + \frac{(890 - 888.33)^2}{888.33} + \frac{(870 - 883.63)^2}{883.63} = 6.23$$
f.) Decision: Accept H0 and reject H1, since $\chi^2 = 6.23$ is less than the critical value 7.378. However, the computed value is close to the critical value and would exceed 5.991, the critical value at the 0.05 level. It would therefore be dangerous to conclude firmly that the proportions of defectives produced by the workers are the same for all shifts.
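Because this is a 2 × k table, each expected count of defectives is the column total times the pooled proportion 170/2835. This equals (column total × row total) / grand total. The Python sketch below computes the statistic from unrounded expected counts and gives about 6.23. The decision at α = 0.025 is unchanged. The critical value 7.378 is the tabled $\chi^2_{0.025}$ for v = 2 and is typed in rather than computed.

```python
# Chi-squared test that k proportions are equal (defectives by shift).
defectives = [45, 55, 70]             # day, evening, midnight
totals = [950, 945, 940]              # items produced per shift (column totals)

p_pooled = sum(defectives) / sum(totals)   # 170 / 2835
chi2 = 0.0
for d, n in zip(defectives, totals):
    e_def = n * p_pooled              # = column total * row total / grand total
    e_non = n * (1 - p_pooled)        # expected non-defectives in the same column
    chi2 += (d - e_def) ** 2 / e_def + ((n - d) - e_non) ** 2 / e_non

df = len(totals) - 1                  # v = k - 1 = 2
critical = 7.378                      # chi^2_{0.025} with v = 2, from a chi-square table
print(f"chi^2 = {chi2:.2f} with v = {df}")
print("reject H0" if chi2 > critical else "accept H0")
```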

Exercise 14
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Test the following hypothesis systematically.


1 - 10. In DMMMSU – MLUC, it is estimated that at most 70% of the students ride jitneys and tricycles to class. Do we have reason to doubt this claim if, in a random sample of 150 college students, 100 are found to ride jitneys and tricycles to class? Use a 0.01 level of significance.
Illustrative Testing:

a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

11 - 20. A study was conducted to estimate whether there is a significant difference between the proportions of parents in the public and private schools in San Fernando City, La Union who favor the proposal of holding classes in September every school year. In random samples, 150 out of 220 parents in the public schools and only 75 out of 220 parents in the private schools favor the proposal. Does this indicate that there are more parents from the public schools favoring the proposal than parents in the private schools? Use a 0.05 level of significance.

Illustrative Testing:

a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

21 - 30. A study was made to determine whether there is a significant difference between the proportions of parents in the provinces of Pangasinan, La Union, Ilocos Sur, Ilocos Norte, and Abra who favor placing sex education in the secondary level. The responses of 100 parents selected at random in each of these provinces are recorded in the following table:

                                 Provinces
Preference   Pangasinan   La Union   Ilocos Sur   Ilocos Norte   Abra   Total
Yes              65          66          53           38          45     267
No               35          34          47           62          55     233
Total           100         100         100          100         100     500

Can we conclude that the proportions of parents who favor placing sex
education in the secondary level are not the same for these 5 provinces? Use a 0.05
level of significance.

Illustrative Testing:
a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computations:

f. Decision:

Testing Hypothesis Concerning Correlation Coefficient

After a correlation coefficient has been computed, it is usually desired to ascertain whether or not the computed coefficient is merely a chance deviation from a population in which the true correlation is zero (ρ = 0).

Test of Significance for Pearson’s r (ρ = 0)

When testing the null hypothesis that the population correlation coefficient is zero, the following t – test should be used:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad v = n - 2$$
Where:
r = Pearson’s r correlation coefficient
n = number of respondents (number of paired observations)
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \qquad \text{(Pearson } r\text{)}$$
Establishing the Critical Region

• If $H_1: \rho > 0$, then the critical region is $t > t_{\alpha}$, which lies on the right tail of the distribution.
• If $H_1: \rho < 0$, then the critical region is $t < -t_{\alpha}$, which lies on the left tail of the distribution.
• If $H_1: \rho \neq 0$, then the critical region is $t < -t_{\alpha/2}$ and $t > t_{\alpha/2}$, which lies on each tail of the distribution.


Example

1. A team of social psychologists has developed a scale that purports to measure social
isolation. The scores made on the scale by 15 respondents are correlated with their
scores on an index revealing the degrees of prejudice felt towards minority groups. They
obtain a Pearson’s r of 0.60. May they conclude that the obtained correlation is not
likely to have been drawn from a population in which the true correlation is zero? Use a
0.05 level of significance.
Illustrative Testing:
a) H0: The population correlation coefficient from which this sample was drawn is equal to 0. ρ = 0
b) H1: The population correlation coefficient from which this sample was drawn is not equal to 0. ρ ≠ 0
c) α = 0.05 (two – tailed test)
d) Critical Region: t > 2.16 and t < -2.16; v = 15 – 2 = 13
e) Computation: r = 0.60; n = 15
$$t = \frac{0.60\sqrt{15 - 2}}{\sqrt{1 - (0.60)^2}} = 2.70$$
f.) Decision: Reject H0 and accept H1, since t = 2.70 exceeds 2.16, and conclude that there is a significant relationship between the social isolation scores of the respondents and the degree of prejudice felt toward minority groups.
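The t statistic for testing ρ = 0 is a one-line formula, sketched here in Python. The critical value 2.16 is the tabled $t_{0.025}$ for v = 13, typed in rather than computed.

```python
from math import sqrt

def t_for_r(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with v = n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t = t_for_r(r=0.60, n=15)
t_crit = 2.16          # t_{0.025} with v = 13, read from a t table
print(f"t = {t:.2f}")
print("reject H0" if abs(t) > t_crit else "accept H0")
```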

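Test of Significance for Pearson’s r (ρ ≠ 0)

When testing the null hypothesis that the population correlation coefficient equals some specified nonzero value $\rho_0$, the following z – test, based on Fisher’s z transformation, should be used:
$$z = \frac{z_r - Z_\rho}{1/\sqrt{n - 3}}$$
Where:
$z_r = \frac{1}{2}\ln\frac{1 + r}{1 - r}$ = the transformed value of the sample r
$Z_\rho$ = the transformed value of the hypothesized population correlation $\rho_0$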

Establishing the Critical Region

• If $H_1: \rho > \rho_0$ with corresponding level of significance α, then the critical region is $z > z_{\alpha}$, which lies on the right tail of the distribution.
• If $H_1: \rho < \rho_0$ with corresponding level of significance α, then the critical region is $z < -z_{\alpha}$, which lies on the left tail of the distribution.
• If $H_1: \rho \neq \rho_0$ with corresponding level of significance α, then the critical region is $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$, which lies on each tail of the distribution.

Example

1. Using the same example, let us test H0 that the population correlation from which the
sample was drawn is 0.25.
Illustrative Solution:
a) H0: The population correlation coefficient from which this sample was drawn equals 0.25. ρ = 0.25
b) H1: The population correlation coefficient from which this sample was drawn does not equal 0.25. ρ ≠ 0.25
c) α = 0.05 (two – tailed test)
d) Critical Region: z > 1.96 and z < -1.96
e) Computation: r = 0.60; n = 15; $\rho_0 = 0.25$
$z_r$ = 0.693 (from the table, where r = 0.60)
$Z_\rho$ = 0.256 (similarly from the table, where ρ = 0.25)
$$z = \frac{0.693 - 0.256}{1/\sqrt{15 - 3}} = 1.51$$
f.) Decision: Accept H0 and reject H1, since z = 1.51 lies between -1.96 and 1.96, and conclude that it is quite possible that the sample was drawn from a population in which the true correlation is 0.25.
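Fisher's transformation is the inverse hyperbolic tangent, $\frac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{artanh} r$. Python's `math.atanh` can therefore stand in for the transformation table in this sketch. Unrounded values give z ≈ 1.52. Rounding $z_r$ and $Z_\rho$ to table precision gives the slightly smaller 1.51. The ±1.96 bounds are the usual two-tailed α = 0.05 normal critical values.

```python
from math import atanh, sqrt

def fisher_z_test(r: float, rho0: float, n: int) -> float:
    """z = (z_r - Z_rho) / (1 / sqrt(n - 3)), where z_r = atanh(r) is Fisher's transformation."""
    return (atanh(r) - atanh(rho0)) * sqrt(n - 3)

z = fisher_z_test(r=0.60, rho0=0.25, n=15)
print(f"z_r = {atanh(0.60):.3f}, Z_rho = {atanh(0.25):.3f}, z = {z:.2f}")
print("reject H0" if abs(z) > 1.96 else "accept H0")
```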

Test of Significance of rs (Spearman’s Correlation)

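In testing Spearman’s rank correlation coefficient $r_s$, we merely compare it with the tabled critical value of $r_s$. The coefficient is computed from the ranked variables:
$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
where D is the difference between each pair of ranks and n is the number of pairs.

Establishing the Critical Region

• If $r_s < r_{\alpha}$, where $r_{\alpha}$ is the tabled critical value, we accept H0.
• If $r_s \geq r_{\alpha}$, we reject H0.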

Example

1. Test the hypothesis that there is a strong relation between the social class backgrounds of husbands and wives prior to marriage. Use a 0.01 level of significance.
Wife’s Rank (x) Husband’s Rank (y) D D²
1 4 -3 9
2 2 0 0
3 9 -6 36
4 1 3 9
5 7 -2 4
6 10 -4 16
7 8 -1 1
8 13 -5 25
9 5 4 16
10 3 7 49
11 11 0 0
12 6 6 36
13 12 1 1
14 15 -1 1
15 14 1 1
ΣD = 0   ΣD² = 204
Illustrative Testing:
a) H0: The population value of the Spearman correlation coefficient is 0, and there is no strong relation between the social classes of husbands and wives prior to marriage. $\rho_s = 0$
b) H1: The true population correlation coefficient is greater than zero. $\rho_s > 0$
c) α = 0.01 (one – tailed test)
d) Critical Region: $r_s > 0.623$ (tabled critical value for n = 15)
e) Computation: n = 15; ΣD² = 204
$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(204)}{15(224)} = 0.64$$
f.) Decision: Reject H0 and accept H1, since $r_s = 0.64$ exceeds the critical value 0.623, and conclude that there is a significant positive relation between the social classes of husbands and wives prior to marriage.
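The Spearman computation can be reproduced from the two rank columns of the table with a short Python sketch. It applies the no-ties formula used above. The critical value 0.623 is the tabled value for n = 15 at α = 0.01 (one-tailed), typed in rather than computed.

```python
# Spearman rank correlation from the wife/husband social-class ranks in the table.
wife = list(range(1, 16))
husband = [4, 2, 9, 1, 7, 10, 8, 13, 5, 3, 11, 6, 12, 15, 14]

def spearman_rs(x, y):
    """r_s = 1 - 6*sum(D^2) / (n(n^2 - 1)); valid when there are no tied ranks."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rs = spearman_rs(wife, husband)
critical = 0.623       # tabled r_s for n = 15, alpha = 0.01 (one-tailed)
print(f"r_s = {rs:.2f}")
print("reject H0" if rs > critical else "accept H0")
```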

Exercise 15
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Test the following hypothesis systematically.


1 - 15. The following data represent the algebra grades for a random sample of 10
freshmen in DMMMSU-MLUC along with their scores on an intelligence test
administered while they were still seniors in high school.

Student IQ Score (x) Algebra (y)
1 65 82
2 50 70
3 55 73
4 63 81
5 50 85
6 70 80
7 65 77
8 68 76
9 55 72
10 75 71
a. Compute and interpret the sample correlation coefficient.
b. Test the hypothesis that ρ = 0.5 against the alternative that ρ > 0.5. Use a 0.05 level of significance.
Illustrative Testing:
a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

16 - 30. A consumer panel tests 9 brands of microwave ovens for overall quality. The ranks assigned by the panel and the suggested retail prices are as follows:

Manufacturer Panel Rating Suggested Price (in Php)
A 6 4800
B 9 3950
C 2 5750
D 8 5500
E 5 5100
F 1 5450
G 7 4000
H 4 4650
I 3 4200
Is there a significant relationship between the quality and the price of a
microwave oven? Use a 0.05 level of significance.
Illustrative Testing:
a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

Chapter VIII. The Chi – Square Analysis

Goodness – of – Fit Test

In this section, we are concerned with a test to determine if a population has a specified theoretical distribution. The test is based on how good a fit we have between the frequency of occurrence of observations in an observed sample and the expected frequencies obtained from the hypothesized distribution.
A goodness-of-fit test between observed and expected frequencies is based on the quantity:

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Where:
$\chi^2$ = value of a random variable whose sampling distribution is approximated very closely by the chi-squared distribution with v = k – 1 degrees of freedom
$o_i$ = observed frequencies
$e_i$ = expected frequencies

Analysis

• If the observed frequencies are close to the corresponding expected frequencies, the $\chi^2$-value will be small, indicating a good fit.
• If the observed frequencies differ considerably from the expected frequencies, the $\chi^2$-value will be large, and the fit is poor.
• A good fit leads to the acceptance of the null hypothesis.
• A poor fit leads to the rejection of the null hypothesis.

Example

1. A die is tossed 180 times with the following results:


X 1 2 3 4 5 6
f 28 36 36 30 27 23

Is this a balanced die? Use a 0.05 level of significance.

Illustrative Solution:
a) H0: The die is balanced; each face has probability 1/6 of turning up.
b) H1: The die is not balanced.
c) α = 0.05 (one – tailed test)

d) Critical Region: $\chi^2 > 11.070$ for $v = k - 1 = 6 - 1 = 5$

e) Computation:
Observed & Expected Frequencies of 180 Tosses of a Die
X 1 2 3 4 5 6
Observed 28 36 36 30 27 23
Expected 30 30 30 30 30 30

$$\chi^2 = \frac{(28 - 30)^2}{30} + \frac{(36 - 30)^2}{30} + \frac{(36 - 30)^2}{30} + \frac{(30 - 30)^2}{30} + \frac{(27 - 30)^2}{30} + \frac{(23 - 30)^2}{30} = 4.47$$
f.) Decision: Accept H0 and reject H1, since $\chi^2 = 4.47$ is less than the critical value 11.070, and conclude that there is no sufficient evidence that the die is unbalanced; the die may be regarded as balanced.
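The goodness-of-fit statistic for the die can be reproduced with a short Python sketch. Under H0 each face is expected 180/6 = 30 times. The critical value 11.070 is the tabled $\chi^2_{0.05}$ for v = 5, typed in rather than computed.

```python
# Chi-squared goodness-of-fit test for a fair die (180 tosses).
observed = [28, 36, 36, 30, 27, 23]
n = sum(observed)                      # 180 tosses
expected = [n / 6] * 6                 # 30 per face under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                 # v = k - 1 = 5
critical = 11.070                      # chi^2_{0.05} with v = 5, from a chi-square table
print(f"chi^2 = {chi2:.2f} with v = {df}")
print("reject H0" if chi2 > critical else "accept H0")
```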

Test for Independence (Categorical Data)

In this section, the hypothesis that two variables of classification are independent is tested using the same chi-squared procedure discussed in goodness-of-fit. A contingency table with r rows and c columns is used in testing independence. The only difference is the manner of assigning the expected frequencies, which are recorded beside every observed frequency.
To test the independence of two variables of classification, we use:

$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$

Where:
$\chi^2$ = value of a random variable whose sampling distribution is approximated very closely by the chi-squared distribution with v = (r – 1)(c – 1) degrees of freedom
$o_i$ = observed frequencies
$e_i$ = expected frequencies
To compute the expected frequencies:
Expected frequency = (column total × row total) / grand total

Analysis

• A large chi-squared value leads to the rejection of the null hypothesis; therefore, the classifications of variables are dependent.
• A small chi-squared value leads to the acceptance of the null hypothesis; therefore, the classifications of variables are independent.

Example

1. In an experiment to study the dependence of hypertension on smoking habits, the following data were taken on 150 individuals (expected frequencies are shown in parentheses).
                  Non-smokers   Moderate Smokers   Heavy Smokers   Total
Hypertension      21 (26.07)    28 (27.2)          36 (31.73)      85
No hypertension   25 (19.93)    20 (20.8)          20 (24.27)      65
Total             46            48                 56              150
Test the hypothesis that the presence or absence of hypertension is independent
of smoking habits. Use a 0.05 level of significance.
Illustrative Testing:
a) H0: The presence or absence of hypertension is independent of smoking habits.
b) H1: The presence or absence of hypertension is dependent on smoking habits.
c) α = 0.05 (one – tailed test)

d) Critical Region: $\chi^2 > 5.991$ for $v = (2 - 1)(3 - 1) = 2$

e) Computation:
1. $e_1 = \frac{(46)(85)}{150} = 26.07$    4. $e_4 = \frac{(46)(65)}{150} = 19.93$
2. $e_2 = \frac{(48)(85)}{150} = 27.2$    5. $e_5 = \frac{(48)(65)}{150} = 20.8$
3. $e_3 = \frac{(56)(85)}{150} = 31.73$    6. $e_6 = \frac{(56)(65)}{150} = 24.27$

$$\chi^2 = \frac{(21 - 26.07)^2}{26.07} + \frac{(28 - 27.2)^2}{27.2} + \frac{(36 - 31.73)^2}{31.73} + \frac{(25 - 19.93)^2}{19.93} + \frac{(20 - 20.8)^2}{20.8} + \frac{(20 - 24.27)^2}{24.27} = 3.65$$
f.) Decision: Accept H0 and reject H1 and conclude that the presence or absence
of hypertension is independent of smoking habits.

Test for Homogeneity

In this section, we test the hypothesis that the population proportions within each row are the same. We are basically interested in determining whether three or more categories or classifications of variables are homogeneous.
To test the homogeneity of three or more classifications of variables, we use:

$$\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$

To compute the expected frequencies:
Expected frequency = (column total × row total) / grand total

Analysis

• A large chi-squared value leads to the rejection of the null hypothesis; therefore, the classifications of variables are not homogeneous (heterogeneous).
• A small chi-squared value leads to the acceptance of the null hypothesis; therefore, the classifications of variables are homogeneous.

Example

1. A random sample of 200 married men, all retired, was classified according to education and number of children.
Number of Children
Education 0-3 4-7 Over 7 Total
Elementary 14 37 32 83
Secondary 19 42 17 78
College 12 17 10 39
Total 45 96 59 200

Test the hypothesis, at the 0.05 level of significance that the size of a family is
independent of the level of education attained by the father.
Illustrative Solution:
a) H0: The size of a family is independent of the level of education attained by the father.
b) H1: The size of a family is dependent on the level of education attained by the father.
c) α = 0.05 (one – tailed test)

d) Critical Region: $\chi^2 > 9.488$ for $v = (3 - 1)(3 - 1) = 4$

e) Computation:

1. $e_1 = \frac{(45)(83)}{200} = 18.675$    4. $e_4 = \frac{(45)(78)}{200} = 17.55$    7. $e_7 = \frac{(45)(39)}{200} = 8.775$
2. $e_2 = \frac{(96)(83)}{200} = 39.84$    5. $e_5 = \frac{(96)(78)}{200} = 37.44$    8. $e_8 = \frac{(96)(39)}{200} = 18.72$
3. $e_3 = \frac{(59)(83)}{200} = 24.485$    6. $e_6 = \frac{(59)(78)}{200} = 23.01$    9. $e_9 = \frac{(59)(39)}{200} = 11.505$

$$\chi^2 = \frac{(14 - 18.675)^2}{18.675} + \frac{(37 - 39.84)^2}{39.84} + \frac{(32 - 24.485)^2}{24.485} + \frac{(19 - 17.55)^2}{17.55} + \frac{(42 - 37.44)^2}{37.44} + \frac{(17 - 23.01)^2}{23.01} + \frac{(12 - 8.775)^2}{8.775} + \frac{(17 - 18.72)^2}{18.72} + \frac{(10 - 11.505)^2}{11.505} = 7.46$$
f.) Decision: Accept H0 and reject H1 and conclude that the size of a family is
independent of the level of education attained by the father.
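The independence and homogeneity examples use the same computation. Each expected frequency is (row total × column total) / grand total, and χ² is summed over every cell. The Python function below is a sketch of that procedure for any r × c table of counts. It is applied to the hypertension table and the family-size table. The critical values 5.991 (v = 2) and 9.488 (v = 4) are the tabled α = 0.05 values and are only printed for comparison.

```python
# Chi-squared test of independence / homogeneity for an r x c contingency table.
def chi_square_independence(table):
    """Return (chi^2, v) using e = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    v = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi2, v

# Hypertension vs. smoking habits (non-smokers, moderate smokers, heavy smokers).
hypertension = [[21, 28, 36],
                [25, 20, 20]]
# Father's education vs. number of children (0-3, 4-7, over 7).
family = [[14, 37, 32],
          [19, 42, 17],
          [12, 17, 10]]

chi2_h, v_h = chi_square_independence(hypertension)
chi2_f, v_f = chi_square_independence(family)
print(f"hypertension: chi^2 = {chi2_h:.2f}, v = {v_h} (critical 5.991)")
print(f"family size:  chi^2 = {chi2_f:.2f}, v = {v_f} (critical 9.488)")
```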

Exercise 16
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Test the corresponding hypothesis systematically.


1-10. The grades in Statistics subject of 5 sample students in the College of Arts and
Management were as follows:

X 1 2 3 4 5
Grade 83 85 80 90 82

Test the hypothesis, at 0.05 level of significance that the distribution is uniform.

Illustrative Testing:

a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

11 - 20. A random sample of 100 students in the College of Arts and Management are
classified according to gender and the number of hours they watched television during a
week:
Gender
Male Female Total
Over 30 hours 15 30 45
Under 30 hours 35 20 55
Total 50 50 100
Use a 0.05 level of significance to test the hypothesis that the time spent
watching television is independent of whether the viewer is male or female.

Illustrative Testing:

a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

21 - 30. According to a study of the National Statistics Office, widows live longer than widowers. Consider the following survival data collected on 100 widows and 100 widowers following the death of a spouse:
Status
Years Lived Widow Widower Total
Less than 5 26 29 55
5 to 10 40 40 80
More than 10 34 31 65
Total 100 100 200

Can we conclude at the 0.05 level of significance that the proportions of widows
and widowers are equal with respect to the different time periods that a spouse survives
after the death of his or her mate?
Illustrative Testing:

a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

Chapter IX. Linear Correlation & Regression

Objectives

• Illustrate and differentiate a positive correlation from a negative correlation.
• Compute and interpret the linear correlation coefficient.
• Compute, estimate, and interpret the trend line analysis or regression.

Pearson Product Moment Correlation

The most commonly used measure of relationship is the Pearson product moment correlation coefficient, denoted by r, which varies within $-1 \le r \le 1$. Figure 1 shows a perfect positive relationship between the x and y variables, while Figure 2 indicates a perfect negative relationship.

Figure 1. Perfect Positive Relationship


X 1 2 3 4 5 6 7
Y 2 3 4 5 6 7 8

Figure 2. Perfect Negative Relationship
X 1 2 3 4 5 6 7
Y 12 10 8 6 4 2 0

In real life situations, however, the relationship between variables is not perfect.
Figure 3 illustrates a high positive relationship, while figure 4 shows a high negative
relationship.
Figure 3. Very High Positive Relationship
X 1 2 3 4 5 6 7
Y 2 2 4 8 8 10 11

Figure 4. Very High Negative Relationship


X 1 2 3 4 5 6 7
Y 2 2 4 8 8 10 11

Guide in Interpreting r (Likert’s Scale)


Pearson’s r Formula

To determine whether there is a significant relationship, or no relationship at all, between two variables, Pearson's r is computed using the formula:

$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$

Where:
r = Pearson product moment correlation
x = independent variable
y = dependent variable
n = total number of paired observations
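The formula can be translated line by line into code. The Python sketch below is only an illustration (the function name `pearson_r` is our own, not part of any library): it builds the sums Σx, Σy, Σx², Σy² and Σxy and combines them exactly as in the formula. Run on the data of Figures 1 to 3, it gives 1 for the perfect positive relationship, -1 for the perfect negative one, and about 0.97 for the very high positive one.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation r, using the raw-score (sums) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if all x (or all y) values are equal; r is then undefined.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


x = [1, 2, 3, 4, 5, 6, 7]
print(round(pearson_r(x, [2, 3, 4, 5, 6, 7, 8]), 2))    # Figure 1 -> 1.0
print(round(pearson_r(x, [12, 10, 8, 6, 4, 2, 0]), 2))  # Figure 2 -> -1.0
print(round(pearson_r(x, [2, 2, 4, 8, 8, 10, 11]), 2))  # Figure 3 -> 0.97
```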

Example

1. Compute the correlation between the heights (x), in feet, and the weights (y), in kilograms, of
10 sample students in DMMMSU – MLUC.

Height (x) 5.2 4.5 4.11 5.7 5.8 6.2 4.8 5.5 6 5.4
Weight (y) 40 42.5 55 70 75 60 80 62 63 54

Illustrative Testing: (Using the Formula)

Height (x) Weight (y) x² y² xy
5.2 40 27.04 1600 208
4.5 42.5 20.25 1806.25 191.25
4.11 55 16.8921 3025 226.05
5.7 70 32.49 4900 399
5.8 75 33.64 5625 435
6.2 60 38.44 3600 372
4.8 80 23.04 6400 384
5.5 62 30.25 3844 341
6 63 36 3969 378
5.4 54 29.16 2916 291.6
Σx = 53.21    Σy = 601.5
Σx² = 287.20    Σy² = 37,685.25    Σxy = 3,225.9
(Σx)² = 2,831.3    (Σy)² = 361,802.3

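$$
\begin{aligned}
r &= \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} \\
  &= \frac{10(3225.9) - (53.21)(601.5)}{\sqrt{\left[10(287.20) - 2831.3\right]\left[10(37685.25) - 361802.3\right]}} \\
  &= \frac{253.185}{\sqrt{(40.7)(15050.2)}} \approx 0.32
\end{aligned}
$$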

(Using ES PLUS CASIO)


mode, stat, a + bx, input x column (5.2 =, 4.5 =, 4.11 =, 5.7 =, 5.8 =, 6.2 =,
4.8 =, 5.5 =, 6 =, 5.4 =), input y column (40 =, 42.5 =, 55 =, 70 =, 75 =,
60 =, 80 =, 62 =, 63 =, 54 =), AC, shift, 1, reg, r, then =.
The value of r is 0.32.
(Using MS EXCEL)
Open MS Excel, input the height observations in one column and the weight observations in the
next, select a vacant cell, type =PEARSON( or =CORREL(, highlight the x column, type a comma,
highlight the y column, type ), then press Enter. The value of r is again 0.32.

Height (x) Weight (y)


5.2 40
4.5 42.5
4.11 55
5.7 70
5.8 75
6.2 60
4.8 80
5.5 62
6 63
5.4 54

Correlation 0.323428854
pearson r 0.323428854
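As a cross-check on the hand computation and the spreadsheet result above, the following Python sketch (variable names are illustrative, not from the module) recomputes the column totals of the table and then r. Kept at full precision, r comes out as about 0.3234, matching the Excel value and rounding to the 0.32 found by hand.

```python
import math

# Heights (x, in feet) and weights (y, in kg) of the 10 sample students.
height = [5.2, 4.5, 4.11, 5.7, 5.8, 6.2, 4.8, 5.5, 6, 5.4]
weight = [40, 42.5, 55, 70, 75, 60, 80, 62, 63, 54]

n = len(height)
sum_x = sum(height)                                  # about 53.21
sum_y = sum(weight)                                  # 601.5
sum_x2 = sum(x ** 2 for x in height)                 # about 287.2021
sum_y2 = sum(y ** 2 for y in weight)                 # 37685.25
sum_xy = sum(x * y for x, y in zip(height, weight))  # about 3225.9

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sum x = {sum_x:.2f}, sum y = {sum_y:.1f}, sum xy = {sum_xy:.1f}")
print(f"r = {r:.4f}")  # r = 0.3234
```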

Interpretation:
For the data on the heights and weights of 10 sample students in DMMMSU –
MLUC, the computed value of r is 0.32. Since r is positive, weight tends to increase with
height; since r is small in magnitude, the correlation is only low. In other words, taller
students show a slight tendency to weigh more.

Linear Regression

Linear regression is mainly concerned with estimation, prediction, or forecasting. It is based
on the assumption that there is a straight-line relationship between the variables x and y.
• Regression Equation – a mathematical equation that allows us to predict values of one
dependent variable from known values of one or more independent variables.
• Trend Line – a line drawn through the plotted points so that it approximates their general
direction, passing as close as possible to the points.

Methods

1. Graphical Method – this method consists of plotting the points corresponding to the
paired values of X and Y on the rectangular coordinate system. It provides a rough
estimate.
Example

1. Plot the points corresponding to the paired values of the ages of adults (X) and their peak
heart rates (Y) on the rectangular coordinate system and draw the trend line.

Age 10 20 20 25 30 30 30 40 45 50
PHR 210 200 195 195 190 180 185 180 170 ?

Illustration and interpretation:

Interpretation:
Peak heart rate tends to decrease as the age increases.

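2. Regression Formula – gives a fairly accurate estimate when the paired values of two
variables are given. It makes use of the equation $y = a + bX$, with a and b computed as follows:

$$ a = \frac{\left(\sum Y\right)\left(\sum X^{2}\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^{2} - \left(\sum X\right)^{2}} $$

$$ b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^{2} - \left(\sum X\right)^{2}} $$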

Where:
a = intercept
b = slope of the line fitted to the sample

y = estimated value of the dependent variable
X = observed value of the independent variable
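The formulas for a and b can also be sketched in Python. The function name `least_squares_line` is hypothetical; it simply evaluates the two formulas above, which share the denominator nΣX² − (ΣX)². As a sanity check, the Figure 1 data (where Y = X + 1 exactly) give an intercept of 1 and a slope of 1.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + bX using the raw-score formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2  # shared by a and b
    if denominator == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


print(least_squares_line([1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8]))  # (1.0, 1.0)
```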

Example

1. Compute the regression line of the above example and interpret the result. Predict the
peak heart rate of a person who is 25 years old.

Age (X) PHR (Y) XY X² Y²


10 210 2100 100 44100
20 200 4000 400 40000
20 195 3900 400 38025
25 195 4875 625 38025
30 190 5700 900 36100
30 180 5400 900 32400
30 185 5550 900 34225
40 180 7200 1600 32400
45 170 7650 2025 28900
50 165 8250 2500 27225
ΣX = 300    ΣY = 1,870    ΣXY = 54,625
ΣX² = 10,350    ΣY² = 351,400
(ΣX)² = 90,000    (ΣY)² = 3,496,900

Illustrative Solution, interpretation & prediction:

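(Using the Formula)

$$ a = \frac{\left(\sum Y\right)\left(\sum X^{2}\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^{2} - \left(\sum X\right)^{2}} = \frac{(1870)(10{,}350) - (300)(54{,}625)}{10(10{,}350) - 90{,}000} = \frac{2{,}967{,}000}{13{,}500} \approx 219.78 $$

$$ b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^{2} - \left(\sum X\right)^{2}} = \frac{10(54{,}625) - (300)(1870)}{10(10{,}350) - 90{,}000} = \frac{-14{,}750}{13{,}500} \approx -1.09 $$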

Regression equation:

$$ y = 219.78 - 1.09X $$
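The same arithmetic can be checked in a few lines of Python. This is a minimal sketch (variable names are illustrative): it recomputes ΣX, ΣY, ΣX² and ΣXY for the age and peak heart rate data and evaluates the formulas for a and b, giving a ≈ 219.78 and b ≈ -1.09.

```python
age = [10, 20, 20, 25, 30, 30, 30, 40, 45, 50]
phr = [210, 200, 195, 195, 190, 180, 185, 180, 170, 165]

n = len(age)
sum_x = sum(age)                               # 300
sum_y = sum(phr)                               # 1870
sum_x2 = sum(x * x for x in age)               # 10350
sum_xy = sum(x * y for x, y in zip(age, phr))  # 54625

denominator = n * sum_x2 - sum_x ** 2          # 13500
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

print(f"y = {a:.2f} + ({b:.2f})X")  # y = 219.78 + (-1.09)X
```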

(Using ES PLUS CASIO)


mode, stat, a + bx, input x column (10 =, 20 =, 20 =, 25 =, 30 =, 30 =,
30 =, 40 =, 45 =, 50 =), input y column (210 =, 200 =, 195 =, 195 =, 190 =,
180 =, 185 =, 180 =, 170 =, 165 =), AC, shift, 1, reg, a, then =.
The value of a is 219.78. To find b, AC, shift, 1, reg, b, then =.
The value of b is -1.09. Thus, the regression equation is the same.

(Using MS EXCEL)
Open MS Excel, input the age observations and the PHR observations, then select Data, Data
Analysis, Regression, OK (the Data Analysis command requires the Analysis ToolPak add-in).
Highlight the PHR observations as the Input Y Range, place the cursor in the Input X Range box
and highlight the age observations, set the confidence level at 95%, select and assign an
output range, then press OK.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.970793994
R Square 0.942440979
Adjusted R Square 0.935246101
Standard Error 3.507597574
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 1611.574074 1611.574 130.9878 3.07301E-06
Residual 8 98.42592593 12.30324
Total 9 1710
Coefficients Standard Error t Stat P-value
Intercept 219.77 3.071 71.560 1.62E-12
X Variable 1 -1.09 0.095 -11.445 3.07E-06
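Several entries of the Excel summary can be reproduced from the raw data. The sketch below assumes the usual deviation sums Sxx = ΣX² − (ΣX)²/n, Syy = ΣY² − (ΣY)²/n and Sxy = ΣXY − (ΣX)(ΣY)/n (standard shortcuts, though not introduced elsewhere in this module) and uses them to obtain the regression and residual sums of squares, R Square, the standard error of estimate and F.

```python
import math

age = [10, 20, 20, 25, 30, 30, 30, 40, 45, 50]
phr = [210, 200, 195, 195, 190, 180, 185, 180, 170, 165]
n = len(age)

s_xx = sum(x * x for x in age) - sum(age) ** 2 / n                     # 1350
s_yy = sum(y * y for y in phr) - sum(phr) ** 2 / n                     # 1710 (Total SS)
s_xy = sum(x * y for x, y in zip(age, phr)) - sum(age) * sum(phr) / n  # -1475

ss_regression = s_xy ** 2 / s_xx                 # about 1611.574
ss_residual = s_yy - ss_regression               # about 98.426
r_square = ss_regression / s_yy                  # about 0.9424
ms_residual = ss_residual / (n - 2)              # residual df = n - 2 = 8
std_error = math.sqrt(ms_residual)               # about 3.5076
f_stat = ss_regression / ms_residual             # regression df = 1

print(f"SS regression = {ss_regression:.3f}")
print(f"SS residual   = {ss_residual:.3f}")
print(f"R Square      = {r_square:.4f}")
print(f"Std. error    = {std_error:.4f}")
print(f"F             = {f_stat:.2f}")
```

The values agree with the Regression, Residual, R Square, Standard Error and F entries of the summary output.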

Interpretation:
As the scatter diagram above shows, peak heart rate tends to decrease as age increases.
The peak heart rate that an individual can reach during intensive exercise decreases by an
estimated 1.09 beats per minute for each one-year increase in age.

Prediction:
If the person is 25 years old, we can predict his or her peak heart rate using the
regression equation:

$$ y = 219.78 - 1.09X = 219.78 - 1.09(25) = 192.53 \approx 193 $$
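Prediction is simply substitution into the fitted line. A tiny helper (the name `predict_phr` is hypothetical) using the rounded coefficients reproduces the value above:

```python
def predict_phr(age_years, a=219.78, b=-1.09):
    """Estimated peak heart rate from the fitted line y = a + bX (rounded coefficients)."""
    return a + b * age_years


print(round(predict_phr(25), 2))  # 192.53
```

Note that 25 years lies inside the observed age range of 10 to 50 years; estimates far outside that range rest on the assumption that the straight-line pattern continues.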

Exercise 17
Name: Score:
Course & Year: Date:

Directions: Answer what is being asked for each of the following.

A. Test the corresponding hypothesis systematically.


1 - 10. Correlate the Job Performance (JP) and Job Satisfaction (JS) of CAM working students
using the given data and interpret the result.

JP JS
2.75 2.55
2.50 2.35
4.00 3.75
3.25 4.25
3.10 4.10
4.25 3.34
4.10 2.86
3.05 2.90
3.45 2.75
2.45 3.46

Illustrative Testing:
a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

11 - 20. A researcher at DMMMSU-MLUC wanted to determine whether the time students spend
in remedial sessions in algebra has an effect on their performance in the subject. The time
spent (in hours) by 10 sample students over two weeks was recorded together with their
corresponding performance (in grades). Here are the results:

Time Spent (X) Performance (Y)


2 70
4 72
7 75
7 80
10 80
13 85
15 88
15 89
17 89
20 92

Draw the scatter diagram, compute the regression equation, and interpret the result.
Using the regression equation, predict the grade of a student who spent 25 hours in
remedial sessions.

Illustrative Testing:
a. Ho:

b. H1:

c. Level of Significance & Type of Test:

d. Critical Regions & Critical Values:

e. Computation:

f. Decision:

