Inferential Statistics


Republic of the Philippines

POLYTECHNIC UNIVERSITY OF THE PHILIPPINES


Office of the Vice President for Branches and Campuses
Santa Maria Bulacan Campus
Santa Maria, Bulacan

INSTRUCTIONAL MATERIALS FOR


SEMA 30133 ADVANCED STATISTICS

COMPILED BY:
JAYMIL B. DELOS REYES, LPT
INTRODUCTION

This course presents techniques for statistical analysis. Rank-based and resampling
techniques are well represented, but robust techniques are considered as well. These
techniques include one-sample testing and estimation, multi-sample testing and estimation, and
regression.

This course begins with parametric statistics and then shifts to nonparametric
statistics. Students will be trained to make inferences using both parametric and
nonparametric statistics. Students will also examine these techniques to identify which
statistical method is appropriate for testing a difference, relationship, or association between two or
more variables.

By the end of this course, students should be able to understand the basic concepts
and procedures of nonparametric statistics by illustrating examples that apply statistical
concepts; identify the correct usage of statistical tests by conducting investigations and
research to formulate data-driven conclusions and decisions; develop skill in problem solving
by giving appropriate examples that can be solved using nonparametric statistics; and
appreciate statistics by advocating the use of statistical data in making important decisions in
everyday life.

Note: No copyright infringement is intended for any of the content of these IMs.


COURSE OUTCOMES
At the end of the course, the student should be able to:

A. Understand the basic concepts and procedures of non-parametric statistics by
illustrating examples that apply statistical concepts;
B. Analyze data by using appropriate technology for informed decision-making;
C. Identify the correct usage of statistical tests by conducting investigations and
research to formulate data-driven conclusions and decisions;
D. Develop skill in problem solving by giving appropriate examples that can be solved
using non-parametric statistics;
E. Appreciate statistics by advocating the use of statistical data in making important
decisions in everyday life.

LEARNING OUTCOMES
At the end of each lesson, the student should be able to:

Topics and Learning Outcomes

1. Testing the Differences between Two Means
   1.1 Testing the Differences between Two Means: Using Z-Test
       • Test the difference between sample means, using the z test.
   1.2 Testing the Differences between Two Means: Using T-Test
       • Test the difference between two means for independent samples, using the t test.
   1.3 Testing the Differences between Two Means: Dependent Samples
       • Test the difference between two means for dependent samples.
   1.4 Testing the Differences between Proportions
       • Test the difference between two proportions.

2. Regression and Correlation
   2.1 Correlation coefficient
   2.2 Testing the correlation coefficient
   2.3 Simple linear regression
       • Explain what a correlation coefficient measures.
       • Understand and apply the formula for finding the Pearson product-moment correlation coefficient.
       • Test the significance of the correlation coefficient.
       • Compute the equation of the regression line.
       • Interpret a linear regression equation.

3. Analysis of Variance (ANOVA)
       • Distinguish one-way ANOVA from two-way ANOVA.
       • Interpret the results of ANOVA.

4. Chi-Square Test
       • Test a distribution for goodness of fit, using chi-square.
       • Formulate inferences about population parameters.

5. Introduction to Nonparametric Statistics
       • Determine the importance of using nonparametric statistics.
       • Identify the uses of nonparametric tests.
       • State the advantages and disadvantages of nonparametric methods.

6. Wilcoxon Signed-Rank Test
       • Compute the Wilcoxon signed-rank test.
       • Interpret the result.

7. Mann-Whitney U Test
       • Compute the Mann-Whitney U test.
       • Interpret the result.

8. Friedman Test
       • Compute the Friedman test.
       • Interpret the result.

9. Kruskal-Wallis Test
       • Compute the Kruskal-Wallis test.
       • Interpret the result.

10. Spearman Rank Correlation Coefficient
       • Compute the Spearman rank correlation coefficient.
       • Interpret the result.
UNIT 1: TESTING THE DIFFERENCE BETWEEN 2 MEANS
LESSON 1– Z-TEST

Introduction:
The basic concepts of hypothesis testing have already been explained. With the z, t, and χ² tests, a
sample mean, variance, or proportion can be compared to a specific population mean, variance,
or proportion to determine whether the null hypothesis should be rejected. There are, however,
many instances when researchers wish to compare two sample means, using experimental and
control groups. For example, the average lifetimes of two different brands of bus tires might be
compared to see whether there is any difference in tread wear. Two different brands of fertilizer
might be tested to see whether one is better than the other for growing plants. Or two brands of
cough syrup might be tested to see whether one brand is more effective than the other. In the
comparison of two means, the same basic steps for hypothesis testing are used, and the z and t
tests are also used.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. Test the difference between sample means, using the z test.

Course Materials:

The theory behind testing the difference between two means is based on selecting pairs
of samples and comparing the means of the pairs. The population means need not be known. All
possible pairs of samples are taken from populations. The means for each pair of samples are
computed and then subtracted, and the differences are plotted. If both populations have the same
mean, then most of the differences will be zero or close to zero. Before you can use the z test to
test the difference between two independent sample means, you must make sure that the
following assumptions are met.
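The sampling idea in the paragraph above can be illustrated with a short simulation. This is a minimal sketch, assuming two normal populations that share the same mean; the function name `simulate_mean_differences`, the seed, and the parameter values (borrowed from the leisure-time example in this lesson) are illustrative choices, not part of the original material. It draws many pairs of independent samples, subtracts the sample means, and reports the mean and standard deviation of those differences.

```python
import math
import random
import statistics


def simulate_mean_differences(mu, sigma1, sigma2, n1, n2, trials, seed=0):
    """Draw `trials` pairs of independent samples from two normal populations
    that share the mean `mu`, and return the list of differences X̄1 - X̄2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(trials):
        sample1 = [rng.gauss(mu, sigma1) for _ in range(n1)]
        sample2 = [rng.gauss(mu, sigma2) for _ in range(n2)]
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return diffs


diffs = simulate_mean_differences(mu=40, sigma1=6.3, sigma2=5.8, n1=35, n2=35, trials=5000)
# Standard error of X̄1 - X̄2, i.e. the denominator of the z formula for two means.
theory_se = math.sqrt(6.3**2 / 35 + 5.8**2 / 35)
print(f"mean of differences: {statistics.fmean(diffs):.3f}")  # near 0 when mu1 = mu2
print(f"SD of differences:   {statistics.stdev(diffs):.3f} (theory: {theory_se:.3f})")
```

Most simulated differences cluster around zero, and their spread matches the standard error that appears in the denominator of the z test value.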
ASSUMPTIONS FOR THE Z TEST TO DETERMINE THE DIFFERENCE BETWEEN TWO MEANS

1. Both samples are random samples.
2. The samples must be independent of each other. That is, there can be no relationship
between the subjects in each sample.
3. The standard deviations of both populations must be known; and if the sample sizes
are less than 30, the populations must be normally or approximately normally distributed.

FORMULA FOR THE Z TEST FOR COMPARING TWO MEANS FROM INDEPENDENT POPULATIONS

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
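The formula translates directly into a small function. This is a sketch under the lesson's assumptions (independent random samples, known σ1 and σ2); the name `two_sample_z` and the `mu_diff` parameter, which holds the hypothesized value of μ1 − μ2, are illustrative.

```python
import math


def two_sample_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2, mu_diff=0.0):
    """z test value for the difference between two independent sample means.

    sigma1 and sigma2 are the known population standard deviations;
    mu_diff is the hypothesized mu1 - mu2 (0 when H0 states mu1 = mu2).
    """
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1_bar - x2_bar) - mu_diff) / standard_error


# Leisure-time example from this lesson: ages 26-35 vs. ages 46-55.
z = two_sample_z(39.6, 35.4, sigma1=6.3, sigma2=5.8, n1=35, n2=35)
print(round(z, 2))  # 2.9 (the lesson reports 2.90)
```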

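The hypotheses for a two-tailed test are H0: μ1 = μ2 and H1: μ1 ≠ μ2 (equivalently,
H0: μ1 − μ2 = 0 and H1: μ1 − μ2 ≠ 0). These tests can also be one-tailed, using the following hypotheses:

Right-tailed: H0: μ1 = μ2 and H1: μ1 > μ2
Left-tailed: H0: μ1 = μ2 and H1: μ1 < μ2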

The basic format for hypothesis testing using the traditional method is reviewed here.

Step 1 State the hypotheses and identify the claim.

Step 2 Find the critical value(s).

Step 3 Compute the test value.

Step 4 Make the decision.

Step 5 Summarize the results.
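Steps 2 and 4 of this method can be sketched for z tests with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method returns the standard normal critical values that would otherwise be read from a table. The helper names `z_critical_values` and `reject_null` and the `tail` argument are illustrative assumptions, not part of the original material.

```python
from statistics import NormalDist


def z_critical_values(alpha, tail="two"):
    """Critical value(s) of the standard normal distribution at level alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-c, c)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'right', or 'left'")


def reject_null(test_value, alpha, tail="two"):
    """Step 4: reject H0 when the test value falls in the critical region."""
    crit = z_critical_values(alpha, tail)
    if tail == "two":
        return test_value < crit[0] or test_value > crit[1]
    if tail == "right":
        return test_value > crit[0]
    return test_value < crit[0]  # left-tailed


print([round(c, 2) for c in z_critical_values(0.05)])  # [-1.96, 1.96]
print(reject_null(2.90, 0.05))  # True
```

Computing the critical value from `alpha` keeps Step 2 consistent with whichever significance level and tail the hypotheses in Step 1 call for.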


Example:

A study using two random samples of 35 people each found that the average amount of
time those in the age group of 26–35 years spent per week on leisure activities was 39.6 hours,
and those in the age group of 46–55 years spent 35.4 hours. Assume that the population
standard deviation for those in the first age group found by previous studies is 6.3 hours, and
the population standard deviation of those in the second group found by previous studies was
5.8 hours. At alpha = 0.05, can it be concluded that there is a significant difference in the
average times each group spends on leisure activities?

Solution:

Step 1: State the hypotheses and identify the claim.

H0: μ1 = μ2

H1: μ1 ≠ μ2 (claim)

Step 2: Find the critical values.

Since α = 0.05 and the test is two-tailed, the critical values are +1.96 and −1.96.

Step 3: Compute the test value.

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(39.6 - 35.4) - 0}{\sqrt{\dfrac{6.3^2}{35} + \dfrac{5.8^2}{35}}} = 2.90$$
Step 4: Make the decision.

Reject the null hypothesis at α= 0.05 since 2.90 > 1.96.

Step 5: Summarize the results

There is enough evidence to support the claim that the means are not equal. That is, the
average of the times spent on leisure activities is different for the groups.
Watch:
• The (Pearson) Correlation Coefficient Explained in One Minute: From Definition to Formula
https://www.youtube.com/watch?v=WpZi02ulCvQ

Read:
• Z-Test: Two Sample Means
Bluman, A. G. (2012). Descriptive and Inferential Statistics. In Bluman, A. G.,
ELEMENTARY STATISTICS: A STEP BY STEP APPROACH, EIGHTH EDITION. New
York: McGraw-Hill Education.
UNIT 1: TESTING THE DIFFERENCE BETWEEN 2 MEANS
LESSON 2– T-TEST

Introduction:
The basic concepts of hypothesis testing have already been explained. With the z, t, and χ² tests, a
sample mean, variance, or proportion can be compared to a specific population mean, variance,
or proportion to determine whether the null hypothesis should be rejected. There are, however,
many instances when researchers wish to compare two sample means, using experimental and
control groups. When comparing two means by using the t test, the researcher must decide if the
two samples are independent or dependent.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. Test the difference between two means for independent samples, using the t test.

Course Materials:
The z test was used to test the difference between two means when the population
standard deviations were known and the variables were normally or approximately
normally distributed, or when both sample sizes were greater than or equal to 30. In
many situations, however, these conditions cannot be met; that is, the population
standard deviations are not known. In these cases, a t test is used to test the difference
between means when the two samples are independent and when the samples are
taken from two normally or approximately normally distributed populations. Samples are
independent when they are not related. It will also be assumed that the
variances are not equal.

FORMULA FOR THE T TEST FOR TESTING THE DIFFERENCE BETWEEN TWO MEANS,
INDEPENDENT SAMPLES

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Note: The variances are assumed to be unequal, and the degrees of freedom are taken as the
smaller of n1 − 1 and n2 − 1.
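A minimal sketch of this t test value, assuming unequal variances and the conservative degrees of freedom described in the note. The function name `two_sample_t` is illustrative. Python's standard library has no t distribution, so the critical value still comes from the t distribution table used in the worked example; with SciPy installed, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the same two-tailed value.

```python
import math


def two_sample_t(x1_bar, x2_bar, s1, s2, n1, n2, mu_diff=0.0):
    """t test value for two independent means when the variances are assumed unequal.

    Returns (t, df), where df is the conservative value used in this lesson:
    the smaller of n1 - 1 and n2 - 1.
    """
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((x1_bar - x2_bar) - mu_diff) / standard_error
    df = min(n1 - 1, n2 - 1)
    return t, df


# Newborn-weight example from this lesson (means converted to ounces).
t, df = two_sample_t(123, 116, s1=8, s2=5, n1=10, n2=8)
print(round(t, 3), df)  # 2.268 7
```

Using the smaller of n1 − 1 and n2 − 1 gives fewer degrees of freedom, hence a larger critical value, which makes the test more cautious about rejecting H0.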

ASSUMPTIONS FOR THE T TEST FOR TWO INDEPENDENT MEANS WHEN σ1 AND σ2
ARE UNKNOWN

1. The samples are random samples.

2. The sample data are independent of one another.

3. When the sample sizes are less than 30, the populations must be normally or
approximately normally distributed.

Example

A researcher wishes to see if the average weights of newborn male infants are different
from the average weights of newborn female infants. She selects a random sample of 10 male
infants and finds the mean weight is 7 pounds 11 ounces and the standard deviation of the
sample is 8 ounces. She selects a random sample of 8 female infants and finds that the mean
weight is 7 pounds 4 ounces and the standard deviation of the sample is 5 ounces. Can it be
concluded at alpha = 0.05 that the mean weight of the males is different from the mean weight
of the females? Assume that the variables are normally distributed.

Solution:

Step 1: State the hypotheses and identify the claim.

H0: μ1 = μ2

H1: μ1 ≠ μ2 (claim)

Step 2: Find the critical values.

This is a two-tailed test with α = 0.05. The degrees of freedom are the smaller of
n1 − 1 and n2 − 1. In this case, n1 − 1 = 10 − 1 = 9 and n2 − 1 = 8 − 1 = 7, so d.f. = 7. From
Table F (the t distribution table), the critical values are +2.365 and −2.365.
Step 3 Compute the test value. Change the means to ounces (1 lb = 16 oz):

7 lb 11 oz = 7 × 16 + 11 = 123 oz

7 lb 4 oz = 7 × 16 + 4 = 116 oz

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(123 - 116) - 0}{\sqrt{\dfrac{8^2}{10} + \dfrac{5^2}{8}}} = 2.268$$

Step 4 Make the decision.

Do not reject the null hypothesis, since 2.268 < 2.365.

Step 5 Summarize the results.

There is not enough evidence to support the claim that the mean of the weights of the
male infants is different from the mean of the weights of the female infants.

Read:
• T-Test for Two Independent Means
Bluman, A. G. (2012). Descriptive and Inferential Statistics. In Bluman, A. G.,
ELEMENTARY STATISTICS: A STEP BY STEP APPROACH, EIGHTH EDITION. New
York: McGraw-Hill Education.
UNIT 1: TESTING THE DIFFERENCE BETWEEN 2 MEANS
LESSON 3– DEPENDENT SAMPLES

Introduction:
The z test was used to compare two sample means when the samples were independent and
σ1 and σ2 were known. The t test was used to compare two sample means when the samples were
independent and σ1 and σ2 were unknown. In this section, a different version of the t test is
explained. This version is used when the samples are dependent.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. Test the difference between two means for dependent samples.

Course Materials:
Samples are considered to be dependent samples when the subjects are paired or
matched in some way. Dependent samples are sometimes called matched-pair samples. For
example, suppose a medical researcher wants to see whether a drug will affect the reaction
time of its users. To test this hypothesis, the researcher must pretest the subjects in the sample.
That is, they are given a test to ascertain their normal reaction times. Then after taking the drug,
the subjects are tested again, using a posttest. Finally, the means of the two tests are compared
to see whether there is a difference. Since the same subjects are used in both cases, the
samples are related; subjects scoring high on the pretest will generally score high on the
posttest, even after consuming the drug. Likewise, those scoring lower on the pretest will tend to
score lower on the posttest. To take this effect into account, the researcher employs a t test,
using the differences between the pretest values and the posttest values. Thus, only the gain or
loss in values is compared.

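When the samples are dependent, a special t test for dependent means is used. This
test employs the differences in values of the matched pairs. The hypotheses are as follows:

Two-tailed: H0: μD = 0 and H1: μD ≠ 0
Left-tailed: H0: μD = 0 and H1: μD < 0
Right-tailed: H0: μD = 0 and H1: μD > 0

Here μD is the mean of the population of differences. Before you can use the testing method
presented in this section, the following assumptions must be met.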

Assumptions for the t Test for Two Means When the Samples Are Dependent

1. The sample or samples are random.

2. The sample data are dependent.

3. When the sample size or sample sizes are less than 30, the population or populations
must be normally or approximately normally distributed.

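FORMULAS FOR THE T TEST FOR DEPENDENT SAMPLES

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$

with d.f. = n − 1 and where

$$\bar{D} = \frac{\sum D}{n} \qquad \text{and} \qquad s_D = \sqrt{\frac{n\sum D^2 - \left(\sum D\right)^2}{n(n-1)}}$$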
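These formulas can be sketched as one function. The name `paired_t` is illustrative; it assumes the two lists hold the paired values in the same subject order, and it uses the same computational formula for s_D given above.

```python
import math


def paired_t(x1, x2, mu_d=0.0):
    """t test value for dependent (matched-pair) samples.

    x1[i] and x2[i] must belong to the same subject. Returns (t, df) with df = n - 1.
    """
    if len(x1) != len(x2):
        raise ValueError("every subject needs one value in each sample")
    d = [a - b for a, b in zip(x1, x2)]       # D = X1 - X2
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)            # sum of D squared
    d_bar = sum_d / n                         # mean of the differences
    s_d = math.sqrt((n * sum_d2 - sum_d**2) / (n * (n - 1)))
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return t, n - 1


# Cholesterol example from this lesson.
before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]
t, df = paired_t(before, after)
print(round(t, 3), df)  # 1.608 5
```

The result, about 1.608, differs slightly from the lesson's 1.610 because the worked solution rounds D̄ and s_D before dividing; the decision is the same.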

Testing the Difference Between Means for Dependent Samples

Step 1 State the hypotheses and identify the claim.

Step 2 Find the critical value(s).


Step 3 Compute the test value.

a. Make a table, as shown.

X1 | X2 | A: D = X1 − X2 | B: D² = (X1 − X2)²
.  | .  | .              | .
.  | .  | .              | .
.  | .  | .              | .
   |    | ΣD = _____     | ΣD² = _____

b. Find the differences and place the results in column A.

D = X1 − X2

c. Find the mean of the differences.

$$\bar{D} = \frac{\sum D}{n}$$

d. Square the differences and place the results in column B. Complete the table.

D² = (X1 − X2)²

e. Find the standard deviation of the differences.

$$s_D = \sqrt{\frac{n\sum D^2 - \left(\sum D\right)^2}{n(n-1)}}$$

f. Find the test value.

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$
Step 4 Make the decision.

Step 5 Summarize the results.
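Steps a through e can also be reproduced as a printed table, which is a convenient way to check hand computations. This is a sketch; the name `difference_table` is illustrative, and it only tabulates columns A (D) and B (D²) with their totals, leaving the test value to step f. The data are those of the cholesterol example in this lesson.

```python
def difference_table(x1, x2):
    """Build columns A (D = X1 - X2) and B (D squared); return the rows and both totals."""
    rows = []
    for a, b in zip(x1, x2):
        d = a - b
        rows.append((a, b, d, d * d))
    sum_d = sum(r[2] for r in rows)
    sum_d2 = sum(r[3] for r in rows)
    return rows, sum_d, sum_d2


rows, sum_d, sum_d2 = difference_table([210, 235, 208, 190, 172, 244],
                                       [190, 170, 210, 188, 173, 228])
print(f"{'X1':>5} {'X2':>5} {'D':>5} {'D^2':>6}")
for x1, x2, d, d2 in rows:
    print(f"{x1:>5} {x2:>5} {d:>5} {d2:>6}")
print(f"{'':>11} {sum_d:>5} {sum_d2:>6}")  # totals: 100 and 4890
```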

EXAMPLE:

A dietitian wishes to see if a person’s cholesterol level will change if the diet is
supplemented by a certain mineral. Six randomly selected subjects were pretested, and then
they took the mineral supplement for a 6-week period. The results are shown in the table.
(Cholesterol level is measured in milligrams per deciliter.) Can it be concluded that the
cholesterol level has been changed at alpha = 0.10? Assume the variable is approximately
normally distributed.

Subject     | 1   | 2   | 3   | 4   | 5   | 6
Before (X1) | 210 | 235 | 208 | 190 | 172 | 244
After (X2)  | 190 | 170 | 210 | 188 | 173 | 228

SOLUTION:

Step 1 State the hypotheses and identify the claim.

If the diet is effective, the before cholesterol levels should be different from the after
levels.

H0: μD = 0

H1: μD ≠ 0 (claim)

Step 2 Find the critical value.

The degrees of freedom are 6 − 1 = 5. At α = 0.10 (two-tailed), the critical values are ±2.015.
Step 3 Compute the test value.

a. Make a table, as shown.

Before (X1) | After (X2) | A: D = X1 − X2 | B: D² = (X1 − X2)²
210         | 190        |                |
235         | 170        |                |
208         | 210        |                |
190         | 188        |                |
172         | 173        |                |
244         | 228        |                |

b. Find the differences and place the results in column A.

D = X1 − X2

210 − 190 = 20
235 − 170 = 65
208 − 210 = −2
190 − 188 = 2
172 − 173 = −1
244 − 228 = 16

ΣD = 100

c. Find the mean of the differences.

$$\bar{D} = \frac{\sum D}{n} = \frac{100}{6} = 16.7$$

d. Square the differences and place the results in column B. Complete the table.

D² = (X1 − X2)²

(20)² = 400
(65)² = 4225
(−2)² = 4
(2)² = 4
(−1)² = 1
(16)² = 256

ΣD² = 4890

e. Find the standard deviation of the differences.

$$s_D = \sqrt{\frac{n\sum D^2 - \left(\sum D\right)^2}{n(n-1)}} = \sqrt{\frac{(6)(4890) - 100^2}{6(6-1)}} = 25.4$$

f. Find the test value.

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{16.7 - 0}{25.4 / \sqrt{6}} = 1.610$$

Step 4 Make the decision.

The decision is to not reject the null hypothesis, since the test value 1.610 is in the
noncritical region.
Step 5 Summarize the results.

There is not enough evidence to support the claim that the mineral changes a person’s
cholesterol level.

Read:
• Correlation Coefficient
Bluman, A. G. (2012). Descriptive and Inferential Statistics. In Bluman, A. G.,
ELEMENTARY STATISTICS: A STEP BY STEP APPROACH, EIGHTH EDITION. New
York: McGraw-Hill Education.
UNIT 1: TESTING THE DIFFERENCE BETWEEN 2 MEANS
LESSON 4: TESTING THE DIFFERENCE BETWEEN PROPORTIONS

Introduction:
The z test, with some modifications, can be used to test the equality of two proportions. For
example, a researcher might ask: Is the proportion of men who exercise regularly less than the
proportion of women who exercise regularly? Is there a difference between the percentage of students
who own a personal computer and the percentage of nonstudents who own one? Is there a
difference between the proportion of college graduates who pay cash for purchases and the proportion
of non-college graduates who pay cash?

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. Test the difference between two proportions.

Course Materials:
The symbol p̂ (“p hat”) is the sample proportion used to estimate the population
proportion, denoted by p. For example, if in a sample of 30 college students, 9 are on probation,
then the sample proportion is p̂ = 9/30, or 0.3. The population proportion p is the number of all
students who are on probation, divided by the number of students who attend the college. The
formula for the sample proportion is

$$\hat{p} = \frac{X}{n}$$

where

X = number of units that possess the characteristic of interest
n = sample size

If no specific difference between the proportions is hypothesized, the hypotheses for testing
the difference between two population proportions p1 and p2 can be stated as

H0: p1 = p2      or      H0: p1 − p2 = 0
H1: p1 ≠ p2              H1: p1 − p2 ≠ 0

Similar statements using < or > in the alternative hypothesis can be formed for one-tailed tests.

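FORMULA FOR THE Z TEST VALUE FOR COMPARING TWO PROPORTIONS

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} \qquad \bar{q} = 1 - \bar{p} \qquad \hat{p}_1 = \frac{X_1}{n_1} \qquad \hat{p}_2 = \frac{X_2}{n_2}$$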
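A sketch of this formula, assuming the pooled estimate p̄ is used in the standard error as shown. The name `two_proportion_z` and the `p_diff` parameter (the hypothesized p1 − p2) are illustrative; the function takes the counts X1, X2 and sample sizes n1, n2 and computes p̂1, p̂2, p̄, and q̄ internally.

```python
import math


def two_proportion_z(x1, n1, x2, n2, p_diff=0.0):
    """z test value for the difference between two proportions (pooled estimate)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled (weighted) proportion
    q_bar = 1 - p_bar
    standard_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return ((p1_hat - p2_hat) - p_diff) / standard_error


# Nursing-home example from this lesson: 12 of 34 small homes, 17 of 24 large homes.
z = two_proportion_z(12, 34, 17, 24)
print(round(z, 2))  # -2.67
```

Because it does not round p̂1 and p̂2 to two decimals, the function gives about −2.67 rather than the −2.70 in the worked solution; both fall below −1.96, so the decision does not change.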
Before you can test the difference between two sample proportions, the following
assumptions must be met.
ASSUMPTIONS FOR THE Z TEST FOR TWO PROPORTIONS
1. The samples must be random samples.
2. The sample data are independent of one another.
3. For both samples np ≥ 5 and nq ≥ 5.
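Assumption 3 can be checked before running the test. This sketch assumes the sample proportion X/n stands in for the unknown p (and 1 − X/n for q), which is an assumption of the sketch rather than a rule stated above; the function name `large_sample_ok` is illustrative.

```python
def large_sample_ok(x, n, minimum=5):
    """Check assumption 3 (n*p >= 5 and n*q >= 5) for one sample.

    Estimating p by X/n makes n*p equal to X and n*q equal to n - X,
    so the integer counts are compared directly and no rounding can creep in.
    """
    return x >= minimum and (n - x) >= minimum


# Nursing-home example: both samples must pass.
print(large_sample_ok(12, 34), large_sample_ok(17, 24))  # True True
```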

EXAMPLE:
In the nursing home study mentioned in the chapter-opening Statistics Today, the
researchers found that 12 out of 34 randomly selected small nursing homes had a resident
vaccination rate of less than 80%, while 17 out of 24 randomly selected large nursing homes had
a vaccination rate of less than 80%. At alpha= 0.05, test the claim that there is no difference in
the proportions of the small and large nursing homes with a resident vaccination rate of less than
80%.
Solution:
Step 1 State the hypotheses and identify the claim.
H0: p1 = p2 (claim)

H1: p1 ≠ p2

Step 2 Find the critical values.


Since α = 0.05 and the test is two-tailed, the critical values are +1.96 and −1.96.
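Step 3 Compute the test value.
First compute p̂1, p̂2, p̄, and q̄. Then substitute in the formula.

$$\hat{p}_1 = \frac{X_1}{n_1} = \frac{12}{34} = 0.35 \qquad \hat{p}_2 = \frac{X_2}{n_2} = \frac{17}{24} = 0.71$$

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{12 + 17}{34 + 24} = \frac{29}{58} = 0.5 \qquad \bar{q} = 1 - \bar{p} = 1 - 0.5 = 0.5$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.35 - 0.71) - 0}{\sqrt{(0.5)(0.5)\left(\dfrac{1}{34} + \dfrac{1}{24}\right)}} = \frac{-0.36}{0.1333} = -2.70$$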
Step 4 Make the decision
Reject the null hypothesis, since -2.70 < -1.96.
Step 5 Summarize the results.
There is enough evidence to reject the claim that there is no difference in the proportions
of small and large nursing homes with a resident vaccination rate of less than 80%.

Read:
• Correlation Coefficient
Bluman, A. G. (2012). Descriptive and Inferential Statistics. In Bluman, A. G.,
ELEMENTARY STATISTICS: A STEP BY STEP APPROACH, EIGHTH EDITION. New
York: McGraw-Hill Education.
