Stat


Would you change the channel?
A survey by a well-known organization found that 45% of the people who were offended by a television program would change the channel, while 15% would turn off their television sets. The survey further stated that the margin of error is 3 percentage points, and that 4,000 adults were interviewed.
Several questions arise:
1. How do these estimates compare
with the true population percentage?
2. What is meant by a margin of
error of 3 percentage points?
3. Is the sample of 4000 large
enough to represent the population
of all adults who watch television in
the Philippines?

STATISTICAL INFERENCE:
Estimation
Estimation
Is a process of estimating the
value of a parameter from
information obtained from a
sample
Two Types of Estimates
Point Estimates
Interval Estimates
Point Estimate
is a specific numerical value used to estimate a parameter.
Interval Estimate
is an interval or range of values used
to estimate the parameter. This
estimate may or may not contain the
value of the parameter being
estimated.
Three Properties of a Good Estimator
1. The estimator should be an unbiased estimator; that is, its expected value equals the parameter being estimated.
2. The estimator should be consistent. For a consistent estimator, as the sample size increases, the value of the estimator approaches the value of the parameter being estimated.
3. The estimator should be a relatively efficient estimator; that is, among comparable estimators it has the smallest variance.
Confidence level
Is the degree of assurance
that a particular statistical
statement is correct, under
specified conditions.

Confidence Interval
is a specific interval estimate of a parameter, determined by using data obtained from a sample and a specified confidence level.
Significance Level
is the degree of uncertainty about the statistical statement under the same conditions used to determine the confidence level.
Significance levels are symbolized by α (alpha).
Mathematically,

Confidence level + Significance level = 1, that is, (1 - α) + α = 1
Confidence Intervals
are used to estimate a range of possible values of a parameter rather than a single value. When you use a confidence interval instead of a point estimator, you lose a degree of precision, but you gain a large degree of confidence.
In general form:

lower limit < parameter < upper limit

where:

lower limit = point estimator - error of estimate
upper limit = point estimator + error of estimate
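Formula for the confidence interval of the mean for a specific α (σ known, or n ≥ 30):

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is the maximum error of estimate, E.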
Maximum error of estimate
Is the maximum likely
difference between the point
estimate of a parameter and
the actual value of the
parameter
Examples:
1. A researcher wishes to estimate the average amount of money a person spends on lottery tickets each month. A sample of 50 people who play the lottery found the mean to be 19 dollars and the standard deviation to be 6.8 dollars. Find the best point estimate of the population mean and the 95% confidence interval of the population mean.
2. A survey of 30 adults found that the mean age of a person's primary vehicle is 5.6 years. Assuming the standard deviation of the population is 0.8 year, find the best point estimate of the population mean and the 99% confidence interval of the population mean.
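Both examples can be checked with a short computation. The following is a minimal sketch that assumes Python 3.8+ and uses the standard-library `statistics.NormalDist` to get z_(α/2); the helper name `z_interval` is hypothetical. It uses exact normal quantiles, so the last decimal may differ slightly from a hand computation with table values such as 1.96 or 2.58.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Confidence interval for a population mean based on the z distribution.

    Assumes sigma is known, or that n >= 30 so the sample standard
    deviation can stand in for sigma.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2): about 1.96 for 95%
    error = z * sd / sqrt(n)                  # maximum error of estimate E
    return mean - error, mean + error

# Example 1: lottery spending, x_bar = 19, s = 6.8, n = 50, 95% confidence
lo1, hi1 = z_interval(19, 6.8, 50, 0.95)
print(f"Example 1: {lo1:.2f} < mu < {hi1:.2f}")

# Example 2: vehicle age, x_bar = 5.6, sigma = 0.8, n = 30, 99% confidence
lo2, hi2 = z_interval(5.6, 0.8, 30, 0.99)
print(f"Example 2: {lo2:.2f} < mu < {hi2:.2f}")
```

The best point estimates are the sample means (19 dollars and 5.6 years). The intervals come out to about 17.12 < μ < 20.88 and 5.22 < μ < 5.98.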
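Formula for the confidence interval of the mean for a specific α (σ unknown and n < 30):

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

The degrees of freedom (df) are n - 1.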
STATISTICAL HYPOTHESIS
TESTING

How much better is better?
Suppose a school superintendent reads an article which states that the overall mean entrance exam score is 85. Furthermore, suppose that, for a sample of students, the average of the entrance exam scores in the superintendent's school district is 88. Can the superintendent conclude that the students in his school district scored higher than the average?
Questions arise:
Is there a real difference in the
means?
Is the difference simply due to
chance?
Statistical Hypothesis
is an assertion or conjecture
concerning one or more
populations. This conjecture
may or may not be true.


Types of Hypothesis
1. Null Hypothesis (Ho) - is the hypothesis that is being tested; it represents what the experimenter doubts to be true.
2. Alternative Hypothesis (Ha) - is the operational statement of the theory that the experimenter believes to be true and wishes to prove. It is the contradiction of the null hypothesis. It specifies the existence of a difference or a relationship. It may be non-directional (a difference in either direction) or directional (a difference in one stated direction).

Illustration of how hypotheses
should be stated:
Situation A: A medical researcher is interested in finding out whether a new medication will have any undesirable side effects. The researcher is particularly concerned with the pulse rate of the patients who take the medication. Will the pulse rate increase, decrease, or remain unchanged after a patient takes the medication?
Since the researcher knows that the mean pulse rate for the population under study is 82 beats per minute, the hypotheses for this situation are

Ho: μ = 82
Ha: μ ≠ 82

The null hypothesis specifies that
the mean will remain unchanged,
and the alternative states that it
will be different. This test is called
TWO-TAILED TEST
Situation B: A chemist invents an additive to increase the life of an automobile battery. If the mean lifetime of the automobile battery without the additive is 36 months, then the hypotheses are:

Ho: μ = 36
Ha: μ > 36

In this situation, the chemist is interested only in increasing the lifetime of the batteries, so her alternative hypothesis is that the mean is greater than 36 months. This test is called a RIGHT-TAILED TEST.
Situation C: A contractor wishes to lower heating bills by using a special type of insulation in houses. If the average of the monthly heating bills is 500 pesos, her hypotheses about heating costs with the use of insulation are:

Ho: μ = 500
Ha: μ < 500

This test is called LEFT-TAILED
TEST
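[Figure: rejection regions for a two-tailed test, a right-tailed test, and a left-tailed test]

Summary (k = hypothesized value of the population mean):
Two-tailed test: Ho: μ = k, Ha: μ ≠ k
Right-tailed test: Ho: μ = k, Ha: μ > k
Left-tailed test: Ho: μ = k, Ha: μ < k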
Exercises:
State the null and alternative
hypotheses for each
conjecture.
A. A researcher thinks that if expectant mothers use vitamin pills, the birth weight of the babies will increase. The average birth weight of the population is 8.6 pounds.
B. An engineer hypothesizes
that the mean number of
defects can be decreased in a
manufacturing process of
compact disks by using robots
instead of humans for certain
tasks. The mean number of
defective disks per 1000 is 18.
C. A psychologist feels that playing soft music during a test will change the results of the test. The psychologist is not sure whether the grades will be higher or lower. In the past, the mean of the scores was 73.
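Solution:
A. Ho: μ = 8.6, Ha: μ > 8.6 (right-tailed)
B. Ho: μ = 18, Ha: μ < 18 (left-tailed)
C. Ho: μ = 73, Ha: μ ≠ 73 (two-tailed)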
Test Statistic - is a statistic whose value is calculated from sample measurements and on which the statistical decision will be based.

Types of Error
1. Type I Error - is the error made by rejecting the null hypothesis when it is true. The probability of a Type I error is denoted by α (alpha).
2. Type II Error - is the error made by accepting (not rejecting) the null hypothesis when it is false. The probability of a Type II error is denoted by β (beta).


Level of Significance (α) is the maximum probability of committing a Type I error that the researcher is willing to accept.
Three commonly used levels:
a. 0.10
b. 0.05
c. 0.01


Critical Value (C.V.) separates the critical region from the noncritical region.
Critical Region or Rejection Region -
is the set of values of the test statistic
for which the null hypothesis will be
rejected. The acceptance region is the
set of values of the test statistic for
which the null hypothesis will not be
rejected. The acceptance and rejection
regions are separated by a critical
value of the test statistic.

Finding Critical values:
Find the critical value(s) for each situation and draw the appropriate figure, showing the critical region.
a. A left-tailed test with α = 0.10
b. A two-tailed test with α = 0.02
c. A right-tailed test with α = 0.005
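The critical z values for these three exercises can also be computed from the inverse of the standard normal CDF instead of a printed table. This is a minimal sketch assuming Python 3.8+ (`statistics.NormalDist`). The function name `z_critical` and its `tail` labels are hypothetical. A two-tailed test splits α equally between the two tails.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Critical z value(s) for significance level alpha.

    tail is "left", "right", or "two"; a two-tailed test puts alpha / 2
    in each tail and returns the pair (-z, +z).
    """
    std = NormalDist()   # standard normal: mean 0, standard deviation 1
    if tail == "left":
        return std.inv_cdf(alpha)
    if tail == "right":
        return std.inv_cdf(1 - alpha)
    if tail == "two":
        z = std.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(round(z_critical(0.10, "left"), 2))                 # exercise a
print([round(v, 2) for v in z_critical(0.02, "two")])     # exercise b
print(round(z_critical(0.005, "right"), 2))               # exercise c
```

Rounded to two decimals, this gives -1.28, ±2.33, and +2.58. Tables often list the last one as 2.575.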
Factors to Consider in Selecting Statistical Tests
Each test is appropriate under certain conditions.
When selecting a test, consider four factors:
the structure of the null hypothesis
the level of measurement allowed, or required, by the test
the sample size
the distribution of the responses (whether the distribution is normal or not)

Steps in Hypothesis Testing

1. Formulate the hypotheses and identify the claim.
2. Determine the critical value.
3. Determine the computed value of the test statistic from the given conditions.
4. Make a decision. In making a decision, we compare the computed value with the critical value. There are two possibilities:
If the computed value does not fall in the critical region (for example, in a right-tailed test, it is less than the critical value), we accept (do not reject) the null hypothesis and reject the alternative hypothesis.
If the computed value falls in the critical region (in a right-tailed test, it is greater than the critical value; in a left-tailed test, it is less than the negative critical value; in a two-tailed test, it lies beyond either critical value), we reject the null hypothesis and accept the alternative hypothesis.
5. Summarize the results.
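The decision in step 4 depends on the direction of the test. The sketch below is one way to encode that rule. The function `decide` and its `tail` labels are hypothetical. It treats a computed value exactly equal to the critical value as outside the critical region. That is a convention assumed here, not a rule stated in the slides.

```python
def decide(computed, critical, tail):
    """Step 4: compare the computed test statistic with the critical value.

    tail is "right", "left", or "two". For a left-tailed test, pass the
    (negative) critical value read from the table. For a two-tailed test,
    either sign works, because only its absolute value is used.
    """
    if tail == "right":
        in_critical_region = computed > critical
    elif tail == "left":
        in_critical_region = computed < critical
    elif tail == "two":
        in_critical_region = abs(computed) > abs(critical)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return "reject Ho" if in_critical_region else "do not reject Ho"

print(decide(2.10, 1.65, "right"))   # computed lies beyond +1.65
print(decide(-0.50, -1.28, "left"))  # computed does not lie beyond -1.28
print(decide(-3.00, 1.96, "two"))    # |computed| lies beyond 1.96
```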
Types of Statistical Test
Z Test
T- Test
Chi-Square Analysis
ANOVA
Correlation Coefficient

Z Test
The simplest and most common test of the significance of sample data. The application of the Z test requires a normal distribution. The sample size should be greater than or equal to 30. This test is one of the parametric tests, since it uses the two population parameters μ and σ. If the population standard deviation is not known, the sample standard deviation can be used. The Z test can be applied in two ways:

One Sample Mean Test

Formula:

$$Z_{computed} = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$$

where: X̄ = sample mean
μ = hypothesized value of the population mean
σ = population standard deviation
n = sample size
Example:
1. A researcher reports that the average salary of assistant professors is more than 42,000 dollars. A sample of 30 assistant professors has a mean salary of 43,260 dollars. At α = 0.05, test the claim that assistant professors earn more than 42,000 dollars a year. The standard deviation of the population is 5,230 dollars.
Solution:
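Step 1: Ho: μ = 42,000 and Ha: μ > 42,000 (claim)

Step 2: Since α = 0.05 and the test is a right-tailed test, the critical value is z = +1.65.

Step 3: z = (43,260 - 42,000)/(5,230/√30) ≈ 1,260/954.9 ≈ 1.32

Step 4: Since the computed value 1.32 is less than the critical value 1.65, it does not fall in the critical region, so the null hypothesis is not rejected.
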
Step 5: There is not enough evidence
to support the claim that assistant
professors earn more than 42,000
dollars a year.
Example:
2. The Medical Rehabilitation Education Foundation reports that the average cost of rehabilitation for stroke victims is 24,672 dollars. To see if the average cost of rehabilitation is different at a particular hospital, a researcher selects a random sample of 35 stroke victims at the hospital and finds that the average cost of their rehabilitation is 25,226 dollars. The standard deviation of the population is 3,251 dollars. At α = 0.01, can it be concluded that the average cost of stroke rehabilitation at that hospital is different from 24,672 dollars?
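The one-sample z formula can be checked on both examples with the sketch below. It assumes the critical values from the z table (+1.65 for the right-tailed test at α = 0.05 and ±2.575 for the two-tailed test at α = 0.01). The function name `z_one_sample` is hypothetical.

```python
from math import sqrt

def z_one_sample(sample_mean, mu0, sigma, n):
    """Computed z for the one-sample mean test: (x_bar - mu0) * sqrt(n) / sigma."""
    return (sample_mean - mu0) * sqrt(n) / sigma

# Example 1: Ha: mu > 42,000, right-tailed, alpha = 0.05, critical value +1.65
z1 = z_one_sample(43260, 42000, 5230, 30)
decision1 = "reject Ho" if z1 > 1.65 else "do not reject Ho"
print(f"Example 1: z = {z1:.2f}, {decision1}")

# Example 2: Ha: mu != 24,672, two-tailed, alpha = 0.01, critical values +/-2.575
z2 = z_one_sample(25226, 24672, 3251, 35)
decision2 = "reject Ho" if abs(z2) > 2.575 else "do not reject Ho"
print(f"Example 2: z = {z2:.2f}, {decision2}")
```

Example 1 gives z ≈ 1.32 and Example 2 gives z ≈ 1.01. Neither falls in its critical region, so in Example 2 there is not enough evidence to conclude that the hospital's average cost differs from 24,672 dollars.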
Two Sample Mean Test.

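Formula:

$$Z_{computed} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where: x̄₁, x̄₂ = the means of sample 1 and sample 2
σ₁² = the variance of sample 1
σ₂² = the variance of sample 2
n₁ = size of sample 1
n₂ = size of sample 2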
Critical Values of Z at Different Levels of Significance

Test type     Level of significance
              .01     .025    .05     .10
One-tailed    2.33    1.96    1.645   1.28
Two-tailed    2.575   2.24    1.96    1.645
Example:

1. A supplier sells ropes. He claims that the ropes have a mean strength of 34 lbs with a variance of 64 lbs² (a standard deviation of 8 lbs). A random sample of 32 ropes selected from a shipment yields a mean strength of 31 lbs. Are you going to reject the claim of the supplier at the 0.05 level?
2. An admission test was administered to incoming freshmen in two colleges. Two independent samples of 150 students each are randomly selected, and the mean scores of the two samples are 88 and 85. Assume that the variances of the test scores are 40 and 35, respectively. Is the difference between the mean scores significant, or can it be attributed to chance? Use the 0.01 level of significance.
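The two examples can be worked as follows: the rope example is a one-sample test and the admission example is a two-sample test. This is a sketch with hypothetical function names. The rope problem does not state the direction of the test, so it is treated here as two-tailed (critical values ±1.96). That is an assumption. The second example uses the two-tailed value ±2.575 from the table above.

```python
from math import sqrt

def z_one_sample(sample_mean, mu0, sigma, n):
    """Computed z for the one-sample mean test."""
    return (sample_mean - mu0) * sqrt(n) / sigma

def z_two_sample(mean1, mean2, var1, var2, n1, n2):
    """Computed z for the two-sample mean test with known variances."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)

# Example 1: claimed mean 34 lbs, variance 64 (sigma = 8), n = 32,
# assumed two-tailed at alpha = 0.05 (critical values +/-1.96)
z_rope = z_one_sample(31, 34, sqrt(64), 32)
print(f"Example 1: z = {z_rope:.2f}, reject Ho: {abs(z_rope) > 1.96}")

# Example 2: n = 150 per college, variances 40 and 35,
# two-tailed at alpha = 0.01 (critical values +/-2.575)
z_adm = z_two_sample(88, 85, 40, 35, 150, 150)
print(f"Example 2: z = {z_adm:.2f}, reject Ho: {abs(z_adm) > 2.575}")
```

The rope test gives z ≈ -2.12, which leads to rejecting the supplier's claim. It would also be rejected as a left-tailed test, with critical value -1.645. The admission test gives z ≈ 4.24, so the difference in mean scores is significant rather than due to chance.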


T-test

When the sample is small (n < 30) and only the sample variance is known, use the t-test. The t-test involves the degrees of freedom of the distribution. The degrees of freedom (df) vary according to the particular type of t-test used.
Degrees of Freedom (df)
are the number of values that are free to vary after a sample statistic has been computed. They tell the researcher which specific curve to use when a distribution consists of a family of curves.

One Sample mean test

Formula:

$$t_{computed} = \frac{(\bar{X} - \mu)\sqrt{n}}{s}$$

where: s = sample standard deviation
df = n - 1
Steps in Hypothesis Testing
State the hypotheses and identify the claim.
Find the critical value(s).
Compute the test value.
Make the decision to reject or not reject the null hypothesis.
Summarize the results.
Examples:
1. A job placement director claims that the average starting salary for nurses is 24,000 dollars. A sample of 10 nurses' salaries has a mean of 23,450 dollars and a standard deviation of 400 dollars. Is there enough evidence to reject the director's claim at α = 0.05?
Solution:
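Step 1: Ho: μ = 24,000 (claim) and Ha: μ ≠ 24,000

Step 2: The critical values are +2.262 and -2.262 for α = 0.05 and d.f. = 9.

Step 3: t = (23,450 - 24,000)/(400/√10) ≈ -550/126.5 ≈ -4.35

Step 4: Since -4.35 is less than -2.262, it falls in the critical region, so the null hypothesis is rejected.
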
Step 5: There is enough evidence to reject the claim that the starting salary of nurses is 24,000 dollars.

Examples:
2. An educator claims that the average salary of substitute teachers in a school district is less than 60 dollars per day. A random sample of eight school districts is selected, and the daily salaries (in dollars) are shown. Is there enough evidence to support the educator's claim at α = 0.10?

60 56 60 55 70 55 60 55
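The following sketch computes the t statistic for both examples. It uses Python's `statistics.mean` and `statistics.stdev`. `stdev` divides by n - 1, so it gives the sample standard deviation s. Python's standard library has no t-distribution quantile function, so the critical values are assumed to come from a t table, as in the slides. The helper `t_one_sample` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_one_sample(sample_mean, mu0, s, n):
    """Computed t for the one-sample mean test; df = n - 1."""
    return (sample_mean - mu0) * sqrt(n) / s

# Example 1: nurses' salaries, two-tailed, alpha = 0.05, df = 9, critical values +/-2.262
t1 = t_one_sample(23450, 24000, 400, 10)
print(f"Example 1: t = {t1:.2f}, reject Ho: {abs(t1) > 2.262}")

# Example 2: raw daily salaries, Ha: mu < 60, left-tailed, alpha = 0.10, df = 7
salaries = [60, 56, 60, 55, 70, 55, 60, 55]
x_bar = mean(salaries)      # sample mean
s = stdev(salaries)         # sample standard deviation (divides by n - 1)
t2 = t_one_sample(x_bar, 60, s, len(salaries))
print(f"Example 2: x_bar = {x_bar:.3f}, s = {s:.2f}, t = {t2:.2f}")
```

For Example 2, x̄ = 58.875, s ≈ 5.08, and t ≈ -0.63. The left-tailed critical value for df = 7 at α = 0.10 is -1.415. Since -0.63 is not less than -1.415, there is not enough evidence to support the educator's claim.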

Two Sample Mean test

Formula:

$$t_{computed} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where: df = n₁ + n₂ - 2
Exercises:

1. ABC company, a manufacturer of automobile tires, claims that the average life of its product is 45,600 miles. A random sample of 15 tires was chosen, and it resulted in a mean life of 43,500 miles with a standard deviation of 3,000 miles.
2. It is claimed that the mean drying time of a certain brand of nail polish is less than or equal to 25 minutes. Would you agree with this claim if a random sample of 16 bottles shows a mean drying time of 26 minutes with a standard deviation of 2.4 minutes, using the 0.01 level of significance?
3. A random sample of 25 cartons of a certain brand of powdered milk showed a mean content of 237 grams with a standard deviation of 8.56 grams, while a sample of 20 cartons of another brand of powdered milk showed a mean content of 240 grams with a standard deviation of 9.75 grams. Using the 0.05 level of significance, is there a difference in the mean content of the two brands of powdered milk?
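Exercise 3 uses the two-sample t formula with a pooled variance. This form assumes that the two populations have equal variances. The sketch below computes the statistic and its degrees of freedom. The function name `t_pooled` is hypothetical. The critical value is again assumed to come from a t table.

```python
from math import sqrt

def t_pooled(mean1, mean2, s1, s2, n1, n2):
    """Computed t and df for the two-sample mean test with a pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Exercise 3: brand A n = 25, mean 237 g, s = 8.56 g; brand B n = 20, mean 240 g, s = 9.75 g
t3, df3 = t_pooled(237, 240, 8.56, 9.75, 25, 20)
print(f"t = {t3:.2f}, df = {df3}")
```

This gives t ≈ -1.10 with df = 43. The two-tailed critical values at α = 0.05 are about ±2.02. The computed value is not in the critical region, so the data do not show a significant difference between the two brands.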

CHI-SQUARE TEST

The objective of the chi-square test is to compare the observed sample frequencies with the expected frequencies. As in the case of the t-test, the tabular (critical) value of the chi-square statistic depends on two factors: the level of significance and the degrees of freedom. The level of significance in this test need not be divided by two.

TEST FOR INDEPENDENCE

The test for independence is used to determine whether two variables are related or not. Since two variables are involved, the frequencies are entered in a bivariate table, or contingency table. The dimension of such a table is defined by the expression r x c, where r indicates the number of rows and c indicates the number of columns. If the null hypothesis of independence is rejected, then a relationship between the two variables exists.
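Formula:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Where:
O_ij = observed number of cases in the ith row of the jth column
E_ij = expected number of cases under Ho, computed as (ith row total)(jth column total)/(grand total)

df = (r - 1)(c - 1)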
Note:

The test is valid if at least 80% of the cells have expected frequencies of at least 5 and no cell has an expected frequency less than 1.
If many expected frequencies are very small,
researchers commonly combine categories of
variables to obtain a table having larger cell
frequencies. Generally, one should not pool
categories unless there is a natural way to combine
them.
For a 2 x 2 contingency table, a correction called Yates' correction for continuity is applied. The formula then becomes:

$$\chi^2 = \sum_{i}\sum_{j}\frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}}$$
Example:

A survey was conducted to determine whether gender and age are related among stereo shop customers. A total of 200 respondents were surveyed, and the results are presented below. Conduct a test of whether the gender and age of stereo shop customers are independent at the 1% level of significance.

                 Gender
Age              Male   Female   Total
Under 30           60       50     110
30 and over        80       10      90
TOTAL             140       60     200
Test whether a person's music preference is related to his intelligence, as measured by IQ, at the 5% level of significance. The observed frequencies are presented below.

                   IQ
Music Preference   High   Medium   Low   Total
Classical            40       26    17      83
Pop                  47       59    25     131
Rock                 83      104    79     266
TOTAL               170      189   121     480
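Both contingency tables can be tested with one function. The sketch below assumes expected counts of (row total × column total) / grand total and applies Yates' correction only when asked. The stereo table is 2 x 2, so the correction is applied there. The function name `chi_square_independence` is hypothetical. Critical values still come from a chi-square table.

```python
def chi_square_independence(observed, yates=False):
    """Chi-square statistic and df for an r x c table of observed counts.

    Each expected count is (row total * column total) / grand total.
    With yates=True, Yates' continuity correction (|O - E| - 0.5) is used.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            diff = abs(o - e) - 0.5 if yates else o - e
            chi2 += diff ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Stereo shop: rows = age (under 30, 30 and over), columns = gender (male, female)
stereo = [[60, 50], [80, 10]]
chi2_stereo, df_stereo = chi_square_independence(stereo, yates=True)

# Music preference: rows = (classical, pop, rock), columns = IQ (high, medium, low)
music = [[40, 26, 17], [47, 59, 25], [83, 104, 79]]
chi2_music, df_music = chi_square_independence(music)

print(f"Stereo: chi2 = {chi2_stereo:.2f}, df = {df_stereo}")
print(f"Music:  chi2 = {chi2_music:.2f}, df = {df_music}")
```

The stereo table gives χ² ≈ 26.19 with df = 1, which exceeds the critical value 6.635 at α = 0.01. The music table gives χ² ≈ 12.42 with df = 4, which exceeds 9.488 at α = 0.05. Independence is rejected in both cases.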
Correlational Analysis
You are interested in testing the null
hypothesis that two variables are not
correlated.
Both variables are at the interval level
of measurement or higher.
A normal distribution of responses is
not required.

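FORMULAS
Pearson r

$$r = \frac{\dfrac{\sum XY}{N} - \left(\dfrac{\sum X}{N}\right)\left(\dfrac{\sum Y}{N}\right)}{\sqrt{\left[\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2\right]\left[\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2\right]}}$$

Where:
X = the scores in the first test
Y = the scores in the second test
N = the number of examinees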
Interpretation of the Pearson r
0.90 to 1.00 (-0.90 to -1.00)   Very high positive/negative correlation
0.70 to 0.90 (-0.70 to -0.90)   High positive/negative correlation
0.50 to 0.70 (-0.50 to -0.70)   Moderate positive/negative correlation
0.30 to 0.50 (-0.30 to -0.50)   Low positive/negative correlation
0.00 to 0.30 (0.00 to -0.30)    Little, if any, correlation
To determine whether the obtained correlation coefficient is significant, that is, whether a real correlation exists and the obtained r is not merely due to sampling variation, a t-test for the significance of r can be used.
FORMULA:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

df = n - 2
Where: r = the obtained Pearson r
n = sample size
Example:
A study was made to determine the relationship between the grade in Calculus and the grade in the Fortran computer language. A random sample of 10 computer students in a certain university was taken, with the following results. Is the relationship significant at the 0.05 level?

Student no.    1    2    3    4    5    6    7    8    9   10
Calculus (x)  75   83   80   77   89   78   92   86   93   84
Fortran (y)   78   87   78   76   92   81   89   89   91   84
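The sketch below works through the Calculus/Fortran example. It computes Pearson r in the mean-of-products form (ΣXY/N minus the product of the means, over the square root of the two variance terms) and then the t statistic for its significance. The function names are hypothetical. The critical value is assumed to come from a t table.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r in the mean-of-products form."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    mean_x2 = sum(a * a for a in x) / n
    mean_y2 = sum(b * b for b in y) / n
    numerator = mean_xy - mean_x * mean_y
    return numerator / sqrt((mean_x2 - mean_x**2) * (mean_y2 - mean_y**2))

def t_for_r(r, n):
    """t statistic for testing the significance of r; df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)

calculus = [75, 83, 80, 77, 89, 78, 92, 86, 93, 84]
fortran = [78, 87, 78, 76, 92, 81, 89, 89, 91, 84]
r = pearson_r(calculus, fortran)
t = t_for_r(r, len(calculus))
print(f"r = {r:.3f}, t = {t:.2f}, df = {len(calculus) - 2}")
```

This gives r ≈ 0.907, which is a very high positive correlation on the interpretation scale, and t ≈ 6.09 with df = 8. That value exceeds the two-tailed critical value 2.306 at α = 0.05, so the relationship is significant.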
Analysis of Variance (ANOVA)
Used to test the null hypothesis that the means of more than two samples are the same.
Very similar to the t-test (the t-test is in fact a special case of ANOVA: with two groups, F = t²).
Used to compare the means of more than two groups.
Can be used with small samples.
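To show how one-way ANOVA compares group means, and how it relates to the t-test, the sketch below computes the F statistic (mean square between groups over mean square within groups). It then checks that F equals t² for two groups. The scores are hypothetical and are not from the slides. The function names are also hypothetical.

```python
from math import sqrt

def one_way_anova_f(groups):
    """F statistic for one-way ANOVA, with its two degrees of freedom."""
    values = [v for g in groups for v in g]
    k, n_total = len(groups), len(values)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)        # df_between = k - 1
    ms_within = ss_within / (n_total - k)    # df_within = N - k
    return ms_between / ms_within, k - 1, n_total - k

def pooled_t(g1, g2):
    """Two-sample pooled t statistic, as in the two-sample t-test."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
    pooled_var = ss / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical scores for three groups (illustration only)
a, b, c = [4, 5, 6], [6, 7, 8], [8, 9, 10]
f3, df_b, df_w = one_way_anova_f([a, b, c])
f2, _, _ = one_way_anova_f([a, b])
t2 = pooled_t(a, b)
print(f"F (3 groups) = {f3:.2f} with df = ({df_b}, {df_w})")
print(f"F (2 groups) = {f2:.4f}, t^2 = {t2 ** 2:.4f}")
```

The computed F is compared with a critical value from an F table for (k - 1, N - k) degrees of freedom.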
