Chi Square Test


Sampling Fundamentals and Chi-Square Test


RESEARCH METHODOLOGY
GROUP NO 3
Definition
Sample: a subgroup of the population you are interested in.
Sampling: the process of selecting a few units (a sample) from a bigger group, the sampling population, to become the basis for estimating or predicting the prevalence of an unknown piece of information, situation or outcome regarding the bigger group.
The Sampling Process
1. Identifying the Target Population
2. Determining the Sampling Frame (reconciling the differences between the population and the sampling frame)
3. Selecting a Sampling Procedure (probability sampling or non-probability sampling)
4. Determining the Relevant Sample Size
5. Executing Sampling
6. Data Collection From Respondents (handling the non-response problem)
7. Information for Decision-Making
Important Statistical Terms
Population: a set which includes all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest).
Sample: a subset of the population.

Why Sampling?
To get information about large populations
Lower cost
Less field time
More accuracy, i.e. a better job of data collection can be done
When it's impossible to study the whole population

Target population: the population to be studied, to which the investigator wants to generalize his results.
Sampling unit: the smallest unit from which a sample can be selected.
Sampling frame: the list of all the sampling units from which the sample is drawn.
Sampling scheme: the method of selecting sampling units from the sampling frame.
Types Of Sampling
Non-probability samples
Probability samples
Non Probability Samples
Convenience samples (ease of access): the sample is selected from elements of a population that are easily accessible.
Snowball sampling (friend of a friend, etc.)
Purposive sampling (judgemental): you choose who you think should be in the study.
Quota sample
The probability of being chosen is unknown.
Cheaper, but the results cannot be generalised.
Potential for bias.
Probability Samples
Random sampling: each subject has a known probability of being selected.
Allows application of statistical sampling theory to the results in order to generalise and to test hypotheses.
Conclusions: probability samples are the best; they ensure representativeness and precision.
Methods Used In Probability Samples
Simple random sampling
Systematic sampling
Stratified sampling
Multi-stage sampling
Cluster sampling
Simple Random Sampling
Table Of Random Numbers
6 8 4 2 5 7 9 5 4 1 2 5 6 3 2 1 4 0
5 8 2 0 3 2 1 5 4 7 8 5 9 6 2 0 2 4
3 6 2 3 3 3 2 5 4 7 8 9 1 2 0 3 2 5
9 8 5 2 6 3 0 1 7 4 2 4 5 0 3 6 8 6
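A table of random numbers can be replaced by a pseudo-random generator. The following is a minimal sketch, assuming the sampling frame is a numbered list of units; the function name `simple_random_sample`, the frame of 100 units, and the seed are hypothetical choices made only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each unit equally likely."""
    rng = random.Random(seed)      # a seed makes the draw repeatable
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: units numbered 1 to 100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```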

Systematic Sampling
Sampling fraction: the ratio between sample size and population size.
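As a rough illustration of systematic sampling, the sketch below takes every k-th unit after a random start. It assumes the common convention that the sampling interval k is the inverse of the sampling fraction (population size divided by sample size, rounded down); the function name and the frame of 100 units are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit from the frame."""
    k = len(frame) // n                 # sampling interval = 1 / sampling fraction
    rng = random.Random(seed)
    start = rng.randrange(k)            # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of 100 units; sampling fraction 10/100, so k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```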
Cluster Sampling
Cluster: a group of sampling units close to each other, i.e. crowding together in the same area or neighborhood.
[Figure: cluster sampling, with the area divided into Section 1 to Section 5]
Stratified Sampling
Stratification: the elements in the population are divided into layers/groups/strata based on their values on one or several auxiliary variables. The strata must be non-overlapping and together constitute the whole population.
Sampling within strata: samples are selected independently from each stratum. Different selection methods can be used in different strata.
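Stratification followed by independent sampling within each stratum can be sketched as follows. This is only an illustration: it assumes simple random sampling inside every stratum and proportional allocation (the same fraction from each stratum), neither of which is prescribed here, and the urban/rural units are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then sample each stratum independently."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # non-overlapping groups
    sample = {}
    for name, members in strata.items():
        size = max(1, round(fraction * len(members)))   # assumed proportional allocation
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical population: 60 urban and 40 rural units.
units = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(units, lambda unit: unit[0], 0.1, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})
```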
Multi-stage Sampling
A complex form of cluster sampling in which two or more levels of units are embedded one in the other.
First stage: a random selection of districts is chosen in all states.
Second stage: a random selection of talukas and villages follows.
Third stage: the units are houses.
All ultimate units (houses, for instance) selected at the last step are surveyed.
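The staged selection can be sketched with nested random draws. The example below is a simplified illustration, assuming three levels only (districts, villages, houses) and a fixed number of units drawn at each stage; the nested dictionary and all names in it are hypothetical.

```python
import random

def multistage_sample(districts, n_districts, n_villages, n_houses, seed=None):
    """Draw districts, then villages within them, then houses within those."""
    rng = random.Random(seed)
    surveyed = []
    for d in rng.sample(sorted(districts), n_districts):        # stage 1
        villages = districts[d]
        for v in rng.sample(sorted(villages), n_villages):      # stage 2
            surveyed.extend(rng.sample(villages[v], n_houses))  # stage 3
    return surveyed

# Hypothetical frame: 4 districts, 5 villages each, 20 houses per village.
districts = {
    f"district-{d}": {
        f"village-{d}-{v}": [f"house-{d}-{v}-{h}" for h in range(20)]
        for v in range(5)
    }
    for d in range(4)
}
houses = multistage_sample(districts, 2, 2, 3, seed=7)
print(len(houses))  # 2 districts x 2 villages x 3 houses = 12
```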
Errors in Sampling
Non-Sampling Errors
These are errors that arise during the course of all data collection activities.
Characteristics of non-sampling errors:
Exist in both sample surveys and census data.
Difficult to measure.
Sampling Errors
A sampling error is the difference between the true value for the population and the value obtained from the sample survey.
Characteristics of sampling errors:
Depend on the size of the sample.
Occur only in sample surveys.
Are measurable.
Non Sampling Errors
Non-sampling errors arise from:
Failure to identify the target population
Non-response
Errors in responses: poor questionnaire design, interviewer bias, respondent errors
Data processing
Reporting
Sample Size Calculation
For an infinite population, the number of samples n to be selected from a population with standard deviation $\sigma$ and sampling error D, at a known confidence level with standard normal value Z, is given by:
$$n = \frac{Z^2 \sigma^2}{D^2}$$
For a finite population of size N, the number of samples n to be selected from a population with standard deviation $\sigma$ and sampling error D, at a known confidence level with standard normal value Z, is given by:
$$n = \frac{Z^2 N \sigma^2}{(N - 1) D^2 + Z^2 \sigma^2}$$

Problem 1
A study is to be performed to determine a certain parameter in a community. From a previous study a standard deviation of 46 was obtained.
If a sampling error of up to 4 is to be accepted, how many subjects should be included in this study at the 99% level of confidence?
Answer
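A worked sketch, assuming the infinite-population formula applies (no population size is given) and taking Z = 2.57 for 99% confidence, the value used in Problem 2:

$$n = \frac{Z^2 \sigma^2}{D^2} = \frac{(2.57)^2 (46)^2}{4^2} = \frac{6.6049 \times 2116}{16} \approx 873.5$$

Rounding up gives about 874 subjects under these assumptions.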

Problem 2
Determine the size of the samples for estimating the true weight of
the cereal containers for the universe with N= 5000 on the basis of
following information.
1. The standard deviation is 2 ounces based on past records
2. Estimate should be within 0.8 ounces of the true average weight
with 99% probability.
Answer
$$n = \frac{Z^2 N \sigma^2}{(N - 1) D^2 + Z^2 \sigma^2}$$
N = 5000
Z = 2.57 (for 99% confidence level)
$\sigma$ = 2 ounces
D = 0.8 ounces

Using the above formula we get n = 40.95 ≈ 41
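Both sample-size formulas are easy to check numerically. The sketch below is an illustration rather than a reference implementation: the function names are hypothetical, and Z = 2.57 is taken from Problem 2 as the 99% value.

```python
import math

def sample_size_infinite(z, sigma, d):
    """n = Z^2 * sigma^2 / D^2 for an infinite population."""
    return (z * sigma / d) ** 2

def sample_size_finite(z, sigma, d, population):
    """n = Z^2 * N * sigma^2 / ((N - 1) * D^2 + Z^2 * sigma^2)."""
    zs2 = (z * sigma) ** 2
    return population * zs2 / ((population - 1) * d ** 2 + zs2)

n1 = sample_size_infinite(2.57, 46, 4)        # Problem 1 inputs
n2 = sample_size_finite(2.57, 2, 0.8, 5000)   # Problem 2 inputs
print(round(n1, 2), math.ceil(n1))   # 873.5 874
print(round(n2, 2), math.ceil(n2))   # 40.95 41
```

Sample sizes are rounded up, since a fractional subject cannot be sampled.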
Precision
Cost
The Chi Square Test
It is a non parametric test

Tests under Chi Square
Test of Homogeneity
Test of Independence
Test of Goodness of fit
Necessary Conditions
Total sample size is large (more than 50)
Each cell in the contingency table has an expected frequency of at least 5
Summation of observed frequencies = Summation of expected frequencies
Expected Frequency = [Corresponding Row total * Corresponding Column total] / Grand total
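The expected-frequency rule can be applied to a whole contingency table at once. The sketch below uses the observed meal-plan counts from the test of independence example in this document; the function name `expected_frequencies` is a hypothetical helper.

```python
def expected_frequencies(observed):
    """Expected cell count = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed meal-plan counts: rows are Fresh, Sophomore, Junior, Senior.
observed = [
    [24, 32, 14],
    [22, 26, 12],
    [10, 14, 6],
    [14, 16, 10],
]
expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

The row and column totals of the expected table equal those of the observed table, which is the third condition listed above.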
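Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic research question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and chi-square values using the formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
4) Find the degrees of freedom: (c-1)(r-1) for a contingency table with r rows and c columns, or k-1 for a goodness-of-fit test with k categories
5) Find the critical (table) value in the Chi-Square Distribution table for these degrees of freedom and the chosen level of significance
6) If the table value > your calculated chi-square value, you do not reject your null hypothesis; if the calculated value exceeds the table value, you reject it.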
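The steps above can be sketched in a few lines of code. This is a minimal illustration, not a statistical library: the function name is hypothetical, and the critical value is passed in by the caller from a chi-square table, because the Python standard library has no chi-square quantile function. The counts are the supporter data from the test of homogeneity example in this document.

```python
def chi_square_test(observed, critical_value):
    """Return (chi-square statistic, degrees of freedom, reject H0?)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, chi2 > critical_value

# Supporters (Yes / No) in State 1, State 2, State 3.
observed = [[300, 350, 425], [700, 650, 575]]
# 5.991 is the 5% table value for 2 degrees of freedom.
chi2, df, reject = chi_square_test(observed, critical_value=5.991)
print(round(chi2, 2), df, reject)   # 34.43 2 True
```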
Test of homogeneity
Suppose assembly elections are announced in 3 major states. A major political party wishes to test whether the proportion of its supporters in the three states is the same or not.
Ho: Proportions of supporters for all the states are same.

Observed (O) and expected (E) frequencies:
Supporters   State 1   State 2   State 3   Total
Yes (O)      300       350       425       1075
Yes (E)      358.33    358.33    358.33
No (O)       700       650       575       1925
No (E)       641.67    641.67    641.67
Total        1000      1000      1000      3000

$\chi^2$ = 34.4308 (Calculated)
Table value for 2 degrees of freedom (@5% level of significance): 5.991
Calculated value > Table value
Ho Rejected


Test of Independence
Meal plan selected by the students is as follows (number of meals per week):

Class Standing   20/week   10/week   none   Total
Fresh            24        32        14     70
Sophomore        22        26        12     60
Junior           10        14        6      30
Senior           14        16        10     40
Total            70        88        42     200

Ho: Class Standing and Number of Meals per week are independent
Test of Independence
Expected Values (number of meals per week):

Class Standing   20/week   10/week   none   Total
Fresh            24.5      30.8      14.7   70
Sophomore        21        26.4      12.6   60
Junior           10.5      13.2      6.3    30
Senior           14        17.6      8.4    40
Total            70        88        42     200

$\chi^2$ = 0.709 (Calculated)
Table value for 6 degrees of freedom (@5% level of significance): 12.592
Calculated value < Table value
Ho is not rejected

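Alternative Formula
For a 2 x 2 contingency table with cell counts a, b, c, d and grand total N:

            Column 1   Column 2   Total
Row 1       a          b          (a + b)
Row 2       c          d          (c + d)
Total       (a + c)    (b + d)    N

$$\chi^2 = \frac{(ad - bc)^2 \, N}{(a + c)(b + d)(a + b)(c + d)}$$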



Yates Correction

$$\chi^2 = \frac{N \left( |ad - bc| - 0.5N \right)^2}{(a + c)(b + d)(a + b)(c + d)}$$

Applicable only for 2 x 2 contingency table
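The 2 x 2 shortcut and the Yates-corrected version differ only in the numerator, which the sketch below makes explicit. The function name and the cell counts (10, 20, 30, 40) are hypothetical, chosen only to exercise the formulas.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= 0.5 * n            # Yates continuity correction
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return n * diff ** 2 / denominator

# Hypothetical cell counts, for illustration only.
plain = chi_square_2x2(10, 20, 30, 40)
corrected = chi_square_2x2(10, 20, 30, 40, yates=True)
print(round(plain, 4), round(corrected, 4))
```

The corrected statistic is smaller than the uncorrected one here, so the correction makes the test more conservative.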

Test of goodness of fit
Calculation of $\chi^2$ for the Coin-Toss Example

Face    O     E (0.5)   O-E   (O-E)^2   (O-E)^2/E
Heads   92    100       -8    64        0.64
Tails   108   100       8     64        0.64
Total   200   200       0     -         1.28

Table value for 1 degree of freedom (@5% level of significance): 3.84
Calculated value < Table value
Ho is not rejected
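The coin-toss calculation can be reproduced with a short sketch. The function name is hypothetical; the expected counts are derived from the hypothesised probability of 0.5 for each face, and the degrees of freedom are taken as the number of categories minus one.

```python
def goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, degrees of freedom) for a fit test."""
    total = sum(observed)
    expected = [total * p for p in probabilities]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# 92 heads and 108 tails in 200 tosses of a coin assumed to be fair.
chi2, df = goodness_of_fit([92, 108], [0.5, 0.5])
print(round(chi2, 2), df)   # 1.28 1
print(chi2 > 3.84)          # False: Ho is not rejected at the 5% level
```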
