Lecture 2
Amine Hadji
Leiden University
• Inferential statistics
• Surveys
• Introducing causality
Examples:
• Are first ladies representative of women?
• Are LUC students representative of university students in the Netherlands?
Definitions
Goal: Use a small group of units to make inference about a larger group.
Definitions:
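• Population: the entire group of units about which we want to make inference.
• Census: a survey in which every unit of the population is measured.
• Sample: the units of the population that are actually measured.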
• Simple random sample: every conceivable group of units of the required size
from the population has the same chance to be the selected sample.
• Sample survey: a subgroup of a large population is questioned on a set of topics.
It is a typical use of simple random samples.
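Drawing a simple random sample can be sketched in a few lines. The example is illustrative only: the population of 1000 numbered units, the sample size of 50 and the seed are assumptions made for the sketch, not values from the lecture.

```python
import random

# Hypothetical population: 1000 units identified by a number (an assumption).
population = list(range(1000))

# Seeded generator so that the example is reproducible.
rng = random.Random(2)

# random.sample draws without replacement; every group of 50 units is
# equally likely to be selected, which is what defines a simple random sample.
sample = rng.sample(population, 50)

print(len(sample))
```

Because the draw is without replacement, no unit can appear twice in the sample.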
Sample survey vs. Census
Advantages of sample survey:
• A census is not always possible (e.g. quality control in a company, ecology, ...)
• Nonparticipation bias:
• Response bias:
A sample survey is often used to estimate the proportion (or percentage) of people who
have a certain trait or opinion.
Question: How accurate is the estimate?
Margin of error
Margin of error: a measure of the accuracy of a sample proportion as an estimate
of the population proportion. For (typically) 95% of samples, the difference between
the sample proportion and the population proportion is less than the margin of error.
Note: This assumes a representative sample.
Conservative margin of error: calculated as $1/\sqrt{n}$, where $n$ is the sample size. (As a
percentage: $\frac{1}{\sqrt{n}} \times 100\%$.)
For $n = 1600$ the margin of error is $1/\sqrt{1600} = 1/40 = 2.5\%$.
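The conservative formula and the "95% of the time" interpretation can be checked with a small simulation. This is a sketch under stated assumptions: the true proportion of 0.4, the sample size of 400, the number of repetitions and the function name are all hypothetical choices made for the example.

```python
import math
import random

def conservative_margin_of_error(n):
    """Conservative margin of error 1/sqrt(n) for a sample of size n."""
    return 1 / math.sqrt(n)

print(conservative_margin_of_error(1600))  # 0.025, i.e. 2.5%

# Simulation: how often does the sample proportion fall within the margin?
rng = random.Random(0)
true_p = 0.4   # hypothetical population proportion (an assumption)
n = 400        # sample size used in the simulation
trials = 1000  # number of simulated surveys
margin = conservative_margin_of_error(n)

hits = 0
for _ in range(trials):
    # Each respondent has the trait with probability true_p.
    successes = sum(rng.random() < true_p for _ in range(n))
    if abs(successes / n - true_p) < margin:
        hits += 1

coverage = hits / trials  # fraction of simulated surveys within the margin
print(coverage)
```

The printed coverage should land near 95%, and somewhat above it when the true proportion is far from 0.5, which is why the formula is called conservative.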
Margin of error
Influence of the sample size:
• Sample size increases, margin decreases.
• The margin decreases more slowly than the sample size increases
(to halve the margin, the sample size must be multiplied by 4).
• Accurate estimates about a sub-group require a large sample:
• as the sample size increases, the sub-group size increases;
• as the sub-group size increases, the margin of error for the sub-group decreases.
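The relation between sample size and margin can be made concrete by inverting the conservative formula $1/\sqrt{n}$. In the sketch below the function name, the chosen target margins and the 10% sub-group share are assumptions made for illustration.

```python
import math

def required_sample_size(margin_percent):
    """Smallest n whose conservative margin of error (in %) is at most margin_percent."""
    return math.ceil((100 / margin_percent) ** 2)

# Halving the margin multiplies the required sample size by 4.
for margin_percent in (10, 5, 2.5):
    print(margin_percent, required_sample_size(margin_percent))

# Hypothetical sub-group making up 10% of a sample of 1600 people:
subgroup_size = 1600 // 10
subgroup_margin_percent = 100 / math.sqrt(subgroup_size)
print(subgroup_size, round(subgroup_margin_percent, 1))
```

With these numbers the overall margin is 2.5%, while the margin for the sub-group of 160 people is roughly 8%, which illustrates why sub-group estimates need a large overall sample.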
Convenience sample: using the most convenient group available. This usually breaks
the fundamental rule of using data for inference (the data must be representative
with regard to the question of interest).
Difficulties in sampling - Example
Example: failure of the Literary Digest poll of 1936 (Roosevelt vs. Landon)
• The Literary Digest mailed questionnaires to 10 million people drawn from magazine
subscribers, car owners and telephone owners: a convenience sample.
• Gallup correctly predicted both the election result and the Literary Digest's
prediction, while surveying only 50 000 people.
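Why a huge convenience sample can lose against a much smaller random sample can be illustrated with a simulation. All numbers below are invented for the sketch and are not historical data: a population of 100 000 voters, 30% of whom own a telephone, with support for a hypothetical candidate A of 40% among owners and 70% among everyone else.

```python
import random

rng = random.Random(1936)

# Hypothetical population (assumed numbers, not historical data).
population = []
for _ in range(100_000):
    owns_phone = rng.random() < 0.30
    supports_a = rng.random() < (0.40 if owns_phone else 0.70)
    population.append((owns_phone, supports_a))

def support(units):
    """Proportion of units that support candidate A."""
    return sum(supports for _, supports in units) / len(units)

true_support = support(population)

# Convenience sample: every telephone owner (large, but not representative).
convenience = [unit for unit in population if unit[0]]

# Simple random sample: much smaller, but drawn from the whole population.
srs = rng.sample(population, 2500)

print("convenience:", len(convenience), round(support(convenience), 3))
print("random:     ", len(srs), round(support(srs), 3))
print("population: ", round(true_support, 3))
```

In this setup the convenience sample is more than ten times larger than the random sample, yet its estimate is off by about 20 percentage points because the sampled group differs systematically from the population.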
Pitfalls of Asking Survey Questions
Response bias: the wording of the question influences the answer.
Form of questions:
• Closed question: respondents are given a list of alternatives (easy to summarize
and analyse, but might exclude important possible answers)
• Open question: respondents are allowed to answer in their own words (difficult to
summarize, and the wording of the question might exclude answers)
The two forms can give very different results; both have pros and cons.
Advice on questionnaires
• Instead of specific questions that rely on memory, ask more general ones (e.g. How
much chocolate did you eat in the past month? ⇒ In a typical month how much
chocolate do you eat?)
• Ordering:
• Introduction: simple questions
• Important questions should be asked earlier (loss of interest)
• Sensitive questions should not be the first ones
Cause and effect
Examples:
• Does smoking cause cancer?
• Do church visits result in lower blood pressure? (In a study of 2391 people, those
attending church regularly were 40% less likely to have high blood pressure.)
• Malaria prevention with nets. (Two similar villages: one gets nets, the other does
not. Measure the difference in malaria cases.)
Terminology:
• Unit/Subject/Participant: single individual or object being measured.
Randomized experiment
Designing randomized experiments
• Goal: extend results to larger groups (data needs to be representative with
respect to the research question)
• Participants: volunteers (sometimes problematic ⇒ not representative of the
population)
• Randomization:
• Randomizing the type of treatment (using computer programs)
• Randomizing the order of treatment
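Both kinds of randomization can be done with a computer program, as the slide suggests. The sketch below is one possible way to do it; the 20 participant IDs, the treatment labels "A" and "B" and the seed are hypothetical.

```python
import random

rng = random.Random(7)

# Hypothetical participant IDs (an assumption for the example).
participants = [f"P{i:02d}" for i in range(1, 21)]

# Randomizing the type of treatment: shuffle a copy, then split it in half.
shuffled = participants[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

# Randomizing the order of treatment: when every participant receives both
# treatments, draw the order independently for each participant.
order = {p: rng.sample(["A", "B"], 2) for p in participants}

print(treatment_group)
print(control_group)
print(order["P01"])
```

Shuffling and splitting guarantees two groups of equal size; assigning each participant by an independent coin flip would also be random but could produce unequal groups.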
Designing randomized experiments II
• Control group: receives no active treatment; used to check whether the treatment
has an effect.
• Placebo: special treatment for the control group. It looks like an actual treatment
but has no active “ingredient”. Placebos have been shown to have significant effects.
• Blinding:
• single blinding: either the participant or the researcher does not know which
treatment was assigned.
• double blinding: neither the participant nor the researcher knows which
treatment was assigned (not always possible).
Observational studies
Example:
• Case-control study: “cases” (units having the attribute) are compared to “controls”
(units not having it)
• Example: Does baldness cause heart attacks? (665 heart attack patients vs. 772
patients with other diseases; baldness was more common among the heart attack patients)
Case-control study
Advantages:
• Efficiency (e.g. heart attacks are rare, so selecting participants among people who
have already had a heart attack is cheaper and more efficient).
• Fewer potential confounding variables (e.g. balding men may be less healthy in
general; hence the controls are chosen among other patients).
Difficulties
• Confounding variables:
• In randomized experiments they do not distort the comparison, so cause-and-effect
relationships can be inferred.
• In an observational study they must be taken into account.
• Extending results: only if the data can be considered representative with regard
to the question of interest.
• Hawthorne effect: participants respond differently because they know they are in an
experiment (e.g. a problem in medical research).
• Experimenter effects: recording the data erroneously, treating subjects
differently, making the subject aware of the desired outcome.
• Ecological validity: variables are removed from their natural setting (e.g. no social
pressure).
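The point about confounding variables can be illustrated with a simulation in which the treatment has no effect at all. Everything here is a hypothetical construction: a single confounder called `healthy`, the probabilities, the seed and the function name are assumptions made for the sketch.

```python
import random

rng = random.Random(42)
N = 20_000  # number of simulated units per study

def difference_in_good_outcomes(assign_treatment):
    """Good-outcome rate of treated units minus that of untreated units.

    By construction the treatment has no effect: only the hypothetical
    confounder `healthy` influences the outcome.
    """
    treated, untreated = [], []
    for _ in range(N):
        healthy = rng.random() < 0.5
        good_outcome = rng.random() < (0.7 if healthy else 0.3)
        group = treated if assign_treatment(healthy) else untreated
        group.append(good_outcome)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

# Observational study: healthy units choose the treatment more often.
observational = difference_in_good_outcomes(
    lambda healthy: rng.random() < (0.8 if healthy else 0.2)
)

# Randomized experiment: a coin flip decides, regardless of health.
randomized = difference_in_good_outcomes(lambda healthy: rng.random() < 0.5)

print(round(observational, 3))
print(round(randomized, 3))
```

The observational comparison shows a sizeable difference that is entirely due to the confounder, while the randomized comparison stays close to zero.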