STAT1103 RensNotes
Research hypothesis = testable statement that predicts whether, and how, two or more variables
are related
o Those with severe traumatic brain injuries will have higher depressive symptoms than those
with mild or moderate injuries
Statistical hypothesis = mathematically precise statement that corresponds to a specific claim
about a population. Usually absolute (there is or isn't an effect) and non-directional, so it
doesn't tease out which group is affected and might not fully answer the research hypothesis
o There is (or is not) a difference in depression scores between individuals with mild,
moderate and severe traumatic brain injuries
o H0 = No trend or pattern
o H1 = Distinct trend or pattern (difference between variables)
o Directional hypotheses use one-tailed tests, non-directional use two-tailed (this course only
looks at two-tailed: take the probability in one tail and double it)
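The "double one tail" rule can be checked directly in Stata. This is only a sketch: it assumes Stata's bundled auto dataset and an arbitrary null value of 20 for mpg, neither of which comes from the course.

```stata
* Assumption: bundled example data and a made-up null value of 20
sysuse auto, clear

* One-sample t-test; the output shows both one-tailed and two-tailed p-values
ttest mpg == 20

* Two-tailed p-value = 2 x the upper-tail area beyond |t|
display "two-tailed p by doubling one tail: " 2*ttail(r(df_t), abs(r(t)))
display "two-tailed p stored by ttest:      " r(p)
```

The two displayed values should agree, since ttest computes its two-sided p-value the same way.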
Step by step process
o Background reading to identify RQ and hypotheses > design study/identify methods and
measures > carry out study > general descriptive statistics > identify variables involved >
identify the appropriate test > check assumptions for test > run test and conclude
Types of Research Design
o Longitudinal vs Cross-sectional
Longitudinal considerations; test-retest effects, participant attrition (drop-out),
time investment, retrospective data (recall bias)
Time-points (waves) are used either to predict outcomes over time or to measure change;
analyse with a related groups t-test (numeric) or McNemar's test (categorical)
o Experimental vs Non-experimental
Criteria for a cause and effect: covariance rule (relationship), temporal
precedence (cause must come first), internal validity (excluding other causes)
Non-experimental methods; single-variable, correlational research, quasi-
experimental
Experimental methods; between-subjects (researcher control, non-naturally
occurring groups, distinct), within subjects (same group of people but with
different 'conditions')
Between subjects needs more participants, statistically less powerful,
shorter experimental time good for participants, no carry-over effects
Within subjects needs half the participants, statistically powerful,
double experimental time, carry-over effect potential
o Survey vs Observational
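For the longitudinal and within-subjects designs above, the two related-groups tests mentioned in these notes might be run as follows. The data are invented for illustration, and the variable names (score_w1, passed_w1 and so on) are hypothetical.

```stata
* Hypothetical two-wave data: one row per participant
clear
input id score_w1 score_w2 passed_w1 passed_w2
1 12 15 0 1
2 10 14 0 1
3 14 13 1 1
4  9 12 0 0
5 11 16 0 1
6 13 13 1 0
end

* Numeric outcome measured twice on the same people: related groups t-test
ttest score_w2 == score_w1

* Categorical (0/1) outcome measured twice on the same people: McNemar's test
mcc passed_w2 passed_w1
```

McNemar's test only uses the participants whose category changed between waves (here 3 went from 0 to 1 and 1 went from 1 to 0).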
Probability sampling
o Simple random, systematic (unbiased selection rule, e.g. every k-th person on a list),
stratified random (population split into groups by a characteristic, then a random sample
drawn from within each group), cluster sampling (only certain whole clusters are selected)
Non-probability sampling
o Convenience sampling, snowball sampling
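The probability sampling methods can be sketched in Stata. This assumes the bundled auto dataset (74 cars) stands in for a sampling frame; the seed, sample sizes and sampling interval of 5 are arbitrary choices.

```stata
* Assumption: bundled auto data used as a stand-in sampling frame
sysuse auto, clear
set seed 1103

* Simple random sample: 20 units, each with the same chance of selection
preserve
sample 20, count
count
restore

* Systematic sample: every 5th unit after a random start between 1 and 5
preserve
local start = floor(5*runiform()) + 1
keep if mod(_n - `start', 5) == 0
count
restore

* Stratified random sample: 10 units drawn at random within each stratum
sample 10, count by(foreign)
tabulate foreign
```

After the last command the data in memory hold 10 domestic and 10 foreign cars, so each stratum is represented equally regardless of its size in the frame.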
Reliability = consistency/dependability of measurement (reliability of a scale: does it measure
the construct consistently)
Validity = face validity, content validity, criterion validity, concurrent vs predictive, convergent,
discriminant
Types of data = Qualitative vs Quantitative (or mixed) / Discrete vs Continuous
o Data analysis for qualitative is often described/explained, coded for common themes
(inter-rater reliability important) and then turned into variables for analysis
One categorical variable = Bar chart (frequency of categories, greater detail), pie chart (better
for comparing two or more categorical values)
o Two categorical variables = Contingency table (numeric summary), clustered bar chart
(graph)
One numerical variable = Comparative boxplot, histogram (no comparison, shape of distribution
shown)
o Two numerical variables = Correlation (numeric summary), scatterplot (graph)
Numeric summaries: frequency tables with count, percentage or proportion. Mean or median,
SD or IQR, variance and range.
o With two categorical variables we create a cross-tabulation of one categorical variable
by another, allows comparison
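The summaries and graphs listed above map onto Stata commands roughly as follows. This is a sketch using the bundled auto dataset, so the variable names are that dataset's and not the course's.

```stata
sysuse auto, clear

* One categorical variable: frequency table and bar chart of counts
tabulate rep78
graph bar (count), over(rep78)

* Two categorical variables: cross-tabulation with row percentages
tabulate foreign rep78, row

* One numerical variable: mean, median, SD, variance, range, then IQR
summarize mpg, detail
display "IQR = " (r(p75) - r(p25))
histogram mpg

* Numerical variable compared across groups: comparative boxplot
graph box mpg, over(foreign)

* Two numerical variables: scatterplot and correlation
scatter mpg weight
correlate mpg weight
```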
Interval estimate gives us a range of believable values for the parameter; this range is
called a confidence interval. The conventional confidence level is 95%
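A 95% confidence interval for a mean can be built by hand and compared with Stata's built-in command. A minimal sketch, assuming the bundled auto dataset; the scalar names se and tcrit are just illustrative.

```stata
sysuse auto, clear
summarize mpg

* 95% CI for the mean: mean +/- t critical value x standard error
scalar se    = r(sd) / sqrt(r(N))
scalar tcrit = invttail(r(N) - 1, 0.025)
display "95% CI: " (r(mean) - tcrit*se) " to " (r(mean) + tcrit*se)

* Built-in equivalent (syntax for Stata 14.1 or later; older versions use: ci mpg)
ci means mpg, level(95)
```

Both approaches should give the same interval, roughly 19.96 to 22.64 for this variable.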
Why a 5% significance level? It limits the chance of a type 1 error to 5% (we reject the null
hypothesis when we shouldn't: a false positive result, where the sample effect is due to chance)
o Type 2 error is not rejecting the null hypothesis when we should: our sample isn't
detecting a real population effect, a false negative result
Power is the probability that we correctly reject a false null hypothesis. It is influenced by
the significance level (a larger alpha gives higher power), the sample size (power increases
with larger samples) and the variability in the DV (the more variable the DV, the harder it is
to reject the null)
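The three influences on power can be seen with Stata's power command (Stata 13 or later). The numbers here are hypothetical: a null mean of 0, a true mean of 0.5 and an SD of 1.

```stata
* Hypothetical values throughout: null mean 0, true mean 0.5, SD 1

* Larger sample size -> more power
power onemean 0 0.5, sd(1) n(20 40 80)

* Larger significance level (alpha) -> more power
power onemean 0 0.5, sd(1) n(20) alpha(0.01 0.05 0.10)

* More variable DV (larger SD) -> less power
power onemean 0 0.5, n(20) sd(1 2 3)
```

Each command prints a table with one row per value in the list, so the trend in the power column can be read off directly.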
TESTS
One sample t-test = single numeric variable
One sample z-test = single numeric variable (population standard deviation known)
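A sketch of both one-sample tests in Stata, assuming the bundled auto dataset, a made-up null value of 20 and a made-up "known" population SD of 6. The ztest command is only available in more recent versions of Stata, so the z statistic is also computed by hand first.

```stata
sysuse auto, clear

* One-sample t-test: population SD unknown, estimated from the sample
ttest mpg == 20

* One-sample z-test by hand: population SD assumed known (6 is made up)
summarize mpg
scalar z = (r(mean) - 20) / (6 / sqrt(r(N)))
display "z = " z "   two-tailed p = " 2*(1 - normal(abs(z)))

* Built-in equivalent in recent Stata versions
ztest mpg == 20, sd(6)
```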
Effect sizes
o Cohen's d = 0.2 (small), 0.5 (medium), 0.8 (large)
o Cohen's w = 0.1 (small), 0.3 (medium), 0.5 (large)
o Pearson's r = 0.1 (small), 0.3 (medium), 0.5 (large)
o Pearson's correlation coefficient (absolute value) = 0-.10 (weak-none), .10-.30 (weak),
.30-.50 (moderate), .50-1.00 (strong)
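The three effect sizes above might be obtained in Stata as follows. This is a sketch with the bundled auto dataset; Cohen's w is computed here from the standard formula w = sqrt(chi-squared / N), which is an addition and not something stated in these notes.

```stata
sysuse auto, clear

* Cohen's d for a difference between two group means (Stata 13 or later)
esize twosample mpg, by(foreign)

* Pearson's r for two numeric variables
correlate mpg weight

* Cohen's w for two categorical variables: sqrt(chi-squared / N)
tabulate foreign rep78, chi2
display "Cohen's w = " sqrt(r(chi2) / r(N))
```

Compare each result with the small, medium and large cut-offs listed above, ignoring the sign for d and r.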
swilk variable
o Shapiro-Wilk test: checks the normality assumption of the one sample t-test (null
hypothesis is that the variable is normally distributed)
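A short example of the normality check, assuming the bundled auto dataset in place of the course data. The graphs are an optional addition to read alongside the test.

```stata
sysuse auto, clear

* Visual checks: histogram with a normal curve overlaid, and a normal Q-Q plot
histogram mpg, normal
qnorm mpg

* Shapiro-Wilk test; H0: the variable is normally distributed
* p < .05 -> reject H0, the normality assumption is doubtful
swilk mpg
```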
reshape wide tol, i(id) j(age)
o This will restructure the data from long to wide, so that each person has one row with a
separate column of tolerance score for each age.
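To see what the reshape does, here is a small made-up dataset using the same variable names (id, age, tol); the values are invented.

```stata
* Hypothetical long-format data: one row per person per age
clear
input id age tol
1 11 2.2
1 12 1.8
1 13 2.5
2 11 1.4
2 12 1.9
2 13 2.0
end

* Long -> wide: one row per id, with columns tol11 tol12 tol13
reshape wide tol, i(id) j(age)
list

* Wide -> long reverses the operation
reshape long tol, i(id) j(age)
list
```

The wide layout is the one a related groups t-test needs, for example ttest tol13 == tol11.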
robvar Mean_general_ds, by(Gender)
o Levene's test of equality of variances. The null hypothesis is that the group variances
are equal; if p < .05 we reject it and treat the variances as unequal
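A sketch of how the Levene's result feeds into the choice of t-test, assuming the bundled auto dataset in place of Mean_general_ds and Gender. In robvar's output, W0 is Levene's statistic.

```stata
sysuse auto, clear

* Levene's test (W0 in the output); H0: the group variances are equal
robvar mpg, by(foreign)

* If p >= .05: do not reject H0, use the standard two-sample t-test
ttest mpg, by(foreign)

* If p < .05: reject H0, request the unequal-variances version instead
ttest mpg, by(foreign) unequal
```

Only one of the two t-tests would be reported in practice; both are shown here so the outputs can be compared.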