STAT1103 RensNotes


OTHER

 Research hypothesis = testable statement that predicts how 2 or more variables correlate if
they correlate at all
o Those with severe traumatic brain injuries will have higher depressive symptoms than those
with mild or moderate injuries
 Statistical hypothesis = mathematically precise statement that corresponds to a specific claim about a
population. Usually absolute (there is or isn't a difference) and non-directional, so it doesn't tease out
which group is affected and might not fully answer the research hypothesis
o There is (or is not) a difference in depression scores between individuals with mild,
moderate and severe traumatic brain injuries
o H0 = No trend or pattern
o H1 = Distinct trend or pattern (difference between variables)
o Directional hypotheses use one-tailed tests, non-directional use two-tailed (only looking
at two-tailed in this course: take the probability of one tail and double it)
 Step by step process
o Background reading to identify RQ and hypotheses > design study/identify methods and
measures > carry out study > general descriptive statistics > identify variables involved >
identify the appropriate test > check assumptions for test > run test and conclude
 Types of Research Design
o Longitudinal vs Cross-sectional
 Longitudinal considerations; test-retest effects, participant attrition (drop-out),
time investment, retrospective data (recall bias)
 Time-points (waves) either predicting over time or measuring change, related
groups t-test (numeric) or McNemar's test (categorical)
o Experimental vs Non-experimental
 Criteria for a cause and effect: covariance rule (relationship), temporal
precedence (cause must come first), internal validity (excluding other causes)
 Non-experimental methods; single-variable, correlational research, quasi-
experimental
 Experimental methods; between-subjects (researcher control, non-naturally
occurring groups, distinct), within subjects (same group of people but with
different 'conditions')
 Between subjects needs more participants, statistically less powerful,
shorter experimental time good for participants, no carry-over effects
 Within subjects needs half the participants, statistically powerful,
double experimental time, carry-over effect potential
o Survey vs Observational
 Probability sampling
o Simple random, systematic (every kth case from a list, an unbiased selection rule), stratified random
(population split into groups by a characteristic, then random sampling within each group), cluster
sampling (only certain clusters are selected)
 Non-probability sampling
o Convenience sampling, snowball sampling
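The four probability sampling methods above can be sketched in a few lines. This is a minimal illustration in Python (not Stata), assuming a hypothetical population of 100 numbered participants and made-up strata; none of the names or numbers come from the course.

```python
import random

# Hypothetical population: participant IDs 1..100 (invented for illustration).
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simple random: every participant has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic: random starting point, then every kth participant on the list.
step = len(population) // 10
start = rng.randrange(step)
systematic = population[start::step]

# Stratified random: split by a characteristic, then sample randomly within each group.
strata = {"first_year": population[:60], "second_year": population[60:]}
stratified = {name: rng.sample(members, len(members) // 10)
              for name, members in strata.items()}

# Cluster: randomly pick whole clusters and take everyone in them.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [person for cluster in chosen_clusters for person in cluster]
```

Stratified sampling keeps each group's share of the sample equal to its share of the population (here 6 and 4 out of 10), which simple random sampling does not guarantee.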
 Reliability = consistency/dependability of measurement, e.g. reliability of a scale (does it measure
the construct consistently)
 Validity = face validity, content validity, criterion validity, concurrent vs predictive, convergent,
discriminant
 Types of data = Qualitative vs Quantitative (or mixed) / Discrete vs Continuous
o Data analysis for qualitative is often described/explained, coded for common themes
(inter-rater reliability important) and then turned into variables for analysis
o Independent (predicts/causes an outcome) vs dependent (outcome), e.g. PAL attendance (IV) on final
grade (DV)
o Extraneous variable (anything other than IV/DV) and confounding variable (potentially
explain relationship)
o Nominal (unordered, categorical, arbitrary with no hierarchy, one example is
binary/dichotomous variable with only two groups/levels). Gender is nominal,
woman/man/other, totally distinct
o Ordinal (ordered categorical, hierarchy). Highest level of education.
o Interval (distance is meaningful, numeric scale with consistent differences between
points). Pain on a scale from 0-10.
o Ratio (numeric scale with consistent differences between points AND an absolute zero, where 0 means
complete absence of the thing you're measuring; by contrast, 0 Celsius doesn't mean there's no
temperature, so Celsius is interval). Distance between two friends sitting next to each other (cm).

 One categorical variable = Bar chart (frequency of categories, greater detail), pie chart (better
for comparing two or more categorical values)
o Two categorical variables = Contingency table (numeric summary), clustered bar chart
(graph)
 One numerical variable = Comparative boxplot, histogram (no comparison, shape of distribution
shown)
o Two numerical variables = Correlation (numeric summary), scatterplot (graph)

 Numeric summaries: frequency tables with count, percentage or proportion (categorical). Mean or median,
SD or IQR, variance and range (numeric).
o With two categorical variables we create a cross-tabulation of one categorical variable
by another, allows comparison
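As a rough illustration of these numeric summaries, here is a short Python sketch (not Stata) using only the standard library. The age and gender values are invented for the example.

```python
import statistics
from collections import Counter

# Hypothetical sample data, invented for illustration.
ages = [18, 19, 19, 20, 21, 22, 24, 30]
genders = ["woman", "man", "woman", "other", "woman", "man"]

# Numeric variable: centre and spread.
mean = statistics.mean(ages)          # 21.625
median = statistics.median(ages)      # 20.5
sd = statistics.stdev(ages)           # sample standard deviation (divides by n - 1)
variance = statistics.variance(ages)  # sample variance, equal to sd squared
data_range = max(ages) - min(ages)    # 12

# Quartiles: statistics.quantiles uses the "exclusive" method by default,
# so other software may report slightly different quartiles.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Categorical variable: frequency table with counts and percentages.
counts = Counter(genders)
percentages = {level: 100 * count / len(genders) for level, count in counts.items()}
```

The median and IQR are less affected by the outlying age of 30 than the mean and SD, which is why they are usually preferred for skewed data.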

 An interval estimate gives us a range of believable values for the parameter; this interval/range is
called a confidence interval. The conventional confidence level is 95%
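A 95% confidence interval for a mean can be sketched as follows. This Python example (not Stata) assumes the large-sample normal approximation with a critical value of about 1.96; statistical software typically uses a t critical value instead, which gives a slightly wider interval for small samples. The scores are made up.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal critical value)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value that leaves (1 - confidence) / 2 in each tail; about 1.96 for 95%.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z_crit * standard_error, mean + z_crit * standard_error

# Hypothetical test scores, invented for illustration.
scores = [62, 70, 75, 68, 80, 77, 66, 73, 71, 78]
low, high = mean_confidence_interval(scores)
low_99, high_99 = mean_confidence_interval(scores, confidence=0.99)
```

Raising the confidence level to 99% widens the interval: being more confident requires a larger range of believable values.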

 Why 5% significance level? It caps the chance of a Type 1 error at 5% (rejecting the null hypothesis when
we shouldn't: a false positive result, where the sample effect is only due to chance)
o A Type 2 error is not rejecting the null hypothesis when we should: our sample isn't
detecting a real population effect, a false negative result

 Power is the probability that we correctly reject a false null hypothesis. It is influenced by the
significance level (a higher alpha gives higher power), the sample size (power increases with larger
samples) and the variability in the DV (the more variable the DV, the harder it is to reject)
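The three influences on power can be checked numerically. The sketch below is Python (not Stata) and assumes the simplest case, a two-tailed one-sample z-test with a known population SD; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test.

    effect: true difference between the population mean and the null value
    sigma:  population standard deviation of the DV
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean sits away from the null value.
    shift = effect * math.sqrt(n) / sigma
    # Probability the test statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

baseline = z_test_power(effect=2, sigma=10, n=100)                 # about 0.52
bigger_sample = z_test_power(effect=2, sigma=10, n=400)            # higher
higher_alpha = z_test_power(effect=2, sigma=10, n=100, alpha=0.10) # higher
noisier_dv = z_test_power(effect=2, sigma=20, n=100)               # lower
```

When the true effect is zero, the function returns alpha itself: the only rejections are Type 1 errors.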
TESTS
 One sample t-test = single numeric variable
 One sample z-test = single numeric variable (population mean and standard deviation known)

o A z value close to 0 gives a large p-value; a z value far from 0 gives a small p-value
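A minimal Python sketch (not Stata) of the one-sample z-test, assuming the population mean and standard deviation are both known. It shows the two-tailed p-value as the probability of one tail doubled; the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and the two-tailed p-value."""
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    one_tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return z, 2 * one_tail                   # double it for a two-tailed test

# Hypothetical example: sample of 100 with mean 103, population mean 100, SD 15.
z, p = one_sample_z_test(sample_mean=103, pop_mean=100, pop_sd=15, n=100)
# z = 2.0, p is about 0.0455, so the null is rejected at the 5% level

# A sample mean equal to the population mean gives z = 0 and p = 1.
z_null, p_null = one_sample_z_test(sample_mean=100, pop_mean=100, pop_sd=15, n=100)
```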


 Chi-square goodness of fit test = single categorical variable
 Correlation = two numeric variables
 Independent samples/two-sample t-test = one numeric variable, one categorical variable
 Paired samples t-test = one numeric variable, one categorical variable (related groups)

 Chi-square test of independence = two categorical variables
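The arithmetic behind the chi-square test of independence can be sketched as below, in Python rather than Stata. The example computes the expected frequencies (used for the "at least 5" assumption check), the chi-square statistic, the degrees of freedom and Cohen's w. The observed counts are invented.

```python
import math

def chi_square_independence(table):
    """table: list of rows of observed counts from a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    cohens_w = math.sqrt(chi2 / n)  # effect size for chi-square tests
    return expected, chi2, df, cohens_w

# Hypothetical 2x2 table: group (rows) by outcome (columns).
observed = [[30, 20],
            [20, 30]]
expected, chi2, df, cohens_w = chi_square_independence(observed)
# every expected count is 25.0, chi2 = 4.0, df = 1, w = 0.2

assumption_met = all(e >= 5 for row in expected for e in row)
```

For 1 degree of freedom the 5% critical value is about 3.84, so a statistic of 4.0 would be significant at that level.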


 McNemar's test = two related categorical variables (the same categories measured at different time-points)
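McNemar's test only uses the discordant pairs, the people whose category changed between time-points. The Python sketch below (not Stata) assumes a binary outcome and the uncorrected form of the statistic; some software applies a continuity correction or an exact test instead. The counts are hypothetical.

```python
import math
from statistics import NormalDist

def mcnemar(b, c):
    """b = count changing yes -> no, c = count changing no -> yes."""
    chi2 = (b - c) ** 2 / (b + c)
    # With 1 degree of freedom, chi-square is a squared standard normal,
    # so the p-value is the two-tailed normal probability of sqrt(chi2).
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p

# Hypothetical example: 15 people changed one way, 5 changed the other way.
chi2, p = mcnemar(b=15, c=5)
# chi2 = 5.0, p is about 0.025

# Equal numbers changing in each direction gives no evidence of change.
chi2_equal, p_equal = mcnemar(b=10, c=10)
```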


 Effect sizes
o Cohen's d = 0.2 (small), 0.5 (medium), 0.8 (large)
o Cohen's w = 0.1 (small), 0.3 (medium), 0.5 (large)
o Pearson's r = 0.1 (small), 0.3 (medium), 0.5 (large)
o Pearson's correlation coefficient (absolute value, ignoring the sign) = 0-.10 (weak-none), .10-.30 (weak),
.30-.50 (moderate), .50-1.00 (strong)
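To show how the Cohen's d benchmarks are applied, here is a short Python sketch (not Stata) for two independent groups. It assumes the pooled-standard-deviation form of d; the "negligible" label for values under 0.2 is an added convenience, not part of the notes. The data are invented.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

def label_d(d):
    """Apply the 0.2 / 0.5 / 0.8 benchmarks to the size of d, ignoring its sign."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed for this sketch

# Hypothetical scores for an experimental and a control group.
experimental = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
d = cohens_d(experimental, control)  # about 1.26
size = label_d(d)                    # "large"
```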

 Descriptive statistics, understanding the sample (and comparison to population), phenomena
and the statistical analyses needed
o Demographics = tab1 Gender Ethnicity (frequency table), summarize Age (mean, range,
std)
o Key Variables = tab1 control group, tab1 experimental group
o Histogram frequency charts
STATA
 label define yesno 1 "Yes" 2 "No"
o Defines a value label (yesno) so that values previously shown as numbers display as text
 label values variable variable variable yesno
o Attaches the yes/no label to specific variables
 tab1 variable
o Frequency table
 tabulate variable1 variable2, expected row chi2
o Chi-square test of independence for two categorical variables; the expected option shows the expected
frequencies for the assumption check that each is at least 5
 recode variable (min/9 = 0) (10/max = 1), generate(new_variable_name)
o Separates data into 0 and 1 (under/over the limit) rather than a whole range of numbers
to summarize
 summarize variable, detail
o Descriptive statistics (mean, standard deviation, variance, median, the 25th and 75th percentiles for
the interquartile range, min and max)

 swilk variable
o Shapiro-Wilk test of normality (assumption check for the one sample t-test)
 reshape wide tol, i(id) j(age)
o This will restructure the data so that we have separate columns of tolerance score for
each age group and person.
 robvar Mean_general_ds, by(Gender)
o Levene's test of equality of variances; the null hypothesis is that the group variances are equal, so a
small p-value means we reject equal variances
