R-Cheatsheet: Help Numerical Summaries Linear Regression
The following are examples of how some of the functions work (based on mosaic's built-in data set, HELPrct). We assume mosaic is already loaded. In some chunks only the code and not the output is shown (to see the output, copy-paste the code chunk of interest into your console).

# Create a contingency table for 'sex' and
# 'substance':
tally(sex ~ substance, data = HELPrct)

        substance
sex      alcohol cocaine heroin
  female      36      41     30
  male       141     111     94

# Calculate mean 'age' for men and women:
mean(age ~ sex, data = HELPrct)

female   male
 36.25  35.47

# 'favstats' can be used to retrieve different
# summaries of the data (here for 'age'
# separated by sex):
favstats(age ~ sex, data = HELPrct)

     sex min Q1 median   Q3 max  mean    sd   n missing
1 female  21 31     35 40.5  58 36.25 7.585 107       0
2   male  19 30     35 40.0  60 35.47 7.750 346       0

# Boxplot of 'age' for each substance with
# different panels for men and women:
gf_boxplot(age ~ substance | sex, data = HELPrct)

[Figure: boxplots of age (y-axis, 20 to 60) by substance (x-axis: alcohol, cocaine, heroin), in separate panels for female and male.]

# Make a scatterplot of 'pcs' versus 'age'
# coloured by 'sex' and add regression lines:
gf_point(pcs~age, col = ~sex, data = HELPrct) %>%
  gf_lm() %>%
  gf_labs(x = "Age",
          y = "Physical score",
          title = "My first scatter plot")

[Figure: scatter plot titled "My first scatter plot"; x-axis "Age" (20 to 60), y-axis "Physical score" (20 to 60); points coloured by sex (female, male), with one regression line per group.]

Note: gf_point creates the scatter plot, gf_lm adds regression lines, and gf_labs adds a title and changes the axis labels.

# Use an exact binomial test to test whether
# the proportion of women is 50 %:
binom.test(~sex, p = 0.5, data = HELPrct)

# Use a t-test to test whether the mean age of
# men and women is the same:
t.test(age ~ sex, data = HELPrct)

# Use a chi-square test to test for
# independence between 'homeless' and 'sex':
tab <- tally(homeless ~ sex, data = HELPrct)
chisq.test(tab)

# Use an approximate test to see whether the
# proportion of homeless is the same for men
# and women:
prop.test(homeless ~ sex, data = HELPrct)

# Investigate whether we may drop 'age' as an
# explanatory variable for 'pcs', when
# 'substance' is in the linear model too:
mod1 <- lm(pcs ~ age + substance, data = HELPrct)
mod2 <- lm(pcs ~ substance, data = HELPrct)
anova(mod1, mod2)

Analysis of Variance Table

Model 1: pcs ~ age + substance
Model 2: pcs ~ substance
  Res.Df   RSS Df Sum of Sq    F  Pr(>F)
1    449 47517
2    450 50139 -1     -2623 24.8 9.2e-07

Illustration of how the functions pdist and qdist work:

# Calculate the 95th percentile for the
# standard normal distribution (i.e., mean = 0
# and standard deviation = 1):
qdist("norm", p = 0.95, mean = 0, sd = 1)

[1] 1.645

[Figure: standard normal density (x-axis from −2 to 2, y-axis "density") with shaded probability regions A: 0.950 and B: 0.050.]

# Calculate the probability of getting a value
# less than -1.5 for the standard normal
# distribution:
pdist("norm", q = -1.5, mean = 0, sd = 1)

[1] 0.06681

[Figure: standard normal density (x-axis from −2 to 2, y-axis "density") with shaded probability regions A: 0.067 and B: 0.933.]
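The two numeric results printed for qdist and pdist can be cross-checked without mosaic. The following is a minimal sketch in base R, assuming only the numbers are of interest and not the density plots that qdist and pdist draw; the variable names p95 and p_below are illustrative and do not come from the cheatsheet.

```r
# Base-R cross-check of the qdist/pdist examples (no plots are drawn).
# Assumption: qnorm and pnorm are used here as stand-ins for
# qdist("norm", ...) and pdist("norm", ...).

# 95th percentile of the standard normal distribution
p95 <- qnorm(0.95, mean = 0, sd = 1)

# Probability of a value below -1.5 under the standard normal distribution
p_below <- pnorm(-1.5, mean = 0, sd = 1)

# Print with four significant digits, as in the outputs shown earlier
print(signif(p95, 4))
print(signif(p_below, 4))

# Quantile and probability functions are inverses of each other:
# feeding the quantile back into pnorm recovers the probability 0.95
print(pnorm(p95, mean = 0, sd = 1))
```

Rounded to four significant digits, these values should agree with the 1.645 and 0.06681 reported by qdist and pdist. The last line shows why the two functions are usually presented together: one maps a probability to a quantile, the other maps a quantile back to a probability.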