Introductory of Statistics - Chapter 4
(Introduction of Statistics)
X = explanatory variable
Y = response variable
2. Find the table entry in the row headed by n and the column headed by your choice of α. Your
choice of α is the risk you are willing to take of mistakenly concluding that ρ ≠ 0 when, in fact,
ρ = 0.
a) If |r| ≥ table entry, then there is sufficient evidence to conclude that ρ ≠ 0, and we say that
r is significant. In other words, we conclude that there is some population correlation
between the two variables x and y.
b) If |r| < table entry, then the evidence is insufficient to conclude that ρ ≠ 0, and we say that
r is not significant. We do not have enough evidence to conclude that there is any
correlation between the two variables x and y.
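The decision rule above can be sketched in a few lines of Python. This is only an illustration: the data are made up (they are not the wetlands sample), the function names are hypothetical, and the critical value 0.834 is assumed to be the table entry for n = 8 and α = 0.01, so check it against your own table.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, table_entry):
    """Rule from the notes: r is significant when |r| >= table entry."""
    return abs(r) >= table_entry

# Made-up sample of n = 8 pairs, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 1, 4, 3, 6, 5, 8, 7]

# Assumed table entry for n = 8, alpha = 0.01 (verify in your table).
TABLE_ENTRY = 0.834

r = pearson_r(xs, ys)
print(f"r = {r:.3f}, significant: {is_significant(r, TABLE_ENTRY)}")
```

The comparison uses the absolute value of r, so a strong negative correlation is treated the same way as a strong positive one.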
Conclusion
There is sufficient evidence to conclude that a linear relationship exists between the inlet
phosphorus level (100 mg/L) and the outlet phosphorus level (100 mg/L) in the population of
California wetlands biotreatment facilities, at a confidence level of 99%, based on a simple
random sample (SRS) of n = 8.
Notice how just about anyone can read your conclusion and have a general understanding of the
results.
Correlation can be thought of as a measure of how well a linear model (line) fits the data points
on a scatter diagram.
Causation
Correlation does not mean Causation (x does not necessarily cause y to change)
Why?
1) The scatter diagram and r are from a sample (not the entire population)
2) Lurking variables
3) Range of samples
Note:
The correlation between variables computed from averages is usually higher than r for the raw data.
Do not use averages for correlation: doing so may falsely inflate r.
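A small made-up example shows the effect. The sketch below compares r for six raw points with r for the three group averages (one average per x value); the data and the function name are hypothetical and chosen only to make the inflation easy to see.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Raw data: two y observations at each x value (made-up numbers).
raw_x = [1, 1, 2, 2, 3, 3]
raw_y = [0, 4, 1, 5, 2, 6]

# Averaged data: the mean of the y values at each distinct x.
avg_x = [1, 2, 3]
avg_y = [(0 + 4) / 2, (1 + 5) / 2, (2 + 6) / 2]

r_raw = pearson_r(raw_x, raw_y)
r_avg = pearson_r(avg_x, avg_y)
print(f"r for raw data: {r_raw:.3f}")
print(f"r for averages: {r_avg:.3f}")
```

Averaging removes the scatter within each group, so the averaged points fall much closer to a line than the raw points do.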
Lurking Variable- a variable not included as an explanatory variable (or response) that may be
responsible for
o changes in x, or
o changes in y, or
o changes in both x and y
Linear Regression
Key points
Residuals
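The residual is the difference between the observed and predicted values for y:
residual = y − ŷ, where y is the observed value and ŷ is the value predicted by the regression line.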
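As a rough sketch, residuals can be computed once a line has been fitted. The code below assumes the regression line is the usual least-squares line; the data are made up and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys):
    """Observed y minus predicted y for every data point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up data for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

res = residuals(xs, ys)
print([round(e, 2) for e in res])
```

A positive residual means the point lies above the line, a negative residual means it lies below, and for a least-squares line the residuals sum to zero (up to rounding).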
Interpretation of Slope A
Influential Point- a point (x, y) is influential if removing it will substantially change the
intercept or slope of the regression line. (Usually points near the min or max value of x, with y
far away from the remainder of the points.)
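One way to check for an influential point is to fit the line with and without it and compare the results. The sketch below does this for made-up data, assuming a least-squares fit; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: the last point has the largest x and a y far from the trend.
xs = [1, 2, 3, 4, 10]
ys = [1, 2, 3, 4, 0]

with_point = fit_line(xs, ys)
without_point = fit_line(xs[:-1], ys[:-1])

print("with point:    intercept = %.2f, slope = %.2f" % with_point)
print("without point: intercept = %.2f, slope = %.2f" % without_point)
```

Here removing the single point (10, 0) flips the slope from negative to positive, which is the kind of substantial change the definition describes.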
Prediction – we are only allowed to predict values of y using values of x that fall within the
range of the x-values in our sample (interpolation).
Note: We are not allowed to predict values of y using values of x that are outside the range of
the x-values in our sample (extrapolation).
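The interpolation rule can be enforced in code by refusing to predict outside the sampled x range. This is a minimal sketch assuming a least-squares line; the data are made up and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict_within_range(x_new, xs, ys):
    """Predict y only when x_new lies inside the sampled x range."""
    if not (min(xs) <= x_new <= max(xs)):
        raise ValueError(
            f"x = {x_new} is outside [{min(xs)}, {max(xs)}]: extrapolation"
        )
    intercept, slope = fit_line(xs, ys)
    return intercept + slope * x_new

# Made-up data for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

print(predict_within_range(2.5, xs, ys))   # inside the range: interpolation
try:
    predict_within_range(10, xs, ys)       # outside the range
except ValueError as err:
    print("refused:", err)
```

Raising an error rather than returning a number makes it hard to extrapolate by accident.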