STAT 383 Exam 3


Clarkson University, Department of Mathematics
STAT 383, Exam 3

Name: ____________________ (Print Name Clearly)

Instructions: Please enter multiple choice answers on the answer sheet provided. Don’t forget to put your name and version on the answer
sheet! For multiple choice questions, choose the best available answer.

Multiple Choice (35 points)


1. Analysis of variance (ANOVA) can be used to test:
A. whether a factor variable has a significant effect.
B. whether k population means are all equal.
C. whether any of the slope coefficients of a linear regression are non-zero.
D. All of the above.
E. None of the above.
2. How many different hypothesis tests does a two-way ANOVA problem consist of?
A. 1 B. 2 C. 3 D. 7 E. None
3. A three-way ANOVA would consist of testing for the main effects (coming from each factor) as well as interaction between
any combination of factors considered. As such, how many different F-statistics would need to be computed for a three-
way ANOVA?
A. 3 B. 7 C. 8 D. 9 E. None
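
Study note: the effects tested in a factorial ANOVA can be enumerated directly. The sketch below assumes a full model in which every main effect and every interaction gets its own F-statistic; the factor names A, B, and C and the helper `anova_effects` are hypothetical placeholders, not part of the exam.

```python
from itertools import combinations

def anova_effects(factors):
    """List every main effect and interaction in a full factorial model."""
    effects = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            # A single factor is a main effect; two or more form an interaction.
            effects.append(":".join(combo))
    return effects

two_way = anova_effects(["A", "B"])
three_way = anova_effects(["A", "B", "C"])
print(len(two_way), two_way)
print(len(three_way), three_way)
```

With k factors the list has 2^k − 1 entries, one for each non-empty subset of the factors.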
4. Which of the following statements is always true about F-statistics for ANOVA?
A. The error degrees of freedom is the numerator degrees of freedom.
B. The error degrees of freedom is the denominator degrees of freedom.
C. The numerator degrees of freedom is greater than the denominator degrees of freedom.
D. The denominator degrees of freedom is greater than the numerator degrees of freedom.
E. More than one of the above statements is correct.
5. The estimates for coefficients in linear regression come from:
A. minimizing the sum of the residuals (errors) Σ(y_i − ŷ_i) = Σe_i.
B. minimizing the sum of the absolute residuals (errors) Σ|y_i − ŷ_i| = Σ|e_i|.
C. maximizing the sum of the absolute residuals (errors) Σ|y_i − ŷ_i| = Σ|e_i|.
D. maximizing the sum of the squared residuals (errors) SSE = Σ(y_i − ŷ_i)² = Σe_i².
E. minimizing the sum of the squared residuals (errors) SSE = Σ(y_i − ŷ_i)² = Σe_i².
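
Study note: the least-squares criterion can be checked numerically. The sketch below uses a small made-up data set (the values of `x` and `y` are invented for illustration only) and the standard closed-form estimates for simple linear regression, then confirms that nudging either coefficient does not lower the SSE.

```python
import numpy as np

# Hypothetical data, invented only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sse(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

# Closed-form least-squares estimates for simple linear regression.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

best = sse(b0, b1)
# Try a small grid of perturbed coefficients around the estimates.
perturbed = [sse(b0 + d0, b1 + d1)
             for d0 in (-0.1, 0.0, 0.1)
             for d1 in (-0.1, 0.0, 0.1)]
print(b0, b1, best, min(perturbed) >= best)
```

When the model includes an intercept, the plain sum of the residuals at the least-squares fit is zero (up to rounding), so that sum alone cannot distinguish a good fit from a poor one.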
6. How many coefficients do you need to estimate in a simple linear regression model?
A. 1 B. 2 C. 3 D. It depends. E. None.
7. How many coefficients do you need to estimate in a multiple linear regression model that incorporates 4 predictors?
A. 3 B. 4 C. 5 D. It depends. E. None.
8. Regardless of the value of the input/predictor/explanatory/independent variable, the variance of the distribution of the
output/response/dependent variable should be the same. This assumption of constant (or equal) variance about the
regression line is called:
A. random error B. pooled variance C. residual analysis D. heteroscedasticity E. homoscedasticity
9. Another name for a residual in regression analysis is:
A. random error B. pooled variance C. residual analysis D. heteroscedasticity E. homoscedasticity
10. Suppose that you have a simple linear regression of ŷ = −250 + 5x and a coefficient of determination of 0.64. The
appropriate sample correlation for this data is:
A. 0.8 B. -0.8 C. -0.64 D. 0.64 E. unknowable given this information.
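
Study note: in simple linear regression the sample correlation has the same sign as the fitted slope, and its square equals the coefficient of determination. The sketch below applies that relationship to the values given in the question; the function name `correlation_from_r_squared` is a hypothetical helper.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Sample correlation for simple linear regression:
    magnitude sqrt(R^2), sign taken from the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

r = correlation_from_r_squared(0.64, 5)
print(r)
```

This shortcut applies only when there is a single predictor; with several predictors, R² does not determine any one pairwise correlation.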
11. Suppose, in the previous question, that y represented the weight (pounds) and x represented the height (inches) of an
NBA player. If an NBA player’s height were to increase by 1 inch, then their weight is expected to:
A. increase by 1 pound.
B. decrease by 1 pound.
C. increase by 5 pounds.
D. decrease by 5 pounds.
E. decrease by 250 pounds.

Version A Page 1 of 5
STAT 383 Exam 3 Friday December 6, 2019

12. Consider the intercept for the same regression mentioned in the previous question. Which of the following statements
would be a correct interpretation of that value?
A. An NBA player who is 0 inches tall is expected to weigh -250 pounds under this model.
B. The intercept value is not useful to interpret because there are no NBA players who are near 0 inches tall.
C. The intercept value is a location parameter that ensures our regression line stays close to the data points.
D. All of the above are correct.
E. None of the above are correct.

13. How could you test for a linear relationship between two variables?
A. ANOVA.
B. t-test for slope coefficient equal to zero.
C. t-test for correlation equal to zero.
D. All of the above.
E. None of the above.

14. What does it mean if ALL of your data points lie on your regression line?
A. The data you had happened to be perfectly linear.
B. Your data must be inaccurate.
C. You have miscalculated the regression line.
D. You must have used software incorrectly.
E. This is one of life’s unknowable mysteries.
15. What does it mean if NONE of your data points lie on your regression line?
A. Your data must be inaccurate.
B. You have miscalculated the regression line.
C. You must have used software incorrectly.
D. You are a complete failure.
E. Error is part of a regression model; if you can verify that the error is random then what you have done is fine.
16. The final project for this course is due one week from today (Friday, 6 December). You may write one of the answers
below as your response or feel free to write out something else at the bottom of the answer sheet if none of the options
apply to you. (There’s no correct or incorrect answer, so honesty is appreciated!) What outcome from the project will
have the highest personal value to you?
A. A high grade to boost my course average
B. Some practical experience using software
C. A chance to integrate statistics with something that I actually care about
D. Learning if and how to use data to answer important questions
E. No value; the project is a requirement of the class, but otherwise a complete waste of my time.


Short Answer (65 points)


1. (25 pts) A student considers data for the 2018-2019 NBA season for some teams located in the same geographic area.
The student wishes to investigate whether there are differences in the average number of three-point shots made by
players in different positions on different teams. The student’s statistical results are shown below. Use them to answer
the questions that follow.

(a) How many different teams were considered by this student?

(b) How many different positions were considered by this student?

(c) What kind of analysis was considered here? (One-way ANOVA, Randomized Block Design, Two-way ANOVA,
Three-way ANOVA, Linear Regression ANOVA, etc.)

(d) What conclusion(s) could the student draw from these results?

(e) Suppose that the student decided to run a Tukey pairwise comparison of the average three-point shots made as
grouped by positions. How many distinct (different) pairs of positions would the student need to consider?
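
Study note: the number of pairwise comparisons among k groups is "k choose 2". The sketch below assumes, for illustration, the five position abbreviations named in part (f); the student's actual output is not reproduced here, so the group list is an assumption.

```python
from itertools import combinations
from math import comb

def pairwise_comparisons(groups):
    """Return every unordered pair of distinct groups."""
    return list(combinations(groups, 2))

# Assumed group labels, taken from the abbreviations in part (f).
positions = ["C", "PF", "PG", "SF", "SG"]
pairs = pairwise_comparisons(positions)
print(len(pairs), comb(len(positions), 2))
for first, second in pairs:
    print(first, "vs", second)
```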

(f) A small portion of the student’s Tukey results are shown below. With what is shown, you might be able to compare
a center (C) to four other positions on the team (PF = power forward, PG = point guard, SF = small forward, and
SG = shooting guard). What conclusion(s) can you draw from the results shown below? (Assume that we are using
α = 0.05.)


2. (20 pts) The same student from the previous question also believes that they can predict the average points per game
scored by players based on some very basic statistics: the player’s age (years), weight (pounds), height (inches), and
experience (number of years played in the NBA). They conducted regression analysis and the results are shown below.

(a) If the student chooses to use α = 0.01, which variables are significant in predicting average points scored per game?

(b) Provide an interpretation of the multiple R² value the student obtained. Use that interpretation to comment on the
linear model the student has come up with.

(c) Write out the linear regression equation this model is yielding and provide brief interpretations of the slopes.

(d) Suppose that a 29-year-old male who weighs 130 pounds and is 70 inches tall were to suddenly join the NBA with
no previous experience. How many points per game would you expect him to average?


3. (10 pts) Use the output from the previous question to construct the ANOVA table for the multiple linear regression
conducted. Be sure to state the appropriate hypotheses, decision, and conclusion to accompany this table.

4. (10 pts) The student who is so interested in basketball knows that basketball games are high-scoring events and believes
that there would be a direct correlation between the average number of minutes a player spent playing and the average
number of points that player scored per game.
(a) The student finds that the correlation between minutes played and points per game is r = 0.8486. Interpret this value.
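
Study note: squaring the correlation gives the proportion of variation in points per game that a simple linear regression on minutes played would account for. The short sketch below does that arithmetic, assuming a single-predictor model.

```python
r = 0.8486          # correlation reported in part (a)
r_squared = r ** 2  # coefficient of determination for a one-predictor model
print(round(r_squared, 4))
```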

(b) Below is a scatter plot of the data and the corresponding standardized residual plot for a simple linear regression.
Do you think a linear regression is the best choice to model this data? Explain your reasoning using features of both
graphs.

