HW 4
Problem 6.15
Patient Satisfaction: A hospital administrator wants to study the relationship between patient
satisfaction (Y) and patient age (X1, in years), severity of illness (X2, an index), and
anxiety level (X3, an index). The administrator randomly selected 46 patients and collected the
data presented below, where larger values of Y, X2, and X3 are associated with more patient
satisfaction, increased severity of illness, and more anxiety, respectively.
a) Prepare a histogram for each of the predictor variables. Are there any noteworthy
features revealed by these plots?
This is a histogram of patient age (X1). The ages are spread fairly evenly across the range,
except for the 20-25 age group. About as many patients are between 25 and 40 years old as
between 40 and 55.
This is a histogram of the severity of illness (X2). Most patients have a severity index between
45 and 55, and the frequencies drop off sharply on both sides of that range.
This is a histogram of anxiety level (X3). Most patients have an anxiety level between 1.8 and
2.4, with frequencies declining beyond that range. The largest group of patients has an anxiety
level between 2.2 and 2.4.
b) Obtain the scatter plot matrix. Interpret these and state your principal findings.
This is a scatter plot matrix of the three predictors. Looking at the three plots below the
diagonal, age and severity appear to have a positive linear relationship: as age increases, the
severity of illness tends to increase. There also seems to be a clear positive linear relationship
between age and anxiety level: as age increases, anxiety level tends to increase. Lastly, anxiety
level and severity show a clear positive linear association: as anxiety level increases, so does
the severity of illness.
c) Fit the regression model (6.5) for the three predictor variables to the data and state the
estimated regression function. How is b2 interpreted?
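Call:
lm(formula = y_i ~ x_i1 + x_i2 + x_i3)

Coefficients:
(Intercept)         x_i1         x_i2         x_i3
    158.491       -1.142       -0.442      -13.470

The estimated regression function is
Y_hat = 158.491 - 1.142 X1 - 0.442 X2 - 13.470 X3.
Interpretation of b2 = -0.442: holding age (X1) and anxiety level (X3) constant, the estimated mean
patient satisfaction decreases by about 0.442 for each one-unit increase in the severity-of-illness
index (X2).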
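As a quick check of the b2 interpretation, here is a minimal R sketch that evaluates the estimated regression function with the rounded coefficients printed above. The helper y_hat() and the example predictor values are illustrative assumptions, not part of the original solution.

```r
# Rounded coefficients as printed by lm() above
b <- c(b0 = 158.491, b1 = -1.142, b2 = -0.442, b3 = -13.470)

# Estimated regression function: Y_hat = b0 + b1*X1 + b2*X2 + b3*X3
# (y_hat is a hypothetical helper written for this illustration)
y_hat <- function(x1, x2, x3) {
  unname(b["b0"] + b["b1"] * x1 + b["b2"] * x2 + b["b3"] * x3)
}

# Raise severity (X2) by one unit while holding age and anxiety fixed;
# the change in Y_hat equals b2. The predictor values are illustrative only.
change <- y_hat(35, 46, 2.2) - y_hat(35, 45, 2.2)
print(change)
```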
d) Box plot of the residuals (R code, Part D).
The box plot suggests possible outliers: some residuals lie near 15 and others near -15. The mean of
the residuals is zero because least-squares residuals always sum to zero when the model includes an
intercept.
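The zero-mean property can be checked numerically. This R sketch fits a model to simulated data (hypothetical values, not the CH06PR15 file) and confirms that the residuals sum to zero up to rounding error; boxplot.stats() then lists the residuals beyond the box plot whiskers, which are the points a box plot marks as possible outliers.

```r
set.seed(1)
# Hypothetical simulated data, used only to illustrate the residual properties
n  <- 46
x1 <- runif(n, 20, 55)
x2 <- rnorm(n, 50, 4)
x3 <- rnorm(n, 2.3, 0.3)
y  <- 150 - x1 - 0.5 * x2 - 10 * x3 + rnorm(n, sd = 10)

fit <- lm(y ~ x1 + x2 + x3)            # model with an intercept
res_sum  <- sum(residuals(fit))
res_mean <- mean(residuals(fit))
print(c(sum = res_sum, mean = res_mean))   # both zero up to floating-point error

# Residuals beyond the whiskers (boxplot.stats default coef = 1.5)
outliers <- boxplot.stats(residuals(fit))$out
print(outliers)
```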
e) Plot the residuals against Y_hat, each of the predictor variables. Also prepare a normal
probability plot. Interpret your plots and summarize your findings
Comments: In each plot of the residuals against a predictor variable, the residuals are scattered
fairly evenly around the line y = 0. However, each plot also shows a few possible outliers that may
need further examination. In the normal probability plot, the residuals follow the reference line only
moderately well, which suggests some departure from normality.
Problem 6.16
Refer to Patient Satisfaction problem 6.15. Assume that the regression model 6.5 for three
predictor variables with independent normal errors is appropriate.
a) Test whether there is a regression relation; use alpha = .05. State the alternatives, decision rule,
and conclusion. What does your test imply about B1, B2, and B3? What is the p-value of the test?
Alternatives: H0: B_1 = B_2 = B_3 = 0 versus Ha: at least one B_k (k = 1, 2, 3) is nonzero.
Decision Rule: The test statistic is F* = MSR/MSE = [(8275.4 + 480.9 + 364.2)/3]/101.2 = 30.04, and the
critical value is F(.95; 3, 42) ≈ 2.83. If F* <= 2.83, conclude H0; if F* > 2.83, conclude Ha.
Conclusion: Since F* = 30.04 > 2.83, we reject the null hypothesis and conclude that there is a
regression relation. This implies that at least one of B1, B2, and B3 is nonzero, although the test
does not say which one.
The p-value of this test is essentially 0 (well below .001), far smaller than alpha = .05, which agrees
with rejecting H0.
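The F statistic, critical value, and p-value can be reproduced from the sequential sums of squares and MSE quoted above. This sketch hard-codes those reported values rather than refitting the model, so it inherits their rounding.

```r
# Sequential SS for x_i1, x_i2, x_i3 and MSE, as reported from anova(model)
ss_reg <- c(8275.4, 480.9, 364.2)
mse    <- 101.2
df_reg <- 3     # number of predictors
df_err <- 42    # n - p = 46 - 4

ssr    <- sum(ss_reg)                  # SSR = 9120.5
msr    <- ssr / df_reg
F_star <- msr / mse
F_crit <- qf(0.95, df_reg, df_err)     # F(.95; 3, 42)
p_val  <- pf(F_star, df_reg, df_err, lower.tail = FALSE)

print(c(F = F_star, critical = F_crit, p = p_val))
```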
b) Obtain joint interval estimates of B1, B2, and B3 using a 99 percent family confidence
coefficient. Interpret your results.
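Using the multiplier t(.995; 42) ≈ 2.698, the limits are: B1 between -1.721626 and -0.5623744, B2 between
-1.7696 and 0.885599, and B3 between -32.6287 and 5.68868 (small round-off errors due to R). Because this
multiplier gives individual 99% intervals, a 99% family confidence coefficient for the three intervals
jointly requires the Bonferroni multiplier B = t(1 - .01/(2*3); 42), which widens each interval. In either
case, only the interval for B1 lies entirely below zero, so age has a negative effect on satisfaction when
severity and anxiety are held fixed; the intervals for B2 and B3 contain zero.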
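A sketch comparing individual and Bonferroni joint limits, using the coefficients and the standard errors recorded in the R code (s_b1 = 0.21483, s_b2 = 0.492056, s_b3 = 7.100963). The limits depend on these rounded inputs.

```r
b   <- c(-1.142, -0.442, -13.470)        # b1, b2, b3 from the fitted model
s_b <- c(0.21483, 0.492056, 7.100963)    # their estimated standard errors
g      <- 3        # number of intervals in the family
alpha  <- 0.01     # 1 - family confidence coefficient
df_err <- 42

t_ind  <- qt(1 - alpha / 2, df_err)        # individual 99% multiplier (about 2.698)
B_bonf <- qt(1 - alpha / (2 * g), df_err)  # Bonferroni multiplier for a 99% family

individual <- cbind(lower = b - t_ind * s_b,  upper = b + t_ind * s_b)
joint      <- cbind(lower = b - B_bonf * s_b, upper = b + B_bonf * s_b)
rownames(individual) <- c("B1", "B2", "B3")
rownames(joint)      <- c("B1", "B2", "B3")
print(joint)
```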
Problem 6.17
Refer to patient satisfaction problem 6.15. Assume that regression model 6.5 for three predictor
variables with independent normal error terms is appropriate.
a) Obtain an interval estimate of the mean satisfaction when X_h1=35, X_h2=45, and X_h3=2.2.
Use a 90% confidence coefficient. Interpret your confidence interval.
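We are 90 percent confident that when X_h1 = 35, X_h2 = 45, and X_h3 = 2.2, the mean satisfaction is
between 64.53663 and 73.45737. These limits are approximate because they were computed by hand with
rounded values; predict(model, newdata = ..., interval = "confidence", level = .90) gives the exact limits.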
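The hand calculation uses Y_hat_h = X_h'b and s^2{Y_hat_h} = MSE * X_h'(X'X)^(-1) X_h. This sketch applies those formulas to simulated data (hypothetical, standing in for CH06PR15) and checks the result against predict(..., interval = "confidence").

```r
set.seed(2)
# Hypothetical simulated data standing in for CH06PR15
n <- 46
d <- data.frame(x_i1 = runif(n, 20, 55), x_i2 = rnorm(n, 50, 4), x_i3 = rnorm(n, 2.3, 0.3))
d$y_i <- 150 - d$x_i1 - 0.5 * d$x_i2 - 10 * d$x_i3 + rnorm(n, sd = 10)
fit <- lm(y_i ~ x_i1 + x_i2 + x_i3, data = d)

X   <- model.matrix(fit)                              # design matrix with intercept column
mse <- sum(residuals(fit)^2) / df.residual(fit)       # MSE = SSE / (n - p)
x_h <- c(1, 35, 45, 2.2)

y_h    <- sum(x_h * coef(fit))                                      # X_h' b
s_y_h  <- sqrt(mse * drop(t(x_h) %*% solve(crossprod(X)) %*% x_h))  # s{Y_hat_h}
t_mult <- qt(1 - 0.10 / 2, df.residual(fit))                        # t(.95; n - p)
by_hand <- c(lwr = y_h - t_mult * s_y_h, upr = y_h + t_mult * s_y_h)

by_predict <- predict(fit, newdata = data.frame(x_i1 = 35, x_i2 = 45, x_i3 = 2.2),
                      interval = "confidence", level = 0.90)[1, c("lwr", "upr")]
print(rbind(by_hand, by_predict))
```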
b) Obtain a prediction interval for a new patient's satisfaction when X_h1=35, X_h2=45, and
X_h3=2.2. Use a 90% confidence coefficient. Interpret your prediction interval.
With confidence coefficient .90, we predict that a new patient with X_h1 = 35, X_h2 = 45, and X_h3 = 2.2
will have a satisfaction level between 51.50965 and 86.51092.
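For a new observation the variance gains an extra MSE term: s^2{pred} = MSE * (1 + X_h'(X'X)^(-1) X_h), which is why this interval is much wider than the one for the mean response. This sketch, again on simulated stand-in data (hypothetical), compares the hand formula with predict(..., interval = "prediction"), the call used in the R code for 6.17b.

```r
set.seed(3)
# Hypothetical simulated data standing in for CH06PR15
n <- 46
d <- data.frame(x_i1 = runif(n, 20, 55), x_i2 = rnorm(n, 50, 4), x_i3 = rnorm(n, 2.3, 0.3))
d$y_i <- 150 - d$x_i1 - 0.5 * d$x_i2 - 10 * d$x_i3 + rnorm(n, sd = 10)
fit <- lm(y_i ~ x_i1 + x_i2 + x_i3, data = d)

X   <- model.matrix(fit)
mse <- sum(residuals(fit)^2) / df.residual(fit)
x_h <- c(1, 35, 45, 2.2)

y_h    <- sum(x_h * coef(fit))
h      <- drop(t(x_h) %*% solve(crossprod(X)) %*% x_h)   # X_h'(X'X)^(-1) X_h
s_pred <- sqrt(mse * (1 + h))      # the "1 +" accounts for the new observation's error
t_mult <- qt(1 - 0.10 / 2, df.residual(fit))
by_hand <- c(lwr = y_h - t_mult * s_pred, upr = y_h + t_mult * s_pred)

by_predict <- predict(fit, newdata = data.frame(x_i1 = 35, x_i2 = 45, x_i3 = 2.2),
                      interval = "prediction", level = 0.90)[1, c("lwr", "upr")]
print(rbind(by_hand, by_predict))
```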
R Code
#Problem 6.15
data=read.table("C:/Users/Hellangel31/Desktop/CH06PR15.txt")
names(data)=c("Patient Satisfaction","Age","Severity","Anxiety")
data
data$"Patient Satisfaction"
#Part A
hist(data$"Age",xlab="Patient Age",main="Histogram of Patient Age")
hist(data$"Severity", xlab="Severity of Illness", main="Histogram of Illness Severity")
hist(data$"Anxiety",xlab="Anxiety Level",main="Histogram of Anxiety Levels")
#Part B
pairs(data[c("Age","Severity","Anxiety")])
#Part C
y_i=data$"Patient Satisfaction"
x_i1=data$"Age"
x_i2=data$"Severity"
x_i3=data$"Anxiety"
model=lm(y_i~x_i1+x_i2+x_i3)
model
#Part D
model$residuals
boxplot(model$residuals,main="Residuals")
#Part E
#Plot of residuals against Y hat (fitted values)
#abline() draws only on the current plot, so add the y=0 line after each plot
plot(model$fitted.values,model$residuals,xlab="Fitted Values",ylab="Residuals")
abline(h=0)
#Plot of residuals against each predictor variable
plot(x_i1,model$residuals,xlab="Age of Patient",ylab="Residuals")
abline(h=0)
plot(x_i2,model$residuals,xlab="Severity of Illness",ylab="Residuals")
abline(h=0)
plot(x_i3,model$residuals,xlab="Anxiety Level",ylab="Residuals")
abline(h=0)
#QQ Plot
qqnorm(model$residuals)
qqline(model$residuals)
#ANOVA table: sequential sums of squares and MSE used in Problem 6.16a
anova(model)
#Problem 6.16b
x=cbind(rep(1,46),data$"Age", data$"Severity", data$"Anxiety")
var_b= 101.2*solve(t(x) %*% x)
var_b
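#Standard errors are the square roots of the diagonal of var_b (off-diagonal entries are covariances)
s_b=sqrt(diag(var_b))
s_b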
#s_b1=0.21483, s_b2=0.492056, s_b3=7.100963
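#qt(.995,42) gives individual 99% intervals; a 99% Bonferroni family (g=3) would use qt(1-.01/(2*3),42)
#Confidence interval for B1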
lower1= -1.142-qt(.995,42)*0.21483
upper1=-1.142+qt(.995,42)*0.21483
#Confidence interval for B2
lower2=-.442-qt(.995,42)*0.492056
upper2=-.442+qt(.995,42)*0.492056
#Confidence Interval for B3
lower3=-13.470-qt(.995,42)*7.100963
upper3=-13.470+qt(.995,42)*7.100963
#Problem 6.16C
anova(model)
summary(model)
#Coefficient of multiple correlation R = sqrt(SSR/SSTO)
R=sqrt(9120.5/13369.3)
#problem 6.17A
x_h=cbind(c(1,35,45,2.2))
t(x_h)
b=cbind(c(158.491,-1.142,-.442,-13.470))
y_h=t(x_h) %*% b
#s{Y_hat_h} = sqrt(MSE * x_h'(X'X)^(-1) x_h), with MSE = 101.2
s_y_h=sqrt(101.2*t(x_h) %*% solve(t(x) %*% x) %*%x_h)
#Lower Limit
y_h-qt(1-.10/2,42)*s_y_h
#Upper Limit
y_h+qt(1-.10/2,42)*s_y_h
# Problem 6.17b
predict(model,newdata=data.frame(x_i1=35,x_i2=45,x_i3=2.2),interval="prediction",level=.90)