Problem Set 2 With Solutions
Question 1
The tables below are summary statistics from the California test scores
dataset often referenced in the textbook.
The variables we are interested in are: the number of students enrolled
on average at schools in the district (Enrolment), the average number of
teachers working at schools in the district (Teachers), the average number
of computers available at schools in the district (Computer) and the average
standardized test score for schools in the district (Test score).
Below you find descriptive statistics for the full sample, which includes both
K-8 and K-6 schools:
1. Can you reject $H_0: Enrolment = 2250$ vs $H_1: Enrolment \neq 2250$ at
5% in the overall sample? What if you performed the same test for the
sample of K-6 schools? How would your answers change if you were
to perform a 1% test?
2. Find the p-value for the test of $H_0: Teachers = 130$ vs $H_1: Teachers > 130$
for the sample of K-8 schools. Based on this p-value, would you
reject the null hypothesis at 1%? [You can do this using the Standard
Normal tables or taking advantage of the Stata command normal(x)
which returns the value of the Standard Normal CDF at x.]
where $t^{act}$ is the actual test statistic computed using the sample at
hand.
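For a one-sided alternative of the form $H_1: \mu > \mu_0$, the large-sample p-value is the standard normal mass to the right of $t^{act}$, i.e. $1 - \Phi(t^{act})$. The sketch below mirrors what Stata's normal(x) would give, using Python's standard library. The function name and the input value are illustrative assumptions; the t-statistic is a made-up placeholder, not a number from the problem set's tables.

```python
from statistics import NormalDist


def one_sided_p_value(t_act: float) -> float:
    """P-value for H1: mean > mu_0 under the standard normal approximation."""
    # NormalDist() with no arguments is the standard normal (mean 0, sd 1).
    return 1.0 - NormalDist().cdf(t_act)


# Hypothetical t-statistic, for illustration only (NOT taken from the tables).
t_example = 1.5
p_value = one_sided_p_value(t_example)

# Reject H0 at the 1% level only if the p-value is below 0.01.
print(f"p-value = {p_value:.4f}, reject at 1%: {p_value < 0.01}")
```

With the sample's actual t-statistic in place of the placeholder, the same comparison against 0.01 answers the question.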
The t-statistic for our test is:
3. Compute the 90%, 95% and 99% confidence intervals for Computer in
the sample of K-6 schools.
The 90%, 95% and 99% confidence intervals for Computer in the sample
of K-6 schools are as follows:
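$$
IC_{90} = \left[ \overline{Computer}_{K6} - t_{IC90} \frac{s_{Computer_{K6}}}{\sqrt{N}} ;\ \overline{Computer}_{K6} + t_{IC90} \frac{s_{Computer_{K6}}}{\sqrt{N}} \right]
= \left[ 338.4 - 1.64 \frac{476.7}{\sqrt{61}} ;\ 338.4 + 1.64 \frac{476.7}{\sqrt{61}} \right] = [238.3;\ 438.5]
$$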
where $t_{IC90}$ is the critical value for the 90% confidence interval because
it leaves 5% $\left(= \frac{1 - 0.9}{2}\right)$ of mass under each tail of the standard normal
distribution.
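$$
IC_{95} = \left[ \overline{Computer}_{K6} - t_{IC95} \frac{s_{Computer_{K6}}}{\sqrt{N}} ;\ \overline{Computer}_{K6} + t_{IC95} \frac{s_{Computer_{K6}}}{\sqrt{N}} \right]
= \left[ 338.4 - 1.96 \frac{476.7}{\sqrt{61}} ;\ 338.4 + 1.96 \frac{476.7}{\sqrt{61}} \right] = [218.8;\ 458]
$$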
where $t_{IC95}$ is the critical value for the 95% confidence interval because
it leaves 2.5% $\left(= \frac{1 - 0.95}{2}\right)$ of mass under each tail of the standard normal
distribution.
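$$
IC_{99} = \left[ \overline{Computer}_{K6} - t_{IC99} \frac{s_{Computer_{K6}}}{\sqrt{N}} ;\ \overline{Computer}_{K6} + t_{IC99} \frac{s_{Computer_{K6}}}{\sqrt{N}} \right]
= \left[ 338.4 - 2.58 \frac{476.7}{\sqrt{61}} ;\ 338.4 + 2.58 \frac{476.7}{\sqrt{61}} \right] = [180.9;\ 495.9]
$$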
where $t_{IC99}$ is the critical value for the 99% confidence interval because
it leaves 0.5% $\left(= \frac{1 - 0.99}{2}\right)$ of mass under each tail of the standard normal
distribution.
Notice how increasing the confidence level leads to wider intervals.
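The three intervals can be reproduced with a short Python sketch. The function name normal_ci is an illustrative choice, and the code uses exact standard normal quantiles rather than the rounded table values 1.64, 1.96 and 2.58, so the endpoints may differ from the ones above in the first decimal place.

```python
from math import sqrt
from statistics import NormalDist


def normal_ci(mean: float, sd: float, n: int, level: float) -> tuple[float, float]:
    """Two-sided confidence interval for a mean, using standard normal critical values."""
    # The critical value leaves (1 - level) / 2 of mass in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Sample mean, standard deviation and size for Computer in K-6 schools.
for level in (0.90, 0.95, 0.99):
    lower, upper = normal_ci(338.4, 476.7, 61, level)
    print(f"{level:.0%}: [{lower:.1f}; {upper:.1f}]")
```

Each step up in the confidence level increases the critical value and so widens the interval, as noted above.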
This is a test for the difference in the mean test scores of two
populations: K-6 and K-8 schools. The t-statistic is as follows:
$$
t = \frac{\overline{TestScore}_{K6} - \overline{TestScore}_{K8}}{SE(\overline{TestScore}_{K6} - \overline{TestScore}_{K8})}
= \frac{661.5 - 652.9}{\sqrt{\frac{19.7^2}{61} + \frac{18.7^2}{359}}} = 3.18
$$
Since the t-stat is larger than the critical value for a 1% 2-sided test
(which would be 2.58), we reject the null hypothesis at 1%.
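$$
t = \frac{\overline{Computer}_{K6} - \overline{Computer}_{K8}}{SE(\overline{Computer}_{K6} - \overline{Computer}_{K8})}
= \frac{338.4 - 297.4}{\sqrt{\frac{476.7^2}{61} + \frac{435.5^2}{359}}} = 0.63
$$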
Since the t-stat is smaller than the critical value for a 1% 2-sided test,
we cannot reject the null hypothesis at 1%.
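Both difference-in-means statistics follow the same recipe, which can be checked with a minimal Python sketch. The function name diff_in_means_t is an illustrative assumption; the inputs are the sample means, standard deviations and sample sizes used in the two computations above.

```python
from math import sqrt


def diff_in_means_t(mean_a: float, sd_a: float, n_a: int,
                    mean_b: float, sd_b: float, n_b: int) -> float:
    """t-statistic for H0: equal population means, two independent samples."""
    # Standard error of the difference between two independent sample means.
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se


CRITICAL_1PCT = 2.58  # two-sided 1% critical value from the standard normal

t_score = diff_in_means_t(661.5, 19.7, 61, 652.9, 18.7, 359)
t_computer = diff_in_means_t(338.4, 476.7, 61, 297.4, 435.5, 359)

for name, t in (("Test score", t_score), ("Computer", t_computer)):
    print(f"{name}: t = {t:.2f}, reject at 1%: {abs(t) > CRITICAL_1PCT}")
```

The sketch gives t of about 3.18 for test scores (reject at 1%) and about 0.63 for computers (do not reject), matching the conclusions above.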
Question 2
Below you can see the Stata output of a regression estimating a linear
relationship between the average math test score (MathScore) and the average
student-teacher ratio (STR) in a district as in the model below:
$$
MathScore_i = \alpha + \beta \, STR_i + u_i
$$
1. What would be the change in math test score associated with a
reduction of the student-teacher ratio of 3.7?
According to our linear regression model, the predicted change
in math test score would be $\hat{\beta} \times \Delta STR = -1.94 \times (-3.7) = 7.18$.
2. How would you argue whether the effect calculated above is large or
small?
One way to scale the effect would be to obtain the standard deviation
of math test scores and see how large the predicted change is relative
to it.
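As a rough sketch of that scaling in Python: the slope of -1.94 and the change of -3.7 come from the answer above, but the standard deviation of math test scores is not reported in this section. The value used below is a hypothetical placeholder to be replaced with the sample figure. Function names are illustrative.

```python
def predicted_change(slope: float, delta_x: float) -> float:
    """Predicted change in the outcome for a change delta_x in the regressor."""
    return slope * delta_x


def in_sd_units(change: float, sd_outcome: float) -> float:
    """Express a predicted change as a fraction of the outcome's standard deviation."""
    return change / sd_outcome


change = predicted_change(-1.94, -3.7)

# HYPOTHETICAL placeholder: the actual standard deviation of MathScore
# is not given here; substitute the value from the descriptive statistics.
sd_math_placeholder = 20.0

print(f"Predicted change: {change:.2f} points")
print(f"In SD units (placeholder SD): {in_sd_units(change, sd_math_placeholder):.2f}")
```

A ratio well below one standard deviation would point to a modest effect; the verdict depends entirely on the true standard deviation.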