Problem Set 2 With Solutions


Problem Set 2–Solutions

Note: The material underlying the problems in this assignment is covered in
Sections 3.2-3.4, 4.1 and 4.3 of the textbook. If you had trouble answering
the questions, or answered them incorrectly, and the solutions below do not
help, review the material there and see the TA to make sure you understand
the concepts and how to apply them.

Question 1
The tables below are summary statistics from the California test scores
dataset often referenced in the textbook.
The variables we are interested in are: the number of students enrolled
on average at schools in the district (Enrolment), the average number of
teachers working at schools in the district (Teachers), the average number
of computers available at schools in the district (Computer) and the average
standardized test score for schools in the district (Test score).
Below you find descriptive statistics for the full sample, which includes both
K-8 and K-6 schools:

Variable        N      Mean    Std. dev.
Enrolment     420    2628.8      3913.1
Teachers      420     129.1       187.9
Computer      420     303.4       441.3
Test score    420     654.2        19.1

Here are summary statistics for the subsample of K-6 schools:

Variable        N      Mean    Std. dev.
Enrolment      61    3181.3      4380.9
Teachers       61     157.7       210.3
Computer       61     338.4       476.7
Test score     61     661.5        19.7

Here are summary statistics for the subsample of K-8 schools:

Variable        N      Mean    Std. dev.
Enrolment     359    2534.9      3826.8
Teachers      359     124.2       183.7
Computer      359     297.4       435.5
Test score    359     652.9        18.7

1. Can you reject $H_0: \mu_{\text{Enrolment}} = 2250$ vs
$H_1: \mu_{\text{Enrolment}} \neq 2250$ at 5% in the overall sample? What if
you performed the same test for the sample of K-6 schools? How would your
answers change if you were to perform a 1% test?

Let's start by constructing the t-statistic for the test:

\[
t = \frac{\overline{\text{Enrolment}} - \mu_{\text{Enrolment},0}}{s_{\text{Enrolment}}/\sqrt{N}}
  = \frac{2628.8 - 2250}{3913.1/\sqrt{420}} = 1.98
\]

At this point we can use the Standard Normal table or the Stata
command normal(x) to find that the p-value of the test statistic is

\[
\Pr(|t| \geq 1.98) = 0.0477 < 0.05 \tag{1}
\]

Therefore, we reject the null hypothesis at 5%. Alternatively, you could
have spared yourself the last calculation by recalling that for 2-sided
5% testing the threshold value for the t-statistic is 1.96 (i.e., we
reject when $|t| > 1.96$).
We follow the same procedure for testing the same hypothesis in the
K-6 sample:

\[
t = \frac{\overline{\text{Enrolment}}_{K6} - \mu_{\text{Enrolment}_{K6},0}}{s_{\text{Enrolment}_{K6}}/\sqrt{N}}
  = \frac{3181.3 - 2250}{4380.9/\sqrt{61}} = 1.66
\]

and

\[
\Pr(|t| \geq 1.66) = 0.097 > 0.05 \tag{2}
\]

Therefore, in this case we cannot reject the null hypothesis.
Notice that although the sample mean in the K-6 sample is larger than
in the overall sample, the standard deviation is also higher and the
sample size is smaller. In other words, the precision of the sample mean
is lower for the K-6 sample. Hence, we are not able to reject the
hypothesis that the true population mean equals 2250, even though the
K-6 sample mean is much further from 2250 than the sample mean in the
overall sample (where we rejected the null).
We have computed p-values above, so testing at a different level is now
easy. The p-value for the test in the overall sample was 0.0477 > 0.01.
Therefore, we cannot reject the null at the 1% significance level.
For the K-6 sample, we could not reject at 5%, so we will not be able
to reject at 1% either (a 1% test makes it harder to reject the null
than a 5% test). As expected, 0.097 > 0.01.
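
The two tests above can be reproduced with a short script. This is only a sketch in Python rather than Stata: the function name `two_sided_test` is our own, and `NormalDist().cdf` from the standard library plays the role of Stata's normal(x). It uses the large-sample normal approximation, as in the solution.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_test(mean, mu0, sd, n):
    """Return (t, p-value) for H0: mu = mu0 against H1: mu != mu0.

    Hypothetical helper: the p-value uses the standard normal
    approximation to the distribution of the t-statistic.
    """
    t = (mean - mu0) / (sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_value


# Summary statistics taken from the tables in Question 1.
t_all, p_all = two_sided_test(2628.8, 2250, 3913.1, 420)  # full sample
t_k6, p_k6 = two_sided_test(3181.3, 2250, 4380.9, 61)     # K-6 sample

for label, t, p in [("All", t_all, p_all), ("K-6", t_k6, p_k6)]:
    print(f"{label}: t = {t:.2f}, p = {p:.4f}, "
          f"reject at 5%: {p < 0.05}, reject at 1%: {p < 0.01}")
```

Because the script does not round t to two decimals before evaluating the normal CDF, its p-values can differ from the table-based figures in the third decimal place; the test conclusions are unchanged.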

2. Find the p-value for the test of $H_0: \mu_{\text{Teachers}} = 130$ vs
$H_1: \mu_{\text{Teachers}} > 130$ for the sample of K-8 schools. Based on
this p-value, would you reject the null hypothesis at 1%? [You can do this
using the Standard Normal tables or taking advantage of the Stata command
normal(x), which returns the value of the Standard Normal CDF at x.]

Using the definition of the p-value for a 1-sided test:

\[
p\text{-value} = P(t > t^{act}) \tag{3}
\]

where $t^{act}$ is the actual test statistic computed using the sample at
hand.
The t-statistic for our test is:

\[
t^{act} = \frac{\overline{\text{Teachers}}_{K8} - \mu_{\text{Teachers},0}}{s_{\text{Teachers},K8}/\sqrt{N}}
        = \frac{124.2 - 130}{183.7/\sqrt{359}} = -0.6
\]

Since the t-statistic is approximately standard normal in large samples,
$P(t > t^{act})$ is the probability mass to the right of -0.6 under the
standard normal, which is 0.73. Therefore we cannot reject the null
hypothesis at the 1% significance level.
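
As a quick check of the one-sided p-value, here is a minimal Python sketch. The helper name `upper_tail_p_value` is ours, not from the textbook, and `NormalDist().cdf` stands in for Stata's normal(x).

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(mean, mu0, sd, n):
    """Return (t_act, p-value) for H0: mu = mu0 against H1: mu > mu0.

    Hypothetical helper: the p-value is the standard normal mass
    to the right of the actual t-statistic.
    """
    t_act = (mean - mu0) / (sd / sqrt(n))
    p_value = 1 - NormalDist().cdf(t_act)
    return t_act, p_value


# Teachers in the K-8 subsample (values from the summary table).
t_act, p_value = upper_tail_p_value(124.2, 130, 183.7, 359)
print(f"t_act = {t_act:.2f}, p-value = {p_value:.2f}")
print("reject at 1%:", p_value < 0.01)
```

Note that the sample mean lies below 130, so the statistic is negative and the upper-tail p-value is well above one half; a 1-sided test in this direction cannot reject at any conventional level.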

3. Compute the 90%, 95% and 99% confidence intervals for Computer in
the sample of K-6 schools.
The 90%, 95% and 99% confidence intervals for Computer in the sample
of K-6 schools are as follows:

\[
CI_{90} = \left[ \overline{\text{Computer}}_{K6} - t_{CI90}\frac{s_{\text{Computer},K6}}{\sqrt{N}},\;
                 \overline{\text{Computer}}_{K6} + t_{CI90}\frac{s_{\text{Computer},K6}}{\sqrt{N}} \right]
        = \left[ 338.4 - 1.64\,\frac{476.7}{\sqrt{61}},\; 338.4 + 1.64\,\frac{476.7}{\sqrt{61}} \right]
        = [238.3,\; 438.5]
\]

where $t_{CI90}$ is the critical value for the 90% confidence interval because
it leaves 5% ($= \frac{1 - 0.9}{2}$) of mass under each tail of the standard
normal distribution.

\[
CI_{95} = \left[ \overline{\text{Computer}}_{K6} - t_{CI95}\frac{s_{\text{Computer},K6}}{\sqrt{N}},\;
                 \overline{\text{Computer}}_{K6} + t_{CI95}\frac{s_{\text{Computer},K6}}{\sqrt{N}} \right]
        = \left[ 338.4 - 1.96\,\frac{476.7}{\sqrt{61}},\; 338.4 + 1.96\,\frac{476.7}{\sqrt{61}} \right]
        = [218.8,\; 458]
\]

where $t_{CI95}$ is the critical value for the 95% confidence interval because
it leaves 2.5% ($= \frac{1 - 0.95}{2}$) of mass under each tail of the standard
normal distribution.

\[
CI_{99} = \left[ \overline{\text{Computer}}_{K6} - t_{CI99}\frac{s_{\text{Computer},K6}}{\sqrt{N}},\;
                 \overline{\text{Computer}}_{K6} + t_{CI99}\frac{s_{\text{Computer},K6}}{\sqrt{N}} \right]
        = \left[ 338.4 - 2.58\,\frac{476.7}{\sqrt{61}},\; 338.4 + 2.58\,\frac{476.7}{\sqrt{61}} \right]
        = [180.9,\; 495.9]
\]

where $t_{CI99}$ is the critical value for the 99% confidence interval because
it leaves 0.5% ($= \frac{1 - 0.99}{2}$) of mass under each tail of the standard
normal distribution.
Notice how increasing the confidence level leads to wider intervals.
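
The three intervals can also be computed programmatically. The sketch below is in Python with a helper name of our own choosing (`confidence_interval`); it obtains the critical value from the standard normal inverse CDF instead of a table, so it uses 1.6449, 1.9600 and 2.5758 rather than the rounded 1.64, 1.96 and 2.58.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(mean, sd, n, level):
    """Return (lower, upper) for a two-sided normal-approximation CI.

    Hypothetical helper: `level` is the confidence level, e.g. 0.95.
    The critical value leaves (1 - level) / 2 of mass in each tail.
    """
    critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = critical * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Computer in the K-6 subsample (values from the summary table).
intervals = {level: confidence_interval(338.4, 476.7, 61, level)
             for level in (0.90, 0.95, 0.99)}

for level, (lower, upper) in intervals.items():
    print(f"{level:.0%} CI: [{lower:.1f}, {upper:.1f}]")
```

Because the critical values are not rounded, the endpoints may differ from the hand-computed ones by a few tenths, most visibly for the 90% and 99% intervals.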

4. Can you reject the null $H_0: \mu_{\text{TestScore},K6} = \mu_{\text{TestScore},K8}$
vs the alternative $H_1: \mu_{\text{TestScore},K6} \neq \mu_{\text{TestScore},K8}$ at 1%?

This is a test for the difference in the mean test scores of two
populations: K-6 and K-8 schools. The t-statistic is as follows:

\[
t = \frac{\overline{\text{TestScore}}_{K6} - \overline{\text{TestScore}}_{K8}}
         {SE(\overline{\text{TestScore}}_{K6} - \overline{\text{TestScore}}_{K8})}
  = \frac{661.5 - 652.9}{\sqrt{\frac{19.7^2}{61} + \frac{18.7^2}{359}}} = 3.18
\]

Since the t-statistic is larger than the critical value for a 1% 2-sided
test (which is 2.58), we reject the null hypothesis at 1%.

5. Can you reject the null $H_0: \mu_{\text{Computer},K6} = \mu_{\text{Computer},K8}$
vs the alternative $H_1: \mu_{\text{Computer},K6} \neq \mu_{\text{Computer},K8}$ at 1%?
This question is analogous to the previous one. We compute the
t-statistic for a test for the difference in the mean number of computers
in two populations: K-6 and K-8 schools:

\[
t = \frac{\overline{\text{Computer}}_{K6} - \overline{\text{Computer}}_{K8}}
         {SE(\overline{\text{Computer}}_{K6} - \overline{\text{Computer}}_{K8})}
  = \frac{338.4 - 297.4}{\sqrt{\frac{476.7^2}{61} + \frac{435.5^2}{359}}} = 0.63
\]

Since the t-statistic is smaller than the critical value for a 1% 2-sided
test (2.58), we cannot reject the null hypothesis at 1%.
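
Both difference-in-means tests follow the same recipe, which can be sketched in Python as below. The function name `diff_in_means_t` is our own; the standard error is the unpooled one used in the solutions, and the 1% critical value comes from the standard normal inverse CDF.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_means_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return the t-statistic for H0: mu_a = mu_b.

    Hypothetical helper using the unpooled standard error
    sqrt(sd_a^2 / n_a + sd_b^2 / n_b).
    """
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se


# Two-sided 1% critical value (about 2.58).
critical_1pct = NormalDist().inv_cdf(1 - 0.01 / 2)

# K-6 vs K-8, values from the summary tables.
t_score = diff_in_means_t(661.5, 19.7, 61, 652.9, 18.7, 359)
t_computer = diff_in_means_t(338.4, 476.7, 61, 297.4, 435.5, 359)

for name, t in [("Test score", t_score), ("Computer", t_computer)]:
    print(f"{name}: t = {t:.2f}, reject at 1%: {abs(t) > critical_1pct}")
```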

Question 2
Below you can see the Stata output of a regression estimating a linear
relationship between the average math test score (MathScore) and the average
student-teacher ratio (STR) in a district, as in the model below:

\[
\text{MathScore}_i = \alpha + \beta\, STR_i + u_i
\]

1. What would be the change in math test score associated with a reduction
of the student-teacher ratio of 3.7?
According to our linear regression model, the predicted associated change
in math test score would be
$\hat{\beta} \times \Delta STR = -1.94 \times (-3.7) = 7.18$.

2. How would you argue whether the effect calculated above is large or
small?
One way to scale the effect would be to obtain the standard deviation
of math test scores and see how large the predicted change is relative
to it.

3. Compute the $R^2$ of this regression.

The $R^2$ is the ratio between the Explained Sum of Squares (ESS) and
the Total Sum of Squares (TSS). In the Stata output table, ESS is
referred to as SS "Model" and its value is 5635.42. The TSS is labeled
"Total" and is 147370.72. Therefore
$R^2 = ESS/TSS = 5635.42/147370.72 = 0.038$.

4. Compute the Standard Error of the Regression (SER).

The SER can be computed starting from the Sum of Squared Residuals
(SSR), which in the Stata output table is referred to as SS "Residual"
and is 141735.1. Hence

\[
SER = \sqrt{\frac{SSR}{N - 2}} = \sqrt{\frac{141735.1}{418}} = 18.41
\]
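
The two goodness-of-fit numbers can be recomputed from the sums of squares quoted above. The Python sketch below uses variable names of our own choosing; the sample size of 420 is inferred from the N - 2 = 418 degrees of freedom in the solution.

```python
from math import sqrt

# Sums of squares as quoted from the Stata output in the solutions.
ess = 5635.42      # SS "Model"
tss = 147370.72    # SS "Total"
ssr = 141735.1     # SS "Residual"
n = 420            # assumed sample size, so that n - 2 = 418

# R^2 is the share of total variation explained by the regression.
r_squared = ess / tss

# SER is the square root of SSR divided by its degrees of freedom.
ser = sqrt(ssr / (n - 2))

print(f"R^2 = {r_squared:.3f}")
print(f"SER = {ser:.2f}")
```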
