Instucciones Proyecto 1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

Economics 1152/SUP 135 Professor Raj Chetty and Dr.

Gregory Bruich
Spring 2019 Department of Economics, Harvard University

Empirical Project 1
Stories from the Atlas: Describing Data using Maps, Regressions, and Correlations
Posted on Thursday, February 7, 2019
Due at midnight on Thursday, February 21, 2019

The Opportunity Atlas was publicly released on October 1, 2018, and an accompanying article
appeared on the front page of the New York Times. The Opportunity Atlas is a freely available
interactive mapping tool that traces the roots of outcomes such as poverty and incarceration back
to the neighborhoods in which children grew up.

Policymakers, journalists, and the public have begun to explore the Opportunity Atlas, casting
new light on the geography of upward mobility in communities across the country. As an example,
see Jasmine Garsd’s recent analysis for the New York City neighborhood of Brownsville in
Brooklyn.

In this first empirical project, you will use the Opportunity Atlas mapping tool and the underlying
data to describe equality of opportunity in your hometown and across the United States. (If you
grew up outside the United States, you may select a community in which you have spent some
time, such as Boston, MA.)

The end product will be a 4-6 page narrative (or story) in which you describe what you have learned
from the Atlas. The next page lists specific analyses and questions that your narrative must
address. It should be double spaced with references, graphs, and maps.

This project focuses on the following methods for descriptive data analysis. (The later empirical
projects you will do in this class will be focused on causal inference and prediction).

1. Data visualization. Maps are a powerful way to present descriptive statistics for data with
a geographic component. You will use maps to display upward mobility statistics for the
Census tracts in your hometown.

2. Regression and correlation analysis. You will use linear regressions and correlation
coefficients to quantify the statistical relationship between upward mobility and potential
explanatory variables.

The Stata data file that you will use in this assignment, atlas.dta, contains an extract of the
Opportunity Atlas data. I have also merged on several other variables, which you may use for the
correlational analysis.

We will invite 5-10 students who produce the most compelling and insightful stories/analyses to
discuss them with Professor Chetty and his team members at a lunch hosted at Opportunity
Insights.
Instructions

Please submit your Empirical Project on Canvas. Your submission should include three files:
1. A 4-6 page narrative as a word or pdf document (double spaced and including references,
graphs, maps, and tables)
2. A do-file with your STATA code or an .R script file with your R code
3. A log file of your STATA or R output

Specific questions to address in your narrative

1. Start by looking up the city where you grew up on the Opportunity Atlas. Zoom in to the
Census tracts around your home.

Figure 1 in your narrative should be a map of the Census tracts in your hometown from the
Opportunity Atlas. Examples for Milwaukee, WI (where Professor Chetty grew up) and
Los Angeles, CA (discussed in Lecture 1) are shown on the next page. The text of your
narrative should describe what you see, and what data are being visualized.

Examine the patterns for a number of different groups (e.g., lowest income children, high
income children) and outcomes (e.g., earnings in adulthood, incarceration rates). Only
choose one or two of these to include in your narrative.

2. (To answer this question, read the Opportunity Atlas manuscript) What period do the data
you are analyzing come from? Are you concerned that the neighborhoods you are
studying may have changed for kids now growing up there? What evidence do Chetty et
al. (2018) provide suggesting that such changes are or are not important? What type of
data could you use to test whether your neighborhood has changed in recent years?

3. Now turn to the atlas.dta data set. How does average upward mobility, pooling races and
genders, for children with parents at the 25th percentile (kfr pooled_p25) in your
home Census tract compare to mean (population-weighted, using count_pooled)
upward mobility in your state and in the U.S. overall? Do kids where you grew up have
better or worse chances of climbing the income ladder than the average child in America?

Hint: The Opportunity Atlas website will give you the tract, county, and state FIPS codes
for your home address. For example, searching for “Lynwood Road, Verona, New
Jersey” will display Tract 34013021000, Verona, NJ. The first two digits refer to the
state code, the next three digits refer to the county code, and the last 6 digits refer to the
tract code. In Stata, listing this observation can be done as follows:
list kfr_pooled_p25 if state == 34 & county == 013 & tract == 021000

4. What is the standard deviation of upward mobility (population-weighted) in your home


county? Is it larger or smaller than the standard deviation across tracts in your state?
Across tracts in the country? What do you learn from these comparisons?

2
5. Now let’s turn to downward mobility: repeat questions (3) and (4) looking at children
who start with parents at the 75th and 100th percentiles. How do the patterns differ?

6. Using a linear regression, estimate the relationship between outcomes of children at the
25th and 75th percentile for the Census tracts in your home county. Generate a scatter
plot to visualize this regression. Do areas where children from low-income families do
well generally have better outcomes for those from high-income families, too?

7. Next, examine whether the patterns you have looked at above are similar by race. If there
is not enough racial heterogeneity in the area of interest (i.e., data is missing for most
racial groups), then choose a different area to examine.

8. Using the Census tracts in your home county, can you identify any covariates which help
explain some of the patterns you have identified above? Some examples of covariates
you might examine include housing prices, income inequality, fraction of children with
single parents, job density, etc. For 2 or 3 of these, report estimated correlation
coefficients along with their 95% confidence intervals.

9. Open question: formulate a hypothesis for why you see the variation in upward mobility
for children who grew up in the Census tracts near your home and provide correlational
evidence testing that hypothesis.

For this question, many covariates have been provided to you in the atlas.dta file, which
are described under the “Characteristics of Census tracts” header in Table 1.

You are welcome to use outside data that are not included in atlas.dta, but this is not
required. Diane Sredl has created a research guide for our class that contains links to
other data sources. You may wish to read this tutorial on how to add variables to a data
set in Stata.

10. Putting together all the analyses you did above, what have you learned about the
determinants of economic opportunity where you grew up? Identify one or two key
lessons or takeaways that you might discuss with a policymaker or journalist if asked
about your hometown. Mention any important caveats to your conclusions; for example,
can we conclude that the variable you identified as a key predictor in the question above
has a causal effect (i.e., changing it would change upward mobility) based on that
analysis? Why or why not?

3
Figure 1
Household Income in Adulthood for Children Raised in Low-Income Households
in Milwaukee, WI

Notes: This figure shows household income at ages 31-37 for low income children who grew up
in Census tracts near Milwaukee, WI. The image was saved from www.opportunity-atlas.org by
first searching for “Milwaukee, WI” and then clicking on the “download as image” button.

Figure 2
Incarceration Rates for Black Men Raised in the Lowest-Income Households
in Los Angeles, CA

Notes: This figure is from the non-technical summary of the Opportunity Atlas and was discussed
in Lecture 1.

4
DATA DESCRIPTION, FILE: atlas.dta

The data consist of n = 73,278 U.S. Census tracts. For more details on the construction of the
variables included in this data set, please see Chetty, Raj, John Friedman, Nathaniel Hendren,
Maggie R. Jones, and Sonya R. Porter. 2018. “The Opportunity Atlas: Mapping the Childhood
Roots of Social Mobility.” NBER Working Paper No. 25147.

Table 1
Definitions of Variables in atlas.dta

Variable name Label Obs.


(1) (2) (3)
1. Geographic identifiers
tract Tract FIPS Code (6-digit) 2010 73,278
county County FIPS Code (3-digit) 73,278
state State FIPS Code (2-digit) 73,278
cz Commuting Zone Identifier (1990 Definition) 72,473

2. Characteristics of Census tracts


hhinc_mean2000 Mean Household Income 2000 72,302
mean_commutetime2000 Average Commute Time of Working Adults in 72,313
2000
frac_coll_plus2010 Fraction of Residents with a College Degree or 72,993
More in 2010
frac_coll_plus2000 Fraction of Residents with a College Degree or 72,343
More in 2000
foreign_share2010 Share of Population Born Outside the U.S. 72,279
med_hhinc2016 Median Household Income in 2016 72,763
med_hhinc1990 Median Household Income in 1999 72,313
popdensity2000 Population Density (per square mile) in 2000 72,469
poor_share2010 Poverty Rate 2010 72,933
poor_share2000 Poverty Rate 2000 72,315
poor_share1990 Poverty Rate 1990 72,323
share_black2010 Share black 2010 73,111
share_hisp2010 Share Hispanic 2010 73,111
share_asian2010 Share Asian 2010 71,945
share_black2000 Share black 2000 72,368
share_white2000 Share white 2000 72,368
share_hisp2000 Share Hispanic 2000 72,368
share_asian2000 Share Asian 2000 71,050
gsmn_math_g3_2013 Average School District Level Standardized Test 72,090
Scores in 3rd Grade in 2013
rent_twobed2015 Average Rent for Two-Bedroom Apartment in 56,607
2015

5
singleparent_share2010 Share of Single-Headed Households with Children 72,564
2010
singleparent_share1990 Share of Single-Headed Households with Children 72,196
1990
singleparent_share2000 Share of Single-Headed Households with Children 72,285
2000
traveltime15_2010 Share of Working Adults w/ Commute Time of 15 72,939
Minutes Or Less in 2010
emp2000 Employment Rate 2000 72,344
mail_return_rate2010 Census Form Rate Return Rate 2010 72,547
ln_wage_growth_hs_grad Log wage growth for HS Grad., 2005-2014 51,635
jobs_total_5mi_2015 Number of Primary Jobs within 5 Miles in 2015 72,311
jobs_highpay_5mi_2015 Number of High-Paying (>USD40,000 annually) 72,311
Jobs within 5 Miles in 2015
nonwhite_share2010 Share of People who are not white 2010 73,111
popdensity2010 Population Density (per square mile) in 2010 73,194
ann_avg_job_growth_2004_2013 Average Annual Job Growth Rate 2004-2013 70,664
job_density_2013 Job Density (in square miles) in 2013 72,463

3. Measures of Upward Mobility from the Opportunity Atlas


kfr_pooled_p25 Household income ($) at age 31-37 for children 72,011
with parents at the 25th percentile of the national
income distribution
kfr_pooled_p75 Household income ($) at age 31-37 for children 72,012
with parents at the 75th percentile of the national
income distribution
kfr_pooled_p100 Household income ($) at age 31-37 for children 71,968
with parents at the 100th percentile of the national
income distribution
kfr_natam_p25 Household income ($) at age 31-37 for Native 1,733
American children with parents at the 25th
percentile of the national income distribution
kfr_natam_p75 Household income ($) at age 31-37 for Native 1,728
American children with parents at the 75th
percentile of the national income distribution
kfr_natam_p100 Household income ($) at age 31-37 for Native 1,594
American children with parents at the 100th
percentile of the national income distribution
kfr_asian_p25 Household income ($) at age 31-37 for Asian 15,434
children with parents at the 25th percentile of the
national income distribution
kfr_asian_p75 Household income ($) at age 31-37 for Asian 15,360
children with parents at the 75th percentile of the
national income distribution

6
kfr_asian_p100 Household income ($) at age 31-37 for Asian 13,480
children with parents at the 100th percentile of the
national income distribution
kfr_black_p25 Household income ($) at age 31-37 for Black 34,086
children with parents at the 25th percentile of the
national income distribution
kfr_black_p75 Household income ($) at age 31-37 for Black 34,049
children with parents at the 75th percentile of the
national income distribution
kfr_black_p100 Household income ($) at age 31-37 for Black 32,536
children with parents at the 100th percentile of the
national income distribution
kfr_hisp_p25 Household income ($) at age 31-37 for Hispanic 37,611
children with parents at the 25th percentile of the
national income distribution
kfr_hisp_p75 Household income ($) at age 31-37 for Hispanic 37,579
children with parents at the 75th percentile of the
national income distribution
kfr_hisp_p100 Household income ($) at age 31-37 for Hispanic 35,987
children with parents at the 100th percentile of the
national income distribution
kfr_white_p25 Household income ($) at age 31-37 for white 67,978
children with parents at the 25th percentile of the
national income distribution
kfr_white_p75 Household income ($) at age 31-37 for white 67,968
children with parents at the 75th percentile of the
national income distribution
kfr_white_p100 Household income ($) at age 31-37 for white 67,627
children with parents at the 100th percentile of the
national income distribution

3. Counts of number of children under 18 in 2000 (to calculate weighted summary statistics)
count_pooled Count of all children 72,451
count_white Count of White children 72,451
count_black Count of Black children 72,451
count_asian Count of Asian children 72,451
count_hisp Count of Hispanic children 72,451
count_natam Count of Native American children 72,451

Note: This table describes the variables included in the atlas.dta file.

7
Table 2a
STATA Hints
STATA command Description
*clear the workspace This code shows how to clear the workspace, change the
clear working directory, and open a Stata data file.
set more off
cap log close To change directories on either a mac or windows PC, you
can use the drop down menu in Stata. Go to file -> change
*change working directory and open data set working directory -> navigate to the folder where your data
cd "C:\Users\gbruich\Ec1152\Projects\" is located. The command to change directories will appear;
use atlas.dta it can then be copied and pasted into your .do file.

*Summary stats These commands report means and standard deviations for
sum yvar [aw = count_pooled] yvar, weighted by the variable count_pooled. The first line
calculates these statistics across the full sample. The second
*Summary stats for Wisconsin line calculates these statistics for observations in Wisconsin.
sum yvar if state == 55 [aw = count_pooled ] The third line calculates these statistics for observations in
Milwaukee County.
*Summary stats for Milwaukee County
sum yvar if state == 55 & county == 079 [aw =
count_pooled ]

(Last two lines all go on one line in Stata)


reg yvar xvar1 xvar2 xvar3, robust This command estimates an OLS regression of yvar against
xvar1, xvar2, and xvar3, using heteroskedasticity-robust
standard errors.
*Report correlation coefficients These commands show two methods for estimating
*Method 1 correlation coefficients.
sum yvar
gen y_std = (yvar - r(mean))/r(sd) The first block of code shows how to first generate
standardized versions of the variables yvar and xvar by
sum xvar subtracting from each its mean and then dividing each by its
gen x_std = (xvar - r(mean))/r(sd) variance (which are stored temporally by Stata as r(mean)
and r(sd)). The last line reports an OLS regression of these
reg y_std x_std , robust transformed variables, with heteroskedasticity robust
standard errors.
*Method 2
corr yvar xvar The second method is to use the corr command, which does
not report standard errors.
twoway (scatter yvar xvar) (lfit yvar xvar) This pair of commands first draws a scatter plot of yvar
graph export figure1.png, replace against xvar. The second line saves the graph as a .png file.
Also see this tutorial on graphs in Stata.
*start a log file These commands show how to start and close a log file,
log using milwaukee.log, replace which will save a text file of all the commands and output
that appears on in the command window in stata.
*commands go here

*close and save log file


log close

8
Table 2b: R Commands
R command Description
#clear the workspace This sequence of commands shows how to
rm(list=ls()) open Stata datasets in R. The first block of
code clears the work space. The second
#Install and load haven package block of code installs and loads the
install.packages("haven")
library(haven)
“haven” package. The third block of code
changes the working directory to the
#Change working directory and load stata data set location of the data and loads in atlas.dta.
setwd("C:/Users/gbruich/Ec1152/Projects")
atlas <- read_dta("atlas.dta")

# summary stats, unweighted These commands show how to calculate


summary(atlas$yvar) unweighted summary statistics.
mean(atlas$yvar, na.rm=TRUE)
sd(atlas$yvar, na.rm=TRUE)

# Install and load package These commands show how to calculate


install.packages("SDMTools") weighted summary statistics.
library(SDMTools)

#Report weighted summary statistics


wt.mean(atlas$yvar, atlas$count_pooled)
wt.sd(atlas$yvar,atlas$count_pooled)

## subset observations to Wisconsin These commands show how to subset the


wisconsin <- subset(atlas,state == 55) data to observations in only Wisconsin and
in only Milwaukee county.
## subset observations to Milwaukee County
milwaukee <- subset(atlas,state == 55 & county == 079)

#Install and load sandwich and lmtest packages This sequence of commands shows how to
install.packages("sandwich") estimate an ordinary least squares
install.packages("lmtest") regression with heteroskedasticity-robust
library(sandwich) standard errors. The first block of code
library(lmtest)
first loads the necessary packages. The
#Run regression with homoskedasticity-only standard errors second block of code estimates a
mod1 <- lm(yvar~xvar1+xvar2 + xvar3, data = milwaukee) regression of yvar against xvar1, xvar2,
summary(mod1) and xvar3, then reports the estimated
coefficients, homoskedasticity-only
#Report coefficients with heteroskedasticity robust standard errors standard errors, and regression diagnostics
coeftest(mod1, vcov = vcovHC(mod1, type="HC1")) (R2, adjusted R2, RMSE/SER which is
referred to in the output as the Residual
standard error). The last block of code
reports the coefficients with
heteroskedasticity-robust standard errors.
#Method 1 These commands show how to estimate
##Standardize variables correlation coefficients.
milwaukee$x_std <- (milwaukee$yvar -
mean(milwaukee$yvar))/sd(milwaukee$yvar)

milwaukee$y_std <- (milwaukee$xvar -


The first block of code shows how to first
mean(milwaukee$xvar))/sd(milwaukee$xvar) generate standardized versions of the
variables yvar and xvar by subtracting
#Report correlation coefficients from each its mean and then dividing each
#Using a regression by its variance. The last line reports a OLS
mod2 <- lm(y_std ~ x_std, data = milwaukee) regression of these transformed variables,

9
summary(mod2) with heteroskedasticity robust standard
coeftest(mod2, vcov = vcovHC(mod2, type="HC1")) errors.
#Note that regression output matches the following output The second method is to use the cor
cor(milwaukee$kfr_pooled_p25, milwaukee$job_density_2013)
command, which does not report standard
errors.
# Install and load ggplot2 package These commands show how to draw a
install.packages("ggplot2") scatter plot of yvar against xvar1. The
library(ggplot2) geom_smooth part of the code adds an OLS
regression line. The last line saves the
# Draw scatter plot with linear fit line
ggplot(data = milwaukee) + geom_point(aes(x = xvar1, y = yvar)) +
graph as a .png file.
geom_smooth(aes(x = xvar, y = yvar), method = "lm", se = F)

#Save graph as figure1a.png


ggsave("milwaukee_scatter.png")
sink(file="milwaukee_log.txt", split=TRUE) The first line starts a log file. The last line
sink() closes and saves the log file.

10

You might also like