
Merge/Append using R

(draft)

Oscar Torres-Reyna
Data Consultant

January 2011 http://dss.princeton.edu/training/


Intro
Merge adds variables to a dataset. This document uses the merge function.
Merging two datasets requires that both have at least one variable in common (either
string or numeric). If the variable is a string, make sure the categories are spelled the same way in both datasets (e.g.
country names).
Explore each dataset separately before merging. Make sure to use all possible
common variables (for example, when merging two panel datasets you will need both country
and year).
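
The checks described above can be sketched as follows. This is only an illustration: df_a and df_b are made-up data frames (not the datasets used in this document), and the particular checks chosen are an assumption about what "exploring" should cover.

```r
# Hypothetical toy data, for illustration only
df_a <- data.frame(country = c("A", "A", "B"),
                   year    = c(2000, 2001, 2000),
                   x1      = c(1.5, 2.0, 3.1),
                   stringsAsFactors = FALSE)
df_b <- data.frame(country = c("A", "A", "b"),   # note the lowercase "b"
                   year    = c(2000, 2001, 2000),
                   x2      = c(10, 20, 30),
                   stringsAsFactors = FALSE)

# Variables the two datasets have in common (candidate merge keys)
common_vars <- intersect(names(df_a), names(df_b))

# Key values found in one dataset but not the other;
# spelling mismatches such as "B" vs "b" show up here
only_in_a <- setdiff(unique(df_a$country), unique(df_b$country))
only_in_b <- setdiff(unique(df_b$country), unique(df_a$country))

# Does the key combination identify each row uniquely?
dup_a <- any(duplicated(df_a[, common_vars]))
dup_b <- any(duplicated(df_b[, common_vars]))

print(common_vars)
print(only_in_a)
print(only_in_b)
```

If only_in_a or only_in_b is not empty, fix the spelling in one of the datasets before merging.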

Append adds cases/observations to a dataset. This document uses the rbind
function.
Appending two datasets requires that both have variables with exactly the same names.
If using categorical data, make sure the categories in both datasets refer to exactly the
same thing (e.g. 1 Agree, 2 Disagree, 3 DK in both).
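
A minimal sketch of these checks before appending, using two made-up data frames (df_c, df_d and the variable opinion are hypothetical names chosen for illustration):

```r
# Hypothetical toy data: the same survey item coded 1/2/3 in both datasets
df_c <- data.frame(id = 1:2, opinion = c(1, 3))
df_d <- data.frame(id = 3:4, opinion = c(2, 2))

# rbind on data frames matches columns by name, so the sets of names must agree
same_names <- setequal(names(df_c), names(df_d))

# Apply the same labels to both datasets so each code means the same thing
opinion_labels <- c("Agree", "Disagree", "DK")
df_c$opinion <- factor(df_c$opinion, levels = 1:3, labels = opinion_labels)
df_d$opinion <- factor(df_d$opinion, levels = 1:3, labels = opinion_labels)
same_levels <- identical(levels(df_c$opinion), levels(df_d$opinion))

print(same_names)
print(same_levels)
```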

PU/DSS/OTR 2
MERGE EXAMPLE 1
mydata1 mydata2

mydata <- merge(mydata1, mydata2, by=c("country","year"))

edit(mydata)
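
For readers without the original files, the same call can be tried on small made-up data. The data frames toy1 and toy2 below are hypothetical stand-ins, not the contents of mydata1 and mydata2, and print is used instead of edit so the sketch runs non-interactively.

```r
# Hypothetical stand-ins for two panel datasets sharing country and year
toy1 <- data.frame(country = c("A", "A", "B", "B"),
                   year    = c(2000, 2001, 2000, 2001),
                   x1      = c(1, 2, 3, 4),
                   stringsAsFactors = FALSE)
toy2 <- data.frame(country = c("A", "A", "B", "B"),
                   year    = c(2000, 2001, 2000, 2001),
                   x2      = c(10, 20, 30, 40),
                   stringsAsFactors = FALSE)

# Match rows on both keys; x2 is added as a new variable next to x1
toy_merged <- merge(toy1, toy2, by = c("country", "year"))
print(toy_merged)
```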

MERGE EXAMPLE 2 (one dataset missing a country)
mydata1 mydata3

By default, merge keeps only the cases that are common to both datasets.

mydata <- merge(mydata1, mydata3, by=c("country","year"))

edit(mydata)

MERGE EXAMPLE 2 (cont.) including all data from both datasets
mydata1 mydata3

Adding the option all=TRUE includes all cases from both datasets.

mydata <- merge(mydata1, mydata3, by=c("country","year"), all=TRUE)

edit(mydata)
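
A small sketch of the difference, on made-up data (toy1 and toy3 are hypothetical; toy3 lacks country "C"). The all.x argument shown last is a standard option of merge that this document does not otherwise cover.

```r
# Hypothetical toy data: toy3 has no row for country "C"
toy1 <- data.frame(country = c("A", "B", "C"), year = 2000,
                   x1 = c(1, 2, 3), stringsAsFactors = FALSE)
toy3 <- data.frame(country = c("A", "B"), year = 2000,
                   x2 = c(10, 20), stringsAsFactors = FALSE)

# Default: only cases found in both datasets are kept
common_only <- merge(toy1, toy3, by = c("country", "year"))

# all = TRUE: every case from both datasets is kept; unmatched cells become NA
all_cases <- merge(toy1, toy3, by = c("country", "year"), all = TRUE)

# all.x = TRUE: keep every case from the first dataset only
first_only <- merge(toy1, toy3, by = c("country", "year"), all.x = TRUE)

print(common_only)
print(all_cases)
```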

MERGE EXAMPLE 3 (many to one)
mydata1 mydata4

mydata <- merge(mydata1, mydata4, by=c("country"))

edit(mydata)
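
As a sketch of what many-to-one means here, assume a panel with several rows per country and a second dataset with one row per country (panel, lookup and region are hypothetical names):

```r
# Hypothetical toy data: many rows per country in the panel,
# exactly one row per country in the lookup table
panel <- data.frame(country = c("A", "A", "B", "B"),
                    year    = c(2000, 2001, 2000, 2001),
                    x1      = 1:4,
                    stringsAsFactors = FALSE)
lookup <- data.frame(country = c("A", "B"),
                     region  = c("North", "South"),
                     stringsAsFactors = FALSE)

# Each country-level value is repeated on every matching panel row
panel_merged <- merge(panel, lookup, by = "country")
print(panel_merged)
```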

MERGE EXAMPLE 4 (common ids have different name)
mydata1 mydata5

When the common ids have different names, use by.x and by.y to match them. R keeps the names from the first dataset (by.x).

mydata <- merge(mydata1, mydata5, by.x=c("country","year"), by.y=c("nations","time"))

edit(mydata)
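
The same idea on made-up data (toy1 and toy5 are hypothetical stand-ins), showing which key names survive in the result:

```r
# Hypothetical toy data: same ids, different variable names
toy1 <- data.frame(country = c("A", "B"), year = c(2000, 2000),
                   x1 = c(1, 2), stringsAsFactors = FALSE)
toy5 <- data.frame(nations = c("A", "B"), time = c(2000, 2000),
                   x2 = c(10, 20), stringsAsFactors = FALSE)

# by.x and by.y are paired by position: country-nations, year-time
toy_merged <- merge(toy1, toy5,
                    by.x = c("country", "year"),
                    by.y = c("nations", "time"))

# The key columns carry the names given in by.x
print(names(toy_merged))
```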

MERGE EXAMPLE 5 (different variables, same name)
mydata1 mydata6

When the common ids have different names, use by.x and by.y to match them. R keeps the names from the first dataset (by.x).
When different variables from the two datasets have the same name, R adds the suffix .x or .y to make the names unique and to
identify which dataset each variable comes from.

mydata <- merge(mydata1, mydata6, by.x=c("country","year"), by.y=c("nations","time"))

edit(mydata)
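
A sketch of the suffix behaviour on made-up data (toy1 and toy6 are hypothetical). The suffixes argument used in the second call is a standard option of merge for choosing more descriptive suffixes; the suffix strings themselves are arbitrary.

```r
# Hypothetical toy data: both datasets contain a non-key variable called x1
toy1 <- data.frame(country = c("A", "B"), year = c(2000, 2000),
                   x1 = c(1, 2), stringsAsFactors = FALSE)
toy6 <- data.frame(nations = c("A", "B"), time = c(2000, 2000),
                   x1 = c(100, 200), stringsAsFactors = FALSE)

# Default: the clashing variables become x1.x (first) and x1.y (second)
default_sfx <- merge(toy1, toy6,
                     by.x = c("country", "year"),
                     by.y = c("nations", "time"))

# Custom suffixes can make the origin of each variable clearer
custom_sfx <- merge(toy1, toy6,
                    by.x = c("country", "year"),
                    by.y = c("nations", "time"),
                    suffixes = c("_first", "_second"))

print(names(default_sfx))
print(names(custom_sfx))
```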

APPEND

APPEND EXAMPLE 1

mydata7 mydata8

mydata <- rbind(mydata7, mydata8)

edit(mydata)

APPEND EXAMPLE 1 (cont.) sorting by country/year

Note the square brackets and parentheses.

attach(mydata)
mydata_sorted <- mydata[order(country, year),]
detach(mydata)
edit(mydata_sorted)

mydata_sorted
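
The same sort can also be written without attach and detach. The sketch below uses a made-up data frame (toy is hypothetical) and shows two equivalent forms, plus a descending sort on year as an optional variation.

```r
# Hypothetical toy data, deliberately out of order
toy <- data.frame(country = c("B", "A", "B", "A"),
                  year    = c(2001, 2001, 2000, 2000),
                  x1      = 1:4,
                  stringsAsFactors = FALSE)

# order() returns the row positions in sorted order;
# the comma inside [ , ] means "these rows, all columns"
toy_sorted <- toy[order(toy$country, toy$year), ]

# Equivalent form: with() looks up country and year inside toy
toy_sorted2 <- with(toy, toy[order(country, year), ])

# Optional variation: country ascending, year descending
toy_desc <- toy[order(toy$country, -toy$year), ]

print(toy_sorted)
```

Referring to columns with $ or with() avoids leaving a dataset attached by accident if the script stops before detach is reached.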

APPEND EXAMPLE 2 (one dataset missing one variable)
mydata7 mydata9

If a variable is missing from one of the datasets, rbind returns an error message:

mydata <- rbind(mydata7, mydata9)

Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match

Possible solutions:

Option A) Drop the extra variable from one of the datasets (in this case mydata7)

mydata7$x3 <- NULL

Option B) Create the variable with missing values in the incomplete dataset (in this case mydata9)

mydata9$x3 <- NA

Then run the rbind function again.
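
Option B can be sketched end to end on made-up data (toy7 and toy9 are hypothetical stand-ins). Using setdiff to find the missing variables is an assumption added here so the fix does not depend on knowing the variable name in advance.

```r
# Hypothetical toy data: toy9 lacks the variable x3
toy7 <- data.frame(country = c("A", "B"), year = 2000,
                   x1 = c(1, 2), x3 = c(5, 6), stringsAsFactors = FALSE)
toy9 <- data.frame(country = c("C", "D"), year = 2000,
                   x1 = c(3, 4), stringsAsFactors = FALSE)

# Appending as-is fails because the number of columns differs
rbind_failed <- inherits(try(rbind(toy7, toy9), silent = TRUE), "try-error")

# Find which variables toy9 is missing and create them filled with NA
missing_vars <- setdiff(names(toy7), names(toy9))
toy9[missing_vars] <- NA

# Now both datasets have the same variables and can be appended
toy_appended <- rbind(toy7, toy9)
print(toy_appended)
```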


References/Useful links

Main references for this document:


UCLA R class notes: http://www.ats.ucla.edu/stat/r/notes/managing.htm

Quick-R: http://www.statmethods.net/management/merging.html

DSS Online Training Section http://dss.princeton.edu/training/


Princeton DSS Libguides http://libguides.princeton.edu/dss
John Fox's site http://socserv.mcmaster.ca/jfox/
Quick-R http://www.statmethods.net/
UCLA Resources to learn and use R http://www.ats.ucla.edu/stat/R/
DSS - R http://dss.princeton.edu/online_help/stats_packages/r

