Merge Using R
(draft)
Oscar Torres-Reyna
Data Consultant
[email protected]
Append adds cases/observations to a dataset. This document uses the rbind
function.
Appending two datasets requires that both have variables with exactly the same names.
If using categorical data, make sure the categories in both datasets refer to exactly the
same thing (e.g. 1 Agree, 2 Disagree, 3 DK in both).
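The two requirements above can be checked before appending. The sketch below is only an illustration: the data frames survey_a and survey_b, the variable opinion, and all values are hypothetical and do not come from the slides.

```r
# Hypothetical survey data; names and values are invented for illustration.
labels <- c("Agree", "Disagree", "DK")
survey_a <- data.frame(
  id      = 1:3,
  opinion = factor(c(1, 2, 3), levels = 1:3, labels = labels)
)
survey_b <- data.frame(
  id      = 4:6,
  opinion = factor(c(3, 1, 1), levels = 1:3, labels = labels)
)

# Variables present in one dataset but not the other (empty if names match).
only_in_a <- setdiff(names(survey_a), names(survey_b))
only_in_b <- setdiff(names(survey_b), names(survey_a))

# TRUE when the categorical variable uses the same categories in both datasets.
same_categories <- identical(levels(survey_a$opinion), levels(survey_b$opinion))
```

If either setdiff result is non-empty, or same_categories is FALSE, the datasets likely need recoding or renaming before they are appended.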
PU/DSS/OTR 2
MERGE EXAMPLE 1
mydata1 mydata2
edit(mydata)
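The merge command for this slide did not survive in the text, so the following is a minimal sketch of a one-to-one merge rather than the original example. It assumes the datasets share the id variables country and year; the values, and the variables x1 and x2, are hypothetical.

```r
# Hypothetical data: values are invented for illustration.
mydata1 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x1      = c(10, 11, 20, 21)
)
mydata2 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x2      = c(0.5, 0.6, 0.7, 0.8)
)

# Match rows on both id variables; x2 is added as a new column next to x1.
mydata <- merge(mydata1, mydata2, by = c("country", "year"))
```

Each country/year pair appears once in both datasets, so the result has one row per pair and the variables from both.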
MERGE EXAMPLE 2 (one dataset missing a country)
mydata1 mydata3
edit(mydata)
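As a sketch of what happens when one dataset lacks a country, the example below uses hypothetical data in which mydata3 has no rows for country B. The id variables country and year, and all values, are assumptions made for illustration.

```r
# Hypothetical data: mydata3 has no rows for country "B".
mydata1 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x1      = c(10, 11, 20, 21)
)
mydata3 <- data.frame(
  country = c("A", "A"),
  year    = c(2000, 2001),
  x3      = c(1, 2)
)

# By default merge() keeps only the rows whose ids appear in both datasets.
mydata <- merge(mydata1, mydata3, by = c("country", "year"))
```

With the default settings, the rows for country B are dropped because they have no match in mydata3.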
MERGE EXAMPLE 2 (cont.) including all data from both datasets
mydata1 mydata3
Adding the option all=TRUE includes all cases from both datasets.
edit(mydata)
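A minimal sketch of the all=TRUE option, using hypothetical data in which mydata3 has no rows for country B (the id variables country and year and all values are assumptions):

```r
# Hypothetical data: mydata3 has no rows for country "B".
mydata1 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x1      = c(10, 11, 20, 21)
)
mydata3 <- data.frame(
  country = c("A", "A"),
  year    = c(2000, 2001),
  x3      = c(1, 2)
)

# all = TRUE keeps unmatched rows from both datasets;
# variables with no match are filled with NA.
mydata <- merge(mydata1, mydata3, by = c("country", "year"), all = TRUE)
```

Here the rows for country B are kept and their x3 values are NA. The related arguments all.x=TRUE and all.y=TRUE keep the unmatched rows of only the first or only the second dataset.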
MERGE EXAMPLE 3 (many to one)
mydata1
mydata4
edit(mydata)
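A many-to-one merge can be sketched as follows. The example assumes mydata1 has several rows per country and mydata4 has a single row per country; the variable region and all values are hypothetical.

```r
# Hypothetical data: mydata1 is country/year, mydata4 is one row per country.
mydata1 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x1      = c(10, 11, 20, 21)
)
mydata4 <- data.frame(
  country = c("A", "B"),
  region  = c("North", "South")
)

# Merging on country alone repeats each country-level value
# on every matching country/year row of mydata1.
mydata <- merge(mydata1, mydata4, by = "country")
```

The result keeps the four country/year rows of mydata1, with the region value repeated within each country.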
MERGE EXAMPLE 4 (common ids have different name)
mydata1 mydata5
When the common ids have different names, use by.x and by.y to match them. R keeps the id names from the first dataset (by.x).
edit(mydata)
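A minimal sketch of by.x and by.y, assuming the country id is called country in mydata1 and nation in mydata5. The name nation, the variable x5, and all values are hypothetical.

```r
# Hypothetical data: the country id is named "nation" in mydata5.
mydata1 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x1      = c(10, 11, 20, 21)
)
mydata5 <- data.frame(
  nation = c("A", "A", "B", "B"),
  year   = c(2000, 2001, 2000, 2001),
  x5     = c(5, 6, 7, 8)
)

# by.x lists the ids in the first dataset, by.y the matching ids in the second.
mydata <- merge(mydata1, mydata5,
                by.x = c("country", "year"),
                by.y = c("nation", "year"))
```

The merged dataset has a country column and no nation column, since the id names are taken from by.x.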
MERGE EXAMPLE 5 (different variables, same name)
mydata1 mydata6
When different variables in the two datasets have the same name, R appends the suffix .x or .y to make the names unique and to
identify which dataset each variable comes from.
edit(mydata)
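The suffix behavior can be sketched with hypothetical data in which both datasets contain a non-id variable called x1. The id variables country and year, the values, and the custom suffixes in the second call are assumptions for illustration.

```r
# Hypothetical data: both datasets have a different variable named "x1".
mydata1 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x1      = c(10, 11, 20, 21)
)
mydata6 <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  x1      = c(100, 110, 200, 210)
)

# Default: the clashing names become x1.x (first dataset) and x1.y (second).
mydata <- merge(mydata1, mydata6, by = c("country", "year"))

# The suffixes argument of merge() replaces the defaults with custom labels.
mydata_custom <- merge(mydata1, mydata6, by = c("country", "year"),
                       suffixes = c("_1", "_6"))
```

Renaming the variables before merging is an alternative when more descriptive names are preferred.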
APPEND
APPEND EXAMPLE 1
mydata7 mydata8
mydata <- rbind(mydata7, mydata8)
edit(mydata)
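A self-contained sketch of the append, with hypothetical contents for mydata7 and mydata8 (the variables and values are invented for illustration):

```r
# Hypothetical data: same variables, different cases.
mydata7 <- data.frame(
  country = c("A", "A"),
  year    = c(2000, 2001),
  x1      = c(10, 11)
)
mydata8 <- data.frame(
  country = c("B", "B"),
  year    = c(2000, 2001),
  x1      = c(20, 21)
)

# Stack the rows of mydata8 below the rows of mydata7.
mydata <- rbind(mydata7, mydata8)
```

For data frames, rbind matches columns by name, so the variables may appear in a different order in the two datasets as long as the names are identical.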
APPEND EXAMPLE 1 (cont.) sorting by country/year
attach(mydata)
mydata_sorted <- mydata[order(country, year),]
detach(mydata)
edit(mydata_sorted)
mydata_sorted
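The same sort can also be written without attach and detach by referring to the columns through the data frame. The sketch below uses hypothetical unsorted data; only the variable names country and year come from the slide.

```r
# Hypothetical appended data, deliberately out of order.
mydata <- data.frame(
  country = c("B", "A", "B", "A"),
  year    = c(2001, 2001, 2000, 2000),
  x1      = c(21, 11, 20, 10)
)

# order() returns the row positions sorted by country, then by year.
mydata_sorted <- mydata[order(mydata$country, mydata$year), ]

# Optional: renumber the rows after sorting.
rownames(mydata_sorted) <- NULL
```

Referring to mydata$country directly avoids leaving the dataset attached if the code stops before detach is reached.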
APPEND EXAMPLE 2 (one dataset missing one variable)
mydata7 mydata9
If a variable is missing from one of the datasets, rbind stops with an error message.
Possible solutions:
Option A) Drop the extra variable from one of the datasets (in this case mydata7)
Option B) Create the variable with missing values in the incomplete dataset (in this case mydata9)
mydata9$x3 <- NA
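Both options can be sketched as follows. The contents of mydata7 and mydata9 are hypothetical; the only detail taken from the slide is that x3 exists in mydata7 and not in mydata9.

```r
# Hypothetical data: mydata9 lacks the variable x3.
mydata7 <- data.frame(
  country = c("A", "A"),
  year    = c(2000, 2001),
  x1      = c(10, 11),
  x3      = c(1, 2)
)
mydata9 <- data.frame(
  country = c("B", "B"),
  year    = c(2000, 2001),
  x1      = c(20, 21)
)

# Appending as-is fails; try() captures the error instead of stopping.
attempt <- try(rbind(mydata7, mydata9), silent = TRUE)

# Option A: drop the extra variable from mydata7 before appending.
mydata_a <- rbind(mydata7[, names(mydata7) != "x3"], mydata9)

# Option B: create x3 with missing values in mydata9, then append.
mydata9$x3 <- NA
mydata_b <- rbind(mydata7, mydata9)
```

Option A loses x3 entirely, while Option B keeps it and marks the cases from mydata9 as missing.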
Quick-R: http://www.statmethods.net/management/merging.html