Topic: The SET, MERGE, UPDATE Statements

1. SET statement

2. MERGE statement

3. UPDATE statement

4. PROC EXPORT/IMPORT

5. PROC CONTENTS

Overview
One of SAS’s greatest strengths is its ability to combine and process more than one
data set at a time. The main tools used to do this are the SET, MERGE and UPDATE
statements.

1. SET statement

1.1 Concatenating data sets using the SET statement


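The SET statement reads observations from one or more SAS data sets. It is flexible and has a variety of uses in SAS programming, determined by the options and statements that you use with it. Listing several data sets on a single SET statement concatenates them: SAS reads every observation of the first data set, then every observation of the second, and so on.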
data one;
input year pop $ @@;
datalines;
1991 500K 1992 501K 1993 502K
;
data two;
input year pop $ @@;
datalines;
1991 400K 1992 401K 1993 402K
;
data three;
input year pop $ @@;
datalines;
1991 300K 1992 301K 1993 302K 1994 303K
;
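/* One SET statement listing three data sets: all observations of ONE, */
/* then TWO, then THREE are read in turn (3 + 3 + 4 = 10 observations). */
data combine_1;
set one two three;
run;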

data con1;
input custom_id $ product $ 12.;
cards;
28901 pentium IV
36815 pentium III
21224 pentium IV
;
data con2;
input custom_id $ product $ 12.;
cards;
18601 pentium IV
24683 pentium III
851921 pentium IV
61831 pentium IV
;
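/* Two SET statements do NOT concatenate: each iteration reads one obs from */
/* CON1, then one from CON2, and the step stops when either runs out (here  */
/* after 3 obs). CON2 values overwrite CON1 values of the same variables.   */
data con3;
set con1;
set con2;
run;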
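For contrast, the sketch below does concatenate CON1 and CON2. Both data sets are listed on one SET statement, so CON_ALL should hold all 3 + 4 = 7 observations.

It assumes CON1 and CON2 from the step above exist in the WORK library. CON_ALL, FROM1 and SOURCE are hypothetical names. The IN= data set option (covered under MERGE below) is used only to record which data set each observation came from.

```sas
/* Sketch: concatenate CON1 and CON2 with ONE SET statement.             */
/* CON_ALL, FROM1 and SOURCE are hypothetical names chosen for this demo. */
data con_all;
   length source $4;
   set con1(in=from1) con2;
   /* FROM1 is 1 when the current observation was read from CON1 */
   if from1 then source = 'con1';
   else source = 'con2';
run;

proc print data=con_all;
run;
```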

1.2 Interleaving several data sets using the SET statement

To combine several data sets so that observations sharing a common value are adjacent to each other, list the data sets on the SET statement and name the common variable in a BY statement. Note: the data sets to be interleaved must already be sorted by the variable(s) listed in the BY statement.

/*Creating a new data set from multiple data sets based upon sorted
order*/;
data animal;
input common $ animal $;
datalines;
a Ant
a Ape
b Bird
c Cat
d Dog
;
data plant;
input common $ plant $;
datalines;
a Apple
b Banana
c Coconut
d Dewberry
e Eggplant
f Fig
;
data interleaving;
set animal plant;
by common;
run;
proc print data=interleaving;
run;
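If an input is not already in BY order, SAS stops the step with an error saying the BY variables are not properly sorted. The sketch below shows the usual fix: sort with PROC SORT before interleaving.

ANIMAL_RAW, ANIMAL_SORTED and INTERLEAVING2 are hypothetical names. PLANT is the data set created above.

```sas
/* Hypothetical unsorted copy of some ANIMAL rows, for illustration only. */
data animal_raw;
   input common $ animal $;
   datalines;
d Dog
a Ant
c Cat
;
run;

/* Put the rows in BY order; OUT= writes the sorted copy to a new data set. */
proc sort data=animal_raw out=animal_sorted;
   by common;
run;

/* PLANT (created above) is already in COMMON order, so it needs no sort. */
data interleaving2;
   set animal_sorted plant;
   by common;
run;
```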

2. MERGE statement
Overview
The MERGE statement is flexible and has a variety of uses in SAS programming. It joins observations from two or more SAS data sets into single observations in a new data set. This section describes basic uses of MERGE. Other applications include using more than one BY variable, merging more than two data sets, and merging a few observations with all observations in another data set.

One-to-One Merging
One-to-one merging combines observations from two or more SAS data sets into a single
observation in a new data set. To perform a one-to-one merge, we use the MERGE
statement without a BY statement. SAS combines the first observation from all data sets
that are named in the MERGE statement into the first observation in the new data set, the
second observation from all data sets into the second observation in the new data set, and
so on. In a one-to-one merge, the number of observations in the new data set is equal to
the number of observations in the largest data set named in the MERGE statement.
Example:
/*One-to-One MERGE by combining two data sets*/;
data merg_one;
input name $ age;
datalines;
Chris 36
Jane 21
Jerry 30
Joe 49
;
data merg_two;
input name $ salary;
format salary dollar10.;
datalines;
Chris 33000
Jane 40000
Jerry 60000
Joe 26000
Zoe 60000
;
data both;
merge merg_one
merg_two;
run;
proc print data=both;
run;
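In BOTH, SAS pairs rows purely by position:

- The fifth observation (Zoe) gets a missing AGE because MERG_ONE has only four observations.
- NAME takes its value from MERG_TWO, the last data set listed.

If a row were missing from the middle of either data set, every later pair would be mismatched without any error. When the data sets share a key such as NAME, a match-merge (described next) avoids this.

The sketch below assumes MERG_ONE and MERG_TWO from above. Both are already in ascending NAME order, as BY requires. BOTH_BY_NAME is a hypothetical name.

```sas
/* Match-merge the same two data sets on NAME instead of row position. */
/* Both inputs are already in ascending NAME order, as BY requires.     */
data both_by_name;
   merge merg_one merg_two;
   by name;
run;

proc print data=both_by_name;
run;
```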

Match-Merging
Match-merging combines observations from two or more SAS data sets into a single observation in a new data set according to the values of a common variable. The number of observations in the new data set is the sum of the largest number of observations in each BY group in all data sets. To perform a match-merge, use a BY statement immediately after the MERGE statement. The variables in the BY statement must be common to all data sets, and each data set must be sorted by (or indexed on) those variables. Only one BY statement can accompany each MERGE statement in a DATA step.
Examples:

/* Create sample data */ ;


data merg1;
input id name& $20.;
datalines;
1 Nay Rong
2 Kelly Windsor
3 Julio Meraz
4 Richard Krabill
5 Rita Giuliano
;
data merg2;
input id sale;
format sale dollar10.;
datalines;
1 28000
2 30000
3 35000
4 25000
5 40000
;

data merg3;
input id bonus;
format bonus dollar10.;
datalines;
1 2000
2 4000
3 3000
4 2500
5 2800
;
data final;
merge merg1
merg2
merg3;
by id;
run;
proc print data=final;
run;

/* One-to-many or many-to-one merge */;


/* Goal: Combine two data sets by a common variable when each BY value
is repeated in at most one of the data sets.*/;
data one;
input id $ fruit $;
datalines;
a apple
a apple
b banana
c coconut
;

data two;
input id $ color $;
datalines;
a amber
b brown
c cream
c cocoa
c carmel
;

data both;
merge one two;
by id;
run;
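Here BOTH has six observations, matching the counting rule given for match-merging:

- BY group a contributes max(2, 1) = 2 observations.
- BY group b contributes 1.
- BY group c contributes max(1, 3) = 3.

Within group a, COLOR 'amber' is repeated for both apple rows. Within group c, FRUIT 'coconut' is repeated for all three colors.

The merge behaves this way only when, within each BY group, at most one data set has repeated BY values. If both do, SAS writes a note that the MERGE statement has more than one data set with repeats of BY values.

The sketch below lists repeated IDs beforehand using FIRST. and LAST. variables. DUP_IDS is a hypothetical name.

```sas
/* FIRST.ID and LAST.ID are both 1 only when an ID occurs exactly once, */
/* so this subsetting IF keeps every row whose ID is repeated.          */
data dup_ids;
   set one;
   by id;
   if not (first.id and last.id);
run;

proc print data=dup_ids;
run;
```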

Other examples with the IN= data set option:


/*Merge multiple data sets and output matches only*/;
data file1;
input var name $;
datalines;
100 Anja
200 Bob
400 Chandra
600 Darrin
;
data file2;
infile cards dsd truncover;
input var address $ 13.;
datalines;
100,34 Smith Road
200,67 Burt Ave
300,12 You St
400,45 Younge St
500,79 Wellington
600,23 Done Road
;
data file3;
input var zip;
datalines;
100 28092
200 27502
300 27539
600 27526
;
data three;
merge file1 (in=a) file2 (in=b) file3 (in=c);
by var;
if a and b and c;
run;
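With these inputs, THREE should keep only VAR = 100, 200 and 600. VAR 300 and 500 are missing from FILE1, and 400 is missing from FILE3.

Changing the subsetting IF changes which observations survive. The sketch below keeps every observation of FILE1 (similar to a SQL left join), so LEFT_KEEP should hold 100, 200, 400 and 600. LEFT_KEEP is a hypothetical name.

```sas
/* Only A (from FILE1) is tested, so every VAR in FILE1 is kept; FILE2 */
/* and FILE3 variables are missing when they have no matching VAR.     */
data left_keep;
   merge file1(in=a) file2 file3;
   by var;
   if a;
run;

proc print data=left_keep;
run;
```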
/*Merge data sets by a common variable and create output data
sets based upon observation origin*/;
data fileone;
input id $ name $ dept $ project $;
datalines;
000 Miguel A12 Document
111 Fred B45 Survey
222 Diana B45 Document
888 Monique A12 Document
999 Vien D03 Survey
;
data filetwo;
input id $ name $ projhrs;
datalines;
111 Fred 35
222 Diana 40
777 Steve 0
888 Monique 37
999 Vien 42
;
data both one_only two_only;
merge fileone(in=in1) filetwo(in=in2);
by id;
if in1 and in2 then output both;
else if in1 then output one_only;
else output two_only;
run;
proc print data=both;
run;
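Note: the IN= data set option creates a temporary variable that is 1 when that data set contributes to the current observation and 0 otherwise. Test these variables to determine whether the current BY group is found in one or both data sets.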
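IN= variables exist only while the DATA step runs; they are not written to the output data sets. The sketch below copies them into ordinary variables so that the origin of each observation is kept. BOTH_FLAGS, IN_ONE and IN_TWO are hypothetical names.

```sas
/* Copy the temporary IN= flags into variables that are saved. */
data both_flags;
   merge fileone(in=in1) filetwo(in=in2);
   by id;
   in_one = in1;   /* 1 if FILEONE has this ID, else 0 */
   in_two = in2;   /* 1 if FILETWO has this ID, else 0 */
run;

proc print data=both_flags;
run;
```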

3. UPDATE statement
The UPDATE statement updates a master data set by applying transactions from a transaction data set.
Requirements
• The UPDATE statement must be accompanied by a BY statement that specifies the variables by which observations are matched.
• The BY statement should immediately follow the UPDATE statement to which it applies.
• The data sets listed in the UPDATE statement must be sorted by the values of the variables listed in the BY statement, or they must have an appropriate index.
• Each observation in the master data set should have a unique value of the BY variable or BY variables. If several master observations share the same BY value, only the first of them is updated. The transaction data set can contain more than one observation with the same BY value; all of these transaction observations are applied to the master observation before it is written to the output data set.

Example:
data master;
input part_id price;
cards;
109 218
110 156
111 98
112 36
113 45
;
data trans;
input part_id price;
cards;
109 208
110 149
113 .
121 78
122 46
;
data new;
update master trans;
by part_id;
run;
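With the default behavior, NEW should contain seven observations:

- Parts 109 and 110 take the transaction prices (208 and 149).
- Parts 111 and 112 are unchanged.
- Part 113 keeps its master price of 45, because a missing value in the transaction data set does not overwrite the master value.
- Parts 121 and 122, which are not in MASTER, are added as new observations.

The UPDATEMODE= option of the UPDATE statement controls the missing-value behavior. The sketch below uses UPDATEMODE=NOMISSINGCHECK, under which the missing transaction price does replace 45. NEW_NOMISS is a hypothetical name.

```sas
/* The default, UPDATEMODE=MISSINGCHECK, ignores missing transaction values; */
/* NOMISSINGCHECK lets a missing transaction value replace the master value. */
data new_nomiss;
   update master trans updatemode=nomissingcheck;
   by part_id;
run;

proc print data=new_nomiss;
run;
```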

4. PROC EXPORT/IMPORT
PROC EXPORT
The EXPORT procedure reads data from a SAS data set and writes it to an external data
source. External data sources can include DBMS tables, PC files, spreadsheets, and
delimited external files (which are files that contain columns of data values that are
separated by a delimiter such as a blank or a comma).

PROC EXPORT DATA=SAS-data-set
     OUTFILE="filename" | OUTTABLE="tablename"
     <DBMS=identifier> <REPLACE>;

Required Arguments
DATA=SAS-data-set
identifies the input SAS data set with either a one- or two-level SAS name (library and
member name). If you specify a one-level name, PROC EXPORT assumes the WORK
library.
OUTFILE="filename"
specifies the complete path and filename of the output PC file, spreadsheet, or delimited
external file. If the name does not include special characters (like the backslash in a path),
lowercase characters, or spaces, you can omit the quotes.

OUTTABLE="tablename"
specifies the table name of the output DBMS table. If the name does not include special
characters (like question marks), lowercase characters, or spaces, you can omit the
quotes. Note that the DBMS table name may be case-sensitive.

Options

DBMS=identifier
specifies the type of data to export, for example ACCESS, CSV, DBF (a dBASE file), DLM, EXCEL, or TAB. For PC files, spreadsheets, and delimited external files, you do not have to specify DBMS= if the filename specified with OUTFILE= has a valid extension from which PROC EXPORT can recognize the type of data.

REPLACE
overwrites an existing file. If you do not specify REPLACE, PROC EXPORT does not
overwrite an existing file.

/*Exporting an Excel File*/;


data expt;
input Product :$10. Quantity Price; /* list input; :$10. lets PRODUCT hold up to 10 characters */
datalines;
Tea 10 16.00
Beer 24 19.00
Syrup 12 10.00
Seasoning 48 22.00
Mix 36 21.35
;
proc export data=expt
outfile="c:\Prices.xls"
DBMS=Excel2000
replace; /* REPLACE is part of the PROC EXPORT statement; it overwrites an existing file */
run;
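The DBMS= values listed above include CSV. The sketch below writes the same EXPT data set as a comma-separated file. The path c:\Prices.csv is a hypothetical example, and REPLACE lets the step be rerun over an existing file.

```sas
/* Write EXPT as a comma-separated file (hypothetical path). */
proc export data=expt
   outfile="c:\Prices.csv"
   dbms=csv
   replace;
run;
```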

/*Exporting a Delimited External File*/;


data expt_dlm;
input id Name $ Sex $ Age Height Weight;
cards;
1 Alice F 13 56.5 84.0
2 Becka F 13 65.3 98.0
3 Gail F 14 64.3 90.0
4 Karen F 12 56.3 77.0
5 Kathy F 12 59.8 84.5
6 Mary F 15 66.5 112.0
7 Sandy F 11 51.3 50.5
8 Sharon F 15 62.5 112.5
9 Tammy F 14 62.8 102.5
10 Alfred M 14 69.0 112.5
11 Duke M 14 63.5 102.5
12 Guido M 15 67.0 133.0
13 James M 12 57.3 83.0
14 Jeffrey M 13 62.5 84.0
15 John M 12 59.0 99.5
16 Philip M 16 72.0 150.0
17 Robert M 12 64.8 128.0
18 Thomas M 11 57.5 85.0
19 William M 15 66.5 112
;
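proc export data=expt_dlm
outfile="c:\class.txt"
dbms=dlm
replace; /* overwrite c:\class.txt if it already exists */
*delimiter=','; /* optional: uncomment to separate values with commas */
run;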

/*Exporting a Microsoft Access Table*/;


proc export data=expt_dlm
outtable="customers"
dbms=access
replace;
database="c:\mydatabase.mdb";
run;

PROC IMPORT
The IMPORT procedure reads data from an external data source and writes it to a SAS
data set. External data sources can include DBMS tables, PC files, spreadsheets, and
delimited external files (which are files containing columns of data values that are
separated by a delimiter such as a blank or a comma).

PROC IMPORT DATAFILE="filename" | TABLE="tablename"
     OUT=SAS-data-set
     <DBMS=identifier> <REPLACE>;
     <data-source-statement(s);>

/*Importing a Delimited External File*/;


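proc import datafile="c:\class.txt"
out=mydata
dbms=dlm
replace;
*delimiter=' '; /* optional: set to the character that separates values in the file */
getnames=yes; /* read variable names from the first row */
run;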

/*Importing an Excel Spreadsheet*/;

proc import datafile="c:\prices.xls"
out=work.accounts
DBMS=EXCEL2000 REPLACE;
getnames=yes;
run;

/*Importing a Microsoft Access Table*/;


proc import table="customers"
out=work.cust
dbms=access;
uid="userid";
pwd="mypassword";
database="c:\myfiles\east.mdb";
wgdb="c:\winnt\system32\security.mdb";
run;

5. PROC CONTENTS
The CONTENTS procedure prints descriptions of the contents of one or more files in a SAS library, including which engine was used to create each SAS file.

Useful options:

POSITION - lists the variables by their position in the data set (the default order is alphabetical).

SHORT - prints only the variable names, in an abbreviated list.

OUT=SAS-data-set - creates a data set in which each observation describes one variable of the original data set.

VARNUM - prints the list of variables by their logical position in the data set.

NOPRINT - suppresses the printed output.

DIRECTORY - prints a list of the SAS files in the SAS library.
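The sketch below applies several of these options to the FINAL data set created in the match-merge example. FINAL_VARS is a hypothetical name. NAME, TYPE, LENGTH and VARNUM are among the variables that PROC CONTENTS writes to an OUT= data set.

```sas
/* Variables of FINAL listed in the order they were created. */
proc contents data=final varnum;
run;

/* Only the variable names, in a compact list. */
proc contents data=final short;
run;

/* One observation per variable of FINAL, saved without printed output. */
proc contents data=final out=final_vars(keep=name type length varnum) noprint;
run;

proc print data=final_vars;
run;
```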
