Accessing Data: Center of Excellence Data Warehousing

Accessing Data
Last Updated : 29 June, 2004
Center of Excellence Data Warehousing
Agenda
Creating datasets using DATA step. Infile statement Different input styles. Combining datasets using DATA step
What Is the SAS System?

The SAS System is an integrated system of software products that enables you to perform
data entry, retrieval, and management report writing and graphics
statistical and mathematical analysis

business planning, forecasting, and decision support operations research and project management quality improvement applications development.
What Is the SAS System?

In addition, you can integrate with SAS many SAS business solutions that enable you to perform large scale business functions, such as
Data Warehousing and Data Mining Human Resources Management and Decision Support
Financial Management and Decision Support
Overview of Base SAS Software

The core of the SAS System is base SAS software SAS language
a programming language that you use to manage your data.
SAS procedures
software tools for data analysis and reporting.
Macro facility
a tool for extending and customizing SAS software programs and for reducing text in your programs.
DATA step debugger

a programming tool that helps you find logic problems in DATA step programs.
Output Delivery System (ODS)

a system that delivers output in a variety of easy-to-access formats, such as SAS data sets, listing files, or Hypertext Markup Language (HTML).
Base SAS data access management analysis presentation
Components of the SAS Language

SAS Files Files with formats or structures known to SAS. All SAS files reside in a SAS data library.
SAS data set
is structured in a format that SAS can process.
SAS catalog
Many different kinds of information that are used in a SAS job are
stored in SAS catalogs, such as instructions for reading and printing data values, or function key settings that you use in the SAS windowing environment.
SAS stored program

contains compiled code that you create and save for repeated use.

SAS Data Sets SAS data file
both describes and physically stores data values.
descriptor portion
describes the contents of the SAS data set to SAS.

Data portion
data that has been collected or calculated. An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a given characteristic.
SAS data view

does not actually store values but create logical SAS data sets
without using the storage space required by SAS data files.
Structure of SAS Data Sets

SAS Data Set General Data Set Information Name Number of Obs. *Label Number of Variables *Date/Time Created Storage Information Information for Each Variable Type Length Position Name *Label *Format *Informat IDNUM NAME WAGECAT S S S S WAGERATE 3392.50 5093.75 . 1572.50 1351 Farr, Sue 161 212 Moore, Ron 2512 Ruth, G H ... ... ... 5151 Coxe, Susan
Descriptor Portion
Data Portion
3163.00

External Files
Data files that you use to read and write data, but which are in a structure unknown to SAS. External files can be used for storing
raw data that you want to read into a SAS data file SAS program statements procedure output
Database Management System Files

SAS software is able to read and write data to and from other vendors'
software, such as many common database management system (DBMS) files. In addition to base SAS software, you must license the SAS/ACCESS software for your DBMS and operating environment.

SAS Language Elements
DATA step
consists of a group of statements in the SAS language that reads raw
data or existing SAS data sets to create a SAS data set.
PROC step
A group of procedure statements used to analyze data in SAS data sets
to produce statistics, tables, reports, charts, and plots, to create SQL queries, and to perform other analyses and operations on your data. They also provide ways to manage and print SAS files.
SAS Macro Facility

a powerful programming tool for extending and customizing your SAS
programs, and for reducing the amount of code that you must enter to do common tasks. Macros are SAS files that contain compiled macro program statements and stored text.
What Can the DATA Step Do?

You can use the DATA step in the following ways to transform your information:
Read from a raw data file into the SAS system.
Raw Data File SAS Data Set
Descriptor
DATA Step

Create multiple SAS data sets in one DATA step.
DATA Step

Combine existing data sets.
SAS Data Set 1
SAS Data Set 2
DATA Step

You can also add or augment information in a variety of ways.
Create accumulating totals. SaleDate 01APR2001 02APR2001 03APR2001 04APR2001 05APR2001
Sale Amt
498.49 946.50 994.97 564.59 783.01
Mth2Dte 498.49 1444.99 2439.96 3004.55 3787.56

Manipulate numeric values.
BirthDay 4253 SAS Function Age 30

Summarize data sets.
Salary 42000 34000 27000 20000 19000 19000
Div HUMRES FINACE FLTOPS FINACE FINACE FLTOPS
Div
DATA Step FINACE FLTOPS HUMRES
DivSal
42000 46000 73000

presenting your data file management And much, much more
Words in the SAS Language

A word or token in the SAS language is a collection of characters that communicates a meaning to SAS. A word or token ends when SAS encounters one of the following:
the beginning of a new token a blank after a name or a number token the ending quotation mark of a literal token.
Each word or token in the SAS language classified into four categories.
Names - a series of characters that begin with a letter or an underscore. Ex.: data, _old, yearcutoff, _n_, year_04, descending Literal - consists of 1 to 32,767 characters enclosed in single or double quotation marks ( Bangalore, 2003-04, Wipros Plan, "Report for the Third Quarter" )
Words in the SAS Language

Number - in general is composed entirely of numeric digits, with an optional decimal point and a leading plus or minus sign. SAS also recognizes numeric values in the following forms as number tokens: scientific (E-) notation, hexadecimal notation, missing value symbols, and date and time literals.
Ex: 1234, -2004, 1.25, 5.4E-1, 30jun04'd
Special character - is usually any single keyboard character other than letters, numbers, the underscore, and the blank. In general, each special character is a single token, although some two-character operators, such as ** and <=, form single tokens.
Ex: =, :, @, , +, /
Names in the SAS Language

A SAS name is a name token that represents
variables SAS data sets formats or informats SAS procedures options arrays statement labels SAS macros or macro variables SAS catalog entries librefs or filerefs.
There are two kinds of names in SAS.

names of elements of the SAS language names supplied by SAS users.

Rules for User-Supplied SAS Names
Members of SAS data libraries (SAS data sets, views, catalogs, indexes) Generation data sets Catalog entries Engines, Librefs, Filerefs, Passwords DATA step variables DATA step variable labels DATA step statement labels Arrays Functions Formats Informats Macros, Macro variables - 32 - 28 - 32 -8 - 32 - 256 - 32 - 32 - 16 -8 -7 - 32

The first character must be a letter (A, B, C, . . ., Z) or underscore (_). Subsequent characters can be letters, numeric digits (0, 1, . . ., 9), or underscores. You can use upper or lowercase letters. SAS processes names as uppercase regardless of how you type them. Blanks cannot appear in SAS names. SAS reserves a few names for automatic variables and variable lists, SAS data sets, and librefs.
When creating variables, do not use the names of special SAS
automatic variables (for example, _N_ and _ERROR_) or special variable list names (for example, _CHARACTER_, _NUMERIC_, and _ALL_). When associating a libref with a SAS data library, do not use SASHELP, SASMSG, SASUSER, WORK . When you create SAS data sets, do not use _NULL_, _DATA_, _LAST_.

Special characters, except for the underscore, are not allowed. In filerefs only, you can use the dollar sign ($), pound sign (#), and at sign (@). When assigning a fileref to an external file, do not use: SASCAT. When you create a macro variable, do not use names that begin with SYS
SAS Dates
SAS dates are special numeric values representing the number of days between January 1, 1960 and a specified date.
1jan1959 1jan1960 1jan1961 1jan2000 DATE9. Informat -365 0
SAS Date Values
366
14610
SAS Date Values
MMDDYY10. Format
Standard Data
The term standard data refers to character and numeric data that SAS recognizes automatically. Some examples of standard numeric data include
35469.93 3E5 (exponential notation) -46859
Standard character data is any character you can type on your keyboard. Standard character values are always left-justified by SAS.
Nonstandard Data
The term nonstandard data refers to character and numeric data that SAS does not recognize automatically. Examples of nonstandard numeric data include
12/12/2012 29FEB2000 4,242 $89,000
Create a SAS Data Set from a Raw Data File

E1232 E2341 E3452 E6781 E8321 E1052 E1062 E8172 E1091 15OCT1999 01JUN1997 26OCT1993 16SEP1992 26NOV1996 27FEB1997 10MAY1987 06JAN2000 20AUG1991 61065 A raw data file 91688 contains employee 32639 information for 28305 the level 1 flight 40440 attendants. Use the 39461 raw data file to create 41463 40650 the work.fltat1 SAS 40950 data set.
Desired Output
Obs 1 2 3 4 5 6 7 8 9 EmpID E1232 E2341 E3452 E6781 E8321 E1052 E1062 E8172 E1091 Hire Date 14532 13666 12352 11947 13479 13572 9991 14615 11554 Salary 61065 91688 32639 28305 40440 39461 41463 40650 40950 Bonus 3053.25 4584.40 1631.95 1415.25 2022.00 1973.05 2073.15 2032.50 2047.50
The DATA Statement

A DATA step always begins with a DATA statement. General form of a DATA statement:
DATA SAS-data-set;
The DATA statement starts the DATA step and names the SAS data set being created.
The INFILE Statement

If you are reading data from a raw data file, you need an INFILE statement. General form of an INFILE statement:
INFILE 'raw-data-file' <options>;
The INFILE statement points to the raw data file being read. Options in the INFILE statement affect how SAS reads the raw data file.
The INPUT Statement

When you are reading from a raw data file, the INPUT statement follows the INFILE statement. General form of an INPUT statement:
INPUT variable-specification ;
The INPUT statement describes the raw data fields and specifies how you want them converted into SAS variables.
Formatted Input
The input style tells SAS where to find the fields and how to read them into SAS.
INPUT @n variable-name informat. ...;
@n - moves the pointer to the starting point of the field. variable-name - names the SAS variable being created. Informat - specifies how many positions to read and how to convert the raw data into a SAS value.
The INPUT Statement

Common SAS informats: $w. - reads a standard character field, where w specifies the width of the field in bytes. W - reads a standard numeric field, where w specifies the width of the field in bytes. DATE9. - reads dates in the form 31DEC2012.
The Assignment Statement

To create a new variable in the DATA step, use an assignment statement:
variable-name=expression;
The assignment statement creates a SAS variable and specifies how to calculate that variable's value.

data work.fltat1; infile 'raw-data-file'; input @1 EmpID $5. @7 HireDate date9. @17 Salary 5.; Bonus=.05*Salary; run;

Partial Log
NOTE: 9 records were read from the infile 'fltat1.dat'. The minimum record length was 21. The maximum record length was 21. NOTE: The data set WORK.FLTAT1 has 9 observations and 4 variables.
Overview of DATA Step Processing
Processing the DATA Step

The SAS System processes the DATA step in two phases:
compilation execution.
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them. During the compile phase, SAS creates the following three items
input buffer
is a logical area in memory into which SAS reads each record of raw
data when SAS executes an INPUT statement.
program data vector (PDV)

is a logical area in memory where SAS builds a data set, one
observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements.
DATA Step Processing

The data values are assigned to the appropriate variables in the
program data vector. From here, SAS writes the values to a SAS data set as a single observation Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution.
descriptor information
is information that SAS creates and maintains about each SAS
data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names and data types (character or numeric) of the variables.

The flow of action in the Execution Phase of a simple DATA step
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1. SAS sets the newly created program variables to missing in the program data vector (PDV). SAS reads a data record from a raw data file into the input buffer, or it reads an observation from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET, MODIFY, or UPDATE statement to read a record. SAS executes any subsequent programming statements for the current record.

At the end of the statements, an output, return, and reset occur automatically. SAS writes an observation to the SAS data set, the system automatically returns to the top of the DATA step, and the values of variables created by INPUT and assignment statements are reset to missing in the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here. SAS counts another iteration, reads the next record or observation, and executes the subsequent programming statements for the current observation. The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw data file.
DATA step
Reading External File Data
data bonus_04; [1] infile 'your-input-file'; [2] input IDnumber name $ salary ; [3] bonus=salary * 0.25; [4] run; [5]
1- Begin the DATA step and create a SAS data set called bonus_04. 2- Specify the external file that contains your data. 3- Read a record and assign values to three variables. 4- Calculate a value for variable bonus. 5- Execute the DATA step.
DATA step Input Styles

The INPUT statement reads raw data from instream data lines or external files into a SAS data set and input styles depending on the layout of data values in the records.
INPUT, Formatted - Reads input values from specified columns and assigns them to the corresponding SAS variables INPUT, Column - Reads input values with specified informats and assigns them to the corresponding SAS variables INPUT, List - Scans the input data record for input values and assigns them to the corresponding SAS variables INPUT, Named - Reads data values that appear after a variable name that is followed by an equal sign and assigns them to corresponding SAS variables
DATA step
An informat is an instruction that SAS uses to read data values into a variable.
The INPUT statement with an informat after a variable name is the simplest way to read values into a variable.
$w. DATEw.
Reads standard character data Reads date values in the form ddmmmyy or
ddmmmyyyy
MMDDYYw. -
Reads date values in the form mmddyy or
mmddyyyy
w.d
COMMAw.d -
Reads standard numeric data Removes embedded characters
Accessing Data List input

List input uses a scanning method for locating data values. Data values are must be separated by at least one blank (or other defined delimiter). List input requires only that you specify the variable names and a dollar sign ($), if defining a character variable.
Libname new /wipro/dw/data; data new.scores; length name $ 12; input name $ score1 score2; datalines; aaaaaa 1132 1187 bbbbbbbb 1015 1102 cccc 246 357 ; Run;
Modified List Input

data scores; infile datalines dsd; input Name : $9. Score1-Score3 Team ~ $25. Div $; datalines; Smith,12,22,46,"Green Hornets, Atlanta",AAA Mitchel,23,19,25,"High Volts, Portland",AAA Jones,09,17,54,"Vulcans, Las Vegas",AA ; run; output Name Score1 Score2 Score3 Smith 12 22 46 Mitchel 23 19 25 Jones 09 17 54 Team "Green Hornets, Atlanta "High Volts, Portland "Vulcans, Las Vegas Div AAA AAA AA
Modified List Input

The : (colon) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads until it encounters a blank column. The ~ (tilde) format modifier enables you to read and retain single quotation marks, double quotation marks, and delimiters within character values. If you want SAS to read consecutive delimiters as though there is a missing value between them, specify the DSD option in the INFILE statment. To read and store a character input value longer than 8 bytes, define a variable's length by using a LENGTH, INFORMAT. Character values cannot contain embedded blanks when the file is delimited by blanks. Fields must be read in order. Data must be in standard numeric or character format.
Data Accessing - Column Input

Column input enables you to read standard data values that are aligned in columns in the data records. Specify the variable name, followed by a dollar sign ($) if it is a character variable, and specify the columns in which the data values are located in each record: data scores; infile datalines truncover; input name $ 1-12 score1 17-20 score2 27-30; datalines; 123456789101112131415161718192021222324252627282930 Riley 1132 987 Henderson 1015 1102 ; run;
Data Accessing - Column Input

To use column input, data values must be
in the same field on all the input lines in standard numeric or character form.
Features of column input include the following

Character values can contain embedded blanks. Character values can be from 1 to 32,767 characters long. Placeholders, such as a single period (.), are not required for missing data. Input values can be read in any order, regardless of their position in the record. Values or parts of values can be reread. Both leading and trailing blanks within the field are ignored. Values do not need to be separated by blanks or other delimiters. Use the TRUNCOVER option on the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.
Data Accessing - Formatted Input

Formatted input combines the flexibility of using informats with many of the features of column input. By using formatted input, you can read nonstandard data for which SAS requires additional instructions. Formatted input is typically used with pointer controls that enable you to control the position of the input pointer in the input buffer when you read data. data scores; input name $12. +4 score1 comma5. +6 score2 comma5.; datalines; Riley 1,132 1,187 Henderson 1,015 1,102 ;
Data Accessing - Formatted Input

Important points about formatted input are
Characters values can contain embedded blanks. Character values can be from 1 to 32,767 characters long. Placeholders, such as a single period (.), are not required for missing data. With the use of pointer controls to position the pointer, input values can be read in any order, regardless of their positions in the record. Values or parts of values can be reread. Formatted input enables you to read data stored in nonstandard form, such as packed decimal or numbers with commas.
Data Accessing - Named Input

You can use named input to read records in which data values are preceded by the name of the variable and an equal sign (=). The following INPUT statement reads the data lines containing equal signs. data games; input name=$ score1= score2=; datalines; name=abc score1=1132 score2=1187 ; run;
The MISSOVER Option

The MISSOVER option prevents SAS from loading a new record when the end of the current record is reached. General form of the INFILE statement with the MISSOVER option: INFILE raw-data-file MISSOVER; If SAS reaches the end of the row without finding values for all fields, variables without values are set to missing.
Using the MISSOVER Option

data airplanes3; length ID $ 5; infile 'raw-data-file' dlm=',' missover; input ID $ InService : date9. PassCap CargoCap; run;
Raw Data File
50001 ,25feb1989,132, 530 50002, 11nov1989,152 50003, 22oct1991,168, 530 50004, 4feb1993,172 50005, 24jun1993, 170, 510 50006, 20dec1994, 180, 520
Using the MISSOVER Option

Partial SAS Log NOTE: 6 records were read from the infile 'aircraft3.dat'. The minimum record length was 19. The maximum record length was 26. NOTE: The data set WORK.AIRPLANES3 has 6 observations and 4 variables. NOTE: DATA statement used: real time 0.42 seconds cpu time 0.07 seconds
Missing Values without Placeholders

There is missing data represented by two consecutive delimiters. 50001 ,25feb1989,, 540 50002, 11nov1989,132, 530 50003, 22oct1991,168, 530 50004, 4feb1993,172, 550 50005, 24jun1993,, 510 50006, 20dec1994, 180,520

By default, SAS treats two consecutive delimiters as one. Missing data should be represented by a placeholder. 5 0 0 0 1 ,25feb1989 , . , 5 3 0
The DSD Option

General form of the DSD option in the INFILE statement: INFILE file-name DSD;

The DSD option
sets the default delimiter to a comma treats consecutive delimiters as missing values enables SAS to read values with embedded delimiters if the value is surrounded by double quotes.
Using the DSD Option

data airplanes4; length ID $ 5; infile 'raw-data-file' dsd; input ID $ InService : date9. PassCap CargoCap; run;
50001 ,25feb1989,, 540 50002, 11nov1989,132, 530 50003, 22oct1991,168, 530 50004, 4feb1993,172, 550 50005, 24jun1993,, 510 50006, 20dec1994, 180,520
INFILE Statement Options

Problem Non-blank delimiters Missing data at end of row Missing data represented by consecutive delimiters or Embedded delimiters where values are surrounded by double quotes Option DLM='delimiter(s)' MISSOVER DSD
These options can be used separately or together in the INFILE statement.
Multiple Records Per Observation

A raw data file has three records per employee. Record 1 contains the first and last names, record 2 contains the city and state of residence, and record 3 contains the employees phone number.
Farr, Sue Anaheim, CA 869-7008 Anderson, Kay B. Chicago, IL 483-3321 Tennenbaum, Mary Ann Jefferson, MO 589-9030
Desired Output
The SAS data set should have one observation per employee.
LName Farr Anderson Tennenbaum FName Sue Kay B. Mary Ann City Anaheim Chicago Jefferson State CA IL MO Phone 869-7008 483-3321 589-9030
Multiple INPUT Statements

data address; length LName FName $ 20 City $ 25 State $ 2 Phone $ 8; infile 'raw-data-file' dlm=','; Load Record input LName $ FName $; Load Record input City $ State $; Load Record input Phone $; run;
Line Pointer Controls

You can also use line pointer controls to control when SAS loads a new record.
DATA SAS-data-set; INPUT var-1 var-2 var-3 / var-4 var-5; additional SAS statements
SAS loads the next record when it encounters a forward slash.
Reading Multiple Records Per Observation

data address; length LName FName $ 20 City $ 25 State $ 2 Phone $ 8; infile 'raw-data-file' dlm=','; Load Record input LName $ FName $ / Load Record City $ State $ / Load Record Phone $; run;
Reading Multiple Records Per Observation

Partial Log
NOTE: 9 records were read from the infile 'addresses.dat'. The minimum record length was 8. The maximum record length was 20. NOTE: The data set WORK.ADDRESS has 3 observations and 5 variables.
Reading Raw Data Files with Multiple Records Per Observation

Mixed Record Types
101 USA 1-20-1999 3295.50 3034 EUR 30JAN1999 1876,30 101 USA 1-30-1999 2938.00 128 USA 2-5-1999 2908.74 1345 EUR 6FEB1999 3145,60 109 USA 3-17-1999 2789.10

Desired Output
Sales ID Sale Date
Location
Amount
101 3034 101 128 1345 109
USA EUR USA USA EUR USA
14264 14274 14274 14280 14281 14320
3295.50 1876.30 2938.00 2908.74 3145.60 2789.10

The Single Trailing @
The single trailing @ option holds a raw data record in the input buffer until SAS executes an INPUT statement with no trailing @ reaches the bottom of the DATA step.
General form of an INPUT statement with the single trailing @:
INPUT var1 var2 var3 @;

Processing the Trailing @
Hold record for next INPUT statement.
Load next record.
input SalesID $ Location $ @; if location='USA' then input SaleDate : mmddyy10. Amount; else if Location='EUR' then input SaleDate : date9. Amount : commax8.;
Multiple Observations Per Record

The raw data file RETIRE contains each employees identification number and this years contribution to his or her retirement plan. Each record contains information for three employees.
E00973 1400 E09872 2003 E73150 2400 E45671 4500 E34805 1980 E47200 4371 EmpID Contrib 1400 2003 2400 4500 1980 4371
Desired Output
E00973 E09872 E73150 E45671 E34805 E47200

Processing: What Is Required?
E00973 1400 E09872 2003 E73150 2400 Read for Obs. 1 Read for Obs. 2 Read for Obs. 3
Process Other Statements
Output
Output
Output

The Double Trailing @
The double trailing @ holds the raw data record across iterations of the DATA step until the line pointer moves past the end of the line. INPUT var1 var2 var3 @@;
data work.retire; length EmpID $ 6; infile 'raw-data-file'; input EmpID $ Contrib @@; run;
Hold until end of record.

Partial Log
NOTE: 2 records were read from the infile 'retire.dat'. The minimum record length was 35. The maximum record length was 36. NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.RETIRE has 6 observations and 2 variables.
The SAS went to a new line message is expected because the @@ option indicates that SAS should read until the end of each record.

Trailing @ Versus Double Trailing @
Option Trailing @ INPUT var-1... @; Double trailing @ INPUT var-1 ... @@; Effect Holds raw data record until 1) an INPUT statement with no trailing @ 2) the bottom of the DATA step. Holds raw data records in input buffer until SAS reads past end of line.
Reading Hierarchical Raw Data Files

Processing Hierarchical Files
Many files are hierarchical in structure, consisting of a header record one or more related detail records. Typically, each record contains a field that identifies whether it is a header record or a detail record.
Header Detail Detail Header Header Detail Header Detail Detail

Processing Hierarchical Files
You can read a hierarchical file into a SAS data set by creating one observation per detail record and storing the header information as part of each observation.
Hierarchical File
SAS Data Set
Header 1 Detail 1 Detail 2 Detail 3 Header 2 Detail 1 Header 3 Detail 1 Detail 2
Header Variables Header 1 Header 1 Header 1 Header 2 Header 3 Header 3
Detail Variables
Detail 1 Detail 2 Detail 3 Detail 1 Detail 1 Detail 2

Creating One Observation Per Detail
E:Adams:Susan D:Michael:C D:Lindsay:C E:Porter:David D:Susan:S E:Lewis:Dorian D. D:Richard:C E:Dansky:Ian E:Nicholls:James D:Roberta:C E:Slaydon:Marla D:John:S The raw data file DEPENDANTS has a header record containing the name of the employee and a detail record for each dependant on the employees health insurance.

Desired Output
Personnel would like a list of all the dependants and the name of the associated employee.
EmpLName
Adams Adams Porter Lewis Nicholls Slaydon
EmpFName
Susan Susan David Dorian D. James Marla
DepName
Michael Lindsay Susan Richard Roberta John
Relation
C C S C C S

The RETAIN Statement
General form of the RETAIN statement:
RETAIN variable-name <initial-value>;
The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step. This means that values from previous records are available for processing.

Hold EmpLName and EmpFName
data dependants(drop=Type); length EmpLName EmpFName DepName $ 20 Relation $ 1; retain EmpLName EmpFName; infile 'raw-data-file' dlm=':'; input Type $ @; if Type='E' then input EmpLName $ EmpFName $; else do; input DepName $ Relation $; output; end; run;
Example Modified List Input

This example explains
Compilation Phase Execution Phase.
Example Modified List Input

Raw Data File
50001 50002 50003 50004 50005 50006
Compile
4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520
data airplanes; length ID $ 5; infile 'raw-data-file'; input ID $ InService : date9. PassCap CargoCap; run;
Input Buffer
ID $ 5
PDV
Raw Data File

50001 50002 50003 50004 50005 50006
Compile
Input Buffer
ID $ 5
PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8
...
Raw Data File

50001 50002 50003 50004 50005 50006
Execute
Input Buffer
ID $ 5
.
...
Raw Data File

50001 50002 50003 50004 50005 50006 4feb1989 132 530 11nov1989 152 540 22oct1991 90 530 4feb1993 172 550 24jun1993 170 510 20dec1994 180 520
Input Buffer
5 0 0 0 1 4 f e b 1 9 8 9 1 3 2 5 3 0
ID $ 5
.
...
Raw Data File

Input Buffer
5 0 0 0 1 4 f eb 1 9 8 9 1 3 2 5 3 0
ID $ 5
50001
. 10627
. 132
530 .
...
Raw Data File

Implicit return Implicit output
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
5 3 0
ID $ 5
50001
530 . Write out observation to airplanes. . 132
. 10627
...
Raw Data File

Implicit output
Input Buffer
5 0 0 0 1 4 f eb 1 9 8 9 1 3 2
5 3 0
ID $ 5
50001
. 10627
...
Raw Data File

Implicit return
Input Buffer
5 0 0 0 1
4 f eb 1 9 8 9
1 3 2
5 3 0
ID $ 5
50001
. 10627
. 132
530 .
...
Raw Data File

Input Buffer
5 0 0 0 1 4 f eb 1 9 8 9 1 3 2 5 3 0
ID $ 5
.
...
Raw Data File

Input Buffer
5 0 0 0 2 1 1 n o v 1 98 9 1 5 2 5 4 0
ID $ 5
50002
. 10907
. 152
540 .
...
Raw Data File

Implicit return Implicit output
Input Buffer
5 0 0 0 2
1 1 n o v 1 98 9
1 5 2
5 4 0
ID $ 5
50002
. 10907
...
Raw Data File

Implicit output
Input Buffer
5 0 0 0 2 1 1 n o v 1 98 9 1 5 2
5 4 0
ID $ 5
50002
. 10907
. 152
540 .
...
Raw Data File

Implicit return
Input Buffer
5 0 0 0 2
1 1 n o v 1 98 9
1 5 2
5 4 0
ID $ 5
50002
. 10907
. 152
540 .
...
Raw Data File

Input Buffer
5 0 0 0 2
Continue processing until 1 1 n o v 1 98 9 1 5 2 5 4 0 end of the raw data file. PDV INSERVICE PASSCAP CARGOCAP N N N 8 8 8
ID $ 5
Output of Dataset
proc print data=airplanes noobs; run; In Service 10627 10907 11617 12088 12228 12772 Pass Cap 132 152 168 172 170 180 Cargo Cap 530 540 530 550 510 520
ID 50001 50002 50003 50004 50005 50006
Combining SAS Data Sets

Create a Data set from two or more existing data sets by joining observation side-by-side or appends the observations from one data set to another data set. Methods to combine SAS data sets
concatenating interleaving one-to-one reading one-to-one merging match merging updating.
Combine SAS data sets
Concatenating
Concatenating the data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation.

Interleaving
intersperses observations from two or more data sets, based on one or more common variables.

One-to-One Reading and One-toOne Merging.
One-to-one reading combines observations from two or more SAS data sets by creating observations that contain all of the variables from each contributing data set. Observations are combined based on their relative position in each data set. The DATA step stops after it has read the last observation from the smallest data set. One-to-one merging is similar to a one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets.

Match merging
combines observations from two or more SAS data sets into a single observation in a new data set based on the values of one or more common variables.

Identifying Data Set Contributors
When you read multiple SAS data sets in one DATA step, you can use the IN= data set option to detect which data set contributed to an observation. General form of the IN= data set option:
SAS-data-set(IN=variable)
where variable is any valid SAS variable name. Variable is a temporary numeric variable with a value of:
0 to indicate false; the data set did not contribute to the current
observation 1 to indicate true; the data set did contribute to the current observation

IN= Data Set Option Transact
Num 111 111 113 114 116
Trans D C C D C
Amnt 126.32 560 235 14.56 371.69
Num 111 112 114 115 116
Branch
Branch M.G.Road Sivaji Nagar Madiwala Koramangala BTM
A data set named Newtrans shows this weeks transactions. A data set named noactiv shows accounts with no transactions this week. A data set named noacct shows transactions with no matching account number.

Num 111 111 113 114 116 Trans D C C D C Amnt 126.32 560 235 14.56 371.69 Num 111 112 114 115 116 Branch M.G.Road Sivaji Nagar Madiwala Koramangala BTM
Branch data Transact newtrans noactiv (drop=Trans Amnt) noacct (drop=Branch); merge prog2.transact(in=InTrans) prog2.branch(in=InBanks); by ActNum; if InTrans and InBanks then output newtrans; else if InBanks and not InTrans then output noactiv; else if InTrans and not InBanks then output noacct; run;

A data set named Newtrans shows this weeks Num Trans Amnt Branch transactions.
111 111 114 116 D C D C 126.32 560 14.56 371.69 M.G.Road M.G.Road Madiwala BTM
A data set named noactiv shows accounts with no Num Branch transactions this week.
112 115 Sivaji Nagar Koramangala
A data set named noacct shows transactions with no matching account number.
Num 113 Trans Amnt C 235
Questions

Accessing Data: Center of Excellence Data Warehousing

Uploaded by

Copyright:

Available Formats

Accessing Data: Center of Excellence Data Warehousing

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Accessing Data: Center of Excellence Data Warehousing

Uploaded by

Copyright:

Available Formats

Accessing Data

Last Updated : 29 June, 2004

Center of Excellence Data Warehousing

What Is the SAS System?

statistical and mathematical analysis

What Is the SAS System?

Financial Management and Decision Support

Overview of Base SAS Software

DATA step debugger

Output Delivery System (ODS)

Base SAS data access management analysis presentation

Components of the SAS Language

SAS stored program

Components of the SAS Language

describes the contents of the SAS data set to SAS.

SAS data view

without using the storage space required by SAS data files.

Structure of SAS Data Sets

Components of the SAS Language

Database Management System Files

Components of the SAS Language

data or existing SAS data sets to create a SAS data set.

SAS Macro Facility

What Can the DATA Step Do?

What Can the DATA Step Do?

What Can the DATA Step Do?

SAS Data Set 2

What Can the DATA Step Do?

Mth2Dte 498.49 1444.99 2439.96 3004.55 3787.56

What Can the DATA Step Do?

BirthDay 4253 SAS Function Age 30

What Can the DATA Step Do?

Salary 42000 34000 27000 20000 19000 19000

Div HUMRES FINACE FLTOPS FINACE FINACE FLTOPS

What Can the DATA Step Do?

Words in the SAS Language

Words in the SAS Language

Names in the SAS Language

There are two kinds of names in SAS.

Names in the SAS Language

Names in the SAS Language

Names in the SAS Language

Create a SAS Data Set from a Raw Data File

The DATA Statement

The INFILE Statement

The INPUT Statement

The INPUT Statement

The Assignment Statement

Create a SAS Data Set from a Raw Data File

Create a SAS Data Set from a Raw Data File

Overview of DATA Step Processing

Processing the DATA Step

data when SAS executes an INPUT statement.

program data vector (PDV)

DATA Step Processing

DATA Step Processing

DATA Step Processing

DATA step Input Styles

Reads date values in the form mmddyy or