Accessing Data: Center of Excellence Data Warehousing
Agenda
Creating datasets using the DATA step
The INFILE statement
Different input styles
Combining datasets using the DATA step
SAS procedures
software tools for data analysis and reporting.
Macro facility
a tool for extending and customizing SAS software programs and for reducing text in your programs.
SAS catalog
Many different kinds of information that are used in a SAS job are stored in SAS catalogs, such as instructions for reading and printing data values, or function key settings that you use in the SAS windowing environment.
data that has been collected or calculated. An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a given characteristic.
Descriptor Portion
Data Portion
software, such as many common database management system (DBMS) files. In addition to base SAS software, you must license the SAS/ACCESS software for your DBMS and operating environment.
PROC step
A group of procedure statements used to analyze data in SAS data sets to produce statistics, tables, reports, charts, and plots, to create SQL queries, and to perform other analyses and operations on your data. They also provide ways to manage and print SAS files.
programs, and for reducing the amount of code that you must enter to do common tasks. Macros are SAS files that contain compiled macro program statements and stored text.
DATA Step
[Figure: DATA step. Labels recovered from the diagram: Sale Amt (498.49, 946.50, 994.97, 564.59, 783.01); Div (FINACE, FLTOPS, HUMRES); DivSal (42000, 46000, 73000).]
Each word or token in the SAS language is classified into one of four categories.
Names - a series of characters that begins with a letter or an underscore. Ex.: data, _old, yearcutoff, _n_, year_04, descending
Literal - consists of 1 to 32,767 characters enclosed in single or double quotation marks. Ex.: 'Bangalore', '2003-04', "Wipro's Plan", "Report for the Third Quarter"
Number - in general, composed entirely of numeric digits, with an optional decimal point and a leading plus or minus sign. Ex.: 5683, 2.35, -5
Special character - is usually any single keyboard character other than letters, numbers, the underscore, and the blank. In general, each special character is a single token, although some two-character operators, such as ** and <=, form single tokens.
Ex.: =, :, @, +, /
automatic variables (for example, _N_ and _ERROR_) or special variable list names (for example, _CHARACTER_, _NUMERIC_, and _ALL_). When associating a libref with a SAS data library, do not use SASHELP, SASMSG, SASUSER, or WORK. When you create SAS data sets, do not use _NULL_, _DATA_, or _LAST_.
SAS Dates
SAS dates are special numeric values representing the number of days between January 1, 1960 and a specified date.
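SAS Date Values
Date        SAS date value
1jan1959    -365
1jan1960    0
1jan1961    366
1jan2000    14610
The DATE9. informat reads dates such as 1jan1960 into SAS date values. The MMDDYY10. format displays a SAS date value in the form mm/dd/yyyy.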
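The day-count rule above can be checked directly in the SAS log. The following is a minimal sketch; the variable names base, later, and y2k are illustrative and not part of the original slides.

```sas
/* Sketch only: SAS date values are day counts relative to 1 January 1960. */
data _null_;
   base  = '01JAN1960'd;                 /* date literal, day 0                        */
   later = '01JAN1961'd;                 /* 1960 is a leap year, so 366 days later     */
   y2k   = input('01jan2000', date9.);   /* DATE9. informat converts text to a date    */
   put base= later= y2k=;                /* writes the raw numeric values to the log   */
   put y2k= mmddyy10.;                   /* MMDDYY10. format shows the value as a date */
run;
```

The first PUT statement should write base=0 later=366 y2k=14610, and the second should write y2k=01/01/2000.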
Standard Data
The term standard data refers to character and numeric data that SAS recognizes automatically. Some examples of standard numeric data include
35469.93 3E5 (exponential notation) -46859
Standard character data is any character you can type on your keyboard. Standard character values are always left-justified by SAS.
Nonstandard Data
The term nonstandard data refers to character and numeric data that SAS does not recognize automatically. Examples of nonstandard numeric data include
12/12/2012 29FEB2000 4,242 $89,000
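Nonstandard values like the ones listed above need an informat before SAS can store them as numbers. A minimal sketch using the INPUT function follows; the variable names and the informat widths are assumptions chosen to fit the sample values.

```sas
/* Sketch only: converting nonstandard text values with informats. */
data _null_;
   salary = input('$89,000', comma7.);       /* COMMAw.d strips the dollar sign and comma */
   count  = input('4,242', comma5.);         /* same informat, narrower width             */
   leap   = input('29FEB2000', date9.);      /* ddmmmyyyy text to a SAS date value        */
   us     = input('12/12/2012', mmddyy10.);  /* mm/dd/yyyy text to a SAS date value       */
   put salary= count= leap= us=;             /* dates print as day counts without a format */
run;
```

Here salary and count should appear in the log as 89000 and 4242, while leap and us appear as day counts because no format is attached.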
Desired Output
Obs   EmpID   HireDate   Salary     Bonus
  1   E1232      14532    61065   3053.25
  2   E2341      13666    91688   4584.40
  3   E3452      12352    32639   1631.95
  4   E6781      11947    28305   1415.25
  5   E8321      13479    40440   2022.00
  6   E1052      13572    39461   1973.05
  7   E1062       9991    41463   2073.15
  8   E8172      14615    40650   2032.50
  9   E1091      11554    40950   2047.50
DATA SAS-data-set;
The DATA statement starts the DATA step and names the SAS data set being created.
The INFILE statement points to the raw data file being read. Options in the INFILE statement affect how SAS reads the raw data file.
INPUT variable-specification ;
The INPUT statement describes the raw data fields and specifies how you want them converted into SAS variables.
Formatted Input
The input style tells SAS where to find the fields and how to read them into SAS.
INPUT @n variable-name informat. ...;
@n - moves the pointer to the starting point of the field.
variable-name - names the SAS variable being created.
informat - specifies how many positions to read and how to convert the raw data into a SAS value.
variable-name=expression;
The assignment statement creates a SAS variable and specifies how to calculate that variable's value.
NOTE: 9 records were read from the infile 'fltat1.dat'.
      The minimum record length was 21.
      The maximum record length was 21.
NOTE: The data set WORK.FLTAT1 has 9 observations and 4 variables.
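The statements above can be put together into one DATA step. This is a sketch under stated assumptions: the column positions (@1, @6, @16), the mm/dd/yyyy layout of the hire date, and the two in-stream records are hypothetical, since the layout of fltat1.dat is not shown. The 5% bonus rate is inferred from the Desired Output values.

```sas
/* Sketch only: formatted input plus an assignment statement.        */
/* Column positions and the sample records are assumptions.          */
data work.fltat1;
   input @1  EmpID    $5.          /* 5-character employee ID         */
         @6  HireDate mmddyy10.    /* date text to SAS date value     */
         @16 Salary   6.;          /* standard numeric, 6 columns     */
   Bonus = 0.05 * Salary;          /* assignment creates Bonus        */
   datalines;
E123210/15/1999 61065
E234106/01/1997 91688
;
run;

proc print data=work.fltat1;
run;
```

Reading in-stream data with DATALINES keeps the sketch self-contained; against the real file, an INFILE statement naming 'fltat1.dat' would take the place of the DATALINES block.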
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and compiles them. During the compile phase, SAS creates the following three items:
input buffer
is a logical area in memory into which SAS reads each record of raw data when SAS executes an INPUT statement.
program data vector (PDV)
is a logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the program data vector. From here, SAS writes the values to a SAS data set as a single observation. Along with data set variables and computed variables, the PDV contains two automatic variables, _N_ and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an error caused by the data during execution.
descriptor information
is information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes. It contains, for example, the name of the data set and its member type, the date and time that the data set was created, and the number, names, and data types (character or numeric) of the variables.
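The behavior of the automatic variables can be observed by writing them to the log on every iteration. The sketch below uses a hypothetical data set name and three made-up records, one of which contains invalid numeric data on purpose.

```sas
/* Sketch only: watching _N_ and _ERROR_ in the program data vector. */
data work.pdv_demo;
   input ID $ Amount;
   put _n_= _error_= ID= Amount=;   /* one log line per iteration     */
   datalines;
A1 100
A2 abc
A3 300
;
run;
```

On the second record, Amount cannot be read as a number, so SAS should set Amount to missing and _ERROR_ to 1 for that iteration, then reset _ERROR_ to 0 when the next iteration begins.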
DATA step
Reading External File Data
data bonus_04;                      /* [1] */
   infile 'your-input-file';        /* [2] */
   input IDnumber name $ salary;    /* [3] */
   bonus=salary * 0.25;             /* [4] */
run;                                /* [5] */
1 - Begin the DATA step and create a SAS data set called bonus_04.
2 - Specify the external file that contains your data.
3 - Read a record and assign values to three variables.
4 - Calculate a value for variable bonus.
5 - Execute the DATA step.
An informat is an instruction that SAS uses to read data values into a variable.
The INPUT statement with an informat after a variable name is the simplest way to read values into a variable.
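Informat    Description
$w.         Reads standard character data
DATEw.      Reads date values in the form ddmmmyy or ddmmmyyyy
MMDDYYw.    Reads date values in the form mmddyy or mmddyyyy
w.d         Reads standard numeric data
COMMAw.d    Reads numeric data and removes embedded commas, blanks, and dollar signs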
Raw data file 'addresses.dat':
Farr, Sue
Anaheim, CA
869-7008
Anderson, Kay B.
Chicago, IL
483-3321
Tennenbaum, Mary Ann
Jefferson, MO
589-9030
Desired Output
The SAS data set should have one observation per employee.
LName        FName      City        State   Phone
Farr         Sue        Anaheim     CA      869-7008
Anderson     Kay B.     Chicago     IL      483-3321
Tennenbaum   Mary Ann   Jefferson   MO      589-9030
NOTE: 9 records were read from the infile 'addresses.dat'.
      The minimum record length was 8.
      The maximum record length was 20.
NOTE: The data set WORK.ADDRESS has 3 observations and 5 variables.
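One way to build a single observation from three consecutive records is the line pointer control / in the INPUT statement. The sketch below is an assumption about how the desired output could be produced, not necessarily the code used on the original slide; the variable lengths are illustrative, and the records are read in-stream so the example is self-contained.

```sas
/* Sketch only: three raw records per observation, comma-delimited.  */
data work.address;
   length LName FName $ 20 City $ 25 State $ 2 Phone $ 8;
   infile datalines dlm=',';        /* comma is the only delimiter    */
   input LName $ FName $            /* record 1: name                 */
       / City $ State $             /* record 2: city and state       */
       / Phone $;                   /* record 3: phone                */
   datalines;
Farr, Sue
Anaheim, CA
869-7008
Anderson, Kay B.
Chicago, IL
483-3321
Tennenbaum, Mary Ann
Jefferson, MO
589-9030
;
run;
```

Because the blank is no longer a delimiter, values with embedded blanks such as Kay B. and Mary Ann are read as single fields.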
input SalesID $ Location $ @;
if Location='USA' then
   input SaleDate : mmddyy10. Amount;
else if Location='EUR' then
   input SaleDate : date9. Amount : commax8.;
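The trailing @ holds the current record so that a later INPUT statement can finish reading it once the record type is known. A self-contained sketch follows; the data set name, the variable lengths, and the two sample records are hypothetical.

```sas
/* Sketch only: the layout of the rest of the record depends on Location. */
data work.sales;
   length SalesID $ 4 Location $ 3;
   input SalesID $ Location $ @;          /* hold the record           */
   if Location='USA' then
      input SaleDate : mmddyy10. Amount;  /* US date, standard numeric */
   else if Location='EUR' then
      input SaleDate : date9. Amount : commax8.;  /* comma as decimal point */
   format SaleDate date9.;
   datalines;
101 USA 01/20/1999 3295.50
3034 EUR 30JAN1999 1876,30
;
run;
```

The COMMAXw.d informat reverses the roles of the comma and the period, so 1876,30 should be stored as 1876.3.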
data work.retire;
   length EmpID $ 6;
   infile 'raw-data-file';
   input EmpID $ Contrib @@;
run;
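The double trailing @@ holds a record across iterations of the DATA step, which lets one raw record supply several observations. The sketch below reuses the step above with in-stream data; the employee IDs and contribution amounts are made up for illustration.

```sas
/* Sketch only: several observations per raw record with @@.         */
data work.retire;
   length EmpID $ 6;
   input EmpID $ Contrib @@;   /* keep the record for the next pass  */
   datalines;
E00973 1400 E09872 2003 E73150 2400
E45671 4500 E34805 1980
;
run;
```

With these two records the step should create five observations, one per EmpID and Contrib pair.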
Hierarchical File
Detail Variables
EmpLName   EmpFName    DepName   Relation
Adams      Susan       Michael   C
Adams      Susan       Lindsay   C
Porter     David       Susan     S
Lewis      Dorian D.   Richard   C
Nicholls   James       Roberta   C
Slaydon    Marla       John      S
The RETAIN statement prevents SAS from reinitializing the values of new variables at the top of the DATA step. This means that values from previous records are available for processing.
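A hierarchical file is a typical use of RETAIN: the employee name is read from a header record and must stay available while the dependent records that follow are processed. The sketch below assumes a hypothetical colon-delimited layout with a record type code (E for employee, D for dependent); the raw file layout is not shown in the source, and only the names come from the table above.

```sas
/* Sketch only: header values are retained across detail records.    */
data work.dependents(drop=Type);
   length Type $ 1 EmpLName EmpFName DepName $ 20 Relation $ 1;
   retain EmpLName EmpFName;          /* not reset at the top of the step */
   infile datalines dlm=':';
   input Type $ @;                    /* read the type, hold the record   */
   if Type='E' then input EmpLName $ EmpFName $;
   else if Type='D' then do;
      input DepName $ Relation $;
      output;                         /* one observation per dependent    */
   end;
   datalines;
E:Adams:Susan
D:Michael:C
D:Lindsay:C
E:Porter:David
D:Susan:S
;
run;
```

Without the RETAIN statement, EmpLName and EmpFName would be reset to missing at the start of each iteration and would be blank on every dependent observation.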
Compile
Raw data file (the leading ID field of each record is not shown):
4feb1989     132   530
11nov1989    152   540
22oct1991     90   530
4feb1993     172   550
24jun1993    170   510
20dec1994    180   520
data airplanes;
   length ID $ 5;
   infile 'raw-data-file';
   input ID $ InService : date9. PassCap CargoCap;
run;
Input Buffer
(empty)
PDV
ID $ 5    INSERVICE N 8    PASSCAP N 8    CARGOCAP N 8
Execute
The same DATA step now executes once per raw data record.
Step 1. The PDV variables are initialized to missing.
PDV:   ID (blank)   INSERVICE .   PASSCAP .   CARGOCAP .
Step 2. The first record is read into the input buffer.
Input Buffer:   50001 4feb1989 132 530
Step 3. The INPUT statement assigns values to the PDV.
PDV:   ID 50001   INSERVICE 10627   PASSCAP 132   CARGOCAP 530
Step 4. Implicit output: the values in the PDV are written to the data set as one observation.
Step 5. Implicit return: control returns to the top of the DATA step and the PDV is reinitialized to missing.
PDV:   ID (blank)   INSERVICE .   PASSCAP .   CARGOCAP .
Step 6. The second record is read into the input buffer.
Input Buffer:   50002 11nov1989 152 540
Step 7. The INPUT statement assigns values to the PDV.
PDV:   ID 50002   INSERVICE 10907   PASSCAP 152   CARGOCAP 540
Step 8. Implicit output, then implicit return.
Continue processing until end of the raw data file.
PDV layout:   ID $ 5   INSERVICE N 8   PASSCAP N 8   CARGOCAP N 8
Output of Dataset
proc print data=airplanes noobs;
run;
In Service   Pass Cap   Cargo Cap
     10627        132         530
     10907        152         540
     11617        168         530
     12088        172         550
     12228        170         510
     12772        180         520
Concatenating
Concatenating the data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation.
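Concatenation is done by naming the data sets in a single SET statement. A minimal sketch follows; the variables ID and Amount and their values are hypothetical, and only the data set names DATA1, DATA2, and COMBINED come from the text above.

```sas
/* Sketch only: concatenating two data sets with one SET statement.  */
data work.data1;
   input ID $ Amount;
   datalines;
A1 10
A2 20
;
run;

data work.data2;
   input ID $ Amount;
   datalines;
B1 30
;
run;

data work.combined;
   set work.data1 work.data2;   /* all of DATA1 first, then DATA2    */
run;
```

COMBINED should contain three observations, in the order A1, A2, B1.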
SAS-data-set(IN=variable)
where variable is any valid SAS variable name. Variable is a temporary numeric variable with a value of:
0 to indicate false; the data set did not contribute to the current observation
1 to indicate true; the data set did contribute to the current observation
The Transact data set (Trans values D, C, C, D, C) is merged with the Branch data set to create three data sets:
A data set named newtrans shows this week's transactions.
A data set named noactiv shows accounts with no transactions this week.
A data set named noacct shows transactions with no matching account number.
data newtrans
     noactiv (drop=Trans Amnt)
     noacct (drop=Branch);
   merge prog2.transact(in=InTrans)
         prog2.branch(in=InBanks);
   by ActNum;
   if InTrans and InBanks then output newtrans;
   else if InBanks and not InTrans then output noactiv;
   else if InTrans and not InBanks then output noacct;
run;
noactiv: accounts with no transactions this week.
ActNum   Branch
   112   Sivaji Nagar
   115   Koramangala
noacct: transactions with no matching account number.
ActNum   Trans   Amnt
   113   C        235
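The merge can be tried without the prog2 library by creating small input data sets in WORK. In the sketch below, accounts 112, 113, and 115 and their values follow the results shown above; every other account number, amount, and branch name is hypothetical.

```sas
/* Sketch only: routing merged observations with IN= variables.      */
data work.transact;                 /* already in ActNum order        */
   input ActNum Trans $ Amnt;
   datalines;
111 D 126.32
111 C 560.00
113 C 235.00
114 D 14.56
114 C 200.00
;
run;

data work.branch;                   /* already in ActNum order        */
   infile datalines dlm=',';
   input ActNum Branch : $20.;
   datalines;
111,Indira Nagar
112,Sivaji Nagar
114,Jayanagar
115,Koramangala
;
run;

data work.newtrans
     work.noactiv(drop=Trans Amnt)
     work.noacct(drop=Branch);
   merge work.transact(in=InTrans) work.branch(in=InBanks);
   by ActNum;
   if InTrans and InBanks then output work.newtrans;         /* match       */
   else if InBanks and not InTrans then output work.noactiv; /* branch only */
   else if InTrans and not InBanks then output work.noacct;  /* trans only  */
run;
```

With this input, noactiv should hold accounts 112 and 115 and noacct should hold the single transaction for account 113. Both inputs must be sorted by ActNum before the MERGE, which the in-stream data already satisfies.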
Questions