Guide To Master SAS Programming


INDEX

DATA STATEMENT
INFILE STATEMENT
INPUT STATEMENT
FORMAT AND INFORMAT
CONDITIONAL STATEMENTS
ITERATIVE DO LOOPS
DM STATEMENT
COMBINING DATASETS
FUNCTIONS
OPTIONS
ARRAYS
IMPORTING AND EXPORTING THE DATA
PROC CONTENTS
PROC COMPARE
PROC TRANSPOSE
PROC MEANS
PROC FREQ
PROC PRINT
PROC REPORT
MACROS
SQL

DATA STATEMENT

A DATA STATEMENT BEGINS A DATA STEP AND NAMES THE OUTPUT DATASET(S)
THAT THE STEP CREATES. THE STEP ENDS WITH A RUN STATEMENT.
SYNTAX:
DATA DATASET_NAME;
RUN;
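A DATA STATEMENT CAN NAME MORE THAN ONE OUTPUT DATASET. THE MINIMAL SKETCH BELOW USES HYPOTHETICAL DATASET NAMES (ADULTS, MINORS) AND MADE-UP VALUES; EACH OUTPUT STATEMENT SENDS THE CURRENT OBSERVATION TO ONE OF THE NAMED DATASETS.

```sas
/* SKETCH: ONE DATA STEP, TWO OUTPUT DATASETS (HYPOTHETICAL NAMES AND DATA) */
DATA ADULTS MINORS;
   INPUT NAME $ AGE;
   IF AGE >= 18 THEN OUTPUT ADULTS;   /* WRITE TO ADULTS ONLY */
   ELSE OUTPUT MINORS;                /* WRITE TO MINORS ONLY */
   DATALINES;
ANNA 34
BEN 12
CARL 19
;
RUN;
```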

INFILE STATEMENTS

INFILE STATEMENT: THE INFILE STATEMENT TELLS SAS WHERE THE RAW DATA
IS. IF THE DATA IS IN AN EXTERNAL FILE, THE INFILE STATEMENT GIVES
ITS PATH; IF THE DATA IS TYPED INTO THE PROGRAM ITSELF (INSTREAM
DATA), THE INFILE STATEMENT NAMES DATALINES.

IN THE PROGRAM BELOW THE SOURCE DATA IS INSTREAM, SO THE INFILE
STATEMENT SAYS DATALINES. THE INPUT STATEMENT READS EACH VARIABLE
FROM FIXED COLUMNS (5-19, 21-36 AND 37-48); THIS IS DISCUSSED IN
THE INPUT STATEMENT TOPIC.

DATA E_1;
INFILE DATALINES;
INPUT STARTING $ 5-19 DESTINATION $ 21-36 DISTANCE $ 37-48;
DATALINES;
    NEWYORK         LOS ANGELES     2,446.32 mi
    DALLAS          San-Diego       1,181.13 mi
    San-Francisco   New-York        2,463.84 mi
    Las-Vegas       Atlanta         4,403.12 mi
;
RUN;

RAW DATA STORED IN AN EXTERNAL FILE IS REFERENCED BY ITS PATH. THE
: MODIFIER BELOW LETS EACH INFORMAT STOP AT THE NEXT COMMA.
DATA E_2;
INFILE "C:\Users\RICKY\Desktop\BQ\3.TXT" DSD;
INPUT NAME :$10. SEX :$2. BIKE :$10.;
RUN;

DSD (DELIMITER-SENSITIVE DATA): WITH DSD, THE DEFAULT DELIMITER
BECOMES A COMMA, TWO CONSECUTIVE DELIMITERS ARE READ AS A MISSING
VALUE, AND QUOTATION MARKS AROUND A VALUE ARE REMOVED. IN THE
EXAMPLES BELOW THE COMMA IS THE DELIMITER.
DATA E_2;
INFILE DATALINES DSD;
INPUT STARTING :$18. DESTINATION $ DISTANCE $;
DATALINES;
NY,"LA",2446.32
DALLAS,"SD",1181.13
SF,"NY",2463.84
LV,"Atlanta",4403.12
;
RUN;

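DATA E_3;
INFILE DATALINES DSD;
/* THE COMMA INSIDE 2,446.32 IS ALSO READ AS A DELIMITER, */
/* SO DISTANCE RECEIVES ONLY '2' AND THE REST IS IGNORED. */
INPUT STARTING :$18. DESTINATION $ DISTANCE $;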
DATALINES;
NY,LA,2,446.32
DALLAS,SD,1,181.13
SF,NY,2,463.84
LV,Atlanta,4,403.12
;
RUN;

DATA E_3;
INFILE DATALINES DSD;
/* TWO CONSECUTIVE COMMAS (,,) ARE READ AS A MISSING VALUE */
INPUT STARTING :$18. DESTINATION $ DISTANCE $;
DATALINES;
NY,,2,446.32
DALLAS,SD,,181.13
SF,NY,2,463.84
LV,Atlanta,4,403.12
;
RUN;
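IN E_3 THE COMMA INSIDE 2,446.32 IS READ AS A DELIMITER. ONE WAY AROUND THIS, SKETCHED BELOW ASSUMING THE RAW DATA CAN PUT QUOTES AROUND SUCH VALUES, IS TO QUOTE THE VALUE: WITH DSD, A DELIMITER INSIDE QUOTATION MARKS IS KEPT AS PART OF THE VALUE AND THE QUOTES ARE REMOVED. THE COMMA10. INFORMAT THEN DROPS THE COMMA SO DISTANCE IS STORED AS A NUMBER. THE DATASET NAME E_3Q IS HYPOTHETICAL.

```sas
/* SKETCH: WITH DSD, A QUOTED VALUE MAY CONTAIN THE DELIMITER */
DATA E_3Q;
   INFILE DATALINES DSD;
   INPUT STARTING :$18. DESTINATION :$18. DISTANCE :COMMA10.;
   DATALINES;
NY,LA,"2,446.32"
DALLAS,SD,"1,181.13"
;
RUN;
```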

DLM= (DELIMITER=): USE DLM= WHEN THE VALUES ARE SEPARATED BY A
CHARACTER OTHER THAN A BLANK, WHICH IS THE DEFAULT DELIMITER FOR
LIST INPUT. THE DELIMITER MUST BE ENCLOSED IN QUOTES.

DATA E_4;
INFILE DATALINES DLM='^';
INPUT STARTING :$18. DESTINATION $ DISTANCE $;
DATALINES;
NY^LA^2446.32
DALLAS^SD^1181.13
SF^NY^2463.84
LV^Atlanta^4,403.12
;
RUN;

DATA E_5;
INFILE "C:\Users\RICKY\Desktop\BQ\_DM.TXT" DLM='%';
INPUT NAME $ AGE $ SEX $ RACE $ HEIGHT WEIGHT;
RUN;

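DLMSTR=: DLMSTR= TREATS A WHOLE STRING OF CHARACTERS AS ONE
DELIMITER, WHEREAS DLM='ABC' TREATS EACH OF A, B AND C AS A SEPARATE
DELIMITER. WHEN THE DELIMITER IS A SINGLE LETTER, SUCH AS 'z' BELOW,
DLM= IS ENOUGH. DELIMITERS ARE CASE-SENSITIVE: 'z' DOES NOT MATCH 'Z'.
DATA E_6;
INFILE DATALINES DLM='z';
INPUT STARTING :$18. DESTINATION $ DISTANCE $;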
DATALINES;
NYzLAz2446.32
DALLASzSDz1181.13
SFzNYz2463.84
LVzAtlantaz4,403.12
;
RUN;

DLMSOPT='i': WHEN THE DELIMITER STRING MAY APPEAR IN UPPER OR LOWER
CASE, ADD DLMSOPT='i' SO THAT DLMSTR= IGNORES CASE.
DATA E_7;
INFILE DATALINES DLMSTR='zOPp' DLMSOPT='i';
INPUT STARTING :$18. DESTINATION $ DISTANCE $;
DATALINES;
NYzOPpLAzOPp2446.32
DALLASzOPpSDzOPp1181.13
SFzOPpNYzOPp2463.84
LVzOPpAtlantazOPp4,403.12
;
RUN;
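IN E_7 EVERY DELIMITER IS WRITTEN EXACTLY AS zOPp, SO THE CASE OPTION IS NOT REALLY TESTED. IN THE SKETCH BELOW (HYPOTHETICAL DATASET E_7I) THE SAME DELIMITER APPEARS AS zOPp, ZOPP, zopp AND zOpP; WITH DLMSTR= AND DLMSOPT='i' ALL OF THEM SHOULD BE TREATED AS ONE DELIMITER.

```sas
/* SKETCH: ONE MULTI-CHARACTER DELIMITER, MATCHED WITHOUT REGARD TO CASE */
DATA E_7I;
   INFILE DATALINES DLMSTR='zOPp' DLMSOPT='i';   /* 'i' = IGNORE CASE */
   INPUT STARTING :$18. DESTINATION :$18. DISTANCE;
   DATALINES;
NYzOPpLAZOPP2446.32
DALLASzoppSDzOpP1181.13
;
RUN;
```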

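MISSOVER: BY DEFAULT, WHEN A LINE RUNS OUT OF VALUES BEFORE ALL THE
VARIABLES ARE READ, SAS MOVES TO THE NEXT LINE TO FIND THEM.
MISSOVER STOPS THIS AND SETS THE REMAINING VARIABLES TO MISSING. IN
THE EXAMPLE BELOW, RACE IS MISSING FOR THE SECOND OBSERVATION.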

DATA E_5;
INFILE DATALINES MISSOVER;
INPUT SUBJID AGE GENDER $ RACE $;
DATALINES;
101 23 MALE WHITE
102 56 MALE
103 27 FEMALE ASIAN
;
RUN;

DATA E_6;
INFILE DATALINES MISSOVER;
INPUT SUBJID AGE ETHNIC $ RACE $ HEIGHT WEIGHT;
DATALINES;
101 23 Asian White 5.6 67
102 45 Asian
103 20 INDIAN Black 4.5 56
104 40 Asian Black
105 23 CANADIAN
106 34 Asian Black 7.3 87
;
RUN;
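FOR COMPARISON, HERE IS A SKETCH OF THE FIRST MISSOVER EXAMPLE WITHOUT MISSOVER (THE DATASET NAME E_5F IS HYPOTHETICAL). WHEN LINE 102 RUNS OUT OF VALUES, SAS GOES TO THE NEXT LINE AND READS 103 AS RACE, SO SUBJECT 103 IS LOST AND ONLY TWO OBSERVATIONS ARE CREATED; THE LOG NOTES THAT SAS WENT TO A NEW LINE.

```sas
/* SKETCH: THE DEFAULT BEHAVIOUR (FLOWOVER) WHEN A LINE IS SHORT */
DATA E_5F;
   INFILE DATALINES;          /* NO MISSOVER */
   INPUT SUBJID AGE GENDER $ RACE $;
   DATALINES;
101 23 MALE WHITE
102 56 MALE
103 27 FEMALE ASIAN
;
RUN;
```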

FIRSTOBS=n: SAS STARTS READING AT RECORD n. IN THE EXAMPLE BELOW,
FIRSTOBS=2 SKIPS THE FIRST DATA LINE, SO READING BEGINS WITH
SUBJECT 102.

DATA E_7;
INFILE DATALINES FIRSTOBS=2 MISSOVER;
INPUT SUBJID AGE ETHNIC $ RACE $ HEIGHT WEIGHT;
DATALINES;
101 23 Asian White 5.6 67
102 45 Asian
103 20 INDIAN Black 4.5 56
104 40 Asian Black
105 23 CANADIAN
106 34 Asian Black 7.3 87
;
RUN;

OBS=n: SAS STOPS READING AFTER RECORD n, SO OBS=2 BELOW READS ONLY
THE FIRST TWO DATA LINES.
DATA E_8;
INFILE DATALINES OBS=2 MISSOVER;
INPUT SUBJID AGE ETHNIC $ RACE $ HEIGHT WEIGHT;
DATALINES;
101 23 Asian White 5.6 67
102 45 Asian
103 20 INDIAN Black 4.5 56
104 40 Asian Black
105 23 CANADIAN
106 34 Asian Black 7.3 87
;
RUN;
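FIRSTOBS= AND OBS= CAN BE COMBINED. OBS= IS THE NUMBER OF THE LAST RECORD TO READ, NOT A COUNT OF RECORDS, SO THIS SKETCH (HYPOTHETICAL DATASET E_8B) READS RECORDS 2 THROUGH 4, I.E. SUBJECTS 102, 103 AND 104.

```sas
/* SKETCH: READ ONLY RECORDS 2 TO 4 */
DATA E_8B;
   INFILE DATALINES FIRSTOBS=2 OBS=4 MISSOVER;
   INPUT SUBJID AGE ETHNIC $ RACE $ HEIGHT WEIGHT;
   DATALINES;
101 23 Asian White 5.6 67
102 45 Asian
103 20 INDIAN Black 4.5 56
104 40 Asian Black
105 23 CANADIAN
106 34 Asian Black 7.3 87
;
RUN;
```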
INPUT STATEMENT

THE INPUT STATEMENT DESCRIBES HOW EACH RAW DATA RECORD IS READ: THE
VARIABLE NAMES, WHETHER EACH VARIABLE IS CHARACTER ($) OR NUMERIC,
AND WHERE OR HOW EACH VALUE IS READ (COLUMNS, INFORMATS, ETC.).

COLUMN INPUT METHOD: THE STARTING AND ENDING COLUMNS OF EACH
VARIABLE ARE SPECIFIED.

data DETAILS;
infile cards;
input empid 1-8 name $ 9-26 des $ 27-34;
cards;
1001    SAM AMMAR         Tester
1002    JOSH SAM PANDA    Analyst
;
RUN;

NAMED INPUT METHOD: EACH VALUE IN THE RAW DATA IS PRECEDED BY ITS
VARIABLE NAME AND AN EQUAL SIGN (NAME=VALUE), AND THE INPUT
STATEMENT LISTS THE VARIABLES AS NAME=.
data emp1;
infile cards;
length name $ 18;
input number= name= $ age= ;
cards;
number= 9846590 name=RAM Age= 27
number= 9988642 name= VAZ age= 40
number= 9006591 name=Satyam Age= 34
number= 9988642 name=courier age= 40
;
RUN;

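SIMPLE LIST INPUT METHOD: THE VALUES ARE SEPARATED BY AT LEAST ONE
BLANK, AND ONLY THE VARIABLE NAMES ARE LISTED, WITH $ AFTER EACH
CHARACTER VARIABLE. NO COLUMNS OR LENGTHS ARE GIVEN; CHARACTER
VARIABLES GET A DEFAULT LENGTH OF 8.

DATA EMP2;
INFILE DATALINES;
INPUT ID NAME $ SALARY;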
DATALINES;
101 BEZ 15000
102 GEZAJ 16450
103 RUBEN 14587
;
RUN;
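WITH SIMPLE LIST INPUT, A CHARACTER VARIABLE GETS A DEFAULT LENGTH OF 8, SO LONGER VALUES ARE CUT OFF. A LENGTH STATEMENT BEFORE THE INPUT STATEMENT AVOIDS THIS. THE DATASET NAME EMP2L AND THE VALUE MAXIMILIANO ARE MADE UP FOR ILLUSTRATION.

```sas
/* SKETCH: GIVE A LIST-INPUT CHARACTER VARIABLE MORE THAN 8 CHARACTERS */
DATA EMP2L;
   LENGTH NAME $ 12;           /* WITHOUT THIS, NAME WOULD BE CUT TO 8 CHARACTERS */
   INFILE DATALINES;
   INPUT ID NAME $ SALARY;
   DATALINES;
104 MAXIMILIANO 17250
;
RUN;
```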
& (AMPERSAND) MODIFIER: IT LETS A CHARACTER VALUE CONTAIN SINGLE
EMBEDDED BLANKS; THE VALUE ENDS AT TWO OR MORE CONSECUTIVE BLANKS, SO
AT LEAST TWO BLANKS MUST SEPARATE IT FROM THE NEXT VALUE. WHEN AN
INFORMAT SUCH AS $8. IS ALSO GIVEN, SAS STOPS AT WHICHEVER COMES
FIRST: THE TWO BLANKS, THE LENGTH OF THE VARIABLE, OR THE END OF THE
LINE.

DATA EMP3;
INFILE DATALINES;
INPUT ID 3. NAME &$8. SALARY;
DATALINES;
101 BEZ  15000
102 GEZAJ  16450
103 RUBEN  14587
;
RUN;

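#n LINE POINTER: IT MOVES THE INPUT POINTER TO LINE n OF THE
CURRENT GROUP OF LINES, SO ONE OBSERVATION CAN BE BUILT FROM SEVERAL
RAW DATA LINES. BELOW, EACH EMPLOYEE'S ID, NAME AND SALARY ARE ON
THREE SEPARATE LINES.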

DATA EMP6;
INFILE DATALINES;
INPUT #1 ID 3.
#2 NAME &$8.
#3 SALARY 6.;
DATALINES;
101
BEZ
15000
102
GEZAJ
16450
103
RUBEN
14587
;
RUN;
: (COLON) MODIFIER: IT READS A VALUE WITH THE GIVEN INFORMAT BUT
STOPS AT THE NEXT DELIMITER (A BLANK BY DEFAULT), SO THE VALUES DO
NOT HAVE TO BE LINED UP IN COLUMNS. IT IS ALSO USED WITH DSD AND
DLM= DATA.
DATA EMP4;
INFILE DATALINES;
INPUT ID :3. NAME :$8. SALARY 6.;
DATALINES;
101 BEZ 15000
102 GEZAJ 16450
103 RUBEN 14587
;
RUN;

DATA EMP5;
INFILE DATALINES DSD;
INPUT ID :3. NAME :$8. SALARY 6.;
DATALINES;
101,BEZ,15000
102,GEZAJ,16450
103,RUBEN,14587
;
RUN;

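SINGLE TRAILING @: AN @ AT THE END OF AN INPUT STATEMENT HOLDS THE
CURRENT RECORD, SO THE NEXT INPUT STATEMENT IN THE SAME ITERATION
READS THE SAME LINE AGAIN. THIS LETS YOU READ ONE FIELD, TEST IT,
AND ONLY THEN READ THE REST. BELOW, ONLY THE RECORDS WITH SEX='M'
ARE KEPT.

DATA EMP6;
INFILE DATALINES;
INPUT SEX $ 18-19 @;
IF SEX='M';
INPUT ID 1-3 NAME $ 5-11 SALARY 12-17 SEX $ 18-19;
DATALINES;
101 BEZ    15000 M
102 GEZAJ  16450 F
103 RUBEN  14587 M
;
RUN;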

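DOUBLE TRAILING @@: @@ HOLDS THE RECORD ACROSS ITERATIONS OF THE
DATA STEP, SO SEVERAL OBSERVATIONS CAN BE READ FROM ONE LINE. SAS
KEEPS READING THE SAME LINE UNTIL IT RUNS OUT OF VALUES. BELOW, EACH
LINE HOLDS TWO EMPLOYEES.
DATA E_9;
INFILE DATALINES;
INPUT SUBJID NAME $ PROF $ AGE WEIGHT SEX $ @@;
DATALINES;
1001 SAM Tester 25 65 M 1002 PANDA Analyst 21 57 M
1003 JAMES ENG 31 69 M 1004 EMMA TEACHER 20 49 F
;
RUN;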
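A SMALLER SKETCH OF @@ (HYPOTHETICAL DATASET PAIRS): THREE PAIRS OF VALUES ON THE FIRST LINE AND ONE PAIR ON THE SECOND GIVE FOUR OBSERVATIONS.

```sas
/* SKETCH: SEVERAL OBSERVATIONS PER DATA LINE */
DATA PAIRS;
   INPUT X Y @@;    /* KEEP READING THE SAME LINE UNTIL IT IS EMPTY */
   DATALINES;
1 2 3 4 5 6
7 8
;
RUN;
```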
FORMATTED INPUT METHODS
1. (N.) THE INFORMAT WIDTH N GIVES THE NUMBER OF COLUMNS TO READ.
2. (+N) MOVES THE POINTER N COLUMNS TO THE RIGHT, I.E. SKIPS N COLUMNS.
data empdet;
infile cards;
input empid 5. +3 name $18. des $10.;
cards;
1001    SAM AMMAR         Tester
1002    JOSH SAM PANDA    Analyst
;
run;
POSITION INPUT METHOD
@N M. : @N MOVES THE POINTER TO COLUMN N (THE START POSITION), AND
THE INFORMAT M. GIVES THE NUMBER OF COLUMNS TO READ FROM THERE.
DATA E_;
INFILE CARDS;
INPUT @1 EPID 4. @9 NAME $18. @27 PROF $8.;
cards;
1001    SAM AMMAR         Tester
1002    JOSH SAM PANDA    Analyst
;
RUN;

MIXED INPUT METHOD: DIFFERENT INPUT STYLES (COLUMN, FORMATTED AND
LIST INPUT) CAN BE COMBINED IN ONE INPUT STATEMENT.
DATA E_1;
INFILE CARDS;
INPUT @1 EMPID 4. NAME $ 9-23 PROF $ 27-34;
cards;
1001    SAM AMMAR         Tester
1002    JOSH SAM PANDA    Analyst
;
RUN;
FORMAT AND INFORMAT

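FORMAT: A FORMAT CONTROLS HOW A VARIABLE'S VALUES ARE DISPLAYED.
INFORMAT: AN INFORMAT CONTROLS HOW RAW DATA IS READ INTO A VARIABLE.

FORMAT AND INFORMAT NAMES ALWAYS CONTAIN A PERIOD (.), FOR EXAMPLE
DATE9. OR $10. WRITE AN INFORMAT STATEMENT BEFORE THE INPUT STATEMENT
THAT READS THE VARIABLE.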
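THE DIFFERENCE CAN BE SEEN WITH THE INPUT AND PUT FUNCTIONS. IN THIS SKETCH (HYPOTHETICAL DATASET FMT_DEMO) THE INFORMAT DDMMYY10. TURNS THE TEXT 29/01/2010 INTO THE SAS DATE VALUE 18291 (DAYS SINCE JANUARY 1, 1960), AND THE FORMAT DATE9. DISPLAYS THAT NUMBER AS 29JAN2010. THE FORMAT DOES NOT CHANGE THE STORED VALUE.

```sas
/* SKETCH: INFORMAT = TEXT TO VALUE, FORMAT = VALUE TO TEXT */
DATA FMT_DEMO;
   D = INPUT('29/01/2010', DDMMYY10.);   /* INFORMAT: TEXT -> NUMBER (18291) */
   SHOWN = PUT(D, DATE9.);               /* FORMAT: NUMBER -> TEXT ('29JAN2010') */
RUN;
```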

INFORMATS

DDMMYY10.: THIS INFORMAT READS DAY-MONTH-YEAR DATES WITH A
FOUR-DIGIT YEAR WHOSE PARTS ARE SEPARATED BY '-', '/', '.' OR ':'.

DATA E_10;
INFILE DATALINES;
INFORMAT JDATE RDATE DDMMYY10.;
INPUT EMP_ID DES :$10. JDATE RDATE;
DATALINES;
101 BAKER 10/11/2014 09/11/2019
102 BANKER 01/05/2013 09/12/2020
103 BROKER 14/05/2011 04/05/2018
;
RUN;

DATE9.: THIS INFORMAT READS DATES WRITTEN AS DDMMMYYYY, FOR EXAMPLE
10NOV2014.

DATA E_13;
INFILE DATALINES;
INFORMAT JDATE RDATE DATE9.;
INPUT EMP_ID DES :$10. JDATE RDATE;
DATALINES;
101 BAKER 10NOV2014 09NOV2019
102 BANKER 01MAY2013 09DEC2020
103 BROKER 14MAY2011 04MAY2018
;
RUN;

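DATE7.: THIS INFORMAT READS DATES WRITTEN AS DDMMMYY, WITH A
TWO-DIGIT YEAR, FOR EXAMPLE 10NOV14. THE CENTURY OF A TWO-DIGIT YEAR
IS DECIDED BY THE YEARCUTOFF= SYSTEM OPTION.
DATA E_14;
INFILE DATALINES;
INFORMAT JDATE RDATE DATE7.;
INPUT EMP_ID DES :$10. JDATE RDATE;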
DATALINES;
101 BAKER 10NOV14 09NOV19
102 BANKER 01MAY13 09DEC20
103 BROKER 14MAY11 04MAY18
;
RUN;

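ANYDTDTE.: THIS INFORMAT ACCEPTS DATES WRITTEN IN MANY DIFFERENT
FORMS, WHICH HELPS WHEN THE FORM IS UNKNOWN OR MIXED. FOR AMBIGUOUS
VALUES SUCH AS 10/11/2014, THE DATESTYLE= SYSTEM OPTION DECIDES
WHETHER THEY ARE READ AS MONTH/DAY/YEAR OR DAY/MONTH/YEAR.

DATA E_15;
INFILE DATALINES;
INFORMAT JDATE RDATE ANYDTDTE.;
INPUT EMP_ID DES :$10. JDATE RDATE;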
DATALINES;
101 BAKER 10NOV14 09NOV19
102 BANKER 01MAY13 09DEC20
103 BROKER 14MAY11 04MAY18
;
RUN;

OPTIONS DATESTYLE=DMY; /* READ nn/nn/yyyy VALUES AS DAY/MONTH/YEAR */
DATA E_10;
INFILE DATALINES;
INFORMAT JDATE RDATE ANYDTDTE.;
INPUT EMP_ID DES :$10. JDATE RDATE;
DATALINES;
101 BAKER 10/11/2014 09/11/2019
102 BANKER 01/05/2013 09/12/2020
103 BROKER 14/05/2011 04/05/2018
;
RUN;

DATETIME.: THIS INFORMAT READS A DATE AND A TIME WRITTEN TOGETHER,
FOR EXAMPLE 10NOV14:11:15:46. THE STORED VALUE IS THE NUMBER OF
SECONDS SINCE MIDNIGHT, JANUARY 1, 1960.

DATA E_16;
INFILE DATALINES;
INFORMAT JDATE RDATE DATETIME.;
INPUT EMP_ID DES :$10. JDATE RDATE;
DATALINES;
101 BAKER 10NOV14:11:15:46 09NOV19:10:41:13
102 BANKER 01MAY13:02:08:16 09DEC20:01:06:52
103 BROKER 14MAY11:11:02:18 04MAY18:08:19:48
;
RUN;
TIME.: THIS INFORMAT READS TIMES SUCH AS 10:12:18 AND STORES THE
NUMBER OF SECONDS SINCE MIDNIGHT (00:00:00).
DATA E_18;
INFILE DATALINES;
INFORMAT TIME TIME.;
INPUT EMP_ID DES :$10. TIME;
DATALINES;
101 BAKER 10:12:18
102 BANKER 01:15:19
103 BROKER 08:56:45
;
RUN;
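A QUICK CHECK OF WHAT IS STORED, SKETCHED WITH THE HYPOTHETICAL DATASET DT_DEMO: ONE MINUTE AFTER MIDNIGHT ON JANUARY 1, 1960 IS STORED AS 60 SECONDS, AND THE TIME 01:00:00 IS STORED AS 3600 SECONDS.

```sas
/* SKETCH: DATETIME AND TIME VALUES ARE COUNTS OF SECONDS */
DATA DT_DEMO;
   DT = INPUT('01JAN1960:00:01:00', DATETIME18.);  /* SECONDS SINCE 01JAN1960:00:00:00 */
   T  = INPUT('01:00:00', TIME8.);                 /* SECONDS SINCE MIDNIGHT */
RUN;
```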
INFORMATS FOR DATE AND TIME

INFORMAT     EXAMPLE RAW VALUE
DATE7.       29JAN10
DATE9.       29JAN2010
DDMMYY8.     29/01/10, 29-01-10, 29:01:10
DDMMYY10.    29/01/2010, 29-01-2010, 29:01:2010
TIME.        10:30:20
DATETIME.    29JAN10:10:30:20
ANYDTDTE.    (USE WHEN THE FORM OF THE DATE IS UNKNOWN)

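FORMAT

IN THE EXAMPLE BELOW THE RAW VALUES ARE SAS DATE VALUES (NUMBER OF
DAYS SINCE JANUARY 1, 1960); THE FORMAT STATEMENT ONLY CHANGES HOW
THEY ARE DISPLAYED.

DATA E_13;
INFILE DATALINES;
INPUT EMP_ID DES :$10. JDATE RDATE;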
FORMAT JDATE RDATE DATE9.;
DATALINES;
101 BAKER 5693 7412
102 BANKER 7856 9245
103 BROKER 7844 8546
;
RUN;

FORMATS FOR DATE AND TIME
(EXAMPLE VALUE: 29 JANUARY 2010 AT 10:30:20)

FORMAT        DISPLAYED AS
DATE7.        29JAN10
DATE9.        29JAN2010
DDMMYY8.      29/01/10
DDMMYY10.     29/01/2010
TIME5.        10:30
TIME8.        10:30:20
DATETIME.     29JAN10:10:30:20
DATETIME20.   29JAN2010:10:30:20
WORDDATE20.   January 29, 2010
WEEKDATE30.   Friday, January 29, 2010
YYMMDDN8.     20100129
YYMMDDD8.     10-01-29
YYMMDDD10.    2010-01-29
YYMMDDS8.     10/01/29
YYMMDDS10.    2010/01/29
YYMMDDC8.     10:01:29
YYMMDDC10.    2010:01:29

MISCELLANEOUS FORMATS

PERCENTw.d: THIS INFORMAT READS VALUES WRITTEN WITH A PERCENT SIGN
AND DIVIDES THEM BY 100, SO 30% IS STORED AS 0.3.
DATA E_12;
INFILE DATALINES;
INFORMAT PERCENT PERCENT3.;
INPUT EMP_ID DES :$10. PERCENT;
DATALINES;
101 BAKER 30%
102 BANKER 41%
103 BROKER 14%
;
RUN;

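OTHER FORMATS AND INFORMATS

COMMAw.d: AS AN INFORMAT IT READS NUMBERS THAT CONTAIN COMMAS (IT
ALSO REMOVES DOLLAR SIGNS); AS A FORMAT IT WRITES NUMBERS WITH
COMMAS.

DOLLARw.d: AS AN INFORMAT IT READS VALUES WITH A DOLLAR SIGN AND
COMMAS; AS A FORMAT IT WRITES THE VALUE WITH A LEADING DOLLAR SIGN
(AND COMMAS).

w IS THE TOTAL WIDTH OF THE VALUE AND d IS THE NUMBER OF DIGITS
AFTER THE DECIMAL POINT. FOR EXAMPLE, IN DOLLAR5.1, 5 IS THE TOTAL
WIDTH INCLUDING THE DOLLAR SIGN AND THE DECIMAL POINT, AND .1 MEANS
ONE DIGIT AFTER THE DECIMAL POINT. WHEN READING, d IS IGNORED IF THE
RAW VALUE ALREADY CONTAINS A DECIMAL POINT.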
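A SKETCH OF BOTH DIRECTIONS (THE DATASET NAME MONEY IS HYPOTHETICAL): COMMA10. READS $1,234.5 AS 1234.5, AND THE FORMAT DOLLAR10.2 WRITES IT BACK AS $1,234.50, RIGHT-ALIGNED IN 10 COLUMNS.

```sas
/* SKETCH: COMMA INFORMAT WHILE READING, DOLLAR FORMAT WHEN WRITING */
DATA MONEY;
   INPUT AMOUNT :COMMA10.;                  /* REMOVES $ AND COMMAS WHILE READING */
   SHOWN = PUT(AMOUNT, DOLLAR10.2);         /* ADDS $, COMMAS AND 2 DECIMALS */
   DATALINES;
$1,234.5
2,000
;
RUN;
```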

DATA E_13;
INFILE DATALINES;
INFORMAT SALARY DOLLAR5.1;
INPUT EMP_ID DES :$10. SALARY;
DATALINES;
101 BAKER $30.1
102 BANKER $41.0
103 BROKER $14.0
;
RUN;
DATA E_14;
INFILE DATALINES;
INFORMAT SALARY DOLLAR6.2;
INPUT EMP_ID DES :$10. SALARY;
FORMAT SALARY DOLLAR6.2;
DATALINES;
101 BAKER $30.0
102 BANKER $41.0
103 BROKER $14.0
;
RUN;

DATA E_A;
INFILE DATALINES;
INFORMAT SALARY DOLLAR7.;
INPUT EMP_ID DES :$10. SALARY;
FORMAT SALARY DOLLAR7.;
DATALINES;
101 BAKER $30,212
102 BANKER $40,153
103 BROKER $17,123
;
RUN;

DATA E_15;
INFILE DATALINES;
INFORMAT SALARY COMMA6.;
INPUT EMP_ID DES :$10. SALARY;
DATALINES;
101 BAKER 300,00
102 BANKER 410,00
103 BROKER 14,00
;
RUN;

DATA E_16;
INFILE DATALINES;
INFORMAT SALARY COMMA6.;
INPUT EMP_ID DES :$10. SALARY;
FORMAT SALARY COMMA6.;
DATALINES;
101 BAKER 300,00
102 BANKER 410,00
103 BROKER 14,00
;
RUN;
ATTRIB STATEMENT

THE ATTRIB STATEMENT SETS SEVERAL ATTRIBUTES OF A VARIABLE (SUCH AS
ITS FORMAT, INFORMAT, LENGTH AND LABEL) IN ONE STATEMENT, AND ONE
ATTRIB STATEMENT CAN COVER SEVERAL VARIABLES.

DATA E_;
INFILE CARDS;
ATTRIB DOB INFORMAT=DDMMYY8. FORMAT=DATE9.
       SALARY INFORMAT=DOLLAR7. FORMAT=DOLLAR7.;
INPUT EPID :5. NAME :$12. PROF :$8. DOB SALARY;
cards;
1001 SAMAMMAR Tester 16/10/19 $15,267
1002 JOSHSAMPANDA Analyst 12/01/14 $16,267
;
RUN;

PROC FORMAT

PROC FORMAT LETS YOU DEFINE YOUR OWN FORMATS AND INFORMATS, IN
ADDITION TO THE ONES SAS PROVIDES.

DATA GROUP;
INPUT TEAM $ COLOR $ SEX $;
DATALINES;
MARSH R F
KING B M
LION G F
BEAR Y M
;
RUN;

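VALUE STATEMENT: IT DEFINES A FORMAT, WHICH CHANGES HOW STORED
VALUES ARE DISPLAYED (THE DATA ITSELF IS NOT CHANGED). THE NAME OF A
CHARACTER FORMAT STARTS WITH $.

PROC FORMAT;
VALUE $COL 'R'='RED'
'B'='BLUE'
'G'='GREEN'
'Y'='YELLOW'
;
VALUE $GEN 'F'='FEMALE'
'M'='MALE'
;
RUN;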

DATA GROUP1;
INPUT TEAM $ COLOR $ SEX $;
FORMAT COLOR $COL. SEX $GEN.;
DATALINES;
MARSH R F
KING B M
LION G F
BEAR Y M
;
RUN;
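A VALUE STATEMENT CAN ALSO MAP NUMERIC RANGES TO LABELS. IN THIS SKETCH THE FORMAT NAME AGEGRP. AND ITS LABELS ARE HYPOTHETICAL; LOW-12 MEANS 12 OR LESS, AND THE PUT FUNCTION RETURNS THE FORMATTED TEXT AS A CHARACTER VALUE (STRIP REMOVES ANY ALIGNMENT BLANKS).

```sas
/* SKETCH: A USER-DEFINED FORMAT FOR NUMERIC RANGES */
PROC FORMAT;
   VALUE AGEGRP LOW-12  = 'CHILD'
                13-19   = 'TEEN'
                20-HIGH = 'ADULT';
RUN;

DATA AGE_DEMO;
   INPUT AGE;
   GROUP = STRIP(PUT(AGE, AGEGRP.));   /* FORMATTED LABEL AS TEXT */
   DATALINES;
12
13
25
;
RUN;
```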

INVALUE STATEMENT: IT DEFINES AN INFORMAT, WHICH CONVERTS THE RAW
VALUES AS THEY ARE READ (HERE R IS STORED AS RED).

PROC FORMAT;
INVALUE $COL 'R'='RED'
'B'='BLUE'
'G'='GREEN'
'Y'='YELLOW'
;
INVALUE $GEN 'F'='FEMALE'
'M'='MALE'
;
RUN;
DATA GROUP2;
LENGTH TEAM $ 8 COLOR SEX $ 6;
INFORMAT COLOR $COL. SEX $GEN.;
INPUT TEAM $ COLOR $ SEX $;
DATALINES;
MARSH R F
KING B M
LION G F
BEAR Y M
;
RUN;
CONDITIONAL STATEMENTS

CONDITIONAL STATEMENTS LET A PROGRAM ACT ONLY ON THE OBSERVATIONS
(OR RUN ONLY THE STATEMENTS) FOR WHICH A SPECIFIED CONDITION IS
TRUE. THEY WORK MUCH LIKE CONDITIONAL STATEMENTS IN OTHER
PROGRAMMING LANGUAGES.

CONDITIONAL STATEMENTS ARE OF THREE TYPES:
1. IF
2. IF-ELSE
3. WHERE

OPERATORS USED IN CONDITIONS:
1. AND
2. OR
3. IN
4. NOT
5. NOT IN
6. BETWEEN
7. LIKE
8. CONTAINS

DATA KEYS1;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
ALISHA M FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALISHA F FRENCH 18 22 66 74
ALBERTA F ENGLISH 14 61 162 71
;
RUN;
THE WHERE STATEMENT SELECTS ONLY THE OBSERVATIONS THAT MEET A
CONDITION.

PROC PRINT DATA=KEYS1;
WHERE NAME='ABDUL';
RUN;

PROC PRINT DATA=KEYS1;
WHERE SUBJECT='FRENCH';
RUN;

AND OPERATOR: SELECTS THE OBSERVATIONS THAT MEET ALL OF THE
CONDITIONS.

PROC PRINT DATA=KEYS1;
WHERE SUBJECT='FRENCH' AND SEX='F';
RUN;

OR OPERATOR: SELECTS THE OBSERVATIONS THAT MEET AT LEAST ONE OF THE
CONDITIONS.

PROC PRINT DATA=KEYS1;
WHERE SUBJECT='FRENCH' OR SEX='F';
RUN;

IN OPERATOR: SELECTS THE OBSERVATIONS WHOSE VALUE MATCHES ANY VALUE
IN A LIST. NOT IN SELECTS THOSE WHOSE VALUE MATCHES NONE OF THEM.

PROC PRINT DATA=KEYS1;
WHERE SUBJECT IN('FRENCH','ENGLISH');
RUN;

PROC PRINT DATA=KEYS1;
WHERE SUBJECT NOT IN('FRENCH','ENGLISH');
RUN;

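BETWEEN-AND OPERATOR: SELECTS A RANGE OF VALUES. BOTH ENDS ARE
INCLUDED, SO AGES 15 AND 18 ARE SELECTED BELOW.

PROC PRINT DATA=KEYS1;
WHERE AGE BETWEEN 15 AND 18;
RUN;

LIKE OPERATOR: SELECTS CHARACTER VALUES THAT MATCH A PATTERN. %
STANDS FOR ANY NUMBER OF CHARACTERS AND _ FOR EXACTLY ONE CHARACTER,
SO 'A%' SELECTS NAMES THAT START WITH A.

PROC PRINT DATA=KEYS1;
WHERE NAME LIKE 'A%';
RUN;

CONTAINS OPERATOR: SELECTS CHARACTER VALUES THAT CONTAIN THE GIVEN
CHARACTERS ANYWHERE IN THE VALUE.

PROC PRINT DATA=KEYS1;
WHERE NAME CONTAINS 'E';
RUN;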

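IF STATEMENT: A SUBSETTING IF (IF condition;) KEEPS ONLY THE
OBSERVATIONS FOR WHICH THE CONDITION IS TRUE. IF-THEN RUNS A
STATEMENT ONLY WHEN THE CONDITION IS TRUE; IN THE SECOND AND THIRD
EXAMPLES ALL OBSERVATIONS ARE KEPT AND SALARY IS MISSING WHERE THE
CONDITION IS FALSE. IN(15:18) MEANS 15, 16, 17 OR 18.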

DATA E2_;
SET KEYS1;
IF AGE=16;
RUN;

DATA E1_;
SET KEYS1;
IF AGE=15 THEN SALARY=6000;
RUN;

DATA E3_;
SET KEYS1;
IF AGE IN(15:18) THEN SALARY=16000;
RUN;

IF-ELSE: IF THE IF CONDITION IS FALSE, THE STATEMENT AFTER ELSE IS
EXECUTED INSTEAD.

DATA E4_;
SET KEYS1;
IF AGE IN(11:15) THEN SALARY=10000;
ELSE SALARY=15000;
RUN;

IF-ELSE IF: THESE STATEMENTS HANDLE SEVERAL CONDITIONS. SAS CHECKS
THEM IN ORDER AND STOPS AT THE FIRST ONE THAT IS TRUE; THE LATER
CONDITIONS ARE NOT CHECKED. AN OBSERVATION THAT MEETS NONE OF THEM
(HERE, AN AGE OUTSIDE 11-19) GETS A MISSING SALARY.

DATA E5_;
SET KEYS1;
IF AGE IN(11:13) THEN SALARY=16000;
ELSE IF AGE IN(14:16) THEN SALARY=20000;
ELSE IF AGE IN(17:19) THEN SALARY=35000;
RUN;
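BECAUSE SAS STOPS AT THE FIRST TRUE CONDITION, THE ORDER OF ELSE IF CONDITIONS MATTERS WHEN THEY OVERLAP. IN THIS SKETCH (HYPOTHETICAL DATASET BANDS) EVERY AGE OF 11 OR MORE MEETS THE FIRST CONDITION, SO THE SECOND BRANCH IS NEVER REACHED, EVEN FOR AGE 16.

```sas
/* SKETCH: OVERLAPPING CONDITIONS - THE FIRST TRUE ONE WINS */
DATA BANDS;
   INPUT AGE;
   LENGTH BAND $ 6;
   IF AGE >= 11 THEN BAND = 'FIRST';        /* TRUE FOR 12 AND 16 */
   ELSE IF AGE >= 14 THEN BAND = 'SECOND';  /* NEVER REACHED FOR THESE AGES */
   DATALINES;
12
16
;
RUN;
```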
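GO TO STATEMENT: GO TO JUMPS TO A LABELED STATEMENT; HERE IT IS
USED TO IMITATE CASE BLOCKS, SO THAT WHEN A CONDITION IS TRUE THE
MATCHING CASE IS EXECUTED. RETURN ENDS THE CURRENT ITERATION AFTER
THE CASE HAS RUN, AND THE OBSERVATION IS STILL WRITTEN TO THE OUTPUT
DATASET.

DATA E6_;
LENGTH COLOR $7;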
SET KEYS1;
IF AGE>=11 AND AGE<=13 THEN GO TO CASE1;
ELSE IF AGE>=14 AND AGE<=15 THEN GO TO CASE2;
ELSE IF AGE>=16 AND AGE<=19 THEN GO TO CASE3;

CASE1: SALARY=5000;
COLOR='RED';
RETURN;
CASE2: SALARY=10000;
COLOR='PINK';
RETURN;
CASE3: SALARY=30000;
COLOR='YELLOW';
RETURN;
RUN;
ITERATIVE DO LOOPS

AN ITERATIVE DO LOOP IS THE SIMPLEST KIND OF LOOP AND IS WRITTEN
INSIDE A DATA STEP. IT RUNS A FIXED NUMBER OF TIMES: A LOOP DEFINED
TO RUN 5 TIMES RUNS EXACTLY 5 TIMES.

ITERATIVE DO LOOPS ARE USED FOR COUNTING AND FOR REPEATED
CALCULATIONS. IN THE EXAMPLES BELOW, THE OUTPUT STATEMENT INSIDE THE
LOOP WRITES AN OBSERVATION ON EVERY ITERATION; WITHOUT IT, ONLY ONE
OBSERVATION (WITH THE FINAL VALUES) WOULD BE WRITTEN. EVERY DO LOOP
MUST END WITH AN END STATEMENT.

DO LOOPS: DO LOOPS ARE OF 3 TYPES

1.DO LOOP
2.DO WHILE
3.DO UNTIL

DO LOOP:
DATA A1_;
DO I=1 TO 5;
OUTPUT;
END;
RUN;

NESTED DO LOOP: WRITING A DO LOOP INSIDE ANOTHER DO LOOP IS


CALLED NESTED DO LOOP.
DATA A2_;
DO I=1 TO 5;
DO A=1 TO 10;
TABLES= I * A;
OUTPUT;
END;
END;
RUN;

PROC SORT DATA=SUN;


BY MARKS;
RUN;
DO WHILE: THE LOOP REPEATS AS LONG AS THE CONDITION IS TRUE. BELOW
IT ADDS 1 THROUGH 5, SO SUM IS 15.
DATA SET1;
SUM=0;
R = 1;
DO WHILE(R<6);
SUM = SUM+R;
R+1;
END;
DROP R;
RUN;

DO UNTIL: THE LOOP REPEATS UNTIL THE CONDITION BECOMES TRUE. BELOW
IT ALSO GIVES SUM = 15.

DATA SET2;
SUM = 0;
R = 1;
DO UNTIL(R>5);
SUM = SUM+R;
R+1;
END;
RUN;

DIFFERENCE BETWEEN DO WHILE AND DO UNTIL

DO WHILE CHECKS THE CONDITION AT THE TOP OF THE LOOP, BEFORE EACH
ITERATION, SO THE BODY MAY NEVER RUN.
DO UNTIL CHECKS THE CONDITION AT THE BOTTOM OF THE LOOP, AFTER EACH
ITERATION, SO THE BODY ALWAYS RUNS AT LEAST ONCE.
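THE DIFFERENCE SHOWS UP WHEN THE CONDITION ALREADY STOPS THE LOOP BEFORE IT STARTS. IN THIS SKETCH (HYPOTHETICAL DATASET AND VARIABLE NAMES) X IS 10, SO THE DO WHILE BODY NEVER RUNS, WHILE THE DO UNTIL BODY RUNS ONCE BEFORE ITS CONDITION IS CHECKED.

```sas
/* SKETCH: TOP-TESTED (WHILE) VERSUS BOTTOM-TESTED (UNTIL) LOOPS */
DATA LOOP_COMPARE;
   X = 10;
   WHILE_RUNS = 0;
   DO WHILE (X < 5);          /* CHECKED FIRST: FALSE, SO THE BODY NEVER RUNS */
      WHILE_RUNS = WHILE_RUNS + 1;
   END;
   UNTIL_RUNS = 0;
   DO UNTIL (X >= 5);         /* CHECKED AFTER THE BODY: THE BODY RUNS ONCE */
      UNTIL_RUNS = UNTIL_RUNS + 1;
   END;
RUN;
```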
DM STATEMENT

DM STATEMENT: THE DM STATEMENT SUBMITS DISPLAY MANAGER COMMANDS FROM
A PROGRAM, SO IT CAN CONTROL THE SAS WINDOWS.

DM LOG 'CLEAR'; CLEARS THE LOG WINDOW.

DM OUT 'CLEAR'; CLEARS THE OUTPUT WINDOW.

DM EDIT 'CLEAR'; CLEARS THE EDITOR WINDOW.

DM LOG "FILE 'path'"; SAVES THE CONTENTS OF THE LOG WINDOW TO THE
SPECIFIED FILE. FOR EXAMPLE:
DM LOG "FILE 'C:\USERS\RIKY\DESKTOP\NEW TEXT DOCUMENT.TXT'";

DM EDIT "FILE 'path'"; SAVES THE CONTENTS OF THE EDITOR WINDOW TO
THE SPECIFIED FILE. FOR EXAMPLE:
DM EDIT "FILE 'C:\USERS\RIKY\DESKTOP\DD.TXT'";

DM OUTPUT "FILE 'path'"; SAVES THE CONTENTS OF THE OUTPUT WINDOW TO
THE SPECIFIED FILE. FOR EXAMPLE:
DM OUTPUT "FILE 'C:\USERS\RIKY\DESKTOP\DD1.TXT'";

DM LOG 'WINCLOSE'; CLOSES THE LOG WINDOW.
DM EDIT 'WINCLOSE'; CLOSES THE EDITOR WINDOW.
DM OUT 'WINCLOSE'; CLOSES THE OUTPUT WINDOW.
COMBINING DATASETS

DEPENDING ON THE REQUIREMENT, DATASETS ARE COMBINED VERTICALLY
(STACKING OBSERVATIONS) OR HORIZONTALLY (JOINING VARIABLES).

DATASETS CAN BE COMBINED BY THE FOLLOWING METHODS:
1. CONCATENATION
2. INTERLEAVING
3. MERGE
4. PROC APPEND

CONCATENATION

CONCATENATION COMBINES DATASETS VERTICALLY: THE OBSERVATIONS OF ONE
DATASET ARE PLACED AFTER THOSE OF ANOTHER.

DATA SPORTS;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
UK BEZA A
;
RUN;

DATA SPORTS1;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
NZ RAZ C-
NZ MAZ A-
NZ REN C+
CA BEN A+
CA LEE B+
;
RUN;

SET STATEMENT: TO CONCATENATE, LIST THE DATASETS IN ONE SET
STATEMENT. SAS READS ALL OBSERVATIONS FROM THE FIRST DATASET, THEN
ALL OBSERVATIONS FROM THE SECOND.

DATA FINAL1;
SET SPORTS SPORTS1;
RUN;

DATA SPORTS2;
INPUT TEAM $ NAME $ RATING $ POINTS;
DATALINES;
US SAM A+ 1
US JOHN B- 2
US PELO A 3
UK JEM O 4
UK MOUZ C 1
UK BEZA A 6
;
RUN;

DATA SPORTS3;
INPUT TEAM $ NAME $ RATING $ POINTS $;
DATALINES;
NZ RAZ C- 5
NZ MAZ A- 2
NZ REN C+ 4
CA BEN A+ 1
CA LEE B+ 2
;
RUN;

DATA SPORTS2A;
INPUT TEAM $ NAME $ RATING $ POINTS FROM$;
DATALINES;
US SAM A+ 1 LA
US JOHN B- 2 SF
US PELO A 3 NY
UK JEM O 4 BATH
UK MOUZ C 1 YORK
UK BEZA A 6 OXFORD
;
RUN;

DATA SPORTS3A;
INPUT TEAM $ NAME $ RATING $ POINTS FROM :$12.;
DATALINES;
NZ RAZ C- 5 QUEENSTOWN
NZ MAZ A- 2 CHRISTCHURCH
NZ REN C+ 4 WELLINGTON
CA BEN A+ 1 QUEBEC
CA LEE B+ 2 MONTREAL
;
RUN;
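WHEN SPORTS2A AND SPORTS3A ARE CONCATENATED, SAS TAKES THE LENGTH
OF FROM FROM THE FIRST DATASET IN THE SET STATEMENT (8 CHARACTERS IN
SPORTS2A), SO LONGER VALUES SUCH AS CHRISTCHURCH WOULD BE TRUNCATED.
TO AVOID THIS, A LENGTH STATEMENT IS WRITTEN, AND IT MUST COME
BEFORE THE SET STATEMENT. (THE RETAIN STATEMENT HERE ONLY FIXES THE
ORDER OF THE VARIABLES.)

DATA SPORTS3B;
RETAIN TEAM NAME RATING POINTS FROM;
LENGTH FROM $12;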
SET SPORTS2A SPORTS3A;
RUN;

CONCATENATION FAILS WHEN A VARIABLE IS NUMERIC IN ONE DATASET AND
CHARACTER IN THE OTHER: POINTS IS NUMERIC IN SPORTS2 BUT CHARACTER
IN SPORTS3, SO THE FIRST STEP BELOW STOPS WITH AN ERROR. TO FIX IT,
POINTS IS CONVERTED TO NUMERIC WITH THE INPUT FUNCTION, RENAMED BACK
TO POINTS, AND THEN CONCATENATED.

DATA SPORTS4;
SET SPORTS2 SPORTS3;
RUN;

DATA SPORTS5(RENAME=(K=POINTS));
SET SPORTS3;
K=INPUT(POINTS, 3.);
DROP POINTS;
RUN;

DATA SPORTS6;
SET SPORTS2 SPORTS5;
RUN;

DATA SPORTS7;
INPUT TEAM $ NAME $ RATING $ ;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
UK BEZA A
;
RUN;

DATA SPORTS8;
INPUT TEAM $ NAME $ RATING $ POINTS $;
DATALINES;
NZ RAZ C- 5
NZ MAZ A- 2
NZ REN C+ 4
CA BEN A+ 1
CA LEE B+ 2
;
RUN;

A VARIABLE THAT EXISTS IN ONLY ONE OF THE DATASETS IS STILL KEPT; IT
IS MISSING FOR THE OBSERVATIONS THAT COME FROM THE OTHER DATASET
(HERE POINTS IS MISSING FOR THE SPORTS7 OBSERVATIONS).

DATA SPORTS9;
SET SPORTS7 SPORTS8;
RUN;

DATASETS WITH DIFFERENT VARIABLES CAN ALSO BE CONCATENATED; EACH
VARIABLE IS MISSING WHERE ITS DATASET DID NOT SUPPLY IT.

DATA SPORTS10;
INPUT TEAM $ NAME $ RATING $ COLOR $ ;
DATALINES;
US SAM A+ GREEN
US JOHN B- PINK
US PELO A BLUE
UK JEM O BLACK
UK MOUZ C BROWN
UK BEZA A RED
;
RUN;

DATA SPORTS11;
INPUT TEAM $ NAME $ RATING $ POINTS $;
DATALINES;
NZ RAZ C- 5
NZ MAZ A- 2
NZ REN C+ 4
CA BEN A+ 1
CA LEE B+ 2
;
RUN;

DATA SPORTS12;
SET SPORTS11 SPORTS10;
RUN;
PROC SORT

PROC SORT SORTS THE OBSERVATIONS OF A DATASET BY ONE OR MORE
VARIABLES.

DATA STANDARD;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

THE SORT VARIABLES ARE LISTED IN THE BY STATEMENT. IN THE EXAMPLE
BELOW THE DATA IS SORTED BY SEX.

PROC SORT DATA=STANDARD;
BY SEX;
RUN;

OUT=: WRITES THE SORTED DATA TO A NEW DATASET. WITHOUT OUT=, PROC
SORT REPLACES THE ORIGINAL DATASET WITH THE SORTED VERSION.

PROC SORT DATA=STANDARD OUT=RRR;
BY SEX;
RUN;

DESCENDING: SORTS IN DESCENDING ORDER. IT IS WRITTEN IN THE BY
STATEMENT, BEFORE THE VARIABLE IT APPLIES TO.

PROC SORT DATA=STANDARD OUT=R1;
BY DESCENDING SEX;
RUN;
DATA NM;
INPUT NUM;
DATALINES;
24
.
0
-23
3.4
-4.5
69
;
RUN;

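IN ASCENDING ORDER, NUMERIC VALUES ARE SORTED AS: MISSING (.), THEN
NEGATIVE NUMBERS, THEN 0, THEN POSITIVE NUMBERS. FOR NM THE RESULT IS
., -23, -4.5, 0, 3.4, 24, 69.

PROC SORT DATA=NM OUT=NNM;
BY NUM;
RUN;

REMOVING DUPLICATE DATA:

NODUPKEY: KEEPS ONLY THE FIRST OBSERVATION FOR EACH VALUE OF THE BY
VARIABLES AND DELETES THE OTHERS, EVEN IF THEIR OTHER VARIABLES
DIFFER.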

DATA STAND;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALISHA F FRENCH 18 22 166 74
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

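PROC SORT DATA=STAND OUT=NODUP1 NODUPKEY;
BY NAME;
RUN;

NODUPREC: DELETES AN OBSERVATION ONLY WHEN ALL OF ITS VARIABLES ARE
IDENTICAL TO THOSE OF THE OBSERVATION JUST BEFORE IT IN THE SORTED
DATA. BECAUSE ONLY ADJACENT OBSERVATIONS ARE COMPARED, IDENTICAL
RECORDS THAT DO NOT END UP NEXT TO EACH OTHER AFTER SORTING ARE NOT
REMOVED.

PROC SORT DATA=STAND OUT=NOUP2 NODUPREC;
BY NAME;
RUN;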

DATA KEY1;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
ALISHA M FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALISHA F FRENCH 18 22 66 74
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

IN KEY1 THE THREE ALISHA RECORDS ARE NOT IDENTICAL, SO NODUPREC
REMOVES NONE OF THEM.

PROC SORT DATA=KEY1 OUT=KEY2 NODUPREC;
BY NAME;
RUN;

DUPOUT=: WRITES THE OBSERVATIONS THAT WERE DELETED AS DUPLICATES TO
ANOTHER DATASET.

PROC SORT DATA=KEY1 DUPOUT=RKEY NODUPKEY;
BY NAME;
RUN;

NODUPREC AND NODUP ARE TWO NAMES FOR THE SAME OPTION.
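A SMALL SIDE-BY-SIDE SKETCH OF THE TWO OPTIONS, USING HYPOTHETICAL DATASET NAMES AND VALUES. ID 1 APPEARS THREE TIMES (TWICE AS AN IDENTICAL RECORD): NODUPKEY KEEPS ONE ROW PER ID AND SENDS THE OTHER TWO TO THE DUPOUT= DATASET, WHILE NODUPRECS REMOVES ONLY THE EXACT REPEAT.

```sas
/* SKETCH: NODUPKEY (BY-VALUE DUPLICATES) VERSUS NODUPRECS (WHOLE-RECORD DUPLICATES) */
DATA DUPS;
   INPUT ID SCORE;
   DATALINES;
1 10
1 10
1 20
2 30
;
RUN;

PROC SORT DATA=DUPS OUT=BY_KEY NODUPKEY DUPOUT=REMOVED_KEY;
   BY ID;
RUN;

PROC SORT DATA=DUPS OUT=BY_REC NODUPRECS;
   BY ID;
RUN;
```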


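INTERLEAVING

INTERLEAVING COMBINES DATASETS THAT ARE SORTED BY THE SAME VARIABLE
INTO ONE DATASET THAT IS ALSO SORTED BY IT. IT USES A SET STATEMENT
FOLLOWED BY A BY STATEMENT, AND EACH INPUT DATASET MUST FIRST BE
SORTED BY THE BY VARIABLE (FOR EXAMPLE WITH PROC SORT). THE BY
STATEMENT ALSO CREATES TWO TEMPORARY AUTOMATIC VARIABLES,
FIRST.VARIABLE AND LAST.VARIABLE, WHICH ARE NOT WRITTEN TO THE
OUTPUT DATASET.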

DATA SPORTZ1;
INPUT TEAM $ NAME $ RATING $ ;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
UK BEZA A
;
RUN;

DATA SPORTZ2;
INPUT TEAM $ NAME $ RATING $;
DATALINES;
NZ RAZ C-
NZ MAZ A-
NZ REN C+
CA BEN A+
CA LEE B+
;
RUN;

PROC SORT DATA=SPORTZ1;
BY TEAM;
RUN;

PROC SORT DATA=SPORTZ2;
BY TEAM;
RUN;

DATA SPORTZ3;
SET SPORTZ1 SPORTZ2;
BY TEAM;
RUN;

FIRST.TEAM IS 1 FOR THE FIRST OBSERVATION OF EACH TEAM (BY GROUP)
AND 0 OTHERWISE. THE EXAMPLE BELOW KEEPS THE FIRST OBSERVATION FOR
EACH OF CA, NZ, UK AND US.
SYNTAX:
IF FIRST.VARIABLE=1 THEN OUTPUT;

DATA SPORTZ4;
SET SPORTZ1 SPORTZ2;
BY TEAM;
IF FIRST.TEAM=1 THEN OUTPUT;
RUN;

LAST.TEAM IS 1 FOR THE LAST OBSERVATION OF EACH TEAM AND 0
OTHERWISE. THE EXAMPLE BELOW KEEPS THE LAST OBSERVATION FOR EACH OF
CA, NZ, UK AND US.

DATA SPORTZ5;
SET SPORTZ1 SPORTZ2;
BY TEAM;
IF LAST.TEAM=1 THEN OUTPUT;
RUN;

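OBSERVATIONS THAT ARE NEITHER THE FIRST NOR THE LAST OF THEIR GROUP
(FIRST.TEAM=0 AND LAST.TEAM=0), AND OBSERVATIONS THAT ARE BOTH THE
FIRST AND THE LAST, I.E. THE ONLY OBSERVATION IN THEIR GROUP
(FIRST.TEAM=1 AND LAST.TEAM=1), CAN BE SELECTED AS FOLLOWS. IN THIS
DATA EVERY TEAM HAS AT LEAST TWO OBSERVATIONS, SO SPORTZ7 IS EMPTY.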


DATA SPORTZ6;
SET SPORTZ1 SPORTZ2;
BY TEAM;
IF FIRST.TEAM=0 AND LAST.TEAM=0 THEN OUTPUT;
RUN;

DATA SPORTZ7;
SET SPORTZ1 SPORTZ2;
BY TEAM;
IF FIRST.TEAM=1 AND LAST.TEAM=1 THEN OUTPUT;
RUN;
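A COMMON USE OF FIRST. AND LAST. IS COUNTING THE OBSERVATIONS IN EACH BY GROUP. THIS SKETCH REUSES THE CA AND NZ PLAYERS FROM SPORTZ2, ALREADY IN TEAM ORDER; THE DATASET NAMES ROSTER AND TEAM_SIZE ARE HYPOTHETICAL.

```sas
/* SKETCH: ONE ROW PER TEAM WITH THE NUMBER OF PLAYERS */
DATA ROSTER;
   INPUT TEAM $ NAME $;
   DATALINES;
CA BEN
CA LEE
NZ RAZ
NZ MAZ
NZ REN
;
RUN;

DATA TEAM_SIZE(KEEP=TEAM N_PLAYERS);
   SET ROSTER;
   BY TEAM;                          /* ROSTER IS ALREADY IN TEAM ORDER */
   IF FIRST.TEAM THEN N_PLAYERS = 0; /* RESET AT THE START OF EACH TEAM */
   N_PLAYERS + 1;                    /* SUM STATEMENT: VALUE IS RETAINED */
   IF LAST.TEAM THEN OUTPUT;         /* WRITE ONE ROW PER TEAM */
RUN;
```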

MERGING

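MERGING COMBINES DATASETS HORIZONTALLY. THE COMMON TYPES ARE:
1. LEFT MERGE
2. RIGHT MERGE
3. INNER MERGE
4. FULL MERGE

A MATCH-MERGE NEEDS AT LEAST ONE VARIABLE (THE KEY) THAT IS PRESENT
IN ALL THE DATASETS BEING MERGED; OBSERVATIONS WITH THE SAME KEY
VALUE ARE JOINED.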
DATA MFG;
INPUT MFG $ MODEL $ PRICE;
DATALINES;
BENZ B-CLASS 10000
BMW X6 15213
SUZUKI WINZ 12457
HONDA ACCORD 14567
;
RUN;

DATA CARS;
INPUT MFG $ MODEL $ MILEAGE_MPS $;
DATALINES;
BENZ B-CLASS 12
BMW X6 15
SUZUKI WINZ 16
ISUZ MUX 14
;
RUN;

FOR A MATCH-MERGE, EVERY DATASET MUST FIRST BE SORTED BY THE KEY
VARIABLE, AND THE MERGE STATEMENT IS FOLLOWED BY A BY STATEMENT
NAMING THAT VARIABLE. THE IN= OPTION CREATES A TEMPORARY VARIABLE
(HERE A AND B) THAT IS 1 WHEN THAT DATASET CONTRIBUTES TO THE
CURRENT OBSERVATION AND 0 WHEN IT DOES NOT. AN IF STATEMENT ON THESE
VARIABLES DECIDES WHICH TYPE OF MERGE IS PRODUCED.

PROC SORT DATA=MFG;
BY MFG;
RUN;

PROC SORT DATA=CARS;
BY MFG;
RUN;
LEFT MERGE

DATA LEFT;
MERGE MFG(IN=A) CARS(IN=B);
BY MFG;
IF A;
RUN;

RIGHT MERGE

DATA RIGHT;
MERGE MFG(IN=A) CARS(IN=B);
BY MFG;
IF B;
RUN;

INNER MERGE

DATA INNER;
MERGE MFG(IN=A) CARS(IN=B);
BY MFG;
IF A AND B;
RUN;
FULL MERGE

DATA FULL;
MERGE MFG(IN=A) CARS(IN=B);
BY MFG;
IF A OR B;
RUN;
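TO SEE WHAT THE IN= VARIABLES HOLD, THEY CAN BE COPIED INTO ORDINARY VARIABLES. THIS SKETCH USES A FEW ROWS FROM MFG AND CARS UNDER HYPOTHETICAL DATASET NAMES (MFG_S, CARS_S, FLAGS); HONDA COMES ONLY FROM MFG_S AND ISUZ ONLY FROM CARS_S.

```sas
/* SKETCH: MAKE THE TEMPORARY IN= FLAGS VISIBLE */
DATA MFG_S;
   INPUT MFG $ PRICE;
   DATALINES;
BENZ 10000
BMW 15213
HONDA 14567
;
RUN;

DATA CARS_S;
   INPUT MFG $ MILEAGE;
   DATALINES;
BENZ 12
BMW 15
ISUZ 14
;
RUN;

DATA FLAGS;
   MERGE MFG_S(IN=A) CARS_S(IN=B);   /* BOTH ARE ALREADY SORTED BY MFG */
   BY MFG;
   IN_MFG  = A;                      /* 1 IF MFG_S HAS THIS KEY */
   IN_CARS = B;                      /* 1 IF CARS_S HAS THIS KEY */
RUN;
```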

PROC APPEND

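PROC APPEND ADDS THE OBSERVATIONS OF THE DATA= DATASET TO THE END OF
THE BASE= DATASET. THE RESULT IS LIKE CONCATENATION, AND BECAUSE IT
DOES NOT READ THE OBSERVATIONS ALREADY IN THE BASE DATASET IT CAN BE
FASTER. THE DISADVANTAGES ARE THAT IT CHANGES THE BASE DATASET
ITSELF, AND THAT EVERY TIME THE STEP IS RUN AGAIN THE SAME
OBSERVATIONS ARE APPENDED AGAIN.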

DATA SPORTS1Z;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
UK BEZA A
;
RUN;

DATA SPORTS2Z;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
NZ RAZ C-
NZ MAZ A-
NZ REN C+
CA BEN A+
CA LEE B+
;
RUN;

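PROC APPEND BASE=SPORTS DATA=SPORTS1;
RUN;

PROC APPEND CAN ALSO FILL AN EMPTY DATASET WHOSE STRUCTURE (VARIABLE
NAMES, TYPES, LENGTHS AND LABELS) WAS DEFINED IN ADVANCE; THE BASE
DATASET KEEPS THAT STRUCTURE.

DATA PLAYERS;
INPUT TEAM $ NAME $ SCORE $;
LABEL TEAM='GROUP' NAME='PLAYERS' SCORE='CREDITS';
DATALINES;
;
RUN; /* NO DATA LINES: PLAYERS IS CREATED WITH 0 OBSERVATIONS */

PROC APPEND BASE=PLAYERS DATA=SPORTS1;
RUN;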

DATA SPORTS3Z;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
US SAM 1
US JOHN 1
US PELO 2
UK JEM 1
UK MOUZ 2
UK BEZA 2
;
RUN;

DATA SPORTS4Z;
INPUT TEAM $ NAME $ SCORE ;
DATALINES;
NZ RAZ 1
NZ MAZ 0
NZ REN 2
CA BEN 1
CA LEE 2
;
RUN;

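WHEN A VARIABLE HAS A DIFFERENT TYPE IN THE TWO DATASETS, PROC
APPEND STOPS WITHOUT APPENDING. HERE SCORE IS CHARACTER IN SPORTS3Z
BUT NUMERIC IN SPORTS4Z, SO THE FORCE OPTION IS NEEDED. WITH FORCE,
THE BASE DATASET KEEPS ITS OWN DEFINITION OF SCORE, AND THE
MISMATCHED SCORE VALUES FROM SPORTS4Z ARE SET TO MISSING.

PROC APPEND BASE=SPORTS3Z DATA=SPORTS4Z FORCE;
RUN;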
FUNCTIONS

SAS FUNCTIONS ARE VERY HELPFUL IN SAS PROGRAMMING; MANY OF THEM
SAVE YOU FROM WRITING LONGER CODE.

THE FUNCTIONS IN THIS GUIDE FALL INTO 4 GROUPS:
1. NUMERIC FUNCTIONS
2. CHARACTER FUNCTIONS
3. DATE FUNCTIONS
4. MISCELLANEOUS FUNCTIONS

NUMERIC FUNCTIONS:

ROUND FUNCTION: ROUND(X, UNIT) ROUNDS X TO THE NEAREST MULTIPLE OF
UNIT, FOR EXAMPLE ROUND(1.235601, 0.01) = 1.24.
DATA M_1;
K=1.235601;
ROU=ROUND(K,0.01);
ROU1=ROUND(K,0.001);
ROU2=ROUND(K,0.0001);
ROU3=ROUND(K,0.1);
RUN;

DATA M_2;
A=2.122654;
ROU1=ROUND(A,0.1);
ROU2=ROUND(A,0.01);
ROU3=ROUND(A,0.001);
ROU4=ROUND(A,0.0001);
RUN;

FLOOR FUNCTION: RETURNS THE LARGEST INTEGER LESS THAN OR EQUAL TO
THE ARGUMENT (FLOOR(-1.236) = -2).
CEIL FUNCTION: RETURNS THE SMALLEST INTEGER GREATER THAN OR EQUAL TO
THE ARGUMENT (CEIL(-1.236) = -1).
INT FUNCTION: DROPS THE FRACTIONAL PART, I.E. TRUNCATES TOWARD ZERO
(INT(1.6) = 1, INT(-1.8) = -1).

DATA M_3;
A=1.236;
B=-1.236;
Z=1.6;
X=-1.8;
F1=FLOOR(A);
C1=CEIL(A);
I1=INT(A);
F2=FLOOR(B);
C2=CEIL(B);
I2=INT(B);
F3=FLOOR(Z);
C3=CEIL(Z);
I3=INT(Z);
F4=FLOOR(X);
C4=CEIL(X);
I4=INT(X);

RUN;

SQRT FUNCTION: THIS FUNCTION IS USED FOR FINDING THE SQUARE ROOT
OF THE GIVEN NUMBER.

DATA SQ;
A=25;
S=SQRT(A);
RUN;

MOD FUNCTION: MOD(A, B) RETURNS THE REMAINDER WHEN A IS DIVIDED BY B.

DATA REM1;
A=3;
REMAINDER=MOD(A,2);
RUN;

DATA DD;
INPUT SALES $ REVENUE;
DATALINES;
SAL1 200
SAL2 320
SAL3 120
SAL4 250
SAL5 200
SAL6 512
;
RUN;

DIF FUNCTION: DIF(X) RETURNS THE CURRENT VALUE OF X MINUS ITS VALUE
FROM THE PREVIOUS OBSERVATION (X - LAG(X)); IT IS MISSING FOR THE
FIRST OBSERVATION.

DATA M_4;
SET DD;
K=DIF(REVENUE);
RUN;
LAG FUNCTION: LAG(X) RETURNS THE VALUE OF X FROM THE PREVIOUS
OBSERVATION (STRICTLY, FROM THE PREVIOUS TIME THAT LAG CALL WAS
EXECUTED); IT IS MISSING FOR THE FIRST OBSERVATION.

DATA DD1;
SET DD;
LP=LAG(REVENUE);
RUN;

DATA DD2;
SET DD;
DIFF=DIF(REVENUE);
LP=LAG(REVENUE);
RESULT=REVENUE-LP;
RUN;

DATA DD3;
SET DD;
IF REVENUE >LAG(REVENUE) THEN PROFIT_LOSS='INCREASED';
ELSE PROFIT_LOSS='DECREASED';
RUN;
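BECAUSE EACH LAG CALL ONLY REMEMBERS THE VALUE FROM THE LAST TIME THAT CALL WAS EXECUTED, PUTTING LAG INSIDE AN IF-THEN STATEMENT CAN GIVE UNEXPECTED RESULTS. IN THIS SKETCH (HYPOTHETICAL DATASET LAG_DEMO) THE SECOND LAG RUNS ONLY ON ROWS 2 AND 4, SO ON ROW 4 IT RETURNS 20 (SAVED ON ROW 2) RATHER THAN 30.

```sas
/* SKETCH: A CONDITIONALLY EXECUTED LAG KEEPS ITS OWN HISTORY */
DATA LAG_DEMO;
   INPUT X @@;
   LAG_ALWAYS = LAG(X);                          /* RUNS ON EVERY ROW */
   IF MOD(_N_, 2) = 0 THEN LAG_EVEN = LAG(X);    /* RUNS ONLY ON ROWS 2 AND 4 */
   DATALINES;
10 20 30 40
;
RUN;
```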

DATA DD4;
INPUT SUBJECT $ MARKS1-MARKS3;
DATALINES;
MATHS 10 12 13
ANTRO 19 10 25
PHYSC 21 15 13
SOCIO 10 12 3
REGIA 12 5 6
;
RUN;

THE FOLLOWING FUNCTIONS WORK ACROSS THEIR ARGUMENTS WITHIN ONE
OBSERVATION (ROW), NOT DOWN A COLUMN, AND THEY IGNORE MISSING
ARGUMENTS:
MIN FUNCTION: RETURNS THE SMALLEST VALUE.
MAX FUNCTION: RETURNS THE LARGEST VALUE.
SUM FUNCTION: RETURNS THE SUM.
MEAN FUNCTION: RETURNS THE AVERAGE.
STD FUNCTION: RETURNS THE STANDARD DEVIATION.
MEDIAN FUNCTION: RETURNS THE MEDIAN.

DATA DD5;
SET DD4;
MINIMUM=MIN(MARKS1,MARKS2,MARKS3);
MAXIMUM=MAX(MARKS1,MARKS2,MARKS3);
TOTAL=SUM(MARKS1,MARKS2,MARKS3);
AVERAGE=MEAN(MARKS1,MARKS2,MARKS3);
STD_DEV=STD(MARKS1,MARKS2,MARKS3);
MED=MEDIAN(MARKS1,MARKS2,MARKS3);
RUN;
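THE SKETCH BELOW (HYPOTHETICAL DATASET ROW_DEMO) SHOWS WHY THE FUNCTIONS ARE OFTEN PREFERRED TO THE + OPERATOR: WITH MARKS3 MISSING, SUM STILL RETURNS 22 AND MEAN RETURNS 11, BUT MARKS1+MARKS2+MARKS3 IS MISSING. THE "OF MARKS1-MARKS3" FORM IS A SHORTHAND FOR LISTING A NUMBERED RANGE OF VARIABLES.

```sas
/* SKETCH: ROW-WISE FUNCTIONS IGNORE MISSING VALUES, THE + OPERATOR DOES NOT */
DATA ROW_DEMO;
   INPUT MARKS1-MARKS3;
   TOTAL_FN = SUM(OF MARKS1-MARKS3);     /* MISSING VALUES ARE IGNORED */
   TOTAL_OP = MARKS1 + MARKS2 + MARKS3;  /* ANY MISSING VALUE GIVES MISSING */
   AVG_FN   = MEAN(OF MARKS1-MARKS3);    /* AVERAGE OF THE NON-MISSING VALUES */
   DATALINES;
10 12 .
;
RUN;
```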

_NULL_: DATA _NULL_ RUNS A DATA STEP WITHOUT CREATING A DATASET.

PUT: THE PUT STATEMENT WRITES TEXT AND VALUES TO THE LOG WINDOW.

CHARACTER FUNCTIONS

INDEX FUNCTION: RETURNS THE POSITION WHERE A STRING (ONE OR MORE
CHARACTERS) FIRST OCCURS, OR 0 IF IT IS NOT FOUND. HERE THE FIRST
'O' IS IN POSITION 15.

DATA _NULL_;
A='THIS IS SAS PROGRAM HELLO!!!!!!! WORLD.';
INDEX1=INDEX(A,'O');
PUT @5 'THE POSITION OF THE CHARACTER IS:' INDEX1;
RUN;

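INDEXC FUNCTION: RETURNS THE POSITION OF THE FIRST CHARACTER THAT
MATCHES ANY ONE OF THE CHARACTERS IN THE SECOND ARGUMENT. HERE THE
'H' IN THIS MATCHES, SO THE RESULT IS 2.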

DATA _NULL_;
A='THIS IS SAS PROGRAM HELLO!!!!!!! WORLD.';
INDEX1=INDEXC(A,'HELLO');
PUT @5 'THE POSITION OF THE CHARACTER IS:' INDEX1;
RUN;

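INDEXW FUNCTION: RETURNS THE POSITION OF A WHOLE WORD, I.E. ONE
SEPARATED FROM THE REST OF THE STRING BY BLANKS (OR BY THE START OR
END OF THE STRING). BECAUSE HELLO MUST STAND ALONE AS A WORD, THE
STRING BELOW HAS A BLANK AFTER HELLO; WITHOUT IT, INDEXW WOULD
RETURN 0. THE RESULT HERE IS 21.

DATA _NULL_;
A='THIS IS SAS PROGRAM HELLO !!!!!!! WORLD.';
INDEX1=INDEXW(A,'HELLO');
PUT @5 'THE POSITION OF THE WORD IS:' INDEX1;
RUN;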
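THE THREE FUNCTIONS CAN GIVE DIFFERENT ANSWERS FOR THE SAME STRING. IN THIS SKETCH (HYPOTHETICAL DATASET IDX_DEMO), INDEX FINDS 'IS' INSIDE THE WORD THIS, INDEXW FINDS THE STAND-ALONE WORD IS, AND INDEXC STOPS AT THE FIRST CHARACTER THAT IS ANY OF H, E, L OR O.

```sas
/* SKETCH: INDEX VERSUS INDEXW VERSUS INDEXC ON ONE STRING */
DATA IDX_DEMO;
   A = 'THIS IS SAS PROGRAM HELLO !!!!!!! WORLD.';
   P_INDEX  = INDEX(A, 'IS');      /* 3: 'IS' INSIDE THE WORD THIS */
   P_INDEXW = INDEXW(A, 'IS');     /* 6: THE STAND-ALONE WORD IS */
   P_INDEXC = INDEXC(A, 'HELLO');  /* 2: THE H IN THIS */
RUN;
```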

NOTALPHA FUNCTION: RETURNS THE POSITION OF THE FIRST CHARACTER THAT
IS NOT A LETTER.

NOTALNUM FUNCTION: RETURNS THE POSITION OF THE FIRST CHARACTER THAT
IS NEITHER A LETTER NOR A DIGIT.

FOR 'BC1-09A' THESE ARE 3 (THE DIGIT 1) AND 4 (THE HYPHEN).

DATA _NULL_;
A= 'BC1-09A';
NA=NOTALPHA(A);
NAN=NOTALNUM(A);
PUT @5 'THE POSITION OF THE FIRST NONALPHABETIC CHARACTER:' NA;
PUT @5 'THE POSITION OF THE FIRST NONALPHANUMERIC CHARACTER:' NAN;
RUN;

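FIND FUNCTION: FIND(STRING, SUBSTRING, START) SEARCHES FOR A
SUBSTRING BEGINNING AT POSITION START AND RETURNS WHERE IT IS FOUND
(0 IF NOT FOUND). BELOW, THE SEARCH STARTS AT POSITION 16, AFTER THE
'O' IN PROGRAM, SO THE 'O' IN HELLO (POSITION 25) IS FOUND.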

DATA _NULL_;
A='THIS IS SAS PROGRAM HELLO!!!!!!! WORLD.';
FIND1=FIND(A,'O',16);
PUT @5 'THE POSITION OF THE CHARACTER:' FIND1;
RUN;

WITHOUT A STARTING POSITION, FIND GIVES THE SAME RESULT AS INDEX.

DATA _NULL_;
A='THIS IS SAS PROGRAM HELLO!!!!!!! WORLD.';
FIND1=FIND(A,'O');
PUT @5 'THE POSITION OF THE CHARACTER:' FIND1;
RUN;

FINDC AND FINDW FUNCTIONS: THESE ARE SIMILAR TO INDEXC AND INDEXW,
BUT LIKE FIND THEY ACCEPT A STARTING POSITION. FOR FINDW THE THIRD
ARGUMENT IS THE LIST OF WORD DELIMITERS (HERE A BLANK), SO THE
STARTING POSITION IS GIVEN AS THE FOURTH ARGUMENT.

DATA _NULL_;
A='THIS IS SAS PROGRAM HELLO !!!!!!! WORLD.';
FINDC1=FINDC(A,'D',16);
FINDW1=FINDW(A,'HELLO',' ',10);
PUT @5 'THE POSITION OF THE CHARACTER:' FINDC1;
PUT @5 'THE POSITION OF THE WORD:' FINDW1;
RUN;

LENGTH FUNCTION: THIS FUNCTION RETURNS THE LENGTH OF A STRING OR WORD, NOT COUNTING TRAILING BLANKS.

DATA _NULL_;
A='THIS IS SAS PROGRAM HELLO!!!!!!! WORLD.';
LENGTH1=LENGTH(A);
PUT @5 'THE LENGTH OF THE STRING IS:' LENGTH1;
RUN;

DATA _NULL_;
A= 'AMERICA';
LENGTH1=LENGTH(A);
PUT @5 'THE LENGTH OF THE STRING IS:' LENGTH1;
RUN;

LENGTHN FUNCTION: THIS FUNCTION RETURNS 0 WHEN THE STRING CONTAINS NO CHARACTERS (IS BLANK).

THE DIFFERENCE BETWEEN THE LENGTH AND LENGTHN FUNCTIONS IS THAT LENGTH RETURNS 1 FOR A BLANK STRING, WHEREAS LENGTHN RETURNS 0.

DATA US1;
A='THIS IS SAS PROGRAMMING';
B='';
LEN1= LENGTHN(A);
LEN2=LENGTH(A);
LEN3=LENGTHN(B);
LEN4=LENGTH(B);
RUN;

COUNT FUNCTION: IT COUNTS HOW MANY TIMES A SUBSTRING (HERE THE WORD AMERICA) OCCURS IN A STRING.

DATA _NULL_;
A= 'AMERICA USA CANADA AMERICA JAPAN MEXICO AMERICA';
COUNT1=COUNT(A,'AMERICA');
PUT @5 'THE WORD COUNT IS:' COUNT1;
RUN;
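
COUNT COUNTS SUBSTRINGS, NOT WHOLE WORDS, SO A WORD THAT IS PART OF A LONGER WORD IS ALSO COUNTED. A MINIMAL SKETCH WITH A MADE-UP STRING:

```sas
DATA _NULL_;
   A = 'AMERICAN AMERICA';
   C1 = COUNT(A, 'AMERICA');   /* 2: 'AMERICA' ALSO OCCURS INSIDE 'AMERICAN' */
   PUT C1=;
RUN;
```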

PROPCASE FUNCTION: THIS FUNCTION CONVERTS A STRING TO PROPER CASE (THE FIRST LETTER OF EACH WORD IN UPPERCASE, THE REST IN LOWERCASE).

DATA _NULL_;
A= 'AMERICA USA CANADA AMERICA JAPAN MEXICO AMERICA';
CASE=PROPCASE(A);
PUT @5 CASE;
RUN;

LOWCASE FUNCTION: THIS FUNCTION CONVERTS ALL LETTERS TO LOWERCASE.

DATA _NULL_;
A= 'AMERICA USA CANADA AMERICA JAPAN MEXICO AMERICA';
CASE=LOWCASE(A);
PUT @5 CASE;
RUN;

UPCASE FUNCTION: THIS FUNCTION CONVERTS ALL LETTERS TO UPPERCASE.

DATA _NULL_;
A=' AMERICA USA CANADA AMERICA JAPAN MEXICO AMERICA';
CASE=UPCASE(A);
PUT @5 CASE;
RUN;

REVERSE FUNCTION: THIS FUNCTION REVERSES THE ORDER OF THE CHARACTERS IN A STRING.

DATA _NULL_;
A='AMERICA USA CANADA AMERICA JAPAN MEXICO AMERICA';
REV=REVERSE(A);
PUT @5 REV;
RUN;

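TRANSLATE FUNCTION: THIS FUNCTION REPLACES INDIVIDUAL CHARACTERS: EACH CHARACTER IN THE "FROM" LIST IS REPLACED BY THE CHARACTER IN THE SAME POSITION OF THE "TO" LIST.

SYNTAX TRANSLATE(ARGUMENT,'TO CHARACTERS','FROM CHARACTERS');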

DATA _NULL_;
A='AMERICA USA CANADA AMERICA JAPAN MEXICO AMERICA';
TRANS1=TRANSLATE(A,'Z', 'A');
PUT @5 TRANS1;
RUN;
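
TRANSLATE CAN REPLACE SEVERAL CHARACTERS IN ONE CALL: THE CHARACTERS ARE PAIRED BY POSITION (THE FIRST "TO" CHARACTER REPLACES THE FIRST "FROM" CHARACTER, AND SO ON). A SHORT SKETCH WITH A MADE-UP VALUE:

```sas
DATA _NULL_;
   A = 'CANADA';
   T1 = TRANSLATE(A, 'XY', 'AN');   /* A BECOMES X AND N BECOMES Y: 'CXYXDX' */
   PUT T1=;
RUN;
```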

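TRANWRD FUNCTION: THIS FUNCTION REPLACES EVERY OCCURRENCE OF A WORD (SUBSTRING) WITH A NEW WORD. NOTE THAT THE ORDER OF THE OLD AND NEW ARGUMENTS IS THE OPPOSITE OF TRANSLATE.

SYNTAX TRANWRD(ARGUMENT,'OLD WORD','NEW WORD');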

DATA _NULL_;
A='AMERICA USA CANADA AMERICA JAPAN MEXICO AMERICA';
TRANS1=TRANWRD(A,'AMERICA', 'SINGAPORE');
PUT @5 TRANS1;
RUN;

SUBSTR FUNCTION: THIS FUNCTION EXTRACTS PART OF A STRING, STARTING AT A GIVEN POSITION AND TAKING A GIVEN NUMBER OF CHARACTERS.
SYNTAX SUBSTR(ARGUMENT,POSITION,LENGTH)

DATA _NULL_;
A='THIS IS A SAS PROGRAM CREATED';
EXTRACTED=SUBSTR(A,15,14);
PUT @5 'THE OUTPUT IS:' EXTRACTED;
RUN;

DATA _NULL_;
A='THIS IS A SAS PROGRAM CREATED';
EXTRACTED=SUBSTR(A,4,9);
PUT @5 'THE OUTPUT IS:' EXTRACTED;
RUN;

SCAN FUNCTION: THIS FUNCTION EXTRACTS THE NTH WORD FROM A STRING.

SYNTAX SCAN(ARGUMENT, POSITION)

DATA _NULL_;
A='THIS IS A SAS PROGRAM CREATED';
EXTRACTED=SCAN(A,3);
PUT @5 'THE OUTPUT IS:' EXTRACTED;
RUN;

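EXTRACTING UP TO A SPECIFIED DELIMITER: THE THIRD ARGUMENT LISTS THE DELIMITER CHARACTERS. IN THE BELOW EXAMPLE THE TEXT BEFORE '&' IS RETURNED.

SYNTAX SCAN(ARGUMENT,POSITION,'DELIMITER')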

DATA _NULL_;
A='BANANAS, ORANGES & MANGOS ARE DELICIOUS FRUITS';
EXTRACTED=SCAN(A,1,'&');
PUT @5 'THE OUTPUT IS:' EXTRACTED;
RUN;

TO EXTRACT WORDS COUNTING FROM THE RIGHT SIDE, USE A NEGATIVE POSITION (-1 IS THE LAST WORD, -2 THE SECOND-LAST, SO THE BELOW EXAMPLE RETURNS DELICIOUS).

SYNTAX SCAN(ARGUMENT,-POSITION)

DATA _NULL_;
A='BANANAS, ORANGES & MANGOS ARE DELICIOUS FRUITS';
EXTRACTED=SCAN(A,-2);
PUT @5 'THE OUTPUT IS:' EXTRACTED;
RUN;

CONCATENATION OPERATOR (||): THIS OPERATOR JOINS CHARACTER STRINGS.

DATA _NULL_;
A='AMERICAN EXPRESS';
B=' TIMINGS 9.30 AM';
CANCAT= A||B;
PUT @5 CANCAT;
RUN;

COMPBL FUNCTION: THIS FUNCTION REPLACES EACH RUN OF MULTIPLE BLANKS WITH A SINGLE BLANK.

DATA _NULL_;
A='AMERICAN     EXPRESS';
COMP=COMPBL(A);
PUT @5 COMP;
RUN;

ANY FUNCTIONS: THE ANY FAMILY (ANYALNUM, ANYALPHA, ANYDIGIT, ANYSPACE, ANYPUNCT) RETURNS THE POSITION OF THE FIRST CHARACTER OF THE GIVEN TYPE (ALPHANUMERIC, LETTER, DIGIT, BLANK, PUNCTUATION), OR 0 IF THERE IS NONE. AN OPTIONAL SECOND ARGUMENT GIVES THE POSITION WHERE THE SEARCH STARTS.
DATA ANY1;
X= '123 LOL TIMES #16 ';
Y= '1 MONKS OF 123 $ VEN';
A=ANYALNUM(X);
B=ANYALPHA(X);
C=ANYDIGIT(X);
D=ANYSPACE(X);
E=ANYPUNCT(X);
F=ANYALNUM(Y,2);
G=ANYALPHA(Y,11);*SEARCH STARTS AT POSITION 11 (A BLANK), THE FIRST LETTER FROM THERE IS 'V' OF VEN AT POSITION 18;
H=ANYDIGIT(Y,15);*NO DIGIT APPEARS FROM POSITION 15 ONWARD, SO THE RESULT IS 0;
I=ANYPUNCT(Y,15);*THE PUNCTUATION MARK $ IS FOUND AT POSITION 16;
J=ANYSPACE(Y,5);
RUN;

CONCATENATION FUNCTIONS: THE CAT FUNCTIONS JOIN CHARACTER STRINGS AND DIFFER IN HOW THEY TREAT LEADING AND TRAILING BLANKS.
DATA CAT1;
X=' THE ';
Y=' AMERICAN ';
A=CAT(X,Y);*CAT: CONCATENATES THE STRINGS WITHOUT REMOVING LEADING OR TRAILING BLANKS;
B=CATT(X,Y);*CATT: REMOVES THE TRAILING BLANKS OF EACH STRING AND THEN JOINS THEM;
RUN;
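
THE SAME STRINGS CAN BE USED TO COMPARE ALL FOUR MEMBERS OF THE CAT FAMILY, INCLUDING CATS AND CATX, WHICH ARE NOT USED ABOVE. BECAUSE TRAILING BLANKS ARE INVISIBLE IN PRINTED OUTPUT, THIS SKETCH PRINTS THE LENGTH OF EACH RESULT (LENGTH IGNORES TRAILING BLANKS BUT COUNTS LEADING ONES).

```sas
DATA _NULL_;
   X = ' THE ';
   Y = ' AMERICAN ';
   LENGTH A B C D $ 20;
   A = CAT(X, Y);        /* ' THE  AMERICAN ' : ALL BLANKS KEPT              */
   B = CATT(X, Y);       /* ' THE AMERICAN'   : TRAILING BLANKS REMOVED      */
   C = CATS(X, Y);       /* 'THEAMERICAN'     : LEADING AND TRAILING REMOVED */
   D = CATX('-', X, Y);  /* 'THE-AMERICAN'    : STRIPPED, '-' PUT BETWEEN    */
   LA = LENGTH(A); LB = LENGTH(B); LC = LENGTH(C); LD = LENGTH(D);
   PUT LA= LB= LC= LD=;  /* LOG: LA=14 LB=13 LC=11 LD=12 */
RUN;
```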

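COMPRESS FUNCTION: THIS FUNCTION REMOVES CHARACTERS FROM A STRING. THE SECOND ARGUMENT LISTS THE CHARACTERS TO REMOVE AND THE OPTIONAL THIRD ARGUMENT ADDS MODIFIERS (A = LETTERS, D = DIGITS, L = LOWERCASE, U = UPPERCASE, P = PUNCTUATION, I = IGNORE CASE, K = KEEP THE SELECTED CHARACTERS INSTEAD OF REMOVING THEM). WITH ONLY ONE ARGUMENT, BLANKS ARE REMOVED.
SYNTAX COMPRESS(ARGUMENT,'CHARACTERS','MODIFIERS')

DATA COMP12;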
STRING='STUDING AMERICAN HISTORY AT BLOG !.... 12345';
RUN;
DATA COMP13;
SET COMP12;
STRING1=COMPRESS(STRING," "); *REMOVES BLANKS;
RUN;

DATA COMP14;
SET COMP12;
STRING1=COMPRESS(STRING,' ','AK'); *KEEPS ONLY LETTERS AND BLANKS (K = KEEP);
RUN;

DATA COMP15;
SET COMP12;
STRING1=COMPRESS(STRING,' ','D'); *REMOVES BLANKS AND DIGITS;
RUN;

DATA COMP16;
SET COMP12;
STRING1=COMPRESS(STRING,' ','L'); *REMOVES BLANKS AND LOWERCASE LETTERS;
RUN;

DATA COMP17;
SET COMP12;
STRING1=COMPRESS(STRING,' ','U'); *REMOVES BLANKS AND UPPERCASE LETTERS;
RUN;

DATA COMP22;
SET COMP12;
STRING1=COMPRESS(STRING,' ','A'); *REMOVES BLANKS AND ALL LETTERS;
RUN;

DATA COMP18;
SET COMP12;
STRING1=COMPRESS(STRING,'S','K'); *KEEPS ONLY THE LISTED CHARACTER, HERE CAPITAL S;
RUN;

DATA COMP19;
SET COMP12;
STRING1=COMPRESS(STRING,'!'); *REMOVES EVERY !;
RUN;

DATA COMP20;
SET COMP12;
STRING1=COMPRESS(STRING,' ','P'); *REMOVES BLANKS AND PUNCTUATION;
RUN;
DATA COMP21;
SET COMP12;
STRING1=COMPRESS(STRING,'S','I'); *REMOVES S IN BOTH UPPER AND LOWER CASE (I = IGNORE CASE);
RUN;
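
A COMPACT SKETCH OF THREE COMMON COMPRESS CALLS ON A MADE-UP STRING: THE ONE-ARGUMENT FORM, A LIST OF CHARACTERS TO REMOVE, AND THE K MODIFIER TO KEEP ONLY THE LISTED CHARACTERS.

```sas
DATA _NULL_;
   S = 'AB-12 CD!';
   R1 = COMPRESS(S);                     /* BLANKS REMOVED          : AB-12CD! */
   R2 = COMPRESS(S, '-!');               /* '-' AND '!' REMOVED     : AB12 CD  */
   R3 = COMPRESS(S, '0123456789', 'K');  /* ONLY THE DIGITS KEPT    : 12       */
   PUT R1= R2= R3=;
RUN;
```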

DATE FUNCTIONS

DATA F_1;
INFILE DATALINES;
INPUT EMP_ID DES $:10. JDATE RDATE;
INFORMAT JDATE DATETIME. RDATE DATE9. ;
FORMAT JDATE DATETIME. RDATE DATE9.;
DATALINES;
101 BAKER 10NOV2014:01:12:15 09NOV2019
102 BANKER 01MAY2013:02:15:59 09DEC2020
103 BROKER 14MAY2011:12:19:57 04MAY2018
;
RUN;

SECOND FUNCTION: IT RETURNS THE SECONDS PART OF A TIME OR DATETIME VALUE.

DATA F_2;
SET F_1;
SEC=SECOND(JDATE);
RUN;

MINUTE FUNCTION: IT RETURNS THE MINUTES PART OF A TIME OR DATETIME VALUE.

DATA F_3;
SET F_1;
MINUT=MINUTE(JDATE);
RUN;

HOUR FUNCTION: IT RETURNS THE HOUR (0 TO 23) OF A TIME OR DATETIME VALUE.

DATA F_4;
SET F_1;
HO=HOUR(JDATE);
RUN;

DAY FUNCTION: IT RETURNS THE DAY OF THE MONTH (1 TO 31) FROM A SAS DATE VALUE.

DATA F_5;
SET F_1;
DY=DAY(RDATE);
RUN;

WEEKDAY FUNCTION: IT RETURNS THE DAY OF THE WEEK FROM 1 (SUNDAY) TO 7 (SATURDAY).
DATA F_6;
SET F_1;
WKD=WEEKDAY(RDATE);
RUN;

MONTH FUNCTION: IT RETURNS THE MONTH FROM 1 (JANUARY) TO 12 (DECEMBER).

DATA F_7;
SET F_1;
MTH=MONTH(RDATE);
RUN;

QTR FUNCTION: IT RETURNS THE QUARTER OF THE YEAR, FROM 1 TO 4.

DATA F_8;
SET F_1;
QT=QTR(RDATE);
RUN;

DATEPART FUNCTION: IT EXTRACTS THE DATE FROM A SAS DATETIME VALUE.

DATA F_9;
SET F_1;
DPART=DATEPART(JDATE);
FORMAT DPART DATE9.;
RUN;

TIMEPART FUNCTION: IT EXTRACTS THE TIME FROM A SAS DATETIME VALUE.

DATA F_10;
SET F_1;
TPART=TIMEPART(JDATE);
FORMAT TPART TIME.;
RUN;

TODAY() FUNCTION: IT RETURNS THE CURRENT DATE FROM THE OPERATING SYSTEM AS A SAS DATE VALUE.
DATA F_11;
KK=TODAY();
FORMAT KK DATE9.;
RUN;

MDY FUNCTION: IT CREATES A SAS DATE VALUE FROM A MONTH, DAY AND YEAR.

SYNTAX MDY(MONTH,DAY,YEAR)
DATA F_12;
MODAYR=MDY(10,15,1945);
FORMAT MODAYR DATE9.;
RUN;

HMS FUNCTION: IT CREATES A SAS TIME VALUE FROM HOURS, MINUTES AND SECONDS.

SYNTAX HMS(HOURS,MINUTES,SECONDS);
DATA F_13;
HOMISE=HMS(12,56,01);
FORMAT HOMISE TIME.;
RUN;

DHMS FUNCTION: IT CREATES A SAS DATETIME VALUE FROM A DATE, HOURS, MINUTES AND SECONDS.

SYNTAX DHMS(DATE,HOURS,MINUTES,SECONDS);
DATA F_14;
SET F_12;
DYHOMISE=DHMS(MODAYR,01,59,59);
FORMAT DYHOMISE DATETIME.;
RUN;
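
MDY, HMS AND DHMS RETURN PLAIN NUMBERS; THE FORMAT STATEMENT ONLY CONTROLS HOW THEY ARE DISPLAYED. A SAS DATE IS THE NUMBER OF DAYS SINCE 01JAN1960, A SAS TIME IS THE NUMBER OF SECONDS SINCE MIDNIGHT, AND A SAS DATETIME IS THE NUMBER OF SECONDS SINCE 01JAN1960:00:00:00. THE SKETCH BELOW PRINTS THE RAW VALUES FIRST AND THEN THE FORMATTED ONES.

```sas
DATA _NULL_;
   D  = MDY(1,2,1960);   /* 1     : ONE DAY AFTER 01JAN1960            */
   T  = HMS(1,0,0);      /* 3600  : SECONDS FROM MIDNIGHT TO 1 AM      */
   DT = DHMS(D,0,0,0);   /* 86400 : SECONDS SINCE 01JAN1960:00:00:00   */
   PUT D= T= DT=;
   PUT D= DATE9. T= TIME. DT= DATETIME.;
RUN;
```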

INTCK FUNCTION: IT COUNTS THE NUMBER OF INTERVALS (SUCH AS YEARS, MONTHS, WEEKS OR DAYS) BETWEEN TWO DATE, TIME OR DATETIME VALUES.

DATA F_15;
INFILE DATALINES;
INPUT EMP_ID DES $:10. JDATE RDATE;
INFORMAT JDATE DATE9. RDATE DATE9. ;
FORMAT JDATE DATE9. RDATE DATE9.;
DATALINES;
101 BAKER 10NOV2015 09NOV2019
102 BANKER 01MAY2013 09DEC2020
103 BROKER 14MAY2011 04MAY2018
;
RUN;

DATA F_16;
SET F_15;
INT1=INTCK('YEAR',JDATE,RDATE);
RUN;

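IN THE BELOW EXAMPLE 'C' SELECTS THE CONTINUOUS METHOD: ONLY COMPLETE YEARS MEASURED FROM THE START DATE ARE COUNTED (10NOV2015 TO 09NOV2019 GIVES 3). THE DEFAULT METHOD, 'D' (DISCRETE), COUNTS HOW MANY 1 JANUARY BOUNDARIES ARE CROSSED (THE SAME DATES GIVE 4).
DATA F_17;
SET F_15;
INT1=INTCK('YEAR',JDATE,RDATE,'C');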
RUN;

DATA F_18;
SET F_15;
INT1=INTCK('YEAR',JDATE,RDATE,'D');
RUN;

DATA F_19;
SET F_15;
INT1=INTCK('MONTH',JDATE,RDATE);
RUN;

DATA F_20;
SET F_15;
INT1=INTCK('WEEK',JDATE,RDATE);
RUN;

DATA F_22;
SET F_15;
INT1=INTCK('DAY',JDATE,RDATE);
RUN;
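
THE DISCRETE AND CONTINUOUS METHODS DIFFER MOST AROUND A YEAR BOUNDARY. A MINIMAL SKETCH WITH TWO MADE-UP DATES ONE DAY APART:

```sas
DATA _NULL_;
   START  = '31DEC2019'D;
   FINISH = '01JAN2020'D;
   DISC = INTCK('YEAR', START, FINISH);        /* 1: ONE 1 JANUARY BOUNDARY CROSSED */
   CONT = INTCK('YEAR', START, FINISH, 'C');   /* 0: NO COMPLETE YEAR HAS ELAPSED   */
   PUT DISC= CONT=;
RUN;
```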

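TRIM AND TRIMN FUNCTIONS: THESE FUNCTIONS REMOVE TRAILING BLANKS FROM A STRING. THEY DIFFER WHEN THE STRING IS BLANK.

DATA TRIM1;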
INFILE DATALINES;
INPUT A $ B $ 4-12 C $;
TM= TRIM(A)|| TRIM(B) || C; *TRIM RETURNS ONE BLANK WHEN THE VARIABLE IS MISSING;
TN= TRIMN(A)|| TRIMN(B)|| C; *TRIMN RETURNS AN EMPTY STRING, SO NO BLANK IS LEFT WHEN THE VARIABLE IS MISSING;
DATALINES;
THE ALASKAN BIRD
THE         BIRD
;
RUN;

MISCELLANEOUS FUNCTIONS

EXIST FUNCTION: IT RETURNS 1 IF THE DATASET EXISTS AND 0 IF IT DOES NOT.

DATA _NULL_;
IF EXIST('WORK.SUN')=1 THEN PUT@5 'DATASET EXISTS...';
ELSE PUT @5 "DATASET DOES NOT EXIST";
RUN;

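OPEN FUNCTION: IT OPENS A DATASET AND RETURNS A DATASET IDENTIFIER (0 IF THE DATASET CANNOT BE OPENED). THE CLOSE FUNCTION RELEASES THE DATASET AGAIN.

ATTRN FUNCTION: IT RETURNS A NUMERIC ATTRIBUTE OF AN OPEN DATASET, SUCH AS THE NUMBER OF VARIABLES ('NVAR') OR OBSERVATIONS ('NOBS').
THE BELOW EXAMPLE PRINTS BOTH.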

DATA _NULL_;
IF EXIST('WORK.SUN')=1 THEN DO;
OPENING=OPEN('WORK.SUN') ;
VAR=ATTRN(OPENING,'NVAR');
OBS=ATTRN(OPENING,'NOBS');
CS=CLOSE(OPENING);
PUT @10 'DATASET EXISTS. VARIABLES: ' VAR ' OBSERVATIONS: ' OBS;
END;
ELSE PUT @10 "DATASET DOESN'T EXIST";
RUN;

DATA SUN;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
ALISHA M FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALISHA F FRENCH 18 22 66 74
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

PUT FUNCTION: THIS FUNCTION CONVERTS A VALUE TO CHARACTER FORM BY APPLYING A FORMAT. THE FORMAT MUST MATCH THE TYPE OF THE VALUE, SO A NUMERIC VARIABLE SUCH AS MARKS NEEDS A NUMERIC FORMAT (HERE 8.).

DATA K_1;
SET SUN;
M=PUT(MARKS,8.);
RUN;

INPUT FUNCTION: THIS FUNCTION READS A CHARACTER VALUE WITH AN INFORMAT. HERE IT CONVERTS THE CHARACTER VALUES IN M BACK TO NUMERIC VALUES.
DATA K_2;
SET K_1;
N=INPUT(M,BEST.);
RUN;
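
A SHORT ROUND-TRIP SKETCH WITH MADE-UP VALUES: PUT TURNS A NUMBER INTO RIGHT-ALIGNED TEXT (LEFT MOVES THE LEADING BLANKS TO THE END), AND INPUT TURNS A DATE WRITTEN AS TEXT INTO A NUMERIC SAS DATE VALUE.

```sas
DATA _NULL_;
   N  = 66;
   C  = PUT(N, 8.);                   /* NUMERIC TO CHARACTER, RIGHT-ALIGNED IN 8 BYTES */
   CL = LEFT(C);                      /* SAME TEXT, LEFT-ALIGNED                         */
   D  = INPUT('15OCT1945', DATE9.);   /* CHARACTER TO NUMERIC SAS DATE VALUE             */
   PUT C= CL= D= DATE9.;
RUN;
```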
OPTIONS

OPTIONS ARE OF THREE TYPES:
1.STATEMENT OPTIONS
2.DATASET OPTIONS
3.GLOBAL OPTIONS

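STATEMENT OPTIONS: THESE ARE STATEMENTS (SUCH AS DROP, KEEP, RENAME AND LABEL) WRITTEN INSIDE THE DATA STEP. THEY APPLY TO ALL THE OUTPUT DATASETS CREATED IN THAT STEP. DROP, KEEP AND RENAME STATEMENTS TAKE EFFECT WHEN THE OBSERVATION IS WRITTEN OUT, SO THE VARIABLES ARE STILL AVAILABLE IN THE PDV DURING THE STEP.

DATASET OPTIONS: THESE OPTIONS ARE WRITTEN IN PARENTHESES AFTER A DATASET NAME AND APPLY ONLY TO THAT DATASET. ON AN INPUT DATASET (FOR EXAMPLE IN A SET STATEMENT) THEY ARE APPLIED AS THE DATA ARE READ, BEFORE THE OBSERVATIONS REACH THE PDV.

GLOBAL OPTIONS: THESE OPTIONS AFFECT THE ENTIRE SAS SESSION UNTIL THEY ARE CHANGED.
EXAMPLE:
OPTIONS NODATE NOCENTER NONUMBER;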

STATEMENT OPTIONS

DATA STANDARD;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

LABEL STATEMENT: IT ASSIGNS DESCRIPTIVE LABELS TO THE VARIABLES.

DATA STANDARD5;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
LABEL NAME='STUDENT NAME' SEX='STUDENT GENDER' AGE='STUDENT AGE'
MARKS='STUDENT SCORE' HEIGHT='STUDENT HEIGHT' WEIGHT='STUDENT WEIGHT';
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

DROP STATEMENT: IT EXCLUDES THE SPECIFIED VARIABLES FROM THE OUTPUT DATASET.

DATA STANDARD2;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DROP HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

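LENGTH STATEMENT: IT SPECIFIES THE NUMBER OF BYTES USED TO STORE A VARIABLE. THE LENGTH STATEMENT MUST COME BEFORE THE INPUT OR SET STATEMENT THAT FIRST DEFINES THE VARIABLE, OTHERWISE THE LENGTH IS ALREADY FIXED. HERE IT LETS NAME HOLD UP TO 15 CHARACTERS INSTEAD OF THE DEFAULT 8.
DATA E_2;
INFILE CARDS;
LENGTH NAME $15;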
INPUT EMPID NAME $;
CARDS;
1001 SAMAMMAR
1002 JOSHSAMPANDA
;
RUN;

RENAME STATEMENT: IT IS USED FOR RENAMING THE VARIABLES IN THE OUTPUT DATASET.

DATA STANDARD3;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
RENAME SEX=GENDER;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

KEEP STATEMENT: IT WRITES ONLY THE LISTED VARIABLES TO THE OUTPUT DATASET.

DATA STANDARD1;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
KEEP NAME SEX AGE;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

WHERE STATEMENT: IT IS A CONDITIONAL STATEMENT THAT READS ONLY THE OBSERVATIONS MEETING THE CONDITION.
DATA STANDARD4;
SET STANDARD;
WHERE SEX='F';
RUN;

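REMOVE STATEMENT: IT DELETES THE CURRENT OBSERVATION FROM THE DATASET BEING MODIFIED. IT WORKS ONLY WITH THE MODIFY STATEMENT, WHICH CHANGES A DATASET IN PLACE. SO THAT THE ORIGINAL STANDARD DATASET IS NOT CHANGED, A COPY (STANDARD5) IS MADE FIRST AND THE OBSERVATIONS ARE REMOVED FROM THE COPY.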
DATA STANDARD5;
SET STANDARD;
RUN;

DATA STANDARD5;
MODIFY STANDARD5;
IF SEX='F' THEN REMOVE;
RUN;

DELETE STATEMENT: IT STOPS PROCESSING THE CURRENT OBSERVATION, SO THAT OBSERVATION IS NOT WRITTEN TO THE OUTPUT DATASET.

DATA STANDARD6;
SET STANDARD;
IF AGE=13 THEN DELETE;
RUN;

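STOP STATEMENT: IT STOPS EXECUTION OF THE CURRENT DATA STEP IMMEDIATELY. BECAUSE STOP COMES BEFORE SET IN THE BELOW EXAMPLE, STOP1 IS CREATED WITH THE VARIABLES OF STANDARD BUT WITH NO OBSERVATIONS.
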
DATA STOP1;
STOP;
SET STANDARD;
RUN;
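
STOP IS MORE OFTEN USED WITH A CONDITION. THE SKETCH BELOW (FIRST5 IS A HYPOTHETICAL DATASET NAME) KEEPS ONLY THE FIRST 5 OBSERVATIONS OF STANDARD; THE EXPLICIT OUTPUT BEFORE STOP MAKES SURE THE 5TH OBSERVATION IS WRITTEN BEFORE THE STEP ENDS.

```sas
DATA FIRST5;
   SET STANDARD;
   OUTPUT;                 /* WRITE THE CURRENT OBSERVATION FIRST      */
   IF _N_ = 5 THEN STOP;   /* THEN END THE STEP AFTER THE 5TH ONE      */
RUN;
```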

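RETAIN STATEMENT: IT KEEPS A VARIABLE'S VALUE FROM ONE ITERATION OF THE DATA STEP TO THE NEXT INSTEAD OF RESETTING IT TO MISSING, SO THE VARIABLE CAN ACT AS AN ACCUMULATOR. (THE SUM STATEMENT K+1 USED BELOW ALSO RETAINS K AUTOMATICALLY.)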
DATA STAND_1;
RETAIN K;
SET STANDARD;
K+1;
RUN;
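
RETAIN IS REALLY NEEDED WHEN THE ACCUMULATOR IS UPDATED WITH AN ORDINARY ASSIGNMENT STATEMENT. IN THIS SKETCH (STAND_TOTAL AND TOTAL_MARKS ARE HYPOTHETICAL NAMES), REMOVING THE RETAIN STATEMENT WOULD MAKE TOTAL_MARKS MISSING ON EVERY OBSERVATION, BECAUSE IT WOULD BE RESET TO MISSING AT THE START OF EACH ITERATION.

```sas
DATA STAND_TOTAL;
   SET STANDARD;
   RETAIN TOTAL_MARKS 0;                /* START AT 0 AND KEEP THE VALUE BETWEEN ITERATIONS */
   TOTAL_MARKS = TOTAL_MARKS + MARKS;   /* RUNNING TOTAL OF MARKS                           */
RUN;
```

IF MARKS COULD BE MISSING, WRITING TOTAL_MARKS = SUM(TOTAL_MARKS, MARKS) WOULD STOP ONE MISSING VALUE FROM MAKING THE REST OF THE TOTAL MISSING.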
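OUTPUT STATEMENT: THE OUTPUT STATEMENT TELLS SAS TO WRITE THE CURRENT OBSERVATION TO THE OUTPUT DATASET AT THAT POINT, OR ONLY WHEN THE CONDITION IN A CONDITIONAL STATEMENT IS TRUE. ONCE A DATA STEP CONTAINS AN OUTPUT STATEMENT, SAS NO LONGER WRITES OBSERVATIONS AUTOMATICALLY AT THE END OF EACH ITERATION. IN STAND_2 BELOW EVERY OBSERVATION IS WRITTEN TWICE: FIRST WITH DOE BLANK, THEN WITH DOE='18.02.14'.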
DATA STAND_2;
SET STANDARD;
OUTPUT;
DOE='18.02.14';
OUTPUT;
RUN;

DATA STAND_3;
SET STANDARD;
IF SEX='M' THEN OUTPUT;
RUN;

DATASET OPTIONS

DATASET OPTIONS ARE WRITTEN IN PARENTHESES IMMEDIATELY AFTER THE DATASET NAME AND APPLY ONLY TO THAT DATASET.

DROP OPTION: IT IS USED FOR DROPPING THE SPECIFIED VARIABLES.

DATA STAND_5(DROP=NAME SEX AGE);
SET STANDARD;
RUN;

WHERE OPTION: IT IS A CONDITIONAL OPTION USED FOR SELECTING THE REQUIRED OBSERVATIONS.
DATA STAND_OP (WHERE= (SEX='F'));
SET STANDARD;
RUN;

LABEL OPTION: IT ASSIGNS A LABEL TO THE DATASET ITSELF (NOT TO ITS VARIABLES).

DATA STAND_IO(LABEL='PROFILE');
SET STANDARD;
RUN;

RENAME OPTION: IT IS USED FOR RENAMING THE VARIABLES. DROP= IS APPLIED BEFORE RENAME=, SO IN THE BELOW EXAMPLE DROP= USES THE ORIGINAL VARIABLE NAMES.

DATA STAND_A(RENAME=(NAME=STUDENT_NAME SEX=STUDENT_SEX
AGE=STUDENT_AGE MARKS=STUDENT_MARKS)
DROP=SUBJECT HEIGHT WEIGHT);
SET STANDARD;
RUN;

KEEP OPTION: IT IS USED FOR KEEPING ONLY THE REQUIRED VARIABLES.

DATA STAND_4(KEEP=NAME SEX AGE);
SET STANDARD;
RUN;
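
THE DIFFERENCE BETWEEN A DATASET OPTION AND THE MATCHING STATEMENT SHOWS UP WHEN THE OPTION IS PUT ON THE INPUT DATASET. IN THIS SKETCH (STAND_7, STAND_8 AND PCT ARE HYPOTHETICAL NAMES) BOTH STEPS USE MARKS, BUT ONLY THE SECOND STEP HAS THE OTHER VARIABLES OF STANDARD AVAILABLE WHILE IT RUNS.

```sas
/* KEEP= ON THE INPUT DATASET: ONLY NAME AND MARKS ARE READ INTO THE PDV */
DATA STAND_7;
   SET STANDARD(KEEP=NAME MARKS);
   PCT = MARKS / 100;
RUN;

/* KEEP STATEMENT: ALL VARIABLES ARE READ AND CAN BE USED IN THE STEP,
   BUT ONLY NAME AND PCT ARE WRITTEN TO THE OUTPUT DATASET */
DATA STAND_8;
   SET STANDARD;
   PCT = MARKS / 100;
   KEEP NAME PCT;
RUN;
```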

PROC OPTIONS

PROC OPTIONS LISTS THE SYSTEM (GLOBAL) OPTIONS THAT APPLY THROUGHOUT THE SAS ENVIRONMENT, WITH THEIR CURRENT SETTINGS.

THE BELOW PROGRAM DISPLAYS ALL THE OPTIONS IN THE LOG WINDOW.
PROC OPTIONS;
RUN;

LONG: GIVES AN IN-DEPTH DESCRIPTION OF EACH OPTION.

PROC OPTIONS LONG;
RUN;

SHORT: GIVES ONLY THE OPTION SETTINGS, WITHOUT THE DESCRIPTIONS.

PROC OPTIONS SHORT;
RUN;

RELATED OPTIONS ARE ORGANIZED INTO GROUPS. THE BELOW PROGRAM LISTS THE OPTION GROUPS.
PROC OPTIONS LISTGROUPS;
RUN;

TO VIEW THE OPTIONS IN ONE GROUP, USE GROUP=. IN THE BELOW EXAMPLE THE OPTIONS OF THE PDF GROUP ARE DISPLAYED IN THE LOG WINDOW.
PROC OPTIONS GROUP=PDF;
RUN;
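
TO CHECK THE CURRENT VALUE OF A SINGLE OPTION, PROC OPTIONS ACCEPTS OPTION=, AND THE GETOPTION FUNCTION RETURNS THE SAME VALUE. A SHORT SKETCH USING YEARCUTOFF AS THE EXAMPLE OPTION:

```sas
PROC OPTIONS OPTION=YEARCUTOFF;   /* WRITES THE CURRENT VALUE OF ONE OPTION TO THE LOG */
RUN;

%PUT YEARCUTOFF IS %SYSFUNC(GETOPTION(YEARCUTOFF));   /* SAME VALUE, RETURNED BY A FUNCTION */
```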

THE ABOVE PROGRAMS DISPLAY THE OPTIONS AND THEIR GROUPS. NOW LET US SEE HOW TO SET OPTIONS GLOBALLY.

WRITING GLOBAL OPTIONS

OPTIONS NODATE NOCENTER; *OBSERVE THE RESULT VIEWER WINDOW;

PROC PRINT DATA=KEY1;
RUN;

THESE OPTIONS REMAIN IN EFFECT FOR THE REST OF THE SAS SESSION. TO BRING THEM BACK TO THE DEFAULT:

OPTIONS DATE CENTER;

PROC PRINT DATA=KEY1;
RUN;
OPTIONS PAGENO=9; *PAGE NUMBERING OF THE OUTPUT STARTS AT 9;
PROC PRINT DATA=KEY1;
RUN;

ERRORS: THIS OPTION SETS THE MAXIMUM NUMBER OF OBSERVATIONS FOR WHICH COMPLETE DATA ERROR MESSAGES ARE WRITTEN TO THE LOG WINDOW.
OPTIONS ERRORS=1;

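YEARCUTOFF: THIS OPTION SETS THE FIRST YEAR OF THE 100-YEAR SPAN USED TO INTERPRET TWO-DIGIT YEARS. THE DEFAULT IS 1920 IN OLDER RELEASES AND 1926 IN SAS 9.4. WITH YEARCUTOFF=2010, TWO-DIGIT YEARS ARE READ AS 2010 TO 2109.
OPTIONS YEARCUTOFF=2010;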
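
A SKETCH OF HOW YEARCUTOFF CHANGES THE CENTURY OF A TWO-DIGIT YEAR: THE SAME MADE-UP TEXT '05JAN09' IS READ TWICE, ONCE UNDER EACH SETTING (1926 IS USED HERE ONLY AS AN EXAMPLE VALUE).

```sas
OPTIONS YEARCUTOFF=1926;
DATA _NULL_;
   D = INPUT('05JAN09', DATE7.);
   Y = YEAR(D);
   PUT 'YEARCUTOFF=1926: ' Y=;   /* 09 FALLS IN 1926-2025, SO Y=2009 */
RUN;

OPTIONS YEARCUTOFF=2010;
DATA _NULL_;
   D = INPUT('05JAN09', DATE7.);
   Y = YEAR(D);
   PUT 'YEARCUTOFF=2010: ' Y=;   /* 09 FALLS IN 2010-2109, SO Y=2109 */
RUN;
```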

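MISSING: THIS OPTION SETS THE CHARACTER THAT IS PRINTED FOR MISSING NUMERIC VALUES (A PERIOD BY DEFAULT). WITH MISSING='0' A ZERO IS DISPLAYED, BUT THE STORED VALUE IS STILL MISSING.
OPTIONS MISSING='0';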

LS (LINESIZE): THIS OPTION SETS THE NUMBER OF CHARACTERS PER LINE OF OUTPUT. VALID VALUES RANGE FROM 64 TO 256.
OPTIONS LS=80;

PS (PAGESIZE): THIS OPTION SETS THE NUMBER OF LINES PER PAGE OF OUTPUT. VALID VALUES RANGE FROM 15 TO 32767.
OPTIONS PS=234;

VALIDVARNAME=UPCASE: WITH THIS OPTION THE VARIABLE NAMES ARE CONVERTED TO UPPERCASE (THE DATA VALUES ARE NOT CHANGED).
OPTIONS VALIDVARNAME=UPCASE;
ARRAYS

ARRAYS ARE OF TWO TYPES: ONE-DIMENSIONAL ARRAYS AND TWO-DIMENSIONAL ARRAYS.

ONE-DIMENSIONAL ARRAYS: A ONE-DIMENSIONAL ARRAY GROUPS SEVERAL VARIABLES (COLUMNS) OF THE SAME OBSERVATION UNDER ONE NAME, SO THEY CAN BE PROCESSED ‘COLUMN’ WISE IN A LOOP.
DATA FRUITS;
INPUT FRUITS $ GRADE1 $ GRADE2 $ GRADE3 $;
DATALINES;
MANGO A- A A+
APPLE A A+ A-
ORANGE A . A+
BANANA . A- A+
;
RUN;

DATA FRUITS1;
SET FRUITS;
ARRAY FRUIT[3] GRADE1 GRADE2 GRADE3;
DO P=1 TO 3;
IF FRUIT[P]=' ' THEN FRUIT[P]='-';
END;
DROP P;
RUN;
IN THE ABOVE PROGRAM THE ARRAY STATEMENT GIVES THE ARRAY NAME (FRUIT), THE NUMBER OF ELEMENTS IN BRACKETS ([3]) AND THEN THE 3 VARIABLES THAT BELONG TO THE ARRAY. THE LOOP REPLACES EACH MISSING GRADE WITH '-'.

IN THE BELOW PROGRAM A MISSING GRADE IS REPLACED BY THE GRADE IN THE PREVIOUS VARIABLE. THE LOOP STARTS AT 2 BECAUSE GRADE1 HAS NO PREVIOUS VARIABLE (FRUIT[0] WOULD BE OUT OF RANGE).
DATA FRUITS2;
SET FRUITS;
ARRAY FRUIT[3] GRADE1 GRADE2 GRADE3;
DO P=2 TO 3;
IF FRUIT[P]=' ' THEN FRUIT[P]=FRUIT[P-1];
END;
DROP P;
RUN;

CONVERTING THE VALUES OF THE VARIABLE FRUITS TO LOWERCASE.

DATA FRUITS3;
SET FRUITS;
ARRAY R(1) FRUITS;
K=1;
IF R(K) NE ' ' THEN DO;
FRUITS=LOWCASE(R(K));
END;
DROP K;
RUN;

DATA SUB;
INPUT N1 $ N2 $ N3 $ N4 $ N5 $;
DATALINES;
BEN RAM SAM RON ANASA
BILL JAMES CHRIS GLEN VIDH
BRUZ ELLE SAMG JENNY RUBEN
;
RUN;

CREATING NEW VARIABLES. IN THE BELOW EXAMPLE THE ARRAY ANS CREATES FIVE NEW CHARACTER VARIABLES, A1-A5, EACH HOLDING THE MATCHING VALUE OF N1-N5 WITH EVERY 'A' REPLACED BY 'X'.
DATA REPLACE;
SET SUB;
ARRAY GOP(*) N1-N5;
ARRAY ANS(*)$ A1-A5;
DO I= 1 TO 5;
ANS(I)=TRANSLATE(GOP(I),'X','A');
END;
DROP I N1-N5;
RUN;

'*': WHEN THE DIMENSION IS GIVEN AS '*', SAS COUNTS THE VARIABLES IN THE LIST, SO YOU DO NOT HAVE TO KNOW HOW MANY THERE ARE. THE '$' AFTER THE DIMENSION DECLARES THE NEW ARRAY VARIABLES AS CHARACTER RATHER THAN NUMERIC.
DATA REPLACE1;
SET SUB;
ARRAY GOP(*) N1-N5;
ARRAY ANS(*)$ A1-A5;
DO I= 1 TO 5;
ANS(I)=TRANSLATE(GOP(I),'X','A');
END;
DROP I N1-N5;
RUN;

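DIM FUNCTION: DIM(ARRAY_NAME) RETURNS THE NUMBER OF ELEMENTS IN AN ARRAY. USING IT AS THE UPPER BOUND OF THE LOOP MEANS THE LOOP DOES NOT HAVE TO BE CHANGED WHEN THE NUMBER OF VARIABLES CHANGES.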
DATA REPLACE3;
SET SUB;
ARRAY GOP(*) N1-N5;
ARRAY ANS(*)$ A1-A5;
DO I= 1 TO DIM(GOP);
ANS(I)=TRANSLATE(GOP(I),'X','A');
END;
DROP I N1-N5;
RUN;

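CREATING THE TEMPORARY ARRAY

A TEMPORARY ARRAY HOLDS VALUES (FOR EXAMPLE AN ANSWER KEY) WITHOUT CREATING VARIABLES, SO ITS ELEMENTS ARE NOT WRITTEN TO THE OUTPUT DATASET AND THEIR VALUES ARE RETAINED ACROSS OBSERVATIONS.

SYNTAX FOR TEMPORARY ARRAYS
ARRAY ARRAY_NAME(NUMBER_OF_ELEMENTS) _TEMPORARY_ (V1, V2, V3, V4, V5, V6);
IN THE DIM FUNCTION WE SPECIFY THE TEMPORARY ARRAY NAME.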

DATA MATCHING;
INPUT COLORS1 $ COLORS2 $ COLORS3 $ COLORS4 $ COLORS5 $;
DATALINES;
WHITE BROWN GREEN YELLOW PINK
ORANGE RED GREEN PINK YELLOW
BROWN BROWN YELLOW YELLOW PINK
BROWN YELLOW RED PINK WHITE
;
RUN;

DATA RESULTS;
SET MATCHING;
ARRAY COLO(*)$ COLORS1-COLORS5;
ARRAY MATCHING(5)$ _TEMPORARY_('BROWN', 'BROWN', 'YELLOW',
'YELLOW','PINK');
DO O=1 TO DIM(MATCHING);
IF COLO(O)=MATCHING(O);
END;
DROP O;
RUN;

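IN THE ABOVE PROGRAM THE SUBSETTING IF INSIDE THE LOOP ENDS PROCESSING OF AN OBSERVATION AS SOON AS ONE COLOR DOES NOT MATCH THE TEMPORARY ARRAY, SO ONLY OBSERVATIONS THAT MATCH IN ALL FIVE POSITIONS ARE WRITTEN TO RESULTS.

CALCULATING THE SCORE FOR EACH STUDENT: EACH ANSWER IS COMPARED WITH THE ANSWER KEY IN THE TEMPORARY ARRAY. A CORRECT ANSWER ADDS 1 AND AN UNANSWERED QUESTION SUBTRACTS 1.
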
DATA EXAM;
INPUT NAME $ O1 $ O2 $ O3 $ O4 $ O5 $ O6 $;
DATALINES;
EAM A B A C D C
SAM B C . B A A
BEN C B A A D A
;
RUN;

DATA VAL;
SET EXAM;
C=0;
ARRAY RES(*) $ O1-O6 ;
ARRAY ANS(6) $ _TEMPORARY_ ('A','B','B','C','A','A');
DO I=1 TO DIM(ANS);
IF RES(I)='' THEN C=C-1;
IF RES(I)=ANS(I) THEN DO;
C=C+1;
END;
END;
DROP I;
RUN;

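TWO-DIMENSIONAL ARRAYS ARRANGE VARIABLES IN ROWS AND COLUMNS. THE VARIABLES ARE ASSIGNED ROW BY ROW: IN MARK(2,3) BELOW, U1-U3 FORM ROW 1 AND U4-U6 FORM ROW 2. THE '/' IN THE INPUT STATEMENT MOVES TO THE NEXT DATA LINE.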
DATA PRICES;
INPUT COD $ U1-U3/U4-U6;
ARRAY MARK(2,3) U1-U6;
DO I=1 TO 2;
DO J=1 TO 3;
MARK (I,J)=ROUND(MARK(I,J), 0.1);
END;
END;
DROP I J;
CARDS;
P101 23.45 22.56 26.23
26.23 24.34 25.67
P102 45.23 45.64 46.12
47.89 48.23 49.34
P103 56.87 57.34 58.32
59.34 58.12 59.32
;
RUN;
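
A SMALL SKETCH TO CONFIRM THE ROW-BY-ROW ORDER OF A TWO-DIMENSIONAL ARRAY. R1C3 AND R2C1 ARE HYPOTHETICAL VARIABLE NAMES, AND DATA _NULL_ IS USED SO NOTHING IS STORED.

```sas
DATA _NULL_;
   U1=1; U2=2; U3=3; U4=4; U5=5; U6=6;
   ARRAY MARK(2,3) U1-U6;
   R1C3 = MARK(1,3);   /* ROW 1, COLUMN 3 IS U3, SO 3 */
   R2C1 = MARK(2,1);   /* ROW 2, COLUMN 1 IS U4, SO 4 */
   PUT R1C3= R2C1=;
RUN;
```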
IMPORTING AND EXPORTING THE DATA

LIBRARY: A SAS LIBRARY IS A NAMED REFERENCE (LIBREF) TO A FOLDER FROM WHICH SAS DATASETS ARE READ AND IN WHICH CREATED DATASETS ARE STORED. TO CREATE A LIBRARY WE START WITH THE "LIBNAME" STATEMENT, FOLLOWED BY THE LIBRARY NAME AND THE PATH OF THE FOLDER.

SYNTAX: LIBNAME RAW "C:\USERS\RICKY\DESKTOP\BQ";

CRITERIA TO BE MET WHEN NAMING A LIBRARY
1.MUST START WITH A LETTER OR AN UNDERSCORE.
2.MUST BE 8 CHARACTERS OR FEWER IN LENGTH.
3.MAY CONTAIN ONLY LETTERS, DIGITS AND UNDERSCORES.

LIBNAME ELEN "C:\USERS\RICKY\DESKTOP\BQ";

DATASETS THAT ARE CREATED CAN BE STORED IN THE LIBRARY BY WRITING THE LIBRARY NAME BEFORE THE DATASET NAME.

SYNTAX: DATA ELEN.CREATED;

DATA ELEN.CREATED;
INFILE DATALINES;
INPUT NAME $ SEX $ AGE;
DATALINES;
GENA F 25
SEMA F 21
VEZU M 36
;
RUN;

IMPORTING THE DATA

SYNTAX FOR IMPORTING THE DATA
PROC IMPORT DATAFILE="PATH"
OUT=LIBRARY_NAME.DATASET_NAME
DBMS='SOURCE FILE FORMAT';
RUN;

IN DBMS WE SPECIFY THE FORMAT OF THE SOURCE FILE: TAB FOR A TAB-DELIMITED TEXT FILE, XLS FOR AN EXCEL FILE (XLSX FOR .XLSX FILES), CSV FOR A CSV FILE AND ACCESS FOR A MICROSOFT ACCESS (.MDB) FILE.

IMPORTING A TEXT FILE
PROC IMPORT DATAFILE="C:\USERS\RICKY\DESKTOP\FBA AMAZON.TXT"
OUT=WORK.B
DBMS=TAB;
RUN;
IMPORTING AN EXCEL FILE
PROC IMPORT DATAFILE="C:\USERS\RICKY\DESKTOP\FBA AMAZON.XLS"
OUT=WORK.B
DBMS=XLS
REPLACE;
RUN;

IN THE ABOVE PROGRAM THE REPLACE OPTION OVERWRITES THE OUTPUT DATASET IF IT ALREADY EXISTS FROM A PREVIOUS RUN.

GETNAMES=YES/NO: WITH 'YES' THE FIRST ROW OF THE FILE IS USED AS THE VARIABLE NAMES; WITH 'NO' THE FIRST ROW IS READ AS DATA AND SAS ASSIGNS DEFAULT VARIABLE NAMES. THE DEFAULT IS 'YES'.
PROC IMPORT DATAFILE="C:\USERS\RICKY\DESKTOP\FBA AMAZON.XLS"
OUT=WORK.B
DBMS=XLS
REPLACE;
GETNAMES=NO;
RUN;

TO IMPORT ONE SPECIFIC SHEET FROM A WORKBOOK WITH SEVERAL SHEETS, WE USE THE SHEET STATEMENT.
PROC IMPORT DATAFILE="C:\USERS\RICKY\DESKTOP\FBA AMAZON.XLS"
OUT=WORK.B
DBMS=XLS
REPLACE;
GETNAMES=NO;
SHEET='AMAZON_FBA_SALES';
RUN;
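
THE SAME PATTERN WORKS FOR A CSV FILE. IN THIS SKETCH THE FILE NAME SALES.CSV AND THE OUTPUT DATASET WORK.SALES ARE HYPOTHETICAL; GUESSINGROWS TELLS SAS HOW MANY ROWS TO SCAN WHEN DECIDING EACH COLUMN'S TYPE AND LENGTH.

```sas
PROC IMPORT DATAFILE="C:\USERS\RICKY\DESKTOP\BQ\SALES.CSV"   /* HYPOTHETICAL FILE */
            OUT=WORK.SALES
            DBMS=CSV
            REPLACE;
   GETNAMES=YES;        /* THE FIRST ROW HOLDS THE COLUMN NAMES         */
   GUESSINGROWS=100;    /* SCAN 100 ROWS TO DECIDE TYPES AND LENGTHS    */
RUN;
```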

EXPORTING THE DATA

SYNTAX:
PROC EXPORT DATA= DATASET_NAME
OUTFILE='PATH'
DBMS='SOURCE FILE FORMAT';
RUN;

DATA KEYS;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
ALISHA M FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALISHA F FRENCH 18 22 66 74
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

EXPORTING THE DATA TO A TEXT DOCUMENT
*WORK. REPRESENTS THE DEFAULT (TEMPORARY) LIBRARY;
PROC EXPORT DATA=WORK.KEYS
OUTFILE="C:\USERS\RICKY\DESKTOP\BQ\INDEX1.TXT"
DBMS=TAB;
RUN;

EXPORTING THE DATA TO AN EXCEL FILE
PROC EXPORT DATA=WORK.KEYS
OUTFILE="C:\USERS\RICKY\DESKTOP\BQ\INDEX1.XLS"
DBMS=XLS
REPLACE;
RUN;

PUTNAMES=YES/NO IS THE EXPORT COUNTERPART OF GETNAMES: WITH 'NO' THE VARIABLE NAMES ARE NOT WRITTEN AS THE FIRST ROW.

PROC EXPORT DATA=WORK.KEYS
OUTFILE="C:\USERS\RICKY\DESKTOP\BQ\INDEX2.XLS"
DBMS=XLS
REPLACE;
PUTNAMES=NO;
RUN;

TO EXPORT THE DATA TO A SPECIFIC (NAMED) EXCEL SHEET, WE USE THE SHEET STATEMENT.

PROC EXPORT DATA=WORK.KEYS
OUTFILE="C:\USERS\RICKY\DESKTOP\BQ\INDEX2.XLS"
DBMS=XLS
REPLACE;
PUTNAMES=NO;
SHEET='POPULATION';
RUN;
PROC CONTENTS

PROC CONTENTS GIVES DESCRIPTIVE INFORMATION ABOUT A DATASET AND ITS VARIABLES, I.E., VARIABLE PROPERTIES SUCH AS TYPE (CHARACTER OR NUMERIC), LENGTH, FORMAT, INFORMAT, LABEL AND POSITION.

SYNTAX FOR PROC CONTENTS
PROC CONTENTS DATA=DATASET_NAME;
RUN;

DATA E_10;
INFILE DATALINES;
INPUT EMP_ID DES $:10. JDATE RDATE;
INFORMAT JDATE RDATE DDMMYY10.;
DATALINES;
101 BAKER 10/11/2014 09/11/2019
102 BANKER 01/05/2013 09/12/2020
103 BROKER 14/05/2011 04/05/2018
;
RUN;

PROC CONTENTS DATA=E_10;
RUN;

IN THE RESULT VIEWER WINDOW WE GET 3 TABLES: THE CONTENTS PROCEDURE (DATASET ATTRIBUTES), ENGINE/HOST DEPENDENT INFORMATION, AND ALPHABETIC LIST OF VARIABLES AND ATTRIBUTES. THE FIRST TABLE SHOWS THE DATASET NAME, NUMBER OF OBSERVATIONS, MEMBER TYPE, NUMBER OF VARIABLES, ENGINE ETC. THE ENGINE/HOST DEPENDENT INFORMATION CONTAINS THE DATASET PAGE SIZE, MAXIMUM OBSERVATIONS PER PAGE ETC. THE ALPHABETIC LIST OF VARIABLES AND ATTRIBUTES DESCRIBES EACH VARIABLE'S TYPE, LENGTH, FORMAT, INFORMAT ETC.

VARNUM: IN THE BELOW PROGRAM THE THIRD TABLE LISTS THE VARIABLES IN THE ORDER THEY WERE CREATED ('VARIABLES IN CREATION ORDER'), AND THE ALPHABETIC LIST OF VARIABLES AND ATTRIBUTES TABLE IS NOT CREATED.
PROC CONTENTS DATA=E_10 VARNUM;
RUN;

POSITION: THE BELOW PROGRAM CREATES A FOURTH TABLE, ‘VARIABLES IN CREATION ORDER’, IN ADDITION TO THE ALPHABETIC LIST. IN THIS TABLE THE VARIABLES ARE ARRANGED IN THEIR ORDER IN THE DATASET.
PROC CONTENTS DATA=E_10 POSITION;
RUN;
SHORT: ONLY ONE COMPACT TABLE IS CREATED, LISTING JUST THE VARIABLE NAMES IN ALPHABETICAL ORDER.
PROC CONTENTS DATA=E_10 SHORT;
RUN;

VARNUM SHORT: THE SHORT LIST OF VARIABLE NAMES IS DISPLAYED IN CREATION ORDER.
PROC CONTENTS DATA=E_10 VARNUM SHORT;
RUN;
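
PROC CONTENTS CAN ALSO SAVE THE VARIABLE INFORMATION AS A DATASET INSTEAD OF PRINTING IT, WHICH IS USEFUL WHEN A PROGRAM NEEDS TO WORK WITH THE VARIABLE LIST. A SKETCH (E_10_VARS IS A HYPOTHETICAL NAME); IN THE OUT= DATASET, TYPE IS 1 FOR NUMERIC AND 2 FOR CHARACTER VARIABLES.

```sas
PROC CONTENTS DATA=E_10 OUT=E_10_VARS NOPRINT;   /* ONE ROW PER VARIABLE, NOTHING PRINTED */
RUN;

PROC PRINT DATA=E_10_VARS NOOBS;
   VAR NAME TYPE LENGTH VARNUM;
RUN;
```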
PROC COMPARE

PROC COMPARE COMPARES TWO DATASETS (THE BASE AND THE COMPARE DATASET) AND REPORTS: WHETHER MATCHING VARIABLES HAVE DIFFERENT VALUES, WHETHER ONE DATASET HAS MORE OBSERVATIONS THAN THE OTHER, WHAT VARIABLES THE TWO DATASETS HAVE IN COMMON, HOW MANY VARIABLES ARE IN ONE DATASET BUT NOT IN THE OTHER, WHETHER MATCHING VARIABLES HAVE DIFFERENT FORMATS, LABELS OR TYPES, AND A COMPARISON OF THE VALUES OF MATCHING OBSERVATIONS.
PROC COMPARE IS GENERALLY USED FOR VALIDATION OF THE DATA.

DATA E_11;
INFILE DATALINES;
INPUT EMP_ID $ DES $:10. JDATE RDATE;
INFORMAT JDATE DATE7. RDATE DDMMYY10.;
FORMAT JDATE DATE7. RDATE DDMMYY10.;
LABEL JDATE='JOINING DATE';
DATALINES;
101 BAKER 10NOV14 09/11/2019
102 BANKER 01MAY13 09/12/2020
103 BROKER 14MAY11 04/05/2018
104 ATTORNEY 10MAR10 06/04/2017
105 DOCTOR 11JUN12 06/04/2013
;
RUN;

DATA E_14;
INFILE DATALINES;
INPUT EMP_ID DES $:10. JDATE RDATE;
INFORMAT JDATE RDATE DATE7.;
DATALINES;
101 BAKER 10NOV15 09NOV19
102 BANKER 01MAY13 09DEC20
103 BROKER 14MAY11 04MAY18
;
RUN;

PROC COMPARE BASE=E_11 COMPARE=E_14;
RUN;

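IN THE RESULT VIEWER WINDOW THE VALIDATION REPORTS THE DIFFERENCES BETWEEN THE TWO DATASETS, SUCH AS VARIABLES WITH DIFFERENT TYPES, FORMATS OR LABELS AND OBSERVATIONS WITH UNEQUAL VALUES.

IN THE BELOW PROGRAM TWO IDENTICAL DATASETS ARE VALIDATED, AND THE RESULT VIEWER WINDOW SHOWS THAT ALL THE VALUES COMPARED ARE EXACTLY EQUAL.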
DATA E_11A;
INFILE DATALINES;
INPUT EMP_ID $ DES $:10. JDATE RDATE;
INFORMAT JDATE DATE7. RDATE DDMMYY10.;
FORMAT JDATE DATE7. RDATE DDMMYY10.;
LABEL JDATE='JOINING DATE';
DATALINES;
101 BAKER 10NOV14 09/11/2019
102 BANKER 01MAY13 09/12/2020
103 BROKER 14MAY11 04/05/2018
104 ATTORNEY 10MAR10 06/04/2017
105 DOCTOR 11JUN12 06/04/2013
;
RUN;

DATA E_11B;
INFILE DATALINES;
INPUT EMP_ID $ DES $:10. JDATE RDATE;
INFORMAT JDATE DATE7. RDATE DDMMYY10.;
FORMAT JDATE DATE7. RDATE DDMMYY10.;
LABEL JDATE='JOINING DATE';
DATALINES;
101 BAKER 10NOV14 09/11/2019
102 BANKER 01MAY13 09/12/2020
103 BROKER 14MAY11 04/05/2018
104 ATTORNEY 10MAR10 06/04/2017
105 DOCTOR 11JUN12 06/04/2013
;
RUN;

PROC COMPARE BASE=E_11A COMPARE=E_11B;

RUN;
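
FOR VALIDATION INSIDE A LONGER PROGRAM IT CAN BE HANDY TO CHECK THE RESULT WITHOUT READING THE REPORT. PROC COMPARE STORES A RETURN CODE IN THE AUTOMATIC MACRO VARIABLE SYSINFO, WHICH IS 0 WHEN NO DIFFERENCES ARE FOUND. A SKETCH USING THE TWO DATASETS ABOVE:

```sas
PROC COMPARE BASE=E_11A COMPARE=E_11B NOPRINT;   /* NO REPORT, ONLY THE RETURN CODE */
RUN;

%PUT COMPARE RETURN CODE: &SYSINFO;   /* 0 MEANS THE DATASETS MATCH */
```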
PROC TRANSPOSE

PROC TRANSPOSE: THIS PROCEDURE RESHAPES THE DATA. BY USING PROC TRANSPOSE, ROWS ARE CONVERTED TO COLUMNS AND COLUMNS ARE CONVERTED TO ROWS.

DATA PROFILE;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

THE VARIABLES WHICH ARE TO BE TRANSPOSED ARE WRITTEN IN THE VAR STATEMENT. THE TRANSPOSED DATA IS STORED IN THE T1 DATASET.
PROC TRANSPOSE DATA=PROFILE OUT=T1;
VAR NAME SEX SUBJECT;
RUN;

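IN THE OUTPUT DATASET THE VARIABLE _NAME_ (LABELLED 'NAME OF FORMER VARIABLE') HOLDS THE NAMES OF THE TRANSPOSED VARIABLES, AND EACH ORIGINAL OBSERVATION BECOMES A COLUMN (COL1, COL2 ETC).

TRANSPOSING CAN ALSO BE DONE SEPARATELY FOR EACH BY GROUP. THE DATA MUST FIRST BE SORTED BY THE BY VARIABLE.
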
PROC SORT DATA=PROFILE;
BY SUBJECT;
RUN;

PROC TRANSPOSE DATA=PROFILE OUT=T2;
BY SUBJECT;
VAR NAME SEX MARKS;
RUN;

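BECAUSE THE VAR LIST ABOVE MIXES CHARACTER (NAME, SEX) AND NUMERIC (MARKS) VARIABLES, ALL THE TRANSPOSED COLUMNS IN T2 ARE CHARACTER.

BY USING THE PREFIX= OPTION WE CAN NAME THE NEW VARIABLES: COL1, COL2 ETC CHANGE TO THE PREFIX FOLLOWED BY A NUMBER (RICKY1, RICKY2 ETC).

PROC TRANSPOSE DATA=PROFILE OUT=T2 PREFIX=RICKY;
BY SUBJECT;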
VAR NAME SEX MARKS;
RUN;

ID STATEMENT: THIS STATEMENT TURNS THE VALUES OF A VARIABLE INTO THE NAMES OF THE NEW VARIABLES. THE ID VALUES MUST BE UNIQUE (WITHIN EACH BY GROUP).

PROC TRANSPOSE DATA=PROFILE OUT=T3;
ID NAME;
VAR SUBJECT SEX MARKS HEIGHT WEIGHT;
RUN;
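
THE ID STATEMENT IS MOST OFTEN COMBINED WITH BY TO TURN "LONG" DATA INTO "WIDE" DATA. IN THIS SKETCH THE DATASETS LONG AND WIDE AND THEIR VALUES ARE MADE UP FOR THE EXAMPLE; WIDE ENDS UP WITH ONE ROW PER STUDENT AND THE COLUMNS T1 AND T2.

```sas
DATA LONG;
   INPUT STUDENT $ TEST $ SCORE;
   DATALINES;
AMY T1 70
AMY T2 75
BOB T1 60
BOB T2 68
;
RUN;

PROC TRANSPOSE DATA=LONG OUT=WIDE(DROP=_NAME_);
   BY STUDENT;    /* LONG IS ALREADY IN STUDENT ORDER          */
   ID TEST;       /* THE VALUES T1 AND T2 BECOME COLUMN NAMES  */
   VAR SCORE;
RUN;
```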
PROC MEANS

PROC MEANS: THIS PROCEDURE IS USED FOR NUMERICAL ANALYSIS. THE DEFAULT STATISTICS WE GET IN PROC MEANS ARE
1. N (NUMBER OF NON-MISSING VALUES)
2. MEAN (THE AVERAGE VALUE)
3. STD (STANDARD DEVIATION)
4. MIN (THE SMALLEST VALUE)
5. MAX (THE LARGEST VALUE)

SYNTAX:
PROC MEANS DATA=DATASET_NAME;
VAR VARIABLES;
RUN;

DATA RACING;
INPUT RACER $ 5-14 BIKE $ 15-33 DISTANCE TIME_MIN;
DATALINES;
MIKE MV Agusta F3 326 15
ROGER Kawasaki 300 25
ELLON KTM RC250 312 19
JAMES Aprilia RSV4 RF 253 45
JACK CHI Ducati Panigale R 340 8
;
RUN;

PROC MEANS DATA=RACING;
VAR DISTANCE TIME_MIN;
RUN;

data profile;
infile datalines;
input Name $ Sex $ Subject $:15. Age Marks Height Weight;
datalines;
Ellen M Mathematics 15 66 153 55
Sam M English 12 88 169 70
cristana F Social 14 56 165 60
Maria F Science 15 71 160 66
John M French 13 45 154 64
Victoria F Mathematics 11 57 159 71
Britney F Arabic 19 64 161 59
Abdul M English 13 47 162 54
Vikram M Social 17 47 144 72
Alisha F French 18 22 166 74
Mona F Arabic 11 87 156 45
William M Science 16 55 161 77
Alberta F English 14 61 162 71
;
run;

BY STATEMENT: IT PRODUCES A SEPARATE TABLE OF STATISTICS FOR EACH BY GROUP. THE DATA MUST BE SORTED BY THE BY VARIABLE FIRST.

PROC SORT DATA=PROFILE;
BY SEX;
RUN;

PROC MEANS DATA=PROFILE;
VAR MARKS HEIGHT WEIGHT;
BY SEX;
RUN;

TO STORE THE STATISTICS IN A DATASET, WE USE THE OUTPUT STATEMENT WITH OUT=.
PROC MEANS DATA=PROFILE;
VAR MARKS HEIGHT WEIGHT;
OUTPUT OUT=P1;
BY SEX;
RUN;

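TO GET THE DESIRED STATISTICS, WE LIST THEIR KEYWORDS IN THE PROC MEANS STATEMENT. THESE KEYWORDS CONTROL THE PRINTED TABLE; AN OUTPUT STATEMENT WITHOUT STATISTIC KEYWORDS STILL WRITES THE DEFAULT STATISTICS (N, MIN, MAX, MEAN, STD) TO THE OUTPUT DATASET.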
PROC MEANS DATA=PROFILE N VAR NMISS MEAN SUM MAX MIN SKEWNESS;
VAR MARKS HEIGHT WEIGHT;
OUTPUT OUT=P2;
BY SEX;
RUN;
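
TO CHOOSE WHICH STATISTICS GO INTO THE OUTPUT DATASET, NAME THEM IN THE OUTPUT STATEMENT. A SKETCH (P3 IS A HYPOTHETICAL NAME; PROFILE IS ALREADY SORTED BY SEX ABOVE): AUTONAME BUILDS THE NEW VARIABLE NAMES FROM THE VARIABLE AND THE STATISTIC, FOR EXAMPLE MARKS_MEAN, AND NOPRINT SUPPRESSES THE PRINTED TABLE.

```sas
PROC MEANS DATA=PROFILE NOPRINT;
   VAR MARKS HEIGHT WEIGHT;
   BY SEX;
   OUTPUT OUT=P3 MEAN= MAX= / AUTONAME;   /* E.G. MARKS_MEAN, MARKS_MAX */
RUN;
```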

CLASS STATEMENT: THIS STATEMENT GROUPS THE STATISTICS BY THE VALUES OF THE CLASS VARIABLE(S) IN A SINGLE TABLE. UNLIKE BY, IT DOES NOT REQUIRE THE DATA TO BE SORTED.
PROC MEANS DATA=PROFILE N VAR NMISS MEAN;
CLASS SEX;
VAR MARKS HEIGHT WEIGHT;
RUN;

PROC MEANS DATA=PROFILE;
VAR MARKS HEIGHT AGE;
CLASS SEX;
BY SEX;
RUN;
PROC FREQUENCY

DATA SCORES;
INPUT NAME $ SUBJECT $ MARKS1-MARKS3;
DATALINES;
SAM MATHS 10 12 13
PARVEZ ANTRO 19 10 25
ENNA PHYSC 21 15 13
ESKANA SOCIO 10 12 3
REGIA SCIENCE 12 5 6
;
RUN;

PROC FREQ IS USED FOR COUNTING THE OCCURRENCES OF EACH VALUE OF A VARIABLE. THE DEFAULT VALUES WHICH WE GET ARE FREQUENCY, PERCENTAGE, CUMULATIVE FREQUENCY AND CUMULATIVE PERCENTAGE.
FREQUENCY TABLES ARE OF TWO TYPES:
1. ONE-WAY FREQUENCY TABLES AND
2. CROSS-TABULATION (TWO-WAY) FREQUENCY TABLES

PROC FREQ DATA=SCORES;
TABLES NAME MARKS1 MARKS2;
RUN;

NOCUM OPTION: THIS OPTION REMOVES THE CUMULATIVE FREQUENCIES AND PERCENTAGES.

PROC FREQ DATA=SCORES;
TABLES NAME MARKS1 MARKS2/NOCUM;
RUN;

NOFREQ OPTION: THIS OPTION REMOVES THE FREQUENCY VALUES.

PROC FREQ DATA=SCORES;
TABLES NAME MARKS1 MARKS2/NOFREQ;
RUN;

NOPERCENT OPTION: THIS OPTION REMOVES THE PERCENTAGE VALUES.

PROC FREQ DATA=SCORES;
TABLES NAME MARKS1 MARKS2/NOPERCENT;
RUN;

CROSS TABULATION: BY USING CROSS TABULATION (VARIABLE1*VARIABLE2) WE GENERATE A TWO-WAY TABLE.
PROC FREQ DATA=SCORES;
TABLES NAME *MARKS1 NAME*MARKS2;
RUN;
PROC FREQ DATA=SCORES;
TABLES NAME*MARKS1 NAME*MARKS2/NOCUM;
RUN;

NOROW OPTION: THIS OPTION REMOVES THE ROW PERCENTAGES. IT IS USED IN CROSS TABULATION, AS YOU CAN SEE IN THE RESULT VIEWER WINDOW.
PROC FREQ DATA=SCORES;
TABLES NAME*MARKS1 NAME*MARKS2/NOROW;
RUN;
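
THE SUPPRESSION OPTIONS CAN BE COMBINED. A SHORT SKETCH THAT LEAVES ONLY THE FREQUENCY COUNT IN EACH CELL OF THE CROSS TABULATION:

```sas
PROC FREQ DATA=SCORES;
   TABLES SUBJECT*MARKS1 / NOROW NOCOL NOPERCENT;   /* EACH CELL SHOWS ONLY ITS COUNT */
RUN;
```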

LIST OPTION: THIS OPTION DISPLAYS A CROSS TABULATION IN A SIMPLE LIST FORM WITH FREQUENCY, PERCENTAGE, CUMULATIVE FREQUENCY AND CUMULATIVE PERCENTAGE COLUMNS.
PROC FREQ DATA=SCORES;
TABLES NAME*MARKS1/LIST;
RUN;

OUT OPTION: IT STORES THE FREQUENCY COUNTS AND PERCENTAGES IN A DATASET. NOPRINT SUPPRESSES THE PRINTED TABLE.
PROC FREQ DATA=SCORES NOPRINT;
TABLES NAME*MARKS1/OUT=T1;
RUN;
PROC PRINT

PROC PRINT DATA=KEY1;
RUN;

NOOBS: THIS OPTION REMOVES THE OBS COLUMN.

PROC PRINT DATA= KEY1 NOOBS;
RUN;

DOUBLE: THIS OPTION INSERTS A BLANK LINE BETWEEN THE OBSERVATIONS (DOUBLE SPACING) IN LISTING OUTPUT.
PROC PRINT DATA= KEY1 DOUBLE;
RUN;

HEADING=VERTICAL: PRINTS THE COLUMN HEADINGS (VARIABLE NAMES)
VERTICALLY.
PROC PRINT DATA=KEY1 HEADING= VERTICAL;
RUN;
N: PRINTS THE NUMBER OF OBSERVATIONS AT THE END OF THE REPORT.
PROC PRINT DATA=KEY1 N;
RUN;

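WIDTH=FULL/MINIMUM: CONTROLS THE COLUMN WIDTHS IN THE LISTING
OUTPUT. WIDTH=FULL USES EACH VARIABLE'S FORMATTED WIDTH; WIDTH=MINIMUM
USES THE SMALLEST WIDTH THAT FITS THE VALUES ON THE PAGE.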
PROC PRINT DATA=KEY1 WIDTH=FULL;
RUN;

VAR STATEMENT: SELECTS THE VARIABLES TO PRINT AND THE ORDER IN WHICH
THEY APPEAR.
PROC PRINT DATA=KEY1;
VAR NAME SEX AGE SUBJECT MARKS HEIGHT;
RUN;

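ID STATEMENT: REPLACES THE OBS COLUMN WITH THE VALUES OF THE LISTED
VARIABLES, WHICH ARE PRINTED AT THE LEFT OF EACH ROW.
PROC PRINT DATA=KEY1;
ID NAME SEX AGE;
RUN;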

LABEL STATEMENT: ASSIGNS DESCRIPTIVE LABELS TO VARIABLES. THE LABEL
OPTION IN THE PROC PRINT STATEMENT TELLS SAS TO USE THE LABELS AS
COLUMN HEADINGS.
PROC PRINT DATA=KEY1 LABEL;
VAR NAME SEX AGE MARKS HEIGHT WEIGHT;
LABEL NAME="Student Name"
SEX= "Student Sex"
AGE= "Student Age";
RUN;

SPLIT= OPTION: SPECIFIES A CHARACTER THAT SPLITS A LABEL ACROSS
LINES IN THE COLUMN HEADING. SPLIT= ALSO MAKES PROC PRINT USE LABELS.
PROC PRINT DATA=KEY1 LABEL SPLIT='/';
VAR NAME SEX AGE MARKS HEIGHT WEIGHT;
LABEL NAME='STUDENT/NAME' SEX='STUDENT/SEX' AGE='STUDENT/AGE'
MARKS='STUDENT/MARKS' HEIGHT='STUDENT/HEIGHT'
WEIGHT='STUDENT/WEIGHT';
RUN;

PROC PRINT DATA=KEY1 LABEL SPLIT='/' WIDTH=FULL;
VAR NAME SEX AGE MARKS HEIGHT WEIGHT;
LABEL NAME='STUDENT/NAME' SEX='STUDENT/SEX' AGE='STUDENT/AGE'
MARKS='STUDENT/MARKS' HEIGHT='STUDENT/HEIGHT'
WEIGHT='STUDENT/WEIGHT';
RUN;

SUM STATEMENT: PRINTS THE TOTAL OF EACH LISTED NUMERIC VARIABLE AT
THE END OF THE REPORT.
PROC PRINT DATA=KEY1 LABEL SPLIT='/' WIDTH=FULL;
VAR NAME SEX AGE MARKS HEIGHT WEIGHT;
LABEL NAME='STUDENT/NAME' SEX='STUDENT/SEX' AGE='STUDENT/AGE'
MARKS='STUDENT/MARKS' HEIGHT='STUDENT/HEIGHT'
WEIGHT='STUDENT/WEIGHT';
SUM MARKS;
RUN;

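VARIABLES GROUPING
VARIABLES ARE GROUPED IN TWO TYPES:
1. GROUPING VARIABLES (LISTED IN THE BY STATEMENT)
2. CALCULATION VARIABLES (LISTED IN THE SUM STATEMENT)
THE DATA MUST BE SORTED BY THE GROUPING VARIABLE BEFORE IT IS USED
IN A BY STATEMENT.
PROC SORT DATA=KEY1 OUT=A1;
BY SEX;
RUN;
GROUPED BY SEX, WITH MARKS TOTALLED FOR EACH GROUP: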
PROC PRINT DATA=A1;
SUM MARKS;
BY SEX;
RUN;

PAGEBY STATEMENT: STARTS A NEW PAGE WHENEVER THE VALUE OF THE NAMED
VARIABLE CHANGES. THE VARIABLE MUST ALSO APPEAR IN THE BY STATEMENT.
PROC PRINT DATA=A1;
SUM MARKS;
BY SEX;
PAGEBY SEX;
RUN;
PROC REPORT

PROC REPORT COMBINES FEATURES OF PROC PRINT, PROC MEANS, PROC FREQ
AND PROC TABULATE, AND OFFERS MORE ADVANCED FEATURES THAN PROC PRINT.

SYNTAX FOR PROC REPORT:
1. PROC REPORT;
RUN;

WITHOUT DATA=, THE ABOVE PROGRAM REPORTS ON THE MOST RECENTLY
CREATED DATASET.

EXAMPLE. THE & MODIFIER LETS SUBJECT CONTAIN SINGLE BLANKS (SUCH AS
SOCIAL STUDIES); AT LEAST TWO BLANKS MUST SEPARATE IT FROM AGE IN
THE DATA LINES.

DATA STANDARD;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT & $15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS  15 66 153 55
SAM M ENGLISH  12 88 169 70
CRISTANA F SOCIAL STUDIES  14 56 165 60
MARIA F GENERAL SCIENCE  15 71 160 66
JOHN M FRENCH  13 45 154 64
VICTORIA F MATHEMATICS  11 57 159 71
BRITNEY F ARABIC  19 64 161 59
ABDUL M ENGLISH  13 47 162 54
VIKRAM M SOCIAL STUDIES  17 47 144 72
ALISHA F FRENCH  18 22 166 74
MONA F ARABIC  11 87 156 45
WILLIAM M GENERAL SCIENCE  16 55 161 77
ALBERTA F ENGLISH  14 61 162 71
;
RUN;
PROC REPORT;
RUN;

2. PROC REPORT DATA=DATASET-NAME;
RUN;
THIS PROGRAM GENERATES A REPORT OF THE SPECIFIED DATASET.
EXAMPLE:
PROC REPORT DATA=STANDARD;
RUN;

OPTIONS
BOX: DRAWS LINES AROUND THE REPORT AND BETWEEN ITS ROWS AND COLUMNS
IN THE OUTPUT (LISTING) WINDOW.
PROC REPORT DATA=STANDARD BOX;
RUN;

NOHEADER: SUPPRESSES THE COLUMN HEADERS (VARIABLE NAMES OR LABELS).
PROC REPORT DATA=STANDARD NOHEADER;
RUN;

HEADLINE: UNDERLINES THE COLUMN HEADERS. THE RESULT CAN BE SEEN IN
THE OUTPUT WINDOW.
PROC REPORT DATA=STANDARD HEADLINE;
RUN;

HEADSKIP: WRITES A BLANK LINE BETWEEN THE COLUMN HEADERS AND THE
FIRST ROW OF DATA. THE RESULT CAN BE SEEN IN THE OUTPUT WINDOW.
PROC REPORT DATA=STANDARD HEADSKIP;
RUN;

NOWD OR NOWINDOWS: RUNS PROC REPORT WITHOUT OPENING THE INTERACTIVE
REPORT WINDOW. IT WAS NEEDED IN SAS 9.2 AND EARLIER VERSIONS, WHERE
THE REPORT WINDOW OPENED BY DEFAULT IN AN INTERACTIVE SESSION.
PROC REPORT DATA=STANDARD NOWD;
RUN;

OUT= OPTION: WRITES THE ROWS OF THE REPORT TO A DATASET.
PROC REPORT DATA=STANDARD OUT=SET1;
RUN;

PAGESIZE= OR PS=: SETS THE NUMBER OF LINES PER PAGE IN THE LISTING
OUTPUT. THE VALUE CAN RANGE FROM 15 TO 32,767; IF IT IS OMITTED, THE
PAGESIZE= SYSTEM OPTION IS USED. A LARGER VALUE HELPS WHEN THE DATA
IS VERY LARGE.
PROC REPORT DATA=STANDARD PS=68;
RUN;

LINESIZE= OR LS=: SETS THE NUMBER OF CHARACTERS PER LINE IN THE
LISTING OUTPUT. THE VALUE CAN RANGE FROM 64 TO 256; IF IT IS
OMITTED, THE LINESIZE= SYSTEM OPTION IS USED.
PROC REPORT DATA=STANDARD LS=80;
RUN;

NOCENTER: LEFT-ALIGNS THE REPORT ON THE PAGE; WITHOUT IT THE REPORT
IS CENTERED.
PROC REPORT DATA=STANDARD NOCENTER;
RUN;

NAMED: WRITES NAME= IN FRONT OF EACH VALUE (FOR EXAMPLE NAME=ELLEN
SEX=M) AND SUPPRESSES THE COLUMN HEADERS.
PROC REPORT DATA=STANDARD NAMED;
RUN;
NOEXEC: THE PROGRAM IS NOT EXECUTED.
PROC REPORT DATA=STANDARD NOEXEC;
RUN;

STATEMENTS
COLUMN: SIMILAR TO THE VAR STATEMENT OF PROC PRINT; IT LISTS THE
VARIABLES TO REPORT IN THE DESIRED ORDER.
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX MARKS;
RUN;

_CHAR_: DISPLAYS ALL THE CHARACTER VARIABLES.
PROC REPORT DATA=STANDARD;
COLUMN _CHAR_;
RUN;

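_NUMERIC_: LISTS ALL THE NUMERIC VARIABLES. NUMERIC VARIABLES ARE
ANALYSIS VARIABLES BY DEFAULT, SO WITHOUT A GROUPING VARIABLE THE
REPORT SHOWS A SINGLE ROW WITH THE SUM OF EACH.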
PROC REPORT DATA=STANDARD;
COLUMN _NUMERIC_;
RUN;

_ALL_: PRINTS ALL THE VARIABLES IN THE OUTPUT.
PROC REPORT DATA=STANDARD;
COLUMN _ALL_;
RUN;

DEFINE: SETS A COLUMN'S USAGE, LABEL, FORMAT, WIDTH, SPACING, ORDER,
STYLE, ALIGNMENT, ETC.
PROC REPORT DATA=STANDARD;
COLUMN NAME SUBJECT MARKS HEIGHT;
DEFINE MARKS/'STUDENT MARKS';
RUN;
IN THE ABOVE PROGRAM IN THE DEFINE STATEMENT VARIABLE MARKS
IS LABELLED AS 'STUDENT MARKS'

PROC REPORT DATA=STANDARD;
COLUMN NAME SUBJECT MARKS HEIGHT;
DEFINE MARKS/'STUDENT MARKS' 'AS PER RULE';
RUN;

IN THE ABOVE PROGRAM EACH QUOTED STRING BECOMES A SEPARATE LINE OF
THE HEADER. SPLIT=: SPECIFIES A CHARACTER AT WHICH THE LABEL IS
SPLIT; IN THE BELOW PROGRAM THE LABEL IS DISPLAYED IN TWO ROWS.
PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SUBJECT MARKS HEIGHT;
DEFINE MARKS/'STUDENT/MARKS';
RUN;

WIDTH=: SETS THE WIDTH OF THE COLUMN IN CHARACTERS.
SPACING=: SETS THE NUMBER OF BLANKS BETWEEN THE COLUMN AND THE
COLUMN TO ITS LEFT. THESE TWO OPTIONS AFFECT ONLY THE LISTING OUTPUT.
PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SUBJECT MARKS HEIGHT;
DEFINE MARKS/'STUDENT/MARKS' WIDTH=10 SPACING=5;
RUN;

ADDING FORMATS TO THE REPORT
PROC FORMAT;
VALUE $DARK 'M'='MALE'
'F'='FEMALE';
RUN;

PROC REPORT DATA=STANDARD;
COLUMN NAME SEX MARKS HEIGHT WEIGHT;
DEFINE SEX/'STUDENT GENDER' WIDTH=18 SPACING=7 FORMAT=$DARK.;
RUN;

OBSERVE THE RESULT IN THE OUTPUT WINDOW.
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX MARKS HEIGHT WEIGHT;
DEFINE MARKS/ FORMAT = DOLLAR6.2;
RUN;

MARKS ARE NOT MONEY, SO DOLLAR6.2 IS NOT A MEANINGFUL FORMAT HERE;
THE PROGRAM ONLY SHOWS HOW TO APPLY A NUMERIC FORMAT.

USING THE DISPLAY AND ANALYSIS (SUM) OPTIONS IN THE DEFINE
STATEMENT.

DISPLAY IS THE DEFAULT USAGE FOR CHARACTER VARIABLES. IT CAN ALSO BE
USED FOR A NUMERIC VARIABLE WHEN NO CALCULATION IS REQUIRED. NUMERIC
VARIABLES DEFAULT TO ANALYSIS WITH THE SUM STATISTIC, AND ANALYSIS
CAN BE USED ONLY WITH NUMERIC VARIABLES.
PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;
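THE STATISTIC OF AN ANALYSIS VARIABLE CAN BE CHANGED FROM THE DEFAULT SUM BY NAMING ANOTHER STATISTIC IN THE DEFINE STATEMENT. A MINIMAL SKETCH, ASSUMING THE STANDARD DATASET ABOVE; THE LABELS ARE ILLUSTRATIVE. BECAUSE EVERY COLUMN IS AN ANALYSIS VARIABLE, THE REPORT SHOULD CONTAIN A SINGLE SUMMARY ROW.

```sas
PROC REPORT DATA=STANDARD;
COLUMN MARKS HEIGHT WEIGHT;
/* MEAN, MAX AND MIN REPLACE THE DEFAULT SUM STATISTIC */
DEFINE MARKS/ANALYSIS MEAN 'AVERAGE MARKS' FORMAT=6.1;
DEFINE HEIGHT/ANALYSIS MAX 'TALLEST';
DEFINE WEIGHT/ANALYSIS MIN 'LIGHTEST';
RUN;
```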

THE SUM STATISTIC IS DISCUSSED LATER IN THIS CHAPTER.

ALIGNMENT: THE LEFT, RIGHT AND CENTER OPTIONS IN THE DEFINE
STATEMENT ALIGN THE VALUES OF THAT COLUMN.
PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY RIGHT;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK. RIGHT;
DEFINE AGE/'STUDENT AGE' WIDTH=15 LEFT DISPLAY ;
DEFINE MARKS/WIDTH=10 CENTER;
RUN;

ALIGNMENT CAN ALSO BE SET WITH THE STYLE OPTION, WHICH AFFECTS ODS
OUTPUT SUCH AS HTML, PDF AND RTF. WHEN STYLE IS WRITTEN IN THE PROC
STATEMENT IT APPLIES TO ALL THE COLUMNS OR HEADERS.

WHEN WRITTEN IN A PARTICULAR DEFINE STATEMENT IT APPLIES ONLY TO
THAT VARIABLE. STYLE CAN ALSO BE APPLIED TO A SPECIFIC HEADER.
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;

PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY STYLE(COLUMN)=[JUST=LEFT];
RUN;

PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY
STYLE(HEADER)=[JUST=LEFT];
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;

PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY
STYLE(HEADER)=[JUST=LEFT]
STYLE(COLUMN)=[JUST=RIGHT];
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;

APPLYING COLORS USING THE STYLE OPTION
PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY
STYLE(HEADER)=[JUST=LEFT BACKGROUND=BLUE FOREGROUND=GREEN]
STYLE(COLUMN)=[JUST=RIGHT BACKGROUND=YELLOW FOREGROUND=PINK];
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;

WHEN WRITTEN IN THE PROC STATEMENT, STYLE APPLIES TO THE WHOLE
HEADER OR TO ALL THE COLUMNS.
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(HEADER)=[JUST=LEFT
BACKGROUND=BLUE FOREGROUND=GREEN]
STYLE(COLUMN)=[JUST=RIGHT BACKGROUND=YELLOW FOREGROUND=PINK];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;

APPLYING FONT SIZE, FONT WEIGHT, FONT STYLE AND FONT FACE USING THE
STYLE OPTION
PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY
STYLE(HEADER)=[JUST=LEFT BACKGROUND=BLUE FOREGROUND=GREEN
FONT_STYLE=ITALIC FONT_FACE='TIMES NEW ROMAN' FONT_SIZE=5]
STYLE(COLUMN)=[JUST=RIGHT BACKGROUND=YELLOW FOREGROUND=PINK
FONT_WEIGHT=BOLD FONT_FACE='CALIBRI' FONT_SIZE=5];
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 DISPLAY
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;

OPTIONS USED IN DEFINE STATEMENTS
1. DISPLAY: ALREADY DISCUSSED.
2. GROUP: CONSOLIDATES ROWS THAT HAVE THE SAME VALUE INTO ONE ROW.
3. ANALYSIS: CALCULATES ARITHMETIC OR STATISTICAL VALUES (SUM BY
DEFAULT).
4. COMPUTED: THE VALUE IS CALCULATED IN A COMPUTE BLOCK.
5. ACROSS: CREATES A COLUMN FOR EACH UNIQUE VALUE OF THE VARIABLE.
6. DESCENDING: DISPLAYS THE VALUES OF A GROUP, ORDER OR ACROSS
VARIABLE IN DESCENDING ORDER.
7. ORDER: SORTS THE ROWS BY THE VALUES OF THE VARIABLE AND SHOWS
EACH VALUE ONLY ONCE PER BLOCK OF IDENTICAL VALUES.
EXAMPLES OF GROUP
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 GROUP
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;

PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 GROUP
FORMAT=$DARK.
STYLE(HEADER)=[JUST=LEFT BACKGROUND=GREEN FOREGROUND=RED
FONT_STYLE=ITALIC FONT_FACE='TIMES NEW ROMAN' FONT_SIZE=5]
STYLE(COLUMN)=[JUST=RIGHT BACKGROUND=YELLOW FOREGROUND=MAGENTA
FONT_WEIGHT=BOLD FONT_FACE='CALIBRI' FONT_SIZE=5];
DEFINE AGE/'STUDENT AGE' DISPLAY;
RUN;
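IN THE GROUP EXAMPLES ABOVE, NAME AND AGE ARE DISPLAY VARIABLES, SO PROC REPORT CANNOT COLLAPSE ROWS; IT SHOULD TREAT SEX AS AN ORDER VARIABLE AND WRITE A NOTE TO THE LOG. ROWS ARE CONSOLIDATED ONLY WHEN EVERY OTHER COLUMN IS A GROUP, ACROSS, ANALYSIS OR STATISTIC COLUMN. A MINIMAL SKETCH OF THAT CASE, ASSUMING THE STANDARD DATASET AND THE $DARK. FORMAT; THE N COLUMN (A COUNT OF OBSERVATIONS) IS NOT DISCUSSED IN THE TEXT.

```sas
PROC REPORT DATA=STANDARD;
/* N IS A STATISTIC COLUMN: NUMBER OF STUDENTS IN EACH GROUP */
COLUMN SEX N MARKS;
/* ONE ROW PER DISTINCT VALUE OF SEX */
DEFINE SEX/GROUP 'STUDENT GENDER' FORMAT=$DARK.;
DEFINE N/'STUDENTS';
/* MARKS IS SUMMED WITHIN EACH GROUP */
DEFINE MARKS/ANALYSIS SUM 'TOTAL MARKS';
RUN;
```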

EXAMPLES OF ANALYSIS
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 GROUP
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' ANALYSIS;
RUN;

EXAMPLES OF DESCENDING
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 GROUP
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' ORDER DESCENDING;
RUN;

EXAMPLE OF ACROSS
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' WIDTH=18 SPACING=10 DISPLAY;
DEFINE SEX/'STUDENT/GENDER' WIDTH=15 SPACING=5 GROUP
FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' ACROSS;
RUN;
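ACROSS IS MOST USEFUL WHEN AN ANALYSIS VARIABLE IS STACKED UNDER IT WITH A COMMA IN THE COLUMN STATEMENT. A SKETCH, ASSUMING THE STANDARD DATASET AND THE $DARK. FORMAT: EACH SUBJECT BECOMES ONE ROW AND EACH SEX BECOMES A COLUMN SHOWING THE TOTAL MARKS.

```sas
PROC REPORT DATA=STANDARD;
/* SEX,MARKS PLACES A MARKS COLUMN UNDER EACH VALUE OF SEX */
COLUMN SUBJECT SEX,MARKS;
DEFINE SUBJECT/GROUP 'SUBJECT';
DEFINE SEX/ACROSS 'GENDER' FORMAT=$DARK.;
DEFINE MARKS/ANALYSIS SUM 'MARKS';
RUN;
```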

BREAK STATEMENT
THE BREAK STATEMENT IS WRITTEN AS BREAK BEFORE OR BREAK AFTER,
FOLLOWED BY A GROUP OR ORDER VARIABLE AND THE BREAK OPTIONS.

BREAK BEFORE IS USEFUL FOR GETTING A SUMMARY BEFORE EACH GROUP OR
ORDER VALUE.

BREAK AFTER IS USEFUL FOR GETTING A SUMMARY AFTER EACH GROUP OR
ORDER VALUE.

SUMMARIZE: WRITES A SUMMARY LINE WITH THE TOTAL (SUM) OF EACH
ANALYSIS VARIABLE FOR THE GROUP OR ORDER VALUE.
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK BEFORE SEX/SUMMARIZE;
RUN;

OL: DRAWS A LINE ABOVE THE SUMMARY. CHECK THE BELOW PROGRAM IN
THE OUTPUT WINDOW.
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK BEFORE SEX/SUMMARIZE OL;
RUN;

UL: DRAWS A LINE BELOW THE SUMMARY. CHECK THE BELOW PROGRAM IN
THE OUTPUT WINDOW.
PROC REPORT DATA=STANDARD SPLIT='/' STYLE(COLUMN)=[JUST=CENTER];
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP FORMAT=$DARK.;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK BEFORE SEX/SUMMARIZE UL;
RUN;

COMBINING THE OL AND UL
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK BEFORE SEX/SUMMARIZE OL UL;
RUN;

DOL AND DUL: DRAW DOUBLE LINES ABOVE AND BELOW THE SUMMARY LINE.
CHECK THE BELOW PROGRAM IN THE OUTPUT WINDOW.
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK BEFORE SEX/SUMMARIZE DOL DUL STYLE=[BACKGROUND=PINK
FOREGROUND=RED];
RUN;

SKIP: WRITES A BLANK LINE AFTER THE SUMMARY LINE. OBSERVE THE RESULT
IN THE OUTPUT WINDOW.
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK BEFORE SEX/SUMMARIZE SKIP;
RUN;

PAGE: STARTS A NEW PAGE FOR EACH GROUP.
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK BEFORE SEX/SUMMARIZE PAGE;
RUN;

EXAMPLE OF BREAK AFTER
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK AFTER SEX/SUMMARIZE;
RUN;

RBREAK: WRITES A SUMMARY ONLY ONCE, AT THE BEGINNING (RBREAK BEFORE)
OR END (RBREAK AFTER) OF THE WHOLE REPORT. THE MAIN DIFFERENCE IS
THAT BREAK REQUIRES A GROUP OR ORDER VARIABLE WHEREAS RBREAK DOES
NOT. (BREAK IS SIMILAR TO A SUBTOTAL; RBREAK IS SIMILAR TO A GRAND
TOTAL.)
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/'STUDENT/NAME' DISPLAY;
DEFINE SEX/'STUDENT/GENDER' GROUP;
DEFINE AGE/'STUDENT AGE' DISPLAY;
DEFINE MARKS/ ORDER ANALYSIS;
BREAK AFTER SEX/SUMMARIZE;
RBREAK AFTER/SUMMARIZE;
RUN;

OBSERVE THE BREAK ROWS: BECAUSE MARKS IS THE ONLY ANALYSIS VARIABLE,
THEY SHOW ONLY THE TOTAL OF MARKS.
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/DISPLAY WIDTH=15;
DEFINE SEX/ GROUP;
DEFINE AGE/ DISPLAY;
DEFINE MARKS/ ANALYSIS;
DEFINE HEIGHT/DISPLAY;
DEFINE WEIGHT/DISPLAY;
BREAK AFTER SEX/SUMMARIZE;
RBREAK AFTER/SUMMARIZE;
RUN;

COMPUTE STATEMENT:
A COMPUTE BLOCK WORKS LIKE A SMALL DATA STEP INSIDE PROC REPORT. IT
CAN PERFORM ARITHMETIC OR STATISTICAL OPERATIONS, RUN SOME
CONDITIONAL STATEMENTS AND CREATE NEW (COMPUTED) VARIABLES. A
COMPUTE BLOCK STARTS WITH COMPUTE REPORT-ITEM, COMPUTE BEFORE OR
COMPUTE AFTER (OPTIONALLY FOLLOWED BY A GROUP OR ORDER VARIABLE),
ENDS WITH AN ENDCOMP STATEMENT AND MUST BE WRITTEN BEFORE THE RUN
STATEMENT.
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/DISPLAY WIDTH=15;
DEFINE SEX/ GROUP;
DEFINE AGE/ DISPLAY;
DEFINE MARKS/ ANALYSIS;
DEFINE HEIGHT/DISPLAY;
DEFINE WEIGHT/DISPLAY;
BREAK AFTER SEX/SUMMARIZE;
RBREAK AFTER/SUMMARIZE;
COMPUTE AFTER SEX;
NAME='TOTAL';
ENDCOMP;
RUN;

CONVERTING WEIGHT FROM KG TO LBS BY USING A COMPUTE BLOCK
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT WTLB;
DEFINE NAME/DISPLAY WIDTH=15;
DEFINE SEX/ GROUP;
DEFINE AGE/ DISPLAY;
DEFINE MARKS/ ANALYSIS;
DEFINE HEIGHT/DISPLAY;
DEFINE WEIGHT/DISPLAY;
DEFINE WTLB/COMPUTED 'WEIGHT IN LBS' FORMAT=6.2;
BREAK AFTER SEX/SUMMARIZE;
RBREAK AFTER/SUMMARIZE;
COMPUTE WTLB;
WTLB= WEIGHT*2.204;
ENDCOMP;
RUN;

PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT WTLB HTM;
DEFINE NAME/DISPLAY WIDTH=15;
DEFINE SEX/ GROUP;
DEFINE AGE/ DISPLAY;
DEFINE MARKS/ ANALYSIS;
DEFINE HEIGHT/DISPLAY;
DEFINE WEIGHT/DISPLAY;
DEFINE WTLB/COMPUTED 'WEIGHT IN LBS' FORMAT=6.2;
DEFINE HTM/COMPUTED 'HEIGHT IN METERS' FORMAT=5.2;
BREAK AFTER SEX/SUMMARIZE;
RBREAK AFTER/SUMMARIZE;
COMPUTE WTLB;
WTLB= WEIGHT*2.204;
ENDCOMP;
COMPUTE HTM;
HTM=HEIGHT/100;
ENDCOMP;
RUN;
WRITING CONDITIONAL STATEMENTS IN COMPUTE STATEMENT
PROC REPORT DATA=STANDARD SPLIT='/';
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT QUALIFY;
DEFINE NAME/DISPLAY WIDTH=15;
DEFINE SEX/ GROUP;
DEFINE AGE/ DISPLAY;
DEFINE MARKS/DISPLAY;
DEFINE HEIGHT/DISPLAY;
DEFINE WEIGHT/DISPLAY;
DEFINE QUALIFY/COMPUTED 'PASS/FAIL' WIDTH=10;
BREAK AFTER SEX/SUMMARIZE;
RBREAK AFTER/SUMMARIZE;
COMPUTE QUALIFY / CHARACTER LENGTH=10;
IF MARKS >= 55 THEN QUALIFY='PASS';
ELSE QUALIFY='FAIL';
ENDCOMP;
RUN;

CALL DEFINE: SETS A STYLE FOR PART OF THE REPORT. _ROW_ APPLIES THE
STYLE TO THE WHOLE ROW; _COL_ APPLIES IT ONLY TO THE CELL OF THE
COLUMN BEING COMPUTED (HERE, SEX).
PROC REPORT DATA=STANDARD;
COLUMN NAME SEX AGE MARKS HEIGHT WEIGHT;
DEFINE NAME/DISPLAY WIDTH=15;
DEFINE SEX/ DISPLAY;
DEFINE AGE/ DISPLAY;
DEFINE MARKS/ ANALYSIS;
DEFINE HEIGHT/DISPLAY;
DEFINE WEIGHT/DISPLAY;
RBREAK AFTER/SUMMARIZE;
COMPUTE SEX;
IF SEX='M' THEN CALL
DEFINE(_ROW_,'STYLE','STYLE=[BACKGROUND=MAGENTA]');
IF SEX='F' THEN CALL
DEFINE(_COL_,'STYLE','STYLE=[BACKGROUND=GDW]');
ENDCOMP;
RUN;
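CALL DEFINE CAN ALSO HIGHLIGHT A SINGLE CELL BASED ON ITS OWN VALUE. A MINIMAL SKETCH, ASSUMING THE STANDARD DATASET; THE CUTOFF OF 50 AND THE COLOR ARE ILLUSTRATIVE CHOICES. BECAUSE MARKS IS AN ANALYSIS VARIABLE, ITS VALUE IS REFERRED TO AS MARKS.SUM INSIDE THE COMPUTE BLOCK.

```sas
PROC REPORT DATA=STANDARD;
COLUMN NAME MARKS;
DEFINE NAME/DISPLAY;
DEFINE MARKS/ANALYSIS SUM;
COMPUTE MARKS;
/* _COL_ STYLES ONLY THE MARKS CELL OF THE CURRENT ROW */
IF MARKS.SUM < 50 THEN
CALL DEFINE(_COL_,'STYLE','STYLE=[BACKGROUND=YELLOW]');
ENDCOMP;
RUN;
```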
MACROS

CREATING THE MACROS
1. SIMPLE MACRO (MACRO VARIABLE): CREATED WITH THE %LET STATEMENT.
SYNTAX: %LET VAR1=VALUE; (NO QUOTES ARE NEEDED FOR CHARACTER
VALUES; IF QUOTES ARE TYPED THEY BECOME PART OF THE VALUE.)

2. COMPLEX MACRO (MACRO PROGRAM): STARTS WITH THE %MACRO STATEMENT,
FOLLOWED BY THE MACRO NAME AND, OPTIONALLY, PARAMETERS IN
PARENTHESES. DATA STEP OR SQL CODE FOLLOWS, AND THE MACRO ENDS WITH
THE %MEND STATEMENT.
SYNTAX %MACRO NAME (PARAMETER1, PARAMETER2);
<SAS PROGRAM>
OR
<SQL PROGRAM>
%MEND;

MACRO VARIABLES ARE CREATED IN 5 DIFFERENT WAYS:
1. GLOBAL MACRO VARIABLES: CAN BE USED ANYWHERE IN THE SAS SESSION.
2. LOCAL MACRO VARIABLES: CREATED WITHIN A MACRO AND AVAILABLE ONLY
WHILE THAT MACRO IS EXECUTING.
3. %LET: CREATES A GLOBAL VARIABLE IN OPEN CODE AND A LOCAL VARIABLE
INSIDE A MACRO (UNLESS A VARIABLE WITH THE SAME NAME ALREADY EXISTS,
IN WHICH CASE THAT VARIABLE IS UPDATED).
4. CALL SYMPUT: CREATES A MACRO VARIABLE FROM A DATA STEP; IN OPEN
CODE IT IS GLOBAL.
5. INTO: CREATES A MACRO VARIABLE IN PROC SQL; IN OPEN CODE IT IS
GLOBAL. CALL SYMPUT AND INTO: ARE DISCUSSED LATER.

TO WRITE A MACRO VARIABLE'S VALUE TO THE LOG WINDOW, USE %PUT AND
REFER TO THE VARIABLE WITH AN AMPERSAND (&). &VAR1 IS A MACRO
VARIABLE REFERENCE.

%LET VAR1=100;
%PUT THE VALUES OF THE MACRO IS:&VAR1;
THIS IS A SIMPLE MACRO VARIABLE CREATED IN OPEN CODE, SO IT IS
GLOBAL. EACH %LET STATEMENT CREATES ONE MACRO VARIABLE.
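A MACRO VARIABLE REFERENCE IS RESOLVED INSIDE DOUBLE QUOTES BUT NOT INSIDE SINGLE QUOTES. A MINIMAL SKETCH, ASSUMING VAR1 FROM THE %LET ABOVE; THE DATA _NULL_ STEP CREATES NO DATASET AND ONLY WRITES TO THE LOG.

```sas
DATA _NULL_;
/* DOUBLE QUOTES: &VAR1 IS REPLACED BY ITS VALUE (100) */
PUT "DOUBLE QUOTES: &VAR1";
/* SINGLE QUOTES: THE TEXT &VAR1 IS WRITTEN UNCHANGED */
PUT 'SINGLE QUOTES: &VAR1';
RUN;
```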

COMPLEX MACRO:
%MACRO SORTTING(NAME,VARIABLE);
PROC SORT DATA=&NAME;
BY &VARIABLE;
RUN;
%MEND;

%SORTTING(E_11,EMP_ID);

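COMPLEX MACRO: CREATING A LOCAL MACRO VARIABLE WITHIN A MACRO.
%MACRO COMPLEX;
%LET PRICE=100;
%LET PRODUCT=TABLET;
%MEND;
%COMPLEX;
%PUT &PRICE;
%PUT &PRODUCT;
PRICE AND PRODUCT ARE LOCAL TO COMPLEX AND ARE DELETED WHEN THE
MACRO ENDS, SO (ASSUMING NO GLOBAL VARIABLES WITH THESE NAMES EXIST)
THE TWO %PUT STATEMENTS ABOVE CANNOT RESOLVE THEM AND THE LOG SHOWS
'WARNING: Apparent symbolic reference PRICE not resolved.'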

%MACRO COMPLEX;
%LET PRICE=100;
%LET PRODUCT=TABLET;
%PUT THE PRICE IS &PRICE AND THE PRODUCT IS &PRODUCT;
%MEND;
%COMPLEX;THIS STATEMENT RESOLVES THE MACRO.

TO MAKE THE VARIABLES CREATED INSIDE A MACRO GLOBAL, DECLARE THEM
WITH %GLOBAL INSIDE THE MACRO.
%MACRO COMPLEX1;
%GLOBAL RATE ITEM;
%LET RATE=100;
%LET ITEM=CAKE;
%MEND;
%COMPLEX1;
%PUT THE PRICE IS &RATE AND THE PRODUCT IS &ITEM;

%LOCAL FORCES THE VARIABLES TO BE LOCAL TO THE MACRO, EVEN IF GLOBAL
VARIABLES WITH THE SAME NAMES ALREADY EXIST.
%MACRO COMPLEX2;
%LOCAL PRICE PRODUCT;
%LET PRICE=100;
%LET PRODUCT=TABLET;
%PUT THE PRICE IS &PRICE AND THE PRODUCT IS &PRODUCT;
%MEND;
%COMPLEX2;
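TO SEE WHICH SCOPE A MACRO VARIABLE LIVES IN, %PUT ACCEPTS THE SPECIAL KEYWORDS _LOCAL_ AND _GLOBAL_. A SKETCH; THE MACRO NAME SCOPECHK IS HYPOTHETICAL, AND WHICH GLOBAL VARIABLES ARE LISTED DEPENDS ON WHICH EARLIER EXAMPLES HAVE BEEN RUN.

```sas
%MACRO SCOPECHK;
%LOCAL PRICE;
%LET PRICE=100;
/* LISTS THE VARIABLES LOCAL TO SCOPECHK (HERE ONLY PRICE) */
%PUT _LOCAL_;
%MEND;
%SCOPECHK;
/* LISTS THE USER-CREATED GLOBAL VARIABLES, E.G. VAR1, RATE, ITEM */
%PUT _GLOBAL_;
```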

DATA RACE;
INPUT RACER $ 5-14 BIKE $ 15-33 DISTANCE TIME_MIN;
DATALINES;
MIKE MV Agusta F3 326 15
ROGER Kawasaki 300 25
ELLON KTM RC250 312 19
JAMES Aprilia RSV4 RF 253 45
JACK CHI Ducati Panigale R 340 8
;
RUN;

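THE DIFFERENCE BETWEEN COMPLEX1 AND COMPLEX2 IS SHOWN BELOW: RATE IS
GLOBAL, SO &RATE RESOLVES TO 100 IN THE FIRST DATA STEP, BUT PRICE
WAS LOCAL TO COMPLEX2, SO &PRICE CANNOT BE RESOLVED IN THE SECOND
DATA STEP AND IT FAILS.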
DATA CALCULATION;
SET RACE;
TIME_SEC=60*&RATE;
RUN;

DATA CALCULATION2;
SET RACE;
TIME_SEC=60*&PRICE;
RUN;

PARAMETERS: PARAMETERS ARE LISTED INSIDE THE PARENTHESES AND
SEPARATED BY COMMAS. THERE ARE TWO TYPES: POSITIONAL PARAMETERS AND
KEYWORD PARAMETERS.

THE BELOW PROGRAM USES POSITIONAL PARAMETERS. VALUES ARE ASSIGNED TO
POSITIONAL PARAMETERS BY THEIR POSITION, SO THEY MUST BE SUPPLIED IN
THE SAME ORDER AS IN THE %MACRO STATEMENT; OTHERWISE THE WRONG CODE
IS GENERATED AND ERRORS OCCUR.
SYNTAX %MACRO NAME(PARAMETER1,PARAMETER2);

EXAMPLE
%MACRO SPEED(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;

%SPEED(NEW,RACE,DISTANCE,TIME_MIN,SPEED);

EXAMPLE
%MACRO IMPORTING(PATH,DATANAME,FORMAT);
PROC IMPORT DATAFILE=&PATH
OUT=&DATANAME
DBMS=&FORMAT;
RUN;
%MEND;

%IMPORTING("C:\Users\RICKY\Desktop\sas repository\base and advance\data\files\excel\cars.xls",CAR,XLS);

KEYWORD PARAMETERS: EACH PARAMETER IS WRITTEN AS NAME= AND ITS VALUE
IS SUPPLIED AS NAME=VALUE IN THE CALL. THE ADVANTAGE OF KEYWORD
PARAMETERS IS THAT THEY CAN BE WRITTEN IN ANY ORDER BETWEEN THE
PARENTHESES.

EXAMPLE
%MACRO VELOCITY(SOURCE=,DATSET=,V1=,VARIABLE=,Z1=);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;

%VELOCITY(DATSET=SPEED,SOURCE=RACE,V1=DISTANCE,Z1=TIME_MIN,VARIABLE=SPEED);

EXAMPLE
%MACRO IMPORTING(DATANAME=,PATH=,FORMAT=);
PROC IMPORT DATAFILE=&PATH
OUT=&DATANAME
DBMS=&FORMAT;
RUN;
%MEND;

%IMPORTING(PATH="C:\Users\RICKY\Desktop\sas repository\base and advance\data\files\excel\cars.xls",DATANAME=CAR,FORMAT=XLS);

MIXED PARAMETERS: COMBINING POSITIONAL AND KEYWORD PARAMETERS.
POSITIONAL PARAMETERS MUST BE LISTED BEFORE KEYWORD PARAMETERS.
%MACRO COVERED(DATSET,SOURCE,Z1=,VARIABLE=,V1=);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;
%COVERED(R,RACE,Z1=TIME_MIN,VARIABLE=SPEED,V1=DISTANCE);
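KEYWORD PARAMETERS CAN ALSO BE GIVEN DEFAULT VALUES, WHICH ARE USED WHEN THE CALL LEAVES THEM OUT. A SKETCH, ASSUMING THE RACE DATASET; THE MACRO NAME SPEEDDEF AND THE VARIABLE NAME RATE_PER_MIN ARE HYPOTHETICAL.

```sas
/* DATSET IS POSITIONAL; THE KEYWORD PARAMETERS HAVE DEFAULTS */
%MACRO SPEEDDEF(DATSET,SOURCE=RACE,V1=DISTANCE,Z1=TIME_MIN,VARIABLE=SPEED);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;
/* USES ALL THE DEFAULTS */
%SPEEDDEF(R1);
/* OVERRIDES ONLY THE NAME OF THE NEW VARIABLE */
%SPEEDDEF(R2,VARIABLE=RATE_PER_MIN);
```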

CALL SYMPUT: COPIES A DATA STEP VALUE INTO A MACRO VARIABLE. IT RUNS
ON EVERY ITERATION OF THE DATA STEP, SO A FIXED NAME SUCH AS MOTORS
IS OVERWRITTEN EACH TIME AND ENDS UP HOLDING THE VALUE FROM THE LAST
OBSERVATION.

DATA RACING;
INPUT RACER $ 5-14 BIKE $ 15-33 DISTANCE TIME_MIN;
DATALINES;
MIKE MV Agusta F3 326 15
ROGER Kawasaki 300 25
ELLON KTM RC250 312 19
JAMES Aprilia RSV4 RF 253 45
JACK CHI Ducati Panigale R 340 8
;
RUN;

DATA RACING1;
SET RACING;
CALL SYMPUT('MOTORS',BIKE);
RUN;

%PUT &MOTORS.;

TO STORE EVERY OBSERVATION, BUILD A DIFFERENT MACRO VARIABLE NAME FOR
EACH ROW BY APPENDING THE OBSERVATION NUMBER _N_ (MOTORS1, MOTORS2,
AND SO ON).

DATA RACING2;
SET RACING;
CALL SYMPUT('MOTORS'||COMPRESS(_N_),BIKE);
RUN;
%PUT &MOTORS1;
%PUT &MOTORS2;
%PUT &MOTORS3;
%PUT &MOTORS4;
%PUT &MOTORS5;
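CALL SYMPUTX IS A COMMON ALTERNATIVE: IT TRIMS LEADING AND TRAILING BLANKS AND CONVERTS NUMERIC VALUES WITHOUT A CONVERSION NOTE IN THE LOG. A SKETCH, ASSUMING THE RACING DATASET; NMOTORS IS A HYPOTHETICAL NAME FOR THE COUNT. THE MACRO VARIABLES BECOME AVAILABLE ONLY AFTER THE DATA STEP HAS FINISHED.

```sas
DATA _NULL_;
SET RACING END=LAST;
/* CATS BUILDS THE NAMES MOTORS1, MOTORS2, ... */
CALL SYMPUTX(CATS('MOTORS',_N_),BIKE);
/* ON THE LAST ROW, STORE HOW MANY VARIABLES WERE CREATED */
IF LAST THEN CALL SYMPUTX('NMOTORS',_N_);
RUN;
%PUT &NMOTORS BIKES, THE FIRST IS &MOTORS1;
```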

INTO: THIS IS USED ONLY IN PROC SQL TO CREATE MACRO VARIABLES.
PROC SQL;
SELECT RACER INTO:RACER1 FROM RACING;
QUIT;

%PUT &RACER1;

THE ABOVE MACRO VARIABLE HOLDS ONLY THE VALUE FROM THE FIRST ROW. TO
HOLD THE VALUES FROM ALL ROWS IN ONE VARIABLE, ADD SEPARATED BY AS IN
THE BELOW PROGRAM.

PROC SQL;
SELECT RACER INTO:RACER1 SEPARATED BY ' ' FROM RACING;
QUIT;
%PUT &RACER1;
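PROC SQL CAN ALSO STORE EACH ROW IN ITS OWN MACRO VARIABLE BY NAMING A RANGE AFTER INTO, AND THE AUTOMATIC MACRO VARIABLE SQLOBS HOLDS THE NUMBER OF ROWS PROCESSED. A SKETCH, ASSUMING THE RACING DATASET WITH FIVE ROWS; THE NAMES R1 TO R5 ARE ILLUSTRATIVE, AND NOPRINT SUPPRESSES THE QUERY OUTPUT.

```sas
PROC SQL NOPRINT;
/* ONE MACRO VARIABLE PER ROW: R1 TO R5 */
SELECT RACER INTO :R1-:R5 FROM RACING;
QUIT;
%PUT FIRST=&R1 LAST=&R5 ROWS=&SQLOBS;
```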

THE MACRO DEBUGGING OPTIONS AND THEIR DEFAULT SETTINGS ARE:
1. MERROR
2. SERROR
3. NOMLOGIC
4. NOMLOGICNEST
5. NOMPRINT
6. NOMPRINTNEST
7. NOSYMBOLGEN
THE DEBUGGING OPTIONS ARE USED FOR CHECKING ERRORS IN A MACRO
PROGRAM. THE BELOW STEP LISTS THE CURRENT SETTINGS OF THE MACRO
OPTIONS IN THE LOG.
PROC OPTIONS GROUP=MACRO;
RUN;
MERROR: ISSUES A WARNING WHEN A MACRO CALL SUCH AS %NAME CANNOT BE
MATCHED TO A DEFINED MACRO. MERROR IS ON BY DEFAULT; IT CAN BE
TURNED OFF WITH NOMERROR.

OPTIONS MERROR;

DATA RACE;
INPUT RACER $ 5-14 BIKE $ 15-33 DISTANCE TIME_MIN;
DATALINES;
MIKE MV Agusta F3 326 15
ROGER Kawasaki 300 25
ELLON KTM RC250 312 19
JAMES Aprilia RSV4 RF 253 45
JACK CHI Ducati Panigale R 340 8
;
RUN;

%MACRO SPEED(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;

%SPED(NEW,RACE,DISTANCE,TIME_MIN,SPEED);
LOG WINDOW 'WARNING: Apparent invocation of macro SPED not
resolved.' IS DISPLAYED.

OPTIONS NOMERROR;
%SPED(NEW,RACE,DISTANCE,TIME_MIN,SPEED);
THE 'APPARENT INVOCATION' WARNING IS NO LONGER ISSUED IN THE LOG
WINDOW.

SERROR: ISSUES A WARNING WHEN A MACRO VARIABLE REFERENCE (&NAME)
DOES NOT MATCH ANY EXISTING MACRO VARIABLE.
OPTIONS SERROR;
%MACRO SPEED(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;
%SPEED(NEW,RACE,DISTANCE,TIME_MIN,SPEED);
LOG WINDOW 'WARNING: Apparent symbolic reference SOURE not
resolved'.

OPTIONS NOSERROR;
%MACRO SPEEDA1(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;
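%SPEEDA1(NEW,RACE,DISTANCE,TIME_MIN,SPEED);
WITH NOSERROR THE 'APPARENT SYMBOLIC REFERENCE' WARNING IS NOT
WRITTEN TO THE LOG, BUT THE DATA STEP STILL FAILS BECAUSE &SOURE IS
NOT RESOLVED.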

NOMLOGICNEST: Does not display the macro nesting information in the
SAS log for MLOGIC output.

NOMPRINT: DOES NOT DISPLAY THE SAS STATEMENTS THAT ARE GENERATED
BY MACRO EXECUTION.
OPTIONS NOMPRINT;

%MACRO SPEED_A1(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;

%SPEED_A1(NEW1,RACE,DISTANCE,TIME_MIN,SPEED);

MPRINT OPTION: DISPLAYS IN THE LOG WINDOW THE SAS STATEMENTS THAT
ARE GENERATED BY MACRO EXECUTION.
OPTIONS MPRINT;

%MACRO SPEED_A2(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;

%SPEED_A2(SETA,RACE,DISTANCE,TIME_MIN,SPEED);

NOSYMBOLGEN: Does not display the results of resolving macro
variable references in the SAS log.
OPTIONS NOSYMBOLGEN;
%MACRO SPEED_A3(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;

%SPEED_A3(SETB,RACE,DISTANCE,TIME_MIN,SPEED);

SYMBOLGEN: DISPLAYS IN THE LOG THE VALUE THAT EACH MACRO VARIABLE
REFERENCE, INCLUDING EACH PARAMETER, RESOLVES TO.
OPTIONS SYMBOLGEN;

%MACRO SPEED_A4(DATSET,SOURCE,V1,Z1,VARIABLE);
DATA &DATSET;
SET &SOURCE;
&VARIABLE=&V1/&Z1;
RUN;
%MEND;

%SPEED_A4(SETB,RACE,DISTANCE,TIME_MIN,SPEED);

MACRO FUNCTIONS

DATA E_11;
INFILE DATALINES;
INFORMAT JDATE DATE7. RDATE DDMMYY10.;
INPUT EMP_ID $ DES :$10. JDATE RDATE;
FORMAT JDATE DATE7. RDATE DDMMYY10.;
LABEL JDATE='JOINING DATE';
DATALINES;
103 BROKER 14MAY11 04/05/2018
102 BANKER 01MAY13 09/12/2020
101 BAKER 10NOV14 09/11/2019
105 DOCTOR 11JUN12 06/04/2013
104 ATTORNEY 10MAR10 06/04/2017
;
RUN;

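DATA STEP FUNCTIONS CAN BE USED IN THE DATA STEPS THAT A MACRO
GENERATES. TO CALL A DATA STEP FUNCTION IN MACRO CODE ITSELF, SUCH
AS A %LET STATEMENT, WRAP IT IN %SYSFUNC; ITS ARGUMENTS ARE WRITTEN
WITHOUT QUOTES. THE EXAMPLES ARE SHOWN BELOW.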

OPTIONS MPRINT;
%MACRO CASE_CHANGE(CREAT,SOURCE,VARIABLE1,VARIABLE2);
DATA &CREAT;
SET &SOURCE;
&VARIABLE1=TRANSLATE(&VARIABLE2,"ZO","BT");
RUN;
%MEND;
%CASE_CHANGE(RR,E_11,LOWERCASE,DES);

%LET NAME=%SYSFUNC(TRANSLATE(ATTORNEY_BAR,ZO,BT));
%PUT &NAME;

%MACRO MATRIX;
%GLOBAL NAME DESINATION JUSTIFICATION;
%LET NAME=%SYSFUNC(COMPBL(THIS    IS    SAS    PROGRAM.));
%LET DESINATION=%SYSFUNC(TRANSLATE(ATTORNEY_BAR,ZO,BT));
%MEND;
%MATRIX;
%PUT &NAME;
%PUT &DESINATION;
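%SYSFUNC ALSO ACCEPTS AN OPTIONAL FORMAT AS ITS SECOND ARGUMENT, WHICH IS HANDY FOR DATES. A SKETCH; THE MACRO VARIABLE NAME RUNDATE AND THE TITLE TEXT ARE HYPOTHETICAL.

```sas
/* TODAY() RETURNS A SAS DATE; DATE9. FORMATS IT LIKE 05MAR2024 */
%LET RUNDATE=%SYSFUNC(TODAY(),DATE9.);
%PUT REPORT RUN ON &RUNDATE;
/* THE SAME VALUE CAN BE USED IN A TITLE (DOUBLE QUOTES) */
TITLE "STUDENT REPORT AS OF &RUNDATE";
```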

BUILT-IN MACRO FUNCTIONS:
SOME OF THE BUILT-IN MACRO FUNCTIONS ARE %SCAN, %SUBSTR, %LOWCASE,
%UPCASE AND %LENGTH. THEY WORK LIKE THE CORRESPONDING DATA STEP
FUNCTIONS, BUT THEIR ARGUMENTS ARE PLAIN TEXT WITHOUT QUOTES.

%MACRO BLUESKY;
%GLOBAL SCANNED SUBSTRR LENGTH UPCASE LOWCASE;
%LET SCANNED=%SCAN(THIS IS A SAS PROGRAM.,4);
%LET SUBSTRR=%SUBSTR(THE PROGRAMMMER,5,4);
%LET LENGTH=%LENGTH(THIS IS A SAS PROGRAM.);
%LET UPCASE= %UPCASE(this is american history);
%LET LOWCASE=%LOWCASE(CINEMA ARE GOOD ENTERTAINMENTS);
%MEND;
%BLUESKY;
%PUT THE SCANNED WORD IS:&SCANNED;
%PUT THE EXTRACTED PART IS:&SUBSTRR;
%PUT THE LENGTH OF THE STRING IS:&LENGTH;
%PUT THE STRING IS CONVERTED FROM LOWER TO UPPER CASE:&UPCASE;
%PUT THE STRING IS CONVERTED FROM UPPER TO LOWER CASE:&LOWCASE;

%EVAL: THIS FUNCTION IS USED FOR ARITHMETIC OPERATIONS WHEN THE
NUMBERS ARE INTEGERS.
%LET K=%EVAL(8+5);
%PUT THE SUM IS:&K;

%SYSEVALF: THIS FUNCTION IS USED FOR ARITHMETIC OPERATIONS WHEN THE
NUMBERS ARE FLOATING-POINT (DECIMAL) VALUES.
%LET O=%SYSEVALF(5.8+1.6);
%PUT THE VALUES ARE:&O;
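THE DIFFERENCE BETWEEN THE TWO FUNCTIONS SHOWS UP WITH DIVISION: %EVAL KEEPS ONLY THE INTEGER PART, WHILE %SYSEVALF RETURNS A DECIMAL RESULT AND ACCEPTS AN OPTIONAL CONVERSION TYPE. A MINIMAL SKETCH:

```sas
/* INTEGER ARITHMETIC: THE FRACTION IS TRUNCATED, THE LOG SHOWS 3 */
%PUT %EVAL(10/3);
/* FLOATING-POINT RESULT, APPROXIMATELY 3.333 */
%PUT %SYSEVALF(10/3);
/* CONVERSION TYPES: INTEGER GIVES 3, CEIL GIVES 4 */
%PUT %SYSEVALF(10/3,INTEGER) %SYSEVALF(10/3,CEIL);
```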

data PROFILE;
infile datalines;
input Name $ Sex $ Subject $:15. Age Marks Height Weight;
datalines;
Ellen M mathematics 15 66 153 55
Sam M English 12 88 169 70
cristana F Social 14 56 165 60
Maria F Science 15 71 160 66
John M French 13 45 154 64
Victoria F Mathematics 11 57 159 71
Britney F Arabic 19 64 161 59
Abdul M English 13 47 162 54
Vikram M Social 17 47 144 72
Alisha F French 18 22 166 74
Alisha M French 18 22 166 74
Mona F Arabic 11 87 156 45
William M Science 16 55 161 77
Alisha F French 18 22 66 74
Alberta F English 14 61 162 71
;
run;

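WRITING A MACRO FOR THE CONDITIONAL STATEMENTS

IF STATEMENT
MLOGIC: TRACES MACRO EXECUTION IN THE LOG (WHEN EACH MACRO BEGINS
AND ENDS AND THE VALUES OF ITS PARAMETERS).
OPTION MLOGIC;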
%MACRO CREATE (NEWDATA,OLDDATA);
DATA &NEWDATA;
SET &OLDDATA;
IF SEX='F' THEN COLOR='GREEN';
ELSE COLOR='RED';
RUN;
%MEND;
%CREATE(NDS,PROFILE);
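THE IF STATEMENT ABOVE IS A DATA STEP STATEMENT: IT IS EVALUATED FOR EVERY OBSERVATION. A %IF STATEMENT IS EVALUATED ONCE, WHEN THE MACRO RUNS, AND DECIDES WHICH SAS CODE IS GENERATED. A SKETCH, ASSUMING THE PROFILE DATASET; THE MACRO NAME SUBSET AND ITS PARAMETERS ARE HYPOTHETICAL.

```sas
%MACRO SUBSET(NEWDATA,OLDDATA,GENDER);
DATA &NEWDATA;
SET &OLDDATA;
/* ONLY ONE OF THE TWO WHERE STATEMENTS IS GENERATED */
%IF &GENDER=F %THEN %DO;
WHERE SEX='F';
%END;
%ELSE %DO;
WHERE SEX='M';
%END;
RUN;
%MEND;
%SUBSET(GIRLS,PROFILE,F);
```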

DATA DENGUE;
INFILE DATALINES;
INPUT NAME $ TEMPERATURE PLATELETS_LAKHS DEATH $;
DATALINES;
JENSON 105 0.54 Y
NEEL 104 0.36 Y
RAMSON 105 0.41 Y
ELLEN 104 0.42 N
;
RUN;

%MACRO REPORT(D_SET,SOURCE);
DATA &D_SET;
SET &SOURCE;
IF DEATH='Y' THEN OUTPUT;
RUN;
%MEND;

%REPORT(NEW,DENGUE);

%MACRO REP(D_SET,RESOURCE);
DATA &D_SET;
SET &RESOURCE;
IF DEATH='Y' THEN DO;
AGE='OLDER';
BP='HIGH';
OUTPUT;
END;

RUN;
%MEND;

%REP(NEW2,DENGUE);
SQL
(STRUCTURED QUERY LANGUAGE)

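QUERY ORDER
THE CLAUSES OF A QUERY MUST BE WRITTEN IN THE BELOW ORDER. A WAY TO
MEMORIZE IT IS "SOME FRENCH WAITERS GROW HEALTHY ORANGES":
SELECT      SOME
FROM        FRENCH
WHERE       WAITERS
GROUP BY    GROW
HAVING      HEALTHY
ORDER BY    ORANGES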
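A SKETCH OF ONE QUERY THAT USES ALL SIX CLAUSES IN THE REQUIRED ORDER, ASSUMING THE STANDARD DATASET USED IN THIS CHAPTER; THE AGE AND MARKS CUTOFFS ARE ILLUSTRATIVE.

```sas
PROC SQL;
SELECT SUBJECT, SEX, AVG(MARKS) AS AVG_MARKS
FROM STANDARD
/* WHERE FILTERS ROWS BEFORE GROUPING */
WHERE AGE >= 12
GROUP BY SUBJECT, SEX
/* HAVING FILTERS THE GROUPS AFTER THEY ARE SUMMARIZED */
HAVING AVG(MARKS) > 50
ORDER BY AVG_MARKS DESC;
QUIT;
```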

PROC SQL;
CREATE TABLE JOB(ID NUM, DESIGNATION CHAR(10), JOINING
CHAR(10), SALARY NUM);
QUIT;

INSERTING VALUES INTO THE TABLE CAN BE DONE WITH THE VALUES CLAUSE
OR THE SET CLAUSE.
PROC SQL;
INSERT INTO JOB
VALUES(101, 'BANKER', '09NOV2010',50000)
VALUES(102, 'BAKER', '25JAN2016',12000)
VALUES(103, 'NURSE', '16FEB2018', 10000)
;
QUIT;

PROC SQL;
INSERT INTO JOB
SET ID=104, DESIGNATION='DOCTOR', JOINING='16SEP2014',
SALARY=152361
SET ID=105, DESIGNATION='LAWYER', JOINING='10MAR2018',
SALARY=12000;
QUIT;

INSERTING DATE AND TIME VALUES INTO THE TABLE
PROC SQL;
CREATE TABLE DOB(NAME CHAR(10), DOB NUM FORMAT=DATE9., TOB
NUM FORMAT=TIME8., ADMITDATETIME NUM FORMAT=DATETIME18.);
INSERT INTO DOB
VALUES('JOHNSON','15OCT2014'D, '12:10:01'T,
'10OCT2014:01:00:10'DT);
QUIT;

ASSIGNING LABELS
PROC SQL;
CREATE TABLE EMP(ID NUM LABEL='IDENTIFICATION NUMBER', NAME
CHAR(15) LABEL='NAME OF THE EMPLOYEE', DOJ NUM
FORMAT=DATE9. LABEL='DATE OF JOINING');
INSERT INTO EMP
VALUES(101, 'SAMUEL','09NOV2015'D)
VALUES(102,'JOHN','10DEC2010'D)
VALUES(103, 'SMITH','15JAN2009'D);
QUIT;

PRINTING THE DATASET
PROC SQL;
CREATE TABLE SAVE(ID NUM LABEL='IDENTIFICATION NUMBER', NAME
CHAR(15) LABEL='NAME OF THE EMPLOYEE', DOJ NUM
FORMAT=DATE9. LABEL='DATE OF JOINING');
INSERT INTO SAVE
VALUES(101, 'SAMUEL','09NOV2015'D)
VALUES(102,'JOHN','10DEC2010'D)
VALUES(103, 'SMITH','15JAN2009'D);
SELECT * FROM SAVE;
QUIT;

SELECTING THE VARIABLES FOR THE GIVEN DATASET
PROC SQL;
SELECT ID,NAME FROM WORK.EMP;
QUIT;

DATA STANDARD;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT $:15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;
ORDER BY CLAUSE: SORTS THE ROWS BY THE SPECIFIED COLUMNS (ASCENDING
BY DEFAULT).

PROC SQL;
SELECT * FROM STANDARD ORDER BY AGE;
QUIT;

PROC SQL;
SELECT * FROM STANDARD ORDER BY AGE DESC;
QUIT;
DESC: THIS IS USED FOR SORTING IN DESCENDING ORDER
ASC: THIS IS USED FOR SORTING IN ASCENDING ORDER
PROC SQL;
SELECT * FROM STANDARD ORDER BY AGE DESC, MARKS ASC;
QUIT;
WHERE CLAUSE: SELECTS ONLY THE ROWS THAT MEET A CONDITION.
PROC SQL;
SELECT * FROM STANDARD WHERE AGE <=16 ORDER BY MARKS DESC,
AGE ASC;
QUIT;

PROC SQL;
SELECT * FROM STANDARD WHERE SEX='M';
QUIT;

PROC SQL;
SELECT * FROM STANDARD WHERE SEX='M' AND AGE=13;
QUIT;

PROC SQL;
SELECT * FROM STANDARD WHERE AGE BETWEEN 13 AND 17;
QUIT;

CREATING A NEW VARIABLE FROM THE EXISTING VARIABLES
PROC SQL;
SELECT *, AGE+2.5 AS UPDATED_AGE FROM STANDARD;
QUIT;

CASE WHEN: THIS IS SIMILAR TO AN IF-ELSE STATEMENT. A CASE
EXPRESSION MUST END WITH THE END KEYWORD.
PROC SQL;
SELECT *, CASE WHEN AGE=15 THEN MARKS+5
ELSE MARKS+1 END AS NEW_MARKS FROM STANDARD;
QUIT;

PROC SQL;
SELECT *, CASE WHEN AGE=11 THEN MARKS+2
WHEN AGE=12 THEN MARKS+1.5
WHEN AGE=13 THEN MARKS+1
WHEN AGE=14 THEN MARKS+5
WHEN AGE=15 THEN MARKS+2.5
WHEN AGE=16 THEN MARKS-3
WHEN AGE=17 THEN MARKS-1.5
WHEN AGE=18 THEN MARKS+6.5
ELSE MARKS+10 END AS NEW_MARKS FROM
STANDARD;
QUIT;

PROC SQL;
SELECT *, CASE WHEN AGE >=11 AND AGE<13 THEN "A+"
WHEN AGE>=13 AND AGE<15 THEN 'B+'
WHEN AGE>=15 AND AGE<17 THEN 'A++'
WHEN AGE>=17 AND AGE<18 THEN 'B---'
ELSE 'B+++' END AS RATING FROM STANDARD;
QUIT;
MULTIPLE CASE EXPRESSIONS IN ONE QUERY
PROC SQL;
SELECT *, CASE WHEN AGE=11 THEN MARKS+2
WHEN AGE=12 THEN MARKS+1.5
WHEN AGE=13 THEN MARKS+1
WHEN AGE=14 THEN MARKS+5
WHEN AGE=15 THEN MARKS+2.5
WHEN AGE=16 THEN MARKS-3
WHEN AGE=17 THEN MARKS-1.5
WHEN AGE=18 THEN MARKS+6.5
ELSE MARKS+10 END AS NEW_MARKS,
CASE WHEN AGE >=11 AND AGE<13 THEN "A+"
WHEN AGE>=13 AND AGE<15 THEN 'B+'
WHEN AGE>=15 AND AGE<17 THEN 'A++'
WHEN AGE>=17 AND AGE<18 THEN 'B---'
ELSE 'B+++' END AS RATING FROM STANDARD;
QUIT;
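When every WHEN compares the same variable with a single value, PROC SQL also accepts a shorter form of CASE in which that variable is written once, right after the CASE keyword. A minimal sketch, assuming a hypothetical table SEX_DEMO; the labels MALE, FEMALE and UNKNOWN are only for illustration:

```sas
/* HYPOTHETICAL DEMO TABLE, BUILT ONLY FOR THIS EXAMPLE */
DATA SEX_DEMO;
INPUT NAME $ SEX $;
DATALINES;
ELLEN M
MARIA F
JOHN M
;
RUN;

PROC SQL;
CREATE TABLE SEX_LABELS AS
SELECT NAME,
/* SIMPLE CASE FORM: SEX IS WRITTEN ONCE AND COMPARED WITH EACH WHEN VALUE */
CASE SEX
WHEN 'M' THEN 'MALE'
WHEN 'F' THEN 'FEMALE'
ELSE 'UNKNOWN'
END AS SEX_DESC
FROM SEX_DEMO;
QUIT;
```

This form only tests equality; ranges such as AGE>=13 AND AGE<15 still need the CASE WHEN form used above.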

UPDATE STATEMENT: CHANGES THE VALUES OF VARIABLES IN THE EXISTING ROWS
OF A TABLE.

PROC SQL;
UPDATE STANDARD SET MARKS=MARKS+25;
QUIT;

PROC SQL;
UPDATE STANDARD SET MARKS=
CASE WHEN AGE>11 AND AGE<13 THEN MARKS-25
WHEN AGE>=13 AND AGE<15 THEN MARKS-15
WHEN AGE>=15 AND AGE<19 THEN MARKS+13
ELSE MARKS+12 END;
QUIT;
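Both UPDATE statements above change every row of STANDARD. Adding a WHERE clause limits the change to the rows that meet a condition. A minimal sketch, assuming a hypothetical table MARKS_DEMO:

```sas
/* HYPOTHETICAL DEMO TABLE, BUILT ONLY FOR THIS EXAMPLE */
DATA MARKS_DEMO;
INPUT NAME $ SEX $ MARKS;
DATALINES;
ELLEN M 66
MARIA F 71
MONA F 87
;
RUN;

PROC SQL;
/* ONLY THE ROWS THAT SATISFY THE WHERE CLAUSE ARE CHANGED */
UPDATE MARKS_DEMO
SET MARKS = MARKS + 5
WHERE SEX = 'F';
SELECT * FROM MARKS_DEMO;
QUIT;
```

After the UPDATE, MARIA and MONA should have 76 and 92 marks, while ELLEN keeps 66.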

RECODING VALUES: CHARACTER TO NUMERIC AND CHARACTER TO CHARACTER

PROC SQL;
SELECT PID, CASE WHEN SEX='F' THEN 1 ELSE 0 END AS SEX,
CASE WHEN RACE='AS' THEN 'ASIAN'
WHEN RACE='AM' THEN 'AMERICAN'
ELSE 'AFRICAN' END AS RACE,
CASE WHEN COLOR='W' THEN 'WHITE'
ELSE 'BLACK' END AS COLOR FROM PATIENT;

QUIT;

CONVERTING NUMERIC TO NUMERIC


PROC SQL;
SELECT PID,NAME, CASE WHEN AGE=21 THEN 25
WHEN AGE=56 THEN 25
WHEN AGE=24 THEN 25
ELSE 25 END AS AGE FROM PATIENT;
QUIT;

DESCRIBE TABLE STATEMENT: WRITES THE STRUCTURE OF A TABLE (AS A CREATE
TABLE STATEMENT) TO THE LOG.

PROC SQL;
DESCRIBE TABLE PATIENT;
QUIT;

CREATING AN EMPTY DATASET WITH THE SAME STRUCTURE AS AN EXISTING DATASET
(LIKE CLAUSE)

PROC SQL;
CREATE TABLE MARKUP LIKE PATIENT;
QUIT;
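One way to confirm what LIKE produces is to build a small table, copy its structure, and describe the copy. A minimal sketch; PATIENT_DEMO and PATIENT_SHELL are hypothetical names used only here:

```sas
/* HYPOTHETICAL SOURCE TABLE, BUILT ONLY FOR THIS EXAMPLE */
DATA PATIENT_DEMO;
INPUT PID $ AGE;
DATALINES;
P-101 21
P-102 56
;
RUN;

PROC SQL;
/* SAME COLUMN DEFINITIONS AS PATIENT_DEMO, BUT NO ROWS */
CREATE TABLE PATIENT_SHELL LIKE PATIENT_DEMO;
/* THE LOG SHOWS THE COLUMNS PID AND AGE FOR THE EMPTY TABLE */
DESCRIBE TABLE PATIENT_SHELL;
QUIT;
```

PATIENT_SHELL should report zero observations, which makes it a convenient template to fill later with INSERT INTO.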

DATA TREAT;
INPUT PID :$10. TREATED $;
DATALINES;
APOLLO-104 PARACET
APOLLO-105 DICLOF
APOLLO-106 LOSART
APOLLO-106 LEVOCET
;
RUN;

DATA TREATMENT;
INPUT PID :$10. TREATED $;
DATALINES;
APOLLO-101 PARACET
APOLLO-102 DICLOF
APOLLO-103 LOSART
APOLLO-104 PARACET
;
RUN;

NUMBER OPTION: ADDS A COLUMN NAMED ROW THAT NUMBERS THE OBSERVATIONS
IN THE OUTPUT.

PROC SQL NUMBER;
SELECT * FROM POPULATION1;
QUIT;

NONUMBER OPTION: NO ROW COLUMN IS PRINTED. THIS IS THE DEFAULT.

PROC SQL NONUMBER;
SELECT * FROM POPULATION2;
QUIT;

DATA STANDARD;
INFILE DATALINES;
INPUT NAME $ SEX $ SUBJECT :$15. AGE MARKS HEIGHT WEIGHT;
DATALINES;
ELLEN M MATHEMATICS 15 66 153 55
SAM M ENGLISH 12 88 169 70
CRISTANA F SOCIAL 14 56 165 60
MARIA F SCIENCE 15 71 160 66
JOHN M FRENCH 13 45 154 64
VICTORIA F MATHEMATICS 11 57 159 71
BRITNEY F ARABIC 19 64 161 59
ABDUL M ENGLISH 13 47 162 54
VIKRAM M SOCIAL 17 47 144 72
ALISHA F FRENCH 18 22 166 74
MONA F ARABIC 11 87 156 45
WILLIAM M SCIENCE 16 55 161 77
ALBERTA F ENGLISH 14 61 162 71
;
RUN;

NOEXEC OPTION: PROC SQL CHECKS THE SYNTAX OF THE STATEMENTS BUT DOES NOT
EXECUTE THEM, SO THE TABLE GROUPING BELOW IS NOT CREATED.

PROC SQL NOEXEC;
CREATE TABLE GROUPING AS SELECT *, CASE WHEN AGE>11 AND AGE<15 THEN 'O+'
WHEN AGE>=15 AND AGE<17 THEN 'O'
ELSE 'O-'
END AS AGE_GROUPING FROM STANDARD;
QUIT;

VALIDATE STATEMENT: CHECKS THE SYNTAX OF A QUERY WITHOUT EXECUTING IT.
IF THE QUERY IS VALID, THE LOG SHOWS THE NOTE "PROC SQL STATEMENT HAS
VALID SYNTAX."
PROC SQL;
VALIDATE SELECT *, CASE WHEN AGE>11 AND AGE <15 THEN 'O+'
WHEN AGE>=15 AND AGE <17 THEN 'O'
ELSE 'O-' END AS AGE_GROUPING FROM
STANDARD;
QUIT;

MODIFICATION OF TABLES:
1. ADD VARIABLES
2. DROP VARIABLES
3. ASSIGN CONSTRAINTS
4. CREATE CONSTRAINTS

DATA TORNADO;
INFILE DATALINES;
INPUT STATE :$14. MILES_PER_HOUR SPEED_OF_SOUND_MACH;
DATALINES;
ALABAMA 80 2
CALIFORNIA 56 3
CONNECTICUT 91 1
DELAWARE 74 5
FLORIDA 71 2
IOWA 42 2
MARYLAND 87 5
MASSACHUSETTS 15 1
NEVADA 36 8
TENNESSEE 75 2
TEXAS 58 6
WISCONSIN 74 2
WYOMING 80 2
;
RUN;

DATA TORNADO1;
SET TORNADO;
RUN;

PROC SQL;
ALTER TABLE TORNADO1 ADD LIGHTENING CHAR(20);
QUIT;

PROC SQL;
UPDATE TORNADO1 SET LIGHTENING = CASE WHEN
SPEED_OF_SOUND_MACH<=3 THEN 'NOT OCCURRED' ELSE 'OCCURRED' END ;
QUIT;

DATA TORNADO2;
SET TORNADO1;
RUN;

PROC SQL;
ALTER TABLE TORNADO2 DROP LIGHTENING;
QUIT;

PROC SQL;
DELETE FROM TORNADO2 WHERE SPEED_OF_SOUND_MACH<=3;
QUIT;

DATA TORNADO3;
SET TORNADO1;
RUN;

PROC SQL;
DROP TABLE TORNADO3;
QUIT;
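Items 3 and 4 of the MODIFICATION OF TABLES list deal with constraints, which are rules that SAS enforces whenever rows are added or changed. A minimal sketch, assuming a hypothetical table STORM: a primary key is created together with the table, and a check constraint is assigned afterwards with ALTER TABLE.

```sas
PROC SQL;
/* CREATE A CONSTRAINT WITH THE TABLE: STATE MUST BE UNIQUE AND NONMISSING */
CREATE TABLE STORM
(STATE CHAR(14),
MACH NUM,
CONSTRAINT PK_STATE PRIMARY KEY(STATE));

/* ASSIGN A CONSTRAINT TO THE EXISTING TABLE: MACH MUST BE BETWEEN 1 AND 10 */
ALTER TABLE STORM
ADD CONSTRAINT MACH_RANGE CHECK(MACH BETWEEN 1 AND 10);

INSERT INTO STORM
VALUES('ALABAMA', 2)
VALUES('TEXAS', 6);

/* WRITE THE CONSTRAINTS DEFINED ON STORM TO THE LOG */
DESCRIBE TABLE CONSTRAINTS STORM;
QUIT;

PROC SQL;
/* THIS ROW BREAKS THE CHECK CONSTRAINT, SO IT IS EXPECTED TO BE */
/* REJECTED WITH AN ERROR IN THE LOG; STORM KEEPS ITS TWO ROWS   */
INSERT INTO STORM
VALUES('NEVADA', 15);
QUIT;
```

The rejected INSERT is placed in its own PROC SQL step so that the error it raises does not stop the earlier statements from running.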

DATA SPORTS6;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
CA BEN A+
UK BEZA A
;
RUN;
DATA SPORTS7;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
NZ RAZ C-
NZ MAZ A-
NZ REN C+
US PELO A
CA BEN A+
CA LEE B+
CA BEN A+
;
RUN;

SUBQUERIES: A QUERY NESTED INSIDE ANOTHER QUERY IS CALLED A SUBQUERY.

PROC SQL;
SELECT * FROM SPORTS6 WHERE NAME IN (SELECT NAME FROM
SPORTS7);
QUIT;

PROC SQL;
SELECT * FROM SPORTS6 WHERE NAME NOT IN(SELECT NAME FROM
SPORTS7);
QUIT;
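The subqueries above return a list of values for IN and NOT IN. A subquery can also return a single value that is compared with every row of the outer query. A minimal sketch, assuming a hypothetical table SCORES_DEMO, that keeps the students whose marks are above the class average:

```sas
/* HYPOTHETICAL DEMO TABLE, BUILT ONLY FOR THIS EXAMPLE */
DATA SCORES_DEMO;
INPUT NAME $ MARKS;
DATALINES;
ELLEN 66
SAM 88
JOHN 45
ALISHA 22
;
RUN;

PROC SQL;
/* THE INNER QUERY RETURNS ONE VALUE, THE AVERAGE MARK (55.25 HERE); */
/* THE OUTER QUERY KEEPS ONLY THE ROWS ABOVE THAT VALUE              */
CREATE TABLE ABOVE_AVG AS
SELECT NAME, MARKS
FROM SCORES_DEMO
WHERE MARKS > (SELECT AVG(MARKS) FROM SCORES_DEMO);
QUIT;
```

ABOVE_AVG should contain ELLEN and SAM. A subquery used with > like this must return exactly one value.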

UNIONS

DATA SPORTS;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
UK BEZA A
;
RUN;

DATA SPORTS1;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
NZ RAZ C-
NZ MAZ A-
NZ REN C+
CA BEN A+
CA LEE B+
CA BEN A+
;
RUN;

PROC SQL; /* UNION: DUPLICATE OBSERVATIONS ARE REMOVED AND THE RESULT IS SORTED */
CREATE TABLE NIN AS SELECT * FROM SPORTS
UNION
SELECT * FROM SPORTS1;
QUIT;

PROC SQL; /* UNION ALL: DUPLICATE OBSERVATIONS ARE KEPT AND THE RESULT IS NOT SORTED */
CREATE TABLE NIN1 AS SELECT * FROM SPORTS
UNION ALL
SELECT * FROM SPORTS1;
QUIT;

DATA SPORTS2;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
UK BEZA A
;
RUN;

DATA SPORTS3;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
NZ RAZ C-
NZ MAZ A-
NZ REN C+
US PELO A
CA BEN A+
CA LEE B+
CA BEN A+
;
RUN;

PROC SQL;
CREATE TABLE INTER AS SELECT * FROM SPORTS2
INTERSECT
SELECT * FROM SPORTS3;
QUIT;

PROC SQL;
CREATE TABLE INTER1 AS SELECT *, 'PLAYERS FROM COUNTRIES' AS
COUNTRIES_FROM FROM SPORTS2
INTERSECT
SELECT *, 'PLAYERS FROM COUNTRIES' AS COUNTRIES_FROM FROM SPORTS3;
QUIT;

DATA SPORTS4;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
US SAM A+
US JOHN B-
US PELO A
UK JEM O
UK MOUZ C
CA BEN A+
UK BEZA A
;
RUN;

DATA SPORTS5;
INPUT TEAM $ NAME $ SCORE $;
DATALINES;
NZ RAZ C-
NZ MAZ A-
NZ REN C+
US PELO A
CA BEN A+
CA LEE B+
CA BEN A+
;
RUN;
PROC SQL;
CREATE TABLE INTER2 AS SELECT TEAM,NAME FROM SPORTS4
EXCEPT
SELECT TEAM, NAME FROM SPORTS5;
QUIT;
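UNION, INTERSECT and EXCEPT pair the columns of the two queries by position, not by name. Adding the CORR keyword pairs them by name instead. A minimal sketch, assuming two hypothetical tables, TEAM_A and TEAM_B, that have the same columns in a different order:

```sas
/* HYPOTHETICAL TABLES: SAME COLUMNS, DIFFERENT COLUMN ORDER */
DATA TEAM_A;
INPUT TEAM $ NAME $;
DATALINES;
US SAM
UK JEM
;
RUN;

DATA TEAM_B;
INPUT NAME $ TEAM $;
DATALINES;
BEN CA
SAM US
;
RUN;

PROC SQL;
/* BY POSITION: NAMES FROM TEAM_B LAND IN THE TEAM COLUMN */
CREATE TABLE BY_POSITION AS
SELECT * FROM TEAM_A
UNION
SELECT * FROM TEAM_B;

/* CORR: COLUMNS ARE MATCHED BY NAME */
CREATE TABLE BY_NAME AS
SELECT * FROM TEAM_A
UNION CORR
SELECT * FROM TEAM_B;
QUIT;
```

BY_POSITION should contain four rows, because BEN/CA and SAM/US are read into TEAM and NAME the wrong way round. BY_NAME should contain three, because SAM/US then matches an existing row and is removed as a duplicate.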
JOINS
TYPES OF JOINS:
1.SIMPLE JOIN
2.INNER JOIN
3.LEFT JOIN
4.RIGHT JOIN
5.CROSS JOIN
6.FULL JOIN

DATA MFG;
INPUT MFG :$9. MODEL $ PRICE;
DATALINES;
BENZ B-CLASS 10000
BMW X6 15213
SUZUKI WINZ 12457
HONDA ACCORD 14567
TATA JAGUAR 19856
;
RUN;

DATA CARS;
INPUT MFG :$9. MODEL $ MILEAGE_MPS $;
DATALINES;
MAHENDRA XUV 15
BENZ B-CLASS 12
BMW X6 15
SUZUKI WINZ 16
ISUZ MUX 14
;
RUN;

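SIMPLE JOIN: THE TABLES ARE LISTED IN THE FROM CLAUSE AND MATCHED IN THE
WHERE CLAUSE; THE RESULT IS THE SAME AS AN INNER JOIN.
PROC SQL;
CREATE TABLE SIMP AS SELECT * FROM MFG,CARS WHERE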
MFG.MFG=CARS.MFG;
QUIT;

INNER JOIN
PROC SQL;
CREATE TABLE INN1 AS SELECT * FROM MFG INNER JOIN CARS ON
MFG.MFG=CARS.MFG;
QUIT;

PROC SQL;
CREATE TABLE INN2 AS SELECT * FROM MFG INNER JOIN CARS ON
MFG.MFG=CARS.MFG AND MFG.MODEL=CARS.MODEL;
QUIT;

LEFT JOIN
PROC SQL;
CREATE TABLE LEF AS SELECT * FROM CARS LEFT JOIN MFG ON
CARS.MFG=MFG.MFG;
QUIT;

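RIGHT JOIN
PROC SQL;
CREATE TABLE RIG1 AS SELECT * FROM MFG RIGHT JOIN CARS ON
MFG.MFG=CARS.MFG;
QUIT;
IN THE RIGHT JOIN ABOVE, SELECT * KEEPS ONLY ONE COPY OF THE COMMON
VARIABLES MFG AND MODEL, TAKEN FROM THE FIRST TABLE (MFG), AND SAS
WRITES A WARNING THAT THE VARIABLE ALREADY EXISTS. FOR ROWS THAT EXIST
ONLY IN CARS (MAHENDRA AND ISUZ), MFG AND MODEL ARE THEREFORE BLANK.

RIGHT JOIN WITH COALESCE: TO GET THE COMMON VARIABLES FROM BOTH
DATASETS, AS A RIGHT MERGE DOES, USE THE COALESCE FUNCTION. COALESCE
RETURNS ITS FIRST NONMISSING ARGUMENT, SO EACH COMMON VARIABLE IS FILLED
FROM WHICHEVER TABLE HAS A VALUE.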
PROC SQL;
CREATE TABLE RIG AS SELECT COALESCE(MFG.MFG,CARS.MFG) AS
MFG,COALESCE(MFG.MODEL,CARS.MODEL) AS MODEL,
PRICE,MILEAGE_MPS FROM MFG RIGHT JOIN CARS ON
MFG.MFG=CARS.MFG;
QUIT;

THE EQUIVALENT RIGHT MERGE IN A DATA STEP

PROC SORT DATA=MFG;
BY MFG;
RUN;
PROC SORT DATA=CARS;
BY MFG;
RUN;
DATA R_MERGE;
MERGE MFG(IN=A) CARS(IN=B);
BY MFG;
IF B;
RUN;

FULL JOIN
PROC SQL;
CREATE TABLE FUL AS SELECT COALESCE(MFG.MFG,CARS.MFG) AS
MFG,COALESCE(MFG.MODEL,CARS.MODEL) AS MODEL, PRICE,
MILEAGE_MPS FROM MFG FULL JOIN CARS ON MFG.MFG=CARS.MFG;
QUIT;
CROSS JOIN: PRODUCES THE CARTESIAN PRODUCT, IN WHICH EACH OBSERVATION OF
ONE DATASET IS COMBINED WITH EVERY OBSERVATION OF THE OTHER DATASET
(5 X 5 = 25 ROWS HERE).
PROC SQL;
CREATE TABLE CROSS AS SELECT * FROM CARS CROSS JOIN MFG;
QUIT;
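The join types above differ only in which unmatched rows they keep. A minimal sketch, assuming two hypothetical three-row tables, LEFT_T and RIGHT_T, where keys B and C match and A and D do not, so the row count of each result shows what that join keeps:

```sas
/* HYPOTHETICAL TABLES: B AND C MATCH, A AND D DO NOT */
DATA LEFT_T;
INPUT KEY $ X;
DATALINES;
A 1
B 2
C 3
;
RUN;

DATA RIGHT_T;
INPUT KEY $ Y;
DATALINES;
B 20
C 30
D 40
;
RUN;

PROC SQL;
/* INNER: ONLY MATCHING KEYS B, C -> 2 ROWS */
CREATE TABLE J_INNER AS
SELECT L.KEY, X, Y FROM LEFT_T AS L INNER JOIN RIGHT_T AS R ON L.KEY=R.KEY;
/* LEFT: ALL ROWS OF LEFT_T (A, B, C) -> 3 ROWS */
CREATE TABLE J_LEFT AS
SELECT L.KEY, X, Y FROM LEFT_T AS L LEFT JOIN RIGHT_T AS R ON L.KEY=R.KEY;
/* RIGHT: ALL ROWS OF RIGHT_T (B, C, D) -> 3 ROWS */
CREATE TABLE J_RIGHT AS
SELECT R.KEY, X, Y FROM LEFT_T AS L RIGHT JOIN RIGHT_T AS R ON L.KEY=R.KEY;
/* FULL: ALL KEYS FROM BOTH TABLES (A, B, C, D) -> 4 ROWS */
CREATE TABLE J_FULL AS
SELECT COALESCE(L.KEY, R.KEY) AS KEY, X, Y
FROM LEFT_T AS L FULL JOIN RIGHT_T AS R ON L.KEY=R.KEY;
/* CROSS: EVERY PAIRING, 3 X 3 -> 9 ROWS */
CREATE TABLE J_CROSS AS
SELECT L.KEY AS L_KEY, R.KEY AS R_KEY FROM LEFT_T AS L CROSS JOIN RIGHT_T AS R;
QUIT;
```

In the unmatched rows of the LEFT, RIGHT and FULL joins, X or Y is missing because the other table has no value for that key.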

FUNCTIONS
DATA ATHLETICS;
INFILE DATALINES;
INPUT PLAYER :$10. RACE1_MPS RACE2_MPS RACE3_MPS RACE4_MPS
RACE5_MPS;
DATALINES;
AKBAR 45 48 43 49 51.9
ANTONY 51 36 49.2 41 59
SAMUEL 46 56 41 42.3 49
ELLON 49 . 63 67 43
. 63 12 32 41 14
;
RUN;

MOST SAS NUMERIC FUNCTIONS CAN ALSO BE USED IN PROC SQL. SOME OF THEM ARE
CODED BELOW.
PROC SQL;
CREATE TABLE VALUES AS SELECT , INT(RACE4_MPS) AS
RACE4_INT, CEIL(RACE5_MPS) AS RACE5_CEIL, FLOOR(RACE5_MPS)
AS RACE5_FLOOR, MOD(RACE1_MPS,2) AS RACE1_MOD,
MAX (RACE1_MPS, RACE2_MPS,RACE3_MPS,RACE4_MPS,RACE5_MPS) AS
RACE1_MAX, MIN(RACE1_MPS, RACE2_MPS, RACE3_MPS, RACE4_MPS,
RACE5_MPS) AS RACE1_MIN, SUM(RACE1_MPS, RACE2_MPS,
RACE3_MPS, RACE4_MPS, RACE5_MPS) AS RACE1_SUM,
NMISS(RACE1_MPS, RACE2_MPS, RACE3_MPS, RACE4_MPS,RACE5_MPS)
AS RACE1_NMISS, MEAN(RACE1_MPS, RACE2_MPS, RACE3_MPS,
RACE4_MPS,RACE5_MPS) AS RACE1_MEAN, ROUND(RACE5_MPS) AS
RACE5_ROUND FROM ATHLETICS;
QUIT;
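In PROC SQL, SUM, MEAN, MAX, MIN and NMISS work in two ways: with two or more arguments they work across the columns of each row (as in the query above), and with a single argument they summarize the whole column. A minimal sketch, assuming a hypothetical table RACE_DEMO:

```sas
/* HYPOTHETICAL DEMO TABLE, BUILT ONLY FOR THIS EXAMPLE */
DATA RACE_DEMO;
INPUT PLAYER $ R1 R2 R3;
DATALINES;
AKBAR 45 48 43
ELLON 49 . 63
;
RUN;

PROC SQL;
/* SEVERAL ARGUMENTS: ACROSS COLUMNS, ONE RESULT PER ROW */
CREATE TABLE ROW_WISE AS
SELECT PLAYER, SUM(R1, R2, R3) AS ROW_SUM, MEAN(R1, R2, R3) AS ROW_MEAN
FROM RACE_DEMO;

/* ONE ARGUMENT: DOWN A COLUMN, ONE RESULT FOR THE WHOLE TABLE */
CREATE TABLE COL_WISE AS
SELECT SUM(R1) AS R1_SUM, MEAN(R2) AS R2_MEAN, NMISS(R2) AS R2_NMISS
FROM RACE_DEMO;
QUIT;
```

ELLON's missing R2 is ignored, so that row gets ROW_SUM = 112 and ROW_MEAN = 56. COL_WISE has a single row with R1_SUM = 94, R2_MEAN = 48 and R2_NMISS = 1.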

DISTINCT KEYWORD: RETURNS EACH DIFFERENT VALUE OF A VARIABLE ONLY ONCE.
DATA ATHLETICS1;
INFILE DATALINES;
INPUT PLAYER :$10. RACE1_MPS RACE2_MPS RACE3_MPS RACE4_MPS
RACE5_MPS;
DATALINES;
AKBAR 45 48 43 49 51.9
ANTONY 51 36 49.2 41 59
SAMUEL 46 56 41 42.3 49
ELLON 49 . 63 67 43
. 63 12 32 41 14
SAMUEL 46 56 41 42.3 49
AKBAR 45 47 43 49 51.9
;
RUN;

PROC SQL;
SELECT DISTINCT(RACE5_MPS) AS DISTINCT_OBSERVATION FROM
ATHLETICS1;
QUIT;
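To count how many different values a variable has, rather than list them, DISTINCT can be placed inside COUNT; missing values are not counted. A minimal sketch, assuming a hypothetical single-column table RACE5_DEMO:

```sas
/* HYPOTHETICAL DEMO TABLE, BUILT ONLY FOR THIS EXAMPLE */
DATA RACE5_DEMO;
INPUT RACE5_MPS;
DATALINES;
51.9
59
49
.
49
51.9
;
RUN;

PROC SQL;
/* COUNT(*) COUNTS ROWS; COUNT(DISTINCT ...) COUNTS UNIQUE NONMISSING VALUES */
CREATE TABLE DISTINCT_COUNT AS
SELECT COUNT(*) AS N_ROWS,
COUNT(DISTINCT RACE5_MPS) AS N_DISTINCT
FROM RACE5_DEMO;
QUIT;
```

N_ROWS should be 6 and N_DISTINCT 3 (49, 51.9 and 59).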
AN ONLINE CLASS IS OFFERED ON THE FOLLOWING SAS PROGRAMMING TOPICS. THE
COST OF THE COURSE IS $436.

CONTACT US BY EMAIL: zarpla.vasanth@gmail.com

1. DATA STATEMENT
2. INFILE STATEMENT
3. INPUT STATEMENT
4. FORMAT AND INFORMAT
5. CONDITIONAL STATEMENTS
6. ITERATIVE DO LOOPS
7. DM STATEMENT
8. COMBINING DATASETS
9. FUNCTIONS
10. OPTIONS
11. ARRAYS
12. IMPORTING AND EXPORTING THE DATA
13. PROC CONTENTS
14. PROC COMPARE
15. PROC TRANSPOSE
16. PROC MEANS
17. PROC FREQUENCY
18. PROC RANK
19. PROC SGPLOT
20. PROC CORR
21. PROC REG
22. PROC ANOVA
23. PROC PRINT
24. PROC REPORT
25. MACROS
26. SQL
27. ODS

AN ONLINE CLASS FOR CLINICAL SAS PROGRAMMERS IS ALSO OFFERED. THE COST OF
THE COURSE IS $750.

1. DATA STATEMENT
2. INFILE STATEMENT
3. INPUT STATEMENT
4. FORMAT AND INFORMAT
5. CONDITIONAL STATEMENTS
6. ITERATIVE DO LOOPS
7. DM STATEMENT
8. COMBINING DATASETS
9. FUNCTIONS
10. OPTIONS
11. ARRAYS
12. IMPORTING AND EXPORTING THE DATA
13. PROC CONTENTS
14. PROC COMPARE
15. PROC TRANSPOSE
16. PROC MEANS
17. PROC FREQUENCY
18. PROC RANK
19. PROC SGPLOT
20. PROC CORR
21. PROC REG
22. PROC ANOVA
23. PROC PRINT
24. PROC REPORT
25. MACROS
26. SQL
27. ODS
28. INTRODUCTION TO CLINICAL TRIAL
29. SDTM DOMAINS
a. INTRODUCTION TO SDTM
b. DOMAIN
i. DEMOGRAPHICS(DM)
ii. CONCOMITANT MEDICATION(CM)
iii. EXPOSURE(EX)
iv. SUBSTANCE USE(SU)
v. ADVERSE EVENTS(AE)
vi. MEDICAL HISTORY(MH)
vii. DISPOSITION(DS)
viii. ECG TEST RESULTS(EG)
ix. LABORATORY TEST RESULTS(LB)
c. SUPPLEMENTARY DOMAINS
d. VALIDATION
