SAS Certified Specialist


Most frequently used PROCs

PROC CONTENTS
proc contents data=SASHELP._all_ VARNUM;
run;

SAS output: The CONTENTS Procedure

Data Set Name        SASHELP.ENSO                        Observations           168
Member Type          DATA                                Variables              3
Engine               V9                                  Indexes                0
Created              Tuesday, May 24, 2011 02:51:27 PM   Observation Length     24
Last Modified        Tuesday, May 24, 2011 02:51:27 PM   Deleted Observations   0
Protection                                               Compressed             NO
Data Set Type                                            Sorted                 NO
Label                El Nino Southern Oscillation
Data Representation  WINDOWS_32
Encoding             us-ascii ASCII (ANSI)

Engine/Host Dependent Information

Data Set Page Size          4096
Number of Data Set Pages    2
First Data Page             1
Max Obs per Page            168
Obs in First Data Page      113
Number of Data Set Repairs  0
Filename                    C:\Program Files\SASHome\x86\SASFoundation\9.3\core\sashelp\enso.sas7bdat
Release Created             9.0301M0
Host Created                XP_PRO

Variables in Creation Order

#  Variable  Type  Len
1  Month     Num   8
2  Pressure  Num   8
3  Year      Num   8

PROC CONTENTS prints the descriptor portion of a data set: the attributes of the data set itself and of each of its variables.

PROC CONTENTS is often used with _ALL_, a SAS keyword that stands for all the data sets
in the library.
The VARNUM option asks SAS to list the variables in the order in which they were created. If it is not
specified, SAS lists the variables in alphabetical order.
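
The two options can be tried separately. The sketch below assumes the SASHELP library that ships with SAS. The NODS option is not covered in the notes above: it suppresses the details for each individual data set and is valid only together with _ALL_.

```sas
/* Variables of a single data set, listed in creation order */
proc contents data=sashelp.class varnum;
run;

/* Directory of the whole library, without per-data-set details */
proc contents data=sashelp._all_ nods;
run;
```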

Example 6:
The following SAS program is submitted:
proc contents data=sasuser.houses;
run;
The exhibit below contains partial output produced by the CONTENTS procedure.
Data Set Name        SASUSER.HOUSES                       Observations           15
Member Type          DATA                                 Variables              6
Engine               V9                                   Indexes                0
Created              Tuesday, April 22, 2003 03:09:25 PM  Observation Length     56
Last Modified        Tuesday, April 22, 2003 03:09:25 PM  Deleted Observations   0
Protection                                                Compressed             NO
Data Set Type                                             Sorted                 NO
Label                Residential housing for sale
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)

Which of the following describes the Sasuser.Houses data set?


a. The data set is sorted but not indexed.
b. The data set is both sorted and indexed.
c. The data set is not sorted but is indexed.
d. The data set is neither sorted nor indexed.
Answer: d. The output shows Indexes 0 and Sorted NO, so the data set is neither sorted nor indexed.

In a Western-locale SAS session on Windows, the default encoding for a data set is wlatin1, shown in the
output as “wlatin1 Western (Windows)”. wlatin1 covers Western European languages and does not support
Asian characters. UTF-8 is a universal encoding that can represent characters from all languages.
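
To see which encoding the current session uses, something like the following may help. It is only a sketch: PROC OPTIONS and the automatic macro variable SYSENCODING are standard SAS features, but the value reported depends on how the session was started.

```sas
/* Print the ENCODING system option for the current session to the log */
proc options option=encoding;
run;

/* Write the session encoding to the log as plain text */
%put Session encoding: &sysencoding;
```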

Example 7:
Write a SAS program that will read the data set properties of cert.input11.
Run the program and use the results to answer the next 2 questions.
a. What is the encoding for the data set?
b. What label has been assigned to this data set?
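
One possible approach is sketched below. It assumes the libref cert has already been assigned with a LIBNAME statement. The encoding and the label appear as the Encoding and Label rows in the first part of the output.

```sas
/* Print the descriptor portion of cert.input11 */
proc contents data=cert.input11;
run;
```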

PROC SORT

proc sort data=sashelp.class out=class;
by sex descending age;
run;

The OUT= option names the sorted output data set (here, class). If it is omitted, the sorted data
replaces the original data set (here, SASHELP.class).
By default, SAS sorts in ascending order (here, by sex). The DESCENDING keyword before the
variable age asks SAS to sort that variable in descending order.
PROC SORT does not produce printed output; it only creates a sorted data set.
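
The effect of omitting OUT= can be seen safely on a copy. In this sketch the data set name work.class_copy is made up for illustration; after the second step the copy itself is stored in sorted order.

```sas
/* Make a scratch copy so that the original data set is left untouched */
data work.class_copy;
  set sashelp.class;
run;

/* No OUT= option: the sorted data replaces work.class_copy */
proc sort data=work.class_copy;
  by descending height;
run;
```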

Example 8:
This project will use data set cert.input12.

Write a SAS program that sorts the data set and stores it as results.output12a.

Sort the data as follows:

First by lastName, ascending order
Next by firstName, ascending order
Finally, by Age, descending order

Run the program and use the results to answer the next 2 questions.

a. What is the value of the customerID variable for observation 30 in results.output12a?
Enter your answer in the space below:
b. What is the value of the customerID variable for observation 252 in results.output12a?
Enter your answer in the space below:
(Capitalization is ignored)
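
A sketch of one way to approach this exercise follows. It assumes that the librefs cert and results are already assigned. The PROC PRINT steps are an addition for reading off single observations and are not part of the exercise text.

```sas
/* Sort by lastName and firstName (ascending), then Age (descending) */
proc sort data=cert.input12 out=results.output12a;
  by lastName firstName descending Age;
run;

/* FIRSTOBS= and OBS= set to the same number print exactly that observation */
proc print data=results.output12a (firstobs=30 obs=30);
  var customerID;
run;

proc print data=results.output12a (firstobs=252 obs=252);
  var customerID;
run;
```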

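Important options in the PROC SORT statement:

NODUP
If you specify the NODUP option (an alias for NODUPRECS), SAS compares each record with the previous
record written to the output and deletes it from the sorted data set (overview2_nodup) when all variable
values are identical. Because only adjacent records are compared, duplicates that the BY variables do
not bring together may remain.

proc sort data=cert.overview2 out=overview2_nodup nodup;
by country;
run;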

NODUPKEY
If you specify the NODUPKEY option, only one record (the first one) for each combination of key values
(the variables specified in the BY statement) is kept in the sorted data set (overview2_nodupkey).

proc sort data=cert.overview2 out=overview2_nodupkey nodupkey;
by country;
run;
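
When checking what NODUPKEY removed, the DUPOUT= option of PROC SORT can be useful. It is not mentioned in the notes above, and the output name work.overview2_dups is made up for this sketch.

```sas
/* Keep the first record per country; write the discarded records to a second data set */
proc sort data=cert.overview2
          out=work.overview2_nodupkey
          dupout=work.overview2_dups
          nodupkey;
  by country;
run;
```

The kept and discarded records together account for every observation in cert.overview2.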

Example 9:
Continuing with the same program as in Example 8, now add code to create another data set,
results.output12b, as a subset of cert.input12. Select the records so that:

There is only one observation per postalCode.
When there are multiple observations for a specific postalCode in the source data, select the
observation with the highest income.

Finally, sort results.output12b by ascending income.

Run the program and use the results to answer the next 3 questions.

a. How many observations are in results.output12b?

b. What is the value of the customerID variable for observation 12 in results.output12b?

c. Write an additional SAS procedure step to determine the average (mean) of all income within the
results.output12b data set.

Enter your numeric answer in the space below. Round your answer to the nearest whole number.
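
The requirements above can be sketched with two sorts followed by a final ordering step. This is one possible approach, not the official solution. It assumes the librefs cert and results are assigned, and the intermediate name work.by_postal is made up for illustration.

```sas
/* Step 1: within each postalCode, put the highest income first */
proc sort data=cert.input12 out=work.by_postal;
  by postalCode descending income;
run;

/* Step 2: NODUPKEY keeps the first record per postalCode, i.e. the highest income */
proc sort data=work.by_postal out=results.output12b nodupkey;
  by postalCode;
run;

/* Step 3: final order is ascending income */
proc sort data=results.output12b;
  by income;
run;

/* Mean income, reported as a whole number */
proc means data=results.output12b mean maxdec=0;
  var income;
run;
```

The number of observations in results.output12b is reported in the log after step 2.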

PROC MEANS

proc means data=SASHELP.CLASS std mean max maxdec=2;
var Height;
class Sex Age;
run;

The CLASS statement asks SAS to calculate the statistics for the VAR variable (Height) within each
combination of class levels (Sex, Age).
The VAR statement names the numeric variables to analyze; missing values are ignored. When it is
omitted, statistics are calculated for all numeric variables that are not named in other statements,
again ignoring missing values.
STD, MEAN, MAX, ... are the statistics you can request. By default, SAS calculates
N, mean, std, min and max. N Obs is also given in the SAS output; it counts the total
number of observations in each class level, including those with missing values.
MAXDEC= specifies the maximum number of decimal places SAS reports.
FORMAT, WHERE and LABEL statements can be used in PROC MEANS as well.
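
The last point can be illustrated with a short sketch. The format name $sexfmt and the label texts are made up for this example; SASHELP.CLASS is the sample data set already used above.

```sas
/* Hypothetical character format for the class variable */
proc format;
  value $sexfmt 'F' = 'Female'
                'M' = 'Male';
run;

proc means data=sashelp.class n mean min max maxdec=1;
  where Age >= 12;              /* analyze only students aged 12 or older */
  class Sex;
  var Height Weight;
  format Sex $sexfmt.;          /* class levels are displayed as Female / Male */
  label Height = 'Student height'
        Weight = 'Student weight';
run;
```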

Answer: A

PROC FREQ

One-Way tabulation:

proc freq data = sashelp.shoes;
tables region;
run;

Sample output:

Region                     Frequency  Percent  Cumulative  Cumulative
                                                Frequency     Percent
Africa                            56    14.18          56       14.18
Asia                              14     3.54          70       17.72
Canada                            37     9.37         107       27.09
Central America/Caribbean         32     8.10         139       35.19
Eastern Europe                    31     7.85         170       43.04
Middle East                       24     6.08         194       49.11
Pacific                           45    11.39         239       60.51
South America                     54    13.67         293       74.18
United States                     40    10.13         333       84.30
Western Europe                    62    15.70         395      100.00

The NOCUM option in the TABLES statement suppresses the cumulative frequency and cumulative percent columns.
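
A minimal sketch of the same one-way table with NOCUM, using the SASHELP.SHOES data set from above:

```sas
/* One-way table with only the Frequency and Percent columns */
proc freq data=sashelp.shoes;
  tables region / nocum;
run;
```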

Two-way Tabulation

proc freq data = sashelp.shoes;
tables region*product;
run;

nofreq nocol norow nopercent

These options can be used in the TABLES statement to suppress the cell frequency, column percentage,
row percentage and cell percentage, respectively.

For example:

proc freq data = sashelp.shoes;
tables region*product / nocol norow;
run;
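
The difference between listing variables and crossing them with an asterisk is easy to miss, so a short sketch may help. Both steps use SASHELP.SHOES; the choice of suppression options in the second step is only an illustration.

```sas
/* Two separate one-way tables: one for Region, one for Product */
proc freq data=sashelp.shoes;
  tables region product;
run;

/* One two-way table showing only cell frequencies and row percentages */
proc freq data=sashelp.shoes;
  tables region*product / nocol nopercent;
run;
```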

Example 10:
This question will ask you to provide a line of missing code.

The following SAS program is submitted:

proc freq data=WORK.SALES;
<insert code here>
run;

The following output is created by this FREQUENCY procedure:

Frequency
Percent
Row Pct
Col Pct

Region/Product     Boot    Men's    Men's    …    Total
                          Casual    Dress
Africa                8        5        7            56
                   2.03     1.27     1.77         14.18
                  14.29     8.93    12.50
                  15.38    11.11    14.00
Asia                  2        1        2            14
                   0.51     0.25     0.51          3.54
                  14.29     7.14    14.29
                   3.85     2.22     4.00
Canada                5        4        4            37
                   1.27     1.01     1.01          9.37
                  13.51    10.81    10.81
                   9.62     8.89     8.00
…
Total                52       45       50           395
                  13.16    11.39    12.66        100.00

A. tables region product;
B. var region product;
C. var region*product;
D. tables region*product;
