SAS Interview Questions
Answer: To override the default way in which the DATA step writes observations to output, you can use an OUTPUT statement in the DATA step. Placing an explicit OUTPUT statement in a DATA step overrides the automatic output, so that observations are added to a data set only when the explicit OUTPUT statement is executed.
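As a minimal sketch of the explicit OUTPUT behaviour described above (the data set names work.scores and work.passed and the variable score are hypothetical, chosen only for illustration):

```sas
/* Hypothetical sample data: three people with a test score. */
data work.scores;
    input name $ score;
    datalines;
Ann 82
Bob 47
Cy 91
;
run;

data work.passed;
    set work.scores;
    /* Because an explicit OUTPUT statement appears in this step, the
       automatic output at the end of each iteration is turned off.
       Only observations that reach OUTPUT are written. */
    if score >= 50 then output;
run;
```

With this sample data, work.passed should contain only the observations for Ann and Cy.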
Question: What is the function of the STOP statement?
Answer: The STOP statement causes SAS to stop processing the current DATA step immediately and resume processing with the statements after the end of the current DATA step.
Question: What is the difference between using the DROP= data set option in the DATA statement and in the SET statement?
Answer: If you do not want to process certain variables and do not want them to appear in the new data set, specify the DROP= data set option in the SET statement. If you want to process certain variables but do not want them to appear in the new data set, specify the DROP= data set option in the DATA statement.
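The two placements of DROP= can be sketched side by side. The data set work.raw and the variables height, weight and bmi are assumptions made up for this illustration:

```sas
/* Hypothetical input data. */
data work.raw;
    input id height weight;
    datalines;
1 1.70 65
2 1.82 90
;
run;

/* DROP= on the SET statement: weight is never read into the step,
   so it is not available for processing and not written out. */
data work.no_weight;
    set work.raw(drop=weight);
run;

/* DROP= on the DATA statement: height and weight are available for
   the calculation but are left out of the output data set. */
data work.bmi(drop=height weight);
    set work.raw;
    bmi = weight / (height**2);
run;
```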
Question: Given an unsorted data set, how do you read the last observation into a new data set?
Answer: Use the END= option on the SET statement. For example:
    data work.calculus;
        set work.comp end=last;
        if last;
    run;
Here Calculus is the new data set to be created and Comp is the existing data set. last is a temporary variable (initialized to 0) that is set to 1 when the SET statement reads the last observation.
Question: What is the difference between reading data from an external file and reading data from an existing data set?
Answer: The main difference is that when reading an existing data set with the SET statement, SAS retains the values of the variables from one observation to the next. Variables read from an external file with an INPUT statement are reset to missing at the start of each iteration.
Question: What is the difference between SAS functions and procedures?
Answer: A function expects its argument values to be supplied across an observation in a SAS data set, whereas a procedure expects one variable value per observation. For example:
    data average;
        set temp;
        avgtemp = mean(of T1-T24);
    run;
Here the arguments of the MEAN function are taken across an observation.
    proc sort;
        by month;
    run;
    proc means;
        by month;
        var avgtemp;
    run;
PROC MEANS is used to calculate the average temperature by month (taking one variable value per observation).
Question: What is the difference between the SUM function and the + operator?
Answer: The SUM function returns the sum of the non-missing arguments, whereas the + operator returns a missing value if any of the arguments is missing. Example:
    data mydata;
        input x y z;
        cards;
    33 3 3
    24 3 4
    24 3 4
    . 3 2
    23 . 3
    54 4 .
    35 4 2
    ;
    run;
    data mydata2;
        set mydata;
        a=sum(x,y,z);
        p=x+y+z;
    run;
In the output, the value of p is missing for the 4th, 5th and 6th observations:
    a     p
    39    39
    31    31
    31    31
    5     .
    26    .
    58    .
    41    41
Question: What would be the result if all the arguments in the SUM function are missing?
Answer: A missing value.
Question: What would be the denominator value used by the MEAN function if two out of seven arguments are missing?
Answer: Five.
Question: Give an example where SAS fails to convert a character value to a numeric value automatically.
Answer: Suppose the value of a variable PayRate begins with a dollar sign ($). When SAS tries to automatically convert the values of PayRate to numeric values, the dollar sign blocks the process. The values cannot be converted to numeric values. Therefore, it is always best to include INPUT and PUT functions in your programs when conversions occur.
Question: What would be the resulting numeric value (generated by automatic character-to-numeric conversion) of the character value below when used in an arithmetic calculation? 1,735.00
Answer: A missing value.
Question: What would be the resulting numeric value (generated by automatic character-to-numeric conversion) of the character value below when used in an arithmetic calculation? 1735.00
Answer: 1735
Question: Which SAS statement does not perform automatic conversions in comparisons?
Answer: The WHERE statement.
Question: Briefly explain the INPUT and PUT functions.
Answer:
INPUT function: character-to-numeric conversion, input(source, informat)
PUT function: numeric-to-character conversion, put(source, format)
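The conversion questions above can be tied together with a short sketch. The variable names PayNum and PayChar and the data set work.convert are assumptions for illustration; COMMA10. and DOLLAR10.2 are one reasonable choice of informat and format, not the only one:

```sas
data work.convert;
    /* Character value with a dollar sign and a comma, which automatic
       conversion cannot handle. */
    PayRate = '$1,735.00';

    /* INPUT with the COMMA informat strips the dollar sign and comma
       and returns a numeric value. */
    PayNum = input(PayRate, comma10.);

    /* PUT writes the numeric value back out as text using a format. */
    PayChar = put(PayNum, dollar10.2);

    /* Write both values to the log for inspection. */
    put PayNum= PayChar=;
run;
```

Here PayNum should hold the numeric value 1735, which can then be used safely in arithmetic.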
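Question: What would be the result of the following SAS functions (given that 31 Dec 2000 is a Sunday)?
    Weeks = intck('week', '31dec2000'd, '01jan2001'd);
    Years = intck('year', '31dec2000'd, '01jan2001'd);
    Months = intck('month', '31dec2000'd, '01jan2001'd);
Answer: Weeks=0, Years=1, Months=1. INTCK counts the interval boundaries crossed between the two dates: no new week starts (weeks begin on Sunday), but a new year and a new month both start on 1 January.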
Question: What are the parameters of the SCAN function?
Answer: scan(argument, n, delimiters)
argument specifies the character variable or expression to scan
n specifies which word to read
delimiters are special characters that must be enclosed in quotation marks
Question: Suppose the variable address stores the following expression: 209 RADCLIFFE ROAD, CENTER CITY, NY, 92716. What would be the result returned by the SCAN function in the following cases?
    a=scan(address,3);
    b=scan(address,3,',');
Answer: a=ROAD; b= NY (with a leading blank, because only the comma is treated as a delimiter)
Question: What is the length assigned to the target variable by the SCAN function?
Answer: 200
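Because SCAN gives an undeclared target variable a length of 200, a common remedy is to declare the length first. A minimal sketch, assuming a hypothetical data set work.parts and target variable state:

```sas
data work.parts;
    /* Declare the target before SCAN assigns it, so it does not
       default to a length of 200. */
    length state $ 2;

    address = '209 RADCLIFFE ROAD, CENTER CITY, NY, 92716';

    /* The third comma-delimited piece is ' NY' with a leading blank;
       LEFT moves that blank to the end before the value is stored. */
    state = left(scan(address, 3, ','));
run;
```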
Question: Name a few SAS functions.
Answer: SCAN, SUBSTR, TRIM, CATX, INDEX, TRANWRD, FIND, SUM.
Question: What is the function of the TRANWRD function?
Answer: The TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string.
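A small sketch of TRANWRD in use; the data set work.titles, the variable name and the sample string are made up for illustration:

```sas
data work.titles;
    length name $ 20;
    name = 'Miss Jane Doe';

    /* tranwrd(source, target, replacement): replace every occurrence
       of 'Miss' in name with 'Ms.' */
    name = tranwrd(name, 'Miss', 'Ms.');
run;
```

After the step, name should read Ms. Jane Doe.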
Question: Consider the following SAS program:
    data finance.earnings;
        Amount=1000;
        Rate=.075/12;
        do month=1 to 12;
            Earned+(amount+earned)*(rate);
        end;
    run;
What would be the value of month at the end of DATA step execution, and how many observations would there be?
Answer: The value of month would be 13, because the index is incremented once more before the loop condition fails. The number of observations would be 1, written by the automatic output at the end of the step.
Question: Consider the following SAS program:
    data finance;
        Amount=1000;
        Rate=.075/12;
        do month=1 to 12;
            Earned+(amount+earned)*(rate);
            output;
        end;
    run;
How many observations would there be at the end of DATA step execution?
Answer: 12
Question: How do you use a DO loop if you don't know how many times it should execute?
Answer: Use DO UNTIL or DO WHILE to specify the condition.
Question: What is the difference between DO WHILE and DO UNTIL?
Answer: An important difference between the DO UNTIL and DO WHILE statements is that the DO WHILE expression is evaluated at the top of the DO loop. If the expression is false the first time it is evaluated, the DO loop never executes. DO UNTIL, by contrast, is evaluated at the bottom of the loop, so it always executes at least once.
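The top-versus-bottom evaluation can be seen in a short sketch. The data set work.loops and the counters n_while and n_until are hypothetical names used only here:

```sas
data work.loops;
    /* The condition is false on entry, and DO WHILE checks at the top,
       so the body never runs and n_while stays 0. */
    n_while = 0;
    x = 10;
    do while (x < 10);
        n_while = n_while + 1;
        x = x + 1;
    end;

    /* The stopping condition is already true, but DO UNTIL checks at
       the bottom, so the body runs once and n_until becomes 1. */
    n_until = 0;
    y = 10;
    do until (y >= 10);
        n_until = n_until + 1;
        y = y + 1;
    end;
run;
```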
Question: How do you specify a number of iterations and a specific condition within a single DO loop?
Answer:
    data work;
        do i=1 to 20 until(Sum>=20000);
            Year+1;
            Sum+2000;
            Sum+Sum*.10;
        end;
    run;
This iterative DO statement executes the DO loop until Sum is greater than or equal to 20000 or until the DO loop has executed 20 times, whichever occurs first.
Question: How many data types are there in SAS? Answer: Character, Numeric
Question: If a variable contains only numbers, can it be a character data type? Also give an example.
Answer: Yes, it depends on how you use the variable. Example: ID and Zip contain only numeric digits and can be character data type.
Question: If a variable contains letters or special characters, can it be a numeric data type?
Answer: No, it must be character data type.
Question: What can be the size of the largest data set in SAS?
Answer: The number of observations is limited only by the computer's capacity to handle and store them. Prior to SAS 9.1, SAS data sets could contain up to 32,767 variables. In SAS 9.1, the maximum number of variables in a SAS data set is limited by the resources available on your computer.
Question: Give some examples where PROC REPORT's defaults are different from PROC PRINT's defaults.
Answer:
No record numbers in PROC REPORT.
Labels (not variable names) are used as headers in PROC REPORT.
PROC REPORT needs the NOWINDOWS option.
Question: Give some examples where PROC REPORT's defaults are the same as PROC PRINT's defaults.
Answer: Variables/columns appear in position order. Rows are ordered as they appear in the data set.
Question: Highlight the major difference between the two programs below:
a.
    data mydat;
        input ID Age;
        cards;
    2 23
    4 45
    3 56
    9 43
    ;
    run;
    proc report data = mydat nowd;
        column ID Age;
    run;
b.
    data mydat1;
        input grade $ ID Age;
        cards;
    A 2 23
    B 4 45
    C 3 56
    D 9 43
    ;
    run;
    proc report data = mydat1 nowd;
        column Grade ID Age;
    run;
Answer: When all the variables in the input file are numeric, PROC REPORT does a sum by default. Thus the first program generates one record in the report, whereas the second generates four records.
Question: In the above program, how will you avoid having the sum of the numeric variables?
Answer: To avoid having the sum of the numeric variables, one or more of the input variables must be defined as DISPLAY. Thus we have to use:
    proc report data = mydat nowd;
        column ID Age;
        define ID/display;
    run;
Question: What is the difference between an ORDER variable and a GROUP variable in PROC REPORT?
Answer: If the variable is used as a GROUP variable, rows that have the same values are collapsed. GROUP variables produce a summary report, whereas ORDER variables produce a list report.
Question: Give some ways by which you can define the variables to produce a summary report (using PROC REPORT).
Answer: All of the variables in a summary report must be defined as GROUP, ANALYSIS, ACROSS, or COMPUTED variables.
Question: What are the default statistics for the MEANS procedure?
Answer: N (count), mean, standard deviation, minimum, and maximum.
Question: How to limit decimal places for variable using PROC MEANS? Answer: By using MAXDEC= option
Question: What is the difference between the CLASS statement and the BY statement in PROC MEANS?
Answer: Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables. BY-group results have a layout that is different from the layout of CLASS-group results.
Question: What is the difference between PROC MEANS and PROC SUMMARY?
Answer: The difference between the two procedures is that PROC MEANS produces a report by default. By contrast, to produce a report in PROC SUMMARY, you must include a PRINT option in the PROC SUMMARY statement.
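The PROC MEANS answers above can be sketched together. This assumes the SASHELP.CLASS sample data set that ships with SAS is available; work.class_sorted is a hypothetical output name:

```sas
/* CLASS: no sorting needed. MAXDEC=1 limits output to one decimal. */
proc means data=sashelp.class maxdec=1;
    class sex;
    var height weight;
run;

/* BY: the input must first be sorted (or indexed) by the BY variable. */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

proc means data=work.class_sorted maxdec=1;
    by sex;
    var height weight;
run;

/* PROC SUMMARY prints no report unless PRINT is specified. */
proc summary data=sashelp.class print;
    class sex;
    var height weight;
run;
```

Comparing the first two reports shows the layout difference: CLASS produces one table with a row block per class level, while BY produces a separate table per BY group.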
Question: How to specify variables to be processed by the FREQ procedure? Answer: By using TABLES Statement.
Question: Describe the CROSSLIST option in the TABLES statement.
Answer: Adding the CROSSLIST option to the TABLES statement displays crosstabulation tables in ODS column format.
Question: How do you create list output for crosstabulations in PROC FREQ?
Answer: To generate list output for crosstabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step.
    TABLES variable-1*variable-2 <* variable-n> / LIST;
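The three TABLES layouts discussed above can be compared in one step. This sketch assumes the SASHELP.CLASS sample data set is available:

```sas
proc freq data=sashelp.class;
    tables sex*age;              /* default crosstabulation table    */
    tables sex*age / list;       /* one row per sex/age combination  */
    tables sex*age / crosslist;  /* crosstabulation in column format */
run;
```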
Question: PROC MEANS works for ________ variables and PROC FREQ works for ______ variables?
Answer: Numeric, Categorical
Question: How can you combine two data sets based on the relative position of rows in each data set; that is, the first observation in one data set is joined with the first observation in the other, and so on?
Answer: One-to-one reading.
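One-to-one reading is done with multiple SET statements in one DATA step. A minimal sketch, with hypothetical data sets work.ids, work.scores and work.paired:

```sas
data work.ids;
    input id;
    datalines;
1
2
3
;
run;

data work.scores;
    input score;
    datalines;
90
80
;
run;

/* Two SET statements: row 1 pairs with row 1, row 2 with row 2.
   The step stops as soon as the shorter data set runs out. */
data work.paired;
    set work.ids;
    set work.scores;
run;
```

With this sample data, work.paired should contain two observations, since work.scores has only two rows.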
Question:
    data concat;
        set a b;
    run;
The format of variable Revenue in data set a is dollar10.2 and the format of variable Revenue in data set b is dollar12.2. What would be the format of Revenue in the resulting data set (concat)?
Answer: dollar10.2
Question: If you have two data sets and want to combine them so that observations in each BY group in each data set in the SET statement are read sequentially, in the order in which the data sets and BY variables are listed, which method of combining data sets will work for this?
Answer: Interleaving
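Interleaving uses a single SET statement that lists both data sets, followed by a BY statement. A sketch with made-up data sets work.jan and work.feb:

```sas
data work.jan;
    input id amount;
    datalines;
1 10
3 30
;
run;

data work.feb;
    input id amount;
    datalines;
2 20
3 35
;
run;

/* Both inputs are already in id order here; otherwise each one would
   need a PROC SORT by id first. */
data work.interleaved;
    set work.jan work.feb;
    by id;
run;
```

The result keeps all four observations in id order, and within id 3 the row from work.jan comes before the row from work.feb.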
Question: While match-merging two data sets, you cannot use the __________ option with indexed data sets because indexes are always stored in ascending order.
Answer: Descending
Question: I have a data set concat with variables a, b and c. How do I rename a and b to e and f?
Answer:
    data concat(rename=(a=e b=f));
        set concat;
    run;
Question: What is the difference between a one-to-one merge and a match merge? Give an example also.
Answer: If both data sets in the MERGE statement are sorted by id (as shown below) and each observation in one data set has a corresponding observation in the other data set, a one-to-one merge is suitable.
    data mydata1;
        input id class $;
        cards;
    1 Sa
    2 Sd
    3 Rd
    4 Uj
    ;
    data mydata2;
        input id class1 $;
        cards;
    1 Sac
    2 Sdf
    3 Rdd
    4 Lks
    ;
    data mymerge;
        merge mydata1 mydata2;
    run;
If the observations do not match, then match-merging is suitable:
    data mydata1;
        input id class $;
        cards;
    1 Sa
    2 Sd
    2 Sp
    3 Rd
    4 Uj
    ;
    data mydata2;
        input id class1 $;
        cards;
    1 Sac
    2 Sdf
    3 Rdd
    3 Lks
    5 Ujf
    ;
    data mymerge;
        merge mydata1 mydata2;
        by id;
    run;
What is the one statement to set the criteria of data that can be coded in any step?
A) The OPTIONS statement.
What is the effect of the OPTIONS statement ERRORS=1?
A) ERRORS=1 limits to one the number of observations for which SAS prints complete data-error messages in the log. (The automatic variable _ERROR_ is what has a value of 1 if there is an error in the data for that observation and 0 if there is not.)
What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?
A) It implies that automatic conversion took place to make character functions possible.
Why is a STOP statement needed for the POINT= option on a SET statement?
A) Because POINT= reads only the specified observations, SAS cannot detect an end-of-file condition as it would if the file were being read sequentially.
How do you control the number of observations and/or variables read or written?
A) The FIRSTOBS= and OBS= options for observations; KEEP= and DROP= for variables.
Approximately what date is represented by the SAS date value of 730?
A) 31st December 1961
Identify statements whose placement in the DATA step is critical.
A) INPUT, DATA and RUN
Does SAS translate (compile) or does it interpret?
A) Compile
What does the RUN statement do?
A) When SAS encounters RUN, it compiles and executes the preceding DATA or PROC step. If another DATA or PROC step follows, the start of that step also ends the previous one, so you can omit the RUN statement there.
Why is SAS considered self-documenting?
A) SAS is considered self-documenting because at compilation time it creates and stores inside the data set all the information about it, such as the time and date of creation, the number of variables and their labels. You can look at that information using the PROC CONTENTS procedure.
What are some good SAS programming practices for processing very large data sets?
A) Sort them once; use FIRSTOBS= and OBS=.
What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) Functions can be used inside the DATA step and on the same data set, but with PROCs you can create new data sets to output the results. May be more ...........
If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
A) I would use TRANSPOSE if there are few variables and arrays if there are more ................. depends
What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
A) In unsorted data you can't use FIRST. or LAST. reliably; sort the data by the BY variable first (or, if the data are grouped but not in sorted order, use the NOTSORTED option on the BY statement).
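For the FIRST./LAST. question above, a typical pattern is to sort and then count within each BY group. This sketch assumes the SASHELP.CLASS sample data set is available; work.by_sex, work.counts and the counter n are hypothetical names:

```sas
/* FIRST. and LAST. need a BY statement, which expects sorted input. */
proc sort data=sashelp.class out=work.by_sex;
    by sex;
run;

data work.counts;
    set work.by_sex;
    by sex;
    if first.sex then n = 0;   /* reset at the start of each BY group  */
    n + 1;                     /* sum statement: n is retained         */
    if last.sex then output;   /* write one row per group              */
    keep sex n;
run;
```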
How do you debug and test your SAS program?
A) The first thing is to look in the log for errors, warnings, or in some cases NOTEs, or to use the debugger in the SAS DATA step.
What other SAS features do you use for error trapping and data validation?
A) Check the log, and for data validation use things like PROC FREQ, PROC MEANS or sometimes PROC PRINT to see what the data looks like ........
How would you combine 3 or more tables with different structures?
A) I think sort them by their common variables and use a MERGE statement. I am not sure what you mean by different structures.
Other questions:
What areas of SAS are you most interested in?
A) BASE, STAT, GRAPH, ETS
Briefly describe 5 ways to do a "table lookup" in SAS.
A) Match merging, direct access, format tables, arrays, PROC SQL
What versions of SAS have you used (on which platforms)?
A) SAS 9.1.3, 9.0, 8.2 on Windows and UNIX, SAS 7 and 6.12
What are some good SAS programming practices for processing very large data sets?
A) Sampling using the OBS= option or subsetting, commenting the lines, using DATA _NULL_
What are some problems you might encounter in processing missing values? In DATA steps? Arithmetic? Comparisons? Functions? Classifying data?
A) The result of any operation with a missing value will be a missing value. Most SAS statistical procedures exclude observations with any missing variable values from an analysis.
How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
A) Using PROC TRANSPOSE
What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) PROCs can be used with a wider scope and the results can be sent to a different data set. Functions usually affect the existing data sets.
If you were told to create many records from one record, show how you would do this using an array and with PROC TRANSPOSE?
A) Declare an array for the number of variables in the record and then use a DO loop; or use PROC TRANSPOSE with a VAR statement.
What are _numeric_ and _character_ and what do they do?
A) They refer to all the numeric and all the character variables in the data set, respectively, for reading or writing.
How would you create multiple observations from a single observation?
A) Using the double trailing @@
For what purpose would you use the RETAIN statement?
A) The RETAIN statement is used to hold the values of variables across iterations of the DATA step.
Normally, all variables in the DATA step are set to missing at the start of each iteration of the DATA step.
What is the order of evaluation of the arithmetic operators: + - * / ** ()?
A) (), **, *, /, +, -
How could you generate test data with no input data?
A) Using DATA _NULL_ and the PUT statement
How do you debug and test your SAS programs?
A) Using OBS=0 and system options to trace the program execution in the log.
What can you learn from the SAS log when debugging?
A) It will display the execution of the whole program and the logic. It will also display the error with its line number so that you can find and edit the program.
What is the purpose of _error_?
A) It has only two values, which are 1 for error and 0 for no error.
How can you put a "trace" in your program?
A) By using ODS TRACE ON
How does SAS handle missing values in: assignment statements, functions, a merge, an update, sort order, formats, PROCs?
A) Missing values will be assigned as missing in an assignment statement. The sort order treats the standard numeric missing value (.) as the second smallest value, after the underscore missing value (._).
How do you test for missing values?
A) Using subsetting statements such as IF-THEN/ELSE, WHERE and SELECT.
How are numeric and character missing values represented internally?
A) Character as a blank (' ') and numeric as a period (.).
Which date function advances a date, time or datetime value by a given interval?
A) INTNX.
In the flow of DATA step processing, what is the first action in a typical DATA step?
A) When you submit a DATA step, SAS processes the DATA step and then creates a new SAS data set. The compilation phase (creation of the input buffer and PDV) comes first, followed by the execution phase.
What are SAS/ACCESS and SAS/CONNECT?
A) SAS/ACCESS only processes through databases like Oracle, SQL Server, MS Access etc. SAS/CONNECT only uses a server connection.
What is the one statement to set the criteria of data that can be coded in any step?
A) OPTIONS statement, LABEL statement, KEEP / DROP statements.
What is the purpose of using the N=PS option?
A) The N=PS option creates a buffer in memory which is large enough to store PAGESIZE (PS) lines and enables a page to be formatted randomly prior to it being printed.
What are the scrubbing procedures in SAS?
A) PROC SORT with the NODUPKEY option, because it will eliminate the duplicate values.
What are the new features included in the new version of SAS, i.e., SAS 9.1.3?
A) The main advantage of version 9 is faster execution of applications and centralized access to data and support.
Many changes have been made in version 9 compared with version 8. The following are a few:
SAS version 9 supports format names longer than 8 bytes, which is not possible with version 8. In version 9, numeric format names can be up to 32 characters and character format names up to 31; numeric informat names can be up to 31 characters and character informat names up to 30.
3 new informats are available in version 9 to convert various date, time and datetime forms of data into a SAS date or SAS time:
ANYDTDTEw. - Converts to a SAS date value.
ANYDTTMEw. - Converts to a SAS time value.
ANYDTDTMw. - Converts to a SAS datetime value.
The CALL SYMPUTX routine is added in version 9. It creates a macro variable at execution time in the DATA step by trimming trailing blanks and automatically converting a numeric value to character.
A new ODS option (COLUMN option) is included to create multiple columns in the output.
What difference did you find among versions 6, 8 and 9 of SAS?
A) The SAS 9 architecture is fundamentally different from any prior version of SAS. In the SAS 9 architecture, SAS relies on a new component, the Metadata Server, to provide an information layer between the programs and the data they access. Metadata, such as security permissions for SAS libraries and where the various SAS servers are running, are maintained in a common repository.
What has been your most common programming mistake?
A) Missing semicolons and not checking the log after submitting a program. Not using debugging techniques and not using the FSVIEW option vigorously.
Name several ways to achieve efficiency in your program.
A) Efficiency and performance strategies can be classified into 5 different areas:
CPU time
Data storage
Elapsed time
Input/output
Memory
CPU time and elapsed time are baseline measurements.
A few examples of efficiency violations: retaining unwanted data sets; not subsetting early to eliminate unwanted records.
Efficiency-improving techniques:
Use KEEP and DROP statements to retain only the necessary variables.
Use macros to reduce the code.
Use IF-THEN/ELSE statements to process data.
Use the SQL procedure to reduce the number of programming steps.
Use LENGTH statements to reduce the variable size and so the data storage.
Use DATA _NULL_ steps when no output data set is needed, to save data storage.
What other SAS products have you used and consider yourself proficient in using?
A) DATA _NULL_ statement, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT, PROC UNIVARIATE etc.
What is the significance of the OF in X=SUM(OF a1-a4, a6, a9);
A) If you don't use the OF keyword, it might not be interpreted as we expect. For example, without OF the function above calculates the sum of a1 minus a4 plus a6 and a9 and