SAS Certification Examples

(Part 1)

Session 1
You can select and copy each one of the following examples and paste them directly into your SAS Editor window. They should each run without errors.

Example 1
data test;
   input x y;
   cards;
1 2
3 4
5 80
;
run;


 * Notice that a dataset is created, but no output is produced. What's missing from the program?
 * The dataset has 3 observations and 2 variables.

After the line that says 5 80 add a new line that says 6.

Run the whole program again. Notice that SAS does not produce an error message.
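The sketch below is one way to see both points from Example 1. It is an assumption about how the exercise is meant to be completed, not part of the original handout. It adds a PROC PRINT step, which is what produces output, and keeps the incomplete line 6. Rather than an error, the log is expected to contain a note such as NOTE: LOST CARD., and the incomplete record is not written to the dataset.

```sas
data test;
   input x y;
   cards;
1 2
3 4
5 80
6
;
run;

/* PROC PRINT sends the contents of the dataset to the output window */
proc print data=test;
run;
```

After running it, compare the number of observations reported in the log with the number of data lines.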

Example 2
Taken from the SAS Online Documentation, Example from the PROC REG procedure.

data fitness;
   input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
   datalines;
44 89.47 44.609 11.37 62 178 182   40 75.07 45.313 10.07 62 185 185
44 85.84 54.297  8.65 45 156 168   42 68.15 59.571  8.17 40 166 172
38 89.02 49.874  9.22 55 178 180   47 77.45 44.811 11.63 58 176 176
40 75.98 45.681 11.95 70 176 180   43 81.19 49.091 10.85 64 162 170
44 81.42 39.442 13.08 63 174 176   38 81.87 60.055  8.63 48 170 186
44 73.03 50.541 10.13 45 168 168   45 87.66 37.388 14.03 56 186 192
45 66.45 44.754 11.12 51 176 176   47 79.15 47.273 10.60 47 162 164
54 83.12 51.855 10.33 50 166 170   49 81.42 49.156  8.95 44 180 185
51 69.63 40.836 10.95 57 168 172   51 77.91 46.672 10.00 48 162 168
48 91.63 46.774 10.25 48 162 164   49 73.37 50.388 10.08 67 168 168
57 73.37 39.407 12.63 58 174 176   54 79.38 46.080 11.17 62 156 165
52 76.32 45.441  9.63 48 164 166   50 70.87 54.625  8.92 48 146 155
51 67.25 45.118 11.08 48 172 172   54 91.63 39.203 12.88 44 168 172
51 73.71 45.790 10.47 59 186 188   57 59.08 50.545  9.93 49 148 155
49 76.32 48.673  9.40 56 186 188   48 61.24 47.920 11.50 52 170 176
52 82.78 47.467 10.50 53 170 172
;
run;

proc reg data=fitness;
   model Oxygen=RunTime;
run;

Session 2
To understand the differences between the SAS statements INPUT and INFILE, see the following examples.

Example 1
The INFILE statement identifies an external file to read.

In SAS, it looks like this:

FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
   INFILE CAT;
   INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;

Notice that CAT is the fileref, the nickname that the FILENAME statement assigns to your external file.

Note: The DATA statement names the SAS data set being created.

Example 2
The INPUT statement describes your data.

In SAS, it looks like this:

FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
   INFILE CAT;
   INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;

Notice that the dollar sign identifies the variable type as character. ID is read as character because it is a label, not a quantity; it would be meaningless to treat it as a numeric variable in, say, a regression.

The INPUT statement here assigns the character variable ID to the data in columns 1-4, the numeric variable AGE to the data in columns 6-7, and the character variable SEX to the data in column 8.
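Because C:\USERS\CAT.DAT may not exist on your machine, here is a minimal sketch of the same column input using in-stream data instead of an external file. The three records are hypothetical and were made up only to match the column layout (ID in columns 1-4, AGE in columns 6-7, SEX in column 8).

```sas
/* Hypothetical records laid out to match the INPUT statement's columns */
data pets;
   input ID $ 1-4 AGE 6-7 SEX $ 8;
   datalines;
C001 12F
C002  3M
C003  7F
;
run;

/* PROC CONTENTS lists each variable along with its type (Char or Num) */
proc contents data=pets;
run;

proc print data=pets;
run;
```

In the PROC CONTENTS listing, ID and SEX should appear as character variables and AGE as numeric.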

Example 3
OBS= is a useful option that can be used in the INFILE statement.

Situation: You have 1000000 observations in your data file, and you want to take a look at it without reading the entire file. You can add OBS=n to the INFILE statement so that only records 1 through n are processed.

FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
   INFILE CAT OBS=10;
   INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;

Notice that only the first ten records are read here.
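To try OBS= without an external file, the following sketch applies it to in-stream data. The four records are hypothetical, and using OBS= on INFILE DATALINES is assumed to behave the same way as it does for an external file.

```sas
data pets;
   infile datalines obs=2;   /* stop reading after the second record */
   input ID $ 1-4 AGE 6-7 SEX $ 8;
   datalines;
C001 12F
C002  3M
C003  7F
C004  9M
;
run;

/* Only the records read by the DATA step are available to print */
proc print data=pets;
run;
```

Check the log for the number of records read and the number of observations in WORK.PETS.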

Example 4
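The DATA statements in this example make use of a new feature called direct referencing (version 9.1): a data set is named by its quoted physical path, which avoids the added step of a LIBNAME statement. In the same way, giving the quoted path directly in the INFILE statement avoids the added step of a FILENAME statement.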

Note: this example won't run without errors since in class we used File Import in the SAS menu to create a WORK.Class1 dataset.

Here is the raw text that belongs in a file called class1.txt in the C:\Temp folder (from an Excel spreadsheet).

id      age     name
1       19      George
2       20      Mary
3       21      Xena
4       21      Juan

proc print data=class1 (firstobs=3 obs=4);
   var id age;
run;

proc means data=class1;
run;

data class2;
   infile 'c:\Temp\class1.txt' firstobs=2 truncover;
   input id 1-8 age 9-16 name $ 17-24;
run;

proc print data=class2;
run;

data 'c:\Temp\plato.sas7bdat';
   set class1;
run;

data test;
   input x y;
   cards;
1 2
3 4
5 80
6         16000
;
run;

data 'c:\Temp\test.sas7bdat';
   set test;
run;

Example 5
FIRSTOBS: To specify which observation SAS processes first.

data test;
   input x y z;
   cards;
1 2  4
3 4   7
5 80  3
6 20  2
9 30  1
;
run;

proc print data=test (firstobs=2);
run;

Run the whole example on your computer and notice that only the last four observations are printed in the output.

Example 6
MISSOVER: To prevent an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. --from SDLEo

data scores;
   infile datalines missover;
   input score1-score5;
   datalines;
90 98 98 80 100
98 70 78 20 50
90 30 60
;
run;

proc print data=scores;
run;

Run the above program and notice how missing values appear in the Output window.

Example 7
Change the MISSOVER option to FLOWOVER, and then to TRUNCOVER, in the previous example, and see what differs.
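As a starting point for that comparison, here is a small side-by-side sketch of FLOWOVER and MISSOVER. The scores are hypothetical, and the second data line is deliberately short.

```sas
/* Hypothetical scores; the second line has only three values */
data flow;
   infile datalines flowover;   /* FLOWOVER is the default behavior */
   input score1-score5;
   datalines;
90 98 98 80 100
98 70 78
20 50 90 30 60
;
run;

data miss;
   infile datalines missover;   /* do not move to the next line for missing values */
   input score1-score5;
   datalines;
90 98 98 80 100
98 70 78
20 50 90 30 60
;
run;

title 'FLOWOVER';
proc print data=flow;
run;

title 'MISSOVER';
proc print data=miss;
run;

title;   /* clear the title */
```

With FLOWOVER, the short second line is completed with values from the third line, so fewer observations are created. With MISSOVER, each line becomes its own observation and the unread variables are set to missing. For list input like this, TRUNCOVER behaves like MISSOVER. The two differ mainly with column or formatted input, when a record ends partway through a field. That situation usually needs an external file to demonstrate, because in-stream data lines are typically padded with blanks.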

Example 1
data test;
   input f1;
   cards;
1
2
3000
0.0004
0.0005
6.6
70
365
366
367
;
run;

data test2;
   set test;
   f2=f1;
   f3=f1;
   f4=f1;
   f5=f1;
   format f2 8.2 ;
   format f3 date9. ;
   format f4 dollar8.2 ;
   format f5 8.0 ;
run;

proc print data=test2;
run;

Example 2
Great white sharks (Wikipedia article) are killing machines that have not needed to evolve for millions of years.

An extended example. The data records the average length in feet of selected whales and sharks.

data sealife;
   input name $ family $ length ;
   datalines;
beluga  whale   15
whale   shark   40
basking shark   30
gray    whale   50
mako    shark   12
sperm   whale   60
dwarf   shark   .5
whale   shark   40
humpback .      50
blue    whale   100
killer  whale   30
;
run;

proc means data=sealife mean;
   var length;
run;

proc means data=sealife min max;
   class family;
   var length;
run;

proc sort data=sealife out=sortedlife;
   by descending length ;
run;

proc print data=sortedlife;
   var name family length;
run;

proc sort data=sealife out=sealife2;
   by family descending length ;
run;

proc print data=sealife2;
   var name family length;
run;

proc means data=sealife noprint nway;
   class family;
   var length;
   output out=ds35;
run;

proc sort data=sealife out=sl2;
   by family;
run;

proc means data=sl2;
   var length;
   by family;
run;

Example 3
Using the fitness example from the earlier session, we looked at a common usage for PROC SORT.

data fit2;
   set fitness;
   * Here is a simple calculation that will give us two age groups;
   agecat=1;                     /* The 'young' age category */
   if (age>50) then agecat=2;    /* 'old' */
run;

proc print data=fit2;
   var age agecat Oxygen RestPulse;
run;

proc sort data=fit2 out=jerry;
   by agecat;
run;

proc corr data=jerry;
   var Oxygen RestPulse;
   by agecat;
run;

A separate Correlation report is produced for each Age category.

Example 4
Candy sales data, similar to the data mentioned in The Little SAS Book, Section 4.4 (light blue edition, page 107).

Name      ClassRm Month Day Year Candy Quantity
Adriana   21    3   2  2000  MP    7
Nathan    14    2   28 2000  CD   19
Matthew   14    3   1  2000  CD   14
Claire    14    3   3  2000  CD   11
Caitlin   21    2   24 2000  CD    9
Ian       21    3   3  2000  MP   18
Chris     14    2   18 2000  CD    6
Anthony   21    6   1  2000  MP   13
Stephen   14    3   25 2000  CD   10
Erika     21    3   25 2000  MP  17

After briefly editing the text above, we wrote the following program to illustrate two ways to use PROC FREQ.

data candy;
   input Name $ ClassRm Month Day Year Candy $ Quantity  ;
   cards;
Adriana   21    3   2  2000  MP    7
Nathan    14    2   28 2000  CD   19
Matthew   14    3   1  2000  CD   14
Claire    14    3   3  2000  CD   11
Caitlin   21    2   24 2000  CD    9
Ian       21    3   3  2000  MP   18
Chris     14    2   18 2000  CD    6
Anthony   21    6   1  2000  MP   13
Stephen   14    3   25 2000  CD   10
Erika     21    3   25 2000  MP  17
;
run;

proc freq data=candy;
   tables ClassRm Candy;
run;

proc freq data=candy;
   tables ClassRm*Candy /nopercent norow;
run;

Examples (part 1)
Height/Weight data showing very basic ODS statement usage

data htwt;
   input Name $ 1-10 Sex $ 12 Age 14-15 Height 17-18 Weight 20-22;
   datalines;
ALFRED     M 14 69 112
ALICE      F 13 56  84
BARBARA    F 14 62 102
BERNADETTE F 13 65  98
HENRY      M 14 63 102
JAMES      M 12 57  83
JANE       F 12 59  84
JANET      F 15 62 112
JEFFREY    M 13 62  84
JOHN       M 12 59  99
JOYCE      F 11 51  50
JUDY       F 14 64  90
LOUISE     F 12 56  77
MARY       F 15 66 112
PHILLIP    M 16 72 150
ROBERT     M 12 64 128
RONALD     M 15 67 133
THOMAS     M 11 57  85
WILLIAM    M 15 66 112
;
run;

proc corr data=htwt;
   var height weight;
run;

ods listing close;

proc print data=htwt;
   var height weight;
run;

Notice the warning message that appears in the log when there is no active output destination (above).

The following is a simple, very controlled way of directing where the output from a procedure should go and what format the output should take.

ods html file='c:\Temp\corr.html';
title 'The relationship between heights and weights';
proc corr data=htwt;
   var height weight;
run;
ods html close;

Other popular choices for output formats are:
 * PDF
 * RTF (Rich Text Format)
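For the PDF and RTF destinations, the pattern is the same open/run/close sequence. The sketch below is only an illustration: the file paths are hypothetical, and it uses the SASHELP.CLASS sample dataset that ships with SAS so that it runs on its own.

```sas
/* Hypothetical output paths; change them to a folder that exists on your machine */
ods pdf file='c:\Temp\corr.pdf';
ods rtf file='c:\Temp\corr.rtf';

title 'The relationship between heights and weights';
proc corr data=sashelp.class;
   var height weight;
run;

/* Closing a destination is what finishes writing its file */
ods pdf close;
ods rtf close;
```

More than one destination can be open at a time, so a single PROC CORR step writes both files here.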

ODS and function examples using Tomato data (from the Little SAS Book, 3rd edition, Section 5.3)

ods html file='c:\Temp\print2.html';
ods listing;   /* Turns the listing output destination back on */

data tomatoes;
   input name $13. color $ Days Weight ;
   cards;
Big Zac      red 80 5
Delicious    red 80 3
Dinner Plate red 90 2
Goliath      red 85 1.5
Mega Tom     red 80 2
Big Rainbow  yellow 90 1.5
Pineapple    yellow 85 2
;
run;

proc print data=tomatoes;
run;

ods trace on;
proc corr data=tomatoes;
   var Days Weight;
run;
ods trace off;

* Examine the LOG to see what the names are for the various output components;

ods trace on;
proc corr data=tomatoes nosimp;   /* This nosimp option reduces the number of tables produced */
   var Days Weight;
run;
ods trace off;

proc corr data=tomatoes;
   ods select Corr.PearsonCorr;
   var Days Weight;
run;

Now, on to limiting the observations output by a datastep, functions and creating new variables.

data tom2;
   set tomatoes;
   if (Days>80);
   *if (Days <= 80) then delete;
run;

proc print data=tom2;
run;

data tom3;
   set tomatoes;
   newsum=Days+Weight;
   r=round(Weight);
   first3=substr(Name,1,3);
   spacepos=index(name,' ');
   caps=upcase(Name);
   leng=length(Name);
run;

proc print data=tom3;
run;

Example 2 (skipped)
The following was not covered in class due to time constraints. The data is derived from the Little SAS Book, 3rd edition, Section 5.11. (I was lazy about typing last names. --Rhansen 13:27, 28 September 2006 (EDT))

Traffic-Lighting Data

1, Jochem Smith, Netherlands,374.66
2, Derek Smith, United States,377.98
3, Jens Smith, Germany,381.73
4, Dmitry Smith, Russia,381.85
5, KC Smith, United States,382.97
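Since this example was skipped, here is a rough sketch of how the comma-delimited records might be read and then traffic-lighted. It was not covered in class, so treat all of it as an assumption: the variable names, the format name timecolor, the cutoff values, and the output path are hypothetical and chosen only to show the technique.

```sas
data results;
   infile datalines dsd;   /* DSD treats commas as the delimiter */
   input Place Name :$20. Country :$20. Time;
   datalines;
1, Jochem Smith, Netherlands,374.66
2, Derek Smith, United States,377.98
3, Jens Smith, Germany,381.73
4, Dmitry Smith, Russia,381.85
5, KC Smith, United States,382.97
;
run;

/* Hypothetical cutoffs that map ranges of Time to background colors */
proc format;
   value timecolor low -< 378 = 'white'
                   378 -< 382 = 'yellow'
                   382 - high = 'orange';
run;

/* Colors only show in a destination that supports styles, such as HTML */
ods html file='c:\Temp\traffic.html';
proc print data=results noobs;
   var Place Name Country;
   var Time / style={background=timecolor.};
run;
ods html close;
```

The idea is that the format returns a color name for each value of Time, and the STYLE= option uses that name as the cell background.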

Example 3 (actually just a part 2)
Baseball data, illustrating arrays and Do loops

The following is not a proper reading of the data mentioned in both editions of The Little SAS Book (2nd and 3rd editions) in the chapter about simple regression analyses. To illustrate a practical use of arrays, we've pretended that each T-ball player tries to hit a ball five times, rather than treating all 30 height/distance pairs as separate players.

data baseball;
   input Height1 Distance1 Height2 Distance2 Height3 Distance3 Height4 Distance4 Height5 Distance5;
   cards;
50 110 49 135  48 129  53 150  48 124
50 143  51 126  45 107  53 146  50 154
47 136  52 144  47 124  50 133  50 128
50 118  48 135  47 129  45 126  48 118
45 121  53 142  46 122  47 119  51 134
49 130  46 132  51 144  50 132  50 131
;
run;

data bb2;
   set baseball;
   array H {5} Height1 Height2 Height3 Height4 Height5;
   do i=1 to 5;
      H{i}=H{i}*2.54;   /* To convert inches to centimeters */
   end;
run;

proc print data=bb2;
run;

data simple;
   input x;
   cards;
1
2
5
6
;
run;

data newsimp;
   set simple;
   do i=1 to 3;
      output;
   end;
run;

proc print data=newsimp;
run;

Example 4
COMPANY = Mutual Fund Company
RETURN = Average Annual Return, Percent
SDAR   = Standard Deviation of Annual Return, Percent

Name                                      Return   SDAR
Group Securities. Common Stock Fund       15.1     19.1
Incorporated Investors                    14.0     25.5
Investment Company of America             17.4     21.8
Investors Mutual                          11.3     12.5
Loomis-Sales Mutual Fund                  10.0     10.4
Massachusetts Investors Trust             16.2     20.8
National Investors Corporation            18.3     19.9
National Securities-Income Series         12.4     17.8
New England Fund                          10.4     10.2
Massachusetts Investors-Growth Stock      18.6     22.7
Group Securities. Fully Administered Fund 11.4     14.1

Example 5
Time: Total minutes spent on homework per week.
Days: Days spent on homework per week.

Time Days
100  3
150  4
 99  5
  .  6
160  3
 80  .
200  3
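One possible way to read the homework data is sketched below. The dataset name homework is an assumption. A period in the data stands for a missing numeric value, and the N and NMISS statistics of PROC MEANS show how many values are present and missing for each variable.

```sas
data homework;
   input Time Days;
   datalines;
100 3
150 4
99 5
. 6
160 3
80 .
200 3
;
run;

/* N counts nonmissing values, NMISS counts missing values */
proc means data=homework n nmiss mean;
   var Time Days;
run;
```

PROC MEANS computes each variable's mean from its nonmissing values only.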

Example 6
FIELD = Field of Specialization
WOMEN = Median Salaries of Women, Thousands of $
MEN  = Median Salaries of Men, Thousands of $

FIELD                              WOMEN    MEN
Business, Finance, Etc.              9.3    13.0
Labor Economics                     10.3    12.0
Monetary-Fiscal                      8.0    11.6
General Economic Theory              8.7    10.8
Population, Welfare Programs, Etc.  12.0    11.5
Economic Systems and Development     9.0    12.2

Example 1
Simple examples to illustrate simple dataset creation.

data test;
   x=1;
   a=2;
   b=3;
   c=18;
run;

data test2;
   set test;
   y=32;
   d=a+b;
run;

data test3;
   set test2;
   output;
   output;
   output;
run;

Example 2
Examples for generating a sequence of year values

data alumni;
   year=1995;
   contrib=1000;
   output;
   year=year+5;
   contrib=1500;
   output;
   year=year+5;
   output;
   year=year+5;
   output;
run;

data alumni;
   year=1995;
   output;
   year=year+5;
   contrib=1500;
   output;
   year=year+5;
   output;
   year=year+5;
   output;
run;

data alumni50;
   year=1995;
   contrib=1000;
   do i=1 to 10;
      contrib=contrib+25;
      year=year+5;
      output;
   end;
run;

proc print data=alumni50;
run;

data pop;
   year=1790;
   output;
   do i=1 to 21;
      year=year+10;
      output;
   end;
run;

proc print data=pop;
run;

Example 2a
Combining a sequence with real data of US Population estimates starting in 1790 (Data from the SAS online documentation PROC REG (regression) overview.)

data actpop;
   input population;
   cards;
3.929
5.308
7.239
9.638
12.866
17.069
23.191
31.443
39.818
50.155
62.947
75.994
91.972
105.71
122.775
131.669
151.325
179.323
203.211
226.542
248.71
281.422
;
run;

data fullpop;
   merge pop actpop;
run;

proc print data=fullpop;
run;

Example 2b
Using the population data for a quick linear regression. Actually, looking at the plot reveals what a poor fit the linear model is.

proc gplot data=fullpop;
   plot population*year;
run;

proc reg data=fullpop;
   model Population=year;
run;

Example 3
Heights, Weights and Ages for a class.

data class;
   input Name $ Height Weight Age @@;
   datalines;
Alfred 69.0 112.5 14  Alice  56.5  84.0 13  Barbara 65.3  98.0 13
Carol  62.8 102.5 14  Henry  63.5 102.5 14  James   57.3  83.0 12
Jane   59.8  84.5 12  Janet  62.5 112.5 15  Jeffrey 62.5  84.0 13
John   59.0  99.5 12  Joyce  51.3  50.5 11  Judy    64.3  90.0 14
Louise 56.3  77.0 12  Mary   66.5 112.0 15  Philip  72.0 150.0 16
Robert 64.8 128.0 12  Ronald 67.0 133.0 15  Thomas  57.5  85.0 11
William 66.5 112.0 15
;
run;

proc reg data=class ;
   model Height=Age;
run;

proc reg data=class outest=est noprint;
   model Height=Age;
run;

proc print data=est;
run;

Example 4
Setting up arbitrary groups based on the students' names

data class2;
   set class;
   group=1;
   if substr(Name,1,1)='J' then group=2;
run;

* Sorting is required before BY statement processing;
proc sort data=class2;
   by group;
run;

proc reg data=class2;
   model Height=Age;
   by group;
run;

Example 4a
Illustrating an efficient, concise form for outputting several regression models' results

proc reg data=class2 outest=est2 noprint;
   model Height=Age;
   by group;
run;

proc print data=est2;
run;

Example 5
Illustrating using PROC REG to generate other output; the residuals from a regression

proc reg data=class2;
   model height=age;
   output out=DS1 r=george;
run;

proc print data=DS1;
run;

proc sort data=DS1;
   by descending george ;
run;

proc print data=DS1;
   var name age height george;
run;

Left as an exercise: If we want the largest outlier (not the most positive), how should we change the program above? What should we do with the dataset DS1?