SAS Certification Examples

(Part 1)

Session 1

You can select and copy each one of the following examples and paste them directly into your SAS Editor window. They should each run without errors.

Example 1

data test;
input x y;
cards;
1 2
3 4
5 80
;
run;
  • Notice that a dataset is created, but no output is produced. What's missing from the program?
  • The dataset has 3 observations and 2 variables.

After the line that says 5 80 add a new line that says

6 .

Run the whole program again. Notice that SAS does not produce an error message.
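
As a minimal sketch of the modified program, the version below includes the extra line and adds a PROC PRINT step (an addition, not part of the original program) so the result can be inspected. In list input a single period is read as a missing numeric value, which is why no error message appears.

```sas
data test;
input x y;
cards;
1 2
3 4
5 80
6 .
;
run;

* Added for illustration: print the dataset to see the missing value of y;
proc print data=test;
run;
```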

Example 2

Taken from the SAS Online Documentation, Example from the PROC REG procedure.

data fitness; 
     input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@; 
     datalines; 
  44 89.47 44.609 11.37 62 178 182   40 75.07 45.313 10.07 62 185 185 
  44 85.84 54.297  8.65 45 156 168   42 68.15 59.571  8.17 40 166 172 
  38 89.02 49.874  9.22 55 178 180   47 77.45 44.811 11.63 58 176 176 
  40 75.98 45.681 11.95 70 176 180   43 81.19 49.091 10.85 64 162 170 
  44 81.42 39.442 13.08 63 174 176   38 81.87 60.055  8.63 48 170 186 
  44 73.03 50.541 10.13 45 168 168   45 87.66 37.388 14.03 56 186 192 
  45 66.45 44.754 11.12 51 176 176   47 79.15 47.273 10.60 47 162 164 
  54 83.12 51.855 10.33 50 166 170   49 81.42 49.156  8.95 44 180 185 
  51 69.63 40.836 10.95 57 168 172   51 77.91 46.672 10.00 48 162 168 
  48 91.63 46.774 10.25 48 162 164   49 73.37 50.388 10.08 67 168 168 
  57 73.37 39.407 12.63 58 174 176   54 79.38 46.080 11.17 62 156 165 
  52 76.32 45.441  9.63 48 164 166   50 70.87 54.625  8.92 48 146 155 
  51 67.25 45.118 11.08 48 172 172   54 91.63 39.203 12.88 44 168 172 
  51 73.71 45.790 10.47 59 186 188   57 59.08 50.545  9.93 49 148 155 
  49 76.32 48.673  9.40 56 186 188   48 61.24 47.920 11.50 52 170 176 
  52 82.78 47.467 10.50 53 170 172 
  ; 
  run;
proc reg data=fitness;
  model Oxygen=RunTime;
  run;

Session 2

To understand how the INPUT and INFILE statements differ, work through the following examples.

Example 1

The INFILE statement identifies an external file to read.

In SAS, it looks like this:

FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
INFILE CAT;
INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;

Notice that CAT is a fileref: a short name that the FILENAME statement assigns to your external file.

Note: The DATA statement names the SAS data set being created.

Example 2

The INPUT statement describes your data.

In SAS, it looks like this:

FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
INFILE CAT;
INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;

Notice that the dollar sign identifies the variable type as character.
ID is read as character because it is only a label; it would be meaningless to run a regression that treats ID as a numeric variable.
The INPUT statement here assigns the data in columns 1-4 to the character variable ID,
the data in columns 6-7 to the numeric variable AGE, and the data in column 8 to the
character variable SEX.
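
The column layout can be tried without an external file. The sketch below assumes three made-up records (hypothetical values, not the contents of CAT.DAT) typed in-stream so that ID occupies columns 1-4, AGE columns 6-7, and SEX column 8.

```sas
data pets_demo;
* Same INPUT statement, reading in-stream lines instead of a file;
input ID $ 1-4 AGE 6-7 SEX $ 8;
datalines;
C001 12F
C002  3M
C003  7F
;
run;

proc print data=pets_demo;
run;
```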

Example 3

OBS= is a useful option that can be used in the INFILE statement.

Situation: your data file has 1000000 records and you want to take a look
at it without reading the entire file. You can add OBS=n to the INFILE
statement so that only records 1 through n are processed.

FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
INFILE CAT OBS=10;
INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;

Notice that only the first ten records are read here.
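
A small sketch of the same idea, assuming OBS= is applied to in-stream data through INFILE DATALINES rather than to an external file. The five records are hypothetical; only the first three should end up in the dataset.

```sas
data firstfew;
infile datalines obs=3; /* stop after the third record */
input id $ age;
datalines;
A1 4
A2 7
A3 2
A4 9
A5 5
;
run;

proc print data=firstfew;
run;
```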

Example 4

The data statements of this example make use of a new feature called direct referencing (version 9.1). By using this new feature we avoid the added step of using a FILENAME statement.

Note: this example won't run without errors since in class we used File Import in the SAS menu to create a WORK.Class1 dataset.

Here is the raw text that belongs in a file called class1.txt in the C:\Temp folder (from an Excel spreadsheet).

id      age     name
       1      19George
       2      20Mary
       3      21Xena
       4      21Juan

proc print data=class1 (firstobs=3 obs=4);
var id age;
run;

proc means data=class1;
run;

data class2;
infile 'c:\Temp\class1.txt' firstobs=2 truncover;
input id 1-8 age 9-16 name $ 17-24;
run;

proc print data=class2;
run;

data 'c:\Temp\plato.sas7bdat';
set class1;
run;

data test;
input x y;
cards;
1 2
3 4
5 80
6          16000
;
run;

data 'c:\Temp\test.sas7bdat';
set test;
run;


Example 5

FIRSTOBS: To specify which observation SAS processes first.

data test;
input x y z;
cards;
1 2   4
3 4   7
5 80  3
6 20  2
9 30  1
;
run;

proc print data=test (firstobs=2);
run;

Run the whole example on your computer and notice that only the last four observations are printed in the output.
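
FIRSTOBS= can be combined with OBS= to print a range. As a sketch on the same data, the step below should print observations 2 through 4, because OBS= names the last observation to process rather than a count of observations.

```sas
data test;
input x y z;
cards;
1 2   4
3 4   7
5 80  3
6 20  2
9 30  1
;
run;

* Start at observation 2 and stop after observation 4;
proc print data=test (firstobs=2 obs=4);
run;
```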

Example 6

MISSOVER: To prevent an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. --from SDLEo

data scores;
  infile datalines missover;
  input score1-score5;
  datalines;
90 98 98
80 100 98 70 78
20 50 90 30 60
;
run;
proc print data=scores;
run;

Run the above program and notice the missing values in the output window.

Example 7

Change the MISSOVER option to FLOWOVER, and then to TRUNCOVER, in the previous example, and see what difference each one makes.
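
As one hedged illustration of the comparison, the sketch below reruns the scores data with FLOWOVER (the default behavior). When the first line runs out of values, the INPUT statement continues onto the next line, so the dataset should contain two observations instead of three, and the log should note that SAS went to a new line. With list input such as this, MISSOVER and TRUNCOVER are expected to behave alike; their difference shows up with column or formatted input on short records read from an external file.

```sas
data scores_flow;
  infile datalines flowover; /* default: keep reading on the next line */
  input score1-score5;
  datalines;
90 98 98
80 100 98 70 78
20 50 90 30 60
;
run;

proc print data=scores_flow;
run;
```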

Session 3

Example 1

data test;
input f1;
cards;
1
2
3000
0.0004
0.0005
6.6
70
365
366
367
;
run;

data test2;
set test;
f2=f1;
f3=f1;
f4=f1;
f5=f1;
format f2 8.2 ;
format f3 date9. ;
format f4 dollar8.2 ;
format f5 8.0 ;
run;

proc print data=test2;
run;

Example 2

Great white sharks (Wikipedia article) are killing machines that have not needed to evolve for millions of years.

An extended example. The data records the average length in feet of selected whales and sharks.

data sealife;
input name $ family $ length ;
datalines;
beluga   whale   15
whale    shark   40
basking  shark   30
gray     whale   50
mako     shark   12
sperm    whale   60
dwarf    shark   .5
whale    shark   40
humpback   .     50
blue     whale   100
killer   whale   30
;
run; 

proc means data=sealife mean;
var length;
run; 

proc means data=sealife min max;
class family;
var length;
run;

proc sort data=sealife out=sortedlife;
by descending length ;
run;

proc print data=sortedlife;
var name family length;
run;

proc sort data=sealife out=sealife2;
by family descending length ;
run;
proc print data=sealife2;
var name family length;
run; 

proc means data=sealife noprint nway;
class family;
var length;
output out=ds35;
run;

proc sort data=sealife out=sl2;
by family;
run;

proc means data=sl2;
var length;
by family;
run;

Example 3

Using the fitness example from the earlier session, we looked at a common use of PROC SORT: sorting a dataset so that a later procedure can process it with a BY statement.

data fit2;
set fitness;
* Here's a simple calculation that will give us two age groups;
agecat=1; /* The 'young' age category */
if (age>50) then agecat=2; /* 'old' */
run;

proc print data=fit2;
var age agecat Oxygen RestPulse;
run;

proc sort data=fit2 out=jerry;
by agecat;
run;

proc corr data=jerry;
var  Oxygen RestPulse;
by agecat;
run;

A separate Correlation report is produced for each Age category.

Example 4

Candy sales data, similar to the data mentioned in The Little SAS Book, Section 4.4 (light blue edition, page 107).

Name  ClassRm Month Day Year Candy Quantity  
Adriana    21    3   2  2000  MP    7
Nathan     14    2   28 2000  CD   19
Matthew    14    3   1  2000  CD   14
Claire     14    3   3  2000  CD   11
Caitlin    21    2   24 2000  CD    9
Ian        21    3   3  2000  MP   18
Chris      14    2   18 2000  CD    6
Anthony    21    6   1  2000  MP   13
Stephen    14    3   25 2000  CD   10
Erika      21    3   25 2000  MP  17

After briefly editing the text above, we wrote this program to illustrate two ways to use PROC FREQ.

data candy;
input Name $ ClassRm Month Day Year Candy $ Quantity   ;
cards;
Adriana    21    3   2  2000  MP    7
Nathan     14    2   28 2000  CD   19
Matthew    14    3   1  2000  CD   14
Claire     14    3   3  2000  CD   11
Caitlin    21    2   24 2000  CD    9
Ian        21    3   3  2000  MP   18
Chris      14    2   18 2000  CD    6
Anthony    21    6   1  2000  MP   13
Stephen    14    3   25 2000  CD   10
Erika      21    3   25 2000  MP  17
;
run;

proc freq data=candy;
tables ClassRm Candy;
run; 

proc freq data=candy;
tables ClassRm*Candy /nopercent norow;
run;
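
A third variation, not part of the original program: a sketch that assumes you want the total Quantity sold for each candy type rather than a count of rows. The WEIGHT statement asks PROC FREQ to treat each observation as Quantity units. The dataset name candy_small is hypothetical and holds only the first four rows of the data so that the block runs on its own.

```sas
data candy_small;
input Name $ ClassRm Month Day Year Candy $ Quantity;
cards;
Adriana    21    3   2  2000  MP    7
Nathan     14    2   28 2000  CD   19
Matthew    14    3   1  2000  CD   14
Claire     14    3   3  2000  CD   11
;
run;

proc freq data=candy_small;
tables Candy;
weight Quantity; /* each row counts Quantity times */
run;
```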


Session 4

Examples (part 1)

Height/Weight data showing very basic ODS statement usage

data htwt;
    input Name $ 1-10 Sex $ 12 Age 14-15 Height 17-18 Weight 20-22;
    datalines;
ALFRED     M 14 69 112
ALICE      F 13 56  84
BARBARA    F 14 62 102
BERNADETTE F 13 65  98
HENRY      M 14 63 102
JAMES      M 12 57  83
JANE       F 12 59  84
JANET      F 15 62 112
JEFFREY    M 13 62  84
JOHN       M 12 59  99
JOYCE      F 11 51  50
JUDY       F 14 64  90
LOUISE     F 12 56  77
MARY       F 15 66 112
PHILLIP    M 16 72 150
ROBERT     M 12 64 128
RONALD     M 15 67 133
THOMAS     M 11 57  85
WILLIAM    M 15 66 112
;
run; 

proc corr data=htwt;
var height weight;
run;

ods listing close;
proc print data=htwt;
var height weight;
run;

Notice the warning in the log that occurs when there is no active output destination (above).

The code below is a simple, very controlled way of directing where the output from a procedure should go and in what format you'd like the output; here the format is HTML. Other popular choices for output formats are:

  • PDF
  • RTF (Rich Text Format)

ods html file='c:\Temp\corr.html';
title 'The relationship between heights and weights';
proc corr data=htwt;
var height weight;
run;
ods html close;
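
The same open/run/close pattern should work for the other destinations. A sketch for PDF, assuming a hypothetical output path c:\Temp\corr.pdf and a small stand-in dataset (htwt_small, the first four height/weight pairs) so that the block runs on its own:

```sas
data htwt_small;
input Height Weight;
datalines;
69 112
56 84
62 102
65 98
;
run;

ods pdf file='c:\Temp\corr.pdf'; /* hypothetical file name */
title 'The relationship between heights and weights';
proc corr data=htwt_small;
var Height Weight;
run;
ods pdf close;
title; /* clear the title so it does not carry over to later output */
```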

ODS and function examples using Tomato data (from the Little SAS Book, 3rd edition, Section 5.3)

ods html file='c:\Temp\print2.html';
ods listing; /* Turns the listing output destination back on */
data tomatoes;
input name $13. color $  Days Weight ;
cards;
Big Zac      red 80 5
Delicious    red 80 3
Dinner Plate red 90 2
Goliath      red 85 1.5
Mega Tom     red 80 2
Big Rainbow  yellow 90 1.5
Pineapple    yellow 85 2
;
run;

proc print data=tomatoes;
run;

ods trace on;
proc corr data=tomatoes;
var Days Weight;
run;
ods trace off;
* Examine the LOG to see what the names are for the various output components;
ods trace on;
proc corr data=tomatoes nosimp; /* This nosimp option reduces the number of tables produced */
var Days Weight;
run;
ods trace off;

proc corr data=tomatoes;
ods select Corr.PearsonCorr;
var Days Weight;
run;
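
Once ODS TRACE has revealed a table name, the table can also be captured as a dataset. This is a sketch, assuming the Pearson correlation table is reported under the name PearsonCorr, as in the ODS SELECT statement above. The names tom_small and corr_out are hypothetical; tom_small holds four Days/Weight pairs from the tomato data so that the block runs on its own.

```sas
data tom_small;
input Days Weight;
datalines;
80 5
80 3
90 2
85 1.5
;
run;

ods output PearsonCorr=corr_out; /* save the table as a dataset */
proc corr data=tom_small;
var Days Weight;
run;

proc print data=corr_out;
run;
```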

Now, on to limiting the observations output by a datastep, functions and creating new variables.

data tom2;
set tomatoes;
if (Days>80);
*if (Days <= 80) then delete;
run;

proc print data=tom2;
run;

data tom3;
set tomatoes;
newsum=Days+Weight;
r=round(Weight);
first3=substr(Name,1,3);
spacepos=index(name,' ');
caps=upcase(Name);
leng=length(Name);
run;

proc print data=tom3;
run;
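
To see what each function returns without scanning the printed dataset, here is a small sketch that applies the same functions to one value, 'Big Zac', and writes the results to the log. The DATA _NULL_ step and the PUT statement are additions for illustration.

```sas
data _null_;
length Name $ 13;
Name='Big Zac';
first3=substr(Name,1,3);   /* 'Big' */
spacepos=index(Name,' ');  /* 4, the position of the first blank */
caps=upcase(Name);         /* 'BIG ZAC' */
leng=length(Name);         /* 7, trailing blanks are not counted */
r=round(1.5);              /* 2 */
put first3= spacepos= caps= leng= r=;
run;
```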

Example 2 (skipped)

The following was not covered in class due to time constraints. The data is derived from the Little SAS Book, 3rd edition, Section 5.11. (I was lazy about typing last names. --Rhansen 13:27, 28 September 2006 (EDT))

Traffic-Lighting Data

1, Jochem Smith, Netherlands,374.66
2, Derek Smith, United States,377.98
3, Jens Smith, Germany,381.73
4, Dmitry Smith, Russia,381.85
5, KC Smith, United States,382.97
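
No program accompanies this data, so here is a minimal sketch of how it might be read. The variable names (Place, Name, Country, Time) are assumptions based on the look of the values, not names given in the text. The DSD option makes the comma the delimiter, and the colon modifier with $20. lets the character variables hold values longer than 8 characters, including embedded blanks.

```sas
data results;
infile datalines dsd;
input Place Name :$20. Country :$20. Time; /* assumed variable names */
datalines;
1, Jochem Smith, Netherlands,374.66
2, Derek Smith, United States,377.98
3, Jens Smith, Germany,381.73
4, Dmitry Smith, Russia,381.85
5, KC Smith, United States,382.97
;
run;

proc print data=results;
run;
```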

Example 3 (actually just a part 2)

Baseball data, illustrating arrays and Do loops

The following is not a faithful reading of the data mentioned in both editions of The Little SAS Book (2nd and 3rd editions) in the chapter about simple regression analyses. To illustrate a practical use of arrays, we've pretended that each T-ball player tries to hit a ball five times, rather than treating all 30 height/distance pairs as separate players.

data baseball;
input Height1 Distance1 
      Height2 Distance2 
      Height3 Distance3
      Height4 Distance4
      Height5 Distance5;
cards;
50 110  49 135  48 129  53 150  48 124
50 143  51 126  45 107  53 146  50 154
47 136  52 144  47 124  50 133  50 128
50 118  48 135  47 129  45 126  48 118
45 121  53 142  46 122  47 119  51 134
49 130  46 132  51 144  50 132  50 131
;
run;

data bb2;
set baseball;
array H {5} Height1 Height2 Height3 Height4 Height5;
do i=1 to 5;
H{i}=H{i}*2.54; /* To convert inches to centimeters */
end;
run; 

proc print data=bb2;
run;
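
A variation on the array step, offered as a sketch rather than the class's method: it uses a numbered range list (Height1-Height5) and the DIM function instead of typing each name and the loop bound, and it stores the centimeter values in new variables so the original inches are kept. The names baseball_small, bb3, and Hcm are hypothetical; the two data rows are the first two rows above.

```sas
data baseball_small;
input Height1 Distance1 Height2 Distance2 Height3 Distance3
      Height4 Distance4 Height5 Distance5;
cards;
50 110  49 135  48 129  53 150  48 124
50 143  51 126  45 107  53 146  50 154
;
run;

data bb3;
set baseball_small;
array H {*} Height1-Height5; /* range list instead of five names */
array Hcm {5};               /* creates new variables Hcm1-Hcm5 */
do i=1 to dim(H);
  Hcm{i}=H{i}*2.54;          /* inches to centimeters, originals untouched */
end;
drop i;
run;

proc print data=bb3;
run;
```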

data simple;
input x;
cards;
1
2 
5
6
;
run;

data newsimp;
set simple;
do i=1 to 3;
output;
end;
run;

proc print data=newsimp;
run;

Example 4

 COMPANY = Mutual Fund Company
 RETURN  = Average Annual Return, Percent
 SDAR    = Standard Deviation of Annual Return, Percent
 Name                                       Return   SDAR
 Group Securities. Common Stock Fund        15.1     19.1
 Incorporated Investors                     14.0     25.5
 Investment Company of America              17.4     21.8
 Investors Mutual                           11.3     12.5
 Loomis-Sales Mutual Fund                   10.0     10.4
 Massachusetts Investors Trust              16.2     20.8
 National Investors Corporation             18.3     19.9
 National Securities-Income Series          12.4     17.8
 New England Fund                           10.4     10.2
 Massachusetts Investors-Growth Stock       18.6     22.7
 Group Securities. Fully Administered Fund  11.4     14.1
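
This data is listed without a program. One possible way to read it, as a sketch: the fund names contain single blanks, so the & modifier is used to let a character value run until two consecutive blanks. The dataset name funds and the LENGTH of 45 are assumptions, and only three of the rows are shown.

```sas
data funds;
length Company $ 45;            /* long enough for the longest name */
input Company $ & Return SDAR;  /* & allows single embedded blanks */
datalines;
Investors Mutual                           11.3     12.5
New England Fund                           10.4     10.2
Massachusetts Investors Trust              16.2     20.8
;
run;

proc print data=funds;
run;
```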


Example 5

Time: Total minutes spent on homework per week.
Days: Days spent on homework per week.
Time Days
100  3
150  4
 99  5
  .  6
160  3
 80  .
200  3
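
A minimal sketch of one way to use this data, assuming the point of the example is the handling of missing values (the dataset name homework is hypothetical). PROC MEANS with the N and NMISS statistics should report six nonmissing values and one missing value for each variable, and the mean is computed from the nonmissing values only.

```sas
data homework;
input Time Days;
datalines;
100  3
150  4
 99  5
  .  6
160  3
 80  .
200  3
;
run;

proc means data=homework n nmiss mean;
var Time Days;
run;
```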


Example 6

 FIELD = Field of Specialization
 WOMEN = Median Salaries of Women, Thousands of $
 MEN   = Median Salaries of Men, Thousands of $
 FIELD                               WOMEN    MEN
 Business, Finance, Etc.               9.3    13.0
 Labor Economics                      10.3    12.0
 Monetary-Fiscal                       8.0    11.6
 General Economic Theory               8.7    10.8
 Population, Welfare Programs, Etc.   12.0    11.5
 Economic Systems and Development      9.0    12.2


Session 5

Example 1

Simple examples to illustrate simple dataset creation.

data test;
x=1;
a=2;
b=3;
c=18;
run;

data test2;
set test;
y=32;
d=a+b;
run;

data test3;
set test2;
output;
output;
output;
run;

Example 2

Examples for generating a sequence of year values

data alumni;
year=1995;
contrib=1000;
output;
year=year+5;
contrib=1500;
output;
year=year+5;
output;
year=year+5;
output;
run;


data alumni;
year=1995;
output;
year=year+5;
contrib=1500;
output;
year=year+5;
output;
year=year+5;
output;
run;

data alumni50;
year=1995;
contrib=1000;
do i=1 to 10;
  contrib=contrib+25;
  year=year+5;
  output;
  end;
run;

proc print data=alumni50;
run;

data pop;
year=1790;
output;
do i=1 to 21;
year=year+10;
output;
end;
run;

proc print data=pop;
run;
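
The same sequence of census years can be written more directly with an iterative DO loop that steps the variable itself. This is a sketch; pop_alt is a hypothetical name chosen so that the pop dataset above is not overwritten. It should produce the same 22 years, 1790 through 2000, without the extra counter variable i.

```sas
data pop_alt;
do year=1790 to 2000 by 10;
  output; /* one observation per census year */
end;
run;

proc print data=pop_alt;
run;
```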

Example 2a

Combining a sequence with real data of US Population estimates starting in 1790 (Data from the SAS online documentation PROC REG (regression) overview.)

data actpop;
input population;
cards;
3.929
5.308
7.239
9.638
12.866
17.069
23.191
31.443
39.818
50.155
62.947
75.994
91.972
105.71
122.775
131.669
151.325
179.323
203.211
226.542
248.71
281.422
;
run;
data fullpop;
merge pop actpop;
run;
proc print data=fullpop;
run;


Example 2b

Using the population data for a quick linear regression. Actually looking at the plot reveals what a poor fit the linear model is.

proc gplot data=fullpop;
plot population*year;
run;

proc reg data=fullpop;
model Population=year;
run;
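
Since the plot suggests curvature, one natural follow-up is to add a squared term. The sketch below is an assumption about how that might be done, not something covered in the session. It rebuilds the data in a single step (popquad and yearsq are hypothetical names), deriving the year from the automatic variable _N_, and then fits a quadratic model.

```sas
data popquad;
input population @@;
year=1790+10*(_n_-1); /* first value is 1790, then every 10 years */
yearsq=year*year;
datalines;
3.929 5.308 7.239 9.638 12.866 17.069 23.191 31.443
39.818 50.155 62.947 75.994 91.972 105.71 122.775 131.669
151.325 179.323 203.211 226.542 248.71 281.422
;
run;

proc reg data=popquad;
model population=year yearsq;
run;
quit;
```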

Example 3

Heights, Weights and Ages for a class.

data class;
input Name $ Height Weight Age @@;
datalines;
  Alfred  69.0 112.5 14  Alice  56.5  84.0 13  Barbara 65.3  98.0 13   
  Carol   62.8 102.5 14  Henry  63.5 102.5 14  James   57.3  83.0 12 
  Jane    59.8  84.5 12  Janet  62.5 112.5 15  Jeffrey 62.5  84.0 13 
  John    59.0  99.5 12  Joyce  51.3  50.5 11  Judy    64.3  90.0 14 
  Louise  56.3  77.0 12  Mary   66.5 112.0 15  Philip  72.0 150.0 16 
  Robert  64.8 128.0 12  Ronald 67.0 133.0 15  Thomas  57.5  85.0 11 
  William 66.5 112.0 15 
;
run;

proc reg data=class ;
model Height=Age;
run;
proc reg data=class outest=est noprint;
model Height=Age;
run;

proc print data=est;
run;

Example 4

Setting up arbitrary groups based on the students' names

data class2;
set class;
group=1;
if substr(Name,1,1)='J' then group=2;
run;

* Sorting is required before BY statement processing;
proc sort data=class2;
by group;
run;

proc reg data=class2;
model Height=Age;
by group;
run;

Example 4a

Illustrating an efficient, concise form for outputting the results of several regression models

proc reg data=class2 outest=est2 noprint; 
model Height=Age;
by group;
run; 

proc print data=est2;
run;

Example 5

Illustrating using PROC REG to generate other output; the residuals from a regression

proc reg data=class2;
model height=age;
output out=DS1 r=george;
run;
 
proc print data=DS1;
run; 

proc sort data=DS1;
by descending george ;
run;

proc print data=DS1;
var name age height george;
run;
 

Left as an exercise: If we want the largest outlier (not the most positive), how should we change the program above? What should we do with the dataset DS1?

SAS_Certification_Examples(part_2)
