SAS Certification Examples
From BingWiki
(Part 1)
Session 1
You can select and copy each one of the following examples and paste them directly into your SAS Editor window. They should each run without errors.
Example 1
data test;
input x y;
cards;
1 2
3 4
5 80
;
run;
- Notice that a dataset is created, but no output is produced. What's missing from the program?
- The dataset has 3 observations and 2 variables.
After the line that says 5 80 add a new line that says
6 .
Run the whole program again. Notice that SAS does not produce an error message.
Example 2
Taken from an example for the REG procedure in the SAS Online Documentation.
data fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 182 40 75.07 45.313 10.07 62 185 185
44 85.84 54.297 8.65 45 156 168 42 68.15 59.571 8.17 40 166 172
38 89.02 49.874 9.22 55 178 180 47 77.45 44.811 11.63 58 176 176
40 75.98 45.681 11.95 70 176 180 43 81.19 49.091 10.85 64 162 170
44 81.42 39.442 13.08 63 174 176 38 81.87 60.055 8.63 48 170 186
44 73.03 50.541 10.13 45 168 168 45 87.66 37.388 14.03 56 186 192
45 66.45 44.754 11.12 51 176 176 47 79.15 47.273 10.60 47 162 164
54 83.12 51.855 10.33 50 166 170 49 81.42 49.156 8.95 44 180 185
51 69.63 40.836 10.95 57 168 172 51 77.91 46.672 10.00 48 162 168
48 91.63 46.774 10.25 48 162 164 49 73.37 50.388 10.08 67 168 168
57 73.37 39.407 12.63 58 174 176 54 79.38 46.080 11.17 62 156 165
52 76.32 45.441 9.63 48 164 166 50 70.87 54.625 8.92 48 146 155
51 67.25 45.118 11.08 48 172 172 54 91.63 39.203 12.88 44 168 172
51 73.71 45.790 10.47 59 186 188 57 59.08 50.545 9.93 49 148 155
49 76.32 48.673 9.40 56 186 188 48 61.24 47.920 11.50 52 170 176
52 82.78 47.467 10.50 53 170 172
;
run;
proc reg data=fitness;
model Oxygen=RunTime;
run;
Session 2
To understand the difference between the SAS statements INPUT and INFILE, see the following examples.
Example 1
The INFILE statement identifies the external file to read.
In SAS, it looks like this:
FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
INFILE CAT;
INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;
Notice that CAT is a fileref: a short name that the FILENAME statement assigns to the external file C:\USERS\CAT.DAT and that the INFILE statement then refers to.
Note: The DATA statement names the SAS data set being created.
Example 2
The INPUT statement describes your data.
In SAS, it looks like this:
FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
INFILE CAT;
INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;
Notice that the dollar sign identifies the variable type as character. ID is read as character because it is only a label; it would be meaningless to treat it as a numeric variable in, say, a regression.
The INPUT statement here assigns the character variable ID to the data in columns 1-4, the numeric variable AGE to the data in columns 6-7, and the character variable SEX to the data in column 8.
Example 3
The OBS= option is a useful addition to the INFILE statement.
Situation: you have 1000000 observations in your data file and want to take a look at it without reading the entire file. Add OBS=n to the INFILE statement to process only records 1 through n.
FILENAME CAT 'C:\USERS\CAT.DAT';
DATA PETS;
INFILE CAT OBS=10;
INPUT ID $ 1-4 AGE 6-7 SEX $ 8;
RUN;
Notice that only the first ten records are read here.
Example 4
The data statements of this example make use of a new feature called direct referencing (version 9.1). By using this new feature we avoid the added step of using a FILENAME statement.
Note: this example won't run without errors since in class we used File Import in the SAS menu to create a WORK.Class1 dataset.
Here is the raw text that belongs in a file called class1.txt in the C:\Temp folder (from an Excel spreadsheet).
id      age     name
1       19      George
2       20      Mary
3       21      Xena
4       21      Juan
proc print data=class1 (firstobs=3 obs=4);
var id age;
run;
proc means data=class1;
run;
data class2;
infile 'c:\Temp\class1.txt' firstobs=2 truncover;
input id 1-8 age 9-16 name $ 17-24;
run;
proc print data=class2;
run;
data 'c:\Temp\plato.sas7bdat';
set class1;
run;
data test;
input x y;
cards;
1 2
3 4
5 80
6 16000
;
run;
data 'c:\Temp\test.sas7bdat';
set test;
run;
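Because the program above expects WORK.class1 to exist already, here is a minimal sketch of one way to create it in code rather than through File Import. It assumes the four rows listed for class1.txt and uses simple list input; the data set produced by the in-class import may have had different variable attributes.

```sas
/* Sketch only: stands in for the File Import step used in class. */
/* The values are copied from the class1.txt listing.              */
data class1;
   input id age name $;
   datalines;
1 19 George
2 20 Mary
3 21 Xena
4 21 Juan
;
run;

proc print data=class1;
run;
```

With this step run first, the PROC PRINT, PROC MEANS and SET class1 steps in the example have a data set to work on.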
Example 5
FIRSTOBS: To specify which observation SAS processes first.
data test;
input x y z;
cards;
1 2 4
3 4 7
5 80 3
6 20 2
9 30 1
;
run;
proc print data=test (firstobs=2);
run;
Run the whole example on your computer and notice that only the last four observations are printed in the output.
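FIRSTOBS= is often paired with OBS=. The sketch below reuses the same five rows to show the combination; note that OBS= gives the number of the last observation to process, not a count of observations.

```sas
data test;
   input x y z;
   cards;
1 2 4
3 4 7
5 80 3
6 20 2
9 30 1
;
run;

/* Starts at observation 2 and stops after observation 4 */
proc print data=test (firstobs=2 obs=4);
run;
```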
Example 6
MISSOVER: To prevent an INPUT statement from reading a new input data record if it does not find values in the current input line for all the variables in the statement. --from SDLEo
data scores;
infile datalines missover;
input score1-score5;
/* The line breaks in the data below are assumed; the original layout was lost */
datalines;
90 98 98 80 100
98 70 78
20 50 90 30 60
;
run;
proc print data=scores;
run;
Run the above program and notice the missing values in the output window.
Example 7
Change the MISSOVER option to FLOWOVER, and then to TRUNCOVER, in the previous example, and see what changes.
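As a starting point for that comparison, here is a small sketch that uses made-up data (the values and the data set names flow_demo and miss_demo are hypothetical, not from the class). The second record is deliberately one value short.

```sas
/* FLOWOVER is the default: when a record runs out of values, */
/* INPUT continues reading on the next record.                */
data flow_demo;
   infile datalines flowover;
   input a b c;
   datalines;
1 2 3
4 5
6 7 8
9 10 11
;
run;

/* MISSOVER: variables with no value on the current record are set to missing. */
data miss_demo;
   infile datalines missover;
   input a b c;
   datalines;
1 2 3
4 5
6 7 8
9 10 11
;
run;

proc print data=flow_demo;
run;
proc print data=miss_demo;
run;
```

Compare the number of observations in the two listings and check the log of the first step for a note about going to a new line. With list input such as this, TRUNCOVER behaves like MISSOVER; the two differ when column or formatted input meets a record that is shorter than expected.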
Session 3
Example 1
data test;
input f1;
cards;
1
2
3000
0.0004
0.0005
6.6
70
365
366
367
;
run;
data test2;
set test;
f2=f1;
f3=f1;
f4=f1;
f5=f1;
format f2 8.2 ;
format f3 date9. ;
format f4 dollar8.2 ;
format f5 8.0 ;
run;
proc print data=test2;
run;
Example 2
Great white sharks (Wikipedia article) are killing machines that have not needed to evolve for millions of years.
An extended example. The data records the average length in feet of selected whales and sharks.
data sealife;
input name $ family $ length ;
datalines;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
run;
proc means data=sealife mean;
var length;
run;
proc means data=sealife min max;
class family;
var length;
run;
proc sort data=sealife out=sortedlife;
by descending length ;
run;
proc print data=sortedlife;
var name family length;
run;
proc sort data=sealife out=sealife2;
by family descending length ;
run;
proc print data=sealife2;
var name family length;
run;
proc means data=sealife noprint nway;
class family;
var length;
output out=ds35;
run;
proc sort data=sealife out=sl2;
by family;
run;
proc means data=sl2;
var length;
by family;
run;
Example 3
Using the fitness example from the earlier session, we looked at a common usage for PROC SORT.
data fit2;
set fitness;
/* Here's a simple calculation that will give us two age groups */
agecat=1; /* The 'young' age category */
if (age>50) then agecat=2; /* 'old' */
run;
proc print data=fit2;
var age agecat Oxygen RestPulse;
run;
proc sort data=fit2 out=jerry;
by agecat;
run;
proc corr data=jerry;
var Oxygen RestPulse;
by agecat;
run;
A separate Correlation report is produced for each Age category.
Example 4
Candy sales data, similar to the data mentioned in The Little SAS Book, Section 4.4 (light blue edition, page 107).
Name ClassRm Month Day Year Candy Quantity
Adriana 21 3 2 2000 MP 7
Nathan 14 2 28 2000 CD 19
Matthew 14 3 1 2000 CD 14
Claire 14 3 3 2000 CD 11
Caitlin 21 2 24 2000 CD 9
Ian 21 3 3 2000 MP 18
Chris 14 2 18 2000 CD 6
Anthony 21 6 1 2000 MP 13
Stephen 14 3 25 2000 CD 10
Erika 21 3 25 2000 MP 17
After briefly editing the text above, we wrote the following program to illustrate two ways to use PROC FREQ.
data candy;
input Name $ ClassRm Month Day Year Candy $ Quantity ;
cards;
Adriana 21 3 2 2000 MP 7
Nathan 14 2 28 2000 CD 19
Matthew 14 3 1 2000 CD 14
Claire 14 3 3 2000 CD 11
Caitlin 21 2 24 2000 CD 9
Ian 21 3 3 2000 MP 18
Chris 14 2 18 2000 CD 6
Anthony 21 6 1 2000 MP 13
Stephen 14 3 25 2000 CD 10
Erika 21 3 25 2000 MP 17
;
run;
proc freq data=candy;
tables ClassRm Candy;
run;
proc freq data=candy;
tables ClassRm*Candy /nopercent norow;
run;
Session 4
Examples (part 1)
Height/Weight data showing very basic ODS statement usage
data htwt;
input Name $ 1-10 Sex $ 12 Age 14-15 Height 17-18 Weight 20-22;
datalines;
ALFRED     M 14 69 112
ALICE      F 13 56  84
BARBARA    F 14 62 102
BERNADETTE F 13 65  98
HENRY      M 14 63 102
JAMES      M 12 57  83
JANE       F 12 59  84
JANET      F 15 62 112
JEFFREY    M 13 62  84
JOHN       M 12 59  99
JOYCE      F 11 51  50
JUDY       F 14 64  90
LOUISE     F 12 56  77
MARY       F 15 66 112
PHILLIP    M 16 72 150
ROBERT     M 12 64 128
RONALD     M 15 67 133
THOMAS     M 11 57  85
WILLIAM    M 15 66 112
;
run;
proc corr data=htwt;
var height weight;
run;
ods listing close;
proc print data=htwt;
var height weight;
run;
Notice the message SAS writes to the log when there is no active output destination (above).
The following is a simple, very controlled way of directing where the output from a procedure should go ... and in what format you'd like to make the output. Other popular choices for output formats are:
- RTF (Rich Text Format)
ods html file='c:\Temp\corr.html';
title 'The relationship between heights and weights';
proc corr data=htwt;
var height weight;
run;
ods html close;
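The same pattern works for the RTF destination mentioned above. The following is a minimal sketch: the file name corr.rtf and the small stand-in data set htwt_small are assumptions made so the block runs on its own.

```sas
/* Stand-in data set (a few rows of the height/weight data) */
data htwt_small;
   input Name $ Sex $ Age Height Weight;
   datalines;
ALFRED M 14 69 112
ALICE F 13 56 84
BARBARA F 14 62 102
HENRY M 14 63 102
;
run;

/* Open the RTF destination, run the procedure, then close the destination. */
/* The file name is hypothetical.                                           */
ods rtf file='c:\Temp\corr.rtf';
title 'The relationship between heights and weights';
proc corr data=htwt_small;
   var height weight;
run;
ods rtf close;
```

The output file is only complete once the destination has been closed.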
ODS and function examples using Tomato data (from the Little SAS Book, 3rd edition, Section 5.3)
ods html file='c:\Temp\print2.html';
ods listing; /* Turns the listing output destination back on */
data tomatoes;
input name $13. color $ Days Weight ;
cards;
Big Zac       red 80 5
Delicious     red 80 3
Dinner Plate  red 90 2
Goliath       red 85 1.5
Mega Tom      red 80 2
Big Rainbow   yellow 90 1.5
Pineapple     yellow 85 2
;
run;
proc print data=tomatoes;
run;
ods trace on;
proc corr data=tomatoes;
var Days Weight;
run;
ods trace off;
* Examine the LOG to see what the names are for the various output components;
ods trace on;
proc corr data=tomatoes nosimp; /* This nosimp option reduces the number of tables produced */
var Days Weight;
run;
ods trace off;
proc corr data=tomatoes;
ods select Corr.PearsonCorr;
var Days Weight;
run;
Now, on to limiting the observations output by a datastep, functions and creating new variables.
data tom2;
set tomatoes;
if (Days>80);
*if (Days <= 80) then delete;
run;
proc print data=tom2;
run;
data tom3;
set tomatoes;
newsum=Days+Weight;
r=round(Weight);
first3=substr(Name,1,3);
spacepos=index(name,' ');
caps=upcase(Name);
leng=length(Name);
run;
proc print data=tom3;
run;
Example 2 (skipped)
The following was not covered in class due to time constraints. The data is derived from the Little SAS Book, 3rd edition, Section 5.11. (I was lazy about typing last names. --Rhansen 13:27, 28 September 2006 (EDT))
Traffic-Lighting Data
1, Jochem Smith, Netherlands,374.66
2, Derek Smith, United States,377.98
3, Jens Smith, Germany,381.73
4, Dmitry Smith, Russia,381.85
5, KC Smith, United States,382.97
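Since no program was shown for this data, here is a rough sketch of how traffic-lighting might be set up: read the comma-separated records, define a format that maps value ranges to colors, and use that format as the cell background in PROC PRINT. The variable names, the format name timefmt, the cutoff values, and the output file name are all assumptions made for illustration; they are not taken from the class or the book.

```sas
/* Read the comma-separated records; DSD treats commas as the delimiter */
data results;
   infile datalines dsd;
   length Name $ 20 Country $ 15;
   input Place Name $ Country $ Time;
   datalines;
1, Jochem Smith, Netherlands,374.66
2, Derek Smith, United States,377.98
3, Jens Smith, Germany,381.73
4, Dmitry Smith, Russia,381.85
5, KC Smith, United States,382.97
;
run;

/* Hypothetical cutoffs, chosen only to produce three color bands */
proc format;
   value timefmt low -< 378 = 'red'
                 378 -< 382 = 'orange'
                 382 - high = 'white';
run;

/* Hypothetical output file name */
ods html file='c:\Temp\results.html';
proc print data=results;
   id Place;
   var Name Country;
   var Time / style(data)={background=timefmt.};
run;
ods html close;
```

The coloring shows up only in destinations that support styles, such as HTML; the listing output is unaffected.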
Example 3 (actually just a part 2)
Baseball data, illustrating arrays and Do loops
The following does not read the data the way it is presented in The Little SAS Book (2nd and 3rd editions), in the chapter on simple regression analyses. To illustrate a practical use of arrays, we've pretended that each T-ball player tries to hit a ball five times, rather than treating all 30 height/distance pairs as separate players.
data baseball;
input Height1 Distance1
Height2 Distance2
Height3 Distance3
Height4 Distance4
Height5 Distance5;
cards;
50 110 49 135 48 129 53 150 48 124
50 143 51 126 45 107 53 146 50 154
47 136 52 144 47 124 50 133 50 128
50 118 48 135 47 129 45 126 48 118
45 121 53 142 46 122 47 119 51 134
49 130 46 132 51 144 50 132 50 131
;
run;
data bb2;
set baseball;
array H {5} Height1 Height2 Height3 Height4 Height5;
do i=1 to 5;
H{i}=H{i}*2.54; /* To convert inches to centimeters */
end;
run;
proc print data=bb2;
run;
data simple;
input x;
cards;
1
2
5
6
;
run;
data newsimp;
set simple;
do i=1 to 3;
output;
end;
run;
proc print data=newsimp;
run;
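The two ideas above (an array over numbered variables, and OUTPUT inside a DO loop) can be combined to turn each five-attempt record into five separate observations. This is a sketch only; the data set name bb_long and the variables player and attempt are made up for the illustration, and only the first two baseball records are repeated here so that the block runs on its own.

```sas
data baseball;
   input Height1 Distance1 Height2 Distance2 Height3 Distance3
         Height4 Distance4 Height5 Distance5;
   cards;
50 110 49 135 48 129 53 150 48 124
50 143 51 126 45 107 53 146 50 154
;
run;

data bb_long;
   set baseball;
   array H {5} Height1-Height5;
   array D {5} Distance1-Distance5;
   player = _n_;              /* record number used as a player id */
   do attempt = 1 to 5;
      Height = H{attempt};
      Distance = D{attempt};
      output;                 /* one observation per attempt */
   end;
   keep player attempt Height Distance;
run;

proc print data=bb_long;
run;
```

Each input record yields five observations, so the two records above produce ten rows in bb_long.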
Example 4
COMPANY = Mutual Fund Company
RETURN = Average Annual Return, Percent
SDAR = Standard Deviation of Annual Return, Percent
Name Return SDAR
Group Securities. Common Stock Fund 15.1 19.1
Incorporated Investors 14.0 25.5
Investment Company of America 17.4 21.8
Investors Mutual 11.3 12.5
Loomis-Sales Mutual Fund 10.0 10.4
Massachusetts Investors Trust 16.2 20.8
National Investors Corporation 18.3 19.9
National Securities-Income Series 12.4 17.8
New England Fund 10.4 10.2
Massachusetts Investors-Growth Stock 18.6 22.7
Group Securities. Fully Administered Fund 11.4 14.1
Example 5
Time: Total minutes spent on homework per week.
Days: Days spent on homework per week.
Time Days
100 3
150 4
99 5
. 6
160 3
80 .
200 3
Example 6
FIELD = Field of Specialization
WOMEN = Median Salaries of Women, Thousands of $
MEN = Median Salaries of Men, Thousands of $
FIELD WOMEN MEN
Business, Finance, Etc. 9.3 13.0
Labor Economics 10.3 12.0
Monetary-Fiscal 8.0 11.6
General Economic Theory 8.7 10.8
Population, Welfare Programs, Etc. 12.0 11.5
Economic Systems and Development 9.0 12.2
Session 5
Example 1
Simple examples to illustrate simple dataset creation.
data test;
x=1;
a=2;
b=3;
c=18;
run;
data test2;
set test;
y=32;
d=a+b;
run;
data test3;
set test2;
output;
output;
output;
run;
Example 2
Examples for generating a sequence of year values
data alumni;
year=1995;
contrib=1000;
output;
year=year+5;
contrib=1500;
output;
year=year+5;
output;
year=year+5;
output;
run;
data alumni;
year=1995;
output;
year=year+5;
contrib=1500;
output;
year=year+5;
output;
year=year+5;
output;
run;
data alumni50;
year=1995;
contrib=1000;
do i=1 to 10;
contrib=contrib+25;
year=year+5;
output;
end;
run;
proc print data=alumni50;
run;
data pop;
year=1790;
output;
do i=1 to 21;
year=year+10;
output;
end;
run;
proc print data=pop;
run;
Example 2a
Combining a sequence with real data of US Population estimates starting in 1790 (Data from the SAS online documentation PROC REG (regression) overview.)
data actpop;
input population;
cards;
3.929
5.308
7.239
9.638
12.866
17.069
23.191
31.443
39.818
50.155
62.947
75.994
91.972
105.71
122.775
131.669
151.325
179.323
203.211
226.542
248.71
281.422
;
run;
data fullpop;
merge pop actpop;
run;
proc print data=fullpop;
run;
Example 2b
Using the population data for a quick linear regression. Actually looking at the plot reveals what a poor fit the linear model is.
proc gplot data=fullpop;
plot population*year;
run;
proc reg data=fullpop;
model population=year;
run;
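Because the plot shows clear curvature, one common next step is to add a squared year term to the model. The sketch below is an assumption about how that could be done with the same numbers, not something covered in class; the data set name uspop and the variable yearsq are made up. It also computes the year directly from the observation number, which avoids the separate merge step.

```sas
data uspop;
   input population @@;
   year = 1790 + 10*(_n_ - 1);   /* census years 1790, 1800, ..., 2000 */
   yearsq = year*year;           /* squared term for the quadratic model */
   datalines;
3.929 5.308 7.239 9.638 12.866 17.069 23.191 31.443 39.818 50.155 62.947
75.994 91.972 105.71 122.775 131.669 151.325 179.323 203.211 226.542
248.71 281.422
;
run;

proc reg data=uspop;
   model population = year yearsq;
run;
quit;
```

Comparing the fit statistics of this model with those of the straight-line model gives a numeric counterpart to what the plot suggests.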
Example 3
Heights, Weights and Ages for a class.
data class;
input Name $ Height Weight Age @@;
datalines;
Alfred 69.0 112.5 14 Alice 56.5 84.0 13 Barbara 65.3 98.0 13
Carol 62.8 102.5 14 Henry 63.5 102.5 14 James 57.3 83.0 12
Jane 59.8 84.5 12 Janet 62.5 112.5 15 Jeffrey 62.5 84.0 13
John 59.0 99.5 12 Joyce 51.3 50.5 11 Judy 64.3 90.0 14
Louise 56.3 77.0 12 Mary 66.5 112.0 15 Philip 72.0 150.0 16
Robert 64.8 128.0 12 Ronald 67.0 133.0 15 Thomas 57.5 85.0 11
William 66.5 112.0 15
;
run;
proc reg data=class ;
model Height=Age;
run;
proc reg data=class outest=est noprint;
model Height=Age;
run;
proc print data=est;
run;
Example 4
Setting up arbitrary groups based on the students' names
data class2;
set class;
group=1;
if substr(Name,1,1)='J' then group=2;
run;
* Sorting is required before BY statement processing;
proc sort data=class2;
by group;
run;
proc reg data=class2;
model Height=Age;
by group;
run;
Example 4a
Illustrating an efficient, concise form for outputting the results of several regression models
proc reg data=class2 outest=est2 noprint;
model Height=Age;
by group;
run;
proc print data=est2;
run;
Example 5
Illustrating the use of PROC REG to generate other output: the residuals from a regression
proc reg data=class2;
model height=age;
output out=DS1 r=george;
run;
proc print data=DS1;
run;
proc sort data=DS1;
by descending george ;
run;
proc print data=DS1;
var name age height george;
run;
Left as an exercise: If we want the largest outlier (not the most positive), how should we change the program above? What should we do with the dataset DS1?

