SAS Certification Assignments

From BingWiki

Session 1

exercise 1

Fill in the blank:

data test;
_______ x 1-2 y 3-6 z 7-9;
cards;
1 2   5
3 4   6
5 80  8
;
run;

exercise 2

Fill in the blanks:

data cat;
______  'C:\Documents and Settings\Yuan-Ting Wang\UserData\cat.txt';
______  ID $ 1-4 AGE 6-7 SEX $ 8;
run;

exercise 3

Write a program, similar to the one in the previous exercise, that creates a temporary data set called Students (who are all cool cats). There are at least two ways to do this. One way reads the raw data file directly. Alternatively, you may assume that a data set (called cats, work.cats, or 'cats.sas7bdat' in some folder) already exists.

What changes should be made so that the DATA step reads only the first 15 observations?

exercise 4

Look at Example 1 in Session 1:

data test;
input x y;
cards;
1 2
3 4
5 80
;
run;

How many times does SAS execute the INPUT statement when the program is submitted?


exercise 5

In example 6:

data scores;
 infile datalines _______;
 input score1-score5;
 datalines;
90 98 98
80 100 98 70 78
20 50 90 30 60
;
run;
proc print data=scores;
run;

Suppose there are five tests in the semester. The first student missed the last two tests and scored 90, 98, and 98 on the first three. The second student scored 80, 100, 98, 70, and 78, and the third student scored 20, 50, 90, 30, and 60. Which option should you use in the INFILE statement so that the data set is read correctly?
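
As background for this exercise: by default (FLOWOVER), when INPUT runs out of values on a line it continues on the next line, so the short first record would borrow 80 and 100 from the second record and only two observations would be created. The sketch below shows one possible way to avoid that, using the MISSOVER option so that score4 and score5 are set to missing for the first student. It is one answer among several, not necessarily the one intended by the course.

```sas
data scores;
  /* MISSOVER: do not read the next line when a record is short; */
  /* variables without values are set to missing instead.        */
  infile datalines missover;
  input score1-score5;
  datalines;
90 98 98
80 100 98 70 78
20 50 90 30 60
;
run;

proc print data=scores;
run;
```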

exercise 6

In Example 5:

data test;
input x y z;
cards;
1 2   4
3 4   7
5 80  3
6 20  2
9 30  1
;
run;
proc print data=test (firstobs=2);
run;

Which changes should be made if you only want to read the first three observations?

And which changes should be made if you want to read the second and the third observations?
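
A minimal sketch of the two data set options involved, assuming the data set test created by the DATA step above. Note that OBS= gives the number of the last observation to process, not a count, so combining FIRSTOBS=2 with OBS=3 selects observations 2 and 3.

```sas
/* First three observations only */
proc print data=test (obs=3);
run;

/* Second and third observations: start at 2, stop at 3 */
proc print data=test (firstobs=2 obs=3);
run;
```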

Session 2

exercise 1

In example 3.2

data sealife;
input name $ family $ length ;
datalines;
beluga   whale   15
whale    shark   40
basking  shark   30
gray     whale   50
mako     shark   12
sperm    whale   60
dwarf    shark   .5
whale    shark   40
humpback   .     50
blue     whale   100
killer   whale   30
;
run; 


Write a program that creates a new data set called newsealife with a new variable, newlength, which shows the length to two decimal places. Then print the new data set showing only the variable newlength.

Hint: the output should look like this:

     Obs             newlength
     1                 15.00
     2                 40.00
     3                 30.00
     4                 50.00
     5                 12.00
     6                 60.00
     7                  0.50
     8                 40.00
     9                 50.00
    10                100.00
    11                 30.00
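
One possible approach, sketched under the assumption that the data set sealife from example 3.2 already exists in the WORK library: copy length into newlength and attach a w.d format with two decimal places. The width of 8 is an arbitrary choice.

```sas
data newsealife;
  set sealife;
  newlength = length;
  format newlength 8.2;   /* display with two decimal places */
run;

proc print data=newsealife;
  var newlength;          /* print only the new variable */
run;
```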

exercise 2

In example 3.4

Name  ClassRm Month Day Year Candy Quantity  
Adriana    21    3   2  2000  MP    7
Nathan     14    2   28 2000  CD   19
Matthew    14    3   1  2000  CD   14
Claire     14    3   3  2000  CD   11
Caitlin    21    2   24 2000  CD    9
Ian        21    3   3  2000  MP   18
Chris      14    2   18 2000  CD    6
Anthony    21    6   1  2000  MP   13
Stephen    14    3   25 2000  CD   10
Erika      21    3   25 2000  MP   17

Write a program that shows the minimum and the maximum of Quantity.

Hint: The output should look like below:

Analysis Variable : Quantity

   Minimum        Maximum
-------------------------
 6.0000000     19.0000000
-------------------------
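
A minimal sketch of one way to get this output. The data set name candy is an assumption (the original example's DATA step is not shown here), and the header row of the listing is left out of the data lines.

```sas
data candy;   /* hypothetical data set name */
  input Name $ ClassRm Month Day Year Candy $ Quantity;
  datalines;
Adriana    21    3   2  2000  MP    7
Nathan     14    2   28 2000  CD   19
Matthew    14    3   1  2000  CD   14
Claire     14    3   3  2000  CD   11
Caitlin    21    2   24 2000  CD    9
Ian        21    3   3  2000  MP   18
Chris      14    2   18 2000  CD    6
Anthony    21    6   1  2000  MP   13
Stephen    14    3   25 2000  CD   10
Erika      21    3   25 2000  MP   17
;
run;

/* Request only the MIN and MAX statistics for Quantity */
proc means data=candy min max;
  var Quantity;
run;
```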

exercise 3

Continuing from the previous exercise, divide the candy records into two groups: assign group "1" to records with a quantity less than 12 and group "2" to all others. Then print the result.

Hint: the output should look like this:

Obs    group
1         1
2         2
3         2
4         1
5         1
6         2
7         1
8         2
9         1
10         2

exercise 4

Continuing from the previous exercise, sort the data by group and calculate the mean quantity for each group.

Hint: the output should look like this:

--------------------------------------------- group=1
                                       The MEANS Procedure
                                   Analysis Variable : Quantity
                                                   Mean
                                           ------------
                                              8.6000000
                                           ------------
--------------------------------------------- group=2 
                                   Analysis Variable : Quantity
                                                   Mean
                                           ------------
                                             16.2000000
                                           ------------

exercise 5

Instead of creating the separate tables above, what changes would you make to get a single table like the one below?

    Analysis Variable : Quantity
              N
     group    Obs        Mean
-----------------------------------
       1      5       8.6000000
       2      5      16.2000000
-----------------------------------
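
Exercises 3 to 5 can be sketched together as follows. The data set name candy and the decision to create group in the same DATA step that reads the data are assumptions made for illustration. The BY statement produces one table per group and needs sorted data; the CLASS statement produces a single table with an N Obs column and does not need sorting.

```sas
data candy;   /* hypothetical data set name */
  input Name $ ClassRm Month Day Year Candy $ Quantity;
  /* group 1: Quantity below 12; group 2: everything else */
  if Quantity < 12 then group = 1;
  else group = 2;
  datalines;
Adriana    21    3   2  2000  MP    7
Nathan     14    2   28 2000  CD   19
Matthew    14    3   1  2000  CD   14
Claire     14    3   3  2000  CD   11
Caitlin    21    2   24 2000  CD    9
Ian        21    3   3  2000  MP   18
Chris      14    2   18 2000  CD    6
Anthony    21    6   1  2000  MP   13
Stephen    14    3   25 2000  CD   10
Erika      21    3   25 2000  MP   17
;
run;

proc print data=candy;
  var group;
run;

/* Separate tables: sort first, then use BY */
proc sort data=candy;
  by group;
run;

proc means data=candy mean;
  by group;
  var Quantity;
run;

/* Single table: use CLASS instead of BY */
proc means data=candy mean;
  class group;
  var Quantity;
run;
```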


Session 3

Exercise 1

Use the data in example 3.4 to create an HTML page with the title Candy Data that shows the correlation between the quantity of candy and the classroom. Finally, save the HTML page on your computer. Hint: use the html destination and the title option in the ODS statement, together with the PROC statement.
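
A minimal sketch of how these pieces might fit together. It assumes a data set named candy (hypothetical name) with numeric variables Quantity and ClassRm already exists, and it writes to a hypothetical file name, candy.html, in the current directory; substitute your own path. PROC CORR is used here as one procedure that reports correlations.

```sas
/* The (title=) suboption sets the page title shown by the browser; */
/* the TITLE statement sets the heading printed above the output.   */
ods html file='candy.html' (title='Candy Data');   /* hypothetical file name */
title 'Candy Data';

proc corr data=candy;   /* hypothetical data set name */
  var Quantity ClassRm;
run;

ods html close;   /* closing the destination finishes writing the file */
title;            /* clear the title for later steps */
```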

Exercise 2

Use the data in Session 4.4. Apply the ROUND function to SDAR, use the SUBSTR function to extract the first five characters of the name variable, and sort the data by those first five characters.

Exercise 3

Use the data in Example 4.5 to calculate the average number of minutes that students spend on homework per day. Hint: use the MEAN function.

Exercise 4

Use Example 4.6. Define an array T to describe the women's and men's salaries in thousands. Hint: T[i]=T[i]*1000

Session 4

Exercise 1

Use the data in example 2.1. Run a regression with weight and RestPulse as the independent variables and age as the dependent variable. Remove the observations whose residual has an absolute value larger than 8, then re-run the regression. Use the OUTEST option to compare the two sets of results. Hint: use the r option in the OUTPUT statement to calculate the residuals.

Exercise 2

Continuing from the previous exercise, use the plot option in the PROC statement to see the relationship between age and weight.

Exercise 3

Create a data set called month. Use a DO loop so that the output contains the 12 months. Then merge the month data with the data below, which gives the frequency of speeding per month:

datalines;
2  
3 
2  
1  
3  
6  
8  
9 
10 
13  
2
12 
;

Print out the data you just merged, and see what it looks like.
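
A minimal sketch of one way to do this. The names speeding, freq, and merged are hypothetical. Because the two data sets share no common variable, the MERGE here has no BY statement, so observations are paired by position (a one-to-one merge).

```sas
/* One observation per month, 1 through 12 */
data month;
  do month = 1 to 12;
    output;
  end;
run;

data speeding;   /* hypothetical data set and variable names */
  input freq;
  datalines;
2
3
2
1
3
6
8
9
10
13
2
12
;
run;

/* One-to-one merge: first month with first frequency, and so on */
data merged;
  merge month speeding;
run;

proc print data=merged;
run;
```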

Session 5

exercise 1

Use the example below:

data phone;
input City $11. @12 State $ Zip $ Phonenum $;
cards;
cary        NC  27513  6224549
cary        NC  27513  6223251
chapel-hill NC  27514  9974794
raleigh     NC  27612  6970450
raleigh     NC  27612  6791125
cary        NC  27513  6224550
;
run;
proc print data=phone;
run;

Use the SUBSTR function to keep only the first three digits of the phone number. Use the LENGTH function to see how many characters each city name has.
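
A minimal sketch, assuming the data set phone created by the DATA step above. The names phone2, prefix, and citylen are made up for illustration. LENGTH ignores trailing blanks, so it returns the number of characters actually used in each city name.

```sas
data phone2;
  set phone;
  prefix  = substr(Phonenum, 1, 3);   /* first three characters */
  citylen = length(City);             /* length without trailing blanks */
run;

proc print data=phone2;
  var City citylen Phonenum prefix;
run;
```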

exercise 2

Variable:
year
Y   = Index of Real Compensation per Hour, 1982=100
X   = Index of Output per Hour, 1982=100
1982    100.0000    100.0000
1983    100.5000    102.0000
1984    100.4000    104.6000
1985    101.3000    106.1000
1986    104.4000    108.3000
1987    104.3000    109.4000
1988    104.4000    110.4000
1989    103.0000    109.5000
1990    103.2000    109.7000
1991    103.9000    110.1000

Use the data above to run a regression with Y as the dependent variable and X as the independent variable, and examine the relationship between compensation and output per hour.
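
A minimal sketch of the regression using PROC REG. The data set name comp is an assumption; the values are those listed above.

```sas
data comp;   /* hypothetical data set name */
  input year Y X;
  datalines;
1982    100.0000    100.0000
1983    100.5000    102.0000
1984    100.4000    104.6000
1985    101.3000    106.1000
1986    104.4000    108.3000
1987    104.3000    109.4000
1988    104.4000    110.4000
1989    103.0000    109.5000
1990    103.2000    109.7000
1991    103.9000    110.1000
;
run;

proc reg data=comp;
  model Y = X;   /* compensation regressed on output per hour */
run;
quit;
```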

exercise 3

Using the previous data, suppose you want to change the index base from 100 to 1. What change should you make? Will it affect the result of the regression?

Hint: apply an informat in your code or create new variables.


exercise 4

Use the data below:

  Y     X2    X3
 10     1     1
  8     2     3
  6     3     5
  4     4     7
  2     5     9
  0     6    11
  2     7    13

Treat Y as the dependent variable and X3 as the independent variable. Take logs on both sides, and compare the regression with logs to the regression without logs.
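
A minimal sketch of the comparison. The names logdata, logY, and logX3 are hypothetical. One observation has Y = 0, whose log is undefined; this sketch leaves logY missing for that row, so the log regression uses one observation fewer. Two separate PROC REG steps are used so that the missing value does not also drop that row from the regression in levels.

```sas
data logdata;   /* hypothetical data set name */
  input Y X2 X3;
  /* log(0) is undefined, so logY stays missing when Y = 0 */
  if Y > 0 then logY = log(Y);
  logX3 = log(X3);
  datalines;
10 1 1
8 2 3
6 3 5
4 4 7
2 5 9
0 6 11
2 7 13
;
run;

/* Regression without logs */
proc reg data=logdata;
  model Y = X3;
run;
quit;

/* Regression with logs on both sides */
proc reg data=logdata;
  model logY = logX3;
run;
quit;
```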

exercise 5

Go to the U.S. Census Bureau website and find the 2005 third-quarter data in the Federal Assistance Award Data System (FAADS). Download the flat data file and import it into SAS. Print the first 15 observations for the variables COUNTY_NAME, F_FUNDS, and T_FUNDS. Compare the output with the original Excel data file to check that the printout is correct. If you are interested in the data, read the FAADS User Guide to see how to use it.

If you can't find the flat data file, here is the link [1]

Session 6

Use the data set "air" in sashelp. Format the variable date with julday. Also try using an informat to read the variable date as a number of days. Explain why you get negative values for date in this case.

Hint: because an informat is applied by the INPUT statement, you also need an INFILE statement. However, never use INFILE to read sashelp data, because the INFILE statement can only read text files. Instead, save the SAS data set as a text file first, then use that file in the INFILE statement.
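
A minimal sketch of where the negative values come from, using formats only (it does not cover the informat part of the exercise). SAS stores a date as the number of days since 1 January 1960, and most dates in sashelp.air fall before 1960, so their stored values are negative. The names air_days, dayofyear, and rawdays are made up for illustration.

```sas
data air_days;   /* hypothetical data set name */
  set sashelp.air;
  dayofyear = date;   /* same value, displayed as the day of the year */
  rawdays   = date;   /* same value, displayed as a plain number */
  format dayofyear julday3. rawdays best8.;
run;

proc print data=air_days (obs=5);
  var date dayofyear rawdays;
run;
```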

Session 7

exercise 1

Follow the instructions in the /* */ comments and complete the code we wrote in the course. Fill in all the blanks below:

data airt;
set sashelp.air;
day=day(date);
mo=month(date);
yr=year(date);
decade=(int(yr/10)-190)*10;
run;

proc print data=airt;
run;

data airt;
set airt;
drop day date;
run;
/*define a new variable-mair which stands for the maximum value of air line travel in each  decade*/
proc ____ data=airt;
__ decade;
var air;
output out=maxair ___=mair;
run;

proc print data=maxair;
run;
data test;
set maxair;
drop _type_ _freq_;
run;
/*Save the test data file into the permanent sas file*/
data '_____________________';
set ____;
run;

data airmax;
set 'c:\temp\airmax.sas7bdat';
run;
/*Now you want to merge the data airmax and airt by decade. And define a new  variable relative=((mair-air)/mair)*100. Fill out the blank below*/
proc sort data=airmax;
___ decade;
run;
proc print data=airmax;
run;

proc ____ data=airt;
___ decade;
run; 

proc print data=airt;
run;

data final;
____ airt airmax;
___ decade;
relative=((mair-air)/mair)*100;
run;

proc print data=final;
run;

exercise 2

Go to the website of the U.S. Department of Labor: [2]

Find the data file that describes the population-employment ratio for white males through 2002. Suppose you now have data for 2003 and 2004 that looks like this:

Year Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec 
2003 74.4 74.4 74.2 74.2 74.1 74.0 74.0 73.8 73.9 73.7 73.3 73.3   
2004 72.7 73.2 72.8 72.9 73.1 73.1 73.1 73.2 73.0 72.9 72.7 72.4 

Try to combine those two data sets.

exercise 3

If the new data covers 2002-2004 instead of 2003-2004, the year 2002 will appear twice after you combine the old data and the new data. How would you fix the problem? Which SAS options would you use?

Session 8

exercise 1

Use the soup example introduced in the workshop. Count the soups that are made with chicken. Hint: you can use SUBSTR or INDEX to solve the problem.

Session 9

exercise 1

Use the example for the MODEL procedure in SAS Certification Examples (part 2).

1.Plot the original data

2.Apply the economic model: population = a / ( 1 + exp( b - c * (year-1790) ) ) where

a=Maximum Population.

b=Location Parameter.

c=Initial Growth Rate.

Use Model Procedure to estimate a, b and c

Note: a should start from 1000, b should start from 5.5, and c should start from 0.02.

in SAS you can write:

start=(a 1000 b 5.5 c .02)

3.plot the model you estimate.
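
A minimal sketch of step 2, under several assumptions: PROC MODEL is part of SAS/ETS, which must be licensed; the data set name uspop is hypothetical; and the variables are assumed to be called population and year, matching the equation above. The starting values are the ones given in the note.

```sas
proc model data=uspop;   /* hypothetical data set name */
  parms a b c;           /* parameters to estimate */
  /* logistic growth curve from the exercise */
  population = a / (1 + exp(b - c * (year - 1790)));
  fit population start=(a 1000 b 5.5 c .02);
run;
quit;
```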
