SAS Certification Assignments

exercise 1
Fill in the blank:

data test;
  _______ x 1-2 y 3-6 z 7-9;
  cards;
1 2  5
3 4   6
5 80  8
;
run;

exercise 2
Fill in the blanks:

data cat;
  ______ 'C:\Documents and Settings\Yuan-Ting Wang\UserData\cat.txt';
  ______ ID $ 1-4 AGE 6-7 SEX $ 8;
run;

exercise 3
Write a program, similar to the one in the previous exercise, that creates a temporary data set called Students (who are all cool cats). There are at least two ways to do this. One way reads the raw data file directly. Alternatively, you may assume that a data set called cats (work.cats, or 'cats.sas7bdat' in some folder) already exists.

What changes should be made so that the DATA step reads only the first 15 observations?

exercise 4
Look at Example 1 in Session 1:

data test;
  input x y;
  cards;
1 2
3 4
5 80
;
run;

How many times does SAS execute the INPUT statement when the program is submitted?

exercise 5
In example 6:

data scores;
  infile datalines _______;
  input score1-score5;
  datalines;
90 98 98
80 100 98 70 78
20 50 90 30 60
;
run;
proc print data=scores; run;

Suppose there are five tests in the semester. The first student missed the last two tests and scored 90, 98, 98 on the first three. The second student scored 80, 100, 98, 70, 78, and the third scored 20, 50, 90, 30, 60. Which option should you use so that the data set is read correctly?

exercise 6
In Example 5:

data test;
  input x y z;
  cards;
1 2  4
3 4   7
5 80  3
6 20  2
9 30  1
;
run;

proc print data=test (firstobs=2); run;

Which changes should be made if you only want to read the first three observations?

And which changes should be made if you want to read the second and the third observations?

exercise 1
In example 3.2

data sealife;
  input name $ family $ length;
  datalines;
beluga   whale   15
whale    shark   40
basking  shark   30
gray     whale   50
mako     shark   12
sperm    whale   60
dwarf    shark   .5
whale    shark   40
humpback .       50
blue     whale   100
killer   whale   30
;
run;

Write a program that creates a new data set called newsealife with a new variable, newlength, that shows length to two decimal places. Then print the new data set with only the variable newlength.

Hint: the output should look like this:

Obs    newlength
  1        15.00
  2        40.00
  3        30.00
  4        50.00
  5        12.00
  6        60.00
  7         0.50
  8        40.00
  9        50.00
 10       100.00
 11        30.00

exercise 2
In example 3.4

Name     ClassRm  Month  Day  Year  Candy  Quantity
Adriana       21      3    2  2000  MP            7
Nathan        14      2   28  2000  CD           19
Matthew       14      3    1  2000  CD           14
Claire        14      3    3  2000  CD           11
Caitlin       21      2   24  2000  CD            9
Ian           21      3    3  2000  MP           18
Chris         14      2   18  2000  CD            6
Anthony       21      6    1  2000  MP           13
Stephen       14      3   25  2000  CD           10
Erika         21      3   25  2000  MP           17

Write a program that shows the minimum and the maximum of Quantity.

Hint: The output should look like below:

Analysis Variable : Quantity

   Minimum       Maximum
----------------------------
 6.0000000    19.0000000
----------------------------

exercise 3
Continue from the previous exercise. Divide the observations into two groups: assign group "1" when Quantity is less than 12 and group "2" otherwise. Then print the result.

Hint: the output should look like this:

Obs    group
  1      1
  2      2
  3      2
  4      1
  5      1
  6      2
  7      1
  8      2
  9      1
 10      2

exercise 4
Continue from the previous exercise: sort the data by group and calculate the mean Quantity of each group.

Hint: the output should look like this:

group=1

The MEANS Procedure

Analysis Variable : Quantity

      Mean
 8.6000000

group=2

Analysis Variable : Quantity

      Mean
16.2000000

exercise 5
Instead of creating the separate tables above, what changes would you make to get a single table like the one below?

Analysis Variable : Quantity

           N
group    Obs          Mean
-----------------------------------
1          5     8.6000000
2          5    16.2000000
-----------------------------------

Exercise 1
Use the data in Example 3.4 to create an HTML page that has the title Candy Data and shows the correlation between the quantity of candy and the classroom. Finally, save the HTML page on your computer. Hint: use the html and title options in the ODS statement, together with a PROC statement.
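The ODS pattern named in the hint can be sketched on a different data set. This is only an illustration under stated assumptions: the output path is hypothetical, sashelp.class stands in for the candy data, and a TITLE statement is used here in place of the ODS title option.

```sas
/* Hypothetical output path: change it to a folder that exists on your machine. */
ods html file='c:\temp\class_report.html';

/* TITLE statement: the text appears above the procedure output. */
title 'Class Data';

/* sashelp.class ships with SAS; PROC CORR reports the correlation
   between the variables listed in the VAR statement. */
proc corr data=sashelp.class;
  var height weight;
run;

ods html close;   /* closing the destination finishes writing the file */
title;            /* clear the title for later steps */
```

The file is complete only after the ODS HTML CLOSE statement runs, so open it in a browser after the whole program has been submitted.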

Exercise 2
Use the data in Session 4.4. Apply the ROUND function to SDAR, use the SUBSTR function to take the first five characters of the name variable, and sort the data by those five characters.

Exercise 3
Use the data in Example 4.5 to calculate the average number of minutes that students spend on homework per day. Hint: use the MEAN function.
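As a reminder of how the MEAN function works across variables within one observation, here is a minimal sketch. The data set, the variable names d1-d3, and the values are all hypothetical and are not the Example 4.5 data.

```sas
data avgdemo;
  /* Hypothetical variables: minutes recorded on three days */
  input name $ d1 d2 d3;
  /* MEAN averages the nonmissing arguments, so a missing day is skipped */
  avgmin = mean(of d1-d3);
  datalines;
Ann 30 45 .
Bob 20 25 60
;
run;

proc print data=avgdemo; run;
```

Because MEAN ignores missing values, the first row is averaged over two days rather than three; whether that is the right treatment depends on the question being asked.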

Exercise 4
Use Example 4.6. Define an array T to express the women's and men's salaries in thousands. Hint: T[i]=T[i]*1000
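The array idiom from the hint can be sketched as follows. The variable names women and men and the values are hypothetical, not taken from Example 4.6.

```sas
data paydemo;
  /* Hypothetical salary variables */
  input women men;
  /* T groups the two variables so that one loop can process both */
  array T[2] women men;
  do i = 1 to dim(T);
    T[i] = T[i] * 1000;   /* the rescaling shown in the hint */
  end;
  drop i;                 /* the loop index is not needed in the output */
  datalines;
35 40
42.5 41
;
run;

proc print data=paydemo; run;
```

An array in a DATA step is only a temporary grouping of existing variables, so changing T[i] changes women and men themselves.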

Exercise 1
Use the data in Example 2.1. Run a regression with weight and RestPulse as the independent variables and age as the dependent variable. Remove the observations whose residual has an absolute value larger than 8, and run the regression again. Use the OUTEST= option to compare the two sets of results. Hint: use the R= option in the OUTPUT statement to compute the residuals.
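The workflow in this exercise (fit, save residuals, filter, refit, compare) can be sketched on sashelp.class instead of the Example 2.1 data. The data set names and the cutoff of 10 are assumptions chosen for illustration only.

```sas
/* First fit: OUTEST= saves the parameter estimates,
   OUTPUT ... R= saves the residuals in a new variable. */
proc reg data=sashelp.class outest=est_all;
  model weight = height age;
  output out=fitted r=resid;
run;
quit;

/* Keep only observations whose residual is within the (hypothetical) cutoff. */
data kept;
  set fitted;
  if abs(resid) <= 10;    /* subsetting IF */
run;

/* Second fit on the reduced data */
proc reg data=kept outest=est_kept;
  model weight = height age;
run;
quit;

/* Compare the two sets of estimates */
proc print data=est_all; run;
proc print data=est_kept; run;
```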

Exercise 2
Continue from the previous exercise. Use the plot option in the PROC statement to examine the relationship between age and weight.

Exercise 3
Create a data set called month. Use a DO loop so that it contains the 12 months. Merge the month data with the data below, the frequency of speeding per month:

datalines;
2  3  2   1   3   6   8   9  10  13   2 12
;

Print out the data you just merged, and see what it looks like.
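The two techniques involved, a DO loop with OUTPUT and a merge without a BY statement, can be sketched on a smaller hypothetical case with four rows. The data set and variable names here are made up.

```sas
/* A DO loop with an explicit OUTPUT writes one observation per iteration. */
data idx;
  do n = 1 to 4;
    output;
  end;
run;

/* The trailing @@ holds the input line so several values are read from it. */
data vals;
  input v @@;
  datalines;
10 20 30 40
;
run;

/* With no BY statement, MERGE pairs observations by position. */
data both;
  merge idx vals;
run;

proc print data=both; run;
```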

exercise 1
Use the example below:

data phone;
  input City $11. @12 State $ Zip $ Phonenum $;
  cards;
cary       NC  27513  6224549
cary       NC  27513  6223251
chapel-hill NC 27514  9974794
raleigh    NC  27612  6970450
raleigh    NC  27612  6791125
cary       NC  27513  6224550
;
run;
proc print data=phone; run;

Use the SUBSTR function to keep only the first three digits of the phone number. Use the LENGTH function to find how many characters each city name has.
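For reference, a minimal sketch of the two functions on a hypothetical variable called word (not the phone data above):

```sas
data strdemo;
  input word $12.;
  /* SUBSTR(string, start, length): here the first three characters */
  first3 = substr(word, 1, 3);
  /* LENGTH returns the position of the last nonblank character */
  nchar = length(word);
  datalines;
raleigh
chapel-hill
;
run;

proc print data=strdemo; run;
```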

exercise 2
Variable: year
Y = Index of Real Compensation per Hour, 1982=100
X = Index of Output per Hour, 1982=100

1982    100.0000    100.0000
1983    100.5000    102.0000
1984    100.4000    104.6000
1985    101.3000    106.1000
1986    104.4000    108.3000
1987    104.3000    109.4000
1988    104.4000    110.4000
1989    103.0000    109.5000
1990    103.2000    109.7000
1991    103.9000    110.1000

Use the data above to run a regression with Y as the dependent variable and X as the independent variable. Examine the relationship between compensation and output per hour.

exercise 3
Using the previous data, suppose you want to change the index base from 100 to 1. What change should you make? Will it affect the result of the regression?

Hint: apply an informat in your code or create new variables.

exercise 4
Use the data below:

 Y  X2  X3
10   1   1
 8   2   3
 6   3   5
 4   4   7
 2   5   9
 0   6  11
 2   7  13

Treat Y as the dependent variable and X3 as the independent variable. Take logs of both sides, and compare the regression with logs to the regression without logs.

exercise 5
Go to the U.S. Census Bureau website and find the 2005 third-quarter data in the Federal Assistance Award Data System. Download the flat data file and import it into SAS. Print the first 15 observations of the variables COUNTY_NAME, F_FUNDS, and T_FUNDS. Compare the output with the original Excel data file to check that the printout is correct. If you are interested in the data, read the FAADS User Guide to see how to use it.

If you can't find the flat data file, here is the link

Session 6
Use the data set "air" in sashelp. Format the variable date with the julday format. Also try to apply an informat to the variable date so that it is read as a number of days. Then explain why you get negative values for date in this case.

Hint: an informat is applied in an INPUT statement, which in turn needs an INFILE statement. However, INFILE can only read text files, so it cannot read the sashelp data directly. One approach is to save the SAS data set as a text file and then read that file with INFILE.
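The round trip described in the hint can be sketched as below. The file path and the fileref airtxt are hypothetical, and the sketch assumes sashelp.air contains the variables DATE and AIR. It shows only the mechanics of writing and re-reading the file, not the answer to the question.

```sas
/* Hypothetical path: point it at a folder you can write to. */
filename airtxt 'c:\temp\air.txt';

/* Write sashelp.air to a text file, one observation per line. */
data _null_;
  set sashelp.air;
  file airtxt;
  put date date9. +1 air;   /* for example: 01JAN1949 112 */
run;

/* Read the text file back; the DATE9. informat converts the text
   into a SAS date value. */
data air2;
  infile airtxt;
  input date date9. air;
run;

proc print data=air2 (obs=5); run;
```

PROC PRINT shows date without a format here, so you see the stored numeric value; adding a FORMAT statement changes only how it is displayed.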

exercise 1
Follow the questions in the /* */ comments and complete the code from the course. Fill in all the blanks below:

data airt;
  set sashelp.air;
  day=day(date);
  mo=month(date);
  yr=year(date);
  decade=(int(yr/10)-190)*10;
run;
proc print data=airt; run;

data airt;
  set airt;
  drop day date;
run;

/*define a new variable-mair which stands for the maximum value of air line travel in each decade*/
proc ____ data=airt;
  __ decade;
  var air;
  output out=maxair ___=mair;
run;
proc print data=maxair; run;

data test;
  set maxair;
  drop _type_ _freq_;
run;

/*Save the test data file into the permanent sas file*/
data '_____________________';
  set ____;
run;

data airmax;
  set 'c:\temp\airmax.sas7bdat';
run;

/*Now you want to merge the data airmax and airt by decade. And define a new variable relative=((mair-air)/mair)*100. Fill out the blank below*/
proc sort data=airmax;
  ___ decade;
run;
proc print data=airmax; run;

proc ____ data=airt;
  ___ decade;
run;
proc print data=airt; run;

data final;
  ____ airt airmax;
  ___ decade;
  relative=((mair-air)/mair)*100;
run;
proc print data=final; run;

exercise 2
Go to the website of the U.S. Department of Labor:

Find the data file that describes the population-employment ratio for white males through 2002. Suppose you now have data for 2003 and 2004 that look like this:

Year  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2003  74.4  74.4  74.2  74.2  74.1  74.0  74.0  73.8  73.9  73.7  73.3  73.3
2004  72.7  73.2  72.8  72.9  73.1  73.1  73.1  73.2  73.0  72.9  72.7  72.4

Try to combine those two data sets.

exercise 3
If the new data cover 2002-2004 instead of 2003-2004, the year 2002 will appear twice after you combine the old data and the new data. How would you fix the problem? Which SAS options would you use?

exercise 1
Use the soup example introduced in the workshop. Count the soups that are made with chicken. Hint: you can use SUBSTR or INDEX to solve the problem.
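A minimal sketch of the INDEX approach on a made-up menu follows. The data set, the variable item, and the search word are hypothetical, not the workshop data.

```sas
data menu;
  input item $20.;
  /* INDEX returns the position of the substring, or 0 if it is absent.
     LOWCASE makes the search case-insensitive.
     The comparison yields 1 (true) or 0 (false). */
  has_beef = (index(lowcase(item), 'beef') > 0);
  datalines;
Beef stew
Tomato bisque
beef barley
;
run;

/* Summing the 0/1 flag counts the matching rows. */
proc means data=menu sum;
  var has_beef;
run;
```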

exercise 1
Use the Example for Model Procedure in SAS Certification Examples (part 2).

1. Plot the original data

2. Apply the economic model: population = a / ( 1 + exp( b - c * (year-1790) ) ) where

a=Maximum Population.

b=Location Parameter.

c=Initial Growth Rate.

Use the MODEL procedure to estimate a, b, and c.

Note: a should start from 1000, b should start from 5.5, and c should start from 0.02.

In SAS you can write:

start=(a 1000 b 5.5  c .02)

3. Plot the model you estimate.
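Step 2 above might be set up along the following lines. This is a sketch under assumptions, not the course solution: it assumes a data set WORK.USPOP with numeric variables YEAR and POP (both names hypothetical), and PROC MODEL requires SAS/ETS to be licensed.

```sas
/* Assumed input: work.uspop with variables year and pop (hypothetical names). */
proc model data=uspop;
  /* Declare the three parameters to be estimated */
  parms a b c;
  /* Logistic growth equation from step 2 */
  pop = a / (1 + exp(b - c * (year - 1790)));
  /* START= supplies the starting values given in the note */
  fit pop start=(a 1000 b 5.5 c .02);
run;
quit;
```

Nonlinear estimation can be sensitive to starting values, which is why the exercise specifies them.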