SAS Certification Assignments
From BingWiki
Session 1
exercise 1
Fill in the blank:
data test;
  _______ x 1-2 y 3-6 z 7-9;
  cards;
1 2   5
3 4   6
5 80  8
;
run;
exercise 2
Fill in the blanks:
data cat;
  ______ 'C:\Documents and Settings\Yuan-Ting Wang\UserData\cat.txt';
  ______ ID $ 1-4 AGE 6-7 SEX $ 8;
run;
exercise 3
Write a program, similar to the previous exercise, that creates a temporary dataset called Students (who are all cool cats). There are at least two ways to do this. One way reads the raw data file directly. Alternatively, you may assume that a dataset (called cats, work.cats, or 'cats.sas7bdat' in some folder) already exists.
What changes should be made so that the DATA step reads only the first 15 observations?
exercise 4
Look at Example 1 in Session 1:
data test;
  input x y;
  cards;
1 2
3 4
5 80
;
run;
How many times does SAS execute the INPUT statement when the program is submitted?
exercise 5
In example 6:
data scores;
  infile datalines _______;
  input score1-score5;
  datalines;
90 98 98
80 100 98 70 78
20 50 90 30 60
;
run;
proc print data=scores;
run;
Suppose there are five tests in the semester. The first person missed the last two tests and scored 90 98 98 on the first three. The second person scored 80 100 98 70 78, and the third person scored 20 50 90 30 60. Which option should you use in the INFILE statement so that the data set is read correctly?
exercise 6
In Example 5:
data test;
  input x y z;
  cards;
1 2 4
3 4 7
5 80 3
6 20 2
9 30 1
;
run;
proc print data=test (firstobs=2);
run;
Which changes should be made if you only want to read the first three observations?
And which changes should be made if you want to read the second and the third observations?
Session 2
exercise 1
In example 3.2
data sealife;
  input name $ family $ length;
  datalines;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
run;
Write a program that creates a new data set called newsealife with a new variable, newlength, which shows the length with two decimal places. Then print the new data set, showing only the variable newlength.
Hint: the output should look like below
Obs newlength
1 15.00
2 40.00
3 30.00
4 50.00
5 12.00
6 60.00
7 0.50
8 40.00
9 50.00
10 100.00
11 30.00
exercise 2
In example 3.4
Name    ClassRm Month Day Year Candy Quantity
Adriana 21      3     2   2000 MP    7
Nathan  14      2     28  2000 CD    19
Matthew 14      3     1   2000 CD    14
Claire  14      3     3   2000 CD    11
Caitlin 21      2     24  2000 CD    9
Ian     21      3     3   2000 MP    18
Chris   14      2     18  2000 CD    6
Anthony 21      6     1   2000 MP    13
Stephen 14      3     25  2000 CD    10
Erika   21      3     25  2000 MP    17
Write a program that shows the minimum and the maximum of Quantity.
Hint: The output should look like below:
Analysis Variable : Quantity
   Minimum       Maximum
----------------------------
 6.0000000    19.0000000
----------------------------
exercise 3
Continue from the previous exercise. Divide the observations into two groups: label those with a quantity less than 12 as group "1" and all others as group "2". Then print the result.
Hint: The output should look like below
Obs group
1 1
2 2
3 2
4 1
5 1
6 2
7 1
8 2
9 1
10 2
exercise 4
Continue from the previous exercise. Sort the data by group, and calculate the mean quantity of each group.
Hint: the output should look like below
--------------------------------------------- group=1
The MEANS Procedure
Analysis Variable : Quantity
Mean
------------
8.6000000
------------
--------------------------------------------- group=2
Analysis Variable : Quantity
Mean
------------
16.2000000
------------
exercise 5
Instead of creating the separate tables above, what changes would you make to get a single table like the one below?
Analysis Variable : Quantity
           N
group    Obs          Mean
-----------------------------------
1          5     8.6000000
2          5    16.2000000
-----------------------------------
Session 3
Exercise 1
Use the data in example 3.4 to create an HTML page that has the title Candy Data and shows the correlation between the quantity of candy and the classroom. Finally, save the HTML page on your computer. Hint: use the html and title options of the ods statement, together with a suitable proc statement.
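As a rough illustration of the ODS pattern the hint refers to, here is a minimal sketch. It assumes a data set named candy holding a few rows of the example 3.4 table, and an output file name candy.html; both names are made up for this sketch. It uses a TITLE statement and PROC CORR, which is one possible reading of the hint, not necessarily the approach used in the course.

```sas
/* Assumption: a small data set named candy (name chosen for this sketch) */
data candy;
  input Name $ ClassRm Month Day Year Candy $ Quantity;
  datalines;
Adriana 21 3 2 2000 MP 7
Nathan 14 2 28 2000 CD 19
Matthew 14 3 1 2000 CD 14
Claire 14 3 3 2000 CD 11
;
run;

/* Open an HTML destination; candy.html is a hypothetical file name */
ods html file='candy.html';
title 'Candy Data';

/* Correlation between the two numeric variables */
proc corr data=candy;
  var Quantity ClassRm;
run;

/* Close the destination so the file is written */
ods html close;
```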
Exercise 2
Use the data in Session 4.4. Apply the round function to SDAR, use the substr function to take the first five characters of the name variable, and sort the data by those first five characters.
Exercise 3
Use the data in Example 4.5 to calculate the average number of minutes that students spend on homework per day. Hint: use the mean function.
Exercise 4
Use example 4.6. Define an array T that holds the women's and men's salaries in thousands. Hint: T[i]=T[i]*1000
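The array pattern in the hint can be sketched as follows. The data set salary and its variables year, women, and men are invented for illustration and are not the data of example 4.6; the sketch only shows how an array lets one assignment apply to several variables.

```sas
/* Hypothetical data: salaries recorded in thousands */
data salary;
  input year women men;
  datalines;
2000 30 40
2001 32 41
;
run;

data salary_dollars;
  set salary;
  /* T groups the two salary variables under one name */
  array T[2] women men;
  do i = 1 to dim(T);
    T[i] = T[i] * 1000;   /* the assignment from the hint */
  end;
  drop i;                 /* the loop counter is not needed in the output */
run;

proc print data=salary_dollars;
run;
```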
Session 4
Exercise 1
Use the data in example 2.1. Run a regression with weight and RestPulse as the independent variables and age as the dependent variable. Remove the observations whose residual is larger than 8 in absolute value, and run the regression again. Use the outest option to compare the two sets of results. Hint: use the r option in the output statement to calculate the residuals.
Exercise 2
Continue from the previous exercise. Use the plot option in the proc statement to see the relation between age and weight.
Exercise 3
Create a data set called month. Use a do loop to generate 12 observations, one per month. Merge the month data with the data below, which gives the frequency of speeding per month:
datalines; 2 3 2 1 3 6 8 9 10 13 2 12 ;
Print out the data you just merged, and see what it looks like.
Session 5
exercise 1
Use the example below:
data phone;
  input City $11. @12 State $ Zip $ Phonenum $;
  cards;
cary        NC 27513 6224549
cary        NC 27513 6223251
chapel-hill NC 27514 9974794
raleigh     NC 27612 6970450
raleigh     NC 27612 6791125
cary        NC 27513 6224550
;
run;
proc print data=phone;
run;
Use the substr function to keep only the first three digits of the phone number. Use the length function to find how many characters each city name has.
exercise 2
Variable: year
Y = Index of Real Compensation per Hour, 1982=100
X = Index of Output per Hour, 1982=100
year Y        X
1982 100.0000 100.0000
1983 100.5000 102.0000
1984 100.4000 104.6000
1985 101.3000 106.1000
1986 104.4000 108.3000
1987 104.3000 109.4000
1988 104.4000 110.4000
1989 103.0000 109.5000
1990 103.2000 109.7000
1991 103.9000 110.1000
Use the data above to run a regression with Y as the dependent variable and X as the independent variable. Examine the relationship between compensation and output per hour.
exercise 3
Using the previous data, suppose you want to change the index base from 100 to 1. What change should you make? Will that affect the result of the regression?
Hint: Apply an informat in your code or create new variables.
exercise 4
Use the data below:
Y  X2 X3
10 1  1
8  2  3
6  3  5
4  4  7
2  5  9
0  6  11
2  7  13
Treat Y as the dependent variable and X3 as the independent variable. Take logs on both sides, and compare the regression with logs to the regression without logs.
exercise 5
Go to the U.S. Census Bureau website and find the 2005 third-quarter data in the Federal Assistance Award Data System. Download the flat data file and import it into your SAS program. Print the first 15 observations for the variables COUNTY_NAME, F_FUNDS, and T_FUNDS. Compare the output with the original Excel data file to check that the printout is correct. If you are interested in the data, you can read the FAADS User Guide to see how to use it.
If you can't find the flat data file, here is the link [1]
Session 6
Use the data set "air" in sashelp. Format the variable date with the julday format. Also try using an informat to read the variable date as a number of days, and explain why you get negative values for date in this case.
Hint: An informat has to be applied in an input statement, so you also need an infile statement. But you cannot use infile to read sashelp data, because the infile statement reads only text files. One approach is to save the SAS data set as a text file and then name that file in the infile statement.
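The round trip described in the hint can be sketched as below. The fileref airtxt, the output data set air2, and the variable days are names chosen for this sketch, and the date9. text layout is one arbitrary choice among many; treat it as an illustration under those assumptions, not as the intended solution.

```sas
/* Temporary text file; the fileref name airtxt is an assumption */
filename airtxt temp;

/* Step 1: write sashelp.air out as plain text */
data _null_;
  set sashelp.air;
  file airtxt;
  put date date9. +1 air;   /* e.g. 01JAN1949 followed by the air value */
run;

/* Step 2: read the text back, applying an informat to date */
data air2;
  infile airtxt;
  input date :date9. air;
  days = date;              /* the unformatted value: days relative to 01JAN1960 */
  format date julday3.;     /* display date as the day of the year */
run;

proc print data=air2 (obs=5);
run;
```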
Session 7
exercise 1
Follow the instructions in the /* */ comments and complete the code we wrote in the course. Fill in all the blanks below:
data airt;
  set sashelp.air;
  day=day(date);
  mo=month(date);
  yr=year(date);
  decade=(int(yr/10)-190)*10;
run;
proc print data=airt;
run;
data airt;
  set airt;
  drop day date;
run;
/*define a new variable-mair which stands for the maximum value of air line travel in each decade*/
proc ____ data=airt;
  __ decade;
  var air;
  output out=maxair ___=mair;
run;
proc print data=maxair;
run;
data test;
  set maxair;
  drop _type_ _freq_;
run;
/*Save the test data file into the permanent sas file*/
data '_____________________';
  set ____;
run;
data airmax;
  set 'c:\temp\airmax.sas7bdat';
run;
/*Now you want to merge the data airmax and airt by decade. And define a new variable relative=((mair-air)/mair)*100. Fill out the blank below*/
proc sort data=airmax;
  ___ decade;
run;
proc print data=airmax;
run;
proc ____ data=airt;
  ___ decade;
run;
proc print data=airt;
run;
data final;
  ____ airt airmax;
  ___ decade;
  relative=((mair-air)/mair)*100;
run;
proc print data=final;
run;
exercise 2
Go to the website of US. Department of Labor: [2]
Find the data file that describes the population-employment ratio for white males up to 2002. Suppose you now also have data for 2003 and 2004 that look like this:
Year Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
2003 74.4 74.4 74.2 74.2 74.1 74.0 74.0 73.8 73.9 73.7 73.3 73.3
2004 72.7 73.2 72.8 72.9 73.1 73.1 73.1 73.2 73.0 72.9 72.7 72.4
Try to combine those two data sets.
exercise 3
If the new data cover 2002-2004 instead of 2003-2004, the combined data set will contain duplicate data for 2002. How would you fix the problem? Which SAS options would you use?
Session 8
exercise 1
Use the soup example introduced in the workshop. Count the soups that are made with chicken. Hint: You can use SUBSTR or INDEX to solve the problem.
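For readers unfamiliar with INDEX, here is a minimal sketch of flagging and counting text matches. The data set soups and its contents are invented for illustration and are not the workshop's soup example.

```sas
/* Hypothetical soup names, one per line */
data soups;
  input name $30.;
  datalines;
chicken noodle
tomato
cream of chicken
;
run;

data flagged;
  set soups;
  /* INDEX returns the position of the substring, or 0 if it is absent */
  has_chicken = (index(lowcase(name), 'chicken') > 0);
run;

/* The sum of the 0/1 flag is the number of matching soups */
proc means data=flagged sum;
  var has_chicken;
run;
```

LOWCASE makes the match case-insensitive; drop it if the names are known to use consistent case.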
Session 9
exercise 1
Use the Example for Model Procedure in SAS Certification Examples (part 2).
1. Plot the original data.
2. Apply the economic model: population = a / ( 1 + exp( b - c * (year-1790) ) ) where
a = Maximum Population.
b = Location Parameter.
c = Initial Growth Rate.
Use the Model Procedure to estimate a, b, and c.
Note: a should start from 1000, b should start from 5.5, and c should start from 0.02.
In SAS you can write:
start=(a 1000 b 5.5 c .02)
3. Plot the model you estimate.
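To show where the start= specification sits, here is a hedged sketch of a PROC MODEL step (PROC MODEL requires SAS/ETS). It assumes the data are already in a data set named uspop with variables year and population; those names are assumptions and may differ from the course example.

```sas
/* Assumption: work.uspop exists with variables year and population */
proc model data=uspop;
  parms a b c;                        /* parameters to estimate */
  /* logistic growth model from step 2 */
  population = a / ( 1 + exp( b - c * (year-1790) ) );
  /* start= supplies the initial values for the iterative fit */
  fit population start=(a 1000 b 5.5 c .02);
run;
quit;
```

Nonlinear estimation is iterative, so the starting values can determine whether the fit converges; that is why the note gives explicit values.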

