Getting Started with SAS/Part 1

From BingWiki

A Simple Program

Technically, a SAS program can be saved as plain text in a file with any extension. By convention, SAS programs are saved with the .sas extension, but the contents of these files are ordinary plain text, so you can examine and edit a SAS program in any editor you choose. Most users nevertheless use the SAS Enhanced Editor when composing or altering a SAS program, because it is well integrated with the other 'SAS Environment' windows and tools. The plain-text point matters because many users receive SAS programs as mail messages, attachments, UNIX files, etc. Rather than running such a program directly, it is better to open the SAS environment explicitly and examine the program before executing it. Some programs even end with the statement ENDSAS;, which can close your SAS session before you have a chance to examine the output the program generated.

Step 1:
Double-click the SAS icon (SAS 9.1 (English)) to start SAS; it is found in the Start menu under the SAS menu.
 SAS icon

Your screen should now show the SAS windows. There should be no error messages in the Log window if the software is current and the installation was done correctly.

Note: There are five basic windows in SAS: the Results and Explorer windows, and three programming windows: Program Editor, Log, and Output. The Results window is like a table of contents for your Output window; the Results tree lists each part of your results in outline form. The Explorer window gives you easy access to your SAS files and libraries. You can switch between windows by clicking the window you want to activate in the Window Bar.

Step 2:
Verify that the (Enhanced) Editor window is active.

Step 2a: (optional)
If the Editor window is not active, some important menu items will not be available. If the window is visible but not active, click once on the window to make it active. If no editor window is visible, click on /View/Enhanced Editor to make one visible and active. When a SAS window is active, its title bar is dark blue.

Step 3:
Click on /File/Open and browse your hard drive to locate the program you’d like to run.

Step 3a: (alternatively)
Alternatively, the example program below can be copied and pasted directly into the Editor window. In this example, the names, sex, ages, heights, and weights of school children are recorded.

* A simple SAS program;
options center;
options nodate;
options linesize=70;
libname localdat 'c:\Data'; 

data htwt;
    input Name $ 1-10 Sex $ 12 Age 14-15 Height 17-18 Weight 20-22;
    datalines;
ALFRED     M 14 69 112
ALICE      F 13 56  84
BARBARA    F 14 62 102
BERNADETTE F 13 65  98
HENRY      M 14 63 102
JAMES      M 12 57  83
JANE       F 12 59  84
JANET      F 15 62 112
JEFFREY    M 13 62  84
JOHN       M 12 59  99
JOYCE      F 11 51  50
JUDY       F 14 64  90
LOUISE     F 12 56  77
MARY       F 15 66 112
PHILLIP    M 16 72 150
ROBERT     M 12 64 128
RONALD     M 15 67 133
THOMAS     M 11 57  85
WILLIAM    M 15 66 112
;
run; 

proc print data=htwt;
run; 

proc means data=htwt;
   var Age Weight;
run;

proc corr data=htwt;
   var Height Weight;
run;

Step 4:
Click the Submit button as shown below, or choose /Run/Submit from the menu. Image:P5pic03.jpg

Remarks (Basic Debugging/ Common Errors)

In spite of your best efforts, sometimes programs just don’t work. Here are a few guidelines you can follow to help fix your program:

Read the SAS log: The SAS log contains a great deal of information about your program and is very helpful in finding the source of your errors. When you read the log, start from the beginning of that program's entries rather than from the end.

The SAS log has three types of messages about your program: errors, warnings, and notes.
Errors: Errors appear in red on your screen and are easy to find because the offending code is usually underlined. Your program will not run correctly until the errors are fixed. Errors are usually some kind of syntax or spelling mistake.

Warnings: Although your program will run with warnings, a warning may mean that SAS has done something you have not intended. So make sure that you know what all the warnings are about and that you agree with them.

Notes: Sometimes notes just give you information, like telling you the execution time of each step in your program. But sometimes notes can indicate a problem. Make sure that you know what each note means and why it is there.

Common Errors: Missing semicolons and misspelled keywords are the most common sources of errors in SAS programs, so check your syntax first.

Example: The following program is missing a semicolon on the data statement:

data demo
input id age sex $;
datalines;
101 23 M
102 22 F
103 34 F
105 42 M
106 25 M
107 18 F
108 20 F
;
run;

Here is the SAS log after running the program:

Image:P6pic04.jpg
… then lots of WARNINGs about 0 observations etc. …

Remark: The error messages in the SAS log can be misleading and puzzling, as they are in this example. As a simple rule of thumb, always look at the first error message in the log. A “Syntax error” is commonly the result of a missing semicolon (;) (as happened above), an added or omitted slash (/), comma (,), or parenthesis, or a misspelled dataset, variable, or keyword name. Look for missing semicolons first, since they are the major separators in SAS. In this example, SAS underlined the location at the end of what it took to be the first DATA statement (one that spans two lines). It reported a syntax error at the dollar sign because it assumed that “input”, “id”, etc. were simply additional dataset names on the DATA statement.

It is also important to notice when datasets have 0 observations. A dataset with no observations often indicates that a file was not read properly or that a merge of two datasets unexpectedly found no overlap.
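
For comparison, here is the same program with the semicolon restored at the end of the DATA statement. The PROC PRINT step is an addition, not part of the original example, included only so that the result can be inspected. With the fix in place, the log should report a dataset with 7 observations and 3 variables rather than 0 observations.

```sas
/* Corrected version: the DATA statement now ends with a semicolon */
data demo;
input id age sex $;
datalines;
101 23 M
102 22 F
103 34 F
105 42 M
106 25 M
107 18 F
108 20 F
;
run;

/* Added for illustration only: list the dataset to confirm it was read */
proc print data=demo;
run;
```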

Basic Reporting (Entering Data (Basic))

Numerical data, which represent measurements, counts, or simply codes, can be typed directly into SAS for analysis. This is done in what is called a Data step. Because SAS is so good at reading data in many formats (such as data entered in Excel, or a text file separate from the SAS program itself), there is little advantage to using the SAS editor for entering data, except its directness. We will nevertheless proceed with this method as a simple illustration.

The following set of statements is, as a whole, a single Data step. Note: Each statement ends with a semi-colon. The data step ends with a run statement.

Image:P7pic05.jpg
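
The image may not be available in every copy of this page. A data step consistent with the line-by-line description that follows could look like the sketch below. The three rows of data values are an assumption: they are taken from the simpledata.txt listing later in this section, which is said to hold the same content.

```sas
data scores;      /* creates WORK.SCORES; the data values below are assumed */
input a b;        /* two numeric variables, A and B */
datalines;
1 3
2 5
3 8
;
run;
```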

Line 1 of this example – Declares that a dataset called SCORES is being created and made active. The more technically accurate name of this dataset is WORK.SCORES (the dataset SCORES in the library WORK, a temporary scratch area); library names and datasets are covered in more detail below, in the discussion of saving your work.

Line 2 – Declares that the dataset will have two numeric variables, A and B. Variables are columns of numbers (or columns of dates, or of short or long character strings such as words or names).

Line 3 – Indicates that the following group of lines (none terminated by a semicolon), up to the next semicolon, is raw data to be used for the dataset. CARDS; is an equivalent synonym for DATALINES;.

Lines 4-6 – Each has two numbers (which can be whole numbers, negative numbers, or numbers with decimal points). In this data step (based on the form of the INPUT statement), the numbers on each of these lines are separated by one or more spaces.

Line 7 – Contains a single semicolon, which terminates the section started by the DATALINES (or CARDS) statement.

Line 8 – The RUN; statement terminates the data step. RUN; is also used to finish a PROC (procedure) step; more about these below.

Note: Simply typing these statements into the Editor window accomplishes nothing. The statements must be submitted (run, as described above). These lines of code also contain no PROC steps and will not produce any results in the Output window. Running a data step, however (whether it produces output or not), does create a dataset in the current session’s SAS environment. A note to that effect appears in the SAS Log window, and the new dataset has an icon visible (at an appropriate level of viewing) in the SAS Explorer window (more later).
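
To see the dataset in the Output window, a PROC step can be submitted after the data step. The following is a minimal sketch; the data values are assumed (they match the simpledata.txt listing later in this section), and PROC PRINT is just one of several procedures that could be used here.

```sas
/* Data step: creates WORK.SCORES but prints nothing */
data scores;
    input a b;
    datalines;
1 3
2 5
3 8
;
run;

/* PROC step: lists the dataset in the Output window */
proc print data=scores;
run;
```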

There are other methods of creating the same dataset (with exactly the same content as above). Two simple forms are illustrated here. The first reads a space-delimited text file with an INFILE statement in a data step:

data scores;
infile "C:\Temp\simpledata.txt";
input a b;
run;

simpledata.txt contains the following (with spaces, not tabs, between columns):

1 3
2 5
3 8

The second uses PROC IMPORT to read a comma-delimited file:

proc import out = scores
         datafile = "C:\Temp\commadata.csv"
         dbms=dlm replace;
         delimiter = ",";
         getnames=yes; /* var names on first line */
         datarow=2;
run;

Note: The missing semicolons are not an error. The PROC IMPORT statement ends on the third line. commadata.csv contains the following:

a,b
1,3
2,5
3,8

In addition to these fairly simple examples, the INPUT and INFILE statement documentation (in the SAS Online Documentation) describes how to handle just about every input data situation you can think of (e.g., MISSOVER, TRUNCOVER, pointer control using the @ symbol, etc.). Note: The %INCLUDE statement is usually used for including SAS statements, whole programs, or macros from an outside file, not just data.
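
As a small illustration of two of these features, the sketch below uses the MISSOVER option and column pointer control. The dataset names and data values are made up for this example.

```sas
/* MISSOVER: if a line runs out of values, set the remaining variables
   to missing instead of reading them from the next line. */
data scores_miss;                /* hypothetical dataset name */
    infile datalines missover;
    input a b c;
    datalines;
1 3 7
2 5
3 8 9
;
run;

/* Column pointer control: @n moves the pointer to column n
   before the next value is read. */
data scores_cols;                /* hypothetical dataset name */
    input @1 a 1. @3 b 1.;
    datalines;
1 3
2 5
3 8
;
run;

proc print data=scores_miss;
run;
```

In the first step, the second observation gets a missing value for C. Without MISSOVER, SAS would move on to the next line to look for a value for C.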

Data Coding

A few common situations and questions arise concerning entering data, as follows:
First, note that it is unnecessary to have a column that identifies each row by a unique identification number or alphanumeric code. The automatic variable _N_ in a data step can be used to generate or refer to the observation number. By default, PROC PRINT prints an observation number next to each observation's data.

Secondly, some numeric data that are commonly entered are coded values, where the numbers represent membership in a group (1=Male, 2=Female or 1=White, 2=Black, 3=Hispanic) or signify certain attributes (1=’less than average IQ’, 2=’Average IQ’, etc.) of the observation/case/subject. This data coding scheme is often called a ‘Code Book’ in introductory statistics texts. Value labels are set up in SAS using VALUE statements in PROC FORMAT.
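
As a brief sketch of the first point, the data step below stores _N_ in an ordinary variable so that it is kept in the dataset. The dataset name and the variable name id are hypothetical; the three rows are borrowed from the first example program on this page.

```sas
data htwt_small;                 /* hypothetical dataset name */
    input Name $ Age;
    id = _n_;                    /* _N_ itself is not saved, so copy it */
    datalines;
ALFRED 14
ALICE 13
BARBARA 14
;
run;

/* NOOBS suppresses the default Obs column, since id now carries it */
proc print data=htwt_small noobs;
run;
```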

For example: A car company surveys its customers about their preference for car colors (yellow, gray, blue, and white) and also records each customer’s age, sex (coded as 1 for male and 2 for female), and annual income. Here is the raw data:

18 1 25000 Y
30 1 60000 G
72 2 37000 B
31 2 49000 Y
67 1 32000 W

The following program reads the data, creates formats for sex and car color using PROC FORMAT, and then prints the data using the new formats:

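/* Read the survey data from an external file */
data cars;
    infile 'C:\SAS programs\car survey.dat';
    input age sex income color $;
run;

/* Define a numeric format for sex and a character format for color */
proc format;
    value gender 1='male'
                 2='female';
    value $col 'W'='White'
               'B'='Blue'
               'Y'='Yellow'
               'G'='Gray';
run;

/* Print the data with the formats applied */
proc print data=cars;
    format sex gender. color $col.;
    title 'Car Survey results using gender format';
run;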

Here is the output:
Image:P9pic06.jpg

Remark: GENDER is a numeric format and $COL is a character format, because the unformatted sex values are numbers whereas the unformatted color values are letters (technically, character strings of length one). Notice that in the FORMAT statement the format names (gender. and $col.) end with periods, but the same names do not have periods where they are defined in the PROC FORMAT step.

See also other proc format options (such as PICTURE formats), and learn about built-in formats (for dates and numbers especially) described in the SAS Online Documentation. Built-in SAS formats (which should be distinguished from SAS INformats) are described in the SAS Users Guide Dictionary of Elements. PROC FORMAT options are described in the SAS/Base Procedures Guide.
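
To make the format/informat distinction concrete, here is a minimal sketch using built-in ones. The dataset name, variable names, and values are invented for illustration. DATE9. is used as an informat (how a value is read), while DOLLAR10. and MMDDYY10. are used as formats (how a value is displayed).

```sas
data hires;                      /* hypothetical dataset and values */
    /* the colon modifier applies the DATE9. informat while reading */
    input name $ income hired : date9.;
    /* built-in formats control how the values are displayed */
    format income dollar10. hired mmddyy10.;
    datalines;
Ann 25000 15JAN2004
Bob 60000 03MAR2001
;
run;

proc print data=hires;
run;
```

In the printed output, income appears with a dollar sign and commas, and hired appears in month/day/year form, even though both are stored as plain numbers.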
