Getting Started with SAS/Part 3

From BingWiki


Calculating Summary Statistics (Means, medians, maximums…)

Proc Means

PROC MEANS makes it easy to obtain means, minimum and maximum values, and standard deviations.

Example: The following raw data are the test scores of a certain class. The students’ IDs (identification numbers) are also recorded.
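A minimal sketch of such a program, assuming test.dat is a plain-text file with one student per line holding the ID followed by score1 and score2, separated by blanks. The file layout and the dataset name TEST are assumptions, since the original program is not shown here:

```sas
/* Read the raw data; assumes each line of test.dat looks like: id score1 score2 */
data test;
   infile 'test.dat';         /* resolved relative to the SAS working folder (assumed) */
   input id $ score1 score2;  /* $ reads the ID as a character value */
run;

/* With no statistic keywords, PROC MEANS reports N, Mean, Std Dev, Minimum and Maximum */
proc means data=test;
   var score1 score2;
run;
```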

Here is the output:


Remark: All of the statistics mentioned above are reported separately for each of the variables score1 and score2.

Note: Calculating the mean of values arranged across a row (within a single case) differs from calculating the mean of values arranged down a column. The mean of a column (a single variable) is the easier of the two, as shown above. Finding the mean of values across a row is a two-step process: first, within a DATA step, calculate the mean by creating a new variable; then display or analyze that new variable in a later step.


Example: For the data file test.dat shown above, the following program finds each student’s average of score1 and score2.
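A minimal sketch of the two-step approach, assuming the scores have already been read into a dataset named TEST with variables ID, SCORE1 and SCORE2 (the dataset names and the new variable AVG_SCORE are hypothetical):

```sas
/* Step 1: compute a row-wise mean in a DATA step */
data test_avg;
   set test;
   avg_score = mean(score1, score2);  /* mean of the non-missing arguments on this row */
run;

/* Step 2: display the new variable */
proc print data=test_avg;
   var id score1 score2 avg_score;
run;
```

The MEAN function ignores missing arguments, so a student with only one recorded score gets that score as the average; writing (score1 + score2) / 2 instead would give a missing value for that student.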



Here is the output:



PROC MEANS has a number of useful options and statements, including ones that examine subgroups (the CLASS and BY statements) and ones that save the results as a SAS dataset (the OUTPUT statement).
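As a sketch of these features, the step below assumes a dataset TEST with SCORE1 and SCORE2 plus a hypothetical grouping variable SECTION. CLASS requests statistics for each section, and OUTPUT OUT= writes selected statistics to a new dataset (TEST_STATS and the statistic names are made up for illustration):

```sas
proc means data=test n mean std;
   class section;                /* hypothetical grouping variable */
   var score1 score2;
   output out=test_stats         /* save results as a SAS dataset */
          mean=mean1 mean2       /* names follow the order of the VAR list */
          std=sd1 sd2;
run;

proc print data=test_stats;      /* also contains the _TYPE_ and _FREQ_ columns */
run;
```

With a CLASS statement the output dataset also contains an overall row (_TYPE_=0); adding the NWAY option to the PROC MEANS statement keeps only the per-section rows.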

Proc Univariate

PROC UNIVARIATE produces statistics describing the distribution of a single variable, including the mean, median, mode, standard deviation, skewness and kurtosis. It is simple to use but very useful.


Example: The following raw data are the test scores from a certain class.

Here is the result:

… followed by quantiles, a stem-and-leaf plot and a box plot …


Note: The double trailing at sign (@@) is used in the example above to read multiple observations from each line of raw data. With @@, SAS holds the current line across iterations of the DATA step and keeps reading observations from it until it reaches the end of the line or executes an INPUT statement that has no line-hold specifier (@ or @@).
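A minimal sketch of @@ in action, using a few made-up scores entered with DATALINES (these values are hypothetical and are not the class data from the example above):

```sas
data scores;
   input score @@;   /* @@ holds each data line so several values can be read from it */
   datalines;
72 85 90 64 78
88 93 70 81 59
;
run;
```

This creates ten observations. Without @@, SAS would read only the first value on each line and create two.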

The NORMAL option produces tests of normality, while the PLOT option produces three plots of the data (a stem-and-leaf plot, a box plot and a normal probability plot).
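A sketch of the corresponding PROC UNIVARIATE step, assuming a dataset SCORES with a numeric variable SCORE (hypothetical names):

```sas
/* NORMAL: tests of normality; PLOT: stem-and-leaf, box and normal probability plots */
proc univariate data=scores normal plot;
   var score;
run;
```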

See also PROC MEANS options, PROC UNIVARIATE options.

Working with Categorical Variables (Histograms, frequency distributions …)


Frequency

For categorical variables (non-interval-level variables) that indicate unordered groups, relationships can be explored by generating tables (cross-tabulations). To count the number and percentage of cases in which a variable takes on particular values, you can generate a frequency distribution with PROC FREQ. PROC FREQ produces many statistics for categorical data, including chi-square tests, Fisher’s exact test, Kendall’s tau-b and kappa; many of these examine the null hypothesis of no association between the variables.


Example: The following raw data set Drugs contains data for a study of three drugs to treat a chronic disease (Agresti 1990). Forty-six subjects each received drugs A, B and C. The response to each drug is either favorable ('F') or unfavorable ('U').

Note: The options “/ norow nocol” in the TABLES statement suppress the row percentages and the column percentages in the frequency table.
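A sketch of the program. The counts below are the Drugs data as they appear, to the best of our knowledge, in the SAS PROC FREQ documentation’s version of this Agresti (1990) example; treat them as an assumption and check them against the source. Each record holds the responses to the three drugs and the number of subjects (COUNT) with that pattern, and the CHISQ option requests the chi-square tests:

```sas
data drugs;
   input Drug_A $ Drug_B $ Drug_C $ Count @@;  /* two response patterns per line */
   datalines;
F F F  6   U F F  2
F F U 16   U F U  4
F U F  2   U U F  6
F U U  4   U U U  6
;
run;

proc freq data=drugs;
   weight count;                               /* each record stands for COUNT subjects */
   tables drug_a*drug_b / norow nocol chisq;   /* 2x2 table plus chi-square tests */
run;
```

The counts sum to 46 subjects, and the resulting Drug_A by Drug_B table gives a Pearson chi-square of 9.4138, consistent with the value discussed in the note on the output.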


Here is the output:

Note: From the results of the chi-square test above, the probability, under the null hypothesis, of obtaining a value as extreme as or more extreme than the observed value of 9.4138 is 0.0022 (the p-value). Because this is below 0.05, we reject the null hypothesis of no association between the variables at the 5% significance level; that is, the data support the idea that there is a relationship between Drug_A and Drug_B.
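The p-value can also be checked by hand. A minimal sketch using the PROBCHI function (the chi-square cumulative distribution function) with 1 degree of freedom, as for a 2x2 table:

```sas
data _null_;
   chisq = 9.4138;                  /* Pearson chi-square from the output */
   df    = (2 - 1) * (2 - 1);       /* (rows - 1) * (columns - 1) for a 2x2 table */
   p     = 1 - probchi(chisq, df);  /* upper-tail probability = p-value */
   put chisq= df= p= pvalue6.4;     /* writes the values to the log */
run;
```

The value written to the log should round to the 0.0022 reported by PROC FREQ.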

Histogram

A histogram is a bar chart of an interval variable. In a histogram, the interval represented by a bar is called a bin. In a distribution analysis, a histogram may use a density axis instead of a frequency axis, measuring the fraction of the distribution that falls in each interval. A histogram is a good tool for visually examining a distribution; however, changes in the width and position of the bins can greatly affect your perception of the shape of the distribution. A density function can be approximated by a histogram that gives the proportion of the population lying within each of a series of intervals of values; a probability density function is like a histogram with an infinite number of infinitely small intervals.


PROC UNIVARIATE and PROC CAPABILITY provide the HISTOGRAM statement, which creates histograms using high-resolution graphics and can optionally superimpose parametric and nonparametric density curve estimates. For example, we can display a fitted normal density curve on a histogram.
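A minimal sketch, assuming a dataset TEST with a numeric variable SCORE1 (hypothetical names). The NORMAL option overlays a fitted normal density (a parametric estimate), and KERNEL overlays a kernel density estimate (a nonparametric one):

```sas
proc univariate data=test;
   var score1;
   histogram score1 / normal kernel;  /* fitted normal curve plus kernel density estimate */
run;
```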



Here are the outputs:


…then the histogram by using PROC UNIVARIATE…


Note: PROC CAPABILITY can also compute summary statistics and request a variety of other statistics that summarize the distribution of each analysis variable (see PROC CAPABILITY options).

Saving your data, programs, output

A standard method of saving work in Windows applications is to choose File and then Save from the main menu. In SAS, the effect of this depends on which of several windows is active. If the Enhanced Editor or Program Editor window is active, you will save SAS commands. If the Output (listing) window is active, File and then Save stores the output and results produced by submitting the commands. Saving the various parts of your SAS session work is described below.


Save a SAS Dataset (.SAS7BDAT)

A crude method of storing a SAS dataset is to save the program that produces it. This works well for the simple examples in this document and is described below under Save a SAS Program. Such an approach, however, is inefficient in many cases (especially for large datasets), because a DATA step can do a great deal of processing and calculation and may merge information from several different sources (such as disparate databases or files on multiple operating systems). The better method is to save the dataset permanently in the native SAS (.SAS7BDAT) format, which stores only the results of all the computations and keeps the data in a form that SAS can read and analyze directly.

Whether you save a SAS dataset by pointing and clicking on menu choices or by writing a SAS program (running a DATA step), you first need to understand and ‘create’ a SAS library. In SAS, a library (assigned with the LIBNAME statement) can contain many different datasets. It is similar to a folder in the Windows operating system; in fact, a SAS library name simply points to a specific Windows folder for the duration of the SAS session.

Suppose, for example, that an analyst in a company must manage two different projects. He or she might pick arbitrary names for the projects and keep all the emails, documents and data related to each project in a separate location on his or her computer. Let’s call the two projects ‘Pegasus’ and ‘Poseidon’ (as in the diagram below). The icons in the top half of the diagram represent files and folders. In the bottom half of the diagram, the icons are SAS libraries and SAS datasets (the ‘members’ of the libraries). In SAS we use library names (‘librefs’) as pointers to folders (directories) on the hard disk drive (or on a floppy or network drive). The names used in a SAS program need not have anything to do with the folder names we use in Windows.

[Diagram: Windows folders and files for the Pegasus and Poseidon projects (top half) and the corresponding SAS libraries and datasets (bottom half).]
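A minimal sketch of the idea: the LIBNAME statements below point the librefs PEGASUS and POSEIDON at two folders, and a DATA step then writes a permanent dataset into one of them. The folder paths and the source dataset WORK.TEST are hypothetical; note that a libref can be at most eight characters long.

```sas
/* Point each libref at a Windows folder (hypothetical paths) */
libname pegasus  'C:\Projects\Pegasus\Data';
libname poseidon 'C:\Projects\Poseidon\Data';

/* A two-level name (libref.member) makes the dataset permanent:
   this writes scores.sas7bdat into the Pegasus folder */
data pegasus.scores;
   set work.test;
run;

/* In a later session, reassign the libref and use the data directly */
proc print data=pegasus.scores;
run;
```

A one-level name such as TEST refers to the temporary WORK library, which SAS deletes at the end of the session.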

Save a SAS LOG file (.LOG)

As a beginner, you will commonly write programs that contain syntax or logic errors. To get help debugging a program, it is often useful to send an analyst a copy of your SAS log. The SAS log file is simply a text file that summarizes all the SAS code that has been submitted, which DATA steps and procedures SAS has processed, and the results of these steps. Every DATA step and procedure writes at least a few notes about its input and how long the step took to run, even if the step produces no results.

To save the log, click in the Log window and choose File and then Save As from the main menu. You can give the file any name and place it anywhere on your hard drive or on a floppy by browsing to the desired folder. Note: the saved log is plain text but is not well formatted, and it may contain unexpected line breaks. A skilled analyst, however, will not be deterred by poor formatting. In some cases he or she may need to work directly with your data to fully understand the source of an error, but as a first step, a complete log retraces much of what occurred during your SAS session.
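Besides the menu route, the log can be routed to a file from the program itself with PROC PRINTTO. A minimal sketch (the file path and the TEST dataset are hypothetical; NEW replaces the file instead of appending to it):

```sas
proc printto log='C:\Temp\session.log' new;  /* send the log to a file from here on */
run;

proc means data=test;   /* any steps whose log you want to keep */
   var score1 score2;
run;

proc printto;           /* with no options, restores the default log destination */
run;
```

While the log is redirected, those notes do not appear in the Log window.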


See also the SAS Online documentation about the PUT statement (a means of tracing what happens during a DATA step as each observation is processed), the %PUT macro statement, and the MPRINT system option. These statements and options produce logs that are rich with information.
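A small sketch of these tools, assuming a dataset TEST with variables ID and SCORE1. The macro SHOW and the macro variable CUTOFF are hypothetical names used only for illustration:

```sas
options mprint;                          /* log the SAS code that macros generate */

%let cutoff = 70;                        /* hypothetical macro variable */
%put NOTE: Cutoff score is &cutoff;      /* %PUT writes text to the log immediately */

%macro show(ds);                         /* hypothetical macro */
   proc print data=&ds;
   run;
%mend show;
%show(test)                              /* with MPRINT, the generated PROC PRINT appears in the log */

data _null_;
   set test;
   /* PUT writes a line to the log for each observation that meets the condition */
   if score1 < &cutoff then put 'Low score: ' id= score1=;
run;
```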


Save Output from a SAS Analysis (.LST)

SAS Output can be produced in a variety of formats (See especially the discussion of SAS’s Output Delivery System (ODS) below).


By default, however, SAS procedures do not produce sharp-looking output suitable for publication. The output is formatted as plain text designed for older printers (line printers, not laser printers or other graphical display devices). This output is, however, accurate and readable, so beginning users may be content to save listing files in whole or in part. As you become more skilled with SAS, you can exert greater control over what is produced; such skills are mostly beyond the scope of this document but are discussed briefly below.

To Save the SAS Listing file

Step 1: Click on the Output window to make it active.

Step 2: Choose File and then Save.

Step 3: Provide a File name for the output and browse to the folder location where you’d like it saved.


The output is saved as plain text with the file extension .LST. To view it, it’s best to use a simple text editor such as Notepad or WordPad, even though double-clicking on the listing file’s icon will not open these tools. On most Windows installations the .LST extension is associated with SAS, so double-clicking on the file icon starts a new SAS session. There is no harm in doing this, but it is inefficient. To examine the output outside of a SAS session, it’s better to open the text editor and choose File and then Open, or right-click on the file icon and select Open with …
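The listing can also be written to a file from a program. A minimal sketch using PROC PRINTTO with the PRINT= option (the file path and the TEST dataset are hypothetical):

```sas
proc printto print='C:\Temp\results.lst' new;  /* procedure output now goes to this file */
run;

proc means data=test;
   var score1 score2;
run;

proc printto;   /* send procedure output back to its default destination */
run;
```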


Note: The output may not appear well formatted when you view it in Word or WordPad. For the best appearance in a word processor (which constrains text to specific margins and may word-wrap some lines of output inappropriately), choose a landscape layout with small margins and a small font, so that long lines fit and are not broken up. It may also be necessary to select all of the text and format it in Courier (or another fixed-width font). Some fonts (such as Arial) are variable-width fonts, in which some letters or digits (such as 1) take up less space than others; this misaligns columns of numbers. You can control other aspects of output formatting with the OPTIONS statement and system options such as LINESIZE=, PAGESIZE=, NODATE and NOCENTER.

To Save Just a Single Table of Interest (not the whole listing)

Step 1: Double-click on the table or procedure in the Results window that you’d like to keep. It should become visible in the Output window.

Step 2: Right-click on the highlighted Table or Procedure icon (still in the Results window).

Step 3: Choose Save As in the pop-up menu. The default File name should be appropriately named (as the name of the Procedure, or as something like ‘Summary’, or ‘SimpleStats’). You may change this name to whatever you like.

Step 4: Browse to the folder location where you’d like the output saved.

Step 5: Click the Save button. As when saving the whole listing file, the output is saved as plain text with the extension .LST. It can be opened and viewed conveniently with Notepad or another text editor.


To control the exact text that a SAS program produces, some businesses use PROC REPORT, macro variables, or PUT statements with pointer controls (@) in a DATA step.
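As an illustration of the last approach, a minimal sketch that writes a fixed-column text report with FILE and PUT statements. The dataset TEST, its variables and the output path are assumptions:

```sas
data _null_;
   set test;
   file 'C:\Temp\scores_report.txt';           /* hypothetical output file */
   if _n_ = 1 then                              /* header on the first iteration only */
      put @1 'ID' @12 'Score1' @22 'Score2';
   put @1 id @12 score1 6.1 @22 score2 6.1;    /* @n moves the pointer to column n */
run;
```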

To Save Graphical Output (a bar chart or scatter plot)


Graphs produced by traditional SAS/GRAPH procedures are stored in a catalog (by default WORK.GSEG), separate from the main text listing.


SAS also allows you to direct output into separate files in a variety of formats (such as PDF and HTML) using Output Delivery System (ODS) statements. Precise control and many formatting options are a strength of recent versions of SAS, though the beginner may be overwhelmed by the number of choices. ODS is described elsewhere.
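A minimal sketch of routing procedure output, including graphs, to HTML and PDF files with ODS. The file paths and the TEST dataset are hypothetical:

```sas
ods html file='C:\Temp\scores.html';   /* open an HTML destination */
ods pdf  file='C:\Temp\scores.pdf';    /* open a PDF destination at the same time */

proc univariate data=test;
   var score1;
   histogram score1 / normal;          /* the graph goes to both files */
run;

ods pdf close;                         /* close each destination to finish its file */
ods html close;
```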


See also Elaborations on Basic Analyses below.


Save a SAS Program

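Click in the Editor window and choose File and then Save. The program is saved as a plain-text file (with the extension .SAS). Note: the lines that the Enhanced Editor draws between steps are not saved with the program. Be careful with the ENDSAS; statement: submitting it ends the SAS session. See also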

Elaborations on Basic Analyses


Refining the Look of Your Output


System Options

Several system options, set with the OPTIONS statement, affect the appearance of SAS output:

Center | Nocenter: Center centers the output on the page; Nocenter left-justifies it.

Date | Nodate: With Date, the date and time at which the SAS session started appear at the top of each page of output; with Nodate they do not.

Number | Nonumber: This switch controls whether page numbers appear on each page of SAS output.

Linesize = n: Linesize controls the maximum length of output lines. Possible values for n are 64-256.

Pagesize = n: Pagesize controls the maximum number of lines per page of output. Possible values for n are 15-32767.


Example: The following example is used to demonstrate how to use options of center/nocenter, date/nodate, and linesize.
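A minimal sketch of such a program, assuming a dataset TEST with SCORE1 and SCORE2 (hypothetical). The second OPTIONS statement reverses the first two settings and widens the lines, so the two PROC MEANS listings can be compared:

```sas
options nocenter nodate linesize=80;   /* left-justify, no date, 80-character lines */
proc means data=test;
   var score1 score2;
run;

options center date linesize=120;      /* center output, show the date, longer lines */
proc means data=test;
   var score1 score2;
run;
```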



Here is the output:


Proc statement options

Statement options appear in individual statements and influence how SAS runs that particular DATA or PROC step. DATA=, for example, is a statement option you can use in any procedure that reads a SAS data set; it tells SAS which data set to use. Without it, SAS uses the most recently created data set.
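A minimal sketch of what happens with and without DATA=, assuming a dataset TEST with SCORE1 and SCORE2 (TEST2 and AVG are hypothetical names):

```sas
data test2;                   /* TEST2 becomes the most recently created data set */
   set test;
   avg = mean(score1, score2);
run;

proc print data=test;         /* DATA= names the data set explicitly */
run;

proc print;                   /* no DATA=: SAS uses the most recent data set, TEST2 */
run;
```

Naming the data set explicitly with DATA= avoids surprises when steps are later rearranged.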
