SAS Certification Essential Reading

See also examples and assignments.

Lecture Notes

Classroom for the workshop series

The following table is for a comprehensive review of the topics covered in ten sessions (see SAS Cert).

Start with Getting Started with SAS; reading the Little SAS Book is also very helpful and informative. The purple (2nd) edition of the Little SAS Book is quite close to the more recent light-blue edition and is on two-hour reserve at the library reference desk. The single copy of the maroon first edition is listed in circulation, so it is unlikely to be available.

Abbreviations

 * LSB - The Little SAS Book
 * SPGo - SAS Procedures Guide (online). The SAS Online Documentation still reflects a somewhat outdated (though still relevant at some sites) organization into categories called modules. The two major groups of SAS procedures are those in the BASE module and those in the STAT module; folks looking at time series will want to study a third module, SAS/ETS. BASE includes the basic data manipulation procedures and just a few statistical-analysis-like PROCs (CORR, MEANS, FREQ). STAT includes the larger selection of statistical analysis procedures (REG for regression, ANOVA, TTEST, GLM for General Linear Models, etc.), listed alphabetically by name.
 * SDLEo - SAS Dictionary of Language Elements (online). Its major sections, organized as a reference, are Statements (data step and global statements), Functions, Formats, and Informats.
 * SoD - refers in general to the SAS Online Documentation. Things like SAS Enterprise Miner, ODS (the Output Delivery System), SAS/GRAPH, and SAS/INSIGHT are substantial and are listed separately (as newer modules) from the older, original User, Reference, and Procedures taxonomy, which predates the modules being packaged and sold separately.

Commonly confused terminology
The following pairs are easy to tell apart after a brief explanation, but are confusable at face value when reading a SAS program (or talking about SAS programs and syntax). Beginners should treat most of them as learning objectives while studying the essential topics and working example problems.
 * dataset vs. data step - A data set is the spreadsheet/table-like object that is analyzed. It is read and prepared for analysis by a group of statements in the program, beginning with the DATA statement, called a data step. The term data step occasionally also refers to a single iteration (reading one observation) while the data step (the group of statements) is running.
 * statement vs. (a line) - A statement always ends with a semicolon; it can extend over several lines of a program, and one line can hold several statements.
 * function vs. procedure (PROC) - A function is used in a data step, often when assigning or creating a new variable for a particular calculation, and produces a single numeric or character string (text) result. A procedure is specified by a PROC statement plus several other statements, often produces printable tables of results, and may produce new files and data sets.
 * the data= option vs. the DATA statement - When writing a PROC statement, it is good practice to say which data set the procedure should use. This is done by including DATA=something on the PROC statement (though some PROCs use other options, like FILE= or IN=); if DATA= is omitted, the procedure uses the most recently created data set. A DATA statement (without the equals sign) is the first statement of a data step that creates a new data set, which is referenced later in the program by DATA= options.
 * variable vs. array - A variable is usually thought of as a vertical set of numbers/values indexed by observation number in a data set. An array is a group of variables (horizontal)...
 * option vs. an optional statement (in a PROC) - An option is part of a statement (it comes before the semicolon that ends the statement) and often, though not always, follows a slash (/). The PROC statement itself may carry several options that affect the behavior of the procedure (the type of analysis, the output that is generated). An optional statement (in a procedure) is one that follows the PROC statement (things like VAR, CLASS, BY, OUTPUT, etc.). Some statements are required for the procedure to work, but often when an optional statement is omitted, sensible defaults are used and the procedure still produces lots of output.
 * format vs. informat - A format determines how the value of a numeric (or character) variable is displayed, for example in procedure (PROC) output or by a PUT statement; it does not change the stored value. An informat is used less often: it is applied when reading raw text (with an INPUT statement in a data step, or the INPUT function) so that the text is properly converted to the internal representation of the number/value stored in the data set. Informats make it convenient to read values that were typed in varied or untidy ways during data entry (dollar signs, commas, dates), whereas formats add style and finesse to a printed report/table.
 * (an analysis) vs. expression - An analysis refers to manipulating data sets to produce some set of tables as a report; you will probably specify an analysis using several data steps and procedure steps. An expression is part of an assignment statement or another statement, usually within a data step. An expression evaluates to a single value (numeric/logical or a text string).
 * Short miscellaneous elements with multiple uses (rarely encountered by beginners; once you speak 'SAS' you'll start to understand them)
 * the IN= option vs. the IN operator - The IN= option is a data set option, used on a SET or MERGE statement, that creates a temporary variable (1 or 0) telling us in a data step whether a given data set contributed the current observation. The IN operator (also used in a data step, for example in an IF statement or an assignment) is followed by a parenthesized list of values separated by commas or blanks; the expression x in (1, 3, 7, 8) is true when the variable X takes one of the listed values.
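 * END= versus LAST.variable versus _LAST_ - END= is an option on the SET (or MERGE) statement that names a temporary variable whose value becomes 1 when the data step reads the last observation. LAST.varname has a value of 0 or 1 during BY-group processing in a data step; it is 1 for the last observation of each BY group. _LAST_ is not a variable but a special data set name that refers to the most recently created data set; a PROC uses it when the DATA= option is omitted.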
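 * OBS= versus ODS statement - OBS= (OBS means observation) is a data set option that sets the number of the last observation to read from a data set, so OBS=10 reads at most the first 10 observations. An ODS (Output Delivery System) statement is a global statement, usually placed between data steps and PROC steps, that specifies where the output of the following analyses should go (HTML, PDF, RTF, listing) and which tables are produced.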
 * MDY function versus MMDDYYw. (in)format, YEAR function versus YEARw. format - MDY converts three numbers (a month value 1-12, a day value 1-31, and a year value) to a SAS date value. The MMDDYYw. informat reads dates in a text file that always use at least two digits for each of the month, day and year values, possibly with separators or some space padding; use the DATEw. informat if the text input file uses three-letter month abbreviations (e.g. 17MAR2024). The MMDDYYw. informat has no separator variations, but the MMDDYYxw. family of formats does (x is, e.g., S for slash, C for colon, D for dash), producing output with slashes, colons, dashes, etc. The YEAR function takes a SAS date value and returns an integer year. The YEARw. format displays a SAS date value as just its year, as a two- or four-digit number (YEAR2. or YEAR4.).
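 * N function vs. _N_ - The N function takes a list of numeric variables (or values) separated by commas and returns how many of them are nonmissing (the NMISS function counts the missing ones). _N_ is an automatic variable that counts the iterations of the data step; with a simple SET statement it equals the number of the observation being read.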
 * DO statement, %DO, END statement, %END, %MEND, IF statement (and THEN), %IF (and %THEN) - The plain statements cause repetition or grouping of statements at run time, for each observation read (each iteration of the data step). The macro statements (those with a percent sign (%)) are normally used inside a macro definition (%MACRO ... %MEND) and cause repetition or generate text conditionally when the macro executes, which happens before the generated code is compiled and before any data is read.
 * PUT statement versus PUT function versus %PUT - %PUT (a macro statement, as above) takes effect when the macro processor encounters it, not at data step run time. A PUT statement writes something to the log (or to another destination named by a FILE statement) while the SAS program runs. The PUT function (and PUTN and PUTC, in a data step) applies a format to a value while an expression is evaluated and returns a character value.
 * On an INPUT statement - (just as a trivia test) compare the uses of the symbols / . - -- : + # $ @ and @@
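The data set vs. data step, function vs. procedure, and DATA statement vs. DATA= option distinctions can all be seen in one short program. This is a minimal sketch; the data set WORK.SCORES and its variables are hypothetical.

```sas
/* DATA statement: begins a data step that creates the data set WORK.SCORES */
data work.scores;
  input id test1 test2;          /* read three values from each data line          */
  avg = mean(test1, test2);      /* assignment: the expression calls MEAN, giving   */
                                 /* one value per observation (missing is ignored)  */
  datalines;
1 80 90
2 70 .
3 95 85
;
run;

/* DATA= option on the PROC statement: which data set PROC MEANS analyzes */
proc means data=work.scores;
  var avg;
run;
```

AVG works out to 85, 70 and 90 for the three observations; the procedure then summarizes those values in a printed table.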
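For the variable vs. array pair, a minimal sketch of an array: the array name ANSWERS refers to three variables of the current observation, so a DO loop can process them in turn. The array exists only while the data step runs and is not stored in the data set. The variables q1-q3 and the convention that 99 means "no answer" are hypothetical.

```sas
data work.survey;
  input id q1 q2 q3;
  /* ANSWERS groups q1, q2, q3 of the current observation (horizontal) */
  array answers{3} q1-q3;
  do i = 1 to 3;
    if answers{i} = 99 then answers{i} = .;   /* hypothetical: 99 = no answer */
  end;
  drop i;                                     /* the loop counter is not needed */
  datalines;
1 4 99 5
2 99 3 2
;
run;
```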
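To contrast options with optional statements, here is a sketch that assumes the SASHELP.CLASS sample data set shipped with SAS is available (19 students with variables Name, Sex, Age, Height, Weight).

```sas
/* Options on the PROC statement: DATA=, MAXDEC=, and the statistic keywords N MEAN STD */
proc means data=sashelp.class maxdec=2 n mean std;
  class sex;            /* optional statement: results per value of SEX                  */
  var height weight;    /* optional statement: without it, all numeric variables are used */
run;

/* An option after a slash belongs to the statement it appears in */
proc freq data=sashelp.class;
  tables sex*age / nocol nopercent;   /* NOCOL, NOPERCENT: options of the TABLES statement */
run;
```

Dropping the CLASS and VAR statements still gives a valid PROC MEANS step; it simply falls back to its defaults.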
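The format vs. informat pair in a minimal sketch with hypothetical salary data: the informat controls how the raw text is read, the format only how the stored number is shown.

```sas
data work.pay;
  /* INFORMAT: COMMA10. reads text such as 52,000.00 into the number 52000 */
  input name $ salary :comma10.;
  /* FORMAT: changes only the display, e.g. $52,000.00; the stored value is unchanged */
  format salary dollar12.2;
  datalines;
Ann 52,000.00
Bob 48,250.5
;
run;

proc print data=work.pay;
run;
```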
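The IN= option and the IN operator can appear in the same data step. In this minimal sketch the two input data sets, their variables, and the value list are hypothetical; both inputs are already sorted by ID, as MERGE with BY requires.

```sas
data work.left;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data work.right;
  input id y;
  datalines;
2 200
3 300
4 400
;
run;

data work.both;
  /* IN= data set option: INL / INR are 1 when that data set contributed this observation */
  merge work.left(in=inL) work.right(in=inR);
  by id;
  if inL and inR;                 /* subsetting IF: keep IDs found in both data sets */
  special = (id in (1, 3, 7, 8)); /* IN operator: 1 when ID is one of the listed values */
run;                              /* INL and INR are not written to WORK.BOTH */
```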
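A sketch of END=, LAST.variable, and _LAST_ together, assuming the SASHELP.CLASS sample data set is available; the output data set names are hypothetical. PROC SORT comes first because BY-group processing needs data sorted by the BY variable.

```sas
proc sort data=sashelp.class out=work.class;
  by sex;
run;

data work.counts;
  set work.class end=eof;    /* END=: EOF is 1 only while the last observation is read */
  by sex;                    /* BY creates FIRST.SEX and LAST.SEX                       */
  if first.sex then n = 0;
  n + 1;                     /* sum statement: N is retained across iterations          */
  if last.sex then output;   /* write one observation per SEX group                     */
  if eof then put 'Last observation read at iteration ' _n_=;
  keep sex n;
run;

/* No DATA= option: PROC PRINT uses _LAST_, the most recently created data set */
proc print;
run;
```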
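OBS= and an ODS statement in a minimal sketch, assuming SASHELP.CLASS is available; the HTML file name is hypothetical and is written to the current working directory.

```sas
/* OBS= data set option: read only observations 1 through 5 */
proc print data=sashelp.class(obs=5);
run;

/* ODS statements are global: output from the next step also goes to an HTML file */
ods html file='class_height.html';   /* hypothetical file name */
proc means data=sashelp.class;
  var height;
run;
ods html close;                      /* close the HTML destination */
```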
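The date function, informat, and formats can be compared on the same values. This is a minimal sketch with hypothetical data; the slash and dash styles in the first column are both read by the MMDDYY10. informat.

```sas
data work.dates;
  /* MMDDYY10. informat: reads 03/17/2024 or 12-01-1999 into SAS date values */
  input d1 :mmddyy10. mon day yr;
  d2 = mdy(mon, day, yr);   /* MDY function: three numbers -> SAS date value   */
  y  = year(d1);            /* YEAR function: SAS date value -> integer year    */
  d3 = d1;                  /* same date value, displayed with another format   */
  format d1 mmddyy10.       /* displays 03/17/2024                              */
         d2 mmddyyd10.      /* MMDDYYx family, D = dashes: displays 03-17-2024  */
         d3 year4.;         /* YEARw. format: displays only the year, 2024      */
  datalines;
03/17/2024 3 17 2024
12-01-1999 12 1 1999
;
run;
```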
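The N and NMISS functions versus the automatic variable _N_, in a minimal sketch with hypothetical data. _N_ is not written to the output data set, so it is copied into the hypothetical variable ITERATION to make it visible.

```sas
data work.check;
  input a b c;
  n_present = n(a, b, c);      /* N: count of nonmissing numeric arguments */
  n_missing = nmiss(a, b, c);  /* NMISS: count of missing arguments        */
  iteration = _n_;             /* _N_: data step iteration counter         */
  datalines;
1 2 3
4 . 6
. . 9
;
run;
```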
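The run-time DO loop and the macro %DO loop side by side. This is a sketch; the macro PRINT_FIRST and the data set WORK.SQUARES are hypothetical. The %DO loop does not loop over observations: it writes PROC PRINT code once per value of the macro variable K, and SAS then runs that generated code.

```sas
/* DO loop: runs while the data step executes */
data work.squares;
  do i = 1 to 3;
    sq = i**2;
    output;              /* one observation per pass through the loop */
  end;
run;

/* %DO loop: runs in the macro processor and generates SAS code */
%macro print_first(n);   /* hypothetical macro */
  %do k = 1 %to &n;
    title "Observation &k";
    proc print data=work.squares(firstobs=&k obs=&k);
    run;
  %end;
  title;                 /* clear the title afterwards */
%mend print_first;

%print_first(2)
```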
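The three kinds of PUT in one sketch; the macro variable COURSE and the data set WORK.LABELS are hypothetical.

```sas
%let course = SAS Certification;   /* hypothetical macro variable */
/* %PUT: written to the log by the macro processor, before the data step below runs */
%put NOTE: Preparing for &course;

data work.labels;
  x = 1234.5;
  /* PUT function: apply the DOLLAR10.2 format and return a character value */
  x_text = put(x, dollar10.2);
  /* PUT statement: write to the log while the data step executes */
  put 'PUT statement at run time: ' x= x_text=;
run;
```

X_TEXT is stored as text ($1,234.50, right-aligned in 10 characters), so it can no longer be used in arithmetic.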
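For the INPUT-statement trivia item, a hypothetical example covering several of the symbols ($, a column range with -, +, /, : and @@; a period ends each informat name such as 2.). The data lines are laid out so that NAME fills columns 1-10 and AGE sits in columns 12-13.

```sas
data work.demo;
  input name $ 1-10      /* $ = character; 1-10 = column input                    */
        +1 age 2.        /* pointer is at column 11; +1 moves it to 12; 2. informat */
        / score1 :3.     /* / = continue on the next data line (#2 would also work) */
        score2;          /* plain list input                                       */
  datalines;
Alice      13
90 85
Bob        12
77 91
;
run;

/* @@ holds the data line so several observations can be read from one line */
data work.pairs;
  input x y @@;
  datalines;
1 2 3 4 5 6
;
run;
```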