


    9/3/2007

    SAS Study Notes (1): Basic Concepts

    For the learning path of SAS OnlineTutor®: Basic and Intermediate SAS®, see "SAS Study Notes: Opening Post".


    SAS Programs

    A SAS program can consist of a DATA step or a PROC step or any combination of DATA and PROC steps.

    DATA steps typically create or modify SAS data sets. They can also be used to produce custom-designed reports.

    PROC (procedure) steps are pre-written routines that enable you to analyze and process the data in a SAS data set and to present the data in the form of a report. They sometimes create new SAS data sets that contain the results of the procedure. PROC steps can list, sort, and summarize data.

    SAS programs consist of SAS statements. A SAS statement has two important characteristics:

    • It usually begins with a SAS keyword.
    • It always ends with a semicolon.
    A DATA step begins with a DATA statement, which begins with the keyword DATA. A PROC step begins with a PROC statement, which begins with the keyword PROC.

    SAS statements are free-format. This means that

    • they can begin and end anywhere on a line
    • one statement can continue over several lines
    • several statements can be on a line.

    Blanks or special characters separate "words" in a SAS statement.
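
    As a small illustration of free-format statements, the sketch below writes one program in a conventional layout and then again with several statements on one line and one statement split across lines. It assumes the SASHELP.CLASS sample data set that ships with SAS is available; the layout choices are only for demonstration.

```sas
/* Conventional layout: one statement per line */
proc print data=sashelp.class;
   var name age;
run;

/* Same statements in free format: the PROC statement is split across
   two lines, and the VAR and RUN statements share a line */
proc print
   data=sashelp.class;
var name age; run;
```

    Both versions should produce the same listing, because SAS relies on semicolons rather than line breaks to find the end of each statement.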

    DATA and PROC statements signal the beginning of a new step. When SAS encounters a subsequent DATA, PROC, or RUN statement (for DATA steps and most procedures) or a QUIT statement (for some procedures), SAS stops reading statements and executes the previous step in the program. In our sample program, each step ends with a RUN statement.

    The beginning of a new step (DATA or PROC) implies the end of the previous step. Though the RUN statement is not always required between steps in a SAS program, using it can make the SAS program easier to read and debug, and it makes the SAS log easier to read.
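
    The step boundaries described above can be sketched as follows. This is a minimal example, not the sample program from the tutorial; it assumes the SASHELP.CLASS sample data set is available, and Work.Teens is a hypothetical data set name.

```sas
data work.teens;               /* DATA statement: a new step begins */
   set sashelp.class;
   where age >= 13;
run;                           /* RUN: SAS executes the DATA step */

proc sort data=work.teens;     /* PROC statement: a new step begins */
   by age;
                               /* no RUN here: the next PROC statement
                                  implies the end of PROC SORT */
proc print data=work.teens;
run;                           /* RUN: SAS executes PROC PRINT */
```

    Adding a RUN statement after the BY statement would not change the result, but it would make the boundary between the two PROC steps easier to see in the program and in the log.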


    SAS Libraries

    Every SAS file is stored in a SAS library, which is a collection of SAS files. A SAS data library is the highest level of organization for information within SAS.

    SAS libraries have different implementations depending on your operating environment, but a library generally corresponds to the level of organization that your host operating system uses to access and store files. In some operating environments, a library is a physical collection of files. In others, the files are only logically related.

    Windows, UNIX, OpenVMS, OS/2 (directory-based systems): a group of SAS files that are stored in the same directory. Other files can be stored in the directory, but only the files that have SAS file extensions are recognized as part of the SAS data library.

    CMS: a group of SAS files that have the same file type.

    z/OS (OS/390): a specially formatted host data set in which only SAS files are stored.

    Depending on the library name that you use when you create a file, you can store SAS files temporarily or permanently.

    Storing files temporarily:  If you don't specify a library name when you create a file (or if you specify the library name Work), the file is stored in the temporary SAS data library. When you end the session, the temporary library and all of its files are deleted.

    Storing files permanently:  To store files permanently in a SAS data library, you specify a library name other than the default library name Work.


    Referencing SAS Files

    To reference a permanent SAS data set in your SAS programs, you use a two-level name: libref.filename. In the two-level name, libref is the name of the SAS data library that contains the file, and filename is the name of the file itself. A period separates the libref and filename.

    To reference temporary SAS files, you can specify the default libref Work, a period, and the filename. For example, the two-level name Work.Test references the SAS data set named Test that is stored in the temporary SAS library Work. Alternatively, you can simply use a one-level name (the filename only) to reference a file in a temporary SAS library. When you specify a one-level name, the default libref Work is assumed.
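
    A short sketch of temporary and permanent references follows. The notes above do not say how a libref is assigned; the LIBNAME statement used here is the usual way to do it. The libref mylib and the directory path are hypothetical, so replace the path with a directory that exists on your system.

```sas
/* Hypothetical libref and path: adjust the directory before running */
libname mylib 'C:\sasdata';

data mylib.test;       /* permanent: two-level name with a libref other than Work */
   set sashelp.class;
run;

data work.test;        /* temporary: two-level name with the Work libref */
   set sashelp.class;
run;

data test;             /* temporary: one-level name, Work is assumed */
   set sashelp.class;
run;
```

    The last two steps write to the same data set, Work.Test, which is deleted when the session ends. Mylib.Test stays in the directory that the libref points to.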

    Rules for SAS Names: SAS data set names

    • can be 1 to 32 characters long
    • must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_)
    • can continue with any combination of numbers, letters, or underscores.
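
    The naming rules can be checked with a few hypothetical data set names. The invalid names are kept inside a comment so that the sketch still runs.

```sas
/* Valid: begins with a letter, continues with letters, digits, underscores */
data work.sales_2007;
   x = 1;
run;

/* Valid: begins with an underscore */
data work._q3_totals;
   x = 1;
run;

/* Invalid names, left as a comment so the program still runs:
   2007sales   (begins with a digit)
   sales-2007  (contains a hyphen)                              */
```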


    SAS Data Sets

    A SAS data set is a file that consists of two parts: a descriptor portion and a data portion.

    The descriptor portion of a SAS data set contains information about the data set, including

    • the name of the data set
    • the date and time that the data set was created
    • the number of observations
    • the number of variables.

    The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table.

    Rows (called observations) in the data set are collections of data values that usually relate to a single object.

    Columns (called variables) in the data set are collections of values that describe a particular characteristic.
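
    One way to look at the two portions of a data set is sketched below, assuming the SASHELP.CLASS sample data set is available. PROC CONTENTS reports the descriptor portion, and PROC PRINT lists the data portion.

```sas
/* Descriptor portion: data set name, creation date and time,
   number of observations, number of variables, variable attributes */
proc contents data=sashelp.class;
run;

/* Data portion: observations as rows, variables as columns */
proc print data=sashelp.class;
run;
```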


    Variable Attributes

    Name: Each variable has a name that conforms to SAS naming conventions. Variable names follow exactly the same rules as SAS data set names. Like data set names, variable names

    • can be 1 to 32 characters long
    • must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_)
    • can continue with any combination of numbers, letters, or underscores.

    Type: A variable's type is either character or numeric.

    • Character variables, such as Name (shown below), can contain any values.
    • Numeric variables, such as Policy and Total (shown below), can contain only numeric values (the digits 0 through 9, +, -, ., and E for scientific notation).

    A variable's type determines how missing values for a variable are displayed. In the following data set, Name and Sex are character variables, and Age and Weight are numeric variables.

    • For character variables such as Name, a blank represents a missing value.
    • For numeric variables such as Age, a period represents a missing value.
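
    The display of missing values can be tried with a two-row data set. This is a minimal sketch; Work.Missing_demo and its values are made up for illustration.

```sas
data work.missing_demo;
   length name $ 8;
   name = 'Ann'; age = 12; output;
   name = ' ';   age = .;  output;   /* both values are missing in this row */
run;

/* In the listing, the missing Name appears as a blank
   and the missing Age appears as a period */
proc print data=work.missing_demo;
run;
```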

    Length: A variable's length (the number of bytes used to store it) is related to its type.

    • Character variables can be up to 32K (32,767 bytes) long. In the example below, Name has a length of 20 characters and uses 20 bytes of storage.
    • All numeric variables have a default length of 8. Numeric values (no matter how many digits they contain) are stored as floating-point numbers in 8 bytes of storage, unless you specify a different length.
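
    The LENGTH statement is one way to set these lengths explicitly. In the sketch below the data set and variable names are hypothetical, and the numeric length of 4 is only an example of overriding the default of 8.

```sas
data work.lengths;
   length name $ 20;    /* character variable stored in 20 bytes */
   length count 4;      /* numeric variable stored in 4 bytes instead of 8 */
   name  = 'Example';
   count = 10;
   total = 1234.5;      /* numeric variable with the default length of 8 */
run;

/* The Len column of the PROC CONTENTS output shows the stored lengths */
proc contents data=work.lengths;
run;
```

    Shortening a numeric variable saves storage but reduces the range of integers it can hold exactly, so it is usually reserved for small whole numbers.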

    Format: Formats are variable attributes that affect the way data values are written. SAS software offers a variety of character, numeric, and date and time formats. You can also create and store your own formats. To write values out using some particular form, you select the appropriate format.


    For example, to display the value 1234 as $1234.00 in a report, you can use the DOLLAR8.2 format.
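
    The effect of a format can be tried in a DATA step that writes to the log. This is a small sketch; the variable name amount is hypothetical, and the extra formats are added only for comparison.

```sas
data _null_;
   amount = 1234;
   put amount dollar8.2;   /* the case from the text: width 8 leaves no room for the comma */
   put amount dollar9.2;   /* one more column of width: $1,234.00 */
   put amount comma8.2;    /* commas without a dollar sign: 1,234.00 */
   put amount;             /* stored value is unchanged: 1234 */
run;
```

    The format changes only how the value is written. The stored value of amount stays 1234 throughout.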

    Informat: Whereas formats write values out using some particular form, informats read data values in certain forms into standard SAS values. Informats determine how data values are read into a SAS data set. You must use informats to read numeric values that contain letters or other special characters.


    For example, the numeric value $12,345.00 contains two special characters, a dollar sign ($) and a comma (,). You can use an informat to read the value while removing the dollar sign and comma, and then store the resulting value as a standard numeric value.
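
    Reading such a value can be sketched with the COMMAw.d informat, which strips dollar signs and commas while reading. The data set name Work.Payments and the variable name amount are hypothetical.

```sas
data work.payments;
   input amount comma10.;   /* read 10 columns, removing $ and commas */
   datalines;
$12,345.00
;
run;

/* The stored value is the standard numeric value 12345 */
proc print data=work.payments;
run;
```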

    Label: A variable can have a label, which consists of descriptive text up to 256 characters long.
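
    A label can be assigned with the LABEL statement. The sketch below assumes the SASHELP.CLASS sample data set is available; the data set name Work.Labeled and the label text are made up for illustration.

```sas
data work.labeled;
   set sashelp.class;
   label weight = 'Weight in pounds';   /* descriptive text stored with the variable */
run;

/* The LABEL option asks PROC PRINT to use labels as column headings */
proc print data=work.labeled label;
run;
```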


    Comments (5)


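    Nov. 9
    Nov. 3
    Jiangtang Hu wrote:
    To 陈靓: it is just as unstable for me.
    Sept. 12
    靓 陈 wrote:
    Lao Hu, I have studied up to page 38 of chapter 4, and nothing after that will open no matter what I try. Have you run into this problem?
    Sept. 7
    玲 陈 wrote:
    Hello, I was lucky to find your blog. I would very much like to exchange knowledge about data mining with you, if that is all right. My MSN is chenling_tracy@hotmail.com, and I look forward to your reply.
    Sept. 6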
