S17 SAS Workshop: Getting comfortable with my data in SAS. Descriptive Statistics

PDF Copy of Online notes – 20170711

Quick Review of reading Data into SAS

Preparing Data

1. Variable names in the first row – make sure they are appropriate for the statistical software you are using.  For more information check out the Best Practices for Entering your Research Data using Excel
2. Save your Excel file as a CSV – if you are using the INFILE statement.  Please note for Mac users, you MUST save as MSDOS-CSV!

SAS Studio Users

2. Remember to right-click on the file once it is in My Files to obtain its location for the INFILE statement.

Copying and Pasting from Excel

1. With smaller datasets this works fine
2. But you need to remember where your MASTER dataset is!!

Data tuesday;
input ID group\$ trmt age height eye_colour;
datalines;
1  a  1  39  137  2
2  a  1  35  140  2

;
Run;

Using an INFILE statement

Data tuesday;

infile “C:\Users\edwardsm\Documents\Workshops\SAS\Level_I\SASI_2\dataset.csv”                  dlm=”,” firstobs =2 missover;
input ID group\$ trmt age height eye_colour;
Run;

Use a Proc Print – to make sure that SAS has read in your data correctly.  ALWAYS read the LOG window.  You will see how many lines of observations are in the file and how many variables were read.  You should also see information about the data your read in.  If you’re using the INFILE statement, you will see characteristics about the file.

Proc print data=tuesday;
Run;

Do you know what group, trmt represent?  We can probably guess what age, height, and eye_colour mean, but would you know what units age and height were measured in?  Without a codebook or information, such as labels for the variables and value labels for the variable values, you would be guessing!

In SAS, and with many other statistical programs, you can add both a variable label and value labels.

Whenever you work with the data, you need to be working in a DATA step.  Drawing parallels to Excel, you will need to open a new dataset or excel worksheet, make the changes and then save it.  In SAS, you will create a new DATA Step, make the changes to the variable(s), and save it.

Data tuesday_new;
set tuesday;        * this tells SAS that you want to use the dataset called tuesday that you                                    created earlier;
label
group = “Individuals on the trial were randomly assigned to 4 groups”
trmt = “Treatments were assigned within each group”
age = “Age of the participant in years”
height = “Height taken of the participants at the end of the trial, measured in cm”
eye_colour = “Colour of the participants’ eyes”;
Run;

To view these changes, try a Proc print – what happens??

Try the following:

Proc Contents data=tuesday_new;
Run;

What do you see?

ProcContents_labels

Sometimes you will collect variables that are coded.  Rather than writing Blue eyes, brown eyes, you might provide them with a code such as 1,2, etc…  But how do you remember what code you gave what value?  Writing it down on a piece of paper is fine, but what if you misplace that paper?  Adding value labels to your data is a great way to keep all the information together.

To accomplish this in SAS, it is a 2-step process.  We need to create the codes and their labels first, and then we need to apply these to the variables in the dataset.  This allows you to re-use the labels.

Creating the value labels

Proc format;
value \$groupformat
a = “Group A – Monday morning”
b = “Group B – Monday afternoon”
c = “Group C – Tuesday morning”
d = “Group D – Tuesday afternoon”;

value trmtformat
1 = “Treatment 1 – Placebo”
2 = “Treatment 2 – Vitamin C”;
Run;

This creates SAS formats.  One called groupformat and another called trmt format.  Think of these as boxes that say a represents Group A – Monday morning, etc..

Applying the value formats to the data

Remember that we are touching the data or making changes to the data, so we need to use a Data Step.  Let’s re-use the one where we added variable labels:

Data tuesday_new;
set tuesday;

label
group = “Individuals on the trial were randomly assigned to 4 groups”
trmt = “Treatments were assigned within each group”
age = “Age of the participant in years”
height = “Height taken of the participants at the end of the trial, measured in cm”
eye_colour = “Colour of the participants’ eyes”;

format
group groupformat.
trmt trmtformat.

Run;

Permanent vs Temporary SAS Datasets

We we work with SAS, when you look in the LOG window you see reference to something called WORK.TUESDAY or WORK.TUESDAY_NEW.  We didn’t add the WORK part, so where did that come from?

SAS organizes the data it reads in a Library.  The default library is called the WORK library.  This is temporary, which means that when I shut down SAS, all the datasets that were read into SAS are deleted.  Your original Excel files are still there, as is your SAS coding (if you saved it).  But any of the temporary SAS datasets are deleted.

We can create permanent SAS datasets however.  These will be physical files with the file ending of .sas7bdat  For extremely large files, this may be the best way to handle them.  Read them into SAS once and save them.

To do this we need to create a SAS library reference to a physical location on our laptop/computer.

libname sasdata “C:\Users\edwardsm\Documents\Workshops\SAS”;

This maps the location to the SAS libraries in the “black box” of the SAS program.  To save a permanent SAS datafile to this location we do the following:

Data sasdata.tuesday_new;
set tuesday_new;
Run;

We simply change the first name of WORK to our library name SASDATA.  Check out your log window to see what happened!  Also check your computer to see if you can find that file.

NB: I’m not sure how this works with SAS Studio!

Descriptive Statistics

We will run Proc freq and Proc means to describe the data we have just read.

Here is a link to the SAS_20170609_ME that was used in this workshop.