Ridgetown Workshop – August 2, 2017

SAS program files:

Data for balanced RCBD example

Complete SAS program for balanced RCBD example

Data rcbd;
input block trmt Nitrogen;
datalines;
1 1 34.98
1 2 40.89
1 3 42.07
1 4 37.18
1 5 37.99
1 6 34.89
2 1 41.22
2 2 46.69
2 3 49.42
2 4 45.85
2 5 41.99
2 6 50.15
3 1 36.94
3 2 46.65
3 3 52.68
3 4 40.23
3 5 37.61
3 6 44.57
4 1 39.97
4 2 41.9
4 3 42.91
4 4 39.2
4 5 40.45
4 6 43.29
;
Run;

Data for unbalanced RCBD example

Complete SAS program for unbalanced RCBD example

Data rcbd_unb;
input block trmt Nitrogen;
datalines;
1 1 34.98
1 2 40.89
1 3 42.07
1 4 37.18
1 5 37.99
1 6 34.89
2 1 41.22
2 2 46.69
2 3 49.42
2 4 45.85
2 5 41.99
2 6 50.15
3 2 46.65
3 3 52.68
3 4 40.23
3 5 37.61
3 6 44.57
4 1 39.97
4 3 42.91
4 4 39.2
4 5 40.45
4 6 43.29
;
Run;
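Before analysing the unbalanced data, it can help to confirm which observations are missing. Here is a minimal sketch using Proc FREQ on the `rcbd_unb` dataset above. Each cell of a balanced RCBD holds exactly one observation. For this dataset, the table should show a count of 0 for block 3 / treatment 1 and for block 4 / treatment 2.

```sas
/* Sketch: cross-tabulate block by treatment to see which cells are empty. */
/* In a balanced RCBD every cell would contain exactly 1 observation.      */
proc freq data=rcbd_unb;
  tables block*trmt / norow nocol nopercent;   /* show raw counts only */
  title "Observations per block and treatment - unbalanced RCBD";
run;
```

Running the same step with `data=rcbd` should show a 1 in every cell.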

Data for Repeated Measures example

Complete SAS program for Repeated Measures example

Data repeated;
input ID Room trmt day wt;
datalines;
1 1 1 1 13
2 1 1 1 17
3 1 1 1 13
4 1 2 1 16
5 1 2 1 17
6 1 2 1 17
1 2 1 2 22
2 2 1 2 24
3 2 1 2 20
4 2 2 2 23
5 2 2 2 22
6 2 2 2 23
1 3 1 3 36
2 3 1 3 38
3 3 1 3 46
4 3 2 3 45
5 3 2 3 45
6 3 2 3 32
;
Run;
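The complete program is in the linked file. The following is only a sketch of how a repeated measures model might be coded in Proc MIXED, using the REPEATED statement. It rests on several assumptions. Each ID is one subject weighed on days 1 to 3. Each ID receives a single treatment. The weights taken on the same ID follow a compound-symmetry covariance structure (`type=cs`). Room is left out because, in this dataset, its values are identical to day.

```sas
/* Sketch only: repeated measures on each ID across days.               */
/* Assumption: compound symmetry (type=cs) among an ID's repeated weights. */
proc mixed data=repeated;
  class ID trmt day;
  model wt = trmt day trmt*day;        /* treatment, time, and their interaction */
  repeated day / subject=ID type=cs;   /* weights on the same ID are correlated */
  title "Repeated measures sketch - Proc MIXED";
run;
```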

Data for Count example

Complete SAS program for Count data example

Data trial;
input trmt$ block count;
datalines;
A 1 69
A 2 56
A 3 20
A 4 63
B 1 69
B 2 72
B 3 74
B 4 82
C 1 87
C 2 72
C 3 80
C 4 95
D 1 78
D 2 72
D 3 50
D 4 94
;
Run;
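Counts are not normally distributed, which is where Proc GLIMMIX comes in. Here is a minimal sketch, not the workshop's own program. It assumes a Poisson distribution with a log link and treats blocks as a random effect. The `ilink` option reports the treatment means back on the count scale.

```sas
/* Sketch: Poisson generalized linear mixed model for the count data. */
proc glimmix data=trial;
  class trmt block;
  model count = trmt / dist=poisson link=log;   /* counts modelled on the log scale */
  random block;                                 /* assumption: block is random */
  lsmeans trmt / ilink;                         /* means converted back to counts */
  title "Count data sketch - Proc GLIMMIX";
run;
```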

S17 SAS Workshop: Proc GLM, Proc MIXED, Proc GLIMMIX – an overview – RCBD

Notes For the CRD and RCBD Workshop – PDF file

This workshop looks at a Randomized Complete Block Design (RCBD) analyzed in Proc GLM, Proc MIXED, and Proc GLIMMIX.  The goal is to review the similarities and differences in how the three procedures are coded, and in their respective outputs.

The SAS program can be found here – please note that it is a PDF file

Proc GLM Results

Proc MIXED Results

Proc GLIMMIX Results
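The full workshop program is in the PDF. The sketch below shows one common way to code the balanced `rcbd` data in each procedure, so that the coding differences can be seen side by side. It assumes blocks are a fixed effect in GLM and a random effect in MIXED and GLIMMIX. With balanced data like this, the treatment F test is typically the same across the three.

```sas
/* Sketch: the same RCBD analysis coded three ways.                      */
/* Assumption: block is fixed in GLM and random in MIXED and GLIMMIX.    */
proc glm data=rcbd;
  class block trmt;
  model Nitrogen = block trmt;     /* block enters the model as a fixed effect */
  title "RCBD - Proc GLM";
run;
quit;

proc mixed data=rcbd;
  class block trmt;
  model Nitrogen = trmt;           /* only fixed effects go on the MODEL statement */
  random block;                    /* block as a random effect */
  title "RCBD - Proc MIXED";
run;

proc glimmix data=rcbd;
  class block trmt;
  model Nitrogen = trmt;
  random block;
  title "RCBD - Proc GLIMMIX";
run;
```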

S17 SAS Workshop: Proc GLM, Proc MIXED, Proc GLIMMIX – an overview – CRD

Notes For the CRD and RCBD Workshop – PDF file

The goals of this workshop are:

  • to compare Proc GLM, Proc MIXED, and Proc GLIMMIX, using a Completely Randomized Design (CRD) as the example, by:
    • showing coding differences
    • showing output differences
  • to provide guidelines/explanations as to why and when you would use GLM, MIXED, and GLIMMIX

Proc GLIMMIX appears to be the “new” kid on the block when it comes to analyzing our data.  Believe it or not, though, GLIMMIX has existed for many years; it simply never caught on until a few years ago.  Many of us are now relearning our traditional analysis methods in SAS and converting to GLIMMIX.

There will be several workshops that concentrate on the use of Proc GLIMMIX.  The idea is to start with straightforward experimental designs and then increase the complexity, to showcase the strengths of GLIMMIX and maybe convince you to make the switch to this more robust SAS procedure.  This workshop uses the basic Completely Randomized Design primarily to show coding and output differences among the 3 procedures.

Completely Randomized Design

Our fictitious design has 6 treatments (A, B, C, D, F, G) with 4 observations per treatment. Our Null Hypothesis states that all treatment means are equal; our Alternative Hypothesis states that at least 2 treatment means are not equal.  The model that reflects this design is:

Outcome variable(Weight) = overall mean + Treatment effect + residual error
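In symbols, let $y_{ij}$ be the weight of observation $j$ in treatment $i$, $\mu$ the overall mean, and $\tau_i$ the effect of treatment $i$. The model and hypotheses above can then be written as follows, under the usual ANOVA assumption of independent, normally distributed errors:

```latex
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, 6, \quad j = 1, \dots, 4,
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)

H_0: \tau_1 = \tau_2 = \cdots = \tau_6
\qquad
H_A: \tau_i \neq \tau_{i'} \text{ for at least one pair } i \neq i'
```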

To read in the data we will use a Data Step as follows:

/***************************************************************************/
/* Reading data gathered from a CRD conducted across 6 treatments */
/* Variables are 6 treatments and weight collected in hypothetical units */
/* This is a dummy dataset created for the purposes of a demo and workshop */
/* Created by A.M.Edwards May 23, 2017 */
/***************************************************************************/

Data crd;
input ID trmt$ weight;
datalines;
1 A 41
2 A 24
3 A 33
4 A 38
5 B 24
6 B 21
7 B 16
8 B 43
9 C 46
10 C 33
11 C 14
12 C 19
13 D 32
14 D 38
15 D 15
16 D 17
17 F 31
18 F 15
19 F 36
20 F 46
21 G 28
22 G 40
23 G 37
24 G 39
;
Run;

History of ANOVA analyses in SAS

1966 – SAS is released with Proc ANOVA, which is to be used with:

  • balanced data ONLY!
  • FIXED effects ONLY!
  • NOTE from SAS Online Docs: “Caution: If you use PROC ANOVA for analysis of unbalanced data, you must assume responsibility for the validity of the results.”

1976 – SAS released Proc GLM

  • balanced (Type I SS) and unbalanced (Type III SS)
  • RANDOM statement introduced – provides the EMS (expected mean squares) equations, but you need to do the calculations!

1992 – Proc MIXED

  • RANDOM statement incorporated
  • REPEATED statement introduced
  • “Normally distributed” data ONLY
  • linear effects

1992 – Proc GENMOD

  • Non-normal data
  • Fixed effects ONLY

xxxx? – Proc NLMIXED

  • normal, binomial, Poisson distributions
  • nonlinear effects

2005 – Proc GLIMMIX

  • Proc MIXED
  • Proc NLMIXED
  • Non-normal data

Proc GLM – General Linear Model

Proc GLM was the second generation PROCedure developed in SAS to conduct ANOVAs (analysis of variance).  This Proc is still used today for situations where you have a FIXED effects model and a balanced design – same number of observations in each treatment group.

Proc glm data=crd;
  class trmt;
  model weight = trmt;
  title "Proc GLM Results";
Run;
Quit;

Proc glm – calls on the GLM Procedure.  data=crd – specifies the dataset which you want Proc GLM to use.

Class statement – list your classification variables here.  Think of these as the variables that tell you which group each observation falls into.

Model statement – this should be based on your experimental design.  In this case we have a CRD, so the model is: dependent variable = independent variable (our fixed effect).

Title statement – another great little habit to start.  Create a title statement for each procedure you use.  This way every piece of output has a title at the top, and you will never again have to guess what that output was about.  If you want more titles or subtitles, simply use title2, title3, etc.  You can also use the Footnote statement to add notes to the bottom of your output page.
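Here is a short sketch of several title lines plus a footnote for one procedure. The subtitle and footnote wording are only illustrations. Titles and footnotes stay in effect until they are changed, so a bare `title;` and `footnote;` at the end clear them for later steps.

```sas
/* Sketch: a main title, a subtitle and a footnote for the CRD analysis. */
title  "Proc GLM Results";                               /* first title line */
title2 "CRD example: 6 treatments, 4 observations each"; /* subtitle */
footnote "Dummy dataset created for the workshop";       /* bottom of the page */

proc glm data=crd;
  class trmt;
  model weight = trmt;
run;
quit;

title;     /* a bare TITLE statement clears all titles */
footnote;  /* a bare FOOTNOTE statement clears all footnotes */
```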

Run statement tells SAS to run the Procedure with the statements submitted so far.

Quit statement lets SAS know that you do not want to add any more statements to Proc GLM.  Proc GLM is one of the few interactive SAS Procedures: after a Run statement it stays active in the background, waiting for more instructions.  To close it out, you need to add a Quit statement.
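The interactive behaviour described above can be seen in the following sketch. After the first Run, Proc GLM is still active, so a follow-up statement can be submitted without refitting the model. A Tukey comparison of the treatment means is used here only as an example of such a statement.

```sas
proc glm data=crd;
  class trmt;
  model weight = trmt;
run;                   /* model is fitted; Proc GLM stays active */

  means trmt / tukey;  /* submitted later, uses the model already fitted */
run;

quit;                  /* closes Proc GLM */
```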

View Proc GLM Results

Proc MIXED

Proc MIXED was developed in response to the increasing use of mixed models, that is, models that include both fixed and random effects.  Proc MIXED can also account for unbalanced designs.  Using the same CRD dataset:

Proc mixed data=crd;
  class trmt;
  model weight = trmt;
  title "Proc MIXED Results";
Run;

With a basic CRD design you should obtain the SAME results from both procedures. More generally, for most straightforward models Proc GLM and Proc MIXED should yield the same results.

Proc mixed – calls on the MIXED Procedure.  data=crd – specifies the dataset which you want Proc MIXED to use.

Class statement – list your classification variables here.  Think of these as the variables that tell you which group each observation falls into.

Model statement – this should be based on your experimental design.  In this case we have a CRD, so the model is: dependent variable = independent variable (our fixed effect).

Run statement finishes the Procedure.

View Proc MIXED Results

Proc GLIMMIX

Proc GLIMMIX does it all!  OK, almost.  For our purposes, Proc GLIMMIX handles the different types of experimental designs used in OAC and in the agricultural field.

Proc glimmix data=crd;
  class trmt;
  model weight = trmt;
  title "Proc GLIMMIX Results";
Run;

Proc glimmix – calls on the GLIMMIX Procedure.  data=crd – specifies the dataset which you want Proc GLIMMIX to use.

Class statement – list your classification variables here.  Think of these as the variables that tell you which group each observation falls into.

Model statement – this should be based on your experimental design.  In this case we have a CRD, so the model is: dependent variable = independent variable (our fixed effect).

Run statement finishes the Procedure.
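One reason the three procedures agree for this CRD is that, for a continuous response like weight, Proc GLIMMIX assumes a normal distribution with an identity link unless told otherwise. The sketch below writes those default options out explicitly. It should fit the same model as the shorter program above.

```sas
proc glimmix data=crd;
  class trmt;
  /* dist=normal and link=identity spell out the defaults for a continuous response */
  model weight = trmt / dist=normal link=identity;
  title "Proc GLIMMIX Results - defaults written out";
run;
```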

View Proc GLIMMIX Results

S17 SAS Workshop: Getting comfortable with my data in SAS. Descriptive Statistics

PDF Copy of Online notes – 20170711

Quick Review of reading Data into SAS

Preparing Data

  1. Variable names in the first row – make sure they are appropriate for the statistical software you are using.  For more information, check out the Best Practices for Entering your Research Data using Excel.
  2. Save your Excel file as a CSV if you are using the INFILE statement.  Mac users: you MUST save as MS-DOS CSV!

SAS Studio Users

  1. Upload your CSV file to SAS Studio.
  2. Remember to right-click on the file once it is in My Files to obtain its location for the INFILE statement.

Copying and Pasting from Excel

  1. With smaller datasets this works fine
  2. But you need to remember where your MASTER dataset is!!

Download Excel file for this workshop Dataset

Data tuesday;
  input ID group$ trmt age height eye_colour;
  datalines;
1  a  1  39  137  2
2  a  1  35  140  2

;
Run;

Using an INFILE statement

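Data tuesday;
  /* dlm="," : values are separated by commas                                   */
  /* firstobs=2 : start reading at row 2, because row 1 holds the variable names */
  /* missover : if a line runs out of values, the remaining variables are missing */
  infile "C:\Users\edwardsm\Documents\Workshops\SAS\Level_I\SASI_2\dataset.csv" dlm="," firstobs=2 missover;
  input ID group$ trmt age height eye_colour;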
Run;
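For SAS Studio users, the same step points at the location you obtained by right-clicking the uploaded file. SAS Studio uses a server path rather than a C:\ drive path. The path below is a hypothetical placeholder and must be replaced with your own.

```sas
/* Hypothetical SAS Studio path: replace it with the location of your own file. */
data tuesday;
  infile "/home/your_user_id/dataset.csv" dlm="," firstobs=2 missover;
  input ID group$ trmt age height eye_colour;
run;
```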

Checking your data

Use Proc Print to make sure that SAS has read in your data correctly.  ALWAYS read the LOG window.  It shows how many lines of observations are in the file and how many variables were read, along with other information about the data you read in.  If you’re using the INFILE statement, you will also see characteristics of the file.

Proc print data=tuesday;
Run;
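With a large dataset, printing every row is slow to scroll through. A quick check on the first few rows is often enough. This sketch uses the `obs=` dataset option, and the choice of 10 rows is arbitrary.

```sas
/* Sketch: print only the first 10 observations as a quick check. */
proc print data=tuesday (obs=10);
  title "First 10 observations of tuesday";
run;
```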

Adding variable labels

Do you know what group and trmt represent?  We can probably guess what age, height, and eye_colour mean, but would you know what units age and height were measured in?  Without a codebook or information, such as labels for the variables and value labels for the variable values, you would be guessing!

In SAS, and with many other statistical programs, you can add both a variable label and value labels.

Whenever you make changes to the data, you need to be working in a DATA step.  To draw a parallel with Excel: you would open a new dataset or Excel worksheet, make the changes, and then save it.  In SAS, you create a new DATA step, make the changes to the variable(s), and save it.

Data tuesday_new;
  set tuesday;        * this tells SAS to use the dataset called tuesday that you created earlier;
label
  group = "Individuals on the trial were randomly assigned to 4 groups"
  trmt = "Treatments were assigned within each group"
  age = "Age of the participant in years"
  height = "Height taken of the participants at the end of the trial, measured in cm"
  eye_colour = "Colour of the participants' eyes";
Run;

To view these changes, try a Proc Print.  What happens?

Try the following:

Proc Contents data=tuesday_new;
Run;

What do you see?

[Figure: Proc Contents output showing the variable labels]
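By default, Proc Print uses variable names as column headings, which is why the new labels do not show up in a plain Proc Print. The LABEL option asks it to use the variable labels instead, as in this short sketch.

```sas
/* The LABEL option makes Proc Print use variable labels as column headings. */
proc print data=tuesday_new label;
run;
```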

Adding Value labels

Sometimes you will collect variables that are coded.  Rather than writing out blue eyes, brown eyes, you might record a code such as 1, 2, etc.  But how do you remember which code stands for which value?  Writing it down on a piece of paper is fine, but what if you misplace that paper?  Adding value labels to your data is a great way to keep all the information together.

In SAS this is a 2-step process.  We first create the codes and their labels, and then we apply them to the variables in the dataset.  This allows you to re-use the labels.

Creating the value labels

Proc format;
  value $groupformat
                a = "Group A – Monday morning"
                b = "Group B – Monday afternoon"
                c = "Group C – Tuesday morning"
                d = "Group D – Tuesday afternoon";

  value trmtformat
               1 = "Treatment 1 – Placebo"
               2 = "Treatment 2 – Vitamin C";
Run;

This creates two SAS formats: $groupformat (the $ marks it as a format for a character variable such as group) and trmtformat.  Think of these as boxes that say a represents Group A – Monday morning, etc.
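Because the formats are stored separately from the data, they can also be applied inside a single procedure without changing the dataset. This sketch applies them only for the duration of one Proc FREQ step.

```sas
/* Sketch: apply the formats for this step only; the dataset itself is unchanged. */
proc freq data=tuesday;
  tables group trmt;
  format group $groupformat. trmt trmtformat.;
run;
```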

Applying the value formats to the data

Remember that we are touching the data or making changes to the data, so we need to use a Data Step.  Let’s re-use the one where we added variable labels:

Data tuesday_new;
  set tuesday;

label
  group = "Individuals on the trial were randomly assigned to 4 groups"
  trmt = "Treatments were assigned within each group"
  age = "Age of the participant in years"
  height = "Height taken of the participants at the end of the trial, measured in cm"
  eye_colour = "Colour of the participants' eyes";

* group is a character variable, so its format name starts with $;
format
  group $groupformat.
  trmt trmtformat.;

Run;

Permanent vs Temporary SAS Datasets

When we work with SAS, the LOG window refers to something called WORK.TUESDAY or WORK.TUESDAY_NEW.  We didn’t add the WORK part, so where did that come from?

SAS organizes the data it reads in a Library.  The default library is called the WORK library.  This is temporary, which means that when you shut down SAS, all the datasets that were read into SAS are deleted.  Your original Excel files are still there, as is your SAS code (if you saved it), but any temporary SAS datasets are deleted.

We can, however, create permanent SAS datasets.  These are physical files with the file extension .sas7bdat.  For extremely large files, this may be the best way to handle them: read them into SAS once and save them.

To do this we need to create a SAS library reference to a physical location on our laptop/computer.

libname sasdata "C:\Users\edwardsm\Documents\Workshops\SAS";

This maps the location to the SAS libraries in the “black box” of the SAS program.  To save a permanent SAS datafile to this location we do the following:

Data sasdata.tuesday_new;
  set tuesday_new;
Run;

We simply replace the library name WORK (which SAS assumes when no library is given) with our library name SASDATA.  Check out your LOG window to see what happened!  Also check your computer to see if you can find that file.
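In a later SAS session, the library reference has to be assigned again before the permanent dataset can be used. There is a second catch if the saved copy carries the formats created with Proc Format. Those formats were stored in the temporary WORK library, so they are gone too, and SAS reports that the format cannot be found. Re-running Proc Format, as sketched below, avoids this. The NOFMTERR system option is an alternative that shows the raw codes instead.

```sas
/* Later session (sketch): WORK is empty again, so reassign the library */
libname sasdata "C:\Users\edwardsm\Documents\Workshops\SAS";

/* ...and recreate the user-defined formats, which were stored in WORK */
proc format;
  value $groupformat
    a = "Group A – Monday morning"
    b = "Group B – Monday afternoon"
    c = "Group C – Tuesday morning"
    d = "Group D – Tuesday afternoon";
  value trmtformat
    1 = "Treatment 1 – Placebo"
    2 = "Treatment 2 – Vitamin C";
run;

/* The permanent dataset can now be used directly from the SASDATA library */
proc contents data=sasdata.tuesday_new;
run;
```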

NB: I’m not sure how this works with SAS Studio!

Descriptive Statistics

We will run Proc freq and Proc means to describe the data we have just read.
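Here is a minimal sketch of those two procedures applied to `tuesday_new`. Proc FREQ counts the coded, categorical variables, and Proc MEANS summarises the numeric measurements by group. The particular tables and statistics requested are illustrative choices, not the workshop's own program.

```sas
/* Sketch: frequencies for the categorical variables */
proc freq data=tuesday_new;
  tables group trmt eye_colour group*trmt;
  title "Frequencies";
run;

/* Sketch: summary statistics for the numeric variables, by group */
proc means data=tuesday_new n mean std min max;
  class group;
  var age height;
  title "Summary statistics for age and height by group";
run;
```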

Here is a link to the SAS_20170609_ME that was used in this workshop.