F19 Workshops and Tutorials

Oh yes!  It is that time of year again 🙂  I have to admit that I love fall – my favourite season.  The time for so many new beginnings.  With this all in mind, the new schedule for F19 OACStats workshops is now open for registration at https://oacstats_workshops.youcanbook.me/.   Workshops will be approximately 3 hours long with breaks and hands-on exercises – so bring your laptops with the appropriate software installed.  Please note that the workshops are being held in Crop Science Building Rm 121B (room with NO computers) and will begin at 8:30am.

September 10: Introduction to SAS
September 17: Introduction to R
October 15: Getting comfortable with your data in SAS: Descriptive statistics and visualizing your data
October 29: Getting comfortable with your data in R: Descriptive statistics and visualizing your data
November 5: ANOVA in SAS
November 15: ANOVA in R

I am also trying something new this semester, to stay with the theme of new beginnings 🙂  Tutorials!  These will be held on Friday afternoons from 1:30-3:30 (sorry, this was the only time I could get a lab that worked with everyone's schedules).  They will be held in Crop Science Building Rm 121A (the room with computers).  Topics will jump around a bit, with time to review and work on Workshop materials.  To register for these please visit:  https://oacstatstutorials.youcanbook.me/

September 13: Saving your code and making your research REPRODUCIBLE
Cancelled:  September 20: Introduction to SPSS
September 27: Follow-up questions to Intro to SAS and Intro to R workshops
October 18: More DATA Step features in SAS
October 25: More on Tidy Data in R
November 1: Open Forum
November 15: Questions re: ANOVAs in SAS and R
November 29: Open Forum

I hope to see many of you this Fall!

One last new item – PODCASTS.  I’ll be trying to record the workshops and tutorials.  These will be posted on the new page and heading PODCASTS.  I will also link to them in each workshop's post.

Welcome back and let’s continue to make Stats FUN

Name

R vs. SAS

This is a question that comes up more and more in my position, from graduate students starting their academic careers and from experienced researchers looking to keep up with the “trends”.

A recent article published on the R-bloggers website compared the top statistical packages:  R, Python (?), SAS, SPSS, and Stata.  If you are interested in reading the original article, I’ve linked to it here.  I’d like to summarize it and show a few examples as well.

What do they look like?

RStudio is one of the more common ways that folks are using R today.  It is a comfortable environment: a little bit of GUI that doesn’t leave you hanging out in space (ok, maybe a little), and you’ll be fine once you get comfortable with the coding.

RStudio

Yes!  You read that correctly: you need to write code in R, very similar to needing to write code in SAS.  The code, or syntax, is different for the 2 programs, but you need to write some code in order to conduct any statistical analyses in either program.

SAS, as you may be aware, has a few different interfaces as well.  There is SAS Studio, which is used with the free SAS University Edition:

SAS Studio

Licensed version of SAS:

SAS

Sample coding

As I noted earlier, each program has its own language or syntax.  R is organized into packages, each of which may deal with a type of analysis, and within a package there are several functions.  In SAS we have PROCedures, with options and lines of code (statements) that run the analysis.  These are very similar concepts.  Each program has its own documentation.  Since R is open source and community-driven, the detail of the documentation depends on the creator of the package.  SAS documentation is extensive, but very technical at times.

R coding

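# Histogram of Yield with the ggplot2 package
# (assumes a data frame called fruit with columns Yield and Variety)
library(ggplot2)
ggplot(fruit, aes(x = Yield)) +
  geom_histogram()

# Base R plot of Yield against Variety, coloured by Variety
plot(Yield ~ Variety,
     col = factor(Variety),
     data = fruit)

# Legend for the four varieties; col = 1:4 uses the same palette
# colours that factor(Variety) was mapped to in the plot above
legend("topleft",
       legend = c(1, 2, 3, 4),
       col = 1:4,
       pch = 1)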

SAS coding

/* Points with error bars plus connecting lines, one group per entry */
Proc sgplot data=out_asp2010_test;
  scatter x=julian y=mms / group=entry yerrorlower=low4 yerrorupper=high4;
  series x=julian y=mms / group=entry lineattrs=(pattern=solid);
  xaxis label="Julian Day";
  yaxis label="Mms";
  title "Plot of Mms by Julian Day for 2010";
Run;
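If you would like to try a plot like this right away, note that the out_asp2010_test dataset above isn't included in this post.  Here is a minimal sketch that runs on SASHELP.CLASS, the small sample dataset that ships with SAS.  The dataset, the variables, and the REG fit line are my own illustrative choices, not part of the original example; the point is the same pattern of one PROC with several statements, each with its own options after the slash.

```sas
/* Illustration only: uses the SASHELP.CLASS sample data that ships with SAS
   (variables Name, Sex, Age, Height, Weight) instead of the dataset above */
proc sgplot data=sashelp.class;
  /* one marker per student, grouped (coloured) by sex */
  scatter x=height y=weight / group=sex;
  /* a straight-line fit for each sex, without repeating the markers */
  reg x=height y=weight / group=sex nomarkers;
  xaxis label="Height (inches)";
  yaxis label="Weight (pounds)";
  title "Weight by Height in SASHELP.CLASS";
run;
title;  /* clear the title so it doesn't carry over to later output */
```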

Support

As noted above, R is open source and community-driven, which also means that it is supported by the community.  For any questions or challenges you encounter, you will turn to a variety of sources for help:  the author of the package you are using, or a listserv.

SAS is a commercial product with a professional support network to assist its users.  There are user listservs as well.

Conclusion

As pointed out in the R-bloggers article, R and SAS both have their strengths and their weaknesses.  I’ll be honest: I never thought I’d see the day when banks and pharma started using R, but it’s here!  The small program that folks used because it was free and accessible has now become a major contender in the statistical analysis world.

Which program you select will depend on your background (what you used in your undergrad or in your courses), the level of support available to you on your campus, and maybe what program your supervisor uses or recommends.  I used to recommend SAS if you were going to work in a workplace that needed standards, but after learning more about R and seeing its growth, I’m no longer sure that is a reason to use SAS in academia.

I personally believe that we should be learning both programs.  I know, it’s a lot of time to invest, but they both look awesome on a resume, and they both give you the opportunity to increase your skillset and talk stats with SAS and R users 😉

Name

 

Labels in SAS – Variable and Value

Adding variable labels

Do you know what group and trmt represent?  We can probably guess what age, height, and eye_colour mean, but would you know what units age and height were measured in?  Without a codebook or other information, such as variable labels and value labels for the variable values, you would be guessing!

In SAS, and with many other statistical programs, you can add both a variable label and value labels.

Whenever you make changes to the data, you need to be working in a DATA step.  To draw a parallel to Excel: you would open a dataset or Excel worksheet, make the changes, and then save it.  In SAS, you create a new dataset in a DATA step, make the changes to the variable(s), and save it.

Data tuesday_new;
  set tuesday;   * use the dataset called tuesday that you created earlier;
  label
    group = "Individuals on the trial were randomly assigned to 4 groups"
    trmt = "Treatments were assigned within each group"
    age = "Age of the participant in years"
    height = "Height taken of the participants at the end of the trial, measured in cm"
    eye_colour = "Colour of the participants' eyes";
Run;

To view these changes, try a Proc print – what happens??

Try the following:

Proc Contents data=tuesday_new;
Run;

What do you see?
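As for the PROC PRINT question: by default PROC PRINT uses the variable names as column headings, so the labels don't show up, while PROC CONTENTS lists each label next to its variable.  To see the labels in a printout, add the LABEL option to PROC PRINT.  Here is a minimal sketch; the three-row tuesday dataset is made up purely so the code runs on its own, so substitute your own data.

```sas
/* Hypothetical data: a made-up, three-row stand-in for the tuesday dataset */
data tuesday;
  input group $ trmt age height eye_colour $;
  datalines;
a 1 24 170 blue
b 2 31 182 brown
c 1 27 165 green
;
run;

/* Same variable labels as in the post */
data tuesday_new;
  set tuesday;
  label
    group = "Individuals on the trial were randomly assigned to 4 groups"
    trmt = "Treatments were assigned within each group"
    age = "Age of the participant in years"
    height = "Height taken of the participants at the end of the trial, measured in cm"
    eye_colour = "Colour of the participants' eyes";
run;

/* LABEL tells PROC PRINT to use the labels as column headings */
proc print data=tuesday_new label;
run;
```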

Adding Value labels

Sometimes you will collect variables that are coded.  Rather than writing Blue eyes, Brown eyes, you might record them as codes such as 1, 2, etc.  But how do you remember which code you gave to which value?  Writing it down on a piece of paper is fine, but what if you misplace that paper?  Adding value labels to your data is a great way to keep all the information together.

To accomplish this in SAS is a 2-step process.  We first create the codes and their labels, and then we apply these to the variables in the dataset.  Because the labels are defined separately from the data, you can re-use them with other variables and datasets.

CREATING THE VALUE LABELS

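Proc format;
  /* $ marks a format for a character variable; quote character values */
  value $groupformat
    'a' = "Group A - Monday morning"
    'b' = "Group B - Monday afternoon"
    'c' = "Group C - Tuesday morning"
    'd' = "Group D - Tuesday afternoon";

  /* numeric format: no $ and no quotes around the values */
  value trmtformat
    1 = "Treatment 1 - Placebo"
    2 = "Treatment 2 - Vitamin C";
Run;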

This creates two SAS formats:  one called $groupformat (the $ marks it as a format for a character variable) and another called trmtformat.  Think of these as boxes that say a represents Group A – Monday morning, etc.

APPLYING THE VALUE FORMATS TO THE DATA

Remember that we are touching the data or making changes to the data, so we need to use a Data Step.  Let’s re-use the one where we added variable labels:

Data tuesday_new;
  set tuesday;

  label
    group = "Individuals on the trial were randomly assigned to 4 groups"
    trmt = "Treatments were assigned within each group"
    age = "Age of the participant in years"
    height = "Height taken of the participants at the end of the trial, measured in cm"
    eye_colour = "Colour of the participants' eyes";

  * the $ is needed because groupformat is a character format;
  format
    group $groupformat.
    trmt trmtformat.;

Run;
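To see the formats at work, run any procedure on tuesday_new: the output shows the labels instead of the raw codes.  The sketch below is self-contained under a few assumptions: the five-row tuesday dataset is invented for illustration, and the labels use plain hyphens.  It also shows a second option, attaching a format inside a single PROC step so the stored data are left unchanged, and uses the FMTLIB option of PROC FORMAT to list what is stored in the "boxes".

```sas
/* Hypothetical data: invented codes so this sketch runs on its own */
data tuesday;
  input group $ trmt;
  datalines;
a 1
a 2
b 1
c 2
d 1
;
run;

proc format;
  value $groupformat
    'a' = "Group A - Monday morning"
    'b' = "Group B - Monday afternoon"
    'c' = "Group C - Tuesday morning"
    'd' = "Group D - Tuesday afternoon";
  value trmtformat
    1 = "Treatment 1 - Placebo"
    2 = "Treatment 2 - Vitamin C";
run;

/* Option 1: attach the formats to the variables in a DATA step */
data tuesday_new;
  set tuesday;
  format group $groupformat. trmt trmtformat.;
run;

/* PROC FREQ now reports the labels rather than a, b, 1, 2 */
proc freq data=tuesday_new;
  tables group trmt;
run;

/* Option 2: use a format for this one procedure only */
proc freq data=tuesday;
  tables trmt;
  format trmt trmtformat.;
run;

/* List the stored definitions held in WORK.FORMATS */
proc format fmtlib;
  select $groupformat trmtformat;
run;
```

Attaching the formats in the DATA step (option 1) means every later procedure shows the labels automatically; the PROC-level FORMAT statement (option 2) is handy when you only want the labels for a single table or plot.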

S19 Workshops

A couple of workshops are now available for booking.  I will be hosting two 1-day workshops in June:  a SAS workshop on June 4, followed by an R workshop on June 11.  The workshops will be held in ANNU Rm 102, starting at 9am and ending by 4pm at the latest.

Please register for the one(s) you would like to attend by visiting https://oacstats_workshops.youcanbook.me/.    Please note you will need to bring a laptop with the software already installed.  If you do not have the software, you may watch the demos – however, I will not be able to help you with any software installations.

June 4 – SAS:  We will begin by touring the different versions of the SAS program that are available to us on campus.  Our next stop will be getting data into SAS, followed by some descriptive statistics.  We will then move on to Regression and ANOVAs, and if time permits PCA and/or Factor analysis.  If you have a particular analysis in mind that you would like to work through in SAS, please let me know beforehand – email oacstats@uoguelph.ca.

June 11 – R/RStudio:  We will again begin our tour with RStudio and discuss the merits and challenges of using the R software.  We will then work through a number of ways to get the data into RStudio, followed by some descriptive statistics and data visualization options.  We will move on to Regression and ANOVAs, and if time permits we will try our hand at some on-demand analyses.  If you have a particular analysis in mind that you would like to work through in R, please let me know beforehand – email oacstats@uoguelph.ca.