Welcome to the Documenting Your Data Analysis workshop. The notes for this workshop are available here. Please note that by attending the workshop you will be creating your own version of the notes.

# Author: Michelle

## S20 Workshops – UPDATED

Oh my!!! I updated the registration page but forgot to update the blog. Oish!!! Summer workshops are happening and they start next week!! We will be using TEAMS for the workshops. On the Friday before the workshops happen, you will receive an invitation to the Meeting space. Hope to see you there!!

To register please visit https://oacstats_workshops.youcanbook.me/

A few more workshops have been added to this summer’s roster:

**Tuesday, July 7**: Regression in SAS: Nonlinear

**Wednesday, July 8**: Regression in R: Nonlinear

**Tuesday, July 14**: PCA and Cluster Analysis in SAS

**Wednesday, July 15**: PCA and Cluster Analysis in R

**Tuesday, July 21**: GLMM using Multinomial data in SAS

**Wednesday, July 22**: Visualizing your analysis results in R

_______________________________________________

**Tuesday, May 5**: Starting your research off on the right foot. How to organize and collect your data to help make your adventure into statistics a little easier.

**Wednesday, May 6**: Documenting your data analysis. Whether you are planning on using R or SAS to conduct your analysis, come learn how to use R Markdown to document your syntax and output. If you’re curious what this is, check out the workshop notes on the OACStats Blog.

**Tuesday, May 12**: Intro to SAS

**Wednesday, May 13**: Intro to RStudio

**Tuesday, May 19**: Getting Comfortable with your data in SAS

**Wednesday, May 20**: Getting Comfortable with your data in R

**Tuesday, May 26**: ANOVA in SAS – CRD and RCBD

**Wednesday, May 27**: ANOVA in R – CRD and RCBD

**Tuesday, June 2**: Regression in SAS: Linear and Multiple regression

**Wednesday, June 3**: Regression in R: Linear and Multiple regression

**Tuesday, June 9**: ANOVA in SAS: GLMM

**Wednesday, June 10**: ANOVA in R: GLMM

**Tuesday, June 16**: Regression in SAS: Nonlinear

**Wednesday, June 17**: Regression in R: Nonlinear

**Tuesday, June 23**: ANOVA in SAS: Repeated Measures

**Wednesday, June 24**: ANOVA in R: Repeated Measures

**Tuesday, July 7**: PCA and Cluster Analysis in SAS

**Wednesday, July 8**: PCA and Cluster Analysis in R

## R – Documenting in R and ANOVA/GLMM analyses

Ever wondered how you can write R script, document it, run the script, and document the output, all in one file? Come join us on February 18 in Crop Science Rm 121a, starting at 9am, as we learn all about R Markdown. I’ll introduce R Markdown and then encourage everyone to use it as we learn more about ANOVAs and GLMMs.

The Excel file that we will use for the second half of the workshop is downloadable here.

If you can’t make it, here is a copy of the notes (created with R Markdown).

If you are a SAS user – keep an eye on this webpage for an upcoming workshop on how to use R Markdown with SAS.

## W20 – Starting your Research on the Right Foot – Introduction to RDM

For those that attended the RDM workshop on January 8, 2020, here is a copy of the slides we used as talking points. Please note that if you have specific questions regarding RDM you should contact the Library at lib.research@uoguelph.ca

## R vs SAS Series: Getting the data ready – ANOVA

Continuing on from our last blog post R vs SAS Series: Statistical Models Review – ANOVA, let’s take a look at how we need to get the data ready for our analysis.

Let’s review our statistical model.

$$\text{Nitrate}_{ij} = \mu + \text{trmt}_{i} + e_{ij}$$

Where:

$\text{Nitrate}_{ij}$ = stem nitrate amount of the $j$th observation in the $i$th treatment

$\mu$ = overall mean or model intercept

$\text{trmt}_{i}$ = the effect of the $i$th treatment group

$e_{ij}$ = random error or experimental error

This means that in order to run our analysis, we need to have stem nitrate measures and information about our treatments. Specifically, our dataset needs a column with the nitrate measures and a second column that tells us which treatment each nitrate measure came from. You may also have a column that is an identifier, in this case **Plot_ID**, which helps me to identify which plot the measurements were taken from. A sample data table or Excel file may look like this:

| Plot_ID | Treatment | Nitrate |
|---------|-----------|---------|
| 101 | 1 | 34.98 |
| 102 | 2 | 40.89 |
| 103 | 3 | 42.07 |
| … | … | … |
| 124 | 6 | 43.29 |
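
As a minimal sketch, the rows shown above could be typed straight into R as a data frame. Only the four rows visible in the table are included (the rows hidden behind the "…" are not reproduced), and the object name `nitrate_data` is just a name I picked for illustration.

```r
# Build a small data frame from the four rows visible in the sample table.
# The elided rows are deliberately left out.
nitrate_data <- data.frame(
  Plot_ID   = c(101, 102, 103, 124),
  Treatment = c(1, 2, 3, 6),
  Nitrate   = c(34.98, 40.89, 42.07, 43.29)
)

# Inspect the structure: at this point all three columns are numeric.
str(nitrate_data)
```

In practice you would more likely read the Excel or CSV file in rather than typing values by hand, but the resulting structure is the same: one row per plot, one column per variable.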

## Fixed vs Random Effects

Now we need to do a little bit of background work. We’ve all heard of FIXED and RANDOM effects. These should be driven by your statistical model! In the example we are currently working with, we only have one effect: **Treatment**. Is it a FIXED or is it a RANDOM effect?

Let’s go back and look at some definitions and examples of these 2 terms.

### Fixed Effects

Fixed effects are something you want to study – you set out the levels that you are interested in. You “fix” the levels. The results from your experiment can only talk about the levels you studied.

- Example #1: I want to see whether 1st year students prefer Coke or Pepsi
- Example #2: I want to see the effect of 3 levels of fertilizer on my crop

### Random Effects

Random effects are factors in your design that may contribute variation to your outcome measure, but that you are not interested in studying. You only want to account for them before looking at your treatment effects.

- Example #1: I want to study the effect of fertilizer on my crop
- Example #2: Block effect, Weather, etc…

Back to our example – what do you think our **Treatment** effect is? If you said FIXED – you are correct!

Alrighty – so **Treatment** is a FIXED effect. In our dataset, we entered the **Treatment** levels as 1, 2, 3, 4, 5, or 6 – in other words, we used numbers. We could have used letters / alphanumeric / strings – doesn’t matter. However, because we used numbers, we need to let our programs know that these values are not quantities that we will average or manipulate in any way. They are to be used as a grouping, classification, or factor variable: something that tells us and the program which treatment each of our nitrate values comes from.
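
To see why this matters, here is a small sketch using only the four treatment codes visible in the sample table. The vector name `Treatment` mirrors the column name; nothing else is taken from the real dataset. Left as numbers, R will happily do arithmetic on the codes, even though the result means nothing.

```r
# Treatment codes from the rows visible in the sample table
Treatment <- c(1, 2, 3, 6)

# R treats these as ordinary numbers...
is.numeric(Treatment)   # TRUE

# ...so it will average the codes, which tells us nothing about the treatments
mean(Treatment)         # 3
```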

In SAS, we can do this very simply by including the Treatment variable in a CLASS statement. In R, however, we need to change the format of the variable to a factor. To do this we need to use the following R script:

Treatment <- as.factor(Treatment)
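
The line above assumes `Treatment` exists as a standalone object. If your data were read into a data frame, the column has to be referenced through that data frame instead. A sketch, assuming a data frame I have called `nitrate_data` (a hypothetical name) that holds the four rows visible in the sample table:

```r
# Hypothetical data frame built from the four visible rows of the sample table
nitrate_data <- data.frame(
  Plot_ID   = c(101, 102, 103, 124),
  Treatment = c(1, 2, 3, 6),
  Nitrate   = c(34.98, 40.89, 42.07, 43.29)
)

# Convert the Treatment column to a factor, overwriting the numeric version
nitrate_data$Treatment <- as.factor(nitrate_data$Treatment)

# Confirm the change
is.factor(nitrate_data$Treatment)   # TRUE
levels(nitrate_data$Treatment)      # "1" "2" "3" "6"
```

Note that the levels are now stored as labels ("1", "2", ...) rather than as numbers, which is exactly what we want for a grouping variable.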

We’ll see how this fits in with our ANOVA coding in the next Blog post. For now – remember:

- We need to determine which of our factors are FIXED or RANDOM
- In R, we need to change the format of our factors using the as.factor() function.
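
As a rough illustration of why the second point matters, the sketch below fits the same model twice on made-up data: once with `Treatment` left as a number and once after `as.factor()`. The nitrate values are simulated with `rnorm()` and are not from the trial, and the layout of 6 treatments with 4 plots each is an assumption chosen only to line up with Plot_IDs 101 to 124. Treat the `aov()` calls as a quick preview, not as the full analysis.

```r
set.seed(1)  # make the simulated values reproducible

# Hypothetical layout: 6 treatments x 4 plots each (an assumption)
sim <- data.frame(
  Treatment = rep(1:6, each = 4),
  Nitrate   = rnorm(24, mean = 40, sd = 3)   # simulated, not real data
)

# Treatment left as a number: R fits a single straight-line slope
fit_numeric <- aov(Nitrate ~ Treatment, data = sim)

# Treatment as a factor: R estimates a separate effect for each group
sim$Treatment <- as.factor(sim$Treatment)
fit_factor <- aov(Nitrate ~ Treatment, data = sim)

# Degrees of freedom used by Treatment in each fit
summary(fit_numeric)[[1]]$Df[1]   # 1
summary(fit_factor)[[1]]$Df[1]    # 5
```

With six treatment groups we expect 5 degrees of freedom for Treatment. Seeing 1 instead is a quick sign that R is still treating the variable as a number.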

## Quick Recap

Everything is based on that statistical model – please remember what it is for your trial

Factors in our model may be FIXED or RANDOM

In SAS we can tell the program which variables are factors by listing them in a CLASS statement.

In R, we need to use the as.factor() function to convert our grouping variables to factors.

## Coming up next in this mini series

- R vs. SAS Series: Conducting the ANOVA
- R vs. SAS Series: Reading the ANOVA outputs
- R vs. SAS Series: RCBD – ANOVA
- R vs. SAS Series: RCBD – Reading the ANOVA outputs