S17 RDM Workshop: Best Practices for entering your Research Data using Excel

Powerpoint Used during the 20170607 workshop

Commonly Used Statistical Packages

For the purposes of this workshop, the following statistical packages were reviewed:

  • SAS
  • SPSS
  • Stata
  • R
  • Matlab

It is recognized that there are many more available and used in the OAC community.  If you have questions regarding other packages not included here, please email oacstats@uoguelph.ca .

Commonly Used Statistical Packages:  Variable name restrictions and limits

LENGTH OF THE VARIABLE NAME

SAS – 32 characters long
SPSS – 64 bytes long
• 64 characters in English
• 32 characters in Chinese
Stata – 32 characters long
R – 10,000 characters long
Matlab – 63 characters lo

1ST CHARACTER OF THE VARIABLE NAME

SAS – 1st character MUST be:
• a letter (English) OR
• an underscore “_”
SPSS – 1st character MUST be:
• a letter (English) OR
• an underscore “_” OR
• “@”,“#”,“$”
Stata – 1st character MUST be:
• a letter (English) OR
• an underscore “_”
R – NA
Matlab – 1st character MUST be:
• a letter

BLANKS IN VARIABLE NAMES

SAS – NO Blanks!
SPSS – NO Blanks!
Stata – NO Blanks!
R – NO Blanks!
Matlab – NO Blanks!

SPECIAL CHARACTERS IN VARIABLE NAMES

SAS – NO Special characters with the exception of:
• “_”
SPSS -NO Special characters with the exception of:
• “_”
• “.”
• “@”
Stata -NO Special characters with the exception of:
• “_”
R -NO Special characters with the exception of:
• “_”
• “.”
Matlab – NA

CASE IN VARIABLE NAMES

SAS – Mixed case – for presentation only
SPSS – Mixed case – for presentation only
Stata – Mixed case – for presentation only
R – Mixed case – for presentation only
Matlab – Case sensitive

NAMES/WORDS TO AVOID IN VARIABLE NAMES

SAS – SAS keywords
SPSS – SPSS Reserve words
Stata – NA
R – R function words
Matlab – Function names

GENERAL NOTES ABOUT VARIABLE NAMES

SAS – Libref names can only be 8 characters long
SPSS – #variable – is a scratch variable used in syntax
• $variable – is a system variable
• Do NOT end variable name with a “.” OR “_”
Stata – NA
R – R Community recommends that you develop a naming convention for your data
• Use of “_” is faster (10-20%) than the use of “.”
Matlab – NA

Commonalities across the Statistical Packages – Recommended Best Practices for Excel – Variable Names

LENGTH RECOMMENDATION

  • Maximum length: 32 characters
  • Keep the variable names short and use a variable label to provide more information. Remember you need to type these variable names in and you will need to remember them.

1ST CHARACTER OF A VARIABLE NAME

  • ALWAYS start variable names with a letter

VARIABLE NAMES AND SPECIAL CHARACTERS

  • Numbers may be used anywhere in the variable name AFTER the first character
  • Only use underscores “_”
  • Do NOT use BLANKS – replace blanks with an underscore “_”

CASE

  • Use lowercase
  • Case doesn’t matter for most packages.
  • If you are using MatLab – please be aware that the variable names are case sensitive – if you use lowercase as a Best Practice you won’t forget which ones are Capitals and which ones are NoT.

FAMILIARITY WITH STATISTICAL PACKAGE NOMENCLATURE

  • As you work with a particular package you will become familiar with keywords or reference words that are reserved for the program to use.
  • As a general rule keep away from Statistical terms as variable names.  If you REALLY want to use “mean” qualify it with your data, so wt_mean or concentration_mean

Commonalities across the Statistical Packages – Recommended Best Practices for Excel – Variable Labels

Variable names are often short and may not reflect the contents of the data collected.  Trying to create a variable name that is a descriptive summary of the data can be extremely challenging.  Recommendation is to create short, concise variable names and to create variable labels that are descriptive for each variable name.

Variable name:  wt28

Variable label: Weight (kg) at 28 days of age 

CREATING VARIABLE LABELS – SAS

Data first;
  Infile …
  Input …

 
  Label wt28 = “Weight (kg) at 28 days of age”;
Run;

 CREATING VARIABLE LABELS – SPSS

  1. Variable View in the SPSS Data viewer
    • Find the variable called wt28
    • In the Column called Label – Type: Weight (kg) at 28 days of age
  2. Syntax Window:

VARIABLE LABELS

Wt28 “Weight (kg) at 28 days of age”.

CREATING VARIABLE LABELS – R

Apply the appropriate function for the space you are working in.  For instance the Dataframe, Vector, etc..

Lapply function

CREATING VARIABLE LABELS – STATA

label variable wt28 “Weight (kg) at 28 days of age”

 

CREATING VARIABLE LABELS – MATLAB

T.Properties.VariableDecsriptions{‘wt28’}=”Weight (kg) at 28 days of age”;

One thought on “S17 RDM Workshop: Best Practices for entering your Research Data using Excel”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s