R: Using RStudio and Importing Data

Where do you start?

Learning a new program can be scary and overwhelming at times. So let me share a few of the shortcuts I’ve learned over the past few months of using RStudio.

Navigating the Windows in RStudio

When you open RStudio you’ll see 4 windows, or sections, on your screen: editor, console, history, and an environments window with tabs. Let’s start with the environments window, where you should see 6 tabs: Environment, Files, Plots, Packages, Help, and Viewer. The Environment tab lists the datasets and other objects that are in use during the current session. The Files tab lets you browse the files that are available in your working directory. The Plots tab shows any plots that are created during your session. The Packages tab lists the packages installed on your computer, with a checkmark beside the ones that are currently loaded. The Help tab is self-explanatory. A quick side note: the Help window is great! Please take advantage of it by using the search function in the Help tab.

The History window lists all the lines of code that you have run until you clear it out. It is a great way to see what you have done, especially if you run into trouble along the way.

That leaves the editor and the console. The editor is where you open and edit an R script file, and the console is where code runs as soon as you type it in. To run code that is in your editor, select the lines you want and hit Ctrl-Enter; with nothing selected, Ctrl-Enter runs the line your cursor is on. In the console, you type the line, hit Enter, and it runs immediately. I use these two windows in tandem. To move between them, Ctrl-2 moves you to the console window and Ctrl-1 brings you back to the editor window. Of course, a mouse works great too!

One more quick tip: the console window can fill up quickly and, to me, can feel very cluttered. Remember that the History window keeps a record of your code, so it is OK to clear out the console as you see fit. To do this, use Ctrl-L.

Working Directory

Having your program always refer to the same directory when saving or opening files can be very handy. You’ll always know where your files are! R makes this very easy to do.

First, let’s see what the current working directory of your RStudio is by typing:

getwd()

To change the working directory for the project you are currently working on, type:

setwd("C:/Users/edwardsm/Documents/Workshops/R")

Of course, you’ll want to make this a directory on your computer 😉   But as you look at this, do you notice anything odd about the statement? The slashes / run in the opposite direction from the ones you normally see in a Windows path. Changing them by hand can be time-consuming. One way around this is to keep the Windows backslashes and add an extra \ after every one in your location. R treats a single \ inside quotes as the start of an escape sequence, so each backslash has to be doubled. See below:

setwd("C:\\Users\\edwardsm\\Documents\\Workshops\\R")
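
As a small sketch of why both spellings work, the lines below compare the forward-slash and doubled-backslash versions of the example path. Nothing here touches your file system. The variable names (fwd_path, win_path, raw_path) are my own, and the raw-string form in the last step assumes R 4.0.0 or later.

```r
# Two ways of writing the same Windows location as an R string
fwd_path <- "C:/Users/edwardsm/Documents/Workshops/R"
win_path <- "C:\\Users\\edwardsm\\Documents\\Workshops\\R"

# Each doubled backslash is stored as a single character
cat(win_path, "\n")

# Swapping the backslashes for forward slashes gives the first spelling
converted <- gsub("\\", "/", win_path, fixed = TRUE)
same_location <- identical(converted, fwd_path)
print(same_location)

# Assumes R >= 4.0.0: a raw string lets you paste a Windows path unchanged
raw_path <- r"(C:\Users\edwardsm\Documents\Workshops\R)"
same_as_escaped <- identical(raw_path, win_path)
print(same_as_escaped)
```

If you copy a location out of Windows Explorer, the raw-string form saves you from editing any slashes at all.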

Always double-check your working directory by running getwd(). Are the results what you were expecting? If not, try it again.
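
Here is a minimal sketch of that set-then-check habit. So that it runs on any computer, it uses R's temporary directory (tempdir()) as a stand-in for your own project folder; that choice, and the names old_wd and target, are assumptions made only for illustration.

```r
# Remember where we started so we can go back afterwards
old_wd <- getwd()

# Stand-in for your own folder: the session's temporary directory
target <- tempdir()
setwd(target)

# Check: does getwd() now report the directory we asked for?
# normalizePath() irons out differences in how the same folder is written
moved <- normalizePath(getwd()) == normalizePath(target)
print(moved)

# Put the working directory back the way it was
setwd(old_wd)
restored <- identical(getwd(), old_wd)
print(restored)
```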

More ways to set your working directory:

  • In RStudio, the Session menu in the menu bar has a Set Working Directory item with 3 options:
    • To Source File Location (the directory where the R script you have open is saved)
    • To Files Pane Location: in the Files Pane, navigate to the location you want to have as your working directory. Once it is showing in the Files Pane, choose Session -> Set Working Directory -> To Files Pane Location. You will see the new working directory appear in your console, and it should match what you selected in the Files Pane.
    • Choose Directory: opens a Windows dialogue box where you navigate to and select the directory of your choice.
  • While you are in the Files Pane, navigate to the directory that you would like to set as your working directory, then select More -> Set As Working Directory. This option is very similar to the To Files Pane Location option under the Session menu of RStudio.

Importing Data

Every statistical package has a number of ways to bring data in.  R is no different!  Now most of us will use Excel to enter our data, then we’ll clean it up in Excel before we import it into our preferred statistical package.  For the purposes of this session, I will use an Excel worksheet as an example.

The first step is to save our Excel worksheet as a CSV (Comma separated values) file.   In Excel, File -> Save As -> Select CSV as the file format.  You will be asked several questions regarding the format of the file you want to save.  Please note that when you save an Excel file or a worksheet as a CSV, it will only save the worksheet that you have selected and NOT the entire Excel file (which may have several worksheets).

There are a couple of ways to import the CSV file. But the first thing you’ll need to do is choose a name for the dataset inside R. Please note that in R, you can use a “.” in the name. For more information on best practices for filenames and variables, please visit the Best Practices for entering your Research Data using Excel.

The following code will import a CSV file called Example and save it as an R object (a data frame) called my.data. For this piece of code, the file, Example.csv, must be in the working directory that you set earlier, OR you will need to give R the full location of the file, including the drive and directory structure. The header=TRUE option lets R know that the first line of the data file is a header containing the names of the variables.

my.data <- read.csv("Example.csv", header=TRUE)
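
To try this without an Excel file at hand, here is a self-contained sketch. It first writes a tiny CSV into R's temporary directory and then imports it. The column names and values (plot, treatment, yield) are made up for illustration, and csv_path is my own name for the full location of the file.

```r
# Build a small, made-up CSV so the example runs anywhere
csv_path <- file.path(tempdir(), "Example.csv")
writeLines(c("plot,treatment,yield",
             "1,A,12.5",
             "2,B,14.1",
             "3,A,11.8"),
           csv_path)

# Import it, giving the full location because the file
# is not in the working directory
my.data <- read.csv(csv_path, header = TRUE)

# Look at what came in: variable names, types, and the first rows
str(my.data)
head(my.data)
```

Running str() right after an import is a quick way to confirm that the header row became the variable names and that numbers were read as numbers.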

If you have files that are not in the working directory and you don’t want to type out the full location of the file, then you can use the following piece of code.  Personally, I much prefer this next piece.

my.data2 <- read.table(file.choose(), header=TRUE, sep=",")

Now my dataset in R will be called my.data2. Using this code, a dialogue box will open and allow me to navigate to the directory that holds my files. In my opinion, this option is more flexible than the first. The header=TRUE holds the same meaning as above. However, this time we need to specify the delimiter, the character that separates the variables in the CSV file, which is a comma, written as sep="," in the code.

Why do you have to use the sep="," option in the second case and not in the first, when we are reading the same CSV file in both cases? The first import option uses a function called read.csv, so R already knows that it will be reading a CSV, or comma-separated, file. In the second case, read.table, R has no idea what type of data it will be reading; by default it splits each line on white space. Therefore we need to specify the delimiter, or separator, used in the data file.
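
The difference is easy to see in a small sketch. Because file.choose() needs someone to click on a file, this version reads a made-up CSV written to a temporary file instead; the data and the names csv_path, with_sep, no_sep, and from_csv are assumptions for illustration only.

```r
# A tiny made-up CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("plot,treatment,yield",
             "1,A,12.5",
             "2,B,14.1"),
           csv_path)

# read.table told about the comma: three separate variables
with_sep <- read.table(csv_path, header = TRUE, sep = ",")

# read.table left to its default (white space): each line
# is read as one long value, so everything lands in one column
no_sep <- read.table(csv_path, header = TRUE)

# read.csv already assumes a comma
from_csv <- read.csv(csv_path, header = TRUE)

print(ncol(with_sep))
print(ncol(no_sep))
print(all.equal(with_sep, from_csv))
```

If an import ever gives you a single column with commas inside the values, a missing sep="," is the first thing to check.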

There are other ways to import data into R, but I have found these two, with a preference for the second one, to be quite direct and straightforward. They also encourage you to maintain a master data file in Excel, with a text copy of your data to use for the analysis. Remember that the text format will be a great option for preserving and sharing your data once your project is complete.

Two other packages that import data were discussed during our session today: one reads Excel files specifically, and the other handles many data formats. These are:

  • readxl package
  • tidyverse package

Look to future sessions for more details about these packages.

If you have suggestions or hints for other methods to import data into R, please leave a comment below or send me an email and I can add them here.
