HOME is where I want to be
But I guess I’m already there
I come home, a clean directory
I guess this must be the place
Any time you’re working in R, R needs to know where you are within your computer, which is referred to as the working directory. The working directory could be something like
"YourUsername/Documents", or it could be something more specific like
"YourUsername/Documents/GradSchool/Chapter1". Either way, R will think of everything in your computer as located relative to your working directory. One of the nicest parts of using R Projects is that they automatically set the working directory to the folder containing the .RProject file. You can see your current working directory by running the function
getwd(), and you can set it using
setwd(), but if you’re using an R Project, you generally shouldn’t mess with this too much.
All directories and files in R (and most computer languages) are located using file paths. The two working directory examples we just used are written as file paths, with
/ put between different levels. Within
YourUsername there is a folder called
Documents, and within that there is a folder called
GradSchool, and so on. In R, file paths are always wrapped in quotes. There are 2 basic kinds of file paths: absolute and relative. Absolute paths list out the full file path, usually starting with your username, which you can also refer to using the shortcut
~. So instead of
YourUsername/Documents, you can type
~/Documents. However, if you’ve got folders within folders within folders, typing out absolute paths can get really tedious. Relative paths are relative to your working directory. So if R thinks we’re in that
Chapter1 folder, and we want to access a folder inside it called
data, we can just type
data instead of
"YourUsername/Documents/GradSchool/Chapter1/data". What happens if we need to go up a level, into
GradSchool? Well we can type
.. to go up a level. So if we need to grab something from our
Chapter2 folder, but our working directory is
Chapter1, we would type
The scientific process is naturally incremental, and many projects start life as random notes, some code, then a manuscript, and eventually everything is a bit mixed together.
“Your greatest enemy is yourself four months ago” -Every grad student ever
A good project layout will ultimately make your life easier:
Although there is no “best” way to lay out a project, there are some general principles to adhere to that will make project management easier:
This is probably the most important goal of setting up a project. Raw data should never be edited, because you can never be sure that you will want to keep any edit you make, and you want to have a record of any changes you make to data. Therefore, treat your raw data as “read only”, perhaps even making a
raw_data directory that is never modified. If you do some data cleaning or modification, save the modified file separate from the raw data, and ideally keep all the modifying actions in a script so that you can review and revise them as needed in the future.
Anything generated by your scripts should be treated as disposable: it should all be able to be regenerated from your scripts. There are lots of different ways to manage this output, and what’s best may depend on the particular kind of project. At a minimum, it’s useful to have separate directories for each of the following:
RStudio has a feature to help keep everything organized in a self-contained, reproducible package, called a “project”.
A project is a small file with a
.Rproj extension, but you can think of all the files and sub-directories as belonging to that project. We recommend creating a directory and a project file for each project you work on. It should look something like this:
When you want to work on this project using R, double click on the .Rproj file, and RStudio will open it and keep everything organized for you. You can also open an existing project from RStudio by clicking “File -> Open project…
Let’s create a new project in RStudio. We won’t use this project today (we will ALWAYS work in the repository we pulled from github), but it’s good practice.
If everything went right, RStudio should’ve flickered and you should be looking at a pretty bare RStudio instance. That’s okay. Click on the “Files” tab in the lower right pane. Your .Rproj file should be there with nothing else. You’ve got the bare bones of a new project. Let’s now create the directory structure described above, a folder for each of data, code, results, and papers. You can do this in RStudio by clicking on the “New Folder” button in the Files pane, or in your OS by navigating to the directory you just created.
getwd()function. Your working directory should be your new project.
Finally, let’s switch back to our R_DAVIS project that we will be working in for the rest of class. Go up to the right hand corner of your screen and click the pull down tab that currently says something like “test_project_lastname.” Once you click it, you can toggle between multiple projects at once by clicking the white square/arrow icon next to the project name. This will allow you to have multiple projects open at once. If you want to switch projects completely, just click the project name (not the white square/arrow icon).
Now in this project go ahead and create a file structure like we’ve practiced. You should already have a data folder. In addition to this, create a “scripts” folder to store your weekly code in. It is good to come up with some kind of naming convention that you can keep consistently (e.g. week_2.R).
This lesson is adapted from the Software Carpentry: R for Reproducible Scientific Analysis Project management with RStudio materials and the Data Carpentry: R for data analysis and visualization of Ecological Data Before We Start materials. .