Tyler Smith bio photo

Tyler Smith

Plant taxonomist
Adjunct professor
Field botanist

Email Twitter Github Stackoverflow

UPDATE: To clarify, the following advice assumes you have some way to get R running in the appropriate directory, and that may well require you to use setwd, depending on the editor you are using. However, while setwd may be necessary in interactive use, unless you have a very good reason you shouldn’t use it in your scripting! For more see my next post.

A quick post this evening to make amends for earlier snarkiness on twitter!

I was working through a new R ‘package’ this afternoon, investigating whether it would be appropriate for a new project I’m contemplating. It installed fine, although the web documentation didn’t quite match the installed code. Checking the R help, I see there’s a GUI that I can start with:

load_GUI()

On trying that I get a ‘file not found’ error. As it turns out, this was a consequence of the definition of load_GUI(), which was:

load_GUI <- function(){
  source("/path/to/some/file.R")
}

This is a problem, as this path doesn’t exist on my computer, or, quite possibly, on any computer other than the package author’s. The only way it works is if you install the files in a particular way, and then make use of setwd somewhere in your code. However, if you’ve gone to the trouble of making a package, and then hard-code it to depend on a file arrangement particular to your computer, you are (ahem) “doing it wrong”.

The file in question does exist in the downloadables posted alongside (but not in?) this particular package, and I managed to bungle through file.R until I came to another basic syntax error that lead me to conclude the code quality was unlikely to repay further study.

It’s unusual to find this kind of problem in a package. However, I often see it when I get an assignment from a student, who has included a call to setwd in their code. I try to be very clear about my requirements for classes - your code must run on my computer. It can’t do that if it includes a line like:

setwd("/home/richard/stats-class/assginment3")

This code will never run on my computer, because my name is not richard, and I can spell ‘assignment’. Once again, this is doing it wrong.

Is setwd always wrong?

Yes.

setwd in an R package is undeniably wrong. There are much better alternatives (see below). But what is wrong with using setwd for personal scripts, some fine people on the twitter were wondering? Innocuous as it seems, setwd is in fact a time-bomb, which will blow up when:

  1. You hand in your assignment to your professor, who’s cranky to begin with
  2. You send your code to your colleague
  3. You get a new computer (which inevitably has a different file layout)

The second and third events are most likely to occur at some distance from your original writing of the code; consequently, not only do you have to figure out what your old code was supposed to do, but also where it had to be!

The right way

Scripts

For most of us, writing relatively modest scripts for one-off analyses, the easiest way to arrange our workflow is to put everything in a single directory. Give it a useful name, and copy everything you need (data, pre-processing code, analysis) into it. A simple project might look like:

soil-analysis-2015-01-26/
├── analysis.R
└── data.csv

If you have enough files to warrant more structure, you can use sub-directories:

soil-analysis-2015-01-26/
├── analysis.R
├── data.csv
├── meta-notes.txt
└── soil-data
    ├── plot1
    └── plot2

Now you can refer to files from within your scripts without worrying about exactly where they are in the file system: plot1 <- read.table("soil-data/plot1"). And if you want to share this with a colleague, you can zip- or tar- up the entire directory and they will see exactly what you see.

As an added bonus, you can turn this one directory into a git repository. For true geek Nirvana, you can put a markdown text file and a bibtex database in there too, and then you’ve got an entire manuscript, all in one place. But I’m getting ahead of myself.

Packages

If you’re writing packages, you no longer need to source anything. All your code should be in the R directory. You can include ‘private’ functions if you need to. As for data, system.file() provides one option for portably getting to it, either from within functions or as part of the examples in your help files.

I won’t go on about packages, because there’s really good information available on that from Hadley Wickham.

For more:

  • RStudio is a very nifty tool with this workflow baked right in.

  • Hadley Wickham has useful advice for writing solid R code, even if you aren’t building packages.

  • I have a slide deck on reproducible research using R and markdown. This includes material from some of Roger Peng’s many online resources.