UPDATE: To clarify, the following advice assumes you have some way to
get R running in the appropriate directory, and that may well require you
setwd, depending on the editor you are using. However, while
setwd may be necessary in interactive use, unless you have a very good
reason you shouldn’t use it in your scripting! For more see
my next post.
A quick post this evening to make amends for earlier snarkiness on twitter!
I was working through a new R ‘package’ this afternoon, investigating whether it would be appropriate for a new project I’m contemplating. It installed fine, although the web documentation didn’t quite match the installed code. Checking the R help, I see there’s a GUI that I can start with:
On trying that I get a ‘file not found’ error. As it turns out, this was a
consequence of the definition of
load_GUI(), which was:
This is a problem, as this path doesn’t exist on my computer, or, quite
possibly, on any computer other than the package author’s. The only way it
works is if you install the files in a particular way, and then make use of
setwd somewhere in your code. However, if you’ve gone to the trouble of
making a package, and then hard-code it to depend on a file arrangement
particular to your computer, you are (ahem) “doing it wrong”.
The file in question does exist in the downloadables posted alongside (but
not in?) this particular package, and I managed to bungle through
file.R until I came to another basic syntax error that lead me to
conclude the code quality was unlikely to repay further study.
It’s unusual to find this kind of problem in a package. However, I often
see it when I get an assignment from a student, who has included a call to
setwd in their code. I try to be very clear about my requirements for
classes - your code must run on my computer. It can’t do that if it
includes a line like:
This code will never run on my computer, because my name is not richard, and I can spell ‘assignment’. Once again, this is doing it wrong.
setwd always wrong?
setwd in an R package is undeniably wrong. There are much better
alternatives (see below). But what is wrong with using
setwd for personal
scripts, some fine people on the twitter were wondering? Innocuous as it
setwd is in fact a time-bomb, which will blow up when:
- You hand in your assignment to your professor, who’s cranky to begin with
- You send your code to your colleague
- You get a new computer (which inevitably has a different file layout)
The second and third events are most likely to occur at some distance from your original writing of the code; consequently, not only do you have to figure out what your old code was supposed to do, but also where it had to be!
The right way
For most of us, writing relatively modest scripts for one-off analyses, the easiest way to arrange our workflow is to put everything in a single directory. Give it a useful name, and copy everything you need (data, pre-processing code, analysis) into it. A simple project might look like:
soil-analysis-2015-01-26/ ├── analysis.R └── data.csv
If you have enough files to warrant more structure, you can use sub-directories:
soil-analysis-2015-01-26/ ├── analysis.R ├── data.csv ├── meta-notes.txt └── soil-data ├── plot1 └── plot2
Now you can refer to files from within your scripts without worrying about
exactly where they are in the file system:
read.table("soil-data/plot1"). And if you want to share this with a
colleague, you can
tar- up the entire directory and they will
see exactly what you see.
As an added bonus, you can turn this one directory into a git repository. For true geek Nirvana, you can put a markdown text file and a bibtex database in there too, and then you’ve got an entire manuscript, all in one place. But I’m getting ahead of myself.
If you’re writing packages, you no longer need to
source anything. All
your code should be in the
R directory. You can include ‘private’
functions if you need to. As for data,
system.file() provides one option
for portably getting to it, either from within functions or as part of the
examples in your help files.
I won’t go on about packages, because there’s really good information available on that from Hadley Wickham.