It hit 23o Celsius in Ottawa today, so I packed my camera and headed out to Mer Bleue for my first photo trek of the season. It’ll be another week or so before the spring wildflowers get into full swing, but I came back with a great preview.
I’ve been working on a lot of AFLP data this winter. I’d really like to be able to do all the analysis in R, for a few reasons. First, it would mean no more fighting with GeneMapper, which is incredibly frustrating: it’s Windows-only, expensive, closed-source and painfully underpowered for the job. Second, presumably if I can figure out how to code this myself I will develop a deeper understanding of the system. And third, if I can get the code working in R, I will be able to automate most of the process.
There are two R projects already in progress for working with AFLP data. RawGeno is one option. It doesn’t yet allow for importing fsa files directly, but the example scripts provide some clues about how to do this. I couldn’t get the code to work as written, but I was able to steal some ideas from it.
The other R package is AFLP. This package includes a read.fsa() function, but it doesn’t seem to work yet. I understand they’ve only recently switched to ABI sequencers, and haven’t yet updated their code. AFLP also combines reading the fsa files, calibrating the sizing, and defining the bins into one step. That’s a sensible thing to do, but I’m not that clever. I need to break things into small pieces if I hope to get anywhere.
Since one of my goals is self-education, I’m not concerned about duplicating some of the effort of these other projects. In fact, I’m going to try and steal as much as I can from them. That’s one of the benefits of Free Software, we get to learn from each other.
The Flora of North America is a great resource for botanists. The books are nice, and even better, almost all of the keys and images are also freely available online. These keys are generally the best available for a genus or family, unless you’re lucky enough to work in an area with a very recent local flora.
However, since the keys have to include all the species found anywhere in the US and Canada, they tend to be long and convoluted. On occasion I’ve rewritten some of the keys, to trim them down to the species where I live. This is tedious work, and I usually give up before I’ve done more than one or two of my favourite Carex sections.
Enter Python. I’ve been looking for a good project to try Python for a while, and it turns out it’s a great tool for scraping websites and reformatting the data to suit your needs.
(If you’re not interested in programming, you may want to skip to the end product, which is my draft key to the sedges of Ontario. The rest of this post is about the code I wrote to make it. On the other hand, if you are interested in Python, what follows may horrify you. It’s my first Python program, so it’s bound to be ugly. This may lead you to wonder who exactly is the intended audience for this post. That’s a good question.)