
Using the LaTeX listings package to style R PDF reports with knitr and pandoc

knitr is an R package that allows you to include R code in markdown or LaTeX source files, and have the code and/or its output included in the resulting html or pdf files. RStudio provides good support for this, so if you want to try it out that’s a good place to start. This post assumes you’ve got everything installed and working, and want to customize the pdf output via LaTeX.

I’ve been working with this for a week or two, and the one hitch that I’ve run into is generating a nice pdf directly from the source Rmd file. For example, working from this source file, in the Rmd or R-markdown format:

Generating html is easy. From within R, just call knit2html("example.Rmd"). This produces a self-contained, nicely highlighted html file. Even easier: from RStudio, just click the ‘knit to html’ button.

Getting to pdf requires one more step. knit2html("example.Rmd") created two new files, one in markdown: example.md, and the target html: example.html. The markdown file can be converted to pdf with pandoc("example.md", format = "latex").

(If you don’t need html output, you can use knit("example.Rmd") instead of knit2html.)

This calls pandoc behind the scenes to do the conversion. You can now view your output in a pdf viewer. The R source code and output have a different font and a bit of highlighting, but are not otherwise set off from the surrounding text.

Default Pandoc PDF

I’m preparing R tutorials, and want to visually distinguish my instructions, the R code, and the associated output. The pandoc default doesn’t quite cut it here.

Leaving R for the command line, we can try the pandoc highlight-style options. I like tango:
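The command would look something like the sketch below. The --highlight-style option is pandoc's own, but the file names are assumed from the example above, and the guard only exists so the snippet does nothing when pandoc or the markdown file is missing. If your pandoc version complains about the option, check pandoc --help for the current spelling.

```shell
#!/bin/sh
# Sketch only: file names are assumed from the example above.
md_file="example.md"
pdf_file="${md_file%.md}.pdf"

# Run only if pandoc is installed and the markdown file exists.
if command -v pandoc >/dev/null 2>&1 && [ -f "$md_file" ]; then
    pandoc "$md_file" --highlight-style=tango -o "$pdf_file"
else
    echo "skipping: need pandoc and $md_file" >&2
fi
```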

This shades the R source code, but the output is unchanged. Not quite what I’m after:

Pandoc tango highlighting style

It seems like the highlight-styles ought to be customizable, but I haven’t figured that out yet.

Pandoc provides one further option, using the LaTeX Listings package. Listings provides lots of different options for customizing the presentation and highlighting of code blocks in latex output. With listings I’ll be able to add boxes and shading to the code chunks with a custom template for the file.

However, before I can do that, I need to fix one small shortcoming of the Pandoc LaTeX output. When Pandoc uses the --listings option on our example.md file:

The resulting latex is marked up like this:

Note that the R source code has the language option set to R, but the output has no options set at all. There’s no way to style these two environments differently in LaTeX. To do that, we need to apply a style to one or the other of these listings. I haven’t found any way to accomplish this using knitr or pandoc, so I now pipe the pandoc output through sed to get this done:
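A minimal sketch of that sed step follows. The style name Routput is hypothetical, and the sample text only imitates the shape of the --listings output described above (an R block carrying language=R, an output block carrying no options). The pattern is anchored at both ends, so only the bare output listings get tagged.

```shell
#!/bin/sh
# Sketch only: "Routput" is a hypothetical style name, and the sample
# below imitates the shape of pandoc's --listings output.
sample='\begin{lstlisting}[language=R]
x <- 1 + 1
\end{lstlisting}
\begin{lstlisting}
## [1] 2
\end{lstlisting}'

# Tag only the bare \begin{lstlisting} lines (the R output); lines that
# already carry [language=R] do not match the anchored pattern.
tagged=$(printf '%s\n' "$sample" |
    sed 's/^\\begin{lstlisting}$/\\begin{lstlisting}[style=Routput]/')

printf '%s\n' "$tagged"
```

In a real run the input would be the .tex file that pandoc writes rather than an inline sample.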

Almost there. Now the listings styles are applied, but the style needs to be defined in the document template. Pandoc templates are stored in ~/.pandoc/templates. The default latex template is available with the command pandoc -D latex. So I created a new template:
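Creating the template can be sketched as follows. The pandoc -D latex call, the templates directory, and the ty.latex name all come from the text; the TEMPLATE_DIR override is a hypothetical convenience for trying this out somewhere harmless. Depending on your pandoc version the user data directory may live elsewhere; pandoc --version reports it.

```shell
#!/bin/sh
# Sketch only: TEMPLATE_DIR is a hypothetical override for trying this
# out somewhere other than the real templates directory.
template_dir="${TEMPLATE_DIR:-$HOME/.pandoc/templates}"
template="$template_dir/ty.latex"

mkdir -p "$template_dir"

# Dump pandoc's default LaTeX template as a starting point, without
# clobbering a template that has already been customized.
if [ -e "$template" ]; then
    echo "keeping existing $template" >&2
elif command -v pandoc >/dev/null 2>&1; then
    pandoc -D latex > "$template"
else
    echo "pandoc not found; nothing written" >&2
fi
```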

Now I can place the listings style info directly into ty.latex:

Check the documentation for listings to see all the options. The code here sets the default options for the R blocks (source and output), including boxing and colouring strings and comments. In addition, R source code blocks will be shaded gray; the background of the output remains white.
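As a rough illustration of what such a preamble fragment can look like, the sketch below writes one to a scratch file so it can be pasted into ty.latex. The file name, the style name Routput (whatever name the sed step attaches to output blocks), and every colour and font value are assumptions for illustration, not the settings from my template. The listings keys themselves (language, basicstyle, frame, backgroundcolor, commentstyle, stringstyle) are standard.

```shell
#!/bin/sh
# Sketch only: the file name, the style name "Routput", and every option
# value below are illustrative, not the settings from the original template.
fragment="listings-style.tex"

# Quoted 'EOF' keeps the shell from touching backslashes and braces.
cat > "$fragment" <<'EOF'
\usepackage{xcolor}
\usepackage{listings}

% Defaults shared by R source and R output blocks.
\lstset{
  language=R,
  basicstyle=\ttfamily\small,
  frame=single,                      % box every block
  backgroundcolor=\color{gray!15},   % shade source blocks
  commentstyle=\color{teal},
  stringstyle=\color{purple}
}

% Output blocks: same box, but keep the background white.
\lstdefinestyle{Routput}{
  language={},
  backgroundcolor=\color{white}
}
EOF

echo "wrote $fragment; paste its contents into the preamble of ty.latex"
```

Putting the shared settings in \lstset and overriding only the background in the named style keeps the two kinds of block visually related while still telling them apart.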

This is just the head of the file; the rest of it is unchanged from the default. I don’t understand all the options, so I’ll leave them alone for now. I did change the fonts to tgschola for the body, DejaVuSansMono for the code chunks, and added my standard geometry and natbib options. I haven’t used this template with references yet, so that may need tweaking.

With the template in place, I can now generate a pdf with my preferred formatting directly from the source Rmd, using the following script:

The first line just bundles up the knitr call, using Rscript to run a self-contained R session for the processing. Next, we use pandoc to generate the LaTeX file. I use sed to add the style options, and finally call texi2pdf to generate the final document.
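Those four steps can be sketched as a small shell function. The function name, the Routput style name, and the use of a temporary file for the sed step are my assumptions here; the tools and their order are as described above. Passing --template=ty lets pandoc look for ty.latex in its templates directory.

```shell
#!/bin/sh
# Sketch only: the function name, the style name "Routput", and the
# temporary-file handling are assumptions pieced together from the
# steps described above.
rmd2pdf() {
    base="${1%.Rmd}"

    # 1. knit the Rmd to markdown in a self-contained R session
    Rscript -e "knitr::knit('$base.Rmd')" || return 1

    # 2. markdown to LaTeX, using listings and the custom template
    pandoc "$base.md" --listings --template=ty -o "$base.tex" || return 1

    # 3. tag the bare output listings with a style
    sed 's/^\\begin{lstlisting}$/\\begin{lstlisting}[style=Routput]/' \
        "$base.tex" > "$base.tmp" || return 1
    mv "$base.tmp" "$base.tex" || return 1

    # 4. LaTeX to pdf
    texi2pdf "$base.tex"
}

# Usage: rmd2pdf example.Rmd
```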

And here’s the result:

Example rmd2pdf document

That’s a fairly long and winding path to travel! Now that it’s done, I can use all the features of listings, and only need remember the simple R Markdown formatting for my day-to-day writing.

The Reindeer Botanist, by Wendy Dathan

The following book review first appeared in The Canadian Field Naturalist, Volume 127, #3, 2013 pages 282-283. As I am a federal employee, copyright remains with the crown. You are welcome to redistribute the text for non-commercial use.


Alfred Erling Porsild, the subject of Wendy Dathan’s biography “The Reindeer Botanist”

Erling Porsild is a legendary figure in Canadian botany. His personal accomplishments as an explorer, taxonomist and biogeographer have rightly earned him a place of honour in Canadian science. More than that, his career crossed an important transition in floristic research. When he arrived in this country, vast swaths of the boreal and arctic regions of Canada had never been visited by biologists. By the end of his career, he had personally documented the flora of large parts of the Yukon, Northwest Territories, and the Hudson Bay Lowlands, and the number of places in Canada’s far north that remained unknown to science was shrinking fast.


Preparing Rubus samples for herbarium study

Collecting blackberries and their relatives (Rubus spp.) for herbarium study is particularly challenging, and even experienced field-botanists may not appreciate everything that is involved. More than in other vascular plant groups, to make a good Rubus specimen, you need to understand a bit about their life-cycle.


A single Rubus allegheniensis specimen, with the first-year primocane on the right, the second-year floricane on the left, and my expert Rubus presser Charlotte in the middle. Note that the primocane is unbranched, while the floricane has many flowering branches


CSI Ottawa: how plant taxonomy foiled a Japanese plot to fire-bomb Saskatoon in WWII

I’ve been reading Wendy Dathan’s monumental biography of Erling Porsild, The Reindeer Botanist (read Jeff Saarela’s review). Erling had an incredibly varied career, which included playing cowboy to Canada’s first herd of domestic reindeer, and serving as Canadian consul to Greenland during the second world war. He was first and foremost a botanist, though, and it was in his capacity as chief botanist at the Canadian Museum of Nature that he had the opportunity to avert an impending Japanese attack on the nation.

Alfred Erling Porsild, Chief Botanist of the Canadian Museum of Nature



Processing ABI .fsa files in R, part 1.

I’ve been working on a lot of AFLP data this winter. I’d really like to be able to do all the analysis in R, for a few reasons. First, it would mean no more fighting with GeneMapper, which is incredibly frustrating: it’s Windows-only, expensive, closed-source and painfully underpowered for the job. Second, presumably if I can figure out how to code this myself I will develop a deeper understanding of the system. And third, if I can get the code working in R, I will be able to automate most of the process.

There are two R projects already in progress for working with AFLP data. RawGeno is one option. It doesn’t yet allow for importing fsa files directly, but the example scripts provide some clues about how to do this. I couldn’t get the code to work as written, but I was able to steal some ideas from it.

The other R package is AFLP. This package includes a read.fsa() function, but it doesn’t seem to work yet. I understand they’ve only recently switched to ABI sequencers, and haven’t yet updated their code. AFLP also combines reading the fsa files, calibrating the sizing, and defining the bins into one step. That’s a sensible thing to do, but I’m not that clever. I need to break things into small pieces if I hope to get anywhere.

Since one of my goals is self-education, I’m not concerned about duplicating some of the effort of these other projects. In fact, I’m going to try and steal as much as I can from them. That’s one of the benefits of Free Software: we get to learn from each other.


My own private Flora of North America

The Flora of North America is a great resource for botanists. The books are nice, and even better, almost all of the keys and images are also freely available online. These keys are generally the best available for a genus or family, unless you’re lucky enough to work in an area with a very recent local flora.

However, since the keys have to include all the species found anywhere in the US and Canada, they tend to be long and convoluted. On occasion I’ve rewritten some of the keys, to trim them down to the species where I live. This is tedious work, and I usually give up before I’ve done more than one or two of my favourite Carex sections.

Enter Python. I’ve been looking for a good project to try Python for a while, and it turns out it’s a great tool for scraping websites and reformatting the data to suit your needs.

(If you’re not interested in programming, you may want to skip to the end product, which is my draft key to the sedges of Ontario. The rest of this post is about the code I wrote to make it. On the other hand, if you are interested in Python, what follows may horrify you. It’s my first Python program, so it’s bound to be ugly. This may lead you to wonder who exactly is the intended audience for this post. That’s a good question.)
