R on plantarum.ca

https://plantarum.ca/tags/r/
Recent content in R on plantarum.ca
Thu, 04 Apr 2024 00:00:00 +0000

Preparing GBIF records for distribution modeling
https://plantarum.ca/2024/04/04/record-cleaning/
Thu, 04 Apr 2024 00:00:00 +0000
GBIF.org
The Global Biodiversity Information Facility (GBIF.org) has become the standard open-access online database of occurrence records for all manner of biological organisms. It was initially a clearinghouse for museum records (such as herbarium specimens), but now includes iNaturalist observations (those that are rated ‘research’ grade), survey data, and a growing variety of taxonomic and checklist sources. While GBIF’s expansion increases the overall value of the database, it also means we need to be more circumspect in how we use the data.

Niche Quantification with Ecospat and Terra
https://plantarum.ca/2023/07/28/ecospat-terra/
Fri, 28 Jul 2023 00:00:00 +0000
Introduction
This is an update of my previous ecospat tutorial. Spatial analysis in R is shifting to terra and sf as the primary packages, so I’ve translated my old, raster-based tutorial to the new workflow. I also took this opportunity to clean up and extend the original tutorial. See the RSpatial tutorial for a more detailed introduction/overview of using terra for GIS/spatial analysis. Note this analysis depends on the ecospat package, and as of 2023-07-28 ecospat doesn’t support the spatial objects produced by terra.

Managing Absolute Paths in Reproducible Analyses
https://plantarum.ca/2023/02/14/path_switching/
Tue, 14 Feb 2023 00:00:00 +0000
In a previous post on reproducible analysis, I explained the importance of using relative paths in your scripts, and organizing your data in a single directory, in order to maintain portability.
You want to be able to pack up your analysis in a zip file, or upload it as a single directory to GitHub or Dropbox, in order to share it with colleagues, or transfer it to a new computer.

Simple Maps in R with Terra
https://plantarum.ca/2023/02/13/terra-maps/
Mon, 13 Feb 2023 00:00:00 +0000
Reference
This is an update of my previous mapping tutorial. Spatial analysis in R is shifting to terra and sf as the primary packages, so I’ve translated my old, raster-based tutorial to the new workflow. See the RSpatial tutorial for a more detailed introduction/overview of using terra for GIS/spatial analysis. The following tutorial walks through some common plotting tasks I use for distribution models.
Basemaps
The geodata package provides several convenient functions for downloading raster and vector maps for use as basemaps and spatial analysis.

Data Management for Reproducible Science
https://plantarum.ca/2022/10/17/data_management/
Mon, 17 Oct 2022 00:00:00 +0000
Introduction
Research is reproducible when others can reproduce the results of a scientific study given only the original data, code, and documentation (Alston and Rick, 2021).
Benefits to the Author:
- Clear and complete documentation of your work makes it easier to share, write up and extend in future work, including responding to reviewers and developing new projects.
- Conscientious documentation of your work involves a great deal of error-checking, which is reassuring to you – that you haven’t missed anything, or mis-remembered what you did; and to your readers – that you have conducted your work in a rigorous manner.
- Reproducible work gets cited more, and developing a data archive creates a new citable product from your research.

Schoener's D and Study Extent
https://plantarum.ca/2021/12/02/schoenersd/
Thu, 02 Dec 2021 00:00:00 +0000
Background
Schoener’s D was created by Schoener (1968). He was studying the feeding niche of anoles, and needed a way to quantify the overlap in prey items for different species. This is what he came up with:
\[D(p_X, p_Y) = 1 - \frac{1}{2} \sum_i \vert p_{X,i} - p_{Y,i} \vert\]
Here, \(p_{X,i}\) and \(p_{Y,i}\) are the frequencies for species \(X\) and \(Y\), respectively, for the \(i^{th}\) category. For Schoener, the categories were prey sizes.

Thinning Occurrence Records in R
https://plantarum.ca/2021/10/26/r-gridsample/
Tue, 26 Oct 2021 00:00:00 +0000
Note that this tutorial refers to the thinning method used in the old version of the rspatial.org tutorial, which used the raster package (along with dismo) for the GIS computations. The terra package will shortly be replacing raster, and all new code should use terra instead. The details of spatial thinning with terra are presented in my new ecospat tutorial.
A common approach to reducing spatial bias in occurrence records is to randomly select one (or a small number) of samples present in each cell in the landscape.

Emacs for Bioinformatics #4: RMarkdown
https://plantarum.ca/2021/10/03/emacs-tutorial-rmarkdown/
Sun, 03 Oct 2021 00:00:00 +0000
This is part four in my series of Emacs tutorials aimed at bioinformatics (and other scientific analysis) workflows. See the rest on my tutorials page. Emacs provides full support for editing RMarkdown documents. RMarkdown has extensive documentation, both at the previous RStudio link, and several free online books by Xie et al. (notably R Markdown: The Definitive Guide, but also several others listed on Yihui Xie’s Bookdown page). Most of these references assume you are using the RStudio development environment.

Evaluating Invasion Stage with SDMs
https://plantarum.ca/2021/08/11/invasion-stage/
Wed, 11 Aug 2021 00:00:00 +0000
My attempt to recreate the invasion stage analysis developed by Gallien et al. (2012), inspired by seeing it applied by Eckert et al. (2020). We’ll continue with the Lythrum salicaria data from my tutorial on niche quantification analysis. Specifically, I’ll model how the niche space this species occupies in its invaded range in North America relates to its global niche.
    library(ecospat)
    library(raster)
    library(rgbif)
    library(maptools)
    library(magrittr)
    library(dismo)
Gallien et al. (2012) used an ensemble of SDMs, which is (should be) more robust than applying a single approach.

Niche Quantification with Ecospat
https://plantarum.ca/2021/07/29/ecospat/
Thu, 29 Jul 2021 00:00:00 +0000
The ecospat package (Cola et al. 2017) provides code to quantify and compare the environmental and geographic niche of two species, or of the same species in different contexts (e.g., in its native and invaded ranges). The included vignette explains how to do such analyses. However, the vignette assumes you already have a matrix of occurrence records, along with the climate data for each of those records. In our work, we typically have to construct those matrices from observation data (herbarium records, iNaturalist observations, etc.) and climate rasters (e.

GBS Admixture Analysis Workflow
https://plantarum.ca/2021/06/01/admixture/
Tue, 01 Jun 2021 00:00:00 +0000
Admixture is a program for completing STRUCTURE-style analyses of large SNP datasets, such as we get with GBS (Elshire et al. 2011). This short tutorial covers getting our SNP data from STACKS (Rochette, Rivera‐Colón, and Catchen 2019) into a format that Admixture will understand, running the analysis, and importing the results into R for further investigation & plotting.
Converting Stacks Output to Admixture Input
Both Stacks and Admixture can process PLINK data.

Adding Lat/Lon Grids to Maps in R
https://plantarum.ca/2021/02/22/graticules-r/
Mon, 22 Feb 2021 00:00:00 +0000
In a previous post, I outlined my workflow for preparing maps in R. Today I had to add a graticule, a grid of latitude and longitude lines, to my maps. That’s easy enough to do with unprojected maps, as the plot coordinates are latitude and longitude, so your X and Y axes are already graticules. But if you’ve projected your data, the plot coordinates are on a different scale, so you need to do a bit of tuning.

Emacs for Bioinformatics #3: R and ESS
https://plantarum.ca/2020/12/30/emacs-tutorial-03/
Wed, 30 Dec 2020 00:00:00 +0000
This is part three in my series of Emacs tutorials aimed at bioinformatics (and other scientific analysis) workflows. See the rest on my tutorials page. Emacs support for the R programming language is provided by the ESS package (AKA, “Emacs Speaks Statistics”). ESS has been around since at least 1994, and is supported by a very active development team. It provides most or all of the features of the more widely-known RStudio, as well as a great many more.

Plotting Simple Maps in R
https://plantarum.ca/2020/10/30/simple-maps-r/
Fri, 30 Oct 2020 00:00:00 +0000
NOTE: This tutorial uses older R packages that are scheduled to be deprecated at the end of 2023. I have updated this tutorial using the new packages. Unless you need to use older code, you should use the new Terra-based approach instead of this!
Reference
See the RSpatial tutorial for a more detailed introduction/overview of using R for GIS/spatial analysis. The following tutorial walks through some common plotting tasks I use for distribution models.

Medium Performance Cluster Computing
https://plantarum.ca/2014/08/19/medium-performance-cluster-computing/
Tue, 19 Aug 2014 00:00:00 +0000
I recently ran into a crunch getting some memory-intensive GIS analysis completed. My work laptop has 2 CPUs and 4GB RAM, and running one instance of the GRASS GIS r.horizon command on a 16GB map was gobbling up 8GB of virtual RAM, which temporarily ground my machine to a crawl before the process was killed. GRASS is not yet installed on the high performance cluster at work, so I decided to try setting up my own medium performance cluster on a Digital Ocean VPS (which they refer to as ‘droplets’).

Publication Quality R Figures
https://plantarum.ca/2014/02/19/r-graphics/
Wed, 19 Feb 2014 00:00:00 +0000
Contents:
- Introduction
- Learning Objectives
- Pre-requisites
- Motivation
- Building Our Plot
- Size
- Content
- Plot Symbols
- Margins
- Axes
- The finished plot
- Exercise 1: adding a legend
- Additional Customization
- Selecting Plot Symbols
- Panels
- Exercise 2: Completing the Panel
- Image Formats
- Raster Images
- Vector Images
Figure 1: A. Iris Sepal Size by Species. B. Iris Petal Width
Introduction
Learning Objectives
At the end of this lesson, you should be able to:
- Customize plots produced with the R base graphics system
- Design multi-panel plots
- Design plots to suit the publication requirements of a journal
- Save your plots as high-resolution raster or vector image files as required by your publisher
Pre-requisites
You will need: