9 min read

Niche Quantification with Ecospat

§r §sdm

The ecospat package (Cola et al. 2017) provides code to quantify and compare the environmental and geographic niche of two species, or of the same species in different contexts (e.g., in its native and invaded ranges). The included vignette explains how to do such analyses.

However, the vignette assumes you already have a matrix of occurrence records, along with the climate data for each of those records. In our work, we typically have to construct those matrices from observation data (herbarium records, iNaturalist observations, etc) and climate rasters (e.g. Fick and Hijmans 2017). This short tutorial will walk through the steps necessary to do this.

Packages

In addition to ecospat, we’ll use raster (Hijmans 2021) to download WorldClim (Fick and Hijmans 2017) rasters, and manipulate the spatial data; rgbif (Chamberlain et al. 2021) to download GBIF records, and maptools (Bivand and Lewin-Koh 2021) to get a world basemap for plots.

library(ecospat)
library(raster)
library(rgbif)
library(maptools)

NB there is a bug in ecospat that prevents us from using the argument geomask (see below). This has been fixed, but as of 2021-07-29, the bug fix has not made it into the released package, currently version 3.2. Consequently, you need to install directly from the development sources:

library(devtools)
install_github(repo = "ecospat/ecospat/ecospat")

Presumably this won’t be necessary for versions 3.3+ or newer (once released).

Getting Data

We’ll start by sourcing our data. For observations, let’s take a look at Purple Loosestrife, a wetland species that is native to Europe, and invasive in North America. For actual research work, I normally download the files directly from GBIF, and examine them carefully to check for errors or missing data. For this demo we’ll use the rgbif package to download the data directly into R, and we’ll assume there are no problems.

lsGBIF <- occ_search(scientificName = "Lythrum salicaria",
                    limit = 10000,
                    basisOfRecord = "Preserved_Specimen",
                    hasCoordinate = TRUE,
                    fields = c("decimalLatitude",
                               "decimalLongitude", "year",
                               "country", "countryCode"))

save(lsGBIF, file = "../data/2021-07-29-ls-gbif-recs.Rda")

This returned an object with 7969 records. I saved that locally, so that I’m not making GBIF search their database everytime I work on this demo.

load("../data/2021-07-29-ls-gbif-recs.Rda")
lsOccs <- lsGBIF$data

lsGBIF$data is the table with the actual records in it. That’s what we’ll be working with. The other components of lsGBIF are metadata related to the original GBIF search. That’s useful to have, but not needed for the rest of this example.

Next, we tell R which columns are the coordinates, which allows us to map the observations. This also converts our observation matrix to a SpatialPointsDataFrame object.

coordinates(lsOccs) <- c("decimalLongitude",
                        "decimalLatitude") 
data(wrld_simpl) # load the maptools worldmap

par(mar = c(0,0, 0, 0))
plot(wrld_simpl, border = "gray80")
points(lsOccs, pch = 16, col = 2, cex = 0.3)

To get our climate data, we can use raster’s getData function. The first time you call this function in a directory, it downloads the data from the internet, and saves it locally. Subsequent calls will load your local copy of the data, to speed things up. I’m using the coarsest resolution (10 minutes) to speed things up for this demonstration:

wclim <- getData("worldclim", var = "bio", res = 10,
                path = "../data")

We can take a look at one layer:

par(mar = c(0,0, 3, 1))
plot(wclim[["bio1"]], main = "bio1")

Next, we need to extract the environmental values from the climate rasters for each of our observation records:

lsOccs <- cbind(lsOccs, extract(wclim, lsOccs))

In the process of extracting wclim values for our observations, we usually end up with a few missing values. This is a consequence of mismatches between the observation coordinates and the climate rasters. In some cases, the observations are placed off the coast in the ocean, or in another area where there is no climate data available. We need to exclude these missing values from our analysis.

lsOccs <- lsOccs[complete.cases(data.frame(lsOccs)), ]

Splitting Data

At this point, all the data we need for the Niche Quantification analysis is in lsOccs and wclimMat. We need to split this data into native and invasive regions for our comparison. We’ll restrict ourselves to the northern hemisphere north of 20 degrees, and consider all records from Eurasia as native, and all records from North America as invasive.

I’ve created extents to cover the rough outlines of the areas in question. In practice, you could use a more carefully constructed vector map to split your data.

## North America: na
## Longitude from 40 to 180W, Latitude from 20 to 90N
naExt <- extent(c(-180, -40, 20, 90))
lsNA <- crop(lsOccs, naExt)

## Eurasia: ea
## Longitude from 40W to 180E, Latitude from 20 to 90N
eaExt <- extent(c(-40, 180, 20, 90))
lsEA <- crop(lsOccs, eaExt)

par(mar = c(1, 0, 0, 0))
plot(wrld_simpl, ylim = c(20, 80), axes = FALSE)
points(lsNA, pch = 16, col = 'red', cex = 0.5)
points(lsEA, pch = 16, col = 'darkgreen', cex = 0.5)

For the Niche Quantification, we need to have a matrix with the background environment present in the native and invasive ranges, as well as the complete global environmental including the combined extent of the native and introduced environments. After cropping, we use getValues to convert the raster to a dataframe.

## Crop Climate Layers:
naEnvR <- crop(wclim, naExt)
eaEnvR <- crop(wclim, eaExt)

## Extract values to matrix:
naEnvM <- getValues(naEnvR)
eaEnvM <- getValues(eaEnvR)

## Clean out missing values:
naEnvM <- naEnvM[complete.cases(naEnvM), ]
eaEnvM <- eaEnvM[complete.cases(eaEnvM), ]

## Combined global environment:
globalEnvM <- rbind(naEnvM, eaEnvM)

Niche Quantification

PCA

The Niche Quantification analysis starts with a Principal Components Analysis of the environmental data. The actual ordination uses the global data, with the observation records and the native and invasive background environment treated as supplemental rows.

pca.clim <- dudi.pca(globalEnvM, center = TRUE,
                    scale = TRUE, scannf = FALSE, nf = 2)
global.scores <- pca.clim$li

nativeLS.scores <-
  suprow(pca.clim,
         data.frame(lsEA)[, colnames(globalEnvM)])$li   
invasiveLS.scores <-
  suprow(pca.clim,
         data.frame(lsNA)[, colnames(globalEnvM)])$li

nativeEnv.scores <- suprow(pca.clim, naEnvM)$li
invasiveEnv.scores <- suprow(pca.clim, eaEnvM)$li

Let’s break that down. dudi.pca does a PCA analysis on globalEnvM, which is a matrix of all the environmental variables over the entire study area. We use that to create a two-dimensional summary of the total environmental variability.

Next, we map our observation data (lsEA and lsNA) into that 2-dimensional ordination, using the suprow function. lsEA and lsNA are SpatialPointsDataFrame objects. Sometimes you can treat them as if they were data.frames, but other times you need to explicity convert them. This is one of those times, hence I’ve wrapped them in data.frame().

Recall that lsEA and lsNA have more columns than the environmental matrix: they also include year, countryCode, country. We only want to include the environmental variables when you project the observations into the ordination. To make sure that we use the same variables as in the original ordination of globalEnvM, in the same order, I select the columns explicitly to match that object:

data.frame(lsEA)[, colnames(globalEnvM)]

The output of dudi.pca and suprow includes a lot of information that we aren’t using here. We only need the li element, so I’ve selected that from each of the function outputs.

Occurence Densities Grid

Finally we’re ready to do the Niche Quantification/Comparisons. We’ll use the PCA scores for the global environment, the native and invasive environments, and the native and invasive occurrence records.

nativeGrid <- ecospat.grid.clim.dyn(global.scores,
                                   nativeEnv.scores,
                                   nativeLS.scores)

invasiveGrid <- ecospat.grid.clim.dyn(global.scores,
                                   invasiveEnv.scores, 
                                   invasiveLS.scores)

ecospat.plot.niche.dyn(nativeGrid, invasiveGrid,
                       quant = 0.05) 

The resulting plot shows us the environmental conditions present in Eurasia (inside the green line) and North America (inside the red line). The green area represents environments occupied by Lythrum salicaria in Eurasia, but not in North America, the red area shows environments occupied in North America and not Eurasia, and the blue area shows environments occupied in both ranges. We can also see that there are a few areas in Eurasia with environments not present in North America, and vice versa. However, for the most part, Lythrum salicara doesn’t occur in this environments (except for a tiny bit of green in the center of the plot).

Geographic Comparisons

You can also apply this analysis to geographic locations, instead of environmental conditions. This won’t make much sense for native vs invaded range comparisons, but it could be useful for comparing different species within the same area.

To demonstrate, let’s compare the distribution of Lythrum salicaria in North America before and after 1950. We use geographic coordinates here, so no need for a PCA. We do need to generate the ‘background’ coordinates. I’ll use expand.grid to create the locations for this. I’ve broken up the NA extent into 500 x 500 grids.

lsNAearly <- subset(lsNA, year <= 1950)
lsNAlate <- subset(lsNA, year > 1950)
geoGrid <- expand.grid(longitude =
                        seq(-160, -40, length.out = 500),
                      latitude =
                        seq(20, 90, length.out = 500))

earlyGeoGrid <- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                     coordinates(lsNAearly))

lateGeoGrid <- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                    coordinates(lsNAlate))

ecospat.plot.niche.dyn(earlyGeoGrid, lateGeoGrid, quant = 0)
plot(wrld_simpl, add = TRUE)

This looks pretty good. However, ecospat uses a kernel density formula to model the occurence distributions. As a consequence, it projects out into the ocean, which isn’t very realistic. To correct this, we need to mask the analysis to the continental land mass. This requires we have a vector map of the desired area. I’ll combine the US, Canada, and Mexico polygons from wrld_simpl for this purpose.

naMask <- bind(subset(wrld_simpl, NAME == "Canada"),
              subset(wrld_simpl, NAME == "United States"),
              subset(wrld_simpl, NAME == "Mexico"))

earlyGeoGrid <- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                     coordinates(lsNAearly),
                                     geomask = naMask)

lateGeoGrid <- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                    coordinates(lsNAlate),
                                    geomask = naMask)

ecospat.plot.niche.dyn(earlyGeoGrid, lateGeoGrid, quant = 0)
plot(wrld_simpl, add = TRUE)

That gives more reasonable results.

Summary

This is a fairly quick overview of this workflow. You’ll almost certainly want to consider thinning your observations, among other data cleaning procedures. I’ve also set the study extent very crudely. That might be appropriate for very large scale (global) studies. But you’ll usually want to think a bit more carefully about how you set your extent. The way you process your data will also differ depending on your context.

References

Bivand, Roger, and Nicholas Lewin-Koh. 2021. Maptools: Tools for Handling Spatial Objects. Manual.
Chamberlain, Scott, Vijay Barve, Dan Mcglinn, Damiano Oldoni, Peter Desmet, Laurens Geffert, and Karthik Ram. 2021. Rgbif: Interface to the Global Biodiversity Information Facility API. Manual.
Cola, Valeria Di, Olivier Broennimann, Blaise Petitpierre, Frank T. Breiner, Manuela D’Amen, Christophe Randin, Robin Engler, et al. 2017. “Ecospat: An R Package to Support Spatial Analyses and Modeling of Species Niches and Distributions.” Ecography 40 (6): 774–87. https://doi.org/10.1111/ecog.02671.
Fick, Stephen E., and Robert J. Hijmans. 2017. WorldClim 2: New 1-Km Spatial Resolution Climate Surfaces for Global Land Areas.” International Journal of Climatology 37 (12): 4302–15. https://doi.org/10.1002/joc.5086.
Hijmans, Robert J. 2021. Raster: Geographic Data Analysis and Modeling. Manual.