<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>R on plantarum.ca</title>
    <link>https://plantarum.ca/tags/r/</link>
    <description>Recent content in R on plantarum.ca</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 10 May 2024 00:00:00 +0000</lastBuildDate>
    
        <atom:link href="https://plantarum.ca/tags/r/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Spatial Tutorials Update</title>
      <link>https://plantarum.ca/2024/05/10/terra-time/</link>
      <pubDate>Fri, 10 May 2024 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2024/05/10/terra-time/</guid>
      <description>


&lt;p&gt;A quick update. The &lt;a href=&#34;https://rspatial.org/&#34;&gt;spatial analysis libraries&lt;/a&gt; in
the &lt;a href=&#34;https://www.r-project.org/&#34;&gt;R Project&lt;/a&gt; have undergone a substantial
change in the past couple of years. The details are laid out in the &lt;a href=&#34;https://r-spatial.org/&#34;&gt;R
spatial blog&lt;/a&gt;, but the crux of the issue is that
the legacy packages &lt;code&gt;rgdal&lt;/code&gt; and &lt;code&gt;rgeos&lt;/code&gt; have been retired, and packages that
depended on them (such as &lt;code&gt;raster&lt;/code&gt; and &lt;code&gt;sp&lt;/code&gt;) have either been modified to use
new dependencies, or replaced entirely. For the most part, the things we
used to do with &lt;code&gt;raster&lt;/code&gt; we now do with
&lt;a href=&#34;https://rspatial.org/index.html&#34;&gt;&lt;code&gt;terra&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The transition was a bit rough, and for a while we needed to translate back
and forth between &lt;code&gt;terra&lt;/code&gt; and &lt;code&gt;raster&lt;/code&gt; in our work. That &lt;em&gt;should&lt;/em&gt; now be
over, with all current packages using the new &lt;code&gt;terra&lt;/code&gt;-based workflow.&lt;/p&gt;
&lt;p&gt;I have already updated my &lt;a href=&#34;https://plantarum.ca/2023/02/13/terra-maps&#34;&gt;quick mapping tutorial&lt;/a&gt;,
and I’ve just updated my &lt;a href=&#34;https://plantarum.ca/2023/07/28/ecospat-terra&#34;&gt;ecospat tutorial&lt;/a&gt;, now
that &lt;code&gt;ecospat&lt;/code&gt; has been fully updated to use &lt;code&gt;terra&lt;/code&gt; too. Some of the other
spatial tutorials you find here may not work properly, or at all, until I
have a chance to review them. If you happen to find anything that isn’t
working as expected, please let me know!&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Preparing GBIF records for distribution modeling</title>
      <link>https://plantarum.ca/2024/04/04/record-cleaning/</link>
      <pubDate>Thu, 04 Apr 2024 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2024/04/04/record-cleaning/</guid>
      <description>


&lt;div id=&#34;gbif.org&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;GBIF.org&lt;/h1&gt;
&lt;p&gt;The Global Biodiversity Information Facility
(&lt;a href=&#34;https://www.gbif.org/&#34;&gt;GBIF.org&lt;/a&gt;) has become the standard open-access
online database of occurrence records for all manner of biological
organisms. It was initially a clearinghouse for museum records (such as
herbarium specimens), but now includes
&lt;a href=&#34;https://www.inaturalist.org&#34;&gt;iNaturalist&lt;/a&gt; observations (those that are
rated &lt;a href=&#34;https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7&#34;&gt;‘research’
grade&lt;/a&gt;),
survey data, and a growing variety of taxonomic and checklist sources.&lt;/p&gt;
&lt;p&gt;While GBIF’s expansion increases the overall value of the database, it
also means we need to be more circumspect in how we use the data. When I
first encountered GBIF decades ago, I used it as one of several sources for
herbarium records. I searched for the species I was looking for, and
received a list of museum specimens. Nowadays most, maybe all, online
herbarium data is mirrored by GBIF, so I no longer need to chase down
multiple websites to round up all the herbarium records I need.&lt;/p&gt;
&lt;p&gt;However, I can no longer assume that a GBIF record represents a physical
specimen. It could be: a human observation, with or without an associated
image; documentation harvested from sequence data submitted to
&lt;a href=&#34;https://www.ncbi.nlm.nih.gov/genbank/&#34;&gt;GenBank&lt;/a&gt;; or an entry from a field
survey, which sometimes contains records for both &lt;em&gt;presences&lt;/em&gt; and
&lt;em&gt;absences&lt;/em&gt;. And of course, all of these records are subject to any number
of issues: transcription errors, identification errors, georeferencing
errors.&lt;/p&gt;
&lt;p&gt;All of which is to say, we need a way to filter the results of our GBIF query
to ensure the data we receive is fit for purpose.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;getting-gbif-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Getting GBIF data&lt;/h1&gt;
&lt;p&gt;Step one is actually getting the data from GBIF. You can do this from the
website, but I now prefer to do this in my &lt;a href=&#34;https://www.r-project.org/&#34;&gt;R&lt;/a&gt;
scripts, using the
&lt;a href=&#34;https://docs.ropensci.org/rgbif/articles/rgbif.html&#34;&gt;rgbif&lt;/a&gt; package.&lt;/p&gt;
&lt;div id=&#34;finding-taxon-names&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Finding taxon names&lt;/h2&gt;
&lt;p&gt;To get started, we need to match our name up with the GBIF taxonomic
backbone:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(rgbif)
conyza &amp;lt;- name_backbone(&amp;quot;Conyza canadensis&amp;quot;)
erigeron &amp;lt;- name_backbone(&amp;quot;Erigeron canadensis&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This snippet searches the GBIF database for “&lt;em&gt;Conyza canadensis&lt;/em&gt;” and
“&lt;em&gt;Erigeron canadensis&lt;/em&gt;” and returns the closest matches. The results can be
a bit confusing to interpret. To make sense of it, we need to understand a
few terms.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;usageKey&lt;/code&gt; is a unique number associated with every taxon in the
database, at every level. This includes species, subspecies, genera, etc.,
and importantly, it includes both &lt;em&gt;accepted species&lt;/em&gt; and &lt;em&gt;synonyms&lt;/em&gt;.
Complicating things, GBIF taxonomy tables use the term &lt;code&gt;usageKey&lt;/code&gt;, but in
the individual observation records the term &lt;code&gt;taxonKey&lt;/code&gt; is used instead.
They both mean the same thing - you’ll use the &lt;code&gt;usageKey&lt;/code&gt; you get from your
&lt;code&gt;name_backbone&lt;/code&gt; search as the &lt;code&gt;taxonKey&lt;/code&gt; in the query you submit (see
below).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;acceptedUsageKey&lt;/code&gt; (or &lt;code&gt;acceptedTaxonKey&lt;/code&gt;) is the number associated
with every &lt;em&gt;accepted&lt;/em&gt; taxon. For taxa that are synonyms, their
&lt;code&gt;acceptedUsageKey&lt;/code&gt; is the &lt;code&gt;usageKey&lt;/code&gt; of the accepted taxon they belong to.&lt;/p&gt;
&lt;p&gt;To illustrate the distinctions, let’s look at the values for the examples
above.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Key&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;scientificName&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Conyza canadensis (L.) Cronquist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;usageKey&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;5404801&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;status&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;SYNONYM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;acceptedUsageKey&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3146791&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr class=&#34;header&#34;&gt;
&lt;th align=&#34;left&#34;&gt;Key&lt;/th&gt;
&lt;th align=&#34;left&#34;&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;scientificName&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;Erigeron canadensis L.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;even&#34;&gt;
&lt;td align=&#34;left&#34;&gt;usageKey&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;3146791&lt;/td&gt;
&lt;/tr&gt;
&lt;tr class=&#34;odd&#34;&gt;
&lt;td align=&#34;left&#34;&gt;status&lt;/td&gt;
&lt;td align=&#34;left&#34;&gt;ACCEPTED&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In this case &lt;em&gt;Conyza canadensis&lt;/em&gt; is a synonym of &lt;em&gt;Erigeron canadensis&lt;/em&gt;. The
name &lt;em&gt;Conyza canadensis&lt;/em&gt; has its own &lt;code&gt;usageKey&lt;/code&gt;: 5404801. Its
&lt;code&gt;acceptedUsageKey&lt;/code&gt; is 3146791, which is the &lt;code&gt;usageKey&lt;/code&gt; for &lt;em&gt;Erigeron
canadensis&lt;/em&gt;. An additional wrinkle is that accepted taxa only have a
&lt;code&gt;usageKey&lt;/code&gt;; they don’t have an &lt;code&gt;acceptedUsageKey&lt;/code&gt;. You can also use the
&lt;code&gt;status&lt;/code&gt; field to check if a taxon is &lt;em&gt;accepted&lt;/em&gt; or a &lt;em&gt;synonym&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Understanding this is important to make sure you get the records you’re
after when you query the database. If you request data for &lt;code&gt;taxonKey&lt;/code&gt;
5404801 (the &lt;code&gt;usageKey&lt;/code&gt; for &lt;em&gt;Conyza canadensis&lt;/em&gt;), you’ll get records with
that name on them, but &lt;em&gt;not&lt;/em&gt; records for &lt;em&gt;Erigeron canadensis&lt;/em&gt;. On the
other hand, if you search for &lt;code&gt;taxonKey&lt;/code&gt; 3146791, you’ll get records for
&lt;em&gt;Erigeron canadensis&lt;/em&gt;, and also all records for any synonyms of that name,
including &lt;em&gt;Conyza canadensis&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In other words, searching for an &lt;em&gt;accepted&lt;/em&gt; taxon will return results for
that taxon including all its synonyms. Searching for a &lt;em&gt;synonym&lt;/em&gt; will
return results only for that synonym.&lt;/p&gt;
&lt;p&gt;You can search for records by name, without using the &lt;code&gt;usageKey&lt;/code&gt;. But it’s
safer to look up and use the &lt;code&gt;usageKey&lt;/code&gt;, to confirm that the name you asked
for matches with something in the GBIF database.&lt;/p&gt;
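&lt;p&gt;If you always want the accepted taxon, one way to handle both cases is a
small helper that falls back on the &lt;code&gt;usageKey&lt;/code&gt; when there is no
&lt;code&gt;acceptedUsageKey&lt;/code&gt;. This is only a sketch: &lt;code&gt;accepted_key&lt;/code&gt; is a hypothetical
helper, not an &lt;code&gt;rgbif&lt;/code&gt; function, and the toy data frames stand in for the
one-row tables returned by &lt;code&gt;name_backbone&lt;/code&gt;, using the values from the
tables above:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical helper (not part of rgbif): return the key of the
## accepted taxon for a one-row name match
accepted_key &amp;lt;- function(match) {
  ## accepted taxa have no acceptedUsageKey (missing column, or NA)
  has_accepted &amp;lt;- &amp;quot;acceptedUsageKey&amp;quot; %in% names(match) &amp;amp;&amp;amp;
    !is.na(match$acceptedUsageKey[1])
  if (has_accepted) match$acceptedUsageKey[1] else match$usageKey[1]
}

## Toy stand-ins for the results of name_backbone()
conyza_toy &amp;lt;- data.frame(usageKey = 5404801, status = &amp;quot;SYNONYM&amp;quot;,
                         acceptedUsageKey = 3146791)
erigeron_toy &amp;lt;- data.frame(usageKey = 3146791, status = &amp;quot;ACCEPTED&amp;quot;)

accepted_key(conyza_toy)
accepted_key(erigeron_toy)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both calls return the key for &lt;em&gt;Erigeron canadensis&lt;/em&gt;, so a query built
from either name would cover the accepted taxon and its synonyms.&lt;/p&gt;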
&lt;/div&gt;
&lt;div id=&#34;preparing-a-query&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Preparing a query&lt;/h2&gt;
&lt;p&gt;Once you have a list of one or more &lt;code&gt;taxonKey&lt;/code&gt; values, you’re ready to
request your data. For large record sets, and especially if you want to
request multiple species at once, the &lt;code&gt;rgbif&lt;/code&gt; function &lt;code&gt;occ_download_queue&lt;/code&gt;
is very convenient. Note that you need to have a (free) GBIF account in
order to use this.&lt;/p&gt;
&lt;p&gt;This is a three-step process. You create your query with
&lt;code&gt;occ_download_prep&lt;/code&gt;, submit the query to GBIF with &lt;code&gt;occ_download_queue&lt;/code&gt;,
and once the query is done you download the results via &lt;code&gt;occ_download_get&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Starting with &lt;code&gt;occ_download_prep&lt;/code&gt;, a basic query only requires your account
credentials and one or more &lt;code&gt;taxonKeys&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;myQuery &amp;lt;- occ_download_prep(
  pred_in(&amp;quot;taxonKey&amp;quot;, c(3146791, 3189859)),
  pred(&amp;quot;hasCoordinate&amp;quot;, TRUE),
  format = &amp;quot;DWCA&amp;quot;,
  user = &amp;quot;YourUserName&amp;quot;,
  pwd = &amp;quot;YourGBIFPassword&amp;quot;,
  email = &amp;quot;your@email.address&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;SECURITY NOTE&lt;/strong&gt; You can provide your GBIF username and password in your
script as I have done, but there are more secure ways to submit your
credentials without listing them in your code. I have mine stored in my
&lt;code&gt;~/.Renviron&lt;/code&gt; file, which looks like this:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;GBIF_USER=&amp;quot;my_user_name&amp;quot;
GBIF_PWD=&amp;quot;my_password&amp;quot;
GBIF_EMAIL=&amp;quot;my@email.ca&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;See the &lt;a href=&#34;https://docs.ropensci.org/rgbif/articles/gbif_credentials.html&#34;&gt;rgbif
documentation&lt;/a&gt;
for more options.&lt;/p&gt;
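&lt;p&gt;With these variables set, &lt;code&gt;rgbif&lt;/code&gt; can pick up your credentials from the
environment, so the &lt;code&gt;user&lt;/code&gt;, &lt;code&gt;pwd&lt;/code&gt; and &lt;code&gt;email&lt;/code&gt; arguments can be left out
of your script. Here is a minimal sketch of a check you could run first,
using only base R; &lt;code&gt;gbif_credentials_set&lt;/code&gt; is a hypothetical helper name,
not an &lt;code&gt;rgbif&lt;/code&gt; function:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical helper: TRUE if all three GBIF variables are non-empty
gbif_credentials_set &amp;lt;- function() {
  vars &amp;lt;- Sys.getenv(c(&amp;quot;GBIF_USER&amp;quot;, &amp;quot;GBIF_PWD&amp;quot;, &amp;quot;GBIF_EMAIL&amp;quot;))
  all(nzchar(vars))
}

if (!gbif_credentials_set()) {
  message(&amp;quot;Add GBIF_USER, GBIF_PWD and GBIF_EMAIL to ~/.Renviron, &amp;quot;,
          &amp;quot;then restart R&amp;quot;)
}&lt;/code&gt;&lt;/pre&gt;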
&lt;p&gt;The &lt;code&gt;occ_download_prep&lt;/code&gt; call above creates a query for two taxa, as specified by the provided keys;
it filters the results to include only records with coordinates (i.e.,
&lt;code&gt;hasCoordinate&lt;/code&gt; is TRUE); and the results will be in the &lt;a href=&#34;https://dwc.tdwg.org/&#34;&gt;Darwin Core
Archive&lt;/a&gt; format (i.e., “DWCA”). We reviewed taxon keys
above. If we’re mapping our records, and don’t have time or need to do any
georeferencing ourselves, we can save time by limiting our results to
records that already have coordinates.&lt;/p&gt;
&lt;p&gt;The default format is “DWCA”, which includes a &lt;em&gt;lot&lt;/em&gt; of columns in the
results. You can choose “SIMPLE_CSV” instead, and this will give you a
subset of commonly used fields. However, this limits your ability to filter
records after download, so I recommend sticking with “DWCA”.&lt;/p&gt;
&lt;p&gt;There are many other ways to filter records, documented in
&lt;code&gt;?download_predicate_dsl&lt;/code&gt;. Depending on your focus, you might want to
restrict the results by
&lt;a href=&#34;https://dwc.tdwg.org/terms/#dwc:basisOfRecord&#34;&gt;basisOfRecord&lt;/a&gt;, to select
only “HumanObservation” or “PreservedSpecimen”; or by
&lt;a href=&#34;https://dwc.tdwg.org/terms/#dwc:datasetID&#34;&gt;datasetID&lt;/a&gt; to select a specific
project (e.g., iNaturalist). However, I’ve found that these terms are not
applied consistently, so it may be better to download everything and filter
after you’ve had a chance to inspect the tables yourself.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;submitting-a-query&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Submitting a query&lt;/h2&gt;
&lt;p&gt;With a query ready, you can now submit it to GBIF via:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;out &amp;lt;- occ_download_queue(.list = list(myQuery))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could have submitted the query directly, by using &lt;code&gt;occ_download&lt;/code&gt; above
instead of &lt;code&gt;occ_download_prep&lt;/code&gt;. Preparing the query first and then queueing it offers two advantages.
First, we can submit multiple queries at once, e.g.,&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;out &amp;lt;- occ_download_queue(.list = list(queryA, queryB,
                                       queryC)) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Second, GBIF allows you to submit up to three queries at a time. If you
have more, you have to wait until one of the earlier queries is finished.
&lt;code&gt;occ_download_queue&lt;/code&gt; keeps track of this for you, submitting three
requests, and sending additional requests as the first three finish.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;retrieving-your-query&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Retrieving your query&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;occ_download_queue&lt;/code&gt; returns the details of your submission(s):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$`46c5a45e8f1f2e64fc9eda5318e74972`
&amp;lt;&amp;lt;gbif download&amp;gt;&amp;gt;
  Your download is being processed by GBIF:
  ...
  Check status with
  occ_download_wait(&amp;#39;0025322-231120084113126&amp;#39;)
  After it finishes, use
  d &amp;lt;- occ_download_get(&amp;#39;0025322-231120084113126&amp;#39;) %&amp;gt;%
    occ_download_import()
  to retrieve your download.
  ...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Usually it takes a few minutes to process your request. You can check its
progress with &lt;code&gt;occ_download_wait(&#39;...&#39;)&lt;/code&gt;, using the details provided. Once
the query is done, you can download it via &lt;code&gt;occ_download_get&lt;/code&gt;, and read it
into &lt;code&gt;R&lt;/code&gt; with &lt;code&gt;occ_download_import&lt;/code&gt;, as shown.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: once your query is submitted by &lt;code&gt;occ_download_queue&lt;/code&gt;, it will be
processed remotely by GBIF. If something should happen in the meantime –
your computer crashes, or you cancel the function call for any reason –
the query will still be processed. However, should this happen you won’t
have the &lt;code&gt;downloadKey&lt;/code&gt; you need to retrieve the results. You can get this
key by logging into your GBIF account on the website, navigating to the
Downloads section of your profile, and clicking on either the &lt;code&gt;DOI&lt;/code&gt; or the
&lt;code&gt;SHOW&lt;/code&gt; links for the download in question. This will take you to a page
with the metadata for the query. It doesn’t include the actual
&lt;code&gt;downloadKey&lt;/code&gt;, but that value is present as the final part of the url, as
shown here:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;./gbif-downloadKey.jpg&#34; alt=&#34;The GBIF download summary page, showing the downloadKey value in the URL&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;The GBIF download summary page, showing the downloadKey value in the
URL&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;You can also click the &lt;code&gt;Download&lt;/code&gt; button on the right side to download the
file from your browser, if you prefer.&lt;/p&gt;
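&lt;p&gt;If you have copied that URL, you can pull the key out in R rather than by
hand. This is a small sketch, assuming the page address ends with the
&lt;code&gt;downloadKey&lt;/code&gt; as described above; the URL is an illustrative example built
from the key in the sample output, so substitute your own:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Illustrative URL; replace with the address of your own download page
url &amp;lt;- &amp;quot;https://www.gbif.org/occurrence/download/0025322-231120084113126&amp;quot;
## basename() keeps the text after the final slash
downloadKey &amp;lt;- basename(url)
downloadKey
## The key can then be passed to occ_download_get(), as before&lt;/code&gt;&lt;/pre&gt;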
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;cleaning-gbif-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Cleaning GBIF data&lt;/h1&gt;
&lt;p&gt;Now that we have our records downloaded, we need to review and clean the
data before analysis. Note that in the code below, I’m using a large
download to demonstrate my investigations. I haven’t included this data in
this tutorial, but you can try the code on your own data.&lt;/p&gt;
&lt;div id=&#34;filtering-based-on-the-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Filtering based on the data&lt;/h2&gt;
&lt;div id=&#34;basisofrecord&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;basisOfRecord&lt;/h3&gt;
&lt;p&gt;With the data in hand, we can take a closer look at what kind of records
they are, and where they came from. Starting with &lt;code&gt;basisOfRecord&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;d1 &amp;lt;- occ_download_get(out[[1]], path = &amp;quot;./dl/&amp;quot;) %&amp;gt;% 
  occ_download_import()
sort(table(d1$basisOfRecord), decreasing = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;  HUMAN_OBSERVATION  PRESERVED_SPECIMEN          OCCURRENCE 
            4224215              325278              112588 
        OBSERVATION   MATERIAL_CITATION     LIVING_SPECIMEN 
             104335               17703                3123 
    MATERIAL_SAMPLE MACHINE_OBSERVATION     FOSSIL_SPECIMEN 
               1698                 222                  10 &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, I have nine different kinds of record. The definitions of
&lt;em&gt;some&lt;/em&gt; of these terms are listed in the &lt;a href=&#34;https://dwc.tdwg.org/terms/#livingspecimen&#34;&gt;Darwin Core Quick Reference
Guide&lt;/a&gt;. &lt;em&gt;HUMAN_OBSERVATION&lt;/em&gt;
includes, among other things, &lt;a href=&#34;https://www.inaturalist.org&#34;&gt;iNaturalist&lt;/a&gt;
records. &lt;em&gt;PRESERVED_SPECIMEN&lt;/em&gt; includes mostly (and most) herbarium records.
I usually want both of these groups.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;OCCURRENCE&lt;/em&gt; and &lt;em&gt;OBSERVATION&lt;/em&gt; aren’t well defined or consistently used, so
require further examination to determine if we want to retain them.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;MATERIAL_CITATION&lt;/em&gt;, &lt;em&gt;MATERIAL_SAMPLE&lt;/em&gt;, and &lt;em&gt;MACHINE_OBSERVATION&lt;/em&gt; are a
little vague, inconsistently used, and also not used often, so I remove
them.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;LIVING_SPECIMEN&lt;/em&gt; and &lt;em&gt;FOSSIL_SPECIMEN&lt;/em&gt; are self-explanatory, and usually
not what I want for my work.&lt;/p&gt;
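&lt;p&gt;One way to apply these choices is to keep only the record types you want,
by name. Here is a minimal sketch on a toy data frame; the &lt;code&gt;toy&lt;/code&gt; table and
the &lt;code&gt;keep&lt;/code&gt; vector are made up for illustration, and with real data you
would use your imported download in place of &lt;code&gt;toy&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Toy stand-in for an imported GBIF download
toy &amp;lt;- data.frame(
  gbifID = 1:5,
  basisOfRecord = c(&amp;quot;HUMAN_OBSERVATION&amp;quot;, &amp;quot;PRESERVED_SPECIMEN&amp;quot;,
                    &amp;quot;FOSSIL_SPECIMEN&amp;quot;, &amp;quot;LIVING_SPECIMEN&amp;quot;,
                    &amp;quot;HUMAN_OBSERVATION&amp;quot;)
)
## Record types to retain
keep &amp;lt;- c(&amp;quot;HUMAN_OBSERVATION&amp;quot;, &amp;quot;PRESERVED_SPECIMEN&amp;quot;)
## The condition goes before the comma, to select rows (records)
kept &amp;lt;- toy[toy$basisOfRecord %in% keep, ]
table(kept$basisOfRecord)&lt;/code&gt;&lt;/pre&gt;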
&lt;/div&gt;
&lt;div id=&#34;inaturalist-and-human-observations&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;iNaturalist and Human Observations&lt;/h3&gt;
&lt;p&gt;Research-grade &lt;a href=&#34;https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7&#34;&gt;iNaturalist
records&lt;/a&gt;
are imported to GBIF every few weeks. We can extract these using the field
&lt;code&gt;datasetKey&lt;/code&gt;, with the value &lt;code&gt;&#34;50c9509d-22c7-4a22-a47d-8c48425ef4a7&#34;&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sum(d1$datasetKey == &amp;quot;50c9509d-22c7-4a22-a47d-8c48425ef4a7&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;357863 # iNaturalist Records in my data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You might be tempted to filter on &lt;code&gt;datasetName&lt;/code&gt;, since there is a dataset
called “iNaturalist research-grade observations”. Unfortunately, this name
isn’t used consistently: the name “iNaturalist observations” is also used
for some records in the &lt;em&gt;same&lt;/em&gt; dataset. In fact, the word &lt;code&gt;iNaturalist&lt;/code&gt;
appears in a variety of different dataset names:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(d1$datasetName[grep(&amp;quot;iNaturalist&amp;quot;, d1$datasetName)])&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;Flora of Russia&amp;quot; on iNaturalist: a trusted backlog 
                                                131 
                           iNaturalist observations 
                                                 25 
            iNaturalist research-grade observations 
                                             357838 
               iNaturalist XicotliData observations 
                                                  7 &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Which leaves us with the unwieldy &lt;code&gt;datasetKey == &#34;50c9509d-22c7-4a22-a47d-8c48425ef4a7&#34;&lt;/code&gt; as the most reliable way to get the
official iNaturalist dataset.&lt;/p&gt;
&lt;p&gt;All of the iNaturalist records are labelled as &lt;code&gt;HUMAN_OBSERVATION&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(d1$basisOfRecord[d1$datasetKey ==
                       &amp;quot;50c9509d-22c7-4a22-a47d-8c48425ef4a7&amp;quot;]) &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;HUMAN_OBSERVATION 
           357863 &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But this is only a small fraction of the 4,224,215 &lt;code&gt;HUMAN_OBSERVATION&lt;/code&gt;
records in my query. In my data, there are over 1500 different
&lt;code&gt;HUMAN_OBSERVATION&lt;/code&gt; datasets. This includes a number of surveys, and in
some cases these surveys record presences &lt;em&gt;and&lt;/em&gt; absences, as recorded in
the &lt;code&gt;occurrenceStatus&lt;/code&gt; field:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(d1$occurrenceStatus)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt; ABSENT PRESENT 
  27896 4761276 &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Whether you want to include a heterogeneous collection of surveys in your
data depends on the question you’re asking.&lt;/p&gt;
&lt;p&gt;But if you are planning to do distribution modeling, you may not want to
include &lt;code&gt;ABSENT&lt;/code&gt; records in your training data. It will depend on the scale
of your modeling, and the scale of the surveys that contributed their data
to GBIF. Fine-scale local surveys may include a mix of PRESENT and ABSENT
records within a small area (say a few hundred meters). If your modeling is
at the scale of 30-arc-second climate rasters (~1 km&lt;sup&gt;2&lt;/sup&gt;), that’s a problem. A
species can be absent in a 10 m&lt;sup&gt;2&lt;/sup&gt; quadrat, but present in the larger 1 km&lt;sup&gt;2&lt;/sup&gt;
raster grid cell.&lt;/p&gt;
&lt;p&gt;Here are a few examples of filters you might want to use:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Conditions go before the comma, to select rows (records)
herbarium &amp;lt;- d1[d1$basisOfRecord == &amp;quot;PRESERVED_SPECIMEN&amp;quot;, ]
iNat &amp;lt;- d1[d1$datasetKey ==
           &amp;quot;50c9509d-22c7-4a22-a47d-8c48425ef4a7&amp;quot;, ]
present &amp;lt;- d1[d1$occurrenceStatus == &amp;quot;PRESENT&amp;quot;
              &amp;amp; d1$basisOfRecord == &amp;quot;HUMAN_OBSERVATION&amp;quot;, ]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;common-location-errors&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Common location errors&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;R&lt;/code&gt; package &lt;code&gt;CoordinateCleaner&lt;/code&gt; provides some
functions for dealing with common problems:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cc_cen&lt;/code&gt;: Identifies records close to the centroid of a country. Automated
georeferencing will often use the centroid of a country as the
coordinates for a specimen that doesn’t have any other location data.
This could be hundreds or thousands of kilometers away from the actual location.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cc_inst&lt;/code&gt;: Identifies records close to the location of museums. Automated
georeferencing will often use the location of a museum as the
location for specimens it contains, regardless of where the specimen
was actually collected.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cc_sea&lt;/code&gt;: Identifies records that are in the ocean, which are clearly errors
for terrestrial organisms. However, this could also be a consequence
of relatively minor errors in record coordinates or the mapping of
coastlines.&lt;/p&gt;
&lt;p&gt;For a more thorough overview of these kinds of issues, see this post by
John Waller on the &lt;a href=&#34;https://data-blog.gbif.org/post/gbif-filtering-guide/&#34;&gt;GBIF data
blog&lt;/a&gt;.&lt;/p&gt;
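&lt;p&gt;As a rough sketch of how these functions fit together: each takes a data
frame of records and, by default, returns it with the flagged rows removed.
The toy records below are invented, and the column names assume the Darwin
Core fields found in a GBIF download; check the help pages for the defaults
in your installed version of &lt;code&gt;CoordinateCleaner&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(CoordinateCleaner)

## Invented records, for illustration only
recs &amp;lt;- data.frame(
  species = &amp;quot;Erigeron canadensis&amp;quot;,
  decimalLongitude = c(-75.70, -79.38, -122.33),
  decimalLatitude = c(45.42, 43.65, 47.61)
)

## Drop records near country centroids, then records near institutions
cleaned &amp;lt;- cc_cen(recs, lon = &amp;quot;decimalLongitude&amp;quot;,
                  lat = &amp;quot;decimalLatitude&amp;quot;)
cleaned &amp;lt;- cc_inst(cleaned, lon = &amp;quot;decimalLongitude&amp;quot;,
                   lat = &amp;quot;decimalLatitude&amp;quot;)
## cc_sea() is called the same way
nrow(cleaned)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It is worth comparing the number of rows before and after each step, and
looking at what was removed, rather than trusting the filters blindly.&lt;/p&gt;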
&lt;/div&gt;
&lt;div id=&#34;identification-errors&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Identification Errors&lt;/h2&gt;
&lt;p&gt;Now that we’ve dealt with the more ‘mechanical’ sorts of errors we might
find in our data, we can review our records for obvious or suspected
identification errors. If you have a large dataset, especially if it
includes many species, or species you are not familiar with, this can be
a daunting task.&lt;/p&gt;
&lt;p&gt;Two important botanical resources can help us with this, at least in North
America: the &lt;a href=&#34;http://floranorthamerica.org&#34;&gt;Flora of North America&lt;/a&gt; and
&lt;a href=&#34;https://www.natureserve.org/&#34;&gt;NatureServe.org&lt;/a&gt;. These provide a ‘sanity
check’ for our GBIF records, as both websites host carefully curated data
on plant distributions.&lt;/p&gt;
&lt;p&gt;The Flora of North America includes taxonomic treatments of all plant
species that occur outside of cultivation in Canada and the United
States&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;. The maps aren’t very detailed, showing us only the states or
provinces each taxon is found in. But those maps are all based on actual
specimens examined by a taxonomist with expertise on that plant. If New
York appears on an FNA distribution map, it means there is at least one
documented record of that species, confirmed by an expert, in that state.
And conversely, if a state or province isn’t included on the map, it means
that a thorough search of herbaria has failed to find a single record of
the species.&lt;/p&gt;
&lt;p&gt;NatureServe is similar, but the distribution maps are based on
state/provincial/regional natural heritage programs. These are generally
expert field botanists, rather than taxonomists. Their job is to document
which plants grow in their jurisdiction. If a NatureServe distribution map
includes Ontario, that means an expert field botanist has evidence (which
could be a herbarium voucher, or a reliable observation) that the species
occurs (or occurred) in Ontario. And again, if a state or province isn’t
included on the map, then no such evidence has been found.&lt;/p&gt;
&lt;p&gt;NatureServe and FNA distribution maps will usually match fairly closely.
NatureServe is more responsive to new information, and those maps will
periodically be updated. FNA isn’t yet complete (although it’s getting
close), but for the taxa that it covers it provides a comprehensive review
of their distribution at the time of publication (i.e., it doesn’t get
updated).&lt;/p&gt;
&lt;p&gt;Neither source is complete, and plants do move around. But, if you have
records in your GBIF dataset from areas beyond the range documented in
NatureServe or FNA, these are the ones I’d be most concerned about
verifying. Use the map interface on the GBIF website to locate them,
and look at the record details for clues to confirm or refute them. If they
trace back to iNaturalist records, you may be able to confirm the
identification yourself from the photos, or &lt;a href=&#34;https://www.inaturalist.org/observations/189766549&#34;&gt;ask the
observer for more
details&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You won’t likely be able to confirm the identifications of all of your
records, but you can often validate or exclude the outlying records that
have the greatest potential to distort your analysis.&lt;/p&gt;
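&lt;p&gt;A simple way to shortlist records for this kind of review is to compare
where your records come from against the documented range. Here is a
minimal sketch with invented data: &lt;code&gt;documented&lt;/code&gt; stands for the states and
provinces you have transcribed by hand from FNA or NatureServe, and
&lt;code&gt;stateProvince&lt;/code&gt; is assumed to be filled in for your records, which is not
always the case in GBIF data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Invented records standing in for a GBIF download
recs &amp;lt;- data.frame(
  gbifID = 1:4,
  stateProvince = c(&amp;quot;Ontario&amp;quot;, &amp;quot;New York&amp;quot;, &amp;quot;Alaska&amp;quot;, &amp;quot;Quebec&amp;quot;)
)
## Range transcribed by hand from FNA or NatureServe (invented here)
documented &amp;lt;- c(&amp;quot;Ontario&amp;quot;, &amp;quot;Quebec&amp;quot;, &amp;quot;New York&amp;quot;)
## Records outside the documented range are the ones to verify first
to_check &amp;lt;- recs[!recs$stateProvince %in% documented, ]
to_check&lt;/code&gt;&lt;/pre&gt;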
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;The FNA project is more properly the “Flora of North America north of
Mexico”.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Extrapolation Detection (exDet) for SDMs</title>
      <link>https://plantarum.ca/2023/12/19/exdet/</link>
      <pubDate>Tue, 19 Dec 2023 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2023/12/19/exdet/</guid>
      <description>


&lt;div id=&#34;identifying-non-analogous-climate-conditions&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Identifying Non-Analogous Climate Conditions&lt;/h1&gt;
&lt;p&gt;A major concern when projecting species distribution models to new contexts
(e.g., invaded ranges, or future climates) is establishing whether (and
where) the environments in the new context are analogous to those in the
training region (&lt;a href=&#34;https://plantarum.ca/2020/06/15/maxent#projection&#34;&gt;see my notes&lt;/a&gt;). A common
approach is to compare each variable in isolation, and construct a
“Multivariate Environmental Similarity Surface”, or MESS &lt;span class=&#34;citation&#34;&gt;(Elith et al., &lt;a href=&#34;#ref-ElithEtAl_2010&#34;&gt;2010&lt;/a&gt;)&lt;/span&gt;.
Areas in the new context that are outside the range of any variable from
the training context will have values below 0, with lower values indicating
greater departures.&lt;/p&gt;
&lt;p&gt;However, the MESS approach does not account for correlations among
variables. Perhaps the native range of a species includes areas with hot,
wet summers, but not hot, dry regions. The MESS analysis would consider
conditions ‘analogous’ as long as they were within the temperature range
and precipitation range of the training region. This could result in novel
environmental combinations being incorrectly identified as analogous
climate:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;mess.png&#34; alt=&#34;Visualization of MESS analysis. The green area indicates the reference climate conditions, and the red square shows the climate space MESS will identify as ‘analog’.&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Visualization of MESS analysis. The green area indicates the reference
climate conditions, and the red square shows the climate space MESS will
identify as ‘analog’.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Note particularly the upper left corner. This is well outside the range of
conditions in the training region (the green area), but by considering each
variable separately, MESS will treat this region as if it were analogous.&lt;/p&gt;
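&lt;p&gt;To make the problem concrete, here is a toy example in base R. It is not a
full MESS calculation, only the per-variable range check that underlies it,
and the climate values are invented: a training region where precipitation
rises with temperature, and a hot, dry site that falls inside both ranges
but resembles nothing in the training data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Invented training climate: wetter where it is hotter
temp &amp;lt;- seq(10, 30, length.out = 101)
precip &amp;lt;- 40 * temp + 100 * sin(seq_along(temp))

## A hot, dry site in the new region
site &amp;lt;- c(temp = 29, precip = 450)

## Each variable, taken alone, is inside the training range
in_range &amp;lt;- c(
  temp = site[[&amp;quot;temp&amp;quot;]] &amp;gt;= min(temp) &amp;amp;&amp;amp;
    site[[&amp;quot;temp&amp;quot;]] &amp;lt;= max(temp),
  precip = site[[&amp;quot;precip&amp;quot;]] &amp;gt;= min(precip) &amp;amp;&amp;amp;
    site[[&amp;quot;precip&amp;quot;]] &amp;lt;= max(precip)
)
all(in_range)

## But the driest training cell at similar temperatures is much wetter
min(precip[temp &amp;gt;= 28])&lt;/code&gt;&lt;/pre&gt;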
&lt;/div&gt;
&lt;div id=&#34;extrapolation-detection&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Extrapolation Detection&lt;/h1&gt;
&lt;p&gt;Despite the name, MESS is only just barely multivariate: it evaluates
multiple variables, but each one is considered in isolation. Extrapolation
Detection &lt;span class=&#34;citation&#34;&gt;(Mesgaran et al., &lt;a href=&#34;#ref-MesgaranEtAl_2014&#34;&gt;2014&lt;/a&gt;)&lt;/span&gt;, or &lt;code&gt;exDet&lt;/code&gt;, provides a more sophisticated
approach, based on the Mahalanobis distance. This is a common multivariate
distance measure, designed to account for covariation among variables. In
this context, our analysis will include the following steps:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Calculate the Mahalanobis distance for every cell in the training region&lt;/li&gt;
&lt;li&gt;Scale the distances by dividing by the maximum distance, such that they
range between 0 and 1&lt;/li&gt;
&lt;li&gt;Calculate the Mahalanobis distance for every cell in the novel
region/time period, again dividing by the maximum distance from the
&lt;strong&gt;training&lt;/strong&gt; (not the new) region.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This will produce a raster map with cell values ranging from 0 to infinity,
which serves as the Novelty Index. &lt;span class=&#34;citation&#34;&gt;Mesgaran et al. (&lt;a href=&#34;#ref-MesgaranEtAl_2014&#34;&gt;2014&lt;/a&gt;)&lt;/span&gt; refer to it as NT2,
to distinguish it from the &lt;code&gt;MESS&lt;/code&gt; index which they call NT1. Cells with NT2
values less than 1 are within the range of conditions present in the
training range; cells with NT2 &amp;gt; 1 are outside that range, with higher
values indicating greater departure:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;exdet.png&#34; alt=&#34;Visualization of exDet analysis. The green area indicates the reference climate conditions, and the red ellipse shows the climate space exDet will identify as ‘analog’.&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Visualization of exDet analysis. The green area indicates the reference
climate conditions, and the red ellipse shows the climate space exDet will
identify as ‘analog’.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The result more closely captures the (co)variation in conditions present in
the training area.&lt;/p&gt;
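&lt;p&gt;The three steps listed above can be sketched on simulated data before
we touch any rasters. Everything here (the two variables, their values, and
the two new sites) is made up for illustration. Note that base R’s
&lt;code&gt;mahalanobis&lt;/code&gt; function returns the &lt;em&gt;squared&lt;/em&gt; distance, which does not
change which side of 1 a value falls on.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1)

## simulated training climate with two correlated variables
temp &amp;lt;- rnorm(1000, mean = 15, sd = 5)
precip &amp;lt;- 50 * temp + rnorm(1000, sd = 40)
train &amp;lt;- cbind(temp, precip)

## two hypothetical new sites with the same temperature:
## one follows the training correlation, the other is hot and dry
new &amp;lt;- rbind(hotWet = c(20, 1000), hotDry = c(20, 500))
colnames(new) &amp;lt;- colnames(train)

ctr &amp;lt;- colMeans(train)
covar &amp;lt;- var(train)

## step 1: distance of every training cell from the training centroid
trainD &amp;lt;- mahalanobis(train, ctr, covar)

## step 2: scale by the training maximum, so values run from 0 to 1
nt2Train &amp;lt;- trainD / max(trainD)

## step 3: same centroid, covariance and maximum for the new sites
nt2New &amp;lt;- mahalanobis(new, ctr, covar) / max(trainD)
nt2New&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both new sites sit inside the training range of each variable taken
alone, but only the hot, wet site gets a value below 1.&lt;/p&gt;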
&lt;/div&gt;
&lt;div id=&#34;exdet-in-r&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;exDet in R&lt;/h1&gt;
&lt;p&gt;There is an R package that calculates &lt;code&gt;exDet&lt;/code&gt;:
&lt;a href=&#34;https://github.com/luismurao/ntbox&#34;&gt;ntbox&lt;/a&gt;. However, it is still using the
&lt;code&gt;raster&lt;/code&gt;-based workflow, which is now obsolete. Luckily, calculating
Mahalanobis distances only requires a few lines of code, so we can do it
ourselves.&lt;/p&gt;
&lt;p&gt;For this example, I’ll use the &lt;em&gt;Lythrum salicaria&lt;/em&gt; data from my previous
tutorials, but updated to use the new
&lt;a href=&#34;https://rspatial.org/index.html&#34;&gt;terra&lt;/a&gt; workflow.&lt;/p&gt;
&lt;div id=&#34;data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Data&lt;/h2&gt;
&lt;p&gt;I use the same GBIF records previously downloaded in my &lt;a href=&#34;https://plantarum.ca/2023/07/28/ecospat-terra&#34;&gt;Ecospat with Terra
tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(terra)
library(geodata)
library(rgbif) ## not actually necessary, used to download
               ## the occurrence data

library(usdm) ## for vif collinearity test below

load(&amp;quot;../data/2021-07-29-ls-gbif-recs.Rda&amp;quot;)
lsOccs &amp;lt;- lsGBIF$data

## convert to a spatial vector:
lsOccs &amp;lt;- vect(lsOccs, geom = c(&amp;quot;decimalLongitude&amp;quot;,
                                &amp;quot;decimalLatitude&amp;quot;),
               crs = &amp;quot;+proj=longlat +datum=WGS84&amp;quot;)

wrld &amp;lt;- world(path = &amp;quot;../data/maps/&amp;quot;)

wclim &amp;lt;- worldclim_global(var = &amp;quot;bio&amp;quot;, res = 10,
                          path = &amp;quot;../data/&amp;quot;)

## North America basemap:
nAmCountries &amp;lt;- c(&amp;quot;CAN&amp;quot;, &amp;quot;MEX&amp;quot;, &amp;quot;USA&amp;quot;)
nAm &amp;lt;- gadm(nAmCountries, level = 0,
            path = &amp;quot;../data/maps/&amp;quot;,
            resolution = 2)

## Eurasia basemap:
eurasiaCountries &amp;lt;- country_codes(&amp;quot;Asia|Europe&amp;quot;)

## remove missing countries:
eurasiaCountries &amp;lt;-
  eurasiaCountries[!eurasiaCountries$ISO3 %in%
                    c(&amp;quot;HKG&amp;quot;, &amp;quot;MAC&amp;quot;, &amp;quot;XNC&amp;quot;),] 
eur &amp;lt;- gadm(eurasiaCountries$ISO3, level = 0,
            path = &amp;quot;../data/maps/&amp;quot;,
            resolution = 2)

## subset occurrences
lsNA &amp;lt;- lsOccs[nAm]
lsEA &amp;lt;- lsOccs[eur]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;training-region&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Training Region&lt;/h2&gt;
&lt;p&gt;There are a number of considerations when deciding on the region to use to
train our model. For this example, I’ll just use a 500 km buffer around
occurrences in the native range.&lt;/p&gt;
&lt;p&gt;The code below builds that buffer and plots it with the occurrence records:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;train &amp;lt;- buffer(lsEA, width = 500000)
train &amp;lt;- aggregate(train)

plot(wrld, mar = c(1.5, 1.5, 0.5, 0.5))
plot(train, col = &amp;#39;#00FF0080&amp;#39;, add = TRUE)
plot(lsEA, add = TRUE, cex = 0.5)
plot(lsNA, add = TRUE, cex = 0.5, col = &amp;#39;blue&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:training-region&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/2023/12/19/exdet/index_files/figure-html/training-region-1.png&#34; alt=&#34;Lythrum salicaria distribution. Green shading shows the training region, black points are occurrences in the native range, and blue points are occurrences in the invaded range.&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Lythrum salicaria distribution. Green shading shows the training region, black points are occurrences in the native range, and blue points are occurrences in the invaded range.
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Note that the training region extends into the ocean, but the underlying
worldclim data doesn’t, so we don’t need to worry about cleaning this up.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;mahalanobis-distance&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Mahalanobis Distance&lt;/h2&gt;
&lt;p&gt;Now we can calculate the NT2 index. We start by clipping the climate data
to our training and projection regions, and then selecting a set of
variables that aren’t highly collinear (in the training region):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trainWC &amp;lt;- mask(wclim, train)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trainVal &amp;lt;- values(trainWC)

naWC &amp;lt;- mask(wclim, nAm)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;naVal &amp;lt;- values(naWC)

## screen out collinear variables:
vifSel &amp;lt;- vifstep(data.frame(trainVal), th = 5,
                  size = 20000)

VARS &amp;lt;- vifSel@results$Variables&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;VARS&lt;/code&gt; contains the names of the variables that passed the collinearity
screen. For a real analysis you should probably be more deliberate in
selecting which variables to retain, considering both their biological
interest and their statistical properties.&lt;/p&gt;
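&lt;p&gt;For readers who haven’t met the variance inflation factor before: the
VIF of a variable is 1 / (1 - R²), where R² comes from regressing that
variable on all of the others. The sketch below computes it by hand on
simulated data. The column names and the helper &lt;code&gt;vifOne&lt;/code&gt; are invented
for this example, and it shows only the basic definition, not the stepwise
procedure in &lt;code&gt;usdm&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(3)
n &amp;lt;- 300
bio_a &amp;lt;- rnorm(n)
bio_b &amp;lt;- bio_a + rnorm(n, sd = 0.2)  ## nearly a copy of bio_a
bio_c &amp;lt;- rnorm(n)                    ## independent of the others
clim &amp;lt;- data.frame(bio_a, bio_b, bio_c)

## VIF for one column: regress it on the rest, then 1 / (1 - R^2)
vifOne &amp;lt;- function(v, dat) {
  fit &amp;lt;- lm(reformulate(setdiff(names(dat), v), response = v),
            data = dat)
  1 / (1 - summary(fit)$r.squared)
}

vifs &amp;lt;- sapply(names(clim), vifOne, dat = clim)
round(vifs, 1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The two near-duplicate columns get large values, well above the
threshold of 5 used above, while the independent column stays close to 1.&lt;/p&gt;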
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trainMeans &amp;lt;- colMeans(trainVal[, VARS], na.rm = TRUE)
trainVar &amp;lt;- var(trainVal[, VARS], na.rm = TRUE)
trainMah &amp;lt;- mahalanobis(trainVal[, VARS], trainMeans,
                        trainVar, na.rm = TRUE)&lt;/code&gt;&lt;/pre&gt;
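&lt;p&gt;&lt;code&gt;trainMah&lt;/code&gt; now contains the Mahalanobis distance from each cell in the
training region to the climate centroid of the training region. (Strictly,
R’s &lt;code&gt;mahalanobis&lt;/code&gt; function returns the &lt;em&gt;squared&lt;/em&gt; distance; that
doesn’t change which cells end up above or below the NT2 threshold of 1.)
‘0’ means the point is in the center of the climate space. By the nature
of the Mahalanobis distance, this value accounts for covariation among our
climate variables.&lt;/p&gt;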
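&lt;p&gt;To make the ‘accounts for covariation’ part concrete, here is the
quantity &lt;code&gt;mahalanobis&lt;/code&gt; computes, written out by hand for a single row of
a small simulated matrix: the difference from the centroid, weighted by the
inverse of the covariance matrix. The data and object names are invented
for this check.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(2)
x &amp;lt;- cbind(a = rnorm(200), b = rnorm(200))
x[, &amp;quot;b&amp;quot;] &amp;lt;- x[, &amp;quot;a&amp;quot;] + 0.3 * x[, &amp;quot;b&amp;quot;]  ## make the columns correlated

mu &amp;lt;- colMeans(x)
S &amp;lt;- var(x)

## squared distance for the first row, by hand:
## (x - mu)&amp;#39; S^-1 (x - mu)
d &amp;lt;- x[1, ] - mu
byHand &amp;lt;- drop(t(d) %*% solve(S) %*% d)

## the same value from the built-in function:
builtIn &amp;lt;- mahalanobis(x[1, , drop = FALSE], mu, S)

all.equal(byHand, unname(builtIn))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If &lt;code&gt;S&lt;/code&gt; were the identity matrix this would reduce to the ordinary
squared Euclidean distance; the inverse covariance is what discounts
differences along directions in which the variables normally vary together.&lt;/p&gt;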
&lt;p&gt;As a quick check, let’s take a look at the Mahalanobis distances for the
training region:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;hist(trainMah, main = &amp;quot;Mahalanobis Distances in Eurasia&amp;quot;,
     xlab = &amp;quot;Distance&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/12/19/exdet/index_files/figure-html/hist-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;That’s a bit odd. We have a small number of outliers with very high
distances. I think this is likely a consequence of idiosyncrasies in our
climate rasters: an artifact of the analysis rather than a biologically
meaningful value. To clean this up, I’ll identify values above the 95th
percentile as outliers and remove them. Then we can proceed with
calculating the distances for the invaded range.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;thresh &amp;lt;- quantile(trainMah, probs = 0.95, na.rm = TRUE)
trainMah[which(trainMah &amp;gt; thresh)] &amp;lt;- NA
hist(trainMah, main = &amp;quot;Trimmed Mahalanobis Distances in Eurasia&amp;quot;,
     xlab = &amp;quot;Distance&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/12/19/exdet/index_files/figure-html/outliers-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;That looks better. Moving on, we can now calculate NT2, which is the
Mahalanobis distance divided by the maximum Mahalanobis distance in the
training range:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;maxMah &amp;lt;- max(trainMah, na.rm = TRUE)
trainMah &amp;lt;- trainMah/maxMah
trainMahR &amp;lt;- trainWC[[1]]
values(trainMahR) &amp;lt;- trainMah

naMah &amp;lt;- mahalanobis(naVal[, VARS], 
                     trainMeans, trainVar, na.rm = TRUE)
naMah &amp;lt;- naMah/maxMah

## Create a new raster map for North America:
naMahR &amp;lt;- naWC[[1]]

## set the values to our Mahalanobis distances:
values(naMahR) &amp;lt;- naMah&lt;/code&gt;&lt;/pre&gt;
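&lt;p&gt;As an aside on the outlier trimming above: because NT2 divides by the
training maximum, a handful of extreme training cells can stretch the scale
enough that clearly unusual cells in the new region still come out below 1.
A toy illustration, with simulated distances and an invented new cell:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(4)

## simulated training distances, plus five extreme outliers
d &amp;lt;- c(rchisq(995, df = 4), 200, 250, 300, 400, 500)

## a hypothetical cell in the new region:
newCell &amp;lt;- 60

## untrimmed: the outliers set the scale
untrimmed &amp;lt;- newCell / max(d)

## trimmed at the 95th percentile, as in the tutorial code
thresh &amp;lt;- quantile(d, probs = 0.95)
dTrim &amp;lt;- d
dTrim[dTrim &amp;gt; thresh] &amp;lt;- NA
trimmed &amp;lt;- newCell / max(dTrim, na.rm = TRUE)

c(untrimmed = untrimmed, trimmed = trimmed)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same cell falls below 1 without trimming and above 1 with it. The
cost is that the trimmed training cells themselves are set to &lt;code&gt;NA&lt;/code&gt;, so
the choice of percentile is a judgment call.&lt;/p&gt;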
&lt;p&gt;Now we have a raster map with the NT2 index values for each cell in the
training and invaded ranges. Values between 0 and 1 are within the training
range. We can visualize that directly:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;COLS &amp;lt;- c(&amp;quot;green&amp;quot;, &amp;quot;lightgreen&amp;quot;, &amp;quot;aquamarine&amp;quot;, &amp;quot;blue&amp;quot;, 
           &amp;quot;orange&amp;quot;, &amp;quot;brown&amp;quot;, &amp;quot;red&amp;quot;)

BREAKS &amp;lt;- c(0, 0.25, 0.5, 1, 2, 4, 8, 16)

plot(trainMahR, breaks = BREAKS, col = COLS,
     mar = c(2, 2, 1, 5))
plot(naMahR, breaks = BREAKS, col = COLS, add = TRUE,
     legend = FALSE)
plot(wrld, border = &amp;#39;lightgrey&amp;#39;, add = TRUE, lwd = 0.5)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/12/19/exdet/index_files/figure-html/mahPlot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This completes the main part of the &lt;code&gt;exDet&lt;/code&gt; analysis. We can see the training
range all falls within the range 0-1, which is by design. Transferring that
distribution to North America, we can see most of Canada and the north
central US falls within the same range. The high Arctic, the southeastern US,
and the western US are all outside of the training range, with Mexico even more
novel. Projections into these areas should be treated with caution, or
avoided entirely.&lt;/p&gt;
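&lt;p&gt;If you want numbers to go with the map, the same breaks can be used to
tabulate cells. This is a small sketch on a made-up vector of NT2 values;
with real data you could run the same two lines on the vector of scaled
distances. It uses base R’s &lt;code&gt;cut&lt;/code&gt;, whose handling of values that fall
exactly on a break may differ from the way the map classes are drawn.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;BREAKS &amp;lt;- c(0, 0.25, 0.5, 1, 2, 4, 8, 16)

## hypothetical NT2 values for a handful of cells (NA = no climate data):
nt2 &amp;lt;- c(0.1, 0.3, 0.7, 0.9, 1.5, 3, 12, NA)

## number of cells in each class:
classCounts &amp;lt;- table(cut(nt2, breaks = BREAKS, include.lowest = TRUE))
classCounts

## proportion of non-missing cells outside the training range:
propNovel &amp;lt;- mean(nt2 &amp;gt; 1, na.rm = TRUE)
propNovel&lt;/code&gt;&lt;/pre&gt;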
&lt;/div&gt;
&lt;div id=&#34;most-influential-covariate-mic&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Most Influential Covariate, MIC&lt;/h2&gt;
&lt;p&gt;While the map above is our primary interest in &lt;code&gt;exDet&lt;/code&gt; analysis, we may
want to know which variables are most influential in creating novel
environments. To do this, we need to calculate the distance map repeatedly,
removing one variable at a time. For each cell, we can then identify the
variable that contributes the most to its NT2 index, by calculating the
difference between the distance for all variables, and the distance with
that variable removed.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;names(naMahR) &amp;lt;- &amp;quot;all&amp;quot;

## Remove variables one at a time and recalculate distances:

for(V in VARS){
  SEL &amp;lt;- VARS[VARS != V]

  tmpMeans &amp;lt;- colMeans(trainVal[, SEL], na.rm = TRUE)
  tmpVar &amp;lt;- var(trainVal[, SEL], na.rm = TRUE)
  tmpMah &amp;lt;- mahalanobis(trainVal[, SEL], tmpMeans,
                        tmpVar, na.rm = TRUE)

  thresh &amp;lt;- quantile(tmpMah, probs = 0.95, na.rm = TRUE)
  tmpMah[which(tmpMah &amp;gt; thresh)] &amp;lt;- NA
  tmpMax &amp;lt;- max(tmpMah, na.rm = TRUE)

  resMah &amp;lt;- mahalanobis(naVal[, SEL], 
                     tmpMeans, tmpVar, na.rm = TRUE)
  resMah &amp;lt;- resMah/tmpMax
  resMahR &amp;lt;- naWC[[1]]
  
  ## calculate difference from full distance:
  values(resMahR) &amp;lt;- values(naMahR$all) - resMah

  ## add a layer to our NA raster:
  naMahR[[V]] &amp;lt;- resMahR
}

## Find the maximum difference for each cell:
naMaxMah &amp;lt;- max(naMahR[[-1]])
naTest &amp;lt;- naMahR[[-1]] == naMaxMah

## Convert TRUE/FALSE to category numbers:

for(L in seq_along(names(naTest))){
  values(naTest[[L]]) &amp;lt;- values(naTest[[L]]) * L
}

## Collapse layers into a single raster:
naCat &amp;lt;- max(naTest)

## convert to a factor
levs &amp;lt;- data.frame(vals = seq_along(VARS),
                   ## clean up variable names:
                   levels = gsub(&amp;quot;wc2.1_10m_&amp;quot;, &amp;quot;&amp;quot;, VARS))
levels(naCat) &amp;lt;- levs&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can visualize which individual variables are contributing most to
the NT2 index for each cell:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(naCat, xlim = c(-180, -40), ylim = c(10, 90),
     mar = c(2, 2, 4, 5),
     main = &amp;quot;Most Influential Covariate&amp;quot;)
plot(wrld, border = &amp;#39;lightgrey&amp;#39;, lwd = 0.5, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/12/19/exdet/index_files/figure-html/MIC-plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Whether or not that information is useful to you will depend on your
research question, of course.&lt;/p&gt;
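&lt;p&gt;The leave-one-out logic is easier to check on a toy data set where we
know the answer in advance. In this sketch the three variables, the new
cells, and the helper &lt;code&gt;nt2&lt;/code&gt; are all invented, and the outlier trimming is
left out to keep it short. Each new cell is made extreme in exactly one
variable, so that variable should come out as its most influential
covariate.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(5)
train &amp;lt;- cbind(v1 = rnorm(500), v2 = rnorm(500), v3 = rnorm(500))

## three hypothetical new cells, each extreme in one variable:
new &amp;lt;- rbind(c(6, 0, 0), c(0, 6, 0), c(0, 0, 6))
colnames(new) &amp;lt;- colnames(train)

## NT2 of the new cells, using only the columns named in vars
nt2 &amp;lt;- function(vars) {
  ctr &amp;lt;- colMeans(train[, vars, drop = FALSE])
  covar &amp;lt;- var(train[, vars, drop = FALSE])
  trainD &amp;lt;- mahalanobis(train[, vars, drop = FALSE], ctr, covar)
  mahalanobis(new[, vars, drop = FALSE], ctr, covar) / max(trainD)
}

VARS &amp;lt;- colnames(train)
full &amp;lt;- nt2(VARS)

## change in NT2 when each variable is left out
## (rows are cells, columns are variables):
drops &amp;lt;- sapply(VARS, function(v) full - nt2(setdiff(VARS, v)))

## most influential covariate for each new cell:
mic &amp;lt;- VARS[max.col(drops, ties.method = &amp;quot;first&amp;quot;)]
mic&lt;/code&gt;&lt;/pre&gt;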
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-ElithEtAl_2010&#34;&gt;
&lt;p&gt;Elith, J., M. Kearney, and S. Phillips. 2010. The art of modelling range-shifting species. &lt;em&gt;Methods in Ecology and Evolution&lt;/em&gt; 1: 330–342.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-MesgaranEtAl_2014&#34;&gt;
&lt;p&gt;Mesgaran, M. B., R. D. Cousens, and B. L. Webber. 2014. Here be dragons: A tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models. J. Franklin [ed.], &lt;em&gt;Diversity and Distributions&lt;/em&gt; 20: 1147–1159.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Niche Quantification with Ecospat and Terra</title>
      <link>https://plantarum.ca/2023/07/28/ecospat-terra/</link>
      <pubDate>Fri, 28 Jul 2023 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2023/07/28/ecospat-terra/</guid>
      <description>


&lt;div id=&#34;introduction&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;This is an update of my previous &lt;a href=&#34;https://plantarum.ca/2021/07/29/ecospat&#34;&gt;ecospat
tutorial&lt;/a&gt;. Spatial analysis in R is shifting to
&lt;code&gt;terra&lt;/code&gt; and &lt;code&gt;sf&lt;/code&gt; as the primary packages, so I’ve translated my old,
&lt;code&gt;raster&lt;/code&gt;-based tutorial to the new workflow. I also took this opportunity
to clean up and extend the original tutorial.&lt;/p&gt;
&lt;p&gt;See the &lt;a href=&#34;https://rspatial.org/spatial/index.html&#34;&gt;RSpatial tutorial&lt;/a&gt; for a
more detailed introduction/overview of using &lt;code&gt;terra&lt;/code&gt; for GIS/spatial
analysis.&lt;/p&gt;
&lt;p&gt;&lt;del&gt;Note this analysis depends on the &lt;code&gt;ecospat&lt;/code&gt; package, and as of 2023-07-28
&lt;code&gt;ecospat&lt;/code&gt; doesn’t support the spatial objects produced by &lt;code&gt;terra&lt;/code&gt;. There
are a couple of work-arounds in the code below to account for this. I’ll
update the code once &lt;code&gt;ecospat&lt;/code&gt; is fully compatible with &lt;code&gt;terra&lt;/code&gt;&lt;/del&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;UPDATE 2024-05-10&lt;/strong&gt; &lt;code&gt;ecospat&lt;/code&gt; has now been updated to use the &lt;code&gt;terra&lt;/code&gt;
package for spatial analysis. The code here has been updated to reflect
this. Thank you to Lin Lin at Yunnan University for bringing this to my
attention!&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;ecospat&lt;/code&gt; package &lt;span class=&#34;citation&#34;&gt;(Cola et al. &lt;a href=&#34;#ref-ColaEtAl_2017&#34;&gt;2017&lt;/a&gt;)&lt;/span&gt; provides code to quantify and
compare the environmental and geographic niche of two species, or of the
same species in different contexts (e.g., in its native and invaded
ranges). The included vignette explains how to do such analyses.&lt;/p&gt;
&lt;p&gt;However, the vignette assumes you already have a matrix of occurrence
records, along with the climate data for each of those records. In our
work, we typically have to construct those matrices from observation data
(herbarium records, iNaturalist observations, etc) and climate rasters
&lt;span class=&#34;citation&#34;&gt;(e.g. Fick and Hijmans &lt;a href=&#34;#ref-FickHijmans_2017&#34;&gt;2017&lt;/a&gt;)&lt;/span&gt;. This short tutorial will walk through the steps
necessary to do this.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;required-packages&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Required Packages&lt;/h1&gt;
&lt;p&gt;In addition to &lt;code&gt;ecospat&lt;/code&gt;, we’ll use &lt;code&gt;terra&lt;/code&gt; &lt;span class=&#34;citation&#34;&gt;(Hijmans &lt;a href=&#34;#ref-Hijmans_2023&#34;&gt;2023&lt;/a&gt;)&lt;/span&gt; to download
WorldClim &lt;span class=&#34;citation&#34;&gt;(Fick and Hijmans &lt;a href=&#34;#ref-FickHijmans_2017&#34;&gt;2017&lt;/a&gt;)&lt;/span&gt; rasters, and manipulate the spatial data;
&lt;code&gt;rgbif&lt;/code&gt; &lt;span class=&#34;citation&#34;&gt;(Chamberlain et al. &lt;a href=&#34;#ref-ChamberlainEtAl_2021&#34;&gt;2021&lt;/a&gt;)&lt;/span&gt; to download GBIF records, &lt;code&gt;geodata&lt;/code&gt;
&lt;span class=&#34;citation&#34;&gt;(Hijmans et al. &lt;a href=&#34;#ref-HijmansEtAl_2023&#34;&gt;2023&lt;/a&gt;)&lt;/span&gt; to get a world basemap for plots, and &lt;code&gt;ade4&lt;/code&gt;
&lt;span class=&#34;citation&#34;&gt;(Thioulouse et al. &lt;a href=&#34;#ref-ThioulouseEtAl_2018&#34;&gt;2018&lt;/a&gt;)&lt;/span&gt; to perform the principal components analysis of the
climate data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ecospat)
library(terra)
library(rgbif)
library(geodata)
library(ade4)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;getting-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Getting Data&lt;/h1&gt;
&lt;div id=&#34;gbif-occurrence-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;GBIF Occurrence Data&lt;/h2&gt;
&lt;p&gt;We’ll start by sourcing our data. For observations, let’s take a look at
Purple Loosestrife, a wetland species that is native to Europe, and
invasive in North America. For actual research work, I normally download
the files directly from GBIF, and examine them carefully to check for
errors or missing data. For this demo we’ll use the &lt;code&gt;rgbif&lt;/code&gt; package to
download the data directly into R, and we’ll assume there are no problems
with the data that need to be corrected.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsGBIF &amp;lt;- occ_search(scientificName = &amp;quot;Lythrum salicaria&amp;quot;,
                    limit = 10000,
                    basisOfRecord = &amp;quot;Preserved_Specimen&amp;quot;,
                    hasCoordinate = TRUE,
                    fields = c(&amp;quot;decimalLatitude&amp;quot;,
                               &amp;quot;decimalLongitude&amp;quot;, &amp;quot;year&amp;quot;,
                               &amp;quot;country&amp;quot;, &amp;quot;countryCode&amp;quot;))

save(lsGBIF, file = &amp;quot;../data/2021-07-29-ls-gbif-recs.Rda&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This returned an object with 7969 records. I saved that locally, so that
I’m not making GBIF search their database every time I work on this demo.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;load(&amp;quot;../data/2021-07-29-ls-gbif-recs.Rda&amp;quot;)
lsOccs &amp;lt;- lsGBIF$data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;lsGBIF$data&lt;/code&gt; is the table with the actual records in it. That’s what we’ll
be working with. The other components of &lt;code&gt;lsGBIF&lt;/code&gt; are metadata related to
the original GBIF search. That’s useful to have, but not needed for the
rest of this example.&lt;/p&gt;
&lt;p&gt;Next, we tell R which columns are the coordinates, which allows us to map
the observations. This also converts our observation matrix to a
&lt;code&gt;SpatVector&lt;/code&gt; object.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;crs&lt;/code&gt; argument here tells R that our points are in lat/lon
(unprojected) coordinates. If all of your data use lat/lon, you don’t need
to specify this, but it’s important if you need to reproject your data, or
combine data with different projections.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsOccs &amp;lt;- vect(lsOccs, geom = c(&amp;quot;decimalLongitude&amp;quot;,
                                &amp;quot;decimalLatitude&amp;quot;),
               crs = &amp;quot;+proj=longlat +datum=WGS84&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;basemap&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Basemap&lt;/h2&gt;
&lt;p&gt;We’ll also need a world map to use in our plots. The &lt;code&gt;world&lt;/code&gt; function from
the &lt;code&gt;geodata&lt;/code&gt; package will download one for us. The first time you call
this function in a directory, it downloads the data from the internet,
and saves it locally according to the &lt;code&gt;path&lt;/code&gt; argument. Subsequent calls
will load your local copy of the data, to speed things up.&lt;/p&gt;
&lt;p&gt;In this case I’ve set the &lt;code&gt;path&lt;/code&gt; argument to store the downloaded files in
a location that is convenient for me. You can set this to anything you
like, or leave it out to use the current R working directory.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wrld &amp;lt;- world(path = &amp;quot;../data/maps/&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a relatively low resolution map of the continents. For higher
resolution maps, see the &lt;code&gt;gadm&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;Now we can plot our data. Note that we set the margins (via &lt;code&gt;mar&lt;/code&gt;) &lt;em&gt;inside&lt;/em&gt;
the &lt;code&gt;plot&lt;/code&gt; call, not via the &lt;code&gt;par&lt;/code&gt; function used in most base R plots.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(wrld, border = &amp;quot;gray80&amp;quot;, mar = c(0, 0, 0, 0))
points(lsOccs, col = 2, cex = 0.3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/index_files/figure-html/base-plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;climate-data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Climate Data&lt;/h2&gt;
&lt;p&gt;To get our climate data, we can use geodata’s &lt;code&gt;worldclim_*&lt;/code&gt; functions.
I’m using the coarsest resolution (10 minutes) to speed things up for this
demonstration. The &lt;code&gt;path&lt;/code&gt; argument works the same way as for &lt;code&gt;world&lt;/code&gt;,
storing a local copy of the files. In this instance we’ll use
&lt;code&gt;worldclim_global&lt;/code&gt; to get the bioclim variables for the world all at once:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wclim &amp;lt;- worldclim_global(var = &amp;quot;bio&amp;quot;, res = 10,
                          path = &amp;quot;../data/&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can take a look at one layer:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(wclim$wc2.1_10m_bio_1, main = &amp;quot;bio1&amp;quot;,
     mar = c(0.1, 0.1, 2, 4), legend = TRUE,
     axes = FALSE)
par(mar = c(0.1, 0.1, 2, 4))
box()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/index_files/figure-html/climate-plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;sampling-bias&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Sampling Bias&lt;/h2&gt;
&lt;p&gt;One of the challenges with herbarium data is that observations
tend to be clustered together in non-random ways. Sites near universities,
museums, or in well-used field stations will have more records than more
remote locations. We can’t completely account for this bias, but we can
reduce it by spatial thinning. This is the process of randomly selecting a
small number (often just one) of records for each grid cell in our
analysis. The result is that those highly-sampled locations will be
represented by a single record, meaning they will have a less exaggerated
influence on the results.&lt;/p&gt;
&lt;p&gt;We will need the full set of records below, so I’ll make a copy of the
un-thinned data named &lt;code&gt;lsOccsAll&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsOccsAll &amp;lt;- lsOccs ## keep this for later

## select one record per cell:
lsOccs &amp;lt;- spatSample(lsOccs, size = 1, strata = wclim)&lt;/code&gt;&lt;/pre&gt;
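&lt;p&gt;To show what thinning does, here is the same idea written out in base
R on simulated coordinates. This is only a sketch of the concept, not a
description of how &lt;code&gt;spatSample&lt;/code&gt; works internally; the coordinates and the
1-degree cell size are invented for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(6)

## simulated records: 200 clustered around one well-collected site,
## plus 50 scattered across the region
occ &amp;lt;- data.frame(lon = c(rnorm(200, 10.5, 0.05), runif(50, 0, 20)),
                  lat = c(rnorm(200, 50.5, 0.05), runif(50, 40, 60)))

## assign each record to a grid cell:
cellSize &amp;lt;- 1
occ$cell &amp;lt;- paste(floor(occ$lon / cellSize),
                  floor(occ$lat / cellSize))

## shuffle the rows, then keep the first record from each cell:
occ &amp;lt;- occ[sample(nrow(occ)), ]
thinned &amp;lt;- occ[!duplicated(occ$cell), ]

nrow(occ)      ## records before thinning
nrow(thinned)  ## one record per occupied cell&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The 200 clustered records collapse to a single record, so that one
heavily collected site no longer dominates the sample.&lt;/p&gt;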
&lt;/div&gt;
&lt;div id=&#34;linking-climate-data-to-species-observations&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Linking Climate Data to Species Observations&lt;/h2&gt;
&lt;p&gt;Next, we need to extract the environmental values from the climate rasters
for each of our observation records:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsOccs &amp;lt;- cbind(lsOccs, extract(wclim, lsOccs))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the process of extracting &lt;code&gt;wclim&lt;/code&gt; values for our observations, we
usually end up with a few missing values. This is a consequence of
mismatches between the observation coordinates and the climate rasters. In
some cases, the observations are placed off the coast in the ocean, or in
another area where there is no climate data available. We need to exclude
these missing values from our analysis.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsOccs &amp;lt;- lsOccs[complete.cases(data.frame(lsOccs)), ]&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;splitting-records-into-native-and-introduced-ranges&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Splitting Records into Native and Introduced Ranges&lt;/h1&gt;
&lt;p&gt;At this point, all the data we need for the Niche Quantification analysis
is in &lt;code&gt;lsOccs&lt;/code&gt; and &lt;code&gt;wclim&lt;/code&gt;. We need to split this data into native and
invasive regions for our comparison. We’ll restrict ourselves to the
northern hemisphere, and consider all records from Eurasia as native, and
all records from North America as invasive.&lt;/p&gt;
&lt;p&gt;We can use GADM to download maps of the continents, and use this to select
subsets of the datasets. I will select the country-level borders (&lt;code&gt;level = 0&lt;/code&gt;) and low resolution (&lt;code&gt;resolution = 2&lt;/code&gt;). For ‘real’ analyses you should
use high resolution (&lt;code&gt;resolution = 1&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;To speed up analyses further down, I’m just going to download Canada, USA,
and Mexico for my “North America” map.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nAmCountries &amp;lt;- c(&amp;quot;CAN&amp;quot;, &amp;quot;MEX&amp;quot;, &amp;quot;USA&amp;quot;)
nAm &amp;lt;- gadm(nAmCountries, level = 0,
            path = &amp;quot;../data/maps/&amp;quot;,
            resolution = 2)

## select Europe OR Asia:
eurasiaCountries &amp;lt;- country_codes(&amp;quot;Asia|Europe&amp;quot;)
eur &amp;lt;- gadm(eurasiaCountries$ISO3, level = 0,
            path = &amp;quot;../data/maps/&amp;quot;,
            resolution = 2)

lsNA &amp;lt;- lsOccs[nAm]
lsEA &amp;lt;- lsOccs[eur]

plot(wrld, axes = FALSE, xlim = c(-140, 150),
     ylim = c(10, 80), mar = c(0, 0, 0, 0))
points(lsNA, col = &amp;#39;red&amp;#39;, cex = 0.5)
points(lsEA, col = &amp;#39;darkgreen&amp;#39;, cex = 0.5)
par(mar = c(0,0,0,0))
box()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/index_files/figure-html/splitting-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Note that this code will generate some warnings, “this file does not
exist”. There are a few countries listed in the &lt;code&gt;country_codes&lt;/code&gt; database
that don’t have corresponding maps in the GADM repository (e.g. Hong Kong).
We’ll ignore this for now.&lt;/p&gt;
&lt;p&gt;For the Niche Quantification, we need a matrix of the background
environment present in each of the native and invasive ranges, as well as the
complete global environment, covering the combined extent of the native
and introduced ranges. After masking the rasters, we use &lt;code&gt;values&lt;/code&gt;
to convert each one to a matrix.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Crop Climate Layers:
naEnvR &amp;lt;- mask(wclim, nAm)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;eaEnvR &amp;lt;- mask(wclim, eur)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;globalEnvR &amp;lt;- mask(wclim, rbind(nAm, eur))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Extract values to matrix:
naEnvM &amp;lt;- values(naEnvR)
eaEnvM &amp;lt;- values(eaEnvR)
globalEnvM &amp;lt;- values(globalEnvR)

## Clean out missing values:
naEnvM &amp;lt;- naEnvM[complete.cases(naEnvM), ]
eaEnvM &amp;lt;- eaEnvM[complete.cases(eaEnvM), ]
globalEnvM &amp;lt;- globalEnvM[complete.cases(globalEnvM), ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that for the geographic projection below, it is essential that the
&lt;code&gt;globalEnvM&lt;/code&gt; be constructed directly from the &lt;code&gt;globalEnvR&lt;/code&gt; rasters. If you
try to combine &lt;code&gt;naEnvM&lt;/code&gt; and &lt;code&gt;eaEnvM&lt;/code&gt; to make &lt;code&gt;globalEnvM&lt;/code&gt; it will end up
scrambling the values in the geographic projection. If you don’t do a
geographic projection it doesn’t matter.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;niche-quantification&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Niche Quantification&lt;/h1&gt;
&lt;div id=&#34;pca&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;PCA&lt;/h2&gt;
&lt;p&gt;The Niche Quantification analysis starts with a Principal Components
Analysis of the environmental data. The actual ordination uses the global
data, with the observation records and the native and invasive background
environment treated as supplemental rows.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca.clim &amp;lt;- dudi.pca(globalEnvM, center = TRUE,
                    scale = TRUE, scannf = FALSE, nf = 2)
global.scores &amp;lt;- pca.clim$li

nativeLS.scores &amp;lt;-
  suprow(pca.clim,
         data.frame(lsEA)[, colnames(globalEnvM)])$li   
invasiveLS.scores &amp;lt;-
  suprow(pca.clim,
         data.frame(lsNA)[, colnames(globalEnvM)])$li

## Eurasia (EA) is the native range, North America (NA) the invaded range:
nativeEnv.scores &amp;lt;- suprow(pca.clim, eaEnvM)$li
invasiveEnv.scores &amp;lt;- suprow(pca.clim, naEnvM)$li&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s break that down. &lt;code&gt;dudi.pca&lt;/code&gt; runs a PCA on &lt;code&gt;globalEnvM&lt;/code&gt;,
which is a matrix of all the environmental variables over the entire study
area. We use that to create a two-dimensional summary of the total
environmental variability.&lt;/p&gt;
&lt;p&gt;Next, we map our observation data (&lt;code&gt;lsEA&lt;/code&gt; and &lt;code&gt;lsNA&lt;/code&gt;) into that
2-dimensional ordination, using the &lt;code&gt;suprow&lt;/code&gt; function. &lt;code&gt;lsEA&lt;/code&gt; and &lt;code&gt;lsNA&lt;/code&gt;
are &lt;code&gt;SpatVector&lt;/code&gt; objects. Sometimes you can treat them as if
they were data.frames, but other times you need to explicitly convert them.
This is one of those times, hence I’ve wrapped them in &lt;code&gt;data.frame()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Recall that &lt;code&gt;lsEA&lt;/code&gt; and &lt;code&gt;lsNA&lt;/code&gt; have more columns than the environmental
matrix: they also include &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;countryCode&lt;/code&gt;, and &lt;code&gt;country&lt;/code&gt;. We only want to
include the environmental variables when we project the observations into
the ordination. To make sure that we use the same variables as in the
original ordination of &lt;code&gt;globalEnvM&lt;/code&gt;, in the same order, I select the
columns explicitly to match that object:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data.frame(lsEA)[, colnames(globalEnvM)]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output of &lt;code&gt;dudi.pca&lt;/code&gt; and &lt;code&gt;suprow&lt;/code&gt; includes a lot of information that we
aren’t using here. We only need the &lt;code&gt;li&lt;/code&gt; element, so I’ve selected that
from each of the function outputs.&lt;/p&gt;
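&lt;p&gt;If the idea of supplementary rows is new to you, the following sketch may
help. It uses base R’s &lt;code&gt;prcomp&lt;/code&gt; and &lt;code&gt;predict&lt;/code&gt; as a stand-in for
&lt;code&gt;dudi.pca&lt;/code&gt; and &lt;code&gt;suprow&lt;/code&gt;, with made-up data and hypothetical variable
names. The normalization in &lt;code&gt;ade4&lt;/code&gt; may differ, so don’t expect the numbers
to match; the point is only that the extra rows are placed on axes defined
by the global data, without influencing those axes.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1)
## Toy &amp;quot;global&amp;quot; environment: 200 cells x 3 invented variables
globalToy &amp;lt;- cbind(temp = rnorm(200, 10, 5),
                   precip = rnorm(200, 800, 200),
                   season = rnorm(200, 50, 10))

## The ordination is fit on the global data only
pcaToy &amp;lt;- prcomp(globalToy, center = TRUE, scale. = TRUE)
globalScores &amp;lt;- pcaToy$x[, 1:2]

## Toy occurrence records, with the columns in a different order
occToy &amp;lt;- data.frame(season = c(45, 60),
                     temp = c(12, 3),
                     precip = c(700, 950))

## Project them using the global centring, scaling and axes,
## selecting the columns to match the original matrix
occScores &amp;lt;- predict(pcaToy,
                     newdata = occToy[, colnames(globalToy)])[, 1:2]
occScores&lt;/code&gt;&lt;/pre&gt;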
&lt;/div&gt;
&lt;div id=&#34;occurrence-density-grids&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Occurrence Density Grids&lt;/h2&gt;
&lt;p&gt;Finally we’re ready to do the Niche Quantification/Comparisons. We’ll use
the PCA scores for the global environment, the native and invasive
environments, and the native and invasive occurrence records.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nativeGrid &amp;lt;- ecospat.grid.clim.dyn(global.scores,
                                   nativeEnv.scores,
                                   nativeLS.scores)

invasiveGrid &amp;lt;- ecospat.grid.clim.dyn(global.scores,
                                   invasiveEnv.scores, 
                                   invasiveLS.scores)

ecospat.plot.niche.dyn(nativeGrid, invasiveGrid,
                       quant = 0.05) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/index_files/figure-html/grid.clim.dyn-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The resulting plot shows us the environmental conditions present in Eurasia
(inside the green line) and North America (inside the red line). The green
area represents environments occupied by &lt;em&gt;Lythrum salicaria&lt;/em&gt; in Eurasia,
but not in North America, the red area shows environments occupied in North
America and not Eurasia, and the blue area shows environments occupied in
both ranges. We can also see that there are a few areas in Eurasia with
environments not present in North America, and vice versa. However, for the
most part, &lt;em&gt;Lythrum salicaria&lt;/em&gt; doesn’t occur in these environments.&lt;/p&gt;
&lt;p&gt;To get the index values we use &lt;code&gt;ecospat.niche.dyn.index&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;indexVals &amp;lt;- ecospat.niche.dyn.index(nativeGrid,
                                     invasiveGrid) 
indexVals$dynamic.index.w&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##  expansion  stability  unfilling 
## 0.00598794 0.99401206 0.02349507&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;projecting-climate-space-to-geographic-space&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Projecting Climate Space to Geographic Space&lt;/h2&gt;
&lt;p&gt;We can take these results and project them onto a geographic map. This will
show us which areas of the native and invasive range of &lt;em&gt;Lythrum salicaria&lt;/em&gt;
fall into each of the three categories.&lt;/p&gt;
&lt;p&gt;The call below passes the &lt;code&gt;SpatRaster&lt;/code&gt; &lt;code&gt;globalEnvR&lt;/code&gt; directly. If your
version of &lt;code&gt;ecospat.niche.dynIndexProjGeo&lt;/code&gt; doesn’t handle the &lt;code&gt;SpatRaster&lt;/code&gt;
objects created by the &lt;code&gt;terra&lt;/code&gt; package, you will have to convert
&lt;code&gt;globalEnvR&lt;/code&gt; to a &lt;code&gt;raster&lt;/code&gt; &lt;code&gt;stack&lt;/code&gt; when you call it. In that case, for
convenience and consistency with the rest of the code, convert the result
back to a &lt;code&gt;SpatRaster&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;geoProj &amp;lt;-
  ecospat.niche.dynIndexProjGeo(nativeGrid,
                                invasiveGrid,
                                env = globalEnvR)

plot(geoProj, legend = FALSE, 
     col = c(&amp;quot;grey&amp;quot;, &amp;quot;green&amp;quot;, &amp;quot;red&amp;quot;, &amp;quot;blue&amp;quot;),
     ylim = c(20, 80), axes = FALSE, mar = c(0, 0, 0, 0))
points(lsEA, col = &amp;#39;darkgreen&amp;#39;,
       cex = 0.25, pch = 23, bg = &amp;quot;white&amp;quot;)
points(lsNA, col = &amp;#39;darkred&amp;#39;,
       cex = 0.25, pch = 23, bg = &amp;quot;white&amp;quot;)
par(mar = c(0,0,0,0))
box()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/index_files/figure-html/geographic_projection-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;geographic-comparisons&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Geographic Comparisons&lt;/h1&gt;
&lt;p&gt;You can also apply this analysis to geographic locations, instead of
environmental conditions. This won’t make much sense for native vs invaded
range comparisons, but it could be useful for comparing different species
within the same area.&lt;/p&gt;
&lt;p&gt;To demonstrate, let’s compare the distribution of &lt;em&gt;Lythrum salicaria&lt;/em&gt; in
North America before and after 1950. In this case, I need to go back to the
original occurrence data, since I need to thin the records before and
after 1950 separately. Otherwise, we won’t accurately capture the locations
where &lt;em&gt;Lythrum salicaria&lt;/em&gt; was collected in both time periods.&lt;/p&gt;
&lt;p&gt;We use geographic coordinates here, so no need for a PCA. We do need to
generate the ‘background’ coordinates. I’ll use &lt;code&gt;expand.grid&lt;/code&gt; to create the
locations for this. I’ve broken up the NA extent into a 500 x 500 grid.&lt;/p&gt;
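&lt;p&gt;As a quick reminder of what &lt;code&gt;expand.grid&lt;/code&gt; produces, here is a much smaller
version of the same call (3 x 2 instead of 500 x 500, purely for
illustration). It returns one row for every combination of the two
coordinate vectors:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## A 3 x 2 toy grid over the same extent
toyGrid &amp;lt;- expand.grid(longitude =
                         seq(-160, -40, length.out = 3),
                       latitude =
                         seq(20, 90, length.out = 2))
toyGrid

## One row per longitude/latitude combination: 3 * 2 = 6
nrow(toyGrid)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By the same arithmetic, a 500 x 500 grid has 250000 rows.&lt;/p&gt;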
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsNA_All &amp;lt;- lsOccsAll[nAm]

lsNAearly &amp;lt;- subset(lsNA_All, lsNA_All$year &amp;lt;= 1950)
## thin records:
lsNAearly &amp;lt;- spatSample(lsNAearly, size = 1, strata = wclim)

lsNAlate &amp;lt;- subset(lsNA_All, lsNA_All$year &amp;gt; 1950)
## thin records:
lsNAlate &amp;lt;- spatSample(lsNAlate, size = 1, strata = wclim)

geoGrid &amp;lt;- expand.grid(longitude =
                        seq(-160, -40, length.out = 500),
                      latitude =
                        seq(20, 90, length.out = 500))

earlyGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                     crds(lsNAearly))

lateGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                    crds(lsNAlate))

ecospat.plot.niche.dyn(earlyGeoGrid, lateGeoGrid, quant = 0)
plot(nAm, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/index_files/figure-html/temporal-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This looks pretty good. However, &lt;code&gt;ecospat&lt;/code&gt; uses a kernel density formula to
model the occurrence distributions. As a consequence, it projects out into
the ocean, which isn’t very realistic. To correct this, we need to mask the
analysis to the continental land mass. This requires we have a vector map
of the desired area.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;earlyGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                     crds(lsNAearly),
                                     geomask = nAm)

lateGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                    crds(lsNAlate),
                                    geomask = nAm)

ecospat.plot.niche.dyn(earlyGeoGrid, lateGeoGrid, quant = 0)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/index_files/figure-html/masked-geography-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;That gives more reasonable results. The blue area is the range occupied by
&lt;em&gt;Lythrum salicaria&lt;/em&gt; prior to 1950, and the red area is the range it expanded
into after that year. The small green areas are regions where it hasn’t
been collected since 1950.&lt;/p&gt;
&lt;p&gt;Note that this visualization is weighted by the density of points, so there
might be a few pre-1950 records in the red area, or a few post-1950 records
in the green area. That’s expected.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;This is a fairly quick overview of this workflow. You’ll almost certainly
want to consider thinning your observations, among other data cleaning
procedures. I’ve also set the study extent very crudely. That might be
appropriate for very large scale (global) studies. But you’ll usually want
to think a bit more carefully about how you set your extent. The way you
process your data will also differ depending on your context.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-ChamberlainEtAl_2021&#34;&gt;
&lt;p&gt;Chamberlain, Scott, Vijay Barve, Dan Mcglinn, Damiano Oldoni, Peter Desmet, Laurens Geffert, and Karthik Ram. 2021. “Rgbif: Interface to the Global Biodiversity Information Facility API.” Manual. &lt;a href=&#34;https://CRAN.R-project.org/package=rgbif&#34;&gt;https://CRAN.R-project.org/package=rgbif&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-ColaEtAl_2017&#34;&gt;
&lt;p&gt;Cola, Valeria Di, Olivier Broennimann, Blaise Petitpierre, Frank T. Breiner, Manuela D’Amen, Christophe Randin, Robin Engler, et al. 2017. “Ecospat: An R Package to Support Spatial Analyses and Modeling of Species Niches and Distributions.” &lt;em&gt;Ecography&lt;/em&gt; 40 (6): 774–87. &lt;a href=&#34;https://doi.org/10.1111/ecog.02671&#34;&gt;https://doi.org/10.1111/ecog.02671&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-FickHijmans_2017&#34;&gt;
&lt;p&gt;Fick, Stephen E., and Robert J. Hijmans. 2017. “WorldClim 2: New 1-Km Spatial Resolution Climate Surfaces for Global Land Areas.” &lt;em&gt;International Journal of Climatology&lt;/em&gt; 37 (12): 4302–15. &lt;a href=&#34;https://doi.org/10.1002/joc.5086&#34;&gt;https://doi.org/10.1002/joc.5086&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-Hijmans_2023&#34;&gt;
&lt;p&gt;Hijmans, Robert J. 2023. “Terra: Spatial Data Analysis.” Manual. &lt;a href=&#34;https://CRAN.R-project.org/package=terra&#34;&gt;https://CRAN.R-project.org/package=terra&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-HijmansEtAl_2023&#34;&gt;
&lt;p&gt;Hijmans, Robert J., Márcia Barbosa, Aniruddha Ghosh, and Alex Mandel. 2023. “Geodata: Download Geographic Data.” Manual. &lt;a href=&#34;https://CRAN.R-project.org/package=geodata&#34;&gt;https://CRAN.R-project.org/package=geodata&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-ThioulouseEtAl_2018&#34;&gt;
&lt;p&gt;Thioulouse, Jean, Stéphane Dray, Anne-Béatrice Dufour, Aurélie Siberchicot, Thibaut Jombart, and Sandrine Pavoine. 2018. &lt;em&gt;Multivariate Analysis of Ecological Data with Ade4&lt;/em&gt;. New York, NY: Springer New York. &lt;a href=&#34;https://doi.org/10.1007/978-1-4939-8850-1&#34;&gt;https://doi.org/10.1007/978-1-4939-8850-1&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Managing Absolute Paths in Reproducible Analyses</title>
      <link>https://plantarum.ca/2023/02/14/path_switching/</link>
      <pubDate>Tue, 14 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2023/02/14/path_switching/</guid>
      <description>


&lt;p&gt;In a previous post on &lt;a href=&#34;https://plantarum.ca/2022/10/17/data_management&#34;&gt;reproducible analysis&lt;/a&gt;,
I explained the importance of using relative paths in your scripts, and
organizing your data in a single directory, in order to maintain
portability. You want to be able to pack up your analysis in a zip file, or
upload it as a single directory to GitHub or Dropbox, in order to share it
with colleagues, or transfer it to a new computer.&lt;/p&gt;
&lt;p&gt;This is best practice, but you may run into problems achieving this. One
challenge is dealing with large data sets that you will use in multiple
analyses. For example, we use &lt;a href=&#34;https://worldclim.org/&#34;&gt;WorldClim&lt;/a&gt; for a lot
of our distribution work. Each copy of the global 30s dataset fills nearly
10GB (compressed). That would quickly fill my laptop hard drive if I stored
a separate copy for each project.&lt;/p&gt;
&lt;p&gt;I have created a separate directory to store such datasets, so that I only
need to maintain a single copy on my computer. All analyses that use the
WorldClim data will look for it in &lt;code&gt;~/data/worldclim/&lt;/code&gt; on my laptop. On my
workstation, I store the same data in &lt;code&gt;~/data/enm/worldclim&lt;/code&gt;. I could
simplify this by using the same absolute path on both machines, but that
wouldn’t help anyone else trying to use my script on their own machine.&lt;/p&gt;
&lt;p&gt;The approach I’ve come up with for managing this requires a few lines of
code to set the paths appropriately:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;switch(system2(&amp;quot;hostname&amp;quot;, stdout = TRUE),
       LAPTOP.HOSTNAME =  ## my laptop:
         {worldClimPath &amp;lt;- &amp;quot;~/data/worldclim/&amp;quot;}, 
       WORKSTATION.HOSTNAME =   ## my workstation:
         {worldClimPath &amp;lt;- &amp;quot;~/data/enm/worldclim/&amp;quot;},
       {worldClimPath &amp;lt;- &amp;quot;dl/worldclim/&amp;quot;}) ## default&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;First, I check for the machine name with the call &lt;code&gt;system2(&#34;hostname&#34;, stdout = TRUE)&lt;/code&gt;. This runs the &lt;code&gt;hostname&lt;/code&gt; command on the underlying
operating system (which should work on Linux, Windows, and Mac). &lt;code&gt;hostname&lt;/code&gt;
returns the network hostname for your computer, which should be unique (at
least within your organization). The &lt;code&gt;switch&lt;/code&gt; function then compares this
value to the names for my different machines, which I’ve already looked up.
I can then use that information to set the correct path for my shared data.&lt;/p&gt;
&lt;p&gt;In the case that I’m not on either of my machines, I set a default,
relative path. That will allow other people to use my script without using
my hard-coded paths.&lt;/p&gt;
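&lt;p&gt;If you have more than a couple of machines, the same idea can be written as
a lookup table. This is only a sketch: &lt;code&gt;dataPathFor&lt;/code&gt; is a hypothetical
helper, not something from a package, and I’ve used base R’s
&lt;code&gt;Sys.info()&lt;/code&gt; to get the machine name instead of calling out to
&lt;code&gt;hostname&lt;/code&gt;. The machine names are the same placeholders as above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical helper: return the path registered for a host,
## or a relative default if the host isn&amp;#39;t listed
dataPathFor &amp;lt;- function(host, knownPaths,
                        default = &amp;quot;dl/worldclim/&amp;quot;) {
  if (host %in% names(knownPaths)) {
    knownPaths[[host]]
  } else {
    default
  }
}

## Placeholder machine names, as in the switch() version:
knownPaths &amp;lt;- c(LAPTOP.HOSTNAME = &amp;quot;~/data/worldclim/&amp;quot;,
                WORKSTATION.HOSTNAME = &amp;quot;~/data/enm/worldclim/&amp;quot;)

worldClimPath &amp;lt;- dataPathFor(Sys.info()[[&amp;quot;nodename&amp;quot;]],
                             knownPaths)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adding a new machine is then a one-line change to &lt;code&gt;knownPaths&lt;/code&gt;.&lt;/p&gt;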
&lt;p&gt;In the case of WorldClim, the &lt;code&gt;geodata&lt;/code&gt; package provides a convenient way
to download the rasters:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(geodata)
bio &amp;lt;- worldclim_global(&amp;quot;bio&amp;quot;, res = 10,
                        path = worldClimPath) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The function &lt;code&gt;worldclim_global&lt;/code&gt; will check the &lt;code&gt;path&lt;/code&gt; argument. If it finds
the requested data there, it loads the local copy. If the data isn’t there,
it downloads a new copy from the internet, and stores it there.&lt;/p&gt;
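&lt;p&gt;The general pattern (use the local copy if it exists, otherwise create it
and store it) is easy to reuse for your own slow-to-build objects. The
following is a minimal sketch of that pattern in base R. &lt;code&gt;loadOrCreate&lt;/code&gt; is
a hypothetical helper of my own, and it is not how &lt;code&gt;geodata&lt;/code&gt; implements
its caching.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical helper: read `file` if it exists; otherwise
## call `create()`, save the result to `file`, and return it
loadOrCreate &amp;lt;- function(file, create) {
  if (file.exists(file)) {
    readRDS(file)
  } else {
    dir.create(dirname(file), recursive = TRUE,
               showWarnings = FALSE)
    obj &amp;lt;- create()
    saveRDS(obj, file)
    obj
  }
}

## Demonstration in a temporary directory:
cacheFile &amp;lt;- file.path(tempdir(), &amp;quot;demo-cache&amp;quot;, &amp;quot;squares.rds&amp;quot;)

## First call: nothing stored yet, so the value is computed and saved
first &amp;lt;- loadOrCreate(cacheFile, function() (1:5)^2)

## Second call: the file exists, so create() is never run
second &amp;lt;- loadOrCreate(cacheFile,
                       function() stop(&amp;quot;should not be called&amp;quot;))
identical(first, second)&lt;/code&gt;&lt;/pre&gt;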
&lt;p&gt;This makes for a convenient solution: running on my computers, all of my
analyses will use the same shared data, and I won’t have to wait for
downloads or exhaust my hard drives. But I can also share my code as-is
with collaborators, and it will &lt;em&gt;just work&lt;/em&gt;, without their having to change
any paths.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Simple Maps in R with Terra</title>
      <link>https://plantarum.ca/2023/02/13/terra-maps/</link>
      <pubDate>Mon, 13 Feb 2023 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2023/02/13/terra-maps/</guid>
      <description>


&lt;div id=&#34;reference&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Reference&lt;/h1&gt;
&lt;p&gt;This is an update of my previous &lt;a href=&#34;https://plantarum.ca/2020/10/30/simple-maps-r&#34;&gt;mapping
tutorial&lt;/a&gt;. Spatial analysis in R is shifting to
&lt;code&gt;terra&lt;/code&gt; and &lt;code&gt;sf&lt;/code&gt; as the primary packages, so I’ve translated my old,
&lt;code&gt;raster&lt;/code&gt;-based tutorial to the new workflow. See the &lt;a href=&#34;https://rspatial.org/spatial/index.html&#34;&gt;RSpatial
tutorial&lt;/a&gt; for a more detailed
introduction/overview of using &lt;code&gt;terra&lt;/code&gt; for GIS/spatial analysis.&lt;/p&gt;
&lt;p&gt;The following tutorial walks through some common plotting tasks I use for
distribution models.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;basemaps&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Basemaps&lt;/h1&gt;
&lt;p&gt;The &lt;code&gt;geodata&lt;/code&gt; package provides several convenient functions for downloading
raster and vector maps for use as basemaps and spatial analysis. The first
time you use these functions, they will download the requested maps from
the internet and save them in your working directory, or in a
location specified with the &lt;code&gt;path&lt;/code&gt; argument. The next time you request the
same data, if the functions find them in the local directory (or the specified
&lt;code&gt;path&lt;/code&gt;), they will be loaded from there, saving the time and bandwidth
necessary to download them.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(geodata)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required package: terra&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## terra 1.7.29&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;us &amp;lt;- gadm(country = &amp;quot;USA&amp;quot;, level = 1, resolution = 2,
             path = &amp;quot;../data/maps/&amp;quot;)
canada &amp;lt;- gadm(country = &amp;quot;CAN&amp;quot;, level = 1, resolution = 2,
               path = &amp;quot;../data/maps&amp;quot;)
mexico &amp;lt;- gadm(country = &amp;quot;MX&amp;quot;, level = 1, resolution = 2,
               path = &amp;quot;../data/maps&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Countries are specified by their ISO code, which you can find by calling
the function &lt;code&gt;country_codes()&lt;/code&gt;. By default, &lt;code&gt;country_codes()&lt;/code&gt; returns a
table of countries and the various ISO codes they have, as well as the
continents they are in. The &lt;code&gt;query&lt;/code&gt; argument lets you filter this table on
any value. For example, you can get a table of North American countries
with:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;country_codes(&amp;quot;North America&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or, just their ISO codes with:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;country_codes(&amp;quot;North America&amp;quot;)$ISO3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The other arguments allow you to select national, subnational, or
subprovincial borders (&lt;code&gt;level&lt;/code&gt; 0-2), the &lt;code&gt;resolution&lt;/code&gt; (high: 1, low: 2),
and the &lt;code&gt;path&lt;/code&gt; to the directory where you want the maps stored.&lt;/p&gt;
&lt;p&gt;These maps can be plotted directly with the &lt;code&gt;plot&lt;/code&gt; command. If you want to
combine them, use the &lt;code&gt;add = TRUE&lt;/code&gt; argument to the second &lt;code&gt;plot&lt;/code&gt; call:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(us, lwd = 2)
plot(canada, add = TRUE, col = &amp;#39;red&amp;#39;)
plot(mexico, add = TRUE, border = &amp;quot;green&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/map_plots-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;col&lt;/code&gt; sets the fill colour, &lt;code&gt;border&lt;/code&gt; sets the outline colour.&lt;/p&gt;
&lt;p&gt;You can also request multiple countries as a single set of polygons, by
passing a character vector to the &lt;code&gt;country&lt;/code&gt; argument.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;NorthAmerica &amp;lt;- gadm(country = country_codes(&amp;quot;North America&amp;quot;)$ISO3,
                     level = 0, resolution = 2,
                     path = &amp;quot;../data/maps/&amp;quot;)

plot(NorthAmerica, xlim = c(-180, -50))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/combining_vectors-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Note that &lt;code&gt;xlim&lt;/code&gt; and &lt;code&gt;ylim&lt;/code&gt; work as you would expect for plotting.&lt;/p&gt;
&lt;p&gt;If you want to combine multiple polygons into a single object, use &lt;code&gt;rbind&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;CanUS &amp;lt;- rbind(us, canada)
plot(CanUS, xlim = c(-180, -50))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/rbind_polygons-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;These maps are ‘unprojected’, meaning they are plotted in
latitude/longitude degrees. That makes it easy to set the plot boundaries:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(CanUS, xlim = c(-100, -50), ylim = c(30, 60))
plot(NorthAmerica, lwd = 2, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/zooming%20a%20map-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NB:&lt;/strong&gt; The size of your plot canvas is fixed, but a map can’t stretch. The
x and y dimensions have to maintain the same aspect. That means zooming in
one dimension (e.g. latitude only) won’t necessarily change the zoom of
your map, if the other dimension fills the canvas. You’ll have to play
around with the plot size, and both x and y dimensions together, to tweak
your zoom.&lt;/p&gt;
&lt;p&gt;It’s handy to have a shapefile of the Great Lakes, for making prettier
maps. I created this one in QGIS and use it for plotting:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;greatlakes &amp;lt;- vect(&amp;quot;../data/maps/greatlakes.shp&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;adding-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Adding Data&lt;/h1&gt;
&lt;div id=&#34;vectors&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Vectors&lt;/h2&gt;
&lt;p&gt;You can add points to the plot like a regular scatter plot:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(scales)  ## for the alpha function below&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Attaching package: &amp;#39;scales&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## The following object is masked from &amp;#39;package:terra&amp;#39;:
## 
##     rescale&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gbif &amp;lt;- read.table(&amp;quot;../data/trich-gbif.csv&amp;quot;)
## Set the line color to gray to focus on the data points:
plot(CanUS, xlim = c(-100, -50), ylim = c(30, 60),
     border = &amp;quot;gray&amp;quot;)
points(gbif$X, gbif$Y, pch = 16,
       col = alpha(&amp;quot;green&amp;quot;, 0.2))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/adding%20points-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;You can also convert your points to a spatial vector object, in which case
R will know which columns to use for plotting. This is also necessary
before we can project our data (see below).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gbif &amp;lt;- vect(gbif, geom = c(&amp;quot;X&amp;quot;, &amp;quot;Y&amp;quot;))
## plot(CanUS, xlim = c(-100, -50), ylim = c(30, 60),
##      border = &amp;quot;gray&amp;quot;)
## points(gbif, pch = 16, col = alpha(&amp;quot;green&amp;quot;, 0.2))&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;subsetting-vectors&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Subsetting vectors&lt;/h3&gt;
&lt;p&gt;You often need to select polygons that contain points, or select only those
points that occur within a particular polygon. You can do this with R’s &lt;code&gt;[&lt;/code&gt;
subsetting syntax. For instance, &lt;code&gt;&amp;lt;polygons&amp;gt;[&amp;lt;points&amp;gt;, ]&lt;/code&gt; will select
&lt;code&gt;polygons&lt;/code&gt; that contain one or more &lt;code&gt;points&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(CanUS, xlim = c(-100, -50), ylim = c(30, 60),
      border = &amp;quot;gray&amp;quot;)
## Select states and provinces where Trichophorum is
## present: 
plot(CanUS[gbif, ], col = &amp;#39;red&amp;#39;, add = TRUE)
points(gbif, pch = 21, col = &amp;#39;white&amp;#39;, bg = &amp;#39;black&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/states_with_trich-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Conversely, &lt;code&gt;&amp;lt;points&amp;gt;[&amp;lt;polygons&amp;gt;, ]&lt;/code&gt; will select points that are within
polygons. In this example, we’ll use the fact that our &lt;code&gt;CanUS&lt;/code&gt; &lt;code&gt;SpatVector&lt;/code&gt;
object has a column named &lt;code&gt;NAME_1&lt;/code&gt; that holds the name of the state or
province of each polygon:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(CanUS, xlim = c(-100, -50), ylim = c(30, 60),
      border = &amp;quot;gray&amp;quot;)
plot(CanUS[gbif, ], col = &amp;#39;red&amp;#39;, add = TRUE)
## Select only points in NY state:
points(gbif[CanUS[CanUS$NAME_1 == &amp;quot;New York&amp;quot;, ], ],
       pch = 21, col = &amp;#39;white&amp;#39;, bg = &amp;#39;black&amp;#39;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/NY_trich-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;rasters&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Rasters&lt;/h2&gt;
&lt;p&gt;Similarly, you can plot rasters with plot:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trichPreds &amp;lt;- rast(&amp;quot;../data/trichPreds.grd&amp;quot;)
plot(trichPreds, xlim = c(-100, -50), ylim = c(30, 60))
plot(CanUS, border = &amp;quot;gray&amp;quot;, lwd = 0.5, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/loading%20rasters-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Cells with &lt;code&gt;NA&lt;/code&gt; values are transparent. In this case, a species
distribution model, low values are displayed in gray. This may be useful
for visualizing the extent of the model. However, it looks a bit odd, and
makes it hard to see the limits of the high-suitability areas. You can tweak
this by playing with the color ramp, but it’s also handy to ‘turn off’ the
low values entirely (for visualization, &lt;strong&gt;not&lt;/strong&gt; for analysis!!)&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trichPredsTrim &amp;lt;- trichPreds
trichPredsTrim[trichPredsTrim &amp;lt;
               quantile(values(trichPreds),
                        probs = 0.75, na.rm = TRUE)] &amp;lt;- NA
plot(trichPredsTrim, xlim = c(-100, -50), ylim = c(30, 60))
plot(CanUS, border = &amp;quot;grey&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/trimming%20predictions-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The test I used here, &lt;code&gt;trichPredsTrim &amp;lt; quantile(values(trichPreds), probs = 0.75, na.rm = TRUE)&lt;/code&gt;, identifies all cells in the lower 75% of the
suitability scores, which I then set to &lt;code&gt;NA&lt;/code&gt; to make them invisible. I
decided on 75% after experimenting with different values. In this case, 75%
drops most of the grey background (the very lowest values), without eating
into the areas that the prediction indicates are suitable.&lt;/p&gt;
&lt;p&gt;You could also use an absolute value here, but then you’d need to know the
actual distribution of the suitability scores. &lt;code&gt;quantile&lt;/code&gt; is easier to
tweak.&lt;/p&gt;
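&lt;p&gt;To see what the quantile test does without loading a raster, here is the
same logic applied to a plain numeric vector. The scores are made up for
illustration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Invented suitability scores, including one missing value
scores &amp;lt;- c(0.02, 0.10, 0.35, 0.50, 0.80, 0.95, NA, 0.05)

## The value below which 75% of the non-missing scores fall
cutoff &amp;lt;- quantile(scores, probs = 0.75, na.rm = TRUE)

## Blank out everything below the cutoff
trimmed &amp;lt;- scores
trimmed[which(trimmed &amp;lt; cutoff)] &amp;lt;- NA
trimmed&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only the two highest scores survive. Changing &lt;code&gt;probs&lt;/code&gt; moves the cutoff
without your needing to know the actual range of the scores.&lt;/p&gt;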
&lt;p&gt;Alternatively, you can assign colours to the prediction raster based on the
value of each pixel, after splitting them into categories:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;suitability &amp;lt;- extract(trichPreds, gbif, ID = FALSE)[, 1]

predCat &amp;lt;- (trichPreds &amp;gt; quantile(suitability, 0.25)) + 
  (trichPreds &amp;gt; quantile(suitability, 0.15)) +
  (trichPreds &amp;gt; quantile(suitability, 0.05)) +
  (trichPreds &amp;gt; quantile(suitability, 0.025))

predCols &amp;lt;- c(&amp;quot;white&amp;quot;, &amp;quot;grey95&amp;quot;, &amp;quot;yellow3&amp;quot;, &amp;quot;orange&amp;quot;, &amp;quot;red&amp;quot;) 

plot(predCat, xlim = c(-100, -50), colNA = &amp;#39;lightblue&amp;#39;,
     col = predCols, ylim = c(30, 60), legend = FALSE)
plot(CanUS, border = &amp;quot;grey&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/colour_categories-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;I find using a small number of categories makes it easier to read the
suitability map than trying to interpret the colour gradients usually used.&lt;/p&gt;
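&lt;p&gt;The trick of adding up logical comparisons may not be obvious at first
glance. Each comparison yields &lt;code&gt;TRUE&lt;/code&gt; (1) or &lt;code&gt;FALSE&lt;/code&gt; (0), so the sum counts
how many thresholds a value exceeds. Here is the same idea on a plain
vector, with invented predictions and thresholds:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Invented predictions and thresholds, for illustration only
preds &amp;lt;- c(0.01, 0.04, 0.08, 0.20, 0.60)
breaks &amp;lt;- c(0.03, 0.05, 0.15, 0.25)

## Count how many thresholds each prediction exceeds (0 to 4)
predClass &amp;lt;- (preds &amp;gt; breaks[4]) +
  (preds &amp;gt; breaks[3]) +
  (preds &amp;gt; breaks[2]) +
  (preds &amp;gt; breaks[1])
predClass

## Classes 0-4 index into a palette of five colours
toyCols &amp;lt;- c(&amp;quot;white&amp;quot;, &amp;quot;grey95&amp;quot;, &amp;quot;yellow3&amp;quot;, &amp;quot;orange&amp;quot;, &amp;quot;red&amp;quot;)
toyCols[predClass + 1]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a plain vector, &lt;code&gt;findInterval(preds, breaks)&lt;/code&gt; gives the same classes
here; the sum of comparisons has the advantage of working directly on a
&lt;code&gt;SpatRaster&lt;/code&gt;, as in the code above.&lt;/p&gt;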
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;projections&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Projections&lt;/h1&gt;
&lt;p&gt;Lat/Lon maps look a bit square; we’re more used to seeing maps projected. A
common projection for Canada is Lambert Conformal Conic. We can transform
our data to this projection to make nicer maps:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## define the projection
canlam &amp;lt;- &amp;quot;+proj=lcc +lat_1=49 +lat_2=77 +lat_0=49 +lon_0=-95 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs&amp;quot;

## project our vector data:
CanUS.lcc &amp;lt;- project(CanUS, canlam)
gl.lcc &amp;lt;- project(greatlakes, canlam)

## Now we need to set the projection of our points:
crs(gbif) &amp;lt;- &amp;quot;+proj=longlat +datum=WGS84&amp;quot;

## Finally, we can project our points to LCC:
gbif.lcc &amp;lt;- project(gbif, canlam)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that our data needs to be in an object of class &lt;code&gt;Spat*&lt;/code&gt;, and it must
have a defined coordinate reference system (CRS) before we can project it
to a new CRS. The &lt;code&gt;crs&lt;/code&gt; function allows us to explicitly set the
projection. See &lt;a href=&#34;https://rspatial.org/spatial/6-crs.html#notation&#34;&gt;the RSpatial
tutorial&lt;/a&gt; for more
details on specifying CRS with &lt;code&gt;terra&lt;/code&gt;.&lt;/p&gt;
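&lt;p&gt;As a minimal sketch of the same idea, the CRS can also be declared when the
points are first created. The coordinates below are made up for
illustration, and I’m assuming the &lt;code&gt;EPSG:4326&lt;/code&gt; code is an acceptable
stand-in for the &lt;code&gt;+proj=longlat +datum=WGS84&lt;/code&gt; string used above:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(terra)

## hypothetical lon/lat coordinates, for illustration only
pts &amp;lt;- data.frame(lon = c(-75.7, -79.4), lat = c(45.4, 43.7))

## build a SpatVector and declare its CRS in one step
ptsVect &amp;lt;- vect(pts, geom = c(&amp;quot;lon&amp;quot;, &amp;quot;lat&amp;quot;), crs = &amp;quot;EPSG:4326&amp;quot;)

## confirm that a CRS is defined before projecting
crs(ptsVect, describe = TRUE)
is.lonlat(ptsVect)

## project to the LCC definition used in this tutorial
canlam &amp;lt;- &amp;quot;+proj=lcc +lat_1=49 +lat_2=77 +lat_0=49 +lon_0=-95 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs&amp;quot;
ptsLCC &amp;lt;- project(ptsVect, canlam)
is.lonlat(ptsLCC)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Checking &lt;code&gt;is.lonlat&lt;/code&gt; before and after is a cheap way to confirm that the
projection actually happened.&lt;/p&gt;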
&lt;p&gt;The same function works for rasters (sort of):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rasterLCC &amp;lt;- project(trichPredsTrim, canlam)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will reproject my raster from lat/lon to Lambert Conformal Conic, but
we will unavoidably lose some precision when we do this. This is fine for
visualization, but you should avoid projecting raster data used in
analysis. For more details, see &lt;a href=&#34;https://rspatial.org/spatial/6-crs.html#transforming-raster-data&#34;&gt;the terra
tutorial&lt;/a&gt;.&lt;/p&gt;
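&lt;p&gt;To see where the precision goes, here is a small sketch with a made-up
raster holding the categories 0 to 4 (a stand-in for something like
&lt;code&gt;predCat&lt;/code&gt;, not the real data). Projecting resamples the cells onto a
new grid. As far as I understand it, &lt;code&gt;terra&lt;/code&gt; interpolates by default
unless the raster is flagged as categorical, so I ask for nearest-neighbour
resampling explicitly when I want the category values kept intact:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(terra)

## a made-up lat/lon raster holding categories 0 to 4
r &amp;lt;- rast(nrows = 30, ncols = 50, xmin = -100, xmax = -50,
          ymin = 30, ymax = 60, crs = &amp;quot;EPSG:4326&amp;quot;)
values(r) &amp;lt;- rep(0:4, length.out = ncell(r))

canlam &amp;lt;- &amp;quot;+proj=lcc +lat_1=49 +lat_2=77 +lat_0=49 +lon_0=-95 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs&amp;quot;

## bilinear resampling averages neighbouring cells
rBilinear &amp;lt;- project(r, canlam, method = &amp;quot;bilinear&amp;quot;)

## nearest-neighbour resampling copies values from the original cells
rNear &amp;lt;- project(r, canlam, method = &amp;quot;near&amp;quot;)

## the projected grid no longer matches the original grid
dim(r)
dim(rNear)

## are the projected values still whole categories?
all(na.omit(values(rBilinear)[, 1]) %in% 0:4)
all(na.omit(values(rNear)[, 1]) %in% 0:4)&lt;/code&gt;&lt;/pre&gt;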
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(rasterLCC)
plot(CanUS.lcc, border = &amp;quot;grey&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/plot%20projected-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The units are no longer Lat/Lon, but meters. We can read them off the plot
to improve the zoom:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(rasterLCC, xlim = c(0, 2500000),
     ylim = c(-1500000, -400000))
plot(CanUS.lcc, border = &amp;quot;grey&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/projected%20zoom-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;formatting&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Formatting&lt;/h1&gt;
&lt;p&gt;With the data plotted, we can then turn to making the map a little
prettier. Note that when working with &lt;code&gt;Spat*&lt;/code&gt; objects, we can’t set the
plot margins with &lt;code&gt;par&lt;/code&gt; as we usually do. Instead, we use the &lt;code&gt;mar&lt;/code&gt;
argument &lt;em&gt;within&lt;/em&gt; the &lt;code&gt;plot&lt;/code&gt; function:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Make a panel with two plots
par(mfrow = c(1, 2))

## store the plot limits:
my_xlims &amp;lt;- c(0, 2500000) 
my_ylims &amp;lt;- c(-1300000, -200000)

## Plot the points:
plot(CanUS.lcc, xlim = my_xlims , ylim = my_ylims,
     border = &amp;quot;grey&amp;quot;, background = &amp;quot;lightblue&amp;quot;,
     col = &amp;quot;white&amp;quot;, axes = FALSE, mar = c(0.1,0.1,0.1,0))
plot(gl.lcc, add = TRUE, border = &amp;quot;grey&amp;quot;, col = &amp;quot;lightblue&amp;quot;)
points(gbif.lcc, pch = 16, col = alpha(&amp;quot;grey30&amp;quot;, 0.2),
       cex = 0.7)
box() 

plot(CanUS.lcc, xlim = my_xlims , ylim = my_ylims,
     border = &amp;quot;grey&amp;quot;, background = &amp;quot;lightblue&amp;quot;,
     col = &amp;quot;white&amp;quot;, mar = c(0.1,0,0.1,0.1), axes = FALSE)
plot(gl.lcc, add = TRUE, border = &amp;quot;grey&amp;quot;, col = &amp;quot;lightblue&amp;quot;)
plot(rasterLCC, add = TRUE, legend = FALSE, axes = FALSE)

## plotted again to put the border lines on top:
plot(CanUS.lcc, border = &amp;quot;grey&amp;quot;, add = TRUE) 
box()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2023/02/13/terra-maps/index_files/figure-html/pretty%20plot-1.png&#34; width=&#34;696&#34; /&gt;&lt;/p&gt;
&lt;p&gt;If you want to plot the state/provincial borders &lt;em&gt;on top&lt;/em&gt; of the raster,
you need to add those layers last. But you can’t set the background colour
of the raster layer to “lightblue” (or at least I haven’t figured that
out), so the ocean stays white. I get around that by plotting the
boundaries twice, first to set the background colour, and then to put the
state lines on top of the raster.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data Management for Reproducible Science</title>
      <link>https://plantarum.ca/2022/10/17/data_management/</link>
      <pubDate>Mon, 17 Oct 2022 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2022/10/17/data_management/</guid>
      <description>


&lt;div id=&#34;introduction&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;Research is reproducible when others can reproduce the results of a
scientific study given only the original data, code, and documentation
&lt;span class=&#34;citation&#34;&gt;(Alston and Rick, &lt;a href=&#34;#ref-AlstonRick_2021&#34;&gt;2021&lt;/a&gt;)&lt;/span&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Benefits to the Author:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Clear and complete documentation of your work makes it easier to share,
write up and extend in future work, including responding to reviewers
and developing new projects&lt;/li&gt;
&lt;li&gt;Conscientious documentation of your work involves a great deal of
error-checking, which is reassuring to you – that you haven’t missed
anything, or mis-remembered what you did; and to your readers – that
you have conducted your work in a rigorous manner&lt;/li&gt;
&lt;li&gt;Reproducible work gets cited more, and developing a data archive creates
a new citable product from your research.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Benefits to the Community:&lt;/p&gt;
&lt;ol style=&#34;list-style-type: decimal&#34;&gt;
&lt;li&gt;Increases the speed and fidelity with which we can learn and apply new
approaches.&lt;/li&gt;
&lt;li&gt;Makes it easier to avoid mistakes (through the care and attention
required to create the archive), and to detect and correct them if they
do happen (by allowing others to critically review your work)&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div id=&#34;how-to-make-your-work-reproducible&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;How to make your work reproducible&lt;/h1&gt;
&lt;p&gt;There are two related goals in producing a reproducible analysis:
&lt;strong&gt;portability&lt;/strong&gt;, and &lt;strong&gt;reproducibility&lt;/strong&gt;. Reproducibility is perhaps
obvious, but it’s not enough that you can reproduce your analysis on your
computer. You are the only person with access to your computer. Your work
should be reproducible on anyone’s computer.&lt;/p&gt;
&lt;div id=&#34;portability&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Portability&lt;/h2&gt;
&lt;p&gt;To achieve portability, we need to know which files are needed for an
analysis, and to organize them in a way that they can be readily moved from
one computer to another. In practice, this means they’ll all be in a single
directory, and that directory will only contain files for that particular
analysis. You may have several related analyses that share a directory.
That’s ok, but take time to organize them in a sensible way.&lt;/p&gt;
&lt;p&gt;While you may have related analyses together in a directory, you don’t want
to mix unrelated files and data in this directory. That will make it harder
to keep track of what is needed and what isn’t, and will waste space on the
computers where this work is ultimately archived.&lt;/p&gt;
&lt;div id=&#34;readme.txt&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;README.txt&lt;/h3&gt;
&lt;p&gt;There are many ways you can organize your work within this directory. One
absolute requirement is there needs to be a clear guide to your files, a
“Table of Contents”. This should be a ‘plain text’ file - something that
can be opened by any text editor. Your archive might be around for decades,
and you don’t know if your readers will be able to find a copy of MSWord97
when they need to read it. We can be reasonably confident that plain text
files will be accessible for a long time to come.&lt;/p&gt;
&lt;p&gt;By convention, this file is called &lt;code&gt;README.txt&lt;/code&gt;, and some data archiving
services (&lt;a href=&#34;https://datadryad.org/&#34;&gt;DataDryad&lt;/a&gt;) require that you include a
file with this name. You should probably use this name, unless you have a
good reason not to.&lt;/p&gt;
&lt;p&gt;One minor exception: if you use
&lt;a href=&#34;https://www.markdownguide.org/&#34;&gt;markdown&lt;/a&gt;, or
&lt;a href=&#34;https://rmarkdown.rstudio.com/&#34;&gt;RMarkdown&lt;/a&gt;, you can use these formats for
your &lt;code&gt;README&lt;/code&gt;. They are both plain text, and even if your audience doesn’t
use them, they can open a &lt;code&gt;README.md&lt;/code&gt; or &lt;code&gt;README.Rmd&lt;/code&gt; file in any text
editor. There are other, similar simple markup formats used in different
coding communities. As long as they are saved as plain text files they will
meet our requirements.&lt;/p&gt;
&lt;p&gt;The contents of your &lt;code&gt;README&lt;/code&gt; should describe as clearly as possible the
contents of your archive. Here is an excerpt from one of &lt;a href=&#34;https://github.com/plantarum/trich&#34;&gt;mine&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Start with trich.Rmd to read our draft manuscript. See trich-prep.Rmd for
the bulk of the code used in generating this manuscript.&lt;/p&gt;
&lt;h1 id=&#34;file-list&#34;&gt;File List:&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;trich.Rmd&lt;/strong&gt; : main manuscript file&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;plantarum.json&lt;/strong&gt; : bibliography (Zotero bibliography)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;trich-prep.Rmd&lt;/strong&gt; : The bulk of the code used in the analysis. Loaded from
trich.Rmd to regenerate the figures and tables there.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;data/&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;data/ssr_raw.csv&lt;/strong&gt; : raw microsatellite data. See trich-prep.Rmd (Loading
Data) for code to load and translate this to a genind object&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;data/survey-pops.csv&lt;/strong&gt; : coordinates of sampled populations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;data/eval.opt.2020-06-24.Rda&lt;/strong&gt; : the output of the Maxent modeling,
saved as a binary R Data object. Load into R with the load() function,
so you don’t need to repeat the lengthy Maxent analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;data/trich-gbif.csv&lt;/strong&gt; : GBIF records used in the Maxent analysis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;data/trich_soil.csv&lt;/strong&gt; : Soil analysis for each sampled population&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;data/maps&lt;/strong&gt; : Maps (shapefiles and rasters) used in the Maxent
analysis, and for some of the manuscript plots&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I prefer to use &lt;code&gt;RMarkdown&lt;/code&gt; to develop my manuscripts, as this allows me to
keep the code for figures, and the resulting images, together with the text
that describes the methods and interprets the results. If you prefer to
manage your code separately from your writing that’s fine too, but you may
end up structuring your archive a little differently.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;directory-organization&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Directory Organization&lt;/h3&gt;
&lt;p&gt;If you only have a few files, you may not need to do any further
organization of your archive. I find it helpful to use subfolders to keep
things organized. Depending on the needs of a project, I use:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;data/&lt;/dt&gt;
&lt;dd&gt;unprocessed data used in the analysis
&lt;/dd&gt;
&lt;dt&gt;processed/&lt;/dt&gt;
&lt;dd&gt;intermediate data files, generated from data and not stored permanently
&lt;/dd&gt;
&lt;dt&gt;downloads/&lt;/dt&gt;
&lt;dd&gt;storage of large external datasets used in the analysis, but not
stored permanently in the archive (e.g., &lt;a href=&#34;https://worldclim.org/&#34;&gt;WorldClim Climate
Data&lt;/a&gt;); be sure to include links to the source of
any external data you are not archiving yourself!
&lt;/dd&gt;
&lt;dt&gt;plots/&lt;/dt&gt;
&lt;dd&gt;images generated by the analysis
&lt;/dd&gt;
&lt;dt&gt;code/&lt;/dt&gt;
&lt;dd&gt;code used in the analysis, if it’s not in the top-level directory
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;These are not hard rules. You can use whatever structure suits your
project. Just be sure to explain it in your &lt;code&gt;README&lt;/code&gt;.&lt;/p&gt;
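&lt;p&gt;If you’d rather set this up from R than by hand, something like the
following sketch would do it. The project name is hypothetical, and I build
it in a temporary directory here so the example doesn’t touch any real
files:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## hypothetical project name, created in a temporary directory
archive &amp;lt;- file.path(tempdir(), &amp;quot;myProject&amp;quot;)

## the subfolders described above
subdirs &amp;lt;- c(&amp;quot;data&amp;quot;, &amp;quot;processed&amp;quot;, &amp;quot;downloads&amp;quot;, &amp;quot;plots&amp;quot;, &amp;quot;code&amp;quot;)
for (d in file.path(archive, subdirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

## start the README with one line per directory, to fill in later
writeLines(c(&amp;quot;File List:&amp;quot;,
             paste0(subdirs, &amp;quot;/ : TODO describe contents&amp;quot;)),
           file.path(archive, &amp;quot;README.txt&amp;quot;))

list.files(archive)&lt;/code&gt;&lt;/pre&gt;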
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;reproducibility&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Reproducibility&lt;/h2&gt;
&lt;p&gt;File organization gets us most of the way to portability. There are a few
things we need to do in our coding to complete this arrangement, and of
course to ensure we can reproduce the analysis once we move it to a new
computer.&lt;/p&gt;
&lt;div id=&#34;use-relative-paths&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Use Relative Paths&lt;/h3&gt;
&lt;p&gt;An &lt;code&gt;absolute path&lt;/code&gt; is the location of a file on a particular computer. The
absolute path to the file I’m editing right now is
&lt;code&gt;/home/smithty/blogdown/content/tutorials/2022-10-17-data-management/index.md&lt;/code&gt;.
That location will only ever exist on a Linux computer with a user named
&lt;code&gt;smithty&lt;/code&gt;. If I moved it to a Windows computer, it might be located at
&lt;code&gt;C:\Users\smithty\Documents\blogdown/content/tutorials/2022-10-17-data-management/index.md&lt;/code&gt;.
If I try to refer to this file by its Linux absolute path on the Windows
machine, I won’t find it.&lt;/p&gt;
&lt;p&gt;On the other hand, from the top directory of my blog, &lt;code&gt;blogdown/&lt;/code&gt;, this
file will have the same &lt;code&gt;relative path&lt;/code&gt; on both machines:
&lt;code&gt;content/tutorials/2022-10-17-data-management/index.md&lt;/code&gt;. That means links
to this file using the relative path will work just fine on both machines.&lt;/p&gt;
&lt;p&gt;We should use &lt;code&gt;relative paths&lt;/code&gt; in our analyses too.&lt;/p&gt;
&lt;p&gt;For example, I could use an absolute path to load my data:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;samples &amp;lt;-
  read.csv(&amp;quot;/home/smithty/nextcloud/trich/2020-06-25/data/survey-pops.csv&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But this won’t load on anyone else’s computer. If we do this instead:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;samples &amp;lt;-
  read.csv(&amp;quot;data/survey-pops.csv&amp;quot;) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Anyone with our archive can run the code as it is, as long as they have the
working directory set properly. For my projects, I do this by running my
code from the top directory of the archive. In this case, I have the
directory structure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;├── data
│   └── survey-pops.csv
├── README.md
├── trich-prep.Rmd
└── trich.Rmd&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When I run code from &lt;code&gt;trich.Rmd&lt;/code&gt;, I set the working directory to the
location of that file. &lt;code&gt;RStudio&lt;/code&gt; manages this for you with its project
support. If you don’t use that feature, you can tell R to use that location
when it starts. After that, stick to relative file paths, and you don’t
need to worry about your code breaking when you move to a different
computer.&lt;/p&gt;
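&lt;p&gt;One way to catch a mis-set working directory early is to check for a file
you expect to find before doing anything else. This is only a sketch: the
helper name &lt;code&gt;checkArchiveRoot&lt;/code&gt; is my own invention, and the example
builds a throwaway copy of the layout above in a temporary directory:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## hypothetical helper: stop early if the working directory
## is not the top of the archive
checkArchiveRoot &amp;lt;- function(marker = &amp;quot;README.md&amp;quot;) {
  if (!file.exists(marker)) {
    stop(&amp;quot;Cannot find &amp;quot;, marker, &amp;quot; in &amp;quot;, getwd(),
         &amp;quot;; set the working directory to the top of the archive&amp;quot;)
  }
  invisible(TRUE)
}

## build a throwaway archive to demonstrate
archive &amp;lt;- file.path(tempdir(), &amp;quot;trich-demo&amp;quot;)
dir.create(file.path(archive, &amp;quot;data&amp;quot;), recursive = TRUE,
           showWarnings = FALSE)
writeLines(&amp;quot;demo&amp;quot;, file.path(archive, &amp;quot;README.md&amp;quot;))
write.csv(data.frame(pop = c(&amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;), lat = c(45.4, 43.7)),
          file.path(archive, &amp;quot;data&amp;quot;, &amp;quot;survey-pops.csv&amp;quot;),
          row.names = FALSE)

## move to the top of the archive, check, and load with a relative path
oldwd &amp;lt;- setwd(archive)
checkArchiveRoot()
samples &amp;lt;- read.csv(&amp;quot;data/survey-pops.csv&amp;quot;)
setwd(oldwd)&lt;/code&gt;&lt;/pre&gt;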
&lt;/div&gt;
&lt;div id=&#34;structure-your-code-to-run-on-its-own&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Structure Your Code to Run On Its Own&lt;/h3&gt;
&lt;p&gt;Two common problems that make it hard to run your code are mixing your
‘good’ code with non-working code, and writing code that requires you to
update it by hand to finish your analysis.&lt;/p&gt;
&lt;p&gt;Mixing good and bad code might look like this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Load in our data:
myData &amp;lt;- read.csv(&amp;quot;data/myfile.csv&amp;quot;)
myDataScaled &amp;lt;- scale(myData)
myDataScaled &amp;lt;- scale(myData, center = FALSE)

myData &amp;lt;- myData[, -1]
myDataScaled &amp;lt;- scale(myData, scale = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It’s not unusual to accumulate various versions of code (should I scale
this, center it, both? Do I need the first column?). While you’re actively
working on your code, you may find you have multiple versions in the same
file. They need to be clearly commented, for your own benefit! But more
importantly, when you decide which version you’re going to use, remove the
rest. Don’t expect yourself (and certainly don’t expect anyone else) to
figure out which lines they should run, and which they should skip.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Load in our data:
myData &amp;lt;- read.csv(&amp;quot;data/myfile.csv&amp;quot;)

## drop the name column
myData &amp;lt;- myData[, colnames(myData) != &amp;quot;name&amp;quot;]

## don&amp;#39;t scale, use raw data
## myDataScaled &amp;lt;-scale(myData)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you have a few lines of code that you might want to revisit, comment
them out and leave yourself a note in a comment. If you have large blocks
of code you aren’t using, but want to keep a record of, put them in a
separate file. The end product should be a file that you can run from start
to finish, without deciding which lines to skip and which to run.&lt;/p&gt;
&lt;p&gt;A similar problem occurs when you have code that works, but requires you to
manually update it in order to complete your analysis. e.g.,&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;myData &amp;lt;- read.table(&amp;quot;data/experiments.csv&amp;quot;)

myExperiment &amp;lt;- subset(myExperimentData, exp == 1)

myExperimentResult &amp;lt;- processingCode(myExperiment)

## repeat for exp 1-20&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Code structured like this requires you to edit and re-edit the code many
times to complete your work. That is tedious, and it’s easy to make a
mistake. And when you update &lt;code&gt;processingCode&lt;/code&gt;, you need to rerun everything
by hand. You can avoid this with loops and lists.&lt;/p&gt;
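&lt;p&gt;As a sketch of that approach, here is the same workflow using &lt;code&gt;split&lt;/code&gt;
and &lt;code&gt;lapply&lt;/code&gt;. The data are made up, and &lt;code&gt;processingCode&lt;/code&gt; is a
placeholder (here it just takes a mean) standing in for whatever your real
analysis does:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## made-up data standing in for data/experiments.csv
myExperimentData &amp;lt;- data.frame(exp = rep(1:20, each = 5),
                               value = seq_len(100))

## placeholder for the real analysis
processingCode &amp;lt;- function(x) mean(x$value)

## split the table into a list with one element per experiment,
## then apply the same code to each element
myExperiments &amp;lt;- split(myExperimentData, myExperimentData$exp)
myExperimentResults &amp;lt;- lapply(myExperiments, processingCode)

## results are named by experiment number
myExperimentResults[[&amp;quot;1&amp;quot;]]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When &lt;code&gt;processingCode&lt;/code&gt; changes, rerunning these few lines updates all 20
results at once, with no hand-editing.&lt;/p&gt;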
&lt;/div&gt;
&lt;div id=&#34;r-programming-do-not-save-your-workspace&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;R Programming: Do Not Save Your Workspace!&lt;/h3&gt;
&lt;p&gt;R offers to save your current workspace when you close the terminal. This
is sometimes convenient, but can cause hard-to-detect problems with your
analysis. If you happen to alter one of the objects you are working with in
an R session, but don’t capture the code in your script file, you won’t
have a record of what you’ve done. If you then save your workspace, the
next time you work on your code, you will be able to keep using that
modified object. At this point, your code and data are out of sync, with
nothing to indicate how they differ.&lt;/p&gt;
&lt;p&gt;At best, this is inconvenient. At worst, if undetected, you can waste weeks
or months analyzing the wrong data! Better to avoid the risk and set R to
never save your workspace.&lt;/p&gt;
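&lt;p&gt;A couple of base R calls help here. This is a sketch rather than a
complete recipe, and the helper name &lt;code&gt;hasSavedWorkspace&lt;/code&gt; is
hypothetical. From the command line, starting R with the &lt;code&gt;--no-save&lt;/code&gt;
and &lt;code&gt;--no-restore&lt;/code&gt; flags has a similar effect, and RStudio has
equivalent settings in its global options.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## hypothetical helper: is there a saved workspace in this directory?
hasSavedWorkspace &amp;lt;- function(dir = &amp;quot;.&amp;quot;) {
  file.exists(file.path(dir, &amp;quot;.RData&amp;quot;))
}

## warn if an old workspace could be loaded at startup
if (hasSavedWorkspace()) {
  warning(&amp;quot;Found a saved workspace (.RData); objects in it may not &amp;quot;,
          &amp;quot;match your script&amp;quot;)
}

## when you are done, leave R without saving the workspace:
## quit(save = &amp;quot;no&amp;quot;)&lt;/code&gt;&lt;/pre&gt;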
&lt;/div&gt;
&lt;div id=&#34;metadata-data-about-your-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Metadata: data about your data&lt;/h3&gt;
&lt;p&gt;If your archive includes data files, these should also be in an open
format, such as comma-separated or tab-separated text files (typically with
a name like &lt;code&gt;FILE.csv&lt;/code&gt;, &lt;code&gt;DATA.txt&lt;/code&gt;, or &lt;code&gt;RECORDS.txt&lt;/code&gt;). That ensures that
anyone can open your data without needing a proprietary program to do so.&lt;/p&gt;
&lt;div id=&#34;data-tables&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Data Tables&lt;/h4&gt;
&lt;p&gt;In addition, you need to document how your &lt;a href=&#34;https://data.library.arizona.edu/data-management/best-practices/data-documentation-readme-metadata&#34;&gt;data is
coded&lt;/a&gt;.
This includes things like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Variable names and descriptions&lt;/li&gt;
&lt;li&gt;Definition of codes and classification schemes&lt;/li&gt;
&lt;li&gt;Codes of, and reasons for, missing values&lt;/li&gt;
&lt;li&gt;Definitions of specialty terminology and acronyms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Be sure to include things like units (meters or feet?), and what your
special codes mean (is &lt;code&gt;pet_l&lt;/code&gt; the petal length or the petiole length?,
what is &lt;code&gt;lf1&lt;/code&gt;, &lt;code&gt;lf2&lt;/code&gt; and &lt;code&gt;lf3&lt;/code&gt;?). This is also a chance to review your data
for consistency - are you using NA &lt;em&gt;and&lt;/em&gt; -1 for missing values? Do you have
multiple different phrasings for the same thing?&lt;/p&gt;
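&lt;p&gt;For instance, if a table mixes NA and -1 as missing-value codes, you can
document both in your &lt;code&gt;README&lt;/code&gt; and tell R about them when loading. A
minimal sketch with made-up data (the column names are hypothetical;
&lt;code&gt;na.strings&lt;/code&gt; is a standard &lt;code&gt;read.csv&lt;/code&gt; argument). This is only
safe when -1 can never be a real measurement:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## made-up table where missing petal lengths were entered
## as both NA and -1
rawText &amp;lt;- &amp;quot;plant,petal_length_mm
a,12.5
b,-1
c,NA
d,10.1&amp;quot;

## treat both codes as missing when reading the data
petals &amp;lt;- read.csv(text = rawText, na.strings = c(&amp;quot;NA&amp;quot;, &amp;quot;-1&amp;quot;))

## count the missing values
sum(is.na(petals$petal_length_mm))&lt;/code&gt;&lt;/pre&gt;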
&lt;p&gt;Depending on your project, you may also need to distinguish between absent
evidence (you didn’t sample on a day) and evidence of absence (you sampled
on a day, but didn’t find any events/individuals). If your analysis will
include sampling events that didn’t result in any observations, you’ll need
to document these ‘true negatives’ in your data table.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;data-sources&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Data Sources&lt;/h4&gt;
&lt;p&gt;If you’re using data from outside sources, like &lt;a href=&#34;https://www.gbif.org&#34;&gt;GBIF.org&lt;/a&gt; or
&lt;a href=&#34;https://worldclim.org/&#34;&gt;WorldClim.org&lt;/a&gt;, be sure to record their citation
details, including a DOI, if available, when you download them. This will
ensure you can properly cite them later, and that your readers will be able
to access the same data if they want to reproduce your work. Most data
providers have clear policies as to how you should cite them, and what
you’re allowed to do with the data they share (i.e., can you share it or
archive it yourself). For example, here are the policies for
&lt;a href=&#34;https://worldclim.org/about.html&#34;&gt;WorldClim&lt;/a&gt; and
&lt;a href=&#34;https://www.gbif.org/citation-guidelines&#34;&gt;GBIF&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;how-to-get-there-from-here&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;How to get there from here&lt;/h1&gt;
&lt;p&gt;Now that we have an idea of what we want our data archive to look like, how do
we get there? Start by setting up your directory structure, and making a
&lt;code&gt;README&lt;/code&gt; file. If you’re starting from scratch, make it a practice to
review your directory regularly, to see what you’ve done and what you’re no
longer using, and update your &lt;code&gt;README&lt;/code&gt; to capture that. This kind of
regular reflection is useful to track your progress, and helps you keep on
top of your archive work so you don’t have a huge mess to wrangle at the
end of your project.&lt;/p&gt;
&lt;p&gt;If you’re already well along in your project, it may be easier to create an
‘aspirational’ directory and populate it from your existing work. I do this
regularly! I often find I’ve charged into an analysis without thinking
about archiving, and after a few weeks I have an unholy mess of files and
data to deal with. In that case, I create a new directory, a &lt;code&gt;README&lt;/code&gt;, and
copy over the main code file I’m using to that directory. When I get to the
first data file in that directory, I copy it over to the &lt;code&gt;data/&lt;/code&gt; directory
in the new archive, adjust the code to use the relative path, if necessary,
and continue. This can be very helpful in clarifying what code and files
you actually need and use, and what can be left behind.&lt;/p&gt;
&lt;p&gt;This is also a good opportunity to ensure that your analysis is structured
in a way that it can run start-to-finish without manual editing.&lt;/p&gt;
&lt;p&gt;You don’t need to delete anything in the old directory; you can keep it in
case you later decide you want to revisit some ideas in there.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;what-to-do-with-it-all&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;What to do with it all&lt;/h1&gt;
&lt;p&gt;One of the benefits of structuring your code in a single directory is there
are lots of tools that you can use to manage it.
&lt;a href=&#34;https://git-scm.com/&#34;&gt;git&lt;/a&gt; and &lt;a href=&#34;https://github.com/&#34;&gt;GitHub&lt;/a&gt; are popular,
and very powerful for managing code, especially tracking different versions
of the same files. However, they require a certain amount of discipline to
get the full benefit, and they are challenging to learn. RStudio does have
good support for GitHub repositories.&lt;/p&gt;
&lt;p&gt;You can also keep it simple, and sync your directory to Google Drive,
Dropbox, Nextcloud, or many other options. Once your work is published, you
can archive it permanently on &lt;a href=&#34;https://zenodo.org/&#34;&gt;Zenodo&lt;/a&gt;,
&lt;a href=&#34;https://datadryad.org/&#34;&gt;DataDryad&lt;/a&gt;, or other online services.&lt;/p&gt;
&lt;p&gt;All of these options will support housing a single directory and its
subdirectories. None of these options will be easy to deal with if you have
files spread across multiple directories and mixed with files from other
projects!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;examples&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Examples&lt;/h1&gt;
&lt;p&gt;I’ve been doing some version of this for my own work for years, but have
only recently moved to permanent, public archives of my work. Here are
three recent examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span class=&#34;citation&#34;&gt;Hayes et al. (&lt;a href=&#34;#ref-HayesEtAl_2022&#34;&gt;2022&lt;/a&gt;)&lt;/span&gt;: &lt;a href=&#34;https://github.com/plantarum/celtisSSR&#34;&gt;The Genetic Diversity of Triploid &lt;em&gt;Celtis pumila&lt;/em&gt; and
its Diploid Relatives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;citation&#34;&gt;Nowell et al. (&lt;a href=&#34;#ref-NowellEtAl_2022&#34;&gt;2022&lt;/a&gt;)&lt;/span&gt;: &lt;a href=&#34;https://github.com/plantarum/trich&#34;&gt;Conservation assessment of a range-edge population of
&lt;em&gt;Trichophorum planifolium&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;span class=&#34;citation&#34;&gt;Foster et al. (&lt;a href=&#34;#ref-FosterEtAl_2022&#34;&gt;2022&lt;/a&gt;)&lt;/span&gt;: &lt;a href=&#34;https://doi.org/10.5061/dryad.cfxpnvx8f&#34;&gt;Testing the assumption of environmental equilibrium in
an invasive plant species over a 130 year
history&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I’m still figuring out how best to do this, and my practice will definitely
continue to change and evolve. Regardless of the specifics, I have
benefited enormously from investing the time needed to make coherent
archives of my projects.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-AlstonRick_2021&#34;&gt;
&lt;p&gt;Alston, J. M., and J. A. Rick. 2021. A Beginner’s Guide to Conducting Reproducible Research. &lt;em&gt;The Bulletin of the Ecological Society of America&lt;/em&gt; 102: e01801.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-FosterEtAl_2022&#34;&gt;
&lt;p&gt;Foster, S. L., H. M. Kharouba, and T. W. Smith. 2022. Testing the assumption of environmental equilibrium in an invasive plant species over a 130 year history. &lt;em&gt;Ecography&lt;/em&gt;: e06284.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-HayesEtAl_2022&#34;&gt;
&lt;p&gt;Hayes, A., S. Wang, A. T. Whittemore, and T. W. Smith. 2022. The Genetic Diversity of Triploid &lt;em&gt;Celtis pumila&lt;/em&gt; and its Diploid Relatives &lt;em&gt;C. occidentalis&lt;/em&gt; and &lt;em&gt;C. laevigata&lt;/em&gt; (Cannabaceae). &lt;em&gt;Systematic Botany&lt;/em&gt; 47: 441–451.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;ref-NowellEtAl_2022&#34;&gt;
&lt;p&gt;Nowell, V. J., S. Wang, and T. W. Smith. 2022. Conservation assessment of a range-edge population of &lt;em&gt;Trichophorum planifolium&lt;/em&gt; (Cyperaceae) reveals range-wide inbreeding and locally divergent environmental conditions. &lt;em&gt;Botany&lt;/em&gt; 100: 631–642.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Schoener&#39;s D and Study Extent</title>
      <link>https://plantarum.ca/2021/12/02/schoenersd/</link>
      <pubDate>Thu, 02 Dec 2021 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2021/12/02/schoenersd/</guid>
      <description>
&lt;script src=&#34;https://plantarum.ca/2021/12/02/schoenersd/index_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;div id=&#34;background&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Background&lt;/h1&gt;
&lt;p&gt;Schoener’s D was created by &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-Schoener_1968&#34; role=&#34;doc-biblioref&#34;&gt;Schoener&lt;/a&gt; (&lt;a href=&#34;#ref-Schoener_1968&#34; role=&#34;doc-biblioref&#34;&gt;1968&lt;/a&gt;)&lt;/span&gt;. He was studying the feeding
niche of anoles, and needed a way to quantify the overlap in prey items for
different species. This is what he came up with:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[D(p_X, p_Y) = 1 - \frac{1}{2} \sum_i \vert p_{X,i} - p_{Y,i} \vert\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Here, &lt;span class=&#34;math inline&#34;&gt;\(p_{X,i}\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(p_{Y,i}\)&lt;/span&gt; are the frequencies for species &lt;span class=&#34;math inline&#34;&gt;\(X\)&lt;/span&gt; and &lt;span class=&#34;math inline&#34;&gt;\(Y\)&lt;/span&gt;,
respectively, for the &lt;span class=&#34;math inline&#34;&gt;\(i^{th}\)&lt;/span&gt; category. For Schoener, the categories were
prey sizes. In the context of distribution modeling, they would be regions
along an environmental gradient, and the ‘frequencies’ are the fitted
values from an SDM, or the density values from an Ecospat dynamic niche
grid.&lt;/p&gt;
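&lt;p&gt;As a quick worked example, with made-up frequencies for two species across
three categories:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## made-up frequencies; each vector sums to 1
pX &amp;lt;- c(0.5, 0.3, 0.2)
pY &amp;lt;- c(0.2, 0.3, 0.5)

## Schoener&amp;#39;s D: 1 minus half the summed absolute differences
D &amp;lt;- 1 - sum(abs(pX - pY)) / 2
D
## the absolute differences are 0.3, 0 and 0.3,
## so D = 1 - 0.6/2 = 0.7&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two identical distributions give a &lt;em&gt;D&lt;/em&gt; of 1, and two distributions that
share no categories give a &lt;em&gt;D&lt;/em&gt; of 0.&lt;/p&gt;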
&lt;p&gt;&lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-WarrenEtAl_2008&#34; role=&#34;doc-biblioref&#34;&gt;Warren et al.&lt;/a&gt; (&lt;a href=&#34;#ref-WarrenEtAl_2008&#34; role=&#34;doc-biblioref&#34;&gt;2008&lt;/a&gt;)&lt;/span&gt; pointed out some subtle theoretical issues with Schoener’s
D in this context, and proposed their own index &lt;em&gt;I&lt;/em&gt;, based on the Hellinger
distance, to better account for them.&lt;/p&gt;
&lt;p&gt;Hellinger’s distance:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[H(p_X, p_Y) = \sqrt{\sum_i(\sqrt{p_{X,i}} - \sqrt{p_{Y,i}})^2}\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Warren’s &lt;em&gt;I&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[I(p_X, p_Y) = 1 - \frac{1}{2} H(p_X, p_Y)^2\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;In application, Schoener’s D suggests that the &lt;span class=&#34;math inline&#34;&gt;\(p_{X, i}\)&lt;/span&gt; values reflect
relative use of a particular habitat. However, ENM predictions indicate the
relative ‘suitability’ of a cell for &lt;em&gt;occupancy&lt;/em&gt; (i.e., presence or
absence) by the study species, but do not necessarily reflect density.&lt;/p&gt;
&lt;p&gt;However, Warren also noted that despite the potential issues, in practice
there is little difference in the qualitative results following from &lt;em&gt;D&lt;/em&gt;
and &lt;em&gt;I&lt;/em&gt;. I think Schoener’s &lt;em&gt;D&lt;/em&gt; is more commonly used now, but either or
both may show up in distribution modeling studies.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;overlap-vs-correlation&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Overlap vs Correlation&lt;/h1&gt;
&lt;p&gt;&lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;Warren&lt;/a&gt; (&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;2018&lt;/a&gt;)&lt;/span&gt; made an interesting contrast between two species’ niche
overlap (D), and the correlation between their suitability scores.
Schoener’s &lt;em&gt;D&lt;/em&gt; quantifies the extent to which a pair of species may
interact in the same space (i.e., they’re both likely to be present
together in a location). This is important to know, especially in the
context of niche-shift studies &lt;span class=&#34;citation&#34;&gt;(e.g. &lt;a href=&#34;#ref-AtwaterBarney_2021&#34; role=&#34;doc-biblioref&#34;&gt;Atwater and Barney, 2021&lt;/a&gt;)&lt;/span&gt;. But while overlap
statistics tell us where species are found along an environmental gradient, they
don’t tell us anything about how the species respond to that gradient. In fact,
species with &lt;em&gt;perfectly opposite&lt;/em&gt; responses to the environment
may still have relatively high niche overlap, &lt;em&gt;D&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Let’s revisit the example from &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;Warren&lt;/a&gt; (&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;2018&lt;/a&gt;)&lt;/span&gt;. We start with the &lt;code&gt;olaps&lt;/code&gt;
helper function, which calculates the statistics of interest:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggplot2)
library(grid)

olaps &amp;lt;- function(sp1, sp2){
  ## Calculate Schoener&amp;#39;s D, Warren&amp;#39;s I, and Spearman
  ## Correlation for sp1 and sp2

  ## sp1 and sp2 are the relative occupancy values for each
  ## species along the same environmental gradient

  ## rescale the values for each species so they sum to 1
  sp1 &amp;lt;- sp1/sum(sp1)
  sp2 &amp;lt;- sp2/sum(sp2)
  
  plot.table &amp;lt;- data.frame(
    species = c(rep(&amp;quot;sp1&amp;quot;, length(sp1)),
                rep(&amp;quot;sp2&amp;quot;, length(sp2))),
    env = c(seq_along(sp1), seq_along(sp2)),
    suitability = c(sp1, sp2))

  D = 1 - sum(abs(sp1 - sp2))/2
  I = 1 - sum((sqrt(sp1) - sqrt(sp2))^2)/2
  cor = cor(sp1, sp2, method = &amp;quot;spearman&amp;quot;)

  grob &amp;lt;- grobTree(textGrob(paste(&amp;quot;D =&amp;quot;, round(D, 2),
                                 &amp;quot;  I =&amp;quot;, round(I, 2),
                                 &amp;quot;  Cor =&amp;quot;, round(cor, 2)),
                           x = 0.1,  y = 0.95, hjust = 0,
                           gp = gpar(fontsize = 15)))
  

  ## qplot() is deprecated in recent ggplot2 releases; use ggplot() directly
  suitplot = ggplot(plot.table,
                    aes(x = env, y = suitability, colour = species)) +
    geom_line() +
    annotation_custom(grob)

  return(list(
    D = D, I = I, cor = cor, suitplot = suitplot
  ))
  
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can recreate the examples from &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;Warren&lt;/a&gt; (&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;2018&lt;/a&gt;)&lt;/span&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sp1 &amp;lt;- seq(0.1, 1.0, 0.001)
sp2 &amp;lt;- seq(0.1, 1.0, 0.001)

olaps(sp1, sp2)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:example-1&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/2021/12/02/schoenersd/index_files/figure-html/example-1-1.png&#34; alt=&#34;Identical Species&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: Identical Species
&lt;/p&gt;
&lt;/div&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sp1 &amp;lt;- seq(0.1, 1.0, 0.001)
sp2 &amp;lt;- seq(1.0, 0.1, -0.001)

olaps(sp1, sp2)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:example-2&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/2021/12/02/schoenersd/index_files/figure-html/example-2-1.png&#34; alt=&#34;Species with Inverse Environmental Response&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 2: Species with Inverse Environmental Response
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The point that &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;Warren&lt;/a&gt; (&lt;a href=&#34;#ref-Warren_2018&#34; role=&#34;doc-biblioref&#34;&gt;2018&lt;/a&gt;)&lt;/span&gt; was making is that two species may occupy more
or less similar locations along an environmental gradient, while having
very different &lt;em&gt;responses&lt;/em&gt; to that gradient. This isn’t a problem. But it
does highlight the importance of clearly articulating the question you are
asking in your research, and making sure that the analyses you choose are
actually answering that question.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;niche-overlap-vs-study-extent&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Niche Overlap vs Study Extent&lt;/h1&gt;
&lt;p&gt;Something else struck me reading Warren’s post. The toy examples he used
represent a very narrow slice of an environmental gradient; that is, the
portion where both species are present. Applying these analyses to global
patterns, as we do when comparing the distribution of invasive species in
their native and introduced range, and especially when we apply these
analyses to large numbers of species, we can (potentially) include much
broader gradients. And this can have a significant impact on these
statistics.&lt;/p&gt;
&lt;p&gt;Here’s an (ever so slightly) more realistic example to illustrate. We’ll
define our gradient over the range 0 to 50:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;env &amp;lt;- seq(0, 50, by = 0.01)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we’ll define two species, with partially overlapping ranges:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sp1 &amp;lt;- dnorm(env, mean = 22.5, sd = 2)
sp2 &amp;lt;- dnorm(env, mean = 27.5, sd = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now compare the species ‘globally’:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;olaps(sp1, sp2)&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:global-analysis&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/2021/12/02/schoenersd/index_files/figure-html/global-analysis-1.png&#34; alt=&#34;Global Analysis&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 3: Global Analysis
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;At this scale, their response to the gradient appears to be highly
correlated, while they have low niche overlap.&lt;/p&gt;
&lt;p&gt;If we zoom in a bit, and ‘trim’ off roughly the lowest and highest 1000 values on
our gradient, we can emulate a ‘continental’ extent:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## ignore the lowest and highest 1000
## environmental values 
slice &amp;lt;- 1000:4000 

olaps(sp1[slice], sp2[slice])&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:continental-analysis&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/2021/12/02/schoenersd/index_files/figure-html/continental-analysis-1.png&#34; alt=&#34;Continental Analysis&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 4: Continental Analysis
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Correlation drops, but niche overlap is essentially unchanged. On reflection, this
makes sense. Locations where neither species is present get no weight in
the calculation of &lt;em&gt;D&lt;/em&gt;, so dropping ‘empty’ gradient has no impact. On the
other hand, those locations do contribute to inflating correlation.&lt;/p&gt;
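&lt;p&gt;To see why, here is a minimal sketch with two made-up, perfectly opposed
species (&lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; are hypothetical), padded with stretches of gradient
where both are exactly zero. This is an idealization: the tails of the
normal curves above are tiny rather than exactly zero. &lt;code&gt;schoenerD&lt;/code&gt; is a
hypothetical helper that uses the same formula as &lt;code&gt;olaps&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## hypothetical helper: Schoener&amp;#39;s D, as calculated in olaps()
schoenerD &amp;lt;- function(sp1, sp2) {
  sp1 &amp;lt;- sp1 / sum(sp1)
  sp2 &amp;lt;- sp2 / sum(sp2)
  1 - sum(abs(sp1 - sp2)) / 2
}

## two species with perfectly opposite responses
a &amp;lt;- c(1, 2, 3, 4)
b &amp;lt;- rev(a)

## the same species, with empty gradient added at both ends
empty &amp;lt;- rep(0, 20)
aPad &amp;lt;- c(empty, a, empty)
bPad &amp;lt;- c(empty, b, empty)

c(core = schoenerD(a, b), padded = schoenerD(aPad, bPad))
c(core = cor(a, b, method = &amp;quot;spearman&amp;quot;),
  padded = cor(aPad, bPad, method = &amp;quot;spearman&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The padded cells add nothing to the sum in &lt;em&gt;D&lt;/em&gt;, so the two values of &lt;em&gt;D&lt;/em&gt;
match. The shared zeros do count as agreement in the rank correlation,
which pulls it away from -1 and towards +1.&lt;/p&gt;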
&lt;p&gt;Now what if we shift our focus, such that the distribution of our species
is not equally represented:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;slice &amp;lt;- 1000:2500 
olaps(sp1[slice], sp2[slice])&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:regional-analysis&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/2021/12/02/schoenersd/index_files/figure-html/regional-analysis-1.png&#34; alt=&#34;Regional Analysis&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 5: Regional Analysis
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Correlation jumps up, as both species increase together over most
of the sampled gradient. And with this particular slice, our niche overlap
is twice the ‘true’ value we get when we consider the full gradient.&lt;/p&gt;
&lt;p&gt;Finally, we can zoom in on the center of the gradient, where both species
are equally represented (although with inverse responses):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;slice &amp;lt;- 2000:3000
olaps(sp1[slice], sp2[slice])&lt;/code&gt;&lt;/pre&gt;
&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:contact-analysis&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/2021/12/02/schoenersd/index_files/figure-html/contact-analysis-1.png&#34; alt=&#34;Contact Zone&#34; width=&#34;672&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 6: Contact Zone
&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Correlation drops again, accurately reflecting the inverse pattern. And &lt;em&gt;D&lt;/em&gt;
is back down close to the ‘true’ value. That’s ‘lucky’: my toy species
have perfectly symmetrical distributions, so sufficiently large, symmetrical
regions around the mid-point between the two of them will give reasonably
accurate estimates.&lt;/p&gt;
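&lt;p&gt;One way to explore how wide a window is ‘sufficiently large’ is to recompute
&lt;em&gt;D&lt;/em&gt; for symmetrical windows of increasing width around the mid-point of the
gradient. This is only a sketch for the toy species used here; &lt;code&gt;schoenerD&lt;/code&gt;
and the &lt;code&gt;halfWidths&lt;/code&gt; values are hypothetical choices for illustration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;env &amp;lt;- seq(0, 50, by = 0.01)
sp1 &amp;lt;- dnorm(env, mean = 22.5, sd = 2)
sp2 &amp;lt;- dnorm(env, mean = 27.5, sd = 2)

## hypothetical helper: Schoener&amp;#39;s D, as calculated in olaps()
schoenerD &amp;lt;- function(sp1, sp2) {
  sp1 &amp;lt;- sp1 / sum(sp1)
  sp2 &amp;lt;- sp2 / sum(sp2)
  1 - sum(abs(sp1 - sp2)) / 2
}

## half-widths of the windows, in gradient units, centred on 25
halfWidths &amp;lt;- c(1, 2.5, 5, 7.5, 10, 25)

Dwindow &amp;lt;- sapply(halfWidths, function(w) {
  keep &amp;lt;- abs(env - 25) &amp;lt;= w
  schoenerD(sp1[keep], sp2[keep])
})

data.frame(halfWidth = halfWidths, D = round(Dwindow, 3))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The widest window covers the whole gradient, so its value is the ‘true’ &lt;em&gt;D&lt;/em&gt;
that the narrower windows can be compared against.&lt;/p&gt;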
&lt;/div&gt;
&lt;div id=&#34;implications&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Implications&lt;/h1&gt;
&lt;p&gt;Why does this matter? If you’re interested in comparing the environmental
responses of two species, the results (correlation) can vary quite
dramatically depending on the extent of your study.&lt;/p&gt;
&lt;p&gt;On the other hand, Schoener’s D is robust to data that includes ‘too much’
of a gradient (i.e., extending beyond the region occupied by either
species). But it can be sensitive to undersampling a gradient, where the
relative occupancy of each species varies depending on how you set your
extent.&lt;/p&gt;
&lt;p&gt;In other words, if you’re interested in the ‘underlying models’ that govern
species’ comparative distributions along a gradient, you need to be very
clear about the scope of the question, and how much of the environmental
gradient you sample. But if you want to quantify niche overlap (or relative
niche shift), then you want to include environments well beyond the regions
actually occupied by your study organisms.&lt;/p&gt;
&lt;p&gt;All of which is trivial to do when you get to create the species on the
computer, and much trickier when you need to infer the details from museum
records and climate rasters!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references csl-bib-body hanging-indent&#34;&gt;
&lt;div id=&#34;ref-AtwaterBarney_2021&#34; class=&#34;csl-entry&#34;&gt;
Atwater, D. Z., and J. N. Barney. 2021. Climatic niche shifts in 815 introduced plant species affect their predicted distributions. &lt;em&gt;Global Ecology and Biogeography&lt;/em&gt; 30: 1671–1684.
&lt;/div&gt;
&lt;div id=&#34;ref-Schoener_1968&#34; class=&#34;csl-entry&#34;&gt;
Schoener, T. W. 1968. The Anolis Lizards of Bimini: Resource Partitioning in a Complex Fauna. &lt;em&gt;Ecology&lt;/em&gt; 49: 704–726.
&lt;/div&gt;
&lt;div id=&#34;ref-Warren_2018&#34; class=&#34;csl-entry&#34;&gt;
Warren, D. 2018. Species In Space: Why add correlations for suitability scores? &lt;em&gt;Species In Space&lt;/em&gt;. Website &lt;a href=&#34;https://enmtools.blogspot.com/2018/10/why-add-correlations-for-suitability.html&#34;&gt;https://enmtools.blogspot.com/2018/10/why-add-correlations-for-suitability.html&lt;/a&gt; [accessed 2 December 2021].
&lt;/div&gt;
&lt;div id=&#34;ref-WarrenEtAl_2008&#34; class=&#34;csl-entry&#34;&gt;
Warren, D. L., R. E. Glor, and M. Turelli. 2008. Environmental Niche Equivalency Versus Conservatism: Quantitative Approaches to Niche Evolution. &lt;em&gt;Evolution&lt;/em&gt; 62: 2868–2883.
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Thinning Occurrence Records in R</title>
      <link>https://plantarum.ca/2021/10/26/r-gridsample/</link>
      <pubDate>Tue, 26 Oct 2021 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2021/10/26/r-gridsample/</guid>
      <description>


&lt;blockquote&gt;
&lt;p&gt;Note that this tutorial refers to the thinning method used in the old
version of the &lt;code&gt;rspatial.org&lt;/code&gt; tutorial, which used the &lt;code&gt;raster&lt;/code&gt; package
(along with &lt;code&gt;dismo&lt;/code&gt;) for the GIS computations. The &lt;code&gt;terra&lt;/code&gt; package will
shortly be replacing &lt;code&gt;raster&lt;/code&gt;, and all new code should use this instead.
The details of spatial thinning with &lt;code&gt;terra&lt;/code&gt; are presented in my &lt;a href=&#34;https://plantarum.ca/2023/07/28/ecospat-terra/#sampling-bias&#34;&gt;new
ecospat tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A common approach to reducing spatial bias in occurrence records is to
randomly select one (or a small number) of samples present in each cell in
the landscape. This can be done with the &lt;code&gt;gridSample&lt;/code&gt; function from the package &lt;code&gt;dismo&lt;/code&gt;
&lt;span class=&#34;citation&#34;&gt;(Hijmans et al. &lt;a href=&#34;#ref-HijmansEtAl_2017&#34;&gt;2017&lt;/a&gt;)&lt;/span&gt;, as described at
&lt;a href=&#34;https://rspatial.org/raster/sdm/2_sdm_occdata.html#sampling-bias&#34;&gt;RSpatial.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, the code presented at RSpatial uses a newly-created raster layer
to thin the records. This layer is based on the extent of your occurrence
data; even if you set the resolution to match the resolution of the
environmental rasters you use, the two grids won’t necessarily be aligned. That
means the cells will be the same size, but the edges won’t line up.&lt;/p&gt;
&lt;p&gt;A consequence of this is that you might end up keeping more than one
sample, or removing all samples, from a single cell in your environmental
data, even after thinning to one sample per cell in your newly-created
raster. This led to some strange behaviour in one of my downstream
analyses, where the results for a small data set changed each time I reran
the analysis.&lt;/p&gt;
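&lt;p&gt;The effect is easy to reproduce in one dimension. In this minimal sketch,
two grids share the same cell size but start at different origins; the
coordinates, origins, and cell size are all made-up values, and &lt;code&gt;cellIndex&lt;/code&gt;
is a hypothetical helper, not a function from &lt;code&gt;raster&lt;/code&gt; or &lt;code&gt;dismo&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## index of the cell containing x, for a 1-D grid that starts at `origin`
cellIndex &amp;lt;- function(x, origin, cellSize) {
  floor((x - origin) / cellSize)
}

cellSize &amp;lt;- 0.5            # same resolution for both grids
lon &amp;lt;- c(-67.9, -67.6)     # two hypothetical records

## the &amp;#39;climate&amp;#39; grid starts at -180; the sampling grid starts
## at an arbitrary value set by the extent of the records
climCell &amp;lt;- cellIndex(lon, origin = -180, cellSize = cellSize)
sampCell &amp;lt;- cellIndex(lon, origin = -68.3, cellSize = cellSize)

climCell[1] == climCell[2]   # do the records share a climate cell?
sampCell[1] == sampCell[2]   # do they share a sampling cell?&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these numbers the two records share a climate cell but fall in
different sampling cells, so thinning on the sampling grid would keep both
of them.&lt;/p&gt;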
&lt;p&gt;I’ll demonstrate using the example from RSpatial:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dismo)
library(maptools)
library(sp)
library(raster)

wclim &amp;lt;- getData(&amp;quot;worldclim&amp;quot;, var = &amp;quot;bio&amp;quot;, res = 10,
                path = &amp;quot;../data&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in getData(&amp;quot;worldclim&amp;quot;, var = &amp;quot;bio&amp;quot;, res = 10, path = &amp;quot;../data&amp;quot;): getData will be removed in a future version of raster
## . Please use the geodata package instead&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wc1 &amp;lt;- wclim[[1]]

## crop the climate data to speed up grid creation
## extent arguments are xmin, xmax, ymin, ymax
wc1crop &amp;lt;- crop(wc1, extent(-70, -60, -20, -10))

data(acaule)
data(wrld_simpl)

acgeo &amp;lt;- subset(acaule, !is.na(lon) &amp;amp; !is.na(lat))
dups2 &amp;lt;- duplicated(acgeo[, c(&amp;#39;lon&amp;#39;, &amp;#39;lat&amp;#39;)])
acg &amp;lt;- acgeo[!dups2, ]
i &amp;lt;- acg$lon &amp;gt; 0 &amp;amp; acg$lat &amp;gt; 0
acg$lon[i] &amp;lt;- -1 * acg$lon[i]
acg$lat[i] &amp;lt;- -1 * acg$lat[i]
acg &amp;lt;- acg[acg$lon &amp;lt; -50 &amp;amp; acg$lat &amp;gt; -50, ]
coordinates(acg) &amp;lt;- ~lon+lat
crs(acg) &amp;lt;- crs(wrld_simpl)

plot(acg, pch = 20)
plot(wclim[[1]], legend = FALSE, add = TRUE)
points(acg, pch = 20)
plot(wrld_simpl, add = TRUE, border = &amp;#39;blue&amp;#39;, lwd = 2)
box()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/10/26/r-gridsample/index_files/figure-html/setup-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This is the same example from RSpatial. See the link above for more
details.&lt;/p&gt;
&lt;p&gt;Now we want to thin our records, such that we retain only one observation
for each cell in the WorldClim climate layer. To track what’s going on
here, I’ll zoom in on one of the crowded areas:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(acg, pch = 20, xlim = c(-68.5, -67),
     ylim = c(-17.5, -16.5))
plot(wc1crop, legend = FALSE, add = TRUE)
points(acg, pch = 20)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/10/26/r-gridsample/index_files/figure-html/zoom-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Now let’s overlay the grid generated by the code from RSpatial:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;r &amp;lt;- raster(acg)
# set the resolution of the cells to the same as wclim
res(r) &amp;lt;- res(wclim)
# expand (extend) the extent of the RasterLayer a little
r &amp;lt;- extend(r, extent(r)+1)
p &amp;lt;- rasterToPolygons(r)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(acg, pch = 20, xlim = c(-68.5, -67),
     ylim = c(-17.5, -16.5))
plot(wc1crop, legend = FALSE, add = TRUE)
points(acg, pch = 20)
plot(p, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/10/26/r-gridsample/index_files/figure-html/gridPlot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Notice how the climate cells (the coloured squares) are offset from the
sampling grid (the black gridlines).&lt;/p&gt;
&lt;p&gt;Now we use this grid for &lt;code&gt;gridSample&lt;/code&gt;. I’ll plot a blue ring around the retained
occurrences; the records we drop are left as points:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1)
acsel &amp;lt;- gridSample(acg, r, n=1)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(acg, pch = 20, xlim = c(-68.5, -67),
     ylim = c(-17.5, -16.5))
plot(wc1crop, legend = FALSE, add = TRUE)
points(acg, pch = 20)
plot(p, add = TRUE)
points(acsel, col = &amp;#39;blue&amp;#39;, cex = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/10/26/r-gridsample/index_files/figure-html/gridSample_plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Take a close look at that last plot. Notice that there are climate cells
with no retained observations:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/10/26/r-gridsample/index_files/figure-html/gridMissing-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As well as climate cells with multiple observations:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/10/26/r-gridsample/index_files/figure-html/gridExtra-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This is fine if you are using the occurrence records in a purely spatial
analysis (i.e., without incorporating climate data for each observation).
But if you are intending to retain at most one observation for every cell
in your climate map, this is not what you were hoping for.&lt;/p&gt;
&lt;p&gt;A better way to achieve our desired result is to use the climate layer
directly in &lt;code&gt;gridSample&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1)
acselClimate &amp;lt;- gridSample(acg, wc1, n=1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s visualize the grid we sampled on:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(acg, pch = 20, xlim = c(-68.5, -67),
     ylim = c(-17.5, -16.5))
plot(wc1crop, legend = FALSE, add = TRUE)
points(acg, pch = 20)

## using a cropped layer because this is a slow operation:
climGrid &amp;lt;- rasterToPolygons(wc1crop)
plot(climGrid, add = TRUE)
points(acselClimate, col = &amp;#39;blue&amp;#39;, cex = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/10/26/r-gridsample/index_files/figure-html/climateGrid-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Everything matches up as we expect now: the sampling grids are perfectly
aligned with the climate cells.&lt;/p&gt;
&lt;p&gt;This approach will only work if your climate raster covers the full extent
of your occurrence records. Which it really should - if it doesn’t, the
records that aren’t covered will end up getting dropped from your analysis
since there’s no climate data at those locations.&lt;/p&gt;
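&lt;p&gt;A quick way to check coverage before thinning is to extract the climate
values at each record and count the missing ones. The sketch below uses a
tiny made-up raster and three made-up records rather than the data above,
so the object names (&lt;code&gt;clim&lt;/code&gt;, &lt;code&gt;pts&lt;/code&gt;, &lt;code&gt;uncovered&lt;/code&gt;) are hypothetical:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(raster)

## a tiny made-up &amp;#39;climate&amp;#39; layer covering lon 0 to 2, lat 0 to 2
clim &amp;lt;- raster(nrows = 2, ncols = 2, xmn = 0, xmx = 2, ymn = 0, ymx = 2)
values(clim) &amp;lt;- 1:4

## three hypothetical records; the last one lies outside the layer
pts &amp;lt;- cbind(lon = c(0.5, 1.5, 3.0), lat = c(0.5, 1.5, 0.5))

## records with no climate value are not covered
uncovered &amp;lt;- is.na(extract(clim, pts))
sum(uncovered)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that a record can also come back as &lt;code&gt;NA&lt;/code&gt; when it sits inside the
extent but on a cell with no data (e.g., ocean cells in a terrestrial
climate layer), so this check catches both situations.&lt;/p&gt;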
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references&#34;&gt;
&lt;div id=&#34;ref-HijmansEtAl_2017&#34;&gt;
&lt;p&gt;Hijmans, Robert J., Steven Phillips, John Leathwick, and Jane Elith. 2017. “Dismo R Package Version 1.1-4.” &lt;a href=&#34;https://CRAN.R-project.org/package=dismo&#34;&gt;https://CRAN.R-project.org/package=dismo&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Emacs for Bioinformatics #4: RMarkdown</title>
      <link>https://plantarum.ca/2021/10/03/emacs-tutorial-rmarkdown/</link>
      <pubDate>Sun, 03 Oct 2021 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2021/10/03/emacs-tutorial-rmarkdown/</guid>
      <description>



&lt;p&gt;This is part four in my series of Emacs tutorials aimed at bioinformatics
(and other scientific analysis) workflows. See the rest on my
&lt;a href=&#34;https://plantarum.ca/tutorials/&#34;&gt;tutorials&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;Emacs provides full support for editing
&lt;a href=&#34;https://rmarkdown.rstudio.com/&#34;&gt;RMarkdown&lt;/a&gt; documents. RMarkdown has
extensive documentation, both at the previous RStudio link and in several
free online books by Xie et al. (notably &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/&#34;&gt;R Markdown: The Definitive
Guide&lt;/a&gt;, but also several others
listed on &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/&#34;&gt;Yihui Xie’s Bookdown
page&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Most of these references assume you are using the
&lt;a href=&#34;https://rstudio.com/&#34;&gt;RStudio&lt;/a&gt; development environment. The purpose of
this tutorial is to get you started editing RMarkdown documents in Emacs.&lt;/p&gt;
&lt;div id=&#34;installation&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Installation&lt;/h1&gt;
&lt;div id=&#34;prerequisites&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;You need to have &lt;a href=&#34;https://www.r-project.org/&#34;&gt;R&lt;/a&gt; installed, of course. You
will also need &lt;a href=&#34;https://pandoc.org/&#34;&gt;Pandoc&lt;/a&gt; in order to take full
advantage of all the output options available. If you want to create PDF
documents, you’ll need &lt;a href=&#34;https://www.latex-project.org/&#34;&gt;LaTeX&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;All three of these programs are provided in the package repositories for
most major Linux distributions. See the links above for instructions for
installing on Windows or Apple computers.&lt;/p&gt;
&lt;p&gt;You will also need to install the &lt;code&gt;rmarkdown&lt;/code&gt; R package. You can do this
from within R via:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;rmarkdown&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will also install the other R requirements, notably the
&lt;a href=&#34;https://yihui.org/knitr/&#34;&gt;knitr&lt;/a&gt; package.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://bookdown.org/yihui/rmarkdown/&#34;&gt;bookdown&lt;/a&gt; package provides some
more advanced citation features. I won’t discuss them in this short
tutorial, but in order to use them you need to install that package too:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;install.packages(&amp;quot;bookdown&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
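&lt;p&gt;If you’re not sure whether everything is in place, something like the
following can be run from an R session to check. This is only a sketch: it
assumes that &lt;code&gt;pdflatex&lt;/code&gt; is the LaTeX engine you want to use, and the variable
names are arbitrary:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## is the rmarkdown package installed?
hasRmarkdown &amp;lt;- requireNamespace(&amp;quot;rmarkdown&amp;quot;, quietly = TRUE)

## can rmarkdown find a Pandoc executable?
hasPandoc &amp;lt;- hasRmarkdown &amp;amp;&amp;amp; rmarkdown::pandoc_available()

## is a LaTeX engine on the PATH? (assumes pdflatex)
hasLatex &amp;lt;- unname(nzchar(Sys.which(&amp;quot;pdflatex&amp;quot;)))

c(rmarkdown = hasRmarkdown, pandoc = hasPandoc, latex = hasLatex)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any &lt;code&gt;FALSE&lt;/code&gt; in the result points at the piece that still needs installing.&lt;/p&gt;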
&lt;/div&gt;
&lt;div id=&#34;emacs-packages&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Emacs Packages&lt;/h2&gt;
&lt;p&gt;We need a few additional Emacs packages to comfortably edit RMarkdown
documents. These are:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;a href=&#34;https://github.com/jrblevin/markdown-mode&#34;&gt;Markdown Mode&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;The major mode for editing files in markdown format. &lt;strong&gt;This tutorial uses
features added after 6 January 2021.&lt;/strong&gt;&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&#34;https://ess.r-project.org/&#34; title=&#34;ESS&#34;&gt;ESS&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;The collection of modes for editing R code and interacting with the R
program.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;a href=&#34;https://polymode.github.io/&#34;&gt;poly-R (Polymode)&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;&lt;code&gt;polymode&lt;/code&gt; is a ‘glue’ mode. The &lt;code&gt;poly-R&lt;/code&gt; variant extends markdown mode
to allow us to edit embedded code snippets in R (and other languages too).&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;(&lt;code&gt;poly-R&lt;/code&gt; also supports files in &lt;code&gt;.Rnw&lt;/code&gt; format, which mix LaTeX and R
code. We won’t cover that here.)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;polymode&lt;/code&gt; started out as a collection of modes to support files with
different combinations of languages. As it has grown, many of those
different modes have been split out into separate packages. When we
install &lt;code&gt;poly-R&lt;/code&gt;, it will automatically install the core of the
&lt;code&gt;polymode&lt;/code&gt; system for us.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;This tutorial uses features added after 29 September 2021.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As in previous tutorials (see &lt;a href=&#34;https://plantarum.ca/2020/12/30/emacs-tutorial-03/&#34;&gt;my
blog&lt;/a&gt; or the &lt;a href=&#34;https://www.youtube.com/watch?v=So1LYzSk9o0&#34;&gt;demo on
Youtube&lt;/a&gt;), we can install all
three of these packages from &lt;a href=&#34;https://melpa.org/#/&#34;&gt;MELPA&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once we have the required packages installed, no further configuration
should be necessary. When we next open a file with a &lt;code&gt;.Rmd&lt;/code&gt; extension,
Emacs will know to use the &lt;code&gt;poly-markdown+R-mode&lt;/code&gt; for these files. If
everything is working properly, you’ll see &lt;code&gt;Markdown PM-Rmd&lt;/code&gt; in the
modeline at the bottom of the window for these files, and &lt;code&gt;Markdown&lt;/code&gt;,
&lt;code&gt;RMarkdown&lt;/code&gt;, and &lt;code&gt;Polymode&lt;/code&gt; menus at the top of the Emacs frame.&lt;/p&gt;
&lt;div id=&#34;configuration-note&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Configuration Note&lt;/h3&gt;
&lt;p&gt;Depending on how you have installed &lt;code&gt;poly-R&lt;/code&gt;, it may be loaded
automatically, or you might need to load it yourself in your config. If it
isn’t loaded automatically, you might see errors like &lt;code&gt;(void-function poly-gfm+r-mode)&lt;/code&gt; when you try to open an RMarkdown file.&lt;/p&gt;
&lt;p&gt;You can fix this by adding the following line to your Emacs config.
The location isn’t critical, but it’s probably most convenient to put it at
the beginning of any configuration you use for ESS/R/Markdown.&lt;/p&gt;
&lt;pre class=&#34;lisp&#34;&gt;&lt;code&gt;(require &amp;#39;poly-R)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;github-flavoured-markdown-and-code-blocks&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Github Flavoured Markdown and Code Blocks&lt;/h3&gt;
&lt;p&gt;Markdown mode supports several different options for code blocks. To take
full advantage of the RMarkdown support provided by the &lt;code&gt;rmarkdown&lt;/code&gt; R
package, we need to use fenced code blocks, along with language strings
wrapped in braces&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt;. I’ll explain this in more detail below.&lt;/p&gt;
&lt;p&gt;This variant of markdown is referred to as “Github Flavoured Markdown”, and
the &lt;code&gt;markdown-mode&lt;/code&gt; package provides &lt;code&gt;gfm-mode&lt;/code&gt; with a few extra features
particular to it. Turning on &lt;code&gt;gfm-mode&lt;/code&gt; for Rmd files requires the
following line in your Emacs configuration&lt;a href=&#34;#fn2&#34; class=&#34;footnote-ref&#34; id=&#34;fnref2&#34;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;lisp&#34;&gt;&lt;code&gt;;; associate the new polymode to Rmd files:
(add-to-list &amp;#39;auto-mode-alist
             &amp;#39;(&amp;quot;\\.[rR]md\\&amp;#39;&amp;quot; . poly-gfm+r-mode))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will also need the following line, if you want &lt;code&gt;gfm-mode&lt;/code&gt; to
automatically insert braces for code blocks (described below):&lt;/p&gt;
&lt;pre class=&#34;lisp&#34;&gt;&lt;code&gt;;; uses braces around code block language strings:
(setq markdown-code-block-braces t)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will switch you from using &lt;code&gt;poly-markdown+R-mode&lt;/code&gt; to
&lt;code&gt;poly-gfm+r-mode&lt;/code&gt;, which shows up in your modeline as “PM-Rmd(gfm)”. The two
modes are very similar, the main difference being the support for fenced code
blocks.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;rmarkdown&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;RMarkdown&lt;/h1&gt;
&lt;div id=&#34;editing-markdown&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Editing Markdown&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://www.markdownguide.org/basic-syntax/&#34;&gt;Markdown syntax&lt;/a&gt; is designed
to be easily entered by hand, which means if you’re already familiar with
the format you can just get going. Markdown mode will provide you with
syntax highlighting automatically:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;markdown-mode.jpg&#34; alt=&#34;A Markdown Mode buffer showing syntax highlighting&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;A Markdown Mode buffer showing syntax highlighting&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Of course, there are lots of shortcuts available. You can explore the most
frequently used in the &lt;code&gt;Markdown&lt;/code&gt; menu. I’ll summarize some of the main
ones here to get you started.&lt;/p&gt;
&lt;div id=&#34;headings&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Headings&lt;/h3&gt;
&lt;p&gt;Markdown has two different kinds of headings. The “Atx” style uses &lt;code&gt;#&lt;/code&gt;
symbols at the beginning of the heading, and, optionally, also at the end:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Heading Level 1

## Heading Level 2 ##
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can insert these headings with the following commands:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;C-c C-s h&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;insert a heading at the same level as the previous heading.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;If the &lt;code&gt;region&lt;/code&gt; is active, the contents of the region will be used as the
header text. If &lt;code&gt;point&lt;/code&gt; is on a line with text, the line will be
converted into a header. Otherwise, an empty header will be created.&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;C-c C-s {1-9}&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;insert a heading at the specified level, e.g., &lt;code&gt;C-c C-s 3&lt;/code&gt; inserts a
third-level heading. &lt;code&gt;region&lt;/code&gt; and &lt;code&gt;point&lt;/code&gt; can be used to set the heading
text as for the previous command.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;You can manipulate headings with the following commands:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;C-c &amp;lt;up&amp;gt;&lt;/code&gt; and &lt;code&gt;C-c &amp;lt;down&amp;gt;&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;move a heading and all of its content up or down in the document.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;i.e., turn this:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;markdown-move1.jpg&#34; alt=&#34;Markdown headings in original order&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Markdown headings in original order&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;into this:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;markdown-move2.jpg&#34; alt=&#34;Markdown headings with subheading 2 ahead of subheading 1&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Markdown headings with subheading 2 ahead of subheading 1&lt;/p&gt;
&lt;/div&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;C-c &amp;lt;left&amp;gt;&lt;/code&gt; and &lt;code&gt;C-c &amp;lt;right&amp;gt;&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;promote or demote a heading.&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;i.e., turn this:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;markdown-move1.jpg&#34; alt=&#34;Markdown headings in hierarchy&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Markdown headings in hierarchy&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;into this (and vice versa):&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;demotion.jpg&#34; alt=&#34;Markdown with subheading 2 demoted&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Markdown with subheading 2 demoted&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;If you prefer asymmetric headings (i.e., with &lt;code&gt;#&lt;/code&gt; symbols only at the
beginning of the line), you can configure this by setting the variable
&lt;code&gt;markdown-asymmetric-header&lt;/code&gt; to &lt;code&gt;t&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;lisp&#34;&gt;&lt;code&gt;;; set in your ~/.emacs or ~/.emacs.d/init.el
(setq markdown-asymmetric-header t)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Alternatively, you can do this via &lt;code&gt;M-x customize-variable markdown-asymmetric-header&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Markdown mode also supports the &lt;code&gt;setext&lt;/code&gt; style headings:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
Heading Level 1
===============

Heading Level 2
---------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Only two levels are supported, which you can insert automatically with the
commands &lt;code&gt;C-c C-s !&lt;/code&gt; (level 1) and &lt;code&gt;C-c C-s @&lt;/code&gt; (level 2).&lt;/p&gt;
&lt;div id=&#34;heading-navigation&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Heading Navigation&lt;/h4&gt;
&lt;p&gt;You can move from heading to heading with the following commands:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;C-c C-n&lt;/code&gt; and &lt;code&gt;C-c C-p&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;move to next and previous headings&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-f&lt;/code&gt; and &lt;code&gt;C-c C-b&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;move forward and backward to headings at the same level&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-u&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;move up to parent heading&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;
&lt;div id=&#34;heading-visibility&#34; class=&#34;section level4&#34;&gt;
&lt;h4&gt;Heading Visibility&lt;/h4&gt;
&lt;p&gt;You can hide and show different sections in documents by pressing the
&lt;code&gt;&amp;lt;TAB&amp;gt;&lt;/code&gt; key with point on a heading. For example, with point on the &lt;code&gt;# Installation&lt;/code&gt; heading, when I press &lt;code&gt;&amp;lt;TAB&amp;gt;&lt;/code&gt; I move from this:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;unhidden.jpg&#34; alt=&#34;Markdown buffer with all sections visible&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Markdown buffer with all sections visible&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;to this:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;hidden.jpg&#34; alt=&#34;Markdown buffer with the Installation section hidden&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;Markdown buffer with the Installation section hidden&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This doesn’t change any of the text in your file; it only hides the parts
you don’t want to see. Repeatedly pressing the &lt;code&gt;&amp;lt;TAB&amp;gt;&lt;/code&gt; key will cycle
through the various levels of hiding and showing.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Shift-&amp;lt;TAB&amp;gt;&lt;/code&gt; toggles visibility for all headings at once. You can
use this to collapse your
entire document to a table of contents:&lt;/p&gt;
&lt;div class=&#34;figure&#34;&gt;
&lt;img src=&#34;toc.jpg&#34; alt=&#34;A Markdown buffer with only headings visible&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;A Markdown buffer with only headings visible&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;links-and-images&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Links and Images&lt;/h3&gt;
&lt;p&gt;Inserting links is done with &lt;code&gt;C-c C-l&lt;/code&gt;. Emacs will first prompt you for the
link URL, followed by the link text, and finally the tooltip text. Only the
URL is required. To open a link from Emacs, use &lt;code&gt;C-c C-o&lt;/code&gt;, which will take
you to the webpage in your browser.&lt;/p&gt;
&lt;p&gt;Images are handled similarly, and are inserted with &lt;code&gt;C-c &amp;lt;TAB&amp;gt;&lt;/code&gt; or &lt;code&gt;C-c C-i&lt;/code&gt;. The URL can be a web resource (e.g.,
&lt;code&gt;https://my-images.ca/image1.jpg&lt;/code&gt;), or a local file (e.g.,
&lt;code&gt;./images/image1.jpg&lt;/code&gt;). The image will appear as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;![Image Caption](image URL)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can toggle displaying the actual image in the buffer with &lt;code&gt;C-c C-x C-i&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;tables&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Tables&lt;/h3&gt;
&lt;p&gt;To insert a new table, use the command &lt;code&gt;C-c C-s t&lt;/code&gt;. You will be prompted
for the number of rows and columns, and the alignment you want. When you’re
done, you’ll have a proper markdown table ready to edit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;|   |   |   |   |
|---|---|---|---|
|   |   |   |   |
|   |   |   |   |
|   |   |   |   |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With point in any of the cells, you can &lt;code&gt;&amp;lt;TAB&amp;gt;&lt;/code&gt; into the next cell, or
&lt;code&gt;Shift-&amp;lt;TAB&amp;gt;&lt;/code&gt; into the previous cell. Each time you hit tab the cells will
resize automatically to accommodate your text.&lt;/p&gt;
&lt;p&gt;Additional commands are available for moving, adding and deleting rows and
columns; see the &lt;code&gt;Markdown -&amp;gt; Tables&lt;/code&gt; menu for the options.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;other-markup&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Other Markup&lt;/h3&gt;
&lt;p&gt;Markdown mode also provides shortcuts for other markup elements. See the
&lt;code&gt;Markdown&lt;/code&gt; menu for some of the options. I find most of the basics (bold,
emphasis, unordered lists) are just as fast to type by hand as they are to
insert using shortcuts.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;working-with-r-code&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Working with R Code&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;rmarkdown&lt;/code&gt; uses fenced code blocks with braces around the language string, e.g.:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    ```{R code-block-example}
    ## R code goes here!
    1 + 1
    ```&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you’ve set up &lt;code&gt;gfm-mode&lt;/code&gt; as described above, you can create one of these
code blocks with the command &lt;code&gt;markdown-insert-gfm-code-block&lt;/code&gt;, bound to
&lt;code&gt;C-c C-s C&lt;/code&gt; by default. Alternatively, simply entering three “`” characters
at the beginning of a line will call the function for you&lt;a href=&#34;#fn3&#34; class=&#34;footnote-ref&#34; id=&#34;fnref3&#34;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt;. Either way,
you’ll be prompted for the language of the code block (which will be R most
of the time, but you can use others!). You can also add a label for the
code block at the prompt, and any additional options you want to use for
the chunk. You can also add options later if you change your mind.&lt;a href=&#34;#fn4&#34; class=&#34;footnote-ref&#34; id=&#34;fnref4&#34;&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt;&lt;/p&gt;
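&lt;p&gt;For instance, a chunk with a label and a couple of options might look like the following sketch. The label &lt;code&gt;summary-plot&lt;/code&gt; is made up for illustration; &lt;code&gt;echo&lt;/code&gt; and &lt;code&gt;fig.width&lt;/code&gt; are standard &lt;code&gt;knitr&lt;/code&gt; chunk options:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    ```{r summary-plot, echo = FALSE, fig.width = 6}
    ## echo = FALSE hides the code in the output;
    ## fig.width sets the figure width in inches
    plot(1:10)
    ```&lt;/code&gt;&lt;/pre&gt;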
&lt;p&gt;Once you have created a code block, &lt;code&gt;polymode&lt;/code&gt; will work its magic. You can
continue to edit the markdown portions of your document, with all the
features of &lt;code&gt;gfm-mode&lt;/code&gt;. But when point is in an R code block, you’ll be
editing it in &lt;code&gt;ESS[R]&lt;/code&gt; mode. That allows you to use all the features of
that package (see
&lt;a href=&#34;https://plantarum.ca/tutorials/emacs-tutorial-03/&#34;&gt;plantarum.ca&lt;/a&gt; for a
quick tutorial/refresher).&lt;/p&gt;
&lt;p&gt;Polymode provides some additional conveniences:&lt;/p&gt;
&lt;div id=&#34;navigation&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Navigation&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;polymode-next-chunk&lt;/code&gt;/&lt;code&gt;polymode-previous-chunk&lt;/code&gt;, bound to &lt;code&gt;M-n C-n&lt;/code&gt; and &lt;code&gt;M-n C-p&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;move to the next/previous chunk. i.e., move from an RMarkdown chunk to
the next R code chunk.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;polymode-next-chunk-same-type&lt;/code&gt;/&lt;code&gt;polymode-previous-chunk-same-type&lt;/code&gt;, bound to &lt;code&gt;M-n M-C-n&lt;/code&gt; and &lt;code&gt;M-n M-C-p&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;move to the next/previous chunk of the same type. i.e., move from one R
code chunk to the next R code chunk.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;polymode-kill-chunk&lt;/code&gt;, bound to &lt;code&gt;M-n M-k&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;kill the current chunk&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;polymode-toggle-chunk-narrowing&lt;/code&gt;, bound to &lt;code&gt;M-n C-t&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;toggle narrowing the buffer to display only the current chunk, or to
display the entire document&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;
&lt;div id=&#34;evaluation&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Evaluation&lt;/h3&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;polymode-eval-region-or-chunk&lt;/code&gt;, bound to &lt;code&gt;M-n v&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;evaluate all code chunks in the active region, or the chunk at point if there
is no active region&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;polymode-eval-buffer&lt;/code&gt;, bound to &lt;code&gt;M-n b&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;evaluate all code chunks in the buffer&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;polymode-eval-buffer-from-beg-to-point&lt;/code&gt;/&lt;code&gt;polymode-eval-buffer-from-point-to-end&lt;/code&gt;, bound to &lt;code&gt;M-n u&lt;/code&gt; or &lt;code&gt;M-n ↑&lt;/code&gt;, and &lt;code&gt;M-n d&lt;/code&gt; or &lt;code&gt;M-n ↓&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;evaluate all code chunks from the beginning of the buffer to point (&lt;code&gt;u&lt;/code&gt; and &lt;code&gt;↑&lt;/code&gt;), or from point to the end of the buffer (&lt;code&gt;d&lt;/code&gt; and &lt;code&gt;↓&lt;/code&gt;)&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;exporting-rmarkdown&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Exporting RMarkdown&lt;/h2&gt;
&lt;p&gt;The most important ‘convenience’ of &lt;code&gt;polymode&lt;/code&gt; is that it connects Emacs to
the programs used to export RMarkdown files to presentation formats (e.g.,
pdf, html, slides). The main function you need for this is
&lt;code&gt;polymode-export&lt;/code&gt;, bound to &lt;code&gt;M-n e&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The first time you run this, you’ll be asked which exporter you would like
to use. There are two choices, &lt;code&gt;markdown&lt;/code&gt; and &lt;code&gt;markdown-ess&lt;/code&gt;. &lt;code&gt;markdown&lt;/code&gt;
means &lt;code&gt;polymode&lt;/code&gt; will start a new, self-contained R process and compile
your file there. When compilation is finished, the process will be closed.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;markdown-ess&lt;/code&gt; will use an existing R process, or start a new one if there
isn’t an active process available. When compilation is complete, the R
process remains active. This allows you to check the values of various
objects interactively. This can be useful as you develop a new script.&lt;/p&gt;
&lt;p&gt;RMarkdown files can be compiled to produce a variety of output formats. You
will be prompted to select which one you want the first time you run the
exporter. &lt;code&gt;polymode&lt;/code&gt; remembers this setting, so you don’t get prompted a
second time. If you want to switch, say from pdf output to html, you can
reset the target via &lt;code&gt;C-u M-n e&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://github.com/jrblevin/markdown-mode/pull/581&#34;&gt;Feature added 6 January
2021&lt;/a&gt;.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn2&#34;&gt;&lt;p&gt;&lt;a href=&#34;https://github.com/polymode/poly-R/pull/27&#34;&gt;Feature added 29 September
2021&lt;/a&gt;&lt;a href=&#34;#fnref2&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn3&#34;&gt;&lt;p&gt;By default; you can turn this feature off by setting the variable
&lt;code&gt;markdown-gfm-use-electric-backquote&lt;/code&gt; to nil.&lt;a href=&#34;#fnref3&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li id=&#34;fn4&#34;&gt;&lt;p&gt;I’m working on tab-completion for R chunk options, but haven’t
decided how best to set it up yet. Watch this space!&lt;a href=&#34;#fnref4&#34; class=&#34;footnote-back&#34;&gt;↩&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Evaluating Invasion Stage with SDMs</title>
      <link>https://plantarum.ca/2021/08/11/invasion-stage/</link>
      <pubDate>Wed, 11 Aug 2021 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2021/08/11/invasion-stage/</guid>
      <description>
&lt;script src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;My attempt to recreate the invasion stage analysis developed by
&lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;Gallien et al.&lt;/a&gt; (&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;2012&lt;/a&gt;)&lt;/span&gt;, inspired by seeing it applied by &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-EckertEtAl_2020&#34; role=&#34;doc-biblioref&#34;&gt;Eckert et al.&lt;/a&gt; (&lt;a href=&#34;#ref-EckertEtAl_2020&#34; role=&#34;doc-biblioref&#34;&gt;2020&lt;/a&gt;)&lt;/span&gt;. We’ll
continue with the &lt;em&gt;Lythrum salicaria&lt;/em&gt; data from my tutorial on &lt;a href=&#34;https://plantarum.ca/2021/07/29/ecospat/&#34;&gt;niche
quantification analysis&lt;/a&gt;. Specifically, I’ll model
how the niche space this species occupies in its invaded range in North
America relates to its global niche.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ecospat)
library(raster)
library(rgbif)
library(maptools)
library(magrittr)
library(dismo)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;Gallien et al.&lt;/a&gt; (&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;2012&lt;/a&gt;)&lt;/span&gt; used an ensemble of SDMs, which is (should be) more
robust than applying a single approach. Nevertheless, for this short
tutorial, I’ll stick to Maxent. I’m also cutting a lot of corners with
respect to variable selection, model validation and other important steps.
See my &lt;a href=&#34;https://plantarum.ca/2020/06/15/maxent/&#34;&gt;Maxent notebook&lt;/a&gt; for pointers.&lt;/p&gt;
&lt;div id=&#34;niche-models&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Niche Models&lt;/h1&gt;
&lt;p&gt;We start by constructing SDMs for the global and North American
distribution of &lt;em&gt;L. salicaria&lt;/em&gt;.&lt;/p&gt;
&lt;div id=&#34;data&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Data&lt;/h2&gt;
&lt;p&gt;We need occurrence data and environmental data, and we’ll need to create
background (pseudoabsence) samples.&lt;/p&gt;
&lt;p&gt;The occurrence data comes from GBIF, with details in my &lt;a href=&#34;https://plantarum.ca/2021/07/29/ecospat/&#34;&gt;previous
post&lt;/a&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;load(&amp;quot;../data/2021-07-29-ls-gbif-recs.Rda&amp;quot;)
lsOccs &amp;lt;- lsGBIF$data

coordinates(lsOccs) &amp;lt;- c(&amp;quot;decimalLongitude&amp;quot;,
                        &amp;quot;decimalLatitude&amp;quot;) 
  ## Set the projection
crs(lsOccs) &amp;lt;- &amp;#39;+proj=longlat +datum=WGS84&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(wrld_simpl) # load the maptools worldmap

par(mar = c(0,0, 0, 0))
plot(wrld_simpl, border = &amp;quot;gray80&amp;quot;)
points(lsOccs, pch = 16, col = 2, cex = 0.3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/observation-data-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We’ll use the same climate data as well, sourced from WorldClim
&lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-FickHijmans_2017&#34; role=&#34;doc-biblioref&#34;&gt;Fick and Hijmans 2017&lt;/a&gt;)&lt;/span&gt; and imported using functions provided in the &lt;code&gt;raster&lt;/code&gt;
&lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-Hijmans_2021&#34; role=&#34;doc-biblioref&#34;&gt;Hijmans 2021&lt;/a&gt;)&lt;/span&gt; package. Note that I use the &lt;code&gt;path&lt;/code&gt; argument to direct the
download to a particular location. This is the same location I used in the
previous post, and the data is still there, so it doesn’t get downloaded
again.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wclim &amp;lt;- getData(&amp;quot;worldclim&amp;quot;, var = &amp;quot;bio&amp;quot;, res = 10,
                path = &amp;quot;../data&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We need to define our study extent for selecting background points. I’ll
use a 200 km buffer around our observations. We’re working at the global
scale, and &lt;em&gt;Lythrum salicaria&lt;/em&gt; is a strong disperser, so a relatively large
scale is appropriate here. You’ll need to consider the aims of your own
study when setting your extent.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;studyExtent &amp;lt;- buffer(lsOccs, 200000, dissolve = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required namespace: rgeos&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
## NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(wrld_simpl, border = &amp;quot;gray80&amp;quot;)
plot(studyExtent, col = &amp;#39;lightgreen&amp;#39;, add = TRUE)
points(lsOccs, pch = 16, col = 2, cex = 0.3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/extent-buffer-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The 200 km buffer creates some isolated pockets in North America. The
extent should represent the area the species can access. The buffer I made
includes the west coast inwards to Alberta, the east coast inwards to
Saskatchewan, with an isolated patch in the center of Canada which looks
like it’s at the Alberta/Saskatchewan border, with similar ‘islands’ in the
western US:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(wrld_simpl, border = &amp;quot;gray80&amp;quot;, xlim = c(-135, -90),
     ylim = c(45, 60))
plot(studyExtent, col = &amp;#39;lightgreen&amp;#39;, add = TRUE)
points(lsOccs, pch = 16, col = 2, cex = 1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/extent-buffer2-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Those islands are likely the leading edge of the same invasion, not
separate invasions! I’m going to increase our buffer to 300 km to capture
the intervening area on the map:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;studyExtent &amp;lt;- buffer(lsOccs, 300000, dissolve = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files
## NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(wrld_simpl, border = &amp;quot;gray80&amp;quot;)
plot(studyExtent, col = &amp;#39;lightgreen&amp;#39;, add = TRUE)
points(lsOccs, pch = 16, col = 2, cex = 0.3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/extent-buffer-plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This is better. I prefer to use ecoregions to set study extent, but for the
purposes of this demo I’ll continue with this.&lt;/p&gt;
&lt;p&gt;One further issue: our study extent includes the ocean. Let’s trim it back
to the land:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;land &amp;lt;- aggregate(wrld_simpl) ## dissolve country borders

  ## clip buffer to land:
studyExtent &amp;lt;- intersect(studyExtent, land) 

plot(wrld_simpl, border = &amp;quot;gray80&amp;quot;)
plot(studyExtent, col = &amp;#39;lightgreen&amp;#39;, add = TRUE)
points(lsOccs, pch = 16, col = 2, cex = 0.3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/crop-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;(This generates some warnings, likely related to missing values in my data
or issues with the shapefile manipulations. It seems safe to proceed.)&lt;/p&gt;
&lt;p&gt;The aggregation is a bit rough, but that should work for my purposes today.
Now we can select our background points. I’m using 10000 points, and
excluding any cells with a &lt;em&gt;Lythrum salicaria&lt;/em&gt; occurrence.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;  ## Convert landmass polygon to a raster:
landMask &amp;lt;- rasterize(land, wclim)

  ## sample points from the raster:  
background &amp;lt;- randomPoints(landMask, n = 10000, p = lsOccs)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;global-sdm&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Global SDM&lt;/h2&gt;
&lt;p&gt;Now we can fit our Maxent model. To reduce bias, I’ll thin the samples to
5 observations per grid cell (ca. 20 km square). Normally I work on a
finer resolution (1 km&lt;sup&gt;2&lt;/sup&gt;), and thin to 1 observation per cell. Again, the
details depend on your study area and goals.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsThin &amp;lt;- gridSample(lsOccs, wclim, n = 5) %&amp;gt;%
  as.data.frame

coordinates(lsThin) &amp;lt;-
  c(&amp;quot;decimalLongitude&amp;quot;, &amp;quot;decimalLatitude&amp;quot;)
glMax &amp;lt;- maxent(wclim, p = lsThin, a = background)
glPred &amp;lt;- predict(glMax, wclim)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Warnings here tell us that some of the occurrences are in locations where
there is no climate data. That’s normal, and not a problem as long as there
are only a few points lost this way. If you are working with small data
sets, you’ll want to investigate further to see if you can better match
your records with the climate rasters.)&lt;/p&gt;
&lt;p&gt;Here’s the model prediction:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(glPred)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/global-maxent-plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
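&lt;p&gt;To make the thinning step less of a black box, here is a minimal base R sketch of the idea: assign each point to a grid cell, then keep at most &lt;code&gt;n&lt;/code&gt; points per cell. The helper &lt;code&gt;thinToGrid&lt;/code&gt; and the toy coordinates are hypothetical and are not part of &lt;code&gt;dismo&lt;/code&gt;; this illustrates the concept rather than how &lt;code&gt;gridSample&lt;/code&gt; is implemented.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;  ## Hypothetical helper: flag at most n points per square grid cell
thinToGrid &amp;lt;- function(x, y, res, n = 1) {
    ## label each point with the grid cell it falls in
  cell &amp;lt;- paste(floor(x / res), floor(y / res))
    ## position of each point within its own cell (1, 2, 3, ...)
  rankInCell &amp;lt;- ave(seq_along(cell), cell, FUN = seq_along)
  rankInCell &amp;lt;= n
}

  ## toy coordinates: three points share one cell, one is alone
x &amp;lt;- c(0.1, 0.2, 0.3, 5.5)
y &amp;lt;- c(0.1, 0.2, 0.3, 5.5)
keep &amp;lt;- thinToGrid(x, y, res = 1, n = 2)
data.frame(x, y)[keep, ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This sketch keeps the first &lt;code&gt;n&lt;/code&gt; points it meets in each cell. A real thinning routine would normally choose among them at random.&lt;/p&gt;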
&lt;/div&gt;
&lt;div id=&#34;north-america-sdm&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;North America SDM&lt;/h2&gt;
&lt;p&gt;For the SDM in the invaded range in North America, we need to crop our
observations and background. Here I’m repeating the functions I used above
to create a raster mask for land, but applying it only to the area of
Canada, United States, and Mexico (our species isn’t in the Caribbean). I’m
using the pipe (&lt;code&gt;%&amp;gt;%&lt;/code&gt;) feature from &lt;code&gt;magrittr&lt;/code&gt;, which makes it easier to
follow the process.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;NA_polygon &amp;lt;- wrld_simpl %&amp;gt;%
  subset(NAME %in%
         c(&amp;quot;Canada&amp;quot;, &amp;quot;United States&amp;quot;, &amp;quot;Mexico&amp;quot;)) %&amp;gt;%
  aggregate()

NA_mask &amp;lt;- rasterize(NA_polygon, wclim)

NA_background &amp;lt;-
  randomPoints(NA_mask, n = 10000, p = lsOccs) %&amp;gt;%
  as.data.frame()

coordinates(NA_background) &amp;lt;- c(&amp;quot;x&amp;quot;, &amp;quot;y&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the original paper of &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;Gallien et al.&lt;/a&gt; (&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;2012&lt;/a&gt;)&lt;/span&gt;, the background points were
weighted using the values from the global model. I don’t think
&lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-EckertEtAl_2020&#34; role=&#34;doc-biblioref&#34;&gt;Eckert et al.&lt;/a&gt; (&lt;a href=&#34;#ref-EckertEtAl_2020&#34; role=&#34;doc-biblioref&#34;&gt;2020&lt;/a&gt;)&lt;/span&gt; applied this weighting, and it’s not clear to me how to
do so with Maxent. For now I’ll skip it.&lt;/p&gt;
&lt;p&gt;For the occurrence records, I’ll take my previously thinned data, and crop
it to North America:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;  ## NA_polygon was created above, along with the background points
lsNAThin &amp;lt;- intersect(lsThin, NA_polygon)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can construct the SDM for North America:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;naMax &amp;lt;- maxent(wclim, p = lsNAThin, a = NA_background)
naPred &amp;lt;- predict(naMax, wclim)
plot(naPred)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/NA-maxent-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Note that I didn’t crop the WorldClim layer for the North American SDM
model fitting. Maxent only uses the data for the presence and background
points, so it doesn’t matter if the climate layers cover the whole planet
for this step.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;invasion-stage-analysis&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Invasion Stage Analysis&lt;/h1&gt;
&lt;p&gt;Now that we have completed both a global and a local (North America) SDM
for &lt;em&gt;L. salicaria&lt;/em&gt;, we’re ready to compare the results.&lt;/p&gt;
&lt;div id=&#34;niche-space&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Niche Space&lt;/h2&gt;
&lt;p&gt;The values we need are the model predictions corresponding to each
observation in North America.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;globalVals &amp;lt;- extract(glPred, lsNAThin)
naVals &amp;lt;- extract(naPred, lsNAThin)

plot(naVals ~ globalVals, pch = 16, xlim = c(0, 1),
     ylim = c(0, 1),
     xlab = &amp;quot;Global model predictions&amp;quot;,
     ylab = &amp;quot;Regional model predictions&amp;quot;,
     col = &amp;quot;#00000050&amp;quot;)
abline(h = 0.5, lty = 2)
abline(v = 0.5, lty = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/model-predictions-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This plot compares the default Maxent output, the complementary log-log
value. This is an estimate of the probability of presence, which is more
appropriate than the other options for this kind of analysis (raw values
would be difficult to interpret). However, I’m not sure 50% is the most
appropriate value to use in the analysis that follows. &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-EckertEtAl_2020&#34; role=&#34;doc-biblioref&#34;&gt;Eckert et al.&lt;/a&gt; (&lt;a href=&#34;#ref-EckertEtAl_2020&#34; role=&#34;doc-biblioref&#34;&gt;2020&lt;/a&gt;)&lt;/span&gt;
used &lt;code&gt;optim.thresh&lt;/code&gt; from the (now defunct) SDMTools package to determine
the best threshold for their study.&lt;/p&gt;
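&lt;p&gt;As a rough illustration of threshold selection, here is a minimal base R sketch of one common criterion: pick the cutoff that maximizes sensitivity plus specificity. The helper &lt;code&gt;bestThreshold&lt;/code&gt; and the toy values are hypothetical, and this is not necessarily what &lt;code&gt;optim.thresh&lt;/code&gt; computed. Background points stand in for absences here, which is itself an assumption.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;  ## Hypothetical helper: threshold maximizing sensitivity + specificity
bestThreshold &amp;lt;- function(presVals, bgVals,
                          candidates = seq(0, 1, by = 0.01)) {
  score &amp;lt;- sapply(candidates, function(th) {
      ## share of presences predicted suitable
    sensitivity &amp;lt;- mean(presVals &amp;gt;= th)
      ## share of background points predicted unsuitable
    specificity &amp;lt;- mean(bgVals &amp;lt; th)
    sensitivity + specificity
  })
    ## ties are resolved in favour of the lowest candidate
  candidates[which.max(score)]
}

  ## toy predictions, for illustration only
presVals &amp;lt;- c(0.625, 0.71, 0.80, 0.93)
bgVals &amp;lt;- c(0.05, 0.12, 0.30, 0.415)
bestThreshold(presVals, bgVals)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the objects from this post, the inputs could plausibly be &lt;code&gt;extract(glPred, lsThin)&lt;/code&gt; and &lt;code&gt;extract(glPred, background)&lt;/code&gt;, after dropping any &lt;code&gt;NA&lt;/code&gt; values.&lt;/p&gt;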
&lt;p&gt;Following &lt;span class=&#34;citation&#34;&gt;&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;Gallien et al.&lt;/a&gt; (&lt;a href=&#34;#ref-GallienEtAl_2012&#34; role=&#34;doc-biblioref&#34;&gt;2012&lt;/a&gt;)&lt;/span&gt;, we interpret the four quadrants of this plot
as follows:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;Upper right&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;High suitability in both the regional and global models. Observations here are
occupying locations that fall within both the global and invaded niche,
interpreted as ‘stabilizing.’&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Upper left&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;High suitability in the regional model, but low suitability in the global
model. Observations are occupying locations that are within the invaded
niche, but outside the global niche, interpreted as populations
demonstrating local adaptation.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Lower right&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;High suitability in the global model, but low suitability in the regional
model. These are interpreted as regional colonizations: the conditions
here are within the global niche, but are only starting to be
occupied in the invaded range.&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;Lower left&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Low suitability in both the local and global model. Presumably sink
populations (not likely to persist).&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Let’s tabulate the results:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tally &amp;lt;- c(stabilizing
          = sum(globalVals &amp;gt;= 0.5 &amp;amp; naVals &amp;gt;= 0.5,
                na.rm = TRUE),
          adapting = sum(globalVals &amp;lt; 0.5 &amp;amp; naVals &amp;gt;= 0.5,
                         na.rm = TRUE),
          sinks = sum(globalVals &amp;lt; 0.5 &amp;amp; naVals &amp;lt; 0.5,
                      na.rm = TRUE),
          colonizing = sum(globalVals &amp;gt;= 0.5 &amp;amp; naVals &amp;lt; 0.5,
                           na.rm = TRUE))

barplot(tally, ylab = &amp;quot;Occurrences&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/niche-space-tally-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can plot these regions on the map as well (apologies for the opaque
raster algebra; there should be a clearer way to calculate this, but I can’t
think of it at the moment).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;suitabilityThreshold &amp;lt;- 0.5
na_Niche &amp;lt;- naPred &amp;gt; suitabilityThreshold
gl_Niche &amp;lt;- glPred &amp;gt; suitabilityThreshold

stable_Niche &amp;lt;- (na_Niche + gl_Niche) == 2
expansion_Niche &amp;lt;- ((2 * na_Niche) - gl_Niche) == 2
contraction_Niche &amp;lt;- ((2 * gl_Niche) - na_Niche) == 2

NicheRaster &amp;lt;- stable_Niche + (2 * expansion_Niche) +
  (3 * contraction_Niche)

plot(NicheRaster, xlim = c(-140, -60), ylim = c(30, 70),
     col = c(&amp;quot;white&amp;quot;, &amp;quot;blue&amp;quot;, &amp;quot;red&amp;quot;, &amp;quot;green&amp;quot;),
     legend = FALSE)
plot(wrld_simpl, add = TRUE)
points(lsNAThin, pch = 16, cex = 0.5)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/08/11/invasion-stage/index_files/figure-html/map-comparisons-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In this plot, blue depicts areas identified as suitable habitat in both
the global and regional models. Green is area identified as suitable
habitat in the global model, but not the North American model. There are
some occurrences in this area, but they aren’t as numerous as in the blue
regions. Finally, the red areas were identified by the North American model
as suitable habitat, but were not part of the global model’s suitable
habitat. Following Gallien’s framework, any points in the white areas would be
‘sinks.’ I think it’s more likely that they’re the current leading edge of the
invasion front.&lt;/p&gt;
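&lt;p&gt;If the raster algebra above feels opaque, it may help to run it on the four possible combinations of the two thresholded layers. This is a small base R sketch, not part of the original analysis: &lt;code&gt;combos&lt;/code&gt; is a hypothetical table with one row per combination, and plain logical vectors stand in for the rasters.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## One row for each combination of the two suitability layers
combos &amp;lt;- expand.grid(na = c(FALSE, TRUE), gl = c(FALSE, TRUE))

## The same algebra used on the rasters above
stable &amp;lt;- (combos$na + combos$gl) == 2
expansion &amp;lt;- ((2 * combos$na) - combos$gl) == 2
contraction &amp;lt;- ((2 * combos$gl) - combos$na) == 2

## Codes 0 to 3 index the four plot colours
combos$code &amp;lt;- stable + (2 * expansion) + (3 * contraction)
combos$colour &amp;lt;- c(&amp;quot;white&amp;quot;, &amp;quot;blue&amp;quot;, &amp;quot;red&amp;quot;,
                   &amp;quot;green&amp;quot;)[combos$code + 1]
combos&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting table confirms the colour key: neither model is white, North America only is red, global only is green, and both is blue.&lt;/p&gt;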
&lt;p&gt;Obviously, there’s a lot going on here, and each of these steps will
warrant careful consideration and additional checks, validations, and
optimizations. I hope this simplified outline is enough to get you started.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references csl-bib-body hanging-indent&#34;&gt;
&lt;div id=&#34;ref-EckertEtAl_2020&#34; class=&#34;csl-entry&#34;&gt;
Eckert, Sandra, Amina Hamad, Charles Joseph Kilawe, Theo E. W. Linders, Wai‐Tim Ng, Purity Rima Mbaabu, Hailu Shiferaw, Arne Witt, and Urs Schaffner. 2020. &lt;span&gt;“Niche Change Analysis as a Tool to Inform Management of Two Invasive Species in Eastern Africa.”&lt;/span&gt; &lt;em&gt;Ecosphere&lt;/em&gt; 11 (2). &lt;a href=&#34;https://doi.org/10.1002/ecs2.2987&#34;&gt;https://doi.org/10.1002/ecs2.2987&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-FickHijmans_2017&#34; class=&#34;csl-entry&#34;&gt;
Fick, Stephen E., and Robert J. Hijmans. 2017. &lt;span&gt;“WorldClim 2: New 1-Km Spatial Resolution Climate Surfaces for Global Land Areas.”&lt;/span&gt; &lt;em&gt;International Journal of Climatology&lt;/em&gt; 37 (12): 4302–15. &lt;a href=&#34;https://doi.org/10.1002/joc.5086&#34;&gt;https://doi.org/10.1002/joc.5086&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-GallienEtAl_2012&#34; class=&#34;csl-entry&#34;&gt;
Gallien, Laure, Rolland Douzet, Steve Pratte, Niklaus E. Zimmermann, and Wilfried Thuiller. 2012. &lt;span&gt;“Invasive Species Distribution Models – How Violating the Equilibrium Assumption Can Create New Insights.”&lt;/span&gt; &lt;em&gt;Global Ecology and Biogeography&lt;/em&gt; 21 (11): 1126–36. &lt;a href=&#34;https://doi.org/10.1111/j.1466-8238.2012.00768.x&#34;&gt;https://doi.org/10.1111/j.1466-8238.2012.00768.x&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-Hijmans_2021&#34; class=&#34;csl-entry&#34;&gt;
Hijmans, Robert J. 2021. &lt;span&gt;“Raster: Geographic Data Analysis and Modeling.”&lt;/span&gt; Manual. &lt;a href=&#34;https://CRAN.R-project.org/package=raster&#34;&gt;https://CRAN.R-project.org/package=raster&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Niche Quantification with Ecospat</title>
      <link>https://plantarum.ca/2021/07/29/ecospat/</link>
      <pubDate>Thu, 29 Jul 2021 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2021/07/29/ecospat/</guid>
      <description>


&lt;div id=&#34;update&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Update&lt;/h1&gt;
&lt;p&gt;NB: this tutorial is now out of date and depends on deprecated versions of
R packages! Please refer to the new version of my &lt;a href=&#34;https://plantarum.ca/2023/07/28/ecospat-terra&#34;&gt;ecospat
tutorial&lt;/a&gt; for the current packages and workflow
for this analysis.&lt;/p&gt;
&lt;p&gt;I’ll leave the old version below in case it’s of any interest, but it won’t
work on current versions of R.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;archived-version&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Archived Version&lt;/h1&gt;
&lt;p&gt;The &lt;code&gt;ecospat&lt;/code&gt; package &lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-ColaEtAl_2017&#34;&gt;Cola et al. 2017&lt;/a&gt;)&lt;/span&gt; provides code to quantify and
compare the environmental and geographic niche of two species, or of the
same species in different contexts (e.g., in its native and invaded
ranges). The included vignette explains how to do such analyses.&lt;/p&gt;
&lt;p&gt;However, the vignette assumes you already have a matrix of occurrence
records, along with the climate data for each of those records. In our
work, we typically have to construct those matrices from observation data
(herbarium records, iNaturalist observations, etc) and climate rasters
&lt;span class=&#34;citation&#34;&gt;(e.g. &lt;a href=&#34;#ref-FickHijmans_2017&#34;&gt;Fick and Hijmans 2017&lt;/a&gt;)&lt;/span&gt;. This short tutorial will walk through the steps
necessary to do this.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;packages&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Packages&lt;/h1&gt;
&lt;p&gt;In addition to &lt;code&gt;ecospat&lt;/code&gt;, we’ll use &lt;code&gt;raster&lt;/code&gt; &lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-Hijmans_2021&#34;&gt;Hijmans 2021&lt;/a&gt;)&lt;/span&gt; to download
WorldClim &lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-FickHijmans_2017&#34;&gt;Fick and Hijmans 2017&lt;/a&gt;)&lt;/span&gt; rasters, and manipulate the spatial data;
&lt;code&gt;rgbif&lt;/code&gt; &lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-ChamberlainEtAl_2021&#34;&gt;Chamberlain et al. 2021&lt;/a&gt;)&lt;/span&gt; to download GBIF records, and &lt;code&gt;maptools&lt;/code&gt;
&lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-BivandLewin-Koh_2021&#34;&gt;Bivand and Lewin-Koh 2021&lt;/a&gt;)&lt;/span&gt; to get a world basemap for plots.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ecospat)
library(raster)
library(rgbif)
library(maptools)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;NB&lt;/strong&gt; there is a &lt;a href=&#34;https://github.com/ecospat/ecospat/issues/18&#34;&gt;bug in
ecospat&lt;/a&gt; that prevents us
from using the argument &lt;code&gt;geomask&lt;/code&gt; (see below). This has been fixed, but as
of 2021-07-29, the bug fix has not made it into the released package,
currently version 3.2. Consequently, you need to install directly from the
development sources:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(devtools)
install_github(repo = &amp;quot;ecospat/ecospat/ecospat&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Presumably this won’t be necessary for version 3.3 or newer (once
released).&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;getting-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Getting Data&lt;/h1&gt;
&lt;p&gt;We’ll start by sourcing our data. For observations, let’s take a look at
Purple Loosestrife, a wetland species that is native to Europe, and
invasive in North America. For actual research work, I normally download
the files directly from GBIF, and examine them carefully to check for
errors or missing data. For this demo we’ll use the &lt;code&gt;rgbif&lt;/code&gt; package to
download the data directly into R, and we’ll assume there are no problems.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsGBIF &amp;lt;- occ_search(scientificName = &amp;quot;Lythrum salicaria&amp;quot;,
                    limit = 10000,
                    basisOfRecord = &amp;quot;Preserved_Specimen&amp;quot;,
                    hasCoordinate = TRUE,
                    fields = c(&amp;quot;decimalLatitude&amp;quot;,
                               &amp;quot;decimalLongitude&amp;quot;, &amp;quot;year&amp;quot;,
                               &amp;quot;country&amp;quot;, &amp;quot;countryCode&amp;quot;))

save(lsGBIF, file = &amp;quot;../data/2021-07-29-ls-gbif-recs.Rda&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This returned an object with 7969 records. I saved that locally, so that
I’m not making GBIF search their database every time I work on this demo.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;load(&amp;quot;../data/2021-07-29-ls-gbif-recs.Rda&amp;quot;)
lsOccs &amp;lt;- lsGBIF$data&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;lsGBIF$data&lt;/code&gt; is the table with the actual records in it. That’s what we’ll
be working with. The other components of &lt;code&gt;lsGBIF&lt;/code&gt; are metadata related to
the original GBIF search. That’s useful to have, but not needed for the
rest of this example.&lt;/p&gt;
&lt;p&gt;Next, we tell R which columns are the coordinates, which allows us to map
the observations. This also converts our observation matrix to a
&lt;code&gt;SpatialPointsDataFrame&lt;/code&gt; object.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;coordinates(lsOccs) &amp;lt;- c(&amp;quot;decimalLongitude&amp;quot;,
                        &amp;quot;decimalLatitude&amp;quot;) 
data(wrld_simpl) # load the maptools worldmap

par(mar = c(0,0, 0, 0))
plot(wrld_simpl, border = &amp;quot;gray80&amp;quot;)
points(lsOccs, pch = 16, col = 2, cex = 0.3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To get our climate data, we can use raster’s &lt;code&gt;getData&lt;/code&gt; function. The first
time you call this function in a directory, it downloads the data from the
internet, and saves it locally. Subsequent calls will load your local copy
of the data, to speed things up. I’m using the coarsest resolution (10
minutes) to speed things up for this demonstration:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;wclim &amp;lt;- getData(&amp;quot;worldclim&amp;quot;, var = &amp;quot;bio&amp;quot;, res = 10,
                path = &amp;quot;../data&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can take a look at one layer:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;par(mar = c(0,0, 3, 1))
plot(wclim[[&amp;quot;bio1&amp;quot;]], main = &amp;quot;bio1&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we need to extract the environmental values from the climate rasters
for each of our observation records:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsOccs &amp;lt;- cbind(lsOccs, extract(wclim, lsOccs))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the process of extracting &lt;code&gt;wclim&lt;/code&gt; values for our observations, we
usually end up with a few missing values. This is a consequence of
mismatches between the observation coordinates and the climate rasters. In
some cases, the observations are placed off the coast in the ocean, or in
another area where there is no climate data available. We need to exclude
these missing values from our analysis.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsOccs &amp;lt;- lsOccs[complete.cases(data.frame(lsOccs)), ]&lt;/code&gt;&lt;/pre&gt;
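&lt;p&gt;Here’s a minimal sketch of what that filter does, using a plain data frame. The table &lt;code&gt;toyOccs&lt;/code&gt; and its values are made up for illustration; the second record stands in for a point that fell in the ocean and so has no climate values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical records; row 2 has no climate data
toyOccs &amp;lt;- data.frame(decimalLongitude = c(-75.7, -63.6, 2.35),
                      decimalLatitude = c(45.4, 44.6, 48.9),
                      bio1 = c(62, NA, 115),
                      bio12 = c(920, NA, 640))

## TRUE for rows with no missing values in any column
keep &amp;lt;- complete.cases(toyOccs)
toyClean &amp;lt;- toyOccs[keep, ]

## How many records were dropped?
sum(!keep)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It’s worth checking the number of dropped records on real data. If it’s a large fraction of your observations, something other than coastal mismatches is probably going on.&lt;/p&gt;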
&lt;/div&gt;
&lt;div id=&#34;splitting-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Splitting Data&lt;/h1&gt;
&lt;p&gt;At this point, all the data we need for the Niche Quantification analysis
is in &lt;code&gt;lsOccs&lt;/code&gt; and &lt;code&gt;wclim&lt;/code&gt;. We need to split this data into native and
invasive regions for our comparison. We’ll restrict ourselves to the
northern hemisphere north of 20 degrees, and consider all records from
Eurasia as native, and all records from North America as invasive.&lt;/p&gt;
&lt;p&gt;I’ve created extents to cover the rough outlines of the areas in question.
In practice, you could use a more carefully constructed vector map to split
your data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## North America: na
## Longitude from 40 to 180W, Latitude from 20 to 90N
naExt &amp;lt;- extent(c(-180, -40, 20, 90))
lsNA &amp;lt;- crop(lsOccs, naExt)

## Eurasia: ea
## Longitude from 40W to 180E, Latitude from 20 to 90N
eaExt &amp;lt;- extent(c(-40, 180, 20, 90))
lsEA &amp;lt;- crop(lsOccs, eaExt)

par(mar = c(1, 0, 0, 0))
plot(wrld_simpl, ylim = c(20, 80), axes = FALSE)
points(lsNA, pch = 16, col = &amp;#39;red&amp;#39;, cex = 0.5)
points(lsEA, pch = 16, col = &amp;#39;darkgreen&amp;#39;, cex = 0.5)&lt;/code&gt;&lt;/pre&gt;
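&lt;p&gt;For readers who want to see the logic without the spatial classes, here’s a rough base R analogue of the split. The helper &lt;code&gt;inBox&lt;/code&gt; and the table &lt;code&gt;toyPts&lt;/code&gt; are hypothetical, and this sketch only mimics what &lt;code&gt;crop&lt;/code&gt; does for points; it isn’t how &lt;code&gt;raster&lt;/code&gt; implements it.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical points: two in North America, two in Eurasia,
## and one in the southern hemisphere
toyPts &amp;lt;- data.frame(lon = c(-79.4, -123.1, 13.4, 139.7, -46.6),
                     lat = c(43.7, 49.3, 52.5, 35.7, -23.5))

## TRUE for points inside a bounding box (edges included)
inBox &amp;lt;- function(pts, xmin, xmax, ymin, ymax) {
  pts$lon &amp;gt;= xmin &amp;amp; pts$lon &amp;lt;= xmax &amp;amp;
    pts$lat &amp;gt;= ymin &amp;amp; pts$lat &amp;lt;= ymax
}

## Same boxes as naExt and eaExt above
toyNA &amp;lt;- toyPts[inBox(toyPts, -180, -40, 20, 90), ]
toyEA &amp;lt;- toyPts[inBox(toyPts, -40, 180, 20, 90), ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The southern hemisphere point falls outside both boxes and is dropped. Note that in this sketch a point sitting exactly on 40W would land in both subsets, since the two boxes share that edge.&lt;/p&gt;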
&lt;p&gt;For the Niche Quantification, we need a matrix of the background
environment present in each of the native and invasive ranges, as well as the
complete global environment covering the combined extent of the native and
introduced ranges. After cropping, we use &lt;code&gt;getValues&lt;/code&gt; to convert each
raster to a matrix.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Crop Climate Layers:
naEnvR &amp;lt;- crop(wclim, naExt)
eaEnvR &amp;lt;- crop(wclim, eaExt)

## Extract values to matrix:
naEnvM &amp;lt;- getValues(naEnvR)
eaEnvM &amp;lt;- getValues(eaEnvR)

## Clean out missing values:
naEnvM &amp;lt;- naEnvM[complete.cases(naEnvM), ]
eaEnvM &amp;lt;- eaEnvM[complete.cases(eaEnvM), ]

## Combined global environment:
globalEnvM &amp;lt;- rbind(naEnvM, eaEnvM)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;niche-quantification&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Niche Quantification&lt;/h1&gt;
&lt;div id=&#34;pca&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;PCA&lt;/h2&gt;
&lt;p&gt;The Niche Quantification analysis starts with a Principal Components
Analysis of the environmental data. The actual ordination uses the global
data, with the observation records and the native and invasive background
environment treated as supplemental rows.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pca.clim &amp;lt;- dudi.pca(globalEnvM, center = TRUE,
                    scale = TRUE, scannf = FALSE, nf = 2)
global.scores &amp;lt;- pca.clim$li

nativeLS.scores &amp;lt;-
  suprow(pca.clim,
         data.frame(lsEA)[, colnames(globalEnvM)])$li   
invasiveLS.scores &amp;lt;-
  suprow(pca.clim,
         data.frame(lsNA)[, colnames(globalEnvM)])$li

## Eurasia (ea) is the native range, North America (na) the invaded range:
nativeEnv.scores &amp;lt;- suprow(pca.clim, eaEnvM)$li
invasiveEnv.scores &amp;lt;- suprow(pca.clim, naEnvM)$li&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s break that down. &lt;code&gt;dudi.pca&lt;/code&gt; runs a PCA on &lt;code&gt;globalEnvM&lt;/code&gt;,
which is a matrix of all the environmental variables over the entire study
area. We use that to create a two-dimensional summary of the total
environmental variability.&lt;/p&gt;
&lt;p&gt;Next, we map our observation data (&lt;code&gt;lsEA&lt;/code&gt; and &lt;code&gt;lsNA&lt;/code&gt;) into that
2-dimensional ordination, using the &lt;code&gt;suprow&lt;/code&gt; function. &lt;code&gt;lsEA&lt;/code&gt; and &lt;code&gt;lsNA&lt;/code&gt;
are &lt;code&gt;SpatialPointsDataFrame&lt;/code&gt; objects. Sometimes you can treat them as if
they were data.frames, but other times you need to explicitly convert them.
This is one of those times, hence I’ve wrapped them in &lt;code&gt;data.frame()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Recall that &lt;code&gt;lsEA&lt;/code&gt; and &lt;code&gt;lsNA&lt;/code&gt; have more columns than the environmental
matrix: they also include &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;countryCode&lt;/code&gt;, &lt;code&gt;country&lt;/code&gt;. We only want to
include the environmental variables when we project the observations into
the ordination. To make sure that we use the same variables as in the
original ordination of &lt;code&gt;globalEnvM&lt;/code&gt;, in the same order, I select the
columns explicitly to match that object:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data.frame(lsEA)[, colnames(globalEnvM)]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output of &lt;code&gt;dudi.pca&lt;/code&gt; and &lt;code&gt;suprow&lt;/code&gt; includes a lot of information that we
aren’t using here. We only need the &lt;code&gt;li&lt;/code&gt; element, so I’ve selected that
from each of the function outputs.&lt;/p&gt;
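&lt;p&gt;The idea of projecting extra rows into an existing ordination can be sketched in base R with &lt;code&gt;prcomp&lt;/code&gt; and &lt;code&gt;predict&lt;/code&gt;. To be clear, this is an analogue of the &lt;code&gt;dudi.pca&lt;/code&gt;/&lt;code&gt;suprow&lt;/code&gt; step and not the same code: the scores may be scaled a little differently from what &lt;code&gt;ade4&lt;/code&gt; returns, and all the data here (&lt;code&gt;toyEnv&lt;/code&gt;, &lt;code&gt;toyOccs&lt;/code&gt;) are simulated.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Toy background environment: 200 sites x 3 made-up variables
set.seed(42)
toyEnv &amp;lt;- matrix(rnorm(200 * 3), ncol = 3,
                 dimnames = list(NULL, c(&amp;quot;bio1&amp;quot;, &amp;quot;bio5&amp;quot;, &amp;quot;bio12&amp;quot;)))

## Ordinate the background, keeping the first two axes
toyPca &amp;lt;- prcomp(toyEnv, center = TRUE, scale. = TRUE)
toyGlobalScores &amp;lt;- toyPca$x[, 1:2]

## Toy occurrence table: an extra column, variables in a different order
toyOccs &amp;lt;- data.frame(year = c(1950, 2001),
                      bio12 = c(0.5, -1.0),
                      bio1 = c(0.2, 1.1),
                      bio5 = c(-0.3, 0.4))

## Select columns by name to match the ordination, then project
toyOccScores &amp;lt;- predict(toyPca,
                        newdata = toyOccs[, colnames(toyEnv)])[, 1:2]
toyOccScores&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The projected rows are centred and scaled with the background’s means and standard deviations, not their own. That’s what puts the occurrences and the background on the same axes.&lt;/p&gt;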
&lt;/div&gt;
&lt;div id=&#34;occurence-densities-grid&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Occurrence Densities Grid&lt;/h2&gt;
&lt;p&gt;Finally we’re ready to do the Niche Quantification/Comparisons. We’ll use
the PCA scores for the global environment, the native and invasive
environments, and the native and invasive occurrence records.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;nativeGrid &amp;lt;- ecospat.grid.clim.dyn(global.scores,
                                   nativeEnv.scores,
                                   nativeLS.scores)

invasiveGrid &amp;lt;- ecospat.grid.clim.dyn(global.scores,
                                   invasiveEnv.scores, 
                                   invasiveLS.scores)

ecospat.plot.niche.dyn(nativeGrid, invasiveGrid,
                       quant = 0.05) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting plot shows us the environmental conditions present in Eurasia
(inside the green line) and North America (inside the red line). The green
area represents environments occupied by &lt;em&gt;Lythrum salicaria&lt;/em&gt; in Eurasia,
but not in North America, the red area shows environments occupied in North
America and not Eurasia, and the blue area shows environments occupied in
both ranges. We can also see that there are a few areas in Eurasia with
environments not present in North America, and vice versa. However, for the
most part, &lt;em&gt;Lythrum salicaria&lt;/em&gt; doesn’t occur in these environments (except
for a tiny bit of green in the center of the plot).&lt;/p&gt;
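&lt;p&gt;The three colours amount to set operations on the occupied parts of the environmental grid. Here’s a deliberately simplified sketch with made-up cell numbers (&lt;code&gt;nativeCells&lt;/code&gt; and &lt;code&gt;invasiveCells&lt;/code&gt; are hypothetical). The real analysis works with smoothed occurrence densities, not a simple list of occupied cells, so treat this as a way to read the plot and nothing more.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical occupied cells in a shared environmental grid
nativeCells &amp;lt;- c(1, 2, 3, 4, 5, 6)
invasiveCells &amp;lt;- c(4, 5, 6, 7, 8)

nativeOnly &amp;lt;- setdiff(nativeCells, invasiveCells)   # green area
invasiveOnly &amp;lt;- setdiff(invasiveCells, nativeCells) # red area
shared &amp;lt;- intersect(nativeCells, invasiveCells)     # blue area

c(green = length(nativeOnly), red = length(invasiveOnly),
  blue = length(shared))&lt;/code&gt;&lt;/pre&gt;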
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;geographic-comparisons&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Geographic Comparisons&lt;/h1&gt;
&lt;p&gt;You can also apply this analysis to geographic locations, instead of
environmental conditions. This won’t make much sense for native vs invaded
range comparisons, but it could be useful for comparing different species
within the same area.&lt;/p&gt;
&lt;p&gt;To demonstrate, let’s compare the distribution of &lt;em&gt;Lythrum salicaria&lt;/em&gt; in
North America before and after 1950. We use geographic coordinates here, so
no need for a PCA. We do need to generate the ‘background’ coordinates.
I’ll use &lt;code&gt;expand.grid&lt;/code&gt; to create the locations for this. I’ve divided the
NA extent into a 500 x 500 grid.&lt;/p&gt;
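&lt;p&gt;If you haven’t used &lt;code&gt;expand.grid&lt;/code&gt; before, a much smaller version shows what it produces. The object &lt;code&gt;toyGrid&lt;/code&gt; is only for illustration; it uses the same longitude and latitude limits with far fewer steps.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Every combination of 4 longitudes and 3 latitudes
toyGrid &amp;lt;- expand.grid(longitude = seq(-160, -40, length.out = 4),
                       latitude = seq(20, 90, length.out = 3))

nrow(toyGrid)  # 4 x 3 = 12 rows, one per grid point
head(toyGrid)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The full-size grid has 500 x 500 = 250000 rows, which is why I’ve kept the resolution modest.&lt;/p&gt;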
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;lsNAearly &amp;lt;- subset(lsNA, year &amp;lt;= 1950)
lsNAlate &amp;lt;- subset(lsNA, year &amp;gt; 1950)
geoGrid &amp;lt;- expand.grid(longitude =
                        seq(-160, -40, length.out = 500),
                      latitude =
                        seq(20, 90, length.out = 500))

earlyGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                     coordinates(lsNAearly))

lateGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                    coordinates(lsNAlate))

ecospat.plot.niche.dyn(earlyGeoGrid, lateGeoGrid, quant = 0)
plot(wrld_simpl, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This looks pretty good. However, &lt;code&gt;ecospat&lt;/code&gt; uses a kernel density formula to
model the occurrence distributions. As a consequence, it projects out into
the ocean, which isn’t very realistic. To correct this, we need to mask the
analysis to the continental land mass. This requires we have a vector map
of the desired area. I’ll combine the US, Canada, and Mexico polygons from
&lt;code&gt;wrld_simpl&lt;/code&gt; for this purpose.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;naMask &amp;lt;- bind(subset(wrld_simpl, NAME == &amp;quot;Canada&amp;quot;),
              subset(wrld_simpl, NAME == &amp;quot;United States&amp;quot;),
              subset(wrld_simpl, NAME == &amp;quot;Mexico&amp;quot;))

earlyGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                     coordinates(lsNAearly),
                                     geomask = naMask)

lateGeoGrid &amp;lt;- ecospat.grid.clim.dyn(geoGrid, geoGrid,
                                    coordinates(lsNAlate),
                                    geomask = naMask)

ecospat.plot.niche.dyn(earlyGeoGrid, lateGeoGrid, quant = 0)
plot(wrld_simpl, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That gives more reasonable results.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;summary&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;This is a fairly quick overview of this workflow. You’ll almost certainly
want to consider thinning your observations, among other data cleaning
procedures. I’ve also set the study extent very crudely. That might be
appropriate for very large scale (global) studies. But you’ll usually want
to think a bit more carefully about how you set your extent. The way you
process your data will also differ depending on your context.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references csl-bib-body hanging-indent&#34; entry-spacing=&#34;0&#34;&gt;
&lt;div id=&#34;ref-BivandLewin-Koh_2021&#34; class=&#34;csl-entry&#34;&gt;
Bivand, Roger, and Nicholas Lewin-Koh. 2021. &lt;span&gt;“Maptools: Tools for Handling Spatial Objects.”&lt;/span&gt; Manual. &lt;a href=&#34;https://CRAN.R-project.org/package=maptools&#34;&gt;https://CRAN.R-project.org/package=maptools&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-ChamberlainEtAl_2021&#34; class=&#34;csl-entry&#34;&gt;
Chamberlain, Scott, Vijay Barve, Dan Mcglinn, Damiano Oldoni, Peter Desmet, Laurens Geffert, and Karthik Ram. 2021. &lt;span&gt;“Rgbif: Interface to the Global Biodiversity Information Facility API.”&lt;/span&gt; Manual. &lt;a href=&#34;https://CRAN.R-project.org/package=rgbif&#34;&gt;https://CRAN.R-project.org/package=rgbif&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-ColaEtAl_2017&#34; class=&#34;csl-entry&#34;&gt;
Cola, Valeria Di, Olivier Broennimann, Blaise Petitpierre, Frank T. Breiner, Manuela D’Amen, Christophe Randin, Robin Engler, et al. 2017. &lt;span&gt;“Ecospat: An R Package to Support Spatial Analyses and Modeling of Species Niches and Distributions.”&lt;/span&gt; &lt;em&gt;Ecography&lt;/em&gt; 40 (6): 774–87. &lt;a href=&#34;https://doi.org/10.1111/ecog.02671&#34;&gt;https://doi.org/10.1111/ecog.02671&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-FickHijmans_2017&#34; class=&#34;csl-entry&#34;&gt;
Fick, Stephen E., and Robert J. Hijmans. 2017. &lt;span&gt;“WorldClim 2: New 1-Km Spatial Resolution Climate Surfaces for Global Land Areas.”&lt;/span&gt; &lt;em&gt;International Journal of Climatology&lt;/em&gt; 37 (12): 4302–15. &lt;a href=&#34;https://doi.org/10.1002/joc.5086&#34;&gt;https://doi.org/10.1002/joc.5086&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-Hijmans_2021&#34; class=&#34;csl-entry&#34;&gt;
Hijmans, Robert J. 2021. &lt;span&gt;“Raster: Geographic Data Analysis and Modeling.”&lt;/span&gt; Manual. &lt;a href=&#34;https://CRAN.R-project.org/package=raster&#34;&gt;https://CRAN.R-project.org/package=raster&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>GBS Admixture Analysis Workflow</title>
      <link>https://plantarum.ca/2021/06/01/admixture/</link>
      <pubDate>Tue, 01 Jun 2021 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2021/06/01/admixture/</guid>
      <description>
&lt;script src=&#34;https://plantarum.ca/2021/06/01/admixture/index_files/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;&lt;a href=&#34;https://dalexander.github.io/admixture/index.html&#34;&gt;Admixture&lt;/a&gt; is a program
for completing
&lt;a href=&#34;https://web.stanford.edu/group/pritchardlab/structure.html&#34;&gt;STRUCTURE&lt;/a&gt;-style
analyses of large SNP datasets, such as we get with GBS
&lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-ElshireEtAl_2011&#34; role=&#34;doc-biblioref&#34;&gt;Elshire et al. 2011&lt;/a&gt;)&lt;/span&gt;. This short tutorial covers getting our SNP data from
STACKS &lt;span class=&#34;citation&#34;&gt;(&lt;a href=&#34;#ref-RochetteEtAl_2019&#34; role=&#34;doc-biblioref&#34;&gt;Rochette, Rivera‐Colón, and Catchen 2019&lt;/a&gt;)&lt;/span&gt; into a format that Admixture will understand,
running the analysis, and importing the results into
&lt;a href=&#34;https://www.r-project.org/&#34;&gt;R&lt;/a&gt; for further investigation &amp;amp; plotting.&lt;/p&gt;
&lt;div id=&#34;converting-stacks-output-to-admixture-input&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Converting Stacks Output to Admixture Input&lt;/h1&gt;
&lt;p&gt;Both Stacks and Admixture can process
&lt;a href=&#34;https://www.cog-genomics.org/plink2/formats&#34;&gt;PLINK&lt;/a&gt; data. However, there
are a few ‘gotchas’ that took a while to sort out. The simplest way I found
to bridge the two programs was to export my Stacks data to &lt;code&gt;vcf&lt;/code&gt;, clean it
up on the command line, and then use the &lt;code&gt;plink&lt;/code&gt; program to convert it to a
&lt;code&gt;plink&lt;/code&gt; file that Admixture could parse.&lt;/p&gt;
&lt;p&gt;I’ll start from the &lt;code&gt;vcf&lt;/code&gt; file generated by Stacks’
&lt;a href=&#34;http://catchenlab.life.illinois.edu/stacks/comp/populations.php&#34;&gt;populations&lt;/a&gt;
program. We expect to have thousands of contigs in a typical GBS dataset,
each of which is numbered in the Stacks output:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;head -20 populations.haps.vcf&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##fileformat=VCFv4.2
##fileDate=20210531
##source=&amp;quot;Stacks v2.3e&amp;quot;
##INFO=&amp;lt;ID=AD,Number=R,Type=Integer,Description=&amp;quot;Total Depth for Each Allele&amp;quot;&amp;gt;
...
##FORMAT=&amp;lt;ID=GT,Number=1,Type=String,Description=&amp;quot;Genotype&amp;quot;&amp;gt;
##INFO=&amp;lt;ID=loc_strand,Number=1,Type=Character,Description=&amp;quot;Genomic strand the co
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NIAP1-1011  NIAP1-0342  
26  1   .   TTTCG   TATCG   .   PASS    snp_columns=61,93,136,137,177   GT  0/0 0/0 
46  1   .   AAAGTT  AACATT  .   PASS    snp_columns=13,15,48,73,125,192 GT  0/1 0/1 
103 1   .   CCCACATGATACGCCGC   CCCATATAAGCCGCCGC   .   PASS    snp_columns=11,16   
149 1   .   ACACTGT ACATTGT .   PASS    snp_columns=47,66,84,91,121,126,163 GT  0/0 
271 1   .   CATTGTGCGGAATATGT   TACCTCATTTGCATTAC,CATTGTGCGGAAAATGT .   PASS&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This causes a problem when we try to use &lt;code&gt;plink&lt;/code&gt;, which won’t accept
&lt;code&gt;#CHROM&lt;/code&gt; values higher than 21. To fix this, we need to prepend a letter to
the &lt;code&gt;#CHROM&lt;/code&gt; numbers:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;sed &amp;#39;/^[[:digit:]]/s/^/c/&amp;#39; populations.haps.vcf &amp;gt; popC.haps.vcf&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;head -20 popC.haps.vcf&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##fileformat=VCFv4.2
##fileDate=20210531
##source=&amp;quot;Stacks v2.3e&amp;quot;
##INFO=&amp;lt;ID=AD,Number=R,Type=Integer,Description=&amp;quot;Total Depth for Each Allele&amp;quot;&amp;gt;
...
##FORMAT=&amp;lt;ID=GT,Number=1,Type=String,Description=&amp;quot;Genotype&amp;quot;&amp;gt;
##INFO=&amp;lt;ID=loc_strand,Number=1,Type=Character,Description=&amp;quot;Genomic strand the co
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NIAP1-1011  NIAP1-0342  
c26 1   .   TTTCG   TATCG   .   PASS    snp_columns=61,93,136,137,177   GT  0/0 
c46 1   .   AAAGTT  AACATT  .   PASS    snp_columns=13,15,48,73,125,192 GT  0/1 
c103    1   .   CCCACATGATACGCCGC   CCCATATAAGCCGCCGC   .   PASS
c149    1   .   ACACTGT ACATTGT .   PASS    snp_columns=47,66,84,91,121,126,163 GT
c271    1   .   CATTGTGCGGAATATGT   TACCTCATTTGCATTAC,CATTGTGCGGAAAATGT .   PASS&lt;/code&gt;&lt;/pre&gt;
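&lt;p&gt;If you’d rather check the substitution in R before running it on a large file, the same edit can be sketched with &lt;code&gt;sub&lt;/code&gt;. The lines in &lt;code&gt;vcfLines&lt;/code&gt; are a shortened, made-up stand-in for a real vcf file.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Two header lines and two truncated data lines (tab-separated)
vcfLines &amp;lt;- c(&amp;quot;##fileformat=VCFv4.2&amp;quot;,
              &amp;quot;#CHROM\tPOS\tID&amp;quot;,
              &amp;quot;26\t1\t.&amp;quot;,
              &amp;quot;103\t1\t.&amp;quot;)

## Put a &amp;quot;c&amp;quot; in front of any line that starts with a digit
prefixed &amp;lt;- sub(&amp;quot;^([[:digit:]])&amp;quot;, &amp;quot;c\\1&amp;quot;, vcfLines)
prefixed&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Header lines start with &lt;code&gt;#&lt;/code&gt;, so they pass through unchanged; only the data lines get the new prefix.&lt;/p&gt;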
&lt;p&gt;With that small addition, we can now create a &lt;code&gt;plink&lt;/code&gt; file with the &lt;code&gt;plink&lt;/code&gt;
program:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;plink --vcf popC.haps.vcf --make-bed --out pop.admix --allow-extra-chr 0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This generates a few files: &lt;code&gt;pop.admix.bed&lt;/code&gt;, &lt;code&gt;pop.admix.bim&lt;/code&gt;,
&lt;code&gt;pop.admix.fam&lt;/code&gt;, &lt;code&gt;pop.admix.log&lt;/code&gt;, &lt;code&gt;pop.admix.nosex&lt;/code&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;running-admixture&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Running Admixture&lt;/h1&gt;
&lt;p&gt;We can now run admixture itself. We need all the files generated by &lt;code&gt;plink&lt;/code&gt;
together in the same directory. We’ll pass the &lt;code&gt;.bed&lt;/code&gt; file as an argument
to Admixture, but it will look for the other files when it’s running.&lt;/p&gt;
&lt;p&gt;The other key argument for admixture is the number of clusters to look for.
We most likely will want to try a range of different values, in order to
determine the optimal number (if there is one). We can do this in bash with
a loop:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;for K in `seq -w 1 20` 
do
    admixture --cv pop.admix.bed $K &amp;gt; ktests/k${K}.out
done&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;seq&lt;/code&gt; command generates a sequence of numbers, and the &lt;code&gt;-w&lt;/code&gt; flag tells
it to pad the numbers with zeros (i.e., 01, 02, … 19, 20).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--cv&lt;/code&gt; flag tells admixture to calculate cross-validation error rates,
which we will use to determine the optimal K value.&lt;/p&gt;
&lt;p&gt;We direct the output to files in the directory &lt;code&gt;ktests&lt;/code&gt;. Make sure this
directory exists before you start.&lt;/p&gt;
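&lt;p&gt;If you prefer to drive the runs from R, one option is to build the command strings first and inspect them before running anything. This sketch only constructs the strings; it assumes the same file names as above, and unlike the bash loop it passes the unpadded K to admixture while keeping the zero-padding in the output file name.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;kValues &amp;lt;- 1:20

## Zero-padded output names, so the files sort in K order
outFiles &amp;lt;- file.path(&amp;quot;ktests&amp;quot;, sprintf(&amp;quot;k%02d.out&amp;quot;, kValues))

## One command per K; nothing is executed here
cmds &amp;lt;- sprintf(&amp;quot;admixture --cv pop.admix.bed %d &amp;gt; %s&amp;quot;,
                kValues, outFiles)
head(cmds, 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The zero-padding matters later: it keeps the files in K order when we collect the results with a wildcard.&lt;/p&gt;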
&lt;p&gt;Once the loop is finished, we’ll want to examine the results:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;grep -h CV ktests/*out&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;CV error (K=1): 0.39835
CV error (K=2): 0.31327
CV error (K=3): 0.26516
CV error (K=4): 0.19929
CV error (K=5): 0.18499
...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we’re ready to move into R to explore the results.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;r-plotting&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;R Plotting&lt;/h1&gt;
&lt;p&gt;First, we’ll take a look at the CV values. Since they’re scattered in 20
different log files, we’ll use grep to collect them into a single file:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;grep -h CV ktests/*out &amp;gt; CV.csv&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can load that into R:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;CVs &amp;lt;- read.table(&amp;quot;CV.csv&amp;quot;, sep = &amp;quot; &amp;quot;)
CVs &amp;lt;- CVs[, 3:4] ## drop the first two columns
## Remove the formatting around the K values:
CVs[, 1] &amp;lt;- gsub(x = CVs[, 1], pattern = &amp;quot;\\(K=&amp;quot;,
                replacement = &amp;quot;&amp;quot;)
CVs[, 1] &amp;lt;- gsub(x = CVs[, 1], pattern = &amp;quot;\\):&amp;quot;,
                replacement = &amp;quot;&amp;quot;) 
head(CVs)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   V3      V4
## 1  1 0.39835
## 2  2 0.31327
## 3  3 0.26516
## 4  4 0.19929
## 5  5 0.18499
## 6  6 0.15408&lt;/code&gt;&lt;/pre&gt;
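&lt;p&gt;As an alternative to splitting on spaces and then cleaning up, the K values and errors can be pulled straight out of the log lines with regular expressions. This is a self-contained sketch: &lt;code&gt;cvLines&lt;/code&gt; holds the first three lines of the output shown earlier, and &lt;code&gt;cvToy&lt;/code&gt; is a hypothetical name.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cvLines &amp;lt;- c(&amp;quot;CV error (K=1): 0.39835&amp;quot;,
             &amp;quot;CV error (K=2): 0.31327&amp;quot;,
             &amp;quot;CV error (K=3): 0.26516&amp;quot;)

cvToy &amp;lt;- data.frame(
  ## Capture the digits between &amp;quot;K=&amp;quot; and &amp;quot;)&amp;quot;
  K = as.integer(sub(&amp;quot;^CV error \\(K=([0-9]+)\\): .*$&amp;quot;, &amp;quot;\\1&amp;quot;,
                     cvLines)),
  ## Keep whatever follows the final colon and space
  error = as.numeric(sub(&amp;quot;^.*: &amp;quot;, &amp;quot;&amp;quot;, cvLines)))

## K with the lowest cross-validation error in this toy set
cvToy$K[which.min(cvToy$error)]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This stores K as an integer, not a character string, which keeps the x-axis in numeric order when plotting.&lt;/p&gt;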
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(CVs, xlab = &amp;quot;K&amp;quot;, ylab = &amp;quot;CV error&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/06/01/admixture/index_files/figure-html/CV-plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;In our case, there isn’t a clear optimum. K = 9 is about the bottom of
the ‘elbow,’ so we’ll use that to make our plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ad9 &amp;lt;- read.table(&amp;quot;pop.admix.9.Q&amp;quot;)
head(ad9)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         V1    V2    V3    V4    V5       V6    V7    V8    V9
## 1 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05 1e-05 1e-05
## 2 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05 1e-05 1e-05
## 3 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05 1e-05 1e-05
## 4 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05 1e-05 1e-05
## 5 0.020244 1e-05 1e-05 1e-05 1e-05 0.979686 1e-05 1e-05 1e-05
## 6 0.003441 1e-05 1e-05 1e-05 1e-05 0.996489 1e-05 1e-05 1e-05&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We also need a popmap file to annotate our plot. This file lists the
population of every sample, and critically, it must be in the same order as
the rows in &lt;code&gt;pop.admix.9.Q&lt;/code&gt;. In this case, I’ll use the popmap data that I
used with the &lt;code&gt;populations&lt;/code&gt; program to generate the original vcf files we
started with, plus an additional column containing the population
names.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;popmap &amp;lt;- read.table(&amp;quot;popmap.csv&amp;quot;)
head(popmap)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       sample popnum    popname
## 1 NIAP1-1011      1 Niapsikau1
## 2 NIAP1-0342      1 Niapsikau1
## 3 NIAP1-1004      1 Niapsikau1
## 4 NIAP1-1017      1 Niapsikau1
## 5 NIAP1-1014      1 Niapsikau1
## 6 NIAP1-0923      1 Niapsikau1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At this point, the two tables are in the same order. Before I do any
manipulations, I’ll combine them. This allows me to sort them in any order,
and the names will stay associated with the correct samples.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ad9 &amp;lt;- cbind(popmap, ad9)
head(ad9)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##       sample popnum    popname       V1    V2    V3    V4    V5       V6    V7
## 1 NIAP1-1011      1 Niapsikau1 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05
## 2 NIAP1-0342      1 Niapsikau1 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05
## 3 NIAP1-1004      1 Niapsikau1 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05
## 4 NIAP1-1017      1 Niapsikau1 0.000010 1e-05 1e-05 1e-05 1e-05 0.999920 1e-05
## 5 NIAP1-1014      1 Niapsikau1 0.020244 1e-05 1e-05 1e-05 1e-05 0.979686 1e-05
## 6 NIAP1-0923      1 Niapsikau1 0.003441 1e-05 1e-05 1e-05 1e-05 0.996489 1e-05
##      V8    V9
## 1 1e-05 1e-05
## 2 1e-05 1e-05
## 3 1e-05 1e-05
## 4 1e-05 1e-05
## 5 1e-05 1e-05
## 6 1e-05 1e-05&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In my case, the samples aren’t sorted by population, so I need to sort them
prior to plotting:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ad9 &amp;lt;- ad9[order(ad9$popnum), ]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we’re ready to plot:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;barplot(t(as.matrix(ad9[, -1:-3])), col=rainbow(9), 
        space = 0, xlab=&amp;quot;Population&amp;quot;, ylab = &amp;quot;Ancestry&amp;quot;, 
        border=NA, axisnames = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/06/01/admixture/index_files/figure-html/admixture-plot-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Let’s break that down. First, we excluded the first three columns, &lt;code&gt;ad9[, -1:-3]&lt;/code&gt;, so that we don’t include our labels in the data. Then we transposed
the matrix, so that each individual sample is represented by a column of
ancestry proportions. I removed the borders on the bars and the spaces
between them (&lt;code&gt;border = NA&lt;/code&gt; and &lt;code&gt;space = 0&lt;/code&gt;), and set the axis labels.&lt;/p&gt;
&lt;p&gt;That’s nice, but we’d like to label our original populations on the plot,
so we can see how they compare to the clusters produced by admixture. I use
the &lt;code&gt;aggregate&lt;/code&gt; function for this.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;xlabels &amp;lt;- aggregate(1:nrow(ad9),
                    by = list(ad9[, &amp;quot;popname&amp;quot;]),
                    FUN = mean)
xlabels&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      Group.1     x
## 1   Fantome4  58.0
## 2   Fantome5  83.5
## 3     Havre6 107.5
## 4     Havre7 125.5
## 5  Marteau11 149.0
## 6 Niapsikau1   7.0
## 7 Niapsikau2  30.5&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here, I’ve grouped the rows by population name, and then calculated the
mean row number for each group. That will be handy, as I can then use that
mean value to plot the name of each population centered beneath it.&lt;/p&gt;
&lt;p&gt;Similarly, I can find the borders of the groups:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sampleEdges &amp;lt;- aggregate(1:nrow(ad9),
                        by = list(ad9[, &amp;quot;popname&amp;quot;]), 
                        FUN = max)
sampleEdges&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##      Group.1   x
## 1   Fantome4  68
## 2   Fantome5  98
## 3     Havre6 116
## 4     Havre7 134
## 5  Marteau11 163
## 6 Niapsikau1  13
## 7 Niapsikau2  47&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, I find the highest row for each population, which I’ll use to
draw a line between them.&lt;/p&gt;
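&lt;p&gt;To see what these two &lt;code&gt;aggregate&lt;/code&gt; calls are doing, here is a toy version with five samples in two made-up populations (the names &lt;code&gt;pops&lt;/code&gt;, &lt;code&gt;centres&lt;/code&gt; and &lt;code&gt;edges&lt;/code&gt; are just for illustration). With &lt;code&gt;space = 0&lt;/code&gt;, &lt;code&gt;barplot&lt;/code&gt; draws bar number i between i - 1 and i on the x axis. That means the highest row number in a group marks its right-hand edge, and the mean row number minus 0.5 marks its centre.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pops &amp;lt;- c(&amp;quot;A&amp;quot;, &amp;quot;A&amp;quot;, &amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;, &amp;quot;B&amp;quot;)  ## already sorted by population
centres &amp;lt;- aggregate(seq_along(pops), by = list(pops),
                     FUN = mean)
edges &amp;lt;- aggregate(seq_along(pops), by = list(pops),
                   FUN = max)
centres$x - 0.5   ## label positions: 1.5 and 4.0
edges$x           ## right-hand edges: 3 and 5&lt;/code&gt;&lt;/pre&gt;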
&lt;p&gt;Putting this all together, we get:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;barplot(t(as.matrix(ad9[, -1:-3])), col=rainbow(9), 
        space = 0, xlab=&amp;quot;Population&amp;quot;, ylab = &amp;quot;Ancestry&amp;quot;, 
        border=NA, axisnames = FALSE)
abline(v = sampleEdges$x, lwd = 2)
axis(1, at = xlabels$x - 0.5, labels = xlabels$Group.1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/2021/06/01/admixture/index_files/figure-html/admixture-plot-complete-1.png&#34; width=&#34;864&#34; /&gt;&lt;/p&gt;
&lt;p&gt;And we’re done!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;see-also&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;See Also&lt;/h1&gt;
&lt;p&gt;For an alternative using ggplot2, see &lt;a href=&#34;https://luisdva.github.io/rstats/model-cluster-plots/&#34;&gt;Luis D. Verde Arregoitia’s
blog&lt;/a&gt;. There’s
another tutorial that you might find helpful at
&lt;a href=&#34;https://speciationgenomics.github.io/ADMIXTURE/&#34;&gt;SpeciationGenomics.github.io&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;references&#34; class=&#34;section level1 unnumbered&#34;&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;div id=&#34;refs&#34; class=&#34;references csl-bib-body hanging-indent&#34;&gt;
&lt;div id=&#34;ref-ElshireEtAl_2011&#34; class=&#34;csl-entry&#34;&gt;
Elshire, Robert J., Jeffrey C. Glaubitz, Qi Sun, Jesse A. Poland, Ken Kawamoto, Edward S. Buckler, and Sharon E. Mitchell. 2011. &lt;span&gt;“A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species.”&lt;/span&gt; &lt;em&gt;PLoS ONE&lt;/em&gt; 6 (5). &lt;a href=&#34;https://doi.org/10.1371/journal.pone.0019379&#34;&gt;https://doi.org/10.1371/journal.pone.0019379&lt;/a&gt;.
&lt;/div&gt;
&lt;div id=&#34;ref-RochetteEtAl_2019&#34; class=&#34;csl-entry&#34;&gt;
Rochette, Nicolas C., Angel G. Rivera‐Colón, and Julian M. Catchen. 2019. &lt;span&gt;“Stacks 2: Analytical Methods for Paired-End Sequencing Improve RADseq-Based Population Genomics.”&lt;/span&gt; &lt;em&gt;Molecular Ecology&lt;/em&gt; 28 (21): 4737–54. &lt;a href=&#34;https://doi.org/10.1111/mec.15253&#34;&gt;https://doi.org/10.1111/mec.15253&lt;/a&gt;.
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Adding Lat/Lon Grids to Maps in R</title>
      <link>https://plantarum.ca/2021/02/22/graticules-r/</link>
      <pubDate>Mon, 22 Feb 2021 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2021/02/22/graticules-r/</guid>
      <description>
&lt;script src=&#34;https://plantarum.ca/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;In a previous post, I outlined my workflow for &lt;a href=&#34;https://plantarum.ca/2020/10/30/simple-maps-r/&#34;&gt;preparing maps in
R&lt;/a&gt;. Today I had to add a
&lt;a href=&#34;https://www.merriam-webster.com/dictionary/graticule&#34;&gt;graticule&lt;/a&gt;, a grid
of latitude and longitude lines, to my maps. That’s easy enough to do with
unprojected maps, as the plot coordinates are latitude and longitude, so
your X and Y axes are already graticules. But if you’ve projected your
data, the plot coordinates are on a different scale, so you need to do a
bit of tuning.&lt;/p&gt;
&lt;p&gt;I couldn’t find a direct way to do this in the R &lt;code&gt;sp&lt;/code&gt; package. However,
&lt;code&gt;sp&lt;/code&gt; (&lt;code&gt;sp&lt;/code&gt; for ‘spatial’) is slowly being replaced by
&lt;a href=&#34;https://r-spatial.github.io/sf/index.html&#34;&gt;sf&lt;/a&gt; (&lt;code&gt;sf&lt;/code&gt; for &lt;a href=&#34;https://en.wikipedia.org/wiki/Simple_Features&#34;&gt;simple
feature&lt;/a&gt;), and &lt;code&gt;sf&lt;/code&gt; does
support graticules. Here are the steps required to add them to your plots:&lt;/p&gt;
&lt;div id=&#34;importing-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Importing Data&lt;/h1&gt;
&lt;p&gt;We can use &lt;code&gt;raster::getData&lt;/code&gt; to get our map data again. It’s
straightforward to convert objects between the &lt;code&gt;sp&lt;/code&gt; (&lt;code&gt;Spatial*&lt;/code&gt;) and &lt;code&gt;sf&lt;/code&gt; (&lt;code&gt;sf*&lt;/code&gt;)
formats, with the functions &lt;code&gt;st_as_sf&lt;/code&gt; (to convert from a
&lt;code&gt;Spatial*&lt;/code&gt; to an &lt;code&gt;sf*&lt;/code&gt; object) and &lt;code&gt;as&lt;/code&gt; (to convert from &lt;code&gt;sf*&lt;/code&gt; to a &lt;code&gt;Spatial*&lt;/code&gt;
object). As it turns out, &lt;code&gt;getData&lt;/code&gt; also supports downloading data directly
into &lt;code&gt;sf&lt;/code&gt; format:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(sf)
library(raster)
us &amp;lt;- getData(&amp;quot;GADM&amp;quot;, country = &amp;quot;USA&amp;quot;, level = 1,
             path = &amp;quot;./data/maps/&amp;quot;, type = &amp;quot;sf&amp;quot;)
canada &amp;lt;- getData(&amp;quot;GADM&amp;quot;, country = &amp;quot;CAN&amp;quot;, level = 1,
                 path = &amp;quot;./data/maps&amp;quot;, type = &amp;quot;sf&amp;quot;)
mexico &amp;lt;- getData(&amp;quot;GADM&amp;quot;, country = &amp;quot;MEX&amp;quot;, level = 1,
                 path = &amp;quot;./data/maps&amp;quot;, type = &amp;quot;sf&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This uses the undocumented &lt;code&gt;type&lt;/code&gt; argument, set to &lt;code&gt;sf&lt;/code&gt;. Given that it’s not
documented, it may change in future. Be warned!&lt;/p&gt;
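&lt;p&gt;For objects that are already in memory, the conversions mentioned above look roughly like this. This sketch uses the &lt;code&gt;nc.shp&lt;/code&gt; example file that ships with &lt;code&gt;sf&lt;/code&gt;, not the GADM data, and assumes the &lt;code&gt;sp&lt;/code&gt; package is installed alongside &lt;code&gt;sf&lt;/code&gt;. The object names are just for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(sf)
## Example shapefile bundled with the sf package:
nc &amp;lt;- st_read(system.file(&amp;quot;shape/nc.shp&amp;quot;, package = &amp;quot;sf&amp;quot;),
              quiet = TRUE)
nc.sp &amp;lt;- as(nc, &amp;quot;Spatial&amp;quot;)  ## from sf to Spatial*
class(nc.sp)
nc.sf &amp;lt;- st_as_sf(nc.sp)    ## from Spatial* back to sf
class(nc.sf)&lt;/code&gt;&lt;/pre&gt;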
&lt;p&gt;You can also use the function &lt;code&gt;st_read&lt;/code&gt; to read shapefiles directly:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;greatlakes &amp;lt;- st_read(&amp;quot;data/maps/greatlakes.shp&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Reading layer `greatlakes&amp;#39; from data source 
##   `/home/smithty/blogdown/content/tutorials/data/maps/greatlakes.shp&amp;#39; 
##   using driver `ESRI Shapefile&amp;#39;
## Simple feature collection with 2 features and 5 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -365240.6 ymin: -2741892 xmax: 1888977 ymax: 509590.2
## proj4string:   +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the previous tutorial, I used &lt;code&gt;bind&lt;/code&gt; to combine two &lt;code&gt;Spatial*&lt;/code&gt; objects.
With &lt;code&gt;sf&lt;/code&gt; we need &lt;code&gt;rbind&lt;/code&gt; instead:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;na &amp;lt;- rbind(us, canada, mexico)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## old-style crs object detected; please recreate object with a recent sf::st_crs()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Plotting complex vector maps like this can be a slow process, especially
when you’re constantly tweaking and adjusting them. You can speed this up
by simplifying the layers:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;na.simp &amp;lt;- st_simplify(na, dTolerance = 0.01)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On my laptop, plotting the original map takes a minute or more, compared to
2 seconds for the simplified vector. I set the tolerance by trial and
error. The higher the tolerance, the smoother the map will be. At 0.01, it
still looks nearly identical at the scale I’m plotting it, but is much
smaller and faster to plot. &lt;code&gt;sf&lt;/code&gt; does warn me about not correctly
simplifying the data, but since I’m only using this for display that’s not
a concern. I wouldn’t simplify a vector if I was going to use it in an analysis.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;plotting-maps&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Plotting Maps&lt;/h1&gt;
&lt;p&gt;When it comes to plotting, we need to tell R to plot only the geometry. By
default it will plot multiple maps, one for each attribute. That’s not what
we want here.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(st_geometry(na.simp), xlim = c(-130, -70),
     ylim = c(35, 45))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2021-02-22-r-maps-graticules_files/figure-html/plot%20sf%20map-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;projections&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Projections&lt;/h1&gt;
&lt;p&gt;To project our unprojected data, we need to define a projection, and transform the object.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;laea = CRS(&amp;quot;+proj=laea +lat_0=30 +lon_0=-95&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;na.la &amp;lt;- st_transform(na.simp, laea)
plot(st_geometry(na.la), xlim = c(-500000, 2000000),
     ylim = c(-400000, 2100000))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2021-02-22-r-maps-graticules_files/figure-html/projection-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We can add layers just as we did in the previous post:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;gl.la &amp;lt;- st_transform(greatlakes, laea)
plot(st_geometry(gl.la), col = &amp;#39;lightblue&amp;#39;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2021-02-22-r-maps-graticules_files/figure-html/plotting%20the%20great%20lakes%20for%20real-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;You can also mix &lt;code&gt;sf&lt;/code&gt; and &lt;code&gt;Spatial*&lt;/code&gt; objects on the same plot, as long as
they’re in the same projection.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;graticules&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Graticules&lt;/h1&gt;
&lt;p&gt;Now we have everything we need to add graticules to our map. This includes
the map we want to plot, and the CRS data for the graticules we want to
overlay. In our case, we’ll use the original, unprojected layer as the
source of our CRS:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(st_geometry(na.la),
     xlim = c(-500000, 2000000), ylim = c(-400000, 2100000),
     graticule = st_crs(na.simp),
     bgc = &amp;#39;lightblue&amp;#39;, ## Background color for the ocean
     col = &amp;#39;white&amp;#39;,
     axes = TRUE)
plot(st_geometry(gl.la), col = &amp;#39;lightblue&amp;#39;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2021-02-22-r-maps-graticules_files/figure-html/plotting%20with%20graticules-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;If you want to specify the location of the graticules, you can use the
arguments &lt;code&gt;lat&lt;/code&gt; and &lt;code&gt;lon&lt;/code&gt; to specify where you want them.&lt;/p&gt;
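&lt;p&gt;As a rough sketch of setting the positions yourself, you can also build the graticule explicitly with &lt;code&gt;st_graticule&lt;/code&gt;, which accepts &lt;code&gt;lon&lt;/code&gt; and &lt;code&gt;lat&lt;/code&gt; vectors, and draw it as an extra layer. The example below uses the &lt;code&gt;nc.shp&lt;/code&gt; file bundled with &lt;code&gt;sf&lt;/code&gt; so that it runs without the GADM downloads. The 2-degree and 1-degree spacings are arbitrary choices, and the object names are only for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(sf)
nc &amp;lt;- st_read(system.file(&amp;quot;shape/nc.shp&amp;quot;, package = &amp;quot;sf&amp;quot;),
              quiet = TRUE)
laea &amp;lt;- st_crs(&amp;quot;+proj=laea +lat_0=30 +lon_0=-95&amp;quot;)
nc.la &amp;lt;- st_transform(nc, laea)
## Meridians every 2 degrees, parallels every degree:
grat &amp;lt;- st_graticule(nc.la, lon = seq(-86, -74, by = 2),
                     lat = seq(33, 37, by = 1))
plot(st_geometry(nc.la), col = &amp;quot;white&amp;quot;, axes = TRUE)
plot(st_geometry(grat), col = &amp;quot;gray&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;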
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Emacs for Bioinformatics #3: R and ESS</title>
      <link>https://plantarum.ca/2020/12/30/emacs-tutorial-03/</link>
      <pubDate>Wed, 30 Dec 2020 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2020/12/30/emacs-tutorial-03/</guid>
      <description>
&lt;script src=&#34;https://plantarum.ca/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;


&lt;p&gt;This is part three in my series of Emacs tutorials aimed at bioinformatics
(and other scientific analysis) workflows. See the rest on my
&lt;a href=&#34;https://plantarum.ca/tutorials/&#34;&gt;tutorials&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;Emacs support for the R programming language is provided by the
&lt;a href=&#34;https://ess.r-project.org/&#34; title=&#34;ESS&#34;&gt;ESS&lt;/a&gt; package (AKA, “Emacs Speaks
Statistics”). ESS has been around since at least 1994, and is supported by
a very active development team. It provides most or all of the features of
the more widely-known &lt;a href=&#34;https://rstudio.com/&#34; title=&#34;RStudio&#34;&gt;RStudio&lt;/a&gt;, as
well as a great many more. Like all things Emacs, if it doesn’t have a
feature you want, it’s likely someone else has written a package that
provides it; failing that, the motivated hacker can create their own
customizations using the built-in scripting language, elisp.&lt;/p&gt;
&lt;p&gt;However, let’s not let all that potential scare us off. Getting up and
running with ESS doesn’t require much effort at all.&lt;/p&gt;
&lt;div id=&#34;installation&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Installation&lt;/h1&gt;
&lt;div id=&#34;prerequisites&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;You need to have &lt;code&gt;R&lt;/code&gt; installed in order to use &lt;code&gt;ESS&lt;/code&gt;!&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;installing-ess-from-melpa&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Installing ESS from MELPA&lt;/h2&gt;
&lt;p&gt;The easiest way to install it is to use the &lt;a href=&#34;https://melpa.org/&#34; title=&#34;MELPA&#34;&gt;MELPA&lt;/a&gt; package repository. MELPA hosts Emacs packages provided by hackers
who are not part of the Emacs development team. (“packages” here has
roughly the same meaning as “plugins” or “extensions” in other software
systems).&lt;/p&gt;
&lt;p&gt;If you aren’t already using MELPA, you need to add it to your configuration
file (typically &lt;code&gt;~/.emacs.d/init.el&lt;/code&gt;, or &lt;code&gt;~/.emacs&lt;/code&gt;):&lt;/p&gt;
&lt;pre class=&#34;elisp&#34;&gt;&lt;code&gt;(require &amp;#39;package)
(add-to-list &amp;#39;package-archives
             &amp;#39;(&amp;quot;melpa&amp;quot; . &amp;quot;https://melpa.org/packages/&amp;quot;) t)
(package-initialize)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once this code is evaluated, you can view the complete list of packages
available on MELPA via &lt;code&gt;M-x package-list-packages&lt;/code&gt;. It’s a &lt;em&gt;big&lt;/em&gt; list, and
it will take a few seconds for Emacs to get the latest version from the
server (you need an internet connection for this).&lt;/p&gt;
&lt;p&gt;Search down to the entry for &lt;code&gt;ESS&lt;/code&gt;, select it by pressing &lt;code&gt;i&lt;/code&gt;, and then
install it by pressing &lt;code&gt;x&lt;/code&gt;. &lt;code&gt;ESS&lt;/code&gt; is one of the larger packages, so it may
take a few seconds to download and install all the files.&lt;/p&gt;
&lt;p&gt;Once this is done, you have &lt;code&gt;ESS&lt;/code&gt;, and don’t need to return to
&lt;code&gt;package-list-packages&lt;/code&gt; until you want to update to a new version (or add
some other packages).&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;getting-started&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Getting Started&lt;/h1&gt;
&lt;p&gt;ESS comes with a comprehensive manual. That will be your canonical
reference for learning about this package. However, you can get started
with just a few commands.&lt;/p&gt;
&lt;div id=&#34;interactive-r-session&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Interactive R Session&lt;/h2&gt;
&lt;p&gt;From within Emacs, start R with the command &lt;code&gt;M-x R&lt;/code&gt;. You will be prompted
for the project starting directory. Select whatever you like and press
enter. You will then be presented with an R shell:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/images/ess.jpg&#34; /&gt;&lt;/p&gt;
&lt;p&gt;You can enter code and view results here, just as you would in the terminal
in RStudio, or with R running on the command line. ESS uses the same code
to manage this as for &lt;a href=&#34;https://plantarum.ca/2020/06/16/emacs-tutorial-01/&#34;&gt;shell mode&lt;/a&gt;. That
means we can use the same keybindings here:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;&amp;lt;tab&amp;gt;&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;with the cursor at the active prompt, &lt;code&gt;tab&lt;/code&gt; will complete function and
variable names, as well as the arguments for functions&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;&amp;lt;enter&amp;gt;&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;with the cursor at the active prompt, send the command on the prompt to R
for evaluation&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;&amp;lt;enter&amp;gt;&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;With the cursor on a previous command, re-enter that command&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;M-p&lt;/code&gt; and &lt;code&gt;M-n&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Move through your command history at the active prompt&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-p&lt;/code&gt; and &lt;code&gt;C-c C-n&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Move the cursor to the &lt;em&gt;previous&lt;/em&gt; and &lt;em&gt;next&lt;/em&gt; prompts&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c &amp;lt;enter&amp;gt;&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;With the cursor on a previous command, copy that command to the active
prompt, but don’t enter it. This allows you to edit a previous command
before sending a new variation to R for evaluation&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-o&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Delete the output from the previous command&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-v&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;Opens a prompt to select a help file, which will be displayed in Emacs
(you can also open help files from the prompt via &lt;code&gt;?&amp;lt;function&amp;gt;&lt;/code&gt;)&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;These few commands cover most general interactions. There are a lot more
features available. Check the &lt;code&gt;iESS&lt;/code&gt; menu item on the toolbar for some of
them; see the manual for the details.&lt;/p&gt;
&lt;div id=&#34;plots&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Plots&lt;/h3&gt;
&lt;p&gt;Calling plotting commands will create a new window (frame) for your figure.
There isn’t a dedicated pane in Emacs to display them, like in RStudio, and
you can’t scroll forward and backward through your history of images. You
can, however, create multiple image windows, and view them side by side.&lt;/p&gt;
&lt;p&gt;To create and manipulate new image windows, you’ll need the following
commands:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dev.new() ## Create a new plot window, and make it the
          ## active window
dev.set() ## If more than one plot window is open,
          ## make the next window the active
          ## window&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;See the &lt;code&gt;?dev&lt;/code&gt; help page for more details.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;writing-r-scripts&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Writing R Scripts&lt;/h2&gt;
&lt;p&gt;After &lt;code&gt;ESS&lt;/code&gt; is installed, anytime you open a file with a &lt;code&gt;.r&lt;/code&gt; or &lt;code&gt;.R&lt;/code&gt;
extension, it will be in &lt;code&gt;ESS[R]&lt;/code&gt; mode. You can enter text as usual, and
additionally have the following helpful commands available:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;C-c C-n&lt;/code&gt; or &lt;code&gt;C-c &amp;lt;enter&amp;gt;&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;send the current line to the R process and step to the next&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-r&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;send the current region to the R process&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-f&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;send the current function to the R process&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-c&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;send the current region, paragraph, or function to the R process&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-b&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;send the entire buffer to the R process&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-v&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;prompt for a help file to open&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;M-tab&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;tab completion of objects (functions, variables, file names) and function
arguments&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;If there is no &lt;code&gt;R&lt;/code&gt; process running when you try to send code, you will be
prompted for a working directory in which to start a new process. In
addition, you can manage &lt;code&gt;R&lt;/code&gt; processes with the following commands:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;C-c C-z&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;switch from the script buffer to the process buffer (and vice versa)&lt;/p&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;C-c C-s&lt;/code&gt;&lt;/dt&gt;
&lt;dd&gt;&lt;p&gt;change the process linked to the current script buffer (e.g., if you want
to run multiple R processes at once, with different scripts in each process)&lt;/p&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;next-steps&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Next Steps&lt;/h1&gt;
&lt;p&gt;This may well be all you need, and if that’s the case, you’re all done.
However, there is a lot more available to you, including support for
writing documentation, package development, managing git repositories,
editing on remote servers, and more.&lt;/p&gt;
&lt;p&gt;My advice is to start slowly. The pointers on this page will get you up and
running. When you find yourself repeating something tedious multiple times,
it may be time to investigate if there’s a shortcut available to make your
life easier. I recommend skimming the manual, to get a sense of all that’s
available, and if something catches your eye see about incorporating it
into your workflow.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Plotting Simple Maps in R</title>
      <link>https://plantarum.ca/2020/10/30/simple-maps-r/</link>
      <pubDate>Fri, 30 Oct 2020 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2020/10/30/simple-maps-r/</guid>
      <description>


&lt;p&gt;NOTE: This tutorial uses older R packages that are scheduled to be
deprecated at the end of 2023. I have updated this tutorial using the new
packages. Unless you need to use older code, you should use the new
&lt;a href=&#34;https://plantarum.ca/2023/02/13/terra-maps&#34;&gt;Terra-based approach&lt;/a&gt; instead of this!&lt;/p&gt;
&lt;div id=&#34;reference&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Reference&lt;/h1&gt;
&lt;p&gt;See the &lt;a href=&#34;https://rspatial.org/raster/spatial/index.html&#34;&gt;RSpatial
tutorial&lt;/a&gt; for a
more detailed introduction/overview of using R for GIS/spatial analysis.
The following tutorial walks through some common plotting tasks I use for
distribution models.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;basemaps&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Basemaps&lt;/h1&gt;
&lt;p&gt;The &lt;code&gt;raster&lt;/code&gt; package provides the function &lt;code&gt;getData&lt;/code&gt;, which is a handy way
to download basemaps for plotting. (You can also use it to get WorldClim
data; see the man page.) The first time you call it, it will download the
requested maps from the internet. It will save the data in your working
directory, or in a location specified with the &lt;code&gt;path&lt;/code&gt; argument. The next
time you request the same map from &lt;code&gt;getData&lt;/code&gt;, if it finds it in the local
directory it will load it from there, rather than downloading it again.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(raster)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required package: sp&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Loading required package: methods&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: multiple methods tables found for &amp;#39;metadata&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;us &amp;lt;- getData(&amp;quot;GADM&amp;quot;, country = &amp;quot;USA&amp;quot;, level = 1,
             path = &amp;quot;./data/maps/&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in getData(&amp;quot;GADM&amp;quot;, country = &amp;quot;USA&amp;quot;, level = 1, path = &amp;quot;./data/maps/&amp;quot;): getData will be removed in a future version of raster
## . Please use the geodata package instead&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;canada &amp;lt;- getData(&amp;quot;GADM&amp;quot;, country = &amp;quot;CAN&amp;quot;, level = 1,
                 path = &amp;quot;./data/maps&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in getData(&amp;quot;GADM&amp;quot;, country = &amp;quot;CAN&amp;quot;, level = 1, path = &amp;quot;./data/maps&amp;quot;): getData will be removed in a future version of raster
## . Please use the geodata package instead&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These maps can be plotted directly with the &lt;code&gt;plot&lt;/code&gt; command. If you want to
combine them, use the &lt;code&gt;add = TRUE&lt;/code&gt; argument to the second &lt;code&gt;plot&lt;/code&gt; call:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(us)
plot(canada, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/map%20plots-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;You can combine multiple vector maps into a single map with &lt;code&gt;bind&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;na &amp;lt;- bind(us, canada)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;These maps are ‘unprojected’, meaning they are plotted in
latitude/longitude degrees. That makes it easy to set the plot boundaries:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(na, xlim = c(-100, -50), ylim = c(30, 60))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/zooming%20a%20map-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NB:&lt;/strong&gt; The size of your plot canvas is fixed, but a map can’t stretch. The
x and y dimensions have to maintain the same aspect ratio. That means zooming in
one dimension (e.g., latitude only) won’t necessarily change the zoom of
your map, if the other dimension fills the canvas. You’ll have to play
around with the plot size, and both x and y dimensions together, to tweak
your zoom.&lt;/p&gt;
&lt;p&gt;It’s handy to have a shapefile of the Great Lakes, for making prettier
maps. I created this one in QGIS and use it for plotting:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;greatlakes &amp;lt;- shapefile(&amp;quot;data/maps/greatlakes.shp&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;adding-data&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Adding Data&lt;/h1&gt;
&lt;p&gt;You can add points to the plot like a regular scatter plot:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(scales)  ## for the alpha function below
gbif &amp;lt;- read.table(&amp;quot;data/trich-gbif.csv&amp;quot;)
## Set the line color to gray to focus on the data points:
plot(na, xlim = c(-100, -50), ylim = c(30, 60),
     border = &amp;quot;gray&amp;quot;)
points(gbif$X, gbif$Y, pch = 16,
       col = alpha(&amp;quot;green&amp;quot;, 0.2))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/adding%20points-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;You can also convert your points to a spatial points object, in which case
R will know which columns to use for plotting. This is also necessary
before we can project our data (see below).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;coordinates(gbif) &amp;lt;- ~X+Y
## plot(na, xlim = c(-100, -50), ylim = c(30, 60),
##      border = &amp;quot;gray&amp;quot;)
## points(gbif, pch = 16, col = alpha(&amp;quot;green&amp;quot;, 0.2))&lt;/code&gt;&lt;/pre&gt;
&lt;div id=&#34;rasters&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Rasters&lt;/h2&gt;
&lt;p&gt;Similarly, you can plot rasters with &lt;code&gt;plot&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trichPreds &amp;lt;- raster(&amp;quot;./data/trichPreds&amp;quot;)
plot(trichPreds, xlim = c(-100, -50), ylim = c(30, 60))
plot(na, border = &amp;quot;gray&amp;quot;, lwd = 0.5, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/loading%20rasters-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Cells with &lt;code&gt;NA&lt;/code&gt; values are transparent. In this case, a species
distribution model, low values are displayed in gray. This may be useful
for visualizing the extent of the model. However, it looks a bit odd, and
makes it hard to see the limits of the high-suitability areas. You can tweak
this by playing with the color ramp, but it’s also handy to ‘turn off’ the
low values entirely (for visualization, &lt;strong&gt;not&lt;/strong&gt; for analysis!!)&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;trichPredsTrim &amp;lt;- trichPreds
trichPredsTrim[trichPredsTrim &amp;lt;
               quantile(getValues(trichPreds),
                        probs = 0.75, na.rm = TRUE)] &amp;lt;- NA
plot(trichPredsTrim, xlim = c(-100, -50), ylim = c(30, 60))
plot(na, border = &amp;quot;grey&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/trimming%20predictions-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The test I used here, &lt;code&gt;trichPredsTrim &amp;lt; quantile(getValues(trichPreds), probs = 0.75, na.rm = TRUE)&lt;/code&gt; identifies all cells in the lower 75% of the
suitability scores, which I then set to &lt;code&gt;NA&lt;/code&gt; to make them invisible. I
decided on 75% after experimenting with different values. In this case, 75%
drops most of the grey background (the very lowest values), without eating
into the areas that the prediction indicates are suitable.&lt;/p&gt;
&lt;p&gt;You could also use an absolute value here, but then you’d need to know the
actual distribution of the suitability scores. &lt;code&gt;quantile&lt;/code&gt; is easier to
tweak.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;projections&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Projections&lt;/h1&gt;
&lt;p&gt;Lat/Lon maps look a bit square; we’re more used to seeing maps projected. A
common projection for Canada is Lambert Conformal Conic. We can transform
our data to this projection to make nicer maps:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## define the projection
canlam &amp;lt;- CRS(&amp;quot;+proj=lcc +lat_1=49 +lat_2=77 +lat_0=49 +lon_0=-95 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs&amp;quot;)

## project our vector data:
na.lcc &amp;lt;- spTransform(na, canlam)
gl.lcc &amp;lt;- spTransform(greatlakes, canlam)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: PROJ support is provided by the sf and terra packages among others&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## We already converted gbif to a spatial points object above!
## Now we need to set the projection of our points:
crs(gbif) &amp;lt;- CRS(&amp;quot;+proj=longlat +datum=WGS84&amp;quot;)

## Finally, we can project our points to LCC:
gbif.lcc &amp;lt;- spTransform(gbif, canlam)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that our data needs to be in an object of class &lt;code&gt;Spatial*&lt;/code&gt;, and it
must have a defined coordinate reference system (CRS) before we can project
it to a new CRS. Setting the coordinates of our points via the
&lt;code&gt;coordinates&lt;/code&gt; function creates a &lt;code&gt;Spatial*&lt;/code&gt; object. The &lt;code&gt;crs&lt;/code&gt; function
allows us to explicitly set the projection. To use it, we need to know the
PROJ string or EPSG code for the projection. The function &lt;code&gt;make_EPSG&lt;/code&gt; in the package
&lt;code&gt;rgdal&lt;/code&gt; is helpful for finding this information. See &lt;a href=&#34;https://rspatial.org/raster/spatial/6-crs.html#notation&#34;&gt;the RSpatial
tutorial&lt;/a&gt; for
details.&lt;/p&gt;
&lt;p&gt;There are a few more steps for raster layers:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rasterLCC &amp;lt;- projectExtent(trichPredsTrim, canlam)
res(rasterLCC) &amp;lt;- 10000 ## set the cell size to 10km
predLCC &amp;lt;- projectRaster(trichPredsTrim, rasterLCC)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that I set the resolution to 10km here. That’s the size of the raster
cells. The original raster cells, in the lat/lon projection, were at 30
arc-second resolution, which is about 1km. I could have set a smaller cell
size here. However, since I’m only using this map for visualization, 10km
is fine enough for my plot, and will run faster (and take less
memory) than a map with 1km cell size.&lt;/p&gt;
&lt;p&gt;Now we can plot our data in the Lambert Conformal Conic projection:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(predLCC)
plot(na.lcc, border = &amp;quot;grey&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/plot%20projected-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The units are no longer Lat/Lon, but meters. We can read them off the plot
to improve the zoom:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(predLCC, xlim = c(0, 2500000),
     ylim = c(-1500000, -400000))
plot(na.lcc, border = &amp;quot;grey&amp;quot;, add = TRUE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/projected%20zoom-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;formatting&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Formatting&lt;/h1&gt;
&lt;p&gt;With the data plotted, we can then turn to making the map a little
prettier:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Make a panel with two plots, set the right margin tight:
par(mar = c(0.1,0.1,0.1,0), mfrow = c(1, 2))

## store the plot limits:
my_xlims &amp;lt;- c(0, 2500000) 
my_ylims &amp;lt;- c(-1300000, -200000)

## Plot the points:
plot(na.lcc, xlim = my_xlims , ylim = my_ylims,
     border = &amp;quot;grey&amp;quot;, bg = &amp;quot;lightblue&amp;quot;, col = &amp;quot;white&amp;quot;)
plot(gl.lcc, add = TRUE, border = &amp;quot;grey&amp;quot;, col = &amp;quot;lightblue&amp;quot;)
points(gbif.lcc, pch = 16, col = alpha(&amp;quot;grey30&amp;quot;, 0.2),
       cex = 0.7)
box() 

## tighten up the left margin:
par(mar = c(0.1,0,0.1,0.1))
plot(na.lcc, xlim = my_xlims , ylim = my_ylims,
     border = &amp;quot;grey&amp;quot;, bg = &amp;quot;lightblue&amp;quot;, col = &amp;quot;white&amp;quot;)
plot(gl.lcc, add = TRUE, border = &amp;quot;grey&amp;quot;, col = &amp;quot;lightblue&amp;quot;)
plot(predLCC, add = TRUE, legend = FALSE)

## plotted again to put the border lines on top:
plot(na.lcc, border = &amp;quot;grey&amp;quot;, add = TRUE) 
box()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-10-30-r-maps_files/figure-html/pretty%20plot-1.png&#34; width=&#34;696&#34; /&gt;&lt;/p&gt;
&lt;p&gt;If you want to plot the state/provincial borders &lt;em&gt;on top&lt;/em&gt; of the raster,
you need to add those layers last. But you can’t set the background colour
of the raster layer to “lightblue” (or at least I haven’t figured that
out), so the ocean stays white. I get around that by plotting the
boundaries twice, first to set the background colour, and then to put the
state lines on top of the raster.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Medium Performance Cluster Computing</title>
      <link>https://plantarum.ca/2014/08/19/medium-performance-cluster-computing/</link>
      <pubDate>Tue, 19 Aug 2014 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2014/08/19/medium-performance-cluster-computing/</guid>
      <description>&lt;p&gt;I recently ran into a crunch getting some memory-intensive GIS analysis
completed. My work laptop has 2 CPUs and 4GB RAM, and running one instance
of the &lt;a href=&#34;http://grass.osgeo.org&#34;&gt;GRASS GIS&lt;/a&gt; &lt;code&gt;r.horizon&lt;/code&gt; command on a 16GB
map was gobbling up 8GB of virtual RAM, which temporarily ground my machine
to a crawl before the process was killed.&lt;/p&gt;
&lt;p&gt;GRASS is not yet installed on the high performance cluster at work, so I
decided to try setting up my own &lt;em&gt;medium&lt;/em&gt; performance cluster on a
&lt;a href=&#34;https://www.digitalocean.com/?refcode=0c9c59e00cc0&#34;&gt;Digital Ocean&lt;/a&gt;&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; VPS
(which they refer to as &amp;lsquo;droplets&amp;rsquo;).&lt;/p&gt;
&lt;h1 id=&#34;why&#34;&gt;Why?&lt;/h1&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;strong&gt;Price&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;Not only are their prices competitively low, they charge &lt;em&gt;by the hour&lt;/em&gt;.
This means you can have your own virtual machine with 64GB RAM and 20
CPUs for less than $1/hour. What&amp;rsquo;s more, the minimum period is one hour.
Meaning, in theory, you could spin up a droplet, get 20 hours of
processing done, and then shut it all down again with no further
commitment.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Quick &amp;amp; Easy Installation&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;They offer a variety of GNU/Linux distributions, and once you&amp;rsquo;ve chosen
you&amp;rsquo;ll be able to log on to your VPS in 60 seconds. This assumes you&amp;rsquo;re
comfortable working on the command line. But if you&amp;rsquo;re doing cluster
computing and that&amp;rsquo;s not already true, you&amp;rsquo;ll need to learn anyway.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;No Queues&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;No need to worry about submitting batch jobs to a queue or waiting your
turn. It&amp;rsquo;s your VPS, not shared with anyone else.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Reuse Your Installation&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;You pay for the time your droplet is available. However, you can save a
&amp;lsquo;snapshot&amp;rsquo;, which is stored in your account. This allows you to destroy
your droplet when you don&amp;rsquo;t need it. Then, simply reload it from the saved
snapshot when you next need to crunch some numbers.&lt;/dd&gt;
&lt;dt&gt;&lt;strong&gt;Nerdy fun&lt;/strong&gt;&lt;/dt&gt;
&lt;dd&gt;I have to admit, I was motivated in part by sheer, unbridled nerdy
curiosity. Who wouldn&amp;rsquo;t want to ssh in to their very own server?&lt;/dd&gt;
&lt;/dl&gt;
&lt;h1 id=&#34;why-not&#34;&gt;Why not?&lt;/h1&gt;
&lt;p&gt;This won&amp;rsquo;t be a practical solution in all cases. The longer your job will
take to run, the more practical a real cluster becomes. It&amp;rsquo;s also worth
noting that the CPUs aren&amp;rsquo;t particularly high-powered. So processes won&amp;rsquo;t
run faster than on a recent laptop, assuming memory isn&amp;rsquo;t limiting. Another
thing to consider is how much data you have to upload. This is not a viable
approach for true &amp;lsquo;big data&amp;rsquo; projects! Sending gigabytes over the open
internet can be a very slow process, which is another point in favour of
using a local HPC cluster.&lt;/p&gt;
&lt;h1 id=&#34;how-to&#34;&gt;How-To&lt;/h1&gt;
&lt;p&gt;With that in mind, here&amp;rsquo;s how I set up my temporary cluster:&lt;/p&gt;
&lt;h2 id=&#34;purchase-the-droplet&#34;&gt;Purchase the droplet&lt;/h2&gt;
&lt;p&gt;Browse over to &lt;a href=&#34;https://digitalocean.com&#34;&gt;Digital Ocean&lt;/a&gt; and sign up. Once
you&amp;rsquo;re logged in, click &amp;lsquo;create&amp;rsquo; and fill in your details:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hostname: it&amp;rsquo;s your server, call it what you like&lt;/li&gt;
&lt;li&gt;Size: they offer everything from 512MB/1 CPU up to 64GB/20 CPUs, with
prices varying accordingly. For my project I selected 32GB/12 CPUs for
$0.476/hour&lt;/li&gt;
&lt;li&gt;Region: pick something close, particularly if you&amp;rsquo;ll be up/downloading a
lot of data. In my case, that&amp;rsquo;s New York.&lt;/li&gt;
&lt;li&gt;Linux Distribution: Choices include Ubuntu, Fedora, Debian, CentOS. I
picked Debian, as that&amp;rsquo;s been my OS for the past decade. Regardless of
the distribution, if you&amp;rsquo;re going to be working with large files, you
will definitely want to select the 64bit version of your OS.&lt;/li&gt;
&lt;/ul&gt;
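&lt;p&gt;Because billing is hourly, it&amp;rsquo;s easy to estimate what a job will cost before you commit. Here is a minimal sketch, using the rate quoted above and a hypothetical 20-hour job. The &lt;code&gt;rate&lt;/code&gt;, &lt;code&gt;hours&lt;/code&gt; and &lt;code&gt;cost&lt;/code&gt; variables are just illustrative names, and current prices will differ from the 2014 figure used here:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Hypothetical job length; the rate is the 32GB/12 CPU price quoted above.
rate=0.476
hours=20
# The shell only does integer arithmetic, so let awk do the multiplication.
cost=$(awk -v r=&amp;quot;$rate&amp;quot; -v h=&amp;quot;$hours&amp;quot; 'BEGIN { printf &amp;quot;%.2f&amp;quot;, r * h }')
echo &amp;quot;Estimated cost in dollars: $cost&amp;quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;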
&lt;h2 id=&#34;security&#34;&gt;Security&lt;/h2&gt;
&lt;p&gt;Your root password and dedicated IP address will be emailed to you. Which
means the NSA will know it before you do. So immediately log in and change
your password. Actually, as soon as you log in you will be required to
change your password in any case.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ssh root@123.45.67.89
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You&amp;rsquo;ll probably also want a regular-strength user for non-administrative
work, so add that next:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;adduser tyler
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;At this point, you can log in as root or as your regular user. A more
secure option is to authenticate via RSA keys. If you haven&amp;rsquo;t done this
before, generate the key on your laptop/local computer:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ssh-keygen -t rsa
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note that you can use a blank passphrase here. Doing so will allow you to
log in to the server without entering a password from now on. It also means
that anyone who has physical access to your laptop will also be able to log in
to the server without a password. If you&amp;rsquo;ve lost control of your laptop,
this is likely the least of your worries.&lt;/p&gt;
&lt;p&gt;Next, transfer the key to the server:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ssh-copy-id tyler@123.45.67.89
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You&amp;rsquo;ll be asked for your password again here. Now try logging into the
server again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ssh tyler@123.45.67.89
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If everything is working correctly, you should be logged in to your droplet
without entering a password. If that is the case, we can proceed to shore
up our security. &lt;code&gt;su&lt;/code&gt; to root user, and edit the ssh config files:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;su
nano /etc/ssh/sshd_config
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Look for and modify the following lines, then save the file. You need to
remove the comment character (&lt;code&gt;#&lt;/code&gt;) from the beginning of the line, if it&amp;rsquo;s
there, and make sure they say &amp;lsquo;no&amp;rsquo;, not &amp;lsquo;yes&amp;rsquo;. You don&amp;rsquo;t need to modify any
other lines.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PermitRootLogin no
PasswordAuthentication no
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The second line will prevent anyone from logging in using a password -
you&amp;rsquo;ll only be able to log in if you have the correct RSA key on your
computer. This prevents ne&amp;rsquo;er-do-wells from trying to crack your password.&lt;/p&gt;
&lt;p&gt;The first line will prevent anyone (including you!) from logging in
directly as root. Meaning a potential cracker will have to get your RSA key
in order to log on to the machine, and then they&amp;rsquo;ll have to crack the root
password in order to do anything really nasty.&lt;/p&gt;
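&lt;p&gt;If you&amp;rsquo;d rather not make these two changes by hand, something like the following should work. Treat it as a sketch rather than a tested recipe: it assumes GNU &lt;code&gt;sed&lt;/code&gt; (the default on Debian), and it deliberately works on a scratch copy with made-up starting values, so nothing on the server changes. The &lt;code&gt;cfg&lt;/code&gt; variable is just an illustrative name.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Sketch only: edit a scratch file, not the live /etc/ssh/sshd_config.
cfg=$(mktemp)
printf '%s\n' '#PermitRootLogin yes' 'PasswordAuthentication yes' &amp;gt; &amp;quot;$cfg&amp;quot;

# Uncomment each setting (if needed) and force its value to 'no'.
sed -i -E \
  -e 's/^#?[[:space:]]*(PermitRootLogin)[[:space:]].*/\1 no/' \
  -e 's/^#?[[:space:]]*(PasswordAuthentication)[[:space:]].*/\1 no/' \
  &amp;quot;$cfg&amp;quot;

# Show the result; both lines should now end in 'no'.
grep -E '^(PermitRootLogin|PasswordAuthentication) ' &amp;quot;$cfg&amp;quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;On the droplet you would point the same &lt;code&gt;sed&lt;/code&gt; call at &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt; as root. Running &lt;code&gt;sshd -t&lt;/code&gt; afterwards checks the file for syntax errors, which is worth doing before you reload the service and risk locking yourself out.&lt;/p&gt;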
&lt;p&gt;Note that if you want to access the server from another computer, you&amp;rsquo;ll
have to log in from each computer via password, or at least &lt;code&gt;ssh-copy-id&lt;/code&gt;
the RSA key, before you set &lt;code&gt;PasswordAuthentication no&lt;/code&gt;. Or, afterwards,
simply set it back to &lt;code&gt;PasswordAuthentication yes&lt;/code&gt; briefly from the first
computer, long enough for the second computer to log on and &lt;code&gt;ssh-copy-id&lt;/code&gt;
its RSA key.&lt;/p&gt;
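&lt;p&gt;One final configuration detail: if you&amp;rsquo;re going to use an X server (to view
graphical windows of any kind), X11 forwarding has to be allowed. On the server,
that means &lt;code&gt;X11Forwarding yes&lt;/code&gt; in &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt;. The matching
client-side option lives in &lt;code&gt;/etc/ssh/ssh_config&lt;/code&gt; on your local machine.
Make sure it includes the following uncommented line:&lt;/p&gt;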
&lt;pre&gt;&lt;code&gt;ForwardX11 yes
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Now you need to reload the modified configuration. Still as root:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;/etc/init.d/ssh reload
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Given that we won&amp;rsquo;t have any outward facing servers on this machine, that
should do for our security for now. If you do want to put some servers on
here, you&amp;rsquo;ll definitely want to look into getting at least a firewall, and
probably some intrusion detection software on here. The DigitalOcean
tutorials are quite good in this area.&lt;/p&gt;
&lt;h2 id=&#34;install-software&#34;&gt;Install Software&lt;/h2&gt;
&lt;p&gt;Now that we have a passably secured machine, it&amp;rsquo;s time to install the
software you&amp;rsquo;ll want. Given this is Debian:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;run &lt;code&gt;aptitude&lt;/code&gt; as root&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;u&lt;/strong&gt;pdate&lt;/li&gt;
&lt;li&gt;install any security updates&lt;/li&gt;
&lt;li&gt;select and install your desired programs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In my case, to run GRASS, I needed the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;aptitude install grass emacs screen htop avce00 &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  e00compr git mercurial xorg
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;(yes, you can run &lt;code&gt;aptitude&lt;/code&gt; from the command line just like &lt;code&gt;apt-get&lt;/code&gt; if
you like!)&lt;/p&gt;
&lt;h2 id=&#34;setup-grass&#34;&gt;Setup GRASS&lt;/h2&gt;
&lt;p&gt;GRASS will require you to create your database directory on the server
(&lt;strong&gt;not&lt;/strong&gt; as root!):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;mkdir ~/grassdata
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next you need to transfer any files you need from your &lt;em&gt;local&lt;/em&gt; machine:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;rsync -az --progress --compress-level&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;9&lt;/span&gt; --partial &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;  grassdata/location tyler@123.45.67.89:grassdata/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The options here include &lt;strong&gt;a&lt;/strong&gt;, which among other things will transfer
directories recursively, &lt;strong&gt;z&lt;/strong&gt;, which will compress files for the transfer
(which dramatically reduces upload times), &lt;strong&gt;compress-level=9&lt;/strong&gt;, which uses
the greatest amount of compression, and &lt;strong&gt;partial&lt;/strong&gt;, which allows rsync to
pick up where it left off in case the connection is interrupted.&lt;/p&gt;
&lt;h2 id=&#34;running-grass-inside-screen&#34;&gt;Running GRASS Inside Screen&lt;/h2&gt;
&lt;p&gt;Finally, we begin. Now we&amp;rsquo;re ready to use X windows, so when you log back
in you&amp;rsquo;ll want to use the &lt;strong&gt;X&lt;/strong&gt; and &lt;strong&gt;C&lt;/strong&gt; flags to &lt;code&gt;ssh&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;ssh -XC tyler@123.45.67.89
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After logging back into your server, run &lt;code&gt;screen&lt;/code&gt;. You&amp;rsquo;ll be transported
from a regular terminal window into a &lt;code&gt;screen&lt;/code&gt; window. It will look almost
exactly the same. But it gives you some super powers, as we&amp;rsquo;ll see shortly.&lt;/p&gt;
&lt;p&gt;Next, start GRASS. In order to conveniently run multiple processes at once,
I prefer to use text mode, hence I start with: &lt;code&gt;grass -text&lt;/code&gt;. Navigate
through the charmingly archaic text windows until you&amp;rsquo;re at the familiar
GRASS text prompt.&lt;/p&gt;
&lt;p&gt;One very helpful thing I discovered about GRASS is that each command is
really a stand-alone program. Which has the lovely side-effect of giving us
quick access to parallel programming. So long as any command &lt;em&gt;foo&lt;/em&gt; does not
require the output of command &lt;em&gt;bar&lt;/em&gt; to run, and vice versa, you can run them
both concurrently. Which means, in my case, I can do things like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;r.slope.aspect elevation&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;dem slope&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;myslope aspect&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;myaspect &amp;amp;
r.horizon elevin&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;elevation horizonstep&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;30&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    bufferzone&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;200&lt;/span&gt; horizon&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;horangle &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;    maxdistance&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;2000&lt;/span&gt; &amp;amp;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You might find yourself getting carried away, starting process after
process. You might want to check on the load your server is under, before
you max out your RAM or CPUs. Here is where &lt;code&gt;screen&lt;/code&gt; comes in handy.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/images/htop-screen.png&#34; alt=&#34;screen, with GRASS running in the top window and htop in the bottom&#34;&gt;&lt;/p&gt;
&lt;p&gt;Type &lt;code&gt;Ctrl-a c&lt;/code&gt;, and you&amp;rsquo;ll have a new terminal window to work in.
Your GRASS session is still working away in the background, and you can
check on how many processes you&amp;rsquo;ve spun out by calling &lt;code&gt;htop&lt;/code&gt;. To go back
to the GRASS session, &lt;code&gt;Ctrl-a &amp;quot;&lt;/code&gt; brings up a list of all the windows
available inside your &lt;code&gt;screen&lt;/code&gt; instance, which you can select from with the
arrow keys.&lt;/p&gt;
&lt;p&gt;Finally, you may need to shut down your laptop at some point while the
GRASS session is still running. To do this, we &lt;em&gt;detach&lt;/em&gt; the &lt;code&gt;screen&lt;/code&gt;
session, with &lt;code&gt;Ctrl-a d&lt;/code&gt;. This tucks the session away out of sight, but it
continues to run. It will continue to run even after we log out of the
server. When you want to reconnect to the session, simply enter &lt;code&gt;screen -r&lt;/code&gt;
at the command line and you&amp;rsquo;re back in charge.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;screen&lt;/code&gt; can do a lot more than this; check out the docs for details!&lt;/p&gt;
&lt;/blockquote&gt;
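&lt;p&gt;One more trick for the background jobs started in the GRASS example above: if a later command &lt;em&gt;does&lt;/em&gt; depend on their output, the shell&amp;rsquo;s &lt;code&gt;wait&lt;/code&gt; builtin will hold things up until they finish. A minimal sketch, with &lt;code&gt;sleep&lt;/code&gt; standing in for the real GRASS commands and illustrative variable names:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Stand-ins for two independent GRASS commands; 'sleep' is only a placeholder.
sleep 1 &amp;amp;
pid_a=$!
sleep 2 &amp;amp;
pid_b=$!

# Block until each background job has finished, recording its exit status.
wait &amp;quot;$pid_a&amp;quot;; status_a=$?
wait &amp;quot;$pid_b&amp;quot;; status_b=$?

echo &amp;quot;job A exit status: $status_a, job B exit status: $status_b&amp;quot;
# Only now start a command that needs the output of both jobs.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A non-zero exit status means that job failed, so it&amp;rsquo;s worth checking before you launch the dependent step.&lt;/p&gt;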
&lt;h1 id=&#34;goodbye-but-not-farewell&#34;&gt;Goodbye, but not Farewell&lt;/h1&gt;
&lt;p&gt;When you&amp;rsquo;ve completed all the work you need to do, you can save a snapshot
of your server to use later on. Log in to your Digital Ocean account,
select your droplet, and follow the links to create a snapshot. Once that&amp;rsquo;s
done, you can safely destroy the droplet. A destroyed droplet will no
longer accrue charges (and obviously it won&amp;rsquo;t be doing any processing
either). To reinstate your droplet, follow the same steps you used above to
create a droplet, but instead of selecting a Linux Distribution, select
your snapshot from the &lt;em&gt;My Images&lt;/em&gt; tab.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Comments or questions? Ping me on &lt;a href=&#34;https://ottawa.place/@plantarum&#34;&gt;mastodon&lt;/a&gt;,
or send me an email (address in sidebar)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;section class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34; role=&#34;doc-endnote&#34;&gt;
&lt;p&gt;Full disclosure: this link uses my referral code, so if you sign up through here I&amp;rsquo;ll get a small kick-back from Digital Ocean. I hope this won&amp;rsquo;t lower your confidence in what some random guy on the internet has written. &lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</description>
    </item>
    
    <item>
      <title>Publication Quality R Figures</title>
      <link>https://plantarum.ca/2014/02/19/r-graphics/</link>
      <pubDate>Wed, 19 Feb 2014 00:00:00 +0000</pubDate>
      
      <guid>https://plantarum.ca/2014/02/19/r-graphics/</guid>
      <description>
&lt;script src=&#34;https://plantarum.ca/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;

&lt;div id=&#34;TOC&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#introduction&#34;&gt;Introduction&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#learning-objectives&#34;&gt;Learning Objectives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#pre-requisites&#34;&gt;Pre-requisites&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#motivation&#34;&gt;Motivation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#building-our-plot&#34;&gt;Building Our Plot&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#size&#34;&gt;Size&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#content&#34;&gt;Content&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#plot-symbols&#34;&gt;Plot Symbols&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#margins&#34;&gt;Margins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#axes&#34;&gt;Axes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#the-finished-plot&#34;&gt;The finished plot&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#exercise-1-adding-a-legend&#34;&gt;Exercise 1: adding a legend&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#additional-customization&#34;&gt;Additional Customization&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#selecting-plot-symbols&#34;&gt;Selecting Plot Symbols&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#panels&#34;&gt;Panels&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#exercise-2-completing-the-panel&#34;&gt;Exercise 2: Completing the Panel&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#image-formats&#34;&gt;Image Formats&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#raster-images&#34;&gt;Raster Images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#vector-images&#34;&gt;Vector Images&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

&lt;div class=&#34;figure&#34;&gt;&lt;span style=&#34;display:block;&#34; id=&#34;fig:final&#34;&gt;&lt;/span&gt;
&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/final-1.png&#34; alt=&#34;A. Iris Sepal Size by Species. B. Iris Petal Width&#34; width=&#34;696&#34; /&gt;
&lt;p class=&#34;caption&#34;&gt;
Figure 1: A. Iris Sepal Size by Species. B. Iris Petal Width
&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;div id=&#34;learning-objectives&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Learning Objectives&lt;/h2&gt;
&lt;p&gt;At the end of this lesson, you should be able to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Customize plots produced with the R base graphics system&lt;/li&gt;
&lt;li&gt;Design multi-panel plots&lt;/li&gt;
&lt;li&gt;Design plots to suit the publication requirements of a journal&lt;/li&gt;
&lt;li&gt;Save your plots as high-resolution raster or vector image files as
required by your publisher&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;pre-requisites&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Pre-requisites&lt;/h2&gt;
&lt;p&gt;You will need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A recent version of R installed on your computer&lt;/li&gt;
&lt;li&gt;Familiarity with editing R scripts and passing commands from a script to the R
interpreter&lt;/li&gt;
&lt;li&gt;Note that RStudio is not ideal for this lesson, due to limitations in how
it processes plotting commands; the default RGui installed on Windows or
Mac will work better&lt;/li&gt;
&lt;li&gt;These notes!&lt;/li&gt;
&lt;li&gt;Optionally, some of your own data to work with during the exercises&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;motivation&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Motivation&lt;/h2&gt;
&lt;p&gt;You have several options for plotting with R. The simplest is the built-in
or base graphics package. Base graphics are less powerful than newer
alternatives like lattice or ggplot2. On the other hand, it’s much easier
to customize base graphics than the others. For this reason, I prefer to
use the built-in functions when preparing single-panel plots.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ggplot2&lt;/code&gt; is definitely worth investigating, especially if you want to
produce complex multi-panel faceted plots. The &lt;a href=&#34;http://ggplot2.org/&#34;&gt;official
website&lt;/a&gt; has all the documentation. Roger Peng has
also posted a very nice introductory &lt;a href=&#34;https://www.youtube.com/watch?v=HeqHMM4ziXA&amp;amp;feature=youtube_gdata_player&#34;&gt;video on
YouTube&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;building-our-plot&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Building Our Plot&lt;/h1&gt;
&lt;p&gt;For this example, we’ll use the guidelines provided by the &lt;a href=&#34;https://bsapubs.onlinelibrary.wiley.com/hub/journal/15372197/homepage/forauthors#ps&#34;&gt;American
Journal of
Botany&lt;/a&gt;. AJB
accepts figures 3.5 inches (1 column), 5-6 inches (1.5 columns), or 7.25
inches wide (2 columns). The height can be up to 9 inches. We’ll start with
a one-column plot, so the dimensions should be 3.5 inches wide.&lt;/p&gt;
&lt;p&gt;The figure we’ll plot is from the built-in &lt;code&gt;iris&lt;/code&gt; data set. We’ll do a
simple scatterplot of &lt;code&gt;Sepal.Length&lt;/code&gt; against &lt;code&gt;Sepal.Width&lt;/code&gt;.&lt;/p&gt;
&lt;div id=&#34;size&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Size&lt;/h2&gt;
&lt;p&gt;Let’s start with a square. If we need more height, we can increase the size
as necessary. Similarly, if we decide we need to stretch our figure over
two columns, we can change that later.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note that RStudio isn’t the best environment for this exercise.&lt;/strong&gt;
Unfortunately, it’s not possible to create new plot windows in RStudio, so
you can’t specify the dimensions of the figure for the on-screen display.
Consequently, when you save your figure to an image file, you won’t
necessarily get exactly what you see on the screen. In many cases, this may
well be fine. But double-check the image file to make sure that you get
what you expected. If you didn’t, it may be because the on-screen display
and the saved image were not close enough to the same dimensions.&lt;/p&gt;
&lt;p&gt;To set up the canvas for our plot, start a new device:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dev.new(height = 3.5, width = 3.5)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you are using RStudio, &lt;code&gt;dev.new()&lt;/code&gt; won’t work. Instead, drag the edges
of the plot window to get as close to a 3.5&#34; square as possible.&lt;/p&gt;
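&lt;p&gt;If you would rather not resize by hand, one possible workaround is the
&lt;code&gt;noRStudioGD&lt;/code&gt; argument of &lt;code&gt;dev.new()&lt;/code&gt;, which asks R to skip the
RStudio graphics device and open a native plot window instead. This is only a
sketch: whether it works depends on your platform and on your R and RStudio
versions, so treat it as something to try rather than a guarantee.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Ask for a native 3.5 x 3.5 inch window rather than the RStudio plot pane.
## noRStudioGD is an argument of dev.new() in recent versions of R.
dev.new(height = 3.5, width = 3.5, noRStudioGD = TRUE)

## Check the size (in inches) of the device we actually got:
dev.size(&amp;quot;in&amp;quot;)&lt;/code&gt;&lt;/pre&gt;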
&lt;/div&gt;
&lt;div id=&#34;content&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Content&lt;/h2&gt;
&lt;p&gt;Now that our canvas is ready, we can start placing our graphics. Let’s
start with the default plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(Sepal.Length ~ Sepal.Width, data = iris)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/iris-fig-default-source-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;plot-symbols&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Plot Symbols&lt;/h2&gt;
&lt;p&gt;The default plot uses the same symbol for each point. However, our data
frame includes samples from three different species:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;str(iris)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## &amp;#39;data.frame&amp;#39;:    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels &amp;quot;setosa&amp;quot;,&amp;quot;versicolor&amp;quot;,..: 1 1 1 1 1 1 1 1 1 1 ...&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Recall that a factor is just a vector of integers, with each integer having
its own label. We can display the underlying numbers by converting from
factor to numeric:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;as.numeric(iris$Species)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
##  [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [149] 3 3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is a very useful feature. It means we can set a different plot symbol
for each species:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;plot(Sepal.Length ~ Sepal.Width, pch = as.numeric(Species),
     data = iris)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/iris-fig-1-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;margins&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Margins&lt;/h2&gt;
&lt;p&gt;Now we can see what we’re working with, but the layout isn’t ideal. In
particular, the plot is very small relative to the size of the figure. We
can fix this with the &lt;code&gt;mar&lt;/code&gt; parameter. &lt;code&gt;mar&lt;/code&gt; takes a vector of four
numbers, which set the width of the margin on the bottom, left, top and
right sides respectively (remember clockwise from the bottom!). These
numbers refer to the width of each margin in &lt;code&gt;lines&lt;/code&gt; — i.e., the width
required for a single line of text. The default is &lt;code&gt;c(5, 4, 4, 2) + 0.1&lt;/code&gt;.
The top margin in particular is usually too wide. We will very rarely add a
title to a published figure, so we don’t need to set aside space for it.&lt;/p&gt;
&lt;p&gt;Set the value of &lt;code&gt;mar&lt;/code&gt; with the &lt;code&gt;par&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;par(mar = c(3, 3, 0.5, 0.5))
plot(Sepal.Length ~ Sepal.Width, pch = as.numeric(Species),
     data = iris)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/iris-fig-2-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;p&gt;That’s better. But we’ve lost our axis labels. They aren’t actually lost,
but they are plotted outside of the margins we’ve set, so they are no
longer visible. I find the defaults that R uses for the axes to be larger
than we need. Better to turn off the axes entirely and replot them
ourselves.&lt;/p&gt;
&lt;p&gt;Note that once the margins are set with &lt;code&gt;par()&lt;/code&gt;, they will keep their value
until we open a new plot window, or reset them.&lt;/p&gt;
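&lt;p&gt;One way to reset them is to keep the old settings around. As a small
sketch (the name &lt;code&gt;op&lt;/code&gt; is arbitrary, chosen just for this example):
&lt;code&gt;par()&lt;/code&gt; invisibly returns the previous values of whatever parameters it
changes, so we can store them and hand them back to &lt;code&gt;par()&lt;/code&gt; later.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;op &amp;lt;- par(mar = c(3, 3, 0.5, 0.5))  # set new margins, save the old ones
par(&amp;quot;mar&amp;quot;)                          # confirm the margins now in effect
plot(Sepal.Length ~ Sepal.Width, pch = as.numeric(Species),
     data = iris)
par(op)                             # restore the saved margins&lt;/code&gt;&lt;/pre&gt;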
&lt;/div&gt;
&lt;div id=&#34;axes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Axes&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;par(mar = c(3, 3, 0.5, 0.5))
plot(Sepal.Length ~ Sepal.Width, pch = as.numeric(Species),
     data = iris, 
     ann = FALSE,      # turn off axis titles
     axes = FALSE)     # turn off the axes and the box&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/iris-fig-3-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Notice that &lt;code&gt;axes = FALSE&lt;/code&gt; has turned off the box around our plot. We can put it back easily:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;box()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we can explicitly add each axis with its size and placement specified:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;axis(side = 1, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) 
axis(side = 2, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/iris-fig-3b-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;p&gt;What just happened? Let’s break down the arguments to &lt;code&gt;axis&lt;/code&gt;:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;side&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;which side of the plot, clockwise from bottom, same as for &lt;code&gt;mar&lt;/code&gt;
above
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;tcl&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;length of the ticks. Negative values indicate extending outwards
from plot, positive values extend inward. The default is -0.5, which I
find a bit too long.
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;mgp&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;&lt;code&gt;margin line&lt;/code&gt;, a vector of three numbers, which indicate the
position of the axis title, axis labels, and axis line, respectively. The
values are the number of &lt;code&gt;lines&lt;/code&gt; away from the plot border to place each
item, with &lt;code&gt;0&lt;/code&gt; indicating the margin of the plot area. Note that the title
position doesn’t matter here, since we aren’t using an axis title (yet).
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;cex.axis&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;axis character expansion. Scale the size of the tick labels.
&amp;lt; 1 reduces the size, &amp;gt; 1 increases the size.
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;Now we can add our axis titles back in:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mtext(&amp;quot;Sepal Width&amp;quot;, side = 1, line = 1.5)
mtext(&amp;quot;Sepal Length&amp;quot;, side = 2, line = 1.5)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here, we can use &lt;code&gt;line&lt;/code&gt; to adjust the distance between the label text and the axis.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;the-finished-plot&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;The finished plot&lt;/h2&gt;
&lt;p&gt;Putting this all together gives us the following plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;par(mar = c(3, 3, 0.5, 0.5))
plot(Sepal.Length ~ Sepal.Width, pch = as.numeric(Species),
     data = iris, 
     ann = FALSE,      # turn off axis titles
     axes = FALSE)     # turn off the axes and the box
box()
axis(side = 1, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) 
axis(side = 2, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) 
mtext(&amp;quot;Sepal Width&amp;quot;, side = 1, line = 1.5)
mtext(&amp;quot;Sepal Length&amp;quot;, side = 2, line = 1.5)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/iris-finished-plot-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;exercise-1-adding-a-legend&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Exercise 1: adding a legend&lt;/h3&gt;
&lt;p&gt;We now have a complete figure. We could provide an explanation of the
symbols in the caption, but it might be nicer to have a legend plotted on
the figure. This is easily done with the &lt;code&gt;legend()&lt;/code&gt; function.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;legend(legend = levels(iris$Species), x=&amp;quot;topleft&amp;quot;,
       pch = 1:3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/legend-code-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Since we set the &lt;code&gt;pch&lt;/code&gt; argument in our plots using the factor
&lt;code&gt;iris$Species&lt;/code&gt;, we can use the &lt;code&gt;levels()&lt;/code&gt; function to extract the labels for
the legend. &lt;code&gt;pch&lt;/code&gt; indicates the actual symbols to use, and &lt;code&gt;x&lt;/code&gt; is the
location of the legend.&lt;/p&gt;
&lt;p&gt;This is clearly not “publication quality.” Our plot needs a bit more space
for the legend. See if you can make an attractive plot. The following
options might be helpful:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;&lt;code&gt;dev.new()&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;&lt;code&gt;width, height&lt;/code&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;plot()&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;&lt;code&gt;xlim, ylim, cex&lt;/code&gt;
&lt;/dd&gt;
&lt;dt&gt;&lt;code&gt;legend()&lt;/code&gt;:&lt;/dt&gt;
&lt;dd&gt;&lt;code&gt;x, y, bty, horiz, cex, pt.cex, text.width&lt;/code&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;If you need a hint, take a look at the next section.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;additional-customization&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Additional Customization&lt;/h1&gt;
&lt;div id=&#34;selecting-plot-symbols&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Selecting Plot Symbols&lt;/h2&gt;
&lt;p&gt;If you want to select different symbols, it’s easy to do using R’s
subsetting syntax. By default, for three levels of our &lt;code&gt;Species&lt;/code&gt; factor, we
get symbols 1, 2, and 3. If instead we wanted to use symbols&lt;a href=&#34;#fn1&#34; class=&#34;footnote-ref&#34; id=&#34;fnref1&#34;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; 19, 5, and
3, we could do this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;par(mar = c(3, 3, 0.5, 0.5))
mysymbols &amp;lt;- c(19, 5, 3)
plot(Sepal.Length ~ Sepal.Width,
     pch = mysymbols[as.numeric(Species)], data = iris,
     ylim = c(4.0, 8.5), ann = FALSE, axes = FALSE)
box()
axis(side = 1, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) 
axis(side = 2, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) 
mtext(&amp;quot;Sepal Width&amp;quot;, side = 1, line = 1.5)
mtext(&amp;quot;Sepal Length&amp;quot;, side = 2, line = 1.5)
legend(legend = levels(iris$Species), x = &amp;quot;top&amp;quot;,
       pch = mysymbols, horiz = TRUE, bty = &amp;#39;n&amp;#39;,
       cex = 0.9, text.width = c(0.6, 0.7, 0.6))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/symbols-1.png&#34; width=&#34;336&#34; /&gt;&lt;/p&gt;
&lt;p&gt;That’s an important application of &lt;code&gt;R&lt;/code&gt;’s subsetting commands, so make sure
you follow what happened — we subset the &lt;code&gt;mysymbols&lt;/code&gt; vector with the
longer &lt;code&gt;as.numeric(iris$Species)&lt;/code&gt; vector, which converted the values from
&lt;code&gt;(1, 2, 3)&lt;/code&gt; to &lt;code&gt;(19, 5, 3)&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mysymbols&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 19  5  3&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;as.numeric(iris$Species)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
##  [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [149] 3 3&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;mysymbols[as.numeric(iris$Species)]&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##   [1] 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
##  [26] 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
##  [51]  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
##  [76]  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5
## [101]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [126]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;panels&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Panels&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;ggplot2&lt;/code&gt; provides a very sophisticated system for producing multi-panel
plots. But it’s easy enough to create a simple panel using the base
graphics. For this example, let’s do a two-plot horizontal panel, with our
scatter plot in the first position, and a boxplot of petal widths in the
second position. A two-column plot in AJB is 7.25 inches wide:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dev.new(width = 7.25, height = 3.5)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we need to inform R that we’re splitting the figure into two panels:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;par(mfrow = c(1, 2))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;mfrow&lt;/code&gt; divides the graphics device into rows and columns, in this case one
row, two columns. We can now put our first plot in the first spot:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/panel1-1.png&#34; width=&#34;696&#34; /&gt;&lt;/p&gt;
&lt;p&gt;After dividing a plot device into panels with &lt;code&gt;mfrow&lt;/code&gt;, the first high-level
plotting command (e.g., &lt;code&gt;plot&lt;/code&gt; or &lt;code&gt;boxplot&lt;/code&gt;) will be drawn in the first
panel. All subsequent low-level plotting commands (e.g., &lt;code&gt;legend&lt;/code&gt;, &lt;code&gt;axis&lt;/code&gt; or
&lt;code&gt;mtext&lt;/code&gt;) will be added to this same panel. When the next high-level
command is called, it will be placed in the next panel, and focus shifts
with it. So we can now add our boxplot to the second panel:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;boxplot(Petal.Width ~ Species, data = iris)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/panel2-1.png&#34; width=&#34;696&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Note that the margins we set for the first panel are still in effect.
Consequently, we’ve lost the axis labels on our second plot. It’s going to
need some attention to make it look right. We’ll leave that for the next
exercise.&lt;/p&gt;
&lt;p&gt;In the meantime, we have one more requirement to meet. On multi-figure
panels, AJB requires an uppercase letter (A, B, etc) to label each plot.
This label should go in the upper-left corner of each panel. This is easy
to do with the &lt;code&gt;text&lt;/code&gt; command. At the moment, we don’t have space in the
upper-left corner, so we’ll put the labels in the lower-right temporarily.
For example:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## For the first panel:
text(&amp;quot;A&amp;quot;, x = 4.2, y = 4.5, cex = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## For the second panel:
text(&amp;quot;B&amp;quot;, x = 3.2, y = 0.24, cex = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/panel2B-1.png&#34; width=&#34;696&#34; style=&#34;display: block; margin: auto;&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately, each figure is plotted on different scales, so placing the
letters in the same position is not straightforward. Luckily, R provides a
function for getting “universal” coordinates for every plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## For the first panel:
text(&amp;quot;A&amp;quot;, x = grconvertX(0.9, from=&amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), 
     y = grconvertY(0.1, from = &amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), cex = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## For the second panel:
text(&amp;quot;B&amp;quot;, x = grconvertX(0.9, from=&amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), 
     y = grconvertY(0.1, from = &amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), cex = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The functions &lt;code&gt;grconvertX&lt;/code&gt; and &lt;code&gt;grconvertY&lt;/code&gt; convert between different
coordinate systems. &lt;code&gt;npc&lt;/code&gt; is “normalized plot coordinates.” In this system,
(0, 0) is the lower left corner of the plot, and (1, 1) is the upper right
corner. &lt;code&gt;user&lt;/code&gt;, on the other hand, is the coordinate system in effect for
the actual plotted data. For our Panel A, that means the lower left
corner is ca. (2.0, 4.0) and the upper right corner is ca. (4.5, 8.5). So
&lt;code&gt;grconvertX(0.9, from = &#34;npc&#34;, to = &#34;user&#34;)&lt;/code&gt; returns the X coordinate to
plot our text 90% of the way to the right side of the plot, regardless of
the scale used in that plot. With this addition, we have the following
code, and the generated panels:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;par(mfrow = c(1, 2))
par(mar = c(3, 3, 0.5, 0.5))
mysymbols &amp;lt;- c(19, 5, 3)
plot(Sepal.Length ~ Sepal.Width,
     pch = mysymbols[as.numeric(Species)], data = iris,
     ylim = c(4.0, 8.5), ann = FALSE, axes = FALSE)
box()
axis(side = 1, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) 
axis(side = 2, tcl = -0.2, mgp = c(3, 0.3, 0),
     cex.axis = 0.8) 
mtext(&amp;quot;Sepal Width&amp;quot;, side = 1, line = 1.5)
mtext(&amp;quot;Sepal Length&amp;quot;, side = 2, line = 1.5)
legend(legend = levels(iris$Species), x = &amp;quot;top&amp;quot;,
       pch = mysymbols, horiz = TRUE, bty = &amp;#39;n&amp;#39;,
       cex = 0.9, text.width = c(0.6, 0.7, 0.6))
## For the first panel:
text(&amp;quot;A&amp;quot;, x = grconvertX(0.9, from=&amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), 
     y = grconvertY(0.1, from = &amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), cex = 2)
boxplot(Petal.Width ~ Species, data = iris)
## For the second panel:
text(&amp;quot;B&amp;quot;, x = grconvertX(0.9, from=&amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), 
     y = grconvertY(0.1, from = &amp;quot;npc&amp;quot;, to=&amp;quot;user&amp;quot;), cex = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://plantarum.ca/tutorials/2020-07-20-base-R-plots_files/figure-html/panel2Bc-1.png&#34; width=&#34;696&#34; /&gt;&lt;/p&gt;
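&lt;p&gt;Since the same two conversions are repeated for every panel, it may be
convenient to wrap them in a small helper. The function below is a sketch, and
&lt;code&gt;panel_label&lt;/code&gt; is a made-up name, not part of base R. It takes the
position in &lt;code&gt;npc&lt;/code&gt; units and passes any extra arguments on to &lt;code&gt;text&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;## Hypothetical helper: draw a label at a position given in npc units
## (0 to 1 across the plot region), whatever the scale of the data.
panel_label &amp;lt;- function(label, x = 0.9, y = 0.1, ...) {
  text(labels = label,
       x = grconvertX(x, from = &amp;quot;npc&amp;quot;, to = &amp;quot;user&amp;quot;),
       y = grconvertY(y, from = &amp;quot;npc&amp;quot;, to = &amp;quot;user&amp;quot;), ...)
}

## Usage: call it right after the high-level plot for each panel.
par(mfrow = c(1, 2), mar = c(3, 3, 0.5, 0.5))
plot(Sepal.Length ~ Sepal.Width, data = iris)
panel_label(&amp;quot;A&amp;quot;, cex = 2)
boxplot(Petal.Width ~ Species, data = iris)
panel_label(&amp;quot;B&amp;quot;, cex = 2)&lt;/code&gt;&lt;/pre&gt;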
&lt;div id=&#34;exercise-2-completing-the-panel&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Exercise 2: Completing the Panel&lt;/h3&gt;
&lt;p&gt;There are still a few problems with our panel:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The title of the Y axis on the second panel is not visible&lt;/li&gt;
&lt;li&gt;The panel labels (A and B) are in the wrong positions — they should be
in the top left corners&lt;/li&gt;
&lt;li&gt;Fixing the panel labels will require moving the legend for the first figure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you change &lt;code&gt;ylim&lt;/code&gt;, you can put the legend on the bottom, and make space
for the label at the top. Go ahead and see what you can do with this. You
can use my example, Figure 1 at the top of this article, as a model. There
is a trick to formatting the x-axis of boxplots. When calling the function
&lt;code&gt;axis&lt;/code&gt;, you’ll have to set the &lt;code&gt;at&lt;/code&gt; argument to indicate where to plot the
labels (which should be &lt;code&gt;c(1, 2, 3)&lt;/code&gt;), and you’ll have to set the &lt;code&gt;labels&lt;/code&gt;
argument to indicate what the labels should be.&lt;/p&gt;
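&lt;p&gt;As a hint for that last point only, here is a rough sketch of the
&lt;code&gt;at&lt;/code&gt; and &lt;code&gt;labels&lt;/code&gt; arguments in use. It assumes the boxplot was drawn
with &lt;code&gt;axes = FALSE&lt;/code&gt;, and it reuses the tick settings from earlier in the
lesson; the rest of the panel is still up to you.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;boxplot(Petal.Width ~ Species, data = iris,
        ann = FALSE, axes = FALSE)
box()
## The boxes are drawn at x = 1, 2, 3, one per level of the factor,
## so that is where the tick labels need to go.
axis(side = 1, at = c(1, 2, 3), labels = levels(iris$Species),
     tcl = -0.2, mgp = c(3, 0.3, 0), cex.axis = 0.8)
axis(side = 2, tcl = -0.2, mgp = c(3, 0.3, 0), cex.axis = 0.8)&lt;/code&gt;&lt;/pre&gt;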
&lt;p&gt;Alternatively, try formatting your own data according to the requirements
of a journal in your field.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;image-formats&#34; class=&#34;section level1&#34;&gt;
&lt;h1&gt;Image Formats&lt;/h1&gt;
&lt;p&gt;R can save graphics to a variety of formats, including anything your target
journal might require. In general, you can store your images in one of two
classes of file format:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;raster:&lt;/dt&gt;
&lt;dd&gt;images are stored as a matrix of values, with each value indicating the color of a single pixel in the grid. Best used for photographs. Examples: jpg, tiff, png.
&lt;/dd&gt;
&lt;dt&gt;vector:&lt;/dt&gt;
&lt;dd&gt;images are stored as a series of mathematical instructions for re-creating the display: lines, polygons, text etc. Best used for line drawings. Examples: eps, svg.
&lt;/dd&gt;
&lt;/dl&gt;
&lt;div id=&#34;raster-images&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Raster Images&lt;/h2&gt;
&lt;p&gt;Raster images are stored as a grid of numbers called pixels. Each number
records the colour of a single pixel in the image. As a consequence, the
image resolution is limited by the number of pixels recorded in the file.
In our example, we need a figure 3.5 inches wide. AJB requires a resolution
of 1000 dots per inch (DPI) for line drawings, which means we need a source
image 3500 pixels in both the x and y dimensions. We don’t actually need to do
these calculations, though; R will handle it for us. We just need to pick a
format and set the final resolution.&lt;/p&gt;
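&lt;p&gt;For the curious, the arithmetic is short. The sketch below also estimates
the size of the uncompressed image, assuming three bytes per pixel (one each
for red, green and blue). That storage format is an assumption, and the
variable names are only for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;width_in &amp;lt;- 3.5                # figure width (and height) in inches
res_dpi  &amp;lt;- 1000               # resolution required for line drawings
width_px &amp;lt;- width_in * res_dpi # 3500 pixels along each side
width_px

## Assumed: 3 bytes per pixel, no compression
width_px^2 * 3 / 1e6           # approximate size in megabytes&lt;/code&gt;&lt;/pre&gt;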
&lt;p&gt;AJB prefers TIFF format for raster files. To generate one we will use the
&lt;code&gt;tiff()&lt;/code&gt; function. Note that this function only sets the file details for
our plot; we need to add the plotting code after we open the file, and
close it when we’re done with &lt;code&gt;dev.off()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;tiff(filename = &amp;quot;iris.tiff&amp;quot;, width = 3.5, height = 3.5,
     units=&amp;quot;in&amp;quot;, res = 1000, compression = &amp;quot;lzw&amp;quot;)
par(mar = c(3, 3, 0.5, 0.5))
plot(Sepal.Length ~ Sepal.Width, pch = as.numeric(Species),
     data = iris, 
     ann = FALSE,      # turn off axis titles
     axes = FALSE)     # turn off the axes and the box
## Insert additional plot code here
dev.off()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;width&lt;/code&gt;, &lt;code&gt;height&lt;/code&gt; and &lt;code&gt;units&lt;/code&gt; set the size of the image, &lt;code&gt;res&lt;/code&gt; sets the
resolution in pixels per inch. &lt;code&gt;compression&lt;/code&gt; reduces the size of the file.
The &lt;code&gt;lzw&lt;/code&gt; option is only available for &lt;code&gt;tiff&lt;/code&gt; files. It’s lossless, which
means the compressed image is just as good as the original, so there’s no
reason not to use it. In this case, it reduces the file size from 36 MB to
366 KB, a 99% reduction!&lt;/p&gt;
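&lt;p&gt;You can check the effect of compression on your own machine. The
following is a sketch, assuming your build of R has TIFF support: it writes
the same plot to two temporary files, one with &lt;code&gt;compression = &amp;quot;none&amp;quot;&lt;/code&gt; and
one with &lt;code&gt;&amp;quot;lzw&amp;quot;&lt;/code&gt;, then compares their sizes on disk. The names
&lt;code&gt;f_none&lt;/code&gt; and &lt;code&gt;f_lzw&lt;/code&gt; are placeholders, and the exact sizes you get
may differ from the ones quoted here.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;f_none &amp;lt;- tempfile(fileext = &amp;quot;.tiff&amp;quot;)
f_lzw  &amp;lt;- tempfile(fileext = &amp;quot;.tiff&amp;quot;)
for (f in c(f_none, f_lzw)) {
  ## Same device settings both times; only the compression changes
  tiff(filename = f, width = 3.5, height = 3.5, units = &amp;quot;in&amp;quot;, res = 1000,
       compression = if (f == f_lzw) &amp;quot;lzw&amp;quot; else &amp;quot;none&amp;quot;)
  par(mar = c(3, 3, 0.5, 0.5))
  plot(Sepal.Length ~ Sepal.Width, pch = as.numeric(Species), data = iris)
  dev.off()
}
file.size(c(f_none, f_lzw)) / 1e6   # file sizes in megabytes&lt;/code&gt;&lt;/pre&gt;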
&lt;p&gt;To create the same image as a &lt;code&gt;jpg&lt;/code&gt; with the same resolution we’d use:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;jpeg(filename = &amp;quot;iris.jpg&amp;quot;, width = 3.5, height = 3.5,
     units = &amp;quot;in&amp;quot;, res = 1000, quality = 85)
## insert plot code here!
dev.off()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;jpg&lt;/code&gt; files are always compressed, and they use lossy compression. That
means there is some degradation of the image quality associated with the
compression. The &lt;code&gt;quality&lt;/code&gt; argument determines how aggressively the image
is compressed. Higher values produce larger, less-degraded images. As a
rule of thumb, 85 usually produces fine images at a reasonable size. In
this case, the file is 550K, so a little larger than the compressed TIFF
file.&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;vector-images&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Vector Images&lt;/h2&gt;
&lt;p&gt;Vector images are stored as a list of instructions: ‘draw a line from here
to here, put a circle at this coordinate’ etc. As a consequence, they don’t
have an inherent resolution; rather, they can be printed at any resolution
necessary. So we don’t worry about the resolution when creating them, just
the height and width. There are other options we need to be concerned with
here:&lt;/p&gt;
&lt;dl&gt;
&lt;dt&gt;paper:&lt;/dt&gt;
&lt;dd&gt;&lt;code&gt;&#34;special&#34;&lt;/code&gt; indicates that we are making a single image, not a full page
&lt;/dd&gt;
&lt;dt&gt;onefile:&lt;/dt&gt;
&lt;dd&gt;&lt;code&gt;FALSE&lt;/code&gt; indicates that we are making a new file for each image (probably
not necessary with a single image)
&lt;/dd&gt;
&lt;dt&gt;horizontal:&lt;/dt&gt;
&lt;dd&gt;&lt;code&gt;FALSE&lt;/code&gt; indicates we don’t want a landscape orientation
&lt;/dd&gt;
&lt;/dl&gt;
&lt;p&gt;To create an &lt;code&gt;eps&lt;/code&gt; file:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;postscript(&amp;quot;iris.eps&amp;quot;, height = 3.5, width = 3.5,
           paper = &amp;quot;special&amp;quot;, onefile = FALSE,
           horizontal = FALSE) 
## insert plot code here!
dev.off()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This file is only 11K, and can be printed at any resolution. That makes
&lt;code&gt;eps&lt;/code&gt; a very convenient format to use. However, you may run into issues
with fonts. By default, &lt;code&gt;eps&lt;/code&gt; files produced by R don’t include the fonts,
just the position of the letters to place on the image. If you need to
embed the fonts, you need to explicitly request this:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;embedFonts(&amp;quot;iris.eps&amp;quot;, outfile=&amp;quot;iris-embed.eps&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This command creates a new file, &lt;code&gt;iris-embed.eps&lt;/code&gt;, that has the font
information embedded in the file. Fonts can be tricky, and specific details
vary between Windows, Mac and Linux. It’s easiest to stick to the default
font settings, and only dive into custom fonts and settings if you are
required by the publisher.&lt;/p&gt;
&lt;p&gt;Note that &lt;code&gt;pdf&lt;/code&gt; files are more common than &lt;code&gt;eps&lt;/code&gt;. You can create &lt;code&gt;pdf&lt;/code&gt;
image files directly from &lt;code&gt;R&lt;/code&gt; as well, using:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pdf(&amp;quot;iris.pdf&amp;quot;, height = 3.5, width = 3.5,
    paper = &amp;quot;special&amp;quot;, onefile = FALSE) 
## insert plot code here!
dev.off()&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&#34;footnotes&#34;&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id=&#34;fn1&#34;&gt;&lt;p&gt;How did I pick 19, 5 and 3? You can see all 25 symbols
with &lt;code&gt;plot(1:25, pch = 1:25)&lt;/code&gt;.&lt;a href=&#34;#fnref1&#34; class=&#34;footnote-back&#34;&gt;↩︎&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
