Shared Services Canada maintains the General Purpose Science Cluster (GPSC), a high-performance computing cluster. These are my notes for accessing it from the AAFC network. For more general notes on using orgmode to manage cluster work, see this post.
Get the hostname from the administrators
Note: GPSC no longer uses multi-hop access (at least from AAFC), so you just need the GPSC hostname to log in. I’m leaving these notes here in case we switch back at some point.
Multi-hop access from inside your department network requires logging into your ‘local’ cluster (<LOCAL HOSTNAME>), and then logging into GPSC (<GPSC HOSTNAME>) from there. You’ll need to get the actual hostnames for these from the cluster administrator.
ssh into your local cluster using your GPSC username (not the same as your AAFC network user ID) and password. From your local cluster, ssh into the GPSC with the same credentials.
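In practice, the two hops looked something like this (using the same placeholders as above, where <USERNAME> is your GPSC username; run the second command from the local cluster’s prompt):
ssh <USERNAME>@<LOCAL HOSTNAME>    # first hop: your departmental cluster
ssh <USERNAME>@<GPSC HOSTNAME>     # second hop: GPSC, run from the local cluster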
Configure keys and addresses
Use an RSA key for password-free logins, as I described in my tutorial on Digital Ocean droplets.
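A minimal sketch of that setup, run from your laptop (the flag choices here are mine; see the linked tutorial for the full walk-through):
ssh-keygen -t rsa -b 4096                  # generate a key pair if you don't already have one
ssh-copy-id <USERNAME>@<GPSC HOSTNAME>     # install the public key on GPSC for password-free logins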
I’ll never remember the hostnames, so I’ve added the following to ~/.ssh/config on my laptop (Linux; this should work the same on a Mac, but Windows users will have to configure their ssh client as needed):
Host gpsc
Hostname <GPSC HOSTNAME>
User <USERNAME>
This allows me to sign in to <GPSC HOSTNAME> via ssh gpsc without remembering the actual address, username or password. This also works for transferring files via rsync:
rsync <LOCAL> gpsc:<PATH>
Here, <LOCAL> can be a file (e.g., my_script.sh) or a directory (e.g., my-folder), and <PATH> is the location to transfer to on gpsc. Leaving <PATH> blank will transfer your file or directory to your home directory on gpsc.
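For example (the destination directory here is just an illustration, and should already exist; note that rsync needs -a or -r to copy a directory recursively):
rsync -av my_script.sh gpsc:scripts/    # copy one file into ~/scripts on gpsc
rsync -av my-folder gpsc:               # copy a whole directory into your home directory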
Prepare slurm submission scripts
The template for submitting jobs is:
#!/bin/bash -l
#SBATCH --job-name=JOB_NAME
#SBATCH --open-mode=append
#SBATCH --partition=standard
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
####SBATCH --mem-per-cpu=1G
#SBATCH --comment="<SUBMIT_COMMENT>"
#SBATCH --account=<ACCOUNT_NAME>
echo hello
sleep 45
echo goodbye
Note that you’ll need to include your actual submission comment and account name in place of the <SUBMIT_COMMENT> and <ACCOUNT_NAME> placeholders.
With the above template saved to a file named slurm_script.sh, you can run it from the cluster via sbatch slurm_script.sh.
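For example, once the script is on GPSC (squeue and scancel are standard slurm commands; <ID> is the job number that sbatch prints):
sbatch slurm_script.sh    # prints: Submitted batch job <ID>
squeue -u $USER           # list your pending and running jobs
scancel <ID>              # cancel the job if you need to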
Be sure to update the job-name option to something informative, and of course increase the time, ntasks, and cpus-per-task to something appropriate for your job.
Note that on GPSC, mem-per-cpu isn’t required. That’s why I’ve commented it out in the script above (notice the extra ###). The system will automatically divide RAM among CPUs, which usually works fine. If your job runs out of memory, you can uncomment this line and ask for a specific (higher) amount.
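For example, to request 4 GB per CPU (the amount is just an illustration), trim the line back to a single # so slurm reads it as a directive:
#SBATCH --mem-per-cpu=4G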
Configure Emacs
You can open files on GPSC in Emacs on your local machine (via Emacs’ built-in TRAMP support for remote files): C-x C-f /ssh:gpsc:FILENAME.
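For example, C-x C-f /ssh:gpsc:~/ opens a Dired listing of your GPSC home directory, and you can browse to any file from there.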
Execute code blocks with Org and Babel
You can use orgmode to run code blocks from a file on your local machine on GPSC. To do this, use a header like this in your local .org file (which assumes you’ve set up your .ssh/config file as described above):
#+PROPERTY: header-args:bash :results output :dir /ssh:gpsc:./path/to/working/directory
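With that header in place, an ordinary bash block in the same file runs on GPSC in the directory you specified. A minimal block to confirm the setup works:
#+BEGIN_SRC bash :results output
hostname   # should print the GPSC login node's name, confirming remote execution
pwd        # should print the :dir path from the header
#+END_SRC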
I ran into transient permission issues executing code blocks. I’ve fixed this by configuring org-babel to use my home directory for temp files. It seems to have problems writing to /tmp on GPSC, although I should have permission to do so? Here’s the code from my emacs init.el:
(setq org-babel-remote-temporary-directory "~/")
So far the temp files babel generates are cleaned up automatically, so this hasn’t created any detritus. You could use a subdirectory under ~/, but the directory must exist (i.e., you need to create it yourself) before org can find it; it won’t make it for you.
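For example, to keep the temp files out of the top level of your home directory (the directory name here is just an illustration), create the directory on GPSC first with mkdir ~/.org-babel-tmp, and then point babel at it in init.el:
(setq org-babel-remote-temporary-directory "~/.org-babel-tmp/")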
With this done, you can submit code blocks from your org file directly to slurm on GPSC. Here’s my template:
#+BEGIN_SRC bash :results output
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=slurm_test_emacs
#SBATCH --output=slurm_test_emacs.log
#SBATCH --open-mode=append
#SBATCH --partition=standard
#SBATCH --time=0:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
####SBATCH --mem-per-cpu=1G
#SBATCH --comment="<SUBMIT_COMMENT>"
#SBATCH --account=<ACCOUNT_NAME>
date
echo
echo hello
echo
SUBMITSCRIPT
#+END_SRC
Everything between <<SUBMITSCRIPT and the closing SUBMITSCRIPT will get passed to sbatch as if it were a separate script file. If you do use this approach, note that you must escape any variables (e.g., \$VAR): because the heredoc delimiter isn’t quoted, the shell running the block expands unescaped variables when the heredoc is built, instead of leaving them for the job to expand at run time. This won’t work:
#+BEGIN_SRC bash :results output
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=bad_HEREDOC
<...>
NAME=TYLER
echo $NAME
SUBMITSCRIPT
#+END_SRC
But this will (note the backslash in front of the $):
#+BEGIN_SRC bash :results output
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=good_HEREDOC
<...>
NAME=TYLER
echo \$NAME
SUBMITSCRIPT
#+END_SRC
A further wrinkle: this uses a heredoc, and heredocs need to write a temp file. By default this goes in /tmp, and in some cases /tmp may be full. If that happens, you’ll get an error:
cannot create temp file for here-document: No space left on device
/tmp is shared by all users, so you can’t fix this yourself if it fills up. You can change the location where your heredoc gets written, though:
#+BEGIN_SRC bash :results output
export TMPDIR=/path/to/your/personal/tmp/directory
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=good_HEREDOC
<...>
NAME=TYLER
echo \$NAME
SUBMITSCRIPT
#+END_SRC
You can set this permanently by adding the export TMPDIR=... line to your .bashrc on GPSC.
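A minimal sketch of what that could look like, assuming a directory named ~/tmp (any directory you own will do, but you must create it yourself):
mkdir -p ~/tmp                               # run once on GPSC
echo 'export TMPDIR=$HOME/tmp' >> ~/.bashrc  # then source ~/.bashrc or start a new login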
Once a job is submitted, the ID will appear in your local .org file, e.g.:
#+RESULTS:
: Submitted batch job 1385545
You can retrieve info on the run with:
#+BEGIN_SRC bash :results output
sacct --jobs=1385545 --format=jobid,jobname,state,elapsed,MaxRSS
#+END_SRC
In my case, this generated:
#+RESULTS:
: JobID JobName State Elapsed MaxRSS
: ------------ ---------- ---------- ---------- ----------
: 1385545 slurm_tes+ COMPLETED 00:00:07
: 1385545.bat+ batch COMPLETED 00:00:07 2096K
: 1385545.ext+ extern COMPLETED 00:00:07 2088K
Use the --format argument to choose which details to report. MaxRSS is the maximum amount of RAM used during the run. Other options are available via sacct --helpformat, with more details in the slurm manual.
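For example, to also see the memory you requested and the CPUs allocated (ReqMem and AllocCPUS are standard sacct fields, listed by sacct --helpformat):
#+BEGIN_SRC bash :results output
sacct --jobs=1385545 --format=jobid,jobname,state,elapsed,MaxRSS,ReqMem,AllocCPUS
#+END_SRC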
Note that GPSC doesn’t provide the seff command, which you may have used on other clusters.
Link to files on the remote server
It may be more convenient to manage more complex scripts as stand-alone .sh files on the GPSC. If the configuration above works for you, you can insert a link to a remote file in your org file with the syntax:
[[file:/ssh:gpsc:~/path/to/script.sh][My Script]]
This will create a ‘hotlink’ in your file that you can click to open that file in your local Emacs. The default keyboard shortcut for inserting links is C-c C-l.
That will allow you to edit the remote file as if it were local. When you’re ready to submit it, you can do that from your local .org file:
#+BEGIN_SRC bash :results output
sbatch script.sh
#+END_SRC
Note that the location of your script is interpreted relative to the :dir directory you listed in your file header.
I also use org links for data files I might need to refer to or edit (e.g., sample indices), and for files generated by the jobs themselves.
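For example (the path and filename here are just placeholders):
[[file:/ssh:gpsc:~/path/to/sample_indices.csv][Sample indices]]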