Shared Services Canada maintains the General Purpose Science Cluster (GPSC), a
high-performance computing cluster. These are my notes for accessing it
from the AAFC network. For more general notes on using orgmode to manage
cluster work, see this post.
Get the hostnames from the administrators
Multi-hop access from inside your department network requires logging into
your ‘local’ cluster (<LOCAL HOSTNAME>), and then logging into GPSC
(<GPSC HOSTNAME>) from there. You’ll need to get the actual hostnames for
these from the cluster administrator.
ssh into your local cluster using your GPSC username (not the same as your
AAFC network user ID) and password. From your local cluster, ssh into the
GPSC with the same credentials.
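In practice, the two hops look something like this (a sketch only; substitute your GPSC username and the hostnames from your administrator for the placeholders):

# first hop: your workstation to the local cluster
ssh <USERNAME>@<LOCAL HOSTNAME>
# second hop: the local cluster to the GPSC
ssh <USERNAME>@<GPSC HOSTNAME>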
Configure keys and addresses
Use an RSA key for password-free logins, as I described in my tutorial on Digital Ocean droplets.
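If you don’t already have a key pair, a minimal sketch looks like this (the key type and the placeholders are assumptions; adjust for your setup):

# on your laptop: generate a key pair, then install the public key on the local cluster
ssh-keygen -t rsa
ssh-copy-id <USERNAME>@<LOCAL HOSTNAME>

Repeating the ssh-copy-id step from <LOCAL HOSTNAME> to the GPSC host makes the second hop password-free as well.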
I’ll never remember the hostnames, so I’ve added the following to
~/.ssh/config
on my laptop (Linux; it should work the same on a Mac, and
Windows users will have to configure their ssh
client as needed):
Host gpsc
Hostname <GPSC HOSTNAME>
User <USERNAME>
Host aafc-gpsc
Hostname <LOCAL HOSTNAME>
User <USERNAME>
I also added the first stanza to ~/.ssh/config
on <LOCAL HOSTNAME>.
Together with the RSA key, this allows me to sign in to <LOCAL HOSTNAME>
via ssh aafc-gpsc
, and from there into the GPSC via ssh gpsc
,
without remembering the actual hostnames or password.
To further streamline this, I’ve added the following alias on my laptop (in
~/.bash_aliases
):
alias gpsc='ssh -t aafc-gpsc ssh gpsc'
With this set, from a terminal I can log in to GPSC directly via gpsc
.
We have similar issues with rsync
: to transfer files to and from gpsc
, we
need to pass them through aafc-gpsc
. We can accomplish this in a single
step via:
rsync -av -e "ssh -A -t USERNAME@aafc-gpsc ssh -A -t USERNAME@gpsc" SOURCE :DEST
This will sync SOURCE
from the local machine to DEST
on gpsc
, passing
through aafc-gpsc
(and with the archive and verbose flags). Note
that we need to use the :
to indicate which file is on the remote
machine.
Another alias will make this easier:
alias rs2gpsc='rsync -av -e "ssh -A -t USERNAME@aafc-gpsc ssh -A -t USERNAME@gpsc"'
With this installed, we can send to gpsc
with:
rs2gpsc LOCALFILE :REMOTE-PATH
and retrieve files with:
rs2gpsc :REMOTE-FILE LOCAL-PATH
Note the :
in the second example indicating we’re transferring from
gpsc
to our local machine.
Prepare slurm submission scripts
The template for submitting jobs is:
#!/bin/bash -l
#SBATCH --job-name=JOB_NAME
#SBATCH --open-mode=append
#SBATCH --partition=standard
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --comment="<SUBMIT_COMMENT>"
#SBATCH --account=<ACCOUNT_NAME>
echo hello
sleep 45
echo goodbye
Note that you’ll need to include your actual submit comment and account name in place of the <SUBMIT_COMMENT> and <ACCOUNT_NAME> placeholders.
With the above template saved to a file named slurm_script.sh
, you can
run it from the cluster via sbatch slurm_script.sh
.
Be sure to update the job-name
option to something informative, and of
course increase the time
, ntasks
, cpus-per-task
and mem-per-cpu
to
something appropriate for your job.
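As a quick sanity check after submitting, you can confirm the job was accepted and watch it in the queue (standard slurm commands; the job ID is printed by sbatch):

sbatch slurm_script.sh   # prints: Submitted batch job <JOBID>
squeue -u $USER          # list your pending and running jobs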
Configure Emacs
Multi-hops with TRAMP
For Emacs users, we can streamline our workflow by configuring
TRAMP to use multi-hops. Add the
following to your config (usually .emacs
, or .emacs.d/init.el
; TRAMP is
installed with Emacs, no additional packages required):
(add-to-list 'tramp-default-proxies-alist
'("gpsc"
nil
"/ssh:USERNAME@<LOCAL HOSTNAME>:"))
That allows me to open files on GPSC in my local Emacs via: C-x C-f /ssh:gpsc:FILENAME
.
Execute code blocks with Org and Babel
You can use orgmode to run code blocks from a file
on your local machine on GPSC
. To do this, use a header like this in your
local .org
file (which assumes you’ve set up your .ssh/config
files as
described above):
#+PROPERTY: header-args:bash :results output :dir /ssh:gpsc:./path/to/working/directory
I ran into transient permission issues executing code blocks. I’ve fixed
this by configuring org-babel
to use my home directory for temp files. It
seems to have problems writing to /tmp
on GPSC, although I should have
permission to do so? Here’s the code from my emacs init.el
:
(setq org-babel-remote-temporary-directory "~/")
So far the temp files babel
generates are cleaned up automatically, so
this hasn’t created any detritus. You could use a subdirectory under ~/
,
but the directory must exist (i.e., you need to create it yourself) before
org
can find it - it won’t make it for you.
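For example, to keep the temp files in their own folder (the directory name here is just an illustration), create it on GPSC first:

mkdir -p ~/babel-tmp

and then point org-babel-remote-temporary-directory at ~/babel-tmp/ in your init.el.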
With this done, you can submit code blocks from your org file directly to
slurm
on GPSC
. Here’s my template:
#+BEGIN_SRC bash :results output
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=slurm_test_emacs
#SBATCH --output=slurm_test_emacs.log
#SBATCH --open-mode=append
#SBATCH --partition=standard
#SBATCH --time=0:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --comment="<SUBMIT_COMMENT>"
#SBATCH --account=<ACCOUNT_NAME>
date
echo
echo hello
echo
SUBMITSCRIPT
#+END_SRC
Everything between <<SUBMITSCRIPT
and SUBMITSCRIPT
will get passed to
sbatch
as if it were a separate script file.
If you do use this approach, note that you must escape any variables! This won’t work:
#+BEGIN_SRC bash :results output
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=bad_HEREDOC
<...>
NAME=TYLER
echo $NAME
SUBMITSCRIPT
#+END_SRC
But this will (note the backslash in front of the $
):
#+BEGIN_SRC bash :results output
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=good_HEREDOC
<...>
NAME=TYLER
echo \$NAME
SUBMITSCRIPT
#+END_SRC
A further wrinkle: this uses a
heredoc, and heredocs need to
write a temp file. By default this goes in /tmp
, and in some cases /tmp
may be full. If that happens, you’ll get an error:
cannot create temp file for here-document: No space left on device
Since /tmp
is shared by all users, you can’t fix this yourself when it fills up. You can
change the location where your heredoc gets written, though:
#+BEGIN_SRC bash :results output
export TMPDIR=/path/to/your/personal/tmp/directory
sbatch <<SUBMITSCRIPT
#!/bin/bash
#SBATCH --job-name=good_HEREDOC
<...>
NAME=TYLER
echo \$NAME
SUBMITSCRIPT
#+END_SRC
You may set this permanently by adding the export TMPDIR=...
line to your
.bashrc
on GPSC.
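For example, something like this on GPSC would do it (the path is just a placeholder, and the directory has to exist):

mkdir -p $HOME/tmp
echo 'export TMPDIR=$HOME/tmp' >> ~/.bashrc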
Once a job is submitted, the ID will appear in your local .org
file,
e.g.:
#+RESULTS:
: Submitted batch job 1385545
You can retrieve info on the run with:
#+BEGIN_SRC bash :results output
sacct --jobs=1385545 --format=jobid,jobname,state,elapsed,MaxRSS
#+END_SRC
In my case, this generated:
#+RESULTS:
: JobID JobName State Elapsed MaxRSS
: ------------ ---------- ---------- ---------- ----------
: 1385545 slurm_tes+ COMPLETED 00:00:07
: 1385545.bat+ batch COMPLETED 00:00:07 2096K
: 1385545.ext+ extern COMPLETED 00:00:07 2088K
Use the format
argument to request which details to report. MaxRSS
is
the maximum amount of RAM used during the run. Other options are available
via sacct --helpformat
, with more details in the slurm
manual.
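For example, to also report the memory you requested and the CPUs you were allocated, something like this should work (ReqMem and AllocCPUS are standard sacct fields, but confirm with sacct --helpformat that they’re available on GPSC):

sacct --jobs=<JOBID> --format=jobid,jobname,state,elapsed,MaxRSS,ReqMem,AllocCPUS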
Note that GPSC doesn’t provide the seff command, which you may have used on other clusters.
Linking to files on the remote server
It may be more convenient to manage more complex scripts as stand-alone
.sh
files on the GPSC. If the configuration above works for you, you can
insert a link to remote files in your org
file with the syntax:
[[file:/ssh:gpsc:~/path/to/script.sh][My Script]]
This will create a ‘hotlink’ in your file that you can click to open the
remote file in your local Emacs. The default keyboard shortcut for entering links is
C-c C-l
.
That will allow you to edit the remote file as if it were local. When
you’re ready to submit it, you can do that from your local .org
file:
#+BEGIN_SRC bash :results output
sbatch script.sh
#+END_SRC
Note that the location of your script will be relative to the directory you listed in your file header.
I also use org links for data files I might need to refer to or edit (i.e., sample indices), and files generated by the jobs themselves.