Notes on GPSC Accounts

Shared Services Canada maintains the General Purpose Science Cluster (GPSC), a high-performance computing cluster. These are my notes for accessing it from the AAFC network. For more general notes on using orgmode to manage cluster work, see this post.

Get the hostnames from the administrators

Multi-hop access from inside your department network requires logging into your ‘local’ cluster (<LOCAL HOSTNAME>), and then logging into GPSC (<GPSC HOSTNAME>) from there. You’ll need to get the actual hostnames for these from the cluster administrator.

  1. ssh into your local cluster using your GPSC username (not the same as your AAFC network user ID) and password.

  2. from your local cluster, ssh into the GPSC with the same credentials (as sketched below).
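
Spelled out, the two hops look something like this (the hostnames are the placeholders above; substitute the values your administrator gives you):

# hop 1: log in to the local cluster with your GPSC credentials
ssh <USERNAME>@<LOCAL HOSTNAME>

# hop 2: from the local cluster prompt, continue on to GPSC
ssh <USERNAME>@<GPSC HOSTNAME>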

Configure keys and addresses

Use an RSA key for password-free logins, as I described in my tutorial on Digital Ocean droplets.
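
In brief, the setup looks roughly like this, run from your laptop (the droplet tutorial has the details; repeat the ssh-copy-id step from <LOCAL HOSTNAME> to <GPSC HOSTNAME> so the second hop is also password-free):

# generate a key pair (accept the defaults, or add a passphrase)
ssh-keygen -t rsa

# install the public key on the local cluster
ssh-copy-id <USERNAME>@<LOCAL HOSTNAME>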

I’ll never remember the hostnames, so I’ve added the following to ~/.ssh/config on my laptop (Linux; this should work the same on a Mac, though I haven’t checked, and Windows users will have to configure their ssh client as needed):

Host gpsc
    Hostname <GPSC HOSTNAME>
    User <USERNAME>

Host aafc-gpsc
    Hostname <LOCAL HOSTNAME>
    User <USERNAME>

I also added the first stanza to ~/.ssh/config on <LOCAL HOSTNAME>.

Together with the RSA key, this allows me to sign in to <LOCAL HOSTNAME> via ssh aafc-gpsc, and from there into the GPSC via ssh gpsc, without remembering the actual hostnames or password.

To further streamline this, I’ve added the following alias on my laptop (in ~/.bash_aliases):

alias gpsc='ssh -t aafc-gpsc ssh gpsc'

With this set, from a terminal I can log in to GPSC directly via gpsc.
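
That is, the whole two-hop login collapses to a single command (assuming the key and the config stanzas above are all in place):

gpsc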

We have similar issues with rsync: to transfer files to and from gpsc, we need to pass them through aafc-gpsc. We can accomplish this in a single step via:

rsync -av -e "ssh -A -t USERNAME@aafc-gpsc ssh -A -t USERNAME@gpsc" SOURCE :DEST

This will sync SOURCE from the local machine to DEST on gpsc, passing through aafc-gpsc (with the archive and verbose flags). Note that we need to use the : to indicate which file is on the remote machine.

Another alias will make this easier:

alias rs2gpsc='rsync -av -e "ssh -A -t USERNAME@aafc-gpsc ssh -A -t USERNAME@gpsc"'

With this installed, we can send to gpsc with:

rs2gpsc LOCALFILE :REMOTE-PATH

and retrieve files with:

rs2gpsc :REMOTE-FILE LOCAL-PATH

Note the : in the second example indicating we’re transferring from gpsc to our local machine.

Prepare slurm submission scripts

The template for submitting jobs is:

#!/bin/bash -l
#SBATCH --job-name=JOB_NAME
#SBATCH --open-mode=append
#SBATCH --partition=standard
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --comment="<SUBMIT_COMMENT>"
#SBATCH --account=<ACCOUNT_NAME>

echo hello
sleep 45
echo goodbye

Note that you’ll need to include your actual <SUBMIT_COMMENT> and <ACCOUNT_NAME> as provided by the administrators.

With the above template saved to a file named slurm_script.sh, you can run it from the cluster via sbatch slurm_script.sh.
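
For example, after submitting you can keep an eye on the job with squeue (standard slurm, nothing GPSC-specific; replace <USERNAME> with your GPSC user):

# submit the job from a GPSC login node
sbatch slurm_script.sh

# list your pending and running jobs
squeue -u <USERNAME>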

Be sure to update the job-name option to something informative, and of course increase the time, ntasks, cpus-per-task and mem-per-cpu to something appropriate for your job.
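
As an illustration only, a multi-core job header might look something like this (the name and numbers are made up; check with the administrators for sensible limits on your account):

#SBATCH --job-name=align_samples
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G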

Configure Emacs

Multihops on Tramp

For Emacs users, we can streamline our workflow by configuring TRAMP to use multi-hops. Add the following to your config (usually .emacs, or .emacs.d/init.el; TRAMP is installed with Emacs, no additional packages required):

(add-to-list 'tramp-default-proxies-alist 
             '("gpsc" 
               nil 
               "/ssh:USERNAME@<LOCAL HOSTNAME>:"))

That allows me to open files on GPSC in my local Emacs via: C-x C-f /ssh:gpsc:FILENAME.

Execute code blocks with Org and Babel

You can use orgmode to run code blocks from a file on your local machine on GPSC. To do this, use a header like this in your local .org file (which assumes you’ve set up your .ssh/config files as described above):

#+PROPERTY: header-args:bash :results output :dir /ssh:gpsc:./path/to/working/directory
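
A quick way to confirm the header is working is a trivial block; if everything is wired up correctly, the output should show the GPSC hostname and the working directory you set above:

#+BEGIN_SRC bash :results output
  hostname
  pwd
#+END_SRC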

I ran into transient permission issues executing code blocks. I fixed this by configuring org-babel to use my home directory for temp files; it seems to have problems writing to /tmp on GPSC, even though I should have permission to do so. Here’s the code from my Emacs init.el:

(setq org-babel-remote-temporary-directory "~/")

So far the temp files babel generates are cleaned up automatically, so this hasn’t created any detritus. You could use a subdirectory under ~/, but the directory must exist (i.e., you need to create it yourself) before org can use it; it won’t create it for you.
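
For example, to use a dedicated subdirectory instead (babel-tmp is an arbitrary name; point org-babel-remote-temporary-directory at whatever you create):

# run this once on GPSC; org won't create the directory for you
mkdir -p ~/babel-tmp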

With this done, you can submit code blocks from your org file directly to slurm on GPSC. Here’s my template:

#+BEGIN_SRC bash :results output
  sbatch <<SUBMITSCRIPT
  #!/bin/bash
  #SBATCH --job-name=slurm_test_emacs
  #SBATCH --output=slurm_test_emacs.log
  #SBATCH --open-mode=append
  #SBATCH --partition=standard
  #SBATCH --time=0:01:00
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=1
  #SBATCH --mem-per-cpu=1G
  #SBATCH --comment="<SUBMIT_COMMENT>"
  #SBATCH --account=<ACCOUNT_NAME>

  date
  echo
  echo hello
  echo

  SUBMITSCRIPT
#+END_SRC

Everything between <<SUBMITSCRIPT and SUBMITSCRIPT will get passed to sbatch as if it were a separate script file.

If you do use this approach, note that you must escape any variables! This won’t work:

#+BEGIN_SRC bash :results output
  sbatch <<SUBMITSCRIPT
  #!/bin/bash
  #SBATCH --job-name=bad_HEREDOC
  <...>

  NAME=TYLER
  echo $NAME

  SUBMITSCRIPT
#+END_SRC

But this will (note the backslash in front of the $):

#+BEGIN_SRC bash :results output
  sbatch <<SUBMITSCRIPT
  #!/bin/bash
  #SBATCH --job-name=good_HEREDOC
  <...>

  NAME=TYLER
  echo \$NAME

  SUBMITSCRIPT
#+END_SRC

A further wrinkle: this uses a heredoc, and heredocs need to write a temp file. By default this goes in /tmp, and in some cases /tmp may be full. If that happens, you’ll get an error:

cannot create temp file for here-document: No space left on device

/tmp is shared by all users, so you can’t fix this yourself. You can change the location where your heredoc gets written, though:

#+BEGIN_SRC bash :results output
  export TMPDIR=/path/to/your/personal/tmp/directory
  sbatch <<SUBMITSCRIPT
  #!/bin/bash
  #SBATCH --job-name=good_HEREDOC
  <...>

  NAME=TYLER
  echo \$NAME

  SUBMITSCRIPT
#+END_SRC

You may set this permanently by adding the export TMPDIR=... line to your .bashrc on GPSC.
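
For example, in ~/.bashrc on GPSC (the path here is just an illustration; use any directory you own):

export TMPDIR=$HOME/tmp    # any directory you own will do
mkdir -p "$TMPDIR"         # harmless if it already exists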

Once a job is submitted, the ID will appear in your local .org file, e.g.:

#+RESULTS:
: Submitted batch job 1385545

You can retrieve info on the run with:

#+BEGIN_SRC bash :results output 
sacct --jobs=1385545 --format=jobid,jobname,state,elapsed,MaxRSS
#+END_SRC

In my case, this generated:

#+RESULTS:
: JobID           JobName      State    Elapsed     MaxRSS 
: ------------ ---------- ---------- ---------- ---------- 
: 1385545      slurm_tes+  COMPLETED   00:00:07 
: 1385545.bat+      batch  COMPLETED   00:00:07      2096K 
: 1385545.ext+     extern  COMPLETED   00:00:07      2088K 

Use the --format argument to select which details to report. MaxRSS is the maximum amount of RAM used during the run. Other options are available via sacct --helpformat, with more details in the slurm manual.
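
For example, adding the requested memory and the exit code to the report (both field names appear in the sacct --helpformat list):

#+BEGIN_SRC bash :results output
sacct --jobs=1385545 --format=jobid,jobname,state,elapsed,MaxRSS,ReqMem,ExitCode
#+END_SRC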

Note that GPSC doesn’t provide the seff command, which you may have used on other clusters.

Linking to files on the remote server

It may be more convenient to manage more complex scripts as stand-alone .sh files on the GPSC. If the configuration above works for you, you can insert a link to remote files in your org file with the syntax:

[[file:/ssh:gpsc:~/path/to/script.sh][My Script]]

This creates a ‘hotlink’ in your file that you can click to open the remote file in your local Emacs. The default keyboard shortcut for inserting links is C-c C-l.

That will allow you to edit the remote file as if it were local. When you’re ready to submit it, you can do that from your local .org file:

#+BEGIN_SRC bash :results output
sbatch script.sh
#+end_src

Note that the location of your script will be relative to the directory you listed in your file header.
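
If relative paths trip you up, you can always give sbatch the full path from the link above (still a placeholder path, of course):

#+BEGIN_SRC bash :results output
sbatch ~/path/to/script.sh
#+END_SRC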

I also use org links for data files I might need to refer to or edit (e.g., sample indices), and for files generated by the jobs themselves.