4. In-Depth Guide to HPC Usage

Pegasus is the code name for the Genomics England High Performance Computing cluster that runs all production workflows. Pegasus uses IBM's Load Sharing Facility (commonly called Spectrum LSF) as its workload management tool (job scheduler).

Accessing the HPC

The HPC is accessed via ssh from the terminal. For example, a GeCIP user, John Doe, would connect with:

ssh <username>@hpc-prod-grid-login-gecip-01

The address will change depending on which group you belong to; see the table below for more information.

You will then be prompted for your password, and once entered, will be connected to the HPC.

If you do not want to enter a password each time you connect, you can create an ssh key and an ssh config file that will make logging in easier.

- Create an ssh key in your .ssh folder, which is located at /home/<username>/.ssh

cd .ssh
ssh-keygen

Follow the prompts to name your ssh key (cluster is a good name) and leave the passphrase blank.
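If you prefer to skip the prompts, the key can also be generated non-interactively. This is a sketch: the ed25519 key type is an assumption, so substitute the type your cluster requires.

```shell
# Non-interactive equivalent of the prompts above: create the .ssh folder
# if needed, then generate a passphrase-less key pair named "cluster"
# (only if the key does not already exist).
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/cluster" ] || ssh-keygen -q -t ed25519 -N "" -f "$HOME/.ssh/cluster"
```

This produces ~/.ssh/cluster (private key) and ~/.ssh/cluster.pub (public key).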

Create a ssh config file called config in the .ssh folder with the following information and format:

Host cluster
	Hostname hpc-prod-grid-login-gecip-01
	User <your username>
	IdentityFile ~/.ssh/cluster

Copy your new ssh public key to the HPC

ssh-copy-id -i cluster.pub cluster

This will ask for your password, then copy the ssh key to the HPC.

Now, instead of having to type

ssh <username>@hpc-prod-grid-login-gecip-01

you can connect by typing

ssh cluster

Login node access addresses

Name | Who | LDAP group
hpc-prod-grid-login-gecip-01 | GeCIPs & Researchers | gecip_lsf_access, research_lsf_access
hpc-prod-grid-login-discoveryforum-01 | Commercial (Discovery Forum) | discovery_lsf_access

Using software on the HPC

Software on the Genomics England HPC is managed through the module system framework. A full list of the software available in the modules can be found here: Software Available on the HPC.

Module system commands

Loading software:

module load R/3.4.0

Always specify the version of the software you want to load, to avoid errors and unexpected results. For example, running module load R with no version loads the default (3.5.1) instead of the desired 3.4.0.

Unloading software

module unload R/3.4.0

Switching versions (requires the software to be loaded first)

module switch R/3.3.0

How to submit jobs to LSF

Before you can submit jobs to the cluster, you need to load the cluster module.

module load cluster/prod

To load the module automatically when you connect to the HPC over ssh, add the command to the end of your .bashrc file.

If the above does not work for some reason, run the following line:

source /lsf/prod/conf/profile.lsf

To submit an LSF job, use the bsub command:

bsub -q <queue_name> -P <project_code> -o <output.stdout> -e <output.stderr> <myjob>

To submit an LSF job using a script, use the following command:

bsub -q <queue_name> -P <project_code> -o <output.stdout> -e <output.stderr> < <myscript.sh>
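The same options can also be embedded in the script itself as #BSUB directives, which bsub reads when the script is redirected into it. The following is an illustrative sketch; the queue and payload are examples, and you should substitute your own project code.

```shell
#!/bin/bash
# Example LSF job script: lines beginning with #BSUB are read by bsub
# when the script is submitted as:  bsub < myscript.sh
# (when run as a plain script, these lines are ordinary comments)
#BSUB -q short           # 4-hour queue
#BSUB -P <project_code>  # your project code (see LSF Project Codes)
#BSUB -o job.%J.stdout   # %J expands to the LSF job ID
#BSUB -e job.%J.stderr
echo "job running on $(hostname)"
```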

For a list of all LSF queues and project codes see LSF Project Codes.

You will only be able to submit to queues that you have LDAP access to.

Please use the login node only as a portal to the HPC for submitting jobs, and nothing else. Unauthorised tools are not permitted to run on the login nodes; if found running, they will be terminated without warning.

Interactive Vs Batch Jobs

Interactive jobs are jobs that you interact with:

  • via the command line or a GUI
  • the job stays connected to the submission shell

Interactive jobs have a dedicated queue (named inter) with dedicated resources during core hours for faster dispatch.

Batch jobs are jobs that you don't interact with; the job is disconnected from the submission shell.

Jobs are batch by default.
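For example, an interactive shell session can be requested with bsub's -Is option (an interactive job with a pseudo-terminal):

```shell
# Request an interactive bash session on the dedicated inter queue;
# the shell starts once the job is dispatched.
bsub -q inter -Is /bin/bash
```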

Some Basic LSF Commands

bsub      submits a job to the cluster
bqueues   shows info on the cluster queues
bjobs     shows info on the cluster jobs
bhosts    shows info on the cluster hosts
bhist     shows info on finished cluster jobs
bacct     shows statistics and info on finished cluster jobs
bkill     removes a job from the cluster
lshosts   shows static resource info
lsload    shows dynamic resource info

bjobs (display information about LSF jobs)

bjobs is a very handy command for viewing job information on both pending and running jobs. With the long option (-l), it shows why a job is pending (for queued jobs), and where the job runs, its turnaround time, and detailed resource usage (for running jobs).


bjobs -l <JOBID>


bjobs -l 513

Genomics England LSF setup

Each node has a fixed number of 'job slots'. A job consumes one slot (more for parallel jobs). The standard policy is one job slot per CPU for single-core CPUs, or one job slot per core for multi-core CPUs.

The cluster also has a maximum concurrent job limit to allow fast dispatch.
In our estate, this means each compute node can accommodate 22 concurrent batch jobs.

For interactive jobs, we allow more job slots per node, on the assumption that interactive workloads are not resource intensive (they are mostly a way for users to submit to the cluster rather than a place to run day-to-day activities directly on the submission node).


This is our main Production grid. All workloads are expected to be submitted to this grid, targeting the right queue. (The total number of job slots will increase over time.)

Cluster name | Total number of CPU cores | Total number of Job slots | Available queues
View cluster information

To view cluster information (LSF version, cluster name, master host) and check that your environment is set up correctly, run the command lsid:


IBM Spectrum LSF Standard, Jul 08 2016

Copyright International Business Machines Corp. 1992, 2016.

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

My cluster name is pegasus

My master name is hpc-prod-grid-lsfmaster-01.gel.zone

You'll be able to see which cluster you are connected to; "My cluster name is pegasus" refers to the Production cluster.

Queues Available

Genomics England HPC queues are time-based queues (short, medium, long). This means you need to submit jobs to the queue that reflects the runtime of your job.

For example, if you have a job that will run for anything up to 4 hours, the short queue is the one to submit to. Likewise, if you have a job that runs for up to 24 hours, the medium queue is the right choice.

To see all available queues in the grid, run bqueues

QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
inter            50  Open:Active       -    -    -    -     0     0     0     0
short            30  Open:Active       -    -    -    -     0     0     0     0
medium           20  Open:Active       -    -    -    -     0     0     0     0
long             10  Open:Active       -    -    -    -     0     0     0     0

Queue name | Who | LDAP group | Description
inter | ALL | N/A | For lightweight interactive or GUI tools. Per-user concurrent job limit of 5.
short | external | discovery_lsf_access, gecip_lsf_access, research_lsf_access | For jobs with a maximum runtime of 4 hours.
medium | external | discovery_lsf_access, gecip_lsf_access, research_lsf_access | For jobs with a maximum runtime of 24 hours.
long | external | discovery_lsf_access, gecip_lsf_access, research_lsf_access | For jobs with unlimited runtime (defaults to 7 days if not specified; or specify a limit with -W [hours:]minutes).
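As a sketch, a job expected to run for up to 48 hours could be sent to the long queue with an explicit runtime limit (the project code and script name are placeholders):

```shell
# -W [hours:]minutes caps the runtime; here 48 hours
bsub -q long -P <project_code> -W 48:00 -o out.%J -e err.%J < myscript.sh
```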

Resources in LSF

LSF tracks resource availability and usage, and LSF jobs can request specific, defined resources.

All hosts have static numeric resources, e.g.:

  • maxmem   total physical memory
  • ncpus    number of CPUs
  • maxtmp   maximum available space in /tmp
  • cpuf     CPU factor (relative performance)

All hosts also have dynamic numeric resources, e.g.:

  • mem   available memory
  • tmp   available space in /tmp
  • ut    CPU utilisation

Additionally, boolean resources (such as OS and architecture) can be defined per host, which allows easy targeting of the correct platform. Example generic and specific resources:

  • ub1604   host is running Ubuntu 16.04
  • dsk      host has a local disk /scratch with 2 TB of space

Ways to specify resource requirement strings (-R option):

  • Select: a logical expression built from a set of resource names
  • Order: the order string is used for host sorting and selection
  • Usage: used to specify resource reservations for the job
  • Span: a span string specifies the locality of a parallel job
  • Same: the same string specifies that all processes of a parallel job must run on hosts with the same resource
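These strings are combined in a single -R option to bsub. The following is a sketch: the memory figure and queue are illustrative, and the units for memory values depend on the cluster's LSF configuration.

```shell
# Select hosts with more than 16000 MB of free memory, reserve that amount
# for the job, and keep all 4 parallel processes on a single host.
bsub -q medium -n 4 -R "select[mem>16000] rusage[mem=16000] span[hosts=1]" < myscript.sh
```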