Pegasus is the code name for the Genomics England High Performance Computing (HPC) cluster, which runs all production-worthy workflows.
Pegasus uses IBM Spectrum LSF (Load Sharing Facility, usually just called LSF) as its workload manager (job scheduler).
Accessing the HPC
The HPC is accessed via ssh from a terminal. For example, a GeCIP user, John Doe, would connect to the GeCIP login node with ssh. The address depends on which group you belong to; see the table below for more information.
You will then be prompted for your password, and once entered, will be connected to the HPC.
If you do not want to enter a password each time you connect, you can create an SSH key and an SSH config file that make logging in easier.
Create an SSH key in your .ssh folder, which is located in /home/<username>/.ssh.
Follow the prompts to name your SSH key (I suggest cluster) and leave the passphrase blank.
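If you prefer to skip the prompts, the same key can be generated in one go. A sketch to run on your own machine (not on the HPC); the name cluster follows the suggestion above, while the RSA type and 4096-bit size are our choice rather than a site requirement.

```shell
# Make sure ~/.ssh exists with the permissions ssh expects.
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Generate a key pair called "cluster" with an empty passphrase (-N "").
# This creates ~/.ssh/cluster (private key) and ~/.ssh/cluster.pub (public key).
# ssh-keygen asks before overwriting an existing key with the same name.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/cluster -N ""
```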
Create an SSH config file called config in the .ssh folder with the following information and format:
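A minimal sketch of such an entry, written from the shell. The username johndoe is a placeholder, and the host name is the GeCIP login node from the table below; both are assumptions to replace with your own values (use the full address you were given if the short name does not resolve).

```shell
# Append a host alias called "cluster" to ~/.ssh/config on your own machine.
# Placeholders: johndoe (your HPC username) and the login node for your group.
mkdir -p ~/.ssh
chmod 700 ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host cluster
    HostName hpc-prod-grid-login-gecip-01
    User johndoe
    IdentityFile ~/.ssh/cluster
EOF
# ssh rejects a config file that other users can write to.
chmod 600 ~/.ssh/config
```

Once the public key has been copied to the HPC, ssh cluster is all you need to log in. With OpenSSH 6.8 or later, ssh -G cluster prints the settings ssh will use without actually connecting, which is a quick way to check the entry.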
Copy your new SSH public key to the HPC.
This will ask for your password, then copy the ssh key to the HPC.
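Now, instead of having to type the full ssh command with your username and the login node address, you can connect by typing ssh followed by the short host name from your config file.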
Login node access addresses
|Login node||Users||LDAP group|
|hpc-prod-grid-login-gecip-01||GeCIPs & Researchers||gecip_lsf_access, research_lsf_access|
|hpc-prod-grid-login-discoveryforum-01||Commercial (Discovery Forum)||discovery_lsf_access|
Using software on the HPC
Software on the Genomics England HPC is managed through the module system. A full list of the software available as modules is on the page Software Available on the HPC.
Module system commands
Loading software:
module load R/3.4.0
Always specify the version of the software you want to load, to avoid errors and unexpected results. For example, running module load R loads the default version, 3.5.1, instead of the intended 3.4.0.
Unloading software:
module unload R/3.4.0
Switching versions (the module must already be loaded):
module switch R/3.3.0
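To make the "always pin a version" rule hard to forget, a small shell function can refuse unversioned module names. This is a hypothetical helper (load_pinned is not part of the module system); it only wraps module load.

```shell
# load_pinned: hypothetical wrapper around "module load" that only accepts
# name/version module names such as R/3.4.0.
load_pinned() {
    local mod=$1
    case $mod in
        */?*)
            # A version is present after the slash: load it.
            module load "$mod"
            ;;
        *)
            echo "load_pinned: give an explicit version, e.g. R/3.4.0 (got '$mod')" >&2
            return 1
            ;;
    esac
}
```

For example, load_pinned R/3.4.0 loads R 3.4.0, while load_pinned R prints an error instead of silently loading the default version.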
How to submit jobs to LSF
Before you can submit jobs to the cluster, you need to load the cluster module.
module load cluster/prod
To load the module automatically on ssh connection to the HPC, add the command to the end of your
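A minimal sketch, assuming your shell reads ~/.bashrc when you log in (this varies between systems, so check which start-up file yours uses). The grep guard stops the line being appended twice if you run this more than once.

```shell
# Append "module load cluster/prod" to ~/.bashrc unless it is already there.
line='module load cluster/prod'
touch ~/.bashrc
grep -qxF "$line" ~/.bashrc || echo "$line" >> ~/.bashrc
```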
If the above does not work for some reason, run the following line:
To submit an LSF job, use the bsub command:
bsub -q <queue_name> -P <project_code> -o <output.stdout> -e <output.stderr> <myjob>
To submit an LSF job using a script, use the following command:
bsub -q <queue_name> -P <project_code> -o <output.stdout> -e <output.stderr> < <myscript.sh>
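Options can also live inside the script: bsub reads lines beginning with #BSUB when the script is submitted as bsub < myscript.sh. A minimal sketch; the job name, run limit and file names are illustrative, the queue must be one you have access to, and <project_code> must be replaced with your code from LSF Project Codes.

```shell
#!/bin/bash
# myscript.sh: minimal LSF batch job (sketch).
#   -q / -P   queue and project code (replace <project_code> with yours)
#   -J        job name
#   -o / -e   output and error files; %J is replaced by the LSF job ID
#   -W        run-time limit in [hours:]minutes
# To bash, the #BSUB lines are ordinary comments, so the script can also be
# run directly as a quick check before submitting it.
#BSUB -q short
#BSUB -P <project_code>
#BSUB -J hello_lsf
#BSUB -o hello_lsf.%J.stdout
#BSUB -e hello_lsf.%J.stderr
#BSUB -W 0:30

# LSB_JOBID is set by LSF inside a job; fall back to "none" outside LSF.
job_id=${LSB_JOBID:-none}
echo "Job ${job_id} running on $(uname -n)"
```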
For a list of all LSF queues and project codes see LSF Project Codes.
You will only be able to submit to queues that you have LDAP access to.
Use the login nodes only as a portal for submitting jobs to the HPC. Unauthorised tools are not permitted to run on the login nodes and will be terminated without warning if found.
Interactive vs Batch Jobs
Interactive jobs are jobs that you interact with:
- through the command line
- the job stays connected to the submission shell
Interactive jobs have a dedicated queue, inter, with dedicated resources during core hours for faster dispatch.
Batch jobs are jobs that you don't interact with; the job is disconnected from the submission shell.
Jobs are batch by default.
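For example, an interactive shell on a compute node can be requested with bsub -Is, which submits an interactive job with a pseudo-terminal so it stays attached to your terminal. A sketch wrapped in a hypothetical helper (lsf_shell is our name, not an LSF command); pass your own project code.

```shell
# lsf_shell: hypothetical helper that opens an interactive bash session on a
# compute node. -Is = interactive job with pseudo-terminal and shell mode;
# -q inter targets the dedicated interactive queue.
lsf_shell() {
    local project=$1
    bsub -Is -q inter -P "$project" /bin/bash
}
```

Run lsf_shell <project_code> from the login node and type exit to end the job. The same command submitted without -Is, to one of the time-based queues, would run as an ordinary batch job.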
Some Basic LSF Commands
submits a job to the cluster
shows info on the cluster queues
shows info on the cluster jobs
shows info on the cluster hosts
shows info on the finished cluster jobs
shows statistics and info on finished cluster jobs
removes a job from the cluster
shows static resource info
shows dynamic resource info
bjobs (display information about LSF jobs)
bjobs is a handy command for viewing information about both pending and running jobs. With the long option (-l) it gives a high-level view of why a job is still pending in the queue or, for a running job, where it is running, its turnaround time and detailed resource usage.
bjobs -l <JOBID>
bjobs -l 513
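In scripts it is often handier to poll just a job's status. A minimal sketch, assuming the bjobs -o (custom output fields) and -noheader options of LSF 10.1; wait_for_job is a hypothetical helper name and the default 60-second interval is arbitrary.

```shell
# wait_for_job: poll bjobs until the job finishes, then print its final status
# (DONE = exited with code 0, EXIT = non-zero exit, killed or aborted).
# Prints UNKNOWN and returns 1 if bjobs no longer reports the job.
wait_for_job() {
    local jobid=$1 interval=${2:-60} stat
    while true; do
        # Strip padding so the status compares cleanly.
        stat=$(bjobs -noheader -o stat "$jobid" 2>/dev/null | tr -d '[:space:]')
        case $stat in
            DONE|EXIT) echo "$stat"; return 0 ;;
            "")        echo "UNKNOWN"; return 1 ;;
        esac
        sleep "$interval"
    done
}
```

For example, wait_for_job 513 120 checks job 513 every two minutes until it reaches DONE or EXIT.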
Genomics England LSF setup
Each node has a fixed number of job slots. A job consumes one slot (or more, for parallel jobs). The standard policy is one job slot per CPU for single-core CPUs, or one job slot per core for multi-core CPUs.
Consequently, the cluster has a maximum number of concurrent jobs, which allows fast dispatch.
In our estate, this means each compute node can run up to 22 concurrent batch jobs.
For interactive jobs we allow more job slots per node, on the assumption that interactive workloads are not resource intensive (they are mostly a way for users to run day-to-day activities on the cluster rather than directly on the submission node).
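A job that uses several cores should request several slots so LSF can account for them. A sketch using the standard bsub options -n (number of slots) and -R "span[hosts=1]" (keep all slots on one node); the queue, slot count and file names are illustrative. LSB_DJOB_NUMPROC is set by LSF to the number of slots allocated, so the script can hand it to a multi-threaded tool.

```shell
#!/bin/bash
# Sketch of a 4-slot job; replace <project_code> with your own project code.
# With 22 slots per node, at most 5 jobs of this size can share one node.
#BSUB -q short
#BSUB -P <project_code>
#BSUB -n 4
#BSUB -R "span[hosts=1]"
#BSUB -o multi.%J.stdout
#BSUB -e multi.%J.stderr

# Number of slots LSF allocated to this job (1 when run outside LSF).
threads=${LSB_DJOB_NUMPROC:-1}
echo "Using ${threads} threads"
# Pass "$threads" to your tool's own thread option here.
```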
Pegasus is our main production grid. All workloads are expected to be submitted to this grid, targeting the right queue. (The total number of job slots will increase over time.)
|Cluster Name||Total number of CPU cores||Total number of Job slots||Available queues|
View cluster information
To view cluster information (LSF version, cluster name, master host) and check that your environment is set up correctly, run lsid:
IBM Spectrum LSF Standard 10.1.0.0, Jul 08 2016
Copyright International Business Machines Corp. 1992, 2016.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
My cluster name is pegasus
My master name is hpc-prod-grid-lsfmaster-01.gel.zone
The output shows which cluster you are connected to: "My cluster name is pegasus" refers to the production cluster.
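The same check can be scripted, for example at the top of a submission script. A sketch; check_pegasus is a hypothetical helper that looks for the "My cluster name is pegasus" line in the lsid output shown above.

```shell
# check_pegasus: succeed only if lsid reports the pegasus production cluster.
check_pegasus() {
    if lsid 2>/dev/null | grep -qx 'My cluster name is pegasus'; then
        echo "OK: connected to pegasus"
    else
        echo "LSF environment not found; try: module load cluster/prod" >&2
        return 1
    fi
}
```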
Genomics England HPC queues are time-based (short, medium, long), so you need to submit each job to the queue that matches its runtime.
For example, a job that will run for up to 4 hours should be submitted to the short queue; likewise, a job that runs for up to 24 hours belongs in the medium queue.
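The rule can be written down as a small function. A sketch; choose_queue is a hypothetical helper taking the expected runtime in minutes. Leave some headroom in the estimate, since LSF normally terminates jobs that exceed a queue's run limit.

```shell
# choose_queue: hypothetical helper mapping an expected runtime in minutes to
# a time-based queue (short: up to 4 h, medium: up to 24 h, long: beyond).
choose_queue() {
    local minutes=$1
    if [ "$minutes" -le 240 ]; then
        echo short
    elif [ "$minutes" -le 1440 ]; then
        echo medium
    else
        echo long
    fi
}
```

For example, bsub -q "$(choose_queue 300)" -P <project_code> < myscript.sh would send a 5-hour job to the medium queue.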
To see all available queues in the grid, run bqueues:
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
inter 50 Open:Active - - - - 0 0 0 0
short 30 Open:Active - - - - 0 0 0 0
medium 20 Open:Active - - - - 0 0 0 0
long 10 Open:Active - - - - 0 0 0 0
|Queue name||Who||LDAP group||Description|
|inter||ALL||N/A||This queue is for lightweight interactive or GUI tools. The queue has a per-user concurrent job limit of 5.|
|short||external||discovery_lsf_access, gecip_lsf_access, research_lsf_access||This queue is for jobs with a maximum runtime of 4 hours|
|medium||external||discovery_lsf_access, gecip_lsf_access, research_lsf_access||This queue is for jobs with a maximum runtime of 24 hours|
|long||external||discovery_lsf_access, gecip_lsf_access, research_lsf_access||This queue is for jobs with unlimited runtime (defaults to 7 days if no limit is specified; specify your own limit with -W [hours:]minutes)|
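Because -W takes [hours:]minutes, a small conversion helps when you think in minutes or days. A sketch; lsf_runlimit is a hypothetical helper name.

```shell
# lsf_runlimit: hypothetical helper that converts a number of minutes into the
# [hours:]minutes form accepted by bsub -W (e.g. 4320 minutes -> 72:00).
lsf_runlimit() {
    local total=$1
    printf '%d:%02d\n' "$((total / 60))" "$((total % 60))"
}
```

For example, bsub -q long -P <project_code> -W "$(lsf_runlimit 4320)" < myscript.sh requests a 3-day limit instead of the 7-day default.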
Resources in LSF
LSF tracks resource availability and usage. Jobs can use the defined resources to request specific resources.
All hosts have static numeric resources, for example:
maxmem total physical memory
ncpus number of CPUs
maxtmp maximum available space in /tmp
cpuf CPU factor (relative performance)
All hosts also have dynamic numeric resources, for example:
mem available memory
tmp available space in /tmp
ut CPU utilisation
Hosts can additionally have boolean resources, such as OS and architecture, which make it easy to target the correct platform. Examples of generic and specific resources:
ub1604 host is running Ubuntu 16.04.
dsk host has a local /scratch disk with 2 TB of space
Sections of a resource requirement string (-R option):
Select (select[...]): a logical expression built from a set of resource names
Order (order[...]): the order string is used for host sorting and selection
Usage (rusage[...]): specifies resource reservations for jobs
Span (span[...]): specifies the locality of a parallel job
Same (same[...]): specifies that all processes of a parallel job must run on hosts with the same resource
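Several sections can be combined in a single -R string. A sketch of a hypothetical helper that builds one for a single-node job needing a given amount of memory; it assumes memory values are in MB (the unit depends on the cluster's LSF_UNIT_FOR_LIMITS setting, so check before relying on it) and uses dsk, the boolean resource described above, as an optional extra select term.

```shell
# lsf_resreq: build "select[...] rusage[...] span[...]" for a job that needs
# mem_mb of memory on one host, optionally adding extra select terms (e.g. dsk).
lsf_resreq() {
    local mem_mb=$1 extra=${2:-}
    local sel="mem>${mem_mb}"
    if [ -n "$extra" ]; then
        sel="${sel} && ${extra}"
    fi
    printf 'select[%s] rusage[mem=%d] span[hosts=1]\n' "$sel" "$mem_mb"
}
```

For example, bsub -q short -P <project_code> -R "$(lsf_resreq 8000 dsk)" < myscript.sh asks for a host with local /scratch and more than 8000 units of free memory, reserves that memory, and keeps the job on one host.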