
Overview

Genomics England, in partnership with the Sanger Institute, has been assessing the advantages of long-read sequencing technologies over Illumina short-read whole genome sequencing. The primary objective is to identify structural variants (SVs), repeat expansions and contractions, and epigenetic modifications that cannot be accurately detected with short-read sequencing.

The dataset consists of human genomes from a subset of 100,000 Genomes Project participants assembled with ultra-long reads. The genomic data deposited in the Research Environment were generated with the Oxford Nanopore Technologies (ONT) PromethION (Beta) and comprise the full output of the long-read analytical pipeline 1.0.

This page describes the sequencing protocol and the analytical pipeline, and summarises the data available within the Research Environment.


  • Overview
  • Sequencing protocol
    • v1_protocol_ONT_LSK109
  • Bioinformatics pipeline
    • ONT pipeline 1.0
  • File structure


Sequencing protocol

Germline DNA from a subset of 100,000 Genomes Project participants was depleted of low molecular weight DNA (<10 kb) before library preparation. Libraries for ONT sequencing were prepared with the protocol indicated in the library_prep field of the ‘LRS_sample’ table in LabKey. Data were acquired with the PromethION Beta for 42-60 hours in high-accuracy mode. Full details of the protocol can be found here:

v1_protocol_ONT_LSK109


Bioinformatics pipeline

ONT pipeline 1.0

File structure

ONT samples

Folders:

The ONT samples are structured as follows:

run_id/sequencing_output_id/
    aligned_minimap/
    fast5_fail/
    fast5_pass/
    fastq_fail/
    fastq_pass/
    SV_sniffles/
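
As a quick sanity check before working with a sample, the layout above can be verified programmatically. The following is a minimal sketch: the subfolder names come from the listing above, while the function name `missing_subfolders` and any path passed to it are hypothetical.

```python
from pathlib import Path

# Subfolder names taken from the folder listing on this page.
EXPECTED_SUBFOLDERS = (
    "aligned_minimap",
    "fast5_fail",
    "fast5_pass",
    "fastq_fail",
    "fastq_pass",
    "SV_sniffles",
)


def missing_subfolders(output_dir):
    """Return the expected subfolder names that are absent from output_dir.

    output_dir is assumed to point at one run_id/sequencing_output_id/ folder.
    """
    root = Path(output_dir)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```

An empty list means that all six subfolders are present; any names returned indicate folders that are missing for that sequencing output.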

Files:

Fast5:

These files contain the raw output from the ONT sequencer in HDF5 format. Each file contains the data for up to 4,000 sequences.

Fastq:

This is the output of the ONT basecaller Guppy, containing the sequence and base-quality scores of each read.
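
To illustrate what each FASTQ record holds, here is a minimal sketch of a reader using only the Python standard library. It assumes the usual four-lines-per-record layout and Phred+33 quality encoding; the function names are hypothetical, and the simple arithmetic mean computed here is not necessarily the quality score the basecaller uses for its pass/fail decision.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        yield read_id, sequence, quality


def mean_phred(quality):
    """Arithmetic mean of per-base Phred scores, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(char) - 33 for char in quality) / len(quality)


# Example with an in-memory record; for real data, open the file
# (or gzip.open(path, "rt") for compressed files) and pass the handle.
example = io.StringIO("@read1 extra=fields\nACGT\n+\nIIII\n")
for read_id, sequence, quality in read_fastq(example):
    print(read_id, len(sequence), mean_phred(quality))
```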

Bam:

The BAM file contains all pass-filter reads and information on their alignment to GRCh38.

Sniffles VCF:

Structural variant calls against GRCh38 from the Sniffles tool.
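
As a first look at a call set, the variants can be tallied by type. The sketch below is a minimal example using only the Python standard library; it assumes an uncompressed VCF whose INFO column carries the standard `SVTYPE` key, and the function names are hypothetical.

```python
from collections import Counter


def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag entries without '=' map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, separator, value = entry.partition("=")
        info[key] = value if separator else True
    return info


def count_sv_types(lines):
    """Count variant records per SVTYPE value from an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])  # INFO is the eighth VCF column
        counts[info.get("SVTYPE", "unknown")] += 1
    return counts
```

To use it on a file, open the VCF in text mode and pass the handle to `count_sv_types`; a gzip-compressed file can be opened with `gzip.open(path, "rt")` instead.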
