Next-Generation Nanopore Genome Sequencing
Over the last decade, improvements in next-generation DNA sequencing technology have transformed the field of genomics, making sequencing an essential tool in modern genetic and clinical research laboratories. The ability to sequence whole genomes or specific genomic regions of interest is delivering new insights in a variety of areas such as human health and disease, metagenomics, antimicrobial resistance, evolutionary biology and crop breeding.
Next-generation sequencing (NGS) technologies using DNA, RNA, or methylation sequencing have had an enormous impact on the life sciences. NGS is the method of choice for large-scale genomic and transcriptomic sequencing because of its high-throughput production, its output of sequencing data in the gigabase range per instrument run, and its lower cost compared to the traditional first-generation Sanger sequencing method. The vast amounts of data generated by NGS have broadened our understanding of structural and functional genomics through the concepts of “omics” ranging from basic genomics to integrated systeomics, providing new insight into the workings and meaning of genetic conservation and diversity of living things. NGS today is more than ever about how different organisms use genetic information and molecular biology to survive and reproduce with and without mutations, disease, and diversity within their population networks and changing environments.
In this chapter, the advances, applications, and challenges of NGS are reviewed, starting with a history of first-generation sequencing followed by the major NGS platforms, the bioinformatics issues confronting NGS data storage and analysis, and the impacts made in the fields of genetics, biology, agriculture, and medicine in the brave new world of “omics.”
NGS refers to the deep, high-throughput, in-parallel DNA sequencing technologies developed a few decades after the Sanger DNA sequencing method, which first emerged in 1977 and then dominated the field for three decades. The NGS technologies differ from the Sanger method in that they provide massively parallel analysis and extremely high throughput from multiple samples at much reduced cost. Millions to billions of DNA nucleotides can be sequenced in parallel, yielding substantially more throughput and minimizing the need for the fragment-cloning methods that were used with Sanger sequencing.
The second-generation sequencing methods are characterized by the need to prepare amplified sequencing libraries before sequencing the amplified DNA clones, whereas third-generation single-molecule sequencing can be done without the time-consuming and costly step of creating amplification libraries.
A genome is the complete genetic information of an organism or a cell. Single- or double-stranded nucleic acids store this information in a linear or circular sequence. To determine this sequence precisely, progressively more efficient technologies characterized by increased accuracy, throughput and sequencing speed have been developed. Nonetheless, sequencers can only generate sequences, known as reads, whose lengths fall within defined ranges, usually far shorter than the size of the genomes investigated. The complete genome sequence has to be deduced from the overlaps of these shorter fragments, a process known as de novo genome assembly. Historically, mostly due to time and cost constraints, only one individual per species was sequenced, and its sequence generally represents the ‘reference’ genome for the species. These reference genomes can guide resequencing efforts in the same species, acting as a template for read mapping. They can be annotated to understand gene function or used to design gene manipulation experiments (Teh et al., 2017). Sequences from different species can be aligned and compared to study molecular evolution. Because reference genomes underpin all these downstream applications, it is paramount that their sequence is as complete and error-free as possible. In the second half of the twentieth century, the available sequencing technologies were largely limited to relatively small genomes. Since the new millennium, novel platforms, known as Next Generation Sequencing (NGS), have been developed to address larger genomes, in a process called Whole Genome Sequencing (WGS). In the two decades following the advent of WGS, NGS has become increasingly efficient and affordable. In parallel, new sequencing technologies have emerged that promise to revolutionize the field and generate genomes of higher quality (Bickhart et al., 2017).
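The idea of deducing a longer sequence from the overlaps between reads can be illustrated with a toy example. The following is a minimal sketch of a greedy overlap merge, assuming error-free reads from a single strand and no repeats; the function names, the `min_overlap` parameter and the reads themselves are hypothetical and chosen only for illustration. Real assemblers use overlap or de Bruijn graphs and must tolerate sequencing errors, so this should not be read as how any particular assembler works.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Toy assumptions: reads are error-free, come from one strand, and the
    underlying sequence has no repeats longer than `min_overlap`.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: leave the contigs separate
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads, invented for this illustration.
toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(toy_reads))
```

With these three invented reads, each consecutive pair shares four bases, so the sketch merges them into the single contig `ATGGCGTACCTGA`. Reads that share no sufficiently long overlap are left as separate contigs, which is the toy analogue of a fragmented assembly.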
WGS is becoming widely used in clinical medicine, both in diagnostic contexts and to inform treatment choice. Here we evaluate the potential of the Oxford Nanopore Technologies (ONT) MinION long-read sequencer for routine WGS by sequencing the reference sample NA12878 and the genome of an individual with ataxia-pancytopenia syndrome and severe immune dysregulation. We develop and apply a novel reference panel-free analytical method to infer and then exploit phase information, which improves single-nucleotide variant (SNV) calling performance from otherwise modest levels. In the clinical sample, we identify and directly phase two non-synonymous de novo variants in SAMD9L (OMIM #159550), inferring that they lie on the same paternal haplotype. Whilst consensus SNV-calling error rates from ONT data remain substantially higher than those from short-read methods, we demonstrate the substantial benefits of analytical innovation. Ongoing improvements to base-calling and SNV-calling methodology must continue for nanopore sequencing to establish itself as a primary method for clinical WGS.
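The intuition behind directly phasing two nearby variants with long reads can be sketched as follows: if single reads span both sites, the alleles seen together on each read indicate whether the two variants sit on the same haplotype (cis) or on opposite haplotypes (trans). This is only an illustrative majority-vote sketch under simplifying assumptions, not the reference panel-free method described above, whose details are not given here. The function name, the site labels `v1` and `v2`, and the read data are all hypothetical.

```python
from collections import Counter


def phase_two_variants(reads, site_a, site_b):
    """Classify two heterozygous variants as 'cis' or 'trans' from spanning reads.

    `reads` is a list of dicts mapping a variant-site name to the allele
    observed on that read ("ref" or "alt"). Reads that do not cover both
    sites carry no phase information for this pair and are skipped.
    Returns (label, cis_support, trans_support).
    """
    counts = Counter()
    for alleles in reads:
        if site_a in alleles and site_b in alleles:
            counts[(alleles[site_a], alleles[site_b])] += 1
    # Same allele state at both sites supports cis; mixed states support trans.
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]
    if cis == trans:
        return "undetermined", cis, trans
    return ("cis" if cis > trans else "trans"), cis, trans


# Synthetic reads, invented for this illustration (not real data).
example_reads = [
    {"v1": "alt", "v2": "alt"},
    {"v1": "ref", "v2": "ref"},
    {"v1": "alt", "v2": "alt"},
    {"v1": "alt", "v2": "ref"},  # e.g. a base-calling error at v2
    {"v1": "alt"},               # covers only one site, so it is skipped
]
print(phase_two_variants(example_reads, "v1", "v2"))
```

A simple majority vote is used here because individual long reads can carry base-calling errors; a real pipeline would weight reads by base and mapping quality rather than counting them equally.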
Lieberman-Aiden E., van Berkum N.L., Williams L., Imakaev M., Ragoczy T., et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009.
Burton J.N., Adey A., Patwardhan R.P., Qiu R., Kitzman J.O., et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol. 2013.
Dudchenko O., Batra S.S., Omer A.D., Nyquist S.K., Hoeger M., et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017.
Teh B.T., Lim K., Yong C.H., Ng C.C.Y., Rao S.R., et al. The draft genome of tropical fruit durian (Durio zibethinus). Nat Genet. 2017.
Bickhart D.M., Rosen B.D., Koren S., Sayre B.L., Hastie A.R., et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet. 2017.