
Research Project: Championing Improvement of Sorghum and Other Agriculturally Important Species through Data Stewardship and Functional Dissection of Complex Traits

Location: Plant, Soil and Nutrition Research

Title: Gapless assembly of complete human and plant chromosomes using only nanopore sequencing

Author
KOREN, SERGEY - National Human Genome Research Institute
BAO, ZHIGUI - Chinese Academy Of Agricultural Sciences
GUARRACINO, ANDREA - University Of Tennessee
OU, SHUJUN - The Ohio State University
GOODWIN, SARA - Cold Spring Harbor Laboratory
JENIKE, KATHARINE - Johns Hopkins University
LUCAS, JULIAN - University Of California Santa Cruz
MCNULTY, BRANDY - University Of California Santa Cruz
PARK, JIMIN - University Of California Santa Cruz
PHILLIPPY, ADAM - National Human Genome Research Institute
RAUTIAINEN, MIKKO - National Human Genome Research Institute
RHIE, ARANG - National Human Genome Research Institute
ROELOFS, DICK - Keygene Nv
SCHNEIDERS, HARRIE - Keygene Nv
VRIJENHOEK, ILSE - Keygene Nv
NIJBROEK, KOEN - Keygene Nv
Ware, Doreen
SCHATZ, MICHAEL - Johns Hopkins University
GARRISON, ERIK - University Of Tennessee
HUANG, SANWEN - Chinese Academy Of Agricultural Sciences
MCCOMBIE, RICHARD - Cold Spring Harbor Laboratory
MIGA, KAREN - University Of California Santa Cruz
WITTENBERG, ALEXANDER - Keygene Nv

Submitted to: bioRxiv
Publication Type: Pre-print Publication
Publication Acceptance Date: 3/19/2024
Publication Date: N/A
Citation: N/A

Interpretive Summary: Imagine trying to put together a puzzle with millions of pieces, where each piece represents a part of a living organism's DNA. Scientists have developed powerful new tools to help solve these puzzles better than ever before. In recent years, researchers have relied on a combination of two cutting-edge technologies to decode the entire genetic instructions of humans and plants. This study tested whether a single newer technology, Oxford Nanopore (ONT) Duplex sequencing, could do the same job on its own. The technique reads both strands of the DNA, making it highly accurate while still reading very long pieces of DNA. Scientists tested the method on three well-known genomes: human HG002, a tomato variety called Solanum lycopersicum Heinz 1706, and a maize variety known as Zea mays B73. For the human genome, they used an additional technique called "Pore-C" to map how DNA folds and interacts inside cells, which makes it possible to separate the two copies of each chromosome that a person inherits. The ONT Duplex sequencing provided accuracy similar to the best existing technologies, with the added benefit of reading longer DNA sequences. The method allowed scientists to piece together most chromosomes as single continuous pieces with an accuracy of over 99.999%. This means complete genomes can now be decoded using just one type of instrument, making the process simpler and more accessible. This development opens the door to new possibilities in genetics research, helping us understand more about human health, plant biology, and potentially many other living organisms.

Technical Abstract: The combination of ultra-long Oxford Nanopore (ONT) sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, “telomere-to-telomere” genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT “Duplex” sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used “Pore-C” chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the ultra-long reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and has the potential to provide a single-instrument solution for the reconstruction of complete genomes.
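
The abstract equates a base accuracy of 99.999% with Q50. That correspondence follows from the standard Phred scale, Q = -10 · log10(error rate). The short Python sketch below only illustrates that arithmetic; it is not code from the study, and the function names are hypothetical.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1] to a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        return math.inf  # a perfect sequence has no finite Phred score
    return -10.0 * math.log10(error_rate)


def accuracy_from_phred(q: float) -> float:
    """Convert a Phred-scaled quality value back to per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


def expected_errors(q: float, n_bases: int) -> float:
    """Expected number of erroneous bases among n_bases at quality q."""
    return n_bases * 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.999% accuracy sits at roughly Q50 (up to floating-point rounding).
    print(f"Q for 99.999% accuracy: {phred_from_accuracy(0.99999):.2f}")
    # At Q50, about one error is expected per 100,000 bases.
    print(f"Expected errors per Mb at Q50: {expected_errors(50, 1_000_000):.1f}")
```

On this scale every 10 points is a tenfold drop in error rate, so Q50 corresponds to about one error per 100,000 bases.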