
Title: Exploiting long read sequencing technologies to establish high quality highly contiguous pig reference genome assemblies

Author
WARR, AMANDA - University Of Edinburgh
HALL, RICHARD - Pacific Biosciences Inc
KIM, KIRSTI - Pacific Biosciences Inc
TSENG, ELIZABETH - Pacific Biosciences Inc
KOREN, SERGEY - National Institutes Of Health (NIH)
PHILLIPPY, ADAM - National Institutes Of Health (NIH)
Bickhart, Derek
Rosen, Benjamin - Ben
Schroeder, Steven - Steve
HUME, DAVID - Roslin Institute
TALBOT, RICHARD - University Of Edinburgh
RUND, LAURIE - University Of Illinois
SCHOOK, LAWRENCE - University Of Illinois
CHOW, WILLIAM - Wellcome Trust Sanger Institute
HOWE, KIRSTIN - Wellcome Trust Sanger Institute
Nonneman, Danny - Dan
Rohrer, Gary
PUTNAM, NICHOLAS - Dovetail Genomics
GREEN, ED - Dovetail Genomics
WATSON, MICK - Roslin Institute
Smith, Timothy - Tim
ARCHIBALD, ALAN - Roslin Institute

Submitted to: Plant and Animal Genome Conference
Publication Type: Abstract Only
Publication Acceptance Date: 10/31/2016
Publication Date: 1/14/2017
Citation: Warr, A., Hall, R., Kim, K., Tseng, E., Koren, S., Phillippy, A.M., Bickhart, D.M., Rosen, B.D., Schroeder, S.G., Hume, D.A., Talbot, R., Rund, L., Schook, L.B., Chow, W., Howe, K., Nonneman, D.J., Rohrer, G.A., Putnam, N., Green, E., Watson, M., Smith, T.P., Archibald, A.L. 2017. Exploiting long read sequencing technologies to establish high quality highly contiguous pig reference genome assemblies [abstract]. Plant and Animal Genome Conference XX, January 14-18, 2017, San Diego, California. Paper No. 25025.

Technical Abstract: The current pig reference genome sequence (Sscrofa10.2) was established by Sanger sequencing, following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, because sequence coverage was low (4-6x), the resulting assembly was only of draft quality. We have built new de novo genome assemblies from whole genome shotgun (WGS) sequence reads generated using Pacific Biosciences (PacBio) long read sequencing technology for two pigs: the original reference animal (Duroc sow 2-14) and a Duroc/Landrace/Yorkshire crossbred barrow. About 60-70x coverage of WGS data per animal was assembled with the Falcon assembler and error corrected with Quiver/Arrow and Pilon, using high coverage WGS PacBio and Illumina reads, respectively. The estimated accuracy (99.999%) of the Duroc assembly meets the requirement for a gold standard finished sequence. The Duroc assembly was scaffolded with paired-end reads from isogenic BAC and fosmid clones. The crossbred assembly was scaffolded using Dovetail’s HiRise. The current statistics for these assemblies are: Duroc 2-14 (Sscrofa11) for SSC1-18 and SSCX (2.39 Gbp; 122 contigs; contig N50 = 58.5 Mbp; scaffold N50 = 107.6 Mbp); Duroc/Landrace/Yorkshire crossbred for SSC1-18, SSCX and SSCY (2.62 Gbp; 14,924 contigs; contig N50 = 6.5 Mbp; scaffold N50 = 132 Mbp). The BAC and fosmid clone resource from Duroc 2-14 will facilitate further targeted sequence closure. These improved genome assemblies will be a key resource for research in pigs and will enable applications in agriculture and biomedicine. The assemblies are being deposited in public databases under the pre-publication data release terms of the Toronto Statement (Nature 461:168-70).
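
For readers unfamiliar with the statistics quoted in the abstract, the following is a minimal sketch of how a contig or scaffold N50 and a Phred-scaled quality value are conventionally computed. The abstract does not state which tools or exact definitions the authors used, so the function names (n50, phred_qv) and the toy lengths below are illustrative assumptions, not the authors' pipeline.

```python
import math


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (assumed here): the length L such that
    sequences of length >= L together contain at least half of the
    total assembled bases.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def phred_qv(accuracy):
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred quality value."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6]  # hypothetical lengths, not real data
    print("N50:", n50(toy_contigs))
    print("QV for 99.999% accuracy:", round(phred_qv(0.99999), 2))
```

Under this definition, an estimated accuracy of 99.999% corresponds to roughly one error per 100,000 bases, or a quality value of about 50.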