Publication #411445

Research Project: Championing Improvement of Sorghum and Other Agriculturally Important Species through Data Stewardship and Functional Dissection of Complex Traits

Location: Plant, Soil and Nutrition Research

Title: Swift pan-genomic methods for comprehensive genome annotation in crop genomes

Author
CHOUGULE, KAPEEL - Cold Spring Harbor Laboratory
WEI, SHARON - Cold Spring Harbor Laboratory
LU, ZHENYUAN - Cold Spring Harbor Laboratory
OLSON, ANDREW - Cold Spring Harbor Laboratory
Ware, Doreen

Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 12/6/2023
Publication Date: N/A
Citation: N/A

Interpretive Summary:

Technical Abstract: Advanced assembly algorithms have made high-quality genome assemblies from long reads routine, but accurate annotation of gene structures remains a significant challenge. The difficulty stems from the predictive nature of annotation algorithms and from inconsistency in the available transcriptome evidence. A single reference genome annotation often falls short of representing the full coding potential of a species. De novo or ab initio gene annotations also suffer from sensitivity or specificity problems, caused by the absence of accession-specific evidence or by inadequately trained HMMs for gene prediction. As more accessions within a species are sequenced and annotated, there is a need to establish pan-genes, which encompass all known syntenic orthologs of a gene model and can be traced back to their original sources. To address this need, we have developed a pan-genomic approach that leverages representative pan-gene models selected through comparative analysis of gene family trees built with the Ensembl Compara pipeline. We have compared and benchmarked this approach against other methods that rely on phylogeny and alignment for clustering pan-genes. To propagate these pan-gene representatives onto the genome assemblies of other unannotated accessions, we employ Liftoff and then refine the gene structures with PASA, using available transcriptome evidence. This approach has been benchmarked across multiple genome assemblies of maize, Oryza, sorghum, and grapevine varieties. To assess the quality of gene structural annotations, we use the Gramene gene tree curation tool, which lets us visually identify inconsistent gene models and flag them for potential manual curation. Furthermore, we characterize pan-gene sets by taxonomic age and by their presence in each genome, classifying them as core, shell, or orphan genes.
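
The presence-based classification mentioned at the end of the abstract can be illustrated with a minimal Python sketch. The abstract does not state the cutoffs used, so the thresholds here are assumptions: a pan-gene is called "core" if it is present in every genome, "orphan" if it is present in exactly one, and "shell" otherwise. The function name `classify_pan_genes`, the accession labels, and the pan-gene identifiers are hypothetical, and taxonomic age is not modeled.

```python
from collections import Counter


def classify_pan_genes(presence, n_genomes):
    """Label each pan-gene by how many genomes contain a member.

    presence:  dict mapping a pan-gene ID to the set of genomes
               in which a member gene model was found.
    n_genomes: total number of genomes in the comparison.

    Assumed thresholds (not taken from the abstract):
      core   = present in all genomes
      orphan = present in exactly one genome
      shell  = everything in between
    """
    classes = {}
    for pan_gene, genomes in presence.items():
        k = len(genomes)
        if k == 0 or k > n_genomes:
            raise ValueError(f"{pan_gene}: invalid genome count {k}")
        if k == n_genomes:
            classes[pan_gene] = "core"
        elif k == 1:
            classes[pan_gene] = "orphan"
        else:
            classes[pan_gene] = "shell"
    return classes


# Hypothetical presence table for three accessions.
presence = {
    "pan_0001": {"acc_A", "acc_B", "acc_C"},
    "pan_0002": {"acc_A", "acc_C"},
    "pan_0003": {"acc_B"},
}
labels = classify_pan_genes(presence, n_genomes=3)
summary = Counter(labels.values())
print(labels)
print(summary)
```

In practice the presence table would be derived from the pan-gene clusters themselves, and studies often relax the "core" definition to tolerate a small number of missing genomes; that choice is left open here.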