Location: Plant, Soil and Nutrition Research
Title: Swift Pan-Genomic Methods for Comprehensive Genome Annotation in Crop Genomes.
Author
CHOUGULE, KAPEEL - Cold Spring Harbor Laboratory
WEI, SHARON - Cold Spring Harbor Laboratory
LU, ZHENYUAN - Cold Spring Harbor Laboratory
OLSON, ANDREW - Cold Spring Harbor Laboratory
Ware, Doreen
Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 5/7/2024
Publication Date: N/A
Citation: N/A
Interpretive Summary:
Technical Abstract: While the production of high-quality genome assemblies from long reads has become a common practice thanks to advanced assembly algorithms, the accurate annotation of gene structures remains a significant challenge. This challenge arises due to the predictive nature of the algorithms and the inconsistency in available transcriptome evidence. A single reference genome annotation often falls short in representing the full coding potential of a species. De novo or ab-initio gene annotations also encounter issues, whether it be sensitivity or specificity problems stemming from the absence of accession-specific evidence or inadequately trained HMMs for gene prediction. As more accessions are sequenced and annotated within a species, there arises a need to establish pan-genes, which encompass all known alleles for a gene model and can be traced back to their original sources. To tackle this challenge, we have developed a pan-genomic approach that leverages representative pan-gene models selected through a comparative analysis of gene family trees created using the Ensembl Compara pipeline. We have compared and benchmarked this approach against other methods that rely on phylogeny and alignment for clustering pan-genes. To propagate these pan-gene representatives onto the genome assemblies of other unannotated accessions, we employ Liftoff and subsequently enhance the gene structures using available transcriptome evidence through PASA. This approach has been benchmarked across multiple genome assemblies of maize, rice, sorghum, and grapevine varieties. To assess the quality of gene structural annotations, we employ the Gramene gene tree curation tool, allowing us to visually identify inconsistent gene models and flag them for potential manual curation.
Furthermore, we characterize pan-gene sets based on taxonomic age and their presence in each genome, classifying them as core, shell, or orphan genes.
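
The presence-based part of this classification can be illustrated with a minimal Python sketch. The abstract does not state the thresholds used, so the definitions here are assumptions: a pan-gene is "core" if it is present in every genome, "orphan" if it is present in exactly one, and "shell" otherwise. The function name, the pan-gene identifiers, and the accession labels are hypothetical, and the taxonomic-age dimension is not modeled.

```python
from typing import Dict, Set


def classify_pan_genes(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each pan-gene by the number of genomes that carry it.

    Assumed definitions (not taken from the abstract):
      core   -> present in all n_genomes
      orphan -> present in exactly one genome
      shell  -> present in more than one but not all genomes
    """
    labels: Dict[str, str] = {}
    for pan_gene, genomes in presence.items():
        count = len(genomes)
        if count == 0 or count > n_genomes:
            raise ValueError(f"{pan_gene}: presence count {count} is out of range")
        if count == n_genomes:
            labels[pan_gene] = "core"
        elif count == 1:
            labels[pan_gene] = "orphan"
        else:
            labels[pan_gene] = "shell"
    return labels


# Hypothetical presence table: pan-gene ID -> accessions in which it was annotated.
example_presence = {
    "pan_0001": {"acc_A", "acc_B", "acc_C"},
    "pan_0002": {"acc_A", "acc_C"},
    "pan_0003": {"acc_B"},
}

example_labels = classify_pan_genes(example_presence, n_genomes=3)
print(example_labels)
```

Real pan-genome analyses often relax the "core" definition (for example, presence in most rather than all genomes) to tolerate assembly or annotation gaps; that would be a one-line change to the first comparison.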