
Research Project: Mapping Crop Genome Functions for Biology-Enabled Germplasm Improvement

Location: Plant, Soil and Nutrition Research

Title: Pan-genomic approaches to consistent annotation of rice genomes

Author
item CHOUGULE, KAPEEL - Cold Spring Harbor Laboratory
item LU, ZHENUAN - Cold Spring Harbor Laboratory
item OLSON, ANDREW - Cold Spring Harbor Laboratory
item WEI, SHARON - Cold Spring Harbor Laboratory
item Ware, Doreen

Submitted to: Plant and Animal Genome Conference
Publication Type: Abstract Only
Publication Acceptance Date: 1/13/2023
Publication Date: N/A
Citation: N/A

Interpretive Summary:

Technical Abstract: Since the first rice genome was sequenced 20 years ago, new genomes representing a wider range of varieties have been sequenced to explore agriculturally important traits, and these assemblies exhibit higher contiguity and completeness. Assembling a high-quality rice genome has become a commodity practice thanks to lower sequencing costs, improved sequencing chemistry, and better assembly algorithms. As we transition from a single reference to multiple-reference pangenomes, many challenges remain post-assembly, including accurately predicting gene structural annotations and assigning locus identifiers. Due to the predictive nature of annotation algorithms and the lack of curated or accession-specific transcript evidence, the majority of annotation tools lack either the sensitivity or the specificity to predict gene structure accurately. Protein structure is well conserved across the grass phylogeny, and even more so among accessions within a species. Based on this, we developed an annotation protocol that builds a pan-gene index using representative pan-gene models selected from comparative analysis of protein-coding gene family trees. We have benchmarked this protocol across other grasses but present results in rice here. We propagate these pan-genes onto genome assemblies of other unannotated rice accessions using Liftoff, and update the gene structures with available transcriptome evidence using PASA. To support the projected models, we curated and included evidence from the Nipponbare reference transcriptome, comprising ESTs and full-length mRNAs that were filtered for intron retention and clustered using CD-HIT. In addition, we demonstrate the power of using accession-specific rice full-length transcripts to improve gene structure and capture alternate isoforms. Supported by USDA-ARS #8062-21000-044-00D.
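
The abstract does not state how a representative pan-gene model is chosen from each gene family. As a purely illustrative sketch, the Python below assumes one simple heuristic: within each family, pick the member whose protein length is closest to the family median. The function name `pick_representatives`, the input layout, and the gene identifiers are all hypothetical, and the heuristic is a stand-in, not the protocol described above.

```python
from statistics import median


def pick_representatives(families):
    """Pick one representative gene per family (hypothetical heuristic).

    families: dict mapping family ID -> dict of gene ID -> protein length.
    Returns a dict mapping family ID -> chosen gene ID.
    """
    reps = {}
    for fam_id, members in families.items():
        if not members:
            continue  # skip empty families
        med = median(members.values())
        # Choose the member closest to the median length;
        # ties are broken by gene ID so the result is deterministic.
        reps[fam_id] = min(members, key=lambda g: (abs(members[g] - med), g))
    return reps


# Made-up example data: two families across three hypothetical accessions.
example_families = {
    "fam1": {"acc1_g1": 300, "acc2_g1": 310, "acc3_g1": 150},
    "fam2": {"acc1_g2": 500, "acc2_g2": 498},
}
representatives = pick_representatives(example_families)
```

In a real workflow the chosen models would then be projected onto unannotated assemblies and refined with transcript evidence, as the abstract describes; those steps depend on external tools and are not sketched here.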