Research Project: Oat Pangenome Project Data Sharing Agreement

Location: Cereal Crops Improvement Research

Project Number: 3060-21000-046-011-N
Project Type: Non-Funded Cooperative Agreement

Start Date: Jun 1, 2021
End Date: May 31, 2026

Objective:
To produce a global pangenome consisting of approximately 30 diverse hexaploid oat accessions. This resource will be useful for characterizing core gene sets, identifying novel gene sequences, and accumulating comparative sequence information that will be particularly useful for agronomic and quality trait mapping and for evolutionary genomic analyses.

Approach:
Whole-genome sequencing technology and analysis have progressed to the point that the first complete, chromosome-level oat genome assemblies have been developed for two diploid oat species (Avena atlantica, AsAs genome; and A. eriantha, CpCp genome) and for the A. sativa (AACCDD genome) lines ‘Belinda’ and OT3098. Consequently, at the Plant & Animal Genome XXVIII Conference, January 11-15, 2020, the oat genomics community coalesced around an Oat Pangenome Project (PanOat Project) that would involve multiple labs worldwide and result in whole-genome sequence assemblies for approximately 30 diverse oat accessions. Sequenced lines will include three disease-resistant accessions held by the National Small Grains Collection (NSGC): Avena barbata PI 388828, the synthetic hexaploid Amagalon (CIav9364), and the heirloom variety Victoria (AFRI # RP-84), along with other accessions from around the globe.
Initial genome sequence information will be acquired independently by the collaborators, following similar rules to ensure high-quality assembly data. Most collaborators will acquire PacBio HiFi reads at 25x coverage and assemble these reads into contigs. Hi-C information will then be acquired and used to order and orient the scaffolds into chromosome-size sequences.
Twenty-four of the whole-genome sequences will be annotated to identify genes using a combination of de novo prediction, comparative analysis, and cDNA sequencing. To provide the cDNA evidence, RNA will be extracted from 6 botanical and developmental tissue types and pooled for PacBio IsoSeq sequencing, capturing genes important to the development and function of oat tissues. We will also develop a gene expression atlas for a subset of tissue types to make immediate use of the new annotations and provide a rich genomic resource for oat researchers.
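As a rough illustration of what the 25x HiFi coverage target implies, the sketch below applies the standard relationship between mean coverage, total sequenced bases, and genome size. The genome size used here (about 11 Gb for hexaploid oat) is an assumption for illustration only and is not stated in this project description; the function and constant names are likewise hypothetical.

```python
def required_yield_gb(genome_size_gb: float, coverage: float) -> float:
    """Total sequenced bases (Gb) needed to reach a target mean coverage."""
    if genome_size_gb <= 0 or coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return genome_size_gb * coverage


def mean_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Mean coverage achieved by a given sequencing yield (Gb)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# ASSUMPTION: approximate hexaploid oat genome size, not taken from this document.
ASSUMED_GENOME_SIZE_GB = 11.0
# From the project description: PacBio HiFi reads at 25x coverage.
TARGET_COVERAGE = 25.0

target_yield = required_yield_gb(ASSUMED_GENOME_SIZE_GB, TARGET_COVERAGE)
print(f"HiFi yield needed per accession: {target_yield:.0f} Gb")
```

Under that assumed genome size, each accession would need on the order of a few hundred gigabases of HiFi data; the actual figure depends on the true genome size of each line.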
The finalized assembly, annotation, and expression data will be centralized to facilitate sharing with PanOat Project collaborators for targeted analyses such as core gene set identification, haplotype block analysis, and structural variation detection. ARS will obtain sequencing information for its 4 contributed lines, facilitate annotation and expression data acquisition for 24 lines in the PanOat Project, house the finalized data before and after publication, and work on targeted analysis of the entire data set. ARS will share raw data with the Cooperator to facilitate assembly, annotation, and expression work. The Cooperator will apply its assembly expertise to integrate Hi-C and contig information into full-length sequences. The Cooperator will additionally act as a hub to provide raw and intermediate ARS data to other collaborators in the PanOat Project, deposit raw data in public databases, and share finalized data for all 30 accessions back to ARS for public access, storage, long-term data management, and visualization at ARS’s GrainGenes database (https://wheat.pw.usda.gov).
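Core gene set identification, one of the targeted analyses named above, can be sketched from a gene presence/absence table. The example below is a minimal illustration, not the project's method: it assumes genes have already been grouped into orthologous families, and the labels "core", "shell", and "private" follow one common convention. The gene and accession names are made up.

```python
from typing import Dict, Set


def classify_genes(presence: Dict[str, Set[str]], n_accessions: int) -> Dict[str, str]:
    """Label each gene family by how many accessions carry it.

    core    : present in every accession
    private : present in exactly one accession
    shell   : present in some, but not all, accessions
    """
    labels: Dict[str, str] = {}
    for gene, accessions in presence.items():
        count = len(accessions)
        if count == n_accessions:
            labels[gene] = "core"
        elif count == 1:
            labels[gene] = "private"
        else:
            labels[gene] = "shell"
    return labels


# Hypothetical toy data: three accessions, four gene families.
toy_presence = {
    "geneA": {"acc1", "acc2", "acc3"},
    "geneB": {"acc1", "acc3"},
    "geneC": {"acc2"},
    "geneD": {"acc1", "acc2", "acc3"},
}

toy_labels = classify_genes(toy_presence, n_accessions=3)
core_genes = sorted(g for g, label in toy_labels.items() if label == "core")
print(core_genes)
```

In practice the thresholds are often relaxed (for example, treating a gene as core when it is present in nearly all accessions) to tolerate assembly and annotation gaps.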