Research Project: GrainGenes: Enabling Data Access and Sustainability for Small Grains Researchers

Location: Crop Improvement and Genetics Research

2021 Annual Report


Objectives
GrainGenes is an international, centralized crop database and information portal for peer-reviewed small grains data that serves the small grains (wheat, barley, oat, and rye) research and breeding communities. The GrainGenes project ensures long-term data curation, accessibility, and sustainability so that small grains researchers can develop new cultivars that are more nutritious, disease and pest resistant, and high yielding.
Objective 1: Accelerate small grains (wheat, oats, barley, and rye) trait analysis, germplasm analysis, genetic studies, and breeding by providing open access to small grains genome sequences, germplasm diversity information, trait mapping information, and phenotype data at GrainGenes. Goal 1A: Integrate small grains genome assemblies, pangenomes, and annotations into GrainGenes. Goal 1B: Integrate genetic, diversity, functional, and phenotypic data into GrainGenes with a genome-centric focus.
Objective 2: Develop an infrastructure to curate, integrate, query, and visualize the genetic, genomic, and phenotypic relationships in small grains germplasm. Goal 2A: Develop methods and pipelines to link genetic, genomic, functional, and phenotypic information and to enhance genome-centric focus. Goal 2B: Implement web-based and computational tools to integrate and visualize genomic data linked with genetic, expression, functional, and diversity data. Goal 2C: Update database structure to align with community migration to a unified interface.
Objective 3: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability. Goal 3A: Collaborate with data and germplasm repositories and organizations to facilitate the curation, sharing, and linking of data. Goal 3B: Collaborate with community software development efforts to adopt database schema design and tool development.
Objective 4: Provide community support and training for small grains researchers through workshops, webinars, and other outreach activities. Goal 4: Facilitate communication and information sharing among the small grains communities and GrainGenes to support research needs.


Approach
As a service project, the GrainGenes team does not perform hypothesis-driven research, but rather fulfills its long-term objectives by adding value to peer-reviewed data generated by others. It provides data curation, management and integration, long-term sustainability, and digital platforms as needed. Driven by stakeholder input, GrainGenes will maintain a central location for curated genomic, genetic, functional, and phenotypic data sets, downloadable in standardized formats and enhanced by intuitive query and visualization tools. Tutorial videos will be created to train small grains researchers to access and retrieve information from GrainGenes efficiently, and to show them different ways to reach and use multiple types of data to help develop better small grain crops.
Objective 1: Our approach will be to (a) curate genomic, pangenomic, and diversity data into the GrainGenes database; (b) create gene model pages to aggregate and link genomic and genetic data at GrainGenes; (c) curate high-impact, peer-reviewed genetic, trait, and phenotypic data into GrainGenes; (d) visualize more accurate genetic maps at GrainGenes; and (e) curate functional gene annotations.
Objective 2: We will implement computational pipelines to (a) align genomic and genetic features between different genome assemblies; (b) assign gene function for small grain genomes; (c) facilitate data curation into the GrainGenes database; (d) visualize SNP data online; and (e) display pedigree information. In addition, we will implement and maintain genome browsers to display tracks for multiple genome assemblies and create a multi-species Basic Local Alignment Search Tool (BLAST) interface to allow users to align their sequences against small grains genome assemblies. In parallel, we will prepare for a new release of GrainGenes with an updated content management system.
Objective 3: We will enhance links and data sharing between GrainGenes and the Triticeae Toolbox for small grains data, and collaborate with other data and germplasm repositories, groups, and organizations to facilitate the curation, sharing, and linking of data.
Objective 4: We will (a) present GrainGenes tools and resources at conferences and site visits; (b) create training videos to teach our users how they can use GrainGenes more efficiently; (c) organize annual meetings between GrainGenes and the GrainGenes Liaison Committee to receive community feedback; and (d) maintain GrainGenes e-mail lists to facilitate communication among members of the small grains community.


Progress Report
In support of Sub-objective 1A, several database tables were added to accommodate new genomic datasets and improve data representation. These tables form the basis of the GrainGenes Application Programming Interface (GGAPI), a new framework that GrainGenes created for genome browser feature search. The backend database for this application programming interface contains 85,461,858 records encompassing 124 genome browser tracks for 31 organisms.
In support of Sub-objective 1B, 150 new small grains datasets were curated, including 117 maps for oat, the wheat Axiom map set, and many single gene maps to accompany the Wheat Gene Catalogue curation work. This curation resulted in 992 new linkage groups in CMap, the genetic map display software, as well as 373,825 new marker records and 965 quantitative trait loci records. Specifically, for the curation of the Wheat Gene Catalogue into GrainGenes, updates have been made to 41 gene classes, which cover approximately 232 genes and 132 alleles.
In support of Sub-objective 2A, 24 single nucleotide polymorphism (SNP) variant calling pipelines were created for whole exome capture sequencing. To create these pipelines, eight read alignment applications (bowtie2, bowtie2 --local, bwa aln, bwa mem, gsnap, hisat2, STAR, and novoalign) were each followed by three variant calling applications (FreeBayes, BCFtools, and VarScan), resulting in 24 (8 x 3) pipelines. The pipelines were run against 48 elite wheat cultivars, and the performance of each pipeline was analyzed and compared.
In support of Sub-objective 2B, the web-based Basic Local Alignment Search Tool (BLAST) application in GrainGenes was replaced with the more advanced and customized Sequence Server software, which takes advantage of the latest National Center for Biotechnology Information (NCBI) BLAST version and also provides new visualization of results.
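The 8 x 3 pipeline grid described under Sub-objective 2A can be illustrated with a short sketch. The tool names below are taken from the report; the function name, the dictionary layout, and the numbering of the pipelines are assumptions made for illustration only and do not reflect how the GrainGenes team actually organized or ran its pipelines.

```python
from itertools import product

# Read aligners and variant callers as named in the report.
ALIGNERS = [
    "bowtie2", "bowtie2 --local", "bwa aln", "bwa mem",
    "gsnap", "hisat2", "STAR", "novoalign",
]
CALLERS = ["FreeBayes", "BCFtools", "VarScan"]


def build_pipeline_grid(aligners, callers):
    """Pair every aligner with every caller (hypothetical helper).

    Returns a list of dicts, one per pipeline, numbered from 1.
    """
    return [
        {"id": i, "aligner": aligner, "caller": caller}
        for i, (aligner, caller) in enumerate(product(aligners, callers), start=1)
    ]


PIPELINES = build_pipeline_grid(ALIGNERS, CALLERS)

if __name__ == "__main__":
    for p in PIPELINES:
        print(f"pipeline {p['id']:2d}: {p['aligner']} -> {p['caller']}")
    print(f"total pipelines: {len(PIPELINES)}")
```

Enumerating the combinations this way makes the count explicit: eight aligners times three callers gives 24 pipelines, each of which would then be run on the same set of cultivars so that results are comparable.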
The new BLAST page was designed so that results are directly linked to GrainGenes JBrowse-based genome browsers where available. The new GrainGenes BLAST service for wheat, barley, oat, and rye collections is available at https://wheat.pw.usda.gov/blast/. The service uses NCBI BLAST+ 2.10.0, with all of our databases provided in the new version 5 database format, and processing is load-balanced by multithreading. The new interface supports drag-and-drop input, selection of multiple databases, and multiple query sequences. A collection of 113 BLAST databases was recreated and linked to 44 GrainGenes genome browsers.
Progress on Sub-objective 2C included converting the baseline GrainGenes database to a PostgreSQL (v10) database, upon which a platform managed by a content management system will be built. Efforts continue to build data handling scripts in Python3. This work is expected to allow a transition of GrainGenes to a Chado (v3) based schema design. In tandem, new curated data sources are being sought for inclusion in the database resources.
In regard to Sub-objective 3A, GrainGenes created tracks in two of its genome browsers based on data from the Triticeae Toolbox (T3) database for Chinese Spring wheat (version 1) and Morex barley. Each genomic feature in these tracks links from GrainGenes to the T3 database. The total number of links to T3 pages from GrainGenes genome browsers is currently 6,969,964 (4,965,610 links for wheat and 2,004,354 for barley).
In support of Sub-objective 3B, GrainGenes has followed progress associated with the Tripal community. To fulfill community migration to the Drupal8/9 content management environment, the project has begun to rework data based on the Drupal7 platform. Initial efforts have begun to address comparative genome tool environments for the new platform.
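As a rough illustration of the BLAST details reported under Sub-objective 2B (version 5 databases and multithreaded searches), the sketch below assembles command lines for the standard NCBI BLAST+ tools `makeblastdb` and `blastn`. The report does not describe the actual commands or the Sequence Server configuration used by GrainGenes, so the file names, database names, thread count, and helper functions here are all hypothetical. The code only builds and prints the commands; it does not run BLAST.

```python
import shlex


def makeblastdb_cmd(fasta_path, db_name, dbtype="nucl"):
    """Build a makeblastdb command that requests the version 5 format.

    fasta_path and db_name are hypothetical placeholders.
    """
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", dbtype,
        "-blastdb_version", "5",  # version 5 database format
        "-out", db_name,
    ]


def blastn_cmd(query_path, db_names, threads=4):
    """Build a blastn command that searches several databases at once.

    BLAST+ accepts multiple databases as one space-separated -db value.
    """
    return [
        "blastn",
        "-query", query_path,          # may hold several query sequences
        "-db", " ".join(db_names),
        "-num_threads", str(threads),  # multithreaded search
        "-outfmt", "6",                # tabular output
    ]


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    print(shlex.join(makeblastdb_cmd("wheat_assembly.fasta", "wheat_db")))
    print(shlex.join(blastn_cmd("queries.fasta", ["wheat_db", "barley_db"], threads=8)))
```

Building the argument lists separately from executing them keeps the example safe to run on a machine without BLAST+ installed; passing such a list to `subprocess.run` would be one way to execute it where the tools are available.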
Research progress on Sub-objective 4A included the creation of two new training videos about how to use GrainGenes: 1) “Using CMap in GrainGenes to Improve Marker Density around a Gene of Interest”, and 2) “How to submit data to GrainGenes.” The videos were uploaded to YouTube, and they are freely and publicly available without a firewall. In fiscal year 2021, they received 68 and 12 views, respectively.


Accomplishments
1. Private-academic-government partnership results in sequencing, assembly, and data stewardship for OT3098 Hexaploid Oat (Version 2). A private-academic-government partnership resulted in free, open access to critical genomics and genome sequence assembly data for oat improvement. Oats are among the healthiest grains on earth, and free access to critical genetic and genomic resource information is important to the research community for crop improvement research. ARS scientists at Albany, California, worked with private companies and university scientists to host, share, and display the OT3098 hexaploid oat genome sequence version 2 data in the GrainGenes database. The oat genome sequence data were generated by PepsiCo and Corteva in collaboration with GrainGenes, the University of North Carolina, Charlotte, and the University of Saskatchewan to advance oat research. GrainGenes provided a home for the long-term data stewardship of the oat assembly and annotations. An ARS press release was distributed for an earlier version of this release (i.e., version 1) in fiscal year 2020 and is available at https://www.ars.usda.gov/news-events/news/research-news/2020/oat-genome-available-on-ars-website.


Review Publications
Cagirici, B.H., Budak, H., Sen, T.Z. 2021. Genome-wide discovery of G-quadruplexes in barley. Scientific Reports. 11. Article 7876. https://doi.org/10.1038/s41598-021-86838-3.
Cagirici, B.H., Galvez, S., Sen, T.Z., Budak, H. 2021. LncMachine: a machine learning algorithm for long noncoding RNA annotation in plants. Functional and Integrative Genomics. 21:195-204. https://doi.org/10.1007/s10142-021-00769-w.