
Research Project: GrainGenes: Enabling Data Access and Sustainability for Small Grains Researchers

Location: Crop Improvement and Genetics Research

2022 Annual Report


Objectives
GrainGenes is an international, centralized crop database and information portal for peer-reviewed small grains data that serves the small grains (wheat, barley, oat, and rye) research and breeding communities. The GrainGenes project ensures long-term data curation, accessibility, and sustainability so that small grains researchers can develop new, more nutritious, disease- and pest-resistant, high-yielding cultivars.
Objective 1: Accelerate small grains (wheat, oats, barley, and rye) trait analysis, germplasm analysis, genetic studies, and breeding by providing open access to small grains genome sequences, germplasm diversity information, trait mapping information, and phenotype data at GrainGenes. Goal 1A: Integrate small grains genome assemblies, pangenomes, and annotations into GrainGenes. Goal 1B: Integrate genetic, diversity, functional, and phenotypic data into GrainGenes with a genome-centric focus.
Objective 2: Develop an infrastructure to curate, integrate, query, and visualize the genetic, genomic, and phenotypic relationships in small grains germplasm. Goal 2A: Develop methods and pipelines to link genetic, genomic, functional, and phenotypic information and to enhance genome-centric focus. Goal 2B: Implement web-based and computational tools to integrate and visualize genomic data linked with genetic, expression, functional, and diversity data. Goal 2C: Update database structure to align with community migration to a unified interface.
Objective 3: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability. Goal 3A: Collaborate with data and germplasm repositories and organizations to facilitate the curation, sharing, and linking of data. Goal 3B: Collaborate with community software development efforts to adopt database schema design and tool development.
Objective 4: Provide community support and training for small grains researchers through workshops, webinars, and other outreach activities. Goal 4: Facilitate communication and information sharing among the small grains communities and GrainGenes to support research needs.


Approach
As a service project, the GrainGenes team does not perform hypothesis-driven research, but rather fulfills its long-term objectives by adding value to peer-reviewed data generated by others. It provides data curation, management and integration, long-term sustainability, and digital platforms as needed. Driven by stakeholder input, GrainGenes will maintain a central location for curated genomic, genetic, functional, and phenotypic data sets, downloadable in standardized formats, enhanced by intuitive query and visualization tools. Tutorial videos will be created to train small grains researchers on how to efficiently access and retrieve information from GrainGenes, and to show them different ways to reach and use multiple types of data to help develop better small grain crops.
Objective 1: Our approach will be to (a) curate genomic, pangenomic, and diversity data into the GrainGenes database; (b) create gene model pages to aggregate and link genomic and genetic data at GrainGenes; (c) curate high-impact, peer-reviewed genetic, trait, and phenotypic data into GrainGenes; (d) visualize more accurate genetic maps at GrainGenes; and (e) curate functional gene annotations.
Objective 2: We will implement computational pipelines to (a) align genomic and genetic features between different genome assemblies; (b) assign gene function for small grain genomes; (c) facilitate data curation into the GrainGenes database; (d) visualize SNP data online; and (e) display pedigree information. In addition, we will implement and maintain genome browsers to display tracks for multiple genome assemblies and create a multi-species Basic Local Alignment Search Tool (BLAST) interface to allow users to align their sequences against small grains genome assemblies; in parallel, we will prepare for a new release of GrainGenes with an updated content management system.
Objective 3: We will enhance links and data sharing between GrainGenes and the Triticeae Toolbox for small grains data, and collaborate with other data and germplasm repositories, groups, and organizations to facilitate the curation, sharing, and linking of data.
Objective 4: We will (a) present GrainGenes tools and resources at conferences and site visits; (b) create training videos to teach our users how they can use GrainGenes more efficiently; (c) organize annual meetings between GrainGenes and the GrainGenes Liaison Committee to receive community feedback; and (d) maintain GrainGenes e-mail lists to facilitate communication among members of the small grains community.


Progress Report
In support of Sub-objective 1A, hyperlinks from 13,826 genes from the International Wheat Genome Sequencing Consortium’s Chinese Spring version 1 assembly were added to probe records to take users directly to: 1) ExpVIP (wheatexpression.com), which is an RNA-seq data analysis and visualization platform that holds expression data for nearly 40 studies; 2) KnetMiner (https://knetminer.com), which is a graph-based gene discovery platform; 3) PhyloGenes (http://www.phylogenes.org), which displays pre-computed phylogenetic trees of gene families alongside experimental gene function data; 4) Ensembl Plants (http://plants.ensembl.org), which is a genome-centric portal for plant species; 5) Persephone (https://persephonesoft.com), which is a genome browser that facilitates comparative genomic views; and 6) Rust expression browser (http://www.rust-expression.com), which specializes in expression of rust-disease related genes. Also in support of Sub-objective 1A, the following genome assemblies were brought into GrainGenes, and genome browsers and associated pages were created: Kariega wheat, Fielder wheat, Morex barley version 3, Lo7 rye, Weining rye, Sang oat, Avena longiglumis oat, Avena insularis oat, Aegilops tauschii Aet version 5.0, Aegilops tauschii T093, Aegilops tauschii AY61, Aegilops tauschii XJ02, and Aegilops tauschii AY17.
In support of Sub-objective 1B, 37,789 records for loci, probes, and sequences for the 2016 50K marker set acquired from the James Hutton Institute Germinate project page were added to GrainGenes. These markers are found as significant markers in genome-wide association studies in barley and were linked to quantitative trait loci as those data were curated. An example sequence record, JHI-Hv50k-2016-100012, can be aligned against over 100 small grains genome databases, many of which are linked to the alignment results on the accompanying JBrowse genome browser. In addition, for the Morex barley version 3 and the PepsiCo OT3098 v2 hexaploid oat genome browsers, 659 quantitative trait loci were curated, and reciprocal links from the browser to the GrainGenes quantitative trait loci and probe pages for significant markers were created.
In support of Sub-objective 2A, computational pipelines were run and subsequent manual curation was performed for the Morex barley version 3 and the PepsiCo OT3098 v2 hexaploid oat browsers to assign track positions and genomic sequences for the following quantitative trait loci data: 83 quantitative trait loci for beta-glucan, as well as 576 quantitative trait loci for net blotch, other diseases, agronomic traits, and malt traits.
In support of Sub-objective 2B, diversity data tracks were created on JBrowse-based genome browsers at GrainGenes for the International Wheat Genome Sequencing Consortium Chinese Spring wheat version 1. These tracks include varietal single nucleotide polymorphism datasets and datasets displaying the outcomes of the 1000 Wheat Exomes Project, which, according to its publication, “aimed to generate a haplotype map on the basis of targeted re-sequencing of 890 diverse wheat landraces and cultivars, and tetraploid wild and domesticated relatives to identify genomic regions showing the signals of introgression from wild emmer.” The resulting track in GrainGenes contains 348,372 single nucleotide polymorphisms on the A and B genomes.
In support of Sub-objective 2C, there were some setbacks: operating system environments had to be changed because of end-of-life support, and minimum programming requirements evolved. In addition, another website very close to the intended model evolved from another resource at the Institute of Crops Sciences, GSCAAS, Beijing, China (wheatgene.agrinome.org); thus, local efforts were redirected toward moving development into the next release environments of the Tripal software, possibly extending its utility using next-generation versions of the software.
In support of Sub-objective 3A, the GrainGenes team members in Albany, California, worked with the International Wheat Genome Sequencing Consortium, the Morex barley genome assembly and annotation collaboration led by a group from the Leibniz Institute of Plant Genetics and Crop Plant Research in Germany, and the Sang oat genome assembly and annotation group led by Swedish scientists, to bring genomic datasets into GrainGenes. The GrainGenes team created genomic displays, databases to allow genomic alignments, and a dedicated data download section on GrainGenes.
In support of Sub-objective 3B, the Tripal community is embracing a newer version of the base Drupal environment (version 9), and support for the close-to-end-of-life Drupal version 7 will wind down to minimal support. Participation in monthly meetings and community message boards continues in order to keep track of modules and newer versions of the Tripal package adapted for Drupal version 9. Some available tools may still have utility in serving other datasets.
In support of Sub-objective 4A, two online tutorials were created, which can be found at https://wheat.pw.usda.gov/GG3/tutorials. The first tutorial is entitled “Navigating IWGSC Data,” and the other “Using BLAST on GrainGenes.” In addition, a presentation entitled “GrainGenes: Improved BLAST Services and Genome Browsers to Navigate IWGSC Data” was made at the International Wheat Genome Sequencing Consortium Structural and Functional Genomics workshop of the virtual Plant and Animal Genome Conference 2022. Another presentation was made at the International Barley Community Seminar Series, with the title “Navigating GWAS Results on the GrainGenes Morex v3 Genome Browser.”


Accomplishments
1. A web-based tool to compute genomic alignments based on user-selected web browser regions. GrainGenes (https://wheat.pw.usda.gov) is an ARS-flagship repository that provides a centralized location for global wheat, barley, rye, and oat data. As part of its mission, GrainGenes serves genomic sequence information and genomic features for several genome assemblies through its genome browsers. ARS scientists and programmers located in Albany, California, designed a computational module that enables users to select genomic regions on GrainGenes genome browsers simply by highlighting them. This module is the first of its kind and allows users to directly run alignment computations on highlighted regions to identify similar regions in other genomes. Such alignments facilitate the functional identification of genes, some of which control important agronomic traits. The same team is currently working on generalizing this module so that other biological databases can customize and implement it for their genome browsers.


Review Publications
Blake, V.C., Wight, C.P., Yao, E., Sen, T.Z. 2022. GrainGenes: Tools and content to assist breeders improving oat quality. Foods. 11(7). Article 914. https://doi.org/10.3390/foods11070914.
Cagirici, H.B., Budak, H., Sen, T.Z. 2022. G4Boost: A machine learning-based tool for quadruplex identification and stability prediction. BMC Bioinformatics. 23. Article 240. https://doi.org/10.1186/s12859-022-04782-z.
Yao, E., Blake, V.C., Cooper, L., Wight, C.P., Michel, S., Cagirici, H.B., Lazo, G.R., Birkett, C., Waring, D.J., Jannink, J., Holmes, I., Waters, A.J., Eickholt, D.P., Sen, T.Z. 2022. GrainGenes: A data-rich repository for small grains genetics and genomics. Database: The Journal of Biological Databases and Curation. 2022. Article baac034. https://doi.org/10.1093/database/baac034.
Hussain, B., Akpinar, B.A., Alaux, M., Algharib, A.M., Sehgal, D., Ali, Z., Aradottir, G.I., Batley, J., Bellec, A., Bentley, A.R., Cagirici, H.B., Cattivelli, L., Choulet, F., Cockram, J., Desiderio, F., Devaux, P., Dogramaci, M., Dorado, G., Dreisigacker, S., Edwards, D., El-Hassouni, K., Eversole, K., Fahima, T., Figueroa, M., Galvez, S., Gill, K.S., Govta, L., Gul, A., Hensel, G., Hernandez, P., Herrera, L.C., Ibrahim, A., Kilian, B., Korzun, V., Krugman, T., Li, Y., Liu, S., Mahmoud, A.F., Morgounov, A., Muslu, T., Naseer, F., Ordon, F., Paux, E., Perovic, D., Reddy, G.V., Reif, J.C., Reynolds, M., Roychowdhury, R., Rudd, J., Sen, T.Z., Sukumaran, S., Ozdemir, B.S., Tiwari, V., Ullah, N., Unver, T., Yazar, S., Appels, R., Budak, H. 2022. Capturing wheat phenotypes at the genome level. Frontiers in Plant Science. 13. Article 851079. https://doi.org/10.3389/fpls.2022.851079.
Cho, K., Sen, T.Z., Andorf, C.M. 2022. Predicting tissue-specific mRNA and protein abundance in maize: A machine learning approach. Frontiers in Artificial Intelligence. 5. Article 830170. https://doi.org/10.3389/frai.2022.830170.