Research Project: GrainGenes: Enabling Data Access and Sustainability for Small Grains Researchers

Location: Crop Improvement and Genetics Research

2019 Annual Report


Objectives
GrainGenes is an international, centralized crop database and information portal for peer-reviewed small grains data that serves the small grains (wheat, barley, oat, and rye) research and breeding communities. The GrainGenes project ensures long-term data curation, accessibility, and sustainability so that small grains researchers can develop new cultivars that are more nutritious, disease- and pest-resistant, and high-yielding.
Objective 1: Accelerate small grains (wheat, oats, barley, and rye) trait analysis, germplasm analysis, genetic studies, and breeding by providing open access to small grains genome sequences, germplasm diversity information, trait mapping information, and phenotype data at GrainGenes. Goal 1A: Integrate small grains genome assemblies, pangenomes, and annotations into GrainGenes. Goal 1B: Integrate genetic, diversity, functional, and phenotypic data into GrainGenes with a genome-centric focus.
Objective 2: Develop an infrastructure to curate, integrate, query, and visualize the genetic, genomic, and phenotypic relationships in small grains germplasm. Goal 2A: Develop methods and pipelines to link genetic, genomic, functional, and phenotypic information and to enhance the genome-centric focus. Goal 2B: Implement web-based and computational tools to integrate and visualize genomic data linked with genetic, expression, functional, and diversity data. Goal 2C: Update the database structure to align with community migration to a unified interface.
Objective 3: Collaborate with database developers and plant researchers to develop improved methods and mechanisms for open, standardized data and knowledge exchange to enhance database utility and interoperability. Goal 3A: Collaborate with data and germplasm repositories and organizations to facilitate the curation, sharing, and linking of data. Goal 3B: Collaborate with community software development efforts to adopt database schema design and tool development.
Objective 4: Provide community support and training for small grains researchers through workshops, webinars, and other outreach activities. Goal 4: Facilitate communication and information sharing among the small grains communities and GrainGenes to support research needs.


Approach
As a service project, the GrainGenes team does not perform hypothesis-driven research, but rather fulfills its long-term objectives by adding value to peer-reviewed data generated by others. It provides data curation, management and integration, long-term sustainability, and digital platforms as needed. Driven by stakeholder input, GrainGenes will maintain a central location for curated genomic, genetic, functional, and phenotypic data sets, downloadable in standardized formats and enhanced by intuitive query and visualization tools. Tutorial videos will be created to train small grains researchers to access and retrieve information from GrainGenes efficiently, and to show them different ways to reach and use multiple types of data to help develop better small grain crops.
Objective 1: Our approach will be to (a) curate genomic, pangenomic, and diversity data into the GrainGenes database; (b) create gene model pages to aggregate and link genomic and genetic data at GrainGenes; (c) curate high-impact, peer-reviewed genetic, trait, and phenotypic data into GrainGenes; (d) visualize more accurate genetic maps at GrainGenes; and (e) curate functional gene annotations.
Objective 2: We will implement computational pipelines to (a) align genomic and genetic features between different genome assemblies; (b) assign gene function for small grain genomes; (c) facilitate data curation into the GrainGenes database; (d) visualize SNP data online; and (e) display pedigree information. In addition, we will implement and maintain genome browsers to display tracks for multiple genome assemblies and create a multi-species Basic Local Alignment Search Tool (BLAST) interface to allow users to align their sequences against small grains genome assemblies. In parallel, we will prepare for a new release of GrainGenes with an updated content management system.
Objective 3: We will enhance links and data sharing between GrainGenes and the Triticeae Toolbox for small grains data, and collaborate with other data and germplasm repositories, groups, and organizations to facilitate the curation, sharing, and linking of data.
Objective 4: We will (a) present GrainGenes tools and resources at conferences and site visits; (b) create training videos to teach our users how they can use GrainGenes more efficiently; (c) organize annual meetings between GrainGenes and the GrainGenes Liaison Committee to receive community feedback; and (d) maintain GrainGenes e-mail lists to facilitate communication among members of the small grains community.


Progress Report
In support of Sub-objective 1A, multiple genome browsers were made publicly available and new genome browser tracks were added for wheat and barley. ARS scientists collaborated with genome sequencing consortia to obtain their genomic and genetic data. In some cases, these genetic data sets required heavy curation, during which a few oversights were corrected. Among the new assemblies and new tracks, wheat received the most data. A durum wheat genome assembly and annotations were publicly released, along with version two of the wild emmer genome assembly and annotations. The largest addition was for the Chinese Spring hexaploid wheat genome. The paper for the 1,000 wheat exomes project was published in May 2019, and the genomic outcomes of the project were added as a separate section on the International Wheat Genome Sequencing Consortium Reference Sequence (RefSeq) v1.0 genome browser.
In support of Sub-objective 1B, multiple data sets were curated into GrainGenes: 1) the Spring Wheat Nested Association Mapping (NAM) map; 2) the Global Tetraploid Wheat Collection germplasm; 3) the Durum quantitative trait loci; 4) uniform regional nursery data; 5) legacy oat maps; 6) the Oat 2018 Consensus map; and 7) updated GrainGenes trait records. GrainGenes indexed the following data sets into the Wheat Information System: 1) 16,106 germplasm records, including lines from the Global Tetraploid Wheat Collection and all other diploid, tetraploid, and hexaploid accessions in GrainGenes; 2) 548 quantitative trait loci; 3) 101 genetic and physical maps; and 4) 3,119 genes from the Wheat Gene Catalogue.
In support of Sub-objective 2A, the reference genome sequence data have been made available for download from the genome browser pages and from the data download site created on GrainGenes at https://wheat.pw.usda.gov/graingenes_downloads/.
In support of Sub-objective 2B, GrainGenes collaborated with a group at the University of California, Berkeley, to develop a prototype JBrowse plug-in called JBlast, which allows users to BLAST sequences directly from tracks on JBrowse-based browsers and link them with genetic marker information. The production-quality plug-in is publicly available to anyone who uses a JBrowse-based genome browser; JBrowse is one of the most widely used genome browsers in the world.
In support of Sub-objective 2C, GrainGenes data sources, including the formal MySQL relational database (315 tables), the companion MySQL database for CMap genetic maps (over 250 data sets), and the Basic Local Alignment Search Tool (BLAST) nucleic acid and peptide databases (over 500 data sets), were deconstructed to create base files for future integration into newer database tools in development. Docker containers were created to prepare migration strategies for newer operating systems, migrated programming tools, and updated software versions used for data visualization and curation. Attention focused on the Tripal suite of modules, JBrowse for genome visualization, and Pretzel for genetic map visualization. Steps were taken to convert the current MySQL (v5.5) version of the GrainGenes database into a PostgreSQL (v10.8) version in preparation for migration to a suite of modules for biological data (Tripal v3) driven by a content management system (CMS; Drupal 7). A PostgreSQL version of GrainGenes was created, and a workflow of test data queries was run and adjusted.
In support of Sub-objective 3A, collaboration with the Wheat Information System (WheatIS; wheatis.org) and the personnel at Unité de Recherche Génomique Info (URGI) in France continued in FY19. Operating under the Wheat Initiative, WheatIS is a platform that provides, through a common application programming interface (API), a single hub of access to the wheat data distributed among small grains databases worldwide.
A shortcut to all GrainGenes data at WheatIS can be found at https://urgi.versailles.inra.fr/wheatis/#result/term=graingenes. This year, GrainGenes began a closer collaboration with the USDA-ARS Triticeae Toolbox (T3) project on genomic data representation. To reduce cost and increase efficiency, the two databases decided to maintain and populate a common set of genome browsers housed at GrainGenes. Our collaborative efforts are described in a GrainGenes Database article, which is in press.
In support of Sub-objective 3B, a PostgreSQL-based content management system (CMS) hosting the Tripal module suite was created, which will allow the new version of GrainGenes to be hosted alongside the default Chado-based (v3.1) database used by the Tripal module suite. Multiple test platforms were created to differentiate genome content for wheat, barley, rye, and oat.
In support of Objective 4, a new interface was designed for the USDA-ARS Small Grains Genotyping Labs website (https://wheat.pw.usda.gov/GenotypingLabs/), which describes the four ARS genotyping labs in the U.S. The labs use the site for general information, links, and contact details. The Barley Genetics Newsletter v47 was created by ARS scientists and made available at GrainGenes as a PDF document. Barley Genetics Newsletter issues can be found at https://wheat.pw.usda.gov/ggpages/bgn/. The Annual Wheat Newsletter is hosted online at https://wheat.pw.usda.gov/ggpages/awn/, from v63 (2018) back to the 37th issue. Two training videos were created and disseminated through GrainGenes and YouTube, reaching stakeholders globally.
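The report does not describe how the test data queries for the MySQL-to-PostgreSQL conversion under Sub-objective 2C were run. As a minimal sketch of one way such a check could look, the Python below runs the same named queries against a source and a migrated database and lists the queries whose results differ. Everything in it is an assumption made for illustration: the `marker` table, its columns, the query names, and the use of in-memory SQLite connections as stand-ins for the MySQL and PostgreSQL databases.

```python
import sqlite3


def run_query(conn, sql):
    """Run one query and return its rows sorted, so that row order does not matter."""
    cur = conn.cursor()
    cur.execute(sql)
    return sorted(cur.fetchall(), key=repr)


def compare_databases(source, target, queries):
    """Return the names of the queries whose results differ between the two connections."""
    mismatches = []
    for name, sql in queries.items():
        if run_query(source, sql) != run_query(target, sql):
            mismatches.append(name)
    return mismatches


def make_demo_db(rows):
    """Build an in-memory stand-in database holding a hypothetical 'marker' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE marker (name TEXT, chromosome TEXT)")
    conn.executemany("INSERT INTO marker VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Hypothetical test queries; a real workflow would use queries drawn from site usage.
queries = {
    "marker_count": "SELECT COUNT(*) FROM marker",
    "markers_by_chromosome": "SELECT chromosome, COUNT(*) FROM marker GROUP BY chromosome",
}

source = make_demo_db([("m1", "1A"), ("m2", "1A"), ("m3", "2B")])
good_copy = make_demo_db([("m3", "2B"), ("m1", "1A"), ("m2", "1A")])  # same rows, new order
bad_copy = make_demo_db([("m1", "1A"), ("m3", "2B")])                 # one row lost

print(compare_databases(source, good_copy, queries))  # []
print(compare_databases(source, bad_copy, queries))   # ['marker_count', 'markers_by_chromosome']
```

Because the helper functions rely only on the standard Python database cursor interface, the same comparison could in principle be pointed at real MySQL and PostgreSQL connections, although SQL dialect differences between the two systems would likely require some queries to be adjusted.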


Accomplishments
1. GrainGenes increased its global userbase by 65 percent. GrainGenes (https://wheat.pw.usda.gov) is the ARS flagship database for small grains data, including wheat, barley, rye, and oat. GrainGenes users are distributed across six continents, and more than half of them are located in the U.S., China, and India. Compared with the previous year, the number of GrainGenes site visitors increased by 65 percent to 31,038, based on unique internet protocol (IP) addresses.
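The arithmetic behind a visitor figure of this kind can be sketched as follows. The report gives only the current total (31,038) and the 65 percent increase, so the previous-year value computed here is implied by those two numbers, not quoted from the report. The log format, the sample addresses (reserved documentation ranges), and the function names are likewise assumptions for illustration.

```python
def unique_visitors(log_lines):
    """Count distinct IP addresses, taking the IP as the first whitespace-separated field."""
    return len({line.split()[0] for line in log_lines if line.strip()})


def percent_change(previous, current):
    """Year-over-year change expressed as a percentage of the previous value."""
    return (current - previous) / previous * 100.0


def implied_previous(current, percent_increase):
    """Back out the previous value from the current value and a stated percent increase."""
    return current / (1.0 + percent_increase / 100.0)


# Hypothetical access-log lines; repeat visits from one address count once.
sample_log = [
    '192.0.2.10 - - "GET / HTTP/1.1" 200',
    '192.0.2.10 - - "GET / HTTP/1.1" 200',
    '198.51.100.7 - - "GET / HTTP/1.1" 200',
    '203.0.113.25 - - "GET / HTTP/1.1" 200',
]

print(unique_visitors(sample_log))                  # 3
print(round(implied_previous(31038, 65)))           # 18811
print(round(percent_change(18811, 31038), 1))       # 65.0
```

Under these assumptions, a 65 percent increase to 31,038 corresponds to roughly 18,800 unique IP addresses in the previous year. Unique IP addresses only approximate individual users, since several people can share one address and one person can appear under several.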


Review Publications
Arora, R., Bharyal, P., Sarswati, S., Sen, T.Z., Yennamalli, R.M. 2018. Structural dynamics of lytic polysaccharide monoxygenases reveals a highly flexible substrate binding region. Journal of Molecular Graphics and Modeling. 88:1-10. https://doi.org/10.1016/j.jmgm.2018.12.012.
Blake, V.C., Woodhouse, M.R., Lazo, G.R., Odell, S.G., Wight, C.W., Tinker, N.A., Wang, Y., Gu, Y.Q., Birkett, C.L., Jannink, J., Matthews, D.E., Hane, D.L., Michel, S.L., Yao, E., Sen, T.Z. 2019. GrainGenes: centralized small grain resources and digital platform for geneticists and breeders. Database: The Journal of Biological Databases and Curation. 2019.