
Research Project: Enabling Mechanistic Allele Mining to Accelerate Genomic Selection for New Agro-Ecosystems

Location: Plant, Soil and Nutrition Research

Project Number: 8062-21000-052-000-D
Project Type: In-House Appropriated

Start Date: Mar 1, 2023
End Date: Feb 29, 2028

Objective:
Objective 1: Develop species-transferable models of gene expression and protein activity for crop plants (built using maize, sorghum, rice, wheat, Brachypodium, Setaria, tomato, Arabidopsis, soybean, and cassava data) derived solely from DNA sequence information.
  Sub-objective 1.A: Develop bioinformatics to support pangenome comparisons, analysis, and imputation of haplotypes for molecular quantitative genetics and breeding.
  Sub-objective 1.B: Develop and evaluate gene activity models for GWAS and GWP.
  Sub-objective 1.C: Annotate pangenomes with gene activity estimates and make them accessible to molecular breeder analysis tools.
Objective 2: Apply species-transferable models of gene expression and protein activity to enhance allele mining and genomic selection for maize, minor crops, and specialty crops (including but not limited to sorghum, oat, cassava, and table grapes) for the identification of genetic variation controlling frost/heat tolerance and nitrogen/phosphorus recycling.
  Sub-objective 2.A: Allele mining for temperature tolerance in maize and its wild relatives.
  Sub-objective 2.B: Allele mining for nutrient recycling in maize and its wild relatives.
  Sub-objective 2.C: Allele mining and genomic selection using gene activity for breeding across Breeding Insight species.

Approach:
Plant breeding and genetics are poised to contribute to numerous goals in feeding the planet, adapting to climate change, and reducing the environmental impact of agriculture. To date, breeding models have been crop-specific, drastically limiting their efficacy and scope. Genomics and machine learning, however, have now developed to the point where the molecular activity of genetic variation, and thereby its impact on plant performance in the field, can be estimated solely from genome sequence information. The goal of this project is to shift breeding models from being genetic variant-based to being molecular activity-based (e.g., protein structure and RNA/protein expression). New cross-species models will be generated and leveraged to predict whole-plant phenotypes and uncover environmental adaptation strategies, making prediction and discovery more accurate, efficient, and powerful.
The project will develop bioinformatic and machine learning tools to infer the complete genomes of crop species (pangenomes) and then use that sequence to predict protein structure and gene expression patterns in thousands of crops and related wild species. Molecular breeders working across species will be able to use these models through bioinformatic portals developed by the project.
The project will then allele mine the wild relatives of maize and sorghum for genes that control frost and heat tolerance, as well as nitrogen and phosphorus recycling. These wild relatives, many of which are perennial, are more tolerant of extreme temperatures and are incredibly productive without needing chemical inputs. The candidate genes will be identified by genetic mapping within Tripsacum and Zea, environmental mapping using gene activity across the Andropogoneae and landraces, and expression/metabolite profiling in wild species.
Successful identification of these genes could lead to a dramatic reduction in the need for fertilizer inputs and to increased maize yields, by allowing earlier planting and by helping plants avoid or tolerate heat extremes.