Location: Dale Bumpers National Rice Research Center
Project Number: 6028-21000-012-017-S
Project Type: Non-Assistance Cooperative Agreement
Start Date: Jul 1, 2024
End Date: Jun 30, 2026
Objective:
To develop a deep learning model capable of using the DNA sequence of a gene and its proximal genomic sequences to predict the gene’s expression level (mRNA copy number) and associated phenotypic effects.
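As an illustration of how a gene's DNA sequence and its flanking regions might be presented to such a model, the sketch below one-hot encodes a sequence window. This is a minimal example under stated assumptions, not the project's actual pipeline: the function name `one_hot_encode`, the A/C/G/T column order, and the choice to encode ambiguous bases (such as N) as all-zero rows are hypothetical.

```python
BASES = "ACGT"  # assumed column order for the encoding


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows (one row per base).

    Bases outside A/C/G/T (for example N) become all-zero rows, which is one
    common convention; other conventions exist.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        col = index.get(base)
        if col is not None:
            row[col] = 1.0
        rows.append(row)
    return rows


# Example: encode a short (made-up) promoter fragment.
encoded = one_hot_encode("acgtN")
```

A matrix of this shape (sequence length by 4) is the typical input to sequence-based CNN, RNN, or Transformer models.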
Approach:
The model will be applied to the genome sequences of rice diversity panels and breeding lines to identify individuals harboring DNA sequence variants that are predicted to alter gene expression patterns and produce novel phenotypes. The model will be used for allele mining to identify superior haplotypes at agronomically important genes. It will also support enhanced genomic selection methods that incorporate gene expression predictions to accelerate rice breeding.
1) Assemble the necessary training data, consisting of rice genome sequence datasets and all relevant rice gene expression (RNA-Seq) datasets from public databases.
2) Train and test deep learning models on the rice dataset. First, published convolutional neural network (CNN)-based deep learning methods will be tested. Subsequently, those models will be modified and improved, and new models will be developed and evaluated using recurrent neural network (RNN) or Transformer-based methods.
3) Apply the deep learning models to rice diversity panels, breeding lines, and breeding populations to identify germplasm with useful novel alleles for further experimental evaluation.
4) Develop new genomic selection methods that incorporate gene expression predictions.
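
The allele-mining step could be sketched roughly as follows: score each haplotype of a gene region with an expression predictor and rank the haplotypes by how far their prediction departs from the reference. Everything here is illustrative. `toy_predict_expression` is a hypothetical stand-in that merely counts TATA motifs and is not a trained model; `rank_haplotypes` and the example sequences are likewise assumptions made for this sketch.

```python
def toy_predict_expression(seq):
    """Hypothetical placeholder for a trained model.

    Counts (possibly overlapping) occurrences of the motif TATA. A real
    predictor would be a trained CNN, RNN, or Transformer.
    """
    seq = seq.upper()
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA"))


def rank_haplotypes(reference, haplotypes, predict):
    """Rank haplotypes by absolute predicted change relative to the reference.

    reference  -- reference sequence of the gene region
    haplotypes -- dict mapping haplotype name to its sequence
    predict    -- callable mapping a sequence to a predicted expression value
    Returns a list of (name, delta) tuples, largest |delta| first.
    """
    ref_score = predict(reference)
    deltas = [(name, predict(seq) - ref_score) for name, seq in haplotypes.items()]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)


# Example with made-up sequences for one gene region.
candidates = {
    "hap_a": "GGCTAGAAGG",
    "hap_b": "GGCTATAAGG",
    "hap_c": "TATATATAGG",
}
ranking = rank_haplotypes("GGCTATAAGG", candidates, toy_predict_expression)
```

Haplotypes at the top of such a ranking would be candidates for experimental evaluation; the predicted values could also serve as additional features in a genomic selection model.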