
Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

Title: Modeling chromatin state from sequence across angiosperms using recurrent convolutional neural networks

Author
WRIGHTSMAN, TRAVIS - Cornell University
MARAND, ALEXANDRE - Agriculture University Of Georgia
CRISP, PETER - University Of Queensland
SPRINGER, NATHAN - University Of Minnesota
Buckler, Edward - Ed

Submitted to: The Plant Genome
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/20/2022
Publication Date: 9/16/2022
Citation: Wrightsman, T., Marand, A.P., Crisp, P.A., Springer, N.M., Buckler IV, E.S. 2022. Modeling chromatin state from sequence across angiosperms using recurrent convolutional neural networks. The Plant Genome. 15(3):e20249. https://doi.org/10.1002/tpg2.20249.
DOI: https://doi.org/10.1002/tpg2.20249

Interpretive Summary: Accessible chromatin regions are small areas of the genome that are known to regulate the production of proteins, the important machines that maintain life, but they are hard to identify without expensive lab assays. Differences between individuals in the DNA sequence of these regions strongly suggest differences in the regulation of protein production, which can lead to differences in appearance and even to disease. In agriculture, plant breeders use differences in DNA sequences to select superior plants with better yield or nutritional quality, but selection would be much easier (and faster) if they knew which parts of the genome to focus on, such as the accessible chromatin regions. We have created a model that can accurately predict whether a given region of DNA is an accessible chromatin region. The model works well across angiosperms, the group of plants that includes most of the world's crops. Using our model, a breeder who has access to only the genome sequence of their target crop(s) can predict where the accessible chromatin regions are and use those predictions to develop markers for those regions. Markers within those regions are likely to be more effective in selecting superior varieties than untargeted markers. Scientists who are interested in genome regulation could also use our model to predict accessible chromatin regions in their genomes of interest without the need for expensive lab assays. Studies of the evolution of genome regulation could then be done across a wide range of angiosperms in a cost-effective manner.

Technical Abstract: Accessible chromatin regions (ACRs) are critical components of gene regulation, but modeling them directly from sequence remains challenging, especially within plants, whose mechanisms of chromatin remodeling are less understood than those of animals. We trained an existing deep-learning architecture, DanQ, on data from 12 angiosperm species to predict leaf chromatin accessibility for sequence windows within and across species. We also trained DanQ on DNA methylation data from 10 angiosperms because unmethylated regions have been shown to overlap significantly with ACRs in some plants. The across-species models have comparable or even superior performance to a model trained within species, suggesting strong conservation of chromatin mechanisms across angiosperms. Testing a maize (Zea mays L.) held-out model on a multi-tissue chromatin accessibility panel revealed that our models are best at predicting constitutively accessible chromatin regions, with diminishing performance as cell-type specificity increases. Using a combination of interpretation methods, we ranked JASPAR motifs by their importance to each model and saw that the TCP and AP2/ERF transcription factor (TF) families consistently ranked highly. We embedded the top three JASPAR motifs for each model at all possible positions on both strands in our sequence window and observed position- and strand-specific patterns in their importance to the model. With our publicly available across-species ‘a2z’ model, it is now feasible to predict the chromatin accessibility and methylation landscape of any angiosperm genome.
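
The motif-embedding step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`reverse_complement`, `embed_motif`, `one_hot`), the toy 10-bp background, and the use of a plain consensus string in place of a JASPAR position matrix are all assumptions made for illustration. It only shows how a motif could be placed at every position on both strands of a fixed window before each variant is scored by a trained model.

```python
from typing import Iterator, List, Tuple

# Translation table for complementing DNA bases (assumes upper-case ACGT input).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def embed_motif(background: str, motif: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, strand, sequence) for every placement of the motif.

    The motif overwrites the background at each start position; the "-"
    strand placements use the motif's reverse complement.
    """
    if len(motif) > len(background):
        raise ValueError("motif is longer than the background window")
    for strand, site in (("+", motif), ("-", reverse_complement(motif))):
        for pos in range(len(background) - len(site) + 1):
            yield pos, strand, background[:pos] + site + background[pos + len(site):]


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a sequence as rows of [A, C, G, T]; unknown bases are all zeros."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    return [table.get(base, [0, 0, 0, 0]) for base in seq]


# Toy example: a hypothetical 3-bp motif in a 10-bp background window.
background = "A" * 10
variants = list(embed_motif(background, "GGC"))
```

In practice, each one-hot encoded variant would be passed to a trained model, and the predicted scores would be compared across positions and strands. The scoring call is omitted here because the model interface is not described in this record.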