
Research Project: SoyBase and the Legume Clade Database

Location: Corn Insects and Crop Genetics Research

Title: Family-specific gains and losses of protein domains in the legume and grass plant families

Author
Yadav, Akshay - Iowa State University
Fernandez-Baca, David - Iowa State University
Cannon, Steven

Submitted to: Evolutionary Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/15/2020
Publication Date: 7/9/2020
Citation: Yadav, A., Fernandez-Baca, D., Cannon, S.B. 2020. Family-specific gains and losses of protein domains in the legume and grass plant families. Evolutionary Bioinformatics. 16. https://doi.org/10.1177/1176934320939943.
DOI: https://doi.org/10.1177/1176934320939943

Interpretive Summary: Proteins are the molecular workhorses of cells in all organisms. Proteins are modular, made up of smaller components called "domains." This study evaluates the domains that are found in two large groups of plants: grasses (including crops such as wheat, rice, and corn) and legumes (including crops such as soybean, lentil, and pea). In grasses, there were increases in domains involved in responses to viruses and in some aspects of flower development that are particular to grasses. In legumes, we found an increase in an antioxidant that is highly valuable in nitrogen-fixing root nodules; however, several domains involved in a particular kind of DNA repair were lost. These results will be useful as researchers work to understand basic molecular differences between these two important plant groups, which include most of the crop plants that humans depend on.

Technical Abstract: Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events such as domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as "features" or "descriptors," and considering the species (target + outgroup) as instances or data points in a data matrix. We then look for features (domains) that are significantly different between the target species and the outgroup species. We study the domain changes in two large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate four types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These four matrices attempt to capture different ways in which protein sequences may evolve at the domain level: gain or loss of domains (content), increase or decrease in the copy number of domains along the sequences (duplication), expansion or contraction of domains (abundance), and changes in the number of adjacent domain partners (versatility). Domain content analysis in legumes shows a striking loss of protein domains from the Fanconi anemia pathway, which is responsible for the repair of interstrand DNA crosslinks. There were also increases in glutathione synthase, the enzyme that produces glutathione, an antioxidant that is important in the nitrogen-fixing root nodules found in legumes. In grasses, there were increases in domains involved in responses to viruses and in some aspects of flower development that are particular to grasses.
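
The four feature matrices described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The toy proteomes, the domain IDs (D1 to D4), the function names, and the exact counting rules are all assumptions made for illustration: content is taken as presence or absence of a domain in a species, duplication as the maximum copy number of a domain within any single protein, abundance as the number of proteins containing the domain, and versatility as the number of distinct domains found directly adjacent to it. The abstract does not name the statistical test used, so the sketch reports only a plain difference in means between target and outgroup species as a stand-in.

```python
from collections import defaultdict

# Hypothetical toy input: species -> proteins, each an ordered list of domain IDs.
PROTEOMES = {
    "legume_a": [["D1", "D2"], ["D2", "D2", "D3"]],
    "legume_b": [["D1", "D2"], ["D2", "D3"], ["D3", "D1"]],
    "outgroup_a": [["D1", "D4"], ["D4"]],
    "outgroup_b": [["D1"], ["D4", "D1", "D2"]],
}
TARGET = {"legume_a", "legume_b"}  # all other species are treated as the outgroup
KINDS = ("content", "duplication", "abundance", "versatility")


def domain_features(proteins):
    """Compute the four per-domain features for one species (assumed definitions)."""
    duplication = defaultdict(int)  # max copies of a domain within one protein
    abundance = defaultdict(int)    # number of proteins containing the domain
    partners = defaultdict(set)     # distinct adjacent domains
    for protein in proteins:
        for domain in set(protein):
            abundance[domain] += 1
            duplication[domain] = max(duplication[domain], protein.count(domain))
        for left, right in zip(protein, protein[1:]):
            if left != right:  # a tandem repeat is not counted as a partner
                partners[left].add(right)
                partners[right].add(left)
    return {
        "content": {d: 1 for d in abundance},
        "duplication": dict(duplication),
        "abundance": dict(abundance),
        "versatility": {d: len(partners[d]) for d in abundance},
    }


def build_matrices(proteomes):
    """Return matrices[kind][species][domain], with 0 for absent domains."""
    per_species = {sp: domain_features(p) for sp, p in proteomes.items()}
    domains = sorted({d for p in proteomes.values() for prot in p for d in prot})
    return {
        kind: {
            sp: {d: feats[kind].get(d, 0) for d in domains}
            for sp, feats in per_species.items()
        }
        for kind in KINDS
    }


def mean_difference(matrix, target):
    """Mean feature value in target species minus mean in outgroup species."""
    target_rows = [row for sp, row in matrix.items() if sp in target]
    outgroup_rows = [row for sp, row in matrix.items() if sp not in target]
    domains = next(iter(matrix.values())).keys()
    return {
        d: sum(r[d] for r in target_rows) / len(target_rows)
        - sum(r[d] for r in outgroup_rows) / len(outgroup_rows)
        for d in domains
    }


matrices = build_matrices(PROTEOMES)
diffs = {kind: mean_difference(matrices[kind], TARGET) for kind in KINDS}

if __name__ == "__main__":
    for kind in KINDS:
        print(kind, diffs[kind])
```

In this toy data, a positive content difference marks a domain present in the target species but absent from the outgroup, and a negative one marks a domain absent from the target species. A real analysis would replace the mean difference with an appropriate significance test and would derive the domain architectures from annotated proteomes.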