Location: Corn Insects and Crop Genetics Research
Title: Family-specific gains and losses of protein domains in the legume and grass plant families
Author
YADAV, AKSHAY - Iowa State University
FERNANDEZ-BACA, DAVID - Iowa State University
Cannon, Steven
Submitted to: Evolutionary Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 6/15/2020
Publication Date: 7/9/2020
Citation: Yadav, A., Fernandez-Baca, D., Cannon, S.B. 2020. Family-specific gains and losses of protein domains in the legume and grass plant families. Evolutionary Bioinformatics. 16. https://doi.org/10.1177/1176934320939943.
DOI: https://doi.org/10.1177/1176934320939943
Interpretive Summary: Proteins are the molecular workhorses of cells in all organisms. Proteins are modular, made up of smaller components called "domains." This study evaluates the domains that are found in two large groups of plants: grasses (including crops such as wheat, rice, and corn) and legumes (including crops such as soybean, lentil, and pea). In grasses, there were increases in domains involved in responses to viruses and in some aspects of flower development that are particular to grasses. In legumes, we found an increase in an antioxidant that is highly valuable in nitrogen-fixing root nodules; however, several domains involved in a particular kind of DNA repair were lost. These results will be useful as researchers work to understand basic molecular differences between these two important plant groups, which include most of the crop plants that humans depend on.
Technical Abstract: Protein domains can be regarded as sections of protein sequences capable of folding independently and performing specific functions. In addition to amino-acid level changes, protein sequences can also evolve through domain shuffling events like domain insertion, deletion, or duplication. The evolution of protein domains can be studied by tracking domain changes in a selected set of species with known phylogenetic relationships. Here, we conduct such an analysis by defining domains as “features” or “descriptors,” and considering the species (target + outgroup) as instances or data-points in a data matrix.
We then look for features (domains) that are significantly different between the target species and the outgroup species. We study the domain changes in two large, distinct groups of plant species: legumes (Fabaceae) and grasses (Poaceae), with respect to selected outgroup species. We evaluate four types of domain feature matrices: domain content, domain duplication, domain abundance, and domain versatility. These matrices attempt to capture different ways in which protein sequences may evolve: gain or loss of domains, increase or decrease in the copy number of domains along the sequences, expansion or contraction of domains, and changes in the number of adjacent domain partners. Domain content analysis in legumes shows a striking loss of protein domains from the Fanconi anemia pathway, which is responsible for the repair of interstrand DNA crosslinks. There were also increases in glutathione synthase, the enzyme that produces glutathione, an antioxidant that is important in the nitrogen-fixing root nodules found in legumes. In grasses, there were increases in domains involved in responses to viruses and in some aspects of flower development that are particular to grasses.
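The feature-matrix idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy proteomes, the domain labels, the function names `domain_features` and `family_specific`, and the way each of the four feature types is operationalized. The abstract does not state which statistical test the authors used, so the sketch substitutes a strict all-or-none contrast (present in every target species and in no outgroup species, or the reverse) for a significance test.

```python
from collections import defaultdict

# Hypothetical toy data: species -> proteins, each an ordered list of domain IDs.
proteomes = {
    "legume_1": [["A", "B"], ["A", "A", "C"]],
    "legume_2": [["A", "C"], ["B", "A"]],
    "outgroup_1": [["A", "D"], ["D"]],
    "outgroup_2": [["D", "A"], ["A"]],
}
targets = {"legume_1", "legume_2"}


def domain_features(proteins):
    """Return four per-species feature sets (assumed definitions, not the paper's)."""
    content = set()                 # content: which domains occur at all
    abundance = defaultdict(int)    # abundance: number of proteins containing the domain
    duplication = defaultdict(int)  # duplication: max copies within a single protein
    partners = defaultdict(set)     # distinct adjacent domains, used for versatility
    for arch in proteins:
        for d in set(arch):
            content.add(d)
            abundance[d] += 1
            duplication[d] = max(duplication[d], arch.count(d))
        for left, right in zip(arch, arch[1:]):
            if left != right:
                partners[left].add(right)
                partners[right].add(left)
    versatility = {d: len(partners[d]) for d in content}
    return content, dict(abundance), dict(duplication), versatility


def family_specific(proteomes, targets):
    """Strict contrast on domain content; a stand-in for a real significance test."""
    contents = {sp: domain_features(p)[0] for sp, p in proteomes.items()}
    target_sets = [contents[sp] for sp in proteomes if sp in targets]
    out_sets = [contents[sp] for sp in proteomes if sp not in targets]
    # Gains: in every target species, in no outgroup species.
    gains = set.intersection(*target_sets) - set.union(*out_sets)
    # Losses: in every outgroup species, in no target species.
    losses = set.intersection(*out_sets) - set.union(*target_sets)
    return sorted(gains), sorted(losses)


gains, losses = family_specific(proteomes, targets)
print("gained in targets:", gains)
print("lost in targets:", losses)
```

With real data, the domain architectures would come from a domain annotation of each proteome, and the all-or-none rule would likely be replaced by a statistical test that tolerates missing annotations in individual species.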