
Research Project: SoyBase and the Legume Information System - Information Infrastructure and Research for Legume Crop Improvement

Location: Corn Insects and Crop Genetics Research, Ames, Iowa

Title: Pandagma: a tool for identifying pan-gene sets and gene families at desired evolutionary depths and accommodating whole genome duplications

Authors
Cannon, Steven
Lee, Hyun-Oh (ORISE Fellow)
Weeks, Nathan
Berendzen, Joel (Generisbio, LLC)

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 8/18/2024
Publication Date: 8/24/2024
Citation: Cannon, S.B., Lee, H., Weeks, N.T., Berendzen, J. 2024. Pandagma: a tool for identifying pan-gene sets and gene families at desired evolutionary depths and accommodating whole genome duplications. Bioinformatics. https://doi.org/10.1093/bioinformatics/btae526.
DOI: https://doi.org/10.1093/bioinformatics/btae526

Interpretive Summary: Identifying corresponding genes from different individuals in a species, or from different species in a genus, is important for discovering the causes of differences between those individuals. Understanding the causes of those differences, in turn, helps breeders and researchers select and develop crop varieties with improved characteristics. This publication describes software for comparing and analyzing all genes in a set of individuals. The software groups corresponding genes into collections that represent the core sets of all genes in a species or, when applied to multiple species, all genes in that group of species. This software is expected to help breeders and researchers make better use of diverse genetic material for crop improvement.

Technical Abstract: Identification of allelic or corresponding genes (pan-genes) within a species or genus is important for discovering biologically significant genetic conservation and variation. Similarly, identification of orthologs (gene families) across wider evolutionary distances is important for understanding the genetic basis of similar or differing traits. Especially in plants, several complications make identification of pan-genes and gene families challenging, including whole-genome duplications, differences in evolutionary rate among lineages, and varying quality of assemblies and annotations. Here, we document and distribute a set of workflows that we have used to address these problems.