Location: Virus and Prion Research
Title: Phylogenetic diversity statistics for all clades in a phylogeny
Author
GROVER, SIDDHANT - Iowa State University
MARKIN, ALEXEY - Oak Ridge Institute For Science And Education (ORISE)
Anderson, Tavis
EULENSTEIN, OLIVER - Iowa State University
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/10/2023
Publication Date: 6/30/2023
Citation: Grover, S., Markin, A., Anderson, T.K., Eulenstein, O. 2023. Phylogenetic diversity statistics for all clades in a phylogeny. Bioinformatics. 39(1):i177-i184. https://doi.org/10.1093/bioinformatics/btad263.
DOI: https://doi.org/10.1093/bioinformatics/btad263
Interpretive Summary: Measuring the diversity of biological organisms is one of the most fundamental problems in ecology and evolutionary biology. Understanding the diversity of organisms is crucial in conservation efforts as well as in the control of pathogens. In this work we develop new computational techniques to compute descriptive statistics of organisms' diversity. Our techniques are based on the widely used notion of 'phylogenetic diversity'. Our algorithms compute some essential diversity statistics significantly faster than previously suggested methods. We present a tool for in-depth study of the diversity of organisms given their evolutionary history. By examining how diversity changes over the course of evolution, we identify hotspots of diversity, i.e., locales and time periods where organisms undergo rapid diversification.
Technical Abstract: A quantitative measure of phylogenetic diversity, PD, has been used to address problems in conservation biology, microbial ecology, and evolutionary biology. PD has been defined as the minimum total length of the branches in a phylogeny required to cover a specified set of taxa on the phylogeny. A general goal in the application of PD has been to identify taxa that maximize PD on a given phylogeny, and this has been mirrored in the development of algorithms that can solve the problem.
Other descriptive statistics, such as the minimum PD, average PD, and standard deviation of PD, provide valuable and often needed insight into the distribution of PD across a phylogeny, but there is limited work on computing these statistics. We introduce efficient and exact algorithms for computing PD and the associated descriptive statistics for an entire phylogeny. Our algorithms also compute PD statistics for every clade in a phylogeny, enabling direct comparisons of PD between clades. We conducted a simulation study to test the scalability of our algorithms and demonstrate that PD statistics can be efficiently computed to analyze large phylogenies with application in ecology and evolutionary biology.
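
Illustrative example: the definition of PD and its descriptive statistics can be made concrete with a small sketch. The code below is not the algorithm from the publication. It is a brute-force illustration that enumerates every subset of k taxa, which is only feasible for tiny trees. The toy tree, the names `TREE`, `LEAVES`, `pd`, and `pd_stats`, and the choice of the rooted variant of PD (branches on the paths from the selected taxa up to the root) are all assumptions made for this example.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy phylogeny: child -> (parent, branch length).
# The node "root" has no entry because it has no parent branch.
TREE = {
    "A": ("x", 1.0),
    "B": ("x", 2.0),
    "x": ("root", 3.0),
    "C": ("root", 4.0),
}
LEAVES = ["A", "B", "C"]


def pd(taxa, tree=TREE):
    """Rooted PD (assumed variant): total length of the branches
    on the paths from the given taxa up to the root."""
    covered = set()  # nodes whose parent branch is already counted
    for node in taxa:
        # Walk towards the root; stop early once we reach a branch
        # that an earlier taxon has already covered.
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in covered)


def pd_stats(k, leaves=LEAVES, tree=TREE):
    """Brute-force PD statistics over all subsets of k taxa."""
    values = [pd(subset, tree) for subset in combinations(leaves, k)]
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation
    }


if __name__ == "__main__":
    print(pd(["A", "B"]))  # branches A, B, and x
    print(pd_stats(2))
```

On this toy tree the three pairs of taxa have PD values 6, 8, and 9, so the minimum is 6, the maximum is 9, and the average is 23/3. The number of subsets grows combinatorially with the number of taxa, which is why efficient exact algorithms such as those described in the abstract are needed for large phylogenies.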