Location: Virus and Prion Research
Title: The asymmetric cluster affinity cost
Author:
WAGLE, SANKET - Iowa State University
MARKIN, ALEXEY - Iowa State University
GORECKI, PAWEL - University Of Warsaw
Anderson, Tavis
EULENSTEIN, OLIVER - Iowa State University
Submitted to: Research Computational Molecular Biology (RECOMB)
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/6/2023
Publication Date: 7/13/2023
Citation: Wagle, S., Markin, A., Gorecki, P., Anderson, T.K., Eulenstein, O. 2023. The asymmetric cluster affinity cost. Research Computational Molecular Biology (RECOMB). 13883:131-145. https://doi.org/10.1007/978-3-031-36911-7_9.
DOI: https://doi.org/10.1007/978-3-031-36911-7_9
Interpretive Summary: The identification of genetically novel influenza A viruses (IAV) that contain genes derived from human-, swine-, or avian-origin IAV is critical for controlling infection in swine. These novel viruses may be undergoing rapid changes in genetic diversity that reduce the efficacy of vaccine control methods and may also pose a greater risk to humans for zoonotic infection. In this study, we developed an algorithm that measures the distance between two evolutionary trees and reports the differences between them as a cost score. The cost score can subsequently be used to merge individual gene trees into a larger phylogenetic network describing how reassortment has impacted the evolution of the virus. The algorithm was validated using simulated data, where it demonstrated improved performance over other state-of-the-art comparison metrics, and it objectively measured the differences between evolutionary trees on large-scale, field-relevant datasets. The development of this algorithm provides computational support for swine IAV surveillance, as it can objectively identify when the genetic components of a virus are derived from different evolutionary origins. These data may then be applied to identify genetically novel swine IAV strains for characterization and for use in vaccine development, and may help reduce the risk of interspecies transmission by identifying viruses that have zoonotic potential.
Technical Abstract: Tree comparison costs are sophisticated tools used to compare the results of different phylogenetic hypotheses and reconstruction methods and to evaluate the robustness of a tree to data perturbations. The Robinson-Foulds distance is a widely used measure for comparing the topologies of two trees, but it is highly sensitive to tree error. Consequently, tree differences may be over-estimated, leading to incorrect inference. An approach to overcome this shortcoming is the Cluster Affinity distance, which is a refinement of the Robinson-Foulds distance. These distances are symmetric and thus designed to compare the same type of trees. However, it is more common to compare different types of trees, such as gene trees compared with species trees, or the integration of different datasets into a supertree: these comparisons are inherently asymmetric. Here, we introduce the asymmetric Cluster Affinity cost, a relaxation of the original Affinity cost to compare heterogeneous trees. We demonstrate that the characteristics of this cost are similar to the symmetric Cluster Affinity distance. Further, for the asymmetric affinity cost we describe efficient algorithms, derive the exact tree diameters, and use these to standardize the cost to be applicable in practice.
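
Illustrative note: the sensitivity of the Robinson-Foulds distance mentioned in the abstract can be seen in a small sketch. The code below is not taken from the paper and does not implement the asymmetric Cluster Affinity cost. It assumes rooted trees written as nested Python tuples with string leaf labels, and it uses the common cluster-based form of the Robinson-Foulds distance (the number of non-trivial clusters found in exactly one of the two trees). The function names `clusters` and `robinson_foulds` are hypothetical.

```python
def clusters(tree):
    """Return the non-trivial clusters of a rooted tree given as nested tuples.

    A cluster is the set of leaf labels below an internal node. The root
    cluster (all leaves) is dropped because any two trees on the same
    leaf set share it.
    """
    found = set()

    def leaves(node):
        # A leaf is anything that is not a tuple (assumption of this sketch).
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    root_cluster = leaves(tree)
    found.discard(root_cluster)
    return found


def robinson_foulds(t1, t2):
    """Count the clusters that appear in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


# Two five-leaf trees that differ only in the placement of leaf "a".
t1 = (((("a", "b"), "c"), "d"), "e")
t2 = (((("b", "c"), "d"), "e"), "a")

print(robinson_foulds(t1, t1))  # 0
print(robinson_foulds(t1, t2))  # 6
```

In this example a single misplaced leaf changes every non-trivial cluster, so the distance reaches 6, the largest value possible for two binary rooted trees on five leaves. This is the kind of over-estimation that motivates the refinements described in the abstract.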