
Research Project: Intervention Strategies to Control Endemic and New and Emerging Influenza A Virus Infections in Swine

Location: Virus and Prion Research

Title: The asymmetric cluster affinity cost

Author
Wagle, Sanket - Iowa State University
Markin, Alexey - Iowa State University
Gorecki, Pawel - University Of Warsaw
Anderson, Tavis
Eulenstein, Oliver - Iowa State University

Submitted to: Research in Computational Molecular Biology (RECOMB)
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/6/2023
Publication Date: 7/13/2023
Citation: Wagle, S., Markin, A., Gorecki, P., Anderson, T.K., Eulenstein, O. 2023. The asymmetric cluster affinity cost. Research in Computational Molecular Biology (RECOMB). 13883:131-145. https://doi.org/10.1007/978-3-031-36911-7_9.
DOI: https://doi.org/10.1007/978-3-031-36911-7_9

Interpretive Summary: The identification of genetically novel influenza A viruses (IAV) that contain genes derived from human-, swine-, or avian-origin IAV is critical for controlling infection in swine. These novel viruses may be undergoing rapid changes in genetic diversity that reduce the efficacy of vaccine control methods, and they may also pose a greater risk of zoonotic infection in humans. In this study, we developed an algorithm that measures the distance between two evolutionary trees and quantifies the differences between them as a cost score. The cost score can subsequently be used to merge individual gene trees into a larger phylogenetic network describing how reassortment has affected the evolution of the virus. The algorithm was validated using simulated data, demonstrated improved performance against other state-of-the-art comparison metrics, and objectively measured the differences between evolutionary trees in large-scale, field-relevant datasets. The algorithm provides computational support for swine IAV surveillance because it can objectively identify when the genetic components of a virus are derived from different evolutionary origins. These data may then be used to identify genetically novel swine IAV strains for characterization and for use in vaccine development, and they may help reduce the risk of interspecies transmission by identifying viruses with zoonotic potential.

Technical Abstract: Tree comparison costs are sophisticated tools used to compare the results of different phylogenetic hypotheses and reconstruction methods and to evaluate the robustness of a tree to data perturbations. The Robinson-Foulds distance is a widely used measure for comparing the topologies of two trees, but it is highly sensitive to tree error. Consequently, tree differences may be over-estimated, leading to incorrect inference. An approach to overcome this shortcoming is the Cluster Affinity distance, which is a refinement of the Robinson-Foulds distance. These distances are symmetric and thus designed to compare the same type of trees. However, it is more common to compare different types of trees, such as gene trees compared with species trees, or the integration of different datasets into a supertree: these comparisons are inherently asymmetric. Here, we introduce the asymmetric Cluster Affinity cost, a relaxation of the original Affinity cost to compare heterogeneous trees. We demonstrate that the characteristics of this cost are similar to the symmetric Cluster Affinity distance. Further, for the asymmetric affinity cost we describe efficient algorithms, derive the exact tree diameters, and use these to standardize the cost to be applicable in practice.
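To make the contrast between the Robinson-Foulds distance and a cluster-affinity-style cost concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Rooted trees are written as nested tuples of leaf labels, and a tree's clusters are taken to be the leaf sets of all its internal nodes (root included), which is an assumed convention. The asymmetric cost is modeled as the sum, over every cluster of the first tree, of the smallest symmetric difference to any cluster of the second tree. This is our reading of the Cluster Affinity idea rather than the paper's exact definition. To handle heterogeneous trees, the second tree's clusters are restricted to the first tree's leaf set, which is also an assumption. The names `clusters`, `leaf_set`, `robinson_foulds`, and `asymmetric_affinity` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree is a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]
Cluster = FrozenSet[str]


def leaf_set(tree: Tree) -> Cluster:
    """All leaf labels below (and including) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clusters(tree: Tree) -> Set[Cluster]:
    """Leaf sets of every internal node, root included (assumed convention)."""
    found: Set[Cluster] = set()

    def visit(node: Tree) -> Cluster:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)  # record the cluster of this internal node
        return below

    visit(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of clusters that appear in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


def asymmetric_affinity(t1: Tree, t2: Tree) -> int:
    """Sketch of an asymmetric cluster-affinity cost from t1 to t2.

    Each cluster of t1 is charged the size of its smallest symmetric
    difference to a cluster of t2. ASSUMPTION: t2's clusters are first
    restricted to t1's leaf set so that trees on different leaf sets
    can be compared.
    """
    leaves1 = leaf_set(t1)
    targets = {c & leaves1 for c in clusters(t2)} - {frozenset()}
    if not targets:
        raise ValueError("trees share no leaves")
    return sum(min(len(c ^ t) for t in targets) for c in clusters(t1))


t1 = (((("a", "b"), "c"), "d"), "e")
t2 = (((("b", "c"), "d"), "e"), "a")  # leaf "a" moved to the other end
gene = (("a", "b"), "c")               # smaller tree on a subset of the leaves

print(robinson_foulds(t1, t2))        # 6
print(asymmetric_affinity(t1, t2))    # 4
print(asymmetric_affinity(t2, t1))    # 3
print(asymmetric_affinity(gene, t2))  # 1
```

In this toy example the Robinson-Foulds distance reaches its maximum of 6 for five leaves because every non-root cluster changed after a single leaf moved, whereas each cluster is only one or two leaves away from a counterpart in the other tree. The two directions of the affinity-style cost also differ (4 versus 3), and the last call compares a smaller tree against a larger one, illustrating the kind of heterogeneous comparison the abstract describes.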