Research Project: Intervention Strategies to Control Endemic and New and Emerging Influenza A Virus Infections in Swine

Location: Virus and Prion Research

Title: smot: a python package and CLI tool for contextual phylogenetic subsampling

Author
Arendsee, Zebulun - Oak Ridge Institute For Science And Education (ORISE)
Baker, Amy
Anderson, Tavis

Submitted to: Journal of Open Source Software
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/20/2022
Publication Date: 12/20/2022
Citation: Arendsee, Z.W., Baker, A.L., Anderson, T.K. 2022. smot: a python package and CLI tool for contextual phylogenetic subsampling. Journal of Open Source Software. 7(80). Article 4193. https://doi.org/10.21105/joss.04193.
DOI: https://doi.org/10.21105/joss.04193

Interpretive Summary: The U.S. Department of Agriculture influenza A virus (IAV) in swine surveillance system monitors the genetic diversity and evolutionary trends of thousands of IAV strains. Analyzing thousands of genetic sequences is computationally difficult, and important evolutionary trends and epidemiological linkages may be obscured. Downsampling, which reduces the number of sequences analyzed, can overcome this problem, but it must maintain genetic and geographic diversity so that host-to-host transmission remains detectable and evolutionary inference remains accurate. We introduce a rigorous and empirically validated Python package and command line utility called "smot" (Simple Manipulation Of Trees). This package offers general functions for filtering phylogenetic trees, algorithms for classifying unlabeled tips given a subset of labeled reference tips, and subsampling algorithms that preserve reference strains and tree topology. The smot tool is an integral component in phylogenetic pipelines that sample and identify representative strains for whole genome sequencing within the USDA IAV in swine surveillance system. It also facilitates the rapid identification and visualization of interspecies transmission events. The smot tool is publicly available and, through its objective quantification of spatial and temporal trends in the diversity of IAV, allows stakeholders to make informed decisions on IAV vaccine design to improve animal health.

Technical Abstract: smot (Simple Manipulation Of Trees) is a command line tool and Python package with the pragmatic goal of distilling large-scale phylogenetic data to facilitate inference and visualization. This package offers general functions for filtering phylogenetic trees, algorithms for classifying unlabeled tips given a subset of labeled reference tips, and subsampling algorithms that preserve reference strains and tree topology. The smot tool has broad application in phylogenetic analysis, and we demonstrate its utility using a genomic epidemiology study of influenza A virus in swine.
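
To make the idea of subsampling while preserving reference strains and tree topology concrete, the following is a minimal sketch in plain Python. It does not use the smot package or reproduce its algorithms: the nested-list tree representation, the function names (tips, prune, subsample), and the uniform random sampling are all assumptions chosen for illustration. Every reference tip is retained, a proportion of the remaining tips is drawn at random, and the tree is pruned so that the branching order among retained tips is unchanged.

```python
import random

# Hypothetical representation: a tree is either a tip name (str)
# or a list of child subtrees. This is not smot's data model.

def tips(tree):
    """Return all tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [t for child in tree for t in tips(child)]

def prune(tree, keep):
    """Restrict the tree to tips in `keep`.

    Internal nodes left with a single child are collapsed, so the
    branching order among the retained tips is preserved.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return children

def subsample(tree, references, proportion, seed=0):
    """Keep every reference tip plus a random share of the other tips.

    Uniform random sampling is a simplification assumed here; it stands
    in for whatever context-aware strategy a real tool would apply.
    """
    rng = random.Random(seed)
    candidates = [t for t in tips(tree) if t not in references]
    n = round(proportion * len(candidates))
    sampled = set(rng.sample(candidates, n))
    return prune(tree, sampled | set(references))

# Example: eight tips, two of which are labeled references.
example_tree = [["A", "B", ["C", "D"]], ["E", ["F", "G", "H"]]]
reduced = subsample(example_tree, references={"A", "F"}, proportion=0.5, seed=1)
```

In this sketch the reduced tree always contains the reference tips "A" and "F", along with half of the six remaining tips. A practical tool would also need to handle branch lengths and tip metadata, which are omitted here.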