
Research Project: Intervention Strategies to Control Endemic and New and Emerging Viral Diseases of Swine

Location: Virus and Prion Research

Title: The United States Swine Pathogen Database: Integrating veterinary diagnostic laboratory sequence data to monitor emerging pathogens of swine

Authors
Anderson, Tavis
INDERSKI, BLAKE - ORISE Fellow
DIEL, DIEGO - Cornell University
HAUSE, BENJAMIN - South Dakota State University
PORTER, ELIZABETH - Kansas State University
CLEMENT, TRAVIS - South Dakota State University
NELSON, ERIC - South Dakota State University
BAI, JIANFA - Kansas State University
Lager, Kelly
Faaberg, Kay
CHRISTOPHER-HENNINGS, JANE - South Dakota State University
GAUGER, PHILIP - Iowa State University
ZHANG, JIANQIANG - Iowa State University
HARMON, KAREN - Iowa State University
MAIN, RODGER - Iowa State University

Submitted to: Database: The Journal of Biological Databases and Curation
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/29/2021
Publication Date: 12/15/2021
Citation: Anderson, T.K., Inderski, B., Diel, D.G., Hause, B.M., Porter, E., Clement, T., Nelson, E.A., Bai, J., Lager, K.M., Faaberg, K.S., Christopher-Hennings, J., Gauger, P.C., Zhang, J., Harmon, K.M., Main, R. 2021. The United States Swine Pathogen Database: Integrating veterinary diagnostic laboratory sequence data to monitor emerging pathogens of swine. Database: The Journal of Biological Databases and Curation. 2021. Article baab078. https://doi.org/10.1093/database/baab078.
DOI: https://doi.org/10.1093/database/baab078

Interpretive Summary: In recent years, several deadly viral diseases of pigs have emerged in the United States, causing hundreds of millions of dollars in economic damage. To respond effectively to these diseases, or to detect new disease incursions or viral variants, it is critical to have a database of currently circulating viral genetic sequences and associated tools to analyze them. A database was constructed to house nucleotide sequences and related metadata for porcine reproductive and respiratory syndrome virus, Senecavirus A, porcine epidemic diarrhea virus, African and classical swine fever viruses, and foot-and-mouth disease virus, parsed from public resources and from previously private clinical cases at three major veterinary diagnostic laboratories. A suite of web-based tools allows stakeholders, researchers, and veterinarians to quickly search for genetic sequence information, identify similar viruses, and browse virus genomes to inform their research and control efforts. Databases such as this one will greatly increase researchers’ understanding of endemic, circulating viruses and speed response efforts by helping researchers quickly identify new viral variants.

Technical Abstract: Veterinary diagnostic laboratories derive thousands of nucleotide sequences each year from clinical samples of swine pathogens such as porcine reproductive and respiratory syndrome virus (PRRSV), Senecavirus A, and swine enteric coronaviruses. In addition, next-generation sequencing has enabled the rapid production of full-length genomes. Presently, sequence data are released to diagnostic clients to inform control measures, but they are not publicly available because the data may be associated with sensitive information. However, public sequence data can be used to objectively design field-relevant vaccines, determine when and how pathogens are spreading across the landscape, and identify virus transmission hotspots, and they are a critical component of genomic surveillance for pandemic preparedness. We developed a centralized sequence database that integrates a selected set of previously private clinical data, using PRRSV data as an exemplar, alongside publicly available genomic information. We implemented the Tripal toolkit, which uses the open-source Drupal content management system and the Chado database schema. Tripal is a collection of Drupal modules used to manage, visualize, and disseminate biological data stored within Chado. Hosting is provided by an Amazon Web Services (AWS) EC2 cloud instance with resource scaling. New sequences sourced from diagnostic labs contain, at a minimum, four data items: genomic information, date of collection, collection location (state or province level), and a unique identifier. Users can download annotated genomic sequences from the database through a customized search interface that incorporates data mined from published literature, search for similar sequences using BLAST-based tools, and explore annotated reference genomes.
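The four minimum data items lend themselves to a simple submission check before a record is loaded. The following Python sketch is hypothetical: the class and field names, the IUPAC character check, and the specific validation rules are illustrative assumptions, not the database's actual Chado schema or ingest code.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: sequences are accepted if they use IUPAC nucleotide codes only.
IUPAC_NT = set("ACGTURYSWKMBDHVN")


@dataclass(frozen=True)
class DiagnosticRecord:
    """Hypothetical record holding the four minimum data items."""
    record_id: str          # unique identifier
    sequence: str           # genomic information (nucleotides)
    collection_date: date   # date of collection
    location: str           # state or province level only


def validate(record: DiagnosticRecord, seen_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.record_id or record.record_id in seen_ids:
        problems.append("missing or duplicate identifier")
    seq = record.sequence.upper()
    if not seq or set(seq) - IUPAC_NT:
        problems.append("empty sequence or non-IUPAC characters")
    if record.collection_date > date.today():
        problems.append("collection date is in the future")
    if not record.location.strip():
        problems.append("missing state/province")
    return problems
```

For example, `validate(DiagnosticRecord("LAB-0001", "ATGTTGGGGAAATGC", date(2020, 5, 1), "IA"), set())` returns an empty list, while passing `{"LAB-0001"}` as `seen_ids` flags the identifier as a duplicate. Keeping location at the state or province level in the record type mirrors the abstract's point that finer-grained data may be sensitive.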
Additionally, because the bulk of the data are presently PRRSV sequences, custom curation and annotation pipelines have been used to determine the PRRSV genotype (Type 1 or 2), locate open reading frames and nonstructural proteins, generate amino acid sequences, detect putative frameshifts, and classify GP5 genes by restriction fragment length polymorphism (RFLP). Genomic data from seven major swine pathogens have been curated and annotated. The resource gives researchers timely access to sequences discovered by veterinary diagnosticians, enabling epidemiological and comparative virology studies. The result will be a better understanding of the emergence of novel swine viruses in the United States and of how these novel strains are disseminated in the US and abroad. Database URL: https://swinepathogendb.org
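The RFLP classification step can be illustrated with an in-silico digest of a GP5 (ORF5) sequence. This is a minimal Python sketch, assuming the MluI, HincII, and SacII enzyme set commonly used for PRRSV ORF5 RFLP typing; the function names are hypothetical, and it is not necessarily how the database's pipeline works. Translating the resulting cut patterns into published RFLP pattern numbers (e.g., 1-4-4) requires a reference lookup table that is not reproduced here.

```python
import re

# Assumed enzyme set for PRRSV ORF5 RFLP typing: recognition site and
# top-strand cut offset from the start of the site.
ENZYMES = {
    "MluI": ("ACGCGT", 1),    # A^CGCGT
    "HincII": ("GTYRAC", 3),  # GTY^RAC
    "SacII": ("CCGCGG", 4),   # CCGC^GG
}

# Expand only the IUPAC codes that appear in the sites above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def site_pattern(site: str) -> re.Pattern:
    # Zero-width lookahead so overlapping occurrences are all reported.
    return re.compile("(?=" + "".join(IUPAC[b] for b in site) + ")")


def digest(seq: str) -> dict[str, list[int]]:
    """Return fragment lengths produced by each enzyme on a linear ORF5 sequence.

    All three sites are palindromic, so scanning the top strand is sufficient.
    An enzyme that does not cut yields a single fragment (the full length).
    """
    seq = seq.upper()
    fragments = {}
    for name, (site, offset) in ENZYMES.items():
        cuts = [m.start() + offset for m in site_pattern(site).finditer(seq)]
        bounds = [0] + cuts + [len(seq)]
        fragments[name] = [b - a for a, b in zip(bounds, bounds[1:])]
    return fragments
```

For a toy 23-nt sequence with one site for each enzyme, `digest("AAACGCGTAAGTTAACCCGCGGA")` returns `{"MluI": [3, 20], "HincII": [13, 10], "SacII": [20, 3]}`. Working from fragment lengths rather than gel images is what makes the classification reproducible across the many sequences submitted by diagnostic laboratories.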