Location: Produce Safety and Microbiology Research
Title: DeepPL: A deep-learning-based tool for the prediction of bacteriophage lifecycle
Author:
Zhang, Yujie
Mao, Mark - Kansas State University
Zhang, Robert - Kansas State University
Liao, Yen-Te
Wu, Vivian
Submitted to: PLoS Computational Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/10/2024
Publication Date: 10/17/2024
Citation: Zhang, Y., Mao, M., Zhang, R., Liao, Y., Wu, V.C. 2024. DeepPL: A deep-learning-based tool for the prediction of bacteriophage lifecycle. PLoS Computational Biology. 20(10). Article e1012525. https://doi.org/10.1371/journal.pcbi.1012525.
DOI: https://doi.org/10.1371/journal.pcbi.1012525

Interpretive Summary: Bacteriophages (or phages) are viruses that infect bacteria and are prevalent in many environments, such as oceans, lakes, and agricultural soil. Phages interact with their bacterial hosts through two different lifecycles: the lytic cycle and the lysogenic cycle. Lytic phages infect and promptly lyse bacteria, and they have attracted growing interest as promising antimicrobial agents. Lysogenic phages integrate their DNA into bacterial genomes and have been used for drug discovery and antibody production. Efficiently identifying the lifecycle of a newly isolated phage is therefore a critical step that shapes its subsequent applications. In this study, we developed DeepPL, a deep-learning-based tool that predicts phage lifecycles from phage DNA sequences. DeepPL performed well, with an accuracy of 94.62% on a set of 374 published phages. In addition, DeepPL correctly predicted the lifecycles of all 18 (100%) diverse phages that our lab had previously isolated and well characterized. DeepPL also predicted phage lifecycles in a metagenomic dataset containing various phage sequences, with accuracies ranging from 85.71% to 100%. In conclusion, DeepPL classifies phage lifecycles with high accuracy and provides data-driven guidance for phage-based research and applications.

Technical Abstract: Bacteriophages are viruses that infect bacteria and can be classified into two groups according to their lifecycles.
Virulent (lytic) phages follow a lytic cycle and lyse the bacterial host immediately after infection. Temperate (lysogenic) phages can integrate their genomes into bacterial chromosomes and replicate along with their bacterial hosts via the lysogenic cycle. Identifying phage lifecycles is a crucial step in developing suitable applications for phages. As alternatives to complicated traditional biological experiments, several tools have been designed to predict phage lifecycles using different algorithms, such as random forest (RF), linear support-vector classifier (SVC), and convolutional neural network (CNN). In this study, we developed DeepPL, a natural language processing (NLP)-based tool for predicting phage lifecycles from nucleotide sequences. On the test set, DeepPL achieved an accuracy of 94.62%, with a sensitivity of 92.24% and a specificity of 95.91%. These results indicated that DeepPL captured fundamental genomic differences between virulent and temperate phages at the nucleotide level and predicted phage lifecycles accurately. Moreover, DeepPL achieved 100% accuracy in predicting the lifecycles of 18 phages that we had previously isolated and biologically verified. Additionally, a mock phage community metagenomic dataset was used to test the potential of DeepPL in viral metagenomic research. DeepPL achieved accuracies ranging from 85.71% to 100% on phage contigs produced by various next-generation sequencing technologies. Overall, our study indicates that DeepPL reliably predicts phage lifecycles from nucleotide sequences and can be applied in future phage and metagenomic research.
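The test-set figures reported in the abstract (accuracy, sensitivity, specificity) follow the standard confusion-matrix definitions. The minimal sketch below shows how the three numbers relate. It assumes temperate phages are the positive class (the abstract does not say which class is positive), the names Confusion, confusion_from_labels, and lifecycle_metrics are hypothetical, and the counts in the usage line are invented for illustration, not DeepPL's actual results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # positives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative
    fp: int  # negatives predicted as positive


def confusion_from_labels(y_true: list[str], y_pred: list[str],
                          positive: str = "temperate") -> Confusion:
    """Count outcomes; treating 'temperate' as positive is an assumption."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return Confusion(tp, fn, tn, fp)


def lifecycle_metrics(c: Confusion) -> dict[str, float]:
    """Accuracy, sensitivity, and specificity from confusion counts."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,   # share of all phages classified correctly
        "sensitivity": c.tp / (c.tp + c.fn),  # recall on the positive class
        "specificity": c.tn / (c.tn + c.fp),  # recall on the negative class
    }


# Hypothetical counts for illustration only.
print(lifecycle_metrics(Confusion(tp=40, fn=10, tn=95, fp=5)))
# {'accuracy': 0.9, 'sensitivity': 0.8, 'specificity': 0.95}
```

Because sensitivity and specificity are computed per class, a model can have high overall accuracy while still missing many phages of the smaller class, which is why the abstract reports all three.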
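The abstract describes DeepPL as NLP-based but does not detail how nucleotide sequences are turned into model input. A common approach in DNA language models such as DNABERT is to split a sequence into overlapping k-mer "words" and to cut long genomes into windows that fit the model's input length. The sketch below illustrates only that general idea; the function names kmer_tokenize and window_tokens, the choice of k = 6, and the 510-token window are assumptions, not DeepPL's documented settings.

```python
def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1 (assumed scheme)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def window_tokens(tokens: list[str], max_tokens: int = 510) -> list[list[str]]:
    """Chunk tokens into consecutive, non-overlapping windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i : i + max_tokens] for i in range(0, len(tokens), max_tokens)]


tokens = kmer_tokenize("acgtacgt", k=3)
print(tokens)  # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
print(window_tokens(tokens, max_tokens=4))
# [['ACG', 'CGT', 'GTA', 'TAC'], ['ACG', 'CGT']]
```

Non-overlapping windows keep the sketch simple. Any window-level predictions would still need to be combined into a single lifecycle call per genome or contig, a step the abstract does not describe.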