Location: Healthy Processed Foods Research
Title: UniDL4BioPep: A universal deep learning architecture for binary classification in peptide bioactivity
Author:
DU, ZHENJIAO - Kansas State University
DING, XINGJIAN - Kansas State University
Xu, Yixiang
LI, YONGHUI - Kansas State University
Submitted to: Briefings in Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/16/2023
Publication Date: 4/5/2023
Citation: Du, Z., Ding, X., Xu, Y., Li, Y. 2023. UniDL4BioPep: A universal deep learning architecture for binary classification in peptide bioactivity. Briefings in Bioinformatics. 24(3). Article bbad135. https://doi.org/10.1093/bib/bbad135.
DOI: https://doi.org/10.1093/bib/bbad135

Interpretive Summary: Identifying potent peptides through model prediction can reduce wet-lab benchwork. However, conventional model building is complex and time-consuming. The recent release of advanced pre-trained deep learning-based language models (LMs) facilitates protein sequence embedding and structure/function prediction. This research developed UniDL4BioPep, a universal deep learning model architecture that adapts itself to model the bioactivity of peptides of any length. By combining an LM with a convolutional neural network, UniDL4BioPep predicted peptide bioactivity better than the respective state-of-the-art models. The model achieves cutting-edge performance when fitted directly to various bioactive peptide datasets, meeting the needs of biochemistry researchers and facilitating the exploration of bioactive peptides.

Technical Abstract: Identifying potent peptides through model prediction can reduce wet-lab benchwork. However, conventional model building is complex and time-consuming, and it faces several challenges, including peptide representation and feature selection, model selection, and hyperparameter tuning. Advanced pre-trained deep learning-based language models (LMs) could facilitate protein sequence embedding and structure/function prediction. This research developed UniDL4BioPep, a universal deep learning model architecture that adapts itself to model the bioactivity of peptides of any length.
By combining the LM with a convolutional neural network, UniDL4BioPep outperformed the respective state-of-the-art models in predicting peptide bioactivity: its accuracy, Matthews correlation coefficient (MCC), and area under the curve (AUC) were 1.6-7%, 3.6-26.4%, and 1.8-12.3% higher, respectively. The model was also validated through t-distributed stochastic neighbor embedding (t-SNE) analysis.
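The abstract does not spell out how the LM and the convolutional neural network are wired together. As a rough, non-authoritative illustration, the sketch below assumes that each peptide's per-residue LM embeddings are mean-pooled into one fixed-length vector (one plausible way to accept peptides of any length) and that a small 1D CNN reads that vector and outputs a bioactivity probability. The embedding width of 320, the filter count, the kernel size, and the helper names (`fake_residue_embeddings`, `pool`, `TinyCNNClassifier`) are hypothetical. Random vectors stand in for the real LM and the weights are untrained, so the printed probabilities carry no meaning; the peptide strings are arbitrary examples.

```python
import numpy as np

# Assumed embedding width; NOT taken from the source.
EMBED_DIM = 320


def fake_residue_embeddings(peptide: str, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained protein LM.

    Returns one random EMBED_DIM vector per residue, shape (len(peptide), EMBED_DIM).
    """
    return rng.normal(size=(len(peptide), EMBED_DIM))


def pool(residue_emb: np.ndarray) -> np.ndarray:
    """Mean-pool over residues so a peptide of any length becomes one vector (assumption)."""
    return residue_emb.mean(axis=0)


def conv1d_valid(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """'Valid' 1D cross-correlation of a single-channel signal.

    x: (L,), kernels: (F, K), bias: (F,) -> output of shape (F, L - K + 1).
    """
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (L - K + 1, K)
    return (windows @ kernels.T).T + bias[:, None]


class TinyCNNClassifier:
    """Hypothetical untrained CNN head: conv -> ReLU -> global max pool -> dense -> sigmoid."""

    def __init__(self, num_filters: int = 8, kernel_size: int = 5, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.kernels = rng.normal(scale=0.1, size=(num_filters, kernel_size))
        self.conv_bias = np.zeros(num_filters)
        self.w = rng.normal(scale=0.1, size=num_filters)
        self.b = 0.0

    def predict_proba(self, pooled: np.ndarray) -> float:
        """Probability that the peptide is bioactive (meaningless until trained)."""
        h = np.maximum(conv1d_valid(pooled, self.kernels, self.conv_bias), 0.0)  # ReLU
        features = h.max(axis=1)  # global max pool over positions -> (F,)
        logit = float(features @ self.w + self.b)
        return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(42)
model = TinyCNNClassifier()
for peptide in ["VPP", "IPP", "GPRGFPGQ"]:
    prob = model.predict_proba(pool(fake_residue_embeddings(peptide, rng)))
    print(peptide, round(prob, 3))
```

In a working pipeline, the random embeddings would be replaced by the pre-trained LM's output and the CNN weights would be learned from labeled bioactive and non-bioactive peptides. Global max pooling is used here only so that the head ends in a fixed-size feature vector regardless of the convolution output length.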
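The three reported metrics can be computed from a classifier's binary predictions and scores. The following plain-Python sketch uses the standard definitions: accuracy and MCC from the confusion matrix, and AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded as 1 (bioactive) and 0 (not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values, not results from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(mcc(y_true, y_pred), 3))       # 0.333
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

MCC uses all four confusion-matrix cells, which is why it can move much more than accuracy on the same predictions, as in the wider 3.6-26.4% range reported above. The pairwise AUC is quadratic in dataset size; library implementations typically use a rank-based formulation instead.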