Publication: USDA ARS

Western Regional Research Center, Albany, California: Publication #404112

Research Project: New Sustainable Processes, Preservation Technologies, and Product Concepts for Specialty Crops and Their Co-Products

Location: Healthy Processed Foods Research

Title: UniDL4BioPep: A universal deep learning architecture for binary classification in peptide bioactivity

Authors

	DU, ZHENJIAO - Kansas State University
	DING, XINGJIAN - Kansas State University
	Xu, Yixiang
	LI, YONGHUI - Kansas State University

Submitted to: Briefings in Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 3/16/2023
Publication Date: 4/5/2023
Citation: Du, Z., Ding, X., Xu, Y., Li, Y. 2023. UniDL4BioPep: A universal deep learning architecture for binary classification in peptide bioactivity. Briefings in Bioinformatics. 24(3). Article bbad135. https://doi.org/10.1093/bib/bbad135.
DOI: https://doi.org/10.1093/bib/bbad135

Interpretive Summary: Identifying potent peptides through model prediction can reduce bench work in wet-lab experiments. However, conventional model building is complex and time-consuming. The recent release of advanced pre-trained deep learning-based language models (LMs) has facilitated protein sequence embedding and structure/function prediction. This research developed UniDL4BioPep, a universal deep learning model architecture that adapts itself to model the bioactivity of peptides of any length. By combining the LM with a convolutional neural network, UniDL4BioPep predicted peptide bioactivity better than the respective state-of-the-art models. The model achieves cutting-edge performance by fitting directly to various bioactive peptide datasets, meeting the needs of biochemistry researchers and facilitating the exploration of bioactive peptides.
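The summary does not spell out how a single architecture copes with peptides "of any length." One common approach, assumed here rather than taken from the paper, is to let the language model emit one vector per residue and average (mean-pool) those vectors into a fixed-size peptide embedding, which a small convolutional network then classifies. The Python sketch below illustrates only that data flow: `toy_residue_embeddings` is a hypothetical random lookup table standing in for a real pre-trained protein LM, and the embedding size, filter count, kernel width, pooling size, and all function names are illustrative assumptions. The weights are random and untrained, so the printed probabilities carry no biological meaning.

```python
import math
import random

# Hypothetical sizes for illustration only; real protein LMs use far larger embeddings.
EMBED_DIM = 16
KERNEL_SIZE = 3
N_FILTERS = 2
POOL_SIZE = 2
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_residue_embeddings(seed: int = 0) -> dict[str, list[float]]:
    """Hypothetical stand-in for a pre-trained LM: a fixed random vector per residue."""
    rng = random.Random(seed)
    return {aa: [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for aa in AMINO_ACIDS}


def embed_peptide(seq: str, table: dict[str, list[float]]) -> list[float]:
    """Mean-pool per-residue vectors so a peptide of any length yields EMBED_DIM values."""
    vectors = [table[aa] for aa in seq]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def conv1d_relu(x: list[float], kernel: list[float], bias: float) -> list[float]:
    """'Valid' 1D convolution (cross-correlation, as in most DL libraries) plus ReLU."""
    k = len(kernel)
    return [max(0.0, sum(w * x[i + j] for j, w in enumerate(kernel)) + bias)
            for i in range(len(x) - k + 1)]


def max_pool(x: list[float], size: int) -> list[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(seed: int = 1) -> dict:
    """Random, untrained weights for N_FILTERS conv kernels and one dense output unit."""
    rng = random.Random(seed)
    n_features = N_FILTERS * ((EMBED_DIM - KERNEL_SIZE + 1) // POOL_SIZE)
    return {
        "kernels": [[rng.gauss(0.0, 0.5) for _ in range(KERNEL_SIZE)] for _ in range(N_FILTERS)],
        "biases": [0.0] * N_FILTERS,
        "dense_w": [rng.gauss(0.0, 0.1) for _ in range(n_features)],
        "dense_b": 0.0,
    }


def predict(seq: str, table: dict[str, list[float]], params: dict) -> float:
    """Return P(bioactive) for one peptide: embed, convolve, pool, flatten, dense, sigmoid."""
    features = embed_peptide(seq, table)
    maps = [max_pool(conv1d_relu(features, k, b), POOL_SIZE)
            for k, b in zip(params["kernels"], params["biases"])]
    flat = [v for m in maps for v in m]
    logit = sum(w * v for w, v in zip(params["dense_w"], flat)) + params["dense_b"]
    return sigmoid(logit)


if __name__ == "__main__":
    table = toy_residue_embeddings()
    params = init_params()
    for peptide in ["VPP", "LKPNMWAGHE"]:
        print(peptide, round(predict(peptide, table, params), 3))
```

The point of this design is that `embed_peptide` returns `EMBED_DIM` values whether a peptide has 3 residues or 30, so the convolutional and dense layers never change shape between datasets.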

Technical Abstract: Identifying potent peptides through model prediction can reduce bench work in wet-lab experiments. However, conventional model building is complex and time-consuming, and it faces several challenges, including peptide representation and features, model selection, and hyperparameter tuning. Advanced pre-trained deep learning-based language models (LMs) can facilitate protein sequence embedding and structure/function prediction. This research developed UniDL4BioPep, a universal deep learning model architecture that adapts itself to model the bioactivity of peptides of any length. By combining the LM with a convolutional neural network, UniDL4BioPep predicted peptide bioactivity better than the respective state-of-the-art models: its accuracy, Matthews correlation coefficient (MCC), and area under the curve (AUC) were 1.6-7%, 3.6-26.4%, and 1.8-12.3% higher, respectively. The model was also validated through t-distributed stochastic neighbor embedding (t-SNE) analysis.
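The comparison above rests on three standard binary-classification metrics. As a reference sketch of their usual definitions (not the authors' evaluation code), the Python below computes accuracy and the Matthews correlation coefficient from hard predictions, and the area under the ROC curve from scores. Reading "area under the curve" as ROC AUC and using a 0.5 threshold to turn scores into labels are assumptions, and the toy labels and scores are invented for illustration. MCC uses all four confusion-matrix cells, which makes it more informative than accuracy when active and inactive peptides are imbalanced.

```python
import math


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = bioactive, 0 = non-bioactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(y_true: list[int], y_pred: list[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(accuracy(y_true, y_pred))           # 4/6 ≈ 0.667
    print(matthews_corrcoef(y_true, y_pred))  # 3/9 ≈ 0.333
    print(roc_auc(y_true, scores))            # 8/9 ≈ 0.889
```

The pairwise AUC loop grows with the product of positive and negative counts, so for large datasets a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is the more practical choice.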
