Location: Methods and Application of Food Composition Laboratory
Title: Deep learning accurately predicts food categories and nutrients based on ingredient statements
Author:
MA, PEIHUA - University Of Maryland
WANG, QIN - University Of Maryland
YU, NING - University Of Maryland
LI, YING - University Of Maryland
ZHANG, ZHIKUN - Helmholtz Centre
SHENG, JIPING - Renmin University Of China
Ahuja, Jaspreet
MCGINTY, HANDE - University Of Maryland
Submitted to: Food Chemistry
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/16/2022
Publication Date: 5/19/2022
Citation: Ma, P., Wang, Q., Yu, N., Li, Y., Zhang, Z., Sheng, J., Ahuja, J.K., Mcginty, H. 2022. Deep learning accurately predicts food categories and nutrients based on ingredient statements. Food Chemistry. 391:133243. https://doi.org/10.1016/j.foodchem.2022.133243.
DOI: https://doi.org/10.1016/j.foodchem.2022.133243
Interpretive Summary: Determining food attributes such as taxonomy and nutrient content is a challenging and resource-intensive task, yet it is important for a better understanding of foods. In this study, a novel strategy was developed to predict food categories and nutrient values from the ingredient statements in the USDA Branded Food Products Database using deep learning models. The Multi-layer Perceptron (MLP) method, combined with ingredient encoding using term frequency-inverse document frequency and dataset rebalancing with synthetic minority oversampling technique-edited nearest neighbors, obtained the highest learning efficiency for AI food natural language processing tasks, achieving up to 99% accuracy for food classification and an R² of 0.98 for calcium estimation. The deep learning approach has great potential to be embedded in other food classification and regression tasks and to be extended to other applications in the food and nutrient scope. Automating these resource-intensive tasks can help with precision nutrition and a better understanding of dietary intakes, and can be useful for food composition database managers, dietitians, and epidemiologists.
Technical Abstract: Determining food attributes such as taxonomy and nutrient content is a challenging and resource-intensive task, yet it is important for a better understanding of foods. In this study, a novel strategy was developed to predict food categories and nutrient values from the ingredient statement using deep learning models.
A novel dataset, 134k BFPD, was collected from the USDA Branded Food Products Database (BFPD) with modifications and labeled with three food taxonomies and nutrient values, yielding an artificial intelligence (AI) dataset covering the largest number of food types to date. The deep learning strategy encompassed ingredient parsing, encoding of high-frequency ingredients using term frequency-inverse document frequency (TF-IDF, TF for short), and dataset rebalancing with synthetic minority oversampling technique-edited nearest neighbors (SMOTEENN, SE for short). Overall, the multi-layer perceptron (MLP)-TF-SE method obtained the highest learning efficiency for AI food natural language processing tasks, achieving up to 99% accuracy for food classification and an R² of 0.98 for calcium estimation (0.79 to 0.97 for calories, protein, sodium, total carbohydrate, total lipids, etc.). The deep learning approach has great potential to be embedded in other food classification and regression tasks and to be extended to other applications in the food and nutrient scope.
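To illustrate the TF-IDF ingredient encoding named in the abstract, here is a minimal Python sketch using only the standard library. It is not the authors' implementation. The comma-based tokenization, the plain log(N/df) weighting, the function names `tokenize` and `tfidf`, and the three sample ingredient statements are all assumptions made for illustration. The SMOTEENN rebalancing and MLP training steps are not shown.

```python
import math
from collections import Counter


def tokenize(statement):
    """Split an ingredient statement on commas into lowercase terms (assumed tokenization)."""
    return [term.strip().lower() for term in statement.split(",") if term.strip()]


def tfidf(statements):
    """Return (vocabulary, vectors) with one TF-IDF vector per ingredient statement."""
    docs = [tokenize(s) for s in statements]
    n_docs = len(docs)
    # Document frequency: number of statements that contain each ingredient.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        row = []
        for term in vocab:
            tf = counts[term] / total if total else 0.0
            idf = math.log(n_docs / df[term])  # plain IDF variant (assumption)
            row.append(tf * idf)
        vectors.append(row)
    return vocab, vectors


# Hypothetical ingredient statements, not taken from the BFPD.
statements = [
    "Water, Sugar, Citric Acid",
    "Wheat Flour, Sugar, Salt",
    "Milk, Salt, Enzymes",
]
vocab, vectors = tfidf(statements)
```

In this sketch, an ingredient that appears in only one statement (such as water) receives a higher weight than one shared across statements (such as sugar), which is the property that makes TF-IDF vectors useful as classifier inputs.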