
Research Project: USDA National Nutrient Databank for Food Composition

Location: Methods and Application of Food Composition Laboratory

Title: USDA IngID Thesaurus: an application dataset for systematic reporting of ingredients used in commercially packaged foods

Author
Ahuja, Jaspreet
Li, Ying - University of Maryland
Pehrsson, Pamela
Harnly, James - Jim

Submitted to: Government Publication/Report
Publication Type: Government Publication
Publication Acceptance Date: 10/11/2023
Publication Date: 10/11/2023
Citation: Ahuja, J.K., Li, Y., Pehrsson, P.R., Harnly, J.M. 2023. USDA IngID Thesaurus: an application dataset for systematic reporting of ingredients used in commercially packaged foods. Government Publication/Report. 100 (2021).

Interpretive Summary: The scientific literature contains little information on the types of ingredients used in commercially packaged foods. USDA’s Global Branded Food Products Database (GBFPD), part of FoodData Central, makes publicly available ingredient lists for more than 0.25 million commercially packaged food products. A review of the ingredient terms revealed the need for a thesaurus of ingredient terms used on commercially packaged food labels. We obtained ingredient lists for top-selling food categories, selected for the variety and diversity of the types of ingredients used in their products. Ingredients that were equivalent, similar, spelling or usage variants, spelling errors, or synonyms were assigned the same Preferred Descriptor for systematic reporting. For the first time, the IngID Thesaurus makes publicly available a tool that can potentially reduce pre-processing and data clean-up time in studies of ingredients as listed on labels of commercially packaged foods. It will enable characterization of what is in the food we eat using a standardized vocabulary, and can potentially improve our understanding of commercial ingredients, support the characterization of foods in dimensions other than traditional nutrient profiles, and aid the development of food ontologies and artificial intelligence tools.

Technical Abstract: Commercially packaged foods are an integral part of the US diet, yet the scientific literature contains little information on the types of ingredients used in these foods. USDA’s Global Branded Food Products Database (GBFPD), part of FoodData Central, makes publicly available a compiled dataset of ingredient lists for over a quarter million commercially packaged food products. In 2021, a prototype of IngID, a framework for parsing and systematically reporting ingredients used in commercially packaged baked products, was developed. A review of the ingredient terms revealed the need for a thesaurus of ingredient terms used on commercially packaged food labels. We selected top-selling food categories based on the variety and diversity of the types of ingredients used in their products. The selected categories include baked products, beverages, candies, dairy products, frozen and refrigerated entrees, and soups, among others. We obtained ingredient lists (blocks of free text) from GBFPD for foods representing these categories and parsed them into individual ingredients. The parsed ingredients were reviewed, and those that were equivalent, similar, spelling or usage variants, spelling errors, or synonyms were assigned the same Preferred Descriptor (PD), which allows these ingredients to be reported systematically. Furthermore, the PDs were grouped into a broad taxonomy scheme. The first publicly available version, IngID Thesaurus Version 1 (2023), contains ~26,000 parsed ingredient terms that have been assigned ~3,000 PDs, categorized in a taxonomic hierarchy of 16 broad groups. For the first time, the IngID Thesaurus makes publicly available a tool that can potentially reduce pre-processing and data clean-up time in studies of ingredients as listed on commercially packaged food labels. It will enable characterization of what is in the food we eat using a standardized vocabulary and can potentially improve our understanding of commercial ingredients.
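
The parse-then-map workflow described in the Technical Abstract can be illustrated with a minimal Python sketch. This is not the IngID parser, and the thesaurus entries below are invented for illustration rather than taken from the IngID Thesaurus. The function names (`split_ingredients`, `normalize`, `assign_descriptors`) and the simplification of dropping sub-ingredient lists in brackets are assumptions made for this example only.

```python
import re

# Hypothetical thesaurus excerpt: variant term -> Preferred Descriptor (PD).
# Illustrative entries only; they are NOT taken from the IngID Thesaurus.
SAMPLE_THESAURUS = {
    "sugar": "SUGAR",
    "cane sugar": "SUGAR",
    "suger": "SUGAR",  # spelling error mapped to the same PD
    "salt": "SALT",
    "sea salt": "SALT",
    "wheat flour": "WHEAT FLOUR",
    "enriched wheat flour": "WHEAT FLOUR",
}


def split_ingredients(statement):
    """Split a free-text ingredient list on top-level commas and semicolons.

    Separators inside parentheses or brackets are ignored, so a compound
    ingredient stays together with its sub-ingredient list.
    """
    parts, current, depth = [], [], 0
    for ch in statement:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)
        if ch in ",;" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def normalize(term):
    """Reduce a parsed ingredient to a lookup key.

    Assumed simplification: keep only the text before the first opening
    bracket, drop a leading "Ingredients:" label, lowercase, and collapse
    whitespace.
    """
    term = re.split(r"[(\[{]", term, maxsplit=1)[0]
    term = re.sub(r"^\s*ingredients?\s*:", "", term, flags=re.IGNORECASE)
    term = term.lower().strip(" .*")
    return re.sub(r"\s+", " ", term)


def assign_descriptors(statement, thesaurus):
    """Return (normalized term, PD) pairs; PD is None when no entry exists."""
    return [
        (term, thesaurus.get(term))
        for term in (normalize(raw) for raw in split_ingredients(statement))
    ]


if __name__ == "__main__":
    label = ("Ingredients: Enriched Wheat Flour (wheat flour, niacin), "
             "Cane Sugar, Suger, Sea Salt, Natural Flavor.")
    for term, pd in assign_descriptors(label, SAMPLE_THESAURUS):
        print(f"{term!r} -> {pd}")
```

In this sketch, terms without an entry come back with `None`, which flags them for manual review instead of guessing a descriptor.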