Publication : USDA ARS

Research Project: Preventing the Development of Childhood Obesity

Location: Children's Nutrition Research Center

Title: Clustering egocentric images in passive dietary monitoring with self-supervised learning

Author

	PENG, JIACHUAN - Imperial College
	SHI, PEILUN - Imperial College
	QUI, JIANING - Imperial College
	JU, XINWEI - Imperial College
	LO, FRANK - Imperial College
	GU, XIAO - Imperial College
	JIA, WENYAN - University of Pittsburgh
	BARANOWSKI, TOM - Children's Nutrition Research Center (CNRC)
	STEINER-ASIEDU, MATILDA - University of Ghana
	ANDERSON, ALEX - University of Ghana
	MCCRORY, MEGAN - Boston University
	SAZONOV, EDWARD - University of Alabama
	SUN, MINGUI - University of Pittsburgh
	FROST, GARY - Imperial College
	LO, BENNY - Imperial College

Submitted to: IEEE Access
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/15/2022
Publication Date: 11/18/2022
Citation: Peng, J., Shi, P., Qui, J., Ju, X., Lo, F.P., Gu, X., Jia, W., Baranowski, T., Steiner-Asiedu, M., Anderson, A.K., McCrory, M.A., Sazonov, E., Sun, M., Frost, G., Lo, B. 2022. Clustering egocentric images in passive dietary monitoring with self-supervised learning. 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). https://doi.org/10.1109/BHI56158.2022.9926927.
DOI: https://doi.org/10.1109/BHI56158.2022.9926927

Interpretive Summary: Assessing what and how much people eat is error-prone when done by self-report, currently the most common method. Artificial intelligence (AI) methods applied to camera images of food intake may overcome some of this error, but training the AI requires large numbers of annotated images (i.e., images labeled with what they actually contain, also called ground truth). In our recent field studies of passive dietary monitoring in Ghana, we collected over 250k in-the-wild images (i.e., images captured outside a laboratory) to facilitate accurate measurement of individual food and nutrient intake in low- and middle-income countries with passive monitoring camera technologies. The current dataset involves 20 households (74 subjects) from both rural and urban regions of Ghana and was collected with two different types of wearable cameras. Wearable cameras continuously capture subjects' activities, which yields massive amounts of data that must be cleaned and annotated before analysis. To ease these post-processing and annotation tasks, we developed a novel self-supervised learning framework that clusters the large volume of egocentric images into separate events. Each event consists of a sequence of temporally continuous and contextually similar images. With images clustered into separate events, annotators and dietitians can examine and analyze the data more efficiently, which facilitates the subsequent dietary assessment. Validated on a held-out test set with ground truth labels, the proposed framework outperforms baselines in clustering quality and classification accuracy.

Technical Abstract: In our recent field studies of passive dietary monitoring in Ghana, we have collected over 250k in-the-wild images. The dataset is an ongoing effort to facilitate accurate measurement of individual food and nutrient intake in low- and middle-income countries with passive monitoring camera technologies. The current dataset involves 20 households (74 subjects) from both rural and urban regions of Ghana, and two different types of wearable cameras were used in the studies. Once initiated, wearable cameras continuously capture subjects' activities, which yields massive amounts of data that must be cleaned and annotated before analysis. To ease these post-processing and annotation tasks, we propose a novel self-supervised learning framework that clusters the large volume of egocentric images into separate events. Each event consists of a sequence of temporally continuous and contextually similar images. With images clustered into separate events, annotators and dietitians can examine and analyze the data more efficiently, which facilitates the subsequent dietary assessment. Validated on a held-out test set with ground truth labels, the proposed framework outperforms baselines in clustering quality and classification accuracy.
