
Research Project: Preventing the Development of Childhood Obesity

Location: Children's Nutrition Research Center

Title: Clustering egocentric images in passive dietary monitoring with self-supervised learning

Authors
PENG, JIACHUAN - Imperial College
SHI, PEILUN - Imperial College
QIU, JIANING - Imperial College
JU, XINWEI - Imperial College
LO, FRANK - Imperial College
GU, XIAO - Imperial College
JIA, WENYAN - University of Pittsburgh
BARANOWSKI, TOM - Children's Nutrition Research Center (CNRC)
STEINER-ASIEDU, MATILDA - University of Ghana
ANDERSON, ALEX - University of Ghana
MCCRORY, MEGAN - Boston University
SAZONOV, EDWARD - University of Alabama
SUN, MINGUI - University of Pittsburgh
FROST, GARY - Imperial College
LO, BENNY - Imperial College

Submitted to: IEEE Access
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/15/2022
Publication Date: 11/18/2022
Citation: Peng, J., Shi, P., Qiu, J., Ju, X., Lo, F.P., Gu, X., Jia, W., Baranowski, T., Steiner-Asiedu, M., Anderson, A.K., McCrory, M.A., Sazonov, E., Sun, M., Frost, G., Lo, B. 2022. Clustering egocentric images in passive dietary monitoring with self-supervised learning. 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). https://doi.org/10.1109/BHI56158.2022.9926927.
DOI: https://doi.org/10.1109/BHI56158.2022.9926927

Interpretive Summary: Assessing what and how much people eat is error-prone when done by self-report, currently the most common method. Artificial intelligence (AI) methods applied to camera images of food intake may reduce some of this error, but training the AI requires large numbers of annotated images (i.e., images labeled with what they actually contain, also called ground truth). In our recent field studies of passive dietary monitoring in Ghana, we collected over 250k in-the-wild images (i.e., captured in everyday settings rather than in a laboratory) to support accurate measurement of individual food and nutrient intake in low- and middle-income countries using passive camera-based monitoring. The current dataset covers 20 households (74 subjects) in rural and urban regions of Ghana and was collected with two different types of wearable cameras. Wearable cameras continuously capture subjects' activities, producing massive amounts of data that must be cleaned and annotated before analysis. To ease these post-processing and annotation tasks, we developed a novel self-supervised learning framework that clusters the large volume of egocentric images into separate events, each consisting of a sequence of temporally continuous and contextually similar images. Grouping images into events lets annotators and dietitians examine and analyze the data more efficiently, which in turn streamlines subsequent dietary assessment. Validated on a held-out test set with ground-truth labels, the proposed framework outperforms baseline methods in both clustering quality and classification accuracy.

Technical Abstract: In our recent dietary assessment field studies on passive dietary monitoring in Ghana, we collected over 250k in-the-wild images. The dataset is part of an ongoing effort to enable accurate measurement of individual food and nutrient intake in low- and middle-income countries with passive camera-based monitoring technologies. The current dataset involves 20 households (74 subjects) from both rural and urban regions of Ghana, and two different types of wearable cameras were used in the studies. Once started, wearable cameras continuously capture subjects' activities, yielding massive amounts of data that must be cleaned and annotated before analysis. To ease these post-processing and annotation tasks, we propose a novel self-supervised learning framework that clusters the large volume of egocentric images into separate events. Each event consists of a sequence of temporally continuous and contextually similar images. Clustering images into events allows annotators and dietitians to examine and analyze the data more efficiently and facilitates subsequent dietary assessment. Validated on a held-out test set with ground-truth labels, the proposed framework outperforms baseline methods in terms of clustering quality and classification accuracy.
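The abstract does not describe the framework's internals, so the following is only a minimal sketch of what grouping "temporally continuous and contextually similar" images into events can look like. It assumes each image already has a feature vector (for example, one produced by a self-supervised encoder) and a capture timestamp in seconds, and it starts a new event whenever consecutive images are too far apart in time or too dissimilar in content. The function names, the cosine-similarity criterion, and the threshold values (`sim_threshold`, `max_gap_s`) are hypothetical illustration choices, not the authors' method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_events(
    embeddings: Sequence[Sequence[float]],
    timestamps: Sequence[float],
    sim_threshold: float = 0.8,  # hypothetical: minimum similarity to stay in one event
    max_gap_s: float = 60.0,     # hypothetical: maximum time gap (seconds) within an event
) -> List[List[int]]:
    """Group time-ordered image indices into events (illustrative sketch only).

    Assumes images are sorted by timestamp. A new event starts when two
    consecutive images are separated by more than max_gap_s seconds or
    their feature vectors have cosine similarity below sim_threshold.
    """
    if len(embeddings) != len(timestamps):
        raise ValueError("embeddings and timestamps must have the same length")
    events: List[List[int]] = []
    for i in range(len(embeddings)):
        if i == 0:
            events.append([0])  # the first image always opens the first event
            continue
        gap = timestamps[i] - timestamps[i - 1]
        sim = cosine_similarity(embeddings[i], embeddings[i - 1])
        if gap > max_gap_s or sim < sim_threshold:
            events.append([i])  # temporal or contextual break: open a new event
        else:
            events[-1].append(i)  # continuous and similar: extend the current event
    return events
```

With toy two-dimensional features `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]` and timestamps `[0, 10, 20, 30, 200]`, `segment_events` returns `[[0, 1], [2, 3], [4]]`: the content changes between images 1 and 2, and the 170-second gap separates image 4. Comparing only adjacent images keeps the sketch simple; a learned framework such as the one reported here would replace these hand-set thresholds with representations and clustering trained on the data.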