Location: Children's Nutrition Research Center
Title: Clustering egocentric images in passive dietary monitoring with self-supervised learning
Author
PENG, JIACHUAN - Imperial College
SHI, PEILUN - Imperial College
QUI, JIANING - Imperial College
JU, XINWEI - Imperial College
LO, FRANK - Imperial College
GU, XIAO - Imperial College
JIA, WENYAN - University of Pittsburgh
BARANOWSKI, TOM - Children's Nutrition Research Center (CNRC)
STEINER-ASIEDU, MATILDA - University of Ghana
ANDERSON, ALEX - University of Ghana
MCCRORY, MEGAN - Boston University
SAZONOV, EDWARD - University of Alabama
SUN, MINGUI - University of Pittsburgh
FROST, GARY - Imperial College
LO, BENNY - Imperial College
Submitted to: IEEE Access
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/15/2022
Publication Date: 11/18/2022
Citation: Peng, J., Shi, P., Qui, J., Ju, X., Lo, F.P., Gu, X., Jia, W., Baranowski, T., Steiner-Asiedu, M., Anderson, A.K., McCrory, M.A., Sazonov, E., Sun, M., Frost, G., Lo, B. 2022. Clustering egocentric images in passive dietary monitoring with self-supervised learning. 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). https://doi.org/10.1109/BHI56158.2022.9926927.
DOI: https://doi.org/10.1109/BHI56158.2022.9926927
Interpretive Summary: Assessing what and how much people eat is fraught with error when it relies on self-report, currently the most common method. Artificial Intelligence (AI) methods applied to camera images of food intake may reduce some of that error, but training the AI requires large numbers of annotated images (i.e., images labeled with what they actually contain, also called ground truth). In our recent field studies on passive dietary monitoring in Ghana, we collected over 250k in-the-wild images (i.e., images captured outside a laboratory) to facilitate accurate measurement of individual food and nutrient intake in low- and middle-income countries with passive monitoring camera technologies. The current dataset involves 20 households (74 subjects) from both rural and urban regions of Ghana and was collected with two different types of wearable cameras. Wearable cameras continuously capture subjects' activities, which yields massive amounts of data that must be cleaned and annotated before analysis. To ease the data post-processing and annotation tasks, we developed a novel self-supervised learning framework to cluster the large volume of egocentric images into separate events. Each event consists of a sequence of temporally continuous and contextually similar images. With images clustered into separate events, annotators and dietitians can examine and analyze the data more efficiently, which facilitates the subsequent dietary assessment processes. Validated on a held-out test set with ground truth labels, the proposed framework outperforms baselines in terms of clustering quality and classification accuracy.
Technical Abstract: In our recent field studies on passive dietary monitoring in Ghana, we have collected over 250k in-the-wild images. The dataset is an ongoing effort to facilitate accurate measurement of individual food and nutrient intake in low- and middle-income countries with passive monitoring camera technologies. The current dataset involves 20 households (74 subjects) from both rural and urban regions of Ghana, and two different types of wearable cameras were used in the studies. Once initiated, wearable cameras continuously capture subjects' activities, which yields massive amounts of data that must be cleaned and annotated before analysis. To ease the data post-processing and annotation tasks, we propose a novel self-supervised learning framework to cluster the large volume of egocentric images into separate events. Each event consists of a sequence of temporally continuous and contextually similar images. With images clustered into separate events, annotators and dietitians can examine and analyze the data more efficiently, which facilitates the subsequent dietary assessment processes. Validated on a held-out test set with ground truth labels, the proposed framework outperforms baselines in terms of clustering quality and classification accuracy.
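The notion of an "event" as a run of temporally continuous, contextually similar images can be illustrated with a small sketch. This is not the self-supervised framework from the paper, whose details are not given in this record. It is a simple stand-in that assumes each image has already been turned into a feature vector by some model, and that a new event begins whenever the cosine similarity between consecutive frames drops below a threshold. The names `cosine_similarity` and `segment_events`, the threshold of 0.8, and the toy two-dimensional vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_events(embeddings, threshold=0.8):
    """Split a time-ordered list of feature vectors into events.

    Returns a list of (start, end) index pairs with `end` exclusive.
    A new event starts wherever consecutive frames are less similar
    than `threshold` (an assumed, illustrative cut-off).
    """
    if not embeddings:
        return []
    events = []
    start = 0
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            events.append((start, i))  # close the current event
            start = i                  # frame i opens the next one
    events.append((start, len(embeddings)))
    return events


# Toy, made-up features: two frames of one scene, then three of another.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]]
events = segment_events(frames)
print(events)
```

On the toy input, the only large drop in similarity falls between the second and third frames, so the sequence splits into two events. A thresholded pass like this keeps each event contiguous in time by construction, which is the property that lets an annotator review one event at a time instead of individual frames.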