Location: Sugarbeet and Bean Research
Title: MetaFruit meets foundation models: Leveraging a comprehensive multi-fruit dataset for advancing agricultural foundation models
Authors:
LI, JIAJIA - Michigan State University
LAMMERS, KYLE - Michigan State University
YIN, XUNYUAN - Nanyang Technological University
YIN, XIANG - Jiaotong University
HE, LONG - Pennsylvania State University
SHENG, JUN - University of California, Riverside
Lu, Renfu
LI, ZHAOJIAN - Michigan State University
Submitted to: Computers and Electronics in Agriculture
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/2/2025
Publication Date: 1/22/2025
Citation: Li, J., Lammers, K., Yin, X., Yin, X., He, L., Sheng, J., Lu, R., Li, Z. 2025. MetaFruit meets foundation models: Leveraging a comprehensive multi-fruit dataset for advancing agricultural foundation models. Computers and Electronics in Agriculture. 231. Article 109908.
DOI: https://doi.org/10.1016/j.compag.2025.109908
Interpretive Summary: Manual fruit harvesting is still prevalent, and it is the single largest cost in apple production, accounting for approximately 15% of total production cost in the U.S. Hence, there is an urgent need for harvest automation to address the rising cost and declining availability of labor in the specialty crop industries. Machine vision-based detection of fruits is a crucial step in robotic fruit harvesting. Much recent research has addressed the development and application of artificial intelligence (AI) models, based on machine learning and deep learning, for detecting fruits in orchards. These models generally need to be trained on large image datasets with manually labeled fruit information, yet they still perform poorly in different, complex orchard environments. In this paper, we report on the creation of the largest publicly available image dataset, called MetaFruit, which comprises 4,248 images and 248,015 manually labeled instances of apples, oranges, lemons, grapefruit and tangerines, collected under different orchard and natural light conditions. We also propose an innovative open-access fruit detection system that leverages advanced Vision Foundation Models (VFMs). The VFM-based system was evaluated against other existing AI models, using MetaFruit and other publicly available fruit datasets.
The system outperformed the other existing AI models in fruit detection and showed outstanding self-learning capability, requiring only a small number of images for training. The model also showed a strong ability to interpret human instructions for subtle fruit detection tasks, such as identifying fruits that are occluded by leaves and/or branches. The new MetaFruit dataset and the VFM-based fruit detection system are expected to accelerate the research and development of robotic fruit harvesting technology, thus addressing the critical labor issue facing the specialty crop industries.
Technical Abstract: Fruit harvesting poses a significant labor and financial burden on the apple industry, requiring over 10 million worker hours each year in the U.S. alone and accounting for approximately 15% of total production cost. These challenges underscore the urgent need for automated or robotic harvesting solutions. Machine vision-based fruit detection is a crucial step for robotic harvesting of fruits. Despite significant progress in deep learning and machine learning techniques for fruit detection, these models adapt poorly to different orchard environments and/or different fruit species. These challenges are further compounded by the limited availability of pertinent datasets for model training. In this work, we have created the largest publicly available multi-class fruit dataset, called MetaFruit, which comprises 4,248 images and 248,015 manually labeled instances, collected from diverse orchards in the U.S. This study also proposes an innovative open-access fruit detection system that leverages advanced Vision Foundation Models (VFMs) and can identify different types of fruits under varying orchard conditions.
This system not only demonstrates remarkable adaptability in learning from minimal data through few-shot learning but also shows the ability to interpret human instructions for subtle detection tasks. The developed foundation model was comprehensively evaluated using several metrics and outperformed existing state-of-the-art algorithms on both our own MetaFruit dataset and other open-source fruit datasets. The open-source MetaFruit dataset and the new VFM-based detection framework are expected to foster future research in vision-based robotic fruit harvesting.
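
To give a sense of how densely labeled the dataset is, the two figures quoted in the abstract (4,248 images and 248,015 labeled instances) can be turned into an average number of labeled fruits per image. The short Python sketch below does only that arithmetic; the function and variable names are illustrative and do not come from the paper or its released code, and the per-class breakdown is not given in this record, so only the overall mean is computed.

```python
# Dataset figures as quoted in the abstract above.
NUM_IMAGES = 4_248
NUM_INSTANCES = 248_015


def mean_instances_per_image(num_instances: int, num_images: int) -> float:
    """Return the average number of labeled instances per image.

    Hypothetical helper for illustration; not part of any MetaFruit tooling.
    """
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return num_instances / num_images


density = mean_instances_per_image(NUM_INSTANCES, NUM_IMAGES)
print(f"{density:.1f} labeled instances per image on average")
```

On these figures the mean is roughly 58 labeled fruits per image, which suggests that many images contain densely clustered fruit.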