A Proposal for Explainable Fruit Quality Recognition Using Multimodal Models

Nuñez, F; Peralta, B; Nicolis, O; Caro, L; Mora, M

Keywords: artificial vision, CLIP, multimodal classification

Abstract

The fruit industry in Chile has achieved global recognition for its productivity and leadership in fruit exports, being the main exporter in the Southern Hemisphere, especially of cherries, grapes, and blueberries. Agricultural automation is a growing trend aimed at reducing laborious work and the consumption of time and personnel. Advances in artificial intelligence are enabling the automation of various processes, such as fruit categorization, though there are still gaps in the precision of classifying fruit as being in good or bad condition, particularly when considering specialized multimodal models. This work addresses this gap by combining convolutional neural network models and the multimodal CLIP technique, evaluating the effectiveness of convolutional architectures such as ResNet50, Xception, and MobileNet. The experiments show interesting results among different architectures, with the ViT-B/16 model standing out for its higher precision in this task. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

More information

Title according to WOS: A Proposal for Explainable Fruit Quality Recognition Using Multimodal Models
Title according to SCOPUS: A Proposal for Explainable Fruit Quality Recognition Using Multimodal Models
Journal title: Lecture Notes in Computer Science
Publisher: Springer Science and Business Media Deutschland GmbH
Publication date: 2025
Start page: 118
End page: 132
Language: English
DOI: 10.1007/978-3-031-76607-7_9

Notes: ISI, SCOPUS