Enhancing Intra-modal Similarity in a Cross-Modal Triplet Loss

Mallea, Mario; Nanculef, Ricardo; Araya, Mauricio; Bifet, A; Lorena, AC; Ribeiro, RP; Gama, J; Abreu, PH

Abstract

Cross-modal retrieval requires building a common latent space that captures and correlates information from different data modalities, usually images and texts. Cross-modal training based on the triplet loss with hard negative mining is a state-of-the-art technique for this problem. This paper shows that such an approach is not always effective in handling intra-modal similarities. Specifically, we found that this method can lead to inconsistent similarity orderings in the latent space, where intra-modal pairs with unknown ground-truth similarity are ranked higher than cross-modal pairs representing the same concept. To address this problem, we propose two novel loss functions that leverage intra-modal similarity constraints that are available in a training triplet but not used by the original formulation. Additionally, this paper explores the application of this framework to unsupervised image retrieval problems, where cross-modal training can provide the supervisory signals that are otherwise missing in the absence of category labels. To the best of our knowledge, we are the first to evaluate cross-modal training for intra-modal retrieval without labels. We present comprehensive experiments on MS-COCO and Flickr30K, demonstrating the advantages and limitations of the proposed methods in cross-modal and intra-modal retrieval tasks in terms of performance and novelty measures. Our code is publicly available on GitHub at https://github.com/MariodotR/FullHN.git. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
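
The abstract does not give the formula for the baseline it builds on. As a rough illustration only, the sketch below shows one widely used form of a cross-modal triplet loss with in-batch hardest negatives, written with NumPy. It is not the authors' code and does not implement the two loss functions proposed in the paper; the function name, the cosine similarity, the summed reduction, and the margin value of 0.2 are all assumptions made for this example.

```python
import numpy as np

def hardest_negative_triplet_loss(img, txt, margin=0.2):
    """Illustrative cross-modal triplet loss with in-batch hardest negatives.

    img, txt: arrays of shape (n, d); row i of img is assumed to match
    row i of txt. The margin default is an assumption, not a value
    taken from the paper.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)

    sim = img @ txt.T      # sim[i, j] = similarity of image i and text j
    pos = np.diag(sim)     # similarities of the matching pairs

    # Mask the matching pairs so they are never selected as negatives.
    masked = np.where(np.eye(len(sim), dtype=bool), -np.inf, sim)
    hardest_txt = masked.max(axis=1)  # hardest text negative per image
    hardest_img = masked.max(axis=0)  # hardest image negative per text

    # Hinge terms for both retrieval directions, summed over the batch.
    loss_i2t = np.maximum(0.0, margin + hardest_txt - pos)
    loss_t2i = np.maximum(0.0, margin + hardest_img - pos)
    return float((loss_i2t + loss_t2i).sum())
```

Note that both hinge terms compare only image-to-text similarities. Image-to-image and text-to-text similarities never enter this loss, which is the kind of unused intra-modal constraint the abstract refers to.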

More information

Title according to WOS: ID WOS:001455440700017 (not found in local WOS DB)
Title according to SCOPUS: Enhancing Intra-modal Similarity in a Cross-Modal Triplet Loss
Journal title: Lecture Notes in Computer Science
Publisher: Springer Science and Business Media Deutschland GmbH
Publication date: 2023
First page: 249
Last page: 264
Language: English
DOI: 10.1007/978-3-031-45275-8_17

Notes: ISI, SCOPUS