Enhancing Intra-modal Similarity in a Cross-Modal Triplet Loss
Abstract
Cross-modal retrieval requires building a common latent space that captures and correlates information from different data modalities, usually images and texts. Cross-modal training based on the triplet loss with hard negative mining is a state-of-the-art technique to address this problem. This paper shows that such an approach is not always effective in handling intra-modal similarities. Specifically, we found that this method can lead to inconsistent similarity orderings in the latent space, where intra-modal pairs with unknown ground-truth similarity are ranked higher than cross-modal pairs representing the same concept. To address this problem, we propose two novel loss functions that leverage intra-modal similarity constraints that are available in a training triplet but not used by the original formulation. Additionally, this paper explores the application of this framework to unsupervised image retrieval problems, where cross-modal training can provide the supervisory signals that are otherwise missing in the absence of category labels. To the best of our knowledge, we are the first to evaluate cross-modal training for intra-modal retrieval without labels. We present comprehensive experiments on MS-COCO and Flickr30K, demonstrating the advantages and limitations of the proposed methods in cross-modal and intra-modal retrieval tasks in terms of performance and novelty measures. Our code is publicly available on GitHub: https://github.com/MariodotR/FullHN.git.
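For context, the baseline the abstract refers to is the cross-modal triplet loss with in-batch hard negative mining (as popularized by VSE++). The following is a minimal sketch of that baseline, assuming PyTorch, L2-normalized embeddings, and an illustrative margin of 0.2; the function and variable names are ours, and this is not the paper's proposed intra-modal loss, which is described in the full text.

    # Minimal sketch of a cross-modal triplet loss with in-batch hardest negatives.
    # Assumes PyTorch; names and the margin value are illustrative, not from the paper.
    import torch

    def triplet_loss_hard_negatives(img_emb, txt_emb, margin=0.2):
        """img_emb, txt_emb: (batch, dim) L2-normalized embeddings, where row i
        of each matrix encodes the same concept (a matching image-text pair)."""
        scores = img_emb @ txt_emb.t()          # cosine similarities (batch x batch)
        diag = scores.diag().view(-1, 1)        # similarity of each positive pair

        # Mask the positives so they cannot be selected as negatives.
        mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        neg_scores = scores.masked_fill(mask, float('-inf'))

        # Hardest text negative per image (row-wise max) and
        # hardest image negative per text (column-wise max).
        hardest_txt = neg_scores.max(dim=1).values.view(-1, 1)
        hardest_img = neg_scores.max(dim=0).values.view(-1, 1)

        cost_txt = (margin + hardest_txt - diag).clamp(min=0)
        cost_img = (margin + hardest_img - diag).clamp(min=0)
        return (cost_txt + cost_img).mean()

Note that this objective only constrains cross-modal (image-text) similarities; it places no constraint on image-image or text-text pairs, which is the gap the abstract's proposed intra-modal loss functions aim to close.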
More information
Title according to SCOPUS: not available (SCOPUS_ID: 85174249848)
Journal: Lecture Notes in Computer Science
Volume: 14276 LNAI
Publisher: Springer, Cham
Publication date: 2023
First page: 249
Last page: 264
DOI: 10.1007/978-3-031-45275-8_17
Notes: SCOPUS