PUC Chile team at Caption Prediction: ResNet visual encoding and caption classification with parametric ReLU
Keywords: Convolutional neural networks; Deep learning; Image captioning; Medical artificial intelligence; Perceptual similarity
Abstract
This article describes the PUC Chile team's participation in the Caption Prediction task of the ImageCLEFmedical 2021 challenge, which the team won. We first show that a very simple approach based on statistical analysis of the captions alone, without using the images, yields a competitive baseline score. We then describe how we improved on this preliminary submission by encoding the medical images with a ResNet CNN pre-trained on ImageNet and later fine-tuned on the challenge dataset. This visual encoding serves as the input to a multi-label classification approach to caption prediction. We describe our final approach in detail and conclude by discussing some ideas for future work.
More information
| Title according to SCOPUS: | PUC Chile team at Caption Prediction: ResNet visual encoding and caption classification with parametric ReLU |
| Journal title: | CEUR Workshop Proceedings |
| Volume: | 2936 |
| Publisher: | CEUR-WS |
| Publication date: | 2021 |
| First page: | 1174 |
| Last page: | 1183 |
| Language: | English |
| Notes: | SCOPUS |