Multimodality information fusion for automated machine translation

Li Lin, Tayir Turghun, Han Yifeng, Tao Xiaohui, Velasquez Juan D

Keywords: semi-supervised learning, multimodal fusion, machine translation, multimodal alignment


Machine translation is a popular automated approach for translating text between languages. Although it has traditionally focused on natural language alone, images can provide an additional source of information. However, two challenges remain: (i) the lack of an effective fusion method to handle the triangular mapping between image, text, and semantic knowledge; and (ii) the limited availability of large-scale parallel corpora for training a model that generates accurate translations. To address these challenges, this work proposes a multimodal information fusion method for automated machine translation based on semi-supervised learning. The method fuses information from two modalities, text and images. Specifically, it fuses the modalities with alignment in a multimodal attention network, which accurately maps text and image features to their semantic information. Moreover, semi-supervised learning is used because it needs only a small parallel corpus for supervised training on top of unsupervised training. Experimental results on the Multi30k dataset show promising performance for the proposed fusion method compared with state-of-the-art approaches.

More information

Journal title: INFORMATION FUSION
Volume: 91
Publisher: Elsevier
Publication date: 2023
Start page: 352
End page: 363
Language: English