An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition

Aguilera, Ana; Mellado, Diego; Rojas, Felipe

Keywords: deep learning, multimodal emotion recognition, in-the-wild datasets

Abstract

Multimodal emotion recognition draws on different resources and techniques to identify and recognize human emotions. Data from a variety of sources, such as faces, speech, voice, text and others, have to be processed simultaneously for this task. However, most of the techniques, which are based mainly on Deep Learning, are trained on datasets designed and built under controlled conditions, which makes them harder to apply in real contexts with real conditions. For this reason, the aim of this work is to assess a set of in-the-wild datasets and show their strengths and weaknesses for multimodal emotion recognition. Four in-the-wild datasets are evaluated: AFEW, SFEW, MELD and AffWild2. A previously designed multimodal architecture is used to perform the evaluation, and classical metrics such as accuracy and F1-Score are used to measure performance in training and to validate quantitative results. The strengths and weaknesses of these datasets for various uses indicate that, by themselves, they are not appropriate for multimodal recognition because they were originally built for other purposes, e.g., face or speech recognition. Therefore, we recommend combining multiple datasets in order to obtain better results when new samples are processed, together with a good balance in the number of samples per class.
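
As an illustration of the metrics named in the abstract, the following is a minimal sketch of how accuracy and F1-Score can be computed for a multi-class emotion classifier, and of why class balance matters. The labels and predictions are hypothetical toy data, not results from the paper, and the macro averaging of F1 is an assumption, since the abstract does not state which averaging was used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 per class: 2*TP / (2*TP + FP + FN), or 0.0 if the class never appears."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical, deliberately imbalanced toy labels (not data from the paper).
y_true = ["happy", "happy", "happy", "happy", "sad", "angry"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["happy"] * len(y_true)

class_counts = Counter(y_true)          # samples per class
acc = accuracy(y_true, y_pred)          # 4 of 6 correct
f1 = macro_f1(y_true, y_pred)           # only "happy" has a non-zero F1

print(class_counts)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

On this toy data the majority-class predictor reaches an accuracy of about 0.667 while its macro F1 is only about 0.267, which is one reason a balanced number of samples per class, and a metric beyond accuracy, are relevant when comparing datasets.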

More information

Journal title: SENSORS
Volume: 23(11)
Number: 5184
Publisher: MDPI
First page: 1
Last page: 27
Language: English
Funding/Sponsor: Universidad de Valparaíso
URL: https://www.mdpi.com/1424-8220/23/11/5184
DOI: 10.3390/s23115184

Notes: WOS