Multimodal Emotion Recognition Dataset in the Wild (MERDWild)
Abstract
Multimodal emotion recognition identifies human emotions in specific situations using artificial intelligence across multiple modalities. MERDWild, a multimodal emotion recognition dataset, addresses the challenge of unifying, cleaning, and transforming three datasets collected in uncontrolled environments, with the aim of integrating and standardizing a database that encompasses three modalities: facial images, audio, and text. A methodology is presented that combines information from these modalities, drawing on the "in-the-wild" datasets AFEW, AffWild2, and MELD. MERDWild consists of 15 873 audio samples, 905 281 facial images, and 15 321 sentences, all of usable quality. The project documents the entire process of data cleaning, transformation, normalization, and quality control, resulting in a unified structure for recognizing seven emotions.
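The unified structure described above can be illustrated with a minimal sketch. All field names, the record layout, and the quality check below are hypothetical assumptions for illustration, not taken from the MERDWild release:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical unified record joining the three modalities
# (facial image, audio, text) with one emotion label.
@dataclass
class MultimodalSample:
    sample_id: str
    source_dataset: str               # "AFEW", "AffWild2", or "MELD"
    face_image_path: Optional[str]    # path to a cropped face frame
    audio_path: Optional[str]         # path to an audio clip
    transcript: Optional[str]         # sentence for the text modality
    emotion: str                      # one of the seven emotion labels

def usable(sample: MultimodalSample) -> bool:
    """Toy quality-control filter: keep samples that carry an emotion
    label and at least one non-empty modality."""
    has_modality = any(
        [sample.face_image_path, sample.audio_path, sample.transcript]
    )
    return bool(sample.emotion) and has_modality

# Example: a MELD-style sample with audio and text but no face frame.
s = MultimodalSample("meld_0001", "MELD", None, "audio/0001.wav",
                     "I can't believe it!", "surprise")
print(usable(s))  # → True
```

In practice the dataset's own quality control is more involved; this sketch only shows how one schema can host records originating from three differently structured sources.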
More information
Title according to SCOPUS: | ID SCOPUS_ID:85189554627 |
Publisher: | IEEE |
Publication date: | 2023 |
DOI: | 10.1109/CHILECON60335.2023.10418672 |
Notes: | SCOPUS |