Alzheimer’s detection from English to Spanish using acoustic and linguistic embeddings

Pérez-Toro, Paula Andrea; Klumpp, Philipp; Hernandez, Abner; Arias-Vergara, Tomas; Lillo, Patricia; Slachevsky, Andrea; García, Adolfo M.; Schuster, Maria; Maier, Andreas K.; Noeth, Elmar; Orozco-Arroyave, Juan Rafael

Abstract

Cross-lingual approaches are growing in popularity in machine learning, where large amounts of data are required to generalize well. A major obstacle is the limited availability of clinical speech data, most of which is in English. For instance, few Alzheimer’s Disease (AD) corpora in other languages can be found in the literature. Despite the phonological and phonemic differences between Spanish and English, the two languages also share similarities, e.g., around 40% of all words in English have a related word in Spanish. In this work, we investigate the feasibility of combining information from English and Spanish to discriminate AD. Two datasets were considered: part of the Pitt Corpus, which is composed of English speakers, and a Spanish AD dataset composed of speakers from Chile. We based our analysis on well-known acoustic (Wav2Vec) and word (BERT, RoBERTa) embeddings using different classifiers. Strong language dependencies were found, even when using multilingual representations. We observed that linguistic information was more important for classifying English AD (F-score = 0.76) and acoustic information for Spanish AD (F-score = 0.80). Using knowledge transferred from English to Spanish achieved F-scores of up to 0.85 for discriminating AD.

More information

Publication date: 2022
Start/end dates: September 18 to 22, 2022
First page: 2483
Last page: 2487
Language: English
URL: https://www.isca-speech.org/archive/interspeech_2022/pereztoro22_interspeech.html
DOI: 10.21437/Interspeech.2022-10883