GRaNN: feature selection with golden ratio-aided neural network for emotion, gender and speaker identification from voice signals

Garain, Avishek; Ray, Biswarup; Giampaolo, Fabio; Velasquez, Juan D.; Singh, Pawan Kumar; Sarkar, Ram

Abstract

Compared to other features of the human body, voice is complex and dynamic, in the sense that speech can be delivered in various languages, with different accents and in different emotional states. Recognizing gender (male or female) from an individual's voice is, by all accounts, a trivial task for human beings. The same goes for speaker identification when we have been well acquainted with the speaker for a long time. Our ears function as the front end, receiving the sound signals that our brain processes to reach a decision. Although trivial for us, this is a challenging task for a computing device to mimic. Automatic gender, emotion and speaker identification systems have many applications in surveillance, multimedia technology, robotics and social media. In this paper, we propose a Golden Ratio-aided Neural Network (GRaNN) architecture for these purposes. As deciding the number of units for each layer in a deep NN is a challenging issue, we have done this using the concept of the Golden Ratio. Prior to that, an optimal subset of features is selected from a feature vector, common to all three tasks, that is extracted from spectral images obtained from the input voice signals. We have used a wrapper-filter framework in which the features selected by minimum redundancy maximum relevance are fed to the Mayfly algorithm combined with the adaptive β-hill climbing (AβHC) algorithm. Our model achieves accuracies of 99.306% and 95.68% for gender identification on the RAVDESS and Voice Gender datasets, respectively, 95.27% for emotion identification on the RAVDESS dataset and 67.172% for speaker identification on the RAVDESS dataset. Performance comparison of this model with existing models on the publicly available datasets confirms its superiority over those models. The results also confirm that the common feature set, chosen meticulously, works equally well on three different pattern classification tasks.
The proposed wrapper-filter framework reduces the feature dimension significantly, thereby lessening the storage requirement and training time. Finally, strategically selecting the number of units in each layer of the NN helps increase the overall performance on all three pattern classification tasks.
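The abstract does not state how the Golden Ratio is turned into layer widths, so the following is only a minimal sketch of one plausible reading: each hidden layer is narrower than the previous one by a factor of the golden ratio. The function name `golden_ratio_layer_sizes`, its parameters and the stopping rule are hypothetical and are not taken from the paper.

```python
import math

# Golden ratio, approximately 1.618.
PHI = (1 + math.sqrt(5)) / 2


def golden_ratio_layer_sizes(n_inputs, n_outputs, max_layers=10):
    """Return hidden-layer widths that shrink by a factor of PHI per layer.

    Hypothetical rule for illustration only; the paper's actual sizing
    scheme is not described in the abstract.
    """
    if n_inputs <= 0 or n_outputs <= 0:
        raise ValueError("n_inputs and n_outputs must be positive")
    sizes = []
    width = n_inputs / PHI
    # Stop once a layer would be no wider than the output layer,
    # or once the (assumed) cap on the number of hidden layers is reached.
    while len(sizes) < max_layers and round(width) > n_outputs:
        sizes.append(round(width))
        width /= PHI
    return sizes


if __name__ == "__main__":
    # Example: 100 selected features, 8 output classes.
    print(golden_ratio_layer_sizes(100, 8))
```

Under these assumptions, 100 selected input features and 8 output classes give hidden layers of 62, 38, 24, 15 and 9 units. Deriving the depth and widths from the input and output sizes in this way leaves no per-layer widths to tune by hand.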

More information

Title according to WOS: GRaNN: feature selection with golden ratio-aided neural network for emotion, gender and speaker identification from voice signals
Journal title: NEURAL COMPUTING & APPLICATIONS
Volume: 34
Issue: 17
Publisher: SPRINGER LONDON LTD
Publication date: 2022
Start page: 14463
End page: 14486
DOI: 10.1007/s00521-022-07261-x

Notes: ISI