A parallel approach to text data augmentation for sentiment analysis using the PoS Wise Synonym Substitution algorithm

Gutierrez-Benitez, R.; Valdés-Jiménez, A.; Segura-Navarrete, A.

Keywords: Emotion analysis; Natural Language Processing; OpenMP; Sentiment Analysis; Text Data Augmentation

Abstract

Over the last decade, social media has become a mass communication medium that gives people a tool to express their opinions. On it, people write their thoughts and feelings about many topics, generating large amounts of data that companies and researchers can analyze. Two Natural Language Processing tasks address this data: Emotion Analysis focuses on extracting the underlying emotions in a text, while Sentiment Analysis focuses on extracting its polarity. Both tasks are tackled with traditional Machine Learning and Deep Learning techniques. However, to generalize well, these techniques require large labeled datasets for training. This is a problem for researchers because labeled datasets are scarce in languages such as Spanish. To address it, Data Augmentation techniques are used to generate larger labeled datasets from a small labeled dataset. This work presents an OpenMP version, for shared-memory systems, of a Data Augmentation technique called PoS Wise Synonym Substitution, which creates new sentences by replacing some of the words of a sentence with synonyms extracted from WordNet. The parallel approach reduced the execution time compared to the original version, reaching a speedup of up to 17.5x.
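The idea described in the abstract can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the `Sentence` struct, the `SynonymTable` type (a small in-memory stand-in for a WordNet lookup keyed by word and PoS tag), and the functions `augment_sentence` and `augment_corpus` are all hypothetical names chosen for illustration. The sketch also assumes, as a simplification, that PoS tags are precomputed and that each new sentence differs from the original by a single substituted word. Because each sentence is augmented independently, the outer loop is a natural candidate for an OpenMP `parallel for`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A tokenized sentence with one PoS tag per word (tags assumed precomputed).
struct Sentence {
    std::vector<std::string> words;
    std::vector<std::string> tags;
};

// Hypothetical stand-in for a WordNet lookup: (word, PoS tag) -> synonyms.
using SynonymTable =
    std::map<std::pair<std::string, std::string>, std::vector<std::string>>;

// Build variants of one sentence by replacing one word at a time with a
// synonym registered under the same PoS tag (a simplifying assumption).
std::vector<std::string> augment_sentence(const Sentence& s,
                                          const SynonymTable& table) {
    assert(s.words.size() == s.tags.size());
    std::vector<std::string> variants;
    for (std::size_t i = 0; i < s.words.size(); ++i) {
        auto it = table.find(std::make_pair(s.words[i], s.tags[i]));
        if (it == table.end()) continue;  // no synonym for this word and tag
        for (const std::string& syn : it->second) {
            std::string out;
            for (std::size_t j = 0; j < s.words.size(); ++j) {
                if (j > 0) out += ' ';
                out += (j == i) ? syn : s.words[j];
            }
            variants.push_back(out);
        }
    }
    return variants;
}

// Augment every sentence of a corpus. Iterations are independent and each
// one writes only to its own slot of `result`, so no locking is needed.
// The table is only read, which is safe from multiple threads.
std::vector<std::vector<std::string>> augment_corpus(
    const std::vector<Sentence>& corpus, const SynonymTable& table) {
    std::vector<std::vector<std::string>> result(corpus.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(corpus.size()); ++i) {
        result[i] = augment_sentence(corpus[i], table);
    }
    return result;
}
```

Compiled with an OpenMP flag such as `-fopenmp` (GCC or Clang), the loop in `augment_corpus` is distributed across threads; without the flag the pragma is ignored and the code runs sequentially with the same result. The `schedule(dynamic)` clause is a design choice for this sketch, on the assumption that sentences vary in length and in the number of available synonyms.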

More information

Title according to SCOPUS: A parallel approach to text data augmentation for sentiment analysis using the PoS Wise Synonym Substitution algorithm
Journal title: Proceedings - International Conference of the Chilean Computer Science Society, SCCC
Publisher: IEEE Computer Society
Publication date: 2023
Language: Spanish
DOI: 10.1109/SCCC59417.2023.10315705

Notes: SCOPUS