Annotate-Sample-Average (ASA): A New Distant Supervision Approach for Twitter Sentiment Analysis

Bravo-Marquez, Felipe; Frank, Eibe; Pfahringer, Bernhard; Kaminka, GA; Fox, M; Bouquet, P; Hullermeier, E; Dignum, V; Dignum, F; VanHarmelen, F

Abstract

The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained on manually annotated examples. A drawback of these approaches is the high cost of data annotation. Two freely available resources can be exploited to address this problem: 1) large amounts of unlabelled tweets obtained from the Twitter API and 2) prior lexical knowledge in the form of opinion lexicons. In this paper, we propose Annotate-Sample-Average (ASA), a distant supervision method that uses these two resources to generate synthetic training data for Twitter polarity classification. Positive and negative training instances are generated by sampling and averaging unlabelled tweets containing words with the corresponding polarity. The polarity of words is determined from a given polarity lexicon. Our experimental results show that the training data generated by ASA (after tuning its parameters) produces a classifier that performs significantly better than both a classifier trained on tweets annotated with emoticons and a classifier trained, without any sampling and averaging, on tweets annotated according to the polarity of their words.
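The three steps named in the abstract (annotate, sample, average) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words representation, sampling with replacement, the rule that a tweet containing words of both polarities joins both pools, and the names `annotate`, `average`, `asa`, `a`, and `m` are all assumptions made for illustration.

```python
import random
from collections import Counter


def annotate(tweets, lexicon):
    """Group tokenised tweets into pools by the polarity of their lexicon words.

    Assumption: a tweet with words of both polarities is added to both pools;
    a tweet with no lexicon words is discarded.
    """
    pools = {"positive": [], "negative": []}
    for tokens in tweets:
        polarities = {lexicon[t] for t in tokens if t in lexicon}
        for polarity in polarities:
            pools[polarity].append(tokens)
    return pools


def average(sample):
    """Average the bag-of-words count vectors of the sampled tweets."""
    total = Counter()
    for tokens in sample:
        total.update(tokens)
    return {word: count / len(sample) for word, count in total.items()}


def asa(tweets, lexicon, a=3, m=4, seed=0):
    """Generate m synthetic instances per class, each averaging a sampled tweets.

    `a` and `m` are hypothetical parameter names chosen for this sketch.
    """
    rng = random.Random(seed)
    pools = annotate(tweets, lexicon)
    data = []
    for label, pool in pools.items():
        if not pool:
            continue  # no tweets matched this polarity
        for _ in range(m):
            # Assumption: tweets are drawn with replacement from the pool.
            sample = [rng.choice(pool) for _ in range(a)]
            data.append((average(sample), label))
    return data


# Toy usage with a hypothetical four-word lexicon.
toy_lexicon = {"good": "positive", "great": "positive",
               "bad": "negative", "awful": "negative"}
toy_tweets = [["good", "day"], ["bad", "day"], ["great", "good"], ["awful"]]
training_data = asa(toy_tweets, toy_lexicon, a=2, m=3)
```

Each element of `training_data` is a (feature dictionary, label) pair that could be passed to any standard classifier. Averaging several tweets per instance is what distinguishes this from labelling individual tweets directly by the polarity of their words.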

More information

Title according to WOS: ID WOS:000385793700059 Not found in local WOS DB
Journal title: ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE OF THE CATALAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE
Volume: 285
Publisher: IOS Press
Publication date: 2016
Start page: 498
End page: 506
DOI: 10.3233/978-1-61499-672-9-498

Notes: ISI