From Unlabelled Tweets to Twitter-specific Opinion Words
Abstract
In this article, we propose a word-level classification model for automatically generating a Twitter-specific opinion lexicon from a corpus of unlabelled tweets. The tweets from the corpus are represented by two vectors: a bag-of-words vector and a semantic vector based on word-clusters. We propose a distributional representation for words by treating them as the centroids of the tweet vectors in which they appear. The lexicon generation is conducted by training a word-level classifier using these centroids to form the instance space and a seed lexicon to label the training instances. Experimental results show that the two types of tweet vectors complement each other in a statistically significant manner and that our generated lexicon produces significant improvements for tweet-level polarity classification.
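The centroid representation described in the abstract can be illustrated with a minimal sketch. It assumes only bag-of-words tweet vectors (the word-cluster vector is omitted) and swaps in a simple nearest-prototype rule with cosine similarity in place of the word-level classifier, which the abstract does not specify. All function names, the toy tweets, and the seed words are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict
import math


def word_centroids(tweets):
    """Represent each word as the mean bag-of-words vector of the tweets containing it."""
    sums = defaultdict(Counter)   # word -> summed tweet vectors
    counts = Counter()            # word -> number of tweets containing it
    for tokens in tweets:
        vec = Counter(tokens)     # sparse bag-of-words vector of this tweet
        for word in set(tokens):
            sums[word].update(vec)
            counts[word] += 1
    return {w: {t: v / counts[w] for t, v in s.items()} for w, s in sums.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Average a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for k, v in vec.items():
            total[k] += v
    return {k: v / len(vectors) for k, v in total.items()}


def expand_lexicon(tweets, seed):
    """Label every non-seed word with the polarity of its closest seed prototype.

    Assumption: a nearest-prototype rule stands in for the trained classifier.
    """
    centroids = word_centroids(tweets)
    prototypes = {}
    for label in sorted(set(seed.values())):
        members = [centroids[w] for w, l in seed.items() if l == label and w in centroids]
        if members:
            prototypes[label] = mean_vector(members)
    return {
        word: max(prototypes, key=lambda l: cosine(vec, prototypes[l]))
        for word, vec in centroids.items()
        if word not in seed
    }


# Toy corpus and seed lexicon (hypothetical data).
tweets = [
    ["love", "this", "awesome", "phone"],
    ["awesome", "day", "love", "it"],
    ["hate", "this", "awful", "traffic"],
    ["awful", "day", "hate", "it"],
]
seed = {"love": "positive", "hate": "negative"}
lexicon = expand_lexicon(tweets, seed)
```

In this toy corpus, words that co-occur with a seed word end up with centroids close to that seed's prototype, so they inherit its polarity. Words that appear equally often in both contexts have no clear label under this rule.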
More information
| Title according to WOS: | ID WOS:000382307300081 Not found in local WOS DB |
| Journal title: | SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL |
| Publisher: | ASSOC COMPUTING MACHINERY |
| Publication date: | 2015 |
| First page: | 743 |
| Last page: | 746 |
| DOI: | 10.1145/2766462.2767770 |
| Notes: | ISI |