Talk: “Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis”
This talk addresses the label sparsity problem for Twitter polarity classification by automatically building two types of resources that can be exploited when labelled data is scarce: opinion lexicons, which are lists of words labelled by sentiment, and synthetically labelled tweets.
We build Twitter-specific opinion lexicons by training word-level classifiers using representations that exploit different sources of information, such as (a) the morphological information conveyed by part-of-speech (POS) tags, (b) associations between words and the sentiment expressed in the tweets that contain them, and (c) distributional representations calculated from unlabelled tweets. Experimental results show that the generated lexicons produce significant improvements over existing manually annotated lexicons for tweet-level polarity classification.
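As a rough illustration of source (b), one simple way to quantify the association between a word and tweet sentiment is a PMI-style log-ratio of smoothed conditional probabilities. This is only a sketch under assumptions: the function name `semantic_orientation`, the `"pos"`/`"neg"` labels, and the additive smoothing are hypothetical choices, and the features actually used in the talk may differ.

```python
import math
from collections import Counter

def semantic_orientation(labelled_tweets, smoothing=1.0):
    """Score each word by log2( P(word | pos) / P(word | neg) ).

    labelled_tweets: list of (text, label) pairs with label "pos" or "neg".
    Additive smoothing (an assumption of this sketch) avoids division by zero.
    """
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in labelled_tweets:
        # Count each word at most once per tweet (document frequency).
        words = set(text.lower().split())
        if label == "pos":
            pos_counts.update(words)
            n_pos += 1
        else:
            neg_counts.update(words)
            n_neg += 1

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        p_pos = (pos_counts[word] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_counts[word] + smoothing) / (n_neg + 2 * smoothing)
        scores[word] = math.log2(p_pos / p_neg)
    return scores

# Toy data, invented for illustration only.
toy = [("love it", "pos"), ("love this", "pos"),
       ("hate it", "neg"), ("hate this", "neg")]
scores = semantic_orientation(toy)
```

A score of this kind could serve as one feature of a word-level classifier, alongside POS tags and distributional representations.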
In the second part, we develop distant supervision methods for generating synthetic training data for Twitter polarity classification by exploiting unlabelled tweets and prior lexical knowledge. Positive and negative training instances are generated by averaging unlabelled tweets annotated according to a given polarity lexicon. We study different mechanisms for selecting the candidate tweets to be averaged. Our experimental results show that the training data generated by the proposed models produce classifiers that perform significantly better than classifiers trained from tweets annotated with emoticons, a popular distant supervision approach for Twitter sentiment analysis.
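The averaging idea can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the toy lexicon, the bag-of-words representation, the uniform random selection of candidate tweets, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical toy lexicon; real opinion lexicons are far larger.
LEXICON = {"good": 1, "great": 1, "love": 1,
           "bad": -1, "awful": -1, "hate": -1}

def lexicon_label(tweet):
    """Return +1, -1, or 0 from the summed polarity of the lexicon words."""
    score = sum(LEXICON.get(tok, 0) for tok in tweet.lower().split())
    return (score > 0) - (score < 0)

def bag_of_words(tweet):
    """Represent a tweet as a sparse vector of token counts."""
    return Counter(tweet.lower().split())

def average_vectors(vectors):
    """Average a list of sparse count vectors into one sparse vector."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {tok: count / len(vectors) for tok, count in total.items()}

def synthetic_instances(tweets, label, k, n_instances, seed=0):
    """Build n_instances training vectors, each the average of k unlabelled
    tweets that the lexicon annotates with the requested polarity.
    Candidates are drawn uniformly at random (one possible selection rule)."""
    rng = random.Random(seed)
    pool = [bag_of_words(t) for t in tweets if lexicon_label(t) == label]
    if len(pool) < k:
        return []
    return [(average_vectors(rng.sample(pool, k)), label)
            for _ in range(n_instances)]

# Toy unlabelled tweets, invented for illustration only.
tweets = ["I love this great phone", "what a good day",
          "I hate this awful traffic", "bad service again",
          "going to the store"]
positives = synthetic_instances(tweets, label=1, k=2, n_instances=3)
negatives = synthetic_instances(tweets, label=-1, k=2, n_instances=3)
```

The resulting `(vector, label)` pairs could then be fed to any standard classifier. Swapping the uniform sampling for another rule is where different candidate-selection mechanisms would plug in.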
About the Speaker:
Felipe Bravo-Marquez is currently a PhD student in the Machine Learning Group at the University of Waikato, New Zealand. He holds two engineering degrees, in computer science and industrial engineering, and a master’s degree in computer science, all from the University of Chile. He worked for three years as a research engineer at Yahoo! Labs Latin America. His main areas of interest are data mining, natural language processing, information retrieval, and sentiment analysis.
You can find his full list of publications at: http://www.cs.waikato.ac.nz/~fjb11/