Sentiment Analysis Implementation

During preliminary research, I found the paper "Sentiment Analysis in Financial News" (2009), which compares the effectiveness of two techniques for predicting market value using sentiment analysis.

The first technique parses news articles and calculates a sentiment score for each document by looking up its words in the General Inquirer dictionary, which associates several thousand words with a positive or negative score. It only takes into account documents that contain at least a certain number of sentiment-related words and reach a certain minimum score. This technique makes it possible to predict market value, but its gains are often just enough to offset trading costs. It is scalable, computationally light and straightforward to implement, so it can be used to predict market value in real time.
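As a rough illustration of the dictionary lookup described above, here is a minimal Python sketch. The lexicon is a tiny hypothetical stand-in for the General Inquirer dictionary, and the function name, the thresholds `min_hits` and `min_abs_score`, and the choice to apply the minimum score to the absolute value of the sum are all assumptions made for the example, not details taken from the paper.

```python
import re

# Hypothetical toy lexicon; the real General Inquirer dictionary
# contains several thousand scored words.
TOY_LEXICON = {
    "gain": 1, "profit": 1, "strong": 1,
    "loss": -1, "weak": -1, "decline": -1,
}

def sentiment_score(text, lexicon=TOY_LEXICON, min_hits=2, min_abs_score=1):
    """Sum the lexicon scores of the words in `text`.

    Returns None when the document is filtered out, i.e. when it has
    fewer than `min_hits` sentiment words or when the absolute score
    is below `min_abs_score` (both thresholds are assumptions).
    """
    words = re.findall(r"[a-z]+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    if len(hits) < min_hits:
        return None
    score = sum(hits)
    if abs(score) < min_abs_score:
        return None
    return score
```

Because each document only needs a single pass over its words and a dictionary lookup per word, this kind of scoring is cheap enough to run on incoming news as it arrives.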

The second technique uses machine learning to learn the sentiment values of the parsed words. It has the advantage of giving much better results in predicting market value, and financial data suits the machine learning approach well. We use the parsed news as features and the market value as the label to train the classifier. Because training is computationally heavy, the implementation would retrain the classifier on past news at regular intervals (every X hours) and then run it on real-time news. We need to find or write a distributed implementation of the perceptron or support vector machine, because the training set, i.e. the past news, is considerably large. Once the classifier is trained, it shouldn't be necessary to distribute the workload, and many open-source implementations, such as LIBSVM for Java, are available.
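To make the learning step more concrete, here is a minimal single-machine perceptron sketch in Python. It is not the distributed version discussed above and not the method from the paper. The whitespace tokenizer, the bag-of-words features, the encoding of the label as +1 (price rose) or -1 (price fell), and all function names are assumptions chosen for illustration. The learned per-word weights play the role of the sentiment values.

```python
from collections import defaultdict

def tokenize(text):
    # Naive whitespace tokenizer (assumption for this sketch).
    return text.lower().split()

def train_perceptron(docs, labels, epochs=10):
    """Train a perceptron on bag-of-words features.

    docs:   list of news texts
    labels: +1 if the market value rose, -1 if it fell (assumed encoding)
    Returns (weights, bias), where weights maps each word to a learned score.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            words = tokenize(text)
            activation = bias + sum(weights[w] for w in words)
            if y * activation <= 0:
                # Misclassified: move the weights toward the correct label.
                for w in words:
                    weights[w] += y
                bias += y
    return weights, bias

def predict(weights, bias, text):
    # Words never seen during training contribute a score of 0.
    activation = bias + sum(weights.get(w, 0.0) for w in tokenize(text))
    return 1 if activation > 0 else -1

# Toy usage with made-up headlines and labels.
past_news = ["profits surge", "shares plunge", "record profits", "plunge continues"]
past_moves = [1, -1, 1, -1]
weights, bias = train_perceptron(past_news, past_moves)
```

In the setup described above, `train_perceptron` would be the part rerun every X hours on past news, while `predict` is cheap enough to be applied to each article in real time.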

 

References:

"Sentiment Analysis in Financial News" by Pablo Daniel Azar, http://people.csail.mit.edu/azar/wp-content/uploads/2011/09/thesis.pdf