Elsevier

Expert Systems with Applications

Volume 42, Issue 24, 30 December 2015, Pages 9603-9611

Sentiment analysis on social media for stock movement prediction

https://doi.org/10.1016/j.eswa.2015.07.052

Highlights

  • A novel method for predicting stock price movement was presented.

  • Topics and their sentiments were extracted from social media as features.

  • Two methods were proposed to capture the topic-sentiment feature.

  • Integration of the sentiments was investigated via a large-scale experiment.

  • Our model outperformed other methods in average accuracy over 18 stocks.

Abstract

The goal of this research is to build a model that predicts stock price movement using sentiment from social media. Unlike previous approaches, which consider only overall moods or sentiments, our model incorporates the sentiments of specific topics about the company. Topics and related sentiments are automatically extracted from the texts in a message board by using our proposed method as well as existing topic models. In addition, this paper evaluates the effectiveness of sentiment analysis in the stock prediction task via a large-scale experiment. In terms of average accuracy over 18 stocks across one year of transactions, our method achieved 2.07% better performance than the model using historical prices only. Furthermore, when comparing the methods only on the stocks that are difficult to predict, our method achieved 9.83% better accuracy than the historical price method, and 3.03% better than the human sentiment method.

Introduction

Stock price forecasting is very important in the planning of business activity. However, building an accurate stock prediction model is still a challenging problem. In addition to historical prices, the current stock market is affected by the mood of society. The overall social mood with respect to a given company might be one of the important variables which affect the stock price of that company. Nowadays, the emergence of online social networks makes available large amounts of mood data. Therefore, incorporating information from social media with the historical prices can improve the predictive ability of models.

The goal of our research is to develop a model to predict stock price movement (whether the price will go up or down) using information from social media (a message board). In our proposed method, a model that predicts the stock value at t using features derived from information at t-1 and t-2, where t stands for a transaction date, is trained by supervised machine learning. Apart from mood information, stock prices are affected by many factors, such as microeconomic and macroeconomic factors. However, this research focuses only on how mood information from social media can be used to predict the stock price. We mainly aim at extracting the mood information by sentiment analysis on social media data. These sentiments are then integrated into a model to predict stocks. To achieve this goal, discovering the topics and sentiments in a large amount of social media data is very important for capturing the opinions of investors. However, sentiment analysis on social media is difficult: the text is usually short and contains many misspellings, uncommon grammatical constructions and so on. In addition, the literature shows conflicting results on sentiment analysis for stock market prediction. Some researchers report that sentiments from social media have no predictive capabilities (Antweiler & Frank, 2004; Tumarkin & Whitelaw, 2001), while other researchers have reported either weak or strong predictive capabilities (Bollen, Mao, & Zeng, 2011). Therefore, how to use opinions in social media for stock price prediction is still an open problem.
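The lag structure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each transaction date carries one closing price and one aggregate sentiment score; the function name `lagged_examples`, the choice of four features, and the toy values are hypothetical and are not taken from the paper.

```python
def lagged_examples(prices, sentiments, labels):
    """Pair the movement label at date t with features from t-1 and t-2.

    All three lists are aligned by transaction-date index. The feature
    layout (two lagged prices, two lagged sentiment scores) is an
    illustrative assumption, not the paper's exact feature set.
    """
    examples = []
    # The first two dates have no complete history, so start at index 2.
    for t in range(2, len(labels)):
        features = [
            prices[t - 1], prices[t - 2],          # lagged prices
            sentiments[t - 1], sentiments[t - 2],  # lagged sentiment scores
        ]
        examples.append((features, labels[t]))
    return examples


# Hypothetical toy data for four consecutive transaction dates.
prices = [10.0, 10.5, 10.2, 10.9]
sentiments = [0.1, 0.4, -0.2, 0.3]
labels = [None, "up", "down", "up"]  # movement relative to the previous date

examples = lagged_examples(prices, sentiments, labels)
```

Each resulting pair can then be passed to any supervised classifier; only information available before date t enters the feature vector.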

One contribution of this paper is that we propose a novel feature, ‘topic-sentiment’, to improve the performance of stock market prediction. It is important to recognize what topics are discussed in social media and how people feel about these topics. The ‘topic-sentiment’ feature, which represents the sentiments of the specific topics of the company (product, service, dividend and so on), is used for prediction of stock price movement. This feature is obtained in two ways: by using the existing topic model called the joint sentiment/topic model (JST) and by our own proposed method. The topics and sentiments extracted by the former method are hidden (latent), whereas those extracted by the latter are not. To the best of our knowledge, this is the first research trying to extract topics and sentiments simultaneously and utilize them for stock market prediction. Another contribution is a large-scale evaluation. The effectiveness of the sentiments in social media for stock market prediction is still uncertain because relatively small datasets were used for evaluation in previous work. This paper investigates whether the sentiments in social media are really useful on test data containing many stocks and transaction dates.

The rest of the paper is organized as follows. Section 2 introduces some previous approaches on sentiment analysis for stock prediction. Section 3 describes our dataset. Section 4 describes our proposed method. We also propose a novel feature for stock prediction based on the topics and the sentiments associated with them. Section 5 assesses the results of the experiments. Finally, Section 6 concludes our contribution.

Related work

Stock market prediction is one of the most attractive topics in academia as well as in real-life business. Many studies have tried to address the question of whether the stock market can be predicted. Some of these studies were based on the random walk theory and the Efficient Market Hypothesis (EMH). According to the EMH (Fama, 1991; Fama, Fisher, Jensen, & Roll, 1969), the current stock market fully reflects all available information. Hence, price changes are merely due to new information or news.

Dataset

We used two datasets for our stock prediction model. The first one is the historical price dataset, and the second one is the mood information dataset.

Methods for stock movement prediction

The Support Vector Machine (SVM) has long been recognized as being able to efficiently handle high-dimensional data and has been shown to perform well on classification (Joachims, 1998; Nguyen & Shirai, 2013). Therefore, we chose the SVM with the linear kernel as the prediction model. To assess the effectiveness of sentiment analysis on the message boards, six sets of features are designed. The first one used only the historical prices. The other methods incorporated the mood information into
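As a rough illustration of a linear-kernel SVM classifier, the sketch below trains a primal linear SVM by sub-gradient descent on the L2-regularised hinge loss, using only the standard library. This is not the authors' implementation or toolkit; the function names, hyperparameters (`lam`, `lr`, `epochs`), and the two-feature toy data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Train a primal linear SVM with sub-gradient descent (illustrative).

    X is a list of feature vectors and y holds labels in {-1, +1}.
    Minimises lam/2 * ||w||^2 + hinge loss, one sample at a time.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient step on the regulariser shrinks the weights.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # The sample violates the margin: step along the hinge sub-gradient.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (up) or -1 (down) from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical features per example: [price change, sentiment score].
X = [[1.0, 0.8], [0.5, 0.6], [0.9, 0.2],
     [-1.0, -0.7], [-0.4, -0.9], [-0.8, -0.1]]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
train_predictions = [predict(w, b, xi) for xi in X]
```

In practice an established SVM library would normally be preferred; the hand-written trainer is used here only to keep the example self-contained.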

Experiment setup

We divided the time series into two parts: the period from July 23, 2012 to March 28, 2013 for training, containing 171 transaction dates, and the period from April 01, 2013 to July 19, 2013 for testing, containing 78 transaction dates. We assigned each transaction date a label (up, down) by comparing its price with the previous transaction
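The labelling and chronological split can be sketched as follows. The sketch assumes one closing price per transaction date and treats an unchanged price as "down"; that tie rule, the function names, and the toy prices are assumptions made for illustration and are not stated in the text above.

```python
from datetime import date


def label_movements(prices):
    """Label each date 'up' or 'down' against the previous transaction date.

    `prices` is a list of (date, close) pairs sorted by date. The first
    date has no predecessor and therefore receives no label. Treating an
    unchanged price as 'down' is an assumption of this sketch.
    """
    labeled = []
    for (_, prev_close), (day, close) in zip(prices, prices[1:]):
        labeled.append((day, "up" if close > prev_close else "down"))
    return labeled


def chronological_split(labeled, first_test_day):
    """Split by date so that every training example precedes the test period."""
    train = [item for item in labeled if item[0] < first_test_day]
    test = [item for item in labeled if item[0] >= first_test_day]
    return train, test


# Hypothetical closing prices around the train/test boundary.
prices = [
    (date(2013, 3, 26), 10.0),
    (date(2013, 3, 27), 10.5),
    (date(2013, 3, 28), 10.2),
    (date(2013, 4, 1), 10.2),
    (date(2013, 4, 2), 10.9),
]

labeled = label_movements(prices)
train, test = chronological_split(labeled, date(2013, 4, 1))
```

Splitting by date rather than at random keeps future information out of the training set, which matters for any time-series evaluation.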

Conclusion & future work

Stock price prediction is a challenging task because stock prices are affected by many factors. This paper presents a novel method to integrate the sentiments in social media for the prediction of stock price movement. The contributions of this study can be summarized as follows. First, while previous research considers the overall sentiments in the documents, this research proposes a method that uses the sentiment of each topic for stock market prediction. Second, we proposed two

References (34)

  • Blei, D.M. et al. Latent Dirichlet allocation. Journal of Machine Learning Research (2003)

  • Dermouche, M. et al. A joint model for topic-sentiment modeling from text. ACM/SIGAPP Symposium on Applied Computing (SAC) (2015)

  • Fama, E.F. Efficient capital markets: II. The Journal of Finance (1991)

  • Fama, E.F. et al. The adjustment of stock prices to new information. International Economic Review (1969)

  • Jo, Y. et al. Aspect and sentiment unification model for online review analysis. Proceedings of the fourth ACM international conference on web search and data mining (2011)

  • Joachims, T. Text categorization with support vector machines: learning with many relevant features (1998)

  • Lakkaraju, H. et al. Exploiting coherence for the simultaneous discovery of latent facets and associated sentiments. Proceedings of the eleventh SIAM international conference on data mining (2011)