Prediction and Trading of Dow Jones from Twitter: A Boosting Text Mining Method with Relevant Tweets Identification

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2017)

Abstract

Previous studies claim that financial news influences stock price movements almost instantaneously; however, the poor foreseeability of news limits its usefulness for predicting stock price changes and trading actions. Recently, complex sentiment analysis techniques have also shown that large amounts of social network posts can predict the price movements of the Dow Jones Industrial Average (DJIA) within a less stringent timescale. Starting from the idea that the contents of social posts can forecast future stock trading actions, in this paper we present a text mining method, simpler than the sentiment analysis approaches, which extracts predictive knowledge of DJIA movements from a large dataset of tweets and further boosts prediction accuracy by identifying and filtering out irrelevant/noisy tweets. The noise detection technique we introduce improves the initial effectiveness by more than 10%. We tested our method on 10 million Twitter posts spanning one year, achieving an accuracy of 88.9% in the Dow Jones daily predictions, which, to the best of our knowledge, improves on the best result in the literature based on social networks. Finally, we used the prediction method to drive the DJIA buy/sell actions of a trading protocol; the achieved return on investment (ROI) outperforms the state of the art.

This work was partially supported by the project “Toreador”, funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 688797. We thank NVIDIA Corporation for the donated Titan GPU used in this work.

G. Domeniconi—Contribution done during the affiliation at the University of Bologna.

Notes

  1. https://bit.ly/2x0WsVD

  2. www.cs.waikato.ac.nz/ml/weka/

  3. http://help.sentiment140.com/
Corresponding author

Correspondence to Giacomo Domeniconi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Moro, G., Pasolini, R., Domeniconi, G., Pagliarani, A., Roli, A. (2019). Prediction and Trading of Dow Jones from Twitter: A Boosting Text Mining Method with Relevant Tweets Identification. In: Fred, A., et al. Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2017. Communications in Computer and Information Science, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-15640-4_2

  • DOI: https://doi.org/10.1007/978-3-030-15640-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15639-8

  • Online ISBN: 978-3-030-15640-4

  • eBook Packages: Computer Science, Computer Science (R0)
