
A hybrid approach for stock trend prediction based on tweets embedding and historical prices

World Wide Web

Abstract

Recent advances in data mining and natural language processing have made it possible to probe the relationship between social media and stock market volatility, and the integration of natural language processing, deep learning and finance has become an irresistible trend. This paper proposes a hybrid approach for stock market prediction based on tweet embeddings and historical prices. Unlike traditional text embedding methods, our approach takes into account both the internal semantic features and the external structural characteristics of Twitter data, so that the generated tweet vectors carry more useful information. Specifically, we develop a Tweet Node algorithm that describes potential connections within Twitter data by constructing a tweet node network. Our model further supplements the tweet representations with emotional attributes, and these representations are fed, together with historical stock prices, into an attention-based deep learning model. In addition, we design an interactive visual stock prediction tool to display the prediction results.
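The data flow described in the abstract (tweet vectors enriched with structural and emotional information, then combined with historical prices through attention) can be illustrated with a minimal, hypothetical sketch. It is not the authors' Tweet Node algorithm or their network. As assumptions: each tweet already has a semantic embedding; the tweet node network is given as an edge list (for example, tweets linked by a shared cashtag); the structural signal is approximated by a single neighbour-averaging step; the emotional attribute is one sentiment score per tweet; and the attention layer is plain dot-product attention against a fixed query vector instead of learned parameters. All function names (`smooth_with_neighbors`, `attention_pool`, `daily_input`) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def smooth_with_neighbors(vectors: Sequence[Sequence[float]],
                          edges: Sequence[Tuple[int, int]],
                          alpha: float = 0.5) -> List[Vector]:
    """Mix each tweet vector with the mean of its graph neighbours (one propagation step).

    Hypothetical stand-in for a structural tweet embedding; isolated tweets keep their own vector.
    """
    neighbours = {i: [] for i in range(len(vectors))}
    for a, b in edges:  # undirected edges between tweet indices
        neighbours[a].append(b)
        neighbours[b].append(a)
    out: List[Vector] = []
    for i, v in enumerate(vectors):
        nbrs = neighbours[i]
        if not nbrs:
            out.append(list(v))
            continue
        mean = [sum(vectors[j][d] for j in nbrs) / len(nbrs) for d in range(len(v))]
        out.append([alpha * v[d] + (1 - alpha) * mean[d] for d in range(len(v))])
    return out


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tweets: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[Vector, Vector]:
    """Dot-product attention: weight each tweet by its score against a query vector."""
    scores = [sum(t_d * q_d for t_d, q_d in zip(t, query)) for t in tweets]
    weights = softmax(scores)
    pooled = [sum(w * t[d] for w, t in zip(weights, tweets)) for d in range(len(tweets[0]))]
    return pooled, weights


def daily_input(embeddings, edges, sentiments, query, closes):
    """Build one day's input: pooled tweet vector followed by log returns of closing prices."""
    structural = smooth_with_neighbors(embeddings, edges)
    # Append the assumed per-tweet sentiment score as an extra "emotional" feature.
    tweets = [vec + [s] for vec, s in zip(structural, sentiments)]
    pooled, weights = attention_pool(tweets, query)
    returns = [math.log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]
    return pooled + returns, weights


if __name__ == "__main__":
    embeddings = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.9]]
    edges = [(0, 1)]             # e.g. tweets 0 and 1 mention the same cashtag
    sentiments = [0.8, -0.4, 0.1]
    query = [1.0, 0.5, 1.0]      # fixed here; a trained model would learn it
    closes = [100.0, 101.5, 100.8, 102.0]
    x, w = daily_input(embeddings, edges, sentiments, query, closes)
    print(len(x), [round(a, 3) for a in w])
```

Running the script prints the input length (6: two embedding dimensions, one sentiment value and three log returns) followed by each tweet's attention weight. In a trained model the query vector and `alpha` would be learned and `x` would feed a classifier for the up/down trend; they are fixed here only so the data flow is easy to trace.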


Figs. 1-6 (images not included)



Acknowledgements

This work is partially supported by the Guangdong Basic and Applied Basic Research Foundation (grant no. 2019B1515120048).

Author information

Corresponding author

Correspondence to Peng Cheng.

Additional information


Huihui Ni and Shuting Wang are joint first authors and contributed equally to this work.


About this article


Cite this article

Ni, H., Wang, S. & Cheng, P. A hybrid approach for stock trend prediction based on tweets embedding and historical prices. World Wide Web 24, 849–868 (2021). https://doi.org/10.1007/s11280-021-00880-9



  • DOI: https://doi.org/10.1007/s11280-021-00880-9
