Abstract
The recent development of deep learning-based natural language processing (NLP) methods has fostered many downstream applications in various fields. As one such application in the financial industry, fine-grained financial sentiment analysis (FSA) aims to determine the sentiment orientation of financial texts, i.e., bullish or bearish, by predicting a polarity score, and it has been widely applied to stock-related opinion mining. FSA is challenging because large-scale labeled datasets are lacking and sentiment is domain-dependent. Previous works mainly focus on constructing and exploiting handcrafted lexicons that encode expert knowledge to enhance the semantic features used in decision making; this yields improvements, but such lexicons are expensive to acquire. This paper proposes a lightweight regression model that incorporates the statistical distribution of each term over the polarity range, e.g., between −1 and 1, to address the fine-grained FSA task. More concretely, we first count each word’s occurrences in different polarity intervals and produce a statistic-based representation for each text, which is then encoded into a corpus-level statistical feature vector by an autoencoder. Subsequently, the obtained feature vector is integrated with the semantic feature vector in the regression model. Our experiments show that this model produces significant improvements over the baseline models on two FSA subsets, i.e., news headlines and microblogs, without additional computational overhead. Furthermore, we observe that signs, which lexicon-based approaches have neglected, can play an important role in FSA.
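The abstract describes the statistical feature only at a high level. The following Python sketch illustrates the counting step under assumptions that the abstract does not state: five equal-width polarity intervals over [−1, 1], lowercase whitespace tokenization, normalization of each word's interval counts into a distribution, a uniform distribution for words unseen in training, and mean pooling over tokens to obtain one vector per text. All helper names (`bin_index`, `count_word_bins`, `word_distribution`, `text_statistical_vector`) are hypothetical, and this is not presented as the authors' implementation.

```python
from collections import defaultdict

NUM_BINS = 5  # assumption: five equal-width intervals over [-1, 1]


def bin_index(score: float, num_bins: int = NUM_BINS) -> int:
    """Map a polarity score in [-1, 1] to one of num_bins equal-width intervals."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity score {score} is outside [-1, 1]")
    # Shift to [0, 2], rescale to [0, num_bins], and put score == 1.0 in the last bin.
    return min(int((score + 1.0) / 2.0 * num_bins), num_bins - 1)


def tokenize(text: str) -> list:
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def count_word_bins(corpus, num_bins: int = NUM_BINS) -> dict:
    """Count how often each word occurs in texts from each polarity interval.

    corpus: iterable of (text, polarity_score) pairs from the labeled training set.
    """
    counts = defaultdict(lambda: [0] * num_bins)
    for text, score in corpus:
        b = bin_index(score, num_bins)
        for token in tokenize(text):
            counts[token][b] += 1
    return dict(counts)


def word_distribution(counts: dict, word: str, num_bins: int = NUM_BINS) -> list:
    """Normalize a word's interval counts into a distribution over polarity intervals."""
    row = counts.get(word)
    if row is None:
        # Assumption: words unseen in training get a uniform (uninformative) distribution.
        return [1.0 / num_bins] * num_bins
    total = sum(row)
    return [c / total for c in row]


def text_statistical_vector(text: str, counts: dict, num_bins: int = NUM_BINS) -> list:
    """Mean-pool the word distributions of a text into one statistic-based vector."""
    dists = [word_distribution(counts, t, num_bins) for t in tokenize(text)]
    if not dists:
        return [1.0 / num_bins] * num_bins
    return [sum(d[k] for d in dists) / len(dists) for k in range(num_bins)]


if __name__ == "__main__":
    train = [
        ("shares surge on strong earnings", 0.8),
        ("shares plunge after weak guidance", -0.7),
        ("earnings beat estimates", 0.7),
    ]
    counts = count_word_bins(train)
    print(counts["shares"])  # [1, 0, 0, 0, 1]
    print(text_statistical_vector("Shares earnings", counts))  # [0.25, 0.0, 0.0, 0.0, 0.75]
```

In the model the abstract describes, the text-level statistics are further encoded by an autoencoder and fused with semantic features inside the regression model; neither stage is shown here, and the mean pooling above is only a simple stand-in for whatever text-level representation the paper actually uses.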
Acknowledgements
The work of this paper has been supported by the Hong Kong Research Grants Council under the General Research Fund (Project No. 15200021), the Lam Woo Research Fund (Project No. LWI20011) and the Faculty Research Grant (Project No. DB21B6) of Lingnan University, Hong Kong, the One-off Special Fund from Central and Faculty Fund in Support of Research from 2019/20 to 2021/22 (Project No. MIT02/19-20) and the Research Cluster Fund (Project No. RG 78/2019-2020R) of The Education University of Hong Kong. Lau’s work was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 11507219). Dian Zhang’s work was supported by NSFC (Project No. 61872247).
Additional information
This article belongs to the Topical Collection: Special Issue on Web Intelligence = Artificial Intelligence in the Connected World. Guest Editors: Yuefeng Li, Amit Sheth, Athena Vakali, and Xiaohui Tao.
About this article
Cite this article
Zhang, H., Li, Z., Xie, H. et al. Leveraging statistical information in fine-grained financial sentiment analysis. World Wide Web 25, 513–531 (2022). https://doi.org/10.1007/s11280-021-00993-1
DOI: https://doi.org/10.1007/s11280-021-00993-1