Abstract
In recent years, BERT-like pre-trained neural language models have been successfully developed and applied to a range of financial domain-specific tasks, and such domain-adapted models learn the specialized language of financial contexts effectively. In this paper, we address the task of textual regression for forecasting financial volatility from financial texts and design InFi-BERT (Indian Financial BERT), a transformer-based pre-trained language model built with a domain-adaptive pre-training approach, which effectively learns linguistic context from the annual reports of Indian companies. In addition, we present the first Indian financial corpus for the task of volatility prediction. Through detailed experiments and result analysis, we demonstrate that our model outperforms both the base model and previous domain-specific models on the financial volatility forecasting task.
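The abstract frames volatility forecasting as textual regression, i.e. predicting a continuous volatility value from report text. The paper's exact target definition is not given here; a common formulation in this line of work (e.g. Kogan et al., 2009) regresses on the logarithm of the sample standard deviation of daily returns, which can be sketched as:

```python
import math

def log_volatility(returns):
    """Log of the sample standard deviation of daily returns.

    A common regression target in text-based volatility forecasting
    (an assumption here, not necessarily InFi-BERT's exact definition):
        v = log( sqrt( (1/n) * sum_t (r_t - r_mean)^2 ) )
    """
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return math.log(math.sqrt(var))

# Example: four daily returns around a company's report date
target = log_volatility([0.01, -0.02, 0.03, 0.0])
```

A regression head on the pre-trained encoder is then fine-tuned to predict this scalar from the report text.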
Supported by the Ministry of Electronics and Information Technology (MeitY), Government of India, and the IIT Bhilai Innovation and Technology Foundation (IBITF) under the project entitled "Blockchain and Machine Learning Powered Unified Video KYC Framework".
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sasubilli, S., Verma, M. (2023). InFi-BERT 1.0: Transformer-Based Language Model for Indian Financial Volatility Prediction. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1753. Springer, Cham. https://doi.org/10.1007/978-3-031-23633-4_10
DOI: https://doi.org/10.1007/978-3-031-23633-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23632-7
Online ISBN: 978-3-031-23633-4
eBook Packages: Computer Science (R0)