Abstract
The stock market is a complex and dynamic system whose unpredictable nature has always challenged stakeholders and investors. This unpredictability motivates the need for more accurate prediction models. Traditional prediction models are limited in handling the dynamic nature of the stock market, and previous methods have relied on less relevant data, leading to suboptimal performance. This study proposes the use of Bidirectional Encoder Representations from Transformers (BERT), a pre-trained Large Language Model (LLM), to predict Dhaka Stock Exchange (DSE) market movements. We also introduce a new dataset designed specifically for this problem, capturing important characteristics and patterns that were missing from other datasets. We evaluate this dataset of news headlines and stock market indexes with a range of machine learning techniques and compare their predictive capabilities: Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Linear Support Vector Machine (LSVM), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), Bidirectional Long Short-Term Memory (Bi-LSTM), BERT, Financial Bidirectional Encoder Representations from Transformers (FinBERT), and RoBERTa. Our proposed model achieves 99.83% accuracy on the training set and 99.78% accuracy on the test set, outperforming previous methods.
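The results above are reported as accuracy on a training split and a test split. As a minimal sketch of how such an evaluation might be set up, the snippet below assumes that each day's headline is labeled by whether the index closed higher than on the previous trading day; the abstract does not state how labels are constructed, so this labeling rule, the function names, and the toy values are all hypothetical. It also computes a majority-class baseline, which gives a floor against which any reported accuracy can be read.

```python
from collections import Counter

def label_movements(closes):
    """Return 1 for each day the index closed above the previous day, else 0.

    Assumption: a binary up/down label; the paper's actual labeling may differ.
    """
    return [1 if today > prev else 0 for prev, today in zip(closes, closes[1:])]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(train_labels, n_predictions):
    """Predict the most frequent training label for every test example."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_predictions

# Hypothetical toy data: seven daily index closes and six headlines,
# one headline for each day that has a previous close to compare against.
closes = [6200.0, 6215.5, 6190.2, 6230.0, 6228.1, 6250.4, 6260.0]
headlines = [
    "Index rises on banking gains",
    "Profit-taking drags market down",
    "Investors return as turnover grows",
    "Market slips amid cautious trading",
    "Stocks rebound on telecom rally",
    "Bourse extends gains for second day",
]

labels = label_movements(closes)
examples = list(zip(headlines, labels))

# Chronological split: earlier days for training, later days for testing.
train, test = examples[:3], examples[3:]
train_labels = [label for _, label in train]
test_labels = [label for _, label in test]

baseline_predictions = majority_baseline(train_labels, len(test_labels))
baseline_accuracy = accuracy(test_labels, baseline_predictions)
print(f"majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

A chronological split is used here rather than a random one so that no test-day information precedes the training days; whether the study split its data this way is not stated in the abstract.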
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Khan, M.N.R. et al. (2024). News that Moves the Market: DSEX-News Dataset for Forecasting DSE Using BERT. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2024. Communications in Computer and Information Science, vol 2145. Springer, Singapore. https://doi.org/10.1007/978-981-97-5934-7_19
DOI: https://doi.org/10.1007/978-981-97-5934-7_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5933-0
Online ISBN: 978-981-97-5934-7
eBook Packages: Computer Science, Computer Science (R0)