News that Moves the Market: DSEX-News Dataset for Forecasting DSE Using BERT

  • Conference paper
Recent Challenges in Intelligent Information and Database Systems (ACIIDS 2024)

Abstract

The stock market is complex and dynamic, and its unpredictable nature has always challenged stakeholders and investors. This unpredictability motivates the need for more accurate prediction models. Traditional prediction models struggle to handle the market's dynamic nature, and previous methods have relied on less relevant data, leading to suboptimal performance. This study proposes using Bidirectional Encoder Representations from Transformers (BERT), a pre-trained Large Language Model (LLM), to predict Dhaka Stock Exchange (DSE) market movements. We also introduce a new dataset designed specifically for this problem, capturing important characteristics and patterns that were missing from other datasets. We evaluate this dataset of news headlines and stock market indexes with a range of machine learning techniques and compare their predictive capabilities: Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Linear Support Vector Machine (LSVM), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), Bidirectional Long Short-Term Memory (Bi-LSTM), BERT, Financial Bidirectional Encoder Representations from Transformers (FinBERT), and RoBERTa. Our proposed model achieves 99.83% accuracy on the training set and 99.78% accuracy on the test set, outperforming previous methods.
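To make the evaluation setup concrete, the following is a minimal sketch of how headlines paired with a market-movement label might be scored for training and test accuracy. Everything in it is an assumption for illustration: the toy headlines, the 1 = rose / 0 = fell label encoding, and the word-count classifier are hypothetical and are not the dataset, labeling scheme, or models used in the paper.

```python
from collections import Counter

# Hypothetical toy data: (headline, label), where 1 = index rose, 0 = index fell.
# These are invented examples, not rows from the DSEX-News dataset.
train = [
    ("bank profits surge on strong exports", 1),
    ("index gains as investors return", 1),
    ("profits surge in textile sector", 1),
    ("shares plunge amid liquidity crisis", 0),
    ("investors panic as index falls", 0),
    ("liquidity crisis deepens in banks", 0),
]
test = [
    ("exports surge lifts index", 1),
    ("panic selling as shares plunge", 0),
]


def fit(examples):
    """Count how often each word appears under each label."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Predict 1 if the headline's words were seen more often with label 1."""
    words = text.lower().split()
    up = sum(counts[1][w] for w in words)
    down = sum(counts[0][w] for w in words)
    return 1 if up >= down else 0


def accuracy(counts, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(counts, text) == label for text, label in examples)
    return correct / len(examples)


model = fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(f"train accuracy: {train_acc:.2%}, test accuracy: {test_acc:.2%}")
```

Reporting accuracy on both splits, as the abstract does, is what lets a reader judge whether a model generalizes or merely memorizes the training headlines; any of the listed classifiers could be swapped in for the toy `fit` and `predict` functions here.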

Author information

Corresponding author

Correspondence to Md Rafiqul Islam.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Khan, M.N.R. et al. (2024). News that Moves the Market: DSEX-News Dataset for Forecasting DSE Using BERT. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2024. Communications in Computer and Information Science, vol 2145. Springer, Singapore. https://doi.org/10.1007/978-981-97-5934-7_19

  • DOI: https://doi.org/10.1007/978-981-97-5934-7_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5933-0

  • Online ISBN: 978-981-97-5934-7

  • eBook Packages: Computer Science, Computer Science (R0)
