Topics in Financial Filings and Bankruptcy Prediction with Distributed Representations of Textual Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12461)

Abstract

We uncover latent topics embedded in the management discussion and analysis (MD&A) sections of financial reports filed by US-listed companies, and we examine how these topics evolve using a dynamic topic modelling method, the Dynamic Embedded Topic Model. Using more than 203k reports with 40M sentences, ranging from 1997 to 2017, we find 30 interpretable topics. The evolution of these topics follows economic cycles and major industrial events. We validate the significance of the latent topics through the state-of-the-art performance of a simple bankruptcy ensemble classifier trained on both a novel feature set, the topical distributed representations of the MD&A, and accounting features.

Notes

  1. Electronic Data Gathering, Analysis, and Retrieval system - SEC.

  2. https://www.sec.gov/corpfin/cf-manual/topic-9.

  3. https://www.sec.gov/reportspubs/investor-publications/investorpubsbankrupthtm.html.

  4. We did not use the modal weak word list because all of its words are already included in the uncertainty word list; the modal moderate list is also excluded because it contains no words with strong sentiment modification.

  5. In Statement of Recommended Accounting Standards No. 15, Financial Accounting Standards Board (FASB).

  6. “U.S. coal production dropped by more than 10% in 2015 to 897 million short tons, the lowest production level since 1986”, US Energy Information Administration, https://www.eia.gov/todayinenergy/detail.php?id=28732, Retrieved 3rd Mar, 2020.

Acknowledgments

We appreciate the fruitful discussions with Professor Jonathan Crook, Professor Galina Andreeva, Professor Raffaella Calabrese and other researchers at the Business School, University of Edinburgh, held while Hung Ba was a visiting scholar supported by a JAIST Research Grant and DRF Grant No. 238003. Any remaining errors are our own.

Author information

Corresponding author

Correspondence to Ba-Hung Nguyen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, BH., Kiyoaki, S., Huynh, VN. (2021). Topics in Financial Filings and Bankruptcy Prediction with Distributed Representations of Textual Data. In: Dong, Y., Ifrim, G., Mladenić, D., Saunders, C., Van Hoecke, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12461. Springer, Cham. https://doi.org/10.1007/978-3-030-67670-4_19

  • DOI: https://doi.org/10.1007/978-3-030-67670-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67669-8

  • Online ISBN: 978-3-030-67670-4

  • eBook Packages: Computer Science, Computer Science (R0)
