Word Categorization of Corporate Annual Reports for Bankruptcy Prediction by Machine Learning Methods

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

The language of company-related documents is recognized as an important indicator of future financial performance. This study extracts various word categories from corporate annual reports and examines their effect on bankruptcy prediction. We show that the language used by bankrupt companies is characterized by stronger tenacity, accomplishment, familiarity, present concern, exclusion and denial. Bankrupt companies also use more modal, positive, uncertain and negative language. We used neural networks, support vector machines, decision trees and ensembles of decision trees to predict corporate bankruptcy. The prediction models used both financial indicators and word categorizations as input variables. We show that both general dictionary and financial dictionary categories can significantly improve the accuracy of the prediction models.
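
As a rough illustration of how word categories can become model inputs alongside financial indicators, the sketch below computes the share of tokens that fall into each category and appends those shares to a vector of financial indicators. The word lists, the function names `category_frequencies` and `build_feature_vector`, and the example indicator values are all hypothetical; they are not the dictionaries or the feature construction used in the paper.

```python
import re
from collections import Counter

# Hypothetical toy word lists for illustration only; these are NOT the
# general or financial dictionaries referred to in the abstract.
TOY_CATEGORIES = {
    "negative": {"loss", "decline", "default"},
    "uncertain": {"may", "could", "uncertain"},
    "modal": {"may", "could", "must", "should"},
}


def category_frequencies(text, categories):
    """Return, for each category, the share of tokens in `text` that belong to it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        # An empty document has no category signal.
        return {name: 0.0 for name in categories}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / total
        for name, words in categories.items()
    }


def build_feature_vector(financial_indicators, text, categories):
    """Concatenate financial indicators with category shares (sorted by category name)."""
    freqs = category_frequencies(text, categories)
    return list(financial_indicators) + [freqs[name] for name in sorted(categories)]


if __name__ == "__main__":
    report = "The company may default. Loss could decline."
    # The two leading numbers stand in for arbitrary financial ratios.
    print(build_feature_vector([0.1, 2.5], report, TOY_CATEGORIES))
```

A vector built this way could then be passed to any of the classifier families named in the abstract. Normalizing counts by document length is an assumption made here so that reports of different sizes remain comparable.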

Author information

Correspondence to Petr Hájek.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Hájek, P., Olej, V. (2015). Word Categorization of Corporate Annual Reports for Bankruptcy Prediction by Machine Learning Methods. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_14

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
