Word Categorization of Corporate Annual Reports for Bankruptcy Prediction by Machine Learning Methods

Hájek, Petr; Olej, Vladimír

doi:10.1007/978-3-319-24033-6_14

Petr Hájek & Vladimír Olej

Conference paper
First Online: 11 December 2015

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

The language of company-related documents is recognized as an important indicator of future financial performance. This study extracts various word categories from corporate annual reports and examines their effect on bankruptcy prediction. We show that the language used by bankrupt companies is characterized by stronger tenacity, accomplishment, familiarity, present concern, exclusion and denial. Bankrupt companies also use more modal, positive, uncertain and negative language. We used neural networks, support vector machines, decision trees and ensembles of decision trees to predict corporate bankruptcy. The prediction models used both financial indicators and word categorizations as input variables. We show that both general dictionary and financial dictionary categories can significantly improve the accuracy of the prediction models.
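
The general idea of combining word categories with financial indicators can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists in `CATEGORIES`, the indicator names `debt_to_equity` and `roa`, and the helper functions are all hypothetical and chosen only for illustration. Real general and financial dictionaries are far larger than these toy sets.

```python
import re
from collections import Counter

# Hypothetical toy word lists (assumption); real dictionaries are much larger.
CATEGORIES = {
    "negative": {"loss", "decline", "default", "impairment"},
    "uncertain": {"may", "might", "uncertain", "approximately"},
    "modal": {"may", "might", "could", "should", "must"},
}


def category_frequencies(text, categories=CATEGORIES):
    """Return, for each category, the share of tokens in `text` that belong to it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    return {
        name: (sum(counts[w] for w in words) / total if total else 0.0)
        for name, words in categories.items()
    }


def build_feature_vector(financial_indicators, text, categories=CATEGORIES):
    """Concatenate financial indicators with word-category frequencies."""
    freqs = category_frequencies(text, categories)
    ordered = sorted(freqs)  # fixed column order for the text features
    names = list(financial_indicators) + [f"word_{n}" for n in ordered]
    values = list(financial_indicators.values()) + [freqs[n] for n in ordered]
    return names, values


# Hypothetical report excerpt and indicator values, for illustration only.
sample = "The company may default. Losses might continue and results could decline."
names, values = build_feature_vector({"debt_to_equity": 2.4, "roa": -0.08}, sample)
```

The resulting `values` list could then be passed as one input row to any classifier of the kinds named in the abstract. Exact-match counting is a simplification: "Losses" is not matched by "loss" here, so a practical version would likely need stemming or fuller word lists.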

Author information

Authors and Affiliations

Institute of System Engineering and Informatics, Faculty of Economics and Administration, University of Pardubice, Pardubice, Czech Republic
Petr Hájek & Vladimír Olej

Corresponding author

Correspondence to Petr Hájek.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Hájek, P., Olej, V. (2015). Word Categorization of Corporate Annual Reports for Bankruptcy Prediction by Machine Learning Methods. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_14

DOI: https://doi.org/10.1007/978-3-319-24033-6_14
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)
