Abstract
The language of company-related documents is recognized as an important indicator of future financial performance. This study aims to extract various word categories from corporate annual reports and to examine their effect on bankruptcy prediction. We show that the language used by bankrupt companies is characterized by stronger tenacity, accomplishment, familiarity, present concern, exclusion and denial. Bankrupt companies also use more modal, positive, uncertain and negative language. We use neural networks, support vector machines, decision trees and ensembles of decision trees to predict corporate bankruptcy, with both financial indicators and word categories as input variables. We show that categories from both a general dictionary and a financial dictionary can significantly improve the accuracy of the prediction models.
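The general idea of combining word categories with financial indicators can be illustrated with a minimal sketch. It assumes a dictionary-based approach in which each category is a word list and a report is scored by the share of its tokens that fall into each list. The toy word lists and the function names (`category_frequencies`, `build_feature_vector`) are hypothetical illustrations. They are not the general or financial dictionaries used in the study, and this is not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical toy word lists for illustration only;
# NOT the actual general or financial dictionaries used in the study.
CATEGORIES = {
    "negative": {"loss", "losses", "decline", "impairment", "default"},
    "uncertainty": {"may", "uncertain", "uncertainty", "approximately", "risk"},
    "modal": {"could", "might", "should", "would"},
    "denial": {"no", "not", "never", "nothing"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def category_frequencies(text: str, categories: dict[str, set[str]]) -> dict[str, float]:
    """Return, for each category, the share of tokens that belong to its word list."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in categories}
    counts = Counter(tokens)
    return {
        name: sum(counts[w] for w in words) / len(tokens)
        for name, words in categories.items()
    }


def build_feature_vector(text: str, financial_ratios: list[float],
                         categories: dict[str, set[str]] = CATEGORIES) -> list[float]:
    """Concatenate financial indicators with word-category frequencies.

    Category frequencies are appended in alphabetical order of category name
    so that every report yields features in the same order.
    """
    freqs = category_frequencies(text, categories)
    return list(financial_ratios) + [freqs[name] for name in sorted(categories)]


if __name__ == "__main__":
    report = "Net losses may continue; we could not avoid impairment."
    # Two hypothetical financial ratios, e.g. a liquidity ratio and a return on assets.
    print(build_feature_vector(report, [0.12, -0.05]))
```

For the sample sentence, 2 of its 9 tokens ("losses", "impairment") fall into the toy negative list, so the negative feature is 2/9. A vector of this form could then serve as input to any of the classifiers named above. Normalizing by report length keeps long and short reports comparable; the study's own preprocessing may differ.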
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Hájek, P., Olej, V. (2015). Word Categorization of Corporate Annual Reports for Bankruptcy Prediction by Machine Learning Methods. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_14
DOI: https://doi.org/10.1007/978-3-319-24033-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)