Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports

Butler, Matthew; Kešelj, Vlado

doi:10.1007/978-3-642-01818-3_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5549)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

Two novel Natural Language Processing (NLP) classification techniques are applied to the analysis of corporate annual reports in the task of financial forecasting. The hypothesis is that the textual content of annual reports contains vital information for assessing the performance of the company's stock over the next year. The first method is based on character n-gram profiles, which are generated for each annual report and then labeled using Common N-Gram (CNG) classification. The second method draws on a more traditional approach, in which readability scores are combined with performance inputs and then supplied to a support vector machine (SVM) for classification. Both methods consistently outperformed a benchmark portfolio, and their combination proved even more effective and efficient: the combined models yielded the highest returns with the fewest trades.
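
The first method can be illustrated with a minimal Python sketch of character n-gram profiling and nearest-profile labeling. This is not the authors' implementation: the abstract does not give the n-gram length, the profile size, or the class labels, so `n=3`, `size=1000`, and the labels "over" and "under" are assumptions chosen for illustration, as are all function names. The dissimilarity used is the relative-difference measure commonly associated with CNG classification.

```python
from collections import Counter


def ngram_profile(text, n=3, size=1000):
    """Return the `size` most frequent character n-grams with normalized frequencies.

    n and size are illustrative defaults, not values taken from the paper.
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union of n-grams.

    An n-gram missing from one profile is treated as having frequency 0.
    Identical profiles score 0.0; larger values mean less similar profiles.
    """
    score = 0.0
    for gram in set(p1) | set(p2):
        f1 = p1.get(gram, 0.0)
        f2 = p2.get(gram, 0.0)
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score


def classify(report_text, class_profiles, n=3, size=1000):
    """Return the label of the class profile least dissimilar to the report's profile."""
    profile = ngram_profile(report_text, n, size)
    return min(
        class_profiles,
        key=lambda label: cng_dissimilarity(profile, class_profiles[label]),
    )


# Toy usage with made-up text; real class profiles would be built from
# annual reports grouped by subsequent stock performance.
class_profiles = {
    "over": ngram_profile("strong growth and record revenue"),
    "under": ngram_profile("losses impairment and declining demand"),
}
label = classify("record revenue growth", class_profiles)
```

Here each class is represented by a single profile built from example text, and a new report receives the label of the closest profile. How training reports are grouped into classes and how the labels map to trading decisions are not specified in the abstract and are left open in this sketch.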

Author information

Authors and Affiliations

Faculty of Computer Science, Dalhousie University, Canada
Matthew Butler & Vlado Kešelj

Editor information

Editors and Affiliations

Department of Computer Science Irving K. Barber School of Arts and Sciences, University of British Columbia Okanagan, 3333 University Way, V1V 1V5, Kelowna, British Columbia, Canada
Yong Gao
School of Information Technology & Engineering, University of Ottawa, 800 King Edward Avenue, P.O. Box 450, K1N 6N5, Stn. A, Ottawa, Ontario, Canada
Nathalie Japkowicz

About this paper

Cite this paper

Butler, M., Kešelj, V. (2009). Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports. In: Gao, Y., Japkowicz, N. (eds) Advances in Artificial Intelligence. Canadian AI 2009. Lecture Notes in Computer Science, vol 5549. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01818-3_7

DOI: https://doi.org/10.1007/978-3-642-01818-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01817-6
Online ISBN: 978-3-642-01818-3
eBook Packages: Computer Science, Computer Science (R0)