Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5549)

Abstract

Two novel Natural Language Processing (NLP) classification techniques are applied to the analysis of corporate annual reports in the task of financial forecasting. The hypothesis is that the textual content of annual reports contains vital information for assessing the performance of the stock over the next year. The first method is based on character n-gram profiles, which are generated for each annual report and then labeled using the Common N-Grams (CNG) classification method. The second method draws on a more traditional approach, in which readability scores are combined with performance inputs and then supplied to a support vector machine (SVM) for classification. Both methods consistently outperformed a benchmark portfolio, and their combination proved even more effective and efficient: the combined models yielded the highest returns with the fewest trades.
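As a rough illustration of the first method, the sketch below builds character n-gram profiles and assigns a label by the smallest profile dissimilarity, in the style of the CNG approach. It is a minimal sketch and not the authors' implementation: the function names, the n-gram length, the profile size, the toy texts, and the "over"/"under" labels are all assumptions chosen for illustration.

```python
from collections import Counter


def ngram_profile(text, n=3, size=1000):
    """Map the `size` most frequent character n-grams to normalized frequencies."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union of n-grams."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one of the two profiles.
        total += ((f1 - f2) / ((f1 + f2) / 2.0)) ** 2
    return total


def classify(text, class_profiles, n=3, size=1000):
    """Return the label of the class profile closest to the text's profile."""
    profile = ngram_profile(text, n, size)
    return min(class_profiles,
               key=lambda label: cng_dissimilarity(profile, class_profiles[label]))


# Hypothetical toy class profiles; real ones would be built from labeled reports.
class_profiles = {
    "over": ngram_profile("growth growth growth strong growth"),
    "under": ngram_profile("loss loss decline loss loss"),
}
predicted = classify("strong growth expected", class_profiles)
```

In this toy setup, an n-gram that appears in only one of the two profiles contributes a fixed penalty of 4, so a text is drawn toward the class with which it shares the most frequent n-grams.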




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Butler, M., Kešelj, V. (2009). Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports. In: Gao, Y., Japkowicz, N. (eds) Advances in Artificial Intelligence. Canadian AI 2009. Lecture Notes in Computer Science, vol 5549. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01818-3_7

  • DOI: https://doi.org/10.1007/978-3-642-01818-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01817-6

  • Online ISBN: 978-3-642-01818-3

  • eBook Packages: Computer Science, Computer Science (R0)
