Automatic Detection of Uncertain Statements in the Financial Domain

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10762)

Abstract

The automatic detection of uncertain statements can benefit NLP tasks such as deception detection and information extraction. Furthermore, it can enable new analyses in social sciences such as business, where the quantification of uncertainty or risk plays a significant role. We therefore approached, for the first time, the automatic detection of uncertain statements as a binary sentence classification task on transcripts of spoken language in the financial domain. We created a new dataset and, besides using bag-of-words, part-of-speech tags, and dictionaries, developed rule-based features tailored to our task. Finally, we systematically analyzed which features perform best in the financial domain as opposed to the previously researched encyclopedic domain.
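
To make the feature types named in the abstract more concrete, here is a minimal, hypothetical sketch in Python; it is not the authors' feature set. It assumes a tiny hand-picked list of uncertainty cue words (UNCERTAINTY_CUES, standing in for a real lexicon such as the word lists linked in note 6) and shows how a single sentence might be mapped to bag-of-words counts plus a dictionary-hit count. Part-of-speech and rule-based features are omitted, and the names extract_features and cue_baseline are illustrative only.

```python
import re
from collections import Counter

# HYPOTHETICAL cue list for illustration only; NOT the dictionary used in the paper.
UNCERTAINTY_CUES = frozenset({
    "may", "might", "could", "possibly", "perhaps",
    "approximately", "uncertain", "believe", "expect", "likely",
})

# Lowercase alphabetic tokens, optionally with an ASCII apostrophe suffix ("we're").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into word tokens (digits are dropped)."""
    return TOKEN_RE.findall(sentence.lower())


def extract_features(sentence: str) -> dict[str, int]:
    """Map one sentence to a sparse feature dict:
    bag-of-words counts plus the number of cue-dictionary hits."""
    tokens = tokenize(sentence)
    features = {f"bow={tok}": n for tok, n in Counter(tokens).items()}
    features["dict_uncertainty_hits"] = sum(tok in UNCERTAINTY_CUES for tok in tokens)
    return features


def cue_baseline(sentence: str) -> bool:
    """Trivial rule-based baseline: label a sentence uncertain if it contains any cue."""
    return extract_features(sentence)["dict_uncertainty_hits"] > 0


if __name__ == "__main__":
    print(extract_features("Revenue may grow, we believe."))
    print(cue_baseline("Revenue may grow, we believe."))  # True
```

In a full pipeline, feature dictionaries like these would be passed to a standard classifier; the reference list points to WEKA [26] and to a range of learners [27–33], but the sketch above makes no claim about which configuration the authors used.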

Notes

  1. Earnings calls are publicly accessible teleconferences or webcasts in which executives of a public company present the financial results of the last quarter.

  2. 10-Ks and 10-Qs are standardized annual and quarterly reports, required by the U.S. Securities and Exchange Commission, that give an overview of a company’s financial results.

  3. http://us.spindices.com/indices/equity/sp-500

  4. http://seekingalpha.com/earnings/earnings-call-transcripts

  5. This tagger reached an accuracy of 96.80% when applied to an evaluation set of 130,000 words taken from The Wall Street Journal [25].

  6. http://www3.nd.edu/~mcdonald/Word_Lists.html

  7. http://rgai.inf.u-szeged.hu/conll2010st/download.html
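
As a worked check on note 5, the short Python snippet below converts the reported tagger accuracy into an approximate count of mistagged words. It assumes the accuracy is a per-word figure computed over the full 130,000-word evaluation set; mistagged_tokens is an illustrative helper name, not part of any tagger API.

```python
def mistagged_tokens(accuracy: float, n_tokens: int) -> int:
    """Approximate number of wrongly tagged tokens at a given per-token accuracy."""
    # round() absorbs floating-point noise such as 4160.0000000000036
    return round((1.0 - accuracy) * n_tokens)


if __name__ == "__main__":
    # Figures from note 5: 96.80% accuracy on 130,000 words from The Wall Street Journal.
    print(mistagged_tokens(0.9680, 130_000))  # 4160
```

Under that assumption, roughly 4,160 of the 130,000 words receive a wrong tag, i.e. about one tagging error every 31 words (1 / 0.032 ≈ 31.25).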

References

  1. Hyland, K.: Hedging in Scientific Research Articles. John Benjamins, Amsterdam/Philadelphia (1998)

  2. Larcker, D.F., Zakolyukina, A.: Detecting deceptive discussions in conference calls. J. Account. Res. 50, 494–540 (2012)

  3. Bachenko, J., Fitzpatrick, E., Schonwetter, M.: Verification and implementation of language-based deception indicators in civil and criminal narratives. In: Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, pp. 25–32 (2008)

  4. Szarvas, G.: Hedge classification in biomedical texts with a weakly supervised selection of keywords. In: Proceedings of ACL-08: HLT, Columbus, OH, pp. 281–289 (2008)

  5. Medlock, B., Briscoe, T.: Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, pp. 992–999 (2007)

  6. Riloff, E., Wiebe, J., Wilson, T.: Learning subjective nouns using extraction pattern bootstrapping. In: Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, pp. 25–32 (2003)

  7. Farkas, R., Vincze, V., Móra, G., Csirik, J., Szarvas, G.: The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, Uppsala, pp. 1–12 (2010)

  8. Loughran, T., McDonald, B.: Textual analysis in accounting and finance: a survey. J. Account. Res. 54, 1187–1230 (2016)

  9. Light, M., Qiu, X.Y., Srinivasan, P.: The language of bioscience: facts, speculations, and statements in between. In: HLT-NAACL 2004 Workshop: BioLINK 2004, Linking Biological Literature, Ontologies and Databases, Boston, MA, pp. 17–24 (2004)

  10. Ganter, V., Strube, M.: Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Singapore, pp. 173–176 (2009)

  11. Li, F.: Textual analysis of corporate disclosures: a survey of the literature. J. Account. Lit. 29, 143–165 (2010)

  12. Kearney, C., Liu, S.: Textual sentiment in finance: a survey of methods and models. Int. Rev. Financ. Anal. 33, 171–185 (2014)

  13. Das, S.R.: Text and context: language analytics in finance. Found. Trends Financ. 8, 144–261 (2014)

  14. Li, F.: Annual report readability, current earnings, and earnings persistence. J. Account. Econ. 45, 221–247 (2008)

  15. Li, F.: The information content of forward-looking statements in corporate filings: a naïve Bayesian machine learning approach. J. Account. Res. 48, 1049–1102 (2010)

  16. Loughran, T., McDonald, B., Yun, H.: A wolf in sheep’s clothing: the use of ethics-related terms in 10-K reports. J. Bus. Ethics 89, 39–49 (2009)

  17. Loughran, T., McDonald, B.: When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Financ. 66, 35–65 (2011)

  18. Loughran, T., McDonald, B.: Measuring readability in financial disclosures. J. Financ. 69, 1643–1671 (2014)

  19. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 41–48 (1960)

  20. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977)

  21. Fleiss, J.L.: Statistical Methods for Rates and Proportions, 2nd edn. John Wiley, New York (1981)

  22. Bird, S., Loper, E., Klein, E.: Natural Language Processing with Python. O’Reilly Media, Sebastopol (2009)

  23. Bird, S., Loper, E.: Natural language toolkit: taggers (2017). https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py. Accessed 27 Jan 2017

  24. Honnibal, M.: Averaged perceptron tagger (2013). https://github.com/nltk/nltk/blob/develop/nltk/tag/perceptron.py. Accessed 27 Jan 2017

  25. Honnibal, M.: A good part-of-speech tagger in about 200 lines of Python (2013). https://explosion.ai/blog/part-of-speech-pos-tagger-in-python. Accessed 27 Jan 2017

  26. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)

  27. Le Cessie, S., van Houwelingen, J.: Ridge estimators in logistic regression. Appl. Stat. 41, 191–201 (1992)

  28. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)

  29. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods-Support Vector Learning (1998)

  30. Aha, D., Kibler, D.: Instance-based learning algorithms. Mach. Learn. 6, 37–66 (1991)

  31. Cohen, W.W.: Fast effective rule induction. In: Proceedings of the Twelfth International Conference on Machine Learning, pp. 115–123 (1995)

  32. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)

  33. Breiman, L.: Random forests. Mach. Learn. 41, 5–32 (2001)

Acknowledgments

We thank Alexander Diete for his help with the data acquisition and technical advice as well as Clemens Müller for his help with the annotation. This work was supported by the SFB 884 on the Political Economy of Reforms at the University of Mannheim (project C4), funded by the German Research Foundation (DFG).

Author information

Correspondence to Christoph Kilian Theil.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Theil, C.K., Štajner, S., Stuckenschmidt, H., Ponzetto, S.P. (2018). Automatic Detection of Uncertain Statements in the Financial Domain. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_48

  • DOI: https://doi.org/10.1007/978-3-319-77116-8_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77115-1

  • Online ISBN: 978-3-319-77116-8

  • eBook Packages: Computer Science, Computer Science (R0)