Automatic Detection of Uncertain Statements in the Financial Domain

Theil, Christoph Kilian; Štajner, Sanja; Stuckenschmidt, Heiner; Ponzetto, Simone Paolo

doi:10.1007/978-3-319-77116-8_48

Conference paper
First Online: 10 October 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10762)

Abstract

The automatic detection of uncertain statements can benefit NLP tasks such as deception detection and information extraction. Furthermore, it can enable new analyses in social sciences such as business, where the quantification of uncertainty or risk plays a significant role. Thus, for the first time, we approached the automatic detection of uncertain statements as a binary sentence classification task on transcripts of spoken language in the financial domain. We created a new dataset and, besides using bag-of-words, part-of-speech tags, and dictionaries, developed rule-based features tailored to our task. Finally, we systematically analyzed which features perform best in the financial domain as opposed to the previously researched encyclopedic domain.
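To make the task setup concrete, the following is a minimal sketch of sentence-level feature extraction that combines bag-of-words counts with a single dictionary-based feature. It is not the authors' implementation: the term list `UNCERTAINTY_TERMS`, the function names, and the threshold rule are all hypothetical and chosen only for illustration. A real system would replace the toy rule with a trained classifier and a full uncertainty lexicon.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; NOT the dictionary used in the paper.
UNCERTAINTY_TERMS = {"may", "might", "could", "possibly", "approximately", "believe", "uncertain"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extract_features(sentence):
    """Return bag-of-words counts plus one dictionary-based count feature."""
    tokens = tokenize(sentence)
    # One count feature per distinct token (bag-of-words).
    features = Counter(f"bow={tok}" for tok in tokens)
    # Number of tokens that appear in the (assumed) uncertainty lexicon.
    features["dict_uncertainty_hits"] = sum(tok in UNCERTAINTY_TERMS for tok in tokens)
    return features


def is_uncertain(sentence, threshold=1):
    """Toy baseline: label a sentence uncertain if it has enough lexicon hits."""
    return extract_features(sentence)["dict_uncertainty_hits"] >= threshold


examples = [
    "We believe revenue could grow next quarter.",
    "Revenue was 4.2 billion dollars in the third quarter.",
]
for s in examples:
    print(is_uncertain(s), s)
```

The feature dictionary returned by `extract_features` could be passed to any standard classifier; the threshold rule in `is_uncertain` merely stands in for one.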

Notes

1. Earnings calls are publicly accessible teleconferences or webcasts in which executives of a public company present the financial results of the last quarter.
2. 10-Ks/10-Qs are standardized annual/quarterly reports providing an overview of a company’s financial results, which are required by the U.S. Securities and Exchange Commission.
3. http://us.spindices.com/indices/equity/sp-500
4. http://seekingalpha.com/earnings/earnings-call-transcripts
5. This tagger reached an accuracy of 96.80% when applied to an evaluation set of 130,000 words taken from The Wall Street Journal [25].
6. http://www3.nd.edu/~mcdonald/Word_Lists.html
7. http://rgai.inf.u-szeged.hu/conll2010st/download.html

References

1. Hyland, K.: Hedging in Scientific Research Articles. John Benjamins, Amsterdam/Philadelphia (1998)
2. Larcker, D.F., Zakolyukina, A.: Detecting deceptive discussions in conference calls. J. Account. Res. 50, 494–540 (2012)
3. Bachenko, J., Fitzpatrick, E., Schonwetter, M.: Verification and implementation of language-based deception indicators in civil and criminal narratives. In: Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, pp. 25–32 (2008)
4. Szarvas, G.: Hedge classification in biomedical texts with a weakly supervised selection of keywords. In: Proceedings of ACL-08: HLT, Columbus, OH, pp. 281–289 (2008)
5. Medlock, B., Briscoe, T.: Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, pp. 992–999 (2007)
6. Riloff, E., Wiebe, J., Wilson, T.: Learning subjective nouns using extraction pattern bootstrapping. In: Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, pp. 25–32 (2003)
7. Farkas, R., Vincze, V., Móra, G., Csirik, J., Szarvas, G.: The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, Uppsala, pp. 1–12 (2010)
8. Loughran, T., McDonald, B.: Textual analysis in accounting and finance: a survey. J. Account. Res. 54, 1187–1230 (2016)
9. Light, M., Qiu, X.Y., Srinivasan, P.: The language of bioscience: facts, speculations, and statements in between. In: HLT-NAACL 2004 Workshop: BioLINK 2004, Linking Biological Literature, Ontologies and Databases, Boston, MA, pp. 17–24 (2004)
10. Ganter, V., Strube, M.: Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Singapore, pp. 173–176 (2009)
11. Li, F.: Textual analysis of corporate disclosures: a survey of the literature. J. Account. Lit. 29, 143–165 (2010)
12. Kearney, C., Liu, S.: Textual sentiment in finance: a survey of methods and models. Int. Rev. Financ. Anal. 33, 171–185 (2014)
13. Das, S.R.: Text and context: language analytics in finance. Found. Trends Financ. 8, 144–261 (2014)
14. Li, F.: Annual report readability, current earnings, and earnings persistence. J. Account. Econ. 45, 221–247 (2008)
15. Li, F.: The information content of forward-looking statements in corporate filings: a naïve Bayesian machine learning approach. J. Account. Res. 50, 494–540 (2012)
16. Loughran, T., McDonald, B., Yun, H.: A wolf in sheep’s clothing: the use of ethics-related terms in 10-K reports. J. Bus. Ethics 89, 39–49 (2009)
17. Loughran, T., McDonald, B.: When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Financ. 66, 35–65 (2011)
18. Loughran, T., McDonald, B.: Measuring readability in financial disclosures. J. Financ. 69, 1643–1671 (2014)
19. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 41–48 (1960)
20. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977)
21. Fleiss, J.L.: Statistical Methods for Rates and Proportions, 2nd edn. John Wiley, New York (1981)
22. Bird, S., Loper, E., Klein, E.: Natural Language Processing with Python. O’Reilly Media, Sebastopol (2009)
23. Bird, S., Loper, E.: Natural language toolkit: taggers (2017). https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py. Accessed 27 Jan 2017
24. Honnibal, M.: Averaged perceptron tagger (2013). https://github.com/nltk/nltk/blob/develop/nltk/tag/perceptron.py. Accessed 27 Jan 2017
25. Honnibal, M.: A good part-of-speech tagger in about 200 lines of Python (2013). https://explosion.ai/blog/part-of-speech-pos-tagger-in-python. Accessed 27 Jan 2017
26. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)
27. Le Cessie, S., van Houwelingen, J.: Ridge estimators in logistic regression. Appl. Stat. 41, 191–201 (1992)
28. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)
29. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods-Support Vector Learning (1998)
30. Aha, D., Kibler, D.: Instance-based learning algorithms. Mach. Learn. 6, 37–66 (1991)
31. Cohen, W.W.: Fast effective rule induction. In: Proceedings of the Twelfth International Conference on Machine Learning, pp. 115–123 (1995)
32. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
33. Breiman, L.: Random forests. Mach. Learn. 41, 5–32 (2001)

Acknowledgments

We thank Alexander Diete for his help with the data acquisition and technical advice as well as Clemens Müller for his help with the annotation. This work was supported by the SFB 884 on the Political Economy of Reforms at the University of Mannheim (project C4), funded by the German Research Foundation (DFG).

Author information

Authors and Affiliations

Data and Web Science Group, University of Mannheim, Mannheim, Germany
Christoph Kilian Theil, Sanja Štajner, Heiner Stuckenschmidt & Simone Paolo Ponzetto

Corresponding author

Correspondence to Christoph Kilian Theil.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Theil, C.K., Štajner, S., Stuckenschmidt, H., Ponzetto, S.P. (2018). Automatic Detection of Uncertain Statements in the Financial Domain. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_48

DOI: https://doi.org/10.1007/978-3-319-77116-8_48
Published: 10 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science, Computer Science (R0)
