Abstract
The automatic detection of uncertain statements can benefit NLP tasks such as deception detection and information extraction. Furthermore, it can enable new analyses in social sciences such as business, where the quantification of uncertainty or risk plays a significant role. Thus, for the first time, we approached the automatic detection of uncertain statements as a binary sentence classification task on transcripts of spoken language in the financial domain. We created a new dataset and – besides using bag-of-words, part-of-speech tags, and dictionaries – developed rule-based features tailored to our task. Finally, we systematically analyzed which features perform best in the financial domain as opposed to the previously researched encyclopedic domain.
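As a rough illustration of the feature types named in the abstract, the following minimal sketch extracts bag-of-words counts and dictionary-based counts from a single sentence and applies a naive dictionary rule as a binary decision. It is not the authors' implementation: the word list `UNCERTAINTY_TERMS`, the function names, and the threshold are hypothetical and chosen for illustration only. Part-of-speech tags and the rule-based features are omitted.

```python
import re
from collections import Counter

# Hypothetical toy word list, for illustration only; a real study would
# rely on a curated uncertainty dictionary.
UNCERTAINTY_TERMS = {
    "may", "might", "could", "approximately",
    "possibly", "uncertain", "believe", "expect",
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extract_features(sentence):
    """Return bag-of-words counts plus two dictionary-based features."""
    tokens = tokenize(sentence)
    features = Counter(f"bow={token}" for token in tokens)
    hits = sum(1 for token in tokens if token in UNCERTAINTY_TERMS)
    features["dict_hits"] = hits
    # Share of tokens that appear in the dictionary (0.0 for empty input).
    features["dict_ratio"] = hits / len(tokens) if tokens else 0.0
    return dict(features)


def classify_by_dictionary(sentence, threshold=1):
    """Naive baseline: label a sentence uncertain if it contains at
    least `threshold` dictionary terms."""
    return extract_features(sentence)["dict_hits"] >= threshold
```

In a supervised setting, the feature dictionaries returned by `extract_features` would be vectorized and passed to a classifier rather than thresholded directly.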
Notes
- 1. Earnings calls are publicly accessible teleconferences or webcasts in which executives of a public company present the financial results of the last quarter.
- 2. 10-Ks/10-Qs are standardized annual/quarterly reports providing an overview of a company’s financial results, which are required by the U.S. Securities and Exchange Commission.
- 3.
- 4.
- 5. This tagger reached an accuracy of 96.80% when applied to an evaluation set of 130,000 words taken from The Wall Street Journal [25].
- 6.
- 7.
References
Hyland, K.: Hedging in Scientific Research Articles. John Benjamins, Amsterdam/Philadelphia (1998)
Larcker, D.F., Zakolyukina, A.: Detecting deceptive discussions in conference calls. J. Account. Res. 50, 494–540 (2012)
Bachenko, J., Fitzpatrick, E., Schonwetter, M.: Verification and implementation of language-based deception indicators in civil and criminal narratives. In: Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, pp. 25–32 (2008)
Szarvas, G.: Hedge classification in biomedical texts with a weakly supervised selection of keywords. In: Proceedings of ACL-08: HLT, Columbus, OH, pp. 281–289 (2008)
Medlock, B., Briscoe, T.: Weakly supervised learning for hedge classification in scientific literature. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, pp. 992–999 (2007)
Riloff, E., Wiebe, J., Wilson, T.: Learning subjective nouns using extraction pattern bootstrapping. In: Proceedings of the Seventh Conference on Natural Language Learning, Edmonton, pp. 25–32 (2003)
Farkas, R., Vincze, V., Móra, G., Csirik, J., Szarvas, G.: The CoNLL-2010 shared task: learning to detect hedges and their scope in natural language text. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning: Shared Task, Uppsala, pp. 1–12 (2010)
Loughran, T., McDonald, B.: Textual analysis in accounting and finance: a survey. J. Account. Res. 54, 1187–1230 (2016)
Light, M., Qiu, X.Y., Srinivasan, P.: The language of bioscience: facts, speculations, and statements in between. In: HLT-NAACL 2004 Workshop: BioLINK 2004, Linking Biological Literature, Ontologies and Databases, Boston, MA, pp. 17–24 (2004)
Ganter, V., Strube, M.: Finding hedges by chasing weasels: hedge detection using Wikipedia tags and shallow linguistic features. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Singapore, pp. 173–176 (2009)
Li, F.: Textual analysis of corporate disclosures: a survey of the literature. J. Account. Lit. 29, 143–165 (2010)
Kearney, C., Liu, S.: Textual sentiment in finance: a survey of methods and models. Int. Rev. Financ. Anal. 33, 171–185 (2014)
Das, S.R.: Text and context: language analytics in finance. Found. Trends Financ. 8, 144–261 (2014)
Li, F.: Annual report readability, current earnings, and earnings persistence. J. Account. Econ. 45, 221–247 (2008)
Li, F.: The information content of forward-looking statements in corporate filings: a naïve Bayesian machine learning approach. J. Account. Res. 48, 1049–1102 (2010)
Loughran, T., McDonald, B., Yun, H.: A wolf in sheep’s clothing: the use of ethics-related terms in 10-K reports. J. Bus. Ethics 89, 39–49 (2009)
Loughran, T., McDonald, B.: When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Financ. 66, 35–65 (2011)
Loughran, T., McDonald, B.: Measuring readability in financial disclosures. J. Financ. 69, 1643–1671 (2014)
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37–46 (1960)
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977)
Fleiss, J.L.: Statistical Methods for Rates and Proportions, 2nd edn. John Wiley, New York (1981)
Bird, S., Loper, E., Klein, E.: Natural Language Processing with Python. O’Reilly Media, Sebastopol (2009)
Bird, S., Loper, E.: Natural language toolkit: taggers (2017). https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py. Accessed 27 Jan 2017
Honnibal, M.: Averaged perceptron tagger (2013). https://github.com/nltk/nltk/blob/develop/nltk/tag/perceptron.py. Accessed 27 Jan 2017
Honnibal, M.: A good part-of-speech tagger in about 200 lines of Python (2013). https://explosion.ai/blog/part-of-speech-pos-tagger-in-python. Accessed 27 Jan 2017
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)
Le Cessie, S., van Houwelingen, J.: Ridge estimators in logistic regression. Appl. Stat. 41, 191–201 (1992)
John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)
Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods-Support Vector Learning (1998)
Aha, D., Kibler, D.: Instance-based learning algorithms. Mach. Learn. 6, 37–66 (1991)
Cohen, W.W.: Fast effective rule induction. In: Proceedings of the Twelfth International Conference on Machine Learning, pp. 115–123 (1995)
Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
Breiman, L.: Random forests. Mach. Learn. 41, 5–32 (2001)
Acknowledgments
We thank Alexander Diete for his help with the data acquisition and technical advice as well as Clemens Müller for his help with the annotation. This work was supported by the SFB 884 on the Political Economy of Reforms at the University of Mannheim (project C4), funded by the German Research Foundation (DFG).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Theil, C.K., Štajner, S., Stuckenschmidt, H., Ponzetto, S.P. (2018). Automatic Detection of Uncertain Statements in the Financial Domain. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_48
DOI: https://doi.org/10.1007/978-3-319-77116-8_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77115-1
Online ISBN: 978-3-319-77116-8
eBook Packages: Computer Science, Computer Science (R0)