Abstract
Most statistical approaches to stylometry to date have focused on lexical methods, such as relative word frequencies or type-token ratios. Explicit attention to syntactic features has been comparatively rare. Those approaches that have used syntactic features have typically relied either on very shallow features (such as parts of speech) or on features derived from phrase structure grammars. This paper investigates whether typed dependency grammars might yield useful stylometric features.
An experiment was conducted using a novel method of representing information about typed dependencies. Each token in a text is replaced with a “DepWord,” which is a concise representation of the chain of grammatical dependencies from that token back to the root of the sentence. The resulting representation contains only syntactic information, with no lexical or orthographic information. These DepWords can then be used in place of the original words as input to statistical language processing methods.
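The abstract does not spell out the exact DepWord encoding, so the following is only a minimal sketch under stated assumptions: the parse is given as one (head, relation) pair per token, with heads 1-based and 0 marking the sentence root (a CoNLL-style convention), and a DepWord is formed by joining the relation labels from the token up to the root with a `/` separator. The function name `depwords`, the separator, and the Stanford-style labels in the example are illustrative choices, not the paper's.

```python
from typing import List, Tuple

# One parsed token: (head, relation). head is the 1-based index of the
# token's governor; 0 means the token is the sentence root.
Token = Tuple[int, str]


def depwords(sentence: List[Token], sep: str = "/") -> List[str]:
    """Replace each token with the chain of dependency labels leading from
    that token up to the sentence root (a hypothetical DepWord encoding).
    Only relation labels are used, so no lexical information survives."""
    result = []
    for i in range(len(sentence)):
        labels = []
        seen = set()  # guards against malformed (cyclic) parses
        idx = i + 1
        while idx != 0:
            if idx in seen:
                raise ValueError("cycle in dependency parse")
            seen.add(idx)
            head, rel = sentence[idx - 1]
            labels.append(rel)
            idx = head
        result.append(sep.join(labels))
    return result


if __name__ == "__main__":
    # "The cat sat on the mat" with illustrative Stanford-style labels.
    parse = [(2, "det"), (3, "nsubj"), (0, "root"),
             (3, "prep"), (6, "det"), (4, "pobj")]
    print(depwords(parse))
    # ['det/nsubj/root', 'nsubj/root', 'root', 'prep/root',
    #  'det/pobj/prep/root', 'pobj/prep/root']
```

Note that the two occurrences of "the" receive different DepWords because they sit at different positions in the dependency tree, while their spelling plays no role at all.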
I adapted a simple method of authorship attribution (nearest-neighbor classification based on word frequency rankings) for use with DepWords, and found that it performed comparably to the same technique trained on words or parts of speech, even outperforming lexical methods in some cases. This indicates that the grammatical dependency relations between words contain stylometric information sufficient for distinguishing authorship. These results suggest that further research into typed-dependency-based stylometry might prove fruitful.
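The attribution step can be sketched as follows. This is a hypothetical illustration rather than the paper's implementation: it assumes each text is reduced to a ranking of its k most frequent tokens, and that two rankings are compared with a rank distance (the sum of absolute rank differences, with a token missing from one ranking placed at rank k + 1 there). The names `rank_profile`, `rank_distance`, and `attribute` and the default `k = 50` are illustrative choices. Because the code only sees token strings, the same procedure applies unchanged to words, part-of-speech tags, or DepWords.

```python
from collections import Counter
from typing import Dict, Sequence


def rank_profile(tokens: Sequence[str], k: int) -> Dict[str, int]:
    """Map each of the k most frequent tokens to its rank (1 = most frequent).
    Ties in frequency are broken alphabetically so profiles are deterministic."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))[:k]
    return {tok: rank for rank, tok in enumerate(ordered, start=1)}


def rank_distance(a: Dict[str, int], b: Dict[str, int], k: int) -> int:
    """Sum of absolute rank differences over the union of ranked tokens.
    A token absent from one profile is assumed to have rank k + 1 there."""
    vocab = set(a) | set(b)
    return sum(abs(a.get(t, k + 1) - b.get(t, k + 1)) for t in vocab)


def attribute(unknown: Sequence[str],
              known: Dict[str, Sequence[str]],
              k: int = 50) -> str:
    """Nearest neighbor: return the candidate author whose training text
    has the rank profile closest to that of the unknown text."""
    target = rank_profile(unknown, k)
    return min(known,
               key=lambda author: rank_distance(target,
                                                rank_profile(known[author], k),
                                                k))


if __name__ == "__main__":
    known = {
        "A": ["x", "x", "x", "y", "y", "z"],
        "B": ["z", "z", "z", "y", "y", "x"],
    }
    print(attribute(["x", "x", "y"], known, k=3))  # A
```

In the toy example the unknown text ranks `x` first, matching author A's profile (distance 1) far better than author B's (distance 5). Rank-based comparison discards absolute frequencies, which makes it insensitive to differences in text length.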
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hollingsworth, C. (2012). Using Dependency-Based Annotations for Authorship Identification. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32790-2_38
DOI: https://doi.org/10.1007/978-3-642-32790-2_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32789-6
Online ISBN: 978-3-642-32790-2
eBook Packages: Computer Science, Computer Science (R0)