Abstract
We survey authorship attribution: assigning a document to an author given stylistic characteristics of the author’s writing extracted from a corpus of known works, e.g., when authenticating disputed documents or literary works. Although the pioneering paper, based on word-length histograms, appeared at the very end of the nineteenth century, the resolving power of this and other stylometric approaches has yet to be studied adequately, both theoretically and through case studies in which additional information can help establish the correct attribution.
We survey several theoretical approaches, including ones that approximate the apparently nearly optimal approach based on Kolmogorov conditional complexity, and some case studies: attributing the Shakespeare canon and newly discovered works, as well as newly discovered works allegedly by M. Twain; binary (Madison vs. Hamilton) discrimination of the Federalist Papers using Naive Bayes and other classifiers; and testing for the presence of steganography. The last topic is complemented by a sketch of a study of anagram ambiguity based on Shannon’s theory of cryptography.
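As a rough illustration of what approximating Kolmogorov conditional complexity can look like in practice, the sketch below replaces the uncomputable complexity with the output length of an off-the-shelf compressor. It estimates the complexity of a query text given an author's corpus as C(corpus + query) - C(corpus) and picks the author with the smallest estimate. The choice of zlib, the function names, and the toy inputs are assumptions made for this example, not the procedure of the chapter.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))


def conditional_complexity(corpus: str, query: str) -> int:
    """Estimate the complexity of query given corpus as C(corpus + query) - C(corpus)."""
    known = corpus.encode("utf-8")
    both = known + query.encode("utf-8")
    return compressed_size(both) - compressed_size(known)


def attribute(query: str, corpora: Mapping[str, str]) -> str:
    """Return the candidate author whose corpus gives the smallest estimate."""
    return min(corpora, key=lambda name: conditional_complexity(corpora[name], query))


if __name__ == "__main__":
    # Hypothetical toy corpora; real use needs large samples of known works.
    corpora = {
        "A": "the quick brown fox jumps over the lazy dog near the quiet river bank " * 20,
        "B": "colorless green ideas sleep furiously while purple elephants calculate integrals " * 20,
    }
    query = "the quick brown fox jumps over the lazy dog near the quiet river bank "
    print(attribute(query, corpora))
```

zlib's DEFLATE format only looks back over a 32 KB window, so for corpora of realistic size a compressor with a longer context would probably be a better stand-in; the subtraction scheme itself stays the same.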
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Malyutov, M.B. (2006). Authorship Attribution of Texts: A Review. In: Ahlswede, R., et al. General Theory of Information Transfer and Combinatorics. Lecture Notes in Computer Science, vol 4123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11889342_20
DOI: https://doi.org/10.1007/11889342_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46244-6
Online ISBN: 978-3-540-46245-3
eBook Packages: Computer Science, Computer Science (R0)