Complete Syntactic N-grams as Style Markers for Authorship Attribution

  • Conference paper
Human-Inspired Computing and Its Applications (MICAI 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8856)

Abstract

In this paper we present an authorship attribution method that uses complete syntactic n-grams (non-continuous, with bifurcations) as style markers. Syntactic n-grams are obtained by following paths in subtrees of a syntactic tree. We work with relatively short text fragments and build author profiles of various sizes using the tf-idf scheme. We train an SVM classifier to perform the task. We compare the method with one based on character n-grams and show that accuracy increases when complete syntactic n-grams are used.
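
As a rough illustration of the idea in the abstract, the following Python sketch enumerates every connected subtree of exactly n nodes in a small syntactic tree, so that both plain paths and subtrees with bifurcations are produced, and then weights the resulting features with a basic tf-idf. It is not the authors' implementation. The tree encoding `(label, [children])`, the function names `rooted`, `spread`, `sn_grams` and `tf_idf`, the bracket-and-comma string notation, and the particular tf-idf variant (raw count times log(N/df)) are all assumptions made for this example.

```python
from collections import Counter
import math

# Assumed encoding: a node is (label, [child nodes]), children in sentence order.

def rooted(node, k):
    """Yield encodings of all connected subtrees with exactly k nodes rooted at node."""
    label, children = node
    if k == 1:
        yield label
        return
    for parts in spread(children, k - 1):
        if len(parts) == 1:
            # A single branch continues the path.
            yield f"{label} {parts[0]}"
        else:
            # Several branches form a bifurcation, written in brackets.
            yield f"{label} [{', '.join(parts)}]"

def spread(children, budget):
    """Yield lists of child-subtree encodings that use exactly `budget` nodes in total."""
    if budget == 0:
        yield []
        return
    if not children:
        return
    first, rest = children[0], children[1:]
    # Option 1: leave the first child out of the subtree.
    yield from spread(rest, budget)
    # Option 2: give the first child between 1 and `budget` nodes.
    for take in range(1, budget + 1):
        for enc in rooted(first, take):
            for tail in spread(rest, budget - take):
                yield [enc] + tail

def sn_grams(tree, n):
    """Collect the size-n subtrees rooted at every node of the tree."""
    out, stack = [], [tree]
    while stack:
        node = stack.pop()
        out.extend(rooted(node, n))
        stack.extend(node[1])
    return out

def tf_idf(docs):
    """docs: list of feature lists. Returns one {feature: weight} dict per document."""
    n_docs = len(docs)
    df = Counter(f for doc in docs for f in set(doc))
    return [
        {f: c * math.log(n_docs / df[f]) for f, c in Counter(doc).items()}
        for doc in docs
    ]

# Toy tree for "John likes red apples" (hand-built, not parser output).
tree = ("likes", [("John", []), ("apples", [("red", [])])])

for n in (2, 3, 4):
    print(n, sn_grams(tree, n))
```

For n = 3 the toy tree yields one plain path and one n-gram with a bifurcation, the kind of feature that a purely path-based extraction would not produce. In a full pipeline the tf-idf vectors would be passed to a classifier such as an SVM. That step is left out here to keep the sketch dependency-free.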

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Posadas-Duran, JP., Sidorov, G., Batyrshin, I. (2014). Complete Syntactic N-grams as Style Markers for Authorship Attribution. In: Gelbukh, A., Espinoza, F.C., Galicia-Haro, S.N. (eds) Human-Inspired Computing and Its Applications. MICAI 2014. Lecture Notes in Computer Science, vol 8856. Springer, Cham. https://doi.org/10.1007/978-3-319-13647-9_2

  • DOI: https://doi.org/10.1007/978-3-319-13647-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13646-2

  • Online ISBN: 978-3-319-13647-9

  • eBook Packages: Computer Science, Computer Science (R0)
