A Relational Unsupervised Approach to Author Identification

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8399)

Abstract

In recent decades, speaking and writing habits have changed. Many works have tackled the author identification task using frequency-based approaches, numeric techniques, or writing style analysis. Following the last approach, we propose a technique for author identification based on First-Order Logic. Specifically, we translate the complex data represented by natural language text into complex (relational) patterns that represent the writing style of an author. We then model an author as the result of clustering the relational descriptions associated with the sentences. The underlying idea is that such a model can express the typical way in which an author composes the sentences in his writings. If such writing habits can be mapped from the unknown-author model to the known-author model, we conclude that the author is the same. Preliminary results are promising, and the approach seems viable in real contexts because it needs no training phase and also performs well on short texts.
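
The pipeline outlined in the abstract (sentence descriptions, clustering into an author model, mapping one model onto another) can be illustrated with a deliberately simplified sketch. This is not the authors' First-Order Logic method: here each sentence is assumed to be a flat set of relational atoms written as strings, Jaccard similarity stands in for relational similarity, and the names `jaccard`, `cluster`, `model_overlap` and the `0.5` threshold are hypothetical choices made only for illustration.

```python
from typing import FrozenSet, List

# Assumption: a sentence is described by a flat set of relational atoms (strings).
Sentence = FrozenSet[str]


def jaccard(a: Sentence, b: Sentence) -> float:
    """Set-overlap similarity, used here as a stand-in for relational similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(sentences: List[Sentence], threshold: float = 0.5) -> List[Sentence]:
    """Greedy single-pass clustering.

    Each cluster is represented by the atoms shared by all of its members,
    so the returned list of patterns plays the role of an author model.
    """
    patterns: List[Sentence] = []
    for sentence in sentences:
        best_index, best_sim = None, threshold
        for i, pattern in enumerate(patterns):
            sim = jaccard(sentence, pattern)
            if sim >= best_sim:
                best_index, best_sim = i, sim
        if best_index is None:
            patterns.append(sentence)  # start a new cluster
        else:
            patterns[best_index] = patterns[best_index] & sentence  # keep shared atoms
    return patterns


def model_overlap(unknown: List[Sentence], known: List[Sentence],
                  threshold: float = 0.5) -> float:
    """Fraction of unknown-author patterns that map onto some known-author pattern."""
    if not unknown:
        return 0.0
    matched = sum(
        1 for u in unknown if any(jaccard(u, k) >= threshold for k in known)
    )
    return matched / len(unknown)


# Toy data: atom names are invented placeholders, not output of a real parser.
known_model = cluster([
    frozenset({"nsubj(v,n)", "dobj(v,n)", "det(n,d)"}),
    frozenset({"nsubj(v,n)", "dobj(v,n)", "amod(n,a)"}),
    frozenset({"prep(v,p)", "pobj(p,n)"}),
])
unknown_model = cluster([
    frozenset({"nsubj(v,n)", "dobj(v,n)"}),
    frozenset({"cc(v,c)", "conj(v,v)"}),
])
score = model_overlap(unknown_model, known_model)
```

A high `score` would suggest that the unknown text reuses the sentence patterns of the known author; how high it must be to accept authorship is a separate decision that this sketch does not address.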

Acknowledgments

We wish to express our sincere thanks to Paolo Gissi for many useful discussions and for the inspiring concept of the gray zone. This work was partially funded by the Italian FAR project DM19410 MBLab “Laboratorio di Bioinformatica per la Biodiversità Molecolare” and the Italian PON 2007-2013 project PON02_00563_3489339 “Puglia@Service”.

Author information

Corresponding author

Correspondence to Stefano Ferilli.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Leuzzi, F., Ferilli, S., Rotella, F. (2014). A Relational Unsupervised Approach to Author Identification. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2013. Lecture Notes in Computer Science, vol 8399. Springer, Cham. https://doi.org/10.1007/978-3-319-08407-7_14

  • DOI: https://doi.org/10.1007/978-3-319-08407-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08406-0

  • Online ISBN: 978-3-319-08407-7

  • eBook Packages: Computer Science, Computer Science (R0)
