DOI: 10.1145/1363686.1363788

Author identification using writer-dependent and writer-independent strategies

Published: 16 March 2008

Abstract

In this work we discuss author identification for documents written in Portuguese. Two different approaches were compared. The first is the writer-independent model, which reduces the pattern recognition problem to a single model and two classes and therefore makes it possible to build a robust system even when few genuine samples per writer are available. The second is the personal (writer-dependent) model, which very often performs better but needs a larger number of samples per writer. We also introduce a stylometric feature set based on the conjunctions and adverbs of the Portuguese language. Experiments on a database of short articles from 30 different authors, with a Support Vector Machine (SVM) as classifier, demonstrate that the proposed strategy can produce results comparable to those reported in the literature.
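To illustrate how a writer-independent model can reduce a many-author problem to two classes, the sketch below pairs up documents and labels each pair as "same author" or "different author". It is only an assumption about one possible realization: the abstract does not say how pairs are formed or compared, so the absolute-difference comparison, the tiny word list `FUNCTION_WORDS`, and both function names are hypothetical and are not the paper's feature set or method.

```python
from itertools import combinations

# Hypothetical, tiny list of Portuguese conjunctions and adverbs.
# The paper's actual feature set is not reproduced here.
FUNCTION_WORDS = ["e", "mas", "ou", "porque", "não", "muito"]


def stylometric_vector(text):
    """Relative frequency of each listed word among the text's tokens."""
    tokens = text.lower().split()
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [tokens.count(word) / total for word in FUNCTION_WORDS]


def dissimilarity_pairs(samples):
    """Convert (author, vector) samples into two-class pair examples.

    Each pair is represented by the element-wise absolute difference of
    the two vectors (an assumed dissimilarity measure). The label is 1
    when both texts share an author and 0 otherwise.
    """
    pairs = []
    for (author_a, vec_a), (author_b, vec_b) in combinations(samples, 2):
        diff = [abs(x - y) for x, y in zip(vec_a, vec_b)]
        pairs.append((diff, 1 if author_a == author_b else 0))
    return pairs


texts = [("A", "e mas e"), ("A", "e ou"), ("B", "porque não muito")]
samples = [(author, stylometric_vector(text)) for author, text in texts]
pairs = dissimilarity_pairs(samples)
```

The resulting pair vectors and 0/1 labels could then be passed to any binary classifier, such as an SVM. Because the classifier only ever sees "same" versus "different", a single model serves all writers, which matches the abstract's point that few samples per writer are needed.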



      Published In

      SAC '08: Proceedings of the 2008 ACM symposium on Applied computing
      March 2008
      2586 pages
      ISBN:9781595937537
      DOI:10.1145/1363686
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. author identification
      2. stylometry

      Qualifiers

      • Research-article

      Conference

      SAC '08
      SAC '08: The 2008 ACM Symposium on Applied Computing
      March 16 - 20, 2008
      Fortaleza, Ceara, Brazil

      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

      Cited By

      • (2019) What are neural networks not good at? On artificial creativity. Big Data & Society 6:1. DOI: 10.1177/2053951719839433. Online publication date: 9-Apr-2019
      • (2019) Representation Learning and Dissimilarity for Writer Identification. 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), pages 63-68. DOI: 10.1109/IWSSIP.2019.8787293. Online publication date: Jun-2019
      • (2019) The dissimilarity approach: a review. Artificial Intelligence Review. DOI: 10.1007/s10462-019-09746-z. Online publication date: 2-Aug-2019
      • (2018) A Computational Approach for Authorship Attribution on Multiple Languages. 2018 International Joint Conference on Neural Networks (IJCNN), pages 1-8. DOI: 10.1109/IJCNN.2018.8489704. Online publication date: Jul-2018
      • (2017) Off-line writer identification using handcrafted features versus ConvNets. 2017 36th International Conference of the Chilean Computer Science Society (SCCC), pages 1-8. DOI: 10.1109/SCCC.2017.8405123. Online publication date: Oct-2017
      • (2017) An extensive study of authorship authentication of Arabic articles. International Journal of Web Information Systems 13:1, pages 85-104. DOI: 10.1108/IJWIS-03-2016-0011. Online publication date: 18-Apr-2017
      • (2016) Pairwise Comparative Classification for Translator Stylometric Analysis. ACM Transactions on Asian and Low-Resource Language Information Processing 16:1, pages 1-26. DOI: 10.1145/2898997. Online publication date: 27-Jun-2016
      • (2016) A computational approach for authorship attribution of literary texts using sintatic features. 2016 International Joint Conference on Neural Networks (IJCNN), pages 4835-4842. DOI: 10.1109/IJCNN.2016.7727835. Online publication date: Jul-2016
      • (2014) On authorship authentication of Arabic articles. 2014 5th International Conference on Information and Communication Systems (ICICS), pages 1-6. DOI: 10.1109/IACS.2014.6841973. Online publication date: Apr-2014
      • (2012) A new document author representation for authorship attribution. Proceedings of the 4th Mexican conference on Pattern Recognition, pages 283-292. DOI: 10.1007/978-3-642-31149-9_29. Online publication date: 27-Jun-2012
