A comparative study on authorship attribution classification tasks using both neural network and statistical methods

  • Original Article
  • Neural Computing and Applications

Abstract

The present paper investigates the application of the multi-layer perceptron (MLP) to the task of categorizing texts based on their authors’ style. This task is of particular importance for information retrieval applications involving very large document databases. The emphasis of this article is on determining the extent to which the MLP model can be fine-tuned to successfully analyse such data, uncovering the stylistic differences among authors. The MLP-based method is compared and contrasted with statistical techniques, such as discriminant analysis, that are widely used in stylistic studies. The comparison of the methods is based on their classification performance, to provide an objective evaluation of the advantages of each method. A second aim of the study presented here is to compare, for each method, the effectiveness of distinct features in the task of uncovering the author's identity. To evaluate the effectiveness of the entire approach in greater depth, the results of the proposed MLP-based method are compared with those of two established approaches: support vector machines (SVM), using both the original parameters employed by the MLP and term frequency–inverse document frequency (TF–IDF) parameters, and the cascade correlation approach. It is found that the proposed MLP-based approach possesses a number of advantages, such as high classification accuracy, broadly comparable to that of the SVM, coupled with the ability to algorithmically reduce the set of parameters used without adversely affecting the classification accuracy.
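
As an illustration of the TF–IDF parameters mentioned above, the following minimal Python sketch computes one common variant of the weighting: the relative term frequency multiplied by the logarithm of the inverse document frequency. The function name `tf_idf`, the toy corpus, and the choice of variant are assumptions made for illustration only; the exact weighting and features used in the study are not specified in this abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc length) * ln(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of three short "documents".
docs = [
    "the senate debated the bill".split(),
    "the poet praised the sea".split(),
    "the senate praised the poet".split(),
]
vectors = tf_idf(docs)
```

In this sketch a term that occurs in every document (here "the") receives a weight of zero, while a term confined to a single document (such as "bill") receives the highest weight in that document. Vectors of this kind could then be passed to a classifier such as an SVM.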

Acknowledgments

The present study was partly funded by the PENED 03ED97 research project of the General Secretariat for Research and Technology of the Hellenic Ministry of Development.

Author information

Corresponding author

Correspondence to Nikos Tsimboukakis.

About this article

Cite this article

Tsimboukakis, N., Tambouratzis, G. A comparative study on authorship attribution classification tasks using both neural network and statistical methods. Neural Comput & Applic 19, 573–582 (2010). https://doi.org/10.1007/s00521-009-0314-7

  • DOI: https://doi.org/10.1007/s00521-009-0314-7
