A comparative study on authorship attribution classification tasks using both neural network and statistical methods

Tsimboukakis, Nikos; Tambouratzis, George

doi:10.1007/s00521-009-0314-7

Original Article
Published: 01 November 2009

Neural Computing and Applications, Volume 19, pages 573–582 (2010)

Nikos Tsimboukakis¹ &
George Tambouratzis¹

Abstract

The present paper investigates the application of the multi-layer perceptron (MLP) to the task of categorizing texts based on their authors’ style. This task is of particular importance for information retrieval applications involving very large document databases. The emphasis of this article is on determining the extent to which the MLP model can be fine-tuned to successfully analyse such data, uncovering the stylistic differences among authors. The MLP-based method is compared and contrasted with statistical techniques, such as discriminant analysis, that are widely used in stylistic studies. The comparison of the methods is based on their classification performance, to provide an objective evaluation of the advantages of each method. A second aim of the study is to compare, for each method, the effectiveness of distinct features in uncovering the author's identity. To evaluate the effectiveness of the entire approach in greater depth, the results of the proposed MLP-based method are compared with those of established approaches: support vector machines (SVM), using both the original parameters employed by the MLP and term frequency–inverse document frequency (TF–IDF) parameters, and the cascade correlation approach. It is found that the proposed MLP-based approach possesses a number of advantages, such as high classification accuracy, broadly comparable to that of the SVM, coupled with the ability to algorithmically reduce the set of parameters used without adversely affecting the classification accuracy.
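For readers unfamiliar with the TF–IDF parameters mentioned in the abstract, the following is a minimal sketch of one common weighting variant. It is an illustration only: the abstract does not state which TF–IDF formula, tokenization, or corpus the study used, so the function name `tf_idf`, the normalization by document length, the unsmoothed logarithmic IDF, and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other variants exist; the study's exact choice is not given here.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical, not taken from the study).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "a dog barked".split(),
]
vectors = tf_idf(docs)
```

Each resulting dictionary can be turned into a fixed-length feature vector over the corpus vocabulary and passed to a classifier such as an SVM or an MLP. With this variant, a term that occurs in every document receives weight zero, which is why very common words contribute little to the representation.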

Acknowledgments

The present study was partly funded by the PENED 03ED97 research project of the General Secretariat for Research and Technology of the Hellenic Ministry of Development.

Author information

Authors and Affiliations

Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, Maroussi, 151 25, Greece
Nikos Tsimboukakis & George Tambouratzis

Corresponding author

Correspondence to Nikos Tsimboukakis.

About this article

Cite this article

Tsimboukakis, N., Tambouratzis, G. A comparative study on authorship attribution classification tasks using both neural network and statistical methods. Neural Comput & Applic 19, 573–582 (2010). https://doi.org/10.1007/s00521-009-0314-7

Received: 29 January 2009
Accepted: 15 October 2009
Published: 01 November 2009
Issue Date: June 2010
DOI: https://doi.org/10.1007/s00521-009-0314-7
