On Automatic Plagiarism Detection Based on n-Grams Comparison

Barrón-Cedeño, Alberto; Rosso, Paolo

doi:10.1007/978-3-642-00958-7_69

Alberto Barrón-Cedeño¹⁹ &
Paolo Rosso¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5478))

Included in the following conference series:

European Conference on Information Retrieval

3476 Accesses
45 Citations

Abstract

When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is to locate plagiarised fragments that have been modified (by rewording, insertion or deletion, for example) from the source text.

The definition of proper text chunks as comparison units of the suspicious and original texts is crucial for the success of this kind of applications. Our experiments with the METER corpus show that the best results are obtained when considering low level word n-grams comparisons (n = {2,3}).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Clough, P., Gaizauskas, R., Piao, S.: Building and Annotating a Corpus for the Study of Journalistic Text Reuse. In: 3rd International Conference on Language Resources and Evaluation (LREC 2002), V, pp. 1678–1691. Las Palmas, Spain (2002)
Google Scholar
Kang, N., Gelbukh, A.: PPChecker: Plagiarism Pattern Checker in Document Copy Detection. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS, vol. 4188, pp. 661–667. Springer, Heidelberg (2006)
Chapter Google Scholar
Lyon, C., Malcolm, J., Dickerson, B.: Detecting Short Passages of Similar Text in Large Document Collections. In: Conference on Empirical Methods in Natural Language Processing, Pennsylvania, pp. 118–125 (2001)
Google Scholar
Lyon, C., Barrett, R., Malcolm, J.: A Theoretical Basis to the Automated Detection of Copying Between Texts, and its Practical Implementation in the Ferret Plagiarism and Collusion Detector. In: Plagiarism: Prevention, Practice and Policies Conference, Newcastle, UK (2004)
Google Scholar
Porter, M.F.: An Algorithm for Suffix Stripping. Program 14(3), 130–137 (1980)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Natural Language Engineering Lab. Dpto. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
Alberto Barrón-Cedeño & Paolo Rosso

Authors

Alberto Barrón-Cedeño
View author publications
You can also search for this author in PubMed Google Scholar
Paolo Rosso
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Université de Toulouse - IRIT,, 118 Route de Narbonne,, 31062,, Toulouse Cedex 4,, France
Mohand Boughanem
Laboratoire d’Informatique de Grenoble, BP 53,, Université Joseph Fourier,, 38041, Grenoble Cedex 9,, France
Catherine Berrut
Université de Toulouse - IRIT,, 118 Route de Narbonne,, 31062, Toulouse Cedex 4,, France
Josiane Mothe & Chantal Soule-Dupuy &

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Barrón-Cedeño, A., Rosso, P. (2009). On Automatic Plagiarism Detection Based on n-Grams Comparison. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_69

Download citation

DOI: https://doi.org/10.1007/978-3-642-00958-7_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics