Abstract
This paper aims to improve intrinsic plagiarism detection. It investigates the character n-gram profiles method proposed by Stamatatos and improves its performance by tuning its parameter settings and by proposing new modifications and rich feature sets. We raised the overall plagdet score from 24.67% to 33.41% on the PAN-PC09 corpus and from 18.83% to 26.66% on the PAN-PC11 corpus. Both corpora are especially well suited to this task and were previously used in the Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN) competitions.
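For orientation, the baseline idea behind character n-gram profiles can be sketched as follows: a sliding window moves over the document, and the n-gram profile of each window is compared with the profile of the whole document, so that windows with unusually high dissimilarity become candidates for plagiarised passages. The sketch below is only an illustration under stated assumptions, not the optimised method of this paper. The function names, the lowercasing step, and the default window length, step, and n are illustrative choices, and the dissimilarity is the normalised measure as we understand it from Stamatatos's description.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams (text lowercased; an assumption)."""
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def nd1(window_profile, doc_profile):
    """Normalised dissimilarity in [0, 1], summed over the window's n-grams only."""
    if not window_profile:
        return 0.0
    total = 0.0
    for gram, f_win in window_profile.items():
        f_doc = doc_profile.get(gram, 0.0)
        total += (2 * (f_win - f_doc) / (f_win + f_doc)) ** 2
    return total / (4 * len(window_profile))


def style_change(text, window=1000, step=200, n=3):
    """Return (offset, dissimilarity) pairs for each sliding window.

    window, step and n are hypothetical defaults chosen for illustration.
    """
    doc_profile = ngram_profile(text, n)
    last_start = max(len(text) - window, 0)
    return [
        (start, nd1(ngram_profile(text[start:start + window], n), doc_profile))
        for start in range(0, last_start + 1, step)
    ]
```

Calling `style_change(document_text)` yields one dissimilarity value per window. A detector would then flag windows whose value exceeds some threshold, and choosing that threshold is exactly the kind of parameter setting that tuning addresses.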
References
McCabe, D.: Levels of cheating and plagiarism remain high. Technical report, Duke University, Center for Academic Integrity (2005)
Sheard, J., Dick, M., Markham, S., MacDonald, I., Walsh, M.: Cheating and plagiarism: Perceptions and practices of first year IT students. In: Caspersen, M.E., Joyce, D., Goelman, D., Utting, I. (eds.) Seventh Annual Conference on Innovation and Technology in Computer Science Education, pp. 183–187 (2002)
Stein, B., Lipka, N., Prettenhofer, P.: Intrinsic plagiarism analysis. Language Resources and Evaluation 45(1), 63–82 (2011)
Oberreuter, G., L’Huillier, G., Ríos, S.A., Velásquez, J.D.: Approaches for intrinsic and external plagiarism detection - Notebook for PAN at CLEF 2011. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)
Potthast, M., Eiselt, A., Barrón-Cedeño, A., Stein, B., Rosso, P.: Overview of the 3rd international competition on plagiarism detection. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)
Kestemont, M., Luyckx, K., Daelemans, W.: Intrinsic plagiarism detection using character trigram distance scores - Notebook for PAN at CLEF 2011. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)
Stamatatos, E.: Intrinsic plagiarism detection using character n-gram profiles. In: Stein, B., Rosso, P., Stamatatos, E., Koppel, M., Agirre, E. (eds.) SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009), pp. 38–46 (2009)
Akiva, N.: Using clustering to identify outlier chunks of text - Notebook for PAN at CLEF 2011. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)
Seaward, L., Matwin, S.: Intrinsic plagiarism detection using complexity analysis. In: Stein, B., Rosso, P., Stamatatos, E., Koppel, M., Agirre, E. (eds.) SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009), pp. 56–61 (2009)
Potthast, M., Eiselt, A., Stein, B., Barrón-Cedeño, A., Rosso, P.: Plagiarism Corpus PAN-PC 2009 (2009), http://www.webis.de/research/corpora
Potthast, M., Stein, B., Barrón-Cedeño, A., Rosso, P.: An Evaluation Framework for Plagiarism Detection. In: Huang, C.R., Jurafsky, D. (eds.) 23rd International Conference on Computational Linguistics (COLING 2010), pp. 997–1005. Association for Computational Linguistics (2010)
Barrón-Cedeño, A., Potthast, M., Rosso, P., Stein, B., Eiselt, A.: Corpus and Evaluation Measures for Automatic Plagiarism Detection. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) 7th Conference on International Language Resources and Evaluation (LREC 2010). European Language Resources Association (ELRA) (2010)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kuta, M., Kitowski, J. (2014). Optimisation of Character n-gram Profiles Method for Intrinsic Plagiarism Detection. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8468. Springer, Cham. https://doi.org/10.1007/978-3-319-07176-3_44
DOI: https://doi.org/10.1007/978-3-319-07176-3_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07175-6
Online ISBN: 978-3-319-07176-3
eBook Packages: Computer Science, Computer Science (R0)