DOI: 10.1145/3033288.3033300
research-article

Analysis on the Effect of Term-Document's Matrix to the Accuracy of Latent-Semantic-Analysis-Based Cross-Language Plagiarism Detection

Published: 17 December 2016

Abstract

This paper presents the results of an experimental investigation into the impact of term-document matrix variations on the accuracy of cross-language LSA-based plagiarism detection. The experiment focused on comparing Indonesian and English papers. Increasing the document definition size used as the source for matrix construction had a significant negative impact on detection accuracy in all scenarios. The results showed that the document definition size must be kept below 10 to maintain high accuracy, and that performance was worst at a size of 25. Additionally, building the term-document matrix from word occurrence frequencies was found to improve detection accuracy compared with the binary implementation, which records only the presence or absence of words.
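
The two matrix variants contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `term_document_matrix`, the whitespace tokenizer, and the sample sentences are assumptions made for illustration, and the singular value decomposition step of LSA is omitted.

```python
from collections import Counter


def term_document_matrix(documents, binary=False):
    """Build a term-document matrix: one row per term, one column per document.

    Hypothetical helper for illustration only. With binary=False each cell
    holds the number of times the term occurs in the document; with
    binary=True each cell holds 1 if the term occurs at all, else 0.
    """
    # Naive tokenization (lowercase + whitespace split) is an assumption.
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = []
    for term in vocabulary:
        row = []
        for doc_counts in counts:
            freq = doc_counts[term]  # Counter returns 0 for missing terms
            row.append(min(freq, 1) if binary else freq)
        matrix.append(row)
    return vocabulary, matrix


docs = ["the cat sat on the mat", "the dog sat"]
vocab, freq_matrix = term_document_matrix(docs)
_, binary_matrix = term_document_matrix(docs, binary=True)

the_row = vocab.index("the")
print(freq_matrix[the_row])    # frequency variant keeps the repeat count
print(binary_matrix[the_row])  # binary variant keeps only presence
```

In the frequency variant the row for "the" distinguishes a document that repeats the word from one that uses it once; the binary variant discards that difference, which is the information the abstract reports as beneficial to detection accuracy.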


Published In

ICNCC '16: Proceedings of the Fifth International Conference on Network, Communication and Computing
December 2016
343 pages
ISBN: 9781450347938
DOI: 10.1145/3033288
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cross-Language Plagiarism Detection
  2. Latent Semantic Analysis
  3. Learning Vector Quantization
  4. Term-Document Matrix

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICNCC '16
