Sentence-based natural language plagiarism detection

Published: 01 December 2004

Abstract

As access to higher education in the United Kingdom widens, class sizes have grown to the point where tutors cannot realistically be expected to identify peer-to-peer plagiarism by eye, so automated solutions are required. This document details a novel algorithm for comparing suspect documents at the sentence level. The algorithm has been implemented as a component of plagiarism detection software that detects similarities both in natural language documents and in comments within program source code. It is capable of detecting sophisticated obfuscation (such as paraphrasing, reordering, merging, and splitting sentences) as well as direct copying. The implementation has been used successfully to detect plagiarism in real assignments at the university, and the software has been evaluated by comparison with other plagiarism detection tools.
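
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of what sentence-level comparison can look like, not the authors' method. It assumes a naive punctuation-based sentence splitter and a word-set containment score; the function names (`sentences`, `words`, `overlap`, `suspicious_pairs`) and the `0.8` threshold are hypothetical choices made for this example.

```python
import re


def sentences(text):
    """Split text into sentences on ., ! or ? (a deliberately naive splitter)."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def words(sentence):
    """Lower-cased set of word tokens; word order is discarded on purpose."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def overlap(a, b):
    """Shared words divided by the size of the smaller word set (containment).

    Dividing by the smaller set keeps the score high when one sentence has
    been split out of, or merged into, a longer one.
    """
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


def suspicious_pairs(doc_a, doc_b, threshold=0.8):
    """Return (sentence_a, sentence_b, score) for pairs at or above threshold.

    The threshold is an assumed value for illustration, not one from the paper.
    """
    hits = []
    for sa in sentences(doc_a):
        for sb in sentences(doc_b):
            score = overlap(sa, sb)
            if score >= threshold:
                hits.append((sa, sb, score))
    return hits


if __name__ == "__main__":
    original = "The cat sat on the mat. Dogs bark loudly."
    suspect = "On the mat the cat sat. Birds sing."
    for sa, sb, score in suspicious_pairs(original, suspect):
        print(f"{score:.2f}  {sa!r}  <->  {sb!r}")
```

Because each sentence is reduced to a set of words, reordering words within a sentence does not change the score, and the containment measure tolerates a sentence being split or merged. This sketch does nothing for genuine paraphrasing with different vocabulary, and very short sentences can score highly by chance, so a practical tool would need further safeguards.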

Published In

Journal on Educational Resources in Computing  Volume 4, Issue 4
December 2004
55 pages
ISSN:1531-4278
EISSN:1531-4278
DOI:10.1145/1086339
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Natural language
  2. plagiarism detection

Cited By

  • (2022) TransVis: Integrated Distant and Close Reading of Othello Translations. IEEE Transactions on Visualization and Computer Graphics 28:2 (1397-1414). DOI: 10.1109/TVCG.2020.3012778. Online publication date: 1-Feb-2022
  • (2022) An external plagiarism detection system based on part-of-speech (POS) tag n-grams and word embedding. Expert Systems with Applications: An International Journal 197:C. DOI: 10.1016/j.eswa.2022.116677. Online publication date: 18-May-2022
  • (2021) Plagiarism Detection with Optical Character Recognition. i-manager's Journal on Computer Science 9:1 (15). DOI: 10.26634/jcom.9.1.18098. Online publication date: 2021
  • (2020) AlignVis: Semi-automatic Alignment and Visualization of Parallel Translations. 2020 24th International Conference Information Visualisation (IV) (98-108). DOI: 10.1109/IV51561.2020.00026. Online publication date: Sep-2020
  • (2019) iChecker. Scholarly Ethics and Publishing (232-247). DOI: 10.4018/978-1-5225-8057-7.ch011. Online publication date: 2019
  • (2017) Plagiarism Detection in Malayalam Language Text using a Composition of Similarity measures. Proceedings of the 9th International Conference on Machine Learning and Computing (456-460). DOI: 10.1145/3055635.3056655. Online publication date: 24-Feb-2017
  • (2017) Constructive Visual Analytics for Text Similarity Detection. Computer Graphics Forum 36:1 (237-248). DOI: 10.1111/cgf.12798. Online publication date: 1-Jan-2017
  • (2017) Fast Plagiarism Detection in Large-Scale Data. Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation (329-343). DOI: 10.1007/978-3-319-58274-0_27. Online publication date: 27-Apr-2017
  • (2016) iChecker. International Journal of Systems and Service-Oriented Engineering 6:3 (16-31). DOI: 10.4018/IJSSOE.2016070102. Online publication date: 1-Jul-2016
  • (2015) Sentence-Based Plagiarism Detection for Japanese Document Based on Common Nouns and Part-of-Speech Structure. Intelligent Software Methodologies, Tools and Techniques (297-308). DOI: 10.1007/978-3-319-17530-0_21. Online publication date: 7-May-2015
