Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection: LDA and POS Tags Based Plagiarism Detection

Ali Daud, Jamal Ahmad Khan, Jamal Abdul Nasir, Rabeeh Ayaz Abbasi, Naif Radi Aljohani, Jalal S. Alowibdi
Copyright: © 2018 | Volume: 14 | Issue: 3 | Pages: 17
ISSN: 1552-6283 | EISSN: 1552-6291 | EISBN13: 9781522542926 | DOI: 10.4018/IJSWIS.2018070103
Cite Article

MLA

Daud, Ali, et al. "Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection: LDA and POS Tags Based Plagiarism Detection." IJSWIS vol.14, no.3 2018: pp.53-69. http://doi.org/10.4018/IJSWIS.2018070103

APA

Daud, A., Khan, J. A., Nasir, J. A., Abbasi, R. A., Aljohani, N. R., & Alowibdi, J. S. (2018). Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection: LDA and POS Tags Based Plagiarism Detection. International Journal on Semantic Web and Information Systems (IJSWIS), 14(3), 53-69. http://doi.org/10.4018/IJSWIS.2018070103

Chicago

Daud, Ali, et al. "Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection: LDA and POS Tags Based Plagiarism Detection," International Journal on Semantic Web and Information Systems (IJSWIS) 14, no.3: 53-69. http://doi.org/10.4018/IJSWIS.2018070103

Abstract

In this article, we present a new method for external plagiarism detection that combines semantic and syntactic information. The proposed approach uses Latent Dirichlet Allocation (LDA) together with part-of-speech (POS) tags to detect plagiarism between a sample document and a number of source documents. The underlying hypothesis is that considering both the semantic and the syntactic information shared by two text documents may improve plagiarism detection performance. The method has two steps: a pre-processing step, in which we use LDA to detect the topics of the sentences in each document and convert each sentence into an array of POS tags; and a post-processing step, in which the suspicious cases are verified purely on the basis of semantic rules. For two types of external plagiarism (copy and random obfuscation), we empirically compare our approach with state-of-the-art N-gram based and stop-word N-gram based methods and observe significant improvements.
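For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of N-gram and stop-word N-gram matching. It is not the authors' implementation and not the proposed LDA and POS method. The abstract does not give the value of N, the stop-word list, or the similarity measure, so the trigram default, the tiny `STOP_WORDS` set, and the containment score used here are all illustrative assumptions, as are the function names.

```python
import re

# Hypothetical, deliberately small stop-word list (an assumption for this sketch;
# the abstract does not say which list the baseline methods use).
STOP_WORDS = {
    "the", "of", "and", "a", "in", "to", "is", "was", "it",
    "for", "with", "on", "that", "by", "as", "at",
}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(words, n):
    """Return the set of contiguous word n-grams in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspicious, source, n=3, stop_words_only=False):
    """Fraction of the suspicious text's n-grams that also occur in the source.

    With stop_words_only=True, every token that is not a stop word is dropped
    first, so only the sequence of function words is compared.
    """
    susp_tokens = tokens(suspicious)
    src_tokens = tokens(source)
    if stop_words_only:
        susp_tokens = [w for w in susp_tokens if w in STOP_WORDS]
        src_tokens = [w for w in src_tokens if w in STOP_WORDS]
    susp_grams = ngrams(susp_tokens, n)
    if not susp_grams:
        return 0.0  # too short to form a single n-gram
    return len(susp_grams & ngrams(src_tokens, n)) / len(susp_grams)


if __name__ == "__main__":
    source = "the cat sat on the mat and looked at the dog"
    copied = "the cat sat on the mat"
    reworded = "the feline rested on the rug and stared at the hound"

    print(containment(copied, source))
    print(containment(reworded, source))
    print(containment(reworded, source, stop_words_only=True))
```

In this toy example, the verbatim copy shares all of its word trigrams with the source, while the reworded sentence shares none. The stop-word variant still matches the reworded sentence because the order of its function words is unchanged, which is the intuition behind stop-word N-gram methods.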
