Abstract
This paper presents a methodology for evaluating the text similarity of articles that hybridizes a longest common subsequence (LCS) algorithm with SimHash computation. The method reduces the time and space required by the LCS algorithm by breaking each article into sentences, representing each sentence as a sequence of words, and using SimHash comparisons to organize and pair those sentences, so that LCS is computed only on paired sentences. This allows long articles to be evaluated rapidly while still identifying exactly which parts of the compared articles are similar and computing their similarity score.
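The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the MD5-based 64-bit token hash, the `max_distance` threshold, the nearest-fingerprint pairing rule, and the score normalization are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import hashlib
import re


def words(sentence):
    """Lowercase a sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", sentence.lower())


def sentences(article):
    """Split an article into non-empty word sequences, one per sentence."""
    parts = re.split(r"[.!?]+", article)
    return [w for w in (words(p) for p in parts) if w]


def simhash(tokens, bits=64):
    """Compute a SimHash fingerprint from per-token hashes (MD5 is an assumption)."""
    counts = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def lcs_length(a, b):
    """Length of the longest common subsequence, using two rows of the DP table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(article_a, article_b, max_distance=16):
    """Pair each sentence of A with its nearest sentence of B by SimHash,
    then run LCS only on pairs within max_distance (an assumed threshold).
    The score is matched words divided by total words in A (an assumed normalization).
    """
    sents_a = sentences(article_a)
    sents_b = [(s, simhash(s)) for s in sentences(article_b)]
    total = sum(len(s) for s in sents_a)
    if total == 0 or not sents_b:
        return 0.0
    matched = 0
    for s in sents_a:
        h = simhash(s)
        best, dist = min(((t, hamming(h, ht)) for t, ht in sents_b),
                         key=lambda pair: pair[1])
        if dist <= max_distance:
            matched += lcs_length(s, best)
    return matched / total
```

Calling `similarity(text_a, text_b)` returns a value between 0 and 1. Because LCS runs on short word sequences rather than on whole articles, each dynamic-programming table stays small, which is the time and space saving the abstract describes. The linear scan over fingerprints in this sketch could be replaced by an index over SimHash values for larger collections.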
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, SK., Chou, C. (2019). A Hybrid Methodology of Effective Text-Similarity Evaluation. In: Chang, CY., Lin, CC., Lin, HH. (eds) New Trends in Computer Technologies and Applications. ICS 2018. Communications in Computer and Information Science, vol 1013. Springer, Singapore. https://doi.org/10.1007/978-981-13-9190-3_24
DOI: https://doi.org/10.1007/978-981-13-9190-3_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9189-7
Online ISBN: 978-981-13-9190-3
eBook Packages: Computer Science (R0)