A Hybrid Methodology of Effective Text-Similarity Evaluation

  • Conference paper
New Trends in Computer Technologies and Applications (ICS 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1013)

Abstract

This paper presents a methodology that combines a longest common subsequence (LCS) algorithm with SimHash computation to evaluate the text similarity of articles. It reduces the time and space required by the LCS algorithm by breaking each article into per-sentence word subsequences, then managing and pairing those subsequences through SimHash comparisons. Long articles can thus be evaluated rapidly, while the similar parts and the similarity score of the compared articles are still determined exactly.
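The pipeline the abstract outlines (split into sentences, pair sentences by SimHash, run LCS only on the paired sentences) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence splitter, the MD5-based word hash, the 64-bit fingerprint width, the greedy nearest-fingerprint pairing, the `max_distance` threshold, and the score definition are all assumptions chosen for illustration.

```python
import hashlib
import re


def sentences(text):
    """Split text into sentences, each a list of lowercase words (assumed tokenizer)."""
    word_lists = (re.findall(r"\w+", p.lower()) for p in re.split(r"[.!?]+", text))
    return [words for words in word_lists if words]


def simhash(words, bits=64):
    """Unweighted SimHash: each word votes +1/-1 on every bit of the fingerprint."""
    counts = [0] * bits
    for w in words:
        # MD5 is only an assumed stand-in for whatever word hash is actually used.
        h = int.from_bytes(hashlib.md5(w.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def lcs_length(a, b):
    """Classic dynamic-programming LCS length over two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(text_a, text_b, max_distance=16):
    """Fraction of words in text_a covered by LCS matches against paired sentences."""
    sa, sb = sentences(text_a), sentences(text_b)
    total = sum(len(s) for s in sa)
    if not sb or total == 0:
        return 0.0
    hb = [simhash(s) for s in sb]
    matched = 0
    for s in sa:
        h = simhash(s)
        # Greedy pairing (assumption): take the sentence with the nearest fingerprint.
        j = min(range(len(sb)), key=lambda k: hamming(h, hb[k]))
        if hamming(h, hb[j]) <= max_distance:
            matched += lcs_length(s, sb[j])
    return matched / total
```

Because LCS runs only on sentence pairs that pass the fingerprint check, each dynamic-programming table is bounded by two sentence lengths rather than two article lengths, which is the kind of time and space saving the abstract describes.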

Author information

Corresponding author

Correspondence to Shu-Kai Yang.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yang, SK., Chou, C. (2019). A Hybrid Methodology of Effective Text-Similarity Evaluation. In: Chang, CY., Lin, CC., Lin, HH. (eds) New Trends in Computer Technologies and Applications. ICS 2018. Communications in Computer and Information Science, vol 1013. Springer, Singapore. https://doi.org/10.1007/978-981-13-9190-3_24

  • DOI: https://doi.org/10.1007/978-981-13-9190-3_24

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9189-7

  • Online ISBN: 978-981-13-9190-3

  • eBook Packages: Computer Science (R0)
