Reducing Computational Effort for Plagiarism Detection with Approximate String Matching

  • Conference paper
Recent Advances on Soft Computing and Data Mining (SCDM 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 700)

Abstract

Large numbers of documents are now created in digital form and distributed worldwide. Digital material is easy to publish and can be copied at very low cost. As a result, illegal copying is widespread and growing, and plagiarism has become a significant social issue, so there is a strong need for systems that detect it. We have developed a plagiarism detection method that compares documents using approximate string matching, together with a technique that reduces the computation time of the comparison. In this paper, we demonstrate the usefulness of the proposed method through experiments, evaluated in terms of precision and recall.
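
To make the two ideas named in the abstract concrete, the following is a minimal sketch and not the authors' method, which this preview does not describe. It assumes "approximate string matching" means counting matching characters at every alignment of a pattern against a text (string matching with mismatches), and it assumes precision and recall are computed over sets of flagged character positions. The function names `match_scores` and `precision_recall` are hypothetical, and the naive O(nm) loop is used only for clarity.

```python
def match_scores(text: str, pattern: str) -> list[int]:
    """Naive score vector: for each alignment i, count the positions j
    where pattern[j] == text[i + j]. Runs in O(len(text) * len(pattern))."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    return [
        sum(1 for j in range(m) if text[i + j] == pattern[j])
        for i in range(n - m + 1)
    ]


def precision_recall(detected: set[int], actual: set[int]) -> tuple[float, float]:
    """Precision and recall over sets of character positions.

    `detected` holds positions flagged as copied; `actual` holds positions
    that really are copied (an assumed evaluation unit for illustration)."""
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    source = "digital materials are easy to copy"
    suspect = "easy to c0py"  # one character altered on purpose
    scores = match_scores(source, suspect)
    # The alignment with the highest score is the best approximate match.
    best = max(range(len(scores)), key=scores.__getitem__)
    print(f"best alignment at offset {best}, {scores[best]} of {len(suspect)} characters match")
```

A high score at some offset marks a region that matches the pattern except for a few altered characters, which is why a mismatch-tolerant comparison can still flag lightly edited copies.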

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number 15K00426.

Author information

Corresponding author

Correspondence to Tetsuya Nakatoh.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Nakatoh, T., Minami, T. (2018). Reducing Computational Effort for Plagiarism Detection with Approximate String Matching. In: Ghazali, R., Deris, M., Nawi, N., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2018. Advances in Intelligent Systems and Computing, vol 700. Springer, Cham. https://doi.org/10.1007/978-3-319-72550-5_41

  • DOI: https://doi.org/10.1007/978-3-319-72550-5_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72549-9

  • Online ISBN: 978-3-319-72550-5

  • eBook Packages: Engineering, Engineering (R0)
