Abstract
Currently, a large number of documents are created as digital material and distributed worldwide. Digital materials can be published and copied at remarkably low cost. As a result, illegal copying of documents has become widespread, making plagiarism a significant social issue, and there is a strong need for systems that detect it. We have developed a new plagiarism detection method that compares documents using approximate string matching, together with a technique that reduces the computational time of this comparison. In this paper, we demonstrate the usefulness of the proposed method through experiments evaluated by precision and recall.
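To make "comparing documents by approximate string matching" concrete, the sketch below illustrates the classic string matching with mismatches problem. This is an illustration of the general problem, not the authors' method. For every alignment of a suspicious passage against a source text, it counts how many characters agree. A direct version costs O(nm). A second version follows the well-known Fischer and Paterson idea of summing one FFT-based correlation per character. All function names and the `min_ratio` threshold are hypothetical choices made for this example.

```python
import cmath
from typing import List


def fft(a: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1  # +1 exponent for the inverse transform
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def match_counts_naive(text: str, pattern: str) -> List[int]:
    """Score vector: matching characters at each alignment, O(n*m)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    return [sum(1 for j in range(m) if text[i + j] == pattern[j])
            for i in range(n - m + 1)]


def match_counts_fft(text: str, pattern: str) -> List[int]:
    """Same score vector, summing one FFT correlation per pattern character."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    size = 1
    while size < n + m - 1:  # room for the full linear convolution
        size *= 2
    acc = [0j] * size
    for ch in set(pattern):  # characters absent from the pattern add nothing
        t = [1.0 if c == ch else 0.0 for c in text] + [0.0] * (size - n)
        # Reversing the pattern turns convolution into correlation.
        pr = [1.0 if c == ch else 0.0 for c in reversed(pattern)] + [0.0] * (size - m)
        ft, fp = fft(t), fft(pr)
        for k in range(size):
            acc[k] += ft[k] * fp[k]
    conv = fft(acc, invert=True)
    # Alignment i corresponds to convolution index i + m - 1; divide by size
    # to normalize the inverse FFT and round away floating-point noise.
    return [round(conv[i + m - 1].real / size) for i in range(n - m + 1)]


def candidate_offsets(text: str, pattern: str, min_ratio: float = 0.8) -> List[int]:
    """Hypothetical filter: offsets where at least min_ratio of characters agree."""
    m = len(pattern)
    return [i for i, c in enumerate(match_counts_fft(text, pattern))
            if c >= min_ratio * m]


if __name__ == "__main__":
    print(candidate_offsets("the quick brown fox jumps", "quack"))
```

Running the script prints `[4]`, because "quick" agrees with "quack" in four of five positions. The FFT version costs roughly O(k (n + m) log(n + m)), where k is the number of distinct characters in the pattern. For long documents this is far less work than the direct loop, and this kind of cost is what work on faster plagiarism comparison aims to reduce.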
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number 15K00426.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Nakatoh, T., Minami, T. (2018). Reducing Computational Effort for Plagiarism Detection with Approximate String Matching. In: Ghazali, R., Deris, M., Nawi, N., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2018. Advances in Intelligent Systems and Computing, vol 700. Springer, Cham. https://doi.org/10.1007/978-3-319-72550-5_41
DOI: https://doi.org/10.1007/978-3-319-72550-5_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72549-9
Online ISBN: 978-3-319-72550-5
eBook Packages: Engineering, Engineering (R0)