A Fuzzy R Code Similarity Detection Algorithm

Bartoszuk, Maciej; Gagolewski, Marek

doi:10.1007/978-3-319-08852-5_3


Part of the book series: Communications in Computer and Information Science (CCIS, volume 444)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems


Abstract

R is a programming language and software environment for statistical computing and data analysis that is rapidly gaining popularity among practitioners and scientists. In this paper we present a preliminary version of a system that detects pairs of similar R code blocks in a given set of routines. The system is based on an appropriate aggregation of the outputs of three different [0,1]-valued (fuzzy) proximity degree estimation algorithms. An analysis on empirical data indicates that the system may in the future be applied successfully in practice, e.g., to detect plagiarism among students’ homework submissions or to analyze code recycling and code cloning in R’s open source package repositories.
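
The aggregation idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the three proximity estimators or the aggregation function, so everything here is an assumption made for illustration: the crude tokenizer, the three measures (normalized token edit distance, Jaccard overlap of token sets, and the `difflib` matching-block ratio), the arithmetic mean as the aggregation, and the 0.8 threshold. It is not the authors' method.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def tokenize(code):
    # Crude tokenizer (an assumption, not an R parser):
    # identifiers, numbers, then any other single non-space character.
    return re.findall(r"[A-Za-z_.][A-Za-z0-9_.]*|\d+(?:\.\d+)?|\S", code)


def levenshtein(a, b):
    # Dynamic-programming edit distance between two token sequences.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def sim_edit(a, b):
    # Hypothetical measure 1: one minus the normalized token edit distance.
    ta, tb = tokenize(a), tokenize(b)
    longest = max(len(ta), len(tb))
    return 1.0 if longest == 0 else 1.0 - levenshtein(ta, tb) / longest


def sim_jaccard(a, b):
    # Hypothetical measure 2: Jaccard overlap of the two token sets.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def sim_blocks(a, b):
    # Hypothetical measure 3: difflib's matching-block ratio on token sequences.
    return SequenceMatcher(None, tokenize(a), tokenize(b), autojunk=False).ratio()


def aggregate(scores):
    # Assumed aggregation function: the arithmetic mean, which maps [0,1]^3 to [0,1].
    return sum(scores) / len(scores)


def similar_pairs(routines, threshold=0.8):
    # Report every pair of routines whose aggregated proximity reaches the threshold.
    found = []
    for (name_a, code_a), (name_b, code_b) in combinations(routines.items(), 2):
        score = aggregate([
            sim_edit(code_a, code_b),
            sim_jaccard(code_a, code_b),
            sim_blocks(code_a, code_b),
        ])
        if score >= threshold:
            found.append((name_a, name_b, score))
    return found


if __name__ == "__main__":
    # Toy R routines, invented for this example.
    routines = {
        "f": "f <- function(x) { s <- 0; for (i in x) s <- s + i; s }",
        "g": "g <- function(x) { s <- 0; for (i in x) s <- s + i; s }",
        "h": "h <- function(n) paste(rev(strsplit(n, '')[[1]]), collapse='')",
    }
    for name_a, name_b, score in similar_pairs(routines):
        print(name_a, name_b, round(score, 3))
```

On the toy input, only the pair of routines that differ in a single identifier passes the threshold. Any other aggregation function with values in [0,1] could be substituted for the mean without changing the rest of the sketch.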

Author information

Authors and Affiliations

Interdisciplinary PhD Studies Program, Systems Research Institute, Polish Academy of Sciences, Poland
Maciej Bartoszuk
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Marek Gagolewski
Faculty of Mathematics and Information Science, Warsaw University of Technology, ul. Koszykowa 75, 00-662, Warsaw, Poland
Marek Gagolewski

Editor information

Editors and Affiliations

University Montpellier 2, LIRMM - CNRS UMR 5506, 161, Rue Ada, 34392, Montpellier Cedex 5, France
Anne Laurent
Institute for Human Factors and Technology Management, IAT, University of Stuttgart, Nobelstraße 12, 70569, Stuttgart, Germany
Oliver Strauss
LIP6, UPMC Univ. Paris 06, CNRS UMR 7606, F-75005, Paris, France
Bernadette Bouchon-Meunier
Dept. of Information Systems, Iona College, 710 North Ave, 10801, New Rochelle, NY, USA
Ronald R. Yager

About this paper

Cite this paper

Bartoszuk, M., Gagolewski, M. (2014). A Fuzzy R Code Similarity Detection Algorithm. In: Laurent, A., Strauss, O., Bouchon-Meunier, B., Yager, R.R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2014. Communications in Computer and Information Science, vol 444. Springer, Cham. https://doi.org/10.1007/978-3-319-08852-5_3

DOI: https://doi.org/10.1007/978-3-319-08852-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08851-8
Online ISBN: 978-3-319-08852-5
eBook Packages: Computer Science, Computer Science (R0)