File(s) under permanent embargo
A Fuzzy R Code Similarity Detection Algorithm
conference contribution
posted on 2014-01-01, 00:00 authored by M Bartoszuk, Marek Gagolewski

R is a programming language and software environment for statistical computing and data analysis that is steadily gaining popularity among practitioners and scientists. In this paper we present a preliminary version of a system that detects pairs of similar R code blocks within a given set of routines. The system is based on a suitable aggregation of the outputs of three different [0,1]-valued (fuzzy) proximity degree estimation algorithms. Its evaluation on empirical data indicates that the system may in the future be successfully applied in practice, e.g., to detect plagiarism among students' homework submissions or to analyze code recycling or code cloning in R's open source package repositories.

© Springer International Publishing Switzerland 2014.