Abstract:
We propose RefactorScore, an automatic evaluation metric for code. RefactorScore counts the refactor-prone locations on each token of a candidate file and maps these counts to a quantile to produce a score. RefactorScore is evaluated across 61,735 commits and relies on RefactorBERT, a model trained on 1,111,246 commits to predict refactors. We further validate RefactorScore on a set of industry-leading projects, assigning each a RefactorScore, and we calibrate its detection of low-quality code against human developers through a human subject study. RefactorBERT, the model driving the scoring mechanism, can predict defects as well as the refactors detected by RefDiff 2.0. To our knowledge, our approach, which couples large-scale training data with validation by human developers, is the first code quality scoring metric of its kind.
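The abstract describes the scoring step only at a high level. The sketch below shows one way per-token counts could be turned into a quantile-based score. It assumes that the model emits, for each token, the number of refactor-prone locations covering it, and that the counts are normalized to a per-token density and ranked against a reference distribution of densities (for example, one gathered from a corpus of commits). The names `refactor_density` and `refactor_score`, the input format, and the reference distribution are hypothetical illustrations, not the published RefactorScore implementation.

```python
from bisect import bisect_right
from typing import Sequence


def refactor_density(token_counts: Sequence[int]) -> float:
    """Average number of refactor-prone locations per token.

    Hypothetical input: token_counts[i] is the number of refactor-prone
    locations a model (e.g. RefactorBERT) flags on token i.
    """
    if not token_counts:
        return 0.0
    return sum(token_counts) / len(token_counts)


def refactor_score(token_counts: Sequence[int], reference: Sequence[float]) -> float:
    """Map a file's density to its quantile in a reference distribution.

    Assumed convention: returns the fraction of reference densities that are
    <= the candidate's density, so 0.0 means a lower density than every
    reference file and 1.0 means at least as high as all of them.
    """
    if not reference:
        raise ValueError("reference distribution must be non-empty")
    ordered = sorted(reference)  # bisect requires a sorted sequence
    density = refactor_density(token_counts)
    return bisect_right(ordered, density) / len(ordered)


if __name__ == "__main__":
    reference = [0.0, 0.1, 0.2, 0.3]  # toy densities, not real data
    print(refactor_score([0, 1, 0, 0, 1], reference))  # density 0.4
    print(refactor_score([0, 0, 0, 0], reference))     # density 0.0
```

Whether a higher quantile should mean better or worse code, and how ties are handled, are design choices the abstract does not specify; this sketch simply reports the empirical rank, so a higher value here indicates more refactor-prone tokens relative to the reference set.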
Published in: IEEE Transactions on Software Engineering (Volume 49, Issue 11, November 2023)