Persistent code contribution: a ranking algorithm for code contribution in crowdsourced software

Tsikerdekis, Michail

doi:10.1007/s10664-017-9575-4

Published: 21 November 2017

Volume 23, pages 1871–1894 (2018)

Empirical Software Engineering

Michail Tsikerdekis ORCID: orcid.org/0000-0001-6898-952X¹

Abstract

Measuring code contribution in crowdsourced software is essential for ranking a project's contributors or distributing revenue among them. Past studies have shown that code contribution measures differ from one another, including in their ability to rank users accurately. This study proposes a new code contribution ranking algorithm, Persistent Code Contribution (PCC), that aims to be language-independent and quality-aware, and to balance rankings between new and senior users. PCC tracks the number of characters contributed by each user and scores each character by the number of subsequent revisions it survives. It also tracks lines that move between revisions and attributes character changes to the user who committed them to the repository. A comparison of the rankings produced by existing code contribution measures is performed to determine their similarities and differences, and both quantitative and qualitative evidence is presented to validate the algorithm.
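To make the character-survival idea concrete, here is a minimal Python sketch. It assumes each revision is a full snapshot of a single file paired with its committer, and it uses the standard library's `difflib.SequenceMatcher` as the character-level diff; that is a choice made for illustration, not necessarily the paper's. Each surviving character earns its original author one point per subsequent revision. The function names `survival_scores` and `rank` are hypothetical, and the sketch omits PCC's moved-line tracking and any further weighting the published algorithm may apply.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def survival_scores(revisions):
    """Score authors by how many later revisions their characters survive.

    revisions: list of (author, full_file_text) pairs in commit order.
    Returns {author: score}; each character earns one point for the author
    who introduced it every time it is still present in the next revision.
    Illustrative sketch only, not the published PCC algorithm.
    """
    scores = defaultdict(int)
    prev_text = ""
    prev_owners = []  # prev_owners[i] = author who introduced prev_text[i]

    for author, text in revisions:
        scores[author] += 0  # ensure every committer appears in the result
        # Character-level alignment; autojunk=False stops frequent characters
        # (spaces, braces) from being treated as junk in long files.
        matcher = SequenceMatcher(None, prev_text, text, autojunk=False)
        owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged characters survived this revision: credit their
                # original authors and keep ownership as it was.
                for owner in prev_owners[i1:i2]:
                    scores[owner] += 1
                owners.extend(prev_owners[i1:i2])
            elif tag in ("insert", "replace"):
                # New or rewritten characters belong to the current committer.
                owners.extend([author] * (j2 - j1))
            # "delete": removed characters simply stop earning points.
        prev_text, prev_owners = text, owners

    return dict(scores)


def rank(scores):
    """Order authors from highest to lowest score."""
    return sorted(scores, key=lambda a: scores[a], reverse=True)


if __name__ == "__main__":
    history = [
        ("alice", "abc"),
        ("bob", "abcde"),
        ("carol", "abXde"),
    ]
    scores = survival_scores(history)
    print(scores)  # {'alice': 5, 'bob': 2, 'carol': 0}
    print(rank(scores))
```

In this toy history, alice's three characters survive bob's commit (3 points) and two of them survive carol's, which rewrites `c` (2 more points). Bob's `de` survives once, and carol has no later revision yet. A plain character diff like this one treats a moved line as a deletion plus an insertion and credits whoever moved it. PCC's moved-line tracking is intended to avoid that, and it is not reproduced here. `SequenceMatcher` can also be slow on large files at character granularity.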

Multi-objective code reviewer recommendations: balancing expertise, availability and collaborations

Article 05 September 2020

Soumaya Rebai, Abderrahmen Amich, … Rick Kazman

Investigating the effectiveness of peer code review in distributed software development based on objective and subjective data

Article Open access 26 October 2018

Eduardo Witter dos Santos & Ingrid Nunes

Profile based recommendation of code reviewers

Article Open access 15 August 2017

Mikołaj Fejzer, Piotr Przymus & Krzysztof Stencel

Author information

Authors and Affiliations

Department of Computer Science, Western Washington University, Bellingham, WA, 98225, USA
Michail Tsikerdekis

Corresponding author

Correspondence to Michail Tsikerdekis.

Additional information

Communicated by: Maurizio Morisio

Appendix

A.1 Proof of Claim 1

Let $\lfloor x \rceil \in \mathbb{Z}^{+}$ denote the integer closest to $x$. It holds that:

$$x - \frac{1}{2} \le \lfloor x \rceil < x + \frac{1}{2} $$

For $x = n*mt$, where $n > 0$ and $mt \in \mathbb{R}$ with $0 \le mt \le 1$, it follows that:

$$\begin{array}{@{}rcl@{}} &&n*mt - \frac{1}{2} \le \lfloor n*mt \rceil < n*mt + \frac{1}{2} \\ &&\implies \frac {n*mt - \frac{1}{2}}{n} \le \frac{\lfloor n*mt \rceil}{n} < \frac{n*mt + \frac{1}{2}}{n} \\ &&\implies \frac {n*mt}{n} - \frac{\frac{1}{2}}{n} \le \frac{\lfloor n*mt \rceil}{n} < \frac{n*mt}{n} + \frac{\frac{1}{2}}{n} \\ &&\implies mt - \frac{1}{2n} \le \frac{\lfloor n*mt \rceil}{n} < mt + \frac{1}{2n} \\ &&\implies mt - mt - \frac{1}{2n} \le \underset{\underset{\text{discrete form}}{\text{percentage over}}}{\underbrace{\frac{\lfloor n*mt \rceil}{n}}} - mt < mt - mt + \frac{1}{2n} \end{array} $$

The middle term is the difference (or distortion) introduced by applying the percentage $mt$ to a discrete set of size $n$; denote it by $d$.
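As a concrete illustration with arbitrarily chosen values, take $n = 4$ and $mt = 0.3$:

$$\begin{array}{@{}rcl@{}} n*mt &=& 4*0.3 = 1.2 \implies \lfloor 1.2 \rceil = 1 \\ d &=& \frac{1}{4} - 0.3 = -0.05 \\ \frac{1}{2n} &=& \frac{1}{8} = 0.125 \end{array} $$

so $|d| = 0.05$ does not exceed $\frac{1}{2n}$.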

$$\begin{array}{@{}rcl@{}} &&- \frac{1}{2n} \le d < \frac{1}{2n} \\ &&\implies |d| \le \frac{1}{2n} \end{array} $$

Therefore, the maximum absolute distortion is $|d_{max}| = \frac{1}{2n}$, which shrinks as $n$ grows.
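The bound can be spot-checked numerically. The sketch below assumes ties are rounded down, $\lfloor x \rceil = \lceil x - \frac{1}{2} \rceil$, which matches the strict upper bound in the first inequality of this proof. It uses exact rational arithmetic (`fractions.Fraction`) to avoid floating-point artifacts. The function names and the grid of $mt$ values are hypothetical choices made for illustration.

```python
import math
from fractions import Fraction


def nearest_int(x: Fraction) -> int:
    """Integer closest to x, rounding ties down: ceil(x - 1/2).

    Assumed tie rule; it satisfies x - 1/2 <= result < x + 1/2.
    """
    return math.ceil(x - Fraction(1, 2))


def distortion(n: int, mt: Fraction) -> Fraction:
    """d = nearest_int(n*mt)/n - mt, the error of applying mt to n items."""
    return Fraction(nearest_int(n * mt), n) - mt


def max_abs_distortion(n: int, steps: int = 1000) -> Fraction:
    """Largest |d| over the grid mt = 0, 1/steps, 2/steps, ..., 1."""
    return max(abs(distortion(n, Fraction(k, steps))) for k in range(steps + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 50):
        worst = max_abs_distortion(n)
        bound = Fraction(1, 2 * n)
        assert worst <= bound, (n, worst, bound)
        print(f"n={n}: max |d| = {float(worst):.4f}, bound 1/(2n) = {float(bound):.4f}")
```

Under this tie rule, the bound is attained when $n*mt$ lies exactly halfway between two integers. For example, $mt = \frac{1}{2n}$ gives $n*mt = \frac{1}{2}$, which rounds to $0$, so $d = -\frac{1}{2n}$.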

About this article

Cite this article

Tsikerdekis, M. Persistent code contribution: a ranking algorithm for code contribution in crowdsourced software. Empir Software Eng 23, 1871–1894 (2018). https://doi.org/10.1007/s10664-017-9575-4

Published: 21 November 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s10664-017-9575-4
