Assessing Developer Expertise from the Statistical Distribution of Programming Syntax Patterns

Authors:
Arghavan Moradi Dakhel, Polytechnique Montreal, Canada
Michel C. Desmarais, Polytechnique Montreal, Canada
Foutse Khomh, Polytechnique Montreal, Canada

EASE '21: Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering, June 2021, Pages 90–99. https://doi.org/10.1145/3463274.3463343

Published: 21 June 2021

ABSTRACT

Accurate assessment of developer expertise is crucial when assigning an individual to a task or, more generally, to a project that requires an adequate level of knowledge. Because potential programmers can come from a large pool, an automatic means of assessing expertise from written programs would be highly valuable in this context.

Previous work towards this goal has generally used heuristics such as the Line 10 Rule, or linguistic information in source files such as comments or identifiers, to represent the knowledge of developers and evaluate their expertise. In this paper, we focus on mastery of syntactic patterns as evidence of programming knowledge and propose a theoretical definition of programming knowledge based on the distribution of Syntax Patterns (SPs) in source code, namely Zipf’s law. We first validate the model and its scalability on synthetic data from “Expert” and “Novice” programmers. This provides a ground truth and allows us to explore the space of validity of the model. Then, we assess the performance of the model on real data from programmers. The results show that our proposed approach outperforms recent state-of-the-art approaches for the task of classifying programming experts.
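To make the idea of a Zipf-like distribution over syntax patterns concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that a "syntax pattern" can be approximated by a Python AST node type, and it estimates the exponent s in frequency ∝ rank^(-s) with a simple least-squares fit on log-log data, which gives only a rough estimate rather than a rigorous power-law fit. The function names `syntax_pattern_counts` and `zipf_exponent` and the `SAMPLE` snippet are hypothetical.

```python
import ast
import math
from collections import Counter


def syntax_pattern_counts(source: str) -> Counter:
    """Count AST node types in Python source.

    Assumption: node types stand in for syntax patterns; the paper's
    actual pattern definition is not given in the abstract.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def zipf_exponent(counts: Counter) -> float:
    """Estimate s in frequency ~ rank**(-s) by least squares on log-log data."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct patterns")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -s, so negate it to report a positive exponent.
    return -sxy / sxx


# Hypothetical sample program used only to exercise the functions.
SAMPLE = '''
def mean(values):
    total = 0
    for v in values:
        total += v
    return total / len(values) if values else 0.0
'''

if __name__ == "__main__":
    counts = syntax_pattern_counts(SAMPLE)
    print(counts.most_common(5))
    print(round(zipf_exponent(counts), 3))
```

In this sketch, comparing the fitted exponent across programmers would be one possible way to contrast their pattern distributions; how the paper turns the distribution into an expertise classification is not specified in the abstract.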

Published in
EASE '21: Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering
June 2021
417 pages
ISBN: 9781450390538
DOI: 10.1145/3463274
Editors: Ruzanna Chitchyan, Jingyue Li, Barbara Weber, Tao Yue
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Zipf law
knowledge assessment
software maintenance
syntax pattern
version control system
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 71 of 232 submissions, 31%