DOI: 10.1145/2972958.2972960
research-article

Hidden Markov Models for the Prediction of Developer Involvement Dynamics and Workload

Published: 09 September 2016

Abstract

The evolution of software projects is driven by developers who are in control of the developed artifacts. When analyzing developer behavior, what can be observed directly are actions such as commits, messages, or bug assignments. To describe the dynamic activity and workload of developers, we consider an underlying characteristic: their level of involvement according to their role in the project. In this paper, we propose to employ Hidden Markov Models (HMMs) to model this underlying behavior given the observable behavior as input. To this end, we observe monthly commits, bug fixes, mailing list activity, and bug comments for each developer over the project duration. As output, we obtain a model for each developer that describes how likely the developer is to be in a low, medium, or high contribution state at every point in time. We found that developers of the same type exhibit similar models in terms of state patterns and transition matrices, which represent their involvement dynamics. Although the workload of the different developer roles is more complex to model, we created a general model that performs nearly as well as individual developer contribution models. Moreover, to demonstrate practical applicability, we present an example of using our approach in project planning.
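
To make the modeling idea concrete, the following is a minimal Python sketch of the forward (filtering) recursion for a discrete HMM with three hidden involvement states. It is not the authors' implementation: the initial distribution, transition matrix, emission probabilities, and the reduction of monthly activity to a single three-level symbol are all assumptions chosen for illustration, whereas the paper observes four activity measures per month and fits its models from project data.

```python
# Hypothetical three-state HMM; every probability below is an illustrative
# assumption, not a value reported in the paper.
STATES = ["low", "medium", "high"]

# Assumed initial state distribution.
INITIAL = [0.6, 0.3, 0.1]

# Assumed transition matrix: TRANSITION[i][j] = P(next = j | current = i).
TRANSITION = [
    [0.8, 0.15, 0.05],
    [0.2, 0.6, 0.2],
    [0.05, 0.25, 0.7],
]

# Assumed emission matrix over a discretized monthly activity level
# (0 = few, 1 = some, 2 = many): EMISSION[i][k] = P(obs = k | state = i).
EMISSION = [
    [0.7, 0.25, 0.05],
    [0.2, 0.6, 0.2],
    [0.05, 0.25, 0.7],
]


def filter_states(observations):
    """Forward recursion with per-step normalization.

    Returns, for each month t, the distribution P(state_t | obs_1..t).
    """
    n = len(STATES)
    posteriors = []
    belief = INITIAL
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push last month's belief through the transition matrix.
            belief = [
                sum(belief[i] * TRANSITION[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update: weight by how likely each state is to emit this observation.
        weighted = [belief[j] * EMISSION[j][obs] for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        posteriors.append(belief)
    return posteriors


if __name__ == "__main__":
    monthly_activity = [0, 0, 1, 2, 2, 1]  # made-up discretized activity
    for month, dist in enumerate(filter_states(monthly_activity), start=1):
        best = STATES[dist.index(max(dist))]
        print(month, [round(p, 3) for p in dist], best)
```

Each row printed by the script is one month's probability of being in the low, medium, or high state, which is the kind of per-developer output the abstract describes. In practice the parameters would be estimated from the observed sequences rather than fixed by hand.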

Published In

PROMISE 2016: Proceedings of the 12th International Conference on Predictive Models and Data Analytics in Software Engineering
September 2016
84 pages
ISBN: 9781450347723
DOI: 10.1145/2972958
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Developer Roles
  2. Hidden Markov Models
  3. Software Development

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PROMISE 2016

Acceptance Rates

PROMISE 2016 Paper Acceptance Rate 10 of 23 submissions, 43%;
Overall Acceptance Rate 98 of 213 submissions, 46%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) Innovating Coding: Evaluating the Impact of Innovative Thinking in Programming. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, pp. 241-245. DOI: 10.1145/3643916.3644397. Online publication date: 15-Apr-2024
  • (2022) Synergies Between Artificial Intelligence and Software Engineering: Evolution and Trends. Handbook on Artificial Intelligence-Empowered Applied Software Engineering, pp. 11-36. DOI: 10.1007/978-3-031-08202-3_2. Online publication date: 4-Sep-2022
  • (2021) Modeling Operator Performance in Human-in-the-Loop Autonomous Systems. IEEE Access, vol. 9, pp. 102715-102731. DOI: 10.1109/ACCESS.2021.3098060. Online publication date: 2021
  • (2021) Investigation and prediction of open source software evolution using automated parameter mining for agent-based simulation. Automated Software Engineering, 28:1. DOI: 10.1007/s10515-021-00280-3. Online publication date: 1-May-2021
  • (2017) Agent-Based Simulation for Software Development Processes. Multi-Agent Systems and Agreement Technologies, pp. 333-340. DOI: 10.1007/978-3-319-59294-7_28. Online publication date: 23-Jun-2017
