ABSTRACT
Understanding how communities evolve in developer social networks (DSNs) around open source software (OSS) projects can provide valuable insights into the socio-technical process of OSS development. Existing studies show that the evolutionary behavior of social communities can be described effectively using patterns such as split, shrink, merge, expand, emerge, and extinct. However, existing pattern-based approaches offer limited support for quantitative analysis, and they treat the patterns as mutually exclusive when describing community evolution, which is potentially problematic. In this work, we propose that different patterns can occur simultaneously between every pair of communities during evolution, only to different degrees. We devise four entropy-based indices that measure the degree of community split, shrink, merge, and expand, respectively, and that together provide a comprehensive, quantitative measure of community evolution in DSNs. The indices have properties desirable for quantifying community evolution, including monotonicity and bounded maximum and minimum values that correspond to meaningful cases. They can also be combined to describe further patterns, such as the emerge and extinct patterns. We conduct studies on real-world OSS projects to evaluate the validity of the proposed indices. The results suggest that the indices effectively capture community evolution and are consistent with existing approaches in detecting evolution patterns in DSNs, with an accuracy of 94.1%. The results also show that the indices are useful in predicting OSS team productivity, with an accuracy of 0.718. In summary, the proposed approach is among the first to quantify the degree of community evolution with respect to different patterns, which is promising for supporting future research and applications concerning DSNs and OSS development.
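To make the idea of an entropy-based degree concrete, here is a minimal sketch of one way a split degree could be computed. It is not the paper's definition: the abstract does not give the formulas, so the function name `split_degree`, the `successor_of` mapping, and the normalization by the logarithm of the number of surviving members are all assumptions chosen for illustration. The sketch takes the Shannon entropy of how one community's members are distributed over communities in the next snapshot, scaled so that the value is 0 when all members stay together and 1 when every member ends up in a different community.

```python
import math
from collections import Counter


def split_degree(members, successor_of):
    """Illustrative split degree in [0, 1] (hypothetical, not the paper's index).

    members: iterable of developer ids in one community at time t.
    successor_of: dict mapping developer id -> community id at time t+1.
    Members missing from successor_of (they left the network) are ignored.
    """
    # Count how many surviving members land in each successor community.
    counts = Counter(successor_of[m] for m in members if m in successor_of)
    total = sum(counts.values())

    # No survivors, or all survivors in one community: no split at all.
    if total == 0 or len(counts) <= 1:
        return 0.0

    # Shannon entropy of the membership distribution over successors.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Normalize by the largest possible entropy: every member alone.
    return entropy / math.log2(total)


community = ["a", "b", "c", "d"]
stay_together = {"a": "X", "b": "X", "c": "X", "d": "X"}
even_halves = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
fully_scattered = {"a": "W", "b": "X", "c": "Y", "d": "Z"}

for label, mapping in [
    ("stay together", stay_together),
    ("even halves", even_halves),
    ("fully scattered", fully_scattered),
]:
    print(label, split_degree(community, mapping))
```

Under these assumptions the measure has the kind of properties the abstract describes: its minimum and maximum correspond to interpretable cases, and it grows as the members spread more evenly over more successor communities.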