Abstract
In large-scale open-source software projects, where developers are often distributed across the entire planet, coordination among developers is crucial. To estimate whether a state of socio-technical congruence is achieved, which is associated with software quality and project success, we assess the alignment of collaboration and communication in such software projects in terms of coordination requirements. We shed light on this issue by means of an empirical study on a substantial set of large-scale open-source software projects, whose development histories sum up to over 180 years. To take a more semantic view on this phenomenon than previous work, we identify coordination requirements arising not only from files and functions, but also from features. We found that open-source developers fulfill coordination requirements intentionally, but mostly those coordination requirements that arise from coupled source-code artifacts, while they resolve simpler ones independently. Furthermore, none of the considered abstraction levels of source-code artifacts (files, functions, features) is more suitable than the others to construct coordination requirements with respect to their fulfillment. This finding strongly indicates that features do not play as important a role in the development process as expected and commonly believed by the research community in the area of feature-oriented and feature-driven development. Finally, we identified notable evolutionary trends in the fulfillment of coordination requirements and showed that far-reaching social events (such as organizational issues) have a huge impact on their fulfillment, both negatively and positively.
The key findings of our empirical study are that socio-technical relations are important to understand open-source development communities and that incorporating different abstraction levels for developer collaboration yields important insights to further improve the evolution of open-source software projects.
Notes
We could define “positive” and “negative” motifs to capture the fulfillment of coordination requirements directly; but to keep the analysis simple, we define only motifs for the coordination requirements as such and analyze whether the identified coordination requirements are fulfilled or not.
There are various techniques to implement features; we analyze preprocessor annotations. See Section 4.1 for more details.
For example, see http://wiki.qemu.org/Contribute/SubmitAPatch (accessed: 2018-11-05).
In our analysis implementation, we use “:: ” as separator, but for readability reasons, we write “/ ” in this paper.
As a simple scenario, changing indentation from tabs to spaces in 1,415 files at once gives rise to more than 1 million edges representing logical coupling among these files. However, such far-spreading changes are likely not functionality changes (Hindle et al. 2008), so we aim at reducing their impact by omitting them during network construction.
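The edge count in this scenario follows from pairwise coupling: a commit that touches n files couples every pair of them, which yields n(n-1)/2 edges. The following Python sketch illustrates this and one possible way to omit far-spreading changes during network construction. It is only an illustration, not the analysis implementation of the study: the function name `coupling_edges`, the representation of commits as lists of file names, and the cut-off parameter `max_files` are all assumptions.

```python
from itertools import combinations
from math import comb


def coupling_edges(commits, max_files=None):
    """Collect undirected co-change edges between files.

    commits:   iterable of collections of file names changed together
               (hypothetical representation).
    max_files: hypothetical cut-off; commits touching more files are skipped.
    """
    edges = set()
    for files in commits:
        unique = sorted(set(files))
        if max_files is not None and len(unique) > max_files:
            continue  # omit far-spreading changes such as re-indentation
        # every pair of files changed together is logically coupled
        edges.update(combinations(unique, 2))
    return edges


# A single commit touching 1,415 files couples every pair of them.
print(comb(1415, 2))  # 1000405
```

With a cut-off in place, a single re-indentation commit no longer dominates the network, while small commits still contribute their edges.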
We use the Wilcoxon signed-rank test because the number of available data points is rather small in our analysis and the data for some subject projects cannot be assumed to be normally distributed (Shapiro-Wilk test, p < 0.1). This also holds for other hypotheses and corresponding statistical analyses.
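To make the choice of test more concrete, the statistic underlying the Wilcoxon signed-rank test can be computed from the ranks of the absolute paired differences, without any normality assumption. The Python sketch below is purely illustrative and not taken from the analysis implementation of the study: the function name `signed_rank_statistic` and the sample values are hypothetical, and the computation of the p-value is omitted.

```python
def signed_rank_statistic(xs, ys):
    """Return (W_plus, W_minus) for paired samples xs and ys.

    Zero differences are dropped; tied absolute differences
    receive the average of the ranks they span.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ordered = sorted(diffs, key=abs)
    w_plus = w_minus = 0.0
    i = 0
    while i < len(ordered):
        # find the run of tied absolute differences starting at index i
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return w_plus, w_minus


# Hypothetical paired measurements with differences 1, 3, -2, 4
print(signed_rank_statistic([5, 7, 3, 9], [4, 4, 5, 5]))  # (8.0, 2.0)
```

The two sums always add up to n(n+1)/2 for n non-zero differences; a strongly unbalanced split indicates a systematic shift between the paired samples.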
We did not analyze further projects as the obtained results do not fully compensate for the large amount of computing time for the additional data. Nevertheless, we argue that the selected subset of projects is sufficient to identify indicators.
To compare coordination requirements among different abstraction levels, we stripped the artifact information from them (as we define in Section 2.3, they include information on two developers and at least one source-code artifact).
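The stripping step can be pictured as a projection onto developer pairs. The following Python sketch is an illustration under assumptions, not the implementation used in the study: the tuple layout `(developer, developer, artifacts)`, the function name `strip_artifacts`, the treatment of developer pairs as unordered, and all sample names are hypothetical.

```python
def strip_artifacts(requirements):
    """Reduce (dev_a, dev_b, artifacts) tuples to unordered developer pairs.

    Assumption: the order of the two developers does not matter.
    """
    return {frozenset((dev_a, dev_b)) for dev_a, dev_b, _artifacts in requirements}


# Hypothetical coordination requirements on two abstraction levels
file_level = [
    ("alice", "bob", ("net/core.c",)),
    ("alice", "carol", ("net/tcp.c",)),
]
feature_level = [
    ("bob", "alice", ("CONFIG_NET",)),
    ("bob", "carol", ("CONFIG_TCP",)),
]

# Developer pairs that have a coordination requirement on both levels
shared = strip_artifacts(file_level) & strip_artifacts(feature_level)
print(shared == {frozenset(("alice", "bob"))})  # True
```

Once the artifacts are removed, requirements from different abstraction levels become plain sets of developer pairs, so they can be compared with ordinary set operations.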
We omit the plots for the square motif due to space restrictions. We refer to the supplementary website for all data and plots. See also Section 5.5.2.
The upcoming description of events is based on the following list of references (all accessed 2018-11-05):
- the announcement: https://ffmpeg.org/archive.html#return_to_freedom, and
- further detailed information from (former) FFmpeg contributors: https://lwn.net/Articles/424396/, http://blog.pkh.me/p/13-the-ffmpeg-libav-situation.html, and https://www.slideshare.net/SamsungOSG/ffmpeg-a-retrospective.
http://heartbleed.com/ (accessed 2018-11-05)
The upcoming description of events is based on the following list of references (all accessed 2019-03-14):
https://wiki.qemu.org/OlderNews (accessed 2018-11-05)
We also discuss this threat to construct validity in Section 7.
See the following references: https://dwheeler.com/essays/heartbleed.html, https://news.ycombinator.com/item?id=7556826, https://www.theregister.co.uk/Print/2014/04/11/openssl_heartbleed_robin_seggelmann/ (all accessed 2019-03-15)
This information could be accessed through the motif identification, which we describe in Section 2.3.
References
Aljemabi MA, Wang Z (2018) Empirical study on the evolution of developer social networks. IEEE Access 6:51049–51060
de Andrade HS, Almeida E, Crnkovic I (2014) Architectural bad smells in software product lines. In: Proc Int Conf Dependable and Secure Cloud Computing Architecture (DASCCA), ACM, pp 1–6
Apel S, Batory D, Kästner C, Saake G (2013) Feature-Oriented Software Product Lines. Springer
Argote L (2012) Organizational Learning. Springer
Arisholm E, Briand LC, Foyen A (2004) Dynamic coupling measurement for object-oriented software. IEEE Transactions on Software Engineering (TSE) 30(8):491–506
Bacchelli A, D’Ambros M, Lanza M (2010) Are popular classes more defect prone? In: Proc Int Conf Fundamental Approaches to Software Engineering (FASE), Springer, pp 59–73
Berger T, Lettner D, Rubin J, Grünbacher P, Silva A, Becker M, Chechik M, Czarnecki K (2015) What is a feature? In: Proc Int Software Product Line Conference (SPLC), ACM, pp 16–25
Betz S, Šmite D, Fricker S, Moss A, Afzal W, Svahnberg M, Wohlin C, Börstler J, Gorschek T (2013) An evolutionary perspective on Socio-Technical congruence: The rubber band effect. In: Proc Int Workshop on Replication in Empirical Software Engineering Research (RESER), IEEE
Bhattacharya P, Iliofotou M, Neamtiu I, Faloutsos M (2012) Graph-based analysis and prediction for software evolution. In: Proc Int Conf Software Engineering (ICSE), IEEE, pp 419–429
Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: Proc Int Workshop Mining Software Repositories (MSR), ACM, pp 137–143
Bird C, Pattison D, D’Souza R, Filkov V, Devanbu P (2008) Latent social structure in open source projects. In: Proc Int Symposium on Foundations of Software Engineering (FSE), ACM, pp 24–35
Blincoe K, Valetto G, Goggins S (2012) Proximity: A measure to quantify the need for developers’ coordination. In: Proc Int Conf Computer-Supported Cooperative Work (CSCW), ACM, pp 1351–1360
Blincoe K, Valetto G, Damian D (2013) Do all task dependencies require coordination? The role of task properties in identifying critical coordination needs in software projects. In: Proc Europ Software Engineering Conf and the Int Symposium Foundations of Software Engineering (ESEC/FSE), ACM, pp 213–223
Brandes U, Gaertler M, Wagner D (2003) Experiments on graph clustering algorithms. In: European Symposium on Algorithms (ESA), Springer, pp 568–579
Brooks FP (1995) The Mythical Man-Month, Anniversary Edition: Essays On Software Engineering. Pearson Education
Cannon-Bowers JA, Salas E, Converse S (1993) Shared mental models in expert team decision making. In: Individual and Group Decision Making: Current Issues, Lawrence Erlbaum Associates, Inc., Chap 12, pp 221–246
Cataldo M, Herbsleb JD (2013) Coordination breakdowns and their impact on development productivity and software failures. IEEE Transactions on Software Engineering (TSE) 39(3):343–360
Cataldo M, Wagstrom PA, Herbsleb JD, Carley KM (2006) Identification of coordination requirements: Implications for the design of collaboration and awareness tools. In: Proc Int Conf Computer-supported Cooperative Work (CSCW), ACM, pp 353–362
Cataldo M, Herbsleb JD, Carley KM (2008) Socio-Technical congruence: A framework for assessing the impact of technical and work dependencies on software development productivity. In: Proc Int Symposium Empirical Software Engineering and Measurement, ACM, pp 2–11
Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 74(368):829–836
Colfer LJ, Baldwin CY (2016) The mirroring hypothesis: Theory, evidence, and exceptions. Industrial and Corporate Change (ICC) 25(5):709–738
Conway ME (1968) How do committees invent? Datamation 14(5):28–31
Crowston K, Howison J (2005) The social structure of free and open source software development. First Monday 10(2)
Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJournal Complex Systems: 1695. http://igraph.org
Curtis B, Krasner H, Iscoe N (1988) A field study of the software design process for large systems. Communications of the ACM 31(11):1268–1287
Dabbish L, Stuart C, Tsay J, Herbsleb J (2012) Social coding in GitHub: Transparency and collaboration in an open software repository. In: Proc Int Conf Computer-supported Cooperative Work (CSCW), ACM, pp 1277–1286
Draheim D, Pekacki L (2003) Process-centric analytical processing of version control data. In: Proc Int Workshop Principles of Software Evolution (IWPSE), IEEE, pp 131–136
Ehrlich K, Helander M, Valetto G, Davies S, Williams C (2008) An analysis of congruence gaps and their effect on distributed software development. In: Int Workshop on Socio-Technical Congruence (STC), Available online
Ernst MD, Badros GJ, Notkin D (2002) An empirical analysis of C preprocessor use. IEEE Transactions on Software Engineering (TSE) 28(12):1146–1170
Espinosa JA (2002) Shared Mental Models and Coordination in Large-scale, Distributed Software Development. PhD thesis, Graduate School of Industrial Administration, Carnegie Mellon University
Espinosa JA, Lerch FJ, Kraut RE (2002) Explicit vs. implicit coordination mechanisms and task dependencies: One size does not fit all. In: Team Cognition: Understanding the Factors that Drive Process and Performance, American Psychological Association, pp 107–129
Feigenspan J, Kästner C, Apel S, Liebig J, Schulze M, Dachselt R, Papendieck M, Leich T, Saake G (2012) Do background colors improve program comprehension in the #ifdef Hell? Empirical Software Engineering 18(4):699–745
Fenske W, Schulze S (2015) Code smells revisited: A variability perspective. In: Proc Int Workshop on Variability Modeling of Software-intensive Systems (VaMoS), ACM, pp 3–10
Fenske W, Schulze S, Meyer D, Saake G (2015) When code smells twice as much: Metric-based detection of variability-aware code smells. In: Int Working Conf Source Code Analysis and Manipulation (SCAM), IEEE, pp 171–180
Gall H, Hajek K, Jazayeri M (1998) Detection of logical coupling based on product release history. In: Proc Int Conf Software Maintenance (ICSM), IEEE, pp 190–198
Gharehyazie M, Filkov V (2017) Tracing distributed collaborative development in Apache software foundation projects. Empir Softw Eng 22(4):1795
Gkantsidis C, Mihail M, Zegura EW (2003) The Markov Chain simulation method for generating connected power law random graphs. In: Proc Workshop Algorithm Engineering and Experiments (ALENEX), SIAM, pp 16–25
Gneiting T, Ševčíková H, Percival DB (2012) Estimators of fractal dimension: Assessing the roughness of time series and spatial data. Statistical Science 27(2):247–277
Gobbi A, Iorio F, Albanese D, Jurman G, Saez-Rodriguez J (2017) BiRewire: High-Performing Routines for the Randomization of a Bipartite Graph (or a Binary Event Matrix), Undirected and Directed Signed Graph Preserving Degree Distribution (or Marginal Totals). https://www.bioconductor.org/packages/release/bioc/html/BiRewire.html
Grinter RE (1998) Recomposition: Putting it all back together again. In: Proc Int Conf Computer-supported Cooperative Work (CSCW), ACM, pp 393–402
Grinter RE, Herbsleb JD, Perry DE (1999) The geography of coordination: Dealing with distance in R&D work. In: Proc Int Conf Supporting Group Work (GROUP), ACM, pp 306–315
Gutwin C, Greenberg S (1999) The effects of workspace awareness support on the usability of real-time distributed groupware. ACM Transactions on Computer-Human Interaction 6(3):243–281
Herbsleb JD, Grinter RE (1999a) Architectures, coordination, and distance: Conway’s law and beyond. IEEE Software 16(5):63–70
Herbsleb JD, Grinter RE (1999b) Splitting the organization and integrating the code: Conway’s law revisited. In: Proc Int Conf Software Engineering (ICSE), ACM, pp 85–95
Herbsleb JD, Mockus A (2003a) An empirical study of speed and communication in globally distributed software development. IEEE Transactions on Software Engineering (TSE) 29(6):481–494
Herbsleb JD, Mockus A (2003b) Formulation and preliminary test of an empirical theory of coordination in software engineering. In: Proc Europ Software Engineering Conf and the Int Symposium Foundations of Software Engineering (ESEC/FSE), ACM, pp 138–147
Herbsleb JD, Roberts JA (2006) Collaboration in software engineering projects: A theory of coordination. In: Proc Int Conf Information Systems (ICIS), Association for Information Systems, pp 553–568
Hindle A, Germán DM, Holt RC (2008) What do large commits tell us? A taxonomical study of large commits. In: Proc Working Conf Mining Software Repositories (MSR), pp 99–108
Hunsen C, Zhang B, Siegmund J, Kästner C, Leßenich O, Becker M, Apel S (2016) Preprocessor-based variability in open-source and industrial software systems: An empirical study. Empirical Software Engineering 21(2):449–482
Jermakovics A, Sillitti A, Succi G (2011) Mining and visualizing developer networks from version control systems. In: Proc Int Workshop Cooperative and Human Aspects of Software Engineering (CHASE), ACM, pp 24–31
Joblin M, Mauerer W, Apel S, Siegmund J, Riehle D (2015) From developer networks to verified communities: A fine-grained approach. In: Proc Int Conf Software Engineering (ICSE), ACM, pp 563–573
Joblin M, Apel S, Mauerer W (2016) Evolutionary trends of developer coordination: A network approach. Empir Softw Eng 22(4):2050–2094
Joblin M, Apel S, Hunsen C, Mauerer W (2017) Classifying developers into core and peripheral: An empirical study on count and network metrics. In: Proc Int Conf Software Engineering (ICSE), IEEE, pp 164–174
Kossinets G (2006) Effects of missing data in social networks. Soc Networks 28(3):247–268
Kwan I, Damian D (2011) Extending socio-technical congruence with awareness relationships. In: Proc Int Workshop on Social Software Engineering (SSE), ACM
Kwan I, Schröter A, Damian D (2009) A weighted congruence measure. In: Int Workshop on Socio-Technical Congruence (STC), Available online
Kwan I, Schröter A, Damian D (2011) Does Socio-Technical congruence have an effect on software build success? A study of coordination in a software project. IEEE Transactions on Software Engineering (TSE) 37(3):307–324
Kwan I, Cataldo M, Damian D (2012) Conway’s law revisited: The evidence for a task-based perspective. IEEE Softw 29(1):90–93
Levesque LL, Wilson JM, Wholey DR (2001) Cognitive divergence and shared mental models in software development project teams. J Organ Behav 22(2):135–144
Li J, Carley KM, Eberlein A (2012) Assessing team performance from a socio-technical congruence perspective. In: Proc Int Conf Software and System Process (ICSSP), IEEE, pp 160–169
Liebig J, Apel S, Lengauer C, Kästner C, Schulze M (2010) An analysis of the variability in forty preprocessor-based software product lines. In: Proc Int Conf Software Engineering (ICSE), ACM, pp 105–114
Liebig J, Kästner C, Apel S (2011) Analyzing the discipline of preprocessor annotations in 30 million lines of C code. In: Proc Int Conf Aspect-oriented Software Development (AOSD), ACM, pp 191–202
López-Fernández L, Robles G, González-Barahona JM (2004) Applying social network analysis to the information in CVS Repositories. In: Proc Int Workshop Mining Software Repositories (MSR), pp 101–105
MacCormack A, Baldwin C, Rusnak J (2012) Exploring the duality between product and organizational architectures: A test of the “Mirroring” hypothesis. Res Policy 41(8):1309–1324
Malone TW, Crowston K (1990) What is coordination theory and how can it help design cooperative work systems? In: Proc Int Conf Computer-supported Cooperative Work (CSCW), ACM, pp 357–370
Mathieu JE, Heffner TS, Goodwin GF, Salas E, Cannon-Bowers JA (2000) The influence of shared mental models on team process and performance. J Appl Psychol 85(2):273–283
Meneely A, Williams L (2011) Socio-technical developer networks: Should we trust our measurements? In: Proc Int Conf Software Engineering (ICSE), ACM, pp 281–290
Milo R, Kashtan N, Itzkovitz S, Newman MEJ, Alon U (2004) On the uniform generation of random graphs with prescribed degree sequences. arXiv:cond-mat/0312028v2
Mitchell BS, Mancoridis S (2006) On the automatic modularization of software systems using the Bunch tool. IEEE Trans Softw Eng 32(3):193–208
Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology (TOSEM) 11(3):309–346
Nagappan N, Murphy B, Basili V (2008) The influence of organizational structure on software quality: An empirical case study. In: Proc Int Conf Software Engineering (ICSE), ACM, pp 521–530
de Oliveira MC, Bonifácio R, Ramos GN, Ribeiro M (2016) Unveiling and reasoning about co-change dependencies. In: Proc Int Conf Modularity (MODULARITY), ACM Press, pp 25–36
Palmer SR, Felsing JM (2002) A Practical Guide to Feature-Driven Development. Prentice-Hall
Parnas DL (1972) On the criteria to be used in decomposing systems into modules. Commun ACM 15(12):1053–1058
Passos L, Padilla J, Berger T, Apel S, Czarnecki K, Valente MT (2015) Feature scattering in the large: A longitudinal study of Linux Kernel device drivers. In: Proc Int Conf Modularity (MODULARITY), ACM, pp 81–92
Passos L, Queiroz R, Mukelabai M, Berger T, Apel S, Czarnecki K, Padilla J (2018) A study of feature scattering in the Linux Kernel. IEEE Transactions on Software Engineering (TSE) pp 1–16, online first
Portillo-Rodríguez J, Vizcaíno A, Piattini M, Beecham S (2014) Using agents to manage socio-technical congruence in a global software engineering project. Inf Sci 264:230–259
Poshyvanyk D, Marcus A, Ferenc R, Gyimóthy T (2009) Using information retrieval based coupling measures for impact analysis. Empir Softw Eng 14(1):5–32
Queiroz R, Passos L, Valente MT, Hunsen C, Apel S, Czarnecki K (2015) The shape of feature code: An analysis of twenty C-preprocessor-based systems. Software and Systems Modeling (SoSyM) 16(1):77–96
R Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org
Ramsauer R, Lohmann D, Mauerer W (2019) The list is the process: Reliable pre-integration tracking of commits on mailing lists. In: Proc Int Conf Software Engineering (ICSE), IEEE, pp 807–818
Rouse WB, Cannon-Bowers JA, Salas E (1992) The role of mental models in team performance in complex systems. IEEE Transactions on Systems, Man, and Cybernetics 22(6):1296–1308
Sarma A, Noroozi Z, van der Hoek A (2003) Palantir: Raising awareness among configuration management workspaces. In: Proc Int Conf Software Engineering (ICSE), IEEE, pp 444–454
Sarma A, Herbsleb J, van der Hoek A (2008) Challenges in Measuring, Understanding, and Achieving Social-Technical Congruence. Tech rep. Institute for Software Research, Carnegie Mellon University
Sarma A, Maccherone L, Wagstrom P, Herbsleb J (2009) Tesseract: Interactive visual exploration of Socio-Technical relationships in software development. In: Proc Int Conf Software Engineering (ICSE), IEEE, pp 23–33
Schroeder M (1992) Fractals, chaos, power laws: Minutes from an infinite paradise. W. H. Freeman, pp 211–215
Schwind M, Wegmann C (2008) SVNNAT: measuring collaboration in software development networks. In: Proc Int Conf E-commerce Technology and Int Conf Enterprise Computing, E-commerce, and E-services (CEC/EEE), IEEE, pp 97–104
Scozzi B, Crowston K, Eseryel UY, Li Q (2008) Shared mental models among open source software developers. In: Proc Hawaii Int Conf System Sciences (HICSS), IEEE, pp 1–10
Sevcikova H, Percival D, Gneiting T (2014) fractaldim: Estimation of Fractal Dimensions. https://CRAN.R-project.org/package=fractaldim
Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64–68
Sierra JM, Vizcaíno A, Genero M, Piattini M (2018) A systematic mapping study about socio-technical congruence. Information and Software Technology (IST) 94:111–129
Sommerville I (2010) Software Engineering, ninth edn. Addison-Wesley
Sosa ME, Eppinger SD, Pich M, McKendrick DG, Stout SK (2002) Factors that influence technical communication in distributed product development: An empirical study in the telecommunications industry. IEEE Transactions on Engineering Management 49(1):45–58
de Souza C, Froehlich J, Dourish P (2005) Seeking the source: software source code as a social and technical artifact. In: Proc Int Conf Supporting Group Work (GROUP), ACM, pp 197–206
Storey MA, Singer L, Figueira Filho F, Zagalsky A, German DM (2016) How social and communication channels shape and challenge a participatory culture in software development. IEEE Transactions on Software Engineering (TSE) 41(7):1–20
Stout R, Salas E (1993) The role of planning in coordinated team decision making: Implications for training. Proc Human Factors and Ergonomics Society Annual Meeting 37(18):1238–1242
Toral SL, Martínez-Torres MR, Barrero F (2010) Analysis of virtual communities supporting OSS projects using social network analysis. Information and Software Technology (IST) 52(3):296–303
Valetto G, Helander M, Ehrlich K, Chulani S, Wegman M, Williams C (2007) Using software repositories to investigate socio-technical congruence in development projects. In: Proc Int Workshop Mining Software Repositories (MSR), IEEE, pp 25:1–25:4
Valetto G, Chulani S, Williams C (2008) Balancing the value and risk of socio-technical congruence. In: Int Workshop on Socio-Technical Congruence (STC), Available online
Wagstrom P, Herbsleb JD, Carley KM (2010) Communication, team performance, and the individual: Bridging technical dependencies. Academy of Management Proceedings 2010(1):1–7
Xuan Q, Filkov V (2014) Building it together: Synchronous development in OSS. In: Proc Int Conf Software Engineering (ICSE), ACM, pp 222–233
Xuan Q, Gharehyazie M, Devanbu PT, Filkov V (2012) Measuring the effect of social communications on individual working rhythms: A case study of open source software. In: Proc Int Conf Social Informatics (SocInfo), IEEE, pp 78–85
Xuan Q, Fang H, Fu C, Filkov V (2015) Temporal motifs reveal collaboration patterns in online task-oriented networks. Phys Rev E 91:052813
Zimmermann T, Weißgerber P, Diehl S, Zeller A (2004) Mining version histories to guide software changes. In: Proc Int Conf Software Engineering (ICSE), IEEE, pp 563–572
Acknowledgements
We thank Alexander Grebhahn, Angelika Schmid, Thomas Bock, and Christian Kästner for their useful comments on previous versions of this paper and their encouragement. Furthermore, we thank all reviewers and editors for their valuable input to improve this article. This work was supported by the DFG (German Research Foundation, AP 206/5-1&2, AP 206/6-1&2, and AP 206/14-1). Siegmund’s work is funded by the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B) and the DFG (SI 2045/2-2).
Additional information
Communicated by: Filippo Lanubile
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Hunsen, C., Siegmund, J. & Apel, S. On the fulfillment of coordination requirements in open-source software projects: An exploratory study. Empir Software Eng 25, 4379–4426 (2020). https://doi.org/10.1007/s10664-020-09833-8
DOI: https://doi.org/10.1007/s10664-020-09833-8