Community detection in software ecosystem by comprehensively evaluating developer cooperation intensity
Introduction
Since the concept of the software ecosystem was first proposed in 2003 by Messerschmitt and Szyperski [1], it has attracted great interest in both academia [2], [3] and industry (Apple, Microsoft, SAP, etc.).
The main reason for the rapid development of software ecosystems lies in their economic, strategic, and technical advantages [4]. As the highest level of current software engineering, a software ecosystem offers high reliability and scalability, and can evolve and adapt rapidly to changing market needs [5]. By opening up the system, software enterprises can accelerate innovation and share development costs with partners, thereby improving the efficiency of software development and reducing operating costs.
However, the software ecosystem is still a new research topic, and the technology that sustains its success and growth is much less understood [6], [7]. The structural characteristics of a software ecosystem help reveal the hidden rules and behavioral characteristics of the system and improve its viability and knowledge reuse, thereby ensuring the stability and robustness of the system. How to mine the structural characteristics of a software ecosystem is therefore a subject worth studying.
In essence, a software ecosystem can be described as a complex network. Community structures are critical to understanding not only the network topology but also how the network functions [8]. Community detection has therefore become a hot topic in complex network research over the past decades. For a software ecosystem, community detection helps us understand its behavior, evolution, and development, which is of great theoretical and practical significance [9]. Franco-Bedoya et al. have also stressed that the next challenge for software ecosystems is the relationships among all the parties in the ecosystem [10]. Community structure is an important reflection of these relationships between individuals.
Since Girvan and Newman put forward the concept of community detection in 2002 [11], various community detection algorithms have been proposed, mainly including graph segmentation methods [12], cluster-based methods [13], and heuristic methods [14].
Existing community detection methods for complex networks are mainly based on topological structure. However, the dependencies and interactions between individuals in a software ecosystem are very complex, and the scale of data access is far larger than in a traditional software system. To achieve a reasonable community division, the network structure and other information of the software ecosystem must be clearly described and analyzed. Because existing methods do not fully consider these characteristics, they struggle to obtain a satisfactory community division in a software ecosystem.
In view of this, this paper presents a community detection method that comprehensively evaluates developer cooperation intensity. First, we combine network topology information and developer interaction information in the software ecosystem to calculate the developer cooperation intensity, exploring the relationship between developers in terms of both topological and semantic properties. Then a community detection algorithm, Algorithm Based on Developer Cooperation Intensity (ABDCI), is proposed, drawing on the hierarchical clustering idea of the Louvain algorithm. Finally, the proposed method is applied to a typical open source software ecosystem, GitHub. The experimental results show that the proposed method can identify a clear community structure in the developer collaboration network of GitHub.
The rest of this paper is organized as follows. Section 2 presents the related work. Section 3 introduces the relevant concepts and our research object. Section 4 gives the measurement of the cooperation intensity between developers. A new community detection algorithm is proposed based on the idea of hierarchical clustering. Section 5 applies the proposed method to GitHub, a typical open source community in the software ecosystem. Section 6 presents possible threats to validity. Finally, conclusions are given in Section 7.
Related work
Since this paper studies community detection for software ecosystems, this section first summarizes work on software ecosystems and then reviews community detection methods.
Preliminaries
There are different types of relationships among developers in a software ecosystem, such as attention, submission to common repositories, and comment interaction. Submission is the behavior that best reflects the contribution of participants to a repository. Developers who submit to the same repository usually have similar skills and preferences and should be classified into the same cluster. So this paper builds the software ecosystem network taking the joint repository submission as
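The construction described above can be illustrated with a minimal Python sketch. It assumes the raw data is a list of (developer, repository) submission records and links two developers whenever they have submitted to the same repository. The function name `build_collaboration_network`, the sample records, and the choice of "number of shared repositories" as the edge weight are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_network(commits):
    """Build a weighted developer network from (developer, repository) pairs.

    Returns {frozenset({a, b}): number of repositories both submitted to}.
    The edge weight is an assumption made for this sketch.
    """
    # Group developers by the repository they submitted to.
    contributors = defaultdict(set)
    for developer, repository in commits:
        contributors[repository].add(developer)

    # Every pair of co-contributors of a repository gets (or strengthens) an edge.
    edges = defaultdict(int)
    for developers in contributors.values():
        for a, b in combinations(sorted(developers), 2):
            edges[frozenset((a, b))] += 1
    return dict(edges)


# Hypothetical submission records, for illustration only.
commits = [
    ("alice", "repo1"), ("bob", "repo1"),
    ("alice", "repo2"), ("bob", "repo2"), ("carol", "repo2"),
    ("dave", "repo3"),
]
network = build_collaboration_network(commits)
```

In this toy input, alice and bob share two repositories, carol shares one with each of them, and dave has no joint submission and therefore no edge.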
Developer cooperation intensity
In traditional complex networks, the similarity between nodes is mainly calculated from topological properties. For real networks, such as software ecosystems, we need to consider the impact of other information in addition to the network topology when measuring the similarity of vertices. On the one hand, the relationship between developers is directly reflected by their cooperation behavior, which can be calculated by the connectivity (i.e. topology properties) in the developer cooperation
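As a rough illustration of combining topological and semantic information into a single score, the following Python sketch mixes two similarity measures with a weight. The paper's actual similarity functions are not given in this excerpt, so everything here is an assumption: Jaccard similarity of closed neighborhoods stands in for the topological part, cosine similarity of per-repository submission counts stands in for the semantic part, and `alpha` is a hypothetical mixing parameter.

```python
import math


def topological_similarity(neighbors, u, v):
    """Jaccard similarity of closed neighborhoods (assumed stand-in)."""
    nu = neighbors[u] | {u}
    nv = neighbors[v] | {v}
    return len(nu & nv) / len(nu | nv)


def semantic_similarity(profile_u, profile_v):
    """Cosine similarity of activity profiles (assumed stand-in).

    A profile maps a repository name to a submission count.
    """
    dot = sum(x * profile_v.get(k, 0) for k, x in profile_u.items())
    norm_u = math.sqrt(sum(x * x for x in profile_u.values()))
    norm_v = math.sqrt(sum(x * x for x in profile_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cooperation_intensity(neighbors, profiles, u, v, alpha=0.5):
    """Weighted mix of both similarities; alpha is a hypothetical parameter."""
    topo = topological_similarity(neighbors, u, v)
    sem = semantic_similarity(profiles[u], profiles[v])
    return alpha * topo + (1 - alpha) * sem


# Hypothetical data, for illustration only.
neighbors = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": set(),
}
profiles = {
    "alice": {"repo1": 3, "repo2": 1},
    "bob": {"repo1": 3, "repo2": 1},
    "carol": {"repo2": 4},
    "dave": {"repo3": 2},
}
close_pair = cooperation_intensity(neighbors, profiles, "alice", "bob")
distant_pair = cooperation_intensity(neighbors, profiles, "alice", "dave")
```

A linear combination is only one possible design; it keeps both terms in [0, 1] so the resulting intensity is also in [0, 1] and can be used directly as an edge weight.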
Algorithm based on developer cooperation intensity
Hierarchical clustering algorithms perform well in community detection; among them, the Louvain algorithm [35] has received much attention because of its high efficiency and stability. It iterates by repeatedly merging the communities that yield larger modularity gains. The Louvain algorithm can be viewed as a two-stage iterative merge process: the first stage initially treats each node as its own community and, by evaluating the modularity function, merges each node into the community with the largest modularity gain.
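The first stage described above can be sketched in a few lines of Python. This is a minimal illustration of the standard modularity-based local moving step of Louvain, not the ABDCI algorithm itself; the graph representation (a symmetric dictionary of weighted neighbors without self-loops) and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def modularity(adj, community):
    """Newman modularity Q for a weighted undirected graph.

    adj: {u: {v: weight}}, symmetric, no self-loops.
    community: {node: community label}.
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    internal = defaultdict(float)  # intra-community weight, both directions
    total = defaultdict(float)     # summed degree of each community
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            total[community[u]] += w
            if community[u] == community[v]:
                internal[community[u]] += w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj):
    """Louvain stage one: move nodes until no move increases modularity."""
    community = {u: u for u in adj}            # each node starts alone
    degree = {u: sum(adj[u].values()) for u in adj}
    total = dict(degree)                       # summed degree per community
    two_m = sum(degree.values())
    improved = True
    while improved:
        improved = False
        for u in adj:
            current = community[u]
            total[current] -= degree[u]        # temporarily remove u
            links = defaultdict(float)         # weight from u to each community
            for v, w in adj[u].items():
                links[community[v]] += w
            # Gain of joining c, up to a constant factor:
            # k_{u,in}(c) - total(c) * k_u / (2m)
            best = current
            best_gain = links[current] - total[current] * degree[u] / two_m
            for c, k_in in links.items():
                gain = k_in - total[c] * degree[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            community[u] = best
            total[best] += degree[u]
            if best != current:
                improved = True
    return community


# Toy graph: two triangles joined by a single edge (illustrative only).
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = defaultdict(dict)
for u, v in edges:
    adj[u][v] = 1.0
    adj[v][u] = 1.0

partition = local_moving(adj)
q = modularity(adj, partition)
```

On this toy graph the pass groups each triangle into its own community. The second stage of Louvain, which collapses each community into a super-node and repeats the process, is omitted to keep the sketch short.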
Description of datasets
In order to verify the effectiveness of the proposed method, we choose GitHub, a popular code hosting platform, to collect data for experiments. GitHub, originally used for code sharing, is a complex system containing software ecosystems of different types and sizes. Users can push code to it and collaborate with others on repositories. With the rapid increase in the number of developers and repositories, GitHub has become the most popular and widely used system, from which we
Threats to the validity
This section discusses the main threats to the validity of the experiments and how they are addressed.
First of all, there are internal threats. The interaction between developers is complex, so it is difficult to find a reasonable quantitative way to explore their potential relationships in depth. To avoid the one-sidedness of the traditional topology model, this paper adds an analysis of semantic properties. We define the topological similarity function and semantic similarity function from two
Conclusion
This paper aims to solve the community detection problem for real developer collaboration networks in software ecosystems. Considering the shortcomings of relying on a single property of nodes in traditional complex networks, this paper defines the cooperation intensity of developers by synthesizing the semantic properties of developers and the topological properties of the network, and proposes a community detection algorithm, ABDCI, based on this cooperation intensity. Experiments on different
CRediT authorship contribution statement
Tingting Hou: Conception or design of the work, Acquisition, analysis, or interpretation of data, Drafted the work or revised it critically for important intellectual content, Approved the final version to be published, Agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is jointly funded by the National Natural Science Foundation of China (No. 61573362 and 61773384), the National Key Research and Development Program of China (No. 2018YFB1003802) and the Fundamental Research Funds for the Central Universities (2020ZDPYMS40).
References (37)
A focus area maturity model for software ecosystem governance. Inf. Softw. Technol. (2020)
Revisiting software ecosystems research: A longitudinal literature study. J. Syst. Softw. (2016)
Variability mechanisms in software ecosystems. Inf. Softw. Technol. (2014)
Open source software ecosystems: A Systematic mapping. Inf. Softw. Technol. (2017)
Special issue editorial: Understanding software ecosystems. Inf. Softw. Technol. (2014)
Evolution of the R software ecosystem: metrics, relationships, and their impact on qualities. J. Syst. Softw. (2017)
Software Engineering beyond the Project - sustaining Software Ecosystems. Inf. Softw. Technol. (2014)
Startup ecosystem effect on minimum viable product development in software startups. Inf. Softw. Technol. (2019)
Analysis and design of software ecosystem architectures - Towards the 4S telemedicine ecosystem. Inf. Softw. Technol. (2014)
Bridges and barriers to hardware-dependent software ecosystem participation - A case study. Inf. Softw. Technol. (2014)
A link clustering based overlapping community detection algorithm. Data Knowl. Eng.
Discovering how end-user programmers and their communities use public repositories: A study on Yahoo! Pipes. Inf. Softw. Technol.
Does UML make the grade? Insights from the software development community. Inf. Softw. Technol.
Analysis of virtual communities supporting OSS projects using social network analysis. Inf. Softw. Technol.
A cascade information diffusion based label propagation algorithm for community detection in dynamic social networks. J. Comput. Sci.
Software Ecosystem: Understanding an Indispensable Technology and Industry