Analyzing the structure of Java software systems by weighted K-core decomposition

https://doi.org/10.1016/j.future.2017.09.039

Highlights

  • We propose an approach that uses weighted k-core decomposition to empirically investigate the static and evolving topological properties of weighted software networks.

  • We propose a weighted software network that represents the topological structure of a software system at the class level, using coupling frequencies as edge weights.

  • We illustrate our approach on a set of 16 open-source software systems and obtain several interesting observations.

Abstract

Statistical properties of unweighted software networks have been studied extensively. However, software networks are inherently weighted, and understanding the properties of weighted software networks can lead to better software engineering practices. In this paper, we construct a set of weighted software networks from real-world Java software systems and empirically investigate their topological properties using weighted k-core decomposition. First, we investigate the static topological properties of the weighted k-core structure and find that a small graph coreness is a property shared by many software systems, that the distribution of weighted coreness follows a power law with an exponential cutoff, and that weighted coreness and node degree are closely correlated, with Spearman correlation coefficients larger than 0.94. Second, we analyze the evolving topological properties of the weighted k-core structure, including the graph coreness, the size of the main core, and the new and vanishing members of the main core. Empirical results show that the graph coreness remains relatively stable unless the system undergoes major changes, that the size of the main core remains stable as the system evolves, and that new members of the main core come from, and vanishing members move to, shells very close to the main core. Finally, we apply the weighted k-core decomposition method to identify key classes and find that, compared with nine other approaches, our approach performs best over the whole set of subject systems according to the average ranking of the Friedman test. It identifies a majority of the classes deemed important. This work could help developers improve software understanding, propose new metrics for software measurement, and evaluate the quality of systems under development.

Introduction

Over the past few years, the study of complex networks has gained overwhelming popularity [[1], [2], [3], [4]]. It provides a unified perspective for studying various complex systems simply by modeling them as networks. Software systems, whether object-oriented (OO) or structured, can be mapped to a network (or graph), also known as a software network, in which nodes represent software entities such as methods/attributes, classes/interfaces, or packages, and edges (or links) represent the couplings between them [5]. As software becomes ever larger and more complex, applying complex network theory to model large software systems and to interpret their global statistical properties is viable [[5], [6]]. Great efforts have been made to understand the topological structure of software, and many physics-like laws shared by software systems have been revealed, such as scale-free [[5], [7], [8], [9]], small-world [[5], [9], [10]], and fractal properties [6].

The k-core structure [[11], [12], [13]] is another interesting structural property that is not captured by scale-free, small-world, or other simple topological properties. An in-depth investigation of the k-core structure of software networks is important for understanding the inner characteristics of software systems [[12], [13], [14]]. Several related studies have been performed [[12], [13], [14], [15]]. However, one major limitation of these studies is that the software networks they used are unweighted, which does not reflect the reality of software [[9], [16]]. Another limitation is that the software systems they analyzed are mainly written in C++. Little attention has been paid to the k-core structure of weighted software networks extracted from Java software systems.
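As a reminder of what the k-core structure is: the k-core of a graph is its maximal subgraph in which every node has at least k neighbours, and a node's coreness is the largest k for which the node belongs to the k-core. The minimal Python sketch below computes coreness on an unweighted class network by repeatedly removing a node of minimum remaining degree. It is a simple quadratic variant of the standard peeling idea, given only for illustration and not the implementation used in the studies cited above; the function name and the class names in the example are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: coreness} for an undirected, unweighted graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops are ignored in this sketch
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    coreness = {}
    level = 0
    while remaining:
        # Peel a node whose degree in the remaining subgraph is minimal.
        node = min(remaining, key=degree.__getitem__)
        # Along the peeling order the core level never decreases.
        level = max(level, degree[node])
        coreness[node] = level
        remaining.remove(node)
        # Removing the node lowers the degree of its surviving neighbours.
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return coreness


# Hypothetical class-level dependencies: a triangle plus one pendant class.
edges = [("Parser", "Lexer"), ("Lexer", "Token"), ("Token", "Parser"), ("Main", "Parser")]
print(core_numbers(edges))
```

Here the three mutually coupled classes form the 2-core, while the pendant class Main only belongs to the 1-core; the largest coreness in the network (2 in this toy case) is what the paper calls the graph coreness.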

The objective of this paper is to explore the characteristics of the k-core structure in weighted software networks extracted from Java software systems. First, we formally represent the topological structure of Java software at the class level of granularity using a weighted software network, which uses the coupling frequencies between classes as edge weights. Second, we introduce the k-core decomposition method for weighted complex networks proposed in [17] (hereinafter referred to as Wk-core) and use it to compute the k-core structure of the weighted software network. Wk-core partitions the weighted software network into a layered structure, which we then characterize by a number of relevant properties using statistical parameters. Our approach could uncover characteristics of the topological structure of software systems that help developers improve software understanding, propose new metrics for software measurement, and evaluate the quality of systems under development.
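The first two steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The input is assumed to be a list of (source class, target class) coupling occurrences already extracted from the source code; edge direction is dropped and the weight of an edge is the number of coupling occurrences between the two classes. For the weighted degree we assume the form used in the weighted k-shell method of Garas et al. (listed in the references), k'_i = (k_i^α · s_i^β)^(1/(α+β)), where k_i is the degree of node i and s_i the sum of its edge weights, with α = β = 1 by default. Nodes are then peeled in order of their current weighted degree, as in generalized cores; keeping k' real-valued instead of discretizing it is a simplification of ours. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_weighted_network(couplings):
    """Undirected class network; edge weight = number of coupling occurrences.

    Assumptions: direction is dropped and self-couplings are ignored.
    """
    weights = Counter()
    for src, dst in couplings:
        if src != dst:
            weights[frozenset((src, dst))] += 1
    adj = defaultdict(dict)
    for pair, w in weights.items():
        u, v = tuple(pair)
        adj[u][v] = w
        adj[v][u] = w
    return adj


def weighted_degree(adj, node, alive, alpha=1.0, beta=1.0):
    """k' = (k^alpha * s^beta)^(1/(alpha+beta)), counting only live neighbours."""
    k = sum(1 for nbr in adj[node] if nbr in alive)
    s = sum(w for nbr, w in adj[node].items() if nbr in alive)
    return (k ** alpha * s ** beta) ** (1.0 / (alpha + beta))


def wk_core(adj, alpha=1.0, beta=1.0):
    """Peel nodes by minimum weighted degree; return {node: weighted coreness}."""
    alive = set(adj)
    coreness = {}
    level = 0.0
    while alive:
        # Weighted degrees are recomputed on the remaining subgraph each round.
        degs = {n: weighted_degree(adj, n, alive, alpha, beta) for n in alive}
        node = min(alive, key=degs.get)
        level = max(level, degs[node])
        coreness[node] = level
        alive.remove(node)
    return coreness


# Hypothetical coupling occurrences, e.g. method calls or field accesses.
couplings = [
    ("Parser", "Lexer"), ("Parser", "Lexer"), ("Parser", "Lexer"),
    ("Lexer", "Parser"), ("Parser", "Token"), ("Lexer", "Token"),
    ("Main", "Parser"),
]
network = build_weighted_network(couplings)
print(wk_core(network))
```

With α = β = 1, k' is the geometric mean of a class's degree and its total coupling weight, so a class coupled to few classes but very frequently can still sit in a higher shell than its plain degree would suggest.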

The primary contributions of the current paper are as follows:

  • We propose an approach that uses weighted k-core decomposition to empirically investigate the static and evolving topological properties of weighted software networks.

  • We propose a weighted software network that represents the topological structure of a software system at the class level, using coupling frequencies as edge weights.

  • We illustrate our approach on a set of 16 open-source software systems and obtain several interesting observations.

The rest of this paper is structured as follows. Section 2 gives a brief overview of related work on the k-core structure of software networks. Section 3 describes our approach in detail, focusing on the definition of the weighted software network and Wk-core. In Section 4, we use Wk-core to partition weighted software networks into layered structures and use statistical parameters to uncover characteristics of the topological structure of software systems. Section 5 discusses the implications of our results for software engineering, and Section 6 concludes the paper.


Related work

To the best of our knowledge, only a few studies have investigated the k-core structure of software networks, and all of them were published before 2016.

Zhang et al. [[12], [14]] investigated the topological properties of a set of unweighted software networks extracted from software systems at the class level, and found several notable properties, such as small software coreness, high-core connecting tendency of classes, and evolution stability of

Method

Our approach works as follows. First, we parse the .java files of a Java software system to extract meaningful structural information from the source code, and we formally represent the extracted information as a weighted software network. Second, we employ Wk-core to obtain the k-core structure of the weighted software network. Finally, we characterize the k-core structure by a number of relevant properties using statistical parameters. The following subsections will discuss the
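The final step, characterizing the k-core structure with statistical parameters, can be illustrated with a few of the quantities named in the abstract. The sketch below is written under our own assumptions: coreness and degree are supplied as plain dictionaries keyed by class name, and the helper names are hypothetical. It computes the graph coreness (the largest coreness in the network), the main core (the classes that attain it), the number of classes in each shell, and the Spearman rank correlation between coreness and degree, with tied values given their average rank.

```python
from collections import Counter


def graph_coreness(coreness):
    """Largest coreness value in the network."""
    return max(coreness.values())


def main_core(coreness):
    """Classes whose coreness equals the graph coreness."""
    top = graph_coreness(coreness)
    return {n for n, c in coreness.items() if c == top}


def shell_sizes(coreness):
    """Number of classes in each shell, keyed by coreness value."""
    return Counter(coreness.values())


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of average ranks (undefined if either list is constant)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical coreness and degree values for four classes.
coreness = {"Main": 1.0, "Token": 2.0, "Parser": 2.0, "Lexer": 2.0}
degree = {"Main": 1, "Token": 2, "Parser": 3, "Lexer": 2}
nodes = sorted(coreness)
rho = spearman([coreness[n] for n in nodes], [degree[n] for n in nodes])
print(graph_coreness(coreness), sorted(main_core(coreness)), rho)
```

Spearman's coefficient suits this comparison because it only assumes a monotone, not a linear, relation between coreness and degree, and coreness values are heavily tied within each shell.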

Empirical study

We designed and conducted a set of experiments to investigate the topological structure of real-world software systems, and its evolution, using the weighted k-core decomposition method. Our experiments were carried out on a PC with a 2.6 GHz CPU and 8 GB of RAM.

In the following sections, we describe in detail the objects of study (Section 4.1) and our analysis of the results (Section 4.2).
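The evolving analysis summarized in the abstract compares the main cores of consecutive versions of a system. A minimal sketch of that comparison follows, assuming each version's coreness values are available as a dictionary keyed by class name; the function names and the class names in the example are hypothetical. For every class that enters the main core, it reports the class's coreness in the previous version, and for every class that leaves, its coreness in the next version, so that one can check how close to the main core those shells lie.

```python
def main_core(coreness):
    """Classes whose coreness equals the maximum (the graph coreness)."""
    top = max(coreness.values())
    return {n for n, c in coreness.items() if c == top}


def main_core_turnover(old, new):
    """Compare the main cores of two consecutive versions.

    Returns (entered, left):
    - entered maps each new main-core member to its coreness in the old
      version (None if the class did not exist before);
    - left maps each former main-core member to its coreness in the new
      version (None if the class was deleted).
    """
    old_main, new_main = main_core(old), main_core(new)
    entered = {n: old.get(n) for n in new_main - old_main}
    left = {n: new.get(n) for n in old_main - new_main}
    return entered, left


# Hypothetical coreness values of two consecutive versions.
old = {"A": 3, "B": 3, "C": 3, "D": 2, "E": 1}
new = {"A": 3, "B": 3, "C": 2, "D": 3, "E": 1, "F": 3}
entered, left = main_core_turnover(old, new)
print(entered, left)
```

In this toy case, D enters the main core from the shell directly below it, a newly added class F enters directly, and C drops to the next lower shell; the gap between a version's graph coreness and such a coreness value is one simple way to quantify "very near the main core".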

Implications for software engineering

Complex systems and complexity science are viewed as the ‘21st Century Science’ [40]. Their basic view is that topological structure determines function, which emphasizes viewing the system as a whole. Software networks are another important class of complex networks that can also be studied using complex network theory. This perspective adds a new dimension to our understanding of software by viewing it as a whole and ignoring microscopic details. Research on

Conclusions

In this work, we propose an approach to uncover the properties of weighted software networks, to help developers improve software understanding, propose new metrics for software measurement, and evaluate the quality of systems under development. To analyze the topological properties of software, we first propose a weighted class coupling network (WCCN) to represent a piece of software at the class level of granularity, which takes into consideration the coupling frequency to assign

Acknowledgment

This work was supported by the National Key Research and Development Program of China (Nos. 2016YFB0800400 and 2014CB340404), the National Natural Science Foundation of China (Nos. 61273216, 61572371 and 61402406), the Zhejiang Provincial Natural Science Foundation of China (No. LY15F020004) and the Commonweal Project of Science and Technology Department of Zhejiang Province (No. 2014C23008).

Weifeng Pan received his Ph.D. degree from School of Computer at Wuhan University, China, in 2011. He is presently an associate professor in School of Computer Science and Information Engineering at Zhejiang Gongshang University. He is also a member of China Computer Federation (CCF) and ACM. His current research interests include software engineering, service computing, complex networks, and intelligent computation.

References

  • Al-Garadi M.A. et al., Identification of influential spreaders in online social networks using interaction weighted k-core decomposition method, Physica A (2017).
  • Khan M.S. et al., Virtual community detection through the association between prime nodes in online social networks and its application to ranking algorithms, IEEE Access (2016).
  • Myers C.R., Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs, Phys. Rev. E (2003).
  • Potanin A. et al., Scale-free geometry in OO programs, Commun. ACM (2005).
  • Concas G. et al., Power-laws in a large object-oriented software system, IEEE Trans. Softw. Eng. (2007).
  • Pan W.F. et al., Multi-granularity evolution analysis of software using complex network theory, J. Syst. Sci. Complex. (2011).
  • Ma Y.T. et al., A hybrid set of complexity metrics for large-scale object-oriented software systems, J. Comput. Sci. Tech. (2010).
  • Batagelj V. et al., Generalized cores, Adv. Data Anal. Classif. (2011).
  • Zhang H.H. et al., Using the k-core decomposition to analyze the static structure of large-scale software systems, J. Supercomput. (2010).
  • Li H. et al., Research on hierarchy of large-scale software macro-topology base on k-core, Chin. J. Electron. (2010).
  • Zhang H.H., Zhao H., Cai W., Zhao M., Luo G.L., Visualization and cognition of large-scale software structure using the...
  • Li H. et al., Extraction and analysis of crucial fraction in software networks, Int. J. Softw. Eng. Knowl. Eng. (2014).
  • Pan W.F. et al., Measuring structural quality of object-oriented softwares via bug propagation analysis on weighted software networks, J. Comput. Sci. Tech. (2010).
  • Garas A. et al., A k-shell decomposition method for weighted networks, New J. Phys. (2012).
  • Prokhorenko V. et al., Intent-based extensible real-time PHP supervision framework, IEEE Trans. Inf. Forensics Secur. (2016).

Bing Li received his Ph.D., M.S., and B.A. degrees from Huazhong University of Science and Technology, China, in 2003, 1997 and 1990 respectively, all in computer science. He is presently a Professor and Ph.D. supervisor in International School of Software and Research Center for Complex Network at Wuhan University. He is also a senior member of China Computer Federation (CCF) and a member of ACM. His main research interests include requirements engineering, cloud computing, complex network, and semantic web service.

Jing Liu is now an associate professor of State Key Laboratory of Software Engineering at Wuhan University. She is a member of China Computer Federation (CCF) and ACM. She received the Ph.D. degree from Wuhan University in 2007. Her current research interests include software metrics, software evolution and the interdisciplinary research between software engineering and complex networks.

Yutao Ma is now an associate professor of State Key Laboratory of Software Engineering at Wuhan University. He is a member of China Computer Federation (CCF) and ACM. He received the Ph.D. degree from Wuhan University in 2007. His current research interests include software metrics, software evolution and the interdisciplinary research between software engineering and complex networks.

Bo Hu is presently a researcher in Kingdee Research, Kingdee International Software Group Co. Ltd. He received his Ph.D. degree from State Key Laboratory of Software Engineering at Wuhan University, China, in 2011. His current research interests include software metrics, cloud computing, and complex networks.
