Exploring community structure of software Call Graph and its applications in class cohesion measurement

doi:10.1016/j.jss.2015.06.015

Journal of Systems and Software

Volume 108, October 2015, Pages 193-210

https://doi.org/10.1016/j.jss.2015.06.015

Highlights

• We show that static software Call Graphs exhibit significant community structures.
• We propose two new class cohesion metrics based on community structures.
• The new metrics provide new and useful measurements of software class cohesion.
• The new metrics perform better than existing metrics in software fault prediction.

Abstract

Many complex networked systems exhibit natural divisions of network nodes. Each division, or community, is a densely connected subgroup. Such community structure not only aids comprehension but also has wide applications in complex systems. Software networks such as Class Dependency Networks also exhibit community structures, but their characteristics at the granularity of function or method calls, which are useful for evaluating and improving intra-class structure, have not been investigated. Moreover, the applications of software community structure proposed so far have not been directly compared or combined with existing software engineering practices, and such a comparison with baseline practices is needed to convince practitioners to adopt the proposed approaches. In this paper, we show that networks formed by software methods and their calls exhibit relatively significant community structures. Based on this finding, we propose two new class cohesion metrics to measure the cohesiveness of object-oriented programs. Our experiments on 10 large open-source Java programs validate the existence of community structures and show that the derived metrics provide additional and useful measurements of class cohesion. As an application, we show that the new metrics predict software faults more effectively than existing metrics.

Introduction

Many natural and man-made complex networked systems, including metabolic networks, computer networks and social networks, exhibit divisions or clusters of network nodes (Flake, Lawrence, Giles, 2000, Fortunato, 2010, Girvan, Newman, 2002, Mucha, Richardson, Macon, Porter, Onnela, 2010, Palla, Derényi, Farkas, Vicsek, 2005). Each division, or community (Girvan and Newman, 2002), is a densely connected and highly correlated subgroup. Such community structure not only aids comprehension but also has wide applications in complex systems. For example, researchers in biology and bioinformatics have applied community detection algorithms to identify functional groups of proteins in Protein–Protein Interaction networks (Dunn, Dudbridge, Sanderson, 2005, Jonsson, Cavanna, Zicha, Bates, 2006). For online auction sites such as ebay.com, community structure has been used to improve the effectiveness of recommendation systems (Jin, Parkes, Wolfe, 2007, Reichardt, Bornholdt, 2007). A survey of the applications of community detection algorithms can be found in Fortunato (2010).

There have also been research efforts to investigate community structures in software, itself a very complex system (Concas, Monni, Orru, Tonelli, 2013, Pan, Li, Ma, Liu, 2011, Šubelj, Bajec, 2011, Šubelj, Bajec, 2012, Šubelj, Žitnik, Blagus, Bajec, 2014). Most of them reported a significant community structure in a certain type of software network, such as Class Dependency Networks (Šubelj and Bajec, 2011). Some pioneering applications of software community structure have been proposed (see Section 2 for details). However, several problems remain unsolved.

Firstly, most measurements have been performed on networks of classes. Few results have been reported at the granularity of software method or function calls, i.e., method/function Call Graphs (Graham et al., 1982). Such an investigation is necessary from both theoretical and practical perspectives. In addition, measurements of class-level networks cannot capture intra-class structure, which limits their application to software quality evaluation and improvement.

Secondly, these pioneering applications have not been directly compared or combined with existing software engineering metrics and practices. Comparison with baseline practices is needed to convince people to adopt the proposed approaches: software engineering practitioners are only likely to adopt them if they outperform or complement existing methods.

Do software networks at other granularities also present significant community structures? If so, how can we make use of them in software engineering practice? To answer these open questions and address the existing problems, we construct static Call Graphs, in which nodes represent methods in an OO (Object-Oriented) program and edges represent method invocation relations, and then apply existing community detection algorithms to these graphs. Fig. 1 depicts the community structure of jEdit, an open-source text editor: its 5979 nodes are divided into 34 communities, shown in different colors. The communities were detected by the Louvain algorithm (Blondel et al., 2008) as implemented in Pajek,¹ a network analysis and visualization tool. In Section 3, we show that this result presents typical community characteristics similar to those previously observed in other complex systems.
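The Louvain algorithm searches for a partition with high modularity Q, the usual score of how much denser edges are inside communities than a random graph with the same node degrees would predict. As a minimal sketch, assuming the standard Newman-Girvan definition and treating every call edge as undirected (an assumption; directed variants of Q exist), Q for a given partition can be computed as follows. The function name `modularity` and the two toy classes are hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: distinct (u, v) pairs with u != v; direction is ignored.
    labels: dict mapping every node to its community label.
    Q = sum_c [ L_c / m - (d_c / (2m))**2 ], where m is the number of
    edges, L_c the number of edges inside community c, and d_c the total
    degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    inside = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two hypothetical classes whose methods only call each other.
calls = [("A.a", "A.b"), ("A.b", "A.c"), ("A.c", "A.a"),
         ("B.x", "B.y"), ("B.y", "B.z"), ("B.z", "B.x")]
by_class = {n: n.split(".")[0] for pair in calls for n in pair}
print(modularity(calls, by_class))  # 0.5
```

Putting every method into a single community gives Q = 0 for any graph, so a clearly positive Q is what signals a community structure beyond chance.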

It is well known that high-quality software should exhibit “high cohesion and low coupling”. Software with this property is believed to be easy to understand, modify, and maintain (Briand, Bunse, Daly, 2001, Pressman, 2010). Object-oriented design strives to group data and related functionality into modules, which usually reduces coupling between modules. However, employing object-oriented mechanisms does not by itself guarantee minimal coupling and maximal cohesion. A quantitative measurement is therefore valuable both in a posteriori analysis of a finished product to control software quality and in a priori analysis that guides coding to avoid undesirable results in the first place.

The existence of community structures, as confirmed by our experiments on 10 large open-source Java programs using four widely used community detection algorithms, sheds light on measuring the cohesiveness of OO programs. Intuitively, community structures can indicate cohesion: nodes within a community are highly cohesive, and nodes in different communities are loosely coupled. In this paper, we propose two new class cohesion metrics based on community structures: MCC (Method Community Cohesion) and MCEC (Method Community Entropy Cohesion). The basic idea of MCC is to quantify how many methods of a given class reside in the same community. MCEC uses the standard notion of information entropy (Shannon, 2001) to quantify how all the methods of a class are distributed among communities. Compared with existing metrics, these two metrics provide a new and more systematic view of class cohesion.
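MCEC's exact formula, including any normalization or special cases, belongs to Section 4 and is not reproduced in this excerpt. The sketch below only illustrates the entropy ingredient described above: the Shannon entropy, in bits, of how one class's methods are spread over communities. The function name `community_entropy` is hypothetical, and the function is not MCEC itself.

```python
import math
from collections import Counter


def community_entropy(method_communities):
    """Shannon entropy, in bits, of one class's methods over communities.

    method_communities: the community label of each method of the class.
    Returns 0.0 when all methods share one community and grows as the
    methods scatter over more communities. Not MCEC itself.
    """
    m = len(method_communities)
    if m == 0:
        raise ValueError("class has no methods")
    counts = Counter(method_communities)
    # Sum of p * log2(1/p) over communities, with p = n_i / m.
    return sum((n / m) * math.log2(m / n) for n in counts.values())


print(community_entropy(["c1", "c1", "c1"]))        # 0.0
print(community_entropy(["c1", "c2"]))              # 1.0
print(community_entropy(["c1", "c2", "c3", "c4"]))  # 2.0
```

Low entropy corresponds to a class concentrated in few communities, which matches the intuition that such a class is cohesive; how MCEC turns this quantity into a cohesion score is left to the paper's definition.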

Fig. 2 gives an overview of our approach. Once a Call Graph is constructed, we apply widely used community detection algorithms to it. The example in Fig. 2 is the static Call Graph of JHotDraw, a Java GUI framework for technical and structured graphics; its 5125 nodes are divided into 35 communities by the Louvain algorithm (Blondel et al., 2008). The MCC and MCEC metrics are then computed from these community structures.
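The paper obtains the communities in Figs. 1 and 2 with Louvain. To show what running a community detection algorithm on a Call Graph involves without any external library, the sketch below swaps in asynchronous label propagation, a simpler and different algorithm whose partitions will generally differ from Louvain's. It ignores edge direction and drops self-calls, both assumptions of this sketch; the function name `label_propagation` and the method names are hypothetical.

```python
import random
from collections import Counter


def label_propagation(edges, seed=0, max_rounds=100):
    """Asynchronous label propagation on the undirected view of a call graph.

    edges: iterable of (caller, callee) pairs; direction is ignored.
    Returns a dict mapping every node to a community label.
    """
    rng = random.Random(seed)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set())
        neighbors.setdefault(v, set())
        if u != v:  # self-calls (recursion) say nothing about communities
            neighbors[u].add(v)
            neighbors[v].add(u)

    labels = {node: node for node in neighbors}  # every node starts alone

    def top_labels(node):
        """Labels that occur most often among the node's neighbors."""
        counts = Counter(labels[n] for n in neighbors[node])
        if not counts:
            return {labels[node]}  # an isolated node keeps its own label
        best = max(counts.values())
        return {lab for lab, c in counts.items() if c == best}

    nodes = sorted(neighbors)
    for _ in range(max_rounds):
        rng.shuffle(nodes)  # visit nodes in a random order each round
        for node in nodes:
            candidates = top_labels(node)
            if labels[node] not in candidates:
                labels[node] = rng.choice(sorted(candidates))
        # Converged once each node holds one of its most frequent neighbor labels.
        if all(labels[node] in top_labels(node) for node in nodes):
            break
    return labels


calls = [("A.a", "A.b"), ("A.b", "A.c"), ("A.c", "A.a"),
         ("B.x", "B.y"), ("B.y", "B.z"), ("B.z", "B.x")]
communities = label_propagation(calls, seed=42)
print(len(set(communities.values())))  # 2
```

Because labels only spread along edges, methods in different connected components never share a community; within a component, the result may change with `seed`, which is one source of the run-to-run variation the paper examines for jEdit.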

We validate the proposed metrics as follows. Firstly, we show that MCC and MCEC theoretically satisfy the expected properties of class cohesion metrics (Briand et al., 1998). Secondly, we empirically compare MCC and MCEC with five widely used class cohesion metrics; our experiments indicate that the new metrics are more reasonable than the existing ones. Thirdly, Principal Component Analysis (PCA; Pearson, 1901) is conducted to show that MCC and MCEC provide additional and useful information about class cohesion that is not reflected by existing metrics. Finally, experiments show that MCC and MCEC usually perform as well as or better than existing class cohesion metrics when used in software fault prediction.

In summary, we make the following contributions in this paper:

1. We show through experiments on 10 large open-source Java programs that the static Call Graphs constructed from OO programs usually exhibit relatively significant community structures, as other networked complex systems (e.g., social networks) do. Such results are helpful for evaluating intra-class structure and quality.
2. Based on the community structures of Call Graphs, we propose two new class cohesion metrics. We confirm that the proposed metrics satisfy the theoretical requirements of cohesion metrics, and a comparison with five existing metrics shows that class cohesion metrics based on community structures can provide new insight into OO programs.
3. We conduct an empirical study and illustrate the effectiveness of the new metrics through software fault prediction experiments on four open-source programs with 1500 classes, 702 of which are faulty. Results show that the new metrics usually perform as well as or better than existing ones.

The rest of this paper is organized as follows. Section 2 reviews related work. In Section 3, the community structures of 10 large open-source programs are investigated using four community detection algorithms. Two class cohesion metrics based on community structure are proposed in Section 4. Section 5 presents empirical evaluations of the class cohesion metrics, followed by a discussion of community detection algorithms and potential applications of the proposed metrics in Section 6. Finally, Section 7 concludes the paper and outlines future work.

Section snippets

Community structure of software

Significant progress in Complex Network theory (Barabási, Albert, 1999, Chakrabarti, Faloutsos, 2006, Watts, Strogatz, 1998), which was originally developed in physics and data science, has led to its wide adoption in different domains (Fortunato, 2010). In recent years, the theory has been successfully applied in software engineering, including software evolution process modeling and understanding (Li, Zhao, Cai, Xu, Ai, 2013, Pan, Li, Ma, Liu, 2011, Turnu, Concas, Marchesi, Pinna,

Call Graphs

For an OO program $P$, its Call Graph $CG_P$ is a directed graph $CG_P = (V, E)$, where each node $v \in V$ represents a method in $P$ and the edge set $E$ represents the method invocation relationships. Let $m_i$ denote the method that $v_i$ refers to. Then $v_i \to v_j \in E$ if and only if $m_i$ contains at least one method invocation that calls $m_j$.
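The definition above can be sketched directly, assuming the call sites of $P$ have already been extracted by some static analysis (not shown). Several call sites from $m_i$ to $m_j$ collapse into one edge because only "at least one" invocation is required, and a recursive method yields a self-loop. Skipping calls to methods outside $P$ (e.g., JDK methods) is an assumption of this sketch, as are the function name `build_call_graph` and the method names.

```python
def build_call_graph(methods, invocations):
    """Build CG_P = (V, E) from a program's methods and call sites.

    methods: identifiers of every method in program P (the node set V).
    invocations: one (caller, callee) pair per call site found in P.
    Calls whose caller or callee is not a method of P are skipped
    (an assumption of this sketch).
    """
    nodes = set(methods)
    # A set collapses repeated call sites into a single directed edge.
    edges = {
        (caller, callee)
        for caller, callee in invocations
        if caller in nodes and callee in nodes
    }
    return nodes, edges


V, E = build_call_graph(
    ["Shape.draw", "Shape.bounds", "Canvas.paint"],
    [("Canvas.paint", "Shape.draw"),
     ("Canvas.paint", "Shape.draw"),           # second call site, same edge
     ("Shape.draw", "Shape.bounds"),
     ("Shape.bounds", "java.lang.Math.max")],  # outside P: skipped
)
print(sorted(E))
# [('Canvas.paint', 'Shape.draw'), ('Shape.draw', 'Shape.bounds')]
```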

To study the community structures of software Call Graphs empirically, we collected a data set of 10 widely used open-source Java programs, listed in Table 1: Ant is a Java

New class cohesion metrics

In this section we propose two class cohesion metrics based on software community structures.

Definition 1

Method Community Cohesion (MCC): Given a class $C$ with $m$ methods located in LCC, suppose that after applying a certain community detection algorithm these $m$ methods are distributed over $N$ communities, with $n_i$ methods of $C$ in the $i$th community ($1 \le i \le N$). Let $n_{max} = \max_{1 \le i \le N} n_i$. We define

$$MCC(C) = \begin{cases} 1, & \text{if } m = 1, \\ 0, & \text{if } n_{max} = 1 \text{ and } m \geq 2, \\ \dfrac{n_{max}}{m}, & \text{otherwise}. \end{cases}$$
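Definition 1 translates almost line by line into code. In this sketch, the input is assumed to be the list of community labels of $C$'s methods in LCC, one entry per method; rejecting an empty list is this sketch's own choice, since the definition does not cover $m = 0$. The function name `mcc` is hypothetical.

```python
from collections import Counter


def mcc(method_communities):
    """Method Community Cohesion (MCC) of a class C, following Definition 1.

    method_communities: the community label of each of C's m methods that
    lie in LCC, one entry per method.
    """
    m = len(method_communities)
    if m == 0:
        # Not covered by Definition 1; rejecting it is this sketch's choice.
        raise ValueError("class has no methods in LCC")
    if m == 1:
        return 1.0
    n_max = max(Counter(method_communities).values())  # max_i n_i
    if n_max == 1:
        return 0.0  # m >= 2 methods, each in a different community
    return n_max / m


print(mcc(["c1"]))                    # 1.0
print(mcc(["c1", "c2", "c3"]))        # 0.0
print(mcc(["c1", "c1", "c1", "c2"]))  # 0.75
```

Because the third case only applies when $n_{max} \geq 2$, a class with $m \geq 2$ methods scores either 0 or at least $2/m$; for example, two methods in two different communities score 0, not 1/2.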

The definition of MCC describes the largest portion of its methods that

Empirical evaluation of class cohesion metrics

In our empirical study, we first compare the proposed class cohesion metrics with several existing ones, followed by two case studies. The first case study determines whether MCC and MCEC provide additional information compared with other well-known metrics. The second explores whether MCC and MCEC lead to better results in class fault prediction. These two evaluation processes have been widely used in previous studies (Al Dallal, Briand, 2012, Gyimothy,

The (in)stability of community detection algorithms

Community detection algorithms may not produce exactly the same results in different runs. To study this effect, we ran each community detection algorithm 100 times on jEdit. The results of these experiments are shown in Fig. 11: the number of detected communities is shown in the first subfigure and the Q values in the second. The fg and ml algorithms are very stable, as both produce exactly the same results in all runs. On the other hand, the lp

Conclusions

In this paper, by applying four community detection algorithms to 10 widely used open-source Java software systems, we have shown that static software Call Graphs usually present relatively significant community structures. We have proposed two class cohesion metrics. Both are based on the distribution of a class’s methods among communities and can thus reflect the class’s cohesiveness. We show that the proposed metrics can provide additional and useful information of

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (91118005, 91218301, 91418205, 61221063, 61203174, 61428206 and U1301254), Doctoral Fund of Ministry of Education of China (20110201120010), 863 High Tech Development Plan of China (2012AA011003), 111 International Collaboration Program of China, and the Fundamental Research Funds for the Central Universities. We would also like to thank the anonymous reviewers for their insightful comments and valuable

Yu Qu received the B.S. degree from the School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China in 2006. He is currently a Ph.D. candidate at the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University. His research interests include trustworthy software and applying complex network and data mining theories to analyzing software systems.

References (69)

Al Dallal J. The impact of accounting for special methods in the measurement of object-oriented class cohesion on refactoring and fault prediction activities. J. Syst. Softw. (2012).
Boccaletti S. et al. The structure and dynamics of multilayer networks. Phys. Rep. (2014).
Cai K.-Y. et al. Software execution processes as an evolving complex network. Inform. Sci. (2009).
Fortunato S. Community detection in graphs. Phys. Rep. (2010).
Li H. et al. A modular attachment mechanism for software network evolution. Phys. A: Stat. Mech. Appl. (2013).
Liu Y. et al. Modeling class cohesion as mixtures of latent topics. IEEE International Conference on Software Maintenance, 2009 (ICSM’09) (2009).
Šubelj L. et al. Community structure of complex software systems: Analysis and applications. Phys. A: Stat. Mech. Appl. (2011).
Turnu I. et al. A modified Yule process to model the evolution of some object-oriented system properties. Inform. Sci. (2011).
Al Dallal J. Mathematical validation of object-oriented class cohesion metrics. Int. J. Comput. (2010).
Al Dallal J. Qualitative analysis for the impact of accounting for special methods in object-oriented class cohesion measurement. J. Softw. (2013).
Al Dallal J. et al. A precise method-method interaction-based cohesion metric for object-oriented classes. ACM Trans. Softw. Eng. Methodol. (TOSEM) (2012).
Badri L. et al. A proposal of a new class cohesion criterion: an empirical study. J. Object Technol. (2004).
Barabási A.-L. et al. Emergence of scaling in random networks. Science (1999).
Baxter G. et al. Understanding the shape of Java software. ACM SIGPLAN Notices (2006).
Bhattacharya P. et al. Graph-based analysis and prediction for software evolution. Proceedings of the 2012 International Conference on Software Engineering (2012).
Bieman J.M. et al. Cohesion and reuse in an object-oriented system. ACM SIGSOFT Software Engineering Notes (1995).
Blondel V.D. et al. Fast unfolding of communities in large networks. J. Stat. Mech.: Theor. Exper. (2008).
Boetticher G. et al. Promise Repository of Empirical Software Engineering Data (2007).
Briand L.C. et al. A controlled experiment for evaluating quality guidelines on the maintainability of object-oriented designs. IEEE Trans. Softw. Eng. (2001).
Briand L.C. et al. A unified framework for cohesion measurement in object-oriented systems. Empir. Softw. Eng. (1998).
Chakrabarti D. et al. Graph mining: Laws, generators, and algorithms. ACM Comput. Survey (CSUR) (2006).
Chen Z. et al. A novel approach to measuring class cohesion based on dependence analysis. Proceedings of the International Conference on Software Maintenance, 2002 (2002).
Chidamber S.R. et al. Towards a metrics suite for object oriented design. Conference Proceedings on Object-oriented Programming Systems, Languages, and Applications (1991).
Chidamber S.R. et al. A metrics suite for object oriented design. IEEE Trans. Softw. Eng. (1994).
Clauset A. et al. Finding community structure in very large networks. Phys. Rev. E (2004).
Concas G. et al. A study of the community structure of a complex software network. 2013 4th International Workshop on Emerging Trends in Software Metrics (WETSoM) (2013).
Danon L. et al. Comparing community structure identification. J. Stat. Mech.: Theor. Exper. (2005).
Dill S. et al. Self-similarity in the web. ACM Trans. Internet Technol. (TOIT) (2002).
Dunn R. et al. The use of edge-betweenness clustering to investigate biological function in protein interaction networks. BMC Bioinform. (2005).
Flake G.W. et al. Efficient identification of web communities. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2000).
Fowler M. et al. Refactoring: Improving the Design of Existing Code (1999).
Girvan M. et al. Community structure in social and biological networks. Proc. Natl. Acad. Sci. (2002).
Good B.H. et al. Performance of modularity maximization in practical contexts. Phys. Rev. E (2010).
Graham S.L. et al. Gprof: A call graph execution profiler. ACM SIGPLAN Notices (1982).


Xiaohong Guan received the B.S. and M.S. degrees from Tsinghua University, Beijing, China in 1982 and 1985 respectively, and his Ph.D. degree from the University of Connecticut in 1993. He was with the Division of Engineering and Applied Science, Harvard University from 1999 to 2000. He is the Cheung Kong Professor of Systems Engineering and the Dean of School of Electronic and Information Engineering, Xi’an Jiaotong University. He is also the Director of the Center for Intelligent and Networked Systems, Tsinghua University, and served as the Head of Department of Automation, 2003–2008. His research interests include cyber-physical systems and network security.

Qinghua Zheng received the B.S. and M.S. degrees in computer science and technology from Xi’an Jiaotong University, Xi’an, China in 1990 and 1993, respectively, and his Ph.D. degree in systems engineering from the same university in 1997. He was a postdoctoral researcher at Harvard University in 2002. Since 1995 he has been with the Department of Computer Science and Technology at Xi’an Jiaotong University, and was appointed director of the Department in 2008 and Cheung Kong Professor in 2009. His research interests include intelligent e-learning and trustworthy software.

Ting Liu received the B.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China in 2003 and 2010 respectively. He is an associate professor in systems engineering at Xi’an Jiaotong University. His research interests include cyber-physical systems, network security and trustworthy software.

Lidan Wang received the B.S. degree from the School of Software Engineering, Xidian University, Xi’an, China in 2013. She is currently an M.S. candidate at the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University. Her research interests include trustworthy software and software engineering.

Yuqiao Hou received the B.S. degree from the School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China in 2012. She is currently an M.S. candidate at the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University. Her research interests include trustworthy software and software engineering.

Zijiang Yang is an associate professor in computer science at Western Michigan University. He holds a Ph.D. degree from the University of Pennsylvania, an M.S. degree from Rice University and a B.S. degree from the University of Science and Technology of China. Before joining WMU he was an associate research staff member at NEC Labs America. He was also a visiting professor at the University of Michigan from 2009 to 2013. His research interests are in the area of software engineering with the primary focus on the testing, debugging and verification of software systems. He is a senior member of IEEE.
