Can Commit Change History Reveal Potential Fault Prone Classes? A Study on GitHub Repositories

  • Conference paper
Software Technologies (ICSOFT 2018)

Abstract

Various studies have successfully used graph-theoretic analysis to obtain a high-level, abstract view of software systems, for example by constructing a call graph to visualize the dependencies among software components. The granularity and the information shown by the graph usually depend on the input, such as variables, methods, classes, packages, or a combination of multiple levels. However, very few studies have investigated how software evolution and change history can be used as a basis for modelling a software-based complex network. It is commonly understood that stable, well-designed source code receives fewer updates throughout the software development lifecycle. Badly designed code, by contrast, tends to be updated repeatedly because of broken dependencies, high coupling, or dependencies on other classes. This paper puts forward an approach that models a commit change-based weighted complex network from historical software change and evolution data captured from GitHub repositories, with the aim of identifying potential fault-prone classes. Four well-established graph centrality metrics were used as proxy metrics to discover fault-prone classes. Experiments on ten open-source projects showed that using all four centrality metrics together yields reasonably good precision when compared against the ground truth.
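
The general idea of a commit change-based weighted network can be illustrated with a minimal sketch. The abstract does not state how edge weights are defined or which four centrality metrics were used, so everything below is an assumption made for illustration: edges link classes changed in the same commit, the weight counts how many commits they share, and weighted degree stands in for a single centrality metric. The function names and the commit data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cochange_network(commits):
    """Return {(a, b): weight}; weight = number of commits changing both classes.

    Assumed weighting scheme, not taken from the paper.
    """
    weights = defaultdict(int)
    for changed in commits:
        # Sort so each unordered pair always maps to the same key.
        for a, b in combinations(sorted(set(changed)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def weighted_degree(weights):
    """Sum of incident edge weights per class (one simple centrality measure)."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


# Hypothetical commit history: each entry lists the classes changed together.
commits = [
    ["Parser", "Lexer"],
    ["Parser", "Lexer", "Ast"],
    ["Parser", "Ast"],
    ["Util"],
]

network = build_cochange_network(commits)
# Rank classes by descending centrality, breaking ties alphabetically.
ranking = sorted(weighted_degree(network).items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)  # [('Parser', 4), ('Ast', 3), ('Lexer', 3)]
```

In this toy history, Parser ranks highest because it co-changes with the most classes most often, which is the kind of signal a centrality-based proxy would flag for inspection. A class that only ever changes alone (Util here) has no edges and does not appear in the ranking.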

Acknowledgement

This work was carried out within the framework of research project FP001-2016 under the Fundamental Research Grant Scheme provided by the Ministry of Higher Education, Malaysia.

Author information

Corresponding author

Correspondence to Chun Yong Chong.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chong, C.Y., Lee, S.P. (2019). Can Commit Change History Reveal Potential Fault Prone Classes? A Study on GitHub Repositories. In: van Sinderen, M., Maciaszek, L. (eds) Software Technologies. ICSOFT 2018. Communications in Computer and Information Science, vol 1077. Springer, Cham. https://doi.org/10.1007/978-3-030-29157-0_12

  • DOI: https://doi.org/10.1007/978-3-030-29157-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29156-3

  • Online ISBN: 978-3-030-29157-0

  • eBook Packages: Computer Science, Computer Science (R0)
