Change profile analysis of open-source software systems to understand their evolutionary behavior

  • Research Article
  • Frontiers of Computer Science

Abstract

Source code management systems (such as git) record changes to the code repositories of Open-Source Software (OSS) projects. The metadata of a change includes a change message that records the intention behind the change. Classification of changes into change types, based on their change messages, has been explored in the past, but only to understand the evolution of software systems from the perspective of change size and change density. Software evolution analysis based on change classification with a focus on change evolution patterns remains an open research problem. This study examines the change messages of 106 OSS projects, as recorded in their git repositories, to explore their evolutionary patterns with respect to the types of changes performed over time. An automated keyword-based classifier is applied to the change messages to categorize the changes into five types (corrective, adaptive, perfective, preventive, and enhancement). Cluster analysis then uncovers the distinct change patterns that each change type follows. For each change type, we group the 106 projects into three categories: high activity, moderate activity, and low activity. Evolutionary behavior differs across these categories. Projects with high and moderate activity receive the most changes during 76–81 months of the project lifetime. Project attributes such as the number of committers, the number of files changed, and the total number of commits seem to contribute the most to the change activity of a project. The statistical findings show that, irrespective of change type, the change activity of a project is related to its number of contributors, the amount of work done, and its total number of commits. Further, we explored the languages and domains of the projects to correlate change types with them. The statistical analysis indicates no strong, statistically significant relation between change types and the domains or languages of the 106 projects.
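
To make the idea of keyword-based classification of change messages concrete, the following is a minimal sketch in Python. It is not the classifier used in the study: the keyword stems, the prefix-matching rule, the tie-breaking order, and the names `KEYWORDS`, `classify`, and `change_profile` are all assumptions made for illustration. Only the five change types come from the abstract.

```python
import re
from collections import Counter

# Hypothetical keyword stems per change type (illustrative only; the
# study's actual keyword lists are not given in the abstract).
KEYWORDS = {
    "corrective": ("fix", "bug", "error", "fault", "crash"),
    "adaptive": ("port", "upgrade", "migrat", "compatib"),
    "perfective": ("refactor", "cleanup", "optimi", "simplif"),
    "preventive": ("test", "document", "deprecat"),
    "enhancement": ("add", "feature", "implement", "support"),
}


def classify(message: str) -> str:
    """Return the change type whose stems match the most words in the message."""
    tokens = re.findall(r"[a-z]+", message.lower())
    scores = {
        change_type: sum(1 for token in tokens if token.startswith(stems))
        for change_type, stems in KEYWORDS.items()
    }
    # Ties go to the type listed first in KEYWORDS (an arbitrary choice here).
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


def change_profile(messages) -> Counter:
    """Count how many change messages fall into each change type."""
    return Counter(classify(message) for message in messages)


if __name__ == "__main__":
    sample = ["Fix crash on startup", "Add support for Python 3", "Update README"]
    for message in sample:
        print(f"{message!r} -> {classify(message)}")
    print(change_profile(sample))
```

Counting the labels per month for each project would give one time series per change type, which is the kind of input a cluster analysis of change patterns could work on.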

Acknowledgements

This research work was performed under a UGC sanctioned research project. We acknowledge the UGC for providing the grant to perform the research work.

Author information

Corresponding author

Correspondence to Munish Saini.

Additional information

Munish Saini is a PhD student in the Department of Computer Science, Guru Nanak Dev University, India. He received his B.Tech degree in computer science and engineering from Sant Baba Bhag Singh Institute of Engineering and Technology, India, and his M.Tech degree in computer science and engineering from Dr. B. R. Ambedkar National Institute of Technology, India. His research interests are in data mining, open-source software, and software engineering.

Kuljit Kaur Chahal is an assistant professor in the Department of Computer Science, Guru Nanak Dev University, India. She received her PhD degree in computer science from Guru Nanak Dev University, India. Her research interests are in distributed computing, Web services security, and open-source software.

Cite this article

Saini, M., Chahal, K.K. Change profile analysis of open-source software systems to understand their evolutionary behavior. Front. Comput. Sci. 12, 1105–1124 (2018). https://doi.org/10.1007/s11704-016-6301-0

  • DOI: https://doi.org/10.1007/s11704-016-6301-0