DOI: 10.1145/3524842.3528492
short-paper

A time series-based dataset of open-source software evolution

Published: 17 October 2022

Abstract

Software evolution is the process of developing, maintaining, and updating software systems. Software systems are known to grow in size and complexity as they evolve to meet user demands. For this reason, researchers have increasingly studied software evolution to understand how systems evolve and to propose techniques that address the problems inherent in that evolution. Many of these studies collect data but do not make it publicly available. Many existing software evolution datasets are outdated or small, and some do not provide time series of software metrics. We propose an extensive software evolution dataset with temporal information about open-source Java systems. To build this dataset, we followed a four-step methodology: selecting the systems according to a criterion, extracting their releases, measuring those releases, and generating their time series. Our dataset contains time series of 46 software metrics extracted from 46 open-source Java systems, and we make it publicly available.
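
As an illustration only, the final step (turning per-release measurements into time series) might look like the sketch below. It is not the authors' tooling: the function name `build_time_series`, the record layout, the system name `demo-system`, and all metric values are hypothetical and chosen purely for the example.

```python
from collections import defaultdict


def build_time_series(records):
    """Group per-release metric values into one ordered series per (system, metric).

    Each record is a hypothetical tuple: (system, release_order, metric, value).
    """
    grouped = defaultdict(list)
    for system, release_order, metric, value in records:
        grouped[(system, metric)].append((release_order, value))
    # Sort each group by release order, then keep only the metric values.
    return {key: [v for _, v in sorted(points)] for key, points in grouped.items()}


# Made-up measurements for a hypothetical system, listed out of order on purpose.
records = [
    ("demo-system", 2, "loc", 1500),
    ("demo-system", 1, "loc", 1200),
    ("demo-system", 3, "loc", 1800),
    ("demo-system", 1, "classes", 10),
    ("demo-system", 2, "classes", 12),
    ("demo-system", 3, "classes", 12),
]

series = build_time_series(records)
print(series[("demo-system", "loc")])  # [1200, 1500, 1800]
```

Keying each series by system and metric keeps one ordered list per pair, which is the shape most time series analyses expect as input.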

Cited By

  • (2024) A longitudinal study on the temporal validity of software samples. Information and Software Technology 168:C. DOI: 10.1016/j.infsof.2024.107404. Online publication date: 17-Apr-2024
  • (2024) Teaching Empirical Methods at Eindhoven University of Technology. Handbook on Teaching Empirical Software Engineering (179-207). DOI: 10.1007/978-3-031-71769-7_7. Online publication date: 25-Dec-2024

Published In

MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
May 2022
815 pages
ISBN:9781450393034
DOI:10.1145/3524842
© 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataset
  2. open-source software
  3. software evolution
  4. software metrics
  5. time series

Qualifiers

  • Short-paper

Conference

MSR '22