Is There an Interplay Between Library Usage and Repository Features?: An Analysis with Regression Models

Authors:

João Victor Esteves,

Daniel Coutinho,

Marcelo Schots,

Igor Machado Coelho

SBES '19: Proceedings of the XXXIII Brazilian Symposium on Software Engineering

Pages 407 - 416

https://doi.org/10.1145/3350768.3351800

Published: 23 September 2019

Abstract

The advent of open source has changed the way developers reuse software. The availability of libraries and their corresponding source code in public software repositories enables new forms of analyzing project aspects that can provide clues about their stability and maintainability. However, the literature lacks studies that aim to identify whether, and which, repository features correlate with the likelihood of a library being used. To address this gap, we present a factorial experiment using three different regression models (Multiple Linear Regression, Random Forest, and Neural Networks) to analyze whether there is a correlation between library usage and a set of features extracted from release management and version control repositories. The results allowed us to identify features with a positive learning impact, such as the number of stars, the number of pull requests, and the number of downloads, as well as features that contributed much less to the models (e.g., the repository size). Although the impact level of each feature varied from model to model, the analysis of the regression results also showed that the models achieved higher accuracy when considering only a subset of the features.
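
As a rough illustration of how a regression model can expose the relative impact of repository features, the following minimal sketch fits a multiple linear regression on standardized features and ranks them by absolute coefficient. It is not the authors' implementation or data: the data set is synthetic, the feature names (`stars`, `downloads`, `repo_size`) are only borrowed from the abstract, the generating coefficients (3.0, 2.0, 0.0) are arbitrary assumptions, and only one of the three model families is shown.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def standardize(column):
    """Rescale a column to zero mean and unit standard deviation."""
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std for v in column]


def fit_linear_regression(columns, target):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data (assumption): usage depends on stars and downloads,
# but not on repository size.
rng = random.Random(0)
n = 200
features = {
    "stars": [rng.gauss(0, 1) for _ in range(n)],
    "downloads": [rng.gauss(0, 1) for _ in range(n)],
    "repo_size": [rng.gauss(0, 1) for _ in range(n)],
}
usage = [
    3.0 * s + 2.0 * d + rng.gauss(0, 0.1)
    for s, d in zip(features["stars"], features["downloads"])
]

names = list(features)
coefs = fit_linear_regression([standardize(features[k]) for k in names], usage)
impact = dict(zip(names, coefs[1:]))  # skip the intercept
ranking = sorted(names, key=lambda k: abs(impact[k]), reverse=True)
print(ranking)
```

Standardizing the features first makes the coefficient magnitudes comparable across features with different units. Random Forest and Neural Network models would need their own importance measures, which this sketch does not cover.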

Paper category: Experimental; Language: English

Cited By

Coelho I, Hanafi S, Todosijevic R, Ratli M, Gendron B (2024). Variable Neighborhood Search with Dynamic Exploration for the Set Union Knapsack Problem. Combinatorial Optimization and Applications, 17-35. Online publication date: 28-Jun-2024. https://doi.org/10.1007/978-3-031-57603-4_2

Index Terms

Is There an Interplay Between Library Usage and Repository Features?: An Analysis with Regression Models
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by regression
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Reusability
  2. Software notations and tools
    1. Software libraries and repositories

Published In

SBES '19: Proceedings of the XXXIII Brazilian Symposium on Software Engineering

September 2019

583 pages

ISBN:9781450376518

DOI:10.1145/3350768

General Chairs: Ivan Machado (UFBA), Rodrigo Souza (UFBA), Rita Suzana Maciel (UFBA), Claudio Sant'Anna (UFBA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SBC: Sociedade Brasileira de Computação

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SBES 2019: XXXIII Brazilian Symposium on Software Engineering

September 23 - 27, 2019

Salvador, Brazil

Acceptance Rates

SBES '19 Paper Acceptance Rate 67 of 153 submissions, 44%;

Overall Acceptance Rate 147 of 427 submissions, 34%

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 98
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Mar 2025
