Assessment and cross-product prediction of software product line quality: accounting for reuse across products, over multiple releases

Devine, Thomas; Goseva-Popstojanova, Katerina; Krishnan, Sandeep; Lutz, Robyn R.

doi:10.1007/s10515-014-0160-4


Published: 12 August 2014

Automated Software Engineering, Volume 23, pages 253–302 (2016)


Abstract

The goals of cross-product reuse in a software product line (SPL) are to mitigate production costs and improve the quality. In addition to reuse across products, due to the evolutionary development process, a SPL also exhibits reuse across releases. In this paper, we empirically explore how the two types of reuse (reuse across products and reuse across releases) affect the quality of a SPL and our ability to accurately predict fault proneness. We measure the quality in terms of post-release faults and consider different levels of reuse across products (i.e., common, high-reuse variation, low-reuse variation, and single-use packages), over multiple releases. Assessment results showed that quality improved for common, low-reuse variation, and single-use packages as they evolved across releases. Surprisingly, within each release, among preexisting (‘old’) packages, the cross-product reuse did not affect the change and fault proneness. Cross-product predictions based on pre-release data accurately ranked the packages according to their post-release faults and predicted the 20% most faulty packages. The predictions benefited from data available for other products in the product line, with models producing better results (1) when making predictions on smaller products (consisting mostly of common packages) rather than on larger products and (2) when trained on larger products rather than on smaller products.


Notes

A fault is defined as an accidental condition which, if encountered, may cause the system or system component to fail to perform as required. We avoid using the term defect, which is used inconsistently in the literature to refer in some cases to both faults and failures and in other cases only to faults or, perhaps, to faults detected pre-release.
For a comprehensive survey of binary classification studies the reader is referred to the recent paper by Hall et al. (2012).
For this study, a package is considered faulty if any file contained in that package exhibited one or more post-release faults.
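As an illustration of this labelling rule, the following minimal Python sketch marks a package as faulty when any of its files has at least one post-release fault. The record layout and the package and file names are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict


def label_faulty_packages(file_faults):
    """Return {package: True/False}, True if any file has >= 1 post-release fault.

    file_faults: iterable of (package, file_name, post_release_fault_count).
    """
    faulty = defaultdict(bool)
    for package, _file_name, fault_count in file_faults:
        # A single faulty file is enough to mark the whole package as faulty.
        faulty[package] = faulty[package] or fault_count >= 1
    return dict(faulty)


# Hypothetical file-level records (invented for illustration).
records = [
    ("org.example.core", "A.java", 0),
    ("org.example.core", "B.java", 2),
    ("org.example.ui", "C.java", 0),
]
labels = label_faulty_packages(records)
```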
Thompson and Heimdahl (2003) proposed a set-theoretic approach to represent requirements reuse in product line engineering, which described the boundaries of sets as commonalities and the members within the sets as products. The approach taken in our previous work (Devine et al. 2012) and used here is complementary to Thompson and Heimdahl (2003). Specifically, it is used to illustrate the amount of shared code at different levels of cross-product reuse; the elements within the sets are packages of the SPL, and the boundaries of sets define the products.
pserver:anonymous@dev.eclipse.org:2401/cvsroot
The complexity measure used by SourceMonitor approximately follows the definition by McConnell (2004).
Some form of $k$-fold cross validation is commonly employed in machine learning in general and software engineering in particular. Cross validation is the process of splitting the data randomly into $k$ groups and then predicting values for one group by building a model on the other $k-1$ groups. This is repeated using each of the $k$ groups as a testing group, and the average value of the predicted variable is reported. Cross validation may provide better results than building models and predicting on disjoint data sets (as was done in this paper) because averaging the results over $k$ repeated trials yields more consistent, smoothed end results than those achieved by building models and predicting on disjoint sets.
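The $k$-fold procedure described in this note can be sketched in a few lines of Python. The sketch is illustrative only: the helper names (`k_fold_indices`, `cross_validate`), the constant mean-value model, and the toy data are assumptions made here, and the study itself trained and tested on disjoint data sets rather than using cross validation.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal groups."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, predict, score, seed=0):
    """Average the score obtained when each of the k groups is held out once."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [predict(model, xs[i]) for i in test_idx]
        scores.append(score([ys[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)


# Toy stand-ins (assumptions): a model that always predicts the training mean,
# scored by mean absolute error.
def fit_mean(_xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, _x):
    return model


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


xs = list(range(10))
ys = [3.0] * 10
folds = k_fold_indices(len(xs), 5)
cv_error = cross_validate(xs, ys, 5, fit_mean, predict_mean, mean_abs_error)
```

Every observation is used for testing exactly once, which is what distinguishes this scheme from a single train/test split on disjoint sets.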
Many software metrics are highly correlated with each other, which engenders a problem that is commonly referred to as multicollinearity. To quote Kutner et al. (2004): “The fact that some or all predictor variables are correlated among themselves does not, in general, inhibit our ability to obtain a good fit nor does it tend to affect inferences about mean responses or predictions of new observations ...” However, multicollinearity may cause the estimated regression coefficients to have a large sampling variability and thus affect explanatory studies.
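One simple way to see whether two metrics are strongly correlated is to compute their Pearson correlation coefficient. The sketch below is only an illustration under stated assumptions: the metric names and values are invented, and this is not the analysis performed in the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


# Invented package-level values for two size-related metrics.
lines_of_code = [120, 340, 560, 800, 1500]
statements = [90, 260, 430, 610, 1150]
r = pearson_r(lines_of_code, statements)
```

A value of `r` close to 1 or -1 for a pair of predictors signals that their individual regression coefficients should be interpreted with caution, even if predictions remain usable.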
Kendall’s $\tau_b$ approaches the normal distribution quite rapidly so that the normal approximation is better for Kendall’s $\tau_b$ than it is for Spearman’s $\rho$. Another advantage of Kendall’s $\tau_b$ is its direct and simple interpretation in terms of probabilities of observing concordant pairs (both numbers of one observation are larger than their respective members of the other observation) and discordant pairs (the two numbers in one observation differ in opposite directions from the respective members in the other observation).
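The interpretation in terms of concordant and discordant pairs can be made concrete with a direct pair-counting sketch of $\tau_b$. This is a minimal quadratic-time illustration with invented rankings, not the statistical software used in the study; the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b computed by counting concordant and discordant pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both values differ in the same direction
        else:
            discordant += 1  # the values differ in opposite directions
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator


# Invented example: predicted vs. actual fault ranks of five packages.
predicted_rank = [1, 2, 3, 4, 5]
actual_rank = [1, 3, 2, 4, 5]
tau = kendall_tau_b(predicted_rank, actual_rank)
```

In this example nine of the ten pairs are concordant and one is discordant, so $\tau_b = (9 - 1)/10 = 0.8$.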
If the Friedman test results in rejection of the null hypothesis that there is no difference, a post hoc multiple comparison test is used to identify where the difference is. Alternatively, instead of the Friedman test, one can use the Page test, which is used to test the null hypothesis that there is no statistically significant difference in several related samples (i.e., $H_0: \mu_1 = \mu_2 = \mu_3$) against the ordered alternative that the samples differ in a specified direction, with at least one inequality (i.e., $H_1: \mu_1 \ge \mu_2 \ge \mu_3$).
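For readers unfamiliar with the Friedman test, the following sketch computes its statistic from within-block ranks. It is a simplified illustration under stated assumptions: no tie correction is applied, the function names are hypothetical, and the data (one row per package, one column per release) are invented.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (without tie correction).

    blocks: list of n rows, each holding k related measurements.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_of_squares = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_of_squares - 3.0 * n * (k + 1)


# Invented data: a quality measure for four packages over three releases.
measurements = [
    [5, 3, 1],
    [6, 4, 2],
    [7, 5, 3],
    [8, 6, 4],
]
statistic = friedman_statistic(measurements)
```

Under the null hypothesis the statistic is approximately chi-square distributed with $k-1$ degrees of freedom; here every package shows the same ordering across releases, so the statistic reaches its maximum value $n(k-1) = 8$.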

References

Agresti, A.: Analysis of Ordinal Categorical Data. John Wiley and Sons Inc, Hoboken, NJ (2010)
Andersson, C., Runeson, P.: A replicated quantitative analysis of fault distributions in complex software systems. IEEE Trans. Softw. Eng. 33, 273–286 (2007)
Bell, R.M., Ostrand, T.J., Weyuker, E.J.: Looking for bugs in all the right places. In: Proceedings of the 2006 International Symposium on Software Testing and Analysis, ISSTA ’06, pp. 61–72 (2006)
Bibi, S., Tsoumakas, G., Stamelos, I., Vlahvas, I.: Software defect prediction using regression via classification. In: Proceedings of the IEEE International Conference on Computer Systems and Applications, AICCSA ’06, pp. 330–336 (2006)
Bingham, N.H., Fry, J.M.: Regression: Linear Models in Statistics, 1st edn. Springer-Verlag, London (2010)
Boehm, B., Basili, V.R.: Software defect reduction top 10 list. Computer 34, 135–137 (2001)
Breivold, H.P., Crnkovic, I., Larsson, M.: A systematic review of software architecture evolution research. Info. Softw. Technol. 54(1), 16–40 (2012)
Chastek, G., McGregor, J., Northrop, L.: Observations from viewing Eclipse as a product line. In: Proceedings of the 3rd International Workshop on Open Source Software and Product Lines, pp. 1–6 (2007)
D’Ambros, M., Lanza, M., Robbes, R.: On the relationship between change coupling and software defects. In: Proceedings of the 16th Working Conference on Reverse Engineering, WCRE ’09, pp. 135–144 (2009)
D’Ambros, M., Lanza, M., Robbes, R.: An extensive comparison of bug prediction approaches. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories, MSR ’10, pp. 31–41 (2010)
D’Ambros, M., Lanza, M., Robbes, R.: Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir. Softw. Eng. 17, 531–577 (2012)
Devine, T., Goseva-Popstojanova, K., Krishnan, S., Lutz, R., Li, J.: An empirical study of pre-release software faults in an industrial product line. In: Proceedings of the 5th IEEE International Conference on Software Testing, Verification and Validation, ICST ’12, pp. 181–190 (2012)
Fenton, N.E., Ohlsson, N.: Quantitative analysis of faults and failures in a complex software system. IEEE Trans. Softw. Eng. 26, 797–814 (2000)
Frakes, W.B., Succi, G.: An industrial study of reuse, quality, and productivity. J. Syst. Softw. 57, 99–106 (2001)
Gomaa, H.: Designing Software Product Lines with UML: From Use Cases to Pattern-Based Software Architectures. Addison Wesley Longman Publishing Co. Inc, Redwood City, CA (2004)
van Gurp, J., Prehofer, C., Bosch, J.: Comparing practices for reuse in integration-oriented software product lines and large open source software projects. Softw. Prac. Exper. 40(4), 285–312 (2010)
Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic review of fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38(6), 1276–1304 (2012)
Hamill, M., Goseva-Popstojanova, K.: Common trends in software fault and failure data. IEEE Trans. Softw. Eng. 35, 484–496 (2009)
He, Z., Shu, F., Yang, Y., Li, M., Wang, Q.: An investigation on the feasibility of cross-project defect prediction. Autom. Softw. Eng. 19(2), 167–199 (2012)
He, Z., Peters, F., Menzies, T., Yang, Y.: Learning from open-source projects: An empirical study on defect prediction. In: Proceedings of the ACM / IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM’13, pp. 45–54 (2013)
Kamei, Y., Matsumoto, S., Monden, A., Matsumoto, Ki., Adams, B., Hassan, A.E.: Revisiting common bug prediction findings using effort-aware models. In: Proceedings of the 2010 IEEE International Conference on Software Maintenance, ICSM ’10, pp. 1–10 (2010)
Kastro, Y., Bener, A.B.: A defect prediction method for software versioning. Softw. Qual. Control 16(4), 543–562 (2008)
Khoshgoftaar, T., Munson, J.: Predicting software development errors using software complexity metrics. IEEE J. Sel. Areas Commun. 8(2), 253–261 (1990)
Khoshgoftaar, T.M., Seliya, N.: Comparative assessment of software quality classification techniques: an empirical case study. Empir. Softw. Eng. 9(3), 229–257 (2004)
Kitchenham, B., Mendes, E.: Why comparative effort prediction studies may be invalid. In: Proceedings of the 5th International Conference on Predictor Models in Software Engineering, PROMISE ’09, pp. 4:1–4:5 (2009)
Kleinbaum, D.G., Kupper, L.L., Muller, K.E. (eds.): Applied regression analysis and other multivariable methods. PWS Publishing Co., Boston, MA (1988)
Krishnan, S., Lutz, R.R., Goseva-Popstojanova, K.: Empirical evaluation of reliability improvement in an evolving software product line. In: Proceedings of the 8th Working Conference on Mining Software Repositories, MSR ’11, pp. 103–112 (2011a)
Krishnan, S., Strasburg, C., Lutz, R.R., Goseva-Popstojanova, K.: Are change metrics good predictors for an evolving software product line? In: Proceedings of the 7th International Conference on Predictive Models in Software Engineering, PROMISE’11, pp. 7:1–7:10 (2011b)
Krishnan, S., Strasburg, C., Lutz, R.R., Goseva-Popstojanova, K., Dorman, K.S.: Predicting failure-proneness in an evolving software product line. Info. Softw. Technol. 55(8), 1479–1495 (2012)
Kutner, M.H., Nachtsheim, C.J., Neter, J.: Applied Linear Regression Models, 4th edn. McGraw-Hill/Irwin, New York, NY (2004)
Laffra, C., Veys, N.: Where did Eclipse come from? http://wiki.eclipse.org/FAQ_Where_did_Eclipse_come_from%3F (2013). Accessed 5 Aug 2014
Li, P.L., Herbsleb, J., Shaw, M., Robinson, B.: Experiences and results from initiating field defect prediction and product test prioritization efforts at ABB Inc. In: Proceedings of the 28th International Conference on Software Engineering, ICSE ’06, pp. 413–422 (2006)
Lim, W.: Effects of reuse on quality, productivity, and economics. IEEE Softw. 11(5), 23–30 (1994)
Ma, Y., Luo, G., Zeng, X., Chen, A.: Transfer learning for cross-company software defect prediction. Info. Softw. Technol. 54(3), 248–256 (2012)
Mansfield, D.: CVSps-patchsets for CVS. http://www.cobite.com/cvsps (2012). Accessed 5 Aug 2014
McConnell, S.: Code Complete, 2nd edn. Microsoft Press, Redmond, WA (2004)
McCullagh, P., Nelder, J.: Generalized Linear Models. Monographs on Statistics and Applied Probability. Chapman and Hall, New York, NY (1983)
Mohagheghi, P., Conradi, R.: An empirical investigation of software reuse benefits in a large telecom product. ACM Trans. Softw. Eng. Method. 17, 13:1–13:31 (2008)
Mohagheghi, P., Conradi, R., Killi, O., Schwarz, H.: An empirical study of software reuse vs. defect-density and stability. In: Proceedings of the 26th International Conference on Software Engineering, ICSE ’04, pp. 282–291 (2004)
Moser, R., Pedrycz, W., Succi, G.: A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the ACM/IEEE 30th International Conference on Software Engineering, ICSE ’08, pp. 181–190 (2008)
Nagappan, N., Ball, T., Zeller, A.: Mining metrics to predict component failures. In: Proceedings of the 28th International Conference on Software Engineering, ICSE ’06, pp. 452–461 (2006)
Nam, J., Pan, S.J., Kim, S.: Transfer defect learning. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp. 382–391 (2013)
Nelder, J.A., Wedderburn, R.W.M.: Generalized linear models. J. Royal Statist. Soc. Ser. A (General) 135(3), 370–384 (1972)
Norušis, M.J.: IBM SPSS Statistics 19 Advanced Statistical Procedures Companion. Prentice Hall, Upper Saddle River, NJ (2012)
Ohlsson, N., Alberg, H.: Predicting fault-prone software modules in telephone switches. IEEE Trans. Softw. Eng. 22(12), 886–894 (1996)
Ostrand, T.J., Weyuker, E.J.: The distribution of faults in a large industrial software system. In: Proceedings of the 2002 ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA ’02, pp. 55–64 (2002)
Ostrand, T.J., Weyuker, E.J., Bell, R.M.: Where the bugs are. In: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA’04, pp. 86–96 (2004)
Ostrand, T.J., Weyuker, E.J., Bell, R.M.: Predicting the location and number of faults in large software systems. IEEE Trans. Softw. Eng. 31(4), 340–355 (2005)
Ostrand, T.J., Weyuker, E.J., Bell, R.M.: Programmer-based fault prediction. In: Proceedings of the 6th International Conference on Predictive Models in Software Engineering, PROMISE’10, pp. 19:1–19:10 (2010)
Pohl, K., Böckle, G.: Software Product Line Engineering: Foundations, Principles and Techniques. Springer-Verlag, Secaucus, NJ (2005)
Selby, R.: Enabling reuse-based software development of large-scale systems. IEEE Trans. Softw. Eng. 31(6), 495–510 (2005)
Shin, Y., Bell, R., Ostrand, T., Weyuker, E.: Does calling structure information improve the accuracy of fault prediction? In: Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories, MSR ’09, pp. 61–70 (2009)
Shull, F.J., Carver, J.C., Vegas, S., Juristo, N.: The role of replications in empirical software engineering. Empir. Softw. Eng. 13(2), 211–218 (2008)
SourceMonitor (2011) Version 3.2. http://www.campwoodsw.com/sourcemonitor.html. Accessed 5 Aug 2014
Taylor, R.N.: The role of architectural styles in successful software ecosystems. In: Proceedings of the 17th International Software Product Line Conference, SPLC ’13, pp. 2–4 (2013)
Thomas, W.M., Delis, A., Basili, V.R.: An analysis of errors in a reuse-oriented development environment. J. Syst. Softw. 38, 211–224 (1997)
Thompson, J.M., Heimdahl, M.P.E.: Structuring product family requirements for n-dimensional and hierarchical product lines. Requir. Eng. 8(1), 42–54 (2003)
Turhan, B., Menzies, T., Bener, A.B., Di Stefano, J.: On the relative value of cross-company and within-company data for defect prediction. Empir. Softw. Eng. 14(5), 540–578 (2009)
van der Linden, F.: Applying open source software principles in product lines. CEPIS Upgrade Eur. J. Info. Prof. 10, 32–40 (2009)
van der Linden, F.: Open source practices in software product line engineering. In: Lucia, A., Ferrucci, F. (eds.) Software Engineering, Lecture Notes in Computer Science, vol. 7171, pp. 216–235. Springer, Berlin Heidelberg (2013)
Watanabe, S., Kaiya, H., Kaijiri, K.: Adapting a fault prediction model to allow inter language reuse. In: Proceedings of the 4th International Workshop on Predictor Models in Software Engineering, PROMISE ’08, pp. 19–24 (2008)
Weiss, D.M., Lai, C.T.R.: Software Product-Line Engineering: A Family-Based Software Development Process. Addison-Wesley Longman Publishing Co. Inc, Boston, MA (1999)
Weyuker, E.J., Ostrand, T.J., Bell, R.M.: Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models. Empir. Softw. Eng. 13(5), 539–559 (2008)
Zhang, W., Jarzabek, S.: Reuse without compromising performance: Industrial experience from RPG software product line for mobile devices. In: Software Product Lines, LNCS, vol. 3714, pp. 57–69 (2005)
Zimmermann, T., Premraj, R., Zeller, A.: Predicting defects for Eclipse. In: Proceedings of the 3rd International Workshop on Predictor Models in Software Engineering, PROMISE’07, p. 9 (2007)
Zimmermann, T., Nagappan, N., Gall, H., Giger, E., Murphy, B.: Cross-project defect prediction: a large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE ’09, pp. 91–100 (2009)


Acknowledgments

This work was supported in part by the National Science Foundation Grants 0916275 and 0916284 with funds from the American Recovery and Reinvestment Act of 2009 and by the WVU ADVANCE Sponsorship Program funded by the National Science Foundation ADVANCE IT Program award HRD-100797. Part of this work was performed while Robyn Lutz was visiting the California Institute of Technology.

Author information

Authors and Affiliations

Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
Thomas Devine & Katerina Goseva-Popstojanova
Department of Computer Science, Iowa State University, Ames, IA, USA
Sandeep Krishnan & Robyn R. Lutz


Corresponding author

Correspondence to Katerina Goseva-Popstojanova.

Appendix: Aggregation metrics

The static code and change metrics were collected at the file level and then aggregated to the package level, as specified in Tables 7 and 8. As a result, each package was characterized by a vector $\mathbf{m}$ of 112 metrics (i.e., features), where $\mathbf{m}[i],\ i=1,\ldots,73$ are static code metrics, while $\mathbf{m}[i],\ i=74,\ldots,112$ are change metrics.
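The aggregation step can be pictured with a short Python sketch. The aggregation functions shown here (sum, mean, and maximum), the metric names, and the values are assumptions chosen for illustration; the aggregations actually applied to each metric are those listed in Tables 7 and 8.

```python
from statistics import mean

# Hypothetical set of aggregations; the study's own choices are in Tables 7 and 8.
AGGREGATIONS = {"sum": sum, "mean": mean, "max": max}


def aggregate_package(file_metrics):
    """Collapse file-level metrics into one package-level feature dictionary.

    file_metrics: non-empty list of {metric_name: value}, one dict per file.
    """
    features = {}
    for metric_name in sorted(file_metrics[0]):
        values = [metrics[metric_name] for metrics in file_metrics]
        for agg_name, aggregate in AGGREGATIONS.items():
            features[f"{agg_name}_{metric_name}"] = aggregate(values)
    return features


# Invented file-level measurements for a single package.
files_in_package = [
    {"lines_of_code": 100, "complexity": 4},
    {"lines_of_code": 300, "complexity": 8},
]
package_features = aggregate_package(files_in_package)
```

Applying several aggregations to each file-level metric is what makes the package-level vector longer than the list of raw metrics.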

Table 7 Aggregations applied to each static code metric


Table 8 Aggregations applied to each change metric



About this article

Cite this article

Devine, T., Goseva-Popstojanova, K., Krishnan, S. et al. Assessment and cross-product prediction of software product line quality: accounting for reuse across products, over multiple releases. Autom Softw Eng 23, 253–302 (2016). https://doi.org/10.1007/s10515-014-0160-4


Received: 25 November 2013
Accepted: 18 July 2014
Published: 12 August 2014
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10515-014-0160-4
