
Data-efficient performance learning for configurable systems

Published in: Empirical Software Engineering

Abstract

Many software systems today are configurable, offering customization of functionality by feature selection. Understanding how performance varies with feature selection is key to choosing configurations that meet a given set of requirements. Because of the huge configuration space and the potentially high cost of performance measurement, exhaustively exploring the entire configuration space of a configurable system is usually infeasible. Accurately predicting performance from a small sample of measured system variants is therefore a major challenge. To address this challenge, we propose a data-efficient learning approach, called DECART, that combines several machine-learning and statistical techniques for performance prediction of configurable systems. DECART builds, validates, and determines a prediction model based on an available sample of measured system variants. Empirical results on 10 real-world configurable systems demonstrate the effectiveness and practicality of DECART. In particular, DECART achieves a prediction accuracy of 90% or higher based on a small sample whose size is linear in the number of features. In addition, we propose a sample quality metric and introduce a quantitative analysis of the quality of a sample for performance prediction.
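The abstract describes a three-step workflow: build a prediction model from a small sample of measured configurations, validate it, and select the best model. A minimal sketch of such a workflow is shown below, under two labeled assumptions: that the learner is a CART regression tree (as the name DECART suggests) and that validation is resampling-based parameter tuning. It uses scikit-learn's DecisionTreeRegressor and GridSearchCV as stand-ins, with synthetic data in place of real performance measurements; it is illustrative only, not the paper's implementation.

```python
# Hypothetical sketch of a DECART-style workflow: learn a CART
# (regression-tree) performance model from a small random sample of
# configurations, tuning tree parameters by cross-validation.
# The feature matrix and performance values are synthetic stand-ins
# for real measurements of a configurable system.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

n_features = 10                   # binary configuration options
sample_size = 3 * n_features      # sample size linear in #features

# Synthetic "measured" sample: random configurations and a performance
# function with main effects plus one feature interaction.
X = rng.integers(0, 2, size=(sample_size, n_features))
y = 100 + 20 * X[:, 0] + 15 * X[:, 1] + 30 * X[:, 0] * X[:, 2]

# Build and validate: grid search over CART parameters with k-fold
# cross-validation, keeping the model with the lowest validated error.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"min_samples_leaf": [1, 2, 4], "max_depth": [3, 5, None]},
    cv=5,
    scoring="neg_mean_absolute_percentage_error",
)
search.fit(X, y)

# Evaluate on held-out configurations, reporting accuracy as
# 100% minus the mean relative error.
X_test = rng.integers(0, 2, size=(200, n_features))
y_test = (100 + 20 * X_test[:, 0] + 15 * X_test[:, 1]
          + 30 * X_test[:, 0] * X_test[:, 2])
rel_err = np.mean(np.abs(search.predict(X_test) - y_test) / y_test)
print(f"accuracy: {100 * (1 - rel_err):.1f}%")
```

The key design point the sketch mirrors is that model selection happens entirely within the small measured sample (via cross-validation), so no additional measurements are needed to choose the model's parameters.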




Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments. This research was partially supported by the National Natural Science Foundation of China (No. 61772200, 61702320), the Shanghai Pujiang Talent Program (No. 17PJ1401900), the Shanghai Municipal Natural Science Foundation (No. 17ZR1406900), the Shanghai Municipal Education Commission Funds of the Young Teacher Training Program (No. ZZSDJ17021), the Specialized Fund of the Shanghai Municipal Commission of Economy and Informatization (No. 201602008), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130074110015), the DFG grants (AP 206/4, AP 206/5, AP 206/7, SI 2171/2, and SI 2171/3), the Natural Sciences and Engineering Research Council of Canada, and Pratt & Whitney Canada.

Author information

Correspondence to Jianmei Guo or Dingyu Yang.

Additional information

Communicated by: Vittorio Cortellessa

Jianmei Guo and Dingyu Yang contributed equally to this work.


Cite this article

Guo, J., Yang, D., Siegmund, N. et al. Data-efficient performance learning for configurable systems. Empir Software Eng 23, 1826–1867 (2018). https://doi.org/10.1007/s10664-017-9573-6
