
Data-efficient performance learning for configurable systems

Published in: Empirical Software Engineering

Abstract

Many software systems today are configurable, offering customization of functionality by feature selection. Understanding how performance varies with feature selection is key to choosing configurations that meet a given set of requirements. Because of the huge configuration space and the potentially high cost of performance measurement, exhaustively exploring the entire configuration space of a configurable system is usually infeasible. Accurately predicting performance from a small sample of measured system variants is therefore a major challenge. To address this challenge, we propose a data-efficient learning approach, called DECART, that combines several machine-learning and statistical techniques for performance prediction of configurable systems. DECART builds, validates, and determines a prediction model based on an available sample of measured system variants. Empirical results on 10 real-world configurable systems demonstrate the effectiveness and practicality of DECART. In particular, DECART achieves a prediction accuracy of 90% or higher based on a small sample whose size is linear in the number of features. In addition, we propose a sample quality metric and introduce a quantitative analysis of the quality of a sample for performance prediction.
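The abstract describes a three-step workflow: build a prediction model from a small sample of measured configurations, validate it, and select the best model. A minimal sketch of such a workflow is shown below, under two labeled assumptions: that the learner is a CART regression tree (as the name DECART suggests) and that validation is resampling-based parameter tuning. It uses scikit-learn's DecisionTreeRegressor and GridSearchCV as stand-ins, with synthetic data in place of real performance measurements; it is illustrative only, not the paper's implementation.

```python
# Hypothetical sketch of a DECART-style workflow: learn a CART
# (regression-tree) performance model from a small random sample of
# configurations, tuning tree parameters by cross-validation.
# The feature matrix and performance values are synthetic stand-ins
# for real measurements of a configurable system.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

n_features = 10                   # binary configuration options
sample_size = 3 * n_features      # sample size linear in #features

# Synthetic "measured" sample: random configurations and a performance
# function with main effects plus one feature interaction.
X = rng.integers(0, 2, size=(sample_size, n_features))
y = 100 + 20 * X[:, 0] + 15 * X[:, 1] + 30 * X[:, 0] * X[:, 2]

# Build and validate: grid search over CART parameters with k-fold
# cross-validation, keeping the model with the lowest validated error.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"min_samples_leaf": [1, 2, 4], "max_depth": [3, 5, None]},
    cv=5,
    scoring="neg_mean_absolute_percentage_error",
)
search.fit(X, y)

# Evaluate on held-out configurations, reporting accuracy as
# 100% minus the mean relative error.
X_test = rng.integers(0, 2, size=(200, n_features))
y_test = (100 + 20 * X_test[:, 0] + 15 * X_test[:, 1]
          + 30 * X_test[:, 0] * X_test[:, 2])
rel_err = np.mean(np.abs(search.predict(X_test) - y_test) / y_test)
print(f"accuracy: {100 * (1 - rel_err):.1f}%")
```

The key design point the sketch mirrors is that model selection happens entirely within the small measured sample (via cross-validation), so no additional measurements are needed to choose the model's parameters.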




Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments. This research was partially supported by the National Natural Science Foundation of China (No. 61772200, 61702320), the Shanghai Pujiang Talent Program (No. 17PJ1401900), the Shanghai Municipal Natural Science Foundation (No. 17ZR1406900), the Shanghai Municipal Education Commission Funds of the Young Teacher Training Program (No. ZZSDJ17021), the Specialized Fund of the Shanghai Municipal Commission of Economy and Informatization (No. 201602008), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130074110015), the DFG grants (AP 206/4, AP 206/5, AP 206/7, SI 2171/2, and SI 2171/3), the Natural Sciences and Engineering Research Council of Canada, and Pratt & Whitney Canada.

Author information

Correspondence to Jianmei Guo or Dingyu Yang.

Additional information

Communicated by: Vittorio Cortellessa

Jianmei Guo and Dingyu Yang contributed equally to this work.


Cite this article

Guo, J., Yang, D., Siegmund, N. et al. Data-efficient performance learning for configurable systems. Empir Software Eng 23, 1826–1867 (2018). https://doi.org/10.1007/s10664-017-9573-6
