What to expect when you are consolidating: effective prediction models of application performance on multicores

Chen, Lydia Y.; Serazzi, Giuseppe; Ansaloni, Danilo; Smirni, Evgenia; Binder, Walter

doi:10.1007/s10586-013-0273-8

Published: 25 May 2013

Cluster Computing, Volume 17, pages 19–37 (2014)

Abstract

Consolidation of multiple applications with diverse and changing resource requirements is common in multicore systems as hardware resources are abundant. As opportunities for better system usage become ample, so do opportunities to degrade the performance of individual applications, owing to unregulated performance interference between applications and system resources. Can we predict a performance region within which application performance is expected to lie under different consolidations? Alternatively, can we maximize resource utilization while maintaining individual application performance targets? In this work we provide a methodology that answers these difficult questions by constructing a queueing-theory-based tool that can accurately predict application scalability on multicores. The tool can also suggest the optimal consolidations that maximize system resource utilization while meeting application performance targets. The proposed methodology is based on asymptotic analysis, which quickly provides a range of performance values that the user should expect under various consolidation scenarios. When more accurate performance forecasting is needed, the methodology refines these predictions using approximate mean value analysis. The methodology is lightweight, as it relies on capturing application resource demands with standard system monitoring, via non-intrusive low-level measurements.

We evaluate our approach on an IBM Power7 system using the DaCapo and SPECjvm2008 benchmark suites. From 900 different consolidations of application instances, our tool accurately predicts the average iteration time of collocated applications with an average error below 9 per cent. Experimental and analytical results are in excellent agreement, confirming the robustness of the proposed methodology in suggesting the best consolidations that meet given performance objectives of individual applications while maximizing system resource utilization.
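To give a feel for how asymptotic analysis yields a performance region rather than a single number, the following is a minimal sketch of the classical single-class asymptotic bounds for a closed queueing network. It is not the authors' tool: the paper relies on multiclass asymptotic analysis, whereas this simplified version assumes one class of identical jobs. The function name `asymptotic_bounds`, the `Bounds` type, and the reading of `demands` as per-resource service demands of one iteration and `n_jobs` as the number of collocated instances are all assumptions made for illustration.

```python
from typing import NamedTuple, Sequence


class Bounds(NamedTuple):
    """Lower and upper limits on throughput and response time (hypothetical type)."""
    throughput_low: float
    throughput_high: float
    response_low: float
    response_high: float


def asymptotic_bounds(demands: Sequence[float], n_jobs: int,
                      think_time: float = 0.0) -> Bounds:
    """Classical single-class asymptotic bounds for a closed network.

    demands    -- service demand of one job at each resource (seconds)
    n_jobs     -- number of jobs circulating in the network
    think_time -- delay outside the queueing resources (seconds)
    """
    if n_jobs < 1 or not demands:
        raise ValueError("need at least one job and one resource")
    total = sum(demands)   # demand summed over all resources
    d_max = max(demands)   # demand at the bottleneck resource
    return Bounds(
        # Pessimistic case: every job queues behind all the others.
        throughput_low=n_jobs / (n_jobs * total + think_time),
        # Optimistic case: no queueing, capped by the bottleneck rate.
        throughput_high=min(n_jobs / (total + think_time), 1.0 / d_max),
        response_low=max(total, n_jobs * d_max - think_time),
        response_high=n_jobs * total,
    )


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    print(asymptotic_bounds([0.25, 0.5, 0.25], n_jobs=4))
```

The gap between the lower and upper response-time values plays the role of the "performance region" in this simplified setting; the bounds cost only a sum and a maximum, which is why such an analysis can be evaluated quickly for many candidate consolidations.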

Notes

Because the applications used to evaluate the methodology proposed in this paper are composed of a set of iterations, we use the average “iteration time” as a measure of the application's end-to-end execution time. Effectively, the iteration time scaled by (that is, multiplied by) the number of iterations gives the application execution time.
Throughout this paper we use the terms “program”, “benchmark” and “application” interchangeably.
Throughout this paper we use the terms “JVM process” and “application instance” interchangeably.
MVA provides the exact solution of product-form queueing networks, that is, networks whose steady-state probabilities can be expressed as a product of factors, each describing the state of one queueing node.
Of course, any alternative definition of a “target” iteration would also work.
See http://www.spec.org/jvm2008.
Of course, this scenario can be changed and the number of primary execution instances can be any integer. We have run experiments that vary this number from 1 to 10, but they are not reported here due to lack of space. The selected number of four primary consolidated applications is representative of all experiments.
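As an illustration of the mean value analysis (MVA) mentioned in the notes above, here is a minimal sketch of the textbook exact MVA recursion for a single-class closed product-form network. It is not the approximate, multi-application variant used in the paper; the function name `exact_mva` and its parameters are hypothetical, and every resource is assumed to be a simple queueing station.

```python
from typing import List, Sequence, Tuple


def exact_mva(demands: Sequence[float], n_jobs: int,
              think_time: float = 0.0) -> Tuple[float, float, List[float]]:
    """Exact single-class MVA for a closed product-form network.

    demands    -- service demand of one job at each queueing station
    n_jobs     -- number of jobs circulating in the network
    think_time -- delay outside the queueing stations

    Returns (throughput, response time, mean queue length per station).
    """
    queue = [0.0] * len(demands)   # an empty network has empty queues
    throughput = 0.0
    response = 0.0
    for n in range(1, n_jobs + 1):
        # An arriving job sees the queue lengths of the network with n - 1 jobs.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole network.
        throughput = n / (think_time + response)
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    x, r, q = exact_mva([1.0, 1.0], n_jobs=3)
    print(f"throughput={x:.3f} response={r:.3f} queues={q}")
```

The recursion visits populations 1 through `n_jobs`, so its cost grows linearly with the number of jobs in the single-class case. With several job classes the exact recursion has to cover every combination of per-class populations, which is one common reason for preferring approximate MVA.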

References

Ansaloni, D., Chen, L.Y., Smirni, E., Binder, W.: Model-driven consolidation of Java workloads on multicores. In: Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN-PDS), pp. 1–12 (2012)
Apparao, P., Iyer, R., Zhang, X., Newell, D., Adelmeyer, T.: Characterization & analysis of a server consolidation benchmark. In: Proceedings of VEE, pp. 21–30 (2008)
Balbo, G., Serazzi, G.: Asymptotic analysis of multiclass closed queueing networks: common bottleneck. Perform. Eval. 26(1), 51–72 (1996)
Blackburn, S., Garner, R., Hoffman, C., Khan, A., McKinley, K., Bentzur, R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S., Hirzel, M., Hosking, A., Jump, M., Lee, H., Moss, J., Phansalkar, A., Stefanović, D., von Dincklage, D., Wiedermann, B.: The DaCapo benchmarks: Java benchmarking development and analysis. In: Proceedings of OOPSLA, pp. 169–190 (2006)
Chen, J., John, L., Kaseridis, D.: Modeling program resource demand using inherent program characteristics. In: Proceedings of SIGMETRICS, pp. 1–12 (2011)
Chen, L.Y., Ansaloni, D., Smirni, E., Yokokawa, A., Binder, W.: Achieving application-centric performance targets via consolidation on multicores: myth or reality? In: Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing (HPDC), pp. 37–48 (2012)
Chen, L.Y., Das, A., Qin, W., Sivasubramaniam, A., Wang, Q., Harper, R., Morris, B.: Consolidating clients on back-end servers with co-location and frequency control. ACM SIGMETRICS Perform. Eval. Rev. 34, 383–384 (2006)
Dey, T., Wang, W., Davidson, J., Soffa, M.: Characterizing multi-threaded applications based on shared-resource contention. In: Proceedings of ISPASS, pp. 76–86 (2011)
Govindan, S., Liu, J., Kansal, A., Sivasubramaniam, A.: Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines. In: Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
Hauswirth, M., Sweeney, P., Diwan, A., Hind, M.: Vertical profiling: understanding the behavior of object-oriented applications. In: Proceedings of OOPSLA, pp. 251–269 (2004)
Hines, M.R., Gordon, A., Silva, M., da Silva, D., Ryu, K.D., Ben-Yehuda, M.: Applications know best: performance-driven memory overcommit with ginkgo. Tech. rep., IBM (2011)
İpek, E., McKee, S., Caruana, R., de Supinski, B., Schulz, M.: Efficiently exploring architectural design spaces via predictive modeling. In: Proceedings of ASPLOS, pp. 195–206 (2006)
Jerger, N., Vantrease, D., Lipasti, M.: An evaluation of server consolidation workloads for multi-core designs. In: Proceedings of IISWC, pp. 47–56 (2007)
Knauerhase, R., Brett, P., Hohlt, B., Li, T., Hahn, S.: Using OS observations to improve performance in multicore systems. IEEE MICRO 28, 54–66 (2008)
Koh, Y., Knauerhase, R.C., Brett, P., Bowman, M., Wen, Z., Pu, C.: An analysis of performance interference effects in virtual environments. In: Proceedings of ISPASS, pp. 200–209 (2007)
Lee, B., Collins, J., Wang, H., Brooks, D.: CPR: composable performance regression for scalable multiprocessor models. In: Proceedings of Micro, pp. 270–281. IEEE Computer Society, Washington (2008)
Lipsky, L., Lieu, C., Tehranipour, A., van de Liefvoort, A.: On the asymptotic behavior of time-sharing systems. Commun. ACM 25(10), 707–714 (1982)
Menascé, D., Almeida, V., Dowdy, L.: Capacity Planning and Performance Modeling: From Mainframes to Client-Server Systems. Prentice Hall, New York (1994)
Meng, X., Isci, C., Kephart, J., Zhang, L., Bouillet, E., Pendarakis, D.: Efficient resource provisioning in compute clouds via VM multiplexing. In: Proceedings of ICAC, pp. 11–20 (2010)
Mi, N., Casale, G., Cherkasova, L., Smirni, E.: Burstiness in multi-tier applications: symptoms, causes, and new models. In: Proceedings of Middleware, pp. 265–286 (2008)
Nathuji, R., Kansal, A., Ghaffarkhah, A.: Q-clouds: managing performance interference effects for QoS-aware clouds. In: Proceedings of EuroSys, pp. 237–250 (2010)
Reiser, M., Lavenberg, S.S.: Mean-value analysis of closed multichain queuing networks. J. ACM 27, 313–322 (1980)
Sharifi, A., Srikantaiah, S., Mishra, A., Kandemir, M., Das, C.: METE: meeting end-to-end QoS in multicores through system-wide resource management. In: Proceedings of SIGMETRICS, pp. 13–24 (2011)
Song, X., Chen, H., Chen, R., Wang, Y., Zang, B.: A case for scaling applications to many-core with OS clustering. In: Proceedings of EuroSys, pp. 61–76 (2011)
Tallent, N., Mellor-Crummey, J.: Effective performance measurement and analysis of multithreaded applications. SIGPLAN Not. 44, 229–240 (2009)
Urgaonkar, B., Pacifici, G., Shenoy, P., Spreitzer, M., Tantawi, A.: An analytical model for multi-tier Internet services and its applications. In: Proceedings of SIGMETRICS, pp. 291–302 (2005)
Wood, T., Cherkasova, L., Ozonat, K., Shenoy, P.: Profiling and modeling resource usage of virtualized applications. In: Proceedings of Middleware, pp. 366–387 (2008)
Wood, T., Shenoy, P., Venkataramani, A., Yousif, M.: Sandpiper: black-box and gray-box resource management for virtual machines. Comput. Netw. 53, 2923–2938 (2009)
Zhang, Q., Cherkasova, L., Mi, N., Smirni, E.: A regression-based analytic model for capacity planning of multi-tier applications. Clust. Comput. 11, 197–211 (2008)
Zhuravlev, S., Blagodurov, S., Fedorova, A.: Addressing shared resource contention in multicore processors via scheduling. In: Proceedings of ASPLOS, pp. 129–142 (2010)

Acknowledgements

This work has been supported by IBM and the Swiss National Science Foundation (project 200021 141002). Part of this work was conducted while Danilo Ansaloni was on an internship and Evgenia Smirni was on sabbatical leave at the IBM Zurich Research Laboratory. Evgenia Smirni is partially supported by NSF grants CCF-0937925 and CCF-1218758. A preliminary version [6] of this paper appeared in the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC’12, Delft, Netherlands, June 18–22, 2012.

Author information

Authors and Affiliations

IBM Zurich Lab, Zurich, Switzerland
Lydia Y. Chen
Politecnico di Milano, Milano, Italy
Giuseppe Serazzi
University of Lugano, Lugano, Switzerland
Danilo Ansaloni & Walter Binder
College of William and Mary, Williamsburg, VA, USA
Evgenia Smirni

Corresponding author

Correspondence to Evgenia Smirni.

About this article

Cite this article

Chen, L.Y., Serazzi, G., Ansaloni, D. et al. What to expect when you are consolidating: effective prediction models of application performance on multicores. Cluster Comput 17, 19–37 (2014). https://doi.org/10.1007/s10586-013-0273-8

Received: 04 October 2012
Accepted: 30 April 2013
Published: 25 May 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10586-013-0273-8
