Abstract
Consolidating multiple applications with diverse and changing resource requirements is common in multicore systems, where hardware resources are abundant. As opportunities for better system utilization grow, so do the risks of degrading individual application performance through unregulated interference among applications that share system resources. Can we predict a performance region within which application performance is expected to lie under different consolidations? Alternatively, can we maximize resource utilization while maintaining individual application performance targets? In this work we provide a methodology that answers these questions by constructing a queueing-theory based tool that accurately predicts application scalability on multicores. The tool can also suggest the consolidations that maximize system resource utilization while meeting application performance targets. The proposed methodology is based on asymptotic analysis, which quickly provides a range of performance values that the user should expect under various consolidation scenarios. When more accurate performance forecasting is needed, the methodology refines its predictions using approximate mean value analysis. The methodology is lightweight, as it captures application resource demands through standard system monitoring based on non-intrusive low-level measurements.
We evaluate our approach on an IBM Power7 system using the DaCapo and SPECjvm2008 benchmark suites. From 900 different consolidations of application instances, our tool accurately predicts the average iteration time of collocated applications with an average error below 9 per cent. Experimental and analytical results are in excellent agreement, confirming the robustness of the proposed methodology in suggesting the best consolidations that meet given performance objectives of individual applications while maximizing system resource utilization.
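As a rough illustration of how asymptotic analysis yields a performance range rather than a single value, the sketch below computes the textbook asymptotic bounds for a single-class closed queueing network. It is not the tool described in this paper, which handles multiple application classes. The function name `asymptotic_bounds`, the `Bounds` container and the example demands are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Bounds:
    throughput_low: float
    throughput_high: float
    response_low: float
    response_high: float


def asymptotic_bounds(demands: Sequence[float], n_jobs: int,
                      think_time: float = 0.0) -> Bounds:
    """Classic asymptotic bounds for a single-class closed network.

    demands    -- service demand of one job at each resource (seconds)
    n_jobs     -- number of jobs (e.g. consolidated instances) in the system
    think_time -- delay spent outside the queueing resources (seconds)
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if not demands or min(demands) < 0 or max(demands) <= 0:
        raise ValueError("demands must be non-negative with a positive maximum")

    d_total = sum(demands)   # time per job with no contention at all
    d_max = max(demands)     # demand at the bottleneck resource

    # Throughput cannot exceed the no-contention rate or the bottleneck rate.
    x_high = min(n_jobs / (d_total + think_time), 1.0 / d_max)
    # Worst case: every job queues behind all the others at every resource.
    x_low = n_jobs / (n_jobs * d_total + think_time)

    # Response-time bounds mirror the throughput bounds.
    r_low = max(d_total, n_jobs * d_max - think_time)
    r_high = n_jobs * d_total
    return Bounds(x_low, x_high, r_low, r_high)


if __name__ == "__main__":
    # Hypothetical demands (seconds) at three resources, four instances.
    print(asymptotic_bounds([0.2, 0.3, 0.5], n_jobs=4))
```

The bounds need only the per-resource demands, which is why this kind of analysis is cheap. The gap between the low and high values is the region in which a more detailed model would place its prediction.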
Notes
Because the applications that we use to evaluate the methodology proposed in this paper consist of a set of iterations, we use the average “iteration time” as a measure of the application end-to-end execution time. Effectively, a scaled iteration time (multiplied by the number of iterations) expresses the application execution time.
Throughout this paper we use the terms “program”, “benchmark” and “application” interchangeably.
Throughout this paper we use the terms “JVM process” and “application instance” interchangeably.
MVA provides the exact solution of product-form queueing networks, that is, networks whose steady-state probabilities can be expressed as a product of factors describing the state of each queueing node.
Of course, any alternative definition of a “target” iteration would also work.
Of course, this scenario can be changed, and the number of primary execution instances can be any integer. We have run experiments that vary this number from 1 to 10, but we do not report them here due to lack of space. The selected number of four primary consolidated applications is representative of all experiments.
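To make the note on MVA above concrete, the following is a minimal sketch of the exact single-class MVA recursion for a closed product-form network with queueing resources and an optional think time. It illustrates the standard algorithm only and is not the multiclass approximate MVA used in this paper. The function name `exact_mva` and the example demands are assumptions.

```python
from typing import List, Sequence, Tuple


def exact_mva(demands: Sequence[float], n_jobs: int,
              think_time: float = 0.0) -> Tuple[float, float, List[float]]:
    """Exact MVA for a single-class closed product-form network.

    Returns (throughput, response_time, mean_queue_lengths) at population
    n_jobs. All resources are treated as single-server queueing stations.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if not demands or min(demands) < 0 or sum(demands) <= 0:
        raise ValueError("demands must be non-negative and not all zero")

    queue = [0.0] * len(demands)   # mean queue lengths with zero jobs
    throughput = 0.0
    response = 0.0
    for n in range(1, n_jobs + 1):
        # Arrival theorem: an arriving job sees the queue lengths of the
        # same network with one job fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole network.
        throughput = n / (think_time + response)
        # Little's law applied to each resource.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


if __name__ == "__main__":
    # Hypothetical demands (seconds) at two resources, two jobs.
    x, r, q = exact_mva([1.0, 1.0], n_jobs=2)
    print(x, r, q)
```

The recursion visits every population from 1 to `n_jobs`, which is inexpensive for one class but grows quickly with several classes. That growth is the usual motivation for switching to approximate MVA in multiclass models.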
References
Ansaloni, D., Chen, L.Y., Smirni, E., Binder, W.: Model-driven consolidation of Java workloads on multicores. In: Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN-PDS), pp. 1–12 (2012)
Apparao, P., Iyer, R., Zhang, X., Newell, D., Adelmeyer, T.: Characterization & analysis of a server consolidation benchmark. In: Proceedings of VEE, pp. 21–30 (2008)
Balbo, G., Serazzi, G.: Asymptotic analysis of multiclass closed queueing networks: common bottleneck. Perform. Eval. 26(1), 51–72 (1996)
Blackburn, S., Garner, R., Hoffman, C., Khan, A., McKinley, K., Bentzur, R., Diwan, A., Feinberg, D., Frampton, D., Guyer, S., Hirzel, M., Hosking, A., Jump, M., Lee, H., Moss, J., Phansalkar, A., Stefanović, D., von Dincklage, D., Wiedermann, B.: The DaCapo benchmarks: Java benchmarking development and analysis. In: Proceedings of OOPSLA, pp. 169–190 (2006)
Chen, J., John, L., Kaseridis, D.: Modeling program resource demand using inherent program characteristics. In: Proceedings of SIGMETRICS, pp. 1–12 (2011)
Chen, L.Y., Ansaloni, D., Smirni, E., Yokokawa, A., Binder, W.: Achieving application-centric performance targets via consolidation on multicores: myth or reality? In: Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing (HPDC), pp. 37–48 (2012)
Chen, L.Y., Das, A., Qin, W., Sivasubramaniam, A., Wang, Q., Harper, R., Morris, B.: Consolidating clients on back-end servers with co-location and frequency control. ACM SIGMETRICS Perform. Eval. Rev. 34, 383–384 (2006)
Dey, T., Wang, W., Davidson, J., Soffa, M.: Characterizing multi-threaded applications based on shared-resource contention. In: Proceedings of ISPASS, pp. 76–86 (2011)
Govindan, S., Liu, J., Kansal, A., Sivasubramaniam, A.: Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines. In: Proceedings of the ACM Symposium on Cloud Computing (SOCC) (2011)
Hauswirth, M., Sweeney, P., Diwan, A., Hind, M.: Vertical profiling: understanding the behavior of object-oriented applications. In: Proceedings of OOPSLA, pp. 251–269 (2004)
Hines, M.R., Gordon, A., Silva, M., da Silva, D., Ryu, K.D., Ben-Yehuda, M.: Applications know best: performance-driven memory overcommit with ginkgo. Tech. rep., IBM (2011)
Ïpek, E., McKee, S., Caruana, R., de Supinski, B., Schulz, M.: Efficiently exploring architectural design spaces via predictive modeling. In: Proceedings of ASPLOS, pp. 195–206 (2006)
Jerger, N., Vantrease, D., Lipasti, M.: An evaluation of server consolidation workloads for multi-core designs. In: Proceedings of IISWC, pp. 47–56 (2007)
Knauerhase, R., Brett, P., Hohlt, B., Li, T., Hahn, S.: Using OS observations to improve performance in multicore systems. IEEE MICRO 28, 54–66 (2008)
Koh, Y., Knauerhase, R.C., Brett, P., Bowman, M., Wen, Z., Pu, C.: An analysis of performance interference effects in virtual environments. In: Proceedings of ISPASS, pp. 200–209 (2007)
Lee, B., Collins, J., Wang, H., Brooks, D.: CPR: composable performance regression for scalable multiprocessor models. In: Proceedings of Micro, pp. 270–281. IEEE Computer Society, Washington (2008)
Lipsky, L., Lieu, C., Tehranipour, A., van de Liefvoort, A.: On the asymptotic behavior of time-sharing systems. Commun. ACM 25(10), 707–714 (1982)
Menascé, D., Almeida, V., Dowdy, L.: Capacity Planning and Performance Modeling: From Mainframes to Client-Server Systems. Prentice Hall, New York (1994)
Meng, X., Isci, C., Kephart, J., Zhang, L., Bouillet, E., Pendarakis, D.: Efficient resource provisioning in compute clouds via VM multiplexing. In: Proceedings of ICAC, pp. 11–20 (2010)
Mi, N., Casale, G., Cherkasova, L., Smirni, E.: Burstiness in multi-tier applications: symptoms, causes, and new models. In: Proceedings of Middleware, pp. 265–286 (2008)
Nathuji, R., Kansal, A., Ghaffarkhah, A.: Q-clouds: managing performance interference effects for QoS-aware clouds. In: Proceedings of EuroSys, pp. 237–250 (2010)
Reiser, M., Lavenberg, S.S.: Mean-value analysis of closed multichain queuing networks. J. ACM 27, 313–322 (1980)
Sharifi, A., Srikantaiah, S., Mishra, A., Kandemir, M., Das, C.: METE: meeting end-to-end QoS in multicores through system-wide resource management. In: Proceedings of SIGMETRICS, pp. 13–24 (2011)
Song, X., Chen, H., Chen, R., Wang, Y., Zang, B.: A case for scaling applications to many-core with OS clustering. In: Proceedings of EuroSys, pp. 61–76 (2011)
Tallent, N., Mellor-Crummey, J.: Effective performance measurement and analysis of multithreaded applications. SIGPLAN Not. 44, 229–240 (2009)
Urgaonkar, B., Pacifici, G., Shenoy, P., Spreitzer, M., Tantawi, A.: An analytical model for multi-tier Internet services and its applications. In: Proceedings of SIGMETRICS, pp. 291–302 (2005)
Wood, T., Cherkasova, L., Ozonat, K., Shenoy, P.: Profiling and modeling resource usage of virtualized applications. In: Proceedings of Middleware, pp. 366–387 (2008)
Wood, T., Shenoy, P., Venkataramani, A., Yousif, M.: Sandpiper: black-box and gray-box resource management for virtual machines. Comput. Netw. 53, 2923–2938 (2009)
Zhang, Q., Cherkasova, L., Mi, N., Smirni, E.: A regression-based analytic model for capacity planning of multi-tier applications. Clust. Comput. 11, 197–211 (2008)
Zhuravlev, S., Blagodurov, S., Fedorova, A.: Addressing shared resource contention in multicore processors via scheduling. In: Proceedings of ASPLOS, pp. 129–142 (2010)
Acknowledgements
This work has been supported by IBM and the Swiss National Science Foundation (project 200021_141002). Part of this work was conducted while Danilo Ansaloni was on an internship and Evgenia Smirni was on sabbatical leave at the IBM Zurich Research Laboratory. Evgenia Smirni is partially supported by NSF grants CCF-0937925 and CCF-1218758. A preliminary version [6] of this paper appeared in the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC’12, Delft, Netherlands, June 18–22, 2012.
Cite this article
Chen, L.Y., Serazzi, G., Ansaloni, D. et al. What to expect when you are consolidating: effective prediction models of application performance on multicores. Cluster Comput 17, 19–37 (2014). https://doi.org/10.1007/s10586-013-0273-8
DOI: https://doi.org/10.1007/s10586-013-0273-8