CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud

Jin, Hai; Qin, Hanfeng; Wu, Song; Guo, Xuerong

doi:10.1007/s10766-013-0286-1

CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud

Published: 22 October 2013

Volume 43, pages 403–420, (2015)
Cite this article

International Journal of Parallel Programming Aims and scope Submit manuscript

Hai Jin¹,
Hanfeng Qin¹,
Song Wu¹ &
…
Xuerong Guo¹

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

Applications in High Performance Computing (HPC) cloud are characterized by large cache resource consumption due to large-scale inputs and intensive communications, which creates serious Shared Last Level cache (SLLC) performance bottleneck. Current system software stacks are not efficient in addressing this issue among virtual machines at the hypervisor level or the threads at the operating system level. In this paper, we investigate performance interference due to contention for SLLC in the HPC cloud. We employ an enhanced reuse distance analysis technique with an accelerated cyclic compression algorithm to identify application’s cache interference intensity. Based on reuse distance analysis, we propose a practical Cache Contention-Aware virtual machine Placement approach (CCAP). CCAP dispatches virtual machines according to their cache interference intensities to avoid cache pollution and interference, thus alleviating negative effects of cache contention. We implement CCAP in the Xen hypervisor. Evaluation of NPB workload reveals that CCAP can improve performance of cache sensitive applications when they are co-scheduled with cache pollution programs. For a 2-workload system, it reduces execution time by 12 %, as well as cache miss rate by 13 %, while increasing throughput by 13 %, on average. Moreover, CCAP also improves the average performance of the cache pollution programs by 5 %. For a 4-workload system, CCAP brings more significant performance improvement to cache sensitive applications, an average increase of 20 %.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

When Partitioning Works and When It Doesn’t: An Empirical Study on Cache Way Partitioning

RCS: Hybrid Co-scheduling Optimization in Virtualized System

Towards Interference-Aware Dynamic Scheduling in Virtualized Environments

References

Alarm, S., Barrett, R.F., Kuehn, J.A., Roth, P.C., Vetter, J.S.: Characterization of scientific workloads on systems with multi-core processors. In: Proceedings of IEEE International Symposium on Workload Characterization (IISWC’06), pp. 225–236. IEEE (2006)
Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., et al.: The nas parallel benchmarks—summary and preliminary results. In: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (SC’91), pp. 158–165. ACM (1991)
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., Warfield, A.: Xen and the art of virtualization. In: Proceedings of the 9th ACM Symposium on Operating Systems Principles (SOSP’03), pp. 164–177. ACM (2003)
Barker, D.P.: Realities of multi-core CPU chips and memory contention. In: Proceedings of the 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP’2009), pp. 446–453. IEEE (2009)
Borkar, S.: Thousand core chips: a technology perspective. In: Proceedings of the 44th Annual Design Automation Conference, pp. 746–749. ACM (2007)
Chandra, D., Guo, F., Kim, S., Solihin, Y.: Predicting inter-thread cache contention on a chip multi-processor architecture. In: Proceedings of the 11th International Symposium on High-Performance Computer Architecture (HPCA’05), pp. 340–351. IEEE (2005)
Chang, J., Sohi, G.S.: Cooperative cache partitioning for chip multiprocessors. In: Proceedings of the 21st Annual International Conference on Supercomputing (SC’07), pp. 242–252. ACM (2007)
Cohen, W.E.: Tuning programs with oprofile. Wide Open Mag. 1, 53–62 (2004)
Google Scholar
Ding, C., Zhong, Y.: Predicting whole-program locality through reuse distance analysis. In: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI’03), pp. 245–257. ACM (2003)
Duong, N., Zhao, D., Kim, T., Cammarota, R., Valero, M., Veidenbaum, A.V.: Improving cache management policies using dynamic reuse distances. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’12), pp. 389–400. IEEE (2012)
Fedorova, A., Seltzer, M., Smith, M.D.: Improving performance isolation on chip multiprocessors via an operating system scheduler. In: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT’07), pp. 25–38. IEEE (2007)
Goldberg, R.P.: Survey of virtual machine research. Computer 7(6), 34–45 (1974)
Article Google Scholar
Guo, F., Kannan, H., Zhao, L., Illikkal, R., Iyer, R., Newell, D., Solihin, Y., Kozyrakis, C.: From chaos to QoS: case studies in CMP resource management. ACM SIGARCH Comput. Archit. News 35(1), 21–30 (2007)
Article Google Scholar
Hao, S., Du, Z., Bader, D.A., Ye, Y.: A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache. In: Proceedings of the 38th International Conference on Parallel Processing (ICPP’09), pp. 396–403. IEEE (2009)
Hsu, L.R., Reinhardt, S.K., Iyer, R., Makineni, S.: Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource. In: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT’06), pp. 13–22. ACM (2006)
Hsu, W.C., Chen, H., Yew, P.C., Chen, D.Y.: On the predictability of program behavior using different input data sets. In: Proceedings of 6th Annual Workshop on Interaction Between Compilers and Computer Architectures, pp. 45–53. IEEE (2002)
Iyer, R.: CQoS: a framework for enabling Qos in shared caches of CMP platforms. In: Proceedings of the 18th Annual International Conference on Supercomputing (SC’04), pp. 257–266. ACM (2004)
Jahre, M., Natvig, L.: A light-weight fairness mechanism for chip multiprocessor memory systems. In: Proceedings of the 6th ACM Conference on Computing Frontiers (CF’09), pp. 1–10. ACM (2009)
Jaleel, A., Theobald, K.B., Steely, S.C. Jr., Emer, J.: High performance cache replacement using re-reference interval prediction (rrip). In: Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA’10), pp. 60–71. ACM (2010)
Kim, S., Chandra, D., Solihin, Y.: Fair cache sharing and partitioning in a chip multiprocessor architecture. In: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT’04), pp. 111–122. IEEE (2004)
Lu, Q., Lin, J., Ding, X., Zhang, Z., Zhang, X., Sadayappan, P.: Soft-olp: improving hardware cache performance through software-controlled object-level partitioning. In: Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT’09), pp. 246–257. IEEE (2009)
Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: building customized program analysis tools with dynamic instrumentation. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’05), pp. 190–200. ACM (2005)
Mattson, R.L., Gecsei, J., Slutz, D.R., Traiger, I.L.: Evaluation techniques for storage hierarchies. IBM Syst. J. 9(2), 78–117 (1970)
Article Google Scholar
Nesbit, K.J., Laudon, J., Smith, J.E.: Virtual private caches. In: Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA’07), pp. 57–68. ACM (2007)
Nesbit, K.J., Moreto, M., Cazorla, F.J., Ramirez, A., Valero, M., Smith, J.E.: Multicore resource management. IEEE Micro 28(3), 6–16 (2008)
Article Google Scholar
Qureshi, M.K., Patt, Y.N.: Utility-based cache partitioning: a low-overhead, high-performance, runtime mechanism to partition shared caches. In: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’06), pp. 423–432. IEEE (2006)
Rosenblum, M., Garfinkel, T.: Virtual machine monitors: current technology and future trends. Computer 38(5), 39–47 (2005)
Article Google Scholar
Schuff, D.L., Parsons, B.S., Pai, V.S.: Multicore-aware reuse distance analysis. In: Proceedings of 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Ph.d. Forum (IPDPSW’10), pp. 1–8. IEEE (2010)
Smith, J.E., Nair, R.: The architecture of virtual machines. Computer 38(5), 32–38 (2005)
Article Google Scholar
Soares, L., Tam, D., Stumm, M.: Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer. In: Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’08), pp. 258–269. IEEE (2008)
Suo, G., Yang, X., Liu, G., Wu, J., Zeng, K., Zhang, B., Lin, Y.: IPC-based cache partitioning: an IPC-oriented dynamic shared cache partitioning mechanism. In: Proceedings of the 2008 3rd International Conference on Convergence and Hybrid Information Technology (ICHIT’08), pp. 399–406. IEEE (2008)
Zhong, Y., Dropsho, S.G., Shen, X., Studer, A., Ding, C.: Miss rate prediction across program inputs and cache configurations. IEEE Trans. Comput. 56(3), 328–343 (2007)
Article MathSciNet Google Scholar
Zhuravlev, S., Blagodurov, S., Fedorova, A.: Addressing shared resource contention in multicore processors via scheduling. In: Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’10), pp. 129–142. ACM (2010)

Download references

Acknowledgments

We would like to thank the anonymous reviewers for their helpful comments. The research is supported by National Science Foundation of China under Grant No. 61232008 and 61073024, ChinaGrid and CRANE project, Outstanding Youth Foundation of Hubei Province under Grant No. 2011CD-A086, National 863 Hi-Tech Research and Development Program under Grant No. 2013AA01A213, and Research Fund for the Doctoral Program of MOE under Grant 20110142130005.

Author information

Authors and Affiliations

Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Hai Jin, Hanfeng Qin, Song Wu & Xuerong Guo

Authors

Hai Jin
View author publications
You can also search for this author in PubMed Google Scholar
Hanfeng Qin
View author publications
You can also search for this author in PubMed Google Scholar
Song Wu
View author publications
You can also search for this author in PubMed Google Scholar
Xuerong Guo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hai Jin.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jin, H., Qin, H., Wu, S. et al. CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud. Int J Parallel Prog 43, 403–420 (2015). https://doi.org/10.1007/s10766-013-0286-1

Download citation

Received: 14 April 2013
Accepted: 04 October 2013
Published: 22 October 2013
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10766-013-0286-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud

Abstract

Access this article

Similar content being viewed by others

When Partitioning Works and When It Doesn’t: An Empirical Study on Cache Way Partitioning

RCS: Hybrid Co-scheduling Optimization in Virtualized System

Towards Interference-Aware Dynamic Scheduling in Virtualized Environments

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud

Abstract

Access this article

Similar content being viewed by others

When Partitioning Works and When It Doesn’t: An Empirical Study on Cache Way Partitioning

RCS: Hybrid Co-scheduling Optimization in Virtualized System

Towards Interference-Aware Dynamic Scheduling in Virtualized Environments

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation