
CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud

International Journal of Parallel Programming

Abstract

Applications in high performance computing (HPC) clouds consume large amounts of cache because of their large-scale inputs and intensive communication, which creates a serious performance bottleneck in the shared last-level cache (SLLC). Current system software stacks do not address this issue efficiently, either among virtual machines at the hypervisor level or among threads at the operating system level. In this paper, we investigate performance interference caused by SLLC contention in the HPC cloud. We employ an enhanced reuse distance analysis technique with an accelerated cyclic compression algorithm to identify an application’s cache interference intensity. Based on reuse distance analysis, we propose a practical Cache Contention-Aware virtual machine Placement approach (CCAP). CCAP dispatches virtual machines according to their cache interference intensities to avoid cache pollution and interference, thus alleviating the negative effects of cache contention. We implement CCAP in the Xen hypervisor. Evaluation with NPB workloads shows that CCAP improves the performance of cache-sensitive applications when they are co-scheduled with cache-polluting programs. For a 2-workload system, it reduces execution time by 12% and cache miss rate by 13%, while increasing throughput by 13%, on average. Moreover, CCAP also improves the average performance of the cache-polluting programs by 5%. For a 4-workload system, CCAP brings a larger performance improvement to cache-sensitive applications, 20% on average.
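The abstract builds on reuse distance analysis, so a small illustration of the underlying idea may help. The sketch below computes the textbook reuse (LRU stack) distance of each access in a memory trace and derives the miss ratio of a fully associative LRU cache from it. It is not the enhanced technique or the cyclic compression algorithm described in the paper; the function names `reuse_distances` and `miss_ratio` and the toy trace are assumptions made for illustration only.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse (LRU stack) distance of every access in `trace`.

    The distance is the number of distinct addresses touched since the
    previous access to the same address. None marks a first (cold) access.
    This is the naive O(N * M) formulation, not an optimized one.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Addresses positioned after `addr` were used more recently.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def miss_ratio(distances, capacity):
    """Miss ratio of a fully associative LRU cache holding `capacity` blocks.

    An access misses if it is cold or its reuse distance is >= capacity.
    """
    misses = sum(1 for d in distances if d is None or d >= capacity)
    return misses / len(distances)


# Hypothetical toy trace of block addresses.
trace = ["a", "b", "c", "a", "b", "b"]
dists = reuse_distances(trace)
```

Under these assumptions, a program whose accesses mostly have reuse distances larger than the cache capacity gains little from the cache while still evicting other programs' blocks, which is the kind of behaviour a contention-aware placement would want to separate from cache-sensitive workloads.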

Acknowledgments

We would like to thank the anonymous reviewers for their helpful comments. The research is supported by National Science Foundation of China under Grant No. 61232008 and 61073024, ChinaGrid and CRANE project, Outstanding Youth Foundation of Hubei Province under Grant No. 2011CD-A086, National 863 Hi-Tech Research and Development Program under Grant No. 2013AA01A213, and Research Fund for the Doctoral Program of MOE under Grant 20110142130005.

Author information

Corresponding author

Correspondence to Hai Jin.

About this article

Cite this article

Jin, H., Qin, H., Wu, S. et al. CCAP: A Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud. Int J Parallel Prog 43, 403–420 (2015). https://doi.org/10.1007/s10766-013-0286-1

  • DOI: https://doi.org/10.1007/s10766-013-0286-1
