
A Phase Behavior Aware Dynamic Cache Partitioning Scheme for CMPs

International Journal of Parallel Programming

Abstract

In multi-programmed environments, contention for the shared cache among processor cores can significantly degrade system performance. Cache partitioning is an effective countermeasure and has been widely studied, dynamic cache partitioning in particular. However, a dynamic partitioning scheme must decide both how much cache to allocate to each co-scheduled program and when to readjust that allocation, and both decisions are difficult to make well. This paper presents a novel dynamic cache partitioning mechanism based on the phase behavior of programs. It uses the performance monitoring units of modern processors to detect program phases at run time and uses these phases to guide cache partitioning. Because programs exhibit recurring phase behavior throughout their execution, the cache quota can be adjusted whenever a phase change occurs, and classifying phases allows the partitioning policy to be made with higher accuracy and lower overhead. The proposed method is evaluated with applications from the SPEC CPU2006 benchmark suite. Compared with an unpartitioned shared cache, it achieves a speedup of up to 1.214 for co-scheduled applications.
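The control loop summarized above (sample per-program counters each interval, classify the interval into a phase, repartition only when a phase change occurs, and reuse earlier decisions for recurring phases) can be illustrated with a small Python sketch. It is not the authors' implementation. The signature format, the distance threshold, the names `PhaseClassifier`, `allocate_ways` and `PhaseAwarePartitioner`, and the greedy way-allocation policy (borrowed from the marginal-utility idea of utility-based cache partitioning, not from this abstract) are all hypothetical, and miss curves are assumed to be supplied by the caller.

```python
"""Hypothetical sketch of phase-aware dynamic cache partitioning.

Not the paper's implementation. Assumed: one PMU-derived signature per
program per sampling interval, distance-threshold phase matching, and cache
quotas expressed as cache ways chosen greedily by marginal miss reduction
(a UCP-style policy swapped in for illustration).
"""
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Signature = Tuple[float, ...]           # e.g. (L2 misses per kilo-instruction, CPI)
MissCurves = Sequence[Sequence[float]]  # curves[p][w]: misses of program p with w ways


def distance(a: Signature, b: Signature) -> float:
    """Manhattan distance between two interval signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


class PhaseClassifier:
    """Maps interval signatures to phase IDs, creating a new ID for unseen behavior."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.signatures: List[Signature] = []

    def classify(self, sig: Signature) -> int:
        for phase_id, known in enumerate(self.signatures):
            if distance(sig, known) <= self.threshold:
                return phase_id
        self.signatures.append(sig)
        return len(self.signatures) - 1


def allocate_ways(curves: MissCurves, total_ways: int) -> List[int]:
    """Give each spare way to the program whose misses drop the most.

    Every program starts with one way so that no program is starved;
    each curve must have total_ways + 1 entries (0..total_ways ways).
    """
    n = len(curves)
    if n > total_ways:
        raise ValueError("need at least one way per program")
    alloc = [1] * n
    for _ in range(total_ways - n):
        gains = [curves[p][alloc[p]] - curves[p][alloc[p] + 1] for p in range(n)]
        best = max(range(n), key=lambda p: gains[p])
        alloc[best] += 1
    return alloc


class PhaseAwarePartitioner:
    """Repartitions only on phase changes; reuses decisions for known phase combinations."""

    def __init__(self, n_programs: int, total_ways: int, threshold: float) -> None:
        self.classifiers = [PhaseClassifier(threshold) for _ in range(n_programs)]
        self.total_ways = total_ways
        self.current: Optional[Tuple[int, ...]] = None
        self.decisions: Dict[Tuple[int, ...], List[int]] = {}
        self.recomputations = 0

    def on_interval(self, sigs: Sequence[Signature],
                    get_curves: Callable[[], MissCurves]) -> Optional[List[int]]:
        """Return a new way allocation on a phase change, else None (keep current)."""
        phases = tuple(c.classify(s) for c, s in zip(self.classifiers, sigs))
        if phases == self.current:
            return None
        self.current = phases
        if phases not in self.decisions:
            # Unseen phase combination: pay for miss-curve estimation only once.
            self.decisions[phases] = allocate_ways(get_curves(), self.total_ways)
            self.recomputations += 1
        return list(self.decisions[phases])
```

Because a decision is stored per combination of phase IDs, a recurring combination costs only a dictionary lookup, which is one plausible reading of the claim that phase classification lowers overhead. The sketch deliberately leaves miss-curve estimation from hardware counters to the caller.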

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61272408 and 61322210, the National High-tech Research and Development Program of China (863 Program) under Grant No. 2012AA010905, the CCCPC Youth Talent Plan, and the Doctoral Fund of Ministry of Education of China under Grant No. 20130142110048.

Author information

Corresponding author

Correspondence to Xiaofei Liao.

About this article

Cite this article

Liao, X., Guo, R., Yu, D. et al. A Phase Behavior Aware Dynamic Cache Partitioning Scheme for CMPs. Int J Parallel Prog 44, 68–86 (2016). https://doi.org/10.1007/s10766-014-0334-5

  • DOI: https://doi.org/10.1007/s10766-014-0334-5