DOI: 10.1145/1555754.1555778
research-article

PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Published: 20 June 2009

ABSTRACT

Many multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that sharing-oblivious cache management policies (e.g., LRU) can lead to poor performance and fairness when the multiple cores compete for the limited LLC capacity. Different memory access patterns cause cache contention in different ways, and various techniques have been proposed to target some of these behaviors. In this work, we propose a new cache management approach that combines dynamic insertion and promotion policies to provide the benefits of cache partitioning, adaptive insertion, and capacity stealing, all with a single mechanism. By handling multiple types of memory behavior, our proposed technique outperforms techniques that target only capacity partitioning or only adaptive insertion.
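The abstract describes the mechanism only at a high level, so the following is one possible reading under stated assumptions, not the authors' implementation. It models a single cache set as a priority stack, assumes each core's target way allocation is supplied from outside (for example by some utility monitor), inserts a missing line at a position equal to the requesting core's allocation, and promotes a hit line by just one position with a fixed probability. The class `PIPPSet`, its parameters, and the default promotion probability of 0.75 are hypothetical placeholders.

```python
import random


class PIPPSet:
    """One cache set kept as a priority stack: index 0 is the next victim,
    the last index is the highest-priority (MRU-like) slot.

    Hypothetical sketch of promotion/insertion pseudo-partitioning; the
    class name, parameters, and defaults are illustrative assumptions.
    """

    def __init__(self, ways, allocation, p_promote=0.75, seed=0):
        # `allocation` maps core id -> target number of ways for that core.
        # It is treated as an input here (assumed to come from elsewhere).
        if sum(allocation.values()) != ways:
            raise ValueError("allocation must sum to the set associativity")
        self.ways = ways
        self.allocation = dict(allocation)
        self.p_promote = p_promote  # placeholder probability (assumption)
        self.rng = random.Random(seed)
        self.stack = []  # entries are (tag, owner_core)

    def _find(self, tag):
        for i, (t, _owner) in enumerate(self.stack):
            if t == tag:
                return i
        return None

    def access(self, core, tag):
        """Simulate one access; return True on a hit, False on a miss."""
        pos = self._find(tag)
        if pos is not None:
            # Promotion policy: on a hit, swap with the line just above
            # with probability p_promote, rather than jumping to MRU.
            if pos + 1 < len(self.stack) and self.rng.random() < self.p_promote:
                above = self.stack[pos + 1]
                self.stack[pos + 1] = self.stack[pos]
                self.stack[pos] = above
            return True
        # Miss: evict the lowest-priority line when the set is full.
        if len(self.stack) == self.ways:
            self.stack.pop(0)
        # Insertion policy: leave allocation[core] lines below the new line
        # (clamped to current occupancy), so cores with larger targets
        # insert higher and their lines survive longer.
        insert_at = min(self.allocation.get(core, 0), len(self.stack))
        self.stack.insert(insert_at, (tag, core))
        return False


if __name__ == "__main__":
    s = PIPPSet(ways=4, allocation={0: 3, 1: 1}, p_promote=0.0)
    for core, tag in [(0, "a"), (1, "x"), (0, "b"), (0, "c"), (1, "y")]:
        s.access(core, tag)
    print([t for t, _ in s.stack])  # prints ['x', 'y', 'b', 'c']
```

Setting `p_promote` to 0.0 or 1.0 makes runs deterministic for testing. Because promotion is gradual and no way is reserved for any core, the allocation acts as a soft target rather than a hard partition; a core whose lines keep hitting can climb above its insertion point, which is one plausible way to read the abstract's combination of partitioning and capacity stealing.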

    Published in

      ACM Conferences
      ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
      June 2009
      510 pages
      ISBN:9781605585260
      DOI:10.1145/1555754
      ACM SIGARCH Computer Architecture News
        Volume 37, Issue 3
        June 2009
        495 pages
        ISSN:0163-5964
        DOI:10.1145/1555815

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 543 of 3,203 submissions, 17%
