DOI: 10.1145/2370816.2370869
research-article

A software memory partition approach for eliminating bank-level interference in multicore systems

Published: 19 September 2012

ABSTRACT

The main memory system is a shared resource in modern multicore machines. Sharing causes serious interference, which degrades performance in terms of both throughput and fairness. Numerous memory scheduling algorithms have been proposed to address the interference problem. However, these algorithms usually employ complex scheduling logic and require hardware modifications to memory controllers; as a result, industrial vendors have been hesitant to adopt them.

This paper presents a practical software approach that effectively eliminates the interference without hardware modification. The key idea is to modify the OS memory management subsystem to adopt a page-coloring based bank-level partition mechanism (BPM), which allocates specific DRAM banks to specific cores (threads). With BPM, memory controllers passively schedule memory requests in a core-cluster (or thread-cluster) manner.
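The page-coloring idea can be illustrated with a minimal sketch. It is not the paper's kernel implementation: the address mapping (bank index in physical address bits 13 to 15, 4 KiB pages), the even split of banks across cores, and the names `bank_color`, `partition_banks`, and `ColoredAllocator` are all assumptions made for illustration. Real bank bits depend on the memory controller.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB pages
BANK_SHIFT = 13   # assumption: lowest bank-index bit of the physical address
BANK_BITS = 3     # assumption: 8 banks


def bank_color(phys_addr: int) -> int:
    """Return the bank index (the page's 'color') encoded in a physical address."""
    return (phys_addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


def partition_banks(num_banks: int, num_cores: int) -> dict:
    """Give each core a disjoint, equal-sized set of banks (hypothetical policy)."""
    per_core = num_banks // num_cores
    return {
        core: set(range(core * per_core, (core + 1) * per_core))
        for core in range(num_cores)
    }


class ColoredAllocator:
    """Toy page allocator that keeps one free list per bank color."""

    def __init__(self, frames, core_banks):
        self.core_banks = core_banks
        self.free = {}
        for pfn in frames:
            color = bank_color(pfn << PAGE_SHIFT)
            self.free.setdefault(color, []).append(pfn)

    def alloc_page(self, core: int) -> int:
        """Hand out a free frame whose bank belongs to the requesting core."""
        for color in sorted(self.core_banks[core]):
            if self.free.get(color):
                return self.free[color].pop()
        raise MemoryError(f"no free frame in the banks owned by core {core}")


core_banks = partition_banks(num_banks=8, num_cores=4)
allocator = ColoredAllocator(range(64), core_banks)
pfn = allocator.alloc_page(core=2)
print(pfn, bank_color(pfn << PAGE_SHIFT) in core_banks[2])
```

Because the assumed bank bits lie above the page offset, every frame maps to exactly one bank, so restricting each core to its own free lists is enough to keep its requests out of other cores' banks.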

We implement BPM in the Linux 2.6.32.15 kernel and evaluate it on real 4-core and 8-core machines by running 20 randomly generated multi-programmed workloads (each containing 4 or 8 benchmarks) and a multi-threaded benchmark. Experimental results show that BPM improves overall system throughput by 4.7% on average (up to 8.6%) and reduces the maximum slowdown by 4.5% on average (up to 15.8%). Moreover, BPM saves 5.2% of the memory system's energy consumption.
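The abstract does not define its throughput and fairness metrics. As a rough sketch, the following assumes two definitions that are common in memory-scheduling work: weighted speedup for throughput and the largest per-program slowdown for unfairness. The function names and the IPC values are hypothetical and are not measurements from the paper.

```python
def weighted_speedup(ipc_alone, ipc_shared):
    """Sum of each program's shared-run IPC relative to its stand-alone IPC."""
    return sum(shared / alone for alone, shared in zip(ipc_alone, ipc_shared))


def maximum_slowdown(ipc_alone, ipc_shared):
    """Largest stand-alone-to-shared IPC ratio across programs (lower is fairer)."""
    return max(alone / shared for alone, shared in zip(ipc_alone, ipc_shared))


def relative_change(baseline, value):
    """Fractional change of value with respect to baseline."""
    return (value - baseline) / baseline


# Made-up IPC numbers for a 4-program workload, for illustration only.
ipc_alone = [2.0, 1.0, 1.6, 0.8]
ipc_baseline = [1.0, 0.5, 1.2, 0.4]   # shared run without bank partitioning
ipc_partitioned = [1.1, 0.6, 1.2, 0.5]  # shared run with bank partitioning

ws_gain = relative_change(
    weighted_speedup(ipc_alone, ipc_baseline),
    weighted_speedup(ipc_alone, ipc_partitioned),
)
ms_change = relative_change(
    maximum_slowdown(ipc_alone, ipc_baseline),
    maximum_slowdown(ipc_alone, ipc_partitioned),
)
print(f"throughput change: {ws_gain:+.1%}, max slowdown change: {ms_change:+.1%}")
```

Under these assumed definitions, a positive throughput change and a negative maximum-slowdown change both indicate an improvement.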


      • Published in

        PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
        September 2012
        512 pages
        ISBN:9781450311823
        DOI:10.1145/2370816

        Copyright © 2012 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 121 of 471 submissions, 26%
