ABSTRACT
The main memory system is a shared resource in modern multicore machines, and contention among cores causes serious interference, degrading both throughput and fairness. Numerous memory scheduling algorithms have been proposed to address this interference problem. However, these algorithms usually employ complex scheduling logic and require hardware modifications to the memory controller; as a result, industry vendors have been hesitant to adopt them.
This paper presents a practical software approach that effectively eliminates the interference without hardware modification. The key idea is to modify the OS memory management subsystem to adopt a page-coloring-based bank-level partition mechanism (BPM), which allocates specific DRAM banks to specific cores (threads). With BPM, the memory controller implicitly schedules memory requests in a core-cluster (or thread-cluster) manner.
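The idea can be illustrated with a minimal sketch of bank-aware page coloring. The bank-bit positions, page size, and allocator interface below are illustrative assumptions, not the paper's actual kernel implementation; real bank mappings depend on the memory controller's physical-address layout.

```python
# Hypothetical sketch: color physical page frames by DRAM bank bits,
# so a core is only handed pages that map to its assigned banks.
PAGE_SHIFT = 12   # assumed 4 KiB pages
BANK_SHIFT = 13   # assumed lowest bank-address bit in the physical address
BANK_BITS = 3     # assumed 8 banks, hence 8 page colors

def bank_color(pfn):
    """Bank color of a page frame number: the bank bits of its physical address."""
    paddr = pfn << PAGE_SHIFT
    return (paddr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)

def alloc_page(free_pfns, allowed_colors):
    """Pop the first free frame whose bank color is allowed for this core."""
    for i, pfn in enumerate(free_pfns):
        if bank_color(pfn) in allowed_colors:
            return free_pfns.pop(i)
    return None  # no free frame maps to this core's banks
```

Because each core draws pages only from its own color set, its memory requests land in disjoint banks, so the memory controller never interleaves row-buffer accesses from different cores within a bank.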
We implement BPM in the Linux 2.6.32.15 kernel and evaluate it on real 4-core and 8-core machines, running 20 randomly generated multi-programmed workloads (each containing 4 or 8 benchmarks) and a multi-threaded benchmark. Experimental results show that BPM improves overall system throughput by 4.7% on average (up to 8.6%) and reduces the maximum slowdown by 4.5% on average (up to 15.8%). Moreover, BPM saves 5.2% of the memory system's energy consumption.
A software memory partition approach for eliminating bank-level interference in multicore systems