PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core

Published in: Journal of Signal Processing Systems

Abstract

In contemporary multi-core systems, main memory is shared among many concurrent threads. Memory contention and interference are becoming increasingly severe, causing problems such as performance degradation, unfair resource sharing, and priority inversion. In this paper, we address the challenge of improving performance and fairness for concurrent threads while minimizing main-memory energy consumption. To this end, we propose PUMA, a novel solution that reduces memory contention and interference by judiciously partitioning threads among cores and allocating each core exclusive memory banks and bandwidth based on each thread's characteristics. Our results demonstrate that PUMA improves both performance and fairness while significantly reducing energy consumption compared to existing memory management approaches.
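
The abstract only outlines PUMA at a high level. The sketch below is a minimal, illustrative policy (not the paper's actual algorithm) showing how thread-to-core partitioning and exclusive bank allocation driven by per-thread memory intensity might be expressed; the `Thread` class, the `mpki` field, and the `NUM_CORES`/`NUM_BANKS` constants are hypothetical names introduced purely for illustration.

```python
# Illustrative sketch only: group threads by memory intensity (MPKI),
# assign each group to a core, and give each core a disjoint set of
# DRAM banks proportional to that group's aggregate intensity.
# Thread, mpki, NUM_CORES, and NUM_BANKS are hypothetical, not PUMA's API.
from dataclasses import dataclass

NUM_CORES = 4
NUM_BANKS = 16

@dataclass
class Thread:
    tid: int
    mpki: float  # misses per kilo-instruction, a proxy for memory intensity

def partition_threads(threads):
    """Round-robin threads onto cores in descending MPKI order so that
    memory-intensive threads are spread out rather than co-located."""
    cores = [[] for _ in range(NUM_CORES)]
    for i, t in enumerate(sorted(threads, key=lambda t: t.mpki, reverse=True)):
        cores[i % NUM_CORES].append(t)
    return cores

def allocate_banks(cores):
    """Give each core an exclusive, contiguous range of banks whose size is
    proportional to the core's aggregate MPKI (at least one bank per core)."""
    weights = [sum(t.mpki for t in c) or 1e-6 for c in cores]
    total = sum(weights)
    shares = [max(1, round(w / total * NUM_BANKS)) for w in weights]
    # Trim or pad so the shares sum exactly to NUM_BANKS.
    while sum(shares) > NUM_BANKS:
        shares[shares.index(max(shares))] -= 1
    while sum(shares) < NUM_BANKS:
        shares[shares.index(min(shares))] += 1
    allocation, next_bank = [], 0
    for s in shares:
        allocation.append(list(range(next_bank, next_bank + s)))
        next_bank += s
    return allocation

if __name__ == "__main__":
    mpkis = [25.0, 1.2, 18.5, 0.4, 9.7, 3.3, 0.9, 14.1]
    threads = [Thread(tid=i, mpki=m) for i, m in enumerate(mpkis)]
    cores = partition_threads(threads)
    for core_id, banks in enumerate(allocate_banks(cores)):
        print(f"core {core_id}: threads {[t.tid for t in cores[core_id]]} -> banks {banks}")
```

In this toy policy, giving each core an exclusive bank range means requests from different cores never contend for the same bank's row buffer, which is the kind of interference the paper targets; the actual criteria PUMA uses to characterize threads and size allocations are described in the full article.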

Acknowledgment

This work was supported by the Qing Lan Project, the National Science Foundation of China under grants No. 61402059, No. 61472109, No. 61300033, and No. 61402140, the Zhejiang Provincial Natural Science Foundation under grants No. LQ14F020011 and No. LY13F020045, the Jiangsu Provincial Natural Science Foundation under grant No. SBK201240198, and the Fundamental Research Funds for the Central Universities under grant No. 106112014CDJZR185502.

Author information

Corresponding author

Correspondence to Liang Shi.

About this article

Cite this article

Jia, G., Shi, L., Li, X. et al. PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core. J Sign Process Syst 84, 139–150 (2016). https://doi.org/10.1007/s11265-015-1015-3
