PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core

Published in: Journal of Signal Processing Systems

Abstract

In contemporary multi-core systems, main memory is shared among many concurrent threads. Memory contention and interference are becoming increasingly severe, causing problems such as performance degradation, unfair resource sharing, and priority inversion. In this paper, we address the challenge of improving performance and fairness for concurrent threads while minimizing main-memory energy consumption. To this end, we propose PUMA, a novel solution that reduces memory contention and interference by judiciously partitioning threads among cores and allocating each core exclusive memory banks and bandwidth based on each thread's characteristics. Our results demonstrate that PUMA improves both performance and fairness while significantly reducing energy consumption compared to existing memory management approaches.
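
The abstract only outlines PUMA at a high level. The sketch below is a minimal, illustrative policy (not the paper's actual algorithm) showing how thread-to-core partitioning and exclusive bank allocation driven by per-thread memory intensity might be expressed; the `Thread` class, the `mpki` field, and the `NUM_CORES`/`NUM_BANKS` constants are hypothetical names introduced purely for illustration.

```python
# Illustrative sketch only: group threads by memory intensity (MPKI),
# assign each group to a core, and give each core a disjoint set of
# DRAM banks proportional to that group's aggregate intensity.
# Thread, mpki, NUM_CORES, and NUM_BANKS are hypothetical, not PUMA's API.
from dataclasses import dataclass

NUM_CORES = 4
NUM_BANKS = 16

@dataclass
class Thread:
    tid: int
    mpki: float  # misses per kilo-instruction, a proxy for memory intensity

def partition_threads(threads):
    """Round-robin threads onto cores in descending MPKI order so that
    memory-intensive threads are spread out rather than co-located."""
    cores = [[] for _ in range(NUM_CORES)]
    for i, t in enumerate(sorted(threads, key=lambda t: t.mpki, reverse=True)):
        cores[i % NUM_CORES].append(t)
    return cores

def allocate_banks(cores):
    """Give each core an exclusive, contiguous range of banks whose size is
    proportional to the core's aggregate MPKI (at least one bank per core)."""
    weights = [sum(t.mpki for t in c) or 1e-6 for c in cores]
    total = sum(weights)
    shares = [max(1, round(w / total * NUM_BANKS)) for w in weights]
    # Trim or pad so the shares sum exactly to NUM_BANKS.
    while sum(shares) > NUM_BANKS:
        shares[shares.index(max(shares))] -= 1
    while sum(shares) < NUM_BANKS:
        shares[shares.index(min(shares))] += 1
    allocation, next_bank = [], 0
    for s in shares:
        allocation.append(list(range(next_bank, next_bank + s)))
        next_bank += s
    return allocation

if __name__ == "__main__":
    mpkis = [25.0, 1.2, 18.5, 0.4, 9.7, 3.3, 0.9, 14.1]
    threads = [Thread(tid=i, mpki=m) for i, m in enumerate(mpkis)]
    cores = partition_threads(threads)
    for core_id, banks in enumerate(allocate_banks(cores)):
        print(f"core {core_id}: threads {[t.tid for t in cores[core_id]]} -> banks {banks}")
```

In this toy policy, giving each core an exclusive bank range means requests from different cores never contend for the same bank's row buffer, which is the kind of interference the paper targets; the actual criteria PUMA uses to characterize threads and size allocations are described in the full article.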

Acknowledgment

This work was supported by the Qing Lan Project, the National Science Foundation of China under grants No. 61402059, No. 61472109, No. 61300033, and No. 61402140, the Zhejiang Provincial Natural Science Foundation under grants No. LQ14F020011 and No. LY13F020045, the Jiangsu Provincial Natural Science Foundation under grant No. SBK201240198, and the Fundamental Research Funds for the Central Universities under grant No. 106112014CDJZR185502.

Author information

Corresponding author

Correspondence to Liang Shi.

About this article

Cite this article

Jia, G., Shi, L., Li, X. et al. PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core. J Sign Process Syst 84, 139–150 (2016). https://doi.org/10.1007/s11265-015-1015-3
