Abstract
In contemporary multi-core systems, memory is shared among many concurrent threads. Memory contention and interference are becoming increasingly severe, causing problems such as performance degradation, unfair resource sharing, and priority inversion. In this paper, we address the challenge of improving performance and fairness for concurrent threads while minimizing energy consumption in main memory. We propose PUMA, a novel solution that reduces memory contention and interference by judiciously partitioning threads among cores and allocating each core exclusive memory banks and bandwidth based on each thread's characteristics. Our results demonstrate that PUMA improves both performance and fairness while significantly reducing energy consumption compared to existing memory management approaches.
Acknowledgment
This work was supported by the Qing Lan Project; the National Science Foundation of China (grants No. 61402059, No. 61472109, No. 61300033, and No. 61402140); the Zhejiang Provincial Natural Science Foundation (No. LQ14F020011 and No. LY13F020045); the Jiangsu Provincial Natural Science Foundation (No. SBK201240198); and the Fundamental Research Funds for the Central Universities (106112014CDJZR185502).
Cite this article
Jia, G., Shi, L., Li, X. et al. PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core. J Sign Process Syst 84, 139–150 (2016). https://doi.org/10.1007/s11265-015-1015-3
DOI: https://doi.org/10.1007/s11265-015-1015-3