research-article

Latency sensitivity-based cache partitioning for heterogeneous multi-core architecture

Authors: Cheng-Hsuan Li, Chia-Lin Yang

DAC '16: Proceedings of the 53rd Annual Design Automation Conference

Article No.: 5, Pages 1-6

https://doi.org/10.1145/2897937.2898036

Published: 05 June 2016

Abstract

Shared last-level cache (LLC) management is a critical design issue for heterogeneous multi-cores. In this paper, we observe two major challenges: (1) the contribution of LLC latency to overall performance varies across applications/cores and over time, and (2) overlooking off-chip latency often degrades overall performance. Hence, we propose a Latency Sensitivity-based Cache Partitioning (LSP) framework, which includes a lightweight runtime mechanism to quantify latency sensitivity and a new cost function to guide LLC partitioning. Results show that LSP improves overall throughput by 8% on average (up to 27%) compared with the state-of-the-art partitioning mechanism, TAP.
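The abstract does not spell out LSP's sensitivity metric or its cost function, so the Python sketch below is purely illustrative and is not the published mechanism. It assumes (1) that latency sensitivity can be expressed as the fraction of each miss's latency that actually stalls a core, estimated here as a least-squares slope of stall cycles against memory latency, and (2) that a partition's cost is sensitivity × misses × off-chip latency, minimized with a simple greedy way allocator (not the lookahead algorithm of UCP [14]). The names `estimate_sensitivity` and `partition_ways` and all numbers are hypothetical.

```python
"""Hypothetical sketch of latency-sensitivity-weighted LLC way partitioning.

Illustrative only: the sensitivity metric, the cost model, and all names
are assumptions, not the LSP mechanism described in the paper.
"""
from typing import Dict, List, Sequence


def estimate_sensitivity(latencies: Sequence[float],
                         stall_cycles: Sequence[float],
                         misses: float) -> float:
    """Assumed metric: least-squares slope of stall cycles vs. memory latency,
    divided by the miss count and clamped to [0, 1]. 1.0 means every cycle of
    miss latency stalls the core; ~0.0 means the latency is hidden."""
    n = len(latencies)
    if n < 2 or misses <= 0:
        raise ValueError("need >= 2 samples and a positive miss count")
    mean_x = sum(latencies) / n
    mean_y = sum(stall_cycles) / n
    sxx = sum((x - mean_x) ** 2 for x in latencies)
    if sxx == 0:
        raise ValueError("latency samples must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(latencies, stall_cycles))
    return max(0.0, min(1.0, (sxy / sxx) / misses))


def partition_ways(miss_curves: Dict[str, List[float]],
                   sensitivity: Dict[str, float],
                   mem_latency: Dict[str, float],
                   total_ways: int) -> Dict[str, int]:
    """Greedy allocation: every core starts with one way, then each remaining
    way goes to the core with the largest reduction in the assumed cost
    (sensitivity * misses * off-chip latency).
    miss_curves[c][w] = misses of core c when it owns w ways, w = 0..total_ways."""
    alloc = {core: 1 for core in miss_curves}

    def gain(core: str) -> float:
        w = alloc[core]
        saved = miss_curves[core][w] - miss_curves[core][w + 1]
        return sensitivity[core] * saved * mem_latency[core]

    for _ in range(total_ways - len(alloc)):
        best = max(alloc, key=gain)  # ties go to the first core listed
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    curves = {"cpu": [1000, 600, 400, 300, 250],
              "gpu": [5000, 4000, 3200, 2600, 2200]}
    latency = {"cpu": 200.0, "gpu": 200.0}
    # Sensitivity ignored (all 1.0): the GPU's larger miss reductions win.
    print(partition_ways(curves, {"cpu": 1.0, "gpu": 1.0}, latency, 4))
    # prints {'cpu': 1, 'gpu': 3}
    # Low GPU sensitivity (latency hidden by many threads) shifts ways to the CPU.
    print(partition_ways(curves, {"cpu": 0.9, "gpu": 0.1}, latency, 4))
    # prints {'cpu': 3, 'gpu': 1}
```

With equal weights, raw miss reduction hands three of the four ways to the GPU; once the GPU's low (assumed) sensitivity is factored in, the same miss curves favor the CPU. The numbers are made up, but the contrast illustrates why a cost function that ignores latency sensitivity can misallocate the LLC.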

References

[1] R. Ausavarungnirun, K. K.-W. Chang, L. Subramanian, G. H. Loh, and O. Mutlu. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems. In ISCA, 2012.

[2] A. Bakhoda, G. Yuan, W. Fung, H. Wong, and T. Aamodt. Analyzing CUDA workloads using a detailed GPU simulator. In ISPASS, 2009.

[3] A. R. Brodtkorb, T. R. Hagen, and M. L. Sætra. Graphics processing unit (GPU) programming strategies and trends in GPU computing. J. Parallel Distrib. Comput., 2013.

[4] S. Che, M. Boyer, J. Meng, D. Tarjan, J. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In IISWC, 2009.

[5] M. Garrido and J. Grajal. Continuous-flow variable-length memoryless linear regression architecture. Electronics Letters, 2013.

[6] L. R. Hsu, S. K. Reinhardt, R. Iyer, and S. Makineni. Communist, utilitarian, and capitalist cache policies on CMPs: Caches as a shared resource. In PACT, 2006.

[7] A. Jaleel, K. B. Theobald, S. C. Steely, Jr., and J. Emer. High performance cache replacement using re-reference interval prediction (RRIP). In ISCA, 2010.

[8] O. Kayiran, N. Nachiappan, A. Jog, R. Ausavarungnirun, M. Kandemir, G. Loh, O. Mutlu, and C. Das. Managing GPU concurrency in heterogeneous architectures. In MICRO, 2014.

[9] J. Lee and H. Kim. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture. In HPCA, 2012.

[10] X. Lin and R. Balasubramonian. Refining the utility metric for utility-based cache partitioning. In WDDD, 2011.

[11] J. Lotze, P. Sutton, and H. Lahlou. Many-core accelerated LIBOR swaption portfolio pricing. In SCC, 2012.

[12] V. Mekkat, A. Holey, P.-C. Yew, and A. Zhai. Managing shared last-level cache in a heterogeneous multicore processor. In PACT, 2013.

[13] A. Patel, F. Afram, S. Chen, and K. Ghose. MARSS: A full system simulator for multicore x86 CPUs. In DAC, 2011.

[14] M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO, 2006.

[15] B. M. Rogers, A. Krishna, G. B. Bell, K. Vu, X. Jiang, and Y. Solihin. Scaling the bandwidth wall: Challenges in and avenues for CMP scaling. SIGARCH Comput. Archit. News, 2009.

[16] P. Rosenfeld, E. Cooper-Balis, and B. Jacob. DRAMSim2: A cycle accurate memory system simulator. Computer Architecture Letters, 2011.

[17] G. Suh, L. Rudolph, and S. Devadas. Dynamic partitioning of shared cache memory. The Journal of Supercomputing, 2004.

[18] P.-H. Wang, G.-H. Liu, J.-C. Yeh, T.-M. Chen, H.-Y. Huang, C.-L. Yang, S.-L. Liu, and J. Greensky. Full system simulation framework for integrated CPU/GPU architecture. In VLSI-DAT, 2014.

[19] P.-H. Wang, C.-W. Lo, C.-L. Yang, and Y.-J. Cheng. A cycle-level SIMT-GPU simulation framework. In ISPASS, 2012.

Cited By

Bagchi A, Joshi D, Panda P (2023). COBRRA: COntention-aware cache Bypass with Request-Response Arbitration. ACM Transactions on Embedded Computing Systems 23:1, 1-30. Online publication date: 17-Nov-2023. https://dl.acm.org/doi/10.1145/3632748

Modi G, Bagchi A, Jindal N, Mandal A, Panda P (2023). CABARRE: Request Response Arbitration for Shared Cache Management. ACM Transactions on Embedded Computing Systems 22:5s, 1-24. Online publication date: 31-Oct-2023. https://dl.acm.org/doi/10.1145/3608096

Du Z, Zhang Q, Lin M, Li S, Li X, Ju L (2023). A Comprehensive Memory Management Framework for CPU-FPGA Heterogenous SoCs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:4, 1058-1071. Online publication date: Apr-2023. https://doi.org/10.1109/TCAD.2022.3179323

Recommendations

Exploring cache bypassing and partitioning for multi-tasking on GPUs
ICCAD '17: Proceedings of the 36th International Conference on Computer-Aided Design

Graphics Processing Units (GPUs) computing has become ubiquitous for embedded system, evidenced by its wide adoption for various general purpose applications. As more and more applications are accelerated by GPUs, multi-tasking scenario starts to ...

Time-sensitivity-aware shared cache architecture for multi-core embedded systems

In embedded systems such as automotive systems, multi-core processors are expected to improve performance and reduce manufacturing cost by integrating multiple functions on a single chip. However, inter-core interference in shared last-level cache ...

Code-based cache partitioning for improving hardware cache performance
ICUIMC '12: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication

Recently, improving hardware cache performance is getting more important, because the performance gap between processor and memory has caused "memory wall" problem. Most cache designs are based on the LRU replacement policy which is effective for high-...


Published In

DAC '16: Proceedings of the 53rd Annual Design Automation Conference
June 2016
1048 pages
ISBN: 9781450342360
DOI: 10.1145/2897937

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article


Conference

DAC '16

DAC '16: The 53rd Annual Design Automation Conference 2016

June 5 - 9, 2016

Austin, Texas

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics

Total Citations: 16
Total Downloads: 374
Downloads (last 12 months): 18
Downloads (last 6 weeks): 1

Reflects downloads up to 08 Mar 2025.

Citations

Masola A, Capodieci N (2023). Optimization strategies for GPUs: an overview of architectural approaches. International Journal of Parallel, Emergent and Distributed Systems 38:2, 140-154. Online publication date: 5-Feb-2023. https://doi.org/10.1080/17445760.2023.2173752

Shahrad M, Elnikety S, Bianchini R (2021). Provisioning Differentiated Last-Level Cache Allocations to VMs in Public Clouds. Proceedings of the ACM Symposium on Cloud Computing, 319-334. Online publication date: 1-Nov-2021. https://dl.acm.org/doi/10.1145/3472883.3487006

Liao J, Chen H, Chen Y (2020). A Cache Contention-aware Run-time Scheduling for Power-constrained Asymmetric Multicore Processors. Proceedings of the International Conference on Research in Adaptive and Convergent Systems, 207-212. Online publication date: 13-Oct-2020. https://dl.acm.org/doi/10.1145/3400286.3418230

Shantharama P, Thyagaturu A, Reisslein M (2020). Hardware-Accelerated Platforms and Infrastructures for Network Functions: A Survey of Enabling Technologies and Research Studies. IEEE Access 8, 132021-132085. Online publication date: 2020. https://doi.org/10.1109/ACCESS.2020.3008250

Tiwari S, Tuli S, Ahmad I, Agarwal A, Panda P, Subramoney S (2019). REAL. ACM Transactions on Embedded Computing Systems 18:6, 1-24. Online publication date: 15-Nov-2019. https://dl.acm.org/doi/10.1145/3362100

Song Y, Alavoine O, Lin B (2019). A Self-aware Resource Management Framework for Heterogeneous Multicore SoCs with Diverse QoS Targets. ACM Transactions on Architecture and Code Optimization 16:2, 1-23. Online publication date: 9-Apr-2019. https://dl.acm.org/doi/10.1145/3319804

Jain P, Gautam D, Surve S (2019). Resource-Based Modeling of Applications on Multi-cores Using Adapted Tilman Model. Computing in Engineering and Technology, 397-409. Online publication date: 17-Oct-2019. https://doi.org/10.1007/978-981-32-9515-5_38
