research-article

SCUD: a fast single-pass L1 cache simulation approach for embedded processors with round-robin replacement policy

Authors:

Mohammad Shihabul Haque,

Jorgen Peddersen,

Andhi Janapsatya,

Sri Parameswaran

DAC '10: Proceedings of the 47th Design Automation Conference

Pages 356 - 361

https://doi.org/10.1145/1837274.1837364

Published: 13 June 2010

Abstract

Embedded systems designers are free to choose the most suitable L1 cache configuration in modern processor-based SoCs. Choosing an appropriate configuration requires simulating long memory access traces to obtain accurate hit/miss rates. The long execution time of these simulations, particularly when each configuration is simulated separately, is a major drawback. Researchers have proposed techniques to speed up the simulation of caches with the LRU replacement policy. These techniques are of little use for the majority of embedded processors, which use caches with the Round-robin policy. In this paper we propose a fast L1 cache simulation approach, called SCUD (Sorted Collection of Unique Data), for caches with the Round-robin policy. SCUD is a single-pass cache simulator that can simulate multiple L1 cache configurations (with varying set sizes and associativities) by reading the application trace once. Utilizing fast binary searches in a novel data structure, SCUD simulates an application trace significantly faster than a widely used single-configuration cache simulator (Dinero IV). We show that SCUD can simulate a set of cache configurations up to 57 times faster than Dinero IV. SCUD shows an average speedup of 19.34 times over Dinero IV for Mediabench applications, and an average speedup of over 10 times for SPEC CPU2000 applications.
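To make the problem concrete, the following is a minimal sketch of a naive round-robin cache simulator that counts misses for several configurations while reading the trace once. It is not the SCUD algorithm and does not use SCUD's data structure, which the abstract does not describe in detail. The names `RoundRobinCache` and `simulate`, and the `(num_sets, assoc, block_size)` tuple layout, are assumptions made for illustration only.

```python
class RoundRobinCache:
    """One set-associative cache configuration with round-robin replacement.

    Hypothetical illustration class; parameters are assumed to be powers of two.
    """

    def __init__(self, num_sets, assoc, block_size):
        self.num_sets = num_sets
        self.assoc = assoc
        self.block_size = block_size
        self.misses = 0
        # tags[s][w] holds the tag stored in way w of set s (None = empty way).
        self.tags = [[None] * assoc for _ in range(num_sets)]
        # Per-set pointer to the way that will be replaced on the next miss.
        self.next_victim = [0] * num_sets

    def access(self, address):
        """Simulate one access; return True on a hit, False on a miss."""
        block = address // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.tags[index]
        if tag in ways:
            # A hit leaves the round-robin pointer untouched.
            return True
        self.misses += 1
        victim = self.next_victim[index]
        ways[victim] = tag
        # Advance the pointer cyclically through the ways of this set.
        self.next_victim[index] = (victim + 1) % self.assoc
        return False


def simulate(trace, configs):
    """Read the trace once and return the miss count of every configuration."""
    caches = [RoundRobinCache(*cfg) for cfg in configs]
    for address in trace:
        for cache in caches:
            cache.access(address)
    return {cfg: cache.misses for cfg, cache in zip(configs, caches)}


if __name__ == "__main__":
    # Tiny made-up trace of byte addresses, not taken from any benchmark.
    trace = [0, 16, 0, 32, 0, 16]
    configs = [(1, 2, 16), (2, 1, 16)]
    for cfg, misses in simulate(trace, configs).items():
        print(cfg, misses)
```

This baseline still performs one full lookup per configuration for every trace entry, so its cost grows linearly with the number of configurations. That per-configuration cost is what a single-pass approach such as SCUD aims to reduce.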


Index Terms

SCUD: a fast single-pass L1 cache simulation approach for embedded processors with round-robin replacement policy
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Hardware
  1. Hardware validation

Published In

DAC '10: Proceedings of the 47th Design Automation Conference

June 2010

1036 pages

ISBN:9781450300025

DOI:10.1145/1837274

General Chair:
Sachin Sapatnekar
Univ. of Minnesota, Minneapolis, MN

Copyright © 2010 ACM.

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DAC '10: The 47th Annual Design Automation Conference 2010

June 13 - 18, 2010

Anaheim, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Cited By

Eusse J, Fernandez F, Leupers R, Ascheid G (2016). Concurrent memory subsystem and application optimization for ASIP design. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), pages 1-10. Online publication date: Jul-2016.
https://doi.org/10.1109/SAMOS.2016.7818325

Nawinne I, Schneider J, Javaid H, Parameswaran S, Fettweis G, Nebel W (2014). Hardware-based fast exploration of cache hierarchies in application specific MPSoCs. Proceedings of the conference on Design, Automation & Test in Europe, pages 1-6. Online publication date: 24-Mar-2014.
https://dl.acm.org/doi/10.5555/2616606.2617020

Schneider J, Peddersen J, Parameswaran S (2014). MASH{fifo}. Proceedings of the 51st Annual Design Automation Conference, pages 1-6. Online publication date: 1-Jun-2014.
https://dl.acm.org/doi/10.1145/2593069.2593159

Ravishankar P, Abdi S (2013). pCache. Proceedings of the 2013 Euromicro Conference on Digital System Design, pages 103-110. Online publication date: 4-Sep-2013.
https://dl.acm.org/doi/10.1109/DSD.2013.19

Haque M, Peddersen J, Parameswaran S, Phillips J, Hu A, Graeb H (2011). CIPARSim. Proceedings of the International Conference on Computer-Aided Design, pages 126-133. Online publication date: 7-Nov-2011.
https://dl.acm.org/doi/10.5555/2132325.2132357

Tawada M, Yanagisawa M, Ohtsuki T, Togawa N (2011). Exact and fast L1 cache configuration simulation for embedded systems with FIFO/PLRU cache replacement policies. Proceedings of 2011 International Symposium on VLSI Design, Automation and Test, pages 1-4. Online publication date: Apr-2011.
https://doi.org/10.1109/VDAT.2011.5783622

Haque M, Peddersen J, Parameswaran S (2011). CIPARSim. Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, pages 126-133. Online publication date: 7-Nov-2011.
https://dl.acm.org/doi/10.1109/ICCAD.2011.6105316
