DOI: 10.1145/1837274.1837364
research-article

SCUD: a fast single-pass L1 cache simulation approach for embedded processors with round-robin replacement policy

Published: 13 June 2010

Abstract

Embedded systems designers are free to choose the most suitable L1 cache configuration in modern processor-based SoCs. Choosing an appropriate configuration requires simulating long memory access traces to obtain accurate hit/miss rates. The long execution time needed to simulate these traces is a major drawback, particularly when each configuration is simulated separately. Researchers have proposed techniques to speed up the simulation of caches with the LRU replacement policy, but these techniques are of little use for the majority of embedded processors, which use caches with a round-robin replacement policy. In this paper we propose SCUD (Sorted Collection of Unique Data), a fast L1 cache simulation approach for caches with the round-robin policy. SCUD is a single-pass cache simulator: it can simulate multiple L1 cache configurations (with varying set sizes and associativities) while reading the application trace only once. Using fast binary searches in a novel data structure, SCUD simulates an application trace significantly faster than Dinero IV, a widely used single-configuration cache simulator. We show that SCUD can simulate a set of cache configurations up to 57 times faster than Dinero IV. SCUD achieves an average speedup of 19.34 times over Dinero IV for Mediabench applications, and an average speedup of over 10 times for SPEC CPU2000 applications.
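To make the problem concrete, the following is a minimal sketch of the conventional per-configuration approach that the abstract contrasts SCUD with. It is not the SCUD algorithm or its data structure, which the abstract does not describe. The function names, the default block size, and the modeling of round-robin as a per-set pointer that advances on every miss are all assumptions made for illustration.

```python
def simulate_round_robin(trace, num_sets, assoc, block_size=32):
    """Count misses for one cache configuration (hypothetical helper).

    Assumption: round-robin replacement is modeled as a per-set pointer
    that selects the way to fill and advances by one on every miss.
    """
    ways = [[None] * assoc for _ in range(num_sets)]  # tags stored per set
    next_victim = [0] * num_sets                      # round-robin pointer per set
    misses = 0
    for addr in trace:
        block = addr // block_size
        set_index = block % num_sets
        tag = block // num_sets
        if tag in ways[set_index]:
            continue  # hit: round-robin state is not updated on hits
        misses += 1
        ways[set_index][next_victim[set_index]] = tag
        next_victim[set_index] = (next_victim[set_index] + 1) % assoc
    return misses


def sweep(trace, configs, block_size=32):
    """Simulate each (num_sets, assoc) pair separately: one trace pass per pair."""
    return {
        (num_sets, assoc): simulate_round_robin(trace, num_sets, assoc, block_size)
        for num_sets, assoc in configs
    }
```

In this sketch `sweep` walks the whole trace once for every configuration, so its cost grows with the number of configurations explored. That repeated trace reading is the overhead a single-pass simulator is meant to avoid.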

      Published In

      DAC '10: Proceedings of the 47th Design Automation Conference
      June 2010
      1036 pages
      ISBN:9781450300025
      DOI:10.1145/1837274
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. L1 cache
      2. cache simulation
      3. miss rate
      4. round robin
      5. simulation

      Conference

      DAC '10
      Acceptance Rates

      Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

      Cited By

      • (2016) Concurrent memory subsystem and application optimization for ASIP design. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), pp. 1-10. DOI: 10.1109/SAMOS.2016.7818325. Online publication date: Jul-2016.
      • (2014) Hardware-based fast exploration of cache hierarchies in application specific MPSoCs. Proceedings of the conference on Design, Automation & Test in Europe, pp. 1-6. DOI: 10.5555/2616606.2617020. Online publication date: 24-Mar-2014.
      • (2014) MASH{fifo}. Proceedings of the 51st Annual Design Automation Conference, pp. 1-6. DOI: 10.1145/2593069.2593159. Online publication date: 1-Jun-2014.
      • (2013) pCache. Proceedings of the 2013 Euromicro Conference on Digital System Design, pp. 103-110. DOI: 10.1109/DSD.2013.19. Online publication date: 4-Sep-2013.
      • (2011) CIPARSim. Proceedings of the International Conference on Computer-Aided Design, pp. 126-133. DOI: 10.5555/2132325.2132357. Online publication date: 7-Nov-2011.
      • (2011) Exact and fast L1 cache configuration simulation for embedded systems with FIFO/PLRU cache replacement policies. Proceedings of 2011 International Symposium on VLSI Design, Automation and Test, pp. 1-4. DOI: 10.1109/VDAT.2011.5783622. Online publication date: Apr-2011.
      • (2011) CIPARSim. Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, pp. 126-133. DOI: 10.1109/ICCAD.2011.6105316. Online publication date: 7-Nov-2011.
