Research article
DOI: 10.1145/3264746.3264766

Hardware-accelerated cache simulation for multicore by FPGA

Published: 09 October 2018

Abstract

Developers often use a virtual platform to develop software before the hardware is available. For software optimization, it is important to profile an application's cache misses in a realistic operating environment on the virtual platform. In the multicore era, however, it is hard to simulate coherence cache misses at high speed. In this paper, we propose a hardware-accelerated architecture that simulates the cache misses of a multicore system. We implement the cache miss simulator with an FPGA on top of a virtual platform, so that users can profile their software as if it were running on the multicore system. The evaluation shows that the throughput reaches 65 MB of trace log per second with the FPGA running at 100 MHz, and that about 570,000 logic elements are occupied to simulate four L1 caches and one L2 cache for a multicore system with four virtual CPUs. The system achieves a speedup of 1.6 to 2 times over the popular cache miss simulator Dinero IV, even though Dinero does less work: it does not support coherence cache misses in a multicore system. The evaluation results show that a hardware-accelerated, FPGA-based architecture offers a clear advantage in speeding up cache miss simulation for multicore systems.
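To make the term "coherence cache miss" concrete, the following is a minimal software sketch of trace-driven simulation with one private L1 cache per virtual CPU. It is not the paper's FPGA design. The write-invalidate policy, the cache geometry (`LINE_SIZE`, `NUM_SETS`, `WAYS`), the `(cpu, op, addr)` trace format, and the names `L1Cache` and `simulate` are all assumptions made for illustration, and the L2 cache is omitted. A miss is counted as a coherence miss when the line was last removed from that cache by another CPU's write.

```python
from collections import Counter, OrderedDict

# Illustrative geometry only; the abstract does not give these values.
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 4     # sets per L1 cache (assumed, kept tiny on purpose)
WAYS = 2         # associativity (assumed)


class L1Cache:
    """One private, set-associative L1 cache with LRU replacement."""

    def __init__(self):
        # Each set maps tag -> True, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]
        # Lines removed by another CPU's write and not yet re-fetched.
        self.invalidated = set()

    def _locate(self, line):
        return self.sets[line % NUM_SETS], line // NUM_SETS

    def access(self, line):
        """Classify an access as 'hit', 'coherence_miss', or 'miss'."""
        cache_set, tag = self._locate(line)
        if tag in cache_set:
            cache_set.move_to_end(tag)  # refresh LRU position
            return "hit"
        kind = "coherence_miss" if line in self.invalidated else "miss"
        self.invalidated.discard(line)
        if len(cache_set) >= WAYS:
            cache_set.popitem(last=False)  # evict the least recently used tag
        cache_set[tag] = True
        return kind

    def invalidate(self, line):
        """Drop the line if present and remember why it is gone."""
        cache_set, tag = self._locate(line)
        if tag in cache_set:
            del cache_set[tag]
            self.invalidated.add(line)


def simulate(trace, num_cpus=4):
    """Replay (cpu, op, addr) records; op is 'R' or 'W' (assumed format)."""
    caches = [L1Cache() for _ in range(num_cpus)]
    counts = [Counter() for _ in range(num_cpus)]
    for cpu, op, addr in trace:
        line = addr // LINE_SIZE
        counts[cpu][caches[cpu].access(line)] += 1
        if op == "W":
            # Write-invalidate: every other CPU loses its copy of the line.
            for other, cache in enumerate(caches):
                if other != cpu:
                    cache.invalidate(line)
    return counts


if __name__ == "__main__":
    trace = [(0, "R", 0x1000), (1, "W", 0x1000), (0, "R", 0x1000)]
    for cpu, counter in enumerate(simulate(trace)):
        print(cpu, dict(counter))
```

In this three-record trace, CPU 0 takes an ordinary miss on its first read, CPU 1's write invalidates CPU 0's copy, and CPU 0's second read is then counted as a coherence miss. A uniprocessor simulator has no second cache to invalidate from, which is why it cannot report this category.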

Cited By

  • (2023) MCSim: A Multi-Core Cache Simulator Accelerated on a Resource-constrained FPGA. Proceedings of the Great Lakes Symposium on VLSI 2023, 155-158. DOI: 10.1145/3583781.3590309. Online publication date: 5-Jun-2023
  • (2021) Cache-accel: FPGA Accelerated Cache Simulator with Partially Reconfigurable Prefetcher. 2021 24th Euromicro Conference on Digital System Design (DSD), 97-100. DOI: 10.1109/DSD53832.2021.00024. Online publication date: Sep-2021

Published In

RACS '18: Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems
October 2018
355 pages
ISBN:9781450358859
DOI:10.1145/3264746
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • KISM: Korean Institute of Smart Media

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. cache simulation
  3. multicore

Qualifiers

  • Research-article

Conference

RACS '18
Acceptance Rates

Overall Acceptance Rate 393 of 1,581 submissions, 25%
