DOI: 10.1145/1815961.1815985

LReplay: a pending period based deterministic replay scheme

Published: 19 June 2010

Abstract

Debugging parallel programs is a well-known difficult problem. A promising way to facilitate it is to use hardware support to achieve deterministic replay. To be feasible for adoption in industrial processors, a hardware-assisted deterministic replay scheme should have a small log size as well as a low design cost. To achieve these goals, we propose a novel and succinct hardware-assisted deterministic replay scheme named LReplay. The key innovation of LReplay is that, instead of recording the logical time orders between instructions or instruction blocks as previous investigations did, LReplay is built upon recording the pending period information [6]. According to the experimental results on Godson-3, the overall log size of LReplay is about 0.55 B/K-Inst (bytes per thousand instructions) for sequential consistency and 0.85 B/K-Inst for Godson-3 consistency. This log size is an order of magnitude smaller than that of state-of-the-art deterministic replay schemes incurring no performance loss. Furthermore, LReplay consumes only about 1.3% of the area of Godson-3, since it requires only trivial modifications to the existing components of Godson-3. These features demonstrate the potential of integrating hardware-assisted deterministic replay into future industrial processors.
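
To give a feel for the log-size figures quoted in the abstract, the following minimal sketch converts a rate in bytes per thousand instructions into a total log volume. The two rates are taken from the abstract; the instruction count, the function name `log_bytes`, and the assumption that the rate applies uniformly to the total instruction count are illustrative only and do not come from the paper.

```python
# Log-size rates quoted in the abstract, in bytes per 1,000 instructions.
RATE_SC = 0.55        # sequential consistency
RATE_GODSON3 = 0.85   # Godson-3 consistency


def log_bytes(instructions: int, rate_b_per_kinst: float) -> float:
    """Estimate the log size in bytes for a given number of executed instructions.

    Assumes (for illustration) that the quoted rate applies uniformly to the
    total instruction count.
    """
    return instructions / 1000 * rate_b_per_kinst


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical instruction count, chosen for illustration
    for name, rate in (("sequential consistency", RATE_SC),
                       ("Godson-3 consistency", RATE_GODSON3)):
        print(f"{name}: {log_bytes(n, rate) / 1e6:.2f} MB")
```

Under these assumptions, a run of one billion instructions would produce a log on the order of half a megabyte to one megabyte.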

References

[1]
M. Abramovici, K. Goossens, B. Vermeulen, J. Greenbaum, N. Stollon, and A. Donlin. "You Can Catch More Bugs with Transaction Level Honey". In Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'08), 2008.
[2]
G. Altekar and I. Stoica. "ODR: Output-Deterministic Replay for Multicore Debugging". In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP'09), 2009.
[3]
Arvind and J. Maessen. "Memory Model = Instruction Reordering + Store Atomicity". In Proceedings of the 33rd International Symposium on Computer Architecture (ISCA'06), 2006.
[4]
D. Bacon and S. Goldstein. "Hardware-assisted Replay of Multiprocessor Programs". In Proceedings of the Workshop on Parallel and Distributed Debugging, 1991.
[5]
Y. Chen, Y. Lv, W. Hu, T. Chen, H. Shen, P. Wang, and H. Pan. "Fast Complete Memory Consistency Verification". In Proceedings of the 15th International Symposium on High-Performance Computer Architecture (HPCA'09), 2009.
[6]
Y. Chen, T. Chen, and W. Hu. "Global Clock, Physical Time Order and Pending Period Analysis in Multiprocessor Systems". CoRR abs/0903.4961, 2009. (http://arxiv.org/pdf/0903.4961)
[7]
CoreSight Program Flow Trace Architecture Specification. http://infocenter.arm.com/help/topic/com.arm.doc.ihi0035a/index.html
[8]
J. Devietti, B. Lucia, L. Ceze, and M. Oskin. "DMP: Deterministic Shared Memory Multiprocessing". In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'09), 2009.
[9]
G. Dunlap, S. King, S. Cinar, M. Basrai, and P. Chen. "ReVirt: Enabling Intrusion Analysis Through Virtual-Machine Logging and Replay". In Proceedings of the 5th USENIX Symposium on Operating System Design and Implementation (OSDI'02), 2002.
[10]
M. Dubois, C. Scheurich, and F. Briggs. "Memory Access Buffering in Multiprocessors". In Proceedings of the 13th International Symposium on Computer Architecture (ISCA'86), 1986.
[11]
T. Foster, D. Lastor, and P. Singh. "First Silicon Functional Validation and Debug of Multicore Microprocessors". IEEE Transactions on VLSI Systems, Vol. 15, No. 5, 2007.
[12]
K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors". In Proceedings of the 17th International Symposium on Computer Architecture (ISCA'90), 1990.
[13]
P. Gibbons and E. Korach. "On Testing Cache-Coherent Shared Memories". In Proceedings of the 6th ACM Symposium on Parallel Algorithms and Architectures (SPAA'94), 1994.
[14]
J. Goodman. "Cache Consistency and Sequential Consistency". Technical Report No. 61, SCI committee, 1989.
[15]
Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, and Z. Zhang. "R2: An Application-Level Kernel for Record and Replay". In Proceedings of the 8th USENIX Symposium on Operating System Design and Implementation (OSDI'08), 2008.
[16]
D. Hower and M. Hill. "Rerun: Exploiting Episodes for Lightweight Memory Race Recording". In Proceedings of the 35th International Symposium on Computer Architecture (ISCA'08), 2008.
[17]
M. Hsieh and C. Huang. "An Embedded Infrastructure of Debug and Trace Interface for the DSP Platform". In Proceedings of the 45th Design Automation Conference (DAC'08), 2008.
[18]
W. Hu, J. Wang, X. Gao, Y. Chen, Q. Liu, and G. Li. "Godson-3: A Scalable Multicore RISC Processor with x86 Emulation". IEEE Micro, Vol. 29, No. 2, 2009.
[19]
W. Huott, M. McManus, D. Knebel, S. Steen, D. Manzer, P. Sanda, S. Wilson, Y. Chan, A. Pelella, and S. Polonsky. "The Attack of the "Holey Shmoos": A Case Study of Advanced DFD and Picosecond Imaging Circuit Analysis (PICA)". In Proceedings of the International Test Conference (ITC'99), 1999.
[20]
D. Josephson. "The Good, the Bad, and the Ugly of Silicon Debug". In Proceedings of the 43rd Design Automation Conference (DAC'06), 2006.
[21]
IEEE Std. 1149.1-1990. IEEE Standard Test Access Port and Boundary-Scan Architecture-Description.
[22]
C. Kao, I. Huang, and C. Lin. "An Embedded Multi-resolution AMBA Trace Analyzer for Microprocessor-based SoC Integration". In Proceedings of the 44th Design Automation Conference (DAC'07), 2007.
[23]
L. Lamport. "Time, Clocks, and the Ordering of Events in a Distributed System". Communications of the ACM, Vol. 21, No. 7, 1978.
[24]
L. Lamport. "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs". IEEE Transactions on Computers, Vol. 28, No. 9, 1979.
[25]
T. LeBlanc and J. Mellor-Crummey. "Debugging Parallel Programs with Instant Replay". IEEE Transactions on Computers, Vol. 36, No. 4, 1987.
[26]
P. Montesinos, L. Ceze, and J. Torrellas. "DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently". In Proceedings of the 35th International Symposium on Computer Architecture (ISCA'08), 2008.
[27]
P. Montesinos, M. Hicks, S. King, and J. Torrellas. "Capo: Abstraction and Software-hardware Interface for Hardware-assisted Deterministic Multiprocessor Replay". In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'09), 2009.
[28]
S. Narayanasamy, G. Pokam, and B. Calder. "BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging". In Proceedings of the 31st International Symposium on Computer Architecture (ISCA'05), 2005.
[29]
S. Narayanasamy, C. Pereira, and B. Calder. "Recording Shared Memory Dependencies Using Strata". In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06), 2006.
[30]
R. Netzer and B. Miller. "On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions". In Proceedings of the International Conference on Parallel Processing (ICPP'90), 1990.
[31]
R. Netzer. "Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs". In Proceedings of the Workshop on Parallel and Distributed Debugging, 1993.
[32]
A. Roy, S. Zeisset, C. Fleckenstein, and J. Huang. "Fast and Generalized Polynomial Time Memory Consistency Verification". In Proceedings of the 18th International Conference on Computer Aided Verification (CAV'06), 2006.
[33]
C. Scheurich and M. Dubois. "Correct Memory Operation of Cache-Based Multiprocessors". In Proceedings of the 14th International Symposium on Computer Architecture (ISCA'87), 1987.
[34]
I. Silas, I. Frumkin, E. Hazan, E. Mor, and G. Zobin. "System-Level Validation of the Intel Pentium M Processor". Intel Technical Journal, Vol. 7, No. 2, 2003.
[35]
S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. "The SPLASH-2 Programs: Characterization and Methodological Considerations". In Proceedings of the 22nd International Symposium on Computer Architecture (ISCA'95), 1995.
[36]
Incisive Xtreme Series Datasheet. http://www.cadence.com/rl/Resources/datasheets/Cadence_6569_DS_R2.pdf.
[37]
M. Xu, R. Bodik, and M. Hill. "A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay". In Proceedings of the 30th International Symposium on Computer Architecture (ISCA'03), 2003.
[38]
M. Xu, M. Hill, and R. Bodik. "A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording". In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06), 2006.
[39]
R. Xue, X. Liu, M. Wu, Z. Guo, W. Chen, W. Zheng, Z. Zhang, and G. Voelker. "MPIWiz: Subgroup Reproducible Replay of MPI Applications". In Proceedings of the 14th Annual Symposium on Principles and Practice of Parallel Programming (PPoPP'09), 2009.
[40]
J. Zhai, W. Chen, and W. Zheng. "Phantom: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node". In Proceedings of the 15th Annual Symposium on Principles and Practice of Parallel Programming (PPoPP'10), 2010.
[41]
M. Zilmer. "Non-intrusive On-chip Debug Hardware Accelerates Development for MIPS RISC Processors". http://cms.mips.com/media/files/white-papers/ejtag_debug_eetimes.pdf

Published In

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
June 2010
520 pages
ISBN: 9781450300537
DOI: 10.1145/1815961
  • ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
    ISCA '10
    June 2010
    508 pages
    ISSN: 0163-5964
    DOI: 10.1145/1816038
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deterministic replay
  2. dfd
  3. global clock
  4. multi-core processor
  5. pending period
  6. physical time order

Qualifiers

  • Research-article

Conference

ISCA '10

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Citations

Cited By

  • (2023) Vidi: Record Replay for Reconfigurable Hardware. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pp. 806-820. DOI: 10.1145/3582016.3582040. Online publication date: 25-Mar-2023
  • (2021) STRAB. Proceedings of the 36th Annual ACM Symposium on Applied Computing, pp. 1532-1541. DOI: 10.1145/3412841.3442028. Online publication date: 22-Mar-2021
  • (2019) Differential Testing of Certificate Validation in SSL/TLS Implementations. ACM Transactions on Software Engineering and Methodology 28:4, pp. 1-37. DOI: 10.1145/3355048. Online publication date: 9-Oct-2019
  • (2019) Precise Learn-to-Rank Fault Localization Using Dynamic and Static Features of Target Programs. ACM Transactions on Software Engineering and Methodology 28:4, pp. 1-34. DOI: 10.1145/3345628. Online publication date: 9-Oct-2019
  • (2019) The Virtual Developer. ACM Transactions on Software Engineering and Methodology 28:4, pp. 1-38. DOI: 10.1145/3340545. Online publication date: 2-Sep-2019
  • (2019) Enabling On-the-Fly Hardware Tracing of Data Reads in Multicores. ACM Transactions on Embedded Computing Systems 18:4, pp. 1-27. DOI: 10.1145/3322642. Online publication date: 10-Jun-2019
  • (2018) GPSonflow. ACM Transactions on Modeling and Performance Evaluation of Computing Systems 3:3, pp. 1-29. DOI: 10.1145/3197656. Online publication date: 13-Jun-2018
  • (2017) Imitation Learning. ACM Computing Surveys 50:2, pp. 1-35. DOI: 10.1145/3054912. Online publication date: 6-Apr-2017
  • (2017) VarCatcher. IEEE Transactions on Parallel and Distributed Systems 28:4, pp. 1215-1228. DOI: 10.1109/TPDS.2016.2613524. Online publication date: 1-Apr-2017
  • (2016) SReplay. Proceedings of the 2016 International Conference on Supercomputing, pp. 1-13. DOI: 10.1145/2925426.2926264. Online publication date: 1-Jun-2016
