RelaxReplay: record and replay for relaxed-consistency multiprocessors

Published: 24 February 2014

Abstract

Record and Deterministic Replay (RnR) of multithreaded programs on relaxed-consistency multiprocessors has been a long-standing problem. While there are designs that work for Total Store Ordering (TSO), finding a general solution that is able to record the access reordering allowed by any relaxed-consistency model has proved challenging. This paper presents the first complete solution for hardware-assisted memory race recording that works for any relaxed-consistency model of current processors. With the scheme, called RelaxReplay, we can build an RnR system for any relaxed-consistency model and coherence protocol. RelaxReplay's core innovation is a new way of capturing memory access reordering. Each memory instruction goes through a post-completion in-order counting step that detects any reordering and efficiently records it. We evaluate RelaxReplay with simulations of an 8-core release-consistent multicore running SPLASH-2 programs. We observe that RelaxReplay induces negligible overhead during recording. In addition, the average size of the log produced is comparable to the log sizes reported for existing solutions, and still very small compared to the memory bandwidth of modern machines. Finally, deterministic replay is efficient and needs minimal hardware support.
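
The abstract describes the counting step only at a high level. The sketch below is a toy software illustration of the general idea, not the paper's hardware design: memory operations are visited in program order after they have completed, and any operation that took effect earlier than a predecessor is flagged and logged. The `MemOp` type, the `performed_at` timestamp, and the log entry names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MemOp:
    seq: int           # position in program order (hypothetical field)
    performed_at: int  # assumed timestamp at which the access took effect


LogEntry = Tuple[str, int]


def count_in_order(ops: List[MemOp]) -> List[LogEntry]:
    """Walk completed operations in program order and log reorderings.

    Consecutive operations that took effect in program order are
    compressed into one ("in_order", count) entry. An operation that
    took effect before an earlier one gets its own ("reordered", seq)
    entry.
    """
    log: List[LogEntry] = []
    run = 0       # length of the current in-order run
    latest = -1   # latest performed_at among in-order operations so far
    for op in sorted(ops, key=lambda o: o.seq):
        if op.performed_at < latest:
            # This access overtook a predecessor: close the run, log it.
            if run:
                log.append(("in_order", run))
                run = 0
            log.append(("reordered", op.seq))
        else:
            run += 1
            latest = op.performed_at
    if run:
        log.append(("in_order", run))
    return log


# Operation 2 took effect (t=20) before operation 1 (t=30).
example = [MemOp(0, 10), MemOp(1, 30), MemOp(2, 20), MemOp(3, 40)]
example_log = count_in_order(example)
```

For the example above, `example_log` is `[("in_order", 2), ("reordered", 2), ("in_order", 1)]`. Compressing in-order runs into a single count is one way such a log could stay small when reordering is rare; the actual log format used by RelaxReplay is not given in the abstract.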

Information & Contributors

Information

Published In

ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
February 2014
780 pages
ISBN:9781450323055
DOI:10.1145/2541940
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory race recording
  2. record and deterministic replay
  3. relaxed consistency

Qualifiers

  • Research-article

Conference

ASPLOS '14

Acceptance Rates

ASPLOS '14 Paper Acceptance Rate 49 of 217 submissions, 23%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 12
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 14 Feb 2025

Citations

Cited By

  • (2022) Understanding and Reaching the Performance Limit of Schedule Tuning on Stable Synchronization Determinism. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 223-238. DOI: 10.1145/3559009.3569669. Online publication date: 8-Oct-2022
  • (2022) ClusterRR: a record and replay framework for virtual machine cluster. Proceedings of the 18th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 31-44. DOI: 10.1145/3516807.3516819. Online publication date: 25-Feb-2022
  • (2019) Sparse record and replay with controlled scheduling. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 576-593. DOI: 10.1145/3314221.3314635. Online publication date: 8-Jun-2019
  • (2018) Leveraging Hardware-Assisted Virtualization for Deterministic Replay on Commodity Multi-Core Processors. IEEE Transactions on Computers, 67:1, pp. 45-58. DOI: 10.1109/TC.2017.2727492. Online publication date: 1-Jan-2018
  • (2018) Record-Replay Architecture as a General Security Framework. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 180-193. DOI: 10.1109/HPCA.2018.00025. Online publication date: Feb-2018
  • (2018) A Lightweight and Flexible Tool for Distinguishing Between Hardware Malfunctions and Program Bugs in Debugging Large-Scale Programs. IEEE Access, 6, pp. 71892-71905. DOI: 10.1109/ACCESS.2018.2882394. Online publication date: 2018
  • (2016) Abstractions for Practical Virtual Machine Replay. ACM SIGPLAN Notices, 51:7, pp. 93-106. DOI: 10.1145/3007611.2892257. Online publication date: 25-Mar-2016
  • (2016) SReplay. Proceedings of the 2016 International Conference on Supercomputing, pp. 1-13. DOI: 10.1145/2925426.2926264. Online publication date: 1-Jun-2016
  • (2016) Abstractions for Practical Virtual Machine Replay. Proceedings of the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 93-106. DOI: 10.1145/2892242.2892257. Online publication date: 25-Mar-2016
  • (2015) Samsara. Proceedings of the 6th Asia-Pacific Workshop on Systems, pp. 1-7. DOI: 10.1145/2797022.2797028. Online publication date: 27-Jul-2015
