DOI: 10.1145/1815961.1815985
research-article

LReplay: a pending period based deterministic replay scheme

Published: 19 June 2010

ABSTRACT

Debugging parallel programs is well known to be difficult. A promising way to facilitate it is to use hardware support to achieve deterministic replay. To be feasible for adoption in industrial processors, a hardware-assisted deterministic replay scheme should have a small log size as well as a low design cost. To achieve these goals, we propose a novel and succinct hardware-assisted deterministic replay scheme named LReplay. The key innovation of LReplay is that, instead of recording the logical time order between instructions or instruction blocks as previous investigations do, it is built upon recording pending period information [6]. According to experimental results on Godson-3, the overall log size of LReplay is about 0.55 B/K-Inst (bytes per thousand instructions) for sequential consistency and 0.85 B/K-Inst for Godson-3 consistency. This log size is an order of magnitude smaller than that of state-of-the-art deterministic replay schemes, and LReplay incurs no performance loss. Furthermore, LReplay consumes only about 1.3% of the area of Godson-3, since it requires only trivial modifications to existing components of Godson-3. These features demonstrate the potential of integrating hardware-assisted deterministic replay into future industrial processors.
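To make the reported rates concrete, the following Python sketch converts a log rate in B/K-Inst into an estimated total log size. Only the two rates (0.55 and 0.85 B/K-Inst) come from the abstract; the helper `log_size_bytes`, the one-billion-instruction run, and the assumption that the rate applies to the total instruction count across all cores are illustrative and are not taken from the paper.

```python
# Rates reported in the LReplay abstract, in bytes of log per 1,000 instructions (B/K-Inst).
LOG_RATE_B_PER_KINST = {
    "sequential consistency": 0.55,
    "Godson-3 consistency": 0.85,
}


def log_size_bytes(instructions: int, rate_b_per_kinst: float) -> float:
    """Hypothetical helper: estimate the log size in bytes for a run of
    `instructions` instructions at a rate of `rate_b_per_kinst` B/K-Inst.

    Assumption: the rate applies to the total instruction count of the run.
    """
    if instructions < 0:
        raise ValueError("instruction count must be non-negative")
    # Number of thousand-instruction units times bytes per unit.
    return instructions / 1000 * rate_b_per_kinst


if __name__ == "__main__":
    run_length = 1_000_000_000  # hypothetical: one billion instructions in total
    for model, rate in LOG_RATE_B_PER_KINST.items():
        size = log_size_bytes(run_length, rate)
        print(f"{model}: {size / 1e6:.2f} MB")  # 1 MB taken as 10^6 bytes
```

Under these assumptions, a run of one billion instructions yields roughly 0.55 MB of log for sequential consistency and 0.85 MB for Godson-3 consistency.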

References

  1. M. Abramovici, K. Goossens, B. Vermeulen, J. Greenbaum, N. Stollon, and A. Donlin. "You Can Catch More Bugs with Transaction Level Honey". In Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'08), 2008.
  2. G. Altekar and I. Stoica. "ODR: Output-Deterministic Replay for Multicore Debugging". In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP'09), 2009.
  3. Arvind and J. Maessen. "Memory Model = Instruction Reordering + Store Atomicity". In Proceedings of the 33rd International Symposium on Computer Architecture (ISCA'06), 2006.
  4. D. Bacon and S. Goldstein. "Hardware-assisted Replay of Multiprocessor Programs". In Proceedings of the Workshop on Parallel and Distributed Debugging, 1991.
  5. Y. Chen, Y. Lv, W. Hu, T. Chen, H. Shen, P. Wang, and H. Pan. "Fast Complete Memory Consistency Verification". In Proceedings of the 15th International Symposium on High-Performance Computer Architecture (HPCA'09), 2009.
  6. Y. Chen, T. Chen, and W. Hu. "Global Clock, Physical Time Order and Pending Period Analysis in Multiprocessor Systems". CoRR abs/0903.4961, 2009. (http://arxiv.org/pdf/0903.4961)
  7. CoreSight Program Flow Trace Architecture Specification. http://infocenter.arm.com/help/topic/com.arm.doc.ihi0035a/index.html
  8. J. Devietti, B. Lucia, L. Ceze, and M. Oskin. "DMP: Deterministic Shared Memory Multiprocessing". In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'09), 2009.
  9. G. Dunlap, S. King, S. Cinar, M. Basrai, and P. Chen. "ReVirt: Enabling Intrusion Analysis Through Virtual-Machine Logging and Replay". In Proceedings of the 5th USENIX Symposium on Operating Systems Design and Implementation (OSDI'02), 2002.
  10. M. Dubois, C. Scheurich, and F. Briggs. "Memory Access Buffering in Multiprocessors". In Proceedings of the 13th International Symposium on Computer Architecture (ISCA'86), 1986.
  11. T. Foster, D. Lastor, and P. Singh. "First Silicon Functional Validation and Debug of Multicore Microprocessors". IEEE Transactions on VLSI Systems, Vol. 15, No. 5, 2007.
  12. K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors". In Proceedings of the 17th International Symposium on Computer Architecture (ISCA'90), 1990.
  13. P. Gibbons and E. Korach. "On Testing Cache-Coherent Shared Memories". In Proceedings of the 6th ACM Symposium on Parallel Algorithms and Architectures (SPAA'94), 1994.
  14. J. Goodman. "Cache Consistency and Sequential Consistency". Technical Report No. 61, SCI Committee, 1989.
  15. Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, and Z. Zhang. "R2: An Application-Level Kernel for Record and Replay". In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI'08), 2008.
  16. D. Hower and M. Hill. "Rerun: Exploiting Episodes for Lightweight Memory Race Recording". In Proceedings of the 35th International Symposium on Computer Architecture (ISCA'08), 2008.
  17. M. Hsieh and C. Huang. "An Embedded Infrastructure of Debug and Trace Interface for the DSP Platform". In Proceedings of the 45th Design Automation Conference (DAC'08), 2008.
  18. W. Hu, J. Wang, X. Gao, Y. Chen, Q. Liu, and G. Li. "Godson-3: A Scalable Multicore RISC Processor with x86 Emulation". IEEE Micro, Vol. 29, No. 2, 2009.
  19. W. Huott, M. McManus, D. Knebel, S. Steen, D. Manzer, P. Sanda, S. Wilson, Y. Chan, A. Pelella, and S. Polonsky. "The Attack of the 'Holey Shmoos': A Case Study of Advanced DFD and Picosecond Imaging Circuit Analysis (PICA)". In Proceedings of the International Test Conference (ITC'99), 1999.
  20. D. Josephson. "The Good, the Bad, and the Ugly of Silicon Debug". In Proceedings of the 43rd Design Automation Conference (DAC'06), 2006.
  21. IEEE Std. 1149.1-1990. IEEE Standard Test Access Port and Boundary-Scan Architecture-Description.
  22. C. Kao, I. Huang, and C. Lin. "An Embedded Multi-resolution AMBA Trace Analyzer for Microprocessor-based SoC Integration". In Proceedings of the 44th Design Automation Conference (DAC'07), 2007.
  23. L. Lamport. "Time, Clocks, and the Ordering of Events in a Distributed System". Communications of the ACM, Vol. 21, No. 7, 1978.
  24. L. Lamport. "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs". IEEE Transactions on Computers, Vol. 28, No. 9, 1979.
  25. T. LeBlanc and J. Mellor-Crummey. "Debugging Parallel Programs with Instant Replay". IEEE Transactions on Computers, Vol. 36, No. 4, 1987.
  26. P. Montesinos, L. Ceze, and J. Torrellas. "DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently". In Proceedings of the 35th International Symposium on Computer Architecture (ISCA'08), 2008.
  27. P. Montesinos, M. Hicks, S. King, and J. Torrellas. "Capo: Abstraction and Software-hardware Interface for Hardware-assisted Deterministic Multiprocessor Replay". In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'09), 2009.
  28. S. Narayanasamy, G. Pokam, and B. Calder. "BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging". In Proceedings of the 32nd International Symposium on Computer Architecture (ISCA'05), 2005.
  29. S. Narayanasamy, C. Pereira, and B. Calder. "Recording Shared Memory Dependencies Using Strata". In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06), 2006.
  30. R. Netzer and B. Miller. "On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions". In Proceedings of the International Conference on Parallel Processing (ICPP'90), 1990.
  31. R. Netzer. "Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs". In Proceedings of the Workshop on Parallel and Distributed Debugging, 1993.
  32. A. Roy, S. Zeisset, C. Fleckenstein, and J. Huang. "Fast and Generalized Polynomial Time Memory Consistency Verification". In Proceedings of the 18th International Conference on Computer Aided Verification (CAV'06), 2006.
  33. C. Scheurich and M. Dubois. "Correct Memory Operation of Cache-Based Multiprocessors". In Proceedings of the 14th International Symposium on Computer Architecture (ISCA'87), 1987.
  34. I. Silas, I. Frumkin, E. Hazan, E. Mor, and G. Zobin. "System-Level Validation of the Intel Pentium M Processor". Intel Technology Journal, Vol. 7, No. 2, 2003.
  35. S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. "The SPLASH-2 Programs: Characterization and Methodological Considerations". In Proceedings of the 22nd International Symposium on Computer Architecture (ISCA'95), 1995.
  36. Incisive Xtreme Series Datasheet. http://www.cadence.com/rl/Resources/datasheets/Cadence_6569_DS_R2.pdf.
  37. M. Xu, R. Bodik, and M. Hill. "A 'Flight Data Recorder' for Enabling Full-System Multiprocessor Deterministic Replay". In Proceedings of the 30th International Symposium on Computer Architecture (ISCA'03), 2003.
  38. M. Xu, M. Hill, and R. Bodik. "A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording". In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06), 2006.
  39. R. Xue, X. Liu, M. Wu, Z. Guo, W. Chen, W. Zheng, Z. Zhang, and G. Voelker. "MPIWiz: Subgroup Reproducible Replay of MPI Applications". In Proceedings of the 14th Annual Symposium on Principles and Practice of Parallel Programming (PPoPP'09), 2009.
  40. J. Zhai, W. Chen, and W. Zheng. "Phantom: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node". In Proceedings of the 15th Annual Symposium on Principles and Practice of Parallel Programming (PPoPP'10), 2010.
  41. M. Zilmer. "Non-intrusive On-chip Debug Hardware Accelerates Development for MIPS RISC Processors". http://cms.mips.com/media/files/white-papers/ejtag_debug_eetimes.pdf.

        Published in

        ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
        June 2010, 520 pages
        ISBN: 9781450300537
        DOI: 10.1145/1815961

        Also published in ACM SIGARCH Computer Architecture News, Volume 38, Issue 3 (ISCA '10)
        June 2010, 508 pages
        ISSN: 0163-5964
        DOI: 10.1145/1816038

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall acceptance rate: 543 of 3,203 submissions, 17%
