ABSTRACT
Debugging parallel programs is a well-known difficult problem. A promising way to facilitate it is to use hardware support to achieve deterministic replay. To be feasible for adoption in industrial processors, a hardware-assisted deterministic replay scheme should have a small log size as well as a low design cost. To achieve these goals, we propose a novel and succinct hardware-assisted deterministic replay scheme named LReplay. The key innovation of LReplay is that, instead of recording the logical time orders between instructions or instruction blocks as previous investigations did, it records pending period information [6]. According to experimental results on Godson-3, the overall log size of LReplay is about 0.55 B/K-Inst (bytes per thousand instructions) for sequential consistency and 0.85 B/K-Inst for Godson-3 consistency. This log size is an order of magnitude smaller than that of state-of-the-art deterministic replay schemes incurring no performance loss. Furthermore, LReplay consumes only about 1.3% of the area of Godson-3, since it requires only trivial modifications to the existing components of Godson-3. These features demonstrate the potential of integrating hardware-assisted deterministic replay into future industrial processors.
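To give a feel for the reported log rates, the following minimal sketch converts a B/K-Inst figure into a total log size. The rates are the ones quoted in the abstract; the function name and the run length of one billion instructions are hypothetical and chosen only for illustration.

```python
def log_size_bytes(instructions: int, bytes_per_k_inst: float) -> float:
    """Total log size in bytes for a run of `instructions` instructions.

    `bytes_per_k_inst` is the log rate in bytes per thousand instructions.
    """
    return instructions / 1000 * bytes_per_k_inst


# Log rates quoted in the abstract (bytes per thousand instructions).
RATES = {
    "sequential consistency": 0.55,
    "Godson-3 consistency": 0.85,
}

# Hypothetical run length, not taken from the paper.
N_INSTRUCTIONS = 1_000_000_000

for model, rate in RATES.items():
    size_kb = log_size_bytes(N_INSTRUCTIONS, rate) / 1e3
    print(f"{model}: about {size_kb:.0f} KB of log")
```

Under these assumptions, a billion-instruction run would produce a log of roughly 550 KB under sequential consistency and roughly 850 KB under Godson-3 consistency.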
- M. Abramovici, K. Goossens, B. Vermeulen, J. Greenbaum, N. Stollon, and A. Donlin. "You Can Catch More Bugs with Transaction Level Honey". In Proceedings of the 6th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'08), 2008.
- G. Altekar and I. Stoica. "ODR: Output-Deterministic Replay for Multicore Debugging". In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP'09), 2009.
- Arvind and J. Maessen. "Memory Model = Instruction Reordering + Store Atomicity". In Proceedings of the 33rd International Symposium on Computer Architecture (ISCA'06), 2006.
- D. Bacon and S. Goldstein. "Hardware-assisted Replay of Multiprocessor Programs". In Proceedings of the Workshop on Parallel and Distributed Debugging, 1991.
- Y. Chen, Y. Lv, W. Hu, T. Chen, H. Shen, P. Wang, and H. Pan. "Fast Complete Memory Consistency Verification". In Proceedings of the 15th International Symposium on High-Performance Computer Architecture (HPCA'09), 2009.
- Y. Chen, T. Chen, and W. Hu. "Global Clock, Physical Time Order and Pending Period Analysis in Multiprocessor Systems". CoRR abs/0903.4961, 2009. (http://arxiv.org/pdf/0903.4961)
- CoreSight Program Flow Trace Architecture Specification. http://infocenter.arm.com/help/topic/com.arm.doc.ihi0035a/index.html
- J. Devietti, B. Lucia, L. Ceze, and M. Oskin. "DMP: Deterministic Shared Memory Multiprocessing". In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'09), 2009.
- G. Dunlap, S. King, S. Cinar, M. Basrai, and P. Chen. "ReVirt: Enabling Intrusion Analysis Through Virtual-Machine Logging and Replay". In Proceedings of the 5th USENIX Symposium on Operating System Design and Implementation (OSDI'02), 2002.
- M. Dubois, C. Scheurich, and F. Briggs. "Memory Access Buffering in Multiprocessors". In Proceedings of the 13th International Symposium on Computer Architecture (ISCA'86), 1986.
- T. Foster, D. Lastor, and P. Singh. "First Silicon Functional Validation and Debug of Multicore Microprocessors". IEEE Transactions on VLSI Systems, Vol. 15, No. 5, 2007.
- K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors". In Proceedings of the 17th International Symposium on Computer Architecture (ISCA'90), 1990.
- P. Gibbons and E. Korach. "On Testing Cache-Coherent Shared Memories". In Proceedings of the 6th ACM Symposium on Parallel Algorithms and Architectures (SPAA'94), 1994.
- J. Goodman. "Cache Consistency And Sequential Consistency". Technical Report No. 61, SCI committee, 1989.
- Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, and Z. Zhang. "R2: An Application-Level Kernel for Record and Replay". In Proceedings of the 8th USENIX Symposium on Operating System Design and Implementation (OSDI'08), 2008.
- D. Hower and M. Hill. "Rerun: Exploiting Episodes for Lightweight Memory Race Recording". In Proceedings of the 35th International Symposium on Computer Architecture (ISCA'08), 2008.
- M. Hsieh and C. Huang. "An Embedded Infrastructure of Debug and Trace Interface for the DSP Platform". In Proceedings of the 45th Design Automation Conference (DAC'08), 2008.
- W. Hu, J. Wang, X. Gao, Y. Chen, Q. Liu, and G. Li. "Godson-3: A Scalable Multicore RISC Processor with x86 Emulation". IEEE Micro, Vol. 29, No. 2, 2009.
- W. Huott, M. McManus, D. Knebel, S. Steen, D. Manzer, P. Sanda, S. Wilson, Y. Chan, A. Pelella, and S. Polonsky. "The Attack of the "Holey Shmoos": A Case Study of Advanced DFD and Picosecond Imaging Circuit Analysis (PICA)". In Proceedings of the International Test Conference (ITC'99), 1999.
- D. Josephson. "The Good, the Bad, and the Ugly of Silicon Debug". In Proceedings of the 43rd Design Automation Conference (DAC'06), 2006.
- IEEE Std. 1149.1-1990. IEEE Standard Test Access Port and Boundary-Scan Architecture-Description.
- C. Kao, I. Huang, and C. Lin. "An Embedded Multi-resolution AMBA Trace Analyzer for Microprocessor-based SoC Integration". In Proceedings of the 44th Design Automation Conference (DAC'07), 2007.
- L. Lamport. "Time, Clocks, and the Ordering of Events in a Distributed System". Communications of the ACM, Vol. 21, No. 7, 1978.
- L. Lamport. "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs". IEEE Transactions on Computers, Vol. 28, No. 9, 1979.
- T. LeBlanc and J. Mellor-Crummey. "Debugging Parallel Programs with Instant Replay". IEEE Transactions on Computers, Vol. 36, No. 4, 1987.
- P. Montesinos, L. Ceze, and J. Torrellas. "DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently". In Proceedings of the 35th International Symposium on Computer Architecture (ISCA'08), 2008.
- P. Montesinos, M. Hicks, S. King, and J. Torrellas. "Capo: Abstraction and Software-hardware Interface for Hardware-assisted Deterministic Multiprocessor Replay". In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'09), 2009.
- S. Narayanasamy, G. Pokam, and B. Calder. "BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging". In Proceedings of the 31st International Symposium on Computer Architecture (ISCA'05), 2005.
- S. Narayanasamy, C. Pereira, and B. Calder. "Recording Shared Memory Dependencies Using Strata". In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06), 2006.
- R. Netzer and B. Miller. "On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions". In Proceedings of the International Conference on Parallel Processing (ICPP'90), 1990.
- R. Netzer. "Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs". In Proceedings of the Workshop on Parallel and Distributed Debugging, 1993.
- A. Roy, S. Zeisset, C. Fleckenstein, and J. Huang. "Fast and Generalized Polynomial Time Memory Consistency Verification". In Proceedings of the 18th International Conference on Computer Aided Verification (CAV'06), 2006.
- C. Scheurich and M. Dubois. "Correct Memory Operation of Cache-Based Multiprocessors". In Proceedings of the 14th International Symposium on Computer Architecture (ISCA'87), 1987.
- I. Silas, I. Frumkin, E. Hazan, E. Mor, and G. Zobin. "System-Level Validation of the Intel Pentium M Processor". Intel Technical Journal, Vol. 7, No. 2, 2003.
- S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. "The SPLASH-2 Programs: Characterization and Methodological Considerations". In Proceedings of the 22nd International Symposium on Computer Architecture (ISCA'95), 1995.
- Incisive Xtreme Series Datasheet. http://www.cadence.com/rl/Resources/datasheets/Cadence_6569_DS_R2.pdf.
- M. Xu, R. Bodik, and M. Hill. "A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay". In Proceedings of the 30th International Symposium on Computer Architecture (ISCA'03), 2003.
- M. Xu, M. Hill, and R. Bodik. "A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording". In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'06), 2006.
- R. Xue, X. Liu, M. Wu, Z. Guo, W. Chen, W. Zheng, Z. Zhang, and G. Voelker. "MPIWiz: Subgroup Reproducible Replay of MPI Applications". In Proceedings of the 14th Annual Symposium on Principles and Practice of Parallel Programming (PPoPP'09), 2009.
- J. Zhai, W. Chen, and W. Zheng. "Phantom: Predicting Performance of Parallel Applications on Large-Scale Parallel Machines Using a Single Node". In Proceedings of the 15th Annual Symposium on Principles and Practice of Parallel Programming (PPoPP'10), 2010.
- M. Zilmer. "Non-intrusive On-chip Debug Hardware Accelerates Development for MIPS RISC Processors". http://cms.mips.com/media/files/white-papers/ejtag_debug_eetimes.pdf
Index Terms
- LReplay: a pending period based deterministic replay scheme
ISCA '10