DOI: 10.1145/3373376.3378452

Cross-Failure Bug Detection in Persistent Memory Programs

Published: 13 March 2020

ABSTRACT

Persistent memory (PM) technologies, such as Intel's Optane memory, deliver high performance, byte-addressability, and persistence, allowing programs to directly manipulate persistent data in memory without any OS intermediaries. An important requirement of these programs is that persistent data must remain consistent across a failure, which we refer to as the crash consistency guarantee. However, maintaining crash consistency is not trivial. We identify that a consistent recovery critically depends not only on the execution before the failure, but also on the recovery and resumption after the failure. We refer to these stages as the pre- and post-failure execution stages. In order to holistically detect crash consistency bugs, we categorize the underlying causes of inconsistent recovery that arise from incorrect interactions between the pre- and post-failure execution. First, a program is not crash-consistent if the post-failure stage reads from locations that are not guaranteed to be persisted in all possible access interleavings during the pre-failure stage. This programming error leads to a race that we refer to as a cross-failure race. Second, a program is not crash-consistent if the post-failure stage reads persistent data that has been left semantically inconsistent during the pre-failure stage, such as a stale log or uncommitted data. We refer to this type of bug as a cross-failure semantic bug. Together, they form the cross-failure bugs in PM programs. In this work, we provide XFDetector, a tool that detects cross-failure bugs by automatically injecting failures into the pre-failure execution and checking for cross-failure races and semantic bugs in the post-failure continuation. XFDetector has detected four new bugs in three pieces of PM software: one of PMDK's examples, a PM-optimized Redis database, and a PMDK library function.
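
The cross-failure race described in the abstract can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not XFDetector's implementation: the class `ToyPM`, the function `find_cross_failure_races`, the operation tuples, and the `commit_vars` exemption (reads of a flag that recovery is designed to check) are all hypothetical names and simplifications introduced here. The sketch injects a failure after every pre-failure operation and flags any post-failure read of a location whose latest write was not guaranteed to be persisted at that point.

```python
class ToyPM:
    """Hypothetical toy PM: a volatile cache in front of a persisted image."""

    def __init__(self):
        self.cache = {}      # written, but not yet guaranteed persistent
        self.persisted = {}  # contents guaranteed to survive a failure

    def store(self, addr, value):
        self.cache[addr] = value

    def flush(self, addr):
        # Stands in for a cache-line write-back followed by a fence.
        if addr in self.cache:
            self.persisted[addr] = self.cache.pop(addr)

    def crash(self):
        # Returns the surviving image and the addresses whose last write may be lost.
        return dict(self.persisted), set(self.cache)


def find_cross_failure_races(pre_ops, post_reads, commit_vars=frozenset()):
    """Inject a failure after each pre-failure op; check what recovery reads."""
    races = []
    for failure_point in range(1, len(pre_ops) + 1):
        pm = ToyPM()
        for op, addr, *value in pre_ops[:failure_point]:
            if op == "store":
                pm.store(addr, value[0])
            else:  # "flush"
                pm.flush(addr)
        image, unpersisted = pm.crash()
        for addr in post_reads(image):
            # Assumption of this sketch: reads of commit variables are intentional.
            if addr in unpersisted and addr not in commit_vars:
                races.append((failure_point, addr))
    return races


def recovery_reads(image):
    """Post-failure stage: read the flag, and read the data only if the flag is set."""
    reads = ["valid"]
    if image.get("valid") == 1:
        reads.append("data")
    return reads


# Buggy ordering: the flag is persisted before the data it guards.
buggy = [("store", "data", 42), ("store", "valid", 1),
         ("flush", "valid"), ("flush", "data")]
# Fixed ordering: persist the data first, then set and persist the flag.
fixed = [("store", "data", 42), ("flush", "data"),
         ("store", "valid", 1), ("flush", "valid")]

buggy_races = find_cross_failure_races(buggy, recovery_reads, {"valid"})
fixed_races = find_cross_failure_races(fixed, recovery_reads, {"valid"})
```

In this toy run, `buggy_races` is `[(3, "data")]`: a failure injected after the third operation leaves `valid` persisted while `data` is not, so recovery reads a location that was never guaranteed to reach PM. `fixed_races` is empty. The sketch covers only the race category; detecting semantic bugs such as a stale log would require knowledge of the program's commit protocol.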


      • Published in
        ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
        March 2020
        1412 pages
        ISBN:9781450371025
        DOI:10.1145/3373376

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate: 535 of 2,713 submissions, 20%
