ABSTRACT
Persistent memory (PM) technologies, such as Intel's Optane memory, deliver high performance, byte-addressability, and persistence, allowing programs to manipulate persistent data directly in memory without any OS intermediary. These programs must keep persistent data consistent across a failure, a requirement we refer to as the crash consistency guarantee. Maintaining crash consistency is not trivial, however. We observe that a consistent recovery depends not only on the execution before the failure, but also on the recovery and resumption after the failure. We refer to these as the pre-failure and post-failure execution stages. To detect crash consistency bugs holistically, we categorize the underlying causes of inconsistent recovery that arise from incorrect interactions between the pre- and post-failure execution. First, a program is not crash-consistent if the post-failure stage reads from locations that are not guaranteed to be persisted in all possible access interleavings of the pre-failure stage. This programming error leads to a race that we refer to as a cross-failure race. Second, a program is not crash-consistent if the post-failure stage reads persistent data that the pre-failure stage has left semantically inconsistent, such as a stale log or uncommitted data. We refer to this type of bug as a cross-failure semantic bug. Together, these two classes make up the cross-failure bugs in PM programs. In this work, we present XFDetector, a tool that detects cross-failure bugs by automatically injecting failures into the pre-failure execution and checking for cross-failure races and semantic bugs in the post-failure continuation. XFDetector has detected four new bugs in three pieces of PM software: one of PMDK's examples, a PM-optimized Redis database, and a PMDK library function.
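To make the notion of a cross-failure race concrete, the following is a minimal sketch in Python of failure injection over a toy persistence model. It is not XFDetector's implementation: the operation tuples, the `find_cross_failure_races` function, the `recover` callback, and the `benign` set (addresses such as a commit flag whose reads are tolerated in either state) are all hypothetical names and simplifying assumptions introduced for illustration. The model treats a write as guaranteed persistent only after it has been flushed and a subsequent fence has executed.

```python
def find_cross_failure_races(pre_ops, recover, benign=frozenset()):
    """Inject a failure after every pre-failure operation and run `recover`.

    Returns a list of (failure_point, address) pairs where the post-failure
    code read a location that was written but not guaranteed persisted.
    """
    values = {}       # latest value written to each address
    dirty = set()     # written, not yet guaranteed persistent
    flushed = set()   # flushed, waiting for a fence
    races = []

    for point, op in enumerate(pre_ops):
        kind = op[0]
        if kind == "write":
            _, addr, value = op
            values[addr] = value
            dirty.add(addr)
            flushed.discard(addr)   # a new write invalidates an earlier flush
        elif kind == "flush":
            if op[1] in dirty:
                flushed.add(op[1])
        elif kind == "fence":
            dirty -= flushed        # flushed writes are now guaranteed persistent
            flushed.clear()
        else:
            raise ValueError(f"unknown operation: {kind}")

        # Simulated failure right after this operation: run the recovery code
        # and watch which locations it reads.
        def read(addr, point=point):
            if addr in dirty and addr not in benign:
                races.append((point, addr))
            # Assumption: an unpersisted write may have reached PM, so the
            # sketch returns the latest written value as one possible outcome.
            return values.get(addr, 0)

        recover(read)

    return races


def recover(read):
    """Hypothetical post-failure code: trust `data` only if `valid` is set."""
    if read("valid") == 1:
        return read("data")
    return None


# Buggy ordering: `valid` is persisted before `data` is.
BUGGY = [("write", "data", 42), ("write", "valid", 1),
         ("flush", "valid"), ("fence",),
         ("flush", "data"), ("fence",)]

# Fixed ordering: `data` is persisted before `valid` is written.
FIXED = [("write", "data", 42), ("flush", "data"), ("fence",),
         ("write", "valid", 1), ("flush", "valid"), ("fence",)]

print(find_cross_failure_races(BUGGY, recover, benign={"valid"}))
print(find_cross_failure_races(FIXED, recover, benign={"valid"}))
```

Under these assumptions, the buggy ordering reports a race on `data` at every failure point between the write to `valid` and the final fence, while the fixed ordering reports none. The sketch covers only races; detecting semantic bugs such as a stale log would require knowledge of the program's commit protocol, which this toy model does not capture.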
Index Terms
- Cross-Failure Bug Detection in Persistent Memory Programs