Abstract
File systems may become corrupted for many reasons despite various protection techniques. Therefore, most file systems come with a checker that recovers the file system to a consistent state. However, existing checkers are commonly assumed to complete the repair without interruption, which may not hold in practice. In this work, we demonstrate via fault injection experiments that the checkers of widely used file systems (Ext4, XFS, Btrfs, and F2FS) may leave the file system in an uncorrectable state if the repair procedure is interrupted unexpectedly. To address the problem, we first fix the ordering issue in the undo logging of e2fsck and then build a general logging library (i.e., rfsck-lib) for strengthening checkers. To demonstrate its practicality, we integrate rfsck-lib with existing checkers and create two new checkers: rfsck-ext, a robust checker for Ext-family file systems, and rfsck-xfs, a robust checker for XFS file systems, both of which require only tens of lines of modification to the original versions. Both rfsck-ext and rfsck-xfs are resilient to faults in our experiments. Both checkers also incur reasonable performance overhead (i.e., up to 12%) compared to the original unreliable versions. Moreover, rfsck-ext outperforms the patched e2fsck by up to nine times while achieving the same level of robustness.
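The abstract does not spell out the ordering issue, so the following is only a minimal sketch of the general ordering requirement that undo logging depends on: the old contents of a block must be durable in the log before the block is overwritten in place. It is not the e2fsck implementation or the rfsck-lib API. The names `logged_write` and `undo`, the record layout, and the 4 KiB block size are all hypothetical and chosen for illustration.

```python
import os
import struct

BLOCK_SIZE = 4096                # assumed block size, for illustration only
HEADER = struct.Struct("<Q")     # hypothetical record header: 8-byte block number
RECORD_SIZE = HEADER.size + BLOCK_SIZE


def logged_write(dev_fd, log_fd, block_no, new_data):
    """Overwrite one block, saving its old contents to the undo log first."""
    if len(new_data) != BLOCK_SIZE:
        raise ValueError("new_data must be exactly one block")
    offset = block_no * BLOCK_SIZE
    # Pad short reads (e.g., past end of image) so every record has a fixed size.
    old = os.pread(dev_fd, BLOCK_SIZE, offset).ljust(BLOCK_SIZE, b"\0")
    os.write(log_fd, HEADER.pack(block_no) + old)
    # Ordering point: the undo record must reach stable storage
    # before the in-place overwrite is issued.
    os.fsync(log_fd)
    os.pwrite(dev_fd, new_data, offset)


def undo(dev_fd, log_path):
    """Roll the image back by applying undo records from newest to oldest."""
    with open(log_path, "rb") as f:
        raw = f.read()
    # A torn trailing record means its overwrite was never issued, so skip it.
    complete = len(raw) // RECORD_SIZE
    for i in reversed(range(complete)):
        rec = raw[i * RECORD_SIZE:(i + 1) * RECORD_SIZE]
        (block_no,) = HEADER.unpack(rec[:HEADER.size])
        os.pwrite(dev_fd, rec[HEADER.size:], block_no * BLOCK_SIZE)
    os.fsync(dev_fd)
```

If the `fsync` on the log were omitted or issued after the overwrite, an interruption could leave a modified block with no durable copy of its previous contents, which is the kind of state that cannot be rolled back.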