
montage: NVM-based scalable synchronization framework for crash-consistent file systems


Abstract

In file systems, a single write system call can make multiple modifications to data and metadata, but such changes are not flushed atomically. To preserve the consistency of file systems, conventional approaches guarantee crash consistency at the cost of system performance. To mitigate this performance penalty, non-volatile memory (NVM) technologies are considered a good candidate owing to their low latency and byte-addressability. However, none of the prior proposals that exploit NVM provide both scalability and strict ordering of modifications. In this article, we propose montage, a crash consistency framework for file systems that consists of two parts. First, montage splits the NVM space into multiple staging areas and synchronizes the flushing of the modifications in these areas to the storage device. Second, montage uses a pipelined architecture to speed up data flushing to storage. We apply montage to two journaling file systems (ext4 and JFS) and evaluate them on a multicore server with high-performance storage. The evaluation results demonstrate that our design outperforms recent NVM-based journaling file systems by a wide margin.
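The body of the article is paywalled, but the two mechanisms the abstract names lend themselves to a compact illustration. Below is a minimal user-space sketch in C of (1) per-thread staging areas carved out of one region, so writers never contend on a single log, and (2) a background flusher that overlaps staging with flushing to storage (the "pipeline"). All names (stage_append, drain, NSTAGE), sizes, and the DRAM-simulated "NVM" buffers are illustrative assumptions, not montage's actual design or API; the comments mark where a real NVM system would issue cache-line write-backs and fences.

```c
/* Sketch only: per-area staging + pipelined flushing. Not montage's API.
 * Build with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define NSTAGE   4        /* staging areas carved out of the NVM space */
#define STAGE_SZ 4096     /* bytes per staging area */

struct stage {
    char          buf[STAGE_SZ]; /* would live in NVM */
    atomic_size_t head;          /* bytes staged: durable, not yet on disk */
    size_t        flushed;       /* bytes already flushed to storage */
};

static struct stage stages[NSTAGE];
static atomic_int done;

/* Writer side: each thread appends to its own staging area, so there is
 * no global lock. On real NVM, clwb/sfence after the memcpy would make
 * the record durable before it is published via `head`. */
static void stage_append(int id, const void *rec, size_t len)
{
    struct stage *s = &stages[id];
    size_t off = atomic_load_explicit(&s->head, memory_order_relaxed);
    memcpy(s->buf + off, rec, len);
    atomic_store_explicit(&s->head, off + len, memory_order_release);
}

/* Flush whatever each area accumulated since the last pass. A real
 * system would write(2) the delta to the storage device here. */
static void drain(void)
{
    for (int i = 0; i < NSTAGE; i++) {
        struct stage *s = &stages[i];
        size_t head = atomic_load_explicit(&s->head, memory_order_acquire);
        if (head > s->flushed) {
            printf("flush: area %d, bytes %zu..%zu\n", i, s->flushed, head);
            s->flushed = head;
        }
    }
}

/* Pipeline stage: runs concurrently with the writers, so staging new
 * modifications overlaps with flushing earlier ones. */
static void *flusher(void *arg)
{
    (void)arg;
    while (!atomic_load(&done))
        drain();
    drain();              /* final pass after all writers finish */
    return NULL;
}

static void *writer(void *arg)
{
    int id = (int)(long)arg;
    for (int i = 0; i < 8; i++)
        stage_append(id, "record", 6);
    return NULL;
}

int main(void)
{
    pthread_t w[NSTAGE], f;
    pthread_create(&f, NULL, flusher, NULL);
    for (long i = 0; i < NSTAGE; i++)
        pthread_create(&w[i], NULL, writer, (void *)i);
    for (int i = 0; i < NSTAGE; i++)
        pthread_join(w[i], NULL);
    atomic_store(&done, 1);
    pthread_join(f, NULL);
    return 0;
}
```

In this sketch each area has exactly one writer, which is what removes cross-core contention; the release/acquire pair on `head` plays the role that durability fences would play on real NVM. How montage actually orders flushes across areas is the subject of the paper itself.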



Notes

  1. In software deployment, the staging phase denotes a pre-production phase that is separate from the production environment but eventually transitions into it. We borrow this term to describe file system states whose modifications are durable and consistent but not yet reflected in the file system's storage.

  2. The ext4 file system uses this routine for both metadata and data journaling.

  3. Since commodity storage devices offer limited I/O parallelism and I/O bandwidth compared to high-end storage devices, using more than 16 threads does not improve file system performance in any case. Due to limited space, we compare the file systems only on high-end storage devices.

  4. Under the current experimental settings, montage gains no performance benefit from more than eight partitions.


Acknowledgements

This research was supported by the National Research Foundation of Korea (NRF) (2015M3C4A7065645, 2020R1F1A1055489). Hyuck Han is the corresponding author of this article.

Author information

Correspondence to Hyuck Han.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sul, W., Yeom, H.Y. & Han, H. montage: NVM-based scalable synchronization framework for crash-consistent file systems. Cluster Comput 24, 3573–3590 (2021). https://doi.org/10.1007/s10586-021-03329-w


  • DOI: https://doi.org/10.1007/s10586-021-03329-w
