
montage: NVM-based scalable synchronization framework for crash-consistent file systems


Abstract

In file systems, a single write system call can make multiple modifications to data and metadata, but such changes are not flushed atomically. To preserve the consistency of file systems, conventional approaches guarantee crash consistency at the cost of system performance. To mitigate this performance penalty, non-volatile memory (NVM) technologies are considered a good candidate owing to their low latency and byte-addressability. However, none of the prior proposals that exploit NVM provide both scalability and strict ordering of modifications. In this article, we propose montage, a crash consistency framework for file systems that consists of two parts. First, montage splits the NVM space into multiple staging areas and synchronizes the flushing of the modifications in these areas to the storage device. Second, montage uses a pipelined architecture to speed up data flushing to storage. We apply montage to two journaling file systems (ext4 and JFS) and evaluate them on a multicore server with high-performance storage. The evaluation results demonstrate that our design outperforms recent NVM-based journaling file systems by a wide margin.
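The body of the article is paywalled, but the two mechanisms the abstract names lend themselves to a compact illustration. Below is a minimal user-space sketch in C of (1) per-thread staging areas carved out of one region, so writers never contend on a single log, and (2) a background flusher that overlaps staging with flushing to storage (the "pipeline"). All names (stage_append, drain, NSTAGE), sizes, and the DRAM-simulated "NVM" buffers are illustrative assumptions, not montage's actual design or API; the comments mark where a real NVM system would issue cache-line write-backs and fences.

```c
/* Sketch only: per-area staging + pipelined flushing. Not montage's API.
 * Build with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define NSTAGE   4        /* staging areas carved out of the NVM space */
#define STAGE_SZ 4096     /* bytes per staging area */

struct stage {
    char          buf[STAGE_SZ]; /* would live in NVM */
    atomic_size_t head;          /* bytes staged: durable, not yet on disk */
    size_t        flushed;       /* bytes already flushed to storage */
};

static struct stage stages[NSTAGE];
static atomic_int done;

/* Writer side: each thread appends to its own staging area, so there is
 * no global lock. On real NVM, clwb/sfence after the memcpy would make
 * the record durable before it is published via `head`. */
static void stage_append(int id, const void *rec, size_t len)
{
    struct stage *s = &stages[id];
    size_t off = atomic_load_explicit(&s->head, memory_order_relaxed);
    memcpy(s->buf + off, rec, len);
    atomic_store_explicit(&s->head, off + len, memory_order_release);
}

/* Flush whatever each area accumulated since the last pass. A real
 * system would write(2) the delta to the storage device here. */
static void drain(void)
{
    for (int i = 0; i < NSTAGE; i++) {
        struct stage *s = &stages[i];
        size_t head = atomic_load_explicit(&s->head, memory_order_acquire);
        if (head > s->flushed) {
            printf("flush: area %d, bytes %zu..%zu\n", i, s->flushed, head);
            s->flushed = head;
        }
    }
}

/* Pipeline stage: runs concurrently with the writers, so staging new
 * modifications overlaps with flushing earlier ones. */
static void *flusher(void *arg)
{
    (void)arg;
    while (!atomic_load(&done))
        drain();
    drain();              /* final pass after all writers finish */
    return NULL;
}

static void *writer(void *arg)
{
    int id = (int)(long)arg;
    for (int i = 0; i < 8; i++)
        stage_append(id, "record", 6);
    return NULL;
}

int main(void)
{
    pthread_t w[NSTAGE], f;
    pthread_create(&f, NULL, flusher, NULL);
    for (long i = 0; i < NSTAGE; i++)
        pthread_create(&w[i], NULL, writer, (void *)i);
    for (int i = 0; i < NSTAGE; i++)
        pthread_join(w[i], NULL);
    atomic_store(&done, 1);
    pthread_join(f, NULL);
    return 0;
}
```

In this sketch each area has exactly one writer, which is what removes cross-core contention; the release/acquire pair on `head` plays the role that durability fences would play on real NVM. How montage actually orders flushes across areas is the subject of the paper itself.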



Notes

  1. In software deployment, the staging phase denotes a pre-production phase that is separate from the production environment but eventually transitions into it. We borrow this term to describe file system states whose modifications are durable and consistent but not yet reflected in the file system's storage.

  2. The ext4 file system uses this routine for both metadata and data journaling.

  3. Since commodity storage devices offer limited I/O parallelism and I/O bandwidth compared to high-end storage devices, using more than 16 threads does not improve file system performance in any case. Due to limited space, we compare the file systems only on high-end storage devices.

  4. Under the current experimental settings, montage gains no performance benefit from more than eight partitions.


Acknowledgements

This research was supported by the National Research Foundation of Korea (NRF) (2015M3C4A7065645, 2020R1F1A1055489). Hyuck Han is the corresponding author of this article.

Author information

Correspondence to Hyuck Han.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sul, W., Yeom, H.Y. & Han, H. montage: NVM-based scalable synchronization framework for crash-consistent file systems. Cluster Comput 24, 3573–3590 (2021). https://doi.org/10.1007/s10586-021-03329-w


  • DOI: https://doi.org/10.1007/s10586-021-03329-w
