A write-friendly approach to manage namespace of Hadoop distributed file system by utilizing nonvolatile memory

Choi, Won Gi; Park, Sanghyun

doi:10.1007/s11227-019-02876-9

Published: 10 May 2019

Volume 75, pages 6632–6662, (2019)

The Journal of Supercomputing

Abstract

With the emergence of the big data era, various technologies have been proposed to cope with exascale data. For a considerably large volume of data, a single machine does not have enough resources to store the complete dataset. The Hadoop distributed file system (HDFS) enables large datasets to be stored across a big data environment consisting of several machines. Although Hadoop has become a crucial part of the big data industry, its simple architecture, which is composed of a master and slaves, leaves several problems, such as scalability and performance bottlenecks, still to be solved. New storage technologies offer an opportunity to solve these problems and improve HDFS. We propose a novel management scheme for the namespace metadata of HDFS that utilizes nonvolatile memory, which has been regarded as the next-generation storage device following flash memory. Nonvolatile memory, which can guarantee data persistence and high performance with byte-addressable access, alleviates NameNode bottlenecks resulting from the journaling processes performed to preserve the file system’s metadata. Our proposed methods show significant improvement in NameNode performance compared with block devices such as hard disk drives and solid-state drives.

Acknowledgements

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2015M3C4A7065522).

Author information

Authors and Affiliations

Yonsei University, Yonsei-ro 50, Seodaemungu, Seoul, Republic of Korea
Won Gi Choi & Sanghyun Park

Corresponding author

Correspondence to Sanghyun Park.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Choi, W.G., Park, S. A write-friendly approach to manage namespace of Hadoop distributed file system by utilizing nonvolatile memory. J Supercomput 75, 6632–6662 (2019). https://doi.org/10.1007/s11227-019-02876-9

Published: 10 May 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s11227-019-02876-9
