Placement Scheduling for Replication in HDFS Based on Probabilistic Approach

  • Conference paper
Inclusive Smart Cities and Digital Health (ICOST 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9677)

Abstract

With the rapid evolution of Big Data analysis, Apache Hadoop plays an important role in delivering high availability on top of computing clusters. To maintain high-throughput access for computation, Apache Hadoop is equipped with the Hadoop Distributed File System (HDFS) for managing file operations. HDFS ensures reliability and high availability through a specific replication mechanism. However, because the workload varies from one computing node to another, applying the same replication strategy everywhere can result in imbalance. To address this drawback of the HDFS architecture, we propose an approach that adaptively chooses the placement of replicas. Specifically, the network status and system utilization are used to create an individual replica placement strategy for each file. The proposed approach can thus provide suitable destinations for replicas and improve performance. Consequently, the availability of the system is enhanced while the reliability of data storage is preserved.
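The abstract does not spell out the placement algorithm, so the following is only a minimal sketch of the general idea of ranking candidate nodes by system utilization and network status. The `NodeStatus` fields, the `alpha` weighting, and both function names are hypothetical and are not taken from the paper or from the HDFS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical snapshot of one DataNode; all fields are illustrative."""
    name: str
    utilization: float   # fraction of capacity in use, 0.0 (idle) to 1.0 (saturated)
    latency_ms: float    # measured round-trip time from the writer to this node


def placement_weights(nodes, alpha=0.5):
    """Return a normalized preference weight per node (weights sum to 1).

    alpha is an assumed tuning knob that balances utilization against
    network latency; it is not a parameter defined in the paper.
    """
    if not nodes:
        return {}
    max_latency = max(n.latency_ms for n in nodes) or 1.0
    raw = {}
    for n in nodes:
        idle = 1.0 - n.utilization                # higher is better
        near = 1.0 - n.latency_ms / max_latency   # higher is better
        raw[n.name] = alpha * idle + (1.0 - alpha) * near
    total = sum(raw.values())
    if total == 0:
        # Every node scores zero; fall back to a uniform preference.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: w / total for name, w in raw.items()}


def choose_replica_targets(nodes, replicas=3, alpha=0.5):
    """Pick the `replicas` nodes with the highest preference weight."""
    weights = placement_weights(nodes, alpha)
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:replicas]
```

Calling `choose_replica_targets` once per file with a fresh list of `NodeStatus` snapshots would give each file its own set of destinations, which is the per-file behaviour the abstract describes. A real deployment would also have to respect rack-awareness and fault-domain constraints, which this sketch ignores.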

Acknowledgment

This work was supported by the Industrial Core Technology Development Program (10049079, Develop of mining core technology exploiting personal big data) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea); and supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) NRF-2014R1A2A2A01003914.

Corresponding author

Correspondence to Sungyoung Lee.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bui, DM., Lee, S. (2016). Placement Scheduling for Replication in HDFS Based on Probabilistic Approach. In: Chang, C., Chiari, L., Cao, Y., Jin, H., Mokhtari, M., Aloulou, H. (eds) Inclusive Smart Cities and Digital Health. ICOST 2016. Lecture Notes in Computer Science, vol 9677. Springer, Cham. https://doi.org/10.1007/978-3-319-39601-9_28

  • DOI: https://doi.org/10.1007/978-3-319-39601-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39600-2

  • Online ISBN: 978-3-319-39601-9

  • eBook Packages: Computer Science (R0)
