PAHDFS: Preference-Aware HDFS for Hybrid Storage

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9529)

Abstract

To meet the real-time processing and large-capacity requirements put forward by big data, hybrid storage has become a trend. Storage devices have asymmetric read/write performance, and data have asymmetric read/write access characteristics. The same data may obtain different access performance on the same device as its access characteristics fluctuate, and the most suitable device for a piece of data may change over time. Because data prefer to reside on the device on which they obtain higher access performance, this paper places data on the device with the highest preference degree to improve the performance and efficiency of the whole storage system. A Preference-Aware HDFS (PAHDFS) with high efficiency and scalability is implemented. PAHDFS shows good performance in experiments.
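The placement idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define the preference degree, so the scoring function below (the inverse of expected cost per access) is an assumption, as are the names `Device`, `preference_degree`, `preferred_device`, and the cost figures, which are arbitrary illustrative units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    read_cost: float   # hypothetical cost of one read (arbitrary units)
    write_cost: float  # hypothetical cost of one write (arbitrary units)


def preference_degree(device: Device, read_ratio: float) -> float:
    """Assumed score: inverse of the expected cost of one access.

    read_ratio is the fraction of accesses that are reads (0.0 to 1.0).
    """
    if not 0.0 <= read_ratio <= 1.0:
        raise ValueError("read_ratio must be in [0, 1]")
    expected_cost = (read_ratio * device.read_cost
                     + (1.0 - read_ratio) * device.write_cost)
    return 1.0 / expected_cost


def preferred_device(devices, read_ratio: float) -> Device:
    """Pick the device with the highest preference degree for this access mix."""
    return max(devices, key=lambda d: preference_degree(d, read_ratio))


# Two made-up devices: one with strongly asymmetric read/write cost,
# one with symmetric cost.
DEVICES = [
    Device("device_a", read_cost=1.0, write_cost=10.0),
    Device("device_b", read_cost=5.0, write_cost=5.0),
]

if __name__ == "__main__":
    for ratio in (0.9, 0.2):
        chosen = preferred_device(DEVICES, ratio)
        print(f"read_ratio={ratio}: place data on {chosen.name}")
```

With these assumed costs, read-heavy data scores higher on `device_a` and write-heavy data scores higher on `device_b`, so the same data would move between devices if its read/write mix shifted, which is the behavior the abstract describes.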


Acknowledgments

This work is supported by the National Basic Research 973 Program of China under Grant No. 2011CB302301, the National University’s Special Research Fee No. 2015XJGH010, and NSFC No. 61173043.

Author information

Corresponding author

Correspondence to Zhipeng Tan.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhou, W., Feng, D., Tan, Z., Zheng, Y. (2015). PAHDFS: Preference-Aware HDFS for Hybrid Storage. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_1

  • DOI: https://doi.org/10.1007/978-3-319-27122-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27121-7

  • Online ISBN: 978-3-319-27122-4

  • eBook Packages: Computer Science, Computer Science (R0)
