Abstract
To meet the real-time processing and large-capacity requirements imposed by big data, hybrid storage has become a trend. Storage devices have asymmetric read/write performance, and data have asymmetric read/write access characteristics. Because access characteristics fluctuate, the same data may obtain different access performance on the same device at different times, and the device best suited to the data may also change over time. Since data prefer to reside on the device on which they obtain higher access performance, this paper places data on the device with the highest preference degree to improve the performance and efficiency of the whole storage system. A Preference-Aware HDFS (PAHDFS) with high efficiency and scalability is implemented. PAHDFS shows good performance in experiments.
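The placement idea in the abstract can be illustrated with a minimal sketch. It is not the PAHDFS implementation: the abstract does not define how the preference degree is computed, so the formula below (the inverse of the expected per-access cost for a given read/write mix) is an assumption, as are the names `Device`, `preference_degree`, and `preferred_device`, the device labels, and the cost figures, which are made up only to show the preferred device changing as the access mix changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    read_cost: float   # hypothetical cost per read, arbitrary units
    write_cost: float  # hypothetical cost per write, arbitrary units


def preference_degree(device: Device, reads: int, writes: int) -> float:
    """Assumed metric: inverse of the expected cost per access for this mix."""
    total = reads + writes
    if total == 0:
        return 0.0  # no observed accesses, so no preference
    expected_cost = (reads * device.read_cost + writes * device.write_cost) / total
    return 1.0 / expected_cost


def preferred_device(devices: list[Device], reads: int, writes: int) -> Device:
    """Pick the device with the highest preference degree for the observed mix."""
    return max(devices, key=lambda d: preference_degree(d, reads, writes))


# Illustrative devices: one with asymmetric read/write cost, one symmetric.
DEVICES = [
    Device("dev_a", read_cost=1.0, write_cost=10.0),
    Device("dev_b", read_cost=5.0, write_cost=5.0),
]

read_heavy = preferred_device(DEVICES, reads=90, writes=10)
write_heavy = preferred_device(DEVICES, reads=10, writes=90)
print(read_heavy.name, write_heavy.name)
```

With these assumed costs, the read-heavy mix prefers the asymmetric device and the write-heavy mix prefers the symmetric one, which is the kind of shift in the most suitable device that the abstract describes.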
Acknowledgments
This work is supported by the National Basic Research 973 Program of China under Grant No. 2011CB302301, National University’s Special Research Fee No. 2015XJGH010, and NSFC No. 61173043.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhou, W., Feng, D., Tan, Z., Zheng, Y. (2015). PAHDFS: Preference-Aware HDFS for Hybrid Storage. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_1
DOI: https://doi.org/10.1007/978-3-319-27122-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science, Computer Science (R0)