Abstract
Distributed file systems often rely on local disk file systems to store data on disk. Disk file systems generally perform better on large files than on small files, because large files tend to be accessed sequentially. This paper improves the performance of data servers in distributed file systems by improving their handling of small files. An LSM-tree-based key-value store holds the data of small files, which turns random accesses into sequential ones and reduces the metadata maintained by the disk file system. The key-value store also serves as the index for accessing small files. Experimental results show that our method improves throughput by up to 78% and IOPS by 37%.
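The separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. An append-only log plus an in-memory dict stand in for the LSM-based key-value store, and the class name `SeparatingStore`, the 64 KiB threshold, and the on-disk layout are all hypothetical choices made for illustration.

```python
import os

# Hypothetical cutoff between "small" and "large" files; the abstract gives no value.
SMALL_FILE_THRESHOLD = 64 * 1024


class SeparatingStore:
    """Toy data server that routes small and large files to different back ends."""

    def __init__(self, root, threshold=SMALL_FILE_THRESHOLD):
        self.root = root
        self.threshold = threshold
        # name -> (offset, length); a stand-in for the key-value index.
        self.index = {}
        self.log_path = os.path.join(root, "small.log")
        os.makedirs(os.path.join(root, "large"), exist_ok=True)
        open(self.log_path, "ab").close()  # make sure the log exists

    def put(self, name, data):
        if len(data) < self.threshold:
            # Small file: a single sequential append, with no per-file
            # inode or directory entry in the disk file system.
            with open(self.log_path, "ab") as log:
                offset = log.seek(0, os.SEEK_END)
                log.write(data)
            self.index[name] = (offset, len(data))
        else:
            # Large file: stored as an ordinary file, as before.
            with open(os.path.join(self.root, "large", name), "wb") as f:
                f.write(data)
            self.index.pop(name, None)  # drop any stale small-file entry

    def get(self, name):
        if name in self.index:
            offset, length = self.index[name]
            with open(self.log_path, "rb") as log:
                log.seek(offset)
                return log.read(length)
        with open(os.path.join(self.root, "large", name), "rb") as f:
            return f.read()
```

In this sketch every small-file write becomes an append to one log file, which is the access pattern an LSM structure produces. A real LSM store would additionally persist the index, compact the log, and handle deletions, all of which are omitted here.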
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, Z., Chen, K., Wu, Y., Zheng, W. (2014). SepStore: Data Storage Accelerator for Distributed File Systems by Separating Small Files from Large Files. In: Hsu, R.CH., Wang, S. (eds) Internet of Vehicles – Technologies and Services. IOV 2014. Lecture Notes in Computer Science, vol 8662. Springer, Cham. https://doi.org/10.1007/978-3-319-11167-4_27
DOI: https://doi.org/10.1007/978-3-319-11167-4_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11166-7
Online ISBN: 978-3-319-11167-4
eBook Packages: Computer Science, Computer Science (R0)