SepStore: Data Storage Accelerator for Distributed File Systems by Separating Small Files from Large Files

  • Conference paper
Internet of Vehicles – Technologies and Services (IOV 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8662)

Abstract

Distributed file systems often rely on local disk file systems to store data on disk. Disk file systems typically perform better on large files than on small files, because large files tend to be accessed sequentially. This paper improves the performance of data servers in distributed file systems by improving how they handle small files. An LSM-tree-based key-value store holds the data of small files, which turns random accesses into sequential ones and reduces the amount of disk file system metadata. The key-value store also serves as the index for accessing small files. Experimental results show that our method improves throughput by up to 78% and IOPS by 37%.
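The separation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SepStoreSketch`, the 64 KiB threshold, and the method names are all hypothetical, and a plain Python dict stands in for the LSM-tree key-value store, so it shows only the routing logic and none of the sequential-write benefit. File names are assumed to be flat (no directory separators).

```python
import os
import tempfile

# Hypothetical cutoff: the abstract does not state the size that
# separates "small" files from "large" ones.
SMALL_FILE_THRESHOLD = 64 * 1024  # bytes


class SepStoreSketch:
    """Route small files to a key-value store, large files to the disk file system."""

    def __init__(self, root_dir, threshold=SMALL_FILE_THRESHOLD):
        self.root_dir = root_dir
        self.threshold = threshold
        # Stand-in for an LSM-tree key-value store (assumption: a dict keeps
        # the sketch dependency-free; a real store would batch writes on disk).
        self.kv = {}

    def _disk_path(self, name):
        return os.path.join(self.root_dir, name)

    def put(self, name, data):
        path = self._disk_path(name)
        if len(data) < self.threshold:
            # Small file: the bytes live in the key-value store itself.
            self.kv[name] = data
            if os.path.exists(path):
                os.remove(path)  # drop a stale large copy of the same name
        else:
            # Large file: written as an ordinary file on the disk file system.
            with open(path, "wb") as f:
                f.write(data)
            self.kv.pop(name, None)  # drop a stale small copy of the same name

    def get(self, name):
        # The key-value store doubles as the index for small files.
        if name in self.kv:
            return self.kv[name]
        with open(self._disk_path(name), "rb") as f:
            return f.read()

    def is_small(self, name):
        return name in self.kv


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = SepStoreSketch(root, threshold=16)
        store.put("note.txt", b"tiny")
        store.put("video.bin", b"x" * 64)
        print(store.is_small("note.txt"), store.is_small("video.bin"))
        print(os.listdir(root))
```

In this sketch only the large file creates an entry in the disk file system, which is the sense in which separating small files reduces file system metadata.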

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, Z., Chen, K., Wu, Y., Zheng, W. (2014). SepStore: Data Storage Accelerator for Distributed File Systems by Separating Small Files from Large Files. In: Hsu, R.CH., Wang, S. (eds) Internet of Vehicles – Technologies and Services. IOV 2014. Lecture Notes in Computer Science, vol 8662. Springer, Cham. https://doi.org/10.1007/978-3-319-11167-4_27

  • DOI: https://doi.org/10.1007/978-3-319-11167-4_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11166-7

  • Online ISBN: 978-3-319-11167-4

  • eBook Packages: Computer Science, Computer Science (R0)
