Efficient Memory Caching for Erasure Coding Based Key-Value Storage Systems

  • Conference paper
Big Data (Big Data 2018)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 945))

Abstract

Erasure codes are widely advocated as a viable means to ensure the dependability of key-value storage systems for big data applications (e.g., MapReduce). They split user data into several data splits, encode the data splits to generate parity splits, and store all of these splits on storage nodes. Reducing disk input/output (I/O) latency is a well-known challenge in improving the performance of erasure-coding-based storage systems. In this paper, we consider the problem of reducing the latency of read operations by caching splits in the memory of storage nodes. We find that the key to solving this problem is for storage nodes to cache enough splits in memory that the application server can reconstruct objects without reading data from disk. We design an efficient memory caching scheme, named ECCS. Theoretical analysis verifies that ECCS can effectively reduce the latency of read operations. We then implement a prototype storage system to deploy our proposal. Extensive experiments are conducted on the prototype with a real-world storage cluster and traces. The experimental results show that our proposal can reduce the time of read operations by up to 32% and improve the throughput of read operations by up to 48% compared with current caching approaches.
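
The split, encode, and reconstruct steps described in the abstract can be illustrated with a minimal sketch. This is not the ECCS scheme, whose details the abstract does not give. As a simplifying assumption, a single XOR parity split stands in for the erasure code, so any k of the k + 1 splits suffice to rebuild an object. All function names below are hypothetical.

```python
from __future__ import annotations
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k data splits and append one XOR parity split.

    Assumption: a single-parity code is used only for illustration.
    """
    size = -(-len(data) // k)                # ceiling division
    padded = data.ljust(size * k, b"\0")     # zero-pad the last split
    splits = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, splits)
    return splits + [parity]                 # index k holds the parity split


def can_serve_from_memory(cached: set[int], k: int) -> bool:
    """True if enough splits are cached to rebuild the object without disk."""
    return len(cached) >= k


def decode(available: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the object from any k of the k + 1 splits."""
    if len(available) < k:
        raise ValueError("not enough splits to reconstruct the object")
    splits = dict(available)
    missing = [i for i in range(k) if i not in splits]
    if missing:
        # At most one data split can be missing here; XOR-ing the other
        # data splits with the parity split recovers it.
        splits[missing[0]] = reduce(xor_bytes, available.values())
    return b"".join(splits[i] for i in range(k))[:length]
```

For example, with k = 4, a read can be served entirely from memory when the cache holds any four of the five splits (for instance splits 0, 1, 3 and the parity split); with only three cached splits, at least one split must still be read from disk.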

This work was supported by the National Natural Science Foundation of China (Project Nos. 61571136 and 61672164), and a CERNET Innovation Project (No. NGII20160615).


Corresponding author

Correspondence to Yangfan Zhou.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shen, J., Li, Y., Sheng, G., Zhou, Y., Wang, X. (2018). Efficient Memory Caching for Erasure Coding Based Key-Value Storage Systems. In: Xu, Z., Gao, X., Miao, Q., Zhang, Y., Bu, J. (eds) Big Data. Big Data 2018. Communications in Computer and Information Science, vol 945. Springer, Singapore. https://doi.org/10.1007/978-981-13-2922-7_34

  • DOI: https://doi.org/10.1007/978-981-13-2922-7_34

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2921-0

  • Online ISBN: 978-981-13-2922-7

  • eBook Packages: Computer Science, Computer Science (R0)
