WarmCache: A Comprehensive Distributed Storage System Combining Replication, Erasure Codes and Buffer Cache

  • Conference paper
Green, Pervasive, and Cloud Computing (GPC 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11204)

Abstract

Tiered storage systems use replication to provide both high reliability and availability, storing three replicas on different nodes in the cluster. Erasure codes (EC) such as Reed-Solomon (RS) are increasingly used to reduce storage overhead, but at the cost of lower I/O performance and availability. Existing heterogeneous storage systems use triple replication, erasure coding, or a combination of both, yet they suffer from a large performance gap between data layers. To address this problem, we introduce WarmCache, a new data layer for warm data that stores one copy using erasure coding and keeps the other copy in an in-memory data layer. The erasure-coded copy ensures data reliability, while the in-memory copy provides fast I/O performance.
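
The trade-off described in the abstract can be made concrete with a small sketch. The Python below is a hypothetical illustration and not the paper's implementation. It compares raw storage overhead for triple replication, a (k, m) erasure code, and a WarmCache-style layout (one erasure-coded copy plus one in-memory copy). It also models a read path that serves warm data from memory and falls back to decoding the erasure-coded copy. For brevity, the sketch swaps Reed-Solomon for a single XOR parity chunk (RAID-5 style), so it tolerates only one lost chunk. The names `WarmCacheStore`, `encode`, `decode` and the parameter `k` are assumptions.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, List, Optional


def storage_overhead_replication(copies: int = 3) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(copies)


def storage_overhead_ec(k: int, m: int) -> float:
    """Raw bytes stored per user byte for a code with k data and m parity chunks."""
    return (k + m) / k


def storage_overhead_warmcache(k: int, m: int) -> float:
    """Assumed WarmCache accounting: one erasure-coded copy on disk
    plus one full copy in memory."""
    return storage_overhead_ec(k, m) + 1.0


def xor_bytes(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> List[Optional[bytes]]:
    """Split data into k zero-padded chunks and append one XOR parity chunk.

    Stand-in for Reed-Solomon: tolerates the loss of any single chunk.
    """
    chunk_len = max(1, -(-len(data) // k))  # ceiling division, at least 1 byte
    padded = data.ljust(chunk_len * k, b"\0")
    data_chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    stripe: List[Optional[bytes]] = [*data_chunks, xor_bytes(data_chunks)]
    return stripe


def decode(stripe: List[Optional[bytes]], k: int, size: int) -> bytes:
    """Rebuild the original bytes, recovering at most one missing chunk."""
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) > 1:
        raise IOError("single-parity stripe cannot recover more than one lost chunk")
    data = list(stripe[:k])
    if missing and missing[0] < k:
        # XOR of all k + 1 chunks is zero, so the lost chunk equals the XOR of the survivors.
        survivors = [chunk for chunk in stripe if chunk is not None]
        data[missing[0]] = xor_bytes(survivors)
    return b"".join(chunk for chunk in data if chunk is not None)[:size]


@dataclass
class WarmCacheStore:
    """Hypothetical warm-data layer: an EC copy on disk plus a copy in memory."""
    k: int = 4
    memory: Dict[str, bytes] = field(default_factory=dict)
    disk: Dict[str, List[Optional[bytes]]] = field(default_factory=dict)
    sizes: Dict[str, int] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.disk[key] = encode(data, self.k)  # reliability: erasure-coded copy
        self.sizes[key] = len(data)
        self.memory[key] = data                # performance: in-memory copy

    def evict(self, key: str) -> None:
        """Drop the in-memory copy; the erasure-coded copy remains."""
        self.memory.pop(key, None)

    def get(self, key: str) -> bytes:
        if key in self.memory:
            return self.memory[key]            # fast path: memory hit
        return decode(self.disk[key], self.k, self.sizes[key])  # slow path: decode
```

Take a hypothetical RS(6, 3) layout with six data chunks and three parity chunks. Under this accounting, triple replication stores 3.0 bytes of raw storage per user byte and the erasure-coded copy alone stores 1.5. The WarmCache-style layout stores 2.5, of which 1.0 sits in memory. These figures follow from the assumed accounting above, not from measurements in the paper.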

Author information

Corresponding author

Correspondence to Chentao Wu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ignacio, B.A., Wu, C., Li, J. (2019). WarmCache: A Comprehensive Distributed Storage System Combining Replication, Erasure Codes and Buffer Cache. In: Li, S. (ed.) Green, Pervasive, and Cloud Computing. GPC 2018. Lecture Notes in Computer Science, vol 11204. Springer, Cham. https://doi.org/10.1007/978-3-030-15093-8_19

  • DOI: https://doi.org/10.1007/978-3-030-15093-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15092-1

  • Online ISBN: 978-3-030-15093-8

  • eBook Packages: Computer Science, Computer Science (R0)
