EarnCache: Self-adaptive Incremental Caching for Big Data Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10988)

Abstract

Memory caching plays a crucial role in meeting the (quasi-)real-time processing requirements of rapidly growing data on big-data clusters. Since such clusters are usually shared by multiple computing frameworks, applications, or end users, competition for cache resources is intense, especially on small clusters that must process datasets comparable in size to those handled by large clusters, yet under tightly limited resource budgets. Applying existing on-demand caching strategies to such shared clusters inevitably causes frequent cache thrashing when conflicting, simultaneous cache demands are not mediated, which degrades overall cluster efficiency.

In this paper, we propose EarnCache, a novel self-adaptive incremental caching mechanism that improves cache efficiency on shared big-data clusters, especially small clusters where cache thrashing occurs frequently. EarnCache adjusts its resource allocation strategy according to the degree of cache competition: it switches to incremental caching to damp competition when resources are in deficit, and reverts to traditional on-demand caching to speed up cache admission when resources are in surplus. Extensive experimental evaluation shows that this elasticity improves cache efficiency, and thus resource utilization, on shared big-data clusters.



Acknowledgements

This work was supported by National Natural Science Foundation of China (NSFC) (No. U1636205), and the Science and Technology Innovation Action Program of Science and Technology Commission of Shanghai Municipality (STCSM) (No. 17511105204).

Author information

Correspondence to Shuigeng Zhou.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Luo, Y., Guo, J., Zhou, S. (2018). EarnCache: Self-adaptive Incremental Caching for Big Data Applications. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10988. Springer, Cham. https://doi.org/10.1007/978-3-319-96893-3_29

  • DOI: https://doi.org/10.1007/978-3-319-96893-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96892-6

  • Online ISBN: 978-3-319-96893-3

  • eBook Packages: Computer Science (R0)
