Abstract
Memory caching plays a crucial role in meeting the (quasi-)real-time processing requirements of rapidly growing data on big-data clusters. Because big-data clusters are usually shared by multiple computing frameworks, applications, or end users, there is intense competition for memory cache resources, especially on small clusters that are expected to process datasets comparable in size to those handled by large clusters, yet with tightly limited resource budgets. Applying existing on-demand caching strategies on such shared clusters inevitably results in frequent cache thrashing when conflicting, simultaneous cache resource demands are not mediated, which deteriorates overall cluster efficiency.
In this paper, we propose a novel self-adaptive incremental big-data caching mechanism, called EarnCache, to improve cache efficiency on shared big-data clusters, especially on small clusters where cache thrashing may occur frequently. EarnCache self-adaptively adjusts its resource allocation strategy according to the degree of cache resource competition: it switches to incremental caching to suppress competition when resources are in deficit, and reverts to traditional on-demand caching to expedite cache admission when resources are in surplus. Extensive experimental evaluation shows that the elasticity of EarnCache enhances cache efficiency on shared big-data clusters and thus improves resource utilization.
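To make the switching behavior concrete, the following is a minimal sketch, assuming a simplified view in which the cache manager compares outstanding demand against free capacity and either admits a file in full (on-demand) or grows its cached share by a fixed increment (incremental). The names (`AdaptiveCache`, `increment_bytes`, `outstanding_demand`) are illustrative and not taken from the paper; the actual EarnCache policy and interfaces may differ.

```python
from enum import Enum


class Mode(Enum):
    ON_DEMAND = "on-demand"      # admit whole files immediately
    INCREMENTAL = "incremental"  # grow each file's cached share gradually


class AdaptiveCache:
    """Illustrative switch between on-demand and incremental caching."""

    def __init__(self, capacity_bytes: int, increment_bytes: int):
        self.capacity = capacity_bytes
        self.increment = increment_bytes
        self.used = 0
        self.allocation = {}  # file_id -> bytes currently cached

    def _mode(self, outstanding_demand: int) -> Mode:
        # Surplus: admit aggressively; deficit: grow incrementally.
        free = self.capacity - self.used
        return Mode.ON_DEMAND if outstanding_demand <= free else Mode.INCREMENTAL

    def request(self, file_id: str, file_size: int, outstanding_demand: int) -> int:
        """Decide how many bytes of `file_id` to admit on this access."""
        cached = self.allocation.get(file_id, 0)
        remaining = file_size - cached
        if remaining <= 0:
            return 0  # already fully cached
        if self._mode(outstanding_demand) is Mode.ON_DEMAND:
            grant = remaining                       # cache the rest at once
        else:
            grant = min(self.increment, remaining)  # one increment per access
        grant = min(grant, self.capacity - self.used)  # never exceed capacity
        self.allocation[file_id] = cached + grant
        self.used += grant
        return grant
```

Under this sketch, repeated accesses to a hot file keep earning it additional increments while competition is high, so no single tenant can monopolize the cache in one burst; once demand falls below free capacity, admission reverts to whole-file on-demand caching.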
Acknowledgements
This work was supported by the National Natural Science Foundation of China (NSFC) (No. U1636205) and the Science and Technology Innovation Action Program of the Science and Technology Commission of Shanghai Municipality (STCSM) (No. 17511105204).
Cite this paper
Luo, Y., Guo, J., Zhou, S. (2018). EarnCache: Self-adaptive Incremental Caching for Big Data Applications. In: Cai, Y., Ishikawa, Y., Xu, J. (eds.) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol. 10988. Springer, Cham. https://doi.org/10.1007/978-3-319-96893-3_29