ABSTRACT
In data centers, caches work both to provide low IO latencies and to reduce the load on the back-end network and storage. But they are not designed for multi-tenancy; system-level caches today cannot be configured to match tenant or provider objectives. Exacerbating the problem is the increasing number of uncoordinated caches on the IO data plane. The lack of global visibility on the control plane to coordinate this distributed set of caches leads to inefficiencies, increasing cloud provider cost.
We present Moirai, a tenant- and workload-aware system that allows data center providers to control their distributed caching infrastructure. Moirai can help ease the management of the cache infrastructure and achieve various objectives, such as improving overall resource utilization or providing tenant isolation and QoS guarantees, as we show through several use cases. A key benefit of Moirai is that it is transparent to applications or VMs deployed in data centers. Our prototype runs unmodified OSes and databases, providing immediate benefit to existing applications.
Index Terms
- Software-defined caching: managing caches in multi-tenant data centers