ABSTRACT
Gemini is a distributed crash recovery protocol for persistent caches. When a cache instance fails, Gemini assigns other cache instances to process its reads and writes. Once the failed instance recovers, Gemini begins recovering its persistent content while immediately using the instance to process reads and writes, and it guarantees read-after-write consistency throughout. Gemini also transfers the working set of the application to the recovering instance to maximize its cache hit ratio. Our evaluation shows that Gemini restores the hit ratio two orders of magnitude faster than a volatile cache. Working set transfer is particularly effective with workloads that exhibit an evolving access pattern.