DOI: 10.1145/3064176.3064208
research-article

Malacology: A Programmable Storage System

Published: 23 April 2017

ABSTRACT

Storage systems need to support high performance for special-purpose data processing applications that run on an evolving storage device technology landscape. This puts tremendous pressure on storage systems to support rapid change, both in their interfaces and in their performance. But adapting storage systems can be difficult because unprincipled changes might jeopardize years of code-hardening and performance optimization efforts that were necessary for users to entrust their data to the storage system. We introduce the programmable storage approach, which exposes internal services and abstractions of the storage stack as building blocks for higher-level services. We also build a prototype to explore how existing abstractions of common storage system services can be leveraged to adapt to the needs of new data processing systems and the increasing variety of storage devices. We illustrate the advantages and challenges of this approach by composing existing internal abstractions into two new higher-level services: a file system metadata load balancer and a high-performance distributed shared log. The evaluation demonstrates that our services inherit desirable qualities of the back-end storage system, including the ability to balance load, efficiently propagate service metadata, recover from failure, and navigate trade-offs between latency and throughput using leases.

    • Published in

      EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems
      April 2017
      648 pages
      ISBN: 9781450349383
      DOI: 10.1145/3064176

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 241 of 1,308 submissions, 18%
