BitVault: a highly reliable distributed data retention platform

Published: 01 April 2007

Abstract

This paper summarizes our experience designing and implementing BitVault, a content-addressable retention platform for large volumes of reference data: seldom-changing information that needs to be retained for a long time. BitVault uses "smart bricks" as its building blocks to lower hardware cost. The challenges are to keep management costs low in a system that scales from one brick to tens of thousands, to ensure reliability, and to deliver a simple design. Our design incorporates peer-to-peer (P2P) technologies for self-management and self-healing, and uses massively parallel repair to reduce the system's vulnerability to data loss. The simplicity of the architecture relies on an eventually reliable membership service provided by a perfect one-hop distributed hash table (DHT). Its object-driven repair model yields a last-replica recall guarantee that is independent of the failure scenario: so long as the last copy of a data object remains in the system, that object can be retrieved and its replication degree restored. A prototype has been implemented, and theoretical analysis, simulations, and experiments have been conducted to validate the design.
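To make the last-replica recall idea concrete, the following is a minimal sketch and not BitVault's actual protocol. It assumes SHA-1 content addresses, successor-based replica placement on a hash ring, full membership knowledge at every brick (standing in for the one-hop DHT), and a replication degree of 3. The names `Cluster`, `owners`, `repair`, and `REPLICATION_DEGREE` are hypothetical, and the repair loop runs sequentially rather than in parallel.

```python
import hashlib
from bisect import bisect_right

REPLICATION_DEGREE = 3  # assumed value for illustration, not taken from the paper


def _hash(raw: bytes) -> str:
    return hashlib.sha1(raw).hexdigest()


def object_id(data: bytes) -> str:
    """Content address: the key of an object is the hash of its bytes."""
    return _hash(data)


class Cluster:
    """Toy cluster in which every brick knows the full membership (one hop)."""

    def __init__(self, brick_names):
        self.bricks = {name: {} for name in brick_names}  # name -> {oid: data}

    def owners(self, oid):
        """Return the k live bricks that follow the object's ring position."""
        ring = sorted(self.bricks, key=lambda n: _hash(n.encode()))
        positions = [_hash(n.encode()) for n in ring]
        start = bisect_right(positions, oid)
        k = min(REPLICATION_DEGREE, len(ring))
        return [ring[(start + i) % len(ring)] for i in range(k)]

    def put(self, data):
        oid = object_id(data)
        for name in self.owners(oid):
            self.bricks[name][oid] = data
        return oid

    def get(self, oid):
        # Any surviving copy will do; check it against its content address.
        for store in self.bricks.values():
            if oid in store and object_id(store[oid]) == oid:
                return store[oid]
        return None

    def fail(self, name):
        """Simulate a brick failure: the brick and its copies disappear."""
        del self.bricks[name]

    def repair(self):
        """Object-driven repair: each surviving copy re-creates missing replicas."""
        for store in list(self.bricks.values()):
            for oid, data in list(store.items()):
                for name in self.owners(oid):
                    self.bricks[name].setdefault(oid, data)

    def replica_count(self, oid):
        return sum(oid in store for store in self.bricks.values())
```

In this sketch, failing two of an object's three owners leaves a single copy; `get` still returns the data, and `repair` restores three replicas on the bricks that now own the object. Repair is driven by the objects that survive rather than by a record of which brick failed, which is the property the abstract describes as independent of the failure scenario.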

Published in

ACM SIGOPS Operating Systems Review, Volume 41, Issue 2
Systems work at Microsoft Research
April 2007
93 pages
ISSN: 0163-5980
DOI: 10.1145/1243418

Copyright © 2007 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
