Abstract
This paper summarizes our experience designing and implementing BitVault, a content-addressable retention platform for large volumes of reference data, that is, seldom-changing information that must be retained for a long time. BitVault uses "smart bricks" as its building block to lower hardware cost. The challenges are to keep management costs low in a system that scales from one brick to tens of thousands, to ensure reliability, and to deliver a simple design. Our design incorporates peer-to-peer (P2P) technologies for self-management and self-healing, and uses massively parallel repair to reduce the system's vulnerability to data loss. The simplicity of the architecture relies on an eventually reliable membership service provided by a perfect one-hop distributed hash table (DHT). Its object-driven repair model yields a last-replica recall guarantee that is independent of the failure scenario: so long as the last copy of a data object remains in the system, that object can be retrieved and its replication degree restored. A prototype has been implemented. Theoretical analysis, simulations, and experiments have been conducted to validate the design of BitVault.
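The ideas named in the abstract (content addressing, one-hop placement under full membership knowledge, and object-driven repair) can be illustrated with a toy, in-memory sketch. It is not BitVault's actual protocol: the `Cluster` class, the SHA-256 object identifier, the successor-based placement rule, and the default replication degree of 3 are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right


def object_id(data: bytes) -> str:
    """Content address: the object's name is a hash of its bytes (assumed SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def brick_position(name: str) -> int:
    """Place a brick on the hash ring by hashing its (hypothetical) name."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16)


class Cluster:
    """Toy model: every participant knows the full membership (one-hop lookup)."""

    def __init__(self, brick_names, degree=3):
        self.degree = degree
        self.stores = {name: {} for name in brick_names}  # brick -> {oid: data}

    def owners(self, oid):
        """The first `degree` live bricks clockwise from the object's ring position."""
        ring = sorted(self.stores, key=brick_position)
        positions = [brick_position(b) for b in ring]
        start = bisect_right(positions, int(oid, 16))
        count = min(self.degree, len(ring))
        return [ring[(start + i) % len(ring)] for i in range(count)]

    def put(self, data):
        oid = object_id(data)
        for brick in self.owners(oid):
            self.stores[brick][oid] = data
        return oid

    def get(self, oid):
        """Read from any current owner, verifying the content against its address."""
        for brick in self.owners(oid):
            data = self.stores[brick].get(oid)
            if data is not None and object_id(data) == oid:
                return data
        return None

    def fail(self, brick):
        """Simulate a brick failure: the brick and all its copies disappear."""
        del self.stores[brick]

    def repair(self):
        """Object-driven repair: each surviving copy is pushed to its current owners."""
        surviving = {}
        for store in self.stores.values():
            surviving.update(store)
        for oid, data in surviving.items():
            for brick in self.owners(oid):
                self.stores[brick].setdefault(oid, data)

    def replica_count(self, oid):
        return sum(oid in store for store in self.stores.values())
```

In this sketch, failing two of an object's three owners leaves a single copy; because repair is driven by the surviving objects rather than by a record of which bricks failed, a call to `repair()` restores the replication degree from that last copy.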