Kinesis: A new approach to replica placement in distributed storage systems

Published: 09 February 2009

Abstract

Kinesis is a novel data placement model for distributed storage systems. It exemplifies three design principles: structure (division of servers into a few failure-isolated segments), freedom of choice (freedom to allocate the best servers to store and retrieve data based on current resource availability), and scattered distribution (independent, pseudo-random spread of replicas in the system). These design principles enable storage systems to achieve balanced utilization of storage and network resources in the presence of incremental system expansions, failures of single and shared components, and skewed distributions of data size and popularity. In turn, this ability leads to significantly reduced resource provisioning costs, good user-perceived response times, and fast, parallelized recovery from independent and correlated failures.
This article validates Kinesis through theoretical analysis, simulations, and experiments on a prototype implementation. Evaluations driven by real-world traces show that Kinesis can significantly outperform the widely used Chain replica-placement strategy in terms of resource requirements, end-to-end delay, and failure recovery.
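
The three principles in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: the `Cluster` class, the SHA-256 based `candidate` function, the use of stored bytes as the load metric, and the parameter names `num_segments` and `replicas` are all assumptions made for illustration. Each key hashes independently to one candidate server per segment (structure and scattered distribution), and the placement step picks the least-loaded candidates (freedom of choice).

```python
import hashlib


def candidate(key: str, segment: int, servers_per_segment: int) -> int:
    """Pseudo-randomly map a key to one server index inside a segment.

    Prefixing the key with the segment number gives each segment its own
    hash function, so candidate locations are independent across segments.
    (SHA-256 is an illustrative choice, not taken from the article.)
    """
    digest = hashlib.sha256(f"{segment}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % servers_per_segment


class Cluster:
    """Hypothetical cluster split into failure-isolated segments."""

    def __init__(self, num_segments: int, servers_per_segment: int):
        self.num_segments = num_segments
        self.servers_per_segment = servers_per_segment
        # load[s][i] = bytes currently stored on server i of segment s
        self.load = [[0] * servers_per_segment for _ in range(num_segments)]

    def candidates(self, key: str):
        """One (segment, server) candidate per segment for this key."""
        return [
            (s, candidate(key, s, self.servers_per_segment))
            for s in range(self.num_segments)
        ]

    def place(self, key: str, size: int, replicas: int):
        """Store `replicas` copies on the least-loaded candidate servers."""
        if replicas > self.num_segments:
            raise ValueError("need at least one segment per replica")
        ranked = sorted(
            self.candidates(key),
            key=lambda loc: self.load[loc[0]][loc[1]],
        )
        chosen = ranked[:replicas]
        for s, i in chosen:
            self.load[s][i] += size
        return chosen


cluster = Cluster(num_segments=7, servers_per_segment=10)
first = cluster.place("object-a", size=100, replicas=3)
second = cluster.place("object-a", size=100, replicas=3)
```

Because every replica of a key lands in a different segment, the loss of a whole segment removes at most one copy. In this sketch the second placement of the same key avoids the servers that the first placement loaded, which is the load-balancing effect that freedom of choice is meant to provide.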

Published In

ACM Transactions on Storage, Volume 4, Issue 4
January 2009
116 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/1480439
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 February 2009
Accepted: 01 May 2008
Revised: 01 May 2008
Received: 01 February 2008
Published in TOS Volume 4, Issue 4

Author Tags

  1. Storage system
  2. load balancing
  3. multiple-choice paradigm

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Towards Energy-Efficient and Thermal-Aware Data Placement for Storage Clusters. IEEE Transactions on Sustainable Computing 9:4 (631-647). DOI: 10.1109/TSUSC.2024.3351684. Online publication date: Jul-2024
  • (2024) A Scalable, Fault Resilient and Balanced Storage Architecture for Cyber-Physical Systems. 2024 IEEE 19th Conference on Industrial Electronics and Applications (ICIEA) (1-6). DOI: 10.1109/ICIEA61579.2024.10665062. Online publication date: 5-Aug-2024
  • (2023) TADRP: Toward Thermal-Aware Data Replica Placement in Data-Intensive Data Centers. IEEE Transactions on Network and Service Management 20:4 (4397-4415). DOI: 10.1109/TNSM.2023.3263864. Online publication date: 1-Dec-2023
  • (2023) Popularity-Based Data Placement With Load Balancing in Edge Computing. IEEE Transactions on Cloud Computing 11:1 (397-411). DOI: 10.1109/TCC.2021.3096467. Online publication date: 1-Jan-2023
  • (2023) Managing data replication and distribution in the fog with FReD. Software: Practice and Experience 53:10 (1958-1981). DOI: 10.1002/spe.3237. Online publication date: 11-Jul-2023
  • (2022) RLRP: High-Efficient Data Placement with Reinforcement Learning for Modern Distributed Storage Systems. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (595-605). DOI: 10.1109/IPDPS53621.2022.00064. Online publication date: May-2022
  • (2022) A storage computing architecture with multiple NDP devices for accelerating compaction performance in LSM-tree based KV stores. Journal of Systems Architecture (102681). DOI: 10.1016/j.sysarc.2022.102681. Online publication date: Jul-2022
  • (2021) Towards Predictive Replica Placement for Distributed Data Stores in Fog Environments. 2021 IEEE International Conference on Cloud Engineering (IC2E) (280-281). DOI: 10.1109/IC2E52221.2021.00047. Online publication date: Oct-2021
  • (2019) Enabling Efficient Updates in KV Storage via Hashing. ACM Transactions on Storage 15:3 (1-29). DOI: 10.1145/3340287. Online publication date: 13-Aug-2019
  • (2018) HashKV. Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (1007-1019). DOI: 10.5555/3277355.3277451. Online publication date: 11-Jul-2018
