Untangling cluster management with Helix

Authors:
Kishore Gopalakrishna (LinkedIn, Mountain View, CA)
Shi Lu (LinkedIn, Mountain View, CA)
Zhen Zhang (LinkedIn, Mountain View, CA)
Adam Silberstein (LinkedIn, Mountain View, CA)
Kapil Surlaker (LinkedIn, Mountain View, CA)
Ramesh Subramonian (LinkedIn, Mountain View, CA)
Bob Schulman (LinkedIn, Mountain View, CA)

SoCC '12: Proceedings of the Third ACM Symposium on Cloud Computing
October 2012, Article No.: 19, Pages 1–13
https://doi.org/10.1145/2391229.2391248

Published: 14 October 2012

ABSTRACT

Distributed data systems are used in a variety of settings, including online serving, offline analytics, data transport, and search. They let organizations scale out their workloads using cost-effective commodity hardware, while retaining key properties like fault tolerance and scalability. At LinkedIn we have built a number of such systems. A key pattern we observe is that even though they may serve different purposes, they tend to share a lot of functionality and to use common building blocks in their architectures. One such building block that is just beginning to receive attention is cluster management, which addresses the complexity of handling a dynamic, large-scale system with many servers. Such systems must handle software and hardware failures, setup tasks such as bootstrapping data, and operational issues such as data placement, load balancing, planned upgrades, and cluster expansion.

All of this shared complexity, which we see in all of our systems, motivates us to build a cluster management framework, Helix, to solve these problems once in a general way.

Helix provides an abstraction for a system developer to separate coordination and management tasks from component functional tasks of a distributed system. The developer defines the system behavior via a state model that enumerates the possible states of each component, the transitions between those states, and constraints that govern the system's valid settings. Helix does the heavy lifting of ensuring the system satisfies that state model in the distributed setting, while also meeting the system's goals on load balancing and throttling state changes. We detail several Helix-managed production distributed systems at LinkedIn and how Helix has helped them avoid building custom management components. We describe the Helix design and implementation and present an experimental study that demonstrates its performance and functionality.
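
To make the state-model idea concrete, the following is a minimal Python sketch and not Helix's actual API. The class name StateModel, the state names OFFLINE, STANDBY, and LEADER, and the "at most one LEADER per partition" constraint are all assumptions chosen for illustration. It shows the three ingredients the abstract lists (states, transitions, and constraints), plus two helpers a coordinator might plausibly need: finding a legal sequence of transitions between two states, and checking one partition's replica assignment against the constraints.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StateModel:
    """Hypothetical state model: states, legal transitions, per-state caps."""
    states: frozenset
    transitions: frozenset      # set of (from_state, to_state) pairs
    max_per_partition: dict     # state -> max replicas allowed in that state

    def path(self, current, target):
        """Return a shortest list of legal transitions from current to target."""
        if current not in self.states or target not in self.states:
            raise ValueError("unknown state")
        queue = deque([(current, [])])
        seen = {current}
        while queue:  # breadth-first search over the transition graph
            state, steps = queue.popleft()
            if state == target:
                return steps
            for src, dst in sorted(self.transitions):
                if src == state and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, steps + [(src, dst)]))
        raise ValueError(f"no legal path from {current} to {target}")

    def violations(self, replica_states):
        """List constraint violations for one partition's replica -> state map."""
        counts = {}
        for state in replica_states.values():
            counts[state] = counts.get(state, 0) + 1
        return [
            f"{state}: {counts[state]} > {limit}"
            for state, limit in sorted(self.max_per_partition.items())
            if counts.get(state, 0) > limit
        ]


# Illustrative model (assumed, not taken from the paper).
MODEL = StateModel(
    states=frozenset({"OFFLINE", "STANDBY", "LEADER"}),
    transitions=frozenset({
        ("OFFLINE", "STANDBY"), ("STANDBY", "LEADER"),
        ("LEADER", "STANDBY"), ("STANDBY", "OFFLINE"),
    }),
    max_per_partition={"LEADER": 1},
)

print(MODEL.path("OFFLINE", "LEADER"))
print(MODEL.violations({"node1": "LEADER", "node2": "LEADER"}))
```

In this sketch a replica can never jump directly from OFFLINE to LEADER; the search returns the intermediate step through STANDBY, and an assignment with two LEADER replicas for one partition is reported as a violation. A real framework would additionally have to order and throttle such transitions across many nodes, which this sketch does not attempt.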

Index Terms

1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage

Published in
SoCC '12: Proceedings of the Third ACM Symposium on Cloud Computing
October 2012
325 pages
ISBN: 9781450317610
DOI: 10.1145/2391229
Program Chairs: Michael Carey (UC Irvine), Steven Hand (University of Cambridge)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 169 of 722 submissions, 23%