DOI: 10.1145/2901318.2901343
research-article

Optimizing distributed actor systems for dynamic interactive services

Published: 18 April 2016

Abstract

Distributed actor systems are widely used to build scalable, interactive cloud services such as social networks and online games. By modeling an application as a dynamic set of lightweight, communicating "actors", developers can easily build complex distributed applications while the underlying runtime system handles the low-level complexities of the distributed environment.
We present ActOp, a data-driven, application-independent runtime mechanism for optimizing the end-to-end service latency of actor-based distributed applications. ActOp targets the two dominant factors affecting latency: the overhead of remote inter-actor communication across servers, and intra-server queuing delay. ActOp automatically identifies frequently communicating actors and migrates them to the same server, transparently to the running application. Migration decisions are driven by a novel, scalable, distributed graph partitioning algorithm that does not rely on a single server to store the whole communication graph, which enables efficient actor placement even for applications whose communication graphs change rapidly (e.g., chat services). Further, each server autonomously reduces queuing delay by learning an internal queuing model and configuring its threads according to the instantaneous request rate and application demands.
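The co-location step described above can be illustrated with a minimal sketch of the decision one server might make when it considers exchanging actors with a single peer. It assumes each server keeps only message counts for edges that touch the actors it hosts, so no server needs the whole graph. The function name `migration_candidates`, the edge-count format, and the gain rule (messages exchanged with the peer minus messages exchanged with co-located actors) are hypothetical illustrations, not ActOp's actual algorithm; a real placement algorithm would also have to account for load balance and for several actors moving at once.

```python
from collections import defaultdict


def migration_candidates(edges, placement, local, peer, max_moves):
    """Rank actors hosted on `local` by the gain of migrating them to `peer`.

    Hypothetical sketch, not ActOp's algorithm.
    edges:     {(actor_a, actor_b): message_count} observed on `local`
               (assumed format; edges are treated as undirected).
    placement: {actor: server} for every actor that appears in `edges`.
    Returns at most `max_moves` (actor, gain) pairs with positive gain,
    highest gain first. Gains are computed independently per actor, so
    moving several actors at once may change the true benefit.
    """
    to_local = defaultdict(int)  # messages exchanged with actors on `local`
    to_peer = defaultdict(int)   # messages exchanged with actors on `peer`
    for (a, b), count in edges.items():
        # Credit the edge to whichever endpoint(s) `local` hosts.
        for me, other in ((a, b), (b, a)):
            if placement[me] != local:
                continue
            if placement[other] == local:
                to_local[me] += count
            elif placement[other] == peer:
                to_peer[me] += count
            # Traffic to third servers stays remote either way, so it is ignored.
    gains = []
    for actor in set(to_local) | set(to_peer):
        # Remote messages removed by the move, minus local messages that become remote.
        gain = to_peer[actor] - to_local[actor]
        if gain > 0:
            gains.append((actor, gain))
    gains.sort(key=lambda item: (-item[1], item[0]))
    return gains[:max_moves]
```

For example, with users `u1` and `u2` on `S1` and a chat-room actor on `S2`, if `u1` sends 50 messages to the room and exchanges 10 with `u2`, its gain is 40 and it is nominated; `u2` (6 remote messages versus 10 local) is not. Keeping the computation local to one server pair is what lets each server decide without a global view of the graph.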
We prototype ActOp by integrating it with Orleans, a popular open-source actor system [4, 13]. Experiments with realistic workloads show latency improvements of up to 75% at the 99th percentile and up to 63% for the mean, along with up to a 2x increase in peak system throughput.
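ActOp learns its own internal queuing model; as a stand-in, the sketch below uses the textbook M/M/c queue (the Erlang C formula) to pick the smallest thread count whose predicted mean queuing delay meets a target at the current request rate. With arrival rate λ, per-thread service rate μ, offered load a = λ/μ and c threads, the mean wait is W_q = C(c, a) / (cμ - λ). The function names, parameters, and the M/M/c assumption are hypothetical illustrations, not the paper's model. Because M/M/c ignores context switching and contention, it always favors more threads, so the sketch caps the result at `max_threads`.

```python
def erlang_c(servers, offered_load):
    """Probability that a request must queue in an M/M/c system (Erlang C)."""
    if offered_load >= servers:
        return 1.0  # unstable: the queue grows without bound
    term = 1.0      # a^k / k!, starting at k = 0
    partial = 0.0   # running sum of a^k / k! for k = 0 .. c-1
    for k in range(servers):
        partial += term
        term *= offered_load / (k + 1)
    # After the loop, term equals a^c / c!.
    tail = term * servers / (servers - offered_load)
    return tail / (partial + tail)


def threads_for_target(arrival_rate, service_rate, target_wait, max_threads):
    """Smallest thread count whose predicted mean queuing delay meets target_wait.

    Hypothetical sketch assuming an M/M/c model; returns max_threads if no
    stable configuration up to that cap meets the target.
    """
    offered_load = arrival_rate / service_rate
    for c in range(1, max_threads + 1):
        if offered_load >= c:
            continue  # too few threads to keep up with the arrival rate
        wait = erlang_c(c, offered_load) / (c * service_rate - arrival_rate)
        if wait <= target_wait:
            return c
    return max_threads
```

For instance, `threads_for_target(900, 100, 0.001, 32)` sizes a pool for 900 requests/s, 10 ms of work per request, and a 1 ms queuing target. In a running server, the arrival rate would be re-measured periodically and the pool resized, which mirrors the abstract's idea of configuring threads from the instantaneous request rate.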

References

[1] Akka. http://akka.io/.
[2] Event Tracing for Windows (ETW). https://msdn.microsoft.com/en-us/library/windows/desktop/bb968803(v=vs.85).aspx.
[3] Erlang. http://www.erlang.org/.
[4] Orleans. https://github.com/dotnet/orleans.
[5] Who is using Orleans. http://dotnet.github.io/orleans/Who-Is-Using-Orleans.
[6] Azure Reliable Actors. https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-actors-introduction/.
[7] Apache Camel. https://projects.apache.org/projects/camel.html.
[8] Halo using Orleans. https://gigaom.com/2014/12/15/microsoft-open-sources-cloud-framework-that-powers-halo/.
[9] Orbit. https://github.com/electronicarts/orbit.
[10] SwiftMQ. http://www.swiftmq.com/.
[11] WhatsApp Scaling. http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf.
[12] S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):5, 2009.
[13] P. A. Bernstein, S. Bykov, A. Geller, G. Kliot, and J. Thelin. Orleans: Distributed virtual actors for programmability and scalability. MSR Technical Report MSR-TR-2014-41, 2014. http://research.microsoft.com/apps/pubs/default.aspx?id=210931.
[14] D. P. Bertsekas and R. G. Gallager. Data Networks. Prentice-Hall International, 2nd edition, 1992.
[15] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, et al. Windows Azure Storage: a highly available cloud storage service with strong consistency. In Proc. of the 23rd ACM Symposium on Operating Systems Principles, pages 143--157, 2011.
[16] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In ACM SIGOPS Operating Systems Review, volume 41, pages 205--220, 2007.
[17] B. Fitzpatrick. Distributed caching with memcached. Linux Journal, 2004(124):5, 2004.
[18] M. E. Gordon. Stage scheduling for CPU-intensive servers. University of Cambridge, Computer Laboratory, Technical Report, (UCAM-CL-TR-781), 2010.
[19] M. F. Kaashoek and D. R. Karger. Koorde: A simple degree-optimal distributed hash table. In Peer-to-peer systems II, pages 98--107. Springer, 2003.
[20] G. Karypis and V. Kumar. METIS: Unstructured graph partitioning and sparse matrix ordering system, version 2.0. Technical report, University of Minnesota, 1995.
[21] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359--392, 1998.
[22] R. Krauthgamer, J. S. Naor, and R. Schwartz. Partitioning graphs into balanced components. In Proc. of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 942--949, 2009.
[23] A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In Proc. of the Symposium on Operating Systems Design and Implementation (OSDI), pages 31--46, 2012.
[24] T. Leighton and S. Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM (JACM), 46(6):787--832, 1999.
[25] Z. Li, D. Levy, S. Chen, and J. Zic. Auto-tune design and evaluation on staged event-driven architecture. In Proc. of the 1st workshop on MOdel Driven Development for Middleware (MODDM), pages 1--6, 2006.
[26] A. Metwally, D. Agrawal, and A. El Abbadi. Efficient computation of frequent and top-k elements in data streams. In Database Theory-ICDT 2005, pages 398--412. Springer, 2005.
[27] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, et al. Scaling memcache at Facebook. In Proc. of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI), pages 385--398, 2013.
[28] J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez. The little engine(s) that could: Scaling online social networks. ACM SIGCOMM Computer Communication Review, 41(4):375--386, 2011.
[29] H. Räcke. Optimal hierarchical decompositions for congestion minimization in networks. In Proc. of the 40th ACM Symposium on Theory of Computing, pages 255--264, 2008.
[30] F. Rahimian, A. H. Payberah, S. Girdzijauskas, M. Jelasity, and S. Haridi. Ja-Be-Ja: A distributed algorithm for balanced graph partitioning. In Proc. of the 7th International Conference on Self-Adaptive and Self-Organizing Systems (SASO), pages 51--60, 2013.
[31] I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In Proc. of the 18th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 1222--1230, 2012.
[32] V. Venkataramani, Z. Amsden, N. Bronson, G. Cabrera III, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, J. Hoon, et al. TAO: How Facebook serves the social graph. In Proc. of the 2012 ACM SIGMOD International Conference on Management of Data, pages 791--792, 2012.
[33] M. Welsh, D. Culler, and E. Brewer. SEDA: An architecture for Well-Conditioned, Scalable Internet Services. In ACM SIGOPS Operating Systems Review, volume 35, pages 230--243, 2001.
[34] M. D. Welsh. An architecture for highly concurrent, well-conditioned internet services. PhD thesis, University of California at Berkeley, 2002.

Information

Published In

ACM Other conferences
EuroSys '16: Proceedings of the Eleventh European Conference on Computer Systems
April 2016
605 pages
ISBN:9781450342407
DOI:10.1145/2901318
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • Israel Science Foundation
  • Israeli Ministry of Science

Conference

EuroSys '16
EuroSys '16: Eleventh EuroSys Conference 2016
April 18 - 21, 2016
London, United Kingdom

Acceptance Rates

EuroSys '16 Paper Acceptance Rate: 38 of 180 submissions (21%)
Overall Acceptance Rate: 241 of 1,308 submissions (18%)

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 1
Reflects downloads up to 03 Mar 2025.

Cited By

  • (2024) A Runtime System for Interruptible Query Processing: When Incremental Computing Meets Fine-Grained Parallelism. Proceedings of the ACM on Programming Languages 8(OOPSLA2), 1729-1756. DOI: 10.1145/3689772. Online publication date: 8-Oct-2024.
  • (2020) Scalable and serializable networked multi-actor programming. Proceedings of the ACM on Programming Languages 4(OOPSLA), 1-30. DOI: 10.1145/3428266. Online publication date: 13-Nov-2020.
  • (2019) Partisan. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, 63-76. DOI: 10.5555/3358807.3358814. Online publication date: 10-Jul-2019.
  • (2019) Integrating Concurrency Control in n-Tier Application Scaling Management in the Cloud. IEEE Transactions on Parallel and Distributed Systems 30(4), 855-869. DOI: 10.1109/TPDS.2018.2871086. Online publication date: 1-Apr-2019.
  • (2019) ElasticActor. International Journal of Parallel Programming 47(3), 520-534. DOI: 10.1007/s10766-018-0613-7. Online publication date: 1-Jun-2019.
  • (2017) Programmable Elasticity for Actor-based Cloud Applications. Proceedings of the 9th Workshop on Programming Languages and Operating Systems, 15-21. DOI: 10.1145/3144555.3144558. Online publication date: 28-Oct-2017.
  • (2017) GPUNFV. Proceedings of the First Asia-Pacific Workshop on Networking, 85-91. DOI: 10.1145/3106989.3106990. Online publication date: 3-Aug-2017.
  • (2017) Analysing Message Numbers in Actor Systems. 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 291-294. DOI: 10.1109/PDP.2017.86. Online publication date: 2017.
  • (2016) Programming Scalable Cloud Services with AEON. Proceedings of the 17th International Middleware Conference, 1-14. DOI: 10.1145/2988336.2988352. Online publication date: 28-Nov-2016.
  • (2016) Developing Cloud Services Using the Orleans Virtual Actor Model. IEEE Internet Computing 20(5), 71-75. DOI: 10.1109/MIC.2016.108. Online publication date: 1-Sep-2016.
