DOI: 10.1145/2335484.2335516
research-article

Understanding and improving the cost of scaling distributed event processing

Published: 16 July 2012

Abstract

Building scalable back-end infrastructures for data-centric applications is becoming important. Applications used in data-centres have complex, multilayer software stacks and are required to scale to a large number of nodes. Today, there is increased interest in improving the efficiency of such software stacks. In this paper, we examine the efficiency of one such stack, used for distributed stream processing, an important application domain. We use a specific streaming system, Borealis [10], and extensively hand-tune the end-to-end data path. We focus on the parts of the stack related to intra- and inter-node communication and data exchange, a central component of many software stacks. We find that application-independent code in stream processing middleware employs communication operations that consume a significant number of CPU cycles and are not strictly necessary. We first categorize these operations based on the protocol function they support. We then remove these operations, producing a software stack that is functionally equivalent in terms of application processing. Our results show that restructuring the data path achieves up to 5x higher throughput, reduces energy consumption by up to 60%, and saves up to 40% of infrastructure cost. Finally, we project that with 1024-core processors per node, stream processing applications will demand up to 2 Tbit/s of networking throughput per node.
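As a rough reading of the final projection, the per-node figure can be converted to a per-core figure. The sketch below is only a back-of-envelope calculation: it assumes that demand is spread evenly across cores and that 1 Tbit equals 1000 Gbit, neither of which is stated in the abstract, and the function name is hypothetical.

```python
# Back-of-envelope reading of the projection in the abstract.
# Assumptions (not from the paper): demand is spread evenly across
# cores, and decimal prefixes are used (1 Tbit = 1000 Gbit).

CORES_PER_NODE = 1024      # from the abstract
NODE_DEMAND_TBIT_S = 2.0   # "up to 2 TBits/s/node", from the abstract


def per_core_gbit_s(node_tbit_s: float, cores: int) -> float:
    """Return the implied per-core network demand in Gbit/s."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return node_tbit_s * 1000.0 / cores


if __name__ == "__main__":
    demand = per_core_gbit_s(NODE_DEMAND_TBIT_S, CORES_PER_NODE)
    print(f"{demand:.2f} Gbit/s per core")
```

Under these assumptions the projection works out to a little under 2 Gbit/s per core, which is an upper bound in the same sense as the per-node figure.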

References

[1]
D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2):120--139, 2003.
[2]
U.S. Environmental Protection Agency. Report to Congress on server and data center energy efficiency.
[3]
S. Akram and A. Bilas. A sleep-based communication mechanism to save processor utilization in distributed streaming systems. In Proceedings of the Second Workshop on Computer Architecture and Operating System co-design, CAOS'11, 2011.
[4]
S. Akram, M. Marazakis, and A. Bilas. Understanding scalability and performance requirements of I/O intensive applications on future multicore servers. In Proceedings of the 20th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS'12. IEEE, 2012.
[5]
E. Anderson and J. Tucek. Efficiency matters! SIGOPS Oper. Syst. Rev., 44(1):40--45, 2010.
[6]
A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom. STREAM: The Stanford data stream management system. Technical Report 2004-20, Stanford InfoLab, 2004.
[7]
A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal, 15(2):121--142, 2006.
[8]
A. Benner, P. Pepeljugoski, and R. Recio. A roadmap to 100G Ethernet at the enterprise data center. IEEE Communications Magazine, 45(11):10--17, Nov. 2007.
[9]
N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, and W. Su. Myrinet: A gigabit-per-second local area network. IEEE Micro, 15(1):29--36, Feb. 1995.
[10]
M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Çetintemel, Y. Xing, and S. Zdonik. Scalable distributed stream processing. In Proceedings of the 2003 CIDR Conference, CIDR'03, 2003.
[11]
EMC2. Extracting value from chaos.
[12]
X. Fan, W.-D. Weber, and L. A. Barroso. Power provisioning for a warehouse-sized computer. In Proceedings of the 34th annual international symposium on Computer architecture, ISCA'07.
[13]
W.-c. Feng, J. G. Hurwitz, H. Newman, S. Ravot, R. L. Cottrell, O. Martin, F. Coccetti, C. Jin, X. D. Wei, and S. Low. Optimizing 10-gigabit Ethernet for networks of workstations, clusters, and grids: A case study. In Proceedings of the 2003 ACM/IEEE conference on Supercomputing, SC'03, pages 50--, Washington, DC, USA, 2003. IEEE Computer Society.
[14]
M. Galili, J. Xu, H. C. Mulvad, L. K. Oxenløwe, A. T. Clausen, P. Jeppesen, B. Luther-Davies, S. Madden, A. Rode, D.-Y. Choi, M. Pelusi, F. Luan, and B. J. Eggleton. Breakthrough switching speed with an all-optical chalcogenide glass chip: 640 Gbit/s demultiplexing. Opt. Express, 17(4):2182--2187, Feb. 2009.
[15]
B. Gedik, H. Andrade, K.-L. Wu, P. S. Yu, and M. Doo. SPADE: the System S declarative stream processing engine. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD'08, pages 1123--1134, New York, NY, USA, 2008. ACM.
[16]
V. Gulisano, R. Jiménez-Peris, M. Patiño-Martínez, and P. Valduriez. StreamCloud: A large scale data streaming system. In Proceedings of the 30th International Conference on Distributed Computing Systems, ICDCS'10, pages 126--137. IEEE Computer Society, 2010.
[17]
M. Hericko, M. B. Juric, I. Rozman, S. Beloglavec, and A. Zivkovic. Object serialization analysis and comparison in Java and .NET. SIGPLAN Not., 38:44--54, Aug. 2003.
[18]
J. Hyde. Data in flight. ACM Queue, 7(11):20--26, 2009.
[19]
InfiniBand Trade Association. Infiniband Architecture Specification, Version 1.0, Oct. 2000.
[20]
A. Jacobs. The pathologies of big data. Commun. ACM, 52:36--44, Aug. 2009.
[21]
R. Khandekar, K. Hildrum, S. Parekh, D. Rajan, J. Wolf, K.-L. Wu, H. Andrade, and B. Gedik. COLA: optimizing stream processing applications via graph partitioning. In Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware, Middleware'09, pages 1--20, New York, NY, USA, 2009. Springer-Verlag New York, Inc.
[22]
N. Mitchell. The big pileup. In Proceedings of the International Symposium on Performance Analysis of Systems Software, ISPASS'10, page 1, Mar. 2010.
[23]
N. Mitchell and G. Sevitsky. The causes of bloat, the limits of health. SIGPLAN Not., 42:245--260, Oct. 2007.
[24]
Myrinet Inc. Myrinet Express (MX): A High-Performance, Low-Level, Message-Passing Interface for Myrinet, Version 1.2, Oct. 1, 2006.
[25]
R. J. Recio. Server I/O networks past, present, and future. In Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence: experience, lessons, implications, NICELI '03, pages 163--178, New York, NY, USA, 2003. ACM.
[26]
S. Rivoire, P. Ranganathan, and C. Kozyrakis. A comparison of high-level full-system power models. In Proceedings of the conference on Power aware computing and systems, HotPower'08, pages 3--3, Berkeley, CA, USA, 2008. USENIX Association.
[27]
S. Rivoire, M. Shah, P. Ranganathan, C. Kozyrakis, and J. Meza. Models and metrics to enable energy-efficiency optimizations. Computer, 40(12):39--48, Dec. 2007.
[28]
M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. SIGMOD Rec., 34(4):42--47, 2005.
[29]
T. Suzumura, T. Yasue, and T. Onodera. Scalable performance of System S for extract-transform-load processing. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference, SYSTOR'10, pages 1--14, New York, NY, USA, 2010. ACM.
[30]
A. Vasan, A. Sivasubramaniam, V. Shimpi, T. Sivabalan, and R. Subbiah. Worth their watts? - an empirical study of datacenter servers. In Proceedings of the 16th International Symposium on High Performance Computer Architecture, HPCA'10, pages 1--10, Jan. 2010.
[31]
J. Wolf, N. Bansal, K. Hildrum, S. Parekh, D. Rajan, R. Wagle, K.-L. Wu, and L. Fleischer. SODA: An optimizing scheduler for large-scale stream-based distributed computer systems. In Proceedings of the ACM/IFIP/USENIX 9th International Middleware Conference, pages 306--325, Berlin, Heidelberg, 2008. Springer-Verlag.
[32]
A. Wright. Data streaming 2.0. Commun. ACM, 53:13--14, Apr. 2010.
[33]
Y. Xing, J.-H. Hwang, U. Çetintemel, and S. Zdonik. Providing resiliency to load variations in distributed stream processing. In Proceedings of the 32nd international conference on Very large data bases, VLDB'06, pages 775--786. VLDB Endowment, 2006.


Published In

DEBS '12: Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
July 2012
410 pages
ISBN: 9781450313155
DOI: 10.1145/2335484
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data-centric infrastructures
  2. distributed event processing
  3. stream processing engines

Qualifiers

  • Research-article

Conference

DEBS '12

Acceptance Rates

Overall Acceptance Rate 145 of 583 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 20 Jan 2025

Citations

Cited By

  • (2020) Multiple stream job performance optimization with source operator graph transformations. Concurrency and Computation: Practice and Experience, 32:16. DOI: 10.1002/cpe.5658. Online publication date: 6-Jan-2020
  • (2018) Automatic optimization of stream programs via source program operator graph transformations. Distributed and Parallel Databases, 31:4 (543-599). DOI: 10.1007/s10619-013-7130-x. Online publication date: 27-Dec-2018
  • (2018) Viper: Communication-Layer Determinism and Scaling in Low-Latency Stream Processing. Euro-Par 2017: Parallel Processing Workshops (129-140). DOI: 10.1007/978-3-319-75178-8_11. Online publication date: 8-Feb-2018
  • (2017) Energy consumption analysis of data stream processing. Software—Practice & Experience, 47:10 (1443-1462). DOI: 10.1002/spe.2458. Online publication date: 1-Oct-2017
  • (2015) Data-Streaming and Concurrent Data-Object Co-design: Overview and Algorithmic Challenges. Algorithms, Probability, Networks, and Games (242-260). DOI: 10.1007/978-3-319-24024-4_15. Online publication date: 22-Nov-2015
  • (2014) Concurrent data structures for efficient streaming aggregation. Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures (76-78). DOI: 10.1145/2612669.2612701. Online publication date: 23-Jun-2014
  • (2012) Understanding Scalability and Performance Requirements of I/O-Intensive Applications on Future Multicore Servers. Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (171-180). DOI: 10.1109/MASCOTS.2012.29. Online publication date: 7-Aug-2012
