DOI: 10.1145/2611286.2611304
research-article

Scalable and elastic realtime click stream analysis using StreamMine3G

Published: 26 May 2014

Abstract

Click stream analysis is a common approach for analyzing how customers navigate e-commerce or social network sites. Performing this analysis in real time opens up new business opportunities and can increase revenue, because recommendations can be generated on the fly, making a previously unknown product attractive to a potential customer.
Because click streams fluctuate heavily and must be processed in real time, there is high demand for Event-Stream-Processing (ESP) engines that (1) scale horizontally as well as vertically, (2) are elastic enough to cope with fluctuations in the data stream, and (3) provide efficient state management mechanisms to drive this kind of analysis. However, the majority of current ESP engines, such as Apache S4 or Storm, provide neither explicit state management nor techniques for elastic scaling.
In this paper, we present StreamMine3G, a scalable and elastic ESP engine that provides state management out of the box, scales with the number of nodes as well as cores, and improves performance through a novel delegation mechanism that lowers contention on state and on network links caused by fluctuations and temporary imbalances in the data streams.
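
As a rough illustration of what explicit, partitioned state management can mean for click stream analysis, the following minimal Python sketch keeps per-user page-view counts in hash-partitioned state slices. It is not StreamMine3G's API: the class `ClickCountOperator`, its method names, and the use of CRC32 for partitioning are assumptions made only for this example.

```python
from collections import Counter, defaultdict
from zlib import crc32


class ClickCountOperator:
    """Toy stateful operator with one state slice per partition (hypothetical design)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # Each slice holds per-user page-view counters for the keys routed to it.
        self.slices = [defaultdict(Counter) for _ in range(num_partitions)]

    def partition_of(self, user_id):
        # Stable hash, so the same user always maps to the same slice.
        return crc32(user_id.encode("utf-8")) % self.num_partitions

    def process(self, user_id, page):
        # Update the state for one click event and return the new count.
        state = self.slices[self.partition_of(user_id)]
        state[user_id][page] += 1
        return state[user_id][page]

    def top_pages(self, user_id, n=1):
        # The most-viewed pages for a user: a naive basis for a recommendation.
        state = self.slices[self.partition_of(user_id)]
        return [page for page, _ in state[user_id].most_common(n)]


op = ClickCountOperator(num_partitions=4)
clicks = [("alice", "/shoes"), ("bob", "/hats"), ("alice", "/shoes"), ("alice", "/bags")]
for user, page in clicks:
    op.process(user, page)
print(op.top_pages("alice"))
```

Because each key touches exactly one slice, slices could in principle be assigned to different cores or nodes and moved between them when load shifts. How StreamMine3G actually partitions, delegates, and migrates state is described in the paper itself, not in this sketch.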


Published In

DEBS '14: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems
May 2014
371 pages
ISBN: 9781450327374
DOI: 10.1145/2611286
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ESP
  2. click stream analysis
  3. migration
  4. scalability
  5. state management

Qualifiers

  • Research-article

Conference

DEBS '14

Acceptance Rates

DEBS '14 Paper Acceptance Rate: 16 of 174 submissions, 9%
Overall Acceptance Rate: 145 of 583 submissions, 25%

Cited By

  • (2024) To Migrate or Not to Migrate: An Analysis of Operator Migration in Distributed Stream Processing. IEEE Communications Surveys & Tutorials, 26:1 (670-705). DOI: 10.1109/COMST.2023.3330953. Online publication date: Sep-2025
  • (2022) STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing. IEEE Transactions on Parallel and Distributed Systems, 33:12 (4221-4238). DOI: 10.1109/TPDS.2022.3181979. Online publication date: 1-Dec-2022
  • (2022) Elastic Resource Management in Stream Processing. Encyclopedia of Big Data Technologies (1-7). DOI: 10.1007/978-3-319-63962-8_191-2. Online publication date: 17-Mar-2022
  • (2021) MEAD: Model-Based Vertical Auto-Scaling for Data Stream Processing. 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (314-323). DOI: 10.1109/CCGrid51090.2021.00041. Online publication date: May-2021
  • (2021) Self-adaptation on parallel stream processing: A systematic review. Concurrency and Computation: Practice and Experience, 34:6. DOI: 10.1002/cpe.6759. Online publication date: 7-Dec-2021
  • (2019) Automating Multi-level Performance Elastic Components for IBM Streams. Proceedings of the 20th International Middleware Conference (163-175). DOI: 10.1145/3361525.3361544. Online publication date: 9-Dec-2019
  • (2019) STRETCH. Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems (7-18). DOI: 10.1145/3328905.3329509. Online publication date: 24-Jun-2019
  • (2019) Pec: Proactive Elastic Collaborative Resource Scheduling in Data Stream Processing. IEEE Transactions on Parallel and Distributed Systems, 30:7 (1628-1642). DOI: 10.1109/TPDS.2019.2891587. Online publication date: 1-Jul-2019
  • (2019) Elasticity. Encyclopedia of Big Data Technologies (693-699). DOI: 10.1007/978-3-319-77525-8_191. Online publication date: 20-Feb-2019
  • (2018) C-Stream. ACM Transactions on Parallel Computing, 4:3 (1-27). DOI: 10.1145/3184120. Online publication date: 27-Apr-2018
