DOI: 10.1145/2723372.2742783
research-article

Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database

Published: 27 May 2015

Abstract

Modern data-centric flows in the telecommunications industry require real-time analytical processing over a large, rapidly changing dataset. The traditional approach of separating OLTP and OLAP workloads cannot satisfy this requirement. Instead, a new class of integrated solutions for handling hybrid workloads is needed. This paper presents an industrial use case and a novel architecture that integrates key-value-based event processing and SQL-based analytical processing on the same distributed store while minimizing the total cost of ownership. Our approach combines several well-known techniques, such as shared scans, delta processing, a PAX-fashioned storage layout, and an interleaving of scanning and delta merging, in a completely new way. Performance experiments show that our system scales out linearly with the number of servers. For instance, our system sustains event streams of 100,000 events per second while simultaneously processing 100 ad-hoc analytical queries per second, using a cluster of 12 commodity servers. In doing so, our system meets all response time goals of our telecommunications customers; that is, 10 milliseconds per event and 100 milliseconds for an ad-hoc analytical query. Moreover, our system beats commercial competitors by a factor of 2.5 in analytical performance and by two orders of magnitude in update performance.
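
To make the combination of techniques named in the abstract more concrete, here is a minimal Python sketch of delta processing with a shared scan and a merge step interleaved after each scan pass. It is an illustration under assumptions, not the paper's implementation: the class `DeltaMainStore`, its methods, and the `calls` attribute are hypothetical names, the store is single-threaded and in-process, and the PAX-style layout and distribution across servers are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, int]


@dataclass
class DeltaMainStore:
    """Toy store (hypothetical): events write to a delta, scans read main."""
    main: Dict[int, Record] = field(default_factory=dict)
    delta: Dict[int, Record] = field(default_factory=dict)

    def put(self, key: int, record: Record) -> None:
        # Event-processing path: a key-value write touches only the delta.
        self.delta[key] = record

    def get(self, key: int) -> Optional[Record]:
        # Point reads see the freshest version: delta first, then main.
        if key in self.delta:
            return self.delta[key]
        return self.main.get(key)

    def merge_delta(self) -> None:
        # Fold pending updates into main and empty the delta.
        self.main.update(self.delta)
        self.delta.clear()

    def shared_scan(self, predicates: List[Callable[[Record], bool]]) -> List[int]:
        # Shared scan: one pass over main answers the whole batch of queries.
        counts = [0] * len(predicates)
        for record in self.main.values():
            for i, predicate in enumerate(predicates):
                if predicate(record):
                    counts[i] += 1
        # Interleaving (assumed order): merge the delta once the pass is done,
        # so the next pass sees the updates that arrived in the meantime.
        self.merge_delta()
        return counts


store = DeltaMainStore()
store.put(1, {"calls": 3})
store.put(2, {"calls": 7})
queries = [lambda r: r["calls"] > 5, lambda r: r["calls"] > 0]
first_pass = store.shared_scan(queries)   # main is still empty here
second_pass = store.shared_scan(queries)  # sees the merged events
```

In this sketch, analytical queries lag event writes by at most one scan pass, while point reads through `get` always see the latest write. That trade-off is a property of the sketch; the abstract does not state the freshness guarantee of the actual system.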

Published In

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. analytics
  2. event-processing
  3. oltp/olap engine

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'15: International Conference on Management of Data
May 31 - June 4, 2015
Melbourne, Victoria, Australia

Acceptance Rates

SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 31
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 30 Jan 2025

Cited By

  • (2024) RIOKV: reducing iterator overhead for efficient short-range query in LSM-tree-based key-value stores. The Journal of Supercomputing, 81:1. DOI: 10.1007/s11227-024-06735-0. Online publication date: 27-Dec-2024
  • (2024) A survey on transactional stream processing. The VLDB Journal — The International Journal on Very Large Data Bases, 33:2 (451-479). DOI: 10.1007/s00778-023-00814-z. Online publication date: 1-Mar-2024
  • (2023) BP-Tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-Trees. Proceedings of the VLDB Endowment, 16:11 (2976-2989). DOI: 10.14778/3611479.3611502. Online publication date: 24-Aug-2023
  • (2023) S/C: Speeding up Data Materialization with Bounded Memory. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (1981-1994). DOI: 10.1109/ICDE55515.2023.00393. Online publication date: Apr-2023
  • (2022) Enabling efficient and general subpopulation analytics in multidimensional data streams. Proceedings of the VLDB Endowment, 15:11 (3249-3262). DOI: 10.14778/3551793.3551867. Online publication date: 1-Jul-2022
  • (2022) Efficient Interactive Global Cellular Signal Strength Visualization. IEEE Transactions on Big Data, 8:5 (1209-1219). DOI: 10.1109/TBDATA.2020.3029559. Online publication date: 1-Oct-2022
  • (2022) Maximizing Bigdata Retrieval: Block as a Value for NoSQL over SQL. Proceedings of the 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (556-563). DOI: 10.1109/ASONAM55673.2022.10068692. Online publication date: 10-Nov-2022
  • (2021) FoundationDB: A Distributed Unbundled Transactional Key Value Store. Proceedings of the 2021 International Conference on Management of Data (2653-2666). DOI: 10.1145/3448016.3457559. Online publication date: 9-Jun-2021
  • (2021) TS-Benchmark: A Benchmark for Time Series Databases. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (588-599). DOI: 10.1109/ICDE51399.2021.00057. Online publication date: Apr-2021
  • (2020) Meet me halfway. Proceedings of the VLDB Endowment, 13:12 (2620-2633). DOI: 10.14778/3407790.3407849. Online publication date: 14-Sep-2020
