Research article
DOI: 10.1145/2064676.2064687

Optimization of operator partitions in stream data warehouse

Published: 28 October 2011

Abstract

Memory and time optimization is a key task in Stream Data Warehouses (SDWs). StrETL processes in these systems resemble queries in Data Stream Management Systems (DSMSs), which allows some methods to be migrated from DSMSs to SDWs. We have observed that schedulers and the algorithms that create operator partitions are analyzed separately, whether in StrETL processes or in stream queries. In fact, these two mechanisms affect each other, so it is worth studying the potential benefits of combining them. In this paper we introduce a solution that cooperates with a scheduler to create more efficient operator partitions. The algorithm can also optimize a wider range of operator topologies. Finally, experimental evaluation shows that our solution achieves lower memory consumption or a shorter response time than the competing strategies.

Published In

DOLAP '11: Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP
October 2011
112 pages
ISBN: 9781450309639
DOI: 10.1145/2064676
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. operator partitions
  2. scheduler

Qualifiers

  • Research-article

Conference

CIKM '11

Acceptance Rates

Overall Acceptance Rate 29 of 79 submissions, 37%

Cited By

  • (2023) The stream data warehouse. Future Generation Computer Systems, 142:C (212-227). DOI: 10.1016/j.future.2023.01.003. Online publication date: 1-May-2023
  • (2014) Research on the Stream ETL Process. Beyond Databases, Architectures, and Structures (61-71). DOI: 10.1007/978-3-319-06932-6_7. Online publication date: 2014
  • (2014) User Identity Unification in e-Commerce. Advances in Systems Science (163-172). DOI: 10.1007/978-3-319-01857-7_16. Online publication date: 2014
  • (2013) On-Demand ELT Architecture for Right-Time BI. International Journal of Data Warehousing and Mining, 9:2 (21-38). DOI: 10.4018/jdwm.2013040102. Online publication date: 1-Apr-2013
  • (2013) Use of Grammars and Machine Learning in ETL Systems That Control Load Balancing Process. 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (1709-1714). DOI: 10.1109/HPCC.and.EUC.2013.243. Online publication date: Nov-2013
  • (2013) Customer Unification in E-Commerce. Proceedings of the 14th International Conference on Intelligent Data Engineering and Automated Learning - IDEAL 2013 - Volume 8206 (142-152). DOI: 10.1007/978-3-642-41278-3_18. Online publication date: 20-Oct-2013
  • (2013) Modeling Data Stream Intensity in Distributed Stream Processing System. Computer Networks (372-383). DOI: 10.1007/978-3-642-38865-1_38. Online publication date: 2013
  • (2013) Evaluation and Development Perspectives of Stream Data Processing Systems. Computer Networks (300-311). DOI: 10.1007/978-3-642-38865-1_31. Online publication date: 2013
  • (2011) DOLAP 2011. Proceedings of the 20th ACM international conference on Information and knowledge management (2645-2646). DOI: 10.1145/2063576.2064055. Online publication date: 24-Oct-2011
