DOI: 10.1145/2076623.2076654
research-article

Black-box determination of cost models' parameters for federated stream-processing systems

Published: 21 September 2011

Abstract

To distribute and deploy queries in distributed stream-processing environments, it is vital to estimate the expected costs in advance. When heterogeneous Stream-Processing Systems (SPSs) run on various hosts, the parameters of an operator's cost model must be determined by measurement for each relevant combination of SPS and hardware.
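As a minimal illustration of determining parameters per combination, the sketch below assumes a hypothetical linear cost model, `cost = c0 + c1 * rate`, for a single operator and fits `c0` and `c1` separately for every (SPS, host) pair by ordinary least squares on measured samples. The model form, the function names `fit_linear_cost` and `fit_per_combination`, and the sample data are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def fit_linear_cost(samples):
    """Ordinary least squares for the assumed model cost = c0 + c1 * rate.

    samples: list of (input_rate, measured_cost) pairs for ONE (SPS, host) pair.
    Returns (c0, c1).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("measurements need at least two distinct input rates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    c1 = sxy / sxx            # marginal cost per unit of input rate
    c0 = mean_y - c1 * mean_x  # fixed (rate-independent) cost
    return c0, c1


def fit_per_combination(measurements):
    """Group (sps, host, rate, cost) tuples and fit one parameter set per (sps, host)."""
    groups = defaultdict(list)
    for sps, host, rate, cost in measurements:
        groups[(sps, host)].append((rate, cost))
    return {key: fit_linear_cost(samples) for key, samples in groups.items()}


if __name__ == "__main__":
    # Hypothetical measurements: (SPS, host, input rate, observed cost).
    measurements = [
        ("SPS-A", "host1", 100.0, 12.0),
        ("SPS-A", "host1", 200.0, 22.0),
        ("SPS-A", "host1", 300.0, 32.0),
        ("SPS-B", "host1", 100.0, 30.0),
        ("SPS-B", "host1", 200.0, 50.0),
    ]
    params = fit_per_combination(measurements)
    for key, (c0, c1) in sorted(params.items()):
        print(key, round(c0, 3), round(c1, 3))
```

Because each (SPS, host) key receives its own parameters, the same operator can get different cost estimates on different SPS/hardware combinations, which is why the measurements have to be repeated per combination.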
This paper presents a black-box method that determines the parameters of appropriate cost models that account for system-specific behavior. For some SPSs, no appropriate cost model may be available because knowledge of their internals is lacking. If no cost model is available for any reason, we provide and apply a non-parametric model.
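When no parametric cost model is available, one simple non-parametric option is to estimate costs directly from the measured points. The sketch below uses piecewise-linear interpolation over (input rate, cost) samples; this technique is a stand-in chosen for illustration, and the paper's own non-parametric model may differ. The class name `InterpolatedCostModel` and the sample values are hypothetical.

```python
from bisect import bisect_left


class InterpolatedCostModel:
    """Non-parametric cost estimate by piecewise-linear interpolation.

    Illustrative fallback only; not necessarily the paper's non-parametric model.
    """

    def __init__(self, samples):
        # Collect measured costs per input rate; repeated rates are averaged.
        by_rate = {}
        for rate, cost in samples:
            by_rate.setdefault(rate, []).append(cost)
        if len(by_rate) < 2:
            raise ValueError("need measurements at two or more distinct input rates")
        self.xs = sorted(by_rate)
        self.ys = [sum(by_rate[x]) / len(by_rate[x]) for x in self.xs]

    def predict(self, rate):
        xs, ys = self.xs, self.ys
        # Choose the segment that contains `rate`; outside the measured range,
        # the first or last segment is extended linearly.
        i = bisect_left(xs, rate)
        i = min(max(i, 1), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        estimate = y0 + (y1 - y0) * (rate - x0) / (x1 - x0)
        # A cost cannot be negative, so clamp extrapolated values at zero.
        return max(0.0, estimate)


if __name__ == "__main__":
    model = InterpolatedCostModel([(100.0, 10.0), (200.0, 30.0), (400.0, 50.0)])
    for rate in (150.0, 300.0, 500.0):
        print(rate, model.predict(rate))
```

Outside the measured range the sketch extends the nearest segment and clamps at zero; whether such extrapolation is acceptable depends on how far beyond the measured load a query will actually be deployed.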

Cited By

  • (2015) Machines Tuning Machines. Proceedings of the 2015 IEEE International Conference on Cluster Computing, pp. 22-31. DOI: 10.1109/CLUSTER.2015.13. Online publication date: 8 Sep 2015.
  • (2015) Placement-Safe Operator-Graph Changes in Distributed Heterogeneous Data Stream Systems. Datenbank-Spektrum 15(3):203-211. DOI: 10.1007/s13222-015-0196-z. Online publication date: 14 Sep 2015.
  • (2012) Data Stream Application Manager (DSAM). Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, pp. 381-382. DOI: 10.1145/2335484.2335532. Online publication date: 16 Jul 2012.

Published In

ACM Other conferences
IDEAS '11: Proceedings of the 15th Symposium on International Database Engineering & Applications
September 2011
274 pages
ISBN:9781450306270
DOI:10.1145/2076623
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. (self-)management
  2. cost models
  3. heterogeneous event-based systems
  4. optimization techniques
  5. performance modeling

Conference

IDEAS '11

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0

Reflects downloads up to 15 Feb 2025.
