Harnessing the strengths of anytime algorithms for constant data streams

Kranen, Philipp; Seidl, Thomas

doi:10.1007/s10618-009-0139-0

Published: 22 July 2009

Volume 19, pages 245–260, (2009)

Data Mining and Knowledge Discovery

Philipp Kranen¹ &
Thomas Seidl¹

Abstract

Anytime algorithms have been proposed for many different applications, e.g., in data mining. Their strengths are twofold: first, they provide a result after a very short initialization, and second, they improve that result when given additional time. Therefore, anytime algorithms have so far been used when the available processing time varies, e.g., on varying data streams. In this paper we propose to employ anytime algorithms on constant data streams, i.e., for tasks with a constant time allowance. We introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the overall quality of the result relative to the corresponding budget algorithm. We derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets.
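
The anytime property described above can be illustrated with a minimal Python sketch. It is not one of the two approaches proposed in the paper, which the abstract does not detail; it only shows a classifier that returns a first result after a very short initialization and refines it with each further step. The nearest-neighbor classifier, the names `anytime_nearest_neighbor` and `classify_with_budget`, the toy data, and the use of a step count in place of wall-clock time are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, str]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def anytime_nearest_neighbor(query: Point, training: List[Example]) -> Iterator[str]:
    """Yield the best label found so far after each examined training example.

    The first yield happens after looking at a single example (short
    initialization); every later yield can only keep or improve the match.
    """
    best_point, best_label = training[0]
    best_dist = squared_distance(query, best_point)
    yield best_label
    for point, label in training[1:]:
        dist = squared_distance(query, point)
        if dist < best_dist:
            best_dist, best_label = dist, label
        yield best_label


def classify_with_budget(query: Point, training: List[Example], budget_steps: int) -> str:
    """Interrupt the anytime classifier after a fixed number of steps.

    A constant budget per item stands in for a constant data stream
    (hypothetical simplification: steps instead of elapsed time).
    """
    if budget_steps < 1:
        raise ValueError("budget_steps must be at least 1")
    result = training[0][1]
    for step, label in enumerate(anytime_nearest_neighbor(query, training), start=1):
        result = label
        if step >= budget_steps:
            break
    return result


# Toy data, invented for this sketch.
TRAINING: List[Example] = [
    ((4.0, 4.0), "b"),
    ((0.0, 0.0), "a"),
    ((1.2, 0.9), "b"),
]
QUERY: Point = (1.0, 1.0)

if __name__ == "__main__":
    for budget in (1, 2, 3):
        print(budget, classify_with_budget(QUERY, TRAINING, budget))
```

With this toy data the answer changes as the budget grows, because each extra step may uncover a closer training example. A budget larger than the training set simply returns the final result.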

Author information

Authors and Affiliations

Data Management and Data Exploration Group, RWTH Aachen University, 52056, Aachen, Germany
Philipp Kranen & Thomas Seidl

Corresponding author

Correspondence to Philipp Kranen.

Additional information

Responsible editors: Aleksander Kołcz, Wray Buntine, Marko Grobelnik, Dunja Mladenic, and John Shawe-Taylor.

About this article

Cite this article

Kranen, P., Seidl, T. Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Disc 19, 245–260 (2009). https://doi.org/10.1007/s10618-009-0139-0

Received: 12 June 2009
Accepted: 24 June 2009
Published: 22 July 2009
Issue Date: October 2009
DOI: https://doi.org/10.1007/s10618-009-0139-0
