Harnessing the strengths of anytime algorithms for constant data streams

Data Mining and Knowledge Discovery

Abstract

Anytime algorithms have been proposed for many different applications, e.g., in data mining. Their strengths are twofold: they provide a first result after a very short initialization, and they improve that result as additional time becomes available. Anytime algorithms have therefore been used so far when the available processing time varies, e.g., on varying data streams. In this paper we propose to employ anytime algorithms on constant data streams, i.e., for tasks with a constant time allowance. We introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the overall quality of the result compared with the corresponding budget algorithm. We derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets.
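To make the two strengths named in the abstract concrete, the following is a minimal sketch of an interruptible classifier. It is not the approach proposed in the paper: it assumes a simple nearest neighbor search as the underlying anytime algorithm and measures the time allowance in inspected training objects rather than in wall-clock time. The names `anytime_nearest_neighbor` and `classify_with_budget`, and the toy data, are hypothetical and chosen for illustration only.

```python
from typing import Iterator, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = Tuple[Point, str]


def anytime_nearest_neighbor(query: Point,
                             training: Sequence[Labeled]) -> Iterator[Optional[str]]:
    """Yield the best label found so far after each inspected training object.

    A result is available after the very first step, and it can only
    get closer to the true nearest neighbor with every further step.
    """
    best_dist = float("inf")
    best_label: Optional[str] = None
    for point, label in training:
        # Squared Euclidean distance is enough for ranking neighbors.
        dist = sum((q - p) ** 2 for q, p in zip(query, point))
        if dist < best_dist:
            best_dist, best_label = dist, label
        yield best_label


def classify_with_budget(query: Point,
                         training: Sequence[Labeled],
                         budget: int) -> Optional[str]:
    """Interrupt the anytime classifier after at most `budget` steps.

    Assumption: `budget` stands in for the constant time allowance
    per stream object; it counts steps, not seconds.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    result: Optional[str] = None
    for step, label in enumerate(anytime_nearest_neighbor(query, training), start=1):
        result = label
        if step >= budget:
            break  # interruption: return the current intermediate result
    return result


# Toy data (hypothetical): the first inspected object is a poor match.
training = [((5.0, 5.0), "b"), ((0.0, 1.0), "a"), ((0.1, 0.0), "a"), ((4.0, 4.0), "b")]
query = (0.0, 0.0)

quick = classify_with_budget(query, training, budget=1)
refined = classify_with_budget(query, training, budget=4)
```

With a budget of one step the sketch already returns a label, and a larger budget lets the intermediate result converge to the label of the true nearest neighbor. A constant stream corresponds to calling `classify_with_budget` with the same `budget` for every arriving object.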

Author information

Corresponding author

Correspondence to Philipp Kranen.

Additional information

Responsible editors: Aleksander Kołcz, Wray Buntine, Marko Grobelnik, Dunja Mladenic, and John Shawe-Taylor.

About this article

Cite this article

Kranen, P., Seidl, T. Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Disc 19, 245–260 (2009). https://doi.org/10.1007/s10618-009-0139-0

  • DOI: https://doi.org/10.1007/s10618-009-0139-0
