Abstract
Anytime algorithms have been proposed for many different applications, e.g., in data mining. Their strengths are, first, the ability to provide a result after a very short initialization and, second, the ability to improve that result when additional time is available. Consequently, anytime algorithms have so far been used where the available processing time varies, e.g., on varying data streams. In this paper we propose to employ anytime algorithms on constant data streams, i.e., for tasks with a constant time allowance. We introduce two approaches that harness the strengths of anytime algorithms on constant data streams and thereby improve the overall quality of the result compared with the corresponding budget algorithm. We derive formulas for the expected performance gain and demonstrate the effectiveness of our novel approaches using existing anytime algorithms on benchmark data sets.
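The two strengths named in the abstract (an early first result, then improvement with more time) can be illustrated with a minimal sketch. This is not one of the paper's two approaches, which the abstract does not detail. It is a generic anytime nearest-neighbour search, and the names `anytime_nearest` and `run_with_budget` are hypothetical. To keep the example deterministic, the time allowance is assumed to be a count of refinement steps rather than wall-clock time, and the point list is assumed to be non-empty.

```python
from typing import Iterator, Sequence, Tuple

Point = Tuple[float, float]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def anytime_nearest(query: Point, points: Sequence[Point]) -> Iterator[Point]:
    """Yield the best-so-far nearest neighbour after every refinement step.

    Assumes `points` is non-empty (illustrative restriction).
    """
    best = points[0]
    best_d = dist2(query, best)
    yield best  # a first result is available right after initialization
    for p in points[1:]:
        d = dist2(query, p)
        if d < best_d:
            best, best_d = p, d
        yield best  # the result can only stay the same or improve


def run_with_budget(steps: Iterator[Point], budget: int) -> Point:
    """Interrupt the anytime computation after `budget` results.

    A budget below 1 is treated as 1, since the first result always exists.
    """
    result = next(steps)
    for _ in range(budget - 1):
        try:
            result = next(steps)
        except StopIteration:
            break  # the algorithm finished before the budget ran out
    return result


if __name__ == "__main__":
    data = [(5.0, 5.0), (3.0, 3.0), (1.0, 1.0), (0.5, 0.5)]
    # On a constant stream, every query would receive the same budget.
    for budget in (1, 2, 10):
        print(budget, run_with_budget(anytime_nearest((0.0, 0.0), data), budget))
```

With a larger step budget the returned neighbour is never farther from the query than with a smaller one, which is the property that makes interruption at any point safe.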
Additional information
Responsible editors: Aleksander Kołcz, Wray Buntine, Marko Grobelnik, Dunja Mladenic, and John Shawe-Taylor.
About this article
Cite this article
Kranen, P., Seidl, T. Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Disc 19, 245–260 (2009). https://doi.org/10.1007/s10618-009-0139-0
DOI: https://doi.org/10.1007/s10618-009-0139-0