Abstract
For many real-world problems we must perform classification with widely varying amounts of computational resources. For example, if asked to classify an instance taken from a bursty stream, we may have anywhere from several milliseconds to several minutes to return a class prediction. For such problems an anytime algorithm may be especially useful. In this work we show how to convert the ubiquitous nearest neighbor classifier into an anytime algorithm that can produce an instant classification or, if given the luxury of additional time, continue its computations to increase classification accuracy. We demonstrate the utility of our approach with a comprehensive set of experiments on data from diverse domains. We further show the utility of our work with two deployed applications: classifying and counting fish, and classifying insects.
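As a rough illustration of the idea described above, the following is a minimal sketch of an interruptible 1-nearest-neighbor classifier in Python. It assumes Euclidean distance and assumes the training instances have already been placed in an order that puts the most useful ones first. The ordering heuristic `vote_order` (a simple leave-one-out vote count) and every function name here are hypothetical illustrations, not the ordering method or implementation used in this article. Because a label is available after every distance computation, the classifier can be stopped at any point and still return a prediction.

```python
import math
from itertools import islice
from typing import Iterator, List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed choice of distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vote_order(train_x: List[Sequence[float]], train_y: List[str]) -> List[int]:
    """Hypothetical ordering heuristic, not the article's method.

    Each instance gets +1 whenever it is the leave-one-out nearest neighbor
    of an instance of the same class and -1 when the classes differ.
    Higher-scoring instances are examined first. Requires at least 2 instances.
    """
    n = len(train_x)
    score = [0] * n
    for j in range(n):
        nn = min((i for i in range(n) if i != j),
                 key=lambda i: euclidean(train_x[i], train_x[j]))
        score[nn] += 1 if train_y[nn] == train_y[j] else -1
    # sorted() is stable, so ties keep their original index order.
    return sorted(range(n), key=lambda i: -score[i])


def anytime_1nn(query: Sequence[float], train_x, train_y,
                order: List[int]) -> Iterator[str]:
    """Yield the best label found so far after each distance computation."""
    best_dist = math.inf
    best_label: Optional[str] = None
    for i in order:
        d = euclidean(query, train_x[i])
        if d < best_dist:
            best_dist, best_label = d, train_y[i]
        yield best_label


def classify_with_budget(query, train_x, train_y, order, budget: int) -> Optional[str]:
    """Interrupt the anytime classifier after `budget` distance computations."""
    label = None
    for label in islice(anytime_1nn(query, train_x, train_y, order), max(budget, 1)):
        pass
    return label


if __name__ == "__main__":
    train_x = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    train_y = ["a", "a", "b", "b"]
    order = vote_order(train_x, train_y)
    query = [5.0, 4.9]
    print(classify_with_budget(query, train_x, train_y, order, budget=1))  # a (interrupted early)
    print(classify_with_budget(query, train_x, train_y, order, budget=4))  # b
```

In this toy run, stopping after a single distance computation returns the wrong class, while allowing more time corrects it. This is the accuracy-for-time trade-off an anytime classifier is meant to expose. A better ordering would make early answers more reliable. The generator design lets the caller, rather than the classifier, decide when to stop.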
Notes
Note that one of the current authors is also an author of this study. However, it is more natural to refer to this work in the third person.
References
Aggarwal C, Han J, Wang J, Yu PS (2004) On demand classification of data streams. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD’04), pp 503–508
Shah R, Krishnaswamy S, Gaber MM (2005) Resource-aware very fast K-means for ubiquitous data stream mining. In: Proceedings of 2nd international workshop on knowledge discovery in data streams
Grass J, Zilberstein S (1996) Anytime algorithm development tools. SIGART artificial intelligence, vol 7(2). ACM Press, New York
Bradley P, Fayyad U, Reina C (1998) Scaling clustering algorithms to large databases. In: Proceedings of the 4th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’98), pp 9–15
Esmeir S, Markovitch S (2005) Interruptible anytime algorithms for iterative improvement of decision trees. In: Proceedings of workshop on the utility-based data mining, held with the 11th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’05)
Roy N, McCallum A (2001) Toward optimal active learning through sampling estimation of error reduction. In: Proceedings of the 18th international conference on machine learning (ICML’01), pp 441–448
Wei L, Keogh E, Van Herle H, Mafra-Neto A (2005) Atomic Wedgie: efficient query filtering for streaming time series. In: Proceedings of the 5th IEEE international conference on data mining (ICDM’05), pp 490–497
Bozma HI, Yalcin H (2002) Visual processing and classification of items on a moving conveyor: a selective perception approach. Rob Comput Integr Manuf 18(2):125–133
Adamek T, O’Connor NE (2004) A multiscale representation method for nonrigid shapes with a single closed contour. IEEE Trans Circuits Syst Video Technol 14:742–753
Wen Z, Tao Y (2000) Dual-camera NIR/MIR imaging for stem-end/calyx identification in apple defect sorting. Trans ASAE 43(2):446–452
Sánchez JS, Mollineda RA, Sotoca JM (2007) An analysis of how training data complexity affects the nearest neighbor classifiers. Pattern Anal Appl 10(3):189–201
Zilberstein S, Russell S (1995) Approximate reasoning using anytime algorithms. Imprecise and approximate computation. Kluwer, Dordrecht
Myers K, Kearns MJ, Singh SP, Walker MA (2000) A boosting approach to topic spotting on subdialogues. In: Proceedings of the international conference on machine learning (ICML’00), pp 655–662
Heidemann G, Bekel H, Bax I, Ritter H (2005) Interactive online learning. Pattern Recogn Image Anal 15(1):55–58
Yamada S, Nagino N (2002) Constructing a personal web map with anytime-control of web robots. Int J Coop Inform Syst 11(1–2):1–19
Kotenko I, Stankevitch L (2002) The control of teams of autonomous objects in the time-constrained environments. Proc IEEE Int Conf Artif Intell Syst, pp 158–163
Lindgren T (2000) Anytime inductive logic programming. In: Proceedings of the 15th international conference on computers and their applications, pp 439–442
Hulten G, Domingos P (2002) Mining complex models from arbitrarily large databases in constant time. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’02), pp 525–531
Grumberg O, Livne S, Markovitch S (2003) Learning to order BDD variables in verification. J Artif Intell Res 18:83–116
Webb GI, Yang Y, Boughton J, Korb K, Ting K-M (2005) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Technical Report 2005/185, Clayton School of Information Technology, Monash University
Barandela R, Ferri FJ, Sánchez JS (2005) Decision boundary preserving prototype selection for nearest neighbor classification. Int J Pattern Recogn Artif Intell 19(6):787–806
Pekalska E, Duin R, Paclik P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recogn 39(2):189–208
Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286
Herrero JR, Navarro JJ (2007) Exploiting computer resources for fast nearest neighbor classification. Pattern Anal Appl 10(4):265–275
Ueno K, Xi X, Keogh E, Lee D-J (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: Proceedings of the 6th IEEE international conference on data mining (ICDM’06), pp 623–632
Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In: Proceedings of the 23rd international conference on machine learning (ICML’06), Pittsburgh, PA
Keogh E, Kasetty S (2002) On the need for time series data mining benchmarks: a survey and empirical demonstration. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’02), Edmonton, Canada, pp 102–111
Keogh E, Wei L, Xi X, Lee SH, Vlachos M (2006) LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In: Proceedings of the 32nd international conference on very large data bases (VLDB’06), pp 882–893
Geurts P (2002) Contributions to decision tree induction: bias/variance tradeoff and time series classification. Ph.D. thesis, Department of Electrical Engineering and Computer Science, University of Liege, Belgium
Ratanamahatana CA, Keogh E (2005) Three myths about dynamic time warping. In: Proceedings of the SIAM international conference on data mining (SDM’05), pp 506–510
Lee D-J, Schoenberger R, Shiozawa D, Xu X, Zhan P (2004) Contour matching for a fish recognition and migration monitoring system. In: Proceedings of the SPIE optics east, two and three-dimensional vision systems for inspection, control, and metrology II, vol 5606-05, pp 25–28
Hardin RW (2006) Vision system monitors fish populations. Vis Syst Des (January)
Jolliffe IT (2002) Principal component analysis. Springer, Heidelberg
Chung K-C, Kee SC, Kim SR (1999) Face recognition using principal component analysis of Gabor filter responses. In: Proceedings of the international workshop on recognition, analysis, and tracking of faces and gestures in real-time systems, pp 53–57
Bimbo AD (1999) Visual information retrieval. Morgan Kaufmann, San Francisco
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of international conference on management of data, Boston, MA, pp 47–57
Li M, Chen X, Li X, Ma B, Vitányi P (2003) The similarity metric. In: Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms, pp 863–872
Rodríguez JJ, Alonso CJ (2004) Interval and dynamic time warping-based decision trees. In: Proceedings of the 2004 ACM symposium on applied computing, pp 548–552
Guarino M, Costa A, van Hirtum A, Jans P, Ghesquiere K, Aerts JM, Navarotto P, Berckmans D (2004) Automatic detection of infective pig coughing from continuous recording in field situations. Riv Ingegneria Agraria 35(4):69–73
Acknowledgments
We gratefully acknowledge Geoffrey Webb, Jill Brady and Ying Yang for their useful suggestions. We further wish to acknowledge Dennis Shiozawa, Xiaoqian Xu and Pengcheng Zhan at Brigham Young University, and Robert Schoenberger of Agris-Schoen Vision Systems, for their help with the fish monitoring problem, and Agenor Mafra-Neto of ISCA Technologies for his assistance with the insect classification problem. This research was partly funded by the NSF under grant IIS-0237918.
About this article
Cite this article
Xi, X., Ueno, K., Keogh, E. et al. Converting non-parametric distance-based classification to anytime algorithms. Pattern Anal Applic 11, 321–336 (2008). https://doi.org/10.1007/s10044-007-0098-2
DOI: https://doi.org/10.1007/s10044-007-0098-2