Skip to main content
Log in

Converting non-parametric distance-based classification to anytime algorithms

  • Theoretical Advances
  • Published:
Pattern Analysis and Applications Aims and scope Submit manuscript

Abstract

For many real world problems we must perform classification under widely varying amounts of computational resources. For example, if asked to classify an instance taken from a bursty stream, we may have anywhere from several milliseconds to several minutes to return a class prediction. For such problems an anytime algorithm may be especially useful. In this work we show how we convert the ubiquitous nearest neighbor classifier into an anytime algorithm that can produce an instant classification, or if given the luxury of additional time, can continue computations to increase classification accuracy. We demonstrate the utility of our approach with a comprehensive set of experiments on data from diverse domains. We further show the utility of our work with two deployed applications, in classifying and counting fish, and in classifying insects.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19
Fig. 20
Fig. 21
Fig. 22

Similar content being viewed by others

Notes

  1. Note that one of the current authors is also an author of this study. However, it is more natural to refer to this work in the third person.

References

  1. Aggarwal C, Han J, Wang J, Yu PS (2004) On demand classification of data streams. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD’04), pp 503–508

  2. Shah R, Krishnaswamy S, Gaber MM (2005) Resource-aware very fast K-means for ubiquitous data stream mining. In: Proceedings of 2nd international workshop on knowledge discovery in data streams

  3. Grass J, Zilberstein S (1996) Anytime algorithm development tools. SIGART artificial intelligence, vol 7(2). ACM Press, New York

  4. Bradley P, Fayyad U, Reina C (1998) Scaling clustering algorithms to large databases. In: Proceedings of the 4th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’98), pp 9–15

  5. Esmeir S, Markovitch S (2005) Interruptible anytime algorithms for iterative improvement of decision trees. In: Proceedings of workshop on the utility-based data mining, held with the 11th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’05)

  6. Roy N, McCallum A (2001) Toward optimal active learning through sampling estimation of error reduction. In: Proceedings of the 18th international conference on machine learning (ICML’01), pp 441–448

  7. Wei L, Keogh E, Van Herle H, Mafra-Neto A (2005) Atomic Wedgie: efficient query filtering for streaming time series. In: Proceedings of the 5th IEEE international conference on data mining (ICDM’05), pp 490–497

  8. Bozma HI, Yalcin H (2002) Visual processing and classification of items on a moving conveyor: a selective perception approach. Rob Comput Integr Manuf 18(2):125–133

    Article  Google Scholar 

  9. Adamek T, Connor NE (2004) A multiscale representation method for nonrigid shapes with a single closed contour. IEEE Circuits Syst Video Technol 14:742–753

    Article  Google Scholar 

  10. Wen Z, Tao Y (2000) Dual-camera NIR/MIR imaging for stem-end/calyx identification in apple defect sorting. Trans ASAE 43(2):446–452

    Google Scholar 

  11. Sánchez JS, Mollineda RA, Sotoca JM (2007) An analysis of how training data complexity affects the nearest neighbor classifiers. Pattern Anal Appl 10(3):189–201

    Article  MathSciNet  Google Scholar 

  12. Zilberstein S, Russell S (1995) Approximate reasoning using anytime algorithms. Imprecise and approximate computation. Kluwer, Dordrecht

  13. Myers K, Kearns MJ, Singh SP, Walker MA (2000) A boosting approach to topic spotting on subdialogues. In: Proceedings of the international conference on machine learning (ICML’00), pp 655–662

  14. Heidemann G, Bekel H, Bax II, Ritter H (2005) Interactive online learning. Pattern Recogn Image Anal 15(1):55–58

    Google Scholar 

  15. Yamada S, Nagino N (2002) Constructing a personal web map with anytime-control of web robots. Int J Coop Inform Syst 11(1–2):1–19

    Article  Google Scholar 

  16. Kotenko I, Stankevitch L (2002) The control of teams of autonomous objects in the time-constrained environments. Proc IEEE Int Conf Artif Intell Syst 158–163

  17. Lindgren T (2000) Anytime inductive logic programming. In: Proceedings of the 15th international conference on computers and their applications, pp 439–442

  18. Hulten G, Domingos P (2002) Mining complex models from arbitrarily large databases in constant time. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’02), pp 525–531

  19. Grumberg O, Livne S, Markovitch S (2003) Learning to order BDD variables in verification. J Artif Intell Res 18:83–116

    MATH  MathSciNet  Google Scholar 

  20. Webb GI, Yang Y, Boughton J, Korb K, Ting K-M (2005) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Technical Report 2005/185, Clayton School of Information Technology, Monash University

  21. Barandela R, Ferri FJ, Sánchez JS (2005) Decision boundary preserving prototype selection for nearest neighbor classification. Int J Pattern Recogn Artif Intell 19(6):787–806

    Article  Google Scholar 

  22. Pekalska E, Duin R, Paclik P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recogn 39(2):189–208

    Article  MATH  Google Scholar 

  23. Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38:257–286. (Kluwer Acadamic Publishers)

    Article  MATH  Google Scholar 

  24. Herrero JR, Juan J Navarro (2007) Exploiting computer resources for fast Nearest Neighbor Classification. Pattern Anal Appl 10(4):265–275

    Article  Google Scholar 

  25. Ueno K, Xi X, Keogh E, Lee D-J (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: Proceedings of the 6th IEEE international conference on data mining (ICDM’06), pp 623–632

  26. Xi X, Keogh E, Shelton C, Wei L, Ratanamahatana CA (2006) Fast time series classification using numerosity reduction. In: Proceedings of the 23nd international conference on machine learning (ICDM’06), Pittsburgh, PA

  27. Keogh E, Kasetty S (2002) On the need for time series data mining benchmarks: a survey and empirical demonstration. In: Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’02), Edmonton, Canada, pp 102–111

  28. Keogh E, Wei L, Xi X, Lee SH, Vlachos M (2006) LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In: Proceedings of the 32nd International Conference on Very Large Data Bases, pp 882–893

  29. http://www.ics.uci.edu/~mlearn/MLRepository.html

  30. Keogh E. http://www.cs.ucr.edu/~eamonn/ICDM06/

  31. Geurts P (2002) Contributions to decision tree induction: bias/variance tradeoff and time series classification. Ph.D. thesis, Department of Electrical Engineering and Computer Science, University of Liege, Belgium

  32. Ratanamahatana CA, Keogh E (2005) Three myths about dynamic time warping. In: Proceedings of the SIAM international conference on data mining (SDM’05), pp 506–510

  33. Lee D-J, Schoenberger R, Shiozawa D, Xu X, Zhan P (2004) Contour matching for a fish recognition and migration monitoring system. In: Proceedings of the SPIE optics east, two and three-dimensional vision systems for inspection, control, and metrology II, vol 5606-05, pp 25–28

  34. Hardin RW (2006) Vision system monitors fish populations. Vis Syst Des (January)

  35. Jolliffe IT (2002) Principal component analysis. Springer, Heidelberg

    MATH  Google Scholar 

  36. Chung K-C, Kee SC, Kim SR (1999) Face recognition using principal component analysis of Gabor filter responses. In: Proceedings of the international workshop on recognition, analysis, and tracking of faces and gestures in real-time systems, pp 53–57

  37. Bimbo AD (1999) Visual information retrieval. Morgan Khaufman, San Franscico

    Google Scholar 

  38. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of international conference on management of data, Boston, MA, pp 47–57

  39. Li M, Chen X, Li X, Ma B, Vitányi P (2003) The similarity metric. In: Proceedings of the 14th annual ACM-SIAM symposium on discrete algorithms, pp 863–872

  40. Rodríguez JJ, Alonso CJ (2004) Interval and dynamic time warping-based decision trees. In: Proceedings of the 2004 ACM symposium on applied computing, pp 548–552

  41. Guarino M, Costa A, van Hirtum A, Jans P, Ghesquiere K, Aerts JM, Navarotto P, Berckmans D (2004) Automatic detection of infective pig coughing from continuous recording in field situations. Riv Ingegneria Agraria 35(4):69–73

    Google Scholar 

Download references

Acknowledgments

We gratefully acknowledge Geoffrey Webb, Jill Brady and Ying Yang for their useful suggestions. We further wish to acknowledge Dennis Shiozawa, Xiaoqian Xua, and Pengcheng Zhana at Brigham Young University, and Robert Schoenberger with Agris-Schoen Vision Systems for their help with the fish monitoring problem, and Agenor Mafra-Neto of ISCA Technologies for his assistance with the insect classification problem. This research was partly funded by the NSF under grant IIS-0237918.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaopeng Xi.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Xi, X., Ueno, K., Keogh, E. et al. Converting non-parametric distance-based classification to anytime algorithms. Pattern Anal Applic 11, 321–336 (2008). https://doi.org/10.1007/s10044-007-0098-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10044-007-0098-2

Keywords

Navigation