Clustering and maximum likelihood search for efficient statistical classification with medium-sized databases

  • Original Paper
  • Optimization Letters

Abstract

This paper addresses the insufficient performance of statistical classification with medium-sized databases (thousands of classes). Each object is represented as a sequence of independent segments, and each segment is defined as a random sample of independent features whose distribution is of the multivariate exponential type. To speed up classification based on the optimal Kullback–Leibler minimum information discrimination principle, we cluster the training set and perform an approximate nearest neighbor search of the input object in the set of cluster medoids. Using the asymptotic properties of the Kullback–Leibler divergence, we propose a maximum likelihood search procedure, in which the next medoid to check is selected from the cluster with the maximal joint density (likelihood) of the distances to the previously checked medoids. Experimental results in image recognition with an artificially generated dataset and the Essex facial database demonstrate that the proposed approach is much more effective than exhaustive search and the known approximate nearest neighbor methods from the FLANN and NonMetricSpace libraries.
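The search procedure summarized above can be illustrated with a short Python sketch. It is not the author's implementation, and several parts are assumptions: the clustering step is taken as already done (the medoids and cluster members are given), objects are hypothetical lists of discrete histograms (one per segment), the distance is the sum of per-segment discrete Kullback–Leibler divergences, and the likelihood of the observed distances is modeled by a Gaussian with a hypothetical constant spread `sigma`, a simplification standing in for the density the paper obtains from the asymptotic properties of the Kullback–Leibler divergence. The names `ml_search`, `pairwise_distances`, `log_likelihood` and `classify` are illustrative only.

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """Discrete Kullback-Leibler divergence KL(p || q); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def object_distance(x, r):
    """Sum of per-segment KL divergences; x and r are lists of histograms (segments)."""
    return sum(kl_divergence(xs, rs) for xs, rs in zip(x, r))

def pairwise_distances(medoids):
    """pairwise[j][i] = distance from medoid j to medoid i (KL is asymmetric)."""
    return [[object_distance(a, b) for b in medoids] for a in medoids]

def log_likelihood(j, checked, pairwise, sigma):
    """Log-likelihood (up to a constant) of the distances observed so far if the
    input object coincided with medoid j. ASSUMPTION: Gaussian model with constant
    sigma, not the density used in the paper."""
    return -sum((d - pairwise[j][i]) ** 2 for i, d in checked.items()) / (2 * sigma ** 2)

def ml_search(x, medoids, pairwise, max_checks, sigma=0.1, threshold=0.0, start=0):
    """Approximate nearest-medoid search; returns (best index, distance, #checks)."""
    checked = {}      # medoid index -> observed distance from x
    current = start   # hypothetical choice of the first medoid to check
    for _ in range(min(max_checks, len(medoids))):
        checked[current] = object_distance(x, medoids[current])
        if checked[current] <= threshold:  # close enough: stop early
            break
        unchecked = [j for j in range(len(medoids)) if j not in checked]
        if not unchecked:
            break
        # Check next the medoid under which the distances seen so far are most likely.
        current = max(unchecked, key=lambda j: log_likelihood(j, checked, pairwise, sigma))
    best = min(checked, key=checked.get)
    return best, checked[best], len(checked)

def classify(x, medoids, clusters, pairwise, max_checks):
    """Find a medoid with ml_search, then search exhaustively inside its cluster.
    clusters[k] is a list of (label, object) pairs for the members of cluster k."""
    k, _, _ = ml_search(x, medoids, pairwise, max_checks)
    label, _ = min(((lab, object_distance(x, r)) for lab, r in clusters[k]),
                   key=lambda t: t[1])
    return label

# Toy usage: four single-segment medoids over three histogram bins.
medoids = [[[0.7, 0.2, 0.1]], [[0.1, 0.7, 0.2]], [[0.2, 0.1, 0.7]], [[0.4, 0.4, 0.2]]]
clusters = [
    [("A", medoids[0])],
    [("B", medoids[1]), ("C", [[0.1, 0.6, 0.3]])],
    [("D", medoids[2])],
    [("E", medoids[3])],
]
x = [[0.15, 0.65, 0.2]]
pw = pairwise_distances(medoids)
print(ml_search(x, medoids, pw, max_checks=2))
print(classify(x, medoids, clusters, pw, max_checks=2))
```

On this toy input the search reaches medoid 1 after checking two of the four medoids, and the in-cluster search returns label "B". With a constant `sigma`, the choice of the next medoid reduces to minimizing the squared mismatch between the observed distances and the precomputed medoid-to-medoid distances; a model whose variance depends on the distance, as an asymptotic KL-based density may, could rank candidates differently.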

Fig. 1
Fig. 2

Notes

  1. https://sites.google.com/site/andreyvsavchenko/ImageRecognitionTest_VS13.zip.

  2. http://cswww.essex.ac.uk/mv/allfaces/index.html.

References

  1. Aggarwal, C.: Data Mining: The Textbook. Springer, New York (2015)

  2. Andoni, A., Indyk, P.: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’06, pp. 459–468 (2006)

  3. Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y.: An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)

  4. Beis, J.S., Lowe, D.G.: Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1000–1006 (1997)

  5. Boginski, V., Butenko, S., Pardalos, P.M.: Mining market data: a network approach. Comput. Oper. Res. 33(11), 3171–3184 (2006)

  6. Boytsov, L., Naidan, B.: Engineering Efficient and Effective Non-metric Space Library. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) Similarity and Applications, Lecture Notes in Computer Science, vol. 8199, pp. 280–293. Springer, Berlin (2013)

  7. Bustos, B., Navarro, G., Chávez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24(14), 2357–2366 (2003)

  8. Cayton, L.: Efficient Bregman range search. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 22, pp. 243–251. Curran Associates, Inc. (2009)

  9. Chen, S., Zhang, D., Zhou, Z.H.: Enhanced (PC)²A for face recognition with one training image per person. Pattern Recognit. Lett. 25(10), 1173–1181 (2004)

  10. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)

  11. Defays, D.: An efficient algorithm for a complete link method. Comput. J. 20(4), 364–366 (1977)

  12. Eick, C.F., Zeidat, N.: Using supervised clustering to enhance classifiers. In: Hacid, M.S., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds.) Foundations of Intelligent Systems, Lecture Notes in Computer Science, vol. 3488, pp. 248–256. Springer, Berlin (2005)

  13. Gonzalez, E., Figueroa, K., Navarro, G.: Effective proximity retrieval by ordering permutations. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1647–1658 (2008)

  14. Guarracino, M.R., Chinchuluun, A., Pardalos, P.M.: Decision rules for efficient classification of biological data. Optim. Lett. 3(3), 357–366 (2009)

  15. Kullback, S.: Information Theory and Statistics. Dover Publications, Mineola (1997)

  16. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses, 3rd edn. Springer, New York (2008)

  17. Li, S.Z., Jain, A.K. (eds.): Handbook of Face Recognition, 2nd edn. Springer, New York (2011)

  18. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  19. Micó, M.L., Oncina, J., Vidal, E.: A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognit. Lett. 15(1), 9–17 (1994)

  20. Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman and Hall/CRC, Boca Raton (2005)

  21. Mladenovic, N., Brimberg, J., Hansen, P., Moreno-Perez, J.A.: The p-median problem: a survey of metaheuristic approaches. Eur. J. Oper. Res. 179(3), 927–939 (2007)

  22. Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP International Conference on Computer Vision Theory and Applications, pp. 331–340 (2009)

  23. Prince, S.: Computer Vision: Models, Learning, and Inference. Cambridge University Press, New York (2012)

  24. Sabo, K., Scitovski, R., Vazler, I.: One-dimensional center-based l1-clustering method. Optim. Lett. 7(1), 5–22 (2011)

  25. Savchenko, A.V.: Directed enumeration method in image recognition. Pattern Recognit. 45(8), 2952–2961 (2012)

  26. Savchenko, A.V.: Real-time image recognition with the parallel directed enumeration method. In: Chen, M., Leibe, B., Neumann, B. (eds.) Computer Vision Systems, Lecture Notes in Computer Science, vol. 7963, pp. 123–132. Springer, Berlin (2013)

  27. Silpa-Anan, C., Hartley, R.: Optimised KD-trees for fast image descriptor matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)

  28. Syed, M.N., Pardalos, P.M., Principe, J.C.: On the optimization properties of the correntropic loss function in data analysis. Optim. Lett. 8(3), 823–839 (2013)

  29. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1701–1708 (2014)

  30. Takci, H., Gungor, T.: A high performance centroid-based classification approach for language identification. Pattern Recognit. Lett. 33(16), 2077–2084 (2012)

  31. Tan, X., Chen, S., Zhou, Z.H., Zhang, F.: Face recognition from a single image per person: a survey. Pattern Recognit. 39(9), 1725–1745 (2006)

  32. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, Burlington (2008)

  33. Wang, X., Li, Z., Zhang, L., Yuan, J.: Grassmann Hashing for approximate nearest neighbor search in high dimensional space. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6 (2011)

  34. Yodkhad, P., Kawewong, A., Patanukhom, K.: Approximate nearest neighbor search using self-organizing map clustering for face recognition system. In: International Computer Science and Engineering Conference (ICSEC), pp. 151–156 (2014)

  35. Zhang, N., Yang, J., Qian, J.J.: Component-based global k-NN classifier for small sample size problems. Pattern Recognit. Lett. 33(13), 1689–1694 (2012)

Acknowledgments

The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2015–2016 (grant No 15-01-0019) and supported within the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Global Competitiveness Program.

Author information

Correspondence to Andrey V. Savchenko.

About this article

Cite this article

Savchenko, A.V. Clustering and maximum likelihood search for efficient statistical classification with medium-sized databases. Optim Lett 11, 329–341 (2017). https://doi.org/10.1007/s11590-015-0948-6

  • DOI: https://doi.org/10.1007/s11590-015-0948-6
