Clustering and maximum likelihood search for efficient statistical classification with medium-sized databases

Savchenko, Andrey V.

doi:10.1007/s11590-015-0948-6

Original Paper
Published: 16 September 2015

Volume 11, pages 329–341 (2017)
Optimization Letters

Abstract

This paper addresses the insufficient performance of statistical classification with medium-sized databases (thousands of classes). Each object is represented as a sequence of independent segments, and each segment is defined as a random sample of independent features whose distribution is of the multivariate exponential type. To speed up the optimal Kullback–Leibler minimum information discrimination principle, we cluster the training set and perform an approximate nearest neighbor search of the input object in the set of cluster medoids. Using the asymptotic properties of the Kullback–Leibler divergence, we propose a maximum likelihood search procedure, in which the next medoid to check is selected from the cluster with the maximal joint density (likelihood) of the distances to the previously checked medoids. Experimental results in image recognition with an artificially generated dataset and the Essex facial database show that the proposed approach is much more effective than both exhaustive search and the known approximate nearest neighbor methods from the FLANN and NonMetricSpace libraries.
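The abstract combines three ingredients: a Kullback–Leibler distance summed over independent segments, clustering of the training set around medoids, and a likelihood-driven order in which medoids are checked. The following Python sketch illustrates these ideas under stated assumptions; it is not the author's implementation. Segments are assumed to be normalized histograms (a discrete, hence exponential-family, distribution), clustering uses plain alternating k-medoids, and the density of an observed distance is modeled as a Gaussian whose mean is the medoid-to-medoid distance and whose variance is proportional to it (`sigma2` is a hypothetical scale factor). The names `kl_distance`, `k_medoids`, `ml_search` and `classify`, the fixed budget `n_checks`, and starting the search at medoid 0 are all illustrative choices.

```python
import numpy as np


def kl_distance(x, r, eps=1e-10):
    """Sum over segments of KL(x_s || r_s); x and r have shape (S, K), rows are histograms."""
    x = np.clip(x, eps, None)
    r = np.clip(r, eps, None)
    return float(np.sum(x * np.log(x / r)))


def k_medoids(D, k, n_iter=20, seed=0):
    """Plain alternating k-medoids on a precomputed (possibly asymmetric) matrix D[i, j]."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # new medoid: the member with the smallest total distance from the members to it
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=0))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def ml_search(x, medoid_objs, M, n_checks, sigma2=1.0):
    """Check medoids in order of the joint likelihood of the distances observed so far.

    M[m, i] = KL(r_m || r_i). If x were close to medoid m, rho(x, r_i) should be close
    to M[m, i]; the Gaussian N(M[m, i], sigma2 * M[m, i]) is an ASSUMED model.
    """
    k = len(medoid_objs)
    n_checks = min(n_checks, k)
    checked = [0]  # arbitrary starting medoid (assumption)
    dists = [kl_distance(x, medoid_objs[0])]
    loglik = np.zeros(k)
    while len(checked) < n_checks:
        i, d = checked[-1], dists[-1]
        var = sigma2 * np.maximum(M[:, i], 1e-6)
        # add the log-density of the newest observed distance for every candidate medoid
        loglik += -0.5 * (d - M[:, i]) ** 2 / var - 0.5 * np.log(var)
        scores = loglik.copy()
        scores[checked] = -np.inf  # never re-check a medoid
        nxt = int(np.argmax(scores))
        checked.append(nxt)
        dists.append(kl_distance(x, medoid_objs[nxt]))
    return checked[int(np.argmin(dists))], len(checked)


def classify(x, train, y, medoids, labels, M, n_checks):
    """Find a close medoid, then scan only the members of its cluster."""
    best_c, _ = ml_search(x, train[medoids], M, n_checks)
    members = np.flatnonzero(labels == best_c)
    nearest = min(members, key=lambda m: kl_distance(x, train[m]))
    return y[nearest]


# Synthetic demo: N objects, S segments, K-bin histograms, k clusters.
rng = np.random.default_rng(1)
N, S, K, k = 60, 4, 16, 8
train = rng.dirichlet(np.ones(K), size=(N, S))
y = np.arange(N)  # one reference object per class (synthetic)
D = np.array([[kl_distance(a, b) for b in train] for a in train])
medoids, labels = k_medoids(D, k)
M = D[np.ix_(medoids, medoids)]
query = 0.9 * train[5] + 0.1 * rng.dirichlet(np.ones(K), size=S)  # noisy copy of object 5
print(classify(query, train, y, medoids, labels, M, n_checks=4))
```

With `n_checks` equal to the number of clusters, the medoid search reduces to an exhaustive scan of the medoids; smaller budgets trade accuracy for speed. Because the nearest training object may lie in a cluster other than the one with the nearest medoid, the sketch remains approximate even with a full budget.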

References

Aggarwal, C.: Data Mining: The Textbook. Springer, New York (2015)
Andoni, A., Indyk, P.: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In: 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’06, pp. 459–468 (2006)
Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y.: An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)
Beis, J.S., Lowe, D.G.: Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1000–1006 (1997)
Boginski, V., Butenko, S., Pardalos, P.M.: Mining market data: a network approach. Comput. Oper. Res. 33(11), 3171–3184 (2006)
Boytsov, L., Naidan, B.: Engineering Efficient and Effective Non-metric Space Library. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) Similarity and Applications, Lecture Notes in Computer Science, vol. 8199, pp. 280–293. Springer, Berlin (2013)
Bustos, B., Navarro, G., Chávez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24(14), 2357–2366 (2003)
Cayton, L.: Efficient Bregman range search. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 22, pp. 243–251. Curran Associates, Inc. (2009)
Chen, S., Zhang, D., Zhou, Z.H.: Enhanced (PC)²A for face recognition with one training image per person. Pattern Recognit. Lett. 25(10), 1173–1181 (2004)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)
Defays, D.: An efficient algorithm for a complete link method. Comput. J. 20(4), 364–366 (1977)
Eick, C.F., Zeidat, N.: Using supervised clustering to enhance classifiers. In: Hacid, M.S., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds.) Foundations of Intelligent Systems, Lecture Notes in Computer Science, vol. 3488, pp. 248–256. Springer, Berlin (2005)
Gonzalez, E., Figueroa, K., Navarro, G.: Effective proximity retrieval by ordering permutations. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1647–1658 (2008)
Guarracino, M.R., Chinchuluun, A., Pardalos, P.M.: Decision rules for efficient classification of biological data. Optim. Lett. 3(3), 357–366 (2009)
Kullback, S.: Information Theory and Statistics. Dover Publications, Mineola (1997)
Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses, 3rd edn. Springer, New York (2008)
Li, S.Z., Jain, A.K. (eds.): Handbook of Face Recognition, 2nd edn. Springer, New York (2011)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Micó, M.L., Oncina, J., Vidal, E.: A new version of the nearest-neighbour approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recognit. Lett. 15(1), 9–17 (1994)
Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman and Hall/CRC, Boca Raton (2005)
Mladenovic, N., Brimberg, J., Hansen, P., Moreno-Perez, J.A.: The p-median problem: a survey of metaheuristic approaches. Eur. J. Oper. Res. 179(3), 927–939 (2007)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP International Conference on Computer Vision Theory and Applications, pp. 331–340 (2009)
Prince, S.: Computer Vision: Models, Learning, and Inference. Cambridge University Press, New York (2012)
Sabo, K., Scitovski, R., Vazler, I.: One-dimensional center-based l1-clustering method. Optim. Lett. 7(1), 5–22 (2011)
Savchenko, A.V.: Directed enumeration method in image recognition. Pattern Recognit. 45(8), 2952–2961 (2012)
Savchenko, A.V.: Real-time image recognition with the parallel directed enumeration method. In: Chen, M., Leibe, B., Neumann, B. (eds.) Computer Vision Systems, Lecture Notes in Computer Science, vol. 7963, pp. 123–132. Springer, Berlin (2013)
Silpa-Anan, C., Hartley, R.: Optimised KD-trees for fast image descriptor matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)
Syed, M.N., Pardalos, P.M., Principe, J.C.: On the optimization properties of the correntropic loss function in data analysis. Optim. Lett. 8(3), 823–839 (2013)
Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1701–1708 (2014)
Takci, H., Gungor, T.: A high performance centroid-based classification approach for language identification. Pattern Recognit. Lett. 33(16), 2077–2084 (2012)
Tan, X., Chen, S., Zhou, Z.H., Zhang, F.: Face recognition from a single image per person: a survey. Pattern Recognit. 39(9), 1725–1745 (2006)
Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, Burlington (2008)
Wang, X., Li, Z., Zhang, L., Yuan, J.: Grassmann Hashing for approximate nearest neighbor search in high dimensional space. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6 (2011)
Yodkhad, P., Kawewong, A., Patanukhom, K.: Approximate nearest neighbor search using self-organizing map clustering for face recognition system. In: International Computer Science and Engineering Conference (ICSEC), pp. 151–156 (2014)
Zhang, N., Yang, J., Qian, J.J.: Component-based global k-NN classifier for small sample size problems. Pattern Recognit. Lett. 33(13), 1689–1694 (2012)

Acknowledgments

The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2015–2016 (grant No. 15-01-0019) and supported within the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Global Competitiveness Program.

Author information

Authors and Affiliations

Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, 136 Rodionova Ulitsa, Nizhny Novgorod, 603093, Russia
Andrey V. Savchenko

Corresponding author

Correspondence to Andrey V. Savchenko.

About this article

Cite this article

Savchenko, A.V. Clustering and maximum likelihood search for efficient statistical classification with medium-sized databases. Optim Lett 11, 329–341 (2017). https://doi.org/10.1007/s11590-015-0948-6

Received: 27 April 2015
Accepted: 07 September 2015
Published: 16 September 2015
Issue Date: February 2017
DOI: https://doi.org/10.1007/s11590-015-0948-6
