Abstract
Approaches to distance metric learning (DML) for the Mahalanobis distance metric involve estimating a parametric matrix associated with a linear transformation. Complex pattern analysis tasks call for DML approaches in which the parametric matrix is associated with a nonlinear transformation. One such approach performs DML of the Mahalanobis distance in the feature space of a Mercer kernel. In this approach, estimating the parametric matrix of the Mahalanobis distance is formulated as learning an optimal kernel gram matrix from the kernel gram matrix of a base kernel by minimizing the LogDet divergence between the two kernel gram matrices. We propose to use the optimal kernel gram matrices learnt from the kernel gram matrices of the base kernels in pattern analysis tasks such as clustering, multi-class pattern classification and nonlinear principal component analysis. As base kernels for optimal kernel learning, we consider commonly used kernels such as the linear, polynomial, radial basis function and exponential kernels, as well as hyper-ellipsoidal kernels. We also study the performance of DML-based class-specific kernels for multi-class pattern classification using support vector machines. Results of our experimental studies on benchmark datasets demonstrate the effectiveness of the DML-based kernels for different pattern analysis tasks.
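The idea of learning a kernel gram matrix that stays close to a base kernel gram matrix in LogDet divergence can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single pairwise distance constraint and applies one rank-one Bregman projection of the kind used in the kernel learning literature. The function names (rbf_gram, logdet_divergence, project_pair), the toy data, the small ridge term and the target distance are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix of an RBF base kernel, with a tiny ridge for numerical stability."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2) + 1e-6 * np.eye(X.shape[0])

def logdet_divergence(K, K0):
    """D_ld(K, K0) = tr(K K0^{-1}) - log det(K K0^{-1}) - n for positive definite K, K0."""
    n = K.shape[0]
    M = np.linalg.solve(K0, K)          # K0^{-1} K has the same trace and determinant
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - n

def kernel_distance(K, i, j):
    """Squared distance between points i and j in the kernel feature space."""
    return K[i, i] + K[j, j] - 2.0 * K[i, j]

def project_pair(K, i, j, target):
    """Rank-one update K + beta * K v v^T K, with v = e_i - e_j, chosen so that
    the kernel distance between i and j equals `target` (assumed positive)."""
    v = np.zeros(K.shape[0])
    v[i], v[j] = 1.0, -1.0
    Kv = K @ v
    p = v @ Kv                          # current kernel distance between i and j
    beta = (target - p) / p ** 2
    return K + beta * np.outer(Kv, Kv)

# Toy data (assumed): five 2-D points; pull points 0 and 1 closer together.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
K0 = rbf_gram(X, gamma=1.0)
i, j = 0, 1
target = 0.5 * kernel_distance(K0, i, j)
K1 = project_pair(K0, i, j, target)
divergence = logdet_divergence(K1, K0)
```

In this sketch K1 satisfies the distance constraint exactly and remains positive definite because the target is positive. A full method would cycle such projections over many similarity and dissimilarity constraints, typically with slack variables, until convergence.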
Cite this article
Mohan, B.S.S., Sekhar, C.C. Distance metric learning-based kernel gram matrix learning for pattern analysis tasks in kernel feature space. Pattern Anal Applic 21, 847–867 (2018). https://doi.org/10.1007/s10044-017-0670-3
DOI: https://doi.org/10.1007/s10044-017-0670-3