
Generalized sparse metric learning with relative comparisons

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

The objective of sparse metric learning (SML) is to learn a distance measure from a set of data while simultaneously finding a low-dimensional representation. Despite demonstrated success, the performance of existing sparse metric learning approaches is usually limited because they assume certain problem relaxations or target the SML objective only indirectly. In this paper, we propose a Generalized Sparse Metric Learning (GSML) method. This novel framework offers a unified view for understanding many existing sparse metric learning algorithms, including the sparse metric learning framework proposed in (Rosales and Fung, ACM international conference on knowledge discovery and data mining (KDD), pp 367–373, 2006), the Large Margin Nearest Neighbor (Weinberger et al. in Advances in neural information processing systems (NIPS), 2006; Weinberger and Saul in Proceedings of the twenty-fifth international conference on machine learning (ICML-2008), 2008), and the D-ranking Vector Machine (D-ranking VM) (Ouyang and Gray in Proceedings of the twenty-fifth international conference on machine learning (ICML-2008), 2008). Moreover, GSML also establishes a close relationship with the Pairwise Support Vector Machine (Vert et al. in BMC Bioinform, 8, 2007). Furthermore, the proposed framework can extend many current non-sparse metric learning models to their sparse versions, including Relevant Component Analysis (Bar-Hillel et al. in J Mach Learn Res, 6:937–965, 2005) and a state-of-the-art method proposed in (Xing et al., Advances in neural information processing systems (NIPS), 2002). We present the detailed framework, provide theoretical justifications, build various connections with other models, and propose an iterative optimization method, making the framework both theoretically important and practically scalable for medium or large datasets. Experimental results show that this generalized framework outperforms six state-of-the-art methods with higher accuracy and significantly smaller dimensionality on seven publicly available datasets.
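As an illustration of the problem setting only, and not the paper's GSML algorithm, the following minimal Python sketch learns a positive semidefinite matrix M defining the squared distance (x − y)ᵀM(x − y) from relative comparisons of the form "x_i is closer to x_j than to x_k". It combines a unit-margin hinge loss with a trace penalty, a common convex surrogate for low rank (and hence for a low-dimensional representation), optimized by projected subgradient descent; the function name and all parameter values below are assumptions made for this sketch.

    import numpy as np

    def learn_sparse_metric(X, triplets, lam=0.1, lr=0.01, epochs=200):
        """Toy sparse metric learner from relative comparisons.

        NOTE: an illustrative sketch, not the paper's GSML solver.
        X        : (n, d) data matrix.
        triplets : iterable of (i, j, k), meaning x_i should be closer
                   to x_j than to x_k under the learned metric.
        lam      : weight of the trace penalty that pushes M toward
                   low rank (the "sparse", low-dimensional part).
        Returns a positive semidefinite matrix M.
        """
        d = X.shape[1]
        M = np.eye(d)                      # start from Euclidean distance

        def sqdist(a, b):
            diff = a - b
            return diff @ M @ diff         # squared Mahalanobis distance

        for _ in range(epochs):
            grad = lam * np.eye(d)         # subgradient of lam * trace(M)
            for i, j, k in triplets:
                # unit-margin hinge: want d(i, k) - d(i, j) >= 1
                if sqdist(X[i], X[k]) - sqdist(X[i], X[j]) < 1.0:
                    dj = (X[i] - X[j])[:, None]
                    dk = (X[i] - X[k])[:, None]
                    grad += dj @ dj.T - dk @ dk.T
            M = M - lr * grad
            # project back onto the PSD cone by clipping eigenvalues
            w, V = np.linalg.eigh(M)
            M = (V * np.clip(w, 0.0, None)) @ V.T
        return M

For example, M = learn_sparse_metric(X, [(0, 1, 2)]) encodes that X[0] should be nearer to X[1] than to X[2]. Truncating small eigenvalues of the returned M yields an explicit low-dimensional map L with M ≈ LᵀL; the paper's contribution is to place several such formulations under one regularized objective and to solve it with a dedicated iterative method.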


References

  1. Agarwal S, Wills J, Cayton L, Lanckriet G, Kriegman D, Belongie S (2008) Generalized non-metric multidimensional scaling. In: International conference on artificial intelligence and statistics (AISTATS 2008)

  2. Argyriou A, Evgeniou T, Pontil M (2006) Multi-task feature learning. In: Advances in neural information processing systems (NIPS) 18

  3. Asuncion A, Newman D (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html

  4. Athitsos V, Alton J, Sclaroff S, Kollios G (2004) Boostmap: a method for efficient approximate similarity rankings. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition (CVPR)

  5. Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6: 937–965


  6. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR-2005)

  7. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory IT-13(1): 21–27


  8. Cox T, Cox M (1994) Multidimensional scaling. Chapman & Hall, London


  9. Davis J, Kulis B, Jain P, Sra S, Dhillon I (2007) Information-theoretic metric learning. In: International conference on machine learning (ICML)

  10. Fung G, Mangasarian OL, Smola AJ (2002) Minimal kernel classifiers. J Mach Learn Res 3: 303–321


  11. Fung G, Rosales R, Rao RB (2007) Feature selection and kernel design via linear programming. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp 786–791

  12. Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2004) Neighbourhood components analysis. In: Advances in neural information processing systems (NIPS)

  13. Hastie T, Tibshirani R, Friedman J (2003) The elements of statistical learning. Springer, New York


  14. Huang K, Yang H, King I, Lyu MR (2004) Learning classifiers from imbalanced data based on biased minimax probability machine. In: Proceedings of 2004 IEEE computer society conference on computer vision and pattern recognition (CVPR-2004), vol 2, pp 558–563

  15. Huang K, Yang H, King I, Lyu MR (2008) Maxi-min margin machine: learning large margin classifiers locally and globally. IEEE Trans Neural Netw 19: 260–272


  16. Huang K, Yang H, King I, Lyu MR, Chan L (2004) The minimum error minimax probability machine. J Mach Learn Res 5: 1253–1286


  17. Jolliffe IT (1989) Principal component analysis. Springer, New York


  18. Micchelli CA, Pontil M (2005) Learning the kernel function via regularization. J Mach Learn Res 6: 1099–1125


  19. Nesterov Y (2003) Introductory lectures on convex optimization: a basic course. Springer, New York


  20. Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Program 103: 127–152


  21. Ouyang H, Gray A (2008) Learning dissimilarities by ranking: from SDP to QP. In: Proceedings of the twenty-fifth international conference on machine learning (ICML-2008)

  22. Pfitzner D, Leibbrandt R, Powers D (2009) Characterization and evaluation of similarity measures for pairs of clusterings. Knowl Inf Syst 19: 361–394


  23. Quan X, Liu G, Lu Z, Ni X, Liu W (2010) Short text similarity based on probabilistic topics. Knowl Inf Syst

  24. Rosales R, Fung G (2006) Learning sparse metrics via linear programming. In: ACM international conference on knowledge discovery and data mining (KDD), pp 367–373

  25. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290: 2323–2326

  26. Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge


  27. Schultz M, Joachims T (2003) Learning a distance metric from relative comparisons. In: Advances in neural information processing systems (NIPS)

  28. Song G, Cui B, Zheng B, Xie K, Yang D (2009) Accelerating sequence searching: dimensionality reduction method. Knowl Inf Syst 20: 301–322


  29. Song L, Smola A, Borgwardt K, Gretton A (2008) Colored maximum variance unfolding. In: Advances in neural information processing systems (NIPS)

  30. Sugiyama M (2007) Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J Mach Learn Res 8: 1027–1061


  31. Torresani L, Lee K (2007) Large margin component analysis. In: Advances in neural information processing systems (NIPS)

  32. Vert J-P, Qiu J, Noble WS (2007) A new pairwise kernel for biological network inference with support vector machines. BMC Bioinform 8

  33. Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: International conference on machine learning (ICML)

  34. Weinberger K, Blitzer J, Saul L (2006) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems (NIPS)

  35. Weinberger K, Saul L (2008) Fast solvers and efficient implementations for distance metric learning. In: Proceedings of the twenty-fifth international conference on machine learning (ICML-2008)

  36. Xing E, Ng A, Jordan M, Russell S (2002) Distance metric learning, with application to clustering with side information. In: Advances in neural information processing systems (NIPS)

  37. Yang L, Jin R (2006) Distance metric learning: a comprehensive survey. Technical report, Department of Computer Science and Engineering, Michigan State University


Author information


Corresponding author

Correspondence to Kaizhu Huang.


About this article

Cite this article

Huang, K., Ying, Y. & Campbell, C. Generalized sparse metric learning with relative comparisons. Knowl Inf Syst 28, 25–45 (2011). https://doi.org/10.1007/s10115-010-0313-0

  • DOI: https://doi.org/10.1007/s10115-010-0313-0
