Abstract
Discovering a latent common space between different modalities plays an important role in cross-modality pattern recognition. Existing techniques often require absolutely-paired observations as training data and cannot capture more general semantic relationships between cross-modality observations, which greatly limits their applicability. In this paper, we propose a general framework for learning a latent common space from relatively-paired observations (i.e., two observations from different modalities are more likely to be paired than another two). Relative-pairing information is encoded using the relative proximities of observations in the latent common space. By building a discriminative model and maximizing a distance margin, a projection function that maps observations into the latent common space is learned for each modality. Cross-modality pattern recognition can then be carried out in the latent common space. To speed up learning on large-scale training data, the problem is reformulated as learning a structural model, which is solved efficiently by the cutting plane algorithm. The proposed framework is evaluated on feature fusion, cross-pose face recognition, text-image retrieval and attribute-image retrieval. Experimental results demonstrate that it outperforms other state-of-the-art approaches.
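The idea of encoding relative pairing as relative proximity under a distance margin can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the projections are linear matrices `Wx` and `Wy`, the distance is squared Euclidean, the margin is enforced with a hinge loss, and the update is a plain subgradient step rather than the structural formulation and cutting plane solver the abstract mentions. The names `sq_dist`, `hinge_loss`, `subgradient_step` and the `quads` format are hypothetical.

```python
import numpy as np


def sq_dist(Wx, Wy, x, y):
    """Squared Euclidean distance between two projected observations."""
    d = Wx @ x - Wy @ y
    return float(d @ d)


def hinge_loss(Wx, Wy, X, Y, quads, margin=1.0):
    """Mean hinge loss over relative-pairing constraints.

    Each quad (i, j, k, l) states that (X[i], Y[j]) is more likely paired
    than (X[k], Y[l]), so the first pair should be closer in the common
    space by at least `margin` (assumed constraint form).
    """
    total = 0.0
    for i, j, k, l in quads:
        near = sq_dist(Wx, Wy, X[i], Y[j])
        far = sq_dist(Wx, Wy, X[k], Y[l])
        total += max(0.0, margin + near - far)
    return total / len(quads)


def subgradient_step(Wx, Wy, X, Y, quads, margin=1.0, lr=0.01):
    """One subgradient step on the mean hinge loss (illustrative solver)."""
    gx = np.zeros_like(Wx)
    gy = np.zeros_like(Wy)
    for i, j, k, l in quads:
        near = Wx @ X[i] - Wy @ Y[j]
        far = Wx @ X[k] - Wy @ Y[l]
        # Only violated constraints contribute to the subgradient.
        if margin + near @ near - far @ far > 0:
            gx += 2.0 * (np.outer(near, X[i]) - np.outer(far, X[k]))
            gy -= 2.0 * (np.outer(near, Y[j]) - np.outer(far, Y[l]))
    n = len(quads)
    return Wx - lr * gx / n, Wy - lr * gy / n


if __name__ == "__main__":
    # Toy data: two observations per modality, both two-dimensional.
    X = np.array([[0.0, 0.0], [0.0, 0.0]])
    Y = np.array([[0.0, 0.0], [3.0, 0.0]])
    Wx, Wy = np.eye(2), np.eye(2)
    quads = [(1, 1, 0, 0)]  # (X[1], Y[1]) should end up closer than (X[0], Y[0])
    before = hinge_loss(Wx, Wy, X, Y, quads)
    Wx, Wy = subgradient_step(Wx, Wy, X, Y, quads)
    after = hinge_loss(Wx, Wy, X, Y, quads)
    print(before, after)
```

Because the objective in this sketch is a difference of convex terms, a subgradient step is only guaranteed to help locally; the sketch is meant to show how a relative constraint turns into a margin violation, not how the framework is actually optimized.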
Notes
The number of variables is very small.
Additional information
Communicated by Tilo Burghardt, Majid Mirmehdi, Walterio Mayol and Dima Damen.
About this article
Cite this article
Kuang, Z., Wong, KY.K. Relatively-Paired Space Analysis: Learning a Latent Common Space From Relatively-Paired Observations. Int J Comput Vis 113, 176–192 (2015). https://doi.org/10.1007/s11263-014-0783-8
DOI: https://doi.org/10.1007/s11263-014-0783-8