Iterative Category Discovery via Multiple Kernel Metric Learning

International Journal of Computer Vision

Abstract

The goal of an object category discovery system is to annotate a pool of unlabeled image data, where the set of labels is initially unknown to the system and must therefore be discovered over time by querying a human annotator. The annotated data is then used to train object detectors in a standard supervised learning setting, possibly in conjunction with category discovery itself. Category discovery systems can be evaluated in terms of both the accuracy of the resulting object detectors and the efficiency with which they discover categories and annotate the training data. To improve the accuracy and efficiency of category discovery, we propose an iterative framework which alternates between optimizing nearest neighbor classification for known categories with multiple kernel metric learning, and detecting clusters of unlabeled image regions likely to belong to novel, unknown categories. Experimental results on the MSRC and PASCAL VOC2007 data sets show that the proposed method improves clustering for category discovery, and efficiently annotates image regions belonging to the discovered classes.
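
To make the alternation described above concrete, the skeleton below sketches one plausible reading of the discovery loop. It is not the authors' implementation: the callables `learn_metric`, `find_unfamiliar_clusters`, and `query_annotator` are hypothetical placeholders for the multiple kernel metric learning, novel-cluster detection, and human annotation steps, and must be supplied by the caller.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical skeleton of the iterative discovery loop (not the authors' code).
# The three callables stand in for multiple kernel metric learning, detection of
# unfamiliar-region clusters, and the human annotation query, respectively.

def discover_categories(
    labeled: List[Tuple[object, str]],    # (segment, label) pairs
    unlabeled: List[object],              # unlabeled segments
    learn_metric: Callable,               # labeled data -> learned metric
    find_unfamiliar_clusters: Callable,   # (unlabeled, labeled, metric) -> clusters
    query_annotator: Callable,            # cluster -> (segment, label) pairs
    budget: int,                          # number of annotations we may request
) -> Set[str]:
    """Alternate between metric learning and novel-category cluster detection."""
    known = {label for _, label in labeled}
    while budget > 0 and unlabeled:
        metric = learn_metric(labeled)                        # optimize NN classification
        clusters = find_unfamiliar_clusters(unlabeled, labeled, metric)
        if not clusters:
            break
        for segment, label in query_annotator(clusters[0]):   # annotate best cluster
            labeled.append((segment, label))
            unlabeled.remove(segment)
            known.add(label)                                  # may be a new category
            budget -= 1
            if budget == 0:
                break
    return known
```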

Notes

  1. In this setting, a true ranking is any ranking which places all relevant results before all irrelevant results.

  2. The Hilbert-Schmidt norm is a natural generalization of the Frobenius norm. For our purposes, this can be understood as treating \(L\) as a collection of \(n\) elements \(v_i \in \mathcal {H}\) (one per output dimension of \(L\)), and summing over the squared norms: \(\Vert L\Vert _\text {HS}=\sqrt{\sum _i \langle v_i, v_i\rangle _\mathcal {H}}\). (A small numerical check of this identity appears after these notes.)

  3. Familiarity refers to a segment’s true label, which may or may not be available: an unlabeled or test segment may be familiar or unfamiliar.

  4. We chose spectral clustering over agglomerative clustering in this set of experiments to facilitate direct comparison to Lee and Grauman (2010).

  5. The weak labeling of the PASCAL dataset makes evaluation difficult, because background segments lack ground-truth annotations.

  6. In Table 6, MKLMNN (Galleguillos et al. 2010) has no MAP score for class tree because there was only one test segment of that class predicted as unfamiliar.
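
As an aside, the identity in Note 2 is easy to check numerically in the finite-dimensional case, where the feature-space inner product is the ordinary dot product. The snippet below is a minimal check (not from the paper) that the row-by-row Hilbert-Schmidt computation matches the Frobenius norm.

```python
import numpy as np

# Finite-dimensional sanity check for Note 2: treating the rows of L as the
# vectors v_i (one per output dimension), sqrt(sum_i <v_i, v_i>) equals the
# Frobenius norm of L.
rng = np.random.default_rng(0)
L = rng.standard_normal((5, 8))                      # 5 output dimensions
hs_norm = np.sqrt(sum(v @ v for v in L))             # sum of squared row norms
assert np.isclose(hs_norm, np.linalg.norm(L, "fro"))
print(hs_norm)
```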

References

  • Bart, E., Porteous, I., Perona, P., & Welling, M. (2008). Unsupervised learning of visual taxonomies. In Computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Branson, S., Wah, C., Schroff, F., Babenko, B., Welinder, P., Perona, P., et al. (2010). Visual recognition with humans in the loop. In European conference in computer vision (ECCV) (pp. 438–451).

  • Collins, B., Deng, J., Li, K., & Fei-Fei, L. (2008). Towards scalable dataset construction: An active learning approach. In European conference in computer vision (ECCV).

  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

  • Defays, D. (1977). An efficient algorithm for a complete link method. The Computer Journal, 20(4), 364–366.

  • Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2007). The PASCAL Visual Object Classes Challenge 2007 (VOC2007) results.

  • Faktor, A., & Irani, M. (2012). “Clustering by composition”—unsupervised discovery of image categories. In European conference in computer vision (ECCV) (pp. 474–487). Springer.

  • Forsyth, D. A., Malik, J., Fleck, M. M., Greenspan, H., Leung, T., Belongie, S., et al. (1995). Finding pictures of objects in large collections of images. Lecture Notes in Computer Science, 1144, 335–360.

  • Frome, A., Singer, Y., Sha, F., & Malik, J. (2007). Learning globally-consistent local distance functions for shape-based image retrieval and classification. In International conference in computer vision (ICCV) (pp. 1–8).

  • Galleguillos, C., McFee, B., Belongie, S., & Lanckriet, G. (2010). Multi-class object localization by combining local contextual interactions. In Computer vision and pattern recognition (CVPR) (pp. 113–120).

  • Galleguillos, C., McFee, B., Belongie, S., & Lanckriet, G. (2011). From region similarity to category discovery. In Computer vision and pattern recognition (CVPR) (pp. 2665–2672).

  • Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In International conference in computer vision (ICCV).

  • Globerson, A., & Roweis, S. (2007). Visualizing pairwise similarity via semidefinite embedding. In International conference on artificial intelligence and statistics (AISTATS).

  • Grauman, K., & Darrell, T. (2006). Unsupervised learning of categories from sets of partially matching image features. In Computer vision and pattern recognition (CVPR).

  • Heitz, G., & Koller, D. (2008). Learning spatial context: Using stuff to find things. In European conference in computer vision (ECCV) (pp. 30–43). Springer.

  • Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4), 422–446.

  • Joachims, T. (2005). A support vector method for multivariate performance measures. In International conference on machine learning (pp. 377–384).

  • Joachims, T., Finley, T., & Yu, C. N. J. (2009). Cutting-plane training of structural SVMs. Machine Learning, 77(1), 27–59.

  • Kang, H., Hebert, M., Efros, A. A., & Kanade, T. (2012). Connecting missing links: Object discovery from sparse observations using 5 million product images. In European conference in computer vision (ECCV) (pp. 794–807). Springer.

  • Lanckriet, G. R. G., Cristianini, N., Bartlett, P., El Ghaoui, L., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research, 5, 27–72.

  • Lee, Y., & Grauman, K. (2010). Object-graphs for context-aware category discovery. In Computer vision and pattern recognition (CVPR).

  • Lee, Y., & Grauman, K. (2011). Learning the easy things first: Self-paced visual category discovery. In Computer vision and pattern recognition (CVPR) (pp. 1721–1728).

  • McFee, B., & Lanckriet, G. (2010). Metric learning to rank. In International conference on machine learning.

  • Meila, M., & Shi, J. (2001). Learning segmentation by random walks. In Advances in neural information processing systems.

  • Rabinovich, A., Lange, T., Buhmann, J., & Belongie, S. (2006). Model order selection and cue combination for image segmentation. In Computer vision and pattern recognition (CVPR).

  • Russell, B., Freeman, W., Efros, A., Sivic, J., & Zisserman, A. (2006). Using multiple segmentations to discover objects and their extent in image collections. In Computer vision and pattern recognition (CVPR).

  • Schölkopf, B., Herbrich, R., Smola, A. J., & Williamson, R. (2001). A generalized representer theorem. In Computational learning theory (pp. 416–426).

  • Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

  • Sivic, J., Russell, B., Efros, A., Zisserman, A., & Freeman, W. (2005). Discovering objects and their location in images. In International conference in computer vision (ICCV).

  • Sivic, J., Russell, B., Zisserman, A., Freeman, W., & Efros, A. (2008). Unsupervised discovery of visual object class hierarchies. In Computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Tian, Y., Liu, W., Xiao, R., Wen, F., & Tang, X. (2007). A face annotation framework with partial clustering and interactive labeling. In Computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Todorovic, S., & Ahuja, N. (2006). Extracting subimages of an unknown category from a set of images. In Computer vision and pattern recognition (CVPR).

  • Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. The Journal of Machine Learning Research, 6, 1453–1484.

  • Tuytelaars, T., Lampert, C., Blaschko, M., & Buntine, W. (2010). Unsupervised object discovery: A comparison. International Journal of Computer Vision, 88(2), 284–302.

  • Varma, M., & Ray, D. (2007). Learning the discriminative power-invariance trade-off. In International conference in computer vision (ICCV).

  • Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In International conference in computer vision (ICCV).

  • Vijayanarasimhan, S., & Grauman, K. (2009). What’s it going to cost you? Predicting effort vs. informativeness for multi-label image annotations. In Computer vision and pattern recognition (CVPR).

  • Wang, G., Hoiem, D., & Forsyth, D. (2010). Learning image similarity from flickr groups using stochastic intersection kernel machines. In Computer vision and pattern recognition (CVPR).

  • Weinberger, K. Q., Blitzer, J., & Saul, L. K. (2006). Distance metric learning for large margin nearest neighbor classification. In Advances in neural information processing systems.

  • Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83.

  • Winn, J., Criminisi, A., & Minka, T. (2005). Object categorization by learned universal visual dictionary. In International conference in computer vision (ICCV) (Vol. 2, pp. 1800–1807).

  • Zhao, Y., & Karypis, G. (2001). Criterion functions for document clustering: Experiments and analysis. Machine Learning.

  • Zhu, J. Y., Wu, J., Wei, Y., Chang, E., & Tu, Z. (2012). Unsupervised object class discovery via saliency-guided multiple class learning. In Computer vision and pattern recognition (CVPR) (pp. 3218–3225).

Author information

Corresponding author

Correspondence to Carolina Galleguillos.

Appendix: Implementation

The implementation uses the 1-slack margin-rescaling cutting plane algorithm (Joachims et al. 2009) to solve for all \(W^t\) within a prescribed tolerance \(\epsilon = 0.01\). We further constrain each \(W^t\) to be a diagonal matrix, which simplifies the semi-definite program to a linear program. For \(m\) kernels and \(n\) training points, this also reduces the number of parameters to be learned from \(O(mn^{2})\) (\(m\) symmetric \(n\)-by-\(n\) matrices) to \(mn\).
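
For illustration, the sketch below (not the authors' released code) evaluates a learned distance assuming the usual kernelized parameterization, in which each training point is represented by its column of the kernel matrix and each \(W^t\) is a non-negative diagonal; the entire metric is then an \(m \times n\) weight array rather than \(m\) full symmetric matrices.

```python
import numpy as np

def mk_diagonal_distance(kernels, weights, i, j):
    """Squared learned distance between training points i and j.

    kernels : list of m kernel matrices, each n x n.
    weights : array of shape (m, n); weights[t] is the diagonal of W^t (>= 0).
    """
    d2 = 0.0
    for K, w in zip(kernels, weights):
        diff = K[:, i] - K[:, j]        # difference of kernel columns
        d2 += np.sum(w * diff * diff)   # diff^T diag(w) diff
    return d2

# Toy usage: two RBF kernels over six random 2-D points, uniform weights,
# so the total number of learned parameters is m * n = 2 * 6 = 12.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [np.exp(-sq / s) for s in (0.5, 2.0)]
weights = np.ones((2, 6))
print(mk_diagonal_distance(kernels, weights, 0, 1))
```

Roughly speaking, restricting each \(W^t\) to a diagonal replaces the positive semi-definite constraint with simple non-negativity constraints on the weights, which is why the semi-definite program reduces to a linear program.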

In all experiments with MKMLR, we choose the ranking loss \(\Delta \) as the normalized discounted cumulative gain (NDCG) (Järvelin and Kekäläinen 2002) truncated at \(10\). Slack parameters \(C\) and the kernel bandwidth \(\sigma \) for spectral clustering were found by cross-validation on the training set. For testing, we fix \(k=17\) as the number of nearest neighbors for classification across all experiments. Multiple stable segmentations were computed—nine segmentations per image, containing \(2\) through \(10\) segments respectively—for a total of \(2+3+\cdots +10 = 54\) segments per image (Rabinovich et al. 2006; Shi and Malik 2000).
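
For reference, the snippet below computes NDCG truncated at 10 for a binary-relevance ranking (a retrieved segment counts as relevant when it shares the query's label). It uses one common gain and discount convention (gain equal to relevance, discount \(\log_2(\mathrm{rank}+1)\)); the exact convention used in the paper may differ.

```python
import numpy as np

def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain over the top-k positions of a ranking."""
    rel = np.asarray(relevance, dtype=float)[:k]
    ranks = np.arange(1, rel.size + 1)
    return np.sum(rel / np.log2(ranks + 1))

def ndcg_at_k(relevance, k=10):
    """DCG normalized by the DCG of the ideal (relevance-sorted) ranking."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0

# A ranking over ten retrieved segments with relevant items at ranks 1 and 4.
print(ndcg_at_k([1, 0, 0, 1, 0, 0, 0, 0, 0, 0]))
```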

Cite this article

Galleguillos, C., McFee, B. & Lanckriet, G.R.G. Iterative Category Discovery via Multiple Kernel Metric Learning. Int J Comput Vis 108, 115–132 (2014). https://doi.org/10.1007/s11263-013-0679-z
