Abstract
In this paper, a novel framework is developed for leveraging large-scale loosely tagged images for object classifier training by jointly addressing three key issues: (a) spam tags, i.e., some tags are more related to popular query terms than to the image semantics; (b) loose object tags, i.e., multiple object tags are given loosely at the image level without identifying the object locations in the images; (c) missing object tags, i.e., some object tags are missing, so negative bags may contain positive instances. To address these three issues jointly, our framework consists of the following key components: (1) distributed image clustering and inter-cluster visual correlation analysis, which handles spam tags by automatically filtering out large amounts of junk images; (2) multiple instance learning with missing tag prediction, which deals with loose object tags and missing object tags jointly; (3) structural learning, which leverages inter-object visual correlations to train large numbers of inter-related object classifiers jointly. Our experiments on large-scale loosely tagged images have produced very positive results.
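The second component, multiple instance learning with missing tag prediction, can be illustrated with a minimal sketch. It treats each image as a bag of region feature vectors and assumes a linear region scorer, the standard MIL rule that an image scores as high as its best region, MI-SVM-style "witness" selection for positive bags, and a hypothetical threshold `tau`: an image whose tag is absent but whose best region scores above `tau` is flagged as a probable missing tag and is no longer used as a negative. The names `bag_score`, `flag_missing_tags`, and `train` are illustrative assumptions, not the authors' algorithm, and the junk-image filtering and structural-learning components are not modeled.

```python
from typing import List, Sequence, Set, Tuple

Vector = List[float]
Bag = List[Vector]  # one image = a bag of region feature vectors


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def bag_score(w: Vector, b: float, bag: Bag) -> float:
    # Standard MIL assumption: an image is as positive as its best region.
    return max(dot(w, x) + b for x in bag)


def flag_missing_tags(w: Vector, b: float, bags: List[Bag],
                      labels: List[int], tau: float) -> Set[int]:
    # Hypothetical rule: a bag whose tag is absent (label -1) but whose
    # best region scores above tau is a candidate missing tag.
    return {i for i, (bag, y) in enumerate(zip(bags, labels))
            if y < 0 and bag_score(w, b, bag) > tau}


def train(bags: List[Bag], labels: List[int], dim: int, epochs: int = 20,
          lr: float = 0.1, tau: float = 0.5) -> Tuple[Vector, float, Set[int]]:
    w: Vector = [0.0] * dim
    b = 0.0
    suspected: Set[int] = set()
    for _ in range(epochs):
        for i, (bag, y) in enumerate(zip(bags, labels)):
            if i in suspected:
                continue  # do not use a probably mis-tagged image as a negative
            if y > 0:
                # Positive bag: update only on its highest-scoring "witness" region.
                instances = [max(bag, key=lambda v: dot(w, v))]
            else:
                # Negative bag: every region should score below the margin.
                instances = bag
            for x in instances:
                if y * (dot(w, x) + b) < 1.0:  # hinge-loss margin violated
                    w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                    b += lr * y
        # Re-estimate which "negative" images are probably missing the tag.
        suspected = flag_missing_tags(w, b, bags, labels, tau)
    return w, b, suspected
```

Re-estimating the flagged set after every epoch, rather than once, lets a wrongly flagged image return to the negative set if the scorer later changes; a larger `tau` flags fewer images and keeps more of the original negative labels.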
Acknowledgements
This work is supported in part by NSFC-61075014 and NSFC-60875016, by the Program for New Century Excellent Talents in University under Grants NCET-08-0458 and NCET-10-0071, and by the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20096102110025 and No. 20106102110028).
About this article
Cite this article
Shen, Y., Peng, J., Feng, X. et al. Multi-label multi-instance learning with missing object tags. Multimedia Systems 19, 17–36 (2013). https://doi.org/10.1007/s00530-012-0290-0