Label-to-region with continuity-biased bi-layer sparsity priors

Authors:

Tat-Sheng Chua,

Hai Jin

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 8, Issue 4

Article No.: 50, Pages 1 - 23

https://doi.org/10.1145/2379790.2379792

Published: 30 November 2012

Abstract

In this work, we investigate how to reassign labels that are fully annotated at the image level to contextually derived semantic regions, a task we call Label-to-Region (L2R), in a collective manner. Given a set of input images with label annotations, the basic idea of our approach is to first discover patch correspondences across images and then propagate the labels shared by image pairs to the corresponding patches. Specifically, our approach consists of the following steps. First, each input image is encoded as a Bag-of-Hierarchical-Patch (BOP) to capture rich cues at various scales, and the individual patches are described by patch-level feature descriptors. Second, we present a sparse representation formulation for discovering how well an image or a semantic region can be robustly reconstructed from all the other image patches in the input image set. The underlying philosophy of our formulation is that an image region can be sparsely reconstructed from patches belonging to other images with common labels, while robust label propagation across images requires that the selected patches come from very few images. This preference for sparsity at both the patch level and the image level is named the bi-layer sparsity prior. In addition, we enforce a preference for larger patches in the reconstruction, referred to as the continuity-biased prior in this work, which may further enhance the reliability of L2R assignment. Finally, we use the reconstruction coefficients to propagate the image labels to the matched patches, and fuse the propagation results over all patches to complete the L2R task. As a by-product, the proposed continuity-biased bi-layer sparse representation formulation can be naturally applied to annotate new test images. Extensive experiments on three public image datasets demonstrate the effectiveness of the proposed framework in both L2R assignment and image annotation.
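The reconstruction-and-propagation idea described in the abstract can be illustrated with a small sketch. This is not the formulation from the article, which is not given on this page; it is a generic sparse-group-lasso stand-in solved by proximal gradient descent. The function names, the penalty weights `lam_patch` and `lam_image`, and the size-based weighting used to mimic the continuity bias are all assumptions made for illustration.

```python
import numpy as np


def bilayer_sparse_code(x, D, image_ids, patch_sizes,
                        lam_patch=0.05, lam_image=0.1, n_iter=200):
    """Reconstruct x from the columns of D (one column per patch).

    Illustrative objective (an assumption, not the article's):
      0.5 * ||x - D a||^2
      + lam_patch * sum_j w_j * |a_j|        (patch-level sparsity)
      + lam_image * sum_g ||a_g||_2          (image-level sparsity)
    where g ranges over the source images and w_j shrinks with patch size.
    """
    image_ids = np.asarray(image_ids)
    sizes = np.asarray(patch_sizes, dtype=float)
    # Hypothetical continuity bias: larger patches are penalized less.
    w = sizes.max() / sizes
    # Step size from the Lipschitz constant of the quadratic term.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    groups = [np.flatnonzero(image_ids == g) for g in np.unique(image_ids)]
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the reconstruction error.
        z = a - step * (D.T @ (D @ a - x))
        # Patch-level shrinkage: weighted soft-thresholding.
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam_patch * w, 0.0)
        # Image-level shrinkage: scale down each image's block of coefficients.
        for idx in groups:
            norm = np.linalg.norm(z[idx])
            scale = max(0.0, 1.0 - step * lam_image / norm) if norm > 0 else 0.0
            z[idx] *= scale
        a = z
    return a


def propagate_labels(a, image_ids, image_labels):
    """Score each label by the coefficient mass of the images that carry it.

    image_labels is a binary (n_images, n_labels) matrix indexed by image id.
    """
    weights = np.abs(a)
    return weights @ np.asarray(image_labels)[np.asarray(image_ids)]


# Toy example: 3 source images with 2 patches each, orthonormal patch features.
D = np.eye(8)[:, :6]
image_ids = [0, 0, 1, 1, 2, 2]
patch_sizes = [4, 1, 4, 1, 4, 1]
image_labels = np.array([[1, 0], [0, 1], [0, 1]])
x = D[:, 0] + 0.5 * D[:, 1]          # region built from image 0's patches
a = bilayer_sparse_code(x, D, image_ids, patch_sizes)
scores = propagate_labels(a, image_ids, image_labels)
```

In the toy example the target region is built only from the patches of image 0, so the nonzero coefficients land on that image's patches and its label receives the full score. A real pipeline would replace the identity dictionary with patch descriptors and fuse the scores over all patches of a region, as the abstract describes.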

Index Terms

Label-to-region with continuity-biased bi-layer sparsity priors
1. Information systems
  1. Information systems applications

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 8, Issue 4

November 2012

139 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/2379790

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 November 2012

Accepted: 01 June 2011

Revised: 01 May 2011

Received: 01 January 2011

Published in TOMM Volume 8, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Ministry of Science and Technology of the People's Republic of China
CSIDM Project

Citations

Cited By

Zhang X, Yang C, Wang H, Xu W, Kuo C (2020). Satisfied-User-Ratio Modeling for Compressed Video. IEEE Transactions on Image Processing 29, 3777-3789. Online publication date: 1-Jan-2020.
https://dl.acm.org/doi/10.1109/TIP.2020.2965994
Zhang G, Gong X (2016). Nonnegative Matrix Cofactorization for Weakly Supervised Image Parsing. IEEE Signal Processing Letters 23:11, 1682-1686. Online publication date: Nov-2016.
https://doi.org/10.1109/LSP.2016.2614704
Xu X, Ma J (2016). Weakly supervised image parsing by discriminatively semantic graph propagation. 2016 IEEE International Conference on Multimedia and Expo (ICME), 1-6. Online publication date: Jul-2016.
https://doi.org/10.1109/ICME.2016.7552866
Lai B, Gong X (2016). Saliency Guided Dictionary Learning for Weakly-Supervised Image Parsing. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3630-3639. Online publication date: Jun-2016.
https://doi.org/10.1109/CVPR.2016.395
Xu X, Ma J, Nie L (2016). Weakly supervised image parsing via label propagation over discriminatively semantic graph. Journal of Visual Communication and Image Representation 40:PB, 808-815. Online publication date: 1-Oct-2016.
https://dl.acm.org/doi/10.1016/j.jvcir.2016.08.005
Li M, Chen Z, Tan P, Sun S, Tan Y (2015). QoE-aware video streaming for SVC over multiuser MIMO-OFDM systems. Journal of Visual Communication and Image Representation 26:C, 24-36. Online publication date: 1-Jan-2015.
https://dl.acm.org/doi/10.1016/j.jvcir.2014.10.011
Xie W, Peng Y, Xiao J, Hua K, Rui Y, Steinmetz R, Hanjalic A, Natsev A, Zhu W (2014). Weakly-Supervised Image Parsing via Constructing Semantic Graphs and Hypergraphs. Proceedings of the 22nd ACM international conference on Multimedia, 277-286. Online publication date: 3-Nov-2014.
https://dl.acm.org/doi/10.1145/2647868.2654910
