Joint Visual Phrase Detection to Boost Scene Parsing

Tang, Keke; Zhao, Zhe; Chen, Xiaoping

doi:10.1007/978-3-319-27863-6_36

Keke Tang²⁵,
Zhe Zhao²⁵ &
Xiaoping Chen²⁵

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9475))

Included in the following conference series:

International Symposium on Visual Computing

1876 Accesses

Abstract

Scene parsing is a very challenging problem which attracts increasing interests in many fields such as computer vision and robotics. However, occluded or small objects which are difficult to parse are always ignored. To deal with these two problems, we integrate visual phrase into our joint system, which has been proved to have good performance on describing relationships between objects. In this paper, we propose a joint model which integrates scene classification, object and visual phrase detection, as well as scene parsing together. By encoding them into a Conditional Random Field model, all tasks mentioned above could be solved jointly. We evaluate our method on the MSRC-21 dataset. The experimental results demonstrate that our method achieves comparable and on some occasions even superior performance with respect to state-of-the-art joint methods especially when there exist partially occluded or small objects.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing: Label transfer via dense scene alignment. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 1972–1979. IEEE (2009)
Google Scholar
Tighe, J., Lazebnik, S.: SuperParsing: scalable nonparametric image parsing with superpixels. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 352–365. Springer, Heidelberg (2010)
Chapter Google Scholar
Ladický, Ľ., Sturgess, P., Alahari, K., Russell, C., Torr, P.H.S.: What, where and how many? combining object detectors and CRFs. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 424–437. Springer, Heidelberg (2010)
Chapter Google Scholar
Ren, X., Bo, L., Fox, D.: Rgb-(d) scene labeling: Features and algorithms. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2759–2766. IEEE (2012)
Google Scholar
Shotton, J., Winn, J., Rother, C., Criminisi, A.: Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 81, 2–23 (2009)
Article Google Scholar
Tighe, J., Lazebnik, S.: Finding things: Image parsing with regions and per-exemplar detectors. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3001–3008. IEEE (2013)
Google Scholar
Yao, J., Fidler, S., Urtasun, R.: Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 702–709. IEEE (2012)
Google Scholar
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8. IEEE (2008)
Google Scholar
Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: Large-scale scene recognition from abbey to zoo. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3485–3492. IEEE (2010)
Google Scholar
Wojek, C., Schiele, B.: A dynamic conditional random field model for joint labeling of object and scene classes. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS, vol. 5305, pp. 733–747. Springer, Heidelberg (2008)
Chapter Google Scholar
Yang, J., Price, B., Cohen, S., Yang, M.H.: Context driven scene parsing with attention to rare classes. In: Proceedings of the CVPR (2014)
Google Scholar
Sadeghi, M.A., Farhadi, A.: Recognition using visual phrases. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1745–1752. IEEE (2011)
Google Scholar
Li, C., Parikh, D., Chen, T.: Automatic discovery of groups of objects for scene understanding. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2735–2742. IEEE (2012)
Google Scholar
Sadovnik, A., Chen, T.: Hierarchical object groups for scene classification. In: 2012 19th IEEE International Conference on Image Processing (ICIP), pp. 1881–1884. IEEE (2012)
Google Scholar
Choi, W., Chao, Y.W., Pantofaru, C., Savarese, S.: Understanding indoor scenes using 3d geometric phrases. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 33–40. IEEE (2013)
Google Scholar
Shotton, J., Johnson, M., Cipolla, R.: Semantic texton forests for image categorization and segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8. IEEE (2008)
Google Scholar
Malisiewicz, T., Gupta, A., Efros, A.A.: Ensemble of exemplar-svms for object detection and beyond. In: ICCV (2011)
Google Scholar
Hazan, T., Urtasun, R.: A primal-dual message-passing algorithm for approximated large scale structured prediction. In: Advances in Neural Information Processing Systems, pp. 838–846 (2010)
Google Scholar
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 898–916 (2011)
Article Google Scholar
Kohli, P., Kumar, M.P., Torr, P.H.S.: P3 and beyond: Solving energies with higher order cliques. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2007)
Google Scholar
Schwing, A.G., Hazan, T., Pollefeys, M., Urtasun, R.: Distributed Message passing for large scale graphical models. In: Proceedings of the CVPR (2011)
Google Scholar
Krähenbühl, P., Koltun, V.: Efficient inference in fully connected crfs with gaussian edge potentials. In: Advances in Neural Information Processing Systems, pp. 109–117 (2011)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, China
Keke Tang, Zhe Zhao & Xiaoping Chen

Authors

Keke Tang
View author publications
You can also search for this author in PubMed Google Scholar
Zhe Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Xiaoping Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Keke Tang .

Editor information

Editors and Affiliations

University of Nevada, Reno, Nevada, USA
George Bebis
NASA Ames Research Center, Moffett Field, California, USA
Richard Boyle
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Bahram Parvin
Desert Research Institute, Reno, Nevada, USA
Darko Koracin
University of Houston, Houston, Texas, USA
Ioannis Pavlidis
IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Rogerio Feris
Purdue University, West Lafayette, Indiana, USA
Tim McGraw
Side Effects Software, Santa Monica, California, USA
Mark Elendt
The DiVE, Durham, North Carolina, USA
Regis Kopper
Texas A&M University, College Station, Texas, USA
Eric Ragan
Kent State University, Kent, Ohio, USA
Zhao Ye
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Gunther Weber

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tang, K., Zhao, Z., Chen, X. (2015). Joint Visual Phrase Detection to Boost Scene Parsing. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2015. Lecture Notes in Computer Science(), vol 9475. Springer, Cham. https://doi.org/10.1007/978-3-319-27863-6_36

Download citation

DOI: https://doi.org/10.1007/978-3-319-27863-6_36
Published: 18 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27862-9
Online ISBN: 978-3-319-27863-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics