DOI: 10.1145/3490035.3490290
research-article

Enhancing label transfer in non-parametric scene parsing by superpixel-based dense alignment

Published: 19 December 2021

Abstract

Contemporary (parametric) scene parsing methods are learning-based and mostly operate in a closed-universe scenario. We introduce a non-parametric scene parsing framework that is model-free, data-driven, and scales naturally to growing data. Performance in the non-parametric approach depends on reliable dense correspondence (alignment) across scenes for label transfer, and incorrect correspondence is known to degrade parsing results. We propose a label transfer approach in which semantic labels are transferred along dense correspondences between superpixel pairs (one in the query image, one in a candidate image) matched using a homogeneous kernel map. Multiple transferred labels are fused by simple majority voting. In the smoothing stage, a Markov Random Field (MRF) provides a principled probabilistic framework for combining the disparate information and ensures plausible labeling results. Evaluation shows that our non-parametric system obtains competitive scene parsing performance on the standard SIFT Flow and MSRC-21 datasets.
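The label-fusion step described in the abstract (simple majority voting over labels transferred from matched superpixels) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `transfer_labels`, the dictionary layout of `matches`, and the tie-breaking rule (the earliest candidate label wins) are assumptions made only for illustration. The superpixel matching itself and the MRF smoothing stage are not shown.

```python
from collections import Counter


def majority_vote(candidate_labels):
    """Return the most frequent label in a non-empty list.

    Ties are broken in favour of the label that appears first,
    which is an assumption of this sketch, not a rule from the paper.
    """
    counts = Counter(candidate_labels)
    best = max(counts.values())
    for label in candidate_labels:
        if counts[label] == best:
            return label


def transfer_labels(matches):
    """Fuse transferred labels per query superpixel.

    `matches` (hypothetical layout) maps a query superpixel id to the list
    of labels carried by the candidate superpixels it was matched to.
    Superpixels with no matches are left unlabeled (omitted from the result).
    """
    return {sp: majority_vote(labels) for sp, labels in matches.items() if labels}


if __name__ == "__main__":
    matches = {
        0: ["sky", "sky", "tree"],
        1: ["road", "car", "road"],
        2: ["building"],
        3: [],  # no candidate matched this superpixel
    }
    print(transfer_labels(matches))
```

In a full pipeline, the fused per-superpixel labels would serve only as an initial labeling that the smoothing stage then refines.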

Published In

ICVGIP '21: Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing
December 2021
428 pages
ISBN: 9781450375962
DOI: 10.1145/3490035
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MRF
  2. data-driven
  3. homogeneous kernels
  4. non-parametric
  5. scene parsing

Qualifiers

  • Research-article

Conference

ICVGIP '21

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%
