Enhancing label transfer in non-parametric scene parsing by superpixel-based dense alignment

Authors: Alexy Bhowmick, Shyamanta M. Hazarika

ICVGIP '21: Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing

Article No.: 31, Pages 1 - 9

https://doi.org/10.1145/3490035.3490290

Published: 19 December 2021

Abstract

Contemporary (parametric) scene parsing methods are learning-based and mostly operate in a closed-universe scenario. We introduce a non-parametric scene parsing framework that is model-free, data-driven, and scales naturally to growing data. In the non-parametric approach, scene parsing performance depends on reliable dense correspondence (alignment) across scenes for label transfer, and incorrect correspondence is known to degrade the parsing results. We propose a label transfer approach that relies on dense correspondence between superpixel pairs (one from the query image, one from a candidate image), matched using a homogeneous kernel map, to guide semantic label transfer. Multiple transferred labels are fused by a simple heuristic aggregation scheme (majority voting). In the smoothing stage, a Markov Random Field (MRF) provides a principled probabilistic framework for combining the disparate information and ensures plausible labeling results. Evaluation results show that our non-parametric system obtains competitive scene parsing performance on the standard SIFT Flow and MSRC-21 datasets.
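
The majority-voting aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_labels`, the dictionary layout, and the alphabetical tie-break are assumptions made for illustration only. The abstract states only that labels are fused by simple majority voting.

```python
from collections import Counter


def aggregate_labels(candidate_labels):
    """Fuse the labels transferred to each query superpixel by majority vote.

    candidate_labels (hypothetical layout): dict mapping a query superpixel id
    to the list of labels transferred from its matched candidate superpixels.
    Returns a dict mapping each superpixel id to the winning label, or None
    when no label was transferred to that superpixel.
    """
    fused = {}
    for sp_id, labels in candidate_labels.items():
        if not labels:
            fused[sp_id] = None
            continue
        counts = Counter(labels)
        top = max(counts.values())
        # Assumed tie-break: pick the alphabetically first of the most
        # frequent labels so the result is deterministic.
        fused[sp_id] = min(label for label, c in counts.items() if c == top)
    return fused


# Toy votes for three query superpixels (made-up labels, not real data).
votes = {
    0: ["sky", "sky", "building"],
    1: ["road", "car", "road", "road"],
    2: [],
}
fused = aggregate_labels(votes)
```

In a full pipeline, the fused labels would then serve as the initial labeling that the MRF smoothing stage refines; that stage is not sketched here.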

Index Terms

Enhancing label transfer in non-parametric scene parsing by superpixel-based dense alignment
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
2. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic inference problems

Published In

ICVGIP '21: Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing

December 2021

428 pages

ISBN: 9781450375962

DOI: 10.1145/3490035

General Chairs: Rama Chellappa (Johns Hopkins University), Santanu Chaudhury (IIT Jodhpur)

Program Chairs: Chetan Arora (IIT Delhi), Parag Chaudhuri (IIT Bombay), Subhransu Maji (University of Massachusetts, Amherst)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICVGIP '21

ICVGIP '21: Indian Conference on Computer Vision, Graphics and Image Processing

December 19 - 22, 2021

Jodhpur, India

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%
