Local Context Embedding Neural Network for Scene Semantic Segmentation

Li, Junxia; Dai, Lingzheng; Ding, Yu; Liu, Qingshan

doi:10.1007/978-3-030-31723-2_30

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11858)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

This paper presents the Local Context Embedding (LCE) network, a novel and effective architecture for scene semantic segmentation. Unlike previous work, we characterize local context by exploiting the content of image patches to make features more discriminative. Specifically, LCE uses a Long Short-Term Memory (LSTM) network to pass spatially varying contextual information both horizontally and vertically across each small patch derived from fully convolutional feature maps. Using sequences of local patches from different directions characterizes the spatial context extensively. This embedding-based network therefore lets us exploit more meaningful information for segmentation in an end-to-end fashion. Comprehensive evaluations on the CamVid and SUN RGB-D datasets demonstrate the effectiveness and robustness of the proposed architecture.
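
The patch-wise horizontal and vertical LSTM passes described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the patch size, hidden size, scan direction, or how the two passes are combined, so the non-overlapping patches, the single-direction scans (left to right, top to bottom), the concatenation of the two outputs, and the randomly initialised weights are all assumptions made for illustration. The names `LSTMCell`, `embed_patch`, and `local_context_embedding` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Plain LSTM cell with random weights (illustrative, untrained)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One stacked matrix for input, forget, output gates and the candidate.
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def run(self, seq):
        """Map a sequence of shape (T, in_dim) to hidden states (T, hid_dim)."""
        h = np.zeros(self.hid_dim)
        c = np.zeros(self.hid_dim)
        states = []
        for x in seq:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            states.append(h)
        return np.stack(states)


def embed_patch(patch, lstm_h, lstm_v):
    """patch: (p, p, C) -> (p, p, 2 * hid_dim)."""
    # Horizontal pass: every row is scanned left to right.
    horiz = np.stack([lstm_h.run(row) for row in patch])
    # Vertical pass: every column is scanned top to bottom.
    cols = patch.transpose(1, 0, 2)
    vert = np.stack([lstm_v.run(col) for col in cols]).transpose(1, 0, 2)
    # Assumption: the two directional outputs are simply concatenated.
    return np.concatenate([horiz, vert], axis=-1)


def local_context_embedding(fmap, patch_size, lstm_h, lstm_v):
    """fmap: (H, W, C) feature map, split into non-overlapping patches (assumed)."""
    H, W, _ = fmap.shape
    assert H % patch_size == 0 and W % patch_size == 0
    out = np.zeros((H, W, 2 * lstm_h.hid_dim))
    for r in range(0, H, patch_size):
        for c in range(0, W, patch_size):
            block = fmap[r:r + patch_size, c:c + patch_size]
            out[r:r + patch_size, c:c + patch_size] = embed_patch(block, lstm_h, lstm_v)
    return out


rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 16))  # stand-in for a convolutional feature map
lstm_h = LSTMCell(16, 4, rng)
lstm_v = LSTMCell(16, 4, rng)
emb = local_context_embedding(fmap, 4, lstm_h, lstm_v)
print(emb.shape)  # (8, 8, 8)
```

Because each patch is processed independently, a change to the features in one patch leaves the embeddings of every other patch untouched, which is the sense in which the context here is local. In a trained network the LSTM weights would be learned jointly with the convolutional layers rather than sampled at random.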

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Numbers 61702272, 61773219, 61771249 and 61802199, the Startup Foundation for Introducing Talent of NUIST (2243141701034, 2243141701023), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (17KJB535002).

Author information

Authors and Affiliations

B-DAT and CICAEET, Nanjing University of Information Science and Technology, Nanjing, 210044, China
Junxia Li, Yu Ding & Qingshan Liu
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
Lingzheng Dai

Corresponding author

Correspondence to Junxia Li.

Editor information

Editors and Affiliations

School of EECS, Peking University, Beijing, China
Zhouchen Lin
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Liang Wang
Nanjing University of Science and Technology, Nanjing, China
Jian Yang
Xidian University, Xi'an, China
Guangming Shi
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Institute of Artificial Intelligence, Xi'an Jiaotong University, Xi'an, China
Nanning Zheng
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Northwestern Polytechnical University, Xi'an, China
Yanning Zhang

About this paper

Cite this paper

Li, J., Dai, L., Ding, Y., Liu, Q. (2019). Local Context Embedding Neural Network for Scene Semantic Segmentation. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_30

DOI: https://doi.org/10.1007/978-3-030-31723-2_30
Published: 31 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31722-5
Online ISBN: 978-3-030-31723-2
eBook Packages: Computer Science, Computer Science (R0)
