Abstract
This paper presents a novel and effective architecture for scene semantic segmentation, the Local Context Embedding (LCE) network. Unlike previous work, we characterize local context by exploiting the content of image patches, which makes the learned features more discriminative. Specifically, LCE uses a Long Short-Term Memory (LSTM) network to pass spatially varying contextual information both horizontally and vertically across each small patch derived from fully convolutional feature maps. Scanning the sequences of local patches from different directions characterizes the spatial context extensively. This embedding-based network therefore lets us exploit more meaningful information for segmentation in an end-to-end fashion. Comprehensive evaluations on the CamVid and SUN RGB-D datasets demonstrate the effectiveness and robustness of the proposed architecture.
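The directional scanning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the non-overlapping tiling, the patch size, the four scan orders, and the names `split_into_patches` and `directional_sequences` are all assumptions made for illustration. Each integer cell stands in for a feature vector, and each resulting sequence is what a recurrent unit such as an LSTM would consume.

```python
def split_into_patches(fmap, patch):
    """Split an H x W grid (a list of rows) into non-overlapping patch x patch blocks.

    Assumption: H and W are exact multiples of `patch`; the abstract does not
    say how patches are derived from the feature maps.
    """
    h, w = len(fmap), len(fmap[0])
    assert h % patch == 0 and w % patch == 0, "sketch assumes exact tiling"
    return [
        [row[c:c + patch] for row in fmap[r:r + patch]]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def directional_sequences(block):
    """Return four scan orders of one patch (hypothetical naming).

    The horizontal scan reads the patch row by row, the vertical scan reads it
    column by column, and each is also reversed to give the opposite direction.
    """
    rows, cols = len(block), len(block[0])
    horizontal = [cell for row in block for cell in row]
    vertical = [block[r][c] for c in range(cols) for r in range(rows)]
    return {
        "horizontal_forward": horizontal,
        "horizontal_backward": horizontal[::-1],
        "vertical_forward": vertical,
        "vertical_backward": vertical[::-1],
    }


# Toy 4 x 4 "feature map"; each integer stands in for a feature vector.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(fmap, 2)
seqs = directional_sequences(patches[0])
```

In a full model, each of the four sequences would be fed to a recurrent layer and the outputs merged back into the patch's spatial layout; how LCE performs that merge is not stated in the abstract, so it is left out here.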
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant Numbers 61702272, 61773219, 61771249 and 61802199, the Startup Foundation for Introducing Talent of NUIST (2243141701034, 2243141701023), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (17KJB535002).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J., Dai, L., Ding, Y., Liu, Q. (2019). Local Context Embedding Neural Network for Scene Semantic Segmentation. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_30
DOI: https://doi.org/10.1007/978-3-030-31723-2_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31722-5
Online ISBN: 978-3-030-31723-2
eBook Packages: Computer Science, Computer Science (R0)