Semantic segmentation using reinforced fully convolutional densenet with multiscale kernel

Multimedia Tools and Applications

Abstract

In recent years, semantic segmentation has become one of the most active tasks in computer vision. Its goal is to group image pixels into semantically meaningful regions. Deep learning methods, in particular those that use convolutional neural networks (CNNs), have been highly successful at this task. In this paper, we introduce a semantic segmentation system that uses a reinforced fully convolutional DenseNet with a multiscale kernel prediction method. Our main contribution is an encoder-decoder architecture in which we increase the width of the dense blocks in the encoder by adding recurrent connections inside each dense block. The resulting structure is called a wider dense block, in which each dense block takes not only the output of the previous layer but also the initial input of the dense block. This recurrent structure emulates the human brain system and helps to strengthen the extraction of the target features. As a result, our network becomes deeper and wider with no additional parameters, thanks to weight sharing. Moreover, a multiscale convolutional layer is added after the last dense block of the decoder to perform model averaging over different spatial scales and to provide a more flexible method. The proposed method has been evaluated on two semantic segmentation benchmarks, CamVid and Cityscapes, and it outperforms many recent state-of-the-art works.
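The two ideas in the abstract, recurrent connections with shared weights and averaging over several kernel sizes, can be illustrated with a toy sketch. This is not the authors' implementation: it works on 1D lists in plain Python instead of 2D feature maps, and the function names (`conv1d_same`, `recurrent_block`, `multiscale_average`), the ReLU activation, and the exact update rule are assumptions made only for illustration.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1D correlation that keeps the input length (odd kernel sizes)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def recurrent_block(x0, w_input, w_recurrent, steps):
    """Unroll `steps` recurrent passes that all reuse the same two kernels.

    Each pass combines the block's initial input x0 with the previous state,
    so unrolling adds depth without adding parameters (weight sharing).
    """
    state = relu(conv1d_same(x0, w_input))
    for _ in range(steps):
        from_input = conv1d_same(x0, w_input)          # initial input of the block
        from_state = conv1d_same(state, w_recurrent)   # output of the previous pass
        state = relu([a + b for a, b in zip(from_input, from_state)])
    return state


def multiscale_average(features, kernels):
    """Average the responses of kernels with different sizes (model averaging)."""
    responses = [conv1d_same(features, k) for k in kernels]
    return [sum(column) / len(kernels) for column in zip(*responses)]


if __name__ == "__main__":
    x0 = [1.0, 2.0, 3.0]
    w_input = [0.0, 1.0, 0.0]       # hypothetical identity kernel
    w_recurrent = [0.0, 0.5, 0.0]   # hypothetical recurrent kernel
    for steps in (0, 1, 2):
        # The parameter count stays at len(w_input) + len(w_recurrent)
        # regardless of how many times the block is unrolled.
        print(steps, recurrent_block(x0, w_input, w_recurrent, steps))
    print(multiscale_average(x0, [[2.0], [1.0, 1.0, 1.0]]))
```

In this sketch the number of unrolled passes changes the effective depth of the block, while the only trainable quantities remain the two kernels, which is the sense in which weight sharing avoids additional parameters.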

Acknowledgements

The research leading to these results has received funding from the Ministry of Higher Education and Scientific Research of Tunisia under the grant agreement number LR11ES48. LISTIC experiments have been made possible thanks to the MUST computing center of the University of Savoie Mont Blanc.

Author information

Corresponding author

Correspondence to Najib Ben Aoun.

About this article

Cite this article

Brahimi, S., Ben Aoun, N., Benoit, A. et al. Semantic segmentation using reinforced fully convolutional densenet with multiscale kernel. Multimed Tools Appl 78, 22077–22098 (2019). https://doi.org/10.1007/s11042-019-7430-x

  • DOI: https://doi.org/10.1007/s11042-019-7430-x
