Abstract
Deep learning-based semantic segmentation with visible cameras reports state-of-the-art segmentation accuracy. However, this approach is limited by the visible camera's susceptibility to varying illumination and environmental conditions. One way to address this limitation is sensor fusion of visible and thermal cameras. Existing literature applies this sensor fusion approach to object segmentation, but its application to free space segmentation has not been reported. Here, a multi-label multi-class visible-thermal camera learning framework, termed BVTNet, is proposed for the semantic segmentation of pedestrians and free space. The BVTNet estimates pedestrians and free space in one multi-class output branch. Additionally, the network separately estimates the free space and pedestrian boundaries in another multi-class output branch. The boundary semantic segmentation is integrated into the full semantic segmentation framework in a post-processing step. The proposed framework is validated on the public MFNet dataset. A comparative analysis with baseline algorithms and ablation studies with BVTNet variants show that the proposed framework achieves state-of-the-art segmentation accuracy in real-time under challenging environmental conditions.
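The abstract describes two output branches, with the boundary branch folded into the full segmentation in a post-processing step, but does not spell out the fusion rule here. The sketch below is a minimal illustration of one such post-processing scheme, not the paper's actual method: the function name, the boundary threshold, and the rule of suppressing labels at predicted boundary pixels are all assumptions.

```python
import numpy as np

def fuse_branches(sem_logits, boundary_prob, thresh=0.5):
    """Illustrative post-processing (assumed, not the paper's exact rule):
    take the per-pixel argmax of the multi-class semantic branch, then
    overwrite pixels the boundary branch marks as class boundaries.

    sem_logits:    (C, H, W) scores from the semantic output branch.
    boundary_prob: (H, W) boundary confidence from the boundary branch.
    Returns an (H, W) label map; boundary pixels are reset to label 0.
    """
    labels = sem_logits.argmax(axis=0)        # hard labels per pixel
    labels[boundary_prob > thresh] = 0        # mask out boundary pixels
    return labels
```

In practice, such a step lets the sharper boundary branch refine the coarser region predictions of the semantic branch; the specific integration used by BVTNet is detailed in the paper itself.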
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
John, V., Boyali, A., Thompson, S., Mita, S. (2021). BVTNet: Multi-label Multi-class Fusion of Visible and Thermal Camera for Free Space and Pedestrian Segmentation. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science(), vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_24
DOI: https://doi.org/10.1007/978-3-030-68780-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68779-3
Online ISBN: 978-3-030-68780-9