Abstract
The bokeh effect has gained enormous popularity in photography since the advent of improved smartphone cameras, as it draws attention to the subject of an image and enhances the overall quality of the photo. Such effects are generally produced by dual-lens cameras that auto-focus on the subject; smartphones with a single lens must instead rely on software to generate the effect. This paper proposes a deep learning pipeline that generates depth-aware segmentation maps of human images via segmentation and depth estimation networks. We present a concatenation-based decoder for segmentation that applies and experiments with features learned by state-of-the-art encoder architectures, and we further concatenate the encodings of two prominent encoders to form an ensemble model for learning segments. We then combine a prominent depth estimation architecture with our segmentation results to generate depth-aware segmentation maps, yielding photos in which the human subject remains in focus while out-of-focus regions appear blurred. The methodology produces compelling bokeh effects, comparable with shots taken by a dual-lens mobile camera or a DSLR. Benchmark results for human segmentation are reported with our best-performing model: training on the Supervisely Persons dataset achieved an IoU score of 95.88%, whereas training the same network on the EG1800 dataset achieved a state-of-the-art IoU of 96.89%. The final segmentation model thus provides highly accurate segmentation maps suitable for our task.
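The final compositing step the abstract describes (a sharp subject blended over a depth-weighted blurred background) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `apply_bokeh`, the uniform Gaussian blur, and the linear depth-weighted blend are all assumptions; in the actual pipeline, `mask` and `depth` would come from the learned segmentation and depth estimation networks.

```python
# Illustrative sketch of depth-aware bokeh compositing (assumed, not the
# paper's exact method): the segmentation mask keeps the subject sharp,
# while the depth map scales how strongly the background is blurred.
import numpy as np
from scipy.ndimage import gaussian_filter

def apply_bokeh(image, mask, depth, blur_sigma=5.0):
    """Blend a sharp subject over a depth-weighted blurred background.

    image: (H, W) or (H, W, C) float array in [0, 1]
    mask:  (H, W) float array, 1 = human subject, 0 = background
    depth: (H, W) float array, larger = farther from the camera
    """
    # Normalise depth to [0, 1] so blur strength grows with distance.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    # Blur spatial axes only (leave colour channels untouched).
    sigma = (blur_sigma, blur_sigma) if image.ndim == 2 else (blur_sigma, blur_sigma, 0)
    blurred = gaussian_filter(image, sigma=sigma)
    # Per-pixel blend weight: 0 on the subject, approaching 1 far away.
    weight = (1.0 - mask) * d
    if image.ndim == 3:
        weight = weight[..., None]
    return (1.0 - weight) * image + weight * blurred
```

Because the blend weight is exactly zero wherever the mask is one, subject pixels are passed through unchanged; only background pixels are mixed with their blurred counterparts, in proportion to their normalised depth.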
Data availability
It is declared that the datasets used in this manuscript were obtained from the following sources and are freely available.
The Supervisely Persons dataset that supports the findings of this study is available from the supervisely-ecosystem/persons GitHub repository: https://github.com/supervisely-ecosystem/persons.
The EG1800 dataset that supports the findings of this study was obtained from the link provided by the original authors (http://xiaoyongshen.me/webpage_portrait/index.html) in their paper, available at https://doi.org/10.1111/cgf.12814.
These datasets are cited and used ethically.
References
Alhashim I, Wonka P (2018) High quality monocular depth estimation via transfer learning
Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184
Chen J et al (2021) A transfer learning based super-resolution microscopy for biopsy slice images: the joint methods perspective. IEEE/ACM Trans Comput Biol Bioinforma 18(1):103–113. https://doi.org/10.1109/TCBB.2020.2991173
Chen J et al (2021) TransUNet: transformers make strong encoders for medical image segmentation, pp 1–13
Digital single-lens reflex camera - Wikipedia
Fei-Fei L, Deng J, Li K (2010) ImageNet: constructing a large-scale image database. J Vis 9(8):1037. https://doi.org/10.1167/9.8.1037
Feng R et al (2021) ChroNet: A multi-task learning based approach for prediction of multiple chronic diseases. Multimed Tools Appl. https://doi.org/10.1007/s11042-020-10482-8
Fu J et al (2019) Dual attention network for scene segmentation. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:3141–3149. https://doi.org/10.1109/CVPR.2019.00326
Gao H, Xu K, Cao M, Xiao J, Xu Q, Yin Y (2022) The deep features and attention mechanism-based method to dish healthcare under social IoT systems: an empirical study with a hand-deep local-global net. IEEE Trans Comput Soc Syst 9(1):336–347. https://doi.org/10.1109/TCSS.2021.3102591
Godard C, Aodha OM, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017, pp 6602–6611. https://doi.org/10.1109/CVPR.2017.699
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition
Howard AG et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications
Howard A et al (2019) Searching for mobileNetV3. Proc IEEE Int Conf Comput Vis 2019:1314–1324. https://doi.org/10.1109/ICCV.2019.00140
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. https://doi.org/10.1109/CVPR.2017.243
Kirkland EJ (2010) Advanced computing in electron microscopy: second edition. Adv Comput Electron Microsc Second Ed 1–289. https://doi.org/10.1007/978-1-4419-6533-2
Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. ACM Int. Conf. Proceeding Ser., pp 145–151. https://doi.org/10.1145/3383972.3383975
Laina I, Rupprecht C, Belagiannis V, Tombari F, Navab N (2016) Deeper depth prediction with fully convolutional residual networks. Proc. – 2016 4th Int. Conf. 3D Vision, 3DV 2016, pp 239–248. https://doi.org/10.1109/3DV.2016.32
Lee JH, Kim CS (2019) Monocular depth estimation using relative depth maps. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:9721–9730. https://doi.org/10.1109/CVPR.2019.00996
Lin TY et al (2014) Microsoft COCO: common objects in context. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol 8693 LNCS, no PART 5, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 2117–2125
Martinez M, Yang K, Constantinescu A, Stiefelhagen R (2020) Helping the blind to get through covid-19: Social distancing assistant using real-time semantic segmentation on rgb-d video. Sens (Switzerland) 20(18):1–17. https://doi.org/10.3390/s20185202
Mehta S, Rastegari M, Shapiro L, Hajishirzi H (2019) ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:9182–9192. https://doi.org/10.1109/CVPR.2019.00941
Oktay O et al (2018) Attention U-Net: learning where to look for the pancreas, no. Midl
Park H, Sjosund LL, Yoo Y, Monet N, Bang J, Kwak N (2020) SINet: Extreme lightweight portrait segmentation networks with spatial squeeze modules and information blocking decoder. Proc. – 2020 IEEE Winter Conf. Appl. Comput. Vision, WACV 2020, vol 2, no 1, pp 2055–2063. https://doi.org/10.1109/WACV45572.2020.9093588
Paszke A, Chaurasia A, Kim S, Culurciello E (2016) ENet: a deep neural network architecture for real-time semantic segmentation, pp 1–10
Poudel RPK, Bonde U, Liwicki S, Zach C (2019) ContextNet: exploring context and detail for semantic segmentation in real-time. Br. Mach. Vis. Conf. 2018, BMVC 2018
Ranftl R, Lasinger K, Hafner D, Schindler K, Koltun V (2020) Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2020.3019967
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: inverted residuals and linear bottlenecks. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp 4510–4520. https://doi.org/10.1109/CVPR.2018.00474
Shelhamer E, Long J, Darrell T (2015) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651. https://doi.org/10.1109/TPAMI.2016.2572683
Shen X et al (2016) Automatic portrait segmentation for image stylization. Comput Graph Forum 35(2):93–102. https://doi.org/10.1111/cgf.12814
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp 1–14
Smartphones Cause Photography Boom by Felix Richter (Aug 31, 2017), Statista
Sudre CH, Li W, Vercauteren T, Ourselin S, Jorge Cardoso M (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10553 LNCS, pp 240–248. https://doi.org/10.1007/978-3-319-67558-9_28
Supervisely Person Dataset - Datasets - Supervisely
Weng W, Zhu X (2015) UNet: convolutional networks for biomedical image segmentation. IEEE Access 9:16591–16603. https://doi.org/10.1109/ACCESS.2021.3053408
Xiao J, Xu H, Gao H, Bian M, Li Y (2021) A weakly supervised semantic segmentation network by aggregating seed cues: the multi-object proposal generation perspective. ACM Trans Multimed Comput Commun Appl 17(1s). https://doi.org/10.1145/3419842
Xiao J, Xu H, Fang DK, Cheng C, Gao HH (2021) Boosting and rectifying few-shot learning prototype network for skin lesion classification based on the internet of medical things. Wirel Netw 0123456789:1–15. https://doi.org/10.1007/s11276-021-02713-z
Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) SegFormer: simple and efficient design for semantic segmentation with transformers, no. NeurIPS, pp 1–14
Xu X et al (2018) Rendering portraitures from monocular camera and beyond. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol 11213 LNCS, pp 36–51. https://doi.org/10.1007/978-3-030-01240-3_3
Yang K et al (2018) Unifying terrain awareness for the visually impaired through real-time semantic segmentation. Sens (Switzerland) 18(5):1–32. https://doi.org/10.3390/s18051506
Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) BiSeNet: Bilateral segmentation network for real-time semantic segmentation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 11217 LNCS:334–349. https://doi.org/10.1007/978-3-030-01261-8_20
Zhang SH, Dong X, Li H, Li R, Yang YL (2019) PortraitNet: real-time portrait segmentation network for mobile device. Comput Graph 80:104–113. https://doi.org/10.1016/j.cag.2019.03.007
Zhang T, Lang C, Xing J (2019) Realtime human segmentation in video, vol 11296 LNCS. Springer International Publishing
Zhang J, Yang K, Constantinescu A, Peng K, Muller K, Stiefelhagen R (2021) Trans4Trans: efficient transformer for transparent object segmentation to help visually impaired people navigate in the real world. Proc IEEE Int Conf Comput Vis 2021:1760–1770. https://doi.org/10.1109/ICCVW54120.2021.00202
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017, pp 6230–6239. https://doi.org/10.1109/CVPR.2017.660
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Gupta, S., Vishwakarma, D.K. HISNet: a Human Image Segmentation Network aiding bokeh effect generation. Multimed Tools Appl 82, 12469–12492 (2023). https://doi.org/10.1007/s11042-022-13900-1