HISNet: a Human Image Segmentation Network aiding bokeh effect generation

Multimedia Tools and Applications

Abstract

The bokeh effect has grown markedly popular as smartphone cameras have improved, since it draws attention to the subject of an image and enhances the overall quality of the photo. Such effects are typically produced by dual-lens cameras that auto-focus on the subject; smartphones with a single lens must instead rely on software to generate them. This paper proposes a deep learning pipeline that produces depth-aware segmentation maps of human images using segmentation and depth estimation networks. We design a concatenation-based decoder for segmentation that applies and experiments with features learned by state-of-the-art encoder architectures, and we further concatenate the encodings of two prominent encoders to form an ensemble model for learning segments. We then combine our segmentation results with a prominent depth estimation architecture to generate depth-aware segmentation maps, yielding photos in which the human subject remains in focus while out-of-focus regions are blurred. The methodology produces compelling bokeh effects, comparable with shots taken with a dual-lens mobile camera or a DSLR. For human segmentation, we report benchmark results with our best-performing model: training on the Supervisely Persons dataset achieved an IoU of 95.88%, while training the same network on the EG1800 dataset achieved a state-of-the-art IoU of 96.89%. The final segmentation model therefore provides highly accurate segmentation maps suitable for our task.
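
To make the pipeline concrete, the sketch below shows the final compositing step in Python: a person mask from a segmentation network and a depth map from a depth estimator are combined so that the subject stays sharp while the background is blurred in proportion to its depth. This is a minimal illustration, not the authors' implementation; the function and parameter names (apply_bokeh, max_kernel) and the four-band blur scheme are assumptions.

```python
import cv2
import numpy as np

def apply_bokeh(image, person_mask, depth_map, max_kernel=21):
    """Hypothetical compositing step: sharp subject over a depth-weighted blur.

    image:       H x W x 3 uint8 photo
    person_mask: H x W float mask in [0, 1], 1 = person (segmentation output)
    depth_map:   H x W float map in [0, 1], 1 = farthest (depth-net output)
    """
    result = image.astype(np.float32)
    # Blur strength grows with depth: split the depth range into four bands
    # and blur each band with a progressively larger Gaussian kernel.
    for frac in np.linspace(0.25, 1.0, 4):
        k = max(3, int(max_kernel * frac)) | 1   # Gaussian kernel size must be odd
        blurred = cv2.GaussianBlur(image, (k, k), 0).astype(np.float32)
        band = ((depth_map >= frac - 0.25) & (depth_map <= frac)).astype(np.float32)
        result = band[..., None] * blurred + (1.0 - band[..., None]) * result
    # Re-insert the sharp subject using the (soft) segmentation mask.
    alpha = person_mask.astype(np.float32)[..., None]
    out = alpha * image.astype(np.float32) + (1.0 - alpha) * result
    return out.astype(np.uint8)
```

In practice the soft alpha blend at the mask boundary matters as much as the blur itself: a hard binary mask leaves visible halos around the subject, which is why the accuracy of the segmentation maps reported above is central to the quality of the effect.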

Data availability

The datasets used in this manuscript were obtained from the following sources and are freely available.

The Supervisely Persons dataset that supports the findings of this study is available from the supervisely-ecosystem/persons GitHub repository: https://github.com/supervisely-ecosystem/persons.

The EG1800 dataset that supports the findings of this study was obtained from the link provided by the original authors, http://xiaoyongshen.me/webpage_portrait/index.html, as described in their paper at https://doi.org/10.1111/cgf.12814.

Both datasets are cited and were used ethically.

References

  1. Alhashim I, Wonka P (2018) High quality monocular depth estimation via transfer learning. arXiv preprint arXiv:1812.11941

  2. Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615

  3. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184

  4. Chen J et al (2021) A transfer learning based super-resolution microscopy for biopsy slice images: the joint methods perspective. IEEE/ACM Trans Comput Biol Bioinforma 18(1):103–113. https://doi.org/10.1109/TCBB.2020.2991173

  5. Chen J et al (2021) TransUNet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306

  6. Digital single-lens reflex camera. Wikipedia. https://en.wikipedia.org/wiki/Digital_single-lens_reflex_camera

  7. Fei-Fei L, Deng J, Li K (2010) ImageNet: constructing a large-scale image database. J Vis 9(8):1037. https://doi.org/10.1167/9.8.1037

  8. Feng R et al (2021) ChroNet: A multi-task learning based approach for prediction of multiple chronic diseases. Multimed Tools Appl. https://doi.org/10.1007/s11042-020-10482-8

  9. Fu J et al (2019) Dual attention network for scene segmentation. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:3141–3149. https://doi.org/10.1109/CVPR.2019.00326

  10. Gao H, Xu K, Cao M, Xiao J, Xu Q, Yin Y (2022) The deep features and attention mechanism-based method to dish healthcare under social IoT systems: an empirical study with a hand-deep local-global net. IEEE Trans Comput Soc Syst 9(1):336–347. https://doi.org/10.1109/TCSS.2021.3102591

  11. Godard C, Aodha OM, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017, pp 6602–6611. https://doi.org/10.1109/CVPR.2017.699

  12. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385

  13. Howard AG et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

  14. Howard A et al (2019) Searching for mobileNetV3. Proc IEEE Int Conf Comput Vis 2019:1314–1324. https://doi.org/10.1109/ICCV.2019.00140

  15. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. https://doi.org/10.1109/CVPR.2017.243

  16. Kirkland EJ (2010) Advanced computing in electron microscopy, 2nd edn. Springer. https://doi.org/10.1007/978-1-4419-6533-2

  17. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. ACM Int. Conf. Proceeding Ser., pp 145–151. https://doi.org/10.1145/3383972.3383975

  18. Laina I, Rupprecht C, Belagiannis V, Tombari F, Navab N (2016) Deeper depth prediction with fully convolutional residual networks. Proc. – 2016 4th Int. Conf. 3D Vision, 3DV 2016, pp 239–248. https://doi.org/10.1109/3DV.2016.32

  19. Lee JH, Kim CS (2019) Monocular depth estimation using relative depth maps. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:9721–9730. https://doi.org/10.1109/CVPR.2019.00996

  20. Lin TY et al (2014) Microsoft COCO: common objects in context. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol 8693 LNCS, no PART 5, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48

  21. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 2117–2125

  22. Martinez M, Yang K, Constantinescu A, Stiefelhagen R (2020) Helping the blind to get through covid-19: Social distancing assistant using real-time semantic segmentation on rgb-d video. Sens (Switzerland) 20(18):1–17. https://doi.org/10.3390/s20185202

  23. Mehta S, Rastegari M, Shapiro L, Hajishirzi H (2019) ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:9182–9192. https://doi.org/10.1109/CVPR.2019.00941

  24. Oktay O et al (2018) Attention U-Net: learning where to look for the pancreas. Medical Imaging with Deep Learning (MIDL)

  25. Park H, Sjosund LL, Yoo Y, Monet N, Bang J, Kwak N (2020) SINet: Extreme lightweight portrait segmentation networks with spatial squeeze modules and information blocking decoder. Proc. – 2020 IEEE Winter Conf. Appl. Comput. Vision, WACV 2020, vol 2, no 1, pp 2055–2063. https://doi.org/10.1109/WACV45572.2020.9093588

  26. Paszke A, Chaurasia A, Kim S, Culurciello E (2016) ENet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147

  27. Poudel RPK, Bonde U, Liwicki S, Zach C (2019) ContextNet: exploring context and detail for semantic segmentation in real-time. Br. Mach. Vis. Conf. 2018, BMVC 2018

  28. Ranftl R, Lasinger K, Hafner D, Schindler K, Koltun V (2020) Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2020.3019967

  29. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: inverted residuals and linear bottlenecks. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp 4510–4520. https://doi.org/10.1109/CVPR.2018.00474

  30. Shelhamer E, Long J, Darrell T (2015) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651. https://doi.org/10.1109/TPAMI.2016.2572683

  31. Shen X et al (2016) Automatic portrait segmentation for image stylization. Comput Graph Forum 35(2):93–102. https://doi.org/10.1111/cgf.12814

  32. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp 1–14

  33. Richter F (2017) Smartphones cause photography boom. Statista, 31 Aug 2017

  34. Sudre CH, Li W, Vercauteren T, Ourselin S, Jorge Cardoso M (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10553 LNCS, pp 240–248. https://doi.org/10.1007/978-3-319-67558-9_28

  35. Supervisely Person Dataset. Supervisely. https://github.com/supervisely-ecosystem/persons

  36. Weng W, Zhu X (2021) UNet: convolutional networks for biomedical image segmentation. IEEE Access 9:16591–16603. https://doi.org/10.1109/ACCESS.2021.3053408

  37. Xiao J, Xu H, Gao H, Bian M, Li Y (2021) A weakly supervised semantic segmentation network by aggregating seed cues: the multi-object proposal generation perspective. ACM Trans Multimed Comput Commun Appl 17(1s).  https://doi.org/10.1145/3419842

  38. Xiao J, Xu H, Fang DK, Cheng C, Gao HH (2021) Boosting and rectifying few-shot learning prototype network for skin lesion classification based on the internet of medical things. Wirel Netw 0123456789:1–15. https://doi.org/10.1007/s11276-021-02713-z

  39. Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) SegFormer: simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34 (NeurIPS 2021)

  40. Xu X et al (2018) Rendering portraitures from monocular camera and beyond. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol 11213 LNCS, pp 36–51. https://doi.org/10.1007/978-3-030-01240-3_3

  41. Yang K et al (2018) Unifying terrain awareness for the visually impaired through real-time semantic segmentation. Sens (Switzerland) 18(5):1–32. https://doi.org/10.3390/s18051506

  42. Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) BiSeNet: bilateral segmentation network for real-time semantic segmentation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 11217 LNCS:334–349. https://doi.org/10.1007/978-3-030-01261-8_20

  43. Zhang SH, Dong X, Li H, Li R, Yang YL (2019) PortraitNet: real-time portrait segmentation network for mobile device. Comput Graph 80:104–113. https://doi.org/10.1016/j.cag.2019.03.007

  44. Zhang T, Lang C, Xing J (2019) Realtime human segmentation in video. Lect Notes Comput Sci, vol 11296. Springer International Publishing

  45. Zhang J, Yang K, Constantinescu A, Peng K, Muller K, Stiefelhagen R (2021) Trans4Trans: efficient transformer for transparent object segmentation to help visually impaired people navigate in the real world. Proc IEEE Int Conf Comput Vis 2021:1760–1770. https://doi.org/10.1109/ICCVW54120.2021.00202

  46. Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017, pp 6230–6239. https://doi.org/10.1109/CVPR.2017.660

Author information

Corresponding author

Correspondence to Dinesh Kumar Vishwakarma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, S., Vishwakarma, D.K. HISNet: a Human Image Segmentation Network aiding bokeh effect generation. Multimed Tools Appl 82, 12469–12492 (2023). https://doi.org/10.1007/s11042-022-13900-1
