Abstract
The bokeh effect has gained enormous popularity in photography since the advent of improved smartphone cameras, as it draws attention to the subject of an image and enhances the overall quality of the photo. Such effects are generally produced by dual-lens cameras that auto-focus on the subject; smartphones with a single lens must instead rely on software to generate the effect. This paper proposes a deep learning pipeline that generates depth-aware segmentation maps of human images via segmentation and depth estimation networks. We present a concatenation-based decoder for segmentation that applies and experiments with features learned by state-of-the-art encoder architectures, and we further concatenate the encodings of two prominent encoders to form an ensemble model for learning segments. We then combine a prominent depth estimation architecture with our segmentation results to generate depth-aware segmentation maps, yielding photos in which the human subject remains in focus while out-of-focus regions appear blurred. The methodology produces compelling bokeh effects, comparable with shots taken by a dual-lens mobile camera or a DSLR. Benchmark results for human segmentation are reported with our best-performing model: training on the Supervisely Persons dataset achieved an IoU score of 95.88%, whereas training the same network on the EG1800 dataset achieved a state-of-the-art IoU of 96.89%. The final segmentation model thus provides highly accurate segmentation maps suitable for our task.
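The final compositing step the abstract describes (a sharp subject blended over a depth-weighted blurred background) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `apply_bokeh`, the uniform Gaussian blur, and the linear depth-weighted blend are all assumptions; in the actual pipeline, `mask` and `depth` would come from the learned segmentation and depth estimation networks.

```python
# Illustrative sketch of depth-aware bokeh compositing (assumed, not the
# paper's exact method): the segmentation mask keeps the subject sharp,
# while the depth map scales how strongly the background is blurred.
import numpy as np
from scipy.ndimage import gaussian_filter

def apply_bokeh(image, mask, depth, blur_sigma=5.0):
    """Blend a sharp subject over a depth-weighted blurred background.

    image: (H, W) or (H, W, C) float array in [0, 1]
    mask:  (H, W) float array, 1 = human subject, 0 = background
    depth: (H, W) float array, larger = farther from the camera
    """
    # Normalise depth to [0, 1] so blur strength grows with distance.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    # Blur spatial axes only (leave colour channels untouched).
    sigma = (blur_sigma, blur_sigma) if image.ndim == 2 else (blur_sigma, blur_sigma, 0)
    blurred = gaussian_filter(image, sigma=sigma)
    # Per-pixel blend weight: 0 on the subject, approaching 1 far away.
    weight = (1.0 - mask) * d
    if image.ndim == 3:
        weight = weight[..., None]
    return (1.0 - weight) * image + weight * blurred
```

Because the blend weight is exactly zero wherever the mask is one, subject pixels are passed through unchanged; only background pixels are mixed with their blurred counterparts, in proportion to their normalised depth.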
Data availability
It is declared that the datasets used in this manuscript were obtained from the following sources and are freely available.
The Supervisely Persons dataset that supports the findings of this study is available from the supervisely-ecosystem/persons GitHub repository: https://github.com/supervisely-ecosystem/persons.
The EG1800 dataset that supports the findings of this study was obtained from the link provided by the original authors (http://xiaoyongshen.me/webpage_portrait/index.html) in their paper, available at https://doi.org/10.1111/cgf.12814.
These datasets are cited and used ethically.
References
Alhashim I, Wonka P (2018) High quality monocular depth estimation via transfer learning
Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848. https://doi.org/10.1109/TPAMI.2017.2699184
Chen J et al (2021) A transfer learning based super-resolution microscopy for biopsy slice images: the joint methods perspective. IEEE/ACM Trans Comput Biol Bioinforma 18(1):103–113. https://doi.org/10.1109/TCBB.2020.2991173
Chen J et al (2021) TransUNet: transformers make strong encoders for medical image segmentation, pp 1–13
Digital single-lens reflex camera - Wikipedia
Fei-Fei L, Deng J, Li K (2010) ImageNet: constructing a large-scale image database. J Vis 9(8):1037. https://doi.org/10.1167/9.8.1037
Feng R et al (2021) ChroNet: A multi-task learning based approach for prediction of multiple chronic diseases. Multimed Tools Appl. https://doi.org/10.1007/s11042-020-10482-8
Fu J et al (2019) Dual attention network for scene segmentation. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:3141–3149. https://doi.org/10.1109/CVPR.2019.00326
Gao H, Xu K, Cao M, Xiao J, Xu Q, Yin Y (2022) The deep features and attention mechanism-based method to dish healthcare under social IoT systems: an empirical study with a hand-deep local-global net. IEEE Trans Comput Soc Syst 9(1):336–347. https://doi.org/10.1109/TCSS.2021.3102591
Godard C, Aodha OM, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017, pp 6602–6611. https://doi.org/10.1109/CVPR.2017.699
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition
Howard AG et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications
Howard A et al (2019) Searching for mobileNetV3. Proc IEEE Int Conf Comput Vis 2019:1314–1324. https://doi.org/10.1109/ICCV.2019.00140
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. https://doi.org/10.1109/CVPR.2017.243
Kirkland EJ (2010) Advanced computing in electron microscopy: second edition. Adv Comput Electron Microsc Second Ed 1–289. https://doi.org/10.1007/978-1-4419-6533-2
Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. ACM Int. Conf. Proceeding Ser., pp 145–151. https://doi.org/10.1145/3383972.3383975
Laina I, Rupprecht C, Belagiannis V, Tombari F, Navab N (2016) Deeper depth prediction with fully convolutional residual networks. Proc. – 2016 4th Int. Conf. 3D Vision, 3DV 2016, pp 239–248. https://doi.org/10.1109/3DV.2016.32
Lee JH, Kim CS (2019) Monocular depth estimation using relative depth maps. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:9721–9730. https://doi.org/10.1109/CVPR.2019.00996
Lin TY et al (2014) Microsoft COCO: common objects in context. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol 8693 LNCS, no PART 5, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 2117–2125
Martinez M, Yang K, Constantinescu A, Stiefelhagen R (2020) Helping the blind to get through covid-19: Social distancing assistant using real-time semantic segmentation on rgb-d video. Sens (Switzerland) 20(18):1–17. https://doi.org/10.3390/s20185202
Mehta S, Rastegari M, Shapiro L, Hajishirzi H (2019) ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2019:9182–9192. https://doi.org/10.1109/CVPR.2019.00941
Oktay O et al (2018) Attention U-Net: learning where to look for the pancreas, no. Midl
Park H, Sjosund LL, Yoo Y, Monet N, Bang J, Kwak N (2020) SINet: Extreme lightweight portrait segmentation networks with spatial squeeze modules and information blocking decoder. Proc. – 2020 IEEE Winter Conf. Appl. Comput. Vision, WACV 2020, vol 2, no 1, pp 2055–2063. https://doi.org/10.1109/WACV45572.2020.9093588
Paszke A, Chaurasia A, Kim S, Culurciello E (2016) ENet: a deep neural network architecture for real-time semantic segmentation, pp 1–10
Poudel RPK, Bonde U, Liwicki S, Zach C (2019) ContextNet: exploring context and detail for semantic segmentation in real-time. Br. Mach. Vis. Conf. 2018, BMVC 2018
Ranftl R, Lasinger K, Hafner D, Schindler K, Koltun V (2020) Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2020.3019967
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNetV2: inverted residuals and linear bottlenecks. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp 4510–4520. https://doi.org/10.1109/CVPR.2018.00474
Shelhamer E, Long J, Darrell T (2015) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651. https://doi.org/10.1109/TPAMI.2016.2572683
Shen X et al (2016) Automatic portrait segmentation for image stylization. Comput Graph Forum 35(2):93–102. https://doi.org/10.1111/cgf.12814
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp 1–14
Smartphones Cause Photography Boom by Felix Richter (Aug 31, 2017), Statista
Sudre CH, Li W, Vercauteren T, Ourselin S, Jorge Cardoso M (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10553 LNCS, pp 240–248. https://doi.org/10.1007/978-3-319-67558-9_28
Supervisely Person Dataset - Datasets - Supervisely
Weng W, Zhu X (2015) UNet: convolutional networks for biomedical image segmentation. IEEE Access 9:16591–16603. https://doi.org/10.1109/ACCESS.2021.3053408
Xiao J, Xu H, Gao H, Bian M, Li Y (2021) A weakly supervised semantic segmentation network by aggregating seed cues: the multi-object proposal generation perspective. ACM Trans Multimed Comput Commun Appl 17(1s). https://doi.org/10.1145/3419842
Xiao J, Xu H, Fang DK, Cheng C, Gao HH (2021) Boosting and rectifying few-shot learning prototype network for skin lesion classification based on the internet of medical things. Wirel Netw 0123456789:1–15. https://doi.org/10.1007/s11276-021-02713-z
Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) SegFormer: simple and efficient design for semantic segmentation with transformers, no. NeurIPS, pp 1–14
Xu X et al (2018) Rendering portraitures from monocular camera and beyond. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol 11213 LNCS, pp 36–51. https://doi.org/10.1007/978-3-030-01240-3_3
Yang K et al (2018) Unifying terrain awareness for the visually impaired through real-time semantic segmentation. Sens (Switzerland) 18(5):1–32. https://doi.org/10.3390/s18051506
Yu C, Wang J, Peng C, Gao C, Yu G, Sang N (2018) BiSeNet: Bilateral segmentation network for real-time semantic segmentation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 11217 LNCS:334–349. https://doi.org/10.1007/978-3-030-01261-8_20
Zhang SH, Dong X, Li H, Li R, Yang YL (2019) PortraitNet: real-time portrait segmentation network for mobile device. Comput Graph 80:104–113. https://doi.org/10.1016/j.cag.2019.03.007
Zhang T, Lang C, Xing J (2019) Realtime human segmentation in video, vol 11296 LNCS. Springer International Publishing
Zhang J, Yang K, Constantinescu A, Peng K, Muller K, Stiefelhagen R (2021) Trans4Trans: efficient transformer for transparent object segmentation to help visually impaired people navigate in the real world. Proc IEEE Int Conf Comput Vis 2021:1760–1770. https://doi.org/10.1109/ICCVW54120.2021.00202
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. Proc. – 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol 2017, pp 6230–6239. https://doi.org/10.1109/CVPR.2017.660
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Gupta, S., Vishwakarma, D.K. HISNet: a Human Image Segmentation Network aiding bokeh effect generation. Multimed Tools Appl 82, 12469–12492 (2023). https://doi.org/10.1007/s11042-022-13900-1