Abstract
Foreground segmentation is a challenging computer vision task due to the scarcity of labeled data. Unlike many other vision tasks, foreground segmentation requires per-pixel labeling, which is costly and time-consuming, yet recent deep learning models need large-scale labeled datasets to achieve high accuracy. In this paper, we present a novel self-supervised learning technique that allows a deep learning model to learn robust features and an adaptable object representation from an unlabeled dataset, and to transfer that knowledge to another dataset with only a small number of labeled images. The proposed method is evaluated on two benchmark datasets, DAVIS and SegTrack, where it achieves F-measures of 0.882 and 0.907, respectively, exceeding existing deep learning models by 1.95% on average.
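The abstract describes a two-stage recipe: self-supervised pretraining on unlabeled frames, then fine-tuning for per-pixel foreground prediction with a few labeled images. Below is a minimal PyTorch sketch of that general workflow, not the authors' implementation: the small CNN encoder, the masked-reconstruction pretext task, and all names (Encoder, pretrain, finetune, FEAT_CH) are hypothetical stand-ins; the paper itself uses a transformer backbone and its own pretext task.

import torch
import torch.nn as nn

FEAT_CH = 32  # hypothetical feature width

class Encoder(nn.Module):
    """Stand-in backbone; substituted for the paper's transformer
    purely to keep the sketch self-contained."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, FEAT_CH, 3, padding=1), nn.ReLU(),
            nn.Conv2d(FEAT_CH, FEAT_CH, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def pretrain(encoder, unlabeled_loader, epochs=1):
    """Stage 1: self-supervised pretraining on unlabeled frames.
    Masked-image reconstruction serves as a stand-in pretext task."""
    decoder = nn.Conv2d(FEAT_CH, 3, 3, padding=1)  # reconstruct RGB
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        for frames in unlabeled_loader:            # frames: (B, 3, H, W)
            # Randomly hide half the pixels, then reconstruct them.
            mask = (torch.rand_like(frames[:, :1]) > 0.5).float()
            recon = decoder(encoder(frames * mask))
            loss = ((recon - frames) ** 2).mean()  # reconstruction MSE
            opt.zero_grad(); loss.backward(); opt.step()

def finetune(encoder, labeled_loader, epochs=1):
    """Stage 2: transfer the pretrained encoder to foreground
    segmentation using only a few labeled images."""
    head = nn.Conv2d(FEAT_CH, 1, 1)                # per-pixel foreground logits
    params = list(encoder.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)        # lower LR preserves pretrained features
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for frames, masks in labeled_loader:       # masks: (B, 1, H, W), float {0, 1}
            loss = bce(head(encoder(frames)), masks)
            opt.zero_grad(); loss.backward(); opt.step()
    return head

The two stages share the same encoder object, so the representation learned from unlabeled data is what the small labeled set adapts, which is the core of the transfer setup the abstract outlines.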