Abstract
Foreground segmentation is a challenging computer vision task due to the scarcity of labeled data. Unlike many other vision tasks, foreground segmentation requires per-pixel labeling, which is costly and time-consuming, yet recent deep learning models need large-scale labeled datasets to achieve high accuracy. In this paper, we present a novel self-supervised learning technique that allows a deep learning model to learn robust features and an adaptable object representation from an unlabeled dataset, and to transfer that knowledge to another dataset with only a small number of labeled images. The proposed method is evaluated on two benchmark datasets, DAVIS and SegTrack, where it achieves F-measures of 0.882 and 0.907, respectively, exceeding existing deep learning models by 1.95% on average.
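The abstract describes a two-stage recipe: self-supervised pretraining on unlabeled frames, then fine-tuning for per-pixel foreground prediction with a few labeled images. Below is a minimal PyTorch sketch of that general workflow, not the authors' implementation: the small CNN encoder, the masked-reconstruction pretext task, and all names (Encoder, pretrain, finetune, FEAT_CH) are hypothetical stand-ins; the paper itself uses a transformer backbone and its own pretext task.

import torch
import torch.nn as nn

FEAT_CH = 32  # hypothetical feature width

class Encoder(nn.Module):
    """Stand-in backbone; substituted for the paper's transformer
    purely to keep the sketch self-contained."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, FEAT_CH, 3, padding=1), nn.ReLU(),
            nn.Conv2d(FEAT_CH, FEAT_CH, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def pretrain(encoder, unlabeled_loader, epochs=1):
    """Stage 1: self-supervised pretraining on unlabeled frames.
    Masked-image reconstruction serves as a stand-in pretext task."""
    decoder = nn.Conv2d(FEAT_CH, 3, 3, padding=1)  # reconstruct RGB
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        for frames in unlabeled_loader:            # frames: (B, 3, H, W)
            # Randomly hide half the pixels, then reconstruct them.
            mask = (torch.rand_like(frames[:, :1]) > 0.5).float()
            recon = decoder(encoder(frames * mask))
            loss = ((recon - frames) ** 2).mean()  # reconstruction MSE
            opt.zero_grad(); loss.backward(); opt.step()

def finetune(encoder, labeled_loader, epochs=1):
    """Stage 2: transfer the pretrained encoder to foreground
    segmentation using only a few labeled images."""
    head = nn.Conv2d(FEAT_CH, 1, 1)                # per-pixel foreground logits
    params = list(encoder.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)        # lower LR preserves pretrained features
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for frames, masks in labeled_loader:       # masks: (B, 1, H, W), float {0, 1}
            loss = bce(head(encoder(frames)), masks)
            opt.zero_grad(); loss.backward(); opt.step()
    return head

The two stages share the same encoder object, so the representation learned from unlabeled data is what the small labeled set adapts, which is the core of the transfer setup the abstract outlines.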