Abstract
In this paper we investigate the potential of a family of efficient filters – the Gray-Code Kernels – for motion-guided visual saliency estimation. Our implementation applies 3D kernels to overlapping blocks of frames and gathers meaningful spatio-temporal information at a very low computational cost. We introduce an attention module that combines pooling strategies in an unsupervised way to derive a saliency map highlighting the presence of motion in the scene. In the experiments we show that our method effectively and efficiently identifies the portion of the image where motion is occurring, and that it tolerates a variety of scene conditions.
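To make the pipeline described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes Walsh-Hadamard kernels as the 3D filter bank, projects every overlapping block of frames onto them by direct (separable) multiplication rather than with the efficient Gray-Code recursion, and uses max pooling over absolute responses as one possible pooling strategy. The function names `hadamard` and `motion_saliency` and the block size are hypothetical choices made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def hadamard(n):
    """Return the n x n Sylvester Hadamard matrix (n must be a power of two).

    Each row is a 1D kernel with entries +1/-1; row 0 is constant.
    """
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def motion_saliency(frames, size=4):
    """Sketch of a motion saliency map from 3D projection kernels.

    frames: array of shape (T, H, W) with T, H, W >= size.
    Returns an array of shape (T-size+1, H-size+1, W-size+1),
    one value per overlapping size^3 block, scaled to [0, 1].
    """
    frames = np.asarray(frames, dtype=float)
    h = hadamard(size)
    # All overlapping blocks: shape (T', H', W', size, size, size).
    win = sliding_window_view(frames, (size, size, size))
    # Separable projection. Along time we skip row 0 (constant kernel),
    # so the remaining kernels sum to zero in time and ignore static content.
    r = np.einsum('ia,thwabc->ithwbc', h[1:], win)
    r = np.einsum('jb,ithwbc->ijthwc', h, r)
    r = np.einsum('kc,ijthwc->ijkthw', h, r)
    # Assumed pooling: maximum absolute response over all kernels.
    sal = np.abs(r).max(axis=(0, 1, 2))
    peak = sal.max()
    return sal / peak if peak > 0 else sal
```

Under these assumptions, a video with no temporal change produces an all-zero map, while blocks overlapping a moving object receive non-zero saliency. A faithful implementation would replace the direct projection with the Gray-Code Kernels' incremental computation, which is what makes the approach efficient.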
Notes
1. The implementation of our method in Python will soon be made publicly available.
Acknowledgements
This work has been carried out at the Machine Learning Genoa (MaLGa) center, Università di Genova (IT). It has been supported by AFOSR with the project “Cognitively-inspired architectures for human motion understanding”, grant no. FA8655-20-1-7035.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nicora, E., Noceti, N. (2022). Exploring the Use of Efficient Projection Kernels for Motion Saliency Estimation. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_14
DOI: https://doi.org/10.1007/978-3-031-06433-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06432-6
Online ISBN: 978-3-031-06433-3
eBook Packages: Computer Science, Computer Science (R0)