Abstract
Scene analysis in video sequences is a complex task for a computer vision system. Several approaches have been applied to this task, such as deep learning networks and traditional image processing methods. However, these methods require extensive training or manual adjustment of parameters to achieve accurate results. Therefore, novel methods are needed to analyze the scenario information in video sequences. This paper proposes a method for object segmentation in video sequences inspired by the structural layers of the visual cortex. The method is called Neuro-Inspired Object Segmentation, SegNI. SegNI has a hierarchical architecture that analyzes object features such as edges, color, and motion to generate regions that represent the objects in the scenario. The results obtained with the Video Segmentation Benchmark VSB100 dataset demonstrate that SegNI adapts automatically to videos whose scenarios differ in nature, composition, and types of objects. SegNI also adapts its processing to new scenario conditions without training, which is a significant advantage over deep learning networks.
Data availability
The code and readme are available at https://github.com/RaulRan/NISeg.
References
Andersen RA (1997) Neural mechanisms of visual motion perception in primates. Cell Press. https://doi.org/10.1016/S0896-6273(00)80326-8
Arbeláez P, Maire M, Fowlkes C, Malik J (2011) Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 33(5):898–916. https://doi.org/10.1109/TPAMI.2010.161
Bednar JA, Miikkulainen R (2000) Tilt aftereffects in a self-organizing model of the primary visual cortex. Neural Comput 12(7):1721–1740. https://doi.org/10.1162/089976600300015321
Bednar JA, De Paula JB, Miikkulainen R (2005) Self-organization of color opponent receptive fields and laterally connected orientation maps. Neurocomputing. https://doi.org/10.1016/j.neucom.2004.10.055
Brito da Silva LE, Elnabarawy I, Wunsch DC (2019) A survey of adaptive resonance theory neural network models for engineering applications. Neural Netw 120:167–203. https://doi.org/10.1016/j.neunet.2019.09.012
Brostow GJ, Fauqueur J, Cipolla R (2009) Semantic object classes in video: a high-definition ground truth database. Pattern Recogn Lett 30(2):88–97. https://doi.org/10.1016/j.patrec.2008.04.005
Caelles S, Pont-Tuset J, Perazzi F, Montes A, Maninis KK, Van Gool L (2019) The 2019 DAVIS challenge on VOS: unsupervised multi-object segmentation. arXiv:1905.00737
Chabane AN, Islam N, Zerr B (2017) Incremental clustering of sonar images using self-organizing maps combined with fuzzy adaptive resonance theory. Ocean Eng 142:133–144. https://doi.org/10.1016/j.oceaneng.2017.06.061
Chacon-Murguia MI, Guzman-Pando A, Ramirez-Alonso G, Ramirez-Quintana JA (2019) A novel instrument to compare dynamic object detection algorithms. Image Vis Comput 88:19–28. https://doi.org/10.1016/j.imavis.2019.04.006
Chang P, Wang X, Huang J (2012) Color image segmentation based on visual perception. In: 2012 IEEE international conference on information science and technology, pp 425–429, https://doi.org/10.1109/ICIST.2012.6221682
Cheng D, Zhu Q, Huang J, Wu Q, Yang L (2021) Clustering with local density peaks-based minimum spanning tree. IEEE Trans Knowl Data Eng 33(2):374–387. https://doi.org/10.1109/TKDE.2019.2930056
Chua L, Roska T (2010) Cellular neural networks and visual computing: foundations and applications. Cambridge University Press, Cambridge
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Corso JJ, Sharon E, Dube S, El-Saden S, Sinha U, Yuille A (2008) Efficient multilevel brain tumor segmentation with integrated bayesian model classification. IEEE Trans Med Imaging 27(5):629–640. https://doi.org/10.1109/TMI.2007.912817
Dong T, Zhang X, Ding Z, Fan J (2020) Multi-layered tree crown extraction from lidar data using graph-based segmentation. Comput Electron Agric 170:105213. https://doi.org/10.1016/j.compag.2020.105213
Du X, Dai P, Wang S, Cheng Y, Wu D (2017) Coupled wilson-cowan oscillator model with double-node for image enhancement. In: 2017 IEEE third international conference on multimedia big data (BigMM), pp. 129–133, https://doi.org/10.1109/BigMM.2017.46
Fairchild MD (2013) Color appearance models. Wiley, London
Farnworth T, Renton C, Strydom R, Wills A, Perez T (2021) A heteroscedastic likelihood model for two-frame optical flow. IEEE Robot Automat Lett 6(2):1200–1207. https://doi.org/10.1109/LRA.2021.3056342
Fortun D, Bouthemy P, Kervrann C (2015) Optical flow modeling and computation: a survey. Comput Vis Image Understand Real World Vid Netw 134:1–21. https://doi.org/10.1016/j.cviu.2015.02.008
Galasso F, Cipolla R, Schiele B (2013) Video segmentation with superpixels. In: Lecture Notes in Computer Science, including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, Springer, Berlin, Heidelberg, pp 760–774, https://doi.org/10.1007/978-3-642-37331-2_57
Galasso F, Nagaraja NS, Cárdenas TJ, Brox T, Schiele B (2013) A unified video segmentation benchmark: Annotation, metrics and analysis. In: 2013 IEEE international conference on computer vision, pp 3527–3534, https://doi.org/10.1109/ICCV.2013.438
Garg S, Goel V, Kumar S (2020) Unsupervised video object segmentation using online mask selection and space-time memory networks. The 2020 DAVIS Challenge on Video Object Segmentation - CVPR Workshops
Gharaee Z (2021) Online recognition of unsegmented actions with hierarchical SOM architecture. Cognit Process 22(1):77–91. https://doi.org/10.1007/s10339-020-00986-4
Grundmann M, Kwatra V, Han M, Essa I (2010) Efficient hierarchical graph-based video segmentation. In: 2010 IEEE computer society conference on computer vision and pattern recognition, pp 2141–2148, https://doi.org/10.1109/CVPR.2010.5539893
Gupta A, Anpalagan A, Guan L, Khwaja AS (2021) Deep learning for object detection and scene perception in self-driving cars: survey, challenges, and open issues. Array 10:100057. https://doi.org/10.1016/j.array.2021.100057
Jiang L, Zhang D, Che L (2021) Texture analysis-based multi-focus image fusion using a modified pulse-coupled neural network (pcnn). Signal Process Image Commun. https://doi.org/10.1016/j.image.2020.116068
Keuper M, Brox T (2016) Point-wise mutual information-based video segmentation with high temporal consistency. In: Hua G, Jégou H (eds) Computer Vision - ECCV 2016 Workshops. Springer International Publishing, Cham, pp 789–803
Kruger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodriguez-Sanchez AJ, Wiskott L (2013) Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE Trans Pattern Anal Mach Intell 35(8):1847–1871. https://doi.org/10.1109/TPAMI.2012.272
Kuzmina M, Manykin E (2005) Oscillatory neural network for adaptive dynamical image processing. In: international conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06), vol 1, pp 301–306, https://doi.org/10.1109/CIMCA.2005.1631283
Li W, Ogunbona P, Ye L, Kharitonenko I (2004) Visual perceptual process model and object segmentation. In: proceedings 7th international conference on signal processing, 2004. ICSP ’04. 2004., vol 1, pp 753–756 vol.1, https://doi.org/10.1109/ICOSP.2004.1452772
Masland RH, Dallos P, Firestein S (2020) The senses : a comprehensive reference. Elsevier, Amsterdam
Minaee S, Boykov YY, Porikli F, Plaza AJ, Kehtarnavaz N, Terzopoulos D (2021) Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3059968
Mou L, Hua Y, Zhu XX (2020) Relation matters: relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images. IEEE Trans Geosci Remote Sens 58(11):7557–7569. https://doi.org/10.1109/TGRS.2020.2979552
Ochs P, Brox T (2011) Object segmentation in video: a hierarchical variational approach for turning point trajectories into dense regions. In: 2011 international conference on computer vision, pp 1583–1590, https://doi.org/10.1109/ICCV.2011.6126418
Pisal A, Sor R, Kinage KS (2017) Facial feature extraction using hierarchical max(hmax) method. In: 2017 international conference on computing, communication, control and automation (ICCUBEA), pp 1–5, https://doi.org/10.1109/ICCUBEA.2017.8463755
Ramirez-Quintana JA, Chacon-Murguia MI (2015) Self-adaptive som-cnn neural system for dynamic object detection in normal and complex scenarios. Pattern Recogn 48(4):1137–1149. https://doi.org/10.1016/j.patcog.2014.09.009
Saglam A, Baykan NA (2017) Effects of color spaces and distance norms on graph-based image segmentation. In: 2017 3rd international conference on frontiers of signal processing (ICFSP), pp 130–135, https://doi.org/10.1109/ICFSP.2017.8097156
Sanchez G, Madrenas J, Cosp-Vilella J (2019) Legion-based image segmentation by means of spiking neural networks using normalized synaptic weights implemented on a compact scalable neuromorphic architecture. Neurocomputing 352:106–120. https://doi.org/10.1016/j.neucom.2019.04.037
Sengupta N, McNabb CB, Kasabov N, Russell BR (2018) Integrating space, time, and orientation in spiking neural networks: a case study on multimodal brain data modeling. IEEE Trans Neural Netw Learn Syst 29(11):5249–5263. https://doi.org/10.1109/TNNLS.2018.2796023
Stoll S, Finlayson NJ, Schwarzkopf DS (2020) Topographic signatures of global object perception in human visual cortex. NeuroImage 220:116926. https://doi.org/10.1016/j.neuroimage.2020.116926
Sundberg P, Brox T, Maire M, Arbeláez P, Malik J (2011) Occlusion boundary detection and figure/ground assignment from optical flow. In: CVPR 2011, pp 2233–2240, https://doi.org/10.1109/CVPR.2011.5995364
Sung M, Kim Y (2020) Training spiking neural networks with an adaptive leaky integrate-and-fire neuron. In: 2020 IEEE international conference on consumer electronics - Asia (ICCE-Asia), pp 1–2, https://doi.org/10.1109/ICCE-Asia49877.2020.9277455
T Zhou YY W Wang, Shen J (2020) Target-aware adaptive tracking for unsupervised video object segmentation. The 2020 DAVIS Challenge on Video Object Segmentation - CVPR Workshops
Thwaites A, Wingfield C, Wieser E, Soltan A, Marslen-Wilson WD, Nimmo-Smith I (2018) Entrainment to the ciecam02 and cielab colour appearance models in the human cortex. Vis Res 145:1–10. https://doi.org/10.1016/j.visres.2018.01.011
Tjøstheim TA, Balkenius C (2019) Cumulative inhibition in neural networks. Cognit Process 20(1):87–102. https://doi.org/10.1007/s10339-018-0888-z
Tran Q, Su S, Nguyen V (2020) Pyramidal lucas-kanade-based noncontact breath motion detection. IEEE Trans Syst Man Cybern Syst 50(7):2659–2670. https://doi.org/10.1109/TSMC.2018.2825458
Wang Q, Gao J, Yuan Y (2018) A joint convolutional neural networks and context transfer for street scenes labeling. IEEE Trans Intell Transp Syst 19(5):1457–1470. https://doi.org/10.1109/TITS.2017.2726546
Wang Q, Gao J, Li X (2019) Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes. IEEE Trans Image Process 28(9):4376–4386. https://doi.org/10.1109/TIP.2019.2910667
Wang Z, Wang Z (2020) A generic approach for cell segmentation based on gabor filtering and area-constrained ultimate erosion. Artif Intell Med 107:101929. https://doi.org/10.1016/j.artmed.2020.101929
X Xiao CC, Lu Y (2020) Global tracklet matching for unsupervised video object segmentation. The 2020 DAVIS Challenge on Video Object Segmentation - CVPR Workshops
Xu C, Xiong C, Corso JJ (2012) Streaming hierarchical video segmentation. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer, Berlin, Heidelberg, PART 6, pp 626–639, https://doi.org/10.1007/978-3-642-33783-3_45
Xu H, Hancock ER, Zhou W (2019) The low-rank decomposition of correlation-enhanced superpixels for video segmentation. Soft Comput 23(24):13055–13065. https://doi.org/10.1007/s00500-019-03849-z
Xu N, Yang L, Fan Y, Yue D, Liang Y, Yang J, Huang T (2018) Youtube-vos: A large-scale video object segmentation benchmark
Yamasaki T, Tobimatsu S (2018) Driving ability in alzheimer disease spectrum: neural basis, assessment, and potential use of optic flow event-related potentials. Front Neurol 9:1–14. https://doi.org/10.3389/fneur.2018.00750
Yang K, Hu X, Stiefelhagen R (2021) Is context-aware cnn ready for the surroundings? panoramic semantic segmentation in the wild. IEEE Trans Image Process 30:1866–1881. https://doi.org/10.1109/TIP.2020.3048682
Yu B, Zhang L (2004) Pulse-coupled neural networks for contour and motion matchings. IEEE Trans Neural Netw 15(5):1186–1201. https://doi.org/10.1109/TNN.2004.832830
Yu J, Xia G, Gao C, Samal A (2016) A computational model for object-based visual saliency: spreading attention along gestalt cues. IEEE Trans Multimed 18(2):273–286. https://doi.org/10.1109/TMM.2015.2505908
Zhao Y, Yuan Y, Nie F, Wang Q (2018) Spectral clustering based on iterative optimization for large-scale and high-dimensional data. Neurocomputing 318:227–235. https://doi.org/10.1016/j.neucom.2018.08.059
Funding
This work was supported by the Tecnologico Nacional de México under grants TecNM 7598.20-P and 10071.21-P.
Author information
Contributions
All the authors contributed to the research, the theoretical development, the code, and the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Code availability
The code is available at https://github.com/RaulRan/NISeg.
Consent for publication
If the paper is accepted, the code and all the necessary documents will be published to the journal.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Section Editor: Moreno I. Coco (University of East London) Reviewers: Qi Wang (Northwestern Polytechnical University, Shaanxi), Anastasiia Mikhailova (University of Lisbon).
A DAVIS2017 results
DAVIS2017 is a dataset composed of 150 video sequences with 10,474 annotated frames. It covers challenging scenarios such as motion blur and occlusions across three tasks: Semi-Supervised, Interactive, and Unsupervised Video Object Segmentation. We considered this dataset as an option to validate SegNI, specifically on the unsupervised task, in which methods must generate segmentations without any human interaction. The results obtained by our model on three different videos are presented in Fig. 6.
SegNI achieves strong temporal coherence owing to the constant \(\tau_h\) and the contour detection in V4, as in the VSB100 experiments. The first video, basketball-game, is highly complex because it contains many small dynamic objects. The results show that SegNI correctly identified most of the basketball players. In the second video, bmx-rider, the rider is accurately identified as the sequence progresses; in some frames, even the two wheels are identified as objects separate from the bicycle. Finally, in the last video, mermaid, SegNI separated the water from its refraction effect despite their similar color.
Unfortunately, the format of the SegNI segmentation results is not compatible with the format required for evaluation on the DAVIS submission site. SegNI detects all the objects in the scenario, whereas the DAVIS ground truth considers only moving objects. Therefore, it was not possible to obtain quantitative segmentation results.
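The mismatch between a full-scene segmentation and a ground truth that annotates only moving objects can be illustrated with a toy example. The sketch below is not the DAVIS evaluation code and not part of SegNI: the arrays are made-up data, and the helper names `iou` and `unmatched_regions` and the 0.5 threshold are assumptions chosen for illustration. It lists the predicted regions that have no counterpart in the ground truth, which a benchmark of this kind would count as errors even when they correspond to real static objects.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def unmatched_regions(pred, gt, thresh=0.5):
    """Predicted labels whose best IoU with every ground-truth object is below thresh."""
    gt_ids = [g for g in np.unique(gt) if g != 0]  # 0 = unannotated pixels
    out = []
    for p in np.unique(pred):
        best = max((iou(pred == p, gt == g) for g in gt_ids), default=0.0)
        if best < thresh:
            out.append(int(p))
    return out

# Toy 4x6 frame (made-up data): the ground truth annotates one moving object only.
gt = np.zeros((4, 6), dtype=int)
gt[1:3, 2:4] = 1

# Hypothetical full-scene label map: every pixel belongs to some region.
pred = np.ones((4, 6), dtype=int)  # region 1: a static region
pred[:, 4:] = 2                    # region 2: another static region
pred[1:3, 2:4] = 3                 # region 3: the moving object

# Regions 1 and 2 have no ground-truth counterpart; region 3 matches exactly.
print(unmatched_regions(pred, gt))
```

In this toy frame the moving object is segmented perfectly, yet two of the three predicted regions cannot be scored against the annotation.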
An analysis of the top-ranked models in the 2020 DAVIS challenge indicates that all of them use pre-trained supervised models to define an initial segmentation mask (Garg et al. 2020; T. Zhou and Shen 2020; X. Xiao and Lu 2020). This initial mask is then refined by a second model trained on at least 8 GPUs for no less than two days. Although their results are impressive and require no human intervention to define an initial delineation of objects, these solutions are based on pre-trained supervised models. This is not the strategy we follow in SegNI, where the segmentation results are produced as the model analyzes the video on a common CPU computer.
Figure 7 shows qualitative segmentation results for videos whose ground truth (GT) is available, namely blackswan and dogs-scale. These results show that SegNI delineates objects precisely.
Ramirez-Quintana, J.A., Rangel-Gonzalez, R., Chacon-Murguia, M.I. et al. A visual object segmentation algorithm with spatial and temporal coherence inspired by the architecture of the visual cortex. Cogn Process 23, 27–40 (2022). https://doi.org/10.1007/s10339-021-01065-y
DOI: https://doi.org/10.1007/s10339-021-01065-y