A visual object segmentation algorithm with spatial and temporal coherence inspired by the architecture of the visual cortex

Ramirez-Quintana, Juan A.; Rangel-Gonzalez, Raul; Chacon-Murguia, Mario I.; Ramirez-Alonso, Graciela

doi:10.1007/s10339-021-01065-y

Research Article
Published: 15 November 2021

Volume 23, pages 27–40, (2022)

Cognitive Processing

Juan A. Ramirez-Quintana ORCID: orcid.org/0000-0003-4445-6555¹,
Raul Rangel-Gonzalez²,
Mario I. Chacon-Murguia¹ &
…
Graciela Ramirez-Alonso³

Abstract

Scene analysis in video sequences is a complex task for a computer vision system. Several approaches have been applied to this analysis, such as deep learning networks and traditional image processing methods. However, these methods require thorough training or manual adjustment of parameters to achieve accurate results. Therefore, it is necessary to develop novel methods to analyze the scenario information in video sequences. For this reason, this paper proposes a method for object segmentation in video sequences inspired by the structural layers of the visual cortex. The method is called Neuro-Inspired Object Segmentation (SegNI). SegNI has a hierarchical architecture that analyzes object features such as edges, color, and motion to generate regions that represent the objects in the scenario. The results obtained with the Video Segmentation Benchmark (VSB100) dataset demonstrate that SegNI can adapt automatically to videos whose scenarios differ in nature, composition, and types of objects. Also, SegNI adapts its processing to new scenario conditions without training, which is a significant advantage over deep learning networks.

Data availability

The code and readme are available at https://github.com/RaulRan/NISeg.

References

Andersen RA (1997) Neural mechanisms of visual motion perception in primates. Cell Press. https://doi.org/10.1016/S0896-6273(00)80326-8
Arbeláez P, Maire M, Fowlkes C, Malik J (2011) Contour detection and hierarchical image segmentation. IEEE Trans Pattern Anal Mach Intell 33(5):898–916. https://doi.org/10.1109/TPAMI.2010.161
Bednar JA, Miikkulainen R (2000) Tilt aftereffects in a self-organizing model of the primary visual cortex. Neural Comput 12(7):1721–1740. https://doi.org/10.1162/089976600300015321
Bednar JA, De Paula JB, Miikkulainen R (2005) Self-organization of color opponent receptive fields and laterally connected orientation maps. Neurocomputing. https://doi.org/10.1016/j.neucom.2004.10.055
Brito da Silva LE, Elnabarawy I, Wunsch DC (2019) A survey of adaptive resonance theory neural network models for engineering applications. Neural Netw 120:167–203. https://doi.org/10.1016/j.neunet.2019.09.012
Brostow GJ, Fauqueur J, Cipolla R (2009) Semantic object classes in video: a high-definition ground truth database. Pattern Recogn Lett 30(2):88–97. https://doi.org/10.1016/j.patrec.2008.04.005
Caelles S, Pont-Tuset J, Perazzi F, Montes A, Maninis KK, Van Gool L (2019) The 2019 davis challenge on vos: Unsupervised multi-object segmentation. arXiv:1905.00737
Chabane AN, Islam N, Zerr B (2017) Incremental clustering of sonar images using self-organizing maps combined with fuzzy adaptive resonance theory. Ocean Eng 142:133–144. https://doi.org/10.1016/j.oceaneng.2017.06.061
Chacon-Murguia MI, Guzman-Pando A, Ramirez-Alonso G, Ramirez-Quintana JA (2019) A novel instrument to compare dynamic object detection algorithms. Image Vis Comput 88:19–28. https://doi.org/10.1016/j.imavis.2019.04.006
Chang P, Wang X, Huang J (2012) Color image segmentation based on visual perception. In: 2012 IEEE international conference on information science and technology, pp 425–429, https://doi.org/10.1109/ICIST.2012.6221682
Cheng D, Zhu Q, Huang J, Wu Q, Yang L (2021) Clustering with local density peaks-based minimum spanning tree. IEEE Trans Knowl Data Eng 33(2):374–387. https://doi.org/10.1109/TKDE.2019.2930056
Chua L, Roska T (2010) Cellular neural networks and visual computing: foundations and applications. Cambridge University Press, Cambridge
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Corso JJ, Sharon E, Dube S, El-Saden S, Sinha U, Yuille A (2008) Efficient multilevel brain tumor segmentation with integrated bayesian model classification. IEEE Trans Med Imaging 27(5):629–640. https://doi.org/10.1109/TMI.2007.912817
Dong T, Zhang X, Ding Z, Fan J (2020) Multi-layered tree crown extraction from lidar data using graph-based segmentation. Comput Electron Agric 170:105213. https://doi.org/10.1016/j.compag.2020.105213
Du X, Dai P, Wang S, Cheng Y, Wu D (2017) Coupled wilson-cowan oscillator model with double-node for image enhancement. In: 2017 IEEE third international conference on multimedia big data (BigMM), pp. 129–133, https://doi.org/10.1109/BigMM.2017.46
Fairchild MD (2013) Color appearance models. Wiley, London
Farnworth T, Renton C, Strydom R, Wills A, Perez T (2021) A heteroscedastic likelihood model for two-frame optical flow. IEEE Robot Automat Lett 6(2):1200–1207. https://doi.org/10.1109/LRA.2021.3056342
Fortun D, Bouthemy P, Kervrann C (2015) Optical flow modeling and computation: a survey. Comput Vis Image Understand Real World Vid Netw 134:1–21. https://doi.org/10.1016/j.cviu.2015.02.008
Galasso F, Cipolla R, Schiele B (2013) Video segmentation with superpixels. In: Lecture Notes in Computer Science, including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, Springer, Berlin, Heidelberg, pp 760–774, https://doi.org/10.1007/978-3-642-37331-2_57
Galasso F, Nagaraja NS, Cárdenas TJ, Brox T, Schiele B (2013) A unified video segmentation benchmark: Annotation, metrics and analysis. In: 2013 IEEE international conference on computer vision, pp 3527–3534, https://doi.org/10.1109/ICCV.2013.438
Garg S, Goel V, Kumar S (2020) Unsupervised video object segmentation using online mask selection and space-time memory networks. The 2020 DAVIS Challenge on Video Object Segmentation - CVPR Workshops
Gharaee Z (2021) Online recognition of unsegmented actions with hierarchical SOM architecture. Cognit Process 22(1):77–91. https://doi.org/10.1007/s10339-020-00986-4
Grundmann M, Kwatra V, Han M, Essa I (2010) Efficient hierarchical graph-based video segmentation. In: 2010 IEEE computer society conference on computer vision and pattern recognition, pp 2141–2148, https://doi.org/10.1109/CVPR.2010.5539893
Gupta A, Anpalagan A, Guan L, Khwaja AS (2021) Deep learning for object detection and scene perception in self-driving cars: survey, challenges, and open issues. Array 10:100057. https://doi.org/10.1016/j.array.2021.100057
Jiang L, Zhang D, Che L (2021) Texture analysis-based multi-focus image fusion using a modified pulse-coupled neural network (pcnn). Signal Process Image Commun. https://doi.org/10.1016/j.image.2020.116068
Keuper M, Brox T (2016) Point-wise mutual information-based video segmentation with high temporal consistency. In: Hua G, Jégou H (eds) Computer Vision - ECCV 2016 Workshops. Springer International Publishing, Cham, pp 789–803
Kruger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodriguez-Sanchez AJ, Wiskott L (2013) Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE Trans Pattern Anal Mach Intell 35(8):1847–1871. https://doi.org/10.1109/TPAMI.2012.272
Kuzmina M, Manykin E (2005) Oscillatory neural network for adaptive dynamical image processing. In: international conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06), vol 1, pp 301–306, https://doi.org/10.1109/CIMCA.2005.1631283
Li W, Ogunbona P, Ye L, Kharitonenko I (2004) Visual perceptual process model and object segmentation. In: proceedings 7th international conference on signal processing (ICSP ’04), vol 1, pp 753–756, https://doi.org/10.1109/ICOSP.2004.1452772
Masland RH, Dallos P, Firestein S (2020) The senses : a comprehensive reference. Elsevier, Amsterdam
Minaee S, Boykov YY, Porikli F, Plaza AJ, Kehtarnavaz N, Terzopoulos D (2021) Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3059968
Mou L, Hua Y, Zhu XX (2020) Relation matters: relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images. IEEE Trans Geosci Remote Sens 58(11):7557–7569. https://doi.org/10.1109/TGRS.2020.2979552
Ochs P, Brox T (2011) Object segmentation in video: a hierarchical variational approach for turning point trajectories into dense regions. In: 2011 international conference on computer vision, pp 1583–1590, https://doi.org/10.1109/ICCV.2011.6126418
Pisal A, Sor R, Kinage KS (2017) Facial feature extraction using hierarchical max(hmax) method. In: 2017 international conference on computing, communication, control and automation (ICCUBEA), pp 1–5, https://doi.org/10.1109/ICCUBEA.2017.8463755
Ramirez-Quintana JA, Chacon-Murguia MI (2015) Self-adaptive som-cnn neural system for dynamic object detection in normal and complex scenarios. Pattern Recogn 48(4):1137–1149. https://doi.org/10.1016/j.patcog.2014.09.009
Saglam A, Baykan NA (2017) Effects of color spaces and distance norms on graph-based image segmentation. In: 2017 3rd international conference on frontiers of signal processing (ICFSP), pp 130–135, https://doi.org/10.1109/ICFSP.2017.8097156
Sanchez G, Madrenas J, Cosp-Vilella J (2019) Legion-based image segmentation by means of spiking neural networks using normalized synaptic weights implemented on a compact scalable neuromorphic architecture. Neurocomputing 352:106–120. https://doi.org/10.1016/j.neucom.2019.04.037
Sengupta N, McNabb CB, Kasabov N, Russell BR (2018) Integrating space, time, and orientation in spiking neural networks: a case study on multimodal brain data modeling. IEEE Trans Neural Netw Learn Syst 29(11):5249–5263. https://doi.org/10.1109/TNNLS.2018.2796023
Stoll S, Finlayson NJ, Schwarzkopf DS (2020) Topographic signatures of global object perception in human visual cortex. NeuroImage 220:116926. https://doi.org/10.1016/j.neuroimage.2020.116926
Sundberg P, Brox T, Maire M, Arbeláez P, Malik J (2011) Occlusion boundary detection and figure/ground assignment from optical flow. In: CVPR 2011, pp 2233–2240, https://doi.org/10.1109/CVPR.2011.5995364
Sung M, Kim Y (2020) Training spiking neural networks with an adaptive leaky integrate-and-fire neuron. In: 2020 IEEE international conference on consumer electronics - Asia (ICCE-Asia), pp 1–2, https://doi.org/10.1109/ICCE-Asia49877.2020.9277455
T Zhou YY W Wang, Shen J (2020) Target-aware adaptive tracking for unsupervised video object segmentation. The 2020 DAVIS Challenge on Video Object Segmentation - CVPR Workshops
Thwaites A, Wingfield C, Wieser E, Soltan A, Marslen-Wilson WD, Nimmo-Smith I (2018) Entrainment to the ciecam02 and cielab colour appearance models in the human cortex. Vis Res 145:1–10. https://doi.org/10.1016/j.visres.2018.01.011
Tjøstheim TA, Balkenius C (2019) Cumulative inhibition in neural networks. Cognit Process 20(1):87–102. https://doi.org/10.1007/s10339-018-0888-z
Tran Q, Su S, Nguyen V (2020) Pyramidal lucas-kanade-based noncontact breath motion detection. IEEE Trans Syst Man Cybern Syst 50(7):2659–2670. https://doi.org/10.1109/TSMC.2018.2825458
Wang Q, Gao J, Yuan Y (2018) A joint convolutional neural networks and context transfer for street scenes labeling. IEEE Trans Intell Transp Syst 19(5):1457–1470. https://doi.org/10.1109/TITS.2017.2726546
Wang Q, Gao J, Li X (2019) Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes. IEEE Trans Image Process 28(9):4376–4386. https://doi.org/10.1109/TIP.2019.2910667
Wang Z, Wang Z (2020) A generic approach for cell segmentation based on gabor filtering and area-constrained ultimate erosion. Artif Intell Med 107:101929. https://doi.org/10.1016/j.artmed.2020.101929
X Xiao CC, Lu Y (2020) Global tracklet matching for unsupervised video object segmentation. The 2020 DAVIS Challenge on Video Object Segmentation - CVPR Workshops
Xu C, Xiong C, Corso JJ (2012) Streaming hierarchical video segmentation. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer, Berlin, Heidelberg, PART 6, pp 626–639, https://doi.org/10.1007/978-3-642-33783-3_45
Xu H, Hancock ER, Zhou W (2019) The low-rank decomposition of correlation-enhanced superpixels for video segmentation. Soft Comput 23(24):13055–13065. https://doi.org/10.1007/s00500-019-03849-z
Xu N, Yang L, Fan Y, Yue D, Liang Y, Yang J, Huang T (2018) Youtube-vos: A large-scale video object segmentation benchmark
Yamasaki T, Tobimatsu S (2018) Driving ability in alzheimer disease spectrum: neural basis, assessment, and potential use of optic flow event-related potentials. Front Neurol 9:1–14. https://doi.org/10.3389/fneur.2018.00750
Yang K, Hu X, Stiefelhagen R (2021) Is context-aware cnn ready for the surroundings? panoramic semantic segmentation in the wild. IEEE Trans Image Process 30:1866–1881. https://doi.org/10.1109/TIP.2020.3048682
Yu B, Zhang L (2004) Pulse-coupled neural networks for contour and motion matchings. IEEE Trans Neural Netw 15(5):1186–1201. https://doi.org/10.1109/TNN.2004.832830
Yu J, Xia G, Gao C, Samal A (2016) A computational model for object-based visual saliency: spreading attention along gestalt cues. IEEE Trans Multimed 18(2):273–286. https://doi.org/10.1109/TMM.2015.2505908
Zhao Y, Yuan Y, Nie F, Wang Q (2018) Spectral clustering based on iterative optimization for large-scale and high-dimensional data. Neurocomputing 318:227–235. https://doi.org/10.1016/j.neucom.2018.08.059

Funding

This work was supported by the Tecnologico Nacional de México under grants TecNM 7598.20-P and 10071.21-P.

Author information

Authors and Affiliations

Graduate and Research Department, Tecnologico Nacional de Mexico / I.T. Chihuahua, Av. Tecnologico 2909, Chihuahua, 31310, Mexico
Juan A. Ramirez-Quintana & Mario I. Chacon-Murguia
Intel Tecnologia de Mexico, Guadalajara, Mexico
Raul Rangel-Gonzalez
Universidad Autónoma de Chihuahua, Facultad de Ingeniería, Chihuahua, Chihuahua, México
Graciela Ramirez-Alonso

Contributions

All the authors contributed to the research, theoretical development, code, and the paper.

Corresponding author

Correspondence to Juan A. Ramirez-Quintana.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code availability

The code will be made available. Please indicate where the code should be published for the journal.

Consent for publication

If the paper is accepted, the code and all the necessary documents will be provided to the journal for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Section Editor: Moreno I. Coco (University of East London). Reviewers: Qi Wang (Northwestern Polytechnical University, Shaanxi), Anastasiia Mikhailova (University of Lisbon).

Appendix A: DAVIS2017 results

DAVIS2017 is a dataset composed of 150 video sequences with 10,474 annotated frames. DAVIS2017 covers challenging scenarios such as motion blur and occlusions across its three tasks: Semi-Supervised, Interactive, and Unsupervised Video Object Segmentation. We considered this dataset as an option to validate SegNI, particularly through the unsupervised task. In this modality, the methods must generate segmentations without any human interaction. The results obtained by our model on three different videos are presented in Fig. 6.

The temporal coherence obtained with SegNI is very accurate due to the constant \(\tau_h\) and the contour detection in V4 (as in the VSB100 experiments). The first video, named basketball-game, is highly complex because it has many small dynamic objects. The results demonstrate that SegNI could correctly identify most of the basketball players. In the second video, known as bmx-rider, the rider is accurately identified as the video sequence progresses; in some cases, even the two wheels are identified as objects separate from the bicycle. Finally, the results of the last video, named mermaid, are very interesting. In this case, SegNI separated the water and its refraction effect despite the color similarity.

Unfortunately, the format of the SegNI segmentation results is not compatible with the format required for evaluation on the DAVIS submission site. SegNI detects all the objects in the scenario, whereas the DAVIS ground truth considers only moving objects. Therefore, it was not possible to obtain quantitative segmentation results.

An analysis of the top-ranked models in the 2020 DAVIS challenge indicates that all of them use pre-trained supervised models to define an initial segmentation mask (Garg et al. 2020; T. Zhou and Shen 2020; X. Xiao and Lu 2020). This initial mask is improved by a second model trained on at least 8 GPUs for no less than two days. Although their results are very impressive and do not require human intervention to define an initial delineation of objects, their solutions are based on pre-trained supervised models. This is not the strategy that we follow in SegNI, where the segmentation results are produced as the model analyzes the video on a common CPU computer.

Figure 7 shows some qualitative segmentation results for videos whose ground truth (GT) is available. In this figure, the videos blackswan and dogs-scale were used. These results demonstrate that SegNI delineates objects very precisely.
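For readers who want to put a number on a side-by-side comparison against ground truth, the sketch below shows one possible way to relate a label map that covers every object in the frame to a binary ground-truth mask of a single annotated object. It computes the Jaccard index (intersection over union) of each labelled region against the mask and reports the best match. This is an illustration only: it is not the evaluation used by the authors and it is not the official DAVIS protocol. The function names, the nested-list image representation, and the toy data are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # hypothetical image representation: rows of integer pixels


def jaccard_per_region(labels: Grid, gt_mask: Grid) -> Dict[int, float]:
    """Jaccard index of every labelled region against a binary GT mask.

    `labels` holds one integer region label per pixel (all objects in the frame);
    `gt_mask` holds 1 for pixels of the annotated object and 0 elsewhere.
    """
    if len(labels) != len(gt_mask) or any(
        len(label_row) != len(gt_row) for label_row, gt_row in zip(labels, gt_mask)
    ):
        raise ValueError("label map and ground-truth mask must have the same shape")

    region_area: Dict[int, int] = {}  # number of pixels carrying each label
    overlap: Dict[int, int] = {}      # pixels of each label that lie inside the GT object
    gt_area = 0                       # number of pixels in the GT object

    for label_row, gt_row in zip(labels, gt_mask):
        for label, gt in zip(label_row, gt_row):
            region_area[label] = region_area.get(label, 0) + 1
            if gt:
                gt_area += 1
                overlap[label] = overlap.get(label, 0) + 1

    scores: Dict[int, float] = {}
    for label, area in region_area.items():
        inter = overlap.get(label, 0)
        union = area + gt_area - inter  # always >= 1 because area >= 1
        scores[label] = inter / union
    return scores


def best_matching_region(labels: Grid, gt_mask: Grid) -> Tuple[int, float]:
    """Return the label with the highest Jaccard index and its score."""
    scores = jaccard_per_region(labels, gt_mask)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data (made up): three regions in the label map, one annotated object.
toy_labels = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
]
toy_gt = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
]
best_label, best_score = best_matching_region(toy_labels, toy_gt)
print(best_label, best_score)
```

Regions that never touch the annotated object score zero, so a method that also segments static background is not penalised for those extra regions under this particular matching rule. Whether that is an appropriate choice depends on the benchmark being targeted.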

About this article

Cite this article

Ramirez-Quintana, J.A., Rangel-Gonzalez, R., Chacon-Murguia, M.I. et al. A visual object segmentation algorithm with spatial and temporal coherence inspired by the architecture of the visual cortex. Cogn Process 23, 27–40 (2022). https://doi.org/10.1007/s10339-021-01065-y

Received: 04 April 2021
Accepted: 25 October 2021
Published: 15 November 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10339-021-01065-y
