
A visual object segmentation algorithm with spatial and temporal coherence inspired by the architecture of the visual cortex

  • Research Article
  • Published in Cognitive Processing

Abstract

Scene analysis in video sequences is a complex task for a computer vision system. Several approaches have been applied to this problem, such as deep learning networks and traditional image processing methods. However, these methods require extensive training or manual parameter tuning to achieve accurate results. It is therefore necessary to develop novel methods for analyzing scene information in video sequences. For this reason, this paper proposes a method for object segmentation in video sequences inspired by the structural layers of the visual cortex. The method is called Neuro-Inspired Object Segmentation (SegNI). SegNI has a hierarchical architecture that analyzes object features such as edges, color, and motion to generate regions that represent the objects in the scene. The results obtained on the Video Segmentation Benchmark (VSB100) dataset demonstrate that SegNI adapts automatically to videos whose scenes differ in nature, composition, and the types of objects they contain. Moreover, SegNI adapts its processing to new scene conditions without retraining, which is a significant advantage over deep learning networks.
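
The hierarchical analysis described in the abstract can be pictured with a minimal sketch: per-frame extraction of edge, color, and motion features, followed by a simple region-labeling stage. This is an illustrative assumption built on OpenCV primitives, not the authors' cortical-layer implementation; all function names and thresholds below are hypothetical.

```python
import cv2
import numpy as np

def extract_features(prev_gray, frame):
    """Illustrative feature extraction: edges, color, and motion.

    A hedged sketch only; SegNI's actual cortical layers are described
    in the paper and are not reproduced here.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)               # edge features
    color = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)  # color features
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)  # motion features
    motion_mag = np.linalg.norm(flow, axis=2)
    return edges, color, motion_mag, gray

def label_regions(edges, motion_mag, motion_thresh=1.0):
    """Toy stand-in for SegNI's region-generation stage: group pixels
    that respond to edges or motion into connected regions."""
    mask = ((edges > 0) | (motion_mag > motion_thresh)).astype(np.uint8)
    n_labels, labels = cv2.connectedComponents(mask)
    return n_labels, labels
```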


Data availability

The code and readme are available at https://github.com/RaulRan/NISeg.


Funding

This work was supported by the Tecnologico Nacional de México under grants TecNM 7598.20-P and 10071.21-P.

Author information


Contributions

All authors contributed to the research, the theoretical development, the code, and the writing of the paper.

Corresponding author

Correspondence to Juan A. Ramirez-Quintana.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Code availability

The code will be made available. Please indicate to the journal where the code should be published.

Consent for publication

If the paper is accepted, the code and all the necessary documents will be provided to the journal.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Section Editor: Moreno I. Coco (University of East London) Reviewers: Qi Wang (Northwestern Polytechnical University, Shaanxi), Anastasiia Mikhailova (University of Lisbon).

A DAVIS2017 results

DAVIS2017 is a dataset composed of 150 video sequences with 10,474 annotated frames. It includes challenging conditions such as motion blur and occlusions across its three tasks: Semi-Supervised, Interactive, and Unsupervised Video Object Segmentation. We used the unsupervised task to validate SegNI; in this modality, methods must generate segmentation results without any human interaction. The results obtained by our model on three different videos are presented in Fig. 6.

Fig. 6 Results of SegNI with three videos of the DAVIS2017 test dataset

The temporal coherence obtained with SegNI is very accurate due to the constant \(\tau _h\) and the contour detection in V4, consistent with the results of the VSB100 experiments. The first video, named basketball-game, is highly complex because it contains many small dynamic objects; the results show that SegNI correctly identifies most of the basketball players. In the second video, known as bmx-rider, the rider is accurately identified as the video sequence progresses; in some cases, even the two wheels are identified as objects distinct from the bicycle. Finally, the results of the last video, named mermaid, are particularly interesting: SegNI separated the water and its refraction effect despite their color similarity.
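
The role of the constant \(\tau _h\) in temporal coherence can be illustrated with a minimal leaky-integrator sketch. This is an assumption about the general mechanism (a constant time constant blends every new frame with the accumulated state at the same rate, stabilizing labels over time), not the paper's exact update rule; the function name and array shapes are hypothetical.

```python
import numpy as np

def temporal_smooth(prev_state, new_response, tau_h=0.5, dt=1.0):
    """Leaky integration with a constant time constant tau_h.

    Hypothetical stand-in for SegNI's temporal update: because tau_h is
    constant, every frame is blended at the same rate, which keeps
    per-pixel responses (and hence region labels) temporally coherent.
    """
    alpha = dt / (tau_h + dt)  # constant blending factor in (0, 1)
    return (1.0 - alpha) * prev_state + alpha * new_response

# Usage: accumulate per-pixel responses across a short synthetic video
state = np.zeros((480, 854), dtype=np.float32)
for response in (np.random.rand(480, 854).astype(np.float32) for _ in range(5)):
    state = temporal_smooth(state, response)
```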

Unfortunately, the format of the SegNI segmentation results is not compatible with the format required for evaluation on the DAVIS submission site: SegNI detects all the objects in the scene, whereas the DAVIS ground truth considers only moving objects. Therefore, it was not possible to obtain quantitative segmentation results.
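
One conceivable workaround, not attempted in the paper, would be to keep only those SegNI regions that overlap significant motion, so the output approximates DAVIS's moving-object ground truth. The sketch below is a hypothetical illustration of that idea; the function name and thresholds are assumptions.

```python
import numpy as np

def keep_moving_regions(labels, motion_mag, motion_thresh=1.0, min_ratio=0.3):
    """Hypothetical filter: retain only labeled regions whose pixels are
    mostly in motion, approximating DAVIS's moving-object ground truth.

    labels     -- integer label map from a full-scene segmenter (e.g. SegNI)
    motion_mag -- per-pixel optical-flow magnitude for the same frame
    """
    out = np.zeros_like(labels)
    for lab in np.unique(labels):
        if lab == 0:          # background / unlabeled
            continue
        region = labels == lab
        moving_ratio = float((motion_mag[region] > motion_thresh).mean())
        if moving_ratio >= min_ratio:
            out[region] = lab
    return out
```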

The analysis of the models ranked in the first places of the 2020 DAVIS challenge indicates that all of them use pre-trained supervised models to define an initial segmentation mask (Garg et al. 2020; T. Zhou and Shen 2020; X. Xiao and Lu 2020). This initial mask is then refined by a second model trained on at least 8 GPUs for no less than two days. Even though their results are very impressive and do not require human intervention to define an initial delineation of objects, their solutions rely on pre-trained supervised models. This is not the strategy we follow in SegNI, where the segmentation results are produced as the model analyzes the video on an ordinary CPU.

Figure 7 shows some qualitative segmentation results for videos whose ground truth (GT) is available; the videos blackswan and dogs-scale were used. These results demonstrate that SegNI delineates objects very precisely.

Fig. 7 SegNI results with videos of the DAVIS2017 training dataset


Cite this article

Ramirez-Quintana, J.A., Rangel-Gonzalez, R., Chacon-Murguia, M.I. et al. A visual object segmentation algorithm with spatial and temporal coherence inspired by the architecture of the visual cortex. Cogn Process 23, 27–40 (2022). https://doi.org/10.1007/s10339-021-01065-y
