Abstract
This work describes the use of brain programming to automate the design of video tracking systems. The challenge is to create visual programs that learn to detect a toy dinosaur from a database and are then tested in a visual-tracking scenario. Planning an object tracking system involves two sub-tasks: detecting moving objects in each frame, and correctly associating detections with the same object over time. Visual attention is a brain function that perceives salient visual features. Here, the automatic design of visual attention programs through an optimization paradigm is applied to the detection-based tracking of objects in video from a moving camera. A system based on the acquisition and integration steps of the natural dorsal stream was engineered to emulate its selectivity and goal-driven behavior, both of which are useful for tracking objects. Object tracking is considered a challenging problem because difficulties can arise from abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid structures, object-to-object and object-to-scene occlusions, as well as camera motion, models, and parameters. Tracking relies on the quality of the detection process, so automatically designing such a stage could significantly improve tracking methods. Experimental results confirm the validity of our approach using three different kinds of robotic systems. Moreover, a comparison with the regions with convolutional neural networks (R-CNN) method is provided to illustrate the benefit of the approach.
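The two sub-tasks named in the abstract, per-frame detection and association of detections over time, can be illustrated with a minimal sketch. It is not the authors' method: it assumes detections are already available as (x, y) image coordinates (for example, the peak of a saliency map) and uses a generic greedy nearest-neighbor association. The function names `associate` and `track` and the `max_dist` gating threshold are hypothetical and chosen only for illustration.

```python
import math


def associate(tracks, detections, max_dist=50.0):
    """Greedily match tracks to detections, closest pairs first.

    tracks: dict mapping track id -> (x, y) last known position
    detections: list of (x, y) positions found in the current frame
    max_dist: assumed gating distance in pixels (hypothetical value)
    Returns (updated_tracks, indices_of_unmatched_detections).
    """
    # Every candidate (distance, track id, detection index), closest first.
    pairs = sorted(
        (math.dist(pos, det), tid, j)
        for tid, pos in tracks.items()
        for j, det in enumerate(detections)
    )
    updated = dict(tracks)  # unmatched tracks keep their last position
    used_tracks, used_dets = set(), set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if tid in used_tracks or j in used_dets:
            continue
        updated[tid] = detections[j]
        used_tracks.add(tid)
        used_dets.add(j)
    unmatched = [j for j in range(len(detections)) if j not in used_dets]
    return updated, unmatched


def track(frames_detections, max_dist=50.0):
    """Run association over a sequence of per-frame detection lists."""
    tracks, next_id, history = {}, 0, []
    for detections in frames_detections:
        tracks, unmatched = associate(tracks, detections, max_dist)
        for j in unmatched:
            # A detection that matches no track starts a new track.
            tracks[next_id] = detections[j]
            next_id += 1
        history.append(dict(tracks))
    return history


if __name__ == "__main__":
    frames = [[(10, 10)], [(12, 11)], [(14, 13), (200, 200)]]
    for i, state in enumerate(track(frames)):
        print(i, state)
```

Because the association step only consumes detections, any error made by the detector propagates directly into the tracks, which is why the abstract argues that improving the detection stage could improve tracking as a whole.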
Acknowledgements
This research was funded by CICESE through Project 634-128 – “Programación cerebral aplicada al estudio del pensamiento y la visión”. In addition, the authors acknowledge the valuable comments of the anonymous reviewers, the Editor of Multimedia Tools and Applications, and the International Editorial Board whose enthusiasm is gladly appreciated.
About this article
Cite this article
Olague, G., Hernández, D.E., Llamas, P. et al. Brain programming as a new strategy to create visual routines for object tracking. Multimed Tools Appl 78, 5881–5918 (2019). https://doi.org/10.1007/s11042-018-6634-9
DOI: https://doi.org/10.1007/s11042-018-6634-9