Abstract
Visual object recognition is one of the most fundamental and challenging research topics in computer vision. Research on the neural mechanisms underlying primate object recognition may bring revolutionary breakthroughs in brain-inspired vision. This Review systematically surveys recent work at the intersection of computational neuroscience and computer vision, examining current brain-inspired object recognition models and the visual neural mechanisms underlying them. We group the models by technical architecture and exploitation method, and describe the advantages and disadvantages of each in realizing brain-inspired object recognition. We focus on analyzing the similarity between artificial and biological neural networks and on assessing the biological plausibility of the DNN-based models that are currently popular as visual benchmarks. The analysis offers researchers a guide for judging when and under what conditions each approach is appropriate in visual object recognition research.
Notes
Hemera Photo Objects: http://www.halley.cc/ed/linux/interop/hemera.html.
3D car mesh models download from Creative Commons Attribution: https://grey.colorado.edu/CompCogNeuro/index.php/CU3D.
The Psychological Image Collection at Stirling (PICS): http://pics.psych.stir.ac.uk/cgi-bin/PICS/New/pics.cgi.
PrimFace: http://visiome.neuroinf.jp/primface.
GRAINS: https://www.artcogsys.com.
Quick, Draw!: https://github.com/googlecreativelab/quickdraw-dataset.
References
Adeli H, Zelinsky G (2018) Deep-bcn: Deep networks meet biased competition to create a brain-inspired model of attention control. In: Proceedings 2018 Ieee/Cvf conference on computer vision and pattern recognition workshops (Cvprw), pp. 2013–2023. https://doi.org/10.1109/Cvprw.2018.00259
Agrawal P, Stansbury D, Malik J, Gallant J (2014) Pixels to voxels: modeling visual representation in the human brain. arXiv:abs/1407.5104
Ahissar M, Hochstein S (2004) The reverse hierarchy theory of visual perceptual learning. Trends Cogn Sci 8(10):457–464. https://doi.org/10.1016/j.tics.2004.08.011
Albright TD, Stoner GR (2002) Contextual influences on visual processing. Annual Rev Neurosci 25:339–379. https://doi.org/10.1146/annurev.neuro.25.112701.142900
Andresen DR, Vinberg J, Grill-Spector K (2009) The representation of object viewpoint in human visual cortex. Neuroimage 45(2):522–536. https://doi.org/10.1016/j.neuroimage.2008.11.009
Ashby EG, Maddox WT (2005) Human category learning. Annual Rev Psychol 56:149–178. https://doi.org/10.1146/annurev.psych.56.091103.070217
Ayzenberg V, Lourenco SF (2019) Skeletal descriptions of shape provide unique perceptual information for object recognition. Sci Rep 9. ARTN 9359 https://doi.org/10.1038/s41598-019-45268-y
Azzopardi G, Petkov N (2013) Automatic detection of vascular bifurcations in segmented retinal images using trainable cosfire filters. Pattern Recognit Lett 34(8):922–933. https://doi.org/10.1016/j.patrec.2012.11.002
Azzopardi G, Petkov N (2013) Trainable cosfire filters for keypoint detection and pattern recognition. IEEE Trans Pattern Anal Mach Intell 35(2):490–503. https://doi.org/10.1109/Tpami.2012.106
Azzopardi G, Petkov N (2014) Cosfire: a brain-inspired approach to visual pattern recognition. Brain-Inspire Comput 8603:76–87. https://doi.org/10.1007/978-3-319-12084-3_7
Bair W (2005) Visual receptive field organization. Curr Opin Neurobiol 15(4):459–464. https://doi.org/10.1016/j.conb.2005.07.006
Bar M (2004) Visual objects in context. Nat Rev Neurosci 5(8):617–629. https://doi.org/10.1038/nrn1476
Beck DM, Kastner S (2009) Top-down and bottom-up mechanisms in biasing competition in the human brain. Vision Res 49(10):1154–1165. https://doi.org/10.1016/j.visres.2008.07.012
Berberian N, Ross M, Chartier S (2019) Discrimination of motion direction in a robot using a phenomenological model of synaptic plasticity. Comput Intell Neurosci. https://doi.org/10.1155/2019/6989128
Bichot NP, Rossi AF, Desimone R (2005) Parallel and serial neural mechanisms for visual search in macaque area v4. Science 308(5721):529–534. https://doi.org/10.1126/science.1109676
Bone MB, Ahmad F, Buchsbaum BR (2020) Feature-specific neural reactivation during episodic memory. Nat Commun. https://doi.org/10.1038/s41467-020-15763-2
Born RT, Bradley DC (2005) Structure and function of visual area mt. Annual Rev Neurosci 28:157–189. https://doi.org/10.1146/annurev.neuro.26.041002.131052
Bovik AC (1991) Analysis of multichannel narrow-band-filters for image texture segmentation. IEEE Trans Signal Process 39(9):2025–2043. https://doi.org/10.1109/78.134435
Bracci S, Op de Beeck H (2016) Dissociations and associations between shape and category representations in the two visual pathways. J Neurosci 36(2):432–444. https://doi.org/10.1523/Jneurosci.2314-15.2016
Brady TF, Konkle T, Alvarez GA, Oliva A (2008) Visual long-term memory has a massive storage capacity for object details. Proc National Acad Sci U.S.A 105(38):14325–14329. https://doi.org/10.1073/pnas.0803390105
Brodeur MB, Dionne-Dostie E, Montreuil T, Lepage M (2010) The bank of standardized stimuli (boss), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. Plos One. https://doi.org/10.1371/journal.pone.0010773
Butler DJ, Wulff J, Stanley GB, Black MJ (2012) A naturalistic open source movie for optical flow evaluation. In: Computer Vision - ECCV 2012, pp. 611–625. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-33783-3_44
Cadieu C, Hong H, Yamins D, Pinto N, Majaj N, DiCarlo (2013) The neural representation benchmark and its evaluation on brain and machine. arXiv:abs/1301.3530
Cadieu CF, Hong H, Yamins DLK, Pinto N, Ardila D, Solomon EA, Majaj NJ, DiCarlo JJ (2014) Deep neural networks rival the representation of primate it cortex for core visual object recognition. Plos Comput Biol. https://doi.org/10.1371/journal.pcbi.1003963
Cao CS, Liu XM, Yang Y, Yu YA, Wang J, Wang ZL, Huang YZ, Wang L, Huang C, Xu W, Ramanan D, Huang TS (2015) Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks. In: 2015 IEEE international conference on computer vision (Iccv), pp. 2956–2964. https://doi.org/10.1109/Iccv.2015.338
Carlson TA, Hogendoorn H, Kanai R, Mesik J, Turret J (2011) High temporal resolution decoding of object position and category. J Vision. https://doi.org/10.1167/11.10.9
Cichy RM, Khosla A, Pantazis D, Torralba A, Oliva A (2016) Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci Rep. https://doi.org/10.1038/srep27755
Cichy RM, Pantazis D, Oliva A (2014) Resolving human object recognition in space and time. Nat Neurosci 17(3):455–462. https://doi.org/10.1038/nn.3635
Clanuwat T, Bober-Irizar M, Kitamoto A, Lamb A, Yamamoto K, Ha D (2018) Deep learning for classical japanese literature. https://ui.adsabs.harvard.edu/abs/2018arXiv181201718C
Coggan DD, Liu WL, Baker DH, Andrews TJ (2016) Category-selective patterns of neural response in the ventral visual pathway in the absence of categorical information. Neuroimage 135:107–114. https://doi.org/10.1016/j.neuroimage.2016.04.060
Colby CL, Goldberg ME (1999) Space and attention in parietal cortex. Annual Rev Neurosci 22:319–349. https://doi.org/10.1146/annurev.neuro.22.1.319
Cox D, Dean T (2014) Neural networks and neuroscience-inspired computer vision. Current Biol 24(18):R921–R929
Cox D, Meyers E, Sinha P (2004) Contextually evoked object-specific responses in human visual cortex. Science 304(5667):115–117. https://doi.org/10.1126/science.1093110
Cristobal G, Navarro R (1994) Space and frequency variant image-enhancement based on a gabor representation. Pattern Recognit Lett 15(3):273–277. https://doi.org/10.1016/0167-8655(94)90059-0
Dan Y, Poo MM (2004) Spike timing-dependent plasticity of neural circuits. Neuron 44(1):23–30. https://doi.org/10.1016/j.neuron.2004.09.007
Dapello J, Marques T, Schrimpf M, Geiger F, Cox DD, DiCarlo JJ (2020) Simulating a primary visual cortex at the front of cnns improves robustness to image perturbations. bioRxiv p. 2020.06.16.154542. https://doi.org/10.1101/2020.06.16.154542. https://www.biorxiv.org/content/biorxiv/early/2020/10/22/2020.06.16.154542.full.pdf
David SV, Hayden BY, Gallant JL (2006) Spectral receptive field properties explain shape selectivity in area v4. J Neurophysiol 96(6):3492–3505. https://doi.org/10.1152/jn.00575.2006
Deco G, Rolls ET (2004) A neurodynamical cortical model of visual attention and invariant object recognition. Vision Res 44(6):621–642. https://doi.org/10.1016/j.visres.2003.09.037
de Landeta AB, Pereyra M, Medina JH, Katche C (2020) Anterior retrosplenial cortex is required for long-term object recognition memory. Sci Rep. https://doi.org/10.1038/s41598-020-60937-z
Deng J, Dong W, Socher R, Li LJ, Li K, Li FF (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition (Cvpr), pp 248–255. DOIhttps://doi.org/10.1109/cvpr.2009.5206848
Desimone R (1998) Visual attention mediated by biased competition in extrastriate visual cortex. Philos Trans Royal Soc B-Biol Sci 353(1373):1245–1255. https://doi.org/10.1098/rstb.1998.0280
Desimone R, Albright TD, Gross CG, Bruce C (1984) Stimulus-selective properties of inferior temporal neurons in the macaque. J Neurosci 4(8):2051–2062
Desimone R, Duncan J (1995) Neural mechanisms of selective visual-attention. Annual Rev Neurosci 18:193–222. https://doi.org/10.1146/annurev.neuro.18.1.193
Doborjeh ZG, Kasabov N, Doborjeh MG, Sumich A (2018) Modelling peri-perceptual brain processes in a deep learning spiking neural network architecture. Sci Rep. https://doi.org/10.1038/s41598-018-27169-8
Dong QL, Wang H, Hu ZY (2018) Statistics of visual responses to image object stimuli from primate ait neurons to dnn neurons. Neural Comput 30(2):447–476. https://doi.org/10.1162/neco_a_01039
Doniger GM, Foxe JJ, Schroeder CE, Murray MM, Higgins BA, Javitt DC (2001) Visual perceptual learning in human object recognition areas: a repetition priming study using high-density electrical mapping. Neuroimage 13(2):305–313. https://doi.org/10.1006/nimg.2000.0684
Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V, van der Smagt P, Cremers D, Brox T (2015) Flownet: Learning optical flow with convolutional networks. In: 2015 IEEE international conference on computer vision (Iccv), pp 2758–2766. https://doi.org/10.1109/Iccv.2015.316
Downing PE, Jiang YH, Shuman M, Kanwisher N (2001) A cortical area selective for visual processing of the human body. Science 293(5539):2470–2473. https://doi.org/10.1126/science.1063414
Eickenberg M, Gramfort A, Varoquaux G, Thirion B (2017) Seeing it all: convolutional network layers map the function of the human visual system. Neuroimage 152:184–194. https://doi.org/10.1016/j.neuroimage.2016.10.001
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338. https://doi.org/10.1007/s11263-009-0275-4
Faghihi F, Molhem H, Moustafa AA (2019) Toward one-shot learning in neuroscience-inspired deep spiking neural networks. bioRxiv p. 829556. https://doi.org/10.1101/829556. https://www.biorxiv.org/content/biorxiv/early/2019/11/04/829556.full.pdf
Federer C, Xu HY, Fyshe A, Zylberberg J (2020) Improved object recognition using neural networks trained to mimic the brain’s statistical properties. Neural Netw 131:103–114. https://doi.org/10.1016/j.neunet.2020.07.013
Freedman DJ, Riesenhuber M, Poggio T, Miller EK (2002) Visual categorization and the primate prefrontal cortex: neurophysiology and behavior. J Neurophysiol 88(2):929–941. https://doi.org/10.1152/jn.2002.88.2.929
Fukushima K (1980) Neocognitron - a self-organizing neural network model for a mechanism of pattern-recognition unaffected by shift in position. Biol Cybern 36(4):193–202. https://doi.org/10.1007/Bf00344251
Gavornik JP, Bear MF (2014) Learned spatiotemporal sequence recognition and prediction in primary visual cortex. Nat Neurosci 17(5):732–737. https://doi.org/10.1038/nn.3683
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE conference on computer vision and pattern recognition, pp 3354–3361. https://doi.org/10.1109/CVPR.2012.6248074
George D, Lehrach W, Kansky K, Lazaro-Gredilla M, Laan C, Marthi B, Lou XH, Meng ZS, Liu Y, Wang HY, Lavin A, Phoenix DS (2017) A generative vision model that trains with high data efficiency and breaks text-based captchas. Science. https://doi.org/10.1126/science.aag2612
Geusebroek JM, Burghouts GJ, Smeulders AWM (2005) The amsterdam library of object images. Int J Comput Vision 61(1):103–112. https://doi.org/10.1023/B:Visi.0000042993.50813.60
Gielis J (2003) A generic geometric transformation that unifies a wide range of natural and abstract shapes. Am J Botany 90(3):333–338. https://doi.org/10.3732/ajb.90.3.333
Goddard E, Carlson TA, Dermody N, Woolgar A (2016) Representational dynamics of object recognition: feedforward and feedback information flows. Neuroimage 128:385–397. https://doi.org/10.1016/j.neuroimage.2016.01.006
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2020) Generative adversarial networks. Commun Acm 63(11):139–144. https://doi.org/10.1145/3422622
Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. CalTech Report
Grigorescu C, Petkov N (2003) Distance sets for shape filters and shape recognition. IEEE Trans Image Process 12(10):1274–1286. https://doi.org/10.1109/Tip.2003.816010
Grill-Spector K, Kushnir T, Hendler T, Malach R (2000) The dynamics of object-selective activation correlate with recognition performance in humans. Nat Neurosci 3(8):837–843. https://doi.org/10.1038/77754
Han Y, Roig G, Geiger G, Poggio T (2020) Scale and translation-invariance for novel objects in human vision. Sci Rep. https://doi.org/10.1038/s41598-019-57261-6
Heidari-Gorji H, Ebrahimpour R, Zabbah S (2021) A temporal hierarchical feedforward model explains both the time and the accuracy of object recognition. Sci Rep 11(1):5640. https://doi.org/10.1038/s41598-021-85198-2
Hendrycks D, Dietterich T (2019) Benchmarking neural network robustness to common corruptions and perturbations. https://ui.adsabs.harvard.edu/abs/2019arXiv190312261H
Hong H, Yamins DLK, Majaj NJ, DiCarlo JJ (2016) Explicit information for category-orthogonal object properties increases along the ventral stream. Nat Neurosci 19(4):613–622. https://doi.org/10.1038/nn.4247
Hoover A, Kouznetsova V, Goldbaum M (2000) Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging 19(3):203–210. https://doi.org/10.1109/42.845178
Horikawa T, Kamitani Y (2017) Generic decoding of seen and imagined objects using hierarchical visual features. Nat Commun. https://doi.org/10.1038/ncomms15037
Huang GB, Mattar MA, Lee H, Learned-Miller E (2012) Learning to align from scratch. Adv Neural Inf Process Syst 1:764–772
Hubel DH (1985) Receptive-fields, binocular interaction and functional architecture in the cats visual-cortex. Curr Contents/Life Sci 19:23–23
Hubel DH, Wiesel TN (1959) Receptive fields of single neurones in the cats striate cortex. J Physiol-Lond 148(3):574–591. https://doi.org/10.1113/jphysiol.1959.sp006308
Jacob G, Pramod RT, Katti H, Arun SP (2021) Qualitative similarities and differences in visual object representations between brains and deep networks. Nat Commun 12(1):1872. https://doi.org/10.1038/s41467-021-22078-3
Jazlaeiyan M, Seyedin S, Motamedi SA (2018) Enhanced brain inspired model for face categorization using mutual information maximization. In: 2018 25th national and 3rd international iranian conference on biomedical engineering (ICBME), pp 1–6. https://doi.org/10.1109/ICBME.2018.8703599
Kaiser D, Azzalini DC, Peelen MV (2016) Shape-independent object category responses revealed by meg and fmri decoding. J Neurophysiol 115(4):2246–2250. https://doi.org/10.1152/jn.01074.2015
Kapoor A, Shenoy P, Tan D (2008) Combining brain computer interfaces with vision for object categorization. In: 2008 IEEE conference on computer vision and pattern recognition, Vols 1-12, pp. 2150–+
Kar K, DiCarlo JJ (2021) Fast recurrent processing via ventrolateral prefrontal cortex is needed by the primate ventral stream for robust core visual object recognition. Neuron. https://doi.org/10.1016/j.neuron.2020.09.035
Kar K, Kubilius J, Schmidt K, Issa EB, DiCarlo JJ (2019) Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nat Neurosci 22(6):974–983. https://doi.org/10.1038/s41593-019-0392-5
Karimi-Rouzbahani H, Bagheri N, Ebrahimpour R (2017) Invariant object recognition is a personalized selection of invariant features in humans, not simply explained by hierarchical feed-forward vision models. Sci Rep. https://doi.org/10.1038/s41598-017-13756-8
Karimimehr S, Yazdchi MR (2014) How computational neuroscience could help improving face recognition systems? In: 2014 4th international conference on computer and knowledge engineering (ICCKE), pp 410–413. https://doi.org/10.1109/ICCKE.2014.6993453
Katti H, Arun SP (2019) Are you from north or south india? a hard face-classification task reveals systematic representational differences between humans and machines. J Vision. https://doi.org/10.1167/19.7.1
Kay KN, Naselaris T, Prenger RJ, Gallant JL (2008) Identifying natural images from human brain activity. Nature 452(7185):352–355. https://doi.org/10.1038/nature06713
Kheradpisheh SR, Ganjtabesh M, Masquelier T (2016) Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition. Neurocomputing 205:382–392. https://doi.org/10.1016/j.neucom.2016.04.029
Kheradpisheh SR, Ghodrati M, Ganjtabesh M, Masquelier T (2016) Deep networks can resemble human feed-forward vision in invariant object recognition. Sci Rep. https://doi.org/10.1038/srep32672
Kim E, Hannan D, Kenyon G (2018) Deep sparse coding for invariant multimodal halle berry neurons. In: 2018 IEEE/Cvf conference on computer vision and pattern recognition (Cvpr), pp 1111–1120. https://doi.org/10.1109/Cvpr.2018.00122
Kim G, Jang J, Baek S, Song M, Paik SB (2021) Visual number sense in untrained deep neural networks. Sci Adv. https://doi.org/10.1126/sciadv.abd6127
Konen CS, Kastner S (2008) Two hierarchically organized neural systems for object information in human visual cortex. Nat Neurosci 11(2):224–231. https://doi.org/10.1038/nn2036
Kosse C, Burdakov D (2019) Natural hypothalamic circuit dynamics underlying object memorization. Nat Commun. https://doi.org/10.1038/s41467-019-10484-7
Kourtzi Z, Kanwisher N (2001) Representation of perceived object shape by the human lateral occipital complex. Science 293(5534):1506–1509. https://doi.org/10.1126/science.1061133
Kriegeskorte N, Douglas PK (2018) Cognitive computational neuroscience. Nat Neurosci 21(9):1148–1160. https://doi.org/10.1038/s41593-018-0210-5
Kriegeskorte N, Mur M, Ruff DA, Kiani R, Bodurka J, Esteky H, Tanaka K, Bandettini PA (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60(6):1126–1141. https://doi.org/10.1016/j.neuron.2008.10.043
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Handbook of systemic autoimmune diseases 1(4)
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun Acm 60(6):84–90. https://doi.org/10.1145/3065386
Kruger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodriguez-Sanchez AJ, Wiskott L (2013) Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE Transan Pattern Anal Mach Intell 35(8):1847–1871. https://doi.org/10.1109/Tpami.2012.272
Kuzovkin I, Vicente R, Petton M, Lachaux JP, Baciu M, Kahane P, Rheims S, Vidal JR, Aru J (2018) Activations of deep convolutional neural networks are aligned with gamma band activity of human visual cortex. Commun Biol. https://doi.org/10.1038/s42003-018-0110-y
Landi SM, Freiwald WA (2017) Two areas for familiar face recognition in the primate brain. Science 357(6351):591–595. https://doi.org/10.1126/science.aan1139
Langner O, Dotsch R, Bijlstra G, Wigboldus DHJ, Hawk ST, van Knippenberg A (2010) Presentation and validation of the radboud faces database. Cognit Emot 24(8):1377–1388. https://doi.org/10.1080/02699930903485076
Lauer T, Cornelissen THW, Draschkow D, Willenbockel V, Vo MLH (2018) The role of scene summary statistics in object recognition. Sci Rep. https://doi.org/10.1038/s41598-018-32991-1
Le QV (2013) Building high-level features using large scale unsupervised learning. In: 2013 IEEE international conference on acoustics, speech and signal processing (Icassp), pp 8595–8598
Lecun Y, Cortes C (2010) The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/
LeCun Y, Huang FJ, Bottou L (2004) Learning methods for generic object recognition with invariance to pose and lighting. In: proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, pp 97–104
Leibe B, Schiele B (2003) Analyzing appearance and contour based methods for object categorization. In: 2003 IEEE computer society conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2003.1211497
Levy I, Hasson U, Avidan G, Hendler T, Malach R (2001) Center-periphery organization of human object areas. Nat Neurosci 4(5):533–539
Li FF, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. Comput Vision Image Underst 106(1):59–70. https://doi.org/10.1016/j.cviu.2005.09.012
Li N, DiCarlo JJ (2008) Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 321(5895):1502–1507. https://doi.org/10.1126/science.1160028
Liang M, Hu XL (2015) Recurrent convolutional neural network for object recognition. In: 2015 IEEE conference on computer vision and pattern recognition (Cvpr), pp 3367–3375
Liang Q, Zeng Y, Xu B (2020) Temporal-sequential learning with a brain-inspired spiking neural network and its application to musical memory. Front Comput Neurosci. https://doi.org/10.3389/fncom.2020.00051
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollar P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: computer vision - Eccv 2014, vol. 8693, pp 740–755. Doi https://doi.org/10.1007/978-3-319-10602-1_48
Liu JX, Huo H, Hu WT, Fang T (2018) Brain-inspired hierarchical spiking neural network using unsupervised stdp rule for image classification. In: proceedings of 2018 10th international conference on machine learning and computing (Icmlc 2018), pp 230–235. https://doi.org/10.1145/3195106.3195115
Liu JX, Zhao GP (2018) A bio-inspired sosnn model for object recognition. In: 2018 international joint conference on neural networks (Ijcnn), pp 861–868
Lopez-Aranda MF, Lopez-Tellez JF, Navarro-Lobato I, Masmudi-Martin M, Gutierrez A, Khan ZU (2009) Role of layer 6 of v2 visual cortex in object-recognition memory. Science 325(5936):87–89. https://doi.org/10.1126/science.1170869
Lu YF, Qiao H, Li Y, Jia LH (2018) Image recommendation based on a novel biologically inspired hierarchical model. Multimed Tools Appl 77(4):4323–4337. https://doi.org/10.1007/s11042-017-5514-z
MacEvoy SP, Epstein RA (2011) Constructing scenes from objects in human occipitotemporal cortex. Nat Neurosci. https://doi.org/10.1038/nn.2903
Mel BW (1997) Seemore: combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Comput 9(4):777–804. https://doi.org/10.1162/neco.1997.9.4.777
Menze M, Geiger A (2015) Object scene flow for autonomous vehicles. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 3061–3070. https://doi.org/10.1109/CVPR.2015.7298925
Miceli G, Fouch E, Capasso R, Shelton JR, Tomaiuolo F, Caramazza A (2001) The dissociation of color from form and function knowledge. Nat Neurosci 4(6):662–667. https://doi.org/10.1038/88497
Miller EK, Cohen JD (2001) An integrative theory of prefrontal cortex function. Annual Rev Neurosci 24:167–202. https://doi.org/10.1146/annurev.neuro.24.1.167
Mohsenzadeh Y, Mullin C, Lahner B, Oliva A (2020) Emergence of visual center-periphery spatial organization in deep convolutional neural networks. Sci Rep. https://doi.org/10.1038/s41598-020-61409-0
Montobbio N, Bonnasse-Gahot L, Citti G, Sarti A (2019) Kercnns: biologically inspired lateral connections for classification of corrupted images. arXiv: abs/1910.08336
Nasr K, Viswanathan P, Nieder A (2019) Number detectors spontaneously emerge in a deep neural network designed for visual object recognition. Sci Adv. https://doi.org/10.1126/sciadv.aav7903
Nassi JJ, Callaway EM (2009) Parallel processing strategies of the primate visual system. Nat Rev Neurosci 10(5):360–372. https://doi.org/10.1038/nrn2619
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A (2011) Reading digits in natural images with unsupervised feature learning. NIPS
Nishimoto S, Vu AT, Naselaris T, Benjamini Y, Yu B, Gallant JL (2011) Reconstructing visual experiences from brain activity evoked by natural movies. Curr Biol 21(19):1641–1646. https://doi.org/10.1016/j.cub.2011.08.031
Okamura JY, Yamaguchi R, Honda K, Wang G, Tanaka K (2014) Neural substrates of view-invariant object recognition developed without experiencing rotations of the objects. J Neurosci 34(45):15047–15059. https://doi.org/10.1523/Jneurosci.1898-14.2014
Orlov T, Zohary E (2018) Object representations in human visual cortex formed through temporal integration of dynamic partial shape views. J Neurosci 38(3):659–678. https://doi.org/10.1523/Jneurosci.1318-17.2017
Palmeri TJ, Gauthier I (2004) Visual object understanding. Nat Rev Neurosci 5(4):291–303. https://doi.org/10.1038/nrn1364
Park MS, Zhang CJ, DeBole M, Kestur S, Narayanan V, Irwin MJ (2013) Accelerators for biologically-inspired attention and recognition. In: 2013 50th Acm / Edac / IEEEE Design automation conference (Dac), pp 1–6
Park YJ, Baek S, Paik SB (2021) A brain-inspired network architecture for cost-efficient object recognition in shallow hierarchical neural networks. Neural Netw 134:76–85. https://doi.org/10.1016/j.neunet.2020.11.013
Pedretti G, Milo V, Ambrogio S, Carboni R, Bianchi S, Calderoni A, Ramaswamy N, Spinelli AS, Ielmini D (2017) Memristive neural network for on-line learning and tracking with brain-inspired spike timing dependent plasticity. Sci Rep. https://doi.org/10.1038/s41598-017-05480-0
Peelen MV, Fei-Fei L, Kastner S (2009) Neural mechanisms of rapid natural scene categorization in human visual cortex. Nature 460(7251):94-U105. https://doi.org/10.1038/nature08103
Perrett DI, Oram MW (1993) Neurophysiology of shape processing. Image Vision Comput 11(6):317–333. https://doi.org/10.1016/0262-8856(93)90011-5
Piech V, Li W, Reeke GN, Gilbert CD (2013) Network model of top-down influences on local gain and contextual interactions in visual cortex. Proc Nat Acad Sci U.S.A 110(43):E4108–E4117. https://doi.org/10.1073/pnas.1317019110
Podvalny E, Flounders MW, King LE, Holroyd T, He BJ (2019) A dual role of prestimulus spontaneous neural activity in visual object recognition. Nat Commun. https://doi.org/10.1038/s41467-019-11877-4
Ponce CR, Xiao W, Schade PF, Hartmann TS, Kreiman G, Livingstone MS (2019) Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences. Cell 177(4):999–1013. https://doi.org/10.1016/j.cell.2019.04.005
Pramod RT, Arun SP (2016) Do computational models differ systematically from human object perception? In: 2016 IEEE conference on computer vision and pattern recognition (Cvpr), pp 1601–1609. https://doi.org/10.1109/Cvpr.2016.177
Priebe NJ (2016) Mechanisms of orientation selectivity in the primary visual cortex. Annual Rev Vision Sci 2:85–107. https://doi.org/10.1146/annurev-vision-111815-114456
Rajalingham R, Issa EB, Bashivan P, Kar K, Schmidt K, DiCarlo JJ (2018) Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. J Neurosci 38(33):7255–7269. https://doi.org/10.1523/Jneurosci.0388-18.2018
Rees G, Frackowiak R, Frith C (1997) Two modulatory effects of attention that mediate object categorization in human cortex. Science 275(5301):835–838. https://doi.org/10.1126/science.275.5301.835
Rice GE, Watson DM, Hartley T, Andrews TJ (2014) Low-level image properties of visual objects predict patterns of neural response across category-selective regions of the ventral visual pathway. J Neurosci 34(26):8837–8844. https://doi.org/10.1523/Jneurosci.5265-13.2014
Riesenhuber M, Poggio T (1999) Hierarchical models of object recognition in cortex. Nat Neurosci 2(11):1019–1025
Riesenhuber M, Poggio T (2002) Neural mechanisms of object recognition. Curr Opin Neurobiol 12(2):162–168. https://doi.org/10.1016/S0959-4388(02)00304-5
Ritchie JB, Op de Beeck H (2019) Using neural distance to predict reaction time for categorizing the animacy, shape, and abstract properties of objects. Sci Rep. https://doi.org/10.1038/s41598-019-49732-7
Rizzolatti G, Matelli M (2003) Two different streams form the dorsal visual system: anatomy and functions. Exp Brain Res 153(2):146–157. https://doi.org/10.1007/s00221-003-1588-0
Romanski LM, Chafee MV (2021) A view from the top: prefrontal control of object recognition. Neuron 109(1):6–8. https://doi.org/10.1016/j.neuron.2020.12.014
Rosch E (1975) Basic objects in natural categories. Bull Psychon Soc 6(Nb4):415–415
Rousselet GA, Thorpe SJ, Fabre-Thorpe M (2004) How parallel is visual processing in the ventral pathway? Trends Cogn Sci 8(8):363–370. https://doi.org/10.1016/j.tics.2004.06.003
Roy K, Jaiswal A, Panda P (2019) Towards spike-based machine intelligence with neuromorphic computing. Nature 575(7784):607–617. https://doi.org/10.1038/s41586-019-1677-2
Rupp K, Roos M, Milsap G, Caceres C, Ratto C, Chevillet M, Crone NE, Wolmetz M (2017) Semantic attributes are encoded in human electrocorticographic signals during visual object recognition. Neuroimage 148:318–329. https://doi.org/10.1016/j.neuroimage.2016.12.074
Russell BC, Torralba A, Murphy KP, Freeman WT (2008) Labelme: a database and web-based tool for image annotation. Int J Comput Vision 77(1–3):157–173. https://doi.org/10.1007/s11263-007-0090-8
Rybak IA, Gusakova VI, Golovan AV, Podladchikova LN, Shevtsova NA (1998) A model of attention-guided visual perception and recognition. Vision Res 38(15–16):2387–2400. https://doi.org/10.1016/S0042-6989(98)00020-0
Saifullah M (2011) A biologically inspired model for occluded patterns. Neural Inf Process 7062:88–96
Savarese S, Li FF (2007) 3d generic object categorization, localization and pose estimation. In: 2007 IEEE 11th international conference on computer vision, pp 1–8. https://doi.org/10.1109/ICCV.2007.4408987
Schwartz EL, Desimone R, Albright TD, Gross CG (1983) Shape-recognition and inferior temporal neurons. Proc Natl Academy Sci U.S.A-Biol Sci 80(18):5776–5778. https://doi.org/10.1073/pnas.80.18.5776
Seeliger K, Fritsche M, Guclu U, Schoenmakers S, Schoffelen JM, Bosch SE, van Gerven MAJ (2018) Convolutional neural network-based encoding and decoding of visual object recognition in space and time. Neuroimage 180:253–266. https://doi.org/10.1016/j.neuroimage.2017.07.018
Seeliger K, Guclu U, Ambrogioni L, Gucluturk Y, van Gerven MAJ (2018) Generative adversarial networks for reconstructing natural images from brain activity. Neuroimage 181:775–785. https://doi.org/10.1016/j.neuroimage.2018.07.043
Sehatpour P, Molholm S, Javitt DC, Foxe JJ (2006) Spatiotemporal dynamics of human object recognition processing: an integrated high-density electrical mapping and functional imaging study of “closure” processes. Neuroimage 29(2):605–618. https://doi.org/10.1016/j.neuroimage.2005.07.049
Serre T (2019) Deep learning: the good, the bad, and the ugly. Annual Rev Vision Sci 5:399–426. https://doi.org/10.1146/annurev-vision-091718-014951
Serre T, Kreiman G, Kouh M, Cadieu C, Knoblich U, Poggio T (2007) A quantitative theory of immediate visual recognition. Comput Neurosci: Theor Insights Brain Funct 165:33–56. https://doi.org/10.1016/S0079-6123(06)65004-8
Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans Pattern Analy Mach Intell 29(3):411–426. https://doi.org/10.1109/Tpami.2007.56
Shadlen MN, Movshon JA (1999) Synchrony unbound: a critical evaluation of the temporal binding hypothesis. Neuron 24(1):67–77. https://doi.org/10.1016/S0896-6273(00)80822-3
Smirnov EA, Timoshenko DM, Andrianov SN (2014) Comparison of regularization methods for imagenet classification with deep convolutional neural networks. In: 2nd aasri conference on computational intelligence and bioinformatics, vol. 6, pp 89–94. https://doi.org/10.1016/j.aasri.2014.05.013
Snodgrass JG, Vanderwart M (1980) A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. J Exp Psychol Hum Learn Mem 6(2):174–215
Solomon SG, Kohn A (2014) Moving sensory adaptation beyond suppressive effects in single neurons. Curr Biol 24(20):R1012–R1022. https://doi.org/10.1016/j.cub.2014.09.001
Song S, Ma C, Yu Q (2020) Brain-inspired framework for image classification with a new unsupervised matching pursuit encoding. In: International conference on neural information processing, pp 208–219. Springer International Publishing
Spampinato C, Palazzo S, Kavasidis I, Giordano D, Souly N, Shah M (2017) Deep learning human mind for automated visual classification. In: 30th IEEE conference on computer vision and pattern recognition (Cvpr 2017), pp 4503–4511. https://doi.org/10.1109/Cvpr.2017.479
Staal J, Abramoff MD, Niemeijer M, Viergever MA, van Ginneken B (2004) Ridge-based vessel segmentation in color images of the retina. IEEE Trans Med Imaging 23(4):501–509. https://doi.org/10.1109/Tmi.2004.825627
Swirsky LT, Marinacci RM, Spaniol J (2020) Reward anticipation selectively boosts encoding of gist for visual objects. Sci Rep. https://doi.org/10.1038/s41598-020-77369-4
Takeuchi D, Hirabayashi T, Tamura K, Miyashita Y (2011) Reversal of interlaminar signal between sensory and memory processing in monkey temporal cortex. Science 331(6023):1443–1447. https://doi.org/10.1126/science.1199967
Tarr MJ (1999) News on views: pandemonium revisited. Nat Neurosci 2(11):932–935. https://doi.org/10.1038/14714
Thoma V, Henson RN (2011) Object representations in ventral and dorsal visual streams: fmri repetition effects depend on attention and part-whole configuration. Neuroimage 57(2):513–525. https://doi.org/10.1016/j.neuroimage.2011.04.035
Todd JJ, Marois R (2004) Capacity limit of visual short-term memory in human posterior parietal cortex. Nature 428(6984):751–754. https://doi.org/10.1038/nature02466
Tompa T, Sary G (2010) A review on the inferior temporal cortex of the macaque. Brain Res Rev 62(2):165–182. https://doi.org/10.1016/j.brainresrev.2009.10.001
Ullman S, Assif L, Fetaya E, Harari D (2016) Atoms of recognition in human and computer vision. Proc Natl Acad Sci U.S.A 113(10):2744–2749. https://doi.org/10.1073/pnas.1513198113
Ullman S, Vidal-Naquet M, Sali E (2002) Visual features of intermediate complexity and their use in classification. Nat Neurosci 5(7):682–687. https://doi.org/10.1038/nn870
Vinken K, Boix X, Kreiman G (2020) Incorporating intrinsic suppression in deep neural networks captures dynamics of adaptation in neurophysiology and perception. Sci Adv. https://doi.org/10.1126/sciadv.abd4205
Vogels TP, Rajan K, Abbott LF (2005) Neural network dynamics. Annual Rev Neurosci 28:357–376. https://doi.org/10.1146/annurev.neuro.28.061604.135637
Wallis G, Rolls ET (1997) Invariant face and object recognition in the visual system. Prog Neurobiol 51(2):167–194. https://doi.org/10.1016/S0301-0082(96)00054-8
Wang CM, Xiong S, Hu XP, Yao L, Zhang JC (2012) Combining features from erp components in single-trial eeg for discriminating four-category visual objects. J Neural Eng. https://doi.org/10.1088/1741-2560/9/5/056013
Wang G, Obama S, Yamashita W, Sugihara T, Tanaka K (2005) Prior experience of rotation is not required for recognizing objects seen from different angles. Nat Neurosci 8(12):1568–1575. https://doi.org/10.1038/nn1600
Wang G, Tanaka K, Tanifuji M (1996) Optical imaging of functional organization in the monkey inferotemporal cortex. Science 272(5268):1665–1668. https://doi.org/10.1126/science.272.5268.1665
Wen H, Han K, Shi J, Zhang Y, Culurciello E, Liu Z (2018) Deep predictive coding network for object recognition. arXiv:abs/1802.04762
Wen HG, Shi JX, Chen W, Liu ZM (2018) Deep residual network predicts cortical representation and organization of visual features for rapid categorization. Sci Rep. https://doi.org/10.1038/s41598-018-22160-9
Wen HG, Shi JX, Chen W, Liu ZM (2018) Transferring and generalizing deep-learning-based neural encoding models across subjects. Neuroimage 176:152–163. https://doi.org/10.1016/j.neuroimage.2018.04.053
Wersing H, Korner E (2003) Learning optimized features for hierarchical models of invariant object recognition. Neural Comput 15(7):1559–1588. https://doi.org/10.1162/089976603321891800
Xiao H, Rasul K, Vollgraf R (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. https://ui.adsabs.harvard.edu/abs/2017arXiv170807747X
Xiao JX, Hays J, Ehinger KA, Oliva A, Torralba A (2010) Sun database: Large-scale scene recognition from abbey to zoo. In: 2010 IEEE conference on computer vision and pattern recognition (Cvpr), pp 3485–3492. https://doi.org/10.1109/cvpr.2010.5539970
Yamane Y, Carlson ET, Bowman KC, Wang ZH, Connor CE (2008) A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nat Neurosci 11(11):1352–1360. https://doi.org/10.1038/nn.2202
Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U.S.A 111(23):8619–8624. https://doi.org/10.1073/pnas.1403112111
Yildirim I, Belledonne M, Freiwald W, Tenenbaum J (2020) Efficient inverse graphics in biological face processing. Sci Adv. https://doi.org/10.1126/sciadv.aax5979
Yu CP, Liu HD, Samaras D, Zelinsky GJ (2019) Modelling attention control using a convolutional neural network designed after the ventral visual pathway. Vis Cogn. https://doi.org/10.1080/13506285.2019.1661927
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: computer vision - eccv 2014, vol. 8689, pp 818–833. https://doi.org/10.1007/978-3-319-10590-1_53
Zeman AA, Ritchie JB, Bracci S, Op de Beeck H (2020) Orthogonal representations of object shape and category in deep convolutional neural networks and human visual cortex. Sci Rep. https://doi.org/10.1038/s41598-020-59175-0
Zeng Y, Zhao FF, Wang GX, Zhang LY, Xu B (2016) Brain-inspired obstacle detection based on the biological visual pathway. Brain Inform Health 9919:355–364. https://doi.org/10.1007/978-3-319-47103-7_35
Zhao FF, Kong QQ, Zeng Y, Xu B (2020) A brain-inspired visual fear responses model for uav emergent obstacle dodging. IEEE Trans Cogn Dev Syst 12(1):124–132. https://doi.org/10.1109/Tcds.2019.2939024
Zhao XP, Wang L, Hu ZY (2006) A perceptual object based attention mechanism for scene analysis. J Image Gr 11:281–288
Zhou BL, Lapedriza A, Khosla A, Oliva A, Torralba A (2018) Places: a 10 million image database for scene recognition. IEEE Trans Pattern Analy Mach Intell 40(6):1452–1464. https://doi.org/10.1109/Tpami.2017.2723009
Zhuang CX, Yan SM, Nayebi A, Schrimpf M, Frank MC, DiCarlo JJ, Yamins DLK (2021) Unsupervised neural network models of the ventral visual stream. Proc Natl Acad Sci U.S.A. https://doi.org/10.1073/pnas.2014196118
Zweig S, Wolf L (2017) Interponet, a brain inspired neural network for optical flow dense interpolation. In: 30th IEEE conference on computer vision and pattern recognition (Cvpr 2017), pp 6363–6372. https://doi.org/10.1109/Cvpr.2017.674
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 61703337) and by the Aviation Science Foundation of China (Grant No. ASFC-20191053002).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yang, X., Yan, J., Wang, W. et al. Brain-inspired models for visual object recognition: an overview. Artif Intell Rev 55, 5263–5311 (2022). https://doi.org/10.1007/s10462-021-10130-z
DOI: https://doi.org/10.1007/s10462-021-10130-z