A comprehensive review of recent advances on deep vision systems

Artificial Intelligence Review

Abstract

Real-time detection, tracking, and recognition of objects in video are challenging because machine learning algorithms must meet real-time processing requirements. In recent years, video processing has been performed by deep learning (DL) based techniques, which achieve higher accuracy but incur a higher computational cost. This paper surveys the state-of-the-art DL platforms and architectures used for deep vision systems and highlights the contributions and challenges reported in numerous research studies. In particular, it first describes the architecture of various DL models such as autoencoders, deep Boltzmann machines, convolutional neural networks, recurrent neural networks, and deep residual learning. Next, studies on deep real-time video object detection, tracking, and recognition are highlighted to illustrate the key trends in terms of computational cost, number of layers, and accuracy of results. Finally, the paper discusses the challenges of applying DL to real-time video processing and outlines some directions for the future of DL algorithms.

Fig. 1 (Reproduced with permission from Hinton et al. 2006)

Figs. 2–7

References

  • Abbas Q, Ibrahim MEA, Jaffar MA (2017) Video scene analysis: an overview and challenges on deep learning algorithms. J Multimed Tools Appl. https://doi.org/10.1007/s11042-017-5438-7

  • Alotaibi A, Mahmood A (2016) Deep face liveness detection based on nonlinear diffusion using convolution neural network. J Signal Image Video Process. https://doi.org/10.1007/s11760-016-1014-2 (ISSN 1863-1711)

  • Andriluka M, Stewart R, Ng AY (2016) End-to-end people detection in crowded scenes. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’16), pp 2325–2333. https://doi.org/10.1109/cvpr.2016.255

  • Badrinarayanan V, Kendall A, Cipolla R (2015) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint, abs/1511.00561

  • Bai J, Wu Y, Zhang J, Chen F (2015) Subset based deep learning for RGB-D object recognition. J Neuro Comput 165(3):280–292. https://doi.org/10.1016/j.neucom.2015.03.017 (ISSN 0925-2312)

  • Batra D, Kowdle A, Parikh D, Luo J, Chen T (2010) iCoseg: interactive Co-segmentation with Intelligent scribble guidance. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition, pp 3169–3176. https://doi.org/10.1109/cvpr.2010.5540080

  • Bell S, Zitnick CL, Bala K, Girshick RB (2016) Inside–outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’16), Las Vegas, pp 2874–2883

  • Bengio Y, Lamblin P, Popovici D, Larochelle H (2006) Greedy layer-wise training of deep networks. In: Proceedings of the 19th international conference on neural information processing systems (NIPS’06). MIT Press, Canada, pp 153–160

  • Boumbarov O, Panev S, Paliy I, Petrov P, Dimitrov L (2011) Homography-based face orientation determination from a fixed monocular camera. In: Proceedings of the 6th IEEE international conference on intelligent data acquisition and advanced computing systems, vol 1, pp. 399–403. https://doi.org/10.1109/idaacs.2011.6072783

  • Cao Y, Chen Y, Khosla D (2015) Spiking deep convolutional neural networks for energy-efficient object recognition. Int J Comput Vis 113(1):54–66. https://doi.org/10.1007/s11263-014-0788-3 (ISSN 0920-5691)

  • Carneiro G, Nascimento JC (2010) The fusion of deep learning architectures and particle filtering applied to lip tracking. In: Proceedings of 20th international conference on pattern recognition, pp 2065–2068. https://doi.org/10.1109/icpr.2010.508

  • Chatfield K, Arandjelovic R, Parkhi OM, Zisserman A (2015) On-the-fly learning for visual search of large-scale image and video datasets. Int J Multimed Inf Retriev 4(2):75–93. https://doi.org/10.1007/s13735-015-0077-0

  • Chellappa R (2016) The changing fortunes of pattern recognition and computer vision. J Image Vis Comput 55(3–5):2016. https://doi.org/10.1016/j.imavis.2016.04.005

  • Chen H, Ni D, Qin J, Li S, Yang X, Wang T, Heng PA (2015) Standard plane localization in fetal ultrasound via domain transferred deep neural networks. IEEE J Biomed Health Inf 19(5):1627–1636. https://doi.org/10.1109/jbhi.2015.2425041.4 (ISSN 2168-2194)

  • Cheng HY, Weng CC, Chen YY (2012) Vehicle detection in aerial surveillance using dynamic bayesian networks. IEEE Trans Image Process 21(4):2152–2159. https://doi.org/10.1109/tip.2011.2172798 (ISSN 1057-7149)

  • Cinbis RG, Verbeek J, Schmid C (2017) Weakly supervised object localization with multi-fold multiple instance learning. IEEE Trans Pattern Anal Mach Intell 39(1):189–203. https://doi.org/10.1109/tpami.2016.2535231 (ISSN 0162-8828)

  • Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of conference on neural information processing systems, Barcelona, pp 379–387

  • Ding S, Lin L, Wang G, Chao H (2015) Deep feature learning with relative distance comparison for person re-identification. J Pattern Recogn 48(10):2993–3003. https://doi.org/10.1016/j.patcog.2015.04.005 (ISSN 0031-3203)

  • Ding J, Huang Y, Liu W, Huang K (2016) Severely blurred object tracking by learning deep image representations. IEEE Trans Circuits Syst Video Technol 26(2):319–331. https://doi.org/10.1109/tcsvt.2015.2406231 (ISSN 1051-8215)

  • Druzhkov PN, Kustikova VD (2016) A survey of deep learning methods and software tools for image classification and object detection. J Pattern Recogn Image Anal 26(1):9–15. https://doi.org/10.1134/s1054661816010065 (ISSN 1054-6618)

  • Fan B, Xie L, Yang S, Wang L, Soong FK (2016) A deep bidirectional LSTM approach for video-realistic talking head. J Multimed Tools Appl 75(9):5287–5309. https://doi.org/10.1007/s11042-015-2944-3 (ISSN 1573-7721)

  • Farrajota M, Rodrigues JMF, du Buf JMH (2016) A deep neural network video framework for monitoring elderly persons. In: Proceedings part II of 10th international conference universal access in human–computer interaction (UAHCI2016), Toronto, pp 370–381

  • Forczmanski P, Nowosielski A (2016) Deep learning approach to detection of preceding vehicle in advanced driver assistance. In: 16th International conference on transport systems telematics (TST’16), Katowice-Ustron, Poland, pp 293–304. https://doi.org/10.1007/978-3-319-49646-7_25

  • Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. J Biol Cybern 36(4):193–202. https://doi.org/10.1007/bf00344251 (ISSN 1432-0770)

  • Gando G, Yamada T, Sato H, Oyama S, Kurihara M (2016) Fine-tuning deep convolutional neural networks for distinguishing illustrations from photographs. Int J Expert Syst Appl 66:295–301

  • Garcia-Garcia A, Orts-Escolano S, Oprea SO, Villena-Martinez V, Garcia-Rodriguez J (2017) A review on deep learning techniques applied to semantic segmentation. CoRR, vol. abs/1704.06857, 2017. [Online]. Available: http://arxiv.org/abs/1704.06857

  • Girshick R (2015) Fast R-CNN. In: Proceedings of IEEE international conference on computer vision (ICCV’15), Santiago, pp 1440–1448

  • Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR’14). IEEE Computer Society, Washington. pp 580–587. ISBN 978-1-4799-5118-5. https://doi.org/10.1109/cvpr.2014.81

  • Graves A, Mohamed A, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, pp 6645–6649. https://doi.org/10.1109/icassp.2013.6638947

  • Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS (2016) Deep learning for visual understanding: a review. J Neurocomput 187(6):27–48. https://doi.org/10.1016/j.neucom.2015.09.116 (ISSN 0925-2312)

  • Hamedani K, Seyyedsalehi SA, Ahamdi R (2016) Video-based face recognition and image synthesis from rotating head frames using nonlinear manifold learning by neural networks. J Neural Comput Appl 27(6):1761–1769. https://doi.org/10.1007/s00521-015-1975-z (ISSN 0941-0643)

  • Hayat M, Bennamoun M, An S (2015) Deep reconstruction models for image set classification. IEEE Trans Pattern Anal Mach Intell 37(4):713–727. https://doi.org/10.1109/tpami.2014.2353635 (ISSN 0162-8828)

  • He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916. https://doi.org/10.1109/tpami.2015.2389824

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’16), Las Vegas, pp 770–778. https://doi.org/10.1109/cvpr.2016.90

  • He T, Mao H, Yi Z (2016b) Moving object recognition using multi-view three-dimensional convolutional neural networks. J Neural Comput Appl. https://doi.org/10.1007/s00521-016-2277-9 (ISSN 1433-3058)

  • Herath S, Harandi M, Porikli F (2017) Going deeper into action recognition: a survey. J Image Vis Comput 60(2017):4–21. https://doi.org/10.1016/j.imavis.2017.01.010 (ISSN 0262-8856)

  • Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. J Neural Comput 18(7):1527–1554

  • Hong S, You T, Kwak S, Han B (2015) Online tracking by learning discriminative saliency map with convolutional neural network. In: Proceedings of the 32nd international conference on machine learning (ICML’15), Lille, pp 597–606

  • Hong S, Roh B, Kim K, Cheon Y, Park M (2016) PVANet: lightweight deep neural networks for real-time object detection. In: Proceedings of the 1st international workshop on efficient methods for deep neural networks (EMDNN’2016), abs/1611.08588

  • Huang GB, Lee H, Miller EL (2012) Learning hierarchical representations for face verification with convolutional deep belief networks. In: Proceedings of the 2012 IEEE conference on computer vision and pattern recognition (CVPR ‘12). IEEE Computer Society, Washington, pp 2518–2525

  • Kim B, Roh J, Dong S, Lee S (2016) Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. J Multimodal User Interfaces 10(2):173–189. https://doi.org/10.1007/s12193-015-0209-0

  • Krig S (2016) Feature learning and deep learning architecture survey. Computer vision metrics-textbook edition. Springer, Berlin, pp 375–514. https://doi.org/10.1007/978-3-319-33762-3_10 (978-3-319-33762-3)

  • Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of 25th international conference on neural information processing systems (NIPS’12), Nevada, pp 1097–1105

  • Kuen J, Lim KM, Lee CP (2015) Self-taught learning of a deep invariant representation for visual tracking via temporal slowness. J Pattern Recogn 48(10):2964–2982. https://doi.org/10.1016/j.patcog.2015.02.012 (ISSN 0031-3203)

  • LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. J Neural Comput 1(4):541–551. https://doi.org/10.1162/neco.1989.1.4.541 (ISSN 0899-7667)

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  • LeCun Y, Bengio Y, Hinton G (2015) Deep Learning. Int J Sci 521:436–444. https://doi.org/10.1038/nature14539

  • Lemley J, Bazrafkan S, Corcoran P (2017) Deep learning for consumer devices and services: pushing the limits for machine learning, artificial intelligence, and computer vision. IEEE Consum Electron Mag 6(2):48–56. https://doi.org/10.1109/mce.2016.2640698 (ISSN 2162-2248)

  • Lenc K, Vedaldi A (2015) R-CNN Minus R. In: Proceedings of the British machine vision conference (BMVC’15), BMVA Press, pp 5.1–5.12. ISBN 1-901725-53-7

  • Li H, Li Y, Porikli F (2015) Deeptrack: learning discriminative feature representations by convolutional neural networks for visual tracking. In: Proceeding of the British Machine Vision Conference (BMVC, 2014), University of Nottingham, pp 1–10. https://doi.org/10.5244/C.28.56

  • Li H, Li Y, Porikli F (2016a) DeepTrack: learning discriminative feature representations online for robust visual tracking. IEEE Trans Image Process 25(4):1834–1848. https://doi.org/10.1109/tip.2015.2510583 (ISSN 1057-7149)

  • Li H, Li Y, Porikli F (2016b) Convolutional neural net bagging for online visual tracking. J Comput Vis Image Understand 153:120–129. https://doi.org/10.1016/j.cviu.2016.07.002 (ISSN 1077-3142)

  • Liu H, Ma B, Qin L, Pang J, Zhang C, Huang Q (2015a) Set-label modeling and deep metric learning on person re-identification. J Neuro Comput 151:1283–1292

  • Liu Y, Guo Y, Wu S, Lew M (2015b) DeepIndex for accurate and efficient image retrieval. In: Proceedings of the ACM international conference on multimedia retrieval (ICMR’15), Shanghai, pp 43–50. https://doi.org/10.1145/2671188.2749300

  • Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot MultiBox detector. In: Proceedings of 14th European conference on computer vision (ECCV’2016), Amsterdam

  • Liu X, Liu W, Mei T, Ma H (2016) A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In: Proceedings part II of 14th European conference on computer vision (ECCV2016), Amsterdam, pp 869–884

  • Long G, Kneip L, Alvarez JM, Li H, Zhang X, Yu Q (2016) Learning image matching by simply watching video. In: Proceedings, part VI of 14th European conference on computer vision (ECCV’16), Amsterdam, pp 434–450

  • Lowry S, Sunderhauf N, Newman P, Leonard JJ, Cox D, Corke P, Milford MJ (2016) Visual place recognition: a survey. IEEE Trans Robot 32:1–19. https://doi.org/10.1109/tro.2015.2496823 (ISSN 1552-3098)

  • Ma C, Huang JB, Yang XK, Yang MH (2015) Hierarchical convolutional features for visual tracking. In: Proceedings of IEEE international conference on computer vision (ICCV’15), pp 3074–3082. https://doi.org/10.1109/iccv.2015.352

  • Malmir M, Sikka K, Forster D, Fasel I, Movellan JR, Cottrell GW (2016) Deep active object recognition by joint label and action prediction. J Comput Vis Image Understand. https://doi.org/10.1016/j.cviu.2016.10.011 (ISSN 1077-3142)

  • Milan A, Rezatofighi SH, Dick A, Reid I, Schindler K (2017) Online multi-target tracking using recurrent neural networks. In: Proceedings of the 31st conference on artificial intelligence (AAAI’17), San Francisco. arXiv:1604.03635

  • Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’16), Las Vegas, pp 4293–4302. https://doi.org/10.1109/cvpr.2016.465

  • Nascimento JC, Carneiro G (2010) Efficient search methods and deep belief networks with particle filtering for non-rigid tracking: application to lip tracking. In: Proceedings of IEEE international conference on image, pp 3817–3820. https://doi.org/10.1109/icip.2010.5654045

  • Padmanabhan J, Premkumar MJJ (2015) Machine learning in automatic speech recognition: a survey. IETE Tech Rev 32(4):240–251. https://doi.org/10.1080/02564602.2015.1010611

  • Paliy I, Dovgan V, Boumbarov O, Panev S, Sachenko A, Kurylyak Y, Zagorodnya D (2011) Fast and robust face detection and tracking framework. In: Proceedings of the 6th IEEE international conference on intelligent data acquisition and advanced computing systems, vol 1, pp 430–434. https://doi.org/10.1109/idaacs.2011.6072790

  • Pan H, Jiang H (2016) A deep learning based fast image saliency detection algorithm. arXiv preprint, abs/1602.00577, 2016

  • Pang S, del Coz JJ, Yu Z, Luaces O, Díez J (2016) Combining deep learning and preference learning for object tracking. In: Proceedings part III of 23rd international conference on neural information processing (ICONIP’16), Kyoto, pp 70–77

  • Pavlov V, Khryashchev V, Pavlov E, Shmaglit L (2013) Application for video analysis based on machine learning and computer vision algorithms. In: Proceedings of 14th conference of open innovation association (FRUCT’13), Espoo, Finland, pp 90–100. https://doi.org/10.1109/fruct.2013.6737950

  • Qian X, Fu Y, Jiang Y-G, Xiang T, Xue X (2017) Multi-scale deep learning architectures for person re-identification. arXiv preprint. Available online: arXiv:1709.05165

  • Ramírez-Quintana JA, Chacon-Murguia MI, Chacon-Hinojos JF (2012) Artificial neural image processing applications: a survey. Eng Lett 20(1):68–80 (ISSN: 1816093X)

  • Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’16), pp 779–788. https://doi.org/10.1109/cvpr.2016.91

  • Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 99:1–1. https://doi.org/10.1109/tpami.2016.2577031 (ISSN 0162-8828)

  • Rothe R, Timofte R, Gool LJV (2015) DEX: deep expectation of apparent age from a single image. In: Proceedings of IEEE international conference on computer vision workshop (ICCV2015), Santiago, pp 252–257

  • Salakhutdinov R, Hinton GE (2009) Deep Boltzmann machines. In: Proceedings of the twelfth international conference on artificial intelligence and statistics (AISTATS’09), Clearwater Beach, pp 448–455

  • Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2014) Overfeat: integrated recognition, localization and detection using convolutional networks. In: Proceedings of international conference on learning representations (ICLR’14), abs/1312.6229

  • Shaikh F (2017) Deep learning versus machine learning: the essential differences you need to know! Article at Analytics Vidhya. Available online: https://www.analyticsvidhya.com/blog/2017/04/comparison-between-deep-learning-machine-learning/

  • Shuai B, Wang G, Zuo Z, Wang B, Zhao L (2015) Integrating parametric and non-parametric models for scene labeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR’15), Boston, pp 4249–4258. https://doi.org/10.1109/cvpr.2015.7299053

  • Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  • Sun Y, Wang X, Tang X (2014) Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR’14), Columbus, pp 1891–1898. https://doi.org/10.1109/cvpr.2014.244

  • Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE computer vision and pattern recognition (CVPR’15), Boston, pp 1–9. https://doi.org/10.1109/cvpr.2015.7298594

  • Taigman Y, Yang M, Ranzato M, Wolf L (2014) DeepFace: closing the gap to human-level performance in face verification. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’14), Columbus, pp 1701–1708. https://doi.org/10.1109/cvpr.2014.220

  • Tan X, Li Y, Liu J, Jiang L (2010) Face liveness detection from a single image with sparse low rank bilinear discriminative model. In: Proceedings of the 11th European conference on computer vision (ECCV’10), Part VI, Heraklion, Crete, Greece, 2010. Springer, pp 504–517. ISBN 3-642-15566-9, 978-3-642-15566-6

  • Uzair M, Shafait F, Ghanem B, Mian A (2016) Representation learning with deep extreme learning machines for efficient image set classification. J Neural Comput Appl. https://doi.org/10.1007/s00521-016-2758-x (ISSN 1433-3058)

  • Varior RR, Wang G, Lu J, Liu T (2016) Learning invariant color features for person re-identification. IEEE Trans Image Process 25(7):3395–3410. https://doi.org/10.1109/TIP.2016.2531280

  • Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning (ICML’08), ACM, Helsinki, Finland, pp 1096–1103. https://doi.org/10.1145/1390156.1390294

  • Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154. https://doi.org/10.1023/b:visi.0000013087.49260.fb (ISSN 0920-5691)

  • Visin F, Kastner K, Cho K, Matteucci M, Courville AC, Bengio Y (2015) ReNet: a recurrent neural network based alternative to convolutional networks. CoRR, vol. abs/1505.00393, 2015. [Online]. Available: http://arxiv.org/abs/1505.00393

  • Visin F, Ciccone M, Romero A, Kastner K, Cho K, Bengio Y, Matteucci M, Courville AC (2016) ReSeg: a recurrent neural network-based model for semantic segmentation. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’16), Las Vegas, NV, pp 426–433. https://doi.org/10.1109/cvprw.2016.60

  • Wang L, Sng D (2015) Deep learning algorithms with applications to video analytics for a smart city: a survey. CoRR, abs/1512.03131, 2015

  • Wang N, Yeung DY (2013) Learning a deep compact image representation for visual tracking. In: Proceedings of the 26th international conference on neural information processing systems (NIPS’13), Lake Tahoe, pp 809–817

  • Wang L, Liu T, Wang G, Chan KL, Yang Q (2015a) Video tracking using learned hierarchical features. IEEE Trans Image Process 24(4):1424–1435. https://doi.org/10.1109/tip.2015.2403231 (ISSN 1057-7149)

  • Wang L, Ouyang W, Wang X, Lu H (2015b) Visual tracking with fully convolutional networks. In: Proceedings of IEEE international conference on computer vision (ICCV’15), pp 3119–3127. https://doi.org/10.1109/iccv.2015.357

  • Wang L, Zhang B, Han J, Shen L, Qian CS (2016) Robust object representation by boosting-like deep learning architecture. J Image Commun 47(C):490–499. https://doi.org/10.1016/j.image.2016.06.002 (ISSN 0923-5965)

  • Winn J, Criminisi A, Minka T (2005) Object categorization by learned universal visual dictionary. In: Proceedings of tenth IEEE international conference on computer vision (ICCV’05), Beijing, vol 1, pp 1800–1807. https://doi.org/10.1109/iccv.2005.171

  • Wu H, Chen X, Li G (2012) Simultaneous tracking and recognition of dynamic digit gestures for smart TV systems. In: Proceedings of fourth international conference on digital home, pp 351–356. https://doi.org/10.1109/icdh.2012.63

  • Wu L, Shen C, Hengel AVD (2015) PersonNet: person re-identification with deep convolutional neural networks. In: Proceedings of the 11th international conference on semantics, knowledge and grids (SKG’15), Beijing

  • Wu Z, Huang Y, Wang L (2015b) Learning representative deep features for image set analysis. IEEE Trans Multimed 17(11):1960–1968. https://doi.org/10.1109/tmm.2015.2477681 (ISSN 1520-9210)

  • Wu C, Cheng HP, Li S, Li HH, Chen Y (2016) ApesNet: a pixel-wise efficient segmentation network. In: Proceedings of the 14th ACM/IEEE symposium on embedded systems for real-time multimedia (ESTIMedia’16), Pittsburgh, ACM, pp 2–8. ISBN 978-1-4503-4543-9. https://doi.org/10.1145/2993452.2994306

  • Xiao T, Li H, Ouyang W, Wang X (2016) Learning deep feature representations with domain guided dropout for person re-identification. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’16), Las Vegas, pp 1249–1258. https://doi.org/10.1109/cvpr.2016.140

  • Xie D, Zhang L, Bai L (2017a) Deep learning in visual computing and signal processing. J Appl Comput Intell Soft Comput 201:14. https://doi.org/10.1155/2017/1320780 (ISSN 1687-9724)

  • Xie S, Girshick RB, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of computer vision and pattern recognition (CVPR’17), vol. abs/1611.05431, p 10

  • Xue H, Liu Y, Cai D, He X (2016) Tracking people in RGBD videos using deep learning and motion clues. J Neurocomput 204:70–76. https://doi.org/10.1016/j.neucom.2015.06.112 (ISSN 0925-2312)

  • Zagoruyko S, Komodakis N (2017) Wide residual networks. In: Proceedings of computer vision and pattern recognition (CVPR’17), vol. abs/1605.07146, p 15

  • Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Proceedings part I of the 13th European conference computer vision (ECCV’14), Zurich, Switzerland, pp 818–833. https://doi.org/10.1007/978-3-319-10590-1_53

  • Zhang D, Han J, Li C, Wang J, Li X (2016a) Detection of co-salient objects by looking deep and wide. Int J Comput Vis 120(2):215–232. https://doi.org/10.1007/s11263-016-0907-4

  • Zhang Z, He Z, Cao G, Cao W (2016b) Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification. IEEE Trans Multimed 18(10):2079–2092

  • Zhang D, Maei H, Wang X, Wang Y-F (2017) Deep reinforcement learning for visual object tracking in videos. arxiv preprint. http://arxiv.org/abs/1701.08936

  • Zhao R, Ouyang W, Li H, Wang X (2015) Saliency detection by multi-context deep learning. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’2015), pp 1265–1274. https://doi.org/10.1109/cvpr.2015.7298731

  • Zhu Y, Guo G (2016) Exploring deep features with different distance measures for still to video face matching. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR’11), Chengdu, China. Springer, pp 158–166. ISBN 978-3-319-46654-5

  • Zhuang B, Wang L, Lu H (2016) Visual tracking via shallow and deep collaborative model. J Neurocomput 218:61–71. https://doi.org/10.1016/j.neucom.2016.08.070 (ISSN 0925-2312)

Author information

Corresponding author

Correspondence to Qaisar Abbas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Abbas, Q., Ibrahim, M.E.A. & Jaffar, M.A. A comprehensive review of recent advances on deep vision systems. Artif Intell Rev 52, 39–76 (2019). https://doi.org/10.1007/s10462-018-9633-3

  • DOI: https://doi.org/10.1007/s10462-018-9633-3
