A comprehensive review of recent advances on deep vision systems

Abbas, Qaisar; Ibrahim, Mostafa E. A.; Jaffar, M. Arfan

doi:10.1007/s10462-018-9633-3

Published: 11 May 2018

Artificial Intelligence Review, Volume 52, pages 39–76 (2019)

Abstract

Real-time video object detection, tracking, and recognition are challenging because of the real-time processing requirements they place on machine learning algorithms. In recent years, video processing has increasingly been performed with deep learning (DL) based techniques, which achieve higher accuracy but incur a higher computational cost. This paper presents a survey of the state-of-the-art DL platforms and architectures used in deep vision systems, highlighting the contributions and challenges of numerous research studies. In particular, the paper first describes the architectures of various DL models, such as autoencoders, deep Boltzmann machines, convolutional neural networks, recurrent neural networks, and deep residual networks. Next, DL studies on real-time video object detection, tracking, and recognition are reviewed to illustrate the key trends in terms of computational cost, number of layers, and accuracy of results. Finally, the paper discusses the challenges of applying DL to real-time video processing and outlines directions for the future of DL algorithms.

Author information

Authors and Affiliations

Department of Computer Science, Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11432, Saudi Arabia
Qaisar Abbas, Mostafa E. A. Ibrahim & M. Arfan Jaffar
Benha Faculty of Engineering, Benha University, Benha, Egypt
Mostafa E. A. Ibrahim

Corresponding author

Correspondence to Qaisar Abbas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Abbas, Q., Ibrahim, M.E.A. & Jaffar, M.A. A comprehensive review of recent advances on deep vision systems. Artif Intell Rev 52, 39–76 (2019). https://doi.org/10.1007/s10462-018-9633-3

Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s10462-018-9633-3