Abstract
Deep learning algorithms have demonstrated remarkable performance in many sectors and have become one of the main foundations of modern computer-vision solutions. However, these algorithms often impose prohibitive levels of memory and computational overhead, especially in resource-constrained environments. In this study, we combine the state-of-the-art object-detection model YOLOv3 with depthwise separable convolutions and variational dropout in an attempt to bridge the gap between the superior accuracy of convolutional neural networks and the limited access to computational resources. We propose three lightweight variants of YOLOv3 by replacing the original network's standard convolutions with depthwise separable convolutions at different strategic locations within the network, and we evaluate their impact on YOLOv3's size, speed, and accuracy. We also explore variational dropout: a technique that learns an individual and unbounded dropout rate for each neural network weight. Experiments on the PASCAL VOC benchmark dataset show promising results, where variational dropout combined with the most efficient YOLOv3 variant leads to an extremely sparse solution that eliminates 95% of the baseline network's parameters at a relatively small 3% drop in accuracy.
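To illustrate the core building block behind the lightweight variants, the sketch below shows how a depthwise separable convolution can stand in for a standard k x k convolution. This is a minimal, illustrative PyTorch example under assumed design choices (batch normalization and LeakyReLU after the pointwise step, as in typical YOLOv3 blocks); it is not the authors' exact implementation.

```python
# Minimal sketch of a depthwise separable convolution block (assumed layer
# composition; not the authors' exact YOLOv3 variant).
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise step: one k x k filter per input channel (groups = in_channels).
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_channels, bias=False)
        # Pointwise step: 1 x 1 convolution mixes channels to the desired output depth.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


# Parameter count drops from k*k*C_in*C_out to k*k*C_in + C_in*C_out,
# i.e. roughly a factor of 1/C_out + 1/k^2 of the standard convolution.
standard = nn.Conv2d(256, 512, 3, padding=1, bias=False)
separable = DepthwiseSeparableConv(256, 512)
print(sum(p.numel() for p in standard.parameters()))   # 1,179,648
print(sum(p.numel() for p in separable.parameters()))  # 134,400 (133,376 conv + 1,024 BN)
```

Swapping such blocks in for standard convolutions at selected locations is what yields the smaller YOLOv3 variants, with the 256-to-512-channel example above giving roughly a 9x parameter reduction for that layer.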
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chakar, J., Sobbahi, R.A., Tekli, J. (2020). Depthwise Separable Convolutions and Variational Dropout within the context of YOLOv3. In: Bebis, G., et al. (eds.) Advances in Visual Computing. ISVC 2020. Lecture Notes in Computer Science, vol. 12509. Springer, Cham. https://doi.org/10.1007/978-3-030-64556-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64555-7
Online ISBN: 978-3-030-64556-4