DOI: 10.1145/3378936.3378941
research-article

Efficient Semantic Segmentation through Dense Upscaling Convolutions

Published: 07 March 2020

Abstract

Semantic segmentation is the classification of each pixel in an image as belonging to an object; the resulting pixel map is widely used in many fields. Fields in which this technology is being actively researched include medicine, agriculture and robotics. For uses where resources or power are restricted, such as robotics, or where large numbers of images must be processed, efficiency can be key to the feasibility of a technique. Other applications that require real-time processing need fast and efficient methods, especially where collision avoidance or safety may be involved. We combine existing semantic segmentation methods and improve efficiency by replacing the decoder network in ERFNet with a method based upon Dense Upscaling Convolutions. We then add a novel layer that allows fine-tuning of the decoder channel depth and therefore of the efficiency of the network. Our proposed modification achieves a 20-30% improvement in efficiency on moderate hardware (Nvidia GTX 960) over the original ERFNet and an additional 10% efficiency over the original Dense Upscaling Convolution. We perform a series of experiments to determine viable hyperparameters for the modification and measure efficiency and accuracy over a range of image sizes, demonstrating the viability of our approach.
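
To illustrate the general idea behind a Dense Upscaling Convolution decoder, the following is a minimal NumPy sketch and not the network evaluated in the paper. It assumes the commonly described form of the technique: a convolution at low resolution produces num_classes * r * r channels, which are then rearranged into a map that is r times larger in each spatial dimension. The function names, the 1x1 convolution (used here for brevity), and the tensor sizes are all illustrative assumptions; the channel-depth tuning layer proposed in the paper is not modelled.

```python
import numpy as np

def depth_to_space(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # Split the channel axis into (class, row offset, column offset).
    x = x.reshape(c, r, r, h, w)
    # Interleave the offsets with the spatial axes: (c, h, r, w, r).
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

def dense_upscale(features, weights, r):
    """Apply a 1x1 convolution, then rearrange channels into space.

    features: (C_in, H, W) low-resolution encoder output (assumed layout)
    weights:  (num_classes*r*r, C_in) hypothetical 1x1 kernel
    """
    logits = np.einsum("oc,chw->ohw", weights, features)
    return depth_to_space(logits, r)

# Illustrative sizes only.
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 4, 8))
num_classes, r = 3, 2
weights = rng.standard_normal((num_classes * r * r, 16))

out = dense_upscale(features, weights, r)
print(out.shape)  # (3, 8, 16)
```

All learned computation in this sketch happens at the low resolution, and the upscaling step itself is only a reshape and transpose, which is the usual efficiency argument for this family of decoders.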

    Published In

    ICSIM '20: Proceedings of the 3rd International Conference on Software Engineering and Information Management
    January 2020
    258 pages
    ISBN:9781450376907
    DOI:10.1145/3378936
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • University of Science and Technology of China

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CNN
    2. Classification
    3. Computer Vision
    4. Deep Learning
    5. Efficiency
    6. Image Processing
    7. Semantic Segmentation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICSIM '20
