Efficient Semantic Segmentation through Dense Upscaling Convolutions

Authors:

Kurt Schoenhoff,

Jason Holdsworth,

Ickjai Lee

ICSIM '20: Proceedings of the 3rd International Conference on Software Engineering and Information Management

Pages 244 - 248

https://doi.org/10.1145/3378936.3378941

Published: 07 March 2020

Abstract

Semantic segmentation is the classification of each pixel in an image as belonging to an object class; the resulting pixel map is widely used in many fields. The technology is being actively researched in medicine, agriculture and robotics, among others. Where resources or power are restricted, as in robotics, or where large numbers of images must be processed, efficiency can be key to the feasibility of a technique. Applications that require real-time processing also need fast and efficient methods, especially where collision avoidance or safety may be involved. We combine existing semantic segmentation methods and improve efficiency by replacing the decoder network in ERFNet with a method based on Dense Upscaling Convolutions. We then add a novel layer that allows fine-tuning of the decoder channel depth, and therefore of the efficiency of the network. Our proposed modification achieves a 20-30% improvement in efficiency on moderate hardware (Nvidia GTX 960) over the original ERFNet, and an additional 10% efficiency over the original Dense Upscaling Convolution. We perform a series of experiments to determine viable hyperparameters for the modification and measure efficiency and accuracy over a range of image sizes, demonstrating the viability of our approach.
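As a rough illustration of the upscaling idea the abstract refers to, the sketch below shows the channel-to-space rearrangement that Dense Upscaling Convolution relies on: a convolution emits `num_classes * r * r` low-resolution channels, which are then rearranged into `num_classes` maps that are `r` times larger in each dimension. This is a minimal pure-Python sketch, not the authors' implementation. The function names `duc_channels` and `pixel_shuffle` are hypothetical, the channel ordering follows the common depth-to-space convention, and the figures of 19 classes and an upscaling factor of 8 are assumptions chosen for illustration rather than values stated in the abstract. The novel channel-depth layer is not modelled here.

```python
def duc_channels(num_classes, r):
    """Channels a DUC-style layer must emit for an upscaling factor r."""
    return num_classes * r * r


def pixel_shuffle(feature, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Assumed ordering (common depth-to-space convention): input channel
    c*r*r + dy*r + dx supplies output pixel (y*r + dy, x*r + dx) of
    output channel c.
    """
    in_channels = len(feature)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = in_channels // (r * r)
    height, width = len(feature[0]), len(feature[0][0])
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for dy in range(r):
            for dx in range(r):
                src = feature[c * r * r + dy * r + dx]
                for y in range(height):
                    for x in range(width):
                        out[c][y * r + dy][x * r + dx] = src[y][x]
    return out


# Four 1x2 channels with r=2 become one 2x4 map.
low_res = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
high_res = pixel_shuffle(low_res, 2)
print(high_res)             # [[[1, 3, 2, 4], [5, 7, 6, 8]]]
print(duc_channels(19, 8))  # 1216 (illustrative values only)
```

Because the rearrangement itself has no learned parameters, the cost of such a decoder is dominated by the convolution that produces the `num_classes * r * r` channels, which is why the depth of the channels feeding it is a natural place to trade accuracy for speed.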

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Scene understanding

Published In

ICSIM '20: Proceedings of the 3rd International Conference on Software Engineering and Information Management

January 2020

258 pages

ISBN: 9781450376907

DOI: 10.1145/3378936

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

University of Science and Technology of China

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICSIM '20

ICSIM '20: The 3rd International Conference on Software Engineering and Information Management

January 12 - 15, 2020

Sydney, NSW, Australia
