Abstract:
We propose NfS-SegNet, a fast and lightweight end-to-end convolutional network architecture for real-time segmentation of high-resolution videos, which can segment 2K videos at 36.5 FPS with 24.3 GFLOPS. This speed and computational efficiency stem from three design choices: 1) The encoder network, NfS-Net, is optimized for speed using simple building blocks that avoid memory-heavy operations such as depthwise convolutions, and it outperforms state-of-the-art lightweight CNN architectures such as SqueezeNet [2], MobileNet v1 [3] & v2 [4] and ShuffleNet v1 [5] & v2 [6] on image classification while running significantly faster. 2) NfS-SegNet has an asymmetric architecture with a deep encoder and a shallow decoder, a design based on our empirical finding that the decoder is the main computational bottleneck while contributing relatively little to the final performance. 3) Our novel uncertainty-aware knowledge distillation method guides the teacher model to focus its knowledge transfer on the most difficult image regions. We validate NfS-SegNet on the Cityscapes [1] benchmark, where it achieves state-of-the-art performance among lightweight segmentation models in terms of both accuracy and speed.
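The abstract does not say how uncertainty is measured or how it reweights the distillation loss, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (hypothetically) that per-pixel uncertainty is the teacher's predictive entropy normalized by log(C), and that this value weights a per-pixel KL divergence between temperature-softened teacher and student class distributions. Pixels where the teacher is unsure ("difficult" regions) therefore dominate the loss. All function names are illustrative.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy divided by log(C), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def uncertainty_weighted_kd_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    temperature: float = 2.0,
    eps: float = 1e-8,
) -> float:
    """Hypothetical per-pixel KD loss weighted by teacher uncertainty.

    Each element of teacher_logits / student_logits holds one pixel's class logits.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_t = softmax(t_logits, temperature)
        p_s = softmax(s_logits, temperature)
        w = normalized_entropy(p_t)  # assumption: high teacher entropy = difficult pixel
        weighted_sum += w * kl_divergence(p_t, p_s)
        weight_total += w
    # T^2 scaling (as in standard distillation) keeps gradient scale comparable across temperatures.
    return (temperature ** 2) * weighted_sum / (weight_total + eps)

# Usage: pixel 0 is easy for the teacher (confident), pixel 1 is hard (near-uniform).
teacher = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
student = [[0.0, 4.0, 0.0], [0.0, 0.1, 0.0]]
print(uncertainty_weighted_kd_loss(teacher, student))
```

Normalizing by the total weight keeps the loss scale independent of how many pixels are uncertain. A real implementation would operate on batched tensors (for example, N×C×H×W logits) rather than Python lists.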
Date of Conference: 31 May 2020 - 31 August 2020
Date Added to IEEE Xplore: 15 September 2020