Abstract
Recent advances in deep convolutional neural networks have led to significant success in many computer vision tasks, including edge detection. However, the existing edge detectors neglected the structural relationships among pixels, especially those among contour pixels. Inspired by human perception, this work points out the importance of learning structural relationships and proposes a novel real-time attention edge detection (AED) framework. Firstly, an elaborately designed attention mask is employed to capture the structural relationships among pixels at edges. Secondly, in the decoding phase of our encoder–decoder model, a new module called dense upsampling group convolution is designed to tackle the problem of information loss due to stride downsampling. And then, the detailed structural information can be preserved even it is ever destroyed in the encoding phase. The proposed relationship learning module introduces negligible computation overhead, and as a result, the proposed AED meets the requirement of real-time execution with only 0.65M parameters. With the proposed model, an optimal dataset scale F-score of 79.5 is obtained on the BSDS500 dataset with an inference speed of 105 frames per second, which is significantly faster than existing methods with comparable accuracy. In addition, a state-of-the-art performance is achieved on the BSDS500 (81.6) and NYU Depth (77.0) datasets when using a heavier model.
Similar content being viewed by others
References
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 6, 679–698 (1986)
Zhang, Y., Chu, J., Leng, L., Miao, J.: Mask-refined R-CNN: a network for refining object details in instance segmentation. Sensors 20(4), 1010 (2020)
Tu, Z., Xie, W., Cao, J., Van Gemeren, C., Poppe, R., Veltkamp, R.C.: Variational method for joint optical flow estimation and edge-aware image restoration. Pattern Recognit. 65, 11–25 (2017)
Dollár, P., Zitnick, C.L.: Fast edge detection using structured forests. IEEE Trans. Pattern Anal. Mach. Intell. 37(8), 1558–1570 (2015)
Xie, S., Tu, Z.: Holistically-nested edge detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1395–1403 (2015)
Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 898–916 (2011)
Liu, Y., Cheng, M.M., Hu, X., Wang, K., Bai, X.: Richer convolutional features for edge detection. In: Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 5872–5881. IEEE (2017)
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., Tang, X.: Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2017)
Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., Cottrell, G.: Understanding convolution for semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1451–1460. IEEE (2018)
Leng, L., Zhang, J., Xu, J., Khan, M.K., Alghathbar, K.: Dynamic weighted discrimination power analysis in DCT domain for face and palmprint recognition. In: 2010 International Conference on Information and Communication Technology Convergence (ICTC), pp. 467–471. IEEE (2010)
Deng, R., Shen, C., Liu, S., Wang, H., Liu, X.: Learning to predict crisp boundaries. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 562–578 (2018)
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Sobel, I.: Camera Models and Machine Perception. Technical report, Computer Science Department, Technion (1972)
Zhang, W., Zhao, Y., Breckon, T.P., Chen, L.: Noise robust image edge detection based upon the automatic anisotropic Gaussian kernels. Pattern Recognit. 63, 193–205 (2017)
Wang, F.P., Shui, P.L.: Noise-robust color edge detector using gradient matrix and anisotropic Gaussian directional derivative matrix. Pattern Recognit. 52, 346–357 (2016)
Martin, D.R., Fowlkes, C.C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 530–549 (2004)
Lim, J.J., Zitnick, C.L., Dollár, P.: Sketch tokens: a learned mid-level representation for contour and object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3158–3165 (2013)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Wang, Y., Zhao, X., Huang, K.: Deep crisp boundaries. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3892–3900 (2017)
Zhang, P., Liu, W., Wang, H., Lei, Y., Lu, H.: Deep gated attention networks for large-scale street-level scene segmentation. Pattern Recognit. 88, 702–714 (2019)
Xie, S., Hu, H., Wu, Y.: Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition. Pattern Recognit. 92, 177–191 (2019)
Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFS with Gaussian edge potentials. In: Advances in neural information processing systems, pp. 109–117 (2011)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision, pp. 483–499. Springer, Berlin (2016)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Ma, N., Zhang, X., Zheng, H.T., Sun, J.: Shufflenet v2: Practical guidelines for efficient cnn architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131 (2018)
Pinheiro, P.O., Lin, T.Y., Collobert, R., Dollár, P.: Learning to refine object segments. In: European Conference on Computer Vision, pp. 75–91. Springer, Berlin (2016)
Chu, J., Guo, Z., Leng, L.: Object detection based on multi-layer convolution feature fusion and online hard example mining. IEEE Access 6, 19959–19967 (2018)
Leng, L., Li, M., Kim, C., Bi, X.: Dual-source discrimination power analysis for multi-instance contactless palmprint recognition. Multimed. Tools Appl. 76(1), 333–354 (2017)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: European Conference on Computer Vision, pp. 746–760. Springer, Berlin (2012)
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga,L., et al.: Pytorch: An imperative style, high-performancedeep learning library. In: Advances in neural informationprocessing systems, pp. 8026–8037 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Mottaghi, R., Chen, X., Liu, X., Cho, N.G., Lee, S.W., Fidler, S., Urtasun, R., Yuille, A.: The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 891–898 (2014)
Hallman, S., Fowlkes, C.C.: Oriented edge forests for boundary detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1732–1740 (2015)
Shen, W., Wang, X., Wang, Y., Bai, X., Zhang, Z.: Deepcontour: a deep convolutional feature learned by positive-sharing loss for contour detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3982–3991 (2015)
Bertasius, G., Shi, J., Torresani, L.: Deepedge: a multi-scale bifurcated deep network for top-down contour detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4380–4389 (2015)
Bertasius, G., Shi, J., Torresani, L.: High-for-low and low-for-high: efficient boundary detection from deep object features and its applications to high-level vision. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 504–512 (2015)
Yang, J., Price, B., Cohen, S., Lee, H., Yang, M.H.: Object contour detection with a fully convolutional encoder–decoder network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 193–202 (2016)
Gupta, S., Girshick, R., Arbeláez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: European Conference on Computer Vision, pp. 345–360. Springer, Berlin (2014)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhang, R., You, M. Fast contour detection with supervised attention learning. J Real-Time Image Proc 18, 647–657 (2021). https://doi.org/10.1007/s11554-020-00980-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11554-020-00980-1