
DeepFlux for Skeleton Detection in the Wild

Published in: International Journal of Computer Vision

Abstract

The medial axis, or skeleton, is a fundamental object representation that has been extensively used in shape recognition. Yet, its extension to natural images has been challenging due to the large appearance and scale variations of objects and complex background clutter that appear in this setting. In contrast to recent methods that address skeleton extraction as a binary pixel classification problem, in this article we present an alternative formulation for skeleton detection. We follow the spirit of flux-based algorithms for medial axis recovery by training a convolutional neural network to predict a two-dimensional vector field encoding the flux representation. The skeleton is then recovered from the flux representation, which captures the position of skeletal pixels relative to semantically meaningful entities (e.g., image points in spatial context, and hence the implied object boundaries), resulting in precise skeleton detection. Moreover, since the flux representation is a region-based vector field, it is better able to cope with object parts of large width. We evaluate the proposed method, termed DeepFlux, on six benchmark datasets, consistently achieving superior performance over state-of-the-art methods. Finally, we demonstrate an application of DeepFlux, augmented with a skeleton scale estimation module, to detect objects in aerial images. This combination yields results that are competitive with models trained specifically for object detection, showcasing the versatility and effectiveness of mid-level representations in high-level tasks. An implementation of our method is available at https://github.com/YukangWang/DeepFlux.
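To make the flux representation concrete, the sketch below (an illustrative construction, not the authors' released implementation; the function name `flux_field` and the `context_radius` parameter are ours) builds a ground-truth flux field from a binary skeleton mask: every pixel inside a narrow context band around the skeleton stores the two-dimensional unit vector pointing toward its nearest skeletal pixel, and all other pixels store the zero vector.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def flux_field(skeleton, context_radius=5.0):
    """Unit vectors toward the nearest skeletal pixel, restricted to a
    context band of width `context_radius`; zero elsewhere."""
    # Distance from every pixel to the nearest skeleton pixel, plus the
    # coordinates (iy, ix) of that nearest skeleton pixel.
    dist, (iy, ix) = distance_transform_edt(
        ~skeleton.astype(bool), return_indices=True)
    ys, xs = np.indices(skeleton.shape)
    vy, vx = iy - ys, ix - xs                  # vectors pointing at the skeleton
    norm = np.maximum(np.hypot(vy, vx), 1e-9)  # avoid division by zero on the skeleton
    field = np.stack([vy / norm, vx / norm])   # unit vectors, shape (2, H, W)
    field[:, dist > context_radius] = 0.0      # keep only the context band
    return field

# Toy example: a single horizontal skeleton segment.
skel = np.zeros((7, 7), dtype=np.uint8)
skel[3, 1:6] = 1
f = flux_field(skel, context_radius=2.0)
```

A network trained to regress such a field implicitly localizes the skeleton: skeletal pixels are those where the surrounding vectors converge, which is how the detection step can recover them from the predicted flux.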


Notes

  1. In fact, in the context of skeletonization of binary objects (Siddiqi and Pizer 2008), this flux vector would be in the direction opposite to that of the spoke vector from a skeletal pixel to its associated boundary pixel.

References

  • Ahn, J., Cho, S., & Kwak, S. (2019). Weakly supervised learning of instance segmentation with inter-pixel relations. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2209–2218).

  • Bai, M., & Urtasun, R. (2017). Deep watershed transform for instance segmentation. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2858–2866).

  • Bai, X., Wang, X., Latecki, L. J., Liu, W., & Tu, Z. (2009). Active skeleton for non-rigid object detection. In Proceedings of IEEE international conference on computer vision (pp. 575–582).

  • Blum, H. (1973). Biological shape and visual science (part i). Journal of Theoretical Biology, 38(2), 205–287.


  • Borenstein, E., & Ullman, S. (2002). Class-specific, top-down segmentation. In Proceedings of European conference on computer vision (pp. 109–122).

  • Chen, L. C., Hermans, A., Papandreou, G., Schroff, F., Wang, P., & Adam, H. (2018). Masklab: Instance segmentation by refining object detection with semantic and direction features. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 4013–4022).

  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2018). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834–848.


  • Chen, X., Fang, H., Lin, T. Y., Vedantam, R., Gupta, S., Dollár, P., & Zitnick, C. L. (2015). Microsoft coco captions: Data collection and evaluation server. CoRR abs/1504.00325.

  • Ci, H., Wang, C., & Wang, Y. (2018). Video object segmentation by learning location-sensitive embeddings. In Proceedings of European conference on computer vision (pp. 501–516).

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Li, F. F. (2009). Imagenet: A large-scale hierarchical image database. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 248–255).

  • Dickinson, S. J. (2009). Object categorization: Computer and human vision perspectives. Cambridge: Cambridge University Press.


  • Dimitrov, P., Damon, J. N., & Siddiqi, K. (2013). Flux invariants for shape. In Proceedings of IEEE international conference on computer vision and pattern recognition.

  • Ding, J., Xue, N., Long, Y., Xia, G. S., & Lu, Q. (2019). Learning RoI transformer for oriented object detection in aerial images. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2849–2858).

  • Dollár, P., & Zitnick, C. L. (2015). Fast edge detection using structured forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), 1558–1570.


  • Dufresne-Camaro, C. O., Rezanejad, M., Tsogkas, S., Siddiqi, K., & Dickinson, S. (2020). Appearance shock grammar for fast medial axis extraction from real images. In Proceedings of IEEE international conference on computer vision and pattern recognition.

  • Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2), 303–338.


  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.


  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2012). Distance transforms of sampled functions. Theory of Computing, 8(1), 415–428.


  • Girshick, R., Shotton, J., Kohli, P., Criminisi, A., & Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In Proceedings of IEEE international conference on computer vision (pp. 415–422).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 770–778).

  • Jang, J. H., & Hong, K. S. (2001). A pseudo-distance map for the segmentation-free skeletonization of gray-scale images. In Proceedings of IEEE international conference on computer vision (vol. 2, pp. 18–23).

  • Jerripothula, K. R., Cai, J., Lu, J., & Yuan, J. (2017). Object co-skeletonization with co-segmentation. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 3881–3889).

  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of ACM multimedia (pp. 675–678).

  • Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., & Luo, Z. (2017). R2CNN: Rotational region CNN for orientation robust scene text detection. Preprint arXiv:1706.09579.

  • Ke, W., Chen, J., Jiao, J., Zhao, G., & Ye, Q. (2017) SRN: Side-output residual network for object symmetry detection in the wild. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 302–310).

  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of international conference on learning representations.

  • Kreiss, S., Bertoni, L., & Alahi, A. (2019) PifPaf: Composite fields for human pose estimation. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 11977–11986).

  • Levinshtein, A., Sminchisescu, C., & Dickinson, S. (2013). Multiscale symmetric part detection and grouping. International Journal of Computer Vision, 104(2), 117–134.


  • Lindeberg, T. (1998). Edge detection and ridge detection with automatic scale selection. International Journal of Computer Vision, 30(2), 117–156.


  • Lindeberg, T. (2013). Scale selection properties of generalized scale-space interest point detectors. Journal of Mathematical Imaging and Vision, 46(2), 177–210.


  • Liu, C., Ke, W., Qin, F., & Ye, Q. (2018). Linear span network for object skeleton detection. In Proceedings of European conference on computer vision (pp. 136–151).

  • Liu, T. L., Geiger, D., & Yuille, A. L. (1998). Segmenting by seeking the symmetry axis. In Proceedings of international conference on pattern recognition (vol. 2, pp. 994–998).

  • Liu, X., Lyu, P., Bai, X., & Cheng, M. M. (2017). Fusing image and segmentation cues for skeleton extraction in the wild. In Proceedings of ICCV workshop on detecting symmetry in the wild (vol. 6, p. 8).

  • Liu, Y., Cheng, M. M., Hu, X., Wang, K., & Bai, X. (2017). Richer convolutional features for edge detection. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 5872–5881).

  • Long, J., Shelhamer, E., & Darrell, T. (2015) Fully convolutional networks for semantic segmentation. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 3431–3440).

  • Luo, W., Li, Y., Urtasun, R., & Zemel, R. (2016). Understanding the effective receptive field in deep convolutional neural networks. In Proceedings of advances in neural information processing systems (pp. 4898–4906).

  • Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., et al. (2018). Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 20(11), 3111–3122.


  • Maninis, K. K., Pont-Tuset, J., Arbeláez, P., & Van Gool, L. (2018). Convolutional oriented boundaries: From image segmentation to high-level tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 819–833.


  • Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London B: Biological Sciences, 200(1140), 269–294.


  • Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of IEEE international conference on computer vision (vol. 2, pp. 416–423).

  • Martin, D. R., Fowlkes, C. C., & Malik, J. (2004). Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5), 530–549.


  • Máttyus, G., Luo, W., & Urtasun, R. (2017). Deeproadmapper: Extracting road topology from aerial images. In Proceedings of the IEEE international conference on computer vision.

  • Mattyus, G., Wang, S., Fidler, S., & Urtasun, R. (2015). Enhancing road maps by parsing aerial images around the world. In Proceedings of the IEEE international conference on computer vision (pp. 1689–1697).

  • Nedzved, A., Ablameyko, S., & Uchida, S. (2006). Gray-scale thinning by using a pseudo-distance map. In Proceedings of IEEE international conference on pattern recognition.

  • Peng, S., Liu, Y., Huang, Q., Zhou, X., & Bao, H. (2019). PVNet: Pixel-wise voting network for 6dof pose estimation. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 4561–4570).

  • Ren, Z., Yuan, J., Meng, J., & Zhang, Z. (2013). Robust part-based hand gesture recognition using kinect sensor. IEEE Transactions on Multimedia, 15(5), 1110–1120.


  • Shen, W., Bai, X., Hu, R., Wang, H., & Latecki, L. J. (2011). Skeleton growing and pruning with bending potential ratio. Pattern Recognition, 44(2), 196–209.


  • Shen, W., Bai, X., Hu, Z., & Zhang, Z. (2016). Multiple instance subspace learning via partial random projection tree for local reflection symmetry in natural images. Pattern Recognition, 52, 306–316.


  • Shen, W., Zhao, K., Jiang, Y., Wang, Y., Bai, X., & Yuille, A. (2017). Deepskeleton: Learning multi-task scale-associated deep side outputs for object skeleton extraction in natural images. IEEE Transactions on Image Processing, 26(11), 5298–5311.


  • Shen, W., Zhao, K., Jiang, Y., Wang, Y., Zhang, Z., & Bai, X. (2016). Object skeleton extraction in natural images by fusing scale-associated deep side outputs. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 222–230).

  • Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011) Real-time human pose recognition in parts from single depth images. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 1297–1304).

  • Siddiqi, K., Bouix, S., Tannenbaum, A., & Zucker, S. W. (2002). Hamilton-jacobi skeletons. International Journal of Computer Vision, 48(3), 215–231.


  • Siddiqi, K., & Pizer, S. M. (2008). Medial representations: Mathematics, algorithms and applications. Berlin: Springer.


  • Siddiqi, K., Shokoufandeh, A., Dickinson, S. J., & Zucker, S. W. (1999). Shock graphs and shape matching. International Journal of Computer Vision, 35(1), 13–32.


  • Sie Ho Lee, T., Fidler, S., & Dickinson, S. (2013). Detecting curved symmetric parts using a deformable disc model. In Proceedings of IEEE international conference on computer vision (pp. 1753–1760).

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings of international conference on learning representations.

  • Sironi, A., Lepetit, V., & Fua, P. (2014). Multiscale centerline detection by learning a scale-space distance transform. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2697–2704).

  • Trinh, N. H., & Kimia, B. B. (2011). Skeleton search: Category-specific object recognition and segmentation using a skeletal shape model. International Journal of Computer Vision, 94(2), 215–240.


  • Tsogkas, S., & Dickinson, S. (2017) AMAT: Medial axis transform for natural images. In Proceedings of IEEE international conference on computer vision (pp. 2727–2736).

  • Tsogkas, S., & Kokkinos, I. (2012). Learning-based symmetry detection in natural images. In Proceedings of European conference on computer vision (pp. 41–54).

  • Wang, Y., Xu, Y., Tsogkas, S., Bai, X., Dickinson, S., & Siddiqi, K. (2019). Deepflux for skeletons in the wild. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 5287–5296).

  • Wei, S. E., Ramakrishna, V., Kanade, T., & Sheikh, Y. (2016). Convolutional pose machines. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 4724–4732).

  • Xia, G., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., et al. (2017). AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing, 55(7), 3965–3981.


  • Xia, G. S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., & Zhang, L. (2018) DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 3974–3983).

  • Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Proceedings of IEEE international conference on computer vision (pp. 1395–1403).

  • Xu, W., Parmar, G., & Tu, Z. (2019). Geometry-aware end-to-end skeleton detection. In British Machine Vision Conference.

  • Xu, Y., Wang, Y., Zhou, W., Wang, Y., Yang, Z., & Bai, X. (2019). Textfield: Learning a deep direction field for irregular scene text detection. IEEE Transactions on Image Processing, 28(11), 5566–5579.


  • Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., et al. (2018). Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sensing, 10(1), 132.

  • Yu, Z., & Bajaj, C. (2004). A segmentation-free approach for skeletonization of gray-scale images via anisotropic vector diffusion. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 415–420).

  • Zhang, Q., & Couloigner, I. (2007). Accurate centerline detection and line width estimation of thick lines using the radon transform. IEEE Transactions on Image Processing, 16(2), 310–316.


  • Zhang, Z., Shen, W., Yao, C., & Bai, X. (2015). Symmetry-based text line detection in natural scenes. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2558–2567).

  • Zhao, K., Shen, W., Gao, S., Li, D., & Cheng, M. M. (2018). Hi-fi: Hierarchical feature integration for skeleton detection. In Proceedings of international joint conference on artificial intelligence (pp. 1191–1197).

  • Zhu, S. C., & Yuille, A. L. (1996). Forms: A flexible object recognition and modelling system. International Journal of Computer Vision, 20(3), 187–212.


  • Zucker, S. W. (2012). Local field potentials and border ownership: A conjecture about computation in visual cortex. Journal of Physiology-Paris, 106, 297–315.



Acknowledgements

This work was supported in part by NSFC 61936003 and 61703171, and the Major Project for New Generation of AI under Grant No. 2018AAA0100400. Yongchao Xu was supported by the Young Elite Scientists Sponsorship Program by CAST. The work of Xiang Bai was supported by the National Program for Support of Top-Notch Young Professionals and in part by the Program for HUST Academic Frontier Youth Team. Sven Dickinson and Kaleem Siddiqi would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC) for research funding.

Author information


Corresponding author

Correspondence to Xiang Bai.

Additional information

Communicated by Christoph H. Lampert.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Disclaimer: Sven Dickinson and Stavros Tsogkas contributed to this article in their personal capacity as Professor and Adjunct Professor, respectively, at the University of Toronto. The views expressed (or the conclusions reached) are their own and do not necessarily represent the views of Samsung Research America, Inc.


About this article


Cite this article

Xu, Y., Wang, Y., Tsogkas, S. et al. DeepFlux for Skeleton Detection in the Wild. Int J Comput Vis 129, 1323–1339 (2021). https://doi.org/10.1007/s11263-021-01430-6
