Abstract
Given only global image-level annotations, weakly supervised deep convolutional neural networks have shown sufficient capacity for classification and localization but lack the ability to output detections explicitly. In this work, we propose a novel spatial division network that detects bounding boxes using only weak supervision. The core of our model is two differentiable modules, a determination network and a parameterized division, which perform spatial division on the feature maps of classification networks. After training, the learned parameters of the spatial division correspond to a set of predicted bounding box coordinates. To demonstrate the effectiveness of our model for multi-label classification and weakly supervised detection, we conduct extensive experiments on the multi-MNIST dataset. Experimental results show that our spatial division networks can (1) help improve the accuracy of multi-label classification, (2) be implemented end to end using only image-level annotations, and (3) output accurate bounding box coordinates, thereby achieving multi-digit detection.
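To make the idea of a differentiable spatial division concrete, the following is a minimal sketch of one way box parameters could act on a feature map so that gradients reach the coordinates. It is an illustrative assumption, not the formulation used in the paper: the function names `soft_box_mask` and `divide_feature_map`, the sigmoid-edge mask, the normalized coordinate convention, and the `sharpness` parameter are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here to build soft (differentiable) box edges."""
    return 1.0 / (1.0 + np.exp(-z))


def soft_box_mask(height, width, box, sharpness=10.0):
    """Return an (height, width) mask that is close to 1 inside `box` and close to 0 outside.

    `box` is (x_min, y_min, x_max, y_max) in normalized [0, 1] coordinates.
    This sigmoid-edge mask is an assumed stand-in for a parameterized division;
    it is smooth in the box coordinates, so they could be learned by gradient descent.
    """
    x_min, y_min, x_max, y_max = box
    ys = (np.arange(height) + 0.5) / height  # normalized row centres
    xs = (np.arange(width) + 0.5) / width    # normalized column centres
    row = sigmoid(sharpness * (ys - y_min)) * sigmoid(sharpness * (y_max - ys))
    col = sigmoid(sharpness * (xs - x_min)) * sigmoid(sharpness * (x_max - xs))
    return np.outer(row, col)


def divide_feature_map(features, box, sharpness=10.0):
    """Weight a (channels, height, width) feature map by the soft box mask."""
    _, height, width = features.shape
    return features * soft_box_mask(height, width, box, sharpness)[None, :, :]


if __name__ == "__main__":
    features = np.ones((3, 8, 8))
    region = divide_feature_map(features, (0.25, 0.25, 0.75, 0.75), sharpness=50.0)
    print(region[0].round(2))
```

In a sketch like this, a classifier applied to the masked features would receive image-level labels only, and the box coordinates that best support the classification could then be read out as a predicted bounding box.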
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig8_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-020-05257-z/MediaObjects/521_2020_5257_Fig9_HTML.png)
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant 2018YFC0808304, and in part by the National Science Foundation of China under Grant 61976043 and Grant 61573081.
Ethics declarations
Conflict of interest
The authors confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
About this article
Cite this article
Liu, Y., Chen, W., Qu, H. et al. Spatial division networks for weakly supervised detection. Neural Comput & Applic 33, 4965–4978 (2021). https://doi.org/10.1007/s00521-020-05257-z
DOI: https://doi.org/10.1007/s00521-020-05257-z