Leveraging Prior-Knowledge for Weakly Supervised Object Detection Under a Collaborative Self-Paced Curriculum Learning Framework

International Journal of Computer Vision

Abstract

Weakly supervised object detection is an interesting yet challenging research topic in the computer vision community. It aims to learn object models that localize and detect the objects of interest using only image-level annotations as supervision. To address this problem, this paper establishes a novel weakly supervised learning framework that leverages both instance-level and image-level prior-knowledge through a novel collaborative self-paced curriculum learning (C-SPCL) regime. Under weak supervision, C-SPCL leverages helpful prior-knowledge throughout the whole learning process and robustly combines instance-level confidence inference with image-level confidence inference. Comprehensive experiments on benchmark datasets demonstrate the superior capacity of the proposed C-SPCL regime and of the whole framework compared with state-of-the-art methods along this research line.


Notes

  1. The instance-level confidence values finally inferred by the proposed approach are real numbers ranging from 0 to 1, because prior-knowledge terms are also involved in the optimization procedure.

  2. We used the images labelled as containing the airplane, bus, cat, dog, and train categories from the training set of COCO (Lin et al. 2014) to form the sub training set of COCO used in our experiments (13,034 images in total). Similarly, we used the images labelled as containing the same five categories from the validation set of COCO (Lin et al. 2014) to form the sub validation set of COCO (6,309 images in total).

  3. We set \(\lambda \) equal to the loss value \(\ell (y_{i,c}^{(k)},f(\mathbf{x}_{i}^{(k)}; \mathbf{w}_c,b_c))\) of the instance at the 30% position when the instances are ranked by loss from low to high. Because the loss values and the numbers of instances differ across object categories, the concrete \(\lambda \) values also differ across categories. Thus, we use \(\lambda _c\) in this paper.

  4. The results of Bilen and Vedaldi (2016) in Tables 4, 6 and 7 are obtained with our implementation.
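
As a minimal sketch of the per-category threshold selection described in Note 3, the following Python snippet ranks the instance losses of each category from low to high and takes the loss at the 30% position as \(\lambda _c\). The function name, the dictionary layout, the example loss values, and the choice to round the rank up are assumptions made for illustration; the note does not specify how fractional ranks are handled.

```python
def category_thresholds(losses_by_category, percent=30):
    """Return one threshold (lambda_c) per category.

    losses_by_category: dict mapping a category name to a list of
    per-instance loss values (hypothetical layout for this sketch).
    percent: position in the low-to-high ranking, as an integer percentage.
    """
    thresholds = {}
    for category, losses in losses_by_category.items():
        ranked = sorted(losses)  # rank losses from low to high
        # 1-based rank of the instance at the requested position,
        # rounded up with integer arithmetic (assumed rounding rule).
        rank = max(-(-percent * len(ranked) // 100), 1)
        thresholds[category] = ranked[rank - 1]
    return thresholds


# Made-up loss values, only to show that each category gets its own threshold.
example_losses = {
    "cat": [0.9, 0.1, 0.4, 0.7, 0.2, 0.3, 0.8, 0.5, 0.6, 1.0],
    "bus": [2.0, 0.5, 1.5, 1.0],
}
lambdas = category_thresholds(example_losses)
print(lambdas)
```

Because the two categories have different numbers of instances and different loss ranges, the resulting thresholds differ, which is why the note writes \(\lambda _c\) rather than a single \(\lambda \).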

References

  • Alexe, B., Deselaers, T., & Ferrari, V. (2010). What is an object? In CVPR.

  • Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In ICML.

  • Bilen, H., Pedersoli, M., & Tuytelaars, T. (2014). Weakly supervised object detection with posterior regularization. In BMVC.

  • Bilen, H., Pedersoli, M., & Tuytelaars, T. (2015). Weakly supervised object detection with convex clustering. In CVPR.

  • Bilen, H., & Vedaldi, A. (2016). Weakly supervised deep detection networks. In CVPR.

  • Chen, X., & Gupta, A. (2015). Webly supervised learning of convolutional networks. In ICCV.

  • Cinbis, R. G., Verbeek, J., & Schmid, C. (2017). Weakly supervised object localization with multi-fold multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1), 189–203.

  • Deselaers, T., Alexe, B., & Ferrari, V. (2010). Localizing objects while learning their appearance. In ECCV.

  • Deselaers, T., Alexe, B., & Ferrari, V. (2012). Weakly supervised localization and learning with generic knowledge. International Journal of Computer Vision, 100(3), 275–293.

  • Diba, A., Sharma, V., Pazandeh, A., Pirsiavash, H., & Van Gool, L. (2017). Weakly supervised cascaded convolutional networks. In CVPR.

  • Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Everingham, M., Zisserman, A., Williams, C. K., Van Gool, L., Allan, M., Bishop, C. M., Chapelle, O., Dalal, N., Deselaers, T., Dorkó, G., et al. (2007). The PASCAL visual object classes challenge 2007 (VOC2007) results.

  • Girshick, R. (2015). Fast R-CNN. In ICCV.

  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR.

  • Cinbis, R. G., Verbeek, J., & Schmid, C. (2014). Multi-fold MIL training for weakly supervised object localization. In CVPR.

  • Han, J., Quan, R., Zhang, D., & Nie, F. (2018a). Robust object co-segmentation using background prior. IEEE Transactions on Image Processing, 27(4), 1639–1651.

  • Han, J., Zhang, D., Cheng, G., Liu, N., & Xu, D. (2018b). Advanced deep-learning techniques for salient and category-specific object detection: A survey. IEEE Signal Processing Magazine, 35(1), 84–100.

  • Han, L., Zhang, D., Huang, D., Chang, X., Ren, J., Luo, S., & Han, J. (2017). Self-paced mixture of regressions. In IJCAI.

  • Jiang, L., Meng, D., Mitamura, T., & Hauptmann, A. G. (2014a). Easy samples first: Self-paced reranking for zero-example multimedia search. In ACM-MM.

  • Jiang, L., Meng, D., Yu, S.-I., Lan, Z., Shan, S., & Hauptmann, A. (2014b). Self-paced learning with diversity. In NIPS.

  • Jiang, L., Meng, D., Zhao, Q., Shan, S., & Hauptmann, A. G. (2015). Self-paced curriculum learning. In AAAI.

  • Jie, Z., Wei, Y., Jin, X., Feng, J., & Liu, W. (2017). Deep self-taught learning for weakly supervised object localization. In CVPR.

  • Kantorov, V., Oquab, M., Cho, M., & Laptev, I. (2016). ContextLocNet: Context-aware deep network models for weakly supervised localization. In ECCV.

  • Khan, F., Mutlu, B., & Zhu, X. (2011). How do humans teach: On curriculum learning and teaching dimension. In NIPS.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.

  • Kumar, M. P., Packer, B., & Koller, D. (2010). Self-paced learning for latent variable models. In NIPS.

  • Singh, K. K., Xiao, F., & Lee, Y. J. (2016). Track and transfer: Watching videos to simulate strong human supervision for weakly-supervised object detection. In CVPR.

  • Li, D., Huang, J.-B., Li, Y., Wang, S., & Yang, M.-H. (2016). Weakly supervised object localization with progressive domain adaptation. In CVPR.

  • Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., & Dollár, P. (2014). Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312.

  • Meng, D., Zhao, Q., & Jiang, L. (2017). Theoretical understanding of self-paced learning. Information Sciences, 414, 319–328.

  • Pandey, M., & Lazebnik, S. (2011). Scene recognition and weakly supervised object localization with deformable part-based models. In ICCV.

  • Ren, W., Huang, K., Tao, D., & Tan, T. (2016). Weakly supervised large scale object localization with multiple instance learning and bag splitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 405–416.

  • Russakovsky, O., Lin, Y., Yu, K., & Fei-Fei, L. (2012). Object-centric spatial pooling for image classification. In ECCV.

  • Shi, M., & Ferrari, V. (2016). Weakly supervised object localization using size estimates. In ECCV.

  • Shi, Z., Hospedales, T. M., & Xiang, T. (2015). Bayesian joint modelling for object localisation in weakly labelled images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10), 1959–1972.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • Siva, P., Russell, C., & Xiang, T. (2012). In defence of negative mining for annotating weakly labelled data. In ECCV.

  • Siva, P., Russell, C., Xiang, T., & Agapito, L. (2013). Looking beyond the image: Unsupervised learning for object saliency and detection. In CVPR.

  • Siva, P., & Xiang, T. (2011). Weakly supervised object detector learning with model drift detection. In ICCV.

  • Song, H. O., Girshick, R., Jegelka, S., Mairal, J., Harchaoui, Z., & Darrell, T. (2014a). On learning to localize objects with minimal supervision. arXiv preprint arXiv:1403.1024.

  • Song, H. O., Lee, Y. J., Jegelka, S., & Darrell, T. (2014b). Weakly-supervised discovery of visual pattern configurations. In NIPS.

  • Spitkovsky, V. I., Alshawi, H., & Jurafsky, D. (2009). Baby steps: How less is more in unsupervised dependency parsing. NIPS: Grammar Induction, Representation of Language and Language Learning.

  • Supancic, J. S., & Ramanan, D. (2013). Self-paced learning for long-term tracking. In CVPR.

  • Tang, Y., Yang, Y.-B., & Gao, Y. (2012). Self-paced dictionary learning for image classification. In ACM-MM.

  • Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.

  • Wang, L., Hua, G., Sukthankar, R., Xue, J., & Zheng, N. (2014a). Video object discovery and co-segmentation with extremely weak supervision. In ECCV.

  • Wang, C., Ren, W., Huang, K., & Tan, T. (2014b). Weakly supervised object localization with latent category learning. In ECCV.

  • Yang, X., Song, Q., & Wang, Y. (2007). A weighted support vector machine for data classification. International Journal of Pattern Recognition and Artificial Intelligence, 21(05), 961–976.

  • Yao, X., Han, J., Zhang, D., & Nie, F. (2017). Revisiting co-saliency detection: A novel approach based on two-stage multi-view spectral rotation co-clustering. IEEE Transactions on Image Processing, 26(7), 3196–3209.

  • Zhang, D., Fu, H., Han, J., Borji, A., & Li, X. (2018). A review of co-saliency detection algorithms: Fundamentals, applications, and challenges. ACM Transactions on Intelligent Systems and Technology, 9(4), 38.

  • Zhang, D., Meng, D., & Han, J. (2017a). Co-saliency detection via a self-paced multiple-instance learning framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(5), 865–878.

  • Zhang, D., Meng, D., Zhao, L., & Han, J. (2016). Bridging saliency detection to weakly supervised object detection based on self-paced curriculum learning. In IJCAI.

  • Zhang, D., Yang, L., Meng, D., Xu, D., & Han, J. (2017b). SPFTN: A self-paced fine-tuning network for segmenting objects in weakly labelled videos. In CVPR.

  • Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In ECCV.

Acknowledgements

This work was supported in part by the “National Key R&D Program of China” (2017YFB0502904), the National Science Foundation of China under Grants 61876140 and 61773301, the Fundamental Research Funds for the Central Universities under Grant JBZ170401, and the China Postdoctoral Support Scheme for Innovative Talents under Grant BX20180236.

Author information

Corresponding author

Correspondence to Junwei Han.

Additional information

Communicated by Jakob Verbeek.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this work appeared at IJCAI (Zhang et al. 2016).

About this article

Cite this article

Zhang, D., Han, J., Zhao, L. et al. Leveraging Prior-Knowledge for Weakly Supervised Object Detection Under a Collaborative Self-Paced Curriculum Learning Framework. Int J Comput Vis 127, 363–380 (2019). https://doi.org/10.1007/s11263-018-1112-4

  • DOI: https://doi.org/10.1007/s11263-018-1112-4
