Weakly Supervised Object Localization Through Inter-class Feature Similarity and Intra-class Appearance Consistency

Wei, Jun; Wang, Sheng; Zhou, S. Kevin; Cui, Shuguang; Li, Zhen

doi:10.1007/978-3-031-20056-4_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13690)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels. Class activation maps (CAMs) are commonly used features for WSOL. However, existing CAM-based methods tend to over-emphasize discriminative features for object recognition and hence ignore the feature similarities among different categories, leaving CAMs incomplete for object localization. In addition, CAMs are sensitive to background noise because they depend too heavily on holistic classification. In this paper, we propose a simple but effective WSOL model, named ISIC, built on Inter-class feature Similarity and Intra-class appearance Consistency. In practice, our ISIC model first introduces the inter-class feature similarity (ICFS) loss, in contrast to the original cross-entropy loss. The ICFS loss leverages the features shared between different categories together with the discriminative ones, which significantly reduces the risk of over-fitting to background noise and yields more complete object masks. Furthermore, instead of CAMs, a non-negative matrix factorization mask module is applied to extract object masks from multiple intra-class images. Thanks to intra-class appearance consistency, the resulting pseudo masks are more complete and robust. Extensive experiments confirm that our ISIC model achieves state-of-the-art performance on both the CUB-200 and ImageNet-1K benchmarks, with 97.3% and 70.0% GT-Known localization accuracy, respectively.
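
To make the idea of extracting masks from multiple intra-class images with non-negative matrix factorization more concrete, the following is a minimal sketch and not the authors' implementation. It assumes non-negative feature maps of shape (N, H, W, C) taken from N images of the same class, stacks all spatial positions into one matrix, and fits a rank-1 factorization with the multiplicative updates of Lee and Seung. The function names, the rank of 1, the per-image min-max normalization, and the 0.5 threshold are all illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(V, k=1, n_iter=100, eps=1e-8, seed=0):
    """Factorize non-negative V (n x c) as W (n x k) @ H (k x c).

    Uses the Lee-Seung multiplicative updates for the Frobenius objective.
    """
    rng = np.random.default_rng(seed)
    n, c = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, c)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def intra_class_masks(features, n_iter=100, threshold=0.5):
    """Return boolean masks (N, H, W) from same-class feature maps.

    features: array of shape (N, H, W, C). Negative values are clipped to
    zero because NMF requires a non-negative input (assumption of this sketch).
    """
    n, h, w, c = features.shape
    V = np.maximum(features, 0.0).reshape(n * h * w, c)
    W, _ = nmf_multiplicative(V, k=1, n_iter=n_iter)
    # The single column of W scores how strongly each position expresses
    # the component shared across all images of the class.
    heat = W[:, 0].reshape(n, h, w)
    lo = heat.min(axis=(1, 2), keepdims=True)
    hi = heat.max(axis=(1, 2), keepdims=True)
    heat = (heat - lo) / (hi - lo + 1e-8)  # per-image min-max normalization
    return heat >= threshold


# Synthetic demo: four "images" whose central 4x4 region shares strong features.
rng = np.random.default_rng(0)
feats = 0.05 * rng.random((4, 8, 8, 16))
feats[:, 2:6, 2:6, :] += 1.0
masks = intra_class_masks(feats)
```

Because the factorization is fitted jointly over all images of a class, a position scores highly only if its features match the component shared across those images, which is one way to read the appearance-consistency argument in the abstract.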

Acknowledgement

This work was supported in part by NSFC-Youth 61902335, by the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen HK S&T Cooperation Zone, by the National Key R&D Program of China with grant No. 2018YFB1800800, by Shenzhen Outstanding Talents Training Fund, by Guangdong Research Project No. 2017ZT07X152 and No. 2019CX01X104, by the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001), by zelixir biotechnology company Fund, by Tencent Open Fund, and by ITSO at CUHKSZ.

Author information

Authors and Affiliations

The Future Network of Intelligence Institute, School of Science and Engineering, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
Jun Wei, S. Kevin Zhou, Shuguang Cui & Zhen Li
Shenzhen Research Institute of Big Data, Shenzhen, China
Jun Wei, Shuguang Cui & Zhen Li
The Future Network of Intelligence Institute, Shenzhen, China
Jun Wei, Shuguang Cui & Zhen Li
School of Biomedical Engineering and Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China
S. Kevin Zhou
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
S. Kevin Zhou
Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai, China
Sheng Wang

Corresponding author

Correspondence to Zhen Li.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Cite this paper

Wei, J., Wang, S., Zhou, S.K., Cui, S., Li, Z. (2022). Weakly Supervised Object Localization Through Inter-class Feature Similarity and Intra-class Appearance Consistency. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13690. Springer, Cham. https://doi.org/10.1007/978-3-031-20056-4_12

DOI: https://doi.org/10.1007/978-3-031-20056-4_12
Published: 03 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20055-7
Online ISBN: 978-3-031-20056-4
eBook Packages: Computer Science, Computer Science (R0)
