Abstract
Weakly supervised object localization (WSOL) aims to localize objects using only image-level labels. Class activation maps (CAMs) are the features most commonly used for WSOL. However, existing CAM-based methods tend to over-emphasize the discriminative features needed for object recognition and ignore the feature similarities among different categories, which leaves the CAMs too incomplete for object localization. In addition, CAMs are sensitive to background noise because they depend heavily on holistic classification. In this paper, we propose a simple but effective WSOL model, named ISIC, built on Inter-class feature Similarity and Intra-class appearance Consistency. Our ISIC model first introduces an inter-class feature similarity (ICFS) loss as a counterpart to the original cross-entropy loss. The ICFS loss leverages the features shared between different categories together with the discriminative ones, which significantly reduces the risk of over-fitting to background noise and yields more complete object masks. In addition, instead of CAMs, a non-negative matrix factorization mask module is applied to extract object masks from multiple intra-class images. Thanks to intra-class appearance consistency, the resulting pseudo masks are more complete and robust. Extensive experiments confirm that our ISIC model achieves state-of-the-art results on both the CUB-200 and ImageNet-1K benchmarks, with 97.3% and 70.0% GT-Known localization accuracy, respectively.
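The idea of extracting masks from several intra-class images with non-negative matrix factorization can be illustrated with a small sketch. The abstract does not specify the module's design, so everything below is an assumption made for illustration: the function names `nmf` and `intra_class_masks` are hypothetical, the factorization uses the generic multiplicative updates of Lee and Seung, the rank is fixed to 1, and the 0.5 threshold and the synthetic features are arbitrary choices rather than the authors' settings.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative V (n x d) as W (n x rank) @ H (rank x d)
    using multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    n, d = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, d)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def intra_class_masks(features, threshold=0.5):
    """features: (num_images, height, width, channels), non-negative
    (e.g. post-ReLU) maps from images of ONE class. Returns boolean masks."""
    n, h, w, c = features.shape
    # Stack every spatial location of every image into one matrix so the
    # factor is shared across images (the intra-class consistency assumption).
    V = features.reshape(n * h * w, c)
    W, _ = nmf(V, rank=1)
    act = W[:, 0].reshape(n, h, w)
    # Min-max normalize each image's activation to [0, 1] before thresholding.
    lo = act.min(axis=(1, 2), keepdims=True)
    hi = act.max(axis=(1, 2), keepdims=True)
    act = (act - lo) / (hi - lo + 1e-9)
    return act > threshold

# Synthetic example: three "images" whose object regions share one feature pattern.
rng = np.random.default_rng(0)
feats = 0.01 * rng.random((3, 8, 8, 4))          # weak background noise
truth = np.zeros((3, 8, 8), dtype=bool)
truth[0, 1:4, 1:4] = True
truth[1, 3:7, 2:6] = True
truth[2, 4:8, 0:3] = True
feats[truth] += np.array([1.0, 2.0, 0.5, 1.0])   # shared object pattern
masks = intra_class_masks(feats)
```

Factorizing the stacked features of all images jointly, rather than each image separately, is what lets the common appearance of the class dominate the single factor; background patterns that differ from image to image contribute little to it.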
Acknowledgement
This work was supported in part by NSFC-Youth 61902335, by the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen HK S&T Cooperation Zone, by the National Key R&D Program of China with grant No. 2018YFB1800800, by Shenzhen Outstanding Talents Training Fund, by Guangdong Research Project No. 2017ZT07X152 and No. 2019CX01X104, by the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001), by zelixir biotechnology company Fund, by Tencent Open Fund, and by ITSO at CUHKSZ.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, J., Wang, S., Zhou, S.K., Cui, S., Li, Z. (2022). Weakly Supervised Object Localization Through Inter-class Feature Similarity and Intra-class Appearance Consistency. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13690. Springer, Cham. https://doi.org/10.1007/978-3-031-20056-4_12
DOI: https://doi.org/10.1007/978-3-031-20056-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20055-7
Online ISBN: 978-3-031-20056-4
eBook Packages: Computer Science, Computer Science (R0)