
LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization

Authors:

Ke Li,

Yonghong Tian

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 5655 - 5664

https://doi.org/10.1145/3581783.3612165

Published: 27 October 2023

Abstract

Weakly Supervised Object Localization (WSOL) aims to localize objects using only image-level labels while maintaining competitive classification performance. However, previous efforts have prioritized localization over classification accuracy and focused on discriminative features, neglecting low-level information. We argue that low-level image representations, such as edges, color, texture, and motion, are crucial for accurate detection: using such information yields more refined localization, which can in turn improve classification accuracy. In this paper, we propose a unified framework, termed LocLoc (Low-level Cues and Local-area Guides), that simultaneously improves localization and classification accuracy. It leverages low-level image cues to explore global and local representations for accurate localization and classification. Specifically, we introduce a GrabCut-Enhanced Generator (GEG) that learns global semantic representations for localization, using graph cuts to enhance low-level information on top of the long-range dependencies captured by the transformer. We further design a Local Feature Digging Module (LFDM) that uses low-level cues to guide the learning of local feature representations for accurate classification. Extensive experiments demonstrate the effectiveness of LocLoc, with 84.4% (↑5.2%) Top-1 Loc. and 85.8% Top-1 Cls. on CUB-200-2011, and 57.6% (↑1.5%) Top-1 Loc. and 78.6% Top-1 Cls. on ILSVRC 2012, indicating that our method surpasses previous approaches by a large margin. Code and models are available at https://github.com/Cliffia123/LocLoc.
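The abstract reports Top-1 Loc. and Top-1 Cls. without defining them. As a reading aid, the following is a minimal sketch of how these two numbers are commonly computed in WSOL evaluation. It assumes the usual convention, which this page does not confirm for LocLoc: one ground-truth box per image, and a localization counted as correct only when the predicted class is right and the predicted box overlaps the ground-truth box with IoU of at least 0.5. The names `iou` and `top1_metrics` are hypothetical and are not taken from the authors' repository.

```python
from typing import Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top1_metrics(
    pred_labels: Sequence[int],
    true_labels: Sequence[int],
    pred_boxes: Sequence[Box],
    true_boxes: Sequence[Box],
    iou_threshold: float = 0.5,  # assumed threshold, the common WSOL choice
) -> Tuple[float, float]:
    """Return (top1_cls, top1_loc) as fractions in [0, 1]."""
    n = len(true_labels)
    if n == 0:
        raise ValueError("need at least one sample")
    cls_hits = 0
    loc_hits = 0
    for pl, tl, pb, tb in zip(pred_labels, true_labels, pred_boxes, true_boxes):
        if pl == tl:
            cls_hits += 1
            # Localization only counts when the class is also correct.
            if iou(pb, tb) >= iou_threshold:
                loc_hits += 1
    return cls_hits / n, loc_hits / n


# Toy data (invented for illustration, not results from the paper).
cls_acc, loc_acc = top1_metrics(
    pred_labels=[3, 7, 7, 1],
    true_labels=[3, 7, 2, 1],
    pred_boxes=[(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 4, 4)],
    true_boxes=[(0, 0, 10, 10), (5, 5, 15, 15), (0, 0, 10, 10), (0, 0, 10, 10)],
)
print(cls_acc, loc_acc)  # 0.75 0.25
```

Under this convention Top-1 Loc. can never exceed Top-1 Cls., which is consistent with the figures quoted in the abstract (84.4% vs. 85.8% on CUB-200-2011, 57.6% vs. 78.6% on ILSVRC 2012).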

