Multi-Label and Evolvable Dataset Preparation for Web-Based Object Detection

Authors:

Sheng Zhong

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 9

Article No.: 236, Pages 1 - 21

https://doi.org/10.1145/3695465

Published: 30 October 2024

Abstract

In this article, we focus on the emerging field of web-based object detection, which has gained considerable attention because it can use large amounts of web data for training, eliminating the need for labor-intensive manual annotation. However, the noisy and ever-evolving nature of web data makes it difficult to prepare high-quality datasets for web-based object detection. To address these challenges, we propose a fully automatic dataset preparation method. The method incorporates a hierarchical clustering module that assigns multiple precise labels to each image; this module is based on our observation that web image data exhibits different distributions at different granularities. Furthermore, an evolutionary relabeling module keeps both the prepared dataset and the trained detection models adaptable to ever-evolving web data. Extensive experiments demonstrate that our method outperforms other web-based methods and achieves performance comparable to that obtained with manually labeled benchmark datasets.
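
To make the idea of multiple labels at several granularities concrete, here is a minimal sketch. It is not the authors' module: it uses plain single-linkage clustering cut at several distance thresholds as a stand-in, and the function names, the toy 2-D "features", and the threshold values are all hypothetical.

```python
import math
from itertools import combinations


def single_linkage_clusters(points, threshold):
    """Cluster ids from linking every pair of points within `threshold`.

    Equivalent to cutting a single-linkage dendrogram at that height.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


def multi_granularity_labels(points, thresholds):
    """One cluster id per threshold for each point (coarse to fine)."""
    per_level = [single_linkage_clusters(points, t) for t in thresholds]
    return [tuple(level[i] for level in per_level) for i in range(len(points))]


# Toy stand-in for image features (assumption: real features would be
# high-dimensional embeddings, not 2-D points).
features = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 5.0)]
labels = multi_granularity_labels(features, thresholds=(2.0, 0.5))
print(labels)  # [(0, 0), (0, 0), (0, 1), (0, 1), (1, 2)]
```

Each item ends up with one label per granularity: the first four points share a coarse cluster but split into two fine clusters, which is the sense in which a hierarchy can attach several labels to a single image.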

Cited By

Costa de Araujo J., Rodrigues G., Carwehl M., Vogel T., Grunske L., Caldas R., and Pelliccione P. 2024. Explainability for Property Violations in Cyberphysical Systems: An Immune-Inspired Approach. IEEE Software 41, 5 (2024), 43–51. Online publication date: 16-Apr-2024.
https://dl.acm.org/doi/10.1109/MS.2024.3387289

Index Terms

Multi-Label and Evolvable Dataset Preparation for Web-Based Object Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
2. Information systems
  1. Information retrieval
  2. World Wide Web
    1. Web mining

Published In

ACM Transactions on Knowledge Discovery from Data Volume 18, Issue 9

November 2024

730 pages

EISSN:1556-472X

DOI:10.1145/3613722

Editor:
Jian Pei
Duke University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2024

Online AM: 09 September 2024

Accepted: 31 August 2024

Revised: 25 June 2024

Received: 03 February 2024

Published in TKDD Volume 18, Issue 9

Qualifiers

Research-article

Funding Sources

National Key R & D Program of China
NSFC
Leading edge Technology Program of Jiangsu Natural Science
Science Foundation for Youths of Jiangsu
