
Multi-Label and Evolvable Dataset Preparation for Web-Based Object Detection

Published: 30 October 2024

Abstract

This article addresses web-based object detection, an emerging field that has gained considerable attention because it can exploit large amounts of web data for training, thereby eliminating the need for labor-intensive manual annotation. However, the noisy and ever-evolving nature of web data makes it difficult to prepare high-quality datasets for web-based object detection. To address these challenges, we propose a fully automatic dataset preparation method. The method incorporates a hierarchical clustering module that assigns multiple precise labels to each image; this module builds on our observation that web image data exhibits different distributions at different granularities. In addition, an evolutionary relabeling module keeps both the prepared dataset and the trained detection models adaptable to ever-evolving web data. Extensive experiments demonstrate that our method outperforms other web-based methods and achieves performance comparable to that obtained with manually labeled benchmark datasets.



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 9
November 2024
730 pages
EISSN: 1556-472X
DOI: 10.1145/3613722

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2024
Online AM: 09 September 2024
Accepted: 31 August 2024
Revised: 25 June 2024
Received: 03 February 2024
Published in TKDD Volume 18, Issue 9

Author Tags

  1. Dataset preparation
  2. Information retrieval
  3. Object detection

Qualifiers

  • Research-article

Funding Sources

  • National Key R & D Program of China
  • NSFC
  • Leading-edge Technology Program of Jiangsu Natural Science
  • Science Foundation for Youths of Jiangsu

