DOI: 10.1145/3595916.3626374

A Multi-scale and Dense Object Detector for Tibetan Thangka Images

Published: 01 January 2024

Abstract

Thangka cultural element detection aims to locate and identify instances in Thangka. However, as a unique form of pictorial art, Thangka exhibits distinct spatial structures that deviate significantly from general images in scale and density. It is therefore challenging for most state-of-the-art detectors, which are designed for natural scenes, to handle Thangka cultural element detection effectively. To overcome this issue, we propose a multi-scale and dense object detector referred to as MDDet. It embeds a multi-scale receptive field fusion module (MRF) that enlarges the receptive field while capturing the spatial and channel relationships at different scales, which significantly enriches the multi-scale features extracted from the backbone. In addition, we introduce a threshold-slicing aided hyper inference (T-SAHI) scheme, which adaptively slices images in dense scenarios to aid dense object detection at test time. We thoroughly evaluate our method, and MDDet outperforms the prior art by a clear margin on the Thangka dataset, achieving an absolute improvement of 1.9% in average precision (AP). For the challenging medium and small objects in Thangka, MDDet obtains wide margins of 12% and 3.7% in accuracy improvement, respectively. It also shows strong generalization ability when evaluated on general scenarios, e.g., Pascal VOC 2007 and MS COCO, validating the role of MDDet in object detection.
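The abstract does not spell out how T-SAHI decides when to slice, so the following is only a minimal sketch of threshold-triggered sliced inference under stated assumptions, not the authors' implementation. It assumes that a full-image pass runs first, that an image counts as "dense" when the number of detections reaches a hypothetical `count_thr`, and that dense images are then re-run on overlapping tiles whose results are merged with greedy non-maximum suppression. The `detect` callable, the tile size, the overlap ratio, and both thresholds are illustrative placeholders.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score
Window = Tuple[int, int, int, int]              # x1, y1, x2, y2


def slice_windows(width: int, height: int, tile: int, overlap: float) -> List[Window]:
    """Return tile windows that cover the image with the given overlap ratio."""
    step = max(1, int(tile * (1.0 - overlap)))

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]
        s = list(range(0, size - tile, step))
        s.append(size - tile)  # last tile is aligned to the image border
        return s

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thr: float = 0.5) -> List[Box]:
    """Greedy non-maximum suppression, highest score first."""
    kept: List[Box] = []
    for b in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(b, k) < iou_thr for k in kept):
            kept.append(b)
    return kept


def threshold_sliced_inference(
    detect: Callable[[Window], List[Box]],  # hypothetical: boxes local to the window
    width: int,
    height: int,
    count_thr: int = 50,    # assumed density trigger, not taken from the paper
    tile: int = 512,
    overlap: float = 0.25,
    iou_thr: float = 0.5,
) -> List[Box]:
    """Slice only when the full-image pass suggests a dense scene."""
    full = detect((0, 0, width, height))
    if len(full) < count_thr:
        return full  # sparse image: keep the single-pass result
    merged = list(full)
    for (x1, y1, x2, y2) in slice_windows(width, height, tile, overlap):
        for (bx1, by1, bx2, by2, score) in detect((x1, y1, x2, y2)):
            # shift tile-local boxes back into full-image coordinates
            merged.append((bx1 + x1, by1 + y1, bx2 + x1, by2 + y1, score))
    return nms(merged, iou_thr)
```

In this sketch sparse images pay for one forward pass only, while dense images trade extra passes for higher effective resolution on small objects. A detection count is just one possible density signal, and the criterion used in the paper may differ.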

Supplementary Material

The supplemental file of our paper and the code of our model. (mmasia23-26-supplementary.zip)
Supplementary material for the paper: A Multi-scale and Dense Object Detector for Tibetan Thangka Images (supplementary.pdf)

Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
December 2023
745 pages
ISBN:9798400702051
DOI:10.1145/3595916

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dense object detection
  2. Feature fusion
  3. Multi-scale object detection
  4. Thangka images

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MMAsia '23: ACM Multimedia Asia
December 6-8, 2023
Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
