research-article

A Multi-scale and Dense Object Detector for Tibetan Thangka Images

Authors:

Yongjian Liu

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

Article No.: 5, Pages 1 - 7

https://doi.org/10.1145/3595916.3626374

Published: 01 January 2024

Abstract

Thangka cultural element detection aims to locate and identify instances in Thangka images. However, as a unique form of pictorial art, Thangka exhibits distinct spatial structures that differ significantly from general images in scale and density. It is therefore challenging for most state-of-the-art detectors designed for natural scenes to handle Thangka cultural element detection effectively. To overcome this issue, we propose a multi-scale and dense object detector referred to as MDDet. It embeds a multi-scale receptive field fusion module (MRF) that enlarges the receptive field while capturing spatial and channel relationships at different scales, which significantly enriches the multi-scale features extracted from the backbone. In addition, we introduce a threshold-slicing aided hyper inference (T-SAHI) scheme, which adaptively slices images in dense scenarios to aid dense object detection at test time. We thoroughly evaluate our method: MDDet outperforms the prior art by a clear margin on the Thangka dataset, achieving an absolute improvement of 1.9% in average precision (AP). For the challenging medium and small objects in Thangka, MDDet improves accuracy by wide margins of 12% and 3.7%, respectively. It also shows strong generalization ability when evaluated on general scenarios, e.g., Pascal VOC 2007 and MS COCO, validating the role of MDDet in object detection.
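The abstract does not state how T-SAHI decides when to slice, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "dense" means the full-image pass returns at least `count_thr` detections, and that sliced results are merged with plain non-maximum suppression. The names `threshold_sliced_inference`, `count_thr`, `tile`, `overlap`, and the toy detector are all hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)
Image = Sequence[Sequence[float]]               # rows of pixel values


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thr: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept


def tile_origins(size: int, tile: int, overlap: float) -> List[int]:
    """Start offsets of overlapping tiles that cover [0, size)."""
    if size <= tile:
        return [0]
    stride = max(1, int(tile * (1 - overlap)))
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the border
    return starts


def threshold_sliced_inference(
    image: Image,
    detect: Callable[[Image], List[Box]],
    count_thr: int = 20,    # assumed density criterion (hypothetical)
    tile: int = 512,
    overlap: float = 0.2,
    iou_thr: float = 0.5,
) -> List[Box]:
    """Slice only when the full-image pass suggests a dense scene."""
    full = detect(image)
    if len(full) < count_thr:
        return full  # sparse scene: skip the extra slicing cost
    merged = list(full)
    for y0 in tile_origins(len(image), tile, overlap):
        for x0 in tile_origins(len(image[0]), tile, overlap):
            crop = [row[x0:x0 + tile] for row in image[y0:y0 + tile]]
            # Shift tile-local boxes back into full-image coordinates.
            for x1, y1, x2, y2, score in detect(crop):
                merged.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, score))
    return nms(merged, iou_thr)


def toy_detect(img: Image) -> List[Box]:
    """Stand-in detector: every nonzero pixel becomes a 1x1 box."""
    return [(x, y, x + 1, y + 1, v)
            for y, row in enumerate(img)
            for x, v in enumerate(row) if v > 0]


toy_image = [
    [0, 0, 0, 0],
    [0, 0.9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0.8],
]
sliced = threshold_sliced_inference(toy_image, toy_detect, count_thr=1, tile=2, overlap=0.0)
unsliced = threshold_sliced_inference(toy_image, toy_detect, count_thr=10, tile=2, overlap=0.0)
```

With the toy detector, both calls return the same two boxes: the sliced path finds each object twice (once in the full image, once in a tile) and NMS removes the duplicates. In a real pipeline `detect` would wrap a trained detector, and the density criterion could just as well be a score-based or region-based threshold.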

Supplementary Material

The supplemental file of our paper and the code of our model (mmasia23-26-supplementary.zip, 22.23 MB)

Supplementary material for the paper: A Multi-scale and Dense Object Detector for Tibetan Thangka Images (supplementary.pdf, 1.78 MB)


Index Terms

A Multi-scale and Dense Object Detector for Tibetan Thangka Images
1. Applied computing
  1. Arts and humanities
    1. Fine arts
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection



Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

December 2023

745 pages

ISBN:9798400702051

DOI:10.1145/3595916

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Natural Science Foundation of China
Key Research Program of Hubei
Social Science Foundation of Ministry of Education

Conference

MMAsia '23

Sponsor:

SIGMM

MMAsia '23: ACM Multimedia Asia

December 6 - 8, 2023

Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%


