research-article

MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection

Authors:

Houqiang Li

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 3

Article No.: 78, Pages 1 - 22

https://doi.org/10.1145/3440087

Published: 22 July 2021

Abstract

Recently, many scene text detection algorithms have achieved impressive performance by using convolutional neural networks. However, most of them do not fully exploit the context among hierarchical multi-level features to improve detection performance. In this article, we present an efficient multi-level feature enhanced cumulative framework based on instance segmentation for scene text detection. First, we adopt a Multi-Level Features Enhanced Cumulative (MFEC) module to capture features whose representational ability is cumulatively enhanced. Then, a Multi-Level Features Fusion (MFF) module is designed to fully integrate both high-level and low-level MFEC features, which can adaptively encode scene text information. To verify the effectiveness of the proposed method, we perform experiments on six public datasets (namely, CTW1500, Total-Text, MSRA-TD500, ICDAR2013, ICDAR2015, and MLT2017) and compare it with other state-of-the-art methods. Experimental results demonstrate that the proposed Multi-Level Features Enhanced Cumulative Network (MFECN) detector handles scene text instances of varied shapes (curved, oriented, and horizontal) well and achieves results better than or comparable to those of the compared methods.
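The abstract names two stages, cumulative enhancement across feature levels and fusion of the enhanced levels, without specifying their operations. The sketch below is only a generic illustration of that two-stage idea and is not the MFEC or MFF module from the article: it assumes a top-down pass that adds an upsampled coarser map to each finer one, followed by averaging all maps at the finest resolution. The function names (`upsample_nearest`, `cumulative_enhance`, `fuse`), the factor-of-two pyramid, and the use of addition and averaging are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Repeat every cell `factor` times along both axes (nearest neighbour)."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cumulative_enhance(levels: List[Grid]) -> List[Grid]:
    """Assumed scheme: levels[0] is the finest map and each later level has
    half the resolution. Walk from coarsest to finest, adding the upsampled
    running result so each level accumulates all coarser context."""
    enhanced = [levels[-1]]
    for level in reversed(levels[:-1]):
        enhanced.append(add(level, upsample_nearest(enhanced[-1], 2)))
    enhanced.reverse()  # restore finest-first order
    return enhanced


def fuse(enhanced: List[Grid]) -> Grid:
    """Assumed fusion: resize every enhanced map to the finest resolution
    and average them cell by cell."""
    resized = [upsample_nearest(g, 2 ** i) for i, g in enumerate(enhanced)]
    total = resized[0]
    for g in resized[1:]:
        total = add(total, g)
    return [[v / len(enhanced) for v in row] for row in total]


# Toy three-level pyramid with constant values (hypothetical data).
levels = [
    [[1.0] * 4 for _ in range(4)],  # finest level, 4x4
    [[2.0] * 2 for _ in range(2)],  # middle level, 2x2
    [[4.0]],                        # coarsest level, 1x1
]
enhanced = cumulative_enhance(levels)
fused = fuse(enhanced)
```

With this toy input the finest enhanced map holds 1 + 2 + 4 in every cell, which shows how context from all coarser levels accumulates before fusion. A real detector would use learned convolutions and multi-channel tensors rather than plain sums.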


Index Terms

MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
        Object detection
        Object recognition
      2. Computer vision tasks
        Scene understanding

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 3

August 2021

443 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3476118

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Accepted: 01 November 2020

Published: 22 July 2021

Revised: 01 September 2020

Received: 01 August 2019

Published in TOMM Volume 17, Issue 3

Qualifiers

Research-article
Refereed

Funding Sources

Natural Science Foundation of Xinjiang Province
NSFC
Youth Innovation Promotion Association CAS

