Multimodal Super-Resolution for Vessel License Plate Recognition

Authors:

Huaiyong Zhang,

Zhenchang Zhang,

Lei Chen

ICCPR '23: Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition

Pages 279 - 285

https://doi.org/10.1145/3633637.3633681

Published: 28 February 2024

Abstract

Conventional image super-resolution primarily targets objects in natural scenes, which convey mostly visual texture information. Text in scene images differs from these objects because it carries two levels of content: visual texture and semantic information. Most scene text super-resolution algorithms nevertheless treat text as a regular natural object and ignore its semantic information. This paper proposes a Multimodal Super-Resolution Neural Network (MSRNN) that combines text and image information. Specifically, we extract semantic information from text images using text knowledge and use it to guide the super-resolution network in enhancing low-resolution text images. Additionally, we employ a visual graph convolutional network (VIG) to extract image features while preserving structural information. Furthermore, we design a contextual orthogonal attention module that extracts the structural and edge information of text images and integrates the text and image features for recognition. Experimental results demonstrate that our model achieves state-of-the-art performance.
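The idea of letting semantic text features guide image features can be illustrated with a generic fusion step. The sketch below is not the MSRNN architecture or its contextual orthogonal attention module, neither of which is specified in the abstract; it substitutes plain scaled dot-product cross-attention as a stand-in. The function name `text_guided_fusion`, the residual addition, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def text_guided_fusion(image_feats, text_feats):
    """Fuse text features into image features (hypothetical helper).

    image_feats: N vectors of size d, used as attention queries.
    text_feats:  T vectors of size d, used as keys and values.
    Returns N vectors: each image feature plus its attention-weighted
    summary of the text features (a residual connection, assumed here).
    """
    d = len(image_feats[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for query in image_feats:
        # How strongly this image location attends to each text token.
        weights = softmax([dot(query, key) * scale for key in text_feats])
        # Weighted average of the text features for this location.
        attended = [
            sum(w * value[j] for w, value in zip(weights, text_feats))
            for j in range(d)
        ]
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


# Toy example: two image locations and a single text token.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
text_feats = [[0.5, 0.5]]
fused = text_guided_fusion(image_feats, text_feats)
print(fused)
```

With a single text token the attention weight is 1 for every image location, so each fused vector is simply the image feature plus that token. In a trained network the queries, keys, and values would normally pass through learned projections first, which this sketch omits.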


Published In

ICCPR '23: Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition

October 2023

589 pages

ISBN:9798400707988

DOI:10.1145/3633637

Copyright © 2023 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICCPR 2023

ICCPR 2023: 2023 12th International Conference on Computing and Pattern Recognition

October 27 - 29, 2023

Qingdao, China
