DOI: 10.1145/3633637.3633681
research-article

Multimodal Super-Resolution for Vessel License Plate Recognition

Published: 28 February 2024

Abstract

In conventional image super-resolution tasks, the primary elements targeted for enhancement are objects in natural scenes, which convey predominantly visual texture information. However, text in scene images differs from these objects in containing two levels of content: visual texture and semantic information. Unfortunately, most natural scene text super-resolution algorithms treat text as a regular natural object without considering its semantic information. This paper proposes a Multimodal Super-Resolution Neural Network (MSRNN) combining text and image information. Specifically, we extract semantic information from text images using text knowledge and utilize this to guide the super-resolution network in enhancing low-resolution text images. Additionally, we employ a visual graph convolutional network (VIG) to extract image features while preserving structural information. Furthermore, we design a contextual orthogonal attention module to effectively extract the structural and edge information of text images and integrate the text-image features for recognition. Experimental results demonstrate that our model achieves state-of-the-art performance among existing models.
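The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind graph-based image feature extraction: treat image patches as graph nodes, connect each node to its nearest neighbours in feature space, and update each node from its neighbours. The function names `knn_graph` and `aggregate`, the toy 2-D features, and the mean-based update rule are all assumptions chosen for illustration; they are not taken from the paper and may differ from what MSRNN or VIG actually does.

```python
import math

def knn_graph(features, k):
    """For each patch feature vector, return the indices of its k nearest other patches."""
    neighbors = []
    for i, fi in enumerate(features):
        # Euclidean distance to every other patch, sorted ascending.
        dists = sorted((math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def aggregate(features, neighbors):
    """One message-passing step (assumed rule): average each node with the mean of its neighbours."""
    updated = []
    for fi, nbrs in zip(features, neighbors):
        mean_nbr = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(len(fi))]
        updated.append([(a + b) / 2 for a, b in zip(fi, mean_nbr)])
    return updated

# Four hypothetical 2-D patch features forming two well-separated clusters.
patches = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
graph = knn_graph(patches, k=1)
smoothed = aggregate(patches, graph)
```

In this toy setup each patch links only to the other member of its own cluster, so the update mixes information within a cluster and never across clusters. That is the sense in which a graph over patches can preserve local structure; a real network would use learned, high-dimensional features and learned weights in the update.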

Published In

ICCPR '23: Proceedings of the 2023 12th International Conference on Computing and Pattern Recognition
October 2023
589 pages
ISBN:9798400707988
DOI:10.1145/3633637
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Contextual Orthogonal Attention
  2. Multi-modal
  3. Super-Resolution
  4. VIG
  5. Vessel license plate Recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICCPR 2023
