A Transformer-Based Decoupled Attention Network for Text Recognition in Shopping Receipt Images

Ren, Lang; Zhou, Haibin; Chen, Jiaqi; Shao, Lujiao; Wu, Yingji; Zhang, Haijun

doi:10.1007/978-981-16-5188-5_40

Conference paper
First Online: 20 August 2021

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1449)

Abstract

Optical character recognition (OCR) of shopping receipts plays an important role in smart business and personal financial management. Many challenging issues remain in current OCR systems for recognizing text in shopping receipts captured by mobile phones. This research constructs a multi-task model that integrates salient object detection as a branch task, which enables us to filter out irrelevant text instances by detecting the outline of a shopping receipt. Moreover, the developed model utilizes deformable convolution to learn visual information more effectively. To deal with attention drift in text recognition, we propose a transformer-based decoupled attention network, which decouples the attention and prediction processes of the attention mechanism. This mechanism not only increases prediction accuracy but also increases inference speed. Extensive experimental results on a large-scale real-life dataset demonstrate the effectiveness of the proposed method.
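As a rough illustration of what decoupling attention from prediction can mean, the following minimal Python sketch first computes every attention map from the visual features alone and only afterwards classifies each attended feature vector. It is not the authors' implementation: the dot-product scoring, the fixed per-step query vectors, the linear classifier, the toy inputs, and all function names are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def attention_maps(features, step_queries):
    """Stage 1: one attention distribution per output step.

    Uses only the visual features and fixed per-step queries (an assumed
    stand-in for a learned alignment module); no previously predicted
    character is consulted.
    """
    return [softmax([dot(q, f) for f in features]) for q in step_queries]

def glimpses(features, maps):
    """Attention-weighted sum of the features for every step."""
    dim = len(features[0])
    return [[sum(w * f[d] for w, f in zip(amap, features)) for d in range(dim)]
            for amap in maps]

def recognize(features, step_queries, class_weights, alphabet):
    """Stage 2: classify each glimpse independently (assumed linear classifier)."""
    maps = attention_maps(features, step_queries)
    chars = []
    for g in glimpses(features, maps):
        logits = [dot(w, g) for w in class_weights]
        chars.append(alphabet[logits.index(max(logits))])
    return "".join(chars)

# Toy inputs (hypothetical): two feature positions, two output steps, alphabet "AB".
features = [[4.0, 0.0], [0.0, 4.0]]
step_queries = [[1.0, 0.0], [0.0, 1.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
print(recognize(features, step_queries, class_weights, "AB"))
```

In this sketch, stage 1 never reads a decoded character, so an early misprediction cannot shift later attention maps, and all steps can be computed in a single pass rather than one after another. That is one plausible reading of how decoupling could help with both attention drift and inference speed.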

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant no. 61972112 and no. 61832004, the Guangdong Basic and Applied Basic Research Foundation under Grant no. 2021B1515020088, and the HITSZ-J&A Joint Laboratory of Digital Design and Intelligent Fabrication under Grant no. HITSZ-J&A-2021A01.

Author information

Authors and Affiliations

Department of Computer Science, Harbin Institute of Technology, Shenzhen, China
Lang Ren, Haibin Zhou, Jiaqi Chen, Lujiao Shao, Yingji Wu & Haijun Zhang

Corresponding author

Correspondence to Haijun Zhang.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Haijun Zhang
Nanfang College of Sun Yat-sen University, Guangzhou, China
Zhi Yang
Hefei University of Technology, Hefei, China
Zhao Zhang
Chongqing University, Chongqing, China
Zhou Wu
South China Normal University, Guangzhou, China
Tianyong Hao

About this paper

Cite this paper

Ren, L., Zhou, H., Chen, J., Shao, L., Wu, Y., Zhang, H. (2021). A Transformer-Based Decoupled Attention Network for Text Recognition in Shopping Receipt Images. In: Zhang, H., Yang, Z., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2021. Communications in Computer and Information Science, vol 1449. Springer, Singapore. https://doi.org/10.1007/978-981-16-5188-5_40

DOI: https://doi.org/10.1007/978-981-16-5188-5_40
Published: 20 August 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5187-8
Online ISBN: 978-981-16-5188-5
eBook Packages: Computer Science, Computer Science (R0)
