Abstract
Optical character recognition (OCR) of shopping receipts plays an important role in smart business and personal financial management. Current OCR systems still face many challenges when recognizing text in shopping receipts captured by mobile phones. This research constructs a multi-task model that integrates salient object detection as a branch task, which enables us to filter out irrelevant text instances by detecting the outline of a shopping receipt. Moreover, the developed model uses deformable convolution to learn visual information more effectively. To deal with attention drift in text recognition, we propose a transformer-based decoupled attention network, which decouples the attention and prediction processes in the attention mechanism. This design not only improves prediction accuracy but also increases inference speed. Extensive experimental results on a large-scale real-life dataset demonstrate the effectiveness of our proposed method.
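The idea of filtering out irrelevant text instances using the detected receipt outline can be illustrated with a small post-processing sketch. This is not the paper's implementation, where the salient-object branch is part of the multi-task network. It only assumes that a binary receipt mask and axis-aligned text boxes are already available. The names `fraction_inside` and `filter_boxes`, the box format, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x0, y0, x1, y1) in pixels, half-open on the right/bottom.
Box = Tuple[int, int, int, int]


def fraction_inside(mask: List[List[int]], box: Box) -> float:
    """Return the fraction of the box's pixels lying on the receipt mask (1 = receipt)."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    covered = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return covered / area


def filter_boxes(mask: List[List[int]], boxes: List[Box],
                 min_fraction: float = 0.5) -> List[Box]:
    """Keep only the text boxes that lie mostly inside the receipt region."""
    return [b for b in boxes if fraction_inside(mask, b) >= min_fraction]


# Toy 8x6 image: the "receipt" occupies columns 2..5 of every row.
mask = [[1 if 2 <= x < 6 else 0 for x in range(8)] for _ in range(6)]
boxes = [(2, 1, 6, 2),   # fully on the receipt
         (0, 0, 2, 1)]   # background text, e.g. a table surface label
kept = filter_boxes(mask, boxes)
```

In this toy case only the first box survives. A real pipeline would more likely work with polygonal text regions and a predicted saliency map, so the overlap test here stands in for whatever criterion the full model applies.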
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61972112 and 61832004, the Guangdong Basic and Applied Basic Research Foundation under Grant no. 2021B1515020088, and the HITSZ-J&A Joint Laboratory of Digital Design and Intelligent Fabrication under Grant no. HITSZ-J&A-2021A01.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ren, L., Zhou, H., Chen, J., Shao, L., Wu, Y., Zhang, H. (2021). A Transformer-Based Decoupled Attention Network for Text Recognition in Shopping Receipt Images. In: Zhang, H., Yang, Z., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2021. Communications in Computer and Information Science, vol 1449. Springer, Singapore. https://doi.org/10.1007/978-981-16-5188-5_40
DOI: https://doi.org/10.1007/978-981-16-5188-5_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-5187-8
Online ISBN: 978-981-16-5188-5
eBook Packages: Computer Science (R0)