Less Is Better: Fooling Scene Text Recognition with Minimal Perturbations

Xu, Yikun; Dai, Pengwen; Cao, Xiaochun

doi:10.1007/978-3-030-92310-5_62

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1517)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Scene text recognition (STR) has made tremendous progress in the era of deep learning. However, attacks on sequential STR models have not attracted sufficient scholarly attention. The very few existing studies that fool STR models are white-box attacks and are therefore limited in practical applications. In this paper, we propose a novel black-box attack on STR models that uses only the probability distribution of the model output. Instead of disturbing most pixels, as existing STR attack methods do, our approach disturbs very few pixels and exploits the characteristics of recurrent neural networks (RNNs) to propagate the perturbations. Experiments validate the effectiveness and superiority of our attack approach.
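The abstract describes the attack only at a high level. As a rough illustration of what a score-based, few-pixel black-box attack can look like, the sketch below uses differential evolution (DE/rand/1 with F = 0.5 and no crossover), in the style of Su et al.'s one-pixel attack: each candidate encodes k pixels as (row, column, value) triples, and the only feedback from the model is its per-step probability distribution. This is not the authors' algorithm. The recognizer `toy_model`, the score `true_log_likelihood` (which assumes one output step per character, with no CTC or attention decoding), and all hyperparameters are hypothetical. The toy model also has no RNN, so the perturbation propagation described above is not modeled.

```python
import math
import random


def toy_model(img):
    """Hypothetical black-box recognizer (not from the paper): one output
    step per image column, two classes, p(class 1) rises with column brightness."""
    h = len(img)
    probs = []
    for t in range(len(img[0])):
        mean = sum(img[r][t] for r in range(h)) / h
        p1 = 1.0 / (1.0 + math.exp(-10.0 * (mean - 0.5)))
        probs.append([1.0 - p1, p1])
    return probs


def greedy_decode(probs):
    """Most likely class at each output step."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def true_log_likelihood(probs, label):
    """Assumed attack score: sum_t log p_t(y_t) for the true label.
    Lower means the model is less confident in the correct text."""
    return sum(math.log(max(p[y], 1e-12)) for p, y in zip(probs, label))


def perturb(img, cand):
    """Copy img and overwrite the k pixels encoded as flat (row, col, value) triples."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(0, len(cand), 3):
        r, c, v = cand[i:i + 3]
        out[min(h - 1, int(r))][min(w - 1, int(c))] = v
    return out


def few_pixel_attack(model, img, label, k=1, pop=20, iters=30, f=0.5, seed=0):
    """DE/rand/1 without crossover over k-pixel candidates, querying only
    model(img) -> per-step probabilities. Returns (adversarial image, prediction)."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    bounds = [(0.0, float(h)), (0.0, float(w)), (0.0, 1.0)] * k

    def score(cand):
        return true_log_likelihood(model(perturb(img, cand)), label)

    popn = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop)]
    scores = [score(cand) for cand in popn]
    for _ in range(iters):
        for i in range(pop):
            # Mutation: x_a + F * (x_b - x_c), clipped to the search bounds.
            a, b, c = rng.sample([j for j in range(pop) if j != i], 3)
            child = [min(hi, max(lo, popn[a][g] + f * (popn[b][g] - popn[c][g])))
                     for g, (lo, hi) in enumerate(bounds)]
            child_score = score(child)
            if child_score < scores[i]:  # greedy selection: keep the better one
                popn[i], scores[i] = child, child_score
        best = popn[min(range(pop), key=scores.__getitem__)]
        if greedy_decode(model(perturb(img, best))) != list(label):
            break  # untargeted success: the decoded text changed
    best = popn[min(range(pop), key=scores.__getitem__)]
    adv = perturb(img, best)
    return adv, greedy_decode(model(adv))


clean = [[0.4] * 6 for _ in range(2)]      # 2 x 6 image, uniformly dark
label = greedy_decode(toy_model(clean))    # [0, 0, 0, 0, 0, 0]
adv, pred = few_pixel_attack(toy_model, clean, label, k=1)
```

The search treats the model purely as a function from image to probabilities, so it never needs gradients. Only k pixels can change, which matches the "minimal perturbation" budget in spirit. A real STR attack would need a score that respects the recognizer's decoding scheme.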

Y. Xu and P. Dai contributed equally.

This research work has been funded by the Guangdong United Fund (Grant No. E010061112), the Xinjiang Fund (Grant No. Y910071112) and the Key Fund of the Institute of Computing Technology of the Chinese Academy of Sciences (Grant No. Y950421112).

Author information

Authors and Affiliations

State Key Laboratory of Information Security, The Institute of Information Engineering, The Chinese Academy of Sciences, Beijing, China
Yikun Xu, Pengwen Dai & Xiaochun Cao
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Yikun Xu, Pengwen Dai & Xiaochun Cao

Corresponding authors

Correspondence to Yikun Xu or Xiaochun Cao.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

About this paper

Cite this paper

Xu, Y., Dai, P., Cao, X. (2021). Less Is Better: Fooling Scene Text Recognition with Minimal Perturbations. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_62

DOI: https://doi.org/10.1007/978-3-030-92310-5_62
Published: 02 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92309-9
Online ISBN: 978-3-030-92310-5
eBook Packages: Computer Science; Computer Science (R0)
