Less Is Better: Fooling Scene Text Recognition with Minimal Perturbations

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1517)

Abstract

Scene text recognition (STR) has made tremendous progress in the era of deep learning. However, adversarial attacks on sequential STR models have not attracted sufficient scholarly attention. The few existing studies that fool STR models are white-box attacks, which limits their practical applicability. In this paper, we propose a novel black-box attack on STR models that uses only the probability distribution of the model output. Instead of disturbing most pixels like existing STR attack methods, our approach disturbs very few pixels and exploits the inherent characteristics of recurrent neural networks (RNNs) to propagate the perturbations. Experiments validate the effectiveness and superiority of our attack approach.
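To make the idea of a few-pixel, probability-only attack concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not name the search procedure, so a simplified differential-evolution-style search is assumed here, and `toy_recognizer`, `few_pixel_attack`, and all parameter values are hypothetical names chosen for illustration. The toy model carries a running state across image columns as a crude stand-in for recurrence, so a change in one column can also influence later predictions.

```python
import math
import random


def toy_recognizer(image, n_classes=3):
    """Hypothetical stand-in for an STR model: one softmax per image column.

    A running state is carried from column to column, loosely imitating
    how a recurrent layer lets earlier inputs affect later outputs.
    """
    height = len(image)
    state = [0.0] * n_classes
    probs = []
    for col in range(len(image[0])):
        column = [image[row][col] for row in range(height)]
        # Class c is scored by the pixels in rows c, c + n_classes, ...
        feats = [4.0 * sum(column[c::n_classes]) for c in range(n_classes)]
        state = [0.5 * s + f for s, f in zip(state, feats)]
        peak = max(state)
        exps = [math.exp(v - peak) for v in state]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def decode(probs):
    """Greedy decoding: most probable class at every step."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def apply_pixels(image, pixels):
    """Return a copy of image with each (row, col, value) overwritten."""
    out = [row[:] for row in image]
    for row, col, value in pixels:
        out[row][col] = value
    return out


def few_pixel_attack(model, image, n_pixels=2, pop_size=20, steps=50, seed=0):
    """Black-box search that only queries the model's output probabilities."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    labels = decode(model(image))

    def fitness(candidate):
        # Log-probability of the clean labels; lower means a stronger attack.
        probs = model(apply_pixels(image, candidate))
        return sum(math.log(p[l] + 1e-12) for p, l in zip(probs, labels))

    def clip(value, low, high):
        return min(high, max(low, value))

    population = [
        [(rng.randrange(height), rng.randrange(width), rng.random())
         for _ in range(n_pixels)]
        for _ in range(pop_size)
    ]
    scores = [fitness(c) for c in population]
    for _ in range(steps):
        for i in range(pop_size):
            a, b, c = rng.sample(population, 3)
            # Differential mutation: a + F * (b - c), with F = 0.5 (assumed).
            child = [
                (clip(round(ra + 0.5 * (rb - rc)), 0, height - 1),
                 clip(round(ca + 0.5 * (cb - cc)), 0, width - 1),
                 clip(va + 0.5 * (vb - vc), 0.0, 1.0))
                for (ra, ca, va), (rb, cb, vb), (rc, cc, vc) in zip(a, b, c)
            ]
            score = fitness(child)
            if score < scores[i]:
                population[i], scores[i] = child, score
    best = population[min(range(pop_size), key=scores.__getitem__)]
    adversarial = apply_pixels(image, best)
    return adversarial, best, labels, decode(model(adversarial))


# Usage on a 3 x 4 toy "image" with pixel values in [0, 1].
image = [[0.6, 0.4, 0.4, 0.6],
         [0.4, 0.6, 0.4, 0.4],
         [0.4, 0.4, 0.6, 0.4]]
adversarial, pixels, clean_labels, adv_labels = few_pixel_attack(toy_recognizer, image)
print("clean:", clean_labels, "adversarial:", adv_labels, "pixels:", pixels)
```

The search never reads gradients or weights; it only compares output probabilities between queries, which is what makes the setting black-box. The pixel budget is enforced by construction, since each candidate encodes at most `n_pixels` overwrites.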

Y. Xu and P. Dai—Equal contribution.

This research work has been funded by the Guangdong United Fund (Grant No. E010061112), the Xinjiang Fund (Grant No. Y910071112) and the Key Fund of the Institute of Computing Technology of the Chinese Academy of Sciences (Grant No. Y950421112).

Author information

Corresponding authors

Correspondence to Yikun Xu or Xiaochun Cao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, Y., Dai, P., Cao, X. (2021). Less Is Better: Fooling Scene Text Recognition with Minimal Perturbations. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_62

  • DOI: https://doi.org/10.1007/978-3-030-92310-5_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92309-9

  • Online ISBN: 978-3-030-92310-5

  • eBook Packages: Computer Science, Computer Science (R0)
