Abstract
Scene text recognition (STR) has made tremendous progress in the era of deep learning. However, attacks on sequential STR models have not attracted sufficient scholarly attention. The few existing studies that fool STR models are white-box attacks, which limits their practical applicability. In this paper, we propose a novel black-box attack on STR models that uses only the probability distribution of the model output. Instead of disturbing most pixels, as existing STR attack methods do, our approach disturbs only a few pixels and exploits the inherent characteristics of recurrent neural networks (RNNs) to propagate the perturbations. Experiments validate the effectiveness and superiority of our attack approach.
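As an illustration only, the following Python sketch shows what a few-pixel, score-based black-box attack can look like. The abstract does not state the search procedure, so plain random search is assumed here. The function `toy_recognizer` is a hypothetical stand-in for an STR model with a recurrent state, and the parameters `k`, `iterations`, and `seed` are likewise illustrative. The attack sees only the per-step output probabilities, never gradients or weights.

```python
import math
import random


def toy_recognizer(image):
    """Hypothetical stand-in for a black-box STR model (not the paper's model).

    Treats each image column as one time step and returns a two-class
    probability distribution per step. A recurrent state carries the effect
    of earlier columns into later steps.
    """
    height, width = len(image), len(image[0])
    state = 0.0
    outputs = []
    for x in range(width):
        column_mean = sum(image[y][x] for y in range(height)) / height
        state = 0.5 * state + column_mean  # simple recurrence
        p1 = 1.0 / (1.0 + math.exp(-8.0 * (state - 0.5)))
        outputs.append([1.0 - p1, p1])
    return outputs


def decode(probs):
    """Greedy decoding: most probable class at every step."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def label_log_prob(probs, labels):
    """Log-probability the model assigns to a given label sequence."""
    return sum(math.log(max(p[c], 1e-12)) for p, c in zip(probs, labels))


def few_pixel_attack(image, query, k=2, iterations=300, seed=0):
    """Random search over at most k pixels, using only output probabilities.

    Assumed procedure for illustration: each candidate perturbs up to k
    pixels of the clean image and is kept if it lowers the log-probability
    of the originally predicted labels.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    labels = decode(query(image))
    best_image = [row[:] for row in image]
    best_score = label_log_prob(query(image), labels)
    for _ in range(iterations):
        candidate = [row[:] for row in image]  # always start from the clean image
        for _ in range(k):
            y, x = rng.randrange(height), rng.randrange(width)
            candidate[y][x] = rng.random()
        score = label_log_prob(query(candidate), labels)
        if score < best_score:
            best_image, best_score = candidate, score
    return best_image, labels, decode(query(best_image))


clean = [[0.2, 0.6, 0.1, 0.3] for _ in range(3)]
adversarial, before, after = few_pixel_attack(clean, toy_recognizer, k=2)
changed = sum(a != c for ra, rc in zip(adversarial, clean) for a, c in zip(ra, rc))
print("pixels changed:", changed, "labels before:", before, "labels after:", after)
```

Because of the recurrence in the toy model, a change to an early column also shifts the probabilities at later steps, which is the kind of propagation the abstract refers to. Whether the decoded labels actually flip depends on the model and the search budget.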
Y. Xu and P. Dai: equal contribution.
This research work has been funded by the Guangdong United Fund (Grant No. E010061112), the Xinjiang Fund (Grant No. Y910071112) and the Key Fund of the Institute of Computing Technology of the Chinese Academy of Sciences (Grant No. Y950421112).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Y., Dai, P., Cao, X. (2021). Less Is Better: Fooling Scene Text Recognition with Minimal Perturbations. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1517. Springer, Cham. https://doi.org/10.1007/978-3-030-92310-5_62
DOI: https://doi.org/10.1007/978-3-030-92310-5_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92309-9
Online ISBN: 978-3-030-92310-5
eBook Packages: Computer Science, Computer Science (R0)