DOI: 10.1145/3581783.3612076
Research article

ProTegO: Protect Text Content against OCR Extraction Attack

Published: 27 October 2023

Abstract

Online documents greatly improve the efficiency of information exchange but also create potential security hazards: text content can readily be copied and reused without authorization. To address copyright concerns, recent works have proposed converting reproducible text content into non-reproducible formats, making digital text observable but not duplicable. However, as Optical Character Recognition (OCR) technology advances, adversaries can still take screenshots of the target text region and use OCR to extract the text. None of the existing methods adapts well to this kind of OCR extraction attack. In this paper, we propose "ProTegO", a novel text content protection method against the OCR extraction attack, which generates adversarial underpaintings that do not affect human reading but interfere with OCR once a screenshot is taken. Specifically, we design a framework for generating text-style universal adversarial underpaintings that can mislead both text recognition models and commercial OCR services. For invisibility, we take full advantage of the fusion property of the human eye and create complementary underpaintings that are displayed alternately on the screen. Experimental results demonstrate that ProTegO is a one-size-fits-all method that ensures good visual quality while achieving a high protection success rate on text recognition models with different architectures, outperforming state-of-the-art methods. Furthermore, we validate the feasibility of ProTegO on a wide range of popular commercial OCR services, including Microsoft, Tencent, Alibaba, Huawei, Baidu, Apple, and Xiaomi. Code will be available at https://github.com/Ruby-He/ProTegO.
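The complementary-underpainting idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not ProTegO's implementation: it assumes a grayscale background, a given underpainting `delta` (hand-written here rather than adversarially optimized), frames built as background + delta and background - delta, and it approximates the eye's temporal fusion as the per-pixel mean of alternating frames. The helper names (`complementary_pair`, `perceived`, `max_abs_diff`) are hypothetical.

```python
from typing import List, Tuple

# Grayscale image as rows of pixel intensities in [0, 1].
Image = List[List[float]]


def clip(v: float) -> float:
    """Clamp a pixel intensity to the displayable range [0, 1]."""
    return max(0.0, min(1.0, v))


def complementary_pair(background: Image, delta: Image) -> Tuple[Image, Image]:
    """Hypothetical construction: frame A adds the underpainting, frame B subtracts it."""
    frame_a = [[clip(b + d) for b, d in zip(brow, drow)]
               for brow, drow in zip(background, delta)]
    frame_b = [[clip(b - d) for b, d in zip(brow, drow)]
               for brow, drow in zip(background, delta)]
    return frame_a, frame_b


def perceived(frames: List[Image]) -> Image:
    """Approximate the eye's temporal fusion as the per-pixel mean of the frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[i][j] for f in frames) / n for j in range(w)] for i in range(h)]


def max_abs_diff(a: Image, b: Image) -> float:
    """Largest per-pixel absolute difference between two equally sized images."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    background = [[0.8, 0.8, 0.8], [0.8, 0.8, 0.8]]
    # Stand-in underpainting; a real one would be optimized against OCR models.
    delta = [[0.10, -0.10, 0.05], [0.00, 0.15, -0.05]]
    frame_a, frame_b = complementary_pair(background, delta)
    fused = perceived([frame_a, frame_b, frame_a, frame_b])
    # Viewer: alternating frames average back to the plain background (near 0).
    print(max_abs_diff(fused, background))
    # Screenshot: a single frame still carries the full underpainting (about 0.15).
    print(max_abs_diff(frame_a, background))
```

With this naive construction, clipping near pure white (1.0) or pure black (0.0) breaks the cancellation, so the background level matters; how ProTegO itself handles this is not described in the abstract.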

Supplemental Material

MP4 file: presentation video


Cited By

  • TextSafety: Visual Text Vanishing via Hierarchical Context-Aware Interaction Reconstruction. IEEE Transactions on Information Forensics and Security 20 (2025), 1421-1433. DOI: 10.1109/TIFS.2025.3528249

    Published In

    ACM Conferences
    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adversarial examples
    2. optical character recognition (OCR)
    3. text protection

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 1,291 of 5,076 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 117
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 15 Feb 2025
