research-article

ProTegO: Protect Text Content against OCR Extraction Attack

Authors:

Nenghai Yu

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 7424 - 7434

https://doi.org/10.1145/3581783.3612076

Published: 27 October 2023

Abstract

Online documents greatly improve the efficiency of information exchange but also introduce security hazards, such as the ease with which text content can be copied and reused without authorization. To address copyright concerns, recent works have proposed converting reproducible text content into non-reproducible formats, making digital text content observable but not duplicable. However, as Optical Character Recognition (OCR) technology develops, adversaries can still take screenshots of the target text region and use OCR to extract the text content. None of the existing methods adapts well to this kind of OCR extraction attack. In this paper, we propose "ProTegO", a novel text content protection method against the OCR extraction attack, which generates adversarial underpaintings that do not affect human reading but interfere with OCR once a screenshot is taken. Specifically, we design a text-style universal adversarial underpainting generation framework, which can mislead both text recognition models and commercial OCR services. For invisibility, we take full advantage of the fusion property of human eyes and create complementary underpaintings that are displayed alternately on the screen. Experimental results demonstrate that ProTegO is a one-size-fits-all method that ensures good visual quality while achieving a high protection success rate on text recognition models with different architectures, outperforming state-of-the-art methods. Furthermore, we validate the feasibility of ProTegO on a wide range of popular commercial OCR services, including Microsoft, Tencent, Alibaba, Huawei, Baidu, Apple, and Xiaomi. Code will be available at https://github.com/Ruby-He/ProTegO.
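The complementary-underpainting idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes, for illustration only, that the eye's temporal fusion can be modeled as the per-pixel mean of two alternating frames, and the function names, pixel values, and perturbation pattern are all hypothetical.

```python
def complementary_frames(base, perturbation, lo=0, hi=255):
    """Build two frames whose per-pixel mean equals `base` (toy model)."""
    frame_a, frame_b = [], []
    for b, p in zip(base, perturbation):
        # Shrink the perturbation where it would leave the displayable range,
        # so both frames stay valid and remain exact complements.
        room = min(b - lo, hi - b)
        d = max(-room, min(room, p))
        frame_a.append(b + d)
        frame_b.append(b - d)
    return frame_a, frame_b


def fused(frame_a, frame_b):
    """Assumed fusion model: the mean of the two alternating frames."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]


# Hypothetical data: a flat light-gray row and a stand-in perturbation pattern.
base = [200] * 8
perturbation = [30, -30, 80, -80, 10, 0, -55, 55]
frame_a, frame_b = complementary_frames(base, perturbation)
```

Under this assumed model, the fused row equals the clean background, while any single frame (which is presumably what a screenshot captures) still carries the perturbation.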




Index Terms

1. Security and privacy
  1. Security services
    1. Digital rights management

Published In


MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik (University of Ottawa, Canada & MBZUAI, UAE)
Tao Mei (HiDream.ai, China)
Rita Cucchiara (University of Modena and Reggio Emilia, Italy)

Program Chairs:
Marco Bertini (University of Florence, Italy)
Diana Patricia Tobon Vallejo (Universidad de Medellin, Colombia)
Pradeep K. Atrey (University at Albany, State University of New York, USA)
M. Shamim Hossain (King Saud University, KSA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States



Funding Sources

National Natural Science Foundation of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 1,291 of 5,076 submissions, 25%
