Abstract
Many military and business documents, such as memorandums, invoices, medical records and bills, are transmitted over networks as images. These scanned document images may contain private information that must not be accessed by third parties or unauthorized users. Most encryption techniques encrypt the complete image even though only parts of it require protection, which increases both the computational cost and the communication overhead. In this paper, we introduce a novel technique for selective multi-modal text recognition and encryption in scanned document images that addresses both security and efficiency. Maximally stable extremal regions (MSER) are used in combination with two filtering techniques to automatically detect regions of interest (ROI) that contain text. The proposed technique combines optical character recognition (OCR) and natural language processing (NLP) for text recognition, thus taking advantage of different modalities of data. The multi-modal recognized text is then encrypted using the advanced encryption standard in cipher-block chaining mode (AES-CBC) and a hybrid chaotic map. Simulation results illustrate the reliability and efficiency of the proposed ROI-based encryption scheme. Moreover, security analysis shows that the proposed encryption technique is robust against linear and differential cryptanalytic attacks.
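The efficiency argument rests on encrypting only the detected regions instead of the whole image. The sketch below illustrates that idea under stated assumptions; it is not the authors' scheme. It works directly on grayscale ROI pixels (the paper encrypts the recognized text), and it uses a plain logistic-map keystream as a stand-in for the paper's hybrid chaotic map, with no AES-CBC stage because the Python standard library has no AES. The box format `(row, col, height, width)`, the key parameters `x0` and `r`, the burn-in length and the byte quantisation are all hypothetical choices made for illustration. A bare XOR keystream like this has no diffusion and is not secure on its own.

```python
"""Hypothetical sketch of ROI-based selective encryption (not the paper's exact scheme)."""
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of ints in 0..255
Box = Tuple[int, int, int, int]  # assumed ROI format: (row, col, height, width)


def logistic_keystream(x0: float, r: float, n: int, burn_in: int = 100) -> List[int]:
    """Return n keystream bytes from the logistic map x <- r * x * (1 - x).

    x0 in (0, 1) and r close to 4 play the role of the secret key. The first
    `burn_in` iterates are discarded before any bytes are emitted.
    """
    if not (0.0 < x0 < 1.0):
        raise ValueError("x0 must lie in (0, 1)")
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    out = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 1e6) % 256)  # quantise the chaotic state to one byte
    return out


def selective_xor(img: Image, rois: List[Box], x0: float, r: float = 3.99) -> Image:
    """XOR only the pixels inside the ROIs with the chaotic keystream.

    Pixels outside every ROI are copied unchanged. XOR is self-inverse, so
    calling this again with the same key and ROIs restores the input.
    """
    out = [row[:] for row in img]
    # Visit ROI pixels in a fixed order, counting overlapping pixels only once.
    coords = []
    seen = set()
    for (r0, c0, h, w) in rois:
        for i in range(r0, r0 + h):
            for j in range(c0, c0 + w):
                if (i, j) not in seen:
                    seen.add((i, j))
                    coords.append((i, j))
    ks = logistic_keystream(x0, r, len(coords))
    for (i, j), k in zip(coords, ks):
        out[i][j] = img[i][j] ^ k
    return out


if __name__ == "__main__":
    img = [[(i * 16 + j) % 256 for j in range(16)] for i in range(16)]
    rois = [(2, 3, 4, 5)]  # e.g. one text box from an MSER-style detector
    enc = selective_xor(img, rois, x0=0.3141)
    assert selective_xor(enc, rois, x0=0.3141) == img
```

Only the pixels inside the boxes are touched, so the work grows with the ROI area rather than the full page. In the pipeline described above the boxes would come from the MSER-based detector, and a practical scheme would add a diffusing block cipher such as AES-CBC on top of the chaotic stage.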

Data availability statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version. This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue. The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kayani, M., Ghafoor, A. & Riaz, M.M. Multi-modal text recognition and encryption in scanned document images. J Supercomput 79, 7916–7936 (2023). https://doi.org/10.1007/s11227-022-04912-7
DOI: https://doi.org/10.1007/s11227-022-04912-7