Multi-modal text recognition and encryption in scanned document images

The Journal of Supercomputing

Abstract

Many military and business documents, such as memorandums, invoices, medical records, and bills, are transmitted over networks as images. These scanned document images may contain private information that must not be accessed by third parties or unauthorized users. Most encryption techniques encrypt the complete image even though only parts of it require protection, which increases computational cost and communication overhead. In this paper, we introduce a novel technique for selective multi-modal text recognition and encryption in scanned document images that addresses both security and efficiency. Maximally stable extremal regions (MSER) are used in combination with two filtering techniques to automatically detect regions of interest (ROI) that contain text. The proposed technique combines optical character recognition (OCR) and natural language processing (NLP) for text recognition, thus taking advantage of different modalities of data. The multi-modal recognized text is then encrypted using the Advanced Encryption Standard in cipher-block chaining mode (AES-CBC) and a hybrid chaotic map. Simulation results illustrate the reliability and efficiency of the proposed ROI-based encryption scheme. Moreover, security analysis demonstrates the robustness of the proposed encryption technique against linear and differential cryptographic attacks.
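The idea of encrypting only the text-bearing regions rather than the whole image can be illustrated with a minimal sketch. It is not the authors' scheme: the abstract does not specify the hybrid chaotic map, so a plain logistic map is swapped in here as an assumption, the AES-CBC stage is omitted entirely, and the ROI is given by hand instead of being detected with MSER. The names `logistic_keystream`, `xor_roi`, and the `(top, left, height, width)` box format are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image, pixel values 0..255
Box = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical ROI format


def logistic_keystream(x0: float, r: float, n: int, burn_in: int = 100) -> List[int]:
    """Generate n key bytes from the logistic map x -> r * x * (1 - x).

    Assumption: this simple map stands in for the paper's hybrid chaotic map.
    """
    x = x0
    for _ in range(burn_in):          # discard the initial transient
        x = r * x * (1.0 - x)
    stream = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        stream.append(int(x * 256) % 256)  # quantize the state to one byte
    return stream


def xor_roi(image: Image, box: Box, x0: float, r: float = 3.99) -> Image:
    """XOR only the pixels inside `box` with the keystream; return a new image.

    XOR is its own inverse, so the same call with the same key decrypts.
    """
    top, left, height, width = box
    stream = logistic_keystream(x0, r, height * width)
    out = [row[:] for row in image]   # copy so the input stays untouched
    k = 0
    for i in range(top, top + height):
        for j in range(left, left + width):
            out[i][j] ^= stream[k]
            k += 1
    return out


# Toy 6x8 "document" with a hand-picked 3x4 region standing in for a text ROI.
image = [[(7 * i + 13 * j) % 256 for j in range(8)] for i in range(6)]
roi = (1, 2, 3, 4)
cipher = xor_roi(image, roi, x0=0.3141)
plain = xor_roi(cipher, roi, x0=0.3141)
```

Only the 12 pixels inside the region are processed, which is where the claimed saving over whole-image encryption would come from. A keystream XOR of this kind is far weaker than the AES-CBC plus chaotic-map combination described above and is shown only to make the selective step concrete.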

Data availability statement

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Author information

Corresponding author

Correspondence to Abdul Ghafoor.

Ethics declarations

Conflict of interest

All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version. This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue. The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kayani, M., Ghafoor, A. & Riaz, M.M. Multi-modal text recognition and encryption in scanned document images. J Supercomput 79, 7916–7936 (2023). https://doi.org/10.1007/s11227-022-04912-7

  • DOI: https://doi.org/10.1007/s11227-022-04912-7