Abstract
Most historical documents exist in printed form, and preserving and restoring them is costly because of the storage space and physical inspection they require. Scanners can convert these materials to electronic form, but the resulting images are polluted with noise, which increases storage demand and lowers OCR accuracy. The most appropriate remedy is therefore noise reduction. Converting the scan to a low-resolution grayscale image and then binarizing it reduces the volume of input data. A hidden feature space is then extracted from the binary images by the KF-CM method, based on binary pixel quantization. In the preprocessing stage, the local-minimum points in the binarized image segments define 33 variables. After preprocessing, the KF-CM method groups the pixels of the scanned document image into noise, text, and background categories according to their characteristics, so that noise reduction and binarization are accomplished at the same time. The proposed approach binarizes the bit planes of a noisy image by selecting local thresholds. It is evaluated on document image datasets and compared with widely used binarization-based feature extraction methods, and it outperforms all of the compared methods.
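The abstract does not expand "KF-CM" or give its update rules. The sketch below assumes it denotes kernel fuzzy c-means with a Gaussian kernel, applied to grayscale pixel intensities, and that the three clusters are mapped to text, noise, and background by their centre intensity (darkest = text, brightest = background). The function names kfcm and denoise_binarize, the parameters m, sigma, and iters, and the dark/mid/bright initialisation are hypothetical illustrative choices, not the authors' implementation. It omits the 33-variable feature extraction and the bit-plane local thresholding described above.

```python
import math
from typing import List, Sequence, Tuple


def kfcm(pixels: Sequence[float], centers: Sequence[float], m: float = 2.0,
         sigma: float = 64.0, iters: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Kernel fuzzy c-means with a Gaussian kernel on scalar intensities.

    Assumed reading of "KF-CM"; returns the cluster centres v[i] and the
    membership matrix u[i][k] (cluster i, pixel k).
    """
    v = [float(ci) for ci in centers]
    c, n = len(v), len(pixels)
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Gaussian kernel K(x_k, v_i) = exp(-(x_k - v_i)^2 / sigma^2)
        K = [[math.exp(-((x - vi) ** 2) / sigma ** 2) for x in pixels] for vi in v]
        # Membership update; 1 - K is the kernel-induced distance (up to a factor 2)
        for k in range(n):
            d = [1.0 - K[i][k] for i in range(c)]
            hits = [i for i in range(c) if d[i] <= 1e-12]
            if hits:
                # Pixel coincides with a centre: give it full membership there
                for i in range(c):
                    u[i][k] = 1.0 if i == hits[0] else 0.0
                continue
            w = [di ** (-1.0 / (m - 1.0)) for di in d]
            total = sum(w)
            for i in range(c):
                u[i][k] = w[i] / total
        # Centre update weighted by u^m * K
        for i in range(c):
            num = den = 0.0
            for k, x in enumerate(pixels):
                weight = (u[i][k] ** m) * K[i][k]
                num += weight * x
                den += weight
            if den > 0.0:
                v[i] = num / den
    return v, u


def denoise_binarize(image: List[List[int]]) -> List[List[int]]:
    """Cluster pixels into three groups and keep only the 'text' group as black."""
    rows, cols = len(image), len(image[0])
    flat = [float(p) for row in image for p in row]
    lo, hi = min(flat), max(flat)
    # Assumed initialisation: one dark, one mid-grey, one bright centre
    v, u = kfcm(flat, [lo, (lo + hi) / 2.0, hi])
    # Assumed mapping: darkest centre = text, brightest = background, middle = noise
    text = min(range(3), key=lambda i: v[i])
    out = []
    for k in range(len(flat)):
        best = max(range(3), key=lambda i: u[i][k])
        # Text -> 0 (black); noise and background -> 255 (white)
        out.append(0 if best == text else 255)
    return [out[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    # Dark strokes (~20), grey specks (~120), paper (~230)
    patch = [[230, 228, 20, 232], [25, 120, 231, 18], [229, 118, 22, 233]]
    for row in denoise_binarize(patch):
        print(row)
```

On this toy patch the centres settle near 21, 119, and 230, so only the stroke pixels map to 0 while the grey specks are absorbed into the white output together with the paper. This is the sense in which a three-way clustering can perform noise removal and binarization in one pass; a two-way split would instead have to decide which side the specks fall on.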







Funding
The authors received no specific funding for this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
The manuscript has not been submitted to more than one journal for simultaneous consideration. The manuscript has not been published previously. The research does not involve human participants and/or animals.
About this article
Cite this article
Umadevi, K.S., Thakare, K.S., Patil, S. et al. Dynamic hidden feature space detection of noisy image set by weight binarization. SIViP 17, 761–768 (2023). https://doi.org/10.1007/s11760-022-02284-2
DOI: https://doi.org/10.1007/s11760-022-02284-2