Abstract
Struck-out handwritten words in document images degrade the performance of methods for several important applications, such as handwriting recognition; writer, gender, and fraudulent document identification; document and writer age estimation; analysis of normal/abnormal behavior of a person; and descriptive answer evaluation. This work proposes a new method that combines connected component analysis for text component detection with deep learning for classifying struck-out and non-struck-out words. For text component detection, the proposed method uses stroke width to detect the edges of text in images and then performs smoothing operations to remove noise. Morphological operations are then applied to the smoothed images, and connected components are labeled as text by fixing bounding boxes around them. Inspired by the success of deep learning models, we explore DenseNet for classifying struck-out and non-struck-out handwritten components, taking the text components as input. Experimental results on our dataset show that the proposed method outperforms existing methods in terms of classification rate.
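The component detection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an already binarized image given as a list of rows of 0/1 values, skips the stroke-width edge detection and smoothing steps, and uses a plain square dilation as a stand-in for the unspecified morphological operations. The function names `dilate` and `component_boxes` and the toy image `page` are hypothetical.

```python
from collections import deque


def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element.

    Assumption: stands in for the morphological operations in the abstract,
    whose exact form is not given there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def component_boxes(img):
    """Return one (x_min, y_min, x_max, y_max) box per 8-connected component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


# Toy binary image (hypothetical data): two thin strokes close together
# on the left and one block on the right.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

raw_boxes = component_boxes(page)                     # one box per stroke
word_boxes = component_boxes(dilate(page, radius=1))  # nearby strokes merged
print(raw_boxes)
print(word_boxes)
```

In this toy case the dilation merges the two neighbouring strokes into a single component, so the second call yields fewer, larger boxes. In a full pipeline, the image region inside each box would be cropped and passed to the classifier (DenseNet in the paper), which is not shown here.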
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Shivakumara, P. et al. (2021). A Connected Component-Based Deep Learning Model for Multi-type Struck-Out Component Classification. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_11
DOI: https://doi.org/10.1007/978-3-030-86159-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86158-2
Online ISBN: 978-3-030-86159-9
eBook Packages: Computer Science, Computer Science (R0)