Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features

Ghosh, Manosij; Ghosh, Kushal Kanti; Bhowmik, Showmik; Sarkar, Ram

doi:10.1007/s11042-020-09844-z

Published: 19 September 2020

Volume 80, pages 3229–3249, (2021)

Multimedia Tools and Applications

Manosij Ghosh¹, Kushal Kanti Ghosh¹, Showmik Bhowmik² (ORCID: orcid.org/0000-0003-3971-5807) & Ram Sarkar¹

Abstract

Text non-text classification is an important research problem in document image processing. Yet it remains a largely neglected topic, particularly for unconstrained offline handwritten document images. For text non-text classification, researchers often employ high dimensional feature vectors, which not only increase computation time and storage requirements but also reduce classification accuracy because of redundant or irrelevant features. This motivates the use of feature selection (FS) algorithms to find the relevant subset of features in the original feature vector. In this paper, our aim is two-fold. First, we apply a coalition game based FS technique to find an optimal feature subset for classifying the components present in a handwritten document image as either text or non-text. Second, five variants of a popular texture based feature descriptor, Local Binary Pattern (LBP), along with its basic version, are fed to the FS module to identify only the useful patterns, which can pinpoint the regions of an image that are most informative for this classification task. To the best of our knowledge, applying a coalition game based FS technique to locate the feature-rich regions used for text non-text classification is completely novel. For experimentation, we have prepared an in-house dataset, along with its ground truth information, consisting of 104 handwritten engineering class notes and laboratory copies that include handwritten and printed text, graphical components, tables, etc. Experimental outcomes confirm that the proposed approach not only reduces the feature dimension significantly but also increases the recognition ability of all six feature vectors.
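
For readers unfamiliar with the descriptor named in the abstract, the following is a minimal sketch of the basic 8-neighbour LBP and its 256-bin histogram, which is the kind of feature vector a selection step could then prune. It is not the authors' implementation: the function names `basic_lbp` and `lbp_histogram`, the clockwise bit ordering, and the `>=` thresholding convention are assumptions made for illustration, and the five LBP variants and the coalition game based FS step are not shown.

```python
import numpy as np


def basic_lbp(image):
    """Return the basic 8-neighbour LBP code of every interior pixel.

    `image` is a 2-D grayscale array of at least 3x3 pixels. Border pixels
    are skipped, so the result is two rows and two columns smaller.
    """
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left pixel.
    # The ordering (and hence the bit weights) is an assumed convention.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes


def lbp_histogram(image):
    """Return the normalized 256-bin histogram of basic LBP codes."""
    codes = basic_lbp(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Each histogram bin corresponds to one local pattern, so selecting a subset of bins amounts to keeping only the patterns judged useful for separating text from non-text components.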

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Manosij Ghosh, Kushal Kanti Ghosh & Ram Sarkar
Department of Computer Science and Engineering, Ghani Khan Choudhury Institute of Engineering and Technology (GKCIET), Malda, India
Showmik Bhowmik

Corresponding author

Correspondence to Showmik Bhowmik.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghosh, M., Ghosh, K.K., Bhowmik, S. et al. Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features. Multimed Tools Appl 80, 3229–3249 (2021). https://doi.org/10.1007/s11042-020-09844-z

Received: 11 December 2019
Revised: 23 July 2020
Accepted: 09 September 2020
Published: 19 September 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11042-020-09844-z
