Parameter free approach for segmenting complex manhattan layouts

Melinda, Laiphangbam; Bhagvati, Chakravarthy

doi:10.1007/s11042-022-13400-2

Published: 08 August 2022

Volume 82, pages 6581–6603, (2023)

Multimedia Tools and Applications

Abstract

This paper presents a two-stage, parameter-free technique for the physical layout analysis of a document. In the first stage, a Gaussian Mixture Model (GMM) fitted with Expectation-Maximization (EM) is applied to the height-frequency data, followed by recursive merging to obtain the best number of components. These components are classified into running text, titles, and graphical elements. Using a Next Nearest Neighbor analysis, running text and title text are grouped into blocks to form the initial layout. In the second stage, the graphical elements are further divided into text boxes, light-colored text on a dark background, line separators, and graphics, which gives the final layout. The proposed method achieved accuracies of 86.30% and 75.14% in recognizing text and non-text elements, respectively, on our generated dataset, which contains over 700 documents. Results on the ICDAR dataset show accuracy comparable to some of the best and most popular algorithms, such as MHS (winner of the ICDAR-RDCL2015 competition) and PRImA’s Aletheia. The strength of our algorithm is that it is entirely free of manually tuned parameters.
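The first stage can be illustrated with a minimal sketch, which is not the authors' implementation. It fits a one-dimensional GMM to connected-component heights with plain EM. The function names, the range-based initialisation, the variance floor, and the toy height values are all assumptions made for illustration. The BIC helper is a commonly used stand-in for choosing the number of components; the paper's recursive merging step is not reproduced here.

```python
import math

VAR_FLOOR = 0.25  # assumption: stops a component collapsing onto one integer height


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(heights, k, iters=100):
    """Fit a k-component 1-D mixture with EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(heights)
    lo, hi = min(heights), max(heights)
    # Deterministic initialisation: means spread evenly over the data range.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    variances = [max(((hi - lo) / (2.0 * k)) ** 2, VAR_FLOOR)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each height.
        resp = []
        for x in heights:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, heights)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, heights)) / nj
            variances[j] = max(var, VAR_FLOOR)
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in heights
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion for a 1-D mixture with 3k - 1 free parameters."""
    return -2.0 * log_lik + (3 * k - 1) * math.log(n)


def select_components(heights, max_k):
    """Return the k in 1..max_k with the lowest BIC (stand-in for recursive merging)."""
    scores = {k: bic(fit_gmm_1d(heights, k)[3], k, len(heights))
              for k in range(1, max_k + 1)}
    return min(scores, key=scores.get)


# Toy data (hypothetical): many small body-text heights, fewer larger title heights.
heights = [10, 11, 12, 13, 14] * 20 + [28, 30, 32] * 10
weights, means, variances, log_lik = fit_gmm_1d(heights, 2)
```

In this toy setting the component with the larger weight would correspond to running text and the other to titles. A real pipeline would take the heights from connected-component bounding boxes.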

Funding

No funding was received for this paper.

Author information

Authors and Affiliations

School of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
Laiphangbam Melinda & Chakravarthy Bhagvati

Authors

Laiphangbam Melinda
Chakravarthy Bhagvati

Corresponding author

Correspondence to Laiphangbam Melinda.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest, except with members of the University of Hyderabad.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Melinda, L., Bhagvati, C. Parameter free approach for segmenting complex manhattan layouts. Multimed Tools Appl 82, 6581–6603 (2023). https://doi.org/10.1007/s11042-022-13400-2

Received: 21 October 2020
Revised: 02 April 2022
Accepted: 02 July 2022
Published: 08 August 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11042-022-13400-2
