Abstract
This paper presents a two-stage, parameter-free technique for the physical layout analysis of a document. In the first stage, a Gaussian Mixture Model (GMM) fitted with Expectation-Maximization (EM) is applied to the height-frequency data, followed by recursive merging to obtain the best number of components. These components are classified into running text, titles, and graphical elements. Using a Next Nearest Neighbor analysis, running text and title text are grouped into blocks to form the initial layout. In the second stage, the graphical elements are further divided into text boxes, light-colored text on a dark background, line separators, and graphics, which gives the final layout. The proposed method achieved accuracies of 86.30% and 75.14% in recognizing text and non-text elements, respectively, on our generated dataset, which contains over 700 documents. Results on the ICDAR dataset show accuracy comparable to some of the best and most popular algorithms, such as MHS (winner of the ICDAR-RDCL2015 competition) and PRImA’s Aletheia. The strength of our algorithm is that it is entirely free of manually tuned parameters.
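The first stage described above can be illustrated with a minimal sketch, written under stated assumptions rather than taken from the paper. It fits a one-dimensional GMM to connected-component heights with plain EM and then recursively merges neighbouring components. The function names `fit_gmm_1d` and `merge_overlapping`, the evenly spaced initialisation, the iteration count, the variance floor, and the overlap criterion (means closer than the sum of the standard deviations) are all hypothetical choices made for illustration; the abstract does not specify how the authors initialise EM or decide which components to merge.

```python
import math

def fit_gmm_1d(heights, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture to heights with plain EM.

    Returns a list of (weight, mean, variance) tuples sorted by mean.
    """
    lo, hi = min(heights), max(heights)
    span = (hi - lo) or 1.0
    # Deterministic start (an assumption): means evenly spaced over the range.
    means = [lo + (j + 0.5) * span / k for j in range(k)]
    variances = [(span / (2 * k)) ** 2] * k
    weights = [1.0 / k] * k
    n = len(heights)
    for _ in range(iters):
        # E-step: responsibility of each component for each height.
        resp = []
        for x in heights:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, heights)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, heights)) / nj
            variances[j] = max(var, 1e-3)  # illustrative floor against collapse
    return sorted(zip(weights, means, variances), key=lambda c: c[1])

def merge_overlapping(components):
    """Recursively merge neighbouring components until none overlap.

    Hypothetical criterion (not from the paper): two components overlap
    when their means are closer than the sum of their standard deviations.
    """
    comps = sorted(components, key=lambda c: c[1])
    for i in range(len(comps) - 1):
        (w1, m1, v1), (w2, m2, v2) = comps[i], comps[i + 1]
        if m2 - m1 < math.sqrt(v1) + math.sqrt(v2):
            w = w1 + w2
            m = (w1 * m1 + w2 * m2) / w
            # Moment matching: the merged Gaussian keeps the pair's mean and variance.
            v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
            return merge_overlapping(comps[:i] + [(w, m, v)] + comps[i + 2:])
    return comps

# Synthetic heights: many small components (body text) and a few tall ones (titles).
heights = [9, 10, 11] * 10 + [29, 30, 31] * 3
components = merge_overlapping(fit_gmm_1d(heights, 2))
```

On this synthetic input the two fitted components stay separate, and the heavier one sits at the smaller height. Reading the dominant component as running text and the taller, rarer one as titles is likewise only an assumption about how the classification step might use the fitted mixture.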
References
Alginahi Y, Fekri D, Sid-Ahmed MA (2005) A neural-based page segmentation system. J Circ Syst Comput 14(1):109–122
Antonacopoulos A, Clausner C, Papadopoulos C, Pletschacher S (2015) Icdar2015 competition on recognition of documents with complex layouts-rdcl2015. In: 2015 13th International conference on document analysis and recognition (ICDAR). IEEE, pp 1151–1155
Antonacopoulos A, Pletschacher S, Bridson D, Papadopoulos C (2009) Icdar 2009 page segmentation competition. In: 2009 10th International conference on document analysis and recognition. IEEE, pp 1370–1374
Augusto Borges Oliveira D, Palhares Viana M (2017) Fast cnn-based document layout analysis. In: Proceedings of the IEEE international conference on computer vision workshops, pp 1173–1180
Baird HS, Jones SE, Fortune SJ (1990) Image segmentation by shape-directed covers. In: [1990] Proceedings. 10th International conference on pattern recognition, vol 1. IEEE, pp 820–825
Binmakhashen GM, Mahmoud SA (2019) Document layout analysis: a comprehensive survey. ACM Comput Surv (CSUR) 52(6):1–36
Chaudhuri AR, Mandal AK, Chaudhuri BB (2002) Page layout analyser for multilingual indian documents. In: Language engineering conference, 2002. Proceedings. IEEE, pp 24–32
Chen K, Yin F, Liu C-L (2013) Hybrid page segmentation with efficient whitespace rectangles extraction and grouping. In: 2013 12th International conference on document analysis and recognition. IEEE, pp 958–962
Clausner C, Pletschacher S, Antonacopoulos A (2011) Aletheia-an advanced document layout and text ground-truthing system for production environments. In: 2011 International conference on document analysis and recognition. IEEE, pp 48–52
Dasigi P, Jain R, Jawahar CV (2008) Document image segmentation as a spectral partitioning problem. In: 2008 Sixth Indian conference on computer vision, graphics & image processing. IEEE, pp 305–312
Liu D-R, Guo B-L, Tian X-D (2002) An approach of page layout analysis based on active contour model. In: Proceedings. International conference on machine learning and cybernetics, vol 4, pp 1711–1714
Esposito F, Malerba D, Semeraro G (1995) A knowledge-based approach to the layout analysis. In: Proceedings of 3rd international conference on document analysis and recognition, vol 1. IEEE, pp 466–471
Fan K-C, Liu C-H, Wang Y-K (1994) Segmentation and classification of mixed text/graphics/image documents. Pattern Recogn Lett 15(12):1201–1209
Felhi M, Tabbone S, Segovia MVO (2014) Multiscale stroke-based page segmentation approach. In: 2014 11th IAPR International workshop on document analysis systems. IEEE, pp 6–10
Ferilli S, Biba M, Esposito F, Basile TMA (2009) A distance-based technique for non-manhattan layout analysis. In: 2009 10th International conference on document analysis and recognition. IEEE, pp 231–235
Forczmański P, Smoliński A, Nowosielski A, Małecki K (2019) Segmentation of scanned documents using deep-learning approach. In: International conference on computer recognition systems. Springer, pp 141–152
Grana C, Serra G, Manfredi M, Coppi D, Cucchiara R (2016) Layout analysis and content enrichment of digitized books. Multimed Tools Appl 75(7):3879–3900
Hadjar K, Hitz O, Ingold R (2001) Newspaper page decomposition using a split and merge approach. In: Proceedings of sixth international conference on document analysis and recognition, pp 1186–1189
Ittner DJ, Baird HS (1993) Language-free layout analysis. In: Proceedings of 2nd International conference on document analysis and recognition (ICDAR ’93), pp 336–340
Kamola G, Spytkowski M, Paradowski M, Markowska-Kaczmar U (2015) Image-based logical document structure recognition. Pattern Anal Applic 18(3):651–665
Kise K, Sato A, Iwata M (1998) Segmentation of page images using the area voronoi diagram. Comput Vis Image Underst 70(3):370–382
Kise K, Yanagida O, Takamatsu S (1996) Page segmentation based on thinning of background. In: Proceedings of 13th international conference on pattern recognition, vol 3. IEEE, pp 788–792
Krishnamoorthy M, Nagy G, Seth S, Viswanathan M (1993) Syntactic segmentation and labeling of digitized pages from technical journals. IEEE Trans Pattern Anal Mach Intell 15(7):737–747
Le VP, Nayef N, Visani M, Ogier J-M, De Tran C (2015) Text and non-text segmentation based on connected component features. In: 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, pp 1096–1100
Leng L, Li M, Kim C, Bi X (2017) Dual-source discrimination power analysis for multi-instance contactless palmprint recognition. Multimed Tools Appl 76(1):333–354
Leng L, Li M, Teoh ABJ (2013) Conjugate 2dpalmhash code for secure palm-print-vein verification. In: 2013 6th International congress on image and signal processing (CISP), vol 3. IEEE, pp 1705–1710
Leng L, Zhang J (2013) Palmhash code vs. palmphasor code. Neurocomputing 108:1–12
Leng L, Zhang J, Khan MK, Chen X, Alghathbar K (2010) Dynamic weighted discrimination power analysis: a novel approach for face and palmprint recognition in dct domain. Int J Phys Sci 5(17):2543–2554
Li X-H, Yin F, Liu C-L (2020) Page segmentation using convolutional neural network and graphical model. In: International workshop on document analysis systems. Springer, pp 231–245
Liang J, Ha J, Haralick RM, Phillips IT (1996) Document layout structure extraction using bounding boxes of different entitles. In: Proceedings third IEEE workshop on applications of computer vision. WACV’96. IEEE, pp 278–283
Liu F, Luo Y, Yoshikawa M, Hu D (2001) A new component based algorithm for newspaper layout analysis. In: Proceedings of sixth international conference on document analysis and recognition. IEEE, pp 1176–1180
Melinda L, Ghanapuram R, Bhagvati C (2017) Document layout analysis using multigaussian fitting. In: 2017 14th IAPR International conference on document analysis and recognition (ICDAR), vol 1. IEEE, pp 747–752
Mitchell PE, Yan H (2001) Newspaper document analysis featuring connected line segmentation. In: Proceedings of sixth international conference on document analysis and recognition, pp 1181–1185
Mitchell PE, Yan H (2000) Document page segmentation and layout analysis using soft ordering. In: Proceedings 15th international conference on pattern recognition. ICPR-2000, vol 1. IEEE, pp 458–461
Nagy G, Seth S, Viswanathan M (1992) A prototype document image analysis system for technical journals. Computer 25(7):10–22
O’Gorman L (1993) The document spectrum for page layout analysis. IEEE Trans Pattern Anal Mach Intell 15(11):1162–1173
Pati PB, Raju SS, Pati N, Ramakrishnan AG (2004) Gabor filters for document analysis in indian bilingual documents. In: Proceedings of international conference on intelligent sensing and information processing, 2004. IEEE, pp 123–126
Pavlidis T, Zhou J (1992) Page segmentation and classification. CVGIP: Graphical models and image processing 54(6):484–496
Qiao Y-L, Lu Z-M, Song C-Y, Sun S-H (2006) Document image segmentation using gabor wavelet and kernel-based methods. In: 2006 1st International symposium on systems and control in aerospace and astronautics. IEEE, pp 5–pp
Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Shih FY, Chen S-S (1996) Adaptive document block segmentation and classification. IEEE Trans Syst Man Cybern, Part B (Cybernetics) 26(5):797–802
Singh V, Kumar B (2014) Document layout analysis for indian newspapers using contour based symbiotic approach. In: 2014 International conference on computer communication and informatics. IEEE, pp 1–4
Smith R (2007) An overview of the tesseract ocr engine. In: Ninth international conference on document analysis and recognition (ICDAR 2007), vol 2. IEEE, pp 629–633
Smith RW (2009) Hybrid page layout analysis via tab-stop detection. In: 2009 10th International conference on document analysis and recognition. IEEE, pp 241–245
Sun H-M (2005) Page segmentation for manhattan and non-manhattan layout documents via selective crla. In: Eighth international conference on document analysis and recognition (ICDAR’05). IEEE, pp 116–120
Taylor SL, Dahl DA, Lipshutz M, Weir C, Norton LM, Nilson RW, Linebarger MC (1994) Integrating natural language understanding with document structure analysis. In: Integration of natural language and vision processing. Springer, pp 163–184
Tran TA, Na I-S, Kim S-H (2015) Hybrid page segmentation using multilevel homogeneity structure. In: Proceedings of the 9th international conference on ubiquitous information management and communication. ACM, p 78
Tran TA, Na IS, Kim SH (2016) Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology. Int J Doc Anal Recogn (IJDAR) 19(3):191–209
Wahl FM (1983) A new distance mapping and its use for shape measurement on binary patterns. Comput Vis Graph Image Process 23(2):218–226
Wong KY, Casey RG, Wahl FM (1982) Document analysis system. IBM J Res Dev 26(6):647–656
Funding
No funding was received for this paper.
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest, other than with members of the University of Hyderabad.
About this article
Cite this article
Melinda, L., Bhagvati, C. Parameter free approach for segmenting complex manhattan layouts. Multimed Tools Appl 82, 6581–6603 (2023). https://doi.org/10.1007/s11042-022-13400-2
DOI: https://doi.org/10.1007/s11042-022-13400-2