Parameter free approach for segmenting complex manhattan layouts

Multimedia Tools and Applications

Abstract

This paper presents a two-stage, parameter-free technique for the physical layout analysis of a document. In the first stage, a Gaussian Mixture Model (GMM) is fitted to the height-frequency data with Expectation-Maximization (EM), followed by recursive merging to obtain the best number of components. These components are classified into running text, titles, and graphical elements. Using a Next Nearest Neighbor analysis, running text and title text are grouped into blocks to form the initial layout. In the second stage, the graphical elements are further divided into text boxes, light-colored text on a dark background, line separators, and graphics, which gives the final layout. The proposed method achieved accuracies of 86.30% and 75.14% in recognizing text and non-text elements, respectively, on our generated dataset, which contains over 700 documents. Results on the ICDAR dataset show accuracy comparable to some of the best and most popular algorithms, such as MHS (winner of the ICDAR-RDCL2015 competition) and PRImA’s Aletheia. The strength of our algorithm is that it is entirely free of manually tuned parameters.
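The first stage described in the abstract (an EM-fitted GMM over heights, then merging of components) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the heights are bounding-box heights of connected components, which the abstract does not state. The function names `fit_gmm_1d` and `merge_components` are hypothetical, and the merge rule used here (merge neighbouring components whose means are closer than the sum of their standard deviations) is an illustrative stand-in for the paper's recursive merging criterion. The starting component count `k` and the iteration count are also illustrative choices.

```python
import math

def fit_gmm_1d(values, k, iters=200):
    """Fit a k-component 1D Gaussian mixture to `values` with plain EM."""
    data = sorted(values)
    n = len(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [data[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(var_all, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalised in the log domain for stability.
        resp = []
        for x in data:
            logp = [math.log(max(w, 1e-300)) - 0.5 * math.log(2 * math.pi * v)
                    - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj > 1e-12:
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                var = sum(r[j] * (x - means[j]) ** 2
                          for r, x in zip(resp, data)) / nj
                variances[j] = max(var, 1e-6)  # floor avoids a zero variance
    return list(zip(weights, means, variances))

def merge_components(components):
    """Repeatedly merge neighbouring components that overlap.

    Assumed rule (not from the paper): merge when the gap between two
    adjacent means is smaller than the sum of their standard deviations.
    Merged parameters are obtained by moment matching.
    """
    comps = sorted(components, key=lambda c: c[1])
    merged = True
    while merged and len(comps) > 1:
        merged = False
        for i in range(len(comps) - 1):
            (w1, m1, v1), (w2, m2, v2) = comps[i], comps[i + 1]
            if m2 - m1 < math.sqrt(v1) + math.sqrt(v2):
                w = w1 + w2
                m = (w1 * m1 + w2 * m2) / w
                v = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w - m ** 2
                comps[i:i + 2] = [(w, m, max(v, 1e-6))]
                merged = True
                break
    return comps

# Toy heights (made-up data): many small glyphs, a few larger, a few very tall.
heights = [9, 10, 11] * 20 + [23, 24, 25] * 4 + [78, 80, 82]
for weight, mean, variance in merge_components(fit_gmm_1d(heights, k=5)):
    print(f"weight={weight:.3f} mean={mean:.2f} std={math.sqrt(variance):.2f}")
```

In a layout-analysis setting, one might read the surviving components as height classes, for example treating the heaviest component as running text. That mapping is an assumption of this sketch, and the fixed threshold in the merge rule is exactly the kind of tuned parameter the paper says its method avoids.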

Funding

No funding was received for this paper.

Author information

Corresponding author

Correspondence to Laiphangbam Melinda.

Ethics declarations

Conflict of Interests

There is no conflict of interest, except with members of the University of Hyderabad.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Melinda, L., Bhagvati, C. Parameter free approach for segmenting complex manhattan layouts. Multimed Tools Appl 82, 6581–6603 (2023). https://doi.org/10.1007/s11042-022-13400-2

  • DOI: https://doi.org/10.1007/s11042-022-13400-2
