Abstract
As digitization progresses, archiving Bangla newspapers and other Bangla documents has become increasingly necessary. The first step in reading a Bangla newspaper automatically is to detect the headlines and columns in its multi-column layout. However, no algorithm developed so far for Bangla OCR can fully read a Bangla newspaper. In this paper we present an algorithmic approach to multi-column and headline detection for Bangla newspapers and Bangla magazines. It separates headlines from the news text and detects the individual columns in a multi-column layout. The algorithm relies on the empty space between a headline and the columns below it, and between adjacent columns.
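The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of splitting a page on empty space, written as a generic projection-profile pass rather than the authors' actual method. It assumes a binarized page given as a list of rows of 0/1 values (1 = ink); the function names and the `min_row_gap` and `min_col_gap` thresholds are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) with end exclusive


def blank_runs(profile: List[int], min_gap: int) -> List[Span]:
    """Return runs of consecutive zeros in `profile` that are at least `min_gap` long."""
    runs: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_gap:
                runs.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_gap:
        runs.append((start, len(profile)))
    return runs


def split_on_gaps(length: int, gaps: List[Span]) -> List[Span]:
    """Return the content spans of range(length) that lie between the given gaps."""
    segments: List[Span] = []
    pos = 0
    for gap_start, gap_end in gaps:
        if gap_start > pos:
            segments.append((pos, gap_start))
        pos = gap_end
    if pos < length:
        segments.append((pos, length))
    return segments


def separate_headline_and_columns(
    page: List[List[int]], min_row_gap: int = 2, min_col_gap: int = 2
) -> List[Tuple[Span, List[Span]]]:
    """Split a binary page into horizontal bands, then split each band into columns.

    Assumption: a band that yields a single column span is a headline candidate,
    and a band that yields several spans is multi-column body text.
    """
    if not page:
        return []
    height, width = len(page), len(page[0])

    # Horizontal projection: ink count per row. Wide blank runs separate bands.
    row_profile = [sum(row) for row in page]
    bands = split_on_gaps(height, blank_runs(row_profile, min_row_gap))

    layout = []
    for top, bottom in bands:
        # Vertical projection restricted to this band: ink count per column.
        col_profile = [
            sum(page[r][c] for r in range(top, bottom)) for c in range(width)
        ]
        columns = split_on_gaps(width, blank_runs(col_profile, min_col_gap))
        layout.append(((top, bottom), columns))
    return layout
```

In this sketch, gaps narrower than the thresholds (for example ordinary line spacing or inter-word spacing) are left inside a band or column, so the thresholds would need to be tuned to the scan resolution. A real newspaper page would also need binarization, noise removal, and skew correction before such a pass is meaningful.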
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Omee, F.Y., Shabbir Himel, M.S., Naser Bikas, M.A. (2013). An Algorithm for Headline and Column Separation in Bangla Documents. In: Abraham, A., Thampi, S. (eds) Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32063-7_32
DOI: https://doi.org/10.1007/978-3-642-32063-7_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32062-0
Online ISBN: 978-3-642-32063-7
eBook Packages: Engineering, Engineering (R0)