An Algorithm for Headline and Column Separation in Bangla Documents

Omee, Farjana Yeasmin; Shabbir Himel, Md. Shiam; Naser Bikas, Md. Abu

doi:10.1007/978-3-642-32063-7_32

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 182)

Abstract

As digitization progresses, archiving Bangla newspapers and other Bangla documents has become necessary. The first step in reading a Bangla newspaper is to detect the headlines and columns in a multi-column page. However, no algorithm developed so far for Bangla OCR can fully read a Bangla newspaper. In this paper we present an algorithmic approach to multi-column and headline detection for Bangla newspapers and magazines. It separates headlines from news text and detects the individual columns of a multi-column layout. The algorithm relies on the empty space between a headline and the columns below it, and between adjacent columns.
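
The abstract gives only the core idea (separation by empty space), so the following is a minimal sketch of that idea and not the authors' published algorithm. It assumes a binarized page stored as a list of rows in which 1 marks ink and 0 marks background, treats the first horizontal band of ink as the headline, and splits the remaining body wherever a run of blank pixel columns is wide enough. The names `runs_of_ink` and `separate_headline_and_columns` and the thresholds `min_row_gap` and `min_col_gap` are hypothetical and chosen for illustration.

```python
def runs_of_ink(profile, min_gap):
    """Return (start, end) spans of non-zero values in a projection profile.

    Two spans are kept apart only when at least `min_gap` consecutive
    zeros (blank rows or blank pixel columns) lie between them.
    `end` is exclusive.
    """
    spans = []
    start = None   # index where the current span began
    blank = 0      # zeros seen since the last inked position
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                spans.append((start, i - blank + 1))
                start = None
                blank = 0
    if start is not None:
        spans.append((start, len(profile) - blank))
    return spans


def separate_headline_and_columns(page, min_row_gap=2, min_col_gap=2):
    """Split a binary page into a headline band and body columns.

    Assumption (not from the paper): the first band of ink from the top
    is the headline and everything below it is multi-column body text.
    Returns (headline_rows, body_rows, column_spans).
    """
    if not page:
        return None, None, []
    width = len(page[0])

    # Horizontal projection: amount of ink in each row.
    row_profile = [sum(row) for row in page]
    bands = runs_of_ink(row_profile, min_row_gap)
    if not bands:
        return None, None, []

    headline = bands[0]
    if len(bands) == 1:
        return headline, None, []

    # The body runs from the second band to the end of the last band.
    body = (bands[1][0], bands[-1][1])

    # Vertical projection restricted to the body rows.
    col_profile = [
        sum(page[y][x] for y in range(body[0], body[1]))
        for x in range(width)
    ]
    columns = runs_of_ink(col_profile, min_col_gap)
    return headline, body, columns
```

In a real scanned page the gap thresholds would have to be set larger than the spacing between text lines and between words, otherwise individual lines or words would be reported as separate regions.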

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh
Farjana Yeasmin Omee, Md. Shiam Shabbir Himel & Md. Abu Naser Bikas

Corresponding author

Correspondence to Farjana Yeasmin Omee.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, MIR Labs Campus, Auburn, Washington, 98071, USA
Ajith Abraham
Indian Institute of Information Technology and Management, Technopark Campus, Trivandrum, 695581, India
Sabu M Thampi

About this paper

Cite this paper

Omee, F.Y., Shabbir Himel, M.S., Naser Bikas, M.A. (2013). An Algorithm for Headline and Column Separation in Bangla Documents. In: Abraham, A., Thampi, S. (eds) Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32063-7_32

DOI: https://doi.org/10.1007/978-3-642-32063-7_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32062-0
Online ISBN: 978-3-642-32063-7
eBook Packages: Engineering, Engineering (R0)