DOI: 10.1145/1815330.1815348
research-article

Handwritten Arabic text line segmentation using affinity propagation

Published: 09 June 2010

Abstract

In this paper, we present a novel graph-based method for extracting handwritten text lines from monochromatic Arabic document images. Our approach consists of two steps: coarse text line estimation using primary components, which define the line, and assignment of diacritic components, which are more difficult to associate with a given line. We first estimate the local orientation at each primary component to build a sparse similarity graph. We then use a shortest path algorithm to compute similarities between non-neighboring components. From this graph, we obtain coarse text lines using two estimates, obtained from affinity propagation and breadth-first search respectively. In the second step, we assign secondary components to each text line. The proposed method is fast and robust to non-uniform skew and character size variations, which are common in handwritten text lines. We evaluate our method using a pixel-matching criterion and report 96% accuracy on a dataset of 125 Arabic document images. We also present a proximity analysis on datasets generated by artificially decreasing the spacing between text lines to demonstrate the robustness of our approach.
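One way to read the shortest-path step: treat the edge weights of the sparse graph as non-negative dissimilarities between neighboring primary components, run Dijkstra's algorithm from every node, and use the negated path length as the similarity between any two components, including non-neighboring ones. The sketch below assumes that convention; the adjacency layout, the `unreachable` penalty and the function names are hypothetical and are not taken from the paper.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical sparse graph: node -> list of (neighbor, edge_cost).
# Costs are assumed to be non-negative dissimilarities, and the graph is
# assumed undirected (each edge is listed in both directions).
Graph = Dict[int, List[Tuple[int, float]]]


def dijkstra(graph: Graph, source: int) -> Dict[int, float]:
    """Shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def similarity_matrix(graph: Graph, n: int,
                      unreachable: float = -1e6) -> List[List[float]]:
    """s(i, k) = -shortest_path(i, k); pairs with no path get `unreachable`."""
    sim = [[unreachable] * n for _ in range(n)]
    for i in range(n):
        for k, d in dijkstra(graph, i).items():
            sim[i][k] = -d
    return sim
```

For a chain 0-1-2 with unit costs, components 0 and 2 are not neighbors, yet they still receive a similarity of -2 through component 1.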
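Affinity propagation (Frey and Dueck) clusters points by iteratively exchanging "responsibility" and "availability" messages over a similarity matrix, and returns a set of exemplars, each of which represents one cluster. A minimal pure-Python sketch of the standard update rules follows. The preference, damping factor and iteration count are assumptions, because the abstract does not state the values used, and the function name is hypothetical.

```python
from typing import List


def affinity_propagation(S: List[List[float]], preference: float,
                         damping: float = 0.5, iterations: int = 200) -> List[int]:
    """Cluster n points given an n x n similarity matrix S (higher = more similar).

    Returns, for each point, the index of its exemplar (-1 if none emerged).
    """
    n = len(S)
    if n == 1:
        return [0]
    # Copy S and place the shared preference on the diagonal: s(k, k) controls
    # how readily point k becomes an exemplar (lower -> fewer clusters).
    S = [row[:] for row in S]
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))  (i != k)
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [-1] * n
    # Exemplars label themselves; every other point joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]
```

The preference acts as a knob on the number of clusters. Raising it toward zero yields more exemplars (more candidate lines), and lowering it yields fewer, so in practice it would have to be tuned to the typical line spacing.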
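The abstract names breadth-first search as the source of a second coarse line estimate but does not say which graph it traverses or how the two estimates are reconciled. A minimal sketch, assuming BFS simply labels the connected components of an adjacency list (for example, the sparse neighbor graph), so that each component becomes one candidate line; the function name is hypothetical.

```python
from collections import deque
from typing import Dict, List


def bfs_components(adj: Dict[int, List[int]], n: int) -> List[int]:
    """Label nodes 0..n-1 by connected component using breadth-first search."""
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue  # already reached from an earlier start node
        label[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if label[v] == -1:
                    label[v] = current
                    queue.append(v)
        current += 1
    return label
```

Nodes with no edges, such as node 5 in a graph `{0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}` with `n = 6`, end up in singleton components of their own.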
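The abstract does not describe how secondary (diacritic) components are assigned to lines. As a purely illustrative baseline, and an assumption rather than the authors' rule, each diacritic can inherit the line label of its nearest primary component, measured between component centroids. The function and parameter names below are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) centroid of a connected component


def assign_secondary(primary_centroids: List[Point], primary_line: List[int],
                     secondary_centroids: List[Point]) -> List[int]:
    """Give each secondary component the line label of its nearest primary component.

    Hypothetical nearest-neighbor baseline; ties go to the lower primary index.
    """
    if not primary_centroids:
        raise ValueError("need at least one primary component")
    labels = []
    for sx, sy in secondary_centroids:
        nearest = min(range(len(primary_centroids)),
                      key=lambda j: math.hypot(primary_centroids[j][0] - sx,
                                               primary_centroids[j][1] - sy))
        labels.append(primary_line[nearest])
    return labels
```

A Euclidean distance treats horizontal and vertical offsets alike. Because Arabic diacritics sit above or below their base strokes, a real implementation would likely weight the vertical offset differently.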
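The exact pixel-matching criterion behind the reported accuracy is not given in the abstract. One common form, assumed here, scores a ground-truth line G against a detected line R as |G ∩ R| / |G ∪ R| over their pixel sets, and counts a ground-truth line as correctly detected when its best one-to-one match reaches a threshold. The 0.9 threshold, the greedy matching order and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel


def match_score(gt: Set[Pixel], det: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = gt | det
    return len(gt & det) / len(union) if union else 0.0


def line_accuracy(gt_lines: List[Set[Pixel]], det_lines: List[Set[Pixel]],
                  threshold: float = 0.9) -> float:
    """Fraction of ground-truth lines with a one-to-one match scoring >= threshold.

    Greedy: each ground-truth line takes its best still-unused detected line.
    """
    used = set()
    matched = 0
    for g in gt_lines:
        best_j, best = None, 0.0
        for j, d in enumerate(det_lines):
            if j in used:
                continue
            s = match_score(g, d)
            if s > best:
                best_j, best = j, s
        if best_j is not None and best >= threshold:
            used.add(best_j)
            matched += 1
    return matched / len(gt_lines) if gt_lines else 0.0
```

If the second detected line covers only half of its ground-truth pixels, that line scores 0.5 and fails the threshold, so only one of two lines counts as correct.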


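Cited By

  • (2024) A Survey on Text-Line Segmentation in Arab Historical Manuscripts. International Journal of Informatics and Applied Mathematics 7:1 (14-32). DOI: 10.53508/ijiam.1407236. Online publication date: 13-Jun-2024
  • (2022) Learning-free, divide and conquer text-line extraction algorithm for printed Arabic text with diacritics. Journal of King Saud University - Computer and Information Sciences 34:9 (7699-7709). DOI: 10.1016/j.jksuci.2022.04.021. Online publication date: Oct-2022
  • (2022) A Review of Various Line Segmentation Techniques Used in Handwritten Character Recognition. Information and Communication Technology for Competitive Strategies (ICTCS 2021) (353-365). DOI: 10.1007/978-981-19-0095-2_34. Online publication date: 23-Jun-2022
  • (2022) Deep Learning-Based Segmentation of Connected Components in Arabic Handwritten Documents. Intelligent Systems and Pattern Recognition (93-106). DOI: 10.1007/978-3-031-08277-1_8. Online publication date: 17-Jun-2022
  • (2021) Arabic handwritten text line segmentation using a multi-agent system and a directed CNN. 2021 Fifth International Conference On Intelligent Computing in Data Sciences (ICDS) (1-7). DOI: 10.1109/ICDS53782.2021.9626747. Online publication date: 20-Oct-2021
  • (2021) Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples. International Journal on Document Analysis and Recognition (IJDAR). DOI: 10.1007/s10032-021-00362-8. Online publication date: 3-Mar-2021
  • (2020) A Robust Progressive Text Line Segmentation Framework with Markov Line Descriptors. Proceedings of the 2020 4th International Conference on Video and Image Processing (199-212). DOI: 10.1145/3447450.3447482. Online publication date: 25-Dec-2020
  • (2020) Survey on Segmentation and Recognition of Handwritten Arabic Script. SN Computer Science 1:4. DOI: 10.1007/s42979-020-00187-y. Online publication date: 6-Jun-2020
  • (2020) A Robust Method for Text, Line, and Word Segmentation for Historical Arabic Manuscripts. Data Analytics for Cultural Heritage (147-172). DOI: 10.1007/978-3-030-66777-1_7. Online publication date: 10-Dec-2020
  • (2019) Efficient Algorithms for Text Lines and Words Segmentation for Recognition of Arabic Handwritten Script. Energy Transfer and Dissipation in Plasma Turbulence (387-401). DOI: 10.1007/978-981-13-5953-8_32. Online publication date: 3-May-2019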


Published In

ACM Other conferences
DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
June 2010
490 pages
ISBN:9781605587738
DOI:10.1145/1815330
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Arabic
  2. Arabic documents
  3. Dijkstra's shortest path algorithm
  4. affinity propagation
  5. breadth-first search
  6. clustering
  7. handwritten documents
  8. line detection
  9. text line segmentation

Qualifiers

  • Research-article


Conference

DAS '10


Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 02 Mar 2025
