Abstract
This paper proposes a new approach to the water flow algorithm for text line segmentation. In the basic method, hypothetical water flows under a few specified angles, which are set by the water flow angle parameter. The flow is applied to the document image frame from left to right and vice versa. As a result, unwetted and wetted areas are established. These areas separate the text elements from the non-text elements of each text line, respectively. Hence, they represent the control areas that are of major importance for text line segmentation. The extended approach first extracts the connected components by placing bounding boxes over the text. In this way, the connected components are mutually separated. The water flow angle, which defines the unwetted areas, is then determined adaptively. Choosing an appropriate water flow angle lengthens the unwetted areas, which leads to better text line segmentation. The results of this approach are encouraging because they improve text line segmentation, which is the most challenging step in document image processing.
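The basic idea of wetted and unwetted areas can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the grid-based propagation rule, the function names, and the `run` parameter (water spreads one row vertically every `run` columns, so a larger `run` stands for a smaller flow angle) are assumptions made for illustration only. Treating a background pixel as unwetted when at least one of the two passes leaves it dry is likewise an assumption. The bounding-box extraction and the adaptive choice of the angle are not shown.

```python
def flow_left_to_right(text, run=1):
    """Return a grid of booleans: True where water reaches a pixel.

    text -- list of rows of booleans, True for a text (obstacle) pixel.
    run  -- hypothetical angle parameter: water spreads one row up or
            down every `run` columns (run=1 is roughly 45 degrees).
    """
    rows, cols = len(text), len(text[0])
    wet = [[False] * cols for _ in range(rows)]
    # Water enters along the whole left edge of the image frame.
    for r in range(rows):
        wet[r][0] = not text[r][0]
    for c in range(1, cols):
        spread = (c % run == 0)  # vertical spreading only on these columns
        for r in range(rows):
            if text[r][c]:
                continue  # text pixels block the water
            reached = wet[r][c - 1]
            if spread:
                if r > 0 and wet[r - 1][c - 1]:
                    reached = True
                if r + 1 < rows and wet[r + 1][c - 1]:
                    reached = True
            wet[r][c] = reached
    return wet


def unwetted_areas(text, run=1):
    """Background pixels left dry by at least one of the two passes.

    Assumption: the right-to-left pass is the left-to-right pass
    applied to the horizontally mirrored image.
    """
    ltr = flow_left_to_right(text, run)
    mirrored = [row[::-1] for row in text]
    rtl = [row[::-1] for row in flow_left_to_right(mirrored, run)]
    return [
        [(not text[r][c]) and not (ltr[r][c] and rtl[r][c])
         for c in range(len(text[0]))]
        for r in range(len(text))
    ]
```

In this sketch, increasing `run` makes the dry region behind an obstacle longer, which mirrors the statement that an appropriate water flow angle lengthens the unwetted areas.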
Cite this article
Brodić, D. Extended Approach to Water Flow Algorithm for Text Line Segmentation. J. Comput. Sci. Technol. 27, 187–194 (2012). https://doi.org/10.1007/s11390-012-1216-1
DOI: https://doi.org/10.1007/s11390-012-1216-1