Abstract
Extracting lines of text from a manuscript is an important preprocessing step in many digital paleography applications. The extracted lines play a fundamental part in identifying the author and/or age of the manuscript. In this paper we present an unsupervised approach to text line extraction in historical manuscripts that can be applied directly to a color manuscript image. Each of the red, green and blue channels is processed separately by applying the discrete cosine transform (DCT) to it. A key advantage of this approach is that it requires no preprocessing, training or tuning steps. Extensive testing on complex Arabic handwritten manuscripts shows the effectiveness of the proposed approach.
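As an illustration of the per-channel step only, the following minimal sketch computes an unnormalized 2-D DCT-II of each color channel independently. It is not the authors' implementation: the abstract does not say how the coefficients are then used to locate text lines, so that part is omitted. The function names (`dct_1d`, `dct_2d`, `dct_per_channel`) and the list-of-tuples image layout are assumptions made for this example.

```python
import math


def dct_1d(signal):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def dct_2d(channel):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in channel]
    cols = [dct_1d(col) for col in zip(*rows)]
    # Transpose back so the result is indexed [vertical freq][horizontal freq].
    return [list(row) for row in zip(*cols)]


def dct_per_channel(image):
    """Apply the 2-D DCT to the red, green and blue channels separately.

    `image` is assumed to be an H x W nested list of (r, g, b) tuples
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    result = {}
    for idx, name in enumerate(("red", "green", "blue")):
        channel = [[pixel[idx] for pixel in row] for row in image]
        result[name] = dct_2d(channel)
    return result


# A tiny uniform 2 x 2 image: all energy should land in the DC coefficient.
image = [[(10, 20, 30), (10, 20, 30)],
         [(10, 20, 30), (10, 20, 30)]]
coeffs = dct_per_channel(image)
```

This direct form costs O(N²) per 1-D transform and is meant for reading, not for full-size scans; for real images a library routine such as `scipy.fft.dctn` applied to each channel would be the practical choice.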
Acknowledgement
This publication was made possible by NPRP grant # NPRP7-442-1-082 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Baig, A., Al-Maadeed, S., Bouridane, A., Cheriet, M. (2016). Direct Unsupervised Text Line Extraction from Colored Historical Manuscript Images Using DCT. In: Campilho, A., Karray, F. (eds) Image Analysis and Recognition. ICIAR 2016. Lecture Notes in Computer Science, vol 9730. Springer, Cham. https://doi.org/10.1007/978-3-319-41501-7_84
DOI: https://doi.org/10.1007/978-3-319-41501-7_84
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41500-0
Online ISBN: 978-3-319-41501-7
eBook Packages: Computer Science, Computer Science (R0)