Abstract
A growing number of studies apply state-of-the-art algorithms to extract information from handwritten historical documents. Line segmentation is a vital stage in handwritten text recognition (HTR) systems: it directly affects character segmentation, which in turn affects recognition accuracy. In this study, we first applied deep learning-based layout analysis techniques to detect individuals in the first Ottoman population register series, collected between the 1840s and 1860s. We then applied an A* path planning-based line segmentation algorithm to the demographic information of the detected individuals in these registers. We achieved encouraging results on the selected regions, which could be used to recognize the text in these registers.
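To illustrate the idea behind A* path planning for line segmentation, the following is a minimal sketch and not the implementation used in the study. It assumes a binarized region given as a list of rows (1 = ink, 0 = background) and searches for a left-to-right separator path that prefers background pixels. The function name `astar_separator`, the `ink_penalty` parameter, the move set, and the toy region are all assumptions made for illustration.

```python
import heapq

# Allowed moves as (row delta, column delta): the path never moves left.
MOVES = [(0, 1), (-1, 1), (1, 1), (-1, 0), (1, 0)]


def astar_separator(ink, start_row, ink_penalty=50.0):
    """Return a list of (row, col) cells from the left edge to the right
    edge, both at start_row, that crosses as little ink as possible."""
    rows, cols = len(ink), len(ink[0])
    start, goal = (start_row, 0), (start_row, cols - 1)

    def heuristic(r, c):
        # Every move costs at least 1 and reduces max(dx, dy) by at most 1,
        # so this estimate never overestimates the remaining cost.
        return max(cols - 1 - c, abs(r - start_row))

    best = {start: 0.0}   # cheapest known cost to reach each cell
    parent = {}           # back-pointers used to rebuild the path
    frontier = [(heuristic(*start), 0.0, start)]

    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return path[::-1]
        if cost > best[node]:
            continue  # stale queue entry, a cheaper route was found already
        r, c = node
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = 2 ** 0.5 if dr and dc else 1.0
            if ink[nr][nc]:
                step += ink_penalty  # crossing ink is allowed but expensive
            new_cost = cost + step
            if new_cost < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_cost
                parent[(nr, nc)] = node
                heapq.heappush(
                    frontier, (new_cost + heuristic(nr, nc), new_cost, (nr, nc))
                )
    return None


# Toy region: two text lines (rows 0 and 4) and a descender in column 3
# that reaches into the gap where the separator starts (row 2).
region = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
path = astar_separator(region, start_row=2)
print(path)
```

In this toy region the separator has to bend around the descender instead of cutting through it. Treating ink as a high cost rather than a hard obstacle is one possible design choice: it guarantees that a path exists even where strokes from adjacent lines touch.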
This work was supported by the European Research Council (ERC) project: “Industrialisation and Urban Growth from the mid-nineteenth century Ottoman Empire to Contemporary Turkey in a Comparative Perspective, 1850–2000” under the European Union’s Horizon 2020 research and innovation program Grant Agreement No. 679097, acronym UrbanOccupationsOETR. M. Erdem Kabadayı is the principal investigator of UrbanOccupationsOETR.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Can, Y.S., Kabadayı, M.E. (2021). Line Segmentation of Individual Demographic Data from Arabic Handwritten Population Registers of Ottoman Empire. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12916. Springer, Cham. https://doi.org/10.1007/978-3-030-86198-8_22
DOI: https://doi.org/10.1007/978-3-030-86198-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86197-1
Online ISBN: 978-3-030-86198-8
eBook Packages: Computer Science, Computer Science (R0)