Line Segmentation of Individual Demographic Data from Arabic Handwritten Population Registers of Ottoman Empire

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 Workshops (ICDAR 2021)

Abstract

A growing number of studies apply state-of-the-art algorithms to extract information from handwritten historical documents. Line segmentation is a vital stage in handwritten text recognition (HTR) systems: it directly affects character segmentation, which in turn affects recognition accuracy. In this study, we first applied deep learning-based layout analysis techniques to detect individuals in the first Ottoman population register series, collected between the 1840s and 1860s. We then applied an A* path planning-based line segmentation algorithm to the demographic information of the detected individuals. The results on the selected regions are encouraging and could be used to recognize the text in these registers.
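To make the idea of path planning-based line segmentation concrete, the following is a minimal sketch of how an A* search can trace a separator between two text lines in a binarized image. It is not the implementation used in this study. The function name `astar_separator`, the `ink_cost` penalty, the allowed moves (right, up, down), and the toy image are all assumptions chosen for illustration.

```python
import heapq

def astar_separator(img, start_row, ink_cost=50):
    """Trace a left-to-right separator path through a binary image.

    img is a list of rows; 1 marks ink, 0 marks background.
    The path starts at (start_row, 0) and ends in the last column.
    Crossing an ink pixel is allowed but penalized by ink_cost
    (a hypothetical value, not taken from the paper).
    """
    height, width = len(img), len(img[0])
    goal_col = width - 1
    start = (start_row, 0)
    best_g = {start: 0}   # cheapest known cost to reach each cell
    parent = {}           # back-pointers for path reconstruction
    # Queue entries are (f, g, cell) with f = g + remaining columns.
    frontier = [(goal_col, 0, start)]
    while frontier:
        _, g, (r, c) = heapq.heappop(frontier)
        if g > best_g[(r, c)]:
            continue  # stale queue entry
        if c == goal_col:
            path = [(r, c)]
            while path[-1] in parent:
                path.append(parent[path[-1]])
            return path[::-1]
        for dr, dc in ((0, 1), (-1, 0), (1, 0)):  # right, up, down
            nr, nc = r + dr, c + dc
            if not (0 <= nr < height and 0 <= nc < width):
                continue
            step = 1 + (ink_cost if img[nr][nc] else 0)
            ng = g + step
            if ng < best_g.get((nr, nc), float("inf")):
                best_g[(nr, nc)] = ng
                parent[(nr, nc)] = (r, c)
                # Heuristic: columns left to cross (each costs at least 1).
                heapq.heappush(frontier, (ng + goal_col - nc, ng, (nr, nc)))
    return None

# Toy image: two ink rows with a descender intruding into the gap.
img = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
path = astar_separator(img, start_row=1)
```

In this toy case the path bends around the descender instead of cutting through it, which is the behavior that makes path planning attractive for handwriting with touching or overlapping lines. In practice one separator would be traced per inter-line gap, with starting rows picked by some other means such as a horizontal projection profile.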

This work was supported by the European Research Council (ERC) project: “Industrialisation and Urban Growth from the mid-nineteenth century Ottoman Empire to Contemporary Turkey in a Comparative Perspective, 1850–2000” under the European Union’s Horizon 2020 research and innovation program Grant Agreement No. 679097, acronym UrbanOccupationsOETR. M. Erdem Kabadayı is the principal investigator of UrbanOccupationsOETR.

Author information

Corresponding author

Correspondence to Yekta Said Can.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Can, Y.S., Kabadayı, M.E. (2021). Line Segmentation of Individual Demographic Data from Arabic Handwritten Population Registers of Ottoman Empire. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12916. Springer, Cham. https://doi.org/10.1007/978-3-030-86198-8_22

  • DOI: https://doi.org/10.1007/978-3-030-86198-8_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86197-1

  • Online ISBN: 978-3-030-86198-8

  • eBook Packages: Computer Science (R0)
