Creating a Handwriting Recognition Corpus for Bushman Languages

Williams, Kyle; Suleman, Hussein

doi:10.1007/978-3-642-24826-9_28


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7008)

Included in the following conference series:

International Conference on Asian Digital Libraries


Abstract

Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. To solve this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool could be used to create a Bushman handwriting corpus with few errors.



Author information

Authors and Affiliations

Department of Computer Science, University of Cape Town, Private Bag X3, Rondebosch, 7701, South Africa
Kyle Williams & Hussein Suleman


Editor information

Editors and Affiliations

Information Science and Technology Building, Tsinghua University, 100084, Beijing, P.R. China
Chunxiao Xing
Faculty of Informatics, University of Lugano, 6900, Lugano, Switzerland
Fabio Crestani
Institute of Software Technology and Interactive Systems, Vienna University of Technology, 1040, Vienna, Austria
Andreas Rauber


About this paper

Cite this paper

Williams, K., Suleman, H. (2011). Creating a Handwriting Recognition Corpus for Bushman Languages. In: Xing, C., Crestani, F., Rauber, A. (eds) Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation. ICADL 2011. Lecture Notes in Computer Science, vol 7008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24826-9_28


DOI: https://doi.org/10.1007/978-3-642-24826-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24825-2
Online ISBN: 978-3-642-24826-9
eBook Packages: Computer Science, Computer Science (R0)
