Keyword Spotting Techniques for Sanskrit Documents

Bhardwaj, Anurag; Setlur, Srirangaraj; Govindaraju, Venu

doi:10.1007/978-3-642-00155-0_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5402)


Abstract

With advances in the digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts and languages such as Sanskrit has hampered information extraction from a large body of documents of cultural and historical importance. This chapter addresses two topics in this area. First, we describe a script-specific keyword spotting approach for Sanskrit documents that makes use of domain knowledge of the script. Second, we address the need of a digital library to provide access to a collection of documents in multiple scripts, which requires intelligent solutions that scale across different scripts. We present a script-independent keyword spotting approach for this purpose. Experimental results illustrate the efficacy of our methods.



Author information

Authors and Affiliations

Center for Unified Biometrics and Sensors, Department of Computer Science and Engineering, University at Buffalo, Amherst, NY 14228, USA
Anurag Bhardwaj, Srirangaraj Setlur & Venu Govindaraju


Editor information

Editors and Affiliations

INRIA, Centre de Paris-Rocquencourt, BP 105, 78153, Le Chesnay Cedex, France
Gérard Huet
Department of Sanskrit Studies, University of Hyderabad, 500046, Hyderabad, India
Amba Kulkarni
Department of Classics, Brown University, Macfarlane House, 48 College Street, Providence, RI 02912, USA
Peter Scharf


About this paper

Cite this paper

Bhardwaj, A., Setlur, S., Govindaraju, V. (2009). Keyword Spotting Techniques for Sanskrit Documents. In: Huet, G., Kulkarni, A., Scharf, P. (eds) Sanskrit Computational Linguistics. ISCLS 2007, ISCLS 2008. Lecture Notes in Computer Science, vol 5402. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00155-0_22


DOI: https://doi.org/10.1007/978-3-642-00155-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00154-3
Online ISBN: 978-3-642-00155-0
eBook Packages: Computer Science, Computer Science (R0)
