TrieIR: Indexing and Retrieval Engine for Kannada Unicode Text

  • Conference paper
Digital Libraries: Social Media and Community Networks (ICADL 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8279))


Abstract

Kannada is a phonetic language. In Kannada, the morphological forms of terms (especially nouns and verbs) are formed by adding different morphological suffixes to their pure forms. Hence, when queried with one morphological form, search engines based on exact matching fail to identify other terms that are semantically similar but morphologically different, which reduces the quality of the search results. We observe that even though the morphological forms of a term look different, they can be grouped together based on their common prefixes. In this work we propose indexing and retrieval algorithms based on fuzzy matching. The indexing mechanism is inspired by prefix trees. It also draws on the fact that Unicode encodes Kannada terms in a way very similar to how the terms are generated by Kannada grammar. We also discuss a retrieval algorithm based on query term truncation and decayed scores, which improves the retrieval of documents for a given query. The indexing and retrieval systems are still based on tf-idf; the novelty of the work lies in the way the algorithms bring similar terms together. This solution can be extended to other South Indian languages with little or no modification, as their Unicode encodings and morphological behaviors are similar to those of Kannada.
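The abstract gives no algorithmic details, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a character-level prefix tree over Unicode code points, a query that is truncated one code point at a time, and a score that is multiplied by a decay factor at each truncation. The class names, the `decay` and `min_len` parameters, and the exact tf-idf formula are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


class TrieNode:
    def __init__(self):
        self.children = {}                # code point -> TrieNode
        self.postings = defaultdict(int)  # doc_id -> count of terms ending here


class PrefixIndex:
    """Toy prefix-tree index (an assumption-laden sketch, not TrieIR itself)."""

    def __init__(self):
        self.root = TrieNode()
        self.doc_ids = set()

    def add(self, doc_id, text):
        """Index every whitespace-separated term, one code point per trie level."""
        self.doc_ids.add(doc_id)
        for term in text.split():
            node = self.root
            for cp in term:
                node = node.children.setdefault(cp, TrieNode())
            node.postings[doc_id] += 1

    def _collect(self, node, acc):
        """Sum term frequencies over the whole subtree rooted at node."""
        for doc_id, tf in node.postings.items():
            acc[doc_id] += tf
        for child in node.children.values():
            self._collect(child, acc)

    def prefix_postings(self, prefix):
        """Return {doc_id: tf} for all indexed terms that start with prefix."""
        node = self.root
        for cp in prefix:
            node = node.children.get(cp)
            if node is None:
                return {}
        acc = defaultdict(int)
        self._collect(node, acc)
        return dict(acc)

    def search(self, term, decay=0.5, min_len=2):
        """Truncate the query from the right; each truncation decays the weight.

        A document is scored once, at the longest truncation that matches it.
        """
        scores = {}
        weight = 1.0
        for end in range(len(term), min_len - 1, -1):
            postings = self.prefix_postings(term[:end])
            if postings:
                # Hypothetical idf variant: smoothed so it is always positive.
                idf = math.log(1 + len(self.doc_ids) / len(postings))
                for doc_id, tf in postings.items():
                    if doc_id not in scores:
                        scores[doc_id] = weight * tf * idf
            weight *= decay
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = PrefixIndex()
    index.add("d1", "ಮನೆಗೆ")      # "to the house"
    index.add("d2", "ಮನೆಯಲ್ಲಿ")   # "in the house"
    index.add("d3", "ಮರ")         # "tree"
    for doc_id, score in index.search("ಮನೆಗೆ"):
        print(doc_id, round(score, 3))
```

In this toy run, the query ಮನೆಗೆ matches d1 exactly, while d2 is reached only after truncating the query to the shared prefix ಮನೆ and therefore receives a decayed score. An exact-match index would return d1 alone. Truncating by code point is the simplest choice here; a real system might truncate at syllable boundaries instead.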

This work is part of the Kanaja project, conceptualised by Karnataka Jnana Ayoga.



Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Kulkarni, S., Srinivasa, S. (2013). TrieIR: Indexing and Retrieval Engine for Kannada Unicode Text. In: Urs, S.R., Na, JC., Buchanan, G. (eds) Digital Libraries: Social Media and Community Networks. ICADL 2013. Lecture Notes in Computer Science, vol 8279. Springer, Cham. https://doi.org/10.1007/978-3-319-03599-4_3

  • DOI: https://doi.org/10.1007/978-3-319-03599-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03598-7

  • Online ISBN: 978-3-319-03599-4

  • eBook Packages: Computer Science, Computer Science (R0)
