TrieIR: Indexing and Retrieval Engine for Kannada Unicode Text

Kulkarni, Sumant; Srinivasa, Srinath

doi:10.1007/978-3-319-03599-4_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8279)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

Kannada is a phonetic language. In Kannada, the morphological forms of terms (especially nouns and verbs) are formed by adding morphological suffixes to their pure forms. Consequently, when a query contains one morphological form of a term, search engines based on exact matching fail to identify other terms that are semantically similar but morphologically different, which reduces the quality of the search results. We observe that although the morphological forms of a term look different, they can be grouped together by their common prefixes. In this work we propose indexing and retrieval algorithms based on fuzzy matching. The indexing mechanism is inspired by prefix trees, and also by the fact that Unicode encodes Kannada terms in much the same way that Kannada grammar generates them. We also discuss a retrieval algorithm based on query term truncation and decayed scores, which improves the retrieval of documents for a given query. Indexing and retrieval are still based on tf-idf; the novelty of the work lies in the way the algorithms bring similar terms together. The solution can be extended to other South Indian languages with little or no modification, because their Unicode encoding and morphological behavior are similar to those of Kannada.

This work is part of the Kanaja project, conceptualised by Karnataka Jnana Ayoga.
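
The abstract names three ingredients: a prefix-tree index, query term truncation, and decayed scoring on top of tf-idf. The sketch below is one possible reading of how these could fit together, not the authors' algorithm, whose details are not given here. The class names (`TrieNode`, `PrefixIndex`), the parameters `min_prefix` and `decay`, the choice to truncate one Unicode code point at a time, the pooling of term frequencies over a whole subtree, and the exact tf-idf formula are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class TrieNode:
    """One node per Unicode code point of an indexed term."""

    def __init__(self):
        self.children = {}
        # doc_id -> frequency of the term that ends exactly at this node
        self.postings = defaultdict(int)


class PrefixIndex:
    """Hypothetical prefix-tree index; not the TrieIR implementation."""

    def __init__(self):
        self.root = TrieNode()
        self.doc_ids = set()

    def add_document(self, doc_id, text):
        self.doc_ids.add(doc_id)
        for term in text.split():
            node = self.root
            for ch in term:  # a Python str iterates code point by code point
                node = node.children.setdefault(ch, TrieNode())
            node.postings[doc_id] += 1

    def _group_postings(self, prefix):
        """Pool term frequencies of every indexed term that starts with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return {}
        pooled = defaultdict(int)
        stack = [node]
        while stack:
            current = stack.pop()
            for doc_id, tf in current.postings.items():
                pooled[doc_id] += tf
            stack.extend(current.children.values())
        return pooled

    def search(self, term, min_prefix=3, decay=0.5):
        """Truncate the query from the right; each truncation decays the score.

        min_prefix and decay are assumed values. Queries shorter than
        min_prefix are matched at their full length only.
        """
        if not term:
            return []
        scores = defaultdict(float)
        weight = 1.0
        shortest = min(min_prefix, len(term))
        for length in range(len(term), shortest - 1, -1):
            postings = self._group_postings(term[:length])
            if postings:
                # Assumed smoothed idf, computed over the prefix group.
                idf = math.log(1 + len(self.doc_ids) / len(postings))
                for doc_id, tf in postings.items():
                    scores[doc_id] = max(scores[doc_id], weight * tf * idf)
            weight *= decay
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage: three forms that share the prefix "ಮನೆ", plus an unrelated term.
index = PrefixIndex()
index.add_document("d1", "ಮನೆಯ")
index.add_document("d2", "ಮನೆ")
index.add_document("d3", "ಮರ")
results = index.search("ಮನೆಯ")
```

In this sketch a document containing the queried form itself is scored at full weight, while documents reached only after truncating the query to the shared prefix are scored with the decayed weight, so exact and near-exact forms rank above more distant ones. Terms that share no prefix of at least `min_prefix` code points with the query are not returned.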

Author information

Authors and Affiliations

International Institute of Information Technology, Bangalore, 26/C, Electronic City, Bangalore, India
Sumant Kulkarni & Srinath Srinivasa

Editor information

Editors and Affiliations

International School of Information Management, University of Mysore, Mysore, India
Shalini R. Urs
Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore
Jin-Cheon Na
School of Informatics, City University London, London, UK
George Buchanan

Cite this paper

Kulkarni, S., Srinivasa, S. (2013). TrieIR: Indexing and Retrieval Engine for Kannada Unicode Text. In: Urs, S.R., Na, JC., Buchanan, G. (eds) Digital Libraries: Social Media and Community Networks. ICADL 2013. Lecture Notes in Computer Science, vol 8279. Springer, Cham. https://doi.org/10.1007/978-3-319-03599-4_3

DOI: https://doi.org/10.1007/978-3-319-03599-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03598-7
Online ISBN: 978-3-319-03599-4
eBook Packages: Computer Science, Computer Science (R0)