A Search Engine for Indian Languages

Mujoo, Ashwani; Malviya, Manoj Kumar; Moona, Rajat; Prabhakar, T V

doi:10.1007/3-540-44463-7_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1875)

Included in the following conference series:

International Conference on Electronic Commerce and Web Technologies

Abstract

There is a great need for search engines for web documents written in languages other than English. In this paper, we describe the design issues of a search engine for Indian languages. We also describe the implementation of two search engines for Indian languages, one for documents in ISCII and the other for documents in Unicode. The software allows full-text indexing and searching of a database of documents written in any Brahmi-based Indian language. The search engine gathers HTML documents from the web, indexes and compresses them, and then searches them for the given keywords. The main features of both search engines are phonetic tolerance, morphological analysis, compression and indexing, leading and trailing substring matches for keywords, and search through compressed documents. The implementation includes a search server architecture that is accessed through a WYSIWYG front end implemented as a Java Swing applet. Performance results show that the search engine achieves a compression of almost 80 percent and has appreciable precision and recall.
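The abstract lists phonetic tolerance and leading/trailing substring matches without saying how they work. The following is a minimal Python sketch under stated assumptions, not the authors' method: the folding rules (long vowels folded to short ones, chandrabindu folded to anusvara, nukta dropped) and the names `phonetic_key` and `KeywordIndex` are hypothetical, and sorted term lists searched with binary search stand in for whatever index structure the engine actually uses. Only Devanagari in Unicode is covered; ISCII input, morphological analysis and compression are not modeled.

```python
import bisect
import re
import unicodedata
from collections import defaultdict

# Hypothetical phonetic folding rules for Devanagari (an assumption, not the
# paper's published rules): long vowels fold to short ones, chandrabindu
# folds to anusvara, and the nukta is deleted.
_FOLD = str.maketrans({
    "\u0908": "\u0907",  # independent vowel II -> I
    "\u090A": "\u0909",  # independent vowel UU -> U
    "\u0940": "\u093F",  # vowel sign II -> vowel sign I
    "\u0942": "\u0941",  # vowel sign UU -> vowel sign U
    "\u0901": "\u0902",  # chandrabindu -> anusvara
    "\u093C": None,      # delete nukta
})

# Split on whitespace, danda/double danda and common ASCII punctuation.
# (Python's \w does not match Devanagari vowel signs, so it is avoided.)
_SPLIT = re.compile(r"[\s\u0964\u0965.,;:!?()\"']+")


def phonetic_key(word: str) -> str:
    """Return the folded form under which a word is indexed and looked up."""
    # NFD turns precomposed nukta letters (e.g. U+095B) into base + nukta,
    # so the translation table can then delete the nukta.
    return unicodedata.normalize("NFD", word).translate(_FOLD)


class KeywordIndex:
    """Hypothetical in-memory inverted index over phonetically folded terms."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # folded term -> set of doc ids
        self._fwd: list[str] = []         # folded terms, sorted
        self._rev: list[str] = []         # reversed folded terms, sorted

    def add(self, doc_id: int, text: str) -> None:
        for token in _SPLIT.split(text):
            if token:
                self.postings[phonetic_key(token)].add(doc_id)
        # Rebuilt on every add for simplicity; a real indexer would batch this.
        self._fwd = sorted(self.postings)
        self._rev = sorted(term[::-1] for term in self._fwd)

    @staticmethod
    def _with_prefix(sorted_terms: list[str], prefix: str) -> list[str]:
        # All strings sharing a prefix are contiguous in sorted order,
        # starting at the bisect_left position of the prefix itself.
        start = bisect.bisect_left(sorted_terms, prefix)
        end = start
        while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
            end += 1
        return sorted_terms[start:end]

    def search(self, keyword: str, mode: str = "exact") -> set[int]:
        key = phonetic_key(keyword)
        if mode == "exact":
            terms = [key] if key in self.postings else []
        elif mode == "leading":   # keyword is a leading substring (prefix)
            terms = self._with_prefix(self._fwd, key)
        elif mode == "trailing":  # keyword is a trailing substring (suffix)
            # A suffix of a term is a prefix of the reversed term.
            terms = [t[::-1] for t in self._with_prefix(self._rev, key[::-1])]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        docs: set[int] = set()
        for term in terms:
            docs |= self.postings[term]
        return docs
```

For example, after indexing a page containing दीपावली, the query दिपावली (written with a short i) matches it in exact mode, and the fragment ली finds it in trailing mode. Folding both the indexed words and the query with the same function is what makes the match tolerant; which distinctions are safe to fold is a language-specific design decision that this sketch only guesses at.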

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India
Ashwani Mujoo, Manoj Kumar Malviya, Rajat Moona & T V Prabhakar

Editor information

Editors and Affiliations

IFI, University of Zürich, Winterthurer Str. 190, 8057, Zürich, Switzerland
Kurt Bauknecht
Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
Sanjay Kumar Madria
Department of Information Systems, University of Essen, Universitätsstr. 9, 45141, Essen, Germany
Günther Pernul

About this paper

Cite this paper

Mujoo, A., Malviya, M.K., Moona, R., Prabhakar, T.V. (2000). A Search Engine for Indian Languages. In: Bauknecht, K., Madria, S.K., Pernul, G. (eds) Electronic Commerce and Web Technologies. EC-Web 2000. Lecture Notes in Computer Science, vol 1875. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44463-7_30

DOI: https://doi.org/10.1007/3-540-44463-7_30
Published: 14 December 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67981-3
Online ISBN: 978-3-540-44463-3
eBook Packages: Springer Book Archive
