Approximate String Matching Techniques for Effective CLIR Among Indian Languages

  • Conference paper
Applications of Fuzzy Sets Theory (WILF 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4578)

Abstract

The vocabulary commonly used in Indian language documents on the web contains many words of Sanskrit, Persian or English origin. Such words may, however, be written in different scripts with slight variations in spelling and morphology. In this paper we explore approximate string matching techniques that exploit the relatively large number of cognates shared among Indian languages, which is higher than the number shared between an Indian language and a non-Indian language. We present an approach to identify cognates and use them to improve dictionary-based CLIR when the query and the documents are in two different Indian languages. We conduct experiments using a Hindi document collection and a set of Telugu queries, and report the improvement due to cognate recognition and translation.
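
As an illustration of the kind of approximate string matching the abstract refers to, the following is a minimal sketch of cognate candidate selection using Jaro-Winkler similarity. The abstract does not state which similarity measure the paper uses, so the choice of Jaro-Winkler here is an assumption. The sketch also assumes that words from both languages have already been transliterated into a common Roman script. The function names, the threshold of 0.9, and the sample words are hypothetical and chosen only for illustration.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings sharing a common prefix."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def find_cognates(query_word: str, vocabulary: list[str],
                  threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (word, score) pairs at or above the threshold, best first.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    scored = [(w, jaro_winkler(query_word, w)) for w in vocabulary]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])


# Illustrative romanized forms only; not data from the paper.
candidates = find_cognates("vidyalaya", ["vidyalayam", "pustak", "vidyalay"])
```

In a dictionary-based CLIR setting, a query word with no dictionary entry could be compared against the document collection's vocabulary in this way, and the high-scoring candidates treated as likely cognates. How the paper actually combines cognates with dictionary translations is not described in the abstract.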

Editor information

Francesco Masulli, Sushmita Mitra, Gabriella Pasi

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Makin, R., Pandey, N., Pingali, P., Varma, V. (2007). Approximate String Matching Techniques for Effective CLIR Among Indian Languages. In: Masulli, F., Mitra, S., Pasi, G. (eds) Applications of Fuzzy Sets Theory. WILF 2007. Lecture Notes in Computer Science, vol 4578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73400-0_54

  • DOI: https://doi.org/10.1007/978-3-540-73400-0_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73399-7

  • Online ISBN: 978-3-540-73400-0

  • eBook Packages: Computer Science, Computer Science (R0)
