Ranking Multilingual Documents Using Minimal Language Dependent Resources

Santosh, G. S. K.; Kiran Kumar, N.; Varma, Vasudeva

doi:10.1007/978-3-642-19437-5_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper proposes an approach for extracting simple and effective features that enhance multilingual document ranking (MLDR). Prior research on using multilingual document similarity to determine document ranking is limited, and the available work relies heavily on language-specific tools, which makes it hard to reimplement for other languages. Our approach extracts various multilingual and monolingual similarity features using only a basic language resource, a bilingual dictionary. Because no language-specific tools are used, the approach is extensible to other languages. We used the datasets provided by the Forum for Information Retrieval Evaluation (FIRE) for its 2010 Ad hoc cross-lingual document retrieval task on Indian languages. We ran experiments with different ranking algorithms and compared their results. The results demonstrate the effectiveness of the proposed features in enhancing multilingual document ranking.
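
The abstract does not list the individual features, so the following is only an illustrative sketch of what a dictionary-based cross-lingual similarity feature could look like, not the authors' method. It assumes whitespace-tokenized documents, a bilingual dictionary that maps each source term to a list of target terms, and cosine similarity over raw term frequencies. The function names and the toy romanized Hindi to English dictionary are hypothetical.

```python
from collections import Counter
from math import sqrt


def translate_terms(tokens, bilingual_dict):
    """Map source-language tokens to target-language tokens by dictionary lookup.

    Tokens missing from the dictionary are dropped; all listed translations
    of a token are kept. No stemmer, tagger, or other language tool is used.
    """
    translated = []
    for token in tokens:
        translated.extend(bilingual_dict.get(token, []))
    return translated


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (raw term frequencies)."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[t] * freq_b[t] for t in shared)
    norm_a = sqrt(sum(c * c for c in freq_a.values()))
    norm_b = sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def cross_lingual_similarity(source_tokens, target_tokens, bilingual_dict):
    """Hypothetical multilingual feature: translate, then compare."""
    translated = translate_terms(source_tokens, bilingual_dict)
    return cosine_similarity(translated, target_tokens)


# Toy data, invented for illustration only.
toy_dict = {"paani": ["water"], "nadi": ["river"]}
hindi_doc = "paani nadi paani".split()
english_doc = "water river flood".split()

score = cross_lingual_similarity(hindi_doc, english_doc, toy_dict)
print(f"cross-lingual similarity: {score:.4f}")
```

A monolingual feature would call `cosine_similarity` directly on two documents in the same language. Scores of this kind could then serve as inputs to whichever ranking algorithm is being evaluated.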

Author information

Authors and Affiliations

International Institute of Information Technology, Hyderabad, India
G. S. K. Santosh, N. Kiran Kumar & Vasudeva Varma

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

Cite this paper

Santosh, G.S.K., Kiran Kumar, N., Varma, V. (2011). Ranking Multilingual Documents Using Minimal Language Dependent Resources. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_17

DOI: https://doi.org/10.1007/978-3-642-19437-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
