Two Models for the SMS-Based FAQ Retrieval Task of FIRE 2011

Vilariño, Darnes; Pinto, David; León, Saul; Castillo, Esteban; Tovar, Mireya

doi:10.1007/978-3-642-40087-2_17


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7536)

Abstract

In this paper we propose a normalization model to standardize the terms used in SMS messages. For this purpose, we use a statistical bilingual dictionary, estimated with IBM Model 4, to determine the best translation of a given SMS term. To compare our proposal with another document retrieval method, we submitted a second run to the FIRE 2011 competition forum. This run was obtained with a probabilistic information retrieval model that employs the same statistical dictionaries as our normalization method.
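The two runs described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions. The dictionary `SMS_DICT`, its probabilities, the FAQ list, and the functions `normalize`, `overlap_scores` and `translation_scores` are all hypothetical. IBM Model 4 training itself is not shown.

The sketch works as follows:

* `normalize` replaces each SMS term with the normalized term that has the highest dictionary probability.
* `overlap_scores` matches the normalized query against each FAQ by simple term overlap. This is a stand-in matching step.
* `translation_scores` scores each FAQ directly with a translation-based query likelihood that reuses the same dictionary. This is swapped in as an assumption about what such a probabilistic model can look like.

```python
import math
from collections import Counter

# Hypothetical toy dictionary p(normalized term | SMS term). The paper estimates
# such a dictionary with IBM Model 4; these entries and numbers are invented.
SMS_DICT = {
    "wat": {"what": 0.9, "wait": 0.1},
    "iz": {"is": 0.8, "its": 0.2},
    "d": {"the": 0.7, "day": 0.3},
    "4": {"for": 0.6, "four": 0.4},
    "fee": {"fee": 1.0},
    "mba": {"mba": 1.0},
}

# Hypothetical toy FAQ questions (not taken from the FIRE 2011 data).
FAQS = [
    "what is the fee for mba",
    "what is the last day for admission",
    "where is the library",
]


def normalize(sms):
    """Replace each SMS token by its most probable translation; unknown tokens are kept."""
    terms = []
    for tok in sms.lower().split():
        cands = SMS_DICT.get(tok)
        terms.append(max(cands, key=cands.get) if cands else tok)
    return terms


def overlap_scores(sms, faqs):
    """Run 1 sketch: normalize the SMS, then count query terms found in each FAQ."""
    query = normalize(sms)
    docs = [set(f.split()) for f in faqs]
    return [sum(t in d for t in query) for d in docs]


def translation_scores(sms, faqs, lam=0.8, eps=1e-9):
    """Run 2 sketch: log-score each FAQ as the sum over SMS tokens s of
    log(sum_w p(w|s) * P(w|faq)), with P(w|faq) smoothed by the collection."""
    docs = [f.split() for f in faqs]
    coll = Counter(w for d in docs for w in d)
    coll_len = sum(coll.values())
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for tok in sms.lower().split():
            # Unknown tokens translate only to themselves.
            cands = SMS_DICT.get(tok, {tok: 1.0})
            p = sum(
                pt * (lam * tf[w] / len(d) + (1 - lam) * coll[w] / coll_len)
                for w, pt in cands.items()
            )
            score += math.log(p + eps)  # eps avoids log(0) for unseen terms
        scores.append(score)
    return scores


if __name__ == "__main__":
    sms = "wat iz d fee 4 mba"
    print(normalize(sms))  # ['what', 'is', 'the', 'fee', 'for', 'mba']
    print(overlap_scores(sms, FAQS))  # [6, 4, 2]
    scores = translation_scores(sms, FAQS)
    print(FAQS[scores.index(max(scores))])  # what is the fee for mba
```

Weighting by p(w | s) rather than p(s | w) is a simplification made for this sketch. A faithful translation model would use whichever direction the alignment step actually estimates.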

The results show that the normalization model greatly outperforms the probabilistic one. An interesting finding is that Malayalam appears to be the best-written language in the SMS context. It was compared with English and Hindi, which were also evaluated in the monolingual, crosslingual and multilingual settings.

This project has been partially supported by projects CONACYT #106625, VIEP #VIAD-ING12-I and #PIAD-ING12-I.



Author information

Authors and Affiliations

Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Mexico
Darnes Vilariño, David Pinto, Saul León, Esteban Castillo & Mireya Tovar


Editor information

Editors and Affiliations

Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Prasenjit Majumder
Indian Statistical Institute, Kolkata, India
Mandar Mitra
Indian Institute of Technology, Bombay, India
Pushpak Bhattacharyya
IBM Research, New Delhi, India
L. Venkata Subramaniam & Danish Contractor
NLE Lab - ELiRF, Universitat Politècnica de València, Valencia, Spain
Paolo Rosso


About this paper

Cite this paper

Vilariño, D., Pinto, D., León, S., Castillo, E., Tovar, M. (2013). Two Models for the SMS-Based FAQ Retrieval Task of FIRE 2011. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.V., Contractor, D., Rosso, P. (eds) Multilingual Information Access in South Asian Languages. Lecture Notes in Computer Science, vol 7536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40087-2_17


DOI: https://doi.org/10.1007/978-3-642-40087-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40086-5
Online ISBN: 978-3-642-40087-2
eBook Packages: Computer Science; Computer Science (R0)
