On the Effect of Stopword Removal for SMS-Based FAQ Retrieval

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7337)

Abstract

This paper investigates the effects of stopword removal at different stages of a system for SMS-based FAQ retrieval. Experiments are performed on the FIRE 2011 monolingual English data. The FAQ system comprises several stages: normalization and correction of SMS messages, retrieval of FAQs potentially containing answers using the BM25 retrieval model, and detection of out-of-domain (OOD) queries with a k-nearest-neighbor classifier. Both retrieval and OOD detection are tested with different stopword lists. Results indicate that (i) retrieval performance is highest when stopwords are not removed and decreases as longer stopword lists are employed; (ii) OOD detection accuracy decreases when the classifier is trained on features collected during retrieval without stopword removal; and (iii) combining retrieval without stopword removal with OOD detection trained using the SMART stopwords yields the best results: 75.1% of in-domain queries are answered correctly and 85.6% of OOD queries are detected correctly.
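To make the retrieval stage concrete, the following is a minimal sketch of BM25 scoring with an optional stopword filter. It is not the paper's implementation: the whitespace tokenizer, the toy stopword list, the parameter defaults (k1 = 1.2, b = 0.75), and the non-negative IDF variant are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy stopword list for illustration only; the paper evaluates
# established lists such as the SMART stopwords.
TOY_STOPWORDS = {"how", "do", "i", "a", "the", "to", "my", "is", "what"}


def tokenize(text, stopwords=None):
    """Lowercase, split on whitespace, and optionally drop stopwords."""
    tokens = text.lower().split()
    if stopwords:
        tokens = [t for t in tokens if t not in stopwords]
    return tokens


def bm25_scores(query, docs, stopwords=None, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query.

    The same stopword list is applied to the query and the documents.
    The IDF uses the log(1 + ...) form, which is never negative (an
    assumption; other BM25 formulations omit the +1).
    """
    if not docs:
        return []
    doc_tokens = [tokenize(d, stopwords) for d in docs]
    n_docs = len(doc_tokens)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in doc_tokens) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in doc_tokens for t in set(d))
    query_terms = tokenize(query, stopwords)

    scores = []
    for d in doc_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Calling `bm25_scores(query, faqs)` scores with all words kept, while `bm25_scores(query, faqs, stopwords=TOY_STOPWORDS)` removes stopwords first. In short SMS-style queries, a word such as "how" may be one of only a few terms available for matching, which is one plausible reading of why the abstract reports lower retrieval performance with longer stopword lists.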


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leveling, J. (2012). On the Effect of Stopword Removal for SMS-Based FAQ Retrieval. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_12

  • DOI: https://doi.org/10.1007/978-3-642-31178-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31177-2

  • Online ISBN: 978-3-642-31178-9

  • eBook Packages: Computer Science, Computer Science (R0)