An FAQ Search Method Using a Document Classifier Trained with Automatically Generated Training Data

  • Conference paper
PRICAI 2016: Trends in Artificial Intelligence (PRICAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9810)

Abstract

We propose an FAQ (Frequently Asked Questions) search method that uses the classification results of input queries. FAQs aim to cover frequently asked topics, and users usually search them with queries expressed as bags of words or natural language sentences. However, because each topic usually consists of only a representative question and its answer, a single FAQ question is rarely sufficient to cover the variety of queries that have a similar meaning but different surface expressions, such as synonyms, paraphrases, and causal relations. As a result, users who cannot find their answers in the FAQs ask a call center operator. To account for similarity of meaning among different surface expressions, we use a document classifier that classifies each query into FAQ topics. The classifier is trained not only on FAQs but also on the corresponding histories of operators, so that it covers a wider variety of queries. Because the corresponding histories do not include links to FAQs, we use a method that generates training data from the corresponding histories together with the FAQs. To generate training data correctly, the method exploits the fact that many answers in corresponding histories related to FAQs are created by quoting the corresponding FAQs. Specifically, it uses the surface similarity between answers in corresponding histories and the answer part of each FAQ topic to generate training data automatically. Experimental results show that our method outperforms an FAQ search method based on word matching in terms of Mean Reciprocal Rank and Precision@N.
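
The labelling step and the two evaluation measures named in the abstract can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the surface similarity is approximated by Jaccard overlap of character bigrams, the threshold of 0.5 is arbitrary, corresponding histories are modelled as (inquiry, operator answer) pairs, and Precision@N is taken to be the share of queries whose correct topic appears in the top N results. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams of lowercased, whitespace-normalised text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def surface_similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams (an assumed stand-in for the paper's measure)."""
    x, y = char_ngrams(a), char_ngrams(b)
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def generate_training_data(
    faq_answers: Dict[str, str],
    histories: Sequence[Tuple[str, str]],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[Tuple[str, str]]:
    """Label each inquiry with the FAQ topic whose answer best matches the operator's answer.

    faq_answers maps a topic id to the answer part of that FAQ topic.
    histories holds (inquiry, operator_answer) pairs without links to FAQs.
    Pairs whose best similarity is below the threshold are left unlabelled.
    """
    labelled = []
    for inquiry, operator_answer in histories:
        best_topic, best_score = None, 0.0
        for topic_id, faq_answer in faq_answers.items():
            score = surface_similarity(operator_answer, faq_answer)
            if score > best_score:
                best_topic, best_score = topic_id, score
        if best_topic is not None and best_score >= threshold:
            labelled.append((inquiry, best_topic))
    return labelled


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], golds: Sequence[str]) -> float:
    """Average of 1/rank of the correct topic; a query contributes 0 if the topic is absent."""
    if not golds:
        return 0.0
    total = 0.0
    for ranked, gold in zip(rankings, golds):
        for rank, topic in enumerate(ranked, start=1):
            if topic == gold:
                total += 1.0 / rank
                break
    return total / len(golds)


def precision_at_n(rankings: Sequence[Sequence[str]], golds: Sequence[str], n: int) -> float:
    """Share of queries whose correct topic is in the top n (assumed definition)."""
    if not golds:
        return 0.0
    hits = sum(1 for ranked, gold in zip(rankings, golds) if gold in ranked[:n])
    return hits / len(golds)
```

The labelled (inquiry, topic) pairs would then serve as additional training examples for whichever document classifier is used; an operator answer that quotes an FAQ answer nearly verbatim scores high against that topic, while a free-form answer falls below the threshold and is skipped.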

Notes

  1. https://www.elastic.co/jp/

  2. https://taku910.github.io/mecab/

  3. https://taku910.github.io/cabocha/

  4. http://www.statmt.org/moses/giza/GIZA++.html

Author information

Corresponding authors

Correspondence to Takuya Makino, Tomoya Noro or Tomoya Iwakura.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Makino, T., Noro, T., Iwakura, T. (2016). An FAQ Search Method Using a Document Classifier Trained with Automatically Generated Training Data. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science, vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_25

  • DOI: https://doi.org/10.1007/978-3-319-42911-3_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42910-6

  • Online ISBN: 978-3-319-42911-3

  • eBook Packages: Computer Science, Computer Science (R0)
