An FAQ Search Method Using a Document Classifier Trained with Automatically Generated Training Data

Makino, Takuya; Noro, Tomoya; Iwakura, Tomoya

doi:10.1007/978-3-319-42911-3_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9810)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

We propose an FAQ (Frequently Asked Questions) search method that uses the classification results of input queries. FAQs aim to cover frequently asked topics, and users usually search them with queries expressed as bags of words or natural language sentences. However, because each topic usually consists of only a representative question and its answer, the question alone is rarely sufficient to cover the variety of queries that have a similar meaning but different surface expressions, such as synonyms, paraphrases, and causal relations. As a result, users who cannot find their answers in the FAQs ask a call center operator. To account for similarity of meaning among different surface expressions, we use a document classifier that classifies each query into an FAQ topic. The classifier is trained not only on the FAQs but also on the operators' corresponding histories, so that it covers a wider variety of queries. Because the corresponding histories do not include links to FAQs, we use a method that generates training data from the corresponding histories and the FAQs. To generate training data correctly, the method exploits the fact that many answers in the corresponding histories that relate to FAQs are created by quoting the corresponding FAQs: it uses the surface similarity between answers in the corresponding histories and the answer part of each FAQ topic to generate training data automatically. Experimental results show that our method outperforms an FAQ search method based on word matching in terms of Mean Reciprocal Rank and Precision@N.
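
The training-data generation step and the Mean Reciprocal Rank metric described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure, tokenizer, or cutoff, so the Jaccard overlap on whitespace tokens, the `threshold` value, and all function names below are assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple


def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization (an assumption; the abstract names no tokenizer)."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two token sets, standing in for 'surface similarity'."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def generate_training_data(
    faqs: Dict[str, str],
    histories: List[Tuple[str, str]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, str]]:
    """Label each history query with the FAQ topic whose answer best matches the operator's answer.

    faqs maps a topic id to its FAQ answer text; histories holds (query, operator answer) pairs.
    Pairs whose best similarity falls below `threshold` are discarded.
    """
    faq_tokens = {topic: tokenize(answer) for topic, answer in faqs.items()}
    labeled = []
    for query, answer in histories:
        answer_tokens = tokenize(answer)
        best_topic, best_score = None, 0.0
        for topic, tokens in faq_tokens.items():
            score = jaccard(answer_tokens, tokens)
            if score > best_score:
                best_topic, best_score = topic, score
        if best_topic is not None and best_score >= threshold:
            labeled.append((query, best_topic))
    return labeled


def mean_reciprocal_rank(rankings: List[List[str]], gold: List[str]) -> float:
    """Average of 1/rank of the correct topic; a query scores 0 if the topic is not ranked."""
    total = 0.0
    for ranked, correct in zip(rankings, gold):
        if correct in ranked:
            total += 1.0 / (ranked.index(correct) + 1)
    return total / len(gold) if gold else 0.0


# Toy data, invented for this example.
faqs = {
    "reset": "open settings and click reset password",
    "refund": "refunds are issued within five business days",
}
histories = [
    ("i forgot my login", "Please open settings and click reset password"),
    ("where is my money", "let me check with the billing team"),
]
training_data = generate_training_data(faqs, histories)
mrr = mean_reciprocal_rank([["reset", "refund"], ["reset", "refund"]], ["reset", "refund"])
```

In this toy run the first operator answer quotes the "reset" FAQ almost verbatim, so its query becomes a training example for that topic, while the second answer shares no words with any FAQ and is dropped. The resulting (query, topic) pairs could then be fed to any document classifier.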

Author information

Authors and Affiliations

Fujitsu Laboratories Ltd., Sunnyvale, USA
Takuya Makino, Tomoya Noro & Tomoya Iwakura

Corresponding authors

Correspondence to Takuya Makino, Tomoya Noro or Tomoya Iwakura.

Editor information

Editors and Affiliations

Cardiff University, Cardiff, United Kingdom
Richard Booth
Southeast University, Nanjing, China
Min-Ling Zhang

About this paper

Cite this paper

Makino, T., Noro, T., Iwakura, T. (2016). An FAQ Search Method Using a Document Classifier Trained with Automatically Generated Training Data. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science, vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_25

DOI: https://doi.org/10.1007/978-3-319-42911-3_25
Published: 10 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42910-6
Online ISBN: 978-3-319-42911-3
eBook Packages: Computer Science, Computer Science (R0)
