Mirror on the Wall: Finding Similar Questions with Deep Structured Topic Modeling

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9652)

Abstract

Internet users today prefer precise answers to their questions over sifting through the many relevant documents that search engines return. This preference has made Community Question Answering (cQA) services such as Yahoo! Answers, Baidu Zhidao, Quora, and StackOverflow hugely popular: forum users respond to questions with precise answers, and over time the cQA archives become rich repositories of knowledge encoded as questions and user-generated answers. Retrieving similar questions that have already been answered in some form is important for improving the effectiveness of such forums. The main challenge in retrieving similar questions is the “lexico-syntactic” gap between the user query and the questions already present in the forum. In this paper, we propose a novel approach called “Deep Structured Topic Model (DSTM)” to bridge this gap. DSTM employs a two-step process: it first retrieves similar questions that lie in the vicinity of the query in the latent topic vector space, and then re-ranks them using a deep layered semantic model. Experiments on a large-scale real-life cQA dataset show that our approach outperforms state-of-the-art translation-based and topic-based baseline approaches.
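
The two-step retrieve-then-re-rank process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the topic vectors are hand-written toy values rather than the output of a trained topic model, and a letter-trigram cosine score stands in for the deep layered semantic model. The function names (`cosine`, `letter_trigrams`, `retrieve_then_rerank`), the topic labels, and the example questions are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def letter_trigrams(text):
    """Bag of letter trigrams; a crude stand-in for a learned semantic encoder."""
    padded = "#" + text.lower() + "#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def retrieve_then_rerank(query, query_topics, archive, k=2):
    """archive: list of (question_text, topic_vector) pairs.

    Step 1: keep the k questions closest to the query in topic space.
    Step 2: re-order those k candidates with the second-stage scorer.
    """
    candidates = sorted(
        archive, key=lambda item: cosine(query_topics, item[1]), reverse=True
    )[:k]
    query_vec = letter_trigrams(query)
    reranked = sorted(
        candidates,
        key=lambda item: cosine(query_vec, letter_trigrams(item[0])),
        reverse=True,
    )
    return [text for text, _ in reranked]


# Toy archive: topic weights are invented for illustration only.
archive = [
    ("How do I reset my router password?", {"net": 0.9, "cook": 0.1}),
    ("What is the best way to change a wifi password?", {"net": 0.8, "cook": 0.2}),
    ("How long should I boil an egg?", {"net": 0.05, "cook": 0.95}),
]
ranked = retrieve_then_rerank(
    "How can I change my wifi password?", {"net": 0.85, "cook": 0.15}, archive
)
print(ranked)
```

In this toy run the topic-space step discards the cooking question, and the second-stage scorer then moves the wifi question ahead of the router question. Keeping the expensive scorer confined to the top-k candidates is the usual motivation for a two-stage design of this kind.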

Corresponding author

Correspondence to Arpita Das.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Das, A., Shrivastava, M., Chinnakotla, M. (2016). Mirror on the Wall: Finding Similar Questions with Deep Structured Topic Modeling. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9652. Springer, Cham. https://doi.org/10.1007/978-3-319-31750-2_36

  • DOI: https://doi.org/10.1007/978-3-319-31750-2_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31749-6

  • Online ISBN: 978-3-319-31750-2

  • eBook Packages: Computer Science, Computer Science (R0)
