Learning to suggest questions in social media

  • Regular Paper
Knowledge and Information Systems

Abstract

Social media systems with Q&A functionality have accumulated large archives of questions and answers. Two representative types are online forums and community-based Q&A services. To help users explore these archives effectively, it is essential to suggest interesting items to an active user. In this article, we address the problem of question suggestion, which aims to suggest questions that are semantically related to a queried question. Existing bag-of-words approaches cannot bridge the lexical chasm between semantically related questions. We therefore present a new framework and propose the topic-enhanced translation-based language model (TopicTRLM), which fuses lexical and latent semantic knowledge. This fusion enables TopicTRLM to find questions that are semantically related to a given question even when there is little word overlap. Moreover, to incorporate answer information and make the model more complete, we also propose the topic-enhanced translation-based language model with answer ensemble. Extensive experiments have been conducted on real-world datasets. The experimental results indicate that our approach is effective and outperforms other popular methods on several metrics.
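The idea of fusing lexical and latent semantic evidence can be illustrated with a small sketch. This is not the TopicTRLM formulation, which the abstract does not give. It assumes, purely for illustration, a linear interpolation between a translation-smoothed lexical score and a cosine similarity over topic distributions. The translation table, the topic vectors, the function names, and the weights `beta` and `gamma` are all hypothetical.

```python
import math

# Hypothetical word-to-word translation probabilities P(query_word | candidate_word).
TRANSLATION = {
    ("cheap", "budget"): 0.7,
    ("hotel", "accommodation"): 0.6,
}


def translation_lm_score(query_words, cand_words, beta=0.5):
    """Mean per-word mix of exact-match and translated-match probability (assumed form)."""
    total = 0.0
    for w in query_words:
        exact = cand_words.count(w) / len(cand_words)
        translated = sum(TRANSLATION.get((w, t), 0.0) for t in cand_words) / len(cand_words)
        total += (1 - beta) * exact + beta * translated
    return total / len(query_words)


def cosine(u, v):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_score(query_words, cand_words, query_topics, cand_topics, gamma=0.5):
    """Linear interpolation of the lexical and topic scores (an assumption, not the paper's model)."""
    lexical = translation_lm_score(query_words, cand_words)
    topical = cosine(query_topics, cand_topics)
    return gamma * lexical + (1 - gamma) * topical


# Toy data: the related candidate shares no words with the query.
query = ["cheap", "hotel", "paris"]
related = ["budget", "accommodation", "france"]
unrelated = ["fix", "laptop", "screen"]

# Hypothetical two-topic distributions, e.g. (travel, electronics).
query_topics = [0.9, 0.1]
related_topics = [0.8, 0.2]
unrelated_topics = [0.1, 0.9]

score_related = fused_score(query, related, query_topics, related_topics)
score_unrelated = fused_score(query, unrelated, query_topics, unrelated_topics)
print(score_related > score_unrelated)
```

In this toy setting a pure bag-of-words match scores both candidates at zero, while the fused score ranks the travel question above the laptop question because both the translation table and the topic vectors contribute evidence.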

Notes

  1. http://yanswersblog.com/index.php/archives/2010/05/03/1-billion-answers-served/.

  2. http://zhidao.baidu.com/.

  3. http://www.prnewswire.com/news-releases/tripadvisor-grows-and-grows-and-grows-119678844.html.

Acknowledgments

The work described in this paper was fully supported by the Basic Research Program of Shenzhen (Project No. JCYJ20120619152419087 and JC201104220300A), and the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 413212 and CUHK 415212). The authors would like to thank the anonymous reviewers for their insightful comments and helpful suggestions.

Author information

Corresponding author

Correspondence to Tom Chao Zhou.

About this article

Cite this article

Zhou, T.C., Lyu, M.RT., King, I. et al. Learning to suggest questions in social media. Knowl Inf Syst 43, 389–416 (2015). https://doi.org/10.1007/s10115-014-0737-z

  • DOI: https://doi.org/10.1007/s10115-014-0737-z
