Learning to suggest questions in social media

Zhou, Tom Chao; Lyu, Michael Rung-Tsong; King, Irwin; Lou, Jie

doi:10.1007/s10115-014-0737-z

Regular Paper
Published: 04 March 2014

Volume 43, pages 389–416, (2015)
Knowledge and Information Systems

Tom Chao Zhou¹, Michael Rung-Tsong Lyu²,³, Irwin King²,³ & Jie Lou⁴

Abstract

Social media systems with Q&A functionality have accumulated large archives of questions and answers. Two representative types are online forums and community-based Q&A services. To help users explore these archives effectively, it is essential to suggest interesting items to an active user. In this article, we address the problem of question suggestion, which aims to suggest questions that are semantically related to a queried question. Existing bag-of-words approaches cannot bridge the lexical chasm between semantically related questions. We therefore present a new framework and propose the topic-enhanced translation-based language model (TopicTRLM), which fuses lexical and latent semantic knowledge. This fusion enables TopicTRLM to find questions that are semantically related to a given question even when there is little word overlap. Moreover, to incorporate answer information and make the model more complete, we also propose the topic-enhanced translation-based language model with answer ensemble. We conducted extensive experiments on real-world datasets. The results indicate that our approach is effective and outperforms other popular methods on several metrics.
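
The abstract does not give the model's formulas, so the following is only a toy sketch of the general idea of fusing a lexical signal with a latent topic signal. It is not TopicTRLM itself. The Jaccard overlap, the cosine similarity, the linear interpolation, the weight `gamma`, the hand-assigned topic vectors (standing in for topic distributions that a topic model would infer), and all names in the code are assumptions made for illustration.

```python
import math

def jaccard(a, b):
    """Bag-of-words overlap between two token lists (the lexical signal)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(p, q):
    """Cosine similarity between two topic-weight vectors (the latent signal)."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

def fused_score(query, cand, gamma=0.5):
    """Assumed fusion: linear interpolation of the lexical and topic scores."""
    lexical = jaccard(query["words"], cand["words"])
    latent = cosine(query["topics"], cand["topics"])
    return gamma * lexical + (1 - gamma) * latent

# Hypothetical questions; the topic vectors are hand-assigned, not learned.
query = {"words": ["cheap", "hotel", "paris"], "topics": [0.9, 0.1]}
candidates = {
    "related_no_overlap": {
        "words": ["affordable", "accommodation", "france"],
        "topics": [0.8, 0.2],
    },
    "unrelated": {
        "words": ["fix", "laptop", "screen"],
        "topics": [0.1, 0.9],
    },
}

# Rank candidate questions by fused score, best first.
ranked = sorted(
    candidates,
    key=lambda name: fused_score(query, candidates[name]),
    reverse=True,
)
```

Neither candidate shares a word with the query, so a purely bag-of-words score ties them at zero. In this sketch the topic term alone separates the related question from the unrelated one.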

Acknowledgments

The work described in this paper was fully supported by the Basic Research Program of Shenzhen (Project No. JCYJ20120619152419087 and JC201104220300A), and the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 413212 and CUHK 415212). The authors would like to thank the anonymous reviewers for their insightful comments and helpful suggestions.

Author information

Authors and Affiliations

Baidu Inc., Shenzhen, China
Tom Chao Zhou
Shenzhen Key Laboratory of Rich Media Big Data Analytics and Applications, Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
Michael Rung-Tsong Lyu & Irwin King
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
Michael Rung-Tsong Lyu & Irwin King
Department of Information Systems, City University of Hong Kong, Kowloon Tong, Hong Kong
Jie Lou

Corresponding author

Correspondence to Tom Chao Zhou.

About this article

Cite this article

Zhou, T.C., Lyu, M.RT., King, I. et al. Learning to suggest questions in social media. Knowl Inf Syst 43, 389–416 (2015). https://doi.org/10.1007/s10115-014-0737-z

Received: 16 June 2013
Revised: 12 January 2014
Accepted: 17 January 2014
Published: 04 March 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s10115-014-0737-z
