Abstract
Community Question Answering (cQA) sites have emerged as platforms designed specifically for the exchange of questions and answers among users. Although users tend to find good-quality answers on cQA sites, they also engage in a significant volume of QA interactions on other platforms, such as microblogging sites. This is partly explained by the fact that microblog platforms contain up-to-date information on current events, propagate information rapidly, and benefit from social trust.
Despite the potential of microblog platforms such as Twitter for automatic QA retrieval, it is not clear how to leverage them for this task. Twitter has unique characteristics that differentiate it from traditional cQA platforms (e.g., short message length and low-quality, noisy information), which prevent prior findings in the area from being applied directly. In this work, we address this problem by studying (1) the feasibility of Twitter as a QA platform and (2) the discriminating features that identify relevant answers to a particular query. In particular, we create a document model at the conversation-thread level, which enables us to aggregate microblog information, and we set up a learning-to-rank framework, using factoid QA as a proxy task. Our experimental results show that microblog data can indeed be used to perform QA retrieval effectively. We identify domain-specific features, and combinations of those features, that contribute most to improving QA ranking, achieving an MRR of 0.7795 (a 62% improvement over our baseline method). In addition, we provide evidence that our method can retrieve complex answers to non-factoid questions.
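The ranking quality above is reported as Mean Reciprocal Rank (MRR). Over a set of queries Q, MRR averages the reciprocal of the rank at which the first relevant answer appears: MRR = (1/|Q|) * sum over queries of 1/rank_i, where a query with no relevant result contributes 0. The following is a minimal Python sketch of that metric. It assumes binary relevance judgments for each ranked conversation thread. The function names and toy data are hypothetical, are not taken from the paper's dataset or code, and are shown only to illustrate the computation.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first relevant item (ranks start at 1), or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all queries; an empty query set yields 0.0."""
    if not rankings:
        return 0.0
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


# Hypothetical toy data: each inner list marks, in ranked order, whether a
# retrieved conversation thread contains a correct answer to the query.
toy_rankings = [
    [True, False, False],  # first correct answer at rank 1 -> RR = 1.0
    [False, False, True],  # rank 3 -> RR = 1/3
    [False, True],         # rank 2 -> RR = 0.5
    [False, False],        # no correct answer -> RR = 0.0
]

print(round(mean_reciprocal_rank(toy_rankings), 4))  # → 0.4583
```

Because MRR only credits the first relevant result, it fits a factoid proxy task in which a single correct answer per question is typically enough.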
J. Herrera and B. Poblete have been partially funded by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004. J. Herrera is partially funded by the CONICYT Doctoral Program and D. Parra is funded by FONDECYT under grant 2015/11150783. J. Herrera, B. Poblete and D. Parra have been partially funded by FONDEF under grant ID16I10222.
Notes
- 4.
328 million users in June 2016 (https://about.twitter.com/es/company).
- 5.
6W1H corresponds to 5W1H with the addition of the term “Which” (i.e., Who, What, Where, When, Why, Which and How).
- 6.
Due to space constraints, only a high-level description of the features is provided. However, the detailed list of features is available at https://goo.gl/qqACz5.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Herrera, J., Poblete, B., Parra, D. (2018). Learning to Leverage Microblog Information for QA Retrieval. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_38
DOI: https://doi.org/10.1007/978-3-319-76941-7_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)