Learning to Leverage Microblog Information for QA Retrieval

  • Conference paper

Advances in Information Retrieval (ECIR 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10772)

Included in the conference series: ECIR (European Conference on Information Retrieval)

Abstract

Community Question Answering (cQA) sites have emerged as platforms designed specifically for the exchange of questions and answers among users. Although users tend to find good quality answers on cQA sites, they also engage in a significant volume of QA interactions on other platforms, such as microblog networking sites. This is partly explained by the fact that microblog platforms contain up-to-date information on current events, propagate information rapidly, and carry social trust.

Despite the potential of microblog platforms, such as Twitter, for automatic QA retrieval, it is not clear how to leverage them for this task. Unique characteristics differentiate Twitter from traditional cQA platforms (e.g., short message length, low-quality and noisy information), which prevent prior findings in the area from being applied directly. In this work, we address this problem by studying: (1) the feasibility of Twitter as a QA platform and (2) the discriminating features that identify relevant answers to a particular query. In particular, we create a document model at the conversation-thread level, which enables us to aggregate microblog information, and we set up a learning-to-rank framework, using factoid QA as a proxy task. Our experimental results show that microblog data can indeed be used to perform QA retrieval effectively. We identify domain-specific features, and combinations of those features, that better account for improving QA ranking, achieving an MRR of 0.7795 (a 62% improvement over our baseline method). In addition, we provide evidence that our method makes it possible to retrieve complex answers to non-factoid questions.

J. Herrera and B. Poblete have been partially funded by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004. J. Herrera is partially funded by the CONICYT Doctoral Program and D. Parra is funded by FONDECYT under grant 2015/11150783. J. Herrera, B. Poblete and D. Parra have been partially funded by FONDEF under grant ID16I10222.
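
As a rough companion to the approach described in the abstract, the following Python sketch illustrates the main ingredients under simplifying assumptions: each Twitter conversation thread is flattened into a single document, a few hand-crafted thread-level features are computed for a (question, thread) pair, a pointwise ranker scores candidate threads, and the ranking is evaluated with mean reciprocal rank (MRR), the metric reported above. The feature definitions, the toy data, and the use of scikit-learn's GradientBoostingRegressor are illustrative assumptions made for brevity; the paper itself relies on a richer, domain-specific feature set and on RankLib learning-to-rank models (see Note 12).

    # Minimal sketch, not the authors' exact pipeline: thread-level document
    # model + pointwise ranking + MRR evaluation. All names and features here
    # are hypothetical stand-ins chosen for brevity.
    from dataclasses import dataclass
    from typing import List

    from sklearn.ensemble import GradientBoostingRegressor  # pointwise stand-in for RankLib

    @dataclass
    class Thread:
        tweets: List[str]  # root tweet followed by its replies

        def as_document(self) -> str:
            # Thread-level document model: concatenate every tweet in the thread.
            return " ".join(self.tweets)

    def features(question: str, thread: Thread) -> List[float]:
        # Hypothetical features: query/thread term overlap, thread length,
        # and average tweet length (in tokens).
        doc_terms = set(thread.as_document().lower().split())
        q_terms = set(question.lower().split())
        overlap = len(q_terms & doc_terms) / max(len(q_terms), 1)
        avg_len = sum(len(t.split()) for t in thread.tweets) / len(thread.tweets)
        return [overlap, float(len(thread.tweets)), avg_len]

    def mean_reciprocal_rank(ranked_relevance: List[List[int]]) -> float:
        # ranked_relevance[i] holds 0/1 relevance labels for question i,
        # ordered by the ranker's scores (best first).
        total = 0.0
        for rels in ranked_relevance:
            total += next((1.0 / (rank + 1) for rank, rel in enumerate(rels) if rel), 0.0)
        return total / len(ranked_relevance)

    # Toy usage: fit the ranker on labeled (question, thread) pairs, then rank
    # candidate threads for the question and report MRR.
    question = "who won the 2014 world cup"
    threads = [Thread(["Germany won the 2014 World Cup", "they beat Argentina 1-0"]),
               Thread(["great weather today", "yes, very sunny"])]
    labels = [1, 0]

    ranker = GradientBoostingRegressor(n_estimators=50)
    ranker.fit([features(question, t) for t in threads], labels)

    scores = ranker.predict([features(question, t) for t in threads])
    order = sorted(range(len(threads)), key=lambda i: -scores[i])
    print("MRR:", mean_reciprocal_rank([[labels[i] for i in order]]))

In a faithful reproduction, the pointwise regressor would be replaced by pairwise or listwise RankLib models (for example, LambdaMART), and the features would follow the detailed list referenced in Note 6.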

Notes

  1. http://answers.yahoo.com.

  2. http://stackexchange.com.

  3. http://www.twitter.com.

  4. 328 million users in June 2016 (https://about.twitter.com/es/company).

  5. 6W1H corresponds to 5W1H with the addition of the term “Which” (i.e., Who, What, Where, When, Why, Which and How).

  6. Due to space constraints, only a high-level description of the features is provided. The detailed list of features is available at https://goo.gl/qqACz5.

  7. http://www.fredericgodin.com/software/.

  8. http://trec.nist.gov/data/qamain.html.

  9. https://github.com/brmson/dataset-factoid-curated.

  10. https://dev.twitter.com/rest/public/search.

  11. https://github.com/jotixh/ConversationThreadsTwitter/.

  12. https://sourceforge.net/p/lemur/wiki/RankLib/.

Author information

Correspondence to Jose Herrera.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Herrera, J., Poblete, B., Parra, D. (2018). Learning to Leverage Microblog Information for QA Retrieval. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_38

  • DOI: https://doi.org/10.1007/978-3-319-76941-7_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76940-0

  • Online ISBN: 978-3-319-76941-7

  • eBook Packages: Computer Science, Computer Science (R0)
