Learning to Leverage Microblog Information for QA Retrieval

  • Conference paper

Advances in Information Retrieval (ECIR 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10772)

Included in the conference series: ECIR (European Conference on Information Retrieval)

Abstract

Community Question Answering (cQA) sites have emerged as platforms designed specifically for the exchange of questions and answers among users. Although users tend to find good quality answers on cQA sites, they also engage in a significant volume of QA interactions on other platforms, such as microblog networking sites. This is partly explained by the fact that microblog platforms contain up-to-date information on current events, propagate information rapidly, and carry social trust.

Despite the potential of microblog platforms, such as Twitter, for automatic QA retrieval, it is not clear how to leverage them for this task. Unique characteristics differentiate Twitter from traditional cQA platforms (e.g., short message length, low-quality and noisy information), which prevent prior findings in the area from being applied directly. In this work, we address this problem by studying: (1) the feasibility of Twitter as a QA platform and (2) the discriminating features that identify relevant answers to a particular query. In particular, we create a document model at the conversation-thread level, which enables us to aggregate microblog information, and we set up a learning-to-rank framework, using factoid QA as a proxy task. Our experimental results show that microblog data can indeed be used to perform QA retrieval effectively. We identify domain-specific features, and combinations of those features, that better account for improving QA ranking, achieving an MRR of 0.7795 (a 62% improvement over our baseline method). In addition, we provide evidence that our method makes it possible to retrieve complex answers to non-factoid questions.

J. Herrera and B. Poblete have been partially funded by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004. J. Herrera is partially funded by the CONICYT Doctoral Program and D. Parra is funded by FONDECYT under grant 2015/11150783. J. Herrera, B. Poblete and D. Parra have been partially funded by FONDEF under grant ID16I10222.
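
As a rough companion to the approach described in the abstract, the following Python sketch illustrates the main ingredients under simplifying assumptions: each Twitter conversation thread is flattened into a single document, a few hand-crafted thread-level features are computed for a (question, thread) pair, a pointwise ranker scores candidate threads, and the ranking is evaluated with mean reciprocal rank (MRR), the metric reported above. The feature definitions, the toy data, and the use of scikit-learn's GradientBoostingRegressor are illustrative assumptions made for brevity; the paper itself relies on a richer, domain-specific feature set and on RankLib learning-to-rank models (see Note 12).

    # Minimal sketch, not the authors' exact pipeline: thread-level document
    # model + pointwise ranking + MRR evaluation. All names and features here
    # are hypothetical stand-ins chosen for brevity.
    from dataclasses import dataclass
    from typing import List

    from sklearn.ensemble import GradientBoostingRegressor  # pointwise stand-in for RankLib

    @dataclass
    class Thread:
        tweets: List[str]  # root tweet followed by its replies

        def as_document(self) -> str:
            # Thread-level document model: concatenate every tweet in the thread.
            return " ".join(self.tweets)

    def features(question: str, thread: Thread) -> List[float]:
        # Hypothetical features: query/thread term overlap, thread length,
        # and average tweet length (in tokens).
        doc_terms = set(thread.as_document().lower().split())
        q_terms = set(question.lower().split())
        overlap = len(q_terms & doc_terms) / max(len(q_terms), 1)
        avg_len = sum(len(t.split()) for t in thread.tweets) / len(thread.tweets)
        return [overlap, float(len(thread.tweets)), avg_len]

    def mean_reciprocal_rank(ranked_relevance: List[List[int]]) -> float:
        # ranked_relevance[i] holds 0/1 relevance labels for question i,
        # ordered by the ranker's scores (best first).
        total = 0.0
        for rels in ranked_relevance:
            total += next((1.0 / (rank + 1) for rank, rel in enumerate(rels) if rel), 0.0)
        return total / len(ranked_relevance)

    # Toy usage: fit the ranker on labeled (question, thread) pairs, then rank
    # candidate threads for the question and report MRR.
    question = "who won the 2014 world cup"
    threads = [Thread(["Germany won the 2014 World Cup", "they beat Argentina 1-0"]),
               Thread(["great weather today", "yes, very sunny"])]
    labels = [1, 0]

    ranker = GradientBoostingRegressor(n_estimators=50)
    ranker.fit([features(question, t) for t in threads], labels)

    scores = ranker.predict([features(question, t) for t in threads])
    order = sorted(range(len(threads)), key=lambda i: -scores[i])
    print("MRR:", mean_reciprocal_rank([[labels[i] for i in order]]))

In a faithful reproduction, the pointwise regressor would be replaced by pairwise or listwise RankLib models (for example, LambdaMART), and the features would follow the detailed list referenced in Note 6.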

Notes

  1. http://answers.yahoo.com.

  2. http://stackexchange.com.

  3. http://www.twitter.com.

  4. 328 million users in June 2016 (https://about.twitter.com/es/company).

  5. 6W1H corresponds to 5W1H with the addition of the term “Which” (i.e., Who, What, Where, When, Why, Which and How).

  6. Due to space constraints, only a high-level description of the features is provided. The detailed list of features is available at https://goo.gl/qqACz5.

  7. http://www.fredericgodin.com/software/.

  8. http://trec.nist.gov/data/qamain.html.

  9. https://github.com/brmson/dataset-factoid-curated.

  10. https://dev.twitter.com/rest/public/search.

  11. https://github.com/jotixh/ConversationThreadsTwitter/.

  12. https://sourceforge.net/p/lemur/wiki/RankLib/.

Author information

Correspondence to Jose Herrera.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Herrera, J., Poblete, B., Parra, D. (2018). Learning to Leverage Microblog Information for QA Retrieval. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_38

  • DOI: https://doi.org/10.1007/978-3-319-76941-7_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76940-0

  • Online ISBN: 978-3-319-76941-7

  • eBook Packages: Computer Science, Computer Science (R0)
