Utilizing Word Embeddings for Result Diversification in Tweet Search

  • Conference paper
Information Retrieval Technology (AIRS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9460)

Abstract

The performance of result diversification for tweet search suffers from the well-known vocabulary mismatch problem, because tweets are very short and usually informal. As a remedy, we propose a query and tweet expansion strategy that uses automatically generated word embeddings. Our experiments with state-of-the-art diversification methods on the Tweets2013 corpus show encouraging results: expanding queries and/or tweets based on word embeddings improves diversification performance in tweet search. We further show that expansions based on word embeddings can be as useful as those based on a manually constructed knowledge base, namely ConceptNet.
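The expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the function names `cosine` and `expand_terms`, and the choice of cosine similarity with a fixed number of neighbours `k` are all assumptions made for illustration. In practice the vectors would come from a trained embedding model rather than a hand-written dictionary.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real system would load vectors learned from a large corpus.
TOY_EMBEDDINGS = {
    "quake": [0.9, 0.1, 0.0],
    "earthquake": [0.8, 0.2, 0.0],
    "tremor": [0.7, 0.3, 0.1],
    "coffee": [0.0, 0.1, 0.9],
    "latte": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(terms, embeddings, k=2):
    """Append the k nearest embedding neighbours of each known term.

    Hypothetical helper: terms missing from the vocabulary are kept
    unchanged, and no expansion term is added twice.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        neighbours = sorted(
            (w for w in embeddings if w != term),
            key=lambda w: cosine(embeddings[term], embeddings[w]),
            reverse=True,
        )
        for w in neighbours[:k]:
            if w not in expanded:
                expanded.append(w)
    return expanded


print(expand_terms(["quake"], TOY_EMBEDDINGS, k=2))
```

The same function can be applied to the tokens of a query or of a tweet, which mirrors the "queries and/or tweets" wording of the abstract. How expansion terms are weighted and fed into a diversification method is not specified here and is left out of the sketch.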

Notes

  1. http://conceptnet5.media.mit.edu/data/5.3/

Acknowledgement

This work is partially funded by METU under the grant number BAP-08-11-2013-055, and by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under the grant numbers 113E065 and 112E275.

Author information

Corresponding author

Correspondence to Ismail Sengor Altingovde.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Onal, K.D., Altingovde, I.S., Karagoz, P. (2015). Utilizing Word Embeddings for Result Diversification in Tweet Search. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_29

  • DOI: https://doi.org/10.1007/978-3-319-28940-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28939-7

  • Online ISBN: 978-3-319-28940-3

  • eBook Packages: Computer Science, Computer Science (R0)
