Abstract
The performance of result diversification for tweet search suffers from the well-known vocabulary mismatch problem, because tweets are very short and usually informal. As a remedy, we propose a query and tweet expansion strategy based on automatically generated word embeddings. Our experiments with state-of-the-art diversification methods on the Tweets2013 corpus show encouraging results: expanding queries and/or tweets with terms derived from the word embeddings improves diversification performance in tweet search. We further show that expansions based on word embeddings can be as useful as those based on a manually constructed knowledge base, namely ConceptNet.
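The abstract does not spell out how the embeddings are turned into expansion terms. The sketch below assumes one common variant: append to a query (or tweet) the `k` vocabulary words whose vectors have the highest cosine similarity to any original term. It also illustrates why this helps against vocabulary mismatch. The vocabulary, the vector values, and the function names `expand` and `term_overlap` are hypothetical toy choices, not taken from the paper. Real embeddings would be learned from a tweet corpus, for example with word2vec or GloVe.

```python
import math

# Hypothetical toy embeddings (values invented for illustration only).
EMBEDDINGS = {
    "obama": [0.9, 0.1, 0.0],
    "president": [0.8, 0.2, 0.1],
    "whitehouse": [0.7, 0.3, 0.0],
    "election": [0.5, 0.5, 0.2],
    "pizza": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(terms, embeddings, k=2):
    """Return `terms` followed by the k most similar vocabulary words.

    A candidate's score is its best cosine similarity to any original term;
    terms without an embedding are kept but contribute no candidates.
    """
    candidates = {}
    for t in terms:
        if t not in embeddings:
            continue
        for word, vec in embeddings.items():
            if word in terms:
                continue  # never re-add an original term
            sim = cosine(embeddings[t], vec)
            candidates[word] = max(candidates.get(word, float("-inf")), sim)
    # Sort by descending similarity, breaking ties alphabetically.
    best = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))[:k]
    return list(terms) + [word for word, _ in best]


def term_overlap(a, b):
    """Number of distinct terms shared by two term lists."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    query = ["obama"]
    tweet = ["president", "speech"]
    print(term_overlap(query, tweet))  # no shared terms before expansion
    expanded = expand(query, EMBEDDINGS, k=2)
    print(expanded)
    print(term_overlap(expanded, tweet))  # expansion bridges the mismatch
```

With these toy vectors, the query `["obama"]` shares no term with the tweet `["president", "speech"]`. After expansion it becomes `["obama", "president", "whitehouse"]` and matches the tweet on "president". The same function could be applied to tweets instead of queries, mirroring the query and/or tweet expansion settings described above.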
Acknowledgement
This work is partially funded by METU under grant number BAP-08-11-2013-055, and by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant numbers 113E065 and 112E275.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Onal, K.D., Altingovde, I.S., Karagoz, P. (2015). Utilizing Word Embeddings for Result Diversification in Tweet Search. In: Zuccon, G., Geva, S., Joho, H., Scholer, F., Sun, A., Zhang, P. (eds) Information Retrieval Technology. AIRS 2015. Lecture Notes in Computer Science, vol. 9460. Springer, Cham. https://doi.org/10.1007/978-3-319-28940-3_29
DOI: https://doi.org/10.1007/978-3-319-28940-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28939-7
Online ISBN: 978-3-319-28940-3
eBook Packages: Computer Science, Computer Science (R0)