Abstract
Microblogging filtering helps users discard irrelevant content and extract timely content from microblogs. However, because microblog posts are typical short texts, microblogging filtering suffers from an insufficient-samples problem that makes probabilistic-style models unreliable. Current research regards an explicit, brief query as only an abstract of the user's information need, from which it is hard to infer the user's actual search intent. Instead, we introduce relevant external documents as the user's implicit prior knowledge and build a corresponding filtering framework. To guard against the risk of external document expansion, we assume that an external document can be viewed as a complete statement of an explicit query, and we encode the filtering preferences with the degree of divergence between the external document and the original explicit query. The optimal filtering action is therefore the one that trades off the degree of divergence against generalization performance. Compared with established baselines, our algorithm yields compelling results in retrieving meaningful tweets. This work furthers the understanding of the inherent risk characteristics of external expansion in the design of microblogging filtering systems.
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (Grant No. 61671030) and the National Key R&D Program of China (No. 2017YFC0803300).
Cite this article
Yang, Z., Gao, K. & Huang, J. External Expansion Risk Management: Enhancing Microblogging Filtering Using Implicit Query. Wireless Pers Commun 102, 2199–2209 (2018). https://doi.org/10.1007/s11277-017-5075-5
DOI: https://doi.org/10.1007/s11277-017-5075-5