
External Expansion Risk Management: Enhancing Microblogging Filtering Using Implicit Query

Wireless Personal Communications

Abstract

Microblog filtering helps users discard irrelevant content and extract timely content from microblogs effectively. However, because microblog posts are typically short texts, filtering suffers from an insufficient-sample problem that makes probabilistic models unreliable. According to current research, a brief explicit query is only an abstract of the user's information need, and it is hard to infer the user's actual search intent from it. Instead, we treat relevant external documents as the user's implicit prior knowledge and build a corresponding filtering framework. To guard against the risk of expanding with external documents, we assume that an external document can be viewed as a complete statement of an explicit query, and we encode the filtering preferences with the degree of divergence between the external document and the original explicit query. The optimal filtering action is thus the one that trades off this degree of divergence against generalization performance. Compared with established baselines, our algorithm yields compelling results for retrieving meaningful tweets. This work helps further understand the innate risk characteristics of external expansion in the design of microblog filtering systems.
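The abstract gives no formulas, so the following is only a minimal sketch of one way the trade-off it describes could be read, not the authors' algorithm. It assumes smoothed unigram language models, uses KL divergence as the "degree of divergence" between the explicit query and the external document, and maps that divergence to an interpolation weight with a hypothetical exponential rule. All function names, the smoothing constant `alpha`, and the `scale` parameter are illustrative assumptions.

```python
import math
from collections import Counter


def unigram_lm(text, vocab, alpha=0.1):
    """Additively smoothed unigram language model over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two smoothed distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def expansion_weight(divergence, scale=1.0):
    """Hypothetical rule: trust the external document less as it diverges more."""
    return math.exp(-scale * divergence)


def score_tweet(tweet, query, external_doc, scale=1.0):
    """Score a tweet against a query expanded by an external document.

    Higher (closer to zero) means the tweet is closer to the expanded need.
    """
    # Divergence between explicit query and external document (assumed KL).
    qe_vocab = set((query + " " + external_doc).lower().split())
    divergence = kl_divergence(
        unigram_lm(query, qe_vocab), unigram_lm(external_doc, qe_vocab)
    )
    lam = expansion_weight(divergence, scale)

    # Interpolate query and external-document models into one "need" model.
    vocab = qe_vocab | set(tweet.lower().split())
    q_lm = unigram_lm(query, vocab)
    e_lm = unigram_lm(external_doc, vocab)
    need = {w: (1 - lam) * q_lm[w] + lam * e_lm[w] for w in vocab}

    # Negative KL divergence from the need model to the tweet model.
    return -kl_divergence(need, unigram_lm(tweet, vocab))


query = "earthquake japan"
external_doc = "a strong earthquake struck northern japan triggering a tsunami warning"
on_topic = score_tweet("tsunami warning after earthquake in japan", query, external_doc)
off_topic = score_tweet("my cat loves pizza", query, external_doc)
```

Under these assumptions, an external document that diverges strongly from the query receives a small weight, so the filter falls back toward the explicit query alone. A document close to the query contributes more expansion terms.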

Notes

  1. http://trec.nist.gov/data/microblog.html.

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Grant No. 61671030) and the National Key R&D Program of China (No. 2017YFC0803300).

Author information

Corresponding author

Correspondence to Zhen Yang.

About this article

Cite this article

Yang, Z., Gao, K. & Huang, J. External Expansion Risk Management: Enhancing Microblogging Filtering Using Implicit Query. Wireless Pers Commun 102, 2199–2209 (2018). https://doi.org/10.1007/s11277-017-5075-5

  • DOI: https://doi.org/10.1007/s11277-017-5075-5
