A content search method for security topics in microblog based on deep reinforcement learning

World Wide Web

Abstract

Traditional methods treat search as a process of selecting and ranking documents sequentially. These methods have proven effective and are widely used in web search. However, because microblog text is complex and idiosyncratic, the classical methods are rarely applied to microblog search for specific topics. To address topic-specific search over microblog content, we present a microblog search method for security topics based on deep reinforcement learning, modeling topic-specific microblog search as a continuous-state Markov decision process. We also design a novel deep Q network to evaluate the relevance of microblog content to the target topic. Reinforcement learning supplies the search strategy, while deep learning evaluates content relevance. Experiments conducted on a real-world dataset show that our approach outperforms the selected baseline methods.
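To make the Markov decision process framing concrete, the following is a minimal sketch and not the architecture described in the article. It assumes a state made of a topic vector and the current rank position, an action that picks one remaining post, a reward equal to a position-discounted relevance label, and a linear function standing in for the deep Q network. All function names, the feature design, the toy embeddings, and the hyperparameters are hypothetical choices made for illustration.

```python
import math
import random


def features(topic, doc, pos):
    """Hypothetical state-action features: topic/document interaction terms,
    the position discount of the current step, and a bias."""
    return [t * d for t, d in zip(topic, doc)] + [1.0 / math.log2(pos + 2), 1.0]


def q_value(w, x):
    """Linear stand-in for a deep Q network: Q(s, a) = w . x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td_update(w, x, reward, next_q, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step on the linear approximator."""
    td_error = reward + gamma * next_q - q_value(w, x)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]


def best_action(w, topic, docs, remaining, pos):
    """Greedy action: the remaining document with the highest Q-value."""
    return max(remaining, key=lambda i: q_value(w, features(topic, docs[i], pos)))


def rank(w, topic, docs):
    """Roll out the greedy policy to produce a full ranking."""
    remaining, order = list(range(len(docs))), []
    while remaining:
        a = best_action(w, topic, docs, remaining, len(order))
        order.append(a)
        remaining.remove(a)
    return order


def train(topic, docs, labels, episodes=300, eps=0.2, seed=0):
    """Epsilon-greedy Q-learning; reward is the position-discounted label."""
    rng = random.Random(seed)
    w = [0.0] * (len(topic) + 2)
    for _ in range(episodes):
        remaining, pos = list(range(len(docs))), 0
        while remaining:
            if rng.random() < eps:
                a = rng.choice(remaining)  # explore
            else:
                a = best_action(w, topic, docs, remaining, pos)  # exploit
            x = features(topic, docs[a], pos)
            reward = labels[a] / math.log2(pos + 2)
            remaining.remove(a)
            next_q = 0.0  # terminal state once every document is ranked
            if remaining:
                nxt = best_action(w, topic, docs, remaining, pos + 1)
                next_q = q_value(w, features(topic, docs[nxt], pos + 1))
            w = td_update(w, x, reward, next_q)
            pos += 1
    return w


if __name__ == "__main__":
    topic = [1.0, 0.0]                           # toy topic embedding
    docs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # toy post embeddings
    labels = [2, 0, 1]                           # toy graded relevance
    w = train(topic, docs, labels)
    print(rank(w, topic, docs))
```

Under these assumptions, ranking is a rollout of the greedy policy, so the learned value function decides the order one position at a time. A deep Q network would replace `q_value` with a learned nonlinear model over richer text representations.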

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61772083, No. 61532006, No. 61877006, and No. 61802028, in part by the Fundamental Research Funds for the Central Universities (No. 2018RC44), and in part by the Director Foundation of Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia (No. ITSM20180102).

Author information

Corresponding author

Correspondence to Junping Du.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, N., Du, J., Yao, X. et al. A content search method for security topics in microblog based on deep reinforcement learning. World Wide Web 23, 75–101 (2020). https://doi.org/10.1007/s11280-019-00697-7
