Topic representation model based on microblogging behavior analysis

World Wide Web

Abstract

Microblogging has become an important way for people to obtain information, express opinions, and make suggestions. Identifying new topics quickly and accurately in massive microblogging data plays a crucial role in recommending information and controlling public opinion. The topic representation model provides a basis for topic detection. In this paper, we propose a topic representation model based on user behavior analysis, the microblogging behavior analysis-latent Dirichlet allocation (MBA-LDA) model, for microblogging datasets. The topic-word distribution is obtained from an LDA model that takes into account user behaviors (such as posting, forwarding, and commenting) as well as the distribution of words among documents within one topic and among different topics. The model also re-assesses the importance of words in topic representation. The basic idea is that how a word is distributed within a topic and among different topics strongly influences whether it should be selected as a topic expression word. If a word is evenly distributed among all documents of a topic, it is common to all documents in that topic and is therefore more suitable to represent the topic. If a word is evenly distributed among the various topics, it is common to all topics and cannot distinguish between them, so it is less suitable to represent any topic. In experiments on a real Sina Microblogging data set, the topic model based on the MBA-LDA algorithm gives representative words greater importance and increases the differentiation of topic words, which effectively improves the accuracy of subsequent topic detection and evolutionary analysis.
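The word re-assessment idea can be illustrated with a small sketch. The abstract does not state the MBA-LDA weighting formula, so everything below is an assumption made for illustration only: evenness is measured with normalized Shannon entropy, and the two evenness scores are combined as `within * (1 - across)`. The function names `normalized_entropy` and `word_weight` and the toy corpus `topic_docs` are hypothetical and do not come from the paper. User behavior signals (posting, forwarding, commenting) are not modeled here.

```python
import math


def normalized_entropy(values):
    """Shannon entropy of a non-negative vector, scaled to [0, 1].

    1.0 means the mass is spread perfectly evenly; 0.0 means it is
    concentrated in a single position (or the vector is empty/zero).
    """
    total = sum(values)
    n = len(values)
    if n < 2 or total <= 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return min(1.0, max(0.0, entropy / math.log(n)))


def word_weight(word, topic_docs, topic_id):
    """Illustrative weight of `word` as a representative of `topic_id`.

    Assumed scoring rule (not the paper's formula): high when the word is
    evenly spread over the topic's documents and unevenly spread over topics.
    """
    # Evenness of the word across the documents of this one topic.
    within = normalized_entropy(
        [doc.count(word) for doc in topic_docs[topic_id]]
    )
    # Evenness of the word's relative frequency across all topics.
    rates = []
    for docs in topic_docs.values():
        tokens = sum(len(doc) for doc in docs)
        hits = sum(doc.count(word) for doc in docs)
        rates.append(hits / tokens if tokens else 0.0)
    across = normalized_entropy(rates)
    return within * (1.0 - across)


# Hypothetical toy corpus: topic id -> list of tokenized microblog posts.
topic_docs = {
    "sports": [
        ["match", "goal", "today"],
        ["goal", "team", "today"],
        ["goal", "win", "today"],
    ],
    "finance": [
        ["stock", "today", "market"],
        ["bank", "today", "rate"],
        ["stock", "rate", "today"],
    ],
}

for w in ("goal", "today", "team"):
    print(w, round(word_weight(w, topic_docs, "sports"), 3))
```

In this toy corpus, "goal" appears in every sports post and in no finance post, so it scores highest for the sports topic. "today" appears everywhere and is pushed toward zero because it is spread evenly across both topics, and "team" is pushed toward zero because it appears in only one sports post.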

Funding

Funded by NSFC (No. 61972106, U1636215, No. 61871140), the National Key Research and Development Plan (Grant No. 2019QY1406, No. 2018YFB0803504), and the Guangdong Province Key Research and Development Plan (Grant No. 2019B010136003 and No. 2019B010137004). Supported by the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019).

Author information

Corresponding author

Correspondence to Zhihong Tian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Han, W., Tian, Z., Huang, Z. et al. Topic representation model based on microblogging behavior analysis. World Wide Web 23, 3083–3097 (2020). https://doi.org/10.1007/s11280-020-00822-x

  • DOI: https://doi.org/10.1007/s11280-020-00822-x
