A time-varying propagation model of hot topic on BBS sites and Blog networks
Introduction
With the rapid development of Internet technology, particularly the emergence of Web 2.0 sites [20] such as blog networks, BBS sites and social networks, people not only exchange information online but also express their ideas and opinions openly. Blogs and BBS sites are two kinds of Web 2.0 sites that are extremely popular in China, much like Twitter and Facebook in Europe and the US. A BBS site essentially serves as an electronic information center and an emerging medium, offering public information releases, chat, mail and other services as an alternative to a physical bulletin board. In 2005, there were at least 123 BBS sites at 78 universities and colleges in China, and these sites had several million registered users. Popular BBS sites in China include the LQQM BBS established at Tsing University and the BMY BBS established at Xi’an Jiaotong University. The Sina blog in China has almost 230 million users worldwide. In recent years, many people have used blogs and BBS sites to express their opinions about hot topics. For example, “08 Olympic” and “Melamine contaminated milk powder incident” are both hot topics on the BMY BBS and LQQM BBS, whereas “Roh Moo-Hyun’s death” and “Influenza A[H1N1]” are hot topics on the Sina blog.
Online topics undeniably play a significant role in public life. To understand how different hot online topics evolve, it is necessary to study their propagation mechanism.
In theory, if we could analyze the behavior of every individual user who participates in an online discussion, we might be able to explain the overall trend in user behavior toward a hot online topic. However, information about user behavior is embedded in a vast volume of network data; for example, mining web information from the Sina blog produces several million MB of data per day because the site is updated daily. This approach is therefore very time-consuming and not feasible. Alternatively, we can model the propagation of hot online topics by studying the collective behavior of user groups. Understanding this collective behavior helps us establish the propagation process of a hot online topic and tackle the disorientation problem in modeling.
In this paper, we propose a time-varying hot topic propagation model (THTPM for short) on BBS sites or blogs to describe the collective behavior of users who belong to different online social subgroups. The greatest merit of our model is that it reflects the users' state transition process efficiently and does not depend on any empirical parameters. To obtain a threshold by which we can predict the trend of a hot topic discussion in advance, we analyze the stability of the equilibrium of THTPM and derive two theorems. They state that when the threshold Q∗ < 1 the hot topic dies out, and when Q∗ > 1 the hot topic remains uniformly weakly persistent. Furthermore, we propose Method (I) and Method (II) to predict the trend of a hot online topic based on our model, motivated by theoretical and practical considerations respectively. Method (I) predicts the trend of a single-peak hot topic and serves mainly a theoretical purpose. In a real Internet environment, however, there are many multi-peak hot topics, so we design Method (II) to predict, for the following two days, the number of discusser users (those writing or commenting upon article posts), mainly for multi-peak hot online topics, although it can also predict the trend of single-peak hot topics.
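The role of such a threshold can be illustrated with a much simpler stand-in model. The sketch below is not the THTPM, whose state equations are not reproduced here; it assumes a generic epidemic-style equation dD/dt = beta*D*(1 - D) - gamma*D for the fraction D of discussers, with hypothetical constant rates beta (joining) and gamma (leaving), so that the analogue of the threshold is beta/gamma.

```python
def threshold(beta, gamma):
    """Threshold of the illustrative model (an assumption, not the paper's Q*)."""
    return beta / gamma


def simulate_discussers(beta, gamma, d0=0.01, steps=20000, dt=0.01):
    """Euler-integrate dD/dt = beta*D*(1 - D) - gamma*D.

    D is the fraction of users discussing the topic. beta and gamma are
    hypothetical constant rates chosen only for illustration.
    """
    d = d0
    for _ in range(steps):
        d += dt * (beta * d * (1.0 - d) - gamma * d)
    return d


# Threshold below 1: the discussion fades toward zero.
fading = simulate_discussers(beta=0.5, gamma=1.0)

# Threshold above 1: the discussion settles at a positive level, 1 - gamma/beta.
persisting = simulate_discussers(beta=2.0, gamma=1.0)

print(threshold(0.5, 1.0), fading)
print(threshold(2.0, 1.0), persisting)
```

In this toy setting the two regimes mirror the qualitative statement above: below the threshold the discusser fraction decays, above it the fraction stays bounded away from zero.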
Our main contribution is a novel time-varying state model, the first of its kind, that depicts the dissemination mechanism of hot online topics. We also put forward the first universal trend prediction theory for hot topics based on moving time windows, which can be applied effectively to distinct hot topics; our results show that the proposed prediction method is valid for both single-peak and multi-peak hot topics.
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 presents the problem formulation. Section 4 defines the dynamic propagation model of hot online topics. In Section 5, we analyze the stability of the equilibrium of the topic in system (2.1). Experiments are discussed in Section 6. Section 7 presents the conclusions and future work.
Section snippets
Related work
Scholarly interest in social media analysis has increased over the past several years with the growing use of tools such as weblogs. Many earlier studies addressed different aspects of the following five stages in blog networks, ranging from topic data acquisition to propagation modeling of hot online topics. First, semantic analysis was used to develop data mining and topic detection techniques for studying online topics at their inception. Zheng [29] proposed a document
Problem formulations
We now formalize the problem of modeling the propagation of a hot online topic.
To begin, we denote the target hot topic by TD, the other topics that belong to the same category as the target hot topic by TR, and the topic category by T.
The user subset D = {dk(t)} (1 ⩽ k ⩽ m, m < n) denotes the Discussed Group, where dk(t) represents an individual who participates in writing or commenting upon article posts about the target hot online topic TD at time t (for convenience, we refer to dk(t) as a discusser of the topic);
Model definitions
We now introduce the time-varying state equations that describe the transmission mechanism of a hot online topic.
From Fig. 3.1, it is reasonable to suppose that each user in the exited group E(t) loses interest in writing or commenting upon article posts on any topic in the same topic category T as the target hot topic TD. We then establish the following state equations according to the previous discussion and Fig. 3.1: We describe all parameters as follows.
Analysis of hot online topics propagation model
To develop a proper method for predicting the trend of a hot online topic with our time-varying dynamic model, we need to understand the stability of the model's equilibrium. Therefore, we first analyze the stability of the equilibrium of the constant dynamic propagation model (CDPM), in which all parameters are constant. This helps us understand the stability of the equilibrium of our time-varying hot topic propagation model (THTPM). The
Experiments
With both theoretical motivation and practical application in mind, we devise Method (I) mainly for theoretical purposes: it validates the model, verifies our theorems, and illustrates how to establish the threshold Q∗ or Q∗ and how to predict the trend of a hot online topic in advance from that threshold. We also design Method (II) to overcome the shortcomings of Method (I), so that it can be applied in practice to predict the trend of single-peak as well as multi-peak hot topics.
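The idea of predicting the next two days from a moving time window can be sketched generically. The code below is only an illustration under stated assumptions, not Method (II) itself: it replaces the model-based estimation with a plain least-squares line fitted to the most recent window of daily discusser counts, and the names fit_line, predict_next, window, horizon and the sample counts are all hypothetical.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_next(counts, window=5, horizon=2):
    """Extrapolate the next `horizon` days from the last `window` daily counts.

    A linear fit stands in for the model-based step (an assumption made
    purely for illustration). Forecasts are clipped at zero because a
    count of users cannot be negative.
    """
    if window < 2 or len(counts) < window:
        raise ValueError("need at least `window` >= 2 observations")
    recent = counts[-window:]
    slope, intercept = fit_line(recent)
    return [max(0.0, intercept + slope * (window - 1 + h))
            for h in range(1, horizon + 1)]


# Hypothetical daily counts of discusser users for one topic.
daily_counts = [10, 20, 30, 40, 50, 60]
forecast = predict_next(daily_counts, window=4, horizon=2)
print(forecast)
```

Each day the window slides forward by one observation and the fit is repeated, so the forecast can follow a topic whose activity rises and falls more than once.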
Conclusions and future work
The propagation of a hot online topic is correlated with the collective behavior of different user groups on a blog network or BBS site. Thus, by constructing a dynamic propagation model consisting of time-varying state equations for the different user groups, much as in epidemic modeling, we can approximate the actual hot-topic propagation process on a blog network or BBS site. This time-varying dynamic model (THTPM) is the first proposed to describe the propagation of hot online topics. Furthermore, Theorem 5.1,
References (30)
- et al., How valuable is medical social media data? Content analysis of the medical web, Information Sciences (2009)
- et al., Information discovery across multiple streams, Information Sciences (2009)
- et al., Data-based mechanistic modeling of stochastic rainfall-flow processes by state dependent parameter estimation, Environmental Modeling and Software (2009)
- et al., Dynamical behavior of an epidemic model with a nonlinear incidence rate, Journal of Differential Equations (2003)
- et al., Sliding window-based frequent pattern mining over data streams, Information Sciences (2009)
- et al., Global analysis of an epidemic model with a constant removal rate, Mathematical and Computer Modelling (2007)
- et al., Exploiting noun phrases and semantic relationships for text document clustering, Information Sciences (2009)
- The Mathematical Theory of Infectious Diseases and its Applications (1975)
- L.B. Cao, C.Q. Zhang, Y.C. Zhao, Philip S. Yu, G. Williams, DDDM2007: Domain driven data mining, in: ACM SIGKDD...
- M.D. Choudhury, H. Sundaram, A. John, D.D. Seligmann, Multi-scale characterization of social network dynamics in the...