On modeling and predicting popularity dynamics via integrating generative model and rich features

https://doi.org/10.1016/j.knosys.2020.105786

Abstract

Understanding why one online message acquires more popularity than another, modeling how it gains popularity over time, and predicting its popularity dynamics are of great interest to related decision support systems. However, a major limitation of existing generative dynamics models is that their learned parameters are difficult to interpret, and it is unclear whether they generalize to other messages, because the models are trained for each message independently and the feature-based connections between messages are ignored. To alleviate these defects, we first perform experiments on real-world data from Sina Weibo to identify the general correlation between the dynamics model and the rich features of online messages. We then present a novel feature-regularized dynamics model based on the reinforced Poisson process (FRRPP), which regularizes the parameter learning of popularity dynamics by integrating a feature regression term that captures the revealed correlation across online posts. Specifically, in addition to maximizing the likelihood, we assume that the competitiveness parameter of each post can be predicted from rich features, which enhances the interpretability and generality of the point-process model and allows the dynamics of different posts to be learned jointly. The proposed model is evaluated on two real Sina Weibo datasets, and the experimental results indicate that it achieves a remarkable improvement over baseline methods in terms of MAPE and Accuracy under various settings, which further supports our findings on how to improve the generality of popularity dynamics modeling and prediction.

Introduction

With the ubiquitous use of Web 2.0 services, users can publish, broadcast, and participate in online user-generated content (UGC), including tweets, microblogs, videos, and images, which places the economy of attention at the center of this era. This economy is governed by a process in which different pieces of content compete with each other, with a few gaining prominence and most of the others losing their novelty over time [1]. Finding an effective model to predict the popularity of online content within a dynamically evolving system has become a key issue for related decision-making processes, such as government decisions on opinion detection during emergencies and business decisions on online brand dissemination.

Early studies focused on the process by which users access web sites. They observed that the distribution of requests for online pages is highly skewed and follows Zipf’s law [2]. Recently, predicting the dynamic popularity of online content has attracted more attention because of its implications for a wide range of applications such as online news [3], online advertising [4], election prediction [5], and crisis management [6]. Predicting future popularity indicates the extent to which users will react to the online content, and is therefore useful for guiding public policy and business decision making regarding an event. On social media platforms such as Sina Weibo and Twitter, a post gains attention and acquires popularity when it is forwarded or commented on by users who can access it.

Generally, the popularity prediction task is defined as follows: given an online post p on social media and the dynamics of its popularity gain (forwards or comments) $T^p$ in the early stage after it is published, predict its future popularity at a particular time, where popularity is represented by the number of forwards or comments it acquires. These social platforms are generally large-scale open systems and may be affected by exogenous factors, which makes it challenging to accurately predict the popularity of messages with complex popularity dynamics.

Current methods for popularity prediction can be classified into two main groups, each with its own strengths and limitations. The group of feature-driven methods predicts the popularity of an online message by formulating the problem as a standard classification or regression task. These methods exploit the rich features of a message, including temporal features [7], [8], content features [9], textual features [10], [11], sentiment features [12], [13], and network features [14], to explain the popularity of an item. Although they reveal many interpretable and effective features for prediction, their performance depends heavily on hand-crafted features, so there are always further features to consider. The features are often based on prior human domain knowledge and may be specific to a particular platform or a particular type of online message. Moreover, these methods typically cast popularity prediction as a binary classification of whether the attention a message receives will exceed a given threshold, which does not generalize to predicting the actual popularity value.

The second group (generative approaches) treats the popularity dynamics as an arrival process and solves the popularity prediction problem by fitting the time series of when online posts gain popularity using certain first-order or zero-order functions [15], [16], [17], [18]. The strength of this group over the former is that it can predict the exact level of popularity by fitting the time series. Although some of these methods focus on the popularity gained within a fixed period, methods based on stochastic processes can reveal the underlying arrival dynamics. Within this group, a probabilistic model based on the reinforced Poisson process (RPP) was proposed to explicitly model the process by which each item gains its popularity [17]. Subsequent work extended the RPP with two specific functions (a power-law temporal decay function and an exponential reinforcement function) [18].

Despite the strong performance of the RPP in modeling how online posts dynamically gain retweets, as discussed in [18], we find two main limitations in its application to predicting the popularity of online posts. First, because only the time series of when an online item gains popularity is considered, its future popularity can be predicted only after the item has received some attention in a period after publication. It is therefore impossible to predict the future popularity of an item immediately after it appears, before it has gained any attention, because the RPP needs to learn its parameters from the observed dynamics. Second, and more importantly, because only the time series data are considered and informative features are absent from the model, the parameters of the RPP are difficult to interpret and are learned independently for each item, losing the correlation between items, which restricts the generality of the estimated model. Given the popularity dynamics of an item, the parameters of the RPP model can be learned by maximizing the likelihood of the observed dynamics. However, the relevance of these parameters is uncertain, and it is unclear whether the learned parameters can be used to learn the dynamics of related items with similar features, even when the dynamics of those items are incomplete or sparse.
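As a minimal sketch of the likelihood-based fitting mentioned above, the following assumes an RPP rate of the form λ·f(t)·(m + i − 1) for the i-th arrival, where f is a log-normal relaxation density and m is a prior attention constant. This form, the fixed relaxation parameters `mu` and `sigma`, the function names, and the toy timestamps are assumptions of the sketch and should be checked against [17]; they are not the authors' code. With the relaxation parameters held fixed, the maximum likelihood estimate of λ has a closed form.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """Cumulative relaxation F(t): the log-normal CDF (assumed relaxation form)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def fit_lambda(event_times, T, mu, sigma, m=1.0):
    """Closed-form MLE of lambda for fixed (mu, sigma).

    Setting d/d(lambda) of the log-likelihood to zero gives
        lambda* = n / ((m + n) * F(T) - sum_i F(t_i)).
    """
    n = len(event_times)
    exposure = (m + n) * lognormal_cdf(T, mu, sigma) - sum(
        lognormal_cdf(t, mu, sigma) for t in event_times
    )
    return n / exposure


def predict_count(t, event_times, T, mu, sigma, lam, m=1.0):
    """Expected cumulative count at time t >= T, given n events observed up to T."""
    n = len(event_times)
    growth = math.exp(lam * (lognormal_cdf(t, mu, sigma) - lognormal_cdf(T, mu, sigma)))
    return (m + n) * growth - m


# Hypothetical forwarding times (hours after publication), observed up to T = 4.
times = [0.5, 1.0, 2.0, 3.5]
lam_hat = fit_lambda(times, T=4.0, mu=1.0, sigma=1.0)
forecast = predict_count(8.0, times, T=4.0, mu=1.0, sigma=1.0, lam=lam_hat)
```

With an empty event list the estimate is λ = 0 and the forecast stays at zero, which illustrates the first limitation: the model has nothing to fit before a post receives any attention.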

In essence, the objective of this research is to alleviate the limitations of these dynamics models, that is, to improve the interpretability and effectiveness of the RPP model by exploiting rich features of online posts in addition to time series data. Specifically, we seek to answer the following questions in this study:

(1) Is there any correlation between the rich features of online posts and the corresponding parameters of a dynamics model based on a stochastic process that exploits only the time series data of when the posts gain popularity?

(2) If this correlation exists, how can the rich features of online posts be integrated to improve the performance of the dynamics model?

(3) How can the newly proposed model be solved, and does it improve the effectiveness of the popularity prediction task on real data?

To answer these questions, we first reveal strong correlations between the rich features of online posts and the corresponding parameters learned from their popularity dynamics. Based on this finding, we extend the RPP model and address the above defects by regularizing the parameter learning of the popularity dynamics with a feature regression term; we name the resulting model the Feature-Regularized Reinforced Poisson Process (FRRPP). In addition to the maximum likelihood term, the intrinsic competitiveness parameters of different posts are assumed to be linearly regressed on the rich features extracted from social microblogs. We thus impose a feature regression regularizer on the overall optimization objective, so that the rate functions of all online posts are learned together. The proposed model therefore takes advantage of both paradigms of existing methods, as it uses rich features as well as fitting the dynamics to time series data.
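One way to write down an objective of this kind is sketched below. The notation is assumed for illustration and is not taken from the paper: $\lambda_p$ is the competitiveness parameter of post $p$, $\theta_p$ collects its remaining dynamics parameters, $\mathcal{L}_p$ is its RPP likelihood, $\mathbf{x}_p$ is its feature vector, $\mathbf{w}$ is a shared regression weight vector, and $\gamma$ balances the two terms. The squared-error form of the regularizer is likewise an assumption.

```latex
\min_{\{\lambda_p,\,\theta_p\}_{p=1}^{N},\;\mathbf{w}}
\;-\sum_{p=1}^{N} \log \mathcal{L}_p(\lambda_p, \theta_p)
\;+\; \frac{\gamma}{2} \sum_{p=1}^{N}
\bigl(\lambda_p - \mathbf{w}^{\top}\mathbf{x}_p\bigr)^{2}
```

Under this reading, setting $\gamma = 0$ recovers independent RPP fits per post, while a large $\gamma$ pushes each $\lambda_p$ toward the value predicted from its features, which is what couples the posts through the shared $\mathbf{w}$.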

The proposed FRRPP not only inherits the high predictive power of the RPP model but also possesses the high interpretability of feature-driven approaches, bridging the gap between prediction and understanding of information dynamics. The main difference from existing models lies in using the interpretability and cross-post connections of feature-driven approaches to improve the robustness of a generative model, which can be seen as performing regression and dynamics fitting at the same time. The proposed algorithm is evaluated on two Sina Weibo datasets. The experimental results indicate that the proposed model improves performance remarkably compared with the previous RPP [17] and extended RPP [18] models. Moreover, we explore the inherent characteristics and distribution of the parameters learned by the proposed method.

In summary, the contribution of this work is three-fold:

(1) We investigate how the features of online content are correlated with the estimated parameters of the generative model when the popularity dynamics of each item are fitted independently.

(2) We propose a novel feature-regularized reinforced Poisson process model based on our finding of a high correlation between features and estimated parameters. It captures the correlation across posts between how similar their features are and how similarly they gain popularity, which is expected to improve popularity prediction.

(3) We present a parameter learning algorithm based on alternating optimization and gradient descent to solve the newly proposed model. We also test the proposed method on real datasets, and extensive experiments demonstrate its effectiveness.
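The alternating scheme in contribution (3) can be illustrated with a simplified sketch. It assumes a per-post objective of the form −n·log λ + A·λ + (γ/2)(λ − w·x)², where n is the observed event count and A is a fixed exposure term computed from the event times; this form, all names, and the toy data are hypothetical. Because the other dynamics parameters are held fixed here, both blocks have closed-form updates, so the sketch swaps the gradient descent step the paper mentions for exact block minimization. It is not the authors' algorithm.

```python
import numpy as np


def update_lambdas(n, A, mu, gamma):
    """Exact minimizer of -n*log(lam) + A*lam + gamma/2*(lam - mu)^2 for each post.

    The stationarity condition is the quadratic
        gamma*lam^2 + (A - gamma*mu)*lam - n = 0,
    whose positive root is returned (requires n > 0 and gamma > 0).
    """
    b = A - gamma * mu
    return (-b + np.sqrt(b * b + 4.0 * gamma * n)) / (2.0 * gamma)


def update_weights(X, lam):
    """Least-squares fit of lam on the features X (the regression block)."""
    return np.linalg.lstsq(X, lam, rcond=None)[0]


def objective(n, A, lam, X, w, gamma):
    """Negative log-likelihood (up to constants) plus the feature regularizer."""
    nll = np.sum(-n * np.log(lam) + A * lam)
    reg = 0.5 * gamma * np.sum((lam - X @ w) ** 2)
    return float(nll + reg)


def fit_sketch(n, A, X, gamma=1.0, iters=30):
    """Alternate between the regression weights w and the per-post lambdas."""
    lam = n / A  # start from the independent per-post estimates
    history = []
    for _ in range(iters):
        w = update_weights(X, lam)                 # block 1: w with lambdas fixed
        lam = update_lambdas(n, A, X @ w, gamma)   # block 2: lambdas with w fixed
        history.append(objective(n, A, lam, X, w, gamma))
    return lam, w, history


# Hypothetical data: 4 posts, an intercept column plus one feature.
n = np.array([5.0, 20.0, 8.0, 50.0])
A = np.array([2.0, 4.0, 2.5, 6.0])
X = np.array([[1.0, 0.1], [1.0, 0.9], [1.0, 0.3], [1.0, 1.5]])
lam, w, history = fit_sketch(n, A, X)
```

Each block update minimizes the objective exactly with the other block fixed, so the recorded objective values cannot increase from one iteration to the next. When the remaining dynamics parameters are also learned, no closed form is available for them and a gradient step would take the place of the exact update.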

The remainder of this paper is organized as follows. The related works are briefly reviewed in Section 2, and we present the problem statement and introduce the related RPP model in Section 3. A detailed description of our findings, models, and learning algorithm is provided in Section 4. In Section 5, the experimental results and discussion are presented, followed by the conclusion in Section 6.

Related works

Popularity prediction of online content (such as tweets, images, and videos) involves the estimation of the number of users who will participate in the propagation and interaction of the given item in a period in the future. It has drawn widespread attention in the literature. Generally, based on when the popularity prediction task is performed relative to the propagation of the item over time, it can be classified into two categories: static prediction and dynamic prediction. Static popularity

Problem statement

In this study, we model the dynamic process by which an online post on social media gains popularity from users during an observed time period and then infer its future popularity volume. For a given published post, we take the number of users commenting on or forwarding it as its popularity. The popularity dynamics of each post p during the observed time period $[T_0^p, T_o^p]$ can be defined by a set of time intervals $\{T_i^p\}$ $(1 \le i \le n_p)$ with each arriving comment,

Proposed feature regularized RPP model

In this section, we first show the trends we identified and then present the proposed model based on this observation.

Experimental results

To fully examine the effectiveness of our proposed method, we first present extensive comparison results that test the performance of the proposed FRRPP models. We then provide a detailed analysis of the learned parameters of the models. To ease reproduction of our results, we make the source code of FRRPP and our crawled dataset publicly available on GitHub.
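The abstract names MAPE and Accuracy as the evaluation metrics. The sketch below shows how such metrics are typically computed. MAPE is the standard mean absolute percentage error. Accuracy is assumed here to mean the fraction of posts whose relative error falls within a tolerance `eps`; the paper's exact definition and threshold are not shown in this excerpt, so both the definition and the default value are assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over posts (actual counts must be nonzero)."""
    errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def accuracy(actual, predicted, eps=0.1):
    """Fraction of posts with relative error at most eps (assumed definition)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) / a <= eps)
    return hits / len(actual)


# Hypothetical true and predicted popularity counts for two posts.
true_counts = [10, 20]
pred_counts = [12, 18]
score_mape = mape(true_counts, pred_counts)
score_acc = accuracy(true_counts, pred_counts, eps=0.15)
```

A lower MAPE and a higher Accuracy both indicate better predictions, so the two metrics move in opposite directions when a model improves.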

Discussion and conclusion

Accurately modeling and predicting the popularity of online content plays a critical role in the decision-making process supported by social media, such as government decision for opinion detecting in emergencies and business decision for brand dissemination. To overcome the limitation of the generative approaches, we proposed a feature-regularized reinforced Poisson process for modeling the popularity dynamics to regulate the learning parameters of the RPP by integrating a feature regression

CRediT authorship contribution statement

Xiaodong Feng: Conceptualization, Methodology, Investigation, Validation, Writing - original draft. Qihang Zhao: Software, Data curation, Visualization. Jie Ma: Supervision, Project administration, Writing - review & editing. Guoyin Jiang: Writing - review & editing, Conceptualization, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was partially supported by a grant from the National Natural Science Foundation of China (No. 71671060) and by the Science and Technology Program of Sichuan, China (No. 2019JDR0011).

References (47)

  • G. Szabo et al., Predicting the popularity of online content, Commun. ACM (2010)
  • H. Pinto, J.M. Almeida, M.A. Goncalves, Using early view patterns to predict the popularity of youtube videos, in: ACM...
  • R. Bandari et al., The pulse of news in social media: forecasting popularity
  • L. Hong, O. Dan, B.D. Davison, Predicting popular messages in twitter, in: International Conference on World Wide Web,...
  • M. Tsagkias et al., Predicting the volume of comments on online news stories
  • J. Berger et al., What makes online content viral?, J. Mark. Res. (2012)
  • Z. Ma et al., On predicting the popularity of newly emerging hashtags in twitter, J. Assoc. Inf. Sci. Technol. (2013)
  • S. Gao et al., Effective and effortless features for popularity prediction in microblogging network
  • Y. Matsubara et al., Rise and fall patterns of information diffusion: model and implications
  • M. Gomez-Rodriguez et al., Modeling information propagation with survival theory
  • H.W. Shen et al., Modeling and predicting popularity dynamics via reinforced poisson processes
  • S. Gao, J. Ma, Z. Chen, Modeling and predicting retweeting dynamics on microblogging platforms, in: ACM International...
  • J. Cheng et al., Can cascades be predicted?
