Efficient influence spread management via budget allocation at community scale
Introduction
Influence maximization (IM) is a key technique in many social network applications and has attracted considerable attention. By selecting a small number of nodes as seeds, we expect to trigger a large cascade of information spreading in a social network. For instance, in viral marketing (Domingos and Richardson, 2001, Richardson and Domingos, 2002), a company provides free products to some influential individuals and expects its products to become known to most people in the social network through the word-of-mouth effect. Similarly, the misinformation prevention (MP) problem is a form of competitive influence maximization in which two cascades compete against each other: the misinformation cascade (M-cascade) and the positive cascade (P-cascade), where the P-cascade tries to protect nodes from the M-cascade. Given the seed nodes that start the M-cascade (M-seeds), the MP problem aims to select k positive seeds (P-seeds) to start the P-cascade so as to minimize the spread of the M-cascade. The MP problem was first proposed by Budak, Agrawal, and El Abbadi (2011), who showed that the greedy approach achieves a (1 - 1/e)-approximation. Since then, MP problems have been actively explored (He et al., 2012, Song et al., 2017, Tsai et al., 2012, Tong et al., 2018).
Existing works on both IM and MP problems mainly adopt the k-seeds triggering model, i.e., selecting k seeds from a network of n nodes to start the desired information spread. The underlying assumption is that any node can be convinced to become a seed by a material incentive. However, in real applications, such an explicit triggering model does not always work, because a seed node may decline the offer. An alternative is the implicit seed triggering model, in which nodes voluntarily become seeds in response to implicit incentives such as discounts or pushed advertisements (Yang, Mao, Pei, & He, 2016). The implicit model transforms discrete node selection into continuous budget allocation. In this case, however, we need to search for an optimal budget allocation in an exponential space of order n, which is computationally prohibitive. Inspired by group-based IM (Eftekhar, Ganjali, & Koudas, 2013), in this paper we propose the community-based seed triggering model Com-seed, in which the budget is allocated over communities under integer lattice constraints, since people in the same community share the same information channels, such as community websites or billboards. In addition, community budgets can be adjusted by changing the number of notice days or the number of ad pushes in order to achieve efficient propagation. Two new problems, namely Influence Maximization at Community Scale (IMCS) and Misinformation Prevention at Community Scale (MPCS), are also proposed and studied.
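To make the idea of community-level triggering concrete, the following minimal sketch draws one seed set from an integer budget allocation over communities. The triggering rule is an assumption made only for illustration: each budget unit is treated as an independent attempt that turns a node into a seed with probability `unit_prob`. The function name `sample_seed_set` and all parameters are hypothetical and are not taken from the paper.

```python
import random


def sample_seed_set(communities, allocation, unit_prob, rng):
    """Draw one seed set from a community-level budget allocation.

    communities: dict mapping community id -> list of node ids
    allocation:  dict mapping community id -> non-negative integer budget
    unit_prob:   assumed chance that one budget unit turns a node into a seed
    rng:         random.Random instance, passed in for reproducibility
    """
    seeds = set()
    for cid, nodes in communities.items():
        budget = allocation.get(cid, 0)
        # Assumption: budget units act independently, so a node becomes a
        # seed with probability 1 - (1 - unit_prob) ** budget.
        p_seed = 1.0 - (1.0 - unit_prob) ** budget
        for node in nodes:
            if rng.random() < p_seed:
                seeds.add(node)
    return seeds


communities = {"a": [0, 1, 2], "b": [3, 4]}
allocation = {"a": 2, "b": 0}
seeds = sample_seed_set(communities, allocation, 0.3, random.Random(7))
```

Under this assumed rule, a community with zero budget never produces seeds, and the search space is indexed by communities rather than by individual nodes.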
The IMCS and MPCS problems in the Com-seed model are inherently NP-hard. To tackle the #P-hardness of computing influence, we carefully modify the state-of-the-art reverse influence sampling techniques from Influence Maximization via Martingales (IMM) (Tang, Shi, & Xiao, 2015) and Hybrid-sampling based Misinformation Prevention (HMP) (Tong & Du, 2019). Accordingly, we propose the Reverse Influence based Community greedy algorithm (RIC) and the Community-based Misinformation Prevention algorithm (CMP), which return solutions with approximation guarantees for IMCS and MPCS, respectively. Compared with existing IM and MP models, our method has three potential advantages. (1) Methods that select seeds directly, such as IMM and HMP, may suffer from performance degradation in practical applications if the chosen seeds reject the task. In contrast, our RIC and CMP algorithms adopt a more reliable strategy by targeting a user community instead of a single user, so that some users in the community may voluntarily become propagation seeds. (2) Compared with the existing group-scale method (Eftekhar et al., 2013), in which each community is treated equally in budget allocation, our model uses a weighted budget allocation among communities to achieve better performance. (3) Existing node-based influence maximization approaches (Yang et al., 2016) face a performance bottleneck in large social networks, since each node has its own triggering process and thus incurs an expensive computational cost. Our community-based approach triggers propagation at the community level. Because the number of communities is much smaller than the number of nodes, the computational cost is reduced.
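For readers unfamiliar with reverse influence sampling, the sketch below shows the standard node-level technique under the independent cascade model. It is not the authors' modified RIC or CMP procedure; the graph representation (`in_neighbors`) and the function names are assumptions chosen for illustration.

```python
import random
from collections import deque


def sample_rr_set(in_neighbors, nodes, rng):
    """Sample one reverse reachable (RR) set under the independent cascade model.

    in_neighbors: dict mapping node -> list of (source, probability) pairs
    nodes:        list of all node ids
    """
    root = rng.choice(nodes)
    rr_set = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, prob in in_neighbors.get(v, []):
            # Each incoming edge is kept ("live") with its own probability.
            if u not in rr_set and rng.random() < prob:
                rr_set.add(u)
                queue.append(u)
    return rr_set


def estimate_spread(seed_set, rr_sets, num_nodes):
    """Estimate influence as n times the fraction of RR sets hit by the seeds."""
    hits = sum(1 for rr in rr_sets if rr & seed_set)
    return num_nodes * hits / len(rr_sets)


# Toy path graph 0 -> 1 -> 2 in which every edge always fires.
in_neighbors = {1: [(0, 1.0)], 2: [(1, 1.0)]}
rng = random.Random(0)
rr_sets = [sample_rr_set(in_neighbors, [0, 1, 2], rng) for _ in range(200)]
spread_from_0 = estimate_spread({0}, rr_sets, 3)
```

In the toy graph every RR set contains node 0, so the estimate for the seed set {0} equals the full network size. Sampling replaces repeated forward simulation, which is why this family of methods scales to large graphs.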
It is worth noting that the Com-seed triggering model proposed in this paper can be applied in various settings, such as community-based marketing (Zeng et al., 2009, Mckenzie-Mohr, 2000), group-based recommendation (Hu et al., 2014), and uncertain-edge-based clustering (Li, Kong, Jia, & Li, 2018). Although we mainly focus on finding approximate solutions to the IM and MP problems with fixed parameters in the Com-seed triggering model, machine learning approaches can be employed to learn the parameters from real seed transition data. This may yield deeper insight into user interactions within communities and open further research directions in community influence diffusion and intelligent systems.
To summarize, it is worthwhile to highlight our contributions as follows.
1. We present a unified community-based seed triggering model for both influence maximization and misinformation prevention problems, in which efficient influence spread is achieved via budget allocation at community scale.
2. We design greedy algorithms that find approximate solutions, with a guaranteed approximation ratio, for both influence maximization and misinformation prevention problems at community scale.
3. To address the scalability concern, we propose efficient algorithms based on reverse sampling schemes for influence maximization and misinformation prevention problems at community scale. Both find approximate solutions with high confidence.
4. We perform extensive experiments on three real-world datasets. The experimental results show that our methods substantially outperform all baselines and run much faster than IMM and HMP.
Related works
The influence maximization problem was first proposed by Domingos and Richardson (2001) and Richardson and Domingos (2002). Kempe, Kleinberg, and Tardos (2003) formulate influence maximization as a combinatorial optimization problem, which is NP-hard, and propose a (1 - 1/e)-approximation greedy approach. However, the greedy algorithm is not scalable enough to deal with large networks. Many subsequent works (Yang et al., 2015, Leskovec et al., 2007, Goyal et al., 2011, Chen et al., 2009, Chen et
Model and problem definition
Before we start to introduce definitions, we list commonly used symbols in Table 1.
Community-based influence maximization
Given an allocation vector, many potential seed sets S can be generated from it. We consider the collection of all seed sets that the allocation vector can trigger, and we assume that the allocation vector generates each seed set S in this collection with a certain probability. The expected influence of the allocation vector can therefore be expressed as in Eq. 1.
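As a reading aid, the expectation described above can be written out as follows. The notation is assumed here because the original symbols are not available: x is the allocation vector, C(x) is the collection of seed sets that x can trigger, Pr[S | x] is the probability that x generates S, and sigma(S) is the expected influence of seed set S. This is a sketch of the general form, not a reproduction of the paper's Eq. 1.

```latex
% Assumed notation: \mathbf{x} allocation vector, \mathcal{C}(\mathbf{x}) seed
% sets triggerable by \mathbf{x}, \sigma(S) expected influence of seed set S.
\[
  \sigma_c(\mathbf{x})
  \;=\;
  \sum_{S \in \mathcal{C}(\mathbf{x})} \Pr[S \mid \mathbf{x}] \, \sigma(S)
\]
```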
Next, we show in Lemma 1 that the objective function is non-negative, monotonically non-decreasing, and submodular.
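Non-negativity, monotonicity, and submodularity are the properties under which greedy allocation is usually analyzed. The sketch below shows a generic greedy procedure on the integer lattice that adds one budget unit at a time to the community with the largest marginal gain. The objective is passed in as a black box, and the per-community `capacity` cap, the toy objective, and all names are hypothetical; the paper's own algorithms estimate the objective by sampling instead.

```python
def greedy_lattice_allocation(objective, community_ids, total_budget, capacity):
    """Allocate integer budget units one at a time by largest marginal gain.

    objective: callable taking a dict {community id: budget}, returning a number
    capacity:  dict {community id: maximum budget allowed} (hypothetical cap)
    """
    allocation = {cid: 0 for cid in community_ids}
    current = objective(allocation)
    for _ in range(total_budget):
        best_cid, best_gain = None, 0.0
        for cid in community_ids:
            if allocation[cid] >= capacity[cid]:
                continue
            # Tentatively add one unit, measure the gain, then undo.
            allocation[cid] += 1
            gain = objective(allocation) - current
            allocation[cid] -= 1
            if gain > best_gain:
                best_cid, best_gain = cid, gain
        if best_cid is None:
            break  # no remaining unit yields a positive gain
        allocation[best_cid] += 1
        current += best_gain
    return allocation


# Toy objective with diminishing returns in each community (an assumption).
sizes = {"a": 10, "b": 4}


def toy_objective(x):
    return sum(sizes[c] * (1 - 0.5 ** x[c]) for c in sizes)


result = greedy_lattice_allocation(toy_objective, ["a", "b"], 3, {"a": 3, "b": 3})
```

In the toy run the first two units go to community "a" (gains 5 and 2.5), and the third goes to "b" because its gain of 2 exceeds the 1.25 that a third unit in "a" would bring.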
Community-based misinformation prevention
Similar to the IMCS problem, we use a virtual seed set to denote the effect of the allocation vector. The objective can then be rewritten in terms of the expected number of nodes that the virtual seed set can reach at least one hop earlier than the M-seeds in the community-based realization.
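The "at least one hop earlier" condition can be illustrated on a single fixed realization with two breadth-first searches. The sketch below is a simplification under stated assumptions: the realization is given as a plain adjacency dict, ties go to the misinformation cascade, and blocking effects between the two cascades are ignored. The function names are hypothetical and do not come from the paper.

```python
from collections import deque


def hop_distances(adjacency, sources):
    """Breadth-first hop distance from the nearest source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def protected_nodes(adjacency, m_seeds, p_seeds):
    """Nodes the M-seeds would reach but the P-seeds reach strictly earlier."""
    d_m = hop_distances(adjacency, m_seeds)
    d_p = hop_distances(adjacency, p_seeds)
    return {v for v, dm in d_m.items() if v in d_p and d_p[v] < dm}


# Toy realization: m -> a -> b -> c, and p -> b.
adjacency = {"m": ["a"], "a": ["b"], "b": ["c"], "p": ["b"]}
saved = protected_nodes(adjacency, ["m"], ["p"])
```

Here the P-seed reaches "b" in one hop and "c" in two, each one hop ahead of the misinformation, while "a" is not reachable from the P-seed and therefore is not protected.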
Dataset
We use three datasets from SNAP (Leskovec & Krevl, 2014): email-Eu-core, DBLP, and Youtube. The dataset information is listed in Table 2. The email-Eu-core dataset is the core email network of a large European research institution. Each individual belongs to exactly one of 42 departments at the research institute. The DBLP dataset is a co-authorship network constructed from the DBLP computer science bibliography, where two authors are connected if they publish at least one paper together and
Conclusion
In this paper, we propose the influence maximization and misinformation prevention problems at community scale, IMCS and MPCS, and study them under our community-based seed triggering model. Instead of selecting seed nodes directly, we allocate budgets to communities. To solve the IMCS problem, we first sample RIC sets and then greedily allocate budgets according to the RIC sets. As for the MPCS problem, we use a hybrid sampling method to get l R-samples and also use a greedy method to allocate
CRediT authorship contribution statement
Can Wang: Conceptualization, Formal analysis, Resources, Writing - original draft, Writing - review & editing, Supervision, Project administration. Yangguang Zhang: Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Qihao Shi: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. Yan Feng: Resources, Supervision, Funding acquisition. Chun Chen: Resources, Supervision, Funding
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work is funded by the National Key Research and Development Project (Grant No: 2018AAA0101505) and the State Grid Corporation of China Scientific and Technology Project: Fundamental Theory of Human-in-the-loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.
References (41)
- et al. (2017). Community-based influence maximization in social networks under a competitive linear threshold model. Knowledge-Based Systems.
- et al. (2016). Big social network influence maximization via recursively estimating influence spread. Knowledge-Based Systems.
- et al. (2017). Cofim: A community-based framework for influence maximization on large-scale networks. Knowledge-Based Systems.
- et al. (2019). Location driven influence maximization: Online spread via offline deployment. Knowledge-Based Systems.
- et al. (2019). Post and repost: A holistic view of budgeted influence maximization. Neurocomputing.
- Borgs, C., Brautbar, M., Chayes, J. & Lucier, B. (2014). Maximizing social influence in nearly optimal time. In Procs...
- et al. Threshold models for competitive influence in social networks.
- Budak, C., Agrawal, D. & El Abbadi, A. (2011). Limiting the spread of misinformation in social networks. In Procs of...
- Chen, W., Wang, C. & Wang, Y. (2010). Scalable influence maximization for prevalent viral marketing in large-scale...
- Chen, W., Wang, Y. & Yang, S. (2009). Efficient influence maximization in social networks. In Procs of SIGKDD (pp....
- Information cascade at group scale.
- A data-based approach to social influence maximization. Procs of VLDB.
- Deep modeling of group preferences for group-based recommendation.