Efficient influence spread management via budget allocation at community scale

https://doi.org/10.1016/j.eswa.2021.114814

Highlights

  • We propose a unified community-based seed triggering model.

  • Two new variants of influence maximization and prevention problems are defined.

  • Efficient algorithms with provable approximation guarantees are designed.

  • Extensive experiments demonstrate the effectiveness of our methods.

Abstract

Given a social network, the classical influence maximization (IM) and misinformation prevention (MP) problems adopt similar seed triggering models: k specific users are convinced to become seed nodes through material incentives (e.g., free products). In real life, however, the chosen seeds may not be willing to spread information as expected, which affects the final diffusion. Instead of convincing a single user, we can target a user community, in the hope that some of its members will voluntarily turn into propagation seeds. This community-based seed (Com-seed) triggering model applies to real-world scenarios such as distributing flyers or offering discounts in local communities, where the objective is to maximize the promotion effect under a given budget constraint. In this paper, we aim to maximize the influence or minimize the misinformation spread by finding an optimal community-based budget allocation under the Com-seed triggering model. We present new formulations of the influence maximization and misinformation prevention problems from the community perspective and design effective and scalable algorithms to solve them. With carefully designed community-based sampling schemes and the approximation guarantee of the greedy approach over the integer lattice, our algorithms achieve (1-1/e-ε)-approximation results. Experiments show that our methods outperform all baselines and run faster than state-of-the-art methods on both the influence maximization and misinformation prevention problems, which demonstrates the effectiveness and scalability of the proposed algorithms.

Introduction

Influence maximization (IM) is a critical technique in many social network applications and has attracted considerable attention. By selecting a small number of nodes as seeds, we expect to trigger a large cascade of information spreading in a social network. For instance, in viral marketing (Domingos and Richardson, 2001, Richardson and Domingos, 2002), a company provides free products to some influential individuals and expects its products to become known to most people in the social network via the word-of-mouth effect. The misinformation prevention (MP) problem is a form of competitive influence maximization in which two cascades compete with each other: a misinformation cascade (M-cascade) and a positive cascade (P-cascade) that tries to protect nodes from the misinformation. Given the seed nodes that start the M-cascade (M-seeds), the MP problem aims to select k positive seeds (P-seeds) to start the P-cascade so as to minimize the spread of the M-cascade. The MP problem was first proposed by Budak, Agrawal, and El Abbadi (2011), who showed that the greedy approach achieves a (1-1/e-ε)-approximation result. Since then, MP problems have been actively explored (He et al., 2012, Song et al., 2017, Tsai et al., 2012, Tong et al., 2018).

Existing works on both IM and MP problems mainly adopt the k-seeds triggering model, i.e., selecting k seeds from a network of n nodes to start the desired information spread. The underlying assumption is that any node can be convinced to become a seed by a material incentive. In real applications, however, such an explicit triggering model does not always work, because a seed node may decline the offer. An alternative is the implicit seed triggering model, in which nodes voluntarily become seeds in response to implicit incentives such as discounts or pushed advertisements (Yang, Mao, Pei, & He, 2016). The implicit model transforms discrete node selection into continuous budget allocation. In this case, however, we need to search for an optimal budget allocation in an exponential space of order n, which is computationally prohibitive. Inspired by group-based IM (Eftekhar, Ganjali, & Koudas, 2013), in this paper we propose the community-based seed triggering model Com-seed, in which the budget is allocated over communities under integer lattice constraints, since people in the same community share the same information channels, such as community websites or billboards. Community budgets can also be adjusted, for example by changing the number of days a notice is displayed or the number of times an ad is pushed, in order to achieve efficient propagation. Two new problems, namely Influence Maximization at Community Scale (IMCS) and Misinformation Prevention at Community Scale (MPCS), are proposed and studied.

Solving the IMCS and MPCS problems under the Com-seed model is hampered by their inherent NP-hardness. To tackle the #P-hardness of computing influence, we carefully adapt the state-of-the-art reverse influence sampling technique from Influence Maximization via Martingales (IMM) (Tang, Shi, & Xiao, 2015) and Hybrid-sampling based Misinformation Prevention (HMP) (Tong & Du, 2019). Accordingly, we propose the Reverse Influence based Community greedy algorithm (RIC) and the Community-based Misinformation Prevention algorithm (CMP), which return (1-1/e-ε)-approximation solutions for IMCS and MPCS respectively. Compared with existing IM and MP models, our method may excel in the following three aspects: (1) Methods that select seeds directly, such as IMM and HMP, may suffer from performance degradation in practical applications if the chosen seeds reject the task. In contrast, our RIC and CMP algorithms adopt a more reliable strategy by targeting a user community instead of a single user, so that some users may voluntarily turn into propagation seeds. (2) Compared with the existing group-scale method (Eftekhar et al., 2013), where each community is treated equally in budget allocation, our model implements a weighted budget allocation among different communities to achieve better performance. (3) Finally, existing node-based influence maximization approaches (Yang et al., 2016) hit a performance bottleneck in large social networks, since each node has its own triggering process and thus incurs an expensive computational cost. Our community-based approach triggers propagation at the community level. Because the number of communities is much smaller than the number of nodes, the computational cost is reduced.
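The reverse influence sampling idea that IMM builds on can be illustrated with a minimal sketch. It is not the community-based sampling scheme of RIC or CMP, whose details are not given in this excerpt; it shows only the standard node-level reverse reachable (RR) set under the independent cascade model, which is assumed here as the diffusion model. The function names `sample_rr_set` and `estimate_spread` and the `in_neighbors` adjacency format are hypothetical.

```python
import random
from collections import deque


def sample_rr_set(in_neighbors, nodes, rng):
    """Sample one reverse reachable (RR) set (independent cascade, assumed).

    in_neighbors maps a node v to a list of (u, p) pairs, meaning the
    edge u -> v is live with probability p.
    """
    root = rng.choice(nodes)          # uniformly random target node
    rr_set = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        # Each incoming edge of v is tested at most once.
        for u, p in in_neighbors.get(v, []):
            if u not in rr_set and rng.random() < p:
                rr_set.add(u)
                queue.append(u)
    return rr_set


def estimate_spread(seeds, in_neighbors, nodes, num_samples, seed=0):
    """Estimate the expected spread of `seeds` as n * (fraction of RR sets hit)."""
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(num_samples)
        if seeds & sample_rr_set(in_neighbors, nodes, rng)
    )
    return len(nodes) * hits / num_samples


# Toy chain 0 -> 1 -> 2 with always-live edges (illustrative data only).
toy_in_neighbors = {1: [(0, 1.0)], 2: [(1, 1.0)]}
spread_of_node_0 = estimate_spread({0}, toy_in_neighbors, [0, 1, 2], 200)
```

The estimator relies on the fact that the probability a seed set intersects a random RR set equals its expected spread divided by n, so a seed set that covers many sampled RR sets is a good candidate.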

It is worth noting that the Com-seed triggering model proposed in this paper can be applied in various applications such as community-based marketing (Zeng et al., 2009, Mckenzie-Mohr, 2000), group-based recommendation (Hu et al., 2014), and uncertain-edge-based clustering (Li, Kong, Jia, & Li, 2018). Although we mainly focus on finding approximate solutions for the IM and MP problems with fixed parameters in the Com-seed triggering model, machine learning approaches can be employed to learn optimal parameters from real seed transition data. This may yield deeper understanding of user interactions in communities and point to future research directions in areas such as community influence diffusion and intelligent systems.

To summarize, our contributions are as follows.

  1. We present a unified community-based seed triggering model for both the influence maximization and misinformation prevention problems, in which efficient influence spread is achieved via budget allocation at community scale.

  2. We design greedy algorithms that find approximate solutions for both the influence maximization and misinformation prevention problems at community scale with a (1-1/e)-approximation ratio.

  3. To address the scalability concern, we propose efficient algorithms based on reverse sampling schemes for the influence maximization and misinformation prevention problems at community scale. Both find (1-1/e-ε)-approximate solutions with high confidence.

  4. We perform extensive experiments on three real-world datasets. The results show that our methods outperform all baselines and run much faster than IMM and HMP.
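The greedy budget allocation over the integer lattice mentioned in the contributions can be sketched as follows. This is an illustrative sketch under stated assumptions, not the RIC or CMP algorithm: every budget unit is assumed to cost the same, the objective `toy_h` is a hypothetical concave stand-in for the expected spread, and the names `lattice_greedy`, `capacity`, `sizes` and `rates` do not come from the paper.

```python
def lattice_greedy(h, num_communities, budget, capacity):
    """Spend `budget` unit increments, each on the community with the best gain.

    h        : objective defined on integer allocation vectors (list of ints)
    capacity : per-community upper bound on the allocation (assumption)
    """
    x = [0] * num_communities
    current = h(x)
    for _ in range(budget):
        best_gain, best_j = 0.0, None
        for j in range(num_communities):
            if x[j] >= capacity[j]:
                continue
            x[j] += 1                    # try one more unit on community j
            gain = h(x) - current
            x[j] -= 1                    # undo the trial increment
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:               # no increment improves the objective
            break
        x[best_j] += 1
        current += best_gain
    return x


# Hypothetical objective with diminishing returns: in a community of size s,
# each budget unit independently reaches a member with probability q.
sizes = [100, 40, 10]
rates = [0.1, 0.3, 0.5]


def toy_h(x):
    return sum(s * (1 - (1 - q) ** xi) for s, q, xi in zip(sizes, rates, x))


allocation = lattice_greedy(toy_h, 3, budget=4, capacity=[5, 5, 5])
# allocation == [2, 2, 0]
```

On this toy instance the medium-sized community with the higher response rate receives as much budget as the largest one, which is the kind of weighted, non-uniform allocation the community-scale formulation is meant to allow.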

Section snippets

Related works

The influence maximization problem was first proposed by Domingos and Richardson (2001) and Richardson and Domingos (2002). Kempe, Kleinberg, and Tardos (2003) formulate influence maximization as a combinatorial optimization problem, which is NP-hard, and propose a (1-1/e-ε)-approximation greedy approach. However, the greedy algorithm is not scalable enough to deal with large networks. Many subsequent works (Yang et al., 2015, Leskovec et al., 2007, Goyal et al., 2011, Chen et al., 2009, Chen et

Model and problem definition

Before we start to introduce definitions, we list commonly used symbols in Table 1.

Community-based influence maximization

Given an allocation vector $x$, many potential seed sets $S$ can be generated from it. We denote the collection of such seed sets as $\mathcal{S}$, which includes all possible seed sets that can be triggered by the allocation vector $x$. For a seed set $S \in \mathcal{S}$, we assume that the allocation vector $x$ generates it with probability $p(S)$. Therefore, $h(x)$ can be written as Eq. 1.

$$h(x) = \sum_{S \in \mathcal{S}} p(S)\, f(S) \tag{1}$$

Next, we show in Lemma 1 that $h(x)$ is non-negative, monotonically non-decreasing and submodular.
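Eq. 1 can be evaluated exactly on a tiny instance by enumerating every seed set. The sketch below makes assumptions that are not taken from the paper: nodes are triggered independently with given probabilities (a hypothetical stand-in for the effect of the allocation vector x), and f(S) is replaced by the number of nodes reachable from S in a fixed toy graph. The names `reachable` and `expected_spread` are illustrative.

```python
from itertools import combinations


def reachable(seeds, out_neighbors):
    """Toy stand-in for f(S): number of nodes reachable from S in a fixed graph."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        for w in out_neighbors.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)


def expected_spread(trigger_prob, f):
    """Sum p(S) * f(S) over all seed sets S, assuming independent triggering."""
    nodes = list(trigger_prob)
    total = 0.0
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            chosen = set(subset)
            p = 1.0
            for v in nodes:
                # Probability that exactly the nodes in `chosen` become seeds.
                p *= trigger_prob[v] if v in chosen else 1.0 - trigger_prob[v]
            total += p * f(chosen)
    return total


# Toy chain 0 -> 1 -> 2 (illustrative data only).
graph = {0: [1], 1: [2]}


def f(seed_set):
    return reachable(seed_set, graph)


h_low = expected_spread({0: 0.5, 1: 0.5, 2: 0.0}, f)   # 2.0
h_high = expected_spread({0: 1.0, 1: 0.5, 2: 0.0}, f)  # 3.0
```

Raising the trigger probability of node 0 raises the value from 2.0 to 3.0, a small numerical instance of the monotonicity property. The enumeration is exponential in the number of nodes, so it serves only as a check on very small examples.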

Community-based misinformation prevention

Similar to the case of the IMCS problem, we use the virtual seed set $S_x$ to denote the effect of the allocation vector $x$. Then we can rewrite $h(x) = \sum_{g_c \in G_c} p(g_c)\, H(S_x, g_c)$, where $H(S_x, g_c)$ is the expected number of nodes that $S_x$ can reach at least one hop earlier than $S_r$ in the community-based realization $g_c$.

Dataset

We use three datasets from SNAP (Leskovec & Krevl, 2014): email-Eu-core, DBLP and Youtube. The dataset information is listed in Table 2. The dataset email-Eu-core is the core email network of a large European research institution. Each individual belongs to exactly one of 42 departments at the research institute. The dataset DBLP is constructed as a co-authorship network from the DBLP computer science bibliography, where two authors are connected if they publish at least one paper together and

Conclusion

In this paper, we propose the influence maximization and misinformation prevention problems at community scale, IMCS and MPCS, and study them under our community-based seed triggering model. Instead of selecting seed nodes directly, we allocate budgets to communities. To solve the IMCS problem, we first sample RIC sets and then greedily allocate budgets according to the RIC sets. As for the MPCS problem, we use a hybrid sampling method to get l R-samples and also use a greedy method to allocate

CRediT authorship contribution statement

Can Wang: Conceptualization, Formal analysis, Resources, Writing - original draft, Writing - review & editing, Supervision, Project administration. Yangguang Zhang: Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Qihao Shi: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. Yan Feng: Resources, Supervision, Funding acquisition. Chun Chen: Resources, Supervision, Funding

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work is funded by the National Key Research and Development Project (Grant No: 2018AAA0101505) and the State Grid Corporation of China Scientific and Technology Project: Fundamental Theory of Human-in-the-loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.

References (41)

  • Domingos, P. & Richardson, M. (2001). Mining the network value of customers. In Procs of SIGKDD (pp....
  • M. Eftekhar et al. Information cascade at group scale.
  • A. Goyal et al. A data-based approach to social influence maximization. Procs of VLDB (2011).
  • Goyal, A., Lu, W. & Lakshmanan, L. (2011). Celf: Optimizing the greedy algorithm for influence maximization in social...
  • He, X., Song, G., Chen, W. & Jiang, Q. (2012). Influence blocking maximization in social networks under the competitive...
  • L. Hu et al. Deep modeling of group preferences for group-based recommendation.

  • Kempe, D., Kleinberg, J. & Tardos, É. (2003). Maximizing the spread of influence through a social network. In Procs of...
  • Kim, J., Kim, S. -K. & Yu, H. (2013). Scalable and parallelizable processing of influence maximization for large-scale...
  • Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J. & Glance, N. (2007). Cost-effective outbreak...
  • Leskovec, J. & Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection....