Elsevier

Knowledge-Based Systems

Volume 213, 15 February 2021, 106692

Efficient diversified influence maximization with adaptive policies

https://doi.org/10.1016/j.knosys.2020.106692

Abstract

Influence maximization (IM) in social media aims at finding influential seed users to trigger a large online cascade of influence spread. Existing studies mainly focus on maximizing the number of activated nodes, while the diversity of the activated nodes has been mostly ignored even though it is crucial in many real-world applications: for example, diversity of the influenced crowd can reduce the risk of marketing campaigns, and diversity of recommended items can improve recommendation quality. In this paper, we study the diversified influence maximization (DIM) problem, which aims to select k nodes such that both the number and the diversity of the activated nodes are maximized. By designing a practical diversity function to model the distribution of activated nodes among all communities, we formulate the DIM problem as maximizing a parameterized linear combination of the diversity and the number of active nodes. To tackle the NP-hardness of the DIM problem, we propose an efficient algorithm, diversified influence maximization via martingale (DIMM), which returns a $(1 - 1/e - \varepsilon)$-approximate solution with probability at least $1 - 2/n$. Moreover, as the influence spread is highly stochastic and the DIM setting reserves no measures for handling unforeseen events, we present an improved version of the algorithm for the adaptive setting, named adaptive diversified influence maximization via martingale (ADIMM). With the adaptive RI-πα policy, ADIMM returns an $\alpha(1 - e^{-1/\alpha})$-approximate solution with probability at least $1 - 1/n$. Finally, we experimentally evaluate our algorithms against existing algorithms on large-scale real-world datasets. The experimental results validate the effectiveness and efficiency of the proposed algorithms.

Introduction

With the exponential growth of social network users, recent years have witnessed a boom of information spread in social media. Consequently, the influence maximization (IM) problem [1], [2] in social media has attracted considerable attention. The IM problem aims to select k nodes as seed nodes and to use the “word-of-mouth” effect [3], [4] to spread information and activate other nodes in the social network. Once the seeds are convinced to adopt a product (or an idea, a service, etc.), the other activated nodes are regarded as adopting the product as well. The goal of the IM problem is to choose the optimal k seeds such that the expected number of activated nodes in the social network is maximized. The IM problem finds many applications in viral marketing [5], [6], network monitoring [7], [8], rumor control [9], [10], [11], and so on.

An active line of IM research incorporates additional information into the problem. The topic-aware IM problem considers the topics of the information to be spread: the probability that a node adopts the information depends on the node's interest in the topic [12], [13], [14]. The time-aware IM problem models the propagation rate of information over social networks in order to activate more nodes before they are influenced by information from competitors [15], [16]. The location-aware IM problem takes the geographical locations of nodes into account when maximizing information propagation [17], [18].

Existing studies on influence maximization mainly focus on maximizing the number of activated nodes. The diversity of the activated nodes, although of great importance in many practical applications, has been mostly overlooked. For instance, users in a social network naturally form different communities. In marketing campaigns, reaching a diverse target audience across communities brings benefits such as reducing the risk of the campaign [19]. Diversity also benefits recommendation systems, as the diversity of recommendations is increasingly recognized as an important aspect of recommendation quality [20], [21], [22]. As the proverb goes, “Don’t put all your eggs in one basket”: spreading influence among diverse groups is an intrinsic aspect of IM research. However, to the best of our knowledge, [19], [23] are the only previous works exploring the diversity of the activated nodes in influence maximization. [19] aims to optimize a weighted sum of the influence spread and diversity, while [23] models the diversity in influence spread using three utility functions commonly used in economics. The objective functions in both [19] and [23] are difficult to optimize in a scalable manner, which limits the applicability of these methods, since diversity matters more in large networks.

To address this issue, in this paper we propose a practical diversified influence maximization (DIM) problem. The DIM problem aims to select k nodes such that both the number of activated nodes and the diversity of the activated nodes can be maximized. We follow the framework of [19] such that the objective function is modeled as a weighted sum of the influence spread and diversity. To tackle the NP-hardness of the DIM problem, we employ the reverse influence sampling technique [24], [25] and propose the DIMM algorithm to approximately solve the DIM problem in an efficient manner. The DIMM algorithm returns a $(1 - 1/e - \varepsilon)$-approximate solution with probability at least $1 - 2/n$.
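The reverse influence sampling idea mentioned above can be illustrated with a minimal sketch. It assumes the independent cascade model with a per-edge activation probability and shows plain reverse reachable (RR) set sampling only; it is not the paper's RI-sketch structure or the DIMM algorithm, and all names (`sample_rr_set`, `estimate_spread_ris`, `toy_in_edges`, `toy_nodes`) are hypothetical.

```python
import random

def sample_rr_set(in_edges, nodes, rng):
    """Sample one reverse reachable (RR) set under the IC model.

    in_edges maps each node v to a list of (u, p) pairs, one per edge u -> v,
    where p is the assumed activation probability of that edge.
    """
    root = rng.choice(nodes)
    rr, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for u, p in in_edges.get(v, []):
            # Keep the edge u -> v with probability p and walk it backwards.
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr

def estimate_spread_ris(in_edges, nodes, seeds, samples=2000, seed=0):
    """Estimate expected spread as n times the fraction of RR sets hit by seeds."""
    rng = random.Random(seed)
    seed_set = set(seeds)
    hits = sum(
        1 for _ in range(samples)
        if seed_set & sample_rr_set(in_edges, nodes, rng)
    )
    return len(nodes) * hits / samples

# Toy chain a -> b -> c in which every edge always fires (illustrative data).
toy_nodes = ["a", "b", "c"]
toy_in_edges = {"b": [("a", 1.0)], "c": [("b", 1.0)]}
```

On the toy chain, node `a` reaches every node, so every RR set contains `a` and the estimate for seed set `["a"]` equals the number of nodes. Because a seed set's spread reduces to how many sampled sets it hits, seed selection becomes a coverage problem over the samples.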

In real applications, the influence spread is highly stochastic and unforeseen events might occur [11], [26], [27], [28]. The above DIM setting assumes that the seed nodes are all selected at the very beginning and reserves no measures for handling unforeseen events. A more reasonable policy is to invest the budget adaptively, based on the observed influence spread as time goes by. Therefore, we further propose the adaptive diversified influence maximization (ADIM) problem. In each time round, the available observations provide evidence for estimating the future reduction in the influence spread of the current seeds. Accordingly, we can decide whether to select new seeds and which seeds to select. With the adaptive policy, part of the budget can be reserved for the case in which the influence spread dies out quickly. By carefully modifying the DIMM algorithm for the adaptive setting, we propose and implement α-greedy adaptive policies that return approximate solutions with reasonable error bounds.

Finally, we evaluate our algorithms against existing algorithms on four real datasets, two of which are large-scale datasets with more than one million edges. The experimental results validate the effectiveness and efficiency of the proposed algorithms.

It is worthwhile to list our contributions as follows.

  1. We propose the practical diversified influence maximization (DIM) problem and theoretically analyze its hardness, monotonicity, and submodularity.

  2. We design an approximation algorithm, DIMM, to solve the DIM problem, built on a new data structure, the reverse influence sketch (RI-sketch). We show that the DIMM algorithm achieves an approximation ratio of at least $(1 - 1/e - \varepsilon)$ with near-linear time complexity.

  3. Considering the requirements of an adaptive environment, we further propose the adaptive diversified influence maximization (ADIM) problem and design an α-Greedy Policy to approximately solve it, which ensures an α-dependent approximation ratio of $\alpha(1 - e^{-1/\alpha})$.

  4. By extending the RI-sketch data structure, we design an efficient implementation of the α-Greedy Policy, with a provable error bound on the approximation ratio.

  5. We conduct extensive experiments on four real-world datasets. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithms.

The rest of this paper is organized as follows. We briefly review related works in Section 2. We present the DIM problem and its solution in Sections 3 and 4. We then present the Adaptive-DIM problem and its solution in Sections 5 and 6. The experimental results and discussions are presented in Section 7. Finally, we conclude the paper and present some directions for future work in Section 8. Note that all proofs are given in the appendix.

Section snippets

Related works

The influence maximization (IM) problem was first proposed in [3], [5]. Following the probabilistic framework formulated in [5], Kempe et al. [1] modeled it as a discrete optimization problem. They proved that the problem is NP-hard and proposed a greedy framework to solve it with a $(1 - 1/e)$ approximation guarantee. Subsequent studies mainly focus on reducing the running time of the greedy algorithm. It has been shown that the branch-and-bound approach can provide higher empirical efficiency while

Diversified influence maximization

Formally, a social network is modeled as a directed graph G(V,E), where V is the set of nodes and E the set of edges; we denote by n=|V| the number of nodes and by m=|E| the number of edges. To facilitate the presentation, we first introduce the classic independent cascade (IC) model for information propagation [1].
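As a minimal sketch of the IC model as it is commonly stated, each newly activated node u gets a single chance to activate each inactive out-neighbor v, succeeding with the edge's activation probability. The code below simulates one cascade and estimates the expected spread by Monte Carlo averaging. The adjacency format, the per-edge probabilities, and the names `simulate_ic`, `estimate_spread`, and `toy_graph` are illustrative assumptions, not taken from the paper.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, rng):
    """Run one independent cascade and return the set of activated nodes.

    graph maps each node u to a list of (v, p) pairs, where p is the assumed
    probability that u activates v once u itself becomes active.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # Each newly active node gets exactly one chance per out-edge.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs

# Toy graph: a -> b and b -> d always fire, a -> c never fires (illustrative).
toy_graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
```

With the deterministic toy probabilities, seeding `a` always activates `a`, `b`, and `d`. On real graphs the estimate varies with `runs`, and this simulation cost is one reason sampling-based methods are preferred at scale.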

Solution for DIM problem

If the whole social network forms a single community, the DIM problem degenerates to the traditional IM problem. Thus the DIM problem is at least as hard as the IM problem, both in solving the problem [1] and in computing the objective function [2].

Theorem 1

The DIM problem is NP-hard. Moreover, computing σ(S) and D(S) for any set S are both #P-hard.

Nevertheless, the objective function possesses nice properties that allow approximation algorithms to be designed.

Lemma 1

[19]

Function ϕ(S) is monotone and submodular
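Monotonicity and submodularity are what make the classic greedy strategy applicable, as in the greedy framework of Kempe et al. [1]. The sketch below shows that generic greedy selection on a toy coverage objective, which is itself monotone and submodular. It stands in for ϕ(S) only for illustration: the real objective would have to be estimated, and the names `greedy_select`, `coverage`, and `covered` are hypothetical.

```python
def greedy_select(candidates, k, objective):
    """Pick up to k elements, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = objective(chosen)
        best, best_gain = None, 0.0
        for c in candidates:
            if c in chosen:
                continue
            gain = objective(chosen + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining candidate improves the objective
            break
        chosen.append(best)
    return chosen

# Toy monotone submodular objective: number of distinct items covered.
coverage = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6, 7}}

def covered(selection):
    """Count the distinct items covered by the selected sets."""
    return len(set().union(*(coverage[s] for s in selection)))
```

For example, `greedy_select(list(coverage), 2, covered)` first takes `s3` (gain 4) and then `s1` (gain 3), covering all seven items.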

Adaptive-DIM problem

The above DIM problem adopts the same one-shot formulation as the IM problem, i.e., the budget is exhausted with all seed nodes selected and activated at the beginning, and nothing is done during the subsequent influence spread process. In fact, the influence spread is highly stochastic: although the probability is low, the spread may die out quickly. Thus a more flexible and effective strategy is to adaptively select seed nodes over multiple rounds by observing the influence spread results in previous rounds. In this

Solution for ADIM problem

We first show adaptive monotonicity and adaptive submodularity, which are the theoretical basis of the proposed policies. For convenience, we define $f(B \mid \psi) = (1 - \lambda)\sigma(B \mid \psi) + \lambda D(B \mid \psi)$ as the expected value of the objective function of the S-pair set B chosen by policy π under the realization ψ. According to the analysis in [48], the function $f(B \mid \psi_F)$ is not adaptive submodular. Thus the greedy approach cannot be applied directly. In the following, we will propose a modified greedy strategy that

Experiment

Conclusion

Most of the existing influence maximization works focus on maximizing the number of activated nodes while ignoring the diversity of the activated nodes. In this paper, we propose the diversified influence maximization (DIM) problem and the corresponding DIMM algorithm to approximately solve it. With the carefully designed RI-sketch data structure, the DIMM algorithm achieves a $(1 - 1/e - \varepsilon)$-approximate solution with probability at least $1 - 2/n$ and time complexity near-linear in the network size.

CRediT authorship contribution statement

Can Wang: Conceptualization, Formal analysis, Resources, Writing - original draft, Writing - review & editing, Supervision, Project administration. Qihao Shi: Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Weizhao Xian: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. Yan Feng: Resources, Supervision, Funding acquisition. Chun Chen: Resources, Supervision, Funding

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is funded by the National Key R&D Program of China (Grant No: 2018AAA0101505) and the State Grid Corporation of China Scientific and Technology Project, China: Fundamental Theory of Human-in-the-loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.

References (53)

  • Goldenberg J. et al., Talk of the network: A complex systems look at the underlying process of word-of-mouth, Mark. Lett. (2001)
  • Richardson M. et al., Mining knowledge-sharing sites for viral marketing
  • Chen W. et al., Scalable influence maximization for prevalent viral marketing in large-scale social networks
  • Leskovec J. et al., Cost-effective outbreak detection in networks
  • Gomez Rodriguez M. et al., Inferring networks of diffusion and influence
  • C. Budak, D. Agrawal, A. El Abbadi, Limiting the spread of misinformation in social networks, in: Proceedings of the...
  • He X. et al., Influence blocking maximization in social networks under the competitive linear threshold model
  • Shi Q. et al., Adaptive influence blocking: Minimizing the negative spread by observation-based policies
  • Aslay C. et al., Online topic-aware influence maximization queries
  • Barbieri N. et al., Topic-aware social influence propagation models
  • Chen S. et al., Online topic-aware influence maximization, Proc. VLDB Endowment (2015)
  • M. Gomez-Rodriguez, D. Balduzzi, B. Schölkopf, Uncovering the temporal dynamics of diffusion networks, in: Proceedings...
  • M. Gomez-Rodriguez, B. Schölkopf, Influence maximization in continuous time diffusion networks, in: Proceedings of the...
  • Li G. et al., Efficient location-aware influence maximization
  • Tang F. et al., Diversified social influence maximization
  • Q. Liu, B. Xiang, E. Chen, Y. Ge, H. Xiong, T. Bao, Y. Zheng, Influential seed items recommendation, in: Proceedings of...