Abstract
Influence maximization in social networks is a classic and extensively studied problem that aims to select a set of initial seed nodes so as to spread influence as widely as possible. However, designing fast and accurate algorithms that find solutions in large-scale social networks remains an open challenge. Prior Monte Carlo simulation-based methods are slow and do not scale, while heuristic algorithms offer no theoretical guarantees and have been shown to produce poor solutions in quite a few cases. In this paper, we propose hop-based algorithms that can be easily applied to billion-scale networks under the commonly used Independent Cascade and Linear Threshold influence diffusion models. Moreover, we provide provable data-dependent approximation guarantees for the proposed hop-based approaches. Experimental evaluations on real social network datasets demonstrate the efficiency and effectiveness of our algorithms.
References
Arora A, Galhotra S, Ranu S (2017) Debunking the myths of influence maximization: an in-depth benchmarking study. In: Proceedings of ACM SIGMOD, pp 651–666
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Borgs C, Brautbar M, Chayes J, Lucier B (2014) Maximizing social influence in nearly optimal time. In: Proceedings of SODA, pp 946–957
Cha M, Mislove A, Gummadi KP (2009) A measurement-driven analysis of information propagation in the Flickr social network. In: Proceedings WWW, pp 721–730
Chen W (2009) NetHEPT dataset. http://research.microsoft.com/en-us/people/weic/
Cheng S, Shen H, Huang J, Chen W, Cheng X (2014) IMRank: influence maximization via finding self-consistent ranking. In: Proceedings ACM SIGIR, pp 475–484
Cheng S, Shen H, Huang J, Zhang G, Cheng X (2013) StaticGreedy: solving the scalability-accuracy dilemma in influence maximization. In: Proceedings ACM CIKM, pp 509–518
Chen W, Lu W, Zhang N (2012) Time-critical influence maximization in social networks with time-delayed diffusion process. In: Proceedings of AAAI, pp 592–598
Chen W, Wang C, Wang Y (2010a) Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of ACM KDD, pp 1029–1038
Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings of ACM KDD, pp 199–208
Chen W, Yuan Y, Zhang L (2010b) Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of IEEE ICDM, pp 88–97
Cohen E, Delling D, Pajor T, Werneck RF (2014) Sketch-based influence maximization and computation: scaling up with guarantees. In: Proceedings ACM CIKM, pp 629–638
Conforti M, Cornuéjols G (1984) Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Appl Math 7(3):251–274
Dinh TN, Zhang H, Nguyen DT, Thai MT (2014) Cost-effective viral marketing for time-critical campaigns in large-scale social networks. IEEE ACM Trans Netw 22(6):2001–2011
Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings ACM KDD, pp 57–66
Galhotra S, Arora A, Roy S (2016) Holistic influence maximization: Combining scalability and efficiency with opinion-aware models. In: Proceedings ACM SIGMOD, pp 743–758
Goel S, Watts DJ, Goldstein DG (2012) The structure of online diffusion networks. In: Proceedings ACM EC, pp 623–638
Goyal A, Bonchi F, Lakshmanan LVS (2011a) A data-based approach to social influence maximization. Proc VLDB Endow 5(1):73–84
Goyal A, Bonchi F, Lakshmanan L, Venkatasubramanian S (2013) On minimizing budget and time in influence propagation over social networks. Social Netw Anal Min 3(2):179–192
Goyal A, Lu W, Lakshmanan LV (2011b) CELF++: Optimizing the greedy algorithm for influence maximization in social networks. In: Proceedings WWW Companion, pp 47–48
Goyal A, Lu W, Lakshmanan LVS (2011c) SIMPATH: An efficient algorithm for influence maximization under the linear threshold model. In: Proceedings IEEE ICDM, pp 211–220
Jiang F, Jin S, Wu Y, Xu J (2014) A uniform framework for community detection via influence maximization in social networks. In: Proceedings IEEE/ACM ASONAM, pp 27–32
Jung K, Heo W, Chen W (2012) IRIE: scalable and robust influence maximization in social networks. In: Proceedings IEEE ICDM, pp 918–923
Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proceedings ACM KDD, pp 137–146
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of WWW, pp 591–600
Lee JR, Chung CW (2014) A fast approximation for influence maximization in large social networks. In: WWW Companion, pp 1157–1162
Leskovec J, Adamic LA, Huberman BA (2007a) The dynamics of viral marketing. ACM Trans Web 1(1):5:1–5:39
Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N (2007b) Cost-effective outbreak detection in networks. In: Proceedings of ACM KDD, pp 420–429
Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data
Li Y, Zhao BQ, Lui JCS (2012) On modeling product advertisement in large-scale online social networks. IEEE ACM Trans Netw 20(5):1412–1425
Lin Y, Chen W, Lui JC (2017) Boosting information spread: an algorithmic approach. In: Proceedings of IEEE ICDE, pp 883–894
Liu B, Cong G, Xu D, Zeng Y (2012) Time constrained influence maximization in social networks. In: Proceedings of IEEE ICDM, pp 439–448
Lu W, Chen W, Lakshmanan LV (2015) From competition to complementarity: comparative influence diffusion and maximization. Proc VLDB Endow 9(2):60–71
Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14(1):265–294
Nguyen HT, Dinh TN, Thai MT (2016a) Cost-aware targeted viral marketing in billion-scale networks. In: Proceedings of IEEE INFOCOM
Nguyen HT, Thai MT, Dinh TN (2016b) Stop-and-stare: optimal sampling algorithms for viral marketing in billion-scale networks. In: Proceedings of ACM SIGMOD, pp 695–710
Ohsaka N, Akiba T, Yoshida Y, Kawarabayashi K (2014) Fast and accurate influence maximization on large networks with pruned Monte-Carlo simulations. In: Proceedings of AAAI, pp 138–144
Ohsaka N, Sonobe T, Fujita S, Kawarabayashi Ki (2017) Coarsening massive influence networks for scalable diffusion analysis. In: Proceedings of ACM SIGMOD, pp 635–650
Song G, Zhou X, Wang Y, Xie K (2015) Influence maximization on large-scale mobile social network: a divide-and-conquer method. IEEE Trans Parallel Distrib Syst 26(5):1379–1392
Tang Y, Shi Y, Xiao X (2015) Influence maximization in near-linear time: A martingale approach. In: Proceedings of ACM SIGMOD, pp 1539–1554
Tang J, Tang X, Xiao X, Yuan J (2018a) Online processing algorithms for influence maximization. In: Proceedings of ACM SIGMOD
Tang J, Tang X, Yuan J (2016) Profit maximization for viral marketing in online social networks. In: Proceedings of IEEE ICNP, pp 1–10
Tang J, Tang X, Yuan J (2017a) Influence maximization meets efficiency and effectiveness: a hop-based approach. In: Proceedings of IEEE/ACM ASONAM, pp 64–71
Tang J, Tang X, Yuan J (2017b) Profit maximization for viral marketing in online social networks: algorithms and analysis. IEEE Trans Knowl Data Eng (Preprint)
Tang J, Tang X, Yuan J (2018b) Towards profit maximization for online social network providers. In: Proceedings of IEEE INFOCOM
Tang Y, Xiao X, Shi Y (2014) Influence maximization: Near-optimal time complexity meets practical efficiency. In: Proceedings of ACM SIGMOD, pp 75–86
Wang Z, Yang Y, Pei J, Chu L, Chen E (2017) Activity maximization by effective information diffusion in social networks. IEEE Trans Knowl Data Eng 29(11):2374–2387
Xu W, Lu Z, Wu W, Chen Z (2014) A novel approach to online social influence maximization. Social Netw Anal Min 4(1):153
Zhang C, Sun J, Wang K (2013) Information propagation in microblog networks. In: Proceedings of IEEE/ACM ASONAM, pp 190–196
Zhou C, Zhang P, Guo J, Guo L (2014) An upper bound based greedy algorithm for mining top-k influential nodes in social networks. In: Proceedings of WWW Companion, pp 421–422
Zhou C, Zhang P, Guo J, Zhu X, Guo L (2013) UBLF: an upper bound based approach to discover influential nodes in social networks. In: Proceedings of IEEE ICDM, pp 907–916
Acknowledgements
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its IDM Futures Funding Initiative, and by Singapore Ministry of Education Academic Research Fund Tier 1 under Grant 2017-T1-002-024 and Tier 2 under Grant MOE2015-T2-2-114.
Appendix
Proof of Theorem 1
To consider the outgoing edges from u one at a time, we first disable all the edges from u to its neighbors except for one edge \(\langle u,w_1\rangle\). Then, for each neighbor v of \(w_1\), all of v’s inverse neighbors other than \(w_1\) have their one-hop activation probabilities unchanged by adding \(\langle u,w_1\rangle\). Let \(\pi _2^{{S}\cup \{u\}}(v|w_1)\) denote the new two-hop activation probability of v. Then, we have
where \(\rho (S,u,v,w)=\frac{1-p_{w,v}\cdot \pi _1^{{S}\cup \{u\}}(w)}{1-p_{w,v}\cdot \pi _1^{{S}}(w)}\). Next, we enable the second edge \(\langle u,w_2\rangle\). Let \(\pi _2^{{S}\cup \{u\}}(v|w_1,w_2)\) denote the new two-hop activation probability of v. Following similar arguments, for each neighbor v of \(w_2\), we have
We continue to enable the outgoing edges of u sequentially. In general, when an edge \(\langle u,w_i\rangle\) is enabled after edges \(\langle u,w_1\rangle, \langle u,w_2\rangle, \ldots , \langle u,w_{i-1}\rangle\), for each neighbor v of \(w_i\), we have
Therefore, we can initialize \(\pi _2^{{S}\cup \{u\}}(v)\) with \(\pi _2^{{S}}(v)\) and iteratively update \(\pi _2^{{S}\cup \{u\}}(v)\) with
for all the nodes \(w\in {N}_u\setminus {S}\) and \(v\in {N}_w\setminus {S}\). Moreover, for the direct neighbors of u, their two-hop activation probabilities also need to be adjusted because u’s one-hop activation probability has changed from \(\pi _1^{{S}}(u)\) to 1. For each neighbor v of u, the adjustment can be made in a similar way by updating \(\pi _2^{{S}\cup \{u\}}(v)\) with
Then, the final two-hop activation probability \(\pi _2^{{S}\cup \{u\}}(v)\) by the iterative updates (19) and (20) is
Hence, the theorem is proven. \(\square\)
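The incremental update described in this proof can be sketched in code. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the IC model, no self-loops, all propagation probabilities strictly below 1, and product-form one-hop and two-hop activation probabilities of the kind suggested by the definition of \(\rho (S,u,v,w)\). The graph encoding (a dict of out-neighbor dicts) and all function names are hypothetical.

```python
from math import prod

def in_edges(out):
    """Invert an out-adjacency dict {u: {v: p_uv}} into {v: {u: p_uv}}."""
    inn = {v: {} for v in out}
    for u, nbrs in out.items():
        for v, p in nbrs.items():
            inn[v][u] = p
    return inn

def one_hop(out, seeds):
    """Assumed pi_1^S(v): 1 for seeds, else activation by a seed in-neighbor."""
    inn = in_edges(out)
    return {v: 1.0 if v in seeds else
            1.0 - prod(1.0 - p for u, p in inn[v].items() if u in seeds)
            for v in out}

def two_hop(out, seeds):
    """Assumed pi_2^S(v): 1 for seeds, else 1 - prod_w (1 - p_wv * pi_1^S(w))."""
    inn, pi1 = in_edges(out), one_hop(out, seeds)
    return {v: 1.0 if v in seeds else
            1.0 - prod(1.0 - p * pi1[w] for w, p in inn[v].items())
            for v in out}

def rho(p_wv, pi1_old, pi1_new):
    """Ratio by which the non-activation probability of v is rescaled."""
    return (1.0 - p_wv * pi1_new) / (1.0 - p_wv * pi1_old)

def two_hop_add_seed(out, seeds, u, pi1, pi2):
    """Return pi_2 for seeds | {u}, updating only nodes within two hops of u."""
    new = dict(pi2)
    new[u] = 1.0
    # u's own one-hop probability rises from pi_1^S(u) to 1.
    for v, p_uv in out[u].items():
        if v not in seeds:
            new[v] = 1.0 - (1.0 - new[v]) * rho(p_uv, pi1[u], 1.0)
    # Each non-seed out-neighbor w of u gains one-hop probability via edge <u, w>.
    for w, p_uw in out[u].items():
        if w in seeds:
            continue
        pi1_w_new = 1.0 - (1.0 - pi1[w]) * (1.0 - p_uw)
        for v, p_wv in out[w].items():
            if v not in seeds and v != u:
                new[v] = 1.0 - (1.0 - new[v]) * rho(p_wv, pi1[w], pi1_w_new)
    return new
```

Under these assumptions the update touches only the nodes within two hops of u, yet it yields the same values as recomputing every two-hop probability from scratch for the enlarged seed set.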
Proof of Theorem 2
Consider a single seed \(\{u\}\). Let \({A}_u\subseteq {N}_u\) denote a subset of a node u’s neighbors. Let \(p({A}_u)\) denote the probability that all the nodes in \({A}_u\) are activated directly by u under the IC and LT models, while all the nodes in \({N}_u\setminus {A}_u\) are not directly activated by u (they may not even be activated eventually). Since each of u’s neighbors is activated by u independently, we have \(p({A}_u)=\prod _{v\in {A}_u}p_{u,v}\cdot \prod _{v\in {N}_u\setminus {A}_u}(1-p_{u,v})\).
Furthermore, with h hops of propagation, for each node \(w\in {V}\setminus \{u\}\), w can only be activated by a propagation path starting from a node \(v\in {A}_u\) whose path length is no longer than \(h-1\) hops. In other words, the probability for w to be activated by \({A}_u\) is \(\pi _{h-1}^{{A}_u}(w)\). Considering all the possible node sets \({A}_u\) activated directly by u, we have
The second “\(\le\)” is due to the submodularity of \(\sigma _{h}(\cdot )\) (see Theorem 3), which gives \(\sigma _{h-1}({A}_u)\le \sum _{v\in {A}_u}\sigma _{h-1}(\{v\})\). In the third “=”, \(p(v\in {A}_u)\) is an indicator that equals 1 if and only if \(v\in {A}_u\). Meanwhile, we have
The last “=” follows the fact that \(p(v\in {A}_u)=0\) since \(v\not \in {A}_u\subseteq {N}_u\setminus \{v\}\) and \(p(v\in {A}_u\cup \{v\})=1\) since \(v\in {A}_u\cup \{v\}\). Therefore, from (23) and (24), we have
Furthermore, by definition,
Thus, by (25) and (26), it holds that
Inequality (11) can be proved by induction. When \(h=1\), the inequality follows directly from Inequality (10). Suppose that it holds for \(h-1\) hops of propagation, i.e., \(\sigma _{h-1}(\{u\}) \le \hat{\sigma }_{h-1}(\{u\})\). Then, for h hops of propagation, we have
Therefore, for any \(h\ge 0\), we have \(\sigma _{h}(\{u\}) \le \hat{\sigma }_{h}(\{u\})\). \(\square\)
Proof of Theorem 3
This can be proved using the live edge approach (Kempe et al. 2003).
- Under the IC model, for each edge \(\langle u,v\rangle\in {E}\), we independently flip a coin of bias \(p_{u,v}\) to decide whether the edge \(\langle u,v\rangle\) is live or blocked, which generates a sample influence propagation outcome X.
- Under the LT model, each node \(v\in V\) picks at most one of its incoming edges at random, selecting the edge from an inverse neighbor u with probability \(p_{u,v}\) and selecting no incoming edge with probability \(1-\sum _{u\in {I}_v}p_{u,v}\).
We use p(X) to denote the probability of a specific outcome X in the sample space. Let \({V}_h^X(v)\) denote the node set that can be reached from a node v within h hops in the sample outcome X. Then, the number of nodes that can be reached from a seed set S within h hops in the outcome X is given by \(\sigma _h^X({S})=\Big |\bigcup _{v\in {S}}{V}_h^X(v)\Big |\). Thus, \(\sigma _h({S})=\sum _{X}p(X)\cdot \sigma _h^X({S})\),
where the monotonicity of \(\sigma _h({S})\) holds since \(\sigma _h^X({S})\) never decreases as S expands.
The marginal influence gain \(\sigma _h^X({S}\cup \{u\})-\sigma _h^X({S})=\Big |{V}_h^X(u)\setminus \bigcup _{v\in {S}}{V}_h^X(v)\Big |\)
is the number of nodes that are reachable from a node u within h hops but are not reachable from any node in a seed set S within h hops in a sample outcome X. For any two node sets S and T where \({S}\subseteq {T}\), we have \(\bigcup _{v\in {S}}{V}_h^X(v)\subseteq \bigcup _{v\in {T}}{V}_h^X(v)\). Thus, \({V}_h^X(u)\setminus \bigcup _{v\in {S}}{V}_h^X(v)\supseteq {V}_h^X(u)\setminus \bigcup _{v\in {T}}{V}_h^X(v)\), which implies that \(\sigma _h^X({S}\cup \{u\})-\sigma _h^X({S})\ge \sigma _h^X({T}\cup \{u\})-\sigma _h^X({T})\).
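Since \(p(X)\ge 0\) for any X, taking the linear combination over all outcomes X, we have \(\sigma _h({S}\cup \{u\})-\sigma _h({S})\ge \sigma _h({T}\cup \{u\})-\sigma _h({T})\).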
Thus, \(\sigma _h(\cdot )\) is submodular. \(\square\)
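The live-edge construction used in this proof translates directly into a sampling procedure. The following is a minimal sketch, assuming edge probabilities are stored in a dict keyed by `(u, v)` pairs; the function names, the default number of runs, and the fixed random seed are illustrative choices rather than anything specified in the paper.

```python
import random
from collections import deque

def sample_live_edges_ic(probs, rng):
    """IC: keep each edge (u, v) independently with probability p_uv."""
    live = {}
    for (u, v), p in probs.items():
        if rng.random() < p:
            live.setdefault(u, []).append(v)
    return live

def sample_live_edges_lt(probs, rng):
    """LT: each node keeps at most one incoming edge, edge (u, v) w.p. p_uv."""
    incoming = {}
    for (u, v), p in probs.items():
        incoming.setdefault(v, []).append((u, p))
    live = {}
    for v, candidates in incoming.items():
        r, acc = rng.random(), 0.0
        for u, p in candidates:
            acc += p
            if r < acc:  # r fell into u's slice of [0, 1)
                live.setdefault(u, []).append(v)
                break    # otherwise no incoming edge is selected
    return live

def reach_within(live, seeds, h):
    """Union of V_h^X(v) over seeds v: multi-source BFS cut off at depth h."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue
        for v in live.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def estimate_sigma_h(probs, seeds, h, sampler=sample_live_edges_ic,
                     runs=2000, seed=0):
    """Monte Carlo average of sigma_h^X(S) over sampled outcomes X."""
    rng = random.Random(seed)
    total = sum(len(reach_within(sampler(probs, rng), seeds, h))
                for _ in range(runs))
    return total / runs
```

For any single sampled outcome, `len(reach_within(live, S, h))` is a coverage count, so the marginal gain of a node can only shrink as the seed set grows, which is the per-outcome inequality the proof then averages over X.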
Proof of Theorem 4
Let \({S}_h^*\) denote the optimal seed set for maximizing the influence spread within h hops of propagation, i.e., \(\sigma _h({S}_h^*)=\max _{|{S}|=k}\sigma _h({S})\). We have
The first inequality holds because the exact influence spread places no hop limit on propagation and is therefore no smaller than the influence spread within h hops. The second inequality holds because the greedy algorithm achieves a \(\left(\frac{1}{\kappa _f}(1-e^{-\kappa _f})\right)\)-approximation for maximizing a monotone submodular function f under a cardinality constraint (Conforti and Cornuéjols 1984), where the submodularity and monotonicity of \(\sigma _h(\cdot )\) are given by Theorem 3. The third inequality holds because \({S}_h^*\) is the optimal solution for maximizing \(\sigma _h(\cdot )\). \(\square\)
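The greedy selection that this guarantee refers to can be sketched as follows. This is an illustration only: it uses an exact one-hop spread under the IC model as the objective (an assumed product form), a plain greedy loop without the lazy-evaluation speedups used in practice, and hypothetical function names. It assumes every node appears as a key of the adjacency dict and that k does not exceed the number of nodes.

```python
from math import exp, prod

def sigma_1(out, seeds):
    """Exact one-hop spread under IC for out-adjacency {u: {v: p_uv}}."""
    inn = {}
    for u, nbrs in out.items():
        for v, p in nbrs.items():
            inn.setdefault(v, {})[u] = p
    total = 0.0
    for v in out:
        if v in seeds:
            total += 1.0
        else:
            # v stays inactive only if every seed in-neighbor fails.
            total += 1.0 - prod(1.0 - p for u, p in inn.get(v, {}).items()
                                if u in seeds)
    return total

def greedy(out, k, spread=sigma_1):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    seeds = set()
    for _ in range(k):
        base = spread(out, seeds)
        best = max((v for v in out if v not in seeds),
                   key=lambda v: spread(out, seeds | {v}) - base)
        seeds.add(best)
    return seeds

def greedy_ratio(kappa):
    """Approximation factor (1 - e^{-kappa}) / kappa for a given kappa_f > 0."""
    return (1.0 - exp(-kappa)) / kappa
```

Since the factor decreases as \(\kappa _f\) grows, its smallest value on \((0,1]\) is the familiar \(1-1/e\) at \(\kappa _f=1\), and smaller values of \(\kappa _f\) give a stronger data-dependent guarantee.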
We first introduce some lemmas used to prove Theorem 5.
Lemma 1
For scale-free random graphs with propagation probability \(p_{u,v}=p\) for every edge \(\langle u,v\rangle\in {E}\), the expected influence spread produced within one hop of propagation from a random seed set S of k nodes satisfies \(\mathbb {E}[\sigma _1({S})]\ge (p+1)k-pk^2/|{V}|\).
Proof of Lemma 1
With one hop of propagation, a randomly selected node v is not activated if and only if v is not a seed and is not activated by any of its inverse neighbors. The probability for v to be a non-seed node is \(1-\frac{k}{|{V}|}\). The probability for an inverse neighbor of v to be a seed is \(\frac{k}{|V|}\), and thus, the probability for it to activate v is \(p\cdot \frac{k}{|{V}|}\). Therefore, the probability for all of v’s inverse neighbors to fail to activate v is
Note that if v is selected as a seed, it must be activated. Hence, the overall activation probability of v is
As a result, the expectation of the activation probability of a random node v is given by
Therefore, it holds that \(\mathbb {E}[\sigma _1({S})]=|{V}|\cdot \mathbb {E}[\pi _1^{S}(v)]\ge (p+1)k-pk^2/|{V}|\). This completes the proof. \(\square\)
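As a sanity check, the bound \((p+1)k-pk^2/|{V}|\) can be compared against an exact average on a toy graph. The sketch below is an assumption-laden illustration: it uses a small directed cycle (every node has exactly one inverse neighbor) rather than a scale-free graph, and it averages over all seed sets of exactly k nodes instead of using the independence approximation of the proof. All names are hypothetical.

```python
from itertools import combinations

def sigma_1(in_nbrs, seeds, p):
    """Exact one-hop spread under IC with uniform probability p."""
    total = 0.0
    for v, preds in in_nbrs.items():
        if v in seeds:
            total += 1.0
        else:
            hits = sum(1 for u in preds if u in seeds)
            total += 1.0 - (1.0 - p) ** hits
    return total

def mean_sigma_1(in_nbrs, k, p):
    """Average of sigma_1 over every seed set of exactly k nodes."""
    seed_sets = list(combinations(in_nbrs, k))
    return sum(sigma_1(in_nbrs, set(s), p) for s in seed_sets) / len(seed_sets)

def lemma1_bound(n, k, p):
    """Right-hand side (p + 1) k - p k^2 / n of the inequality in Lemma 1."""
    return (p + 1) * k - p * k * k / n

# Directed cycle on 8 nodes: node i has the single inverse neighbor i - 1.
cycle = {i: [(i - 1) % 8] for i in range(8)}
average = mean_sigma_1(cycle, k=3, p=0.1)
bound = lemma1_bound(n=8, k=3, p=0.1)
```

On this cycle the exact average is \(k+(|{V}|-k)\cdot p\cdot \frac{k}{|{V}|-1}\), which sits slightly above the bound because conditioning on v being a non-seed makes its inverse neighbor marginally more likely to be a seed.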
Lemma 2
(Li et al. 2012) For an infinite random power law graph, the expected fraction of nodes activated \(\phi ({S})=\mathbb {E}[\sigma ({S})]/|{V}|\) can be computed by
where \(P_1(d)=\frac{d^{1-\gamma }}{\sum _{d=1}^{\infty }d^{1-\gamma }}\) is the probability of a node connecting to a neighbor whose degree is d, and \(\varphi ({S})\) is an instrumental variable.
Lemma 3
The expected fraction of nodes activated \(\phi ({S})\) is bounded by
where \(A=1-\left (1-\frac{k}{|{V}|}\right )P_1(1)\).
Proof of Lemma 3
and
Hence, by (40) and (41), the lemma follows. \(\square\)
Proof of Theorem 5
Lemma 1 indicates that
Lemma 3 indicates that
Putting (42) and (43) together, the theorem follows. \(\square\)
Cite this article
Tang, J., Tang, X. & Yuan, J. An efficient and effective hop-based approach for influence maximization in social networks. Soc. Netw. Anal. Min. 8, 10 (2018). https://doi.org/10.1007/s13278-018-0489-y