On minimizing budget and time in influence propagation over social networks

  • Original Article
Social Network Analysis and Mining

Abstract

In recent years, the study of influence propagation in social networks has gained tremendous attention. In this context, we can identify three orthogonal dimensions: the number of seed nodes activated at the beginning (known as budget), the expected number of activated nodes at the end of the propagation (known as expected spread or coverage), and the time taken for the propagation. We can constrain one or two of these and try to optimize the third. In their seminal paper, Kempe et al. constrained the budget, left time unconstrained, and maximized the coverage: this problem is known as Influence Maximization (or MAXINF for short). In this paper, we study alternative optimization problems which are naturally motivated by resource and time constraints on viral marketing campaigns. In the first problem, termed minimum target set selection (or MINTSS for short), a coverage threshold η is given and the task is to find the minimum size seed set such that by activating it, at least η nodes are eventually activated in the expected sense. This naturally captures the problem of deploying a viral campaign on a budget. In the second problem, termed MINTIME, the goal is to minimize the time in which a predefined coverage is achieved. More precisely, in MINTIME, a coverage threshold η and a budget threshold k are given, and the task is to find a seed set of size at most k such that by activating it, at least η nodes are activated in the expected sense, in the minimum possible time. This problem addresses the issue of timing when deploying viral campaigns. Both these problems are NP-hard, which motivates our interest in their approximation. For MINTSS, we develop a simple greedy algorithm and show that it provides a bicriteria approximation. We also establish a generic hardness result suggesting that improving this bicriteria approximation is likely to be hard. For MINTIME, we show that even bicriteria and tricriteria approximations are hard under several conditions. We show, however, that if we allow the budget for the number of seeds k to be boosted by a logarithmic factor and allow the coverage to fall short, then the problem can be solved exactly in PTIME, i.e., we can achieve the required coverage within the time achieved by the optimal solution to MINTIME with budget k and coverage threshold η. Finally, we establish the value of the approximation algorithms by conducting an experimental evaluation, comparing their quality against that achieved by various heuristics.
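As a rough illustration of the greedy strategy for MINTSS mentioned in the abstract, the following Python sketch repeatedly adds the node with the largest marginal coverage gain until the threshold is (nearly) met. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `greedy_min_seed_set`, `coverage`, `toy_reach`, and `toy_coverage` are hypothetical, the `epsilon` shortfall parameter is only one possible way to express a relaxed coverage target, and the toy coverage function counts deterministically reachable nodes instead of estimating expected spread under a propagation model.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple


def greedy_min_seed_set(
    nodes: Iterable[str],
    coverage: Callable[[FrozenSet[str]], float],
    eta: float,
    epsilon: float = 0.0,
) -> Tuple[Set[str], float]:
    """Add the node with the largest coverage until coverage >= eta - epsilon.

    `coverage` is assumed to be a monotone set function (hypothetical oracle);
    in practice it would be an estimate of the expected spread.
    """
    seeds: Set[str] = set()
    current = coverage(frozenset(seeds))
    candidates = sorted(nodes)  # sorted so that ties are broken deterministically
    while current < eta - epsilon and candidates:
        # Pick the candidate whose addition yields the largest total coverage.
        best = max(candidates, key=lambda v: coverage(frozenset(seeds | {v})))
        new_value = coverage(frozenset(seeds | {best}))
        if new_value <= current:
            break  # no candidate improves coverage; the target is unreachable
        seeds.add(best)
        candidates.remove(best)
        current = new_value
    return seeds, current


# Toy stand-in for expected spread: each seed deterministically "activates"
# a fixed set of nodes (an assumption made purely for illustration).
toy_reach = {
    "a": {"a", "b", "c"},
    "b": {"b"},
    "c": {"c", "d"},
    "d": {"d"},
    "e": {"e", "f"},
    "f": {"f"},
}


def toy_coverage(seed_set: FrozenSet[str]) -> float:
    """Number of nodes activated by the given seed set in the toy model."""
    activated: Set[str] = set()
    for s in seed_set:
        activated |= toy_reach[s]
    return float(len(activated))


seeds, value = greedy_min_seed_set(toy_reach.keys(), toy_coverage, eta=5)
print(sorted(seeds), value)
```

With a nonzero `epsilon`, the loop may stop earlier and return a smaller seed set whose coverage falls slightly short of η, which is the kind of trade-off a bicriteria guarantee describes.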

Notes

  1. We use the terms coverage and expected spread interchangeably throughout the article.

  2. A variant of the linear threshold model, where a deterministic threshold \(\theta_u\) is chosen for each node \(u\), has also been studied (Chen 2008; Ben-Zwi et al. 2009). Coverage under this variant is not submodular.

  3. If \(\epsilon = 1\), \(\mathcal{A}\) outputs an empty collection.

  4. Here, \(\text{OPT}_{\mathcal{I}}\) and \(\text{OPT}_{\mathcal{J}}\) represent the size of the optimal solution for instances \(\mathcal{I}\) and \(\mathcal{J}\), respectively.

  5. http://www.arXiv.org

  6. http://www.meme.yahoo.com/

  7. Instead of 1, we could be left with a constant number of elements. Asymptotically, it does not make a difference.

References

  • Agarwal N, Liu H, Tang L, Yu P (2011) Modeling blogger influence in a community. Social Netw Anal Min 1–24. doi:10.1007/s13278-011-0039-3

  • Bakshy E, Hofman JM, Mason WA, Watts DJ (2011) Everyone’s an influencer: quantifying influence on twitter. In: Proceedings of the fourth ACM international conference on Web search and data mining, ACM, WSDM ’11, pp 65–74

  • Bar-Ilan J, Kortsarz G, Peleg D (2001) Generalized submodular cover problems and applications. Theor Comput Sci 250(1–2):179–200

  • Ben-Zwi O, Hermelin D, Lokshtanov D, Newman I (2009) An exact almost optimal algorithm for target set selection in social networks. In: EC ’09: Proceedings of the tenth ACM conference on electronic commerce, ACM, New York, NY, USA, pp 355–362

  • Bhagat S, Goyal A, Lakshmanan LVS (2012) Maximizing product adoption in social networks. In: Web search and data mining, WSDM

  • Bross J, Richly K, Kohnen M, Meinel C (2011) Identifying the top-dogs of the blogosphere. Social Netw Anal Min 1–15. doi:10.1007/s13278-011-0027-7

  • Cha M, Pérez JAN, Haddadi H (2011) The spread of media content through blogs. Social Netw Anal Min 1–16. doi:10.1007/s13278-011-0040-x

  • Chen N (2008) On the approximability of influence in social networks. In: SODA ’08: Proceedings of the nineteenth annual ACM–SIAM symposium on discrete algorithms, pp 1029–1037

  • Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’09)

  • Chen W, Wang C, Wang Y (2010a) Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’10)

  • Chen W, Yuan Y, Zhang L (2010b) Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of the 10th IEEE international conference on data mining (ICDM’2010)

  • Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’01, pp 57–66

  • Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45(4):634–652

  • Fujito T (1999) On approximation of the submodular set cover problem. Oper Res Lett 25(4):169–174

  • Fujito T (2000) Approximation algorithms for submodular set cover with applications. IEICE Trans Inf Syst 83

  • Goyal A, Bonchi F, Lakshmanan LVS (2008) Discovering leaders from community actions. In: Proceeding of the 17th ACM conference on information and knowledge management, ACM, New York, NY, USA, CIKM ’08, pp 499–508

  • Goyal A, Bonchi F, Lakshmanan LVS (2010) Learning influence probabilities in social networks. In: Proceedings of the third ACM international conference on web search and data mining, ACM, New York, NY, USA, WSDM ’10, pp 241–250

  • Goyal A, Bonchi F, Lakshmanan LVS (2011) A data-based approach to social influence maximization. PVLDB 5(1)

  • Kempe D, Kleinberg JM, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03)

  • Kempe D, Kleinberg J, Tardos É (2005) Influential nodes in a diffusion model for social networks. In: ICALP, Springer, Berlin, pp 1127–1138

  • Khuller S, Moss A, Naor JS (1999) The budgeted maximum coverage problem. Inf Process Lett 70(1):39–45

  • Kimura M, Saito K (2006) Tractable models for information diffusion in social networks. In: Proceedings of PKDD 2006, Lecture notes in computer science, vol 4213

  • Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance NS (2007) Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’07)

  • Li Gørtz I, Wirth A (2006) Asymmetry in k-center variants. Theor Comput Sci 361(2):188–199

  • Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14(1):265–294

  • Panigrahy R, Vishwanathan S (1998) An O(log* n) approximation algorithm for the asymmetric p-center problem. J Algorithms 27(2):259–268

  • Richardson M, Domingos P (2002) Mining knowledge-sharing sites for viral marketing. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’02, pp 61–70

  • Slavík P (1997) Improved performance of the greedy algorithm for partial cover. Inform Process Lett 64(5):251–254

  • Sviridenko M (2004) A note on maximizing a submodular set function subject to a knapsack constraint. Oper Res Lett 32(1):41–43

  • Weng J, Lim EP, Jiang J, He Q (2010) Twitterrank: finding topic-sensitive influential twitterers. In: Proceedings of the third ACM international conference on web search and data mining, ACM, New York, NY, USA, WSDM ’10, pp 261–270

  • Wolsey LA (1982) An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica 2(4):385–393

Correspondence to Amit Goyal.

Appendix

1.1 A Proof of Lemma 2

Suppose there exists an algorithm \(\mathcal{A}\) that selects \(\beta k\) sets which together cover \(\gamma \eta\) elements. Apply \(\mathcal{A}\) to an arbitrary instance \(\langle \mathcal{U}, \mathcal{S}, \eta \rangle\) of PSC. The output is a collection of sets \(\mathcal{C}_1\) such that \(|\mathcal{C}_1| \le \beta k\) and \(\left| \bigcup_{S \in \mathcal{C}_1} S \right| \ge \gamma \eta\). Next, discard the sets that have been selected and the elements they cover, and apply the algorithm \(\mathcal{A}\) again on the remaining universe. Repeat this process until 1 or fewer elements are left uncovered (see Note 7).

Let \(\eta_i\) denote the number of elements still uncovered after iteration \(i\), with \(\eta_0 = \eta\). In iteration \(i\), the algorithm picks \(\beta k\) sets and covers at least \(\gamma \eta_{i-1}\) elements. Hence, \(\eta_i \le \eta_{i-1} \cdot (1 - \gamma)\). Expanding, \(\eta_i \le \eta \cdot (1 - \gamma)^i\). Suppose that after \(l\) iterations, \(\eta_l = 1\). The total number of sets picked is \(l \beta k\), and \(\eta \cdot (1 - \gamma)^l = 1\) implies \(l = \frac{\ln \eta}{\ln \frac{1}{1-\gamma}}\).
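The iteration count in this argument can be checked numerically. The following Python sketch is an illustration only (the function names `closed_form_iterations`, `simulate_iterations`, and `total_sets` are hypothetical): it compares the closed form \(l = \ln \eta / \ln \frac{1}{1-\gamma}\) with a direct simulation of the worst-case recurrence \(\eta_i = \eta_{i-1} \cdot (1 - \gamma)\), and reports the resulting total number of sets \(l \beta k\).

```python
import math


def closed_form_iterations(eta: float, gamma: float) -> float:
    """Solve eta * (1 - gamma)**l = 1 for l."""
    return math.log(eta) / math.log(1.0 / (1.0 - gamma))


def simulate_iterations(eta: float, gamma: float) -> int:
    """Count rounds of eta_i = eta_{i-1} * (1 - gamma) until at most 1 element is left."""
    remaining = float(eta)
    rounds = 0
    while remaining > 1.0:
        remaining *= 1.0 - gamma  # worst case: exactly a gamma fraction is covered
        rounds += 1
    return rounds


def total_sets(eta: float, gamma: float, beta: float, k: int) -> float:
    """Total number of sets picked over all rounds, i.e. l * beta * k."""
    return closed_form_iterations(eta, gamma) * beta * k


# Example values chosen for illustration: 1024 elements, half covered per round.
print(closed_form_iterations(1024, 0.5))
print(simulate_iterations(1024, 0.5))
print(total_sets(1024, 0.5, beta=1.0, k=3))
```

The simulation uses the worst case in which each round covers exactly a \(\gamma\) fraction of the remaining elements; an algorithm that covers more per round would finish in fewer rounds.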

We now prove the first claim. Let \(\gamma > 1 - 1/e^{\beta}\); then \(\ln \left( \frac{1}{1-\gamma} \right) > \beta\). This yields a PTIME algorithm for PSC which outputs a solution of size \(l \beta k = \beta k \cdot \ln \eta / \ln \frac{1}{1-\gamma} \le c \cdot k \ln \eta\) for some \(c < 1\), that is, a \(c \cdot \ln \eta\)-approximation for PSC with \(c < 1\), which is not possible unless \({\rm NP} \subseteq \text{DTIME}(n^{O(\log \log n)})\) (Feige 1998).

To prove the second claim, assume \(\beta \le (1 - \delta) \ln \left( \frac{1}{1 -\gamma} \right)\). This gives a PTIME algorithm for PSC which outputs a solution of size \(l \beta k = \beta k \cdot \ln \eta / \ln \frac{1}{1-\gamma} \le (1 - \delta) k \cdot \ln \eta\), which is not possible unless \({\rm NP} \subseteq \text{DTIME}(n^{O(\log \log n)})\). \(\quad\square\)

1.2 B Example illustrating performance of Wolsey’s solution

Wolsey (1982) studied the RSSC problem and showed, among other things, that the greedy algorithm provides a solution that is within a factor of \(1 + \ln \left( \eta/(\eta-f(S_{t-1})) \right)\) of the optimal solution. Unfortunately, this does not yield an approximation algorithm with any guaranteed bounds, because the gap \(\eta - f(S_{t-1})\) in the denominator can be arbitrarily small. The following example shows that the greedy solution with threshold η can be arbitrarily worse than the optimum.

Example

(Illustrated also in Fig. 4). Consider a ground set \(\mathcal{X} = \{w_1, w_2, v_1, v_2, \ldots, v_l\}\) with elements having unit costs. Figure 4 geometrically depicts the definition of a function \(f: 2^{\mathcal{X}} \rightarrow \mathbb{R}\), where for any set \(S \subset \mathcal{X}\), \(f(S)\) is defined to be the area (shown shaded) covered by the elements of \(S\). Specifically, \(f(w_1) = f(w_2) = 1 - 1/2^{l+1}\) and \(f(v_i) = 1/2^{i-1}\), \(1 \le i \le l\). Notice that \(f(\{v_1, \ldots, v_l\}) = \sum_{i=1}^l 1/2^{i-1} = 2 - 1/2^{l-1} < 2 - 1/2^l = f(\{w_1, w_2\})\). The greedy algorithm will first pick \(v_1\). Suppose it picks \(S = \{v_1,\ldots, v_i\}\) in \(i\) rounds. Then \(f(S \cup \{v_{i+1}\}) - f(S) = 1/2^i > 1 - 1/2^{l+1} - 1 + 1/2^i = 1 - 1/2^{l+1} - \frac{1}{2}\left(2 - 1/2^{i-1}\right) = f(S \cup \{w_1\}) - f(S)\). Thus, greedy will never pick \(w_1\) or \(w_2\) before it picks \(v_1,\ldots, v_l\). Suppose \(\eta = 2 - 1/2^l\). Clearly, the greedy solution is \(\mathcal{X}\) whereas the optimal solution is \(\{w_1, w_2\}\). Here \(l\) can be arbitrarily large.

Fig. 4 Example. Rectangles represent the elements in the universe. The shaded area within a rectangle represents the coverage function \(f\) for the element, e.g., \(f(v_1) = 1/2 + 1/2 = 1\).
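The example can be reproduced with exact rational arithmetic. The Python sketch below is an illustration under an assumed geometric reading of Fig. 4, not code from the article: \(w_1\) and \(w_2\) are taken to be disjoint regions of area \(1 - 1/2^{l+1}\) each, every \(v_i\) covers area \(1/2^i\) inside each of them, and the \(v_i\) do not overlap one another. This reading is consistent with the values of \(f\) stated above. The names `make_f`, `greedy_cover`, and `run_example` are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List, Set, Tuple


def make_f(l: int) -> Callable[[Set[str]], Fraction]:
    """Build the area function f for the ground set {w1, w2, v1, ..., vl}."""
    w_area = 1 - Fraction(1, 2 ** (l + 1))

    def f(S: Set[str]) -> Fraction:
        total = Fraction(0)
        for w in ("w1", "w2"):
            if w in S:
                # w_j covers its whole region, including the parts under every v_i.
                total += w_area
            else:
                # Otherwise only the slices of this region under the chosen v_i count.
                total += sum(
                    (Fraction(1, 2 ** i) for i in range(1, l + 1) if f"v{i}" in S),
                    Fraction(0),
                )
        return total

    return f


def greedy_cover(ground: List[str], f: Callable[[Set[str]], Fraction], eta: Fraction) -> List[str]:
    """Repeatedly pick the element with the largest marginal gain until f >= eta."""
    chosen: List[str] = []
    while f(set(chosen)) < eta:
        rest = [x for x in ground if x not in chosen]
        if not rest:
            break  # threshold cannot be reached
        best = max(rest, key=lambda x: f(set(chosen) | {x}))
        chosen.append(best)
    return chosen


def run_example(l: int) -> Tuple[List[str], Fraction]:
    """Run greedy on the example with threshold eta = 2 - 1/2^l."""
    ground = ["w1", "w2"] + [f"v{i}" for i in range(1, l + 1)]
    eta = 2 - Fraction(1, 2 ** l)
    return greedy_cover(ground, make_f(l), eta), eta


picked, eta = run_example(5)
print(len(picked), picked)  # greedy solution size and pick order
print(make_f(5)({"w1", "w2"}) >= eta)  # whether {w1, w2} alone meets the threshold
```

Increasing `l` grows the greedy solution by one element per step while the two-element solution \(\{w_1, w_2\}\) stays feasible, which is the unbounded gap the example describes.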

Goyal, A., Bonchi, F., Lakshmanan, L.V.S. et al. On minimizing budget and time in influence propagation over social networks. Soc. Netw. Anal. Min. 3, 179–192 (2013). https://doi.org/10.1007/s13278-012-0062-z

  • DOI: https://doi.org/10.1007/s13278-012-0062-z
