An exact algorithm for robust influence maximization

  • Full Length Paper
  • Series B
  • Mathematical Programming

Abstract

We propose a Branch-and-Cut algorithm for the robust influence maximization problem. The influence maximization problem aims to identify, in a social network, a set of given cardinality comprising actors that are able to influence the maximum number of other actors. We assume that the social network is given in the form of a graph with node thresholds to indicate the resistance of an actor to influence, and arc weights to represent the strength of the influence between two actors. In the robust version of the problem that we study, the node thresholds and arc weights are affected by uncertainty and we optimize over a worst-case scenario within given robustness budgets. We study properties of the robust solution and show that even computing the worst-case scenario for given robustness budgets is NP-hard. We implement an exact Branch-and-Cut as well as a heuristic Branch-Cut-and-Price. Numerical experiments show that we are able to solve to optimality instances of size comparable to other exact approaches in the literature for the non-robust problem, and we can tackle the robust version with similar performance. On larger instances (\(\ge 2000\) nodes), our heuristic Branch-Cut-and-Price significantly outperforms a 2-opt heuristic. An extended abstract of this paper appeared in the proceedings of IPCO 2019.
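As a minimal sketch of the propagation model described in the abstract, the following Python function computes which actors end up active for a given seed set. It assumes that a node becomes active once the total weight of its incoming arcs from active nodes reaches its threshold; the function name `spread`, the data layout, and the toy network are illustrative and not taken from the paper. The last lines show how increasing a single threshold can shrink the spread, which is the kind of effect the robust problem guards against (the perturbation used here is an arbitrary illustration, not the paper's uncertainty set).

```python
from collections import defaultdict


def spread(arcs, thresholds, seeds):
    """Return the set of active nodes once propagation stops.

    arcs: dict mapping (i, j) -> weight of the arc from i to j
    thresholds: dict mapping node -> activation threshold
    seeds: iterable of initially active nodes
    """
    incoming = defaultdict(list)
    for (i, j), w in arcs.items():
        incoming[j].append((i, w))

    active = set(seeds)
    changed = True
    while changed:  # repeat until no new node can be activated
        changed = False
        for j, t in thresholds.items():
            if j in active:
                continue
            influence = sum(w for i, w in incoming[j] if i in active)
            if influence >= t:  # assumed activation rule
                active.add(j)
                changed = True
    return active


# Toy network: node 3 needs both 1 and 2, node 4 follows node 3.
arcs = {(1, 3): 1.0, (2, 3): 1.0, (3, 4): 1.0}
thresholds = {1: 1.0, 2: 1.0, 3: 2.0, 4: 1.0}

one_seed = spread(arcs, thresholds, seeds=[1])
two_seeds = spread(arcs, thresholds, seeds=[1, 2])

# Illustrative perturbation: raise the threshold of node 3 by 25%.
perturbed = {**thresholds, 3: 2.5}
two_seeds_perturbed = spread(arcs, perturbed, seeds=[1, 2])
```

With the nominal thresholds, seeding nodes 1 and 2 activates the whole toy network, while the perturbed threshold blocks node 3 and everything downstream of it.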

Notes

  1. In the IPCO version of this paper, all such constraints were added to the formulation as lazy constraints. The discussion here shows that this may yield invalid dual bounds and a potentially suboptimal solution. This is due to a mistake in the proof of Theorem 4 of the IPCO version: it is fixed in this paper. Experiments show that \(\approx 25\%\) of the solutions found in the IPCO paper are suboptimal.

Acknowledgements

Part of G. Nannicini’s research was conducted with support by the Simons Foundation and by the Mathematisches Forschungsinstitut Oberwolfach.

Author information

Corresponding author

Correspondence to Giacomo Nannicini.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Table of notation

\(\delta ^-(j)\), \(\delta ^+(j)\):

instar and outstar of node j

\(y \in \{0,1\}^{n}\) :

incidence vector of the seeds

\(x \in \{0,1\}^{n}\) :

incidence vector of the activated nodes

\(t_i\) :

threshold of node i

\(w_{ij}\) :

weight of arc \((i,j)\)

\(\varDelta _N\) :

maximal node threshold variation (expressed as percentage of t)

\(B_N\) :

total budget of threshold variations

\(\varDelta _A\) :

maximal arc weight variation (expressed as percentage of w)

\(B_A\) :

total budget of weight variations

P :

set of robustness parameters \( \{B_N, \varDelta _N, B_A,\varDelta _A\}\)

\(\theta _j\) :

increase of node threshold \(t_j\) in a robust solution

\(\varphi _{ij}\) :

decrease of weight \(w_{ij}\) in a robust solution

\(\text {RI}_{x, \theta , \varphi }({\bar{y}})\) :

total amount of influence that spreads on the graph for the given set of seeds \({\bar{y}}\), formulated as an optimization problem with decision variables x, \(\theta \) and \(\varphi \)

\(\text {RI}^P_{x, \theta , \varphi }({\bar{y}})\) :

problem \(\text {RI}_{x, \theta , \varphi }({\bar{y}})\) for a given set of robustness parameters P

(R-IMP):

mathematical model for the robust IMP with q activation seeds, formulated as bilevel optimization problem

\({\mathcal {C}}_j\) :

collection of minimal activation sets

\(\text {AS}_x(y)\) :

total amount of influence that spreads on the graph for the given set of seeds \({\bar{y}}\) with \(\theta =\varphi =0\), computed with a formulation based on activation sets

\(\pi , \mu \) :

variables of the dual of \(\text {AS}_x({\bar{y}})\), for a fixed \({\bar{y}}\)

(IMP-\(\theta 0\)-\(\varphi 0\)):

mathematical model for IMP, formulated using \(\text {AS}_x(y)\)

(DUAL-\(\theta 0\)-\(\varphi 0\)):

mathematical model for IMP, formulated using the dual of \(\text {AS}_x(y)\)

\({\mathcal {C}}^e_j\) :

extended collection of minimal activation sets

\({\mathcal {C}}_j^{{\bar{y}}}\) :

collection of seed-dependent minimal activation sets

(R-IMP-\({\bar{y}}\)):

mathematical model used to obtain valid dual bounds for IMP, parametric in \({\bar{y}}\)

(PRICE-j):

mathematical model used as the pricing problem to generate variables associated with minimal activation sets with negative reduced cost, one for each node j

\(\psi , \beta , \alpha \) :

variables of (PRICE-j)

Big-M discussion

To find a value of big-M that ensures a valid formulation for the problem obtained by dualizing the inner problem AS in IMP-\(\theta 0\)-\(\varphi 0\), we can amend the dual formulation by imposing an upper bound U on the dual variables \(\mu _j\):

$$\begin{aligned} \left. \begin{array}{rrcl} \max _{\pi , \mu } &{} \sum _{j \in V} \sum _{S \in {\mathcal {C}}_j} (|S|-1) \pi _{j,S} + \sum _{j \in V} \mu _j {\bar{y}}_j \\ \forall j \in V &{} \sum _{k \in \delta ^+(j)} \sum _{S \in {\mathcal {C}}_k : j \in S} \pi _{k,S} - \sum _{S \in {\mathcal {C}}_j} \pi _{j,S} + \mu _j &{}\le &{} 1 \\ \forall j \in V, \forall S \in {\mathcal {C}}_j &{} \pi _{j,S} &{}\le &{} 0 \\ \forall j \in V &{} \mu _j &{}\ge &{} 0 \\ \forall j \in V &{} \mu _j &{}\le &{} U. \end{array} \right\} \end{aligned}$$

We need to find a value of U that does not change the solution to this problem. If we go back to the primal, we obtain:

$$\begin{aligned} \left. \begin{array}{rrcl} \min &{} \sum _{j \in V} x_j + \sum _{j \in V} U u_j \\ \forall j \in V, \forall S \in {\mathcal {C}}_j &{} \sum _{i \in S} x_i - x_j &{}\le &{} |S| - 1 \\ \forall j \in V &{} x_j + u_j &{}\ge &{} {\bar{y}}_j \\ \forall j \in V &{} u_j &{}\ge &{} 0, \end{array} \right\} \end{aligned}$$

where \(u_j\) is the variable associated with the dual constraint \(\mu _j \le U\). We can provide examples of graphs where \(U = n\) does not suffice. Suppose we have a directed graph with n nodes: a triangle \(1 \rightarrow 2, 2 \rightarrow 3, 3 \rightarrow 1\), three edges \(1\rightarrow 4, 2 \rightarrow 4, 3 \rightarrow 4\), and finally a chain \(4 \rightarrow 5, 5 \rightarrow 6, \dots , n-1 \rightarrow n\). Suppose all edge weights are 1 and all activation thresholds are 1, except for node 4, which has activation threshold equal to 3. The only seed is node 1, i.e., we have \({\bar{y}}_1 = 1\), \({\bar{y}}_k = 0\) for \(k \ge 2\). In this case, if U is set correctly, i.e., large enough that all \(u_j\) variables are 0, the solution to the primal should have \(x_k = 1\) for all k: all nodes are active, for an objective function value of n. This is because node 1 activates node 2, which activates node 3, these three nodes activate node 4, and then the entire chain activates. However, after we introduce variables \(u_j\) in the primal, we can set \(x_1 = \frac{2}{3}, u_1 = \frac{1}{3}, x_2 = \frac{2}{3}, x_3 = \frac{2}{3}\), and all remaining variables to 0: now the sum \(x_1 + x_2 + x_3 = 2\) is not enough to activate node 4 (associated with the constraint \(x_1 + x_2 + x_3 - x_4 \le 2\)), so that the total objective function value is \(x_1 + x_2 + x_3 + U u_1 = 2 + \frac{1}{3} U < n\) as long as \(U < 3(n-2)\). Since \(3(n-2) > n\) for every \(n > 3\), this shows that we need \(U > n\). In fact the example can be easily extended (using a cycle of length d, rather than a triangle) to show that we need at least \(U \ge nd\), where d is the maximum indegree of a node in the graph. Numerically, we found counterexamples where even \(U = n^2\) did not suffice; these examples were difficult to study analytically.
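The triangle-plus-chain example can be checked numerically. The sketch below, in Python with exact rational arithmetic, encodes the minimal activation sets of that graph, verifies that the fractional point from the text satisfies the primal constraints, and evaluates its objective for two values of U. The helper names (`example_instance`, `is_feasible`, `objective`) and the choice \(n = 10\) are assumptions made for illustration only.

```python
from fractions import Fraction


def example_instance(n):
    """Minimal activation sets C_j of the example graph, as {j: [S, ...]}."""
    sets = {1: [{3}], 2: [{1}], 3: [{2}], 4: [{1, 2, 3}]}
    for k in range(5, n + 1):  # chain 4 -> 5 -> ... -> n
        sets[k] = [{k - 1}]
    return sets


def is_feasible(n, x, u, seeds):
    """Check the constraints of the primal that includes the variables u_j."""
    for j, collection in example_instance(n).items():
        for S in collection:
            # sum_{i in S} x_i - x_j <= |S| - 1
            if sum(x[i] for i in S) - x[j] > len(S) - 1:
                return False
    # x_j + u_j >= ybar_j and u_j >= 0
    return all(
        x[j] + u[j] >= (1 if j in seeds else 0) and u[j] >= 0
        for j in range(1, n + 1)
    )


def objective(n, x, u, U):
    """Primal objective: sum_j x_j + U * sum_j u_j."""
    return sum(x[j] + U * u[j] for j in range(1, n + 1))


n = 10  # illustrative size
nodes = range(1, n + 1)
zero = {j: Fraction(0) for j in nodes}

# Intended solution: every node active, u = 0.
x_full = {j: Fraction(1) for j in nodes}

# Fractional point from the text: x_1 = x_2 = x_3 = 2/3, u_1 = 1/3.
x_frac = {**zero, 1: Fraction(2, 3), 2: Fraction(2, 3), 3: Fraction(2, 3)}
u_frac = {**zero, 1: Fraction(1, 3)}

value_full = objective(n, x_full, zero, U=n)
value_frac_small_U = objective(n, x_frac, u_frac, U=n)
value_frac_large_U = objective(n, x_frac, u_frac, U=3 * (n - 2))
```

With \(U = n\) the fractional point is feasible and cheaper than the intended all-active solution, so the penalized primal no longer reproduces the true spread; at \(U = 3(n-2)\) the two objective values coincide, which matches the threshold derived in the text.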

Given these difficulties, a straightforward big-M formulation is likely to fail. On the other hand, the indicator constraint version gives CPLEX more freedom in tightening the big-M or switching to an alternative formulation. In preliminary experiments we found that the use of indicator constraints leads to a numerically more stable implementation.

Fig. 5: In each subfigure, the values of (mpi, nic) are reported on the vertical axis, while the values of (mci, mcr) are reported on the horizontal axis

Analysis of the parameters used in the heuristic comparison

To choose the parameters to be used in algorithm \({HeurCG}\) we tested the following combinations of values: \(\texttt {nic}\in \{ 2500, 5000, 7500 \}\), \(\texttt {mpi} \in \{ 0.025, 0.05, 0.75 \}\), \(\texttt {mci} \in \{ 1, 2, 3 \}\) and \(\texttt {mcr} \in \{ 1, 2, 3 \}\), for a total of 81 different configurations. Each configuration has been used to solve a subset of Large graphs with 2000 and 5000 nodes, an average node degree \(k=12\), a rewiring probability \(b\in \{0.1, 0.3\}\), 0.15n starting seeds and 5 random instances for each combination of settings. The robustness parameters chosen are \(\hbox {B}_A \in \{0, 25\}\) and \(\hbox {B}_N \in \{0, 25\}\).

In Fig. 5 we show the optimal value obtained, where each entry corresponds to the average computed over 40 instances. We can see that the variance of the optimal value is limited: for the instances with 2000 nodes the maximum difference between the best and the worst average is equal to 6.1, while for the instances with 5000 nodes it is equal to 19.1. The value used for \(\texttt {mci}\) has a small impact on the performance of the algorithm; therefore we decided to use the intermediate value of 2. The parameter that seems to have the largest impact is \(\texttt {mcr}\). These experiments suggest setting \(\texttt {mcr}=2\): even if in some cases the best performance is achieved with values of \(\texttt {mcr}\) different from 2, the selected setting is the one providing the most consistent results across the graph sizes considered. Finally, as for the other parameters, we selected \(\texttt {nic}=2500\) and \(\texttt {mpi}=0.025\) because they show the best performance when \(\texttt {mcr}\) is equal to 2.
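The size of the experiment can be reproduced by enumerating the grids described above. The Python sketch below lists the parameter values exactly as printed in the text and counts configurations and test instances; the dictionary layout and the `replicate` key are hypothetical names introduced for illustration, and reading the 40-instance averages as per-graph-size averages is an interpretation rather than something the text states explicitly.

```python
from itertools import product

# Heuristic parameters, with the values as printed in the text.
grid = {
    "nic": [2500, 5000, 7500],
    "mpi": [0.025, 0.05, 0.75],
    "mci": [1, 2, 3],
    "mcr": [1, 2, 3],
}
configurations = [
    dict(zip(grid, values)) for values in product(*grid.values())
]

# Instance settings: graph size, rewiring probability, robustness budgets,
# and 5 random instances per combination ("replicate" is a made-up label).
instance_grid = {
    "n": [2000, 5000],
    "b": [0.1, 0.3],
    "B_A": [0, 25],
    "B_N": [0, 25],
    "replicate": range(5),
}
instances = [
    dict(zip(instance_grid, values))
    for values in product(*instance_grid.values())
]

# Number of instances behind each average, grouped by graph size.
per_size = {
    n: sum(1 for inst in instances if inst["n"] == n)
    for n in instance_grid["n"]
}

# Setting selected in the text.
selected = {"nic": 2500, "mpi": 0.025, "mci": 2, "mcr": 2}
```

The enumeration yields 81 configurations and 40 instances for each of the two graph sizes, and the selected setting is one of the enumerated configurations.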

About this article

Cite this article

Nannicini, G., Sartor, G., Traversi, E. et al. An exact algorithm for robust influence maximization. Math. Program. 183, 419–453 (2020). https://doi.org/10.1007/s10107-020-01507-z

  • DOI: https://doi.org/10.1007/s10107-020-01507-z
