Abstract
We propose a Branch-and-Cut algorithm for the robust influence maximization problem. The influence maximization problem aims to identify, in a social network, a set of given cardinality comprising actors that are able to influence the maximum number of other actors. We assume that the social network is given in the form of a graph with node thresholds to indicate the resistance of an actor to influence, and arc weights to represent the strength of the influence between two actors. In the robust version of the problem that we study, the node thresholds and arc weights are affected by uncertainty and we optimize over a worst-case scenario within given robustness budgets. We study properties of the robust solution and show that even computing the worst-case scenario for given robustness budgets is NP-hard. We implement an exact Branch-and-Cut as well as a heuristic Branch-Cut-and-Price. Numerical experiments show that we are able to solve to optimality instances of size comparable to other exact approaches in the literature for the non-robust problem, and we can tackle the robust version with similar performance. On larger instances (\(\ge 2000\) nodes), our heuristic Branch-Cut-and-Price significantly outperforms a 2-opt heuristic. An extended abstract of this paper appeared in the proceedings of IPCO 2019.
Notes
In the IPCO version of this paper, all such constraints were added to the formulation as lazy constraints. The discussion here shows that this may yield invalid dual bounds and a potentially suboptimal solution. This is due to a mistake in the proof of Theorem 4 of the IPCO version: it is fixed in this paper. Experiments show that \(\approx 25\%\) of the solutions found in the IPCO paper are suboptimal.
References
Ackerman, E., Ben-Zwi, O., Wolfovitz, G.: Combinatorial model and bounds for target set selection. Theor. Comput. Sci. 411(44–46), 4017–4022 (2010)
Banerjee, S., Jenamani, M., Pratihar, D.K.: A survey on influence maximization in a social network. arXiv preprint arXiv:1808.05502 (2018)
Belotti, P., Bonami, P., Fischetti, M., Lodi, A., Monaci, M., Nogales-Gómez, A., Salvagnin, D.: On handling indicator constraints in mixed integer programming. Comput. Optim. Appl. 65(3), 545–566 (2016)
Ben-Tal, A., Nemirovski, A.: Selected topics in robust convex optimization. Math. Program. 112(1), 125–158 (2008)
Bertsimas, D., Sim, M.: The price of robustness. Oper. Res. 52(1), 35–53 (2004)
Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization. SIAM Rev. 53(3), 464–501 (2011)
Buchheim, C., Kurtz, J.: Robust combinatorial optimization under convex and discrete cost uncertainty. EURO J. Comput. Optim. 6(3), 211–238 (2018)
Chen, W., Lin, T., Tan, Z., Zhao, M., Zhou, X.: Robust influence maximization. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 795–804. ACM (2016)
Domingos, P., Richardson, M.: Mining the network value of customers. In: Hammer, P.L., Johnson, E.L., Korte, B.H., Nemhauser, G.L. (eds.) Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 57–66 (2001)
Edmonds, J., Giles, R.: A min-max relation for submodular functions on graphs. In: Annals of Discrete Mathematics, vol. 1, pp. 185–204. Elsevier (1977)
Erdős, P., Rényi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5(1), 17–60 (1960)
Fischetti, M., Kahr, M., Leitner, M., Monaci, M., Ruthmair, M.: Least cost influence propagation in (social) networks. Math. Program. 170(1), 293–325 (2018). (ISSN: 1436-4646)
Goldberg, S., Liu, Z.: The diffusion of networking technologies. In: Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6–8, 2013, pp. 1577–1594 (2013)
Gunnec, D.: Integrating social network effects in product design and diffusion. Ph.D. thesis, XXX (2012)
He, X., Kempe, D.: Stability and robustness in influence maximization. ACM Trans. Knowl. Discov. Data (TKDD) 12(6), 66 (2018)
Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pp. 137–146, New York, NY, USA. ACM. ISBN: 1-58113-737-0 (2003)
Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. Theory Comput. 11(4), 105–147 (2015)
Könemann, J., Sadeghabad, S.S., Sanità, L.: Better approximation algorithms for technology diffusion. In: Algorithms—ESA 2013—21st Annual European Symposium, Sophia Antipolis, France, September 2–4, 2013. Proceedings, pp. 637–646 (2013)
Li, Y., Fan, J., Wang, Y., Tan, K.-L.: Influence maximization on social graphs: a survey. IEEE Trans. Knowl. Data Eng. 30(10), 1852–1872 (2018)
Li, Z., Ding, R., Floudas, C.A.: A comparative theoretical and computational study on robust counterpart optimization: I. Robust linear optimization and robust mixed integer linear optimization. Ind. Eng. Chem. Res. 50(18), 10567–10603 (2011)
McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: part I—convex underestimating problems. Math. Program. 10(1), 147–175 (1976)
Mossel, E., Roch, S.: On the submodularity of influence in social networks. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pp. 128–134. ACM (2007)
Richardson, M., Domingos, P.: Mining knowledge-sharing sites for viral marketing. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 61–70. ACM (2002)
Sartor, G., Traversi, E., Calvo, R.W.: RobInMax: An open-source library for robust influence maximization (2020). https://doi.org/10.5281/zenodo.3692697
Watts, D.J., Strogatz, S.H.: Collective dynamics of “small-world” networks. Nature 393(6684), 440 (1998)
Wu, H.-H., Küçükyavuz, S.: A two-stage stochastic programming approach for influence maximization in social networks. Comput. Optim. Appl. 69(3), 563–595 (2018)
Acknowledgements
Part of G. Nannicini’s research was conducted with support by the Simons Foundation and by the Mathematisches Forschungsinstitut Oberwolfach.
Appendices
Table of notation
- \(\delta ^-(j)\), \(\delta ^+(j)\): instar and outstar of node j
- \(y \in \{0,1\}^{n}\): incidence vector of the seeds
- \(x \in \{0,1\}^{n}\): incidence vector of the activated nodes
- \(t_i\): threshold of node i
- \(w_{ij}\): weight of arc \(\{i,j\}\)
- \(\varDelta _N\): maximal node threshold variation (expressed as a percentage of t)
- \(B_N\): total budget of threshold variations
- \(\varDelta _A\): maximal arc weight variation (expressed as a percentage of w)
- \(B_A\): total budget of weight variations
- P: set of robustness parameters \(\{B_N, \varDelta _N, B_A, \varDelta _A\}\)
- \(\theta _j\): increase of node threshold \(t_j\) in a robust solution
- \(\varphi _{ij}\): decrease of weight \(w_{ij}\) in a robust solution
- \(\text {RI}_{x, \theta , \varphi }({\bar{y}})\): total amount of influence that spreads on the graph for the given set of seeds \({\bar{y}}\), formulated as an optimization problem with decision variables x, \(\theta \) and \(\varphi \)
- \(\text {RI}^P_{x, \theta , \varphi }({\bar{y}})\): problem \(\text {RI}_{x, \theta , \varphi }({\bar{y}})\) for a given set of robustness parameters P
- (R-IMP): mathematical model for the robust IMP with q activation seeds, formulated as a bilevel optimization problem
- \({\mathcal {C}}_j\): collection of minimal activation sets
- \(\text {AS}_x(y)\): total amount of influence that spreads on the graph for the given set of seeds \({\bar{y}}\) with \(\theta =\varphi =0\), computed with a formulation based on activation sets
- \(\pi , \mu \): variables of the dual of \(\text {AS}_x({\bar{y}})\), for a fixed \({\bar{y}}\)
- (IMP-\(\theta 0\)-\(\varphi 0\)): mathematical model for IMP, formulated using \(\text {AS}_x(y)\)
- (DUAL-\(\theta 0\)-\(\varphi 0\)): mathematical model for IMP, formulated using the dual of \(\text {AS}_x(y)\)
- \({\mathcal {C}}^e_j\): extended collection of minimal activation sets
- \({\mathcal {C}}_j^{{\bar{y}}}\): collection of seed-dependent minimal activation sets
- (R-IMP-\({\bar{y}}\)): mathematical model used to obtain valid dual bounds for IMP, parametric in \({\bar{y}}\)
- (PRICE-j): mathematical model used as the pricing problem to generate variables associated with minimal activation sets with a negative reduced cost, one for each node j
- \(\psi , \beta , \alpha \): variables of (PRICE-j)
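To make the notion of a collection \({\mathcal {C}}_j\) concrete, the following minimal Python sketch enumerates minimal activation sets for a single node by brute force. It rests on an assumption not spelled out in this table: a set of in-neighbors of j is taken to be an activation set when its total arc weight reaches the threshold \(t_j\), and it is minimal when no proper subset does. The function name and the dictionary-based input are hypothetical, and weights and thresholds are assumed to be positive.

```python
from itertools import combinations

def minimal_activation_sets(in_weights, threshold):
    """Enumerate minimal activation sets of one node (illustrative only).

    in_weights: dict mapping each in-neighbor i to the arc weight w_ij (> 0).
    threshold:  the node threshold t_j (> 0).
    Assumption: a set activates the node when its weights sum to >= t_j.
    """
    neighbors = sorted(in_weights)
    found = []
    # Subsets are visited by increasing size, so any activating proper
    # subset of a candidate has already been recorded in `found`.
    for size in range(1, len(neighbors) + 1):
        for subset in combinations(neighbors, size):
            candidate = set(subset)
            if any(smaller <= candidate for smaller in found):
                continue  # contains a smaller activation set: not minimal
            if sum(in_weights[i] for i in subset) >= threshold:
                found.append(candidate)
    return found

# Node with three unit-weight in-neighbors and threshold 3:
# the only minimal activation set is the full in-neighborhood.
sets_threshold_3 = minimal_activation_sets({1: 1, 2: 1, 3: 1}, 3)

# Heavier arc from node 1: {1} alone suffices, and so does {2, 3}.
sets_mixed = minimal_activation_sets({1: 2, 2: 1, 3: 1}, 2)
```

The enumeration is exponential in the indegree of the node, so it is only meant to illustrate the definition on small examples.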
Big-M discussion
To find a value of big-M that ensures a valid formulation for the problem obtained by dualizing the inner problem AS in (IMP-\(\theta 0\)-\(\varphi 0\)), we can amend the dual formulation by imposing an upper bound U on the dual variables \(\mu _j\), i.e., by adding the constraints \(\mu _j \le U\). We need to find a value of U that does not change the solution to this problem. If we go back to the primal, each constraint \(\mu _j \le U\) gives rise to a new variable \(u_j\), which appears in the primal objective function with cost U.
We can provide examples of graphs where \(U = n\) does not suffice. Suppose we have a directed graph with n nodes: a triangle \(1 \rightarrow 2, 2 \rightarrow 3, 3 \rightarrow 1\), three arcs \(1\rightarrow 4, 2 \rightarrow 4, 3 \rightarrow 4\), and finally a chain \(4 \rightarrow 5, 5 \rightarrow 6, \dots , n-1 \rightarrow n\). Suppose all arc weights are 1 and all activation thresholds are 1, except for node 4, which has activation threshold equal to 3. The only seed is node 1, i.e., we have \({\bar{y}}_1 = 1\), \({\bar{y}}_k = 0\) for \(k \ge 2\). In this case, if U is set correctly, i.e., large enough that all \(u_j\) variables are 0, the solution to the primal should have \(x_k = 1\) for all k: all nodes are active, for an objective function value of n. This is because node 1 activates node 2, which activates node 3, these three nodes activate node 4, and then the entire chain activates.
However, after we introduce the variables \(u_j\) in the primal, we can set \(x_1 = \frac{2}{3}, u_1 = \frac{1}{3}, x_2 = \frac{2}{3}, x_3 = \frac{2}{3}\), and \(x_k = 0\) for \(k \ge 4\): now the sum \(x_1 + x_2 + x_3 = 2\) is not enough to activate node 4 (associated with the constraint \(x_1 + x_2 + x_3 - x_4 \le 2\)), so that the total objective function value is \(x_1 + x_2 + x_3 + U u_1 = 2 + \frac{1}{3} U < n\) as long as \(U < 3(n-2)\). Thus \(U = n\) does not suffice: in this example we need \(U \ge 3(n-2) > n\). In fact, the example can be easily extended (using a cycle of length d, rather than a triangle) to show that we need at least \(U \ge nd\), where d is the maximum indegree of a node in the graph. Numerically, we found counterexamples where even \(U = n^2\) did not suffice; these examples were difficult to study analytically.
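The counterexample can be checked numerically. The sketch below is a minimal illustration in Python, not the authors' implementation: the function names are hypothetical, and activation is assumed to follow the threshold rule used in the example (a node becomes active once the number of unit-weight arcs from active in-neighbors reaches its threshold). It compares the true spread from seed node 1 with the objective value of the fractional point \(x_1 = x_2 = x_3 = \frac{2}{3}\), \(u_1 = \frac{1}{3}\).

```python
from fractions import Fraction

def build_example(n):
    """Triangle 1->2->3->1, arcs 1,2,3 -> 4, chain 4->5->...->n (unit weights)."""
    arcs = [(1, 2), (2, 3), (3, 1), (1, 4), (2, 4), (3, 4)]
    arcs += [(k, k + 1) for k in range(4, n)]
    thresholds = {j: 1 for j in range(1, n + 1)}
    thresholds[4] = 3  # node 4 needs all three triangle nodes
    return arcs, thresholds

def propagate(n, arcs, thresholds, seeds):
    """Repeat activation rounds until no new node becomes active."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for j in range(1, n + 1):
            if j in active:
                continue
            incoming = sum(1 for (i, k) in arcs if k == j and i in active)
            if incoming >= thresholds[j]:
                active.add(j)
                changed = True
    return active

def relaxed_objective(U):
    """Objective x_1 + x_2 + x_3 + U * u_1 of the fractional point in the text."""
    x = {1: Fraction(2, 3), 2: Fraction(2, 3), 3: Fraction(2, 3)}
    u1 = Fraction(1, 3)
    # The node-4 constraint x_1 + x_2 + x_3 - x_4 <= 2 holds with x_4 = 0.
    assert sum(x.values()) - 0 <= 2
    return sum(x.values()) + U * u1

n = 10
arcs, thresholds = build_example(n)
full_spread = len(propagate(n, arcs, thresholds, {1}))
print(full_spread, relaxed_objective(n), relaxed_objective(3 * (n - 2)))
```

For \(n = 10\), the propagation activates every node, while the fractional point has a strictly smaller objective value for any \(U < 3(n-2) = 24\), including \(U = n\).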
Given these difficulties, a straightforward big-M formulation is likely to fail. On the other hand, the indicator constraint version gives CPLEX more freedom to tighten the big-M or switch to an alternative formulation. In preliminary experiments we found that the use of indicator constraints leads to a numerically more stable implementation.
Analysis of the parameters used in the heuristic comparison
To choose the parameters to be used in algorithm \({HeurCG}\) we tested the following combinations of values: \(\texttt {nic}\in \{ 2500, 5000, 7500 \}\), \(\texttt {mpi} \in \{ 0.025, 0.05, 0.75 \}\), \(\texttt {mci} \in \{ 1, 2, 3 \}\) and \(\texttt {mcr} \in \{ 1, 2, 3 \}\), for a total of 81 different configurations. Each configuration was used to solve a subset of Large graphs with 2000 and 5000 nodes, an average node degree \(k=12\), a rewiring probability \(b\in \{0.1, 0.3\}\), 0.15n starting seeds and 5 random instances for each combination of settings. The robustness parameters chosen are \(B_A \in \{0, 25\}\) and \(B_N \in \{0, 25\}\).
In Fig. 5 we show the optimal value obtained, where each entry corresponds to the average computed over 40 instances. We can see that the variance of the optimal value is limited: for the instances with 2000 nodes the maximum difference between the best and the worst average is equal to 6.1, while for the instances with 5000 nodes it is equal to 19.1. The value used for \(\texttt {mci}\) has a small impact on the performance of the algorithm; therefore, we decided to use the intermediate value of 2. The parameter that seems to have the largest impact is \(\texttt {mcr}\). These experiments suggest setting \(\texttt {mcr}=2\): even if in some cases the best performance is achieved with values of \(\texttt {mcr}\) different from 2, the selected setting is the one providing the most consistent results across the graph sizes considered. Finally, for the remaining parameters, we selected \(\texttt {nic}=2500\) and \(\texttt {mpi}=0.025\) because they show the best performance when \(\texttt {mcr}\) is equal to 2.
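The size of the experiment can be reproduced with a short enumeration. The following Python sketch is only an illustration of the grid described above: the dictionary layout and variable names are hypothetical, the parameter values are copied as printed in the text, and the reading that the 40 instances per entry arise from the combinations of b, random instance, \(B_A\) and \(B_N\) for one graph size is an assumption.

```python
from itertools import product

# Parameter values as printed in the text (names follow the text).
grid = {
    "nic": [2500, 5000, 7500],
    "mpi": [0.025, 0.05, 0.75],
    "mci": [1, 2, 3],
    "mcr": [1, 2, 3],
}

# Full factorial design: one dict per configuration.
configurations = [dict(zip(grid, values)) for values in product(*grid.values())]

# Assumed instance set for one graph size:
# rewiring probability b, random instance index, B_A, B_N.
instances_per_size = list(product([0.1, 0.3], range(5), [0, 25], [0, 25]))

# Setting selected in the text.
selected = {"nic": 2500, "mpi": 0.025, "mci": 2, "mcr": 2}

print(len(configurations), len(instances_per_size), selected in configurations)
```

Under this reading, each of the 81 configurations is run on 40 instances for each of the two graph sizes.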
Cite this article
Nannicini, G., Sartor, G., Traversi, E. et al. An exact algorithm for robust influence maximization. Math. Program. 183, 419–453 (2020). https://doi.org/10.1007/s10107-020-01507-z
DOI: https://doi.org/10.1007/s10107-020-01507-z