Resampling-based predictive simulation framework of stochastic diffusion model for identifying top-K influential nodes

Ohara, Kouzou; Saito, Kazumi; Kimura, Masahiro; Motoda, Hiroshi

doi:10.1007/s41060-019-00183-3

Resampling-based predictive simulation framework of stochastic diffusion model for identifying top-K influential nodes

Regular Paper
Published: 11 April 2019

Volume 9, pages 175–195, (2020)
Cite this article

International Journal of Data Science and Analytics Aims and scope Submit manuscript

Kouzou Ohara ORCID: orcid.org/0000-0002-7399-2472¹,
Kazumi Saito^2,3,
Masahiro Kimura⁴ &
…
Hiroshi Motoda⁵

227 Accesses
6 Citations
Explore all metrics

Abstract

We address a problem of efficiently estimating the influence of a node in information diffusion over a social network. Since the information diffusion is a stochastic process, the influence degree of a node is quantified by the expectation, which is usually obtained by very time-consuming many runs of simulation. Our contribution is that we proposed a framework for predictive simulation based on the leave-N-out cross-validation technique that well approximates the error from the unknown ground truth for two target problems: one to estimate the influence degree of each node, and the other to identify top-K influential nodes. The method we proposed for the first problem estimates the approximation error of the influence degree of each node, and the method for the second problem estimates the precision of the derived top-K nodes, both without knowing the true influence degree. We experimentally evaluate the proposed methods using the three real-world networks and show that they can serve as a good measure to solve the target problems with far fewer runs of simulation ensuring the accuracy when the leave-half-out cross-validation, i.e., N is the half of the number of runs, is used, which means that one can identify the influential nodes without knowing exactly their influence degree in good accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Identifying Influential Spreaders by Graph Sampling

A Diffusion Model for Maximizing Influence Spread in Large Networks

Resampling-Based Framework for Estimating Node Centrality of Large Social Network

Notes

This is similar to the influence maximization problem, but is different in that the target nodes are not guaranteed to spread the passed information, i.e., the cascade may stop there.
This paper intensively extends our conference paper [34] by analyzing the relationship between the influence degree distribution and the topological properties of a network more in depth for additional five kinds of synthetic networks and providing new findings derived from additional experiments.
http://www.ameba.jp/.
http://www.cosme.net/.

References

Bakshy, E., Hofman, J.M., Mason, W.A., Watts, D.J.: Everyone’s an influencer: quantifying influence on twitter. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM’11), pp. 65–74 (2011)
Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
Article MathSciNet Google Scholar
Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing social influence in nearly optimal time. In: Proceedings of the 25th Annual ACM–SIAM Symposium on Discrete Algorithms (SODA’14), pp. 946–957 (2014)
Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), pp. 1029–1038 (2010)
Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09), pp. 199–208 (2009)
Chen, W., Yuan, Y., Zhang, L.: Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), pp. 88–97 (2010)
Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 23(4), 493–507 (1952)
Article MathSciNet Google Scholar
Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55, 441–453 (1997)
Article MathSciNet Google Scholar
Cohen, E.: All-distances sketches, revisited: hip estimators for massive graphs analysis. In: Proceedings of the 33rd ACM SIGMOD–SIGACT–SIGART Symposium on Principles of Database Systems, pp. 88–99 (2015)
Cohen, E., Delling, D., Pajor, T., Werneck, R.F.: Sketch-based influence maximization and computation: Scaling up with guarantees. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 629–638 (2014)
Cui, P., Wang, F., Yang, S., Sun, L.: Item-level social influence prediction with probabilistic hybrid factor matrix factorization. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2011), pp. 331–336 (2011)
Goyal, A., Lu, W., Lakshmanan, L.: Celf++: optimizing the greedy algorithm for influence maximization in social networks. In: Proceedings of the 20th International World Wide Web Conference (WWW2011), pp. 47–48 (2011)
Guille, A., Hacid, H.: A predictive model for the temporal dynamics of information diffusion in online social networks. In: Proceedings of the 21st International Conference Companion on World Wide Web (WWW’12), pp. 1145–1152 (2012)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Article MathSciNet Google Scholar
Jin, R., Liu, L., Aggarwal, C., Shen, Y.: Reliable clustering on uncertain graphs. In: Proceedings of the 2012 IEEE 12th International Conference on Data Mining (ICDM 2012), pp. 459–468 (2012)
Kassiano, V., Gounaris, A., Papadopoulos, A.N., Tsichlas, K.: Mining uncertain graphs: an overview. In: Proceedings of the International Workshop on Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2016), pp. 87–116 (2017)
Chapter Google Scholar
Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 137–146 (2003)
Khan, A., Ye, Y., L, C.: On Uncertain Graphs. Morgan & Claypool, San Rafael (2018)
Book Google Scholar
Kimura, M., Saito, K.: Tractable models for information diffusion in social networks. In: Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006), LNAI 4213, pp. 259–271 (2006)
Google Scholar
Kimura, M., Saito, K., Motoda, H.: Minimizing the spread of contamination by blocking links in a network. In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08), pp. 1175–1180 (2008)
Kimura, M., Saito, K., Motoda, H.: Blocking links to minimize contamination spread in a social network. ACM Trans. Knowl. Discov. Data 3, 9:1–9:23 (2009)
Article Google Scholar
Kimura, M., Saito, K., Motoda, H.: Efficient estimation of influence functions fot sis model on social networks. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09) (2009)
Kimura, M., Saito, K., Nakano, R.: Extracting influential nodes for information diffusion on a social network. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-07), pp. 1371–1376 (2007)
Kimura, M., Saito, K., Nakano, R., Motoda, H.: Extracting influential nodes on a social network for information diffusion. Data Min. Knowl. Disc. 20, 70–97 (2010)
Article MathSciNet Google Scholar
Kimura, M., Saito, K., Ohara, K., Motoda, H.: Speeding-up node influence computation for huge social networks. Int. J. Data Sci. Anal. 1, 1–14 (2016)
Article Google Scholar
Kleinberg, J.: The convergence of social and technological networks. Commun. ACM 51(11), 66–72 (2008)
Article Google Scholar
Klimt, B., Yang, Y.: The enron corpus: a new dataset for email classification research. In: Proceedings of the 2004 European Conference on Machine Learning (ECML’04), pp. 217–226 (2004)
Chapter Google Scholar
Lawrence, T., Hosein, P.: Stochastic dynamic programming heuristics for influence maximization-revenue optimization. Int. J. Data Sci. Anal. (first Online) (2018)
Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), pp. 420–429 (2007)
Loukides, G., Gwadera, R.: Preventing the diffusion of information to vulnerable users while preserving pagerank. Int. J. Data Sci. Anal. 5(1), 19–39 (2018)
Article Google Scholar
Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298, 824–827 (2002)
Article Google Scholar
Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003)
Article MathSciNet Google Scholar
Nguyen, H., Zheng, R.: Influence spread in large-scale social networks—a belief propagation approach. In: Proceedings of the 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), LNAI 7524, pp. 515–530 (2012)
Chapter Google Scholar
Ohara, K., Saito, K., Kimura, M., Motoda, H.: Predictive simulation framework of stochastic diffusion model for identifying top-k influential nodes. In: Proceedings of the 5th Asian Conference on Machine Learning (ACML2013), PMLR, vol. 29, pp. 149–164 (2013)
Ohara, K., Saito, K., Kimura, M., Motoda, H.: Resampling-based framework for estimating node centrality of large social network. In: Proceedings of the 17th International Conference on Discovery Science (DS’14), LNAI 8777, pp. 228–239 (2014)
Chapter Google Scholar
Ohara, K., Saito, K., Kimura, M., Motoda, H.: Resampling-based gap analysis for detecting nodes with high centrality on large social network. In: Proceedings of the Nineteenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2015), pp. 135–147 (2015)
Chapter Google Scholar
Ohara, K., Saito, K., Kimura, M., Motoda, H.: Critical node identification based on articulation point detection for network with uncertain connectivity. In: Proceedings of the Sixth International Symposium on Computing and Networking (CANDAR 2018), pp. 146–152 (2018)
Potamias, M., Bonchi, F., Gionis, A., Kollios, G.: K-nearest neighbors in uncertain graphs. Proc. VLDB Endow. 3, 997–1008 (2010)
Article Google Scholar
Rossetti, G., Milli, L., Salvator, R., Sirbu, A., Pedreschi, D., Giannotti, F.: Ndlib: a python library to model and analyze diffusion processes over complex networks. Int. J. Data Sci. Anal. 5(1), 61–79 (2018)
Article Google Scholar
Saito, K., Kimura, M., Motoda, H.: Discovering influential nodes for sis models in social networks. In: Proceedings of the Twelfth International Conference of Discovery Science (DS2009), LNAI 5808, pp. 302–316. Springer, Berlin (2009)
Chapter Google Scholar
Saito, K., Kimura, M., Ohara, K., Motoda, H.: Learning continuous-time information diffusion model for social behavioral data analysis. In: Proceedings of the 1st Asian Conference on Machine Learning (ACML2009), LNAI 5828, pp. 322–337 (2009)
Google Scholar
Saito, K., Kimura, M., Ohara, K., Motoda, H.: Selecting information diffusion models over social networks for behavioral analysis. In: Proceedings of the 2010 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2010), LNAI 6323, pp. 180–195 (2010)
Chapter Google Scholar
Saito, K., Kimura, M., ohara, K., Motoda, H.: Which targets to contact first to maximize influence over social network. In: Proceedings of the 6th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction (SBP2013), LNCS 7812, pp. 359–367 (2013)
Chapter Google Scholar
Vázquez, A.: Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations. Phys. Rev. 67(5), 056,104 (2003)
Google Scholar
Watts, D.J., Dodds, P.S.: Influence, networks, and public opinion formation. J. Consum. Res. 34, 441–458 (2007)
Article Google Scholar
Watts, D.J., Strogatz, S.H.: Collective dyanamics of “small-world” networks. Nature 393, 440–442 (1998)
Article Google Scholar
Yang, J., Counts, S.: Predicting the speed, scale, and range of information diffusion in twitter. In: Proceedings of the Fourth International Conference on Weblogs and Social Media (ICWSM 2010) (2010)
Yang, J., Leskovec, J.: Modeling information diffusion in implicit networks. In: Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM’10), pp. 599–608 (2010)
Yang, Y., Chen, E., Liu, Q., Xiang, B., Xu, T., Shad, S.: On approximation of real-world influence spread. In: Proceedings of the 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2012), LNAI 7524a, pp. 548–564 (2012)
Chapter Google Scholar
Zhou, C., Zhang, P., Zang, W., Guo, L.: On the upper bounds of spread for greedy algorithms in social network influence maximization. IEEE Trans. Knowl. Data Eng. 27, 2770–2783 (2015)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Integrated Information Technology, Aoyama Gakuin University, Kanagawa, 252-5258, Japan
Kouzou Ohara
Faculty of Science, Kanagawa University, Kanagawa, 259-1293, Japan
Kazumi Saito
Center for Advanced Intelligence Project, RIKEN, Tokyo, 103-0027, Japan
Kazumi Saito
Department of Electronics and Informatics, Ryukoku University, Otsu, 520-2194, Japan
Masahiro Kimura
Institute of Scientific and Industrial Research, Osaka University, Osaka, 567-0047, Japan
Hiroshi Motoda

Authors

Kouzou Ohara
View author publications
You can also search for this author in PubMed Google Scholar
Kazumi Saito
View author publications
You can also search for this author in PubMed Google Scholar
Masahiro Kimura
View author publications
You can also search for this author in PubMed Google Scholar
Hiroshi Motoda
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kouzou Ohara.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was partly supported by JSPS Grant-in-Aid for Scientific Research (C) (No. 17K00314).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ohara, K., Saito, K., Kimura, M. et al. Resampling-based predictive simulation framework of stochastic diffusion model for identifying top-K influential nodes. Int J Data Sci Anal 9, 175–195 (2020). https://doi.org/10.1007/s41060-019-00183-3

Download citation

Received: 28 July 2018
Accepted: 28 March 2019
Published: 11 April 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s41060-019-00183-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Resampling-based predictive simulation framework of stochastic diffusion model for identifying top-K influential nodes

Abstract

Access this article

Similar content being viewed by others

Identifying Influential Spreaders by Graph Sampling

A Diffusion Model for Maximizing Influence Spread in Large Networks

Resampling-Based Framework for Estimating Node Centrality of Large Social Network

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Resampling-based predictive simulation framework of stochastic diffusion model for identifying top-K influential nodes

Abstract

Access this article

Similar content being viewed by others

Identifying Influential Spreaders by Graph Sampling

A Diffusion Model for Maximizing Influence Spread in Large Networks

Resampling-Based Framework for Estimating Node Centrality of Large Social Network

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation