Abstract
Estimating the range of information diffusion is critical for social network and user behavior analysis. Selecting the nodes that constitute this range is challenging with the classic independent cascade and linear threshold models, because the topology of large-scale online social networks (OSNs) is unknown. In this paper, we start by mining frequent itemsets from historical records of information diffusion, and adopt the Bayesian network (BN) as the framework to represent and infer the implied dependence relations among frequent items. To infer the range probabilistically, we first propose a greedy algorithm that selects the observed nodes used as evidence for BN inference, for which we define the metric of proximity degree and prove its submodularity. We then give an algorithm to construct the item-association BN (IABN), which represents the dependencies among frequent items. Next, we present an approximate algorithm to infer the range of information diffusion with respect to the observed nodes. Experimental results show that the observed nodes can be selected, and the range of information diffusion inferred, effectively. Empirical studies also demonstrate that the proposed IABN outperforms some state-of-the-art methods in obtaining a relatively complete set of nodes in the range of information diffusion.
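As a rough illustration of the first step named in the abstract (mining frequent itemsets from historical diffusion records), the sketch below runs a small Apriori-style level-wise search in Python. It is not the authors' implementation. The function name `frequent_itemsets`, the toy `records` (each set standing for the users who took part in one past diffusion), and the support threshold are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(records, min_support, max_size=3):
    """Level-wise (Apriori-style) search for itemsets that occur in at
    least `min_support` records. Returns {frozenset: support count}."""
    records = [frozenset(r) for r in records]

    # Level 1: count how many records contain each single item.
    counts = Counter(frozenset([u]) for r in records for u in r)
    level = {s: n for s, n in counts.items() if n >= min_support}

    frequent = {}
    size = 1
    while level:
        frequent.update(level)
        if size == max_size:
            break
        size += 1
        # Candidate generation with Apriori pruning: keep a candidate only
        # if every (size - 1)-subset was frequent at the previous level.
        items = sorted({u for s in level for u in s})
        candidates = [
            frozenset(c)
            for c in combinations(items, size)
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        ]
        # Support counting: one pass over the records per level.
        counts = Counter(c for r in records for c in candidates if c <= r)
        level = {s: n for s, n in counts.items() if n >= min_support}
    return frequent


# Hypothetical diffusion records: users involved in four past diffusions.
records = [
    {"u1", "u2", "u3"},
    {"u1", "u2"},
    {"u2", "u3"},
    {"u1", "u2", "u4"},
]
freq = frequent_itemsets(records, min_support=2)
for itemset, support in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy setting, the frequent items would become candidate nodes of the BN, and co-occurrence within frequent itemsets would suggest which dependencies to consider. How the IABN structure and parameters are actually derived is described in the full paper and is not reproduced here.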
Acknowledgements
This paper was supported by the National Natural Science Foundation of China (U1802271, 62002311), the Science Foundation for Distinguished Young Scholars of Yunnan Province (2019FJ011), the Fundamental Research Project of Yunnan Province (202001BB050052), and the Cultivation Project of Donglu Scholar of Yunnan University. The authors are grateful to Mr. Kaiyu Song for his generous help in improving the experiments, and to the reviewers for their constructive comments and suggestions, which contributed substantially to the improvement of this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Responsible editor: M. J. Zaki.
About this article
Cite this article
Liu, W., Yue, K., Li, J. et al. Inferring range of information diffusion based on historical frequent items. Data Min Knowl Disc 36, 82–107 (2022). https://doi.org/10.1007/s10618-021-00800-5
DOI: https://doi.org/10.1007/s10618-021-00800-5