Inferring range of information diffusion based on historical frequent items

Data Mining and Knowledge Discovery

Abstract

Estimating the range of information diffusion is critical for social network and user behavior analysis. With the classic independent cascade and linear threshold models, selecting the nodes that constitute this range is challenging because the topology of large-scale online social networks (OSNs) is unknown. In this paper, we start by mining frequent itemsets from historical records of information diffusion, and adopt the Bayesian network (BN) as the framework to represent and infer the dependence relations implied among frequent items. To infer the range probabilistically, we first propose a greedy algorithm that selects the observed nodes serving as the evidence for BN inference; for this selection we propose the metric of proximity degree and prove its submodularity. We then give an algorithm to construct the item-association BN (IABN), which represents the dependencies among frequent items. Next, we present an approximate algorithm to infer the range of information diffusion with respect to the observed nodes. Experimental results show that the observed nodes can be selected, and the range of information diffusion inferred, effectively. Empirical studies also demonstrate that the proposed IABN outperforms some state-of-the-art methods in recovering a relatively complete set of nodes in the range of information diffusion.
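The greedy selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the proximity degree, so the code below substitutes a simple record-coverage score (the number of historical records containing at least one selected node), which is a well-known monotone submodular function. The names `greedy_select`, `coverage`, and `records` are hypothetical and chosen only for illustration; this is not the authors' algorithm.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List

Node = Hashable


def greedy_select(
    candidates: Iterable[Node],
    score: Callable[[FrozenSet[Node]], float],
    k: int,
) -> List[Node]:
    """Greedily pick up to k nodes, each time adding the node with the
    largest positive marginal gain under the set function `score`."""
    selected: List[Node] = []
    remaining = set(candidates)
    current = score(frozenset())
    for _ in range(min(k, len(remaining))):
        best, best_gain = None, 0.0
        # Sort for deterministic tie-breaking (first candidate wins ties).
        for node in sorted(remaining, key=repr):
            gain = score(frozenset(selected) | {node}) - current
            if gain > best_gain:
                best, best_gain = node, gain
        if best is None:  # no candidate improves the score any further
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


# Assumption: toy diffusion records, each a set of node IDs that took part.
records = [{"a", "b"}, {"a", "c"}, {"b"}, {"d"}, {"a", "d"}]


def coverage(nodes: FrozenSet[Node]) -> float:
    """Stand-in score (NOT the paper's proximity degree): number of
    records that contain at least one of the given nodes."""
    return float(sum(1 for record in records if record & nodes))


observed = greedy_select({"a", "b", "c", "d"}, coverage, k=2)
```

For a monotone submodular score, this kind of greedy selection is known to achieve at least a (1 - 1/e) fraction of the optimal value for a fixed budget k (Nemhauser et al. 1978), which is presumably why proving submodularity of the metric matters for the selection step.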

Acknowledgements

This paper was supported by the National Natural Science Foundation of China (U1802271, 62002311), the Science Foundation for Distinguished Young Scholars of Yunnan Province (2019FJ011), the Fundamental Research Project of Yunnan Province (202001BB050052), and the Cultivation Project of Donglu Scholar of Yunnan University. The authors are grateful to Mr. Kaiyu Song for his generous help to the improvement of experiments, as well as the reviewers for their constructive comments and suggestions which contribute substantially to the improvement of this paper.

Author information

Corresponding author

Correspondence to Kun Yue.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: M. J. Zaki.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, W., Yue, K., Li, J. et al. Inferring range of information diffusion based on historical frequent items. Data Min Knowl Disc 36, 82–107 (2022). https://doi.org/10.1007/s10618-021-00800-5
