Elsevier

Neurocomputing

Volume 466, 27 November 2021, Pages 265-284

Towards embedding information diffusion data for understanding big dynamic networks

https://doi.org/10.1016/j.neucom.2021.09.024

Abstract

Dynamic networks are widely used to describe networks that change over time. Although a large body of research seeks to understand dynamic networks through link prediction, node classification and community detection, little work is specifically designed to address the challenge posed by the sheer size of dynamic networks. To this end, we study in this paper an emerging and challenging problem: network coarsening in dynamic networks. Network coarsening refers to a class of network “zoom-out” operations in which node pairs and edges are grouped together for efficient analysis of big networks. However, existing network coarsening approaches can only handle static networks whose structural weights are predefined before the coarsening calculation. Observing that big networks are highly dynamic and naturally change over time, we propose to embed information diffusion data, which reflect the dynamics of networks, into network coarsening. Specifically, we present a new Semi-NetCoarsen approach that jointly maximizes the likelihood of observing the information diffusion data and minimizes the network regularization with respect to the predefined network structural data. The learning function is convex, and we use the accelerated proximal gradient algorithm to obtain the globally optimal solution. We conduct experiments on two synthetic and five real-world data sets to validate the performance of the proposed method.

Introduction

Dynamic network analysis plays an important role in understanding networks that change with time. Recently, an increasing number of research works have built powerful models for dynamic network analysis, such as link prediction [25], [24], node classification [32], and community detection [23]. However, none of these models is specifically designed to address the challenge of big network size. An emerging way to handle the big network size problem is to use network coarsening methods. Specifically, network coarsening refers to grouping node pairs and edges together to significantly reduce the size of big networks, and it has attracted increasing attention in social network analysis tasks such as influence maximization (IM) and viral marketing.

Due to the massive amount of information generated by online social networking services, it is very challenging to carry out analysis directly on the original big networks. Previous works on coarsening networks assume that network structure information is given in advance. They then focus on shrinking the network size based on the predefined network structure information by using link-based [24] and heavy-clique-based [31] approaches. For example, the recent work [28] coarsens networks by collapsing selected node pairs to obtain a smaller representation of the network.

However, existing work can only coarsen networks with predefined structural data and neglects that big networks naturally change over time. The dynamic analysis of network structures has attracted increasing attention recently. Dynamic networks refer to networks that either evolve over time [5], with existing links disappearing and new links appearing continuously, or have a subset of links activated or deactivated in different time windows [29]. Many data-driven approaches [12], [14] based on information spreading cascades handle dynamic network analysis via maximum likelihood estimation (MLE). For example, NetRate [29] infers network structures from both information spreading data and information transmission rates over network edges.

Due to the heavily skewed degree distribution in many big networks, such as complex networks [28], a large number of node pairs are relatively unnecessary for network analysis. In this paper, we study a new problem of network coarsening in which both the information spreading cascades and the network structural data are used to coarsen the network. The main idea is that two nodes are likely to be collapsed if they frequently co-occur in the same cascades and are thus tightly connected with each other. Based on the inferred smaller representations, complicated network analysis tasks such as influence maximization can be tackled efficiently.

A motivating example of dynamic network coarsening is viral marketing in social networks. Social networks are naturally dynamic: users and links are continuously added and removed, so the network structure is always changing. In such dynamic networks with a huge number of users, influence maximization analysis for advertising is expensive. Network coarsening is therefore needed to make such analysis tractable when both network dynamics and big network size are key challenges. As shown in Fig. 1, the original network has 77 nodes and 254 links, and the weight values of links are continuously changing. After coarsening the original network into 7 supernodes and 8 superedges, we can run influence maximization algorithms more efficiently, with fewer Monte-Carlo simulations, for accurate social network advertising.

It is challenging to coarsen a network inferred from both dynamic information spreading cascades and static structural data. First, the two data sources are heterogeneous yet must be learned jointly. Second, combining the two data sources for network coarsening results in a complicated optimization problem that needs efficient algorithms. Third, the condensed network should approximately preserve the structure information of the original big network.

To address these challenges, we present a new semi-data-driven network coarsening model (Semi-NetCoarsen for short). We formulate the learning problem by simultaneously maximizing the likelihood of network coarsening with respect to the information spreading cascades and minimizing the graph regularization term with respect to the network structural data. To validate the performance of the proposed method, we use the network coarsening framework to solve the influence maximization problem in experiments.

The contribution of the paper is threefold:

  • Problem formulation: We formulate the network coarsening problem by simultaneously maximizing the likelihood of observing the information spreading cascades and minimizing the graph regularization with respect to the network structural data. The learning problem is convex.

  • Efficient algorithm: An efficient Semi-NetCoarsen method is presented to solve the above learning problem. The accelerated proximal gradient algorithm is used to obtain the globally optimal solution.

  • Case study on network influence maximization: Based on the coarsened networks, we develop a new framework Semi-NetCoarsen_IM to efficiently solve the influence maximization problem. Extensive experiments show that our method runs orders of magnitude faster than the algorithms running on the original networks.

The rest of the paper is organized as follows. Section 2 provides a survey of the related work. Section 3 introduces the problem statement. Section 4 provides our solution. Section 5 discusses the semi-data-driven framework. Section 6 discusses the applications of our method to the influence maximization problem. Section 7 reports experimental results, and Section 8 concludes the paper.

Section snippets

Network coarsening

The problem of coarsening a big network or graph has attracted increasing attention because of its potential applications in graph partitioning [27], [35], [15], [32], graph sparsification [24], [6], community detection [23] and IM analysis [43], [28], [13]. Many graph metrics, such as link-based [25], [24] and heavy-clique-based [31] metrics, can be used to coarsen a big network. For example, the work [32] presents the PuLP method, which partitions low-diameter networks with skewed degree distributions.

Information spreading cascade

Given a network G, an information spreading cascade $t^c$ can be denoted as $t^c := \{(v_1, t_1)^c, \ldots, (v_{|V|}, t_{|V|})^c\}$, where a message c is assumed to propagate throughout the network G, leaving a trace of passed nodes $v_i \in V$ with time stamp $t_i$, represented as $(v_i, t_i)^c$. We denote the time element in $(v_i, t_i)^c$ as $t_i^c$, with $t_i^c \in [0, T^c] \cup \{\infty\}$. The symbol $\infty$ labels nodes that have not received message c in the time window $[0, T^c]$. We denote $C := \{t^1, \ldots, t^{|C|}\}$ as the set that contains a collection of cascades. The study [34]
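As an illustration of the cascade notation, the following minimal Python sketch stores each cascade as a mapping from node to time stamp and marks nodes that never received the message with an infinite time stamp. The node names, the toy cascades and the helper functions `make_cascade` and `cooccurrence_counts` are hypothetical and are not taken from the paper; the co-occurrence count is only a rough proxy for the intuition that nodes appearing in the same cascades are candidates for merging.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy node set; not from the paper.
NODES = ["a", "b", "c", "d"]


def make_cascade(observed, nodes=NODES):
    """Build a full trace: nodes missing from `observed` get time math.inf."""
    return {v: observed.get(v, math.inf) for v in nodes}


def cooccurrence_counts(cascades):
    """For each node pair, count the cascades in which both nodes were reached."""
    counts = Counter()
    for cascade in cascades:
        reached = sorted(v for v, t in cascade.items() if math.isfinite(t))
        counts.update(combinations(reached, 2))
    return counts


# Three hypothetical cascades, each observed in its own time window.
cascades = [
    make_cascade({"a": 0.0, "b": 1.5}),
    make_cascade({"a": 0.0, "b": 0.7, "c": 2.1}),
    make_cascade({"c": 0.0, "d": 0.4}),
]
counts = cooccurrence_counts(cascades)
print(counts.most_common())
```

In this toy data the pair (a, b) co-occurs most often, so under the co-occurrence intuition it would be the first candidate for collapsing into one supernode.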

Main idea

To coarsen a big network G, we can merge similar nodes $v_i$ in G into a supernode s. The supernode s and all the nodes $v_i$ in s are labeled with a class label $l_s$. We denote $y_{i,s}$ as the probability of node $v_i$ being merged into supernode s, with $y_{i,s} \in [0,1]$. The distribution of the class labels of $v_i$ is represented as a non-negative vector $y_i = [y_{i,1}, \ldots, y_{i,K}]^T$, where K is the total number of supernodes. Then the distribution of the class labels of all the nodes in G is modeled as $Y = [y_1, y_2, \ldots, y_N]^T \in \mathbb{R}^{N \times K}$.

Obviously,
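To make the role of the assignment matrix Y concrete, here is a minimal NumPy sketch that turns a soft assignment into supernode labels and then builds a coarsened adjacency matrix. The matrices `Y` and `A` are hypothetical toy data, and both the argmax rule and the summation of edge weights between supernodes are assumptions made for illustration; the paper's own construction of the coarsened network is not reproduced in this excerpt.

```python
import numpy as np

# Hypothetical soft assignments for N = 4 nodes and K = 2 supernodes.
Y = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.0, 1.0],
])

# Hypothetical symmetric weighted adjacency matrix of the original network.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 2.0],
    [0.0, 0.0, 2.0, 0.0],
])


def hard_assign(Y):
    """Assign each node to its most probable supernode (assumed argmax rule)."""
    labels = Y.argmax(axis=1)
    H = np.zeros_like(Y)
    H[np.arange(Y.shape[0]), labels] = 1.0  # one-hot membership matrix
    return labels, H


def coarsen_adjacency(A, H):
    """Sum edge weights between supernodes (assumed rule); drop self-loops."""
    A_c = H.T @ A @ H
    np.fill_diagonal(A_c, 0.0)
    return A_c


labels, H = hard_assign(Y)
A_coarse = coarsen_adjacency(A, H)
print(labels)
print(A_coarse)
```

With this toy input, nodes 0 and 1 fall into the first supernode and nodes 2 and 3 into the second, and the only surviving superedge carries the weight of the single original link that crosses the two groups.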

Learning the Semi-NetCoarsen algorithm

In this section, we first solve Eq. (10) to obtain the optimal solution Y. We then obtain the coarsened network $G_{coarsen}$ based on Y in Section 5.2. We also carry out the network IM problem based on $G_{coarsen}$ in Section 6.
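Eq. (10) is not reproduced in this excerpt, so the following is only a generic sketch of an accelerated proximal gradient (FISTA-style) loop on a stand-in convex objective. It assumes a quadratic data term $\frac{1}{2}\|Y - B\|_F^2$ in place of the paper's likelihood term, a graph Laplacian regularizer $\lambda\,\mathrm{tr}(Y^T L Y)$, and a non-negativity constraint on Y handled by projection. The matrices `A` and `B`, the weight `lam` and all function names are hypothetical.

```python
import numpy as np


def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric weighted graph."""
    return np.diag(A.sum(axis=1)) - A


def objective(Y, B, L, lam):
    """Stand-in objective: quadratic data term plus graph regularizer."""
    return 0.5 * np.sum((Y - B) ** 2) + lam * np.trace(Y.T @ L @ Y)


def accelerated_proximal_gradient(B, L, lam, n_iter=200):
    """FISTA-style loop; the proximal step is the projection onto Y >= 0."""
    # Lipschitz constant of the gradient of the smooth part.
    lipschitz = 1.0 + 2.0 * lam * np.linalg.eigvalsh(L).max()
    step = 1.0 / lipschitz
    Y = np.zeros_like(B)
    Z = Y.copy()  # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = (Z - B) + 2.0 * lam * (L @ Z)
        Y_next = np.maximum(Z - step * grad, 0.0)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Z = Y_next + ((t - 1.0) / t_next) * (Y_next - Y)
        Y, t = Y_next, t_next
    return Y


# Hypothetical toy inputs: a 4-node path graph and a noisy target matrix.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 2.0],
    [0.0, 0.0, 2.0, 0.0],
])
B = np.array([
    [1.0, -0.2],
    [0.8, 0.1],
    [0.1, 0.9],
    [0.0, 1.0],
])
L = graph_laplacian(A)
lam = 0.5
Y_hat = accelerated_proximal_gradient(B, L, lam)
print(np.round(Y_hat, 3))
```

Because the stand-in objective is convex and the feasible set is convex, the iterates approach the global minimizer, which mirrors the convexity argument made for the actual learning problem. The extrapolation step with the sequence `t` is what distinguishes the accelerated scheme from plain projected gradient descent.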

The influence maximization problem

In this section, we use the IM problem [18] as a case study to measure the performance of the proposed network coarsening algorithm. The coarsened network is expected to approximate the original network, so that solving the IM problem can be sped up. Based on Semi-NetCoarsen, we design a new algorithm, Semi-NetCoarsen_IM, to solve the IM problem, which has the following main steps.

  • Coarsen the network: By using Semi-NetCoarsen, the original network G is coarsened to obtain a coarsened
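The remaining steps are not shown in this excerpt. As background only, the sketch below shows what running influence maximization on a small (for instance, coarsened) graph can look like: it assumes the independent cascade diffusion model and greedy seed selection with Monte-Carlo spread estimation, which is a common baseline and not necessarily the procedure used by Semi-NetCoarsen_IM. The graph, its edge probabilities and all function names are hypothetical.

```python
import random


def simulate_ic(graph, seeds, rng):
    """One independent-cascade run; graph[u] is a list of (v, prob) pairs."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, prob in graph.get(u, []):
                # Each newly active node gets one chance to activate v.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, n_runs, rng):
    """Monte-Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(n_runs)) / n_runs


def greedy_im(graph, nodes, k, n_runs=200, seed=0):
    """Greedily add the node with the largest estimated spread, k times."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best = max(
            (v for v in nodes if v not in chosen),
            key=lambda v: expected_spread(graph, chosen + [v], n_runs, rng),
        )
        chosen.append(best)
    return chosen


# Hypothetical directed graph over 4 (super)nodes with activation probabilities.
toy_graph = {0: [(1, 0.6), (2, 0.4)], 1: [(2, 0.3)], 2: [], 3: [(0, 0.1)]}
print(greedy_im(toy_graph, nodes=[0, 1, 2, 3], k=2))
```

The cost of this baseline grows with the number of candidate nodes and the number of simulation runs, which is why running it on a graph with far fewer nodes and edges can be much cheaper than running it on the original network.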

Experiments

We evaluate Semi-NetCoarsen for network coarsening. The performance of Semi-NetCoarsen_IM on the IM problem is also evaluated based on the coarsened network. All experiments are conducted on a Linux system with 32 GB of memory and an AMD six core 1.4 GHz CPU.

Conclusions

In this paper we present a new efficient method to coarsen big dynamic networks by jointly modeling the information cascade data and the network structural data. A new network coarsening model Semi-NetCoarsen is presented by both minimizing the graph regularization with respect to the predefined structural data and maximizing the likelihood of observing the dynamic information spreading cascades. Experiments on seven data sets show the performance of the proposed method.

In this work, we measure

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was partially supported by the NSFC projects (61972105, 61872360), Guangdong Higher Education Innovation Group 2020KCXTD007, Guangzhou Higher Education Innovation Group 202032854, and the Alibaba Group through Alibaba Innovative Research Program.


References (43)

  • S. Brin et al. Reprint of: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. (2012)
  • A. Adler et al. Linear-time subspace clustering via bipartite graph modeling. IEEE Trans. Neural Networks Learn. Syst. (2015)
  • A.L. Barabási et al. Emergence of scaling in random networks. Science (1999)
  • Barbieri, N., Bonchi, F., Manco, G.: Cascade-based community detection. In: WSDM, pp. 33–42...
  • A. Beck et al. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. (2009)
  • S.Y. Bhat et al. Hoctracker: Tracking the evolution of hierarchical and overlapping communities in dynamic social networks. IEEE Trans. Knowl. Data Eng. (2015)
  • F. Bonchi et al. Activity preserving graph simplification. Data Min. Knowl. Disc. (2013)
  • D. Cai et al. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • J. Chen et al. Algebraic distance on graphs. SIAM J. Scientific Computing (2011)
  • Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social...
  • F.R. Chung (1997)
  • Cui, P., Jin, S., Yu, L., Wang, F., Zhu, W., Yang, S.: Cascading outbreak prediction in networks: A data-driven...
  • C. Dong et al. Assessing the influence of an individual event in complex fault spreading network based on dynamic uncertain causality graph. IEEE Trans. Neural Networks Learn. Syst. (2016)
  • Du, N., Liang, Y., Balcan, M., Song, L.: Influence function learning in information diffusion networks. In: ICML, pp....
  • R. Glantz et al. Tree-based coarsening and partitioning of complex networks. J. Exp. Algorithmics (JEA) (2016)
  • Gomez Rodriguez, M., Leskovec, J., Krause, A.: Inferring networks of diffusion and influence. In: KDD, pp. 1019–1028...
  • Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: KDD, pp. 137–146...
  • D.E. Knuth (1993)
  • Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: KDD, pp. 497–506...
  • J. Leskovec et al. Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res. (2010)
  • S. Leyffer et al. Fast response to infection spread and cyber attacks on large-scale networks. J. Complex Networks (2013)

    Hong Yang is a Senior Postdoctoral Scientist with the University of Sydney, Australia. She received her PhD degree from University of Technology Sydney (UTS), Australia. She obtained her Master degree from University of Chinese Academy of Sciences, and her Bachelor degree from Xidian University. Before joining UTS, she worked at Mathworks as a software engineer for nine years. Her research interests include graph data analytics and medical image processing. She has published over 20 research papers in major data mining and artificial intelligence journals and conferences.

    Peng Zhang is a professor with Guangzhou University. He received his PhD degree from University of the Chinese Academy of Sciences. He was a lecturer with University of Technology Sydney, an associate professor with the Chinese Academy of Sciences, and a senior staff engineer with Alibaba Group. His research covers data mining, data streams, and social network analysis, with over 150 publications in TPAMI, TKDE, TNNLS, KDD, SIGIR, WWW, ICDM, AAAI, IJCAI, etc. He has served on many program committees of international conferences, including as a PC member for the KDD, ICLR, ICML, NeurIPS, IJCAI, and AAAI conferences. He also served as a founding editorial board member of Springer Annals of Data Science and Springer Journal of Big Data.

    Haishuai Wang is an Assistant Professor of Computer Science at Fairfield University, USA. He received his PhD degree in Computer Science from the Center of Artificial Intelligence at University of Technology Sydney. He did his postdoc training at Harvard University and Washington University in St. Louis. His research focuses on Data Mining, Machine Learning and Bioinformatics.

    Chuan Zhou obtained his Ph.D. degree from Chinese Academy of Sciences in 2013. He won the outstanding doctoral dissertation award of Chinese Academy of Sciences in 2014, the best paper award of ICCS-14, and the best student paper award of IJCNN-17. Currently, he is an Associate Professor at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. His research interests include social network analysis and graph mining. To date, he has published more than 80 papers, including in IEEE TKDE, ICDM, AAAI, NeurIPS, CIKM, IJCAI and WWW.

    Zhao Li received the Ph.D. degree (Hons.) from the Computer Science Department, University of Vermont. He is currently a Senior Staff Scientist with the Alibaba Group, specializing in e-commerce ranking and recommendation systems. He has published more than 70 articles in prestigious conferences and journals, including NIPS, AAAI, IJCAI, and DMKD. His current research interests include adversarial machine learning, network representation learning, knowledge graphs, multi-agent reinforcement learning, and big data-driven security. He is also a Technical Committee Member of the China Computer Federation on Database.

    Li Gao is a senior R&D engineer in Baidu Search, focusing on ranking strategy. Previously, he worked as a senior researcher at Tencent Inc. (2018–2019). Before that, he received his Ph.D. (2018) in Computer Science from the Institute of Information Engineering, Chinese Academy of Sciences. His research interests include graph mining and recommender systems.

    Qingfeng Tan received his Ph.D. degree in information security from the University of Chinese Academy of Sciences, Beijing, China, in 2017. He is currently an Associate Professor with the Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China. His current research interests include anonymous communication and privacy protection.
