Continuous-time dynamic graph learning based on spatio-temporal random walks


Abstract

The application of dynamic graph representation learning to dynamic systems such as social networks and transportation networks has grown rapidly, owing to its ability to integrate topological and temporal information efficiently into a compact representation. Continuous-time dynamic graphs (CTDGs) have received considerable attention because they retain precise temporal information. Existing methods based on random walk techniques often use time-biased sampling to extract dynamic graph patterns, neglecting the topological properties of the graph. Moreover, previous anonymous walks do not share node identifiers and therefore fail to fully exploit the correlations between network patterns, which play a crucial role in predicting future interactions. This study therefore focuses on CTDG-based methods. This paper presents a novel continuous-time dynamic graph learning method based on spatio-temporal random walks, which makes three main contributions: (i) by considering both temporal constraints and topological structure, our method extracts diverse, expressive patterns from CTDGs; (ii) it introduces the hitting counts of nodes at a given position as each node’s relative identity, fully exploiting the correlation of network patterns and ensuring that the pattern structure remains consistent even after node identities are removed; (iii) an attention mechanism is employed to aggregate walk encodings, distinguishing the importance of different walks and enabling a more precise delineation of the relationships and structural attributes between nodes, thereby enhancing the precision and expressive power of node representations. The proposed method outperforms the average strongest baseline by 2.72% and 2.46% across all transductive and inductive link prediction tasks, respectively, and attains up to an 8.7% improvement on specific datasets. It also achieves the second-best overall performance in dynamic node classification tasks.



Data availability

The authors confirm that the data supporting the findings of this study are available within the article.

References

  1. Kazemi SM, Goel R, Jain K, Kobyzev I, Sethi A, Forsyth P, Poupart P (2020) Representation learning for dynamic graphs: a survey. J Mach Learn Res 21(70):1–73

  2. Alvarez-Rodriguez U, Battiston F, Arruda GF, Moreno Y, Perc M, Latora V (2021) Evolutionary dynamics of higher-order interactions in social networks. Nat Hum Behav 5(5):586–595

  3. Yu L, Liu Z, Sun L, Du B, Liu C, Lv W (2023) Continuous-time user preference modelling for temporal sets prediction. IEEE Trans Knowl Data Eng 36:1475–1488

  4. Sun Y, Jiang X, Hu Y, Duan F, Guo K, Wang B, Gao J, Yin B (2022) Dual dynamic spatial-temporal graph convolution network for traffic prediction. IEEE Trans Intell Transp Syst 23(12):23680–23693

  5. Simmel G (1950) The sociology of Georg Simmel, vol 92892. Simon and Schuster, New York

  6. Granovetter MS (1973) The strength of weak ties. Am J Sociol 78(6):1360–1380

  7. Pareja A, Domeniconi G, Chen J, Ma T, Suzumura T, Kanezashi H, Kaler T, Schardl T, Leiserson C (2020) EvolveGCN: evolving graph convolutional networks for dynamic graphs. Proc AAAI Conf Artif Intell 34:5363–5370

  8. Zhao L, Song Y, Zhang C, Liu Y, Wang P, Lin T, Deng M, Li H (2019) T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Trans Intell Transp Syst 21(9):3848–3858

  9. Seo Y, Defferrard M, Vandergheynst P, Bresson X (2018) Structured sequence modeling with graph convolutional recurrent networks. In: Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13-16, 2018, Proceedings, Part I 25. Springer, pp 362–373

  10. Wang J, Zhu W, Song G, Wang L (2022) Streaming graph neural networks with generative replay. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp 1878–1888

  11. Goyal P, Kamra N, He X, Liu Y (2018) DynGEM: deep embedding method for dynamic graphs. arXiv preprint arXiv:1805.11273

  12. Sankar A, Wu Y, Gou L, Zhang W, Yang H (2020) DySAT: deep neural representation learning on dynamic graphs via self-attention networks. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp 519–527

  13. Xu D, Ruan C, Korpeoglu E, Kumar S, Achan K (2020) Inductive representation learning on temporal graphs. In: International Conference on Learning Representations

  14. Rossi E, Chamberlain B, Frasca F, Eynard D, Monti F, Bronstein M (2020) Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637

  15. Wang Y, Chang YY, Liu Y, Leskovec J, Li P (2021) Inductive representation learning in temporal networks via causal anonymous walks. In: International Conference on Learning Representations (ICLR)

  16. Souza A, Mesquita D, Kaski S, Garg V (2022) Provably expressive temporal graph networks. Adv Neural Inf Process Syst 35:32257–32269

  17. Cong W, Zhang S, Kang J, Yuan B, Wu H, Zhou X, Tong H, Mahdavi M (2023) Do we really need complicated model architectures for temporal networks? In: The Eleventh International Conference on Learning Representations

  18. Trivedi R, Farajtabar M, Biswal P, Zha H (2019) DyRep: Learning representations over dynamic graphs. In: International Conference on Learning Representations

  19. Wang L, Chang X, Li S, Chu Y, Li H, Zhang W, He X, Song L, Zhou J, Yang H (2021) TCL: transformer-based dynamic graph modelling via contrastive learning. arXiv preprint arXiv:2105.07944

  20. Kumar S, Zhang X, Leskovec J (2019) Predicting dynamic embedding trajectory in temporal interaction networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp 1269–1278

  21. Wang X, Lyu D, Li M, Xia Y, Yang Q, Wang X, Wang X, Cui P, Yang Y, Sun B, et al (2021) APAN: Asynchronous propagation attention network for real-time temporal graph embedding. In: Proceedings of the 2021 International Conference on Management of Data, pp 2628–2638

  22. Nguyen GH, Lee JB, Rossi RA, Ahmed NK, Koh E, Kim S (2018) Dynamic network embeddings: from random walks to temporal random walks. In: 2018 IEEE International Conference on Big Data (Big Data), IEEE, pp 1085–1092

  23. Zhang M, Xu B, Wang L (2023) Dynamic network link prediction based on random walking and time aggregation. Int J Mach Learn Cybern 14(8):2867–2875

  24. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008

  25. Feng Z, Wang R, Wang T, Song M, Wu S, He S (2024) A comprehensive survey of dynamic graph neural networks: Models, frameworks, benchmarks, experiments and challenges. arXiv preprint arXiv:2405.00476

  26. Yang L, Chatelain C, Adam S (2024) Dynamic graph representation learning with neural networks: a survey. IEEE Access 12:43460–43484

  27. Trivedi R, Dai H, Wang Y, Song L (2017) Know-evolve: deep temporal reasoning for dynamic knowledge graphs. In: International Conference on Machine Learning, PMLR, pp 3462–3471

  28. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  29. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems 26

  30. Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 701–710

  31. Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 855–864

  32. Liu Z, Che W, Wang S, Xu J, Yin H (2023) A large-scale data security detection method based on continuous time graph embedding framework. J Cloud Comput 12(1):89

  33. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2018) Graph attention networks. In: International Conference on Learning Representations

  34. Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, Yu PS (2019) Heterogeneous graph attention network. In: The World Wide Web Conference, pp 2022–2032

  35. Zhang Y, Shi Z, Feng D, Zhan X-X (2019) Degree-biased random walk for large-scale network embedding. Futur Gener Comput Syst 100:198–209

  36. Jin M, Li Y-F, Pan S (2022) Neural temporal walks: Motif-aware representation learning on continuous-time dynamic graphs. Adv Neural Inf Process Syst 35:19874–19886

  37. Poursafaei F, Huang S, Pelrine K, Rabbany R (2022) Towards better evaluation for dynamic link prediction. Adv Neural Inf Process Syst 35:32928–32941

  38. Yu L, Sun L, Du B, Lv W (2023) Towards better dynamic graph learning: new architecture and unified library. Adv Neural Inf Process Syst 36:67686–67700

Acknowledgements

This research was supported by the Key Research and Development Program of Hunan Province (Grant No. 2023SK2038).

Author information

Contributions

These authors contributed equally to this work.

Corresponding author

Correspondence to Bin Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Notations

See Table 7

Table 7 Summary of important notations

B Theoretical analysis of the combined sampling probabilities

B.1 Theoretical background

In dynamic graph learning, both temporal and spatial information are crucial for understanding the relationships between nodes. Temporal random walks emphasize the relevance of time adjacency, while spatial random walks focus on the connectivity between nodes. To leverage both types of information, it is essential to consider their combined effects.

Temporal sampling probability \(P_t (a)\): The temporal sampling probability reflects the influence of nodes around a specific time t. The formula is given by:

$$\begin{aligned} P_t(a) = \frac{\exp(\alpha (t_a - t))}{\sum \limits _{a' \in \mathcal {G}_{u,t}} \exp(\alpha (t_{a'} - t))} \end{aligned}$$
(18)

The closer the timestamp \(t_a\) of node a is to the current time t, the higher its probability. This design aims to prioritize nodes that are temporally adjacent, thereby capturing temporal dynamics effectively.

Spatial sampling probability \(P_s (a)\): The spatial sampling probability considers the connectivity of nodes. The formula is:

$$\begin{aligned} P_s(a) = \frac{\exp(-\beta / d_a)}{\sum \limits _{a' \in \mathcal {G}_{u,t}} \exp(-\beta / d_{a'})} \end{aligned}$$
(19)

The higher the degree \(d_a\) of node a, the greater its probability. Nodes with higher degrees are often in more critical positions in the network, so they should be given higher priority during sampling.

B.2 Theoretical analysis: necessity of averaging

In many practical scenarios, temporal and spatial information are interconnected. In dynamic networks, the connectivity of a node (spatial) may influence its interaction time (temporal) and vice versa. Thus, assuming equal contributions from \(P_t (a)\) and \(P_s (a)\) effectively reflects the complex relationships present in real-world situations.

In certain situations, nodes that are close in time may not be connected spatially, and vice versa. Therefore, solely considering one factor might result in the loss of crucial information. By averaging, we can balance the effects of both, enhancing the model’s adaptability and generalization capability.

B.3 Mathematical derivation

To rigorously illustrate this theory, we can approach it from a probability theory perspective.

Assuming \(P_t (a)\) and \(P_s (a)\) are valid probability distributions, we have:

$$\begin{aligned} & \sum \limits _{a' \in \mathcal {G}_{u,t}} P_t (a') = 1 \end{aligned}$$
(20)
$$\begin{aligned} & \sum \limits _{a' \in \mathcal {G}_{u,t}} P_s (a') = 1 \end{aligned}$$
(21)

Defining \(P_{\text{combined}}(a)\) as:

$$\begin{aligned} P_{\text{combined}}(a) = \frac{P_t (a) + P_s (a)}{2} \end{aligned}$$
(22)

We then have:

$$\begin{aligned} \sum \limits _{a' \in \mathcal {G}_{u,t}} P_{\text{combined}}(a') = \frac{1+1}{2} = 1 \end{aligned}$$
(23)

This indicates that \(P_{\text{combined}}\) is also a valid probability distribution.

Combining the temporal and spatial information through averaging \(P_t (a)\) and \(P_s (a)\) is both reasonable and necessary. This approach not only considers the unique contributions of both factors but also enhances the model’s performance in dynamic graph learning. In summary, taking into account the influences of time and space under different conditions allows us to more effectively uncover the diversity and complexity within dynamic systems.
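
To make the construction concrete, the sketch below implements Eqs. (18), (19), and (22) in NumPy and checks the normalization property of Eq. (23). It is a minimal illustration under our own assumptions: the function names, example values, and the max-subtraction stabilization trick are ours, not taken from the paper's released code.

```python
import numpy as np

# Minimal sketch of Eqs. (18), (19), and (22); names and values are assumptions.

def temporal_probs(timestamps, t, alpha):
    """Eq. (18): softmax of alpha * (t_a - t); temporally closer nodes rank higher."""
    logits = alpha * (np.asarray(timestamps, dtype=float) - t)
    logits -= logits.max()                    # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def spatial_probs(degrees, beta):
    """Eq. (19): softmax of -beta / d_a; higher-degree nodes rank higher."""
    logits = -beta / np.asarray(degrees, dtype=float)
    logits -= logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

def combined_probs(timestamps, t, degrees, alpha=1.0, beta=1.0):
    """Eq. (22): average the two distributions; the result still sums to 1 (Eq. 23)."""
    return 0.5 * (temporal_probs(timestamps, t, alpha) + spatial_probs(degrees, beta))

# Three candidate neighbours of node u, observed before the current time t = 10.0
p = combined_probs(timestamps=[2.0, 6.0, 9.5], t=10.0, degrees=[1, 4, 2])
assert np.isclose(p.sum(), 1.0)               # P_combined is a valid distribution
next_node = np.random.choice(3, p=p)          # sample the next node in the walk
```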

C Time complexity analysis

In Algorithm 1, the time complexity of the outer loop is O(l), the middle loop is O(C), and the inner loop, which traverses neighbors, has a complexity of O(d), where d is the maximum degree of the nodes. Therefore, the overall time complexity is O(lCd).
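
The following hypothetical skeleton mirrors the loop structure described above; Algorithm 1 itself is not reproduced in this preview, so `neighbors` and `sample_next` are placeholder callables, and the actual implementation may organize its loops differently.

```python
# Hypothetical skeleton of the walk-sampling loops that yield the O(lCd) bound.

def sample_walks(u, t, walk_length, num_walks, neighbors, sample_next):
    """Grow C = num_walks walks of l = walk_length steps from the root (u, t)."""
    walks = [[(u, t)] for _ in range(num_walks)]
    for _ in range(walk_length):                  # outer loop: O(l)
        for walk in walks:                        # middle loop: O(C)
            v, tv = walk[-1]
            candidates = neighbors(v, tv)         # inner loop scans up to d neighbours: O(d)
            if candidates:
                walk.append(sample_next(candidates))
    return walks                                  # overall cost: O(l * C * d)
```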

D Experimental setting

D.1 Dataset source

Most of the original dynamic graph datasets used in this work come from Origin Datasets, which can be downloaded here. For convenience, you can also directly download the processed data package from processed_data. The datasets are described as follows:

  • UNTrade contains the food and agriculture trade between 181 nations over more than 30 years. The weight of each link indicates the total sum of normalized agriculture import or export values between two particular countries.

  • Wikipedia is a bipartite interaction graph that contains the edits on Wikipedia pages over a month. Nodes represent users and pages, and links denote the editing behaviors with timestamps. Each link is associated with a 172-dimensional Linguistic Inquiry and Word Count (LIWC) feature. This dataset additionally contains dynamic labels that indicate whether users are temporarily banned from editing.

  • Reddit is bipartite and records the posts of users under subreddits during one month. Users and subreddits are nodes, and links are the timestamped posting requests. Each link has a 172-dimensional LIWC feature. This dataset also includes dynamic labels representing whether users are banned from posting.

  • Enron records the email communications between employees of the ENRON energy corporation over three years.

  • UCI is an online communication network, where nodes are university students and links are messages posted by students.

  • Flights is a dynamic flight network that displays the development of air traffic during the COVID-19 pandemic. Airports are represented by nodes and the tracked flights are denoted as links. Each link is associated with a weight, indicating the number of flights between two airports in a day.

  • MOOC is a bipartite interaction network from an online course platform, where nodes are students and course content units (e.g., videos and problem sets). Each link denotes a student’s access behavior to a specific content unit and is assigned a 4-dimensional feature.

  • LastFM is bipartite and consists of the information about which songs were listened to by which users over one month. Users and songs are nodes, and links denote the listening behaviors of users.

D.2 Baselines

  • CTDNE [22] extends the static network embedding to dynamic graphs, where temporal random walks have been proposed with the skip-gram model to learn node representations.

  • DyRep [18] introduces a recurrent architecture to update node states during each interaction. It also includes a temporal attention aggregation module to consider the structural information evolving over time in dynamic graphs.

  • JODIE [20] uses two coupled recurrent neural networks to update the states of users and items. It introduces a projection operation to learn the future representation trajectories of each user/item.

  • TGAT [13] computes node representations by aggregating features from each node’s temporal-topological neighbors through a self-attention mechanism. It also features a time encoding function to capture temporal patterns.

  • TGN [14] maintains an evolving memory for each node, updating it when nodes are observed in interactions. This is achieved through message functions, a message aggregator, and a memory updater. An embedding module generates the temporal representation of nodes.

  • CAWN [15] extracts multiple causal anonymous walks for each node, exploring the causal relationships in the network dynamics and generating relative node identities. It then encodes each walk using a recurrent neural network and aggregates these walks to obtain the final node representation.

  • EdgeBank [37] is a purely memory-based method for transductive dynamic link prediction, with no trainable parameters. It stores observed interactions in memory cells and updates the memory through various strategies.

  • GraphMixer [17] integrates a fixed time encoding function into an MLP-Mixer-based link encoder to learn temporal link relationships.

  • NeurTWs [36] learns temporal node embeddings by combining contrastive learning and random walks with neighbor graphs. The focus is on optimizing node representations by contrasting positive and negative samples.

D.3 Implementation details

Our code is available at STAW, where we provide detailed instructions for dataset preparation and model training. The searched ranges of hyperparameters and the related methods are shown in Table 8.

Table 8 Searched ranges of hyperparameters and the related methods

E Time encoding

In our model, time is modeled using a series of cosine functions with different frequencies. We do not directly apply the traditional Fourier transform; instead, we encode the timestamps through a linear transformation to indirectly capture the frequency features in the time series. This process can be viewed as a “Fourier-like” encoding of the time series, with the goal of modeling the periodic characteristics of time using sine and cosine basis functions at different frequencies.

Specifically, \(\Delta t = t' - t\), and the trainable parameter matrix \(\omega\) represents different frequency scales. Each frequency scale corresponds to a specific time period; during the forward pass, the timestamps are transformed linearly and then passed through the cosine function (cos) to generate the corresponding encoding. This process simulates a Fourier-transform-like decomposition of the time series, but the frequency features are generated directly from pre-defined frequency scales, initialized as \(\omega _k = 1 / 10^{9k / \text{time\_dim}}\) for \(k = 0, 1, \ldots , \text{time\_dim} - 1\).

The main motivation for using Fourier transforms or similar approaches is to capture the periodic characteristics of the time series, especially when the data contains multiple frequency components. This method allows the model to extract and represent periodic patterns at different time scales, providing better generalization when dealing with long-term and complex sequential data.
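
A minimal PyTorch sketch of this encoder is given below, assuming the cosine formulation stated above; the class and variable names are ours, and the released STAW code may differ in detail.

```python
import torch
import torch.nn as nn

class TimeEncoder(nn.Module):
    """Cosine ("Fourier-like") time encoding of time gaps delta_t = t' - t."""

    def __init__(self, time_dim):
        super().__init__()
        self.w = nn.Linear(1, time_dim)
        # Initialize frequencies as 1 / 10^(9k / time_dim) for k = 0..time_dim-1
        freqs = 1.0 / 10 ** (9.0 * torch.arange(time_dim, dtype=torch.float32) / time_dim)
        self.w.weight = nn.Parameter(freqs.view(time_dim, 1))   # trainable omega
        self.w.bias = nn.Parameter(torch.zeros(time_dim))

    def forward(self, delta_t):
        # delta_t: [batch, seq_len] tensor of time gaps
        return torch.cos(self.w(delta_t.unsqueeze(-1)))          # [batch, seq_len, time_dim]

encoder = TimeEncoder(time_dim=100)
codes = encoder(torch.tensor([[0.0, 3.5, 12.0]]))                # encodings for three gaps
```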

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sheng, J., Zhang, Y. & Wang, B. Continuous-time dynamic graph learning based on spatio-temporal random walks. J Supercomput 81, 389 (2025). https://doi.org/10.1007/s11227-024-06881-5
