Abstract
The application of dynamic graph representation learning to dynamic systems, such as social networks and transportation networks, has grown rapidly in recent years, owing to its ability to efficiently integrate topological and temporal information into a compact representation. Continuous-time dynamic graphs (CTDGs) have received considerable attention because they retain precise temporal information. However, existing random-walk-based methods often use time-biased sampling to extract dynamic graph patterns while neglecting the topological properties of the graph. Moreover, previous anonymous walks do not share node identifiers and thus fail to fully exploit the correlations between network patterns, which play a crucial role in predicting future interactions. This study therefore focuses on CTDG methods and presents a novel continuous-time dynamic graph learning method based on spatio-temporal random walks, which makes three main contributions: (i) by considering both temporal constraints and topological structure, our method extracts diverse, expressive patterns from CTDGs; (ii) it introduces the hitting counts of nodes at a given position as each node's relative identity, fully exploiting the correlation of network patterns while keeping the pattern structure consistent after node identities are removed; (iii) an attention mechanism aggregates the walk encodings, distinguishing the importance of different walks and thereby delineating the relationships and structural attributes between nodes more precisely, which enhances the precision and expressive power of node representations. The proposed method outperforms the strongest baseline by an average of 2.72% and 2.46% across all transductive and inductive link prediction tasks, respectively, and achieves up to an 8.7% improvement on specific datasets. It also attains the second-best overall performance on dynamic node classification tasks.







Data availability
The authors confirm that the data supporting the findings of this study are available within the article.
References
Kazemi SM, Goel R, Jain K, Kobyzev I, Sethi A, Forsyth P, Poupart P (2020) Representation learning for dynamic graphs: a survey. J Mach Learn Res 21(70):1–73
Alvarez-Rodriguez U, Battiston F, Arruda GF, Moreno Y, Perc M, Latora V (2021) Evolutionary dynamics of higher-order interactions in social networks. Nat Hum Behav 5(5):586–595
Yu L, Liu Z, Sun L, Du B, Liu C, Lv W (2023) Continuous-time user preference modelling for temporal sets prediction. IEEE Trans Knowl Data Eng 36:1475–1488
Sun Y, Jiang X, Hu Y, Duan F, Guo K, Wang B, Gao J, Yin B (2022) Dual dynamic spatial-temporal graph convolution network for traffic prediction. IEEE Trans Intell Transp Syst 23(12):23680–23693
Simmel G (1950) The sociology of Georg Simmel, vol 92892. Simon and Schuster, New York
Granovetter MS (1973) The strength of weak ties. Am J Sociol 78(6):1360–1380
Pareja A, Domeniconi G, Chen J, Ma T, Suzumura T, Kanezashi H, Kaler T, Schardl T, Leiserson C (2020) EvolveGCN: evolving graph convolutional networks for dynamic graphs. Proc AAAI Conf Artif Intell 34:5363–5370
Zhao L, Song Y, Zhang C, Liu Y, Wang P, Lin T, Deng M, Li H (2019) T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Trans Intell Transp Syst 21(9):3848–3858
Seo Y, Defferrard M, Vandergheynst P, Bresson X (2018) Structured sequence modeling with graph convolutional recurrent networks. In: Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13-16, 2018, Proceedings, Part I 25. Springer, pp 362–373
Wang J, Zhu W, Song G, Wang L (2022) Streaming graph neural networks with generative replay. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp 1878–1888
Goyal P, Kamra N, He X, Liu Y (2018) DynGEM: deep embedding method for dynamic graphs. arXiv preprint arXiv:1805.11273
Sankar A, Wu Y, Gou L, Zhang W, Yang H (2020) DySAT: Deep neural representation learning on dynamic graphs via self-attention networks. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp 519–527
Xu D, Ruan C, Korpeoglu E, Kumar S, Achan K (2020) Inductive representation learning on temporal graphs. In: International Conference on Learning Representations
Rossi E, Chamberlain B, Frasca F, Eynard D, Monti F, Bronstein M (2020) Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637
Wang Y, Chang YY, Liu Y, Leskovec J, Li P (2021) Inductive representation learning in temporal networks via causal anonymous walks. In: International Conference on Learning Representations (ICLR)
Souza A, Mesquita D, Kaski S, Garg V (2022) Provably expressive temporal graph networks. Adv Neural Inf Process Syst 35:32257–32269
Cong W, Zhang S, Kang J, Yuan B, Wu H, Zhou X, Tong H, Mahdavi M (2023) Do we really need complicated model architectures for temporal networks? In: The Eleventh International Conference on Learning Representations
Trivedi R, Farajtabar M, Biswal P, Zha H (2019) DyRep: Learning representations over dynamic graphs. In: International Conference on Learning Representations
Wang L, Chang X, Li S, Chu Y, Li H, Zhang W, He X, Song L, Zhou J, Yang H (2021) TCL: transformer-based dynamic graph modelling via contrastive learning. arXiv preprint arXiv:2105.07944
Kumar S, Zhang X, Leskovec J (2019) Predicting dynamic embedding trajectory in temporal interaction networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp 1269–1278
Wang X, Lyu D, Li M, Xia Y, Yang Q, Wang X, Wang X, Cui P, Yang Y, Sun B, et al (2021) APAN: Asynchronous propagation attention network for real-time temporal graph embedding. In: Proceedings of the 2021 International Conference on Management of Data, pp 2628–2638
Nguyen GH, Lee JB, Rossi RA, Ahmed NK, Koh E, Kim S (2018) Dynamic network embeddings: from random walks to temporal random walks. In: 2018 IEEE International Conference on Big Data (Big Data), IEEE, pp 1085–1092
Zhang M, Xu B, Wang L (2023) Dynamic network link prediction based on random walking and time aggregation. Int J Mach Learn Cybern 14(8):2867–2875
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008
Feng Z, Wang R, Wang T, Song M, Wu S, He S (2024) A comprehensive survey of dynamic graph neural networks: Models, frameworks, benchmarks, experiments and challenges. arXiv preprint arXiv:2405.00476
Yang L, Chatelain C, Adam S (2024) Dynamic graph representation learning with neural networks: a survey. IEEE Access 12:43460–43484
Trivedi R, Dai H, Wang Y, Song L (2017) Know-evolve: deep temporal reasoning for dynamic knowledge graphs. In: International Conference on Machine Learning, PMLR, pp 3462–3471
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems 26
Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 701–710
Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 855–864
Liu Z, Che W, Wang S, Xu J, Yin H (2023) A large-scale data security detection method based on continuous time graph embedding framework. J Cloud Comput 12(1):89
Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2018) Graph attention networks. In: International Conference on Learning Representations
Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, Yu PS (2019) Heterogeneous graph attention network. In: The World Wide Web Conference, pp 2022–2032
Zhang Y, Shi Z, Feng D, Zhan X-X (2019) Degree-biased random walk for large-scale network embedding. Futur Gener Comput Syst 100:198–209
Jin M, Li Y-F, Pan S (2022) Neural temporal walks: Motif-aware representation learning on continuous-time dynamic graphs. Adv Neural Inf Process Syst 35:19874–19886
Poursafaei F, Huang S, Pelrine K, Rabbany R (2022) Towards better evaluation for dynamic link prediction. Adv Neural Inf Process Syst 35:32928–32941
Yu L, Sun L, Du B, Lv W (2023) Towards better dynamic graph learning: new architecture and unified library. Adv Neural Inf Process Syst 36:67686–67700
Acknowledgements
This research was supported by the Key Research and Development Program of Hunan Province (Grant No. 2023SK2038).
Author information
Authors and Affiliations
Contributions
These authors contributed equally to this work.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A Notations
See Table 7.
B Theoretical analysis of the combined sampling probabilities
1.1 B.1 Theoretical background
In dynamic graph learning, both temporal and spatial information are crucial for understanding the relationships between nodes. Temporal random walks emphasize the relevance of time adjacency, while spatial random walks focus on the connectivity between nodes. To leverage both types of information, it is essential to consider their combined effects.
Temporal sampling probability \(P_t(a)\): The temporal sampling probability reflects the influence of nodes around a specific time t. Writing \(t_a\) for the timestamp of candidate node a and normalizing over the candidate neighbor set \(\mathcal{N}\), a time-biased form consistent with this design is
$$P_t(a) = \frac{\exp(t_a - t)}{\sum_{a' \in \mathcal{N}} \exp(t_{a'} - t)}.$$
The closer the timestamp \(t_a\) of node a is to the current time t, the higher its probability. This design prioritizes temporally adjacent nodes, thereby capturing temporal dynamics effectively.
Spatial sampling probability \(P_s(a)\): The spatial sampling probability considers the connectivity of nodes. With \(d_a\) denoting the degree of candidate node a, the degree-proportional form is
$$P_s(a) = \frac{d_a}{\sum_{a' \in \mathcal{N}} d_{a'}}.$$
The higher the degree \(d_a\) of node a, the greater its probability. Nodes with higher degrees typically occupy more critical positions in the network and should therefore receive higher priority during sampling.
1.2 B.2 Theoretical analysis: necessity of averaging
In many practical scenarios, temporal and spatial information are interconnected: in dynamic networks, the connectivity of a node (spatial) may influence its interaction times (temporal) and vice versa. Thus, assuming equal contributions from \(P_t(a)\) and \(P_s(a)\) effectively reflects the complex relationships present in real-world situations.
In certain situations, nodes that are close in time may not be connected spatially, and vice versa. Therefore, solely considering one factor might result in the loss of crucial information. By averaging, we can balance the effects of both, enhancing the model’s adaptability and generalization capability.
1.3 B.3 Mathematical derivation
To illustrate this rigorously, we approach it from a probability-theory perspective.
Assuming \(P_t(a)\) and \(P_s(a)\) are valid probability distributions, we have
$$\sum_{a} P_t(a) = 1, \qquad \sum_{a} P_s(a) = 1.$$
Defining \(P_{\text{combined}}\) as the average
$$P_{\text{combined}}(a) = \frac{1}{2}\bigl(P_t(a) + P_s(a)\bigr),$$
we then have
$$\sum_{a} P_{\text{combined}}(a) = \frac{1}{2}\sum_{a} P_t(a) + \frac{1}{2}\sum_{a} P_s(a) = \frac{1}{2} + \frac{1}{2} = 1.$$
This indicates that \(P_{\text{combined}}\) is also a valid probability distribution.
Combining the temporal and spatial information through averaging \(P_t (a)\) and \(P_s (a)\) is both reasonable and necessary. This approach not only considers the unique contributions of both factors but also enhances the model’s performance in dynamic graph learning. In summary, taking into account the influences of time and space under different conditions allows us to more effectively uncover the diversity and complexity within dynamic systems.
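To make the combination concrete, the following minimal NumPy sketch (function and variable names are illustrative, not from the released implementation) computes both distributions over a candidate neighbor set and samples the next node of a walk from their average:

```python
import numpy as np

def combined_sampling_probs(timestamps, degrees, t_now):
    """Average of time-biased and degree-biased sampling probabilities.

    timestamps : interaction times t_a of the candidate neighbors
    degrees    : node degrees d_a of the same candidates
    t_now      : current time t of the walk
    """
    # Temporal term: more recent interactions get higher weight.
    # Subtracting the max before exp() keeps the softmax numerically stable.
    dt = timestamps - t_now
    w_t = np.exp(dt - dt.max())
    p_t = w_t / w_t.sum()

    # Spatial term: probability proportional to node degree.
    p_s = degrees / degrees.sum()

    # Equal-weight average of two valid distributions; the result
    # still sums to 1, as shown in Appendix B.3.
    return 0.5 * (p_t + p_s)

# Example: pick the next node of a spatio-temporal walk.
rng = np.random.default_rng(0)
ts = np.array([3.0, 7.5, 9.0])    # neighbor interaction times
deg = np.array([4.0, 1.0, 2.0])   # neighbor degrees
probs = combined_sampling_probs(ts, deg, t_now=10.0)
next_idx = rng.choice(len(probs), p=probs)
```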
C Time complexity analysis
In Algorithm 1, the time complexity of the outer loop is O(l), the middle loop is O(C), and the inner loop, which traverses neighbors, has a complexity of O(d), where d is the maximum degree of the nodes. Therefore, the overall time complexity is O(lCd).
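As a reference point, the sketch below illustrates the loop nesting that yields this bound; `neighbors` and `prob` are hypothetical stand-ins for the temporal-neighbor lookup and the combined sampling probability of Algorithm 1, which is not reproduced here.

```python
import random

def sample_walks(neighbors, start, l, C, prob):
    """Schematic of the O(lCd) loop structure analyzed above."""
    walks = [[start] for _ in range(C)]
    for _ in range(l):                  # outer loop: O(l) walk steps
        for walk in walks:              # middle loop: O(C) walks per node
            cand = neighbors(walk[-1])  # inner loop scans up to d neighbors
            if cand:
                weights = [prob(a) for a in cand]  # O(d) scoring work
                walk.append(random.choices(cand, weights=weights)[0])
    return walks
```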
D Experimental setting
1.1 D.1 Dataset source
Most of the original dynamic graph datasets come from Origin Datasets, which can be downloaded here. For convenience, you can also directly download the processed data package from processed_data. The descriptions of these datasets are as follows:
-
UNTrade contains the food and agriculture trade between 181 nations for more than 30 years. The weight of each link indicates the total sum of normalized agriculture import or export values between two particular countries.
-
Wikipedia is a bipartite interaction graph that contains the edits on Wikipedia pages over a month. Nodes represent users and pages, and links denote the editing behaviors with timestamps. Each link is associated with a 172-dimensional Linguistic Inquiry and Word Count (LIWC) feature. This dataset additionally contains dynamic labels that indicate whether users are temporarily banned from editing.
-
Reddit is bipartite and records the posts of users under subreddits during one month. Users and subreddits are nodes, and links are the timestamped posting requests. Each link has a 172-dimensional LIWC feature. This dataset also includes dynamic labels representing whether users are banned from posting.
-
Enron records the email communications between employees of the ENRON energy corporation over three years.
-
UCI is an online communication network, where nodes are university students and links are messages posted by students.
-
Flights is a dynamic flight network that displays the development of air traffic during the COVID-19 pandemic. Airports are represented by nodes and the tracked flights are denoted as links. Each link is associated with a weight, indicating the number of flights between two airports in a day.
-
MOOC is a bipartite interaction network of online courses, where nodes are students and course content units (e.g., videos and problem sets). Each link denotes a student's access to a specific content unit and is assigned a 4-dimensional feature.
-
LastFM is bipartite and consists of the information about which songs were listened to by which users over one month. Users and songs are nodes, and links denote the listening behaviors of users.
1.2 D.2 Baselines
-
CTDNE [22] extends static network embedding to dynamic graphs, combining temporal random walks with the skip-gram model to learn node representations.
-
DyRep [18] introduces a recurrent architecture to update node states during each interaction. It also includes a temporal attention aggregation module to consider the structural information evolving over time in dynamic graphs.
-
JODIE [20] uses two coupled recurrent neural networks to update the states of users and items. It introduces a projection operation to learn the future representation trajectories of each user/item.
-
TGAT [13] computes node representations by aggregating features from each node’s temporal-topological neighbors through a self-attention mechanism. It also features a time encoding function to capture temporal patterns.
-
TGN [14] maintains an evolving memory for each node, updating it when nodes are observed in interactions. This is achieved through message functions, a message aggregator, and a memory updater. An embedding module generates the temporal representation of nodes.
-
CAWN [15] extracts multiple causal anonymous walks for each node, exploring the causal relationships in the network dynamics and generating relative node identities. It then encodes each walk using a recurrent neural network and aggregates these walks to obtain the final node representation.
-
EdgeBank [37] is a purely memory-based method for transductive dynamic link prediction, with no trainable parameters. It stores observed interactions in memory cells and updates the memory through various strategies.
-
GraphMixer [17] integrates a fixed time encoding function into an MLP-Mixer-based link encoder to learn temporal link relationships.
-
NeurTWs [36] learns temporal node embeddings by combining contrastive learning and random walks with neighbor graphs. The focus is on optimizing node representations by contrasting positive and negative samples.
1.3 D.3 Implementation details
Our code is available at STAW, where we provide detailed instructions for dataset preparation and model training. The searched ranges of hyperparameters and the related methods are shown in Table 8.
E Time encoding
In our model, time is modeled using a series of cosine functions with different frequencies. We do not directly apply the traditional Fourier transform; instead, we encode the timestamps through a linear transformation to indirectly capture the frequency features in the time series. This process can be viewed as a “Fourier-like” encoding of the time series, with the goal of modeling the periodic characteristics of time using sine and cosine basis functions at different frequencies.
Specifically, \(\Delta t = t' - t\), and the trainable parameter matrix \(\omega\) represents different frequency scales. Each frequency scale corresponds to a specific time period; during the forward pass, the timestamps are transformed linearly and then passed through the cosine function to generate the corresponding encoding. This process simulates the decomposition of the time series similar to a Fourier transform, but here we generate the frequency features directly from pre-defined frequency scales, initialized as \(1 / 10^{9k / \text{time\_dim}}\) for \(k = 0, 1, \ldots, \text{time\_dim} - 1\).
The main motivation for using Fourier transforms or similar approaches is to capture the periodic characteristics of the time series, especially when the data contains multiple frequency components. This method allows the model to extract and represent periodic patterns at different time scales, providing better generalization when dealing with long-term and complex sequential data.
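For illustration, the following minimal PyTorch sketch implements the encoder as described (class and parameter names are our own; the released STAW code may organize this differently):

```python
import numpy as np
import torch
import torch.nn as nn

class TimeEncoder(nn.Module):
    """Fourier-like time encoding: a linear map of the time delta
    followed by cos, with frequencies initialized as
    1 / 10**(9k / time_dim). A minimal sketch, not the released code."""

    def __init__(self, time_dim: int):
        super().__init__()
        self.w = nn.Linear(1, time_dim)
        # Frequency scales 1 / 10^(9k / time_dim) for k = 0..time_dim-1,
        # spanning roughly nine orders of magnitude.
        freqs = 1.0 / 10 ** (9.0 * np.arange(time_dim) / time_dim)
        self.w.weight = nn.Parameter(
            torch.from_numpy(freqs).float().view(time_dim, 1))
        self.w.bias = nn.Parameter(torch.zeros(time_dim))

    def forward(self, delta_t: torch.Tensor) -> torch.Tensor:
        # delta_t: [batch, seq_len] time differences t' - t
        # returns: [batch, seq_len, time_dim] periodic features
        return torch.cos(self.w(delta_t.unsqueeze(-1)))

# Example: encode time gaps for a batch of two walks of length three.
enc = TimeEncoder(time_dim=100)
dt = torch.tensor([[0.0, 1.5, 4.0], [0.0, 2.0, 8.0]])
features = enc(dt)  # shape: [2, 3, 100]
```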
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sheng, J., Zhang, Y. & Wang, B. Continuous-time dynamic graph learning based on spatio-temporal random walks. J Supercomput 81, 389 (2025). https://doi.org/10.1007/s11227-024-06881-5