Skip to main content
Log in

ShortWalk: an approach to network embedding on directed graphs

  • Original Article
  • Published:
Social Network Analysis and Mining Aims and scope Submit manuscript

Abstract

In network embedding algorithms, long random walks are often used to convert the graph into ‘text’ so that node embeddings can be learned by skip-gram with negative sampling model. However, in a directed graph, long random walks can be trapped or interrupted, leading to low-quality embeddings. This paper proposes a new algorithm, called ShortWalk, to improve the directed graph network embeddings. ShortWalk performs short random walks that restart more frequently thus produces shorter traces. It also gives nodes equal weights by generating the training pairs using pair-wise combination of nodes on the traces. We validate our method on eight directed graphs with different sizes and structures. Experimental results confirm that ShortWalk outperforms DeepWalk consistently on all datasets in node classification and link prediction tasks.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

Notes

  1. http://zhang18f.myweb.cs.uwindsor.ca/shortwalk.

References

  • Abu-El-Haija S, Perozzi B, Al-Rfou R (2017) Learning edge representations via low-rank asymmetric projections. In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp 1787–1796

  • Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in neural information processing systems, pp 585–591

  • Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3(2003):1137–1155

    MATH  Google Scholar 

  • Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117. https://doi.org/10.1016/S0169-7552(98)00110-X

    Article  Google Scholar 

  • Cai H, Zheng VW, Chang KC-C (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Trans Knowl Data Eng 30(9):1616–1637

    Article  Google Scholar 

  • Cao S, Lu W, Xu Q (2015) Grarep: learning graph representations with global structural information. In: Proceedings of the 24th ACM international on conference on information and knowledge management, pp 891–900

  • Chen M, Yang Q, Tang X et al (2007) Directed graph embedding. In: IJCAI, pp 2707–2712

  • Dong Y, Chawla NV, Swami A (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 135–144

  • Goyal P, Ferrara E (2018) Graph embedding techniques, applications, and performance: a survey. Knowl-Based Syst 151:78–94. https://doi.org/10.1016/j.knosys.2018.03.022

    Article  Google Scholar 

  • Grolmusz V (2015) A note on the pagerank of undirected graphs. Inf Process Lett 115(6–8):633–634. https://doi.org/10.1016/j.ipl.2015.02.015

    Article  MathSciNet  MATH  Google Scholar 

  • Grover A, Leskovec J (2016a) Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 855–864

  • Grover A, Leskovec J (2016b) node2vec: scalable feature learning for networks. In: SIGKDD. ACM, pp 855–864

  • Gutmann M, Hyvärinen A (2010) Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 297–304

  • Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36. https://doi.org/10.1148/radiology.143.1.7063747

    Article  Google Scholar 

  • Jacomy M, Venturini T, Heymann S, Bastian M (2014) ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE 9(6):e98679. https://doi.org/10.1371/journal.pone.0098679

    Article  Google Scholar 

  • Khosla M, Leonhardt J, Nejdl W, Anand A (2019) Node representation learning for directed graphs. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 395–411

  • Kim J, Park H, Lee J-E, Kang U (2018) Side: representation learning in signed directed networks. In: Proceedings of the 2018 World Wide Web Conference, pp 509–518

  • Kunegis J (2013) KONECT: the Koblenz Network Collection. In: Proceedings of the 22nd international conference on World Wide Web (WWW ’13 Companion). ACM, Rio de Janeiro, Brazil, pp 1343–1350. https://doi.org/10.1145/2487788.2488173

  • Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection, June 2014

  • Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031. https://doi.org/10.1002/asi.20591

    Article  Google Scholar 

  • Liu L, Cheung WK, Li X, Liao L (2016) Aligning users across social networks using network embedding. In: IJCAI, pp 1774–1780

  • Lund K, Burgess C (1996) Producing high-dimensional semantic spaces from lexical co-occurrence. Behav Res Methods Instrum Comput 28(2):203–208. https://doi.org/10.3758/BF03204766

    Article  Google Scholar 

  • Man T, Shen H, Liu S, Jin X, Cheng X (2016) Predict anchor links across social networks via an embedding approach. IJCAI 16:1823–1829

    Google Scholar 

  • Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

  • Ou M, Cui P, Pei J, Zhang Z, Zhu W (2016) Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge Discovery and Data Mining—KDD ’16. ACM Press, San Francisco, California, USA, pp 1105–1114. https://doi.org/10.1145/2939672.2939751

  • Palla G, Farkas IJ, Pollner P, Derényi I, Vicsek T (2007) Directed network modules. New J Phys 9(6):186–186. https://doi.org/10.1088/1367-2630/9/6/186

    Article  Google Scholar 

  • Pennington J, Socher R, Manning C (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543

  • Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 701–710

  • Perozzi B, Kulkarni V, Skiena S (2016) Walklets: multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115

  • Řehuřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks. ELRA, Valletta, Malta, pp 45–50

  • Sen T, Chaudhary DK (2017) Contrastive study of simple PageRank, HITS and weighted PageRank algorithms: review. In: 2017 7th international conference on cloud computing, data science engineering—Confluence, pp 721–727. https://doi.org/10.1109/CONFLUENCE.2017.7943245

  • Sun J, Bandyopadhyay B, Bashizade A, Liang J, Sadayappan P, Parthasarathy S (2018) ATP: directed graph embedding with asymmetric transitivity preservation. CoRR arXiv:1811.00839

  • Tang J, Meng Q, Wang M, Zhang M, Yan J, Mei Qaozhu (2015a) LINE: large-scale information network embedding. In: WWW. ACM, pp 1067–1077

  • Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015b) Line: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web. ACM, pp 1067–1077

  • Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323

    Article  Google Scholar 

  • Tsitsulin A, Mottin D, Karras P, Müller E (2018) Verse: versatile graph embeddings from similarity measures. In: Proceedings of the 2018 World Wide Web Conference, pp 539–548

  • Van Der Maaten L (2014) Accelerating T-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245

    MathSciNet  MATH  Google Scholar 

  • Wang Y, Yao Y, Tong H, Feng X, Jian L (2020) A brief review of network embedding. Big Data Min Anal 2(1):35–47

    Article  Google Scholar 

  • Yang Z, Tang J, Cohen W (2015) Multi-modal Bayesian embeddings for learning social knowledge graphs. arXiv preprint arXiv:1508.00715

  • Zhang D, Yin J, Zhu X, Zhang C (2017) Network representation learning: a survey. arXiv:1801.05852 [cs, stat]

  • Zhang Y, Lu J, Shai O (2018) Improve network embeddings with regularization. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp 1643–1646

  • Zhao F, Zhang Y, Jianguo L, Shai O (2019) Measuring academic influence using heterogeneous author-citation networks. Scientometrics 118(3):1119–1140. https://doi.org/10.1007/s11192-019-03010-5

    Article  Google Scholar 

  • Zhou C, Liu Y, Liu X, Liu Z, Gao J (2017a) Scalable graph embedding for asymmetric proximity. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp 2942–2948

  • Zhou C, Liu Y, Liu X, Liu Z, Gao J (2017b) Scalable graph embedding for asymmetric proximity. In: Thirty-first AAAI conference on artificial intelligence

Download references

Acknowledgements

This research is supported by NSERC Discovery Grant RGPIN-2019-05350.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jianguo Lu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhao, F., Zhang, Y. & Lu, J. ShortWalk: an approach to network embedding on directed graphs. Soc. Netw. Anal. Min. 11, 15 (2021). https://doi.org/10.1007/s13278-020-00714-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13278-020-00714-y

Keywords

Navigation