
Network embedding based on high-degree penalty and adaptive negative sampling

Published in Data Mining and Knowledge Discovery

Abstract

Network embedding can effectively uncover potentially useful information and reveal the relationships and rules hidden in data, and it has therefore attracted increasing attention in many real-world applications. The goal of network embedding is to map high-dimensional, sparse networks into low-dimensional, dense vector representations. In this paper, we propose a network embedding method based on high-degree penalty and adaptive negative sampling (NEPS). First, we analyze the problem of imbalanced node training in random walks and propose an indicator based on a high-degree penalty, which controls the random walk and avoids over-sampling high-degree neighbor nodes. Then, we propose a two-stage adaptive negative sampling strategy, which dynamically selects negative samples suited to the current training stage to improve the training effect. Comparisons with seven well-known network embedding algorithms on eight real-world data sets show that NEPS performs well in node classification, network reconstruction and link prediction. The code is available at: https://github.com/Andrewsama/NEPS-master.
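To make the two ideas in the abstract concrete, the following minimal Python sketch illustrates, under our own assumptions rather than the paper's exact formulation, (1) a random walk whose transition probabilities penalize high-degree neighbors, and (2) a two-stage negative sampler that switches from uniform to degree-biased sampling as training progresses. The function names, the penalty exponent alpha, and the switch_epoch threshold are illustrative choices, not NEPS's actual design (see the paper and repository for that).

```python
# Illustrative sketch only: assumed forms of a degree-penalized random walk and a
# two-stage negative sampler, not the authors' implementation of NEPS.
import random
import networkx as nx


def degree_penalized_walk(G, start, length, alpha=1.0):
    """Random walk that down-weights high-degree neighbors by deg(v)^(-alpha)."""
    walk = [start]
    for _ in range(length - 1):
        current = walk[-1]
        neighbors = list(G.neighbors(current))
        if not neighbors:
            break
        # Penalize high-degree neighbors so they are not over-sampled in walks.
        weights = [G.degree(v) ** (-alpha) for v in neighbors]
        walk.append(random.choices(neighbors, weights=weights, k=1)[0])
    return walk


def two_stage_negative_sampler(G, epoch, switch_epoch=5, power=0.75):
    """Return a function that draws one negative node.

    Stage 1 (early epochs): uniform sampling, cheap and exploratory.
    Stage 2 (later epochs): degree^power sampling, i.e. word2vec-style negatives
    once the embeddings have started to take shape.
    """
    nodes = list(G.nodes())
    if epoch < switch_epoch:
        return lambda: random.choice(nodes)
    weights = [G.degree(v) ** power for v in nodes]
    return lambda: random.choices(nodes, weights=weights, k=1)[0]


if __name__ == "__main__":
    G = nx.karate_club_graph()
    print(degree_penalized_walk(G, start=0, length=10, alpha=1.0))
    sample_negative = two_stage_negative_sampler(G, epoch=8)
    print([sample_negative() for _ in range(5)])
```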



Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61773448 and 62176236).

Author information

Corresponding author

Correspondence to Xu-Hua Yang.

Additional information

Responsible editor: Evangelos Papalexakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, GF., Yang, XH., Ye, W. et al. Network embedding based on high-degree penalty and adaptive negative sampling. Data Min Knowl Disc 38, 597–622 (2024). https://doi.org/10.1007/s10618-023-00973-1


