
W-Metagraph2Vec: a novel approval of enriched schematic topic-driven heterogeneous information network embedding

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Recently, heterogeneous information network (HIN) embedding has been widely studied because of its many applications. In general, network embedding represents a network's nodes in a low-dimensional space. Most previous embedding techniques concentrate only on homogeneous networks, in which all nodes are treated as a single type. Heterogeneous network embedding is a challenging problem because of the complexity introduced by multiple node types and link types. Recent heterogeneous network embedding studies use meta-paths and meta-graphs to guide random walks over the network. These approaches outperform state-of-the-art homogeneous embedding models in multiple heterogeneous network mining tasks. However, recent meta-graph-based approaches are ineffective at capturing topic similarity between nodes. Most common HINs (DBLP, Facebook, etc.) are text-rich and contain many text-based nodes, such as papers, comments, and posts. In this paper, we propose a novel embedding approach, namely W-MetaGraph2Vec. W-MetaGraph2Vec uses a topic-driven meta-graph-based random walk mechanism in a weighted HIN to guide the generation of the heterogeneous neighborhood of a node. Extensive experiments on real-world datasets demonstrate that the proposed model not only improves the accuracy of HIN mining tasks such as node similarity search, clustering, and classification, but also addresses the problem of topic relevance between text-based nodes.
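To make the idea of a weighted, schema-guided random walk concrete, the following is a minimal Python sketch. It is not the authors' algorithm, whose details are not given in the abstract. It assumes that each edge already carries a positive weight standing in for a topic-similarity score, and it simplifies a meta-graph to a repeating list of layers, each layer being the set of node types allowed at that step. All names (`build_adjacency`, `weighted_schema_walk`, `SCHEMA`, the toy nodes and weights) are hypothetical.

```python
import random


def build_adjacency(weighted_edges):
    """Turn (u, v, weight) triples into an undirected adjacency map."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def weighted_schema_walk(adj, node_type, start, schema, walk_length, rng):
    """Walk the graph so that the i-th node's type lies in the i-th schema layer.

    schema is a list of sets of node types whose first and last layers are
    equal, so the pattern can repeat. Among the neighbors whose type is
    allowed, the next node is drawn with probability proportional to the
    edge weight (assumed here to be a topic-similarity score).
    """
    if schema[0] != schema[-1]:
        raise ValueError("schema must start and end with the same layer")
    if node_type[start] not in schema[0]:
        raise ValueError("start node type is not allowed by the first layer")
    period = len(schema) - 1
    walk = [start]
    while len(walk) < walk_length:
        allowed = schema[len(walk) % period]
        candidates = [
            (nbr, w)
            for nbr, w in adj.get(walk[-1], [])
            if node_type[nbr] in allowed and w > 0
        ]
        if not candidates:
            break  # dead end: no neighbor matches the schema
        nodes, weights = zip(*candidates)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Toy bibliographic network: authors (A), papers (P), one venue (V).
NODE_TYPE = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
ADJ = build_adjacency([
    ("a1", "p1", 0.9), ("a1", "p2", 0.1),
    ("a2", "p1", 0.4), ("a2", "p2", 0.8),
    ("p1", "v1", 0.7), ("p2", "v1", 0.2),
])
# Simplified meta-graph: A - P - (A or V) - P - A, repeated.
SCHEMA = [{"A"}, {"P"}, {"A", "V"}, {"P"}, {"A"}]

walk = weighted_schema_walk(ADJ, NODE_TYPE, "a1", SCHEMA, 9, random.Random(0))
print(walk)
```

Walks generated this way could then be fed to a skip-gram model, as in other random-walk embedding methods; that step is omitted here.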

Notes

  1. DBLP dataset: http://dblp.uni-trier.de/.

  2. AMiner dataset: https://aminer.org/.

  3. ACM CCS-2012: https://www.acm.org/publications/class-2012.

  4. Google Scholar Metric (Top Venues): https://scholar.google.com/citations?view_op=top_venues&hl=en.

  5. MovieLens100K: https://grouplens.org/datasets/movielens/.

  6. IMDB website: https://www.imdb.com/.

  7. TMDB website: https://www.themoviedb.org/.

Acknowledgements

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under the Grant Number B2017-26-02.

Author information

Corresponding author

Correspondence to Phu Pham.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pham, P., Do, P. W-Metagraph2Vec: a novel approval of enriched schematic topic-driven heterogeneous information network embedding. Int. J. Mach. Learn. & Cyber. 11, 1855–1874 (2020). https://doi.org/10.1007/s13042-020-01076-9
