Abstract
Heterogeneous information network (HIN) embedding has recently been widely studied because of its many applications. In general, network embedding represents a network's nodes in a low-dimensional vector space. Most previous embedding techniques focus only on homogeneous networks, in which all nodes are treated as a single type. Heterogeneous network embedding is a challenging problem because of the complexity introduced by different node types and link types. Recent heterogeneous network embedding studies use meta-paths and meta-graphs to guide random walks over the network, and these approaches outperform state-of-the-art homogeneous embedding models on multiple heterogeneous network mining tasks. However, current meta-graph-based approaches are ineffective at capturing topic similarity between nodes, even though most common HINs (DBLP, Facebook, etc.) are text-rich and contain many text-based nodes, such as papers, comments and posts. In this paper, we propose a novel embedding approach called W-MetaGraph2Vec. W-MetaGraph2Vec uses a topic-driven, meta-graph-based random walk mechanism in a weighted HIN to guide the generation of a node's heterogeneous neighborhood. Extensive experiments on real-world datasets demonstrate that our proposed model not only improves accuracy on HIN mining tasks such as node similarity search, clustering and classification, but also captures the topic relevance between text-based nodes.
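The idea of a topic-driven, meta-graph-guided random walk over a weighted HIN can be illustrated with a small sketch. This is not the authors' W-MetaGraph2Vec implementation; everything in it is a hypothetical assumption chosen for illustration: the toy author (A), paper (P) and venue (V) network, the per-paper topic vectors (standing in for, e.g., LDA topic distributions), the meta-graph A-P-(V|A)-P-A, and the rule that weights a step into a paper by the cosine similarity between its topic vector and that of the most recently visited paper. As in DeepWalk and metapath2vec, the walks would then be fed to a skip-gram-style model; the sketch stops at producing the (center, context) pairs.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy weighted HIN: node -> type (A = author, P = paper, V = venue).
NODE_TYPE = {"a1": "A", "a2": "A", "a3": "A",
             "p1": "P", "p2": "P", "p3": "P", "v1": "V", "v2": "V"}
EDGES = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p3"),
         ("p1", "v1"), ("p2", "v1"), ("p3", "v2"), ("a1", "p3")]
# Hypothetical topic distributions for text-based (paper) nodes, e.g. from LDA.
TOPICS = {"p1": [0.8, 0.2], "p2": [0.7, 0.3], "p3": [0.1, 0.9]}

# Assumed meta-graph as a DAG over schema positions: position -> (type, successors).
# Encodes A -> P -> {V or A} -> P -> A; position 5 loops back to P like position 0.
META_GRAPH = {0: ("A", [1]), 1: ("P", [2, 3]), 2: ("V", [4]),
              3: ("A", [4]), 4: ("P", [5]), 5: ("A", [1])}

ADJ = defaultdict(list)
for u, v in EDGES:
    ADJ[u].append(v)
    ADJ[v].append(u)


def cosine(x, y):
    """Cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def metagraph_walk(start, length, rng):
    """Random walk whose node types follow META_GRAPH; paper steps are topic-weighted."""
    assert NODE_TYPE[start] == META_GRAPH[0][0]
    walk, pos, last_paper = [start], 0, None
    while len(walk) < length:
        candidates = []  # (neighbor, next schema position, transition weight)
        for nxt_pos in META_GRAPH[pos][1]:
            wanted_type = META_GRAPH[nxt_pos][0]
            for nb in ADJ[walk[-1]]:
                if NODE_TYPE[nb] != wanted_type:
                    continue
                # Assumed weighting: topic similarity for papers, uniform otherwise.
                if nb in TOPICS and last_paper is not None:
                    w = cosine(TOPICS[nb], TOPICS[last_paper])
                else:
                    w = 1.0
                candidates.append((nb, nxt_pos, w))
        weights = [w for _, _, w in candidates]
        if not candidates or sum(weights) <= 0:
            break  # dead end under the meta-graph: stop the walk early
        node, pos, _ = rng.choices(candidates, weights=weights)[0]
        walk.append(node)
        if node in TOPICS:
            last_paper = node
    return walk


def context_pairs(walk, window):
    """(center, context) pairs for a skip-gram-style model, as in DeepWalk."""
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                yield center, walk[j]


if __name__ == "__main__":
    rng = random.Random(0)
    for author in ("a1", "a2", "a3"):
        print(" -> ".join(metagraph_walk(author, 9, rng)))
```

Because steps into venues and authors are weighted uniformly, topic information in this sketch only influences which paper the walk enters next. The same rule also favours returning to the most recently visited paper (cosine similarity 1), which a fuller implementation might want to discount.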

Notes
DBLP dataset: http://dblp.uni-trier.de/.
AMiner dataset: https://aminer.org/.
ACM CCS-2012: https://www.acm.org/publications/class-2012.
Google Scholar Metrics (Top Venues): https://scholar.google.com/citations?view_op=top_venues&hl=en.
MovieLens100K: https://grouplens.org/datasets/movielens/.
IMDB website: https://www.imdb.com/.
TMDB website: https://www.themoviedb.org/.
Acknowledgements
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number B2017-26-02.
Cite this article
Pham, P., Do, P. W-Metagraph2Vec: a novel approval of enriched schematic topic-driven heterogeneous information network embedding. Int. J. Mach. Learn. & Cyber. 11, 1855–1874 (2020). https://doi.org/10.1007/s13042-020-01076-9
DOI: https://doi.org/10.1007/s13042-020-01076-9