Abstract
Measuring the similarity of objects in an information network is a fundamental problem that has attracted much study because of its wide range of applications, such as recommendation and information retrieval. With the advent of large-scale heterogeneous information networks (HINs), which consist of multiple types of relationships, it is important to study similarity measures in such networks. However, most existing similarity measures are defined for homogeneous networks and cannot be applied directly to HINs, because the different semantic meanings behind edges must be taken into account. This paper proposes GSimRank, an extension of the well-known SimRank measure, to compute similarity on HINs. Rather than summing over all meeting paths for two nodes as SimRank does, GSimRank selects linked nodes of the same semantic category as the next step in the pairwise random walk, which ensures that the two meeting paths share the same semantics. Further, to weight the semantic edges, we propose a domain-independent edge weight evaluation method based on entropy theory. Finally, we prove that GSimRank is still based on the expected meeting distance model and report experiments on two real-world datasets that demonstrate the performance of GSimRank.
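To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes classic SimRank by iterating over all pairs of in-neighbors, and optionally restricts the pairs to in-edges that carry the same type label, which is one simple reading of "linked nodes of the same semantic category". The function name `simrank`, the `same_type_only` flag, the normalization by the number of type-matching pairs, the decay value `c = 0.8`, and the toy graph are all assumptions made for illustration. The entropy-based edge weighting is not attempted here.

```python
from itertools import product


def simrank(in_edges, nodes, c=0.8, iterations=10, same_type_only=False):
    """Iterative SimRank on a small graph.

    in_edges maps a node to a list of (in_neighbor, edge_type) tuples.
    With same_type_only=False this is the classic SimRank recurrence.
    With same_type_only=True only in-neighbor pairs reached through
    edges of the same type are averaged (an illustrative assumption,
    not the formula from the paper).
    """
    # Every node is fully similar to itself and, initially, to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            # Candidate previous steps of the pairwise backward walk.
            pairs = [
                (u, v)
                for (u, tu), (v, tv) in product(in_edges.get(a, []), in_edges.get(b, []))
                if not same_type_only or tu == tv
            ]
            if not pairs:
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(u, v)] for u, v in pairs)
            new_sim[(a, b)] = c * total / len(pairs)
        sim = new_sim
    return sim


# Hypothetical toy HIN: two papers share an author but appear at different venues.
nodes = ["a1", "v1", "v2", "p1", "p2"]
in_edges = {
    "p1": [("a1", "writes"), ("v1", "publishes")],
    "p2": [("a1", "writes"), ("v2", "publishes")],
}

plain = simrank(in_edges, nodes)
typed = simrank(in_edges, nodes, same_type_only=True)
print(round(plain[("p1", "p2")], 3))  # 0.2
print(round(typed[("p1", "p2")], 3))  # 0.4
```

In this toy graph the plain variant averages over four in-neighbor pairs, two of which mix an author with a venue, while the type-restricted variant averages only over the author-author and venue-venue pairs, so the shared author counts for more.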
Acknowledgement
This research is supported by the Shandong Provincial Key Research and Development Program (no. 2019JZZY010105) and the NSF of Shandong, China (no. ZR2017MF065).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, C., Hong, X., Peng, Z. (2020). GSimRank: A General Similarity Measure on Heterogeneous Information Network. In: Wang, X., Zhang, R., Lee, YK., Sun, L., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2020. Lecture Notes in Computer Science, vol 12317. Springer, Cham. https://doi.org/10.1007/978-3-030-60259-8_43
DOI: https://doi.org/10.1007/978-3-030-60259-8_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60258-1
Online ISBN: 978-3-030-60259-8
eBook Packages: Computer Science, Computer Science (R0)