Abstract
With the rapid growth of research on heterogeneous networks, searching for relevant objects of different types has become a research focus. For example, in a movie network one may want to find the actors who have worked most frequently with the director Steven Spielberg. Because traditional random walk models are costly in both time and memory, this paper presents RSSim, a random path sampling measure for discovering relevant objects in heterogeneous networks that allows a tradeoff between efficiency and estimation accuracy. The key idea is to use Monte Carlo simulation to compute an \(\varepsilon \)-approximation of our relevance measure, which is defined on a meta path, an important concept for capturing the semantic meaning of a search. Because Monte Carlo simulation is lightweight and fast, the algorithm is applicable to large-scale networks. Moreover, we give theoretical proofs of the error bound and confidence level of the estimation. Experiments show that RSSim is 100 times faster than several alternative methods and, with a small sample size, closely approximates the ranking accuracy of the baseline.
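The abstract does not give the exact definition of RSSim or its error bound, so the following is only a minimal sketch of the general idea it describes: estimating a meta-path-based relevance score by sampling random paths rather than computing it exactly. It assumes the score is the probability that a random walk constrained to the meta path ends at a given target, and it uses a standard Hoeffding bound (not necessarily the bound proved in the paper) to choose the sample size. The names `sample_size`, `estimate_relevance`, the graph layout, and the toy movie data are all hypothetical.

```python
import math
import random
from collections import Counter


def sample_size(epsilon, delta):
    """Samples needed so that one estimated probability is within epsilon
    of its true value with probability at least 1 - delta.
    Assumption: a standard Hoeffding bound, not the paper's own bound."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_relevance(graph, source, meta_path, num_samples, rng):
    """Monte Carlo estimate of where a meta-path-constrained walk ends.

    graph: dict mapping (node, edge_type) -> list of neighbour nodes.
    meta_path: sequence of edge types to follow, in order.
    Returns a dict target -> fraction of sampled walks that ended there.
    """
    hits = Counter()
    for _ in range(num_samples):
        node = source
        for edge_type in meta_path:
            neighbours = graph.get((node, edge_type), [])
            if not neighbours:
                node = None  # dead end: this sample reaches no target
                break
            node = rng.choice(neighbours)  # uniform step along the edge type
        if node is not None:
            hits[node] += 1
    return {target: count / num_samples for target, count in hits.items()}


# Hypothetical toy movie network: director -> movies -> actors.
toy_graph = {
    ("Spielberg", "directs"): ["movie1", "movie2", "movie3"],
    ("movie1", "stars"): ["Hanks"],
    ("movie2", "stars"): ["Hanks", "Damon"],
    ("movie3", "stars"): ["Ford"],
}

n = sample_size(epsilon=0.05, delta=0.05)
scores = estimate_relevance(
    toy_graph, "Spielberg", ["directs", "stars"], n, random.Random(0)
)
# Rank actors by estimated relevance to the director.
for actor, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{actor}: {score:.3f}")
```

In this toy graph the exact walk probabilities are 1/2 for Hanks, 1/3 for Ford, and 1/6 for Damon, so the sampled estimates can be checked by hand. The cost of the sketch depends on the number of samples and the meta path length rather than on the size of the network, which illustrates the efficiency and accuracy tradeoff the abstract mentions.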
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Gu, Q., Zhang, C., Sun, T., Ji, Y., Hu, Z., Qiu, X. (2016). Path Sampling Based Relevance Search in Heterogeneous Networks. In: Wang, Y., Yu, G., Zhang, Y., Han, Z., Wang, G. (eds) Big Data Computing and Communications. BigCom 2016. Lecture Notes in Computer Science, vol 9784. Springer, Cham. https://doi.org/10.1007/978-3-319-42553-5_39
DOI: https://doi.org/10.1007/978-3-319-42553-5_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42552-8
Online ISBN: 978-3-319-42553-5
eBook Packages: Computer Science, Computer Science (R0)