Abstract
Heterogeneous information networks are widely used in big data applications. These networks consist of multiple types of information objects and relations, and the appearance of a network can change depending on the perspective used to model it. Modeling relations between information objects has attracted recent attention. Although many related approaches have been proposed, they have limitations: they are hard to apply to unstructured data, they require continuous learning, and their results are often sparse. In this paper, we propose a new method based on a word-embedding technique that deduces various relations between information objects. We create viewpoint data that reflect any given perspective on the information objects, and word embedding is carried out using these data. Using the proposed method, the system quantifies the relations between the information objects in a heterogeneous information network. Experiments on real-world data demonstrate the effectiveness of our methodology.
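As a rough illustration of the idea of quantifying relations from viewpoint data, the sketch below treats each viewpoint record as a "sentence" of information objects and compares objects by the contexts they share. It is not the authors' implementation: the toy records, the `author:`/`venue:`/`term:` naming scheme, and the function names are all hypothetical, and plain co-occurrence counts with cosine similarity stand in for a trained word-embedding model such as the one in Mikolov et al. (2013).

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical viewpoint data: each record lists the objects that appear
# together under one perspective (here, one record per paper).
viewpoint_sentences = [
    ["author:kim", "venue:kdd", "term:embedding"],
    ["author:kim", "venue:kdd", "term:network"],
    ["author:lee", "venue:kdd", "term:embedding"],
    ["author:park", "venue:vldb", "term:database"],
]


def cooccurrence_vectors(sentences):
    """Map each object to a sparse vector of co-occurrence counts.

    This is a count-based stand-in for a learned embedding (assumption).
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for target in sentence:
            for context in sentence:
                if context != target:
                    vectors[target][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(viewpoint_sentences)

# Relation strength between pairs of author objects.
kim_lee = cosine(vectors["author:kim"], vectors["author:lee"])
kim_park = cosine(vectors["author:kim"], vectors["author:park"])
print(f"kim-lee: {kim_lee:.3f}, kim-park: {kim_park:.3f}")
```

In this toy data the two authors who share a venue and a term receive a higher score than the pair with no shared context. Changing how the records are built (for example, one record per venue instead of one per paper) would change the scores, which is the sense in which the result depends on the chosen viewpoint.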




Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2015R1A2A2A01005304).
Cite this article
Seo, J., Choi, S., Kim, Y.A. et al. Word embedding-based relation modeling in a heterogeneous information network. Multimed Tools Appl 77, 18529–18543 (2018). https://doi.org/10.1007/s11042-017-5008-z
DOI: https://doi.org/10.1007/s11042-017-5008-z