Word embedding-based relation modeling in a heterogeneous information network

Multimedia Tools and Applications

Abstract

Heterogeneous information networks are widely used in big data applications. These networks consist of information objects and relations of multiple types, and how a network appears depends on the perspective used to model it. Modeling the relations between information objects has therefore attracted recent attention. Although many related methods have been proposed, they have limitations: they are hard to apply to unstructured data, they require continuous learning, and their results are often sparse. In this paper, we propose a new method, based on a word-embedding technique, that deduces various relations between information objects. We create viewpoint data that reflect a chosen perspective on the information objects and then carry out word embedding on these data. With the proposed method, the system quantifies the relations between the information objects in a heterogeneous information network. Experiments on real-world data demonstrate the effectiveness of our methodology.
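The abstract does not say how viewpoint data are built or which embedding model is trained, so the following is only a minimal sketch of the general idea under stated assumptions. The toy records, the `type:name` object labels, and the helper names are all hypothetical. Count-based co-occurrence vectors stand in for a learned word embedding; this is a swapped-in simplification, not the authors' model. Each record lists the objects seen together under one perspective, and a relation between two objects is quantified as the cosine similarity of their vectors.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical "viewpoint data": each record lists the objects that appear
# together under one chosen perspective on a toy bibliographic network.
# These records are invented for illustration and are not from the paper.
viewpoint_sentences = [
    ["author:A", "paper:P1", "venue:KDD"],
    ["author:A", "paper:P2", "venue:KDD"],
    ["author:B", "paper:P3", "venue:KDD"],
    ["author:C", "paper:P4", "venue:ACL"],
]


def cooccurrence_vectors(sentences):
    """Map each object to a sparse vector of co-occurrence counts.

    Assumption: this count-based representation replaces the learned
    word embedding mentioned in the abstract.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Count every unordered pair of distinct objects in the record.
        for left, right in combinations(sorted(set(sentence)), 2):
            vectors[left][right] += 1
            vectors[right][left] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = sqrt(sum(value * value for value in u.values()))
    norm_v = sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(viewpoint_sentences)
sim_ab = cosine(vectors["author:A"], vectors["author:B"])  # share a venue
sim_ac = cosine(vectors["author:A"], vectors["author:C"])  # share nothing
print(f"A~B: {sim_ab:.3f}  A~C: {sim_ac:.3f}")
```

In this toy setting, authors A and B score higher than A and C because they share a venue context. Changing how the records are written (for example, grouping by paper rather than by author) would change the resulting scores, which is one way to read the abstract's point that the relations depend on the modeling perspective.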

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2015R1A2A2A01005304).

Author information

Corresponding author

Correspondence to Sangyong Han.

About this article

Cite this article

Seo, J., Choi, S., Kim, Y.A. et al. Word embedding-based relation modeling in a heterogeneous information network. Multimed Tools Appl 77, 18529–18543 (2018). https://doi.org/10.1007/s11042-017-5008-z

  • DOI: https://doi.org/10.1007/s11042-017-5008-z
