Topical network embedding

Data Mining and Knowledge Discovery

Abstract

Networked data carry information from multiple channels, including topology, node content, and node labels, where structure and content are often correlated but not always consistent. A typical example is citation in scholarly publications: a paper is cited by others not because they have the same content, but because they share one or more subject matters. Although many network embedding methods take node content into account, they treat it as a flat set of words or attributes and assume that linked nodes depend on each other with respect to all words or attributes. In this paper, we argue that modeling topic-level semantic interactions between nodes is crucial for learning discriminative node embedding vectors. To model pairwise topic relevance between linked text nodes, we propose topical network embedding, in which interactions between nodes are built on their shared latent topics. Accordingly, we propose a unified optimization framework that simultaneously learns topic representations from the network text content and node representations from the network structure. The structure modeling takes the learned topic representations as conditional context, under the principle that two nodes can infer each other contingent on their shared latent topics. Experiments on three real-world datasets demonstrate that our approach learns significantly better network representations, e.g., a 4.1% improvement in Micro-F1 over state-of-the-art methods on the Cora dataset. (The source code of the proposed method is available on GitHub: https://github.com/codeshareabc/TopicalNE.)
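
The idea that two linked nodes "infer each other contingent on the shared latent topics" can be illustrated with a small sketch. This is not the formulation used in the paper, which the abstract does not spell out. It is a hypothetical toy scoring function that assumes each node has a topic distribution (`theta`), that each topic has an embedding vector (`topic_emb`), and that the shared-topic context is the mixture of topic vectors weighted by the product of the two nodes' topic proportions. All function and variable names are illustrative.

```python
import numpy as np

def shared_topic_weights(theta_i, theta_j):
    """Weight each topic by how strongly BOTH nodes express it (assumed scheme)."""
    joint = theta_i * theta_j
    total = joint.sum()
    if total == 0.0:
        # No overlapping topic mass: fall back to uniform weights.
        return np.full_like(joint, 1.0 / joint.size)
    return joint / total

def topic_conditioned_score(u_i, u_j, theta_i, theta_j, topic_emb):
    """Sigmoid link score for (i, j), conditioned on their shared topics."""
    w = shared_topic_weights(theta_i, theta_j)
    context = w @ topic_emb          # (d,) mixture of topic vectors
    logit = (u_i + context) @ u_j    # node i plus topic context predicts node j
    return 1.0 / (1.0 + np.exp(-logit))

# Toy data: 3 topics, 4-dimensional embeddings (random, for illustration only).
rng = np.random.default_rng(0)
num_topics, dim = 3, 4
topic_emb = rng.normal(size=(num_topics, dim))
u_a, u_b = rng.normal(size=dim), rng.normal(size=dim)
theta_a = np.array([0.7, 0.2, 0.1])   # node a is mostly about topic 0
theta_b = np.array([0.6, 0.1, 0.3])   # node b is mostly about topic 0 as well

weights = shared_topic_weights(theta_a, theta_b)
score = topic_conditioned_score(u_a, u_b, theta_a, theta_b, topic_emb)
print("shared-topic weights:", weights)
print("topic-conditioned link score:", score)
```

In this toy setup the context vector is dominated by topic 0, the topic both nodes emphasize, so the link score depends on what the two papers have in common rather than on their full word sets. A joint training loop would update both `topic_emb` and the node vectors, which is the role the abstract assigns to the unified optimization framework.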

Acknowledgements

This work is supported in part by the US National Science Foundation (NSF) through Grants Nos. IIS-1763452 and CNS-1828181.

Corresponding author

Correspondence to Yufei Tang.

Additional information

Responsible editor: Po-ling Loh, Evimaria Terzi, Antti Ukkonen, Karsten Borgwardt.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Shi, M., Tang, Y., Zhu, X. et al. Topical network embedding. Data Min Knowl Disc 34, 75–100 (2020). https://doi.org/10.1007/s10618-019-00659-7

  • DOI: https://doi.org/10.1007/s10618-019-00659-7
