Abstract
Recently, several researchers have incorporated network information to enhance document classification. However, these methods are tied to specific network representations and cannot exploit alternative representations to take advantage of data-specific properties. Moreover, they neither utilize the complementary information between text and links nor fully leverage the label information. In this paper, we propose CrossTL, a novel representation model that finds better representations for classification. CrossTL improves learning at three levels: (1) at the input level, it is a general framework that can accommodate any useful text or graph embeddings; (2) at the structure level, it learns text-to-link and link-to-text representations to comprehensively describe the data; (3) at the objective level, it bounds the error rate by incorporating four types of losses, i.e., the text loss, the link loss, and the combination and disagreement of text and link, into the loss function. Extensive experimental results demonstrate that CrossTL significantly outperforms state-of-the-art representations on datasets with either rich or poor texts and links.
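The abstract's four-term objective (text loss, link loss, and the combination and disagreement of text and link predictions) can be illustrated with a minimal sketch. The paper's exact formulation is not given here, so the choices below are assumptions: cross-entropy for the supervised terms, logit addition for the combined prediction, and a squared difference between the two predictive distributions as the disagreement penalty. The function name `crosstl_loss` and the weighting scheme `lambdas` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class.
    p = softmax(logits)
    n = len(labels)
    return -np.log(p[np.arange(n), labels] + 1e-12).mean()

def crosstl_loss(text_logits, link_logits, labels, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical four-term objective: text, link, combination, disagreement."""
    l_text = cross_entropy(text_logits, labels)            # text-only loss
    l_link = cross_entropy(link_logits, labels)            # link-only loss
    l_comb = cross_entropy(text_logits + link_logits, labels)  # joint prediction
    # Disagreement: how far apart the two views' predictive distributions are.
    l_dis = np.mean((softmax(text_logits) - softmax(link_logits)) ** 2)
    a, b, c, d = lambdas
    return a * l_text + b * l_link + c * l_comb + d * l_dis
```

Minimizing the disagreement term pushes the text-based and link-based classifiers toward consistent predictions, which is one common way to exploit complementary views of the same document.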
Acknowledgments
The work described in this paper has been supported in part by the NSFC project (61572376).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
You, Z., Qian, T. (2018). Learning a Joint Representation for Classification of Networked Documents. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11305. Springer, Cham. https://doi.org/10.1007/978-3-030-04221-9_18
Print ISBN: 978-3-030-04220-2
Online ISBN: 978-3-030-04221-9