Abstract
The exponential kernel, which models semantic similarity by means of a diffusion process on a graph defined by lexicon and co-occurrence information, has been successfully applied to text categorization. However, the diffusion is an unsupervised process and therefore fails to exploit class information in a supervised classification scenario. To address this limitation, we present a class-informed exponential kernel that makes use of the class knowledge of training documents in addition to the co-occurrence knowledge. The basic idea is to construct an augmented term-document matrix by encoding class information as additional terms and appending them to the training documents. Diffusion is then performed on the augmented term-document matrix. In this way, words belonging to the same class are indirectly drawn closer to each other, and the class-specific word correlations are strengthened. The proposed approach is demonstrated on several variants of the popular 20 Newsgroups data set.
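The augmentation idea can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a terms-by-documents matrix `X`, one pseudo-term row per class, a term co-occurrence matrix `X @ X.T` as the diffusion graph, and a decay parameter `lam`. The function names `augment_with_classes` and `exponential_term_kernel` and the `weight` parameter are hypothetical.

```python
import numpy as np

def augment_with_classes(X, labels, n_classes, weight=1.0):
    """Append one pseudo-term row per class to a terms-by-documents matrix.

    Assumption: each training document gets a single class pseudo-term
    with value `weight`; the abstract does not specify the encoding.
    """
    n_docs = X.shape[1]
    class_rows = np.zeros((n_classes, n_docs))
    class_rows[labels, np.arange(n_docs)] = weight
    return np.vstack([X, class_rows])

def exponential_term_kernel(X, lam):
    """Compute exp(lam * X X^T) through an eigendecomposition.

    X X^T is symmetric, so exp(lam * G) = V diag(exp(lam * w)) V^T.
    """
    G = X @ X.T
    w, V = np.linalg.eigh(G)
    return (V * np.exp(lam * w)) @ V.T

# Toy corpus: 4 terms, 4 training documents, 2 classes.
# Each term occurs in exactly one document, so no two terms co-occur.
X = np.eye(4)
labels = np.array([0, 0, 1, 1])
n_terms = X.shape[0]

# Plain diffusion: terms 0 and 1 stay unrelated.
S_plain = exponential_term_kernel(X, lam=0.5)

# Class-informed diffusion: keep only the block over the original terms.
X_aug = augment_with_classes(X, labels, n_classes=2)
S_aug = exponential_term_kernel(X_aug, lam=0.5)[:n_terms, :n_terms]

print("same class, plain:     ", S_plain[0, 1])
print("same class, augmented: ", S_aug[0, 1])
print("different class, augm.:", S_aug[0, 2])
```

In this toy setting, terms 0 and 1 never co-occur, yet the class pseudo-term links them, so their similarity becomes positive after diffusion, while terms from different classes remain unrelated. A document kernel could then be formed as `d1 @ S_aug @ d2`, again as an assumption about how the term similarities are used.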
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (No. 61562003), the Natural Science Foundation of Jiangxi Province of China (Nos. 20151BAB207029 and 20161BAB202070), the China Scholarship Council (No. 201508360144) and the “Bai Ren Yuan Hang” Project of Jiangxi Province of China in 2015.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Wang, T., Li, W. (2016). Learning Class-Informed Semantic Similarity. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_48
DOI: https://doi.org/10.1007/978-3-319-46675-0_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46674-3
Online ISBN: 978-3-319-46675-0
eBook Packages: Computer Science, Computer Science (R0)