Abstract
Because of its limited length and freely constructed sentence structure, short text differs from normal text, so traditional text representation algorithms do not work well on it. This paper proposes a model called Conceptual and Semantic Enrichment with Topic Model (CSET), which combines the Biterm Topic Model (BTM), a widely used probabilistic topic model designed for short text, with Probase, a large-scale probabilistic knowledge base. CSET captures semantic relations between words to enrich short text. Our model supports a wide range of applications that rely on semantic understanding of short text, including short text classification and word similarity measurement in context.
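The two ingredients named above can be illustrated with a minimal sketch. It assumes already-tokenised input and a tiny hypothetical concept table (`TOY_ISA_COUNTS`) standing in for Probase, with invented counts; all function names are hypothetical as well. It shows biterm extraction as used by BTM (every unordered word pair in a short text) and a naive, context-free enrichment step in which P(concept | instance) is estimated by normalising isa counts. This is not the CSET model: how CSET actually combines topics with the knowledge base is described in the body of the paper and is not reproduced here.

```python
from itertools import combinations

# Hypothetical stand-in for Probase: (instance, concept) -> isa count.
# The counts are invented for illustration; real Probase data are far larger.
TOY_ISA_COUNTS = {
    ("apple", "fruit"): 60,
    ("apple", "company"): 40,
    ("ipad", "device"): 50,
    ("ipad", "product"): 30,
    ("banana", "fruit"): 80,
}


def extract_biterms(tokens):
    """Return all unordered word pairs of one short text, as in BTM.

    The whole short text is treated as one context window, so n tokens
    yield n * (n - 1) / 2 biterms. Each pair is sorted so that (a, b) and
    (b, a) are represented by the same tuple.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def concept_distribution(instance, isa_counts):
    """Estimate P(concept | instance) by normalising the isa counts of one instance."""
    counts = {c: n for (e, c), n in isa_counts.items() if e == instance}
    total = sum(counts.values())
    if total == 0:
        return {}  # unknown word: no concepts available
    return {c: n / total for c, n in counts.items()}


def enrich(tokens, isa_counts, top_k=1):
    """Append the top_k most probable concepts of every known token.

    This ignores context entirely: each word is conceptualised on its own.
    Ties are broken alphabetically so the output is deterministic.
    """
    enriched = list(tokens)
    for tok in tokens:
        dist = concept_distribution(tok, isa_counts)
        ranked = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
        enriched.extend(concept for concept, _ in ranked[:top_k])
    return enriched


if __name__ == "__main__":
    text = ["apple", "ipad", "price"]
    print(extract_biterms(text))
    print(enrich(text, TOY_ISA_COUNTS))
```

Running it on `["apple", "ipad", "price"]` yields three biterms and appends "fruit" and "device". The naive step tags "apple" as a fruit even next to "ipad", which is exactly the kind of ambiguity that context-aware enrichment of short text aims to resolve.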
References
Banerjee, S., Ramanathan, K., Gupta, A.: Clustering short texts using Wikipedia. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 787–788 (2007)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Chen, M., Jin, X., Shen, D.: Short text classification improved by learning multi-granularity topics. In: International Joint Conference on Artificial Intelligence, pp. 1776–1781 (2011)
Hu, J., et al.: Enhancing text clustering by leveraging Wikipedia semantics. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 179–186 (2008)
Kim, D., Wang, H., Oh, A.: Context-dependent conceptualization. In: International Joint Conference on Artificial Intelligence, pp. 2654–2661 (2013)
Ning, Y.H., Zhang, L., Ju, Y.R., Wang, W.J., Li, S.Q.: Using semantic correlation of HowNet for short text classification. Appl. Mech. Mater. 513–517, 1931–1934 (2014)
Phan, X.H., Nguyen, L.M., Horiguchi, S.: Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In: WWW, pp. 91–100 (2008)
Berger, A.L., Della Pietra, S.A., Della Pietra, V.J.: A maximum entropy approach to natural language processing. Comput. Linguist. 22, 39–71 (1996)
Shen, D., et al.: Query enrichment for web-query classification. ACM Trans. Inf. Syst. 24(3), 320–352 (2006)
Song, Y., Wang, H., Wang, Z., Li, H., Chen, W.: Short text conceptualization using a probabilistic knowledgebase. In: International Joint Conference on Artificial Intelligence, pp. 2330–2336 (2011)
Wu, W., Li, H., Wang, H., Zhu, K.Q.: Probase: a probabilistic taxonomy for text understanding. In: ACM SIGMOD International Conference on Management of Data, pp. 481–492 (2012)
Yan, X., Guo, J., Lan, Y., Cheng, X.: A biterm topic model for short texts. In: International Conference on World Wide Web, pp. 1445–1456 (2013)
Acknowledgement
This work is supported in part by the National Natural Science Foundation of China under Grants 61170035, 61272420 and 81674099, the Six Talent Peaks Project in Jiangsu Province (Grant No. 2014 WLW-004), the Fundamental Research Funds for the Central Universities (Grant Nos. 30916011328 and 30918015103), and the Jiangsu Province special funds for the transformation of science and technology achievements (Grant No. BA2013047).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Shi, Q., Wang, Y., Sun, J., Fu, A. (2018). Short Text Understanding Based on Conceptual and Semantic Enrichment. In: Gan, G., Li, B., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2018. Lecture Notes in Computer Science, vol 11323. Springer, Cham. https://doi.org/10.1007/978-3-030-05090-0_28
DOI: https://doi.org/10.1007/978-3-030-05090-0_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05089-4
Online ISBN: 978-3-030-05090-0
eBook Packages: Computer Science, Computer Science (R0)