Abstract
This paper addresses the clustering of short texts, specifically free-text answers gathered during brainstorming seminars. These answers are short, often incomplete, and highly biased toward the question, so establishing a notion of proximity between texts is a challenging task. In addition, the number of answers is at most about a hundred, which causes sparsity. We present three text clustering methods in order to choose the best one for this specific task, and then show how the chosen method can be improved by semantic enrichment, including neural-based distributional models and external knowledge resources. The algorithms have been evaluated on unique data sets from the seminars.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kozlowski, M., Rybinski, H. (2017). Semantic Enriched Short Text Clustering. In: Kryszkiewicz, M., Appice, A., Ślęzak, D., Rybinski, H., Skowron, A., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2017. Lecture Notes in Computer Science, vol 10352. Springer, Cham. https://doi.org/10.1007/978-3-319-60438-1_43
DOI: https://doi.org/10.1007/978-3-319-60438-1_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60437-4
Online ISBN: 978-3-319-60438-1
eBook Packages: Computer Science (R0)