Abstract
This work addresses the automatic detection of the semantic content of groups of textual documents written freely in various natural languages. A large original set of untagged documents is split into a user-requested number of clusters. Each cluster is then treated as a class, and a classifier (a decision tree) is induced. The words used by the tree represent significant terms that define the semantics of the individual clusters. The importance (weight) of the terms combined in individual tree branches is computed according to their contribution to correct classification: a given word combined with other words may lead to different classes, but one specific class can strongly prevail. The results are demonstrated on large data sets composed of many hotel-service customer reviews written in six different natural languages.
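The idea that a word may appear on branches leading to different classes while one class strongly prevails can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the scheme below is an assumption: each word is credited with the number of documents correctly classified by every branch it appears on, and its prevailing class is the one with the largest share of that credit. The branch data, the cluster labels, and the function names `term_class_weights` and `prevailing_class` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical tree branches (invented values, not results from the paper):
# (words tested on the branch, predicted cluster, documents classified correctly).
branches = [
    (("clean", "friendly"), "cluster_1", 120),
    (("clean", "noisy"), "cluster_2", 15),
    (("dirty",), "cluster_2", 80),
    (("friendly", "expensive"), "cluster_1", 30),
]


def term_class_weights(branches):
    """Sum, for every word, the correctly classified documents per class."""
    weights = defaultdict(Counter)
    for words, label, n_correct in branches:
        for word in words:
            weights[word][label] += n_correct
    return weights


def prevailing_class(weights):
    """Map each word to (prevailing class, that class's share of the word's total weight)."""
    result = {}
    for word, per_class in weights.items():
        label, top = per_class.most_common(1)[0]
        result[word] = (label, top / sum(per_class.values()))
    return result


weights = term_class_weights(branches)
summary = prevailing_class(weights)
for word, (label, share) in sorted(summary.items()):
    print(f"{word}: {label} ({share:.2f})")
```

In this toy data the word "clean" occurs on branches leading to both clusters, yet about 89% of its weight falls on `cluster_1`, which is the sense in which one class prevails. Other weighting schemes, for example ones that also account for misclassified documents, would fit the description equally well.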
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Žižka, J., Dařena, F. (2013). Revealing Prevailing Semantic Contents of Clusters Generated from Untagged Freely Written Text Documents in Natural Languages. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_55
DOI: https://doi.org/10.1007/978-3-642-40585-3_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science (R0)