Revealing Prevailing Semantic Contents of Clusters Generated from Untagged Freely Written Text Documents in Natural Languages

Žižka, Jan; Dařena, František

doi:10.1007/978-3-642-40585-3_55

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This work addresses the automatic detection of the semantic content of groups of text documents written freely in various natural languages. A large original set of untagged documents is split into a user-specified number of clusters. Each cluster is then treated as a class, and a classifier (a decision tree) is induced. The words used by the tree are the significant terms that define the semantics of the individual clusters. The importance (weight) of the terms combined in individual tree branches is computed according to their role in correct classification: a given word combined with other words may lead to different classes, but one specific class can strongly prevail. The results are demonstrated on large data sets composed of many hotel-service customer reviews written in six different natural languages.
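The cluster-to-tree step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the cluster labels (assumed to come from an earlier clustering run), the plain information-gain tree over word presence, and the `term_weights` pooling rule are all assumptions made for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a non-empty sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_word(docs, labels, vocab):
    """Return (gain, word) for the present/absent split with the highest information gain."""
    base, best = entropy(labels), (0.0, None)
    for word in sorted(vocab):
        present = [l for d, l in zip(docs, labels) if word in d]
        absent = [l for d, l in zip(docs, labels) if word not in d]
        if not present or not absent:
            continue  # the word does not split this node
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        if base - remainder > best[0] + 1e-12:
            best = (base - remainder, word)
    return best

def branches(docs, labels, vocab, path=()):
    """Grow the tree recursively and yield one (path, class counts) pair per leaf."""
    _, word = best_word(docs, labels, vocab)
    if word is None:  # pure node or no useful word left
        yield path, Counter(labels)
        return
    for has_word in (True, False):
        subset = [(d, l) for d, l in zip(docs, labels) if (word in d) == has_word]
        sub_docs, sub_labels = zip(*subset)
        yield from branches(sub_docs, sub_labels, vocab - {word},
                            path + ((word, has_word),))

def term_weights(leaf_list):
    """Hypothetical weighting (an assumption, not the paper's formula): pool the
    class counts of every branch in which the word is required to be present,
    then report the prevailing class and its share."""
    pooled = {}
    for path, counts in leaf_list:
        for word, has_word in path:
            if has_word:
                pooled.setdefault(word, Counter()).update(counts)
    weights = {}
    for word, counts in pooled.items():
        cls, n = counts.most_common(1)[0]
        weights[word] = (cls, n / sum(counts.values()))
    return weights

# Toy data: invented reviews; cluster labels are assumed to be given.
reviews = [
    "clean room friendly staff", "clean room good breakfast",
    "clean room friendly reception", "clean room quiet street",
    "clean room rude staff", "dirty room rude staff",
    "dirty bathroom noisy street", "noisy room cold breakfast",
]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]

docs = [set(r.split()) for r in reviews]
vocab = set().union(*docs)
leaves = list(branches(docs, clusters, vocab))
weights = term_weights(leaves)

for path, counts in leaves:
    print(path, dict(counts))
print(weights)
```

In this toy run the tree uses only the words "clean" and "rude". The word "clean" appears on branches that end in both clusters, yet cluster 0 prevails among them, which mirrors the abstract's remark that a word combined with other words may lead to different classes while one class strongly prevails.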

Author information

Authors and Affiliations

Department of Informatics, FBE, Mendel University in Brno, Zemědělská 1, 613 00, Brno, Czech Republic
Jan Žižka & František Dařena

Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek

About this paper

Cite this paper

Žižka, J., Dařena, F. (2013). Revealing Prevailing Semantic Contents of Clusters Generated from Untagged Freely Written Text Documents in Natural Languages. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_55

DOI: https://doi.org/10.1007/978-3-642-40585-3_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
