Comparative study of text clustering techniques in virtual worlds

Authors:
Gema Bello-Orgaz, Universidad Autónoma de Madrid, Madrid, Spain
David Camacho, Universidad Autónoma de Madrid, Madrid, Spain

WIMS '13: Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, June 2013, Article No.: 9, Pages 1–8
https://doi.org/10.1145/2479787.2479818

Published: 12 June 2013

ABSTRACT

The Virt-UAM (Virtual Worlds at Universidad Autónoma de Madrid) platform supports the design and implementation of virtual spaces in which a set of avatars can be intensively monitored using tools managed by an administrator. In a virtual world, users can move and interact with one another with a high degree of freedom. Their movements, interactions, and any other information related to the avatars' conversations can be stored, so this data is available for processing and analysis to obtain user behavioural patterns. Document clustering techniques have been widely applied to automatically organize a document corpus into clusters of similar documents. Topic detection can be considered a special case of document clustering; therefore, these techniques can be applied to textual chat to detect clusters in the data and then extract the conversation topics. The Mahout(TM) machine learning library is an Apache(TM) project whose main goal is to build scalable machine learning libraries. It provides a set of ready-to-use algorithms for data mining and information retrieval. This paper presents a practical application of some of the clustering algorithms available in Mahout to a virtual-world scenario. These algorithms have been applied to extract topics from the clusters obtained from the text messages. Finally, a comparative study of the document clustering algorithms used is presented.
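
The idea of treating topic detection as document clustering can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the Mahout API: it is a standard-library Python toy that builds TF-IDF vectors for a few chat messages, groups them with k-means under cosine distance (often called spherical k-means), and reads each cluster's heaviest terms as its topic. The chat messages, the stop-word list, the farthest-first seeding, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical chat messages and stop words, invented for this sketch.
MESSAGES = [
    "where can i buy a sword and shield",
    "the sword shop sells a cheap shield",
    "the maths exam is on friday",
    "i need to study for the maths exam",
]
STOP_WORDS = {"a", "the", "i", "is", "on", "to", "for", "and", "can", "where"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stop words and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS]


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [
        normalize({t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(toks).items()})
        for toks in tokenized
    ]


def cosine_distance(u, v):
    """Cosine distance between two unit-length sparse vectors."""
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, k, max_iter=20):
    """k-means with cosine distance and deterministic farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(cosine_distance(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: cosine_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old centroid if a cluster is empty
                centroids[j] = normalize(total)
    return labels, centroids


def top_terms(centroid, n=2):
    """The n highest-weighted terms of a centroid, used as a topic label."""
    return sorted(centroid, key=lambda t: (-centroid[t], t))[:n]


labels, centroids = kmeans(tfidf_vectors(MESSAGES), k=2)
for j, centroid in enumerate(centroids):
    print(f"cluster {j}: topic terms {top_terms(centroid)}")
```

On this toy data the two shopping messages and the two exam messages fall into separate clusters. Swapping cosine_distance for another measure (for example Euclidean or Tanimoto) is the kind of variation a comparative study of distance measures would explore.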

Index Terms

Information systems
  Information systems applications

Published in
WIMS '13: Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
June 2013
408 pages
ISBN: 9781450318501
DOI: 10.1145/2479787
Conference Chair: David Camacho, Autonomous University of Madrid, Spain
Program Chairs: Rajendra Akerkar, Western Norway Research Institute, Norway; Maria D. Rodriguez Moreno, University of Alcalá, Spain
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
avatar behaviours
clustering algorithms
distance measures
mahout library
text clustering
Qualifiers
- research-article
Acceptance Rates
WIMS '13 Paper Acceptance Rate: 28 of 72 submissions, 39%
Overall Acceptance Rate: 140 of 278 submissions, 50%