ABSTRACT
The Virt-UAM (Virtual Worlds at Universidad Autónoma de Madrid) platform supports the design and implementation of virtual spaces in which a set of avatars can be intensively monitored using tools managed by an administrator. In a virtual world, users can move and interact with one another with a high degree of freedom. The movements, interactions, and any other information related to the avatars' conversations can be stored, so this data is available for processing and analysis to obtain user behavioural patterns. Document clustering techniques have been widely applied to automatically organize a document corpus into groups of similar documents. Topic detection can be considered a special case of document clustering; these techniques can therefore be applied to textual chat to detect clusters in the data and then extract the conversation topics. The Mahout(TM) machine learning library is an Apache(TM) project whose main goal is to build scalable machine learning libraries. It provides a set of ready-to-use algorithms for data mining and information retrieval. This paper shows a practical application of some of the clustering algorithms available in Mahout in a virtual-world scenario. These algorithms have been applied to extract topics from the clusters obtained from the text messages. Finally, a comparative study of the document clustering algorithms used is presented.
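The idea of treating topic detection as document clustering can be illustrated with a minimal sketch. It is not Mahout code and not the authors' pipeline: it is a standard-library Python stand-in that weights each chat message with TF-IDF, groups the messages with a cosine-based k-means, and reads the heaviest centroid terms as the cluster's topic. K-means is used here only as a familiar representative of document clustering; the abstract does not say which Mahout algorithms were compared. The sample messages, the seed indices, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Turn each message into a unit-length sparse TF-IDF vector."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is ever zero.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, seed_indices, max_iter=20):
    """Cosine-based k-means; the seeds are fixed message indices (assumed)."""
    centroids = [dict(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            centroids[k] = normalise(total)
    return labels, centroids


def top_terms(centroid, n=3):
    """The n heaviest terms of a centroid, read here as the cluster's topic."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]


# Hypothetical chat messages covering two conversation topics.
messages = [
    "exam homework lecture exam",
    "lecture homework teacher",
    "football match goal football",
    "goal match team",
]
vectors = tfidf_vectors(messages)
labels, centroids = kmeans(vectors, seed_indices=[0, 2])
for k, centroid in enumerate(centroids):
    print(k, top_terms(centroid))
```

Fixed seeds keep the sketch deterministic. Real chat logs would also need stop-word removal and a principled way to choose the number of clusters.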
Index Terms
- Comparative study of text clustering techniques in virtual worlds