DOI: 10.1145/2479787.2479818
research-article

Comparative study of text clustering techniques in virtual worlds

Published: 12 June 2013

ABSTRACT

The Virt-UAM (Virtual Worlds at Universidad Autónoma de Madrid) platform supports the design and implementation of virtual spaces in which a set of avatars can be intensively monitored with tools managed by an administrator. In a virtual world, users can move and interact with one another with a high degree of freedom. Their movements, interactions, and any other information related to the avatars' conversations can be stored, so this data is available for processing and analysis to obtain user behavioural patterns. Document clustering techniques have been widely applied to automatically organize a document corpus into clusters of similar documents. Topic detection can be considered a special case of document clustering; these techniques can therefore be applied to textual chat to detect clusters in the data and then extract the conversation topics. The Mahout(TM) machine learning library is an Apache(TM) project whose main goal is to build scalable machine learning libraries; it provides a ready-to-use set of algorithms for data mining and information retrieval. This paper presents a practical application of some of the clustering algorithms available in Mahout to a virtual-world scenario. The algorithms are applied to extract topics from the clusters obtained from the text messages. Finally, a comparative study of the document clustering algorithms used is presented.
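The idea of treating topic detection as document clustering over chat text can be illustrated with a minimal sketch. It is not the paper's implementation and does not use Mahout: it is a plain-Python approximation that assumes TF-IDF weighting, cosine similarity, and a k-means variant with deterministic farthest-first seeding. The chat messages, the stopword list, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "when", "did", "you"}

def tokenize(message):
    """Lowercase a message; keep alphabetic tokens longer than two letters."""
    cleaned = "".join(c if c.isalpha() else " " for c in message.lower())
    return [w for w in cleaned.split() if len(w) > 2 and w not in STOPWORDS]

def normalise(vec):
    """Scale a sparse vector (dict) to unit length; empty vectors stay empty."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def tfidf_vectors(messages):
    """Build one unit-length TF-IDF vector per message."""
    counts = [Counter(tokenize(m)) for m in messages]
    n = len(counts)
    df = Counter(term for c in counts for term in c)  # document frequency
    return [
        normalise({t: tf * (math.log(n / df[t]) + 1.0) for t, tf in c.items()})
        for c in counts
    ]

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(vectors, k, max_iter=20):
    """k-means on cosine similarity with farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector least similar to every centroid chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[j] = normalise(total)
    return labels, centroids

def top_terms(centroid, n):
    """Use the n heaviest centroid terms as a crude topic label."""
    return sorted(centroid, key=lambda t: (-centroid[t], t))[:n]

# Hypothetical chat log mixing two conversation topics.
messages = [
    "when is the math exam",
    "the exam covers math homework",
    "did you watch the football match",
    "great football match last night",
    "math homework is due before the exam",
    "the football team won the match",
]

labels, centroids = kmeans(tfidf_vectors(messages), k=2)
topics = [top_terms(c, 2) for c in centroids]
print(labels)
print(topics)
```

On this toy input the exam-related and football-related messages fall into separate clusters, and the heaviest centroid terms serve as topic labels. A real chat corpus would need proper stopword handling and a principled choice of the number of clusters, which this sketch fixes by hand.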


Published in

WIMS '13: Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
June 2013
408 pages
ISBN: 9781450318501
DOI: 10.1145/2479787

Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

WIMS '13 paper acceptance rate: 28 of 72 submissions (39%). Overall acceptance rate: 140 of 278 submissions (50%).
