DOI: 10.1145/2792745.2792755
research-article

Grouping game players using parallelized k-means on supercomputers

Published: 26 July 2015 Publication History

Abstract

Grouping game players by their online behavior has attracted considerable attention recently. However, the huge volume and extreme complexity of online game data collections make grouping players a challenging task. This study applied parallelized K-Means on Gordon, a supercomputer hosted at the San Diego Supercomputer Center, to meet the computational challenge. Using the parallelization functions supported by R, the study clustered 120,000 game players into eight non-overlapping groups and sped up the clustering process by one to four times at degrees of parallelism from two to eight. The study systematically examined several factors that may affect cluster quality and/or clustering performance: degree of parallelism, number of clusters, data dimensions, and variable combinations. It also devised a method for identifying the optimal clustering schema, which chooses the most discriminative features and creates an appropriate number of clusters for K-Means clustering. Besides demonstrating the effectiveness of parallelized K-Means in grouping game players, the study highlights lessons learned from using K-Means on very large datasets and experience in applying parallel processing techniques to data-intensive analysis.
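The abstract does not say how the K-Means runs were split across workers. One common way to parallelize K-Means is to run independent random restarts on separate workers and keep the partition with the lowest within-cluster sum of squares. The sketch below illustrates that idea only. It is written in Python rather than the R functions the study used, and every name in it (`kmeans_once`, `best_restart`, `parallel_kmeans`) is hypothetical, not taken from the paper.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, seed, max_iter=100):
    """One Lloyd-style K-Means run from a seeded random initialization.

    Returns (within-cluster sum of squares, labels, centers).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return wss, labels, centers


def best_restart(points, k, seeds, map_fn=map):
    """Run one K-Means restart per seed and keep the lowest-WSS result.

    map_fn defaults to the sequential built-in map; passing a pool's map
    method distributes the restarts across workers instead.
    """
    runs = map_fn(partial(kmeans_once, points, k), seeds)
    return min(runs, key=lambda run: run[0])


def parallel_kmeans(points, k, seeds, workers):
    """Same as best_restart, with restarts spread over `workers` processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return best_restart(points, k, seeds, map_fn=pool.map)
```

Because the restarts are independent, the degree of parallelism here is simply the number of worker processes, and the achievable speedup is bounded by the number of restarts and by the cost of shipping the data to each worker. When calling `parallel_kmeans` from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.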


Cited By

  • (2018) Large-scale hierarchical k-means for heterogeneous many-core supercomputers. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-11). DOI: 10.5555/3291656.3291674 and 10.1109/SC.2018.00016. Online publication date: 11-Nov-2018
  • (2016) An Overview of the XSEDE Extended Collaborative Support Program. High Performance Computer Applications (3-13). DOI: 10.1007/978-3-319-32243-8_1. Online publication date: 8-Apr-2016


Published In

XSEDE '15: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure
July 2015
296 pages
ISBN: 9781450337205
DOI: 10.1145/2792745
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • San Diego Super Computing Ctr
  • HPCWire
  • Omnibond: Omnibond Systems, LLC
  • SGI
  • Internet2
  • Indiana University
  • CASC: The Coalition for Academic Scientific Computation
  • NICS: National Institute for Computational Sciences
  • Intel
  • DDN: DataDirect Networks, Inc
  • DELL
  • CORSA: CORSA Technology
  • ALLINEA: Allinea Software
  • Cray
  • RENCI: Renaissance Computing Institute

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cluster analysis
  2. k-means
  3. parallel processing
  4. performance evaluation

Conference

XSEDE '15

Acceptance Rates

XSEDE '15 Paper Acceptance Rate 49 of 70 submissions, 70%;
Overall Acceptance Rate 129 of 190 submissions, 68%
