Abstract
In user research, questionnaires are often used to collect user characteristics, attitudes, and other information, and cluster analysis is often used to divide respondents into user groups. Traditional clustering algorithms are mostly suitable only for numerical attributes or for disordered (nominal) categorical attributes, whereas questionnaire data consist mainly of a mix of disordered and ordinal categorical attributes. To address this questionnaire data clustering problem, this paper proposes a method, based on the traditional K-Modes algorithm, that combines subjective weighting with objective clustering. The method first clusters the multiple-choice questions to reduce dimensionality and then re-weights the ordinal categorical attributes to make the distance measurement more reasonable. An optimized mixed K-Modes algorithm for questionnaire data clustering is then proposed, in which the dissimilarity between objects is measured separately for the two attribute types, disordered and ordinal categorical. To evaluate the clustering results, a cluster validity index is also defined. A case study using a bank user survey questionnaire demonstrates the effectiveness of the method.
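The abstract does not give the dissimilarity formula, so the following is only a minimal sketch of how a mixed K-Modes procedure might treat the two attribute types. It assumes simple matching (0 or 1) for disordered attributes and a normalized rank difference for ordinal attributes, with optional per-attribute weights standing in for the subjective weighting. The function names, the `kinds`/`levels`/`weights` parameters, and the use of the median level as the ordinal cluster center are all illustrative assumptions, not the authors' definitions.

```python
from collections import Counter
import statistics


def mixed_dissimilarity(x, y, kinds, levels, weights=None):
    """Weighted dissimilarity between two questionnaire responses.

    kinds[j]  : "nominal" (disordered) or "ordinal" (assumed labels).
    levels[j] : number of ordered levels for an ordinal attribute
                (ignored for nominal attributes).
    weights[j]: hypothetical subjective weight; defaults to 1.0 each.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for a, b, kind, m, w in zip(x, y, kinds, levels, weights):
        if kind == "nominal":
            d = 0.0 if a == b else 1.0      # simple matching
        else:
            d = abs(a - b) / (m - 1)        # rank gap scaled to [0, 1]
        total += w * d
    return total


def update_center(members, kinds):
    """Mode for nominal attributes, lower median for ordinal ones (assumption)."""
    center = []
    for j, kind in enumerate(kinds):
        column = [row[j] for row in members]
        if kind == "nominal":
            center.append(Counter(column).most_common(1)[0][0])
        else:
            center.append(statistics.median_low(column))
    return tuple(center)


def mixed_k_modes(data, centers, kinds, levels, weights=None, max_iter=100):
    """Alternate assignment and center updates until labels stop changing."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda k: mixed_dissimilarity(row, centers[k],
                                                  kinds, levels, weights))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == k]
            if members:                      # keep old center if cluster is empty
                centers[k] = update_center(members, kinds)
    return labels, centers


# Toy responses: one nominal answer and two 5-point ordinal answers.
kinds = ["nominal", "ordinal", "ordinal"]
levels = [None, 5, 5]
data = [("student", 1, 2), ("student", 2, 1),
        ("retired", 5, 4), ("retired", 4, 5)]
labels, centers = mixed_k_modes(data, [data[0], data[2]], kinds, levels)
```

Scaling the ordinal gap by the number of levels keeps each attribute's contribution in the same 0 to 1 range as simple matching, so a mismatch on a nominal question and a maximal gap on an ordinal question count equally before weighting.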
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hou, Wj., Liu, Jx., Yan, Xy. (2021). A Questionnaire Data Clustering Method Based on Optimized K-Modes Algorithm. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2021. Lecture Notes in Computer Science, vol 12797. Springer, Cham. https://doi.org/10.1007/978-3-030-77772-2_22
DOI: https://doi.org/10.1007/978-3-030-77772-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77771-5
Online ISBN: 978-3-030-77772-2
eBook Packages: Computer Science, Computer Science (R0)