A Questionnaire Data Clustering Method Based on Optimized K-Modes Algorithm

Hou, Wen-jun; Liu, Jia-xin; Yan, Xiang-yuan

doi:10.1007/978-3-030-77772-2_22

Conference paper
First Online: 03 July 2021

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12797)

Abstract

In user research, questionnaires are often used to collect user characteristics, attitudes, and other information, and cluster analysis is often used to divide the respondents into user groups. Traditional clustering algorithms are mostly suitable only for numerical attributes or for unordered (nominal) categorical attributes. Questionnaire data, however, consists mainly of both unordered and ordinal categorical attributes. To address the questionnaire data clustering problem, this paper proposes a method, based on the traditional K-Modes algorithm, that combines subjective weighting with objective clustering. The method first clusters the multiple-choice questions to reduce dimensionality and then re-weights the ordinal categorical attributes to make the distance measurement more reasonable. An optimized mixed K-Modes algorithm for questionnaire data clustering is proposed, in which the dissimilarity between objects is measured according to the attribute type, unordered or ordinal. To evaluate the clustering results, a cluster validity index is also defined. A case study using a bank user survey questionnaire demonstrates the effectiveness of the method.
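The abstract does not give the dissimilarity formula, so the following is only a minimal sketch of the general idea of treating unordered and ordinal attributes differently. It assumes simple matching for unordered attributes (as in classic K-Modes) and a normalized rank gap for ordinal attributes; the names `mixed_dissimilarity`, `cluster_mode`, `ordinal_levels`, and `weights` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mixed_dissimilarity(a, b, ordinal_levels, weights=None):
    """Dissimilarity between two integer-coded questionnaire responses.

    a, b: sequences of answers, one entry per question.
    ordinal_levels: dict mapping attribute index -> number of ordered
        levels (at least 2); attributes not listed are treated as unordered.
    weights: optional per-attribute weights (hypothetical stand-in for
        subjective weighting; defaults to 1.0 for every attribute).
    """
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in ordinal_levels:
            # Ordinal attribute: normalized rank gap in [0, 1]
            # (an assumed choice, not the paper's formula).
            d = abs(x - y) / (ordinal_levels[j] - 1)
        else:
            # Unordered attribute: simple matching, as in classic K-Modes.
            d = 0.0 if x == y else 1.0
        total += weights[j] * d
    return total


def cluster_mode(rows):
    """Per-attribute most frequent value, used as a K-Modes cluster center."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]
```

For example, with the third question coded on a five-point ordinal scale, `mixed_dissimilarity([0, 2, 1], [1, 2, 4], {2: 5})` adds a full mismatch for the first question, nothing for the second, and three quarters of a mismatch for the third. Under this assumption, neighbouring ordinal answers count as closer than distant ones, which plain simple matching cannot express.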

Author information

Authors and Affiliations

School of Digital Media and Design Arts, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Wen-jun Hou, Jia-xin Liu & Xiang-yuan Yan
Beijing Key Laboratory of Network Systems and Network Culture, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Wen-jun Hou, Jia-xin Liu & Xiang-yuan Yan

Corresponding author

Correspondence to Jia-xin Liu.

Editor information

Editors and Affiliations

Siemens Corporation, Princeton, NJ, USA
Helmut Degen
Foundation for Research and Technology – Hellas (FORTH), Heraklion, Greece
Stavroula Ntoa

Cite this paper

Hou, Wj., Liu, Jx., Yan, Xy. (2021). A Questionnaire Data Clustering Method Based on Optimized K-Modes Algorithm. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2021. Lecture Notes in Computer Science, vol 12797. Springer, Cham. https://doi.org/10.1007/978-3-030-77772-2_22

DOI: https://doi.org/10.1007/978-3-030-77772-2_22
Published: 03 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77771-5
Online ISBN: 978-3-030-77772-2
eBook Packages: Computer Science, Computer Science (R0)
