A Questionnaire Data Clustering Method Based on Optimized K-Modes Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12797)

Abstract

In user research, questionnaires are often used to collect user characteristics, attitudes, and other information, and cluster analysis is often used to divide respondents into user groups. Most traditional clustering algorithms suit only numerical attributes or unordered (nominal) categorical attributes. Questionnaire data, however, consists mainly of both unordered and ordinal categorical attributes. To address the questionnaire data clustering problem, this paper proposes a method, based on the traditional K-Modes algorithm, that combines subjective weighting with objective clustering. The method first clusters the multiple-choice questions to reduce dimensionality and then re-weights the ordinal categorical attributes to make the distance measurement more reasonable. The result is an optimized mixed K-Modes algorithm in which the dissimilarity between objects is measured according to the two attribute types, unordered and ordinal categorical. To evaluate the clustering results, an effective cluster validity index is also defined. A case study on a bank user survey questionnaire demonstrates the effectiveness of this clustering method.
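The abstract does not give the paper's dissimilarity formulas, weights, or validity index, so the following is only a minimal sketch of the general idea under stated assumptions: unordered attributes use simple matching (0 if equal, 1 otherwise), ordinal attributes use a normalized rank difference, and an optional per-attribute weight stands in for the subjective weighting. The names `mixed_dissimilarity`, `column_modes`, `k_modes`, `ordinal_levels`, and `weights`, and the sample data, are hypothetical and not taken from the paper.

```python
from collections import Counter


def mixed_dissimilarity(a, b, ordinal_levels, weights=None):
    """Distance between two respondents with mixed categorical answers.

    ordinal_levels maps an attribute index to its number of ordered levels;
    ordinal answers are assumed to be coded as integers 0..levels-1.
    Attributes not listed there are treated as unordered (nominal).
    This is an illustrative measure, not the one defined in the paper.
    """
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        w = 1.0 if weights is None else weights[j]
        if j in ordinal_levels:
            # Ordinal: distance grows with the gap between ranks, scaled to [0, 1].
            d = abs(x - y) / (ordinal_levels[j] - 1)
        else:
            # Nominal: simple matching, as in classic K-Modes.
            d = 0.0 if x == y else 1.0
        total += w * d
    return total


def column_modes(rows):
    """Most frequent value of each attribute (ties go to the first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, initial_modes, ordinal_levels, weights=None, max_iter=20):
    """Plain K-Modes loop: assign to the nearest mode, then recompute modes."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(
                range(len(modes)),
                key=lambda c: mixed_dissimilarity(row, modes[c], ordinal_levels, weights),
            )
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels
        for c in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the old mode if a cluster is empty
                modes[c] = column_modes(members)
    return labels, modes


# Hypothetical survey rows: (preferred channel, satisfaction on a 5-level scale 0..4).
data = [("app", 0), ("app", 1), ("branch", 4), ("branch", 3)]
labels, modes = k_modes(
    data,
    initial_modes=[("app", 0), ("branch", 4)],
    ordinal_levels={1: 5},
)
print(labels, modes)
```

With simple matching alone, satisfaction levels 0 and 1 would be as far apart as 0 and 4; scaling the ordinal gap is one common way to keep the order information, which is the kind of rationalized distance the abstract refers to.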

Author information

Corresponding author

Correspondence to Jia-xin Liu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hou, Wj., Liu, Jx., Yan, Xy. (2021). A Questionnaire Data Clustering Method Based on Optimized K-Modes Algorithm. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2021. Lecture Notes in Computer Science, vol 12797. Springer, Cham. https://doi.org/10.1007/978-3-030-77772-2_22

  • DOI: https://doi.org/10.1007/978-3-030-77772-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77771-5

  • Online ISBN: 978-3-030-77772-2

  • eBook Packages: Computer Science, Computer Science (R0)
