Abstract
Identifying the key risk factors of a disease from a large amount of clinical data is a prerequisite for further scientific decision-making. In medical practice, the clinical symptom information of patients usually includes various types of data. Moreover, the occurrence and development of a disease are the joint result of mutually influencing factors, so the attributes are usually correlated. In this paper, we discuss a hybrid attribute feature selection problem that takes the correlation between attributes into account. First, taking the identification of pathogenic factors in medical decision-making as the background, we construct a hybrid attribute decision system. Second, we introduce kernel alignment to define the uncertain relationship between attributes and, on this basis, establish a three-way clustering model in attribute space. Furthermore, we propose a feature selection method for hybrid attribute data based on three-way clustering in attribute space. Finally, we apply the proposed model to identify the pathogenic factors of stroke, using 279 randomly selected clinical samples for simulation analysis. The results verify the applicability and validity of the model. The main contributions of this paper are twofold. In terms of theory, a three-way clustering algorithm in attribute space is established by introducing kernel alignment, and a hybrid attribute feature selection method based on three-way clustering is proposed. In terms of application, the proposed method is applied to identify risk factors of stroke.
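To make the role of kernel alignment concrete, the following minimal Python sketch computes the standard (uncentered) kernel alignment, A(K1, K2) = <K1, K2>_F / (||K1||_F ||K2||_F), between the Gram matrices of individual attributes in a small hybrid table. The abstract does not specify which kernels or parameters are used, so the Gaussian kernel for numeric attributes, the match kernel for categorical attributes, the bandwidths, and the toy attribute names and values are all assumptions made for illustration, not the paper's settings or data.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel on one numeric attribute (assumed choice)."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))

def match_kernel(x):
    """Gram matrix on one categorical attribute: 1 if values are equal, else 0 (assumed choice)."""
    x = np.asarray(x)
    return (x[:, None] == x[None, :]).astype(float)

def kernel_alignment(k1, k2):
    """Uncentered kernel alignment: Frobenius inner product over the product of Frobenius norms."""
    return np.sum(k1 * k2) / (np.linalg.norm(k1) * np.linalg.norm(k2))

# Hypothetical toy records; names and values are illustrative only.
age = [45, 50, 62, 70, 38]
sbp = [120, 130, 150, 160, 115]
smoker = ["y", "n", "y", "y", "n"]

kernels = {
    "age": gaussian_kernel(age, sigma=10.0),
    "sbp": gaussian_kernel(sbp, sigma=15.0),
    "smoker": match_kernel(smoker),
}
names = list(kernels)

# Pairwise alignment matrix over attributes: a similarity on attribute space.
alignment = np.array(
    [[kernel_alignment(kernels[a], kernels[b]) for b in names] for a in names]
)
print(names)
print(np.round(alignment, 3))
```

The resulting symmetric matrix has ones on its diagonal and, for these kernels, off-diagonal values between 0 and 1. A matrix of this kind is the sort of attribute-to-attribute similarity on which a clustering of attributes could operate; how the paper turns it into core and fringe regions of a three-way clustering is not described in the abstract and is not sketched here.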
Acknowledgements
The work was partly supported by the National Natural Science Foundation of China (72071152, 71571090, and 61871141), the Xi’an Science and Technology Projects (XA2020-RKXYJ-0086), the Youth Innovation Team of Shaanxi Universities, the China Postdoctoral Science Foundation (2020M670046ZX), the Science and Technology Plan Project of Yulin (19-50), the Project of Shaanxi Key Laboratory of Brain Disorders (No. 20NBZD02), the Special Project of State Key Laboratory of Dampness Syndrome of Chinese Medicine (No. SZ2020ZZ02), the Guangzhou Key Research and Development Program (2022), and the Philosophy and Social Science Planning Project of Gansu Province (No. 2021YB059).
About this article
Cite this article
Wang, T., Sun, B., Jiang, C. et al. Kernel alignment-based three-way clustering on attribute space and its application in stroke risk identification. Int. J. Mach. Learn. & Cyber. 13, 1697–1711 (2022). https://doi.org/10.1007/s13042-021-01478-3
DOI: https://doi.org/10.1007/s13042-021-01478-3