Abstract
This paper presents a differential feature recognition algorithm for breast cancer patients based on minimum spanning tree (MST) clustering and F-statistics. The algorithm clusters the features of breast cancer data with the MST clustering algorithm and uses F-statistics to determine the appropriate number of feature clusters. The features most relevant to the class labels are selected from each feature cluster to form the differential features. Samples described by the recognized features are then clustered with the MST clustering algorithm. The validity of the algorithm is evaluated by its clustering accuracy on the WDBC breast cancer dataset. In the experiments, correlations between features and class labels and similarities between features are measured by the cosine similarity and the Pearson correlation coefficient. Similarities between samples are measured by the cosine similarity, the Euclidean distance, and the Pearson correlation coefficient. The experimental results show that the highest clustering accuracy is obtained when the cosine similarity measures both the correlations between features and class labels and the similarities between features, while the Euclidean distance measures the similarities between samples. The recognized features are mean radius, mean fractal dimension, and standard error of fractal dimension.
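The feature-clustering and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`cosine`, `mst_edges`, `mst_clusters`, `select_features`) are hypothetical, and the sketch rests on several assumptions: the distance between two features is taken as 1 - |cosine similarity|, clusters are formed by cutting the k-1 heaviest MST edges, exactly one feature is kept per cluster, and the number of clusters k is supplied by the caller rather than chosen with F-statistics.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mst_edges(dist):
    """Prim's algorithm on a dense distance matrix; returns (weight, i, j) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0  # start growing the tree from node 0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def mst_clusters(dist, k):
    """Cut the k-1 heaviest MST edges and label the connected components."""
    edges = sorted(mst_edges(dist))
    kept = edges[: len(edges) - (k - 1)]
    root = list(range(len(dist)))  # union-find parent array

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    roots = [find(i) for i in range(len(dist))]
    # Renumber components 0, 1, 2, ... in order of first appearance.
    remap = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [remap[r] for r in roots]


def select_features(columns, y, k):
    """Cluster feature columns, then keep the column most similar to y per cluster.

    Assumption: one feature per cluster, scored by |cosine similarity| with y.
    """
    n = len(columns)
    dist = [[1.0 - abs(cosine(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    clusters = mst_clusters(dist, k)
    chosen = {}
    for idx, c in enumerate(clusters):
        score = abs(cosine(columns[idx], y))
        if c not in chosen or score > chosen[c][0]:
            chosen[c] = (score, idx)
    return sorted(idx for _, idx in chosen.values())
```

Here `columns` is a list of feature vectors (one per feature, each holding that feature's value for every sample) and `y` is the vector of numeric class labels. The same `mst_clusters` function could in principle be applied to a sample-by-sample distance matrix, for example one built from Euclidean distances, to cluster the samples once the features have been chosen.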
Acknowledgements
We are grateful to those who share datasets through the UCI Machine Learning Repository. This work is supported in part by the National Natural Science Foundation of China under Grant No. 61673251, by the Key Science and Technology Program of Shaanxi Province of China under Grant No. 2013K12-03-24, by the Fundamental Research Funds for the Central Universities under Grant Nos. GK201503067 and 2016CSY009, and by the Innovation Funds of Graduate Programs at Shaanxi Normal University under Grant No. 2015CXS028.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Xie, J., Li, Y., Zhou, Y., Wang, M. (2016). Differential Feature Recognition of Breast Cancer Patients Based on Minimum Spanning Tree Clustering and F-statistics. In: Yin, X., Geller, J., Li, Y., Zhou, R., Wang, H., Zhang, Y. (eds) Health Information Science. HIS 2016. Lecture Notes in Computer Science, vol 10038. Springer, Cham. https://doi.org/10.1007/978-3-319-48335-1_21
DOI: https://doi.org/10.1007/978-3-319-48335-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48334-4
Online ISBN: 978-3-319-48335-1
eBook Packages: Computer Science, Computer Science (R0)