Abstract
This study proposes a novel framework for data mining in clinical decision making. The framework addresses the problems of assessing and utilizing data mining models in the medical domain. It consists of three stages. The first stage preprocesses the data to improve its quality. The second stage applies the k-means clustering algorithm to partition the data into k clusters (in our case k = 2, i.e. cluster0 / no and cluster1 / yes) in order to validate the class labels associated with the data. After clustering, the class label of each instance is compared with the label generated by the clustering algorithm; if the two labels are the same, the instance is assumed to be correctly classified. Instances for which the labels differ are considered misclassified and are removed before further processing. In the third stage, support vector machine (SVM) classification is applied. The classification model is validated using k-fold cross-validation. The performance of the SVM classifier is also compared with that of a Naive Bayes classifier; in our case the SVM classifier outperforms the Naive Bayes classifier. To validate the proposed framework, experiments were carried out on benchmark datasets, namely the Pima Indians diabetes dataset and the Wisconsin breast cancer dataset (WBCD). These datasets were obtained from the University of California at Irvine (UCI) machine learning repository. On both datasets, the proposed framework achieved classification accuracy better than that of the other classification algorithms applied to the same datasets, as reported in the literature. The performance of the proposed framework was also evaluated using sensitivity and specificity measures.
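The second stage and the sensitivity/specificity evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`kmeans2`, `filter_by_cluster_agreement`, `sensitivity_specificity`), the toy data, and the rule for mapping cluster ids to class labels (choose whichever of the two mappings agrees with more of the recorded labels) are all assumptions made for illustration; the abstract does not say how clusters are matched to classes. The third stage (SVM with k-fold cross-validation) is left out.

```python
import random

def kmeans2(points, iters=100, seed=0):
    """Plain Lloyd's k-means with k=2; returns a cluster id (0 or 1) per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # two distinct instances as initial centers
    assign = None
    for _ in range(iters):
        # Assign each point to the nearest center (squared Euclidean distance).
        new_assign = [
            min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Move each center to the mean of its members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def filter_by_cluster_agreement(points, labels):
    """Keep only instances whose cluster matches the recorded 0/1 class label."""
    clusters = kmeans2(points)
    # Cluster ids are arbitrary. Assumption: use the cluster-to-class mapping
    # (identity or flipped) that agrees with more of the recorded labels.
    direct = sum(c == y for c, y in zip(clusters, labels))
    if direct < len(labels) - direct:
        clusters = [1 - c for c in clusters]
    return [(p, y) for p, c, y in zip(points, clusters, labels) if c == y]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (hypothetical): two well-separated groups, where the fourth
# instance lies in the "no" group but carries the label 1 ("yes").
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (1.1, 1.2),
          (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

kept = filter_by_cluster_agreement(points, labels)
print(len(kept), "of", len(points), "instances kept")
```

In this toy run the instance whose recorded label disagrees with its cluster is dropped, and the remaining instances would be passed on to the classifier. Real data would typically need the preprocessing of the first stage (for example, scaling the attributes) before distances are meaningful.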
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Patil, B.M., Joshi, R.C., Toshniwal, D. (2010). Impact of K-Means on the Performance of Classifiers for Labeled Data. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_40
DOI: https://doi.org/10.1007/978-3-642-14834-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14833-0
Online ISBN: 978-3-642-14834-7
eBook Packages: Computer Science, Computer Science (R0)