Impact of K-Means on the Performance of Classifiers for Labeled Data

Patil, Bankat M.; Joshi, Ramesh C.; Toshniwal, Durga

doi:10.1007/978-3-642-14834-7_40

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 94)

Abstract

In this study, a novel framework for data mining in clinical decision making is proposed. The framework addresses the problems of assessing and utilizing data mining models in the medical domain, and it consists of three stages. The first stage preprocesses the data to improve its quality. The second stage applies the k-means clustering algorithm to partition the data into k clusters (in our case k = 2, i.e. cluster0 / no and cluster1 / yes) in order to validate the class labels associated with the data. After clustering, the class label recorded for each instance is compared with the label generated by the clustering algorithm. If the two labels agree, the instance is assumed to be correctly classified; instances whose labels differ are considered misclassified and are removed before further processing. In the third stage, a support vector machine (SVM) classifier is applied, and the classification model is validated using k-fold cross-validation. The performance of the SVM classifier is also compared with that of a Naive Bayes classifier; in our experiments the SVM classifier outperforms Naive Bayes. To validate the proposed framework, experiments were carried out on two benchmark datasets, the Pima Indians diabetes dataset and the Wisconsin breast cancer dataset (WBCD), both obtained from the University of California at Irvine (UCI) machine learning repository. On both datasets the proposed framework achieved higher classification accuracy than the other classification algorithms applied to the same datasets in the literature. The performance of the framework was also evaluated using the sensitivity and specificity measures.
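The second stage and the evaluation measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the toy points and labels are invented for the example, k-means is seeded deterministically (first point, then the point farthest from it), and each cluster is mapped to the majority recorded label of its members. The abstract does not specify the seeding or the cluster-to-class mapping.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_clusters(points, max_iter=100):
    """Lloyd's k-means with k=2; returns one cluster index (0 or 1) per point.

    Assumption: deterministic seeding (first point, then the point farthest
    from it) so that the sketch is reproducible.
    """
    centroids = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min((0, 1), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        for c in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


def filter_by_cluster_agreement(points, labels):
    """Indices of instances whose recorded label matches their cluster's label.

    Assumption: a cluster's label is the majority recorded label of its members.
    """
    assignment = kmeans_two_clusters(points)
    cluster_label = {}
    for c in set(assignment):
        votes = Counter(l for l, a in zip(labels, assignment) if a == c)
        cluster_label[c] = votes.most_common(1)[0][0]
    return [i for i, (l, a) in enumerate(zip(labels, assignment))
            if cluster_label[a] == l]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: two well-separated groups; the last two recorded
# labels deliberately disagree with the group their points fall in.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0),
          (5.2, 4.9), (4.8, 5.1), (1.1, 0.9), (5.1, 5.0)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

kept = filter_by_cluster_agreement(points, labels)
print(kept)  # -> [0, 1, 2, 3, 4, 5]
```

Only the instances in `kept` would be passed on to the third stage, where a classifier such as an SVM is trained and assessed with k-fold cross-validation; `sensitivity_specificity` can then be applied to the true and predicted labels of each held-out fold.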

Author information

Authors and Affiliations

Department of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee, Uttarakhand, India, 247667
Bankat M. Patil, Ramesh C. Joshi & Durga Toshniwal

Editor information

Editors and Affiliations

Dept. of Computer Sciences, University of Florida, 32611, Gainesville, FL, USA
Sanjay Ranka
University of Florida, Gainesville, FL, USA
Arunava Banerjee
Department of Computer Science and Engineering, Indian Institute of Technology, 110016, New Delhi, India
Kanad Kishore Biswas
Computer Science, College of Engineering and Science, Louisiana Tech University, LA 71272, Ruston, USA
Sumeet Dua
University of Florida, Gainesville, FL, USA
Prabhat Mishra
Department of Computer Science & Engineering, Indian Institute of Technology, 208016, Kanpur, India
Rajat Moona
National Tsing Hua University, Hsin-Chu, Taiwan, R.O.C.
Sheung-Hung Poon
Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong
Cho-Li Wang

About this paper

Cite this paper

Patil, B.M., Joshi, R.C., Toshniwal, D. (2010). Impact of K-Means on the Performance of Classifiers for Labeled Data. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_40

DOI: https://doi.org/10.1007/978-3-642-14834-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14833-0
Online ISBN: 978-3-642-14834-7
eBook Packages: Computer Science, Computer Science (R0)
