Abstract
In recent years, machine learning has become one of the driving forces of science and industry, but the growing volume of data requires a paradigm shift in how traditional machine learning techniques are applied, especially in healthcare. Furthermore, with the availability of different clinical technologies, tumor features have been collected for breast cancer classification. Feature selection and accuracy improvement have therefore become challenging and time-consuming tasks. The approach proposed in this paper has two stages. In the first, Association Rules (AR) are used to eliminate insignificant features. In the second, several classifiers are applied to differentiate the incoming tumors. Using AR, the feature space dimension is reduced from nine attributes to eight and to four. In the test stage, threefold cross-validation was applied to the Wisconsin Breast Cancer Diagnostic (WBCD) dataset from the University of California Irvine machine learning repository to evaluate the performance of the proposed system. Among the classifiers, the Support Vector Machine (SVM) model combined with AR achieved the highest classification accuracy: 98.00% with eight attributes and 96.14% with four attributes. The results show that the proposed approach can be used to reduce the feature space and save time during the training phase, leading to better accuracy and fast automatic classification systems.
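The two building blocks named in the abstract, association rules and threefold cross-validation, can be illustrated with a minimal sketch. It uses the standard definitions of rule support and confidence and a plain contiguous three-way split. The toy records, the item names (`f1=high`, `class=malignant`), and the helper functions are hypothetical and are not taken from the paper. The abstract does not say how rules are turned into a decision to drop a feature, so that step is not shown.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    joint = set(antecedent) | set(consequent)
    return support(transactions, joint) / s_antecedent


def k_fold_indices(n_samples, k=3):
    """Yield (train, test) index lists for k contiguous folds.

    Fold sizes differ by at most one. No shuffling or stratification is
    done here; a real evaluation would usually add both.
    """
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical toy records: each set holds "feature=value" items plus a class.
records = [
    {"f1=high", "f2=high", "class=malignant"},
    {"f1=high", "f2=high", "class=malignant"},
    {"f1=low", "f2=low", "class=benign"},
    {"f1=low", "f2=high", "class=benign"},
    {"f1=high", "f2=low", "class=malignant"},
    {"f1=low", "f2=low", "class=benign"},
]

# Rule under inspection: {f1=high} -> {class=malignant}
rule_support = support(records, {"f1=high", "class=malignant"})
rule_confidence = confidence(records, {"f1=high"}, {"class=malignant"})

# Threefold split: each record lands in exactly one test fold.
folds = list(k_fold_indices(len(records), k=3))
```

In a full pipeline, a classifier such as an SVM would be trained on each `train` index list using only the retained attributes, scored on the matching `test` list, and the three accuracies averaged.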




Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Ed-daoudy, A., Maalmi, K. Breast cancer classification with reduced feature set using association rules and support vector machine. Netw Model Anal Health Inform Bioinforma 9, 34 (2020). https://doi.org/10.1007/s13721-020-00237-8
DOI: https://doi.org/10.1007/s13721-020-00237-8