A Hybrid of Random Over Sample Examples and Boosted C5.0 Algorithms for Breast Cancer Diagnosis on Imbalanced Data
To address the two-class imbalance problem in breast cancer diagnosis, a hybrid method combining the ROSE sampling approach with the Boosted C5.0 ensemble classifier (R-Boosted C5.0) is proposed. ROSE is used to balance the class distribution, and Boosted C5.0 is then used as the classifier. The method was evaluated on the Wisconsin Breast Cancer Dataset (WBCD), the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and three additional imbalanced datasets. Measured by the Matthews Correlation Coefficient (MCC), the proposed method scored 98.5% on WBCD and 93.0% on WDBC. The experimental results show that the proposed method outperforms the other classifiers compared. It can serve as a clinical decision support system to assist breast cancer prediction, and the methodology can also be applied to class-imbalanced classification problems more generally.
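The two generic ingredients named in the abstract, ROSE-style rebalancing and MCC scoring, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `rose_sample` and `mcc` are hypothetical helper names. The sampler follows the smoothed-bootstrap idea behind ROSE: it picks a class uniformly, bootstraps one observation from that class, and perturbs it with Gaussian noise. The per-feature Silverman-style bandwidth is an assumed default. The Boosted C5.0 classifier itself is not reproduced here.

```python
import math
import random
import statistics


def rose_sample(X, y, n_new, seed=None):
    """Generate n_new synthetic rows with a ROSE-style smoothed bootstrap.

    Assumptions (not taken from the paper): each class is drawn with
    probability 1/2, and each feature uses a Silverman-style Gaussian
    bandwidth computed within the chosen class.
    """
    rng = random.Random(seed)
    d = len(X[0])
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    labels = sorted(by_class)

    # Per-class, per-feature kernel bandwidths (assumed rule of thumb).
    bandwidths = {}
    for label, rows in by_class.items():
        n = len(rows)
        factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        bandwidths[label] = [
            factor * (statistics.stdev(col) if n > 1 else 0.0)
            for col in zip(*rows)
        ]

    X_new, y_new = [], []
    for _ in range(n_new):
        label = rng.choice(labels)               # uniform over classes -> balanced output
        seed_row = rng.choice(by_class[label])   # bootstrap one observation of that class
        h = bandwidths[label]
        # Perturb the seed row with Gaussian kernel noise, feature by feature.
        X_new.append([x + rng.gauss(0.0, hj) for x, hj in zip(seed_row, h)])
        y_new.append(label)
    return X_new, y_new


def mcc(y_true, y_pred, positive=1):
    """Matthews Correlation Coefficient for binary labels (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy imbalanced data: four class-0 rows, two class-1 rows.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [5.0, 5.0], [5.3, 4.8]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = rose_sample(X, y, n_new=1000, seed=0)
print(y_bal.count(0), y_bal.count(1))  # roughly equal class counts
print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect agreement -> 1.0
```

MCC is used here because, unlike accuracy, it takes all four confusion-matrix cells into account. This means a classifier cannot score well on imbalanced data simply by predicting the majority class.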
Keywords: Breast Cancer Diagnosis; Class-Imbalance Problem; Ensemble Algorithm; Sampling Method
Document Type: Research Article
Publication date: 01 November 2020
Journal of Medical Imaging and Health Informatics (JMIHI) disseminates novel experimental and theoretical research results in biomedicine, biology, clinical and rehabilitation engineering, medical image processing, bio-computing, D2H2, and other health-related areas.