Combining support vector machine with genetic algorithm to classify ultrasound breast tumor images

https://doi.org/10.1016/j.compmedimag.2012.07.004

Abstract

To improve classification accuracy and reduce the time needed to extract features and find a (near) optimal classification model in a computer-aided diagnosis (CAD) system for ultrasound breast tumor images, we propose an approach that performs feature selection and parameter setting simultaneously.

In our approach, ultrasound breast tumors were segmented automatically by a level set method. Auto-covariance texture features and morphologic features were extracted first; a genetic algorithm was then used to detect the significant features and determine near-optimal parameters for the support vector machine (SVM) that identifies the tumor as benign or malignant.

The proposed CAD system can differentiate benign from malignant breast tumors with high accuracy and short feature extraction time. According to the experimental results, the accuracy of the proposed CAD system for classifying breast tumors is 95.24%, and the computing time for calculating the features of all breast tumor images is only 8% of that of a system without feature selection. Furthermore, the time needed to find a (near) optimal classification model is significantly shorter than that of grid search. The system is therefore clinically useful in reducing the number of biopsies of benign lesions, and it offers a second reading to help inexperienced physicians avoid misdiagnosis.

Introduction

Breast cancer has been a leading cause of death among women worldwide in recent years. According to statistics, in 2009, 192,370 women in the United States were expected to be diagnosed with breast cancer, and 40,170 deaths were attributed to the disease [1]. To reduce death rates and extend patients' lives, early detection and prompt treatment of breast cancer are critical.

Detection of breast cancer usually consists of a physical examination, imaging, and biopsy [2]. Detecting tumors by physical examination requires experience, and the process may be uncomfortable. It is also difficult to distinguish benign tumors from malignant ones by examination alone. Biopsy is the best way to accurately determine whether a tumor is benign or malignant. However, it is invasive and much more expensive than other detection methods. To avoid uncomfortable physical examinations and unnecessary biopsies, many researchers have investigated computer-aided diagnosis (CAD) systems based on medical imaging [3], [4], [5], [6], [7], [8]. The aim of these CAD systems is to offer more objective evidence and help physicians differentiate benign from malignant tumors.

Mammography, magnetic resonance imaging (MRI), and ultrasound (US) are common medical imaging techniques for detecting and classifying breast tumors. Although mammography can be used to visualize nonpalpable and small tumors, it may miss cancer in women with dense breasts [9]. Further, it is very painful for patients, harmful to human tissue, and cannot be used on patients under 40 years of age [10]. Though MRI is highly sensitive, it is costly and may carry risks from the required contrast media [11]. MRI for breast cancer screening has been found to have lower specificity than mammography, with a higher rate of false positives, and each false positive incurs the cost of follow-up MRI and image-guided biopsy [12]. US is a popular technique in medical imaging due to its lower cost, convenience, and real-time scanning. It has gradually developed into a sophisticated tool for the diagnosis of breast lesions in recent years. In addition to distinguishing cysts from solid breast tumors, US can be used to help classify breast tumors. However, the appearances of benign and malignant tumors overlap considerably in ultrasonic images, and the interpretation is subjective [13].

Ultrasound image features for the classification of breast tumors mainly include texture [4], [14] and shape [3], [13]. Texture features of benign and malignant tumors are deemed useful characteristics for their differentiation on ultrasound. They are easily extracted but are usually affected by the region of interest (ROI), which is typically drawn by physicians. Shape features, also known as morphological features, are derived from the contours of tumors and are effective in the evaluation of breast tumors. Although shape features are not affected by the ROI, extracting them usually requires a substantial amount of computation. Since both texture and morphological features can be used to evaluate breast tumors, we combine the two kinds of features to construct an ultrasound breast tumor image CAD system in this study.
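The abstract names auto-covariance as the texture descriptor. The following is a minimal sketch of normalized auto-covariance coefficients for a grayscale ROI. It assumes the textbook definition: mean-removed products averaged over the overlapping pixels and divided by the zero-offset value. The function names, the offset range `max_offset`, and the normalization are illustrative assumptions; this excerpt does not give the exact variant the authors used.

```python
def autocovariance(img, dm, dn):
    """Mean-removed auto-covariance of a 2-D list `img` at offset (dm, dn)."""
    rows, cols = len(img), len(img[0])
    mean = sum(map(sum, img)) / (rows * cols)
    total = 0.0
    for x in range(rows - dm):
        for y in range(cols - dn):
            total += (img[x][y] - mean) * (img[x + dm][y + dn] - mean)
    # Average over the number of overlapping pixel pairs.
    return total / ((rows - dm) * (cols - dn))


def texture_features(img, max_offset=4):
    """Coefficients for all offsets up to max_offset, normalized by offset (0, 0).

    max_offset=4 is an assumed value; the ROI must be larger than max_offset
    in both dimensions.
    """
    a00 = autocovariance(img, 0, 0)  # equals the variance of the ROI
    offsets = [(dm, dn)
               for dm in range(max_offset + 1)
               for dn in range(max_offset + 1)
               if (dm, dn) != (0, 0)]
    if a00 == 0:
        # A constant ROI carries no texture; avoid dividing by zero.
        return [0.0] * len(offsets)
    return [autocovariance(img, dm, dn) / a00 for dm, dn in offsets]


# Toy 6x6 checkerboard standing in for an ROI.
roi = [[(x + y) % 2 for y in range(6)] for x in range(6)]
features = texture_features(roi)
```

Because every coefficient is divided by the zero-offset value, the features do not change when the ROI brightness is uniformly scaled, which is one common reason for normalizing.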

Most classification problems involving a large set of potential features must identify a small subset of features to be employed for classification, a task known as feature selection. Data collected without feature selection may be redundant or noisy and may degrade classification accuracy. Finding a near-optimal subset of features is inherently combinatorial, since the usefulness of each feature needs to be determined. Principal component analysis (PCA) is the most widely adopted traditional statistical method. However, although the features obtained using PCA are variable-independent, they may not be the most beneficial for a specific problem [15].

Genetic algorithms (GAs) were introduced by Holland [16], and their application was accelerated by the publication of a textbook by Goldberg [17]. GA search procedures are based on natural selection and natural genetics. A multi-dimensional search is performed to find a near-optimal value of a fitness function in an optimization problem. Unlike conventional search methods, GAs deal with multiple solutions simultaneously and compute the fitness function values for these solutions. GAs have been found, theoretically and empirically, to provide global near-optimal solutions for various complex optimization problems in the fields of operations research, pattern recognition, image processing, and machine learning.
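The search loop described above can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the function name `run_ga`, the binary encoding, tournament selection, one-point crossover, elitism, and all rates and sizes are assumptions chosen for brevity.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=40,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over bit strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    # A population holds many candidate solutions at once.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]

        def pick():
            # Binary tournament: keep the fitter of two random individuals.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = [best[:]]  # elitism: the best solution so far survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation keeps some diversity in the population.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best


# Toy fitness: count of 1-bits, so the optimum is the all-ones string.
best = run_ga(sum, n_bits=16)
```

The toy fitness stands in for whatever objective the problem defines; in a classification setting it would typically be a measure of classifier performance.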

In this study, we use a previously proposed image segmentation algorithm [3] to automatically extract the contours of breast tumors from ultrasound images. This algorithm includes several steps, such as image smoothing, image enhancement, and the ‘level set’ technique, and extracts the contours of breast tumors from ultrasound images with very high reliability. After the segmentation procedure is performed, texture features inside the tumor and morphological features based on the contour of the breast tumor are calculated to form a feature vector. To decrease the time of extracting features and improve the accuracy of our CAD system, a genetic algorithm (GA) is used to select the most significant features from the texture and morphological features [18], [19], [20]. The GA also provides the near-optimal parameters for a support vector machine (SVM) [21], [22], [23], [24], [25], [26], [27] that evaluates the breast masses.
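One common way to let a single GA search both the feature subset and the SVM parameters is to place them in one chromosome. The sketch below decodes such a chromosome under several assumptions that this excerpt does not confirm: 30 candidate features (the count given in the Discussion), an RBF-kernel SVM with parameters C and gamma, 10 bits per parameter, and log-scaled parameter ranges. The names `decode` and `bits_to_value` are hypothetical.

```python
N_FEATURES = 30                        # candidate features, one mask bit each
PARAM_BITS = 10                        # assumed resolution per SVM parameter
C_RANGE = (2.0 ** -5, 2.0 ** 15)       # assumed search range for C
GAMMA_RANGE = (2.0 ** -15, 2.0 ** 3)   # assumed search range for gamma


def bits_to_value(bits, low, high):
    """Map a bit list to a value in [low, high] on a logarithmic scale."""
    as_int = int("".join(map(str, bits)), 2)
    frac = as_int / (2 ** len(bits) - 1)
    return low * (high / low) ** frac


def decode(chromosome):
    """Split a chromosome into (selected feature indices, C, gamma)."""
    mask = chromosome[:N_FEATURES]
    c_bits = chromosome[N_FEATURES:N_FEATURES + PARAM_BITS]
    g_bits = chromosome[N_FEATURES + PARAM_BITS:]
    selected = [i for i, bit in enumerate(mask) if bit == 1]
    return (selected,
            bits_to_value(c_bits, *C_RANGE),
            bits_to_value(g_bits, *GAMMA_RANGE))


# Example: every other feature kept, smallest C, largest gamma.
example = [1, 0] * 15 + [0] * PARAM_BITS + [1] * PARAM_BITS
selected, c_value, gamma_value = decode(example)
```

A fitness function built on this decoding would train an SVM on the selected columns with the decoded parameters and return a validation score, so that feature selection and parameter setting are evaluated together rather than in separate passes.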

Data acquisition

For this study, a total of 210 pathologically proven ultrasound images were used to evaluate our CAD system. The ultrasound image database included 120 benign breast tumor images and 90 malignant ones. The patients’ ages ranged from 18 to 64 years, and only one image from each patient was included in the database. This study was approved by the local ethics committee, and informed consent was waived.

The ultrasound images were captured by an ATL HDI 3000 system (Philips Medical

Results

This study used the 5-fold cross-validation proposed by Salzberg [39]. That is, all experimental cases were randomly divided into five groups; each group was used in turn as the testing group, while the remaining four groups were used to train the SVM with GA.
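The splitting scheme can be sketched as follows. The function name `five_fold_splits` and the fixed seed are illustrative assumptions, and the sketch does not stratify by class because the text does not say whether the groups were balanced between benign and malignant cases.

```python
import random


def five_fold_splits(n_cases, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k random folds."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)       # random division of the cases
    folds = [indices[i::k] for i in range(k)]  # k groups of near-equal size
    for i in range(k):
        test = folds[i]
        # The remaining k - 1 groups form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 210 cases, as in the image database described under Data acquisition.
splits = list(five_fold_splits(210))
```

With 210 cases, each testing group holds 42 cases and each training set 168, and every case is tested exactly once across the five rounds.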

The proposed approach was implemented on a PC with an AMD Athlon 64 1.8 GHz processor, the Windows XP operating system, and the Visual C++ 6.0 development environment. The libSVM library [40] was used in the proposed approach.

For SVM with GA, we set

Discussion

In this study, we proposed an automatic diagnostic system that uses practical texture and morphological features to effectively distinguish between benign and malignant lesions of the breast. Initially, we considered evaluating breast tumor images using all 30 features. However, some features may be redundant or irrelevant to the classification process. Such features may increase the computational complexity and lower the classification accuracy. Hence, the feature

Acknowledgement

This research was partially supported by the National Science Council of the Republic of China (Taiwan) under Contract no. NSC 97-2221-E-182-044.

Wen-Jie Wu is an assistant professor in the Department of Information Management, Chang Gung University, Taiwan. He received the B.S. degree in computer science and engineering from Tatung University, Taipei, Taiwan, in 1996, and the M.S. and Ph.D. degrees in computer science and information engineering from National Chung Cheng University, Chiayi, Taiwan, in 1998 and 2003, respectively. His research interests include image processing, medical computer-aided diagnosis systems, and data mining.

Shih-Wei Lin is an associate professor in the Department of Information Management, Chang Gung University, Taiwan. He received his Ph.D. in industrial management from the National Taiwan University of Science and Technology in 2002. His current research interests include scheduling and data mining. His papers have appeared in Computers and Operations Research, European Journal of Operational Research, Journal of the Operational Research Society, International Journal of Production Research, International Journal of Advanced Manufacturing Technology, Knowledge and Information Systems, Applied Soft Computing, Applied Intelligence, Expert Systems with Applications, and other journals.

Woo Kyung Moon is a professor in the Department of Radiology, College of Medicine, Seoul National University, Korea. He received the M.D. degree from Seoul National University College of Medicine, Seoul, Korea, in 1989, and the Ph.D. degree in radiology science from Seoul National University, Seoul, Korea, in 1999. His research interests include breast imaging and intervention, computer-aided diagnosis, and molecular imaging.
