Abstract
Breast cancer is the second leading cause of death among women in the world. To reduce its impact, screening mammography is used to detect this disease. To improve the accuracy of the exam, computer-aided systems (CAD) have been developed to analyze mammograms and provide statistics based on extracted image features. This paper presents a novel computer-aided detection (CADe) system based on Tsallis entropy extracted from quantized gray-level co-occurrence matrices (GLCM) of mass images. It is compared with a feature extraction scheme based on weighted Haralick features. The best accuracy, 91.3%, was obtained with Tsallis entropy computed from the GLCM using 24 features.
1 Introduction
According to [1], breast cancer mortality has been increasing over the years. In 2012, 521,907 women worldwide died from this disease, and the projection for 2015 was around 560,407 deaths. Breast cancer is considered a heterogeneous disease and can be caused by many factors, including mutations in the BRCA1 and BRCA2 genes, which are responsible for 5% to 10% of breast cancer cases.
Screening mammography helps radiologists find anomalies at an early stage. In some cases, however, breast density makes the diagnosis difficult, and a false-positive diagnosis may lead to an unnecessary biopsy. Furthermore, radiologists' analyses fail in about 10% to 30% of cases [2].
To address this problem, computer-aided detection and diagnosis systems (CADe and CADx, respectively) have been developed to assist radiologists and physicians in reaching a final result for the patient, avoiding false positives and unnecessary biopsies. Their main purpose is to replace the second human reading of the mammogram with a computer-based review and to improve accuracy using different features extracted from this type of image [2].
This paper presents a new CADe system that classifies mammography images into mass and non-mass tissue. Its main contribution is mass classification based on Tsallis entropy extracted from quantized versions of the gray-level co-occurrence matrix (GLCM) of mammograms.
2 Related Works
Over the years, many researchers have presented different texture extraction methods to improve the analysis of breast tissue. In [3], a CADe system for mammogram images was developed based on features extracted from a gray-level co-occurrence matrix (GLCM) and a gray-level run-length matrix (GLRLM). Its main contribution was a comparative study of several classifiers, and the best accuracy reported in [3] was 83.8%.
Another work on breast cancer detection from GLCM matrices is presented in [4]. Four statistical features [5] were extracted from these matrices, and three shape features were added to compose the feature vector. Two classifiers were used, K-means and Support Vector Machine, and the best accuracy was 93.11%.
Tsallis entropy is frequently used for region segmentation: varying the q-index yields a set of threshold values that can be used for binary segmentation, as shown in [6].
3 Methodology
This section presents the methodology used in all experiments of this paper.
3.1 Gray-Level Co-occurrence Matrix (GLCM)
According to [5], a digital image f can be expressed as a function defined on the Cartesian product of the spatial domains \(L_x\) and \(L_y\), whose elements are the resolution cells:
\[ f : L_x \times L_y \rightarrow G, \tag{1} \]
where \(L_x = \{1,2,3,\ldots , N_x\}\), \(L_y = \{1,2,3,\ldots ,N_y\}\) and G is the set of gray levels assigned to the cells.
Based on these concepts, Haralick et al. [5] proposed the GLCM to extract texture information from an image f. Each matrix is calculated for a specific direction \(\theta \in \{0^\circ , 45^\circ , 90^\circ , 135^\circ \}\) and a specific distance \(d \in \{1,2,3,4\}\). From these matrices, fourteen statistical features were calculated to compose the texture feature vector used for image classification.
In this work, two GLCM-based strategies were used for feature extraction: the first with Tsallis entropy and the second with weighted Haralick features [5]. Both follow the multilevel decomposition scheme described below.
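The GLCM construction just described can be sketched in pure Python. This is a minimal, non-symmetric variant under our own assumptions: the pixel-offset convention for each \(\theta\) (rows grow downward, 0° points right, 90° points up) and the function name `glcm` are illustrative, not taken from [5] or from the authors' code.

```python
# Assumed (row step, column step) per direction, with rows growing downward:
# 0° points right, 45° up-right, 90° up, 135° up-left.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, d, theta):
    """Normalized, non-symmetric GLCM of an integer image.

    image  -- list of rows with integer gray levels in [0, levels - 1]
    levels -- number of gray levels (size of the square output matrix)
    d      -- pixel distance
    theta  -- direction in degrees: 0, 45, 90 or 135
    """
    dr, dc = OFFSETS[theta]
    dr, dc = dr * d, dc * d
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Count the pair only if the neighbor lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("distance too large for this image")
    # Divide by the number of counted pairs so the entries sum to 1.
    return [[v / total for v in row] for row in counts]
```

For example, `glcm([[0, 0, 1], [1, 2, 2]], 3, 1, 0)` counts the four horizontal pairs (0,0), (0,1), (1,2) and (2,2), so each of those entries equals 0.25.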
3.2 Multilevel Decomposition
Gray-level quantization helps reduce the misclassification of breast lesions, since it decreases noise-induced effects. A normalized co-occurrence matrix represents the relationship between the gray levels of pixel pairs and can itself be seen as a gray-level image. Here, a uniform quantization scheme was used, in which the gray levels are quantized into equally spaced bins; no assumption is made about the gray-level distribution of the image. The quantized version of an image f, from which the GLCM is then computed, can be expressed as
\[ g = \lfloor \alpha_1 f + \alpha_2 \rfloor, \tag{2} \]
where g is the quantized image, \(\lfloor \bullet \rfloor \) is the floor function, and \(\alpha _1\) and \(\alpha _2\) are coefficients defined as
\[ \alpha_1 = \frac{l}{f_{max} - f_{min}}, \qquad \alpha_2 = -\alpha_1 f_{min}, \tag{3} \]
where l is the desired number of quantization levels, and \(f_{min}\) and \(f_{max}\) are the minimum and maximum gray-level values within image f, respectively. Notice that the quantized image g lies in the range [0, l].
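A minimal sketch of the uniform quantizer, assuming the affine coefficients \(\alpha_1 = l/(f_{max} - f_{min})\) and \(\alpha_2 = -\alpha_1 f_{min}\), the choice that maps \(f_{min}\) to 0 and \(f_{max}\) to l. The function name `quantize` is hypothetical; the floor is evaluated in integer arithmetic so that \(f_{max}\) maps exactly to l.

```python
def quantize(image, l):
    """Uniform quantization g = floor(alpha1 * f + alpha2) onto [0, l].

    Assumes alpha1 = l / (f_max - f_min) and alpha2 = -alpha1 * f_min,
    so f_min maps to 0 and f_max maps to l.
    """
    values = [v for row in image for v in row]
    f_min, f_max = min(values), max(values)
    if f_max == f_min:
        # A flat image carries no texture; map every pixel to level 0.
        return [[0 for _ in row] for row in image]
    # Integer form of floor(alpha1 * f + alpha2) for integer pixels; it avoids
    # float rounding that could put f_max just below l.
    return [[(l * (v - f_min)) // (f_max - f_min) for v in row] for row in image]
```

For example, `quantize([[0, 50], [100, 255]], 8)` returns `[[0, 1], [3, 8]]`. The output takes l + 1 distinct values (0 to l), so a GLCM built on it needs l + 1 rows and columns.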
3.3 Weighted Haralick Features
To compare the performance of the aforementioned feature vectors, another test was made. Six Haralick features were chosen: four of the most used among the 14 features, plus two others. Fixing \(\theta \), the following quantities are computed for each quantization level l and corresponding distance d
where \(f_i^{(l-2)} (d)\) is the \(i\)-th Haralick feature for quantization level l (\(l \in \{3,4,5,6,7,8\}\)) and distance d (\(d \in \{1,2,3,4\}\)), with \(i \in \{1,2,3,4,5,6\}\). Here \(f_1\), \(f_2\), \(f_3\), \(f_4\), \(f_5\) and \(f_6\) are correlation, angular second moment (ASM), entropy, energy, sum entropy and sum variance, respectively [5]. The quantization level l acts as a weight on the corresponding feature computed from the GLCM.
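The six features can be computed from a normalized GLCM as below. This is a sketch under stated assumptions: natural logarithms, energy taken as the square root of ASM, sum variance computed around the sum average (a common correction of Haralick's original definition), and a hypothetical weighting that multiplies each feature by l, which is one reading of the sentence above; the paper's exact weighted expression may differ.

```python
import math


def haralick_six(p):
    """Correlation, ASM, entropy, energy, sum entropy, sum variance of GLCM p."""
    n = len(p)
    px = [sum(p[i]) for i in range(n)]                        # row marginal
    py = [sum(p[i][j] for i in range(n)) for j in range(n)]   # column marginal
    mx = sum(i * px[i] for i in range(n))
    my = sum(j * py[j] for j in range(n))
    sx = math.sqrt(sum((i - mx) ** 2 * px[i] for i in range(n)))
    sy = math.sqrt(sum((j - my) ** 2 * py[j] for j in range(n)))
    # Only non-zero cells contribute (and log(0) is avoided).
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n) if p[i][j] > 0]

    asm = sum(v * v for _, _, v in cells)
    # Correlation is undefined when a marginal has zero variance; use 0 then.
    if sx > 0 and sy > 0:
        corr = (sum(i * j * v for i, j, v in cells) - mx * my) / (sx * sy)
    else:
        corr = 0.0
    entropy = -sum(v * math.log(v) for _, _, v in cells)
    energy = math.sqrt(asm)

    # p_{x+y}(k): probability that the two gray levels sum to k.
    psum = [0.0] * (2 * n - 1)
    for i, j, v in cells:
        psum[i + j] += v
    sum_entropy = -sum(v * math.log(v) for v in psum if v > 0)
    sum_avg = sum(k * v for k, v in enumerate(psum))
    sum_var = sum((k - sum_avg) ** 2 * v for k, v in enumerate(psum))

    # Same order as f1..f6 in the text.
    return [corr, asm, entropy, energy, sum_entropy, sum_var]


def weighted_haralick(p, l):
    """Hypothetical weighting: multiply every feature by the level l."""
    return [l * f for f in haralick_six(p)]
```

For a uniform 2 × 2 GLCM (all entries 0.25), `haralick_six` gives correlation 0, ASM 0.25, entropy ln 4, energy 0.5, sum entropy 1.5 ln 2 and sum variance 0.5.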
3.4 Tsallis Entropy
From the normalized GLCM described above, a set of features was extracted using Tsallis entropy, which generalizes the Boltzmann-Gibbs entropy of thermodynamics. Equation (5) defines the Tsallis entropy:
\[ S_q = \frac{1 - \sum_{i=1}^{k} p_i^{\,q}}{q - 1}, \tag{5} \]
where \(p_i\) corresponds to the probability of the i-th gray level in a range of size k of the normalized GLCM. The parameter q (the q-index) is a real value chosen according to the application; in the limit \(q \rightarrow 1\), \(S_q\) reduces to the Boltzmann-Gibbs (Shannon) entropy.
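The Tsallis entropy \(S_q = (1 - \sum_i p_i^q)/(q - 1)\) translates directly into code. The sketch below assumes the probabilities are the flattened entries of a normalized GLCM and handles q = 1 through its Shannon limit; the function name is illustrative.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1).

    probs -- probabilities summing to 1 (e.g. a flattened normalized GLCM)
    q     -- entropic index, q > 0
    """
    ps = [p for p in probs if p > 0]   # 0**q == 0 for q > 0, so zeros are skipped
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit: Boltzmann-Gibbs / Shannon entropy (natural log).
        return -sum(p * math.log(p) for p in ps)
    return (1.0 - sum(p ** q for p in ps)) / (q - 1.0)
```

For example, `tsallis_entropy([0.5, 0.5], 2)` returns 0.5, and with q = 1 the same distribution gives ln 2 ≈ 0.693. For a GLCM `P`, pass `[v for row in P for v in row]`.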
3.5 Classification
After building the feature set, the next step was to classify the mammography images. A Support Vector Machine (SVM) was used through the libSVM library [7]. To improve classification results, the radial basis function (RBF) kernel was chosen; Eq. (6) gives this kernel:
\[ K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \, \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right), \quad \gamma > 0, \tag{6} \]
where \(\gamma \) is a positive parameter chosen by the user. In our experiments, the best values of \(\gamma \) and of the misclassification penalty parameter C were selected with grid.py, the Python script distributed with libSVM [7].
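For reference, the RBF kernel \(K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2)\) is a one-liner. The sketch also lists an illustrative log2-spaced grid of (C, \(\gamma\)) candidates of the kind a grid search explores; these ranges and the names `rbf_kernel` and `CANDIDATES` are assumptions, not values read from grid.py.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2) with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Illustrative (C, gamma) candidates on a log2 scale: C = 2^-5 .. 2^15 and
# gamma = 2^3 .. 2^-15, both in steps of 2 in the exponent (assumed ranges).
CANDIDATES = [(2.0 ** lc, 2.0 ** lg)
              for lc in range(-5, 16, 2)
              for lg in range(3, -16, -2)]
```

A grid search trains a classifier for each candidate pair and keeps the pair with the best cross-validation accuracy; libSVM's grid.py automates that loop.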
4 Database Acquisition
For this work, 594 images (297 mass and 297 non-mass ROIs) were randomly chosen from the Digital Database for Screening Mammography (DDSM) [8]. This database, the result of a collaborative effort, contains 2620 cases divided into normal, benign and cancer cases.
4.1 ROI Extraction
For the ROI extraction step, an approach based on the chain code values provided in the DDSM text files was developed to minimize interference from the muscle region and the mammogram background.
Using the chain code coordinates of the nodule boundary, four values were calculated: the minimum and maximum height and the minimum and maximum width. From these, a central point \((x_0,y_0)\) was calculated as the center of the bounding box, as shown in Fig. 1(a).
According to [9], the two most efficient bounding box sizes for mammography classification are \(32 \times 32\) and \(64 \times 64\). In this paper, all ROIs were extracted with a \(64\times 64\) bounding box centered at \((x_0, y_0)\) (see Fig. 1(b)).
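The center computation can be sketched as follows. The table mapping chain-code digits to pixel steps is an assumption to be checked against the DDSM documentation, and the function names are hypothetical; the center is taken as the midpoint of the minimum and maximum x (width) and y (height) of the decoded boundary.

```python
# Assumed (dx, dy) step for each chain-code digit, with x = column and
# y = row (y grows downward). Verify against the DDSM documentation.
CHAIN_STEPS = {
    0: (0, -1), 1: (1, -1), 2: (1, 0), 3: (1, 1),
    4: (0, 1), 5: (-1, 1), 6: (-1, 0), 7: (-1, -1),
}


def boundary_from_chain(x_start, y_start, codes):
    """Decode a chain code into the list of boundary points (x, y)."""
    x, y = x_start, y_start
    points = [(x, y)]
    for code in codes:
        dx, dy = CHAIN_STEPS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def bbox_center(points):
    """Center (x0, y0) of the bounding box of the boundary points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Midpoints of min/max width (x) and min/max height (y), rounded down.
    return (min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2
```

For instance, the closed square path `boundary_from_chain(10, 10, [2, 2, 4, 4, 6, 6, 0, 0])` spans x and y from 10 to 12, so `bbox_center` returns `(11, 11)`.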
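A matching sketch of the \(64 \times 64\) crop. The paper does not say how ROIs near the image border are handled; this version (hypothetical `crop_roi`) shifts the window so it stays inside the image, which is an assumption.

```python
def crop_roi(image, x0, y0, size=64):
    """Square size x size window centered at (x0, y0), shifted to stay inside."""
    rows, cols = len(image), len(image[0])
    if rows < size or cols < size:
        raise ValueError("image is smaller than the ROI")
    half = size // 2
    # Top-left corner, clamped so the window never leaves the image.
    top = min(max(y0 - half, 0), rows - size)
    left = min(max(x0 - half, 0), cols - size)
    return [row[left:left + size] for row in image[top:top + size]]
```

On a 100 × 100 image, a center of (50, 50) gives a window whose top-left corner is (18, 18), while a center at (0, 0) is shifted to start at the image corner.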
5 Experiments
In all experiments, \(\theta \) was fixed to one of the values \(\theta \in \{0^\circ , 45^\circ , 90^\circ , 135^\circ \}\), and the distance d took four values (\(d \in \{1,2,3,4\}\)), following the experiments described in [5]. For each ROI, six quantized images (\(2^3, 2^4, 2^5, 2^6, 2^7, 2^8\) gray levels) were created. For each of these images, the GLCM in the fixed direction \(\theta \) was calculated for each of the four distances d, yielding \(6 \times 4 = 24\) matrices. Each GLCM was normalized and its Tsallis entropy was extracted using Eq. (5), giving 24 features per ROI. For comparison, weighted Haralick features were extracted from the normalized GLCMs to compose another texture feature vector.
Five measures were used to evaluate classification performance: accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV) and negative predictive value (NPV).
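Putting the steps together, a hypothetical end-to-end sketch for one ROI: six quantized images with \(2^k\) gray levels (k = 3, ..., 8), four distances, one fixed \(\theta\), and one Tsallis entropy per normalized GLCM, giving 6 × 4 = 24 features. The helpers are simplified, self-contained re-implementations (a uniform floor quantizer onto 0..n−1, a non-symmetric GLCM with an assumed offset convention, and q ≠ 1), not the authors' code.

```python
# Assumed (row step, column step) per direction; rows grow downward.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def quantize(image, n_levels):
    """Uniform floor quantization onto 0..n_levels - 1."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0] * len(row) for row in image]
    return [[((n_levels - 1) * (v - lo)) // (hi - lo) for v in row] for row in image]


def glcm(image, levels, d, theta):
    """Normalized, non-symmetric co-occurrence matrix."""
    dr, dc = (s * d for s in OFFSETS[theta])
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    # Loop bounds keep both (r, c) and (r + dr, c + dc) inside the image.
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            counts[image[r][c]][image[r + dr][c + dc]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def tsallis(probs, q):
    """Tsallis entropy for q != 1."""
    ps = [p for p in probs if p > 0]
    return (1.0 - sum(p ** q for p in ps)) / (q - 1.0)


def tsallis_features(roi, q, theta=0, exponents=range(3, 9), distances=(1, 2, 3, 4)):
    """6 quantized images x 4 distances = 24 Tsallis features for one ROI."""
    feats = []
    for k in exponents:
        n = 2 ** k
        g = quantize(roi, n)
        for d in distances:
            p = glcm(g, n, d, theta)
            feats.append(tsallis([v for row in p for v in row], q))
    return feats
```

Calling `tsallis_features(roi, 1.31)` on a \(64 \times 64\) ROI returns a list of 24 values, one per (quantization, distance) pair, ready to be fed to the SVM.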
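The five measures follow from the confusion-matrix counts, taking mass as the positive class (an assumption consistent with the task). The dictionary keys mirror the abbreviations above; the function name is illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """ACC, SENS, SPEC, PPV and NPV from confusion-matrix counts.

    Positive class = mass, negative class = non-mass.
    """
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "SENS": tp / (tp + fn),   # true-positive rate
        "SPEC": tn / (tn + fp),   # true-negative rate
        "PPV": tp / (tp + fp),    # fraction of predicted masses that are masses
        "NPV": tn / (tn + fn),    # fraction of predicted non-masses that are non-masses
    }
```

With toy counts tp = 80, fp = 10, tn = 90 and fn = 20, it returns ACC = 0.85, SENS = 0.80 and SPEC = 0.90.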
5.1 Experiment I: Fixed Direction \(\theta \) and \(q \in [0.1, 1.9]\)
In the first step of this experiment, 19 q-index values were selected in the range [0.1, 1.9] with a step of 0.1. Figure 2 presents the results obtained with the Tsallis entropy features.
For most q-index values, the best results were obtained with \(\theta = 0^\circ \).
Table 1 presents the three best results of the Tsallis entropy feature extraction. The highest sensitivity (85.43%) was achieved in the last case, with a corresponding specificity of 94.33%. Since specificity exceeds sensitivity, this parameter pair identifies the non-mass group more reliably than the mass group. The highest accuracy, 89.88%, was obtained for two parameter pairs, \((q, \theta ) = (1.3, 0^\circ )\) and \((q, \theta ) = (1.4, 0^\circ )\).
The second step extracts the averaged weighted Haralick features for comparison with the entropic approach. Table 2 provides the three best accuracy values for this strategy.
The highest accuracy was 83.40%, with a sensitivity of 80.97%.
5.2 Experiment II: Two Refined Ranges: [1.31, 1.39] and [1.41, 1.49]
Based on the results of Experiment I, two refined q-index ranges were explored, [1.31, 1.39] and [1.41, 1.49], with a step of 0.01. Figures 3(a) and (b) show the accuracy obtained for each q-index in these ranges. Again, \(\theta = 0^\circ \) provides the best results.
The best accuracies were 91.3%, for \((q, \theta ) = (1.31, 0^\circ )\), in the first range and 89.68%, for \((q, \theta ) = (1.49, 0^\circ )\), in the second. Table 3 shows the three best results for these cases.
Finally, Table 4 compares the approach developed in this paper with some related works.
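The q-index grids of both experiments can be generated exactly with `decimal.Decimal`, which avoids the floating-point drift of repeatedly adding 0.1 or 0.01. `best_q` is a hypothetical helper that assumes a scoring function such as cross-validated accuracy; it sketches the coarse-to-fine search and is not the authors' script.

```python
from decimal import Decimal


def q_grid(start, stop, step):
    """Inclusive grid from start to stop, built with Decimal to avoid drift."""
    s, e, h = Decimal(start), Decimal(stop), Decimal(step)
    values = []
    v = s
    while v <= e:
        values.append(float(v))
        v += h
    return values


coarse = q_grid("0.1", "1.9", "0.1")          # Experiment I: 19 values
refined_a = q_grid("1.31", "1.39", "0.01")    # Experiment II, first range
refined_b = q_grid("1.41", "1.49", "0.01")    # Experiment II, second range


def best_q(grid, score):
    """Return the q in grid with the highest score(q).

    score is a hypothetical callable, e.g. the cross-validated accuracy of an
    SVM trained on the Tsallis features computed with that q.
    """
    return max(grid, key=score)
```

With a toy score peaked at 1.31, the coarse grid selects 1.3 and the first refined range selects 1.31.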
6 Conclusions and Future Works
This paper presents an extensive set of tests using GLCM, Tsallis entropy and Haralick features. The main purpose was to analyze and select the q-index value that provides the highest accuracy for mass versus non-mass classification in mammography. The entropic and weighted Haralick feature strategies were compared, with all experiments using the same number of features. The results indicate that Tsallis entropy is a promising measure, since it does not require a long feature vector.
As future work, the authors intend to investigate how to select the q-index value automatically. Within the range explored in this paper, [0.1, 1.9], the best q-index was 1.31, providing an accuracy of 91.3%.
References
[1] Globocan Cancer Fact Sheets: Breast Cancer. http://globocan.iarc.fr/Default.aspx. Accessed 03 July 2015
[2] Jalalian, A., Mashohor, S.B., Mahmud, H.R., Saripan, M.I.B., Ramli, A.R.B., Karasfi, B.: Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: a review. Clin. Imaging 37(3), 420–426 (2013)
[3] Mavroforakis, M., Georgiou, H., Cavouras, D., Dimitropoulos, N., Theodoridis, S.: Mammographic mass classification using textural features and descriptive diagnostic data. In: 2002 14th International Conference on Digital Signal Processing, DSP 2002, vol. 1, pp. 461–464 (2002)
[4] Martins, L., Junior, G.B., Silva, A.C., de Paiva, A.C., Gattass, M.: Detection of masses in digital mammograms using k-means and support vector machine. ELCVIA: Electron. Lett. Comput. Vis. Image Anal. 8(2), 39–50 (2009)
[5] Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 6, 610–621 (1973)
[6] Mohanalin, B., Kalra, P.K., Kumar, N.: A novel automatic microcalcification detection technique using Tsallis entropy & a type II fuzzy index. Comput. Math. Appl. 60(8), 2426–2432 (2010)
[7] Chang, C.-C., Lin, C.-J.: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
[8] Heath, M., Bowyer, K., Kopans, D., Kegelmeyer, P., Moore, R., Chang, K., Munishkumaran, S.: Current status of the digital database for screening mammography. In: Karssemeijer, N., Thijssen, M., Hendriks, J., van Erning, L. (eds.) Digital Mammography, vol. 13, pp. 457–460. Springer, Heidelberg (1998)
[9] Garcia-Manso, A., Garcia-Orellana, C.J., Gonzalez-Velasco, H., Gallardo-Caballero, R., Macias Macias, M.: Consistent performance measurement of a system to detect masses in mammograms based on blind feature extraction. BioMed. Eng. Online 12, 2–18 (2013)
[10] Mata, B., Meenaksh, M.: A novel approach for automatic detection of abnormalities in mammograms. In: 2011 IEEE Recent Advances in Intelligent Computational Systems (RAICS), pp. 831–836 (2011)
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Alcântara, R., Junior, P.F., Ramos, A. (2017). Tsallis Entropy Extraction for Mammographic Region Classification. In: Beltrán-Castañón, C., Nyström, I., Famili, F. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2016. Lecture Notes in Computer Science(), vol 10125. Springer, Cham. https://doi.org/10.1007/978-3-319-52277-7_55
DOI: https://doi.org/10.1007/978-3-319-52277-7_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52276-0
Online ISBN: 978-3-319-52277-7
eBook Packages: Computer Science, Computer Science (R0)