Tsallis Entropy Extraction for Mammographic Region Classification

Alcântara, Rafaela; Junior, Perfilino Ferreira; Ramos, Aline

doi:10.1007/978-3-319-52277-7_55


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10125)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition


Abstract

Breast cancer is the second leading cause of death from disease among women worldwide. To reduce the number of such deaths, screening mammography is used to detect the disease. To improve the accuracy of the exam, computer-aided systems (CAD) have been developed that analyze the mammogram and provide statistics based on extracted image features. This paper presents a novel approach for a computer-aided detection (CADe) system based on Tsallis entropy extracted from the quantized gray-level co-occurrence matrix (GLCM) of mass images. A comparative study is presented against a feature extraction scheme that uses weighted Haralick features. The best accuracy, 91.3%, was obtained with Tsallis entropy computed from the GLCM using 24 feature measures.


1 Introduction

According to [1], the breast cancer mortality rate has been increasing over the years. In 2012, 521,907 women worldwide died from this disease, and the prediction for 2015 was around 560,407 deaths. Breast cancer is considered a heterogeneous disease and can be caused by many factors, including mutations in the BRCA1 and BRCA2 genes, which are responsible for 5% to 10% of breast cancer cases.

Screening mammography helps radiologists find anomalies at an early stage. In some cases, however, breast density makes the diagnosis difficult and can lead to a false-positive result and an unnecessary biopsy. Furthermore, about 10% to 30% of the results of radiologists' analyses are incorrect [2].

To address this problem, computer-aided detection and diagnosis systems (CADe and CADx, respectively) have been developed to assist radiologists and doctors in reaching a final result for the patient, avoiding false-positive results and unnecessary biopsies. The main purpose is to reduce the review effort by using a computer instead of a second human reader and to improve accuracy using different features extracted from this type of image [2].

This paper presents a new CADe system that classifies mammography image regions into mass and non-mass tissue. Its main contribution is mass classification based on Tsallis entropy computed from a quantized version of the gray-level co-occurrence matrix (GLCM) of mammograms.

2 Related Works

Over the years, many researchers have presented different texture extraction methods to improve the analysis of breast tissue. In [3], a CADe system for mammogram images was developed based on features extracted from a gray-level co-occurrence matrix (GLCM) and a gray-level run-length matrix (GLRLM). The main proposal was a comparative study of several classifiers, and the best accuracy reported in [3] was 83.8%.

In [4], another work on breast cancer detection from GLCMs is presented. From these matrices, four statistical features [5] were extracted. In addition, three shape features were used to compose the feature vector. Two classifiers were used in that work: K-means and Support Vector Machine. The best accuracy was 93.11%.

Tsallis entropy is frequently used for region segmentation. Varying the q-index provides a set of threshold values that can be used for binary segmentation, as shown in [6].

3 Methodology

This section presents the methodology used to compose all experiments in this paper.

3.1 Gray-Level Co-occurrence Matrix (GLCM)

According to [5], a digital image can be expressed as a function defined on the Cartesian product of the spatial domains $L_x$ and $L_y$, whose elements are the resolution cells, as follows:

$$\begin{aligned} f: L_x \times L_y \rightarrow G . \end{aligned}$$

(1)

where $L_x = \{1,2,3,\ldots , N_x\}$, $L_y = \{1,2,3,\ldots ,N_y\}$ and $G$ is the set of gray levels that each cell can take.

Based on these concepts, [5] proposed the GLCM to extract texture information from an image f. Each matrix is calculated for a specific direction $\theta $, where $\theta \in \{0^\circ , 45^\circ , 90^\circ , 135^\circ \}$, and a specific distance d, where $d \in \{1,2,3,4\}$. From these matrices, fourteen statistical features were calculated to compose the texture feature vector used for image classification.
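The construction of a co-occurrence matrix for one direction and one distance can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the symmetric counting of pairs and the mapping of angles to pixel offsets (rows growing downward) are assumptions.

```python
# Offsets (row step, column step) per unit distance for the four directions.
# Assumption: rows grow downward, so 45 degrees points up and to the right.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, theta, d, symmetric=True):
    """Count co-occurrences of gray-level pairs at distance d along theta."""
    dr, dc = OFFSETS[theta]
    dr, dc = dr * d, dc * d
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                matrix[a][b] += 1
                if symmetric:
                    # Count the pair in both orders (assumed convention).
                    matrix[b][a] += 1
    return matrix


def normalize(matrix):
    """Divide every entry by the total count so that the entries sum to 1."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return [[0.0] * len(row) for row in matrix]
    return [[v / total for v in row] for row in matrix]


# Small toy image with four gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(image, levels=4, theta=0, d=1)
# counts == [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
probabilities = normalize(counts)
```

The normalized matrix can be read as a joint probability table of gray-level pairs, which is the input expected by the entropy and Haralick features discussed later.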

In this work, two GLCM-based strategies were used for feature extraction: the first with Tsallis entropy and the second with weighted Haralick features [5]. Both follow the multilevel decomposition scheme described next.

3.2 Multilevel Decomposition

Gray-level quantization helps reduce the misclassification of breast lesions because it attenuates noise-induced effects. Normalized co-occurrence matrices represent a relationship between the gray levels of two pixels and can themselves be seen as gray-level images. Here, a uniform quantization scheme was used, in which the gray levels are quantized into separate bins. In this scheme, no assumption is made about the gray-level distribution of the image. A quantized version of the GLCM image f can be expressed as

$$\begin{aligned} g = \lfloor \alpha _1 f + \alpha _2 \rfloor . \end{aligned}$$

(2)

where g is the quantized GLCM image, $\lfloor \bullet \rfloor $ is the floor function, $\alpha _1$ and $\alpha _2$ are coefficients defined as

$$\begin{aligned} \left\{ \begin{matrix} \alpha _1 = \frac{l}{f_{max} - f_{min}} \\ \\ \alpha _2 = - \alpha _1 f_{min} \end{matrix} \right. . \end{aligned}$$

(3)

where l is the desired number of quantization levels, and $f_{min}$ and $f_{max}$ are the minimum and maximum gray-level values within image f, respectively. Notice that the quantized image g takes values in the range [0, l].
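Equations (2) and (3) translate directly into code. The following is a minimal sketch on nested lists; the function name and the handling of a constant image (where Eq. (3) would divide by zero) are assumptions that the text does not cover.

```python
import math


def quantize(f, levels):
    """Uniformly quantize a 2-D array f following Eqs. (2) and (3)."""
    flat = [v for row in f for v in row]
    f_min, f_max = min(flat), max(flat)
    if f_max == f_min:
        # Assumption: a constant image is mapped to 0 everywhere,
        # because alpha_1 in Eq. (3) is undefined in that case.
        return [[0 for _ in row] for row in f]
    alpha_1 = levels / (f_max - f_min)
    alpha_2 = -alpha_1 * f_min
    return [[math.floor(alpha_1 * v + alpha_2) for v in row] for row in f]


g = quantize([[0, 2], [4, 8]], levels=4)
# g == [[0, 1], [2, 4]]
```

In the example the maximum input value maps to l itself, which matches the remark that g lies in the closed range [0, l].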

3.3 Weighted Haralick Features

To compare the performance of the aforementioned feature vectors, another test was made. Six Haralick features were chosen: four of the most used among the original 14 and two others. With $\theta $ fixed, the values of each feature computed at the different quantization levels l are combined, for each distance d, as

$$\begin{aligned} F_{i,d} = \frac{f_{i}^{(1)} (d) \cdot 3 + f_{i}^{(2)} (d) \cdot 4 + \dots + f_{i}^{(6)} (d) \cdot 8 }{3 + 4 + \dots + 8} = \frac{1}{33} \sum \limits _{l = 3}^{8} f_i^{(l-2)} (d) \cdot l . \end{aligned}$$

(4)

where $f_i^{(l-2)} (d)$ is the $i$-th Haralick feature for quantization level l ($l \in \{3,4,5,6,7,8\}$) and distance d ($d \in \{1,2,3,4\}$), with $i \in \{1,2,3,4,5,6\}$. Here, $f_1$, $f_2$, $f_3$, $f_4$, $f_5$ and $f_6$ are correlation, ASM, entropy, energy, sum entropy and sum variance, respectively [5]. The quantization level acts as the weight of the corresponding feature calculated from the GLCM.
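The weighted average of Eq. (4) can be sketched as follows for a single feature and a single distance. The function name and the dictionary layout (quantization level mapped to feature value) are illustrative assumptions; the Haralick features themselves are taken as already computed.

```python
def weighted_feature(values_by_level):
    """Eq. (4): average of one feature over levels, weighted by the level l.

    values_by_level maps each quantization level l (3..8) to the value of
    the i-th Haralick feature computed at that level for a fixed distance d.
    """
    total_weight = sum(values_by_level.keys())  # 3 + 4 + ... + 8 = 33
    weighted_sum = sum(l * v for l, v in values_by_level.items())
    return weighted_sum / total_weight


# If the feature has the same value at every level, the average equals it.
constant = weighted_feature({l: 1.0 for l in range(3, 9)})
# constant == 1.0
```

Because the weights grow with l, features measured on finely quantized images contribute more to $F_{i,d}$ than those measured on coarse ones.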

3.4 Tsallis Entropy

From the normalized GLCM described above, a set of features was extracted using Tsallis entropy. This measure generalizes the Boltzmann-Gibbs entropy from thermodynamics. Tsallis entropy is given by Eq. (5):

$$\begin{aligned} S = \frac{1-\sum \limits _{i=0}^k (p_i)^q}{q-1} . \end{aligned}$$

(5)

where $p_i$ corresponds to the probability of a gray level in a range of size k from a normalized GLCM matrix. The q parameter represents a real value that varies according to the application.
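A minimal sketch of Eq. (5) is given below. It assumes $q > 0$, so that zero probabilities contribute nothing to the sum. Eq. (5) is undefined at $q = 1$; the text does not say how that value was handled, so falling back to the Boltzmann-Gibbs (Shannon) limit is an assumption of this sketch.

```python
import math


def tsallis_entropy(probabilities, q):
    """Tsallis entropy of Eq. (5) for a discrete distribution, q > 0."""
    p = [v for v in probabilities if v > 0]  # zero entries add nothing
    if abs(q - 1.0) < 1e-12:
        # Assumption: use the q -> 1 limit, which is the Boltzmann-Gibbs
        # (Shannon) entropy with the natural logarithm.
        return -sum(v * math.log(v) for v in p)
    return (1.0 - sum(v ** q for v in p)) / (q - 1.0)


# Uniform distribution over four outcomes with q = 2:
# (1 - 4 * 0.25**2) / (2 - 1) = 0.75
s = tsallis_entropy([0.25, 0.25, 0.25, 0.25], q=2.0)
```

To apply it to a normalized GLCM, the matrix entries can be flattened into a single list of probabilities before calling the function.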

3.5 Classification

After the feature set was created, the next step was the classification of the mammography images. In this paper, a Support Vector Machine (SVM) was used through the LIBSVM library [7]. To obtain better classification results, the radial basis function (RBF) kernel was chosen. Equation (6) defines this kernel.

$$\begin{aligned} K(x,y) = e^{-\gamma ||x-y||^2} \; . \end{aligned}$$

(6)

where $\gamma $ is a positive number that can be chosen by the user. In our experiments, the best values of $\gamma $ and of the classification error penalty parameter, C, were provided by the Python script grid.py, distributed with [7].
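For reference, the kernel of Eq. (6) can be evaluated on two feature vectors as in the sketch below. It only illustrates the formula; it is not the LIBSVM code, and the function name is hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel of Eq. (6): exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)   # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)  # squared distance 2
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a larger $\gamma $ makes the decay faster.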

4 Database Acquisition

For this work, 594 images (297 mass and 297 non-mass ROIs) were randomly chosen from the Digital Database for Screening Mammography (DDSM) [8]. This database is the result of a collaborative effort and contains 2620 cases, divided into normal, benign and cancer cases.

4.1 ROI Extraction

For the ROI extraction step, an approach based on the chain code values provided in the DDSM text files was developed to minimize interference from the muscular region and the mammography background.

Using the chain code coordinates of the nodule boundary, four values were calculated: the maximum and minimum height and the maximum and minimum width. From these, a central point $(x_0,y_0)$ was calculated to serve as the bounding box center, as shown in Fig. 1(a).

According to [9], the two most efficient bounding box sizes for mammography classification are $32 \times 32$ and $64 \times 64$. In this paper, all ROIs were extracted with a $64\times 64$ bounding box centered on the pixel $(x_0, y_0)$ (see Fig. 1(b)).
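The ROI extraction described above can be sketched as follows. The sketch assumes the chain code has already been decoded into a list of (x, y) boundary points, takes the center as the integer midpoint of the extremes, and shifts the window so that it stays inside the image; the function names and the border handling are assumptions, since the text does not specify them.

```python
def bounding_box_center(boundary):
    """Center (x0, y0) of the box enclosing the decoded boundary points."""
    xs = [x for x, _ in boundary]
    ys = [y for _, y in boundary]
    return (min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2


def crop_roi(image, center, size=64):
    """Crop a size x size window around center, shifted to stay in bounds."""
    x0, y0 = center
    rows, cols = len(image), len(image[0])
    if rows < size or cols < size:
        raise ValueError("image is smaller than the requested ROI")
    # Assumption: near the border the window is shifted, not padded.
    left = min(max(x0 - size // 2, 0), cols - size)
    top = min(max(y0 - size // 2, 0), rows - size)
    return [row[left:left + size] for row in image[top:top + size]]


boundary = [(10, 20), (30, 20), (30, 60), (10, 60)]
center = bounding_box_center(boundary)
# center == (20, 40)
```

Here `image` is a list of rows, so `image[y][x]` addresses the pixel in column x and row y.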

5 Experiments

For all experiments a fixed $\theta $ was used, where $\theta \in \{0^\circ , 45^\circ , 90^\circ , 135^\circ \}$, and the distance d varied over four values ($d \in \{1,2,3,4\}$). These values were selected from the experiments described in [5]. For each ROI, six new quantized images (with $2^3, 2^4, 2^5, 2^6, 2^7, 2^8$ gray levels) were created. For each of these six images, the GLCM with the fixed $\theta $ was calculated for each of the four distances d. Each GLCM was normalized and its Tsallis entropy was extracted using Eq. (5), giving $6 \times 4 = 24$ features per ROI. As a comparative study, weighted Haralick features were extracted from the normalized GLCMs to compose another texture feature vector.
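The way the 24 entropy features are assembled (six quantization levels times four distances) can be sketched as below. The per-feature computation is passed in as a caller-supplied function, `extract`, which is a hypothetical placeholder for "quantize the ROI, build the normalized GLCM, return its Tsallis entropy"; the ordering of the features inside the vector is also an assumption.

```python
from itertools import product

LEVEL_BITS = (3, 4, 5, 6, 7, 8)  # images quantized to 2**3 ... 2**8 levels
DISTANCES = (1, 2, 3, 4)


def feature_vector(roi, theta, q, extract):
    """One feature per (quantization level, distance) pair for a fixed theta.

    extract is a hypothetical callable with the signature
    extract(roi, levels, theta, d, q) that returns a single number.
    """
    return [extract(roi, 2 ** bits, theta, d, q)
            for bits, d in product(LEVEL_BITS, DISTANCES)]


def dummy_extract(roi, levels, theta, d, q):
    """Stand-in that only encodes its arguments, to show the ordering."""
    return levels * 10 + d


features = feature_vector(None, theta=0, q=1.31, extract=dummy_extract)
# len(features) == 24
```

With the stand-in extractor, the first entry corresponds to 8 levels at distance 1 and the last to 256 levels at distance 4.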

Five measures were used to evaluate classification performance: accuracy (ACC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV) and negative predictive value (NPV).
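The five measures are not defined in the text; the sketch below uses their standard definitions in terms of confusion-matrix counts, taking mass as the positive class (an assumption). It does not guard against empty classes, which would cause a division by zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return ACC, SENS, SPEC, PPV and NPV from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "SENS": tp / (tp + fn),   # true positive rate
        "SPEC": tn / (tn + fp),   # true negative rate
        "PPV": tp / (tp + fp),    # precision of the positive class
        "NPV": tn / (tn + fn),    # precision of the negative class
    }


# Illustrative counts only, not results from the paper.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
# metrics["ACC"] == 0.85, metrics["SENS"] == 0.8, metrics["SPEC"] == 0.9
```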

5.1 Experiment I: Fixed Direction $\theta $ and $q \in [0.1, 1.9]$

For the first step of this experiment, 19 q-index values were selected in the range [0.1, 1.9], in steps of 0.1. Figure 2 presents the results of the Tsallis entropy feature extraction.

It is possible to conclude that, for most q-index values, the best results were obtained with $\theta = 0^\circ $.

Table 1 presents the three best results from Tsallis entropy feature extraction. The highest sensitivity was achieved in the last case (85.43%), with a corresponding specificity of 94.33%. This indicates that, for this pair of parameters, the classifier is better at identifying the non-mass group. The highest accuracy was 89.88%, obtained for two pairs of parameters, $(q, \theta ) = (1.3, 0^\circ )$ and $(q, \theta ) = (1.4, 0^\circ )$.
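As a consistency check on these figures, assume that the evaluation set contains the same number P of mass ROIs and N of non-mass ROIs (the text states this balance only for the full set of 594 images, so it is an assumption here). Accuracy is then the mean of the two class-wise rates, and the reported rates of 85.43% and 94.33% agree with the reported best accuracy:

$$\begin{aligned} \mathrm{ACC} = \frac{\mathrm{SENS} \cdot P + \mathrm{SPEC} \cdot N}{P + N} \overset{P = N}{=} \frac{\mathrm{SENS} + \mathrm{SPEC}}{2} = \frac{85.43\% + 94.33\%}{2} = 89.88\% . \end{aligned}$$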

Table 1. Performance of mass/non-mass classification for $q \in [0.1, 1.9]$


The second step consisted of extracting the weighted averages of the Haralick features as a comparative study for mammography classification. Table 2 provides the three best accuracy values for this strategy.

Table 2. Performance of mass/non-mass classification for weighted Haralick features


The highest accuracy was 83.40%, and the sensitivity attained in this case was 80.97%.

5.2 Experiment II: Two Refined Ranges: [1.31, 1.39] and [1.41, 1.49]

Based on the results of Experiment I, two refined q-index ranges were examined: [1.31, 1.39] and [1.41, 1.49], in steps of 0.01. Figure 3(a) and (b) show the accuracy obtained for each q-index in these ranges. Again, it is possible to conclude that $\theta = 0^\circ $ provides the best results.

The best accuracies were 91.3%, for the pair of parameters $(q, \theta ) = (1.31, 0^\circ )$ in the first range, and 89.68%, for $(q, \theta ) = (1.49, 0^\circ )$ in the second. Table 3 shows the three best results for these cases.

Table 3. Performance of mass/non-mass classification for $q \in [1.31, 1.39] \cup [1.41, 1.49]$


Finally, Table 4 compares some related works with the approach developed in this paper.

Table 4. Comparison with related works


6 Conclusions and Future Works

This paper presents an extensive set of tests using GLCM, Tsallis entropy and Haralick features. The main purpose was to analyze and select the q-index value that provides the highest accuracy for mass and non-mass classification in mammography exams. The entropic and the weighted Haralick feature strategies were compared, and all experiments used the same number of features. The results lead to the conclusion that Tsallis entropy is a promising measure, since it does not require a long feature vector.

As future work, the authors intend to investigate how to select the q-index value automatically. For the chosen range [0.1, 1.9], the best q-index value in this paper was 1.31, providing an accuracy of 91.3%.

References

1. Globocan Cancer Fact Sheets: Breast Cancer. http://globocan.iarc.fr/Default.aspx. Accessed 03 July 2015
2. Jalalian, A., Mashohor, S.B., Mahmud, H.R., Saripan, M.I.B., Ramli, A.R.B., Karasfi, B.: Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: a review. Clin. Imaging 37(3), 420–426 (2013)
3. Mavroforakis, M., Georgiou, H., Cavouras, D., Dimitropoulos, N., Theodoridis, S.: Mammographic mass classification using textural features and descriptive diagnostic data. In: 2002 14th International Conference on Digital Signal Processing, DSP 2002, vol. 1, pp. 461–464 (2002)
4. Martins, L., Junior, G.B., Silva, A.C., de Paiva, A.C., Gattass, M.: Detection of masses in digital mammograms using k-means and support vector machine. ELCVIA: Electron. Lett. Comput. Vis. Image Anal. 8(2), 39–50 (2009)
5. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 6, 610–621 (1973)
6. Mohanalin, B., Kalra, P.K., Kumar, N.: A novel automatic microcalcification detection technique using Tsallis entropy & a type II fuzzy index. Comput. Math. Appl. 60(8), 2426–2432 (2010)
7. Chang, C.-C., Lin, C.-J.: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
8. Heath, M., Bowyer, K., Kopans, D., Kegelmeyer, P., Moore, R., Chang, K., Munishkumaran, S.: Current status of the digital database for screening mammography. In: Karssemeijer, N., Thijssen, M., Hendriks, J., van Erning, L. (eds.) Digital Mammography, vol. 13, pp. 457–460. Springer, Heidelberg (1998)
9. Garcia-Manso, A., Garcia-Orellana, C.J., Gonzalez-Velasco, H., Gallardo-Caballero, R., Macias Macias, M.: Consistent performance measurement of a system to detect masses in mammograms based on blind feature extraction. BioMed. Eng. Online 12, 2–18 (2013)
10. Mata, B., Meenaksh, M.: A novel approach for automatic detection of abnormalities in mammograms. In: 2011 IEEE Recent Advances in Intelligent Computational Systems (RAICS), pp. 831–836 (2011)


Author information

Authors and Affiliations

Computer Science Department, Mathematics Institute, Federal University of Bahia, Salvador, Bahia, Brazil
Rafaela Alcântara, Perfilino Ferreira Junior & Aline Ramos


Corresponding author

Correspondence to Rafaela Alcântara.

Editor information

Editors and Affiliations

Pontificia Universidad Católica del Perú, Lima, Peru
César Beltrán-Castañón
Uppsala University, Uppsala, Sweden
Ingela Nyström
University of Ottawa, Ottawa, Ontario, Canada
Fazel Famili


About this paper

Cite this paper

Alcântara, R., Junior, P.F., Ramos, A. (2017). Tsallis Entropy Extraction for Mammographic Region Classification. In: Beltrán-Castañón, C., Nyström, I., Famili, F. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2016. Lecture Notes in Computer Science, vol 10125. Springer, Cham. https://doi.org/10.1007/978-3-319-52277-7_55


DOI: https://doi.org/10.1007/978-3-319-52277-7_55
Published: 16 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52276-0
Online ISBN: 978-3-319-52277-7
eBook Packages: Computer Science, Computer Science (R0)
