Stacked auto-encoder based tagging with deep features for content-based medical image retrieval

https://doi.org/10.1016/j.eswa.2020.113693

Highlights

  • The proposed method provides an effective and efficient solution for highly unbalanced medical benchmark datasets.

  • This is the first study in which data imbalance is addressed using the feature vector at the output of the fully connected layer (FCL).

  • The reduced search space is used more effectively.

  • Converting high-level features into a few digits using an unsupervised stacked auto-encoder (sAE) considerably improves performance.

Abstract

Content-based medical image retrieval (CBMIR) is one of the most challenging and ambiguous tasks used to minimize the semantic gap between images and human queries in datasets with rich information content. Similar to the human visual saliency mechanism, CBMIR systems also use the visual features in images for searching purposes. As a result of this search process, automatically accessing images is very convenient in large and balanced datasets, but such datasets are generally not available in the medical domain. In this study, an effective four-step hash code generation technique is presented to reduce the semantic gap between low-level features and high-level semantics for unbalanced medical image datasets. In the first stage, a convolutional neural network (CNN) architecture, the most effective feature representation method available today, is employed to extract discriminative features from images automatically. The features obtained in the last fully connected layer (FCL) at the output of the CNN architecture are used for hash code generation. In the second stage, the imbalance between the classes in the dataset is reduced using the Synthetic Minority Over-sampling Technique (SMOTE). Resolving the imbalance problem increases performance by almost 3%. In the third stage, the balanced features are converted into a code of 13 symbols using a deep stacked auto-encoder (sAE). Finally, this code is translated into the standard 13-character labeling and retrieval code used by the 'Image Retrieval in Medical Applications' (IRMA) dataset, since this is the database with which the experiments were conducted. In terms of IRMA error, classification performance, and retrieval performance, the proposed method is more successful than other state-of-the-art methods.

Introduction

Modern imaging technologies enable multi-dimensional and parametric visualization of objects, thanks to today's technological breakthroughs. Imaging devices, which can now even fit in our pockets, have been used in the medical domain since their early days because of the advantages they offer (Bartels, Bibbo, Wied, & Bahr, 2016). The increasing dependence on these devices causes a massive increase in the volume of digital images. In 2010, an average of 120 medical images were recorded per second for mammography alone (Krupinski, 2010). In 2016, 38 million magnetic resonance imaging (MRI) scans and 79 million computed tomography (CT) scans were recorded (Papanicolas, Woskie, & Jha, 2018). Analyzing this massive number of medical images is very difficult and time consuming for expert doctors. The severity of the situation becomes clear when considering that in some regions there is almost one doctor per 11,000 patients (Pandey, Singh, Singh, & Kumar, 2019). In the medical domain, where early diagnosis is vital for many diseases and directly related to human health, all images should be scrutinized. However, in most cases, the number of expert doctors does not allow this examination to be carried out quickly. In addition, errors caused by the human factor should be minimized (Stoean, Pelka, Nensa, & Friedrich, 2018). To overcome these drawbacks, computer-aided diagnosis (CAD) systems have long been a highly effective solution (Chen et al., 2020, Sobrinho et al., 2020). CAD systems are broad-spectrum systems that include artificial intelligence techniques, used to reduce the workload of specialist doctors and support their decisions. They are employed in many applications such as medical image segmentation, classification, enhancement, pre-processing, post-processing, and retrieval.
Recently, retrieval-based CAD systems have attracted considerable attention (Das and Neelima, 2020, Owais et al., 2019). CBMIR methods not only produce a classification result for images but also retrieve and visualize them. Such systems also assign specific, descriptive numbers or characters to each image, allowing images to be quickly saved in the database (Shi, et al., 2018). Then, using these numbers or characters, all disease information about an image can be retrieved (Zhang, Liu, Dundar, Badve, & Zhang, 2015).

Retrieval systems provide many positive contributions, such as more organized datasets, the addition of new images to the dataset by tagging, quick access to images in the dataset, and a classification result for each image. These systems are divided into two categories: text-based image retrieval and content-based image retrieval. In the early years, the text-based image retrieval method was used, which was based on representing each image with one or more text labels (Das & Neelima, 2020). Producing this manual text information is time consuming, repetitive, and not always reliable, and it cannot be applied to unannotated or unlabeled images. In addition, it requires the experience and knowledge of an expert doctor (Rajaei, Dallalzadeh, & Rangarajan, 2013). The content-based image retrieval (CBIR) method, an automatic retrieval system based on image features, has been proposed to eliminate these drawbacks (Al-Mohamade, Bchir, & Ben Ismail, 2020). These features, which relate to information such as the shape, color, texture, and edges of the objects in the images, are usually extracted using hand-crafted feature extraction methods. The incompatibility between these low-level features and high-level image concepts causes a 'semantic gap'. This gap negatively affects overall system performance by causing ambiguity between the query image and the generated features (Wang, et al., 2020).

The performance of retrieval systems is critically dependent on feature representation and similarity measurement. Today, the CNN architecture is accepted as the most effective solution to image processing problems (Sengupta et al., 2020). A CNN automatically extracts features from images at multiple levels. This capability for multiple transformations and representations plays a significant role in CNN architectures' ability to solve complex functions effectively. Besides, it helps to close the semantic gap by extracting discriminative features from images (Wei et al., 2019). Effective CNN architectures such as AlexNet (Krizhevsky, Sutskever, & Hinton, 2017), VGGNet (Simonyan & Zisserman, 2014), ResNet (He, Zhang, Ren, & Sun, 2016), and InceptionNet (Szegedy et al., 2015) are frequently used in the literature to represent image features robustly. Current CBIR studies often use the features produced by these CNN architectures (Abdel-Nabi et al., 2019, Saritha et al., 2018, Sezavar et al., 2019, Siradjuddin et al., 2019). These discriminative features are usually converted into binary codes by processing them with a classifier or a second algorithm. In another approach, the vectors obtained in the FCL are used to generate the hash code (Shi et al., 2018).
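The FCL projection described above can be illustrated with a minimal pure-Python sketch (not from the paper; toy dimensions are used here, whereas the proposed method's FCL output is a 2000-component vector produced by a full deep learning framework):

```python
def fcl_forward(features, weights, bias):
    """One fully connected layer: each output component is a weighted sum
    of all (flattened) input features plus a bias -- the projection that
    yields the fixed-length vector later used for hash-code generation."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]

# Toy example: 3 input features projected down to a 2-component vector.
vec = fcl_forward([1.0, 2.0, 3.0], [[1, 0, 0], [0, 1, 0]], [0.5, -0.5])
# vec == [1.5, 1.5]
```

In a real system the weights come from CNN training; here they are arbitrary values chosen only to make the arithmetic visible.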

The layers of the CNN structure enable it to produce high performance in specific tasks such as classification, segmentation, and detection. All layers except the FCL are used to learn features and represent them more efficiently. The FCL turns these raw features into classifiable vectors and performs the classification (Zhou, 2020). Feature vectors created automatically by FCLs are especially important in creating the hashing functions used for image retrieval. For this reason, the importance of FCLs in the CNN architecture cannot be ignored. On the other hand, the raw feature vectors produced by FCLs cannot directly generate discrete hash codes; quantization is needed to generate these codes (Cao et al., 2017, Tang et al., 2018). Image retrieval using binary codes is much faster than direct matching and less costly in terms of storage. Direct use of feature vectors taken from the FCL is inefficient due to their high dimensionality. Auto-encoders (AE) are well suited to creating such codes. However, raw images are not suitable for AE training; for this reason, the feature vectors in the FCL are used as the AE input (Camlica, Tizhoosh, & Khalvati, 2015a). An AE is a special kind of neural network that can encode the features at its input and express them with fewer parameters at minimum error. As a result of the hash code generation process, the feature vector can be represented with fewer parameters but the same meaning. Thanks to this property, AEs are beneficial in retrieval systems (Zhu, Wang, Bai, Yao, & Bai, 2016).
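As a loose illustration of the quantization step mentioned above: the paper itself converts features with a stacked auto-encoder, but the general idea of turning real-valued encoder outputs into discrete digits can be sketched with a simple uniform quantizer (a hypothetical stand-in, not the paper's method):

```python
def quantize(code, levels=10):
    """Map each real-valued component of an encoder output to one of
    `levels` discrete digits via min-max scaling (uniform quantizer)."""
    lo, hi = min(code), max(code)
    span = (hi - lo) or 1.0  # guard against a constant vector
    return [min(int((c - lo) / span * levels), levels - 1) for c in code]

digits = quantize([0.0, 0.5, 1.0])
# digits == [0, 5, 9]
```

The discrete digits can then be stored and compared far more cheaply than the original floating-point vector, which is the motivation the paragraph above gives for quantization.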

CBMIR systems can be used as clinical decision, education, research, and counseling systems. The importance of these systems has led many researchers to study this subject from past to present. Looking at the development of CBMIR systems from a broad perspective, hand-crafted features were used in the past, while automatic feature extraction methods are used today (Mohd Zin et al., 2018). In early retrieval systems, low-level, single-level features such as the Fourier transform (Bueno, Chino, Traina, Traina, & Azevedo-Marques, 2002), Gabor filters (Gang & Zong-Min, 2007), wavelet-based systems (Quellec, Lamard, Cazuguel, Cochener, & Roux, 2010), invariant moments (Afifi & Ashour, 2012), and co-occurrence matrices (Kwak et al., 2002) were used. These features are extracted from the color, texture, edges, and shape of the image and are low level. Although the resulting retrieval performance was not satisfactory, it was a big leap compared to text-based image retrieval systems. In the following years, the bag-of-visual-words (BoVW) framework was preferred to avoid the drawbacks of relying on a single feature. BoVW can be defined as a codebook of visual words, created by collecting samples taken from multiple salient keypoints (Iakovidis et al., 2009). In this period, studies focused on capturing saliency points in images and describing their regions of interest. For this purpose, scale-invariant feature transform (SIFT) (Zhi, Zhang, Zhao, Zhao, & Lin, 2009), speeded up robust features (SURF) (Lee & Kim, 2014), local binary patterns (LBP) (Camlica, Tizhoosh, & Khalvati, 2015b), histogram of oriented gradients (HOG) (Vijendran & Kumar, 2015), and GIST (Rupali & Bhakti, 2017) algorithms were used in CBIR systems. Such hand-crafted feature generation algorithms are still used because of the large amounts of data that CNNs require (Ahn, Kumar, Fulham, Feng, & Kim, 2019).
It is still challenging to find labeled and balanced medical datasets, yet a large number of images is required to train a CNN architecture. While some CBMIR researchers look for alternative solutions to this problem, such as the Radon transform (Babaie, Tizhoosh, Khatami, & Shiri, 2017), other researchers rely on CNN architectures. The siamese network architecture has proven advantageous in works based on unsupervised CNN architectures (Spitzer, Kiwitz, Amunts, Harmeling, & Dickscheid, 2018). In addition, supervised CNN architectures have also achieved promising results (Cai, Li, Qiu, Ma, & Gao, 2019). The fact that CNN algorithms produced stunning results in almost all areas of image processing did not escape the attention of CBIR researchers, who worked to transfer these architectures to the CBMIR area. For this purpose, structures such as transfer learning, shallow CNNs, and hybrid CNNs have been used.

The training and test procedures of the proposed study are carried out using the well-known X-ray image dataset called IRMA (Huang et al., 2003, Lehmann et al., 2004, Lehmann et al., 2005). Each image in the IRMA dataset is represented by an IRMA code consisting of 13 characters. In this study, each symbol of the IRMA code is referred to as a 'character', and each symbol of the codes produced in all steps of our algorithm except the normalization step is referred to as a 'digit'. This is because the IRMA code consists of numbers and letters, whereas the outputs produced by the proposed architecture consist of floating-point numbers (stored as variables of type double). The images in the IRMA dataset, which has a very unbalanced class distribution, are accessed with these codes (Khatami, Babaie, Tizhoosh, et al., 2018). In addition, a unique error value called the IRMA error is calculated to measure performance (Ahn et al., 2019, Khatami et al., 2018, Sriram et al., 2019).
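The exact IRMA error definition (per-position branching factors and wildcard handling) is not given in this excerpt; the sketch below only illustrates its commonly described ideas, under stated assumptions: deeper code positions cost less, positions with more alternatives cost less, and a mistake at one position makes all deeper positions count as wrong. The branching values used here are hypothetical.

```python
def hier_code_error(pred, truth, branching):
    """Depth-weighted hierarchical code error in the spirit of the IRMA
    error measure.  `branching[i]` is the (assumed) number of possible
    symbols at position i; an error at depth i propagates downward."""
    err, wrong = 0.0, False
    for i, (p, t) in enumerate(zip(pred, truth), start=1):
        if p != t:
            wrong = True           # everything below this node is now wrong
        if wrong:
            err += (1.0 / branching[i - 1]) * (1.0 / i)
    return err

# A mistake at the first (shallowest) position is penalised most.
e = hier_code_error("xb", "ab", [5, 5])
# e == (1/5)*(1/1) + (1/5)*(1/2) == 0.3
```

A correct prediction yields an error of 0, and errors at shallow positions dominate, which matches the intuition behind hierarchy-aware code evaluation.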

Details of CBMIR studies and specific methods used for the IRMA dataset are examined in this section. Tang, Liu, & Liu (2017) proposed a multi-scale single-layer stacked AE (sAE) structure for the classification of IRMA images. Feature matrices were obtained with the convolution operator, and these matrices were coded with Fisher vector encodings. Kundu, Chowdhury, & Das (2017) extracted global shape features with the pulse coupled neural network (PCNN) model. In the second part, they obtained local features with the contourlet transform. Khatami et al. (2018) reduced the search space using parallel CNN architectures. LBP, HOG, and Radon transformations, which are local feature extraction models, followed this structure. In this way, they represented the features by narrowing the search space further. Ahn et al. (2019) proposed a convolutional sparse kernel network (CSKN) to learn discriminative features from unlabeled medical images. Shamna, Govindan, & Abdul Nazeer (2018) presented a BoVW model based on spatial matching of the visual words with location-based correlation. They also suggested a skip similarity index for retrieval from the generated codes. Khatami, Nazari, Khosravi, Lim, & Nahavandi (2020) proposed a new generalization model based on noise perturbation for the CNN model, adding additive noise to the weights of the convolution layers in each iteration. Ahn et al. (2016) used an architecture called late fusion of domain-transferred CNN with spatial pyramid features; the performance of their method was quite high, with a 159.2 IRMA error score. Tizhoosh carried out various studies on the use of Radon barcodes and Gabor barcodes on the IRMA dataset (Nouredanesh et al., 2016, Tizhoosh, 2015). He combined Radon barcodes with a CNN architecture (Liu, Tizhoosh, & Kofman, 2016), and used them in the encoded local projections method (Tizhoosh & Babaie, 2018), in the last part of the LeNet architecture (Khatami et al., 2017), and in the Projectron architecture (Sriram, et al., 2019).
Tang, Yang, & Xia (2017) proposed an IRMA dataset retrieval method combining a texton dictionary with the locality-constrained linear coding technique.

Several retrieval and tagging studies are available in the literature for the IRMA dataset. These studies are generally based on hand-crafted feature extraction algorithms (Camlica et al., 2015b). Hand-crafted feature extraction methods generally require experience and are prone to error in multi-organ datasets. Besides, they may be inadequate for producing discriminative features. For this reason, researchers have tended to extract features from raw IRMA images using CNNs. However, accessing labeled medical datasets is very difficult, and the number of images in these datasets is insufficient to train a CNN architecture. Researchers have used pre-trained CNN architectures to avoid this problem, partially solving it by fine-tuning pre-trained CNN structures on their own datasets (Tizhoosh & Babaie, 2018). Several studies combine hand-crafted feature extraction methods such as Radon, LBP, and HOG with CNN architectures to improve the results obtained with traditional CNN architectures (Khatami, Babaie, Khosravi, et al., 2018). Such studies are inefficient in terms of computational complexity. To perform a more efficient analysis, researchers focus on methods based on feature vectors in the FCL. Thus, CNN-based CBMIR systems are increasing day by day (Shi et al., 2018). Apart from the CNN architecture, AE structures are also used to extract features from images and generate hash codes (Zhang, Dou, Ju, Xu, & Zhang, 2016). AE structures establish a specific relation between input and output, which allows the input vector to be represented in a different dimension at the output of the AE. AE architectures are generally used in the literature to reduce the vector length while preserving the important features of the input (Das & Walia, 2017). Thanks to these properties, they are advantageous for hash code generation in CBMIR studies. Using feature vectors in the FCL as the AE input can significantly improve performance.
In addition, the AE structure can generate robust hash codes automatically, without the need for ground-truth hash code information. Another contribution is that, by simply changing the output layer of the AE architecture, a hash code of any length can be generated automatically.
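Why short codes make retrieval fast can be sketched as follows (a hypothetical example, not the paper's retrieval procedure; the database entries and the Hamming-distance ranking are illustrative assumptions):

```python
def hamming(a, b):
    """Count of positions where two equal-length digit codes differ."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_code, database, top_k=3):
    """Rank stored (tag, code) pairs by Hamming distance to the query
    code -- comparing a handful of digits instead of thousands of
    floating-point feature components."""
    return sorted(database, key=lambda item: hamming(query_code, item[1]))[:top_k]

# Hypothetical 3-digit codes standing in for 13-digit IRMA-style codes.
db = [("chest", [1, 2, 3]), ("hand", [1, 2, 4]), ("skull", [9, 9, 9])]
best = retrieve([1, 2, 3], db, top_k=2)
# best == [("chest", [1, 2, 3]), ("hand", [1, 2, 4])]
```

Because each comparison touches only the code's few digits, a linear scan over a large database stays cheap, and the codes themselves are compact to store.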

In this study, a medical image tagging method consisting of four steps is presented to solve the problems mentioned above. In the first stage, features are extracted from raw medical images using a CNN architecture. The 2000-digit feature vector from the FCL of the CNN is used to generate retrieval codes. These codes are obtained from the IRMA dataset, which contains an unbalanced number of samples per class. For this reason, the imbalance is reduced by creating new class-guided codes using the SMOTE algorithm (Chawla, Bowyer, Hall, & Kegelmeyer, 2002). The new classes created by the SMOTE algorithm increase performance by reducing data imbalance. The newly generated codes have the same number of components as the feature vectors. These feature vectors must be encoded as vectors with the same dimensionality as the IRMA code (i.e., 13 components) without any loss of information. The sAE structure specially designed for this process solves the code length problem. The main contributions of the proposed framework are:
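The SMOTE balancing step applied to the FCL feature vectors can be sketched in a minimal pure-Python form (not the paper's implementation; practical work would use a library such as imbalanced-learn, and the 2-D vectors here stand in for the 2000-digit FCL features):

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples: pick a random minority
    vector, find its k nearest minority neighbours, and interpolate a
    random fraction of the way toward one of them (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # how far along the line segment to place the sample
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=5)
```

Because each synthetic vector lies on a segment between two real minority vectors, the oversampled class stays inside the region the minority class already occupies, unlike naive duplication.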

  • (1)

    Imbalance in sample distribution between classes is a widespread problem for medical datasets. The proposed method copes with this problem by balancing feature vectors with the help of the SMOTE algorithm, which is one of the most effective data oversampling methods in the literature.

  • (2)

    To the best of the author's knowledge, this is the first study that eliminates the inter-class imbalance problem using feature vectors in the FCL rather than using raw data from the input.

  • (3)

    The proposed sAE for the conversion of discriminative features produced by CNN into hash codes has increased the overall system performance considerably.

The rest of this paper is organized as follows: The technical details and parameters of the proposed method are described in Section 2. Information about the IRMA dataset, implementation details, and experimental results are given in Section 3. Finally, our conclusion is presented in Section 4.

Section snippets

An overview of the proposed framework

This study suggests an effective tagging method using an unbalanced medical dataset, as shown in Fig. 1. The proposed framework consists of four parts, three of which are the main parts. The fourth part may not be counted among the main parts, as it only converts the 13-digit real-valued codes to the IRMA dataset standard. First, a CNN architecture is developed to extract features from an unbalanced dataset containing different numbers of images from different parts of the body. This part

Dataset

The IRMA dataset was created from randomly selected samples of X-ray images obtained during routine radiology at the Department of Diagnostic Radiology, Aachen University of Technology (RWTH), Aachen, Germany (Huang et al., 2003, Lehmann et al., 2004, Lehmann et al., 2005). The X-ray images cover many body regions from various ages, genders, and viewing positions. The dataset consists of 12,677 training images and 1733 test images in 57 different image categories. These images have been converted to

Discussion

When the studies in the literature are examined, hand-crafted feature extraction was used in early studies, while CNN-based methods have recently come to the fore. However, CNN methods do not achieve the desired performance due to the lack of labeled medical datasets. For this reason, features produced by CNNs and hand-crafted features have been combined. This approach, however, is highly dependent on human experience and time-consuming. To prevent this, this study focuses on automatically generating code

Conclusion

This study presents an effective CBIR method that can generate code for medical images and can also be used for retrieval. Instead of producing codes directly from images, this study follows the approach of generating code using deep features. Besides, an effective vector over-sampling approach is introduced for unbalanced medical image datasets. Accordingly, the vector-book is expanded using the well-known SMOTE method; unlike the image-similarity-based image augmentation approach, this method

CRediT authorship contribution statement

Şaban Öztürk: Conceptualization, Software, Methodology, Resources, Data curation, Validation, Formal analysis, Investigation, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under grant number 120E018.

Human and animal rights

The paper does not contain any studies with human participants or animals performed by any of the authors.

References (69)

  • Z. Zhu et al.

    Deep learning representation using autoencoder for 3D Shape Retrieval

    Neurocomputing

    (2016)
  • H. Abdel-Nabi et al.

    Content Based Image Retrieval Approach using Deep Learning

    (2019)
  • A.J. Afifi et al.

    Content-Based Image Retrieval Using Invariant Color and Texture Features

    (2012)
  • E. Ahn et al.

    X-ray image classification using domain transferred convolutional neural networks and local sparse spatial pyramid

    (2016)
  • A. Al-Mohamade et al.

    Multiple query content-based image retrieval using relevance feature weight learning

    Journal of Imaging

    (2020)
  • M. Babaie et al.

    Local radon descriptors for image search

    (2017)
  • P.H. Bartels et al.

    Objective cell image analysis

    Journal of Histochemistry & Cytochemistry

    (2016)
  • J.M. Bueno et al.

    How to add content-based image retrieval capability in a PACS

    (2002)
  • Y. Cai et al.

    Medical image retrieval based on convolutional neural network and supervised hashing

    IEEE Access

    (2019)
  • Z. Camlica et al.

    Autoencoding the retrieval relevance of medical images

    (2015)
  • Z. Camlica et al.

    Medical Image Classification via SVM Using LBP Features from Saliency-Based Folded Data

    (2015)
  • Y. Cao et al.

    Deep Visual-Semantic Quantization for Efficient Image Retrieval

    (2017)
  • N.V. Chawla et al.

    SMOTE: Synthetic minority over-sampling technique

    Journal of Artificial Intelligence Research

    (2002)
  • T. Chen et al.

    Computer-aided diagnosis of gallbladder polyps based on high resolution ultrasonography

    Computer Methods and Programs in Biomedicine

    (2020)
  • P. Das et al.

    Content-Based Medical Visual Information Retrieval

    Hybrid Machine Intelligence for Medical Image Analysis

    (2020)
  • R. Das et al.

    Partition selection with sparse autoencoders for content based image classification

    Neural Computing and Applications

    (2017)
  • Z. Gang et al.

    Texture feature extraction and description using gabor wavelet in content-based medical image retrieval

    (2007)
  • K. He et al.

    Deep Residual Learning for Image Recognition

    (2016)
  • Huang, H. K., Lehmann, T. M., Ratib, O. M., Schubert, H., Keysers, D., Kohnen, M., & Wein, B. B. (2003). The IRMA code...
  • Huang, Y., Huang, K., Yu, Y., & Tan, T. (2011). Salient coding for image classification. In Cvpr 2011 (pp....
  • D.K. Iakovidis et al.

    A pattern similarity scheme for medical image retrieval

    IEEE Transactions on Information Technology in Biomedicine

    (2009)
  • Khatami, A., Babaie, M., Khosravi, A., Tizhoosh, H. R., Salaken, S. M., & Nahavandi, S. (2017). A deep-structural...
  • A. Krizhevsky et al.

    ImageNet classification with deep convolutional neural networks

    Communications of the ACM

    (2017)
  • E.A. Krupinski

    Current perspectives in medical image perception

    Attention, Perception & Psychophysics

    (2010)