Weighted IForest and siamese GRU on small sample anomaly detection in healthcare

doi:10.1016/j.cmpb.2022.106706

Computer Methods and Programs in Biomedicine

Volume 218, May 2022, 106706

https://doi.org/10.1016/j.cmpb.2022.106706

Highlights

• Proposes a weighted IForest algorithm to label a small part of the data; expert decision-making rules and a logistic regression algorithm are used to obtain the feature weights.
• Improves the FDA function and uses it as the loss function of the SGRU to improve the accuracy of the algorithm.

Abstract

Background and objective: At present, many achievements have been made in anomaly detection on big data using deep neural networks. However, many practical application scenarios still face problems such as data shortage and the heavy workload of manual data annotation.

Methods: This paper proposes a weighted IForest and Siamese GRU (WIF-SGRU) algorithm for small-sample anomaly detection. In the data annotation stage, a weighted IForest algorithm automatically annotates unlabeled data. In the training phase of the anomaly detection model, a Siamese GRU is trained on the target data to obtain the anomaly model and to detect anomalies in small-sample data in real time.
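The Methods paragraph describes a Siamese GRU that learns distances between samples. The Python sketch below is only a rough illustration of the forward pass of such an architecture. A single GRU encoder (standard update-gate/reset-gate equations) is shared by both branches, and the Euclidean distance between the two final hidden states serves as the pair distance. Several details are assumptions: the class and function names (`GRUEncoder`, `siamese_distance`), the hidden size, the random initialisation, and the choice to present each record as a short sequence of feature vectors. The abstract does not say how records are fed to the network, and training is omitted.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class GRUEncoder:
    """Single-layer GRU (hypothetical sketch). One instance encodes both
    members of a pair, so the two Siamese branches share all weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(hidden_size)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-scale, scale) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_size = hidden_size
        # W: input-to-hidden, U: hidden-to-hidden, b: bias, for the
        # update gate z, reset gate r and candidate state n
        self.W: Dict[str, Matrix] = {g: mat(hidden_size, input_size) for g in "zrn"}
        self.U: Dict[str, Matrix] = {g: mat(hidden_size, hidden_size) for g in "zrn"}
        self.b: Dict[str, Vector] = {g: [0.0] * hidden_size for g in "zrn"}

    def _affine(self, g: str, x: Sequence[float], h: Sequence[float]) -> Vector:
        return [a + c + d for a, c, d in
                zip(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]

    def step(self, x: Sequence[float], h: Vector) -> Vector:
        z = [sigmoid(v) for v in self._affine("z", x, h)]
        r = [sigmoid(v) for v in self._affine("r", x, h)]
        # candidate state sees the reset-gated previous state r * h
        gated = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._affine("n", x, gated)]
        # new state: convex mix of previous state and candidate
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def encode(self, seq: Sequence[Sequence[float]]) -> Vector:
        h = [0.0] * self.hidden_size
        for x in seq:
            h = self.step(x, h)
        return h


def siamese_distance(enc: GRUEncoder, seq_a, seq_b) -> float:
    """Euclidean distance between the final hidden states of the two branches."""
    ha, hb = enc.encode(seq_a), enc.encode(seq_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ha, hb)))


if __name__ == "__main__":
    enc = GRUEncoder(input_size=2, hidden_size=4, seed=0)
    normal = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1]]
    shifted = [[3.0, -2.0], [2.5, -1.5], [3.0, -2.0]]
    print(siamese_distance(enc, normal, normal))   # identical inputs
    print(siamese_distance(enc, normal, shifted))  # untrained: illustrative only
```

Both branches share one set of weights. As a result, identical inputs always map to distance 0 and the distance is symmetric. A training loss on labeled pairs would then determine which inputs end up close together.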

Results: The proposed algorithm is verified on six public datasets (Arrhythmia, Shuttle, Satellite, Satimage-2, Lymphography, and WBC). The experimental results show that, compared with traditional data annotation and anomaly detection algorithms, the weighted IForest and Siamese GRU algorithm improves both accuracy and real-time performance.

Conclusions: This paper proposes a weighted IForest and Siamese GRU algorithm architecture, which provides a more accurate and efficient method for outlier detection. First, the framework uses the improved IForest algorithm to label unlabeled data. Then the Siamese GRU is optimized with the improved $FDA_{loss}$ function, and the optimized network learns the distance between data points for real-time, efficient anomaly detection. Experiments show that the framework has good potential.
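The abstract does not define $FDA_{loss}$. One possible reading, which this snippet does not confirm, is that FDA stands for Fisher discriminant analysis. Under that reading, a Fisher-style objective on the Siamese network's output distances would favour a large gap between the mean distance of dissimilar pairs and that of similar pairs, relative to the variance within each group. The sketch below computes the inverse of that Fisher ratio as a loss. The name `fisher_pair_loss` and the `eps` term are hypothetical, and this is not the authors' improved function.

```python
from statistics import fmean, pvariance
from typing import Sequence


def fisher_pair_loss(distances: Sequence[float], same_class: Sequence[bool],
                     eps: float = 1e-8) -> float:
    """Hypothetical inverse Fisher criterion on pair distances:
    (var_same + var_diff) / ((mean_diff - mean_same) ** 2 + eps).
    Lower is better: a large gap between the two groups' mean distances
    relative to their spread. The gap is squared, so this term alone does
    not enforce mean_diff > mean_same."""
    same = [d for d, s in zip(distances, same_class) if s]
    diff = [d for d, s in zip(distances, same_class) if not s]
    if not same or not diff:
        raise ValueError("need at least one similar and one dissimilar pair")
    between = (fmean(diff) - fmean(same)) ** 2
    within = pvariance(same) + pvariance(diff)
    return within / (between + eps)


if __name__ == "__main__":
    labels = [True, True, True, False, False, False]
    separated = [0.1, 0.2, 0.15, 2.0, 2.2, 1.9]
    overlapping = [0.9, 1.2, 1.0, 1.1, 1.3, 0.8]
    print(fisher_pair_loss(separated, labels), fisher_pair_loss(overlapping, labels))
```

Well-separated distance groups give a much smaller loss than overlapping ones. Minimising this quantity over the encoder's weights would therefore push similar pairs together and dissimilar pairs apart.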

Introduction

Anomaly detection in structured data is an important part of the anomaly detection field. Detecting anomalies in massive data is an important research direction and a very difficult task. The main goal of anomaly detection is to identify the small amount of data that is inconsistent with the general characteristics of the data, because such data may carry important information about critical situations. Anomaly detection has a wide range of applications, such as credit card fraud detection and network attack detection. Commonly used anomaly detection algorithms include statistical hypothesis testing [1], DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [2], [3], One-Class SVM [4], K-means clustering [5], IForest (Isolation Forest) [6], [7], and deep-learning-based anomaly detection [8], [9]. These algorithms are mature and provide a good basis for anomaly detection. However, some difficulties remain in practical applications. For example, using deep neural networks for anomaly detection requires a large amount of labeled training data, yet traditional manual annotation is highly subjective and consumes considerable human and material resources. At the same time, training data are often scarce for various reasons, so specific application scenarios lack enough data points. Many algorithms and conventional neural network architectures therefore cannot build a good anomaly detection model, nor meet the need for efficient, real-time anomaly detection. To address the difficulty of data annotation and the impact of insufficient samples on building the anomaly model, this paper proposes an algorithm (WIF-SGRU) based on weighted IForest and Siamese GRU (Gated Recurrent Unit) to detect anomalies in unlabeled small-sample data.
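As background for the IForest baseline cited above, the standard Isolation Forest scores a point by how quickly random axis-parallel splits isolate it. The mean path length E[h(x)] over the trees is normalised by c(n), the average path length of an unsuccessful binary-search-tree lookup among n points, giving s(x, n) = 2^(-E[h(x)]/c(n)). The minimal Python sketch below computes only this normalisation and score, using the usual harmonic-number approximation H(i) ≈ ln(i) + 0.5772. The function names are illustrative and are not taken from the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search among n points,
    used by standard IForest to normalise isolation path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); requires n >= 2."""
    return 2.0 ** (-mean_path_length / c_factor(n))


if __name__ == "__main__":
    n = 256  # subsample size commonly used with IForest
    for h in (2.0, c_factor(n), 15.0):
        print(f"E[h(x)] = {h:5.2f} -> s = {anomaly_score(h, n):.3f}")
```

Scores close to 1 mean a point is isolated much faster than average. A score of exactly 0.5 corresponds to an average path length equal to c(n), and scores well below 0.5 indicate normal points.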

Notably, anomaly detection is widely used across many fields, and the success of the proposed model on a variety of datasets further illustrates its validity. The main contributions of this work are summarized as follows:

1) Building on existing studies, we propose a weighted IForest algorithm to label a small part of the data; expert decision-making rules and a logistic regression algorithm are used to obtain the feature weights.
2) We improve the FDA function so that it can be adapted to training a Siamese network, and we use it as the loss function of the SGRU to improve the accuracy of the algorithm.
3) We apply our model to anomaly detection for health monitoring and achieve significant performance improvements.
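Contribution 1 combines expert rules with logistic regression to weight features, but the snippet does not give the exact procedure. The Python sketch below assumes one plausible reading. First, fit a plain logistic regression by batch gradient descent on the small expert-labeled subset (label 1 = anomaly). Then normalise the absolute coefficients so they sum to 1. The names `fit_logistic_regression` and `feature_weights`, the learning rate, and the epoch count are hypothetical. Features should be standardised first so that coefficient magnitudes are comparable.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                            lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean log-loss; returns [bias, w_1, ..., w_d]."""
    n, d = len(X), len(X[0])
    params = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = params[0] + sum(w * v for w, v in zip(params[1:], xi))
            err = sigmoid(z) - yi  # d(log-loss)/dz for this sample
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        params = [p - lr * g / n for p, g in zip(params, grad)]
    return params


def feature_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Hypothetical weighting rule: absolute coefficients normalised to sum to 1."""
    coefs = [abs(w) for w in fit_logistic_regression(X, y)[1:]]
    total = sum(coefs) or 1.0  # guard against all-zero coefficients
    return [c / total for c in coefs]


if __name__ == "__main__":
    # Toy "expert-labeled" subset: feature 0 separates anomalies, feature 1 barely does.
    X = [[-1.0, 0.3], [-0.8, -0.2], [-1.2, 0.1], [-0.9, -0.4],
         [1.1, 0.2], [0.9, -0.3], [1.3, 0.4], [1.0, -0.1]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(feature_weights(X, y))
```

On separable data such as this toy set, the raw coefficients keep growing as the number of epochs increases. For that reason only the normalised, relative weights are used.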

The rest of this paper is organized as follows. Section 2 reviews existing algorithms and related studies on anomaly detection. Section 3 describes the anomaly detection pipeline, from data annotation to model construction, together with the proposed improvements. Section 4 presents the experiments and results. Section 5 analyzes the experimental results, draws conclusions, and outlines future work.

Section snippets

Related work

Anomaly detection is used to discover data that do not match the general characteristics of the data. It has a wide range of applications, such as social network anomaly detection. Chaudhary et al. use deep learning to detect anomalies in email and Twitter networks, proposing a neural network model that is applied to social connection graphs [10]. M. Venkatesan et al. proposed a graph-based unsupervised machine learning method for edge and

Anomaly detection model

The main problems addressed by the anomaly detection model are the difficulty of data annotation and the effect of insufficient sample size on model training. The algorithm is divided into three steps:

1) Determination of weights: a small part of the data is annotated through expert decision-making rules; then the importance weight of each feature is learned through logistic regression to determine the influence of each feature on the anomaly result.
2) Training data annotation: The obtained weight is
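Step 2 is truncated in this snippet, so how the learned weights enter the forest is not shown. One plausible reading, sketched below purely as an assumption, is to draw each split feature with probability proportional to its weight instead of uniformly. The rest of the standard IForest procedure stays the same: subsamples of size ψ, a random split value between the node's minimum and maximum, a depth limit of ceil(log2 ψ), and the c(n) correction at leaves. `WeightedIForest`, `build_tree`, `path_length`, and the default parameters are hypothetical names and values.

```python
import math
import random
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average unsuccessful-search path length used by standard IForest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


class Node:
    """Internal node (feature and split set) or leaf (feature is None)."""

    def __init__(self, size: int = 0, feature: Optional[int] = None,
                 split: float = 0.0, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None):
        self.size, self.feature, self.split = size, feature, split
        self.left, self.right = left, right


def build_tree(X: List[Sequence[float]], weights: Sequence[float],
               depth: int, max_depth: int, rng: random.Random) -> Node:
    if depth >= max_depth or len(X) <= 1:
        return Node(size=len(X))
    # Assumption: split feature drawn with probability proportional to its weight
    f = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    lo, hi = min(x[f] for x in X), max(x[f] for x in X)
    if lo == hi:  # cannot split on this feature here; treat the node as a leaf
        return Node(size=len(X))
    s = rng.uniform(lo, hi)
    left = [x for x in X if x[f] < s]
    right = [x for x in X if x[f] >= s]
    return Node(feature=f, split=s,
                left=build_tree(left, weights, depth + 1, max_depth, rng),
                right=build_tree(right, weights, depth + 1, max_depth, rng))


def path_length(x: Sequence[float], node: Node, depth: int = 0) -> float:
    if node.feature is None:  # leaf: add expected depth of the unbuilt subtree
        return depth + c_factor(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(x, child, depth + 1)


class WeightedIForest:
    def __init__(self, weights: Sequence[float], n_trees: int = 100,
                 sample_size: int = 64, seed: int = 0):
        self.weights = list(weights)
        self.n_trees, self.sample_size = n_trees, sample_size
        self.rng = random.Random(seed)
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, X: List[Sequence[float]]) -> "WeightedIForest":
        self.psi = min(self.sample_size, len(X))  # needs at least 2 points
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [build_tree(self.rng.sample(X, self.psi), self.weights,
                                 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Sequence[float]) -> float:
        """Standard IForest score 2^(-E[h(x)]/c(psi)); higher = more anomalous."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c_factor(self.psi))


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
    X.append([8.0, 0.0])  # one point far out on the high-weight feature 0
    forest = WeightedIForest(weights=[0.8, 0.2]).fit(X)
    print("outlier:", round(forest.score([8.0, 0.0]), 3))
    print("inlier: ", round(forest.score([0.0, 0.0]), 3))
```

Points with scores near 1 would be candidate anomalies. This snippet does not state how the paper then thresholds the scores to produce training labels.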

Experimental results and analysis

The experimental data in this paper comprise six common datasets: Arrhythmia, Shuttle, Satellite, Satimage-2, Lymphography, and WBC. These datasets were selected because they offer a sufficient variety of samples, large differences between samples, and suitable data sizes. They are used by most anomaly detection research papers, which gives them high comparative value, and the proportion of abnormal data in them is relatively low, which is more suitable for

Conclusions

This paper proposes a weighted IForest and Siamese GRU algorithm architecture, which provides a more accurate and efficient method for outlier detection. First, the framework uses the improved IForest algorithm to label unlabeled data. Then the Siamese GRU is optimized with the improved $FDA_{loss}$ function, and the optimized network learns the distance between data points for real-time, efficient anomaly detection. Experiments show that the framework has good potential. In the

Declaration of Competing Interest

The authors declare that they have no conflicts of interest.

Acknowledgment

The author is very grateful to the editor and reviewer for their comments and suggestions, which will guide further improvement of this work. This work was partially supported by key national projects.

References (31)

Y. Wang
Application of artificial intelligence based on deep learning in breast cancer screening and imaging diagnosis
Neural Comput. Appl.
(2021)
Y. Zhang et al.
PEA: Parallel electrocardiogram-based authentication for smart healthcare system
J. Netw. Comput. Appl.
(2018)
Z. Chen et al.
Learning graph structures with transformer for multivariate time series anomaly detection in iot
IEEE Internet Things J.
(2021)
P.J. Rousseeuw et al.
A fast algorithm for the minimum covariance determinant estimator
Technometrics
(1999)
A.S. Jalal et al.
A density based algorithm for discovering density varied clusters in large spatial databases
Int. J. Comput. Appl.
(2010)
M. Yang et al.
Deep learning and one-class SVM based anomalous crowd detection
2019 International Joint Conference on Neural Networks (IJCNN)
(2019)
Min Chen, Wenjing Xiao, Long Hu, Yujun Ma, Yin Zhang, and Guangming Tao. 2021. Cognitive Wearable Robotics for Autism...
M. Chen et al.
Digital medical education empowered by intelligent fabric space
28 National Science Open
(2022)
B. Hu et al.
Bio-inspired visual neural network on spatio-temporal depth rotation perception
Neural Comput. Appl.
(2021)
W. Luo et al.
Remembering history with convolutional LSTM for anomaly detection
2017 IEEE International Conference on Multimedia and Expo (ICME)
(2017)
A. Chaudhary et al.
Anomaly detection using graph neural networks
2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India
(2019)
M. Venkatesan et al.
Graph based unsupervised learning methods for edge and node anomaly detection in social network
2019 IEEE 1st International Conference on Energy, Systems and Information Processing (ICESIP)
(2019)
S. Chen et al.
Anomaly subgraph mining in large-scale social networks
2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), Xiamen, China
(2019)
A. Deng et al.
Graph neural network-based anomaly detection in multivariate time series
Proceedings of the AAAI Conference on Artificial Intelligence
(2021)
F. Cauteruccio
A framework for anomaly detection and classification in multiple IoT scenarios
Future Gener. Comput. Syst.
(2021)

