Weighted IForest and siamese GRU on small sample anomaly detection in healthcare
Introduction
Anomaly detection on structured data is an important branch of anomaly detection, and detecting anomalies in massive data is both an important research direction and a difficult task. The goal is to identify the small number of data points that differ significantly from the general characteristics of the data, since these often carry important information about critical situations. Anomaly detection has a wide range of applications, such as credit card fraud detection and network attack detection. Commonly used anomaly detection algorithms include statistical hypothesis testing [1], DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [2], [3], One-Class SVM [4], K-means clustering [5], IForest (Isolation Forest) [6], [7], and deep-learning-based anomaly detection [8], [9]. These mature algorithms provide a good basis for anomaly detection.
However, some difficulties remain in practical applications. Deep neural networks, for example, require a large amount of labeled data for training, yet traditional annotation is highly subjective and consumes manpower and material resources. In addition, training data are often scarce in specific use scenarios for various reasons. As a result, many algorithms and traditional neural network structures cannot build a good anomaly detection model and cannot meet the need for efficient, real-time anomaly detection. To address the difficulty of data annotation and the effect of insufficient samples on model building, this paper proposes an algorithm (WIF-SGRU) based on weighted IForest and Siamese GRU (Gated Recurrent Unit) to detect anomalies in unlabeled small-sample data.
Notably, anomaly detection has been widely used in various fields. The success of the proposed model on a variety of datasets further illustrates its validity. Accordingly, the main contributions of our model are summarized as follows:
- 1) Based on existing studies, we propose the weighted IForest algorithm to label a small part of the data. Expert decision-making rules and a logistic regression algorithm are used to obtain the feature weights.
- 2) Our model improves the FDA function so that it can be adapted to the training of a Siamese network, and uses it as the loss function of SGRU to improve the accuracy of the algorithm.
- 3) We apply our model to anomaly detection for health monitoring and achieve significant performance improvements.
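To make the feature-weighting idea in the first contribution concrete, the following is a minimal sketch, not the authors' implementation. It fits a logistic regression by plain gradient descent on a tiny, hypothetical expert-labeled subset and turns the coefficients into feature weights. The data, the function names, and the normalization (absolute coefficients scaled to sum to 1) are all assumptions; the source does not state how the weights are derived from the regression.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (coefs, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def feature_weights(coefs):
    """ASSUMED normalization: absolute coefficients scaled to sum to 1."""
    total = sum(abs(c) for c in coefs)
    return [abs(c) / total for c in coefs]

# Hypothetical expert-labeled subset (1 = anomaly): feature 0 separates
# the two classes, feature 1 is close to uninformative.
X_labeled = [[0.10, 0.50], [0.20, 0.40], [0.15, 0.60],
             [0.90, 0.50], [0.95, 0.45], [0.85, 0.55]]
y_labeled = [0, 0, 0, 1, 1, 1]

coefs, bias = fit_logistic(X_labeled, y_labeled)
weights = feature_weights(coefs)
print(weights)  # the informative feature receives the larger weight
```

Coefficient magnitudes are only comparable across features when the features share a scale, so in practice the inputs would normally be standardized before fitting.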
The rest of this paper is organized as follows. The second section reviews existing algorithms and related studies on anomaly detection. The third section introduces the anomaly detection model, from data annotation to model construction, and describes the proposed improvements. The fourth section presents the experiments and results. The fifth section analyzes the experimental results, draws conclusions, and outlines future work.
Section snippets
Related work
Anomaly detection is used to discover data that do not match the general characteristics of the data. It has a wide range of applications. In social network anomaly detection, for example, Chaudhary uses deep learning to detect anomalies in email networks and Twitter networks, and proposes a neural network model that is applied to social connection graphs to detect anomalies [10]. M. Venkatesan and others proposed a graph-based unsupervised machine learning method for edge and
Anomaly detection model
The main problems of the anomaly detection model are the difficulty of data annotation and the influence of insufficient sample size on model training. The algorithm is divided into three steps:
- 1) Determination of weights: A small part of the data is annotated through expert decision-making rules. Then the importance weight of each feature is learned through logistic regression to determine the influence of each feature on the anomaly result.
- 2) Training data annotation: The obtained weight is
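As a rough illustration of how feature weights could enter an isolation forest, the sketch below draws each split feature with probability proportional to its weight. This weighting rule is an assumption: the snippet above is truncated before it explains how the weights are used. The data, the weights, and all function names are hypothetical, and subsampling is omitted for brevity.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, weights, rng, depth=0, max_depth=8):
    """Grow one isolation tree. ASSUMPTION: split feature is drawn in proportion to weights."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feat = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:  # simplification: stop when the chosen feature is constant here
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return {
        "feat": feat,
        "split": split,
        "left": build_tree(left, weights, rng, depth + 1, max_depth),
        "right": build_tree(right, weights, rng, depth + 1, max_depth),
    }

def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the usual correction for unsplit leaves."""
    if "feat" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["feat"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)

def anomaly_score(x, trees, n_samples):
    """Isolation-forest score in (0, 1]; values near 1 suggest an anomaly."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(n_samples))

# Hypothetical data: 64 inliers around the origin and one outlier on feature 0.
data_rng = random.Random(42)
inliers = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(64)]
outlier = [8.0, 0.0]
rows = inliers + [outlier]

weights = [0.9, 0.1]  # hypothetical feature weights, e.g. from logistic regression
tree_rng = random.Random(0)
trees = [build_tree(rows, weights, tree_rng) for _ in range(100)]

outlier_score = anomaly_score(outlier, trees, len(rows))
inlier_scores = [anomaly_score(x, trees, len(rows)) for x in inliers]
print(outlier_score > max(inlier_scores))
```

Under this rule, features judged more important are split on more often, so deviations along those features shorten the path length faster. Scores above a chosen threshold could then serve as provisional anomaly labels for the unlabeled data.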
Experimental results and analysis
The experimental data in this paper include six common datasets: Arrhythmia, Shuttle, Satellite, Satimage-2, Lymphography, and WBC. These datasets were selected for the sufficient variety of their samples and the large gap between samples, along with the size and appropriateness of the data. They are used by most anomaly detection research papers and therefore have high comparative value, and the proportion of abnormal data in them is relatively low, which is more suitable for
Conclusions
This paper proposes a weighted IForest and Siamese GRU algorithm architecture, which provides a more accurate and efficient method for outlier detection. First, the framework uses the improved IForest algorithm to label the unlabeled data. Then the Siamese GRU is optimized with the improved loss function, and the optimized network is used to learn the distance between data points for real-time and efficient anomaly detection. Experiments show that the framework has good potential. In the
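The idea of a Siamese network that learns a distance between samples can be sketched as follows. This is not the paper's method. The improved FDA loss is not specified in this excerpt, so the standard contrastive loss is swapped in, and a single tanh layer stands in for the GRU encoder. The weight matrix, the sample values, and the margin are hypothetical.

```python
import math

def encode(x, W):
    """Shared encoder stand-in: one tanh layer (the paper uses a GRU here)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def contrastive_loss(d, same, margin=1.0):
    """Standard contrastive loss: pull same-class pairs together and
    push different-class pairs at least `margin` apart."""
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Both inputs of a pair pass through the SAME weights W (the Siamese property).
W = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical encoder weights
normal_a = [0.10, 0.20]        # hypothetical normal samples
normal_b = [0.12, 0.18]
anomaly = [2.0, -1.5]          # hypothetical anomalous sample

d_same = distance(encode(normal_a, W), encode(normal_b, W))
d_diff = distance(encode(normal_a, W), encode(anomaly, W))

loss_same = contrastive_loss(d_same, same=True)
loss_diff = contrastive_loss(d_diff, same=False)
print(d_same < d_diff, loss_same, loss_diff)
```

After training, a new sample can be flagged as anomalous when its embedding distance to known normal samples exceeds a threshold, which is what makes the approach usable with few labeled examples.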
Declaration of Competing Interest
The authors declare that they have no conflicts of interest.
Acknowledgment
The author is very grateful to the editor and reviewer for their comments and suggestions. The author will actively improve and learn according to the suggestions. This work was partially supported by key national projects.
References (31)
- Application of artificial intelligence based on deep learning in breast cancer screening and imaging diagnosis. Neural Comput. Appl. (2021)
- et al. PEA: Parallel electrocardiogram-based authentication for smart healthcare system. J. Netw. Comput. Appl. (2018)
- et al. Learning graph structures with transformer for multivariate time series anomaly detection in iot. IEEE Internet Things J. (2021)
- et al. A fast algorithm for the minimum covariance determinant estimator. Technometrics (1999)
- et al. A density based algorithm for discovering density varied clusters in large spatial databases. Int. J. Comput. Appl. (2010)
- et al. Deep learning and one-class SVM based anomalous crowd detection. 2019 International Joint Conference on Neural Networks (IJCNN) (2019)
- Min Chen, Wenjing Xiao, Long Hu, Yujun Ma, Yin Zhang, and Guangming Tao. 2021. Cognitive Wearable Robotics for Autism...
- et al. Digital medical education empowered by intelligent fabric space. 28 National Science Open (2022)
- et al. Bio-inspired visual neural network on spatio-temporal depth rotation perception. Neural Comput. Appl. (2021)
- et al. Remembering history with convolutional LSTM for anomaly detection. 2017 IEEE International Conference on Multimedia and Expo (ICME) (2017)
- Anomaly detection using graph neural networks. 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India
- Graph based unsupervised learning methods for edge and node anomaly detection in social network. 2019 IEEE 1st International Conference on Energy, Systems and Information Processing (ICESIP)
- Anomaly subgraph mining in large-scale social networks. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), Xiamen, China
- Graph neural network-based anomaly detection in multivariate time series. Proceedings of the AAAI Conference on Artificial Intelligence
- A framework for anomaly detection and classification in multiple IoT scenarios. Future Gener. Comput. Syst.