Weighted IForest and siamese GRU on small sample anomaly detection in healthcare

https://doi.org/10.1016/j.cmpb.2022.106706

Highlights

  • Proposes a weighted IForest algorithm: a small part of the data is labeled with expert decision-making rules, and logistic regression is used to obtain the feature weights.

  • Improves the FDA function and uses it as the loss function of SGRU to improve the accuracy of the algorithm.

Abstract

Background and objective: Deep neural networks have achieved a great deal in anomaly detection on big data. However, many practical application scenarios still face problems such as a shortage of data and an excessive manual annotation workload.

Methods: This paper proposes a weighted IForest and Siamese GRU (WIF-SGRU) algorithm for small-sample anomaly detection. In the data annotation stage, a weighted IForest algorithm automatically annotates unlabeled data. In the training phase of the anomaly detection model, a Siamese GRU is trained on the target data to obtain the anomaly model and to detect anomalies in small-sample data in real time.

Results: The proposed algorithm is verified on six public datasets (Arrhythmia, Shuttle, Satellite, Satimage-2, Lymphography, and WBC). The experimental results show that, compared with traditional data annotation and anomaly detection algorithms, the weighted IForest and Siamese GRU algorithm improves accuracy and real-time performance.

Conclusions: This paper proposes a weighted IForest and Siamese GRU algorithm architecture, which provides a more accurate and efficient method for outlier detection. First, the framework uses the improved IForest algorithm to label the unlabeled data. Then the Siamese GRU is optimized with the improved FDA loss function, and the optimized network is used to learn the distance between data for real-time and efficient anomaly detection. Experiments show that the framework has good potential.

Introduction

Anomaly detection on structured data is an important part of anomaly detection. Detecting anomalies in massive data is an important research direction and a very difficult task. Its main purpose is to extract important information about possible key situations from the small amount of data that differs significantly from the rest. Anomaly detection has a wide range of applications, such as credit card fraud detection and network attack detection. In every case, the goal is to identify the few data points that are inconsistent with the general characteristics of the data. Commonly used anomaly detection algorithms include statistical hypothesis testing [1], DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [2], [3], One-Class SVM [4], K-means clustering [5], IForest (Isolation Forest) [6], [7], and deep-learning-based anomaly detection [8], [9]. These algorithms are mature and provide a good basis for anomaly detection. However, some difficulties remain in practical applications. Deep neural networks require a large amount of labeled data for training, yet traditional annotation is highly subjective and consumes manpower and material resources. At the same time, training data is often lacking for various reasons, so there are not enough data points for specific use scenarios. Many algorithms and traditional neural network structures therefore cannot build a good anomaly detection model and cannot meet the need for efficient, real-time anomaly detection. To address the difficulty of data annotation and the effect of insufficient samples on building the anomaly model, this paper proposes an algorithm (WIF-SGRU) based on weighted IForest and Siamese GRU (Gated Recurrent Unit) to detect anomalies in unlabeled small-sample data.

Most notably, anomaly detection has been widely used in various fields, and the success of the proposed model on a variety of datasets further illustrates its validity. Accordingly, the main contributions of our model are summarized below:
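As a rough illustration of the Siamese GRU idea, the following Python sketch passes two records through the same GRU weights and compares the resulting hidden states by Euclidean distance. It is a minimal sketch, not the paper's network: the helper names (`init_gru`, `gru_step`, `encode`, `siamese_distance`), the random untrained weights, the hidden size, and the choice to feed each feature of a record as one time step are all assumptions made for illustration.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def init_gru(n_in, n_hid, seed=0):
    """Random, untrained parameters (W, U, b) for the z, r and h gates."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(n_hid, n_in), mat(n_hid, n_hid), [0.0] * n_hid) for g in "zrh"}


def gru_step(p, x, h):
    """One GRU update: update gate z, reset gate r, candidate state."""
    def gate(name, act, h_in):
        W, U, b = p[name]
        return [act(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, h_in), b)]

    z = gate("z", sigmoid, h)
    r = gate("r", sigmoid, h)
    cand = gate("h", math.tanh, [ri * hi for ri, hi in zip(r, h)])
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(p, seq):
    """Run the GRU over a sequence and return the final hidden state."""
    h = [0.0] * len(p["z"][2])
    for x in seq:
        h = gru_step(p, x, h)
    return h


def siamese_distance(p, seq_a, seq_b):
    """Both branches share the parameters p; output is the embedding distance."""
    ha, hb = encode(p, seq_a), encode(p, seq_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ha, hb)))


# Assumption: a tabular record is fed one feature per time step.
params = init_gru(n_in=1, n_hid=4)
record_a = [[0.1], [0.2], [0.3]]
record_b = [[0.1], [0.2], [5.0]]
print(siamese_distance(params, record_a, record_b))
```

In a trained model, the shared weights would be fitted so that this distance is small for pairs from the same class and large for normal/abnormal pairs; the training loop is omitted here.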

  • 1) Building on prior studies, we propose the weighted IForest algorithm: a small part of the data is labeled with expert decision-making rules, and logistic regression is used to obtain the feature weights.

  • 2) Our model improves the FDA function so that it can adapt to the training of a Siamese network, and uses it as the loss function of the SGRU to improve the accuracy of the algorithm.

  • 3) We apply our model to anomaly detection for health monitoring and achieve significant performance improvements.
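The idea behind contribution 2 can be illustrated with a Fisher-style criterion on pairs of embeddings. The text gives neither the improved FDA loss nor the expansion of the acronym, so the sketch below assumes FDA refers to Fisher discriminant analysis and uses a plain ratio of mean within-class to mean between-class squared distance. The function name `fisher_pair_loss` and the `eps` term are hypothetical, and this is not the authors' loss.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fisher_pair_loss(embeddings, labels, eps=1e-8):
    """Mean squared distance of same-label pairs divided by that of
    different-label pairs. Smaller is better: classes are compact and far apart."""
    within, between = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d2 = euclidean(embeddings[i], embeddings[j]) ** 2
            (within if labels[i] == labels[j] else between).append(d2)
    if not within or not between:
        raise ValueError("need at least one same-label and one different-label pair")
    return (sum(within) / len(within)) / (sum(between) / len(between) + eps)


# Two tight clusters: labels that follow the clusters give a much lower loss
# than labels that cut across them.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(fisher_pair_loss(points, [0, 0, 1, 1]))
print(fisher_pair_loss(points, [0, 1, 0, 1]))
```

Minimizing a ratio of this kind over the outputs of a Siamese network pulls same-class pairs together and pushes normal/abnormal pairs apart, which is the behavior a distance-based detector relies on.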

The rest of this paper is organized as follows. The second section reviews existing algorithms and related studies on anomaly detection. The third section introduces the anomaly detection process from data annotation to model construction and describes the proposed improvements. The fourth section presents the experiments and results. The fifth section analyzes the experimental results, draws conclusions, and summarizes future work.

Section snippets

Related work

Anomaly detection is used to discover data that do not match the general characteristics of the data. It has a wide range of applications, such as social network anomaly detection. Chaudhary et al. use deep learning to detect anomalies in email networks and Twitter networks, and propose a neural network model that is applied to social connection graphs to detect anomalies [10]. M. Venkatesan et al. proposed a graph-based unsupervised machine learning method for edge and

Anomaly detection model

The main problems the anomaly detection model must address are the difficulty of data annotation and the effect of insufficient sample size on model training. The algorithm is divided into three steps:

  • 1) The determination of weights: A small part of the data is annotated through expert decision-making rules. The importance weight of each feature is then learned through logistic regression to determine the influence of each feature on the anomaly result.

  • 2) Training data annotation: The obtained weight is
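The first two steps might be sketched as follows. The snippet above breaks off before saying how the weights enter IForest, so this sketch assumes one plausible reading: the split feature of each isolation tree is drawn with probability proportional to its weight. All function names are hypothetical, the logistic regression is a plain gradient-descent fit, and the code is an illustration under these assumptions rather than the authors' algorithm.

```python
import math
import random


def fit_logistic_weights(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by gradient descent and return normalized
    absolute coefficients. Assumes features are on comparable scales."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    total = sum(abs(wj) for wj in w)
    return [abs(wj) / total for wj in w] if total else [1.0 / d] * d


def c(n):
    """Average path length of an unsuccessful BST search (iForest normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, weights, rng, max_depth, depth=0):
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    # Assumption: the split feature is sampled in proportion to its weight.
    feat = rng.choices(range(len(weights)), weights=weights)[0]
    lo, hi = min(r[feat] for r in rows), max(r[feat] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant here
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return {"feat": feat, "split": split,
            "left": build_tree(left, weights, rng, max_depth, depth + 1),
            "right": build_tree(right, weights, rng, max_depth, depth + 1)}


def path_length(x, node, depth=0):
    if "feat" not in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feat"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_scores(rows, weights, n_trees=100, sample_size=64, seed=0):
    """Isolation-forest score 2 ** (-E[h(x)] / c(psi)); higher means more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [build_tree(rng.sample(rows, psi), weights, rng, max_depth)
             for _ in range(n_trees)]
    norm = c(psi) or 1.0
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm))
            for x in rows]
```

A possible use is to fit `fit_logistic_weights` on the small expert-labeled subset, call `anomaly_scores` on the full unlabeled set, and mark the highest-scoring records as abnormal; the cut-off for that marking is not specified in the text and would have to be chosen separately.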

Experimental results and analysis

The experimental data in this paper comprise six common datasets: Arrhythmia, Shuttle, Satellite, Satimage-2, Lymphography, and WBC. These datasets were selected for the sufficient variety of their samples and the large gap between samples, along with the size and appropriateness of the data. They are used by most anomaly detection research papers and therefore have high comparative value, and the proportion of abnormal data in them is relatively low, which is more suitable for

Conclusions

This paper proposes a weighted IForest and Siamese GRU algorithm architecture, which provides a more accurate and efficient method for outlier detection. First, the framework uses the improved IForest algorithm to label the unlabeled data. Then the Siamese GRU is optimized with the improved FDA loss function, and the optimized network is used to learn the distance between data for real-time and efficient anomaly detection. Experiments show that the framework has good potential. In the

Declaration of Competing Interest

The authors declare that they have no conflicts of interest.

Acknowledgment

The author is very grateful to the editor and reviewer for their comments and suggestions. The author will actively improve and learn according to the suggestions. This work was partially supported by key national projects.

References (31)

  • A. Chaudhary et al. Anomaly detection using graph neural networks. 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India (2019)

  • M. Venkatesan et al. Graph based unsupervised learning methods for edge and node anomaly detection in social network. 2019 IEEE 1st International Conference on Energy, Systems and Information Processing (ICESIP) (2019)

  • S. Chen et al. Anomaly subgraph mining in large-scale social networks. 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), Xiamen, China (2019)

  • A. Deng et al. Graph neural network-based anomaly detection in multivariate time series. Proceedings of the AAAI Conference on Artificial Intelligence (2021)

  • F. Cauteruccio. A framework for anomaly detection and classification in multiple IoT scenarios. Future Gener. Comput. Syst. (2021)