DOI: 10.1145/3638884.3638976
research-article

Incomplete Mixed Data Outlier Detection based on Local Difference Information

Published: 23 April 2024

Abstract

Outlier detection is a fundamental task in data mining and knowledge discovery. Graph-based unsupervised outlier detection typically learns normal object behavior from similarity-based neighborhood relationships among objects and uses it to identify outliers. However, the core challenge in outlier detection is identifying anomalies, errors, or rare instances that deviate from ordinary patterns, so leveraging object dissimilarity directly may be more straightforward and efficient. Accordingly, we propose DRS (Distant Relative Stranger), a novel and efficient dissimilarity-based model for discovering outliers. Specifically, this study constructs a distant neighbor information network for incomplete mixed data based on object dissimilarity and uncertainty. Using the local information of probabilistic graph models, the model characterizes the degree of anomaly of each object, achieving strong adaptability and high-performance unsupervised outlier detection. An evaluation on 16 real-world datasets against 12 other models shows that the proposed DRS model is simple and performs stably in unsupervised outlier detection tasks. This research provides a new avenue and an applicable approach for graph-based outlier detection studies.
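
The abstract gives no formulas, so the following is only a rough illustration of the general idea it describes (link each object to the objects it differs from most, then score objects with a random walk over that graph). It is not the authors' DRS model. The dissimilarity measure, the fixed penalty of 0.5 for missing values, the choice of `k` distant neighbors, the damping factor, and all function names are assumptions made for this sketch.

```python
from typing import List, Optional, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def numeric_ranges(data: List[List[Value]]) -> List[float]:
    """Range of each column over its observed numeric values (0.0 if none)."""
    ranges = []
    for col in zip(*data):
        nums = [v for v in col if isinstance(v, (int, float))]
        ranges.append(max(nums) - min(nums) if nums else 0.0)
    return ranges


def dissimilarity(a, b, ranges, missing_penalty=0.5):
    """Mean per-attribute difference in [0, 1]; missing values get a fixed penalty (assumed)."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            total += missing_penalty          # uncertainty: neither same nor different
        elif isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0   # categorical attribute
        else:
            total += abs(x - y) / r if r > 0 else 0.0  # numeric attribute
    return total / len(ranges)


def outlier_scores(data, k=2, damping=0.85, iters=100):
    """Random-walk visiting probability on a 'most dissimilar neighbors' graph."""
    n = len(data)
    ranges = numeric_ranges(data)
    d = [[0.0 if i == j else dissimilarity(data[i], data[j], ranges)
          for j in range(n)] for i in range(n)]

    # Each object points to its k most dissimilar objects, weighted by dissimilarity.
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        far = sorted((j for j in range(n) if j != i),
                     key=lambda j: d[i][j], reverse=True)[:k]
        weight = sum(d[i][j] for j in far)
        for j in far:
            trans[i][j] = d[i][j] / weight if weight > 0 else 1.0 / len(far)

    # Damped power iteration; objects that many others regard as distant gain mass.
    score = [1.0 / n] * n
    for _ in range(iters):
        score = [(1 - damping) / n
                 + damping * sum(score[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    return score


# Toy incomplete mixed data: one numeric and one categorical attribute.
data = [[1.0, "a"], [1.1, "a"], [0.9, None], [1.2, "a"], [9.0, "b"]]
scores = outlier_scores(data)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

In this toy data the last row differs from every other row on both attributes, so most objects point to it and it collects the largest share of the walk. A higher score here means "more often regarded as distant by others", which is the sketch's stand-in for a degree of anomaly.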


    Published In

    ICCIP '23: Proceedings of the 2023 9th International Conference on Communication and Information Processing
    December 2023
    648 pages
    ISBN:9798400708909
    DOI:10.1145/3638884

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DRS
    2. difference information
    3. Markov random walk
    4. outlier detection
    5. uncertainty

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Program of Song Shan Laboratory (Included in the management of Major Science and Technology Program of Henan Province)

    Conference

    ICCIP 2023

    Acceptance Rates

    Overall Acceptance Rate 61 of 301 submissions, 20%
