DOI: 10.1145/3350546.3352520
research-article

Multi-parameter streaming outlier detection

Published: 14 October 2019

ABSTRACT

Distance-based outlier detection techniques are a widespread methodology for anomaly detection. Despite their effectiveness, their main limitation is that they rely heavily on the dataset and on the chosen parameters to determine the correct status (outlier or inlier) of each data point. These parameters typically include, but are not limited to, the neighborhood radius and the neighbor-count threshold. In continuous streaming environments, the need for real-time analysis does not allow an algorithm to be restarted multiple times with different parameters until the right combination is found. This creates the need for a single technique that supports an arbitrary number of parameterizations while using minimal yet sufficient computational resources. In this work, we both compare the state-of-the-art techniques for handling multiple queries in distance-based outlier detection algorithms and propose a novel technique for multi-parameter distance-based outlier detection tailored to distributed continuous stream processing engines such as Spark and Flink.
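To make the role of the two parameters concrete, here is a minimal sketch of the classic distance-based definition, evaluated for several (R, k) parameterizations over one window of points. The sketch assumes Euclidean distance, a brute-force neighbor search, and the convention that a point is an outlier when fewer than k other points lie within distance R (inclusive) of it. The function names `neighbor_distances` and `outliers_for_queries` are hypothetical. This is not the technique proposed in the paper. The idea it illustrates is that sorting each point's distances once lets every (R, k) query be answered with a binary search, so several parameterizations share one pass over the data.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Query = Tuple[float, int]  # (radius R, neighbor-count threshold k)


def neighbor_distances(window: Sequence[Point]) -> List[List[float]]:
    """Hypothetical helper: for each point, the sorted Euclidean distances
    to every other point in the window (brute force, O(n^2))."""
    rows = []
    for i, p in enumerate(window):
        row = [math.dist(p, q) for j, q in enumerate(window) if j != i]
        row.sort()
        rows.append(row)
    return rows


def outliers_for_queries(
    window: Sequence[Point], queries: Sequence[Query]
) -> Dict[Query, List[int]]:
    """Hypothetical multi-query evaluation over a single window.

    A point is flagged for (R, k) if fewer than k other points lie at
    distance <= R from it. The expensive distance pass is shared by all
    queries; each query then costs one binary search per point.
    """
    rows = neighbor_distances(window)
    result: Dict[Query, List[int]] = {}
    for r, k in queries:
        # bisect_right counts how many sorted distances are <= r
        result[(r, k)] = [idx for idx, row in enumerate(rows) if bisect_right(row, r) < k]
    return result


if __name__ == "__main__":
    window = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
    queries = [(1.0, 2), (0.6, 2), (10.0, 3)]
    print(outliers_for_queries(window, queries))
    # {(1.0, 2): [3], (0.6, 2): [1, 2, 3], (10.0, 3): []}
```

The same window yields different outlier sets depending on R and k, which is why rerunning a detector per parameterization is costly in a stream. A streaming implementation would maintain neighbor information incrementally as the window slides instead of recomputing it from scratch as this sketch does.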


      • Published in

        ACM Other conferences
        WI '19: IEEE/WIC/ACM International Conference on Web Intelligence
        October 2019
        507 pages

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall acceptance rate: 118 of 178 submissions, 66%
