Abstract
Anomaly detection aims to detect examples that do not conform to normal behavior. Increasingly, it is approached from a semi-supervised perspective in which active learning is used to acquire a small number of strategically selected labels. However, because anomalies are not always well-understood events, the user may be uncertain about how to label certain instances. One can therefore relax the labeling request and allow the user to provide soft labels (i.e., probabilistic labels) that represent their belief that a queried example is anomalous. These labels are naturally noisy, both because of the user’s inherent uncertainty about the label and because people are known to be bad at providing well-calibrated probability estimates. To cope with these challenges, we propose to exploit a Gaussian Process to learn from actively acquired soft labels in the context of anomaly detection. This makes it possible to leverage information about nearby examples to smooth out possible noise. Empirically, we compare our proposed approach to several baselines on 21 datasets and show that it outperforms them in the majority of experiments.
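The smoothing idea in the abstract can be illustrated with a minimal sketch. It assumes plain Gaussian Process regression with an RBF kernel and a fixed Gaussian noise variance. It is not the authors' implementation, and every name and default in it (`rbf`, `solve`, `gp_posterior_mean`, `noise_var`, `length_scale`, `prior_mean`) is hypothetical. The posterior mean at a queried point pools the soft labels of nearby examples, so a single miscalibrated label is pulled toward its neighbors.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior_mean(X_train, soft_labels, X_query,
                      noise_var=0.1, length_scale=1.0, prior_mean=0.5):
    """GP regression posterior mean, clipped to [0, 1].

    Assumption: label noise is modeled as i.i.d. Gaussian with variance
    `noise_var`, added to the diagonal of the kernel matrix.
    """
    n = len(X_train)
    K = [[rbf(X_train[i], X_train[j], length_scale)
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    centered = [y - prior_mean for y in soft_labels]
    weights = solve(K, centered)  # (K + noise_var * I)^{-1} (y - prior_mean)
    means = []
    for q in X_query:
        m = prior_mean + sum(w * rbf(q, x, length_scale)
                             for w, x in zip(weights, X_train))
        means.append(min(1.0, max(0.0, m)))
    return means


# Three close examples, one with an implausibly high soft label (0.9),
# plus one distant example the user believes to be anomalous.
X_train = [[0.0], [0.1], [0.2], [3.0]]
soft_labels = [0.1, 0.9, 0.1, 0.95]
smoothed = gp_posterior_mean(X_train, soft_labels, [[0.1], [3.0]])
```

In this toy setup the estimate at `0.1` is pulled well below the noisy label of 0.9 because its two neighbors were labeled 0.1, while the isolated example at `3.0` keeps a high anomaly probability. A larger `noise_var` smooths more aggressively; a smaller one trusts each individual label more.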
Notes
1. The code and Supplement are available via https://github.com/TimoM99/SLADe.
2. Results for \(0\%\) and \(10\%\) noise are, for completeness, in the Supplement.
Acknowledgment
This work is supported by the FWO-Vlaanderen (aspirant grant 1166222N to LP and G0D8819N to JD and TM) and the Flemish government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme (JD, LP).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Martens, T., Perini, L., Davis, J. (2023). Semi-supervised Learning from Active Noisy Soft Labels for Anomaly Detection. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_13
DOI: https://doi.org/10.1007/978-3-031-43412-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43411-2
Online ISBN: 978-3-031-43412-9
eBook Packages: Computer Science (R0)