Multi-domain Active Learning for Semi-supervised Anomaly Detection

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13716)

Abstract

Active learning aims to ease the burden of collecting large amounts of annotated data by intelligently acquiring, during the learning process, the labels that will be most helpful to the learner. Current active learning approaches focus on learning from a single dataset. However, a common setting in practice requires simultaneously learning models from multiple datasets, where each dataset requires a separately learned model. This paper tackles this less-explored multi-domain active learning setting. We approach it from the perspective of multi-armed bandits and propose the active learning bandits (Alba) method, which uses bandit methods to both explore and exploit the usefulness of querying a label from each dataset in subsequent query rounds. We evaluate our approach on a benchmark of 7 datasets collected from a retail environment, in the context of a real-world use case of detecting anomalous resource usage. Alba outperforms existing active learning strategies, providing evidence that standard active learning approaches are less suitable for the multi-domain setting.
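To make the bandit framing concrete, the following sketch shows a generic Beta-Bernoulli Thompson sampling loop that decides, in each query round, which dataset to request a label from. It illustrates the general idea only and is not the paper's Alba algorithm: the reward callback query_and_reward, the per-dataset Beta posteriors, and the update rule are assumptions made for this example.

    # Generic bandit loop over datasets ("arms") for multi-domain active learning.
    # Illustrative sketch only, not the paper's Alba method. `query_and_reward`
    # is a hypothetical callback that queries one label from the chosen dataset
    # and returns a reward in [0, 1] (e.g., the observed gain of that dataset's model).
    import numpy as np

    def run_query_rounds(datasets, query_and_reward, budget, seed=0):
        rng = np.random.default_rng(seed)
        # One Beta(alpha, beta) posterior per dataset over its expected reward.
        alpha = np.ones(len(datasets))
        beta = np.ones(len(datasets))
        history = []
        for _ in range(budget):
            # Explore/exploit: sample a plausible reward for every dataset
            # and query a label from the one whose sample is highest.
            sampled = rng.beta(alpha, beta)
            arm = int(np.argmax(sampled))
            reward = float(query_and_reward(datasets[arm]))
            # Update the chosen dataset's posterior with the observed reward.
            alpha[arm] += reward
            beta[arm] += 1.0 - reward
            history.append((datasets[arm], reward))
        return history

Under this formulation, datasets whose queried labels keep yielding high rewards are selected more often, while rarely rewarding datasets are still revisited occasionally because their posteriors remain uncertain.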


Notes

  1. Appendix & Code: https://github.com/Vincent-Vercruyssen/ALBA-paper.

  2. Because AL typically operates with a small budget and large datasets, this assumption is reasonable as the marginal benefit of labeling each additional instance is small.

  3. Section 5 provides empirical evidence that the random selection strategy indeed leads to better results than the heuristic strategy. Note that the proof relies on the heuristic strategy being able to rank the instances correctly according to their informativeness. In reality, this ranking is approximate.

  4. Online Appendix 7.2 has detailed information on the (choice of) evaluation metrics, benchmark data, and hyperparameters. It also has additional results on the impact of the dataset characteristics on Alba’s performance.

  5. We use 8 statistical (average, standard deviation, max, min, median, sum, entropy, skewness, and kurtosis) and 2 binary features (whether it is a Friday or a Sunday), 10 in total; an illustrative sketch of this feature construction follows these notes.

  6. See online Appendix 7.3 for the plots for all 54 benchmark datasets.

  7. All the experimental evaluations maintain a precision of 1e-4 and a threshold of 0.001 (e.g., to determine the similarity of two AULC scores).

  8. See online Appendix 7.3 for a more detailed discussion.
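For concreteness, the sketch below spells out the per-window feature construction described in note 5: summary statistics over a window of resource-usage readings plus two binary weekday indicators. It is a hypothetical reconstruction, not the paper's code; in particular, the note does not specify how entropy is computed, so here it is taken over a 10-bin histogram of the window's values, and the sketch simply computes every quantity the note names.

    # Hypothetical per-window feature construction (cf. note 5 above).
    # Feature and function names are assumptions, not taken from the paper's code.
    import numpy as np
    from scipy.stats import entropy, kurtosis, skew

    def window_features(values, timestamp):
        # values: 1-D array of readings in one window (e.g., one day of usage).
        # timestamp: datetime-like object; .weekday() gives Mon=0 ... Sun=6.
        values = np.asarray(values, dtype=float)
        counts, _ = np.histogram(values, bins=10)  # entropy() normalizes the counts
        return {
            "average": values.mean(),
            "std": values.std(),
            "max": values.max(),
            "min": values.min(),
            "median": float(np.median(values)),
            "sum": values.sum(),
            "entropy": float(entropy(counts)),
            "skewness": float(skew(values)),
            "kurtosis": float(kurtosis(values)),
            "is_friday": float(timestamp.weekday() == 4),
            "is_sunday": float(timestamp.weekday() == 6),
        }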


Acknowledgements

This work is supported by the Flemish government “Onderzoeksprogramma Artificiële Intelligentie Vlaanderen”, “Agentschap Innoveren & Ondernemen (VLAIO)” as part of the innovation mandate HBC.2020.2297, the FWO-Vlaanderen aspirant grant 1166222N, and Leuven.AI, B-3000 Leuven, Belgium.

Author information


Corresponding author

Correspondence to Vincent Vercruyssen.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1729 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Vercruyssen, V., Perini, L., Meert, W., Davis, J. (2023). Multi-domain Active Learning for Semi-supervised Anomaly Detection. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_30


  • DOI: https://doi.org/10.1007/978-3-031-26412-2_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

  • eBook Packages: Computer Science, Computer Science (R0)
