Paper

Authors: Matthew Yudin; Achyut Reddy; Sridhar Venkatesan and Rauf Izmailov

Affiliation: Peraton Labs, Basking Ridge, NJ, U.S.A.

Keyword(s): Concept Drift, Adversarial Machine Learning, Mitigation.

Abstract: Machine learning (ML) models are increasingly being adopted to develop Intrusion Detection Systems (IDS). Such models are usually trained on large, diversified datasets. As a result, they demonstrate excellent performance on previously unseen samples, provided those samples are generally within the distribution of the training data. However, as operating environments and the threat landscape change over time (e.g., installations of new applications, discovery of new malware), the underlying distributions of the modeled behavior also change, leading to a degradation in the performance of ML-based IDS over time. Such a shift in distribution is referred to as concept drift. Models are periodically retrained with newly collected data to account for concept drift. Data curated for retraining may also contain adversarial samples, i.e., samples that an attacker has modified in order to evade the ML-based IDS. Such adversarial samples, when included for retraining, would poison the model and subsequently degrade its performance. Concept drift and adversarial samples are both out-of-distribution samples that cannot be easily differentiated by a trained model. Thus, intelligent monitoring of the model inputs is necessary to distinguish between these two classes of out-of-distribution samples. In this paper, we consider a worst-case setting for the defender in which the original ML-based IDS is poisoned through an out-of-band mechanism. We propose an approach that perturbs an input sample at different magnitudes of noise and observes the change in the poisoned model's outputs to determine whether the input sample is adversarial. We evaluate this approach in two settings: a network-based IDS and an Android malware detection system. We then compare it with existing techniques that detect either concept drift or adversarial samples. Preliminary results show that the proposed approach provides strong signals to differentiate between adversarial and concept drift samples. Furthermore, we show that techniques that detect only concept drift or only adversarial samples are insufficient to detect the other class of out-of-distribution samples.
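The abstract describes probing a possibly poisoned model by perturbing an input at several noise magnitudes and watching how the outputs change. The sketch below is only an illustration of that general idea, not the authors' algorithm: the classifier, feature scaling, noise scales, and decision threshold are all assumptions introduced here for demonstration.

```python
# Illustrative sketch (not the paper's exact method): measure how unstable a
# model's prediction on an input is under Gaussian noise of increasing scale.
# Adversarial samples, which sit close to attacker-crafted decision boundaries,
# tend to flip under small noise; drift samples tend to be more stable.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def prediction_instability(model, x, noise_scales=(0.01, 0.05, 0.1, 0.2),
                           n_samples=50, seed=None):
    """For each noise scale, return the fraction of noisy copies of `x`
    whose predicted label differs from the prediction on the clean input."""
    rng = np.random.default_rng(seed)
    base_label = model.predict(x.reshape(1, -1))[0]
    flip_rates = []
    for scale in noise_scales:
        noisy = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
        labels = model.predict(noisy)
        flip_rates.append(float(np.mean(labels != base_label)))
    return np.array(flip_rates)

# Hypothetical usage with placeholder data: flag inputs whose predictions flip
# under small noise as adversarial-like; treat stable out-of-distribution
# inputs as candidate concept drift instead.
X_train = np.random.rand(500, 20)          # placeholder feature vectors
y_train = np.random.randint(0, 2, 500)     # placeholder labels
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

x_query = np.random.rand(20)
rates = prediction_instability(clf, x_query, seed=0)
adversarial_like = rates[:2].mean() > 0.3   # assumed threshold; tune on held-out data
print(rates, adversarial_like)
```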

CC BY-NC-ND 4.0

Paper citation in several formats:
Yudin, M., Reddy, A., Venkatesan, S. and Izmailov, R. (2024). Backdoor Attacks During Retraining of Machine Learning Models: A Mitigation Approach. In Proceedings of the 21st International Conference on Security and Cryptography - SECRYPT; ISBN 978-989-758-709-2; ISSN 2184-7711, SciTePress, pages 140-150. DOI: 10.5220/0012761600003767

@conference{secrypt24,
author={Matthew Yudin and Achyut Reddy and Sridhar Venkatesan and Rauf Izmailov},
title={Backdoor Attacks During Retraining of Machine Learning Models: A Mitigation Approach},
booktitle={Proceedings of the 21st International Conference on Security and Cryptography - SECRYPT},
year={2024},
pages={140-150},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012761600003767},
isbn={978-989-758-709-2},
issn={2184-7711},
}

TY - CONF

JO - Proceedings of the 21st International Conference on Security and Cryptography - SECRYPT
TI - Backdoor Attacks During Retraining of Machine Learning Models: A Mitigation Approach
SN - 978-989-758-709-2
IS - 2184-7711
AU - Yudin, M.
AU - Reddy, A.
AU - Venkatesan, S.
AU - Izmailov, R.
PY - 2024
SP - 140
EP - 150
DO - 10.5220/0012761600003767
PB - SciTePress