
Robust Linear Regression Against Training Data Poisoning

Published: 03 November 2017 · DOI: 10.1145/3128572.3140447

Abstract

The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of training data, which we term poisoning attacks. Prior approaches to robust supervised learning rely on strong assumptions about the nature of the feature matrix, such as feature independence and sub-Gaussian noise with low variance. We propose an integrated method for robust regression that relaxes these assumptions, assuming only that the feature matrix can be well approximated by a low-rank matrix. Our techniques integrate improved robust low-rank matrix approximation and robust principal component regression, and yield strong performance guarantees. Moreover, we experimentally show that our methods significantly outperform the state of the art in both running time and prediction error.
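The abstract describes a two-stage pipeline: first approximate the (possibly poisoned) feature matrix by a low-rank matrix, then run principal component regression in the recovered subspace. The sketch below illustrates only that general pipeline, not the paper's algorithm: it substitutes a plain truncated SVD for the paper's robust low-rank approximation step (an ordinary SVD can itself be skewed by adversarial rows), and the function names and toy data are illustrative assumptions.

```python
import numpy as np

def low_rank_approx(X, k):
    """Rank-k approximation of X via truncated SVD.

    NOTE: stand-in for the paper's *robust* low-rank approximation;
    a plain SVD is not robust to adversarially corrupted rows.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_lr = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation (Eckart-Young)
    V = Vt[:k].T                            # top-k principal directions, d x k
    return X_lr, V

def principal_component_regression(X, y, k):
    """Regress y on the top-k principal components of the approximated X."""
    X_lr, V = low_rank_approx(X, k)
    Z = X_lr @ V                                   # n x k scores in the principal subspace
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS in the reduced space
    return V @ gamma                               # coefficients mapped back to feature space

# Toy usage: features with an exactly rank-k signal plus small dense noise.
rng = np.random.default_rng(0)
n, d, k = 200, 50, 3
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = X @ beta_true + 0.1 * rng.normal(size=n)
beta_hat = principal_component_regression(X, y, k)
print("prediction RMSE:", np.sqrt(np.mean((X @ beta_hat - y) ** 2)))
```

A robust variant would replace `low_rank_approx` with a procedure that tolerates corrupted rows (for instance, an outlier-pursuit-style decomposition) before the regression step; that substitution is where the paper's contribution lies.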

    Published In

    AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
    November 2017
    140 pages
    ISBN: 9781450352024
    DOI: 10.1145/3128572

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adversarial machine learning
    2. defense
    3. poisoning attacks

    Qualifiers

    • Research-article

    Funding Sources

    • ONR
    • ARO
    • NSF
    • NIH
    • Berkeley Deep Drive (BDD)

    Conference

    CCS '17 (AISec '17 was co-located with ACM CCS 2017)

    Acceptance Rates

    AISec '17 acceptance rate: 11 of 36 submissions (31%)
    Overall acceptance rate: 94 of 231 submissions (41%)
