
Robust Linear Regression Against Training Data Poisoning

Published: 03 November 2017 · DOI: 10.1145/3128572.3140447

Abstract

The effectiveness of supervised learning techniques has made them ubiquitous in research and practice. In high-dimensional settings, supervised learning commonly relies on dimensionality reduction to improve performance and identify the most important factors in predicting outcomes. However, the economic importance of learning has made it a natural target for adversarial manipulation of training data, which we term poisoning attacks. Prior approaches to robust supervised learning rely on strong assumptions about the nature of the feature matrix, such as feature independence and sub-Gaussian noise with low variance. We propose an integrated method for robust regression that relaxes these assumptions, assuming only that the feature matrix can be well approximated by a low-rank matrix. Our techniques integrate improved robust low-rank matrix approximation and robust principal component regression, and yield strong performance guarantees. Moreover, we experimentally show that our methods significantly outperform the state of the art in both running time and prediction error.
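The abstract describes a two-stage pipeline: first approximate the (possibly poisoned) feature matrix by a low-rank matrix, then run principal component regression in the recovered subspace. The sketch below illustrates only that general pipeline, not the paper's algorithm: it substitutes a plain truncated SVD for the paper's robust low-rank approximation step (an ordinary SVD can itself be skewed by adversarial rows), and the function names and toy data are illustrative assumptions.

```python
import numpy as np

def low_rank_approx(X, k):
    """Rank-k approximation of X via truncated SVD.

    NOTE: stand-in for the paper's *robust* low-rank approximation;
    a plain SVD is not robust to adversarially corrupted rows.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_lr = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation (Eckart-Young)
    V = Vt[:k].T                            # top-k principal directions, d x k
    return X_lr, V

def principal_component_regression(X, y, k):
    """Regress y on the top-k principal components of the approximated X."""
    X_lr, V = low_rank_approx(X, k)
    Z = X_lr @ V                                   # n x k scores in the principal subspace
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS in the reduced space
    return V @ gamma                               # coefficients mapped back to feature space

# Toy usage: features with an exactly rank-k signal plus small dense noise.
rng = np.random.default_rng(0)
n, d, k = 200, 50, 3
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = X @ beta_true + 0.1 * rng.normal(size=n)
beta_hat = principal_component_regression(X, y, k)
print("prediction RMSE:", np.sqrt(np.mean((X @ beta_hat - y) ** 2)))
```

A robust variant would replace `low_rank_approx` with a procedure that tolerates corrupted rows (for instance, an outlier-pursuit-style decomposition) before the regression step; that substitution is where the paper's contribution lies.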

    Published In

    AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
    November 2017
    140 pages
    ISBN: 9781450352024
    DOI: 10.1145/3128572

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adversarial machine learning
    2. defense
    3. poisoning attacks

    Qualifiers

    • Research-article

    Funding Sources

    • ONR
    • ARO
    • NSF
    • NIH
    • Berkeley Deep Drive (BDD)

    Conference

    CCS '17 (AISec '17 was co-located with ACM CCS 2017)

    Acceptance Rates

    AISec '17 acceptance rate: 11 of 36 submissions (31%)
    Overall acceptance rate: 94 of 231 submissions (41%)
