Robust Regression via Heuristic Corruption Thresholding and Its Adaptive Estimation Variation

Published: 07 June 2019

Abstract

The presence of data noise and corruption has recently drawn increasing attention to robust least-squares regression (RLSR), which addresses the fundamental problem of learning reliable regression coefficients when response variables can be arbitrarily corrupted. Until now, the following important challenges could not be handled concurrently: (1) rigorous recovery guarantees for the regression coefficients, (2) difficulty in estimating the corruption ratio parameter, and (3) scaling to massive datasets. This article proposes a novel Robust regression algorithm via Heuristic Corruption Thresholding (RHCT) that concurrently addresses all of the above challenges. Specifically, the algorithm alternates between optimizing the regression coefficients and estimating the optimal uncorrupted set via heuristic thresholding, without a pre-defined corruption ratio parameter, until convergence. Moreover, to improve the efficiency of corruption estimation on large-scale data, a Robust regression algorithm via Adaptive Corruption Thresholding (RACT) is proposed, which determines the size of the uncorrupted set with a novel adaptive search method that does not iterate over the data samples exhaustively. In addition, we prove that our algorithms benefit from strong guarantees analogous to those of state-of-the-art methods in terms of convergence rates and recovery guarantees. Extensive experiments demonstrate that our new methods are more effective than existing methods in recovering both the regression coefficients and the uncorrupted sets, with very competitive efficiency.
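The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the RHCT or RACT algorithm from the article: the abstract does not specify the thresholding heuristic, so the sketch substitutes a median-absolute-deviation (MAD) cutoff on the residuals as a stand-in rule. The function name `alternating_threshold_regression` and the parameters `k`, `max_iter`, and `tol` are hypothetical choices made for illustration only.

```python
import numpy as np


def alternating_threshold_regression(X, y, k=3.0, max_iter=50, tol=1e-8):
    """Alternate between least squares and re-estimating the uncorrupted set.

    ASSUMPTION: the MAD-based cutoff below is a stand-in heuristic, not the
    thresholding rule used by RHCT/RACT in the article.
    """
    n, p = X.shape
    keep = np.ones(n, dtype=bool)  # start by trusting every sample
    beta = np.zeros(p)
    for _ in range(max_iter):
        # Step 1: fit coefficients on the current estimated uncorrupted set.
        new_beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)

        # Step 2: re-estimate the uncorrupted set from absolute residuals,
        # with no corruption ratio supplied by the caller.
        r = np.abs(y - X @ new_beta)
        med = np.median(r)
        mad = np.median(np.abs(r - med))
        cutoff = med + k * 1.4826 * mad
        new_keep = r <= cutoff

        # Guard: never fit on fewer samples than coefficients.
        if new_keep.sum() < p:
            beta = new_beta
            break

        converged = (
            np.linalg.norm(new_beta - beta) < tol
            and np.array_equal(new_keep, keep)
        )
        beta, keep = new_beta, new_keep
        if converged:
            break
    return beta, keep
```

On synthetic data with a fraction of responses shifted by a large offset, a loop of this kind typically discards the shifted samples after the first pass and then refits on the remaining ones. It carries none of the recovery guarantees claimed for the article's algorithms.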

    • Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 13, Issue 3
      June 2019
      261 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3331063

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 June 2019
      • Accepted: 1 February 2019
      • Revised: 1 October 2018
      • Received: 1 January 2018
      Qualifiers

      • research-article
      • Research
      • Refereed
