Linear Regression with Mismatched Data: A Provably Optimal Local Search Algorithm

  • Conference paper
Integer Programming and Combinatorial Optimization (IPCO 2021)

Abstract

Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges, and we are unaware of any algorithm for this problem with both theoretical guarantees and appealing computational performance. To this end, we propose and study a simple greedy local search algorithm. We prove that, under a suitable scaling of the number of mismatched pairs relative to the number of samples and features, and under certain assumptions on the covariates, our local search algorithm converges to the globally optimal solution at a linear rate in the noiseless setting.
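To make the idea concrete, here is a minimal Python sketch of one way a greedy local search over permutations could look. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the abstract does not define the neighborhood, so the sketch assumes each step swaps two responses, and it omits any constraint on the total number of mismatches. The names `residual_projector` and `local_search` are hypothetical. For a fixed permutation the best coefficients are given by least squares, so the objective reduces to the squared norm of the permuted responses projected onto the orthogonal complement of the column space of the covariate matrix.

```python
import numpy as np


def residual_projector(X):
    """Return H = I - X (X^T X)^{-1} X^T, assuming X has full column rank."""
    Q, _ = np.linalg.qr(X)  # reduced QR: columns of Q span the column space of X
    return np.eye(X.shape[0]) - Q @ Q.T


def local_search(X, y, max_iter=1000, tol=1e-12):
    """Greedy pairwise-swap local search (illustrative assumption only).

    Minimizes ||H z||^2 over rearrangements z = y[perm] by repeatedly
    applying the single swap that decreases the objective the most.
    """
    n = X.shape[0]
    H = residual_projector(X)
    h_diag = np.diag(H)
    # quad[i, j] = (e_i - e_j)^T H (e_i - e_j)
    quad = h_diag[:, None] + h_diag[None, :] - 2.0 * H

    perm = np.arange(n)
    z = np.asarray(y, dtype=float).copy()  # invariant: z == y[perm]

    for _ in range(max_iter):
        r = H @ z
        if r @ r <= tol:
            break  # responses already lie in the column space of X
        # Swapping entries i and j changes z by delta[i, j] * (e_i - e_j).
        delta = z[None, :] - z[:, None]    # delta[i, j] = z_j - z_i
        r_diff = r[:, None] - r[None, :]   # r_diff[i, j] = r_i - r_j
        # Exact change in ||H z||^2 for every candidate swap (i, j).
        change = 2.0 * delta * r_diff + delta**2 * quad
        i, j = np.unravel_index(np.argmin(change), change.shape)
        if change[i, j] >= -tol:
            break  # no swap improves the objective: local optimum
        z[[i, j]] = z[[j, i]]
        perm[[i, j]] = perm[[j, i]]

    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    objective = float(np.sum((H @ z) ** 2))
    return perm, beta, objective
```

Evaluating every swap through the closed-form change costs on the order of n^2 operations per iteration in this sketch, which avoids refitting the regression for each candidate pair.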

Supported by grants from the Office of Naval Research: ONR-N000141812298 (YIP) and National Science Foundation: NSF-IIS-1718258.

Notes

  1. This permutation \(P^*\) may not satisfy \(\mathsf{dist}(P^*, I_n) = r\), but \(\mathsf{dist}(P^*, I_n)\) will be close to \(r\).

Author information

Corresponding author

Correspondence to Rahul Mazumder.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mazumder, R., Wang, H. (2021). Linear Regression with Mismatched Data: A Provably Optimal Local Search Algorithm. In: Singh, M., Williamson, D.P. (eds) Integer Programming and Combinatorial Optimization. IPCO 2021. Lecture Notes in Computer Science, vol 12707. Springer, Cham. https://doi.org/10.1007/978-3-030-73879-2_31

  • DOI: https://doi.org/10.1007/978-3-030-73879-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73878-5

  • Online ISBN: 978-3-030-73879-2

  • eBook Packages: Computer Science, Computer Science (R0)
