Abstract
This paper studies the shuffled linear regression problem. As a variant of ordinary linear regression, it requires estimating not only the regression variable but also the permutational correspondences between the covariates and responses. Existing formulations require the underlying ground-truth correspondences to form an ideal bijection in which every piece of data has a match, but this requirement rarely holds in real-world applications because of missing data or outliers. In this work, we generalize the formulation of shuffled linear regression to a broader range of conditions in which only a part of the data should correspond. For the proposed formulation, we study the effective recovery condition and the NP-hardness. Moreover, we present a simple yet effective algorithm for deriving the solution, and we analyze its global convergence property and convergence rate in detail. Experiments on distinct tasks validate the effectiveness of the proposed formulation and the solution method.















Notes
Some implementations are from ProbReg: http://probreg.readthedocs.io/, last accessed on 2022/12/14 12:14:20.
References
Abid, A., & Zou, J. (2018). A stochastic expectation-maximization approach to shuffled linear regression. In Proceedings of annual allerton conference on communication, control, and computing.
Abid, A., Poon, A., & Zou, J. (2017). Linear regression with shuffled labels. ArXiv Preprint ArXiv:1705.01342.
Aoki, Y., Goforth, H., Srivatsan, R.A., & Lucey, S. (2019). Pointnetlk: Robust & efficient point cloud registration using pointnet. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 7163–7172.
Arun, K. S., Huang, T. S., & Blostein, S. D. (1987). Least-squares fitting of two 3-d point sets. Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.1987.4767965.
Attouch, H., & Bolte, J. (2009). On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Mathematical Programming, 116(1), 5–16.
Attouch, H., Bolte, J., Redont, P., & Soubeyran, A. (2010). Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-łojasiewicz inequality. Mathematics of Operations Research, 35(2), 438–457.
Aubry, M., Schlickewei, U., & Cremers, D. (2011). The wave kernel signature: A quantum mechanical approach to shape analysis. In Proceedings of international conference on computer vision workshops (ICCV workshops).
Bell, J., & Stevens, B. (2009). A survey of known results and research areas for n-queens. Discrete Mathematics, 309(1), 1–31.
Birdal, T., & Simsekli, U. (2019). Probabilistic permutation synchronization using the riemannian structure of the birkhoff polytope. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 11105–11116.
Bogo, F., Romero, J., Loper, M., & Black, M.J. (2014). FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings of conference on computer vision and pattern recognition (CVPR).
Bolte, J., Daniilidis, A., & Lewis, A. (2007). The łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization, 17(4), 1205–1223.
Bolte, J., Sabach, S., & Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1), 459–494.
Bronstein, A.M., Bronstein, M.M., & Kimmel, R. (2008). Numerical geometry of non-rigid shapes.
Cai, Z., Chin, T.J., Le, H., & Suter, D. (2018). Deterministic consensus maximization with biconvex programming. In Proceedings of European conference on computer vision (ECCV), pp. 685–700.
Campbell, D., & Petersson, L. (2015). An adaptive data representation for robust point-set registration and merging. In Proceedings of international conference on computer vision (ICCV).
Chetverikov, D., Svirko, D., Stepanov, D., & Krsek, P. (2002). The trimmed iterative closest point algorithm. Object Recognition Supported by User Interaction for Service Robots, 3, 545–548.
Chin, T. J., & Suter, D. (2017). The maximum consensus problem: Recent algorithmic advances. Synthesis Lectures on Computer Vision, 7(2), 1–194.
Choi, S., Kim, T., & Yu, W. (2009). Performance evaluation of RANSAC family. In Proceedings of British machine vision conference (BMVC).
Curless, B., & Levoy, M. (1996). A volumetric method for building complex models from range images. In Proceedings of annual conference on computer graphics and interactive techniques.
Date, K., & Nagi, R. (2016). Gpu-accelerated hungarian algorithms for the linear assignment problem. Parallel Computing, 57, 52–72.
De Menezes, D., Prata, D. M., Secchi, A. R., & Pinto, J. C. (2021). A review on robust m-estimators for regression analysis. Computers & Chemical Engineering, 147, 107254.
Doornik, J.A. (2011). Robust estimation using least trimmed squares. Tech. rep., Institute for Economic Modelling, Oxford Martin School, and Economics Department, University of Oxford, UK.
Eckart, B., Kim, K., & Jan, K. (2018). Eoe: Expected overlap estimation over unstructured point cloud data. In Proceedings of international conference on 3D vision (3DV), pp. 747–755.
Elhami, G., Scholefield, A., Haro, B.B., & Vetterli, M. (2017). Unlabeled sensing: Reconstruction algorithm and theoretical guarantees. In Proceedings of international conference on acoustics, speech, and signal processing (ICASSP), pp. 4566–4570.
Fiori, M., Sprechmann, P., Vogelstein, J., Musé, P., & Sapiro, G. (2013). Robust multimodal graph matching: Sparse coding meets graph matching. In Proceedings of conference on neural information processing systems (NIPS).
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Fogel, F., Jenatton, R., Bach, F., & d’Aspremont, A. (2013). Convex relaxations for permutation problems. In Proceedings of conference on neural information processing systems (NIPS).
Gao, W., & Tedrake, R. (2019). Filterreg: Robust and efficient probabilistic point-set registration using gaussian filter and twist parameterization. In Proceedings of conference on computer vision and pattern recognition (CVPR).
Gold, S., Rangarajan, A., Lu, C. P., Pappu, S., & Mjolsness, E. (1998). New algorithms for 2d and 3d point matching: Pose estimation and correspondence. Pattern Recognition, 31(8), 1019–1031.
Gunawardana, A., & Byrne, W. (2005). Convergence theorems for generalized alternating minimization procedures. Journal of Machine Learning Research, 6, 2049–2073.
Haghighatshoar, S., & Caire, G. (2017). Signal recovery from unlabeled samples. Transactions on Signal Processing, 66(5), 1242–1257.
Hahnel, D., Burgard, W., Fox, D., Fishkin, K., & Philipose, M. (2004). Mapping and localization with rfid technology. In Proceedings of international conference on robotics and automation (ICRA), vol. 1, pp. 1015–1020.
Hampel, F. (2014). Robust inference. Statistics Reference Online.
Hampel, F. R. (1985). The breakdown points of the mean combined with some rejection rules. Technometrics, 27, 95–107.
Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision.
Hawkins, D. M. (1994). The feasible solution algorithm for least trimmed squares regression. Computational Statistics & Data Analysis, 17(2), 185–196.
Hsu, D.J., Shi, K., & Sun, X. (2017). Linear regression without correspondence. In Proceedings of conference on neural information processing systems (NIPS).
Huber, P.J. (1992). Robust estimation of a location parameter. In Breakthroughs in statistics, pp. 492–518.
Jia, K., Chan, T. H., Zeng, Z., Gao, S., Wang, G., Zhang, T., & Ma, Y. (2016). Roml: A robust feature correspondence approach for matching objects in a set of images. International Journal of Computer Vision, 117(2), 173–197.
Jiang, H., Stella, X. Y., & Martin, D. R. (2010). Linear scale and rotation invariant matching. Transactions on Pattern Analysis and Machine Intelligence, 33(7), 1339–1355.
Kuhn, A., & Mayer, H. (2015). Incremental division of very large point clouds for scalable 3d surface reconstruction. In Proceedings of international conference on computer vision workshops (ICCV workshops).
Kuhn, H. W. (1955). The hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2), 83–97.
Larranaga, P., Kuijpers, C. M. H., Murga, R. H., Inza, I., & Dizdarevic, S. (1999). Genetic algorithms for the travelling salesman problem: A review of representations and operators. Artificial Intelligence Review, 13(2), 129–170.
Le, H., Chin, T.J., & Suter, D. (2017). An exact penalty method for locally convergent maximum consensus. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 1888–1896.
Le, H., Chin, T. J., Eriksson, A., Do, T. T., & Suter, D. (2019). Deterministic approximate methods for maximum consensus robust fitting. Transactions on Pattern Analysis and Machine Intelligence, 43(3), 842–857.
Li, H., & Hartley, R. (2007). The 3d-3d registration problem revisited. In Proceedings of international conference on computer vision (ICCV).
Li, F., Fujiwara, K., Okura, F., & Matsushita, Y. (2021). Generalized shuffled linear regression. In Proceedings of international conference on computer vision (ICCV).
Lian, W., & Zhang, L. (2014). Point matching in the presence of outliers in both point sets: A concave optimization approach. In Proceedings of conference on computer vision and pattern recognition (CVPR).
Li, J., So, A. M. C., & Ma, W. K. (2020). Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions. Signal Processing Magazine, 37(5), 18–31.
Lowe, D.G. (1999). Object recognition from local scale-invariant features. In Proceedings of international conference on computer vision (ICCV).
Lubiw, A. (1981). Some np-complete problems similar to graph isomorphism. SIAM Journal on Computing, 10(1), 11–21.
Maciel, J., & Costeira, J. P. (2003). A global solution to sparse correspondence problems. Transactions on Pattern Analysis and Machine Intelligence, 25(2), 187–199.
Marques, M., Stošić, M., & Costeira, J. (2009). Subspace matching: Unique solution to point matching with geometric constraints. In Proceedings of international conference on computer vision (ICCV), pp. 1288–1294.
Maset, E., Arrigoni, F., & Fusiello, A. (2017). Practical and efficient multi-view matching. In Proceedings of international conference on computer vision (ICCV), pp. 4568–4576.
Mathias, R. (2006). The Linear Algebra a Beginning Graduate Student Ought to Know.
Melzi, S., Ren, J., Rodolà, E., Sharma, A., Wonka, P., & Ovsjanikov, M. (2019). Zoomout: Spectral upsampling for efficient shape correspondence. Transactions on Graphics, 38(6).
Mohamed, I. S., Capitanelli, A., Mastrogiovanni, F., Rovetta, S., & Zaccaria, R. (2019). A 2d laser rangefinder scans dataset of standard eur pallets. Data in brief, 24, 103837.
Myronenko, A., & Song, X. (2010). Point set registration: Coherent point drift. Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2262–2275.
Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. In Proceedings of symposium on security and privacy.
Nejatbakhsh, A., & Varol, E. (2021). Neuron matching in c. elegans with robust approximate linear regression without correspondence. In Proceedings of winter conference on applications of computer vision (WACV).
Ovsjanikov, M., Ben-Chen, M., Solomon, J., Butscher, A., & Guibas, L. (2012). Functional maps: A flexible representation of maps between shapes. Transactions on Graphics, 31(4).
Pachauri, D., Kondor, R., & Singh, V. (2013). Solving the multi-way matching problem by permutation synchronization. In Proceedings of conference on neural information processing systems (NIPS), vol. 26.
Pananjady, A., Wainwright, M.J., & Courtade, T.A. (2017). Denoising linear models with permuted data. In Proceedings of international symposium on information theory (ISIT).
Pananjady, A., Wainwright, M. J., & Courtade, T. A. (2017). Linear regression with shuffled data: Statistical and computational limits of permutation recovery. Transactions on Information Theory, 64(5), 3286–3300.
Pomerleau, F., Liu, M., Colas, F., & Siegwart, R. (2012). Challenging data sets for point cloud registration algorithms. International Journal of Robotics Research, 31(14), 1705–1711.
Pylvänäinen, T., Berclaz, J., Korah, T., Hedau, V., Aanjaneya, M., & Grzeszczuk, R. (2012). 3d city modeling from street-level data for augmented reality applications. In Proceedings of international conference on 3D imaging, modeling, processing, visualization & transmission, pp. 238–245.
Ren, J., Poulenard, A., Wonka, P., & Ovsjanikov, M. (2018). Continuous and orientation-preserving correspondences via functional maps. Transactions on Graphics, 37(6), 1–6.
Rousseeuw, P.J., & Leroy, A.M. (2005). Robust regression and outlier detection.
Rusinkiewicz, S. (2019). A symmetric objective function for icp. Transactions on Graphics, 38(4).
Rusu, R.B., Blodow, N., & Beetz, M. (2009). Fast point feature histograms (fpfh) for 3d registration. In Proceedings of international conference on robotics and automation (ICRA).
Shiratori, T., Berclaz, J., Harville, M., Shah, C., Li, T., Matsushita, Y., & Shiller, S. (2015). Efficient large-scale point cloud registration using loop closures. In Proceedings of international conference on 3D vision (3DV), pp. 232–240.
Slawski, M., Ben-David, E., et al. (2019). Linear regression with sparsely permuted data. Electronic Journal of Statistics, 13(1), 1–36.
Slawski, M., Ben-David, E., & Li, P. (2019). A two-stage approach to multivariate linear regression with sparsely mismatched data. Journal of Machine Learning Research, 21(204), 1–42.
Stošić, M., Marques, M., & Costeira, J. P. (2011). Convex solution of a permutation problem. Linear Algebra and its Applications, 434(1), 361–369.
Theiler, P., Schindler, K., et al. (2012). Automatic registration of terrestrial laser scanner point clouds using natural planar surfaces. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3, 173–178.
Tsakiris, M., & Peng, L. (2019). Homomorphic sensing. In Proceedings of international conference on machine learning (ICML).
Unnikrishnan, J., Haghighatshoar, S., & Vetterli, M. (2018). Unlabeled sensing with random linear measurements. Transactions on Information Theory, 64(5), 3237–3253.
Vestner, M., Lähner, Z., Boyarski, A., Litany, O., Slossberg, R., Remez, T., Rodola, E., Bronstein, A., Bronstein, M., & Kimmel, R., et al. (2017). Efficient deformable shape correspondence via kernel matching. In Proceedings of international conference on 3D vision (3DV).
Volgenant, A. (2004). Solving the k-cardinality assignment problem by transformation. European Journal of Operational Research, 157(2), 322–331.
Vongkulbhisal, J., De la Torre, F., & Costeira, J. P. (2018). Discriminative optimization: Theory and applications to computer vision. Transactions on Pattern Analysis and Machine Intelligence, 41(4), 829–843.
Wang, F., Xue, N., Yu, J.G., & Xia, G.S. (2020). Zero-assignment constraint for graph matching with outliers. In Proceedings of conference on computer vision and pattern recognition (CVPR).
Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 1912–1920.
Xu, Y., & Yin, W. (2013). A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3), 1758–1789.
Xu, Y., & Yin, W. (2017). A globally convergent algorithm for nonconvex optimization based on block coordinate update. Journal of Scientific Computing, 72(2), 700–734.
Yadav, S.S., Lopes, P.A.C., Ilic, A., & Patra, S.K. (2019). Hungarian algorithm for subcarrier assignment problem using gpu and cuda. International Journal of Communication Systems, 32(4).
Yang, E., Lozano, A. C., & Aravkin, A. (2018). A general family of trimmed estimators for robust high-dimensional data analysis. Electronic Journal of Statistics, 12(2), 3519–3553.
Yang, H., Shi, J., & Carlone, L. (2020). Teaser: Fast and certifiable point cloud registration. Transactions on Robotics, 37(2), 314–333.
Zangwill, W.I. (1969). Nonlinear programming: a unified approach.
Zhang, H., Slawski, M., & Li, P. (2019). Permutation recovery from multiple measurement vectors in unlabeled sensing. In Proceedings of international symposium on information theory (ISIT).
Zhou, Q.Y., Park, J., & Koltun, V. (2016). Fast global registration. In Proceedings of European conference on computer vision (ECCV).
Acknowledgements
This work was supported by NII CRIS collaborative research program operated by NII CRIS and LINE Corporation.
Additional information
Communicated by Federica Arrigoni.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Supplementary information for the analyses of GSLR
A.1 Derivation of Corollary 1
Given a generic linear subspace \(\mathcal {V} \subset \mathbb {R}^m\) of dimension d (hereafter denoted as \(\textsf{dim}\left( \mathcal {V}\right) = d\)), and a finite set of endomorphisms \(\mathcal {T}\) of \(\mathbb {R}^m\), homomorphic sensing (Tsakiris and Peng 2019) gives the condition for unique recovery in \(\mathcal {V}\) under \(\mathcal {T}\), i.e., the condition under which \(\tau _1\left( v_1\right) =\tau _2\left( v_2\right) \) implies \(v_1 = v_2\) for all \(v_1, v_2 \in \mathcal {V}\) and \(\tau _1, \tau _2 \in \mathcal {T}\).
For self-containment, we first restate Theorem 1 and Proposition 3 of Tsakiris and Peng (2019), which can be jointly described by the following corollary:
Corollary 2
(Unique recovery condition of homomorphic sensing) Assuming \(\forall \tau \in \mathcal {T}, \textsf{rank}\left( \tau \right) \ge 2d\), then, we have unique recovery in \(\mathcal {V}\) under \(\mathcal {T}\) as long as
where \(\mathcal {Q}_{\tau _1, \tau _2}\) is an intermediate technical term that depends on \(\tau _1\) and \(\tau _2\) such that, when \(\pi _1, \pi _2 \in \Pi ^\ddagger \) are full-rank square permutation matrices, we have
where \(\rho \) denotes a coordinate projection matrix.
To apply Corollary 2 to GSLR, we shall consider \(\textbf{Ax}\) as a whole for some d-dimensional \(\textbf{x}\) and set \(\mathcal {V} = \{\textbf{Ax} : \textbf{x} \in \mathcal {X}\}\). Since a linear mapping does not increase dimension, we have \(\textsf{dim}\left( \mathcal {V}\right) \le d\). Furthermore, by setting \(\mathcal {T} = \Pi _k\) and the \(\rho \) in Eq. (17) to the projections that map from \(\Pi ^\ddagger \) to \(\Pi _k\), we obtain
Using Eq. (18) as a lower bound for Eq. (16), we can immediately recognize \(k \ge 2d\) as a sufficient condition for unique recovery.
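As an informal illustration of why fewer than 2d matched pairs can be ambiguous, the following Python sketch considers a toy noise-free instance with d = 1. The data, the helper name `exact_fit_solutions`, and the exhaustive search are assumptions made for this example and are not part of the original analysis. The sketch enumerates every partial matching of size k and collects the values of x that fit all matched pairs exactly.

```python
from fractions import Fraction
from itertools import combinations, permutations


def exact_fit_solutions(a, b, k):
    """Return every x for which some k-pair partial matching fits exactly.

    a, b: lists of nonzero scalars (the d = 1 case). A matching pairs k
    distinct covariates with k distinct responses; it fits exactly when
    b[j] == a[i] * x holds for every matched pair (i, j).
    """
    solutions = set()
    for rows in combinations(range(len(a)), k):
        for cols in permutations(range(len(b)), k):
            # Exact rational arithmetic avoids floating-point ties.
            ratios = {Fraction(b[j]) / Fraction(a[i]) for i, j in zip(rows, cols)}
            if len(ratios) == 1:
                solutions.add(ratios.pop())
    return solutions


# Hypothetical toy data: b = 2 * a on the first three entries,
# the last covariate and the last response are outliers.
a = [1, 2, 3, 5]
b = [2, 4, 6, 7]

under = exact_fit_solutions(a, b, k=1)   # k < 2d
enough = exact_fit_solutions(a, b, k=2)  # k = 2d

print(len(under), sorted(enough))
```

On this toy instance, k = 1 admits many exact fits because any single pair determines some x, whereas k = 2 leaves only the generating value x = 2. This is consistent with the sufficient condition above but does not prove it.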
A.2 Proof of Proposition 1
Proof of Proposition 1
Since the bijectivity will always be satisfied as long as Problem 2 is solved, we here show that if there exists an f to solve Problem 2 that takes the graphs G and H defined in Proposition 1 as inputs, then this f is a linear mapping from a subset of \(\textbf{A}\) to \(\textbf{b}\). Specifically, since \(\textsf{dom}\left( f\right) \) is a finitely discrete set, an f that solves Problem 2 would be a continuous mapping. Furthermore, assuming that vertices are mapped in the form of \(\textbf{A}_i\) to \(\textbf{b}_i\) and \(\textbf{A}_j\) to \(\textbf{b}_j\), we have
which always holds w.r.t. the preimage of \(\textbf{b}\), indicating that f is additive. Combining the continuity and the additivity, we obtain that f is a linear mapping (Mathias 2006). \(\square \)
Appendix B: Supplementary information for the convergence analyses
B.1 Proofs of the global convergence
We here show that the GSLR triplet mentioned in Sect. 4.2.2 satisfies all the conditions of Theorem 1. For self-containment, we first recall the definition of closedness of a set-valued mapping:
Definition 1
(Closedness of set-valued mapping (Gunawardana and Byrne 2005)) The concept of closedness of set-valued mapping \(\mathcal {F}: \mathcal {U} \rightarrow \mathcal {V}\) generalizes the notion of continuity of point-to-point mappings: \(\mathcal {F}\) is closed at a point \(u \in \mathcal {U}\) if
- \(u_k \rightarrow u\), where \(u_k \in \mathcal {U}\),
- \(v_k \rightarrow v\), where \(v, v_k \in \mathcal {V}\),
- \(v_k \in \mathcal {F}(u_k)\),
imply \(v \in \mathcal {F}(u)\). Based on this, \(\mathcal {F}\) is called closed if it is closed at every point of \(\mathcal {U}\).
Now we are ready to introduce the proofs:
Proof of the compactness of \(\mathcal {Z}\). Since \(\Pi \) is discrete and finite, \(\Pi \) is compact. Since \(\mathcal {X}\) is compact per Hypotheses H and finite unions of compact sets are still compact, \(\mathcal {Z}\) is compact. \(\square \)
Proof of the closedness of \(\mathcal {H}\). For any \(\left( \textbf{P}_{t}, \textbf{x}_{t}\right) \), there exist some \(\delta _1, \delta _2 \in \mathbb {R}^{+}\) that satisfy
where \(\mathcal {N}_{\delta _1}(\textbf{P}_t)\) is the neighborhood of \(\textbf{P}_t\). Therefore, we can derive the differences of mapped \(\textbf{x}\) as
Eq. (21) implies that the cost matrix for the k-LAP generated by the regression variable \(\textbf{x} \in \mathcal {H}(\textbf{P, x})\) remains unchanged when the input of \(\mathcal {H}\) switches from \(\left( \textbf{P}_t, \textbf{x}_t\right) \) to \(\left( \mathbf {P'}, \mathbf {x'}\right) \), and hence so does the mapped permutation \(\textbf{P}\). Therefore, combining Eqs. (20) and (21) shows that \(\mathcal {H}\) is closed at \(\left( \textbf{P}_{t}, \textbf{x}_{t}\right) \). \(\square \)
Proof of the continuity and monotonicity of \(\Psi \). For the continuity, since compositions of continuous mappings are still continuous, we only need to prove the continuity of \(f(\textbf{P}, \textbf{x}) = \textbf{PAx}\) and \(g(\textbf{P}) = \textbf{PP}^T\textbf{b}\). Since the proofs for f and g are similar, we only take f as an example. Specifically, because such a function is a bilinear form over the finite-dimensional space \(\left( \textbf{P, x}\right) \), we only need to show that it is separately continuous w.r.t. \(\textbf{P}\) and \(\textbf{x}\). Per Hypotheses H, \(f(\textbf{x}) = \textbf{Ax}\) is continuous w.r.t. \(\textbf{x} \in \mathcal {X}\). Since \(\Pi \) is finite and discrete, \(f(\textbf{P}) = \textbf{PA}\) is also continuous w.r.t. any \(\textbf{P} \in \Pi \). Therefore, \(f(\textbf{P}, \textbf{x}) = \textbf{PAx}\) is continuous, and hence so is \(\Psi \).
For the monotonicity, by conducting the linear regression step w.r.t. \(\textbf{x}\) we have
where the equality holds if and only if \(\textbf{x}_t\) is a critical point. Furthermore, assuming there is a temporary permutation matrix \(\textbf{P}_{\textrm{tmp}}\) that randomly removes \(k_{t+1} - k_t\) non-zero assignments from \(\textbf{P}_t\), since the cost matrix \(\textbf{D}_{ij} = \left( \textbf{A}_i\textbf{x}-\textbf{b}_j\right) ^2\) is non-negative, we can obtain that
Since \(\textbf{P}_{t+1}\) is a global optimum of the objective function w.r.t. \(\textbf{P}\), and it has the same number of inliers as \(\textbf{P}_{\textrm{tmp}}\), we can further derive that
where the equality holds if and only if \(\textbf{P}_\textrm{tmp}\) is a critical point. Combining Eqs. (22)-(24) leads to:
Therefore, we can conclude that Algorithm 1 is monotonically decreasing and the equality in Eq. (25) holds if and only if \(\left( \textbf{P}_t, \textbf{x}_t\right) \) is already a critical point. \(\square \)
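The monotone decrease can be checked numerically on a small instance. The following Python sketch is a simplified stand-in for the alternating scheme, under several assumptions that are not taken from the paper: x is a scalar (d = 1), the number of inliers k is held fixed across iterations, and the assignment step uses exhaustive search in place of a k-LAP solver. The cost of a matched pair is \(\left( \textbf{A}_i\textbf{x}-\textbf{b}_j\right) ^2\) as above; all function names and the toy data are hypothetical.

```python
from itertools import combinations, permutations


def cost(a, b, pairs, x):
    """Sum of squared residuals (a_i * x - b_j)^2 over the matched pairs."""
    return sum((a[i] * x - b[j]) ** 2 for i, j in pairs)


def regression_step(a, b, pairs, x_prev):
    """Closed-form least squares for scalar x with the matching held fixed."""
    denom = sum(a[i] ** 2 for i, _ in pairs)
    if denom == 0:
        return x_prev  # degenerate case: keep the previous estimate
    return sum(a[i] * b[j] for i, j in pairs) / denom


def assignment_step(a, b, k, x):
    """Globally optimal k-pair matching for fixed x, by exhaustive search."""
    best, best_cost = None, float("inf")
    for rows in combinations(range(len(a)), k):
        for cols in permutations(range(len(b)), k):
            pairs = list(zip(rows, cols))
            c = cost(a, b, pairs, x)
            if c < best_cost:
                best, best_cost = pairs, c
    return best


def alternate(a, b, k, x0, iters=10):
    """Alternate the two steps and record the objective after each one."""
    x = x0
    pairs = assignment_step(a, b, k, x)
    history = [cost(a, b, pairs, x)]
    for _ in range(iters):
        x = regression_step(a, b, pairs, x)
        history.append(cost(a, b, pairs, x))
        pairs = assignment_step(a, b, k, x)
        history.append(cost(a, b, pairs, x))
    return x, pairs, history


# Hypothetical toy data with shuffled, noisy responses and outliers.
a = [1.0, 2.0, 3.0, 5.0]
b = [6.1, 2.0, 7.0, 3.9]
x_hat, pairs_hat, history = alternate(a, b, k=3, x0=0.5)
print(x_hat, pairs_hat, history[-1])
```

Each step minimizes the same objective over one block of variables while the other is fixed, so the recorded values in `history` should never increase, up to floating-point rounding. The sketch says nothing about the quality of the stationary point that is reached.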
B.2 Proofs of Propositions 2 and 3
Throughout this appendix, apart from the notations described in Table 1, we also use C to denote a constant whose value may vary from line to line.
To prove Proposition 2, due to the non-smoothness of our Objective (6), we follow the concept of the KL-inequality generalized to sub-analytic sets (Bolte et al. 2007). For self-containment, we first recall the definitions:
Definition 2
(Kurdyka-Łojasiewicz inequality) Let f be a continuous sub-analytic function (i.e., a function whose graph is a sub-analytic set) with closed domain that maps to \(\mathbb {R}\), and let \({\bar{x}} \in \textsf{dom}\left( f\right) \) be a critical point. Then there exists a concave function \(\phi \left( s\right) = Cs^{1 - \theta }\) with \(\theta \in \left[ 0, 1\right) \) and \(C > 0\) such that the following inequality holds for any x within the neighborhood of \({\bar{x}}\) under the conventions \(0^0=1\) and \(0/0=\infty /\infty =0\):
where \(\textrm{dis}\left( 0, \partial f(x)\right) = \inf \{ \left\| x^\star \right\| : x^\star \in \partial f(x) \}\) denotes the point-to-set distance.
Definition 3
(Semi-X & sub-analytic sets) A subset \(\mathcal {S}\) of \(\mathbb {R}^{d}\) is called semi-algebraic if every \(\textbf{x} \in \mathbb {R}^{d}\) admits a neighborhood \(\mathcal {N}\) such that
where \(f_{ij}\) and \(g_{ij}\) are polynomials and \(m, n < +\infty \).
A semi-analytic set generalizes the above definition by relaxing \(f_{ij}\) and \(g_{ij}\) to real analytic functions.
A subset \(\mathcal {S}\) of \(\mathbb {R}^{d}\) is called sub-analytic if every \(\textbf{x} \in \mathbb {R}^{d}\) admits a neighborhood \(\mathcal {N}\), and there exists a bounded semi-analytic set \(\mathcal {Y} \subset \mathbb {R}^{d+o}\) such that \(\mathcal {S} \cap \mathcal {N}\) is the projection from \(\mathcal {Y}\) to \(\mathcal {N}\):
where \(\pi : \mathbb {R}^{d+o} \rightarrow \mathbb {R}^{d}\) is the projection. In summary, the class of sub-analytic sets contains the semi-analytic sets, which in turn contains the semi-algebraic sets.
We also introduce the following lemma:
Lemma 1
The set of generalized permutation matrices \(\Pi \) can be characterized by the following polynomials:
where \(\textbf{e}_i\) is the one-hot vector with 1 at position i, \(\textbf{D} \in \{ \textbf{PP}^T, \textbf{P}^T\textbf{P}\}\), and \(\left\lceil n \right\rfloor \) is the set of positive integers that are no greater than the size of \(\textbf{D}\).
Proof of Lemma 1. This lemma extends a property of full-rank permutation matrices, namely that a matrix is a permutation matrix if and only if it is orthogonal and doubly stochastic (Fogel et al. 2013). The latter two equations imply that the diagonal entries of \(\textbf{D}\) should be within \(\{0, 1\}\) and the off-diagonal ones should be 0. While the sufficiency is guaranteed by the definition of generalized permutation matrices, we here demonstrate that these polynomials are necessary for a matrix to be permutational by forcing \(\textbf{P}\) to be binary. Specifically, suppose there is a matrix \(\textbf{P} \in \Pi \) with entries \(\textbf{P}_{ip}, \ldots , \textbf{P}_{iq} \in \left( 0, 1\right) \) with \(1 \le p < q \le n\) in the \(i^{\textrm{th}}\) row; then the \(pq^{\textrm{th}}\) entry of \(\textbf{D} = \textbf{P}^T\textbf{P}\) can be written as
which contradicts the requirement that the off-diagonal entries of \(\textbf{D}\) should be 0. \(\textbf{D} = \textbf{PP}^T\) follows the same proof. \(\square \)
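The two Gram-matrix conditions used in the proof can be checked directly on small examples. The Python sketch below tests only those two conditions (off-diagonal entries of \(\textbf{PP}^T\) and \(\textbf{P}^T\textbf{P}\) equal to 0, diagonal entries in \(\{0, 1\}\)); it is not the full characterization in Lemma 1, and the helper names and example matrices are assumptions made for illustration.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*X)]


def gram_is_binary_diagonal(P, tol=1e-12):
    """Check that P P^T and P^T P are diagonal with diagonal entries in {0, 1}."""
    for D in (matmul(P, transpose(P)), matmul(transpose(P), P)):
        for r, row in enumerate(D):
            for c, value in enumerate(row):
                if r == c:
                    # Diagonal entry must be (numerically) 0 or 1.
                    if min(abs(value), abs(value - 1)) > tol:
                        return False
                elif abs(value) > tol:
                    # Off-diagonal entry must vanish.
                    return False
    return True


# A partial permutation: two matched pairs, one empty row and column.
partial = [[0, 1, 0],
           [0, 0, 0],
           [1, 0, 0]]

# A matrix with two fractional entries in the same row.
fractional = [[0.5, 0.5, 0],
              [0, 0, 0],
              [0, 0, 1]]

print(gram_is_binary_diagonal(partial), gram_is_binary_diagonal(fractional))
# prints: True False
```

The partial permutation passes, while the matrix with two fractional entries in one row fails because the corresponding off-diagonal entry of \(\textbf{P}^T\textbf{P}\) is nonzero, mirroring the contradiction argument in the proof.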
Now we are ready to prove Proposition 2:
Proof of Proposition 2. The first property is a direct result of the fact that \(\triangledown _\textbf{x} J = 2\textbf{A}^T\textbf{P}^T\left( \textbf{PA}\textbf{x} - \textbf{PP}^T\textbf{b}\right) \) is a continuous affine mapping of \(\textbf{x}\).
For the second property, from Lemma 1 we know that \(\Pi \) is semi-algebraic (hence sub-analytic). Moreover, since \(\mathcal {X}\) is sub-analytic by assumption, Objective (6) is polynomial, and unions and intersections preserve sub-analyticity per Definition 3, the graph of Objective (6) is sub-analytic. Combining this sub-analyticity with the continuity mentioned in Sect. 4.2.2 implies the KL-inequality. \(\square \)
To prove Proposition 3, we introduce the following two lemmas:
Lemma 2
Suppose that Hypotheses H are satisfied, then for any \(t^{\textrm{th}}\) step of Algorithm 1, the following equation holds true:
Proof of Lemma 2. Per the alternating update rule of Algorithm 1, we have
From Eq. (28) we can obtain
which together with Eq. (29) concludes the proof. \(\square \)
Lemma 3
Suppose that Hypotheses H are satisfied, then for any \(t^{\textrm{th}}\) step of Algorithm 1, there exists \(C \in \mathbb {R}^{+}\) such that the following equation holds true:
Proof of Lemma 3 We base the proof on the quotient law of convergent sequences. In detail, let us define the sequences:
1. Sequence of difference between energy values: \(P = \left\{ J_{t} - J_{t+1} : t \in \mathbb {N} \right\} \).
2. Sequence of squared difference between solutions: \(Q = \left\{ \left\| \textbf{z}_t - \textbf{z}_{t+1}\right\| ^2 : t \in \mathbb {N} \right\} \).
3. Sequence of quotient: \(S = \left\{ \frac{P_t}{Q_t} : t \in \mathbb {N} \right\} \).
Per the compactness of \(\mathcal {Z}\) and the continuity of J, we know that both \(\textbf{z}\) and J are bounded. Moreover, per the global convergence mentioned in Sect. 4.2.2, for any \(t \in \mathbb {N}\) we have \(P_t, Q_t \ge 0\) and \(P_t = 0 \Leftrightarrow Q_t = 0\). Therefore, with the convention \(\frac{0}{0} = 1\), for any \(s \in S\) we have \(s \in \mathbb {R}_{+}\), which implies that there always exists some \(C \in \mathbb {R}_{+}\) such that \(\inf S \ge C\) holds. \(\square \)
With Lemmas 2 and 3 in hand, we now can prove Proposition 3:
Proof of Proposition 3 According to Lemma 2 and the Lipschitz continuity of \( \triangledown _{\textbf{x}} J\), we have
where \(L \in \mathbb {R}^{+}\) is the Lipschitz constant.
Assuming the energy value \({\bar{J}}\) at the critical point satisfies \({\bar{J}} = 0\) without loss of generality, combining Eq. (31) and Eq. (26) leads to
On the other hand, by using the concavity of \(\phi \) and Lemma 3, we can obtain
Substituting Eq. (32) into Eq. (33) yields
Taking the square root of both sides of Eq. (34) and applying the inequality \(a^2 + b^2 \ge 2ab\) gives
Summing both sides of Eq. (35) over \(t \in \left[ p, +\infty \right) \) leads to
Since \(\mathcal {Z}\) is compact as mentioned in Sect. 4.2.2, there exists a B such that \(\sup \{\left\| \textbf{z}_i - \textbf{z}_j\right\| : \textbf{z}_i, \textbf{z}_j \in \mathcal {Z} \} = B < +\infty \). Hence Eq. (36) yields
which concludes the proof. \(\square \)
About this article
Cite this article
Li, F., Fujiwara, K., Okura, F. et al. Shuffled Linear Regression with Outliers in Both Covariates and Responses. Int J Comput Vis 131, 732–751 (2023). https://doi.org/10.1007/s11263-022-01709-2
DOI: https://doi.org/10.1007/s11263-022-01709-2