Abstract
Robust model fitting plays a vital role in computer vision, and research into algorithms for robust fitting continues to be active. Arguably the most popular paradigm for robust fitting in computer vision is consensus maximisation, which strives to find the model parameters that maximise the number of inliers. Despite the significant developments in algorithms for consensus maximisation, there has been a lack of fundamental analysis of the problem in the computer vision literature. In particular, whether consensus maximisation is “tractable” remains a question that has not been rigorously dealt with, thus making it difficult to assess and compare the performance of proposed algorithms, relative to what is theoretically achievable. To shed light on these issues, we present several computational hardness results for consensus maximisation. Our results underline the fundamental intractability of the problem, and resolve several ambiguities existing in the literature.
1 Introduction
Robustly fitting a geometric model onto noisy and outlier-contaminated data is a necessary capability in computer vision [1], owing to imperfections in data acquisition systems and preprocessing algorithms (e.g., edge detection, keypoint detection and matching). Without robustness against outliers, the estimated geometric model will be biased, leading to failure of the overall pipeline.
In computer vision, robust fitting is typically performed under the framework of inlier set maximisation, a.k.a. consensus maximisation [2], where one seeks the model with the largest number of inliers. For concreteness, say we wish to estimate the parameter vector \(\mathbf {x}\in \mathbb {R}^d\) that defines the linear relationship \(\mathbf {a}^T \mathbf {x}= b\) from a set of outlier-contaminated measurements \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\). The consensus maximisation formulation for this problem is as follows.
Problem 1
(MAXCON). Given input data \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\), where \(\mathbf {a}_i \in \mathbb {R}^d\) and \(b_i \in \mathbb {R}\), and an inlier threshold \( \epsilon \in \mathbb {R}_+\), find the \(\mathbf {x}\in \mathbb {R}^d\) that maximises
$$\begin{aligned} \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}) = \sum _{i=1}^{N} \mathbb {I}\left( | \mathbf {a}_i^T\mathbf {x}- b_i | \le \epsilon \right) , \end{aligned}$$(1)
where \(\mathbb {I}\) returns 1 if its input predicate is true, and 0 otherwise.
The quantity \(| \mathbf {a}_i^T\mathbf {x}- b_i |\) is the residual of the i-th measurement with respect to \(\mathbf {x}\), and the value given by \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D})\) is the consensus of \(\mathbf {x}\) with respect to \(\mathcal {D}\). Intuitively, the consensus of \(\mathbf {x}\) is the number of inliers of \(\mathbf {x}\). For the robust estimate to fit the inlier structure well, the inlier threshold \(\epsilon \) must be set to an appropriate value; the large number of applications that employ the consensus maximisation framework indicates that this is usually not an obstacle.
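To make the objective (1) concrete, the following minimal sketch (ours, not part of the original formulation; all names and data are illustrative) evaluates the consensus of a candidate model in Python/NumPy:

```python
import numpy as np

def consensus(x, A, b, eps):
    """Evaluate Psi_eps(x | D) as in (1): count the inliers of model x.

    A is the (N, d) matrix stacking the a_i^T, b is the (N,) vector of b_i.
    """
    residuals = np.abs(A @ x - b)          # |a_i^T x - b_i| for all i
    return int(np.sum(residuals <= eps))

# Toy example: 10 points near the line b = 2a + 1, two of them corrupted.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, 10)
A = np.column_stack([a, np.ones(10)])      # model x = (slope, intercept)
b = 2.0 * a + 1.0 + rng.uniform(-0.05, 0.05, 10)
b[:2] += 5.0                               # two gross outliers
print(consensus(np.array([2.0, 1.0]), A, b, eps=0.1))  # prints 8
```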
Developing algorithms for robust fitting, specifically for consensus maximisation, is an active research area in computer vision. Currently, the most popular algorithms belong to the class of randomised sampling techniques, i.e., RANSAC [2] and its variants [3, 4]. Unfortunately, such techniques do not provide certainty of finding satisfactory solutions, let alone optimal ones [5].
Increasingly, attention is given to constructing globally optimal algorithms for robust fitting, e.g., [6,7,8,9,10,11,12,13,14]. Such algorithms are able to deterministically calculate the best possible solution, i.e., the model with the highest achievable consensus. This mathematical guarantee is regarded as desirable, especially in comparison to the “rough” solutions provided by random sampling heuristics.
Recent progress in globally optimal algorithms for consensus maximisation seems to suggest that global solutions can be obtained efficiently or tractably [6,7,8,9,10,11,12,13,14]. Moreover, decent empirical performances have been reported. This raises hopes that good alternatives to the random sampling methods are now available. However, to what extent is the problem solved? Can we expect the global algorithms to perform well in general? Are there fundamental obstacles toward efficient robust fitting algorithms? What do we even mean by “efficient”?
1.1 Our Contributions and Their Implications
Our contributions are theoretical. We resolve the above ambiguities in the literature, by proving the following computational hardness results. The implications of each result are also listed below.
- MAXCON is NP-hard (Theorem 1). Implication: a polynomial-time algorithm that exactly solves MAXCON is unlikely to exist.
- MAXCON is W[1]-hard w.r.t. the dimension d (Theorem 3). Implication: the runtime of any globally optimal algorithm must scale exponentially in d; in particular, a runtime of the form \(f(d)\,\mathrm {poly}(N)\) is not achievable.
- MAXCON is APX-hard (Theorem 5). Implication: there is no polynomial time approximation scheme (PTAS) for MAXCON.
As usual, the implications of the hardness results are subject to the standard complexity assumptions P \(\ne \) NP [15] and FPT \(\ne \) W[1] [16].
Our analysis indicates the “extreme” difficulty of consensus maximisation. Not only is MAXCON intractable (by standard notions of intractability [15, 16]); the W[1]-hardness result also implies that any globally optimal algorithm will scale exponentially in a function of d, i.e., its runtime is of the form \(N^{f(d)}\). In fact, if a conjecture of Erickson et al. [17] holds, MAXCON cannot be solved faster than \(N^d\). Thus, the decent performances reported in [6,7,8,9,10,11,12,13,14] are unlikely to extend to the general cases in practical settings, where \(N \ge 1000\) and \(d \ge 6\) are common. More pessimistically, APX-hardness shows that MAXCON cannot be approximated to arbitrary accuracy, in that there is no polynomial time approximation scheme (PTAS) [18] for MAXCON (see footnote 1).
A slightly positive result is as follows.
MAXCON is fixed-parameter tractable (FPT) in the combined parameter comprising the number of outliers o and the dimension d (Theorem 4).
This is achieved by applying a special case of the algorithm of Chin et al. [13] to MAXCON, yielding a runtime of \(\mathcal {O}(d^o\,\mathrm {poly}(N,d))\). However, this still scales exponentially in the number of outliers o, which can be large in practice (e.g., \(o \ge 100\)).
1.2 How Are Our Theoretical Results Useful?
First, our results clarify the ambiguities on the efficiency and solvability of consensus maximisation alluded to above. Second, our analysis shows how the effort scales with the different input size parameters, thus suggesting more cogent ways for researchers to test/compare algorithms. Third, since developing algorithms for consensus maximisation is an active topic, our hardness results encourage researchers to consider alternative paradigms of optimisation, e.g., deterministically convergent heuristic algorithms [19,20,21] or preprocessing techniques [22,23,24].
1.3 What About Non-linear Models?
Our results are based specifically on MAXCON, which is concerned with fitting linear models. In practice, computer vision applications require the fitting of non-linear geometric models (e.g., fundamental matrix, homography, rotation). While a case-by-case treatment is ideal, it is unlikely that non-linear consensus maximisation will be easier than linear consensus maximisation [25,26,27].
1.4 Why Not Employ Other Robust Statistical Procedures?
Our purpose here is not to benchmark or advocate certain robust criteria. First, our primary aim is to establish the fundamental difficulty of consensus maximisation, which is widely used in computer vision. Second, it is unlikely that other robust criteria are easier to solve [28]. Although some criteria that use differentiable robust loss functions (e.g., M-estimators) can be optimised up to local optimality, it is unknown how far the local optima deviate from the global solution.
The rest of the paper is devoted to developing the above hardness results.
2 NP-hardness
The decision version of MAXCON is as follows.
Problem 2
(MAXCON-D). Given data \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\), an inlier threshold \(\epsilon \in \mathbb {R}_+\), and a number \(\psi \in \mathbb {N}_+\), does there exist \(\mathbf {x}\in \mathbb {R}^d\) such that \(\varPsi _\epsilon (\mathbf {x}\mid \mathcal {D}) \ge \psi \)?
Another well-known robust fitting paradigm is least median squares (LMS), where we seek the vector \(\mathbf {x}\) that minimises the median of the residuals
$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^d}~\mathop {\mathrm {med}}_{i = 1,\dots ,N}~| \mathbf {a}_i^T\mathbf {x}- b_i |. \end{aligned}$$(2)
LMS can be generalised by minimising the k-th largest residual instead
$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^d}~\mathop {\mathrm {kos}}_{k}\left( | \mathbf {a}_1^T\mathbf {x}- b_1 |, \dots , | \mathbf {a}_N^T\mathbf {x}- b_N | \right) , \end{aligned}$$(3)
where function \(\mathrm {kos}\) returns its k-th largest input value.
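For illustration, here is a minimal sketch (ours; NumPy assumed) of the objectives (2) and (3); note that (2) is recovered from (3) by choosing k around N/2:

```python
import numpy as np

def kos(values, k):
    """Return the k-th largest input value (k = 1 gives the maximum), cf. (3)."""
    return np.sort(values)[-k]

def lms_objective(x, A, b):
    """Median of absolute residuals, the LMS objective (2)."""
    return np.median(np.abs(A @ x - b))

def kos_objective(x, A, b, k):
    """k-th largest absolute residual, the generalised objective (3)."""
    return kos(np.abs(A @ x - b), k)
```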
Geometrically, LMS seeks the slab of the smallest width that contains half of the data points \(\mathcal {D}\) in \(\mathbb {R}^{d+1}\). A slab in \(\mathbb {R}^{d+1}\) is defined by a normal vector \(\mathbf {x}\) and width w as
$$\begin{aligned} h_w(\mathbf {x}) = \left\{ (\mathbf {a}, b) \in \mathbb {R}^{d} \times \mathbb {R}~\mid ~| \mathbf {a}^T\mathbf {x}- b | \le \tfrac{w}{2} \right\} . \end{aligned}$$(4)
Problem (3) thus seeks the thinnest slab that contains k of the points. The decision version of (3) is as follows.
Problem 3
(k-SLAB). Given data \(\mathcal {D}= \{(\mathbf {a}_i,b_i)\}^{N}_{i=1}\), an integer k where \(1 \le k \le N\), and a number \(w^\prime \in \mathbb {R}_+\), does there exist \(\mathbf {x}\in \mathbb {R}^d\) such that k of the members of \(\mathcal {D}\) are contained in a slab \(h_w(\mathbf {x})\) of width at most \(w^\prime \)?
k-SLAB has been proven to be NP-complete in [17].
Theorem 1
MAXCON-D is NP-complete.
Proof
Let \(\mathcal {D}\), k and \(w^\prime \) define an instance of k-SLAB. This can be reduced to an instance of MAXCON-D by simply reusing the same \(\mathcal {D}\), and setting \(\epsilon = \frac{1}{2}w^\prime \) and \(\psi = k\). If the answer to k-SLAB is positive, then there is an \(\mathbf {x}\) such that k points from \(\mathcal {D}\) lie within vertical distance \(\frac{1}{2}w^\prime \) of the hyperplane defined by \(\mathbf {x}\), hence \(\varPsi _\epsilon (\mathbf {x}\mid \mathcal {D})\) must be at least \(\psi \) and the answer to MAXCON-D is also positive. Conversely, if the answer to MAXCON-D is positive, then there is an \(\mathbf {x}\) such that \(\psi \) points have vertical distance of at most \(\epsilon \) to the hyperplane defined by \(\mathbf {x}\), hence a slab centred on that hyperplane of width at most \(w^\prime \) encloses k of the points, and the answer to k-SLAB is also positive.\(\square \)
The NP-completeness of MAXCON-D implies the NP-hardness of the optimisation version MAXCON. See Sect. 1.1 for the implications of NP-hardness.
3 Parametrised Complexity
Parametrised complexity is a branch of algorithmics that investigates the inherent difficulty of problems with respect to structural parameters in the input [16]. In this section, we report several parametrised complexity results of MAXCON.
First, the consensus set \(\mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D})\) of \(\mathbf {x}\) is defined as
$$\begin{aligned} \mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D}) = \left\{ i \in \{1,\dots ,N\}~\mid ~| \mathbf {a}_i^T\mathbf {x}- b_i | \le \epsilon \right\} . \end{aligned}$$(5)
An equivalent definition of consensus (1) is thus
$$\begin{aligned} \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}) = | \mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D}) |. \end{aligned}$$(6)
Henceforth, we do not distinguish between the integer subset \(\mathcal {C}\subseteq \{1,\dots ,N \}\) that indexes a subset of \(\mathcal {D}\), and the actual data that are indexed by \(\mathcal {C}\).
3.1 XP in the Dimension
The following is the Chebyshev approximation problem [29, Chapter 2] defined on the input data indexed by \(\mathcal {C}\):
$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^d}~\max _{i \in \mathcal {C}}~| \mathbf {a}_i^T\mathbf {x}- b_i |. \end{aligned}$$(7)
Problem (7) has the linear programming (LP) formulation
$$\begin{aligned} \mathrm {LP}[\mathcal {C}]: \quad \min _{\mathbf {x}\in \mathbb {R}^d,\, \gamma \in \mathbb {R}}~\gamma \quad \text {subject to} \quad | \mathbf {a}_i^T\mathbf {x}- b_i | \le \gamma ~~\forall i \in \mathcal {C}, \end{aligned}$$(8)
which can be solved in polynomial time. Chebyshev approximation also has the following property.
Lemma 1
There is a subset \(\mathcal {B}\) of \(\mathcal {C}\), where \(|\mathcal {B}| \le d+1\), such that
$$\begin{aligned} \min _{\mathbf {x}}~\max _{i \in \mathcal {B}}~| \mathbf {a}_i^T\mathbf {x}- b_i | = \min _{\mathbf {x}}~\max _{i \in \mathcal {C}}~| \mathbf {a}_i^T\mathbf {x}- b_i |. \end{aligned}$$
Proof
See [29, Sect. 2.3]. \(\square \)
We call \(\mathcal {B}\) a basis of \(\mathcal {C}\). Mathematically, \(\mathcal {B}\) is the set of active constraints of \(\mathrm {LP}[\mathcal {C}]\), hence bases can be computed easily. In fact, \(\mathrm {LP}[\mathcal {B}]\) and \(\mathrm {LP}[\mathcal {C}]\) have the same minimisers. Further, for any subset \(\mathcal {B}\) of size \(d+1\), a method by de la Vallée-Poussin can solve \(\mathrm {LP}[\mathcal {B}]\) analytically in time polynomial in d; see [29, Chapter 2] for details.
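As an illustration of \(\mathrm {LP}[\mathcal {C}]\) and of basis extraction, here is a sketch (ours) using scipy.optimize.linprog; reading a basis off the active constraints is a simplification that assumes non-degenerate input:

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_lp(A, b):
    """Solve LP[C] in (8): minimise gamma s.t. |a_i^T x - b_i| <= gamma.

    Returns the minimiser x_hat, the minimised value gamma_hat, and the
    indices of the active constraints, from which a basis (Lemma 1) can
    be read off in the non-degenerate case.
    """
    N, d = A.shape
    c = np.zeros(d + 1)
    c[-1] = 1.0                      # variables z = [x; gamma]; minimise gamma
    # a_i^T x - gamma <= b_i   and   -a_i^T x - gamma <= -b_i
    A_ub = np.vstack([np.hstack([A, -np.ones((N, 1))]),
                      np.hstack([-A, -np.ones((N, 1))])])
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * d + [(0, None)])
    x_hat, gamma_hat = res.x[:d], res.x[-1]
    active = np.flatnonzero(np.isclose(np.abs(A @ x_hat - b), gamma_hat))
    return x_hat, gamma_hat, active
```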
Let \(\mathbf {x}\) be an arbitrary candidate solution to MAXCON, and \((\hat{\mathbf {x}}, \hat{\gamma })\) be the minimisers to \(\mathrm {LP}[\mathcal {C}_\epsilon (\mathbf {x}\mid \mathcal {D})]\), i.e., the Chebyshev approximation problem on the consensus set of \(\mathbf {x}\). The following property can be established.
Lemma 2
\(\mathrm {\Psi }_\epsilon (\hat{\mathbf {x}} \mid \mathcal {D}) \ge \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D})\).
Proof
By construction, \(\hat{\gamma } \le \epsilon \). Hence, if \((\mathbf {a}_i, b_i)\) is an inlier to \(\mathbf {x}\), i.e., \(|\mathbf {a}^T_i \mathbf {x}- b_i| \le \epsilon \), then \(|\mathbf {a}_i^T\hat{\mathbf {x}} -b_i | \le \hat{\gamma } \le \epsilon \), i.e., \((\mathbf {a}_i, b_i)\) is also an inlier to \(\hat{\mathbf {x}}\). Thus, the consensus of \(\hat{\mathbf {x}}\) is no smaller than the consensus of \(\mathbf {x}\). \(\square \)
Lemmas 1 and 2 suggest a rudimentary algorithm for consensus maximisation that attempts to find the basis of the maximum consensus set, as encapsulated in the proof of the following theorem.
Theorem 2
MAXCON is XP (slice-wise polynomial) in the dimension d.
Proof
Let \(\mathbf {x}^*\) be a witness to an instance of MAXCON-D with positive answer, i.e., \(\mathrm {\Psi }_\epsilon (\mathbf {x}^* \mid \mathcal {D}) \ge \psi \). Let \((\hat{\mathbf {x}}^*, \hat{\gamma }^*)\) be the minimisers to \(\mathrm {LP}[\mathcal {C}_\epsilon (\mathbf {x}^* \mid \mathcal {D})]\). By Lemma 2, \(\hat{\mathbf {x}}^*\) is also a positive witness to the instance. By Lemma 1, \(\hat{\mathbf {x}}^*\) can be found by enumerating all \((d+1)\)-subsets of \(\mathcal {D}\), and solving Chebyshev approximation (7) on each \((d+1)\)-subset. There are a total of \(\left( {\begin{array}{c}N\\ d+1\end{array}}\right) \) subsets to check; including the time to evaluate \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D})\) for each candidate, the runtime of this simple algorithm is \(\mathcal {O}(N^{d+2}\mathrm {poly}(d))\), which is polynomial in N for a fixed d. \(\square \)
Theorem 2 shows that for a fixed dimension d, MAXCON can be solved in time polynomial in the number of measurements N (this is consistent with the results in [8, 12]). However, this does not imply that MAXCON is tractable (following the standard meaning of tractability in complexity theory [15, 16]). Moreover, in practical applications, d could be large (e.g., \(d \ge 5\)), thus the rudimentary algorithm above will not be efficient for large N.
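The rudimentary algorithm in the proof of Theorem 2 can be sketched as follows (ours; it reuses chebyshev_lp and consensus from the earlier sketches):

```python
from itertools import combinations

def maxcon_xp(A, b, eps):
    """Enumerate all (d+1)-subsets, solve Chebyshev approximation (7) on
    each, and keep the best candidate; O(N^{d+2} poly(d)) time overall.
    """
    N, d = A.shape
    best_x, best_psi = None, -1
    for S in combinations(range(N), d + 1):
        idx = list(S)
        x_hat, _, _ = chebyshev_lp(A[idx], b[idx])
        psi = consensus(x_hat, A, b, eps)
        if psi > best_psi:
            best_x, best_psi = x_hat, psi
    return best_x, best_psi
```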
3.2 W[1]-Hard in the Dimension
Can we remove d from the exponent of the runtime of a globally optimal algorithm? By establishing W[1]-hardness in the dimension, this section shows that this is not possible. Our proofs are inspired by, but extend quite significantly beyond, those of [30, Sect. 5]. First, the source problem is as follows.
Problem 4
(k-CLIQUE). Given an undirected graph \(G = (V, E)\) with vertex set V and edge set E, and a parameter \(k \in \mathbb {N}_+\), does there exist a clique in G with k vertices?
k-CLIQUE is W[1]-hard w.r.t. the parameter k [31]. Here, we demonstrate an FPT reduction from k-CLIQUE to MAXCON-D, where the dimension d of the constructed instance depends only on k.
Generating the Input Data. Given input graph \(G = (V, E)\), where \(V = \{1,\dots ,M \}\), and size k, we construct a \((k+1)\)-dimensional point set \(\mathcal {D}_{G} = \{ (\mathbf {a}_i, b_i) \}^{N}_{i=1} = \mathcal {D}_V \cup \mathcal {D}_E\) as follows:
- The set \(\mathcal {D}_V\) is defined as
$$\begin{aligned} \mathcal {D}_V = \{ (\mathbf {a}^v_\alpha , b^v_\alpha ) \}_{\alpha = 1,\dots ,k}^{v = 1, \dots , M}, \end{aligned}$$(9)
where
$$\begin{aligned} \mathbf {a}^v_\alpha = \left[ 0, \dots , 0, 1, 0, \dots , 0 \right] ^T \end{aligned}$$(10)
is a k-dimensional vector of 0’s, except at the \(\alpha \)-th element where the value is 1, and
$$\begin{aligned} b^v_\alpha = v. \end{aligned}$$(11)
- The set \(\mathcal {D}_E\) is defined as
$$\begin{aligned} \begin{aligned} \mathcal {D}_E = \{ (\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta }) \mid ~&u, v = 1,\dots ,M, \\&\langle u, v \rangle \in E, \langle v, u \rangle \in E,\\&\alpha , \beta = 1, \dots , k,\\&\alpha < \beta ~\}, \end{aligned} \end{aligned}$$(12)
where
$$\begin{aligned} \mathbf {a}^{u,v}_{\alpha ,\beta } = \left[ 0, \dots , 0, 1, 0, \dots , 0, M, 0, \dots , 0 \right] ^T \end{aligned}$$(13)
is a k-dimensional vector of 0’s, except at the \(\alpha \)-th element where the value is 1 and the \(\beta \)-th element where the value is M, and
$$\begin{aligned} b^{u,v}_{\alpha ,\beta } = u + Mv. \end{aligned}$$(14)
The size N of \(\mathcal {D}_{G}\) is thus \(|\mathcal {D}_V| + |\mathcal {D}_E| = kM + 2|E|\left( {\begin{array}{c}k\\ 2\end{array}}\right) \).
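For concreteness, here is a sketch (ours) of the construction (9)-(14); the encoding of edges as unordered integer pairs is an assumption of the sketch:

```python
import numpy as np

def clique_to_maxcon(M, edges, k):
    """Build the MAXCON-D instance (D_G, eps, psi) from a k-CLIQUE instance
    over vertices {1, ..., M}, following (9)-(14) and Lemma 5.
    """
    A_rows, b_vals = [], []
    # D_V: one point per (vertex v, coordinate alpha), see (9)-(11).
    for v in range(1, M + 1):
        for alpha in range(k):
            a = np.zeros(k)
            a[alpha] = 1.0
            A_rows.append(a)
            b_vals.append(float(v))
    # D_E: one point per ordered edge (u, v) and pair alpha < beta, see (12)-(14).
    ordered = [(u, v) for (u, v) in edges] + [(v, u) for (u, v) in edges]
    for (u, v) in ordered:
        for alpha in range(k):
            for beta in range(alpha + 1, k):
                a = np.zeros(k)
                a[alpha], a[beta] = 1.0, float(M)
                A_rows.append(a)
                b_vals.append(float(u + M * v))
    eps = 1.0 / (M + 2) - 1e-9     # any eps < 1/(M+2) works (Lemma 5)
    psi = k + k * (k - 1) // 2     # target consensus k + C(k, 2)
    return np.array(A_rows), np.array(b_vals), eps, psi
```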
Setting the Inlier Threshold. Under our reduction, \(\mathbf {x}\in \mathbb {R}^d\) is responsible for “selecting” a subset of the vertices V and edges E of G. First, we say that \(\mathbf {x}\) selects vertex v if a point \((\mathbf {a}^v_\alpha , b^v_\alpha ) \in \mathcal {D}_V\), for some \(\alpha \), is an inlier to \(\mathbf {x}\), i.e., if
$$\begin{aligned} | x_\alpha - v | \le \epsilon , \end{aligned}$$(15)
where \(x_\alpha \) is the \(\alpha \)-th element of \(\mathbf {x}\). The key question is how to set the value of the inlier threshold \(\epsilon \), such that \(\mathbf {x}\) selects no more than k vertices, or equivalently, such that \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_V) \le k\) for all \(\mathbf {x}\).
Lemma 3
If \(\epsilon < \frac{1}{2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_V) \le k\), with equality achieved if and only if \(\mathbf {x}\) selects k vertices of G.
Proof
For any u and v, the ranges \([u-\epsilon , u+\epsilon ]\) and \([v-\epsilon , v+\epsilon ]\) cannot overlap if \(\epsilon < \frac{1}{2}\). Hence, \(x_\alpha \) lies in at most one of the ranges, i.e., each element of \(\mathbf {x}\) selects at most one of the vertices; see Fig. 1. This implies that \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_V) \le k\). \(\square \)
Fig. 1. The blue dots indicate the integer values in the dimensions \(x_\alpha \) and \(x_\beta \). If \(\epsilon <\frac{1}{2}\), then the ranges defined by (15) for all \(v = 1,\dots ,M\) do not overlap. Hence, \(x_\alpha \) can select at most one vertex of the graph. (Color figure online)
Second, a point \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\) from \(\mathcal {D}_E\) is an inlier to \(\mathbf {x}\) if
$$\begin{aligned} | x_\alpha + M x_\beta - (u + Mv) | \le \epsilon . \end{aligned}$$(16)
As suggested by (16), the pairs of elements of \(\mathbf {x}\) are responsible for selecting the edges of G. To prevent each element pair \(x_\alpha , x_\beta \) from selecting more than one edge, or equivalently, to maintain \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_E) \le \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), the setting of \(\epsilon \) is crucial.
Lemma 4
If \(\epsilon < \frac{1}{2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_E) \le \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), with equality achieved if and only if \(\mathbf {x}\) selects \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) edges of G.
Proof
For each \(\alpha , \beta \) pair, the constraint (16) is equivalent to the two linear inequalities
$$\begin{aligned} x_\alpha + M x_\beta \le u + Mv + \epsilon , \qquad x_\alpha + M x_\beta \ge u + Mv - \epsilon , \end{aligned}$$(17)
which specify two opposing half-planes (i.e., a slab) in the space \((x_\alpha ,x_\beta )\). Note that the slopes of the half-plane boundaries do not depend on u and v. For any two unique pairs \((u_1, v_1)\) and \((u_2, v_2)\), we have the four linear inequalities
$$\begin{aligned} u_1 + Mv_1 - \epsilon \le x_\alpha + M x_\beta \le u_1 + Mv_1 + \epsilon , \qquad u_2 + Mv_2 - \epsilon \le x_\alpha + M x_\beta \le u_2 + Mv_2 + \epsilon . \end{aligned}$$(18)
The system (18) can be simplified to
$$\begin{aligned} -2\epsilon \le (u_1 - u_2) + M(v_1 - v_2) \le 2\epsilon . \end{aligned}$$(19)
Since \((u_1, v_1) \ne (u_2, v_2)\) and all values are integers in \(\{1,\dots ,M\}\), we have \(| (u_1 - u_2) + M(v_1 - v_2) | \ge 1\); thus, setting \(\epsilon < \frac{1}{2}\) ensures that the two inequalities (19) cannot be consistent for any unique pairs \((u_1, v_1)\) and \((u_2, v_2)\). Geometrically, with \(\epsilon < \frac{1}{2}\), the two slabs defined by (17) for different \((u_1, v_1)\) and \((u_2, v_2)\) pairs do not intersect; see Fig. 2 for an illustration.
Fig. 2. The blue dots indicate the integer values in the dimensions \(x_\alpha \) and \(x_\beta \). If \(\epsilon <\frac{1}{2}\), then any two slabs defined by (17) for different \((u_1, v_1)\) and \((u_2, v_2)\) pairs do not intersect. The figure shows two slabs corresponding to \(u_1 = 1\), \(v_1 = 5\), \(u_2 = 2\), \(v_2 = 5\). (Color figure online)
Hence, if \(\epsilon < \frac{1}{2}\), each element pair \(x_\alpha , x_\beta \) of \(\mathbf {x}\) can select at most one of the edges. Cumulatively, \(\mathbf {x}\) can select at most \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) edges, thus \(\mathrm {\Psi }_{\epsilon }(\mathbf {x}\mid \mathcal {D}_E) \le \left( {\begin{array}{c}k\\ 2\end{array}}\right) \). \(\square \)
Up to this stage, we have shown that if \(\epsilon < \frac{1}{2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) \le k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), with equality achievable if there is a clique of size k in G. To establish the FPT reduction, we need to establish the reverse direction, i.e., if \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), then there is a k-clique in G. The following lemma shows that this can be assured by setting \(\epsilon <\frac{1}{M+2}\).
Lemma 5
If \(\epsilon < \frac{1}{M+2}\), then \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) \le k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), with equality achievable if and only if there is a clique of size k in G.
Proof
The ‘only if’ direction has already been proven. To prove the ‘if’ direction, we show that if \(\epsilon <\frac{1}{M+2}\) and \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), the vertex set \(S(\mathbf {x}) = \{\lfloor x_1 \rceil ,\ldots ,\lfloor x_k \rceil \}\) forms a k-clique, where each \(\lfloor x_\alpha \rceil \) represents a vertex index in G. Since \(\epsilon <\frac{1}{2}\), \(\lfloor x_\alpha \rceil = u\) if and only if \((\mathbf {a}^u_\alpha , b^u_\alpha )\) is an inlier. Therefore, \(S(\mathbf {x})\) consists of all vertices selected by \(\mathbf {x}\). From Lemmas 3 and 4, when \(\mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_G) = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \), \(\mathbf {x}\) is consistent with k points in \(\mathcal {D}_V\) and \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) points in \(\mathcal {D}_E\). The inliers in \(\mathcal {D}_V\) specify the k vertices in \(S(\mathbf {x})\). The ‘if’ direction holds if all \(\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) selected edges connect vertices in \(S(\mathbf {x})\), i.e., for each inlier point \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\in \mathcal {D}_E\), \((\mathbf {a}^u_\alpha , b^u_\alpha )\) and \((\mathbf {a}^v_\beta , b^v_\beta )\) are also inliers w.r.t. \(\mathbf {x}\). The proof is by contradiction:
If \(\epsilon <\frac{1}{M+2}\), given an inlier \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\), from (16) we have
$$\begin{aligned} | x_\alpha + M x_\beta - (u + Mv) | \le \epsilon . \end{aligned}$$(20)
Assume that at least one of \((\mathbf {a}^u_\alpha , b^u_\alpha )\) and \((\mathbf {a}^v_\beta , b^v_\beta )\) is not an inlier; then, from (15) and \(\epsilon <\frac{1}{M+2}\), we have \(\lfloor x_\alpha \rceil \ne u\) or \(\lfloor x_\beta \rceil \ne v\), which means that at least one of \((\lfloor x_\alpha \rceil -u)\) and \((\lfloor x_\beta \rceil -v)\) is not zero. Since all elements of \(\mathbf {x}\) satisfy (15), both \((\lfloor x_\alpha \rceil -u)\) and \((\lfloor x_\beta \rceil -v)\) are integers in \([-(M-1), M-1]\). If only one of \((\lfloor x_\alpha \rceil -u)\) and \((\lfloor x_\beta \rceil -v)\) is not zero, then \(|(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)| \ge |1+M\cdot 0| = 1\). If both are not zero, then \(|(\lfloor x_\alpha \rceil -u) + M(\lfloor x_\beta \rceil - v)| \ge |-(M-1)+M\cdot 1| = 1\). Therefore, we have
$$\begin{aligned} | (\lfloor x_\alpha \rceil - u) + M(\lfloor x_\beta \rceil - v) | \ge 1. \end{aligned}$$(21)
Also due to (15), we have
$$\begin{aligned} | x_\alpha - \lfloor x_\alpha \rceil | \le \epsilon \quad \text {and} \quad | x_\beta - \lfloor x_\beta \rceil | \le \epsilon . \end{aligned}$$(22)
Combining (21) and (22), we have
$$\begin{aligned} | x_\alpha + M x_\beta - (u + Mv) | \ge 1 - (M+1)\epsilon > \epsilon , \end{aligned}$$(23)
which contradicts (20). It is obvious that \(S(\mathbf {x})\) can be computed in linear time. Hence, the ‘if’ direction is true when \(\epsilon <\frac{1}{M+2}\).\(\square \)
To illustrate Lemma 5, Fig. 3 depicts the value of \(\mathrm {\Psi }_\epsilon ( \mathbf {x}\mid \mathcal {D}_G )\) in the subspace \((x_\alpha , x_\beta )\) for \(\epsilon < \frac{1}{M+2}\). Observe that \(\mathrm {\Psi }_\epsilon ( \mathbf {x}\mid \mathcal {D}_G )\) attains the highest value of 3 in this subspace if and only if \(x_\alpha \) and \(x_\beta \) select a pair of vertices that are connected by an edge in G.
Fig. 3. If \(\epsilon <\frac{1}{M+2}\), then the slab (17) that contains a point \((\mathbf {a}^{u,v}_{\alpha ,\beta }, b^{u,v}_{\alpha ,\beta })\in \mathcal {D}_E\), where (u, v) is an edge in G, does not intersect any grid region besides the one formed by \((\mathbf {a}^u_\alpha , b^u_\alpha )\) and \((\mathbf {a}^v_\beta , b^v_\beta )\). In this figure, \(u = 1\) and \(v = 5\).
Completing the Reduction. We have demonstrated a reduction from k-CLIQUE to MAXCON-D, where the main work is to generate the data \(\mathcal {D}_G\), which has \(N = k|V| + 2|E|\left( {\begin{array}{c}k\\ 2\end{array}}\right) \) measurements (linear in the size of G and polynomial in k) and dimension \(d = k\). In other words, the reduction is FPT in k. Setting \(\epsilon < \frac{1}{M+2}\) and \(\psi = k + \left( {\begin{array}{c}k\\ 2\end{array}}\right) \) completes the reduction.
Theorem 3
MAXCON is W[1]-hard w.r.t. the dimension d.
Proof
Since k-CLIQUE is W[1]-hard w.r.t. k, by the above FPT reduction, MAXCON is W[1]-hard w.r.t. d. \(\square \)
The implications of Theorem 3 have been discussed in Sect. 1.1.
3.3 FPT in the Number of Outliers and Dimension
Let \(f(\mathcal {C})\) and \(\hat{\mathbf {x}}_\mathcal {C}\) respectively denote the minimised objective value and the minimiser of \(\mathrm {LP}[\mathcal {C}]\). Consider two subsets \(\mathcal {P}\) and \(\mathcal {Q}\) of \(\mathcal {D}\), where \(\mathcal {P}\subseteq \mathcal {Q}\). The statement
$$\begin{aligned} f(\mathcal {P}) \le f(\mathcal {Q}) \end{aligned}$$(24)
follows from the fact that \(\mathrm {LP}[\mathcal {P}]\) contains only a subset of the constraints of \(\mathrm {LP}[\mathcal {Q}]\); we call this property monotonicity.
Let \(\mathbf {x}^*\) be a global solution of an instance of MAXCON, and let \(\mathcal {I}^* := \mathcal {C}_\epsilon (\mathbf {x}^* \mid \mathcal {D}) \subset \mathcal {D}\) be the maximum consensus set. Let \(\mathcal {C}\) index a subset of \(\mathcal {D}\), and let \(\mathcal {B}\) be the basis of \(\mathcal {C}\). If \(f(\mathcal {C}) > \epsilon \), then by Lemma 1
$$\begin{aligned} f(\mathcal {B}) = f(\mathcal {C}) > \epsilon . \end{aligned}$$(25)
The monotonicity property affords us further insight.
Lemma 6
At least one point in \(\mathcal {B}\) does not belong to \(\mathcal {I}^*\).
Proof
By monotonicity,
$$\begin{aligned} f(\mathcal {I}^* \cup \mathcal {B}) \ge f(\mathcal {B}) > \epsilon . \end{aligned}$$(26)
Hence, \(\mathcal {I}^* \cup \mathcal {B}\) cannot be equal to \(\mathcal {I}^*\), for if they were equal, then \(f(\mathcal {I}^* \cup \mathcal {B}) = f(\mathcal {I}^*) \le \epsilon \) which violates (26). \(\square \)
The above observations suggest an algorithm for MAXCON that recursively removes basis points to find a consensus set, as summarised in Algorithm 1. This algorithm is a special case of the technique of Chin et al. [13]. Note that in the worst case, Algorithm 1 finds a solution with consensus d (i.e., the minimal case to fit \(\mathbf {x}\)), if there are no solutions with higher consensus to be found.
Algorithm 1. Depth-first tree search that recursively removes basis points from \(\mathcal {D}\) until a consensus set is found (a special case of the method of [13]).
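Here is a sketch (ours) of Algorithm 1, reusing chebyshev_lp from Sect. 3.1; the memoisation over removed subsets is a simple stand-in for the repeated-basis avoidance of [13]:

```python
import numpy as np

def algorithm1(A, b, eps, removed=frozenset(), memo=None):
    """Depth-first search that recursively removes basis points (Lemma 6)
    until the remaining points form a consensus set; the largest consensus
    set found over all branches is returned.
    """
    if memo is None:
        memo = {}
    if removed in memo:
        return memo[removed]
    keep = np.array(sorted(set(range(len(b))) - removed))
    x_hat, gamma_hat, basis = chebyshev_lp(A[keep], b[keep])
    if gamma_hat <= eps:                    # `keep` is a consensus set
        result = (x_hat, keep)
    else:
        # Lemma 6: at least one basis point lies outside I*; branch on each.
        result = (None, np.array([], dtype=int))
        for j in keep[basis]:               # map basis indices back to D
            x, C = algorithm1(A, b, eps, removed | {j}, memo)
            if len(C) > len(result[1]):
                result = (x, C)
    memo[removed] = result
    return result
```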
Theorem 4
MAXCON is FPT in the number of outliers and dimension.
Proof
Algorithm 1 conducts a depth-first tree search to find a recursive sequence of basis points to remove from \(\mathcal {D}\) to yield a consensus set. By Lemma 6, the longest sequence of basis points that needs to be removed is \(o = N-|\mathcal {I}^*|\), which is also the maximum tree depth searched by the algorithm (each descent of the tree removes one point). The number of nodes visited is of order \((d+1)^o\), since the branching factor of the tree is \(|\mathcal {B}|\), and by Lemma 1, \(|\mathcal {B}| \le d+1\).
At each node, \(\mathrm {LP}[\mathcal {C}]\) is solved, with the largest of these LPs having \(d+1\) variables and N constraints. Algorithm 1 thus runs in \(\mathcal {O}(d^o \mathrm {poly}(N,d))\) time, which is exponential only in the number of outliers o and dimension d.\(\square \)
Using [32, Theorem 2.3] and the repeated basis detection and avoidance procedure in [13, Sec. 3.1], the complexity of Algorithm 1 can be improved to \(\mathcal {O}((o+1)^d\mathrm {poly}(N,d))\). See [33, Sec. 3.5] for details.
4 Approximability
Given the inherent intractability of MAXCON, it is natural to seek recourse in approximate solutions. However, this section shows that it is not possible to construct PTAS [18] for MAXCON.
Our development here is inspired by [34, Sect. 3.2]. First, we define our source problem: given a set of k Boolean variables \(\{ v_j \}^{k}_{j=1}\), a literal is either one of the variables, e.g., \(v_j\), or its negation, e.g., \(\lnot v_j\). A clause is a disjunction over a set of literals, e.g., \(v_1 \vee \lnot v_2 \vee v_3\). A truth assignment is a setting of the values of the k variables. A clause is satisfied if it evaluates to true.
Problem 5
(MAX-2SAT). Given M clauses \(\mathcal {K}= \{ \mathcal {K}_i \}^{M}_{i=1}\) over k Boolean variables \(\{ v_j \}^{k}_{j=1}\), where each clause has exactly two literals, what is the maximum number of clauses that can be satisfied by a truth assignment?
MAX-2SAT is APX-hard [35], meaning that, unless P \(=\) NP, there is no polynomial-time algorithm that can approximate MAX-2SAT to within an arbitrary desired error ratio. Here, we show an L-reduction [36] from MAX-2SAT to MAXCON, which unfortunately establishes that MAXCON is also APX-hard.
Generating the Input Data. Given an instance of MAX-2SAT with clauses \(\mathcal {K}= \{ \mathcal {K}_i \}^{M}_{i=1}\) over variables \(\{ v_j \}^{k}_{j=1}\), let each clause \(\mathcal {K}_i\) be represented as \((\pm v_{\alpha _i})\vee (\pm v_{\beta _i})\), where \(\alpha _i, \beta _i \in \{1, \dots , k\}\) index the variables that occur in \(\mathcal {K}_i\), and \(\pm \) here indicates either a “blank” (no negation) or \(\lnot \) (negation). Define
$$\begin{aligned} \mathrm {sgn}(\alpha _i) = {\left\{ \begin{array}{ll} 1 &{} \text {if } v_{\alpha _i} \text { occurs unnegated in } \mathcal {K}_i,\\ -1 &{} \text {if } \lnot v_{\alpha _i} \text { occurs in } \mathcal {K}_i, \end{array}\right. } \end{aligned}$$(27)
and similarly for \(\mathrm {sgn}(\beta _i)\). Construct the input data for MAXCON as
$$\begin{aligned} \mathcal {D}_{\mathcal {K}} = \bigcup ^{M}_{i=1} \left\{ (\mathbf {a}^{1}_i, b^{1}_i), \dots , (\mathbf {a}^{6}_i, b^{6}_i) \right\} , \end{aligned}$$(28)
where there are six measurements for each clause. Namely, for each clause \(\mathcal {K}_i\),
- \(\mathbf {a}^{1}_i\) is a k-dimensional vector of zeros, except at the \(\alpha _i\)-th and \(\beta _i\)-th elements where the values are respectively \(\mathrm {sgn}(\alpha _i)\) and \(\mathrm {sgn}(\beta _i)\), and \(b^{1}_i = 2\).
- \(\mathbf {a}^{2}_i = \mathbf {a}^{1}_i\) and \(b^{2}_i = 0\).
- \(\mathbf {a}^{3}_i\) is a k-dimensional vector of zeros, except at the \(\alpha _i\)-th element where the value is \(\mathrm {sgn}(\alpha _i)\), and \(b^{3}_i = -1\).
- \(\mathbf {a}^{4}_i = \mathbf {a}^{3}_i\) and \(b^{4}_i = 1\).
- \(\mathbf {a}^{5}_i\) is a k-dimensional vector of zeros, except at the \(\beta _i\)-th element where the value is \(\mathrm {sgn}(\beta _i)\), and \(b^{5}_i = -1\).
- \(\mathbf {a}^{6}_i = \mathbf {a}^{5}_i\) and \(b^{6}_i = 1\).
The number of measurements N in \(\mathcal {D}_{\mathcal {K}}\) is 6M.
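A sketch (ours) of this construction follows; clauses are encoded as pairs of signed integers (an assumption of the sketch, e.g., (1, -3) encodes \(v_1 \vee \lnot v_3\)), and consensus is the function from the sketch in Sect. 1:

```python
import numpy as np

def max2sat_to_maxcon(clauses, k):
    """Build D_K from MAX-2SAT clauses over k variables, following (27)-(28):
    six measurements per clause."""
    A_rows, b_vals = [], []
    for (p, q) in clauses:
        ai, bi = abs(p) - 1, abs(q) - 1    # variable indices alpha_i, beta_i
        sa, sb = float(np.sign(p)), float(np.sign(q))   # cf. (27)
        a1 = np.zeros(k); a1[ai] = sa; a1[bi] = sb
        a3 = np.zeros(k); a3[ai] = sa
        a5 = np.zeros(k); a5[bi] = sb
        for a, bv in [(a1, 2.0), (a1, 0.0), (a3, -1.0),
                      (a3, 1.0), (a5, -1.0), (a5, 1.0)]:
            A_rows.append(a)
            b_vals.append(bv)
    return np.array(A_rows), np.array(b_vals)

# A bipolar x encodes a truth assignment t(x): x_j = 1 means v_j = true.
clauses = [(1, 2), (-1, 3), (2, -3)]       # M = 3 clauses over k = 3 variables
A, b = max2sat_to_maxcon(clauses, 3)
x = np.array([-1.0, 1.0, -1.0])            # v1 = F, v2 = T, v3 = F satisfies all
print(consensus(x, A, b, eps=0.5))         # prints 2*3 + 3 = 9, cf. (33)
```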
Setting the Inlier Threshold. Given a solution \(\mathbf {x}\in \mathbb {R}^k\) for MAXCON, the six input measurements associated with \(\mathcal {K}_i\) are inliers under these conditions: \((\mathbf {a}^{1}_i, b^{1}_i)\) and \((\mathbf {a}^{2}_i, b^{2}_i)\) are respectively inliers under the two conditions
$$\begin{aligned} | \mathrm {sgn}(\alpha _i) x_{\alpha _i} + \mathrm {sgn}(\beta _i) x_{\beta _i} - 2 | \le \epsilon \quad \text {and} \quad | \mathrm {sgn}(\alpha _i) x_{\alpha _i} + \mathrm {sgn}(\beta _i) x_{\beta _i} | \le \epsilon ; \end{aligned}$$(29)
\((\mathbf {a}^{3}_i, b^{3}_i)\) and \((\mathbf {a}^{4}_i, b^{4}_i)\) under
$$\begin{aligned} | \mathrm {sgn}(\alpha _i) x_{\alpha _i} + 1 | \le \epsilon \quad \text {and} \quad | \mathrm {sgn}(\alpha _i) x_{\alpha _i} - 1 | \le \epsilon ; \end{aligned}$$(30)
and \((\mathbf {a}^{5}_i, b^{5}_i)\) and \((\mathbf {a}^{6}_i, b^{6}_i)\) under
$$\begin{aligned} | \mathrm {sgn}(\beta _i) x_{\beta _i} + 1 | \le \epsilon \quad \text {and} \quad | \mathrm {sgn}(\beta _i) x_{\beta _i} - 1 | \le \epsilon , \end{aligned}$$(31)
where \(x_{\alpha }\) is the \(\alpha \)-th element of \(\mathbf {x}\). Observe that if \(\epsilon < 1\), then at most one of (29), one of (30), and one of (31) can be satisfied. The following result establishes an important condition for L-reduction.
Lemma 7
If \(\epsilon < 1\), then
$$\begin{aligned} \mathrm {OPT(\text {MAXCON})} \le 6 \cdot \mathrm {OPT(\text {MAX-2SAT})}, \end{aligned}$$(32)
where \(\mathrm {OPT(\text {MAX-2SAT})}\) is the maximum number of clauses that can be satisfied for a given MAX-2SAT instance, and \(\mathrm {OPT(\text {MAXCON})}\) is the maximum achievable consensus for the MAXCON instance generated under our reduction.
Proof
If \(\epsilon < 1\), for all \(\mathbf {x}\), at most one of (29), one of (30), and one of (31) can be satisfied, hence \(\mathrm {OPT(\text {MAXCON})}\) cannot be greater than 3M. For any MAX-2SAT instance with M clauses, there is an algorithm [37] that can satisfy at least \(\lceil \frac{M}{2} \rceil \) of the clauses, thus \(\mathrm {OPT(\text {MAX-2SAT})} \ge \lceil \frac{M}{2} \rceil \). This leads to (32).\(\square \)
Note that, if \(\epsilon < 1\), rounding \(\mathbf {x}\) to its nearest bipolar vector (i.e., a vector that contains only \(-1\) or 1) cannot decrease its consensus w.r.t. \(\mathcal {D}_\mathcal {K}\). It is thus sufficient to consider \(\mathbf {x}\) that are bipolar in the rest of this section.
Intuitively, \(\mathbf {x}\) is used as a proxy for a truth assignment: setting \(x_j = 1\) implies setting \(v_j = true\), and vice versa. Further, if one of the conditions in (29) holds for a given \(\mathbf {x}\), then the clause \(\mathcal {K}_i\) is satisfied by the truth assignment. Hence, for \(\mathbf {x}\) that is bipolar and \(\epsilon < 1\),
$$\begin{aligned} \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_\mathcal {K}) = 2M + \sigma , \end{aligned}$$(33)
where \(\sigma \) is the number of clauses satisfied by \(\mathbf {x}\). This leads to the final necessary condition for L-reduction.
Lemma 8
If \(\epsilon < 1\), then
$$\begin{aligned} \left| \mathrm {OPT(\text {MAX-2SAT})} - \mathrm {SAT}(\mathbf {t}(\mathbf {x})) \right| = \left| \mathrm {OPT(\text {MAXCON})} - \mathrm {\Psi }_\epsilon (\mathbf {x}\mid \mathcal {D}_\mathcal {K}) \right| , \end{aligned}$$(34)
where \(\mathbf {t}(\mathbf {x})\) returns the truth assignment corresponding to a bipolar \(\mathbf {x}\), and \(\mathrm {SAT}(\mathbf {t}(\mathbf {x}))\) returns the number of clauses satisfied by \(\mathbf {t}(\mathbf {x})\).
Proof
For any bipolar \(\mathbf {x}\) with consensus \(2M + \sigma \), the truth assignment \(\mathbf {t}(\mathbf {x})\) satisfies exactly \(\sigma \) clauses. Since the value of \(\mathrm {OPT(\text {MAXCON})}\) must take the form \(2M + \sigma ^*\), we have \(\mathrm {OPT(\text {MAX-2SAT})} = \sigma ^*\). The condition (34) is then immediately seen to hold by substituting these values into the equation.\(\square \)
We have demonstrated an L-reduction from MAX-2SAT to MAXCON, where the main work is to generate \(\mathcal {D}_\mathcal {K}\) in linear time. The function \(\mathbf {t}\) also takes linear time to compute. Setting \(\epsilon < 1\) completes the reduction.
Theorem 5
MAXCON is APX-hard.
Proof
Since MAX-2SAT is APX-hard, by the above L-reduction, MAXCON is also APX-hard.\(\square \)
5 Conclusions and Future Work
Notes
1. Since RANSAC does not provide any approximation guarantees, it is not an “approximation scheme” by the standard definition [18].
References
Meer, P.: Robust techniques for computer vision. In: Medioni, G., Kang, S.B. (eds.) Emerging Topics in Computer Vision. Prentice Hall, Upper Saddle River (2004)
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
Choi, S., Kim, T., Yu, W.: Performance evaluation of RANSAC family. In: British Machine Vision Conference (BMVC) (2009)
Raguram, R., Chum, O., Pollefeys, M., Matas, J., Frahm, J.M.: USAC: a universal framework for random sample consensus. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 2022–2038 (2013)
Tran, Q.H., Chin, T.J., Chojnacki, W., Suter, D.: Sampling minimal subsets with large spans for robust estimation. Int. J. Comput. Vis. (IJCV) 106(1), 93–112 (2014)
Li, H.: Consensus set maximization with guaranteed global optimality for robust geometry estimation. In: IEEE International Conference on Computer Vision (ICCV) (2009)
Zheng, Y., Sugimoto, S., Okutomi, M.: Deterministically maximizing feasible subsystems for robust model fitting with unit norm constraints. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2011)
Enqvist, O., Ask, E., Kahl, F., Åström, K.: Robust fitting for multiple view geometry. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 738–751. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33718-5_53
Bazin, J.C., Li, H., Kweon, I.S., Demonceaux, C., Vasseur, P., Ikeuchi, K.: A branch-and-bound approach to correspondence and grouping problems. IEEE Trans. Pattern Anal. Mach. Intell. 35(7), 1565–1576 (2013)
Yang, J., Li, H., Jia, Y.: Optimal essential matrix estimation via inlier-set maximization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 111–126. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10590-1_8
Parra Bustos, A., Chin, T.J., Suter, D.: Fast rotation search with stereographic projections for 3D registration. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
Enqvist, O., Ask, E., Kahl, F., Åström, K.: Tractable algorithms for robust model estimation. Int. J. Comput. Vis. 112(1), 115–129 (2015)
Chin, T.J., Purkait, P., Eriksson, A., Suter, D.: Efficient globally optimal consensus maximisation with tree search. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Campbell, D., Petersson, L., Kneip, L., Li, H.: Globally-optimal inlier set maximisation for simultaneous camera pose and feature correspondence. In: IEEE International Conference on Computer Vision (ICCV) (2017)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W H Freeman & Co, New York (1990)
Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer, New York (1999). https://doi.org/10.1007/978-1-4612-0515-9
Erickson, J., Har-Peled, S., Mount, D.M.: On the least median square problem. Discret. Comput. Geom. 36(4), 593–607 (2006)
Vazirani, V.: Approximation Algorithms. Springer, Berlin (2001). https://doi.org/10.1007/978-3-662-04565-7
Le, H., Chin, T.J., Suter, D.: An exact penalty method for locally convergent maximum consensus. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Purkait, P., Zach, C., Eriksson, A.: Maximum consensus parameter estimation by reweighted \(\ell _1\) methods. In: Pelillo, M., Hancock, E. (eds.) EMMCVPR 2017. LNCS, vol. 10746, pp. 312–327. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78199-0_21
Cai, Z., Chin, T.J., Le, H., Suter, D.: Deterministic consensus maximization with biconvex programming. In: Ferrari, V. (ed.) ECCV 2018, Part XII. LNCS, vol. 11216, pp. 699–714. Springer, Cham (2018)
Svärm, L., Enqvist, O., Oskarsson, M., Kahl, F.: Accurate localization and pose estimation for large 3D models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
Parra Bustos, A., Chin, T.J.: Guaranteed outlier removal for rotation search. In: IEEE International Conference on Computer Vision (ICCV) (2015)
Chin, T.J., Kee, Y.H., Eriksson, A., Neumann, F.: Guaranteed outlier removal with mixed integer linear programs. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Johnson, D.S., Preparata, F.P.: The densest hemisphere problem. Theor. Comput. Sci. 6, 93–107 (1978)
Ben-David, S., Eiron, N., Simon, H.: The computational complexity of densest region detection. J. Comput. Syst. Sci. 64(1), 22–47 (2002)
Aronov, B., Har-Peled, S.: On approximating the depth and related problems. SIAM J. Comput. 38(3), 899–921 (2008)
Bernholt, T.: Robust estimators are hard to compute. Technical report 52, Technische Universität Dortmund (2005)
Cheney, E.W.: Introduction to Approximation Theory. McGraw-Hill, New York (1966)
Giannopoulos, P., Knauer, C., Rote, G.: The parameterized complexity of some geometric problems in unbounded dimension. In: Chen, J., Fomin, F.V. (eds.) IWPEC 2009. LNCS, vol. 5917, pp. 198–209. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-11269-0_16
Matoušek, J.: On geometric optimization with few violated constraints. Discret. Comput. Geom. 14(4), 365–384 (1995)
Chin, T.J., Purkait, P., Eriksson, A., Suter, D.: Efficient globally optimal consensus maximisation with tree search. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 39(4), 758–772 (2017)
Amaldi, E., Kann, V.: The complexity and approximability of finding maximum feasible subsystems of linear relations. Theor. Comput. Sci. 147, 181–210 (1995)
Johnson, D.S.: Approximation algorithms for combinatorial problems. J. Comput. Syst. Sci. 9, 256–278 (1974)
Acknowledgements
This work was supported by ARC Grant DP160103490.