Two-View Orthographic Epipolar Geometry: Minimal and Optimal Solvers

Journal of Mathematical Imaging and Vision, Jul 2017

We will in this paper present methods and algorithms for estimating two-view geometry based on an orthographic camera model. We use a previously neglected nonlinear criterion on rigidity to estimate the calibrated essential matrix. We give efficient algorithms for estimating it minimally (using only three point correspondences), in a least squares sense (using four or more point correspondences), and optimally with respect to the number of inliers. The inlier-optimal algorithm is based on a three-point solver and gives a fourth-order polynomial time algorithm. These methods can be used as building blocks to robustly find inlier correspondences in the presence of high degrees of outliers. We show experimentally that our methods can be used in many instances, where the orthographic camera model isn’t generally used. A case of special interest is situations with repetitive structures, which give high amounts of outliers in the initial feature point matching.

https://link.springer.com/content/pdf/10.1007%2Fs10851-017-0753-1.pdf


Magnus Oskarsson, Lund University, PO Box 118, 221 00 Lund, Sweden

1 Introduction

The geometry underlying two views of a rigid scene has been studied for a very long time. Using different projection models, the structure from motion problem, i.e., the problem of estimating both the scene geometry and the camera geometry from only image data, has been solved for many cases [8]. In his seminal paper from 1979, Ullman used the orthographic projection model to formulate and solve a number of structure from motion problems, for a small number of views and points [27]. During the following years, these theories were developed and refined. The main concern was to develop methods and algorithms to accurately estimate the geometry in the presence of image noise [7,9,10,14]. During recent years, more focus has been given to robustly estimating geometry in the presence of outliers in the data.
To enable this, a number of approaches have been followed. A classic way to handle outliers is through robust estimation schemes based on RANSAC. In these frameworks, one needs methods for estimating model parameters given a small or minimal number of data points. For calibrated two-view projective geometry, the five-point algorithm estimates the essential matrix minimally, using five image point correspondences. In another direction, algorithms for global inlier maximization have been developed. These are often based on relaxing an initial non-convex problem into a more tractable problem, using different error norms or branch-and-bound methods [3,12,13,16]. Arguably, in many cases it is more beneficial to instead relax the underlying camera models (e.g., by using an orthographic camera model) to get tractable problems. Using approximate models to estimate epipolar geometry has been investigated previously; in [6], Goshen and Shimshoni used two matching SIFT descriptors, with dominant directions and scale, to construct eight correspondences. Perdoch et al. [20] used a fixed partial calibration to estimate the fundamental matrix from two local affine frames. In [21], it was shown that the affine fundamental matrix is the first-order Taylor approximation of the full perspective fundamental matrix. Using two matching MSER feature points, an approximate fundamental matrix could be estimated. Lately, methods that optimally find models that maximize the number of inliers, in polynomial time, have been developed [5,25]. We will in this paper investigate a number of specific geometric problems that have received little or no attention earlier, related to orthographic projections of rigid scenes in two views. We believe that this work gives some additional theoretic insights and at the same time also gives new powerful tools to robustly establish image point correspondences in the presence of outliers.
Our main contribution is three algorithms for estimating orthographic epipolar geometry for two views:

– A minimal three-point solver that uses only three points and yields only two solutions, making it very suitable for RANSAC-based estimation.
– An optimal solver, maximizing the number of inliers using recent methods for optimal estimation, based on three specialized minimal solvers.
– A least squares solver that minimizes a relevant reprojection error, by finding all stationary points of the corresponding Lagrangian.

2 Essential Matrix Estimation

In [18] the general form of the fundamental matrix for two affine cameras was given,

F = \begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{bmatrix}.  (1)

This matrix has five degrees of freedom, but it is only defined up to scale. It can also be described using the directions and offsets of the corresponding epipolar lines [17]. It can be linearly estimated from at least four point correspondences [22] or two ellipses [1]. The corresponding calibrated entity—the essential matrix corresponding to two orthographic projections—will fulfill an additional nonlinear constraint. This fact has been mentioned in passing earlier [9,15], but the implications have not been pursued in any depth. The extra constraint means that the essential matrix in this case has only three degrees of freedom. This low dimension will give us powerful tools for estimating the epipolar geometry. We state the constraint in terms of the essential matrix.

Theorem 1 The essential matrix corresponding to two orthographic views,

E = \begin{bmatrix} 0 & 0 & a \\ 0 & 0 & b \\ c & d & e \end{bmatrix},  (2)

will fulfill

a^2 + b^2 = c^2 + d^2.  (3)

Proof Two orthographic views can be represented by camera matrices whose rotational parts are the first two rows of the identity and of the relative rotation R, respectively. Expressing the coefficients in the entries r_{ij} of R gives a^2 + b^2 = r_{23}^2 + r_{13}^2 and, by the orthonormality of R, also

c^2 + d^2 = r_{23}^2 + r_{13}^2 = 1 - r_{33}^2,

and hence a^2 + b^2 = c^2 + d^2.

For two corresponding points u = [x, y, 1]^T and u' = [x', y', 1]^T, we have that

u'^T E u = 0 ⟺ e + a x' + b y' + c x + d y = 0.

The essential matrix is only determined up to scale.
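Theorem 1 is easy to verify numerically. The sketch below builds an orthographic view pair from a random rotation R and translation t; the explicit formulas for (a, b, c, d, e) are our own derivation under one consistent sign convention, not necessarily the one used in the paper.

```python
import numpy as np

# Random proper rotation R and translation t (fixed seed for repeatability).
rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1
t = rng.standard_normal(3)

# View 1 projects X to (X1, X2); view 2 projects R X + t to its first two
# coordinates. Eliminating the unknown depth X3 gives the constraint
#   a x' + b y' + c x + d y + e = 0, with (assumed sign convention):
a, b = R[1, 2], -R[0, 2]
c = R[0, 2] * R[1, 0] - R[1, 2] * R[0, 0]
d = R[0, 2] * R[1, 1] - R[1, 2] * R[0, 1]
e = R[0, 2] * t[1] - R[1, 2] * t[0]
E = np.array([[0, 0, a], [0, 0, b], [c, d, e]])

# Theorem 1: a^2 + b^2 = c^2 + d^2
assert np.isclose(a**2 + b**2, c**2 + d**2)

# The epipolar constraint u'^T E u = 0 holds for any rigid scene point.
X = rng.standard_normal(3)
u = np.array([X[0], X[1], 1.0])
Y = R @ X + t
v = np.array([Y[0], Y[1], 1.0])
assert abs(v @ E @ u) < 1e-10
```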
To get appropriate and symmetric epipolar line constraints, we fix the scale by setting a^2 + b^2 = 1 (= c^2 + d^2). This means that (a, b) and (c, d) represent the unit norm normal directions of the two epipolar lines. The epipolar line distances to the origin are then given by e + c x + d y and e + a x' + b y', respectively. This gives us a natural way to define a symmetric and geometrically valid error. The mean perpendicular distance to the epipolar lines in the two images is given by

D = \frac{1}{2}\left(|a x' + b y' + e + c x + d y| + |c x + d y + e + a x' + b y'|\right) = |e + a x' + b y' + c x + d y|.  (12)

In [22], three different cost functions for affine geometry estimation were discussed. Our choice of normalization makes these three cost functions collapse into the one described above. For a thorough discussion on model selection functions for two-view geometry, see [26]. It is known that, given two orthographic views of a rigid scene, the full camera geometry cannot be established [9,10]. But the image data must fulfill the epipolar constraint in order for the geometry to be valid. We will use this rigidity constraint to enable estimates of the geometry based on a low number of tentative point correspondences. These estimates can in turn be used to find image feature point correspondences in a robust manner. In the next sections we show how the essential matrix can be estimated. In Sect. 3, we solve the minimal case of three point correspondences. This gives a very fast solver that can be used in a standard RANSAC framework. In Sect. 4, we derive solvers that can be used to find the globally optimal essential matrix that maximizes the number of inliers, given a set of point correspondences corrupted by outliers. Finally, in Sect. 5, we show how to find the nonlinear least squares solution given four or more point correspondences.
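With this normalization, the error (12) is a one-line computation; a small sketch (the function name and array layout are ours):

```python
import numpy as np

def epipolar_distance(E, u, v):
    """Mean perpendicular distance (12) to the two epipolar lines, for an
    orthographic essential matrix E = [[0,0,a],[0,0,b],[c,d,e]] normalized
    so that a^2 + b^2 = c^2 + d^2 = 1. u, v are Nx2 arrays of points in
    views 1 and 2. With this normalization both line distances coincide,
    so D = |e + a x' + b y' + c x + d y|."""
    a, b = E[0, 2], E[1, 2]
    c, d, e = E[2]
    return np.abs(e + a * v[:, 0] + b * v[:, 1] + c * u[:, 0] + d * u[:, 1])

# Example: (a, b) = (1, 0), (c, d) = (0, 1), e = 0.
E = np.array([[0, 0, 1.0], [0, 0, 0.0], [0.0, 1.0, 0.0]])
print(epipolar_distance(E, np.array([[0.0, 0.0], [0.0, 2.0]]),
                        np.array([[0.0, 0.0], [0.0, 0.0]])))  # [0. 2.]
```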
3 A Minimal Three-Point Solver

The essential matrix has three degrees of freedom, and each point correspondence gives one constraint on the parameters. We should hence be able to minimally estimate the essential matrix using three point correspondences. We will now show how this can be done. According to (12) we have the three constraints D_i = 0, i = 1, 2, 3. These are linear in the five parameters (a, b, c, d, e). In addition, we have the two quadratic constraints a^2 + b^2 = 1 and c^2 + d^2 = 1. We start by eliminating e by taking D_2 − D_1 and D_3 − D_1. This gives two linear constraints in (a, b, c, d), which we use to express (a, b) in terms of (c, d). Substituting these expressions into a^2 + b^2 = 1 gives a polynomial in c and d. We can use the second quadratic constraint, c^2 = 1 − d^2, to eliminate all factors of c of degree higher than one. This gives a polynomial of the form

p_1 = k_1 c d + k_2 d^2 + k_3,

where the k_i only depend on the image point measurements. Multiplying this polynomial by c and again eliminating terms of higher degree in c gives a polynomial of the form

p_2 = q_1 c d^2 + q_2 c + q_3 d^3 + q_4 d.

We can write these two polynomial constraints as

B \begin{bmatrix} c \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},  (13)

with

B = \begin{bmatrix} k_1 d & k_2 d^2 + k_3 \\ q_1 d^2 + q_2 & q_3 d^3 + q_4 d \end{bmatrix}.  (14)

In order for (13) to have a solution, the determinant of B must vanish. The determinant of B is a fourth-degree polynomial in d, but with only even degrees. It is clear that if (a, b, c, d, e) is a solution to our three-point problem, then (−a, −b, −c, −d, −e) is also a solution, but it corresponds to the same essential matrix. This means that we only need to consider two of the solutions for d. The solver will be extremely fast, since in the end we only need to solve a second-degree polynomial (in d^2).

4 Maximizing the Number of Inliers

In the presence of outliers, finding accurate correspondences is difficult, and robust methods are highly desirable. A common approach is to estimate a model that optimizes the number of inliers.
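Before turning to the optimal solvers, the minimal solver of Sect. 3 can be condensed into a few lines. The sketch below (the function name is ours) reaches the same even-degree quartic by squaring the cd-relation instead of forming the determinant of B explicitly, and it does not handle degenerate configurations (k_1 = 0 or d = 0).

```python
import numpy as np

def ortho_essential_3pt(u, v):
    """Minimal three-point solver (sketch). u, v: 3x2 arrays of matching
    points in views 1 and 2. Returns the (up to two) essential matrices
    E = [[0,0,a],[0,0,b],[c,d,e]] with a^2+b^2 = c^2+d^2 = 1."""
    # Eliminate e: D_i - D_1 = 0 gives A [a,b]^T + B [c,d]^T = 0.
    A = v[1:] - v[0]
    B = u[1:] - u[0]
    G = -np.linalg.solve(A, B)        # (a, b) = G (c, d)
    H = G.T @ G                       # a^2 + b^2 = [c d] H [c d]^T = 1
    # Substituting c^2 = 1 - d^2 gives k1*c*d + k2*d^2 + k3 = 0 (cf. p1).
    k1 = 2 * H[0, 1]
    k2 = H[1, 1] - H[0, 0]
    k3 = H[0, 0] - 1.0
    # Squaring, with c^2 d^2 = (1 - t) t and t = d^2, yields the quartic
    # with only even degrees, i.e. a quadratic in t.
    roots = np.roots([k1**2 + k2**2, 2 * k2 * k3 - k1**2, k3**2])
    sols = []
    for t in roots:
        if abs(t.imag) > 1e-9 or not (0 < t.real <= 1):
            continue
        dd = np.sqrt(t.real)
        cc = -(k2 * t.real + k3) / (k1 * dd)
        aa, bb = G @ [cc, dd]
        ee = -(aa * v[0, 0] + bb * v[0, 1] + cc * u[0, 0] + dd * u[0, 1])
        sols.append(np.array([[0, 0, aa], [0, 0, bb], [cc, dd, ee]]))
    return sols
```

Each returned matrix satisfies all three constraints D_i = 0 and both quadratic constraints by construction.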
Measuring reprojection errors is normally the preferred choice, as this accurately models the limited precision of feature detection techniques. Although such formulations lead to challenging optimization problems, using recent advances in robust estimation it is sometimes possible to develop tractable methods. In [5], it was shown how the number of inliers can be maximized in polynomial time for a fixed-dimensional model, where the computational complexity follows directly as a consequence of the theory of optimization. One requirement is that the parameter space is a differentiable manifold embedded in R^m by a set of equality constraints. The authors used this to produce algorithms for optimal image stitching and 2D registration. In [24,25], the authors used similar methods to perform large-scale image-based localization. We will here describe how these ideas can be applied to orthographic essential matrix estimation, resulting in an optimal method. The main theorem from [5] shows that one can find the optimal solution with respect to the number of inliers by enumerating a finite set of so-called critical points, essentially the Karush–Kuhn–Tucker (KKT) points. These critical points divide the solution space into regions that contain different combinations of inliers and outliers, and the optimal solution with respect to the number of inliers will be found in one of the critical points. Our parameter space (a, b, c, d, e) is embedded in R^5 with the constraints h_1 = a^2 + b^2 = 1 and h_2 = c^2 + d^2 = 1. For an inlier bound ε, the critical points satisfy the KKT conditions for local optimality of the optimization problems

\min_{(a,b,c,d,e)} f(a, b, c, d, e)  (15)
s.t. h_j(a, b, c, d) = 1, j = 1, 2,  (16)
D_i(a, b, c, d, e) ≤ ε for all i ∈ C,  (17)

where C runs over all subsets of correspondences of size |C| ≤ 3 (the number of degrees of freedom of our problem) and the D_i are the epipolar distances (12).
The function f is an auxiliary goal function, which can be chosen arbitrarily as long as the KKT points constitute a finite set. Most often a linear function will yield simple equations, and we will show that choosing the goal function

f(a, b, c, d, e) = a + d,  (18)

gives a finite set of KKT points. For an inlier bound ε, the KKT points are then found by looking at points where a number of the constraints (17) are active, i.e.,

1. D_i^2 = ε^2, i = i_1, i_2, i_3.
2. D_i^2 = ε^2, i = i_1, i_2, and (∇f, ∇h_j, ∇D_i^2) are linearly dependent.
3. D_i^2 = ε^2, i = i_1, and (∇f, ∇h_j, ∇D_i^2) are linearly dependent.
4. ∇f = 0.

In order to find the solution, we need solvers for the first three cases above (the last case gives only a trivial solution); we can then find the optimal solution by evaluating all KKT points. The time complexity of the algorithm is determined by step 1, where three constraints are active. Going through all triplets of points and then, for each solution, checking how many inliers we have gives a total complexity of O(n^4) for n tentative correspondences. In this way, it is possible to maximize the number of inliers in O(n^4) time. The steps of the method are summarized in Algorithm 1. In Sect. 4.1, we show how we construct the main optimal three-point solver. The optimal two- and one-point solvers are discussed in Sect. 4.2.

Algorithm 1 Optimal inlier maximization
1: Given a set of corresponding points C in two images, and an inlier bound ε.
2: For all possible combinations C_3^k ⊂ C with |C_3^k| = 3, find the corresponding essential matrices E_3^k as described in Sect. 4.1 and count the total number of corresponding points N_3^k with error less than ε.
3: For all possible combinations C_2^k ⊂ C with |C_2^k| = 2, find the corresponding essential matrices E_2^k as described in Sect. 4.2 and count the total number of corresponding points N_2^k with error less than ε.
4: For all single correspondences C_1^k ⊂ C, find the corresponding essential matrices E_1^k as described in Sect. 4.2 and count the total number of corresponding points N_1^k with error less than ε.
5: The maximal number of inliers is the maximum of all N_3^k, N_2^k and N_1^k over all k. The globally optimal essential matrix is the corresponding E_3^k, E_2^k or E_1^k for the optimal k.

4.1 The Three-Point Solver for Inlier Maximization

Given an inlier bound ε and three point correspondences, we want to find an essential matrix E such that D_i^2 = ε^2, i = 1, 2, 3. These equations are quadratic in the unknowns (a, b, c, d, e), but we can simplify them by noting that the solution must fulfill D_i = ±ε, i = 1, 2, 3. This gives in total eight combinations of solutions. We linearly eliminate e, giving for each combination a system of the form

D_i − D_1 = w_i, i = 2, 3,

where w_i can take the values −2ε, 0 or 2ε. This gives a system very similar to the one we solved in Sect. 3, but slightly more complicated due to the constant factors. We will solve it by explicitly constructing a Gröbner basis [4]. We start in the same way and express (a, b) in (c, d) using the linear constraints. Resubstitution into a^2 + b^2 = 1 gives an equation of the form

k_1 c^2 + k_2 c d + k_3 c + k_4 d^2 + k_5 d + k_6 = 0,

where the k_i only depend on the image measurements. We can eliminate the c^2-term in favor of d^2 using the constraint c^2 + d^2 = 1. We then get

p_1 = c d + k'_1 + k'_2 c + k'_3 d + k'_4 d^2,

where each k'_i depends on the k_i. Multiplying this equation by c and by d gives two new equations. Again we eliminate all c^2-terms. Taking a linear combination of these two polynomials gives us a polynomial with the monomials {cd, c, d^3, d^2, d, 1}. We can eliminate the cd-term by solving for cd in p_1 = 0 and resubstituting. This gives us

p_2 = d^3 + q_1 + q_2 c + q_3 d + q_4 d^2,

where again the q_i only depend on the image coordinates. The polynomials p_1 and p_2 together with p_0 = c^2 + d^2 − 1 constitute a Gröbner basis for our problem.
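Given such a basis, the solutions can be read off from the eigenvectors of a multiplication (action) matrix: if A encodes multiplication by d modulo p_1 and p_2, every eigenvector of the form (1, c, d, d^2) yields a common root. A generic numerical sketch of this mechanism follows; the coefficients below are arbitrary illustrative values, not derived from image data.

```python
import numpy as np

# Arbitrary coefficients for p1 = cd + k1 + k2 c + k3 d + k4 d^2 and
# p2 = d^3 + q1 + q2 c + q3 d + q4 d^2 (illustration only).
k = [0.3, -0.7, 0.2, 0.5]
q = [-0.4, 0.6, -0.1, 0.8]

# Multiplication by d maps the basis (1, c, d, d^2) to (d, cd, d^2, d^3);
# reducing cd and d^3 with p1 = 0 and p2 = 0 gives the action matrix:
A = np.array([[0,     0,     1,     0],
              [-k[0], -k[1], -k[2], -k[3]],
              [0,     0,     0,     1],
              [-q[0], -q[1], -q[2], -q[3]]])

# Each eigenvector, scaled so its first entry is 1, has the structure
# (1, c, d, d^2), and (c, d) is a common root of p1 and p2.
w, V = np.linalg.eig(A)
for j in range(4):
    vec = V[:, j] / V[0, j]
    c, d = vec[1], vec[2]
    p1 = c * d + k[0] + k[1] * c + k[2] * d + k[3] * d**2
    p2 = d**3 + q[0] + q[1] * c + q[2] * d + q[3] * d**2
    assert abs(p1) < 1e-8 and abs(p2) < 1e-8
```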
We can now construct our solver using the action matrix method [4]. We use the linear basis {1, c, d, d^2} and d as the action variable to construct the action matrix A,

d \begin{bmatrix} 1 \\ c \\ d \\ d^2 \end{bmatrix} = \begin{bmatrix} d \\ cd \\ d^2 \\ d^3 \end{bmatrix} = A \begin{bmatrix} 1 \\ c \\ d \\ d^2 \end{bmatrix},  (19)

with

A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ -k'_1 & -k'_2 & -k'_3 & -k'_4 \\ 0 & 0 & 0 & 1 \\ -q_1 & -q_2 & -q_3 & -q_4 \end{bmatrix}.

We can now solve for c and d by finding the eigenvectors of A, giving four solutions. In total, we get 8 · 4 = 32 solutions to the three-point problem.

4.2 The Two- and One-Point Solvers for Inlier Maximization

We will here give an outline of our two- and one-point solvers, i.e., the solutions to D_i^2 = ε^2, i = i_1, i_2, with (∇f, ∇h_j, ∇D_i^2) linearly dependent, respectively D_i^2 = ε^2, i = i_1, with (∇f, ∇h_j, ∇D_i^2) linearly dependent. To get as simple expressions as possible, we have chosen the goal function f = a + c. That the gradients should be linearly dependent can be expressed using the determinant of the corresponding stacked gradients, yielding for the two-point case an expression of the form

s_1 a d + s_2 b c + s_3 b d = 0,  (27)

where each s_i only depends on the image measurements. Taking the difference of the two epipolar line distance equations eliminates e and gives a linear constraint on (a, b, c, d). Together with the two embedding criteria a^2 + b^2 = 1 and c^2 + d^2 = 1, this gives a system of four polynomial equations in the four variables (a, b, c, d). Using similar techniques as previously, this system can be solved, yielding at most eight real solutions. Since we, for each of the two points, have a distance of ±ε, we get in total 4 · 8 = 32 solutions. For the one-point case, this point will be used to express e in terms of the unknowns. The linear dependence of the gradients will in this case give the constraints b d = 0, a d = 0, b c = 0 and s_1 a d + s_2 b c + s_3 b d = 0. This together with the embedding constraints only gives the solutions (a, b, c, d) = (1, 0, 1, 0) and (a, b, c, d) = (−1, 0, 1, 0). In total, we will get four solutions, since we get two solutions for e corresponding to ±ε.

5 A Least Squares Solver

Having more than three point correspondences leads to an overdetermined system of equations if we want to solve D_i = 0. We will solve this in a least squares sense, i.e., given n point correspondences,

\min_E \sum_{i=1}^n D_i^2, \quad s.t. \; a^2 + b^2 = 1, \; c^2 + d^2 = 1.  (28)

Actually, in the very first ECCV, Harris described the same problem [7], and it was noted that the solution can be found as one of the roots of an 8-degree polynomial. However, no details as to how to construct the final polynomial were given. Here we go through the solution in detail and construct a solver that is both robust to noise and efficient. We will solve the least squares problem by finding all the stationary points of (28), by differentiating the corresponding Lagrangian function,

L(a, b, c, d, e, λ, γ) = \sum_{i=1}^n D_i^2 + λ(1 − a^2 − b^2) + γ(1 − c^2 − d^2),  (29)

with

\nabla L = 2 \begin{bmatrix} \sum_i x'_i D_i - a\lambda \\ \sum_i y'_i D_i - b\lambda \\ \sum_i x_i D_i - c\gamma \\ \sum_i y_i D_i - d\gamma \\ \sum_i D_i \\ 1 - a^2 - b^2 \\ 1 - c^2 - d^2 \end{bmatrix}.  (30)

From ∇L = 0 it is clear that

e = -\frac{1}{n} \sum_{i=1}^n \begin{bmatrix} x'_i & y'_i & x_i & y_i \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}.  (31)

Thus e can be eliminated by removing the centroid of each of the image point sets, i.e., ũ_i = u_i − ū and ũ'_i = u'_i − ū'. The vanishing of the gradient of L (disregarding the normalization constraints) can then be written

M v = S v,  (32)

with v = [a, b, c, d]^T and

M = \sum_i \begin{bmatrix} \tilde{u}'_i \\ \tilde{u}_i \end{bmatrix} \begin{bmatrix} \tilde{u}'^T_i & \tilde{u}^T_i \end{bmatrix}, \quad S = \begin{bmatrix} \lambda & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & \gamma & 0 \\ 0 & 0 & 0 & \gamma \end{bmatrix}.  (33)

This is almost an eigenvalue problem, but the different λ and γ make it slightly more complicated to solve. One can see that (32) is homogeneous in (a, b, c, d), so we can scale the equations by 1/d, giving equations in (a/d, b/d, c/d, 1) = (â, b̂, ĉ, 1). We can then use three of the four equations to linearly express (â, b̂, ĉ) in terms of λ and γ. We reinsert these expressions into the fourth equation of (32) and the normalization constraints, which now read â^2 + b̂^2 = ĉ^2 + 1. This gives two equations of degree four in λ and degree two in γ. Multiplying these two equations by γ gives two new equations. We can now write these four equations as

B \begin{bmatrix} \gamma^3 \\ \gamma^2 \\ \gamma \\ 1 \end{bmatrix} = 0.

For this to have a solution, the determinant of B must be zero. Expressing this determinant in λ gives a 12-degree polynomial. This polynomial can be factorized into an 8-degree polynomial and a 4-degree polynomial. The 4-degree polynomial (containing spurious solutions due to the denominator from the linear solution of (32)) can be factored out, leaving an 8-degree polynomial to solve. The rest of the unknowns can be found by back substitution. The least squares solution must be one of the eight stationary points, and we simply choose the one with the smallest error.

6 Experimental Validation

We have conducted a number of experiments, on both synthetic and real data, to show the performance of our solvers. We have implemented all solvers in MATLAB, and they are publicly released at https://github.com/hamburgerlady/ortho-gem. In Table 1, the running times for our MATLAB implementations are shown. All tests were conducted on a desktop computer running Ubuntu, with an Intel Core i7 3.6 GHz processor.

6.1 Synthetic Data

In order to test the numerical stability of our minimal solvers, we did some simple tests. We randomly generated true orthographic epipolar geometry, with corresponding essential matrices and image data (without added noise, 10,000 instances). We then ran our solvers: the minimal three-point solver and the three-point solver used in the optimal inlier solver. For the optimal inlier solver we also used a random bound ε. We then calculated the equation residuals for all solutions ((3) and (12) for the minimal three-point solver, and (3) and (19) for the optimal three-point solver). The resulting histogram of the logarithm of the errors is shown in Fig.
1, where one can see that we get errors close to machine precision.

In order to test our least squares solver, we again randomly generated true orthographic epipolar geometry, without noise. We then ran our least squares solver on a large number of examples (10,000), and for each solution we recorded the Frobenius norm of the difference between the estimated essential matrix and the ground truth essential matrix. The distributions of errors, for 4, 10 and 50 point correspondences, are shown in Fig. 2. We get slightly different behavior for different numbers of points, but overall we get results with very small errors.

To test the dependence on image point errors for the least squares solver, we ran an experiment similar to the previous one. We added Gaussian noise to the image measurements and ran our least squares solver. In Fig. 3, the resulting errors in the final estimates are shown. The top shows the mean absolute angular error between the estimated and ground truth epipolar line directions for the two images. The bottom shows the corresponding absolute error in line offset (the last entry in the essential matrix). We show the results using 4 and 10 correspondences.

6.2 Semisynthetic Data

To test our algorithms' behavior in the presence of outliers and real errors, we have conducted a number of tests based on real data.

Alcatraz In a first experiment, we used the Alcatraz dataset, as described in [19]. We used two images (numbers 1 and 39) with SIFT correspondences, including outliers. Since we had access to the ground truth, we knew which correspondences were inliers and which were outliers. We have compared our minimal three-point solver to the calibrated five-point solver, using the MATLAB-mex implementation from [23], which runs in around 0.15 ms.
We used 41 outliers in the correspondences and varied the number of inliers between 10 and 180. This gave us a number of sets with outlier ratios varying between 19 and 80%. The input for a large outlier ratio is shown in Fig. 5. Using a standard RANSAC loop—with the number of iterations set so that we would find an inlier set with probability 99.9%—we compared the five-point solver to our minimal three-point solver. The results are shown in Fig. 4. The left graph shows the running times as a function of outlier ratio, on a logarithmic scale. Our method is faster due to three factors. Firstly, the minimal solver is faster. Secondly, since our solver gives at most two solutions, we in general get fewer real solutions that need evaluating on the whole point set. And thirdly, since we only need to sample three points, the likelihood of finding an outlier-free hypothesis set is higher. We get these benefits at the cost of having a more restrictive camera model, which might not capture the complete geometry. This will of course be highly scenario dependent, but as shown in Fig. 4, where the recall and fallout of the estimated inlier set are shown, we match the five-point solver well in terms of finding the true inlier set. In Sect. 6.4, we give more results on how well the orthographic model approximates the true epipolar geometry. We have also compared with our optimal method that maximizes the number of inliers. The running time of this method doesn't depend on the inlier ratio, only on the number of initial correspondences, but it is in general much slower than the RANSAC methods. In the middle of Fig. 5, the resulting inlier set for the five-point solver is shown. Below this, the result of the minimal three-point solver is shown. In this case, there are very few inliers, and the more restricted orthographic camera model serves as a regularization.
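The third factor, sampling only three points instead of five, can be quantified with the standard RANSAC iteration bound; a small sketch (the 99.9% confidence level matches the experiment above, the 50% outlier ratio is an example value):

```python
import math

def ransac_iterations(outlier_ratio, sample_size, confidence=0.999):
    """Number of RANSAC iterations needed to draw at least one
    outlier-free minimal sample with the given confidence."""
    p_good = (1.0 - outlier_ratio) ** sample_size  # P(all-inlier sample)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

# At 50% outliers, the three-point solver needs far fewer samples
# than the five-point solver:
print(ransac_iterations(0.5, 3))  # 52
print(ransac_iterations(0.5, 5))  # 218
```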
On a systems level, this would of course be easy to handle using the five-point solver, since the full 3D geometry probably wouldn't fit, but we still believe that this shows the benefit of using a simpler model for establishing correspondences.

Dinosaur To test our optimal inlier maximization algorithm, as described in Sect. 4, we used the well-known dinosaur sequence. This sequence contains very few outliers, and the camera geometry has been shown to be well approximated by an affine model, see, e.g., [11]. We used two calibrated images (images 20 and 22) with a subset of 51 true correspondences. Using the full inlier set, we estimated the essential matrix using our least squares solver. This was used to define the ground truth of our experiment. We then randomly corrupted a subset of the correspondences to simulate gross outliers and ran our optimal inlier maximization. As a comparison, we also ran our minimal three-point solver exhaustively in a RANSAC loop. The resulting errors between the estimated essential matrices and the ground truth, for different rates of outliers, are shown in Fig. 6. We show the error in angle and offset in the same way as in Sect. 6.1. As can be seen in the figure, the errors degrade gracefully as functions of the outlier ratio.

Fig. 6 Top: absolute error between the ground truth and estimated epipolar line angles, as a function of outlier ratio, for the semisynthetic optimal estimation experiment. Bottom: absolute error between the ground truth and estimated offset element of the essential matrix, as a function of the outlier ratio.

6.3 Real Data and Repetitive Structures

One scenario where the matching of features is difficult is when the depicted scene contains repetitive structures. This is quite common and can occur due to similar textures of objects. To test the use of our minimal solver in such a setting, we conducted a small experiment. The setup of our Heinz experiment is shown in Fig. 7.
We took four images of a number of cans with the same texture on them. We extracted SURF features and matched these based on their feature vectors, for all pairs of the four images. A subset of the initial matches can be seen to the left in Fig. 7. For the six pairs, we got between 330 and 530 tentative correspondences. We then ran 5,000 iterations of our minimal three-point solver to estimate the inlier set. We then used this inlier set to estimate the full projective epipolar geometry, with subsequent bundle adjustment. The original points and reprojected points are shown in the middle of the figure. The corresponding rms-value over all views was 1.14 in unnormalized coordinates. We have also run our optimal inlier method, giving a corresponding rms-value over all views of 1.58 after bundle adjustment. The reprojected points can be seen to the right of Fig. 7. The number of inliers for all pairs is given in Table 2.

Fig. 7 To the left, the four input images from the Heinz experiment are shown, with the initial feature point matches for all pairs of images—indicated by lines—overlaid. To help visibility, only 50 random correspondences for each pair are shown. As can be seen, there are multiple mismatches due to the repetitive scene. To the right, the initial points (red) and reprojected points (yellow) are shown, after running a RANSAC loop with our minimal three-point, respectively, optimal inlier solver and subsequent bundle adjustment (Color figure online).

Table 2 The numbers are given for all combinations of the four respective images. Also shown are the final rms-values of the reprojection errors after bundle adjustment.

In a second experiment, facade, we used four images of a building facade. These images contain a number of repeating structures, such as windows, roof texture, dormer windows, and chimneys. We again extracted SURF features and matched these, for all pairs of the four images.
The estimated inlier set, using 10,000 iterations of RANSAC with our minimal three-point solver, is shown in Fig. 8. We have also run the optimal inlier maximization on this dataset—the details are given in Table 2. In both these experiments, running the optimal solver is orders of magnitude slower than RANSAC, so it is not directly practical for these amounts of initial correspondences. In order for it to be tractable, some form of initial pruning would be needed, but this is left for future work.

Fig. 8 Inlier image points (red) and reprojected points (yellow), after running a RANSAC loop with our minimal three-point solver and subsequent bundle adjustment on the facade dataset (Color figure online).

6.4 Orthographic Model Fit

In order to investigate how well the orthographic model approximates the true epipolar geometry, we made the following test on the Alcatraz dataset. We used the true inlier set for all pairs of images and used our least squares solver to find the orthographic essential matrix. We then calculated the mean reprojection error (12) for all pairs. We looked at how this error varies with different parameters, such as the median depth of the scene points, the variance of the depth of the scene points, and the baseline between the two cameras. Our conclusion for this test set was that the important parameter was the baseline between the two cameras. In Fig. 9, we show the mean error as a function of the distance between the two corresponding camera centers. The metric scale for the cameras was manually estimated from the images. The average depth of the scene points was 8.2 m for this dataset (with a standard deviation of 1.4 m).

7 Conclusions

We have in this paper given methods and algorithms for estimating two-view geometry for orthographic cameras.
We have shown how to estimate the corresponding essential matrix minimally (using three point correspondences), in a least squares sense, or optimally with respect to the number of inliers. These methods can be used to robustly find inlier correspondences in the presence of high degrees of outliers. They depend on an orthographic camera model, but we indicate in the experimental section that in many cases this model is a very good approximation. Our low-dimensional solvers give many benefits over the full projective estimates. Due to the simplicity of our minimal solver, we get a faster solver that also gives fewer solutions than, e.g., the five-point calibrated solver, which leads to faster validation in a RANSAC loop. Of course, these benefits come at the cost of assuming a more restrictive camera model. This model might not capture the complete geometry and may be biased toward affine geometry. This caveat aside, we believe that we get a very fast framework for robust two-view correspondence estimation. Even though our optimal inlier estimation is based on only three point correspondences, it is in many cases not tractable to check all combinations of three points. Future work will include investigating recent methods that can reduce the number of initial correspondences without sacrificing optimality [2, 25].

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Magnus Oskarsson received his M.Sc. degree in Engineering Physics in 1997 and Ph.D. in Mathematics in 2002 from the University of Lund, Sweden. His thesis work was devoted to computer vision with applications for autonomous vehicles.
He is currently an associate professor at the Centre for Mathematical Sciences, Lund University, where his teaching includes undergraduate and graduate courses in mathematics and image analysis. He is the author and co-author of a number of papers in international journals and conference proceedings within geometry, algebra, and optimization, with applications in computer vision, cognitive vision, and image enhancement.

References

1. Arandjelovic, R., Zisserman, A.: Efficient image retrieval for 3d structures. In: British Machine Vision Conference, pp. 1-11 (2010)
2. Ask, E., Enqvist, O., Kahl, F.: Optimal geometric fitting under the truncated L2-norm. In: Conference on Computer Vision and Pattern Recognition (2013)
3. Breuel, T.: Implementation techniques for geometric branch-and-bound matching methods. Comput. Vis. Image Underst. 90(3), 258-294 (2003)
4. Cox, D.A., Little, J., O'Shea, D.: Using Algebraic Geometry, vol. 185. Springer, Berlin (2005)
5. Enqvist, O., Ask, E., Kahl, F., Åström, K.: Tractable algorithms for robust model estimation. Int. J. Comput. Vis. 112(1), 115-129 (2015)
6. Goshen, L., Shimshoni, I.: Balanced exploration and exploitation model search for efficient epipolar geometry estimation. IEEE Trans. Pattern Anal. Mach. Intell. 30(7), 1230-1242 (2008)
7. Harris, C.: Structure-from-motion under orthographic projection. In: European Conference on Computer Vision, pp. 118-123. Springer (1990)
8. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2004)
9. Hu, X., Ahuja, N.: Motion estimation under orthographic projection. IEEE Trans. Robot. Autom. 7(6), 848-853 (1991)
10. Huang, T.S., Lee, C.: Motion and structure from orthographic projections. IEEE Trans. Pattern Anal. Mach. Intell. 11(5), 536-540 (1989)
11. Jiang, F., Oskarsson, M., Åström, K.: On the minimal problems of low-rank matrix factorization. In: Conference on Computer Vision and Pattern Recognition, pp. 2549-2557. IEEE (2015)
12. Kahl, F., Hartley, R.: Multiple view geometry under the L∞-norm. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1603-1617 (2008)
13. Ke, Q., Kanade, T.: Quasiconvex optimization for robust geometric reconstruction. In: International Conference on Computer Vision, pp. 986-993. Beijing, China (2005)
14. Koenderink, J.J., Van Doorn, A.J.: Affine structure from motion. J. Opt. Soc. Am. A 8(2), 377-385 (1991)
15. Lehmann, S., Bradley, A.P., Clarkson, I.V.L., Williams, J., Kootsookos, P.J.: Correspondence-free determination of the affine fundamental matrix. IEEE Trans. Pattern Anal. Mach. Intell. 29(1), 82-97 (2007)
16. Li, H.: A practical algorithm for L∞ triangulation with outliers. In: Conference on Computer Vision and Pattern Recognition (2007)
17. Mendonça, P.R., Cipolla, R.: Estimation of epipolar geometry from apparent contours: affine and circular motion cases. In: Conference on Computer Vision and Pattern Recognition. IEEE (1999)
18. Mundy, J.L., Zisserman, A.: Geometric Invariance in Computer Vision. MIT Press, Cambridge (1992)
19. Olsson, C., Enqvist, O.: Stable structure from motion for unordered image collections. In: Scandinavian Conference on Image Analysis (2011)
20. Perd'och, M., Matas, J., Chum, O.: Epipolar geometry from two correspondences. In: International Conference on Pattern Recognition, vol. 4, pp. 215-219. IEEE (2006)
21. Pritts, J., Chum, O., Matas, J.: Approximate models for fast and accurate epipolar geometry estimation. In: 28th International Conference of Image and Vision Computing New Zealand (IVCNZ), pp. 106-111. IEEE (2013)
22. Shapiro, L.S., Zisserman, A., Brady, M.: 3d motion recovery via affine epipolar geometry. Int. J. Comput. Vis. 16(2), 147-182 (1995)
23. Stewenius, H., Engels, C., Nistér, D.: Recent developments on direct relative orientation. ISPRS J. Photogramm. Remote Sens. 60(4), 284-294 (2006)
24. Svärm, L., Enqvist, O., Kahl, F., Oskarsson, M.: City-scale localization for cameras with known vertical direction. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 1-1 (2016). doi:10.1109/TPAMI.2016.2598331
25. Svärm, L., Enqvist, O., Oskarsson, M., Kahl, F.: Accurate localization and pose estimation for large 3d models. In: Conference on Computer Vision and Pattern Recognition (2014)
26. Torr, P.H.: Model selection for two view geometry: a review. In: Forsyth, D.A., Mundy, J.L., di Gesu, V., Cipolla, R. (eds.) Shape, Contour and Grouping in Computer Vision, pp. 277-301. Springer, Berlin (1999)
27. Ullman, S.: The interpretation of structure from motion. Proc. R. Soc. Lond. B Biol. Sci. 203(1153), 405-426 (1979)



Magnus Oskarsson. Two-View Orthographic Epipolar Geometry: Minimal and Optimal Solvers, Journal of Mathematical Imaging and Vision, 2017, 1-11, DOI: 10.1007/s10851-017-0753-1