A Global Linear Method for Camera Pose Registration Nianjuan Jiang*1,
29 Slides6.81 MB
A Global Linear Method for Camera Pose Registration Nianjuan Jiang*1, Zhaopeng Cui*2, Ping Tan2 1 Advanced Digital Sciences Center, Singapore 2 National University of Singapore *Joint first authors 1
Structure from Motion (SfM) Simultaneously recover both 3D scene points and camera poses 2
SfM Pipeline Step 1. Epipolar geometry; compute relative motion between 2 or 3 cameras 6-point method [Quan 1995] 7-point method [Torr & Murray 1997] 8-point method (normalized) [Hartley 1997] 5-point method [Nister 2004] Images with matched feature points 3
SfM Pipeline Step 1. Epipolar geometry; Step 2. Camera registration; put all cameras in the same coordinate system (auto-calibration if needed [Pollefeys et al. 1998]) [Fitzgibbon & Zisserman 1998] [Pollefeys et al. 2004] 4
SfM Pipeline Step 1. Epipolar geometry; Step 2. Camera registration; Step 3. Bundle adjustment. optimize all cameras and points [Triggs et al. 1999] 5
โThe Black Art โ Step 1. Epipolar geometry; Step 2. Camera registration; Step 3. Bundle adjustment. The state-of-the-art: 1. Step 1 and 3 are very well studied with elegant theories and algorithms. 2. The step 2 is often ad-hoc and heuristic. The camera registration to initialize bundle adjustment โ is still to some extent a black art โ. Page 452, Chapter 18.6 6
Typical Solutions Hierarchical solution: Iteratively merge sub-sequences [Fitzgibbon & Zisserman 1998] [Lhuillier & Quan 2005] 7
Typical Solutions Hierarchical solution: Incremental solution: Iteratively merge sub-sequences Iteratively add cameras one by one [Fitzgibbon & Zisserman 1998] [Lhuillier & Quan 2005] [Pollefeys et al. 2004] [Snavely et al. 2006] 8
Pain of Existing Solutions The block diagram (for the incremental solution): Step 1: Epipolar Geometry Initial Reconstruction (2 cameras) Register All Cameras in a Single Step Add Cameras Bundle Adjustment More Cameras? Step 3: Bundle Adjustment Drawbacks: 1. Repetitively calling bundle adjustment Inefficiency Ourtime objective: 90% of the total computation is spent on bundle adjustment. Simultaneously register cameras to initialize 2. Some cameras are fixed before theallothers bundle adjustment asymmetric formulationthe leads to inferior results. 9
Previous Works linear global solution to rotations cannot solve translations discrete-continuous optimization require coplanar cameras [Govindu 2001] [Hartley etfeatures: al. 2013] Desirable [Crandall et al. 2011] 1. Solve both rotations & translations; 2. Linear & robust solution; elegant quasi-convex optimization linear global solution to translations 3. No degeneracy. sensitive to outliers L [Kahl 2005] [Martinec et al. 2007] degenerate at collinear motion [Arie-Nachimson et al. 2012] 10
The Input Epipolar Geometry The essential matrix encodes the relative motion ๐๐ ๐ธ [ ๐ก ๐ธ ๐๐ ๐๐ ๐ ] ๐ ๐๐ and ๐๐ ๐ก ๐๐ ๐ก ๐๐ ๐ ๐๐ 11
Rotation Registration [Martinec et al. 2007] A linear equation from every two cameras ๐ ๐ ๐ ๐ ๐ ๐ ๐ ๐ [ 1 2 , ,3] {cam1,cam 2} ๐ 2 ๐ 12 ๐ 1 ๐ ๐๐ ๐ ๐ ๐ ๐ {cam2,cam 3} ๐ ๐ ๐ 3 23 3 {camm, camn} ๐ ๐ ๐ ๐๐ ๐ ๐ ๐ ๐๐ ๐ ๐ ๐ ๐ 12
Translation Registration (3 cameras) Input: Output: Relative translations: Camera positions: ck ๐ ci ๐ ๐๐ ๐๐ ๐ ๐๐ cj 13
Translation Registration (3 cameras) Suppose , are known, can be computed by: 1. rotate to match the orientation of 2. shrink/grow to match the length of A linear equation: ๐ ๐ ( ๐๐ ) both are ๐ ๐๐ ๐๐ easy to compute ๐๐ ๐๐ ๐ ๐ ๐๐ ๐ ๐ ๐ (๐ ๐ ( ๐ ๐ ๐ )๐ )๐ ๐๐๐๐๐๐ (๐ ๐ ๐ ๐ ) ck ๐ ๐๐ ๐ ๐ ci ๐ ๐๐ cj 14
Translation Registration (3 cameras) A similar linear equation by matching and ๐๐ ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐๐๐ (๐ ๐ ๐ ๐ ) ck ๐ ๐๐ ๐ ๐ ci ๐ ๐๐ cj 15
Translation Registration (3 cameras) A geometric explanation : plane spanned by and ๐๐ ๐๐ ๐ ๐ ๐๐ ๐ ๐ ( ๐ ) ๐ (๐ ๐ ๐ ๐ ) ๐ : plane spanned by and ๐๐ ๐๐ ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ) ๐ (๐ ๐ ๐ ๐ ) ๐ ๐1 ck c ik ci and are non-coplanar ๐2 c jk c ij cj 16
Translation Registration (3 cameras) A geometric explanation ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐ (๐ ๐ ๐ ๐ ) ๐๐ ๐๐ ๐ ๐ ๐๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐ (๐ ๐ ๐ ๐ ) ๐๐ ๐๐ ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐๐๐ (๐ ๐ ๐ ๐ ) A ๐๐ ๐๐ ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ) ๐ (๐ ๐ ๐ ๐ ) ๐ต ๐ see derivation in the paper ๐1 A ck c ik ci ๐2 : the mutual perpendicular line : the middle point of B c jk c ij Our linear equations minimizes an approximate geometric error! cj 17
Translation Registration (3 cameras) No degeneracy with collinear motion ๐ ci ๐๐ ck ๐ ๐ ๐๐ ๐๐ cj ๐๐ ๐ ๐ ๐๐ ๐ ๐ ( 0 ) ๐ ๐๐ ๐๐ (๐ ๐ ๐ ๐ ) ๐๐ ๐๐ ๐ ๐ ๐ ๐ ๐ ๐ ( 0 ) ๐ (๐๐ ๐ ๐ ) 18
Translation Registration (3 cameras) Suppose , are known, can be computed by: ๐๐ ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐๐๐ (๐ ๐ ๐๐ ) ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐๐๐ (๐ ๐ ๐ ๐) ck ๐ ๐ ๐ ๐๐ ๐ ๐๐ ๐ ๐ ci ๐ ๐๐ cj 19
Translation Registration (3 cameras) Suppose , are known, can be computed by: ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐๐๐ (๐ ๐ ๐ ๐ ) ๐ ๐ ๐ ๐ ๐ ๐ ( ๐ ๐ ) ๐ ๐๐๐๐ (๐๐ ๐ ๐ ) ck ๐ ๐๐ ๐ ๐๐ ๐ ๐ ๐ ๐ ci ๐ ๐๐ cj 20
Translation Registration (3 cameras) Collecting all six equations ๐ต๐๐๐ ๐๐ ๐ ๐ 0 ๐๐ ( ) 21
Translation Registration (n cameras) Generalize to n cameras 1. Collect equations from all triangles in the match graph. ๐ต๐ 0 ๐1 ๐2 ๐ 3match graph: The ๐4 each camera is a vertex, ๐ ๐5 connect two cameras if their ๐6 relative motion is known. ๐7 ๐8 ๐9 [] 2. Solve all equations ๐ต1 ( ๐1 , ๐ 2 , ๐ 3 ) 0 cameras can be non-coplanar.
Triangulation Once cameras are fixed, triangulate matched corners to generate 3D points. 23
Robustness Issues Exclude unreliable triplets More consistency checks in the paper Check if ๐ ? ๐๐ ๐ ๐๐ ๐ ๐๐ ๐ ๐๐ ๐ ๐๐ ๐๐ ๐ 24
Results Accuracy evaluation: Compare with recent methods on data with known ground truth. Fountain-P11 Herz-Jesu-P25 Fountain-P11 Castle-P30 Herz-Jesu-P25 Castle-P30 c meters R degrees c meters R degrees c meters R degrees Ours 0.0139 0.1954 0.0636 0.1880 0.2345 0.4800 [Arie-Nachimson et al. 2012] 0.0226 0.4211 0.0479 0.3125 - - [Sinha et al. 2010] 0.1317 - 0.2538 - - - VisualSFM 0.0364 0.2794 0.0551 0.2868 0.2639 0.3980 All results are after the final bundle adjustment.
Results Efficiency evaluation: Building Notre Dame Building (128) Pisa Notre Dame (371) Trevi Fountain Pisa (481) Trevi Fountain (1259) Our Method VisualSFM Our Method VisualSFM Our Method VisualSFM Our Method VisualSFM Total running time (s)* 17 62 49 479 69 479 135 1790 BA time (s) 11 57 20 442 52 444 61 1715 Registration time (s) 6 5 29 37 17 12 74 75 128 128 362 365 479 480 1255 1253 91,290 78,100 103,629 104,657 134,555 129,484 297,766 292,277 # of reconstructed images # of reconstructed points * The total running time excludes the time spent on feature matching and epipolar geometry computation.
Conclusions A global solution for orientations & positions; Linear, robust & geometrically meaningful; No degeneracy. 27
Thanks! code & data available at: http://www.ece.nus.edu.sg/stfpage/eletp/
Results A large scale scene Quasi-dense points generated by CMVS [Furukawa et al. 2010] for better visualization. 29