3dv Slides
https://github.com/sunglok/3dv_tutorial
An Invitation to 3D Vision: A Tutorial for Everyone
What is Computer Vision?
"Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do. [1][2][3] Computer vision is concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images. It involves the development of a theoretical and algorithmic basis to achieve automatic visual understanding. [9] As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. [10] As a technological discipline, computer vision seeks to apply its theories and models for the construction of computer vision systems."
— Computer Vision, Wikipedia
What is Computer Vision?
[Figure] Computer vision produces image understanding (shape, face, location) from images, as humans do; computer graphics renders images from models; image processing maps an image to a (transformed) image/signal.
What is 3D Vision?
[Figure] 3D vision is also known as visual geometry (multiple view geometry) or geometric vision, a sub-field of computer vision.
▪ Reference Books
What is 3D Vision?
cf. Most examples in this tutorial are less than 100 lines and based on recent OpenCV (3.0.0 or later).
What is 3D Vision?
[Figure] Range sensors (LiDAR, RADAR) and omni-directional cameras
Applications: Photo Browsing
[Reference] Snavely et al., Photo Tourism: Exploring Photo Collections in 3D, SIGGRAPH, 2006
Applications: 3D Reconstruction
[Reference] Im et al., High Quality Structure from Small Motion for Rolling Shutter Cameras, ICCV, 2015
[Reference] Hedman et al., Casual 3D Photography, SIGGRAPH Asia, 2017
Applications: Real-time Visual SLAM
▪ ORB-SLAM (2014)
[Reference] Mur-Artal et al., ORB-SLAM: A Versatile and Accurate Monocular SLAM System, T-RO, 2015
Applications: Augmented Reality
[Reference] Klein and Murray, Parallel Tracking and Mapping for Small AR Workspaces, ISMAR, 2007
Applications: Mixed Reality
▪ What is 3D Vision?
▪ Single-view Geometry
– Camera Projection Model
– General 2D-3D Geometry
▪ Two-view Geometry
– Planar 2D-2D Geometry (Projective Geometry)
– General 2D-2D Geometry (Epipolar Geometry)
▪ Correspondence Problem
▪ Multi-view Geometry
▪ Summary
Getting Started with 2D
▪ Similarity
Getting Started with 2D
▪ Point transformation
Getting Started with 2D
▪ Coordinate transformation
Getting Started with 2D
▪ 2D rotation matrix
cf. Properties of a rotation matrix
• R⁻¹ = R⊺ (orthogonal matrix)
• det(R) = 1
▪ 3D rotation matrix
Single-view Geometry
Camera Projection Model
- Pinhole Camera Model
- Geometric Distortion
General 2D-3D Geometry
- Camera Calibration
- Absolute Camera Pose Estimation
[Figure] The camera obscura: light from an object passes through a pinhole and forms an inverted image on the image plane; the camera center, principal axis, and focal length define the projection geometry.
Camera Projection Model
[Figures] The image plane, principal axis, and principal point; a 3D object is projected through the camera center onto the image plane.
Camera Projection Model
▪ Vanishing Points
– A point on the image plane where mutually parallel lines in 3D space converge
• A vector to the vanishing point is parallel to the lines.
• A vector to the vanishing point is parallel to the reference plane made by the lines.
[Figure] Top view: parallel lines converge to the vanishing point; side view: the same lines and their vanishing point
Camera Projection Model
cf. The tilt angle in this page is defined in the opposite direction of the common notation.
Camera Projection Model
cf. Similarly, the pan angle (θy) w.r.t. the rails can also be calculated using x instead of y.
[Figure] A person's head point and contact point (with the ground plane) on the image
Camera Projection Model
#include "opencv2/opencv.hpp"
...
int main()
{
    const char* input = "data/daejeon_station.png";
    double f = 810.5, cx = 480, cy = 270, L = 3.31;
    cv::Point3d cam_ori(DEG2RAD(-18.7), DEG2RAD(-8.2), DEG2RAD(2.0));
    cv::Range grid_x(-2, 3), grid_z(5, 35);
    ...
Camera Projection Model
▪ Camera Matrix K
[Equation] K in its specialized and generalized forms; the principal axis meets the image plane at the principal point.
Camera Projection Model
▪ Projection Matrix P (a 3x4 matrix)
– If a 3D point is not based on the camera coordinate, the point should be transformed from the world coordinate to the camera coordinate.
– The camera pose can be derived from R⊺ and −R⊺𝐭.
A point cloud: "data/box.xyz"
CloudCompare (https://danielgm.net/cc/)
#include "opencv2/opencv.hpp"
int main()
{
    // The given camera configuration: focal length, principal point, image resolution, position, and orientation
    double f = 1000, cx = 320, cy = 240, noise_std = 1;
    cv::Size img_res(640, 480);
    std::vector<cv::Point3d> cam_pos = { cv::Point3d(0, 0, 0), cv::Point3d(-2, -2, 0), ... };
    std::vector<cv::Point3d> cam_ori = { cv::Point3d(0, 0, 0), cv::Point3d(-CV_PI / 12, CV_PI / 12, 0), ... };
    ...
    // Project the points (cf. OpenCV provides 'cv::projectPoints()' with consideration of distortion.)
    cv::Mat x = P * X;
    x.row(0) = x.row(0) / x.row(2);
    x.row(1) = x.row(1) / x.row(2);
    x.row(2) = 1;
    ...
    cv::waitKey(0);
    return 0;
}
Camera Projection Model
▪ Geometric Distortion
– Geometric distortion models are usually defined on the normalized image plane.
– Polynomial distortion model (a.k.a. Brown-Conrady model)
• Radial distortion: k1, k2, …
• Tangential distortion: p1, p2, … (usually negligible)
[Figure] Barrel distortion and its correction (e.g. K1: 1.105763E-01, K2: 1.886214E-02, K3: 1.473832E-02, P1: -8.448460E-03, P2: -7.356744E-03)
#include "opencv2/opencv.hpp"
int main()
{
    const char* input = "data/chessboard.avi";
    cv::Matx33d K(432.7390364738057, 0, 476.0614994349778, 0, 431.2395555913084, 288.7602152621297, 0, 0, 1);
    std::vector<double> dist_coeff = { -0.2852754904152874, 0.1016466459919075, ... };
    // Open a video
    cv::VideoCapture video;
    if (!video.open(input)) return -1;
    ...
    // Rectify geometric distortion (cf. 'cv::undistort()' can be applied for one-time remapping.)
    if (show_rectify)
    {
        if (map1.empty() || map2.empty())
            cv::initUndistortRectifyMap(K, dist_coeff, cv::Mat(), cv::Mat(), image.size(), CV_32FC1, map1, map2);
        cv::remap(image, image, map1, map2, cv::InterpolationFlags::INTER_LINEAR);
    }
    ...
    video.release();
    return 0;
}
General 2D-3D Geometry
▪ Camera Calibration
– Unknown: Intrinsic + extrinsic parameters (5* + 6 DoF)
• The number of intrinsic parameters* can vary w.r.t. user preference.
– Given: 3D points X1, X2, …, Xn and their projected points x1, x2, …, xn
– Constraints: n x projection xi = K[R | t]Xi
[Figure] 9 x 6 chessboard
General 2D-3D Geometry
▪ Camera Calibration
– Unknown: Intrinsic + m x extrinsic parameters (5* + m x 6 DoF)
– Given: 3D points X1, X2, …, Xn and their projected points from the jth camera xi^j
– Constraints: n x m x projection xi^j = K[Rj | tj]Xi
– Solutions
• OpenCV cv::calibrateCamera() and cv::initCameraMatrix2D()
• Camera Calibration Toolbox for MATLAB, http://www.vision.caltech.edu/bouguetj/calib_doc/
• GML C++ Camera Calibration Toolbox, http://graphics.cs.msu.ru/en/node/909
• DarkCamCalibrator, http://darkpgmr.tistory.com/139
[Figure] 9 x 6 chessboard
General 2D-3D Geometry
[Figure] Input video (960 x 540)
#include "opencv2/opencv.hpp"
int main()
{
    const char* input = "data/chessboard.avi";
    cv::Size board_pattern(10, 7);
    float board_cellsize = 0.025f;
    // Select images
    std::vector<cv::Mat> images;
    ...
General 2D-3D Geometry
[Figure] Landmark map
General 2D-3D Geometry
- Grab an image
- Find a chessboard on the image
[AR Rendering]
- Draw the 3D box on the image using cv::projectPoints()
#include "opencv2/opencv.hpp"
int main()
{
    const char* input = "data/chessboard.avi";
    cv::Matx33d K(432.7390364738057, 0, 476.0614994349778, 0, 431.2395555913084, 288.7602152621297, 0, 0, 1);
    std::vector<double> dist_coeff = { -0.2852754904152874, 0.1016466459919075, ... };
    cv::Size board_pattern(10, 7);
    double board_cellsize = 0.025;
    // Open a video
    cv::VideoCapture video;
    if (!video.open(input)) return -1;
    ...
    // Run pose estimation
    while (true)
    {
        // Grab an image from the video
        cv::Mat image;
        video >> image;
        if (image.empty()) break;
        ...
    }
    video.release();
    return 0;
}
General 2D-3D Geometry
▪ Example: Pose Estimation (Book) + Camera Calibration (no initially given K; ∵ unknown) [pose_estimation_book3.cpp]
- Load the cover image
- Extract ORB features
- Build 3D object points X1, X2, …
Two-view Geometry
Planar 2D-2D Geometry (Projective Geometry)
- Planar Homography Estimation
General 2D-2D Geometry (Epipolar Geometry)
- Relative Camera Pose Estimation
- Triangulation
[Figure] Planar transformations: rotation, translation, scaling / aspect ratio, shear, perspective projection, and their composition
Overview of Projective Geometry

|  | Euclidean Transform (a.k.a. Rigid Transform) | Similarity Transform | Affine Transform | Projective Transform (a.k.a. Planar Homography) |
|---|---|---|---|---|
| DoF | 3 | 4 | 6 | 8 |
| Transformations: rotation | O | O | O | O |
| - translation | O | O | O | O |
| - scaling | X | O | O | O |
| - aspect ratio | X | X | O | O |
| - shear | X | X | O | O |
| - perspective projection | X | X | X | O |
| Invariants: length | O | X | X | X |
| - angle | O | O | X | X |
| - ratio of lengths | O | O | X | X |
| - parallelism | O | O | O | X |
| - incidence | O | O | O | O |
| - cross ratio | O | O | O | O |
| OpenCV Functions | cv::estimateRigidTransform() | - | cv::getAffineTransform(), cv::warpAffine() | cv::getPerspectiveTransform(), cv::findHomography(), cv::warpPerspective() |
Planar 2D-2D Geometry (Projective Geometry)
[Figure] Clicked source points (point_src) and their destination points (point_dst)
#include "opencv2/opencv.hpp"
...
int main()
{
    const char* input = "data/sunglok_desk.jpg";
    cv::Size card_size(450, 250);
    ...

#include "opencv2/opencv.hpp"
int main()
{
    // Load two images (cf. We assume that two images have the same size and type)
    cv::Mat image1 = cv::imread("data/hill01.jpg");
    cv::Mat image2 = cv::imread("data/hill02.jpg");
    if (image1.empty() || image2.empty()) return -1;
    ...
    cv::Mat H = cv::findHomography(points2, points1, cv::RANSAC);
    cv::Mat merge;
    cv::warpPerspective(image2, merge, H, cv::Size(image1.cols * 2, image1.rows));
    merge.colRange(0, image1.cols) = image1 * 1; // Copy
    ...
#include "opencv2/opencv.hpp"
int main()
{
    // Open a video and get the reference image and feature points
    cv::VideoCapture video;
    if (!video.open("data/traffic.avi")) return -1;
    cv::Mat gray_ref;
    video >> gray_ref;
    if (gray_ref.empty())
    {
        video.release();
        return -1;
    }
    if (gray_ref.channels() > 1) cv::cvtColor(gray_ref, gray_ref, cv::COLOR_RGB2GRAY);
    ...
    // Run and show video stabilization
    while (true)
    {
        // Grab an image from the video
        cv::Mat image, gray;
        video >> image;
        if (image.empty()) break;
        if (image.channels() > 1) cv::cvtColor(image, gray, cv::COLOR_RGB2GRAY);
        else gray = image.clone();
        ...
    }
    video.release();
    return 0;
}
▪ Fundamental Matrix
[Figure] Two image planes of camera #1 and camera #2 related by relative pose R, t
General 2D-2D Geometry (Epipolar Geometry)
▪ Essential Matrix
[Figure] Camera #1 and camera #2 with relative pose R, t
Epipolar Constraint: x̂′⊺Ex̂ = 0, where E = [t]×R
General 2D-2D Geometry (Epipolar Geometry)
▪ Epipolar Geometry
[Figure] The baseline between camera #1 and camera #2 meets each image plane at an epipole; a point in one image constrains its (unknown) match to the corresponding epipolar line in the other image, even when the relative pose is unknown. If the baseline is parallel to an image plane, that image has no finite epipole.
General 2D-2D Geometry (Epipolar Geometry)
[Figure] Image planes of camera #1 and camera #2 with relative pose R, t
Epipolar Geometry: Scale Ambiguity
Images are from Resolving Scale Ambiguity in Monocular Visual Odometry (by Choi et al., 2013).
Epipolar Geometry: Relative Pose Ambiguity
Images are from Multiple View Geometry in Computer Vision (by Hartley and Zisserman, 2nd Edition, 2004).
Overview of Epipolar Geometry

|  | Essential Matrix E | Planar Homography H |
|---|---|---|
| Formulation | E = [t]×R | H = R + (1/d) t n⊺ |
| Estimation | 5-point algorithm (n ≥ 5) → k solutions, cv::findEssentialMat() | 4-point algorithm (n ≥ 4) → 1 solution, cv::findHomography() |
| Input | (x̂i, x̂i′) [m] on the normalized image plane | (x̂i, x̂i′) [m] on a plane in the normalized image plane |
| Degenerate cases | no translational motion | correspondences not from a single plane |
| Decomposition to R and t | cv::decomposeEssentialMat(), cv::recoverPose() | cv::decomposeHomographyMat() with K = I3×3 |
General 2D-2D Geometry (Epipolar Geometry)
[Figure] Accumulating relative poses R, t along an image sequence
Result: "visual_odometry_epipolar.xyz"
#include "opencv2/opencv.hpp"
int main()
{
    const char* input = "data/KITTI_07_L/%06d.png";
    double f = 707.0912;
    cv::Point2d c(601.8873, 183.1104);
    bool use_5pt = true;
    int min_inlier_num = 100;
    ...
    // Calculate relative pose
    cv::Mat E, inlier_mask;
    if (use_5pt)
    {
        E = cv::findEssentialMat(point_prev, point, f, c, cv::RANSAC, 0.99, 1, inlier_mask);
    }
    else
    {
        cv::Mat F = cv::findFundamentalMat(point_prev, point, cv::FM_RANSAC, 1, 0.99, inlier_mask);
        cv::Mat K = (cv::Mat_<double>(3, 3) << f, 0, c.x, 0, f, c.y, 0, 0, 1);
        E = K.t() * F * K;
    }
    cv::Mat R, t;
    int inlier_num = cv::recoverPose(E, point_prev, point, R, t, f, c, inlier_mask);
    ...
    video.release();
    fclose(camera_traj);
    return 0;
}
General 2D-2D Geometry (Epipolar Geometry)
[Figure] Triangulation: a 3D point is recovered from its projections on two image planes given the relative pose R, t
▪ Example: Triangulation (Two-view Reconstruction) [triangulation.cpp]
#include "opencv2/opencv.hpp"
int main()
{
    // cf. You need to run 'image_formation.cpp' to generate point observation.
    const char *input0 = "image_formation0.xyz", *input1 = "image_formation1.xyz";
    double f = 1000, cx = 320, cy = 240;
    ...
Quiz
What is it? (600 pixels)
Quiz
“Duck”
[Reference] Tuytelaars et al., Local Invariant Feature Detectors: A Survey, Foundations and Trends in Computer Graphics and Vision, 2008
Harris Corner (1988)
Images are from Matching with Invariant Features (by Frolova and Simakov, Lecture Slides, 2004).
Harris Corner (1988)
▪ Properties
– Invariant to translation, rotation, intensity shift (I → I + b), and intensity scaling (I → aI)
– But variant to image scaling
[Figure] Response along edges vs. at a corner ("Corner!")
Images are from Matching with Invariant Features (by Frolova and Simakov, Lecture Slides, 2004).
SIFT (Scale-Invariant Feature Transform; 1999)
[Figure] Local extrema (N: 11479) → feature points (N: 971) → feature scales and orientations
SURF (Speeded Up Robust Features)
[Figure] Gaussian second-order derivatives Dyy, Dxy and their box-filter approximations D′yy, D′xy
[Reference] Bay et al., SURF: Speeded Up Robust Features, CVIU, 2008
FAST (Features from Accelerated Segment Test; 2006)
▪ Versions
– FAST-9 (𝑁: 9), FAST-12 (𝑁: 12), …
– FAST-ER: Training a decision tree to enhance repeatability with more pixels
[Reference] Rosten et al., Faster and Better: A Machine Learning Approach to Corner Detection, T-PAMI, 2010
LIFT (Learned Invariant Feature Transform; 2016)
SIFT LIFT
[Reference] Yi et al., LIFT: Learned Invariant Feature Transform, ECCV, 2016
Overview of Feature Correspondence
▪ Features
– Corners: Harris corner, GFTT (Shi-Tomasi corner), SIFT, SURF, FAST, LIFT, …
– Edges, line segments, regions, …
▪ Feature Descriptors and Matching
– Patch: Raw intensity
• Measures: SSD (sum of squared difference), ZNCC (zero normalized cross correlation), …
– Floating-point descriptors: SIFT, SURF, (DAISY), LIFT, … → e.g. A 128-dim. vector (a histogram of gradients)
• Measures: Euclidean distance, cosine distance, (the ratio of first and second bests)
• Matching: Brute-force matching (O(N²)), ANN (approximate nearest neighbor) search (O(log N))
• Pros (+): High discrimination power
• Cons (–): Heavy computation
– Binary descriptors: BRIEF, ORB, (BRISK), (FREAK), … → e.g. A 128-bit string (a series of intensity comparison)
• Measures: Hamming distance
• Matching: Brute-force matching (𝑂(𝑁 2 ))
• Pros (+): Less storage and faster extraction/matching
• Cons (–): Less performance
▪ Feature Tracking (a.k.a. Optical Flow)
– Optical flow: (Horn-Schunck method), Lucas-Kanade method
• Measures: SSD (sum of squared difference)
• Tracking: Finding displacement of a similar patch
• Pros (+): No descriptor and matching (faster and compact)
• Cons (–): Does not work with wide baselines
SIFT (Scale-Invariant Feature Transform; 1999)
θ(x, y) = tan⁻¹( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )
SIFT (Scale-Invariant Feature Transform; 1999)
[Figure] Descriptor bins (e.g. 2 2 2 1 1 4 2 1 2 …)
[Reference] Calonder et al., BRIEF: Computing a Local Binary Descriptor Very Fast, T-PAMI, 2012
ORB (Oriented FAST and rotated BRIEF, 2011)
– Rotation-aware BRIEF
• Extract BRIEF descriptors w.r.t. the known orientation
• Use better comparison pairs trained by greedy search
▪ Combination: ORB
– FAST-9 detector (with orientation) + BRIEF-256 descriptor (with trained pairs)
▪ Computing time
– ORB: 15.3 [msec] / SURF: 217.3 [msec] / SIFT: 5228.7 [msec] on 24 images (640x480) in the Pascal dataset
[Reference] Rublee et al., ORB: An Efficient Alternative to SIFT or SURF, ICCV, 2011
Lucas-Kanade Optical Flow
∴ v = A†b = (A⊺A)⁻¹A⊺b
▪ Combination: KLT tracker
– Shi-Tomasi detector (a.k.a. GFTT) + Lucas-Kanade optical flow
Why Outliers?
Putative matches (inliers + outliers)
RANSAC: Random Sample Consensus
xi = (xi, yi)
RANSAC: Random Sample Consensus
[Figure] RANSAC alternates two steps: hypothesis generation (fit a model to a minimal random sample) and hypothesis evaluation (count the inlier candidates supporting the model, e.g. 3 vs. 7).
RANSAC: Random Sample Consensus
Parameters:
▪ The inlier threshold
▪ The number of iterations
- s: Confidence level
- γ: Inlier ratio
- d: The number of samples
▪ Example: Line Fitting with RANSAC [line_fitting_ransac.cpp]
#include "opencv2/opencv.hpp"
...
int main()
{
    cv::Vec3d truth(1.0 / sqrt(2.0), 1.0 / sqrt(2.0), -240.0); // The line model: a*x + b*y + c = 0 (a^2 + b^2 = 1)
    int ransac_trial = 50, ransac_n_sample = 2;
    double ransac_thresh = 3.0; // 3 x 'data_inlier_noise'
    int data_num = 1000;
    double data_inlier_ratio = 0.5, data_inlier_noise = 1.0;
    ...
Least Squares Method, RANSAC, and M-estimator
▪ RANSAC
– Find a model while maximizing the number of supports (~ inlier candidates), i.e. minimizing the number of outlier candidates
[Reference] Choi et al., Performance Evaluation of RANSAC Family, BMVC, 2009
Ceres (an asteroid)
▪ Ceres Solver?
– An open-source C++ library for modeling and solving large and complicated optimization problems
• Developed since 2010 by Google (BSD license)
– Problem types: 1) Non-linear least squares (with bounds), 2) General unconstrained minimization
– Homepage: http://ceres-solver.org/
Multi-view Geometry
▪ Bundle Adjustment
– Unknown: Position of 3D points and each camera's relative pose (6n + 3m DoF)
– Given: Point correspondence, camera matrices, position of 3D points, and each camera's relative pose (cf. initial values)
– Constraints: n x m x projection xi^j = Pj Xi = Kj [Rj | tj] Xi
– Solution: Non-linear least-squares optimization
• Cost function: Reprojection error
[Figure] Cameras #1, #2, …, #m with relative poses R2, t2, …, Rm, tm observing the 3D points
Bundle Adjustment
[Given]
- Intrinsic parameters: camera matrices Kj (all cameras have the same fixed camera matrix)
[Unknown]
- Extrinsic parameters: camera poses
- 3D points Xi (initial values: Xi = (0, 0, 5.5)⊺)
#include "bundle_adjustment.hpp"
int main()
{
    // cf. You need to run 'image_formation.cpp' to generate point observation.
    const char* input = "image_formation%d.xyz";
    int input_num = 5;
    double f = 1000, cx = 320, cy = 240;
    ...

[bundle_adjustment.hpp]
    static ceres::CostFunction* create(const cv::Point2d& _x, double _f, const cv::Point2d& _c)
    {
        return (new ceres::AutoDiffCostFunction<ReprojectionError, 2, 6, 3>(new ReprojectionError(_x, _f, _c)));
    }
private:
    const cv::Point2d x;
    const double f;
    const cv::Point2d c;
};
Bundle Adjustment
[bundle_adjusetment_global.xyz]
- # of iterations: 20
- Reprojection error: 0.113 ← 2637.626
[bundle_adjusetment_inc.xyz]
- # of iterations: 17 + 4 + 6
- Reprojection error: 0.113 ← 722.350
[Given]
- Intrinsic parameters: camera matrices Kj (all cameras have the same fixed camera matrix)
[Unknown]
- Extrinsic parameters: camera poses
- 3D points Xi
1) Best-pair selection
2) Epipolar geometry
3) Triangulation
[Given]
- Intrinsic parameters: camera matrices Kj (all cameras have the same fixed camera matrix)
- 3D points Xi
5) Perspective-n-Point (PnP)
6) Triangulation
[Given]
- Intrinsic parameters: camera matrices Kj (all cameras have the same fixed camera matrix)
- 3D points Xi
5) Perspective-n-Point (PnP)
6) Triangulation
7) Bundle Adjustment
Applications
- Structure-from-Motion: an image set (unordered)
- Visual SLAM / Visual Odometry: image sequences (ordered)
Structure-from-Motion
1834 / 5767
Structure-from-Motion
205 / 5615
Structure-from-Motion
2) Estimate relative pose from the best two views (epipolar geometry)
- # of points: 2503
- # of iterations: 150
- Reprojection error: 50.962 ← 17385.5
- # of points: 2133
- # of iterations: 26 + 23 + 25
- Reprojection error: 0.380 ← 1.261
Structure-from-Motion using VisualSFM
Paradigm #1: Bayesian Filtering vs. Bundle Adjustment
SLAM (Simultaneous Localization and Mapping) vs. SfM (Structure from Motion)
Bayesian Filtering ≤ Bundle Adjustment, due to
1. Global optimization
2. # of features (100 vs. 4000)
Paradigm #2: Feature-based Method vs. Direct Method
Why Visual Odometry?
Visual Odometry vs. Wheel Odometry
Visual Odometry vs. Visual SLAM
Feature-based Monocular Visual Odometry
Summary
Slides and example codes are available: https://github.com/sunglok/3dv_tutorial
▪ What is 3D Vision?
▪ Single-view Geometry
– Camera Projection Model
• Pinhole Camera Model
• Geometric Distortion Models
– General 2D-3D Geometry
• Camera Calibration
• Absolute Camera Pose Estimation (PnP Problem)
▪ Two-view Geometry
– Planar 2D-2D Geometry (Projective Geometry)
• Planar Homography
– General 2D-2D Geometry (Epipolar Geometry)
• Fundamental/Essential Matrix
• Relative Camera Pose Estimation
• Triangulation (Point Localization)
▪ Multi-view Geometry
– Bundle Adjustment (Non-linear Optimization)
– Applications: Structure-from-motion, Visual SLAM, and Visual Odometry
▪ Correspondence Problem
– Feature Correspondence: Feature Matching and Tracking
– Robust Parameter Estimation: (Hough Transform), RANSAC, M-estimator
Applications in Deep Learning Era
▪ Linear Equations
– Inhomogeneous cases: Multiplying a pseudo-inverse
– Homogeneous cases: Finding a null vector (or the vector with the smallest singular value)
▪ Non-linear Equations
– Non-linear optimization
• General cases: Gradient-descent method, Newton method
• Least-squares cases: Gauss-Newton method, Levenberg–Marquardt method
Appendix: Further Information
▪ Beyond Point Features
– Other features: OPVO, Kimera
– Direct methods (w/o features): Already mentioned (including deep learning)
▪ Real-time / Large-scale SfM
▪ (Spatially / Temporally) Non-static SfM
– Deformable (or moving) objects: Non-rigid SfM
▪ Depth, Object, and Semantic Recognition