CS231A: Computer Vision, From 3D Reconstruction to Recognition Homework #2

(Winter 2021) Due: Friday, February 12

Fundamental Matrix (20 points)


In this question, you will explore some properties of the fundamental matrix and derive a minimal parameterization for it.
(a) Show that two 3 × 4 camera matrices $M$ and $M'$ can always be reduced to the following canonical forms by an appropriate projective transformation of 3D space, represented by a 4 × 4 matrix $H$. Here, we assume $e_3^\top(-A'A^{-1}b + b') \neq 0$, where $e_3 = (0, 0, 1)^\top$, $M = [A, b]$, and $M' = [A', b']$.

$$\hat{M} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \qquad \text{and} \qquad \hat{M}' = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Note: You don't have to show the explicit form of $H$ for the proof. [10 points]
Hint: The camera matrix has rank 3. Block matrix multiplication may be helpful. If you construct a projective transformation matrix $H_0$ that reduces $M$ to $\hat{M}$ (i.e., $\hat{M} = MH_0$), can an $H_1$ be constructed such that it not only preserves the reduction of $M$ to $\hat{M}$ (i.e., $\hat{M} = MH_0H_1$) but also reduces $M'$ to $\hat{M}'$ (i.e., $\hat{M}' = M'H_0H_1$)?

(b) Given a 4 × 4 matrix $H$ representing a projective transformation of 3D space, prove that the fundamental matrices corresponding to the two pairs of camera matrices $(M, M')$ and $(MH, M'H)$ are the same. [5 points]
(Hint: Think about point correspondences.)
(c) Using the conclusions from (a) and (b), derive the fundamental matrix $F$ of the camera pair $(M, M')$ in terms of $a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, b_1, b_2$. Then use the fact that $F$ is only defined up to a scale factor to construct a seven-parameter expression for $F$. (Hint: The fundamental matrix corresponding to a pair of camera matrices $M = [I \mid 0]$ and $M' = [A \mid b]$ is $[b]_\times A$.) [5 points]
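For concreteness, $[b]_\times$ in the hint is the standard cross-product (skew-symmetric) matrix of $b$. A minimal numpy helper (the name `skew` is ours, not part of the starter code):

```python
import numpy as np

def skew(b):
    """Cross-product matrix [b]_x, chosen so that skew(b) @ a == np.cross(b, a)."""
    return np.array([[   0.0, -b[2],  b[1]],
                     [  b[2],   0.0, -b[0]],
                     [ -b[1],  b[0],   0.0]])
```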

Fundamental Matrix Estimation From Point Correspondences (30 points)
This problem is concerned with the estimation of the fundamental matrix from point correspondences. You will implement both the linear least-squares version of the eight-point algorithm and its normalized version to estimate fundamental matrices. You will implement the methods in p2.py and complete the following:
(a) Implement the linear least-squares eight-point algorithm in lls_eight_point_alg() and report the returned fundamental matrix. Remember to enforce the rank-two constraint on the fundamental matrix via singular value decomposition. Attach a copy of your code.
[20 points]
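A minimal sketch of what lls_eight_point_alg() might look like, assuming the correspondences arrive as (N, 3) arrays of homogeneous points with the convention $p_2^\top F p_1 = 0$; the exact signature and conventions in p2.py may differ:

```python
import numpy as np

def lls_eight_point_alg(points1, points2):
    """Sketch: linear least-squares eight-point algorithm.

    points1, points2: (N, 3) homogeneous correspondences (assumed layout),
    with the convention points2[i] @ F @ points1[i] == 0.
    """
    # Each correspondence contributes one row of the constraint matrix:
    # kron(p2, p1) matches the row-major flattening of F.
    W = np.array([np.kron(p2, p1) for p1, p2 in zip(points1, points2)])
    # The entries of F form the right singular vector of W with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(W)
    F_full = Vt[-1].reshape(3, 3)
    # Enforce the rank-two constraint by zeroing F's smallest singular value.
    U, S, Vt = np.linalg.svd(F_full)
    S[2] = 0
    return U @ np.diag(S) @ Vt
```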

Figure 1: Example illustration, with epipolar lines shown in both images (Images courtesy Forsyth
& Ponce)

(b) Implement the normalized eight-point algorithm in normalized_eight_point_alg() and report the returned fundamental matrix. Remember to enforce the rank-two constraint on the fundamental matrix via singular value decomposition. Attach a copy of your code.
[5 points]
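A sketch of the normalized variant, building on the linear routine above; the $\sqrt{2}$ mean-distance normalization is one common choice, and the points are assumed to have last coordinate 1:

```python
def normalized_eight_point_alg(points1, points2):
    """Sketch: normalize points, run the linear eight-point algorithm, de-normalize."""
    def normalize(pts):
        # Translate the centroid to the origin and scale so the mean
        # distance from the origin is sqrt(2).
        xy = pts[:, :2]
        centroid = xy.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(xy - centroid, axis=1))
        T = np.array([[scale, 0.0, -scale * centroid[0]],
                      [0.0, scale, -scale * centroid[1]],
                      [0.0, 0.0, 1.0]])
        return (T @ pts.T).T, T

    q1, T1 = normalize(points1)
    q2, T2 = normalize(points2)
    Fn = lls_eight_point_alg(q1, q2)
    # Undo the normalization: p2^T (T2^T Fn T1) p1 == q2^T Fn q1.
    return T2.T @ Fn @ T1
```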

(c) After implementing methods to determine the fundamental matrix, we can now determine epipolar lines. Specifically, to evaluate the accuracy of our fundamental matrix, we will compute the average distance between a point and its corresponding epipolar line in compute_distance_to_epipolar_lines(). Attach a copy of your code.
[5 points]
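A sketch of the distance computation, using the convention that a point $p_2$ in image 2 maps to the epipolar line $\ell = F^\top p_2$ in image 1, and again assuming homogeneous points with last coordinate 1; averaging the symmetric distance over both images is a natural extension:

```python
def compute_distance_to_epipolar_lines(points1, points2, F):
    """Sketch: mean distance from points in image 1 to their epipolar lines."""
    lines = (F.T @ points2.T).T                  # one line (a, b, c) per point
    num = np.abs(np.sum(lines * points1, axis=1))
    den = np.linalg.norm(lines[:, :2], axis=1)   # sqrt(a^2 + b^2)
    return np.mean(num / den)
```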

The Factorization Method (15 points)


In this question, you will explore the factorization method, initially presented by Tomasi and
Kanade, for solving the affine structure from motion problem. You will implement the methods in
p3.py and complete the following:

(a) Implement the factorization method as described in lecture and in the course notes. Complete the function factorization_method(). Submit a copy of your code.
[5 points]
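A sketch of the core of factorization_method(), assuming the measurement matrix is (2m, n) for n points tracked over m views and has already been centered (each row has zero mean); splitting $\sqrt{\Sigma}$ between the two factors is one common convention, and the result is only defined up to an affine ambiguity:

```python
def factorization_method(points_im):
    """Sketch: Tomasi-Kanade factorization for affine structure from motion."""
    U, S, Vt = np.linalg.svd(points_im)
    # In the idealized affine case the measurement matrix has rank 3,
    # so keep only the three largest singular values.
    sqrt_S3 = np.diag(np.sqrt(S[:3]))
    motion = U[:, :3] @ sqrt_S3        # (2m, 3) stack of affine camera rows
    structure = sqrt_S3 @ Vt[:3]       # (3, n) point locations
    return motion, structure
```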

(b) Run the provided code that plots the resulting 3D points. Compare your result to the ground
truth provided. The results should look identical, except for a scaling and rotation. Explain
why this occurs.
[3 points]

(c) Report the four singular values from the SVD. Why are there four non-zero singular values? How many non-zero singular values would you expect in the idealized version of the method, and why?
[2 points]

(d) The next part of the code will now only load a subset of the original correspondences. Compare
your new results to the ground truth, and explain why they no longer appear similar.
[3 points]

(e) Report the new singular values, and compare them to the singular values that you found
previously. Explain any major changes.
[2 points]

Triangulation in Structure From Motion (35 points)

Figure 2: The set of images used in this structure from motion reconstruction.

Structure from motion is inspired by our ability to learn the 3D structure of the surrounding environment by moving through it. Given a sequence of images, we are able to simultaneously
estimate both the 3D structure and the path the camera took. In this problem, you will implement
significant parts of a structure from motion framework, estimating both R and T of the cameras,
as well as generating the locations of points in 3D space. Recall that in the previous problem we
triangulated points assuming affine transformations. However, in the actual structure from motion
problem, we assume projective transformations. By doing this problem, you will learn how to solve
this type of triangulation. In Course Notes 4, we go into further detail about this process. You will
implement the methods in p4.py and complete the following:

(a) Given correspondences between pairs of images, we compute the respective fundamental and essential matrices. Given the essential matrix, we must now compute the R and T between
the two cameras. However, recall that there are four possible R, T pairings. In this part, we
seek to find these four possible pairings, which we will later be able to decide between. In the
course notes, we explain in detail the following process:

1. To compute $R$: Given the singular value decomposition $E = U\Sigma V^\top$, we can rewrite $E = MQ$, where $M = UZU^\top$ and $Q = UWV^\top$ or $UW^\top V^\top$, with

$$Z = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad \text{and} \qquad W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Note that this factorization of $E$ only guarantees that $Q$ is orthogonal. To find a rotation, we simply compute $R = (\det Q)\,Q$.
2. To compute $T$: Given $E = U\Sigma V^\top$, $T$ is simply either $u_3$ or $-u_3$, where $u_3$ is the third column of $U$.

Implement this in the function estimate_initial_RT(). We provide the correct $R, T$, which should be among your four computed pairs of $R, T$. Submit the four pairs of $R, T$ and a copy of your code.
[5 points]
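A sketch of estimate_initial_RT() following the two steps above; the starter code may expect the four pairs stacked into a single array rather than returned as a list:

```python
def estimate_initial_RT(E):
    """Sketch: the four candidate (R, T) pairs from an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    pairs = []
    for Q in (U @ W @ Vt, U @ W.T @ Vt):
        R = np.linalg.det(Q) * Q          # flip the sign if Q is a reflection
        for T in (U[:, 2], -U[:, 2]):     # T is the third column of U, up to sign
            pairs.append((R, T))
    return pairs
```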

(b) In order to distinguish the correct R, T pair, we must first know how to find the 3D point
given matching correspondences in different images. The course notes explain in detail how
to compute a linear estimate (DLT) of this 3D point:

1. For each image $i$, we have $p_i = M_i P$, where $P$ is the 3D point, $p_i$ is the homogeneous image coordinate of that point, and $M_i$ is the projective camera matrix.
2. Formulate the matrix

$$A = \begin{bmatrix} p_{1,1} m_1^{3\top} - m_1^{1\top} \\ p_{1,2} m_1^{3\top} - m_1^{2\top} \\ \vdots \\ p_{n,1} m_n^{3\top} - m_n^{1\top} \\ p_{n,2} m_n^{3\top} - m_n^{2\top} \end{bmatrix}$$

where $p_{i,1}$ and $p_{i,2}$ are the x and y coordinates in image $i$ and $m_i^{k\top}$ is the $k$-th row of $M_i$.
3. The 3D point can be solved for by using the singular value decomposition.
Implement the linear estimate of this 3D point in linear_estimate_3d_point(). Like before, we print out a way to verify that your code is working. Submit the output generated by this part of the code and a copy of your code.
[5 points]
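A sketch of linear_estimate_3d_point() following the three steps above, assuming image_points is an (n, 2) array of xy coordinates and camera_matrices is (n, 3, 4):

```python
def linear_estimate_3d_point(image_points, camera_matrices):
    """Sketch: DLT triangulation of one 3D point from n views."""
    rows = []
    for (x, y), M in zip(image_points, camera_matrices):
        rows.append(x * M[2] - M[0])    # p_{i,1} m_i^3 - m_i^1
        rows.append(y * M[2] - M[1])    # p_{i,2} m_i^3 - m_i^2
    A = np.array(rows)
    # The homogeneous solution is the right singular vector with the
    # smallest singular value; dehomogenize to recover the 3D point.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1]
    return P[:3] / P[3]
```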
(c) We can do better than a linear estimate, but doing so usually requires iterative nonlinear optimization. This kind of optimization needs a residual; a simple one is the reprojection error of the correspondences, which is computed as follows. For each image $i$, given the camera matrix $M_i$ and the 3D point $P$, we compute $y = M_i P$ and find the image coordinates

$$p_i' = \frac{1}{y_3}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

Given the ground-truth image coordinates $p_i$, the reprojection error $e_i$ for image $i$ is

$$e_i = p_i' - p_i$$
The Jacobian is written as follows:

$$J = \begin{bmatrix} \dfrac{\partial e_1}{\partial P_1} & \dfrac{\partial e_1}{\partial P_2} & \dfrac{\partial e_1}{\partial P_3} \\ \vdots & \vdots & \vdots \\ \dfrac{\partial e_K}{\partial P_1} & \dfrac{\partial e_K}{\partial P_2} & \dfrac{\partial e_K}{\partial P_3} \end{bmatrix}$$

Recall that each $e_i$ is a vector of length two, so the whole Jacobian is a $2K \times 3$ matrix, where $K$ is the number of cameras. Fill in the methods reprojection_error() and jacobian(), which compute the reprojection error and Jacobian for a 3D point and its list of images. Like before, we print out a way to verify that your code is working. Submit the output generated by this part of the code and a copy of your code.
[5 points]
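A sketch of the two routines, with the Jacobian computed analytically from $y = M_i [P; 1]$ via the quotient rule (same shape assumptions as in the previous part):

```python
def reprojection_error(P, image_points, camera_matrices):
    """Sketch: stacked residuals e_i = p'_i - p_i over all K cameras."""
    errors = []
    for p, M in zip(image_points, camera_matrices):
        y = M @ np.append(P, 1.0)
        errors.append(y[:2] / y[2] - p)      # p'_i - p_i
    return np.concatenate(errors)            # length 2K

def jacobian(P, camera_matrices):
    """Sketch: analytic (2K x 3) Jacobian of the residuals w.r.t. P."""
    rows = []
    for M in camera_matrices:
        y = M @ np.append(P, 1.0)
        # d(y1/y3)/dP and d(y2/y3)/dP; M[k, :3] is dy_k/dP because the
        # fourth coordinate of the homogeneous point is fixed at 1.
        rows.append((M[0, :3] * y[2] - M[2, :3] * y[0]) / y[2] ** 2)
        rows.append((M[1, :3] * y[2] - M[2, :3] * y[1]) / y[2] ** 2)
    return np.array(rows)
```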
(d) Implement the Gauss-Newton algorithm, which finds an approximation to the 3D point that
minimizes this reprojection error. Recall that this algorithm needs a good initialization, which
we have from our linear estimate in part (b). Also recall that the Gauss-Newton algorithm
is not guaranteed to converge, so, in this implementation, you should update the estimate of
the point P̂ for 10 iterations (for this problem, you do not have to worry about convergence
criteria for early termination):
$$\hat{P} = \hat{P} - (J^\top J)^{-1} J^\top e$$
where $J$ and $e$ are the Jacobian and error computed in the previous part. Implement the Gauss-Newton algorithm to find an improved estimate of the 3D point in the nonlinear_estimate_3d_point() function. Like before, we print out a way to verify that your code is working. Submit the output generated by this part of the code and a copy of your code.
[5 points]
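A sketch of the fixed-iteration update, reusing the routines from parts (b) and (c):

```python
def nonlinear_estimate_3d_point(image_points, camera_matrices):
    """Sketch: 10 Gauss-Newton updates starting from the linear estimate."""
    P = linear_estimate_3d_point(image_points, camera_matrices)
    for _ in range(10):
        J = jacobian(P, camera_matrices)
        e = reprojection_error(P, image_points, camera_matrices)
        P = P - np.linalg.solve(J.T @ J, J.T @ e)
    return P
```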

(e) Now, finally, go back and distinguish the correct $R, T$ pair from part (a) by implementing the method estimate_RT_from_E(). You will do so as follows:

1. First, compute the location of the 3D point for each pair of correspondences given each $R, T$ pair.
2. Given each $R, T$, find each 3D point's location in that $R, T$ frame. The correct $R, T$ pair is the one for which the most 3D points have positive depth (z-coordinate) with respect to both camera frames. When testing depth for the second camera, we must transform our computed point (which is in the frame of the first camera) into the frame of the second camera; a sketch of this selection procedure follows below.
[5 points]
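A sketch of the selection procedure; the intrinsic matrix K and the (n, 2, 2) layout of image_points are assumptions about the starter code's interface:

```python
def estimate_RT_from_E(E, image_points, K):
    """Sketch: pick the (R, T) pair with the most points in front of both cameras."""
    M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    best_pair, best_count = None, -1
    for R, T in estimate_initial_RT(E):
        M2 = K @ np.hstack([R, T.reshape(3, 1)])
        count = 0
        for p1, p2 in image_points:
            # Triangulate in the frame of the first camera.
            P = linear_estimate_3d_point(np.array([p1, p2]), np.array([M1, M2]))
            # Positive depth in camera 1, and in camera 2 after transforming
            # the point into the second camera's frame.
            if P[2] > 0 and (R @ P + T)[2] > 0:
                count += 1
        if count > best_count:
            best_pair, best_count = (R, T), count
    return best_pair
```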

(f) Congratulations! You have implemented a significant portion of a structure from motion pipeline. Your code is able to compute the rotations and translations between different cameras, which gives the motion of the camera. Additionally, you have implemented a robust method to triangulate 3D points, which enables us to reconstruct the structure of the scene. To run the full structure from motion pipeline, change the variable run_pipeline at the top of the main function to True. Submit the final plot of the reconstructed statue. Hopefully, you can see a point cloud that looks like the frontal part of the statue in the above sequence of images.

Note: Since the class is using Python, the structure from motion framework we use is not the most efficient implementation; generating the final plot commonly takes a few minutes. Furthermore, Matplotlib was not built for efficient 3D rendering. Although it's nice to wiggle the point cloud to see the 3D structure, you may find that the GUI is laggy. If we used better options that incorporate OpenGL (see Glumpy), the visualization would be more responsive. However, for the sake of the class, we will only use the numpy-related libraries.
[10 points]
