
Dr. Daud Abdullah
Computer Vision, Week 4
Feb 2023
Agenda

2.1 Primitives and Transformations

2.2 Geometric Image Formation

2.3 Photometric Image Formation

2.4 Image Sensing Pipeline

2.1 Primitives and Transformations

- Geometric primitives are the basic building blocks used to describe 3D shapes
- In this unit, we introduce points, lines and planes
- Furthermore, the most basic transformations are discussed
- This unit covers the topics of the Szeliski book, chapter 2.1
- A more exhaustive introduction can be found in the book:
  Hartley and Zisserman: Multiple View Geometry in Computer Vision

2D Points
2D points can be written in inhomogeneous coordinates as

$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2$$

or in homogeneous coordinates as

$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^2$$

where $\mathbb{P}^2 = \mathbb{R}^3 \setminus \{(0,0,0)\}$ is called projective space.

Remark: Homogeneous vectors that differ only by scale are considered equivalent and define an equivalence class ⇒ homogeneous vectors are defined only up to scale.
2D Points

An inhomogeneous vector $\mathbf{x}$ is converted to a homogeneous vector $\tilde{\mathbf{x}}$ as follows:

$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \bar{\mathbf{x}}$$

with augmented vector $\bar{\mathbf{x}}$. To convert in the opposite direction we divide by $\tilde{w}$:

$$\bar{\mathbf{x}} = \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \frac{1}{\tilde{w}}\, \tilde{\mathbf{x}} = \frac{1}{\tilde{w}} \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix} = \begin{pmatrix} \tilde{x}/\tilde{w} \\ \tilde{y}/\tilde{w} \\ 1 \end{pmatrix}$$

Homogeneous points whose last element is $\tilde{w} = 0$ are called ideal points or points at infinity. These points can't be represented with inhomogeneous coordinates!
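A minimal NumPy sketch of these conversions (my own illustration, not part of the slides):

```python
import numpy as np

def to_homogeneous(x):
    """Append w=1 to an inhomogeneous point (augmented vector)."""
    return np.append(x, 1.0)

def to_inhomogeneous(x_tilde):
    """Divide by the last element; fails for ideal points (w=0)."""
    w = x_tilde[-1]
    assert w != 0, "ideal point (point at infinity) has no inhomogeneous form"
    return x_tilde[:-1] / w

x = np.array([2.0, 3.0])
x_bar = to_homogeneous(x)                           # [2. 3. 1.]
assert np.allclose(to_inhomogeneous(5 * x_bar), x)  # defined only up to scale
```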
2D Points

[Figure: relationship between the homogeneous vector, the augmented vector, and homogeneous vs. inhomogeneous coordinates]
2D Lines

2D lines can also be expressed using homogeneous coordinates $\tilde{\mathbf{l}} = (a, b, c)^\top$:

$$\{\bar{\mathbf{x}} \mid \tilde{\mathbf{l}}^\top \bar{\mathbf{x}} = 0\} \quad \Leftrightarrow \quad \{x, y \mid ax + by + c = 0\}$$

We can normalize $\tilde{\mathbf{l}}$ so that $\tilde{\mathbf{l}} = (n_x, n_y, d)^\top = (\mathbf{n}, d)^\top$ with $\|\mathbf{n}\|_2 = 1$. In this case, $\mathbf{n}$ is the normal vector perpendicular to the line and $d$ is its distance to the origin.

An exception is the line at infinity $\tilde{\mathbf{l}}_\infty = (0, 0, 1)^\top$, which passes through all ideal points.
Cross Product

The cross product can be expressed as the product of a skew-symmetric matrix and a vector:

$$\mathbf{a} \times \mathbf{b} = [\mathbf{a}]_\times \mathbf{b} = \begin{pmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}$$

Remark: In this course, we use square brackets to distinguish matrices from vectors.
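A standard application of this identity (from Hartley and Zisserman, not spelled out on the slide): in homogeneous coordinates, the line joining two points and the intersection point of two lines are both cross products. A minimal sketch:

```python
import numpy as np

def line_through(p1, p2):
    """Line l with l^T x = 0 through two homogeneous 2D points."""
    return np.cross(p1, p2)

def intersect(l1, l2):
    """Homogeneous intersection point of two lines (ideal point if parallel)."""
    return np.cross(l1, l2)

a = np.array([0.0, 0.0, 1.0])   # origin
b = np.array([1.0, 1.0, 1.0])
l = line_through(a, b)           # the diagonal line y = x
print(l)                         # [-1.  1.  0.] (up to scale)
```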
2D Conics

More complex algebraic objects can be represented using polynomial homogeneous equations. For example, conic sections (arising as the intersection of a plane and a 3D cone) can be written using quadric equations:

$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0\}$$

[Figure: conic sections: circle, ellipse, parabola, hyperbola]

Useful for multi-view geometry and camera calibration, see Hartley and Zisserman.
3D Points

3D points can be written in inhomogeneous coordinates as

$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3$$

or in homogeneous coordinates as

$$\tilde{\mathbf{x}} = \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{pmatrix} \in \mathbb{P}^3$$

with projective space $\mathbb{P}^3 = \mathbb{R}^4 \setminus \{(0,0,0,0)\}$.
3D Quadrics

The 3D analog of a 2D conic is a quadric surface:

$$\{\bar{\mathbf{x}} \mid \bar{\mathbf{x}}^\top \mathbf{Q}\, \bar{\mathbf{x}} = 0\}$$

Quadrics are useful in the study of multi-view geometry and also serve as useful modeling primitives (spheres, ellipsoids, cylinders), see Hartley and Zisserman, Chapter 2 for details.
Superquadrics Revisited

Superquadrics (a generalization of quadrics) can be used for shape abstraction and compression.

Paschalidou, Ulusoy and Geiger: Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids. CVPR, 2019.
2D Transformations

Translation: (2D translation of the input, 2 DoF)

$$\mathbf{x}' = \mathbf{x} + \mathbf{t} \quad \Leftrightarrow \quad \bar{\mathbf{x}}' = \begin{pmatrix} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix} \bar{\mathbf{x}}$$

- Using homogeneous representations allows us to chain/invert transformations
- Augmented vectors $\bar{\mathbf{x}}$ can always be replaced by general homogeneous ones $\tilde{\mathbf{x}}$
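As a small illustration of the chaining/inversion point above (my own sketch, not from the slides):

```python
import numpy as np

def translation(t):
    """3x3 homogeneous 2D translation matrix."""
    T = np.eye(3)
    T[:2, 2] = t
    return T

# Chaining and inverting become plain matrix operations:
T1 = translation([1.0, 2.0])
T2 = translation([3.0, -1.0])
x_bar = np.array([0.0, 0.0, 1.0])         # augmented point at the origin
print(T2 @ T1 @ x_bar)                     # [4. 1. 1.]
print(np.linalg.inv(T1) @ T1 @ x_bar)      # back to [0. 0. 1.]
```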
Application: Panorama Stitching

Brown and Lowe: Recognising Panoramas. ICCV, 2003.


2.2 Geometric Image Formation

Origins of the Pinhole Camera

- Animal eye: a long time ago
- Pinhole perspective projection: Brunelleschi, 15th century
- Photographic camera: Nicéphore Niépce, 1816
Origins of the Pinhole Camera

Camera obscura: 4th century BC
Origins of the Pinhole Camera

https://www.abelardomorell.net/camera-obscura
Origins of the Pinhole Camera

[Figure: physical camera model (image plane behind the focal point) vs. mathematical camera model (image plane in front of the focal point); both show light rays, the focal point and the camera coordinate system]

- In a physical pinhole camera the image is projected upside down onto the image plane, which is located behind the focal point
- When modeling perspective projection, we assume the image plane is in front
- Both models are equivalent, up to an appropriate change of image coordinates
Projection Models

[Figure: orthographic projection (parallel light rays hitting the image plane) vs. perspective projection (light rays passing through the focal point); example hardware: Opto Engineering telecentric lens, Canon 800mm telephoto lens, Nikon AF-S Nikkor 50mm, Sony DSC-RX100 V, Samsung Galaxy S20]

- These two are the most important projections, see Szeliski Ch. 2.1.4 for others
Projection Models

[Figure: transition from perspective to weak perspective/orthographic projection with increasing focal length / distance from camera]
Orthographic Projection

[Figure: orthographic projection setup with image plane, image coordinate system, camera coordinate system, camera center and parallel light rays]

Orthographic projection of a 3D point $\mathbf{x}_c \in \mathbb{R}^3$ to pixel coordinates $\mathbf{x}_s \in \mathbb{R}^2$:

- The x and y axes of the camera and image coordinate systems are shared
- Light rays are parallel to the z-axis of the camera coordinate system
- During projection, the z-coordinate is dropped; x and y remain the same
- Remark: the y coordinate is not shown here for clarity, but behaves similarly
Orthographic Projection

An orthographic projection simply drops the z component of the 3D point in camera coordinates $\mathbf{x}_c$ to obtain the corresponding 2D point on the image plane (= screen) $\mathbf{x}_s$:

$$\mathbf{x}_s = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \mathbf{x}_c \quad \Leftrightarrow \quad \bar{\mathbf{x}}_s = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \bar{\mathbf{x}}_c$$

Orthography is exact for telecentric lenses and an approximation for telephoto lenses. After projection, the distance of the 3D point from the image can't be recovered.
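A minimal sketch of the homogeneous form above (my own illustration):

```python
import numpy as np

# Orthographic projection as the 3x4 matrix from the slide.
P_ortho = np.array([[1., 0., 0., 0.],
                    [0., 1., 0., 0.],
                    [0., 0., 0., 1.]])

x_c = np.array([0.5, -0.2, 7.0, 1.0])    # augmented 3D point in camera coords
x_s = P_ortho @ x_c                       # [0.5 -0.2 1.0]: z is simply dropped
```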
Scaled Orthographic Projection

In practice, world coordinates (which may measure dimensions in meters) must be scaled to fit onto an image sensor (measuring in pixels) ⇒ scaled orthography:

$$\mathbf{x}_s = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \end{pmatrix} \mathbf{x}_c \quad \Leftrightarrow \quad \bar{\mathbf{x}}_s = \begin{pmatrix} s & 0 & 0 & 0 \\ 0 & s & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \bar{\mathbf{x}}_c$$

Remark: The unit of $s$ is px/m or px/mm, converting metric 3D points into pixels.

Under orthography, structure and motion can be estimated simultaneously using factorization methods (e.g., via singular value decomposition).
Perspective Projection

[Figure: perspective projection setup with image plane, image coordinate system, camera coordinate system, camera center, principal axis and focal length]

Perspective projection of a 3D point $\mathbf{x}_c \in \mathbb{R}^3$ to pixel coordinates $\mathbf{x}_s \in \mathbb{R}^2$:

- The light ray passes through the camera center, the pixel $\mathbf{x}_s$ and the point $\mathbf{x}_c$
- Convention: the principal axis (orthogonal to the image plane) aligns with the z-axis
- Remark: the y coordinate is not shown here for clarity, but behaves similarly
Perspective Projection

In perspective projection, 3D points in camera coordinates are mapped to the image plane by dividing them by their z component and multiplying by the focal length:

$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f\, x_c / z_c \\ f\, y_c / z_c \end{pmatrix} \quad \Leftrightarrow \quad \tilde{\mathbf{x}}_s = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bar{\mathbf{x}}_c$$

Note that this projection is linear when using homogeneous coordinates. After the projection it is not possible to recover the distance of the 3D point from the image.

Remark: The unit of $f$ is px (pixels), converting metric 3D points into pixels.
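A small sketch of this mapping (my own illustration; the focal length value is made up):

```python
import numpy as np

f = 500.0                                  # assumed focal length in pixels
P_persp = np.array([[f, 0., 0., 0.],
                    [0., f, 0., 0.],
                    [0., 0., 1., 0.]])

x_c = np.array([0.2, 0.1, 2.0, 1.0])       # augmented 3D point, z_c = 2 m
x_tilde = P_persp @ x_c                    # homogeneous image point
x_s = x_tilde[:2] / x_tilde[2]             # divide by z_c: [50. 25.] px
```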
Perspective Projection

[Figure: perspective projection without principal point offset (image coordinate system centered on the principal point) vs. with principal point offset (image coordinate system at the corner of the image plane); both show the focal point, light rays, principal axis and principal point]

- To ensure positive pixel coordinates, a principal point offset $\mathbf{c}$ is usually added
- This moves the image coordinate system to the corner of the image plane
Perspective Projection

The complete perspective projection model is given by:

$$\begin{pmatrix} x_s \\ y_s \end{pmatrix} = \begin{pmatrix} f_x\, x_c / z_c + s\, y_c / z_c + c_x \\ f_y\, y_c / z_c + c_y \end{pmatrix} \quad \Leftrightarrow \quad \tilde{\mathbf{x}}_s = \begin{pmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \bar{\mathbf{x}}_c$$

- The left 3 × 3 submatrix of the projection matrix is called calibration matrix $\mathbf{K}$
- The parameters of $\mathbf{K}$ are called camera intrinsics (as opposed to the extrinsic pose)
- Here, $f_x$ and $f_y$ are independent, allowing for different pixel aspect ratios
- The skew $s$ arises when the sensor is not mounted perpendicular to the optical axis
- In practice, we often set $f_x = f_y$ and $s = 0$, but model $\mathbf{c} = (c_x, c_y)^\top$
Chaining Transformations

[Figure: a point in world coordinates is transformed into camera coordinates and then projected onto the image plane (image coordinate system)]

Let $\mathbf{K}$ be the calibration matrix (intrinsics) and $[\mathbf{R}|\mathbf{t}]$ the camera pose (extrinsics). We chain both transformations to project a point in world coordinates to the image:

$$\tilde{\mathbf{x}}_s = \begin{pmatrix} \mathbf{K} & \mathbf{0} \end{pmatrix} \bar{\mathbf{x}}_c = \begin{pmatrix} \mathbf{K} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix} \bar{\mathbf{x}}_w = \mathbf{K} \begin{pmatrix} \mathbf{R} & \mathbf{t} \end{pmatrix} \bar{\mathbf{x}}_w = \mathbf{P}\, \bar{\mathbf{x}}_w$$

Remark: The 3 × 4 projection matrix $\mathbf{P}$ can be pre-computed.
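As a sanity check of the full chain (my own sketch; all intrinsics and extrinsics here are hypothetical values):

```python
import numpy as np

fx = fy = 500.0; cx, cy = 320.0, 240.0; s = 0.0     # assumed intrinsics
K = np.array([[fx, s,  cx],
              [0., fy, cy],
              [0., 0., 1.]])
R = np.eye(3)                                        # assumed pose: identity rotation
t = np.array([0., 0., 4.0])                          # camera 4 m from the origin
P = K @ np.hstack([R, t[:, None]])                   # precomputed 3x4 projection matrix

x_w = np.array([0.5, 0.25, 0.0, 1.0])                # world point (augmented)
x_tilde = P @ x_w
x_s = x_tilde[:2] / x_tilde[2]                       # pixel coordinates
print(x_s)                                           # [382.5  271.25]
```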


Lens Distortion

The assumption of linear projection (straight lines remain straight) is violated in practice due to the properties of the camera lens, which introduces distortions. Both radial and tangential distortion effects can be modeled relatively easily.

Let $x = x_c/z_c$, $y = y_c/z_c$ and $r^2 = x^2 + y^2$. The distorted point is obtained as:

$$\mathbf{x}' = \underbrace{(1 + \kappa_1 r^2 + \kappa_2 r^4) \begin{pmatrix} x \\ y \end{pmatrix}}_{\text{Radial Distortion}} + \underbrace{\begin{pmatrix} 2 \kappa_3\, x y + \kappa_4 (r^2 + 2 x^2) \\ 2 \kappa_4\, x y + \kappa_3 (r^2 + 2 y^2) \end{pmatrix}}_{\text{Tangential Distortion}}$$

$$\mathbf{x}_s = \begin{pmatrix} f_x\, x' + c_x \\ f_y\, y' + c_y \end{pmatrix}$$

Images can be undistorted such that the perspective projection model applies. More complex distortion models must be used for wide-angle lenses (e.g., fisheye).
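A direct transcription of the distortion model above (my own sketch; the coefficient values are made up for illustration):

```python
import numpy as np

def distort(x, y, k1, k2, k3, k4):
    """Apply radial + tangential distortion to normalized coordinates."""
    r2 = x**2 + y**2
    radial = 1 + k1 * r2 + k2 * r2**2                # radial scaling factor
    dx = 2 * k3 * x * y + k4 * (r2 + 2 * x**2)       # tangential terms
    dy = 2 * k4 * x * y + k3 * (r2 + 2 * y**2)
    return radial * x + dx, radial * y + dy

x, y = 0.3, -0.2                                      # normalized coords x_c/z_c, y_c/z_c
xd, yd = distort(x, y, k1=-0.1, k2=0.01, k3=1e-3, k4=1e-3)
# then: x_s = (fx * xd + cx, fy * yd + cy)
```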
2.3 Photometric Image Formation

[Figure: light from a light source is reflected at a surface (with normal n̂) and travels through the camera optics onto the sensor plane]

- So far we have discussed how individual light rays travel through space
- We now discuss how an image is formed in terms of pixel intensities and colors
- Light is emitted by one or more light sources and reflected or refracted (once or multiple times) at surfaces of objects (or media) in the scene
Rendering Equation

Let $\mathbf{p} \in \mathbb{R}^3$ denote a 3D surface point, $\mathbf{v} \in \mathbb{R}^3$ the viewing direction and $\mathbf{s} \in \mathbb{R}^3$ the incoming light direction. The rendering equation describes how much of the light $L_{in}$ with wavelength $\lambda$ arriving at $\mathbf{p}$ is reflected into the viewing direction $\mathbf{v}$:

$$L_{out}(\mathbf{p}, \mathbf{v}, \lambda) = L_{emit}(\mathbf{p}, \mathbf{v}, \lambda) + \int_\Omega \text{BRDF}(\mathbf{p}, \mathbf{s}, \mathbf{v}, \lambda) \cdot L_{in}(\mathbf{p}, \mathbf{s}, \lambda) \cdot (-\mathbf{n}^\top \mathbf{s})\, d\mathbf{s}$$

- $\Omega$ is the unit hemisphere at normal $\mathbf{n}$
- The bidirectional reflectance distribution function $\text{BRDF}(\mathbf{p}, \mathbf{s}, \mathbf{v}, \lambda)$ defines how light is reflected at an opaque surface
- $L_{emit} > 0$ only for light emitting surfaces
Diffuse and Specular Reflection

[Figure: diffuse, specular, mirror and combined reflection of an incoming light ray]

- Typical BRDFs have a diffuse and a specular component
- The diffuse (= constant) component scatters light uniformly in all directions
- This leads to shading, i.e., smooth variation of intensity w.r.t. the surface normal
- The specular component depends strongly on the outgoing light direction
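To make the two components concrete, here is a small shading sketch of my own (not from the slides). It uses a Phong-style specular term as one common instance of a view-dependent component; all coefficients are made up:

```python
import numpy as np

def shade(n, s, v, kd=0.8, ks=0.4, shininess=32):
    """n: surface normal, s: direction to light, v: direction to viewer (unit vectors)."""
    diffuse = kd * max(np.dot(n, s), 0.0)                 # uniform over view directions
    r = 2 * np.dot(n, s) * n - s                          # mirror reflection of s about n
    specular = ks * max(np.dot(r, v), 0.0) ** shininess   # strongly view-dependent
    return diffuse + specular

n = np.array([0., 0., 1.])
s = np.array([0., 0.6, 0.8])
v = np.array([0., -0.6, 0.8])
print(shade(n, s, v))             # bright: v lies on the mirror direction of s
```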
BRDF Examples

- BRDFs can be very complex and spatially varying

Slide Credits: Svetlana Lazebnik
Fresnel Effect

- The amount of light reflected from a surface depends on the viewing angle

Slide Credits: Filament Documentation


Global Illumination

[Figure: rendering with direct lighting vs. rendering with global illumination]

- Modeling one light bounce is insufficient for rendering complex scenes
- Light sources can be shadowed by occluders and rays can bounce multiple times
- Global illumination techniques also take indirect illumination into account
Why Camera Lenses?

- Large and very small pinholes result in image blur (averaging, diffraction)
- Small pinholes require very long shutter times (⇒ motion blur)
- http://www.pauldebevec.com/Pinhole/
Why Camera Lenses?

[Figure: a small pinhole produces a sharp projection onto the image plane, while a large pinhole averages many light rays and blurs the image]
Optics

[Figure: pinhole camera model vs. camera with lens; the lens focuses bundles of light rays onto the image plane]

- Cameras use one or multiple lenses to accumulate light on the sensor plane
- Importantly, if a 3D point is in focus, all light rays arrive at the same 2D pixel
- For many applications it suffices to model lens cameras with a pinhole model
- However, to address focus, vignetting and aberration we need to model lenses
Thin Lens Model

[Figure: thin lens with image plane and the focal points of the lens]

$$\frac{x_s}{x_c} = \frac{z_s - f}{f} \;\wedge\; \frac{x_s}{x_c} = \frac{z_s}{z_c} \;\Rightarrow\; \frac{z_s - f}{f} = \frac{z_s}{z_c} \;\Rightarrow\; \frac{z_s}{f} - 1 = \frac{z_s}{z_c} \;\Rightarrow\; \frac{1}{z_s} + \frac{1}{z_c} = \frac{1}{f}$$

- The thin lens model with a spherical lens is often used as an approximation
- Properties: axis-parallel rays pass through the focal point; rays through the center keep their direction
- From Snell's law we obtain $f = \frac{R}{2(n-1)}$ with lens radius $R$ and index of refraction $n$
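A quick numeric check of the thin lens equation (made-up values for a 50 mm lens):

```python
f = 0.05                      # focal length: 50 mm
z_c = 2.0                     # object 2 m in front of the lens
z_s = 1.0 / (1.0 / f - 1.0 / z_c)
print(z_s)                    # 0.05128...: image forms slightly behind f
# As z_c -> infinity, z_s -> f (the lens behaves like a pinhole at distance f).
```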
Depth of Field (DOF)

- The image is in focus if $\frac{1}{z_s} + \frac{1}{z_c} = \frac{1}{f}$, where $f$ is the focal length of the lens
- For $z_c \to \infty$ we obtain $z_s = f$ (a lens with focal length $f$ ≈ a pinhole at distance $f$)
- If the image plane is out of focus, a 3D point projects to the circle of confusion $c$
Depth of Field (DOF)

- To control the size of the circle of confusion, we change the lens aperture
- An aperture is a hole or an opening through which light travels
- The aperture limits the amount of light that can reach the image plane
- Smaller apertures lead to sharper, but noisier images (fewer photons)
Depth of Field (DOF)

- The allowable depth variation that limits the circle of confusion $c$ is called depth of field and is a function of both the focus distance and the lens aperture
- Typical DSLR lenses have depth of field indicators
- The commonly displayed f-number is defined as $N = \frac{f}{d}$ (often denoted as $f/N$, e.g., $f/1.4$)
- In other words, it is the lens focal length $f$ divided by the aperture diameter $d$
Depth of Field (DOF)

- Aperture f/1.4: DOF = 0.8 cm
- Aperture f/4.0: DOF = 2.2 cm
- Aperture f/22: DOF = 12.4 cm

Depth of Field (DOF):

- Distance between the nearest and farthest objects that are acceptably sharp
- Decreasing the aperture diameter (increasing the f-number) increases the DOF
Chromatic Aberration

- The index of refraction of glass varies slightly as a function of wavelength
- Thus, simple lenses suffer from chromatic aberration, the tendency of light of different colors to focus at slightly different distances (blur, color shift)
- To reduce chromatic and other kinds of aberrations, most photographic lenses are compound lenses made of different glass elements (with different coatings)
Chromatic Aberration

- Top: high-quality lens. Bottom: low-quality lens (blur, rainbow edges)
Vignetting

[Figure: vignetting geometry with lens and image plane]

- Vignetting is the tendency of the brightness to fall off towards the image edge
- It is a composition of two effects: natural and mechanical vignetting
- Natural vignetting: foreshortening of the object surface and lens aperture
- Mechanical vignetting: the shaded part of the beam never reaches the image
- Vignetting can be calibrated (i.e., undone)
2.4 Image Sensing Pipeline

The image sensing pipeline can be divided into three stages:

- Physical light transport in the camera lens/body
- Photon measurement and conversion on the sensor chip
- Image signal processing (ISP) and image compression
Shutter

- A focal plane shutter is positioned just in front of the image sensor / film
- Most digital cameras use a combination of mechanical and electronic shutters
- The shutter speed (exposure time) controls how much light reaches the sensor
- It determines whether an image appears over-/underexposed, blurred or noisy
Sensor

- CCDs move charge from pixel to pixel and convert it to voltage at the output node
- CMOS sensors convert charge to voltage inside each pixel and are the standard today
- Larger chips (full frame = 35 mm) are more photo-sensitive ⇒ less noise

https://meroli.web.cern.ch/lecture_cmos_vs_ccd_pixel_sensor.html
Color Filter Arrays

Bayer RGB pattern:        Interpolated pixels:

G R G R                   rGb Rgb rGb Rgb
B G B G                   rgB rGb rgB rGb
G R G R                   rGb Rgb rGb Rgb
B G B G                   rgB rGb rgB rGb

(uppercase: channel measured at that pixel, lowercase: channel interpolated from neighbors)

- To measure color, pixels are arranged in a color filter array, e.g. the Bayer RGB pattern
- Missing colors at each pixel are interpolated from the neighbors (demosaicing)
Color Filter Arrays

Slide Credits: Steve Seitz


Color Filter Arrays

- Each pixel integrates the light spectrum $L$ according to its spectral sensitivity $S$:

$$R = \int L(\lambda)\, S_R(\lambda)\, d\lambda$$

- The spectral response curves are provided by the camera manufacturer
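A numerical sketch of this integral (my own illustration; the spectrum and sensitivity curve are made-up Gaussian shapes, not real camera data):

```python
import numpy as np

lam = np.linspace(400, 700, 301)                 # wavelength in nm
L = np.exp(-((lam - 550) / 80) ** 2)             # assumed incoming light spectrum
S_R = np.exp(-((lam - 600) / 40) ** 2)           # assumed red-channel sensitivity
R = np.trapz(L * S_R, lam)                       # R = integral of L(lambda) * S_R(lambda)
print(R)
```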
Color Spaces

- Various different color spaces have been developed and are used in practice
Gamma Compression

[Figure: gamma compression $Y' = Y^{1/\gamma}$ applied before quantization, and gamma expansion $Y = Y'^{\gamma}$ applied during loading; compression keeps quantization noise from becoming visible in dark regions]

- Humans are more sensitive to intensity differences in darker regions
- Therefore, it is beneficial to nonlinearly transform the intensities or colors prior to discretization and to undo this transformation during loading
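A round-trip sketch of this scheme (my own illustration; $\gamma$ = 2.2 is a common choice, and the 8-bit quantization step is assumed):

```python
import numpy as np

gamma = 2.2
Y = np.linspace(0, 1, 5)                 # linear intensities
Y_prime = Y ** (1 / gamma)               # compress before discretization
q = np.round(Y_prime * 255) / 255        # 8-bit quantization
Y_rec = q ** gamma                       # undo the transformation on loading
print(np.abs(Y_rec - Y).max())           # round-trip error from quantization stays small
```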
Image Compression

- Typically, luminance is compressed with higher fidelity than chrominance
- Often, (8 × 8 pixel) patch-based discrete cosine or wavelet transforms are used
- The Discrete Cosine Transform (DCT) is an approximation to PCA on natural images
- The coefficients are quantized to integers that can be stored with Huffman codes
- More recently, deep network based compression algorithms have been developed
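A JPEG-style sketch of one 8 × 8 patch (my own illustration, using SciPy; the quantization step is made up, and real codecs use per-frequency quantization tables):

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
patch = rng.random((8, 8))                       # stand-in for an image patch
coeffs = dctn(patch, norm='ortho')               # 8x8 DCT coefficients
Q = 0.1                                          # assumed uniform quantization step
quantized = np.round(coeffs / Q).astype(int)     # integers, ready for Huffman coding
reconstructed = idctn(quantized * Q, norm='ortho')
print(np.abs(reconstructed - patch).max())       # small reconstruction error
```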
Questions?

Acknowledgement

- Various contents in this presentation have been taken from different books, lecture notes, and the web. These solely belong to their owners and are used here only for clarifying various educational concepts. Any copyright infringement is not intended.