RMK Group 21cs905 CV Unit 1


Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purposes of RMK Group of Educational Institutions. If you have received this document through email in error, please notify the system manager. This document contains proprietary information and is intended only for the respective group / learning community. If you are not the addressee, you should not disseminate, distribute, or copy it through e-mail. Please notify the sender immediately by e-mail if you have received this document by mistake and delete it from your system. If you are not the intended recipient, you are notified that disclosing, copying, distributing, or taking any action in reliance on the contents of this information is strictly prohibited.

21CS905
COMPUTER
VISION

Department:
ARTIFICIAL INTELLIGENCE AND DATA
SCIENCE
Batch/Year: BATCH 2021-25/IV
Created by:
Dr. Seethalakshmi V, Associate Professor, ADS, RMKCET

Date: 18-06-2024

Table of Contents
Sl. No.  Contents                                                        Page No.
1        Contents                                                        5
2        Course Objectives                                               6
3        Pre Requisites (Course Name with Code)                          8
4        Syllabus (With Subject Code, Name, LTPC details)                10
5        Course Outcomes (6)                                             12
6        CO-PO/PSO Mapping                                               14
7        Lecture Plan (S.No., Topic, No. of Periods, Proposed Date,      18
         Actual Lecture Date, pertaining CO, Taxonomy Level, Mode
         of Delivery)
8        Activity based learning                                        20
9        Lecture Notes (with links to videos, e-book references,        21
         PPTs, quizzes and other learning materials)
10       Assignments (for higher-level learning and evaluation;         50
         examples: case study, comprehensive design, etc.)
11       Part A Q & A (with K level and CO)                             55
12       Part B Qs (with K level and CO)                                62
13       Supportive online certification courses (NPTEL, Swayam,        65
         Coursera, Udemy, etc.)
14       Real-time applications in day-to-day life and in industry      67
15       Contents beyond the syllabus (COE-related value-added          69
         courses)
16       Assessment Schedule (Proposed Date & Actual Date)              71
17       Prescribed Text Books & Reference Books                        73
18       Mini Project                                                   75

COURSE OBJECTIVES

 To understand the fundamental concepts related to image formation and processing.
 To learn feature detection, matching and segmentation.
 To become familiar with feature-based alignment and motion estimation.
 To develop skills in 3D reconstruction.
 To understand image-based rendering and recognition.

PRE REQUISITES

 21CS905 COMPUTER VISION

Syllabus
21CS905 COMPUTER VISION                                          L T P C: 3 0 0 3
OBJECTIVES:
 To understand the fundamental concepts related to Image formation and processing.
 To learn feature detection, matching and segmentation
 To become familiar with feature based alignment and motion estimation
 To develop skills on 3D reconstruction
 To understand image based rendering and recognition
UNIT I INTRODUCTION TO IMAGE FORMATION AND PROCESSING 15
Computer Vision - Geometric primitives and transformations - Photometric image formation -
The digital camera - Point operators - Linear filtering - More neighborhood operators - Fourier
transforms - Pyramids and wavelets - Geometric transformations - Global optimization.

UNIT II FEATURE DETECTION, MATCHING AND SEGMENTATION 15
Points and patches - Edges - Lines - Segmentation - Active contours - Split and merge - Mean
shift and mode finding - Normalized cuts - Graph cuts and energy-based methods

UNIT III FEATURE-BASED ALIGNMENT & MOTION ESTIMATION 15
2D and 3D feature-based alignment - Pose estimation - Geometric intrinsic calibration -
Triangulation - Two-frame structure from motion - Factorization - Bundle adjustment -
Constrained structure and motion - Translational alignment - Parametric motion - Spline-based
motion - Optical flow - Layered motion.
UNIT IV 3D RECONSTRUCTION 15
Shape from X - Active rangefinding - Surface representations - Point-based representations -
Volumetric representations - Model-based reconstruction - Recovering texture maps and
albedos.
UNIT V IMAGE-BASED RENDERING AND RECOGNITION 15
View interpolation - Layered depth images - Light fields and Lumigraphs - Environment mattes -
Video-based rendering - Object detection - Face recognition - Instance recognition - Category
recognition - Context and scene understanding - Recognition databases and test sets
TOTAL: 75 PERIODS
OUTCOMES:
At the end of this course, the students will be able to:
CO1: To understand basic knowledge, theories and methods in image processing and computer
vision.
CO2: To implement basic and some advanced image processing techniques in OpenCV.
CO3: To apply 2D feature-based image alignment, segmentation and motion estimation.
CO4: To apply 3D image reconstruction techniques.
CO5: To design and develop innovative image processing and computer vision applications.
TEXT BOOKS:
1. Richard Szeliski, “Computer Vision: Algorithms and Applications”, Springer Texts in
Computer Science, Second Edition, 2022.
2. D. A. Forsyth and J. Ponce, “Computer Vision: A Modern Approach”, Pearson Education,
Second Edition, 2015.
REFERENCES:
1. Richard Hartley and Andrew Zisserman, “Multiple View Geometry in Computer Vision”,
Second Edition, Cambridge University Press, March 2004.
2. Christopher M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
3. E. R. Davies, “Computer and Machine Vision”, Fourth Edition, Academic Press, 2012.
Course Outcomes

CO    Description                                                       Knowledge Level
CO1   To understand basic knowledge, theories and methods in image      K2
      processing and computer vision.
CO2   To implement basic and some advanced image processing             K3
      techniques in OpenCV.
CO3   To apply 2D feature-based image alignment, segmentation and       K3
      motion estimation.
CO4   To apply 3D image reconstruction techniques.                      K4
CO5   To design and develop innovative image processing and computer    K5
      vision applications.

Knowledge Level   Description
K6                Evaluation
K5                Synthesis
K4                Analysis
K3                Application
K2                Comprehension
K1                Knowledge

CO – PO /PSO Mapping
Matrix
CO    PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1   3   2   1   1   3
CO2   3   3   2   2   3
CO3   3   3   1   1   3
CO4   3   3   1   1   3
CO5   3   3   1   1   3
CO6   2   2   1   1   3

UNIT – I  INTRODUCTION TO IMAGE FORMATION AND PROCESSING
Lecture Plan – Unit 1 – INTRODUCTION TO IMAGE FORMATION AND PROCESSING

Sl.  Topic                                      No. of   Proposed    Actual   CO   Taxonomy  Mode of
No.                                             Periods  Date        Date          Level     Delivery
1    Computer Vision                            1        19-07-2024  -        CO1  K3        Blackboard / ICT Tools
2    Geometric primitives and transformations   1        20-07-2024  -        CO1  K3        Blackboard / ICT Tools
3    Photometric image formation                1        22-07-2024  -        CO1  K4        Blackboard / ICT Tools
4    The digital camera - Point operators       2        23-07-2024  -        CO1  K4        Blackboard / ICT Tools
5    Linear filtering - More neighborhood       2        24-07-2024  -        CO1  K4        Blackboard / ICT Tools
     operators
6    Fourier transforms                         1        25-07-2024  -        CO1  K3        Blackboard / ICT Tools
7    Pyramids and wavelets                      1        26-07-2024  -        CO1  K4        Blackboard / ICT Tools
8    Geometric transformations                  1        27-07-2024  -        CO1  K4        Blackboard / ICT Tools
9    Global optimization                        2        30-07-2024  -        CO1  K4        Blackboard / ICT Tools

Activity Based Learning
Sl. No.  Contents                                    Page No.
1        Geometric Primitives and Transformations    47

Lecture Notes – Unit 1
UNIT-1 INTRODUCTION TO IMAGE FORMATION AND PROCESSING

Sl. No.  Contents                                   Page No.
1        Computer Vision                            23
2        Geometric primitives and transformations   24
3        Photometric image formation                26
4        The digital camera                         27
5        Point operators                            29
6        Linear filtering                           32
7        More neighborhood operators                35
8        Fourier transforms                         37
9        Pyramids and wavelets                      39
10       Geometric transformations                  41
11       Global optimization                        45

UNIT-1 INTRODUCTION TO IMAGE FORMATION AND PROCESSING
Computer Vision - Geometric primitives and transformations - Photometric image formation -
The digital camera - Point operators - Linear filtering - More neighborhood operators –
Fourier transforms - Pyramids and wavelets - Geometric transformations - Global
optimization.

1.1 Computer Vision.


Computer vision is a multidisciplinary field that enables machines to interpret and make
decisions based on visual data. It involves the development of algorithms and systems that
allow computers to gain high-level understanding from digital images or videos. The goal of
computer vision is to replicate and improve upon human vision capabilities, enabling
machines to recognize and understand visual information.
Key tasks in computer vision include:
1. Image Recognition: Identifying objects, people, or patterns within images.
2. Object Detection: Locating and classifying multiple objects within an image or video
stream.
3. Image Segmentation: Dividing an image into meaningful segments or regions, often to
identify boundaries and structures.
4. Face Recognition: Identifying and verifying individuals based on facial
features.
5. Gesture Recognition: Understanding and interpreting human gestures from images
or video.
6. Scene Understanding: Analyzing and comprehending the content and context of a
scene.
7. Motion Analysis: Detecting and tracking movements within video sequences.
8. 3D Reconstruction: Creating three-dimensional models of objects or scenes from
two-dimensional images.

Computer vision applications are diverse and found in various fields, including
healthcare (medical image analysis), autonomous vehicles, surveillance,
augmented reality, robotics, industrial automation, and more. Advances in deep
learning, especially convolutional neural networks (CNNs), have significantly
contributed to the progress and success of computer vision tasks by enabling
efficient feature learning from large datasets.

1.2 Geometric primitives and transformations:


Geometric primitives and transformations are fundamental concepts
in computer graphics and computer vision. They form the basis for
representing and manipulating visual elements in both 2D and 3D
spaces. Let's explore each of these concepts:
Geometric Primitives:
1. Points: Represented by coordinates (x, y) in 2D or (x, y, z) in 3D
space.
2. Lines and Line Segments: Defined by two points or a point and a
direction vector
3. Polygons: Closed shapes with straight sides. Triangles,
quadrilaterals, and other polygons are common geometric
primitives.
4. Circles and Ellipses: Defined by a center point and radii (or axes
in the case of ellipses).
5. Curves: Bézier curves, spline curves, and other parametric curves
are used to represent smooth shapes.
Geometric Transformations:
Geometric transformations involve modifying the position, orientation,
and scale of geometric primitives. Common transformations include

1. Translation: Moves an object by a certain distance along a
specified direction.
2. Rotation: Rotates an object around a specified point or axis.
3. Scaling: Changes the size of an object along different axes.
4. Shearing: Distorts the shape of an object by stretching or
compressing along one or more axes.
5. Reflection: Mirrors an object across a specified plane.
6. Affine Transformations: Combine translation, rotation, scaling, and
shearing.
7. Projective Transformations: Used for perspective transformations
in 3D graphics
Applications:
• Computer Graphics: Geometric primitives and
transformations are fundamental for rendering 2D and
3Dgraphics in applications such as video games, simulations, and
virtual reality.
• Computer-Aided Design (CAD): Used for designing and modeling
objects in engineering and architecture.
• Computer Vision: Geometric transformations are applied to align
and process images, correct distortions, and perform other tasks
in image analysis.
• Robotics: Essential for robot navigation, motion planning, and
spatial reasoning.
• Understanding geometric primitives and transformations is crucial
for creating realistic and visually appealing computer-generated
images, as well as for solving various problems in computer vision
and robotics.

1.3. Photometric image formation:
Photometric image formation refers to the process by which light
interacts with surfaces and is captured by a camera, resulting in the
creation of a digital image. This process involves various factors
related to the properties of light, the surfaces of objects, and the
characteristics of the imaging system. Understanding photometric image
formation is crucial in computer vision, computer graphics, and image
processing.

Here are some key concepts involved:


Illumination:
- Ambient Light: The overall illumination of a scene that comes from all
directions.
- Directional Light: Light coming from a specific direction, which can create
highlights and shadows.
Reflection:
- Diffuse Reflection: Light that is scattered in various directions by rough
surfaces.
- Specular Reflection: Light that reflects off smooth surfaces in a concentrated
direction, creating highlights.
Shading:
- Lambertian Shading: A model that assumes diffuse reflection and constant
shading across a surface.
- Phong Shading: A more sophisticated model that considers specular reflection,
creating more realistic highlights.
Surface Properties:
- Reflectance Properties: Material characteristics that determine how light is
reflected (e.g., diffuse and specular reflectance).
- Albedo: The inherent reflectivity of a surface, representing the fraction of
incident light that is reflected.
Lighting Models:
- Phong Lighting Model: Combines diffuse and specular reflection components to
model lighting.
- Blinn-Phong Model: Similar to the Phong model but computationally more
efficient.
Shadows:
- Cast Shadows: Darkened areas on surfaces where light is blocked by other
objects.
- Self Shadows: Shadows cast by parts of an object onto itself.
Color and Intensity:
- Color Reflection Models: Incorporate the color properties of surfaces in
addition to reflectance.
- Intensity: The brightness of light or color in an image.
Cameras:
- Camera Exposure: The amount of light allowed to reach the camera sensor or
film.
- Camera Response Function: Describes how a camera responds to light of
different intensities.
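The Lambertian shading model above lends itself to a short numerical sketch. The snippet below is illustrative only (it assumes NumPy is available; `lambertian_intensity` is a name chosen here, not a standard API):

```python
import numpy as np

def lambertian_intensity(albedo, normal, light_dir):
    """Lambertian (diffuse) shading: I = albedo * max(0, n . l).

    albedo    -- surface reflectivity in [0, 1]
    normal    -- unit surface normal (3-vector)
    light_dir -- unit vector pointing toward the light source
    """
    n_dot_l = float(np.dot(np.asarray(normal, float), np.asarray(light_dir, float)))
    return albedo * max(0.0, n_dot_l)  # clamp: surfaces facing away receive no light

# Surface facing the light head-on: full diffuse intensity.
print(lambertian_intensity(0.8, [0, 0, 1], [0, 0, 1]))          # 0.8
# Light 60 degrees off the normal: intensity falls by cos(60 deg) = 0.5.
print(round(lambertian_intensity(0.8, [0, 0, 1],
                                 [0, np.sin(np.pi / 3), np.cos(np.pi / 3)]), 3))  # 0.4
```

Note that the intensity depends only on the angle between the normal and the light direction, which is exactly why diffuse surfaces look equally bright from all viewpoints.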
1.4 The Digital Camera:
A digital camera is an electronic device that captures and stores digital
images. It differs from traditional film cameras in that it uses electronic
sensors to record images rather than photographic film. Digital cameras have
become widespread due to their convenience, ability to instantly review
images, and ease of sharing and storing photos digitally. Here are key
components and concepts related to digital cameras:

Image Sensor:
- Digital cameras use image sensors (such as CCD or CMOS) to convert light into
electrical signals.
- The sensor captures the image by measuring the intensity of light at each pixel
location.
Lens:
- The lens focuses light onto the image sensor.
- Zoom lenses allow users to adjust the focal length, providing optical zoom.
Aperture:
- The aperture is an adjustable opening in the lens that controls the amount of
light entering the camera.
- It affects the depth of field and exposure.
Shutter:
- The shutter mechanism controls the duration of light exposure to the image
sensor.
- Fast shutter speeds freeze motion, while slower speeds create motion blur.
Viewfinder and LCD Screen:
- Digital cameras typically have an optical or electronic viewfinder for composing
shots.
- LCD screens on the camera back allow users to view and frame images.
Image Processor:
- Digital cameras include a built-in image processor to convert raw sensor data
into a viewable image.
- Image processing algorithms may enhance color and sharpness and reduce noise.
Memory Card:
- Digital images are stored on removable memory cards, such as SD or CF cards.
- Memory cards provide a convenient and portable way to store and transfer
images.
Autofocus and Exposure Systems:
- Autofocus systems automatically adjust the lens to ensure a sharp image.
- Exposure systems determine the optimal combination of aperture, shutter speed,
and ISO sensitivity for proper exposure.
White Balance:
- White balance settings adjust the color temperature of the captured image to
match different lighting conditions.
Modes and Settings:
- Digital cameras offer various shooting modes (e.g., automatic, manual, portrait,
landscape) and settings to control image parameters.
Connectivity:
- USB, HDMI, or wireless connectivity allows users to transfer images to
computers, share online, or connect to other devices.
Battery:
- Digital cameras are powered by rechargeable batteries, providing the necessary
energy for capturing and processing images.

1.5 Point operators:
Point operators, also known as point processing or pixel-wise
operations, are basic image processing operations that operate on
individual pixels independently. These operations are applied to each
pixel in an image without considering the values of neighboring pixels.
Point operators typically involve mathematical operations or functions
that transform the pixel values, resulting in changes to the image's
appearance. Here are some common point operators:
Brightness Adjustment:
- Addition/Subtraction: Increase or decrease the intensity of all pixels by
adding or subtracting a constant value.
- Multiplication/Division: Scale the intensity values by multiplying or
dividing them by a constant factor.
Contrast Adjustment:
- Linear Contrast Stretching: Rescale the intensity values to cover the full
dynamic range.
- Histogram Equalization: Adjust the distribution of pixel intensities to
enhance contrast.
Gamma Correction:
- Adjust the gamma value to control the overall brightness and contrast
of an image.

Thresholding:
- Convert a grayscale image to binary by setting a threshold value. Pixels
with values above the threshold become white, and those below become
black.
Bit-plane Slicing:
- Decompose an image into its binary representation by considering
individual bits.
Color Mapping:
- Apply color transformations to change the color balance or convert
between color spaces (e.g., RGB to grayscale).
Inversion:
- Invert the intensity values of pixels, turning bright areas dark and vice
versa.
Image Arithmetic:
- Perform arithmetic operations between pixels of two images, such as
addition, subtraction, multiplication, or division.
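Several of the point operators listed above can be demonstrated in a few lines. This is a minimal sketch (assuming NumPy; the 2x2 `img` and its values are made up for illustration):

```python
import numpy as np

# A tiny 2x2 8-bit "image" with made-up values, just for demonstration.
img = np.array([[ 50, 100],
                [150, 200]], dtype=np.uint8)

# Brightness adjustment: add a constant, clipping to the valid 0-255 range.
brighter = np.clip(img.astype(int) + 60, 0, 255).astype(np.uint8)  # 200 + 60 clips to 255

# Linear contrast stretching: rescale the min..max range to the full 0..255 range.
lo, hi = img.min(), img.max()
stretched = ((img.astype(float) - lo) / (hi - lo) * 255).astype(np.uint8)

# Thresholding: pixels above the threshold become white (255), others black (0).
binary = np.where(img > 120, 255, 0).astype(np.uint8)

# Inversion: bright areas become dark and vice versa.
inverted = 255 - img

print(binary.tolist())   # [[0, 0], [255, 255]]
```

Each operation touches every pixel independently, which is what makes point operators so cheap and easy to parallelize.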

Point operators are foundational in image processing and form the basis for
more complex operations. They are often used in combination to achieve
desired enhancements or modifications to images. These operations are
computationally efficient, as they can be applied independently to each pixel,
making them suitable for real-time applications and basic image
manipulation tasks. It's important to note that while point operators are
powerful for certain tasks, more advanced image processing techniques,
such as filtering and convolution, involve considering the values of
neighboring pixels and are applied to local image regions

1.6 Linear filtering:
Linear filtering is a fundamental concept in image processing that involves
applying a linear operator to an image. The linear filter operates on each
pixel in the image by combining its value with the values of its
neighboring pixels according to a predefined convolution kernel or matrix.
The convolution operation is a mathematical operation that computes the
weighted sum of pixel values in the image, producing a new value for the
center pixel.
The general formula for linear filtering (correlation with a kernel h) is

    g(i, j) = Σ_k Σ_l f(i + k, j + l) · h(k, l)

where f is the input image, h is the filter kernel (the matrix of weights), g is the
filtered output image, and the sums run over the extent of the kernel. (For true
convolution the kernel is flipped, i.e. f(i - k, j - l) is used instead.)
Common linear filtering operations include:


Blurring/Smoothing:
- Average filter: Each output pixel is the average of its neighboring
pixels.
- Gaussian filter: Applies a Gaussian distribution to compute weights for
pixel averaging

Edge Detection:
- Sobel filter: Emphasizes edges by computing gradients in the x and y
directions.
- Prewitt filter: Similar to Sobel but uses a different kernel for gradient
computation.
Sharpening:
- Laplacian filter: Enhances high-frequency components to highlight edges.
- High-pass filter: Emphasizes details by subtracting a blurred version of the
image.
Embossing:
- Applies an embossing effect by highlighting changes in intensity.
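A direct, unoptimized implementation of this weighted-sum (correlation) operation might look like the sketch below, applied with an averaging kernel (assuming NumPy; in practice one would typically call a library routine such as OpenCV's `cv2.filter2D`):

```python
import numpy as np

def filter2d(image, kernel):
    """Correlate `image` with `kernel`: g(i, j) = sum_{k,l} f(i+k, j+l) * h(k, l),
    computed over the 'valid' region only (no border padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum of the neighborhood under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 3x3 averaging (box) kernel: each output pixel is the mean of its neighborhood.
box = np.ones((3, 3)) / 9.0
img = np.array([[1,  2,  3,  4],
                [5,  6,  7,  8],
                [9, 10, 11, 12]], dtype=float)
print(filter2d(img, box))   # [[6. 7.]] -- the mean of each 3x3 window
```

Swapping `box` for a Sobel or Laplacian kernel turns the same loop into an edge detector or sharpener; only the weights change.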

Linear filtering is a versatile technique and forms the basis for more
advanced image processing operations. The convolution operation
can be efficiently implemented using convolutional neural networks
(CNNs) in deep learning, where filters are learned during the training
process to perform tasks such as image recognition, segmentation,
and denoising. The choice of filter kernel and parameters determines
the specific effect achieved through linear filtering.

1.7 More neighborhood operators :
Neighborhood operators in image processing involve the consideration of
pixel values in the vicinity of a target pixel, usually within a defined
neighborhood or window. Unlike point operators that operate on
individual pixels, neighborhood operators take into account the local
structure of the image. Here are some common neighborhood operators:

Median Filter:
- Computes the median value of pixel intensities within a local neighborhood.
- Effective for removing salt-and-pepper noise while preserving edges.
Gaussian Filter:
- Applies a weighted average to pixel values using a Gaussian distribution.
- Used for blurring and smoothing, with the advantage of preserving edges.
Bilateral Filter:
- Combines spatial and intensity information to smooth images while preserving
edges.
- Uses two Gaussian distributions, one for spatial proximity and one for intensity
similarity.
Non-local Means Filter:
- Computes the weighted average of pixel values based on similarity in a larger
non-local neighborhood.
- Effective for denoising while preserving fine structures.
Anisotropic Diffusion:
- Reduces noise while preserving edges by iteratively diffusing intensity values
along edges.
- Particularly useful for images with strong edges.
Morphological Operators:
- Dilation: Expands bright regions by considering the maximum pixel value in a
neighborhood.
- Erosion: Contracts bright regions by considering the minimum pixel value in a
neighborhood.
- Used for operations like noise reduction, object segmentation, and shape
analysis.
Laplacian of Gaussian (LoG):
- Applies a Gaussian smoothing followed by the Laplacian operator.
- Useful for edge detection.
Canny Edge Detector:
- Combines Gaussian smoothing, gradient computation, non-maximum suppression,
and edge tracking by hysteresis.
- Widely used for edge detection in computer vision applications.
Homomorphic Filtering:
- Adjusts image intensity by separating the image into illumination and
reflectance components.
- Useful for enhancing images with non-uniform illumination.

Adaptive Histogram Equalization:
- Improves contrast by adjusting the histogram of pixel intensities
based on local neighborhoods.
- Effective for enhancing images with varying illumination.
These neighborhood operators play a crucial role in image
enhancement, denoising, edge detection, and other image processing
tasks. The choice of operator depends on the specific characteristics of
the image and the desired outcome.
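As an illustration of the median filter's strength against salt-and-pepper noise, here is a minimal sketch (assuming NumPy; border handling by reflection is one common convention, not the only one):

```python
import numpy as np

def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood.
    Borders are handled by reflecting the image edges."""
    pad = size // 2
    padded = np.pad(image, pad, mode='reflect')
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# Flat region with one "salt" pixel: classic salt-and-pepper noise.
img = np.full((5, 5), 10, dtype=np.uint8)
img[2, 2] = 255
print(median_filter(img)[2, 2])   # 10 -- the outlier is removed
```

A mean (box) filter on the same input would smear the 255 value across the neighborhood; the median discards it entirely, which is why the median filter preserves edges so well.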

1.8 Fourier transforms:


Fourier transforms play a significant role in computer vision for
analyzing and processing images. They are used to decompose an
image into its frequency components, providing valuable information for
tasks such as image filtering, feature extraction, and pattern
recognition. Here are some ways Fourier transforms are employed in
computer vision:
Frequency Analysis:
- Fourier transforms help in understanding the frequency content of an
image. High-frequency components correspond to edges and fine
details, while low-frequency components represent smooth regions.
Image Filtering:
Filtering in the frequency domain allows for efficient operations such as
blurring or sharpening. Low-pass filters remove high-frequency noise,
while high-pass filters enhance edges and fine details.
Image Enhancement:
- Adjusting the amplitude of specific frequency components can
enhance or suppress certain features in an image. This is commonly
used in image enhancement techniques.

Texture Analysis:
- Fourier analysis is useful in characterizing and classifying textures based on
their frequency characteristics. It helps distinguish between textures with
different patterns.
Pattern Recognition:
- Fourier descriptors, which capture shape information, are used for
representing and recognizing objects in images. They provide a compact
representation of shape by capturing the dominant frequency components.
Image Compression:
- Transform-based image compression, such as JPEG compression, utilizes
Fourier transforms to transform image data into the frequency domain. This
allows for efficient quantization and coding of frequency
components.
Image Registration:
- Fourier transforms are used in image registration, aligning images or
transforming them to a common coordinate system. Cross-correlation in the
frequency domain is often employed for this purpose.
Optical Character Recognition (OCR):
- Fourier descriptors are used in OCR systems for character recognition. They
help in capturing the shape information of characters, making the recognition
process more robust.
Homomorphic Filtering:
- Homomorphic filtering, which involves transforming an image to a
logarithmic domain using Fourier transforms, is used in applications such as
document analysis and enhancement.

Image Reconstruction:
- Fourier transforms are involved in techniques like computed
tomography (CT) or magnetic resonance imaging (MRI) for

reconstructing images from their projections.


The efficient computation of Fourier transforms, particularly through the
use of the Fast Fourier Transform (FFT) algorithm, has made these
techniques computationally feasible for real-time applications in
computer vision. The ability to analyze images in the frequency domain
provides valuable insights and contributes to the development of
advanced image processing techniques.
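A minimal frequency-domain sketch using NumPy's FFT routines illustrates two of the ideas above: the DC (zero-frequency) coefficient carries the total image intensity, and zeroing the high frequencies yields a smoothed image (the gradient test image and the mask size here are arbitrary illustrative choices):

```python
import numpy as np

# Simple 8x8 gradient "image": pixel (i, j) has intensity i + j.
img = np.add.outer(np.arange(8), np.arange(8)).astype(float)

F = np.fft.fft2(img)           # forward 2D Fourier transform
F_shift = np.fft.fftshift(F)   # move the zero-frequency (DC) term to the center

# The DC coefficient equals the sum of all pixel intensities.
print(F[0, 0].real)            # 448.0 == img.sum()

# Ideal low-pass filter: keep a small block of frequencies around the
# center, zero out the rest, then transform back to get a smoothed image.
mask = np.zeros(F_shift.shape)
c = 4                          # center index after fftshift for an 8x8 image
mask[c - 2:c + 2, c - 2:c + 2] = 1.0
smoothed = np.fft.ifft2(np.fft.ifftshift(F_shift * mask)).real
```

A high-pass filter is the complementary mask (`1 - mask`), which keeps the edges and fine detail instead of the smooth regions.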
1.9 PYRAMIDS AND WAVELETS:
Pyramids and wavelets are both techniques used in image processing for
multi-resolution analysis, allowing the representation of an image at
different scales. They are valuable for tasks such as image compression,
feature extraction, and image analysis.
Image Pyramids:
Image pyramids are a series of images representing the same scene but
at different resolutions. There are two main types of image pyramids:
Gaussian Pyramid:
- Created by repeatedly applying Gaussian smoothing and downsampling
to an image.
- At each level, the image is smoothed to remove high-frequency
information, and then it is subsampled to reduce its size.
- Useful for tasks like image blending, image matching, and coarse-to-
fine image processing

Laplacian Pyramid:
- Derived from the Gaussian pyramid.
- Each level of the Laplacian pyramid is obtained by subtracting the
expanded version of the higher level Gaussian pyramid from the original
image.
- Useful for image compression and coding, where the Laplacian pyramid
represents the residual information not captured by the Gaussian pyramid.
Image pyramids are especially useful for creating multi-scale
representations of images, which can be beneficial for various
computer vision tasks.
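The Gaussian-pyramid construction can be sketched as repeated smoothing and subsampling. In this simplified illustration a 2x2 block average stands in for proper Gaussian smoothing (assuming NumPy; `downsample` and `gaussian_pyramid` are names chosen here, not library functions):

```python
import numpy as np

def downsample(image):
    """One pyramid level: smooth, then subsample. A 2x2 block average is used
    here as a crude stand-in for Gaussian smoothing plus subsampling."""
    h, w = image.shape
    return image[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def gaussian_pyramid(image, levels):
    """Return [full-res, half-res, quarter-res, ...] as a list of arrays."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

pyr = gaussian_pyramid(np.ones((8, 8)), 3)
print([p.shape for p in pyr])   # [(8, 8), (4, 4), (2, 2)]
```

A Laplacian pyramid would then be built by upsampling each coarser level and subtracting it from the level above, leaving only the residual detail.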

Wavelets:
Wavelets are mathematical functions that can be used to analyze signals
and images. Wavelet transforms provide a multi-resolution analysis by
decomposing an image into approximation (low-frequency) and detail
(high-frequency) components.
Key concepts include:
Wavelet Transform:
- The wavelet transform decomposes an image into different frequency components
by convolving the image with wavelet functions.
- The result is a set of coefficients that represent the image at various scales and
orientations.
Multi-resolution Analysis:
- Wavelet transforms offer a multi-resolution analysis, allowing the representation
of an image at different scales.
- The approximation coefficients capture the low-frequency information, while
detail coefficients capture high-frequency information.
Haar Wavelet:
- The Haar wavelet is a simple wavelet function used in basic wavelet transforms.
- It represents changes in intensity between adjacent pixels.
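One level of the Haar transform is just pairwise averages (the approximation band) and pairwise half-differences (the detail band), as this short sketch shows (assuming NumPy):

```python
import numpy as np

def haar_step(signal):
    """One level of the Haar wavelet transform: pairwise averages form the
    approximation (low-frequency) band, pairwise half-differences the
    detail (high-frequency) band."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / 2.0
    detail = (s[0::2] - s[1::2]) / 2.0
    return approx, detail

# A step edge produces a nonzero detail coefficient only where intensity changes.
approx, detail = haar_step([4, 4, 4, 8, 8, 8, 8, 8])
print(approx.tolist())   # [4.0, 6.0, 8.0, 8.0]
print(detail.tolist())   # [0.0, -2.0, 0.0, 0.0]
```

Because most detail coefficients are zero in smooth regions, thresholding or discarding small ones is exactly what wavelet compression and denoising exploit.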

Wavelet Compression:
- Wavelet-based image compression techniques, such as JPEG2000, utilize
wavelet transforms to efficiently represent image data in both spatial and
frequency domains.
Image Denoising:
- Wavelet-based thresholding techniques can be applied to denoise images by
thresholding the wavelet coefficients.
Edge Detection:
- Wavelet transforms can be used for edge detection by analyzing the high-
frequency components of the image.
Both pyramids and wavelets offer advantages in multi-resolution analysis, but
they differ in terms of their representation and construction. Pyramids use a
hierarchical structure of smoothed and subsampled images, while wavelets
use a transform-based approach that decomposes the image into frequency
components. The choice between pyramids and wavelets often depends on
the specific requirements of the image processing task at hand.

1.10 Geometric transformations :


Geometric transformations are operations that modify the spatial
configuration of objects in a digital image. These transformations are applied
to change the position, orientation, scale, or shape of objects while
preserving certain geometric properties. Geometric transformations are
commonly used in computer graphics, computer vision, and image
processing. Here are some fundamental geometric transformations:
1. Translation:
●Description: Moves an object by a specified distance along the x and/or y axes.
●Transformation Matrix (2D, homogeneous coordinates):

    [ 1  0  tx ]
    [ 0  1  ty ]
    [ 0  0   1 ]

●Applications: Object movement, image registration.

2. Rotation:
●Description: Rotates an object by a specified angle θ about a fixed point.
●Transformation Matrix (2D, rotation about the origin):

    [ cos θ  -sin θ  0 ]
    [ sin θ   cos θ  0 ]
    [   0       0    1 ]

●Applications: Image rotation, orientation adjustment.

3. Scaling:
●Description: Changes the size of an object by multiplying its coordinates by
scaling factors sx and sy.
●Transformation Matrix (2D):

    [ sx   0  0 ]
    [  0  sy  0 ]
    [  0   0  1 ]

●Applications: Zooming in/out, resizing.

4. Shearing:
●Description: Distorts the shape of an object by varying its coordinates linearly.
●Transformation Matrix (2D, with shear factors shx and shy):

    [  1   shx  0 ]
    [ shy   1   0 ]
    [  0    0   1 ]

●Applications: Skewing, slanting.

5. Affine Transformation:
●Description: Combines translation, rotation, scaling, and shearing; six free
parameters in 2D.
●Transformation Matrix (2D):

    [ a11  a12  tx ]
    [ a21  a22  ty ]
    [  0    0    1 ]

●Applications: Generalized transformations.

6. Perspective Transformation:
●Description: Represents a perspective projection, useful for simulating
three-dimensional effects.
●Transformation Matrix: a full 3x3 matrix acting on homogeneous coordinates
(x, y, 1), whose non-trivial bottom row produces the perspective division by the
third output component.
●Applications: 3D rendering, simulation.

7. Projective Transformation:
●Description: Generalization of perspective transformation with additional
control points.
●Transformation Matrix: More complex than the perspective transformation
matrix.
●Applications: Computer graphics, augmented reality.
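Representing these transformations as 3x3 matrices on homogeneous coordinates makes them composable by simple matrix multiplication, as this sketch illustrates (assuming NumPy; the helper names are our own, not a library API):

```python
import numpy as np

# Each helper returns a 3x3 homogeneous 2D transform.
def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def apply(T, point):
    """Apply a 3x3 homogeneous transform to a 2D point (x, y)."""
    x, y, w = T @ np.array([point[0], point[1], 1.0])
    return x / w, y / w    # dividing by w also covers projective transforms

# Compose: first rotate 90 degrees about the origin, then translate by (5, 0).
T = translation(5, 0) @ rotation(np.pi / 2)
x, y = apply(T, (1, 0))
print(round(x, 6), round(y, 6))   # 5.0 1.0
```

Note that order matters: `rotation(...) @ translation(...)` would translate first and rotate second, generally landing the point somewhere else.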

These transformations are crucial for various applications, including image


manipulation, computer-aided design (CAD), computer vision, and graphics
rendering. Understanding and applying geometric transformations are
fundamental skills in computer science and engineering fields related to
digital image processing.

1.11 GLOBAL OPTIMIZATION:
Global optimization is a branch of optimization that focuses on finding
the global minimum or maximum of a function over its entire feasible
domain. Unlike local optimization, which aims to find the optimal
solution within a specific region, global optimization seeks the best
possible solution across the entire search space. Global optimization
problems are often challenging due to the presence of multiple local
optima or complex, non-convex search spaces.
Here are key concepts and approaches related to global optimization:
Concepts:
●Objective Function: The function to be minimized or maximized.
●Feasible Domain: The set of input values (parameters) for which the objective
function is defined.
●Global Minimum/Maximum: The lowest or highest value of the objective function
over the entire feasible domain.
●Local Minimum/Maximum: A minimum or maximum within a specific region of the
feasible domain.
Approaches:
●Grid Search: Dividing the feasible domain into a grid and evaluating the
objective function at each grid point to find the optimal solution.
●Random Search: Randomly sampling points in the feasible domain and evaluating
the objective function to explore different regions.
45
●Evolutionary Algorithms: Genetic algorithms, particle swarm optimization, and
other evolutionary techniques use populations of solutions and genetic operators
to iteratively evolve toward the optimal solution.
●Simulated Annealing: Inspired by the annealing process in metallurgy, simulated
annealing gradually decreases the temperature to allow the algorithm to escape
local optima.
●Ant Colony Optimization: Inspired by the foraging behavior of ants, this
algorithm uses pheromone trails to guide the search for the optimal solution.
●Genetic Algorithms: Inspired by biological evolution, genetic algorithms use
mutation, crossover, and selection to evolve a population of potential solutions.
●Particle Swarm Optimization: Simulates the social behavior of birds or fish,
where a swarm of particles moves through the search space to find the optimal
solution.
●Bayesian Optimization: Utilizes probabilistic models to model the objective
function and guide the search toward promising regions.
●Quasi-Newton Methods: Iterative optimization methods that use an approximation
of the Hessian matrix to find the optimal solution efficiently.
Global optimization is applied in various fields, including engineering design,
machine learning, finance, and parameter tuning in algorithmic optimization.
The choice of a specific global optimization method depends on the
characteristics of the objective function, the dimensionality of the search space,
and the available computational resources.
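As a concrete illustration of the random-search approach listed above, the sketch below minimizes a Rastrigin-style test function (many local minima, one global minimum at x = 0) over a 1-D feasible domain. The objective, the bounds, and the sample budget are chosen purely for demonstration.

```python
import math
import random

def random_search(objective, lower, upper, samples=20000, seed=0):
    """Global minimization by uniform random sampling of the feasible domain."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(samples):
        x = rng.uniform(lower, upper)  # candidate from the feasible domain
        f = objective(x)
        if f < best_f:                 # keep the best point seen so far
            best_x, best_f = x, f
    return best_x, best_f

# Rastrigin-like objective: many local minima, global minimum at x = 0.
def rastrigin(x):
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

x_star, f_star = random_search(rastrigin, -5.12, 5.12)
print(x_star, f_star)  # x_star lands near the global minimum at 0
```

Grid search differs only in how the candidate points are generated; the evaluate-and-keep-the-best loop is identical.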
46
ACTIVITY:
Activity 1: Implementing and Visualizing the Fourier Transform
Objective: Understand and implement the Fourier Transform on an
image and visualize its frequency components.
Reference:
•https://Fourier Transform in Image Processing.co.uk/#!/worksheets
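The activity can be prototyped without any imaging library. The sketch below computes a 1-D discrete Fourier transform directly from its definition; a 2-D image transform applies the same idea along rows and then columns. The test signal is invented for illustration.

```python
import cmath
import math

def dft(signal):
    """Discrete Fourier Transform from its definition:
    X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(signal))
        for k in range(n_samples)
    ]

# A cosine with 2 cycles over 8 samples: its energy concentrates in
# frequency bins 2 and 6 (bin 6 is the mirrored negative frequency).
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
magnitudes = [abs(coef) for coef in dft(signal)]
print([round(m, 3) for m in magnitudes])  # peaks of 4.0 at indices 2 and 6
```

For real images one would use a fast FFT implementation (e.g., numpy.fft) and display the log of the magnitude spectrum, as described in Part A question 10 below.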
47
Video Links
Unit – 1
48
Video Links
Sl. No. | Topic | Video Link
1 | Fourier Transform in Image Processing | https://www.youtube.com/watch?v=spUNpyF58BY
2 | Computer Vision Basics by OpenCV | CAP5415 Fall 2014 - YouTube
3 | Image Processing in Python | Sockets with Python 3 - YouTube
4 | Digital Image Processing | https://www.youtube.com/playlist?list=PL6ZV-fpYwBx0aJDM7-Y3iC5Bz4rOUJpYQ
5 | Histogram Equalization in Image Enhancement | https://www.youtube.com/watch?v=CmQz0Txa1bw
49
Assignments
Unit - I
50
Assignment Questions
Assignment Questions – Very Easy

Q.No. | Assignment Question | Marks | K Level | CO
1 | Define computer vision and explain the difference between low-level, mid-level, and high-level computer vision tasks. | 5 | K3 | CO1
2 | What is the purpose of histogram equalization in image processing? | 5 | K3 | CO1
Assignment Questions – Easy

Q.No. | Assignment Question | Marks | K Level | CO
1 | Load a grayscale image using a library like OpenCV or PIL and display it. (Topic: Image Processing Basics) | 5 | K4 | CO1
2 | Perform histogram equalization on a given image and display the original and enhanced images side by side. (Topic: Histogram Processing, Image Enhancement) | 5 | K3 | CO1
51
Assignment Questions – Medium

Q.No. | Assignment Question | Marks | K Level | CO
1 | Apply the Fast Fourier Transform (FFT) to an image and display the magnitude spectrum. Then, apply the inverse FFT to reconstruct the image and display the result. (Topic: Fourier Transform) | 5 | K3 | CO1
2 | Implement an image smoothing filter (e.g., Gaussian blur) and apply it to an image. Display the original and smoothed images. (Topic: Convolution and Filtering) | 5 | K3 | CO1
Assignment Questions – Hard

Q.No. | Assignment Question | Marks | K Level | CO
1 | Perform image restoration using inverse filtering on a degraded image. Assume a simple degradation function (e.g., motion blur) is given. (Topic: Image Restoration, Convolution and Filtering) | 5 | K4 | CO1
2 | Implement and apply a projective transformation to warp an image (e.g., perspective correction). Display the original and warped images. (Topic: Transformation: Projective) | 5 | K3 | CO1
52
Assignment Questions – Very Hard

Q.No. | Assignment Question | Marks | K Level | CO
1 | Develop an algorithm to automatically detect and correct for skew in scanned documents using the Hough Transform for line detection. (Topic: Image Enhancement, Transformation) | 5 | K5 | CO1
2 | Implement an object detection algorithm using a combination of low-level (e.g., edge detection), mid-level (e.g., segmentation), and high-level (e.g., machine learning classifier) techniques. Test your algorithm on a given dataset. (Topic: Low-level, Mid-level, High-level Computer Vision Tasks) | 5 | K5 | CO1
53
Course Outcomes:
CO1: Explain low level processing of image and transformation techniques applied
to images.
*Allotment of Marks
Correctness of the Content | Presentation | Timely Submission | Total (Marks)
15 | - | 5 | 20
54
Part A – Questions
& Answers
Unit – I
55
Part A - Questions & Answers
1. What is computer vision? [K2, CO1]
Computer vision is a field of artificial intelligence that enables computers to interpret and make decisions based on visual data from the world, such as images and videos.

2. What is the main difference between low-level and high-level computer vision tasks? [K3, CO1]
Low-level tasks involve basic image processing operations like noise reduction and edge detection, while high-level tasks involve interpreting and understanding the content of images, such as object recognition and scene understanding.
3. What is the pinhole camera model? [K2, CO1]
The pinhole camera model is a simple geometric model for image formation, where light rays pass through a small aperture (pinhole) and project an inverted image on the opposite side.

4. What are intrinsic camera parameters? [K2, CO1]
Intrinsic camera parameters include the focal length, the optical center (principal point), and the distortion coefficients of the camera lens.
5. What is an orthogonal transformation? [K3, CO1]
An orthogonal transformation is a linear transformation that preserves angles and lengths, typically represented by an orthogonal matrix with orthonormal rows and columns.

6. Define Euclidean transformation in image processing. [K3, CO1]
Euclidean transformation, also known as rigid transformation, involves rotation and translation operations that preserve the shape and size of objects.
56
7. What is an affine transformation? [K3, CO1]
An affine transformation is a combination of linear transformations (like rotation, scaling, and shearing) followed by translation, preserving lines and parallelism.

8. What is a projective transformation? [K2, CO1]
A projective transformation, or homography, maps straight lines to straight lines but does not necessarily preserve parallelism, angles, or distances.

9. What is the purpose of the Fourier Transform in image processing? [K3, CO1]
The Fourier Transform converts an image from the spatial domain to the frequency domain, allowing for analysis and manipulation of its frequency components.

10. How is the magnitude spectrum of a Fourier Transform visualized? [K3, CO1]
The magnitude spectrum is visualized by taking the logarithm of the magnitude of the Fourier coefficients, often displaying it as an image for better interpretation of frequency components.
57
11. What is convolution in image processing? [K2, CO1]
Convolution is a mathematical operation used to apply a filter to an image, involving the sliding of a kernel over the image and computing the sum of element-wise multiplications.

12. What is the purpose of a Gaussian filter? [K3, CO1]
A Gaussian filter is used for smoothing an image, reducing noise, and blurring details by averaging pixel values with a Gaussian function.
13. What is histogram equalization? [K3, CO1]
Histogram equalization is a technique to enhance the contrast of an image by redistributing its intensity values, making the histogram of the output image more uniform.
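The equalization mapping described above can be sketched in a few lines of plain Python: build the histogram, accumulate it into a cumulative distribution, and remap each intensity through the scaled CDF. The 8-pixel toy image below is invented for the example.

```python
def equalize(pixels, levels=256):
    """Histogram equalization: remap intensities via the cumulative distribution."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1                # histogram of intensity values
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)         # cumulative distribution
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    # Standard equalization formula: scale the CDF to the full intensity range.
    table = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if n > cdf_min else 0
             for c in cdf]
    return [table[p] for p in pixels]

# A low-contrast toy image whose values occupy only the range [100, 103].
image = [100, 100, 101, 101, 102, 102, 103, 103]
print(equalize(image))  # -> [0, 0, 85, 85, 170, 170, 255, 255]
```

Note how the narrow input range [100, 103] is stretched across the full 0-255 range, which is exactly the contrast enhancement the technique is meant to provide.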
14. What is the difference between contrast stretching and histogram equalization? [K3, CO1]
Contrast stretching linearly scales the pixel values to enhance image contrast, while histogram equalization adjusts the pixel values non-linearly to achieve a uniform histogram.
15. What is image restoration? [K3, CO1]
Image restoration involves techniques to recover a degraded image by reversing the effects of blurring, noise, and other distortions.
58
16. What is inverse filtering in image restoration? [K2, CO1]
Inverse filtering attempts to recover the original image by reversing the degradation function, often used to counteract blurring.
17. What is a histogram in the context of image processing? [K3, CO1]
A histogram represents the distribution of pixel intensity values in an image, showing the frequency of each intensity level.

18. Why is histogram equalization important in medical imaging? [K3, CO1]
Histogram equalization enhances the contrast of medical images, making important details more visible for better diagnosis.
19. What is edge detection? [K3, CO1]
Edge detection is a low-level image processing technique to identify and locate sharp discontinuities in intensity, often corresponding to object boundaries.

20. Name two popular edge detection algorithms. [K3, CO1]
The Sobel and Canny edge detection algorithms are widely used for detecting edges in images.
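A library-free sketch of the Sobel operator named above: convolve the image with the horizontal and vertical 3x3 Sobel kernels and combine the two responses into a gradient magnitude. The tiny image with a single vertical edge is invented for the example.

```python
def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (interior pixels only)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# 5x5 image with a sharp vertical edge between columns 1 and 2.
img = [[0, 0, 255, 255, 255] for _ in range(5)]
mag = sobel_magnitude(img)
print(mag[2])  # -> [0.0, 1020.0, 1020.0, 0.0, 0.0]: strong response at the edge
```

Canny builds on the same gradients but adds smoothing, non-maximum suppression, and hysteresis thresholding, which is why it produces cleaner, thinner edges.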
59
21. What is image segmentation? [K2, CO1]
Image segmentation is a mid-level task that divides an image into
meaningful regions or segments, typically to isolate objects or areas of
interest.
22. How does the k-means clustering algorithm work in image
segmentation? [K3, CO1]
K-means clustering partitions the image into k segments by assigning pixels
to the nearest cluster center, iteratively updating cluster centers based on
pixel values.
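A minimal 1-D version of the k-means procedure described above, clustering pixel intensities into two segments; the intensity values and the initial centers are invented for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalar intensities: assign to nearest center, then update."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assignment step: each value joins its nearest cluster center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Pixel intensities forming two clear groups: dark (~20) and bright (~200).
pixels = [18, 20, 22, 25, 198, 200, 202, 205]
print(kmeans_1d(pixels, centers=[0, 255]))  # -> [21.25, 201.25]
```

Real image segmentation runs the same assign/update loop, usually on RGB vectors or on intensity-plus-position feature vectors.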
23. What is object recognition? [K3, CO1]
Object recognition is a high-level computer vision task that identifies and
classifies objects within an image based on learned features.
24. How does a convolutional neural network (CNN) aid in object
recognition? [K3, CO1]
CNNs automatically learn hierarchical features from images through
multiple layers of convolutions, pooling, and fully connected layers,
effectively recognizing objects.
25. What is the role of the lens in a camera? [K3, CO1]
The lens focuses light onto the image sensor, forming a clear image by
converging or diverging light rays.
60
26. What causes chromatic aberration in a camera lens? [K3, CO1]
Chromatic aberration occurs when different wavelengths of light are refracted
by different amounts, causing color fringing and blurring.
27. Explain the concept of a homography matrix. [K3, CO1]
A homography matrix is a 3x3 transformation matrix that maps points from
one plane to another in projective space, used in applications like image
stitching.
28. What is the main application of affine transformations in image
processing? [K3, CO1]
Affine transformations are used for geometric corrections, such as scaling,
rotating, and skewing images, while preserving collinearity and parallelism.
29. What is a median filter used for in image processing? [K3, CO1]
A median filter is used to reduce noise in an image, particularly salt-and-
pepper noise, by replacing each pixel value with the median of neighboring
pixel values.
30. Describe the purpose of a Laplacian filter. [K3, CO1]
A Laplacian filter detects edges by highlighting regions of rapid intensity
change, using a second-order derivative of the image.
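The median filter from question 29 can be sketched directly: replace each interior pixel with the median of its 3x3 neighborhood, which removes isolated salt-and-pepper outliers while preserving edges. The toy image with one noise pixel is invented for the example.

```python
import statistics

def median_filter_3x3(img):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1)]
            out[y][x] = statistics.median(window)
    return out

# Uniform gray image with one salt-noise pixel (255) in the middle.
img = [[50] * 5 for _ in range(5)]
img[2][2] = 255
filtered = median_filter_3x3(img)
print(filtered[2][2])  # -> 50 (the outlier is removed)
```

Unlike a Gaussian blur, the median never invents intermediate values, which is why it handles salt-and-pepper noise so well.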
61
Part B – Questions
Unit – I
62
Part B Questions
1. Discuss the differences between low-level, mid-level, and high-level computer vision tasks. Provide examples of each and explain their significance in a complete computer vision system. [K4, CO1]

2. Explain the process of image formation in a pinhole camera model and compare it with the lens-based camera model. Discuss the advantages and disadvantages of each model. [K4, CO1]

3. Describe the mathematical formulations and properties of orthogonal, Euclidean, affine, and projective transformations. Provide examples of where each type of transformation might be used in image processing. [K4, CO1]

4. Discuss the Fourier Transform in the context of image processing. Explain how it is applied to an image, interpret the resulting frequency domain representation, and discuss its practical applications such as filtering and image compression. [K3, CO1]

5. Explain the convolution operation in image processing. Provide examples of different types of convolution kernels (e.g., Sobel, Gaussian) and discuss their effects on images. [K4, CO1]

6. Discuss various image enhancement techniques, such as histogram equalization, contrast stretching, and unsharp masking. Explain how each technique improves image quality and provide practical examples. [K3, CO1]
63
7. Describe the process of image restoration. Discuss different degradation models and restoration techniques, such as Wiener filtering and blind deconvolution, providing examples of their applications. [K4, CO1]

8. Explain the concept of histogram processing in image enhancement. Describe the steps involved in histogram equalization and its effect on image contrast. Discuss any limitations of this technique. [K3, CO1]

9. Describe the process of edge detection in images. Compare and contrast different edge detection algorithms, such as Sobel, Canny, and Laplacian of Gaussian, highlighting their strengths and weaknesses. [K4, CO1]

10. Explain the concept of image segmentation. Discuss various segmentation techniques, such as thresholding, region growing, and k-means clustering, and their applications in different fields. [K4, CO1]
64
Supportive online
Certification
courses (NPTEL,
Swayam, Coursera,
Udemy, etc.,)
65
Supportive Online Certification
Courses
 Coursera – Introduction to Computer
Vision
• Description:
This course provides an overview of computer
vision, including image processing, feature
extraction, and object recognition.
• Offered by:
Georgia Tech
https://www.coursera.org/learn/introduction-computer-vision
• NPTEL: Computer Vision
• Computer Vision - Course (nptel.ac.in)
 Udemy:
• Computer Vision – Practical OpenCV 3 (Udemy course)
66
Real time
Applications in day
to day life and to
Industry
67
Real time Applications
1. Autonomous Vehicles
Topic: Lane Detection, Obstacle Detection, Traffic Sign Recognition
Description: Computer vision systems are crucial for autonomous vehicles to
navigate safely. They detect lanes, obstacles, and traffic signs in real-time,
enabling the vehicle to make driving decisions.
2. Facial Recognition
Topic: Security and Surveillance, Access Control
Description: Real-time facial recognition systems are used in security and
surveillance to identify individuals. They are also used in access control systems
to authenticate users.
3. Medical Imaging
Topic: Real-time Diagnostics, Surgery Assistance
Description: Computer vision assists in medical imaging by providing real-time
analysis for diagnostics and during surgeries. Applications include real-time MRI
analysis and guiding surgical robots.
4. Augmented Reality (AR)
Topic: Interactive Gaming, Virtual Try-Ons
Description: AR applications use real-time computer vision to overlay digital
content onto the real world. Examples include interactive gaming (e.g.,
Pokémon GO) and virtual try-ons for glasses or clothes.
68
Content Beyond
Syllabus
69
Medical Image Analysis
Deep learning has revolutionized medical diagnostics by
enhancing the analysis of medical images such as MRIs, CT
scans, and X-rays. By leveraging sophisticated algorithms,
deep learning models can extract intricate patterns and
features from these images, aiding in precise diagnosis and
treatment planning.
One prominent application is in radiomics, where deep
learning algorithms extract quantitative features from
radiographic images. These features can include texture,
shape, and intensity characteristics that are not visible to the
human eye. By analyzing these features, clinicians can
potentially predict disease progression, treatment response,
and patient outcomes with greater accuracy.
Case studies have demonstrated the efficacy of deep learning
in various medical specialties. For instance, in oncology, AI
models can identify subtle changes in tumor characteristics
that may indicate response to therapy or recurrence. In
neurology, these models assist in the early detection of
neurological disorders by analyzing brain scans. In cardiology,
AI helps in assessing cardiac function and detecting anomalies
in heart images.
Overall, the integration of deep learning into medical image
analysis holds promise for advancing precision medicine,
improving diagnostic accuracy, and ultimately enhancing
patient care across diverse medical fields.
70
Assessment
Schedule
(Proposed Date &
Actual Date)
71
Assessment Schedule
(Proposed Date & Actual Date)
Sl. No. | Assessment | Proposed Date | Actual Date
1 | FIRST INTERNAL ASSESSMENT | |
2 | SECOND INTERNAL ASSESSMENT | |
3 | MODEL EXAMINATION | |
4 | END SEMESTER EXAMINATION | |
72
Prescribed Text
Books & Reference
73
Prescribed Text Books &
Reference
TEXT BOOKS:
1. D. A. Forsyth, J. Ponce, "Computer Vision: A Modern Approach", Pearson
Education, 2003.
2. Richard Szeliski, "Computer Vision: Algorithms and Applications",
Springer-Verlag London Limited, 2011.

REFERENCES:
1. B. K. P. Horn, "Robot Vision", McGraw-Hill.
2. Simon J. D. Prince, "Computer Vision: Models, Learning, and Inference",
Cambridge University Press, 2012.
3. Mark Nixon and Alberto S. Aguado, "Feature Extraction & Image Processing
for Computer Vision", Third Edition, Academic Press, 2012.
4. E. R. Davies, "Computer & Machine Vision", Fourth Edition, Academic
Press, 2012.
5. Reinhard Klette, "Concise Computer Vision: An Introduction into Theory
and Algorithms", Springer, 2014.
74
Mini Project
Suggestions
75
Mini Project Suggestions
[1] Very Hard
Develop a deep learning model (like YOLO or Faster R-CNN) for real-time
object detection and localization in images.

[2] Hard
Build a GAN-based model to enhance the resolution of low-resolution images.

[3] Medium
Develop a CNN-based model for semantic segmentation of medical images
(e.g., brain MRI).

[4] Easy
Implement various image filtering and enhancement techniques.

[5] Very Easy
Explore the application of Fourier Transform in image processing.
76
Thank you
Disclaimer:
This document is confidential and intended solely for the educational
purpose of RMK Group of Educational Institutions. If you have received
this document through email in error, please notify the system
manager. This document contains proprietary information and is
intended only to the respective group / learning community as
intended. If you are not the addressee you should not disseminate,
distribute or copy through e-mail. Please notify the sender immediately
by e-mail if you have received this document by mistake and delete this
document from your system. If you are not the intended recipient you
are notified that disclosing, copying, distributing or taking any action in
reliance on the contents of this information is strictly prohibited.
77