OpenCV-Python - Quick Guide
Originally developed by Intel, OpenCV is a cross-platform library written in C++. Interface wrappers for OpenCV have been developed for many other programming languages such as Java and Python. In this tutorial, the functionality of OpenCV's Python library is described.
OpenCV-Python
OpenCV-Python is a Python wrapper around C++ implementation of OpenCV
library. It makes use of NumPy library for numerical operations and is a rapid
prototyping tool for computer vision problems.
This tutorial is designed for computer science students and professionals who wish to gain expertise in the field of computer vision applications. Prior knowledge of Python and the NumPy library is essential to understand the functionality of OpenCV-Python.
OpenCV-Python reads an image with the imread() function −
img=cv2.imread(filename, flags)
The function returns an image object, which can be rendered using the imshow() function. The command for using the imshow() function is given below −
cv2.imshow(window-name, image)
The image is displayed in a named window. A new window is created with the AUTOSIZE flag set. The waitKey() function is a keyboard binding function. Its argument is the time in milliseconds.
The function waits for the specified milliseconds and keeps the window on display till a key is pressed. Finally, we can destroy all the windows thus created.
import numpy as np
import cv2
# Load a color image
img = cv2.imread('OpenCV_Logo.png',1)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
The command to save an image with the help of the imwrite() function is as follows −
cv2.imwrite(filename, img)
The image format is automatically decided by OpenCV from the file extension. OpenCV supports image file types such as *.bmp, *.dib, *.jpeg, *.jpg, *.png, *.webp, *.sr, *.tiff and *.tif.
Example
Following program loads the OpenCV logo image in grayscale and saves it when the 's' key is pressed −
import numpy as np
import cv2
# Load the image in grayscale mode
img = cv2.imread('OpenCV_Logo.png',0)
cv2.imshow('image',img)
key=cv2.waitKey(0)
if key==ord('s'):
   cv2.imwrite("opencv_logo_GS.png", img)
cv2.destroyAllWindows()
Output
The image can also be displayed with the Matplotlib library as shown below −
import numpy as np
import cv2
import matplotlib.pyplot as plt
# Load the image in grayscale mode
img = cv2.imread('OpenCV_Logo.png',0)
plt.imshow(img)
plt.show()
Output
The shape property of the image object returns the dimensions as a tuple. The first two items, shape[0] and shape[1], represent the height (number of rows) and width (number of columns) of the image.
shape[2] stands for the number of channels.
3 indicates that the image has three colour channels (stored in BGR order by OpenCV).
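A quick check in the interpreter might look like this (the numbers are only illustrative; a hypothetical 222x180 colour image would report) −
>>> img.shape
(180, 222, 3)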
Similarly, the size property returns the size of the image. The command for the size
of an image is as follows −
>>> img.size
119880
We can access and manipulate any pixel's value with the help of the command mentioned below −
>>> p=img[50,50]
>>> p
array([ 1, 1, 255], dtype=uint8)
Example
Following code changes the color value of the first 100X100 pixels to black. The
imshow() function can verify the result.
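A minimal sketch of this operation, assuming the OpenCV logo image used earlier −
import cv2
import numpy as np

img = cv2.imread('OpenCV_Logo.png', 1)
# set the top-left 100x100 block of pixels to black
img[0:100, 0:100] = [0, 0, 0]
cv2.imshow('image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()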
Output
The image channels can be split into individual planes by using the split() function. The channels can be merged back by using the merge() function.
Suppose we set all pixels in the blue channel to 0; the code will be as follows −
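A minimal sketch, again assuming the OpenCV logo image is available −
import cv2

img = cv2.imread('OpenCV_Logo.png', 1)
# split the image into its blue, green and red planes
b, g, r = cv2.split(img)
# set every pixel of the blue channel to 0 and merge the planes back
b[:] = 0
img_no_blue = cv2.merge((b, g, r))
cv2.imshow('no blue channel', img_no_blue)
cv2.waitKey(0)
cv2.destroyAllWindows()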
OpenCV provides the following bitwise operations, which operate element-wise on image arrays −
bitwise_and
bitwise_or
bitwise_xor
bitwise_not
Example 1
To demonstrate the use of these operators, two images with filled and empty circles
are taken.
import cv2
import numpy as np
img1 = cv2.imread('a.png')
img2 = cv2.imread('b.png')
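The bitwise operations can then be applied and displayed as follows (a sketch continuing the snippet above; it assumes a.png and b.png are binary shape images of the same size) −
and_img = cv2.bitwise_and(img1, img2)
or_img = cv2.bitwise_or(img1, img2)
xor_img = cv2.bitwise_xor(img1, img2)
not_img = cv2.bitwise_not(img1)
cv2.imshow('AND', and_img)
cv2.imshow('OR', or_img)
cv2.imshow('XOR', xor_img)
cv2.imshow('NOT', not_img)
cv2.waitKey(0)
cv2.destroyAllWindows()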
Output
Example 2
In another example involving bitwise operations, the OpenCV logo is superimposed on another image. Here, we obtain a mask array by calling the threshold() function on the logo and perform an AND operation between the mask and the region of interest.
Similarly, by a NOT operation we get an inverse mask, which is ANDed with the background image.
import cv2 as cv
import numpy as np
img1 = cv.imread('lena.jpg')
img2 = cv.imread('whitelogo.png')
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask)
# Now black-out the area of logo
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)
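Continuing the snippet, the logo foreground could be extracted and combined with the background roughly as follows (a sketch following the standard OpenCV masking recipe) −
# Take only the logo region from the logo image
img2_fg = cv.bitwise_and(img2, img2, mask=mask)
# Put the logo in the ROI and modify the main image
dst = cv.add(img1_bg, img2_fg)
img1[0:rows, 0:cols] = dst
cv.imshow('result', img1)
cv.waitKey(0)
cv.destroyAllWindows()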
Output
The masked images give following result −
OpenCV Python - Draw Shapes and Text
In this chapter, we will learn how to draw shapes and text on images with the help of
OpenCV-Python. Let us begin by understanding about drawing shapes on images.
Functions such as cv2.line(), cv2.rectangle(), cv2.circle() and cv2.ellipse() are used to draw shapes on an image. They share the following parameters −
1. img − The image on which you want to draw the shapes.
2. color − Color of the shape. For BGR, pass it as a tuple. For grayscale, just pass the scalar value.
3. thickness − Thickness of the line or circle etc. If -1 is passed for closed figures like circles, it will fill the shape.
4. lineType − Type of line, whether 8-connected, anti-aliased line etc.
Example
Following example shows how the shapes are drawn on top of an image. The
program for the same is given below −
import numpy as np
import cv2
img = cv2.imread('LENA.JPG',1)
cv2.line(img,(20,400),(400,20),(255,255,255),3)
cv2.rectangle(img,(200,100),(400,400),(0,255,0),5)
cv2.circle(img,(80,80), 55, (255,255,0), -1)
cv2.ellipse(img, (300,425), (80, 20), 5, 0, 360, (0,0,255), -1)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
Draw Text
The cv2.putText() function is provided to write a text on the image. The command for the same is as follows −
cv2.putText(img, text, org, fontFace, fontScale, color, thickness, lineType)
Fonts
The fontFace parameter can take one of the following enumerated values −
FONT_HERSHEY_SIMPLEX 0
FONT_HERSHEY_PLAIN 1
FONT_HERSHEY_DUPLEX 2
FONT_HERSHEY_COMPLEX 3
FONT_HERSHEY_TRIPLEX 4
FONT_HERSHEY_COMPLEX_SMALL 5
FONT_HERSHEY_SCRIPT_SIMPLEX 6
FONT_HERSHEY_SCRIPT_COMPLEX 7
FONT_ITALIC 16
Example
Following program adds a text caption to a photograph showing Lionel Messi, the
famous footballer.
import numpy as np
import cv2
img = cv2.imread('messi.JPG',1)
txt="Lionel Messi"
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img,txt,(10,100), font, 2,(255,255,255),2,cv2.LINE_AA)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
OpenCV Python - Handling Mouse Events
OpenCV is capable of registering various mouse related events with a callback
function. This is done to initiate a certain user defined action depending on the type
of mouse event.
The following mouse events can be registered −
1. cv.EVENT_MOUSEMOVE − When the mouse pointer has moved over the window.
2. cv.EVENT_LBUTTONDOWN − Indicates that the left mouse button is pressed.
3. cv.EVENT_RBUTTONDOWN − Indicates that the right mouse button is pressed.
4. cv.EVENT_MBUTTONDOWN − Indicates that the middle mouse button is pressed.
5. cv.EVENT_LBUTTONUP − When the left mouse button is released.
6. cv.EVENT_RBUTTONUP − When the right mouse button is released.
7. cv.EVENT_MBUTTONUP − Indicates that the middle mouse button is released.
8. cv.EVENT_LBUTTONDBLCLK − This event occurs when the left mouse button is double clicked.
9. cv.EVENT_RBUTTONDBLCLK − Indicates that the right mouse button is double clicked.
10. cv.EVENT_MBUTTONDBLCLK − Indicates that the middle mouse button is double clicked.
11. cv.EVENT_MOUSEWHEEL − Positive for forward and negative for backward scrolling.
The callback function is registered for a named window with the setMouseCallback() function −
cv2.setMouseCallback(window, callback_function)
This function passes the type and location of the event to the callback function for further processing.
Example 1
Following code draws a circle whenever left button double click event occurs on the
window showing an image as background −
import numpy as np
import cv2 as cv
# mouse callback function
def drawfunction(event,x,y,flags,param):
   if event == cv.EVENT_LBUTTONDBLCLK:
      cv.circle(img,(x,y),20,(255,255,255),-1)
img = cv.imread('lena.jpg')
cv.namedWindow('image')
cv.setMouseCallback('image',drawfunction)
while(1):
   cv.imshow('image',img)
   key=cv.waitKey(1)
   if key == 27:
      break
cv.destroyAllWindows()
Output
Run the above program and double click at random locations. A similar output will appear −
Example 2
Following program interactively draws a rectangle, line or circle depending on user input (1, 2 or 3) −
import numpy as np
import cv2 as cv
# mouse callback function
drawing=True
shape='r'
def draw_circle(event,x,y,flags,param):
   global x1,x2,drawing
   if event == cv.EVENT_LBUTTONDOWN:
      drawing = True
      x1,x2 = x,y
   elif event == cv.EVENT_LBUTTONUP:
      drawing = False
      if shape == 'r':
         cv.rectangle(img,(x1,x2),(x,y),(0,255,0),-1)
      if shape == 'l':
         cv.line(img,(x1,x2),(x,y),(255,255,255),3)
      if shape=='c':
         cv.circle(img,(x,y), 10, (255,255,0), -1)
img = cv.imread('lena.jpg')
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)
while(1):
   cv.imshow('image',img)
   key=cv.waitKey(1)
   if key==ord('1'):
      shape='r'
   if key==ord('2'):
      shape='l'
   if key==ord('3'):
      shape='c'
   #print (shape)
   if key == 27:
      break
cv.destroyAllWindows()
On the window surface, a rectangle is drawn between the coordinates of the mouse
left button down and up if ‘1’ is pressed.
On choosing 3 for the circle, it is drawn at the coordinates of the mouse up event.
Following image will be the output after the successful execution of the above
mentioned program −
OpenCV Python - Add Trackbar
Trackbar in OpenCV is a slider control which helps in picking a value for the variable
from a continuous range by manually sliding the tab over the bar. Position of the tab
is synchronised with a value.
The createTrackbar() function creates a Trackbar object with the following command −
cv2.createTrackbar(trackbarname, winname, value, count, onChange)
In the following example, three trackbars are provided for the user to set the R, G and B values, each in the range 0 to 255.
Using the trackbar position values, a rectangle is drawn with the fill colour corresponding to the RGB colour value.
Example
Following program is for adding a trackbar −
import numpy as np
import cv2 as cv
img = np.zeros((300,400,3), np.uint8)
cv.namedWindow('image')
def nothing(x):
   pass
# create trackbars for the R, G and B values
cv.createTrackbar('R','image',0,255,nothing)
cv.createTrackbar('G','image',0,255,nothing)
cv.createTrackbar('B','image',0,255,nothing)
while(1):
   cv.imshow('image',img)
   k = cv.waitKey(1) & 0xFF
   if k == 27:
      break
   # get current positions of the trackbars
   r = cv.getTrackbarPos('R','image')
   g = cv.getTrackbarPos('G','image')
   b = cv.getTrackbarPos('B','image')
   #s = cv.getTrackbarPos(switch,'image')
   #img[:] = [b,g,r]
   cv.rectangle(img, (100,100),(200,200), (b,g,r),-1)
cv.destroyAllWindows()
Output
OpenCV Python - Resize and Rotate an Image
In this chapter, we will learn how to resize and rotate an image with the help of
OpenCVPython.
Resize an Image
It is possible to scale up or down an image with the use of the cv2.resize() function −
cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
When graphical data contains a gap, but data is available on either side of the gap or at a few specific points within it, interpolation allows us to estimate the values within the gap.
In the above resize() function, the interpolation flag determines the type of interpolation used for computing the pixel values of the destination image.
Types of Interpolation
The commonly used interpolation flags are −
INTER_NEAREST − a nearest-neighbour interpolation.
INTER_LINEAR − a bilinear interpolation (used by default).
INTER_AREA − resampling using pixel area relation, preferred for shrinking an image.
INTER_CUBIC − a bicubic interpolation over a 4x4 pixel neighbourhood.
INTER_LANCZOS4 − a Lanczos interpolation over an 8x8 pixel neighbourhood.
Example
Following code resizes the ‘messi.jpg’ image to half its original height and width.
import numpy as np
import cv2
img = cv2.imread('messi.JPG',1)
height, width = img.shape[:2]
res = cv2.resize(img,(int(width/2), int(height/2)), interpolation = cv2.INTER_AREA)
cv2.imshow('image',res)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
Rotate an image
OpenCV uses affine transformation functions for operations on images such as
translation and rotation. The affine transformation is a transformation that can be
expressed in the form of a matrix multiplication (linear transformation) followed by a
vector addition (translation).
Following program rotates the original image by 90 degrees without changing the
dimensions −
Example
import numpy as np
import cv2
img = cv2.imread('OpenCV_Logo.png',1)
h, w = img.shape[:2]
center = (w / 2, h / 2)
mat = cv2.getRotationMatrix2D(center, 90, 1)
rotimg = cv2.warpAffine(img, mat, (w, h))
cv2.imshow('original',img)
cv2.imshow('rotated', rotimg)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
Original Image
Rotated Image
In simple thresholding, all pixels with a value less than the threshold are set to zero and the rest to the maximum pixel value. This is the simplest form of thresholding.
The cv2.threshold() function has the following definition −
retval, dst = cv2.threshold(src, thresh, maxval, type)
Parameters
The parameters for the image thresholding are as follows −
src − Input array.
thresh − Threshold value.
maxval − Maximum value to use with the THRESH_BINARY and THRESH_BINARY_INV thresholding types.
type − Thresholding type, as listed below.
Types of Thresholding
Other types of thresholding are enumerated as below −
1. cv.THRESH_BINARY − dst(x,y) = maxval if src(x,y) > thresh, 0 otherwise
2. cv.THRESH_BINARY_INV − dst(x,y) = 0 if src(x,y) > thresh, maxval otherwise
3. cv.THRESH_TRUNC − dst(x,y) = thresh if src(x,y) > thresh, src(x,y) otherwise
4. cv.THRESH_TOZERO − dst(x,y) = src(x,y) if src(x,y) > thresh, 0 otherwise
5. cv.THRESH_TOZERO_INV − dst(x,y) = 0 if src(x,y) > thresh, src(x,y) otherwise
Following program produces a binary image from an original containing a gradient of grey values from 255 to 0, using a threshold of 127.
Example
Original and resultant threshold binary images are plotted side by side using
Matplotlib library.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img = cv.imread('gradient.png',0)
ret,img1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
plt.subplot(2,3,1),plt.imshow(img,'gray',vmin=0,vmax=255)
plt.title('Original')
plt.subplot(2,3,2),plt.imshow(img1,'gray',vmin=0,vmax=255)
plt.title('Binary')
plt.show()
Output
The adaptive thresholding determines the threshold for a pixel based on a small
region around it. So, different thresholds for different regions of the same image are
obtained. This gives better results for images with varying illumination.
The adaptiveMethod parameter has the following enumerated values −
cv.ADAPTIVE_THRESH_MEAN_C − The threshold is the mean of the neighbourhood area minus a constant C.
cv.ADAPTIVE_THRESH_GAUSSIAN_C − The threshold is a Gaussian-weighted sum of the neighbourhood values minus a constant C.
Example
In the example below, the original image (messi.jpg) is processed with mean and Gaussian adaptive thresholding.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img = cv.imread('messi.jpg',0)
img = cv.medianBlur(img,5)
th1 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C, cv.THRESH_BINARY,11,2)
th2 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY,11,2)
titles = ['Original', 'Mean Thresholding', 'Gaussian Thresholding']
images = [img, th1, th2]
for i in range(3):
   plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
   plt.title(titles[i])
   plt.xticks([]),plt.yticks([])
plt.show()
Output
Original as well as adaptive threshold binary images are plotted by using matplotlib
as shown below −
Example
The OTSU algorithm determines the threshold value automatically from the image histogram. We need to pass the cv.THRESH_OTSU flag in addition to the cv.THRESH_BINARY flag.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img = cv.imread('messi.jpg',0)
# global thresholding
ret1,img1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
# Otsu's thresholding
ret2,img2 = cv.threshold(img,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
plt.subplot(2,2,1),plt.imshow(img,'gray',vmin=0,vmax=255)
plt.title('Original')
plt.subplot(2,2,2),plt.imshow(img1,'gray')
plt.title('Binary')
plt.subplot(2,2,3),plt.imshow(img2,'gray')
plt.title('OTSU')
plt.show()
Output
The matplotlib’s plot result is as follows −
Image filtering is a process of averaging the pixel values so as to alter the shade,
brightness, contrast etc. of the original image.
By applying a low pass filter, we can remove any noise in the image. High pass filters
help in detecting the edges.
The OpenCV library provides cv2.filter2D() function. It performs convolution of the
original image by a kernel of a square matrix of size 3X3 or 5X5 etc.
Convolution slides a kernel matrix across the image matrix horizontally and vertically. For each placement, the pixels under the kernel are summed, their average is taken, and the central pixel is replaced with that average value.
Performing this operation for all pixels gives the output image pixel matrix. Refer to the diagram given below −
The cv2.filter2D() function requires the input array, kernel matrix and output array parameters −
dst = cv2.filter2D(src, ddepth, kernel)
Example
Following program uses this function to obtain an averaged image as a result of 2D convolution −
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('opencv_logo_gs.png')
kernel = np.ones((3,3),np.float32)/9
dst = cv.filter2D(img,-1,kernel)
plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst),plt.title('Convolved')
plt.xticks([]), plt.yticks([])
plt.show()
Output
Types of Filtering Function
Other types of filtering functions in OpenCV include −
MedianBlur − Instead of the average, it takes the median of all pixels under the kernel and replaces the central value, as shown in the sketch below.
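A minimal sketch of median filtering; the file name noisy.png is only a placeholder for any noisy input image −
import cv2

img = cv2.imread('noisy.png')
# replace each pixel with the median of the 5x5 neighbourhood around it
median = cv2.medianBlur(img, 5)
cv2.imshow('Original', img)
cv2.imshow('Median Blur', median)
cv2.waitKey(0)
cv2.destroyAllWindows()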
Canny edge detection algorithm was developed by John Canny. According to it,
object’s edges are determined by performing following steps −
First step is to reduce the noisy pixels in the image. This is done by applying 5X5
Gaussian Filter.
Second step involves finding the intensity gradient of the image. The smooth image
of the first stage is filtered by applying the Sobel operator to obtain first order
derivatives in horizontal and vertical directions (Gx and Gy).
The square root of the sum of squares of the derivatives gives the edge gradient, and the arc tangent of their ratio gives the direction of the edge −
Edge gradient $G = \sqrt{G_x^2 + G_y^2}$
Angle $\theta = \tan^{-1}\left(\frac{G_y}{G_x}\right)$
After getting gradient magnitude and direction, a full scan of the image is done to
remove any unwanted pixels which may not constitute the edge.
All these steps are performed by OpenCV’s cv2.Canny() function which needs the
input image array and minval and maxval parameters.
Example
Here’s the example of canny edge detection. The program for the same is as follows
−
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('lena.jpg', 0)
edges = cv.Canny(img,100,200)
plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edges of original Image'), plt.xticks([]), plt.yticks([])
plt.show()
Output
By using histogram, one can understand the contrast, brightness and intensity
distribution of the specified image. The bins in a histogram represent incremental
parts of the values on X axis.
In our case, it is the pixel value and the default bin size is one.
Parameters
The cv2.calcHist() function's parameters are as follows −
images − Source image, given as a list, e.g. [img].
channels − Index of the channel for which the histogram is calculated, also given as a list.
mask − Mask image, given as "None" for the full image. For a particular region of the image, you have to create a mask image for that and give it as the mask.
histSize − This represents our BIN count, given as a list, e.g. [256].
ranges − The range of pixel values, normally [0,256].
Example
Following program computes histogram for each channel in the image (lena.jpg) and
plots the intensity distribution for each channel −
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('lena.jpg')
color = ('b','g','r')
for i,col in enumerate(color):
   hist = cv.calcHist([img],[i],None,[256],[0,256])
   plt.plot(hist, color = col)
   plt.xlim([0,256])
plt.show()
Output
OpenCV Python - Color Spaces
A color space is a mathematical model describing how colours can be represented as a specific, measurable and fixed range of possible colors and luminance values.
HSV color space − H, S and V stand for Hue, Saturation and Value. This is
an alternative color model to RGB. This model is supposed to be closer to the
way a human eye perceives any colour. Hue value is between 0 to 179,
whereas S and V numbers are between 0 to 255.
CMYK color space − In contrast to RGB, CMYK is a subtractive color model.
The alphabets stand for Cyan, Magenta, Yellow and Black. White light minus
red leaves cyan, green subtracted from white leaves magenta, and white
minus blue returns yellow. All the values are represented on the scale of 0 to
100 %.
CIELAB color space − The LAB color space has three components: L for lightness, A for the color component ranging from green to magenta, and B for the color component ranging from blue to yellow.
YCrCb color space − Here, Cr stands for R-Y and Cb stands for B-Y. This
helps in separation of luminance from chrominance into different channels.
OpenCV supports conversion of image between color spaces with the help of
cv2.cvtColor() function.
Conversion Codes
The conversion is governed by following predefined conversion codes.
1. cv.COLOR_BGR2BGRA − Add alpha channel to RGB or BGR image.
2. cv.COLOR_BGRA2BGR − Remove alpha channel from RGB or BGR image.
3. cv.COLOR_BGR2GRAY − Convert between RGB/BGR and grayscale.
4. cv.COLOR_BGR2YCrCb − Convert RGB/BGR to luma-chroma.
5. cv.COLOR_BGR2HSV − Convert RGB/BGR to HSV.
6. cv.COLOR_BGR2Lab − Convert RGB/BGR to CIE Lab.
7. cv.COLOR_HSV2BGR − Backward conversion from HSV to RGB/BGR.
Example
Following program shows the conversion of original image with RGB color space to
HSV and Gray schemes −
import cv2
img = cv2.imread('messi.jpg')
img1 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY )
img2 = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Displaying the images
cv2.imshow('original', img)
cv2.imshow('Gray', img1)
cv2.imshow('HSV', img2)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
OpenCV Python - Morphological Transformations
Simple operations on an image based on its shape are termed as morphological
transformations. The two most common transformations are erosion and dilation.
Erosion
Erosion gets rid of the boundaries of the foreground object. Similar to 2D convolution, a kernel slides across the image. A pixel in the original image is retained only if all the pixels under the kernel are 1.
Otherwise it is made 0, which causes erosion. All the pixels near the boundary are discarded. This process is useful for removing white noise.
Parameters
The erode() function has the following usage −
cv2.erode(src, kernel[, dst[, anchor[, iterations]]])
The src and dst parameters are input and output image arrays of the same size. kernel is a matrix of structuring elements used for erosion, for example 3X3 or 5X5. The anchor parameter is -1 by default, which means the anchor element is at the center. iterations refers to the number of times erosion is applied.
Dilation
It is just the opposite of erosion. Here, a pixel element is 1, if at least one pixel
under the kernel is 1. As a result, it increases the white region in the image.
Parameters
The dilate() function has the same parameters as the erode() function. Both functions can take additional optional parameters such as borderType and borderValue.
Example
Given below is an example program showing erode() and dilate() functions in use −
import cv2 as cv
import numpy as np
img = cv.imread('LinuxLogo.jpg',0)
kernel = np.ones((5,5),np.uint8)
erosion = cv.erode(img,kernel,iterations = 1)
dilation = cv.dilate(img,kernel,iterations = 1)
cv.imshow('Original', img)
cv.imshow('Erosion', erosion)
cv.imshow('Dilation', dilation)
cv.waitKey(0)
cv.destroyAllWindows()
Output
Original Image
Erosion
Dilation
OpenCV Python - Image Contours
Contour is a curve joining all the continuous points along the boundary having the
same color or intensity. The contours are very useful for shape analysis and object
detection.
Find Contour
Before finding contours, we should apply threshold or canny edge detection. Then,
by using findContours() method, we can find the contours in the binary image.
Parameters
The findContours() function takes the binary source image, the contour retrieval mode and the contour approximation method as parameters, and returns the detected contours.
The approximation method can be one of the following −
cv.CHAIN_APPROX_NONE − Stores absolutely all the contour points.
cv.CHAIN_APPROX_SIMPLE − Compresses horizontal, vertical and diagonal segments and keeps only their end points.
Draw Contour
After detecting the contour vectors, contours are drawn over the original image by
using the cv.drawContours() function.
Parameters
The drawContours() function takes the target image, the list of contours, the index of the contour to draw (-1 to draw all contours), the color and the thickness as parameters.
Example
In the first step, we obtain a gray image and then perform the canny edge detection.
On the resultant image, we then call findContours() function. Its result is a point
vector. We then call the drawContours() function.
import cv2
import numpy as np
img = cv2.imread('shapes.png')
cv2.imshow('Original', img)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# detect edges and find the contours in the edge image (OpenCV 4.x return signature)
edged = cv2.Canny(gray, 30, 200)
cv2.imshow('Canny Edges', edged)
contours, hierarchy = cv2.findContours(edged, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# draw all the detected contours on the original image
cv2.drawContours(img, contours, -1, (0, 255, 0), 3)
cv2.imshow('Contours', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
The original image, after canny edge detection and one with contours drawn will be
displayed in separate windows as shown here −
After the canny edge detection, the image will be as follows −
The matchTemplate() function has the following usage −
cv2.matchTemplate(image, templ, method)
Where image is the input image in which the templ (template) pattern is to be located. The method parameter takes one of the following values −
cv.TM_CCOEFF, cv.TM_CCOEFF_NORMED, cv.TM_CCORR, cv.TM_CCORR_NORMED, cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED
This method slides the template image over the input image, in a process similar to convolution, and compares the template with the patch of the input image under it.
It returns a grayscale image, in which each pixel denotes how well the neighbourhood of that pixel matches the template. If the input image is of size (WxH) and the template image is of size (wxh), the output image will have a size of (W-w+1, H-h+1). Taking a location with a high match score as the top-left corner and the template size as the width and height gives the rectangle enclosing the matched region.
Example
In an example below, an image having Indian cricketer Virat Kohli’s face is used as a
template to be matched with another image which depicts his photograph with
another Indian cricketer M.S.Dhoni.
Following program uses a threshold value of 80% and draws a rectangle around the
matching face −
import cv2
import numpy as np
img = cv2.imread('Dhoni-and-Virat.jpg',1)
cv2.imshow('Original',img)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template = cv2.imread('virat.jpg',0)
cv2.imshow('Template',template)
h, w = template.shape[0], template.shape[1]
matched = cv2.matchTemplate(gray,template,cv2.TM_CCOEFF_NORMED)
threshold = 0.8
# locations where the match score exceeds the threshold
loc = np.where(matched >= threshold)
for pt in zip(*loc[::-1]):
   cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0,255,255), 2)
cv2.imshow('Matched with Template',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
The original image, the template and matched image of the result as follows −
Original image
The Gaussian pyramid is used to down sample images while the Laplacian pyramid
reconstructs an up sampled image from an image lower in the pyramid with less
resolution.
To produce a layer of the Gaussian pyramid, the image is first convolved with the following 5x5 Gaussian kernel −
$$\frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix}$$
Now remove every even-numbered row and column. Resulting image will be 1/4th
the area of its predecessor. Iterating this process on the original image produces the
entire pyramid.
To make the image bigger, new rows and columns filled with zeros are inserted. First, the image is upsized to double the original in each dimension, with the new even rows and columns filled with zeros, and then a convolution is performed with the kernel to approximate the values of the missing pixels.
The cv.pyrUp() function doubles the original size and cv.pyrDown() function
decreases it to half.
Example
Following program calls the pyrUp() or pyrDown() function depending on user input 'i' or 'o' respectively.
Note that when we reduce the size of an image, information is lost. Once we scale an image down and then rescale it to the original size, some information is lost and the resolution of the new image is much lower than that of the original.
import sys
import cv2 as cv
filename = 'chicky_512.png'
src = cv.imread(filename)
while 1:
   print ("press 'i' for zoom in, 'o' for zoom out, esc to stop")
   rows, cols, _channels = map(int, src.shape)
   cv.imshow('Pyramids', src)
   k = cv.waitKey(0)
   if k == 27:
      break
   elif k == ord('i'):
      src = cv.pyrUp(src, dstsize=(2 * cols, 2 * rows))
   elif k == ord('o'):
      src = cv.pyrDown(src, dstsize=(cols // 2, rows // 2))
cv.destroyAllWindows()
Output
OpenCV Python - Image Addition
When an image is read by the imread() function, the resultant image object is really a two or three dimensional matrix, depending on whether the image is grayscale or color.
Hence, the cv2.add() function adds two image matrices and returns another image matrix.
Example
Following code reads two images and performs their binary addition −
import cv2
kalam = cv2.imread('kalam.jpg')
einst = cv2.imread('einstein.jpg')
img = cv2.add(kalam, einst)
cv2.imshow('addition', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result
Instead of a linear binary addition, OpenCV has an addWeighted() function that performs a weighted sum of two arrays. The command for the same is as follows −
cv2.addWeighted(src1, alpha, src2, beta, gamma)
Parameters
The parameters of the addWeighted() function are as follows −
src1 − First input array.
alpha − Weight of the first array elements.
src2 − Second input array of the same size and channel number as src1.
beta − Weight of the second array elements.
gamma − Scalar added to each sum.
The image matrices obtained in the above example are used to perform the weighted sum.
By varying alpha from 0 to 1, a smooth transition takes place from one image to another, so that they blend together.
Here, the first image is given a weight of 0.3 and the second image a weight of 0.7. The gamma factor is taken as 0, as shown in the sketch below.
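A minimal sketch of this weighted addition, reusing the kalam and einst arrays loaded in the previous example −
img = cv2.addWeighted(kalam, 0.3, einst, 0.7, 0)
cv2.imshow('addWeighted', img)
cv2.waitKey(0)
cv2.destroyAllWindows()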
It can be seen that the image addition is smoother compared to binary addition.
Images can also be blended seamlessly by using Gaussian and Laplacian pyramids. First load the images and find Gaussian pyramids for both. The program for the same is as follows −
import cv2
import numpy as np,sys
kalam = cv2.imread('kalam.jpg')
einst = cv2.imread('einstein.jpg')
### generate Gaussian pyramid for first
G = kalam.copy()
gpk = [G]
for i in range(6):
   G = cv2.pyrDown(G)
   gpk.append(G)
# generate Gaussian pyramid for second
G = einst.copy()
gpe = [G]
for i in range(6):
   G = cv2.pyrDown(G)
   gpe.append(G)
From the Gaussian pyramids, obtain the respective Laplacian Pyramids. The program
for the same is as follows −
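(A sketch following the standard OpenCV pyramid-blending recipe; it assumes the image dimensions halve and double cleanly at every level.)
# Laplacian pyramid for the first image
lpk = [gpk[5]]
for i in range(5, 0, -1):
   GE = cv2.pyrUp(gpk[i])
   L = cv2.subtract(gpk[i-1], GE)
   lpk.append(L)
# Laplacian pyramid for the second image
lpe = [gpe[5]]
for i in range(5, 0, -1):
   GE = cv2.pyrUp(gpe[i])
   L = cv2.subtract(gpe[i-1], GE)
   lpe.append(L)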
Then, join the left half of the first image with the right half of second in each level of
pyramids. The program for the same is as follows −
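(A sketch of joining the halves; it builds the list LS used in the reconstruction step below.)
LS = []
for lk, le in zip(lpk, lpe):
   rows, cols, dpt = lk.shape
   # left half of the first image joined with the right half of the second
   ls = np.hstack((lk[:, 0:cols//2], le[:, cols//2:]))
   LS.append(ls)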
Finally, reconstruct the image from this joint pyramid. The program for the same is
given below −
ls_ = LS[0]
for i in range(1,6):
   ls_ = cv2.pyrUp(ls_)
   ls_ = cv2.add(ls_, LS[i])
cv2.imshow('RESULT',ls_)
Output
The blended result should be as follows −
OpenCV Python - Fourier Transform
The Fourier Transform is used to transform an image from its spatial domain to its frequency domain by decomposing it into its sine and cosine components.
In the case of digital images, the basic grayscale image values usually lie between 0 and 255. Therefore, the Fourier Transform also needs to be a Discrete Fourier Transform (DFT). It is used to find the frequency domain representation.
$$F(k, l) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} f(i, j)\, e^{-i2\pi\left(\frac{ki}{N} + \frac{lj}{N}\right)}$$
If the amplitude varies so fast in a short time, you can say it is a high frequency
signal. If it varies slowly, it is a low frequency signal.
In the case of images, the amplitude varies drastically at edge points and at noise. So edges and noise are high frequency content in an image. Where there is not much change in amplitude, it is a low frequency component.
OpenCV provides the functions cv.dft() and cv.idft() for computing the forward and inverse discrete Fourier transforms respectively.
Example
Given below is a program that uses Matplotlib to plot the original image and its magnitude spectrum −
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('lena.jpg',0)
dft = cv.dft(np.float32(img),flags = cv.DFT_COMPLEX_OUTPUT)
dft_shift = np.fft.fftshift(dft)
magnitude_spectrum = 20*np.log(cv.magnitude(dft_shift[:,:,0],dft_shift[:,:,1]))
plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()
Output
The VideoCapture() function needs a device index as its parameter. Your computer may have multiple cameras attached; they are enumerated by an index starting from 0 for the built-in webcam. The function returns a VideoCapture object.
cam = cv.VideoCapture(0)
After the camera is opened, we can read successive frames from it with the help of the read() function −
ret,frame = cam.read()
The read() function returns the next available frame along with a status value (True/False).
This frame is now rendered in desired color space with the cvtColor() function and
displayed on the OpenCV window.
To capture the current frame to an image file, you can use the imwrite() function.
cv2.imwrite("capture.png", img)
To save the live stream from the camera to a video file, OpenCV provides the VideoWriter() function −
cv2.VideoWriter(filename, fourcc, fps, framesize)
The fourcc parameter is a standardized code for video codecs. OpenCV supports various codecs such as DIVX, XVID, MJPG, X264 etc. The fps and framesize parameters depend on the video capture device.
The VideoWriter() function returns a VideoWriter stream object, to which the grabbed frames are successively written in a loop. Finally, release the camera and VideoWriter objects to finalize the creation of the video.
Example
Following example reads the live feed from the built-in webcam and saves it to an output.avi file.
import cv2 as cv
cam = cv.VideoCapture(0)
cc = cv.VideoWriter_fourcc(*'XVID')
file = cv.VideoWriter('output.avi', cc, 15.0, (640, 480))
if not cam.isOpened():
   print("error opening camera")
   exit()
while True:
   # Capture frame-by-frame
   ret, frame = cam.read()
   # if frame is read correctly ret is True
   if not ret:
      print("error in retrieving frame")
      break
   img = cv.cvtColor(frame, cv.COLOR_BGR2RGB)
   cv.imshow('frame', img)
   file.write(img)
   if cv.waitKey(1) == ord('q'):
      break
cam.release()
file.release()
cv.destroyAllWindows()
To read frames from a video file instead of a camera, pass the file name to the VideoCapture() function −
video=cv2.VideoCapture(file)
FFPyPlayer
FFPyPlayer is a Python binding for the FFmpeg library for playing and writing media files. To install it, use the pip installer utility with the following command −
pip install ffpyplayer
Following is the complete code for playing a video file along with its audio −
import cv2
from ffpyplayer.player import MediaPlayer
file = "sample.mp4"   # path to the video file to be played
video=cv2.VideoCapture(file)
player = MediaPlayer(file)
while True:
   ret, frame=video.read()
   audio_frame, val = player.get_frame()
   if not ret:
      print("End of video")
      break
   if cv2.waitKey(1) == ord("q"):
      break
   cv2.imshow("Video", frame)
   if val != 'eof' and audio_frame is not None:
      #audio
      img, t = audio_frame
video.release()
cv2.destroyAllWindows()
The read() function of the VideoCapture object returns the next available frame. It also gives a return value, which remains True till the end of the stream. Here, a counter is incremented inside the loop and used as the file name.
import cv2
import os
cam = cv2.VideoCapture("video.avi")
frameno = 0
while(True):
   ret,frame = cam.read()
   if ret:
      # if video is still left continue creating images
      name = str(frameno) + '.jpg'
      print ('new frame captured...' + name)
      cv2.imwrite(name, frame)
      frameno += 1
   else:
      break
cam.release()
cv2.destroyAllWindows()
First, ensure that all the required images are in a folder. Python's built-in glob module provides a glob() function that returns the list of files matching a pattern, so that we can iterate through it. Read each image from the folder and append it to an image array.
import cv2
import numpy as np
import glob
img_array = []
for filename in glob.glob('*.png'):
   img = cv2.imread(filename)
   height, width, layers = img.shape
   size = (width,height)
   img_array.append(img)
Then create a video stream by using the VideoWriter() function and write the contents of the image array to it. Given below is the program for the same −
# the codec and frame rate can be adjusted as needed
out = cv2.VideoWriter('video.avi', cv2.VideoWriter_fourcc(*'DIVX'), 15, size)
for i in range(len(img_array)):
   out.write(img_array[i])
out.release()
You should find the file named ‘video.avi’ in the current folder.
Pretrained classifiers for face, eye etc. can be downloaded from https://github.com
Example
The complete code for face detection is as follows −
import numpy as np
import cv2
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
img = cv2.imread('Dhoni-and-virat.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
for (x,y,w,h) in faces:
   img = cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
   roi_gray = gray[y:y+h, x:x+w]
   roi_color = img[y:y+h, x:x+w]
   eyes = eye_cascade.detectMultiScale(roi_gray)
   for (ex,ey,ew,eh) in eyes:
      cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)
cv2.imshow('img',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
You will get rectangles drawn around faces in the input image as shown below −
The kernel density estimate (KDE) will have regions of high and low data point density.
Meanshift is a very useful method to keep track of a particular object inside a video.
Every frame of the video is examined in the form of the pixel distribution in that frame. An initial window, the region of interest (ROI), is generally a square or a circle. Its position is specified by hardcoding, and the area of maximum pixel distribution is identified.
The ROI window moves towards the region of maximum pixel distribution as the
video runs. The direction of movement depends upon the difference between the
center of our tracking window and the centroid of all the k-pixels inside that window.
In order to use Meanshift in OpenCV, first find the histogram of the target (of which only the Hue channel is considered) and back project it on each frame for the calculation of Meanshift. We also need to provide an initial location of the ROI window.
We repeatedly calculate the back projection of the histogram and calculate the
Meanshift to get the new position of track window. Later on, we draw a rectangle
using its dimensions on the frame.
Functions used − cv.calcHist(), cv.calcBackProject() and cv.meanShift().
Example
import numpy as np
import cv2 as cv
cap = cv.VideoCapture('traffic.mp4')
ret,frame = cap.read()
# initial location of the tracking window (hardcoded ROI; adjust for your video)
x, y, w, h = 300, 200, 100, 50
tracker = (x, y, w, h)
roi = frame[y:y+h, x:x+w]
hsv_roi = cv.cvtColor(roi, cv.COLOR_BGR2HSV)
reg_hist = cv.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv.normalize(reg_hist, reg_hist, 0, 255, cv.NORM_MINMAX)
# terminate after 10 iterations or a movement of at least 1 point
criteria = (cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1)
while(1):
   ret, frame = cap.read()
   if ret == True:
      hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
      dst = cv.calcBackProject([hsv],[0],reg_hist,[0,180],1)
      # apply meanshift
      ret, tracker = cv.meanShift(dst, tracker, criteria)
      # Draw it on image
      x,y,w,h = tracker
      img = cv.rectangle(frame, (x,y), (x+w,y+h), 255,2)
      cv.imshow('img',img)
      if cv.waitKey(30) == 27:
         break
   else:
      break
cap.release()
cv.destroyAllWindows()
As the program is run, the Meanshift algorithm moves our window to the new
location with maximum density.
Output
Camshift
One of the disadvantages of Meanshift algorithm is that the size of the tracking
window remains the same irrespective of the object's distance from the camera.
Also, the window will track the object only if the object lies within the window's region. So, the initial window must be hardcoded manually, and it should be done carefully.
First, it finds an object center using meanShift and then adjusts the window size and
finds the optimal rotation. The function returns the object position, size, and
orientation. The position is drawn on the frame by using polylines() draw function.
Example
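A sketch of the modification, replacing the meanShift() call in the previous example; it assumes dst, tracker and criteria are set up as before −
# CamShift returns a rotated rectangle and the new tracking window
ret, tracker = cv.CamShift(dst, tracker, criteria)
pts = cv.boxPoints(ret)       # corner points of the rotated rectangle
pts = np.int32(pts)
img = cv.polylines(frame, [pts], True, 255, 2)
cv.imshow('img', img)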
Output
One snapshot of the result of modified program showing rotated rectangle of the
tracking window is as follows −
D. Lowe's Scale Invariant Feature Transform (SIFT) algorithm extracts keypoints and computes their descriptors. It involves the following major steps −
Scale-space Extrema Detection.
Keypoint Localization.
Orientation Assignment.
Keypoint Descriptor.
Keypoint Matching.
Example
First, construct a SIFT object with cv.SIFT_create(). Calling its detect() method obtains keypoints, which are then drawn on top of the original image. Following code implements this procedure −
import numpy as np
import cv2 as cv
img = cv.imread('home.jpg')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
kp = sift.detect(gray,None)
img=cv.drawKeypoints(gray,kp,img)
cv.imwrite('keypoints.jpg',img)
Output
The original image and the one with keypoints drawn are shown below −
Example
Following example matches ORB descriptors of two images using the brute-force matcher −
import numpy as np
import cv2
img1 = cv2.imread('lena.jpg')
img2 = cv2.imread('lena-test.jpg')
# Convert it to grayscale
img1_bw = cv2.cvtColor(img1,cv2.COLOR_BGR2GRAY)
img2_bw = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
orb = cv2.ORB_create()
# detect keypoints and compute ORB descriptors for both images
queryKeypoints, queryDescriptors = orb.detectAndCompute(img1_bw, None)
trainKeypoints, trainDescriptors = orb.detectAndCompute(img2_bw, None)
matcher = cv2.BFMatcher()
matches = matcher.match(queryDescriptors,trainDescriptors)
# draw the 20 best matches
matches = sorted(matches, key=lambda m: m.distance)
result = cv2.drawMatches(img1, queryKeypoints, img2, trainKeypoints, matches[:20], None)
cv2.imshow('Matches', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
OpenCV Python - Digit Recognition with KNN
KNN which stands for K-Nearest Neighbour is a Machine Learning algorithm based
on Supervised Learning. It tries to put a new data point into the category that is
most similar to the available categories. All the available data is classified into
distinct categories and a new data point is put in one of them based on the similarity.
The category with the maximum number of data points among the K nearest neighbours is the category in which the new data point is classified.
OpenCV's samples include a digits.png image containing 5000 handwritten digits, each of size 20x20 pixels. We split it into 20x20 cells to build the feature set −
import numpy as np
import cv2
image = cv2.imread('digits.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
fset=[]
for i in np.vsplit(gray,50):
   x=np.hsplit(i,100)
   fset.append(x)
NP_array = np.array(fset)
Now we divide this data into a training set and a testing set, each of shape (2500, 400), where every row is a flattened 20x20 cell −
trainset = NP_array[:,:50].reshape(-1,400).astype(np.float32)
testset = NP_array[:,50:100].reshape(-1,400).astype(np.float32)
Next, we have to create 10 different labels for each digit, as shown below −
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = np.repeat(k,250)[:,np.newaxis]
We are now in a position to start the KNN classification. Create the classifier object, train the data and obtain the output for the test set −
knn = cv2.ml.KNearest_create()
knn.train(trainset, cv2.ml.ROW_SAMPLE, train_labels)
# classify the test set using the 3 nearest neighbours
ret, output, neighbours, distance = knn.findNearest(testset, k=3)
Compare the output with test labels to check the performance and accuracy of the
classifier.
result = output==test_labels
correct = np.count_nonzero(result)
accuracy = (correct*100.0)/(output.size)
print(accuracy)