Optimizing Image Analysis For Speed

General Rules
Rule #1: Do not compute what you do not need.
• Use an image resolution well fitted to the task. The higher the resolution, the slower the
processing.
• Use the inRoi input of image processing filters to compute only the pixels that are needed
in further processing steps.
• If several image processing operations occur in sequence in a confined region, then it
might be better to use CropImage first.
• Do not overuse images of types other than UInt8 (8-bit).
• Do not use multi-channel images when no color information is being processed.
• If some computations can be done only once, move them before the main program
loop, or even to a separate program. Below is an example of a typical structure of
the "Main" macrofilter that implements this advice. There are two macrofilters: the
first one is responsible for once-only computations, and the second is a Task
implementing the main program loop:
Figure: Typical program structure separating precomputing from the main loop.
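The same separation of once-only work from the main loop can be sketched in plain Python. This is only an illustration of the structure, not Aurora Vision Studio code: the names build_gamma_lut, process_frame and main, and the choice of a gamma lookup table as the precomputed data, are assumptions made for the example.

```python
def build_gamma_lut(gamma):
    """Once-only step: precompute a 256-entry lookup table for 8-bit pixels."""
    return bytes(round(255 * (v / 255) ** gamma) for v in range(256))


def process_frame(frame, lut):
    """Per-iteration step: apply the precomputed table to every pixel."""
    return frame.translate(lut)


def main(frames, gamma=0.5):
    lut = build_gamma_lut(gamma)                     # computed once, before the loop
    return [process_frame(f, lut) for f in frames]   # main program loop


# Two tiny stand-in "images" stored as raw 8-bit pixel values.
frames = [bytes([0, 64, 255]), bytes([16, 16, 16])]
results = main(frames)
```

The table is built a single time regardless of how many frames are processed, so the cost of the power function is not paid again in every iteration.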
Rule #2: Prefer simple solutions.
• Do not use Template Matching if simpler techniques such as Blob Analysis or 1D
Edge Detection would suffice.
• Prefer pixel-precise image analysis techniques (Region Analysis) and Nearest
Neighbour (instead of Bilinear) image interpolation.
• Consider extracting higher-level information early in the program pipeline. For
example, it is much faster to process Regions than Images.
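One way to see why a region can be cheaper to process than an image is to compare a per-pixel mask with a run-length description of the same shape. The sketch below assumes a run-length representation purely for illustration; the function names are hypothetical and the source text does not describe how Regions are stored internally.

```python
def image_to_runs(mask, width):
    """Convert a flat, row-major binary mask into (row, x_start, length) runs."""
    runs = []
    for row in range(len(mask) // width):
        x = 0
        while x < width:
            if mask[row * width + x]:
                start = x
                while x < width and mask[row * width + x]:
                    x += 1
                runs.append((row, start, x - start))
            else:
                x += 1
    return runs


def area_from_image(mask):
    """Visits every pixel of the mask."""
    return sum(1 for p in mask if p)


def area_from_runs(runs):
    """Visits one entry per run, independent of the image size."""
    return sum(length for _, _, length in runs)


width = 8
mask = [0, 1, 1, 1, 1, 1, 0, 0,
        0, 0, 1, 1, 1, 0, 0, 0]
runs = image_to_runs(mask, width)
```

Here the 16-pixel mask collapses into two runs, so later steps such as computing the area touch two entries instead of sixteen pixels.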
Rule #3: Mind the influence of the user interface.
• Note that in the development environment, displaying data in the preview windows
takes considerable time. Choose Program » Previews Update Mode » Disable
Visualization to get performance closer to what you can expect in the runtime
environment.
• In the runtime environment, use the VideoBox control for image display. It is highly
optimized and can display hundreds of images per second.
• When using VideoBox controls, prefer the SizeMode: Normal setting, especially if
the image to be displayed is large. Also consider using DownsampleImage or
ResizeImage.
• Prefer the Update Data Previews Once an Iteration option.
• Mind the Diagnostic Mode. Turn it off whenever you need to test speed.
• Pay attention to the information provided by the Statistics window. Before optimizing
the program, make sure that you know what really needs optimizing.
Rule #4: Mind the influence of the data flow model.
Data flow programming allows for creating high-speed machine vision applications nearly as
well as standard C++ programming. This, however, assumes that high-level tools are used
and that image analysis is the main part of the program. On the other hand,
for low-level programming tasks, like using many simple filters to process high numbers of
pixels, points or small blobs, all interpreted languages will perform significantly slower than
C++.
• For performance-critical low-level programming tasks consider User Filters.
• Prefer formula blocks over arithmetic filters like AddIntegers or DivideReals.
• Use a lower number of higher-level filters (e.g. RotatePath) instead of a big number
of low-level filters or formulas (e.g. calculating coordinates of all individual points of
the path).
• Avoid using low-level filters (such as MergeDefault or ChooseByPredicate) with
non-primitive types such as Image or Region. These filters perform full copying of at least
one of the input objects. Prefer using Variant Step Macrofilters instead.
• Mind the connections with conversions (the arrow head with a dot). They involve
additional computations, which in some cases (e.g. RegionToImage) might take
some time. If the same conversion is used many times, then it might be better to use
the converting filter directly.
• A sequence of filters with array connections may produce a lot of data on the
outputs. If only the final result is important, then consider extracting a macrofilter that
will be executed in array mode as a whole, with only basic connections inside.
For example:
Figure: Before optimizing: There are several arrays of intermediate results.
Figure: After optimizing: Only the fin… consumption. Please note tha… Reg…
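The difference between the two structures can be sketched in plain Python. This is an analogy under stated assumptions, not Studio code: lists stand in for array connections, measure_one stands in for the extracted macrofilter, and the computation itself (circle areas scaled and rounded) is made up for the example.

```python
import math


def before(radii):
    """Each step runs in array mode and keeps a full intermediate array."""
    areas = [math.pi * r * r for r in radii]     # intermediate array 1
    scaled = [a * 0.01 for a in areas]           # intermediate array 2
    return [round(s, 2) for s in scaled]         # final array


def measure_one(r):
    """Body of the extracted step: only basic (scalar) connections inside."""
    area = math.pi * r * r
    scaled = area * 0.01
    return round(scaled, 2)


def after(radii):
    """The whole body runs once per element; only the final array is kept."""
    return [measure_one(r) for r in radii]


radii = [1.0, 2.5, 10.0]
```

Both versions return the same values, but the second one never holds more than one element's intermediate results at a time.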
Common Optimization Tips

Apart from the above general rules, there are also some common optimization tips related
to specific filters and techniques. Here is a checklist:
• Template Matching: Do not mark the entire object as the template region, but only
mark a small part having a unique shape.
• Template Matching: Prefer high pyramid levels, i.e. leave
inMaxPyramidLevel set to Auto, or set it to a high value such as 4 to 6.
• Template Matching: Prefer setting inEdgePolarityMode to a value other than Ignore
and inEdgeNoiseLevel to Low.
• Template Matching: Use as high values of the inMinScore input as possible.
• Template Matching: If you process high-resolution images, consider setting
inMinPyramidLevel to 1 or even 2.
• Template Matching: When creating template matching models, try to limit the range
of angles with the inMinAngle and inMaxAngle inputs.
• Template Matching: Do not expect high speed when allowing rotations and scaling at
the same time. Model creation can also take a long time or even fail with an "out of
memory" error.
• Template Matching: Consider limiting inSearchRegion. It might be set manually, but
sometimes it also helps to use Region Analysis techniques before Template
Matching.
• Template Matching: Decrease inEdgeCompleteness to achieve higher speed at the
cost of lower reliability. This might be useful when the pyramid cannot be made
higher due to loss of information.
• Do not use these filters in the main program
loop: CreateEdgeModel1, CreateGrayModel, TrainOcr_MLP, TrainOcr_SVM.
• If you always transform images in the same way, consider filters from the Image
Spatial Transforms Maps category instead of the ones from Image Spatial
Transforms.
• Do not use image local transforms with arbitrary shaped kernels:
DilateImage_AnyKernel, ErodeImage_AnyKernel, SmoothImage_Mean_AnyKernel.
Consider the alternatives without the "_AnyKernel" suffix.
• SmoothImage_Median can be particularly slow. Use Gaussian or Mean smoothing
instead, if possible.
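The effect of the pyramid-level tips in this checklist can be estimated with simple arithmetic. The sketch below assumes the usual pyramid construction in which each level halves both image dimensions; the exact downsampling used by the Template Matching filters is not stated here, and the function name and image size are hypothetical.

```python
def pyramid_sizes(width, height, max_level):
    """Return (level, width, height, pixel_count) for levels 0..max_level,
    assuming each level halves both dimensions (integer division)."""
    sizes = []
    for level in range(max_level + 1):
        w = max(1, width >> level)
        h = max(1, height >> level)
        sizes.append((level, w, h, w * h))
    return sizes


for level, w, h, n in pyramid_sizes(2048, 1536, 4):
    print(f"level {level}: {w} x {h} = {n} pixels")
```

Under this assumption every additional level reduces the number of pixels to search by a factor of four, which is why a higher inMaxPyramidLevel tends to pay off as long as the template remains recognizable at that resolution.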
Application Warm-Up (Advanced)

An important practical issue in industrial applications with triggered cameras is that the first
iteration of a program must often already be executed at full speed. There are, however,
additional computations performed in the first iterations that have to be taken into account:
• Memory buffers (especially images) for output data are allocated.
• Memory buffers get loaded into the cache memory.
• External DLL libraries get delay-loaded by the operating system.
• Modern CPU mechanisms, like branch prediction, get trained.
• Connections with external devices (e.g. cameras) get established.
• Some filters, especially ones from 1D Edge Detection and Shape Fitting,
precompute some data.
These effects result both from the simplified data-flow programming model and from the
modern architectures of computers and operating systems. Some, but not all, of
them can be solved with the use of Aurora Vision Library (see: When to use Aurora Vision
Library?). There is, however, an idiom that might also be useful with Aurora Vision Studio.
It is called "Application Warm-Up" and consists of performing one or a couple of iterations
on recorded test images before the application switches to the operational stage. This can
be achieved with the following "GrabImage" variant macrofilter:
Figure: An example "GrabImage" macrofilter designed for application warming-up.
The "GrabImage" variant macrofilter shown above is an example of how application warm-up
can be achieved. It starts its operation in the "WarmUp" variant, where it initializes the
camera and produces a test image loaded from a file (which has exactly the same
resolution and format as the images acquired from the camera). Then it switches to the
"Work" variant, where the standard image acquisition filter is used. There is also an additional
output, outIsWarmingUp, that can be used, for example, to suppress the output signals in the
warming-up stage.
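The control flow of this idiom can be sketched in plain Python. The sketch is a rough analogy only: grab_images and run are hypothetical names, strings stand in for images, camera initialization is omitted, and the number of warm-up iterations is an assumed parameter.

```python
def grab_images(camera_frames, warmup_image, warmup_iterations=2):
    """Yield (image, is_warming_up) pairs.

    The recorded test image is produced first (the "WarmUp" stage),
    then the real frames follow (the "Work" stage).
    """
    for _ in range(warmup_iterations):
        yield warmup_image, True
    for frame in camera_frames:
        yield frame, False


def run(camera_frames, warmup_image, inspect):
    outputs = []
    for image, is_warming_up in grab_images(camera_frames, warmup_image):
        result = inspect(image)      # full processing runs in both stages
        if not is_warming_up:        # output signals are suppressed while warming up
            outputs.append(result)
    return outputs


outputs = run(["frame1", "frame2"], "recorded", inspect=str.upper)
```

The important point is that the inspection code is exercised on the recorded image before the first real frame arrives, while the flag keeps warm-up results away from the outputs.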
Configuring Parallel Computing

The filters of Aurora Vision Studio internally use multiple threads to utilize the full power of
multi-core processors. By default they use as many threads as there are physical
processors. This is the best setting for the majority of applications, but in some cases another
number of threads might result in faster execution. If you need maximum performance, it is
advisable to experiment with the ControlParallelComputing filter with both higher and lower
numbers of threads. In particular:
• If the number of threads is higher than the number of physical processors, then it is
possible to utilize the Hyper-Threading technology.
• If the number of threads is lower than the number of physical processors (e.g. 3
threads on a quad-core machine), then the system has at least one core available
for background threads (like image acquisition, GUI or computations performed by
other processes), which may improve its responsiveness.
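Such an experiment could be organized as in the sketch below. It is written in plain Python and does not call ControlParallelComputing: candidate_thread_counts and pick_fastest are hypothetical helpers, two hardware threads per core is an assumption, and the timings are made-up placeholders standing in for real measurements on the target machine.

```python
def candidate_thread_counts(physical_cores, threads_per_core=2):
    """Thread counts worth trying: one core left free, one thread per core,
    and one thread per logical processor (Hyper-Threading)."""
    candidates = {max(1, physical_cores - 1),
                  physical_cores,
                  physical_cores * threads_per_core}
    return sorted(candidates)


def pick_fastest(measure_ms, candidates):
    """Run the measurement for each candidate and return the fastest one."""
    timings = {n: measure_ms(n) for n in candidates}
    return min(timings, key=timings.get), timings


# Stand-in for a real measurement of one program iteration (made-up numbers).
fake_timings = {3: 12.5, 4: 11.0, 8: 11.8}
best, timings = pick_fastest(fake_timings.__getitem__, candidate_thread_counts(4))
```

In practice measure_ms would run the actual program with the given thread count several times and return an average, since a single run is rarely representative.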
Configuring Image Memory Pools

Memory allocation is among the significant factors affecting filter performance. Most of the
filters available in Aurora Vision Studio re-use their memory buffers between consecutive
iterations, which is highly beneficial for their performance. Some filters, however, still
allocate temporary image buffers, because doing otherwise would make them less
convenient to use. To overcome this limitation, there is the
ControlImageMemoryPools filter, which can turn on a custom memory allocator for
temporary images.
There is also a way to pre-allocate image memory before the first iteration of the program
starts. For this purpose use the InspectImageMemoryPools filter at the end of the program,
and, after the program is executed, copy its outPoolSizes value to the input of
a ChargeImageMemoryPools filter executed at the beginning. In some cases this will
improve performance of the first iteration.
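The inspect-then-charge workflow can be illustrated with a toy buffer pool in plain Python. This is a conceptual sketch only: the BufferPool class, its methods and the 640 x 480 buffer size are assumptions for the example and do not reflect how the Studio memory pools are actually implemented.

```python
from collections import defaultdict


class BufferPool:
    """Toy pool that re-uses byte buffers of a given size."""

    def __init__(self):
        self._free = defaultdict(list)   # size -> list of idle buffers
        self.allocations = 0             # allocations made on demand

    def acquire(self, size):
        if self._free[size]:
            return self._free[size].pop()
        self.allocations += 1
        return bytearray(size)

    def release(self, buf):
        self._free[len(buf)].append(buf)

    def inspect(self):
        """Report how many idle buffers of each size the pool holds."""
        return {size: len(bufs) for size, bufs in self._free.items() if bufs}

    def charge(self, pool_sizes):
        """Pre-allocate buffers according to a previously inspected report."""
        for size, count in pool_sizes.items():
            for _ in range(count):
                self.release(bytearray(size))


def one_iteration(pool):
    a = pool.acquire(640 * 480)
    b = pool.acquire(640 * 480)
    pool.release(a)
    pool.release(b)


# Run 1: execute the program once and record the pool sizes at the end.
learning = BufferPool()
one_iteration(learning)
sizes = learning.inspect()

# Run 2: charge the pool up front, so the first iteration allocates nothing.
charged = BufferPool()
charged.charge(sizes)
one_iteration(charged)
```

The first run pays for two allocations inside the iteration, while the charged run performs them before the iteration starts, which is the effect the pre-allocation step aims for.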
Using GPGPU/OpenCL Computing

Some filters of Aurora Vision Studio can move computations to an OpenCL-capable
device, like a graphics card, in order to speed up execution. After proper initialization,
OpenCL processing is performed completely automatically by suitable filters without
changing their use pattern. Refer to the "Hardware Acceleration" section of the filter
documentation to find which filters support OpenCL processing and what their
requirements are. Be aware that the resulting performance after switching to an OpenCL device
may vary and may not always be a significant improvement relative to CPU processing.
Actual performance of the filters must always be verified on the target system by proper
measurements.
To use OpenCL processing in Aurora Vision Studio the following is required:
• a processing device installed in the target system supporting the OpenCL C language in
version 1.1 or greater,
• a proper and up-to-date device driver installed in the system,
• a proper OpenCL runtime software provided by its vendor.
OpenCL processing is supported, for example, in the following
filters: RgbToHsi, HsiToRgb, ImageCorrelationImage, DilateImage_AnyKernel.
To enable OpenCL processing in filters, an InitGPUProcessing filter must be executed at the
beginning of a program. Please refer to that filter's documentation for further information.
When to use Aurora Vision Library?

Aurora Vision Library is a separate product for C++ programmers. The performance of
the functions it provides is roughly the same as that of the filters provided by Aurora Vision
Studio. There are, however, some important cases when the overall performance of the
compiled code is better.
Case 1: High number of simple operations
There is an overhead of about 0.004 ms on each filter execution in Studio. That value may
seem very small, but if we consider an application which analyzes 50 blobs in each iteration
and executes 20 filters for each blob, then it may sum up to a total of 4 ms. This may
already be non-negligible. If this is only a small part of a bigger application, then User
Filters might be the right solution. If, however, this is how the entire application works, then
the library should be used instead.
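The estimate in Case 1 is a simple product, which the following sketch reproduces so that it can be repeated for other workloads. The function name is hypothetical; the 0.004 ms default is the approximate per-filter overhead quoted above.

```python
def filter_overhead_ms(objects_per_iteration, filters_per_object,
                       overhead_per_filter_ms=0.004):
    """Total per-iteration overhead caused by filter executions alone."""
    return objects_per_iteration * filters_per_object * overhead_per_filter_ms


# 50 blobs, 20 filters per blob, as in the example above.
total = filter_overhead_ms(50, 20)
print(f"{total:.1f} ms per iteration")
```

Comparing this figure with the time budget of a single iteration gives a quick indication of whether User Filters or the library are worth considering.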
Case 2: Memory re-use for big images
Each filter in Aurora Vision Studio keeps its output data on the output ports. Consecutive
filters do not re-use this memory, but instead create new data. This is very convenient for
effective development of algorithms, as the user can see all intermediate results. However, if
the application performs complex processing of very big images (e.g. from 10 megapixel or
line-scan cameras), then the issue of memory re-use might become critical. Aurora Vision
Library may then be useful, because only at the level of C++ programming can the user
have full control over the memory buffers.
Aurora Vision Library also makes it possible to perform in-place data processing, i.e.
modifying the input data directly instead of creating new objects. Many simple image
processing operations can be performed in this way. In particular, the Image
Drawing functions and image transformations in small regions of interest may get a
significant performance boost.
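The distinction between creating a new object and processing in place can be shown with a small Python sketch. It uses a bytearray as a stand-in for an image buffer; the function names are hypothetical and the code does not use the Aurora Vision Library API.

```python
def invert_copy(pixels):
    """Creates a new buffer for the result; the input stays untouched."""
    return bytearray(255 - p for p in pixels)


def invert_in_place(pixels, start, stop):
    """Overwrites pixels[start:stop] directly; no new image buffer is created."""
    for i in range(start, stop):
        pixels[i] = 255 - pixels[i]


image = bytearray([0, 10, 20, 30, 40, 50])
copy = invert_copy(image)       # allocates a second buffer of the full size
invert_in_place(image, 2, 4)    # touches only a small region of interest
```

The in-place variant does work proportional to the region of interest rather than to the whole image, and it needs no additional memory.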
Case 3: Initialization before first iteration
Filters of Aurora Vision Studio get initialized in the first iteration. This is, for example, when
the image memory buffers are allocated, because before the first image is acquired, the
filters do not know how much memory they will need. Sometimes, however, the application
can be optimized for specific conditions and it is important that the first iteration is not any
slower. At the level of C++ programming this can be achieved with preallocated memory
buffers and with separate initialization of some filters (especially for 1D Edge
Detection and Shape Fitting filters, as well as for image acquisition and I/O interfaces). See
also: Application Warm-Up.
https://docs.adaptive-vision.com/current/studio/programming_tips/OptimizingImageAnalysis.html