Embedded Vision
2023/2024
Embedded vision block diagram
As with other embedded systems, popular single board computers (SBCs) such as the
Raspberry Pi are available on the market for embedded vision product development.
The Raspberry Pi is a mini computer with established interfaces and offers a similar
range of features as a classic PC or laptop. Embedded vision solutions can also be
implemented with so-called system on modules (SoMs) or computer on modules
(CoMs). These modules represent a complete computing unit. To adapt the desired
interfaces to the respective application, an individual carrier board is needed; it is
connected to the SoM via specific connectors and can be designed and manufactured
relatively simply. SoMs and CoMs (and thus the entire system) are cost-effective on
the one hand, since they are available off the shelf, while on the other hand they can
be individually customized through the carrier board. For large production
quantities, individual processing boards are a good idea.
All of these modules, single board computers, and SoMs are based on a system on
chip (SoC), a component that integrates the processor(s), controllers, memory,
power management, and other components on a single chip. It is thanks to these
efficient SoCs that embedded vision systems have only recently become available in
such a small size and at such low cost.
Embedded system boards
Most of the previously mentioned single board computers and SoMs do not use the
x86-family processors common in standard PCs. Rather, their CPUs are often based
on the ARM architecture. In the world of ARM processors, the open-source Linux
operating system is widely used, and for Linux there are a large number of open-
source application programs as well as numerous freely available program libraries.
Increasingly, however, x86-based single board computers are also spreading. A
consistently important criterion for the computer is the space available in the
embedded system.
For the software developer, program development for an embedded system differs
from that for a standard PC. As a rule, the target system does not provide a suitable
user interface that can also be used for programming. The software developer must
connect to the embedded system via an appropriate interface, if available (e.g., a
network interface), or develop the software on a standard PC and then transfer it to
the target system. When developing the software, it should be noted that the
hardware concept of the embedded system is oriented to a specific application and
thus differs significantly from the universally usable PC. However, the boundary
between embedded and desktop computer systems is sometimes difficult to draw.
Just think of the mobile phone, which on the one hand has many features of an
embedded system (ARM-based, single-board construction), but on the other hand
can cope with very different tasks and is therefore a universal computer.
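To make this cross-development workflow concrete, here is a minimal sketch of such a program. It assumes a Linux target with OpenCV available; the cross-compiler invocation and the hostname "target" in the comments are illustrative, not tied to a specific board.

// grab.cpp: minimal camera test, written and built on a development PC,
// then copied to the ARM target over the network interface.
//
// Example (hypothetical) cross-build and deployment:
//   arm-linux-gnueabihf-g++ grab.cpp -o grab $(pkg-config --cflags --libs opencv4)
//   scp grab user@target:~/
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::VideoCapture cam(0);              // first camera on the target board
    if (!cam.isOpened()) {
        std::cerr << "no camera found\n";
        return 1;
    }
    cv::Mat frame;
    cam >> frame;                         // grab a single frame
    cv::imwrite("frame.png", frame);      // save to disk: no display attached
    std::cout << "saved " << frame.cols << "x" << frame.rows << " frame\n";
    return 0;
}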
This technology category includes any device that executes vision algorithms or
vision system control software. There are distinctly different types of processor
architectures available for embedded vision, and each has advantages and trade-offs
that depend on the workload. For this reason, many devices combine multiple
processor types into a heterogeneous computing environment, often integrated into
a single semiconductor component. In addition, a processor can be accelerated by
dedicated hardware that improves performance on computer vision algorithms.
Many such chips include specialized coprocessors and accelerators to implement the
most demanding processing tasks in the application. These coprocessors and
accelerators are typically not programmable by the chip user, however. This trade-off
is often acceptable in wireless applications, where standards mean that there is
strong commonality among the algorithms used by different equipment designers.
Demanding vision applications therefore often pair a general-purpose CPU for
control and decision-making with one or more highly parallel engines for pixel-rate
processing with simple algorithms.
While any processor can in theory be used for embedded vision, the most
promising types today are:
- Embedded CPUs
- Application-specific standard products (ASSPs) in combination with CPUs
- Digital signal processors (DSPs) with accelerators and a CPU
- Field programmable gate arrays (FPGAs) with a CPU
- Mobile "application processors"
Embedded CPUs
Vision algorithms typically consume large amounts of memory bandwidth, and the
memory systems of embedded CPUs are not designed for these kinds of data flows.
However, like most types of processors, embedded CPUs become more powerful
over time, and in some cases can provide adequate performance. There are some
compelling reasons to run vision algorithms on a CPU when possible. First, most
embedded systems need a CPU for a variety of other functions; if the required vision
functionality can be implemented on that CPU, the need for an additional processor
is avoided. In addition, most vision algorithms are initially developed on PCs using
general-purpose CPUs and their associated software development tools.
Similarities between PC CPUs and embedded CPUs (and their associated tools)
mean that it is typically easier to create embedded implementations of vision
algorithms on embedded CPUs compared to other kinds of embedded vision
processors. In addition, embedded CPUs are typically the easiest of these processors
to use, thanks to their relatively straightforward architectures, sophisticated tools,
and other application development infrastructure, such as operating systems.
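As an illustration of this PC-first development style, the following sketch (assuming OpenCV; the file names are placeholders for the example) runs a complete edge-detection step on a general-purpose CPU. Because embedded CPUs share tools and conventions with PC CPUs, such code can typically be rebuilt for the target with little or no change.

// edges.cpp: a typical first prototype, developed and debugged on a PC CPU.
#include <opencv2/opencv.hpp>

int main() {
    // "input.png" is a placeholder test image for this example
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    cv::Mat blurred, edges;
    cv::GaussianBlur(img, blurred, cv::Size(5, 5), 1.5);  // suppress noise
    cv::Canny(blurred, edges, 50, 150);                   // detect edges
    cv::imwrite("edges.png", edges);
    return 0;
}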
Application-Specific Standard Products (ASSPs) in Combination with CPUs
ASSPs are highly integrated chips tailored for a specific application or set of
applications. This specialization typically yields strong performance and energy
efficiency, but it also means that an ASSP designed for one application is usually not
suitable for another, even one closely related to the target application. ASSPs use
unique architectures, and this can make programming them more difficult than
other kinds of processors. Indeed, some ASSPs are not user-programmable. Another
consideration is risk: ASSPs are often delivered by small suppliers, which may
increase the risk of difficulty in supplying the chip, or in delivering successor
products that let system designers upgrade their designs without having to start
from scratch. An example of a vision-oriented ASSP is the PrimeSense PS1080-A2,
used in the Microsoft Kinect.
Digital Signal Processors (DSPs) with Accelerators and a CPU
Digital signal processors ("DSP processors" or "DSPs") are microprocessors
specialized for signal processing algorithms and applications. This specialization
typically makes DSPs more efficient than general-purpose CPUs for the kinds of
signal processing tasks that are at the heart of vision applications. In addition, DSPs
are relatively mature and easy to use compared to other kinds of parallel processors.
Unfortunately, while DSPs do deliver higher performance and efficiency than
general-purpose CPUs on vision algorithms, they often fail to deliver sufficient
performance for demanding algorithms. For this reason, DSPs are often
supplemented with one or more coprocessors. A typical DSP chip for vision
applications therefore comprises a CPU, a DSP, and multiple coprocessors. This
heterogeneous combination can yield excellent performance and efficiency, but can
also be difficult to program. Indeed, DSP vendors typically do not enable users to
program the coprocessors; rather, the coprocessors run software function libraries
developed by the chip supplier. An example of a DSP targeting video applications is
the Texas Instruments DM8168.
Many DSPs for vision have been enhanced with coprocessors that are optimized for
processing video inputs and for accelerating computer vision algorithms. The
specialized nature of DSPs makes these devices inefficient for processing general-
purpose software workloads, so DSPs are usually paired with a RISC processor to
create a heterogeneous computing environment that offers the best of both worlds.
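To illustrate the kind of workload involved, here is a plain C++ reference version of a 3x3 convolution, the sort of regular multiply-accumulate inner loop these coprocessors execute; on a real DSP chip this routine would normally be a call into the supplier's optimized function library rather than hand-written code. The function name and interface are chosen for the example.

// Reference 3x3 convolution over an 8-bit grayscale image: the regular,
// multiply-accumulate-heavy pixel-rate loop that DSPs and their
// coprocessors are designed to execute efficiently.
#include <cstdint>
#include <vector>
#include <algorithm>

void convolve3x3(const std::vector<uint8_t>& src, std::vector<uint8_t>& dst,
                 int width, int height, const int16_t kernel[9], int shift) {
    dst.assign(src.size(), 0);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int acc = 0;
            for (int ky = -1; ky <= 1; ++ky)        // nine multiply-accumulates
                for (int kx = -1; kx <= 1; ++kx)    // per output pixel
                    acc += kernel[(ky + 1) * 3 + (kx + 1)] *
                           src[(y + ky) * width + (x + kx)];
            acc >>= shift;                          // fixed-point rescaling
            dst[y * width + x] = (uint8_t)std::clamp(acc, 0, 255);
        }
    }
}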
Field Programmable Gate Arrays (FPGAs) with a CPU
Instead of incurring the high cost and long lead times of a custom ASIC to accelerate
computer vision systems, designers can use an FPGA as a reprogrammable solution
for hardware acceleration. With millions of programmable gates, hundreds of I/O
pins, and compute performance in the trillions of multiply-accumulates per second
(tera-MACs), high-end FPGAs offer the potential for the highest performance in a
vision system. Unlike a CPU, which has to time-slice or multi-thread tasks as they
compete for compute resources, an FPGA can simultaneously accelerate multiple
portions of a computer vision pipeline. Since the parallel nature of FPGAs offers so
much advantage for accelerating computer vision, many of the algorithms are
available as optimized libraries from semiconductor vendors. These computer vision
libraries also include preconfigured interface blocks for connecting to other vision
devices, such as IP cameras.
Field programmable gate arrays (FPGAs) are flexible logic chips that can be reconfigured at the
gate and block levels. This flexibility enables the user to craft computation structures that are
tailored to the application at hand. It also allows selection of I/O interfaces and on-chip
peripherals matched to the application requirements. The ability to customize compute structures,
coupled with the massive amount of resources available in modern FPGAs, yields high
performance coupled with good cost and energy efficiency.
However, using FPGAs is essentially a hardware design activity rather than a
software development activity. FPGA design is typically performed using hardware
description languages (Verilog or VHDL) at the register transfer level (RTL), a very
low level of abstraction. This makes FPGA design time-consuming and expensive
compared to using the other types of processors discussed here.
However, using FPGAs is getting easier, due to several factors. First, so-called "IP
block" libraries (libraries of reusable FPGA design components) are becoming
increasingly capable. In some cases, these libraries directly address vision
algorithms. In other cases, they enable supporting functionality, such as video I/O
ports or line buffers. Second, FPGA suppliers and their partners increasingly offer
reference designs: reusable system designs incorporating FPGAs and targeting
specific applications.
Third, high-level synthesis tools, which enable designers to implement vision and other
algorithms in FPGAs using high-level languages, are increasingly effective. Relatively low-
performance CPUs can be implemented by users in the FPGA. In a few cases, high-performance
CPUs are integrated into FPGAs by the manufacturer. An example FPGA that can be used for
vision applications is the Xilinx Spartan-6 LX150T.
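To give a flavor of what high-level synthesis input looks like, here is a sketch of a streaming pixel threshold in C++. The pipeline pragma is written in Xilinx/Vitis HLS style; the function name and interface are illustrative only, not from a vendor library.

// thresh_stream: HLS-style C++ that a high-level synthesis tool can turn
// into an FPGA pipeline. With an initiation interval of 1 (II=1), the
// generated hardware consumes one pixel per clock cycle.
#include <cstdint>

void thresh_stream(const uint8_t* in, uint8_t* out,
                   int num_pixels, uint8_t threshold) {
    for (int i = 0; i < num_pixels; ++i) {
#pragma HLS PIPELINE II=1   // vendor-specific pragma (Vitis HLS style)
        out[i] = (in[i] > threshold) ? 255 : 0;   // binarize the pixel
    }
}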
Other Semiconductor Devices for Embedded Vision
Embedded vision applications involve more than just programmable devices and
image sensors; they also require other components to create a complete system.
Most applications must move pixel data and/or metadata, and many designs
interface directly with the user.
Some computer vision systems also connect to mechanical devices, such as robots or industrial
control systems.
The list of devices in this "other" category includes a wide range of standard
products. In addition, some system designers may incorporate programmable logic
devices or ASICs. In many vision systems, power, space, and cost constraints
demand high levels of integration, with the programmable device often absorbed
into a system-on-a-chip (SoC). Sensors for external parameters or environmental
measurements are discussed in a separate chapter.
Memory
Processors can integrate megabytes' worth of SRAM and DRAM, so many designs
will not require off-chip memory. However, computer vision algorithms often
require multiple frames of sensor data in order to track objects. Off-chip memory
devices can store gigabytes of data, although accessing external memory can add
hundreds of cycles of latency. Systems with a 3D-graphics subsystem will usually
already include substantial amounts of external memory to store the frame buffer,
textures, Z-buffer, and so on. Sometimes this graphics memory resides in a
dedicated, fast memory bank built from specialized DRAMs.
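As a rough sizing exercise (the frame dimensions, pixel depth, and frame count below are example values), the snippet shows how quickly multi-frame buffering outgrows on-chip SRAM:

// Estimate the buffer needed to track objects across several video frames.
#include <cstdio>

int main() {
    const long width = 1920, height = 1080;   // example: Full HD sensor
    const long bytes_per_pixel = 2;           // e.g., 16-bit raw or YUV422
    const long frames = 4;                    // frames kept for tracking

    long bytes = width * height * bytes_per_pixel * frames;
    // ~16 MB here: far beyond typical on-chip SRAM, hence off-chip DRAM
    std::printf("buffer: %.1f MB\n", bytes / (1024.0 * 1024.0));
    return 0;
}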
Some vision implementations store video data locally, in order to reduce the amount of
information that needs to be sent to a centralized system.
For a solid-state, nonvolatile storage system, the storage density is driven by the size
of the flash memory chips. The latest generation of NAND fabrication technologies
allows extremely large, fast, and low-power storage in a vision system.
Networking and bus interfaces
Mainstream computer networking and bus technology has finally started to catch up
with the needs of computer vision for carrying simultaneous digital video streams.
With economies of scale, more vision systems will use standard buses like PCI and
PCI Express. For networking, Gigabit Ethernet (GbE) and 10GbE interfaces offer
sufficient bandwidth even for multiple high-definition video streams. However, the
Automated Imaging Association (AIA), the trade association for machine vision,
continues to promote Camera Link, and many camera and frame grabber
manufacturers use this interface.
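A quick link-budget check (the resolution, frame rate, and pixel depth are example values) shows why these interface choices matter:

// Compare a raw video stream's bandwidth against common link capacities.
#include <cstdio>

int main() {
    const double width = 1920, height = 1080, fps = 30;
    const double bits_per_pixel = 16;          // e.g., YUV422
    const double gbps = width * height * fps * bits_per_pixel / 1e9;

    // ~1.0 Gbps raw: one stream nearly fills a GbE link, while 10GbE
    // (or compression) leaves room for multiple high-definition streams.
    std::printf("raw 1080p30 stream: %.2f Gbps\n", gbps);
    return 0;
}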