FPGA Paper PDF
Abstract
Many linear control applications require real-time operation; higher-density
programmable logic devices such as field programmable gate arrays (FPGAs) can be used
to integrate large amounts of logic in a single IC. This work proposes a method for
designing a PD controller (PDC) with optimal gains using an FPGA. The controller is
built as a digital design in which the proportional and derivative parts operate in
parallel and are combined through a summer. The proposed design is a 32-bit FPGA-based
controller (32PDC), which uses 32 bits for each input/output variable. A single robot
joint is used to test the controller in simulation, with the design written in VHDL
and simulated in Xilinx tools. The same design is coded in the MATLAB environment
(MPDC) for comparison with the proposed FPGA-based design. The PDC
needs 16 clock cycles to complete one action at a maximum frequency of 108.5 MHz.
The 32PDC is able to produce an output at 13.24 MHz with the robot system. Therefore, the
proposed controller will be able to control a wide range of systems with a high
sampling rate and a delay of 75.545 ns.
1. Introduction
A controller is a device that senses information from a linear or nonlinear system
(e.g., a robot manipulator) in order to improve the system performance [1]. The main goals
in designing control systems are stability, good disturbance rejection, and small tracking
error [2]. Several industrial robot manipulators are controlled by linear methodologies (e.g.,
Proportional-Derivative (PD), Proportional-Integral (PI), or
Proportional-Integral-Derivative (PID) controllers) that are realized as a computer
program. However, most linear control applications require real-time operation under
high-speed constraints. Therefore, this common approach cannot be considered a suitable
solution for this type of application. Higher-density programmable logic devices such as
FPGAs can be used to integrate large amounts of logic in a single IC. FPGAs provide
more flexibility than ASICs, and they can be used with tighter time-to-market
schedules [3].
A Field Programmable Gate Array (FPGA) is similar to a PLD, but whereas PLDs are
generally limited to hundreds of gates, FPGAs support thousands of gates. They are
especially popular for prototyping integrated circuit designs. Once the design is set,
hardwired chips are produced for faster performance. FPGAs are divided into two major
categories:
SRAM-based FPGAs
Antifuse-based FPGAs
Despite many advances in dedicated digital control processing (DCP)
microcontrollers, the efficiency with which any FPGA hardware implementation is able
to execute the long-standing metric of multiply and accumulate remains the
limiting factor in overall performance. The specialized and constrained RISC-like
instruction sets that give FPGAs such as the Xilinx SPARTAN 3E family
their increased efficiency, oddly enough, also lead to their performance ceilings for FPGA
applications. First, it is incredibly difficult to produce efficient assembly and machine
code for these specialized instruction sets, and thus FPGAs require a powerful, specially
optimized compiler (typically C). Second, and most important, is their inability to perform
multiply-and-accumulate functions in parallel. Despite the 8-instruction-deep pipeline, the
fact remains that the data flow must pass through a pipeline that is only capable of
completing one multiply-and-accumulate function every few clock cycles at best. This
translates to calculating one sample of a 512-point Fast Fourier Transform (FFT) every
several thousand to tens of thousands of clock cycles, depending on the architecture [1].
These two limitations led designers in the early 1990s to look for other alternatives that
might alleviate the overhead of and dependency of the compiler, and offer a more parallel
approach. At first, this search led designers down the path of Systems on a Chip (SOC)
design. SOCs are very dense and fast Application Specific Integrated Circuits (ASICs)
that contain a central processor hard core (e.g., IBM 405 PPC), surrounded by soft-core
algorithm accelerators, memory, soft-core peripherals, and I/O interfaces all attached to a
central bus. These chips are expensive, designed for large production quantities, and are
one-time-programmable. An example of a current large SOC market is the second
generation cellular phone.
Around the same time as the move to ASICs, Field Programmable Gate Arrays
(FPGAs) emerged in the engineering prototyping and emulation world. FPGAs are a
reprogrammable, SRAM based, VLSI platform that allow a digital hardware designer to
almost instantly (compared to ASICs) upgrade a design. FPGAs lose their memory
when power is removed from the circuit, and thus must be programmed upon each power
up, either through JTAG or a PROM. FPGAs contain internal units such as built-in
memory, registers, and multiplier blocks. As the number of logic resources in FPGAs grew,
DSP designers began to take advantage of the vast amount of parallelism offered by such
re-programmable chips, at the cost of reduced clock speed in comparison with current
dedicated DSP microprocessors. By 2000, FPGAs began to approach the gate density of
ASICs, and make significant increases in maximum clocking frequencies. However,
ASICs are still superior for overall density and speed. Nevertheless, the vast parallelism
allowing for the execution of hundreds of multiply and accumulate (MAC) operations in
parallel, combined with nearly instant re-programmability, steered the implementation of
computationally complex DSP algorithms down the path of FPGAs. Many companies
specializing in FPGA algorithm design and implementation utilize FPGAs as their
primary means of prototyping. Often they are used as the final design
implementation choice over an ASIC platform in situations where power consumption is
not a major concern. This is most evident in companies such as Lockheed Martin, which
develop military applications that require cost-effective rapid prototyping platforms that
can be highly integrated with FPGA software development. One of the best-known
software tools for FPGA algorithm design is MATLAB, produced by MathWorks.
Many companies currently realize all of their high-level FPGA modeling in MATLAB,
including all of the necessary test harnesses. They then convert these algorithm models to
a hardware description language, such as Verilog or VHDL, either manually or by using
algorithm-specific conversion software. From here, the normal design flow is
followed to synthesize, place, and route the algorithm into a target FPGA of choice.
The original test vectors used to test the algorithm in MATLAB are generally applied in
some fashion to test the algorithm in the actual hardware. Therefore, printed circuit
boards (PCBs) housing the FPGAs must be designed with interfaces that allow data
to be transmitted to and from the PCB via a PC running MATLAB. This interface requires
additional hardware and software overhead for testing. The bottom line is that today's
FPGA algorithm and hardware designers want and need to begin designing with
MATLAB, target FPGA hardware, and then complete their verification by reading test
vectors back into MATLAB from the real hardware to compare against their original
software algorithm results. In 2001, one of the leading FPGA companies, Xilinx, teamed
up with IBM PowerPC (PPC) ASIC designers to develop a new VLSI design platform.
The result was a new family of Xilinx FPGAs, called the Virtex II Pro (V-II Pro), built
on Xilinx's Virtex family. Not only are these FPGAs faster and denser than previous
programmable logic families from Xilinx or their competition, but they also integrate an
IBM 405 PPC ASIC hardcore into the FPGA fabric. Along with the built in processor,
Xilinx and IBM offered a set of soft cores to accompany the processor, which were
optimized for FPGA implementation, as well as a new set of software tools to easily
integrate them into FPGA designs [4-5].
The integration of an ASIC hardcore into the programmable fabric of an FPGA is not
only a great milestone from a VLSI technology standpoint, but it also provides an
essential bridge between the SOC market, traditionally only capable of being
implemented in an ASIC, and re-programmable logic chips. With the availability of a
central processor, FPGA-optimized I/O cores (e.g., Ethernet), new software, and millions
of FPGA gates allowing for user-defined soft cores, Xilinx and IBM gave birth to a whole
new market: the re-programmable SOC. Despite the advantages this offers to the general
SOC market, there is one major drawback for FPGA implementations: the 405 PPC does
not contain a floating-point unit (FPU). Floating-point operations are essential to FPGA
algorithms. However, not all is lost. Since the user-defined SOC cores are programmed
into the FPGA fabric, portions of a given FPGA algorithm can take advantage of vast
hardware parallelism to implement floating-point MAC operations, while the rest of the
algorithm is executed on the CPU. Future generations of the Xilinx re-programmable
SOC will no doubt contain a CPU with an FPU, presenting the first fully integrated system to
physically emulate FPGA algorithm tradeoffs and optimization. Therefore, the new
re-programmable SOC (i.e., the Xilinx Virtex-II Pro) offers a unique, high-tech
platform to develop the ultimate FPGA engine [6-7].
In order to provide sufficient background information for the discussion of
implementing control algorithms on Xilinx FPGAs, this part presents a thorough
summary of the Xilinx architecture. It focuses primarily on the architectural information
needed to understand multipliers and the implementation of FPGA algorithms.
The Xilinx Spartan FPGA architecture is basically the same as the Virtex II platform,
with the exception that the V2P is shipped on 300 mm wafers with dies that are fabricated
in 0.13 µm technology. These chips are SRAM-based devices; that is, they do not retain
their logical configuration once power is removed. Instead, they contain an internal
SRAM-based configuration memory. Upon power-up, application-specific configuration
data, which is typically stored in an EEPROM, is loaded into the configuration memory (it
can also be loaded via a PC through a JTAG boundary scan interface). The EEPROM
communicates with the FPGA to facilitate a three-phase loading sequence: first, the
configuration memory is cleared; then the configuration data is loaded; finally,
a start-up sequence activates the logic (sequential release of clocks and control
lines) [8]. Although FPGA stands for Field Programmable Gate Array, the Xilinx
FPGAs at the highest level of abstraction are really more like a very dense array
consisting of the 6 major building blocks shown in Figure 1. Configurable Logic Blocks
(CLBs), Block RAMs (BRAMs), Multipliers, Digital Clock Managers (DCMs), and
standard and high-speed I/O blocks (IOBs) are all connected to each other through a fully
buffered (SRAM-controlled pass transistor) programmable Switching Matrix [8-10].
This switching matrix is programmed and controlled via the configuration data, known
as a bit file, loaded into the configuration SRAM at power-up. The CLB building blocks
take up more than 75% of the area resources, and therefore each device within the V2P
family can be characterized by its CLB array size, with all of the other building blocks in
relation to the CLBs, as depicted in Figure 2 [8]. The FPGA device supplied
with the Xilinx development card used in the implementation portion of this
research was the XC2VP7. This device has an array of 40 x 34, for a total of 1360 CLBs.
As can be seen from Figure 2, the Multipliers and Block RAMs are sandwiched in
narrow columns between the CLBs. The Spartan 1600 E device has six such columns,
containing a total of 44 multipliers and 44 Block RAMs. The maximum size V2P device
goes up to 120 x 94 (11280) CLBs with 16 columns of 444 Block RAMs and 444
Multipliers [3-4].
Notice in Figure 2 that the Digital Clock Manager blocks are placed at the top and
bottom of each Block RAM/ Multiplier column; thus there are a total of 12 DCMs in the
XC2VP7 chip, with a maximum of 32 DCMs for the V2P family. The Rocket I/O Multi-
Gigabit Transceivers are parallel to serial (and vice versa) embedded transceiver cores
used for high-speed interfaces between multiple FPGAs, over a bus or back plane for
example. Although these are very helpful to digital control designers who need to
split a control algorithm across two chips and communicate quickly to keep processing
real-time data, neither the Rocket I/O nor the DCMs are the direct focus of discussion here,
and thus the reader is referred to [5] for further information on these topics. Although
this is certainly a building block for the architecture, it really is an item that deserves
separate attention, as it brings into the design a whole architecture of its own, with its
own set of building blocks.
Clearly, the central building block of the V2P architecture is the CLB. Figure 3
illustrates the construction of a single CLB [6]. It consists of 4 slices, or sub-blocks,
staggered into two columns, each with its own independent logic carry chain, as well as a
common shift chain connecting the staggered sets of slices. Each slice is connected to the
programmable switch matrix such that each block may gain access to the IOBs, DCMs,
BRAMs, Multipliers, and to other CLBs as Figure 1 illustrates. The fast connects allow
for quick local feedback within the CLB [7].
2. Theory
The dynamics of an n-DOF robot manipulator are governed by the following equation:
$M(q)\ddot{q} + N(q,\dot{q}) = \tau$ (1)
where $\tau$ is the actuation torque, $M(q)$ is a symmetric and positive definite inertia
matrix, and $N(q,\dot{q})$ is the vector of nonlinear terms. This robot manipulator dynamic
equation can also be written in the following form:
$\tau = M(q)\ddot{q} + B(q)[\dot{q}\,\dot{q}] + C(q)[\dot{q}]^{2} + G(q)$ (2)
where $B(q)$ is the matrix of Coriolis torques, $C(q)$ is the matrix of centrifugal torques,
and $G(q)$ is the vector of gravity forces. The dynamic terms in equation (2) depend only on
the manipulator position. This is a decoupled system with simple second-order linear
differential dynamics. In other words, the component $\tau_i$ influences, with a double
integrator relationship, only the joint variable $q_i$, independently of the motion of the
other joints. Therefore, the angular acceleration is found to be:
$\ddot{q} = M^{-1}(q)\,\{\tau - N(q,\dot{q})\}$ (3)
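As an illustration of equation (3), the following minimal Python sketch integrates the acceleration of a single joint. It is not the authors' robot model: the scalar inertia, the viscous-friction stand-in for the nonlinear term, the time step, and the function name simulate_joint are all assumptions chosen for illustration.

```python
def simulate_joint(torque, inertia=1.0, damping=0.5, dt=1e-3, steps=1000):
    """Integrate q_ddot = (tau - N(q, q_dot)) / M for one joint.

    inertia (M) and damping are hypothetical values; N(q, q_dot) is
    modelled here as pure viscous friction, damping * q_dot.
    Returns the final (position, velocity).
    """
    q, q_dot = 0.0, 0.0
    for _ in range(steps):
        nonlinear = damping * q_dot              # stand-in for N(q, q_dot)
        q_ddot = (torque - nonlinear) / inertia  # equation (3), scalar case
        q_dot += q_ddot * dt                     # update velocity first
        q += q_dot * dt                          # then position (semi-implicit Euler)
    return q, q_dot
```

With a constant torque, the velocity in this simplified model settles toward torque / damping, which gives a quick sanity check on the integration.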
The data collected to implement the single-joint robot arm describe the relationship between
the rate of torque and the end-effector position. Based on this information, Figure 4 shows
the dynamic behavior of the system.
[Figure 4. System dynamic behavior: position (vertical axis, -100 to 100) versus torque (horizontal axis, -15 to 15).]
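$u(t) = K_p\,e(t) + K_d\,\frac{de(t)}{dt}$ (4)
$u(t) = K\left[e(t) + T_d\,\frac{de(t)}{dt}\right]$ (5)
where the control variable $u(t)$ is the controller output, $K$ is the proportional gain, and
$T_d$ is the derivative time. Performing the Laplace transform on (5), we get
$G(s) = K\,[1 + s \cdot T_d]$ (6)
We can easily convert the parameters from one form to another by noting that
$K_p = K, \quad K_d = K \cdot T_d$ (7)
$U(z) = E(z)\,[K_p + K_d\,(1 - z^{-1})]$ (8)
Rearranging gives
$U(z) = E(z)\left[\frac{(K_p + K_d) + (-K_p - 2K_d)\,z^{-1} + K_d\,z^{-2}}{1 - z^{-1}}\right]$ (9)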
= __
Where
=
= { }
=
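The discrete PD law can be illustrated with a short Python sketch, offered under stated assumptions and not as the authors' VHDL design. It assumes a backward-difference derivative, u(k) = Kp e(k) + Kd [e(k) - e(k-1)], with the sampling period already absorbed into Kd, and compares it with the algebraically equivalent incremental recursion u(k) = u(k-1) + (Kp + Kd) e(k) - (Kp + 2Kd) e(k-1) + Kd e(k-2). The function names and the zero initial conditions are hypothetical.

```python
def pd_positional(errors, kp, kd):
    """Positional form: u(k) = Kp*e(k) + Kd*(e(k) - e(k-1)), with e(-1) = 0."""
    outputs, prev_error = [], 0.0
    for e in errors:
        outputs.append(kp * e + kd * (e - prev_error))
        prev_error = e
    return outputs


def pd_incremental(errors, kp, kd):
    """Incremental form with zero initial conditions:
    u(k) = u(k-1) + (Kp+Kd)*e(k) - (Kp+2Kd)*e(k-1) + Kd*e(k-2)."""
    outputs, u, e1, e2 = [], 0.0, 0.0, 0.0
    for e in errors:
        u = u + (kp + kd) * e - (kp + 2 * kd) * e1 + kd * e2
        outputs.append(u)
        e1, e2 = e, e1  # shift the error history by one sample
    return outputs
```

Both functions produce the same output sequence for the same error samples. The incremental form needs only additions and three constant multiplications per sample, which is one reason such a recursion is often considered for hardware.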
Figure 6. Symbols for Conversion from Double to N-bit Fixed Point and
Vice Versa
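The conversion between double-precision values and N-bit fixed-point words can be sketched in Python as follows. The 32-bit width follows the 32PDC design; the split into 16 fractional bits, the saturation behavior, and the function names are assumptions, since the fixed-point format is not stated here.

```python
def to_fixed(value, total_bits=32, frac_bits=16):
    """Quantize a float to a signed two's-complement fixed-point integer.

    frac_bits=16 is an assumed format, not one taken from the paper.
    """
    scaled = int(round(value * (1 << frac_bits)))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))  # saturate instead of wrapping


def to_float(raw, frac_bits=16):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)
```

For example, to_fixed(1.5) stores the value as the integer 98304 (1.5 x 65536), and to_float(98304) recovers 1.5 exactly. Values that are not multiples of 2^-16 are rounded to the nearest representable step.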
In the derivative part, the system takes the derivative of the error (the output of the P
controller). Figure 10 shows the derivative of the error.
4. Result
Figure 11 shows the actual and desired inputs and the torque performance in the transient
state. In this figure, although the actual and desired inputs are both equal to zero, the torque
performance fluctuates during the first 50 ns.
Figure 12 shows the actual and desired positions together with the torque performance. In
this state the desired position moves to 50 degrees, but over the next 100 ns the actual
position moves to -11 degrees, which corresponds to an error of about 61
degrees.
In Figure 12, the torque between 50 and 150 ns is equal to zero.
During this time the controller is inactive; this interval is the controller's delay. The next
100 ns (150 to 250 ns) show the actual position improving from -11 degrees to
0.23 degrees. As Figure 13 shows, the error is reduced from 61 degrees to
49.77 degrees.
5. Conclusion
From the design and simulation results of the proposed controller, it can be concluded
that higher execution speed with a small chip size is achieved by designing the PD
FPGA-based controller with a simplified structure. This method improves the speed of the
system and reduces the control delay. In the Xilinx simulation results,
the controller responds within a clock period of 15.716 ns,
with a maximum frequency of 63.7 MHz and a minimum input
arrival time after clock of 4.362 ns. From the synthesis summary, the
maximum input arrival time after clock is 19.727 ns, with a frequency of 50.69 MHz, and the
design has a delay of 15.716 ns for each controller with 46 logic elements. According to the
timing report, 87.8% of the delay is logic delay and 12.2% is routing delay. The offset before
clock is 1.946 ns for 1 logic gate. Therefore, the proposed controller will be able to control
a wide range of systems with a high sampling rate.
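The relationship between the reported clock periods and frequencies can be checked with a few lines of Python. This sketch assumes that periods are in nanoseconds and frequencies in megahertz; the helper names are hypothetical. A period of 19.727 ns corresponds to about 50.69 MHz, and a period of 15.716 ns to about 63.6 MHz.

```python
def max_frequency_mhz(period_ns):
    """Clock frequency in MHz for a clock period given in ns."""
    return 1e3 / period_ns


def split_delay(total_ns, logic_fraction):
    """Split a path delay into (logic, routing) parts, both in ns."""
    logic = total_ns * logic_fraction
    return logic, total_ns - logic
```

Calling split_delay(15.716, 0.878) applies the 87.8% / 12.2% split from the timing report to the 15.716 ns delay, giving roughly 13.8 ns of logic delay and 1.9 ns of routing delay.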
Acknowledgement
The authors would like to thank the anonymous reviewers for their careful reading of
this paper and for their helpful comments. This work was supported by the Iranian
Institute of Advance Science and Technology Program of Iran under grant no. 2012-
Persian Gulf-DIG.
The Iranian Center of Advance Science and Technology (IRAN SSP) is one of the
independent research centers specializing in research and training across Control and
Automation, Electrical and Electronic Engineering, and Mechatronics & Robotics in Iran.
At the IRAN SSP research center, we are united and energized by one mission: to discover
and develop innovative engineering methodologies that solve the most important
challenges in the field of advanced science and technology. The IRAN SSP Center is intended
to fill a long-standing void in applied engineering by linking training and development
functions on one side and policy research on the other. This center is divided into two main
units:
Education unit
Research and Development unit
References
[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd ed.,
Morgan Kaufmann Publishers, (2003).
[2] Xilinx Company, XA Spartan 3E Field Programmable Gate Arrays, Data sheet DS635, URL
www.xilinx.com, (2009).
[3] S. Lentijo, S. Pytel, A. Monti, J. Hudgins, E. Santi and G. Simin, "FPGA based sliding mode control
for high frequency power converters", IEEE Conference on Power Electronics, (2004), pp. 3588-
3592.
[4] R. R. Ramos, D. Biel, E. Fossas and F. Guinjoan, "A fixed-frequency quasi-sliding control algorithm:
application to power inverters design by means of FPGA implementation", IEEE Transactions on
Power Electronics, vol. 18, no. 1, (2003), pp. 344-355.
[5] F. J. Lin, D. H. Wang and P. K. Huang, "FPGA-based fuzzy sliding-mode control for a linear
induction motor drive", IEEE Journal of Electrical Power Application, vol. 152, no. 5, (2005), pp.
1137-1148.
[6] S. T. Karris, Digital circuit analysis and design with Simulink modeling and introduction to CPLDs
and FPGAs, Orchard Pubns, (2007).
[7] F. Piltan, A. Gavahian, N. Sulaiman, M. H. Marhaban and R. Ramli, "Novel Sliding Mode Controller
for robot manipulator using FPGA", Journal of Advanced Science & Engineering Research, vol. 1, no.
1, (2011), pp. 1-22.
[8] F. Piltan, N. Sulaiman, M. H. Marhaban, A. Nowzary and M. Tohidian, "Design of FPGA-based
Sliding Mode Controller for Robot Manipulator", International Journal of Robotics and Automation,
vol. 2, no. 3, (2011), pp. 183-204.
[9] F. Piltan, N. Sulaiman, A. Jalali and K. Aslansefat, "Evolutionary Design of Mathematical tunable
FPGA Based MIMO Fuzzy Estimator Sliding Mode Based Lyapunov Algorithm: Applied to Robot
Manipulator", International Journal of Robotics and Automation, vol. 2, no. 5, (2011), pp. 317-343.
[10] F. Piltan, I. Nazari, S. Siamak and P. Ferdosali, "Methodology of FPGA-Based Mathematical
error-Based Tuning Sliding Mode Controller", International Journal of Control and Automation,
vol. 5, no. 1, (2012), pp. 89-118.
Authors
Farzin Piltan was born in 1975 in Shiraz, Iran. In 2004 he
joined the Institute of Advance Science and Technology, Research and
Development Center, IRAN SSP. He is now the dean of the Intelligent
Control and Robotics Lab. He has led a team (47 researchers)
designing and building nonlinear controllers for industrial robot manipulators
for experimental research and education, which published about 54
papers in this field from 2010 to 2012; supervised and led a team (9
researchers) designing and implementing intelligent tuning of the rate of
fuel ratio in internal combustion engines for experimental research
and education, which published about 17 journal papers from 2011 to
2013; led and advised a team (34 researchers) on filtering hand
tremors in flexible surgical robots for experimental research and
education, which has published about 31 journal papers in this field since
2012; led a team (21 researchers) designing a high-precision
and fast dynamic controller for multi-degree-of-freedom actuators for
experimental research and education, which has published about 7 journal
papers in this field since 2013; led a team (22 researchers)
researching full digital control for nonlinear systems (e.g., Industrial
Robot Manipulator, IC Engine, Continuum Robot, and Spherical
Motor) for experimental research and education, which has published about
4 journal papers in this field since 2010; and finally led a
team (more than 130 researchers) implementing the Project
Based-Learning project at the IRAN SSP research center for
experimental research and education, which has published more than 110
journal papers since 2010. In addition to 7 textbooks, Farzin