Fusion Engineering and Design: A. Rigoni, G. Manduchi, A. Luchetta, C. Taliercio, T. Schröder


Fusion Engineering and Design 128 (2018) 122–125

Fusion Engineering and Design


journal homepage: www.elsevier.com/locate/fusengdes

A framework for the integration of the development process of Linux FPGA System on Chip devices

A. Rigoni (a), G. Manduchi (a), A. Luchetta (a), C. Taliercio (a), T. Schröder (b)
(a) Consorzio RFX (CNR, ENEA, INFN, Università di Padova, Acciaierie Venete SpA), Padova, Italy
(b) Max-Planck-Institut für Plasmaphysik, D-17491 Greifswald, Germany

ARTICLE INFO

Keywords: FPGA; System on chip; ADC; Timing systems

ABSTRACT

System on Chip is a hardware solution combining different hardware devices in the same chip. In particular, the XILINX Zynq solution, implementing an ARM processor and a configurable FPGA on the same chip, is a candidate technology for a variety of applications of interest in fusion research, where FPGA fast logic must be combined with CPU processing for high-level functions and communication. Developing Zynq based applications requires the development of the FPGA logic using the XILINX Vivado IDE, mapping information between the FPGA device and the processor address space, developing the kernel drivers for interaction with the FPGA device and developing the high level application programs in user space for the supervision and the integration of the system. The paper presents a framework that integrates all the above steps and greatly simplifies the overall process. The framework has been used for the development of a programmable timing device in Wendelstein 7-X. The development of new devices integrating data acquisition and timing functions is also foreseen for RFX-mod.

1. Introduction

The use of FPGA based solutions in control and data acquisition systems (CODAS) for nuclear fusion devices has been in the past rather limited if compared to other physics experiments such as accelerators. This fact is mainly due to the different requirements: while in accelerators it is necessary to handle a very large amount of fast events from detectors, requiring fast data reduction on the fly based on coincidences, in fusion experiments a lower number of channels is used, typically requiring the acquisition of input signals for data storage and possibly real-time control. Therefore, fusion experiments use more conventional electronic devices such as transient recorders, replaced in recent long lasting experiments by Analog to Digital Converter (ADC) devices supporting a continuous output data stream. Moreover, the dynamics of the phenomena controlled in real-time, such as plasma stability, require in most cases a response time of the order of milliseconds, whereas the control of the fastest phenomena such as vertical stabilization in tokamaks requires a response time of the order of 100 μs. These requirements can be satisfied using current computer technology, therefore making the use of general purpose computers preferable over specialized FPGA solutions. Developing FPGA solutions requires in fact skills and expertise in Hardware Description Languages (HDL) and hardware interfaces. Considering also that the integration of custom FPGA systems in CODAS normally requires developing some kind of specialized communication protocol, the amount of required human resources to implement such solutions is often unaffordable, especially in small laboratories. For this reason, FPGA solutions have been in the past limited to specific applications in diagnostics [1,2]. A notable exception is certainly represented by the RIO FPGA architectures [3] (Compact RIO and Flex RIO), which provide easy FPGA programming and integration via LabVIEW and have been widely adopted for plasma control [4] and other diagnostic applications [5]. This solution, proposed by National Instruments, aims at leveraging the power of FPGAs by removing the main barriers to their usage, that is, the expertise required in HDL programming and interfacing with the rest of the system. This solution is however quite expensive and tied to a specific choice of hardware and programming environment.

A new approach for the integration of high-level software components with the power of FPGA logic design is obtaining growing attention in the market of embedded technologies and exploits the System on Chip (SoC) solution, which combines different hardware devices in the same chip. The main hardware competitors leading the SoC FPGA market are Intel/Altera and Xilinx, both proposing almost the same development solutions but with their own proprietary software. In particular the XILINX Zynq architecture [6], implementing an ARM processor and a configurable FPGA in the same chip, is a valuable candidate technology for a variety of applications of interest in fusion research, where FPGA fast logic can be combined with software functions carried out by a CPU


Corresponding author.
E-mail addresses: [email protected] (A. Rigoni), [email protected] (G. Manduchi).

https://doi.org/10.1016/j.fusengdes.2018.01.042
Received 9 June 2017; Received in revised form 28 December 2017; Accepted 17 January 2018
Available online 06 February 2018
0920-3796/ © 2018 Elsevier B.V. All rights reserved.

for high level functions and communication.

A considerable number of heterogeneous hardware products from many vendors have been released, taking advantage of the high integration of SoC devices. The main advantages that these chips bring to the programmable logic are the possibility to interface and share hardware features that are typical of a complete system, such as the DMA controller and external interfaces like Ethernet or SATA.

Many software solutions have also been proposed to guide the developer through the non-trivial mechanisms of the FPGA to system interfacing, as well as covering different programming approaches: from low level synthesis of Verilog and VHDL hardware description, to the higher level toolchains that compile real programming languages like SystemC, OpenCL and others [7,8].

In this paper we present yet another choice named Anacleto (Another auto config for logic evaluation toolchains), particularly targeted to GNU Linux embedded devices, that has been developed by the RFX consortium and aims at proposing a unified standard workflow to the FPGA developer for programming both the logic and the software components in a uniform and portable way.

It is worth noting that in the development of Anacleto SoC projects the knowledge of HDL, unlike other solutions such as the National Instruments RIO LabVIEW interface, is not hidden by the framework. The aim of this framework is indeed not to provide a new programming interface, adding another layer of logic, but to ease the development process with established, well known open-source build tools. In this way Anacleto can be a way to access the low level machinery of FPGA programming easily and uniformly, and a much cheaper solution with respect to RIO. Moreover, at this low abstraction programming level, many of the features that are usually involved are already provided free of charge by the chip vendors or with a reasonable license fee by external contributions, keeping a door open to a wide market of existing solutions.

The first candidate applications for SoC devices are timing systems, data acquisition preprocessing and fast computation. Timing systems represent a classical field of applications for FPGAs and have been implemented both in custom systems [9] and commercial products [10]. A typical timing application uses a synchronization clock signal distributed, normally via fiber optic, to all the timing devices and possibly propagating asynchronous events. The FPGA provides the generation of the required timing signals (clocks, triggers, …) based on the current configuration loaded in the system using some kind of hardware interface such as PCI. A processor would introduce in this case more flexibility in the management of the configuration, letting, for example, the configuration be uploaded via the network.

Integrating configurable FPGAs in data acquisition would provide much more flexibility in data management, introducing features not currently supported by ADC devices. An example is the possibility of managing deferred triggers communicated via network. Using the network to communicate triggers in data acquisition introduces delays that may compromise the precision in the reconstruction of the acquired signal. However, if a trigger message also carries the exact trigger time, and assuming that all devices have a precise knowledge of time (e.g. using the IEEE 1588 timing protocol), it is possible to provide a correct reconstruction of the signal using an internal circular buffer maintaining a signal history lasting at least the delay in trigger communication [11]. The use of a configurable FPGA in data acquisition could also allow a significant reduction of the required front end when integrated signals from electromagnetic probes are acquired. In this case it would be possible to avoid analog integration before data acquisition, moving integration to FPGA processing during acquisition.

Fast computation carried out by FPGA allows using more sophisticated algorithms in real-time plasma control, retaining at the same time the flexibility provided by a computer system. The same approach could be used for new data processing algorithms such as feature detection from acquired video frames. In this case the processor would supervise data transfer and the FPGA would carry out intensive computing for feature detection. It is worth stressing the fact that FPGA solutions are more difficult to develop with respect to CPU based ones, and therefore the latter are preferred, provided they can satisfy the required timing constraints. As a rule of thumb, CPU based solutions should be considered when the order of magnitude of the required reaction time of the system is 100 μs or larger. Shorter times normally require FPGA implementations; however, other factors may affect the choice, such as memory access issues that may reduce performance regardless of the computational power, as also happens in large distributed computation carried out by General-Purpose Graphics Processing Units (GPGPUs) [12].

As for other FPGA solutions, SoC systems require skills and experience. For example, developing Zynq based applications requires (1) the development of the FPGA logic, (2) mapping information between the FPGA device and the processor address space, (3) developing the kernel driver for interfacing user software and the FPGA device and (4) developing the high level software applications in user space for the supervision of the system and its integration in the central CODAS. For this reason we have implemented a framework that integrates the above steps. The framework, described in the next section, makes the overall process easier, especially the integration of the FPGA components and the processor, by coordinating all the required tools and by providing a set of templates that can be adapted to the specific application.

2. Framework components

Anacleto uses the Autotools [13] build infrastructure to organize the most general FPGA workflow, acting like a standard toolchain compilation led by GNU make targets. The development process remains quite complex because many components in the final device board must be orchestrated (e.g. the kernel configuration, the customization of drivers to handle the newly created device, and so forth), but nevertheless the compilation is managed in an almost automatic manner and, once the project is properly defined, all the steps are covered by Makefile targets that can be chained in a single make run. In order to develop a SoC application, it is first necessary to select the hardware system. Because we decided to make use of the Xilinx Zynq devices, as a first attempt three low-cost solutions have been considered: RedPitaya [14], ZedBoard [15] and Parallella [16]. RedPitaya is intended to be used as a stand-alone system for handling digital and analog I/O signals. This board hosts ready to use ADC and DAC components and could therefore be best suited for developing small self-contained applications, but for the same reason it shows a reduced flexibility with respect to the other two for the configuration of the I/O pins. The other boards are intended to be hosted in a carrier board and therefore mount no additional I/O devices. In particular, Parallella is targeted towards computing intensive applications and hosts an additional processor with 16 cores.

Several other software components, all free of charge, are required for developing a SoC application and deploying it into the target board. First of all, it is necessary to download from XILINX the Integrated Development Environment (IDE) tool VIVADO for HDL programming (Verilog and VHDL are the supported languages). In order to be used on a specific target, VIVADO requires a target-specific configuration, provided by the board developer, which specifies how the processor is configured in that particular board. Currently only the Red Pitaya configuration is managed in the framework, but it is foreseen that configuration files from Zed Board and Parallella will be included, adding the choice of the target board in the configuration steps. VIVADO provides a set of configurable Intellectual Property (IP) components that carry out the connectivity between the processor (dual core ARM Cortex A9 in the Zynq chip mounted on Red Pitaya) and the FPGA application. When no DMA is involved, communication between the processor and the FPGA application is carried out by a configurable number of 32 bit registers and, optionally, one or more interrupt lines. When the developer creates a new project for a FPGA application, the IDE creates a set of IP components, carrying out the handshaking with the internal


bus (AXI bus) used to exchange information between the processor and
the FPGA application. The IDE provides the definition of a set of 32 bit
signals that can be used by the FPGA application for communication. In
a typical use case, such signals will represent the configuration to be
uploaded to the FPGA application, but they can be used to exchange
input and output data as well. After developing the specific application,
the IDE will generate the binary code to be downloaded into the FPGA.
Other configurable IP components provided by VIVADO allow the de-
finition of up to two DMA channels for FPGA applications handling data
streaming.
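
As a rough illustration of the register-based exchange described above, the following Python sketch models a bank of 32 bit registers as a byte buffer and writes a configuration into it. The register layout (a command word followed by period and count words), the names RegisterBank, REG_COMMAND, REG_PERIOD and REG_COUNT, and the little-endian byte order are assumptions made for this example only; they are not taken from the framework or from the IP components generated by VIVADO.

```python
import struct

REG_BYTES = 4  # each shared register is 32 bits wide

# Hypothetical register indices, chosen only for this example.
REG_COMMAND = 0
REG_PERIOD = 1
REG_COUNT = 2


class RegisterBank:
    """In-memory stand-in for a block of 32 bit registers shared with the FPGA."""

    def __init__(self, n_registers):
        self.n_registers = n_registers
        self.memory = bytearray(n_registers * REG_BYTES)

    def _offset(self, index):
        if not 0 <= index < self.n_registers:
            raise IndexError(f"register {index} out of range")
        return index * REG_BYTES

    def write(self, index, value):
        # Mask to 32 bits so that oversized values are truncated, not rejected.
        struct.pack_into("<I", self.memory, self._offset(index), value & 0xFFFFFFFF)

    def read(self, index):
        return struct.unpack_from("<I", self.memory, self._offset(index))[0]


# Upload a configuration first, then issue the (assumed) command word.
bank = RegisterBank(4)
bank.write(REG_PERIOD, 1000)
bank.write(REG_COUNT, 16)
bank.write(REG_COMMAND, 1)
```

On a real target the byte buffer would be the memory region exposed by the driver rather than a bytearray, so this sketch only shows the word-level layout of such an exchange.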
The configuration of the interface, i.e. number of shared registers,
interrupt lines, and DMA channels must also be reflected in the device
memory map of the processor. XILINX provides a github project hosting
an adapted version of Linux kernel 4.4 in a Debian distribution. The
project includes the toolchain for ARM processor and the kernel sources
and a tool for the generation of the device-tree structure, used by Linux
Kernel 4.4 for device abstraction [17]. Basically, a device-tree de-
scription provides information about the connected devices including
memory addresses and size of the device registers. In this case, the
device-tree description will include the registers used to communicate
with the FPGA application. The specific device-tree for the Zynq chip is
generated from the current VIVADO project by the Hardware Software
Interface component belonging to the XILINX system development kit.
The same component can generate templates for Bare Metal im-
plementation, Linux and FreeRTOS. In particular the Linux driver
template is generated within the framework based on the selected data
transfer type (mapped registers and/or DMA).

Fig. 1. Steps in building a new system.
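
To make the role of the device-tree description more concrete, the sketch below builds the text of a minimal device-tree node for a block of 32 bit registers. The compatible and reg properties are standard device-tree properties; the function name device_tree_node, the node name, the compatible string, the base address and the register count are hypothetical values chosen for illustration. The nodes actually generated by the XILINX tools will generally contain more properties than shown here.

```python
def device_tree_node(name, compatible, base_addr, n_registers, reg_bytes=4):
    """Return the source text of a minimal device-tree node.

    Assumes one address cell and one size cell in the parent bus, so that
    the reg property is written as <base size>.
    """
    size = n_registers * reg_bytes
    lines = [
        f"{name}@{base_addr:x} {{",
        f'    compatible = "{compatible}";',
        f"    reg = <0x{base_addr:08x} 0x{size:x}>;",
        "};",
    ]
    return "\n".join(lines)


# Hypothetical example: eight 32 bit registers at an assumed base address.
print(device_tree_node("timing", "example,timing-1.0", 0x43C00000, 8))
```

The point of the sketch is that the number of shared registers chosen in the FPGA project determines the size field of the node, which is why the device-tree has to be regenerated whenever the interface changes.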
Once Linux and the corresponding device-tree have been built, the final step is the development, starting from the generated template, of a Linux device driver that will allow user programs to interact with the FPGA application. In the simplest configuration, a buffer in user space is mapped against the sets of registers defined in the FPGA application so that information is exchanged by reading and writing that buffer.

From the above description, it is clear that building a SoC system from scratch is not an easy task, despite the availability on the web of all required tools. The presented framework integrates all the above steps and greatly simplifies the overall process. In particular, the framework:

- Supervises the compilation of the toolchain and the Linux kernel using the components taken from the XILINX repository;
- Handles the management of the VIVADO project and the required IP components for FPGA integration;
- Supervises the construction of the device-tree required for the proper mapping of the FPGA registers into processor address space;
- Provides templates for the development of the required Linux drivers.

Using this framework, a SoC FPGA application can be built from scratch by executing the general steps described in Fig. 1. As shown in the schema, the overall workflow can be split in two main stages: the building of the board system comprising the operating system kernel and software, and the specific project building with the definition of the logic and the software drivers that compile against the built kernel. The same board system can be shared among different specific projects, and many different projects can also be installed in the same board. The reported steps depict a possible procedure that a developer would follow through the development process, that is:

1) Clone the framework from the GitHub repository at: [https://github.com/mildstone/anacleto]
2) bootstrap command that will set up the environment and download all the required components and tools;
3) configure to set up the system before building the toolchain and the Linux Kernel; this also compiles and shows a graphical user interface that eases the selection of the required options.
4) make to compile the toolchain and then the Linux Kernel;
5) make new-project to start a new VIVADO project for the development of the specific FPGA components in VHDL or Verilog. All the IP components required for the SoC interface are created and can now be configured using the VIVADO graphical interface;
6) make write-project once the logic has been defined with external sources and VIVADO project block designs, the whole project definition can be stored in the repository in the form of a script able to regenerate the project from scratch, even using different versions of VIVADO.
7) At this point the FPGA application can be developed. It is also necessary to write the Linux driver for communication, and a skeleton Linux driver source file is generated by the framework, based on the current configuration of the interface IP components. Then make starts both the logic synthesis and the compilation of software components.
8) make deploy to generate the device-tree, compile the Kernel module, download the kernel and the bitstream into the target device.

Fig. 2 shows the blocks generated by the VIVADO tool when a new SoC project is created. The top left block defines the processor; the bottom left block defines reset logic and the block in the middle defines the bus logic. The top right block hosts specific FPGA firmware (the timing device in this case) and it is connected to the bus logic block via the AXI bus. Other modules can be defined as well, all connected to the same AXI bus. These modules can then be adapted to connect the interface registers to the specific FPGA firmware.

3. Implemented and foreseen applications

The presented framework has been used to develop a general purpose timing device to be used in Wendelstein 7-X diagnostics. The timing device is implemented in a Red Pitaya board and defines two digital outputs to generate clock and gate signals, and two digital inputs to receive a synchronizing 10 MHz clock and a trigger signal. The board is configured via software to generate a pre-programmed timing sequence after the system has been armed and a trigger input signal has been received. The timing sequence is communicated via TCP/IP to the

ARM processor hosted in the Zynq chip of the Red Pitaya board. In this case a set of registers has been defined as interface between the processor and the FPGA application, without using interrupt lines. All registers except one are used to specify the time sequence. The remaining register is used as command register to arm and disarm the board. Not considering the time required for developing the FPGA application written in VHDL, the creation of the new project, the adaptation of the driver from the template and the deployment required less than one working day.

Fig. 2. Blocks generated by the VIVADO tool in a SoC project.

The same GitHub repository used to host the framework components has been used to host the timing board project, and it is foreseen that all the newly developed projects will be hosted there.

We are currently considering the usage of Zed Board for developing a new fast ADC to be used to acquire electromagnetic probes in the upgrade of the RFX-mod experiment currently under construction at Consorzio RFX [18]. In this experiment a large (∼1000) number of electromagnetic signals is foreseen, where a configurable subset will be used for real-time control. All the acquired signals will be stored at full speed (up to 1 MHz) during the pulse in RAM memory and will be read after the discharge, using the traditional transient recorder organization (RFX-mod carries out short duration discharges), but at the same time a subsampled version of the signal will be made available for real-time control. For this purpose, 16 channels from a fast and insulated ADC, already used for vertical stabilization at JET [19], will be connected to the digital inputs of the SoC board. Each ADC channel will provide a 50 MHz signal carrying the serialized bits of every sample (ADC conversion is performed at 18 bits). The FPGA firmware will de-serialize the input channel, send the samples to RAM via a DMA channel, perform digital filtering for subsampling, and send subsampled data at 10 kHz via the second DMA link to the processor that will send samples via UDP to the real-time control system.

4. Conclusions

The SoC architecture proved to be effective in removing the “knowledge barrier” that prevents FPGA development in several fusion laboratories, especially when FPGA solutions are not strictly required to achieve requirements. The applicability of the SoC architecture has been further improved by the presented framework, hiding from the developer several intermediate steps and exposing only the necessary information for the proper system configuration. The presented framework however leaves to the developer the whole responsibility of the FPGA firmware development, a task that requires ingenuity and experience. There are however several applications of interest in fusion that don’t require sophisticated FPGA firmware, as has been the case of the presented timing board. This class of applications is likely to benefit from SoC architecture, especially when the set of required tools and configurations is transparently managed by a tool like the one presented here.

Another promising field of applications for SoC architectures is the possibility of moving critical computation into FPGA, leaving software to supervise the overall management of computation. This approach is already exploited in the RIO architecture for LabVIEW applications and may lead to high-performance real-time systems that retain the flexibility of computer-based solutions, but allow moving time critical inner loops into FPGA firmware.

References

[1] R.C. Pereira, et al., Pulse analysis for gamma-ray diagnostics ATCA sub-systems of JET tokamak, IEEE Trans. Nucl. Sci. 58 (4) (2011) 1531–1537.
[2] S. Hernandez-Montero, J.A. Lopez, M. Sanchez, L. Esteban, Real Time FPGA-Based Crosstalk Elimination for Multichannel Interferometry Systems in Fusion Diagnostics, Real Time Conference (RT), 18th IEEE-NPSS, 2012.
[3] National Instruments, Compact RIO Platform, (2018) http://www.ni.com/compactrio/.
[4] L. Giannone, M. Cerna, R.H. Cole, D. Schmidt, Data acquisition and real-time signal processing of plasma diagnostics on ASDEX upgrade using LabVIEW RT, Fus. Eng. Des. 85 (3) (2018) 303–307.
[5] M. Ruiz, J. Vega, G. Ratta, E. Barrera, A. Murari, J.M. López, G. Arcas, R. Meléndez, Real time plasma disruptions detection in JET implemented with the ITMS platform using FPGA based IDAQ, IEEE Trans. Nucl. Sci. 58 (4) (2011) 1576–1581.
[6] XILINX all Programmable SoC with Hardware and Software Programmability, (2018) https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html.
[7] Razvan Nane, et al., A survey and evaluation of FPGA high-level synthesis tools, IEEE Trans. Comp. 55 (10) (2015) 1591–1604.
[8] XILINX: Cloud Acceleration for RTL, C/C++, and OpenCL, (2018) https://www.xilinx.com/products/design-tools/software-zone/sdaccel.html.
[9] J. Schacht, J. Skodzik, the CoDaC Team, Multifunction-timing card lTTEV2 for CoDaC systems of Wendelstein 7-X, IEEE Trans. Nucl. Sci. 62 (3) (2015) 1187–1194.
[10] DIO4 Timing Generator/Digital I/O Module, (2018) http://www.incaacomputers.com/products/by-function/digital-io/dio4/.
[11] C. Taliercio, A. Luchetta, G. Manduchi, A. Rigoni, Distributed continuous event-based data acquisition using IEEE 1588 synchronization and FlexRIO FPGA, IEEE Trans. Nucl. Sci. 64 (2017) 7.
[12] T.J. Maceina, G. Manduchi, Assessment of general purpose GPU systems in real-time control, IEEE Trans. Nucl. Sci. 64 (6) (2017).
[13] GNU Autoconf Introduction, (2018) http://www.gnu.org/software/autoconf/autoconf.html.
[14] Redpitaya Home Page, (2018) https://redpitaya.com/.
[15] Zedboard Home Page, (2018) http://zedboard.org/.
[16] Parallella Home Page, (2018) https://www.parallella.org/.
[17] Device Tree Specification, (2018) https://www.devicetree.org/.
[18] M.E. Puiatti, et al., Extended scenarios opened by the upgrades of the RFX-mod experiment, 26th IAEA Fusion Energy Conference (2016).
[19] A. Batista, A. Neto, M. Correia, A.M. Fernandes, B. Carvalho, ATCA control system hardware for the plasma vertical stabilization in the JET tokamak, IEEE Trans. Nucl. Sci. 57 (2) (2010) 583–588.
