PRGA: An Open-Source FPGA Research and Prototyping Framework
Ang Li David Wentzlaff
angl(at)princeton(dot)edu wentzlaf(at)princeton(dot)edu
Princeton University Princeton University
Princeton, New Jersey Princeton, New Jersey
ABSTRACT
Field Programmable Gate Arrays (FPGAs) are being used in a fast-growing range of scenarios, and heterogeneous CPU-FPGA systems are being tapped as a possible way to mitigate the challenges posed by the end of Moore’s Law. This growth in diverse use cases has fueled the need to customize FPGA architectures for particular applications or application domains. While high-level FPGA models can help explore the FPGA architecture space, as FPGAs move to more advanced design nodes, there is an increased need for low-level FPGA research and prototyping platforms that can be brought all the way to fabrication.

This paper presents Princeton Reconfigurable Gate Array (PRGA), a highly customizable, scalable, and complete open-source framework for building custom FPGAs. The framework’s core functions include generating synthesizable Verilog from user-specified FPGA architectures, and providing a complete, auto-generated, open-source CAD toolchain for the custom FPGAs. Developed in Python, PRGA provides a user-friendly API and supports use both as a standalone FPGA and as an embedded FPGA. PRGA is a great platform for FPGA architecture research, FPGA configuration memory research, FPGA CAD tool research, and heterogeneous systems research. It is also a completely open-source framework for designers who need a free and customizable FPGA IP core. An FPGA designed with PRGA is placed and routed using standard cell libraries. The design is evaluated and compared to prior works, providing comparable performance and increased configurability.

CCS CONCEPTS
• Hardware → Reconfigurable logic and FPGAs.

KEYWORDS
FPGA; FPGA architecture; open-source hardware

ACM Reference Format:
Ang Li and David Wentzlaff. 2021. PRGA: An Open-Source FPGA Research and Prototyping Framework. In Proceedings of the 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’21), February 28-March 2, 2021, Virtual Event, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3431920.3439294

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
FPGA ’21, February 28-March 2, 2021, Virtual Event, USA
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8218-2/21/02...$15.00

1 INTRODUCTION
Field Programmable Gate Arrays (FPGAs) have become an increasingly important tool to enable application performance in a post-Moore’s Law [19] world. Whether they are being used as a standalone compute fabric or a supplement to processors at the chip-level [8, 10, 29], board-level [20], system-level, or datacenter-level [1, 5], the diversity of use cases and importance of FPGAs have been increasing. Ideally, an FPGA architecture should be optimized for each unique use case. In practice, though, it is very challenging to evaluate different FPGA designs in detail and even more challenging and time-consuming to prototype and bring those FPGAs to fabrication. This is because the FPGA chip design flow has diverged from the design flows of other digital ASICs like processors. Commercial FPGAs are often designed with custom cells and specialized EDA tools that are publicly unavailable. Likewise, each unique FPGA requires the creation of customized CAD tools. Due to this high design cost, commercial FPGA vendors typically offer a limited set of designs optimized across common, but potentially non-characteristic, use cases. For similar reasons, FPGA architecture studies often use and stop at high-level models [4, 21].

To facilitate FPGA architecture research and enable designs optimized for custom applications, tools are needed to evaluate, optimize, and prototype FPGA architectures all the way down to the fabrication level. An ideal framework would be easy-to-use, extensible, scalable, and open-source. A framework that provides synthesizable RTL enables gate-level or transistor-level implementation using commercial ASIC design flows and standard cell libraries. By enabling such physical prototyping, a framework can be used to evaluate timing, power, and area with the utmost fidelity. Likewise, RTL-level prototyping incorporates the details of the configuration memory, enabling research on bitstream format and partial or dynamic reconfiguration. High-level modeling tools are an important first step, but there exists a need for low-level (RTL and below) frameworks that can be used to study low-level issues such as floorplanning, design regularity, signal integrity, and other physical design issues, all while providing the path to then take the optimized design through prototyping and fabrication.

Figure 1: Overview of a typical PRGA workflow

In this paper, we present Princeton Reconfigurable Gate Array (PRGA), a highly customizable, scalable, and complete open-source framework for building custom FPGAs. PRGA is available at https://parallel.princeton.edu/prga. Fig. 1 shows the workflow used to design a custom FPGA and then develop an application that uses it. The PRGA FPGA architecture is highly customizable, and it supports user-provided modules such as SRAM macros, hard arithmetic units, and routing switches, all of which can be easily added into the flow. PRGA is developed in Python and provides a well-defined Python API. Extensions are encouraged and
supported through modularization and low-level APIs. To further lower the barrier of extending the framework, most output files are generated from human-readable Jinja [25] templates that are customizable without changing the Python codebase. At the end of the FPGA design flow, PRGA produces human-readable, industry-standard Verilog files that are synthesizable and physically implementable using commercial EDA tools. PRGA is ASIC-friendly and can be used to generate standalone FPGAs as well as embedded FPGAs, where the customization that PRGA provides is critical.

PRGA is not derived from prior FPGA modeling/exploration tools. Therefore, it is not restricted by the internal representations of legacy tools. This enables PRGA to support flexible hierarchies that match physical implementation needs. Likewise, the generated configuration circuitry is highly flexible and is decoupled from the design hierarchy, opening up the ability for researchers to explore novel configuration strategies (order, storage, and topology), which are a key component of building efficient FPGAs. Such physical-aware customizability is critical to the design of large-scale, high-performance, fabrication-ready FPGAs.

In addition to designing the FPGA itself, PRGA offers a complete HDL-to-bitstream solution using open-source CAD tools, configuring and parameterizing those FPGA implementation CAD tools for the created custom FPGA. Specifically, this flow uses Yosys [28] for technology mapping and synthesis, VPR [21] for place & route, FASM [23] for raw bitstream generation, and a custom bitstream generator to convert the raw bitstream into binary format. The target application can be verified by simulation with various levels of abstraction throughout the flow, making it easy to debug both the FPGA itself and the application. Scripts and data files for the same FPGA are reusable across application development runs.

In summary, the key features of PRGA include:
• Architecture Customizability
  (1) Fully-customizable, heterogeneous logic blocks: LUT count, LUT size, local interconnect, hard adder chains, multi-modal primitives, logic elements, and more.
  (2) Bring-Your-Own-IP: block RAM, hardened multiplier/accumulator, and even big IP cores like CPUs, memory/network controllers, etc.
  (3) Fully-customizable routing structure: switch box pattern, connection box pattern, non-uniform channel, long wires, and global wires.
  (4) Extensible configuration circuitry: simple scanchain-based configuration or complex, NoC-based, packetized bitstreams with support for partial reconfiguration. Custom configuration circuitry can also be designed using the low-level API.
• CAD Support
  (1) Auto-generated Yosys script for synthesis: BRAM inference, hard logic techmap, and post-synthesis simulation.
  (2) Auto-generated, FASM-annotated VPR inputs for placement, routing and raw bitstream generation.
• ASIC Compatibility
  (1) Bring-Your-Own-Circuits: replace generated modules with custom Verilog modules or hard macros.
  (2) ASIC-friendly module hierarchy: fracturable switch box to maximize regularity; arbitrary levels of sub-arrays to balance ASIC QoR and ease-of-backend.
• Framework Extensibility
  (1) Modularized, pass-based workflow. Passes may be added or modified without affecting the rest of the flow.
  (2) Core data structure can be serialized to disk. Tools don’t need to rerun the entire building process every time.

In this paper, we evaluate PRGA by characterizing its scalability in terms of memory usage and runtime and find that it enables the creation of very large designs with reasonable computational resources. In addition, we take a design through place and route to tape-in quality to show that PRGA is production-ready. Finally, we compare designs created with PRGA to prior and commercial designs in terms of area and delay, and show that we are competitive with other standard-cell-based FPGA generators.

PRGA enables many exciting applications. It is a great platform for FPGA architecture research, in particular bridging the gap from high-level FPGA architecture exploration tools down to low-level implementation details, as well as enabling RTL-in-the-loop FPGA architecture optimization studies. It can also be used to build targets for FPGA CAD tool research, for example, security-aware place-and-route tools. PRGA is a framework that allows the creation and exploration of many different FPGA designs, which makes it more than an FPGA generator that can only generate a certain type of FPGA. The FPGAs built with PRGA can be used either as standalone FPGAs or integrated into SoCs. It can even be an excellent platform for CPU-FPGA heterogeneous system research.

2 PRGA WORKFLOW
Fig. 1 shows an overview of a typical PRGA workflow. The FPGA design flow is driven by a user-written Python script, while the HDL-to-bitstream flow integrates open-source CAD tools to generate valid bitstreams for the created custom FPGA.
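
To make the idea of a script-driven, pass-based flow with a serializable context more concrete, the following is a minimal sketch in plain Python. It is not PRGA's API: the context dictionary, the pass names, the placeholder file contents, and the run_flow helper are all hypothetical and chosen only for illustration. The sketch shows passes running in order over a shared context, the context being saved and restored with the standard pickle module, and a later run skipping passes that have already completed.

```python
import pickle


def new_context():
    # Hypothetical context: architecture parameters, generated files,
    # and the names of passes that have already run.
    return {"arch": {}, "files": {}, "done": []}


def describe_architecture(ctx):
    # Hypothetical pass: record user-chosen architecture parameters.
    ctx["arch"] = {"lut_size": 6, "width": 8, "height": 8}


def generate_verilog(ctx):
    # Hypothetical pass: emit a placeholder netlist description.
    arch = ctx["arch"]
    tiles = arch["width"] * arch["height"]
    ctx["files"]["fpga.v"] = (
        f"// placeholder netlist: {tiles} tiles, LUT{arch['lut_size']}\n"
    )


def generate_synthesis_script(ctx):
    # Hypothetical pass: emit a placeholder synthesis script (not real commands).
    lut_size = ctx["arch"]["lut_size"]
    ctx["files"]["synth.ys"] = f"# placeholder script targeting LUT{lut_size}\n"


def run_flow(ctx, passes):
    # Run each pass once, in order; skip passes recorded as done.
    for flow_pass in passes:
        if flow_pass.__name__ in ctx["done"]:
            continue
        flow_pass(ctx)
        ctx["done"].append(flow_pass.__name__)
    return ctx


if __name__ == "__main__":
    # First run: build the architecture and the netlist, then save the context.
    ctx = run_flow(new_context(), [describe_architecture, generate_verilog])
    saved = pickle.dumps(ctx)

    # Later run: restore the context and only execute the new pass.
    restored = pickle.loads(saved)
    run_flow(
        restored,
        [describe_architecture, generate_verilog, generate_synthesis_script],
    )
    print(restored["done"])
    print(sorted(restored["files"]))
```

Keeping the context as plain data is a deliberate simplification in this sketch: it makes the save and restore step trivial, and a new pass can be appended to the list without touching the existing ones.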
Figure 2: FPGA architecture modeled by PRGA. The position, shape and size of the modules do not reflect the physical properties in an ASIC implementation. Programmable connections are shown as many-to-one connections in routing boxes and blocks. ① are bridging nets discussed in Sec. 3.3.3; ② is an unroutable clock pin directly connected to the global clock tree.
in almost perfect alignment with their logical positions. We adopt the cycle-free Switch Box pattern [15] and flatten the Switch Box instances in the Arrays. Compared to a black-boxed approach in which blocks and routing boxes are designed individually then simply stitched together at the top level, this design flow applies proper constraints at Array level, enabling the EDA tools to resolve many hazards that are otherwise unidentifiable, for example, hold time violations, clock skew, crosstalk, IR drop, etc.

Different challenges arise when designing the top-level Array because of its scale. To minimize EDA tool runtime without sacrificing QoR, we reduce the logic left in the top-level Array to the minimum. Pins of the LOGIC Arrays are also aligned, reducing wiring congestion.

Table 3: Representative Path Delays

Once the layout is finished and verified, Static Timing Analysis (STA) using automated EDA tools can be applied to extract the timing and power characteristics from the FPGA. This information can be passed back into the generation passes to get timing-annotated scripts for the HDL-to-bitstream toolchain.

5 EVALUATION

5.1 ASIC Implementation
In this section, we evaluate the layout of the example FPGA built in Sec. 4, and compare the results with previous works [12, 24] and
from 2K LUT6s up to 512K LUT6s, and compare the memory usage, runtime, and output file sizes.

As shown in Fig. 6, PRGA has a low memory footprint, and except for Routing Resource Graph (RRG) generation, PRGA only needs a short amount of time to generate all the files needed. Memory usage and runtime both scale linearly as the capacity of the architecture increases, but even for the half-million-gate architecture, PRGA finishes with less than 1GB memory within 5 minutes.

[Figure 6: plots not recovered. Panels and legends: Peak Memory (MB); Runtime (s) for Architecture Customization, Transformation & Generation Passes, Pickling, and Unpickling; Total runtime for RRG_GEN (s); Generated Verilog and Pickled Context; RRG.xml file size (MB).]

The bottleneck is RRG generation. RRG can be read by VPR in addition to the architecture specification XML, overriding the default auto-generated routing resource graph in VPR at a per-

6 ENABLED APPLICATIONS
This section outlines some of the compelling applications that PRGA enables. PRGA can be used not only as a research platform but also as a generator of custom FPGA IP cores.