CGRA Content

Reconfigurable computing

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with very flexible, high-speed computing fabrics such as field-programmable gate arrays (FPGAs). The principal difference compared to ordinary microprocessors is the ability to make substantial changes to the datapath itself, in addition to the control flow. The main difference from custom hardware, i.e. application-specific integrated circuits (ASICs), is the possibility to adapt the hardware during runtime by "loading" a new circuit onto the reconfigurable fabric.

Current systems
Reconfigurable computers can be categorized into two classes of architectures: hybrid computers and fully FPGA-based computers. Both architectures are designed to bring the benefits of reconfigurable logic to large-scale computing, and they can be used in traditional CPU cluster computers and network infrastructures.

Hybrid computers combine one or a few reconfigurable logic chips (FPGAs) with a standard microprocessor CPU, for example by exchanging one CPU of a multi-CPU board for an FPGA (also known as hybrid-core computing) or by adding a PCI or PCI Express based FPGA expansion card to the computer. Simplified, they are von Neumann architectures with an integrated FPGA accelerator. This architectural compromise reduces the scalability of hybrid computers and raises their power consumption, and they retain the same bottlenecks as all von Neumann based architectures. Nevertheless, they let users accelerate their algorithms without losing their standard CPU-based environment.

A relatively new class is the fully FPGA-based computer. This class usually contains no CPUs, or uses CPUs only as an interface to the network environment. Its benefit is to pass on the energy efficiency and scalability of FPGAs to users without compromise. Depending on the architecture of the interconnection between the FPGAs, these machines are fully scalable, even across single-machine boundaries, and their bus system and overall architecture eliminate the bottlenecks of the von Neumann architecture.
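As a concrete illustration of the hybrid style, the following C sketch shows a typical host-side offload flow: configure the FPGA with a circuit, move data to the card, run the accelerated kernel, and move results back. The fpga_* functions and the "accel.bit" file name are hypothetical placeholders, stubbed in software here so the example compiles and runs; real systems use vendor-specific driver APIs.

/* Minimal sketch of host-side offload on a hybrid (CPU + FPGA card) system.
 * The fpga_* calls are hypothetical and stubbed in software for illustration. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define N 8

static uint32_t device_buf[N];  /* stands in for on-card memory */

static int fpga_load_bitstream(const char *path) {
    /* On real hardware this would configure the FPGA fabric with a new circuit. */
    printf("configuring FPGA with %s\n", path);
    return 0;
}

static void fpga_write(const uint32_t *src, size_t n) {
    memcpy(device_buf, src, n * sizeof *src);           /* host -> card transfer */
}

static void fpga_run_kernel(void) {
    /* Stand-in for the offloaded datapath: a multiply-accumulate done "on the card". */
    for (size_t i = 0; i < N; i++)
        device_buf[i] = device_buf[i] * 3u + 1u;
}

static void fpga_read(uint32_t *dst, size_t n) {
    memcpy(dst, device_buf, n * sizeof *dst);            /* card -> host transfer */
}

int main(void) {
    uint32_t data[N] = {0, 1, 2, 3, 4, 5, 6, 7};

    /* The CPU keeps its normal environment; only the hot loop is accelerated. */
    fpga_load_bitstream("accel.bit");
    fpga_write(data, N);
    fpga_run_kernel();
    fpga_read(data, N);

    for (size_t i = 0; i < N; i++)
        printf("%u ", (unsigned)data[i]);
    printf("\n");
    return 0;
}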

Programming of reconfigurable computers

FPGA reconfiguration can be accomplished either via traditional hardware description languages (HDLs), which can be written directly, or by using electronic design automation (EDA) or electronic system level (ESL) tools that employ higher-level languages: the graphical tool Starbridge Viva, C-based languages such as ROCCC 2.0 from Jacquard Computing Inc., Impulse C from Impulse Accelerated Technologies, SystemC, LISA, Handel-C from Celoxica, DIME-C from Nallatech, C-to-Verilog.com or Mitrion-C from Mitrionics, or graphical programming languages like LabVIEW.
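To give a flavour of the C-based approaches listed above, the sketch below writes a small FIR filter in the restricted C style such tools typically accept: fixed-size loops, no recursion, no dynamic memory. The pipelining and unrolling remarks in the comments are illustrative assumptions; each tool has its own annotation syntax, which is not shown. The example is plain standard C and runs on a CPU as-is.

/* A 4-tap FIR filter in an HLS-friendly C style. An HLS tool could unroll the
 * inner loop into four parallel multipliers and pipeline the outer loop to
 * accept one sample per clock cycle (tool-specific annotations omitted). */
#include <stdio.h>
#include <stdint.h>

#define TAPS 4
#define SAMPLES 16

void fir(const int16_t x[SAMPLES], const int16_t h[TAPS], int32_t y[SAMPLES]) {
    for (int n = 0; n < SAMPLES; n++) {      /* candidate for pipelining */
        int32_t acc = 0;
        for (int k = 0; k < TAPS; k++) {     /* candidate for full unrolling */
            int idx = n - k;
            acc += (idx >= 0) ? x[idx] * h[k] : 0;
        }
        y[n] = acc;
    }
}

int main(void) {
    int16_t x[SAMPLES], h[TAPS] = {1, 2, 2, 1};
    int32_t y[SAMPLES];
    for (int i = 0; i < SAMPLES; i++)
        x[i] = (int16_t)i;                   /* simple ramp as test input */

    fir(x, h, y);
    for (int i = 0; i < SAMPLES; i++)
        printf("%d ", (int)y[i]);
    printf("\n");
    return 0;
}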

Granularity
The granularity of the reconfigurable logic is defined as the size of the smallest functional unit (configurable logic block, CLB) that is addressed by the mapping tools. High granularity, also known as fine-grained, often implies greater flexibility when implementing algorithms in hardware. However, there is a penalty in increased power, area and delay, because a greater amount of routing is required per computation. Fine-grained architectures work at the level of bit manipulation, whereas coarse-grained processing elements (reconfigurable datapath units, rDPUs) are better optimised for standard datapath applications.

One drawback of coarse-grained architectures is that they tend to lose some of their utilisation and performance when they need to perform computations smaller than their granularity provides; for example, a one-bit add on a four-bit-wide functional unit wastes three bits. This problem can be addressed by placing a coarse-grained array (reconfigurable datapath array, rDPA) and an FPGA on the same chip.

Coarse-grained architectures (rDPAs) are intended for the implementation of algorithms needing word-width datapaths (rDPUs). Because their functional blocks are optimised for large computations and typically comprise word-wide arithmetic logic units (ALUs), they perform these computations more quickly and with greater power efficiency than a set of interconnected smaller functional units; the connecting wires are shorter, giving less wire capacitance and hence faster and lower-power designs. A potential undesirable consequence of having larger computational blocks is that, when the size of the operands does not match the algorithm, resources can be utilised inefficiently. Often the type of applications to be run is known in advance, allowing the logic, memory and routing resources to be tailored (for instance, see KressArray Xplorer) to enhance the performance of the device while still providing a certain level of flexibility for future adaptation. Examples of this are domain-specific arrays, which aim at better performance in terms of power, area and throughput than their more generic, finer-grained FPGA cousins by reducing their flexibility.
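The utilisation argument can be made concrete with a small back-of-the-envelope sketch. Assuming, purely for illustration, that an operation of a given bit width occupies whole rDPUs of a fixed word width, the fraction of useful bits drops sharply when the widths do not match; the one-bit add on a four-bit unit from the text comes out at 25%.

/* Back-of-the-envelope utilisation of coarse-grained functional units (rDPUs):
 * useful bits divided by the bits allocated when whole units are occupied.
 * The occupancy model (ceil(op_bits / rdpu_bits) units) is an assumption. */
#include <stdio.h>

static double utilisation(int op_bits, int rdpu_bits) {
    int units = (op_bits + rdpu_bits - 1) / rdpu_bits;      /* rDPUs occupied */
    return (double)op_bits / (double)(units * rdpu_bits);   /* useful / allocated bits */
}

int main(void) {
    /* The one-bit add on a four-bit unit from the text: three of four bits wasted. */
    printf("1-bit op on 4-bit rDPU:   %.0f%% utilised\n", 100 * utilisation(1, 4));
    printf("32-bit op on 32-bit rDPU: %.0f%% utilised\n", 100 * utilisation(32, 32));
    printf("17-bit op on 16-bit rDPU: %.0f%% utilised\n", 100 * utilisation(17, 16));
    return 0;
}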
