Longnail: High-Level Synthesis of Portable Custom Instruction Set Extensions for RISC-V Processors from Descriptions in the Open-Source CoreDSL Language

J Oppermann, BM Damian-Kosterhon, F Meisel, T Mürmann, E Jentzsch, A Koch
Proceedings of the 29th ACM International Conference on Architectural …, 2024 - dl.acm.org
In the RISC-V ecosystem, custom instruction set architecture extensions (ISAXes) are an energy-efficient and cost-effective way to accelerate modern embedded workloads. However, exploring different combinations of base cores and ISAXes for a specific application requires automation and a level of portability across microarchitectures that existing approaches do not provide.
To that end, we present an end-to-end flow for ISAX specification, generation, and integration into several host cores with a range of microarchitectures. For ISAX specification, we propose CoreDSL, a novel behavioral architecture description language that is concise, easy to learn, and open source. Hardware generation is handled by Longnail, a domain-specific high-level synthesis tool that compiles CoreDSL specifications into hardware modules compatible with the recently introduced SCAIE-V extension interface, on which we rely for automatic integration into the host cores.
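To make "behavioral description" more concrete, here is a minimal Python sketch of the two things such a specification pins down for a single custom instruction: its binary encoding and its effect on architectural state. The instruction (`DOTP4`, a packed 4x8-bit unsigned dot product), its funct3/funct7 values, and every function name below are hypothetical and chosen only for illustration; this is neither CoreDSL syntax nor an interface of Longnail or SCAIE-V. Only the R-type field layout and the custom-0 major opcode (`0b0001011`) come from the RISC-V base ISA.

```python
# Hypothetical ISAX instruction "DOTP4" (illustration only, not CoreDSL syntax).
# Uses the standard RISC-V R-type layout in the custom-0 major opcode.
CUSTOM_0 = 0b0001011        # RISC-V custom-0 major opcode
FUNCT3_DOTP4 = 0b000        # assumed value, chosen for illustration
FUNCT7_DOTP4 = 0b0000000    # assumed value, chosen for illustration


def encode_dotp4(rd: int, rs1: int, rs2: int) -> int:
    """Assemble the 32-bit R-type instruction word for DOTP4."""
    for r in (rd, rs1, rs2):
        if not 0 <= r < 32:
            raise ValueError("register index out of range")
    return ((FUNCT7_DOTP4 << 25) | (rs2 << 20) | (rs1 << 15)
            | (FUNCT3_DOTP4 << 12) | (rd << 7) | CUSTOM_0)


def matches_dotp4(word: int) -> bool:
    """Decoder check: opcode, funct3, and funct7 must all match."""
    return ((word & 0x7F) == CUSTOM_0
            and ((word >> 12) & 0x7) == FUNCT3_DOTP4
            and ((word >> 25) & 0x7F) == FUNCT7_DOTP4)


def execute_dotp4(word: int, x: list[int]) -> None:
    """Behavior: x[rd] = sum over i of byte_i(x[rs1]) * byte_i(x[rs2])."""
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    acc = 0
    for i in range(4):
        a = (x[rs1] >> (8 * i)) & 0xFF
        b = (x[rs2] >> (8 * i)) & 0xFF
        acc += a * b
    if rd != 0:             # x0 is hardwired to zero in RISC-V
        x[rd] = acc & 0xFFFFFFFF


if __name__ == "__main__":
    regs = [0] * 32
    regs[1], regs[2] = 0x01020304, 0x01010101
    insn = encode_dotp4(rd=3, rs1=1, rs2=2)
    if matches_dotp4(insn):
        execute_dotp4(insn, regs)
    print(hex(insn), regs[3])
```

This is only a software reference model. According to the abstract, the flow instead starts from a CoreDSL description, which Longnail compiles into a hardware module that is attached to the host core through SCAIE-V.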
We demonstrate our tooling by generating ISAXes that use a mix of features, including complex multi-cycle computations, memory accesses, branch instructions, custom registers, and decoupled execution, integrating them into four embedded cores, and we evaluate the quality of results on a 22 nm ASIC process.