23 results sorted by ID
Possible spell-corrected query: is extension
Verifying Jolt zkVM Lookup Semantics
Carl Kwan, Quang Dao, Justin Thaler
Applications
Lookups are a popular way to express repeated constraints in state-of-the art SNARKs. This is especially the case for zero-knowledge virtual machines (zkVMs), which produce succinct proofs of correct execution for programs expressed as bytecode according to a specific instruction set architecture (ISA). The Jolt zkVM (Arun, Setty & Thaler, Eurocrypt 2024) for RISC-V ISA employs Lasso (Setty, Thaler & Wahby, Eurocrypt 2024), an efficient lookup argument for massive structured tables, to prove...
Marian: An Open Source RISC-V Processor with Zvk Vector Cryptography Extensions
Thomas Szymkowiak, Endrit Isufi, Markku-Juhani Saarinen
Implementation
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
SoK: Instruction Set Extensions for Cryptographers
Hao Cheng, Johann Großschädl, Ben Marshall, Daniel Page, Markku-Juhani O. Saarinen
Implementation
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
Towards ML-KEM & ML-DSA on OpenTitan
Amin Abdulrahman, Felix Oberhansl, Hoang Nguyen Hien Pham, Jade Philipoom, Peter Schwabe, Tobias Stelzer, Andreas Zankl
Implementation
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors
Oren Ganon, Itamar Levi
Implementation
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
Fast polynomial multiplication using matrix multiplication accelerators with applications to NTRU on Apple M1/M3 SoCs
Décio Luiz Gazzoni Filho, Guilherme Brandão, Julio López
Implementation
Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level of processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions,...
PQ.V.ALU.E: Post-Quantum RISC-V Custom ALU Extensions on Dilithium and Kyber
Konstantina Miteloudi, Joppe Bos, Olivier Bronchain, Björn Fay, Joost Renes
Implementation
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices.
The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation.
This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
Recommendation for a holistic secure embedded ISA extension
Florian Stolz, Marc Fyrbiak, Pascal Sasdrich, Tim Güneysu
Foundations
Embedded systems are a cornerstone of the ongoing digitization of our society, ranging from expanding markets around IoT and smart-X devices over to sensors in autonomous driving, medical equipment or critical infrastructures. Since a vast amount of embedded systems are safety-critical (e.g., due to their operation site), security is a necessity for their operation. However, unlike mobile, desktop, and server systems, where adversaries typically only act have remote access, embedded systems...
RISC-V Instruction Set Extensions for Lightweight Symmetric Cryptography
Hao Cheng, Johann Großschädl, Ben Marshall, Dan Page, Thinh Pham
Implementation
The NIST LightWeight Cryptography (LWC) selection process aims to standardise cryptographic functionality which is suitable for resource-constrained devices. Since the outcome is likely to have significant, long-lived impact, careful evaluation of each submission with respect to metrics explicitly outlined in the call is imperative. Beyond the robustness of submissions against cryptanalytic attack, metrics related to their implementation (e.g., execution latency and memory footprint) form an...
On the Performance Gap of a Generic C Optimized Assembler and Wide Vector Extensions for Masked Software with an Ascon-{\it{p}} test case
Dor Salomon, Itamar Levi
Implementation
Efficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to...
A Scalable SIMD RISC-V based Processor with Customized Vector Extensions for CRYSTALS-Kyber
Huimin Li, Nele Mentens, Stjepan Picek
Implementation
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a $5\times5\times64{-}bit$ matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of...
SME: Scalable Masking Extensions
Ben Marshall, Dan Page
Supporting masking countermeasures for non-invasive side-channel security in instructions set architectures is a hard problem. Masked operations often have a large number of inputs and outputs, and enabling portable higher order masking has remained a difficult. However, there are clear benefits to enabling this in terms of performance, code density and security guarantees. We present SME, an instruction set extension for enabling secure and efficient software masking of cryptographic
code...
Architecture Support for Bitslicing
Pantea Kiaei, Tom Conroy, Patrick Schaumont
Implementation
The bitsliced programming model has shown to boost the throughput of software programs. However, on a standard architecture, it exerts a high pressure on register access, causing memory spills and restraining the full potential of bitslicing. In this work, we present architecture support for bitslicing in a System-on-Chip. Our hardware extensions are of two types; internal to the processor core, in the form of custom instructions, and external to the processor, in the form of direct memory...
A lightweight ISE for ChaCha on RISC-V
Ben Marshall, Daniel Page, Thinh Hung Pham
Implementation
ChaCha is a high-throughput stream cipher designed with the aim of ensuring high-security margins while achieving high performance on software platforms. RISC-V, an emerging, free, and open Instruction Set Architecture (ISA) is being developed with many instruction set extensions (ISE). ISEs are a native concept in RISC-V to support a relatively small RISC-V ISA to suit different use-cases including cryptographic acceleration via either standard or custom ISEs. This paper proposes a...
The design of scalar AES Instruction Set Extensions for RISC-V
Ben Marshall, G. Richard Newell, Dan Page, Markku-Juhani O. Saarinen, Claire Wolf
Implementation
Secure, efficient execution of AES is an essential requirement on most computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. RISC-V is a (relatively) new ISA that lacks such a standardised ISE. We survey the state-of-the-art industrial and academic ISEs for AES, implement and evaluate five different ISEs, one of which is novel. We recommend separate ISEs for 32 and 64-bit base architectures, with measured performance improvements for an AES-128...
Development of The RISC-V Entropy Source Interface
Markku-Juhani O. Saarinen, G. Richard Newell, Ben Marshall
Implementation
The RISC-V True Random Number Generator (TRNG) architecture breaks with previous ISA TRNG practice by splitting the Entropy Source (ES) component away from cryptographic DRBGs into a separate privileged interface, and in its use of polling. The modular approach is suitable for the RISC-V hardware IP ecosystem, allows a significantly smaller implementation footprint on platforms that need it, while directly supporting current standards compliance testing methods. We describe the interface,...
An Instruction Set Extension to Support Software-Based Masking
Si Gao, Johann Großschädl, Ben Marshall, Dan Page, Thinh Pham, Francesco Regazzoni
Implementation
In both hardware and software, masking can represent an effective means of hardening an implementation against side channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper...
RISQ-V: Tightly Coupled RISC-V Accelerators for Post-Quantum Cryptography
Tim Fritzmann, Georg Sigl, Johanna Sepúlveda
Public-key cryptography
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
ISA Extensions for Finite Field Arithmetic - Accelerating Kyber and NewHope on RISC-V
Erdem Alkim, Hülya Evkan, Norman Lahr, Ruben Niederhagen, Richard Petri
Implementation
We present and evaluate a custom extension to the RISC-V instruction set
for finite fields arithmetic. The result serves as a very compact
approach to software-hardware co-design of PQC implementations in the
context of small embedded processors such as smartcards. The extension
provides instructions that implement finite field operations with
subsequent reduction of the result. As small finite fields are used in
various PQC schemes, such instructions can provide a considerable
speedup for...
A Lattice-based Enhanced Privacy ID
Nada EL Kassem, Luis Fiolhais, Paulo Martins, Liqun Chen, Leonel Sousa
Cryptographic protocols
The Enhanced Privacy ID (EPID) scheme is currently used for hardware enclave attestation by an increasingly large number of platforms that implement Intel Software Guard Extensions (SGX). However, the scheme currently deployed by Intel is supported on Elliptic Curve Cryptography (ECC), and will become insecure should a large quantum computer become available. As part of National Institute of Standards and Technology (NIST)'s effort for the standardisation of post-quantum cryptography, there...
SNEIK on Microcontrollers: AVR, ARMv7-M, and RISC-V with Custom Instructions
Markku-Juhani O. Saarinen
Implementation
SNEIK is a family of lightweight cryptographic algorithms derived from a
single 512-bit permutation. The SNEIGEN ``entropy distribution
function'' was designed to speed up certain functions in post-quantum
and lattice-based public key algorithms.
We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit
ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an
implementation for the open-source RISC-V (RV32I) Instruction Set
Architecture (ISA). Our results demonstrate...
Data Oblivious ISA Extensions for Side Channel-Resistant and High Performance Computing
Jiyong Yu, Lucas Hsiung, Mohamad El Hajj, Christopher W. Fletcher
Foundations
Blocking microarchitectural (digital) side channels is one of the most pressing challenges in hardware security today. Recently, there has been a surge of effort that attempts to block these leakages by writing programs data obliviously. In this model, programs are written to avoid placing sensitive data-dependent pressure on shared resources. Despite recent efforts, however, running data oblivious programs on modern machines today is insecure and low performance. First, writing programs...
Simple AEAD Hardware Interface (SÆHI) in a SoC: Implementing an On-Chip Keyak/WhirlBob Coprocessor
Markku-Juhani O. Saarinen
Simple AEAD Hardware Interface (SÆHI) is a hardware cryptographic interface aimed at CAESAR Authenticated Encryption with Associated Data (AEAD) algorithms. Cryptographic acceleration is typically achieved either with a coprocessor or via instruction set extensions. ISA modifications require re-engineering the CPU core, making the approach inapplicable outside the realm of open source processor cores. At minimum, we suggest implementing CAESAR AEADs as universal memory-mapped cryptographic...
Lookups are a popular way to express repeated constraints in state-of-the art SNARKs. This is especially the case for zero-knowledge virtual machines (zkVMs), which produce succinct proofs of correct execution for programs expressed as bytecode according to a specific instruction set architecture (ISA). The Jolt zkVM (Arun, Setty & Thaler, Eurocrypt 2024) for RISC-V ISA employs Lasso (Setty, Thaler & Wahby, Eurocrypt 2024), an efficient lookup argument for massive structured tables, to prove...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level of processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions,...
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices. The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation. This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
Embedded systems are a cornerstone of the ongoing digitization of our society, ranging from expanding markets around IoT and smart-X devices over to sensors in autonomous driving, medical equipment or critical infrastructures. Since a vast amount of embedded systems are safety-critical (e.g., due to their operation site), security is a necessity for their operation. However, unlike mobile, desktop, and server systems, where adversaries typically only act have remote access, embedded systems...
The NIST LightWeight Cryptography (LWC) selection process aims to standardise cryptographic functionality which is suitable for resource-constrained devices. Since the outcome is likely to have significant, long-lived impact, careful evaluation of each submission with respect to metrics explicitly outlined in the call is imperative. Beyond the robustness of submissions against cryptanalytic attack, metrics related to their implementation (e.g., execution latency and memory footprint) form an...
Efficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to...
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a $5\times5\times64{-}bit$ matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of...
Supporting masking countermeasures for non-invasive side-channel security in instructions set architectures is a hard problem. Masked operations often have a large number of inputs and outputs, and enabling portable higher order masking has remained a difficult. However, there are clear benefits to enabling this in terms of performance, code density and security guarantees. We present SME, an instruction set extension for enabling secure and efficient software masking of cryptographic code...
The bitsliced programming model has shown to boost the throughput of software programs. However, on a standard architecture, it exerts a high pressure on register access, causing memory spills and restraining the full potential of bitslicing. In this work, we present architecture support for bitslicing in a System-on-Chip. Our hardware extensions are of two types; internal to the processor core, in the form of custom instructions, and external to the processor, in the form of direct memory...
ChaCha is a high-throughput stream cipher designed with the aim of ensuring high-security margins while achieving high performance on software platforms. RISC-V, an emerging, free, and open Instruction Set Architecture (ISA) is being developed with many instruction set extensions (ISE). ISEs are a native concept in RISC-V to support a relatively small RISC-V ISA to suit different use-cases including cryptographic acceleration via either standard or custom ISEs. This paper proposes a...
Secure, efficient execution of AES is an essential requirement on most computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. RISC-V is a (relatively) new ISA that lacks such a standardised ISE. We survey the state-of-the-art industrial and academic ISEs for AES, implement and evaluate five different ISEs, one of which is novel. We recommend separate ISEs for 32 and 64-bit base architectures, with measured performance improvements for an AES-128...
The RISC-V True Random Number Generator (TRNG) architecture breaks with previous ISA TRNG practice by splitting the Entropy Source (ES) component away from cryptographic DRBGs into a separate privileged interface, and in its use of polling. The modular approach is suitable for the RISC-V hardware IP ecosystem, allows a significantly smaller implementation footprint on platforms that need it, while directly supporting current standards compliance testing methods. We describe the interface,...
In both hardware and software, masking can represent an effective means of hardening an implementation against side channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper...
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
We present and evaluate a custom extension to the RISC-V instruction set for finite fields arithmetic. The result serves as a very compact approach to software-hardware co-design of PQC implementations in the context of small embedded processors such as smartcards. The extension provides instructions that implement finite field operations with subsequent reduction of the result. As small finite fields are used in various PQC schemes, such instructions can provide a considerable speedup for...
The Enhanced Privacy ID (EPID) scheme is currently used for hardware enclave attestation by an increasingly large number of platforms that implement Intel Software Guard Extensions (SGX). However, the scheme currently deployed by Intel is supported on Elliptic Curve Cryptography (ECC), and will become insecure should a large quantum computer become available. As part of National Institute of Standards and Technology (NIST)'s effort for the standardisation of post-quantum cryptography, there...
SNEIK is a family of lightweight cryptographic algorithms derived from a single 512-bit permutation. The SNEIGEN ``entropy distribution function'' was designed to speed up certain functions in post-quantum and lattice-based public key algorithms. We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an implementation for the open-source RISC-V (RV32I) Instruction Set Architecture (ISA). Our results demonstrate...
Blocking microarchitectural (digital) side channels is one of the most pressing challenges in hardware security today. Recently, there has been a surge of effort that attempts to block these leakages by writing programs data obliviously. In this model, programs are written to avoid placing sensitive data-dependent pressure on shared resources. Despite recent efforts, however, running data oblivious programs on modern machines today is insecure and low performance. First, writing programs...
Simple AEAD Hardware Interface (SÆHI) is a hardware cryptographic interface aimed at CAESAR Authenticated Encryption with Associated Data (AEAD) algorithms. Cryptographic acceleration is typically achieved either with a coprocessor or via instruction set extensions. ISA modifications require re-engineering the CPU core, making the approach inapplicable outside the realm of open source processor cores. At minimum, we suggest implementing CAESAR AEADs as universal memory-mapped cryptographic...