FURISC: FHE Encrypted URISC Design: Ayantika Chatterjee, Indranil Sengupta


FURISC: FHE Encrypted URISC Design


Ayantika Chatterjee, Indranil Sengupta
[email protected], [email protected]

Abstract—This paper proposes the design of a Fully Homomorphic Ultimate RISC (FURISC) based processor. The FURISC architecture supports arbitrary operations on data encrypted with Fully Homomorphic Encryption (FHE) and allows the execution of encrypted programs stored in processors with encrypted memory addresses. The FURISC architecture is designed based on fully homomorphic single RISC instructions like Subtract Branch if Negative (SBN) and MOVE. This paper explains how the use of FHE for designing the ultimate RISC processor is better in terms of security compared to the previously proposed somewhat homomorphic encryption (SHE) based processor. The absence of randomization in SHE can lead to Chosen Plaintext Attacks (CPA), which is alleviated by the use of the FHE based Ultimate RISC instruction. Furthermore, the use of FURISC helps to develop fully homomorphic applications by tackling the termination problem, which is a major obstacle for FHE processor design. The paper compares the MOVE based FHE RISC processor with the SBN alternative, and shows that the latter is more efficient in terms of number of instructions and time required for the execution of a program. Finally, an SBN based FURISC processor simulator has been designed to demonstrate that various algorithms can indeed be executed on data encrypted with FHE, providing a solution to the termination problem for FHE based processors and the CPA insecurity of SHE processors simultaneously.

Index Terms—Fully Homomorphic encryption, Cloud, URISC.

I. INTRODUCTION

Cloud computing offers a new paradigm to increase computing and storage capability using external service providers. Using the concept of "loan of software" and hardware, the cloud mitigates the need for large local resources. However, every solution comes with new problems: using the cloud to store sensitive data may lead to security problems. To establish a successful and trustworthy service, the cloud service provider is expected to protect the privacy of the information stored in the cloud, and to achieve this, different techniques have been adopted on both the client side and the server side. In spite of this, external attackers may penetrate the system while internal attackers may compromise information.

Concerns regarding privacy and security are the biggest hurdles for the adoption of cloud computing by security-conscious enterprises [1]. Cryptographic techniques can provide a solution to this cloud security problem. Other than providing privacy through anonymity, classical encryption-decryption techniques are beneficial. Users can store potentially sensitive data in encrypted form on such a public server to maintain confidentiality. However, this solution incurs extra overhead when the stored data has to be processed. Every time any simple processing is performed on stored data, decryption is necessary. Further, a costly encryption operation is required to upload the data back to the cloud. In this way, sensitive data need to be transferred to and from the server and are repeatedly exposed to adversaries. Another major drawback is that the huge amount of cloud resources can be used only for storing data and never for processing critical information. To avoid these issues, it is required to delegate the ability to process the data without decrypting it. In this scenario, a homomorphic encryption scheme is the only answer to this problem [2].

As discussed in [1], direct computation on encrypted data can be achieved by adding interaction and using secured hardware. However, Rivest et al. first introduced the concept of privacy homomorphism [3]. A homomorphic encryption scheme is a public-key encryption scheme which allows algebraic manipulations on ciphertexts [1]. That implies that any user who is given only two ciphertexts Encrypt(m1, pk) and Encrypt(m2, pk) of elements of the group (G2, ∗) can compute Encrypt(m1, pk) ⊕ Encrypt(m2, pk) without knowledge of the secret key and the plaintexts. The operations ∗ and ⊕ depend on the choice of encryption scheme. Hence, with such an encryption scheme, the cloud can process encrypted data without knowing the actual data and result.

However, the capability of processing single encrypted data items directly does not satisfy the requirement of fully secured computation. If the computation flow remains unencrypted in secured processing, it may leak sensitive information. Hence, researchers are currently exploring secured encrypted processors where data as well as computations are encrypted. In [4], a Turing complete encrypted One Instruction Set Computer (OISC) has been proposed based on the partially homomorphic Paillier encryption scheme. However, this design suffers from a few limitations from the security point of view. Firstly, to design the encrypted memory the encryption scheme is considered to be deterministic, which makes the design susceptible to Chosen Plaintext Attack (CPA). However, the underlying somewhat homomorphic encryption scheme is incapable of supporting an encrypted memory with a randomized encryption scheme. This randomization is supported only with fully homomorphic encryption (FHE) as the underlying scheme.

In the literature, research on FHE is taking place in different directions. Encrypted bitwise additions and multiplications are defined in [5] and implemented using integers in [6] and [7]. Further efficiency enhancements of fully homomorphic encryption have been reported in [8], [9] and [10]. In [11] and [12], advancements have been proposed to implement faster encryption schemes. Further, in [13] and [14], recent developments of FHE have been discussed. In [15]–[17], searching and sorting on FHE data have been investigated. To accelerate the performance of FHE, the use of hardware has also been investigated in [18].
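As a concrete illustration of the homomorphic property stated earlier in this section (combining Encrypt(m1, pk) and Encrypt(m2, pk) without the secret key), the following is a minimal sketch of the additively homomorphic Paillier scheme, which the text names as the basis of the design in [4]. It is not the FHE scheme used by FURISC. The primes, the function names and the explicit randomness argument `r` are illustrative assumptions; a real deployment would use large random primes and sample `r` at random.

```python
import math

def keygen(p=293, q=433):
    """Toy Paillier key pair (illustrative primes, far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)   # modular inverse; valid because the generator is g = n + 1
    return n, (lam, mu)    # public key n, secret key (lam, mu)

def encrypt(n, m, r):
    """c = (1 + n)^m * r^n mod n^2, where r is a unit mod n (random in practice)."""
    assert math.gcd(r, n) == 1
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)          # equals 1 + (m * lam) * n mod n^2
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts mod n."""
    return (c1 * c2) % (n * n)

n, sk = keygen()
c1 = encrypt(n, 20, r=17)
c2 = encrypt(n, 22, r=23)
total = decrypt(n, sk, add_encrypted(n, c1, c2))

# The same plaintext under different randomness gives different ciphertexts,
# which is the randomization that a deterministic encrypted memory gives up.
same_plaintext_differs = encrypt(n, 20, r=17) != encrypt(n, 20, r=29)
print(total, same_plaintext_differs)
```

The server-side step is `add_encrypted`, which uses only the public modulus. Only additions are available in this scheme, which is the limitation that motivates moving to FHE.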

In [19], the authors have given an initial layout for designing FHE encrypted processors. However, determining the termination point of an encrypted program or identifying the end point of an encrypted loop are major challenges in designing an FHE based processor. In [20], the proposed solution to this problem is to define a maximum possible loop length, so that the loop or program terminates once the maximum value is reached. This solution requires a large number of redundant operations in a program as the number of loops increases. In our work, we explore how this problem can be better handled with client intervention and a message passing protocol between client and server. Further, we combine the flexibility of processing arbitrary operations on encrypted data by an FHE scheme with the simplicity of the unit reduced instruction set architecture (URISC) and investigate the benefit of applying such a design to solve the termination problem.

Our contribution in this paper is to develop an encrypted processor able to perform arbitrary computations on encrypted data with encrypted instructions. However, while working in the encrypted domain, handling different machine opcodes is difficult since the same opcode generates (bitwise) different ciphers due to the randomization property of the encryption scheme. This motivates us to design an FHE encrypted unit reduced instruction set computer (FURISC) architecture which works with a single opcode rather than a multi-instruction processor. The URISC is considered the penultimate reduction of the reduced instruction set computer [1], and is capable of synthesizing a complete set of operations with the help of a single instruction. Here, we provide the design of Subtract and Branch if Negative (SBN) and Move operation based FURISC architectures and finally explain how the encrypted, CPA secured FURISC architecture is capable of handling the encrypted loop termination problem in a more practical way. With examples of basic sorting and searching techniques we show the timing requirement of actually computing different arbitrary operations on encrypted data.

The rest of our paper is organized as follows: In section II, we discuss the preliminaries of homomorphism, especially the FHE scheme. Next, section III gives the justification for designing an FHE based OISC. In section IV, we explain our proposed design of the FHE based SBN processor, and it is compared with the Move based FHE processor in section V. Finally, we compare our design with existing works in section VII and conclude in section VIII, highlighting some possible future works.

II. PRELIMINARIES

Before going into the detailed design of the proposed encrypted processor, we first discuss the basic principle of operation of the FHE scheme. Fully homomorphic encryption provides a mechanism to perform arbitrary computations over encrypted data. The promise shown in the work of Gentry [2] has been followed by several improvements to develop more efficient realizations of this technique, which has potential applications for performing privacy preserving operations, which is relevant to cloud computing. In this section, we first provide a brief outline of the FHE scheme and a popular library for performing the basic computations based on this encryption.

A. Homomorphisms and Fully Homomorphic Encryption Scheme

Homomorphism is a structure-preserving transformation between two sets, where an operation on two members in the first set is preserved in the second set on the corresponding members. Let $P$ and $C$ be sets with members $p_1, p_2 \in P$, let $t$ be a transformation between the two sets with its reverse function $t'$, and let $\oplus$ be an operation. The system is a homomorphism if $\forall (p_1, p_2) \in P$, $(p_1 \oplus p_2) = t'(t(p_1) \odot t(p_2))$. If there are two functions $\oplus$ and $\otimes$ such that $\forall (p_1, p_2) \in P$, $(p_1 \oplus p_2) = t'(t(p_1) \odot t(p_2))$ and $\forall (p_1, p_2) \in P$, $(p_1 \otimes p_2) = t'(t(p_1) * t(p_2))$, this is called an algebraic homomorphism. The operations $\oplus$ and $\otimes$ on plaintext may be the same as or different from the operations $\odot$ and $*$ performed on ciphertext.

The obvious practical implication is the possibility to transform the two members $p_1$ and $p_2$ into the range of $C$, thus applying some sort of encryption, and having the operations $\oplus$ and $\otimes$ (or equivalent operations) performed by a third party. The result can then be decrypted back into the range of $P$. An algebraically homomorphic crypto-system can be described as a 6-tuple $H_1 = (P, C, t, t', \oplus, \otimes)$, where $P$ and $C$ denote the plaintext space and the ciphertext space, respectively, whereas $t$ and $t'$ denote the encryption and decryption functions. $\oplus$ and $\otimes$ tag the two algebraic operations. In a group homomorphic encryption scheme (GHE), the encryption function forms a group homomorphism and the encryption scheme allows an operation on ciphertexts that is equivalent to some binary operation on the corresponding plaintexts [1].

B. Fully Homomorphic Encryption

A fully homomorphic encryption (FHE) scheme is an extended form of group homomorphic encryption (GHE). GHE supports only a single arbitrary operation on plaintext (as well as on ciphertext), whereas FHE supports two arbitrary operations $(+, *)$ on plaintexts (as well as $(\oplus, \odot)$ on ciphertexts). The FHE scheme defined by Gentry is explained in [2]. The scheme has the security parameter $\lambda$, and sets $N = \lambda$, $P = \lambda^2$, $Q = \lambda^5$. The scheme also uses two integer parameters $0 < \alpha < \beta$ and the following algorithms:

1) KeyGen($\lambda$): Generate a random $P$-bit odd integer, $p$. A set $\vec{y} = \{y_1, y_2, \ldots, y_\beta\}$ is generated such that $y_i \in [0, 2)$. Out of these elements, there must exist a sparse subset $S \subset \vec{y}$ of $\alpha$ elements, such that $\sum_{y_j \in S} y_j = \frac{1}{p} \bmod 2$. Set $sk$ to be a binary encoding $s$ of the sparse subset $S$, where $s \in \{0, 1\}^\beta$. Set $pk \leftarrow (p, \vec{y})$.

2) Encrypt($pk$, $m$): Obtain the ciphertext $c = m' + pq$, where $m'$ is a random $N$-bit integer such that $m = m' \bmod 2$. Generate $\vec{z}$: $z_i \leftarrow c \cdot y_i \bmod 2$. Return $c^* = (c, \vec{z})$. In the rest of the paper, we shall refer to Encrypt($pk$, $m$) as Encrypt.

3) Decrypt($sk$, $c^*$): Output $\mathrm{LSB}(c) \;\mathrm{XOR}\; \mathrm{LSB}(\lfloor \sum_t s_t z_t \rceil)$, where LSB() returns the least significant bit of the input, and $\lfloor \cdot \rceil$ returns the nearest integer to the input. Decryption works since (up to small precision errors) $\sum_t s_t z_t = \sum_t c\, s_t y_t = \frac{c}{p} \bmod 2$.

The above encryption allows arbitrary computations on encrypted data by defining operations like

Evaluate(f, c1, ..., ct), where f is an arbitrary operation on the ciphertexts c1, ..., ct. The result of the computation is always a ciphertext c whose decryption is the same as the function f applied to the plaintexts corresponding to c1, ..., ct. However, the decryption can be erroneous if the noise (measured as c mod p) increases. In order to reduce the error during the computations, there is an additional operation, called Recrypt, which takes the ciphertext c and produces another ciphertext, say c′, which corresponds to the same plaintext but with a reduced noise level. The operation is done by computing the decryption function as the function f in the Evaluate function.

However, direct application of Gentry’s FHE scheme has performance issues, hence many improvements and approaches from alternate assumptions have been proposed in [6], [21]. In our work, while performing homomorphic operations, we have re-used the homomorphic modules proposed in the Scarab library [22].

C. Scarab library

The Scarab library is an implementation of an FHE scheme using large integers. This scheme is based on the work proposed in [23], with some modifications in the recrypt operation. In [23], the authors have constructed a modified FHE scheme with relatively small key and ciphertext size from a somewhat homomorphic scheme based on Gentry’s work [2]. This modification has smaller message expansion and key size than Gentry’s original scheme and also allows efficient fully homomorphic encryption over any field of characteristic two. Hence, this work is more practical for applying FHE to real applications, and it is the building block of the Scarab library.

The implementation of this library uses the GNU Multiple Precision Arithmetic Library (GMP) for large integers and the Fast Library for Number Theory (FLINT) as helper libraries. The detailed encryption-decryption scheme, along with the modifications in the recrypt operation, is explained in [24]. In the following sections, we explore the design of the FURISC processor using the modules present in the Scarab library.

III. IMPLEMENTING HOMOMORPHIC ENCRYPTION USING AN ULTIMATE RISC INSTRUCTION

Ultimate RISC (URISC) is the minimalistic perspective on computer architecture design, where a single instruction is used to perform all computations. In this section, we first outline the rationale of using URISC for realizing FHE algorithms.

A. Justification of Single Instruction Processor for Encrypted Data

Fully Homomorphic Encryption (FHE) provides an avenue for performing arbitrary computations on encrypted data. However, the capability to operate on encrypted data alone is not sufficient for secured computation. In order to ensure that the control flow of the program is secured, it is necessary that the address space is also encrypted. Consider the program snippet if(a[i] > a[j]) i = i + 1; It can be observed that if the data is encrypted, the outcome of the comparison is also encrypted, which means that, to update the index of the array, the index also needs to be encrypted. Thus we develop a processor architecture wherein the data and the memory content are encrypted. Using the standard load-store paradigm of RISC processors, the program, which is comprised of the instructions from the Instruction Set Architecture (ISA), is thus also encrypted. In this section, we study the motivation of using a single RISC instruction, URISC, to build such a processor. We also address several related issues and discuss the motivation of choosing an FHE based URISC, which we call FURISC.

1) Why a single instruction?: As discussed, the memory content, which stores both the data and the instructions, has to be in an encrypted format. It may be mentioned that, for protection against Chosen-Plaintext Attacks (CPA) and other stronger forms of adversaries, the encryption algorithms are randomized. This implies that the same plaintext m can be encrypted to different ciphertexts, c = Enc(m, r), where Enc is the encryption algorithm and r is the random input¹. Thus a computer which has multiple instructions in its Instruction Set Architecture (ISA) will lead to the situation where with varying keys the same instruction would give rise to different encrypted instructions, and hence varying opcodes. This would make the functioning of the computer infeasible. URISC provides a unique opportunity in this context. A URISC is an abstract machine which uses only a single instruction, and other necessary instructions are composed from the single instruction set [25]. Thus, the URISC is Turing complete, and one can perform all computations using a single instruction. This resolves the confusion regarding varying opcodes in the case of a standard RISC or CISC processor, which has multiple instructions in its ISA.

¹ Decryption, however, is always a deterministic algorithm.

2) Pitfalls of using Somewhat Homomorphic Schemes: Why Fully Homomorphic Encryption?: For designing encrypted processors, somewhat homomorphic schemes are the first choice since FHE schemes suffer from performance issues. In [4] and [26], the authors have explored the design of an encrypted one instruction set processor based on Paillier encryption, which is an additive homomorphic encryption scheme. The underlying instruction, which is subleq (alias SBN), is a single instruction whose arithmetic computation is a subtraction on two operand values. In the same instruction, depending on whether the result is positive or negative, the Program Counter (PC) gets updated to the next address or to an instruction mentioned as another operand of the SBN instruction. Since Paillier encryption supports subtraction on encrypted data, this is a promising choice for developing a processor that performs arbitrary computations on encrypted data (decompose the program using encrypted SBN instructions, and subsequently execute them using the Paillier encryption algorithm).

Unfortunately, the design suffers from a serious deficiency. The PC needs to be updated based on an encrypted condition after the subtraction. Thus while the subtraction is supported by the underlying SHE, the update of the PC needs an

encrypted decision making module which can be realized by an encrypted multiplexer. To explain, consider a decision block where, depending on an encrypted condition c′, the output y′ may be a′ or b′ (all the variables are encrypted). The decision block can be realized by a multiplexer y′ = a′(c′) + b′(c̄′), where c̄′ denotes the complement of c′ and the computations on the right hand side are homomorphic. Thus, the design of a multiplexer on encrypted data and control requires the capability to perform both encrypted addition and multiplication, which is not supported by any somewhat homomorphic scheme.

In order to make these decisions, the design proposed in [4] uses a sign lookup memory table for storing the sign for encryptions of numbers. Moreover, it is assumed that the encryption is deterministic and the public key of the encryption is unknown to the adversary. In several real life scenarios such a restriction may not be feasible, and the deterministic encryption can make the processor computations vulnerable to chosen plaintext attack (CPA) [27].

This motivates us to look into replacing SHE with FHE, since FHE supports both addition and multiplication, which in turn makes it possible to design an encrypted decision module, namely the multiplexer. This provides the flexibility of making branch decisions and PC updates even when the encryption is randomized, without using any static encryption table or making the encryption deterministic. All these issues motivate the design of an encrypted processor with FHE as the underlying encryption scheme.

3) Issue in Fully Homomorphic Processor: Termination problem: An effort to design an FHE based processor was first made in [20]. However, a major open problem is to detect and handle the termination of encrypted processes. Since any encrypted process is locked in the cipher-space, all the intermediate termination conditions are encrypted. Hence, it is impossible to identify the points of loop termination or process termination from the unencrypted domain.

One possible solution to the encrypted loop termination problem, as proposed in [20], is to define the maximum number of cycles that need to be performed to safely execute the encrypted program. However, this incurs extra overhead of redundant operations. Moreover, when a program consists of a number of loops, the termination of each loop is handled in the same way by specifying the maximum number of possible cycles. Hence, the large number of redundant FHE operations required to handle each loop termination further increases the overhead. When the performance of FHE operations is a major hurdle, this solution of termination handling is an added bottleneck to performance.

Here, we propose a different approach for handling the termination problem in a better way in any cloud-server setting, more specifically when the server is a public cloud. We consider authorized clients that encrypt data with FHE and store them in the cloud server, where homomorphic processing on the encrypted data is supposed to take place, generating encrypted results. From the security point of view, if a client (or any other adversary) can identify the termination point of an encrypted process without having access to the secret key, then it is a potential threat to the underlying cryptosystem. Hence, the determination of the event of termination should always require the decryption key. However, the server does not have access to the secret key or decryption capability, and hence it requires client intervention to handle the termination problem. Next, we show how a message passing protocol between server and client can be a better option to handle this termination problem.

B. Client Intervention to Handle Termination

Homomorphic operations are performed directly on encrypted data and produce the final encrypted results in the cloud server. However, the real world works on unencrypted data, hence an authorized client may finally decrypt the encrypted result at the client side if the unencrypted value is required for further processing. This decryption capability at the client end can be used to solve the problem of encrypted loop termination without leaking any critical information.

In Fig. 1, we explain a generalized encrypted loop execution and termination with client intervention. Here, we define an unencrypted variable loopHndl at the server side such that the loop is executed if loopHndl = 1. FHE_Compare is an encrypted module which compares the encrypted loop counter variable (enc_i) with the encrypted value of the maximum loop count and generates compResult, which is sent to the client in each iteration of the loop. The compResult value is decrypted at the client side, and if it is 1 (indicating that enc_i has reached the maximum value and the loop should be terminated) an interrupt is generated at the client side to set the value of LoopEnd to 1. This unencrypted value is sent back to the server as loopHndl, and based on this value the control exits from the loop. Since loopHndl is not directly related to the critical information of the instructions executed in the loop, this unencrypted traffic from client to server does not reveal any sensitive information. Additionally, server and client can settle on some symmetric encryption key, and the client can send the encrypted value of loopHndl to the server, which the server can then decrypt.

However, this way of handling loop termination requires client intervention and decryption of the loopHndl signal for each loop. In a practical scenario, a program may consist of multiple loops, and hence a large number of messages passed from client to cloud server, as well as the decryption of each signal, need to be handled separately. This incurs extra overhead in terms of network bandwidth, synchronization and decryption operations. Subsequently, we shall explain how this repeated message passing for termination can be reduced in an efficient way if the underlying processor is an encrypted FURISC architecture, so that termination can be handled with a single message passed between client and server.

C. Solving Termination Problem in Fully Homomorphic Processor using FURISC

As explained, rather than designing an overall FHE processor we prefer a single instruction architecture, since URISC supports Turing complete computation, obviating the need for different machine opcodes. Different types of single instructions for modeling URISC are [25]:
• Subtract and branch if less than or equal to zero.

• Subtract and branch if negative (SBN).
• Reverse subtract and skip if borrow.
• Move.

[Figure 1: flowchart with the following boxes. Server side: encrypted loop count variable (enc_i) initialization, loopHndl=1; "Is loopHndl equals 1?"; loop instruction execution; enc_i <- FHE_Add(enc_i, 1); compResult <- FHE_Compare(enc_i, Max_LoopCount); send compResult to client; exit loop. Client side: LoopEnd <- Decrypt compResult; "Is LoopEnd equals 1?"; generate interrupt; set loopHndl=0; send loopHndl to server.]

Fig. 1. Example of encrypted loop handling

Here, we take the example of the SBN instruction based FURISC and show how it can be advantageous for handling the termination problem of FHE processes in an efficient way, with a single client-server interaction instead of multiple client-server interactions. In the SBN instruction, the operandum is subtracted from the operandam and the execution proceeds to next-address if the subtraction result is negative.

More formally, the instruction is represented as:

    SBN A, B, C :
        Mem[B] = Mem[A] - Mem[B];
        if (Mem[B] < 0)
            goto C
        else goto next instruction

In the case of encrypted processes, let the loop handling condition (or termination condition) be decided based on the subtraction result of the contents of address A and address B (Mem[A] and Mem[B]), which is stored in Mem[B]. Depending on the value of Mem[B], the PC can jump to an instruction within the loop itself or proceed to the next instructions outside the loop once the loop end condition has been reached. Multiple loops can be handled in the same way, and in such a scenario client-server message passing for each loop is not necessary. Finally, once the final termination condition has been reached, the PC can jump to a dedicated predefined End of program location. The client can get the information of program termination by a single message passed from this particular location. In the next section, we first explore how to design a URISC processor using FHE as the underlying encryption scheme and gradually explain this solution of encrypted program termination with more examples.

IV. DESIGN OF FURISC

In this section, we discuss the design basics of a FURISC processor based on two primitive URISC instructions: SBN and MOVE. Here we consider the 4-tuple format of the SBN instruction and explain how to design an SBN based FURISC.

Let A’, B’ and C’ be FHE encrypted memory addresses and Mem’[A’] and Mem’[B’] be the encrypted contents of the respective addresses. With these parameters, the fully homomorphic SBN instruction can be represented as:

    SBN A’, B’, resultant’, C’ :
        resultant’ = Mem’[A’] - Mem’[B’];
        if (resultant’ < enc(0))
            goto C’
        else goto next instruction

Implementation of this FHE based SBN instruction requires the following steps:
• Encryption Phase: Memory addresses A, B, C should be encrypted by FHE to A’, B’, C’, and the contents of the addresses are stored in encrypted format.
• Memory read-write: The contents of memory address A’ and memory address B’ need to be fetched. The encrypted memory module handles memory read and write operations, which will be explained in a subsequent section.
• FHE Subtraction: The subtraction of Mem’[A’] and Mem’[B’] is performed by the FHE_Sub module of the FURISC processor and stored in the register resultant’.
• Branching or program counter (PC) update: If (resultant’ < enc(0)), the execution control proceeds to C’, or otherwise to the next instruction pointed to by the encrypted PC, i.e. (PC’ + 1). This branch update is handled by the FHE_Branch module, which is again part of the encrypted ALU of FURISC.
• The value of the register resultant’ is finally written to a certain memory or register address location according to the URISC instructions.

[Figures 2 and 3: block diagrams built from Encrypted Memory, FHE_SUB, Bit-OR, FHE MUX and FHE Addition blocks, with labels for input addresses, encrypted addresses, encrypted input data, 0’ and Mem’[A’].]

Fig. 2. Encrypted memory read module for FURISC

Fig. 3. Encrypted memory write module for FURISC
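The 4-tuple SBN semantics and the implementation steps listed in this section can be sketched as a small interpreter. This is a plaintext model only: no encryption is performed, every value that would be an FHE ciphertext in FURISC is an ordinary Python integer, and the function name `run_sbn`, the `max_steps` safety cap, the memory layout and the sample program are illustrative assumptions rather than the authors' simulator. The PC update is written in multiplexer form (select bit times branch target plus complement times PC + 1), mirroring how an encrypted branch would be selected without a data-dependent jump.

```python
def run_sbn(program, mem, max_steps=10_000):
    """Execute 4-tuple SBN instructions (A, B, resultant, C) on plaintext memory.

    resultant <- Mem[A] - Mem[B]; branch to C if the result is negative,
    otherwise fall through to the next instruction. Execution stops when the
    PC leaves the program (standing in for an end-of-program location).
    """
    pc, steps = 0, 0
    while 0 <= pc < len(program) and steps < max_steps:
        a, b, r, c = program[pc]
        result = mem[a] - mem[b]
        mem[r] = result                      # write-back of the resultant
        sel = 1 if result < 0 else 0         # sign bit acts as the select line
        pc = sel * c + (1 - sel) * (pc + 1)  # multiplexer-style PC update
        steps += 1
    return mem, steps

# Hypothetical memory layout: 0 -> constant 0, 1 -> constant 1, 2 -> x,
# 3 -> loop count, 4 -> accumulator, 5 -> holds -x, 6 -> scratch.
mem = [0, 1, 4, 3, 0, 0, 0]
program = [
    (0, 2, 5, 1),  # mem[5] = 0 - x = -x
    (4, 5, 4, 2),  # acc = acc - (-x), i.e. acc += x
    (3, 1, 3, 3),  # count = count - 1
    (0, 3, 6, 1),  # scratch = 0 - count; negative while count > 0, so loop
]
mem, steps = run_sbn(program, mem)
print(mem[4], steps)
```

In this model the loop exit is visible because the values are in the clear. In the encrypted setting the select bit would itself be a ciphertext, which is why the text routes the final termination signal through a dedicated end-of-program location and a single client message.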
In the following subsection, we shall explain how to implement the mentioned modules using the fully homomorphic primitive circuits of the Scarab library [22], such as FHE_Add (add ciphertext bits (XOR)), FHE_Mul (multiply ciphertext bits (AND)), FHE_Fulladd (add with carry in and carry out) and FHE_Halfadd (add with carry out).

[Figure 4: block diagram with an FHE Subtraction Module and a Branch Module; signal labels Mem[A], Mem[B], SubResult, SubResult[MSB] and NextPC.]

Fig. 4. Encrypted ALU module for FURISC
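The bit-level primitives named above compose in the usual way: XOR and AND give a half adder, two half adders give a full adder, and a chain of full adders gives addition and two's complement subtraction. The sketch below models this on plaintext bits. The function names are hypothetical stand-ins and are not the Scarab API; in an FHE setting each bit would be a ciphertext and each XOR or AND a homomorphic operation.

```python
def bit_add(a, b):
    """Plaintext stand-in for adding two bits (XOR)."""
    return a ^ b

def bit_mul(a, b):
    """Plaintext stand-in for multiplying two bits (AND)."""
    return a & b

def half_add(a, b):
    """Return (sum, carry_out) for two bits."""
    return bit_add(a, b), bit_mul(a, b)

def full_add(a, b, carry_in):
    """Return (sum, carry_out); built from two half adders."""
    s1, c1 = half_add(a, b)
    s, c2 = half_add(s1, carry_in)
    return s, bit_add(c1, c2)  # c1 and c2 are never both 1, so XOR acts as OR

def to_bits(x, width):
    """Two's complement bits of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits, interpreting the last bit as the sign."""
    value = sum(b << i for i, b in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value

def subtract(x_bits, y_bits):
    """x - y as x + NOT(y) + 1, using only the bit primitives above."""
    carry, out = 1, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_add(a, bit_add(b, 1), carry)  # b XOR 1 inverts b
        out.append(s)
    return out

diff = subtract(to_bits(5, 8), to_bits(9, 8))
print(from_bits(diff), diff[-1])
```

The most significant bit of the difference is the sign, which is the bit an SBN branch decision would use as its select line.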
A. Encrypted memory module

Encrypted memory in the FURISC design requires manipulation of encrypted data as well as encrypted addressing. The main challenge in designing such a memory is that the underlying encryption algorithm is randomized. Hence, encrypted data may initially be stored at a certain encrypted address, but during memory fetch the encryption of the same address gives a different result (the bitwise values are different). That makes fetching the content from a particular memory address more difficult. Hence, an encrypted decision making module is required for encrypted memory read-write, as proposed in [20].

Fig. 2 and Fig. 3 describe how the encrypted memory works. In our design, the base address of the memory is encrypted and the next locations are determined by (homomorphically) incrementing the base address consecutively. Encrypted data are stored at these encrypted addresses. To fetch data from any of these locations, the input encrypted memory address needs to be matched with the encrypted locations of the memory. Since the encryption algorithm generates different encrypted values for the same address every time, a homomorphic address matching technique is required. Hence, to search for a particular memory location, the input encrypted address is subtracted from each location address of the encrypted memory by the FHE_SUB module (explained in section IV-B), and the bitwise OR of the subtraction result is computed using the FHE_OR module (Bit-OR module in the diagram). The output of the FHE_OR module is fed to the selection lines of the FHE multiplexers (FHE_MUX) attached to each location of memory. In the case of memory read, if any match is found (the output of the FHE_OR module is enc(0)), data from the matched location is fetched to a temporary register (otherwise the value enc(0) is added to the register value). In the case of memory write, the input encrypted data is written to the matched location based on the selection of the FHE multiplexers (FHE_MUX). Thus, FHE multiplexer (FHE_MUX) modules are used to check encrypted matching. Once a match is found, the memory read-write operation is performed from or to the matched memory location.

However, directly following this approach to encrypted memory design incurs a large overhead. The main drawback of this design is that, for every instruction, a memory read or write operation requires a search through the whole memory to find the exact matched location. In our design, we separate instruction memory and data memory to reduce the search space. Further, we dedicate a separate unencrypted bit to mark the active memory space. Each time the instructions of a specific program are loaded into memory, those recently loaded locations are marked as active. Once the program terminates, the attached memory locations are marked as inactive. Thus, for every program counter (PC) update the search is restricted to active instruction memory locations only. Similarly, for a data read or write from or to memory, the search is restricted to active data memory only.

B. Encrypted ALU module

The main arithmetic operations of this ALU for the FURISC processor are FHE subtraction and PC update, as shown in Fig. 4. The ALU module mainly consists of a fully homomorphic subtraction module (FHE_Sub) and an FHE branch module (FHE_branch).

FHE_Sub module: FHE subtraction is implemented by adding one number to the 2’s complement of another. The subtraction module is designed by performing homomorphic
addition of one ciphertext with 2’s complement of another ciphertext.

FHE branch module: According to the principle of the SBN instruction, the branching operation decides whether program control next proceeds to address C’ or to the next address of the program counter (PC’ + enc(1)). Since all operations take place in the encrypted domain in FURISC, the next address must also be encrypted. For this reason, an FHE MUX is used with two inputs, C’ and the incremented PC’ + enc(1). The branch depends on whether the subtraction result of Mem’[A’] and Mem’[B’] is negative. Hence, the most significant bit (MSB) of the subtraction result is used as the selection line (MSB = enc(1) indicates that the value is negative).

C. Overall architecture

Fig. 5 shows the overall architecture with the encrypted memory module and the encrypted ALU. SBN functionality is realized on this architecture with the following steps:
• Registers A’, B’ and C’ hold the address values given as parameters of the SBN instruction.
• Initially, the address A’ is taken into PC’ and the memory content is fetched from the Encrypted Memory by a memory read operation. The Memory Read/Write Module works as described in section IV-A. The fetched value is stored in register Mem’[A’].
• Similarly, the content of memory address B’ is stored in Mem’[B’]. Selection of A’ or B’ is controlled by sel, the selection line of the associated FHE MUX.
• The subtraction is performed using the FHE ALU module and the result is stored in the Resultant register.
• Further, the MSB of the Resultant register value is fed as the selection line to an FHE MUX for PC update, and the next PC address is chosen from the two multiplexer inputs (PC + 1)’ and C’ depending on the selection value. Note that (PC + 1)’ can be obtained by homomorphically adding the ciphertext corresponding to 1 to that corresponding to PC.
• Depending on the third parameter of the SBN instruction, the value stored in the Resultant register is written to the respective memory or register location.

So far we have discussed how to design an SBN based FURISC. Another approach to FURISC design is based on the MOVE instruction, which works on the copy operation. Intuitively, a MOVE based architecture should perform better than the subtraction based SBN architecture, since a copy operation does not require any recrypt operation. Recrypt is the costliest operation in FHE based computation and the main reason for the slow performance of any FHE operation. In the next section, we outline a comparison between MOVE and SBN based FURISC and explore which design is actually advantageous in terms of performance.

V. COMPARISON WITH MOVE BASED URISC

The format of the basic instruction for MOVE based FURISC is:

    MOVE operandam’ operandum’

This instruction copies the contents of operandam’ to operandum’, where these are two encrypted addresses. The copy can be performed from any location to any other (to any memory or register from any memory or register). Hence, the design of a MOVE based architecture requires only memory fetch-write operations and register fetch-write operations. Memory read-write and register operations are performed using encrypted multiplexers as explained in section IV-A.

A. Performance evaluation: SBN vs MOVE FURISC

In this section, we evaluate which FURISC architecture, SBN or MOVE, is preferable in terms of performance. The SBN and MOVE based implementations are compared in terms of number of instructions to investigate which one is actually faster. Let a program P be implemented by $n_1$ SBN instructions, and let the same program P be implemented by $n_2$ MOVE instructions. Now, let a single SBN instruction be implemented by $m_1$ MOVE instructions. Converting all SBN instructions of program P to MOVE instructions is equivalent to implementing P only with MOVE instructions. Thus, the code length of the program P using only MOVE instructions is proportional to $n_1 m_1$ MOVE instructions. Similarly, let a single MOVE instruction be implemented by $m_2$ SBN instructions; hence the code length of program P using only SBN instructions is proportional to $n_2 m_2$ instructions. That again implies $\frac{n_1 m_1}{n_2 m_2} = \frac{n_2}{n_1}$, or $\left(\frac{n_2}{n_1}\right)^2 = \frac{m_1}{m_2}$.

The following code snippets show how single SBN and MOVE instructions can be mapped to their respective MOVE and SBN equivalents. A single MOVE instruction MOVE operandam operandum can be realized by a single SBN instruction:

    SBN operandam, #00, operandum, #00

However, a single SBN instruction SBN operandam operandum resultant next-address can be realized by the following instructions:

    INVERT operandum
    ADD operandam operandum resultant
    COMPARE resultant CONSTANT
    BRANCH next-address

In this instruction sequence, operandum is inverted, (-operandum) is added to operandam, and the addition result is stored in resultant. resultant is compared with CONSTANT to check whether it is negative, and control branches to next-address depending on the value of resultant. All the instructions INVERT, ADD, COMPARE and BRANCH need to be realized by multiple MOVE instructions. Hence, the number of MOVE instructions required to implement a single SBN instruction ($m_1$) is greater than the number of SBN instructions equivalent to one MOVE operation ($m_2$) [25]. That implies $m_1 > m_2$ and hence $n_1 < n_2$. Hence, an SBN based URISC architecture requires fewer instructions than a MOVE instruction based URISC to implement any program. In a practical scenario, a large number of instructions implies a large number of PC
updates and memory and register handling operations. In both SBN and MOVE based FURISC architectures, PC update and memory and register read-write operations require a large number of FHE operations, which incurs extra time in terms of CPU cycles. For this reason, MOVE based FURISC is not advantageous in terms of performance.

Fig. 5. Overall FURISC Architecture (blocks shown: registers A’, B’, C’ and PC’; Encrypted Memory with Memory Read/Write Module; Temporary Register; FHE MUX and De-MUX; FHE ALU; Resultant register; FHE Addition producing (PC+1)’; NextPC multiplexer)

Table I shows the number of CPU cycles required to implement different encrypted functions on SBN and MOVE based FURISC. The results are obtained from C-based simulators of the SBN and MOVE based FURISC architectures. The simulators are built using modules defined in the Scarab library and evaluated for correctness on a Linux Ubuntu 64-bit machine with i686 architecture 1.6 GHz processor. Among the implemented functions, Fibonacci requires handling of a single loop, while binary search and the sort algorithm require handling of multiple loops. Here, we show the CPU cycles required for computing the Fibonacci value of 100, for performing binary search within 100 data items, and for performing quick sort on a collection of 100 data items. The experimental results also confirm the theoretical observation that MOVE based processors require a higher number of CPU cycles and perform worse than their SBN based counterpart.

TABLE I
TIMING REQUIREMENT OF ENCRYPTED OPERATIONS ON FURISC

Operations                        CPU cycles
SBN processor implementation
    Fibonacci                     3918*10^8
    Binary Search                 96*10^8
    Quick sort                    12012*10^8
MOVE processor implementation
    Fibonacci                     4836*10^8
    Binary Search                 251*10^8
    Quick sort                    156026*10^8

VI. FURISC APPLIED TO SOLVE TERMINATION

In this section, we explain with examples how the FURISC architecture helps to tackle the encrypted loop termination problem. We start with an example of a simple loop:

    while(x > y)
    {
        x--;
    }

Since x and y are both encrypted, the termination condition of the loop while (x > y) cannot be evaluated when the code is executed on a general purpose processor. In the FURISC architecture, SBN instructions realize the loop in the following way:

    while’: SBN $1000’, $1001’, temp’, &wend’
            SBN $1000’, enc(1), $1000’, null
            SBN PC’, &while’, PC’, &while’
    wend’:  SBN $1000’, enc(0), $1000’, null

Let encrypted x and y be stored at encrypted addresses $1000’ and $1001’; while’ indicates the encrypted starting address of the while loop and wend’ is the encrypted address to which the program counter (PC) should jump once the while loop terminates. The advantage of this FURISC design is that PC update can be controlled by an encrypted subtraction, since the PC and all the address locations are encrypted. Thus, when encrypted x is less than encrypted y, the subtraction result between the contents of $1000’ and $1001’ becomes negative. That indicates the encrypted termination condition has been reached and the PC should now jump to wend’. If this is the only loop present in the program, then wend’ is a no operation (NOP) and the PC next jumps to the End of program location. Otherwise, the PC jumps to the next instruction of the program. NOP is implemented by subtracting enc(0) from the value at location $1000’ and storing the result back to location $1000’.

In the next example, we show how termination of multiple encrypted loops is handled using FURISC. We take the example of quick sort, which consists of multiple nested loops. The following is the representation of this code realized with the SBN instruction architecture.

    /********* FURISC implementation of Quick sort in Appendix *********/
    /***** Starting of if: lines 1-4 ******/
    QS’:     SBN last’, first’, temp’, &EOP
             SBN first’, enc(0), pivot’, null
             SBN first’, enc(0), i’, null
             SBN last’, enc(0), j’, null
    /********* while(i<j): line 5 *********/
    while1’: SBN j’, i’, temp’, &wend1’
             SBN enc(0), pivot’, jtemp’, null
             SBN $2000’, jtemp’, temp1’, null

    /********* while loop : line 6-7 ******/
    while2’: SBN enc(0), i’, itemp’, null
             SBN $2000’, itemp’, temp’, null
             SBN temp1’, temp’, accumulator’, &while3’
             SBN last’, i’, product, &while3’
             SBN product, enc(0), temp’, &while3’
             SBN enc(0), enc(1), temp’
             SBN i’, temp’, i’
    /********* End of while of line 6 ******/
    wend2’:  SBN PC’, &while2’, PC’, &while2’

             SBN enc(0), pivot’, jtemp’, null
             SBN $2000’, jtemp’, temp1’, null
    /******** while loop : line 8-9 *******/
    while3’: SBN enc(0), j’, jtemp’, null
             SBN $2000’, jtemp’, temp’, null

             SBN temp1’, temp’, accumulator’, &wend3’
             SBN j’, enc(1), j’
             SBN PC’, &while3’, PC’, &while3’

    /******* End of while of line 8 *******/
    wend3’:  SBN j’, i’, temp’, &endif’

             SBN enc(0), i’, itemp’, null
             SBN $2000’, itemp’, temp’, null
             SBN $2000’, itemp’, (mem_tempi)’, null
             SBN enc(0), j’, jtemp’, null
             SBN $2000’, jtemp’, temp1’, null
             SBN $2000’, jtemp’, (mem_tempj)’, null

             SBN temp’, enc(0), (mem_tempj)’, null
             SBN temp1’, enc(0), (mem_tempi)’, null

    endif:   SBN PC’, &while1’, PC’, &while1’

    /****** End of while of line 5 *******/
    wend1’:  SBN enc(0), pivot’, itemp’, null
             SBN $2000’, itemp’, temp’, null
             SBN $2000’, itemp’, (mem_tempi)’, null
             SBN enc(0), j’, jtemp’, null
             SBN $2000’, jtemp’, temp1’, null
             SBN $2000’, jtemp’, mem_tempj’, null

             SBN temp’, enc(0), mem_tempj’, null
             SBN temp1’, enc(0), mem_tempi’, null

             SBN j’, enc(1), j’
             SBN QS’, enc(0), PC’, PC’

             SBN enc(0), enc(1), temp’
             SBN j’, temp’, j’
             SBN QS’, enc(0), PC’, PC’

    /********* End of Program *********/
    EOP:     SBN $2000’, enc(0), $2000’, null

This code snippet shows how easily nested loops can be handled using this architecture. Here, we assume the array x[ ] resides at starting address $2000’. At while1’, the condition (i’ < j’) (i’ and j’ are the encryptions of i and j) is checked using (SBN j’, i’, temp’, &wend1’), where i’ and j’ are stored in intermediate registers. With the SBN functionality, i’ is subtracted from j’ and the loop condition is checked. When j’ is less than i’, the subtraction result is negative and PC’ proceeds to the end of the while loop (wend1’). For while2’, the loop condition is checked by subtracting i’ from last’, and if the subtraction result is negative the program flow branches to while3’. Thus, multiple loops are handled without requiring any redundant operation.

Once the termination condition is reached, the PC should jump to the End of program location. Hence, the termination problem reduces to determining whether the PC has reached the End of program location. In our design, we dedicate a particular address location as the End of program address (EOP). Since all the address locations as well as the PC are encrypted, the client cannot directly know when the PC has reached EOP. Ideally, the client should not learn this information without access to the secret key, since that would compromise the security of the cryptosystem. To solve this issue, we use an encrypted termination-bit, which is set high once the PC has reached the EOP address. This termination-bit is sent to the client in an encrypted message. The client, having access to the secret key, is able to decrypt the bit. Once the termination point is reached, decryption of the termination-bit generates an interrupt on the client side, so that the client learns that the program has terminated. This method is advantageous over the method of using a maximum number of cycles, since no redundant operation is required. Further, it is also better than the method proposed in section III-B, which requires client intervention and message passing for every loop iteration in a single program. Since each program consists of numerous loops, a large amount of message passing, network bandwidth, synchronization and decryption is necessary for loop handling. On the other hand, with our proposed method a single client-server message is enough to handle the termination problem on the FURISC architecture, no matter what the size of the program is or how many loops are present.
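The simple loop listing and the termination-bit mechanism of this section can be mimicked on plaintext integers. The following is a minimal sketch in C, not the authors' Scarab-based simulator: nothing is encrypted, the names Sbn, run_sbn, run_while_loop and the memory cell layout are hypothetical, and the unconditional jump SBN PC’, &while’, PC’, &while’ is replaced by the subtraction 0 - 1, which is always negative.

```c
#include <assert.h>

/* Plaintext stand-in for one SBN instruction (no encryption involved):
 *   mem[dst] = mem[a] - mem[b]; if the result is negative, jump to target. */
typedef struct {
    int a, b, dst;
    int target;          /* NO_BRANCH models the "null" next-address */
} Sbn;

enum { NO_BRANCH = -1 };

/* Hypothetical data layout: x, y, a scratch cell and the constants 0 and 1. */
enum { X, Y, TEMP, ZERO, ONE, MEM_CELLS };

/* Program labels (instruction indices); L_EOP is the End of program address. */
enum { L_WHILE = 0, L_WEND = 3, L_EOP = 4 };

static const Sbn loop_prog[] = {
    { X,    Y,    TEMP, L_WEND    },  /* while: temp = x - y, leave if negative */
    { X,    ONE,  X,    NO_BRANCH },  /*        x = x - 1                       */
    { ZERO, ONE,  TEMP, L_WHILE   },  /*        0 - 1 < 0, so always jump back  */
    { X,    ZERO, X,    NO_BRANCH },  /* wend:  NOP (x = x - 0)                 */
};

/* Executes until the PC reaches eop or max_steps instructions have run.
 * Returns the termination bit: 1 if the PC reached eop, 0 otherwise. */
static int run_sbn(const Sbn *prog, int eop, int *mem, long max_steps)
{
    int pc = 0;
    for (long step = 0; step < max_steps; step++) {
        if (pc == eop)
            return 1;
        assert(pc >= 0 && pc < eop);
        const Sbn *ins = &prog[pc];
        int diff = mem[ins->a] - mem[ins->b];
        mem[ins->dst] = diff;
        /* The sign of the result selects between target and PC + 1. */
        pc = (diff < 0 && ins->target != NO_BRANCH) ? ins->target : pc + 1;
    }
    return pc == eop;
}

/* Runs the loop for the given x and y, stores the termination bit in *done
 * and returns the final value of x. */
static int run_while_loop(int x, int y, int *done)
{
    int mem[MEM_CELLS] = { 0 };
    mem[X] = x;
    mem[Y] = y;
    mem[ZERO] = 0;
    mem[ONE] = 1;
    *done = run_sbn(loop_prog, L_EOP, mem, 1000000L);
    return mem[X];
}
```

Calling run_while_loop(5, 3, &done) returns 2 with done set to 1. Read this way, the listing leaves the loop only once x - y is negative, which is one decrement later than while (x > y) would stop. In FURISC the value returned by run_sbn would correspond to the encrypted termination-bit, which only the client can decrypt.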
VII. COMPARISON WITH EXISTING WORKS

Table II shows a comparison of FURISC performance with the SHE based HEROIC proposed in [4], [26]. According to the experimental results, FHE based FURISC requires more clock cycles than HEROIC to implement the same operation, but it is advantageous in terms of security, providing CPA resistance to the encrypted processor. Unlike [4], access to the public key need not be restricted and the encryption can be randomized for CPA resistance.

TABLE II
COMPARISON OF FURISC PERFORMANCE WITH HEROIC

Operations     HEROIC Timing (Clock cycles)    FURISC Timing (Clock cycles)
Factorial      8.45*10^7                       402.5*10^8
Fibonacci      2.74*10^8                       396*10^8
Bubble Sort    1.54*10^8                       3509*10^8

We compare the performance of FURISC with the work proposed in [20], where the termination condition of an FHE process is handled by a predefined maximum loop count. The disadvantage of this method is that every algorithm is bound to give its worst case performance. For example, for binary search on n FHE data items, the maximum loop execution count should be fixed at $O(\log n)$ (since, from the knowledge of unencrypted binary search, $O(\log n)$ is the worst case timing requirement). In this case, the program cannot terminate earlier and the best case performance of $O(1)$ can never be achieved. Hence, this incurs a large number of redundant operations. Similarly, sorting algorithms always need to iterate $O(n^2)$ times. The main advantage of our proposed technique is that the encrypted program terminates as and when the processing is complete; hence it is possible to achieve the best or average case performance of the respective algorithms.

Further, Table III shows a performance comparison of FURISC with encrypted algorithms executed on unencrypted processors. We choose Fibonacci computation as an example with a fixed maximum loop count. All the implementations are evaluated for correctness on a Linux Ubuntu 64-bit machine with i686 architecture 1.6 GHz processor. The results show that, whatever the actual input to the Fibonacci computation, the fixed loop count approach always takes the computation time of the maximum loop count. Hence, it requires a large number of redundant operations for smaller inputs, and the number of redundant operations decreases as the input approaches the maximum loop count.

TABLE III
COMPARISON WITH MAXIMUM LOOP COUNT ON UNENCRYPTED PROCESSOR

Fibonacci data    Timing with maximum loop count (Clock cycles)    FURISC Timing (Clock cycles)
30                2400*10^8                                        1188*10^8
60                2400*10^8                                        2358*10^8
90                2400*10^8                                        3528*10^8
100               2400*10^8                                        3918*10^8

VIII. CONCLUSION

In this work, we present an encrypted URISC architecture with FHE as the underlying encryption scheme, which combines the flexibility of performing arbitrary operations on encrypted data, due to the properties of FHE, with the design simplicity of the URISC architecture. Due to the use of FHE, randomization in memory handling and PC branching solves the CPA vulnerability of the previous SHE based design [4]. Further, we also show how this design is advantageous for handling the encrypted loop termination problem. As future work, performance improvement of FURISC can be investigated to make this design more practical in the context of cloud computing.

REFERENCES
[1] S. Rass and D. Slamanig, Cryptography for Security and Privacy in Cloud Computing. Norwood, MA, USA: Artech House, Inc., 2013.
[2] C. Gentry, “A fully homomorphic encryption scheme,” Ph.D. dissertation, Stanford University, 2009, crypto.stanford.edu/craig.
[3] R. L. Rivest, L. Adleman, and M. L. Dertouzos, “On data banks and privacy homomorphisms,” Foundations of Secure Computation, Academic Press, pp. 169–179, 1978.
[4] N. G. Tsoutsos and M. Maniatakos, “Heroic: Homomorphically encrypted one instruction computer,” in Proceedings of the Conference on Design, Automation & Test in Europe, ser. DATE ’14. 3001 Leuven, Belgium, Belgium: European Design and Automation Association, 2014, pp. 246:1–246:6. [Online]. Available: http://dl.acm.org/citation.cfm?id=2616606.2616907
[5] C. Gentry, “Computing arbitrary functions of encrypted data,” Commun. ACM, vol. 53, no. 3, pp. 97–105, Mar. 2010.
[6] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, “Fully homomorphic encryption over the integers,” IACR Cryptology ePrint Archive, p. 616, 2009.
[7] J.-S. Coron, A. Mandal, D. Naccache, and M. Tibouchi, “Fully homomorphic encryption over the integers with shorter public keys,” in Proceedings of the 31st annual conference on Advances in cryptology, ser. CRYPTO’11. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 487–504.
[8] M. Naehrig, K. Lauter, and V. Vaikuntanathan, “Can homomorphic encryption be practical?” in Proceedings of the 3rd ACM workshop on Cloud computing security workshop, ser. CCSW ’11. New York, NY, USA: ACM, 2011, pp. 113–124.
[9] Y. Ramaiah and G. Kumari, “Towards practical homomorphic encryption with efficient public key generation,” ACEEE International Journal on Network Security, vol. 3, no. 4, p. 8, October 2012.
[10] A. Silverberg, “Fully homomorphic encryption for mathematicians,” IACR Cryptology ePrint Archive, vol. 2013, p. 250, 2013. [Online]. Available: http://eprint.iacr.org/2013/250
[11] M. Akinwande, “Advances in homomorphic cryptosystems.” J. UCS, vol. 15, no. 3, pp. 506–522, 2009.
[12] D. Stehle and R. Steinfeld, “Faster fully homomorphic encryption,” Cryptology ePrint Archive, Report 2010/299, 2010, http://eprint.iacr.org/.
[13] V. Vaikuntanathan, “Computing blindfolded: New developments in fully homomorphic encryption,” in IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, 2011, pp. 5–16. [Online]. Available: http://dx.doi.org/10.1109/FOCS.2011.98
[14] Z. Brakerski and V. Vaikuntanathan, “Efficient fully homomorphic encryption from (standard) LWE,” SIAM J. Comput., vol. 43, no. 2, pp. 831–871, 2014. [Online]. Available: http://dx.doi.org/10.1137/120868669
[15] H. Perl, Y. Mohammed, M. Brenner, and M. Smith, “Fast confidential search for bio-medical data using bloom filters and homomorphic cryptography.” in eScience. IEEE Computer Society, 2012, pp. 1–8.
[16] H. Perl, Y. Mohammed, M. Brenner, and M. Smith, “Privacy/performance trade-off in private search on bio-medical data,” Future Generation Computer Systems, 2014.
[17] A. Chatterjee, M. Kaushal, and I. Sengupta, “Accelerating sorting of fully homomorphic encrypted data,” in Progress in Cryptology - INDOCRYPT 2013 - 14th International Conference on Cryptology in India, Mumbai, India, December 7-10, 2013. Proceedings, 2013, pp. 262–273.
[18] Y. Doroz, E. Ozturk, and B. Sunar, “Accelerating fully homomorphic encryption in hardware,” IEEE Transactions on Computers, vol. 99, no. PrePrints, p. 1, 2014.
[19] M. Brenner, H. Perl, and M. Smith, “Practical applications of homomor-
phic encryption,” in SECRYPT 2012 - Proceedings of the International
Conference on Security and Cryptography, Rome, Italy, 24-27 July, 2012,
SECRYPT is part of ICETE - The International Joint Conference on e-
Business and Telecommunications, 2012, pp. 5–14.
[20] ——, “How practical is homomorphically encrypted program execution?
an implementation and performance evaluation,” in 11th IEEE Inter-
national Conference on Trust, Security and Privacy in Computing and
Communications, TrustCom 2012, Liverpool, United Kingdom, June 25-
27, 2012, 2012, pp. 375–382.
[21] D. Stehle and R. Steinfeld, “Faster fully homomorphic encryption,” Cryptology ePrint Archive, Report 2010/299, 2010, http://eprint.iacr.org/.
[22] https://hcrypt.com//scarab library.
[23] N. P. Smart and F. Vercauteren, “Fully homomorphic encryption with
relatively small key and ciphertext sizes,” in Proceedings of the 13th
International Conference on Practice and Theory in Public Key Cryp-
tography, ser. PKC’10, Berlin, Heidelberg, 2010, pp. 420–443.
[24] H. Perl, M. Brenner, and M. Smith, “Poster: an implementation of the
fully homomorphic smart-vercauteren crypto-system.” in ACM Confer-
ence on Computer and Communications Security, Y. Chen, G. Danezis,
and V. Shmatikov, Eds. ACM, 2011, pp. 837–840.
[25] W. F. Gilreath and P. A. Laplante, Computer Architecture: A Minimalist
Perspective. Springer Publishing Company, Incorporated, 2012.
[26] N. G. Tsoutsos and M. Maniatakos, “Investigating the application of one
instruction set computing for encrypted data computation,” in SPACE,
2013, pp. 21–37.
[27] J. Katz and Y. Lindell, Introduction to Modern Cryptography (Chapman
& Hall/Crc Cryptography and Network Security Series). Chapman &
Hall/CRC, 2007.

APPENDIX
CODE FOR QUICK SORT ALGORITHM WITH MULTIPLE LOOP
1. if(first<last){
2. pivot=first;
3. i=first;
4. j=last;

5. while(i<j){
6. while(x[i]<=x[pivot]&&i<last)
7. i++;
8. while(x[j]>x[pivot])
9. j--;
10. if(i<j){
11. temp=x[i];
12. x[i]=x[j];
13. x[j]=temp;
14. }
15. }

16. temp=x[pivot];
17. x[pivot]=x[j];
18. x[j]=temp;
19. quicksort(x,first,j-1);
20. quicksort(x,j+1,last);
21. }
22. }
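
The numbered listing above omits the function header that its closing brace on line 22 and the recursive calls on lines 19 and 20 imply. A compilable sketch of the same statements follows. It assumes the signature quicksort(int x[], int first, int last) with int elements, which the listing does not state, and adds a hypothetical helper is_sorted that is not part of the paper.

```c
#include <assert.h>

/* Statements of appendix lines 1-21, wrapped in an assumed function header
 * (the element type int and the parameter types are assumptions). */
void quicksort(int x[], int first, int last)
{
    int pivot, i, j, temp;

    if (first < last) {
        pivot = first;
        i = first;
        j = last;

        while (i < j) {
            /* Advance i past elements not larger than the pivot. */
            while (x[i] <= x[pivot] && i < last)
                i++;
            /* Move j back past elements larger than the pivot. */
            while (x[j] > x[pivot])
                j--;
            if (i < j) {
                temp = x[i];
                x[i] = x[j];
                x[j] = temp;
            }
        }

        /* Place the pivot at its final position j. */
        temp = x[pivot];
        x[pivot] = x[j];
        x[j] = temp;
        quicksort(x, first, j - 1);
        quicksort(x, j + 1, last);
    }
}

/* Hypothetical helper: returns 1 if x[0..n-1] is in non-decreasing order. */
int is_sorted(const int x[], int n)
{
    for (int k = 1; k < n; k++)
        if (x[k - 1] > x[k])
            return 0;
    return 1;
}
```

For an array a of six elements, quicksort(a, 0, 5) sorts it in place, and is_sorted(a, 6) can be used to check the result.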
