Chris Mitchell
Liqun Chen
Dongmei Liu (Eds.)
LNCS 10631
Information and
Communications Security
19th International Conference, ICICS 2017
Beijing, China, December 6–8, 2017
Proceedings
Lecture Notes in Computer Science 10631
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7410
Sihan Qing · Chris Mitchell · Liqun Chen · Dongmei Liu (Eds.)
Editors
Sihan Qing
Chinese Academy of Sciences and Peking University, Beijing, China
Chris Mitchell
Royal Holloway, University of London, Egham, Surrey, UK
Liqun Chen
University of Surrey, Guildford, Surrey, UK
Dongmei Liu
Microsoft, Beijing, China
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Organization
General Chair
Dan Meng Institute of Information Engineering, Chinese Academy
of Sciences, China
Program Chairs
Sihan Qing Chinese Academy of Sciences and Peking University, China
Chris Mitchell Royal Holloway, University of London, UK
Liqun Chen University of Surrey, UK
Organizing Committee
Zhen Xu (Chair) Institute of Information Engineering, Chinese Academy
of Sciences, China
Liming Wang (Vice Chair) Institute of Information Engineering, Chinese Academy
of Sciences, China
Chao Zheng SKLOIS, Institute of Information Engineering,
Chinese Academy of Sciences, China
Publicity Chair
Qingni Shen Peking University, China
Publication Chair
Dongmei Liu Microsoft, China
Program Committee
Man Ho Allen Au The Hong Kong Polytechnic University, Hong Kong
Joonsang Baek University of Wollongong, Australia
Zhenfu Cao East China Normal University, China
Chin Chen Chang Feng Chia University, Taiwan
Chi Chen Institute of Information Engineering, Chinese Academy
of Sciences, China
Kefei Chen Hangzhou Normal University, China
Liqun Chen University of Surrey, UK
Zhong Chen Peking University, China
K. P. Chow The University of Hong Kong, Hong Kong
Frédéric Cuppens Telecom Bretagne, France
Additional Reviewers
Algorithms
Applied Cryptography
Practical Range Proof for Cryptocurrency Monero with Provable Security . . . 255
Kang Li, Rupeng Yang, Man Ho Au, and Qiuliang Xu
Security Applications
IoT Security
Privacy Protection
Security Protocols
Network Security
1 Introduction
Anonymous credentials allow users to obtain credentials on their identities and
prove possession of these credentials anonymously. There are three parties in the
anonymous credentials system: users obtain credentials from issuers (or GM,
indicating Group Manager). They can then present these credentials to verifiers
(or SP, indicating Service Provider) in an anonymous manner. The verifiers can
check the validity of users’ anonymous credentials but cannot identify them.
Practical solutions for anonymous credentials have been proposed, such as
IBM's Identity Mixer [19], the TCG (Trusted Computing Group) DAA (Direct
Anonymous Attestation) protocol [14,15], Microsoft's U-Prove [22], and the
Nymble system [21]. To avoid misbehavior, most schemes introduce a TTP
(Trusted Third Party) to revoke misbehaving users. However, having a TTP capable of
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 3–16, 2018.
https://doi.org/10.1007/978-3-319-89500-0_1
4 W. Wang et al.
revoked. This attack allows a user to escape revocation at will once he owns
an express-lane token, which defeats the security policy of BLACR.
We then provide a revised variant that can be proved correct by ProVerif. The revision
also shows that the fix provided by ExBLACR is incorrect.
2.1 Syntax
Roughly speaking, a TTP-free blacklistable anonymous credentials system con-
tains the following algorithms:
The adversary can choose any user (id ) to run the processes Register and
Authenticate. The restricted channel name p is used for delivering the credential
of the user between registration and authentication.
Process Judge models the judgment of a user’s state (for example, his current
reputation score). We record two events in process Judge: event satisfyPolicy for
a satisfied judgment; and event notSatisfy for a failure.
Both sides of the equivalence are the same processes, except that the left side
executes the registration and authentication processes of the user id0. That is
to say, if the equivalence holds, then the adversary cannot tell whether or not
the user id0 has executed the registration and authentication processes.
Unlinkability ensures that a system in which the analyzed processes can be
executed by a user multiple times looks the same to an adversary as a system
in which the analyzed processes can be executed by the user at most once.
(!vid.vp.(Register|(p(cre).!(Authenticate|Judge))))
≈
(!vid.vp.(Register|(p(cre).(Authenticate|Judge))))
The difference between the two sides lies in the number of times the
authentication is executed. If this equivalence is satisfied, the adversary
cannot distinguish a user who executes the authentication multiple times from
one who executes it at most once.
In this section, we model BLACR and automatically verify its security properties
using the formal analysis tool ProVerif. Reviews of the ProVerif calculus and
the ZKP compiler are presented in the full version.
Σ = Σbase ∪ ΣZK, where

Σbase = {true, false, hash, exp, and, or, eq, pk, sk, commit, open, bbssign, bbsver, getmess}
ΣZK = {ZKi,j, Veri,j, Publici, Formula, αi, βj | i, j ∈ N}

For the signature Σbase, the functions true and false are constant symbols;
hash, pk, sk, getmess are unary functions; exp, and, or, eq, commit, open,
bbssign are binary functions; bbsver is a ternary function. The equational
theory Ebase associated with signature Σbase is defined as follows:
Ebase = { and(true, true) = true,
          or(true, x) = true,
          or(x, true) = true,
          eq(x, x) = true,
          bbsver(open(bbssign(commit(x, y), sk(s)), y), x, pk(s)) = true,
          getmess(open(bbssign(commit(x, y), sk(s)), y)) = x }
The functions and, or, eq are used for conjunction, disjunction and equality
tests respectively; hash is used for hashing messages; exp is used for
exponentiation. The remaining functions are used for constructing and verifying
the BBS+ signature scheme.
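To make the equational theory concrete, the following sketch (our own construction, not the paper's ProVerif input) represents terms as nested tuples and normalises them with the Ebase equations oriented left to right:

```python
# Terms are nested tuples ("f", arg1, ...); atoms are one-element tuples.
# reduce() applies the E_base equations bottom-up.

def reduce(t):
    """Normalise a term under the E_base rewrite rules."""
    if not isinstance(t, tuple):
        return t
    head, args = t[0], tuple(reduce(a) for a in t[1:])
    t = (head,) + args
    if head == "and" and args == (("true",), ("true",)):
        return ("true",)
    if head == "or" and ("true",) in args:
        return ("true",)
    if head == "eq" and len(args) == 2 and args[0] == args[1]:
        return ("true",)
    # bbsver(open(bbssign(commit(x, y), sk(s)), y), x, pk(s)) = true
    if head == "bbsver":
        sig, x, pkk = args
        if (sig[0] == "open" and sig[1][0] == "bbssign"
                and sig[1][1] == ("commit", x, sig[2])
                and sig[1][2][0] == "sk" and pkk == ("pk", sig[1][2][1])):
            return ("true",)
    # getmess(open(bbssign(commit(x, y), sk(s)), y)) = x
    if head == "getmess":
        sig = args[0]
        if (sig[0] == "open" and sig[1][0] == "bbssign"
                and sig[1][1][0] == "commit" and sig[1][1][2] == sig[2]):
            return sig[1][1][1]
    return t

x, y, s = ("x",), ("y",), ("s",)
cre = ("open", ("bbssign", ("commit", x, y), ("sk", s)), y)
assert reduce(("bbsver", cre, x, ("pk", s))) == ("true",)  # credential verifies
assert reduce(("getmess", cre)) == x                       # message is recovered
```

This mirrors how ProVerif treats the BBS+ operations purely symbolically: a credential verifies exactly when it was built by the matching sequence of commit, bbssign and open.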
2. The user determines his reputation score si for each category by checking
   which entries on the corresponding list belong to him. Then he tests whether
   si < TSi, so that Pol evaluates to 1.
3. If the test is successful, the user returns to the SP a tuple (τ, Π2, Π3), where τ
   = (b, t = H(b||sid)^x) is the ticket associated with the current authentication
   session, and (Π2, Π3) is a pair of signature proofs of knowledge. Π2 is used
   to prove that τ is correctly formed with the credential cre: SPK{(x, r, cre) :
   Cx = commit(x, r), bbsver(cre, x, pk(s)) = true, t = b̂^x}(mauth), where b̂ =
   H(b||sid); Π3 is used to prove that Pol evaluates to 1: SPK{(x, r, si) : Cx =
   commit(x, r), Cs_ij = commit(0) for j ∉ user, Cs_ij = commit(s_ij) for j ∈ user,
   Cs_i = Cs_i1 · · · Cs_iL, si < TSi}(mauth), where j ∈ {1, ..., L} and L is the
   length of the corresponding list.
4. The SP verifies the proofs (Π2 , Π3 ).
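As an illustration of the ticket in step 3, τ = (b, t = H(b||sid)^x), the sketch below computes t by hashing into a toy multiplicative group. The modulus, hash choice and helper names are our assumptions for illustration; the actual scheme works in a pairing-friendly group with a BBS+ credential.

```python
# Toy sketch of the BLACR ticket tau = (b, t) with t = H(b||sid)^x.
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (illustration only, not secure parameters)

def hash_to_group(data: bytes) -> int:
    """Stand-in for H: hash bytes to a group element mod P."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P

def make_ticket(x: int, sid: bytes):
    """Return (b, t): a per-session nonce b and t = H(b||sid)^x."""
    b = secrets.token_bytes(16)            # fresh per-session value b
    t = pow(hash_to_group(b + sid), x, P)  # exponentiation by the secret x
    return b, t

x = secrets.randbelow(P)
sid = b"service-provider-1"
b, t = make_ticket(x, sid)
assert t == pow(hash_to_group(b + sid), x, P)
```

Because b is fresh per session, two tickets of the same user at the same SP are unlinkable to anyone who does not know x, while the proof Π2 convinces the SP that t was formed with the same x bound in the credential.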
Register =
  c(mreg).
  let x = bind(id) in
  vy.let Cx = commit(x, y) in
  let Pi1 = ZK(x, y; id, Cx, mreg; Freg) in
  c̄⟨(Cx, Pi1)⟩.c(bcre).
  let cre = open(bcre, y) in
  if bbsver(cre, x, pk(siss)) = true then
    event(registered).!p̄⟨cre⟩
  else
    event(unregistered)

Issue =
  vmreg.c̄⟨mreg⟩.c((Cx, Pi1)).
  if public2(Pi1) = Cx then
  if public3(Pi1) = mreg then
  if Ver2,3(Freg, Pi1) = true then
  let id = public1(Pi1) in
  if sybil = true then 0 else
  let bcre = bbssign(Cx, sk(siss)) in
  c̄⟨bcre⟩

Authenticate =
  c(mauth).vr.vb.vrs.
  let x = bind(id) in
  let Cx = commit(x, r) in
  let h = hash((b, sid)) in
  let t = exp(h, x) in
  let Pi2 = ZK(x, r, cre; Cx, pk(siss), b, t, h, mauth; Fsig) in
  jud(s).let Cs = commit(s, rs) in
  let Pi3 = ZK(x, r, s, rs; Cx, Cs, ltTS, mauth; FPol) in
  event(startAuth).c̄⟨(Pi2, Pi3)⟩

Verify =
  vmauth.c̄⟨mauth⟩.c((Pi2, Pi3)).
  if public2(Pi2) = pk(siss) then
  if public5(Pi2) = hash((public3(Pi2), sid)) then
  if public6(Pi2) = mauth then
  if Ver3,6(Fsig, Pi2) = true then
  let Cx = public1(Pi2) in
  let b = public3(Pi2) in
  let t = public4(Pi2) in
  if public1(Pi3) = Cx then
  if public3(Pi3) = ltTS then
  if public4(Pi3) = mauth then
  if Ver4,4(FPol, Pi3) = true then
  event(acceptAuth).!lt⟨(b, t)⟩
  else
  event(revoke)
Then the user generates two zero-knowledge proofs: Π2 with formula Fsig =
and(and(β1 = commit(α1 , α2 ), bbsver( α3 , α1 , β2 ) = true), β4 = exp(β5 , α1 ))
C-Process
vsiss.vsver.vjud.vlt.let c = pub in (pub⟨pk(siss)⟩ | pub⟨pk(sver)⟩ |
(!Issue) | (!(Verify | AssignScore)) | ControlUsers)
Note that we also initialize a key pair for the SP since we set sid = pk(sver ) to
identify the SP for computing the ticket.
A-Process
vsiss .vsver .vjud.vlt.vint0 .vint1 .let c = pub in (
pub pk(siss ) |pub pk(sver ) |(!Issue) |Users|
(let (id, p) = (id0 , int0 ) in Register)|(vid1 .let (id, p) = (id1 , int1 ) in (
Register|int0 (cre0 ).int1 (cre1 ).
let (id, cre) = (diff[id0 , id1 ], diff[cre0 , cre1 ]) in (Authenticate|Judge)))
)
where
Users =!vid.!vp.(Register|(p(cre).!(Authenticate|Judge)))
Encoding anonymity in this way, the left side of diff represents an execution
by the publicly known id id0, while the right side represents an execution by
an unknown id id1 (a restricted id). In fact, the right side of diff is an
instance of Users. Hence, it corresponds directly to the definition in Sect. 2.2,
and we succeed in reducing the problem of proving anonymity to a diff-equivalence
that can be verified by ProVerif.
Result 2. Given the main process A-Process, ProVerif succeeds in proving the
diff-equivalence, therefore, anonymity is satisfied.
U-Process
vsiss.vsver.vjud.vlt.let c = pub in (pub⟨pk(siss)⟩ | pub⟨pk(sver)⟩ |
(!Issue) | Unlinkability)
where
Unlinkability =
!vid1 .vint1 .((let (id, p) = (id1 , int1 ) in Register)|(!vid2 .vint2 .(
(let (id, p) = (id2 , int2 ) in Register)|(int1 (cre1 ).int2 (cre2 ).
let (id, cre) = (diff[id1 , id2 ], diff[cre1 , cre2 ]) in (Authenticate|Judge)) ))
)
Looking inside this process, the left side of the diff represents a user
executing the system many times, while the right side represents users
executing the system at most once (the user id2 is different for each
execution of the processes of user id1).
Result 3. Given the main process U-Process, ProVerif succeeds in proving the
diff-equivalence, therefore, unlinkability is satisfied.
The attack trace shows that a replay attack can be carried out by a malicious
user as follows: in the second express-lane authentication, the user finds that
his aggregated reputation score does not satisfy the authentication policy. He
can still proceed, however, by using a preceding token that suffices to make the
aggregate score satisfy the policy. This attack is possible because the tokens
do not contain any labels that distinguish them from one another.
In general, this attack can be applied to two scenarios violating the security
policy: the first one is that a user can utilize an old token to escape from being
revoked; the other is that a user in possession of a token can conduct an express-
lane authentication at any time, regardless of whether he is an active user.
This attack can be fixed by refining the definition of the token tk: the token
must contain the timestamp information t. We revise the processes with the
timestamp, and the verification by ProVerif succeeds.
In fact, our solution in the symbolic model indicates that the fix presented
in ExBLACR still does not work properly, since the timestamp in the proving
process of ExBLACR is not revealed. As a result, a malicious user can still
mount the replay attack described above, because the SP can only check that the
token tk is correct but cannot learn the timestamp t corresponding to that token.
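To make the role of the revealed timestamp concrete, here is a minimal sketch (our own construction, not the paper's ProVerif model): a token that discloses its issuing period can be rejected when replayed in a later authentication period, whereas the integrity check alone would still pass.

```python
# Sketch: a token binds (score, period); the SP only accepts tokens from the
# immediately preceding period, so stale tokens cannot be replayed.
import hashlib
import hmac

KEY = b"demo-key"  # stands in for the value bound by the zero-knowledge proof

def issue_token(score: int, period: int):
    tag = hmac.new(KEY, f"{score}|{period}".encode(), hashlib.sha256).digest()
    return score, period, tag

def verify_token(token, current_period: int) -> bool:
    score, period, tag = token
    ok = hmac.compare_digest(
        tag, hmac.new(KEY, f"{score}|{period}".encode(), hashlib.sha256).digest())
    # the timestamp must be disclosed and checked, not just the tag:
    return ok and period == current_period - 1

old = issue_token(score=10, period=3)
assert verify_token(old, current_period=4)      # fresh use is accepted
assert not verify_token(old, current_period=7)  # replay in a later period fails
```

Without the `period == current_period - 1` check the tag still verifies at period 7, which is exactly the gap left by the ExBLACR fix: correctness of tk is checked, but its timestamp is never learned by the SP.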
4 Conclusion
This paper presents the definitions of some common security properties for
BLAC-like systems in the symbolic model using applied pi calculus. We express
these definitions as correspondence and equivalence properties. As a case study,
we verify these properties in the BLACR system. The analysis finds a known
attack on the token mechanism in the express-lane authentication. Our revision,
which ProVerif verifies successfully, also indicates that the fix provided by
ExBLACR is incorrect.
Admittedly, our model is approximate due to the nature of ProVerif. Other
modelling methods that can record state could also be considered, such as
multiset rewriting rules (the Tamarin tool [24]) or stateful variants of the
applied pi calculus [5,17]. Another extension lies in the study of composing
protocols, as mentioned in the introduction. We leave these as future work.
References
1. Abadi, M., Blanchet, B.: Analyzing security protocols with secrecy types and logic
programs. J. ACM (JACM) 52(1), 102–146 (2005)
2. Arapinis, M., Cheval, V., Delaune, S.: Verifying privacy-type properties in a mod-
ular way. In: CSF 2012, pp. 95–109. IEEE (2012)
3. Arapinis, M., Cheval, V., Delaune, S.: Composing security protocols: from confi-
dentiality to privacy. In: Focardi, R., Myers, A. (eds.) POST 2015. LNCS, vol.
9036, pp. 324–343. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-
662-46666-7_17
4. Arapinis, M., Chothia, T., Ritter, E., Ryan, M.: Analysing unlinkability and
anonymity using the applied pi calculus. In: CSF 2010, pp. 107–121. IEEE (2010)
5. Arapinis, M., Phillips, J., Ritter, E., Ryan, M.D.: Statverif: verification of stateful
processes. J. Comput. Secur. 22(5), 743–821 (2014)
6. Au, M.H., Tsang, P.P., Kapadia, A.: PEREA: practical TTP-free revocation of
repeatedly misbehaving anonymous users. ACM Trans. Inf. Syst. Secur. (TISSEC)
14(4), 29 (2011)
7. Au, M.H., Kapadia, A.: PERM: practical reputation-based blacklisting without
TTPs. In: CCS 2012, pp. 929–940. ACM (2012)
8. Au, M.H., Kapadia, A., Susilo, W.: BLACR: TTP-free blacklistable anonymous
credentials with reputation. In: NDSS Symposium 2012: 19th Network & Dis-
tributed System Security Symposium, pp. 1–17. Internet Society (2012)
9. Au, M.H., Susilo, W., Mu, Y.: Constant-size dynamic k -TAA. In: De Prisco, R.,
Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 111–125. Springer, Heidelberg
(2006). https://doi.org/10.1007/11832072_8
10. Backes, M., Maffei, M., Unruh, D.: Zero-knowledge in the applied pi-calculus and
automated verification of the direct anonymous attestation protocol. In: SP 2008,
pp. 202–215. IEEE (2008)
11. Blanchet, B.: Automatic proof of strong secrecy for security protocols. In: SP 2004,
pp. 86–100. IEEE (2004)
12. Blanchet, B., Abadi, M., Fournet, C.: Automated verification of selected equiv-
alences for security protocols. In: 20th IEEE Symposium on Logic in Computer
Science. Proceedings, pp. 331–340. IEEE (2005)
13. Blanchet, B., et al.: ProVerif: cryptographic protocol verifier in the formal model.
http://prosecco.gforge.inria.fr/personal/bblanche/proverif/
14. Brickell, E., Camenisch, J., Chen, L.: Direct anonymous attestation. In: Proceed-
ings of the 11th ACM Conference on Computer and Communications Security, pp.
132–145. ACM (2004)
15. Brickell, E., Chen, L., Li, J.: A new direct anonymous attestation scheme from
bilinear maps. In: Lipp, P., Sadeghi, A.-R., Koch, K.-M. (eds.) Trust 2008. LNCS,
vol. 4968, pp. 166–178. Springer, Heidelberg (2008). https://doi.org/10.1007/978-
3-540-68979-9_13
16. Brickell, E., Li, J.: Enhanced privacy ID: a direct anonymous attestation scheme
with enhanced revocation capabilities. In: Proceedings of the 2007 ACM Workshop
on Privacy in Electronic Society, pp. 21–30. ACM (2007)
17. Bruni, A., Mödersheim, S., Nielson, F., Nielson, H.R.: Set-π: set membership
π-calculus. In: CSF 2015, pp. 185–198. IEEE (2015)
18. Camenisch, J., Drijvers, M., Lehmann, A.: Universally composable direct anony-
mous attestation. In: Cheng, C.-M., Chung, K.-M., Persiano, G., Yang, B.-Y. (eds.)
PKC 2016. LNCS, vol. 9615, pp. 234–264. Springer, Heidelberg (2016). https://
doi.org/10.1007/978-3-662-49387-8_10
19. Camenisch, J., et al.: Specification of the identity mixer cryptographic library,
version 2.3.1, 7 December 2010
20. Cheval, V., Blanchet, B.: Proving more observational equivalences with ProVerif.
In: Basin, D., Mitchell, J.C. (eds.) POST 2013. LNCS, vol. 7796, pp. 226–246.
Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36830-1_12
21. Johnson, P.C., Kapadia, A., Tsang, P.P., Smith, S.W.: Nymble: anonymous IP-
address blocking. In: Borisov, N., Golle, P. (eds.) PET 2007. LNCS, vol. 4776, pp.
113–133. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75551-
7_8
22. Paquin, C., Zaverucha, G.: U-Prove cryptographic specification V1.1, revision 3.
Technical report, Microsoft Corporation (2013)
23. Ryan, M.D., Smyth, B.: Applied pi calculus. In: Cortier, V., Kremer, S. (eds.)
Formal Models and Techniques for Analyzing Security Protocols, Chap. 6. IOS
Press (2011)
24. Schmidt, B., Meier, S., Cremers, C., Basin, D.: Automated analysis of Diffie-
Hellman protocols and advanced security properties. In: CSF 2012, pp. 78–94.
IEEE (2012)
25. Smyth, B., Ryan, M., Chen, L.: Formal analysis of anonymity in ECC-based direct
anonymous attestation schemes. In: Barthe, G., Datta, A., Etalle, S. (eds.) FAST
2011. LNCS, vol. 7140, pp. 245–262. Springer, Heidelberg (2012). https://doi.org/
10.1007/978-3-642-29420-4_16
26. Smyth, B., Ryan, M.D., Chen, L.: Formal analysis of privacy in direct anonymous
attestation schemes. Sci. Comput. Program. 111, 300–317 (2015)
27. Tsang, P.P., Au, M.H., Kapadia, A., Smith, S.W.: BLAC: revoking repeatedly
misbehaving anonymous users without relying on TTPs. ACM Trans. Inf. Syst.
Secur. (TISSEC) 13(4), 39 (2010)
28. Wang, W.: Proverif inputs for analyzing BLACR system. https://github.com/
WangWeijin/Formal-analysis-of-BLACR-system
29. Wang, W., Feng, D., Qin, Y., Shao, J., Xi, L., Chu, X.: ExBLACR: extending
BLACR system. In: Susilo, W., Mu, Y. (eds.) ACISP 2014. LNCS, vol. 8544, pp.
397–412. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08344-5_26
30. Xi, L., Feng, D.: FARB: fast anonymous reputation-based blacklisting without
TTPs. In: Proceedings of the 13th Workshop on Privacy in the Electronic Society,
pp. 139–148. ACM (2014)
31. Xi, L., Feng, D.: Formal analysis of DAA-related APIs in TPM 2.0. In: Au,
M.H., Carminati, B., Kuo, C.-C.J. (eds.) NSS 2014. LNCS, vol. 8792, pp. 421–
434. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11698-3_32
32. Yu, K.Y., Yuen, T.H., Chow, S.S.M., Yiu, S.M., Hui, L.C.K.: PE(AR)2 : privacy-
enhanced anonymous authentication with reputation and revocation. In: Foresti,
S., Yung, M., Martinelli, F. (eds.) ESORICS 2012. LNCS, vol. 7459, pp. 679–696.
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33167-1_39
An Efficiency Optimization Scheme for the
On-the-Fly Statistical Randomness Test
1 Introduction
Random numbers have numerous applications in a diverse set of areas ranging from
statistics to cryptography. For cryptographic applications it is crucial to
generate random bits that are unpredictable even by the strongest adversary.
Random sequences can be generated in two ways: by true random number generators
(TRNGs) and by pseudorandom number generators (PRNGs). Many studies have shown
that the entropy source of random number generators (RNGs) is susceptible to
environmental changes [1], such as temperature,
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 17–35, 2018.
https://doi.org/10.1007/978-3-319-89500-0_2
18 J. Shen et al.
– (R1) The on-the-fly test should reject sequences when any test fails, which is
different from the requirement of traditional testing that collects the result
of each test.
– (R2) On-the-fly test should detect statistical weaknesses of rejected sequences
as early as possible, namely the delay introduced by an on-the-fly test should
be short in order to meet the requirement of high efficiency.
– (R3) On-the-fly test is oriented towards such applications as cloud encryp-
tion/storage service, real-time encrypted communication, etc. Due to the time
constraint of these applications, the length of sequences is usually relatively
short, such as 128 bits for AES, 256 and 512 bits for hash functions.
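The rejection-on-first-failure behaviour of (R1) and the early-detection goal of (R2) can be sketched as follows; the two toy checks are crude stand-ins we made up, not the actual SP 800-22 tests.

```python
# Sketch of an on-the-fly test harness: run tests in a chosen order and reject
# the sequence at the first failure, so weak sequences are eliminated early.

def frequency_test(bits: str) -> bool:
    """Crude monobit stand-in: proportion of ones close to 1/2."""
    return abs(bits.count("1") / len(bits) - 0.5) < 0.1

def runs_test(bits: str) -> bool:
    """Crude runs stand-in: enough alternations between 0 and 1."""
    runs = 1 + sum(a != b for a, b in zip(bits, bits[1:]))
    return runs > len(bits) // 4

def on_the_fly(bits: str, tests) -> bool:
    for test in tests:
        if not test(bits):
            return False  # reject immediately (R1), keeping the delay short (R2)
    return True

assert not on_the_fly("1" * 128, [frequency_test, runs_test])  # fails 1st test
assert on_the_fly("10" * 64, [frequency_test, runs_test])      # passes both toys
```

The execution order of `tests` is exactly the knob the rest of the paper optimizes: the earlier a failing test runs, the less time a rejected sequence consumes.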
2 Related Work
2.1 Standards for Statistical Randomness Test
There are a number of statistical test standards, including NIST SP 800-22
[5] and FIPS 140-2 [7,8] issued by NIST, AIS 31 [9] issued by Bundesamt für
Sicherheit in der Informationstechnik (BSI), Diehard Battery [10] proposed by
Marsaglia, and TestU01 [11] proposed by L'Ecuyer and Simard. L'Ecuyer [12]
studied some of the main techniques for theoretical and statistical tests. In
our study, we focus on the efficiency optimization of the NIST SP 800-22 test
suite; because these test suites share similar tests and testing strategies,
our optimization scheme is also applicable to the other suites.
In statistical test suites, tests are executed in a given order on the
sequences in series. In the NIST test suite, the default execution order [5] is
{T1 − T2 − T13 − T3 − T4 − T5 − T6 − T7 − T8 − T9 − T12 −
T14 − T15 − T11 − T10}; the serial numbers of the tests are shown in Table 1.
Note that the execution order does not influence the result of the NIST test
suite, so we can adjust the order to find an optimized one.
No.   Test
T1    Frequency
T2    Block frequency
T3    Runs
T4    Longest-run
T5    Binary matrix rank
T6    Discrete Fourier transform
T7    Non-overlapping TM
T8    Overlapping TM
T9    Maurer's universal
T10   Linear complexity
T11   Serial
T12   Approximate entropy
T13   Cumulative sums
T14   Random excursions
T15   Random excursions variant
Suciu et al. [21] improved the efficiency of the NIST test suite by introducing
parallel computing based on multicore architectures. Experiments on different
sizes of sequence data showed a very significant speedup compared with the
original version. Furthermore, Huang and Lai [22] provided a method to optimize
the execution order of the tests for efficiency, based on the conditional
entropy of each test in the NIST test suite. However, their optimized result
only covers sequence lengths of 15 to 24 bits. Our optimization scheme obtains
optimized orders for sequences between 128 and 256 bits, which are the block
lengths usually adopted by block ciphers. Chen et al. [23] proposed the
prototype of the optimization scheme; in this paper, we extend that work with
an efficiency evaluation and expand the experimental principle and the process
of proof.
Effective Coverage. Obviously, a test with larger coverage, larger independence
and smaller time consumption should be placed at an earlier position. However,
these three factors cannot directly describe the efficiency of a specific order,
so we introduce the concept of effective coverage to give an intuitive
explanation. The effective coverage of a test is its independent contribution
relative to the whole coverage of all its preceding tests in a given execution
order. For example, suppose the tests in a suite are executed in the order
{T1 − T2 − · · · − TN}. Then the effective coverage of Ti is represented as
R_n^{Ti} \ (∪_{j=1}^{i−1} R_n^{Tj}).
Since the influence of time consumption on efficiency is obvious, we analyze
the influence of effective coverage under the assumption of equal time
consumption. To reach the highest efficiency in the above execution order, each
position i (= 1, 2, · · · , N) should hold the yet-unsorted test with the
largest effective coverage. For n-bit sequences, the coverage of the ith test
Ti is written R_n^{Ti}. The effective coverage of the ith test, i.e.
|R_n^{Ti} \ ∪_{j=1}^{i−1} R_n^{Tj}|, should equal
max{|R_n^{Tk} \ ∪_{j=1}^{i−1} R_n^{Tj}|}, where Tk ranges over the tests still
to be sorted. Figure 2 gives an intuitive description of the relation between
tests Ti, Tj and Tk, where we presume the coverage ordering Ti > Tj > Tk. Test
Ti should be executed first because it contributes most to the (effective)
coverage of the suite. Next, we compare the effective coverage of test Tj with
that of Tk: if |R_n^{Tj} \ R_n^{Ti}| > |R_n^{Tk} \ R_n^{Ti}|, test Tj should
immediately follow test Ti, and vice versa. Thus, the most efficient execution
order is Ti ⇒ Tj ⇒ Tk. This means that, with identical time consumption, the
larger the effective coverage, the higher the efficiency. The analysis also
indicates that effective coverage is positively correlated with both coverage
and independence.
Therefore, the average elimination time used (AET), which can be utilized as an
objective metric for testing efficiency, is defined as follows:

AET = TET / |∪_{i=1}^{N} R_n^{Ti}|    (2)
In Formula (1), the time consumption and effective coverage of each test under
the given order appear in each term. Both coverage and independence decide the
size of the effective coverage, which is shown from the second term to the
(N − 1)th term. In particular, the first and the Nth terms only involve the
coverage of test T1 and the independence of test TN, respectively. Thus, it
further shows that efficiency is determined by these three influence factors
together.
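The greedy use of effective coverage, and the AET metric of Formula (2), can be sketched on made-up data as follows. Here `R[t]` is the set of sample indices rejected by test t, and the TET accounting (each eliminated sample pays for the tests run until it fails) is our reading of the formula.

```python
# Sketch of the ordering heuristic and the AET metric on invented data.
R = {"T1": {0, 1, 2, 3}, "T2": {2, 3, 4}, "T3": {5}}   # rejected sample sets
cost = {"T1": 1.0, "T2": 1.0, "T3": 1.0}               # per-sample test time

def greedy_order(R):
    """At each position, pick the test with the largest effective coverage
    |R[t] \\ (union of already-placed tests' coverage)|."""
    order, covered, remaining = [], set(), set(R)
    while remaining:
        best = max(remaining, key=lambda t: len(R[t] - covered))
        order.append(best)
        covered |= R[best]
        remaining.remove(best)
    return order

def aet(order, R, cost):
    """AET = TET / |union_i R[Ti]|: each rejected sample accumulates the cost
    of the tests executed until its first failure."""
    union = set().union(*R.values())
    tet = 0.0
    for s in union:
        for t in order:
            tet += cost[t]
            if s in R[t]:
                break
    return tet / len(union)

order = greedy_order(R)
assert order[0] == "T1"  # largest coverage goes first
assert aet(order, R, cost) <= aet(["T3", "T2", "T1"], R, cost)
```

On this toy data the greedy order achieves AET 1.5 while the reversed order pays about 2.17, illustrating why high-coverage, high-independence tests belong at the front.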
influence factors, which is one of the bases for adjusting the execution order;
the other bases are the values of the factors for each test. We will find the
optimized order of the SP 800-22 test suite with this scheme. Our optimization
scheme is suitable for all the test suites; SP 800-22 is merely adopted as a
representative in this paper.
Step 1: selecting the size of samples from the entire sequence space.
As we mentioned, the values of coverage and independence are derived from
the experiments. However, for 128-bit or longer sequences, it is impossible to
traverse the entire space. Therefore, we randomly select samples to obtain the
(approximate) property of tests in the entire space. Also, we need to verify
the result to confirm that the size of the samples is large enough to reflect
the property.
Step 2: calculating the values of the influence factors. The chosen
sequence samples are run through all the tests, and we record the pass or
failure status of each. Using these results, we determine the set of samples
that fail a specific test, i.e. its coverage, and the set of samples that fail
only that test, i.e. its independence. In addition, the time consumption of the
tests can be obtained from the timer of the system on which the tests run.
Step 3: verifying the stability of the factor values for different sizes
of samples. After choosing the number of samples, we repeat random sam-
pling and calculate the factor values of the samples. If the calculated values
are stable or almost stable each time, the chosen sample number is considered
enough to reflect the property of the entire space. Otherwise, we enlarge the
number until the stability appears.
Step 4: allocating the weights of the three factors. There is no explicit
relationship among these factors, as they reflect different aspects of a test.
An Efficiency Optimization Scheme 25
of coverage and independence are proportional to the sample size, i.e., the
more samples are tested, the more samples fail. These values need to be
normalized for a fair comparison. The normalized coverage value of test Ti is
derived from |R_128^{Ti}| / |∪_{i=1}^{12} R_128^{Ti}|, i.e., the ratio of the
original coverage size of Ti to the number of sequences eliminated by any test.
The normalized independence value of test Ti is derived from
|R_128^{Ti} \ ∪_{j≠i} R_128^{Tj}| / |R_128^{Ti}|, i.e., the ratio of the
original independence size of Ti to the coverage size of Ti.
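A small sketch of this normalisation, reusing a few Inf1 (coverage) and Inf2 (independence) counts in the style of Table 6; the union size here is a made-up placeholder, not a measured value.

```python
# Normalisation sketch: coverage is scaled by the size of the union of all
# tests' rejections; independence is scaled by the test's own coverage.

coverage = {"T1": 10576, "T7": 104155, "T13": 12701}   # |R_128^{Ti}| (Inf1)
independence = {"T1": 688, "T7": 74600, "T13": 1475}   # |R^{Ti} \ U_{j!=i} R^{Tj}| (Inf2)
union_size = 120000  # |union of all tests' coverage| -- placeholder value

# normalised coverage: a test's share of all eliminated sequences
norm_cov = {t: coverage[t] / union_size for t in coverage}
# normalised independence: fraction of a test's rejections caught only by it
norm_ind = {t: independence[t] / coverage[t] for t in coverage}

assert all(0 <= v <= 1 for v in norm_cov.values())
assert all(0 <= v <= 1 for v in norm_ind.values())
```

After this step the two factors live on the same [0, 1] scale as each other, so the standard-variance comparison across sample sizes in Table 2 is meaningful.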
The standard variance comparison of the normalized factors for the sample sizes
2^15, 2^20 and 2^25 is shown in Table 2. Note that since the time consumption is
independent of the sample size, it is not listed in Table 2. The stability
(standard variance) under sample size 2^20 or 2^25 is several times better than
that under 2^15, i.e., the larger sample sizes better represent the entire
sequence space. We also find that the standard deviation with sample size 2^20
is close to that with 2^25, meaning that the sample size 2^20 could be proper.
Furthermore, to determine which sample size is sufficient, we need to observe
the stability (invariance) of the resulting weight allocation and optimized
order for each sample, which is explained in Sects. 4.4 and 4.5, respectively.
Contribution_{Ti} = Σ_{i=1}^{3} (f_i ∗ w_i),    (3)
Table 3. The optimized execution order of NIST tests for 128 bits with sample
size 2^20
total samples, which indicates the necessity of shortening the total
elimination time used (TET).
Then we observe whether the order changes for other sequence lengths. The
optimized order obtained for 256 bits is stable, as given below, but differs
from the optimized result for 128 bits. This sample size also appears suitable
for 256-bit optimization because the optimized order is fixed. Note that we may
need to increase the sample size if the optimization is applied to longer
sequences, such as 512 bits.
– The optimized order for 256-bit sequences with sample sizes 2^20 and 2^25: {T15 −
T14 − T8 − T7 − T2 − T3 − T11 − T4 − T10 − T12 − T13 − T1}.
Different lengths lead to different optimized orders not only because time
consumption changes, but also because coverage and independence change. The
change in coverage and independence results from some of the tests becoming
stricter for sequences of different lengths, while the strictness of the other
tests does not change. Furthermore, there is a strong correlation between time
consumption and sequence length. As the length grows, the time to test a
sequence increases; in particular, the time consumption of the linear
complexity test grows exponentially, which leads to an obviously decreasing
contribution of such tests.
5 Evaluation
The average elimination time used (AET) is defined to precisely measure the
efficiency of the test suite under a specific order. To evaluate the validity
and efficiency of our scheme, we compare the AET results for 20-bit and 128-bit
sequences under the most efficient order, i.e., the optimal order, and under
our optimized order, as shown in Table 4. The optimal order is found by
traversing all 12! possible orders with sample size 2^20, which takes 6 days on
our experimental platform. The result shows that the AET value of our optimized
order approximates that of the optimal order.
Table 4. Comparison of AET for 20/128 bits with sample size 2^20
Then, we run the tests under the default order of SP 800-22 and under our
optimized order, with sequence lengths of 20/128/256 bits, respectively, again
with sample size 2^20. The experimental results, averaged over about fifty
rounds, are shown in Table 5.
With a 20-bit sequence length, the efficiency under our optimized order is
enhanced by 175.7% compared to the default order. For 128-bit and 256-bit
sequence lengths, the efficiency is increased by 33% and 20%, respectively.
Comparing the AET of our optimized order with that of the optimal order, the
efficiency of our scheme is in close proximity to the most efficient one.
Therefore, our optimization scheme provides a quickly implementable method to
approach the optimal speed for the on-the-fly statistical randomness test.
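The AET comparison above can be illustrated with a toy simulation. The sketch below is not the paper's implementation; the per-test times and failure sets are hypothetical stand-ins for real SP 800-22 measurements:

```python
# Hypothetical sketch of the average elimination time (AET) for a given
# test order. `times[t]` is the per-sequence cost of test t, and each
# element of `fail_sets` is the set of tests that one sample fails.

def aet(order, times, fail_sets):
    """Average time spent per sample before it is eliminated
    (or until every test passes, for samples that survive)."""
    total = 0.0
    for fails in fail_sets:
        for t in order:
            total += times[t]
            if t in fails:       # sample rejected: stop testing it
                break
    return total / len(fail_sets)

# Toy data: test "A" is cheap and rejects most bad samples,
# so running it first lowers the AET.
times = {"A": 1.0, "B": 10.0}
samples = [{"A"}, {"A"}, {"B"}, set()]
print(aet(["A", "B"], times, samples))  # (1 + 1 + 11 + 11) / 4 = 6.0
print(aet(["B", "A"], times, samples))  # (11 + 11 + 10 + 11) / 4 = 10.75
```

Putting the cheap, highly rejecting test first lowers the average time spent before a bad sample is eliminated, which is exactly what the optimized order exploits.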
Acknowledgement. This work was partially supported by National Key R&D Plan
No. 2016YFB0800504 and No. 2016QY02D0400, and National Natural Science Foun-
dation of China No. U163620068.
Table 6. The initial data – the attribute values of the tests in SP 800-22 for 128 bits
with sample size 2^20 and α = 0.01 (Example)

Factor    | T1    | T2   | T3    | T4   | T7     | T8    | T10   | T11   | T12   | T13   | T14   | T15
Inf1      | 10576 | 8919 | 11075 | 9507 | 104155 | 43816 | 65947 | 19063 | 11649 | 12701 | 49012 | 62223
Inf2      | 688   | 3164 | 4650  | 3852 | 74600  | 29763 | 48708 | 7331  | 2569  | 1475  | 32683 | 45932
Inf3 (µs) | 5.30  | 5.59 | 6.65  | 7.52 | 81.79  | 13.7  | 87.76 | 21.32 | 13.07 | 10.39 | 15.51 | 21.42
where $r_{ij}$ is the element $(i, j)$ of the fuzzy similarity matrix, and we define
the notation $x_{ki} \wedge x_{kj} = \min\{x_{ki}, x_{kj}\}$, $x_{ki} \vee x_{kj} = \max\{x_{ki}, x_{kj}\}$.
Step 3. Dynamic Clustering Results. The dynamic clustering result is obtained
from the maximal tree, whose vertices and weights represent the tests and the
selectable maximal similarity degree between two tests, respectively, under
different threshold values. Based on the above fuzzy similarity matrix, the
maximal tree obtained by the Kruskal method is shown in Fig. 4.
[Fig. 4. The maximal tree (Kruskal method). Chain: T1 –0.1535– T3 –0.6551– T4
–0.3512– T12 –0.3921– T11 –0.3075– T14 –0.7106– T15 –0.6007– T10 –0.731– T7;
the branches to T2, T13, and T8 carry weights 0.4813, 0.6183, and 0.9029,
respectively.]
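The maximal tree in Fig. 4 can be built with Kruskal's algorithm, greedily adding the heaviest similarity edges that do not close a cycle. A minimal sketch with a hypothetical three-test similarity graph:

```python
def kruskal_max_tree(nodes, edges):
    """edges: list of (weight, u, v). Returns the maximum-weight
    spanning tree as a list of (weight, u, v), using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):  # heaviest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                 # no cycle: keep this edge
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Toy similarity graph on three tests (weights are illustrative).
edges = [(0.9, "T8", "T14"), (0.5, "T14", "T15"), (0.2, "T8", "T15")]
print(kruskal_max_tree(["T8", "T14", "T15"], edges))
# [(0.9, 'T8', 'T14'), (0.5, 'T14', 'T15')]
```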
From the weights on the branches of the maximal tree, we determine a series of
threshold values λ ∈ [0, 1]. Cutting off every branch whose weight is below λ
yields a disconnected graph, and the connected components form the classes at
this λ. Under the different thresholds, the classifications of these 12 tests,
represented by their subscripts, are listed below; these are the dynamic
clustering results. Refer to Table 1 for the serial numbers of the tests.
• 0.8 < λ ≤ 1, 11 classes: {1}, {2}, {3}, {4}, {7}, {8, 14}, {10}, {11}, {12},
{13}, {15};
• 0.7 < λ ≤ 0.8, 9 classes: {1}, {2}, {3}, {4}, {7, 10}, {8, 14, 15}, {11}, {12}, {13};
• 0.5 < λ ≤ 0.7, 6 classes: {1}, {2}, {3, 4}, {7, 8, 10, 14, 15}, {11}, {12, 13};
• 0.4 < λ ≤ 0.5, 5 classes: {1}, {2, 3, 4}, {7, 8, 10, 14, 15}, {11}, {12, 13};
• 0.2 < λ ≤ 0.4, 2 classes: {1}, {2, 3, 4, 7, 8, 10, 11, 12, 13, 14, 15};
• 0.1 < λ ≤ 0.2, 1 class: {1, 2, 3, 4, 7, 8, 10, 11, 12, 13, 14, 15}.
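The threshold-cutting procedure above can be sketched directly: drop tree edges lighter than λ and read off the connected components. The tree edges below are a hypothetical fragment, not the full Fig. 4 tree:

```python
def cluster_at(nodes, tree_edges, lam):
    """Drop tree edges with weight < lam, then return the connected
    components (the classes at threshold lam) as sorted lists."""
    adj = {n: [] for n in nodes}
    for w, u, v in tree_edges:
        if w >= lam:                 # keep only sufficiently similar pairs
            adj[u].append(v)
            adj[v].append(u)
    seen, classes = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], []
        while stack:                 # DFS over the kept edges
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            comp.append(x)
            stack.extend(adj[x])
        classes.append(sorted(comp))
    return classes

nodes = [8, 14, 15]
tree = [(0.9029, 8, 14), (0.7106, 14, 15)]
print(cluster_at(nodes, tree, 0.8))   # [[8, 14], [15]]
print(cluster_at(nodes, tree, 0.5))   # [[8, 14, 15]]
```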
Step 4. Selecting the Best Classification by the F-statistic. Assume the
classification under a given threshold λ contains $r$ classes and that the $i$th
class is denoted by $\{x_1^{(i)}, x_2^{(i)}, \cdots, x_{n_i}^{(i)}\}$. The clustering
center of the $i$th class is the vector $\bar{x}^{(i)} = (\bar{x}_1^{(i)}, \bar{x}_2^{(i)}, \bar{x}_3^{(i)})$,
where $\bar{x}_k^{(i)} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{kj}^{(i)}$ $(k = 1, 2, 3)$
is the mean of the $k$th attribute. So the F-statistic is calculated by

$$F = \frac{\sum_{i=1}^{r}\sum_{k=1}^{3} n_i\,(\bar{x}_k^{(i)} - \bar{x}_k)^2 / (r-1)}{\sum_{i=1}^{r}\sum_{j=1}^{n_i}\sum_{k=1}^{3} (x_{kj}^{(i)} - \bar{x}_k^{(i)})^2 / (n-r)} \sim F(r-1,\, n-r),$$
where the numerator and denominator measure the distances between classes
and the distances between samples within each class, respectively. Thus, for a
given confidence level 1 − α, we obtain $F_\alpha$ from the F-distribution critical
value table. Comparing the calculated $F$ to $F_\alpha$: if $F > F_\alpha$, then according
to the analysis-of-variance theory of mathematical statistics, the differences
among the classes are significant under the λ corresponding to this $F$.
Moreover, if there is more than one candidate $F$ ($> F_\alpha$), we take the
maximum of the proportion $(F - F_\alpha)/F_\alpha$; the corresponding classification
is the most reasonable one. Table 7 shows the calculated F-statistics.
After calculation, only λ = 0.4 satisfies $F > F_\alpha$ (α = 0.025). Therefore, the
best classification is 5 classes: {1}, {2, 3, 4}, {7, 8, 10, 14, 15}, {11}, {12, 13}.
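The F-statistic can be evaluated directly from the formula; a self-contained sketch with made-up three-attribute data (not the paper's measurements):

```python
def f_statistic(classes):
    """classes: list of classes, each a list of 3-attribute samples.
    Returns F = (between-class distance / (r-1)) / (within-class / (n-r))."""
    samples = [x for c in classes for x in c]
    n, r = len(samples), len(classes)
    dim = len(samples[0])
    grand = [sum(x[k] for x in samples) / n for k in range(dim)]
    between = within = 0.0
    for c in classes:
        center = [sum(x[k] for x in c) / len(c) for k in range(dim)]
        for k in range(dim):
            between += len(c) * (center[k] - grand[k]) ** 2
            within += sum((x[k] - center[k]) ** 2 for x in c)
    return (between / (r - 1)) / (within / (n - r))

# Two well-separated classes with small within-class spread.
c1 = [(0, 0, 0), (2, 2, 2)]
c2 = [(4, 4, 4), (6, 6, 6)]
print(f_statistic([c1, c2]))  # 8.0
```

A large F means the between-class variation dominates the within-class variation, i.e., the classification separates the tests well.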
Step 5. Deleting Attributes. We delete each attribute in turn and obtain the
altered initial data with that attribute removed. Repeating Steps 1–4, we
enumerate the best classification after deleting each attribute, respectively.
Note that these best classifications are also determined by the F-statistic
method above.
– Deleting Inf1: {1}, {2, 3, 4}, {7, 10}, {8, 14, 15}, {11}, {12, 13};
– Deleting Inf2: {1}, {2}, {3, 4, 11, 12, 13}, {7, 8, 10, 14, 15};
– Deleting Inf3: {1, 2, 3, 4, 11, 12, 13}, {7, 8, 10, 14, 15}.
If the best classification of the initial data can be recovered from the best
classification of the altered initial data after an attribute is deleted, it means
that this deleted attribute contains little information about the best
classification of the initial data, and vice versa. Therefore, the amount of
information of the deleted $i$th attribute $\mathrm{Inf}_i$ is inversely related to
the mutual information $I(A; A - \mathrm{Inf}_i)$ shared by the best classifications
of the initial data and of the altered initial data after deleting the $i$th
attribute, and the reciprocal of the mutual information can be used to represent
the relative amount of information of the deleted attribute [27]. The related
theory is introduced in Appendix C. The expression of the weight is

$$W_i = \frac{1}{I(A; A - \mathrm{Inf}_i)}.$$

After calculating the weights, the weight of each factor $\mathrm{Inf}_i$ is normalized as

$$w_i = \frac{W_i}{\sum_{j=1}^{3} W_j}.$$
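The mutual information between two classifications (partitions of the same set of tests) and the resulting normalized weights can be computed as follows; the partition and the I(A; A − Inf_i) values below are illustrative, not the paper's:

```python
from math import log

def mutual_information(part_a, part_b):
    """part_a, part_b: partitions of the same item set, as lists of sets.
    I(A;B) = sum p(a&b) * log(p(a&b) / (p(a) p(b))), items equiprobable."""
    n = sum(len(c) for c in part_a)
    mi = 0.0
    for a in part_a:
        for b in part_b:
            pab = len(a & b) / n
            if pab > 0:              # empty intersections contribute nothing
                mi += pab * log(pab / ((len(a) / n) * (len(b) / n)))
    return mi

def normalized_weights(mis):
    """W_i = 1 / I(A; A - Inf_i), then normalized to sum to 1.
    (A zero mutual information would make the reciprocal blow up.)"""
    ws = [1.0 / m for m in mis]
    s = sum(ws)
    return [w / s for w in ws]

a = [{1}, {2, 3, 4}, {7, 8}]
print(round(mutual_information(a, a), 4))  # I(A;A) equals the entropy of `a`
print(normalized_weights([0.5, 1.0, 2.0]))
```

A smaller mutual information with the altered classification yields a larger weight, matching the intuition that the deleted attribute carried more information.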
References
1. Vasyltsov, I., Hambardzumyan, E., Kim, Y.-S., Karpinskyy, B.: Fast digital TRNG
based on metastable ring oscillator. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008.
LNCS, vol. 5154, pp. 164–180. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85053-3_11
2. Markettos, A.T., Moore, S.W.: The frequency injection attack on ring-oscillator-
based true random number generators. In: Clavier, C., Gaj, K. (eds.) CHES 2009.
LNCS, vol. 5747, pp. 317–331. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04138-9_23
3. Fischer, V., Aubert, A., Bernard, F., et al.: True Random Number Generators in
Configurable Logic Devices. Project ANR-ICTeR (2009)
4. Schindler, W.: Efficient online tests for true random number generators. In: Koç,
Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 103–117.
Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44709-1_10
5. Rukhin, A., Soto, J., Nechvatal, J., et al.: A statistical test suite for random and
pseudorandom number generators for cryptographic applications. NIST Special
Publication 800-22, Washington, D.C., May 2001
6. Sönmez Turan, M., Doğanaksoy, A., Boztaş, S.: On independence and sensitivity of
statistical randomness tests. In: Golomb, S.W., Parker, M.G., Pott, A., Winterhof,
A. (eds.) SETA 2008. LNCS, vol. 5203, pp. 18–29. Springer, Heidelberg (2008).
https://doi.org/10.1007/978-3-540-85912-3_2
7. NIST FIPS PUB: 140-2: Security Requirements for Cryptographic Modules. Wash-
ington, D.C., USA (2001)
8. Elaine, B., John, K.: Recommendation for random number generation using deter-
ministic random bit generators. NIST Special Publication 800-90A, Washington,
D.C., January 2012
9. Killmann, W., Schindler, W.: AIS 31: Functionality Classes and Evaluation
Methodology for True (Physical) Random Number Generators, Version 3.1.
T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik
(BSI), Bonn, Germany (2001)
10. Marsaglia, G.: The Marsaglia Random Number CDROM Including the Diehard
Battery of Tests of Randomness (1995)
11. L’Ecuyer, P., Simard, R.J.: TestU01: a C library for empirical testing of random
number generators. ACM Trans. Math. Softw. 33(4) (2007)
12. L’Ecuyer, P.: Testing random number generators. In: Winter Simulation Confer-
ence, pp. 305–313. ACM Press (1992)
13. Hellekalek, P., Wegenkittl, S.: Empirical evidence concerning AES. ACM Trans.
Model. Comput. Simul. 13(4), 322–333 (2003)
14. Fan, L., Chen, H., Gao, S.: A general method to evaluate the correlation of ran-
domness tests. In: Kim, Y., Lee, H., Perrig, A. (eds.) WISA 2013. LNCS, vol. 8267,
pp. 52–62. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05149-9_4
15. Maurer, U.M.: A universal statistical test for random bit generators. J. Cryptol.
5(2), 89–105 (1992)
16. Hamano, K., Kaneko, T.: Correction of overlapping template matching test
included in NIST randomness test suite. IEICE Trans. 90–A(9), 1788–1792 (2007)
17. Kim, S.-J., Umeno, K., Hasegawa, A.: Corrections of the NIST statistical test suite
for randomness. IACR Cryptology ePrint Archive, Report 2004/018 (2004)
18. Hamano, K.: The distribution of the spectrum for the discrete fourier transform
test included in SP800-22. IEICE Trans. 88–A(1), 67–73 (2005)
19. Pareschi, F., Rovatti, R., Setti, G.: On statistical tests for randomness included in
the NIST SP800-22 test suite and based on the binomial distribution. IEEE Trans.
Inf. Forensics Secur. 7(2), 491–505 (2012)
20. Sulak, F., Doğanaksoy, A., Ege, B., Koçak, O.: Evaluation of randomness test
results for short sequences. In: Carlet, C., Pott, A. (eds.) SETA 2010. LNCS,
vol. 6338, pp. 309–319. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15874-2_27
21. Suciu, A., Nagy, I., Marton, K., Pinca, I.: Parallel implementation of the NIST
statistical test suite. In: Proceedings of the 2010 IEEE 6th International Conference
on Intelligent Computer Communication and Processing (ICCP), pp. 363–368.
Institute of Electrical and Electronic Engineers (2010)
22. Huang, J., Lai, X.: Measuring random tests by conditional entropy and optimal exe-
cution order. In: Chen, L., Yung, M. (eds.) INTRUST 2010. LNCS, vol. 6802, pp.
148–159. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25283-9_10
23. Chen, T., Ma, Y., Lin, J., Wang, Z., Jing, J.: An efficiency optimization scheme for
the on-the-fly statistical randomness test. In: Proceedings of the 2015 IEEE 2nd
International Conference on Cyber Security and Cloud Computing (CSCloud),
CSCLOUD 2015, pp. 515–517. IEEE Computer Society, Washington, D.C. (2015)
24. Soto, J.: Statistical testing of random number generators. In: Proceedings of the
22nd National Information Systems Security Conference (NISSC), vol. 10, pp. 12–
23. NIST, Gaithersburg (1999)
25. Soto, J.: Randomness Testing of the AES Candidate Algorithms. NIST (1999).
csrc.nist.gov
26. NIST: The NIST Statistical Test Suite (2010). http://csrc.nist.gov/groups/ST/
toolkit/rng/documents/sts-2.1.2.zip
27. Chen, C.B., Wang, L.Y.: Rough set-based clustering with refinement using Shan-
non’s entropy theory. Comput. Math. Appl. 52(10–11), 1563–1576 (2006)
Signature Scheme and Key Management
FABSS: Attribute-Based Sanitizable
Signature for Flexible Access Structure
1 Introduction
The EHR system, which refers to the systematized collection of electronically
stored health information about patients and populations in a digital format, is
considered a sustainable solution for improving the quality of medical care. In
the EHR system, it is important to guarantee the authentication and integrity
of medical records, and thus digital signatures are utilized. However, the secret
signing key of the signature attests to the identity information of patients, such
as names and ages, which is not supposed to be disclosed to the public. With
attribute-based signature (ABS), patients sign the records with an attribute
signing key issued by attribute authorities according to the patients' attributes,
such as age: >45, profession: teacher, work unit: Xidian University, etc. The
signature then attests not to the identities of patients but to some of their
attributes, which protects the identity privacy of patients and achieves
anonymous authentication. Hence, ABS is well suited to the EHR system.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 39–50, 2018.
https://doi.org/10.1007/978-3-319-89500-0_3
40 R. Mo et al.
In the system, the health data of patients are updated frequently. However, a
traditional digital signature, including ABS, prohibits any alteration of the
original medical data once it is signed, and the signature has to be regenerated
from scratch whenever parts of the original records change, which increases the
computation overhead of users and leads to inefficiency. A sanitizable signature
allows a semi-trusted party, the sanitizer, to modify certain portions of the
health records covered by the original signature. Thus, the signer needs to sign
the records only once, which reduces the signer's computation cost. In the
sanitizing phase, the sanitizer can generate the sanitized signature without
interacting with the signer for signing keys; consequently, the sanitizer cannot
forge a signature of the original signer. In addition, the sanitizing process does
not affect verification.
In this paper, to address the efficiency and identity privacy problems in the
EHR system, we propose a novel Flexible Attribute-Based Sanitizable Signature
(FABSS) scheme. Specifically, major contributions of this paper are twofold.
2 Preliminaries
The threshold access structure in ABS is composed of one threshold and several
attributes. A user can generate a valid signature only if the size of the
intersection of his attribute set and the access structure's attribute set reaches
the threshold value. Only simple access control can be achieved with a threshold
access structure, such as {‘A’ AND ‘B’ AND ‘C’} or {‘A’ OR ‘B’ OR ‘C’}.
The flexible access structure consists of a number of thresholds and
attributes, in which each interior node is a threshold gate. Besides the
aforementioned structures, we can define expressive access control over
large-scale attribute sets by changing the breadth and depth of the structure,
e.g., {{‘A’ AND ‘B’} OR ‘C’}, {{‘A’ OR ‘B’} AND ‘C’}, etc.
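A flexible access structure of this kind is a tree of threshold gates and can be evaluated recursively. The encoding below is hypothetical: an AND over c children is the gate (c, children) and an OR is (1, children):

```python
def satisfies(node, attrs):
    """node is either an attribute string (leaf) or a threshold gate
    (k, children): satisfied iff at least k children are satisfied."""
    if isinstance(node, str):
        return node in attrs
    k, children = node
    return sum(satisfies(c, attrs) for c in children) >= k

# {{'A' AND 'B'} OR 'C'}
policy = (1, [(2, ["A", "B"]), "C"])
print(satisfies(policy, {"A", "B"}))  # True
print(satisfies(policy, {"C"}))       # True
print(satisfies(policy, {"A"}))       # False
```

Changing the breadth (number of children) and depth (nesting) of such trees yields the expressive policies described above.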
of the matrix $M$ with a labeling function $z(\cdot)$. The monotone span program
satisfies: $f(\mathcal{A}) = 1$ if and only if there exists a vector $v$ with
$v \cdot M = [1, 0, \ldots, 0]$.
Sign: First convert the claim-predicate $f$ into its corresponding monotone span
program matrix $M \in (\mathbb{Z}_p)^{l \times t}$, mapping each row of $M$ with the row
labeling function $z : [l] \to \mathcal{A}$. Then compute the vector $v$ that corresponds to
the desirable assignment for $\mathcal{A}$. Pick random $r \leftarrow \mathbb{Z}_p^*$,
$r_1, \ldots, r_l \leftarrow \mathbb{Z}_p$ and compute $Y = K^r$,
$S_i = (K_{z(i)}^{v_i})^r \cdot (U \prod_{k=1}^{n} U_k^{m_k})^{r_i}$ $(\forall i \in [l])$,
$W = K_0^r$, $P_j = \prod_{i=1}^{l} (A_j B_j^{z(i)})^{M_{i,j} \cdot r_i}$ $(\forall j \in [t])$.
The signature is $\sigma = (Y, W, S_1, \ldots, S_l, P_1, \ldots, P_t)$.
Verify: With $(Params, \sigma = (Y, W, S_1, \ldots, S_l, P_1, \ldots, P_t), m, f)$, the verifier first
converts $f$ into its corresponding monotone span program $M \in (\mathbb{Z}_p)^{l \times t}$, with
row labeling $z : [l] \to \mathcal{A}$. If $Y = 1$, the verifier outputs reject; otherwise it checks

$$e(W, A_0) \stackrel{?}{=} e(Y, g_{20}),$$

$$\prod_{i=1}^{l} e\big(S_i, (A_j B_j^{z(i)})^{M_{i,j}}\big) \stackrel{?}{=}
\begin{cases} e(Y, g_{21})\, e\big(U \prod_{k=1}^{n} U_k^{m_k},\, P_1\big), & j = 1,\\[2pt]
e\big(U \prod_{k=1}^{n} U_k^{m_k},\, P_j\big), & j > 1, \end{cases}$$

for each $j \in [t]$. The verifier returns accept if all the equations above hold,
otherwise reject.
Sanitize: The sanitizer obtains $\sigma$ and the DI from the signer. To change the
sanitizable message blocks, indexed by a set $I$, from $m_k$ to $m'_k$, it picks
random $\tilde{r}_1, \ldots, \tilde{r}_l \leftarrow \mathbb{Z}_p$ and computes $Y' = Y$,
$S'_i = S_i \cdot \prod_{k \in I} (U_k^{r_i})^{m'_k - m_k} \cdot (U \prod_{k=1}^{n} U_k^{m'_k})^{\tilde{r}_i}$
$(\forall i \in [l])$, $W' = W$, $P'_j = \prod_{i=1}^{l} (A_j B_j^{z(i)})^{M_{i,j} \cdot (r_i + \tilde{r}_i)}$
$(\forall j \in [t])$, where the values $U_k^{r_i}$ for $k \in I$ are provided in the DI.
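The monotone span program condition used throughout, the existence of a vector $v$ with $v \cdot M = [1, 0, \ldots, 0]$ over $\mathbb{Z}_p$, can be checked mechanically for a candidate $v$. A sketch with a toy matrix and a small modulus (real schemes use a large prime p):

```python
P = 7  # toy prime modulus; a real scheme uses a large prime p

def spans_target(v, M, p=P):
    """Check v.M == [1, 0, ..., 0] (mod p) for an l x t matrix M."""
    l, t = len(M), len(M[0])
    prod = [sum(v[i] * M[i][j] for i in range(l)) % p for j in range(t)]
    return prod == [1] + [0] * (t - 1)

# Rows correspond to attributes via the labeling z; a satisfying
# attribute set owns rows whose span contains the target vector.
M = [[1, 1],
     [0, 1]]
print(spans_target([1, 0], M))  # [1, 1] != [1, 0] -> False
print(spans_target([1, 6], M))  # 1*[1,1] + 6*[0,1] = [1, 7] = [1, 0] mod 7 -> True
```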
5 Security Analysis
Proof (Correctness). When the signature of either the original message or the
sanitized message is generated by a signer whose attributes satisfy the access
structure $f$, it passes the verification.
Verification:

$$\prod_{i=1}^{l} e\big(S_i, (A_j B_j^{z(i)})^{M_{i,j}}\big)
= \prod_{i=1}^{l} e\Big((K_{z(i)}^{v_i})^r \cdot \big(U \textstyle\prod_{k=1}^{n} U_k^{m_k}\big)^{r_i},\; g_{2j}^{(a + b z(i)) \cdot M_{i,j}}\Big)$$

$$= e\big((K^{\sum_{i=1}^{l} v_i \cdot M_{i,j}})^r,\, g_{2j}\big) \cdot \prod_{i=1}^{l} e\Big(\big(U \textstyle\prod_{k=1}^{n} U_k^{m_k}\big)^{r_i},\; g_{2j}^{(a + b z(i)) \cdot M_{i,j}}\Big)$$

$$= \begin{cases} e(Y, g_{21})\, e\big(U \prod_{k=1}^{n} U_k^{m_k},\, P_1\big), & j = 1,\\[2pt]
e\big(U \prod_{k=1}^{n} U_k^{m_k},\, P_j\big), & j > 1. \end{cases}$$
From the sanitization we can see that the distribution of the sanitized signature
is identical to that of the original signature, so the verification holds for both
of them.
Proof (Unforgeability). We can prove that our FABSS scheme is unforgeable
under selective-predicate attack in the generic group model. Here we present the
full proof of unforgeability.
For $Y = K^r \leftarrow G_1$ and $W = K_0^r = K^{r/a_0} \leftarrow G_1$, we suppose
$Y = g_1^{y}$, $W = g_1^{y/a_0}$. Similarly, suppose $S_i = g_1^{s_i}$, $P_j = g_{2j}^{p_j}$.
We can derive that
$S_i = g_1^{\frac{y v_i}{a + b z(i)} + u' r_i + \sum_{k=1}^{n} u_k m_k r_i}$,
$P_j = g_{2j}^{\sum_{i=1}^{l} (a + b z(i)) M_{i,j} \cdot r_i}$. So
$s_i = \frac{y v_i}{a + b z(i)} + u' r_i + \sum_{k=1}^{n} u_k m_k r_i$,
$p_j = \sum_{i=1}^{l} (a + b z(i)) M_{i,j} \cdot r_i$. Then we get
$s_i (a + b z(i)) M_{i,j} = y v_i M_{i,j} + r_i (u' + \sum_{k=1}^{n} u_k m_k)(a + b z(i)) M_{i,j}$,
and hence
$\sum_{i=1}^{l} s_i (a + b z(i)) M_{i,j} = \sum_{i=1}^{l} \big(y v_i M_{i,j} + r_i (u' + \sum_{k=1}^{n} u_k m_k)(a + b z(i)) M_{i,j}\big)$.
We assume that $\sum_{i=1}^{l} v_i M_{i,j} = d_j$ with $d = [1, 0, \ldots, 0]$; then we
conclude that

$$p_j = \frac{1}{u' + \sum_{k=1}^{n} u_k m_k} \Big(\sum_{i=1}^{l} s_i (a + b z(i)) M_{i,j} - y\, d_j\Big).$$
Therefore, we can define the oracle $\mathrm{Sign}(Params, SK_{\mathcal{A}}, m, f)$ to generate
signatures in the following way: Let $M \in (\mathbb{Z}_p)^{l \times t}$ be the monotone span program
for $f$, with row labeling function $z : [l] \to \mathcal{A}$.
– Pick random $s_1, \ldots, s_l \leftarrow \mathbb{Z}_p^*$.
– For all $j \in [t]$, compute $p_j = \frac{1}{u' + \sum_{k=1}^{n} u_k m_k} \big(\sum_{i=1}^{l} s_i (a + b z(i)) M_{i,j} - y\, d_j\big)$,
where $d = [1, 0, \ldots, 0]$.
– Output $\sigma = (g_1^{y}, g_1^{y/a_0}, g_1^{s_1}, \ldots, g_1^{s_l}, g_{21}^{p_1}, \ldots, g_{2t}^{p_t})$.
We assume that there is an efficiently computable homomorphism between
$G_1$ and $G_2$. For any generic-group adversary, the simulator registers each
group element's discrete logarithm in the following formal variables:
$\Sigma = \{a_0, a, b, u', \lambda_0\} \cup \{\lambda_j \mid j \in [t_{max}]\} \cup \{x_\mu \mid \mu \in [\Lambda]\} \cup \{s_i^{(q)}, y^{(q)} \mid q \in [\nu], i \in [l^{(q)}]\}$,
where $\Lambda$ is the number of queries made to the KeyGen oracle, $\nu$ is the number
of Sign queries made by the adversary, and $l^{(q)}$ is the length of the monotone
span program corresponding to the $q$th signature query.
The simulation associates each group element with the aforementioned formal
variables. For each group element in its collection, the simulator keeps track of
its discrete logarithm and gives it to the adversary as the encoding of the group
element. In the simulation, the group elements are expressed as follows:
Public key components are generated by Setup: $1$, representing the generator
$g_1$; $\lambda_0$, representing $g_{20} = g_1^{\lambda_0}$; $\lambda_0 a_0$, denoting
$A_0 = g_1^{\lambda_0 a_0}$; $\{\lambda_j \mid j \in [t_{max}]\}$, indicating $g_{2j} = g_1^{\lambda_j}$;
$\{\lambda_j a \mid j \in [t_{max}]\}$, standing for $A_j = g_1^{\lambda_j a}$;
$\{\lambda_j b \mid j \in [t_{max}]\}$, representing $B_j = g_1^{\lambda_j b}$; $u'$, denoting
$U = g_1^{u'}$; $\{u_k \mid k \in [n]\}$, indicating $U_k = g_1^{u_k}$.
Signing key components are given by KeyGen. Let $\mathcal{A}_\mu$ be the $\mu$th set of
attributes queried to KeyGen: $x_\mu$, representing $K^{(\mu)} = g_1^{x_\mu}$;
$x_\mu / a_0$, denoting $K_0^{(\mu)} = g_1^{x_\mu / a_0}$;
$\{x_\mu / (a + bz) \mid z \in \mathcal{A}_\mu\}$, indicating $K_z^{(\mu)} = g_1^{x_\mu / (a + bz)}$.
Sign queries. For the $q$th signature query on message $m^{(q)}$ under the predicate
$f^{(q)}$ made by the adversary, let $M^{(q)} \in (\mathbb{Z}_p)^{l^{(q)} \times t^{(q)}}$ be the monotone
span program corresponding to $f^{(q)}$, with row labeling $z^{(q)} : [l^{(q)}] \to \mathcal{A}$:
$\{s_i^{(q)} \mid i \in [l^{(q)}]\}$, representing $S_i^{(q)} = g_1^{s_i^{(q)}}$; $y^{(q)}$, denoting
$Y^{(q)} = g_1^{y^{(q)}}$; $y^{(q)} / a_0$, standing for $W^{(q)} = g_1^{y^{(q)} / a_0}$;
$\{p_j^{(q)} \mid j \in [t^{(q)}]\}$, where
$p_j^{(q)} = \frac{\lambda_j}{u' + \sum_{k=1}^{n^{(q)}} u_k m_k^{(q)}} \cdot \big(\sum_{i=1}^{l^{(q)}} s_i^{(q)} (a + b z^{(q)}(i)) M_{i,j}^{(q)} - y^{(q)} d_j\big)$,
representing $P_j^{(q)} = g_1^{p_j^{(q)}}$.
Now the adversary outputs a forgery signature
$\sigma^* = (g_1^{y^*}, g_1^{w^*}, g_1^{s_1^*}, \ldots, g_1^{s_{l^*}^*}, g_1^{p_1^*}, \ldots, g_1^{p_{t^*}^*})$
on a predicate $f^*$ and message $m^*$ such that $(m^*, f^*) \neq (m^{(q)}, f^{(q)})$
for all $q$. $M^* \in (\mathbb{Z}_p)^{l^* \times t^*}$ is the corresponding monotone span program with
row labeling $z^*(\cdot)$. The discrete logarithm of the forgery has to satisfy
$y^* \neq 0$ and $w^* \neq 0$ (for $Y^* \neq 1$, $W^* \neq 1$), as well as
$\sum_{i=1}^{l^*} s_i^* M_{i,j}^* (a + b z^*(i)) \lambda_j = y^* d_j \lambda_j + (u' + \sum_{k=1}^{n^*} u_k m_k^*)\, p_j^*$;
these constraints can hold with non-negligible probability only
if the two sides of the equation are functionally equivalent.
Then we will prove that if the two sides of the equation are functionally
equivalent, there has to be a contradiction: there exists a $\mu_0 \in [\Lambda]$ such that
$f^*(\mathcal{A}_{\mu_0}) = 1$. Namely, the adversary may generate a signature using a
signing key $SK_{\mathcal{A}_{\mu_0}}$ that has been queried before but meets the new
claim-predicate $f^*$, and thus the output is not a forgery.
Assume $L(\Gamma)$ is the set of all multilinear polynomials over the set of terms
$\Gamma$ with coefficients in $\mathbb{Z}_p$. Let $H(\Gamma) \subset L(\Gamma)$ be the subset of
homogeneous polynomials.
We know that $y^*, w^*, s_1^*, \ldots, s_{l^*}^*, p_1^*, \ldots, p_{t^*}^* \in L(\Gamma)$, where
$\Gamma = \{1, a_0, \lambda_0, u', u_k\} \cup \{\lambda_j, a\lambda_j, b\lambda_j \mid j \in [t_{max}]\} \cup \{x_\mu, x_\mu/a_0, x_\mu/(a+bz) \mid \mu \in [\Lambda], z \in \mathcal{A}_\mu\} \cup \{s_i^{(q)}, y^{(q)}, w^{(q)}, p_j^{(q)} \mid q \in [\nu], i \in [l^{(q)}], j \in [t^{(q)}]\}$.
We can exclude certain terms by comparing terms between the two sides of the
equation; for $y^*$ we get $y^* \in H(\{x_\mu \mid \mu \in [\Lambda]\} \cup \{y^{(q)} \mid q \in [\nu]\})$.
It is obvious that $\lambda_j \mid (u' + \sum_{k=1}^{n^*} u_k^* m_k^*)\, p_j^*$ and thus
$\lambda_j \mid p_j^*$. So $p_j^* \in H(\{\lambda_j, a\lambda_j, b\lambda_j\} \cup \{p_j^{(q)} \mid q \in [\nu]\})$.
Suppose $p_j^*$ has a $\lambda_j$ term. Then the right side has monomials $\lambda_j$ and
$b\lambda_j$. Because $y^*$ has no $a$ or $b$ term, $y^* d_j \lambda_j$ cannot contribute a
$\lambda_j$ monomial. Therefore $\sum_{i=1}^{l^*} s_i^* M_{i,j}^* (a + b z^*(i)) \lambda_j$ cannot
contribute a monomial with $\lambda_j$ alone, so
$p_j^* \in H(\{a\lambda_j, b\lambda_j\} \cup \{p_j^{(q)} \mid q \in [\nu]\})$.
Suppose $p_j^*$ has a $p_j^{(q)}$ term. Then $(u' + \sum_{k=1}^{n^*} u_k^* m_k^*)\, p_j^*$ will
contribute the term
$\frac{u' + \sum_{k=1}^{n^*} u_k^* m_k^*}{u' + \sum_{k=1}^{n^{(q)}} u_k^{(q)} m_k^{(q)}} \cdot p_j^{(q)}$.
Since $\sum_{k=1}^{n^*} u_k^* m_k^* \neq \sum_{k=1}^{n^{(q)}} u_k^{(q)} m_k^{(q)}$ for any $q$,
this is a proper rational function. Neither $y^*$ nor $\{s_i^*\}_{i \in [l^*]}$ can yield
terms in the final equation with a factor of
$\frac{u' + \sum_{k=1}^{n^*} u_k^* m_k^*}{u' + \sum_{k=1}^{n^{(q)}} u_k^{(q)} m_k^{(q)}}$.
Hence, $p_j^* \in H(\{a\lambda_j, b\lambda_j\})$.
Consider $j_0$ such that $d_{j_0} \neq 0$. As neither
$(u' + \sum_{k=1}^{n^*} u_k^* m_k^*)\, p_{j_0}^*$ nor
$\sum_{i=1}^{l^*} s_i^* M_{i,j_0}^* (a + b z^*(i)) \lambda_{j_0}$ can contribute a monomial of
this form, $y^*$ cannot have a $y^{(q)}$ term. Therefore, $y^* \in H(\{x_\mu \mid \mu \in [\Lambda]\})$.
Finally we conclude that $p_j^* \in H(\{a\lambda_j, b\lambda_j\})$ and
$y^* \in H(\{x_\mu \mid \mu \in [\Lambda]\})$.
To make the expression equal, some parts of the left side must contain $x_\mu$ to
match $y^*$ and the other parts must not, to match $p_j^*$. So we can break $s_i^*$
up into two parts: one whose terms involve $x_\mu$ variables, and one whose terms
do not. Suppose $s_i^* = t_i^*(X_i) + \delta_i^*(\Gamma \setminus X_i)$, where
$X_i = \{\frac{x_\mu}{a + b z^*(i)} \mid z^*(i) \in \mathcal{A}_\mu, \mu \in [\Lambda]\}$ serves to
cancel the term $(a + b z^*(i))$ on the left side. For $t_i^* \in H(X_i)$, it is
apparent for all $j \in [t^*]$ that
$\sum_{i=1}^{l^*} t_i^* M_{i,j}^* (a + b z^*(i)) = y^* d_j = y^* \sum_{i=1}^{l^*} v_i^* M_{i,j}^*$;
because the two sides of the equation are equal, we get for all $i \in [l^*]$ that
$t_i^* M_{i,j}^* (a + b z^*(i)) = y^* v_i^* M_{i,j}^*$.
∗
Take account of any xμ0 that has a non-zero coefficient in y . Construct
xμ0
vi∗ , for i ∈ [l], by defining vi∗ = 1
[xμ0 ]y ∗
∗ ∗
a+bz ∗ (i) ti , where the [xμ0 ]y denotes
xμ0
the coefficient of the term xμ0 in y ∗ , ∗
a+bz ∗ (i) ti denotes the coefficient of the
x
term a+bzμ0∗ (i) in t∗i . v ∗ is a vector composed of constants, which satisfies the
equation v ∗ M∗ = [d1 , . . . , dt ] = [1, 0, . . . , 0]. Further, when vi∗
= 0, the set Aμ0
surely contains the attribute z ∗ (i), which means xz∗ (i)
= 0. By the properties
of the monotone span program, it must be the case that f ∗ (Aμ0 ) = 1, thus the
signature is not a forgery.
Proof (Anonymity). In our construction, the signature does not reveal which of
the signer's attributes $\mathcal{A}$ are used to sign the message, because any
attribute subset satisfying the access structure $f$ can generate a valid
signature. Thus, we only need to prove that the signer's identity among all users
is kept anonymous even when the signer's attribute set equals the set of
attributes in $f$.
First, the challenger runs Setup to get the public parameters $Params$ and the
master secret key $MSK$. The adversary outputs two attribute sets $\mathcal{A}_1$ and
$\mathcal{A}_2$ satisfying $f$, and runs KeyGen to get the signing keys
$SK_{\mathcal{A}_1} = (K_1, K_{01}, \{K_{z1} \mid z \in \mathcal{A}_1\})$ and
$SK_{\mathcal{A}_2} = (K_2, K_{02}, \{K_{z2} \mid z \in \mathcal{A}_2\})$, respectively, where
$K_{0\theta} = K_\theta^{1/a_0}$ and $K_{z\theta} = K_\theta^{1/(a + bz)}$ for each
$z \in \mathcal{A}_\theta$, $\theta \in \{1, 2\}$.
Then the adversary asks the challenger to generate a signature for message
$m^*$ with the signing key from either $SK_{\mathcal{A}_1}$ or $SK_{\mathcal{A}_2}$. The challenger
chooses a random bit $b \in \{1, 2\}$ and outputs a signature $Y = K_b^r$,
$W = K_{0b}^r$, $S_i = (K_{b,z(i)}^{v_i})^r \cdot (U \prod_{k=1}^{n} U_k^{m_k})^{r_i}$,
$P_j = \prod_{i=1}^{l} (A_j B_j^{z(i)})^{M_{i,j} \cdot r_i}$ by the algorithm Sign with
the signing key $SK_{\mathcal{A}_b} = (K_b, K_{0b}, \{K_{zb} \mid z \in \mathcal{A}_b\})$. On the
basis of the monotone span program, it is obvious that the signature could be
generated from either $SK_{\mathcal{A}_1}$ or $SK_{\mathcal{A}_2}$. Hence, if the signature is
generated from $SK_{\mathcal{A}_1}$ for $\mathcal{A}_1$, it could also be generated from
$SK_{\mathcal{A}_2}$ for $\mathcal{A}_2$. Thus, our FABSS scheme satisfies anonymity.
Proof (Information Privacy). From the construction of our scheme, the
signature of message $m = m_1 m_2 \ldots m_n$ is
$\sigma = \big(Y = K^r,\; S_i = (K_{z(i)}^{v_i})^r \cdot (U \prod_{k=1}^{n} U_k^{m_k})^{r_i}\; (\forall i \in [l]),\; W = K_0^r,\; P_j = \prod_{i=1}^{l} (A_j B_j^{z(i)})^{M_{i,j} \cdot r_i}\; (\forall j \in [t])\big)$.
The sanitized signature of message $m_1$ resulting in $m'$ is
$\sigma_1 = \big(Y = K^r,\; \{S_i = (K_{z(i)}^{v_i})^r \cdot (U \prod_{k=1}^{n} U_k^{m'_k})^{r_i + \tilde{r}_i} \mid \forall i \in [l]\},\; W = K_0^r,\; \{P_j = \prod_{i=1}^{l} (A_j B_j^{z(i)})^{M_{i,j} \cdot (r_i + \tilde{r}_i)} \mid \forall j \in [t]\}\big)$,
where $r, r_i, \tilde{r}_i$ are random numbers. So the distribution of $\sigma$ is
identical to that of $\sigma_1$. Similarly, the distribution of $\sigma_2$, the sanitized
signature of message $m_2$ resulting in $m'$, is identical. Hence, the distributions
of $\sigma_1$ and $\sigma_2$ are identical and our scheme preserves information privacy.
6 Performance Analysis
Through a functional comparison with existing schemes in Table 1, our FABSS
scheme not only reduces the patients' computation cost, but also preserves the
privacy of patients. Meanwhile, the FABSS scheme achieves a flexible access
structure and fine-grained access control. Thus, our scheme is applicable to the
EHR system.
In Table 2 we specify the efficiency of our scheme. For ease of exposition we
assume $G_1$, $G_2$ are symmetric, treating $G_1$ as the base group and $G_2$ as the
bilinear group $G_T$ in our scheme. In the schemes of [8,12,17], $n$ denotes the
total number of attributes in the system, $m$ is the length of the message, $\omega$ is
the signer's attribute set, the threshold value is expressed by $k$, and $d \geq k$.
$I$ is the order of the sanitizable set. In our scheme, we first convert $f$ into the
matrix $M_{l \times t}$ and then denote the length and width of the matrix by $l$ and
$t$, respectively, where $l = n$, $t = k$. $EX$ is the number of exponentiations and
$P$ is the number of pairings.
From Table 2 we find that our scheme outperforms [12,17] in Key.Size,
Key.Gen, Sig.Size, and Sansig.Size, and is inferior to [8], because [8] does not
consider the privacy of users and thus does not include the attribute sets. The
size of $Params$, $MSK$ is similar to that of [12,17]. Furthermore, the
computation cost of Sig.Gen, Sansig.Gen and Verify is higher than that of
[8,17], which is due to the flexible access structure with matrix $M$ and admissible.
7 Conclusion
In order to reduce the computation cost and keep the identity privacy of users in
the EHR system, we propose the Flexible Attribute-Based Sanitizable Signature
(FABSS) scheme. The security analysis shows that our scheme is unforgeable
and preserves the anonymity and information privacy of users. Compared with
existing schemes, our scheme not only reduces users' computation cost when
data is updated, but also supports a flexible access structure defining expressive
access control for large-scale user sets. Further efforts can be made on enhancing
the security model of our FABSS scheme. In addition, we will explore a
multi-authority FABSS scheme in which the attributes are assigned by different
attribute authorities.
References
1. Ateniese, G., Chou, D.H., de Medeiros, B., Tsudik, G.: Sanitizable signatures.
In: di Vimercati, S.C., Syverson, P., Gollmann, D. (eds.) ESORICS 2005. LNCS,
vol. 3679, pp. 159–177. Springer, Heidelberg (2005). https://doi.org/10.1007/11555827_10
2. Brzuska, C., Fischlin, M., Freudenreich, T., Lehmann, A., Page, M., Schelbert, J.,
Schröder, D., Volk, F.: Security of sanitizable signatures revisited. In: Jarecki, S.,
Tsudik, G. (eds.) PKC 2009. LNCS, vol. 5443, pp. 317–336. Springer, Heidelberg
(2009). https://doi.org/10.1007/978-3-642-00468-1_18
3. Brzuska, C., Fischlin, M., Lehmann, A., Schröder, D.: Unlinkability of sanitizable
signatures. In: Nguyen, P.Q., Pointcheval, D. (eds.) PKC 2010. LNCS, vol. 6056, pp.
444–461. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13013-7_26
4. Canard, S., Laguillaumie, F., Milhau, M.: Trapdoor sanitizable signatures and their
application to content protection. In: Bellovin, S.M., Gennaro, R., Keromytis, A.,
Yung, M. (eds.) ACNS 2008. LNCS, vol. 5037, pp. 258–276. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-68914-0_16
5. Lai, J., Ding, X., Wu, Y.: Accountable trapdoor sanitizable signatures. In: Deng,
R.H., Feng, T. (eds.) ISPEC 2013. LNCS, vol. 7863, pp. 117–131. Springer,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-38033-4_9
6. Miyazaki, K., Hanaoka, G., Imai, H.: Digitally signed document sanitizing scheme
based on bilinear maps. In: Proceedings of the 2006 ACM Symposium on Informa-
tion, Computer and Communications Security, pp. 343–354 (2006)
7. Yuen, T.H., Susilo, W., Liu, J.K., Mu, Y.: Sanitizable signatures revisited. In:
Franklin, M.K., Hui, L.C.K., Wong, D.S. (eds.) CANS 2008. LNCS, vol. 5339, pp.
80–97. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89641-8_6
8. Agrawal, S., Kumar, S., Shareef, A., Rangan, C.P.: Sanitizable signatures with
strong transparency in the standard model. In: Bao, F., Yung, M., Lin, D., Jing,
J. (eds.) Inscrypt 2009. LNCS, vol. 6151, pp. 93–107. Springer, Heidelberg (2010).
https://doi.org/10.1007/978-3-642-16342-5_7
9. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for fine-
grained access control of encrypted data. In: Proceedings of 13th ACM Conference
on Computer and Communications Security, pp. 89–98 (2006)
10. Bethencourt, J., Sahai, A., Waters, B.: Ciphertext-policy attribute-based encryp-
tion. In: Proceedings of IEEE Symposium on Security and Privacy, pp. 321–334
(2007)
11. Maji, H.K., Prabhakaran, M., Rosulek, M.: Attribute-based signatures. In: Kiayias,
A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 376–392. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-19074-2_24
12. Li, J., Au, M.H., Susilo, W., Xie, D., Ren, K.: Attribute-based signature and its
applications. In: Proceedings of 5th ACM Symposium on Information, Computer
and Communications Security, pp. 60–69 (2010)
13. Okamoto, T., Takashima, K.: Efficient attribute-based signatures for non-monotone
predicates in the standard model. In: Catalano, D., Fazio, N., Gennaro, R., Nicolosi,
A. (eds.) PKC 2011. LNCS, vol. 6571, pp. 35–52. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-19379-8_3
14. Su, J., Cao, D., Zhao, B., Wang, X., You, I.: ePASS: an expressive attribute-based
signature scheme with privacy and an unforgeability guarantee for the internet of
things. Future Gener. Comput. Syst. 33, 11–18 (2014)
15. Rao, Y.S., Dutta, R.: Efficient attribute-based signature and signcryption realizing
expressive access structures. Int. J. Inf. Secur. 15, 81–109 (2016)
16. Li, J., Chen, X., Huang, X.: New attribute-based authentication and its application
in anonymous cloud access service. Int. J. Web Grid Serv. 11, 125–141 (2015)
17. Liu, X., Ma, J., Xiong, J., Ma, J., Li, Q.: Attribute based sanitizable signature
scheme. J. Commun. 34, 148–155 (2013)
18. Xu, L., Zhang, X., Wu, X., Shi, W.: ABSS: an attribute-based sanitizable signature
for integrity of outsourced database with public cloud. In: Proceedings of 5th ACM
Conference on Data and Application Security and Privacy, pp. 167–169 (2015)
SSUKey: A CPU-Based Solution
Protecting Private Keys on Untrusted OS
Abstract. As more and more websites adopt private keys to authen-
ticate users or sign digital payments in e-commerce, various solutions
have been proposed to secure private keys: some of them employ extra
dedicated hardware devices, while most adopt the security features
provided by a general-purpose OS. However, users are reluctant to carry
extra devices, and a general-purpose OS is too complicated to protect even
itself, let alone the private keys on it. This paper proposes a software
solution, SSUKey, adopting CPU security features to protect private keys
against the vulnerabilities of the OS. Firstly, threshold cryptography (TC)
is employed to partition the private key into two shares, and two Intel SGX
enclaves, on the local client and a remote server, are used to secure the key
shares respectively. Secondly, the two enclaves are carefully designed and
configured to mitigate the vulnerabilities of Intel SGX, including side
channels and rollback.
Thirdly, an overall central private key management is designed to help
users globally monitor the usage of private keys and detect abnormal
behaviors. Finally, we implement SSUKey as a cryptography provider,
apply it to file encryption and Transport Layer Security (TLS) down-
load, and evaluate their performance. The experiment results show that
the performance decline due to SSUKey is acceptable.
1 Introduction
The public key is known to the public and is used to verify digital signatures; the private key must be kept secret.
At present the most effective way to protect private keys is to use dedicated hardware devices, which have their own processors and storage isolated from the host PC. This approach is adopted by Facebook, Google, GitHub, and Dropbox [2]. But users are reluctant to use such devices because they are inconvenient to carry and easy to lose. According to SafeNet Inc., the use of hardware-based authentication dropped from 60% in 2013 to 41% in 2014; conversely, the use of software-based authentication rose from 27% in 2013 to 40% in 2014 [3].
The security of software-based methods for protecting private keys relies on the security of privileged code, such as the OS kernel and the VMM (Virtual Machine Monitor). For example, [4,5] use a hypervisor to provide an isolated environment for protecting sensitive data. However, many serious vulnerabilities have been found in privileged code, for example CVE-2015-2291, CVE-2017-0077, CVE-2016-0180 and CVE-2017-8468 for the Windows kernel; CVE-2017-13715, CVE-2017-12146, CVE-2017-10663 and CVE-2016-10150 for the Linux kernel; CVE-2009-1542, CVE-2016-7457 and CVE-2017-10918 for VMMs; and CVE-2016-8103, CVE-2016-5729 and CVE-2006-6730 for SMM; more may remain undiscovered.1
Intel Software Guard Extensions (SGX) [6–9] enables execution of security-critical application code, called enclaves, in isolation from the untrusted system software. It also provides enclaves with processor-specific keys, such as the sealing key and the attestation key, which can only be accessed by the enclaves. SGX was considered a remarkable way to protect private keys when first proposed [7]. However, several vulnerabilities have since been found in SGX, such as cache-based side-channel attacks [10,11], page-based side-channel attacks [12], and rollback attacks [9]. Although Intel has recently added support for monotonic counters (SGX counters) [13] that an enclave developer may use for rollback protection, this mechanism is likely vulnerable to bus tapping and flash mirroring attacks [14], since the non-volatile memory used to store the counters resides outside the processor package.
We propose a software solution, SSUKey, that adopts Intel SGX to protect private keys against the vulnerabilities of the OS. In particular, SSUKey employs ECC-based threshold cryptography (ECC-TC) to mitigate the vulnerabilities of SGX and enhance its security. Each private key is partitioned into two shares using ECC-TC, and the two key shares are protected by two Intel SGX enclaves, on the local client and a remote server respectively. Since the two enclaves are carefully designed, and the remote server can be carefully configured and well protected by additional mechanisms such as an advanced firewall, it is very difficult for an attacker to successfully perform side-channel and rollback attacks on both the local client and the remote server enclaves. If the attacker compromises only one of them, threshold cryptography (TC) ensures that the attacker learns nothing about the private key. SSUKey also provides central private key management. A user may, for convenience, use the same private key on different websites. When the private key is compromised, SSUKey can revoke it immediately at the remote server, without having to inform each website individually. The central private key management also helps users globally monitor the usage of private keys and detect abnormal behavior.
1 All these vulnerabilities are published in the National Vulnerability Database (NVD, https://nvd.nist.gov/).
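For intuition, a 2-of-2 additive split of a private key can be sketched as follows. This is an illustrative sketch only: the helper names and the example group order are ours, SSUKey's actual SM2-based ECC-TC scheme is more involved, and in a real TC protocol the shares are never recombined; each enclave applies its own share inside a cooperative signing protocol.

```python
import secrets

# Order of a hypothetical elliptic-curve group (secp256k1's order, used
# purely as an example modulus; SSUKey itself uses SM2).
Q = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def split_key(d: int) -> tuple[int, int]:
    """Split private key d into two additive shares: d = d_ce + d_se (mod Q).
    Either share alone is uniformly distributed and reveals nothing about d."""
    d_ce = secrets.randbelow(Q)
    d_se = (d - d_ce) % Q
    return d_ce, d_se

def recombine(d_ce: int, d_se: int) -> int:
    # Shown only to verify correctness of the split; a real TC scheme
    # never brings the shares together in one place.
    return (d_ce + d_se) % Q
```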
Windows CNG (Cryptography API: Next Generation), proposed by Microsoft, defines a standard interface for both cryptography providers and applications. All Windows built-in applications (e.g., TLS, certificate tools, IE, Edge, IIS) use CNG to protect cryptographic keys, and Microsoft recommends that all Windows applications do so. We implement SSUKey as a provider complying with CNG, supporting SM2 (an ECC algorithm) [15], SM3 (a hash algorithm) [16], SM4 (a symmetric algorithm) [17], PRF (a pseudorandom function) [18], and the NIST hash-based DRBG (a random number generation algorithm) [19]. As a proof of concept, we evaluate the single-thread performance of SSUKey on an Intel NUC6 with an i3-6100U CPU, which is designed for low power (15 W). We first evaluate SSUKey by testing cryptographic operations. Compared to a software solution without any protection, the performance of signature verification, symmetric operations and hash operations is almost unchanged, while that of signing declines from 481 operations per second (Ops) to 110 Ops. Second, we evaluate SSUKey in real applications: file encryption and TLS download. Compared to the unprotected baseline, the performance of file encryption declines by less than 3%, while that of TLS download (with 4 KB messages) declines by less than 1%. As a result, the performance decline due to SSUKey is acceptable.
In summary, we claim the following main contributions:
Intel CPUs have supported SGX since the Skylake microarchitecture, so SSUKey can work on any later CPU.
2 Assumptions
We consider a powerful adversary who controls all software on the target platform except SSUKey, including the OS. The adversary's aim is to compromise private keys. The adversary can block, delay, replay, read and modify all messages sent by SSUKey. In particular, the adversary can revert the sealed secrets in the file system to a previous state, i.e., mount a rollback attack, and can learn some information about the private keys by performing side-channel attacks [10,11].
54 H. Li et al.
The adversary cannot break through the CPU or compromise SGX enclaves from the inside. Specifically, the adversary cannot read or modify enclave runtime memory and has no access to processor-specific keys, such as the sealing key or the attestation key. We also assume that it is very difficult to perform side-channel or rollback attacks on both the local client and the remote server successfully, since the remote server is carefully configured and well protected by additional mechanisms, such as an advanced firewall.
SSUKey ensures the integrity, confidentiality and freshness of private keys. SSUKey does not aim to provide availability, since the adversary controls the OS and denial of service (DoS) is always possible. SSUKey authenticates a user using the password entered by the user, but it does not protect the path between the input device (e.g., the keyboard) and the enclave. This function can be provided by SGXIO [20], which employs a hypervisor to enhance the security of the I/O path; SSUKey is compatible with SGXIO.
3 SSUKey Design
3.1 Architecture
Figure 1 shows our system overview. The system consists of a remote server and a number of users' local platforms. Each local platform may run multiple user applications that host local client enclaves (CEs), which have access to the user's cryptographic keys. The remote server runs a service that hosts a remote server enclave (SE), which assists the CEs in performing cryptographic operations cooperatively. The remote server is carefully configured and well protected by additional mechanisms, such as an advanced firewall, an intrusion detection system, and up-to-date malware detection or anti-virus software. The CE and SE each run one share of the ECC-TC algorithms and hold the corresponding share of the cryptographic keys.
Figure 2 gives the key components of the SSUKey architecture. The authentication and session management modules authenticate the CE/SE to its counterpart and establish a trusted channel between the CE and SE; the sign/decrypt modules execute the cooperative ECC-TC algorithm shares, and the key management modules manage the key shares on the CE/SE; the authentication module authenticates the user to the CE; the policy engine module checks, and the activity monitor module monitors, the operation requests from all CEs; persistent storage stores the sealed key shares.
[Figs. 1 and 2: system overview and architecture. A local platform runs applications hosting CEs atop the OS; the remote server runs a service hosting the SE; each side has key management and sign/decrypt modules and stores its sealed key shares.]
On success, SE obtains the CE's public key (the CE has SE's public key hard-coded in its implementation). Specifically, SE is bound to PK_SE and the CE is bound to PK_CE, since each key pair can only be accessed from SE or the CE respectively.
When a CE wants to connect to SE for the first time in the current execution, the CE and SE use the raw public keys PK_CE/PK_SE, following the procedure specified in RFC 7250 (Using Raw Public Keys in Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS)) and TLS 1.3 [21], to establish a session key, and use the established session key to protect all subsequent messages between the CE and SE.
The CE now has the public key P and a share d_CE of the private key, while SE has the other share d_SE of the private key as well as the key identifier ID and the key status STS. The CE seals all the secrets (d_CE, P, and ID) two-fold: first, the secrets are sealed using a user-specific secret, such as a password, and second, they are sealed using the CE's sealing key, which is derived from the code measurement of the CE and processor-specific secrets. The purpose of the user-specific secret is to authenticate the untrusted application that employs the CE; an application that fails to authenticate itself is denied access to the secrets. After two-fold sealing, the CE saves the sealed secrets to persistent storage. The secrets on SE (ID, STS and d_SE) are kept in memory. When SE wants to store the secrets in non-volatile memory, it seals them first and keeps their state in memory. The sealed secrets are protected from rollback attacks as long as SE does not shut down, since SE can maintain the state of the secrets itself. We explain more about rollback protection in Sect. 4.
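The two-fold sealing step can be sketched as follows. This is an illustrative model only: the HMAC-based stream cipher, the salt, and the helper names are ours, and the enclave's sealing key is modeled as an opaque byte string rather than being derived by the CPU from the enclave measurement.

```python
import hashlib
import hmac
import secrets

def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Simple HMAC-counter keystream, for illustration only (not SGX sealing).
    out, ctr = b"", 0
    while len(out) < n:
        out += hmac.new(key, nonce + ctr.to_bytes(4, "big"), hashlib.sha256).digest()
        ctr += 1
    return out[:n]

def seal(key: bytes, plaintext: bytes) -> bytes:
    # Encrypt-then-MAC: nonce || ciphertext || tag.
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _stream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("seal integrity check failed")
    return bytes(a ^ b for a, b in zip(ct, _stream(key, nonce, len(ct))))

def twofold_seal(password: str, sealing_key: bytes, secret: bytes) -> bytes:
    # First fold: key derived from the user-specific secret (password).
    user_key = hashlib.pbkdf2_hmac("sha256", password.encode(), b"ssukey-salt", 100_000)
    inner = seal(user_key, secret)
    # Second fold: the CE's sealing key (opaque stand-in for the CPU-derived key).
    return seal(sealing_key, inner)
```

An application that cannot supply the correct password cannot produce `user_key`, so the inner layer stays opaque even to code running under the CE's own sealing key.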
(1) The CE sends SE the key identifier ID, a sign or decrypt opcode, and the message.
(2) Upon receiving the ID and the opcode, SE checks the key policy, i.e., the key status STS associated with the ID. If the key is available, SE signs or decrypts the message using d_SE.
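Step (2)'s policy check on SE can be sketched as follows. The names are hypothetical and the partial operations are stubs standing in for the SM2-TC shares; the same status field is what central management flips to revoke a key.

```python
from dataclasses import dataclass

ACTIVE, REVOKED = "active", "revoked"

@dataclass
class KeyRecord:
    d_se: int            # SE's share of the private key
    status: str = ACTIVE

# In-memory key table held inside the SE enclave (illustrative).
key_table: dict[str, KeyRecord] = {}

def partial_sign(d_se: int, message: bytes):
    # Stub for SE's share of the cooperative SM2-TC signing operation.
    return ("sig-share", d_se, message)

def partial_decrypt(d_se: int, message: bytes):
    # Stub for SE's share of the cooperative decryption operation.
    return ("dec-share", d_se, message)

def handle_request(key_id: str, opcode: str, message: bytes):
    """Step (2): check the key policy before operating with d_SE."""
    rec = key_table.get(key_id)
    if rec is None or rec.status != ACTIVE:
        return None  # key unknown or revoked: refuse to cooperate
    if opcode == "sign":
        return partial_sign(rec.d_se, message)
    if opcode == "decrypt":
        return partial_decrypt(rec.d_se, message)
    return None
```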
4 Security Analysis
The theoretical security of SSUKey rests on the security of ECC-TC [15,22] and of the enclaves provided by Intel SGX. The adversary is allowed to attack from the moment the private key is set up, and has to compromise both key shares, on the CE and SE separately, to recover the private key. In this section, we show that the adversary cannot compromise a private key by performing the most promising attacks on SSUKey: tampering with system memory, eavesdropping on the channel between the CE and SE, and performing side-channel and rollback attacks on the CE and SE.
Side-Channel Protection. Intel SGX enclaves are vulnerable to side-channel attacks, for example cache-based [10,11] and page-based [12] side-channel attacks. The CE and SE of SSUKey are also threatened by such attacks. A successful side-channel attack on SSUKey, however, has to extract the key shares from both the CE and SE. This is very difficult, since the remote server that hosts SE is carefully configured and well protected by additional mechanisms. Compared with SE, the CE is more likely to be attacked. If the key share d_CE on the CE is compromised by a side-channel attack, TC ensures that d_CE cannot be used to derive the private key. SE verifies the CE's identity during the establishment of the trusted channel between the CE and SE. Thus, even if the adversary has compromised d_CE on the CE, it cannot masquerade as the CE to request that SE help sign or decrypt a message. This makes SSUKey tolerant to the compromise of the key share d_CE on the CE.
Rollback Protection. Intel SGX enclaves are also vulnerable to rollback attacks [9]. The adversary can exploit this vulnerability to break the freshness of the private key. A successful rollback attack on SSUKey has to revert the state of both key shares, on the CE and SE. On the one hand, SE can be kept online almost all the time in a number of ways (e.g., [23]), and the adversary cannot perform a rollback attack on SE as long as SE does not shut down, since SE maintains the state of the secrets itself. On the other hand, when occasionally restarted, for example for a service update, SE can protect the secrets from rollback attacks using SGX counters [13] or other solutions such as ROTE [24].
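The in-memory freshness argument can be sketched as follows: as long as SE stays up, it remembers a digest of the last state it sealed, so any reverted blob offered at load time is detected. This is an illustrative sketch with names of our own choosing, not SE's actual implementation.

```python
import hashlib

class RollbackGuard:
    """SE-side freshness check. While the enclave stays up, it keeps the
    digest of the most recently sealed state in enclave memory, so a stale
    (rolled-back) blob read from untrusted storage fails the check."""

    def __init__(self):
        self._latest = None  # digest of the last sealed state, or None

    def on_seal(self, sealed_blob: bytes) -> None:
        # Record the digest whenever SE writes a new sealed state out.
        self._latest = hashlib.sha256(sealed_blob).digest()

    def on_load(self, sealed_blob: bytes) -> bool:
        # Accept a blob from untrusted storage only if it is the newest one.
        return (self._latest is not None and
                hashlib.sha256(sealed_blob).digest() == self._latest)
```

Across a restart `_latest` is lost, which is exactly why the paper falls back to SGX counters or ROTE for that case.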
5 Evaluation
In this section, we describe our performance evaluation. We implemented our system as two enclave libraries, for the CE and SE respectively, a remote service, and test applications. The cryptographic library supports SM2 (mediated SM2), SM3, SM4, KDF, and the NIST hash-based DRBG. The enclave library for the CE is implemented as a cryptography provider complying with Windows CNG; the internal distributed architecture of SSUKey is transparent to the applications. Both the applications and the remote service run single-threaded atop Windows 10 on an Intel NUC6 with an i3-6100U CPU, which is designed for low power (15 W). The applications connect to the remote service via the local network (ping about 1.1 ms).
[Figure: throughput of SSUKey versus the unprotected baseline ("Pure") over varying data sizes: (a) symmetric encryption/decryption, (b) signatures, (c) asymmetric encryption/decryption, plus a further encryption/decryption throughput plot for 0.5 KB to 256 KB data.]
6 Related Work
TrustZone. ARM TrustZone (TZ) combines secure execution with trusted path support. It provides one secure world isolated from a normal world. The secure world runs a whole trusted stack, including a security kernel, device drivers and applications. TZ allows device interrupts to be routed directly into the secure world and thus supports generic trusted paths [25]. However, TZ does not distinguish between different secure application processes in hardware; it requires a security kernel for secure process isolation, management, attestation and the like. The SSUKey prototype could readily be migrated to TZ. Compared with SGX, TZ is better suited to offering a generic trusted I/O path.
7 Conclusion
In this paper, we have presented SSUKey, a CPU-based solution for protecting private keys. Our main idea is to adopt Intel SGX to resist the vulnerabilities of privileged code, including the OS kernel, and to employ ECC-TC to mitigate the vulnerabilities of SGX, including side-channel and rollback attacks. We consider a powerful adversary that controls the OS and may even have compromised one share of the private key. We provide a central key management function to help users globally monitor the usage of private keys and detect abnormal behavior, minimizing the risk of private key abuse. Our experiments demonstrate that SSUKey incurs only a moderate, acceptable performance decline compared with a baseline without protection from SGX and TC.
References
1. Stratistics MRC: Digital Signature - Global Market Outlook (2016–2022). http://
www.strategymrc.com/report/digital-signature-market. Accessed Sept 2017
2. Services that Integrate with the YubiKey. https://www.yubico.com/solutions/#
FIDO-U2F. Accessed Sept 2017
3. SafeNet Inc.: 2014 Authentication Survey Executive Summary. https://safenet.
gemalto.com/news/2014/authentication-survey-2014-reveals-more-enterprises-
adopting-multi-factor-authentication/. Accessed Sept 2017
1 Introduction
A lattice is the set of all integer combinations of n linearly independent vectors
in Rm , where n is the rank of the lattice, m is the dimension of the lattice, and
the n linearly independent vectors are called a lattice basis. Successive minima
are fundamental parameters of a lattice. The ith successive minimum λi (L) (i =
1, 2, . . . , n) of the lattice L is the least number r such that the sphere centered
at the origin with radius r contains i linearly independent lattice vectors.
During recent decades, research on lattices has attracted many experts from a computational point of view. Some important lattice problems are defined below, where γ ≥ 1 is a function of the rank.
CVP (Closest Vector Problem): Given a lattice and a target vector, find a
lattice point approximately closest to the target, i.e., a lattice point at a distance
from the target that is at most γ times the distance of the closest lattice point.
SIVP (Shortest Independent Vector Problem): Given a lattice of rank n, find
a maximal set of approximately shortest linearly independent lattice vectors,
i.e., n linearly independent vectors of length at most γ · λn .
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 65–74, 2018.
https://doi.org/10.1007/978-3-319-89500-0_5
66 W. Wang and K. Lv
The related complexity results for CVP and SIVP can be found in [1,4–7,13].
Another important lattice problem is the Covering Radius Problem (CRP). The exact CRP is in Π_2, at the second level of the polynomial hierarchy. CRP, given a lattice L, asks for the covering radius ρ(L), the maximum possible distance from the lattice: ρ(L) = max_{x∈span(L)} dist(x, L). We can see that there is always a lattice point within distance ρ(L) of any target. Micciancio [14] presented an approximation version of this problem, called the Covering Bounded Distance Decoding Problem (BDD^ρ), which, given a lattice L and a target point t, asks for a lattice point v such that ‖v − t‖ ≤ γ · ρ(L). Micciancio [14] also proposed the approximation version of CRP: given a lattice L, the goal is to find a value ρ̂ such that ρ(L) ≤ ρ̂ ≤ γ · ρ(L).
In 2004, Micciancio [14] showed that finding collisions of certain hash functions can be reduced to approximate CRP of lattices, where CRP is used only to connect the average- and worst-case complexity of lattice problems. Motivated by [7], Guruswami et al. [8] initiated the study of the computational complexity of CRP (GapCRP): given a lattice and some value r, decide whether the covering radius is at most r. They showed that GapCRP_2 lies in AM, GapCRP_{√(n/log n)} lies in coAM, and GapCRP_{√n} lies in NP ∩ coNP, which implies that under Karp reductions GapCRP_{√n} is not NP-hard unless NP = coNP. But they did not give any hardness results for CRP [8]. Peikert [17] showed that CRP_{√n} lies in coNP in the ℓ_p norm for 2 ≤ p ≤ ∞. The first hardness result for CRP was presented by Haviv and Regev, who proved that there exists some constant such that CRP is Π_2-hard in the ℓ_p norm for any sufficiently large value of p [10].
In 2015, Haviv [9] proposed the Remote Set Problem (RSP) on lattices, which, given a lattice, asks to find a set of points containing a point that is far from the lattice. Haviv proved that GapCRP_{√(n/log n)} is in NP, improving a result of [8]. Computing the covering radius of a lattice is a classic problem, but it has so far received little attention from an algorithmic point of view. In 2013, Micciancio and Voulgaris [15] gave a deterministic single-exponential-time algorithm for CRP, using a randomized polynomial-time reduction from [8].
We remark that, to date, virtually all known reductions for CRP are from the promise version (GapCRP) to other promise problems. Using the transference theorems [3], Micciancio and Goldwasser [16] gave a Karp reduction from GapCRP_{γn} to GapSVP_γ, and they proved that the hardness of GapCRP can be used to build provably secure cryptographic functions. In 2005, Guruswami et al. [8] proved that there exists a Karp reduction from GapCRP_{√n} to the exact version of SIVP. In 2014, Haviv proved that if there exists a deterministic polynomial-time algorithm for RSP, then there exists a deterministic polynomial-time Cook reduction from GapCRP_γ to GapCVP_{γ/γ′} for some fixed γ′ [9]. We know that CRP is related to CVP, but the connection is weak: the covering radius corresponds to the worst-case solution to CVP, because the covering radius of L is the smallest ρ such that the CVP instance (L, t, ρ) has a solution for every t ∈ span(L). Given our limited understanding of the complexity of lattice
2 Preliminaries
The scalar product of two vectors x and y is ⟨x, y⟩ = Σ_i x_i y_i, and dist(x, L) is the minimum Euclidean distance from x ∈ R^m to any vector in L.
The integer n is the rank of the lattice and m is the dimension of the lattice. The sequence of linearly independent vectors b_1, ..., b_n ∈ R^m is called a basis of the lattice. We can represent b_1, ..., b_n as a matrix B with m rows and n columns, that is, B = [b_1, ..., b_n] ∈ R^{m×n}. The lattice L generated by a basis B is denoted by L = L(B) = {Bx : x ∈ Z^n}. For a basis B = [b_1, ..., b_n], we define the fundamental parallelepiped

P(B) = { Σ_{i=1}^{n} x_i b_i : 0 ≤ x_i < 1 }.

The Gram-Schmidt orthogonalization b*_1, ..., b*_n of the basis is defined by

b*_i = b_i − Σ_{j=1}^{i−1} μ_{i,j} b*_j,   μ_{i,j} = ⟨b_i, b*_j⟩ / ⟨b*_j, b*_j⟩.
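The Gram-Schmidt orthogonalization above can be computed directly; a minimal plain-Python sketch without numerical safeguards:

```python
def gram_schmidt(basis):
    """Gram-Schmidt orthogonalization of basis vectors b_1, ..., b_n.
    Returns (b_star, mu), where b*_i = b_i - sum_{j<i} mu_ij * b*_j and
    mu_ij = <b_i, b*_j> / <b*_j, b*_j>."""
    n = len(basis)
    b_star = [list(map(float, b)) for b in basis]
    mu = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Project the original b_i onto the already-computed b*_j.
            dot = sum(x * y for x, y in zip(basis[i], b_star[j]))
            norm2 = sum(x * x for x in b_star[j])
            mu[i][j] = dot / norm2
            b_star[i] = [x - mu[i][j] * y for x, y in zip(b_star[i], b_star[j])]
    return b_star, mu
```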
Lemma 1. For any lattice L,

Pr_x ( dist(x, L) ≥ ρ(L)/2 ) ≥ 1/2,

where x is chosen uniformly at random from P(L).
Proof. The reduction uses an idea that dates back to the result of Haviv [9]. Let L(B) be a lattice with basis B = (b_1, b_2, ..., b_n). Using the LLL algorithm, we can obtain a reduced basis B̃ = (b̃_1, ..., b̃_n) such that for any pair of consecutive Gram-Schmidt vectors, ‖b*_{i+1}‖² ≥ 1/2 · ‖b*_i‖², where b*_1, ..., b*_n are the Gram-Schmidt orthogonalized vectors. Then

ρ(L) = ρ(L̃) ≤ 2^{n/2−1} ‖b*_n‖.

Hence, we have

‖b*_n‖ ≥ (1/2^{n/2−1}) ρ(L).

On the other hand, let y = 1/2 · b*_n ∈ span(L). Since the projection of every vector in L onto span(b*_n) is c · b*_n for some c ∈ Z, we have

dist(1/2 · b*_n, L) ≥ 1/2 · ‖b*_n‖ ≥ 1/2 · (1/2^{n/2−1}) ρ(L) = (1/2^{n/2}) ρ(L).

Let t = y = 1/2 · b*_n ∈ span(L); then dist(t, L) ≥ (1/2^{n/2}) ρ(L).
Consider a CRP_{α·γ} instance L. The goal of the reduction is to use the BDD^ρ_γ oracle to find a value ρ̂ such that ρ(L) ≤ ρ̂ ≤ α · γ · ρ(L). Call the BDD^ρ_γ oracle on the instance (L, t), where t = 1/2 · b*_n, to obtain a lattice point v such that ‖v − t‖ ≤ γ · ρ(L). Then

(1/2^{n/2}) ρ(L) ≤ dist(t, L) ≤ ‖v − t‖ ≤ γ · ρ(L).

Hence, ρ(L) ≤ 2^{n/2} ‖v − t‖ ≤ 2^{n/2} · γ · ρ(L). Let α = 2^{n/2} ∈ Z and ρ̂ = 2^{n/2} ‖v − t‖; we have ρ(L) ≤ ρ̂ ≤ α · γ · ρ(L).
Proof. Let (L, t) be an instance of CVP_{2γ}. Call the BDD^ρ_γ oracle on the instance (L, t) to obtain a lattice point v that satisfies ‖v − t‖ ≤ γ · ρ(L).
By Lemma 1, Pr_{x∈P(L)} (dist(x, L) ≥ ρ(L)/2) ≥ 1/2. Thus, with probability at least 1/2, we have dist(t, L) ≥ ρ(L)/2 for a target vector t (if t ∉ P(L), let x = t mod L; then dist(x, L) = dist(t, L)). So, with probability at least 1/2, the oracle returns a vector v = BDD^ρ_γ(L, t) such that ‖v − t‖ ≤ γ · ρ(L) ≤ 2γ · dist(t, L).
On the other hand, since dist(t, L) ≤ ρ(L), any solution to a CVP_γ instance (L, t) is also a solution to (L, t) as a BDD^ρ_γ instance, so there is a trivial reduction from BDD^ρ_γ to CVP_γ.
Next, we reduce BDD^ρ to SIVP. The idea is the reduction from Bounded Distance Decoding (BDD) to Unique Shortest Vector (USVP) of Lyubashevsky and Micciancio [12].

Theorem 4. For any γ < √2, there is a deterministic polynomial-time reduction from BDD^ρ_{√3} to SIVP_γ, where ρ(L) = max_{x∈span(L)} dist(x, L) > γ/2 · λ_n(L).
We consider the lattice L′ = L(B′) and invoke the SIVP_γ oracle on input L′. The oracle returns n + 1 linearly independent vectors w_1, ..., w_{n+1} with ‖w_i‖ ≤ γ · λ_{n+1}(L′) for i = 1, 2, ..., n + 1, and at least one vector must depend on d_{n+1}. Assume that w_{n+1} = Σ_{i=1}^{n+1} c_i d_i, where c_i ∈ Z for i = 1, 2, ..., n and c_{n+1} ∈ Z\{0}; we have ‖w_{n+1}‖² ≤ γ² λ²_{n+1}(L′). There are two cases:

1. λ_{n+1}(L′) ≤ √(μ² + ρ²). If |c_{n+1}| = 1, assume c_{n+1} = −1; then

‖w_{n+1}‖² = ‖Σ_{i=1}^{n} c_i b_i − t‖² + ρ² ≤ γ²(μ² + ρ²).

We can obtain

‖Σ_{i=1}^{n} c_i b_i − t‖² ≤ γ²(μ² + ρ²) − ρ² < 3ρ².

Then there exists a lattice point v = Σ_{i=1}^{n} c_i b_i such that ‖v − t‖ < √3 · ρ.

2. λ_{n+1}(L′) > √(μ² + ρ²).
The lattice L contains n linearly independent vectors v_1, ..., v_n with ‖v_i‖ ≤ λ_n(L) for i = 1, ..., n. Moreover, (v_1, 0)^T, ..., (v_n, 0)^T, (v − t, ρ)^T are n + 1 linearly independent vectors in L′, where v ∈ L is such that μ = dist(v, t). Clearly, we can see that λ_n(L) ≥ λ_{n+1}(L′). If |c_{n+1}| ≥ 2, then we have ρ ≤ γ/2 · λ_n(L), contradicting the fact that ρ > γ/2 · λ_n(L). Hence |c_{n+1}| = 1; assume c_{n+1} = −1. Then ‖w_{n+1}‖² = ‖Σ_{i=1}^{n} c_i b_i − t‖² + ρ² ≤ γ² λ²_n(L). We have

‖Σ_{i=1}^{n} c_i b_i − t‖² ≤ γ² λ²_n(L) − ρ² < (2ρ)² − ρ².
Then there exists a lattice point v = Σ_{i=1}^{n} c_i b_i such that ‖v − t‖ < √3 · ρ.
We now discuss how to guess a value ρ with ρ(L) = max_{x∈span(L)} dist(x, L). From Theorem 1, given a lattice L, using the LLL algorithm we can construct a reduced basis B̃ = (b̃_1, ..., b̃_n) such that for any pair of consecutive Gram-Schmidt vectors, ‖b*_{i+1}‖² ≥ 1/2 · ‖b*_i‖², where b*_1, ..., b*_n are the Gram-Schmidt orthogonalized vectors. We can obtain

ρ(L) ≤ 1/2 · √( Σ_i ‖b*_i‖² ) ≤ 2^{n/2−1} ‖b*_n‖.

Moreover, letting t = 1/2 · b*_n ∈ span(L), we have ρ(L) ≥ 1/2 · ‖b*_n‖. Hence ρ(L) ∈ [1/2 · ‖b*_n‖, 2^{n/2−1} ‖b*_n‖], so we can find a value d = 1/2 · ‖b*_n‖ such that d ∈ [2^{−n/2} ρ(L), ρ(L)]. In the polynomial-sized set {d(1 + 1/n)^i : 0 ≤ i ≤ log_{1+1/n} 2^{n/2}} we can find at least one ρ such that (1 − 1/n)ρ(L) ≤ ρ ≤ (1 + 1/n)ρ(L) by trying all the possible values. We can then redo the above proof, appropriately modifying some terms by factors of 1 − 1/n or 1 + 1/n in order to satisfy the inequalities that appear. The result is a reduction from a slightly weaker problem, BDD^ρ_{√3·(1−1/n)^c}, to SIVP_γ, where c > 0 is some constant. By Corollary 2, we can obtain the reduction from BDD^ρ_{√3} to SIVP_γ.
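The candidate set above can be enumerated directly; a small illustrative sketch (the helper name is ours, and floating point is used for simplicity, so it is only meant for moderate n):

```python
import math

def candidate_radii(d: float, n: int):
    """Enumerate the polynomial-size candidate set {d*(1+1/n)^i} that covers
    the interval [d, 2^(n/2)*d] containing rho(L), so that some candidate is
    within a (1 +- 1/n) factor of rho(L)."""
    # Smallest i with d*(1+1/n)^i >= 2^(n/2)*d, i.e. i >= log_{1+1/n} 2^(n/2).
    top = int(math.ceil(math.log(2 ** (n / 2)) / math.log(1 + 1 / n)))
    return [d * (1 + 1 / n) ** i for i in range(top + 1)]
```

The set has O(n² ) elements, since log_{1+1/n} 2^{n/2} = (n ln 2 / 2) / ln(1 + 1/n) ≈ n²·ln 2 / 2.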
4 Conclusions
In this paper, we showed that there exists a polynomial-time reduction from CRP_{α·γ} to BDD^ρ_γ. The reduction preserves the rank of the input lattice, but we do not know at present whether CRP and BDD^ρ are equivalent. We also gave reductions from approximating CRP to approximating CVP and SIVP. Finding a reduction from CRP to SIVP that preserves the approximation factor remains an interesting open problem.
Acknowledgements. This work was supported by the National Key R&D Program of China (Grant No. 2017YFB0802502), the Science and Technology Plan Projects of the University of Jinan (Grant No. XKY1714), the Doctoral Initial Foundation of the University of Jinan (Grant No. XBS160100335), and the Social Science Program of the University of Jinan (Grant No. 17YB01).
References
1. Aharonov, D., Regev, O.: Lattice problems in NP intersect coNP. J. ACM 52,
749–765 (2005). Preliminary version in FOCS 2004
2. Babai, L.: On Lovasz lattice reduction and the nearest lattice point problem. Com-
binatorica 6(1), 1–13 (1986)
3. Banaszczyk, W.: New bounds in some transference theorems in the geometry of
numbers. Math. Ann. 296, 625–635 (1993)
4. Blömer, J., Naewe, S.: Sampling methods for shortest vectors, closest vectors and
successive minima. Theor. Comput. Sci. 410, 1648–1665 (2009)
5. Blömer, J., Seifert, J.P.: On the complexity of computing short linearly independent
vectors and short bases in a lattice. In: 31st Annual ACM Symposium on Theory
of Computing, pp. 711–720. ACM (1999)
6. Dubey, C., Holenstein, T.: Approximating the closest vector problem using an
approximate shortest vector oracle. In: Goldberg, L.A., Jansen, K., Ravi, R., Rolim,
J.D.P. (eds.) APPROX/RANDOM 2011. LNCS, vol. 6845, pp. 184–193. Springer,
Heidelberg (2011). https://doi.org/10.1007/978-3-642-22935-0 16
7. Goldreich, O., Goldwasser, S.: On the limits of nonapproximability of lattice prob-
lems. J. Comput. Syst. Sci. 60(3), 540–563 (2000)
8. Guruswami, V., Micciancio, D., Regev, O.: The complexity of the covering radius
problem on lattices and codes. Comput. Complex. 14(2), 90–121 (2005). Prelimi-
nary version in CCC 2004
9. Haviv, I.: The remote set problem on lattices. Comput. Complex. 24, 103–131 (2015)
10. Haviv, I., Regev, O.: Hardness of the covering radius problem on lattices. Chicago
J. Theor. Comput. Sci. 04, 1–12 (2012)
11. Lenstra, A., Lenstra, H., Lovász, L.: Factoring polynomials with rational coeffi-
cients. Math. Ann. 261, 515–534 (1982)
12. Lyubashevsky, V., Micciancio, D.: On bounded distance decoding, unique shortest
vectors, and the minimum distance problem. In: Halevi, S. (ed.) CRYPTO 2009.
LNCS, vol. 5677, pp. 577–594. Springer, Heidelberg (2009). https://doi.org/10.
1007/978-3-642-03356-8 34
13. Micciancio, D.: Efficient reductions among lattice problems. In: 19th SODA Annual
ACM-SIAM Symposium on Discrete Algorithm, pp. 84–98 (2008)
14. Micciancio, D.: Almost perfect lattices, the covering radius problem, and applica-
tions to Ajtai’s connection factor. Electron. Colloq. Comput. Complex. 66, 1–39
(2003)
15. Micciancio, D., Voulgaris, P.: A deterministic single exponential time algorithm
for most lattice problems based on Voronoi cell computation. SIAM J. Comput.
42(3), 1364–1391 (2013)
16. Micciancio, D., Goldwasser, S.: Complexity of Lattice Problems: A Cryptographic
Perspective. The Kluwer International Series in Engineering and Computer Sci-
ence, vol. 671. Kluwer Academic Publishers, Boston (2002)
17. Peikert, C.: Limits on the hardness of lattice problems in p norms. Comput. Com-
plex. 17(2), 300–351 (2008)
Solving Discrete Logarithm Problem
in an Interval Using Periodic Iterates
Abstract. The Pollard’s kangaroos method can solve the discrete logarithm
problem in an interval. We present an improvement of the classic algorithm,
which reduces the cost of kangaroos’ jumps by using the sine function to
implement periodic iterates and giving some pre-computation. Our experiments
show that this improvement is worthy of attention.
1 Introduction
The discrete logarithm problem (DLP) in a group G is to find the integer n such that g^n = h holds, given g, h ∈ G. It is one of the most important mathematical primitives in modern cryptography, and there are some classic algorithms to solve it, such as Pollard's rho method, the index calculus method, and Pollard's kangaroo method [1]. It is also interesting to study the discrete logarithm problem restricted to a given interval, which arises in practical cryptographic systems. The discrete logarithm problem in an interval is defined as follows:

Definition 1 (DLP in an interval). Let p be a prime number and G be a cyclic subgroup of order q in F_p^*. Given a generator g and an element h of G, and an integer N less than the order of g, it is assumed that there exists an unknown integer n in the interval [0, N] such that h = g^n holds. The problem is to compute n.
Indeed, some instances of this case have been studied, such as the DLP
with c-bit exponents (c-DLSE) [2–4], the Boneh-Goh-Nissim homomorphic encryption
scheme [5], counting points on curves or abelian varieties over finite fields [6], the
analysis of the strong Diffie-Hellman problem [1, 7], and side-channel or small
subgroup attacks [8, 9] and references therein. Pollard's rho algorithm costs time
O(√n) to solve it. [4] improves Pollard's kangaroo algorithm to solve the DLP in an
interval of size N with expected running time (2 + o(1))√N group operations and
polynomial storage. Galbraith et al. improve it by increasing the number of kangaroos,
showing that with four kangaroos the total number of jumps is optimal and the
number of group operations is (1.714 + o(1))√N [10]. [11] uses a series of small
integer multiplications to replace each multiplication of group elements, which
reduces the cost of each jump in Pollard's rho algorithm, and decides whether to
compute a complete multiplication according to whether certain function values
belong to the set of distinguished points. The notion of distinguished points was
originally introduced in [12] for time-memory trade-offs; distinguished points are
elements of the group G satisfying a condition that can be checked easily. [11]
showed that a complete integer multiplication is carried out only when the related
results meet the pre-defined distinguished point condition, thereby reducing the
number of complete multiplications and the time cost of each jump. The preprocessing
storage size is O((log p)^{r+1} log log p), and the resulting algorithm runs at least
10 times faster than the original one.
Contribution. We use the sine function to implement periodic iterates and add some
pre-computation, which markedly reduces the cost of Pollard's kangaroo algorithm. We
reduce the ‖p‖² bit operations of a complete integer multiplication to at most d‖e‖η² bit
operations, where d = log_e p. The pre-defined distinguished point condition is [η − k, η],
where k = ⌊log η⌋ + m and m is an integer satisfying 0 ≤ m < η − ⌊log η⌋. Furthermore, we
also increase the number of kangaroos appropriately to reduce the total number of jumps,
which improves both the time cost and the total number of jumps. Compared with the classic
algorithm, the efficiency is noticeably improved.
The Pollard’s kangaroos algorithm [13] is based on random walk and each kangaroo
jumps one step at a time. The main process is: First, fix a set of small integers of
k elements S ¼ fs1 ; s2 ; . . .; sk g, which is also considered as the set of the distances of
pffiffiffiffi
jump steps such that the mean value of the elements of S is about N and k is a small
integer. We randomly select some elements from group G to form the distinguished set
D = {g1, g2, g3,…, gt}, such that the size of D is approximately jDj=jGj ¼ pcffiffiNffi, where
c is a constant and c 1. Define a random map f from G to S. mA kangaroo’s jumps
f ðgÞ
corresponds to a sequence in G; gi þ 1 ¼ gi gi ; i ¼ 0; 1; 2; . . .; starting from the given
g0. Let d0 = 0, di+1 = di + f(gi), i = 0, 1, 2, …. Then, di is the sum of distances of first
pffiffiffiffi
i jumps of kangaroo and gi ¼ g0 gdi ; i ¼ 0; 1; 2; . . .. The algorithm requires 2 N group
operations and a small amount of additional storage space.
For each jump, we need to compute a complete integer multiplication between two
group elements, which costs about jj pjj2 bit operations. When the product belongs to
the distinguished set D, the values related to this jump will be stored for collision
detection; otherwise, the related values will not be stored. Thus, storage operations are
not carried out every time, so it is not necessary to do a full product for each jump.
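The classic walk just described can be sketched in a few lines of Python. This is a minimal illustrative implementation, not the paper's optimized variant: the jump set, the map f, and the retry loop are our own choices, and the tame kangaroo's trail doubles as the distinguished set.

```python
import math

def kangaroo(p, g, h, N, max_tries=50):
    """Solve h = g^n (mod p) for n in [0, N] with Pollard's kangaroo walk."""
    k = 8
    S = [2 ** i for i in range(k)]            # jump distances, mean ~ sqrt(N)
    for seed in range(max_tries):             # retry with a fresh map f on failure
        f = lambda y: S[(y ^ seed) % k]       # pseudorandom map G -> S
        # Tame kangaroo: start from g^N, record its trail (point -> distance).
        y, d, trail = pow(g, N, p), 0, {}
        for _ in range(4 * math.isqrt(N) + 4 * k):
            trail[y] = d
            s = f(y)
            y = y * pow(g, s, p) % p
            d += s
        trail[y] = d
        # Wild kangaroo: start from h; landing on the trail reveals the exponent.
        # (limit uses d, which still holds the tame kangaroo's total distance.)
        y, d, limit = h % p, 0, N + d
        while d <= limit:
            if y in trail:
                n = N + trail[y] - d          # exponents: n + d = N + trail[y]
                if 0 <= n <= N and pow(g, n, p) == h:
                    return n
                break                         # spurious hit: retry with a new f
            s = f(y)
            y = y * pow(g, s, p) % p
            d += s
    return None
```

For instance, with p = 10007, g = 5, N = 200, and h = 5^77 mod p, the walk recovers n = 77.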
and calculate g₂ = g₀ · M_{s₀} · M_{s₁}, where M_{s₀} M_{s₁} ∈ M_l has been pre-computed, without
computing g₁ = g₀ · M_{s₀}. In the following steps, for each iteration completing a jump, the index
values s₂, s₃, etc., can be calculated in the same way, and we do not compute the complete
multiplication until the pre-defined distinguished point condition is met, that is, until the
value of the function s falls in the interval [η − k, η]. At that time, we obtain the
corresponding point g_i = g₀ · M_{s₀} ⋯ M_{s_{i−1}}, where M_{s₀} ⋯ M_{s_{i−1}} ∈ M_l has essentially been
pre-computed and can be obtained by table look-up. If no collision has occurred
after l iterations, i.e., g_{l+1} = g₀ · M_{s₀} ⋯ M_{s_l}, the computed value of g_{l+1} is stored in the
table and the algorithm is re-executed with g_{l+1} as the new starting point.
We denote the two kangaroos by T and W. T and W jump towards the right side of the
interval, starting respectively from its midpoint (i.e., g₀ = g^{N/2}) and from g₀′ = h. The jumping
processes of T and W are executed alternately. The branches T and W randomly select the
initial values s₀ = s(g₀) and s₀′ = s(g₀′) from the set S, and set the
initial jumping points and initial jump distances of the two branches
to g₀ = M_{s₀}, g₀′ = M_{s₀′} and d₀(T) = 0, d₀(W) = 0. When i > 1, we have
d_i(T) = Σ_{k=0}^{i−1} t_{s_k} and d_j(W) = Σ_{k=0}^{j−1} t_{s_k′}. Let T_i and W_j denote the i-th and j-th jumps of the two branches
respectively, with i and j starting from 0. T₀, W₀, T₁, W₁, T₂, W₂, …, T_{i−1}, W_{i−1} are executed
alternately. When the distinguished point condition is satisfied, the current jumping
point of branch T or W is computed. We then obtain the triplet
(g_i, d_i(T), T) or (g_j′, d_j(W), W) and store it in the corresponding table L_t or L_w, indexed by g_i
or g_j′ respectively. A collision occurs when g_i is reached by a kangaroo of the other
branch, after which n = N/2 + d_i(T) − d_j(W) can be computed and the algorithm
terminates.
Running Time. From Eq. (3), the time for calculating s(x, y) comprises
d multiplications modulo η, d − 1 additions modulo η, and d sine evaluations.
Taking the time of a sine evaluation as a constant C₀ and ignoring the
relatively small cost of the additions, the time for computing s is about
d·Mul(‖η‖) + (θ + 1/l)·Mul(‖p‖) + d·C₀, where θ is the probability that a point is a
distinguished point, and l is the maximum number of iterations of the pre-process. Since the
time for the sine evaluations can be neglected after the processing in (2), the time of a jump
is d·Mul(‖η‖) + (θ + 1/l)·Mul(‖p‖). Since the total number of jumps is
N/(2m) + 2m + 2/θ, the total time is
{d·Mul(‖η‖) + (θ + 1/l)·Mul(‖p‖)} · {N/(2m) + 2m + 2/θ}. From (1) and (3),
a complete multiplication requires about d‖e‖η² bit operations, obviously
smaller than the ‖p‖² bit operations required by a complete multiplication in the
original algorithm. The usual comparison results are shown in Table 1.
If we take the pre-defined distinguished point condition to be [η − k, η], the number
of group operations is about (log η/η)(2 + o(1))√N. [10] improves the classic Pollard
kangaroo algorithm by increasing the number of kangaroos, so that the total number
of jumps is reduced and the probability of collision is increased. With four
kangaroos, the total number of jumps is optimal and the number of group operations
is (1.714 + o(1))√N.
Given the 32-bit prime p = 2147483659 and g = 29, take e = 8 and N = 50. Since
√N ≈ 7, we set η = 13 and C = {1, 2, 3, …, 13}. Then k = ⌊log η⌋ = 3, so the
interval [η − k, η] = [10, 13] is the distinguished point condition, and
M = {g¹, g², …, g¹³}. Given h = 44895682, our task is to find x in the interval
[1, N] such that 29^x mod p = 44895682. Here we set l = 3 and precompute
M_l = {{29¹, 29², …, 29¹³} ∪ {1}}³. Table 2 shows some experimental instances for
primes p of different sizes, displaying the advantage of the improved algorithm.
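Because N = 50 is tiny, the instance above can be sanity-checked by exhaustive search. A minimal sketch (the helper name `dlp_in_interval` is ours, not the paper's):

```python
def dlp_in_interval(g, h, p, N):
    """Return x in [1, N] with g^x = h (mod p), or None if no such x exists."""
    acc = 1
    for x in range(1, N + 1):
        acc = acc * g % p          # acc = g^x mod p
        if acc == h:
            return x
    return None

# Parameters quoted in the text: p = 2147483659, g = 29, h = 44895682, N = 50.
print(dlp_in_interval(29, 44895682, 2147483659, 50))
```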
5 Conclusion
Acknowledgements. This work is partially supported by the National Key R&D Program of
China (2017YFB0802502) and NSF (No. 61272039).
80 J. Liu and K. Lv
References
1. McCurley, K.: The discrete logarithm problem. In: Proceedings of the Symposium in
Applied Mathematics, pp. 49–74. AMS (1990)
2. Gennaro, R.: An improved pseudo-random generator based on discrete log. In: Bellare, M.
(ed.) CRYPTO 2000. LNCS, vol. 1880, pp. 469–481. Springer, Heidelberg (2000). https://
doi.org/10.1007/3-540-44598-6_29
3. Patel, S., Sundaram, G.S.: An efficient discrete log pseudo random generator. In: Krawczyk,
H. (ed.) CRYPTO 1998. LNCS, vol. 1462, pp. 304–317. Springer, Heidelberg (1998).
https://doi.org/10.1007/BFb0055737
4. van Oorschot, P.C., Wiener, M.J.: On Diffie-Hellman key agreement with short exponents.
In: Maurer, U. (ed.) EUROCRYPT 1996. LNCS, vol. 1070, pp. 332–343. Springer,
Heidelberg (1996). https://doi.org/10.1007/3-540-68339-9_29
5. Boneh, D., Goh, E.-J., Nissim, K.: Evaluating 2-DNF formulas on ciphertexts. In: Kilian,
J. (ed.) TCC 2005. LNCS, vol. 3378, pp. 325–341. Springer, Heidelberg (2005). https://doi.
org/10.1007/978-3-540-30576-7_18
6. Gaudry, P., Schost, É.: A low-memory parallel version of Matsuo, Chao, and Tsujii’s
Algorithm. In: Buell, D. (ed.) ANTS 2004. LNCS, vol. 3076, pp. 208–222. Springer,
Heidelberg (2004). https://doi.org/10.1007/978-3-540-24847-7_15
7. Cheon, J.H.: Security analysis of the strong Diffie-Hellman problem. In: Vaudenay, S. (ed.)
EUROCRYPT 2006. LNCS, vol. 4004, pp. 1–11. Springer, Heidelberg (2006). https://doi.
org/10.1007/11761679_1
8. Gopalakrishnan, K., Thériault, N., Yao, C.Z.: Solving discrete logarithms from partial
knowledge of the key. In: Srinathan, K., Rangan, C.P., Yung, M. (eds.) INDOCRYPT 2007.
LNCS, vol. 4859, pp. 224–237. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-
540-77026-8_17
9. Lim, C.H., Lee, P.J.: A key recovery attack on discrete log-based schemes using a prime
order subgroup. In: Kaliski, B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 249–263.
Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0052240
10. Galbraith, S.D., Pollard, J.M., Ruprai, R.S.: Computing discrete logarithms in an interval.
Math. Comput. 82(282), 1181–1195 (2013)
11. Cheon, J.H., Hong, J., Kim, M.: Speeding up the Pollard rho method on prime fields. In:
Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 471–488. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-89255-7_29
12. Quisquater, J.-J., Delescaille, J.-P.: How easy is collision search? Application to DES. In:
Quisquater, J.-J., Vandewalle, J. (eds.) EUROCRYPT 1989. LNCS, vol. 434, pp. 429–434.
Springer, Heidelberg (1990). https://doi.org/10.1007/3-540-46885-4_43
13. Pollard, J.M.: Kangaroos, monopoly and discrete logarithms. J. Cryptol. 13(4), 437–447 (2000)
Distributed Pseudorandom Functions
for General Access Structures in NP
1 Introduction
Distributing the computation of a function is an important method
employed to avoid performance bottlenecks as well as single points of
failure due to security compromises or increased demand (i.e., overloaded
servers). Distributing trapdoor functions for public key cryptography
has already received a lot of attention [2,3]. However, the distributed computation
of functions that are useful in secret key cryptography, e.g., pseudorandom
functions (PRFs), has received limited attention [7–9].
As a motivating example for the use of distributed PRFs, let us consider the
scenario of a one-time password system (e.g., RSA SecurID). Users obtain one-
time passwords from this system by sending inputs. Each password should be
random and independent from the other passwords, and asking the evaluating
system on the same input twice should yield the same (random) output. In this
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 81–87, 2018.
https://doi.org/10.1007/978-3-319-89500-0_7
82 B. Liang and A. Mitrokotsa
system, one assumes the existence of an authentication server, which holds a secret
key K and responds with PRF outputs PRF_K(x) that are used as the users' one-time
passwords. Since the server knows the secret PRF key, this authentication
server is a prime target for attacks. The natural solution to this problem is to
distribute the role of the authentication server among many servers. This leads
to the notion of distributed PRFs (DPRFs).
In this paper, we investigate whether it is possible to construct distributed
PRFs for a general class of access mechanisms, going beyond the existing threshold
access structure (i.e., at least t out of N servers are required to evaluate
the PRF) and access structures that can be described by polynomial-size
monotone span programs (e.g., undirected connectivity in a graph).
More precisely, our contributions are two-fold: (i) we introduce the notion of
single round distributed PRFs for a general class of access structures (monotone
functions in NP), and (ii) we provide a provably secure general construction of
distributed PRFs for every mNP access structure from puncturable PRFs, based on
indistinguishability obfuscation.
Distributed PRFs. Distributed pseudorandom functions (DPRFs), originally
introduced by Naor et al. [7], provide the properties of regular PRFs (i.e., indis-
tinguishability from random functions) together with the capability to evaluate the
function f (an approximation of a random function) among a set of distributed servers.
More precisely, Naor et al. [7] considered the setting where the PRF secret key is
split among N key servers and at least t servers are needed to evaluate the PRF.
The distributed PRF in this setting is known as distributed PRF for threshold
access structure. Very importantly, evaluating the PRF is done without recon-
structing the key at a single location. Naor et al. [7] also presented constructions
of DPRFs based on general monotone access structures, such as monotone sym-
metric branching programs (contact schemes), and monotone span programs.
Although some distributed PRF (DPRF) schemes have been proposed, all
previous constructions have some limitations. Naor et al. [7] gave several efficient
constructions of certain weak variants of DPRFs. One of their DPRF construc-
tions requires the use of random oracles. To eliminate the use of random oracles,
Nielsen [9] provided the first regular DPRF by distributing a slightly modified
variant of the Naor-Reingold PRF [8]. Unfortunately, the resulting DPRF is
highly interactive among the servers and requires many rounds.
Boneh et al. [1] gave an efficient construction of DPRF for t-out-of-N thresh-
old access structure from LWE using a key homomorphic PRF. Boneh et al.
apply Shamir’s t-out-of-N threshold secret sharing scheme [2] on top of their
LWE-based key homomorphic PRF scheme, which results in a one-round DPRF
with no interaction among the key servers. However, the question of constructing
single round, non-interactive distributed PRFs that support more general access
structures such as monotone functions in NP remained open prior to this work.
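As a concrete toy of the key-homomorphic DPRF idea (not the construction of [1] or [7]; the prime P, the hash H, and all function names below are our illustrative choices), consider PRF_k(x) = H(x)^k mod P with the key k shared additively among N servers modulo P − 1. Each server raises H(x) to its own share, and the partial results multiply together to the full PRF value, so the key is never reconstructed at a single location:

```python
import hashlib
import secrets

P = 2147483647                      # a prime; Z_P^* has order P - 1

def H(x: bytes) -> int:
    """Hash onto Z_P^* (illustrative; a real scheme needs a proper group hash)."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % (P - 1) + 1

def share_key(k: int, n: int):
    """Additive sharing of k modulo P - 1 among n servers."""
    shares = [secrets.randbelow(P - 1) for _ in range(n - 1)]
    shares.append((k - sum(shares)) % (P - 1))
    return shares

def partial_eval(share: int, x: bytes) -> int:
    """One server's contribution: H(x) raised to its key share."""
    return pow(H(x), share, P)

def combine(partials) -> int:
    """Multiply partial evaluations; equals H(x)^k mod P by key homomorphism."""
    out = 1
    for y in partials:
        out = out * y % P
    return out
```

This is an N-out-of-N sketch; a t-out-of-N threshold variant would share k with Shamir's scheme instead of additively.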
Our Contributions. In this work, we consider single round distributed PRFs
for a more general class of access structures than the existing monotone access
structures: monotone functions in NP, also known as mNP (first considered
by Komargodski et al. [6]). An access structure that is defined by a function in
2 Preliminaries
2.1 Monotone-NP and Access Structures
A function f : 2^[n] → {0, 1} is said to be monotone if for every Γ ⊆ [n] such that
f(Γ) = 1, and every Γ′ ⊆ [n] with Γ ⊆ Γ′, it also holds that f(Γ′) = 1.
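For small n, this definition can be checked exhaustively. The sketch below (our illustration, not part of the paper) encodes each Γ ⊆ [n] as a bitmask and uses the fact that it suffices to test one-element supersets:

```python
def is_monotone(f, n: int) -> bool:
    """Check f : 2^[n] -> {0, 1} for monotonicity; subsets are bitmasks."""
    for gamma in range(2 ** n):
        if f(gamma):
            for i in range(n):
                if not f(gamma | (1 << i)):   # adding one element must keep f = 1
                    return False
    return True
```

For example, OR and threshold functions are monotone, while parity is not.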
A monotone Boolean circuit is a Boolean circuit with AND and OR gates
(without negations). A non-deterministic circuit is a Boolean circuit whose
inputs are divided into two parts: standard inputs and non-deterministic inputs.
A non-deterministic circuit accepts a standard input if and only if there is some
setting of the non-deterministic input that causes the circuit to evaluate to 1.
A monotone non-deterministic circuit is a non-deterministic circuit, where the
monotonicity requirement applies only to the standard inputs, that is, every path
from a standard input wire to the output wire does not have a negation gate.
Let us denote by Adv^pseudo_{Π,A} := Pr[b′ = b] − 1/2 the advantage of the adversary
A in guessing b in the above experiment, where the probability is taken over
the randomness of the challenger and of A. We say the distributed PRF Π
for an mNP access structure A is selectively pseudorandom if there exists a
negligible function negl(λ) such that for all non-uniform PPT adversaries A,
it holds that Adv^pseudo_{Π,A} ≤ negl(λ).
5 Conclusion
In this paper, we consider single round distributed PRFs for a more general class
of access structures: monotone functions in NP. We also give a generic construction
of distributed PRFs for every mNP access structure from puncturable PRFs, based on
indistinguishability obfuscation.
References
1. Boneh, D., Lewi, K., Montgomery, H., Raghunathan, A.: Key homomorphic PRFs
and their applications. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, Part
I. LNCS, vol. 8042, pp. 410–428. Springer, Heidelberg (2013). https://doi.org/10.
1007/978-3-642-40041-4_23
2. De Santis, A., Desmedt, Y., Frankel, Y., Yung, M.: How to share a function securely.
In: Proceedings of STOC 1994, pp. 522–533. ACM, New York (1994)
3. Desmedt, Y., Frankel, Y.: Threshold cryptosystems. In: Brassard, G. (ed.) CRYPTO
1989. LNCS, vol. 435, pp. 307–315. Springer, New York (1990). https://doi.org/10.
1007/0-387-34805-0_28
4. Garg, S., Gentry, C., Halevi, S., Raykova, M., Sahai, A., Waters, B.: Candidate
indistinguishability obfuscation and functional encryption for all circuits. In: Pro-
ceedings of FOCS 2013, Washington, D.C., USA, pp. 40–49. IEEE Computer Society
(2013)
5. Grigni, M., Sipser, M.: Monotone complexity (1990)
6. Komargodski, I., Naor, M., Yogev, E.: Secret-sharing for NP. J. Cryptol. 30(2),
444–469 (2017)
7. Naor, M., Pinkas, B., Reingold, O.: Distributed pseudo-random functions and KDCs.
In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 327–346. Springer,
Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_23
8. Naor, M., Reingold, O.: Number-theoretic constructions of efficient pseudo-random
functions. J. ACM (JACM) 51(2), 231–262 (2004)
9. Nielsen, J.B.: A threshold pseudorandom function construction and its applications.
In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 401–416. Springer, Heidel-
berg (2002). https://doi.org/10.1007/3-540-45708-9_26
Reducing Randomness Complexity
of Mask Refreshing Algorithm
1 Introduction
Side Channel Analysis (SCA) [8,9] has become a serious threat to implemen-
tations of cryptographic algorithms. Among existing countermeasures, one of
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 88–101, 2018.
https://doi.org/10.1007/978-3-319-89500-0_8
the most widely used is masking [3–5,7,10–13]. A d-th order masking consists
of splitting each secret-dependent intermediate variable x (called a sensitive
variable) into d + 1 shares (x0 , x1 , . . . , xd ), where (x1 , . . . , xd ) are picked
uniformly at random. When d-th order masking is used to secure a block cipher, a
so-called d-th order masking scheme must be designed to operate on those shares.
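For concreteness, a d-th order Boolean masking of a byte can be sketched as follows (an illustrative helper, not part of any cited scheme): the last d shares are drawn uniformly and the first is set so that all d + 1 shares XOR back to x.

```python
import secrets

def mask(x: int, d: int):
    """Split byte x into d + 1 shares with x = x0 ^ x1 ^ ... ^ xd."""
    shares = [secrets.randbelow(256) for _ in range(d)]   # x1, ..., xd random
    x0 = x
    for s in shares:
        x0 ^= s                                           # x0 = x ^ x1 ^ ... ^ xd
    return [x0] + shares

def unmask(shares):
    """Recombine shares by XOR to recover the sensitive variable."""
    x = 0
    for s in shares:
        x ^= s
    return x
```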
The first probing-secure masking scheme is the ISW scheme [7], which works
as a transformer mapping AND and NOT gates to secure gates. In this way,
an S-box circuit is mapped to a randomized S-box circuit which satisfies
d-th order security. Rivain and Prouff applied the ISW transformer to secure
a whole AES software implementation [13] by extending it to a secure
multiplication over Fn2. In this process, when the inputs of an ISW
multiplication are related, the circuit may suffer joint leakage, which we call the
"dependent-input issue" in the sequel. In order to solve the dependent-input
issue, a so-called refreshing algorithm [5,6,13] must be inserted. Among the
existing refreshing algorithms, the one proposed in [6] is the
only one that actually solves the dependent-input issue. This refreshing
algorithm satisfies d-SNI (Strong Non-Interference) [1], so we call it the d-SNI
refreshing in the sequel. It can be proved that the d-SNI refreshing algorithm
effectively eliminates the dependence of the input shares [1]. However, the d-SNI
refreshing introduces a quadratic number of extra random values, which may lead
to low efficiency. Therefore, improving the efficiency of the d-SNI refreshing,
while maintaining the d-th order security of the dependent-input multiplication,
is of great importance.
In [13], the authors propose a refreshing algorithm which is very efficient,
with only d extra random values. However, it only satisfies d-TNI (Tight Non-
Interference) security [1], and d-TNI refreshing algorithms can hardly preserve
the d-th order security of the dependent-input multiplication. In [2], a random-
ness reduction strategy for the ISW multiplication is proposed; however, with this
new strategy the obtained algorithm also only satisfies d-TNI security, and a d-TNI
refreshing algorithm cannot solve the dependent-input issue.
In this paper, we argue that the d-SNI refreshing algorithm is overqualified
when the multiplication is masked with ISW-like schemes [5,7,13]. Exploiting
a property of the ISW-like schemes, we relax the security requirement
of the refreshing algorithm from d-SNI to conditional d-SNI (weaker than d-
SNI). Under this new security requirement, we obtain a conditional d-SNI
refreshing algorithm through search for security orders d ≤ 11, which requires fewer
random generations than the original d-SNI refreshing algorithm. Finally,
we implement the two refreshing algorithms on ARM, and compare the random
generations and practical performance of both refreshing schemes.
Paper Organization. This paper is organized as follows. In Sect. 2, we give
some useful notations and review the compositional security notions and the
dependent-input issue. In Sect. 3, we prove that a refreshing algorithm satisfying
conditional d-SNI can solve the dependent-input issue, and accordingly propose a
conditional d-SNI refreshing algorithm. In Sect. 4, we compare the performances
of our proposal with that of the d-SNI refreshing. In Sect. 5, we conclude this
paper.
90 S. Qiu et al.
2 Preliminaries
2.1 Notations and Notions
[n1 , n2 ] denotes the integer set {n1 , n1 + 1, . . . , n2 }. For a set S, |S| denotes
the cardinality of S, and S̄ denotes the complementary set of S. For a set S
which can be represented as S = {s1 , s2 , · · · , sn }, (ai )i∈S represents the set
{as1 , as2 , . . . , asn }. A linear mapping is denoted by ℓ(·). The arrow ← represents
assigning the value of the right-hand variable to the left-hand variable, and ←$
represents picking a value uniformly at random from the set on the right and
assigning it to the left-hand variable. ⊕ denotes the XOR operation on F2 (or Fn2),
and · denotes the AND operation on F2 (or Fn2). ⊕ni=1 represents the XOR sum,
namely ⊕ni=1 xi = x1 ⊕ x2 ⊕ · · · ⊕ xn . In this paper, the compositional security
notions [1] are involved; we review them here.
To deal with the dependent-input issue, Duc et al. [6] propose a new
refreshing algorithm satisfying d-SNI, which we call the d-SNI refreshing in the
sequel. Although being proven to reach d-th order security when plugged in the
dependent-input multiplication, it requires more random generations. The d-TNI
refreshing algorithm [13] needs d random generations, while the d-SNI refreshing
algorithm needs d(d + 1)/2 random generations.
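The two refreshing algorithms can be contrasted with byte-level sketches (our illustration of the Rivain-Prouff-style d-TNI refresh and the ISW-style d-SNI refresh, operating on d + 1 shares): both preserve the XOR of the shares, while the number of `secrets.randbelow` calls matches the random-generation counts above, d versus d(d + 1)/2.

```python
import secrets

def refresh_tni(shares):
    """d-TNI refresh: each fresh r_i is folded into share i and share 0."""
    out = list(shares)
    for i in range(1, len(out)):           # d random generations
        r = secrets.randbelow(256)
        out[0] ^= r
        out[i] ^= r
    return out

def refresh_sni(shares):
    """d-SNI refresh: one fresh random per pair (i, j) of shares, as in ISW."""
    out = list(shares)
    n = len(out)
    for i in range(n):                     # d(d + 1)/2 random generations
        for j in range(i + 1, n):
            r = secrets.randbelow(256)
            out[i] ^= r
            out[j] ^= r
    return out
```

Each pairwise random in the d-SNI refresh cancels in the XOR of all shares, which is why the masked value is unchanged.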
According to the description of the refreshing algorithm (Algorithm 1), the set I² contains
ri (ri = ai ⊕ bi ), linear combinations of the ri , and input shares ai . Therefore, the set
I² may depend on (ai ⊕ bi )i∈B . Finally, the probes in the circuit I¹ ∪ I² can
be simulated with at most |A ∪ B| shares of the input, as S₁¹ ∪ S² ⊆ (ai )i∈A∪B .
The proof can be divided into the following two steps, where I denotes the
set of all possible I¹, I|c1 denotes the set of all possible I¹ satisfying the first
constraint, and I|c1,c2 denotes the set of all possible I¹ satisfying the first and
second constraints.
1. First, we prove that if there exists I¹ ∈ I satisfying |S₁¹ ∪ S²| > t,
then there also exists I¹ ∈ I|c1 which satisfies |S₁¹ ∪ S²| > t.
Proof. If the circuit in Fig. 2 is d-SNI when I¹ satisfies the two constraints,
then |S₁¹ ∪ S²| ≤ |I¹| + |I²| holds under the two constraints. According
to Proposition 4, for arbitrary I¹ we also have |S₁¹ ∪ S²| ≤ |I¹| + |I²|; namely,
the circuit satisfies d-SNI.
Proof. If the circuit in Fig. 2 is d-TNI when I¹ satisfies the two constraints,
then |S₁¹ ∪ S²| ≤ |I¹| + |I²| + |O| holds under the two constraints.
According to Proposition 4, for arbitrary I¹ we also have |S₁¹ ∪ S²| ≤ |I¹| +
|I²| + |O|; namely, the circuit satisfies d-TNI.
Algorithm 3. Sel.
Require: The sets S₂¹ and S².
Ensure: The value of |I²min|.
1: T ← ⊕i∈B∗ ri
2: T ← Rewritten(T)              ▷ In the increasing order of the index of ri
3: m ← P(T)                      ▷ The number of the parts, m
4: n ← NOA(T)                    ▷ The number of the non-overlapping adjacent pairs, n
5: {k1 , k2 , · · · , kt0 } ← OA(T)  ▷ t0 overlapping adjacent pairs, each pair with kt elements
6: |I²min| ← 2m − n
7: if t0 ≠ 0 then
8:   for t = 1 to t0 do
9:     if kt is even then
10:      |I²min| ← |I²min| − kt /2
11:    else
12:      |I²min| ← |I²min| − (kt + 1)/2   ▷ Complete the computation
13:    end if
14:  end for
15: end if
16: return |I²min|
execute algorithm Sel several times to judge whether |S²| ≤ |I²| holds for every
possible S² and S₂¹. Finally, we can judge whether this refreshing algorithm satisfies
the conditional d-SNI.
The description of Sel is given in Algorithm 3. In the sequel, we explain
in detail how algorithm Sel maps S₂¹ and S² to |I²min|. For B∗ ⊆ B, according
to S₂¹ we can obtain ⊕i∈B∗ bi , which can be rewritten as
(⊕i∈B∗ ai) ⊕ (⊕i∈B∗ (ai ⊕ bi)). Then, if and only if we can obtain ⊕i∈B∗ (ai ⊕ bi)
from I², the probes in I² ∪ S₂¹ can be related to ⊕i∈B∗ ai ; namely, S²
equals (ai )i∈B∗ . Therefore, for a given S₂¹ and a given S² = (ai )i∈B∗ , S² should
relate to ⊕i∈B∗ (ai ⊕ bi). Obtaining the minimal set I²min is thus equivalent to
revealing ⊕i∈B∗ (ai ⊕ bi) with the minimal number of internal variables of the
refreshing algorithm.
In the following, ⊕i∈B∗ ri is called the "target" and is denoted by T.
Algorithm Sel aims to find the smallest subset I²min which satisfies ⊕I²min = T.
¹ We do not utilize ri . The case when ri exists in I² is checked through brute-force
search.
d       d0   m(i)
d = 3   1    (1), (2)
d = 4   1    (1), (2)
d = 5   3    (3, 2, 5), (3, 2, 4), (2, 3, 4), (2, 3, 5), (3, 1, 5), (3, 1, 4), (2, 1, 4), (2, 1, 5)
d = 6   3    (3, 5, 2), (3, 6, 2), (3, 4, 2), (3, 2, 5), (3, 2, 4), (3, 2, 6), (2, 5, 3), (2, 6, 3)
d = 7   4    (4, 3, 6, 2), (1, 4, 6, 3), (1, 4, 6, 2), (3, 4, 6, 2), (1, 4, 7, 3), (4, 1, 6, 2)
d = 8   5    (5, 4, 6, 3, 2), (2, 3, 5, 8, 4), (3, 2, 7, 4, 8), (5, 4, 8, 3, 2), (2, 6, 3, 4, 7), (4, 6, 2, 3, 8)
d = 9   7    (5, 4, 8, 3, 7, 9, 2), (3, 6, 1, 7, 4, 8, 2), (3, 7, 5, 4, 9, 6, 2), (2, 8, 5, 1, 7, 4, 9)
d = 10  7    (5, 4, 8, 3, 6, 10, 2), (2, 7, 5, 10, 3, 8, 4), (5, 9, 2, 8, 3, 10, 4), (5, 4, 8, 3, 6, 9, 2), (5, 2, 7, 8, 3, 9, 4), (5, 6, 2, 9, 4, 7, 3)
d = 11  9    (6, 2, 10, 5, 8, 3, 11, 9, 4)
4 Complexity Comparisons
In this section, we implement the refreshing algorithms on ARM, and verify the
efficiency of our proposal by comparing the required random generations, clock
cycles and ROM consumptions. For each algorithm (the d-SNI refreshing and
our new proposal), we have six implementations for d = 3, 4, . . . , 8. Codes were
written in assembly language for an ARM-based 32-bit architecture. Accord-
ing to the comparison results, our proposal outperforms the d-SNI refreshing in
terms of both the randomness complexity and the arithmetic complexity, as sig-
nificantly less random generations, less clock cycles, and less ROM consumptions
are involved in our proposal than in the d-SNI refreshing.
Fig. 5. The random complexity of the d-SNI refreshing and the new refreshing.
Fig. 6. Clock cycles and ROM consumptions (KBytes) of the new refreshing and the d-SNI refreshing.
First, we compare the random generations of the two refreshing schemes. For
ISW-like masking schemes, the main overhead lies in randomness generation:
random values are produced by calling a random generation module, which leads
to significant time and storage consumption. Therefore, decreasing the random-
ness generation is of crucial importance. The randomness complexities of the
new refreshing algorithm and the d-SNI refreshing algorithm are compared in
Fig. 5. We can see that the new refreshing makes a remarkable improvement,
requiring 33%–70% fewer random generations than the d-SNI refreshing. Then, in
order to evaluate the arithmetic complexity, we implemented the two refreshing
algorithms on a 32-bit ARM core. As we have already compared the randomness
complexity, the random generations are not executed in these implementations;
the random numbers are assumed to be stored in ROM, whose consumption is
not counted. The performance of the implementations, including clock cycles and
ROM consumption, is given in Fig. 6. According to Fig. 6, our proposal is better
than the d-SNI refreshing in both timing performance and ROM consumption.
References
1. Barthe, G., Belaı̈d, S., Dupressoir, F., Fouque, P.-A., Grégoire, B., Strub, P.-Y.,
Zucchini, R.: Strong non-interference and type-directed higher-order masking. In:
Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communi-
cations Security, pp. 116–129. ACM (2016)
2. Belaı̈d, S., Benhamouda, F., Passelègue, A., Prouff, E., Thillard, A., Vergnaud,
D.: Randomness complexity of private circuits for multiplication. In: Fischlin, M.,
Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9666, pp. 616–648. Springer,
Heidelberg (2016). https://doi.org/10.1007/978-3-662-49896-5_22
3. Bilgin, B., Gierlichs, B., Nikova, S., Nikov, V., Rijmen, V.: Higher-order threshold
implementations. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol.
8874, pp. 326–343. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-
662-45608-8_18
4. Chari, S., Jutla, C.S., Rao, J.R., Rohatgi, P.: Towards sound approaches to counter-
act power-analysis attacks. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666,
pp. 398–412. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_26
5. Coron, J.-S., Prouff, E., Rivain, M., Roche, T.: Higher-order side channel security
and mask refreshing. In: Moriai, S. (ed.) FSE 2013. LNCS, vol. 8424, pp. 410–424.
Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43933-3_21
6. Duc, A., Dziembowski, S., Faust, S.: Unifying leakage models: from probing attacks
to noisy leakage. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS,
vol. 8441, pp. 423–440. Springer, Heidelberg (2014). https://doi.org/10.1007/978-
3-642-55220-5 24
7. Ishai, Y., Sahai, A., Wagner, D.: Private circuits: securing hardware against prob-
ing attacks. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 463–481.
Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4 27
8. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.)
CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999).
https://doi.org/10.1007/3-540-48405-1 25
9. Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS,
and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp.
104–113. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-68697-5 9
10. Nassar, M., Souissi, Y., Guilley, S., Danger, J.-L.: RSM: a small and fast coun-
termeasure for AES, secure against 1st and 2nd-order zero-offset SCAs. In: DATE
2012, pp. 1173–1178. IEEE (2012)
11. Nikova, S., Rechberger, C., Rijmen, V.: Threshold implementations against side-
channel attacks and glitches. In: Ning, P., Qing, S., Li, N. (eds.) ICICS 2006.
LNCS, vol. 4307, pp. 529–545. Springer, Heidelberg (2006). https://doi.org/10.
1007/11935308 38
Reducing Randomness Complexity of Mask Refreshing Algorithm 101
Applied Cryptography
A Plausibly Deniable Encryption
Scheme Utilizing PUF’s Thermo-Sensitivity
1 Introduction
Conventional encryption schemes seldom consider situations in which one or both parties to a communication are coerced into revealing their private information, e.g. private keys, nonces, and other random parameters used in encryption. However, such situations do arise in real-world scenarios. For example, a man carrying a disk with encrypted sensitive documents through customs may be required by the customs officer to reveal the content of his disk. To deceive the officer, the man would hope to convincingly deny the existence of the genuine plaintext. (This work is supported by a grant from the National Key Research and Development Program of China, Grant No. Y16A01602.)
One way to achieve this goal is to explain the encrypted document as a fake, innocuous one. To this end, Canetti, Dwork, Naor, and Ostrovsky first proposed the intriguing notion of deniable encryption in 1997 [1]. The main idea is to construct fake randomness, perhaps the key or some additional parameters required in the encryption, that reinterprets the ciphertext as a plausible fake plaintext. Though a variety of schemes have been proposed since then [14, 15, 18, 19], they remain largely theoretical: to satisfy their information-security requirements, all the theoretical schemes suffer from extremely long ciphertexts or keys [1, 2].
In engineering practice, engineers pursue deniability in another way, so-called plausibly deniable encryption, which aims to deny the existence of encrypted data by engineering means, e.g. TrueCrypt [3], the Rubberhose filesystem [4], and steganographic file systems [5–7, 17]. These schemes usually hide sensitive data in a hidden volume or in random-looking free space, but they require special designs in the filesystem and are threatened by implementation flaws [8] and forensic tools [9]. Moreover, the existence of such special designs in the file system is detectable.
Whether in theory or in engineering practice, the basic idea for achieving deniability is the same: even after being forced to hand over all the parameters used in encryption, the user retains a piece of trapdoor information, and this is the crucial difference between him and the adversary. Without this trapdoor information, the adversary, despite holding every other parameter used in encryption, cannot tell whether a decrypted plaintext is fake or genuine (in theoretical deniable encryption) or distinguish a truly random sequence from a ciphertext (in plausibly deniable encryption). The secrecy of the trapdoor information is therefore even more important than that of the encryption key in deniable encryption scenarios. This trapdoor information should be stored as covertly as possible, and, to convincingly deceive the adversary, both the user's behavior and the encryption system should look normal enough not to arouse suspicion.
With this in mind, we propose a practical deniable encryption scheme that exploits PUFs' thermo-sensitivity to implement deniable encryption in a highly covert way. Our scheme neither requires the user to remember or store any tedious trapdoor information, nor requires special designs in the file system or extra inputs during decryption. A PUF's sensitivity to temperature is generally regarded as an undesirable property that undermines its stability. We observe, however, that if a PUF's behavior varies with temperature, it can serve as a thermo-sensitive "hidden trigger" that fires only within a specific temperature range. In the proposed scheme, the PUF-based "hidden trigger" perceives temperature variation, making the temperature itself a vital and covert piece of trapdoor information that determines whether to decrypt faithfully or not.
Details of the scheme are described in Sect. 3, and we implemented it on Xilinx KC705 evaluation boards to examine its feasibility. According to the experimental results, ciphertexts generated at extreme temperatures (e.g. −40 °C or 60 °C) are decrypted into the prepared fake plaintext at room temperature (20 °C–30 °C).
A Plausibly Deniable Encryption Scheme 107
2 Preliminaries
H_∞(r) = −∑_{i=1}^{n} log₂(max{P(r_i = 1), P(r_i = 0)}),   (1)

where P(r_i = 1) and P(r_i = 0) are the probabilities that the i-th bit of the response equals 1 and 0, respectively.
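Eq. (1) can be evaluated empirically from a set of measured responses. The sketch below is our own illustration (not from the paper); it estimates the per-bit probabilities from the sample.

```python
import math

def response_min_entropy(responses):
    """Estimate H_inf(r) per Eq. (1).

    `responses` is a list of equal-length bit strings (e.g. '0110')
    sampled from the same PUF; P(r_i = 1) is estimated empirically
    as the fraction of samples in which bit i equals 1.
    """
    n = len(responses[0])
    total = 0.0
    for i in range(n):
        p1 = sum(r[i] == '1' for r in responses) / len(responses)
        pmax = max(p1, 1 - p1)
        total += -math.log2(pmax)
    return total

# An unbiased bit contributes 1 bit; a constant bit contributes 0.
print(response_min_entropy(['00', '01', '10', '11']))  # → 2.0
```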
Reliability and Uniqueness: Assume we instantiate N_puf PUF entities and invoke each with N_chal challenges, measuring each challenge N_meas times. We thus obtain N_puf · N_chal · N_meas response sequences. Equations (2) and (3) give the average intra-distance and the average inter-distance, respectively [13].
μ_intra = 2 / (N_puf · N_chal · N_meas(N_meas − 1)) · ∑_{j₁,j₂=1, j₁≠j₂}^{N_meas} ∑_{i=1}^{N_puf} ∑_{k=1}^{N_chal} HD(r_i^{j₁}(c_k), r_i^{j₂}(c_k)).   (2)

μ_inter = 2 / (N_puf(N_puf − 1) · N_chal · N_meas) · ∑_{i₁,i₂=1, i₁≠i₂}^{N_puf} ∑_{k=1}^{N_chal} ∑_{j=1}^{N_meas} HD(r_{i₁}^{j}(c_k), r_{i₂}^{j}(c_k)).   (3)
HD(·, ·) counts the Hamming distance (HD) between two PUF responses. The average intra-distance reflects the difference between repeated measurements (reliability), while the average inter-distance shows to what extent entities of the same PUF design differ from each other (uniqueness). For a PUF design, the ideal inter-distance is 50%, while the intra-distance should be as low as possible.
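Given a nested array of measurements, Eqs. (2) and (3) amount to averaging pairwise Hamming distances; the following is our own sketch of that computation (the indexing convention `meas[instance][challenge][measurement]` is ours), normalized to a fraction of the response length.

```python
from itertools import combinations

def hd(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def avg_intra_distance(meas):
    """Average intra-distance per Eq. (2): all measurement pairs of
    the same instance on the same challenge."""
    dists = []
    for per_inst in meas:
        for per_chal in per_inst:
            for r1, r2 in combinations(per_chal, 2):
                dists.append(hd(r1, r2) / len(r1))
    return sum(dists) / len(dists)

def avg_inter_distance(meas):
    """Average inter-distance per Eq. (3): same challenge and
    measurement index, distinct PUF instances."""
    dists = []
    for i1, i2 in combinations(range(len(meas)), 2):
        for k in range(len(meas[0])):
            for j in range(len(meas[0][0])):
                r1, r2 = meas[i1][k][j], meas[i2][k][j]
                dists.append(hd(r1, r2) / len(r1))
    return sum(dists) / len(dists)
```

A perfectly reliable, perfectly unique design would give an intra-distance of 0 and an inter-distance of 0.5.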
Error Correcting Code (ECC): Because a PUF's response is not perfectly reproducible, ECCs such as Hamming codes, Reed-Muller codes, BCH codes, and repetition codes are widely adopted in PUF applications to guarantee that the same response is regenerated on every invocation. The enrollment and recovery processes are shown in Fig. 1; generally the helper data can be stored in unprotected NVM, and the security of the response is guaranteed by the random number k used in the enrollment process.
This property suggests that if we choose an ECC algorithm with an appropriate error-correcting capability, we can ensure that a PUF's responses are recoverable only within a certain temperature range, thereby using the PUF to perceive temperature variations. If the enrolled response is recovered successfully, the ciphertext is decrypted faithfully; otherwise, a prepared fake text is output as the decrypted plaintext to deceive the adversary. This is the basic idea of the proposed deniable encryption scheme.
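The enroll/recover mechanism sketched in Fig. 1 is the standard code-offset construction. The toy Python sketch below (our own illustration, with a simple repetition code standing in for the paper's codes; all names are ours) shows how the correction capability bounds the tolerated response drift: a small drift is corrected and the trigger fires, a large drift is not.

```python
import secrets

REP = 5  # repetition factor: each key bit is encoded 5 times,
         # so up to 2 flipped response bits per group are corrected

def ecc_encode(bits):
    return [b for b in bits for _ in range(REP)]

def ecc_decode(bits):
    groups = [bits[i:i + REP] for i in range(0, len(bits), REP)]
    return [int(sum(g) > REP // 2) for g in groups]  # majority vote

def enroll(response):
    """Code-offset enrollment: helper = response XOR ECC(random k)."""
    k = [secrets.randbits(1) for _ in range(len(response) // REP)]
    return [r ^ c for r, c in zip(response, ecc_encode(k))]

def trigger_fires(noisy_response, helper, enrolled_response):
    """True iff the enrolled response is recovered from the noisy one."""
    k = ecc_decode([r ^ h for r, h in zip(noisy_response, helper)])
    recovered = [h ^ c for h, c in zip(helper, ecc_encode(k))]
    return recovered == list(enrolled_response)

# Small drift (1 flip per group) is corrected; a larger drift is not.
rsp = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
w = enroll(rsp)
near = rsp[:]; near[0] ^= 1          # 1 error in the first group
far = rsp[:]
for i in (0, 1, 2):                  # 3 errors in the first group
    far[i] ^= 1
assert trigger_fires(near, w, rsp)
assert not trigger_fires(far, w, rsp)
```

In the proposed scheme, temperature drift plays the role of the injected bit flips.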
110 C. Li et al.
The proposed scheme is a plan-ahead deniable encryption scheme, i.e. the fake text is prepared before decryption. The basic idea is to let the cryptographic system vary its decryption result automatically under different temperature conditions. The scheme contains four programs.
• The Enroll program records the environmental temperature in the PUF's response sequence. As mentioned in Sect. 2.2, some PUFs' behavior varies stably with temperature; the enrolled response sequence can therefore be regarded as a reflection of the temperature and serves as a "hidden trigger" that can be successfully recovered only in a neighboring temperature range.
• The Explain program prepares alternative texts beforehand to generate deniable ciphertexts. Its inputs are two texts m and m′, where m is the genuine text and m′ is the fake one that will take the place of m as the decryption result to deceive the adversary.
• The Encrypt program also generates ciphertexts, but its ciphertexts can only be decrypted faithfully. This program therefore has a single input text m; the format of its ciphertext is analogous to that of the Explain program.
• The Decrypt program selectively outputs the genuine or the fake plaintext according to the temperature. During decryption, the Decrypt program checks the temperature condition by comparing the recovered trigger with the enrolled one. When the recovered trigger equals the enrolled one, we say "the trigger is triggered". In this case, the program recovers the genuine text and outputs it as the final decryption result; otherwise, it simply outputs the decrypted fake text.
3.2 Workflow
Enroll Program Enrl: k → (rsp1, w1). The Enroll program records the current temperature in the PUF's response sequence. It first uses the encryption key k as the PUF's challenge and obtains a response sequence rsp1 = puf(k). rsp1 will serve as the "hidden trigger" and is saved in the nonvolatile memory of the PUF module. The program then calculates the helper data w1 = ECC_wk^enrol(rsp1) and saves it as well.
[Figure: Hardware datapath of the Decrypt program. The PUF is challenged with the first part of the ciphertext; ECC_st recovers rsp2″ (with helper data ω2) to unmask the third part (m ⊕ m′, or 0), while ECC_wk recovers rsp1″ (with helper data ω1), which is compared with the enrolled trigger rsp1 in a register to produce the comparison result.]
Explain Program Exp: (m, m′) → dc. The Explain program generates a deniable ciphertext dc from the input (m, m′), where m is the genuine text and m′ the fake one. First, the program encrypts the fake text normally with the symmetric-key algorithm, obtaining the ciphertext c′ = EN(k, m′). It then uses c′ as the PUF's challenge to get the corresponding response sequence rsp2′ = puf(c′). rsp2′ serves as a random mask hiding the two texts' difference m ⊕ m′, where "⊕" is the bitwise XOR operator. Finally, the helper data w2′ = ECC_st^enrol(rsp2′) is calculated, and the output deniable ciphertext is dc = c′ ∥ w2′ ∥ (rsp2′ ⊕ m ⊕ m′).
Encrypt Program Enc: m → ec. The Encrypt program generates a ciphertext ec from a single input text m. First, the program encrypts m with the encryption key k under the symmetric-key algorithm, i.e. c = EN(k, m), and uses this ciphertext as the challenge to invoke the PUF, getting the corresponding response rsp2 = puf(c). The helper data w2 is likewise calculated with ECC_st, and the ciphertext is ec = c ∥ w2 ∥ rsp2.
Decrypt Program Dec: c_in → m_out. The Decrypt program explains the input ciphertext c_in as a certain plaintext m_out. First, the program divides c_in into three equally long parts, c_in = c″ ∥ w2″ ∥ mk, and decrypts c″ with the symmetric-key algorithm to get m_temp = DE(k, c″). The program then invokes the PUF with the encryption key k and recovers the acquired response with the saved helper data w1 using the weaker ECC algorithm, i.e. rsp1″ = ECC_wk^recov(puf(k), w1). If rsp1″ does not equal the saved trigger rsp1, i.e. rsp1″ ≠ rsp1, the program outputs m_temp directly; otherwise, it uses c″ to invoke the PUF and recovers the obtained response with w2″ using the stronger ECC algorithm, i.e. rsp2″ = ECC_st^recov(puf(c″), w2″), and finally outputs m_out = m_temp ⊕ rsp2″ ⊕ mk.
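Putting the four programs together, here is a toy end-to-end sketch. Everything in it is our own simplification: the symmetric cipher EN/DE is a one-time pad, and the PUF-plus-ECC machinery is collapsed into a hash whose output depends on a coarse temperature band (so helper data is omitted, and we assume encryption and faithful decryption happen in the same band).

```python
import hashlib
import secrets

BLK = 32  # all texts, keys and responses are 32-byte blocks

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def puf(challenge, temperature):
    """Toy PUF + ECC layer: the response depends on the challenge and,
    coarsely, on the temperature band (stand-in for thermo-sensitivity)."""
    band = b'cold' if temperature < 0 else b'warm'
    return hashlib.sha256(challenge + band).digest()

def EN(k, m):   # stand-in symmetric cipher: one-time pad with k
    return xor(k, m)

DE = EN         # XOR encryption is its own inverse

def enroll(k, temp):
    """Enroll: record the current temperature in the trigger rsp1."""
    return puf(k, temp)

def explain(k, m, m_fake, temp):
    """Explain: dc = c' || (rsp2' XOR m XOR m') (helper data omitted)."""
    c = EN(k, m_fake)
    return c + xor(puf(c, temp), xor(m, m_fake))

def encrypt(k, m, temp):
    """Encrypt: ec = c || rsp2, i.e. a masked all-zero third part."""
    c = EN(k, m)
    return c + puf(c, temp)

def decrypt(k, rsp1, ct, temp):
    """Decrypt: unmask the third part only if the trigger fires."""
    c, masked = ct[:BLK], ct[BLK:]
    m_temp = DE(k, c)
    if puf(k, temp) != rsp1:       # trigger not triggered
        return m_temp
    return xor(m_temp, xor(puf(c, temp), masked))

# Enroll and explain in the cold; decrypt in the cold vs. at room temp.
k = secrets.token_bytes(BLK)
m = b'the genuine secret document'.ljust(BLK)
m_fake = b'an innocuous shopping list'.ljust(BLK)
rsp1 = enroll(k, -40)
dc = explain(k, m, m_fake, -40)
assert decrypt(k, rsp1, dc, -40) == m        # trigger fires: genuine text
assert decrypt(k, rsp1, dc, 25) == m_fake    # room temp: fake text
assert decrypt(k, rsp1, encrypt(k, m, -40), -40) == m
assert decrypt(k, rsp1, encrypt(k, m, 25), 25) == m
```

The last two assertions mirror the analysis below: ciphertexts produced by Encrypt decrypt faithfully either way, since their third part is a masked all-zero sequence.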
Correctness: When decryption takes place within the enrolled temperature range, the change of the PUF response rsp1 that serves as the "hidden trigger" will be within the correction capability of ECC_wk. As long as the recovered response equals the enrolled one, the Decrypt program extracts the hidden information m ⊕ m′ masked by rsp2′ (since rsp2″ ⊕ mk = rsp2″ ⊕ (rsp2′ ⊕ m ⊕ m′) = m ⊕ m′) to reconstruct the genuine text. If the input ciphertext was generated by the Encrypt program, the Decrypt program always decrypts faithfully whether or not the trigger is "triggered": since ec = c ∥ w2 ∥ rsp2, rsp2 can be regarded as a masked all-zero sequence, and any sequence XORed with the all-zero sequence equals itself, so the output is always DE(k, c).
Deniability: When operating at a temperature outside the "trigger range", the deniable ciphertext dc, originally generated by the Explain program, is decrypted into the prepared fake text m′. Because the change in the response sequence now exceeds the correction capability of ECC_wk, rsp1 cannot be successfully recovered, i.e. the "hidden trigger" is not "triggered", and the Decrypt program simply outputs DE(k, c″) directly.
Security: With respect to the first part of the ciphertext (the fake text m′ in the Explain program; the sole input m in the Encrypt program), the adversary has no way to derive the text protected by the cryptographic algorithm. Owing to the PUF's unpredictability and randomness, the random mask used in the third part prevents the adversary from figuring out the hidden difference m ⊕ m′. Since the second part, the helper data of the random mask, reveals nothing about either text m or m′, the security of the whole ciphertext in our scheme is guaranteed.
Practicability: The prime advantage of our scheme is that the user needs no special manipulation to deceive the adversary. In our scheme, we hide the information m ⊕ m′ that recovers the genuine text in the ciphertext itself and use the temperature as the covert trapdoor information to achieve deniability. Therefore, no extra input is required during decryption, and the enrolled temperature, under which the deniable ciphertexts were generated, is kept in the user's mind without a trace. The user just needs to make sure that the temperature of the environment in which he may be coerced is likely to be outside the "trigger range". Furthermore, in our scheme the Encrypt program and the Decrypt program are accessible to the adversary. The adversary can choose arbitrary plaintexts or ciphertexts to examine the faithfulness of the encryption system, but since ciphertexts generated by the Encrypt program can only be decrypted faithfully, from the adversary's point of view our deniable encryption system always behaves normally.
The most important task in a real design is to determine the weaker and the stronger ECC algorithms and their correction capabilities according to the PUF's actual properties. We must investigate how much temperature variations affect the PUF's reliability: if the ECC algorithm is too strong, the trigger becomes unresponsive to temperature variation and the deniable ciphertext is decrypted faithfully over a large temperature range; if the ECC algorithm is too weak, the correctness of our scheme cannot be ensured.
We deploy 1024 Bistable Ring PUFs (BR PUFs) [11] on each of two KC705 boards. To investigate the properties of this BR PUF, we exhaust all the challenges and measure every challenge 32 times under five different temperature conditions (−40 °C, −20 °C, 25 °C, 40 °C, and 60 °C). Each measurement yields 1024 response bits from each board, so in total we acquire about 1 Gbit of data. By formulas (2) and (3), the PUF's average intra-distance and inter-distance are 5.00% and 44.34%, respectively. This result suggests that this kind of BR PUF can generate a relatively stable trigger sequence as well as sufficiently different random masks under different configurations.
We further calculate the average intra-distances of responses generated under different temperatures and compare this temperature-influenced distribution with the original intra-distance distribution in Fig. 4. As the figure shows, the whole distribution shifts rightwards and the average intra-distance increases to 8.89%. Since the weaker ECC algorithm must ensure that the trigger sequence can be successfully recovered only within as narrow a temperature range as possible, an ECC correcting sequences with 10% error bits is desirable according to the original distribution, while the stronger one should handle an error bit rate of at least 18% to recover the random mask at any temperature. We therefore chose the (15, 11) Hamming code (correcting 1 error bit in each 15-bit codeword) as the weaker ECC and the (1, 5) Reed-Muller code [12] (correcting 7 error bits in each 32-bit codeword) as the stronger one.
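The suitability of these two codes can be sanity-checked with a simple binomial model. This is our own back-of-the-envelope estimate, not from the paper, and it treats bit flips as i.i.d., which real PUF noise only approximates: a code of length n correcting t errors fails on a block when more than t of its n bits flip.

```python
from math import comb

def block_failure_prob(n, t, p):
    """P[more than t of n bits flip], each flipping i.i.d. with prob p."""
    ok = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))
    return 1 - ok

# (15,11) Hamming: n=15, corrects t=1; RM(1,5): n=32, corrects t=7.
for p in (0.05, 0.10, 0.18):
    print(f"p={p:.2f}  Hamming(15,11): {block_failure_prob(15, 1, p):.3f}  "
          f"RM(1,5): {block_failure_prob(32, 7, p):.3f}")
```

The model reflects the intended asymmetry: at an 18% error rate the Hamming blocks fail with high probability (so the trigger stops firing), while the Reed-Muller blocks still succeed far more often.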
[Figure: Implementation architecture. The challenge addresses 1024 BR PUF cells (RO1–RO1024) with reset control; the Hamming-code and Reed-Muller-code ECC modules process the responses, and a cryptographic module (AES) performs the symmetric-key encryption.]
Comparing the two graphs, the trends of the triggers' recovery probability on the two boards are broadly the same. As the temperature difference grows, the recovery probability declines markedly. Regarding the patterns in the higher-temperature region (10 °C–60 °C), triggers enrolled at 10 °C and 60 °C cannot be recovered once the temperature difference reaches 50 °C. However, as the curves tend to flatten below −20 °C, triggers enrolled at 10 °C can still be recovered with relatively high probability at −40 °C. Though the patterns at low temperatures are gentler, triggers enrolled at −40 °C are unlikely to be recovered at room temperature (20 °C–30 °C). Considering the heat emitted by electronic devices during operation, generating deniable ciphertexts at extremely low temperatures could be the better choice.
5 Conclusion
In this paper, we present a novel and practical PUF-based deniable encryption scheme. Our key idea is to turn temperature into covert trapdoor information: by exploiting PUFs' thermo-sensitivity, we enable the Decrypt program to perceive temperature variations and thereby change its output under different temperatures. Because the trapdoor information is kept in the user's mind and, being a physical factor, need not be invoked deliberately, the user can decrypt the ciphertext deniably without any abnormal manipulation, which makes the output plaintext more convincing. On this basis, we presented our architectural design and analyzed its performance. In addition, we implemented the scheme with BR PUFs on two Xilinx KC705 evaluation boards to demonstrate its feasibility.
References
1. Canetti, R., Dwork, C., Naor, M., Ostrovsky, R.: Deniable encryption. In: Kaliski, B.S. (ed.)
CRYPTO 1997. LNCS, vol. 1294, pp. 90–104. Springer, Heidelberg (1997). https://doi.org/
10.1007/BFb0052229
2. Sahai, A., Waters, B.: How to use indistinguishability obfuscation: deniable encryption, and more. In: STOC 2014, pp. 475–484. ACM (2014)
3. TrueCrypt.org. Free open source on-the-fly disk encryption software. Version 7.1a, July
2012. http://www.truecrypt.org/
4. Assange, J., Dreyfus, S., Weinmann, R.: Rubberhose: Cryptographically Deniable Transparent Disk Encryption System, 15 September 2010. Accessed 21 Oct 2010
5. Anderson, R., Needham, R., Shamir, A.: The steganographic file system. In: Aucsmith, D.
(ed.) IH 1998. LNCS, vol. 1525, pp. 73–82. Springer, Heidelberg (1998). https://doi.org/10.
1007/3-540-49380-8_6
6. McDonald, A.D., Kuhn, M.G.: StegFS: a steganographic file system for Linux. In: Pfitzmann,
A. (ed.) IH 1999. LNCS, vol. 1768, pp. 463–477. Springer, Heidelberg (2000). https://doi.
org/10.1007/10719724_32
7. Pang, H., Tan, K.-L., Zhou, X.: StegFS: a steganographic file system. In: Proceedings of the 19th International Conference on Data Engineering, pp. 657–667. IEEE (2003)
8. Chiriliuc, A.: BestCrypt IV generation flaw. http://adal.chiriliuc.com/bc_iv_flaw.html
9. McMillan, R.: Encrypted hard drives may not be safe. IDG News Service, 17 July 2010
10. Holcomb, D.E., Burleson, W.P., Fu, K.: Power-up SRAM state as an identifying fingerprint and source of true random numbers. IEEE Trans. Comput. 58(9), 1198–1210 (2009)
11. Chen, Q., Csaba, G., Lugli, P., Schlichtmann, U., Rührmair, U.: The Bistable Ring PUF: a new architecture for strong physical unclonable functions. In: IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 134–141. IEEE (2011)
12. Raaphorst, S.: Reed-Muller codes. Carleton University (2003)
13. Maes, R.: Physically Unclonable Functions: Constructions, Properties and Applications. Ph.D. thesis, Katholieke Universiteit Leuven, Belgium (2012)
14. Klonowski, M., Kubiak, P., Kutyłowski, M.: Practical deniable encryption. In: Geffert, V.,
Karhumäki, J., Bertoni, A., Preneel, B., Návrat, P., Bieliková, M. (eds.) SOFSEM 2008.
LNCS, vol. 4910, pp. 599–609. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-
540-77566-9_52
15. Dürmuth, M., Freeman, D.M.: Deniable encryption with negligible detection probability: an
interactive construction. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632,
pp. 610–626. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20465-4_33
16. Tuyls, P., Škorić, B.: Strong authentication with physical unclonable functions. In: Petković,
M., Jonker, W. (eds.) Security, Privacy, and Trust in Modern Data Management, pp. 133–
148. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-69861-6_10
17. Katzenbeisser, S., Petitcolas, F.A.: Information Hiding Techniques for Steganography and
Digital Watermarking. Artech House, Inc. (2000)
18. Howlader, J., Basu, S.: Sender-side public key deniable encryption scheme. In: International
Conference on Advances in Recent Technologies in Communication and Computing, pp. 9–
13. IEEE (2009)
19. Meng, B., Wang, J.Q.: An efficient receiver deniable encryption scheme and its applications.
J. Netw. 5(6), 683–690 (2010)
20. Herkle, A., Becker, J., Ortmanns, M.: Exploiting weak PUFs from data converter
nonlinearity—E.g., a multibit CT DR modulator. IEEE Trans. Circ. Syst. I Regul. Pap. 63
(7), 994–1004 (2016)
Two Efficient Tag-Based Encryption
Schemes on Lattices
1 Introduction
In a tag-based encryption (TBE) scheme, the encryption and decryption algorithms both take a tag as an extra input. The ECIES and RSA-OAEP submissions and Shoup's proposal for an ISO standard of PKE all include the notion of a tag (in the first two it is called an encoding parameter), although no indication was given as to the role or function of a tag.
As an independent primitive, TBE differs from PKE in its ability to attach a tag to the ciphertext during encryption; the tag is generally not included in the ciphertext and is given explicitly to the decryption algorithm. Such an explicit treatment of the tag has notational advantages when we consider an adversary who tries to alter the tag without affecting the ciphertext. The security of TBE is defined similarly to that of PKE, with an additional selective/adaptive-tag dimension indicating whether the adversary submits the target tag before receiving the public key (selective-tag) or in the challenge phase together with the pair of chosen messages (adaptive-tag). Its security notions thus include indistinguishability against selective-tag/adaptive-tag chosen-plaintext/(weak) lunch-time/(weak) chosen-ciphertext attacks, abbreviated respectively as IND-sTag-CPA, IND-aTag-CPA, IND-sTag-wCCA1, IND-sTag-CCA1, IND-aTag-CCA1, IND-sTag-wCCA2, IND-sTag-CCA2, IND-aTag-wCCA2 and IND-aTag-CCA2. Here w is short for weak, meaning that the adversary is not allowed to query the target tag itself (rather than merely the pair of the target tag and the challenge ciphertext) to the decryption oracle.
As a cryptographic tool, IND-sTag-wCCA2 secure TBE schemes, which belong to a more general class of cryptographic schemes than selectively secure identity-based encryption (IBE) schemes, suffice to construct CCA-secure PKE schemes [Kil06]. Note that IND-aTag-CCA2 secure TBE schemes are equivalent to IND-CCA2 secure PKE schemes. In addition, IND-aTag-wCCA2 secure TBE schemes can be used, with the technique of [MRY04], to construct protocols that realize the secure message transmission functionality in the universal composability framework.
The above description shows that TBE is an interesting cryptographic primitive and a useful tool. So far, apart from the fact that IND-sTag-wCCA2 (resp. IND-aTag-wCCA2, IND-aTag-CCA2) secure TBE schemes can be obtained from IND-sID-CPA (resp. IND-aID-CPA, IND-aID-CCA2) secure IBE schemes by the generic transformation of [Kil06], the traditional number-theoretic setting also offers an IND-aTag-wCCA2 secure TBE scheme [MRY04] and two IND-sTag-wCCA2 secure TBE schemes [Kil06]. Unfortunately, in the lattice-based setting there is only one IND-sTag-wCCA2 secure TBE scheme [SLLF15], which in fact relies on a variant of the DLWE assumption. Our goal here is to construct more efficient lattice-based TBE schemes with stronger security under standard assumptions.
Table 1. Comparison between our schemes and the one from [SLLF15]
We compare our schemes with the one in [SLLF15] in Table 1, which shows that our schemes are more efficient (smaller moduli), rely on weaker lattice assumptions (smaller approximation factors), and achieve stronger security than the TBE scheme from [SLLF15].
The main idea behind our constructions is to combine the probabilistic partition technique of [ABB10] for the adaptively secure IBE scheme, which originated in [Wat05], with the G-trapdoor and several efficient algorithms from [MP12]. In particular, an ingenious matrix in the construction of [ABB10], which comes from [Boy10], is A0 ∥ (B + ∑ᵢ id[i]·Aᵢ), where each entry of id is in {−1, 1} and whose trapdoor can be derived from that of A0; in the proof it is transformed into

A0 ∥ (A0·∑ᵢ id[i]·Rᵢ* + (1 + ∑ᵢ id[i]·hᵢ)·B),   (1)

which is used for the challenge ciphertext and is the device that lets us base our schemes on the DLWE assumption.
Additionally, we embed the preimage sampling problem into TBE1¹ (which is why an extra vector appears in the public key) and the LWE inversion problem into TBE2. For the decryption of TBE1, a preimage sampling algorithm together with a trapdoor extension algorithm therefore suffices. Decryption is more involved for TBE2: LWE samples are not generated during key generation as in Regev encryption [Reg05], the secret key is a G-trapdoor of the first part of the LWE samples rather than the secret vector used to generate them, and the message part would be lost if we ran the inversion algorithm on the second part of the LWE samples (i.e. the ciphertext, each entry of which is in Z_q).
To overcome this obstacle, we observe that the inversion algorithm in [MP12] for Λ(A_tᵀ) is essentially that for Λ(Gᵀ), obtained by first transforming the former into the latter using the trapdoor R_{A_t}. To solve the above problem for TBE2, we map the message to an element of Λ(Gᵀ)/2Λ(Gᵀ) and use a perturbed vector of 2Λ(A_tᵀ) to hide the encoded message. In decryption, we obtain the transformed error by executing the first two steps of the inversion algorithm on the perturbed vector of Λ(Gᵀ) and subtract it from the perturbed vector of Λ(Gᵀ)/2Λ(Gᵀ), from which the message is obtained by the inverse mapping. Note that the mapping is efficient to evaluate and to invert according to [MP12].
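The inversion for Λ(Gᵀ) reduces to inverting the gadget vector g = (1, 2, …, 2^{k−1}). The sketch below is our own toy, serial rendering of this idea for the special case q = 2^k (the real [MP12] algorithm is more general); bits of s are peeled off from the most-significant gadget entry down.

```python
def gadget_invert(b, k):
    """Recover (s, e) from b[j] = s*2^j + e[j] mod q, where q = 2^k
    and every |e[j]| < q/4.

    b[k-1] mod q exposes the low bit of s (it equals s_0*2^(k-1)+e);
    each recovered bit is subtracted out before reading the next one.
    """
    q = 1 << k
    s = 0
    for i in range(k):
        j = k - 1 - i
        t = (b[j] - s * (1 << j)) % q       # = s_i * 2^(k-1) + e[j] mod q
        bit = 1 if q // 4 < t < 3 * q // 4 else 0
        s |= bit << i
    # Recompute the (centered) error once s is known.
    e = [((b[j] - s * (1 << j) + q // 2) % q) - q // 2 for j in range(k)]
    return s, e

# k=4, q=16, s=11, e=(1,-1,2,0): b = (12, 5, 14, 8)
print(gadget_invert([12, 5, 14, 8], 4))  # → (11, [1, -1, 2, 0])
```

In TBE2 this inversion is run on the perturbed vector derived from the ciphertext, after the trapdoor has mapped Λ(A_tᵀ) samples to Λ(Gᵀ) samples.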
1.2 Applications
¹ Although it seems possible to construct a TBE scheme simply by duplicating, based on the dual-Regev encryption [GPV08], as many image vectors and preimage vectors as the bit length of a tag in key generation, and just summing the image vectors indexed by the tag during encryption, such a scheme is only IND-aTag-CPA secure.
122 X. Wang et al.
2 Preliminaries
2.1 Basic Notation
In this paper, we use bold lower-case letters (e.g. a, b) to denote column vectors and bold upper-case letters (e.g. A, B) to denote matrices. For a matrix A, A⁻¹ and Aᵀ denote its inverse and transpose, respectively, A[i, j] denotes the entry in the i-th row and the j-th column, and ‖A‖ := max_u ‖Au‖ over all unit vectors u. The norm of a vector x is defined as ‖x‖ := √(∑ᵢ x[i]²), where
Two Efficient Tag-Based Encryption Schemes on Lattices 123
x[i] denotes the i-th entry of x. For a positive integer n, Iₙ denotes the n-dimensional identity matrix. For an integer q ≥ 2, we write ℓ_q := ⌈log₂ q⌉. For a set S, s ←$ S denotes picking an element s from S uniformly at random. For k ∈ ℕ, [k] denotes the set {1, …, k}. PPT is short for probabilistic polynomial-time.
2.2 Lattices
The LWE problem was introduced by Regev [Reg05]. Decisional LWE (DLWE) is
defined as follows.
There are known quantum [Reg05] and classical [Pei09] reductions between DLWE_{n,q,Ψ̄α} and approximating short-vector problems on lattices. In particular, for αq ≥ 2√n, solving the DLWE_{n,q,Ψ̄α} problem is at least as hard as solving worst-case lattice problems with approximation factors of Õ(n/α).
In 2011, Alwen and Peikert [AP11] elucidated and generalized Ajtai's algorithm to output a basis of essentially optimal length.

Lemma 3 (Trapdoor Generation II [AP11]). For any integer q ≥ 2 and m ≥ 2nℓ_q, there exists a PPT algorithm that, on inputs n, q and m, outputs a nearly uniform matrix A ∈ Z_q^{n×m} and a basis S of Λ⊥(A) with ‖S‖ ≤ 5√(nℓ_q).
from a discrete Gaussian with parameter s over Λ⊥_{(HG−B)[i]}(A) using R, where (HG − B)[i] is the i-th column of (HG − B). Note that ‖R‖ ≤ s ·
Adv_A^{aTag-wCCA2}(λ) := |Pr[C outputs 1] − 1/2|,   (2)

the LWE value is used to hide the message, the tag is bound to a part of A, and the preimage vector e is sampled first for decryption.
Let n be the security parameter, q = poly(n) a prime, and α ∈ (0, 1) such that αq ≥ 2√n; let m = O(nℓ_q) and let D be the distribution used in Lemma 4. The tag space is T = {0, 1}^ℓ and the message space is Z_p for some 2 ≤ p < q.
– TBE1.Gen(1ⁿ): Sample Ā0 ←$ Z_q^{n×m} and run (A0, T_{A0}) ← TrapGen(Ā0, Iₙ), where A0 := Ā0 ∥ (G − Ā0·T_{A0}). Choose A1, …, A_ℓ ←$ Z_q^{n×nℓ_q} and u ←$ Z_q^n, and output pk := (A0, A1, …, A_ℓ, u) and sk := T_{A0}.
– TBE1.Enc(pk, t ∈ T, μ ∈ Z_p): Sample s ←$ Z_q^n, Rᵢ ← D^{(m+nℓ_q)×nℓ_q} for i ∈ [ℓ], x ← Ψ̄α and y ← Ψ̄α^{m+nℓ_q}; let A_t := A0 ∥ (G + ∑ᵢ₌₁^ℓ (−1)^{t[i]} Aᵢ), R_t := ∑ᵢ₌₁^ℓ (−1)^{t[i]} Rᵢ and z := −R_tᵀ y; compute and output c := (u ∥ A_t)ᵀ s + (x, yᵀ, zᵀ)ᵀ + (μ·⌊q/p⌋, 0_{1×(m+2nℓ_q)})ᵀ mod q.
– TBE1.Dec(sk, t, c): First derive a trapdoor T_{A_t} ← TrapExt(T_{A0}, A_t, Iₙ, ‖T_{A0}‖). Then sample e_t ← SamplePre(T_{A_t}, A_t, u, ‖T_{A_t}‖), such that A_t·e_t = u mod q, and compute δ := (1, −e_tᵀ)·c/q. Finally, find and output the μ ∈ Z_p such that δ − μ/p is closest to 0 modulo 1.
Lemma 8 (Correctness). Let q = ω(p²·(nℓ_q)^{2.5}·(log n)²) be a prime and α < (p²·(nℓ_q)²·ω((log n)²))⁻¹. Then TBE1.Dec works with overwhelming probability.

Proof. In the decryption algorithm, TrapExt and SamplePre are called first; their correctness is guaranteed by Lemmas 7 and 5, respectively. Subsequently,

(1, −e_tᵀ)·c = μ·(q/p) − μ·(q/p − ⌊q/p⌋) + x − e_tᵀ(yᵀ, zᵀ)ᵀ mod q,

in which the error term is −μ·(q/p − ⌊q/p⌋) + x − (e_{t,1} − R_t·e_{t,2})ᵀ y, if we parse e_t as (e_{t,1}ᵀ, e_{t,2}ᵀ)ᵀ.

According to Lemmas 1, 4 and 5, we have

‖e_t‖ ≤ ‖T_{A_t}‖·ω(√log n)·√(m + 2nℓ_q) ≤ O((nℓ_q)^{1.5})·ω(log n),

and since ‖R_t‖ ≤ ℓ·O(√(m + nℓ_q) + √(nℓ_q))·ω(√log n) = ℓ·O(√(nℓ_q))·ω(√log n) by Lemma 4,

‖e_{t,1} − R_t·e_{t,2}‖ ≤ ‖e_{t,1}‖ + ‖R_t·e_{t,2}‖ ≤ ℓ·O((nℓ_q)²)·ω((log n)^{1.5}),

and hence, by Lemma 12 in [ABB10],

|(e_{t,1} − R_t·e_{t,2})ᵀ y| ≤ ‖e_{t,1} − R_t·e_{t,2}‖·αq·ω(√(log(m + nℓ_q))) + ‖e_{t,1} − R_t·e_{t,2}‖·√(m + nℓ_q)/2.

Therefore |δ − μ/p| = |(1, −e_tᵀ)·c/q − μ/p| = |μ/p + (1/q)·(−μ·(q/p − ⌊q/p⌋) + x − e_tᵀ(yᵀ, zᵀ)ᵀ) − μ/p| < 1/(2p), so TBE1.Dec outputs μ as desired.
C first computes H_h(t_i) and checks whether H_h(t_i) = 0: if yes, it aborts the game and outputs a random bit; otherwise, it computes A_{t_i} as in (3), uses its trapdoor Σ_{j=1}^ℓ (−1)^{t_i[j]} R_j^* to generate e_{t_i}, and finally exploits e_{t_i} to decrypt c_i and sends the result to A.
At the challenge phase, upon receiving t* from A, C first computes H_h(t*) and checks whether it equals 0: if not, it aborts the game and outputs a random bit; otherwise, it generates a challenge ciphertext as in Game 2.
At the guess phase, C just performs the artificial abort as in Game 2.
Game 4. This game is identical to Game 3 except that the challenge ciphertext is chosen as a random element in Z_q^{m+2nq+1}.
mod 2q.
– TBE2.Dec(sk, t, c): Let T_{A_t} ← TrapExt(T_{A₀}, A_t, I_n, T_{A₀}), where A₀ T_{A_t} = G − (G + Σ_{i=1}^ℓ (−1)^{t[i]} A_i) = −Σ_{i=1}^ℓ (−1)^{t[i]} A_i. Then compute z ∈ Z_q^n and e as in Lemma 6 on inputs (T_{A_t}, A_t, c mod q), as follows:
1. compute b := (T_{A_t}^T | I_{nq}) · (c mod q), viewed as a perturbed vector w.r.t. Λ(G^T);
2. run the inversion algorithm for Λ(G^T) to get (z, e) for b.
If ‖e‖ ≥ (ℓ + nq) · αq · O(√nq) · ω̃(log n) + (ℓ + nq) · O(nq) · ω(√log n), output ⊥. Otherwise, compute u := (T_{A_t}^T | I_{nq}) · (c mod 2q) − e, and output f⁻¹(u).
Lemma 9 (Correctness). Let q = ω((ℓ + nq) · nq · log n) and α < ((ℓ + nq) · nq · ω(log n))^{−1}. Then TBE2.Dec works with overwhelming probability.
Proof. The proof is identical to that of Theorem 1, except that the difficulty of distinguishing the latter two games is based on a particular form of the discretized DLWE assumption: it is infeasible to distinguish the following two distributions for any uniform s ∈ Z_q^n,
U(Z_q^n × T) := {(a, b) : a ←$ Z_q^n, b ←$ T}  and  A_{s,α} := {(a, b := ⟨a, s⟩/q + e mod 1) : a ←$ Z_q^n, e ← Ψ̄_α},
which can be transformed into distributions over Z_q^n × Z_{2q} by the mapping b ↦ 2qb + D_{Z−2qb, s'} for an appropriate Gaussian parameter s', by Theorem 6.3 in [MP12].
References
[ABB10] Agrawal, S., Boneh, D., Boyen, X.: Efficient lattice (H)IBE in the standard
model. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 553–572. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_28
[Ajt99] Ajtai, M.: Generating hard instances of the short basis problem. In: Wiedermann, J., van Emde Boas, P., Nielsen, M. (eds.) ICALP 1999. LNCS, vol. 1644, pp. 1–9. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48523-6_1
[AP11] Alwen, J., Peikert, C.: Generating shorter bases for hard random lattices.
Theory Comput. Syst. 48(3), 535–553 (2011)
[Boy10] Boyen, X.: Lattice mixing and vanishing trapdoors: a framework for fully
secure short signatures and more. In: Nguyen, P.Q., Pointcheval, D. (eds.)
PKC 2010. LNCS, vol. 6056, pp. 499–517. Springer, Heidelberg (2010).
https://doi.org/10.1007/978-3-642-13013-7_29
[GPV08] Gentry, C., Peikert, C., Vaikuntanathan, V.: Trapdoors for hard lattices and
new cryptographic constructions. In: STOC 2008, pp. 197–206. ACM (2008)
[Kil06] Kiltz, E.: Chosen-ciphertext security from tag-based encryption. In: Halevi,
S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 581–600. Springer,
Heidelberg (2006). https://doi.org/10.1007/11681878_30
[MG02] Micciancio, D., Goldwasser, S.: Complexity of Lattice Problems, vol. 671, p. x+220. Springer, New York (2002). https://doi.org/10.1007/978-1-4615-0897-7
[MP12] Micciancio, D., Peikert, C.: Trapdoors for lattices: simpler, tighter, faster,
smaller. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS,
vol. 7237, pp. 700–718. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_41
[MR07] Micciancio, D., Regev, O.: Worst-case to average-case reductions based on
Gaussian measures. SIAM J. Comput. 37(1), 267–302 (2007)
[MRY04] MacKenzie, P., Reiter, M.K., Yang, K.: Alternatives to non-malleability: defi-
nitions, constructions, and applications. In: Naor, M. (ed.) TCC 2004. LNCS,
vol. 2951, pp. 171–190. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24638-1_10
[Pei09] Peikert, C.: Public-key cryptosystems from the worst-case shortest vector
problem. In: STOC 2009, pp. 333–342. ACM (2009)
[Reg05] Regev, O.: On lattices, learning with errors, random linear codes, and cryp-
tography. In: STOC 2005, pp. 84–93 (2005)
[Sho01] Shoup, V.: A proposal for the ISO standard for public-key encryption (version 2.1). IACR ePrint Archive (2001)
[SLLF15] Sun, X., Li, B., Lu, X., Fang, F.: CCA secure public key encryption scheme
based on LWE without gaussian sampling. In: Lin, D., Wang, X.F., Yung, M.
(eds.) Inscrypt 2015. LNCS, vol. 9589, pp. 361–378. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-38898-4_21
Two Efficient Tag-Based Encryption Schemes on Lattices 131
1 Introduction
and range queries on encrypted data [8] and polynomial evaluation, CNF/DNF
formulas [10].
At first, IPE constructions [4,10–16] were based on bilinear groups, and constructing an IPE scheme from other assumptions was left as an open problem until 2011, when Agrawal et al. [2] proposed the first IPE scheme (denoted AFV11) from the LWE assumption. One drawback of that scheme is its large public parameter size (i.e., O(un² log³ n)) and ciphertext size (i.e., O(un log³ n)) for q = poly(n), where u is the dimension of the attribute vector and n is the security parameter. For efficiency, Xagawa [17] improved the AFV11 IPE scheme and obtained a more compact IPE scheme (denoted Xag13), with a public parameter of size O(un² log² n) and a ciphertext of size O(un log² n). Whether we can further compress the public parameter and ciphertext sizes to get a more compact IPE scheme is an interesting problem.
For a predicate vector v = (v₁, ..., v_ℓ) and the corresponding V_i = v_i I_n as before, let V = [v₁I_n; v₂I_n; ...; v_ℓI_n] ∈ Z_q^{nℓ×n} (the ℓ blocks v_iI_n stacked vertically); we define the mapping T_v : (Z_q^m)² → (Z_q^m)² by
Λ_{v,x} = Λ_q(A | A₁ G⁻¹_{n,ℓ,m}(V G_{n,2,m}) + W G_{n,2,m}).
The secret key r is defined as a short basis of Λ⊥_q(A | A₁ G⁻¹_{n,ℓ,m}(V G_{n,2,m})), so if ⟨x, v⟩ = 0, then W = 0, and thus the secret key r can decrypt the corresponding ciphertext.
Due to the fact that n⌈log_ℓ q⌉ · log ℓ = O(m) = O(n log q), the base-ℓ gadget saves a factor of O(log ℓ). Since ℓ is the bit-decomposition base of the modulus q = poly(n), taking ℓ = O(n) gives log ℓ = O(log n). It follows that our IPE scheme improves the public parameter and ciphertext sizes by a factor of log ℓ = O(log n).
2 Preliminaries
2.1 Predicate Encryption
Predicate Encryption ([10]). For a set of attributes Σ and a class of predicates F, a predicate encryption scheme consists of four PPT algorithms (Setup, KeyGen, Enc, Dec) such that:
Compact (Targeted Homomorphic) Inner Product Encryption from LWE 135
• Setup uses the security parameter λ and outputs the master public key mpk
and master secret key msk.
• KeyGen uses the master secret key msk and a predicate f ∈ F and outputs a
secret key skf for f .
• Enc uses the master public key mpk and an attribute I ∈ Σ, and outputs a ciphertext C for a message μ ∈ M.
• Dec takes as input a ciphertext C and a secret key skf. If f(I) = 0, it outputs μ; if f(I) = 1, it outputs a distinguished symbol ⊥ with all but negligible probability.
2.2 Lattices
For positive integers n, m, q and a matrix A ∈ Z_q^{n×m}, the m-dimensional integer lattices are defined as Λ_q(A) = {y : y = A^T s mod q for some s ∈ Z^n} and Λ⊥_q(A) = {y : Ay = 0 mod q}.
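Both q-ary lattices can be checked directly on toy parameters (a pure-Python sketch; the brute-force search over s is only feasible for tiny n and q):

```python
from itertools import product

def mat_vec(A, y, q):
    return [sum(a * b for a, b in zip(row, y)) % q for row in A]

def in_lambda_perp(A, y, q):
    """y is in Λ_q^⊥(A) iff A·y ≡ 0 (mod q)."""
    return all(c == 0 for c in mat_vec(A, y, q))

def in_lambda(A, y, q):
    """y is in Λ_q(A) iff y ≡ A^T·s (mod q) for some s (brute force over Z_q^n)."""
    n = len(A)
    At = list(zip(*A))  # columns of A = rows of A^T
    return any(
        all(sum(a * si for a, si in zip(row, s)) % q == yi % q
            for row, yi in zip(At, y))
        for s in product(range(q), repeat=n)
    )

A = [[1, 2, 3],
     [0, 1, 4]]   # n = 2, m = 3
q = 5
assert in_lambda_perp(A, [0, 1, 1], q)      # A·(0,1,1) = (5,5) ≡ (0,0) mod 5
assert not in_lambda_perp(A, [1, 0, 0], q)
assert in_lambda(A, [1, 3, 2], q)           # equals A^T·(1,1) mod 5
assert not in_lambda(A, [0, 0, 1], q)
```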
3.2 Parameters
3.3 Security
• THIPE.KeyGen(mpk, msk, v): On input the master public key mpk, the master secret key msk, and a predicate vector v = (v₁, ..., v_k) ∈ (Z_q^ℓ)^k, where v_i = (v_{i,1}, ..., v_{i,ℓ}) ∈ Z_q^ℓ, do:
1. For i = 1, ..., k, compute the matrices V_i := [v_{i,1}I_n; v_{i,2}I_n; ...; v_{i,ℓ}I_n] ∈ Z_q^{nℓ×n} (the ℓ blocks stacked vertically), and let V_i := G⁻¹_{n,ℓ,m}(V_i · G_{n,2,m}).
2. Define the matrix
B := Σ_{i=1}^k A_i V_i ∈ Z_q^{n×m}.
• THIPE.Dec(mpk, C_g, sk_v): On input the master public key, a secret key sk_v = r for predicate vector v, and the ciphertext C_g, do:
1. For b = (0, ..., 0, ⌈q/2⌉)^T, compute z ← r^T C_g G⁻¹_{2m+1,2,M}(b) mod q.
2. Output 0 if |z| < q/4; otherwise, output 1.
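Step 2 is a threshold decision: z is decoded to 0 if it is closer to 0 than to ⌈q/2⌉ modulo q. A minimal sketch with a toy modulus (not the scheme's actual parameters):

```python
def decode_bit(z: int, q: int) -> int:
    """Output 0 if z mod q is closer to 0 than to q/2, else 1."""
    z %= q
    return 0 if min(z, q - z) < q / 4 else 1

q = 65537
assert decode_bit(3, q) == 0            # small noise around 0 decodes to 0
assert decode_bit(q // 2 + 5, q) == 1   # noise around q/2 decodes to 1
assert decode_bit(q - 4, q) == 0        # wraps around correctly near q
```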
5 Conclusion
In this work, we built a compact IPE scheme and a targeted homomorphic compact IPE scheme. We make use of two gadget matrices, G_{n,ℓ,m} and G_{n,2,m}, and decrease the public parameter size to O(un² log n) and the ciphertext size to O(un log n). Our IPE scheme improves the public parameters by a factor of O(log n) compared with [17].
References
1. Apon, D., Fan, X., Liu, F.: Compact identity based encryption from LWE. http://eprint.iacr.org/2016/125
2. Agrawal, S., Freeman, D.M., Vaikuntanathan, V.: Functional encryption for inner
product predicates from learning with errors. In: Lee, D.H., Wang, X. (eds.) ASI-
ACRYPT 2011. LNCS, vol. 7073, pp. 21–40. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25385-0_2
3. Ajtai, M.: Generating hard instances of the short basis problem. In: Wiedermann,
J., van Emde Boas, P., Nielsen, M. (eds.) ICALP 1999. LNCS, vol. 1644, pp. 1–9.
Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48523-6_1
4. Attrapadung, N., Libert, B.: Functional encryption for inner product: achiev-
ing constant-size ciphertexts with adaptive security or support for negation. In:
Nguyen, P.Q., Pointcheval, D. (eds.) PKC 2010. LNCS, vol. 6056, pp. 384–402.
Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13013-7_23
5. Alwen, J., Peikert, C.: Generating shorter bases for hard random lattices. Theory
Comput. Syst. 48, 535–553 (2011)
6. Brakerski, Z., Cash, D., Tsabary, R., Wee, H.: Targeted homomorphic attribute-
based encryption. In: Hirt, M., Smith, A. (eds.) TCC 2016. LNCS, vol. 9986, pp.
330–360. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53644-5_13
7. Boneh, D., Gentry, C., Gorbunov, S., Halevi, S., Nikolaenko, V., Segev, G., Vaikun-
tanathan, V., Vinayagamurthy, D.: Fully key-homomorphic encryption, arithmetic
circuit ABE and compact garbled circuits. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 533–556. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-55220-5_30
8. Boneh, D., Waters, B.: Conjunctive, subset, and range queries on encrypted data.
In: Vadhan, S.P. (ed.) TCC 2007. LNCS, vol. 4392, pp. 535–554. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-70936-7_29
9. Cash, D., Hofheinz, D., Kiltz, E., Peikert, C.: Bonsai trees, or how to delegate a
lattice basis. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 523–552. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_27
10. Katz, J., Sahai, A., Waters, B.: Predicate encryption supporting disjunctions, poly-
nomial equations, and inner products. In: Smart, N. (ed.) EUROCRYPT 2008.
LNCS, vol. 4965, pp. 146–162. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78967-3_9
11. Lewko, A., Okamoto, T., Sahai, A., Takashima, K., Waters, B.: Fully secure
functional encryption: attribute-based encryption and (hierarchical) inner prod-
uct encryption. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp.
62–91. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_4
12. Okamoto, T., Takashima, K.: Hierarchical predicate encryption for inner-products.
In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 214–231. Springer,
Heidelberg (2009). https://doi.org/10.1007/978-3-642-10366-7_13
13. Okamoto, T., Takashima, K.: Fully secure functional encryption with general rela-
tions from the decisional linear assumption. In: Rabin, T. (ed.) CRYPTO 2010.
LNCS, vol. 6223, pp. 191–208. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14623-7_11
14. Okamoto, T., Takashima, K.: Achieving short ciphertexts or short secret-keys for
adaptively secure general inner-product encryption. In: Lin, D., Tsudik, G., Wang,
X. (eds.) CANS 2011. LNCS, vol. 7092, pp. 138–159. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25513-7_11
15. Okamoto, T., Takashima, K.: Adaptively attribute-hiding (hierarchical) inner
product encryption. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012.
LNCS, vol. 7237, pp. 591–608. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_35
16. Park, J.-H.: Inner-product encryption under standard assumptions. Des. Codes
Crypt. 58, 235–257 (2011)
17. Xagawa, K.: Improved (hierarchical) inner-product encryption from lattices. In: Kurosawa, K., Hanaoka, G. (eds.) PKC 2013. LNCS, vol. 7778, pp. 235–252. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36362-7_15
Compact Inner Product Encryption
from LWE
1 Introduction
Z. Wang and M. Wang—This work was supported by the National Science Founda-
tion of China (No. 61772516).
X. Fan—This material is based upon work supported by IBM under Agreement
4915013672. Any opinions, findings, and conclusions or recommendations expressed
in this material are those of the author(s) and do not necessarily reflect the views
of the sponsors.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 141–153, 2018.
https://doi.org/10.1007/978-3-319-89500-0_12
encryption are fine-grained access and computing on encrypted data. The fine-
grained access part is formalized as a cryptographic notion, named predicate
encryption [11,19]. In a predicate encryption system, each ciphertext ct is associ-
ated with an attribute a while each secret key sk is associated with a predicate
f . A user holding the key sk can decrypt ciphertext ct if and only if f (a) = 0.
Moreover, the attribute a is kept hidden.
With several significant improvements on quantum computing, the com-
munity is working intensively on developing applications whose security holds
even against quantum attacks. Lattice-based cryptography, the most promising
candidate against quantum attacks, has matured significantly since the early
works of Ajtai [3] and Regev [22]. Most cryptographic primitives, ranging from
basic public-key encryption (PKE) [22] to more advanced schemes e.g., identity-
based encryption (IBE) [1,12], attribute-based encryption (ABE) [9,17], fully-
homomorphic encryption (FHE) [13], etc., can be built from now canonical lattice
hardness assumptions, such as Regev's Learning with Errors (LWE). From the above facts, we can conclude that our understanding of how to instantiate different cryptographic primitives from lattices is quite good. However, when it comes to improving the efficiency of existing lattice-based constructions, e.g., reducing the size of public parameters and ciphertexts, or simplifying the decryption algorithm, our understanding is limited. Beyond the theoretical interest in shrinking the ciphertext size, the main motivation for studying functional encryption comes from its potential deployment in complex networks and cloud computing, where the size of the transmitted data is a bottleneck for current lattice-based constructions. Combining all these considerations brings us to the following open question:
Theorem 1.1 (Main). Under the standard Learning with Errors assumption, there is an IPE scheme satisfying the weak attribute-hiding property for predicate/attribute vectors of length t = log n, where (1) the modulus q is a prime of size polynomial in the security parameter n, (2) ciphertexts consist of a vector in Z_q^{2m+1}, where m is the lattice column dimension, and (3) the public parameters consist of two matrices in Z_q^{n×m} and a vector in Z_q^n.
Remark 1.2. Our technique only allows us to prove a weak form of anonymity
(“attribute hiding”). Specifically, given a ciphertext ct and a number of keys that
do not decrypt ct, the user cannot determine the attribute associated with ct.
In the strong form of attribute hiding, the user cannot determine the attribute
associated with ct even when given keys that do decrypt ct. The weakened form
of attribute hiding we do achieve is nonetheless more than is required for ABE
and should be sufficient for many applications of PE. See Sect. 2 for more detail.
Corollary 1.3. Under the standard Learning with Errors assumption, there is an IPE scheme with the weak attribute-hiding property supporting predicate/attribute vectors of length t = poly(n), where (1) the modulus q is a prime of size polynomial in the security parameter n, (2) ciphertexts consist of a vector in Z_q^{(t+1)m+1}, where m is the lattice column dimension, and (3) the public parameters consist of (t+1) matrices in Z_q^{n×m} and a vector in Z_q^n.
In addition to reducing the size of public parameters and ciphertexts, our decryption algorithm is computed in a Single-Instruction-Multiple-Data (SIMD) manner. In prior works [2,24], the decryption computes the inner product between
the predicate vector and ciphertext by (1) decomposing the predicate vector, (2)
multiplying-then-adding the corresponding vector bit and ciphertext, entry-by-
entry. Our efficient decryption algorithm achieves the inner product by just one
vector-matrix multiplication.
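The contrast can be sketched arithmetically: the prior entry-by-entry approach decomposes each v_i into bits and multiply-then-adds against power-of-two multiples, whereas a SIMD-style decryption performs a single inner product. A toy Python illustration with plain integers standing in for the ciphertext components (noise terms omitted):

```python
q, k = 257, 9   # toy modulus and k = number of bits per entry

def bit_decompose(x, k):
    return [(x >> j) & 1 for j in range(k)]

def inner_prior(v, w):
    # entry-by-entry: decompose each v_i into bits, then multiply-then-add
    # each bit against the power-of-two multiple of w_i
    acc = 0
    for vi, wi in zip(v, w):
        for j, b in enumerate(bit_decompose(vi, k)):
            acc = (acc + b * ((1 << j) * wi)) % q
    return acc

def inner_simd(v, w):
    # one pass: a single inner product <v, w> mod q
    return sum(vi * wi for vi, wi in zip(v, w)) % q

v, w = [3, 200, 17], [5, 9, 101]
assert inner_prior(v, w) == inner_simd(v, w)
```

Both compute ⟨v, w⟩ mod q; the second does it in one vector operation instead of k passes per entry.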
Our high-level approach to compact inner product encryption from LWE begins
by revisiting the first lattice-based IPE construction [2] and the novel fully homo-
morphic encryption proposed recently by Gentry et al. [15].
[A | A_v] = [A | Σ_{i=1}^t Σ_{j=0}^k v_{ij} A_{ij}]
by “mixing” a long list of public matrices (A, {A_ij}) ∈ Z_q^{n×m}. The secret key sk_v is a short trapdoor of the lattice Λ⊥_q([A | Σ_{i=1}^t Σ_{j=0}^k v_{ij} A_{ij}]). To encode an attribute
vector w ∈ Z_q^t, for i ∈ [t], j ∈ [k], construct the w-specific vector as
for a randomly chosen vector s ∈ Z_q^n and a public matrix B ∈ Z_q^{n×m}. To reduce the noise growth in the inner product computation, decryption only needs to multiply-then-add the r-ary representation of v_ij to its corresponding c_ij, as
Σ_{i=1}^t Σ_{j=0}^k v_{ij} c_{ij} = s^T (Σ_{i=1}^t Σ_{j=0}^k v_{ij} A_{ij} + ⟨v, w⟩B) + noise
When ⟨v, w⟩ = 0, the ⟨v, w⟩B part vanishes; thus the lattice computed after the inner product matches the A_v part in key generation, and the secret key sk_v can be used to decrypt the ciphertext. Consequently, the number of matrices in the public parameters, and of vectors in the ciphertext, is quasilinear in the dimension of the vectors.
c_i = s^T (A_i + w_i G) + noise
Since G⁻¹(v_i G) has small norm, decryption succeeds when ⟨v, w⟩ = 0.
c = s^T (A₁ + E_w) + noise
As such, our final IPE system contains only two matrices (A, A1 ) (and a vector
u), and the ciphertext consists of two vectors. By carefully twisting the vector
encoding and proof techniques shown in [2], we show that our IPE construction satisfies weak attribute-hiding. Our IPE system can also be extended in a “parallel repetition” manner to support vectors of length t = λ, as Corollary 1.3 states.
In this section, we provide a comparison with the first IPE construction [2] and
its follow-up improvement [24]. In [24], Xagawa used the “Full-Rank Difference
encoding”, proposed in [1], to map vectors in Z_q^t to matrices in Z_q^{n×n}. The
size of public parameters (or ciphertext) in his scheme depends linearly on the
length of predicate/attribute vectors, and the “Full-Rank Difference encoding”
incurs more computation overhead than embedding GSW-FHE structure in IPE
construction as described above. The detailed comparison is provided in Table 1
for length parameter t = log λ.
Table 1. Comparison of schemes: the number of Z_q^{n×m} matrices in |pp|, the number of Z_q^m vectors in |ct|, and the LWE parameter 1/α.
2 Preliminaries
Notation. Let λ be the security parameter, and let ppt denote probabilistic
polynomial time. We use bold uppercase letters to denote matrices M and bold lowercase letters to denote vectors v. We write M̃ to denote the Gram-Schmidt orthogonalization of M. We write [n] to denote the set {1, ..., n}, and |t| to denote the number of bits in the string t. We denote the i-th bit of s by s[i]. We say a function negl(·) : N → (0, 1) is negligible if, for every constant c ∈ N, negl(n) < n^{−c} for sufficiently large n.
We recall the syntax and security definition of inner product encryption (IPE)
[2,19]. IPE can be regarded as a generalization of predicate encryption. An IPE
scheme Π = (Setup, KeyGen, Enc, Dec) can be described as follows:
Setup(1λ ): On input the security parameter λ, the setup algorithm outputs public
parameters pp and master secret key msk.
KeyGen(msk, v): On input the master secret key msk and a predicate vector v,
the key generation algorithm outputs a secret key skv for vector v.
Enc(pp, w, μ): On input the public parameter pp and an attribute/message pair
(w, μ), it outputs a ciphertext ctw .
Dec(sk_v, ct_w): On input the secret key sk_v and a ciphertext ct_w, it outputs the corresponding plaintext μ if ⟨v, w⟩ = 0; otherwise, it outputs ⊥.
Security. We describe the weak attribute-hiding property of IPE via the following experiment. Formally, for any ppt adversary A, we consider the experiment Expt^IPE_A(1^λ):
We note that query phases I and II can each occur polynomially many times in the security parameter. The advantage of an adversary A in attacking an IPE scheme Π is defined as
Adv_A(1^λ) = |Pr[b = b*] − 1/2|,
where the probability is over the randomness of the challenger and adversary.
Learning with Errors. The LWE problem was introduced by Regev [22], who showed that the LWE assumption is as hard as solving GapSVP and SIVP (via a quantum reduction) under various parameter regimes.
Then it outputs a vector r ∈ Z^{2m} distributed statistically close to D_{Λ_q^u(F),s}, where F := (A | AR + B).
Gadget Matrix. We now recall the gadget matrix [4,20] and the extended gadget matrix technique that appeared in [6], both of which are important to our construction.
Definition 2.6. Let m = n · ⌈log q⌉, and define the gadget matrix
G_{n,2,m} = g ⊗ I_n ∈ Z_q^{n×m},
where g = (1, 2, 4, ..., 2^{⌈log q⌉−1}) ∈ Z_q^{⌈log q⌉} and ⊗ denotes the tensor product. We will also refer to this gadget matrix as the “powers-of-two” matrix. We define the inverse function G⁻¹_{n,2,m} : Z_q^{n×m} → {0,1}^{m×m}, which expands each entry a ∈ Z_q of the input matrix into a column of size ⌈log q⌉ consisting of the bits of its binary representation. We have the property that for any matrix A ∈ Z_q^{n×m}, it holds that G_{n,2,m} · G⁻¹_{n,2,m}(A) = A.
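Definition 2.6 can be exercised directly on toy parameters; the following pure-Python sketch (not the paper's code) builds G_{n,2,m}, the bit-decomposition G⁻¹_{n,2,m}, and checks the identity G · G⁻¹(A) = A:

```python
from math import ceil, log2

def gadget(n, q):
    """G_{n,2,m} = g ⊗ I_n with g = (1, 2, 4, ...); an n x (n*k) matrix, k = ceil(log2 q)."""
    k = ceil(log2(q))
    G = [[0] * (n * k) for _ in range(n)]
    for i in range(n):
        for j in range(k):
            G[i][i * k + j] = (1 << j) % q
    return G

def g_inverse(A, n, q):
    """Expand each entry of the n x m' matrix A into its k bits, column-wise."""
    k = ceil(log2(q))
    m = len(A[0])
    out = [[0] * m for _ in range(n * k)]
    for i in range(n):
        for c in range(m):
            for j in range(k):
                out[i * k + j][c] = (A[i][c] >> j) & 1
    return out

def matmul(A, B, q):
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)]
            for row in A]

n, q = 2, 8   # k = 3, so G is 2 x 6
A = [[5, 0, 7], [6, 4, 1]]
assert matmul(gadget(n, q), g_inverse(A, n, q), q) == A   # G · G^{-1}(A) = A
```

Multiplying by G⁻¹(A) re-weights the bits by the powers of two on G's diagonal blocks, which is exactly why the identity holds entry-by-entry.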
As mentioned by [20] and explicitly described in [6], the results for Gn,2,m and
its trapdoor can be extended to other integer powers or mixed-integer products.
In this direction, we give a generalized notation for gadget matrices as follows:
3 Our Construction
In this section, we describe our compact IPE construction. Before diving into
the details, we first revisit a novel encoding method implicitly employed in
Compact Inner Product Encryption from LWE 149
adaptively secure IBE setting in [6]. Consider the vector space Z_q^d. For a vector v = (v₁, ..., v_d) ∈ Z_q^d, we define the following encoding algorithm, which maps a d-dimensional vector to an n × m matrix:
encode(v) = E_v = [v₁I_n | ··· | v_d I_n] · G_{dn,ℓ,m}    (1)
Similarly, we also define the encoding for an integer a ∈ Zq as: encode(a) = Ea =
aGn,2,m . The above encoding supports the vector space operations naturally, and
our compact IPE construction relies on this property.
c₀ = s^T A + e₀^T,   c₁ = s^T (B + E_w) + e₀^T R,   c₂ = s^T u + e₁ + ⌈q/2⌉μ,
where the errors e₀ ← D_{Z^m,s}, e₁ ← D_{Z,s}.
– Dec(sk_v, ct_w): On input the secret key sk_v = r_v and a ciphertext ct_w = (c₀, c₁, c₂): if ⟨v, w⟩ ≠ 0 mod q, output ⊥. Otherwise, first compute
c₁ ← c₁ · G⁻¹_{dn,ℓ,m}([v₁I_n; ...; v_d I_n] · G_{n,2,m}),
then output Round(c₂ − ⟨(c₀, c₁), r_v⟩).
Lemma 3.1. The IPE scheme Π described above is correct (c.f. Definition 2.1).
Proof. When the predicate vector v and attribute vector w satisfy ⟨v, w⟩ = 0 mod q, it holds that c₁ = s^T B_v + e₀. Therefore, during decryption, we have
μ' = Round(⌈q/2⌉μ + e₁ − ⟨(e₀, e₀), r_v⟩) = μ ∈ {0, 1}.
The last equation follows if the term (e₁ − ⟨(e₀, e₀), r_v⟩) is indeed small, which holds w.h.p. by setting the parameters appropriately below.
Theorem 3.2. Assuming the hardness of the (n, q, χ)-LWE assumption, the IPE scheme described above is weakly attribute-hiding (cf. Definition 2.2).
References
1. Agrawal, S., Boneh, D., Boyen, X.: Efficient lattice (H) IBE in the standard model.
In: Gilbert [16], pp. 553–572 (2010)
2. Agrawal, S., Freeman, D.M., Vaikuntanathan, V.: Functional encryption for inner
product predicates from learning with errors. In: Lee, D.H., Wang, X. (eds.) ASI-
ACRYPT 2011. LNCS, vol. 7073, pp. 21–40. Springer, Heidelberg (2011). https://
doi.org/10.1007/978-3-642-25385-0 2
3. Ajtai, M.: Generating hard instances of lattice problems (extended abstract). In:
28th ACM STOC, pp. 99–108. ACM Press, May 1996
4. Alperin-Sheriff, J., Peikert, C.: Faster bootstrapping with polynomial error. In:
Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8616, pp. 297–314. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44371-2_17
5. Alwen, J., Peikert, C.: Generating shorter bases for hard random lattices. Theory Comput. Syst. 48(3), 535–553 (2011)
6. Apon, D., Fan, X., Liu, F.-H.: Vector encoding over lattices and its applications.
Cryptology ePrint Archive, Report 2017/455 (2017). http://eprint.iacr.org/2017/
455
7. Bethencourt, J., Sahai, A., Waters, B.: Ciphertext-policy attribute-based encryp-
tion. In: 2007 IEEE Symposium on Security and Privacy, pp. 321–334. IEEE Com-
puter Society Press, May 2007
8. Boneh, D., Franklin, M.: Identity-based encryption from the Weil pairing. In: Kilian,
J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 213–229. Springer, Heidelberg (2001).
https://doi.org/10.1007/3-540-44647-8_13
9. Boneh, D., Gentry, C., Gorbunov, S., Halevi, S., Nikolaenko, V., Segev, G., Vaikun-
tanathan, V., Vinayagamurthy, D.: Fully key-homomorphic encryption, arithmetic
circuit ABE and compact garbled circuits. In: Nguyen, P.Q., Oswald, E. (eds.)
EUROCRYPT 2014. LNCS, vol. 8441, pp. 533–556. Springer, Heidelberg (2014).
https://doi.org/10.1007/978-3-642-55220-5_30
10. Boneh, D., Sahai, A., Waters, B.: Functional encryption: definitions and challenges.
In: Ishai, Y. (ed.) TCC 2011. LNCS, vol. 6597, pp. 253–273. Springer, Heidelberg
(2011). https://doi.org/10.1007/978-3-642-19571-6_16
11. Boneh, D., Waters, B.: Conjunctive, subset, and range queries on encrypted data.
In: Vadhan, S.P. (ed.) TCC 2007. LNCS, vol. 4392, pp. 535–554. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-70936-7_29
12. Cash, D., Hofheinz, D., Kiltz, E., Peikert, C.: Bonsai trees, or how to delegate a
lattice basis. In: Gilbert [16], pp. 523–552 (2010)
13. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Mitzenmacher,
M. (ed.) 41st ACM STOC, pp. 169–178. ACM Press, May/June 2009
14. Gentry, C., Peikert, C., Vaikuntanathan, V.: Trapdoors for hard lattices and new
cryptographic constructions. In: Ladner, R.E., Dwork, C. (eds.) 40th ACM STOC,
pp. 197–206. ACM Press, May 2008
15. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with
errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti,
R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 75–92. Springer, Hei-
delberg (2013). https://doi.org/10.1007/978-3-642-40041-4 5
16. Gilbert, H. (ed.): EUROCRYPT 2010. LNCS, vol. 6110. Springer, Heidelberg
(2010). https://doi.org/10.1007/978-3-642-13190-5
17. Gorbunov, S., Vaikuntanathan, V., Wee, H.: Attribute-based encryption for cir-
cuits. In: Boneh, D., Roughgarden, T., Feigenbaum, J. (eds.) 45th ACM STOC,
pp. 545–554. ACM Press, June 2013
18. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for fine-
grained access control of encrypted data. In: Juels, A., Wright, R.N., De Capitani
di Vimercati, S. (eds.) ACM CCS 2006, pp. 89–98. ACM Press, October/November
2006. Available as Cryptology ePrint Archive Report 2006/309
19. Katz, J., Sahai, A., Waters, B.: Predicate encryption supporting disjunctions, poly-
nomial equations, and inner products. In: Smart, N. (ed.) EUROCRYPT 2008.
LNCS, vol. 4965, pp. 146–162. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78967-3_9
20. Micciancio, D., Peikert, C.: Trapdoors for lattices: simpler, tighter, faster, smaller.
In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp.
700–718. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_41
21. O’Neill, A.: Definitional issues in functional encryption. Cryptology ePrint Archive,
Report 2010/556 (2010). http://eprint.iacr.org/2010/556
22. Regev, O.: On lattices, learning with errors, random linear codes, and cryptogra-
phy. In: Gabow, H.N., Fagin, R. (eds.) 37th ACM STOC, pp. 84–93. ACM Press,
May 2005
23. Shamir, A.: Identity-based cryptosystems and signature schemes. In: Blakley, G.R.,
Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 47–53. Springer, Heidelberg
(1985). https://doi.org/10.1007/3-540-39568-7_5
24. Xagawa, K.: Improved (hierarchical) inner-product encryption from lattices. In:
Kurosawa, K., Hanaoka, G. (eds.) PKC 2013. LNCS, vol. 7778, pp. 235–252. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36362-7_15
Towards Tightly Secure Deterministic
Public Key Encryption
Daode Zhang1,2,3 , Bao Li1,2,3 , Yamin Liu1 , Haiyang Xue1(B) , Xianhui Lu1 ,
and Dingding Jia1
1 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
{zhangdaode,lb}@is.ac.cn, {liuyamin,xuehaiyang,luxianhui,jiadingding}@iie.ac.cn
2 State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
3 Science and Technology on Communication Security Laboratory, Chengdu, China
1 Introduction
Tight Security Reduction. Standard security notions for public key encryp-
tion (PKE) schemes, e.g., IND-CCA security [6], only consider one user and one
ciphertext. However, in realistic settings, the adversary may know up to nu public keys of users and obtain up to nc∗ challenge ciphertexts per user. These two parameters can be very large, e.g., nu = nc∗ = 2^40. In general, the security loss L will depend on nu and nc∗ [1]. In order to compensate for the security loss, we have to increase the strength of the underlying intractability assumption, which worsens the parameters of the encryption scheme and affects the performance of the implementation. For example, for encryption schemes based on the Decisional Diffie-Hellman assumption over cyclic groups, we have to increase the size of the underlying groups, which in turn increases the running time of the implementation, as exponentiation in an l-bit group takes time about O(l³), as stated in [7]. Hence, it is important to study tight security reductions, where the security loss L is a small constant that, in particular, does not depend on parameters under the adversary's control, such as nu and nc∗. In the case of CCA security, L should also be independent of the parameter nc, the maximum number of queries that the adversary can make to each decryption oracle.
Fig. 1. The security loss amongst the D-PKE schemes under the concrete assumptions.
However, their PRIV1-IND-CCA D-PKE scheme in [4] (Sect. 7.2) is not tightly PRIV-IND-CCA secure for block-sources. In their PRIV1-IND-CCA D-PKE scheme, the ciphertext of a message m contains an item of the form
Fabo (ekabo , Hcr (kcr , Hinv (kinv , m)), Hinv (kinv , m)),
where Kabo is the key generation algorithm of Fabo. According to the generalized "Crooked" leftover hash lemma, the statistical distance between f(Hinv(kinv, m)) and f(h) is negligible, where h ←$ Uinv and Uinv denotes the uniform distribution on the range of Hinv. Hence f(h) reveals no information about the message m. In order to apply the generalized "Crooked" leftover hash lemma, Hcr(kcr, Hinv(kinv, m)) and Hcr(kcr, h) must belong to the lossy branch of the respective all-but-one TDF Fabo. As a result, the security loss of their scheme is twice the security loss of the all-but-one TDF Fabo. However, a tight security reduction must consider nc∗ > 1 challenge ciphertexts in the PRIV-IND-CCA security game for block-sources. Although PRIV1-IND-CCA and PRIV-IND-CCA are proved equivalent in [4], the hybrid technique used in that proof incurs a security loss of 2 · nc∗.
To address this problem, we upgrade the all-but-one TDF in
the constructions of [4] to an all-but-n TDF [8], which has n lossy branches.
When the number of lossy branches is twice the number of
challenge ciphertexts, i.e., n = 2 · nc∗ (because we additionally need Hcr(kcr, h)
to be in the lossy branches of the all-but-n TDF), every challenge ciphertext
can be evaluated on a lossy branch in the PRIV-IND-CCA security game for
block-sources. Moreover, if the security loss of the all-but-n
Towards Tightly Secure Deterministic Public Key Encryption 157
TDF is independent of n (i.e., it is tightly secure), then the security loss of the D-PKE
scheme is also independent of nc∗; that is, the D-PKE scheme can be tightly
PRIV-IND-CCA secure for block-sources. However, because the number of
lossy branches of the all-but-n TDF in the construction is bounded by n,
the number of challenge ciphertexts nc∗ is bounded by n/2. As a result,
our D-PKE schemes can only be tightly PRIV-IND-n/2-CCA secure for
block-sources, where PRIV-IND-n/2-CCA security for block-sources is identical
to PRIV-IND-CCA security for block-sources except that the
number of challenge ciphertexts is bounded by n/2.
As mentioned above, the key step in our constructions is to find
tightly secure all-but-n TDFs. Finally, we prove that the all-but-n TDF given by
Hemenway et al. [8] is tightly secure, with a security loss of only 2. This improves
their original security reduction, which has a security loss of 2n due to the use of
the hybrid technique. Applying this result to our constructions, we obtain the
first D-PKE scheme that is tightly PRIV-IND-n/2-CCA secure for block-sources
based on the s-DCR assumption.
2 Preliminaries
Notations. For a random variable X, we write x ←R X to denote sampling
x according to X's distribution. For a random variable X, its min-entropy is
defined as H∞(X) = −log(max_x P_X(x)). Given Y, the worst-case conditional
min-entropy of X is H∞(X|Y) = −log(max_{x,y} P_{X|Y=y}(x)), and the average-case
conditional min-entropy of X is H̃∞(X|Y) = −log(Σ_y P_Y(y) · max_x P_{X|Y=y}(x)).
A random variable X ∈ {0,1}^l is called a (t, l)-source if it satisfies H∞(X) ≥ t.
A vector X⃗ is called a (t, l)-block-source of length n if it is a list of random
variables (X_1, ..., X_n) over {0,1}^l satisfying H∞(X_i | X_1, ..., X_{i−1}) ≥ t
for all i ∈ [n] = {1, ..., n}. The statistical distance between two distributions
X, Y over a finite or countable domain D is Δ(X, Y) = (1/2) Σ_{w∈D} |P_X(w) −
P_Y(w)|. A hash function H = (K, H) with range R is pairwise-independent if
for all x_1 ≠ x_2 ∈ {0,1}^l and all y_1, y_2 ∈ R, Pr[H(K, x_1) = y_1 ∧ H(K, x_2) =
y_2 : K ←R K] ≤ 1/|R|^2. A hash function H = (K, H) is collision resistant if for every
probabilistic polynomial-time adversary A, the advantage Adv^cr_H(A) is negligible,
where Adv^cr_H(A) = Pr[H(K, x_1) = H(K, x_2) ∧ x_1 ≠ x_2 : K ←R K; (x_1, x_2) ←R A(K)].
When nc∗ = 1, we call the scheme PRIV1-IND secure for block-sources; when O
is the encryption oracle E(pk, ·), we call the scheme PRIV-IND-CPA secure for
block-sources; when O includes the encryption oracle E(pk, ·) and the decryption
oracle D(sk, ·)^{¬c⃗∗} (which refuses to decrypt the challenge ciphertexts), we call
the scheme PRIV-IND-CCA secure for block-sources.
We also define a notion of PRIV-IND-q-CCA security for block-sources which
is very similar to PRIV-IND-CCA security for block-sources except with the
restriction that the length nc∗ of block-sources is bounded by q.
secure for block-sources. In the above, Acr is the adversary who tries to find
collisions of Hcr, and Alt (respectively, Aabn) is the adversary who attacks the
security of LTDF (respectively, ABN).
References
1. Bellare, M., Boldyreva, A., Micali, S.: Public-key encryption in a multi-user setting: security proofs and improvements. In: Preneel, B. (ed.) EUROCRYPT 2000. LNCS, vol. 1807, pp. 259–274. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45539-6_18
2. Bellare, M., Boldyreva, A., O'Neill, A.: Deterministic and efficiently searchable encryption. In: Menezes, A. (ed.) CRYPTO 2007. LNCS, vol. 4622, pp. 535–552. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74143-5_30
3. Bellare, M., Fischlin, M., O'Neill, A., Ristenpart, T.: Deterministic encryption: definitional equivalences and constructions without random oracles. In: Wagner, D. (ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 360–378. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85174-5_20
4. Boldyreva, A., Fehr, S., O'Neill, A.: On notions of security for deterministic encryption, and efficient constructions without random oracles. In: Wagner, D. (ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 335–359. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85174-5_19
5. Brakerski, Z., Segev, G.: Better security for deterministic public-key encryption: the auxiliary-input setting. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 543–560. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22792-9_31
6. Dolev, D., Dwork, C., Naor, M.: Non-malleable cryptography (extended abstract). In: STOC, pp. 542–552 (1991)
7. Gay, R., Hofheinz, D., Kiltz, E., Wee, H.: Tightly CCA-secure encryption without pairings. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 1–27. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_1
8. Hemenway, B., Libert, B., Ostrovsky, R., Vergnaud, D.: Lossy encryption: constructions from general assumptions and efficient selective opening chosen ciphertext security. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 70–88. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25385-0_4
9. Mironov, I., Pandey, O., Reingold, O., Segev, G.: Incremental deterministic public-key encryption. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 628–644. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_37
10. Peikert, C., Waters, B.: Lossy trapdoor functions and their applications. In: STOC, pp. 187–196 (2008)
11. Raghunathan, A., Segev, G., Vadhan, S.: Deterministic public-key encryption for adaptively chosen plaintext distributions. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 93–110. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9_6
12. Wee, H.: Dual projective hashing and its applications — lossy trapdoor functions and more. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 246–262. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_16
13. Xie, X., Xue, R., Zhang, R.: Deterministic public key encryption and identity-based encryption from lattices in the auxiliary-input setting. In: Visconti, I., De Prisco, R. (eds.) SCN 2012. LNCS, vol. 7485, pp. 1–18. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32928-9_1
Efficient Inner Product Encryption
with Simulation-Based Security
1 Introduction
Traditional public-key encryption provides all-or-nothing access to data: one can
either recover the entire plaintext or learn nothing from the ciphertext. Functional
encryption (FE) [5,15] is a new paradigm for encryption which allows
tremendous flexibility in accessing encrypted data. In functional encryption, a
secret key skf embedding a function f can be created from a master secret
key msk. Then, given a ciphertext for x, a user learns f(x) and nothing
else about x. In recent years, the cryptographic community has made great
progress on the security of FE and on constructions of such schemes (see,
for instance, [1,6,8–10] and many more).
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 162–171, 2018.
https://doi.org/10.1007/978-3-319-89500-0_14
Efficient Inner Product Encryption with Simulation-Based Security 163
inner product ⟨x, y⟩ = Σ_{i=1}^n x_i y_i. In this paper, we consider IPE with function
privacy, i.e., function-hiding inner product encryption.
Function-Hiding IPE. Agrawal et al. [2] presented adaptively secure schemes where the
messages x0 and x1 may be adaptively chosen at any point in time, based on
previously collected information. Bishop et al. [3] proposed a function-hiding
IPE scheme under the Symmetric External Diffie-Hellman (SXDH) assumption,
which satisfies an indistinguishability-based definition and considers adaptive
adversaries. However, the scheme is proven secure only in a rather weak and unrealistic
security model which places limits on adversaries' queries. Recently, Datta
et al. [7] developed a function-hiding IPE scheme under the SXDH assumption in which
this additional restriction on adversaries' queries is removed. Tomida et al. [17]
constructed a function-hiding IPE scheme more efficient than that of [7] under
the External Decisional Linear (XDLIN) assumption. Kim et al. [11] put forth a
fully secure function-hiding IPE scheme with smaller parameter sizes and lower run-time
complexity than [3,7]. The scheme is proved simulation-based secure in the
generic model of bilinear maps. Zhao et al. [18] presented the first
simulation-based secure function-hiding IPE scheme under the SXDH assumption
in the standard model. The scheme tolerates an unbounded number of
ciphertext queries and adaptive key queries.
Our Contribution. We construct an efficient simulation-based secure function-hiding
IPE (SSFH-IPE) scheme in the standard model. We compare our scheme
with related works in Table 1, where scalar multiplications on cyclic groups are
counted for the key generation and encryption algorithms, and pairing operations
on bilinear pairing groups are counted for the decryption algorithm. We achieve
a reduction by a factor of 2 or more in computational complexity.
Our scheme requires n + 6 group elements in the secret key and the ciphertext, which
also reduces storage complexity by a factor of 2 or more. Hence, the
SSFH-IPE scheme outperforms the previous schemes in both storage
complexity and computational complexity. Furthermore, our scheme is based on
the XDLIN assumption, which is weaker than the SXDH assumption. In more
detail, the SXDH assumption relies on type 3 bilinear pairing groups, while the
XDLIN assumption can rely on any type of bilinear pairing groups [17]. Therefore,
from this angle, the SXDH assumption is stronger than the XDLIN assumption.
Although the construction of [17] was proved indistinguishability-based
secure under the XDLIN assumption and also improved efficiency,
both the storage complexity and the computational complexity of our scheme are better
than those of [17], and our scheme achieves simulation-based security, which is much
stronger than indistinguishability-based security.
To guarantee correctness, our scheme requires that inner products lie within
a range of polynomial size, which is consistent with the other schemes in Table 1.
As pointed out in [3], this is reasonable for statistical computations because
quantities such as the average over a polynomial-size database will naturally
be contained within a polynomial range. In addition, our scheme is simulation-based
secure against adversaries who make an unbounded number of ciphertext
queries and adaptive key queries. Although very basic functionalities such as IBE
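The polynomial-range requirement is what makes decryption feasible: such IPE schemes recover g_T^⟨x,y⟩ in the target group and must then extract the exponent, which works exactly when ⟨x, y⟩ lies in a polynomial-size range. A minimal sketch over a toy multiplicative group (insecure, hypothetical parameters of our own; real schemes work in a pairing target group):

```python
p = 1000003   # toy prime modulus -- far too small to be secure
g = 2         # base element for the demonstration

def recover_exponent(target, bound):
    """Given target = g^m mod p with |m| <= bound (a polynomial bound),
    recover m by linear scan: O(bound) group operations."""
    cur = 1
    for m in range(bound + 1):          # try m = 0, 1, ..., bound
        if cur == target:
            return m
        cur = cur * g % p
    g_inv = pow(g, p - 2, p)            # g^{-1} via Fermat's little theorem
    cur = g_inv
    for m in range(1, bound + 1):       # try m = -1, ..., -bound
        if cur == target:
            return -m
        cur = cur * g_inv % p
    raise ValueError("inner product outside the polynomial range")

x, y, q = [3, 1, 4], [2, 7, 1], 101
ip = sum(a * b for a, b in zip(x, y)) % q    # <x, y> mod q = 17
assert recover_exponent(pow(g, ip, p), bound=1000) == 17
```

Baby-step giant-step would cut this to O(sqrt(bound)) group operations, but either way the range must remain polynomial.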
2 Preliminaries
Let λ be the security parameter. If S is a set, x ←$ S denotes the process of
choosing x uniformly at random from S. Let X = {X_n}_{n∈N} and Y = {Y_n}_{n∈N} be
distribution ensembles. We say that X and Y are computationally indistinguishable,
denoted X ≈c Y, if for every nonuniform probabilistic polynomial-time distinguisher D
and every n ∈ N, the difference between Pr[D(X_n) = 1] and Pr[D(Y_n) = 1]
is negligible. Let negl(λ) be a negligible function in λ. Moreover, we write x⃗
to denote a vector (x_1, ..., x_n) ∈ Z_q^n of length n, for some positive integers q
and n. We use ⟨a, b⟩ to denote the inner product Σ_{i=1}^n a_i b_i mod q of vectors
a ∈ Z_q^n and b ∈ Z_q^n. We use upper-case boldface to denote matrices.
166 Q. Zhao et al.
X^T denotes the transpose of the matrix X. GL(n, Z_q) denotes the general linear group of
degree n over Z_q. Z_q^× denotes the set of integers {1, ..., q − 1}.
Real_A^{SSFH-IPE}(1^λ) ≈c Ideal_{A,S}^{SSFH-IPE}(1^λ).
b*_i = Σ_{j=1}^n φ_{i,j} a*_j,  B* = {b*_1, ..., b*_n},  g_T = e(g_1, g_2)^ψ,
return (B, B*).
Let (x⃗)_B denote Σ_{i=1}^n x_i b_i, where x⃗ = (x_1, ..., x_n) ∈ Z_q^n and B = {b_1, ..., b_n}.
Then we have
E((x⃗)_A, (y⃗)_{A*}) = Π_{i=1}^n e(x_i g_1, y_i g_2) = e(g_1, g_2)^{Σ_{i=1}^n x_i y_i} = e(g_1, g_2)^{⟨x⃗,y⃗⟩}, and
E((x⃗)_B, (y⃗)_{B*}) = E((Bx⃗)_A, (ψ(B^T)^{−1} y⃗)_{A*}) = e(g_1, g_2)^{ψ Bx⃗ · (B^T)^{−1} y⃗} = g_T^{⟨x⃗,y⃗⟩}.
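The identity can be sanity-checked by working purely in the exponents: (Bx⃗) · ((B^T)^{−1} y⃗) = x⃗ · y⃗ over Z_q, so both pairings land on g_T^{⟨x⃗,y⃗⟩}. A sketch with toy parameters of our own choosing (a real instantiation hides the exponents inside group elements):

```python
import random

q = 101  # toy prime; real schemes use a cryptographically large q

def mat_inv_mod(M, q):
    """Invert a matrix over Z_q (q prime) by Gauss-Jordan elimination."""
    n = len(M)
    A = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] % q), None)
        if piv is None:
            raise ValueError("matrix not invertible mod q")
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], q - 2, q)          # pivot inverse mod q
        A[col] = [v * inv % q for v in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(a - f * b) % q for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) % q for row in M]

random.seed(1)
n, psi = 4, 7
while True:  # resample until B is invertible, i.e. B lies in GL(n, Z_q)
    B = [[random.randrange(q) for _ in range(n)] for _ in range(n)]
    try:
        Bt_inv = mat_inv_mod([list(r) for r in zip(*B)], q)  # (B^T)^{-1}
        break
    except ValueError:
        pass

x = [random.randrange(q) for _ in range(n)]
y = [random.randrange(q) for _ in range(n)]
lhs = psi * sum(a * b for a, b in zip(mat_vec(B, x), mat_vec(Bt_inv, y))) % q
rhs = psi * sum(a * b for a, b in zip(x, y)) % q
assert lhs == rhs  # both sides are the exponent of g_T^{<x,y>}
```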
3 SSFH-IPE Scheme
In this section, we present the construction of SSFH-IPE.
SSFH-IPE.Setup(1^λ, n) → (msk, pp): The setup algorithm runs
(q, G_1, G_2, G_T, e) ←$ G_abpg(1^λ). It then generates
(q, V, V*, G_T, A, A*, E) ←$ G_dpvs(1^λ, n + 6, (q, G_1, G_2, G_T, e)) and
(B = {b_1, ..., b_{n+6}}, B* = {b*_1, ..., b*_{n+6}}) ←$ G_ob(1^λ, n + 6).
Remark 1. We can easily see that our scheme is malleable: a ciphertext
can be created from certain other ciphertexts. The scheme in [17] is also malleable,
while it seems difficult to show that the schemes in [3,7,18] are malleable.
4 Security Proof
Definition 6 (Problem 0). Problem 0 is to guess b ∈ {0, 1}, given ((q, G_1, G_2,
G_T, e), B̂, B̂*, y_b, κg_1, ξg_2), where
(q, G_1, G_2, G_T, e) ←$ G_abpg(1^λ),
B = (χ_{i,j}) ←$ GL(3, Z_q), (φ_{i,j}) = (B^T)^{−1},
κ ←$ Z_q, b_i = κ Σ_{j=1}^3 χ_{i,j} a_j for i = 1, 2, 3, B̂ = {b_1, b_2, b_3},
ξ ←$ Z_q, b*_i = ξ Σ_{j=1}^3 φ_{i,j} a*_j for i = 1, 3, B̂* = {b*_1, b*_3},
g_T = e(g_1, g_2)^{κξ}, δ, σ ←$ Z_q, ρ ←$ Z_q^×,
y_0 = (δ, 0, σ)_B, y_1 = (δ, ρ, σ)_B.
Definition 7 (Problem 1). Problem 1 is to guess b ∈ {0, 1}, given ((q, G_1, G_2,
G_T, e), B̂, B̂*, Y_b), where
(B, B*, (q, G_1, G_2, G_T, e)) ←$ G_ob(1^λ, n + 6),
B̂ = {b_1, ..., b_n, b_{n+1}, b_{n+2}, b_{n+5}}, B̂* = {b*_1, ..., b*_n, b*_{n+3}, b*_{n+4}, b*_{n+6}},
α, β ←$ Z_q, η, ζ ←$ Z_q^×,
Y_0 = (0^n, α, β, 0, 0, η, 0)_B, Y_1 = (0^n, α, β, 0, 0, η, ζ)_B.
n
AdvProb
A
n
(λ) = |Pr[ExpP
A (1 ) = 1] − Pr[ExpA (1 ) = 1]|,
0 λ P1 λ
Lemma 2. Suppose the XDLIN assumption holds in G_1 and G_2. Then for every
PPT adversary B there is an adversary A such that Adv^{Prob1}_B(λ) ≤ Adv^{Prob0}_A(λ).
Lemma 3. Suppose the XDLIN assumption holds in G_1 and G_2. Then for every
PPT adversary B there is an adversary A such that Adv^{Prob2}_B(λ) ≤ Adv^{Prob0}_A(λ).
Lemma 4. Suppose the XDLIN assumption holds in G_1 and G_2. Then for every
PPT adversary B there is an adversary A such that Adv^{Prob3}_B(λ) ≤ Adv^{Prob0}_A(λ).
Lemma 5. Suppose the XDLIN assumption holds in G_1 and G_2. Then for every
PPT adversary B there is an adversary A such that Adv^{Prob4}_B(λ) ≤ Adv^{Prob0}_A(λ).
The proofs of Lemmas 2–5 and Theorem 1 are given in the full version of
this paper.
Acknowledgments. This work has been partly supported by the National NSF of China
under Grants Nos. 61772266, 61572248, and 61431008.
References
1. Agrawal, S., Gorbunov, S., Vaikuntanathan, V., Wee, H.: Functional encryption: new perspectives and lower bounds. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 500–518. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40084-1_28
2. Agrawal, S., Libert, B., Stehlé, D.: Fully secure functional encryption for inner products, from standard assumptions. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9816, pp. 333–362. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53015-3_12
3. Bishop, A., Jain, A., Kowalczyk, L.: Function-hiding inner product encryption. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. LNCS, vol. 9452, pp. 470–491. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48797-6_20
4. Boneh, D., Raghunathan, A., Segev, G.: Function-private identity-based encryption: hiding the function in functional encryption. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 461–478. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40084-1_26
5. Boneh, D., Sahai, A., Waters, B.: Functional encryption: definitions and challenges. In: Ishai, Y. (ed.) TCC 2011. LNCS, vol. 6597, pp. 253–273. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19571-6_16
6. Brakerski, Z., Segev, G.: Function-private functional encryption in the private-key setting. In: Dodis, Y., Nielsen, J.B. (eds.) TCC 2015. LNCS, vol. 9015, pp. 306–324. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46497-7_12
7. Datta, P., Dutta, R., Mukhopadhyay, S.: Functional encryption for inner product with full function privacy. In: Cheng, C.-M., Chung, K.-M., Persiano, G., Yang, B.-Y. (eds.) PKC 2016. LNCS, vol. 9614, pp. 164–195. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49384-7_7
8. De Caro, A., Iovino, V., Jain, A., O'Neill, A., Paneth, O., Persiano, G.: On the achievability of simulation-based security for functional encryption. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8043, pp. 519–535. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40084-1_29
9. Goldwasser, S., Kalai, Y.T., Popa, R.A., Vaikuntanathan, V., Zeldovich, N.: Reusable garbled circuits and succinct functional encryption. In: STOC, pp. 555–564 (2013)
10. Gorbunov, S., Vaikuntanathan, V., Wee, H.: Functional encryption with bounded collusions via multi-party computation. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 162–179. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_11
11. Kim, S., Lewi, K., Mandal, A., Montgomery, H., Roy, A., Wu, D.J.: Function-hiding inner product encryption is practical. Cryptology ePrint Archive, Report 2016/440 (2016)
12. Okamoto, T., Takashima, K.: Fully secure functional encryption with general relations from the decisional linear assumption. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 191–208. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14623-7_11
13. Okamoto, T., Takashima, K.: Homomorphic encryption and signatures from vector decomposition. In: Galbraith, S.D., Paterson, K.G. (eds.) Pairing 2008. LNCS, vol. 5209, pp. 57–74. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85538-5_4
14. Okamoto, T., Takashima, K.: Hierarchical predicate encryption for inner-products. In: Matsui, M. (ed.) ASIACRYPT 2009. LNCS, vol. 5912, pp. 214–231. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10366-7_13
15. O'Neill, A.: Definitional issues in functional encryption. Cryptology ePrint Archive, Report 2010/556 (2010)
16. Shen, E., Shi, E., Waters, B.: Predicate privacy in encryption systems. In: Reingold, O. (ed.) TCC 2009. LNCS, vol. 5444, pp. 457–473. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00457-5_27
17. Tomida, J., Abe, M., Okamoto, T.: Efficient functional encryption for inner-product values with full-hiding security. In: Bishop, M., Nascimento, A.C.A. (eds.) ISC 2016. LNCS, vol. 9866, pp. 408–425. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45871-7_24
18. Zhao, Q., Zeng, Q., Liu, X., Xu, H.: Simulation-based security of function-hiding inner product encryption. Sci. China Inf. Sci. 61, 048102 (2017). https://doi.org/10.1007/s11432-017-9224-9
Server-Aided Directly Revocable
Ciphertext-Policy Attribute-Based
Encryption with Verifiable Delegation
1 Introduction
Cloud computing is a promising paradigm which makes large amounts of
resources easily accessible to cloud users. Although data storage on a public
cloud provides ease of accessibility, it also raises concerns about data confidentiality.
Due to poor scalability and complex key management, traditional
encryption schemes, such as identity-based encryption, cannot satisfy the requirements
of various commercial applications that have a large number of users.
2 Preliminaries
Multilinear Maps. Let G_0, G_1, ..., G_{d+3} be cyclic groups of prime order p. The multilinear
maps consist of d + 3 mappings {e_i : G_0 × G_i → G_{i+1} | i = 0, ..., d + 2} such that, for
i = 0, ..., d + 2: (i) if g_0 is a generator of G_0, then g_{i+1} = e_i(g_0, g_i) is a generator of
G_{i+1}; (ii) ∀a, b ∈ Z_p, e_i(g_0^a, g_i^b) = e_i(g_0, g_i)^{ab}; (iii) e_i can be computed efficiently.
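Since candidate multilinear maps are involved constructions, the interface alone can be illustrated with an insecure toy model of our own: an element g_i^a is represented by the pair (i, a), i.e., by its exponent, so the maps are trivially computable and nothing is hidden.

```python
p, d = 101, 2   # toy prime order and toy depth -- illustration only

def elem(level, exponent):
    """Represent g_level^exponent by its level and exponent mod p."""
    return (level, exponent % p)

def e(x, y):
    """e_i : G_0 x G_i -> G_{i+1}, with e_i(g_0^a, g_i^b) = g_{i+1}^{ab}."""
    (l0, a), (li, b) = x, y
    assert l0 == 0 and 0 <= li <= d + 2   # first argument must lie in G_0
    return elem(li + 1, a * b)

g0 = elem(0, 1)
g1 = e(g0, g0)                      # property (i): g_{i+1} = e_i(g_0, g_i)
assert g1 == elem(1, 1)
a, b = 5, 9                         # property (ii): bilinearity in exponents
assert e(elem(0, a), elem(1, b)) == elem(2, a * b)
```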
(d + 4)-Multilinear Decisional Diffie-Hellman Assumption ((d + 4)-MDDH).
Let G(λ) → (p, G_0, G_1, ..., G_{d+3}, e_0, e_1, ..., e_{d+2}) be a generator of multilinear
groups. Given y = (g_0, g_0^{d_1}, g_0^{d_2}, g_0^{d_3}, g_0^c, g_0^{z_0}, ..., g_0^{z_d}), where
z_0, ..., z_d, d_1, d_2, d_3, c ∈_R Z_p^* are unknown, there is no polynomial-time algorithm A
that can distinguish g_{d+3}^{c·z_0···z_d·d_1 d_2 d_3} from a random element Z ∈_R G_{d+3}
with non-negligible advantage.
Subset Cover. Let T_id be a full binary tree, let depth(x) denote the depth of node
x (with depth(root) = 0), and let path(x) = {x_{i_0}, ..., x_{i_{depth(x)}}} denote the path from
the root to node x. A list of revoked users R corresponds to a set of leaf nodes
in T_id. For every x ∈ R, mark all nodes on path(x); the subset cover cover(R) is the set
of unmarked nodes that are direct children of marked nodes in T_id. For more
details, refer to [17].
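The marking procedure above can be sketched directly; in the following illustration of ours, nodes of the full binary tree are encoded as bit strings giving the path from the root (the empty string):

```python
def subset_cover(depth, revoked):
    """cover(R) for a full binary tree of the given depth.
    `revoked` is a set of depth-bit leaf strings; the result is the set
    of unmarked nodes whose parent lies on some revoked path."""
    if not revoked:
        return {""}                 # nobody revoked: the root covers everyone
    marked = {""}                   # the root lies on every revoked path
    for leaf in revoked:
        for i in range(1, depth + 1):
            marked.add(leaf[:i])    # mark every prefix of the revoked path
    cover = set()
    for node in marked:
        if len(node) < depth:       # leaves have no children
            for child in (node + "0", node + "1"):
                if child not in marked:
                    cover.add(child)
    return cover

# depth-3 tree (8 users); revoke the leaves 000 and 101
print(sorted(subset_cover(3, {"000", "101"})))  # ['001', '01', '100', '11']
```

Every non-revoked leaf lies below exactly one node of the cover, and no revoked leaf does; for N leaves the cover contains at most O(|R| log(N/|R|)) nodes.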
Selective security for original ciphertext. The selective security against chosen-plaintext
attacks on original ciphertexts, IND-s-CPA-OC for short, is defined by the
following game between a challenger C and an adversary A.
Init. A outputs a target access structure (W∗, ρ∗) that will be used to generate
the challenge ciphertext.
Setup. C runs the Setup(λ) algorithm and gives the system public parameters
PP to A. A is allowed to generate the secret key of the aide server, but it must
send the public key SPK to C.
Phase 1. A makes KeyGen(id_i, S_i) queries for (id_1, S_1), ..., (id_{q_1}, S_{q_1}); C
returns SK_{id_i,S_i} to A.
Challenge. A submits two messages M_0, M_1 of equal length, an access structure
(W∗, ρ∗), and a revocation list R to C. None of the sets S_1, ..., S_{q_1} from Phase 1
satisfies (W∗, ρ∗). C flips a random coin β ∈_R {0, 1} and generates the challenge
ciphertext CT∗ with M_β and the aide-ciphertext CT∗ under revocation list R. Finally,
C returns CT∗, CT∗ to A.
Phase 2. A makes KeyGen(id_i, S_i) queries for (id_{q_1+1}, S_{q_1+1}), ..., (id_q, S_q) as
in Phase 1, with the restriction that S_{q_1+1}, ..., S_q must not satisfy the
challenge access structure (W∗, ρ∗).
Guess. A outputs a guess bit β′ ∈ {0, 1} and wins the game if β′ = β. The
advantage of A is defined as Adv(A) = |Pr[β′ = β] − 1/2|.
Selective security for updated ciphertext. The selective security against
chosen-plaintext attacks on updated ciphertexts, IND-s-CPA-UC for short, is the same
as IND-s-CPA-OC except for the challenge phase.
Server-Aided Directly Revocable CP-ABE with Verifiable Delegation 175
Challenge. A submits two messages M_0, M_1 of equal length, an access structure
(W∗, ρ∗), a prior revocation list R, and a new revocation list R′ with R ⊂ R′,
to C. None of the sets S_1, ..., S_{q_1} from Phase 1 satisfies (W∗, ρ∗). C flips a random
coin β ∈_R {0, 1}, generates the challenge ciphertext CT∗ with M_β and the
aide-ciphertext CT∗ under revocation list R, and then generates the updated
aide-ciphertext CT̂∗ under revocation list R′. Finally, C returns CT∗, CT̂∗ to A.
Verifiability of revocation delegation. The verifiability of the aide-ciphertext is
defined by the following game between a challenger C and an adversary A.
Init, Setup, and the query phases are the same as in IND-s-CPA-OC.
Challenge. A submits a message M, an access structure (W∗, ρ∗), and a prior
revocation list R to C. None of the sets S_1, ..., S_{q_1} from Phase 1 satisfies (W∗, ρ∗).
C generates the ciphertext CT∗ with M and the aide-ciphertext CT∗ under revocation
list R. Finally, C returns CT∗, CT∗ to A.
Guess. A generates an updated aide-ciphertext CT̂∗ under a revocation list R′,
where R ⊂ R′. A wins the game if Verify(PP, CT∗, CT̂∗, R, R′) → 1 and the
distributions of CT̂∗ and CT̃ are distinguishable, where CT̃ is produced normally
by C via Update(PP, CT∗, R, R′) → CT̃.
4 Our Construction
Let U = {at_1, ..., at_{|U|}} be the attribute universe and ID = {id_1, ..., id_{|ID|}} be
the user universe in the system. Let d, such that 2^d = |ID|, be the depth of all
leaves in the full binary tree of identities.
Setup(λ) → (PP, MSK): Given the security parameter λ, the algorithm generates
d + 3 multilinear maps {e_i : G_0 × G_i → G_{i+1} | i = 0, ..., d + 2}, where
G_0, G_1, ..., G_{d+3} are cyclic groups of prime order p. Let g_0 be a random generator
of G_0; then g_{i+1} = e_i(g_0, g_i) is a generator of G_{i+1} for i = 0, 1, ..., d + 2.
The authority chooses α, b ∈_R Z_p^* at random and computes g_{d+2}^α, g_{d+2}^b. For each
attribute at_i ∈ U, it selects t_i ∈_R Z_p^* at random and sets T_i = g_0^{t_i}. The authority
chooses an efficient map H : {0, 1}^* → G_0. Let T_id denote the binary
tree according to the revocation list R. Finally, the authority sets the master
secret key MSK = (α, {t_i, i = 1, ..., |U|}) and publishes the public parameters
PP = {p, G_0, G_1, ..., G_{d+3}, e_0, ..., e_{d+2}, T_1, ..., T_{|U|}, g_0, g_{d+2}^b, e_{d+2}(g_0, g_{d+2})^α, U,
ID, H, d, T_id}.
Server KeyGen(PP) → (SPK, SSK): The aide server randomly chooses
c ∈_R Z_p^*, keeps the secret key SSK = c secret, and publishes the public key SPK = g_0^c.
User KeyGen(PP, SPK, id, S, MSK) → SK_{id,S}: The authority generates
the secret key SK_{id,S} = (K, L, {K_x : ∀x ∈ S}) as follows.
– Otherwise, for each i ∈ [0, d], find all the nodes x_1, ..., x_k such that x_t ∈
cover(R′) − cover(R) and depth(x_t) = i, t ∈ [1, k]; choose a_1, ..., a_k ∈_R Z_p at
random, compute P′_{x_t} = e_{depth(x_t)+1}(P_{x_t}, C′), and verify
Π_{t=1}^k e_{depth(x_t)+2}((P′_{x_t})^{a_t}, g_0^c) = Π_{t=1}^k e_{depth(x_t)+2}(g_0, (D̂_{x_t})^{a_t})   (1)
– If there exists i ∈ [0, d] such that Eq. (1) does not hold, then output 0;
otherwise, return 1.
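Equation (1) batches the k per-node checks into one by raising each side to a fresh random exponent a_t: a mismatch at any single node survives only if its random coefficient happens to cancel, which occurs with probability 1/p. The following sketch of ours illustrates the principle, with plain exponents in Z_p standing in for the pairing values:

```python
import random

p = 101  # toy prime standing in for the (large) group order

def batch_check(lhs, rhs, trials=1):
    """Check lhs[t] == rhs[t] for all t using one random linear
    combination per trial: sum_t a_t*lhs[t] == sum_t a_t*rhs[t] (mod p).
    A single mismatched position slips through with probability 1/p."""
    for _ in range(trials):
        a = [random.randrange(p) for _ in lhs]
        if (sum(c * v for c, v in zip(a, lhs)) % p
                != sum(c * v for c, v in zip(a, rhs)) % p):
            return False
    return True

random.seed(0)
good = [3, 14, 15, 92]
assert batch_check(good, list(good))   # equal vectors always pass
bad = [3, 14, 15, 93]                  # one tampered component
caught = sum(not batch_check(good, bad) for _ in range(1000))
print(caught)  # about 990 of 1000: missed only when a_t = 0 mod p
```

With a cryptographically large p, a single trial already makes the miss probability negligible.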
5 Security Results
Due to space limitations, we only give the security results here. The complete proofs
and the efficiency analysis will be given in the full paper.
6 Conclusion
Acknowledgment. This work was supported in part by the National Natural Science
Foundation of China (Nos. 61602512, 61632012, 61373154, 61371083, 61672239),
in part by the China Postdoctoral Science Foundation (No. 2016M591629), in
part by the National Key Research and Development Program (Nos. 2016YFB0800101
and 2016YFB0800100), and in part by the Innovative Research Groups of the National
Natural Science Foundation of China (No. 61521003).
178 G. Yu et al.
References
1. Sahai, A., Waters, B.: Fuzzy identity-based encryption. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 457–473. Springer, Heidelberg (2005). https://doi.org/10.1007/11426639_27
2. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for fine-grained access control of encrypted data. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 89–98. ACM (2006)
3. Pirretti, M., Traynor, P., McDaniel, P., Waters, B.: Secure attribute-based systems. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 99–112. ACM (2006)
4. Liang, X., Lu, R., Lin, X., Shen, X.: Ciphertext policy attribute based encryption with efficient revocation. Technical report, University of Waterloo (2010)
5. Ostrovsky, R., Sahai, A., Waters, B.: Attribute-based encryption with non-monotonic access structures. In: Proceedings of the ACM Conference on Computer and Communications Security, pp. 195–203. ACM (2007)
6. Hur, J., Noh, D.: Attribute-based access control with efficient revocation in data outsourcing systems. IEEE Trans. Parallel Distrib. Syst. 22(7), 1214–1221 (2011)
7. Sahai, A., Seyalioglu, H., Waters, B.: Dynamic credentials and ciphertext delegation for attribute-based encryption. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 199–217. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_13
8. Xie, X., Ma, H., Li, J., Chen, X.: New ciphertext-policy attribute-based access control with efficient revocation. In: Mustofa, K., Neuhold, E.J., Tjoa, A.M., Weippl, E., You, I. (eds.) ICT-EurAsia 2013. LNCS, vol. 7804, pp. 373–382. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36818-9_41
9. Attrapadung, N., Imai, H.: Conjunctive broadcast and attribute-based encryption. In: Shacham, H., Waters, B. (eds.) Pairing 2009. LNCS, vol. 5671, pp. 248–265. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03298-1_16
10. Yu, S., Wang, C., Ren, K., Lou, W.: Attribute based data sharing with attribute revocation. In: Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, pp. 261–270. ACM (2010)
11. Jahid, S., Mittal, P., Borisov, N.: EASiER: encryption-based access control in social networks with efficient revocation. In: Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security, pp. 411–415. ACM (2011)
12. Zhang, Y., Chen, X., Li, J., Li, H., Li, F.: FDR-ABE: attribute-based encryption with flexible and direct revocation. In: 5th International Conference on Intelligent Networking and Collaborative Systems, pp. 38–45. IEEE (2013)
13. Naruse, T., Mohri, M., Shiraishi, Y.: Attribute-based encryption with attribute revocation and grant function using proxy re-encryption and attribute key for updating. In: Park, J., Stojmenovic, I., Choi, M., Xhafa, F. (eds.) Future Information Technology 2014. LNEE, vol. 276, pp. 119–125. Springer, Berlin, Heidelberg (2014). https://doi.org/10.1007/978-3-642-40861-8_18
14. Shi, Y., Zheng, Q., Liu, J., Han, Z.: Directly revocable key-policy attribute-based encryption with verifiable ciphertext delegation. Inf. Sci. 295, 221–231 (2015)
15. Cui, H., Deng, R.H., Li, Y., Qin, B.: Server-aided revocable attribute-based encryption. In: Askoxylakis, I., Ioannidis, S., Katsikas, S., Meadows, C. (eds.) ESORICS 2016. LNCS, vol. 9879, pp. 570–587. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45741-3_29
16. Yamada, K., Attrapadung, N., Emura, K., Hanaoka, G., Tanaka, K.: Generic constructions for fully secure revocable attribute-based encryption. In: Foley, S.N., Gollmann, D., Snekkenes, E. (eds.) ESORICS 2017. LNCS, vol. 10493, pp. 532–551. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66399-9_29
17. Boldyreva, A., Goyal, V., Kumar, V.: Identity-based encryption with efficient revocation. In: Proceedings of the 15th ACM Conference on Computer and Communications Security, pp. 417–426. ACM (2008)
Practical Large Universe Attribute-Set
Based Encryption in the Standard Model
Xinyu Feng1,2 , Cancan Jin1,2 , Cong Li1,2 , Yuejian Fang1,2 , Qingni Shen1,2(B) ,
and Zhonghai Wu1,2
1
School of Software and Microelectronics, Peking University, Beijing, China
{xyf,jincancan1992,li.cong}@pku.edu.cn,
{fangyj,qingnishen,wuzh}@ss.pku.edu.cn
2
National Engineering Research Center for Software Engineering,
Peking University, Beijing, China
1 Introduction
In cloud computing systems, the cloud service providers may be honest but curious
about customer data, e.g., for user-behavior analysis or advertising. A
feasible solution is for data owners to encrypt sensitive data before uploading it.
Compared with traditional one-to-one encryption, Attribute-Based Encryption
(ABE) is an excellent cryptographic access control mechanism, quite
preferable for data encryption and sharing based on the recipients' ability to
satisfy a policy. However, in many scenarios, separate attributes cannot satisfy
the various requirements well; they are only meaningful when organized into
groups or sets.
There are mainly two types of ABE schemes: Ciphertext-Policy ABE (CP-
ABE), where ciphertexts are associated with access policies and keys are asso-
ciated with sets of attributes, and Key-Policy ABE (KP-ABE), where keys
are associated with access policies and ciphertexts are associated with sets of
attributes. In this work, we focus on the challenge of organizing attributes
efficiently.
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 180–191, 2018.
https://doi.org/10.1007/978-3-319-89500-0_16
1. Attributes are often related to each other. Many attributes are only mean-
ingful in groups or in sets.
2. Separate attributes often cannot satisfy various practical requirements well,
which leads to a large number of repeated attributes in the access policy.
Sahai and Waters first proposed the concept of Attribute-Based Encryption [17]
in 2005, as a generalization of Fuzzy Identity-Based Encryption using threshold
gates as the access structure. ABE then developed into two flavors, Key-Policy
ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE). Goyal et al. proposed
the first KP-ABE scheme [7], which supports monotonic Boolean access
policies. The first construction of CP-ABE was given by Bethencourt et al. [4],
whose security proof was based on the generic group model. Lewko et al. first
gave a bounded fully secure construction in the standard model [9]. Since then,
many works have been presented to achieve the unbounded or large universe
properties in ABE [8,13], but most of them were somewhat limited, e.g., by
restricting the expressiveness of policies or relying on the random oracle model.
In 2013, Rouselakis and Waters proposed a large universe and unbounded ABE
scheme [16] and proved it selectively secure using partitioning-style techniques.
Later, in 2014, Wang and Feng proposed a large universe ABE scheme from lat-
tices [20]. In 2016, Li et al. proposed a practical construction for a large universe
hierarchical ABE scheme [10], and Zhang et al. proposed an accountable large
universe ABE scheme supporting monotone access structures [21].
Many other schemes focus on the problem of how to organize attributes in
ABE to make it practical and efficient. One line of work is hierarchical ABE
(HABE) [6,11,12,19]; another is attribute-set based encryption (ASBE). Note
that ASBE is quite different from many existing HABE schemes in how it orga-
nizes attributes: attributes in the former are composite, such as {University A,
Master}, while in the latter they are hierarchical, that is, there is a relation
between the superior and the subordinate. ASBE was first proposed by Bobba
et al. [5] in 2009. In ASBE, attributes are organized into a recursive family of
sets. In Bobba's work, the access policy was
1.3 Organization
2 Preliminaries
2. For each access structure A on the attribute universe U, there exists a
matrix M ∈ Z_p^{l×n} with l rows and n columns, called the share-generating
matrix, and a function ρ defined as the mapping from rows of M to attributes
in U, i.e. ρ : [l] → U. For all i = 1, · · · , l, the i-th row of M is associated
with the attribute ρ(i); that is, ρ labels row i with the party ρ(i). To share
a secret s ∈ Z_p, we first consider the column vector y = (s, y_2, · · · , y_n)^T,
where s is the secret to be shared and y_2, · · · , y_n ∈ Z_p are randomly chosen.
Then My is the vector of l shares of the secret s according to Π. The share
(My)_i belongs to party ρ(i), that is, to the attribute ρ(i).
According to [6], every LSSS enjoys the linear reconstruction property. Sup-
pose Π is an LSSS for the access structure A. Let S be any authorized set, i.e.
A(S) = 1, and let I ⊆ {1, 2, · · · , l} be defined as I = {i : ρ(i) ∈ S}. Then there
exist constants {d_i ∈ Z_p}_{i∈I} such that, if {λ_i}_{i∈I} are valid shares of any
secret s according to Π, then Σ_{i∈I} d_i · λ_i = s.
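The sharing and reconstruction steps above can be sketched in code. This is a toy illustration over Z_p with a hypothetical 3 × 2 share-generating matrix and made-up attribute labels; it is not part of the scheme itself:

```python
import random

p = 2**255 - 19  # a well-known prime, used here only as an example modulus

# Toy share-generating matrix M (l = 3 rows, n = 2 columns) over Z_p.
# rho maps each row of M to an attribute (labels are hypothetical).
M = [[1, 1],
     [0, 1],
     [1, 2]]
rho = {0: "student", 1: "cs-dept", 2: "alumni"}

def share(secret):
    """Compute the l shares M*y with y = (s, y2, ..., yn)^T."""
    y = [secret] + [random.randrange(p) for _ in range(len(M[0]) - 1)]
    return [sum(M[i][j] * y[j] for j in range(len(y))) % p for i in range(len(M))]

def reconstruct(shares, rows, d):
    """Recover s = sum_i d_i * lambda_i from an authorized row set."""
    return sum(d[i] * shares[i] for i in rows) % p

s = 42
lam = share(s)
# Rows 0 and 1 form an authorized set: (1,1) - (0,1) = (1,0),
# so the constants d_0 = 1, d_1 = -1 recover the secret.
assert reconstruct(lam, [0, 1], {0: 1, 1: -1}) == s
```

The reconstruction constants d_i here are found by inspection; in general they come from solving a small linear system over Z_p, exactly as the definition states.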
Furthermore, to support composite attribute sets, that is, so that only
attributes in the same set can be used to satisfy the access policy, one natural
idea is to re-share the shares obtained from the outer set. Taking a key structure
of depth 2 (d = 2) as an example, we first generate a share d_i (0 ≤ i ≤ k)
of the secret for each attribute subset A_0, A_1, A_2, · · · , A_k. Then each
attribute subset A_i takes its share d_i as a new secret to share among the
attributes (a_{i1}, · · · , a_{in_i}) in it, where n_i is the number of attributes in set A_i.
When the depth of the key structure is greater than 2, this process is iterated
until there are no composite attribute subsets left.
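For depth d = 2, the re-sharing idea can be illustrated with plain additive secret sharing, a simplified stand-in for the LSSS used by the scheme; the subset names and sizes below are hypothetical:

```python
import random

p = 2**127 - 1  # Mersenne prime, used as an example modulus

def additive_share(secret, parts):
    """Split secret into `parts` additive shares over Z_p."""
    shares = [random.randrange(p) for _ in range(parts - 1)]
    shares.append((secret - sum(shares)) % p)
    return shares

# Level 1: share the secret s among the attribute subsets A_0, ..., A_k.
s = 123456789
subsets = {"A0": 2, "A1": 3, "A2": 2}          # subset -> number of attributes
d = dict(zip(subsets, additive_share(s, len(subsets))))

# Level 2: re-share each subset's share d_i among its own attributes.
attr_shares = {name: additive_share(d[name], n) for name, n in subsets.items()}

# Only attributes from the SAME subset can rebuild that subset's share d_i,
# and all subset shares together rebuild s.
assert all(sum(attr_shares[a]) % p == d[a] for a in subsets)
assert sum(d.values()) % p == s
```

Mixing shares from different subsets yields a random value, which mirrors the restriction that only attributes within one set can jointly satisfy the policy.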
Assumption. Initially the challenger calls the group generation algorithm with
the security parameter as input, picks a random group element g ∈ G_0 and
random exponents s, a, b_1, · · · , b_q ∈ Z_p, and gives the adversary the terms:
g, g^s
g^{a^i}, g^{b_j}, g^{s b_j}, g^{a^i b_j}, g^{a^i/b_j^2}    ∀(i, j) ∈ [q, q]
g^{a^i b_j/b_{j'}^2}    ∀(i, j, j') ∈ [2q, q, q] with j ≠ j'
g^{a^i/b_j}    ∀(i, j) ∈ [2q, q] with i ≠ q + 1
g^{s a^i b_j/b_{j'}}, g^{s a^i b_j/b_{j'}^2}    ∀(i, j, j') ∈ [q, q, q] with j ≠ j'
It is hard for the adversary to distinguish e(g, g)^{s a^{q+1}} ∈ G_1 from an element
which is randomly chosen from G_1.
We say that the q-type assumption holds if no PPT adversary has a non-
negligible advantage in solving the q-type problem.
Selective security model. We now define the security model for our large
universe CP-ASBE (LU-CP-ASBE) scheme. In our LU-CP-ASBE model,
attributes are divided into simple attributes and composite attributes. Note that
once some component in the composite attribute sets satisfies the access struc-
ture, the associated user is said to be authorized. We describe the security model
as a game between an adversary A and a challenger B, parameterized by the
security parameter λ ∈ N. The phases of the game are as follows:
3 Our Construction
In this section, we present the construction of our LU-CP-ASBE scheme, in
which attributes are assumed to be divided into simple attributes and composite
attributes.
The algorithm calls the group generation algorithm G(1^λ) and gets the
descriptions of the groups and the bilinear mapping D = (p, G_0, G_1, e). Then
it picks random terms g, u, h, w, v ∈ G_0 and α, β ∈ Z_p. The setup algo-
rithm issues the public parameters PK as (D, g, u, h, w, v, X, Y) and keeps
the master key MK = (α, β) secret.
Note that operations on exponents are taken modulo the order p of the group,
which is prime.
It then chooses random values t_{θ,τ} ∈ Z_p for every θ ∈ [0, γ], τ ∈ [n_θ] and
computes
C = m · e(g, g)^{αs},   D_0 = g^s,
C_{τ,1}^{(θ)} = w^{λ_X} v^{t_X},  C_{τ,2}^{(θ)} = (u^{ρ(X)} h)^{−t_X},  C_{τ,3}^{(θ)} = g^{t_X},  Ĉ_τ^{(θ)} = X^{λ_X}
The function ψ(i) gives the subset A_{ψ(i)} that ρ(i) belongs to, and the function
Φ(i) gives the position of ρ(i) in A_{ψ(i)}. Denote the set {i ∈ I : ψ(i) = θ} by I_θ.
Now the decryption algorithm calculates
F = Π_{θ∈ψ(I)} Π_{i∈I_θ} e((Ĉ_{Φ(i)}^{(θ)})^{d_i}, L^{(θ)}) / Π_{θ∈ψ(I)} Π_{i∈I_θ} (e(K_1^{(θ)}, C_{Φ(i),1}^{(θ)}) e(K_{τ,2}^{(θ)}, C_{Φ(i),2}^{(θ)}) e(K_{τ,3}^{(θ)}, C_{Φ(i),3}^{(θ)}))^{d_i},
where τ is the index of the attribute ρ(i) in subset A_θ. The algorithm outputs
the plaintext m as C · F / e(D_0, K_0).
– Correctness.
F_θ = Π_{i∈I_θ} (e(K_1^{(θ)}, C_{Φ(i),1}^{(θ)}) e(K_{τ,2}^{(θ)}, C_{Φ(i),2}^{(θ)}) e(K_{τ,3}^{(θ)}, C_{Φ(i),3}^{(θ)}))^{d_i} = Π_{i∈I_θ} e(g, w)^{r_θ d_i λ_i}.
Then we have F = Π_{θ∈ψ(I)} F_θ = e(g, w)^{r Σ_{i∈I} d_i λ_i} = e(g, w)^{rs} and m =
C · F / e(D_0, K_0) = m e(g, g)^{αs} e(g, w)^{rs} / e(g^s, g^α w^r).
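As a sanity check of the exponent arithmetic in this correctness argument, one can simulate a symmetric pairing in the exponent, e(g^x, g^y) ↦ g_T^{xy}, over a toy group. This is only an arithmetic check with made-up parameters (P, gT, the exponent names), not a secure bilinear group:

```python
# Toy check of the decryption equation m = C * F / e(D0, K0),
# simulating the pairing in the exponent: e(g^x, g^y) -> gT^(x*y).
import random

P = (1 << 61) - 1           # hypothetical prime modulus for the target group
gT = 3                      # stand-in generator of the target group

def e_exp(x, y):
    """Pairing simulated in the exponent: e(g^x, g^y) = gT^(x*y) mod P."""
    return pow(gT, (x * y) % (P - 1), P)

alpha, w_exp, r, s = (random.randrange(1, P - 1) for _ in range(4))
m = 424242                  # plaintext encoded as a target-group element

C = m * pow(gT, (alpha * s) % (P - 1), P) % P     # C = m * e(g,g)^{alpha*s}
F = pow(gT, (w_exp * r * s) % (P - 1), P)         # F = e(g,w)^{rs}
e_D0_K0 = e_exp(s, alpha + w_exp * r)             # e(g^s, g^alpha * w^r)

# The blinding exponents cancel exactly: alpha*s + w*r*s - s*(alpha + w*r) = 0.
recovered = C * F * pow(e_D0_K0, -1, P) % P
assert recovered == m
```

Because the exponent identity holds exactly, the check succeeds for any random choice of α, w, r, s.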
– Phase 1. Now the challenger B has to produce secret keys for tuples con-
sisting of non-authorized attribute sets S = {A_0, A_1, A_2, · · · , A_k}, where
A_i = {a_{i1}, a_{i2}, · · · , a_{in_i}}. The only restriction is that S does not satisfy A*.
Consequently, there exists a vector d = (d_1, d_2, · · · , d_n)^T ∈ Z_p^n such that
d_1 = −1 and ⟨M_i*, d⟩ = 0 for all i ∈ I = {i ∈ [l] : ρ*(i) ∈ S}. B computes
d using linear algebra. Then B picks r̃ for the user and r̃_θ (θ ∈ [k]) for each
attribute subset randomly from Z_p, and for simplicity we let r̃_0 = r̃. Then B
implicitly sets
r = r̃ − d_1 a^q − · · · − d_n a^{q+1−n} = r̃ − Σ_{i∈[n]} d_i a^{q+1−i},
r_θ = r̃_θ + d_1 a^q + · · · + d_n a^{q+1−n} = r̃_θ + Σ_{i∈[n]} d_i a^{q+1−i}  (θ ∈ [0, k]).
Each r_θ is properly distributed thanks to r̃_θ. Then, using the suitable terms
from the assumption, B calculates:
K_0^{(θ)} = g^α w^{r_θ} = g^{α̃} (g^a)^{r̃_θ} Π_{i=2}^{n} (g^{a^{q+2−i}})^{d_i},
K_1^{(θ)} = g^{r_θ} = g^{r̃_θ} Π_{i∈[n]} (g^{a^{q+1−i}})^{d_i},
L^{(θ)} = g^{(r+r_θ)/β} = g^{(r̃_0+r̃_θ)s/β̃}.
Additionally, for each attribute a_{θτ} in attribute subset A_θ, B computes the
terms K_{τ,2}^{(θ)} = g^{r_{θ,τ}} and K_{τ,3}^{(θ)} = (u^{a_{θτ}} h)^{r_{θ,τ}} v^{−r_θ}. The part v^{−r_θ} is
v^{−r̃_θ} (g^{ṽ} · Π_{(j,k)∈[l,n]} (g^{a^k/b_j})^{M*_{j,k}})^{−Σ_{i∈[n]} d_i a^{q+1−i}}
= v^{−r̃_θ} Π_{i∈[n]} (g^{a^{q+1−i}})^{−ṽ d_i} · Π_{(i,j,k)∈[n,l,n]} g^{−d_i M*_{j,k} a^{q+1+k−i}/b_j}
= v^{−r̃_θ} Π_{i∈[n]} (g^{a^{q+1−i}})^{−ṽ d_i} · Π_{(i,j,k)∈[n,l,n], i≠k} (g^{a^{q+1+k−i}/b_j})^{−d_i M*_{j,k}} · Π_{(i,j)∈[n,l]} g^{−d_i M*_{j,i} a^{q+1}/b_j}
= Φ · Π_{j∈[l]} g^{−⟨M*_j, d⟩ a^{q+1}/b_j} = Φ · Π_{j∈[l], ρ*(j)∉S} g^{−⟨M*_j, d⟩ a^{q+1}/b_j},
where Φ denotes the product of all factors except the last one.
The Φ part can be calculated by the simulator using the assumption, while
the second part cannot. Simulator B therefore implicitly sets
r_{θ,τ} = r̃_{θ,τ} + r_θ · Σ_{i'∈[l], ρ*(i')∉S} b_{i'} / (a_{θτ} − ρ*(i'))
= r̃_{θ,τ} + r̃_θ · Σ_{i'∈[l], ρ*(i')∉S} b_{i'} / (a_{θτ} − ρ*(i')) + Σ_{(i,i')∈[n,l], ρ*(i')∉S} d_i b_{i'} a^{q+1−i} / (a_{θτ} − ρ*(i')),
where r_{θ,τ} is properly distributed. Notice that r_{θ,τ} is well defined only for
attributes that have nothing to do with the policy; therefore, the denominators
a_{θτ} − ρ*(i') are non-zero. The (u^{a_{θτ}} h)^{r_{θ,τ}} part in K_{τ,3}^{(θ)} is computed as
(u^{a_{θτ}} h)^{r̃_{θ,τ}} · (K_{τ,2}^{(θ)}/g^{r̃_{θ,τ}})^{ũ a_{θτ}+h̃}
= Ψ · Π_{j∈[l], ρ*(j)∉S} g^{r̃_θ M*_{j,k} b_i a^k (a_{θτ}−ρ*(j)) / (b_j^2 (a_{θτ}−ρ*(i')))},
where Ψ and K_{τ,2}^{(θ)} can be calculated using the terms in our assumption.
The non-computable parts of (u^{a_{θτ}} h)^{r_{θ,τ}} and v^{−r_θ} cancel with each
other. In this way simulator B can calculate K_{τ,2}^{(θ)} and K_{τ,3}^{(θ)} and send the
decryption key SK = (S, {K_0^{(θ)}, K_1^{(θ)}, L^{(θ)}, K_{τ,2}^{(θ)}, K_{τ,3}^{(θ)}}_{θ∈[0,k], τ∈[n_θ]}) to A.
– Challenge. The adversary A submits two equal-length messages m_0 and m_1.
Then B flips a random coin b ← {0, 1} and constructs C = m_b · T · e(g, g)^{α̃s} and
D_0 = g^s, where T is the challenge term. B is then supposed to generate the
other components of the ciphertext, and it implicitly sets y = (s, sa + ỹ_2, sa^2 +
ỹ_3, · · · , sa^{n−1} + ỹ_n)^T, where ỹ_2, ỹ_3, · · · , ỹ_n are chosen randomly from Z_p.
Since λ = M* y, we have
λ_X = Σ_{i∈[n]} M*_{X,i} s a^{i−1} + Σ_{i=2}^{n} M*_{X,i} ỹ_i = Σ_{i∈[n]} M*_{X,i} s a^{i−1} + λ̃_X for each
row X ∈ [l]. For each row, B implicitly sets t_X = −s b_X, which is properly
distributed. Using this, B calculates
C_{τ,1}^{(θ)} = w^{λ_X} v^{t_X}
= w^{λ̃_X} · Π_{i∈[n]} g^{M*_{X,i} s a^i} · g^{−s b_X ṽ} · Π_{(j,k)∈[l,n]} (g^{s a^k b_X/b_j})^{−M*_{j,k}}
= w^{λ̃_X} · g^{−s b_X ṽ} · Π_{(j,k)∈[l,n], j≠X} (g^{s a^k b_X/b_j})^{−M*_{j,k}},
where X = Σ_{i=0}^{θ−1} n_i + τ.
By using t_X = −s b_X, the term v^{t_X} cancels the unknown powers of w^{λ_X}, and
similarly, by using β = β̃/s, the unknown powers in Ĉ_θ can also be canceled.
Now there is nothing non-computable for B in the terms C_{τ,2}^{(θ)}, C_{τ,3}^{(θ)} and Ĉ_θ.
So far, B has successfully generated a correct ciphertext under the access struc-
ture (M, ρ) using the suitable terms in our assumption and the public parameters
PK. Finally, B sends the challenge ciphertext
CT = (C, (M, ρ), D_0, {C_{τ,1}^{(θ)}, C_{τ,2}^{(θ)}, C_{τ,3}^{(θ)}}_{θ∈[0,m], τ∈[n_θ]}, {Ĉ_θ}_{θ∈[m]})
to the attacker A.
– Phase 2. Phase 1 is repeated.
– Guess. The adversary A outputs a guess b' of b to B. If b' = b,
B outputs 0 and claims that the challenge term is T = e(g, g)^{a^{q+1} s}; otherwise,
it outputs 1, claiming that the challenge term T is random.
Since the probability that T = e(g, g)^{a^{q+1} s} equals 1/2, B has an advantage of
Adv_A/2 in breaking the q-type assumption.
5 Conclusion
In this paper, we proposed a feasible and efficient attribute-set based encryp-
tion scheme, which can be applied in scenarios where many attributes are only
meaningful in groups or in sets as they describe users. Our scheme is large-
universe, unbounded, and powerful in expressing complex access policies. Addi-
tionally, it is proved selectively secure under the q-type assumption.
References
1. Aluvalu, R., Kamliya, V.: A survey on hierarchical attribute set based encryption
(HASBE) access control model for cloud computing. Int. J. Comput. Appl. 112(7),
4–7 (2015)
2. Ambrosin, M., Conti, M., Dargahi, T.: On the feasibility of attribute-based encryp-
tion on smartphone devices, pp. 49–54 (2015)
3. Beimel, A.: Secure schemes for secret sharing and key distribution. Ph.D. thesis,
Technion - Israel Institute of Technology (1996)
4. Bethencourt, J., Sahai, A., Waters, B.: Ciphertext-policy attribute-based encryp-
tion. In: IEEE Symposium on Security and Privacy, pp. 321–334 (2007)
5. Bobba, R., Khurana, H., Prabhakaran, M.: Attribute-sets: a practically moti-
vated enhancement to attribute-based encryption. In: Backes, M., Ning, P. (eds.)
ESORICS 2009. LNCS, vol. 5789, pp. 587–604. Springer, Heidelberg (2009).
https://doi.org/10.1007/978-3-642-04444-1_36
6. Deng, H., Wu, Q., Qin, B., Domingo-Ferrer, J., Zhang, L., Liu, J., Shi, W.:
Ciphertext-policy hierarchical attribute-based encryption with short ciphertexts.
Inf. Sci. 275(11), 370–384 (2014)
7. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for fine-
grained access control of encrypted data. In: ACM Conference on Computer and
Communications Security, pp. 89–98 (2006)
8. Lewko, A.: Tools for simulating features of composite order bilinear groups in the
prime order setting. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012.
LNCS, vol. 7237, pp. 318–335. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_20
9. Lewko, A., Okamoto, T., Sahai, A., Takashima, K., Waters, B.: Fully secure
functional encryption: attribute-based encryption and (hierarchical) inner prod-
uct encryption. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp.
62–91. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_4
10. Li, C., Fang, Y., Zhang, X., Jin, C., Shen, Q., Wu, Z.: A practical construction for
large universe hierarchical attribute-based encryption. Concurr. Comput. Pract.
Exp. 29(17) (2017)
11. Li, J., Wang, Q., Wang, C., Ren, K.: Enhancing attribute-based encryption with
attribute hierarchy. Mob. Netw. Appl. 16(5), 553–561 (2011)
12. Liu, J., Wan, Z., Gu, M.: Hierarchical attribute-set based encryption for scalable,
flexible and fine-grained access control in cloud computing. In: Bao, F., Weng,
J. (eds.) ISPEC 2011. LNCS, vol. 6672, pp. 98–107. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-21031-0_8
13. Okamoto, T., Takashima, K.: Fully secure unbounded inner-product and attribute-
based encryption. In: Wang, X., Sako, K. (eds.) ASIACRYPT 2012. LNCS, vol.
7658, pp. 349–366. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34961-4_22
14. Perumal, B., Rajasekaran, M.P., Duraiyarasan, S.: An efficient hierarchical
attribute set based encryption scheme with revocation for outsourcing personal
health records in cloud computing. In: International Conference on Advanced Com-
puting and Communication Systems, pp. 1–5 (2014)
15. Ragesh, G.K., Baskaran, K.: Privacy preserving ciphertext policy attribute set
based encryption (PP-CP-ASBE) scheme for patient centric data access control
in cloud assisted WBANs. In: ACCIS 2014. Elsevier (2014)
16. Rouselakis, Y., Waters, B.: Practical constructions and new proof methods for large
universe attribute-based encryption. In: ACM SIGSAC Conference on Computer
and Communications Security, pp. 463–474 (2013)
17. Sahai, A., Waters, B.: Fuzzy identity-based encryption. In: Cramer, R. (ed.) EURO-
CRYPT 2005. LNCS, vol. 3494, pp. 457–473. Springer, Heidelberg (2005). https://doi.org/10.1007/11426639_27
18. Wan, Z., Liu, J., Deng, R.H.: HASBE: a hierarchical attribute-based solution for
flexible and scalable access control in cloud computing. IEEE Trans. Inf. Forensics
Secur. 7(2), 743–754 (2012)
19. Wang, G., Liu, Q., Wu, J.: Hierarchical attribute-based encryption for fine-grained
access control in cloud storage services. In: ACM Conference on Computer and
Communications Security, pp. 735–737 (2010)
20. Wang, S., Feng, F.: Large universe attribute-based encryption scheme from lattices.
Comput. Sci. 17(7), 327 (2014)
21. Zhang, Y., Li, J., Zheng, D., Chen, X., Li, H.: Accountable large-universe attribute-
based encryption supporting any monotone access structures. In: Liu, J.K.K., Ste-
infeld, R. (eds.) ACISP 2016. LNCS, vol. 9722, pp. 509–524. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-40253-6_31
Fully Secure Hidden Ciphertext-Policy
Attribute-Based Proxy Re-encryption
Xinyu Feng1,2 , Cong Li1,2 , Dan Li1,2 , Yuejian Fang1,2 , and Qingni Shen1,2(B)
1 School of Software and Microelectronics, Peking University, Beijing, China
{xyf,li.cong,lidan.sichuan.yaan}@pku.edu.cn,
{fangyj,qingnishen}@ss.pku.edu.cn
2 National Engineering Research Center for Software Engineering,
Peking University, Beijing, China
1 Introduction
Attribute-Based Encryption (ABE), which provides fine-grained access control,
is a good solution to the secure sharing of cloud data. There are two main
types of ABE schemes: Key-Policy ABE (KP-ABE), where the ciphertexts
are associated with sets of attributes and the keys are associated with access
policies; and Ciphertext-Policy ABE (CP-ABE), where the keys are associated
with sets of attributes and the ciphertexts are associated with access policies.
Attribute-based proxy re-encryption (AB-PRE) is an application of proxy
cryptography in ABE [4,15,19,26]. AB-PRE schemes allow the data owner to
delegate the capability of re-encryption to the semi-trusted proxy. In this way,
the proxy is capable of running the re-encryption operation, which reduces the
computation cost of the data owner. An authorized user is able to decrypt the
re-encrypted data just using his/her own secret key and no additional component
is needed. Moreover, no sensitive data can be revealed by the proxy. However,
there exists a problem in current ciphertext-policy attribute-based proxy re-
encryption (CP-AB-PRE) schemes [4,15,19,27]. In these schemes, the ciphertext
policy which consists of the user’s attributes is exposed to the proxy, thus, the
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 192–204, 2018.
https://doi.org/10.1007/978-3-319-89500-0_17
proxy can learn information about the attributes of both the owner and the user.
A user's attributes may contain his/her sensitive information. These data relate
to user privacy and should not be exposed to a third party.
To solve the problem mentioned above, we borrow the concept of hidden
policies that appears in the schemes [9,12,13,21,22] to propose a hidden
ciphertext-policy attribute-based proxy re-encryption scheme. By using our
scheme, the proxy can obtain little sensitive or private information about the user.
Our Contributions. By employing an AND-gates policy, we propose the first
fully secure hidden CP-AB-PRE scheme, which better protects the user's pri-
vacy. Our scheme has the following properties:
– Unidirectionality (a ciphertext CT can be transformed into CT', but CT'
cannot be transformed back into CT).
– Non-interactivity (the data owner can generate the re-encryption key by
himself without any participation of an untrusted third party).
– Multi-use (the encrypted data can be re-encrypted multiple times).
– Master key security (neither the proxy nor the user needs the data
owner's secret key during the re-encryption and decryption processes).
– Re-encryption control (the data owner can determine whether the
encrypted data can be re-encrypted).
– Collusion resistance (users cannot combine their keys to obtain a
plaintext that belongs to none of them).
Table 1 compares our CP-AB-PRE scheme with other schemes on the main
features.

Table 1. Comparison of the main features.

Schemes                Liang et al. [15]  Luo et al. [19]  Do et al. [4]  Liang et al. [14]  Ours
Unidirectionality             √                 √               √               √            √
Non-interactive               √                 √               √               √            √
Multi-use                     ×                 √               √               √            √
Master key security           √                 √               √               √            √
Re-encryption control         ×                 √               √               √            √
Collusion resistant           √                 √               √               √            √
Hidden policy                 ×                 ×               ×               ×            √
Fully secure                  ×                 ×               ×               √            √
Related Work. Proxy re-encryption was first proposed by Blaze et al. [2]; it
can transform a ciphertext under one key into a ciphertext under another key
without revealing the secret key or the plaintext. However, an unrealistic level
of trust in the proxy is required to achieve the delegation, because sensitive
information can be revealed during the re-encryption process. To solve this
problem, Ateniese et al. proposed a new proxy re-encryption scheme in 2005 [1].
Green and
We take AND-gates as the basic access policy in our scheme, where negative
attributes and wildcards are supported. A negative attribute denotes that a user
must not have this attribute, and a wildcard means the attribute is out of con-
sideration. Multi-valued attributes are also supported in our scheme.
We use notation such as W = [W_1, · · · , W_n] = [1, 0, ∗, ∗, 0], where n = 5,
to specify the ciphertext policy. The wildcard ∗ in the ciphertext policy denotes
a "don't care" value; the policy can be considered as an AND-gate over all the
attributes. For example, the above ciphertext policy means that a recipient who
wants to decrypt must have the value 1 for W_1, 0 for W_2 and W_5, while the
values for W_3 and W_4 do not matter in the AND-gate. A recipient with attribute
list [1, 0, 1, 0, 0] can decrypt the ciphertext, but a recipient with attribute list
[1, 1, 1, 0, 1] cannot.
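The AND-gate matching described above can be sketched as follows, using the attribute lists and policy from the example:

```python
# A minimal sketch of AND-gates policy matching with wildcards.
WILDCARD = "*"

def satisfies(attrs, policy):
    """Return True iff every non-wildcard policy slot matches the attribute."""
    assert len(attrs) == len(policy)
    return all(w == WILDCARD or a == w for a, w in zip(attrs, policy))

W = [1, 0, WILDCARD, WILDCARD, 0]
assert satisfies([1, 0, 1, 0, 0], W)        # W3, W4 are "don't care"
assert not satisfies([1, 1, 1, 0, 1], W)    # W2 and W5 mismatch
```

In the actual scheme this check is never evaluated in the clear; it is enforced implicitly by whether the pairing equations during decryption cancel.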
The definition of full security for the CP-ABE system is given by a security
game between a challenger and an attacker, which proceeds as follows:
– Setup. The challenger runs the Setup algorithm and sends the public param-
eters PP to the attacker, keeping the master key MSK to itself.
– Phase 1. The attacker adaptively makes queries for private keys cor-
responding to sets of attributes S_1, . . . , S_{Q1} to the challenger. Each
time, the challenger responds with a secret key obtained by running
KeyGen(MSK, PP, S_k). The attacker may also request re-encryption
keys for access policies W, and the challenger runs the RKGen(SK_L, W)
algorithm to respond.
– Challenge. The attacker selects two messages M_0 and M_1 of the same
length and an access structure W. The challenger flips a random coin b ∈
{0, 1} and encrypts M_b under W to generate CT. It sends CT to the attacker.
– Phase 2. Phase 2 is similar to Phase 1, except that the attacker requests
private keys corresponding to sets of attributes S_{Q1+1}, . . . , S_Q adaptively.
Notice that none of the queried attribute sets may satisfy the access structure
W from the challenge phase.
– Guess. The attacker outputs a guess b' for b.
The advantage of an attacker in this game is defined to be Pr[b = b'] − 1/2.
3 Our Construction
Setup(1^k, n). A trusted authority generates a tuple G = [p, G, G_T, g ∈ G, e] and
a random w ∈ Z*_p. For each attribute i, where 1 ≤ i ≤ n, the authority generates
random values {a_{i,t}, b_{i,t} ∈ Z*_p}_{1≤t≤n_i} and random points {A_{i,t} ∈ G}_{1≤t≤n_i}. It
computes Y = e(g, g)^w. The public key PK and the master key MK are
PK = {Y, p, G, G_T, g, e, {{A_{i,t}^{a_{i,t}}, A_{i,t}^{b_{i,t}}}_{1≤t≤n_i}}_{1≤i≤n}},
C_0' = h^r. For 1 ≤ i ≤ n, it picks random values {r_{i,t} ∈ Z*_p}_{1≤t≤n_i} and com-
putes C_{i,t,1}, C_{i,t,2} as follows: if v_{i,t} ∈ W_i, C_{i,t,1} = (A_{i,t}^{b_{i,t}})^{r_{i,t}}, C_{i,t,2} = (A_{i,t}^{a_{i,t}})^{r−r_{i,t}}
(well-formed); if v_{i,t} ∉ W_i, C_{i,t,1}, C_{i,t,2} are random (mal-formed). The ciphertext
CT is: CT = {C̃, C_0, C_0', {{C_{i,t,1}, C_{i,t,2}}_{1≤t≤n_i}}_{1≤i≤n}}.
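The well-formed/mal-formed split above can be illustrated structurally in a toy group (the modulus, generator, and attribute values below are hypothetical; the real scheme uses the bilinear group and the published A_{i,t}^{a_{i,t}}, A_{i,t}^{b_{i,t}} values):

```python
# Structural sketch of the hidden-policy idea: components for values in W_i
# split the randomness r (well-formed), all others are just random pairs, so
# the ciphertext alone does not reveal which values the policy accepts.
import random

P = (1 << 61) - 1   # hypothetical prime modulus
g = 5               # stand-in group generator

def enc_components(policy_values, all_values, r):
    comps = {}
    for v in all_values:
        if v in policy_values:                       # well-formed pair
            r_v = random.randrange(1, P - 1)
            comps[v] = (pow(g, r_v, P), pow(g, (r - r_v) % (P - 1), P))
        else:                                        # mal-formed: random pair
            comps[v] = (random.randrange(1, P), random.randrange(1, P))
    return comps

r = random.randrange(1, P - 1)
comps = enc_components({"phd"}, {"phd", "msc", "bsc"}, r)
# A well-formed pair recombines to g^r; a random pair almost surely does not.
assert comps["phd"][0] * comps["phd"][1] % P == pow(g, r, P)
```

Crucially, both kinds of pairs look like uniformly random group elements to anyone who does not hold a matching key, which is what keeps the policy hidden from the proxy.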
e(g, g)^{r s_i} e(g, h)^{r d}; then compute C̄ = e(C_0, D_0) Π_{i=1}^{n} E_i = e(g, g)^{wr} e(g, h)^{nrd};
the re-encrypted ciphertext is formed as CT' = {C̃, C_0, C̄, C'}.
4 Security Proof
We prove our scheme fully secure using the dual system encryption technique [10]
under the general subgroup decision assumption, the three-party Diffie-Hellman
assumption in a subgroup, and the source group q-parallel BDHE assumption in
a subgroup.
Let Game_real denote the real security game defined in Sect. 2.3. We assume
g_2 ∈ G_{p2} and give the definitions of semi-functional keys and semi-functional
ciphertexts.
– Game_k. Let Q denote the total number of key queries from the attacker. In
this game, the ciphertext given to the attacker is semi-functional, as are the
first k keys. The remaining keys are normal.
We define a sequence of transitions to complete our security proof. At the
beginning, we transit from Game_real to Game_0, then from Game_0 to Game_1,
and so on, finally from Game_{Q−1} to Game_Q. In Game_Q, the ciphertext as
well as all the keys given to the attacker are semi-functional. We then transit
from Game_Q to Game_final, which is similar to Game_Q except that the
ciphertext given to the attacker is a semi-functional encryption of a random
message.
To complete the transition from Game_{k−1} to Game_k, we define another two
types of semi-functional keys as follows:
– Nominal Semi-functional Keys. The nominal semi-functional keys share
the values a_{i,t_i}, b_{i,t_i}, u_{i,t_i} with the semi-functional ciphertext. Choose
random exponents s and s_i. The nominal semi-functional keys are formed as:
D_0 · g_2^{−s},  D_{i,0} · g_2^{s_i + u_{i,t_i} a_{i,t_i} b_{i,t_i} λ_i},  D_{i,1} · g_2^{a_{i,t_i} λ_i},  D_{i,2} · g_2^{b_{i,t_i} λ_i}.
Ai,t , w, ui,t , ai,t , bi,t are selected randomly by B, and the master key is known
to B. B sends the public parameters to A. When A requests a secret key,
or a re-encryption key, B runs the normal KeyGen algorithm or the normal
RKGen algorithm to generate the requested one.
On the other hand, A is allowed to request a challenge ciphertext. A first
selects two messages M_0 and M_1 of the same length and an access policy
W, then sends them to B. B flips a coin to choose a random bit b and then
encrypts M_b (b ∈ {0, 1}) under W as follows. It implicitly sets g^r equal to
the G_{p1} part of T. It also chooses r̃_{i,t}, r' ∈ Z_N, ∀t ∈ [1, n_i], ∀i ∈ [1, n], and
implicitly sets r · r̃_{i,t} = r_{i,t}. The ciphertext is formed as:
C̃ = M_b e(g_1, T)^w,  C_0 = T,  C_0' = T^{r'},  C_{i,t,1} = (T^{u_{i,t} b_{i,t}})^{r̃_{i,t}},  C_{i,t,2} = (T^{u_{i,t} a_{i,t}})^{1−r̃_{i,t}}.
Ai,t , w, ui,t , ai,t , bi,t are selected randomly by B, and the master key is known
to B. B sends the public parameters to A. When A requests a secret key
or a re-encryption key, B runs the normal KeyGen algorithm or the normal
RKGen algorithm to generate the requested one.
200 X. Feng et al.
Ai,t , w, ui,t , ai,t , bi,t are selected randomly by B, and the master key is known
to B. B sends the public parameters to A. When A requests a secret key,
or a re-encryption key, B runs the normal KeyGen algorithm or the normal
RKGen algorithm to generate the requested one.
In response to A's first k − 1 key requests, B generates semi-functional keys
by first running the normal KeyGen algorithm and then multiplying D_0 by a
random element of G_{p2}.
To answer the k-th key query from A, B first runs the normal KeyGen algorithm
to generate a normal key D_0, {D_{i,0}, D_{i,1}, D_{i,2}}_{1≤i≤n}. It then chooses random
exponents s_i, u_{i,t}, a_{i,t}, b_{i,t}, λ_i ∈ Z_N; the key is formed as:
D_0 · T,  D_{i,0} · g_2^{s_i + u_{i,t} a_{i,t} b_{i,t} λ_i},  D_{i,1} · g_2^{a_{i,t} λ_i},  D_{i,2} · g_2^{b_{i,t} λ_i}.
Then B runs the RKGen algorithm and generates the re-encryption key:
D_0 · T,  D_{i,0} · g_2^{s_i + u_{i,t} a_{i,t} b_{i,t} λ_i} h^{r},  D_{i,1} · g_2^{a_{i,t} λ_i},  D_{i,2} · g_2^{b_{i,t} λ_i}.
If T = g2xyz , this will be a properly distributed nominal semi-functional key,
and when T is random in Gp2 , this will be a properly distributed temporary
semi-functional key.
To generate the semi-functional challenge ciphertext for message M_b and
access policy W, B first runs the normal Encrypt algorithm to generate
a normal ciphertext C̃, C_0, C_0', {{C_{i,t,1}, C_{i,t,2}}_{1≤t≤n_i}}_{1≤i≤n}. It then chooses
random exponents r', r'_{i,t} ∈ Z*_p. The semi-functional ciphertext is formed as:
C̃ = M_b e(g, g)^{wr},  C_0 · g_2^{r'},  C_0' · g_2^{r'},  C_{i,t,1} · g_2^{u_{i,t} b_{i,t} r'_{i,t}},  C_{i,t,2} · g_2^{u_{i,t} a_{i,t} r'_{i,t}}.
– Lemma 4. No PPT attacker can achieve a non-negligible difference in advan-
tage between Game_k^T and Game_k for any k from 1 to Q. We prove this
lemma under the general subgroup decision assumption.
– Proof. The proof of this lemma is similar to that of Lemma 2, except that B
uses Y_2 Y_3 to place a random G_{p2} component on the D_0 part of the k-th key,
making it a semi-functional key in the case that T has no G_{p2} component.
– Lemma 5. No PPT attacker can achieve a non-negligible difference in advan-
tage between Game_Q and Game_final. We prove this lemma under the basic
generic group assumption.
– Proof. Given a PPT attacker A achieving a non-negligible difference
in advantage between Game_Q and Game_final, we will create a PPT
algorithm B to break the basic generic group assumption. B is given
g_1, g_2, g_3, g_1^w X_2, g_1^r Y_2, T, where T is either e(g_1, g_1)^{wr} or a random
element of G_T. Depending on the value of T, B simulates either Game_Q or
Game_final with A.
B first runs the Setup algorithm and generates the public parameters:
N, p, G, G_T, g_1, e,  Y = e(g_1, g_1)^w,  {{A_{i,t}^{a_{i,t}} = g_1^{u_{i,t} a_{i,t}},  A_{i,t}^{b_{i,t}} = g_1^{u_{i,t} b_{i,t}}}_{1≤t≤n_i}}_{1≤i≤n}.
Then B runs the RKGen algorithm and generates the re-encryption key:
D_0 = (g_1^w X_2) g_1^{−s} R g_2^{r},  D_{i,0} = g_1^{s_i} g_1^{u_{i,t} a_{i,t} b_{i,t} λ_i} h^{r} R_0,  D_{i,1} = g^{a_{i,t_i} λ_i} R_1,  D_{i,2} = g^{b_{i,t_i} λ_i} R_2.
5 Conclusion
In this work, we propose a hidden ciphertext-policy attribute-based proxy re-
encryption scheme, which solves the problem of privacy leakage during the re-
encryption process. In addition, we prove our scheme fully secure in the standard
model. In future work, we intend to design a new CP-AB-PRE scheme that
reduces the computation cost of the re-encryption process and provides more
expressive policies.
References
1. Ateniese, G., Fu, K., Green, M., Hohenberger, S.: Improved proxy re-encryption
schemes with applications to secure distributed storage. ACM Trans. Inf. Syst.
Secur. 9(1), 1–30 (2006)
2. Blaze, M., Bleumer, G., Strauss, M.: Divertible protocols and atomic proxy cryp-
tography. In: Nyberg, K. (ed.) EUROCRYPT 1998. LNCS, vol. 1403, pp. 127–144.
Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0054122
3. Chu, C.-K., Tzeng, W.-G.: Identity-based proxy re-encryption without random
oracles. In: Garay, J.A., Lenstra, A.K., Mambo, M., Peralta, R. (eds.) ISC 2007.
LNCS, vol. 4779, pp. 189–202. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75496-1_13
4. Do, J.M., Song, Y.J., Park, N.: Attribute based proxy re-encryption for data con-
fidentiality in cloud computing environments. In: First ACIS/JNU International
Conference on Computers, Networks, Systems and Industrial Engineering, pp. 248–
251 (2011)
5. Green, M., Ateniese, G.: Identity-based proxy re-encryption. In: Katz, J., Yung,
M. (eds.) ACNS 2007. LNCS, vol. 4521, pp. 288–306. Springer, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-72738-5_19
6. Green, M., Hohenberger, S., Waters, B.: Outsourcing the decryption of ABE cipher-
texts. In: Usenix Conference on Security, pp. 34–34 (2011)
7. Hohenberger, S., Rothblum, G.N., Shelat, A., Vaikuntanathan, V.: Securely obfus-
cating re-encryption. In: Vadhan, S.P. (ed.) TCC 2007. LNCS, vol. 4392, pp. 233–
252. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-70936-7 13
8. Hur, J., Noh, D.K.: Attribute-based access control with efficient revocation in data
outsourcing systems. IEEE Trans. Parallel Distrib. Syst. 22(7), 1214–1221 (2011)
9. Lai, J., Deng, R.H., Li, Y.: Fully secure ciphertext-policy hiding CP-ABE. In: Bao,
F., Weng, J. (eds.) ISPEC 2011. LNCS, vol. 6672, pp. 24–39. Springer, Heidelberg
(2011). https://doi.org/10.1007/978-3-642-21031-0 3
10. Lewko, A., Waters, B.: New proof methods for attribute-based encryption: achiev-
ing full security through selective techniques. In: Safavi-Naini, R., Canetti, R.
(eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 180–198. Springer, Heidelberg (2012).
https://doi.org/10.1007/978-3-642-32009-5 12
11. Li, H., Pang, L.: Efficient and adaptively secure attribute-based proxy reencryption
scheme. Int. J. Distrib. Sens. Netw. 12, 1–12 (2016)
12. Li, J., Ren, K., Zhu, B., Wan, Z.: Privacy-aware attribute-based encryption with
user accountability. In: Samarati, P., Yung, M., Martinelli, F., Ardagna, C.A. (eds.)
ISC 2009. LNCS, vol. 5735, pp. 347–362. Springer, Heidelberg (2009). https://doi.
org/10.1007/978-3-642-04474-8 28
13. Li, X., Gu, D., Ren, Y., Ding, N., Yuan, K.: Efficient ciphertext-policy attribute
based encryption with hidden policy. In: Xiang, Y., Pathan, M., Tao, X., Wang,
H. (eds.) IDCS 2012. LNCS, vol. 7646, pp. 146–159. Springer, Heidelberg (2012).
https://doi.org/10.1007/978-3-642-34883-9 12
14. Liang, K., Man, H.A., Liu, J.K., Susilo, W., Wong, D.S., Yang, G., Yu, Y., Yang,
A.: A secure and efficient ciphertext-policy attribute-based proxy re-encryption for
cloud data sharing. Future Gener. Comput. Syst. 52(C), 95–108 (2015)
15. Liang, X., Cao, Z., Lin, H., Shao, J.: Attribute based proxy re-encryption with
delegating capabilities. In: ASIACCS 2009, pp. 276–286 (2009)
16. Libert, B., Vergnaud, D.: Unidirectional chosen-ciphertext secure proxy re-
encryption. IEEE Trans. Inf. Theory 57(3), 1786–1802 (2011)
17. Liu, Q., Tan, C.C., Wu, J., Wang, G.: Reliable re-encryption in unreliable clouds.
In: Global Communications Conference, GLOBECOM 2011, 5–9 December 2011,
Houston, Texas, USA, pp. 1–5 (2011)
18. Liu, Q., Wang, G., Wu, J.: Time-based proxy re-encryption scheme for secure data
sharing in a cloud environment. Inf. Sci. 258(3), 355–370 (2014)
19. Luo, S., Hu, J., Chen, Z.: Ciphertext policy attribute-based proxy re-encryption.
In: Soriano, M., Qing, S., López, J. (eds.) ICICS 2010. LNCS, vol. 6476, pp. 401–
415. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17650-0 28
20. Matsuo, T.: Proxy re-encryption systems for identity-based encryption. In: Tak-
agi, T., Okamoto, T., Okamoto, E., Okamoto, T. (eds.) Pairing 2007. LNCS, vol.
4575, pp. 247–267. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-
540-73489-5 13
21. Nishide, T., Yoneyama, K., Ohta, K.: Attribute-based encryption with partially
hidden encryptor-specified access structures. In: Bellovin, S.M., Gennaro, R.,
Keromytis, A., Yung, M. (eds.) ACNS 2008. LNCS, vol. 5037, pp. 111–129.
Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68914-0 7
22. Phuong, T.V.X., Yang, G., Susilo, W.: Hidden ciphertext policy attribute-based
encryption under standard assumptions. IEEE Trans. Inf. Forensics Secur. 11(1),
35–45 (2015)
23. Canetti, R., Hohenberger, S.: Chosen-ciphertext secure proxy re-encryption. In: ACM
Conference on Computer and Communications Security, CCS 2007, Alexandria,
Virginia, USA, pp. 185–194, October 2007
24. Seo, H.J., Kim, H.: Attribute-based proxy re-encryption with a constant number
of pairing operations. J. Inf. Commun. Converg. Eng. 10(1), 53–60 (2012)
25. Seo, H., Kim, H.: Zigbee security for visitors in home automation using attribute
based proxy re-encryption. In: IEEE International Symposium on Consumer Elec-
tronics, pp. 304–307 (2011)
26. Guo, S., Zeng, Y., Wei, J., Xu, Q.: Attribute-based re-encryption scheme in the
standard model. Wuhan Univ. J. Nat. Sci. 13(5), 621–625 (2008)
27. Yu, S., Wang, C., Ren, K., Lou, W.: Achieving secure, scalable, and fine-grained
data access control in cloud computing. In: Conference on Information Communi-
cations, pp. 534–542 (2010)
Identity-Based Group Encryption
Revisited
1 Introduction
Group encryption (GE) is an encryption analogue of group signature. A GE scheme enables a sender to send a ciphertext to a member of a group and to convince a verifier that the ciphertext belongs to some member of the group. The focus of this paper is on GE schemes in the identity-based paradigm. There is only one identity-based group encryption system reported in the literature, proposed in 2016 by Luo et al. [2], which claims to achieve anonymity of the receiver. We take a closer look at their protocol. We show that an honest-but-curious verifier is able to identify the designated recipient by using the information exchanged between the sender and the verifier during the execution of a zero-knowledge protocol. Hence, this breaks the claimed anonymity of the construction in [2].
identity and the identity that forms the ciphertext are identical and a PKG who
issues private keys to the users. The procedures involved are:
ParaGen: Let the user's identity be ID ∈ Z_p, let G and G_T be two groups of order p, let e: G × G → G_T be an admissible bilinear map, and let Ḡ be an abelian group of order p in which the DDH problem is hard. The PKG chooses random g, h ← G and a random α ← Z_p, sets g_1 ← g^α, chooses g_2, g_3, t ← Ḡ, and a universal hash function H. The public parameters are (g, g_1, h, g_2, g_3, t, H) and the master secret key is α.
GKGen: This procedure chooses random x_1, x_2, y_1, y_2, z ← Z_p and computes w = g_2^{x_1} g_3^{x_2}, d = g_2^{y_1} g_3^{y_2}, l = g_2^{z}. The group public key and the group secret keys are (g_2, g_3, w, d, l) and (x_1, x_2, y_1, y_2, z), respectively.
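As an illustration of the exponent algebra in GKGen, the following Python sketch instantiates the group keys in a toy multiplicative group Z_p* (an assumption for demonstration only; the scheme itself works in the DDH-hard group Ḡ with toy generators chosen here) and checks the consistency relation w^n d^n = k_1^{x_1+y_1} k_2^{x_2+y_2} on which the scheme's verification algebra relies.

```python
import random

random.seed(0)
p = (1 << 127) - 1          # Mersenne prime; Z_p* is a toy stand-in for the group G-bar
g2, g3 = 5, 7               # toy generators (assumption, not from the paper)

# GKGen: group secret keys and public keys
x1, x2, y1, y2, z = (random.randrange(1, p - 1) for _ in range(5))
w = pow(g2, x1, p) * pow(g3, x2, p) % p
d = pow(g2, y1, p) * pow(g3, y2, p) % p
l = pow(g2, z, p)

# Consistency relation used by the scheme: for k1 = g2^n, k2 = g3^n,
# v = w^n d^n must equal k1^(x1+y1) * k2^(x2+y2)
n = random.randrange(1, p - 1)
k1, k2 = pow(g2, n, p), pow(g3, n, p)
v = pow(w, n, p) * pow(d, n, p) % p
assert v == pow(k1, x1 + y1, p) * pow(k2, x2 + y2, p) % p
```

The identity holds for any choice of generators and exponents, which is why the secret-key holder can recheck v without knowing n.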
UKGen: For every user, the PKG chooses a random r ← Z_p and calculates the user's secret key SK_ID = (r, h_ID), where h_ID = (h g^{−r})^{1/(α−ID)}. The user registers his identity with the group manager.
Encryption: This procedure can be divided into two sub-procedures:
1. Message encryption: Given a plaintext m ∈ G_T and a member's identity ID, choose a random s ← Z_p. Then compute the ciphertext as
C_1 = (g_1^s g^{−s·ID}, e(g, g)^s, m · e(g, h)^{−s}) = (C_{10}, C_{11}, C_{12}) (1)
C_2 = (k_1, k_2, ψ, v) (2)
The zero-knowledge proof has been converted into an equivalent form as follows:
ZK{(s, n, ID) : C_{10} = g_1^s g^{−s·ID}, C_{11} = e(g, g)^s, k_1 = g_2^n, k_2 = g_3^n, ψ = l^n t^{ID}, v = w^n d^n, A = ψ^s, A = A_1 A_2, A_1 = l^{ns}, A_2^{−1} = t^{−s·ID}, k = k_1^s, k = g_2^{ns}}
The protocol is a 3-move protocol, as discussed below:
1. The prover chooses s̄, ĪD, n̄ randomly and computes C̄_{10} = g_1^{s̄} g^{−s̄·ĪD}, C̄_{11} = e(g, g)^{s̄}, k̄_1 = g_2^{n̄}, k̄_2 = g_3^{n̄}, ψ̄ = l^{n̄} t^{ĪD}, v̄ = w^{n̄} d^{n̄}, Ā = ψ̄^{s̄}, Ā = Ā_1 Ā_2, Ā_1 = l^{n̄s̄}, Ā_2^{−1} = t^{−s̄·ĪD}, k̄ = k̄_1^{s̄}, k̄ = g_2^{n̄s̄}, and sends these to the verifier.
Identity-Based Group Encryption Revisited 207
g_2^{r_2} ?= k_1^c k̄_1, g_3^{r_2} ?= k_2^c k̄_2, w^{r_2} d^{r_2} ?= v^c v̄, l^{r_2} t^{r_3} ?= ψ^c ψ̄, g_1^{r_1} g^{r_4} ?= C̄_{10} C_{10}^c, t^{r_4} ?= Ā_2^{−1} (A_2^{−1})^c, l^{r_5} ?= A_1^c Ā_1, g_2^{r_5} ?= k^c k̄. The verifier outputs 1 if all the checks hold true; otherwise it outputs 0.
c ∈ Z_p, r_1 ≡ s̄ + c·s mod p, r_2 ≡ n̄ + c·n mod p, r_3 ≡ ĪD + c·ID_r mod p,
r_4 ≡ −s̄·ĪD − c·s·ID_r mod p, r_5 ≡ n̄s̄ + c·ns mod p
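These response equations can be checked mechanically. The sketch below verifies one of the verifier's checks, g_2^{r_2} = k_1^c k̄_1, for the Schnorr-style relation k_1 = g_2^n; the modulus and generator are illustrative assumptions, not the paper's group Ḡ.

```python
import random

random.seed(1)
p = (1 << 127) - 1                     # toy prime modulus (assumption)
g2 = 5                                 # toy generator (assumption)

n = random.randrange(1, p - 1)         # prover's witness
k1 = pow(g2, n, p)                     # public value k1 = g2^n

nbar = random.randrange(1, p - 1)      # move 1: commitment
k1bar = pow(g2, nbar, p)
c = random.randrange(1, p - 1)         # move 2: challenge
r2 = (nbar + c * n) % (p - 1)          # move 3: response (exponents mod group order)

# verifier's check: g2^r2 == k1^c * k1bar
assert pow(g2, r2, p) == pow(k1, c, p) * k1bar % p
```

The check passes because g_2^{n̄ + c·n} = g_2^{n̄} (g_2^{n})^c, i.e. the response is linear in the witness.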
Now with the obtained c, r1 , r3 , r4 from the sender and C11 in Eq. (3), we
show that the verifier will be able to identify the actual receiver.
For each identity IDi in the group, the verifier performs the procedure PROC
as shown below.
PROC(ID_i, c, r_1, r_3, r_4, C_{11}):
1. Compute g^{r_1·ID_i + r_4}
2. Compute e(g, g)^{r_1} · (C_{11})^{−c} = e(g, g)^{r_1} · (e(g, g)^s)^{−c} = e(g, g)^{r_1 − sc} = e(g, g)^{s̄}
3. Compute e(g, g)^{s̄·r_3}
4. Compute X = e(g, g^{r_1·ID_i + r_4}) · e(g, g)^{s̄·r_3}
208 K. Gupta et al.
g^{r_1·ID_r + r_4} = g^{s̄(ID_r − ĪD)} and e(g, g)^{s̄·r_3} = e(g, g)^{s̄(ĪD + c·ID_r)}
Hence,
X = e(g, g^{s̄(ID_r − ĪD)}) · e(g, g)^{s̄(ĪD + c·ID_r)}
= e(g, g)^{s̄·ID_r − s̄·ĪD + s̄·ĪD + c·s̄·ID_r} = e(g, g)^{s̄·ID_r(c + 1)}
1. X = e(g, g^{r_1·ID_x + r_4}) · e(g, g)^{s̄·r_3}, where
g^{r_1·ID_x + r_4} = g^{(s̄ + cs)·ID_x − s̄·ĪD − cs·ID_r} = g^{s̄(ID_x − ĪD) + sc(ID_x − ID_r)}
and
e(g, g)^{s̄·r_3} = e(g, g)^{s̄(ĪD + c·ID_r)}
Hence,
X = e(g, g^{s̄(ID_x − ĪD) + sc(ID_x − ID_r)}) · e(g, g)^{s̄(ĪD + c·ID_r)}
= e(g, g)^{s̄·ID_x − s̄·ĪD + sc·ID_x − sc·ID_r + s̄·ĪD + s̄c·ID_r}
= e(g, g)^{s̄(ID_x + c·ID_r) + sc(ID_x − ID_r)}
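The de-anonymization described by PROC can be simulated end to end. In the sketch below, the pairing value e(g, g)^x is modeled as pow(g, x, P) in a toy group Z_P*; all identities and protocol values are made-up illustrative numbers. The verifier recovers e(g, g)^{s̄} from step 2 and then tests, for each candidate ID_i, whether X equals (e(g, g)^{s̄})^{ID_i(c+1)} — which, by the derivation above, holds exactly when ID_i = ID_r.

```python
# Toy de-anonymization of the verifier, following PROC. The pairing value
# e(g, g)^x is modeled as pow(g, x, P) in Z_P* -- an assumption for
# illustration only; all group and identity values below are made up.
P = (1 << 61) - 1          # Mersenne prime
ORD = P - 1                # exponent arithmetic modulo the group order
g = 3

def E(x):                  # stands in for e(g, g)^x
    return pow(g, x % ORD, P)

ids = [101, 202, 303]      # candidate identities in the group
id_r = 202                 # the actual designated receiver
s, s_bar, id_bar, c = 105, 104, 999, 7   # protocol values (made up)

C11 = E(s)                                # from the ciphertext, Eq. (1)
r1 = (s_bar + c * s) % ORD                # prover's responses
r3 = (id_bar + c * id_r) % ORD
r4 = (-s_bar * id_bar - c * s * id_r) % ORD

def proc(id_i):
    egg_sbar = E(r1) * pow(C11, -c, P) % P              # step 2: recover e(g,g)^s_bar
    X = E(r1 * id_i + r4) * pow(egg_sbar, r3, P) % P    # steps 1, 3, 4
    # only for id_i = id_r does X = e(g,g)^{s_bar * id_i * (c+1)} hold:
    return X == pow(egg_sbar, id_i * (c + 1), P)

recovered = [i for i in ids if proc(i)]
assert recovered == [id_r]                # only the true receiver matches
```

For a wrong candidate the exponents differ by c(ID_i − ID_r)(s − s̄), which is nonzero in the toy group, so the test singles out ID_r.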
H̄(C_1, C_2, C̄_{10}, C̄_{11}, k̄_1, k̄_2, ψ̄, v̄, Ā, Ā_1, Ā_2, Ā_2^{−1}) = c,
where the values C_1, C_2 are defined in Eqs. (1), (2) respectively and C̄_{10}, C̄_{11}, k̄_1, k̄_2, ψ̄, v̄, Ā, Ā_1, Ā_2, Ā_2^{−1} are defined in Eq. (4), and sends c to the verifier. Hence, all the steps in procedure PROC can be executed by the verifier using the ciphertext C and the hash value, breaking the anonymity of the receiver.
References
1. Groth, J., Sahai, A.: Efficient non-interactive proof systems for bilinear groups. In:
Electronic Colloquium on Computational Complexity (ECCC), vol. 14, no. 053
(2007)
2. Luo, X., Ren, Y., Liu, J., Hu, J., Liu, W., Wang, Z., Xu, W., Wu, Q.: Identity-based
group encryption. In: Liu, J.K., Steinfeld, R. (eds.) ACISP 2016. LNCS, vol. 9723,
pp. 87–102. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40367-0 6
Compact Hierarchical IBE from Lattices
in the Standard Model
Daode Zhang^{1,2,3}, Fuyang Fang^{4(B)}, Bao Li^{1,2,3}, Haiyang Xue^{1}, and Bei Liang^{5}

1 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  {zhangdaode,lb}@is.ac.cn, [email protected]
2 State Key Laboratory of Information Security, Institute of Information Engineering, Beijing, China
3 Science and Technology on Communication Security Laboratory, Chengdu, China
4 Information Science Academy, China Electronics Technology Group Corporation, Beijing, China
  fuyang [email protected]
5 Chalmers University of Technology, Gothenburg, Sweden
  [email protected]
1 Introduction
Hierarchical identity-based encryption (HIBE), proposed by Horwitz et al. [7,8], is an extension of identity-based encryption (IBE) [12], in which an arbitrary string can serve as the public key. In a HIBE scheme, an identity at level k of the hierarchy tree is provided with a private key by its parent identity; it can in turn delegate private keys to its descendant identities, but cannot decrypt messages intended for other identities.
HIBE from Lattices: The first lattice-based HIBE scheme based on the Learning with Errors (LWE) problem [11] was proposed by Cash et al. [5], using the basis
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 210–221, 2018.
https://doi.org/10.1007/978-3-319-89500-0_19
Compact Hierarchical IBE from Lattices in the Standard Model 211
delegation technique for lattices. Agrawal et al. [1] proposed the SampleLeft and SampleRight algorithms, then extended them to obtain another basis delegation technique, with which they constructed an efficient HIBE scheme with selective security in the standard model. However, the above basis delegation techniques increase the dimension of the lattices involved, as well as the size of the ciphertext. Later, Agrawal et al. [2] proposed a different delegation mechanism, called the "in place" delegation technique, which preserves the dimension of the lattices. With this technique, they constructed two HIBE schemes, with and without random oracles, in which the dimension of the lattices involved remains unchanged for all nodes in the hierarchy. Nevertheless, as noted in [2], the construction in the standard model is competitive with the previous schemes in [1,5] only when the bit-length of the identity (|id_i| = λ) at level i in the hierarchy is small, e.g., λ = 1 at each level. Furthermore, as the length of the identity increases, e.g., λ = n, the sizes of the ciphertext, private key and master public key become worse than the parameters in [1]. With the "in place" delegation technique, Fang et al. also utilized the Learning with Rounding (LWR) assumption [3,4] over a small modulus to construct HIBE schemes; thus, they inherit the same restrictions as [2]. Micciancio and Peikert [10] introduced the notion of a G-trapdoor for lattices and proposed an efficient trapdoor delegation for lattices. With this technique, the sizes of the public key and ciphertext decrease by a factor of about 4, and the size of the delegated trapdoor grows only linearly with the lattice dimension in the hierarchy, rather than quadratically as in [1]; however, the ciphertext still grows by nk log q bits from node to node.
We apply a gadget matrix G ∈ Z_q^{n×nk} defined in [10] to the basis delegation technique in [1] to construct a selectively secure HIBE scheme with small parameters based on the LWE problem in the standard model, where k = ⌈log_b q⌉, b = 2^d and d is the maximum depth of the HIBE scheme.
The public parameter in our HIBE scheme needs to contain one matrix of the same dimension as G (i.e., about n⌈log_b q⌉ columns), and the size of the ciphertext is n⌈log_b q⌉ · log q ≈ (1/d) · n log² q for each level of the hierarchy. However, we obtain this improvement at the cost of increasing the size of the private key. Thus, the parameters in our HIBE are a trade-off between the sizes of the public parameter and the private keys. Next, we compare our scheme with the previous schemes in Table 1.
From Table 1, the advantages of our HIBE scheme are:
1. The size of the master public key in [1,10] is reduced by a factor of O(d);
2. The sizes of the ciphertext and lattice dimension at level ℓ are d·ℓ/(d+ℓ) = O(ℓ) times smaller than the sizes in [1,10], and (d+ℓ)/d < 2 times larger than the sizes in [2] under the condition that λ = 1. In particular, the parameters in ABB10b except the private key are competitive with our HIBE scheme only when λ = 1.
212 D. Zhang et al.
The size of the private key is O(d/(ℓ·log n) + 1) times larger than in [10], and the maximum ratio can reach O(d/log n + 1) when ℓ = 1.
2. The error rate 1/α is slightly smaller than that in [1,10] when d > 4.
Analysis: Before explaining why this modification works, let us first describe why the sizes of the ciphertexts in [1,10] increase as mentioned above. In [1], the identity-based encryption matrix for an identity id = (id_1, · · · , id_ℓ) ∈ ({0, 1}^λ)^ℓ is
F_id = [A | A_1 + H(id_1)B | · · · | A_ℓ + H(id_ℓ)B],
where A, A_1, · · · , A_ℓ, B ∈ Z_q^{n×m} and m = O(n log q). The difference in [10] is that the matrix B is replaced by a gadget matrix G ∈ Z_q^{n×nk}, that is,
F_id = [A | A_1 + H(id_1)G | · · · | A_ℓ + H(id_ℓ)G],
where d is the maximum depth of the HIBE scheme and O(1) here satisfies O(1) ≥ 2, a small constant. That is, the parameter d plays an important role in the size of the public parameters.
The straightforward modification is to replace B and G with another matrix which has a short basis as a trapdoor but fewer columns. We know that the gadget matrix G has a special structure that can be easily modified. The widely used version of G is defined as
G = g^t ⊗ I_n ∈ Z_q^{n×nk},
where g^t = (1, 2, · · · , 2^{k−1}) and k = ⌈log q⌉. The lattice Λ⊥(G) has a short basis S with ∥S̃∥ ≤ √5. In fact, a generalized notion of the gadget G provided in [10] is defined as
G = g^t ⊗ I_n ∈ Z_q^{n×nk},
where g^t = (1, b, · · · , b^{k−1}) and k = ⌈log_b q⌉, and the hardness of LWE requires that αq ≥ 2√n.
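The base-b gadget admits a simple short-preimage computation: the digits of a base-b expansion. The Python sketch below (toy modulus and depth are assumptions, not the paper's parameters) checks that the digit vector x satisfies ⟨g, x⟩ = u (mod q) with every digit smaller than b, which is what keeps preimages under g^t short.

```python
import math

def gadget_decompose(u, b, k, q):
    """Return the base-b digits x of u (mod q), so sum(x[i] * b**i) == u (mod q)."""
    u %= q
    digits = []
    for _ in range(k):
        digits.append(u % b)
        u //= b
    return digits

q = 2**20 + 7          # toy modulus (assumption, not from the paper)
d = 4                  # maximum hierarchy depth, so b = 2^d
b = 2**d
k = math.ceil(math.log(q, b))
u = 123456
x = gadget_decompose(u, b, k, q)
assert sum(xi * b**i for i, xi in enumerate(x)) % q == u % q
assert max(x) < b      # every digit is small: a short preimage under g^t
```

Larger b means fewer digits (fewer gadget columns) but larger digit magnitudes, which is exactly the parameter trade-off exploited in this scheme.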
Without loss of generality, we can set m = 2n log q. Hence, the modulus q should satisfy
q ≥ √n · (m + knd)^{(d+3)/2} · b · ω(log^{d/2}(m + knd))
⇒ q ≥ √n · (2m)^{d/2} · 2^d · ω(log^{d/2}(2m))
⇒ q ≥ √n · (dn log n)^{d/2} · 2^d · ω(log^{d/2}(2m))
⇒ q ≥ Õ((4d)^{d/2} · n^{d/2}),
which is sufficient for our HIBE scheme and slightly smaller than the sizes of q in [1,10] if d > 4.
Furthermore, we decrease the number of columns of G from n⌈log q⌉ to n⌈log_b q⌉, so that the sizes of the ciphertext and the master key grow linearly with n⌈log_b q⌉ per hierarchy level, rather than with m as in [1] or n⌈log q⌉ as in [10], and m + ℓnk < m + dnk < 2m = O(dn log n). This is why the sizes of the ciphertext and the master key decrease by factors of about ℓ and d, respectively.
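This per-level shrinkage can be sanity-checked numerically. In the sketch below (toy n, log q and d are assumptions), the number of gadget columns per level drops from n·⌈log q⌉ to n·⌈log_b q⌉, i.e. by a factor of d when b = 2^d:

```python
n, logq, d = 256, 64, 8          # toy parameters (assumptions, not from the paper)

k_bin = logq                     # ceil(log_2 q): binary gadget, as in [10]
k_b = -(-logq // d)              # ceil(log_b q) with b = 2^d, via ceiling division

cols_bin = n * k_bin             # gadget columns per level, binary base
cols_b = n * k_b                 # gadget columns per level, base b = 2^d

assert cols_bin // cols_b == d   # per-level ciphertext growth shrinks by a factor of d
```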
2 Preliminaries
Let n be the security parameter. We use negl(n) to denote an arbitrary negligible function f(n) with f(n) = o(n^{−c}) for every fixed constant c. We say that a probability is overwhelming if it is 1 − negl(n). We use poly(n) and Õ(n) to denote unspecified functions f(n) = O(n^c) and f(n) = O(n · log^c n), respectively, for some constant c. We write A ≈_c (≈_s) B to denote that a distribution A is computationally (statistically) indistinguishable from a distribution B. Let Z_q be a q-ary finite field for a prime q ≥ 2. The value s_1(R) denotes the largest singular value of R, and s_1(R) = max_u ∥Ru∥ = max_u ∥R^t u∥, where the maximum is taken over all unit vectors u. Let a ←$ Z_q denote that a is chosen uniformly at random from Z_q.
– Init: The adversary A is given the maximum hierarchy depth d and announces a target identity id* = {i_1, ..., i_t} of depth t < d.
– KeyGen: The simulator S runs the KeyGen algorithm to generate the public parameter mpk and master key msk, and sends mpk to the adversary A.
– Query1: The adversary A makes queries on identities id_1, ..., id_k, none of which is a prefix of id*. The simulator returns the private key SK_{id_i} in response to each query on identity id_i by calling the Extract algorithm.
– Challenge Ciphertext: When the Query1 phase is over, the adversary A sends a challenge message m ∈ M to S. The simulator S chooses a random bit b ∈ {0, 1} and a random c from the ciphertext space. If b = 0, then S generates the challenge ciphertext c* by calling Enc(mpk, id*, m) with message m; otherwise, S sends c as the challenge ciphertext c* to A.
– Query2: The adversary makes additional adaptive private key queries as in the Query1 phase, and the simulator proceeds as before.
– Guess: Finally, the adversary outputs a guess b′ ∈ {0, 1} and wins if b′ = b.
Adv^{ind-sid-cpa}_{HIBE,A} ≤ negl(n)
Lemma 1 ([10]). Let n, q > 2, m = O(n log q) be integers. Then there exists a polynomial-time algorithm GenTrap(n, m, q) that outputs a matrix A ∈ Z_q^{n×m} and a matrix T_A ∈ Z^{m×m}, where T_A is a basis for Λ⊥(A), such that A is statistically close to uniform and ∥T̃_A∥ = O(√(n log q)).
Let R_ℓ = (R_1 | · · · | R_ℓ) and h_id = [H(id_1 − id*_1)B | · · · | H(id_ℓ − id*_ℓ)B] ∈ Z_q^{n×ℓm}. If the matrix T_B ∈ Z^{m×m} is a basis for Λ⊥(B) and the Gaussian parameter satisfies σ ≥ ∥T̃_B∥ · s_1(R_ℓ) · ω(√log m), then there exists a polynomial-time algorithm SampleBasisRight(A, R_ℓ, F_id, T_B, σ) that outputs a basis T for the lattice Λ⊥(F_id) satisfying ∥T∥ ≤ σ√((ℓ+1)m).
In our work, we use the gadget matrix G′ instead of the matrix B in the SampleBasisRight algorithm, and the algorithm SampleD instead of SampleRight within SampleBasisRight. Because G′ has rank n and S′ is a short basis for Λ⊥(G′), we obtain the following corollary.
F_id = [A | AR_1 + H(id_1 − id*_1)G′ | · · · | AR_ℓ + H(id_ℓ − id*_ℓ)G′] ∈ Z_q^{n×(m+ℓnk)}
Let R_ℓ = (R_1 | · · · | R_ℓ) and h_id = [H(id_1 − id*_1)G′ | · · · | H(id_ℓ − id*_ℓ)G′] ∈ Z_q^{n×ℓnk}. If the matrix S′ ∈ Z^{nk×nk} is a basis for Λ⊥(G′) and the Gaussian parameter satisfies σ ≥ ∥S̃′∥ · s_1(R_ℓ) · ω(√log nk), then there exists a polynomial-time algorithm SampleBasisRight(A, R_ℓ, F_id, S′, σ) that outputs a basis T for the lattice Λ⊥(F_id) satisfying ∥T∥ ≤ σ√(m + ℓnk).
Proof. When we replace the matrix B with the gadget matrix G′ and set the Gaussian parameter σ ≥ ∥S̃′∥ · s_1(R_ℓ) · ω(√log nk), the outputs of the algorithms SampleRight and SampleD are the same. Therefore, the SampleBasisRight algorithm will output a short basis for Λ⊥(F_id).
– Derive(mpk, id|id_ℓ, SK_id) → SK_{id|id_ℓ}: Given the master public key mpk and a private key SK_id for the identity id = {id_1, · · · , id_{ℓ−1}} at depth ℓ − 1 as inputs, the algorithm works as follows:
1. Let A_id = [A_1 + H(id_1)G′ | · · · | A_{ℓ−1} + H(id_{ℓ−1})G′] and F_id = [A | A_id]; then SK_id is a short basis for the lattice Λ⊥_q(F_id);
2. Let F_{id|id_ℓ} = [F_id | A_ℓ + H(id_ℓ)G′];
3. Construct a short basis for the lattice Λ⊥_q(F_{id|id_ℓ}) by invoking
and F_id = [A | A_id] ∈ Z_q^{n×(m+ℓnk)};
2. Choose a uniformly random s ←$ Z_q^n;
3. Choose a uniformly random matrix R ←$ {−1, 1}^{m×ℓnk};
4. Choose (x_0, x_1) ← D_{Z,αq} × D_{Z^m,αq}, then set x_2^t = x_1^t R ∈ Z_q^{ℓnk} and compute
c_0 = s^t u + x_0 + ⌈q/2⌉ · m,
c_1^t = s^t F_id + [x_1^t | x_2^t] = s^t [A | A_id] + [x_1^t | x_2^t];
5. Output the ciphertext ct = (c_0, c_1^t) ∈ Z_q × Z_q^{m+ℓnk}.
– Decrypt(c, id, SK_id) → m or ⊥: Given the ciphertext c and the private key SK_id for the identity id as input, the algorithm works as follows:
1. Parse ct = (c_0, c_1^t); if c cannot be parsed in this way, output ⊥;
2. Compute A_id and F_id as before;
3. Let τ = σ · √(m + ℓnk) · ω(√log(m + ℓnk)) ≥ ∥SK_id∥ · ω(√log(m + ℓnk)) and sample e_id ∈ Z^{m+ℓnk} as
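The rounding at the heart of Decrypt can be illustrated with a toy scalar version of the ciphertext component c_0 = s^t u + x_0 + ⌈q/2⌉ · m (all numbers below are made-up toy parameters, not the scheme's): after subtracting the key-dependent inner product, the remainder is decoded to 0 or 1 by its distance to 0 or q/2.

```python
import random

random.seed(2)
q = 12289                        # toy modulus (assumption)
s_dot_u = random.randrange(q)    # stands in for the inner product s^t u

for bit in (0, 1):
    x0 = random.randrange(-8, 9)             # small noise term
    c0 = (s_dot_u + x0 + (q // 2) * bit) % q # toy scalar version of c0
    rem = (c0 - s_dot_u) % q                 # decrypter removes s^t u
    # decode by rounding: values near q/2 mean bit 1, values near 0 (or q) mean bit 0
    decoded = 1 if q // 4 <= rem < 3 * q // 4 else 0
    assert decoded == bit
```

Decryption succeeds as long as the accumulated noise stays below q/4, which is what the bound on τ and the choice of q guarantee in the actual scheme.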
4 Conclusion
In this paper, we introduce a trade-off between the sizes of the public parameter and ciphertext and the size of the private key in a selectively secure LWE-based HIBE scheme in the standard model. We obtain this trade-off by adjusting the base of the gadget matrix G ∈ Z_q^{n×n⌈log_b q⌉} defined in [10]. By setting b = 2^d, the sizes of the master public key and the ciphertext at level ℓ can be reduced by factors of O(d) and O(ℓ) respectively, at the cost of increasing the size of the private key by a factor of O(d/(ℓ·log n) + 1). The parameters in the ABB10b scheme, except for the private key, are competitive with our HIBE scheme only when λ = 1.
References
1. Agrawal, S., Boneh, D., Boyen, X.: Efficient lattice (H)IBE in the standard model.
In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 553–572. Springer,
Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5 28
2. Agrawal, S., Boneh, D., Boyen, X.: Lattice basis delegation in fixed dimension and
shorter-ciphertext hierarchical IBE. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol.
6223, pp. 98–115. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-
14623-7 6
3. Banerjee, A., Peikert, C., Rosen, A.: Pseudorandom functions and lattices. In:
Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp.
719–737. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-
4 42
4. Bogdanov, A., Guo, S., Masny, D., Richelson, S., Rosen, A.: On the hardness of
learning with rounding over small modulus. In: Kushilevitz, E., Malkin, T. (eds.)
TCC 2016. LNCS, vol. 9562, pp. 209–224. Springer, Heidelberg (2016). https://
doi.org/10.1007/978-3-662-49096-9 9
5. Cash, D., Hofheinz, D., Kiltz, E., Peikert, C.: Bonsai trees, or how to delegate a
lattice basis. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 523–
552. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5 27
6. Gentry, C., Peikert, C., Vaikuntanathan, V.: Trapdoors for hard lattices and new
cryptographic constructions. In: STOC 2008, pp. 197–206 (2008)
7. Gentry, C., Silverberg, A.: Hierarchical ID-based cryptography. In: Zheng, Y. (ed.)
ASIACRYPT 2002. LNCS, vol. 2501, pp. 548–566. Springer, Heidelberg (2002).
https://doi.org/10.1007/3-540-36178-2 34
8. Horwitz, J., Lynn, B.: Toward hierarchical identity-based encryption. In: Knudsen,
L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 466–481. Springer, Heidelberg
(2002). https://doi.org/10.1007/3-540-46035-7 31
9. Katsumata, S., Yamada, S.: Partitioning via non-linear polynomial functions: more
compact IBEs from ideal lattices and bilinear maps. In: Cheon, J.H., Takagi, T.
(eds.) ASIACRYPT 2016. LNCS, vol. 10032, pp. 682–712. Springer, Heidelberg
(2016). https://doi.org/10.1007/978-3-662-53890-6 23
10. Micciancio, D., Peikert, C.: Trapdoors for lattices: simpler, tighter, faster, smaller.
In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp.
700–718. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-
4 41
11. Regev, O.: On lattices, learning with errors, random linear codes, and cryptogra-
phy. In: STOC 2005, pp. 84–93 (2005)
12. Shamir, A.: Identity-based cryptosystems and signature schemes. In: Blakley, G.R.,
Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 47–53. Springer, Heidelberg
(1985). https://doi.org/10.1007/3-540-39568-7 5
Attacks and Attacks Defense
Methods for Increasing the Resistance
of Cryptographic Designs Against
Horizontal DPA Attacks
Abstract. Side channel analysis attacks, especially horizontal DPA and DEMA attacks, are significant threats for cryptographic designs. In this paper we investigate to what extent different multiplication formulae and randomization of the field multiplier increase the resistance of an ECC design against horizontal attacks. We implemented a randomized sequence of the calculation of partial products for the field multiplication in order to increase the security features of the field multiplier. Additionally, we use the partial polynomial multiplier itself as a kind of countermeasure against DPA attacks. We demonstrate that the implemented classical multiplication formula can increase the inherent resistance of the whole ECC design. We also investigate the impact of the combination of these two approaches. For the evaluation we synthesized all these designs for a 250 nm gate library technology and analysed the simulated power traces. All investigated protection means help to decrease the success rate of attacks significantly: the correctness of the revealed key was reduced from 99% to 69%.
1 Introduction
Wireless Sensor Networks (WSNs) and the Internet of Things (IoT) are emerging
technologies and are used in application fields such as telemedicine, automation control
and monitoring of critical infrastructures. These application fields require the data to be
kept confidential and/or to ensure the integrity of transmitted data.
RSA and Elliptic curve cryptography (ECC) are asymmetric cryptographic
approaches. Both can be applied not only for encryption and decryption of messages
but also for digital signature operations and for key exchange. To reduce the time and
energy consumption of computation, asymmetric cryptographic algorithms are implemented in hardware as cryptographic accelerators. The area of a cryptographic accelerator defines its production costs as well as its energy consumption per clock cycle, so it has to be as small as possible. As ECC uses far smaller keys than RSA, it
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 225–235, 2018.
https://doi.org/10.1007/978-3-319-89500-0_20
226 I. Kabin et al.
provides an energy-efficient kind of public-key cryptography and is well suited for WSNs and for the IoT. In this type of network the risk of side channel analysis (SCA) attacks needs to be taken seriously. Due to the need to save energy, i.e. sleeping intervals, and due to the nature of wireless connections, devices can be stolen unnoticed, analysed in a laboratory and brought back. So the devices, or rather the implementations of cryptographic operations, need to be as resistant to SCA attacks as possible.
In ECC each cryptographic key pair consists of a private and a public component. As the security of ECC is based on keeping the private key secret, the goal of an attacker is to reveal this key. The most often applied attacks are power analysis (PA) attacks or electromagnetic analysis (EMA) attacks. The attacker measures the current through the crypto-accelerator or its electromagnetic emanation while a cryptographic operation using the private key or other sensitive data is performed. For ECC the core operation is the elliptic curve point multiplication with a scalar, denoted as the kP operation. P is a point of the elliptic curve (EC) and k is a scalar. For the ECDSA signature generation [1] the critical operation is a kG multiplication, i.e. a multiplication of the EC basis point G with a random number k. If an attacker can reveal the scalar k, the private key Key used for the signature generation can be easily calculated as follows:

Key = (s·k − e) / r mod q

Here e is a hash value of the message to be signed; the numbers r and s are the components of the digital signature, and q is the order of the EC basis point G, according to [1, 2]. The numbers r, s and the message itself are transmitted to a receiver, i.e. the attacker knows these numbers. Additionally, the point G and its order q are parameters of the EC, i.e. they are known to the attacker as well.
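This key-recovery formula follows directly from the ECDSA signing equation s = k⁻¹(e + r·Key) mod q. The Python sketch below checks the algebra with made-up toy numbers (a small prime order and an arbitrary r; no actual curve arithmetic is performed):

```python
q = 2**31 - 1                # toy prime group order (assumption)
key = 123456789              # private key to be recovered
k = 987654321 % q            # the leaked per-signature scalar
e = 55555                    # hash of the signed message
r = 424242                   # first signature component (arbitrary toy value)

s = pow(k, -1, q) * (e + r * key) % q        # ECDSA signing equation
recovered = (s * k - e) * pow(r, -1, q) % q  # Key = (s*k - e)/r mod q
assert recovered == key
```

This is why leaking even a single per-signature scalar k through a side channel fully compromises the signing key.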
kP algorithms implemented in hardware process the scalar k bitwise. Thus, the
processing of each key bit takes a certain time, here denoted as a slot. The shape of a
slot in a measured trace depends on the circuit of the ECC design, on the value of the
processed key bit and on the data processed in the slot. This means that the measured
traces can be used for revealing the scalar k. Horizontal attacks [3], i.e. attacks based on a statistical analysis of a single trace, are significant threats for cryptographic devices, especially because traditional randomization methods such as randomization of the scalar k, blinding of the EC point P or randomization of the projective coordinates of the point P [4] do not provide any protection against them.
2 Investigated kP Designs
In 1987, Montgomery proposed an algorithm for the kP calculation [9]. In 1998, Lopez and Dahab showed that the Montgomery kP algorithm can be performed using only the x-coordinate of the point P if P is a point on an EC over GF(2^n) [10]. Additionally, they proposed to use special projective coordinates of the EC point P to avoid the most complex operation, i.e. the division of Galois field elements. These optimizations reduced the execution time and the energy consumption of the kP calculation significantly. The Montgomery algorithm using projective Lopez-Dahab coordinates is a time- and energy-efficient solution, and due to this fact it is the algorithm mostly used for implementing the EC point multiplication in hardware. The most referenced version of the Montgomery kP algorithm is [11]. The kP operation according to this algorithm can be performed using only 6 multiplications, 5 squarings and 3 additions of Galois field elements for each key bit, except for the most significant bit k_{l−1} = 1. The length of the operands depends on the chosen security level. We experimented with a kP design for the EC B-233, recommended by NIST [1]. The maximal length l of the operands is up to 233 bits in our designs.
The Montgomery kP algorithm has the same sequence of operations for the processing of each key bit, independently of its value. Such implementations are resistant against SPA attacks. A possibility to increase the inherent resistance of Montgomery kP implementations against SCA attacks (not only against simple ones) is to increase the noise level in the analysed power profile. As reported in [7], the field multiplier can be the source of this noise if it is itself resistant against SCA attacks. The write-to-register operations are the ones most analysed when an attack is performed. If these operations are executed in parallel to the field multiplications, the analysis becomes far more complex. Thus, implementations exploiting parallel execution of operations of the kP algorithm are inherently more resistant against PA attacks. Additionally, the execution of many operations in parallel reduces the execution time of the cryptographic operations and increases the efficiency of the design.
Our kP design is a balanced and efficient implementation of the kP algorithm based on Algorithm 2 published in [7], which is a modification of the Montgomery kP algorithm.
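The uniform per-bit operation sequence of the Montgomery ladder can be sketched as follows; for brevity the example uses modular exponentiation as a stand-in for the EC point operations (multiplication for point addition, squaring for point doubling), which preserves the ladder's structure but is not the paper's GF(2^n) arithmetic.

```python
def montgomery_ladder_pow(g, k, p):
    """Montgomery ladder: the same operation sequence is executed for every
    key bit (one 'addition' and one 'doubling'), which is what makes the
    structure SPA-uniform. Shown for modular exponentiation as a stand-in
    for EC point multiplication."""
    R0, R1 = 1, g
    for bit in bin(k)[2:]:          # process the scalar MSB-first
        if bit == '0':
            R1 = R0 * R1 % p        # 'point addition'
            R0 = R0 * R0 % p        # 'point doubling'
        else:
            R0 = R0 * R1 % p
            R1 = R1 * R1 % p
    return R0

assert montgomery_ladder_pow(7, 12345, 10007) == pow(7, 12345, 10007)
```

Only the operand routing, not the operation sequence, depends on the key bit — which is exactly why horizontal attacks target the data-dependent power shape rather than the operation pattern.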
2.1 Basic Design: Balanced, Efficient, Resistant Against SPA and HCCA
The structure of our kP designs is shown in Fig. 1.
The block Controller manages the sequence of the field operations. It controls the data flow between the other blocks and defines which operation has to be performed in the current clock cycle. Depending on the signals of the Controller, the block ALU performs addition or squaring of its operands. Our design comprises only one block MULT to calculate the product of 233-bit-long operands. The multiplication is the most complex field operation. In our implementation it takes 9 clock cycles to calculate the product according to a fixed calculation plan using the iterative Karatsuba multiplication method as described in [12]. In each of the 9 clock cycles, one partial polynomial product of two 59-bit-long operands $A_j$ and $B_j$ (with $1 \le j \le 9$) is calculated and accumulated to the product, including reduction.
Figure 2 shows the structure of our field multiplier for 233-bit-long operands. It consists of a Partial Multiplier (PM) for 59-bit-long operands. The field multiplier takes 9 clock cycles to calculate the product using this 59-bit partial multiplier. The PM takes 1 clock cycle to calculate the polynomial product of 59-bit-long operands and is implemented as a combination of 3 multiplication methods (MMs). The 2-segment iterative Karatsuba multiplication formula [14] was applied for 60-bit-long multiplicands. The gate complexity of this multiplier is $GC_{2m} = 3 \cdot GC_m + (7m-3)_{XOR}$. Here $m$ is the length of the segments, $m = 60/2 = 30$, and $GC_m$ is the gate complexity of the internal $m$-bit partial multipliers. Thus, the 59-bit partial multiplier consists of 3 internal multipliers: two of them for 30-bit-long operands and one multiplier for 29-bit-long operands.
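The 2-segment Karatsuba step can be sketched in Python on GF(2) polynomials represented as integer bit masks (an illustrative model only; the operand widths, wiring and fixed calculation plan of the hardware are not modelled):

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiplication by shift-and-XOR."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    return p

def karatsuba2(a: int, b: int, m: int) -> int:
    """2-segment Karatsuba for GF(2) polynomials split at bit position m:
    three half-size multiplications instead of four."""
    mask = (1 << m) - 1
    a0, a1 = a & mask, a >> m
    b0, b1 = b & mask, b >> m
    lo = clmul(a0, b0)
    hi = clmul(a1, b1)
    mid = clmul(a0 ^ a1, b0 ^ b1) ^ lo ^ hi   # over GF(2), minus == plus == XOR
    return lo ^ (mid << m) ^ (hi << 2 * m)
```

Splitting 60-bit multiplicands at m = 30 yields exactly the two 30-bit and one 29/30-bit internal multiplications discussed above.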
All these internal multipliers are implemented identically, using the 6-segment iterative Winograd multiplication formula [14], whose gate complexity is $GC_{6m} = 18 \cdot GC_m + (72m-19)_{XOR}$, with $m = 30/6 = 5$ bits. Corresponding to the 6-segment iterative Winograd multiplication formula, the 30-bit multiplier consists of 18 internal multipliers of 5-bit-long operands. Each of these small multipliers was implemented using the classical multiplication formula with $n = 5$:
$$C = A \cdot B = \sum_{i=0}^{2n-2} c_i t^i, \quad \text{with } c_i = \sum_{i = k + l} a_k b_l,\ \forall k, l < n. \qquad (1)$$
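Formula (1) translates directly into two nested loops; the following sketch assumes GF(2) coefficients (addition = XOR), matching the binary extension field used by the design:

```python
def classical_mult(a_bits, b_bits):
    """Classical polynomial multiplication, formula (1):
    c_i = sum over k + l = i of a_k * b_l, for all k, l < n.
    Coefficients are taken over GF(2), so accumulation is XOR."""
    n = len(a_bits)
    c = [0] * (2 * n - 1)
    for k in range(n):
        for l in range(n):
            c[k + l] ^= a_bits[k] & b_bits[l]
    return c
```

For n = 5 this is exactly the small multiplier instantiated 18 times inside each 30-bit Winograd multiplier.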
Fig. 2. Structure of the field multiplier for 233 bit long operands.
The field reduction has to be applied to the accumulation register and can be performed either once per field multiplication or after the calculation of each partial product. The latter design consumes more power for the calculation of the field product, but the power shape of such a multiplication is more random. The partial reduction after each calculation of a partial product was implemented not only in [15, 16] but also in the design reported in [17]. All designs investigated here perform the reduction of the product after the calculation of each partial product to increase the noise and to reduce the success of SCA attacks.
In our Basic Design the multiplication formula comprises 9 partial products of 59-bit-long operands, i.e. for each new field product calculation one out of the 9! possible calculation sequences can be selected randomly.
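Because the accumulation of partial products is commutative, any of the 9! orders yields the same field product; a toy sketch (ignoring the operand selection, shifts and reduction of the real datapath):

```python
import random

def accumulate(partials, order):
    """XOR-accumulate partial products in the given order.  Over GF(2) the
    result is order-independent, so the order can be drawn afresh for every
    field multiplication to randomize the power profile over time."""
    acc = 0
    for j in order:
        acc ^= partials[j]
    return acc

partials = [random.getrandbits(118) for _ in range(9)]  # toy stand-ins
reference = accumulate(partials, range(9))
random_order = random.sample(range(9), 9)               # one of 9! sequences
assert accumulate(partials, random_order) == reference
```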
Table 1 gives an overview of implementation details of our design and the
implementation described in [15, 16].
The smaller number of possible permutations in our design compared to the one described in [15, 16] means that our multiplier is more vulnerable to collision-based attacks. These are a kind of vertical attack and can be prevented using traditional randomization countermeasures [4]. In this work we concentrate on the prevention of horizontal DPA attacks. The area and energy consumption of a partial multiplier for 59-bit-long operands are significantly higher than those of a multiplier for 32-bit-long operands. Thus, the 59-bit partial multiplier can be more effective as a noise source and thereby as a means against horizontal DPA attacks.
The gate complexity of such a multiplier is the biggest of all potential multipliers. All other multiplication methods, like the Karatsuba or eMSK multiplication formulae, were developed with the goal of reducing the (gate) complexity of the classical multiplication formula. On the one hand, the gate complexity is a disadvantage of the classical multiplication method because it results in an increased chip area, price and energy consumption. On the other hand, the increased energy consumption, and especially its fluctuation, means an increased noise level for an attacker who analyses the activity of the other blocks. Due to this fact, using the classical MM for the implementation of the partial multiplier can be an advantage, because it increases the inherent robustness of kP designs against SCA attacks.
$$\bar{p}^i = \frac{1}{230} \sum_{j=0}^{229} p_j^i \qquad (2)$$

Thus, the 54 calculated values $\bar{p}^i$ define the mean power profile of the slots.
2. For each $i$ we obtained one key candidate $k^{candidate\ i}$ using the following assumption: the $j$th bit of the key candidate is 1 if, in the slot with number $j$, the value with number $i$ – i.e. the value $p_j^i$ – is smaller than or equal to the average value $\bar{p}^i$. Otherwise the $j$th bit of the $i$th key candidate is 0:
$$k_j^{candidate\ i} = \begin{cases} 1, & \text{if } p_j^i \le \bar{p}^i \\ 0, & \text{if } p_j^i > \bar{p}^i \end{cases} \qquad (3)$$
To evaluate the success of the attack we compared all extracted key candidates with the scalar $k$ that was really processed. For each key candidate we calculated its relative correctness as the share of correctly extracted key bits; if this share is below 50%, its complement is taken instead. Thus, the correctness is a value between 50% and 100%. For the attacker, the worst case of the attack results is a correctness of 50%, which means the difference-of-means test cannot even provide a slight hint whether the key bit processed is more likely a ‘1’ or a ‘0’. The worst case from the attacker’s point of view is the ideal case from the designer’s point of view. We denote it as the “ideal case” in the rest of the paper.
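The extraction and evaluation steps above can be sketched as follows (a minimal model; `P[j][i]` stands for the $i$-th compressed value of slot $j$, and the correctness folding assumes the share-of-matching-bits definition just described):

```python
def extract_candidates(P):
    """Difference-of-means extraction, Eqs. (2) and (3).
    P[j][i] is the i-th value of the power profile in slot j (one slot
    per processed key bit).  Returns one key candidate per value index i."""
    num_slots, num_vals = len(P), len(P[0])
    means = [sum(P[j][i] for j in range(num_slots)) / num_slots
             for i in range(num_vals)]                                 # Eq. (2)
    return [[1 if P[j][i] <= means[i] else 0 for j in range(num_slots)]
            for i in range(num_vals)]                                  # Eq. (3)

def correctness(candidate, key):
    """Relative correctness, folded into the 50-100 % range."""
    match = sum(c == k for c, k in zip(candidate, key)) / len(key)
    return 100.0 * max(match, 1.0 - match)

# Toy check: a perfectly leaking value index yields 100 % correctness,
# a constant (non-leaking) one collapses to the 50 % ideal case.
key = [1, 0, 1, 1, 0, 0, 1, 0]
P = [[0.0 if bit else 1.0, 1.0] for bit in key]   # i=0 leaks, i=1 is flat
cands = extract_candidates(P)
assert correctness(cands[0], key) == 100.0
assert correctness(cands[1], key) == 50.0
```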
Figure 3 shows the attack results, i.e. the relative correctness $\delta$ for the key candidates extracted using PTs of our Basic Design, simulated for the 250 nm technology. In order to demonstrate that well-known countermeasures [4] are not effective against horizontal DPA, we applied point blinding, key randomization and a combination of both as countermeasures with the goal of randomizing the data processed in our Basic Design. The red bars show the result of the attack for the Basic Design without randomized inputs. The green, yellow and black bars show the analysis results when traditional countermeasures [4] are applied.
Figure 4 shows all key candidates given in Fig. 3 sorted in descending order of correctness. Accordingly, each key candidate got a new index displayed on the x axis. This representation helps to compare the analysis results.
Methods for Increasing the Resistance of Cryptographic Designs 233
Fig. 3. Results of attacks against kP executions with and without traditional randomization
countermeasure. All PTs were simulated for our Basic Design using the 250 nm technology.
(Color figure online)
Fig. 4. Correctness of key candidates from Fig. 3 sorted in descending order; according to that
each key candidate got a new index displayed at the x axis. The one with the highest correctness
is now number 1, the one with the lowest number 54. The blue horizontal line at 50% shows the
ideal case. (Color figure online)
Comparing the attack results displayed in Figs. 3 and 4 shows clearly that the traditional randomization countermeasures do not provide any protection against horizontal SCA attacks.
In this section we discuss the analysis results of the power shape randomization
strategies introduced in Sect. 2. We synthesized the 4 designs described in Sect. 2 for a
250 nm technology. Then we simulated the designs using PrimeTime [18] to get power
traces which we then analysed to evaluate the effectiveness of the randomization
strategies for the complete kP designs. All 4 power traces were simulated using the
same inputs, i.e. the key k and EC point P. Figure 5 shows the analysis results.
Fig. 5. Attack results: correctness of the extracted keys sorted in descending order. (Color figure
online)
The results of the analysis show that the implementation of the partial multiplier using the classical multiplication formula has a significant impact on the resistance of the kP design against horizontal DPA (see yellow bars in Fig. 5). This effect is similar to the randomization of the calculation sequence of partial products if the PM is implemented as an area-optimized combination of MMs (see green bars). Both strategies combined, i.e. applying a randomized sequence of partial products and implementing the PM using the classical MM, increase this effect significantly: the correctness of the extraction was decreased from 99% for our Basic Design (see blue bars) to 69% (see black bars), which is a significant improvement of the design’s resistance against the applied horizontal DPA attack.
5 Conclusion
In this paper we showed that traditional countermeasures such as point blinding and key randomization provide almost no protection against horizontal DPA attacks (see Figs. 3 and 4 in Sect. 3). In order to prevent horizontal DPA attacks from being successful, we investigated alternative means to increase the resistance of kP designs: randomizing the calculation sequence of the partial products and implementing the partial multiplier using the classical multiplication formula. We showed that the impact of both countermeasures on the success of horizontal DPA attacks is similar. In particular, a combination of these approaches can significantly increase the inherent resistance of ECC designs against horizontal attacks: the correctness of the revealed key was decreased from 99% to 69% (see Fig. 5 in Sect. 4).
Acknowledgments. The work presented here was partly supported by the German Ministry of
Research and Education (BMBF) within the ParSec project, grant agreement no. 16KIS0219K.
References
1. Federal Information Processing Standard (FIPS) 186-4, Digital Signature Standard; Request
for Comments on the NIST-Recommended Elliptic Curves (2015)
2. Johnson, D., Menezes, A., Vanstone, S.: The elliptic curve digital signature algorithm
(ECDSA). IJIS 1, 36–63 (2001)
3. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Horizontal correlation
analysis on exponentiation. In: Soriano, M., Qing, S., López, J. (eds.) ICICS 2010. LNCS,
vol. 6476, pp. 46–61. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-
17650-0_5
4. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems.
In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302. Springer,
Heidelberg (1999). https://doi.org/10.1007/3-540-48059-5_25
5. Kabin, I., Dyka, Z., Kreiser, D., Langendoerfer, P.: Evaluation of resistance of ECC designs
protected by different randomization countermeasures against horizontal DPA attacks. In:
Proceedings of IEEE East-West Design Test Symposium (EWDTS2017) (2017)
6. Kabin, I., Dyka, Z., Kreiser, D., Langendoerfer, P.: Attack against montgomery kP
implementation: horizontal address-bit DPA? In: Proceedings of the WiP Session of
Euromicro Conference on Digital System Design (DSD2017) (2017)
7. Dyka, Z., Bock, E.A., Kabin, I., Langendoerfer, P.: Inherent resistance of efficient ECC
designs against SCA attacks. In: 2016 8th IFIP International Conference on New
Technologies, Mobility and Security (NTMS), pp. 1–5 (2016)
8. Kabin, I., Dyka, Z., Kreiser, D., Langendoerfer, P.: On the influence of hardware
technologies on the vulnerability of protected ECC implementations. In: Proceedings of the
WiP Session of Euromicro Conference on Digital System Design (DSD2016) (2016)
9. Montgomery, P.L.: Speeding the Pollard and elliptic curve methods of factorization. Math.
Comp. 48, 243–264 (1987)
10. López, J., Dahab, R.: Fast multiplication on elliptic curves over GF(2m) without
precomputation. In: Koç, Çetin K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717,
pp. 316–327. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48059-5_27
11. Hankerson, D., López Hernandez, J., Menezes, A.: Software implementation of elliptic curve
cryptography over binary fields. In: Koç, Ç.K., Paar, C. (eds.) CHES 2000. LNCS, vol.
1965, pp. 1–24. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44499-8_1
12. Dyka, Z., Langendoerfer, P.: Area efficient hardware implementation of elliptic curve
cryptography by iteratively applying Karatsuba’s method. In: Design, Automation and Test
in Europe, vol. 3, pp. 70–75 (2005)
13. Bauer, A., Jaulmes, E., Prouff, E., Wild, J.: Horizontal collision correlation attack on elliptic
curves. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 553–
570. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_28
14. Dyka, Z.: Analysis and prediction of area- and energy-consumption of optimized polynomial
multipliers in hardware for arbitrary GF(2n) for elliptic curve cryptography. Dissertation
thesis, BTU Cottbus-Senftenberg (2013). https://opus4.kobv.de/opus4-btu/frontdoor/index/
index/docId/2634
15. Madlener, F., Stöttinger, M., Huss, S.A.: Novel hardening techniques against differential
power analysis for multiplication in GF(2n). In: 2009 International Conference on
Field-Programmable Technology, pp. 328–334. IEEE (2009)
16. Stöttinger, M., Madlener, F., Huss, S.A.: Procedures for securing ECC implementations
against differential power analysis using reconfigurable architectures. In: Platzner, M., Teich,
J., Wehn, N. (eds.) Dynamically Reconfigurable Systems, pp. 395–415. Springer, Dordrecht
(2010). https://doi.org/10.1007/978-90-481-3485-4_19
17. Dyka, Z., Wittke, C., Langendoerfer, P.: Clockwise randomization of the observable
behaviour of crypto ASICs to counter side channel attacks. In: 2015 Euromicro Conference
on Digital System Design, pp. 551–554 (2015)
18. Synopsys: PrimeTime. http://www.synopsys.com/Tools/
New Certificateless Public Key
Encryption Secure Against Malicious
KGC Attacks in the Standard Model
1 Introduction
Certificateless public key cryptography (CL-PKC), which was originally intro-
duced by Al-Riyami and Paterson [1], represents an interesting and potentially
useful balance between identity-based cryptography (ID-PKC) and public key
cryptography (PKC) based on public key infrastructure (PKI). It eliminates
the key escrow associated with identity-based cryptography without requiring
the introduction of public key certificates. In CL-PKC, a key generation cen-
ter (KGC) is involved in issuing partial private keys computed from the master
secret for users. Each user also independently generates a secret value and the
corresponding public key for itself. Cryptographic operations can then be per-
formed successfully only when both a user partial private key and its secret
value are obtained. An attacker who knows only one of them should not be
able to impersonate the user to carry out any cryptographic operation such as
decrypting or signing.
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 236–247, 2018.
https://doi.org/10.1007/978-3-319-89500-0_21
New CLE Secure Against Malicious KGC Attacks in the Standard Model 237
Since the KGC is no longer fully trusted, there are two different types of adversaries [1,8] in CL-PKC. A Type I adversary is able to compromise a user secret value and/or replace the user public key with some values chosen by the adversary. However, it knows neither the user partial private key nor the master secret key. Compared with a Type I adversary, a Type II adversary knows the master secret key (and hence knows the partial private key of any user), but does not know the user secret value, nor is it able to replace the user public key. To date, a number of CLE schemes [1,4,6,7,10,14,18] that can withstand the attacks of both types of adversaries have become available.
In 2007, Au et al. introduced the original concept of a malicious key generation center [3] in CL-PKC. They pointed out that the malicious KGC is an important but seemingly previously neglected security concern in CL-PKC. In their improved security model, they allow the malicious KGC to launch the Type II attack at the beginning of the system initialization. That is, the KGC may maliciously generate (by setting some trapdoors) the master public/secret key pair so that it may compromise the security of the resulting certificateless cryptosystem. As we no longer fully trust the KGC, a secure CL-PKC system should require that the KGC cannot gain a non-negligible advantage in attacking a certificateless cryptosystem even if it is malicious. As all of the CLE schemes proposed before 2007 make the implicit assumption in their security models that the KGC always generates the master public/secret key pair honestly according to the scheme specification, most of them are insecure against malicious KGC attacks.
Since the introduction of the malicious KGC, some attention has been paid to the construction of secure CL-PKC systems which can withstand malicious KGC attacks. Several papers in this line of research are available (see the summary in Table 1). In 2007, Huang et al. introduced the earliest secure generic CLE scheme (HW scheme) [11] without random oracles. In 2008, Hwang et al. proposed a concrete CLE scheme (Hwang scheme) [13] in the standard model and claimed that their scheme was secure against the malicious KGC attack. Unfortunately, Zhang et al. later pointed out in [20] that the Hwang scheme could not withstand the key replacement attack. Although Hwang et al. especially emphasized that their scheme could resist malicious KGC attacks, we find that the Hwang scheme is not secure even against a traditional Type II adversary/malicious KGC. In 2009, Zhang et al. presented a CLE scheme (ZW scheme) with a shorter public key without using random oracles. They proved the security of their scheme in the standard model and claimed it could resist malicious KGC attacks. Regrettably, Shen et al. [15] remarked that Zhang et al.’s security proof is not sound and their security conclusion is wrong. They showed that the ZW scheme is vulnerable to attacks of an ordinary Type II adversary, let alone malicious KGC attacks. In 2010, Huang et al. also proposed an improved generic CLE scheme [12] in the standard model. Unfortunately, they still did not give a concrete CLE scheme. As far as we know, it was not until 2014 that Yang et al. [19] constructed a CLE scheme secure against malicious KGC without random oracles (YZ scheme). Nevertheless, its security depends on the total number of partial private key extraction queries
238 W. Yang et al.
made by an adversary. So, it seems that there are few concrete secure CLE schemes withstanding malicious KGC attacks.
In this paper, we first demonstrate that Hwang et al.’s security proof in [13] is not sound by giving concrete Type II attacks according to their security model. We then present a new CLE scheme and rigorously prove that our construction is secure against both Type I adversaries and malicious KGCs under the Decisional Bilinear Diffie-Hellman assumption without random oracles.
Note that it seems somewhat self-contradictory [11] to require a certificateless encryption scheme to be secure against both Type I adversaries with access to the strong decryption oracle and malicious KGC adversaries. The reason is as follows. If a Type I adversary is allowed to access the strong decryption oracle, the challenger (who knows the master secret) can successfully answer the decryption queries without knowing the secret value corresponding to the public key used in producing the ciphertext. Then, employing the same strategy as the strong decryption oracle for Type I adversaries, the malicious KGC should also be able to decrypt any ciphertext. Based on this observation, in our security model, if the public key has been replaced, the Type I adversaries are required to provide the corresponding secret value to the challenger when making decryption queries or private key extraction queries. That is, the Type I adversary cannot access the strong decryption oracle.
The rest of this paper is organized as follows. Some preliminaries and the security notions for CLE schemes are given in Sect. 2. Then, the insecurity of the Hwang scheme against Type II adversaries is analyzed in Sect. 3. Our new CLE scheme is put forward in Sect. 4. In Sect. 5 we give the security analysis and performance analysis of our new scheme. Finally, we present our conclusions in Sect. 6.
2 Preliminaries
We briefly introduce some basic notions used in this paper, namely bilinear
pairings, complexity assumptions and the basic concepts of certificateless public
key encryption schemes.
– Setup($1^k$): Taking a security parameter $k$ as input, the KGC runs this probabilistic polynomial time (PPT) algorithm to output a randomly chosen master secret key $msk$ and a master public key $mpk$.
– ExtPPriK($mpk, msk, ID$): Taking the master public key $mpk$, the master secret key $msk$, and a user’s identity $ID$ as input, the KGC runs this PPT algorithm to generate the partial private key $d_{ID}$ for the user with identity $ID$.
– SetSecV($mpk, ID$): Taking the master public key $mpk$ and a user’s identity $ID$ as input, the user with identity $ID$ runs this PPT algorithm to output a secret value $sv_{ID}$.
– SetPubK($mpk, ID, sv_{ID}$): Taking the master public key $mpk$ and the user’s secret value $sv_{ID}$ as input, the user with identity $ID$ runs this algorithm to output a public key $PK_{ID}$.
– SetPriK($mpk, sv_{ID}, d_{ID}$): Taking the master public key $mpk$, the user’s secret value $sv_{ID}$, and the user’s partial private key $d_{ID}$ as input, the user with identity $ID$ runs this PPT algorithm to generate a private key $SK_{ID}$.
– Encrypt($mpk, PK_{ID}, ID, M$): Taking a plaintext $M$, the master public key $mpk$, a user’s identity $ID$ and its public key $PK_{ID}$ as input, a sender runs this PPT algorithm to create a ciphertext $C$.
– Decrypt($mpk, SK_{ID}, C$): Taking the master public key $mpk$, the user’s private key $SK_{ID}$, and a ciphertext $C$ as input, the user as a recipient runs this deterministic algorithm to get a decryption $\sigma$, which is either a plaintext message or a “reject” message.
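The interaction between the seven algorithms can be sketched with a toy, pairing-free ElGamal-style instantiation (hypothetical and insecure; it only illustrates the data flow, whereas the actual scheme below uses bilinear groups and Waters hashes):

```python
import secrets

P = 0xFFFFFFFFFFFFFFC5          # 2**64 - 59, a toy 64-bit prime modulus
G = 2                           # toy group element

def h(identity: str) -> int:    # placeholder identity hash
    return sum(identity.encode()) % (P - 1)

def setup():
    s = secrets.randbelow(P - 1)            # master secret key msk
    return {"y": pow(G, s, P)}, s           # (mpk, msk)

def ext_pprik(msk, identity):               # KGC: ExtPPriK
    return (msk + h(identity)) % (P - 1)    # d_ID

def set_secv():                             # user: SetSecV
    return secrets.randbelow(P - 1)         # sv_ID

def set_pubk(sv_id):                        # user: SetPubK
    return pow(G, sv_id, P)                 # PK_ID

def set_prik(sv_id, d_id):                  # user: SetPriK
    return (sv_id + d_id) % (P - 1)         # SK_ID combines BOTH shares

def encrypt(mpk, pk_id, identity, m):
    y_id = (pk_id * mpk["y"] * pow(G, h(identity), P)) % P
    t = secrets.randbelow(P - 1)
    return pow(G, t, P), (m * pow(y_id, t, P)) % P

def decrypt(sk_id, c):
    c1, c2 = c
    return (c2 * pow(c1, (P - 1) - sk_id, P)) % P

mpk, msk = setup()
d_id = ext_pprik(msk, "alice")              # KGC issues partial private key
sv_id = set_secv()                          # user picks secret value
sk_id = set_prik(sv_id, d_id)               # decryption needs both shares
c = encrypt(mpk, set_pubk(sv_id), "alice", 123456789)
assert decrypt(sk_id, c) == 123456789
```

An attacker holding only $sv_{ID}$ or only $d_{ID}$ cannot form $SK_{ID}$ in this toy model, mirroring the requirement stated above that both shares are needed.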
IND-CLE-CCA2 security:
As defined in [1], there are two types of security for a CLE scheme, Type I security and Type II security, along with two types of adversaries, $\mathcal{A}_I$ and $\mathcal{A}_{II}$ [1] respectively. The original model assumes that the KGC honestly generates the master public key and master secret key pair. However, in our security model, the KGC can maliciously generate them [3] by setting some trapdoors. Therefore, we allow a Type II adversary to generate all public parameters in any way it favours, which matches better with the original motivation of CL-PKC.
$d_{ID}$ is secretly delivered to the user with identity $ID$ as its partial private key.
SetSecV($mpk, ID$): The user with identity $ID$ picks at random an $x_{ID} \in Z_p^*$ as its secret value $sv_{ID}$.
SetPubK($mpk, ID, sv_{ID}$): The user with identity $ID$ computes its public key $pk_{ID} = (X_{ID}, \sigma_{ID})$, where $X_{ID} = h^{x_{ID}}$ and $\sigma_{ID}$ is the Schnorr one-time signature using $x_{ID}$ as the signing key and $(h, X_{ID} = h^{x_{ID}})$ as the verification key. (The message can be any arbitrary string which can be included in $mpk$. The signature can be generated using the technique of the Fiat-Shamir transform without random oracles as described in [5].)
SetPriK($mpk, sv_{ID}, d_{ID}$): The user $ID$ picks $r$ randomly from $Z_p^*$ and computes the private key $SK_{ID} = (sk_1, sk_2)$ as
$$(sk_1, sk_2) = (psk_1^{x_{ID}} \cdot F_u(ID)^r,\ psk_2^{x_{ID}} \cdot g^r) = (g_1^{\beta x_{ID}} \cdot F_u(ID)^{r x_{ID} + r},\ g^{r x_{ID} + r}).$$
Hwang et al. [13] claimed their scheme is semantically secure even in the strengthened model considering the malicious KGC attack. Unfortunately, their conclusion is not sound. We show a concrete Type II attack on their scheme, which indicates that the Hwang scheme is vulnerable to Type II adversaries, including malicious KGCs. Our attack is depicted as follows.
where $s \in_R Z_p^*$, $w_\gamma^* = H(C_0^*, C_1^*, C_2^*, ID^*, pk_{ID^*}) \in \{0, 1\}^n$ and $W_\gamma^* = \{j \mid w_\gamma^*[j] = 1,\ j = 1, 2, \ldots, n\}$.
– In phase 2, the adversary $\mathcal{A}_{II}$ first randomly picks $s' \in Z_p^*$ and generates another ciphertext $C' = (C'_0, C'_1, C'_2, C'_3)$ with
$$C'_0 = C_0^* \cdot (X_{ID})^{s'}, \quad C'_1 = C_1^* \cdot g^{s'}, \quad C'_2 = C_2^* \cdot F_u(ID^*)^{s'},$$
$$C'_3 = (C'_1)^{\nu' + \sum_{j \in W_\gamma} \nu_j} = F_v(w_\gamma)^{s+s'},$$
where $w_\gamma = H(C'_0, C'_1, C'_2, ID^*, pk_{ID^*}) \in \{0, 1\}^n$ and $W_\gamma = \{j \mid w_\gamma[j] = 1,\ j = 1, 2, \ldots, n\}$.
Note that $C' = (C'_0, C'_1, C'_2, C'_3)$ is another valid ciphertext of the message $M_\gamma$ encrypted under the identity $ID^*$ and the public key $PK_{ID^*}$.
Next, the adversary $\mathcal{A}_{II}$ issues a decryption query $O_{Dec}(ID^*, PK_{ID^*}, C')$. That is, it submits the ciphertext $C'$ to the challenger for decryption under the identity $ID^*$ and the public key $PK_{ID^*}$. Recall that, according to the restrictions specified in the security model, it is legal for $\mathcal{A}_{II}$ to issue such a query since $C^* \ne C'$. So, the challenger has to return the underlying message $M_\gamma$ to $\mathcal{A}_{II}$. With $M_\gamma$, the adversary $\mathcal{A}_{II}$ can certainly determine the value $\gamma$, and then wins the game.
Thus, the Hwang scheme is insecure against the chosen ciphertext attack of a Type II adversary, including a malicious KGC.
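The core of the attack is ciphertext re-randomization. The same trick can be demonstrated on plain ElGamal (an analogue only; in the Hwang scheme the components $C_0$–$C_3$ are re-randomized with a fresh exponent $s'$ in the same multiplicative fashion):

```python
import secrets

P = 0xFFFFFFFFFFFFFFC5          # 2**64 - 59, prime (toy parameter)
G = 2

x = secrets.randbelow(P - 1)    # receiver's secret key
y = pow(G, x, P)                # receiver's public key

def enc(m, s):
    return pow(G, s, P), (m * pow(y, s, P)) % P

def rerandomize(c, s_new):
    """Turn a ciphertext of m into a fresh-looking ciphertext of the SAME m."""
    c1, c2 = c
    return (c1 * pow(G, s_new, P)) % P, (c2 * pow(y, s_new, P)) % P

def dec(c):
    c1, c2 = c
    return (c2 * pow(c1, (P - 1) - x, P)) % P

m = 0xDEADBEEF
c_star = enc(m, 1 + secrets.randbelow(P - 2))        # "challenge" ciphertext
c_new = rerandomize(c_star, 1 + secrets.randbelow(P - 2))
# c_new differs from c_star (w.h.p.) yet decrypts to the same message,
# so a CCA2 decryption oracle may legally be asked to decrypt it.
assert dec(c_new) == m
```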
Setup($1^k$): Let $(G, G_T)$ be bilinear map groups of prime order $p > 2^k$ and let $g$ be a generator of $G$. Set $g_1 = g^\alpha$ for a random $\alpha \in Z_p^*$, and pick at random a group element $g_2 \in G$ and vectors $(u', u_1, \ldots, u_n), (v', v_1, \ldots, v_n) \in G^{n+1}$. These vectors define the hash functions
$$F_u(ID) = u' \prod_{j=1}^{n} u_j^{ID_j} \quad \text{and} \quad F_v(w) = v' \prod_{j=1}^{n} v_j^{w_j}.$$
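The Waters-style hash $F_u$ can be sketched as follows (toy model: integers modulo a 64-bit prime stand in for $G$, and all parameters are hypothetical; $F_v$ has exactly the same shape over $v', v_1, \ldots, v_n$):

```python
import secrets

P = 0xFFFFFFFFFFFFFFC5            # 2**64 - 59, a toy prime modulus
n = 8                             # toy bit-length of hashed strings

u_prime = 1 + secrets.randbelow(P - 1)
u = [1 + secrets.randbelow(P - 1) for _ in range(n)]

def F_u(id_bits):
    """F_u(ID) = u' * prod_{j=1}^{n} u_j^{ID_j}: multiply u' by every u_j
    whose corresponding identity bit is 1."""
    acc = u_prime
    for uj, bit in zip(u, id_bits):
        if bit:
            acc = (acc * uj) % P
    return acc

assert F_u([0] * n) == u_prime    # empty product leaves only u'
```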
$$C = (C_0, C_1, C_2, C_3, C_4) = (M \cdot \hat{e}(pk_0, pk_1)^{-t} \cdot \hat{e}(g_1, g_2)^s,\ \hat{e}(g, g)^t,\ g^s,\ F_u(ID)^s,\ F_v(w)^s),$$
and
5.2 Security
For more details about the security analysis, the reader is referred to the full version of this paper.
Here, we compare the major computational cost and security properties of our scheme with some available concrete CLE schemes that are claimed to be secure against malicious KGC attacks. The comparison is listed in Table 1. All the schemes involve three major operations: pairing, multiplication and exponentiation (in $G$ and $G_T$). For ease of analysis, we only count the number of those operations in the Encrypt and Decrypt phases. For the security properties, we consider the confidentiality against Type I adversaries and Type II adversaries respectively. Note that, in the table, the symbol ($n$) in the pairing column denotes that the number of pairing operations is independent of the message and can be pre-computed.
Table 2 shows that, although the computational cost of our scheme is relatively high, our scheme overcomes the security weaknesses of the previous schemes [13,20]. On the face of it, the ZW scheme has the advantages of a short public key and a higher computational efficiency. Unfortunately, the ZW scheme is not only insecure against Type II adversaries, but also relies on a stronger assumption.
6 Conclusion
In this paper, we have shown that the CLE scheme introduced by Hwang and Liu is vulnerable to attacks from Type II adversaries, including a malicious KGC. Furthermore, we have put forward a new concrete CLE scheme and proved its security against both types of adversaries, including a malicious KGC. Our security proofs have been rigorously presented in the standard model under the Decisional Bilinear Diffie-Hellman assumption.
References
1. Al-Riyami, S.S., Paterson, K.G.: Certificateless public key cryptography. In: Laih,
C.-S. (ed.) ASIACRYPT 2003. LNCS, vol. 2894, pp. 452–473. Springer, Heidelberg
(2003). https://doi.org/10.1007/978-3-540-40061-5 29
2. Al-Riyami, S.S., Paterson, K.G.: CBE from CL-PKE: a generic construction and
efficient schemes. In: Vaudenay, S. (ed.) PKC 2005. LNCS, vol. 3386, pp. 398–415.
Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30580-4 27
3. Au, M., Chen, J., Liu, J., Mu, Y., Wong, D., Yang G.: Malicious KGC attacks in
certificateless cryptography. In: Deng, R., Samarati, P. (eds.) ASIACCS 2007, pp.
302–311. ACM Press (2007)
4. Baek, J., Safavi-Naini, R., Susilo, W.: Certificateless public key encryption with-
out pairing. In: Zhou, J., Lopez, J., Deng, R.H., Bao, F. (eds.) ISC 2005. LNCS,
vol. 3650, pp. 134–148. Springer, Heidelberg (2005). https://doi.org/10.1007/
11556992 10
5. Bellare, M., Shoup, S.: Two-tier signatures, strongly unforgeable signatures, and
fiat-shamir without random oracles. In: Okamoto, T., Wang, X. (eds.) PKC 2007.
LNCS, vol. 4450, pp. 201–216. Springer, Heidelberg (2007). https://doi.org/10.
1007/978-3-540-71677-8 14
6. Bentahar, K., Farshim, P., Malone-Lee, J., Smart, N.: Generic construction
of identity-based and certificateless KEMs. Cryptology ePrint Archive: Report
2005/058 (2005). http://eprint.iacr.org/2005/058
7. Cheng, Z., Comley, R.: Efficient certificateless public key encryption. Cryptology
ePrint Archive: Report 2005/012 (2005). http://eprint.iacr.org/2005/012
8. Dent, A.: A survey of certificateless encryption schemes and security models. Cryp-
tology ePrint Archive, Report 2006/211 (2006)
9. Dent, A.W., Libert, B., Paterson, K.G.: Certificateless encryption schemes strongly
secure in the standard model. In: Cramer, R. (ed.) PKC 2008. LNCS, vol. 4939, pp.
344–359. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78440-
1 20
10. Huang, X., Susilo, W., Mu, Y., Zhang, F.: On the security of certificateless sig-
nature schemes from Asiacrypt 2003. In: Desmedt, Y.G., Wang, H., Mu, Y., Li,
Y. (eds.) CANS 2005. LNCS, vol. 3810, pp. 13–25. Springer, Heidelberg (2005).
https://doi.org/10.1007/11599371 2
11. Huang, Q., Wong, D.S.: Generic certificateless encryption in the standard model.
In: Miyaji, A., Kikuchi, H., Rannenberg, K. (eds.) IWSEC 2007. LNCS, vol.
4752, pp. 278–291. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-
540-75651-4 19
12. Huang, Q., Wong, D.: Generic certificateless encryption secure against malicious-
but-passive KGC attacks in the standard model. J. Comput. Sci. Technol. 25(4),
807–826 (2010)
13. Hwang, Y., Liu, J.: Certificateless public key encryption secure against malicious
KGC attacks in the standard model. J. Univ. Comput. Sci. 14(3), 463–480 (2008)
14. Libert, B., Quisquater, J.-J.: On constructing certificateless cryptosystems from
identity based encryption. In: Yung, M., Dodis, Y., Kiayias, A., Malkin, T. (eds.)
PKC 2006. LNCS, vol. 3958, pp. 474–490. Springer, Heidelberg (2006). https://
doi.org/10.1007/11745853 31
15. Shen, L., Zhang, F., Li, S.: Cryptanalysis of a certificateless encryption scheme
in the standard model. In: 4th International Conference on Intelligent Networking
and Collaborative Systems, INCos 2012 (2012)
16. Waters, B.: Efficient identity-based encryption without random oracles. In: Cramer,
R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 114–127. Springer, Heidelberg
(2005). https://doi.org/10.1007/11426639 7
17. Weng, J., Yao, G., Deng, R., Chen, M., Li, X.: Cryptanalysis of a certificateless
signcryption scheme in the standard model. Inf. Sci. 181(3), 661–667 (2011)
18. Yum, D.H., Lee, P.J.: Generic construction of certificateless signature. In: Wang,
H., Pieprzyk, J., Varadharajan, V. (eds.) ACISP 2004. LNCS, vol. 3108, pp. 200–
211. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27800-9 18
19. Yang, W., Zhang, F., Shen, L.: Efficient certificateless encryption withstanding
attacks from malicious KGC without using random oracles. Secur. Commun. Netw.
7(2), 445–454 (2014)
20. Zhang, G., Wang, X.: Certificateless encryption scheme secure in standard model.
Tsinghua Sci. Technol. 14(4), 452–459 (2009)
A Lattice Attack on Homomorphic
NTRU with Non-invertible Public Keys
1 Introduction
The NTRU encryption scheme designed by Hoffstein et al. [6] is considered a reasonable alternative to public key encryption schemes based on either integer factorization or discrete logarithms. Since its first introduction, minor parameter changes to avoid known attacks have been adopted. Even with its computational efficiency and the standardization of NTRU [11], a provably secure version was not known until Stehlé et al. proposed a modification of the original NTRU in 2011 [10]. The IND-CPA security of their modification is proven in the standard model under the hardness assumption of standard worst-case problems over ideal lattices [10]. Reflecting the continued progress in research on quantum computing, research on transitioning to quantum-resistant algorithms has become very active. Moreover, NIST has initiated a standardization process in post-quantum cryptography. The IND-CPA secure version of NTRU could be a strong candidate for the standardization of post-quantum public key encryption. The security proof of the IND-CPA secure NTRU was
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 248–254, 2018.
https://doi.org/10.1007/978-3-319-89500-0_22
given in [10] under the assumption that the public key is an invertible polynomial in R_q = Z_q[x]/(x^n + 1); however, no such result is known for a 'non-invertible' public key. López-Alt et al. observed that the IND-CPA secure NTRU can be made fully homomorphic and proposed the first multikey homomorphic encryption scheme for a bounded number of users [8]. Notably, the homomorphic NTRU [8] and its subsequent versions [3,9] do not assume invertible public keys. If q is a prime number and n is a power of 2 with q ≡ 1 (mod 2n), then there is a ring isomorphism between R_q and Z_q^n, and the number of non-invertible elements in R_q is q^n − (q − 1)^n.
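This count is easy to check by brute force for tiny parameters. The following sketch (our own toy check, not part of the paper) takes n = 2 and q = 5 (so q ≡ 1 mod 2n) and confirms that exactly q^n − (q − 1)^n = 9 of the 25 elements of R_q are non-invertible:

```python
# Toy check of the count q^n - (q-1)^n of non-invertible elements of
# R_q = Z_q[x]/(x^n + 1), here with n = 2, q = 5 (q ≡ 1 mod 2n).
n, q = 2, 5

def mul(a, b):
    """Multiply two degree-< n polynomials in Z_q[x]/(x^n + 1)."""
    res = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                res[k] = (res[k] + a[i] * b[j]) % q
            else:
                res[k - n] = (res[k - n] - a[i] * b[j]) % q  # x^n wraps to -1
    return res

elems = [[a, b] for a in range(q) for b in range(q)]  # all q^n elements a + b*x
one = [1] + [0] * (n - 1)
invertible = sum(1 for h in elems if any(mul(h, g) == one for g in elems))
non_invertible = q**n - invertible
assert non_invertible == q**n - (q - 1)**n  # 9 when n = 2, q = 5
```

Here x^2 + 1 splits over Z_5 as (x + 2)(x + 3), so an element is non-invertible exactly when it vanishes at one of the two roots, which recovers the same count.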
In this paper, we investigate the security impact of using a non-invertible public key in the homomorphic NTRU. We present a very effective lattice attack for message recovery on the homomorphic NTRU when the public key is not invertible. The message space of the homomorphic NTRU is {0, 1}, which implies that IND-CPA security is equivalent to security against the message recovery attack. We interpret the message recovery attack as solving a system of linear equations under some condition over a finite field Z_q using β(x) = (x^n + 1)/gcd(h(x), x^n + 1) ∈ Z_q[x] for any non-invertible public key pk = h(x). For a proof of successful message recovery in general, we use a sequence of sublattices of the target lattice and show that there is an optimal sublattice which yields the desired short vector by the LLL algorithm if deg β(x) ≤ log_4 q in the homomorphic NTRU. Moreover, it is known that the actual shortest output vector of the LLL algorithm can be much shorter than its theoretical bound. In fact, our experiments using MLLL (modified LLL) in [4] give much shorter vectors than the theoretical bound, and this suggests that preventing β(x) from having small degree is not enough to guarantee the security of the homomorphic NTRU under the message recovery attack. Therefore, we conclude that setting the public key of the homomorphic NTRU to be an invertible polynomial in R_q is desirable, since security against the message recovery attack is a minimal requirement for an encryption scheme. We note that lattice attacks known as subfield attacks on the NTRU cryptosystem were proposed by Cheon et al. [5] and Albrecht et al. [1]; the goal of a subfield attack is to recover the private key, which can be understood as a short vector of the NTRU lattice. Their subfield attacks are based on the fact that there exist subfields that allow one to reduce the dimension of the NTRU lattice, and they are successful when the modulus q is exponential in n. In contrast to [1,5], the goal of our lattice attack is message recovery when the public key is non-invertible.
The rest of the paper is organized as follows. In Sect. 2, we review the basics needed in this paper. In Sect. 3, we show how to mount a successful message recovery attack when the public key is not invertible. In Sect. 4, we conclude the paper.
2 Preliminaries
2.1 The Basic Scheme of Homomorphic NTRU
The homomorphic NTRU is defined over the ring R_q = Z_q[x]/(x^n + 1), where q is a prime number and n is a power of two. Any element k(x) ∈ R_q is represented
250 S. Ahn et al.
as k(x) = Σ_{i=0}^{n−1} k_i x^i, where −q/2 < k_i < q/2. For the ring R = Z[x]/(x^n + 1), we write k(x) ← χ for an appropriate distribution χ, with each coefficient of k(x) satisfying |k_i| ≤ 1 if k(x) ← χ. In the homomorphic version in [8], it is assumed that q = 2^{n^δ} with 0 < δ < 1 and the message space is {0, 1}, while q = poly(n) with message space {0, 1}^n was considered in the provably IND-CPA secure version [10]. The basic scheme of the homomorphic NTRU consists of three polynomial-time algorithms (KeyGen, Enc, Dec).
KeyGen(1^κ): Sample polynomials f̃(x), g(x) ← χ, repeating the sampling of f̃(x) until f(x) := 2f̃(x) + 1 is invertible in R_q, and denote the inverse of f(x) in R_q by (f(x))^{−1}. Output pk = h(x) := 2g(x)(f(x))^{−1} (mod q, x^n + 1) and sk = f(x).
Enc(pk, m ∈ {0, 1}): Sample polynomials s(x), e(x) ← χ, and output c(x) := h(x)s(x) + 2e(x) + m (mod q, x^n + 1).
Dec(sk, c): Compute μ(x) = f(x)c(x) (mod q, x^n + 1), and output m = μ(x) (mod 2).
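The three algorithms above can be sketched concretely. The following toy implementation is our own illustration: the parameters n = 8, q = 257 and the ternary distribution for χ are small illustrative choices, not those of [8], and f is inverted by Gaussian elimination on its negacyclic matrix rather than by any particular method from the paper.

```python
import random

n, q = 8, 257  # toy parameters: n a power of two, q prime, q ≡ 1 (mod 2n)

def poly_mul(a, b):
    """Multiply in R_q = Z_q[x]/(x^n + 1): the wrap-around picks up a minus sign."""
    res = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                res[k] = (res[k] + a[i] * b[j]) % q
            else:
                res[k - n] = (res[k - n] - a[i] * b[j]) % q
    return res

def poly_inv(f):
    """Invert f in R_q by Gauss-Jordan elimination on its negacyclic matrix;
    returns None when f is not invertible."""
    A = [[0] * n + [1 if i == 0 else 0] for i in range(n)]  # augmented with e_0
    for j in range(n):
        col = poly_mul(f, [1 if k == j else 0 for k in range(n)])  # f * x^j
        for i in range(n):
            A[i][j] = col[i]
    for c in range(n):
        piv = next((r for r in range(c, n) if A[r][c] % q != 0), None)
        if piv is None:
            return None
        A[c], A[piv] = A[piv], A[c]
        s = pow(A[c][c], q - 2, q)            # pivot inverse mod prime q
        A[c] = [v * s % q for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] % q != 0:
                fac = A[r][c]
                A[r] = [(v - fac * w) % q for v, w in zip(A[r], A[c])]
    return [A[i][n] for i in range(n)]

def sample_chi():
    return [random.choice([-1, 0, 1]) for _ in range(n)]

def keygen():
    while True:
        f_tilde, g = sample_chi(), sample_chi()
        f = [2 * c % q for c in f_tilde]
        f[0] = (f[0] + 1) % q                 # f = 2*f~ + 1
        f_inv = poly_inv(f)
        if f_inv is not None:
            return poly_mul([2 * c % q for c in g], f_inv), f  # h = 2g/f, sk = f

def enc(h, m):
    s, e = sample_chi(), sample_chi()
    c = poly_mul(h, s)
    c = [(ci + 2 * ei) % q for ci, ei in zip(c, e)]
    c[0] = (c[0] + m) % q                     # c = h*s + 2e + m
    return c

def dec(f, c):
    mu0 = poly_mul(f, c)[0]                   # constant term of mu = f*c
    mu0 = mu0 - q if mu0 > q // 2 else mu0    # centered lift before mod 2
    return mu0 % 2

random.seed(1)
h, f = keygen()
assert all(dec(f, enc(h, m)) == m for m in (0, 1) for _ in range(5))
```

With these toy parameters the noise polynomial 2g(x)s(x) + 2f(x)e(x) + f(x)m has coefficients bounded well below q/2, so the centered lift in Dec always recovers m.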
Suppose that the public key h(x) is not invertible in R_q. Because q is prime, Z_q[x] is a unique factorization domain. If h(x) is not invertible in R_q, then gcd(h(x), x^n + 1) = d(x) ≠ 1 in Z_q[x]. Therefore, x^n + 1 = β(x)d(x) and gcd(β(x), h(x)) = 1 in Z_q[x]. Since x^n + 1 divides β(x)h(x), we see that β(x)h(x) = 0 in R_q. For a given ciphertext c(x) = h(x)s(x) + 2e(x) + m, we see that w(x) = β(x)c(x) mod (q, x^n + 1) = β(x)(2e(x) + m) mod (q, x^n + 1). In the homomorphic NTRU, the plaintext is chosen from {0, 1}, and therefore its IND-CPA security is equivalent to security against the message recovery attack. Therefore, the IND-CPA adversarial goal is to recover m ∈ {0, 1} from w(x) = β(x)(2e(x) + m) mod (q, x^n + 1), where m and e(x) are unknown while w(x) and β(x) are known.
Theorem 1. For any given ciphertext c(x), the plaintext m ∈ {0, 1} can be recovered as m = (Σ_{i=0}^{n−1} λ_i w_i mod q) mod 2, where w(x) = β(x)c(x) mod (q, x^n + 1) = Σ_{i=0}^{n−1} w_i x^i.
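The secret-eliminating identity β(x)h(x) = 0 that underlies this attack can be demonstrated directly. The sketch below is our own illustration: it builds a deliberately non-invertible toy key from a root r of x^n + 1 (here n = 8, q = 257, r = 249, since (−8)^8 ≡ −1 mod 257) and checks only that β(x)c(x) = β(x)(2e(x) + m), leaving the actual recovery of m to the lattice step discussed next.

```python
import random

n, q = 8, 257
r = 249  # a root of x^n + 1 mod q, because (-8)^8 = -1 (mod 257)

def poly_mul(a, b):
    """Multiply in R_q = Z_q[x]/(x^n + 1)."""
    res = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                res[k] = (res[k] + a[i] * b[j]) % q
            else:
                res[k - n] = (res[k - n] - a[i] * b[j]) % q
    return res

# Non-invertible public key: h = (x - r) * u shares the factor (x - r)
# with x^n + 1, so gcd(h, x^n + 1) != 1 in Z_q[x].
u = [random.randrange(q) for _ in range(n - 1)] + [0]
x_minus_r = [(-r) % q, 1] + [0] * (n - 2)
h = poly_mul(x_minus_r, u)

# beta = (x^n + 1)/(x - r) = x^{n-1} + r x^{n-2} + ... + r^{n-1}
beta = [pow(r, n - 1 - k, q) for k in range(n)]
assert poly_mul(beta, h) == [0] * n           # beta * h = 0 in R_q

# Encrypt a bit m; beta annihilates the h(x)s(x) masking term.
m = random.choice([0, 1])
s = [random.choice([-1, 0, 1]) for _ in range(n)]
e = [random.choice([-1, 0, 1]) for _ in range(n)]
c = poly_mul(h, s)
c = [(ci + 2 * ei) % q for ci, ei in zip(c, e)]
c[0] = (c[0] + m) % q

two_e_plus_m = [2 * ei % q for ei in e]
two_e_plus_m[0] = (two_e_plus_m[0] + m) % q
assert poly_mul(beta, c) == poly_mul(beta, two_e_plus_m)  # w = beta*(2e + m)
```

The final assertion is exactly the equation the adversary solves: w(x) no longer depends on the secret s(x), only on the small unknowns e(x) and m.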
Proof. For a given vector η = (η_0, . . . , η_{n−1}) = Σ_{i=0}^{n−1} λ_i b_i mod q for which Condition(*) holds, we have

Σ_{i=0}^{n−1} λ_i w_i mod q = ⟨Σ_{i=0}^{n−1} λ_i b_i, 2e + m⟩ mod q = (Σ_{i=0}^{n−1} 2e_i η_i) + mη_{n−1} mod q.

From the assumptions |η_i| < q/(4n + 2), |e_i| ≤ 1, and m ∈ {0, 1}, we see that

|(Σ_{i=0}^{n−1} 2e_i η_i) + mη_{n−1}| < 2n · q/(4n + 2) + q/(4n + 2) = q(2n + 1)/(4n + 2) = q/2.

Therefore, we have Σ_{i=0}^{n−1} λ_i w_i mod q = Σ_{i=0}^{n−1} 2e_i η_i + mη_{n−1}, which implies that (Σ_{i=0}^{n−1} λ_i w_i mod q) mod 2 = m.
Note that Theorem 1 works for any solution (λ_i)_{0≤i≤n−1}, which is easy to compute from η by simple linear algebra over Z_q. Therefore, for a successful message recovery attack, it is enough to obtain a vector η ∈ Z^n that satisfies Condition(*).

Let L_{B_{i,red}} ⊂ Z^i be the lattice generated by the row vectors of B_{i,red}. If η_red = (η_{n−i}, . . . , η_{n−1}) ∈ L_{B_{i,red}} is a short vector that satisfies Condition(*), then η = (η_j)_{0≤j≤n−1} is a short vector in L_B that satisfies Condition(*), where η_j = 0 if 0 ≤ j ≤ n − i − 1 and η_j = (η_red)_j if n − i ≤ j ≤ n − 1. From [4], the shortest vector v_i ∈ L_{B_{i,red}} of the output of the LLL algorithm for the lattice generated by the row vectors of B_{i,red} satisfies

||v_i|| ≤ ||v_i||_2 ≤ 2^{(i−1)/4} det(B_{i,red})^{1/i} = 2^{(i−1)/4} q^{1/i}.
4 Conclusion
The IND-CPA security of the homomorphic NTRU is proven when the public key
is invertible in Rq [10]. However, no result on the security of the homomorphic
NTRU is known when the public key is not invertible. In this paper, we showed that if the public key of the homomorphic NTRU is not invertible, then one can effectively use a lattice reduction algorithm to recover the plaintext of any ciphertext. Therefore, we conclude that the public key of the homomorphic NTRU should be invertible in the ring R_q to guarantee the IND-CPA security of homomorphic variants of NTRU [3,8,9].
Acknowledgement. Hyang-Sook Lee and Seongan Lim were supported by Basic Sci-
ence Research Program through the National Research Foundation of Korea (NRF)
funded by the Ministry of Science, ICT and Future Planning (Grant Number:
2015R1A2A1A15054564). Seongan Lim was also supported by Basic Science Research
Program through the NRF funded by the Ministry of Science, ICT and Future Plan-
ning (Grant Number: 2016R1D1A1B01008562). Ikkwon Yie was supported by Basic
Science Research Program through the NRF funded by the Ministry of Science, ICT
and Future Planning (Grant Number: 2017R1D1A1B03034721).
References
1. Albrecht, M., Bai, S., Ducas, L.: A subfield lattice attack on overstretched NTRU
assumptions. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9814,
pp. 153–178. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53018-4_6
2. Ajtai, M.: The shortest vector problem in L2 is NP-hard for randomized reductions.
In: STOC 1998, pp. 10–19 (1998)
3. Bos, J.W., Lauter, K., Loftus, J., Naehrig, M.: Improved security for a ring-based
fully homomorphic encryption scheme. In: Proceedings of IMA International Con-
ference 2013, pp. 45–64 (2013)
4. Bremner, M.R.: Lattice Basis Reduction-An Introduction to the LLL Algorithm
and its Applications. CRC Press, Boca Raton (2012)
5. Cheon, J.H., Jeong, J., Lee, C.: An algorithm for NTRU problems and cryptanal-
ysis of the GGH multilinear map without an encoding of zero. Cryptology ePrint
Archive, Report 2016/139 (2016)
6. Hoffstein, J., Pipher, J., Silverman, J.H.: NTRU: a ring-based public key cryptosys-
tem. In: Buhler, J.P. (ed.) ANTS 1998. LNCS, vol. 1423, pp. 267–288. Springer,
Heidelberg (1998). https://doi.org/10.1007/BFb0054868
7. Lenstra, A.K., Lenstra, H.W., Lovász, L.: Factoring polynomials with rational
coefficients. Mathematische Ann. 261, 513–534 (1982)
8. López-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: STOC 2012, pp.
1219–1234 (2012)
9. Rohloff, K., Cousins, D.B.: A scalable implementation of fully homomorphic
encryption built on NTRU. In: Böhme, R., Brenner, M., Moore, T., Smith, M.
(eds.) FC 2014. LNCS, vol. 8438, pp. 221–234. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44774-1_18
10. Stehlé, D., Steinfeld, R.: Making NTRU as secure as worst-case problems over ideal
lattices. In: Paterson, K.G. (ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 27–47.
Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20465-4_4
11. Security Innovation: NTRU PKCS Tutorial. https://www.securityinnovation.com
Practical Range Proof for Cryptocurrency
Monero with Provable Security
1 Introduction
Research on cryptocurrency began in the 1980s, when Chaum [5] proposed the first electronic cash scheme. As the first and also the most popular decentralized cryptocurrency, Bitcoin was created in 2009 [9]. Achieving security and privacy simultaneously has been the design goal for most of these schemes since then. However, Bitcoin does not offer strong privacy, as all transactions are publicly broadcast and replicated through the Bitcoin blockchain. Any data on the Bitcoin blockchain can be collected and mined to derive information that may undermine user privacy.
In order to tackle the privacy issues of Bitcoin, a new open-source decentralized cryptocurrency, called Monero, was created in April 2014. Firstly, it uses
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 255–262, 2018.
https://doi.org/10.1007/978-3-319-89500-0_23
256 K. Li et al.
linkable ring signatures to obscure the origin of a transaction and thus provide better payer privacy. Secondly, it utilises a single-use, randomised version of the receiver's public key, called a stealth address, to provide better payee privacy. Finally, wallet balances are not stored in the clear; instead, they are hidden using commitment schemes. As wallet balances are stored in committed form, RingCT requires the sender to issue a proof that (1) the input of a RingCT transaction is equal to its output; and (2) the balance of each wallet involved is within the range of permitted values. The second part is a range proof, and its importance has been explained in detail in the Monero white paper [10]. Briefly, the cryptographic primitives of Monero work in a cyclic group of known order q, so a commitment to −1 is equivalent to a commitment to q − 1. Thus, a sender with 1 dollar in his wallet could send a transaction with outputs of 2 and q − 1 if no range proof is performed. By requiring the sender to prove that all commitments involved are confined within a small range, the above attack can be prevented.
Despite being one of the most important components, accounting for over 50% of the total bandwidth of a RingCT transaction, the range proof employed in RingCT is not well studied. This range proof combines the bit decomposition technique with ring signatures in an unconventional way [8]. The Monero white paper [10] proposed a new ring signature called ANSL and discussed two other options, namely a well-studied scheme from Abe et al. [1] and a newly proposed scheme called the Borromean ring signature [8], but no formal security analysis of the range proof instantiated from these ring signatures is given.
– Let M = H(C, C_0, . . ., C_{l−1}) for some hash function H. For each i, the prover generates a ring signature on the ring R_i = {C_i, C_i/g^{2^i}} by invoking σ_i = Π.Sign(R_i, r_i, M). The range proof π for C is (C_0, σ_0, . . ., C_{l−1}, σ_{l−1}).
– Upon receiving a proof π, the verifier computes M = H(C, C_0, . . ., C_{l−1}) and outputs accept if and only if the following hold:
C = Π_{i=0}^{l−1} C_i
1 = Π.Verify({C_0, C_0/g}, M, σ_0)
...
1 = Π.Verify({C_{l−1}, C_{l−1}/g^{2^{l−1}}}, M, σ_{l−1})
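The reassembly equation C = Π C_i can be checked in a few lines. The sketch below is our own illustration in a toy Schnorr group with hypothetical parameters (p = 2039, q = 1019, g = 4, h = 9, far too small for real use): it commits to the scaled bits 2^i u_i of v with blinders that sum to r, and checks that the per-bit commitments multiply back to the Pedersen commitment to v.

```python
import random

# Toy Schnorr group (illustrative only): p = 2q + 1 with p, q prime;
# g and h generate the order-q subgroup of squares mod p.
p, q = 2039, 1019
g, h = 4, 9
l = 4                      # bit length of the range [0, 2^l - 1]

v = 11                     # committed value, v in [0, 2^l - 1]
r = random.randrange(q)
C = pow(g, v, p) * pow(h, r, p) % p           # Pedersen commitment to v

# Bit decomposition: C_i commits to 2^i * u_i, with blinders summing to r.
u = [(v >> i) & 1 for i in range(l)]
rs = [random.randrange(q) for _ in range(l - 1)]
rs.append((r - sum(rs)) % q)
Cs = [pow(g, (1 << i) * u[i], p) * pow(h, rs[i], p) % p for i in range(l)]

# The product of the per-bit commitments reassembles C = g^v h^r.
prod = 1
for Ci in Cs:
    prod = prod * Ci % p
assert prod == C
```

This homomorphic reassembly is why proving each C_i commits to 0 or 2^i suffices to bound the committed value.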
Discussions. The Monero range proof follows the folklore bit decomposition approach while replacing the standard 0/1 OR proof for each commitment C_i with a ring signature. The design philosophy, as explained in [8], is that: "If C was a commitment to 1 then I do not know its discrete log, but C′ becomes a commitment to 0 and I do know its discrete log (just the blinding factor). If C was a commitment to 0 I know its discrete log, and I don't for C′. If it was a commitment to any other amount, none of the results will be zero and I won't be able to sign".
After confirming that each C_i can only be a commitment to 0 or 2^i, the equation C = Π_{i=0}^{l−1} C_i assures the verifier that C is a commitment to a number between 0 and 2^l − 1.
BKM Ring Signature. First, we review the ring signature scheme from [2]. The BKM scheme works in a group equipped with a bilinear map ê : G × G → G_T. As shown in [2], this ring signature scheme is unforgeable under the CDH assumption in the standard model when H is a Waters hash function.
Theorem 1. If a ppt extractor E exists for (P, V), then one can construct a simulator S that solves the DL problem given a CDH oracle.
Proof. We consider the case l = 1, i.e., the ring-signature-based range proof in which the committed value is either 0 or 1. Assume (P, V) is a non-interactive zero-knowledge proof-of-knowledge system. This means there exists a ppt extractor E which can extract a witness from any P capable of outputting valid proofs (i.e., proofs accepted by V). We show how to construct a ppt S, having access to a CDH oracle, which can solve the DL problem through interaction with E.
S receives Y as a problem instance, and its goal is to output r such that Y = h^r. It flips a coin b ∈ {0, 1} and sets C = g^b Y. Then, it invokes its CDH oracle on input (C, C/g) to obtain a value Z. It computes M = C, generates a random t ∈_R Z_q, and uses Z and t to produce a proof π as follows: compute σ_1 = Z(H_0(M)H_1(M))^t, σ_2 = h^t. Output σ = (σ_1, σ_2) as the proof that C = h^r ∨ C = gh^r. Invoke E on σ to obtain a witness r satisfying C = h^r or C = gh^r. In both cases (b = 0 and b = 1), S outputs r as the solution to the DL problem. Now if b = 0, then S succeeds if C = h^r, and if b = 1, then S succeeds if C = gh^r. Since b is hidden from E, with probability 1/2, S is able to solve the DL problem.
Under the assumption that DL remains hard even given a CDH oracle, the above theorem implies that no such E exists. In other words, (P, V) cannot be a proof of knowledge.
Let g, h be two random generators of G, which can be sampled with some public randomness and also serve as the public key of the Pedersen commitment. Let H be a cryptographic hash function. The common reference string of the range proof is crs = (G, q, g, h, H). The range proof Π consists of two algorithms, described as follows:
[0, l − 1].
4. It sets C_{i,0} = C_i and computes C_{i,1} = C_i/g^{2^i} for i ∈ [0, l − 1].
5. For i ∈ [0, l − 1], it samples c_{i,ū_i}, α_i, z_{i,ū_i} ←$ Z_q, and computes T_{i,u_i} = h^{α_i} and T_{i,ū_i} = h^{z_{i,ū_i}} C_{i,ū_i}^{c_{i,ū_i}}.
6. It computes the challenge c = H(C, C_0, . . . , C_{l−1}, T_{0,0}, T_{0,1}, . . . , T_{l−1,0}, T_{l−1,1}), and for i ∈ [0, l − 1], it computes c_{i,u_i} = c − c_{i,ū_i} mod q and z_{i,u_i} = α_i − c_{i,u_i} r_i.
7. It outputs the proof π = ({C_i}_{i∈[0,l−2]}, {c_{i,0}, c_{i,1}}_{i∈[0,l−1]}, {z_{i,0}, z_{i,1}}_{i∈[0,l−1]}).
– Verify. On input an integer l, a commitment C, and a proof π:
1. The verify algorithm computes C_{l−1} = C/(Π_{i=0}^{l−2} C_i).
2. For i ∈ [0, l − 1], it computes C′_i = C_i/g^{2^i}, T_{i,0} = h^{z_{i,0}} C_i^{c_{i,0}} and T_{i,1} = h^{z_{i,1}} (C′_i)^{c_{i,1}}.
3. It computes c = H(C, C_0, . . . , C_{l−1}, T_{0,0}, T_{0,1}, . . . , T_{l−1,0}, T_{l−1,1}).
4. It outputs 1 iff c = c_{i,0} + c_{i,1} mod q for all i ∈ [0, l − 1], and outputs 0 otherwise.
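The Prove/Verify pair above can be sketched end to end. The following is our own sketch under stated assumptions: a toy Schnorr group with illustrative parameters (p = 2039, q = 1019, generators g = 4, h = 9, far too small for real security) and SHA-256 reduced mod q standing in for H.

```python
import hashlib
import random

# Toy Schnorr group: p = 2q + 1 with p, q prime; g, h generate the
# order-q subgroup of squares mod p (illustrative parameters only).
p, q = 2039, 1019
g, h = 4, 9

def H(*vals):
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def inv(x):
    return pow(x, p - 2, p)                   # inverse mod prime p

def branches(Ci, i):
    """C_{i,0} = C_i and C_{i,1} = C_i / g^{2^i}."""
    return [Ci, Ci * inv(pow(g, 1 << i, p)) % p]

def prove(v, r, l):
    u = [(v >> i) & 1 for i in range(l)]
    rs = [random.randrange(q) for _ in range(l - 1)]
    rs.append((r - sum(rs)) % q)              # blinders sum to r
    Cs = [pow(g, (1 << i) * u[i], p) * pow(h, rs[i], p) % p for i in range(l)]
    C = pow(g, v, p) * pow(h, r, p) % p
    cc = [[0, 0] for _ in range(l)]
    zz = [[0, 0] for _ in range(l)]
    T = [[0, 0] for _ in range(l)]
    alpha = [0] * l
    for i in range(l):
        ub = 1 - u[i]                         # simulated branch of the OR proof
        cc[i][ub], zz[i][ub] = random.randrange(q), random.randrange(q)
        Cb = branches(Cs[i], i)
        T[i][ub] = pow(h, zz[i][ub], p) * pow(Cb[ub], cc[i][ub], p) % p
        alpha[i] = random.randrange(q)
        T[i][u[i]] = pow(h, alpha[i], p)      # real branch commitment
    c = H(C, *Cs, *[t for pair in T for t in pair])
    for i in range(l):
        cc[i][u[i]] = (c - cc[i][1 - u[i]]) % q
        zz[i][u[i]] = (alpha[i] - cc[i][u[i]] * rs[i]) % q
    return Cs[:l - 1], cc, zz                 # C_{l-1} is recomputed from C

def verify(C, proof, l):
    Cs_part, cc, zz = proof
    prod = 1
    for Ci in Cs_part:
        prod = prod * Ci % p
    Cs = list(Cs_part) + [C * inv(prod) % p]  # step 1: C_{l-1} = C / prod
    T = []
    for i in range(l):
        Cb = branches(Cs[i], i)
        T += [pow(h, zz[i][b], p) * pow(Cb[b], cc[i][b], p) % p for b in (0, 1)]
    c = H(C, *Cs, *T)
    return all((cc[i][0] + cc[i][1]) % q == c for i in range(l))

l = 4
v, r = 11, random.randrange(q)
C = pow(g, v, p) * pow(h, r, p) % p
assert verify(C, prove(v, r, l), l)
```

For the real branch, h^{z_{i,u_i}} C_{i,u_i}^{c_{i,u_i}} = h^{α_i − c_{i,u_i} r_i} h^{c_{i,u_i} r_i} = h^{α_i} = T_{i,u_i}, so the verifier's recomputed T values reproduce the original challenge.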
Proof Sketch. The range proof proposed in this section is similar to the range proof presented in [7]. The main difference is that in [7] each commitment C_i is a commitment of u_i, while in our range proof it is a commitment of 2^i u_i. We remark that our modification does not affect the security of the range proof, since these two types of commitments are interconvertible, i.e., given a commitment of u_i (or 2^i u_i), one can easily generate a commitment of 2^i u_i (or u_i). We illustrate this in more detail in the full version of this paper.
                         Prover   Verifier
This paper               5l       4l
This paper (optimised)   3.1l     2.2l
Monero (ANSL)            3l       2l + 2
4 Conclusion
In this paper, we studied the range proof protocol employed in Monero. Firstly, we pointed out that the design philosophy of this range proof does not guarantee its security. Secondly, we designed a new range proof protocol, presented a formal security proof, and illustrated that its efficiency is comparable to that of Monero. Moreover, the improved protocol is compatible with Monero's wallet and algebraic structure. Therefore, our proposed range proof protocol is secure and practical.
Acknowledgement. This work was supported by the National Natural Science Foun-
dation of China (Grant No. 61602396, U1636205, 61572294, 61632020).
References
1. Abe, M., Ohkubo, M., Suzuki, K.: 1-out-of-n signatures from a variety of keys.
In: Zheng, Y. (ed.) ASIACRYPT 2002. LNCS, vol. 2501, pp. 415–432. Springer,
Heidelberg (2002). https://doi.org/10.1007/3-540-36178-2_26
2. Bender, A., Katz, J., Morselli, R.: Ring signatures: stronger definitions, and con-
structions without random oracles. Cryptology ePrint Archive, Report 2005/304
(2005). http://eprint.iacr.org/2005/304
3. Boudot, F.: Efficient proofs that a committed number lies in an interval. In:
Preneel, B. (ed.) EUROCRYPT 2000. LNCS, vol. 1807, pp. 431–444. Springer,
Heidelberg (2000). https://doi.org/10.1007/3-540-45539-6_31
4. Chaabouni, R., Lipmaa, H., Shelat, A.: Additive combinatorics and discrete log-
arithm based range protocols. In: Steinfeld, R., Hawkes, P. (eds.) ACISP 2010.
LNCS, vol. 6168, pp. 336–351. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14081-5_21
5. Chaum, D.: Blind signatures for untraceable payments. In: Chaum, D., Rivest,
R.L., Sherman, A.T. (eds.) Advances in Cryptology, pp. 199–203. Springer, Boston
(1983). https://doi.org/10.1007/978-1-4757-0602-4_18
6. Lipmaa, H., Asokan, N., Niemi, V.: Secure Vickrey auctions without thresh-
old trust. In: Blaze, M. (ed.) FC 2002. LNCS, vol. 2357, pp. 87–101. Springer,
Heidelberg (2003). https://doi.org/10.1007/3-540-36504-4_7
7. Mao, W.: Guaranteed correct sharing of integer factorization with off-line share-
holders. In: Imai, H., Zheng, Y. (eds.) PKC 1998. LNCS, vol. 1431, pp. 60–71.
Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0054015
8. Maxwell, G.: Confidential transactions. Web, June 2015
9. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. White paper (2009).
https://bitcoin.org/bitcoin.pdf
10. Noether, S., Mackenzie, A., Monero Core Team: Ring confidential transactions.
Monero research lab report MRL-0005, February 2016
Wireless Sensor Network Security
Modeling Key Infection
in Large-Scale Sensor Networks
1 Introduction
Typically, a sensor network is composed of a large number of sensor nodes; each sensor node is a small, inexpensive wireless device with limited battery power, memory storage, data processing capacity, and short radio transmission range. Additionally, sensor networks are often operated in an unattended mode, and sensors are not tamper resistant. This makes sensor networks more vulnerable than traditional wireless networks.
The first practical key pre-distribution scheme [1,2] for sensor networks is the random key pre-distribution scheme introduced by Eschenauer and Gligor [3], later investigated by Yağan and Makowski [4]. A major advantage of this scheme is the exclusion of the base station from key management. Another category of schemes is location-based key pre-distribution [5,6], which takes advantage of sensor deployment information to improve network performance. Location-based schemes can reach the same connectivity with fewer keys stored in sensors than previous schemes.
In this paper, we are interested in networks of a very large number of very simple sensors; the number is such that it is infeasible to deploy every sensor node manually. Deployment in batches implies a self-organizing network that is automatically and autonomously established upon physical deployment. The large number of sensors also makes it hard to change code or data stored in every sensor; it is much easier to mass-produce sensors that are identical even at the firmware and data level.
Key infection [7] is a lightweight security protocol suitable for large-scale sen-
sor networks and is based on the assumption that, during the network deploy-
ment phase, the adversary can monitor only a fixed percentage of communication
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 265–275, 2018.
https://doi.org/10.1007/978-3-319-89500-0_24
266 F. Peng et al.
channels. Sensors simply broadcast keys in the clear to all their neighbors. Plaintext key exchange is of little use in common scenarios, but when this process starts in hundreds of thousands of instances at a time, it becomes extremely difficult for the adversary to compromise a large fraction of the keys in the network.
To analyze key infection protocol, we propose a probability model of key
infection. This model can help designers to evaluate key infection and adapt it
to their needs. Then, a group based key infection protocol is proposed to improve
the security performance of key infection.
Basic Key Infection. The idea of basic key infection (B-KI) [7] is to propagate keying material after deployment: each sensor simply chooses a key and broadcasts it in plaintext to its neighbors.
Assume sensor i, when it comes to rest after deployment, broadcasts a key
ki and is heard by sensor j. Sensor j then generates a key kji and sends to
i: {j, kji }ki . Later on, the key kji can be used to protect communication link
between sensors i and j.
W1 → W3 : {W1, W2, N1}_{k13}
W3 → W2 : {W1, W2, N1}_{k23}
W2 computes : k′12 = H(k12||N1)
W2 → W1 : {N1, N2}_{k′12}
W1 → W2 : {N2}_{k′12}
where N1 and N2 are nonces, {M}_{ki} represents the message M encrypted using key ki, and H(·) is a hash function. After the protocol terminates, W1 and W2 update their key from k12 to k′12 = H(k12||N1).
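The key-update step is a simple key derivation. A minimal sketch, assuming SHA-256 as the hash H (the protocol does not fix a concrete hash):

```python
import hashlib
import secrets

def update_key(k12: bytes, n1: bytes) -> bytes:
    """k'_12 = H(k12 || N1): both endpoints derive the same refreshed key."""
    return hashlib.sha256(k12 + n1).digest()

k12 = secrets.token_bytes(16)   # key from the initial plaintext exchange
n1 = secrets.token_bytes(16)    # nonce relayed to W2 through W3

# W1 and W2 compute the update independently and agree on k'_12; an
# eavesdropper must have overheard both k12 and the nonce exchange
# (protected under k13 and k23) to learn the refreshed key.
assert update_key(k12, n1) == update_key(k12, n1)
assert update_key(k12, n1) != update_key(secrets.token_bytes(16), n1)
```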
Therefore, the probability P{b} that there are exactly b eavesdroppers in the overlap region R_ij(R) is

P{b} = (t choose b) (E[A_ij(R)]/S)^b (1 − E[A_ij(R)]/S)^{t−b},

and the probability of at least one eavesdropper located inside the interior of region R_ij(R) is

P_{R_ij(R)} = 1 − P{b = 0} = 1 − (1 − E[A_ij(R)]/S)^t ≈ 1 − (1 − 0.5865πR²/S)^t.
Let n′ denote the average number of neighbors of a sensor. Because the sensors are distributed over the field uniformly, when n ≫ n′ and S ≫ πR², we have πR²/S ≈ (n′ + 1)/n.
Let B_ij, B_ij̄, and B_īj be the events that the adversary has placed eavesdroppers in regions R_ij(R), R_ij̄(R), and R_īj(R), respectively. Clearly, B_ij, B_ij̄, and B_īj are independent, and the event B that the link key between i and j is broken in B-KI is B = B_ij ∪ (B_ij̄ ∩ B_īj).
Thus, for the basic key infection B-KI, the outage probability P_{B-KI} that the link key between a pair of sensors is compromised is equal to the probability that event B occurs. More precisely,

P_{B-KI} = P_{R_ij(R)} + P_{R_ij̄(R)} · P_{R_īj(R)} − P_{R_ij(R)} · P_{R_ij̄(R)} · P_{R_īj(R)}.
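The outage formula is straightforward to evaluate numerically. The sketch below (our own check) defines ϕ(c) = 1 − (1 − cπR²/S)^t for a region with expected area cπR²; the lens coefficient 0.5865 comes from the text, while the crescent coefficient 1 − 0.5865 is our own derivation (circle area minus expected lens area).

```python
import math

def phi(coeff, R, S, t):
    """Probability that at least one of t uniformly placed eavesdroppers
    lands in a region of expected area coeff * pi * R^2 inside a field of
    area S (cf. the formula for P_{R_ij(R)})."""
    return 1 - (1 - coeff * math.pi * R**2 / S) ** t

def p_bki(R, S, t, lens=0.5865):
    # P_{B-KI} = P_ij + P_ij~ P_i~j - P_ij P_ij~ P_i~j, with each crescent
    # region given coefficient 1 - 0.5865 (a derived assumption).
    p_lens = phi(lens, R, S, t)
    p_cres = phi(1 - lens, R, S, t)
    return p_lens + p_cres**2 - p_lens * p_cres**2

# Example: 500 x 500 field, radius 35, increasing eavesdropper counts.
probs = [p_bki(35, 500 * 500, t) for t in (5, 10, 20)]
assert all(0 <= x <= 1 for x in probs)
assert probs[0] < probs[1] < probs[2]   # more eavesdroppers, higher outage
```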
For two-party whispering key infection, W-KI(2), the outage probability P_{W-KI(2)} that the key is compromised is

P_{W-KI(2)} = P_{R_ij(x)} + P_{R_ij̄(x)} P_{R_īj(x)} − P_{R_ij(x)} P_{R_ij̄(x)} P_{R_īj(x)}
            = ϕ(0.1955) + ϕ²(0.3045) − ϕ(0.1955)ϕ²(0.3045).  (2)
Now consider the case where only one party applies whispering key infection, denoted W-KI(1). As depicted in Fig. 3(b) and (c), sensor i uses whispering key infection to communicate with sensor j, but j communicates with i using the maximum communication radius R. In this case, the area of the lenticular overlap region R_ij(x, R) is

A_ij(x, R) = πx² for 0 < x ≤ R/2, and A_ij(x, R) = g(x) for R/2 < x ≤ R,

where g(x) = 2x² sin⁻¹(R/2x) + R² cos⁻¹(R/2x) − (R/2)√(4x² − R²). Hence

E[A_ij(x, R)] = ∫₀^R A_ij(x, R)(2x/R²) dx = ∫₀^{R/2} πx²(2x/R²) dx + ∫_{R/2}^R g(x)(2x/R²) dx ≈ 0.2932πR².
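The constant 0.2932 can be verified by direct numerical integration (our own check, with the radius normalized to R = 1):

```python
import math

R = 1.0   # normalize the communication radius

def A_ij(x):
    """Lenticular overlap area for whisper radius x against full radius R."""
    if x <= R / 2:
        return math.pi * x * x
    return (2 * x * x * math.asin(R / (2 * x))
            + R * R * math.acos(R / (2 * x))
            - 0.5 * R * math.sqrt(4 * x * x - R * R))

def expected_area(steps=200000):
    """E[A_ij(x, R)] = integral_0^R A_ij(x) (2x/R^2) dx via the midpoint rule,
    where 2x/R^2 is the density of the distance between the two sensors."""
    dx = R / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += A_ij(x) * (2 * x / R**2) * dx
    return total

ratio = expected_area() / (math.pi * R**2)
assert abs(ratio - 0.2932) < 1e-3   # matches the constant in the text
```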
Let α = ∠cie, β = ∠aje, and γ = ∠akc. Then

α = cos⁻¹(x/2R) + cos⁻¹(y/2R) − cos⁻¹((x² + y² − z²)/2xy),
β = cos⁻¹(x/2R) + cos⁻¹(z/2R) − cos⁻¹((x² + z² − y²)/2xz),
γ = cos⁻¹(y/2R) + cos⁻¹(z/2R) − cos⁻¹((y² + z² − x²)/2yz),

and the area of the triangle ace is S_ace = √(l(l − ce)(l − ae)(l − ac)), where l = (ce + ae + ac)/2, ce = 2R sin(α/2), ae = 2R sin(β/2), and ac = 2R sin(γ/2).
The area of the region R_ijk(R) shown in Fig. 4 is A_ijk(R) = S_ace + (R²/2)(α + β + γ − sin α − sin β − sin γ), and

E[A_ijk(R)] = ∫₀^R ∫₀^R ∫₀^R A_ijk(R) f(x)f(y)f(z) dx dy dz ≈ 0.4942πR².
(a) Basic key infection (b) Whispering key infection (c) Secrecy amplification
Fig. 5. The outage probability of simulation results [7] vs. the probability model. For each subfigure, the solid line with marker + is the simulation result given in [7]; the dotted line represents the result of the probability model proposed in this paper. Here, n = 5, α = t/n.
The analytical results of the probability model for key infection are given in Fig. 5. The results of the probability model closely approximate the simulation results in [7]. Therefore, this model can help designers to evaluate key infection and adapt it to their needs.
Step 1: In-group key establishment. Before deployment, sensors are first pre-arranged into small groups, and each sensor applies key infection to establish pairwise keys with all the other sensors in the same group.
Step 2: Cross-group key establishment. After deployment, if two adjacent sensors have not established a secret key, they use key infection to negotiate a key. Secrecy amplification can be applied jointly if needed.
Fig. 6. Deployment examples of G-KI with different σ. Red squares and blue circles denote eavesdroppers and sensors, respectively. Red lines are the links compromised by the adversary; blue dotted lines are the secure links. The big red circles are the eavesdropping regions of eavesdroppers. (Color figure online)
each grid cell is the deployment point. Any sensor in the deployment group G_i follows a two-dimensional Gaussian distribution centered at a deployment point (x_i, y_i) with standard deviation σ, so the relative position (X, Y) of two sensors in the same group has density

f_{X,Y}(x, y) = (1/4πσ²) e^{−(x²+y²)/4σ²},  −∞ < x, y < ∞.

Therefore, the distance between nodes i and j, Z = √(X² + Y²), has the Rayleigh distribution, Z ∼ Rayleigh(√2 σ), and the cumulative distribution function of Z is given by F_Z(z) = 1 − e^{−z²/4σ²} (z ≥ 0). Therefore, the probability that two sensors in the same group are adjacent after deployment is

P{Z ≤ R} = 1 − e^{−R²/4σ²}.
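The closed form P{Z ≤ R} = 1 − e^{−R²/4σ²} can be cross-checked by simulation (our own sketch; the values R = 35 and σ = 30 are illustrative):

```python
import math
import random

def p_adjacent(R, sigma):
    """Closed form: P{Z <= R} = 1 - exp(-R^2 / (4 sigma^2))."""
    return 1 - math.exp(-R**2 / (4 * sigma**2))

def p_adjacent_mc(R, sigma, trials=200000):
    """Monte Carlo: drop two group members around the same deployment
    point, each coordinate N(0, sigma^2), and test whether they end up
    within radius R of each other."""
    rng = random.Random(7)
    hits = 0
    for _ in range(trials):
        dx = rng.gauss(0, sigma) - rng.gauss(0, sigma)
        dy = rng.gauss(0, sigma) - rng.gauss(0, sigma)
        if dx * dx + dy * dy <= R * R:
            hits += 1
    return hits / trials

R, sigma = 35, 30
assert abs(p_adjacent(R, sigma) - p_adjacent_mc(R, sigma)) < 0.01
```

The difference of two independent N(0, σ²) coordinates is N(0, 2σ²), which is exactly why the distance is Rayleigh(√2 σ).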
(Figure: the probability that a link is compromised vs. t/n (α), the fraction of black nodes in the network (%); legend includes B-KI (R = 35, n′ = 16).)
As depicted in Fig. 6, when the deviation σ increases, the sensors are more evenly distributed, but the benefit introduced by G-KI diminishes monotonically, because the sensors in the same group are no longer close to each other. An appropriate σ is a trade-off between security and a suitable distribution of the network.
(Figure: F(σ = x) vs. the standard deployment deviation σ.)
σ = x. Then, we define a function F(x) = e^{−Var(σ=x)}. If F(x_1) > F(x_2), the distribution with σ = x_1 is closer to a random distribution than the distribution with σ = x_2. Figure 8 depicts the simulation results of F(x) for different σ. When σ ≥ 30, the distribution of G-KI asymptotically approximates a random distribution.
6 Conclusions
Acknowledgments. This work was supported by the National Key Research and
Development Program of China (2016YFB0800601), the National Natural Science
Foundation of China (61671360, 61173135), and in part by the Natural Science Basic
Research Plan in Shaanxi Province of China (2017JM6082).
References
1. Ding, J., Bouabdallah, A., Tarokh, V.: Key pre-distributions from graph-based block
designs. IEEE Sens. J. 16(6), 1842–1850 (2016)
2. Bechkit, W., Challal, Y., Bouabdallah, A., Tarokh, V.: A highly scalable key pre-
distribution scheme for wireless sensor networks. IEEE Trans. Wirel. Commun.
12(2), 948–959 (2013)
3. Eschenauer, L., Gligor, V.: A key-management scheme for distributed sensor networks. In: Proceedings of the 9th ACM Conference on Computer and Communica-
tions Security, pp. 41–47. ACM Press, Washington (2002)
4. Yağan, O., Makowski, A.M.: Wireless sensor networks under the random pair-
wise key predistribution scheme: can resiliency be achieved with small key rings?
IEEE/ACM Trans. Netw. 24(6), 3383–3396 (2016)
5. Du, W., Deng, J., Han, Y.S., Chen, S., Varshney, P.K.: A key management scheme
for wireless sensor networks using deployment knowledge. In: INFOCOM 2004, pp.
586–597. IEEE Press, New York (2004)
6. Choi, J., Bang, J., Kim, L., Ahn, M., Kwon, T.: Location-based key management
strong against insider threats in wireless sensor networks. IEEE Syst. J. 11(2),
494–502 (2017)
7. Anderson, R., Chan, H., Perrig, A.: Key infection: smart trust for smart dust. In:
IEEE International Conference on Network Protocols, pp. 206–215. IEEE Press,
New York (2004)
SDN-Based Secure Localization
in Heterogeneous WSN
1 Introduction
The rapid development of Internet of Things (IoT) [1,2] makes wireless sensor
networks (WSN) [3] face great challenges in heterogeneous interconnection and
network management. The introduction of software-defined networking (SDN)
has brought the dawn to solve this problem [4,5]. Centralized control is one
of the core feature of SDN. Constructing network global view is the basis task
of control plane [6], where the sensor node location information is the priority
among priorities. On the one hand, the valuable sensing information must be
associated with the location, and which is an important guarantee of quality
of service (QoS). On the other hand, with the paradigm of “sensing as a ser-
vice”, location information is a significant foundation for the distribution and
deployment of sensing services [7].
In distributed WSN, sensor node localization methods are usually divided
into two kinds: range-based and range-free technologies [8]. Note that
localization in this paper refers to the planar positioning of the sensor node itself.
Range-based localization obtains a location by measuring the distance
between nodes by some means, including the received signal strength
indicator (RSSI), signal transmission time, or angle [9–11]. Being easy to imple-
ment and requiring no additional hardware, the RSSI-based method has become the
preferred localization technology in WSN. Range-free positioning uses
certain properties of sensor nodes, such as neighbors or hop counts, to obtain a
location [12,13]. This approach reduces the localization cost by sacrificing position-
ing accuracy, and is generally applicable to large-scale networks. The most
classic example is the hop-based DV-Hop algorithm [14].
Although the two methods differ considerably, the basic
process is essentially the same. First, the blind node (the sensor node to be posi-
tioned) acquires its distance (expressed as RSSI, hop count, etc.) to the anchor
node (a reference node with a known location) through range-based or range-
free methods. Then, the anchor node publishes its own location information as a
reference to the blind node. Finally, the blind node calculates its position through
plane geometry relations.
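The final plane-geometry step can be sketched as a standard least-squares trilateration: a blind node at unknown (x, y) with distances d_i to anchors (x_i, y_i) solves the linearized circle equations. This is an illustrative sketch, not the paper's implementation; the function name and setup are ours.

```python
import numpy as np

def trilaterate(anchors, dists):
    """Least-squares position from anchor coordinates and measured distances.

    Subtracting the last circle equation (x-x_i)^2 + (y-y_i)^2 = d_i^2 from
    the others yields a linear system A p = b in the position p = (x, y).
    """
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    xn, yn = anchors[-1]
    A = 2 * (anchors[:-1] - anchors[-1])
    b = (dists[-1] ** 2 - dists[:-1] ** 2
         + np.sum(anchors[:-1] ** 2, axis=1) - xn ** 2 - yn ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Blind node at (3, 4) with three exact anchor distances:
anchors = [(0, 0), (10, 0), (0, 10)]
true = np.array([3.0, 4.0])
dists = [np.linalg.norm(true - a) for a in anchors]
print(trilaterate(anchors, dists))  # -> approximately [3. 4.]
```

With noisy distances (as with RSSI or hop estimates), the least-squares solution simply minimizes the residual over all anchors.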
However, the above procedure does not take the anchor node's location and
identity privacy into account, which is a serious security threat. A malicious
node can pretend to be a blind node to eavesdrop on the location information of
an anchor node; as a next step, it can destroy the network's positioning capability
through targeted physical destruction or signal interference. Conversely, the malicious
node can degrade the positioning accuracy and QoS by impersonating an anchor node
and publishing fictitious location information [15]. In addition, the transmission
power and radius of sensor nodes differ in heterogeneous WSN, so
RSSI- or hop-based distances may have notable errors, and node localization
accuracy faces great challenges [16].
As far as the authors know, there is no prior research using SDN to solve
the secure localization problem in WSN; only [17,18] use an SDN method, to
program the activation state of anchor nodes. Although they balance node localization
accuracy and network energy consumption well, they still belong to
distributed positioning and retain the privacy leak problem. In view of this, we
propose a centralized secure localization mechanism based on SDN for hetero-
geneous WSN, in which the localization algorithm runs on the SDN controller,
thus ensuring the security of sensitive information such as anchor node location
and identity. In addition, in the distance calculation process, the positioning
accuracy is greatly enhanced by considering the transmission power of heteroge-
neous sensor nodes. Relative to sensor nodes, many capabilities of the SDN controller,
such as energy, computing, storage, and communication, are generally considered
unrestricted. Therefore, the mechanism can effectively reduce the sensor nodes'
positioning load.
The rest of this paper is organized as follows: Sect. 2 gives
a general introduction to the secure localization model, and the correspond-
ing algorithm is described in detail in Sect. 3. Section 4 analyzes the security
and performance of the mechanism. The experimental design and results analysis are
elaborated in Sect. 5, and Sect. 6 concludes the paper.
278 M. Huang and B. Yu
The data plane includes anchor and blind nodes. The anchor nodes obtain
their own position information by coordinate presetting or GPS positioning, so as
to provide references for the blind nodes. Considering the heterogeneity factor, we
design two classes of blind nodes. They have in common that both are sensor nodes
to be located and can communicate with each other; they differ in their levels
of residual energy, communication radius, and transmission power [19].
For reasons of space, we design only a single controller, but the design can be extended
to a logically unified control plane with multiple controllers. The SDN controller is the key
to the secure localization mechanism; its ultimate goal is to build the virtual
view (part of the network global view) in which all sensor nodes are accurately
positioned. It should be noted that the positional similarity between the virtual
view and the data plane reflects the mechanism's performance; at the same time,
the security is embodied in the virtual view too. By using centralized localization
instead of the traditional distributed one, the SDN controller can protect the sensitive
information in the positioning process, thus effectively resisting multiple attacks from
malicious nodes on location-based services. For the choice of centralized posi-
tioning algorithms, considering the diversity of traditional localization technology,
we select two popular algorithms from the range-based and range-free classes
respectively, namely the RSSI-based and DV-Hop algorithms.
PL(d) = SP − LS    (1)
It can be seen that the SDN controller can normalize all the LS values by the minimum
transmit power (SPmin) among all the sensor nodes, as shown in Eq. (2); the
amended LS is denoted LS′.
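Since Eq. (2) itself is not reproduced in this excerpt, the following is our reading of the amendment: from Eq. (1), the path loss PL(d) = SP − LS is independent of the transmit power, so a link strength can be normalized as LS′ = SPmin − PL(d) = LS − (SP − SPmin). The function name is ours, used only for illustration.

```python
def normalize_ls(ls, sp, sp_min):
    """Amend a measured link strength LS as if every node transmitted at the
    minimum power SP_min. Path loss PL = SP - LS (Eq. (1)) is power-
    independent, so LS' = SP_min - PL = LS - (SP - SP_min). Values in dBm.
    Note: this is our reconstruction; Eq. (2) is not shown in this excerpt."""
    return ls - (sp - sp_min)

# Two heterogeneous senders over the same physical link (same path loss):
print(normalize_ls(-40.0, sp=4.5, sp_min=-1.5))   # -> -46.0
print(normalize_ls(-46.0, sp=-1.5, sp_min=-1.5))  # -> -46.0 (already minimal)
```

After normalization, link strengths measured under different transmit powers become directly comparable.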
it can be used as a supplement to the RSSI-based algorithm in areas where the
anchor nodes are sparser. To improve the positioning accuracy of DV-
Hop, this paper adopts a fixed distance (FD) per hop derived from RSSI in place of
the estimated distance (ED) per hop. As shown in Fig. 2(b), the number of hops
between the anchor nodes A1 and A2 is 4 and their distance is s, and the number of
hops between blind node B and A1 or A2 is 2, so the traditional DV-Hop algorithm
estimates the distance from B to A1 or A2 as sE = s × 2/4. Obviously, a positioning
error arises. Therefore, we use the ratio relation between the LS values to correct
each hop distance, and the distance between B and A1 (sF) is shown in Eq. (4).
Similarly, the distance between B and A2 can be calculated.
sF = (ls1 + ls2) / (ls1 + ls2 + ls3 + ls4) × s    (4)
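The per-hop correction of Eq. (4) can be sketched directly (an illustrative sketch; the function and variable names are ours):

```python
def corrected_distance(ls_to_blind, ls_all, anchor_dist):
    """Eq. (4): weight the anchor-to-anchor distance s by the ratio of the
    link-quality (LS) sum on the hops between the blind node and one anchor
    to the LS sum over the whole anchor-to-anchor path."""
    return sum(ls_to_blind) / sum(ls_all) * anchor_dist

# Fig. 2(b): 4 hops between anchors A1 and A2 at distance s; B is 2 hops
# from each. With equal link qualities this reduces to the classic s * 2/4:
s = 100.0
ls = [5.0, 5.0, 5.0, 5.0]                  # ls1..ls4 along the A1-B-A2 path
print(corrected_distance(ls[:2], ls, s))   # -> 50.0
# Unequal link qualities shift the estimate away from the uniform-hop value:
ls = [5.0, 5.0, 10.0, 20.0]
print(corrected_distance(ls[:2], ls, s))   # -> 25.0
```

The second call shows why the correction matters: with heterogeneous links, the uniform estimate s × 2/4 = 50 would overstate B's distance to A1.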
The algorithm flow chart is shown in Fig. 3. In it, “Build LT” and
“Amend LS” belong to the state acquisition phase, “RSSI-based Localization”
and “DV-Hop Localization” are implemented in the centralized positioning stage,
and “Complete Virtual View” is the final result of the algorithm.
the distributed positioning technology. Thus, the attacker cannot obtain the
anchor node's location through eavesdropping, finds it difficult to locate the
anchor node through traffic analysis attacks, and consequently cannot feasibly
capture or tamper with it either. In other words, the probability of discovering an
anchor node through eavesdropping or traffic analysis attacks equals the probability
that a node selected at random from all sensor nodes is an anchor node. In addition,
an attacker disguised as an anchor node is bound to be recognized by the SDN controller.
Obviously, due to the openness of WSN deployment and its wireless chan-
nel, attackers could physically capture all the sensor nodes, but the attack
cost is huge, so this is not practically feasible. It is worth mentioning that this
mechanism cannot resist certain attacks aimed at the transmission signal that
increase the positioning error, such as installing obstacles to reduce the
LS between neighbors. Such attacks are also difficult to withstand in distributed
positioning methods.
Parameter            Value
Deployment area      200 × 200 m²
Transmission radius  50 m
Initial power        9 × 10⁶ mC
Number of nodes      30 to 110, in increments of 20
Anchor node ratio    10%, 30%, and 50%
Transmission power   1.0 dBm (60%), −1.5 dBm (20%), and 4.5 dBm (20%)
sensor nodes and the network average residual energy respectively; the black polyg-
onal line represents the proposed centralized positioning algorithm, and the blue
one stands for distributed positioning using the DV-Hop algorithm. In addition,
the percentage figures in the legend give the anchor node ratio.
Figure 4 shows that the centralized localization algorithm is superior to the
distributed one in energy consumption. As the number of network nodes increases,
the time and energy required for network positioning naturally rise correspondingly;
however, the distributed method dissipates energy relatively quickly. In addition,
as the proportion of anchor nodes increases, the energy consumption of centralized
localization rises only slightly, while the rise for distributed positioning is very
pronounced. The fundamental reason is that the energy consumption of our mechanism
is concentrated in the state acquisition phase; since all nodes are treated equally
by the SDN controller, only the location data of the additional anchor nodes needs
to be transferred. In contrast, the DV-Hop algorithm makes full use of the anchor
nodes to improve positioning accuracy, naturally consuming more network energy.
(2) Positioning Accuracy
In this paper, we adopt the concept of overall positioning accuracy, that is, the
sum of all location deviation distances divided by the product of the network
size and the node communication radius.
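The metric can be computed as follows (a sketch with synthetic data; the exact denominator is partly cut off in this excerpt, so we assume the common normalization by node count times communication radius):

```python
import math

def overall_accuracy(true_pos, est_pos, radius):
    """Overall positioning error: summed deviation distances divided by
    (number of nodes x communication radius) -- the usual normalization,
    assumed here since the definition is truncated in this excerpt."""
    total = sum(math.dist(t, e) for t, e in zip(true_pos, est_pos))
    return total / (len(true_pos) * radius)

true_pos = [(0, 0), (10, 0), (0, 10)]
est_pos = [(3, 4), (10, 5), (0, 10)]   # deviations: 5, 5, 0
print(overall_accuracy(true_pos, est_pos, radius=50))  # -> 10/150, about 0.067
```

An error below 0.1, as reported for the 50% anchor ratio, thus means the average deviation is under one tenth of the communication radius.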
The positioning accuracy is shown in Fig. 5; the drawing notation is the
same as in Fig. 4 and is not repeated here. Obviously, the secure localization
mechanism outperforms the distributed positioning scheme, and its advantages
are more prominent when the anchor nodes are deployed sparsely. The reason is
that this mechanism jointly uses a variety of positioning algorithms (this paper
enumerates only the RSSI-based and DV-Hop methods). In the meantime, the
transmission power is used to amend the link quality, which in turn reduces
the DV-Hop positioning error. In addition, on the whole, the more anchor nodes
deployed, the higher the positioning accuracy; when the proportion reaches 50%,
the positioning error can be less than 0.1. Therefore, this scheme can effectively
improve the positioning performance of heterogeneous WSN.
6 Summary
In the traditional distributed positioning method, the anchor node must broad-
cast its own location and identity information to assist the blind node's posi-
tioning process. However, this approach makes the anchor node an easy attack
target: for example, malicious nodes can launch eavesdropping attacks to obtain
the anchor node's location and then carry out targeted physical capture or signal
interference. To change this situation, we adopt the SDN paradigm to transfer the
distributed positioning process to the SDN controller, so as to effectively protect
private information such as the location and identity of anchor nodes; this
conclusion is verified by the security analysis. In addition, after amending the
link quality of heterogeneous sensor nodes, the SDN controller improves the
positioning accuracy of blind nodes by running the complementary range-based
and range-free localization algorithms. Finally, based on the open source
architecture SDN-WISE and the COOJA simulation platform, we designed and
implemented verification experiments. The results show that the scheme has better
energy efficiency and higher positioning accuracy than the traditional method.
References
1. Miorandi, D., Sicari, S., Pellegrini, F.D., Chlamtac, I.: Internet of things. Ad Hoc
Netw. 10, 1497–1516 (2012)
2. Capella, J.V., Campelo, J.C., Bonastre, A., Ors, R.: A reference model for moni-
toring IoT WSN-based applications. Sensors 16, 1816–1836 (2016)
3. Ovsthus, K., Kristensen, L.M.: An industrial perspective on wireless sensor
networks—a survey of requirements, protocols, and challenges. IEEE Commun.
Surv. Tutor. 16, 1391–1412 (2014)
4. Luo, T., Tan, H.P., Quek, T.Q.S.: Sensor OpenFlow: enabling software-defined
wireless sensor networks. IEEE Commun. Lett. 16, 1896–1899 (2012)
5. Caraguay, Á.L.V., Peral, A.B., López, L.I.B., Villalba, L.J.G.: SDN: evolution and
opportunities in the development IoT applications. Int. J. Distrib. Sens. Netw.
2014, 1–10 (2014)
6. Kreutz, D., Ramos, F.M.V., Verissimo, P.E., Rothenberg, C.E., Azodolmolky, S.,
Uhlig, S.: Software-defined networking: a comprehensive survey. Proc. IEEE 103,
14–76 (2015)
7. Perera, C., Zaslavsky, A., Christen, P., Georgakopoulos, D.: Sensing as a service
model for smart cities supported by Internet of Things. Trans. ETT 25, 81–93
(2014)
8. Han, G., Xu, H., Duong, T.Q., Jiang, J., Hara, T.: Localization algorithms of
wireless sensor networks: a survey. Telecommun. Syst. 52, 2419–2436 (2013)
9. Shao, J.F., Tian, W.Z.: Energy-efficient RSSI-based localization for wireless sensor
networks. IEEE Commun. Lett. 18, 973–976 (2014)
10. Shao, H.J., Zhang, X.P., Wang, Z.: Efficient closed-form algorithms for AOA based
self-localization of sensor nodes using auxiliary variables. IEEE Trans. Signal Pro-
cess. 62, 2580–2594 (2014)
11. Go, S., Chong, J.: Improved TOA-based localization method with BS selection
scheme for wireless sensor networks. ETRI J. 37, 707–716 (2015)
12. Ma, D., Meng, J.E., Wang, B.: Analysis of hop-count-based source-to-destination
distance estimation in wireless sensor networks with applications in localization.
IEEE Trans. Veh. Technol. 59, 2998–3011 (2010)
13. García-Otero, M., Población-Hernández, A.: Secure neighbor discovery in wireless
sensor networks using range-free localization techniques. Int. J. Distrib. Sens. Netw.
2012, 178–193 (2012)
14. Gui, L., Val, T., Wei, A., Dalce, R.: Improvement of range-free localization tech-
nology by a novel DV-Hop protocol in wireless sensor networks. Ad Hoc Netw. 24,
55–73 (2015)
15. Li, P., Yu, X., Xu, H., Qian, J., Dong, L., Nie, H.: Research on secure localization
model based on trust valuation in wireless sensor networks. Secur. Commun. Netw.
2017, 1–12 (2017)
16. Assaf, A.E., Zaidi, S., Affes, S., Kandil, N.: Low-cost localization for multihop
heterogeneous wireless sensor networks. IEEE Trans. Wirel. Commun. 15, 472–
484 (2016)
17. Zhu, Y., Zhang, Y., Xia, W., Shen, L.: A software-defined network based node
selection algorithm in WSN localization. In: IEEE Vehicular Technology Confer-
ence, pp. 1–5. IEEE Press, New York (2016)
18. Zhu, Y., Yan, F., Zhang, Y., Zhang, R., Shen, L.: SDN-based anchor scheduling
scheme for localization in heterogeneous WSNs. IEEE Commun. Lett. 21, 1127–
1130 (2017)
19. Liu, X., Evans, B.G., Moessner, K.: Energy-efficient sensor scheduling algorithm
in cognitive radio networks employing heterogeneous sensors. IEEE Trans. Veh.
Technol. 64, 1243–1249 (2015)
20. Peng, R., Sichitiu, M.L.: Probabilistic localization for outdoor wireless sensor net-
works. ACM SIGMOBILE Mob. Comput. Commun. Rev. 11, 53–64 (2007)
21. Anastasi, G., Conti, M., Francesco, M.D., Passarella, A.: Energy conservation in
wireless sensor networks: a survey. Ad Hoc Netw. 7, 537–568 (2009)
22. Gill, K., Yang, S.H., Yao, F., Lu, X.: A ZigBee-based home automation system.
IEEE Trans. Consum. Electron. 55, 422–430 (2009)
23. Galluccio, L., Milardo, S., Morabito, G., Palazzo, S.: SDN-WISE: design, prototyp-
ing and experimentation of a stateful SDN solution for WIreless SEnsor networks.
In: 2015 IEEE Conference on Computer Communications, INFOCOM, pp. 513–
521. IEEE Press, New York (2015)
24. Osterlind, F., Dunkels, A., Eriksson, J., Finne, N., Voigt, T.: Cross-level sensor
network simulation with COOJA. In: 2006 IEEE Conference on Local Computer
Networks, pp. 641–648. IEEE Press, New York (2006)
Security Applications
A PUF and Software Collaborative Key
Protection Scheme
1 Introduction
Theoretically, authentication schemes and protocols are based on the assumption that
the key stored in non-volatile memory (NVM) is secure [1]. Unfortunately, this is
quite difficult to achieve in practice. Physical attacks, e.g. side-channel attacks and
reverse engineering, can result in key exposure and security breaches. Moreover,
software attacks such as malicious software and viruses can also steal the private key. In
industry, the natural idea for protecting against private key exposure through invasive
physical attacks is to create a tamper-sensing area to store the key information [2].
However, such methods are always complex and do not solve the essential problem
that the private key is still permanently stored in non-volatile memory.
Z. Liu—The work is supported by a grant from the National Key Research and Development
Program of China (Grant No. Y16A01602).
2 Related Work
2.1 Controlled PUFs
A PUF’s CRP set contains all the secrets with which the PUF can serve as a physical
root of trust. Unfortunately, almost all popular electronic PUFs used in practice are so-
called “Weak PUFs”. These PUFs have limited CRPs, which can be fully observed
at low cost. In addition, path-delay based PUFs, e.g. the Arbiter PUF and the Ring Oscil-
lator PUF, have been shown to be vulnerable to modeling and machine learning attacks [11–
13]. If a PUF’s behavior has been penetrated, the key derived from it is also exposed.
To overcome this inborn defect of “Weak PUFs”, Gassend et al. proposed the Con-
trolled PUF (CPUF) [14], which enhances a PUF’s resistance to modeling and
broadens the application range of “Weak PUFs”. A CPUF is a combination of a PUF
and an inseparable circuit, which usually implements an encryption or hash algorithm.
This circuit governs the PUF’s input and output, which is the so-called “control”. The input
control restricts the selection of challenges, which is very effective in protecting the
PUF from modeling attacks that adaptively choose challenges. The output control
prevents the adversary from probing the PUF, because it hides the physical output of
the PUF, so the adversary can obtain only indirect sequences derived from the PUF’s
responses [15] (Fig. 1).
Fig. 1. Controlled PUF
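The input/output control can be sketched as hashing the challenge before it reaches the PUF and hashing the response before it leaves. This is a toy model following the CPUF idea in [14]: the PUF here is simulated by a keyed hash, and all class and variable names are ours.

```python
import hashlib

def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class ToyPUF:
    """Stand-in for a physical PUF: a fixed secret mapping (simulated)."""
    def __init__(self, device_secret: bytes):
        self._secret = device_secret  # models the unclonable physical state
    def respond(self, challenge: bytes) -> bytes:
        return _hash(self._secret + challenge)

class ControlledPUF:
    """Hash on the input restricts challenge selection; hash on the output
    hides the raw response, so an adversary sees only derived sequences."""
    def __init__(self, puf: ToyPUF):
        self._puf = puf
    def respond(self, challenge: bytes) -> bytes:
        inner = _hash(challenge)          # input control
        raw = self._puf.respond(inner)    # physical measurement
        return _hash(raw + challenge)     # output control

cpuf = ControlledPUF(ToyPUF(b"device-unique-entropy"))
r1 = cpuf.respond(b"challenge-1")
assert r1 == cpuf.respond(b"challenge-1")   # deterministic per challenge
assert r1 != cpuf.respond(b"challenge-2")   # challenge-dependent
```

Because the attacker never observes the raw PUF input or output, adaptively chosen challenges and direct response probing are both blocked.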
The secure sketch guarantees the perfect reproduction of the key derived from
the PUF’s response. It is usually implemented with an Error Correcting Code (ECC)
algorithm, as Fig. 2 demonstrates. To ensure the correctness of the key’s recovery, the
correcting capability of the ECC should be carefully designed according to the PUF’s
reliability. During the sketch process, some redundant information w is produced.
This redundant information w is called “helper data” and helps recover the noisy
response in the recovery process. Generally, the helper data is stored in an NVM without
any protection; the worst-case estimate of the remaining entropy given that the helper data
is revealed is H∞(y) − (#c − #r), where H∞(y) is the min-entropy of the enrolled
response sequence y, #c is the code length of the ECC, and #r is the bit length of the
encoded random number r.
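A minimal code-offset construction with a 3-repetition code illustrates the sketch and recovery processes (toy parameters of our choosing; a real design uses a much stronger ECC sized to the PUF's error rate):

```python
import secrets

REP = 3  # repetition factor: corrects 1 flipped bit per 3-bit block

def encode(bits):                    # ECC encode: repeat each bit REP times
    return [b for b in bits for _ in range(REP)]

def decode(bits):                    # majority vote per block
    return [int(sum(bits[i:i+REP]) * 2 > REP)
            for i in range(0, len(bits), REP)]

def sketch(y, r):
    """Helper data w = y XOR encode(r); if revealed, it leaks at most
    #c - #r bits of information about y."""
    return [a ^ b for a, b in zip(y, encode(r))]

def recover(y_noisy, w):
    """decode(y' XOR w) = decode(encode(r) XOR noise) = r, provided the
    noise stays within the ECC capability; then w XOR encode(r) = y."""
    r = decode([a ^ b for a, b in zip(y_noisy, w)])
    return [a ^ b for a, b in zip(w, encode(r))]

y = [secrets.randbelow(2) for _ in range(12)]   # enrolled PUF response
r = [secrets.randbelow(2) for _ in range(4)]    # random number r
w = sketch(y, r)                                # helper data -> NVM
y_noisy = y.copy(); y_noisy[5] ^= 1             # one bit flips on re-measure
assert recover(y_noisy, w) == y                 # perfect reproduction
```

With one flipped bit per 3-bit block at most, the enrolled response is reproduced exactly, which is what makes a stable key derivable from a noisy PUF.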
Though a PUF’s response sequences are supposed to be random and unpredictable,
they are in fact not the nearly-uniform bit strings that satisfy the security requirement for a
secret key. Therefore, an entropy accumulator is required to extract high quality
random keys from response sequences that possess only limited entropy per bit.
A secure hash algorithm is often applied as the entropy accumulator, and the con-
struction of the resulting PUF-based key generator is shown in Fig. 3.
Fig. 2. Secure sketch: the sketch process turns response y into helper data (stored in NVM), which the recovery process uses to reproduce y.
In this section, we will elaborate our full key protection scheme which enhances the
PUF’s security and meanwhile authenticates the integrity and legality of the firmware.
(Figure: threat model — the PUF module sits on the CPU bus alongside the bus interface, NVM, and other peripherals; malicious code may reside in the NVM.)
materials. To guarantee the consistency of the running firmware and the input firmware
code, the PUF module has direct access to the NVM that stores the firmware code, i.e. the
firmware code is read directly by hardware logic without modification.
This hash module also serves as an entropy accumulator, forming a PUF-based key
generator together with the secure sketch module and the PUF instance. The secure
sketch guarantees the generated key’s reproducibility. Regarding the PUF instance,
since a PUF’s responses fluctuate randomly at every measurement, the PUF itself can
be regarded as a physical random source. Therefore, besides offering instance-specific
material to generate the private key, the PUF also forms a random number generator
(RNG) with the hash module to serve the secure sketch in the key generation phase.
In addition, we note that the PUF’s input challenges and output responses are all
processed by the hash module, i.e. our design naturally possesses the CPUF
structure, which strengthens the PUF’s resistance to adversaries such as the chip
manufacturer, who has the chance to read the PUF’s CRPs directly.
(Figure: Enhanced PUF — firmware code read from NVM feeds the hash module, which connects to the PUF instance and secure sketch; random number sequences drive the sketch, helper data is stored in NVM, and the recovered response is returned to the hash module.)
3. Use the obtained hash value as PUF’s challenge to invoke the PUF instance and
acquire a noisy response sequence y′;
4. Load helper data w;
5. Recover the noisy response y′ with the helper data w and acquire the recovered
response sequence y″;
6. Hash the recovered y″ to get the recovered private key pk′ and output it.
Where INS is the PUF instance set and CHA is the challenge set. Yihrt represents the
PUF instance’s inherent physical features. For every yihrt(ins, cha) ∈ Yihrt, yihrt(ins, cha) is
determined by the instance’s random characteristics and the input challenge, and is
invariant across measurements. E is the sum of the random variations (e.g. voltage and
temperature fluctuations, thermal noise) during the measurement; it changes at every
measurement, i.e. for all i ≠ j, ei ≠ ej. We assume the PUF’s bit error rate is pe.
Correctness: To ensure the key’s correct recovery, the error correcting capability of
ECC algorithm in the secure sketch must be sufficiently strong. The lower bound of
required error correcting capability is determined by the noise rate between the enrolled
response y and reproduced response y′.
298 C. Li et al.
Σ_{i=0}^{t} fbino(i; n, 2pe − 2pe²) ≥ 1 − pfail.    (2)

where fbino(i; n, p) = C(n, i) · p^i · (1 − p)^(n−i), and pfail is the permitted failure
probability for the key’s recovery, usually pfail = 10⁻⁶ in industry.
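Formula (2) can be checked numerically: the per-bit mismatch rate between the enrolled and the reproduced response of a PUF with bit error rate pe is 2pe − 2pe². The value pe = 1% below is an assumed example, since the paper's pe does not appear in this excerpt.

```python
from math import comb

def f_bino(i, n, p):
    """Binomial pmf: probability of exactly i mismatched bits among n."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def recovery_ok(t, n, pe, pfail=1e-6):
    """Formula (2): P(at most t errors per n-bit block) >= 1 - pfail, with
    per-bit mismatch rate 2*pe - 2*pe**2 between two noisy readings."""
    p = 2 * pe - 2 * pe**2
    return sum(f_bino(i, n, p) for i in range(t + 1)) >= 1 - pfail

# RM(1, 6) blocks: n = 64 bits, correcting t = 15 errors; pe = 1% assumed.
print(recovery_ok(t=15, n=64, pe=0.01))   # -> True
```

Sweeping t upward until `recovery_ok` first returns True gives the minimum error-correcting capability, and hence the smallest admissible ECC.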
Security: As we have assumed that every bit in a binary response sequence r ∈ {0, 1}^n
is independent, the min-entropy calculated as in formula (3) offers a lower bound on the
responses’ randomness in the worst case.

H∞(r) = −Σ_{i=1}^{n} log2 max{P(ri = 1), P(ri = 0)}.    (3)
According to Sect. 2.2, when the helper data w is disclosed, the min-entropy
remaining in the recovered response sequence y″ is H∞(y) + H∞(r) − #w. Assuming the
length of the generated key is lkey, to make the key possess sufficient randomness, H∞(y″)
should be equal to or greater than lkey, i.e.

H∞(y) + H∞(r) − #w ≥ lkey.    (4)
P(ri = 1) and P(ri = 0) are the probabilities that the ith bit of the response equals 1 and 0
respectively. Substituting them into formula (3), we can calculate H∞(y) and
H∞(r). From formulas (5) and (6), we can see that the randomness of the response sequence
y, which will be used to generate the private key, comes from the PUF, i.e.
H∞(y) = H∞(yihrt(ins, cha)); as for the response r that is used to generate random num-
bers, its randomness comes from random factors during the repeated measurement process, i.e.
H∞(r) = H∞(e), and the average pe = Σ_{i=1}^{n} min{P(ri = 1), P(ri = 0)} / n.
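Formula (3) and the average pe can both be estimated from repeated measurements of the same response, as sketched below (the input data is synthetic and the function names are ours):

```python
from math import log2

def min_entropy(samples):
    """Formula (3): H_inf = -sum_i log2 max{P(bit_i = 1), P(bit_i = 0)},
    estimated column-wise from repeated n-bit measurements (rows)."""
    m = len(samples)
    h = 0.0
    for column in zip(*samples):
        p1 = sum(column) / m
        h += -log2(max(p1, 1 - p1)) if 0 < p1 < 1 else 0.0
    return h

def average_pe(samples):
    """Average p_e = sum_i min{P(bit_i = 1), P(bit_i = 0)} / n."""
    m, n = len(samples), len(samples[0])
    return sum(min(sum(c) / m, 1 - sum(c) / m) for c in zip(*samples)) / n

# Synthetic measurements: bit 0 is stable, bit 1 flips half the time.
samples = [[1, 0], [1, 1], [1, 0], [1, 1]]
print(min_entropy(samples))   # -> 1.0 (only the noisy bit contributes)
print(average_pe(samples))    # -> 0.25
```

This matches the split described in the text: a perfectly stable bit contributes entropy to y but nothing to r, while a maximally noisy bit does the opposite.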
q(r) = H∞(r) / #r.    (7)
Let ly and lr represent the lengths of y and yr respectively; then H∞(y) = ly·q(y) and
H∞(yr) = lr·q(yr), and ly and lr should satisfy the inequalities:

(ly / n) · [n·q(y) + k − n] ≥ lkey,    (8)

and

lr · q(yr) ≥ (ly / n) · k.    (9)
4 Implementation
(Figure: implementation block diagram — a MicroBlaze soft core on the AXI bus with a Quad SPI flash controller, SPI flash, and a UART link to the upper computer; the enhanced PUF module comprises read-flash logic with a FIFO, SHA2-256, the PUF instance, and the RM(1, 6) ECC.)
The min-entropy rates of the response sequences used for key generation and random
number generation are q(y) = 0.9839 and q(yr) = 0.0376 respectively, according to
formulas (5)–(7).
To generate 256-bit keys, i.e. lkey = 256, we choose SHA2-256 as the hash module.
As the Reed–Muller code RM(1, m) is a binary [2^m, m + 1, 2^(m−1)] linear block code,
substituting pe and the above parameters into formula (2) yields m ≥ 6. Therefore, we
finally adopt RM(1, 6), whose code length is n = 64 bits and which requires a 7-bit
random sequence per block, i.e. k = 7. Substituting n, k, lkey and q(y) into
inequality (8), we get ly ≥ 2744.57. Rounding up so that ly is evenly divisible
by n gives ly = 2816 and ly/n = 44. Substituting these and the other related parameters into
inequality (9), we learn that the length of the random number sequence is ly·k/n = 308
bits and lr ≥ 8191.49; we let lr = 8192.
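The chosen parameters can be checked against inequalities (8) and (9), reproducing the bounds 2744.57 and 8191.49 quoted above:

```python
n, k, l_key = 64, 7, 256      # RM(1, 6) code length, random bits per block, key bits
q_y, q_yr = 0.9839, 0.0376    # min-entropy rates of y and yr

l_y_min = l_key * n / (n * q_y + k - n)   # lower bound from inequality (8)
l_y = 2816                                # the paper's choice, a multiple of n
r_bits = l_y // n * k                     # random bits the sketch consumes: 308
l_r_min = r_bits / q_yr                   # lower bound from inequality (9)
l_r = 8192

print(round(l_y_min, 2), round(l_r_min, 2))    # -> 2744.57 8191.49
assert l_y / n * (n * q_y + k - n) >= l_key    # inequality (8) holds
assert l_r * q_yr >= l_y / n * k               # inequality (9) holds
```

Both chosen lengths clear their bounds, the second one only barely (8192 × 0.0376 = 308.02 against a requirement of 308).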
After implementing all the hardware designs, we find that the size of the generated
mcs file is about 30.74 Mb. As we need a 2816-bit response to generate the private key
and every 16-bit response corresponds to a 20-bit challenge, we need
2816 × 20/16 = 3520 challenge bits in total. Letting the challenge length lchal be the
smallest integer evenly divisible by 256, we get lchal = 3584 and lchal/256 = 14.
The working process of the enhanced PUF module is demonstrated as follows:
Key Generation Phase:
1. Receive key generation command from the CPU, then read 35 Mb data from the
SPI flash;
2. Divide the read data into 14 parts, hash each 2.5 Mb part in sequence to totally get a
3584-bit sequence, cut out 3520 bits as the challenge sequence;
3. Invoke the PUF instance by every 20-bit challenge successively to obtain a 2816-bit
response sequence y;
4. Divide the 1024 ROs into 512 pairs, read the response of all the RO pairs to form an
8192-bit response sequence yr, then divide yr into four parts, hash every part to
totally get 512 bits random number, cut 308 bits as the random number sequence r;
5. Use the ECC module to encode the random sequence r by every 7 bits, then sketch
the response sequence y with encoded r and output the helper data;
6. Hash the response sequence y to acquire a 256-bit key and output it.
Key Recovery Phase:
1. Receive recovery command and helper data from the CPU, then read 35 Mb data
from the SPI flash;
2. Divide the read data into 14 parts, hash each 2.5 Mb part in sequence to totally get a
3584-bit sequence, cut out 3520 bits as the challenge sequence;
3. Invoke the PUF instance by every 20-bit challenge successively to obtain a 2816-bit
response sequence y’;
4. Send the helper data to the ECC module;
5. Use the ECC module to recover the noisy response sequence y′ with the helper data
and get recovered response sequence y″;
6. Hash the recovered response sequence y″ to acquire a 256-bit key and output it.
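The two phases can be condensed into a toy end-to-end flow. Everything here is simulated and scaled down: the PUF is a keyed hash, the repetition code stands in for RM(1, 6), and the lengths (96-bit response, 32 random bits) replace the real 2816 and 308.

```python
import hashlib

REP = 3            # toy repetition ECC standing in for RM(1, 6)
N_Y, N_R = 96, 32  # toy response/random lengths (the paper uses 2816 and 308)

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def bits(data: bytes, n: int):
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(n)]

def puf(challenge, secret=b"simulated-silicon"):
    """Stand-in for the PUF instance: deterministic per device secret."""
    return bits(H(secret + bytes(challenge)), N_Y)

def generate(firmware: bytes):
    """Key generation: firmware -> challenge -> response -> (key, helper data)."""
    y = puf(bits(H(firmware), N_Y))        # firmware-bound response
    r = bits(H(b"rng" + firmware), N_R)    # stand-in for the PUF/hash RNG
    w = [a ^ b for a, b in zip(y, [b for b in r for _ in range(REP)])]
    return H(bytes(y)), w                  # 256-bit key, helper data

def recover(firmware: bytes, w):
    """Key recovery: only the same firmware image reproduces the key."""
    y2 = puf(bits(H(firmware), N_Y))       # noisy in real hardware
    syn = [a ^ b for a, b in zip(y2, w)]
    r = [int(sum(syn[i:i + REP]) * 2 > REP) for i in range(0, len(syn), REP)]
    y3 = [a ^ b for a, b in zip(w, [b for b in r for _ in range(REP)])]
    return H(bytes(y3))

key, w = generate(b"legal firmware image")
assert recover(b"legal firmware image", w) == key   # valid firmware -> same key
assert recover(b"tampered firmware !!", w) != key   # tampering changes the key
```

The final assertion illustrates the firmware-binding property claimed in the text: a modified image produces a different challenge, hence a different response and key.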
5 Conclusion
To protect the PUF-generated key throughout the chip’s lifetime, we propose a
novel key protection scheme in which we bind the chip’s firmware and the embedded
PUF to collaboratively generate the chip’s exclusive key. Before the valid firmware is
complete, no legal key can be observed, so our scheme protects the system
during the manufacturing and software development stages. After the software
development stage, the system is sensitive to any change in the firmware code, and
successful recovery of the legal key in turn verifies the legality of the device and
firmware. Moreover, the hash module naturally forms a CPUF with the PUF instance,
which further boosts the PUF’s resistance to attacks.
References
1. Rührmair, U., Holcomb, D. E.: PUFs at a glance. In: Design, Automation and Test in Europe
Conference and Exhibition (DATE), pp. 1–6 (2014)
2. Gassend, B.: Physical random functions. In: Computer Security Conference, p. 928 (2003)
3. Pappu, R., Recht, B., Taylor, J., Gershenfeld, N.: Physical one-way functions. Science 297
(5589), 2026–2030 (2002)
4. Daniel, E.H., Wayne, P.B., Kevin, F.: Power-up SRAM state as an identifying fingerprint
and source of true random numbers. IEEE Trans. Comput. 58(9), 1198–1210 (2009)
5. Kota, F., Mitsuru, S., Akitaka, F., Takahiko, M., Takeshi, F.: The arbiter-PUF with high
uniqueness utilizing novel arbiter circuit with delay-time measurement. In: ISCAS,
pp. 2325–2328 (2011)
6. Gassend, B.: Physical random functions. M.S. thesis, Massachusetts Institute of Technology
(MIT), MA, USA, p. 36, 52, 209 (2003)
7. Gassend, B., Clarke, D., van Dijk, M., Devadas, S.: Silicon physical random functions. In:
ACM Conference on Computer and Communications Security – CCS, pp. 148–160. ACM
(2002)
8. Guajardo, J., Kumar, S.S., Schrijen, G.-J., Tuyls, P.: FPGA intrinsic pufs and their use for IP
protection. In: Paillier, P., Verbauwhede, I. (eds.) CHES 2007. LNCS, vol. 4727, pp. 63–80.
Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74735-2_5
9. Kota, F., Mitsuru, S., Akitaka, F., Takahiko, M., Takeshi, F.: The arbiter-PUF with high
uniqueness utilizing novel arbiter circuit with delay-time measurement. In: ISCAS,
pp. 2325–2328 (2011)
10. Lee, J.W., Lim, D., Gassend, B., Suh, G.E.: A technique to build a secret key in integrated
circuits for identification and authentication applications. In: 2004 Symposium on VLSI
Circuits, Digest of Technical Papers, vol. 42, pp. 176–179. IEEE (2004)
11. Hospodar, G., Maes, R., Verbauwhede, I.: Machine learning attacks on 65 nm Arbiter
PUFs: accurate modeling poses strict bounds on usability. In: IEEE International Workshop
on Information Forensics and Security, vol. 2, pp. 37–42. IEEE (2012)
12. Ganji, F., Tajik, S., Fäßler, F., Seifert, J.P.: Strong machine learning attack against PUFs
with no mathematical model. In: Cryptographic Hardware and Embedded Systems – CHES
2016. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53140-2_19
13. Rührmair, U., Sölter, J.: PUF modeling attacks: an introduction and overview. In:
Conference on Design, Automation & Test in Europe, European Design and Automation
Association, vol. 13, p. 348 (2014)
14. Gassend, B., Clarke, D., Dijk, M.V., Devadas, S.: Controlled physical random functions.
ACM Trans. Inf. Syst. Secur. 10(4), 1–22 (2002)
15. Gassend, B., Clarke, D., Dijk, M.V., Devadas, S.: Controlled physical random functions. In:
Computer Security Applications Conference 2002, Proceedings, pp. 149–160. IEEE
(2002)
16. Roel, M.: Physically Unclonable Functions: Constructions, Properties and Applications.
Katholieke Universiteit Leuven, Belgium (2012)
17. Dodis, Y., Reyzin, L., Smith, A.: Fuzzy extractors: how to generate strong keys from
biometrics and other noisy data. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004.
LNCS, vol. 3027, pp. 523–540. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-
540-24676-3_31
18. Maes, R., Van Herrewege, A., Verbauwhede, I.: PUFKY: a fully functional PUF-based
cryptographic key generator. In: Prouff, E., Schaumont, P. (eds.) CHES 2012. LNCS, vol.
7428, pp. 302–319. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33027-
8_18
19. Delvaux, J., Gu, D., Schellekens, D., Verbauwhede, I.: Helper data algorithms for
PUF-based key generation: overview and analysis. IEEE Trans. Comput.-Aided Des. Integr.
Circ. Syst. 34(6), 889–902 (2015)
20. Zhang, Q., Liu, Z., Ma, C., Li, C., Zhang, L.: FROPUF: how to extract more entropy from
two ring oscillators in FPGA-based PUFs. In: Deng, R., Weng, J., Ren, K., Yegneswaran, V.
(eds.) International Conference on Security and Privacy in Communication Systems,
pp. 675–693. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-59608-2_37
Towards a Trusted and Privacy
Preserving Membership Service
in Distributed Ledger Using Intel
Software Guard Extensions
1 Introduction
Distributed Ledger Technology offers a range of benefits to public and private
services, including government, financial institutions, and various industrial
scenarios [21].
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 304–310, 2018.
https://doi.org/10.1007/978-3-319-89500-0_27
One of the most important features of distributed ledger is
decentralized architecture which brings efficiency and robustness for data pro-
cessing and distribution. Every node in the network hosts a copy of the over-
all state across the ledger. Modifications to a certain transaction recorded on
the ledger can be immediately reflected in all copies, indicating the ability of
the ledger to detect and reject unauthorized changes. Meanwhile, integrity is
preserved for each transaction since it is difficult for attackers to corrupt the
ledger. Blockchain, as a major implementation of distributed ledger technol-
ogy, uses a chain of blocks as a data structure to record transactions and pre-
selected consensus algorithms to achieve agreement on all transactions. Basi-
cally, there are two types of blockchains, namely permissionless and permis-
sioned blockchains. Permissionless blockchains, like Bitcoin [18], allow anyone to
join the block mining process. Every node possesses an identical copy of the
whole ledger and there is no single owner of the ledger. Permissioned blockchains,
also known as private blockchains, supported by most open source blockchain
projects such as Multichain [5], limit node participation under permissions from
blockchain owners or a certain role defined in the blockchain. This methodology
is better suited for applications requiring access control, efficiency, and greater
transparency.
Hyperledger [1] is an open source permissioned blockchain project hosted by
the Linux Foundation, whose members include leaders in finance, banking, IoT,
supply chain, manufacturing, and technology. Hyperledger Fabric [7] is an archi-
tecture delivering a high degree of confidentiality, reliability, flexibility, and scal-
ability on top of the Hyperledger platform, supporting pluggable implementations of different
customized components. Membership service is one of the critical components
that Hyperledger Fabric provides to support dynamic entity registration and
identity management, as well as auditing.
The CIA triad model [2] is a basic guideline for information security sys-
tems, requiring three properties: confidentiality, integrity,
and availability. Some previous work [14] meets these three criteria in a cloud
data sharing system by implementing ProvChain, a blockchain based data prove-
nance architecture which guarantees data confidentiality, integrity and availabil-
ity using tamper-proof and immutable blockchain receipt and hash algorithms.
However, identity management, scalability, and privacy preservation remain
challenging problems. In this paper, we design a flexible and
scalable membership service based on Hyperledger Fabric, and propose a set of
protocols to secure transactions and preserve the privacy of user data in a
fine-grained sense, where users can define the extent to which access is con-
trolled among other users during communication or data sharing. We provide
extra security features in peer communications by enabling Intel SGX, which is
a promising technology introduced with the sixth generation of Intel processors
in 2015 [3]. Our architecture serves as a feasible and scalable solution to satisfy
the requirement of identity management and privacy preservation.
306 X. Liang et al.
2 Background
2.1 Intel SGX Security Capabilities
Intel SGX provides seven capabilities that can be used for security considera-
tions. Enclave Execution (C1). The isolated environment inside the CPU, i.e.,
enclave, is responsible for preventing attackers from accessing the trusted execu-
tion environment and hence greatly reducing the attack surface [16]. Hardware-
based attacks such as side-channel attacks may also be mitigated. Remote
Attestation (C2). Remote attestation allows a client platform to attest itself to
a remote party proving that the client is running in a trusted environment. Sim-
ilarly, the server can attest itself to the client to guard against phishing attacks. This
ensures the integrity of code execution in both the client and server side accord-
ing to application needs. Trusted Elapsed Time (C3). This is critical for
time-sensitive scenarios such as auditing. Sealing and Unsealing (C4). Intel
SGX’s sealing feature helps to store confidential information outside the enclave
for future access after system shutdown. Therefore, integrity can be preserved
since no entity can modify the data without the trusted execution environment
or access to the unsealing key.
nodes. Two kinds of CA roles are set up, including ECA and TCA. For each
client to be enrolled in the system, there will be an RA to collect physical evi-
dence such as a photo ID, or digital measures such as an email address. The
process of identifying a user is launched inside an enclave to ensure trusted exe-
cution and resist against eavesdropping. After identity check (for example, a
validating email), the client will be provisioned with a secret key (SKc ), indi-
cating ownership of cloud data. During enrollment, the Intel SGX attestation
model [6] is adopted between a CA (or an intermediate CA) and the request-
ing client to ensure that the requesting client is running on a trusted platform.
First of all, the CA asks the client application for remote attestation (step 1). The
client application receiving the request is not trusted, so it turns to the enclave
part of the application for a local attestation (step 2). The enclave returns a
REPORT signed with the Enclave Identity Key (step 3), which is forwarded by
the application to the Quoting Enclave (step 4). The Quoting Enclave signs the
REPORT with its private key and generates the QUOTE, which is sent back to the
application (step 5). The application forwards the QUOTE to the CA (step 6).
To verify the authenticity of the QUOTE, the CA sends it to an Attestation
Verification Service (AVS) (step 7) and receives the status-check result for
the client platform (step 8). If the client wants to ensure that it is talking to
an authentic CA, it can attest the CA in the same way.
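The eight-step flow above can be sketched as a toy message exchange. All class and method names here are illustrative placeholders, not the real Intel SGX SDK API:

```python
# Illustrative sketch of the eight-step enrollment attestation flow
# (hypothetical names; not the real Intel SGX SDK API).

class Enclave:
    def local_attest(self):
        # Step 3: produce a REPORT signed with the Enclave Identity Key.
        return {"type": "REPORT", "signed_by": "enclave_identity_key"}

class QuotingEnclave:
    def quote(self, report):
        # Step 5: sign the REPORT with the platform key, yielding the QUOTE.
        assert report["type"] == "REPORT"
        return {"type": "QUOTE", "body": report,
                "signed_by": "quoting_enclave_key"}

class ClientApp:
    def __init__(self):
        self.enclave = Enclave()
        self.qe = QuotingEnclave()

    def handle_attestation_request(self):
        report = self.enclave.local_attest()   # steps 2-3: local attestation
        return self.qe.quote(report)           # steps 4-5: REPORT -> QUOTE

class AVS:
    def verify(self, quote):
        # Steps 7-8: check the QUOTE and report the platform status.
        return quote["signed_by"] == "quoting_enclave_key"

class CA:
    def enroll(self, client, avs):
        quote = client.handle_attestation_request()  # steps 1 and 6
        return "ECert" if avs.verify(quote) else None

assert CA().enroll(ClientApp(), AVS()) == "ECert"
```

The same exchange run in the opposite direction corresponds to the client attesting the CA.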
Using the Intel SGX attestation feature, compromise of the platform can be
detected. Generally, the remote attestation is launched only once during enroll-
ment. If the same client platform is re-enrolled, the AVS can detect this event,
under the name-based mode [11], which can reveal whether two requests are from
the same CPU. If the user wants to disenroll the system and enroll again after
a period of time, the CA can set the rules that one single platform can re-enroll
after disenroll. If the client platform launches re-enrollment without previous
disenroll, then the CA is sure that the platform is compromised or user identity
is stolen. To differentiate these two cases, the CA keeps a flag for each client
during enrollment, indicating the re-enrollment state. After a successful attes-
tation between the CA and the client, the CA issues an ECert which contains
special attributes needed for the client to participate.
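A minimal sketch of the bookkeeping this implies on the CA side. The data layout is our own illustration; the paper describes it only informally:

```python
# Hypothetical per-client enrollment-state tracking at the CA.

class CARegistry:
    def __init__(self):
        self.state = {}  # platform_id -> "enrolled" | "disenrolled"

    def enroll(self, platform_id):
        if self.state.get(platform_id) == "enrolled":
            # Re-enrollment without a prior disenrollment: the platform
            # is compromised or the user identity was stolen.
            return "suspicious"
        self.state[platform_id] = "enrolled"
        return "ok"

    def disenroll(self, platform_id):
        self.state[platform_id] = "disenrolled"

reg = CARegistry()
assert reg.enroll("cpu-1") == "ok"
assert reg.enroll("cpu-1") == "suspicious"  # no disenrollment in between
reg.disenroll("cpu-1")
assert reg.enroll("cpu-1") == "ok"          # legitimate re-enrollment
```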
Signing and Verifying Transactions (P2). After a member joins a channel,
the member will request a TCert or a batch of TCerts from a TCA to sign
transactions. To keep TCerts unlinkable, multiple TCerts are used to
access multiple channels. We use the Camenisch-Lysyanskaya (CL) signature scheme
[8] to provide flexible TCerts by making TCert a zero-knowledge proof of an
ECert. This proof shows that the TCert is derived from the desired ECert, but
no information about that ECert is disclosed. The proof is generated
inside an enclave, which provides a trusted environment.
In addition, selective attribute disclosure can be adopted when generating
a TCert. For example, a node can use one subset of its attributes (say, AttrA)
on Transaction A and another subset (say, AttrB) on Transaction B. This
preserves privacy at the transaction
level. Each peer participating in the transaction verification process uses the
CA's public key Pubk to verify that a certain transaction is signed by an authorized
member.
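As a toy illustration of selective attribute disclosure (hypothetical data layout; a real TCert encodes attributes cryptographically rather than as a plain dictionary):

```python
# Hypothetical ECert attribute set and per-transaction TCert derivation.

ecert_attrs = {"role": "trader", "org": "bank-A", "clearance": "L2"}

def derive_tcert(disclosed):
    # Only the chosen subset of ECert attributes appears in the TCert.
    return {k: ecert_attrs[k] for k in disclosed}

tcert_a = derive_tcert({"role"})               # signs Transaction A
tcert_b = derive_tcert({"org", "clearance"})   # signs Transaction B

# Neither TCert reveals the attributes used by the other.
assert "org" not in tcert_a and "role" not in tcert_b
```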
Auditing Transactions (P3). Each channel has an auditor to audit transac-
tions within the channel. To support auditability and accountability, the role
of auditor is assigned by the MSP at channel creation to audit the behaviors of
client users and peers. To audit transactions from a single channel, the audi-
tor is issued a special TCert from the MSP, which has access to all transactions
inside the channel. This special TCert, generated within an enclave, has the abil-
ity to reveal linkage of transactions to the corresponding entities involved. Each
auditor is responsible for one channel. A set of auditors form an auditing chan-
nel, following the general channel formation protocol. Both internal and external
auditing can be done upon request from application users and service providers.
The corresponding subledger to the auditing channel forms an immutable record
for all auditing logs. This preserves the availability of audit logs for future usage.
Meanwhile, the existence of the auditing channel ensures that each auditor
behaves properly. Depending on system requirements, cross-auditing can
be launched at the request of any auditor.
One essential security property for cloud auditing is that the auditor itself
should be auditable [10]. The auditing channel established fulfills this objec-
tive by encapsulating the auditor behavior in an auditing transaction which will
be validated by all auditors in that channel. An auditing transaction contains
the basic elements of an audit trail: the time, subject, and object. The time
element is critical, since an attacker who modifies the timestamp of an
intrusion event could easily hide its malicious behavior.
SGX provides a trusted timestamping function, sgx_get_trusted_time, to assist
the auditing process.
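A rough sketch of such an audit record, with a placeholder function standing in for the enclave's trusted-time call:

```python
# Hypothetical auditing-transaction record; trusted_time() is a stand-in
# for sgx_get_trusted_time invoked inside the enclave.
import time

def trusted_time():
    return int(time.time())   # placeholder for the enclave call

def audit_record(subject, obj, action):
    # The three basic elements of an audit trail: time, subject, object.
    return {"time": trusted_time(), "subject": subject,
            "object": obj, "action": action}

rec = audit_record("auditor-7", "tx-42", "verify")
assert set(rec) == {"time", "subject", "object", "action"}
```

In a real deployment the record would be appended to the auditing channel's subledger, making it immutable.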
privacy and the security of blockchain mining schemes. SGP [20] is a crypto-
graphic primitive that can be applied in SGX-based smart contracts for transparent
information exchange with fairness. TC [22] is an authenticated data feed system
which combines a blockchain front end with trusted hardware, such as SGX, to
scrape websites. Besides, research work in [12] also emphasizes the adoption of
SGX based blockchain and smart contract applications for security and privacy
considerations, indicating the significant potential of the wide adoption of SGX
enabled platforms.
Several blockchain architectures integrate Intel SGX. Sawtooth Lake [4] is an
open source distributed ledger project based on trusted execution environment
including Intel SGX, which provides a trusted consensus algorithm, called proof
of elapsed time, for blockchain applications. However, the detailed design is not
well illustrated and no security guarantee is provided. Teechan [15] introduces
a payment channel, which is established based on SGX and proves to be secure
and efficient for blockchain transactions. The work in [19] proposes a proof of execution using
SGX, which ensures the active participation of nodes in gossip-based blockchains.
Proof of Luck [17] provides a consensus algorithm based on SGX. These works
focus on the security guarantees of peers but face potential privacy risks.
In this paper, we utilize Intel SGX to help prevent these attacks and effec-
tively address the potential privacy issues in blockchain applications, removing
the need for trust between participating nodes during transactions. In future work,
we will apply this design to implement a privacy-preserving blockchain service in a
cloud data-sharing application, and evaluate the overall application performance.
References
1. Hyperledger-blockchain technologies for business. https://www.hyperledger.org/
2. Information security - Wikipedia. https://en.wikipedia.org/wiki/Information_security
3. Intel architecture instruction set extensions programming reference. https://
software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
4. Introduction - sawtooth lake latest documentation. https://intelledger.github.io/
introduction.html
5. MultiChain private blockchain white paper. http://www.multichain.com/download/MultiChain-White-Paper.pdf
6. Anati, I., Gueron, S., Johnson, S., Scarlata, V.: Innovative technology for CPU
based attestation and sealing. In: Proceedings of the 2nd International Workshop
on Hardware and Architectural Support for Security and Privacy, vol. 13 (2013)
7. Cachin, C.: Architecture of the Hyperledger blockchain fabric. In: Workshop on
Distributed Cryptocurrencies and Consensus Ledgers (2016)
8. Camenisch, J., Lysyanskaya, A.: A signature scheme with efficient protocols. In:
Cimato, S., Persiano, G., Galdi, C. (eds.) SCN 2002. LNCS, vol. 2576, pp. 268–289.
Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36413-7_20
9. Jain, P., Desai, S., Kim, S., Shih, M.W., Lee, J., Choi, C., Shin, Y., Kim, T., Kang,
B.B., Han, D.: OpenSGX: an open platform for SGX research. In: Proceedings of
the Network and Distributed System Security Symposium, San Diego, CA (2016)
10. Jia, X.: Auditing the auditor: secure delegation of auditing operation over cloud
storage. Technical report, IACR Cryptology ePrint Archive. https://eprint.iacr.org/2011/304.pdf. Accessed 10 Aug 2016
11. Johnson, S., Scarlata, V., Rozas, C., Brickell, E., Mckeen, F.: Intel software guard
extensions: EPID provisioning and attestation services. White Paper (2016)
12. Kaptchuk, G., Miers, I., Green, M.: Managing secrets with consensus networks: fair-
ness, ransomware and access control. IACR Cryptology ePrint Archive 2017/201
(2017)
13. Kosba, A., Miller, A., Shi, E., Wen, Z., Papamanthou, C.: Hawk: the blockchain
model of cryptography and privacy-preserving smart contracts. In: 2016 IEEE
Symposium on Security and Privacy (SP), pp. 839–858. IEEE (2016)
14. Liang, X., Shetty, S., Tosh, D., Kamhoua, C., Kwiat, K., Njilla, L.: ProvChain:
a blockchain-based data provenance architecture in cloud environment with
enhanced privacy and availability. In: International Symposium on Cluster, Cloud
and Grid Computing. IEEE/ACM (2017)
15. Lind, J., Eyal, I., Pietzuch, P., Sirer, E.G.: Teechan: payment channels using trusted
execution environments. arXiv preprint arXiv:1612.07766 (2016)
16. McKeen, F., Alexandrovich, I., Berenzon, A., Rozas, C.V., Shafi, H., Shanbhogue,
V., Savagaonkar, U.R.: Innovative instructions and software model for isolated
execution. In: HASP@ ISCA, p. 10 (2013)
17. Milutinovic, M., He, W., Wu, H., Kanwal, M.: Proof of luck: an efficient blockchain
consensus protocol. In: Proceedings of the 1st Workshop on System Software for
Trusted Execution, p. 2. ACM (2016)
18. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008)
19. van Renesse, R.: A blockchain based on gossip?-a position paper
20. Tramer, F., Zhang, F., Lin, H., Hubaux, J.P., Juels, A., Shi, E.: Sealed-glass proofs:
using transparent enclaves to prove and sell knowledge. In: 2017 IEEE European
Symposium on Security and Privacy (EuroS&P), pp. 19–34. IEEE (2017)
21. Walport, M.: Distributed ledger technology: beyond blockchain. UK Gov. Off. Sci.
(2016)
22. Zhang, F., Cecchetti, E., Croman, K., Juels, A., Shi, E.: Town crier: an authen-
ticated data feed for smart contracts. In: Proceedings of the 2016 ACM SIGSAC
Conference on Computer and Communications Security, pp. 270–282. ACM (2016)
Malicious Code Defense and Mobile
Security
Deobfuscation of Virtualization-
Obfuscated Code Through Symbolic
Execution and Compilation Optimization
1 Introduction
Virtualization-based obfuscation replaces the code in a binary with semantically
equivalent bytecode, which can only be interpreted by a virtual machine whose
instruction set and architecture can be customized. Thus, it makes the resulting
code difficult to understand and analyze, and is widely used in malware [1].
When regular dynamic and static analyzers are directly applied to analyzing such
code, their execution gets trapped into the VM interpreter and thus can hardly
reveal the real logic of the code. Therefore, how to deobfuscate virtualization-
obfuscated code has been an important and challenging problem.
Existing techniques either reverse engineer the virtual machine to infer the
logic behind the bytecode [2], or execute the obfuscated code and work on
the instruction traces corresponding to the executed bytecode [3–5]. While the
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 313–324, 2018.
https://doi.org/10.1007/978-3-319-89500-0_28
314 M. Liang et al.
restores the original code in the memory during execution. Thus, regular unpack-
ers cannot recover the original code.
Another challenge is that various bytecode-level obfuscations can be applied
to virtualization-obfuscated code, which makes the extracted bytecode even
harder to analyze and understand. For example, a simple x86 instruction can
be translated into several virtual instructions that keep the same semantics but
are much more complex to understand. A concrete example is the logical operation
obfuscation in VMProtect. A VM in VMProtect does not generate not, and, or
or xor instructions but only nor instructions. All these logical operations are
implemented with nor instructions; e.g., or(a, b) = nor(nor(a, b), nor(a, b)).
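These NOR rewritings can be checked directly; the following sketch reproduces the reduction for all four logical operations on 32-bit values:

```python
# VMProtect-style reduction of logical operations to NOR,
# written out as a quick sanity check on 32-bit values.

def nor(a, b, width=32):
    mask = (1 << width) - 1
    return ~(a | b) & mask

def v_not(a):    return nor(a, a)
def v_or(a, b):  return nor(nor(a, b), nor(a, b))
def v_and(a, b): return nor(nor(a, a), nor(b, b))        # De Morgan
def v_xor(a, b): return nor(v_and(a, b), nor(a, b))      # NOT(XNOR)

a, b = 0xDEADBEEF, 0x0000FFFF
assert v_not(a) == ~a & 0xFFFFFFFF
assert v_or(a, b) == a | b
assert v_and(a, b) == a & b
assert v_xor(a, b) == a ^ b
```

The xor case uses the identity xor(a, b) = nor(and(a, b), nor(a, b)), i.e., the negation of XNOR.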
The variety of VM architectures used by virtualization obfuscators is also a
challenge. Conventional deobfuscation tools work well on common architectures
(Intel, ARM, and MIPS) but do not support customized VM architectures without
specialized adaptation. There is a lack of a generic technique for deobfuscating
virtualization-obfuscated code that tackles various VM architectures with
different kinds of bytecode-level obfuscation. Our goal is to conquer this
challenge and propose such a generic technique.
When running the obfuscated binary, our system records the dynamic trace,
which contains instruction sequences and their operations on registers and
memory. We use Pin [9], a binary instrumentation tool developed by Intel, to
record the trace. Pin provides an instruction-level instrumentation interface that
lets users insert callback functions before or after the execution of instructions,
which makes it possible to record all context information. Although Pin allows
us to run analysis routines during program execution, we prefer to record all
information to a file and perform offline analysis on the trace file. By decoupling
trace recording from the analysis, we can apply multiple rounds of analysis to the
trace without running the executable repeatedly. Once we have the trace file, we can reconstruct the
control flow graph (CFG) and extract virtual instruction handlers.
The dispatcher fetches the opcode of the instruction pointed to by the VPC,
which is stored in esi register, and then jumps to the corresponding handler
according to the opcode; this is done by an indirect jump through push and
ret instructions. Without dynamic execution, it would be difficult to determine
the targets of such indirect jumps. Static analysis tools such as IDA Pro [10] are
unable to generate the CFG in such cases.
We instead choose to reconstruct the CFG from the dynamic execution trace.
The basic steps of reconstructing the CFG from trace are as follows:
Step 2: Merging basic blocks. Each connected pair of basic blocks B1 and
B2 is merged into a new block B12 if and only if the out-degree of B1
and the in-degree of B2 are both 1.
Step 3: Loop. The processing goes back to Step 2 until no more blocks can be
merged. Then we obtain the CFG.
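Steps 2 and 3 above can be sketched as follows. This is a simplified illustration on an adjacency-list CFG; for brevity the merged block keeps B1's label rather than a fresh name B12:

```python
# Repeatedly merge a block pair (B1, B2) when B1 has exactly one
# successor and B2 exactly one predecessor (Steps 2 and 3).

def merge_blocks(succs):
    """succs maps each basic block to its list of successor blocks."""
    preds = {}
    for b, ss in succs.items():
        for s in ss:
            preds.setdefault(s, []).append(b)
    changed = True
    while changed:                 # Step 3: loop until no merge applies
        changed = False
        for b1, ss in list(succs.items()):
            if len(ss) != 1:
                continue
            b2 = ss[0]
            if b2 == b1 or len(preds.get(b2, [])) != 1:
                continue
            succs[b1] = succs.pop(b2, [])          # fuse B2 into B1
            for s in succs[b1]:                    # rewire predecessors
                preds[s] = [b1 if p == b2 else p for p in preds[s]]
            preds.pop(b2, None)
            changed = True
            break
    return succs

# A branches to B and C, which rejoin at D; D falls through to E.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
merged = merge_blocks(cfg)
assert "E" not in merged           # E was fused into D
assert merged["A"] == ["B", "C"]   # the branch itself survives
```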
Handlers are the functions the virtual machine uses to interpret virtual instruc-
tions. We can extract the semantic information of virtual instructions by analyz-
ing the corresponding handler functions. We propose a symbolic-execution-based
approach to extracting the semantics of each handler.
The overall interpretation logic of the virtual machine is too complicated to be
symbolically executed as a whole due to time and memory overhead. Our app-
roach instead applies symbolic execution to handler functions separately. Each
handler function processes a single virtual instruction and usually has simple
logic, so the path explosion problem is naturally avoided and symbolic
expressions do not become too complex. In addition, with the use of symbolic execu-
tion, many obfuscations of handler functions, such as junk code insertion and
instruction replacement, are automatically handled by the symbolic engine.
In our design, all registers and memory in a handler function are initialized
as symbolic variables. After symbolic execution of the function, the symbolic
execution engine outputs a series of symbolic expressions, which represent the
operations the handler performs.
Here is an example of the vPushReg4 handler in VMProtect. Our prototype
system uses Miasm [11] as its symbolic execution engine, which symbolically
executes the binary code of the handler and returns symbolic expressions in
Miasm IR format. The disassembly and the symbolic expressions are shown
below ('@' in Miasm IR expressions means dereference of an address):
From the expressions we know that this handler loads the value at a memory
address based at edi and offset by eax & 0x3C, then stores the value to where
ebp points. Assuming edi holds the base address of the virtual register array
and ebp the virtual stack top, this handler simply pushes a virtual register value
onto the virtual stack.
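Replaying that semantics concretely (memory is modeled as a simple address-to-value dictionary; the addresses and register values are made up for illustration):

```python
# Concrete replay of the vPushReg4 semantics: read a virtual register
# slot, push it on the virtual stack.

def v_push_reg4(mem, regs):
    src = regs["edi"] + (regs["eax"] & 0x3C)   # virtual register slot
    mem[regs["ebp"]] = mem[src]                # store to virtual stack top

mem = {0x1000 + 8: 0xCAFEBABE, 0x2000: 0}
regs = {"edi": 0x1000, "eax": 0x0A, "ebp": 0x2000}

v_push_reg4(mem, regs)        # 0x0A & 0x3C == 8, so slot 0x1008 is read
assert mem[0x2000] == 0xCAFEBABE
```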
The above example shows that the symbolic expression of the handler func-
tion can fully express its operations; that is, it can capture the semantic infor-
mation of the corresponding virtual instruction.
While function summary can eliminate dead code within a handler, it is inef-
fective in handling obfuscations among virtual instructions, which is one of the
main challenges of deobfuscation we aim to conquer. When CISC instructions
(e.g., x86) are converted into virtual RISC instructions, the problem is more
severe, as a single CISC instruction is usually transformed into multiple virtual
RISC instructions. It will significantly benefit code analysis and understanding if
we can convert the multiple virtual RISC instructions back into the single CISC
one. An intuitive approach is to prepare a whole set of templates for transfor-
mation, each of which tries to match a specific sequence of RISC instructions
and recover the original CISC instruction. Such template-based transformation
has multiple drawbacks: (1) A lot of tedious work has to be done to prepare
the transformation templates; worse, whenever a new VM is encountered, they
have to be updated. (2) During the transformation, the register information gets
lost and more inference must be done to restore it. (3) Obfuscators often apply
additional obfuscation on the virtual-instruction layer, which may render the
template match ineffective.
We consider such obfuscation the opposite of optimization, since
it replaces the original concise instructions with more complex but semantically
equivalent code, while deobfuscation aims to optimize away the intermediate
variables and redundant instructions introduced by the virtual machine. Thus, we
propose to use modern compilers such as gcc and clang, which have been devel-
oped for years and have proven optimization capabilities, to optimize
the virtual instructions by relying on their built-in data-flow analysis and live-
variable analysis; this way, we can remove redundant instructions and generate
concise code.
The registers are treated as unsigned integer variables, and are converted to
pointers when used as addresses.
4 Evaluation
We have evaluated our system against VMProtect and Code Virtualizer; both are
well-known commercial obfuscators. We first performed some micro-benchmark
experiments, which consider the following four samples:
locates the dispatcher address and extracts the handlers from the trace. We
use gcc as our compiler with the O3 optimization level on the translated virtual-
instruction trace; the results demonstrate that the instruction count of the trace
is greatly reduced after deobfuscation.
Figure 5 shows an intuitive comparison when we applied our method to the
binop sample, whose source code is shown in Fig. 5(a). After optimizing with
the compiler and decompiling with IDA Pro, the deobfuscated pseudo-C code,
shown in Fig. 5(b), is concise and equivalent to the original code. Basic array
memory accesses and most binary operations between array elements, such as
add, and, or, and shift, are precisely recovered. The xor and subtraction operators
are not exactly the same as in the original, but the deobfuscated code is semantically equivalent.
5 Related Work
Deobfuscation approaches for virtualization-based obfuscated binaries have long
been part of the state-of-the-art research in reverse engineering and binary anal-
ysis. Rolles [2] points out that the essence of virtualization obfuscation is bytecode
interpretation, and his paper describes a generic approach to defeating such pro-
tection by completely reversing the emulator; however, no automated system is
presented. Coogan et al. [4] present a semantics-based approach to deobfuscating
virtualization-obfuscated software. In that work, the obfuscated binary is executed
and all instruction execution information is recorded in a trace file; then,
instructions that interact with the system directly or indirectly are kept while
other instructions are discarded. It does not perform further deobfuscation, though.
Sharif et al. [3] propose an automatic analysis method to extract the virtual
program counter information and construct the original control flow graph from
a virtualized binary; Kalysch et al. [12] present VMAttack, an IDA Pro plugin,
as an assistance tool for analyzing virtualization-packed binaries. Both are new
analysis methods but are unable to recover the deobfuscated code. While Yadegari
et al. [13] pointed out that compiler optimization can assist deobfuscation, specifically
arithmetic simplification in their case (Sect. III.C), they did not reuse any com-
pilers as a generic approach to deobfuscation. Instead, they use taint analysis to
identify instructions for value propagation, and various specialized optimizations
for simplifying the code; plus, symbolic execution is used to generate inputs for
running a binary. To our knowledge, our system is the first that reuses mod-
ern compilers and leverages compilation optimization as a generic approach to
deobfuscating virtualization-obfuscated code.
6 Conclusion
Virtualization obfuscation has been proven to be one of the most effective tech-
niques to obfuscate binaries. This paper presents a novel automated deobfus-
cation method. It first constructs a CFG via offline trace analysis to detect the
dispatcher and handler functions, and symbolically executes the handlers to gen-
erate symbolic expressions. Then, the symbolic expressions are translated into C
code and bytecode is converted into invocations of the C functions, which are
then optimized by compilers to recover simplified and semantically equivalent
code. We have implemented a prototype system and evaluated it against popular
commercial obfuscators. The experimental result indicates that our system can
successfully recover concise code from virtualization-obfuscated code. Our work
demonstrates that compilation optimization is an effective and generic approach
to tackling virtualization obfuscation.
References
1. Nagra, J., Collberg, C.: Surreptitious Software: Obfuscation, Watermarking, and
Tamperproofing for Software Protection. Pearson Education, London (2009)
2. Rolles, R.: Unpacking virtualization obfuscators. In: 3rd USENIX Workshop on
Offensive Technologies (WOOT) (2009)
3. Sharif, M., Lanzi, A., Giffin, J., Lee, W.: Automatic reverse engineering of malware
emulators. In: 2009 30th IEEE Symposium on Security and Privacy, pp. 94–109.
IEEE (2009)
4. Coogan, K., Lu, G., Debray, S.: Deobfuscation of virtualization-obfuscated soft-
ware: a semantics-based approach. In: Proceedings of the 18th ACM Conference
on Computer and Communications Security, pp. 275–284. ACM (2011)
5. HexEffect: Virtual deobfuscator. http://www.hexeffect.com/virtual_deob.html
6. VMProtect: VMProtect software protection. http://vmpsoft.com/
7. Oreans: Code virtualizer. https://oreans.com/codevirtualizer.php
8. Ugarte-Pedrero, X., Balzarotti, D., Santos, I., Bringas, P.G.: SoK: deep packer
inspection: a longitudinal study of the complexity of run-time packers. In: 2015
IEEE Symposium on Security and Privacy (SP), pp. 659–673. IEEE (2015)
9. Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S.,
Reddi, V.J., Hazelwood, K.: Pin: building customized program analysis tools with
dynamic instrumentation. ACM SIGPLAN Not. 40, 190–200 (2005)
10. Eagle, C.: The IDA Pro Book: The Unofficial Guide to the World’s Most Popular
Disassembler. No Starch Press, San Francisco (2011)
11. CEA: cea-sec/miasm: reverse engineering framework in Python. https://github.
com/cea-sec/miasm
12. Kalysch, A., Götzfried, J., Müller, T.: VMAttack: deobfuscating virtualization-
based packed binaries. In: ARES (2017)
13. Yadegari, B., Johannesmeyer, B., Whitely, B., Debray, S.: A generic approach to
automatic deobfuscation of executable code. In: 2015 IEEE Symposium on Security
and Privacy (SP), pp. 674–691. IEEE (2015)
A Self-healing Key Distribution Scheme
for Mobile Ad Hoc Networks
1 Introduction
A MANET is a multi-hop, temporary, autonomous system consisting of a set of
mobile nodes with wireless transceivers. Unlike traditional networks that rely on
communications infrastructure, all mobile nodes in a MANET assume both com-
munication and routing responsibilities, and all mobile nodes are equal, without
a central control organization [20]. Nodes move in or out of range dynami-
cally, so the topology of the network changes dynamically. These features guar-
antee the flexibility of MANET applications but also bring many challenges.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 325–335, 2018.
https://doi.org/10.1007/978-3-319-89500-0_29
326 G. Xiang et al.
2 Preliminaries
Our scheme is based on the Dual Directional Hash Chain (DDHC). We first intro-
duce the definition of a one-way hash function, which is the foundation of the
DDHC. A hash function is any function that can be used to map data of arbitrary
size to data of fixed size. A one-way hash function H(x) satisfies the following
conditions:
A DDHC consists of two one-way hash chains of equal length: a forward
hash chain and a backward hash chain. First, two random key seeds FK and BK
are generated from the finite field Fq. Then the same one-way function H(x) is
repeatedly applied to each key seed to produce two hash chains of equal length m.
The backward chain of the DDHC is defined as follows:
{KB_m = H^m(KB_0), ..., KB_1 = H(KB_0), KB_0 = BK}
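The construction can be sketched as follows. SHA-256 stands in for the generic one-way function H; the paper does not fix a concrete hash:

```python
# Sketch of the Dual Directional Hash Chain: two chains of length m,
# grown from seeds FK and BK by repeated hashing.
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()   # stand-in for the one-way function

def chain(seed: bytes, m: int):
    out, cur = [seed], seed
    for _ in range(m):
        cur = H(cur)
        out.append(cur)
    return out            # [seed, H(seed), H^2(seed), ..., H^m(seed)]

m = 8
FK, BK = b"forward-seed", b"backward-seed"
forward = chain(FK, m)    # forward chain, consumed front to back
backward = chain(BK, m)   # backward chain, consumed back to front

# One-wayness: knowing H^j(seed) lets anyone derive H^{j+1}(seed),
# but not H^{j-1}(seed).
assert H(forward[3]) == forward[4]
```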
In our scheme, we mainly focus on how to guarantee the security of com-
munication between mobile nodes. To further clarify our goals and facilitate the
later presentation, we define the security model for the proposed self-healing key
distribution scheme as follows:
(1) Key confidentiality: Any mobile node that is not a member of the group
has no access to the keys that can decrypt the data broadcast to the
group.
(2) Forward secrecy: For the set Rj of mobile nodes revoked before session j, it
is computationally infeasible for the mobile nodes ui ∈ Rj colluding together
to recover any of the subsequent session keys SKj, SKj+1, ..., SKm, even with
the knowledge of keys SK1, SK2, ..., SKj−1.
(3) Backward secrecy: For the set Jj of new mobile nodes joined after session
j, it is computationally infeasible for the mobile nodes ui ∈ Jj colluding
together to recover any of the past session keys SK1, SK2, ..., SKj, even with
the knowledge of keys SKj+1, SKj+2, ..., SKm.
(4) Collusion resistance: Given any set Ri of mobile nodes revoked before session
i and any set Jj of new mobile nodes joined after session j, with i < j, it is
computationally infeasible for a colluding coalition Ri ∪ Jj to recover any
key SKi, SKi+1, ..., SKj−1 between session i and session j.
(5) Revocation capability: Illegal mobile nodes are removed from the current
group promptly once the detection system detects them.
3 Proposed Scheme

(Figure: workflow of the scheme. During initialization, KDC assigns key
parameters; when a mobile node receives a message, it checks whether it is an
Rrekeying message and, if so, calculates the new key for this session;
otherwise it reinitializes.)
3.1 Initialization
Let t be a positive integer. KDC first randomly chooses a bivariate t-degree
polynomial h(x, y) over a small finite field Fq:

h(x, y) = Σ_{0 ≤ i,j ≤ t} a_ij x^i y^j mod q   (t ≤ m, t ≤ n)   (1)
According to the identifier of each mobile node and h(x, y), KDC assigns the
polynomial h(ui, y) to each mobile node as its mask polynomial.
KDC randomly picks the forward key seed FK, the backward key seed BK
and the self-healing hash seed SH from Fq. Then KDC generates three hash
chains: fk_j = H^j(FK), bk_j = H^{m−j+1}(BK), sh_j = H1^j(SH), where the
DDHC uses the hash function H(x) and the self-healing hash chain uses the
hash function H1(x).
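The three per-session chain values can be sketched in Python; SHA-256 stands in for H(x) and a domain-separated variant for H1(x), both assumptions since the text fixes neither:

```python
import hashlib
import os

def H(x: bytes) -> bytes:
    """Hash function for the DDHC; SHA-256 as an illustrative stand-in."""
    return hashlib.sha256(x).digest()

def H1(x: bytes) -> bytes:
    """Distinct hash for the self-healing chain; domain-separated SHA-256."""
    return hashlib.sha256(b"self-healing" + x).digest()

def iterate(h, seed: bytes, n: int) -> bytes:
    """Apply hash h to seed n times."""
    for _ in range(n):
        seed = h(seed)
    return seed

m = 10
FK, BK, SH = os.urandom(32), os.urandom(32), os.urandom(32)

def session_values(j: int):
    fk_j = iterate(H, FK, j)          # forward:      fk_j = H^j(FK)
    bk_j = iterate(H, BK, m - j + 1)  # backward:     bk_j = H^{m-j+1}(BK)
    sh_j = iterate(H1, SH, j)         # self-healing: sh_j = H1^j(SH)
    return fk_j, bk_j, sh_j

# a node holding fk_i can derive fk_j for any later session j > i:
fk_3, _, _ = session_values(3)
assert iterate(H, fk_3, 4) == session_values(7)[0]
```

Because the backward chain is indexed as H^{m−j+1}(BK), a node that learns bk_j in session j can derive the bk values of all earlier sessions but none of the later ones, which is the basis of the backward secrecy argument.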
During the initialization phase, KDC sends the following packet over the
secure channel to the mobile node ui whose session period runs from s_i to s_j
(1 ≤ s_i < s_j ≤ m):

KDC → ui : {T_r ∥ h(ui, y) ∥ fk_i ∥ sh_i}

where T_r is the length of each session, fk_i = H^i(FK) is the forward key hash
value and sh_i is the self-healing hash value corresponding to session s_i. When
the mobile node receives the initialization packet, it decrypts it, obtains the
corresponding session key update parameters and system parameters, sets the
timer's update time to T_r, and saves the session key update parameters.
Then KDC constructs the recovery polynomial ψ_j(x) to recover the lost ses-
sion keys:

ψ_j(x) = r_j(x)·sh_j + h(x, fk_j) mod q   (4)
where a legal mobile node that computes the access polynomial v_j(x) obtains
v_j(ui) = 1, while a revoked mobile node obtains a random value different
from 1.
After constructing the access polynomial, KDC computes the broadcast
polynomial b_j(x). Based on these preparations, KDC broadcasts the key update
message B_j in the group. On receiving B_j, a mobile node ui computes the
backward hash value:

bk_j = (b_j(ui) − h(ui, fk_j)) / v_j(ui) mod q   (8)
If the mobile node is revoked, the result of the access polynomial is a random
value different from 1, so the mobile node cannot obtain the correct backward
hash value bk_j.
If there is no revoked mobile node in s_j, ui computes sh_j = H1(sh_{j−1}). Oth-
erwise, ui can compute sh_j from the recovery polynomial ψ_j(x) in the broadcast
message B_j:

sh_j = (ψ_j(ui) − h(ui, fk_j)) / r_j(ui) mod q   (9)
Finally, ui can further compute the session key SK_j = fk_j + sh_j × bk_j.
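A toy numeric walk-through of this recovery step, with illustrative small parameters (the polynomial coefficients, identifier and secret values below are arbitrary choices, not values from the scheme):

```python
q = 2_147_483_647  # a prime modulus (toy parameter)

def h(x, y, coeffs):
    """Evaluate the bivariate polynomial h(x, y) = sum a_ij x^i y^j mod q."""
    return sum(a * pow(x, i, q) * pow(y, j, q)
               for (i, j), a in coeffs.items()) % q

def r_j(x):
    """Session-specific polynomial r_j(x) (arbitrary toy choice)."""
    return (9 * x + 11) % q

# toy t = 2 coefficients and per-session values (illustrative only)
coeffs = {(i, j): (7 * i + 13 * j + 5) % q for i in range(3) for j in range(3)}
fk_j, bk_j, sh_j = 123456, 789012, 345678
u_i = 42  # identifier of a non-revoked node

# KDC side: value of the recovery polynomial at u_i (Eq. 4)
psi = (r_j(u_i) * sh_j + h(u_i, fk_j, coeffs)) % q

# node side: recover sh_j by subtracting the mask and dividing (Eq. 9),
# then form the session key SK_j = fk_j + sh_j * bk_j mod q
sh_rec = (psi - h(u_i, fk_j, coeffs)) * pow(r_j(u_i), -1, q) % q
SK_j = (fk_j + sh_rec * bk_j) % q
assert sh_rec == sh_j
```

The division in Eq. (9) is a modular inverse in Fq, which is why the recovery only works for a node whose r_j(ui) and mask polynomial values match those used by KDC.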
At the same time, KDC adds ui to the legal set U_j, updates the key and
recovers the key. When a non-revoked mobile node leaves the communication
group, it is deleted from the legal set U_j, and it can later rejoin the
communication group with the same identity.
Revoke group members: Assuming that a mobile node is captured by an
attacker during s_j, KDC immediately broadcasts an Rrekeying key update mes-
sage to revoke the captured mobile node.
When a non-revoked mobile node receives the Rrekeying message, it computes
the new session key SK_j. Note that when a mobile node is revoked during the
session, the non-revoked mobile nodes do not reset the timer.
4 Security Analysis
Theorem 1: The scheme achieves session key privacy and self-healing with
revocation capability.
Proof: (1) Session key privacy: For a non-revoked mobile node ui in s_j, SK_j
is determined by fk_j, bk_j and sh_j. fk_j is assigned to the non-revoked mobile
node when the node joins the communication group. bk_j can only be recovered
by non-revoked mobile nodes at the beginning of each session. Even if a revoked
mobile node obtains fk_j and bk_j, sh_j is updated immediately when a mobile
node is revoked, so the revoked node cannot recover sh_j. Thus, it is impossible
for any mobile node to obtain the session key from fk_j and bk_j alone, or from
sh_j alone.
(2) Self-healing: As described in Sect. 3.5, a non-revoked mobile node can recover
a lost session key using the self-healing hash value and the recovery polyno-
mial.
(3) Revocation capability: In the scheme, the session key is updated in two ways:
periodically, and when a mobile node is revoked. The periodic update prevents
the session key from being compromised through prolonged use. The dynamic
revocation mechanism ensures that a revoked mobile node is removed from the
communication group in time to avoid further damage to the system. Let R be
the set of all mobile nodes revoked in and before s_j. For a mobile node ui ∈ R,
because the access polynomial v_j(ui) does not evaluate to 1, ui cannot recover
bk_j from the broadcast polynomial b_j(ui); moreover, once the mobile node ui
is revoked, the self-healing hash value sh_j is replaced by a new random value
sh′_j, which ui cannot obtain. Since ui can obtain neither bk_j nor the current
self-healing value, it is infeasible for ui to recover the session key SK_j.
Theorem 2: The scheme achieves forward security.
Proof: Let R be the set of all mobile nodes revoked in and before s_k. Consider
a mobile node ui ∈ R whose lifetime runs from s_start to s_end. We analyze the
forward security in two scenarios:
If s_start < s_k ≤ s_end, then s_k lies within the lifetime of ui, and ui can
obtain fk_k. If ui is revoked before s_k, ui cannot recover bk_k and sh_k, so ui
cannot recover SK_k. If ui is revoked in s_k, ui can recover bk_k, but sh_k is
replaced by a new random value sh′_k when ui is revoked; ui cannot recover
sh′_k, so ui cannot recover SK_k.
If s_k > s_end, ui can only obtain fk_k and cannot recover bk_k and sh_k.
Thus ui cannot recover SK_k. As a result, the scheme achieves forward security.
Theorem 3: The scheme resists collusion of revoked mobile nodes and newly
joined mobile nodes.
Proof: Suppose the mobile node ui is revoked in s_i and the mobile node uj joins
the group in s_j, where s_i < s_j. ui and uj can collude to obtain the values of the
DDHC from s_i to s_j. However, the self-healing hash chain is a forward hash
chain in the scheme: sh_i is replaced by a new random value sh′_i when ui is
revoked, and the subsequent
A Self-healing Key Distribution Scheme for Mobile Ad Hoc Networks 333
self-healing hash values are computed from sh′_i, that is, sh_j = H1^{j−i}(sh′_i). It is
computationally infeasible to compute sh_{j−1} even if one obtains sh_j. Therefore,
even if a revoked mobile node colludes with a newly joined mobile node, they
cannot obtain session keys beyond their lifetimes. As a result, the scheme resists
the collusion of revoked mobile nodes and newly joined mobile nodes.
5 Performance Analysis
In order to evaluate the performance of the proposed scheme, we compare the
communication overhead and storage overhead of our scheme with previous
hash-chain-based self-healing key distribution schemes. The comparison results
are shown in Table 1. The storage overhead of a non-revoked mobile node with
lifetime from s_i to s_j is shown in Table 2; the total storage overhead of a
non-revoked mobile node is (t + 4) log q bits.
In session s_j, the broadcast message B_j consists of the t-degree broadcast
polynomial b_j(x), v t-degree recovery polynomials ψ_j(x), the set U_j and the
revocation set R. The communication overhead of the set U_j and the revocation
set R can be ignored because the mobile node identifiers can be selected from a
small finite field Fq. Therefore, the total communication overhead of our scheme
is (v + 1)(t + 1) log q bits, where 0 ≤ v < j ≤ m.
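For concreteness, the two overhead formulas can be evaluated for sample parameters; the values of t, v and the field size below are illustrative choices, not figures from Table 1:

```python
def broadcast_overhead_bits(t: int, v: int, log_q: int) -> int:
    """Total broadcast size: one t-degree broadcast polynomial plus v
    t-degree recovery polynomials, each with t + 1 coefficients of
    log q bits, i.e. (v + 1)(t + 1) log q bits."""
    return (v + 1) * (t + 1) * log_q

def storage_overhead_bits(t: int, log_q: int) -> int:
    """Per-node storage: the t-degree mask polynomial (t + 1 coefficients)
    plus three hash-chain values, i.e. (t + 4) log q bits."""
    return (t + 4) * log_q

# illustrative parameters: t = 10, v = 3 lossy sessions, q a 128-bit prime
print(broadcast_overhead_bits(10, 3, 128))  # 5632 bits
print(storage_overhead_bits(10, 128))       # 1792 bits
```

Note that the broadcast cost grows with v, the number of recovery polynomials carried for lossy sessions, while per-node storage is independent of v.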
According to Table 1, scheme 3 of [5] is better than our scheme in terms
of storage overhead and communication overhead; however, scheme 3 of [5]
cannot resist the collusion of newly joined users with revoked users whose
lifetimes have not expired, and its users cannot be revoked during a session.
Although the schemes of [2,7] have revocation capability, they only partially
resist the collusion of newly joined and revoked users, and their users cannot
be revoked during a session. The scheme of [12] has revocation capability, but
its users cannot be revoked during a session. From the comparison in Table 1,
although the storage overhead and communication overhead of the proposed
scheme are slightly increased, only our scheme can revoke a user dynamically
and resist collusion of newly joined users and revoked users regardless of
whether their lifetimes have expired.
6 Conclusion
A self-healing group key distribution scheme with dynamic revocation and collu-
sion resistance is proposed in this paper. The scheme is based on a DDHC to
ensure the forward and backward security of session keys. To address packet
loss, the scheme introduces a self-healing hash chain to ensure that non-revoked
mobile nodes can recover lost session keys. At the same time, the scheme has
small storage and communication overheads and can be applied to
resource-constrained MANET communication.
References
1. Chen, H., Xie, L.: Improved one-way hash chain and revocation polynomial-based
self-healing group key distribution schemes in resource-constrained wireless net-
works. Sensors 14(12), 24358–24380 (2014)
2. Du, C., Hu, M., Zhang, H., Zhang, W.: Anti-collusive self-healing key distribution
scheme with revocation capability. Inf. Technol. J. 8(4), 619–624 (2009)
3. Du, W., He, M.: Self-healing key distribution with revocation and resistance to
the collusion attack in wireless sensor networks. In: Baek, J., Bao, F., Chen, K.,
Lai, X. (eds.) ProvSec 2008. LNCS, vol. 5324, pp. 345–359. Springer, Heidelberg
(2008). https://doi.org/10.1007/978-3-540-88733-1_25
4. Dutta, R., Mukhopadhyay, S.: Improved self-healing key distribution with revoca-
tion in wireless sensor network. In: Wireless Communications and Networking
Conference, pp. 2963–2968. IEEE (2007)
5. Dutta, R., Mukhopadhyay, S., Collier, M.: Computationally secure self-healing key
distribution with revocation in wireless ad hoc networks. Ad Hoc Netw. 8(6), 597–
613 (2010)
6. Guo, H., Zheng, Y.: On the security of a self-healing group key distribution scheme.
Wirel. Pers. Commun. 91(3), 1109–1121 (2016)
7. Guo, H., Zheng, Y., Zhang, X., Li, Z.: Exponential arithmetic based self-
healing group key distribution scheme with backward secrecy under the resource-
constrained wireless networks. Sensors 16(5), 609 (2016)
8. Liu, D., Ning, P., Sun, K.: Efficient self-healing group key distribution with revoca-
tion capability. In: ACM Conference on Computer and Communications Security,
pp. 231–240. ACM (2003)
9. Staddon, J., Miner, S., Franklin, M., Balfanz, D., Malkin, M., Dean, D.: Self-
healing key distribution with revocation. In: 2002 IEEE Symposium on Security
and Privacy, Proceedings, pp. 241–257. IEEE (2002)
10. Sun, X., Wu, X., Huang, C., Zhong, J., Zhong, J.: Modified access polynomial
based self-healing key management schemes with broadcast authentication and
enhanced collusion resistance in wireless sensor networks. Ad Hoc Netw. 37(P2),
324–336 (2016)
11. Tian, B., Han, S., Dillon, T.S.: An efficient self-healing key distribution scheme.
In: New Technologies, Mobility and Security, pp. 1–5. IEEE (2008)
12. Tian, B., Han, S., Dillon, T.S., Das, S.: A self-healing key distribution scheme based
on vector space secret sharing and one way hash chains. In: World of Wireless,
Mobile and Multimedia Networks, pp. 1–6. IEEE (2008)
13. Wang, G., Wen, T., Guo, Q., Ma, X.: An efficient and secure group key management
scheme in mobile ad hoc networks. J. Comput. Res. Dev. 30(3), 937–954 (2010)
14. Wang, Q., Chen, H., Xie, L., Wang, K.: Access-polynomial-based self-healing group
key distribution scheme for resource-constrained wireless networks. Secur. Com-
mun. Netw. 5(12), 1363–1374 (2012)
15. Wang, Q., Chen, H., Xie, L., Wang, K.: One-way hash chain-based self-healing
group key distribution scheme with collusion resistance capability in wireless sensor
networks. Ad Hoc Netw. 11(8), 2500–2511 (2013)
16. Xu, Q., He, M.: Improved constant storage self-healing key distribution with revo-
cation in wireless sensor network. Inf. Secur. Appl. 5379, 41–55 (2008)
17. Yang, Y., Zhou, J., Deng, R.H., Bao, F.: Computationally secure hierarchical self-
healing key distribution for heterogeneous wireless sensor networks. Inf. Commun.
Secur. 5927, 135–149 (2009)
18. Yuan, T., Ma, J., Zhong, Y., Zhang, S.: Efficient self-healing key distribution with
limited group membership for communication-constrained networks. In: IEEE/IFIP
International Conference on Embedded and Ubiquitous Computing, pp. 453–458.
IEEE (2008)
19. Yuan, T., Ma, J., Zhong, Y., Zhang, S.: Self-healing key distribution with limited
group membership property. In: International Conference on Intelligent Networks
Intelligent Systems, pp. 309–312. IEEE (2008)
20. Zhu, S., Setia, S., Xu, S., Jajodia, S.: GKMPAN: an efficient group rekeying scheme
for secure multicast in ad-hoc networks. J. Comput. Secur. 14(4), 301–325 (2006)
21. Zou, X., Dai, Y.S.: A robust and stateless self-healing group key management
scheme. In: International Conference on Communication Technology, pp. 1–4. IEEE
(2006)
IoT Security
SecHome: A Secure Large-Scale Smart
Home System Using Hierarchical Identity
Based Encryption
1 Introduction
In recent years, smart home devices have received much attention due to their
potential applications and the proliferation of Internet of Things. As a result,
users who would like to set up different access controls to different people and
devices are driven to use an access manager. Some access managers are pro-
vided by smart home vendors as part of smart home ecosystem, and some
are provided by third-party cloud services. There are several different types of
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 339–351, 2018.
https://doi.org/10.1007/978-3-319-89500-0_30
340 Y. Li et al.
Internet of Things access methods for a smart home system, including
IEEE 802.11 (Wi-Fi), Bluetooth and ZigBee. All these technologies help
smart devices connect to a cloud center. The owner of the devices can use
her smart phone to control and monitor them. Most such systems are wireless-
network based, with passwords backed up to the cloud and synced across the
smart home devices.
However, numerous recent surveys show that smart home systems are vulner-
able to hacking because of weak authentication and authorization. Security and
privacy are major concerns for smart home adoption, since the data generated
by the living environment is usually sensitive [1]. So far, we can identify two
types of issues for smart home systems: security issues and privacy issues.
By security issues, we refer to the broad class of adversaries that inten-
tionally attack the system. These problems need to be addressed, for example,
by using authentication and encryption schemes to prevent interference
over the communication channel. Although authentication between users
and the cloud has been important in previous home automation research, the
focus of this paper is on privacy, which we expect will be the dominant concern
of users in a smart home system. Specifically, the privacy issue concerns home
owners who fear that cloud service providers will obtain sensitive information
and thereby affect the security of the living environment. For example, sensitive
information such as the state of a smart lock will indicate whether anyone is at
home. A cloud service provider therefore needs a secure protocol to protect
users' private information.
Existing smart home systems use the Transport Layer Security (TLS) or
HTTPS protocol to provide authentication as well as encryption. However, such
smart home systems may have several issues. First, there is no guarantee of
third-party security and privacy protection. The cloud server of a smart home
system holds all sensitive data, and it is difficult to guarantee that this data
will not harm users' privacy. Second, the smart home owner cannot issue keys
to her home members dynamically, so they must interact with the cloud server,
which introduces many potential dangers. Third, compared with traditional
security systems, smart devices are more vulnerable since they use low-cost
embedded systems. Therefore, we cannot simply rely on traditional security
systems to provide a strong security guarantee in Cyber-Physical Systems,
especially in smart home systems.
In this paper, we propose a secure large-scale smart home system using hier-
archical identity based encryption. Generally speaking, the basic scheme of our
system can be described as follows. When a home owner sets up a smart
home, she issues a secret key to each home member based on the hierarchy of
the home. Later, when a home member buys a smart device, the owner issues
a private key for it and it connects to the preconfigured private cloud. The pri-
vate cloud then communicates with the public cloud, using the public ID to
encrypt the sensitive data. In order to allow users to control access to the
smart home devices, a suitable hierarchy and authentication as well as
encryption are required.
SecHome: A Secure Large-Scale Smart Home System 341
There are two main issues in our system design that need to be addressed.
First, since a malicious user may attempt to impersonate a normal user and
control her smart devices, every device needs to be authenticated by the smart
home cloud server to ensure that the device is authorized to connect. Second,
different home members should have different access rights to monitor and
control smart devices. For instance, the smart home owner can control smart
devices and issue access rights to her home members.
Our contributions can be summarized as follows:
– We are the first to study privacy protection in smart home systems by using
a hybrid cloud architecture and to propose a hierarchical key management
scheme.
– To the best of our knowledge, we are also the first to study privacy protection
in smart homes using Hierarchical Identity Based Encryption.
– In terms of privacy, our algorithm leaks no knowledge about any party's
data.
The rest of this paper is organized as follows. In Sect. 2, we discuss related
work. In Sect. 3, we present the overall architecture and the intuitions behind
our design. We then give the full specification of our system and analyze its
security in Sect. 4. In Sect. 5, we present evaluations of our solution. Our
conclusions and future work are given in Sect. 6.
2 Related Work
According to [2,3], the novel paradigm named the "Internet of Things" is rapidly
gaining ground in modern wireless network technology. However, as noted in [4],
the IoT has a great impact on personal privacy and security. Therefore, a high
degree of reliability is needed, including data authentication, access control
and client privacy. Unlike other Cyber-Physical Systems devices, smart home
devices have a direct influence on people's daily lives. Therefore, the design of
the access control scheme in a smart home system is extremely important, and
the encrypted data generated by the system should only be viewable by the
correct home member. To the best of our knowledge, there is no existing scheme
that lets only the smart home owner monitor the data; each of the existing
smart home systems relies on a trusted third-party cloud server.
Identity-Based Encryption (IBE), which was proposed by Shamir in [5], can
simplify key management in a Public Key Infrastructure (PKI) by using objects'
identities (e.g., a unique mobile phone number, email address, product serial
number, etc.) as public keys. Subsequently, the first secure IBE scheme was
proposed by Boneh and Franklin [6] using bilinear pairings. They also proved
that their IBE scheme is semantically secure against adaptive chosen-ciphertext
attack under the DBDH assumption in the random oracle model. Moreover, a
number of works on constructing provably secure IBE schemes were proposed
in [7–10]. Many alternative approaches have been derived from IBE with the
development of cloud computing, for example, Role Based Encryption (RBE) [11]
and Attribute Based Encryption (ABE) [12–16]. Another approach that can
enforce access control policies and data encryption is to apply Hierarchical
ID-based Encryption (HIBE) [17–19] to the Internet of Things.
The design of our secure smart home system is motivated by [20], which lies
in an emerging active research area at the intersection of computer security and
the Internet of Things. In HIBE, the length of the identity grows with
the depth of the hierarchy, which suits a home's structure
since the depth of the hierarchy in a family would not be too large. However, as
far as we know, there is no previously proposed security and privacy protection
scheme for smart home systems. Furthermore, although our design is motivated
by the HIBE proposed in [20], our problem does not fit exactly into that specific
cryptographic design. For example, in our system, we consider various hardware
security mechanisms, such as secure boot and data isolation [21].
3 Architecture
In this section, we present the architecture of our secure hierarchical smart home
system. Our system has five components: the public cloud, the private cloud,
the home owner, home members and smart devices. The architecture is shown
in Fig. 1. The public cloud is used to store and transfer the encrypted sensitive
data. Due to the limited computation capacity of the devices, a private cloud is
used to help encrypt and decrypt the data generated by the smart device sensors.
The home owner and home members use a smart phone to control the smart
devices and monitor the data they generate. The home owner, home members
and smart devices are arranged in the form of a hierarchy, which can be
visualized as a pyramid. The hierarchy will be described in Sect. 5.2.
In our secure hierarchical smart home architecture, we assume that a private
cloud has been set up in the home. We let the owner of the home be at depth 1
in the home hierarchy. The home owner can then set keys for her children nodes
as well as the sub-structure, as shown in steps 1 and 2. Home members also set
keys for their smart devices in steps 3 and 4. Smart devices transfer the
encrypted data to the private cloud using symmetric key encryption, as shown
in steps 5 and 6. The private cloud can only be accessed by the home owner, and
the data must be encrypted by HIBE encryption when it leaves the private
cloud, as shown in steps 7 and 8. The public cloud is a third-party cloud
employed to store and transfer the encrypted data. The home owner and home
members, each equipped with a smart phone, can send HIBE-encrypted data
and commands to the public cloud in steps 9–12. Therefore, the top of the home
hierarchy can monitor the large volume of data generated by its descendant
nodes and smart devices through the public cloud. The details of how keys are
set up and data is encrypted for home members and smart devices will be
described in Sect. 5.2.
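The hierarchy of owner, members and devices can be modelled as a simple tree in which each node's HIBE identity is the tuple of ids along its root path. The sketch below is a simplified structural model with the cryptographic key material elided; the class and field names are our own, not from the system:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A member or device in the home hierarchy. In HIBE, the identity
    used for encryption is the tuple of ids from the root to this node."""
    id: str
    parent: "Node | None" = None
    children: list = field(default_factory=list)

    def identity_tuple(self) -> tuple:
        """Collect ids from the root down to this node."""
        path, n = [], self
        while n is not None:
            path.append(n.id)
            n = n.parent
        return tuple(reversed(path))

    def add_child(self, child_id: str) -> "Node":
        child = Node(child_id, parent=self)
        self.children.append(child)
        return child

owner = Node("Pid1")              # depth 1: home owner, holds the master key
member = owner.add_child("Pid2")  # depth 2: home member
device = member.add_child("Did2") # depth 3: smart device

print(device.identity_tuple())  # ('Pid1', 'Pid2', 'Did2')
```

Because identities are root paths, any ancestor's identity is a prefix of its descendants' identities, which is what lets the top of the hierarchy decrypt data produced lower down.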
In this section, we present the overview of our system and the intuitions behind
our design. Boneh et al. proposed a hierarchical identity based encryption
scheme with constant size ciphertext in [20] which can be applied to a number
of applications. However, since the encryption scheme they proposed is only a
general approach for constructing HIBE using pairing based encryption, we
apply a HIBE that matches the properties of a smart home system. Moreover,
we also provide a revocation mechanism to make our system more secure. At a
high level, a smart home device usually contains an embedded chip with a real
time operating system running on it. Such systems are more vulnerable because
of the design of their system architecture [22,23]. Therefore, our system also
considers other aspects of security that protect the data stored on the device.
Intuitively, we provide a hardware-assisted dynamic root of trust which allows
secure task loading at runtime.
Our goal is to maintain the confidentiality and integrity of users' data running
on a network of hosts potentially under the control of an adversary. This section
outlines our design, which achieves this with good performance while keeping
an adversary's attacks out of the SecHome system.
Algorithm 1 shows the overall scheme of our smart home system, which con-
sists of initialization and of adding and revoking nodes in the hierarchy. Specif-
ically, people nodes in this hierarchy are equipped with a mobile device that
allows them to generate keys for children nodes and to send and receive
messages over a wireless network.
When a new home member joins the hierarchy at depth 1 and controls a device
at depth 2, the home owner only needs to generate d_ID1 for the member and
send the encrypted device's r to him via a secret channel using Algorithm 2.
The member can then use the device's r to control it or obtain its data.
Recursively, we can set up keys for all nodes in the hierarchy.
To illustrate how a home owner generates keys for her children nodes and how
the proposed scheme adds and revokes a home member or device in the
hierarchy, we use the general HIBE structure in Fig. 2. Start by assuming the
home owner's id is Pid1. The owner has two
children nodes, Pid2 and Pid3. A smart device with ID number Did1 can only
be controlled by Pid1. Home owner Pid1 generates public parameters and a
master key using the Setup mentioned above. For the children nodes Pid2 and
Pid3, the home owner generates two tuples (Pid1, Pid2) and (Pid1, Pid3). Then
the owner uses KeyGen to produce private keys for the two children nodes. In
order to send the keys to the children nodes securely, we choose the broadcast
encryption scheme proposed in Algorithm 2, which requires a preset secret key
sku for each node. The parent node first sets up an identity space ID, and then
produces a message (Hdr, c) for each user and broadcasts it to the children node
group. The children nodes decrypt the message using their preset secret key sku
to obtain the private key dI. For the device Did1, the key distribution process is
the same as for the children nodes. However, in another scenario, another person
may also need to control it after the key distribution. Therefore, we are required
to add an element between Pid1 and Did1. To solve this problem, Pid1 can issue
a private key to the new element and send the encrypted dDid1 to the owner.
Then the new element can be inserted into the hierarchy successfully.
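The per-node delivery of a private key under a preset secret sku can be illustrated with a toy wrap function. Note the hash-derived one-time pad below is only a stand-in for the broadcast encryption of Algorithm 2, whose actual construction is not reproduced here:

```python
import hashlib
import os

def wrap(sk_u: bytes, nonce: bytes, payload: bytes) -> bytes:
    """XOR the payload with a keystream derived from (sk_u, nonce).
    Applying wrap twice with the same inputs restores the payload;
    this is a toy stand-in for Algorithm 2's broadcast encryption."""
    stream, counter = b"", 0
    while len(stream) < len(payload):
        block = sk_u + nonce + counter.to_bytes(4, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(payload, stream))

sk_u = os.urandom(32)        # child's preset secret key
nonce = os.urandom(16)       # fresh per message (carried in Hdr)
d_I = os.urandom(48)         # the HIBE private key being distributed
c = wrap(sk_u, nonce, d_I)   # parent broadcasts (nonce, c)

# only a child holding sk_u can unwrap the broadcast ciphertext
assert wrap(sk_u, nonce, c) == d_I
```

A node without the correct sku derives a different keystream and recovers only noise, which mirrors how the broadcast reaches the whole children group but is useful only to the intended members.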
In another scenario, child node Pid3 buys a smart device Did2 and wants to
set it up within this hierarchy. First, Pid3 generates private keys for the device
using the public parameters of the parent node. Then Pid3 encrypts the device's
private r using Encrypt in Algorithm 1 and sends it back to the parent node.
The parent node can decrypt it using Decrypt and obtain r, and thereby the
private key of the device. Therefore, Pid1 can control device Did2 and access
the data it generates.
Sometimes, a node in the hierarchy needs to update its decryption key, or the
node may even need to be revoked from the hierarchy. For example, suppose the
node Pid4 needs to be revoked. We first generate the subtree that contains Pid4,
from the root node to the leaf nodes. In this hierarchy, the node Pid4 has a
parent node Pid2 and two children nodes Pid6 and Pid7. Therefore, Pid2
generates two new private keys for Pid6 and Pid7, and recursively the
descendant nodes of Pid4 get new keys and send their encrypted r values, using
ID, to all ancestor nodes. The node Pid4 has then been successfully revoked.
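The subtree re-key step of this revocation example can be sketched as a traversal that collects every descendant of the revoked node. This is a structural sketch only; generating the actual HIBE keys is omitted:

```python
def revoke(tree: dict, revoked: str) -> list:
    """Return the nodes needing fresh private keys when `revoked` is
    removed: all of its descendants, which are re-keyed by the revoked
    node's parent in the scheme."""
    to_rekey = []
    stack = list(tree.get(revoked, []))
    while stack:
        n = stack.pop()
        to_rekey.append(n)
        stack.extend(tree.get(n, []))
    return to_rekey

# hierarchy from the example: Pid2 is Pid4's parent; Pid6, Pid7 its children
tree = {"Pid1": ["Pid2", "Pid3"], "Pid2": ["Pid4"], "Pid4": ["Pid6", "Pid7"]}
print(sorted(revoke(tree, "Pid4")))  # ['Pid6', 'Pid7']
```

Only the revoked node's subtree is touched; the rest of the hierarchy keeps its keys, which keeps the re-key cost proportional to the subtree size rather than to the whole home.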
authenticity, and ChaCha20 can provide a 256-bit security level [26]. We
performed our experiments on a cluster server with two 6-core Intel(R) Xeon(R)
1.90 GHz processors, 16 GB of RAM, and 6 TB 7200 RPM hard disks,
connected by gigabit switched Ethernet. On the smart phone side, we performed
our experiments on a smart phone equipped with a Samsung Exynos4412
1.5 GHz processor, 1 GB of RAM and 8 GB of flash storage. On the smart
device side, we performed our experiments on a Raspberry Pi 2 equipped with
an ARM Cortex-A7 based BCM2836 900 MHz processor and 1 GB of RAM.
Encryption and decryption are the most frequently used operations in the sys-
tem. Since both the smart phone side and the smart device side need to encrypt
and decrypt data, we first measure the time taken at the smart phone for
performing encryption and decryption. The smart phone decryption time is
measured from the time the smart phone receives the encrypted data from the
private cloud to the time the smart phone starts to display the data to the home
member.
Figure 3(a) shows the time the smart phone spent executing the
encryption and decryption algorithms on different sizes of data. Increasing
the size of the plaintext increases the decryption time; increasing the number
of ancestor nodes has a similar influence on the encryption and decryption
times. However, it is important to note that the number of ancestors usually
grows much more slowly than the plaintext size.
Figure 3(b) shows the time the smart phone spent at different depths
of the hierarchy. In this experiment, we created a hierarchy with a depth of
7. The results show that the depth has only a minor influence on each node's
encryption and decryption performance.
Next we look at the operation time on the smart device side. Figure 4(a) shows
the time for encrypting and decrypting files of different sizes on the smart device
side. In this experiment, we created a hierarchy with a depth of 5. In our mea-
surements, the encryption time is measured from the time a sensor starts
to collect data to the time the smart device starts sending the encrypted
data to the cloud. The decryption time is measured from the time the cloud
starts sending encrypted data to the time the smart device starts executing the
command. From these results, we believe that SecHome has the potential to
be used in many commercial situations.
Since we use ChaCha20 as our symmetric encryption algorithm, encryp-
tion and decryption can happen while the data is being transferred between
smart phones, smart devices and the cloud. Figure 4(b) shows a comparison of
ChaCha20 and AES, which demonstrates that ChaCha20 is more suitable for
smart phones and devices.
6 Conclusion
References
1. Brush, A., Lee, B., Mahajan, R., Agarwal, S., Saroiu, S., Dixon, C.: Home automa-
tion in the wild: challenges and opportunities. In: Proceedings of the SIGCHI Con-
ference on Human Factors in Computing Systems, pp. 2115–2124. ACM (2011)
2. Atzori, L., Iera, A., Morabito, G.: The internet of things: a survey. Comput. Netw.
54(15), 2787–2805 (2010)
3. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of things (IoT): a
vision, architectural elements, and future directions. Future Gener. Comput. Syst.
29(7), 1645–1660 (2013)
4. Weber, R.H.: Internet of things-new security and privacy challenges. Comput. Law
Secur. Rev. 26(1), 23–30 (2010)
5. Shamir, A.: Identity-based cryptosystems and signature schemes. In: Blakley, G.R.,
Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 47–53. Springer, Heidelberg
(1985). https://doi.org/10.1007/3-540-39568-7 5
6. Boneh, D., Franklin, M.: Identity-based encryption from the weil pairing.
In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 213–229. Springer,
Heidelberg (2001). https://doi.org/10.1007/3-540-44647-8 13
7. Boneh, D., Boyen, X.: Efficient selective-ID secure identity-based encryption with-
out random oracles. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004.
LNCS, vol. 3027, pp. 223–238. Springer, Heidelberg (2004). https://doi.org/10.
1007/978-3-540-24676-3 14
8. Waters, B.: Efficient identity-based encryption without random oracles. In: Cramer,
R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 114–127. Springer, Heidelberg
(2005). https://doi.org/10.1007/11426639_7
9. Waters, B.: Dual system encryption: realizing fully secure IBE and HIBE under
simple assumptions. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 619–
636. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_36
10. Liang, K., Liu, J.K., Wong, D.S., Susilo, W.: An efficient cloud-based revoca-
ble identity-based proxy re-encryption scheme for public clouds data sharing. In:
Kutylowski, M., Vaidya, J. (eds.) ESORICS 2014. LNCS, vol. 8712, pp. 257–272.
Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11203-9_15
11. Zhou, L., Varadharajan, V., Hitchens, M.: Achieving secure role-based access con-
trol on encrypted data in cloud storage. IEEE Trans. Inf. Forensics Secur. 8(12),
1947–1960 (2013)
12. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for fine-
grained access control of encrypted data. In: Proceedings of the 13th ACM Con-
ference on Computer and Communications Security. ACM, pp. 89–98 (2006)
13. Bethencourt, J., Sahai, A., Waters, B.: Ciphertext-policy attribute-based encryp-
tion. In: IEEE Symposium on Security and Privacy, SP 2007, pp. 321–334. IEEE
(2007)
14. Li, M., Yu, S., Zheng, Y., Ren, K., Lou, W.: Scalable and secure sharing of personal
health records in cloud computing using attribute-based encryption. IEEE Trans.
Parallel Distrib. Syst. 24(1), 131–143 (2013)
SecHome: A Secure Large-Scale Smart Home System 351
15. Wan, Z., Liu, J.E., Deng, R.H.: Hasbe: a hierarchical attribute-based solution for
flexible and scalable access control in cloud computing. IEEE Trans. Inf. Forensics
Secur. 7(2), 743–754 (2012)
16. Jung, T., Li, X.-Y., Wan, Z., Wan, M.: Control cloud data access privilege and
anonymity with fully anonymous attribute-based encryption. IEEE Trans. Inf.
Forensics Secur. 10(1), 190–199 (2015)
17. Horwitz, J., Lynn, B.: Toward hierarchical identity-based encryption. In: Knudsen,
L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 466–481. Springer, Heidelberg
(2002). https://doi.org/10.1007/3-540-46035-7_31
18. Shao, J., Cao, Z.: Multi-use unidirectional identity-based proxy re-encryption from
hierarchical identity-based encryption. Inf. Sci. 206, 83–95 (2012)
19. Blazy, O., Kiltz, E., Pan, J.: (Hierarchical) identity-based encryption from affine
message authentication. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS,
vol. 8616, pp. 408–425. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44371-2_23
20. Boneh, D., Boyen, X., Goh, E.-J.: Hierarchical identity based encryption with con-
stant size ciphertext. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494,
pp. 440–456. Springer, Heidelberg (2005). https://doi.org/10.1007/11426639_26
21. Koeberl, P., Schulz, S., Sadeghi, A.-R., Varadharajan, V.: Trustlite: a security
architecture for tiny embedded devices. In: Proceedings of the Ninth European
Conference on Computer Systems, Article no. 10, p. 1. ACM (2014)
22. Costin, A., Zaddach, J., Francillon, A., Balzarotti, D., Antipolis, S.: A large-scale
analysis of the security of embedded firmwares. In: USENIX Security Symposium
(2014)
23. Cui, A., Stolfo, S.J.: A quantitative analysis of the insecurity of embedded network
devices: results of a wide-area scan. In: Proceedings of the 26th Annual Computer
Security Applications Conference, pp. 97–106. ACM (2010)
24. Caro, A.D., Iovino, V.: Java pairing based cryptography library (2011). http://
libeccio.dia.unisa.it/projects/jpbc
25. Lynn, B.: Pairing-based cryptography library (2007). http://crypto.stanford.edu/
pbc
26. Nir, Y., Langley, A.: ChaCha20 and Poly1305 for IETF Protocols (2015). https://
tools.ietf.org/html/rfc7539
Multi-attribute Counterfeiting Tag
Identification Protocol in Large-Scale RFID
System
1 Introduction
filter is applied in our protocol, which means that each tag carries more than one
attribute. When a tag's attributes are mentioned, the first one that comes to mind is its
identity ID. In fact, the reader can also measure the angle between a tag and itself after
scanning, so our protocol exploits two attributes: the tag identity ID and the angle
value. In previous works [1, 17, 18], a counterfeiting tag is defined as one that is not in
the back-end server's database. However, a tag that holds the same ID as a genuine
one may be attached to a counterfeit item, and in that case the tag's angle value is a
good criterion for identifying counterfeiting tags. Our protocol is the first to address
the problem of a counterfeiting tag forging the same ID as a genuine one. That is to
say, our protocol can point out not only unknown tags but also counterfeit goods
carrying genuine tag IDs. Our protocol is also suitable for tag identification in
large-scale RFID systems: even as the number of RFID tags grows rapidly,
identification efficiency stays within an acceptable range. Compared with recently
proposed methods, it achieves higher identification efficiency, especially with a huge
number of tags, while keeping the time cost within a reasonable range.
2 Related Works
The main practical use of RFID tags is identifying counterfeit goods. In previous
works such as [1], a counterfeiting tag is defined as one that is not in the back-end
server's database; other studies [2, 3] call such tags unknown tags. In previous studies,
counterfeiting tag detection (also called unknown tag detection) is classified into two
categories: deterministic authentication and probabilistic estimation.
Deterministic authentication was mainly proposed in early works [5–8]. Weis et al.
[5] propose a hash-lock authentication scheme that protects tags from being tracked.
As its search complexity is O(N), where N is the number of tags, it suffers from poor
efficiency in large-scale RFID systems. To reduce the search complexity, Lu et al. [6]
introduce a tree-based method with complexity O(log(N)), and they then propose a
new scheme that achieves complexity O(1) in [7]. Recently, Chen et al. [8] introduced
a token-based protocol whose overhead on both tag and reader is O(1). However,
schemes [6–8] are all tree-based protocols whose number of keys increases
logarithmically with the number of tags.
In recent years, probabilistic estimation schemes [9–12] have gradually been
proposed. However, these methods focus on estimating the cardinality of tags and
cannot announce the identities of counterfeiting tags. There are also several identity
detection schemes [13–16], but they aim to find missing tags. Yang et al. [17] first
offer a framed-slotted-ALOHA-based solution to detect counterfeiting tags, which
they call unknown tags. Bianchi [18] further improves this by introducing a standard
bloom filter structure, and Liu et al. [4] then propose sampling filtering techniques
based on the bloom filter. However, all these bloom-filter-based methods lead to more
collisions and do not make good use of the frame space. Further, Gong et al. [1]
provide a counterfeiting tag estimation scheme, but it cannot pinpoint the
counterfeiting tags. Later, schemes [3, 19] equip tags with an indicator vector, which
results in more overhead on the tags. In contrast, our protocol points out not only
unknown tags but also counterfeit goods carrying genuine tags. Moreover, it achieves
high detection efficiency at a reasonable time cost.
354 D. Zhu et al.
3 Preliminary
3.1 System Model and Assumption
A typical RFID system consists of three entities: tags, readers, and a back-end server.
The reader is connected through a wired or wireless link to the back-end server, which
has high computational ability. RFID tags are divided into three types: active,
semi-active, and passive; our protocol mainly concerns passive tags. A reader
interrogates and receives responses from a tag by transmitting a radio-frequency (RF)
signal. Each tag is associated with identity information (its ID) and location
information (its angle value).
In our large-scale system model, the reader is at the center and periodically scans
the tags. All tags are within the reader's interrogation range. Tags include genuine
ones and counterfeits. If both the identity and the angle value of a tag are stored in the
back-end server, the tag is genuine. Counterfeiting tags arise in two conditions: in the
first, neither the tag's identity nor its angle value is stored in the back-end server; in
the second, a tag holding the same ID as a genuine one is attached to a counterfeit
item. The first condition has been addressed in previous works [1, 3, 17, 18]; the
problem in the second condition is first proposed and solved in this paper.
[Fig. 1 here: the back-end server, the reader, and genuine and counterfeiting tags within the interrogation range]
The whole process is divided into two interrogations by the reader. When the tags
settle down in the reader's interrogation range, we assume that all tags are genuine
and that no tag will be moved to another place before the second interrogation. The
reader then scans the tags and measures the angle between each tag and itself using
RSSI information, the details of which are beyond the scope of this paper. Afterwards,
the reader writes the corresponding angle value onto each tag, so the attributes stored
in a tag are its identity ID and its angle value. Since a tag does not change place before
the second interrogation, the actual angle between the reader and the tag will be
identical to the angle value stored in the tag. During the period between the two
interrogations, counterfeits can move into the interrogation range. A counterfeit can
forge the same ID as a genuine tag; however, it cannot forge the actual angle value,
since in a large-scale RFID system it does not know the accurate location of the tag it
forges. In the second interrogation, the reader authenticates each tag's identity and
angle value to find the counterfeits. The scenario is illustrated in Fig. 1.
Multi-attribute Counterfeiting Tag Identification Protocol 355
location information anglei can be mapped to k locations {lai1, lai2, …, laiu, …, laik} in
the bloom filter for angle, where laiu = Hu(IDi, r, f) mod f, u ∈ [1, k], and r is a random
seed. Since there are L standard filters in the multi-dimension dynamic bloom filter,
each tag needs to choose one to join. Thus, differently from the previous
multi-dimension dynamic bloom filter [20], our protocol introduces a participation
probability p (p ≤ 1) for tags to determine which bloom filter to take part in.
Our protocol consists of two parts: counterfeiting tag detection and counterfeiting
tag verification. In the detection part, the multi-dimension dynamic bloom filter is
used to find counterfeiting tags whose ID or angle value is not identical to the one
stored in the back-end server. This covers the two conditions above: in the first,
neither the tag's identity nor its angle value is stored in the back-end server; in the
second, a tag holding the same ID as a genuine one is attached to a counterfeit item.
According to these conditions, counterfeiting tag verification announces their
identities and verifies their angle values.
After Algorithms 1 and 2, the reader has received responses from all tags in its
interrogation range. Obviously, if either the k slots for the identity or the k slots for the
angle value are not in the back-end server's database, the tag must be a counterfeit,
and the reader notes the counterfeiting tags in this condition. There is also the
condition that two tags map to the same k slots in the bloom filter for identity, which
means a counterfeiting tag may be present. Since a bloom filter has a false positive
probability, we cannot conclude that two tags with the same k slots for identity hold
the same identity; this needs to be confirmed during counterfeiting tag verification.
Algorithm 2. Protocol for tag
1: receive frame start command
2: receive frame size f, random seed r, participation probability p
3: choose to participate in one of the m frames, or sleep, based on the probability p
4: if not participating, sleep until another frame starts
5: initialize attribute array A[] = [ID, angle]
6: for i = 0 to 1 do
7:   for j = 1 to k do
8:     S[i][j] = Hj(f, ri, A[i]) mod f
9:   end for
10: end for
11: respond with slot numbers S[i][j] (0 ≤ i ≤ 1, 1 ≤ j ≤ k)
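The tag-side computation of Algorithm 2 can be sketched as follows. This is a hedged illustration: SHA-256 with an index-dependent prefix stands in for the k hash functions Hj, which the paper does not specify, and the function and parameter names are ours.

```python
import hashlib
import random

def slots(attr, seed, f, k):
    """Map one attribute value to its k slot numbers: H_j(f, r, A[i]) mod f."""
    return [int.from_bytes(
                hashlib.sha256(f"{j}|{f}|{seed}|{attr}".encode()).digest()[:8],
                "big") % f
            for j in range(1, k + 1)]

def tag_response(tag_id, angle, seed, f, k, p=1.0, m=1, rng=random):
    """Steps 3-11 of Algorithm 2 for a single tag."""
    if rng.random() > p:       # steps 3-4: sleep with probability 1 - p
        return None
    frame = rng.randrange(m)   # join one of the m standard filters
    attrs = [tag_id, angle]    # step 5: A[] = [ID, angle]
    return frame, [slots(a, seed, f, k) for a in attrs]
```

A counterfeit forging a genuine ID produces identical ID slots but is separated later by the angle attribute, which is the case handled in counterfeiting tag verification.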
Counterfeiting Tag Verification. This phase mainly verifies tags falling into the
second condition of counterfeiting tag detection. The reader first searches the
back-end server's database for the identities matching the k slots. If two identities are
mapped, the reader looks up the tags' k slots for the angle value. If the two tags match
two genuine tags in the back-end server's database, there is no counterfeit; otherwise,
the unmatched tag must be a counterfeit.
If only one identity is matched, the reader broadcasts the identity and waits for the
tags' responses. From the responses, the reader measures the two tags' angle values.
A tag whose measured angle value is not identical to the one in the back-end server's
database must be a counterfeit. By now, all counterfeiting tags have been confirmed.
5 Performance Analysis
Since a tag's information includes its identity ID and its angle value, the attribute
number s of MATI is set to 2. The probabilities pDBF_ID and pDBF_angle represent
false positives in the dynamic bloom filter for identity and for angle value,
respectively. If a false positive event happens in our protocol, all k slots in both the
dynamic bloom filter for identity and the dynamic bloom filter for angle value must be
set to 1. So the false positive probability pMATI is:

pMATI = pDBF_ID · pDBF_angle = (1 − (1 − (1 − e^(−kpN/ft))^k)^m)^2    (4)
As our protocol is based on the framed slotted ALOHA algorithm, a longer frame
can decrease collisions. Figure 2(a) shows that our protocol obtains better
identification efficiency as the frame size increases in a large-scale RFID system. To
minimize the false positive probability, the ratio of f and N should be optimized.
According to previous work [21], the false positive probability ps is minimized when
the ratio of f and N satisfies Eq. (5); the result is clearly depicted in Fig. 2(b).

k = (ft / (pN)) ln 2    (5)
As pDBF_ID and pDBF_angle are the false positive probabilities in the dynamic
bloom filter for identity and for angle value respectively, both are obviously below 1.
So each of pDBF_ID and pDBF_angle is greater than pMATI, which indicates
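This relationship can be checked numerically. The sketch below uses the usual Bloom filter approximation and, as an assumption on our part, treats the dynamic bloom filter's false positive rate as the probability of hitting at least one of its m standard filters; the parameter values in the usage are illustrative, not the paper's.

```python
import math

def p_std(k, n, ft):
    # standard Bloom filter false-positive rate: (1 - e^(-k*n/ft))^k
    return (1.0 - math.exp(-k * n / ft)) ** k

def p_dbf(k, n, ft, m):
    # false positive in at least one of the m standard filters
    return 1.0 - (1.0 - p_std(k, n, ft)) ** m

def p_mati(k, n, ft, m):
    # Eq. (4): both the ID filter and the angle filter must report positive
    return p_dbf(k, n, ft, m) ** 2

def k_opt(ft, n, p=1.0):
    # Eq. (5): hash count minimizing the per-filter false-positive rate
    return ft / (p * n) * math.log(2)
```

Since pDBF_ID and pDBF_angle are probabilities below 1, their product p_mati is necessarily smaller than either factor, which is exactly the claim made above.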
Fig. 2. (a) When N = 15000, k = 7, ft = 300, p = 1, the false positive probability of MATI;
(b) when N = 15000, f = 5000, ft = 495, p = 1, the false positive probability of MATI.
Fig. 3. When f = 5000, ft = 495, k = 7, p = 1, the false positive probabilities of the standard
bloom filter, the dynamic bloom filter, and MATI, respectively.
dynamic bloom filter as the number of tags increases. The false positive probabilities
of SEBA+ [18], WP [3], and our protocol are then examined in Fig. 4. We fix the
number of tags N = 1000, participation probability p = 1, and hash function number
k = 7. As shown in Fig. 4, when the frame size is greater than 2000, our protocol's
false positive probability approaches zero, which means the identification efficiency
is almost 1. In contrast, the false positive probabilities of SEBA+ and WP are still
greater than 0.4, a huge difference from our protocol. In conclusion, compared with
other methods, our protocol achieves better identification efficiency in large-scale
RFID systems.
Fig. 4. When N = 1000, ft = 300, k = 7, p = 1, the false positive probabilities of WP, SEBA+
and MATI, respectively.
From Eq. (7), we can compute that the time cost is only 10 s when the number of tags
reaches 10000, and 40 s when N reaches 50000. Although our protocol's time cost is
not very low, its false positive probability is far smaller than WP's, as Fig. 4 shows.
In particular, as the number of tags increases, our protocol has an apparent advantage
in identification efficiency. It is worthwhile to spend more time to achieve better
identification efficiency.
6 Conclusion
Acknowledgement. This work was supported by the research project Life Cycle Management
and Control System for Equipment Household Registration (No. J770011104) and the National
Natural Science Foundation of China (61701494). We also thank the anonymous reviewers and
our shepherd for their valuable feedback.
References
1. Gong, W., Stojmenovic, I., Nayak, A., et al.: Fast and scalable counterfeits estimation for
large-scale RFID systems. IEEE Trans. Netw. (TON) 24(2), 1052–1064 (2016)
2. The spread of counterfeiting: Knock-offs catch on. The Economist, March 2010
3. Gong, W., Liu, J., Yang, Z.: Fast and reliable unknown tag detection in large-scale RFID
systems. In: Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc
Networking and Computing, pp. 141–150. ACM (2016)
4. Liu, X., Qi, H., Li, K., et al.: Sampling bloom filter-based detection of unknown RFID tags.
IEEE Trans. Commun. 63(4), 1432–1442 (2015)
5. Weis, S.A., Sarma, S.E., Rivest, R.L., Engels, D.W.: Security and privacy aspects of
low-cost radio frequency identification systems. In: Hutter, D., Müller, G., Stephan, W.,
Ullmann, M. (eds.) Security in Pervasive Computing. LNCS, vol. 2802, pp. 201–212.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-39881-3_18
6. Lu, L., Han, J., Xiao, R., Liu, Y.: ACTION: breaking the privacy barrier for RFID systems.
In: INFOCOM 2009, pp. 1953–1961. IEEE (2009)
7. Lu, L., Liu, Y., Li, X.Y.: Refresh: weak privacy model for RFID systems. In: Proceedings of
INFOCOM 2010, pp. 1–9. IEEE (2010)
8. Chen, M., Chen, S.: ETAP: enable lightweight anonymous RFID authentication with O(1)
overhead. In: 2015 IEEE 23rd International Conference on Network Protocols (ICNP),
pp. 267–278. IEEE (2015)
9. Shahzad, M., Liu, A.X.: Fast and accurate estimation of RFID tags. IEEE/ACM Trans.
Networking 23(1), 241–254 (2015)
10. Liu, J., Xiao, B., Chen, S., Zhu, F.: Fast RFID grouping protocols. In: 2015 IEEE
Conference on Computer Communications (INFOCOM), pp. 1948–1956. IEEE (2015)
11. Zheng, Y., Li, M.: ZOE: fast cardinality estimation for large-scale RFID systems. In: 2013
Proceedings IEEE, INFOCOM, pp. 908–916. IEEE (2013)
12. Gong, W., Liu, K., Miao, X., Liu, H.: Arbitrarily accurate approximation scheme for
large-scale RFID cardinality estimation. In: 2014 Proceedings IEEE, INFOCOM, pp. 477–
485. IEEE (2014)
13. Yu, J., Chen, L., Zhang, R., et al.: On missing tag detection in multiple-group
multiple-region RFID systems. IEEE Trans. Mob. Comput. 16(5), 1371–1381 (2017)
14. Liu, X., Li, K., Min, G., Shen, Y., Liu, A.X., Qu, W.: Completely pinpointing the missing
RFID tags in a time-efficient way. IEEE Trans. Comput. 64(1), 87–96 (2015)
15. Liu, X., Li, K., Min, G., Shen, Y., Liu, A.X., Qu, W.: A multiple hashing approach to
complete identification of missing RFID tags. IEEE Trans. Commun. 62(3), 1046–1057
(2014)
16. Shahzad, M., Liu, A.X.: Expecting the unexpected: fast and reliable detection of missing
RFID tags in the wild. In: 2015 IEEE Conference on Computer Communications
(INFOCOM), pp. 1939–1947. IEEE (2015)
17. Yang, L., Han, J., Qi, Y., Liu, Y.: Identification-free batch authentication for RFID tags. In:
2010 18th IEEE International Conference on Network Protocols (ICNP), pp. 154–163. IEEE
(2010)
18. Bianchi, G.: Revisiting an RFID identification-free batch authentication approach. IEEE
Commun. Lett. 15(6), 632–634 (2011)
19. Liu, X., Xiao, B., Zhang, S., Bu, K.: Unknown tag identification in large RFID systems: An
efficient and complete solution. IEEE Trans. Parallel Distrib. Syst. 26(6), 1775–1788 (2015)
20. Guo, D., Wu, J., Chen, H., et al.: Theory and network applications of dynamic bloom filters.
In: Proceedings of 25th IEEE International Conference on Computer Communications,
pp. 1–12. IEEE (2006)
21. Bloom, B.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM
13(7), 422–426 (1970)
22. Philips Semiconductors. Your Supplier Guide to ICODE Smart Label Solutions (2008)
Hijacking Your Routers via Control-Hijacking
URLs in Embedded Devices with Web
Interfaces
Abstract. Embedded devices such as SOHO routers and IP cameras are
entering the lives of ordinary people. However, studies have shown that the
security of these devices receives insufficient consideration, which has led a
growing number of security researchers to focus on exploiting embedded
devices. A majority of embedded devices run a web service to facilitate user
management, which provides a potential attack interface. Unfortunately, most
vulnerabilities in these web services require attackers to provide login
credentials before they can be accessed and exploited, which makes attacks
much less practical. This paper presents an automated vulnerability detection
and exploitation model, DAEWC (Detect and Exploit Without Credentials).
First, DAEWC uses symbolic execution to find URLs that are not protected by
the authentication mechanism. Second, DAEWC fuzzes these URLs,
combined with a lightweight dynamic data flow tracking technique for
analyzing the web server, which can quickly and accurately find
easy-to-exploit vulnerabilities. Finally, DAEWC implements an automatic
exploitation model that generates executable custom shellcode, for example,
executing system("/bin/sh") or reading/writing arbitrary memory. Using these
vulnerabilities, we can attack embedded devices running web services even
without access to the web interface; for example, attackers can control a Wi-Fi
router at an airport, without login credentials, by sending a specially
constructed URL request. We applied DAEWC to the firmware of two
embedded device vendors, found 9 unreported 0-day vulnerabilities in four of
them, and generated highly usable exploit scripts.
1 Introduction
Along with the development of the Internet of Things, more and more embedded
devices connect to the Internet. For the convenience of user management, these devices
usually run a web service, so that users can remotely operate or set these devices. Users
need to have a web authentication before the management operation, and only after the
authentication (such as the username/password check or the cookies check) they can
have access to other pages. If authentication fails, the server will return a 401 HTTP
code (unauthorized response), or a 302 HTTP code that redirect user to the login page.
However, in practice, some device manufacturers do not complete the verification of all
user URL requests, which leads to unauthorized access to some URLs. Once the
function to handle this type of URL request in the web sever is vulnerable, attackers
can control the firmware program flow by directly requesting these URLs and sending
malicious payload even without the login credentials. We call this type of URLs
control-hijacking URLs.
Web services on embedded devices may have vulnerabilities such as SQL injection,
XSS, and CSRF, but these web-application-layer problems often fall short of fully
controlling the device. This paper targets the web server binaries and focuses on
detecting memory corruption vulnerabilities in web services (such as buffer
overflows, command execution, and format string vulnerabilities). Once attackers
control the web server's program flow, they can operate at the operating system level
with the web server's privileges. The web server of an embedded device usually runs
as root, so attackers can control the device with the root privilege of the operating
system. Clearly, in embedded devices, memory corruption or command execution
vulnerabilities in binaries are more threatening than web application vulnerabilities
(i.e., CSRF or XSS). And this kind of vulnerability does not only exist in theory: in
2017 the Axis camera was reported to have a vulnerability that affects millions of IoT
devices and can be exploited to control the equipment by sending a POST packet.1
But finding control-hijacking URLs is not easy. First, we cannot get the firmware
source code; this is a problem for all binary analysis methods. Second, firmware runs
on a device, and deploying a device for each firmware would undoubtedly cost a lot.
Costin et al. [1] simulate the firmware's web interface and perform static and
dynamic analysis on it. But this analysis depends on existing static (RIPS) or dynamic
(Metasploit, or exploits from the Exploit Database) analysis tools, and as
manufacturers' security consciousness improves, such vulnerabilities can easily be
found and fixed by the manufacturers themselves, so their number has greatly
decreased. Also, some of the vulnerabilities they found require web login credentials
(such as cookies) to be triggered, so they do not pose an effective threat to devices in
the real world. In addition, they need a real Linux kernel to simulate the firmware,
which is not efficient enough.
Static vulnerability detection methods, such as Shoshitaishvili et al.'s Firmalice, can
detect logic flaws (such as backdoors) effectively, but for memory corruption and
command injection vulnerabilities static detection is not effective enough. However,
to perform dynamic firmware testing, one needs to spend heavily on buying
equipment or get the firmware simulated. In fact, present automated simulation
techniques work for only a part of the firmware and, as we said before, the test results
are often not ideal.
In light of these challenges, we propose a novel firmware vulnerability detection
and exploitation tool, DAEWC. It automatically extracts the firmware and looks for
the web server binary. In the firmware binary, through a specific strategy it locates the
1 http://blog.senr.io/blog/devils-ivy-flaw-in-widely-used-third-party-code-impacts-millions.
credential authentication code in the program and uses symbolic execution to identify
URLs accessible by unauthorized users, then uses a lightweight dynamic data
tracking system to detect potential memory corruption and arbitrary command
execution vulnerabilities. In addition, this framework can also detect URLs that open
undesired backdoor services (such as Telnet or SSH), which likewise require no
credentials.
We tested the DAEWC model on four different routers and found at least one
control-hijacking URL on each of them. These routers run different web services, and
their architectures include ARM and MIPS. Experimental results show that DAEWC
is effective and accurate in automatically detecting vulnerabilities and generating
exploits (exp).
To sum up, we make the following contributions:
• We propose a novel model that performs automated detection of control-hijacking
URLs in embedded device firmware. It can be used to control embedded devices
without web login credentials, so the model can detect more exploitable
vulnerabilities than existing firmware vulnerability detection techniques. We can
exploit memory corruption and arbitrary code execution vulnerabilities without
knowing the details of the firmware implementation.
• We implemented a tool, DAEWC, for automatic vulnerability detection based on
the model. It takes the original firmware as input and outputs control-hijacking
URLs together with a PoC (Proof of Concept) for each problematic URL. DAEWC
can detect vulnerabilities in firmware for multiple architectures and is not tied to
device hardware platforms.
• We used DAEWC to detect vulnerabilities in four different real-world firmware
samples and successfully found nine 0-day vulnerabilities, which have been
submitted for CVEs.
• From the vulnerability detection reports, we successfully generated usable exploits.
2 Approach Overview
These URLs are then passed into the vulnerability detection module. DAEWC
simulates the firmware's web services and then, using a lightweight data tracker,
identifies possible vulnerabilities. Specifically, we take the parameters of the URLs
output by the previous stage to construct special payloads.
We then hook and examine the context of sensitive system functions. If the input
we construct reaches the parameters of a sensitive function or overwrites key
memory, we can assume there is a vulnerability. In the above code example, when the
program is processing a "goform/ping" request, we will find that the value of the parameter
368 M. Yuan et al.
3 Pretreatment
We download the latest firmware from the vendor's website, selecting firmware that
is easy to decompress, has a file system, and runs a Unix-like operating system. We
then use binwalk, firmware-mod-kit, or other tools to extract the firmware, and
determine its instruction set type based on /bin/busybox or /bin/sh. Finally, DAEWC
finds the web server (boa, httpd, or lighttpd) in the decompressed directory, along
with its configuration file and the document root (the WWW directory).
There are many ways to handle URL requests in firmware, such as binaries
(httpd, boa) or scripting languages (PHP, Lua). We use two methods to find
control-hijacking URLs: for binary files, DAEWC uses symbolic execution; for
scripting languages, we perform targeted regular expression matching.
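For the scripting-language case, the matching idea can be sketched as below. Both regular expressions are illustrative stand-ins of our own invention (the paper does not give DAEWC's actual patterns): one guesses how a handler registration might look in a Lua/PHP source, the other guesses the name of an in-function authentication check.

```python
import re

# Hypothetical patterns, not DAEWC's actual rules.
HANDLER_RE = re.compile(r'(?:entry|page|action)\s*\(\s*["\']([^"\']+)["\']')
AUTH_RE = re.compile(r'check_(?:auth|login|cookie)', re.IGNORECASE)

def unauthenticated_urls(source: str):
    """Report URLs registered by functions that never call an auth check."""
    urls = []
    # crude per-function split; a real tool would parse the script properly
    for body in source.split("function")[1:]:
        if AUTH_RE.search(body):
            continue
        urls.extend(HANDLER_RE.findall(body))
    return urls
```

Any URL reported this way is only a candidate control-hijacking URL and still has to be confirmed against the running web service.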
Security Policies. For binary programs, unlike other vulnerability detection systems
such as Firmalice [3], KLEE [4], AEG [5], and Mayhem [6], DAEWC is more
complex: it first finds URLs that do not require web credential verification and then
exploits them. It effectively uses a logical bug in the firmware's design process to
find these URLs. After connecting to the Wi-Fi, attackers can reset the router web
interface's user name and password without credential validation, because the
firmware performs no credentials check when processing the URL request that
modifies the username. In the absence of development documentation, automatically
inferring the web server's complete request processing logic is very complicated, so
we need to manually specify which operations are related to checking web access and
provide a specific access policy. This requires the analyst to have a good
understanding of the firmware's internal program, and to know which sensitive
functions or memory are involved in authorization and authentication. We developed
the following strategies for DAEWC.
• Fixed strings. The security policy can be specified as several fixed strings. An
example of such a policy is a web server that directly extracts the value of the
"COOKIES" field in the request for inspection. We can use the data dependency
graph to identify where this happens in the program.
• Behavioral patterns. Another policy for DAEWC is to determine the location based
on what the authentication code is likely to do. For example, a server may return a
302 redirect code or a 400 code in the HTTP response headers when the
authentication check returns false; we can determine the location of the credential
verification code from this behavior pattern. In addition, a web server may access
specific memory (a data segment that holds a user's password) during
authentication or validation, which can also be treated as a behavioral pattern.
• Manually specified. If we have already identified the authentication checking
function during manual reverse engineering, we can specify this code as our policy.
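The three policy kinds above could be written down for the analyzer roughly as follows. This is a hedged configuration sketch; every field name and default value is illustrative and not taken from DAEWC's implementation.

```python
from dataclasses import dataclass, field

# Illustrative policy container, not DAEWC's actual format.
@dataclass
class AccessPolicy:
    # fixed strings the authentication code is expected to reference
    fixed_strings: list = field(default_factory=lambda: ["COOKIES"])
    # behavioral patterns: response fragments emitted on failed authentication
    behavior_patterns: list = field(
        default_factory=lambda: ["302 Found", "400 Bad Request"])
    # manually specified addresses of known credential-checking functions
    manual_check_addrs: list = field(default_factory=list)

# example: an analyst pins one manually reverse-engineered check address
policy = AccessPolicy(manual_check_addrs=[0x40123C])
```

Keeping the policy declarative like this lets the same analysis code be reused across firmware images that differ only in where their authentication checks live.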
Symbolic Execution. We perform some static analysis before symbolic execution:
instead of analyzing the entire web server, we can focus only on the credential
verification section. DAEWC generates a control flow graph (CFG) using the
static analysis module of angr, then generates a control dependency graph (CDG) on
this basis, and combines it with a data dependency graph (DDG) to produce the program
dependency graph (PDG). We can then slice backward from the credential verifi-
cation point. During back-slicing, irrelevant functions and instructions can be ignored,
which greatly reduces the complexity of the analysis.
Finally, we use angr's claripy constraint solver to obtain all URLs that do not
need to pass the authorization process, and analyze the parameters passed when
requests are sent to these URLs.
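The back-slicing step can be illustrated with a toy program dependency graph. This is a self-contained sketch of the idea, not angr's actual API; in DAEWC the PDG comes from angr's CFG/CDG/DDG analyses, and the node names below are invented.

```python
from collections import deque

def backward_slice(pdg, target):
    """Collect every node the target (transitively) depends on.

    pdg maps each node to the set of nodes it depends on
    (the union of its control and data dependency edges).
    """
    sliced, work = set(), deque([target])
    while work:
        node = work.popleft()
        if node in sliced:
            continue
        sliced.add(node)
        work.extend(pdg.get(node, ()))
    return sliced

# Toy PDG: "check_auth" depends on the request parser and the cookie
# extractor, but not on the logging routine.
pdg = {
    "check_auth": {"get_cookie", "parse_request"},
    "get_cookie": {"parse_request"},
    "parse_request": set(),
    "log_request": {"parse_request"},
}
print(sorted(backward_slice(pdg, "check_auth")))
# → ['check_auth', 'get_cookie', 'parse_request']
```

Everything outside the slice, such as log_request here, can be ignored during symbolic execution, which is what keeps the analysis tractable.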
370 M. Yuan et al.
5 Vulnerability Detection
In the vulnerability detection phase, we aim to find undesired open services (SSH,
Telnet), memory corruption, and arbitrary command execution vulnerabilities.
To detect undesired open services, we can directly match strings such as *ssh*,
*telnet*, and *ftp* in the URLs generated by the previous module. Once such a string,
suggesting that a special service may be opened, is found, a warning is reported for
further verification. If the service is actually turned on, we can use a tool like Hydra
to brute-force the SSH/Telnet/FTP user name and password. Because embedded devices
generally run these services as root, this amounts to a back door.
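A minimal sketch of this matching step, with a hypothetical URL list in place of the previous module's output:

```python
import re

# Hypothetical URLs, standing in for those produced by the previous module.
urls = [
    "/goform/telnet?enable=1",
    "/goform/setWifi?ssid=home",
    "/cgi-bin/ftp_start",
]

SERVICE_PATTERN = re.compile(r"(ssh|telnet|ftp)", re.IGNORECASE)

def flag_service_urls(urls):
    """Report URLs that hint at an undesired service being opened."""
    return [(u, SERVICE_PATTERN.search(u).group(1).lower())
            for u in urls if SERVICE_PATTERN.search(u)]

print(flag_service_urls(urls))
# → [('/goform/telnet?enable=1', 'telnet'), ('/cgi-bin/ftp_start', 'ftp')]
```

Each flagged URL is only a warning; whether the service actually opens still has to be verified against the running device.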
For memory corruption or arbitrary command execution vulnerabilities, we adopt a
lightweight dynamic data-tracking detection method.
First, we need to solve the problem of simulating the web server. Chen et al.'s [7]
method simulates the whole firmware, while Costin [1] focuses on getting the firmware's
web interface to run in an emulated environment. Both solutions need a
Linux kernel, and their efficiency and success rates are unsatisfactory. Instead, we use
the user mode of the QEMU emulator directly (e.g., qemu-arm) to run the web server
program. First, the web server program needs to be patched at the binary
level: (1) the IP address held in the data section is changed to the fixed value 0.0.0.0,
so that we can test on the local network; (2) we ignore vendor-customized external
library functions that have no influence on web services; (3) the functions in libnvram,
such as nvram_get, are hooked to directly return fixed values; and so on. After
chrooting into the decompressed firmware, modifying the web configuration file, and
running the init program (such as rcS), we finally start the web server.
Next, we develop a solution that can quickly find memory corruption and command
execution vulnerabilities in the web server. By hooking sensitive library functions,
we can monitor the register values when a sensitive function is called, as well as the
memory at the addresses held in the registers. We summarized more than 40
library functions involving memory operations and command execution (strcpy, system,
popen, etc.). We compile the hooking code into a dynamic library file and use
preloading to hook the library function calls; each hook calls the original library
function after its own processing.
After hooking, we conduct a series of tests to determine whether any vul-
nerabilities exist. First, we send a special payload, such as a string of "A" characters, in a URL
request, and then perform the following detection:
A. For command execution and format string vulnerabilities, we check the
parameter registers to determine whether the input data has become part of the
function's parameters. For example, if our data reaches the first parameter register r0
of the printf function, we can conclude that there is a format string vulnerability;
if our data is in the first parameter register r0 or the second parameter register r1
of execve, we can conclude that there is a command execution vulnerability.
B. We check specific registers or memory data for memory corruption vulner-
abilities. For example, in a stack overflow, before overwriting the
return address on the stack the payload will first overwrite the saved frame pointer, that is, the
address pointed to by the r11 register on the ARM architecture. So each time after
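The register checks in steps A and B might be sketched as a toy classifier over a crash record captured by the hooks. The record layout, function names, and register values below are illustrative, not DAEWC's actual format.

```python
# Toy vulnerability classifier over a hooked-call/crash record.
PAYLOAD = b"AAAAAA"

def classify(record):
    func, regs = record["func"], record["regs"]
    if func == "printf" and regs.get("r0") == PAYLOAD:
        return "format string"
    if func == "execve" and PAYLOAD in (regs.get("r0"), regs.get("r1")):
        return "command execution"
    if regs.get("r11") == PAYLOAD[:4]:        # saved frame pointer clobbered
        return "stack overflow"
    return "no finding"

crash = {"func": "execve", "regs": {"r0": b"/bin/sh", "r1": PAYLOAD}}
print(classify(crash))  # → command execution
```

A real implementation would read the registers from the hook's trap context rather than a dictionary, but the decision logic mirrors steps A and B.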
Hijacking Your Routers via Control-Hijacking URLs 371
As early as 2011, Avgerinos et al. [5] put forward AEG, a model that uses symbolic
execution to automate the generation of exploit programs. However, this
model has three problems. First, determining the location of a vulnerability using
plain symbolic execution takes a long time, and the results are not ideal.
Second, its exploit generation model is basically aimed at overflow vulnerabilities,
and there is no vulnerability modeling for format string vulnerabilities or
arbitrary command execution. Third, AEG focuses on exploitation on the
x86 architecture, whereas embedded devices are mostly based on reduced instruction
sets such as MIPS and ARM. Their function calling conventions, parameter passing
conventions, and stack frame structures are very different from x86, so the AEG
model cannot simply be transplanted.
DAEWC proposes a new automatic exploit generation model for embedded
devices. We consider the ARM and MIPS instruction sets to generate arbitrary
read/write, reverse shell, and a series of other shellcodes; the final purpose is to
hijack the web server program's control flow and make it execute our shellcode.
In fact, automatic exploit generation requires separate modeling for different
vulnerability types. Based on the vulnerability type and the crash-triggering payload
detected in the previous module, we generate exploits in the following ways:
1. For arbitrary command execution vulnerabilities, the values of the URL parameters
directly become part of the parameters of a command execution function such as
system. We can replace the parameter value with any command: reboot,
cat /etc/passwd, and so on. To make these commands execute smoothly, we
can add semicolons to avoid interference from other strings in the command execution
function's parameters.
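A minimal sketch of such a payload, assuming a hypothetical vulnerable URL parameter whose value ends up inside a system() call (the endpoint and parameter names are invented):

```python
from urllib.parse import quote

def injection_url(base, param, command):
    # Semicolons isolate our command from whatever other strings surround
    # the parameter value inside the system() argument.
    return f"{base}?{param}={quote(';' + command + ';')}"

print(injection_url("/goform/setNtp", "server", "cat /etc/passwd"))
# → /goform/setNtp?server=%3Bcat%20/etc/passwd%3B
```

The URL-encoded semicolons (%3B) are decoded by the server before the tainted value reaches the command execution function.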
2. For format string vulnerabilities, we first use libformatstr to determine the
location of the format string parameter on the stack and the length of padding
needed to satisfy 32-bit alignment on the stack. Then we use the %s and %n
specifiers to achieve arbitrary address reads and writes. Because "\x00" bytes would
truncate the string, we place the read/write address at the end of the constructed
malicious format string.
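The layout can be sketched as follows. This is a hand-rolled illustration rather than libformatstr's API; the stack offset, alignment padding, target address, and write value are assumed to be already known, and are invented for the example.

```python
import struct

def fmt_payload(addr, offset, padding, value):
    """Build a %n-style write payload with the target address at the END,
    so that NUL bytes in the address do not truncate the format string."""
    pad = b"A" * padding                       # 32-bit alignment padding
    body = f"%{value}c%{offset}$n".encode()    # print `value` chars, then write
    return pad + body + struct.pack("<I", addr)

p = fmt_payload(0x000a1000, offset=7, padding=2, value=0x41)
print(p)
```

Here `offset` must index the stack slot where the appended address lands; because the address sits at the tail of the string, its embedded "\x00" bytes no longer cut the format specifiers short.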
372 M. Yuan et al.
3. For overflow vulnerabilities, the following steps are needed. First, we need
to locate the overflow point, which can be obtained from the report of the
vulnerability detection module. Second, we need to locate the return address. To
achieve this, we first use QEMU to emulate the web server, and then
use ptrace² to monitor the running program. After that, we change the crash-
triggering payload one byte at a time, starting from the last byte, and resend the
payload after every change. When ptrace captures the SIGSEGV signal, we check the
PC and stack registers. If the value in the PC register has changed, the changed byte's
position minus three is the length of padding needed to overwrite the return address
on the stack. With this padding we can control the PC by appending a target address.
Since embedded devices rarely enable security mechanisms such as NX, ASLR, or the
stack canaries of the x86 platform, exploitation is easy.
The third step is to determine the layout of the stack. From the second step we
already have the padding required to reach the return address, as well as the return
address itself, so we now consider the shellcode's layout. Because there is no NX or
ASLR protection, we can simply place the shellcode on the stack as part of the input,
and then return to the stack address where the shellcode is located. Via ptrace in the
previous step we obtained not only the position of the return address but also the
value of the stack frame pointer, so we place the shellcode after the return address
and overwrite the return address with the stack frame pointer value.
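The final payload layout described above can be sketched as below. The padding length, stack address, and shellcode bytes are assumed to come from the earlier ptrace-based probing and are illustrative placeholders.

```python
import struct

def overflow_payload(padding_len, stack_addr, shellcode):
    """Padding up to the saved return address, then the address of the
    shellcode on the stack, then the shellcode itself."""
    return b"A" * padding_len + struct.pack("<I", stack_addr) + shellcode

shellcode = b"\x01\x30\x8f\xe2"  # placeholder bytes, not a working payload
p = overflow_payload(32, 0xbeffe8d0, shellcode)
assert len(p) == 32 + 4 + len(shellcode)
```

The overwritten return address (here the probed frame pointer value) redirects execution into the shellcode placed immediately after it on the stack.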
7 Evaluation
The Tenda router is popular around the world, and, by conservative estimates, the AC
series has sold hundreds of thousands of units. We tested the Tenda AC6 router with
DAEWC and found that several critical vulnerabilities could be triggered without
web credential validation.
Preprocessing. The firmware obtained by preprocessing is based on the 32-bit ARM
architecture. The web service is started by an HTTPD program, the web directory is
/webroot, and the rcS program is found.
Authentication-Bypassing URL Extraction. HTTPD exhibits an obvious behavioral
pattern when checking web request credentials: a 302 response code is returned when
authentication fails. We use this as the security policy and apply the symbolic
execution module of DAEWC to slice and analyze HTTPD. We generate 312 slices,
and obtain 17 URLs that do not require authorization.
Vulnerability Detection. Through the regular expression matching process of the
previous step, we found a URL that can open the telnet service, as well as two
command execution vulnerabilities and four overflow vulnerabilities. During manual
vulnerability verification, we also, to our surprise, found a password reset vulnerability
based on the report of DAEWC.
² http://man7.org/linux/man-pages/man2/ptrace.2.html.
Acknowledgment. This work is supported in part by the National High Technology Research
and Development Program of China (No. 2015AA016004) and the National Key R&D Program
of China (No. 2016QY04W0802).
References
1. Costin, A., Zarras, A., Francillon, A.: Automated dynamic firmware analysis at scale: a case
study on embedded web interfaces. In: Proceedings of the 11th ACM on Asia Conference on
Computer and Communications Security, pp. 437–448. ACM (2016)
2. http://angr.io/
3. Shoshitaishvili, Y., Wang, R., Hauser, C., Kruegel, C., Vigna, G.: Firmalice - automatic
detection of authentication bypass vulnerabilities in binary firmware. In: Proceedings of the
Symposium on Network and Distributed System Security (NDSS) (2015)
4. Cadar, C., Dunbar, D., Engler, D.R.: KLEE: unassisted and automatic generation of
high-coverage tests for complex systems programs. In: Proceedings of OSDI, vol. 8,
pp. 209–224 (2008)
5. Avgerinos, T., Cha, S.K., Hao, B.L.T., Brumley, D.: AEG: automatic exploit generation. In:
Proceedings of the Network and Distributed System Security Symposium, February 2011
6. Cha, S.K., Avgerinos, T., Rebert, A., Brumley, D.: Unleashing mayhem on binary code. In:
Proceedings of the IEEE Symposium on Security and Privacy, pp. 380–394. IEEE (2012)
7. Chen, D.D., Egele, M., Woo, M., Brumley, D.: Towards automated dynamic analysis for
linux-based embedded firmware. In: ISOC Network and Distributed System Security
Symposium (NDSS) (2016)
8. CVE-2017-9138. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9138
9. CVE-2017-9139. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9139
10. CVE-2017-11495. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-11495
A Method to Effectively Detect
Vulnerabilities on Path Planning of VIN
Jingjing Liu1 , Wenjia Niu1(B) , Jiqiang Liu1(B) , Jia Zhao1 , Tong Chen1 ,
Yinqi Yang1 , Yingxiao Xiang1 , and Lei Han2
1 Beijing Key Laboratory of Security and Privacy in Intelligent Transportation,
Beijing Jiaotong University, Beijing 100044, China
{niuwj,jqliu}@bjtu.edu.cn
2 Science and Technology on Information Assurance Laboratory,
Beijing 100072, China
1 Introduction
Path planning for robots mainly solves three problems. First, it must ensure that
the robot can move from the initial point to the target point. Second, it should
allow the robot to bypass obstacles by some method. Third, it tries
to optimize the robot's running trajectory while completing the above tasks.
In general, path planning approaches come in two types: global planning and local
path planning. Global planning methods can usually find the optimal solution;
the available algorithms of this class are the framework space method [6], the free
space method [7], and the grid method [8]. However, global planning needs accurate
information about the environment in advance, and the computation is very large;
the accuracy of the planned path depends on the accuracy of the acquired
environmental information. Local path planning requires only information about
obstacles near the robot, giving the robot good collision avoidance ability. Commonly
used local path planning methods are template matching [9] and artificial potential
fields [10]. The template matching method is very easy to implement, but its fatal
flaw is its reliance on the robot's past experience: if there is no sufficiently matching path template
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 374–384, 2018.
https://doi.org/10.1007/978-3-319-89500-0_33
Method Detect Vulnerabilities of VIN 375
in the case library, it is impossible to find a path that matches the current
state. The artificial potential field method is much more flexible to control; however,
it easily falls into local optima, resulting in motion deadlock.
Nowadays, researchers are turning their attention to Reinforcement Learning
(RL), a typical machine learning approach that acquires knowledge through action
evaluation in an environment, for better solutions.
Traditional RL methods include Policy Iteration (PI) [11], Value Iteration (VI)
[12], the Monte Carlo method (MC) [13], the Temporal-Difference method (TD) [14], and
Q-learning [1]. However, most traditional methods perform relatively poorly when
they encounter scenes different from the training set. In contrast, deep RL,
which incorporates the advantages of neural networks, has better performance
and accuracy. Reinforcement learning has long been considered close to human learning
patterns. It is widely used in automatic robot control, artificial
intelligence (AI) for computer games, and optimization of market strategies. Deep
RL is especially good at controlling an agent that can act autonomously in
an environment and constantly improve its behavior through interaction
with the environment. Well-known deep RL works are Deep Q-Learning (DQN)
[15] and Value Iteration Networks (VIN) [5].
Among them, VIN has better generalization ability [5]. VIN contains a special
kind of value iteration: it not only uses a neural network to learn a
direct mapping from states to decisions, but also embeds a traditional
planning algorithm into the neural network. Thus, the network learns how
to make long-term plans in the current environment, and uses those long-term
plans to assist in making better decisions. That is, VIN
learns to plan and make decisions by observing the current environment. Assuming
VIN has already learned a model to predict the rewards of future states, the flaw is
that as such predictions are carried forward, observation errors accumulate and
cannot be avoided, because VIN performs VI over the whole environment. We can
use this flaw to study how to obstruct VIN's performance. Apparently, adding
obstacles that are unnoticeable to humans but detectable by robots,
so as to form a fake environment sample, will probably trick VIN into making
wrong predictions.
Our main contribution is a method for detecting potential attacks that could
obstruct VIN. We study typical and recent works on path planning
[2–4] and build our own method. We build a 2D navigation task to demonstrate
VIN, study how to add obstacles that effectively affect VIN's performance,
and propose a general method suitable for different kinds of environments. Our
empirical results show that our method performs well at automatically
finding vulnerable points of VIN and thus obstructing the navigation task.
2 Related Work
2.1 Reinforcement Learning
Traditional Reinforcement Learning methods are Policy Iteration, Value Itera-
tion, Monte-Carlo Method, Temporal-Difference Method and Q-learning.
376 J. Liu et al.
Policy Iteration and Value Iteration. The purpose of Policy Iteration [11]
is to iteratively converge the value function so as to obtain a converged optimal
policy; it is essentially a direct use of the Bellman Equation. Policy Iteration is
generally divided into two steps: (1) Policy Evaluation and (2) Policy Improvement.
Value Iteration [12] is derived from the Bellman Optimality Equation. Thus, Policy
Iteration uses the Bellman Equation to update the value until it converges to the
value of the current policy (hence "evaluating the policy"), and then derives a new
policy, whereas Value Iteration uses the Bellman Optimality Equation to update the
value until it converges to the optimal value, from which the optimal policy is
obtained. Since this method is based on updating the value, it is called Value Iteration.
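A minimal sketch of the Bellman optimality update on a toy three-state chain; the rewards and deterministic transitions are invented for the example.

```python
def value_iteration(n_states, rewards, gamma=0.9, iters=100):
    """Repeatedly apply v(s) <- max over successors s' of r(s') + gamma*v(s')."""
    v = [0.0] * n_states
    for _ in range(iters):
        new_v = []
        for s in range(n_states):
            # Actions: move left or right along the chain (clamped at the ends).
            succs = [max(s - 1, 0), min(s + 1, n_states - 1)]
            new_v.append(max(rewards[s2] + gamma * v[s2] for s2 in succs))
        v = new_v
    return v

v = value_iteration(3, rewards=[0.0, 0.0, 1.0])
print([round(x, 2) for x in v])
# → [9.0, 10.0, 10.0]
```

With the reward at the right end of the chain, the values increase toward it, and a greedy policy over these values walks right, which is exactly the planning behavior the VI module of VIN later embeds in a network.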
Monte-Carlo Method. The idea of Monte Carlo is simple: repeat trials
and take the average. The Monte Carlo method applies only to episodic
problems [13]. For example, playing a game proceeds step by step and eventually ends;
the Monte Carlo method only cares about problems that terminate. The
method is extremely simple, but its disadvantages are also obvious: it
takes a lot of time to run as many trials as possible, and computation happens only at
the end of each trial. AlphaGo [16] uses the idea of the Monte Carlo method in
Reinforcement Learning: it uses only the final win/loss result to optimize each step.
Temporal Difference can be regarded as a combination of Dynamic Programming
and the Monte Carlo method. Temporal Difference [14] can also learn directly from
experience without any dynamic model of the environment and, unlike Monte Carlo,
it learns step by step without waiting for the end of the entire episode, so there is no
problem of computing the return only at the end.
Q-learning is an important milestone in Reinforcement Learning. Watkins [1]
named the Reinforcement Learning method whose evaluation function is based
on the Q value of a state-action pair Q-learning. It is actually a variant of the
Markov Decision Process (MDP). In Reinforcement Learning, the reinforcement
function R and the state transition function P are unknown, so Q-learning uses
an iterative algorithm to approximate the optimal solution. Q-learning does not
move along the path of the highest Q value at each iteration; the reason
Q-learning works is the updating of the Q matrix.
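The iterative Q update can be sketched on a toy two-state problem; the dynamics (action 0 leads to state 0, action 1 to the rewarding state 1) are invented for the example.

```python
import random

# Tabular Q-learning sketch for the update rule
#   Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a])
def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]      # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            a = rng.randrange(2)       # explore uniformly
            s2 = a                     # action 0 -> state 0, action 1 -> state 1
            r = 1.0 if s2 == 1 else 0.0
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
assert q[0][1] > q[0][0]   # moving toward the rewarding state scores higher
```

Note that the behavior policy explores uniformly while the update bootstraps from the greedy max over the next state, which is what makes Q-learning off-policy.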
Deep Q-Learning. The output of the value network is the Q value; if the target
Q value can be constructed, the loss function can be obtained from the mean-square
error (MSE). For the value network, the input information is the state s, the action a,
and the feedback R. Therefore, how to calculate the target Q value is the key to DQN,
and this is exactly the problem Reinforcement Learning solves.
Value Iteration Networks. The biggest innovation of this work is that a
planning module is added to the general policy. The authors believe the motivation
for adding this module is natural: when solving a spatial problem, one does not
simply solve the problem, but plans in the space. The work aims at
solving the problem of poor generalization ability in Reinforcement Learning,
and to this end a learn-to-plan module is proposed [5]. The
innovations of VIN are mainly the following: (1) the reward function
and the transition function are also parameterized and differentiable; (2) a spatially
assisted strategy is introduced to make the policy more general; (3) an attention
mechanism is introduced into the solution of the strategy; (4) the VI
module is designed to be equivalent to a CNN, so the BP algorithm can be used to
update the network.
We find that each method has its own strengths, but VIN is much better
at planning in samples different from the training set. Thus, we choose VIN as the
basis of our experiments; we present our analysis in Sect. 3.
3 Method
In this section, we discuss how we build the method for obstructing VIN's
performance. First, we analyze VIN's path planning approach and carry out tentative
experiments to identify the factors that might affect its path planning.
We then propose a method for obstructing VIN according to these factors.
In Sect. 4, we verify this method with a large number of experiments.
VIN defines an MDP space M, which consists of a series of tuples (states,
actions, transitions and rewards), and M determines our final strategy. However, a
strategy obtained only from the data in this MDP space does not generalize well,
because the policy is limited to this data space. Therefore, the authors assume that
an unknown data space M′ can be obtained, and that the optimal plan in this space
contains important information about the optimization strategies in the M space. In
effect, the assumption is that M is just a sampled trajectory from part of the MDP
space, and adding M′ complements the trajectories in this space.
VIN believes that the reward R and the transition probability P in the data
space M′ also depend on the observations in the M data space. After making this
hypothesis, it introduces two functions, fR and fP, to parameterize
R and P respectively. fR is the reward mapping: given an input state, it calculates the
corresponding reward value. For example, the reward value is higher for states
near the target, and lower for states near an obstacle.
Rule 1: The farther a point is from the VIN planning path, the less it disturbs
the path.
The formula is:

v1yk = ω1 min{d1 | d1 = (xr − ykr)² + (xc − ykc)², (xr, xc) = x ∈ X, (ykr, ykc) = yk ∈ Y}   (1)
where (xnr , xnc ) is the coordinate of xn , the destination of the path, (ykr , ykc )
is the coordinate of yk , ω3 is the weight of v3 . The formula considers the Cheby-
shev distance from yk to the destination, and use the weight ω3 to control the
attenuation of v3 .
For the points in X and B, the value is 0, that is, vx = 0 and vb = 0.
So, the problem now is how to use ω to control the attenuation of each
formula. We define ω3 as a constant, because v3 is the least important among the v
terms; that is, Rule 3 is not as important as the other two. We define
ωi (i = 1, 2) as follows:
ωi = exp(−di² / (2θi²)), i = 1, 2   (4)
where ωi (i = 1, 2) decays exponentially; when it multiplies di (i = 1, 2),
vi (i = 1, 2) grows within a certain range and then decays exponentially.
θi (i = 1, 2) is the parameter that controls the peak point and the rate of the decay.
vyk = Σ_{i=1}^{3} viyk   (5)
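The scoring above can be sketched numerically. Only the Rule 1 term (Eq. (1)) and its weight (Eq. (4)) are shown, since the other rules' terms are summed the same way in Eq. (5); the path and candidate coordinates are invented for the example.

```python
import math

def rule1_score(yk, path, theta1=2.0):
    """Rule 1 term: distance of candidate point yk to the planned path,
    weighted by the exponentially decaying weight of Eq. (4)."""
    d1 = min((xr - yk[0]) ** 2 + (xc - yk[1]) ** 2 for xr, xc in path)
    w1 = math.exp(-d1 ** 2 / (2 * theta1 ** 2))   # Eq. (4)
    return w1 * d1                                 # Eq. (1)

path = [(0, 0), (1, 1), (2, 2)]
near, far = (1, 2), (10, 10)
print(rule1_score(near, path), rule1_score(far, path))
# a far-away point's weight decays to ~0, so it contributes almost nothing
```

This matches the intent of Rule 1: the score first grows with distance from the path, then the exponential weight suppresses points too far away to disturb it.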
4 Experiment
Our goal in this section is mainly to answer the following question: can our
method effectively and automatically add obstacles to obstruct VIN's
performance?
The basis of our experiment is VIN. We apply the source code [18]
provided by VIN's authors to train the VIN model, and adopt the 28 × 28 Grid-World
domain as the input and output domain.
First, we generate 20,000 maps with random starting and ending points and
shortest paths. We use 10,000 maps to train the VIN model. Then, we preprocess
the other 10,000 domains as candidates for the testing set, discarding those in which
VIN cannot reach the ending point, and pick 5,000 of the remaining domains as the
testing set. Second, we carry out the experiment on the testing set with the method
proposed in this paper, and obtain the label (success or fail) for each domain.
Finally, we use random addition of noise as a comparison test. The results show
that our method is indeed superior at finding vulnerabilities in VIN path planning
and adding obstacles automatically.
As the pictures in Fig. 2 show, our method does have the ability to find vulnerabilities
in VIN and thus interfere with its performance. Figure 2(a) is a sample from the testing
set, and Fig. 2(b) to (f) show the top five vulnerable points picked by our method. Four
of them can effectively interfere with VIN's path.
Fig. 2. (a) Sample of testing set. (b) Available obstacle 1. (c) Available obstacle 2.
References
1. Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
2. Giusti, A., Guzzi, J., Dan, C.C., et al.: A machine learning approach to visual
perception of forest trails for mobile robots. IEEE Robot. Autom. Lett. 1(2), 661–
667 (2016)
3. Levine, S., Finn, C., Darrell, T., et al.: End-to-end training of deep visuomotor
policies. J. Mach. Learn. Res. 17(39), 1–40 (2016)
4. Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Human-level control through deep
reinforcement learning. Nature 518(7540), 529–533 (2015)
5. Tamar, A., Wu, Y., Thomas, G., et al.: Value iteration networks. In: Advances in
Neural Information Processing Systems, pp. 2154–2162 (2016)
6. Hyndman, R.J., Koehler, A.B., Snyder, R.D., et al.: A state space framework
for automatic forecasting using exponential smoothing methods. Int. J. Forecast.
18(3), 439–454 (2002)
7. Brooks, R.A.: Solving the find-path problem by good representation of free space.
IEEE Trans. Syst. Man Cybern. 2, 190–197 (1983)
8. Zhu, Q.B., Zhang, Y.: An ant colony algorithm based on grid method for mobile
robot path planning. Robot 27(2), 132–136 (2005)
9. Terwilliger, T.C.: Automated main-chain model building by template matching and
iterative fragment extension. Acta Crystallogr. Sect. D: Biol. Crystallogr. 59(1),
38–44 (2003)
10. Warren, C.W.: Global path planning using artificial potential fields. In: Proceedings
of 1989 IEEE International Conference on Robotics and Automation, pp. 316–321.
IEEE (1989)
11. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. J. Mach. Learn. Res.
4(Dec), 1107–1149 (2003)
12. Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: an anytime algo-
rithm for POMDPs. In: IJCAI, vol. 3, pp. 1025–1032 (2003)
13. Dellaert, F., Fox, D., Burgard, W., et al.: Monte Carlo localization for mobile
robots. In: Proceedings of 1999 IEEE International Conference on Robotics and
Automation, vol. 2, pp. 1322–1328. IEEE (1999)
14. Tesauro, G.: Temporal difference learning and TD-Gammon. Commun. ACM
38(3), 58–68 (1995)
15. Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Playing Atari with deep reinforcement
learning. arXiv preprint arXiv:1312.5602 (2013)
16. Wang, F.Y., Zhang, J.J., Zheng, X., et al.: Where does AlphaGo go: from church-
turing thesis to AlphaGo thesis and beyond. IEEE/CAA J. Automatica Sin. 3(2),
113–120 (2016)
17. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial nets.
In: Advances in neural information processing systems, pp. 2672–2680 (2014)
18. UC Berkeley. https://github.com/avivt/VIN
Healthcare and Industrial Control
System Security
Towards Decentralized Accountability
and Self-sovereignty in Healthcare Systems
1 Introduction
The rise of wearable technology contributes to the digitalization of the world.
Wearable technology refers to networked devices embedded with sensors which
can be worn comfortably on, or even inside, the body to collect health
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 387–398, 2018.
https://doi.org/10.1007/978-3-319-89500-0_34
388 X. Liang et al.
data and track activities [19], thus serving as a convenient tool to monitor
personal health. From the doctors' side, the collected data can provide valuable clues
for determining appropriate medical treatment. Besides, Insurance 3.0 [3]
is rising as a result of big data analysis: with the availability of rich health
data in the cloud, health insurance companies can make more strategic policies
according to individual characteristics.
However, challenges arise as more health data can be collected from
both wearable devices and EHR systems. First, patients are becoming more concerned
about the privacy of their health data. Many existing state-of-the-art approaches
focus on strengthening data providers' responsibility to detect data disclosure
activities; however, it is urgent to protect data access and provide immediate
notification of data disclosure risks. Second, over 300 different EHR systems
are in use today, but most of them adopt a centralized architecture which suffers
from a single point of failure. Meanwhile, there is little or no communication
and cooperation among systems [1]. The isolation between data centers results in
the lack of a holistic and thorough view of personal health. It is reported that 62%
of insured adults rely on their doctors to manage their health records [1], which
limits their ability to interact with healthcare providers other than their primary
doctor. Moreover, even though many health providers are supposed to follow
rules or laws such as HIPAA (the Health Insurance Portability and Accountability
Act of 1996), there are still many entities which are not covered by any
law. Therefore, it is crucial that any entity that has access to the data be
accountable for its operations on the data, and that any operation on the data
be audited.
Given the above-mentioned issues of data ownership, data isolation, and lack
of accountability, as well as the high privacy risks in current EHR systems, where
patients have little control over their personal health data [10], the notion of
Self-Sovereignty [7,13] has gained great popularity for dealing with healthcare data
issues. To bring this concept into reality, we adopt two novel technolo-
gies, Intel SGX and blockchain, to implement a patient-centric personal health
data management system with accountability and decentralization. Intel SGX
offers an anonymous key system (AKS) [18] that can generate an anonymous
certificate, which is then transmitted to a certification platform for valida-
tion. Blockchain technology, where data are stored in a public, distributed, and
immutable ledger maintained by a decentralized network of computing nodes,
provides a decentralized and permanent record-keeping capability, which is crit-
ical for data provenance [12] and access control [9] in cloud data protection.
In this paper, we propose a complete patient-centric personal health data
management system that allows patients to collect and manage their health data
in a compliant way. In developing the system, we take user
ownership of data into consideration; our contributions are as follows.
2 Architecture Design
2.1 System Overview
A three-layer architecture for accountability and privacy preservation is designed
for the PHDM system. The data sharing layer provides users with full control
over their personal health data and handles data requests from third parties. The
SGX-enabled hardware layer provisions a trusted execution environment in the
cloud, generates data access tokens, and is responsible for reliable data storage
and processing. The blockchain network layer, which is distributed and untrusted,
records data operations and various data access requests for immutability and
integrity protection. Figure 1 shows a general scenario for the patient-centric per-
sonal health data management (PHDM) system. Personal wearable devices col-
lect raw health data, such as walking distance, sleeping conditions and heart-
beat, which may be synchronized by the user with their online account associated
with the cloud server and cloud database. Every piece of health data can be
hashed and uploaded to the blockchain network for record keeping and integrity
protection. The original data is maintained in the cloud database hosted on a
trusted platform enabled by SGX. The user owns the personal health data, main-
tains access tokens, and is responsible for granting, denying and revoking data
access for any other party requesting it. For example, a user seeking
medical treatment would grant the doctor a one time data access token. Same
scenario applies to user-insurance company interactions. In addition, users can
manually record everyday activities related to a particular medical treatment,
such as medicine usage, and share this information regularly with the doctor.
Healthcare providers such as doctors can perform medical tests, give suggestions
or provide medical treatment, and request access to a patient's previous medical
records. Each data request and the corresponding data access is recorded on the
blockchain for distributed validation. Users may also request health insurance
quotes from insurance companies in order to choose a health insurance plan;
insurance companies can in turn request access to user health data from wearable
devices and to the medical treatment history. The blockchain network serves three
purposes. First, each hashed data entry collected from wearable devices or from
healthcare providers is uploaded to the blockchain network for integrity
protection. Second, every personal health data access request from a healthcare
provider or health insurance company requires permission from the data owner,
granted through a decentralized permission management scheme. Third, each access
request and access activity is recorded on the blockchain for later auditing
or investigation.
In the patient-centric data management system, users are required to register
an online account to participate in the system, and to generate data encryption
key pairs to encrypt their cloud data for confidentiality. For key management,
we assume the system developers adopt a secure wallet service. Each key
established is described as follows.
Decentralized Accountability and Self-sovereignty in Healthcare 391
In the system, there are four phases for personal health data management includ-
ing user registration, health data generation and synchronization (data generated
from user, healthcare provider and insurance company), health data access man-
agement, health data access record uploading and health data access auditing.
User Registration. Users need to create an online account to store health data
collected from wearable devices and other sources in the cloud database, by way
of establishing an online ID. Other entities in the system cannot correlate the
online ID with the user's real identity, preserving user privacy in the
registration phase.
Health Data Generation and Synchronization. Health data falls into four
categories: data collected from wearable devices, data collected from medical
tests, data recorded by patients to document their treatment details, and
cloud server is responsible for issuing and verifying tokens, and also maintains
both the data record database and the data access log database. Users can request
access tokens and share them with data requestors. Potential data requestors
include healthcare providers, insurance companies and even system auditors. Each
data and token operation is recorded in the blockchain and thus validated. After
user registration, the cloud server can issue tokens based on the personal
information provided by users. To access data, the required token must be
presented to the cloud server and verified. The server's issuance operation and
the user's token presentation and verification emit system logs that are stored
in the log database, as are data requests and accesses from third parties.
The cloud server issues tokens to users with the signature (σz1, σc1, σr1). For
privacy reasons, the application attributes are hashed during the generation of
the U-Prove based token. In some circumstances, the issuer can generate multiple
tokens at one time for better performance.
after the verification. The user can make different decisions, such as granting,
denying or revoking access. The presentation proof serves two purposes: first,
it proves the integrity and authenticity of the attribute values; second, it
establishes confirmation of ownership of the private key associated with the
token itself, which further prevents token replay attacks.
As shown in Fig. 2, each data and token operation is recorded in the blockchain
and thus validated in a decentralized and permanent manner, ensuring data
integrity. Moreover, every operation is launched on a trusted platform enabled
by Intel SGX, making the operation record trustworthy and non-frameable. An
event record can be described as a tuple {datahash, owner, receiver, time,
location, expirydate, signature}, where the signature carries platform
dependency for accountability. The tuple is submitted to the blockchain network,
followed by several steps that transform a list of records into a transaction.
A list of transactions forms a block, and the block is validated by nodes in
the blockchain network through consensus algorithms. After this series of
processes, the integrity of the record is preserved, and future validation of
the block and the transaction related to this record remains possible. Each
time there is an operation on personal health data, a record is generated and
anchored to the blockchain. The SGX platform identification key K_PID is used
to generate the signature, making each record platform-dependent and ensuring
that every action on personal health data is accountable. Token generation and
issuance are recorded in the same way, so as to track data requests and
authorizations.
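The event-record flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the field names follow the tuple in the text, but K_PID is a placeholder byte string, an HMAC stands in for the SGX platform-dependent signature, and consensus and block formation are not modeled.

```python
# Sketch of building and anchoring an event record as described above.
# HMAC with a platform key K_PID stands in for the SGX signature (assumption).
import hashlib, hmac, json, time

K_PID = b"platform-identification-key"   # hypothetical SGX-derived key

def make_record(data: bytes, owner: str, receiver: str,
                location: str, expirydate: str) -> dict:
    """Build the {datahash, owner, receiver, time, location, expirydate,
    signature} tuple and sign it with the platform key."""
    rec = {
        "datahash": hashlib.sha256(data).hexdigest(),
        "owner": owner,
        "receiver": receiver,
        "time": int(time.time()),
        "location": location,
        "expirydate": expirydate,
    }
    body = json.dumps(rec, sort_keys=True).encode()
    rec["signature"] = hmac.new(K_PID, body, hashlib.sha256).hexdigest()
    return rec

def records_to_transaction(records: list) -> str:
    """Collapse a list of records into a single hash to submit as a
    blockchain transaction payload (validation by consensus not modeled)."""
    payload = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

rec = make_record(b"heartbeat=72", "patient-1", "doctor-9",
                  "clinic-A", "2018-01-01")
tx = records_to_transaction([rec])
```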
For scalability, we adopt a Merkle tree based architecture [14] to handle large
numbers of data records. Each leaf node represents a record, and each
intermediate node is computed as the hash of its two child nodes. The Merkle
root, together with the tree path from the current node to the root, serves as
the proof of integrity and validation, i.e., the Merkle proof. The basic Merkle
proof is shown in Fig. 3. First, we identify the record location, the target
hash B. The target hash and the path to the Merkle root (the nodes in green)
constitute the Merkle proof of the hashed data record, which is stored in a
JSON-LD document containing the information needed to cryptographically verify
that the record is anchored to a blockchain. By computing the hashes at the
different tree levels, it is easy and fast to obtain the root hash, which is
anchored in a blockchain transaction, witnessed and maintained by distributed
nodes. This proves the data existed as it was at the time of anchoring. The
Merkle root of each Merkle tree corresponds to one transaction in the blockchain
network, meaning a single blockchain transaction represents the whole list of
data records the Merkle tree hosts, enabling scalable and effective data
integrity protection and validation. The tree-based architecture protects the
integrity of each operation record
itself, which can be validated by traversing the tree nodes. Meanwhile, it
implicitly protects the integrity of all the nodes, since any single-node
modification would change the root, thus protecting the integrity of the whole
tree structure at trivial cost.
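The Merkle proof construction above can be sketched in a few lines of Python. This is a generic SHA-256 sketch under our own conventions (left-to-right concatenation, duplicating the last node on odd-sized levels), not the exact Chainpoint/JSON-LD receipt format:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute the Merkle root of a list of leaf hashes."""
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Return the sibling path from leaf `index` up to the root."""
    proof, level, i = [], leaves[:], index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = i + 1 if i % 2 == 0 else i - 1
        proof.append((level[sibling], i % 2 == 0))   # (hash, sibling-is-right)
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify_proof(leaf, proof, root):
    """Recompute the root from a leaf hash and its Merkle proof."""
    node = leaf
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

records = [h(f"record-{i}".encode()) for i in range(8)]
root = merkle_root(records)
proof = merkle_proof(records, 1)           # proof for target hash B
assert verify_proof(records[1], proof, root)
```

Any tampering with a single record changes every hash on its path, so the recomputed root no longer matches the anchored one.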
5 System Evaluation
To evaluate the performance of the system and the overhead introduced by the
security measures, we adopt two metrics: the efficiency of handling different
numbers of accountable records, and the efficiency of generating large numbers
of tokens. For record anchoring, the tree-based algorithm has a computational
complexity of O(log n), and the average time cost per record is 0.4 ms when
1,000 entries are processed concurrently.
For U-Prove based token generation, we predefine five attributes in each token,
two of which are required to obtain a data access token. During token issuance,
there are two cryptographic methods for the digital signature: subgroup and ECC.
The evaluation results for token issuance and presentation with these two
methods are shown in Fig. 4(a) (average time for token issuance) and Fig. 4(b)
(average time for token presentation). We conclude that ECC-based token
generation is more efficient than the subgroup-based method: ECC uses a shorter
key length for an elliptic curve than a subgroup of equivalent security level,
and computes faster over a small field. Adopting the ECC-based U-Prove protocols
for both token issuance and presentation, the average overhead brought to the
system is 8.1% and 9.4%, respectively.
Some work [17] has been done to integrate blockchain technology into the
healthcare industry. MedRec [8] builds healthcare on top of smart contracts,
but privacy risks remain to be addressed. Yue et al. [20] point out that secure
multi-party computation (MPC) is a promising way to enable an untrusted third
party to compute over patient data without violating privacy, but its practical
efficiency is unclear. Zhang et al. [21] address the adoption of blockchain in
the social network domain but do not fully explore its benefits. Liang et al.
[11] address blockchain adoption in the Internet of Things environment.
In this paper, we build a web-based system for personal health data management
using blockchain and Intel SGX. By utilizing blockchain technology in a
self-sovereign healthcare system, we distribute the responsibility of
maintaining trusted records for data operations as well as token generation.
Meanwhile, benefiting from the blockchain consensus scheme and the decentralized
architecture, along with the trusted execution environment and the platform
dependency provisioned by Intel SGX, the records are anchored with trusted
timestamps and redundancy, preserving both the availability and the
accountability of the healthcare data and operations. We also propose a U-Prove
based protocol for permission management. We implement a prototype of the PHDM
system, and the evaluation shows that its performance is acceptable. In the
future, we will enhance the PHDM system with a blockchain-based access control
scheme to provide better data protection and user privacy.
References
1. Connected patient report. https://www.salesforce.com/assets/pdf/industries/
2016-state-of-the-connected-patient-pr.pdf
2. Chainpoint: a scalable protocol for anchoring data in the blockchain and generating
blockchain receipts. http://www.chainpoint.org/
3. Insurance 3.0 - The Turn of the Digital. http://www.huxley.com/fr/actualites-
et-articles-de-fond/actualites/insurance-3-0-le-virage-du-digital. Accessed 7 Mar
2017
4. Tierion API. https://tierion.com/app/api
5. Anati, I., Gueron, S., Johnson, S., Scarlata, V.: Innovative technology for cpu
based attestation and sealing. In: Proceedings of the 2nd International Workshop
on Hardware and Architectural Support for Security and Privacy, vol. 13 (2013)
6. Chen, L., Li, J.: Flexible and scalable digital signatures in TPM 2.0. In: Proceedings
of the 2013 ACM SIGSAC Conference on Computer & Communications Security,
CCS 2013, pp. 37–48. ACM, New York (2013). https://doi.org/10.1145/2508859.
2516729
7. Clippinger, J.H.: Why Self-Sovereignty Matters. https://idcubed.org/chapter-2-
self-sovereignty-matters/. Accessed 7 Mar 2017
8. Ekblaw, A., Azaria, A., Halamka, J.D., Lippman, A.: A case study for blockchain
in healthcare: MedRec prototype for electronic health records and medical research
data. In: Proceedings of IEEE Open & Big Data Conference (2016)
9. Hardjono, T., Pentland, A.S.: Verifiable anonymous identities and access control
in permissioned blockchains
10. Kish, L.J., Topol, E.J.: Unpatients-why patients should own their medical data.
Nat. Biotechnol. 33(9), 921–924 (2015)
11. Liang, X., Zhao, J., Shetty, S., Li, D.: Towards data assurance and resilience in
IoT using distributed ledger. In: IEEE MILCOM. IEEE (2017)
12. Liang, X., Shetty, S., Tosh, D., Kamhoua, C., Kwiat, K., Njilla, L.: ProvChain:
a blockchain-based data provenance architecture in cloud environment with
enhanced privacy and availability. In: International Symposium on Cluster, Cloud
and Grid Computing. IEEE/ACM (2017)
13. Liang, X., Zhao, J., Shetty, S., Liu, J., Li, D.: Integrating blockchain for data
sharing and collaboration in mobile healthcare applications, October 2017
14. Merkle, R.C.: Protocols for public key cryptosystems. In: 1980 IEEE Symposium
on Security and Privacy, p. 122. IEEE, April 1980
15. Paquin, C.: U-prove technology overview v1.1 (revision 2), April 2013. https://
www.microsoft.com/en-us/research/publication/u-prove-technology-overview-v1-
1-revision-2/
16. Paquin, C., Zaverucha, G.: U-Prove cryptographic specification v1.1. Technical
report, Microsoft Corporation (2011)
17. Peterson, K., Deeduvanu, R., Kanjamala, P., Boles, K.: A blockchain-based app-
roach to health information exchange networks (2016)
18. Sarangdhar, N., Nemiroff, D., Smith, N., Brickell, E., Li, J.: Trusted platform mod-
ule certification and attestation utilizing an anonymous key system, 19 May 2016.
https://www.google.com/patents/US20160142212. US Patent App. 14/542,491
19. Thierer, A.D.: The internet of things and wearable technology: addressing privacy
and security concerns without derailing innovation. Richmond J. Law Technol. 21,
1 (2014)
20. Yue, X., Wang, H., Jin, D., Li, M., Jiang, W.: Healthcare data gateways: found
healthcare intelligence on blockchain with novel privacy risk control. J. Med. Syst.
40(10), 218 (2016). https://doi.org/10.1007/s10916-016-0574-6
21. Zhang, J., Xue, N., Huang, X.: A secure system for pervasive social network-based
healthcare. IEEE Access 4, 9239–9250 (2016)
P3ASC: Privacy-Preserving Pseudonym
and Attribute-Based Signcryption
Scheme for Cloud-Based Mobile
Healthcare System
1 Introduction
The promotion of wireless body area networks (WBANs) has accelerated the
explosive growth of medical and biological data, posing new challenges in data
storage and processing for healthcare providers [1,2]. A possible way to
overcome these challenges is to exploit the benefits of cloud computing [3].
Cloud computing can provide an information technology infrastructure that
allows hospitals,
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 399–411, 2018.
https://doi.org/10.1007/978-3-319-89500-0_35
400 C. Wang et al.
updates made to the descriptive attributes are much simpler than updates made
to the access structure.
Given the importance of PHI and the need to comply with the Health Insurance
Portability and Accountability Act (HIPAA), it is critical to guarantee source
authentication and integrity of PHI in a cloud-based mobile healthcare system.
Otherwise, anyone could modify or forge someone's PHI, which is undesirable.
In addition, concerns about user privacy may impede wide adoption. A digital
signature is a very useful tool for providing authenticity, integrity and
non-repudiation, but it rarely provides user privacy on its own. Although
privacy-preserving signature schemes, such as ring signatures, group signatures,
mesh signatures and attribute-based signatures [17], have been widely studied
in recent years, they are complicated and time-consuming.
In this paper, we introduce a new cryptographic primitive named
privacy-preserving pseudonym and attribute-based signcryption (P3ASC), which
fulfills the functionality of pseudonym-based signature and key-policy
attribute-based encryption in a single logical step. To achieve user anonymity,
a privacy-preserving technique based on pseudonyms is adopted. We then propose
a P3ASC scheme and prove that it is indistinguishable against adaptive
chosen-plaintext attacks in the selective-set model under the DBDH assumption,
and existentially unforgeable against adaptive chosen message and pseudonym
attacks in the random oracle model under the ECDL assumption. Finally, we
provide an architectural model of a cloud-based mobile healthcare system that
exploits our proposed P3ASC scheme. It not only ensures data confidentiality,
integrity, authenticity and non-repudiation, but also provides fine-grained
access control and user anonymity.
The rest of the paper is organized as follows. We introduce some neces-
sary preliminary work in Sect. 2. We give syntax and security definitions for
P3ASC scheme in Sect. 3. We present a P3ASC scheme in Sect. 3. We describe
an architecture of cloud-based mobile healthcare system by exploiting the pro-
posed P3ASC scheme in Sect. 4. Finally, we conclude our paper and discuss our
future work in Sect. 5.
2 Preliminaries
We denote by κ the system security parameter. When S is a set, x ←$ S denotes
that x is picked uniformly from S. Let M, ID and Ω be the message universe, the
identity (pseudonym) universe and the attribute universe, respectively.
Ω \ ∅. The sets in A are called the authorized sets, and the sets not in A are
called the unauthorized sets [9].
We restrict our attention to monotone access structures. If a set of attributes
ω satisfies an access policy (access structure) A, we denote it as A(ω) = 1.
Otherwise, we denote it as A(ω) = 0.
Let Mℓ×k be an ℓ × k matrix, and ρ : {1, …, ℓ} → Ω be a function that maps each
row of Mℓ×k to an attribute for labeling. A secret sharing scheme for an access
structure A over a set of attributes Ω is a linear secret-sharing scheme over
Fq, represented by (Mℓ×k, ρ), if it consists of two polynomial-time algorithms:
Let P be a point of prime order q on an elliptic curve Ep(a, b), and let G be
the subgroup generated by the base point P, i.e., G := ⟨P⟩.
We say that the ECDL assumption holds for the group G if, for any probabilistic
polynomial-time (PPT) adversary A, the advantage Adv_A^ECDLP(1^κ) is a
negligible function of the security parameter κ.
A bilinear group parameter generator G is an algorithm that takes as input a
security parameter κ and outputs a bilinear group setting (q, G1, GT, ê), where
G1 and GT are a cyclic additive group and a multiplicative group of prime order
q, respectively, and ê : G1 × G1 → GT is a bilinear pairing with the following
properties:
– Bilinearity: For P1, P2 ←$ G1 and a, b ←$ Z∗q, we have ê([a]P1, [b]P2) =
ê(P1, P2)^ab.
– Non-degeneracy: There exist P1, Q1 ∈ G1 such that ê(P1, Q1) ≠ 1_GT.
– Computability: There is an efficient algorithm to compute ê(P1, Q1) for
P1, Q1 ←$ G1.
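A toy model makes the three properties concrete: represent G1 additively as Z_q (a "point" [a]P is just the scalar a) and GT as the order-q subgroup of Z*_23 generated by g = 2, with ê(a, b) = g^(ab mod q). The tiny parameters are illustrative only and provide no security:

```python
# Toy bilinear pairing: G1 modeled additively as Z_q, GT as the order-q
# subgroup of Z_23* generated by g = 2, with e(a, b) = g^(a*b mod q) mod p.
q, p, g = 11, 23, 2            # q divides p - 1, and g has order q mod p

def e(a: int, b: int) -> int:  # the toy pairing G1 x G1 -> GT
    return pow(g, (a * b) % q, p)

P1, P2, a, b = 3, 5, 4, 7
# Bilinearity: e([a]P1, [b]P2) == e(P1, P2)^(a*b)
assert e(a * P1 % q, b * P2 % q) == pow(e(P1, P2), a * b, p)
# Non-degeneracy: e(P, P) != 1_GT for a generator P
assert e(1, 1) != 1
```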
Privacy-Preserving Pseudonym and Attribute-Based Signcryption 403
where a, b, c, z ←$ Z∗q and P1 ←$ G1, the decisional bilinear Diffie–Hellman
problem (DBDHP) in (q, G1, GT, ê) is to determine whether ê(P1, P1)^z =
ê(P1, P1)^abc.
The advantage of an adversary A in breaking the DBDHP in (q, G1, GT, ê) is
defined by
Adv_A^DBDHP(1^κ) = |Pr[D0(1^κ) → 1] − Pr[D1(1^κ) → 1]|.
We say that the DBDH assumption holds for (q, G1, GT, ê) if, for any PPT
adversary A, the advantage Adv_A^DBDHP(1^κ) is a negligible function of the
security parameter κ.
– Setup: The probabilistic setup algorithm is run by the PKG. It takes as input
a security parameter κ and an attribute universe Ω. It outputs the public
system parameters mpk, and the master secret key msk which is known only
to the PKG.
– PIDKeyGen: The probabilistic pseudonym-based private key generation algo-
rithm is run by the PKG. It takes as input mpk, msk, and a real user identity
id. It outputs a pseudonym pid and a private key skpid corresponding to the
pseudonym.
– ABKeyGen: The probabilistic attribute-based private key generation algo-
rithm is run by the PKG. It takes as input mpk, msk, and an access structure
A assigned to a user. It outputs a private key dkA corresponding to the access
structure A.
– SignCrypt: The probabilistic signcrypt algorithm is run by a sender. It takes
as input mpk, a message Msg, a sender’s pseudonym-based private key skpid ,
and a set ω of descriptive attributes. It outputs a signcrypted ciphertext Sct.
– PubVerify: The deterministic public verification algorithm is run by any
receiver. It takes as input mpk, a signcrypted ciphertext Sct, a sender's
pseudonym pid, and a set ω of descriptive attributes. It outputs a bit b,
which is 1 if Sct was generated by the sender and 0 otherwise.
– UnSigncrypt: The deterministic unsigncryption algorithm is run by a receiver.
It takes as input mpk, a signcrypted ciphertext Sct, a sender’s pseudonym
pid, a set ω of descriptive attributes, and a receiver’s attribute-based private
key dkA . It outputs Msg if A(ω) = 1. Otherwise it outputs ⊥.
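The six algorithms define the scheme's API; the following Python interface sketch restates their signatures. The method names and placeholder types are our own; the actual inputs and outputs are the group elements defined above:

```python
from abc import ABC, abstractmethod
from typing import Any, Optional, Tuple

class P3ASC(ABC):
    """Interface sketch of the six P3ASC algorithms. Concrete types are
    placeholders, not the actual group elements of the scheme."""

    @abstractmethod
    def setup(self, kappa: int, omega: set) -> Tuple[Any, Any]:
        """Run by the PKG: returns (mpk, msk)."""

    @abstractmethod
    def pid_keygen(self, mpk: Any, msk: Any, real_id: str) -> Tuple[str, Any]:
        """Run by the PKG: returns a pseudonym pid and its private key sk_pid."""

    @abstractmethod
    def ab_keygen(self, mpk: Any, msk: Any, access_structure: Any) -> Any:
        """Run by the PKG: returns dk_A for access structure A."""

    @abstractmethod
    def signcrypt(self, mpk: Any, msg: bytes, sk_pid: Any, omega: set) -> Any:
        """Run by a sender: returns a signcrypted ciphertext Sct."""

    @abstractmethod
    def pub_verify(self, mpk: Any, sct: Any, pid: str, omega: set) -> bool:
        """Run by any receiver: True iff Sct was generated by pid."""

    @abstractmethod
    def unsigncrypt(self, mpk: Any, sct: Any, pid: str,
                    omega: set, dk_A: Any) -> Optional[bytes]:
        """Run by a receiver: returns Msg if A(omega) = 1, else None (⊥)."""
```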
The pseudonym-based key issuance between a user U and the PKG (where x is the
master secret key and Ppub = [x]P is the PKG's public key) proceeds as follows.

User U:
1. idU ←$ ID.
2. rU ←$ Z∗q, RU = [rU]P.
3. Send (RU, idU) to the PKG.

PKG:
4. Check the validity of idU.
5. r̃U ←$ Z∗q, R̃U = [r̃U]P.
6. RU ← RU + R̃U.
7. pidU = H1(x, RU) ⊕ idU.
8. cU = H2(pidU, RU).
9. s̃kU = r̃U + cU · x mod q.
10. Send (pidU, s̃kU, R̃U) to the user.

User U:
11. RU ← RU + R̃U.
12. cU = H2(pidU, RU).
13. skU = rU + s̃kU mod q.
14. QU = [skU]P, and check that QU = RU + [cU]Ppub.
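Since every step of the exchange above is linear in the exponent, its algebra can be checked in a toy model that replaces the curve group ⟨P⟩ with the isomorphic additive group (Z_q, +), so a point [k]P is represented by the scalar k mod q. The hash instantiation and modulus below are our own stand-ins; this verifies only protocol consistency and the PKG's de-anonymization identity, not security:

```python
# Toy model of the Schnorr-like pseudonym issuance: the group <P> is replaced
# by (Z_q, +), so [k]P is the scalar k mod q. Checks the algebra only.
import hashlib, secrets

q = (1 << 255) - 19          # a large prime standing in for the curve order
def H(tag: str, *parts) -> int:
    data = tag.encode() + b"|".join(str(p).encode() for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

x = secrets.randbelow(q)     # PKG master secret key
Ppub = x % q                 # "[x]P" in the toy group

# User: identity and nonce
id_u = H("id", "alice")
r_u = secrets.randbelow(q); R_u = r_u % q            # R_U = [r_U]P

# PKG: blind the nonce, derive pseudonym and partial key
r_t = secrets.randbelow(q); R_t = r_t % q            # R~_U = [r~_U]P
R = (R_u + R_t) % q                                  # R_U <- R_U + R~_U
pid = H("H1", x, R) ^ id_u                           # pid_U = H1(x, R_U) xor id_U
c = H("H2", pid, R)                                  # c_U = H2(pid_U, R_U)
sk_t = (r_t + c * x) % q                             # sk~_U

# User: finish the key and verify it
c2 = H("H2", pid, (R_u + R_t) % q)
sk = (r_u + sk_t) % q                                # sk_U = r_U + sk~_U
Q = sk % q                                           # Q_U = [sk_U]P
assert Q == ((R_u + R_t) % q + c2 * Ppub) % q        # Q_U ?= R_U + [c_U]Ppub

# PKG can later de-anonymize: id_U = pid_U xor H1(x, R_U)
assert pid ^ H("H1", x, R) == id_u
```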
– UnSigncrypt: A receiver uses his decryption private key dkA, associated with
the access structure A described by (Mℓ×n, ρ), to recover and verify the
signcrypted ciphertext Sct = {ω, pidU, RU, C′, {Ci}_{i=1..|ω|}, A, σ, h4} as
follows.
1. Determine whether A(ω) = 1. If not, the receiver rejects the signcrypted
ciphertext Sct.
2. Validate the signcrypted ciphertext Sct as any receiver does in the
PubVerify algorithm.
3. Define I = {i | ρ(i) ∈ ω} ⊂ {1, 2, …, ℓ}, and let {μi} be a set of
constants such that if {λi} are valid shares of y according to (Mℓ×n, ρ),
then Σ_{i∈I} λi · μi = y.
4. Compute V = Π_{i∈I} ê(Di, C_ρ(i))^μi and Msg′ = C′/V.
5. Check whether H4(A, Msg′, V) = h4. If it holds, the receiver accepts and
outputs Msg′.
The scheme is consistent, since
V = Π_{i∈I} ê(Di, C_ρ(i))^μi = ê(P, P)^sy,
and hence
Msg′ = C′/V = Msg · Y^s / ê(P, P)^sy = Msg.
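Step 3 above (finding constants μi with Σ λi·μi = y) can be illustrated with a tiny linear secret-sharing instance. The matrix, policy and field below are a standard textbook-style choice of ours, not taken from the paper:

```python
# Minimal LSSS over F_q for the policy (att1 AND att2) OR att3, showing the
# reconstruction constants mu_i used in step 3 of UnSigncrypt.
import secrets

q = 2**61 - 1                     # a prime field modulus (placeholder)

# Share matrix M (3 x 2); rho labels row 0 -> att1, row 1 -> att2, row 2 -> att3
M = [(1, 1), (0, q - 1), (1, 0)]

y = secrets.randbelow(q)          # the secret to be shared
r = secrets.randbelow(q)          # random second coordinate of the vector v
v = (y, r)
shares = [sum(m * x for m, x in zip(row, v)) % q for row in M]  # lambda_i

# Receiver holding {att1, att2}: I = {0, 1}, mu = (1, 1), since row0 + row1
# = (1, 0), so lambda_0 + lambda_1 reconstructs y.
assert (shares[0] * 1 + shares[1] * 1) % q == y

# Receiver holding {att3}: I = {2}, mu = (1,), since row2 = (1, 0).
assert shares[2] % q == y
```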
Theorem 2. Our P3ASC scheme satisfies the conditional anonymity for the
sender.
Proof. In PIDKeyGen, a sender can choose a family of pseudonyms and obtain the
associated private keys by running a Schnorr-like lightweight identity-based
blind signature scheme with the PKG ([18–20]). Although the signcrypted message
Sct must include a pseudonym of the sender and RU, anyone other than the PKG
cannot extract the sender's real identity idU, since they have no knowledge of
the master secret key x. Furthermore, because there is no linkage between these
pseudonyms, anyone other than the PKG cannot link two sessions initiated by the
same sender. The PKG itself, of course, can extract the sender's real identity
by computing idU = pidU ⊕ H1(x, RU). Thus, our P3ASC scheme achieves
conditional anonymity for the sender.
Proof. We will give detailed security proof in the full version due to the space
limitation.
Theorem 4. Our P3ASC scheme is EUF-CMA secure in the adaptive model
under the ECDL assumption.
Proof. We will give detailed security proof in the full version due to the space
limitation.
– HA, who acts as the PKG. HA is responsible for generating system public
parameters, issuing pseudonym-based private keys for PHI owners and PHI
writers, and attribute-based private keys for PHI writers and PHI readers.
– PHI Owners, who carry multiple wearable or implanted sensors and a patient
terminal. These sensors can sense and process vital signs or environmental
parameters and transfer the relevant data to the patient terminal. Typically,
the patient terminal is equipped with a mobile health application and data
collection and storage functions, with the ability for mobile communication.
PHI owners register themselves with the HA by sending their real identity, and
the HA allocates a pseudo-identity to the PHI owner, which is then used for all
communications in the network. Thus, the actual identity of the PHI owner is
concealed.
– Medical CSP, who keeps the patient-related data of registered users and
provides various services to them. We assume that the medical CSP is
honest-but-curious: it will perfectly execute the protocol specifications, but
may try to extract a patient's private personal health information.
– PHI readers, who are allowed to view a PHI owner's PHI. They can be doctors,
nurses, researchers, insurance company employees, etc.
– PHI writers, who are allowed to view and update a PHI owner's PHI. They are
typically doctors who access the patients' medical information and provide
medical services.
5 Conclusions
References
1. Negra, R., Jemili, I., Belghith, A.: Wireless body area networks: applications and
technologies. Procedia Comput. Sci. 83, 1274–1281 (2016)
2. Kang, J., Adibi, S.: A review of security protocols in mHealth Wireless Body Area
Networks (WBAN). In: Doss, R., Piramuthu, S., Zhou, W. (eds.) FNSS 2015.
CCIS, vol. 523, pp. 61–83. Springer, Cham (2015). https://doi.org/10.1007/978-3-
319-19210-9 5
3. Sadiku, M.N.O., Musa, S.M., Momoh, O.D.: Cloud computing: opportunities and
challenges. IEEE Potentials 33(1), 34–36 (2014)
4. Buchade, A.R., Ingle, R.: Key management for cloud data storage: methods and
comparisons. In: Fourth International Conference on Advanced Computing Com-
munication Technologies, pp. 263–270. IEEE Press (2014)
5. Patil, H.K., Seshadri, R.: Big data security and privacy issues in healthcare. In:
IEEE International Congress on Big Data, pp. 762–765. IEEE (2014)
6. Samaher, A.J., Ibrahim, A.S., Mohammad, S., Shahaboddin, S.: Survey of main
challenges (security and privacy) in wireless body area networks for healthcare
applications. Egypt. Inform. J. 18(2), 113–122 (2017)
7. Sahai, A., Waters, B.: Fuzzy identity-based encryption. In: Cramer, R. (ed.) EURO-
CRYPT 2005. LNCS, vol. 3494, pp. 457–473. Springer, Heidelberg (2005). https://
doi.org/10.1007/11426639 27
8. Goyal, V., Pandey, O., Sahai, A., Waters, B.: Attribute-based encryption for
fine-grained access control of encrypted data. In: ACM Conference on Computer
and Communications Security, pp. 89–98 (2006)
9. Bethencourt, J., Sahai, A., Waters, B.: Ciphertext-policy attribute-based encryp-
tion. In: IEEE Symposium on Security and Privacy, pp. 321–334. IEEE Press
(2007)
10. Waters, B.: Ciphertext-policy attribute-based encryption: an expressive, efficient,
and provably secure realization. In: Catalano, D., Fazio, N., Gennaro, R., Nicolosi,
A. (eds.) PKC 2011. LNCS, vol. 6571, pp. 53–70. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-19379-8 4
11. Attrapadung, N., Libert, B., de Panafieu, E.: Expressive key-policy attribute-based
encryption with constant-size ciphertexts. In: Catalano, D., Fazio, N., Gennaro, R.,
Nicolosi, A. (eds.) PKC 2011. LNCS, vol. 6571, pp. 90–108. Springer, Heidelberg
(2011). https://doi.org/10.1007/978-3-642-19379-8 6
12. Pirretti, M., Traynor, P., McDaniel, P., Waters, B.: Secure attribute-based systems.
J. Comput. Secur. 18(5), 799–837 (2010)
13. Li, M., Yu, S.C., Zheng, Y., Ren, K., Lou, W.J.: Scalable and secure sharing of
personal health records in cloud computing using attribute-based encryption. IEEE
Trans. Parallel Distrib. Syst. 24(1), 131–143 (2013)
14. Wang, C.J., Xu, X.L., Shi, D.Y., Fang, J.: Privacy-preserving cloud-based per-
sonal health record system using attribute-based encryption and anonymous multi-
receiver identity-based encryption. Informatica 39(4), 375–382 (2015)
15. Yu, S., Wang, C., Ren, K., Lou, W.: Achieving secure, scalable, and fine-grained
data access control in cloud computing. In: Proceedings IEEE INFOCOM, pp. 1–9
(2010)
16. Tan, Y.L., Goi, B.M., Komiya, R., Phan, R.: Design and implementation of key-
policy attribute-based encryption in body sensor network. Int. J. Cryptol. Res.
4(1), 84–101 (2013)
17. Maji, H.K., Prabhakaran, M., Rosulek, M.: Attribute-based signatures. In: Kiayias,
A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 376–392. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-19074-2 24
18. Pointcheval, D., Stern, J.: Provably secure blind signature schemes. In: Kim, K.,
Matsumoto, T. (eds.) ASIACRYPT 1996. LNCS, vol. 1163, pp. 252–265. Springer,
Heidelberg (1996). https://doi.org/10.1007/BFb0034852
19. Galindo, D., Garcia, F.D.: A Schnorr-like lightweight identity-based signature
scheme. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, vol. 5580, pp. 135–
148. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02384-2 9
20. Chatterjee, S., Kamath, C., Kumar, V.: Galindo-Garcia identity-based signature
revisited. In: Kwon, T., Lee, M.-K., Kwon, D. (eds.) ICISC 2012. LNCS, vol.
7839, pp. 456–471. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-
642-37682-5 32
S7commTrace: A High Interactive
Honeypot for Industrial Control System
Based on S7 Protocol
1 Introduction
With the development of "Industry 4.0" around the world, more and more
industrial control systems are connected to the Internet, which greatly
improves production efficiency. At the same time, security threats in
cyberspace have begun to penetrate industrial control systems. The Stuxnet
worm, first disclosed in June 2010, was the first worm to attack energy
infrastructure [1,2]. In 2014, hackers attacked a steel plant in Germany by
manipulating and destroying its control system, so that a blast furnace could
not be shut down properly, resulting in massive damage [3]. On December 23,
2015, the Ukrainian power
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 412–423, 2018.
https://doi.org/10.1007/978-3-319-89500-0_36
S7commTrace: A High Interactive Honeypot for Industrial Control System 413
network suffered a hacker attack, the first successful attack on a power grid,
which left hundreds of thousands of users without power for hours [4]. On
June 12, 2017, the security vendor ESET disclosed an industrial control network
attack weapon named Win32/Industroyer, which implements malicious attacks on
power substation systems and can directly control circuit breakers to cut
power to a substation [5].
Industrial control systems are highly interconnected and interdependent with
critical national infrastructure [6]. Once a cyberspace security incident
occurs in an industrial control system, it has a significant impact on the
country's politics, economy and other aspects. Therefore, unlike the
traditional security strategy on the Internet, security incidents in industrial
control systems should not be dealt with only after they occur. As is well
known, every cyberspace attack is preceded by a probe of hosts or networks [7].
It is therefore critical for industrial control systems to build situation
awareness by passively capturing detection and attack data, revealing potential
attackers and their motivations before a fatal attack happens.
Based on Siemens’ S7 communications protocol, we develop S7commTrace
which is a kind of high interactive honeypot for industrial control system. Fur-
thermore, we deploy S7commTrace and Conpot under the same circumstance
in four countries. According to the comparative experiments on the captured
data by these two kinds of honeypots in the following 20 days and the searching
results in Shodan after 30 days later, S7commTrace shows better performance
than Conpot.
2 Related Work
A honeypot is a security resource that is used to attract attackers for illegal
use, without any business utility [8]. Honeypot technology sets up hosts,
network services or information as bait to lure attackers, so that attack
behavior can be captured and analyzed [9]. Honeypots can be used to better
understand where attacks originate [10]. Venkat Pothamsetty and Matthew Franz
of the Cisco Critical Infrastructure Assurance Group (CIAG) released the first
PLC honeypot in 2004, using the Honeyd architecture to simulate the Modbus
industrial control protocol [11]. Rist et al. [12] released Conpot, an
open-source low-interaction honeypot for industrial control systems, in May
2013; it stopped being updated in November 2015. Although it supports up to
seven protocols (S7, Modbus, BACnet, HTTP, Kamstrup, SNMP, IPMI), all of them
are low-interaction and only support a small number of function codes.
Serbanescu et al. [13] analyzed how attractive industrial equipment exposed on the public network is to attackers, and studied attacker behavior, by deploying low-interaction ICS honeypots on a large scale. Jicha et al. [10] deployed 12 Conpot SCADA honeypots on AWS and evaluated their attractiveness and the observed behavior in detail by analyzing NMAP scan results and SHODAN. Buza et al. [14] divided honeypots into three categories according
414 F. Xiao et al.
to complexity: low-interaction, high-interaction, and hybrid. They summarized the development of honeypot projects for industrial control systems since 2004, including the CIAG SCADA HoneyNet Project, Honeyd, the Digital Bond SCADA HoneyNet, and the Conpot Project. After analyzing the advantages and disadvantages of these projects, they designed and developed the CrySyS PLC honeypot (CryPLH), which supports HTTP, HTTPS, SNMP, and Step 7.
Search engines for Internet-connected devices differ from traditional content search engines: they probe network equipment on the Internet, store the results in a database, and provide web and API query interfaces. Commonly used device search engines are Shodan [15], Censys [16,17], and ZoomEye [18]. Shodan uses industrial control protocols directly to crawl industrial control equipment on the Internet and visualizes its location and other information [19]. This is not only convenient for network security practitioners but also helps attackers locate victims, and it may expose the existence of honeypots. Bodenheim et al. [20] deployed four Allen-Bradley ControlLogix 1756-L61 PLCs on the Internet to check Shodan's capabilities and found that all four PLCs were indexed by Shodan within 19 days. They subsequently proposed reducing the risk of exposure in Shodan by transforming the web service banner.
In summary, previous studies mainly focused on low-interaction honeypots for industrial control systems. Conpot is one of the best-known and most advanced honeypots of recent years; although it supports various industrial control protocols, its low interactivity makes it easy for cyberspace search engines to recognize it as a honeypot. CryPLH tries to improve interactivity, but it still lacks support for real industrial control protocols. A deliberate, fatal network attack always starts with probes of the target; if the initial requests get no response, attackers abort further operations. It is therefore necessary to develop a high-interaction honeypot based on industrial control protocols that captures high-quality probe and attack data while reducing the risk of being flagged by cyberspace search engines.
The Magic flag of the S7 communications protocol is fixed to 0x32, and the following fields are the S7 type, data unit reference, parameter length, data length, result info, parameters, and data. In the parameters field, the first byte is the S7 function code. Table 1 shows the possible function codes of S7. The Communication Setup code is used to establish an S7 connection, the Read code lets the host computer read data from the PLC, and the Write code lets the host computer write data to the PLC. The codes Request Download, Download Block, Download End, Download Start, Upload, and Upload End are designed for downloading or uploading blocks. The PLC Control code covers the Hot Run and Cold Run operations, while PLC Stop is used to turn off the device. A function code of 0x00 stands for a system function, used to check system settings or status; the details are described by the 4-bit function group code and the 1-byte subfunction code in the parameters field. System functions are further divided into seven groups, as shown in Table 2. The Block function is used to read blocks, and the Time function is used to check or set the device clock.
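The header layout just described can be parsed mechanically. The sketch below assumes the commonly documented S7comm field widths (a 1-byte magic and type, then 2-byte data unit reference, parameter length, and data length); the text above does not give byte offsets, so treat the offsets as an assumption:

```python
import struct

def parse_s7_header(pdu: bytes) -> dict:
    """Parse the fixed S7comm header described above.

    Assumed layout: magic (1 byte, always 0x32), S7 type (1 byte),
    data unit reference (2 bytes), parameter length (2 bytes),
    data length (2 bytes); the first parameter byte is the function code.
    """
    if len(pdu) < 8 or pdu[0] != 0x32:
        raise ValueError("not an S7comm PDU")
    magic, s7_type, unit_ref, param_len, data_len = struct.unpack(">BBHHH", pdu[:8])
    header = {"type": s7_type, "unit_ref": unit_ref,
              "param_len": param_len, "data_len": data_len}
    params = pdu[8:8 + param_len]
    if params:
        header["function_code"] = params[0]  # first parameter byte = function code
    return header

# Example: a request whose single parameter byte is 0xF0
# (Communication Setup in Table 1).
hdr = parse_s7_header(bytes([0x32, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0xF0]))
```

With this in hand, a honeypot can branch on `function_code` to decide which canned reply to emit.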
3.2 S7commTrace
S7commTrace consists of four modules: the TCP communication module, the S7 protocol simulation module, the data storage module, and the user template, as shown in Fig. 3. The TCP communication module is responsible for listening on TCP port 102, submitting received data to the protocol simulation module, and replying to the remote peer. The S7 protocol simulation module first parses the received data according to the protocol format and extracts the valid contents; it then generates reply data by referring to the user template, and finally hands the reply data back to the TCP communication module to be packaged. The user template records all user-defined information, such as the PLC serial number and manufacturer. The data storage module handles data storage requests and responses.
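The four-module flow above can be condensed into a minimal sketch: a pure reply-building function standing in for the protocol simulation module plus user template, and a TCP loop for the communication module. The template keys and the reply layout are illustrative assumptions, not S7commTrace's actual wire format:

```python
import socket

# Hypothetical user template: device-identity values that a deployment
# would customise per honeypot (key names are illustrative).
TEMPLATE = {"serial_number": "S C-XXXXXXXX", "manufacturer": "Siemens AG"}

def build_reply(request: bytes, template: dict) -> bytes:
    """Protocol-simulation step: inspect the request and fabricate a reply
    from the user template (placeholder logic, not the real S7 format)."""
    if not request or request[0] != 0x32:
        return b""                          # not S7comm: stay silent
    body = template["manufacturer"].encode()
    return bytes([0x32, 0x03]) + body       # 0x03 = assumed ack-data type

def serve(host: str = "0.0.0.0", port: int = 102) -> None:
    """TCP communication module: listen on port 102 and answer each peer.
    (Binding to port 102 normally requires elevated privileges.)"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                data = conn.recv(4096)
                reply = build_reply(data, TEMPLATE)
                if reply:
                    conn.sendall(reply)     # the data storage module would log here
```

Keeping `build_reply` free of I/O mirrors the paper's separation of concerns and makes the protocol logic testable without opening a socket.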
S7commTrace: A High Interactive Honeypot for Industrial Control System 417
Cyber attacks against S7 devices are implemented using specific function codes, such as uploading and stopping. As we know, an experienced attacker usually checks the system status list (Read SZL) or performs other operations before executing those significant function codes. Therefore, to record an attacker's communication data completely and accurately, a sophisticated honeypot should respond to as many S7 function codes as possible, in order to induce the attacker's further operations. After an S7 communication is set up, Conpot supports only the Read SZL subfunction code and replies with a fixed SZL ID and index; it has no response to other function codes. S7commTrace makes a great improvement over Conpot by responding to all the function codes and subfunction codes listed in Tables 1 and 2.
To fabricate the response data, we first request and record real responses from an S7-300 PLC device; a user-defined template is then built in S7commTrace from those real data. At the same time, we customize unique User Template settings for each S7commTrace honeypot, without changing the data format.
4 Evaluation
We deployed Conpot and S7commTrace honeypots simultaneously in the United States (US), Germany (GE), China (CN), and Singapore (SG). The deployment uses Aliyun (US, CN, SG) and Host1Plus (GE) virtual hosts, each configured with a 1.5 GHz single-core CPU, 1 GB RAM, and a 40 GB disk. All virtual hosts run Ubuntu Server 16.04 64-bit, and each installs a MySQL database to store the data captured by its local honeypot. Furthermore, two VPS instances are rented in every country to make sure Conpot and S7commTrace are deployed under the same conditions. The experiment lasts 60 days.
the format of the S7 protocol. Ignoring such data, Conpot records a total of 82 valid requests, while S7commTrace records a total of 535. Compared with Conpot, the number of valid requests is significantly higher in every S7commTrace honeypot. To check the quality of the data, we calculate the valid-request rate for both kinds of honeypots. The average valid-request rate for Conpot is 17.37%, and the maximum does not exceed 22.78%; for S7commTrace, the average rises to 43.96%, and the minimum is not less than 36.11%. S7commTrace therefore records not only more requests but also a higher proportion of valid requests than Conpot, meaning data quality is greatly improved.
IP                Conpot              S7commTrace
                  US  GE  CN  SG      US  GE  CN  SG
139.162.99.243
141.212.122.145   – –
113.225.219.220   – – – –
113.225.210.250   – – – –
80.82.77.139      – – – – –
71.6.146.185      – – – – – –
188.138.125.44    – – – – – –
120.132.93.150    – – – – – –
141.212.122.96    – – – – – –
141.212.122.48    – – – – – –
66.240.219.146    – – – – – –
5 Conclusions
We developed a high-interaction honeypot named S7commTrace for industrial control systems, based on Siemens' S7 communications protocol. By deploying Conpot and S7commTrace globally at the same time, we compared them along two dimensions: how they were indexed by cyberspace search engines, and the probe and attack data they recorded. We can thus draw the following conclusions. Compared to the S7 component in Conpot, S7commTrace has the following advantages:
6 Future Work
References
1. Chen, T.M., Abu-Nimeh, S.: Lessons from Stuxnet. Comput. 44(4), 91–93 (2011)
2. Kushner, D.: The real story of stuxnet. IEEE Spectrum 50(3), 48–53 (2013)
3. Zetter, K.: A cyberattack has caused confirmed physical damage for the
second time ever. http://www.wired.com/2015/01/german-steel-mill-hack-destruction. Accessed 8 July 2017
4. Zetter, K.: Inside the cunning, unprecedented hack of Ukraine’s power grid.
https://www.wired.com/2016/03/inside-cunning-unprecedented-hack-ukraines-po
wer-grid/. Accessed 8 July 2017
5. https://www.eset.com/us/about/newsroom/press-releases/eset-discovers-dangero
us-malware-designed-to-disrupt-industrial-control-systems/. Accessed 8 July 2017
6. Stouffer, K., et al.: Guide to industrial control systems (ICS) security. NIST special
publication vol. 800, no. 82, p. 16 (2011)
7. Hink, R.C.B., Goseva-Popstojanova, K.: Characterization of cyberattacks aimed
at integrated industrial control and enterprise systems: a case study. In: IEEE
International Symposium on High Assurance Systems Engineering, pp. 149–156
(2016)
8. Spitzner, L.: Honeypots: Tracking Hackers. Addison-Wesley Longman Publishing
Co. Inc., Boston (2002)
9. Zhuge, J.-W., et al.: Honeypot technology research and application. Ruanjian Xue-
bao/J. Softw. 24(4), 825–842 (2013)
10. Jicha, A., et al.: SCADA honeypots: an in-depth analysis of Conpot. In: 2016 IEEE
Conference on Intelligence and Security Informatics (ISI)
11. Pothamsetty, V., Franz, M.: SCADA Honeynet Project: Building Honeypots for
Industrial Networks. SCADA Honeynet Project, 15 July 2005
12. CONPOT ICS/SCADA Honeypot. http://conpot.org/. Accessed 16 July 2017
13. Serbanescu, A.V., et al.: ICS threat analysis using a large-scale honeynet. In: Pro-
ceedings of the 3rd International Symposium for ICS & SCADA Cyber Security
Research. British Computer Society (2015)
14. Buza, D.I., Juhász, F., Miru, G., Félegyházi, M., Holczer, T.: CryPLH: protecting
smart energy systems from targeted attacks with a PLC honeypot. In: Cuellar, J.
(ed.) SmartGridSec 2014. LNCS, vol. 8448, pp. 181–192. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-10329-7 12
15. Shodan. https://www.shodan.io/. Accessed 15 July 2017
16. Censys. https://censys.io/. Accessed 15 July 2017
17. Durumeric, Z., et al.: A search engine backed by internet-wide scanning. In: Pro-
ceedings of the 22nd ACM SIGSAC Conference on Computer and Communications
Security. ACM (2015)
18. Zoomeye. https://www.zoomeye.org/. Accessed 16 July 2017
19. Ics-radar. https://ics-radar.shodan.io/. Accessed 15 July 2017
20. Bodenheim, R., et al.: Evaluation of the ability of the Shodan search engine to
identify Internet-facing industrial control devices. Int. J. Crit. Infrastruct. Protect.
7(2), 114–123 (2014)
21. https://wiki.wireshark.org/S7comm. Accessed 15 July 2017
22. http://plcscan.org/blog/. Accessed 16 July 2017
Privacy Protection
Research on Clustering-Differential Privacy
for Express Data Release
Abstract. With the rapid development of "Internet+", the express delivery industry has exposed more and more privacy-leakage problems. One channel is the circulation of express orders; the other is express data release. For the second problem, this paper proposes a clustering-differential privacy preserving method that combines the theory of anonymization. First, we use the DBSCAN density clustering algorithm to initialize the original data set, achieving the first clustering. Second, to reduce data generalization, we combine micro-aggregation to achieve a second clustering of the data set. Finally, we add Laplace noise to the clustered data records and correct any data that do not satisfy the differential privacy model, to ensure data availability. Simulation experiments show that the clustering-differential privacy preserving method can be applied to express data release and keeps higher data availability than traditional differential privacy preservation.
1 Introduction
With the rapid development of e-commerce, China's express industry has seen unprecedented prosperity. Since 2011, express business volume has grown by more than 50% annually on average. As of December 20, 2016, the State Post Bureau announced that China's express volume had reached 30 billion parcels, continuing to rank first in the world. The rapid development of the express industry inevitably also brings privacy-leakage problems. The circulation of express delivery orders and the release of express data are the main channels of express information leakage.
At present, domestic authorities and some scholars have offered many suggestions and solutions for privacy leakage in the express delivery industry. The national postal office announced that a real-name registration system for parcel delivery would be carried out from November 1, 2015, in order to cut off drug circulation. Wei [1] noted that customer information leakage has two aspects: leakage during express delivery, and leakage through direct contact with the courier at the end of express distribution. For the former, they proposed a new K-anonymity model that protects customers' privacy by randomly breaking the relationships between the attributes in the records. For privacy leakage in express orders, Zhou [2] designed an express information management system based on DES and RSA encryption. Yang [3] proposed a personal privacy protection logistics system based on two-dimensional codes to protect customer information, resolving the contradiction between logistics information encryption and the logistics process through hierarchical encryption and permission grading. However, the above research focused solely on strengthening the protection of express order information; research on express data release is rare. If data are released without privacy processing, they may be stolen by malicious individuals and institutions.
And then they pose a serious threat to personal safety. As for data release, Chai [4] proposed using an improved K-means clustering algorithm to cluster sensitive attributes and protect the data table anonymously. Based on clustering analysis in data mining and the problem of publishing hidden attributes in clustering analysis, Chong [5] proposed NESDO, a new perturbation method based on synthetic data replacement; the algorithm effectively keeps both the individual and the common characteristics of the clustered data. Liu [6] combined anonymization techniques to propose a data privacy protection method that maintains higher availability of the published data by adding noise to anonymized data.
Building on the existing literature, this paper proposes a clustering-differential privacy preserving method for express data release. First, the DBSCAN density clustering algorithm initializes the original data set, achieving the first clustering. Second, to reduce data generalization, micro-aggregation is combined to achieve a second clustering of the data set. Third, considering that clustering can itself leak information, Laplace noise is applied to the clustered data to meet the requirements of the differential privacy model. Finally, data that do not satisfy the differential privacy model are corrected to ensure data availability.
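The first-clustering step can be illustrated with a compact, pure-Python DBSCAN; this is a textbook sketch under the usual eps/MinPts definitions, not the authors' implementation:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, -1 marking noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                    # provisionally noise
            continue
        cluster += 1                          # i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster           # noise reclaimed as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:          # j is also core: keep expanding
                queue.extend(more)
    return labels

# Two dense groups plus one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=2.0, min_pts=3)
```

The second clustering (micro-aggregation) would then operate within each density cluster, as described in Sect. 2.2.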
the goods' price, and so on. Table 1 lists the basic information on the express sheet; the courier numbers follow Yuantong Express.
As can be seen from Tables 1 and 2, the sender's and recipient's basic information contains a lot of sensitive data, which needs corresponding protection before the data are released. Table 3 shows the detailed recipient information stored in the courier database. In Table 3, the recipient name and ID number are explicit identifiers; zip code, gender, and age (derived from the recipient's ID number) are quasi-identifiers; the phone number and recipient's address are sensitive attributes. If these are stolen by criminals, the recipient's security is threatened.
2.2 Micro-aggregation
Micro-aggregation is a data anonymization scheme. Micro-aggregation [7] can be divided into two parts: k-division and aggregation. K-division aims to make intra-class homogeneity as high as possible and inter-class homogeneity as low as possible.
Definition 4 (K-division). Let T(A1, A2, ..., An) be a table and QI its quasi-identifiers, containing p attributes. Suppose T is divided into g classes based on QI, and let n_i be the number of tuples in class i. If n_i ≥ k for every i, and n = Σ_{i=1}^{g} n_i, then T is said to have a k-division based on QI.
Definition 5 (Aggregation). Let T(A1, A2, ..., An) be a table with quasi-identifiers QI, and let a k-division based on QI divide T into g classes. Let C_i be the tuple centroid of class i. For each i (i = 1, 2, ..., g), C_i replaces every object of class i in all operations; this process is called aggregation.
Definition 6 (Dissimilarity matrix). This matrix stores the pairwise similarities of n objects: d(j, i) is a quantitative representation of the dissimilarity of objects j and i. The closer two objects are, the closer d is to zero. The value of d is calculated using the Euclidean distance.
d(X, Y) = ( Σ_i (X_i − Y_i)² )^{1/2}    (1)

sim(i, j) = x / y    (3)
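Formula (1) and the dissimilarity matrix of Definition 6 translate directly into code:

```python
import math

def euclidean(x, y):
    """Formula (1): d(X, Y) = (sum_i (X_i - Y_i)^2)^(1/2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dissimilarity_matrix(objects):
    """Definition 6: entry [j][i] quantifies how dissimilar objects j and i
    are; identical objects give d = 0."""
    n = len(objects)
    return [[euclidean(objects[j], objects[i]) for i in range(n)]
            for j in range(n)]

# Toy objects: the third duplicates the first, so their dissimilarity is 0.
m = dissimilarity_matrix([(0, 0), (3, 4), (0, 0)])
```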
Input: raw data set D = {D1, D2, ..., Dn}, neighbor radius 'e', neighbor density threshold 'Minpts', equivalence class quantity 'k', privacy protection budget 'ε', similarity threshold 'y'
Output: the clustered data set T
Step 7: if the number of tuples is greater than 2k, repeat steps 5–6; if the number of tuples is between k and 2k, they form one class; if the number of tuples is less than k, assign them to the nearest class.
Step 8: calculate the centroid of each class and replace all objects within the class with the centroid.
Step 9: add Laplace noise to each queried data record, returning a data set T that satisfies differential privacy.
Step 10: evaluate the data before and after processing with the similarity threshold (formula 4); if the score falls below the specified threshold, apply corrections until the similarity requirement is satisfied.
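Steps 9–10 amount to a noise-then-correct loop. In the sketch below, Laplace noise is sampled as a sign-flipped exponential, and the similarity measure is a simple stand-in (formula (4) is not reproduced in this excerpt), so both the measure and the record values are illustrative:

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Lap(0, scale) as an exponential variate with a random sign."""
    return random.choice((-1, 1)) * random.expovariate(1.0 / scale)

def perturb_until_similar(record, epsilon, threshold, sensitivity=1.0):
    """Step 9: add Lap(sensitivity/epsilon) noise to each value.
    Step 10: re-draw while the noisy record is too dissimilar to the
    original. The similarity score used here, 1 / (1 + mean abs change),
    is an illustrative stand-in for formula (4)."""
    scale = sensitivity / epsilon
    while True:
        noisy = [v + laplace_noise(scale) for v in record]
        err = sum(abs(a - b) for a, b in zip(noisy, record)) / len(record)
        if 1.0 / (1.0 + err) >= threshold:   # similar enough: accept
            return noisy

random.seed(7)
# Toy record (e.g., three numeric quasi-identifier values).
noisy = perturb_until_similar([30.0, 52.0, 18.0], epsilon=0.5, threshold=0.2)
```

Note the trade-off step 10 introduces: rejecting very noisy draws improves availability but slightly weakens the formal ε guarantee, which is why the paper frames it as a correction rather than plain Laplace perturbation.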
SCORE = SSE1 / SSE2    (6)
In formula (6), 'SSE1' represents the information loss after clustering-differential privacy, and 'SSE2' the information loss after traditional differential privacy; the score normalizes the information loss. The smaller 'SSE1' is relative to 'SSE2', the higher the availability of the released data.
Test 2. The amount of data is 5000. With parameters e = 15, k = 5, ε = 0.02, we increase the density threshold Minpts. The experimental results are shown in Fig. 4.
The amount of data is 5000. With parameters e = 15, k = 5, ε = 0.02, we increase the density threshold Minpts. The experimental results are shown in Fig. 5.
By comparing Figs. 4 and 5, we can draw the following conclusions:
① The value of 'SCORE' is always less than 1, indicating that the clustering-differential privacy method proposed in this paper maintains higher data availability than the general differential privacy method;
② Keeping the other parameters unchanged while gradually increasing Minpts, 'SCORE' also shows an increasing trend;
③ As ε increases, the amount of noise added to the data decreases, so the information loss is small, and the value of 'SCORE' is small too.
436 T. Chen and H. Kang
5 Conclusion
Aiming at the information leakage in the express data release, this paper proposes a
clustering - differential privacy preserving method. On the basis of density clustering
algorithm, combining the idea of anonymity, the twice division of data set is carried out
by using micro aggregation technology, which reduces the data generalization. Taking
into account the clustering process will lead to leakage of information, while to resist
the homogeneity attacks and background knowledge attacks, this paper adds Laplace
noise to processed data set. Then we need to correct the data which does not meet the
Research on Clustering-Differential Privacy 437
similarity requirements. Theoretical analysis and experimental results show that this
method is suitable for the express data release and it can guarantee a higher data
availability in data release.
References
1. Wei, Q., Li, X.Y.: Express information protection application based on K- anonymity. Appl.
Res. Comput. 31(2), 555–557 (2014)
2. Zhou, C.Q., Zhu, S.Z., Wang, S.S., Ao, L.N.: Research on privacy protection in express
information management system. Logist. Eng. Manage. 37(12), 30–32 (2015)
3. Zhang, X.W., Li, H.K., Yang, Y.T., Sun, G.Z.: Logistic information privacy protection
system based on encrypted QR code. Appl. Res. Comput. 33(11), 3455–3459 (2016)
4. Chai, R.M., Feng, H.H.: Efficient (K, L)-anonymous privacy protection based on clustering.
Comput. Eng. 41(1), 139–142 (2015)
5. Chong, Z., Ni, W., Liu, T., et al.: A privacy-preserving data publishing algorithm for
clustering application. J. Comput. Res. Dev. 47(12), 2083–2089 (2010)
6. Liu, X.Q., Li, Q.M.: Differentially private data release based on clustering anonymization.
J. Commun. 37(5), 125–129 (2016)
7. Song, J., Xu, G.Y., Yao, R.P.: Anonymized data privacy protection method based on
differential privacy. J. Comput. Appl. 36(10), 2753–2757 (2016)
8. Xiong, P., Zhu, T.Q., Wang, X.F.: A survey on differential privacy and applications. Chin.
J. Comput. 37(1), 101–122 (2014)
9. Bhaskar, R., Laxman, S., Thakurta, A.: Discovering frequent patterns in sensitive data. In:
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining. Washington, USA, pp. 503–512 (2010)
10. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness
Knowl. Based Syst. 10(5), 557–570 (2002)
11. Sarwar, B., Karypis, G., Konstan, J., et al.: Item-based collaborative filtering recommen-
dation algorithms. In: Proceedings of the 10th International Conference on World Wide
Web, pp. 285–295. ACM (2001)
12. Mcsherry, F.D.: Privacy integrated queries: an extensible platform for privacy-preserving
data analysis. In: The 2009 ACM SIGMOD International Conference on Management of
Data. Providence, Rhode Island, pp. 19–30. ACM (2009)
Frequent Itemset Mining with Differential
Privacy Based on Transaction Truncation
1 Introduction
Frequent itemset mining can find valuable knowledge in massive data, but mining sensitive data may reveal individual privacy. For example, analysis of search logs can expose users' page-click behavior and thereby their private interests. It is therefore necessary to introduce privacy protection mechanisms into frequent itemset mining.
Differential privacy [1, 2] is a privacy protection technology that adds noise to query requests or analysis results. It is not affected by an attacker's background knowledge, and it guarantees that adding or removing one transaction has little effect on the query results.
Research on frequent itemset mining under differential privacy has made great progress. Bhaskar et al. [3] applied the Laplace mechanism (LM) to compute noisy supports of all possible frequent itemsets and then publish the top-k frequent itemsets with the highest noisy supports. Zeng et al. [4] analyzed the effect of transaction length on global sensitivity and proposed transaction truncation and a heuristic method. Zhang et al. [5] adopted the EM to select the top-k frequent itemsets; to improve the availability of the noisy supports, they proposed a consistency-constraint technique.
An effective frequent itemset mining algorithm with differential privacy should guarantee a certain level of privacy and then try to improve the availability of the frequent itemsets. Since the transaction length is proportional to the Laplace noise, reducing the length of long transactions is the key point for a transaction database. Truncation reduces some noisy errors, but it loses items and thereby introduces truncation errors. The challenge is thus to balance noisy errors and truncation errors. The main contributions of this paper are as follows.
(1) To improve the privacy protection of frequent itemsets, we propose the algorithm FI-DPTT, which perturbs the real supports of the top-k frequent itemsets with Laplace noise.
(2) To improve the availability of frequent itemsets under differential privacy, we propose a quality function for the EM that balances noisy errors and truncation errors; it draws on the idea of the median to find the optimal transaction length.
2 Preliminaries
Pr[Ƒ(D1) = O] ≤ e^ε × Pr[Ƒ(D2) = O]    (1)

In the definition above, Pr[Ƒ(D) = O] denotes the probability that Ƒ(D) outputs O, and ε is called the privacy budget, which controls the strength of privacy protection: a smaller ε means stricter privacy protection, and vice versa. D1 and D2 are arbitrary neighboring databases; ΔQ denotes the maximum distance between Q(D1) and Q(D2), and this global sensitivity is independent of the particular transaction database.
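For a count query, adding or removing one transaction changes the answer by at most 1, so ΔQ = 1 and the Laplace mechanism adds Lap(ΔQ/ε) noise. A minimal sketch with a toy transaction database:

```python
import random

def laplace(scale: float) -> float:
    """Sample Lap(0, scale) as an exponential variate with a random sign."""
    return random.choice((-1, 1)) * random.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon):
    """Laplace mechanism for a count query: the count changes by at most 1
    when one transaction is added or removed, so the global sensitivity
    is 1 and the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace(1.0 / epsilon)

random.seed(1)
db = [{"milk", "bread"}, {"milk"}, {"beer", "bread"}]
ans = noisy_count(db, lambda t: "milk" in t, epsilon=1.0)
```

A smaller ε yields a larger noise scale, which is exactly the privacy/utility trade-off discussed above.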
440 Y. Xia et al.
Pr[p is selected] ∝ exp( ε·u(p, D) / (2·Δu) )    (4)

where Δu denotes the global sensitivity of the quality function u(p, D), and p denotes an element selected from the output field O. The key point is how to design the quality function u(p, D): the larger exp( ε·u(p, D) / (2·Δu) ) is, the higher the probability that p is selected as the output.
Definition 7 (False Negative Rate (FNR) [5]). Let TPk(D) be the top-k frequent itemsets in database D. FNR measures the ratio of real top-k frequent itemsets that are in TPk(D) but not in TPk(Dt). A smaller FNR means higher data accuracy.
Definition 8 (Average Relative Error (ARE) [5]). ARE measures the error introduced by adding Laplace noise to the top-k frequent itemsets of database D. TC(pi, TPk(D)) denotes the real support of frequent itemset pi in database D, and NC(pi, TPk(Dt)) denotes its noisy support; if pi is not in TPk(Dt), we set NC(pi, TPk(Dt)) = 0. A smaller ARE means higher data accuracy.

ARE = (1/k) · Σ_{pi ∈ TPk(D)} |TC(pi, TPk(D)) − NC(pi, TPk(Dt))| / TC(pi, TPk(D))    (6)
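Both metrics are straightforward to compute from the real and noisy top-k results; the itemsets and supports below are made-up examples:

```python
def fnr(true_topk, noisy_topk):
    """Definition 7: fraction of the real top-k itemsets missing from
    the mined result."""
    return sum(1 for p in true_topk if p not in noisy_topk) / len(true_topk)

def are(true_supports, noisy_supports):
    """Definition 8 / formula (6): mean relative support error over the
    real top-k itemsets; itemsets absent from the noisy result count
    with a noisy support of 0."""
    k = len(true_supports)
    total = 0.0
    for itemset, tc in true_supports.items():
        nc = noisy_supports.get(itemset, 0)
        total += abs(tc - nc) / tc
    return total / k

true_s = {("a",): 100, ("b",): 80, ("a", "b"): 60}
noisy_s = {("a",): 95, ("b",): 90}          # ("a", "b") was lost to noise
```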
3 Proposed Algorithm
3.1 Idea of Transaction Truncation
We define the optimal transaction length as follows. The total error is the sum of the noisy error and the truncation error. We truncate an original transaction database D into a transaction database Dt; if the total error of generating frequent itemsets from Dt under ε-differential privacy is smaller than for any other truncated database, then the longest transaction length in Dt is the optimal transaction length for D.
To reduce truncation errors, the Apriori method is performed first to obtain the candidate 1-frequent itemsets and their supports, and the items of each transaction are ranked in descending order of support to obtain the database D′ (Step 1). When we truncate the transaction database, ε (Step 2) is split evenly between ε1 (Step 3) and ε2 (Step 5). The database D′ is then truncated into Dt using lopt (Step 4).
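Steps 1 and 4 (ranking each transaction's items by descending global support, then keeping only the first lopt items) can be sketched as follows; here lopt is taken as given, since its private selection is a separate step:

```python
from collections import Counter

def truncate_db(db, l_opt):
    """Rank the items of every transaction by descending global support
    (ties broken alphabetically), then keep the first l_opt items,
    discarding the least frequent ones."""
    support = Counter(item for t in db for item in t)
    ranked = [sorted(t, key=lambda it: (-support[it], it)) for t in db]
    return [t[:l_opt] for t in ranked]

# Toy database: "a" is most frequent, "d" least.
db = [["a", "b", "c", "d"], ["a", "b"], ["a", "c"], ["a"]]
dt = truncate_db(db, l_opt=2)
```

Dropping the rarest items first is what keeps the truncation error low: the discarded items contribute least to the frequent itemsets.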
Procedure SelectOptLen draws on a characteristic of the median [9]: it describes the central trend of the transaction records and is rarely influenced by extreme values. We scan the database D′ to obtain the length of each transaction, then adopt the EM to get lopt. The quality function is u(t, D′) = count_t / |rank(t) − SCALE·|D′||; if rank(t) = SCALE·|D′|, we set u(t, D′) = 2·count_t. Here count_t denotes the support of the last item in the current transaction record, and rank(t) denotes the position of t when the transactions of D′ are ranked in ascending order. Since adding or removing one transaction record from D′ affects u(t, D′) by at most one, the global sensitivity of u(t, D′) is one, that is, Δu(t, D′) = 1.
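Procedure SelectOptLen can be sketched as a generic exponential mechanism applied to the quality function above. The bookkeeping of count_t (the support of each transaction's last item) is passed in precomputed here, which is a simplification of the paper's procedure:

```python
import math
import random

def exp_mechanism(candidates, quality, epsilon, sensitivity=1.0):
    """Generic exponential mechanism: select candidate c with probability
    proportional to exp(epsilon * u(c) / (2 * sensitivity))."""
    weights = [math.exp(epsilon * quality(c) / (2 * sensitivity))
               for c in candidates]
    r = random.random() * sum(weights)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]

def select_opt_len(lengths, counts, scale, epsilon):
    """SelectOptLen sketch: rank transactions by ascending length and score
    each with the median-style quality function from the text,
    u(t) = count_t / |rank(t) - SCALE*|D'|| (and 2*count_t at the pivot).
    Delta-u = 1, so sensitivity defaults to 1 in the mechanism."""
    n = len(lengths)
    pivot = scale * n
    order = sorted(range(n), key=lambda i: lengths[i])

    def u(idx):
        rank = order.index(idx) + 1
        if rank == pivot:
            return 2 * counts[idx]
        return counts[idx] / abs(rank - pivot)

    chosen = exp_mechanism(list(range(n)), u, epsilon)
    return lengths[chosen]

random.seed(3)
l_opt = select_opt_len(lengths=[2, 9, 3, 4, 5], counts=[5, 1, 4, 6, 3],
                       scale=0.6, epsilon=1.0)
```

Lengths near the SCALE-weighted median get the highest quality scores, so the mechanism favours a truncation length that clips only the extreme tail of long transactions.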
Procedure Perturb-Frequency(Dt, λ, lopt, ε2)
Input: truncated transaction database Dt, a support threshold λ, privacy budget ε2, and lopt
Output: top-k frequent itemsets and their noisy supports under ε-differential privacy
1. S ← FP-Growth(Dt, λ) /* employ FP-Growth to generate the frequent itemsets */
2. k ← |S|
3. for i = 1 to k do
4.   ct(pi) ← c(pi) + Lap(lopt/ε2) /* top-k frequent itemsets with Laplace-noised supports */
5. end for
6. return the top-k frequent itemsets with their noisy supports ct(pi)
4 Experimental Evaluation
4.1 Experimental Setting
This section evaluates the data availability of the FI-DPTT algorithm against DP-topkP [5]. The experimental environment is an Intel Core i5-2410M CPU at 2.30 GHz with 4 GB memory running Windows 7, using the datasets PUMSB-STAR, RETAIL, and KOSARAK [10]. FNR and ARE are used for the analysis. We repeat each experiment five times and report the average (Table 1).
Fig. 3. The relationship between k and availability when k changes in the dataset KOSARAK
5 Conclusion
If a transaction database contains long transactions, the availability of frequent itemsets under differential privacy is reduced. The algorithm FI-DPTT combines the exponential mechanism with the Laplace mechanism: to improve the availability of frequent itemsets under differential privacy, a quality function for the exponential mechanism is designed to balance truncation errors and noisy errors, and Laplace noise is then added to the real supports of the frequent itemsets. The proposed algorithm achieves better performance in both data availability and privacy.
References
1. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.)
ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006). https://doi.org/10.
1007/11787006_1
2. Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A.
(eds.) TAMC 2008. LNCS, vol. 4978, pp. 1–19. Springer, Heidelberg (2008). https://doi.org/
10.1007/978-3-540-79228-4_1
3. Bhaskar, R., Laxman, S., Thakurta, A.: Discovering frequent patterns in sensitive data. In:
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2010
DBLP, pp. 503–512 (2010)
4. Zeng, C., Naughton, J.F., Cai, J.Y.: On differentially private frequent itemset mining.
VLDB J. 6(1), 25–36 (2012)
5. Zhang, X., Miao, W., Meng, X.: An accurate method for mining top-k frequent pattern under
differential privacy. J. Comput. Res. Develop. 51(1), 104–114 (2014)
6. Bonomi, L., Xiong, L.: A two-phase algorithm for mining sequential patterns with
differential privacy. In: ACM International Conference on Information & Knowledge
Management, pp. 269–278. ACM (2013)
7. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private
data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284.
Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
8. McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th Annual IEEE
Symposium on Foundations of Computer Science (FOCS 2007), pp. 94–103. IEEE (2007)
9. Guoqing, L., Xiaojian, Z., Liping, D.: Frequent sequential pattern mining under differential
privacy. J. Comput. Res. Develop. 52(12), 2789–2801 (2015)
10. Datasets. http://fimi.ua.ac.be/data/
Perturbation Paradigms of Maintaining
Privacy-Preserving Monotonicity
for Differential Privacy
1 Introduction
To ensure the tradeoff between data utility and privacy preservation, Dwork
et al. [1] first proposed differential privacy for numeric data. It is a rigorous
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 446–458, 2018.
https://doi.org/10.1007/978-3-319-89500-0_39
2 Preliminaries
Here, we introduce the preliminaries of computational indistinguishability and
differential privacy.
$$1 - \mathrm{negl}(n) \le \frac{\Pr[A(D, X)]}{\Pr[A(D, Y)]} \le 1 + \mathrm{negl}(n) \qquad (2)$$
Since ε is small, we say that an algorithm A cannot distinguish these two
distributions; that is, A cannot tell whether it is given an element r_i sampled
according to distribution X or Y. The smaller ε is, the better the indistinguishability.
In other words, the closer the probability ratio of the two distributions X and Y
is to 1, the better the indistinguishability: the indistinguishability is stronger,
and the distinguishing error or uncertainty of an algorithm A is larger.
The Laplace mechanism and the Gaussian mechanism are suitable for numerical data,
while the exponential mechanism is used for character data. Given some arbitrary
range ℛ, the user prefers to output the element of ℛ with the maximum possible
utility, with respect to a utility function u : D × ℛ → ℝ that maps database/output
pairs to utility values. The sensitivity of the utility function u is
Δu = max_{r ∈ ℛ} max_{D₁,D₂} |u(D₁, r) − u(D₂, r)|, taken over adjacent databases D₁ and D₂.
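Given the utility function u and its sensitivity Δu, the exponential mechanism samples an output r with probability proportional to exp(ε·u(D,r)/(2Δu)). A minimal sketch (our own illustrative code, not the paper's):

```python
import math
import random

def exponential_mechanism(utilities: dict, epsilon: float, sensitivity: float):
    # Sample output r with Pr[r] proportional to exp(epsilon * u(D, r) / (2 * sensitivity)).
    weights = {r: math.exp(epsilon * u / (2.0 * sensitivity)) for r, u in utilities.items()}
    total = sum(weights.values())
    threshold = random.random() * total
    acc = 0.0
    for r, w in weights.items():
        acc += w
        if acc >= threshold:
            return r
    return r  # guard against floating-point shortfall
```

As ε grows, the sampled output concentrates on the highest-utility element; as ε → 0 the choice approaches uniform.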
3 Perturbation Paradigms
In a wide range of applications, different perturbation methods have been used to
meet practical goals. We summarize three perturbation paradigms: linear
perturbation, non-linear perturbation, and randomized perturbation.
450 H. Liu et al.
In the linear perturbation paradigm, existing work achieves perturbation by adding
linear noise. Yang et al. [7] proposed Bayesian differential privacy with output
perturbation by adding Laplace noise. Abadi et al. [8] developed deep learning
with differential privacy by adding Gaussian noise. Tong et al. [9] proposed a
scheduling protocol, based on joint differential privacy, for protecting users'
location privacy and minimizing vehicle miles by adding Gaussian noise.
In addition to adding noise, we give several further perturbation methods in the
linear perturbation paradigm: subtraction, multiplication, division, filtering,
and moving-average filtering of noise. To the best of our knowledge, no non-linear
perturbation paradigm appears in existing work. We give several perturbation
methods in the non-linear perturbation paradigm: adding the absolute value, square,
exponential transformation, sine transformation, or cosine transformation of noise.
Next, we summarize the technologies of the randomized perturbation paradigm,
namely the exponential mechanism and randomized response.
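The linear and non-linear paradigms can be illustrated as simple transforms applied to a noise sample (a hypothetical sketch; whether each transform preserves differential privacy is precisely what the paper goes on to analyze):

```python
import math

def linear_perturbations(value: float, noise: float) -> dict:
    # Linear paradigm: add, subtract, multiply, or divide by the noise sample.
    return {
        "add": value + noise,
        "subtract": value - noise,
        "multiply": value * noise,
        "divide": value / noise if noise != 0 else float("inf"),
    }

def nonlinear_perturbations(value: float, noise: float) -> dict:
    # Non-linear paradigm: add a non-linear transform of the noise sample.
    return {
        "absolute": value + abs(noise),
        "square": value + noise ** 2,
        "exponential": value + math.exp(noise),
        "sine": value + math.sin(noise),
        "cosine": value + math.cos(noise),
    }
```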
Many data summarization applications involve sensitive data about individuals
whose privacy concerns are not automatically addressed; Mitrovic et al. [10]
therefore proposed a general and systematic study of differentially private
submodular maximization using the exponential mechanism. Qin et al. [11] proposed
LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters under
local differential privacy using randomized response.
In Table 1, we summarize the perturbation methods of three perturbation
paradigms for numeric and character data.
Table 1. Perturbation methods of three perturbation paradigms for numeric and char-
acter data
We use error and uncertainty as privacy metrics for numerical data and character
data, respectively; concretely, we use expected estimation error for numerical
data and entropy for character data. Other error metrics, such as Euclidean
distance, can also be used for numeric data, and other uncertainty metrics for
character data. Since the noise perturbation of differential privacy mechanisms
is random for numerical data, the monotonicity curve of the expected estimation
error is smoother than that of the Euclidean distance; thus we choose expected
estimation error as the privacy metric for numeric data in this paper. The
expected estimation error is the expected distance between the perturbed value
x̂_i and the true value x_i, where p(x̂_i) is the probability of the perturbed
value x̂_i. The expected estimation error (EEE) metric is
$$EEE = \sum_{\hat{x}_i \in D} p(\hat{x}_i)\,\|\hat{x}_i - x_i\|_1.$$
Entropy quantifies the amount of information contained in a random variable;
used as a privacy metric, it indicates the adversary's uncertainty. The entropy
(ENT) metric is
$$ENT = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i),$$
where p(x_i) denotes the probability that the random variable takes value x_i.
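Both metrics follow directly from these definitions; a small sketch over discrete distributions (function and variable names are ours):

```python
import math

def expected_estimation_error(true_value: float, perturbed_dist: dict) -> float:
    # EEE = sum over perturbed values x' of p(x') * |x' - x| (scalar L1 case).
    return sum(p * abs(x_hat - true_value) for x_hat, p in perturbed_dist.items())

def entropy(dist: dict) -> float:
    # ENT = -sum_x p(x) * log2 p(x); zero-probability outcomes contribute nothing.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)
```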
According to the definition of computational indistinguishability and the privacy
metric of expected estimation error, we have the following theorem about the
privacy-preserving monotonicity of differential privacy for numeric data.
Theorem 1. In differential privacy, the expected estimation error EEE
decreases as the privacy budget ε increases for numeric data.
Proof. Here, we use the identity query function f : D → D. In differential
privacy, for any adjacent datasets D₁ and D₂, we perturb the query results using
the same noise, drawn from identical distributions with different expectations.
Thus, we have Pr[M(D₁)] ≤ e^ε Pr[M(D₂)] + δ.
Now, let us assume that the mechanism M(D₂) follows a fixed probability
distribution and that the probability value δ is constant. For a privacy budget
ε_i, if the probability density function is p(x), then
$$\Pr[M(D_1, \varepsilon_i)] = \int_{-\infty}^{X_k} p(x)\,dx \qquad (5)$$
$$H_i'(\varepsilon) = -\frac{1}{\ln 2}\,\frac{e^{\varepsilon u_i}\sum_{j=1}^{n}(u_i - u_j)e^{\varepsilon u_j}}{\left(\sum_{j=1}^{n} e^{\varepsilon u_j}\right)^2}\left(\varepsilon u_i - \ln\sum_{j=1}^{n} e^{\varepsilon u_j} + 1\right) \qquad (10)$$
and
$$H'(\varepsilon) = -\frac{1}{\ln 2}\sum_{i=1}^{n}\left(\frac{e^{\varepsilon u_i}\sum_{j=1}^{n}(u_i - u_j)e^{\varepsilon u_j}}{\left(\sum_{j=1}^{n} e^{\varepsilon u_j}\right)^2}\left(\varepsilon u_i - \ln\sum_{j=1}^{n} e^{\varepsilon u_j} + 1\right)\right) \qquad (11)$$
Since $\sum_{i=1}^{n}\sum_{j=1}^{n}(u_i - u_j)\,e^{\varepsilon u_i} e^{\varepsilon u_j} = 0$, there is
$$H'(\varepsilon) = -\frac{1}{\ln 2}\,\frac{\varepsilon}{\left(\sum_{i=1}^{n} e^{\varepsilon u_i}\right)^2}\left(\sum_{i=1}^{n} u_i^2 e^{\varepsilon u_i}\sum_{i=1}^{n} e^{\varepsilon u_i} - \left(\sum_{i=1}^{n} u_i e^{\varepsilon u_i}\right)^2\right) \qquad (12)$$
Therefore, H'(ε) ≤ 0 for ε ≥ 0, so H(ε) decreases as ε increases.
When ε = 0, H(ε) is maximal, which denotes the strongest privacy-preserving
level. Similarly, the same property can be proved for randomized response.
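The monotonicity just proved can also be checked numerically: with p_i ∝ e^{εu_i}, the output entropy H(ε) of the exponential mechanism should be maximal at ε = 0 (the uniform distribution) and non-increasing in ε. A sketch with arbitrary utility values of our choosing:

```python
import math

def exp_mechanism_entropy(utilities, epsilon):
    # Output distribution of the exponential mechanism: p_i = e^{eps*u_i} / sum_j e^{eps*u_j};
    # H(eps) = -sum_i p_i * log2(p_i).
    weights = [math.exp(epsilon * u) for u in utilities]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)

utilities = [0.2, 0.5, 0.9, 1.0]
entropies = [exp_mechanism_entropy(utilities, eps) for eps in (0.0, 0.5, 1.0, 2.0, 4.0)]
# At eps = 0 the distribution is uniform over the 4 outputs (H = log2 4 = 2 bits),
# and H is non-increasing as eps grows, matching the monotonicity proof.
```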
Fig. 4. Moving average filtering (MAF) perturbation of Gaussian mechanism (GM) for different N
Fig. 5. Adding absolute value, square, and sine transformation perturbations of Laplace mechanism (LM) and Gaussian mechanism (GM)
monotonicity of differential privacy for numeric data, where b is the scale
parameter of the probability distribution generating the noise X and k is any
appropriate real number.
6 Conclusion
Acknowledgments. This work was supported by the National Natural Science Foun-
dation of China with No. 61173190, No. 61602290, and No. 61662009, Natural Sci-
ence Basic Research Program of Shaanxi Province with No. 2017JQ6038, Fundamental
Research Funds for the Central Universities with No. GK201704016, No. 2016CBY004,
No. GK201603093, No. GK201501008, and No. GK201402004, Program of Key Science
and Technology Innovation Team in Shaanxi Province with No. 2014KTC-18, and
Open Project Fund of Guizhou Provincial Key Laboratory of Public Big Data with
No. 2017BDKFJJ026.
References
1. Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in
private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876,
pp. 265–284. Springer, Heidelberg (2006). https://doi.org/10.1007/11681878_14
2. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our data, our-
selves: privacy via distributed noise generation. In: Vaudenay, S. (ed.) EURO-
CRYPT 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006).
https://doi.org/10.1007/11761679_29
3. McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: 48th
Annual IEEE Symposium on Foundations of Computer Science, pp. 94–103. IEEE
(2007)
4. Warner, S.L.: Randomized response: a survey technique for eliminating evasive
answer bias. J. Am. Stat. Assoc. 60(309), 63–69 (1965)
5. Katz, J., Lindell, Y.: Introduction to Modern Cryptography. CRC Press, Boca
Raton (2014)
6. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found.
Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)
7. Yang, B., Sato, I., Nakagawa, H.: Bayesian differential privacy on correlated data.
In: Proceedings of the 2015 ACM SIGMOD International Conference on Manage-
ment of Data, pp. 747–762. ACM (2015)
8. Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K.,
Zhang, L.: Deep learning with differential privacy. In: Proceedings of the 2016
ACM SIGSAC Conference on Computer and Communications Security, pp. 308–
318. ACM (2016)
9. Tong, W., Hua, J., Zhong, S.: A jointly differentially private scheduling protocol for
ridesharing services. IEEE Trans. Inf. Forensics Secur. 12(10), 2444–2456 (2017)
10. Mitrovic, M., Bun, M., Krause, A., Karbasi, A.: Differentially private submodular
maximization: data summarization in disguise. In: International Conference on
Machine Learning, pp. 2478–2487. ACM (2017)
11. Qin, Z., Yang, Y., Yu, T., Khalil, I., Xiao, X., Ren, K.: Heavy hitter estimation
over set-valued data with local differential privacy. In: Proceedings of the 2016
ACM SIGSAC Conference on Computer and Communications Security, pp. 192–
203. ACM (2016)
12. Fanaee-T, H., Gama, J.: Event labeling combining ensemble detectors and back-
ground knowledge. Prog. Artif. Intell. 2(2–3), 113–127 (2014)
13. Lipowski, A., Lipowska, D.: Roulette-wheel selection via stochastic acceptance.
Phys. A: Stat. Mech. Appl. 391(6), 2193–2196 (2012)
The De-anonymization Method Based
on User Spatio-Temporal Mobility Trace
Zhenyu Chen1,2(B), Yanyan Fu1, Min Zhang1, Zhenfeng Zhang1, and Hao Li1
1 Institute of Software, Chinese Academy of Sciences, Beijing, China
{chenzhenyu,fuyy,mzhang,zfzhang,lihao}@tca.iscas.ac.cn
2 University of Chinese Academy of Sciences, Beijing, China
Abstract. Nowadays, user mobility traces, in addition to other types of data,
are increasingly likely to de-anonymize users. Among existing approaches,
model-based approaches usually provide more accurate de-anonymization than
feature-locations-based approaches. More recently, Hidden Markov Model (HMM)
based approaches have been proposed to find the mobility patterns in user
mobility traces. However, in the key step of hidden state definition, existing
models rely on a fixed classification of time, space, or number, which can
hardly suit all users. In this paper, we propose a user de-anonymization method
based on HMM. Different from current approaches, the method utilizes a novel
density-based HMM, which uses density-based clustering to obtain the hidden
states of the HMM from three-dimensional (time, latitude, and longitude)
spatio-temporal data, and provides much better performance. Furthermore, we
also propose a frequent spatio-temporal cube filter (FSTC-Filter) which
significantly reduces the number of candidate models and thus improves the
efficiency.
1 Introduction
With the popularity of smartphones, application service providers can easily
collect personal activity information from billions of users, but this also
brings a massive risk of user privacy leakage. Usually, these providers
anonymize the data before publishing it to the public, for instance by
substituting users' identifiers (e.g., name, SSN) with pseudonyms, but user
identity can still be compromised by more in-depth study. Some famous
de-anonymization examples include discovering political stances from Netflix
viewing histories and linking Netflix accounts to real users [10], and
distinguishing users in anonymous social networks by comparing the similarity
of graph structures [9,11]. Recently, user mobility traces have been used to
predict future locations [5,14,15], to distinguish abnormal users [6,7], and
also to de-anonymize users.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 459–471, 2018.
https://doi.org/10.1007/978-3-319-89500-0_40
Early studies [2,3,8,17] usually take several feature places (e.g., most
frequent locations) of each user as the user's mobility profile. Some studies
[4,12,13,18] build a statistical probability model (e.g., Markov chain, HMM) as
the mobility profile. As we will see in Sect. 2, model-based approaches usually
provide more accurate matches than feature-locations-based approaches. More
recently, several HMM-based approaches can better find the activity patterns in
user mobility traces. In these approaches, the core part is the definition of
hidden states, which determines the effectiveness of the model. Existing models
choose hidden states either by a fixed classification of time and space or by
EM clustering, which cannot handle all users simultaneously.
In this paper, we propose a novel density-based HMM, which uses density-based
clustering to obtain the hidden states of the HMM from three-dimensional (time,
latitude, and longitude) data. Rather than assuming everyone has the same hidden
states, our density-based HMM automatically extracts distinct hidden states
from each user's mobility trace, considering the temporal and spatial density
features of the data. Furthermore, we also propose a frequent spatio-temporal
cube filter (FSTC-Filter) to improve de-anonymization efficiency.
In general, the contributions of this paper are summarized as follows:
1. We define a density-based HMM (DBHMM) to profile user mobility. DBHMM
establishes an accurate mobility model by generating more personalized hidden
states for each user. The clusters (hidden states) and their number are generated
directly from each user's data.
2. We introduce a frequent spatio-temporal cube filter (FSTC-Filter) to reduce
the number of candidate models for anonymous trace matching. It improves
the efficiency of the attack while ensuring accuracy.
3. We perform experiments based on two real datasets. Results show that
DBHMM describes user behavior more accurately than other models, and that
FSTC-Filter precisely filters out many incorrect users to improve efficiency.
The rest of this paper is organized as follows. Section 2 highlights related
works and analyses existing approaches. Section 3 introduces the framework of
our method. Two essential components are elaborated in Sects. 4 and 5, respec-
tively. Extensive experimental evaluation is presented in Sect. 6. Finally, Sect. 7
concludes this paper.
2 Related Work
Existing de-anonymization approaches based on user locations can be classified
into two major categories. One kind builds a user mobility profile by extracting
his/her feature locations, which we call feature-locations-based approaches.
The other builds a statistical probability model from his/her mobility traces,
which we call model-based approaches.
Feature-locations-based approaches characterize a user by his/her feature
locations (usually frequent locations). Zang and Bolot [17] performed a study
on the top-k most frequently visited locations of each user. De Montjoye et al.
approaches. However, two deficiencies remain. Firstly, allocating hidden states
to all users by the same rules may be inappropriate for some users. Secondly,
the time cost is far higher than that of feature-locations-based approaches. We
will discuss and improve these points in detail in Sects. 4 and 5, respectively.
3 Overview
(Fig. 1. Framework of the method: the training dataset is used to generate hidden states via density-based clustering, which are then used to train density-based HMMs as user mobility models.)
Figure 1 depicts the framework of our method, which includes two major
phases: the training phase and the attack phase. The training phase is the
preparation stage for de-anonymization; it is used to train a statistical
probability model as a mobility profile for each user. We first generate
personalized hidden states by a density-based clustering algorithm, then train
the density-based HMM on the above hidden states. The attack phase is the
execution stage for de-anonymization; it is used to find the user who is the
most probable owner of the anonymous mobility traces. In this phase, we
de-anonymize anonymous data in three steps. Firstly, we utilize the frequent
spatio-temporal cube filter (FSTC-Filter) to reduce the number of candidate
models for anonymous trace matching. Then, we use the Viterbi algorithm to
compute matching probabilities between the anonymous traces and the filtered
candidate models. Finally, through ranking and voting, we pick a final match
result from the top-k candidate sets.
(Figure: spatio-temporal points of userA and userB plotted over latitude and longitude.)
automatically obtain all high-density areas, which may be of uncertain number
and arbitrary shape. On the other hand, a dense area of spatio-temporal points
usually implies a pattern of user mobility, which corresponds to the purpose of
hidden state generation.
We build our hidden state generation algorithm by extending the density
joinable cluster (DJ-Cluster) algorithm (a typical density-based clustering
algorithm [20]) to three-dimensional space. The main idea of DJ-Cluster is to
generate initial clusters from the parameters Minpts and ε, then merge the
initial clusters step by step until no more merges are possible. Minpts is the
minimal number of points needed to create an initial cluster, while ε is the
maximum radius of the initial cluster. The Euclidean distance d(p1, p2) is
adopted to measure the distance between two spatio-temporal points
p1(t1, Lat1, Lon1) and p2(t2, Lat2, Lon2). More specifically, d(p1, p2) is
defined as follows:
$$d(p_1, p_2) = \sqrt{(t_1 - t_2)^2 + a \cdot (Lat_1 - Lat_2)^2 + b \cdot (Lon_1 - Lon_2)^2}$$
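A direct implementation of this distance; the weights a and b, which trade spatial degrees off against time units, are illustrative defaults here:

```python
import math

def spatio_temporal_distance(p1, p2, a=1.0, b=1.0):
    # p = (t, Lat, Lon); weighted Euclidean distance in (time, latitude, longitude) space.
    t1, lat1, lon1 = p1
    t2, lat2, lon2 = p2
    return math.sqrt((t1 - t2) ** 2 + a * (lat1 - lat2) ** 2 + b * (lon1 - lon2) ** 2)
```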
(Figure: clustering result of one user's records in the three-dimensional (time in half-hours, latitude, longitude) space.)
$$\pi_i = p(s_i) = \frac{count(s_i)}{\sum_{j=1}^{m} count(s_j)}$$
$$a_{i,j} = \frac{T(s_i, s_j)}{T(s_i)}$$
where T(s_i) indicates how many times the user visits a place in s_i with the
next place not in s_i, while T(s_i, s_j) indicates how many times the user visits
a place in s_i and visits the next place in s_j.
– Emission probability set E = {e_1(o_1), ..., e_k(o_i), ..., e_m(o_n)}; each
element e_k(o_i) is the probability that the observation is o_i when the state
is s_k, defined as
$$e_k(o_i) = p(o_i \mid s_k) = \frac{\phi(o_i \mid s_k)}{\sum_{j=1}^{n} \phi(o_j \mid s_k)}$$
where φ(o_j | s_k) indicates how many times the user visits o_j in s_k.
In particular, we note that user behavior is not exactly the same every day;
for example, most people do not go to their workplaces on weekends. Therefore,
we build user mobility models at three granularities: firstly, each user has
only a single model; secondly, each user has a rest-day model and a workday
model; thirdly, each user has seven models (from Sunday to Saturday).
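The HMM parameter sets (π, A, E) defined above can be estimated by simple counting over a state-annotated trace. In this sketch (names are ours), T(s_i) is taken as the count of all transitions out of s_i, a simplification of the paper's definition:

```python
from collections import Counter

def estimate_hmm_params(state_seq, obs_seq):
    # state_seq[i] is the hidden state (cluster) at step i; obs_seq[i] its observation.
    state_counts = Counter(state_seq)
    total = len(state_seq)
    pi = {s: c / total for s, c in state_counts.items()}      # initial probabilities

    trans = Counter(zip(state_seq, state_seq[1:]))            # T(s_i, s_j)
    leaving = Counter(state_seq[:-1])                         # transitions out of s_i
    a = {(si, sj): c / leaving[si] for (si, sj), c in trans.items()}

    emit = Counter(zip(state_seq, obs_seq))                   # phi(o | s)
    e = {(s, o): c / state_counts[s] for (s, o), c in emit.items()}
    return pi, a, e
```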
anonymous trace matches with a user who has 10 hidden states, the time cost of
the Viterbi algorithm is 100 times that of feature-locations-based matching.
There are also scalability concerns about the number of users: in reality,
de-anonymization attacks may involve millions of users, far more than in
research experiments.
To overcome this deficiency, we introduce a frequent spatio-temporal cube
filter (FSTC-Filter) to reduce the number of users to match against an anonymous
mobility trace. We divide the three-dimensional (time, latitude, and longitude)
space into spatio-temporal cubes: one day is divided into 24 intervals by hour,
and latitude and longitude are divided into intervals of 1 km. The maximum number
of cubes is Nx * Ny * 24, and each cube's volume is about 1 km * 1 km * 1 h,
where Nx and Ny are the numbers of latitude and longitude segments in the user's
trace range. A cube is an FSTC when points of the user's trace appear in the
cube more often than a threshold (which depends on the density of the dataset).
We compute the overlapping proportion x of the anonymous trace t_a with each
candidate's FSTCs, and set a non-overlapping threshold λ. If x ≥ 1 − λ, the
candidate passes on to the DBHMM matching step; otherwise that step is skipped.
The proportion x is calculated as follows:
$$x = \frac{|c \in t_a \cap FSTC_i|}{|t_a|}$$
where |c ∈ t_a ∩ FSTC_i| is the number of cubes of t_a that are also in the FSTC
set, counting a cube each time it appears, and |t_a| is the number of cubes
in t_a.
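The filter can be sketched as follows, under our reading (consistent with the λ experiments in Sect. 6) that a candidate proceeds to DBHMM matching only when x ≥ 1 − λ; the cube encoding (1 km ≈ 0.009° of latitude) is an illustrative approximation:

```python
def cube(point, lat0, lon0):
    # point = (hour, lat, lon); ~1 km x 1 km x 1 h cubes (1 km ~ 0.009 degrees of latitude).
    hour, lat, lon = point
    return (int(hour), int((lat - lat0) / 0.009), int((lon - lon0) / 0.009))

def overlap_proportion(trace_cubes, fstc_set):
    # x = |c in t_a intersect FSTC_i| / |t_a|, counting a cube each time it appears.
    hits = sum(1 for c in trace_cubes if c in fstc_set)
    return hits / len(trace_cubes)

def passes_filter(trace_cubes, fstc_set, lam):
    # Candidate proceeds to full DBHMM matching only when x >= 1 - lambda.
    return overlap_proportion(trace_cubes, fstc_set) >= 1 - lam
```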
6 Experiment
In this section, we evaluate our method based on the datasets described in
Sect. 6.1. All code was implemented in Java, and the experiments were conducted
on a computer with an Intel Core i7-4770 3.4 GHz CPU and 8 GB memory.
6.1 Data
In this study, we consider two widely used real-life mobility trace datasets,
namely GeoLife [19] and Gowalla [16]. Note that we use only the longitude,
latitude, date, and time from these datasets.
GeoLife contains GPS traces of 182 users in Beijing, sampled every 5 s from
Apr. 2007 to Oct. 2011. Each record includes pseudonym, latitude, longitude,
altitude, date, and time. To ensure the accuracy of the models, we use only the
data of users who have more than 15 daily records; 105 users meet this standard.
Gowalla contains 456,988 check-in records of 10,162 users in California
and Nevada between Feb. 2009 and Oct. 2010. Each record includes user ID,
time, coordinates (latitude and longitude), and POI information. Our experiment
chooses the 1072 users who have more than 100 records.
We divided each dataset into two parts: the training set and the test set.
We randomly chose 80% of the data as the training set and the remaining 20% as
the test set; the two sets do not overlap.
The raw datasets are not suitable for user modeling directly. First, they can
cause overfitting: because their precision is centimeter-level, records of a
user at the same place may be treated as different locations (e.g.,
p1(39.994622, 116.326757) and p2(39.994614, 116.326753) are about 80 cm apart,
but p1 ≠ p2). This overfitting leads to mismatches between anonymous traces and
candidate users. Second, high-precision data requires excessive computation. In
the experiment, the precision of the raw spatio-temporal data is therefore
generalized: latitude and longitude precision is about 100 m, and time precision
is half an hour.
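The generalization step can be sketched as simple rounding; the exact grid (0.001° ≈ 100 m, 1800 s slots) is our assumption matching the stated precisions. Note that the example points p1 and p2 above then collapse to the same generalized location:

```python
def generalize(record):
    # record = (seconds_since_midnight, lat, lon)
    sec, lat, lon = record
    slot = sec // 1800                               # half-hour time slot (1800 s)
    return (slot, round(lat, 3), round(lon, 3))      # 0.001 degrees ~ 100 m
```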
(Figure: de-anonymization accuracy of MMC, Rand4, UHMM, and our method on the Geolife and Gowalla datasets.)
may cause wrong matches. The accuracy of Rand4 on Geolife is the worst, but on
Gowalla the Rand4 method is better than MMC and UHMM. This is probably because
Gowalla consists of points users actively published, so 4 randomly selected
points may uniquely identify a user, whereas 4 randomly selected points in
Geolife may be useless points that belong to multiple users' traces.
The accuracy of all methods on Gowalla is better than on Geolife. Gowalla is
a check-in dataset, where each point carries much subjective information and
behavioral characteristics; Geolife is GPS data collected by devices, where most
points provide non-subjective and irrelevant information. Besides, Geolife
includes traces of MSRA employees who work at the same location, so its mobility
trace similarity is higher than the average level.
(Fig. 5. (a) Accuracy and recall rate versus λ; (b) running time versus λ.)
recall rate increase slowly with the parameter λ before gradually becoming
stable. The increasing trend of the recall rate is slightly greater than that of
accuracy: when λ is small, some correct users may be filtered out, so recall
rate and accuracy increase with λ; as λ becomes large, more users pass the
filter, resulting in a larger increase in recall rate than in accuracy.
Figure 5b shows the total cost of matching 1861 traces (4,861,679 records in
total) against 105 models (1861 * 105 match probabilities are calculated). The
running time increases exponentially with λ, because increasing λ lets many
incorrect users pass the filter. As we can see, λ = 0.3 is a turning point at
which both accuracy and recall rate become stable while the running time is
still in its smooth growth stage.
(Figures: accuracy versus the clustering radius ε (m); accuracy versus top-k; accuracy under the three model granularities (day, rest&work, week).)
7 Conclusion
In this paper, we propose a de-anonymization attack that infers the owner of
an anonymous set of user mobility traces. Our method distinguishes itself from
existing methods in three aspects. (1) It analyses the advantages of
feature-locations-based and model-based approaches. (2) It uses frequent
spatio-temporal cubes to perform rational coarse-grained filtering. (3) It uses
density-based clustering to extract hidden states of the HMM that reflect the
subjective initiative of each user and capture personalized user features. Our
experiments on two real-life datasets show that our method is effective on
different types of user mobility traces.
In the future, we plan to extend the current work along the following avenues.
We will introduce semantic and social elements into our model, integrating
multiple signals to generate a comprehensive view of user behavior. We will also
explore a privacy protection mechanism that resists this de-anonymization attack
while ensuring data usability.
References
1. Ayhan, S., Samet, H.: Aircraft trajectory prediction made easy with predictive
analytics. In: Proceedings of the 22nd International Conference on Knowledge Dis-
covery and Data Mining (2016)
2. De Montjoye, Y.-A., Hidalgo, C.A., Verleysen, M., Blondel, V.D.: Unique in the
crowd: the privacy bounds of human mobility. Sci. Rep. 3, 1376 (2013)
3. Freudiger, J., Shokri, R., Hubaux, J.-P.: Evaluating the privacy risk of location-
based services. In: Danezis, G. (ed.) FC 2011. LNCS, vol. 7035, pp. 31–46. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-27576-0_3
4. Gambs, S., Killijian, M.-O., del Prado Cortez, M.N.: De-anonymization attack on
geolocated data. J. Comput. Syst. Sci. 80(8), 1597–1614 (2014)
5. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.-L.: Understanding individual human
mobility patterns. Nature 453(7196), 779–782 (2008)
6. Lin, M., Cao, H., Zheng, V., Chang, K.C., Krishnaswamy, S.: Mobile user verifica-
tion/identification using statistical mobility profile. In: 2015 International Confer-
ence on Big Data and Smart Computing (BigComp), pp. 15–18. IEEE (2015)
7. Lin, M., Cao, H., Zheng, V., Chang, K.C.-C., Krishnaswamy, S.: Mobility profiling
for user verification with anonymized location data. In: Twenty-Fourth Interna-
tional Joint Conference on Artificial Intelligence (2015)
8. Naini, F.M., Unnikrishnan, J., Thiran, P., Vetterli, M.: Where you are is who you
are: user identification by matching statistics. IEEE Trans. Inf. Forensics Secur.
11(2), 358–372 (2016)
9. Narayanan, A., Shi, E., Rubinstein, B.I.P.: Link prediction by de-anonymization:
how we won the kaggle social network challenge. In: The 2011 International Joint
Conference on Neural Networks (IJCNN), pp. 1825–1834. IEEE (2011)
10. Narayanan, A., Shmatikov, V.: Robust de-anonymization of large sparse datasets.
In: IEEE Symposium on Security and Privacy, SP 2008, pp. 111–125. IEEE (2008)
11. Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In: 2009 30th
IEEE Symposium on Security and Privacy, pp. 173–187. IEEE (2009)
12. Pan, J., Rao, V., Agarwal, P., Gelfand, A.: Markov-modulated marked Poisson
processes for check-in data. In: International Conference on Machine Learning, pp.
2244–2253 (2016)
13. Wang, R., Zhang, M., Feng, D., Fu, Y., Chen, Z.: A de-anonymization attack on
geo-located data considering spatio-temporal influences. In: Qing, S., Okamoto, E.,
Kim, K., Liu, D. (eds.) ICICS 2015. LNCS, vol. 9543, pp. 478–484. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-29814-6_41
14. Wang, Y., Yuan, N.J., Lian, D., Xu, L., Xie, X., Chen, E., Rui, Y.: Regularity and
conformity: location prediction using heterogeneous mobility data. In: Proceedings
of the 21st ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pp. 1275–1284. ACM (2015)
15. Xue, A.Y., Zhang, R., Zheng, Y., Xie, X., Huang, J., Xu, Z.: Destination prediction
by sub-trajectory synthesis and privacy protection against such prediction. In: 2013
IEEE 29th International Conference on Data Engineering (ICDE), pp. 254–265.
IEEE (2013)
16. Yuan, Q., Cong, G., Ma, Z., Sun, A., Thalmann, N.M.: Time-aware point-of-
interest recommendation. In: Proceedings of the 36th International ACM SIGIR
Conference on Research and Development in Information Retrieval, pp. 363–372.
ACM (2013)
17. Zang, H., Bolot, J.: Anonymization of location data does not work: a large-scale
measurement study. In: Proceedings of the 17th Annual International Conference
on Mobile Computing and Networking, pp. 145–156. ACM (2011)
18. Zhang, C., Zhang, K., Yuan, Q., Zhang, L., Hanratty, T., Han, J.: Gmove: group-
level mobility modeling using geo-tagged social media. In: Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Min-
ing, pp. 1305–1314. ACM (2016)
19. Zheng, Y., Zhang, L., Xie, X., Ma, W.-Y.: Mining interesting locations and travel
sequences from GPS trajectories. In: Proceedings of the 18th International Con-
ference on World Wide Web, pp. 791–800. ACM (2009)
20. Zhou, C., Frankowski, D., Ludford, P., Shekhar, S., Terveen, L.: Discovering per-
sonally meaningful places: an interactive clustering approach. ACM Trans. Inf.
Syst. (TOIS) 25(3), 12 (2007)
Privacy-Preserving Disease Risk Test
Based on Bloom Filters
1 Introduction
The first full sequencing of a human genome (the "Human Genome Project")1 was
completed in 2003; it took 13 years, cost 3 billion dollars, and involved more
than 20 institutions worldwide. Since then, numerous companies and research
institutions have been moving toward ever more affordable and accurate
technologies. For example, the genomics company Illumina brought the price of
sequencing a human genome down to $1,000 in 2014, and the company claims that
its newest machine can bring the cost down to $100 over the next few years.
Advances in Whole Genome Sequencing (WGS) make genome testing available to the
masses and, at the same time, stimulate the development of personalized
healthcare such as predicting disease predisposition or response to treatment.
Some commercial companies (e.g., 23andMe2) already provide low-cost risk tests
to their customers for certain diseases.
1 The Human Genome Project, https://www.genome.gov/12011238/an-overview-of-the-human-genomeproject/.
2 https://www.23andme.com. Accessed 8 Sep 2017.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 472–486, 2018.
https://doi.org/10.1007/978-3-319-89500-0_41
4 See the definition of homomorphic encryption in Sect. 3.3.
5 We discuss the reason for choosing 100 in Sect. 4.1.
(with false positive rate 0.1%). We link one ciphertext to each bloom filter at
the leaf node level to represent the encrypted states of SNPs. In [11,14], one
ciphertext (encrypted state) is stored for each SNP position, whereas we store
one ciphertext for 100 SNP positions; ignoring the small-size index structure,
we thus reduce the storage overhead by 100x. On the other hand, a bloom filter
provides fast searching: for example, we only need to compute ten hash functions
to check whether an element belongs to a bloom filter containing 1 million
elements (with false positive rate 0.1%). If MU wants to conduct a disease risk
test for a patient, MU sends a list of encrypted positions to SPU. SPU then
checks the bloom filter tree hierarchically and returns the corresponding
encrypted states to MU if they are contained. The disease risk is computed
on the MU's side via homomorphic operations. Once the computation is done,
the final disease risk is disclosed to MU under the authorization of the patient.
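As a concrete illustration of the membership test just described, here is a minimal bloom filter sketch (illustrative only; the class, parameter sizes and rsID inputs are our own stand-ins, not the paper's implementation):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: m bits, k salted SHA-256 hash functions."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Derive k bit indexes by hashing the item with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(b"%d:%s" % (i, item.encode())).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Sized (hypothetically) for ~100 SNP positions at roughly 0.1% FPR.
bf = BloomFilter(m=4800, k=10)
for snp_position in ["rs7412", "rs429358", "rs1801133"]:
    bf.add(snp_position)
print(bf.contains("rs7412"), bf.contains("rs9999999"))
```

A real deployment would insert encrypted position values rather than raw rsIDs; the membership mechanics are identical.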
The main contributions of our work are summarized as follows.
(1) We put forward a general framework to reduce the storage overhead of existing
works by 100x, regardless of the underlying cryptosystem. To be specific,
we insert 100 positions into a bloom filter and link one ciphertext to the
bloom filter to represent the encrypted states of SNPs.
(2) For the genetic disease risk test, we are the first to create an index for the
encrypted SNPs stored at SPU. We organize bloom filters in a tree structure.
Compared to a B+ tree index, we speed up the searching of SNPs by 60x.
(3) We implement our scheme with the SNPs of a real person. The experimental
results show the practicality of our scheme.
2 Related Work
Ziegeldorf et al. perform string matching with bloom filters [18] on the Track
3 dataset provided by the iDASH Secure Genome Analysis Competition. They
assume n patients have SNPs at the same m locations, without considering
the fact that different patients own different sets of 4 million SNPs. Our paper
handles the positions and states of the SNPs separately. We focus on the
storage and searching of SNPs [11,12,14] in this paper, but beyond that, there
exists related work worth discussing. Since the genome contains the
most sensitive information, how and where should a patient’s genomic data be
stored have always been controversial. In the cloud? In a physician’s office? On a
smartphone? All have strengths and limitations in terms of portability, capacity,
computing power and privacy. One personal genomic toolkit called GenoDroid is
implemented on the Android platform [19]. To reduce the computation on mobile
phones, the genomic data and certain cryptographic operations should be pre-
processed on a laptop. On the other hand, although the SNPs are encrypted
at SPU, an honest-but-curious SPU learns which SNPs are accessed and how
often they are used. Therefore, SPU might try to infer the nature of the ongoing test.
To hide the access pattern, [20] uses Oblivious RAM (ORAM) to store DNA in
several small encrypted blocks. The blocks are accessed in an oblivious manner.
476 J. Zhang et al.
However, the usage of ORAM inevitably incurs a large cost due to the periodic
reshuffling process, which is impractical in real-life scenarios. Furthermore, as
MU knows the SNP weights, the risk computation can be seen as a linear
equation in which the states of the SNPs are the unknowns, so MU can launch
a brute-force attack. For example, assume 10 SNPs are used in the disease risk
test. Each SNP has only 3 different states, so MU can try all 3^10 combinations
of SNP states. If one of these potential end-results matches the actual
end-result of the test, MU learns the states of the relevant SNPs.
One simple obfuscation method is to provide the end-result of the genetic test to
MU as a range. Ayday et al. therefore discussed how to use privacy-preserving
integer comparison to report the range of a genetic test [21]. Provided that the
result range is divided into 4 smaller ranges [0, 0.25), [0.25, 0.50), [0.50, 0.75)
and [0.75, 1], SPU compares the encrypted disease risk with those pre-defined
boundaries to determine the range that the test result falls in. Moreover, they
take into account clinical and environmental data to compute the final disease
risk, such as hypertension, age, smoking and family disease history. The above
techniques can be combined with our scheme proposed in this paper.
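The brute-force attack described above is cheap to run in the clear. The sketch below uses hypothetical weights (chosen so that weighted sums are unique, which makes the attack maximally effective; real study weights would differ):

```python
from itertools import product

# Hypothetical SNP weights known to MU (illustrative values chosen so that
# every combination of states yields a distinct weighted sum).
weights = [3**i for i in range(10)]

def risk(states):
    # Weighted averaging over states in {0, 1, 2} (risk-allele counts).
    return sum(w * s for w, s in zip(weights, states)) / (2 * sum(weights))

secret_states = (1, 0, 2, 1, 0, 0, 2, 1, 1, 0)  # patient's true (hidden) states
observed = risk(secret_states)                   # the end-result MU sees

# Enumerate all 3^10 = 59049 combinations; keep those matching the result.
matches = [s for s in product((0, 1, 2), repeat=10) if risk(s) == observed]
print(len(matches))  # a single match pins down every SNP state
```

Reporting only a range instead of the exact end-result, as in [21], leaves many candidate combinations and blunts this enumeration.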
3 Preliminaries
3.1 Genomics Background
The human genome is encoded in double stranded DNA molecules consisting of
two complementary polymer chains. Each chain consists of simple units called
nucleotides or "bases": adenine (A), cytosine (C), guanine (G) and thymine (T).
Specifically, A/T and C/G are complementary pairs.
genome consists of approximately three billion nucleotides, of which only around
0.1% of one individual’s DNA is different from others due to genetic variation.
The remaining 99.9% of the genome is identical between any two individuals.
What is a SNP? A Single Nucleotide Polymorphism (SNP) is the most common
type of DNA variation: a variation in a single nucleotide that occurs at a specific
position in the genome. For example, at a specific position of two DNA fragments
in Fig. 2, the nucleotide C may appear in most individuals, but in a minority of
individuals, the position is occupied by base T. In this case, we say there is a
SNP at this position and we call the two possible nucleotide variations (C or T)
alleles. Each individual carries two alleles at each SNP (one inherited from the
father and one from the mother). There are approximately 50 million approved
SNPs in the human population and each patient carries on average 4 million
variants. Different patients own different sets of 4 million SNPs. For a particular
patient, these 4 million SNPs are called real SNPs and the remaining SNPs are
considered fictitious SNPs.
Relationship Between SNPs and Genetic Diseases: Multiple Genome-Wide
Association Studies (GWAS) have shown that a patient's susceptibility
to genetic diseases can be predicted from SNPs. Assume that the SNP in Fig. 2
(with alleles C and T) is relevant to a disease and T is the one carrying the disease
risk, then an individual with TT has the highest risk of developing the disease,
CT implies a lower risk, and individuals with CC are least likely to suffer from
the disease. We use {0, 1, 2} to represent the number of risk alleles. Furthermore,
we adopt the encoding scheme in [14] to express {0, 1, 2} as {100, 010, 001}, as
binary encoding of SNPs is able to remove non-linear operations (i.e., squaring).
Disease Risk Computation: There are different functions for computing disease
risk. One popular method is weighted averaging, which computes the predicted
susceptibility by weighting the SNPs by their contributions. For each
SNPi , the contribution amount wi is known by the medical unit, determined by
previous studies on case and control groups. To be specific, wi relies on the
position contribution pi and the state contribution ci , and we have wi = pi ci .
Denoting the set of SNPs of a user associated with disease X as I, the risk can
be calculated by S = (Σ_{i∈I} wi) / |I|.
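Under the {100, 010, 001} encoding, selecting a state's contribution becomes a plain inner product with the one-hot vector, which additively homomorphic ciphertexts support directly. A tiny sketch with made-up contribution values (not real study figures):

```python
# One-hot encoding of risk-allele counts: 0 -> (1,0,0), 1 -> (0,1,0), 2 -> (0,0,1).
ONE_HOT = {0: (1, 0, 0), 1: (0, 1, 0), 2: (0, 0, 1)}

def snp_contribution(p_i, c_i, state):
    # c_i lists one contribution per possible state; selecting the right one is
    # a linear inner product with the one-hot vector (no squaring needed).
    return p_i * sum(c * b for c, b in zip(c_i, ONE_HOT[state]))

# Hypothetical (position contribution, per-state contributions, state) triples.
snps = [(0.5, (0.0, 0.3, 0.9), 2),
        (0.3, (0.1, 0.5, 1.0), 1),
        (0.2, (0.0, 0.2, 0.7), 0)]

S = sum(snp_contribution(p, c, s) for p, c, s in snps) / len(snps)
print(round(S, 3))
```

Because the selection is linear in the encoded state bits, it can be evaluated over additively homomorphic ciphertexts of those bits.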
Key Generation: Given two safe primes p and q, we compute N = pq and
g = −a^{2N} mod N^2 (a ∈ Z*_{N^2}); the secret key is s ∈ [1, N^2/2] and the
public key is (N, g, h = g^s mod N^2).
Encryption: To encrypt a plaintext m ∈ Z_N, we select a random r ∈ [1, N/4]
and generate the ciphertext E(m) = (C_m^(1), C_m^(2)) as below:

  C_m^(1) = g^r mod N^2 and C_m^(2) = h^r (1 + mN) mod N^2   (2)

Decryption:

  t = C_m^(2) / (C_m^(1))^s mod N^2,  m = (t − 1 mod N^2) / N   (3)

Additive Homomorphism: Suppose that we have two plaintexts m1, m2 with
ciphertexts E(m1) = (C_m1^(1), C_m1^(2)) and E(m2) = (C_m2^(1), C_m2^(2)).
The ciphertext E(m1 + m2) can be computed as
E(m1 + m2) = (C_m1^(1) C_m2^(1), C_m1^(2) C_m2^(2)).
Proxy Re-encryption: A ciphertext E(m) = (C_m^(1), C_m^(2)) can be
re-encrypted under the same public key by using a new random number
r1 ∈ [1, N/4]:

  C'_m^(1) = g^{r1} C_m^(1) and C'_m^(2) = h^{r1} C_m^(2)   (4)
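The cryptosystem above can be prototyped directly to check the algebra of Eqs. (2)–(4). This sketch uses toy safe primes p = 11 and q = 23, far too small for real security, purely to exercise the arithmetic:

```python
import random

# Toy parameters: safe primes p = 11, q = 23 (demo only, NOT secure).
p, q = 11, 23
N = p * q
N2 = N * N

a = 2                                 # any a in Z*_{N^2}
g = (-pow(a, 2 * N, N2)) % N2
s = random.randrange(1, N2 // 2)      # secret key
h = pow(g, s, N2)                     # public-key component h = g^s mod N^2

def enc(m):
    r = random.randrange(1, N // 4)
    return (pow(g, r, N2), pow(h, r, N2) * (1 + m * N) % N2)

def dec(c):
    c1, c2 = c
    t = c2 * pow(c1, -s, N2) % N2     # t = (1 + mN) mod N^2
    return (t - 1) // N

def add(ca, cb):
    # Additive homomorphism: component-wise multiplication of ciphertexts.
    return (ca[0] * cb[0] % N2, ca[1] * cb[1] % N2)

def reenc(c):
    # Proxy re-encryption (Eq. (4)): refresh randomness under the same key.
    r1 = random.randrange(1, N // 4)
    return (pow(g, r1, N2) * c[0] % N2, pow(h, r1, N2) * c[1] % N2)

c5, c7 = enc(5), enc(7)
print(dec(add(c5, c7)), dec(reenc(c5)))   # 12 5
```

Re-encryption changes the ciphertext components while leaving the decryption result untouched, which is exactly what SPU needs when two requests hit the same bloom filter.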
4 Our Scheme
In this paper, we aim to put forward a general framework to reduce the storage
overhead, regardless of the underlying cryptosystem. For a disease risk test, we
want MU to download only the SNPs related to the disease (padding included).
Thus, we need to build an index so that SPU can efficiently select SNPs. Recall
that position and state uniquely define a SNP; we select cryptosystems for
them separately. In Sect. 4.1, we present the building blocks required to
construct a privacy-preserving disease risk test protocol. Then we demonstrate
our scheme in Sect. 4.2.
6. m1 , m2 are plaintexts and E(m1 ), E(m2 ) are their corresponding ciphertexts.
Storage Overview. For 50 million possible SNPs, we create 0.5 million bloom
filters. Each bloom filter contains 100 SNPs with the same state. In order to
represent the encrypted states of SNPs, we link one ciphertext to each bloom
filter. We show the storage architecture in Fig. 5. On the left side, we store the
encrypted states of the SNPs. On the right side, we store the bloom filters. There
is a one-to-one correspondence between the encrypted state and the bloom filter
at each row. To avoid searching all the bloom filters sequentially, we organize
them in a tree structure. For clarity, we show a binary tree example
in Fig. 6. At the leaf node level, we store the 0.5 million bloom filters and the
corresponding ciphertexts. At level i − 1, we store bloom filters with double
the capacity of the bloom filters at level i. However, we want to point out that our
scheme is not restricted to a binary tree; any multi-branch tree can be used. For a
k-branch tree, the size of the bloom filters at level i − 1 is k times the size of the
bloom filters at level i.
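The hierarchical lookup over this tree can be sketched as follows. This toy Python version uses plain sets in place of bloom filters (same membership interface, but without false positives) over a binary tree of four leaves:

```python
class Node:
    def __init__(self, members, children=(), ciphertext=None):
        self.members = members        # stands in for the bloom filter
        self.children = children
        self.ciphertext = ciphertext  # encrypted state, leaf level only

def search(node, position):
    """Descend only into subtrees whose filter claims membership."""
    if position not in node.members:
        return None
    if not node.children:             # leaf: return the linked ciphertext
        return node.ciphertext
    for child in node.children:
        found = search(child, position)
        if found is not None:
            return found
    return None

# Four leaf filters of capacity 2, organized as a binary tree; positions and
# ciphertext labels are made up for illustration.
leaves = [Node({1, 5}, ciphertext="E(state_A)"),
          Node({9, 12}, ciphertext="E(state_A)"),
          Node({20, 31}, ciphertext="E(state_B)"),
          Node({44, 57}, ciphertext="E(state_C)")]
mid = [Node(leaves[0].members | leaves[1].members, leaves[:2]),
       Node(leaves[2].members | leaves[3].members, leaves[2:])]
root = Node(mid[0].members | mid[1].members, mid)
print(search(root, 20), search(root, 99))   # E(state_B) None
```

With real bloom filters a negative answer at an inner node prunes the whole subtree, which is where the 60x search speedup over a B+ tree comes from.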
Why Does a Bloom Filter Contain 100 Positions? The decrease in storage
overhead depends on the capacity of each bloom filter. As the capacity increases,
the storage overhead decreases. As there are three possible states for each SNP,
we only need three bloom filters in the extreme case, with each bloom filter at
maximum capacity. In this case, we get the maximum storage overhead decrease.
However, the length of a bloom filter reveals the frequency of the different states
of SNPs; therefore, we need to fix a constant capacity for the bloom filters. On the
other hand, there is a security weakness: SPU knows that requests made
to the same bloom filter might have the same state. Suppose that there are
b bloom filters and MU requests t real SNPs; the probability that any two
requests fall in the same bloom filter is C(t,2)/b. For fixed t, this probability
decreases as b increases. The number of real SNPs for a particular disease
is quite small (i.e., less than 60)7 . For example, calculating the susceptibility of
Alzheimer's disease requires only 10 SNPs [24], and coronary artery disease risk
computation includes 23 SNPs [25]. With 0.5 million bloom filters, the probability
that any two different requests fall in the same bloom filter is at most
C(60,2)/(5 × 10^5) = 0.35%. Therefore, we choose the constant capacity 100 for the
bloom filters to achieve a balance between the decrease in storage overhead and
privacy.
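The collision bound above is easy to reproduce numerically:

```python
from math import comb

b = 5 * 10**5           # number of bloom filters
for t in (10, 23, 60):  # SNPs per test: Alzheimer's, coronary artery, upper bound
    prob = comb(t, 2) / b   # probability any two requests hit the same filter
    print(t, f"{100 * prob:.2f}%")
```

For t = 60 this gives C(60,2)/b = 1770/500000 ≈ 0.35%, matching the figure in the text.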
7. http://www.eupedia.com/genetics/cancer related snp.shtml, accessed 9 Sep 2017.
to MU. If two or more requests occur in the same bloom filter, SPU should
re-encrypt the ciphertext before returning it to MU (Eq. (4)). Furthermore, we
require SPU to return the encrypted states in the order of MU's requests.
– Step 5: Once it obtains the encrypted states of the requested SNPs, MU computes
the disease risk via homomorphic operations based on weighted averaging.
– Step 6: The encrypted final result is sent to P. After decryption, the result
is returned to MU.
5 Privacy Analysis
The SNPs of a patient contain sensitive information. Homomorphic encryption
makes it possible to compute disease risk for the patients without revealing the
true values of SNPs. At the same time, the medical unit (i.e., pharmaceutical
company) might consider the test specifics a trade secret; hence, MU does
not want to make public the weights (or even the number of SNPs) of the disease test.
Therefore, we conduct a privacy analysis of our proposed scheme in terms of these
two aspects. MU downloads the requested encrypted SNPs and computes
the disease risk locally; therefore, the privacy of the test weights is preserved. Recall
that CI and P are honest parties while SPU and MU are honest-but-curious (Fig. 1);
we consider two kinds of attacks: (i) an attacker at MU trying to learn the SNPs
of the patient (type-1 attack); (ii) an attacker at SPU inferring the requests from
MU and the SNPs of the patient (type-2 attack).
Security Under Type-1 Attack: MU makes requests to SPU for the SNPs of
a patient to conduct a disease risk test, and SPU returns the corresponding encrypted
states of the requested SNPs. Even if different requests occur in the same
bloom filter, SPU re-encrypts the ciphertext; therefore, from the point of view of
MU, the ciphertexts of different requests are indistinguishable. Security
under a type-1 attack relies on the security of the underlying cryptosystem
that encrypts the states. In our scheme, any additively homomorphic cryptosystem
can be used. As long as the underlying cryptosystem is semantically secure,
the attacker at MU is unable to infer any additional information about
the plaintext from the ciphertext. Most additively homomorphic
cryptosystems base their security on decisional residuosity assumptions,
but the two cryptosystems we mentioned in Sect. 4.1 rely on non-residuosity-related
decisional assumptions. To be specific, the security of the modified Paillier
6 Experimental Evaluation
based on a B+ tree, which performs well for range search (e.g., [1, 100]).
However, the encrypted positions of SNPs related to a particular disease may be
scattered throughout the space; in this scenario, the bloom filter is better in terms of
searching speed and storage. Recall that we store 100 SNPs in each bloom filter;
we thus have around 10^4 bloom filters for our dataset. To facilitate presentation, we
organize the bloom filters in a 10-branch tree. The number of bloom filters at
each level and the capacity of each bloom filter are shown in Table 1. We show
the time to search requested SNPs at SPU based on B+ tree or bloom filter
tree in Fig. 7, varying the number of SNPs from 1 to 1000. For B+ tree index,
the searching time increases from 130 ms to 3200 ms with increasing number of
SNPs. By contrast, for bloom filter tree index, the searching time increases from
6.4 ms to 52 ms approximately linearly. On average, we speed up the searching
by 60x.
Since a bloom filter may cause false positives, we also measure the effect of the
false positive rate on the size of our index (the bloom filter tree). Given a fixed n, if we
want to keep the false positive rate p below a threshold ε, we need to set m such that
m ≥ n log2 e · log2 (1/ε) (see Sect. 3.2). For 1 million elements, the size of our index
under false positive rates 1%, 0.1% and 0.01% is 4.84 MB, 6.91 MB and 9.68 MB,
respectively. The number of hash functions used in each bloom filter is 7, 10 and 14,
respectively.
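The sizing rule and hash-function counts can be checked numerically; the script below reproduces k = 7, 10 and 14 from k = ⌈log2(1/ε)⌉ (the reported index sizes also include the inner tree levels, so they exceed the single-level figure computed here):

```python
from math import e, log2, ceil

n = 10**6  # number of elements
for eps in (0.01, 0.001, 0.0001):
    m_bits = n * log2(e) * log2(1 / eps)  # m >= n * log2(e) * log2(1/eps)
    k = ceil(log2(1 / eps))               # hash functions per bloom filter
    print(f"eps={eps}: {m_bits / 8 / 2**20:.2f} MiB, k={k}")
```

The per-level bit count grows only logarithmically in 1/ε, which is why tightening the false positive rate from 1% to 0.01% roughly doubles the index size rather than exploding it.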
In comparison, the size of the B+ tree index generated by MySQL is 48.69 MB.
Therefore, our bloom filter tree index not only speeds up the searching of SNPs
but is also space-efficient. All the existing works (i.e., [11,14]) store one encrypted
state for each SNP; to be specific, one encrypted state corresponds to one
position. In this paper, however, we insert 100 positions with the same state into one
bloom filter and link one ciphertext to the bloom filter, so one encrypted
state corresponds to 100 SNP positions. This is why we can reduce
the storage overhead by 100x. It is shown in [14] that the encryption of 50
million SNPs takes about 100 GB with the modified Paillier cryptosystem and 4.5 GB
with the AH-ECC cryptosystem (under a 112-bit security parameter). As a result, our
scheme makes a big difference in storage overhead regardless of the underlying
cryptosystem.
References
1. Homer, N., Szelinger, S., Redman, M., Duggan, D., Tembe, W., Muehling, J.,
Pearson, J.V., Stephan, D.A., Nelson, S.F., Craig, D.W.: Resolving individuals
contributing trace amounts of DNA to highly complex mixtures using high-density
SNP genotyping microarrays. PLoS Genet. 4(8), e1000167 (2008)
2. Altshuler, D., Daly, M.J., Lander, E.S.: Genetic mapping in human disease. Science
322(5903), 881–888 (2008)
3. Humbert, M., Ayday, E., Hubaux, J.-P., Telenti, A.: Addressing the concerns of
the lacks family: quantification of kin genomic privacy. In: Proceedings of the 2013
ACM SIGSAC Conference on Computer and Communications Security, pp. 1141–
1152. ACM (2013)
4. Erlich, Y., Narayanan, A.: Routes for breaching and protecting genetic privacy.
Nat. Rev. Genet. 15(6), 409–421 (2014)
5. Malin, B.A.: An evaluation of the current state of genomic data privacy protection
technology and a roadmap for the future. J. Am. Med. Inform. Assoc. 12(1), 28–34
(2005)
6. Shringarpure, S.S., Bustamante, C.D.: Privacy risks from genomic data-sharing
beacons. Am. J. Hum. Genet. 97(5), 631–646 (2015)
7. Zhang, Y., Blanton, M., Almashaqbeh, G.: Secure distributed genome analysis
for GWAS and sequence comparison computation. BMC Med. Inform. Decis. Mak.
15(5), S4 (2015)
8. Perl, H., Mohammed, Y., Brenner, M., Smith, M.: Privacy/performance trade-off
in private search on bio-medical data. Future Gener. Comput. Syst. 36, 441–452
(2014)
9. Chen, Y., Peng, B., Wang, X.F., Tang, H.: Large-scale privacy-preserving mapping
of human genomic sequences on hybrid clouds. In: NDSS (2012)
10. Zhou, X., Peng, B., Li, Y.F., Chen, Y., Tang, H., Wang, X.F.: To release or not to
release: evaluating information leaks in aggregate human-genome data. In: Atluri,
V., Diaz, C. (eds.) ESORICS 2011. LNCS, vol. 6879, pp. 607–627. Springer, Hei-
delberg (2011). https://doi.org/10.1007/978-3-642-23822-2 33
11. Ayday, E., Raisaro, J.L., Hubaux, J.-P., Rougemont, J.: Protecting and evaluating
genomic privacy in medical tests and personalized medicine. In: Proceedings of
the 12th ACM Workshop on Workshop on Privacy in the Electronic Society, pp.
95–106. ACM (2013)
12. Ayday, E., Raisaro, J.L., Hubaux, J.-P.: Personal use of the genomic data: privacy
vs. storage cost. In: 2013 IEEE Global Communications Conference (GLOBE-
COM), pp. 2723–2729. IEEE (2013)
13. Falconer, D.S., Mackay, T.F.C., Frankham, R.: Introduction to Quantitative Genet-
ics. Trends in Genetics, vol. 12, no. 7, 4th edn, 280 p. Elsevier Science Publishers
(Biomedical Division), Amsterdam (1996)
14. Danezis, G., De Cristofaro, E.: Fast and private genomic testing for disease suscep-
tibility. In: Proceedings of the 13th Workshop on Privacy in the Electronic Society,
pp. 31–34. ACM (2014)
15. Ugus, O., Westhoff, D., Laue, R., Shoufan, A., Huss, S.A.: Optimized implemen-
tation of elliptic curve based additive homomorphic encryption for wireless sensor
networks. arXiv preprint arXiv:0903.3900 (2009)
16. Huang, R.W., Gui, X.L., Yu, S., Zhuang, W.: Research on privacy-preserving cloud
storage framework supporting ciphertext retrieval. In: 2011 International Confer-
ence on Network Computing and Information Security (NCIS), vol. 1, pp. 93–97.
IEEE (2011)
17. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun.
ACM 13(7), 422–426 (1970)
18. Ziegeldorf, J.H., Pennekamp, J., Hellmanns, D., Schwinger, F., Kunze, I., Henze,
M., Hiller, J., Matzutt, R., Wehrle, K.: BLOOM: bloom filter based oblivious
outsourced matchings. BMC Med. Genomics 10(2), 44 (2017)
19. De Cristofaro, E., Faber, S., Gasti, P., Tsudik, G.: Genodroid: are privacy-
preserving genomic tests ready for prime time? In: Proceedings of the 2012 ACM
Workshop on Privacy in the Electronic Society, pp. 97–108. ACM (2012)
20. Karvelas, N., Peter, A., Katzenbeisser, S., Tews, E., Hamacher, K.: Privacy-
preserving whole genome sequence processing through proxy-aided ORAM. In:
Proceedings of the 13th Workshop on Privacy in the Electronic Society, pp. 1–10.
ACM (2014)
21. Ayday, E., Raisaro, J.L., Laren, M., Jack, P., Fellay, J., Hubaux, J.-P.: Privacy-
preserving computation of disease risk by using genomic, clinical, and environmen-
tal data. In: Proceedings of USENIX Security Workshop on Health Information
Technologies (HealthTech 2013), no. EPFL-CONF-187118 (2013)
22. Broder, A., Mitzenmacher, M.: Network applications of bloom filters: a survey.
Internet Math. 1(4), 485–509 (2004)
23. Bresson, E., Catalano, D., Pointcheval, D.: A simple public-key cryptosystem
with a double trapdoor decryption mechanism and its applications. In: Laih, C.-S.
(ed.) ASIACRYPT 2003. LNCS, vol. 2894, pp. 37–54. Springer, Heidelberg (2003).
https://doi.org/10.1007/978-3-540-40061-5 3
24. Seshadri, S., Fitzpatrick, A.L., Arfan Ikram, M., DeStefano, A.L., Gudnason, V.,
Boada, M., Bis, J.C., Smith, A.V., Carrasquillo, M.M., Lambert, J.C., et al.:
Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA
303(18), 1832–1840 (2010)
25. Rotger, M., Glass, T.R., Junier, T., Lundgren, J., Neaton, J.D., Poloni, E.S., Van’t
Wout, A.B., Lubomirov, R., Colombo, S., Martinez, R., et al.: Contribution of
genetic background, traditional risk factors, and HIV-related factors to coronary
artery disease events in HIV-positive persons. Clin. Infect. Dis. 57(1), 112–121
(2013)
26. Erkin, Z., Franz, M., Guajardo, J., Katzenbeisser, S., Lagendijk, I., Toft, T.:
Privacy-preserving face recognition. In: Goldberg, I., Atallah, M.J. (eds.) PETS
2009. LNCS, vol. 5672, pp. 235–253. Springer, Heidelberg (2009). https://doi.org/
10.1007/978-3-642-03168-7 14
27. Barman, L., Graini, E., Raisaro, J.L., Ayday, E., Hubaux, J.-P., et al.: Privacy
threats and practical solutions for genetic risk tests. In: 2nd International Work-
shop on Genome Privacy and Security (GenoPri 2015), no. EPFL-CONF-207435
(2015)
Engineering Issues of Crypto
Verifiable and Forward Secure Dynamic
Searchable Symmetric Encryption
with Storage Efficiency
1 Introduction
In our daily life, we use various cloud storage services, and our sensitive data
are stored on outside servers. Because many leakage incidents of databases
stored on cloud storage servers have happened recently (e.g., "The Fappening"
leak from Apple's iCloud in 2014), these data must be encrypted. However, if we use ordinary
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 489–501, 2018.
https://doi.org/10.1007/978-3-319-89500-0_42
490 K. Yoneyama and S. Kimura
1. Though Song et al. [1] already proposed a dynamic SSE scheme, its search cost is linear in the number of documents.
Verifiable and Forward Secure Dynamic Searchable Symmetric Encryption 491
inserted document matches previous search queries. Thus, forward secure SSE
schemes can resist file injection attacks. Forward secure SSE schemes also
allow an online build of the encrypted database because the update phase
does not leak information. In most non-forward-secure SSE schemes, inverted
indexes of the database are necessary in the setup phase, and thus the indexing
step may be an efficiency bottleneck of the system. Stefanov et al. [11] proposed
a forward secure dynamic SSE scheme based on oblivious RAM (ORAM).
However, their scheme is not verifiable and needs large bandwidth overhead on
updates due to ORAM. Bost et al. [12] extended Stefanov et al.'s scheme to a
verifiable one, but their scheme also needs large bandwidth overhead on updates.
Recently, Bost [13] proposed an efficient forward secure dynamic SSE scheme
(Σoφoς) without relying on ORAM. Σoφoς achieves optimal search and update
complexity, for both computation and communication, for forward secure SSE.
The key idea of Σoφoς is that the location of a newly added encrypted document
and the search token are unlinkable: the adversary is prevented from generating
any new search token from an old one, while the client can compute a new one
using trapdoor permutations. He also shows an extension to a verifiable scheme
(Σoφoς-). The idea of Σoφoς- is that the client keeps a hash value of the document
indexes for each keyword; if the malicious server returns a fake answer, the
client can verify validity by comparing the hash value. However, Σoφoς-
needs additional client storage compared to Σoφoς because it keeps
the hash table. Specifically, whereas Σoφoς needs O(W log D) storage, Σoφoς-
needs O(W (log D + κ)) storage, where W is the number of distinct keywords, D
is the number of documents and κ is the security parameter. For the implementation
in [13], the experimental parameter sizes are set as D ≤ 2^48, W ≤ 2^23 and
κ = 128 bits; thus, the extra client storage cost of Σoφoς- is about 128 MB
beyond the cost of Σoφoς. Therefore, achieving both forward security against
malicious servers and client storage efficiency is an important remaining problem.
Moreover, no formal definition or proof of security against malicious servers
is given in [13]; hence, it is unclear whether Σoφoς- is actually secure against
malicious servers.
The contribution of this paper is twofold: one is to resolve the problem on the
storage cost in Σoφoς-, and the other is to show the formal security against
malicious servers.
Table 1. Comparison among previous forward secure SSE schemes and our scheme
the search result is valid. On the other hand, in the strong reliability definition,
the adversary wins if a tag is forged but the search result is valid. Therefore,
strong reliability is stronger than soundness. The detailed discussion about the
difference between two types of definitions is shown in [6]. We formally prove
that our scheme satisfies strong reliability by assuming the APRF. Specifically,
we show a reduction to pseudo-randomness of the APRF from strong reliability.
Also, we prove that our scheme satisfies forward security by assuming the
APRF and the one-way trapdoor permutation in the random oracle model. The
definition of forward security is the same as in [13].
2 Preliminaries
Notations. Throughout this paper we use the following notations. We denote κ
as the security parameter, and negl(κ) as the negligible function in κ. Hereafter,
we omit the security parameter for inputs of algorithms except cases that we
must explicitly state it. If Set is a set, then by m ∈R Set we denote that m is
sampled uniformly from Set. If ALG is an algorithm, then by y ← ALG(x; r)
we denote that y is output by ALG on input x and randomness r (if ALG is
deterministic, r is empty). When X is a bit-string, we denote |X| as the bit
length, and when X is a set, we denote |X| as the number of elements.
Security Model. For SSE schemes, privacy against the server is required. It would be
ideal if no information leaked to the server; however, this is not realistic in
the SSE setting. Hence, we define a leakage function L = (L^Stp, L^Srch, L^Updt) to
represent what an SSE scheme leaks to the adversary, where L^Stp/L^Srch/L^Updt
denotes the leakage function in the setup/search/update phase. The leakage function L
keeps the query list Q as its state. Q contains entries (i, w) for a search query
on keyword w, or entries (i, op, in) for an update query; i is incremented at each
query. The search pattern sp(w) is defined as sp(w) = {j : (j, x) ∈ Q}. The
history of a keyword, hist(w), contains the set of document indexes matching w
at the setup phase, and the set of updated document indexes matching w at
the update phase. As the security model of dynamic SSE schemes, we show
definitions of confidentiality, forward security, and strong reliability.
Confidentiality. It is required that nothing leaks from any phase except information
derivable from the leakage functions. Confidentiality is defined via the simulation
paradigm (i.e., indistinguishability between the real world and the ideal
world), and is parametrized by the leakage functions.
Definition 4 (Confidentiality). The real world SSEreal and the ideal world
SSEideal containing a simulator S are defined as follows:
1. An adversary A chooses database DB.
2. A obtains EDB ← Setup(DB) in SSEreal , or a simulated output EDB ←
S(LStp (DB)) in SSEideal .
3. A can repeatedly pose search (resp. update) queries with input w (resp.
(op, in)), and obtains DB(w) ← Search(K, w, σ, EDB) (resp. EDB ←
Update(K, σ, op, in, EDB)) in SSEreal , or a simulated output DB(w) ←
S(LSrch (w)) (resp. EDB ← S(LU pdt (op, in))) in SSEideal .
4. A outputs a bit b.
We say that a SSE scheme is L-adaptively secure if for any PPT A there exists
S such that
|Pr[1 ← A in SSEreal ] − Pr[1 ← A in SSEideal ]| ≤ negl(κ).
3 Σoφoς-, Revisited
In this section, we recall Σoφoς-, a forward secure dynamic SSE scheme secure
against malicious servers.
In terms of storage cost, each client must keep tables W and H. Table W
contains (iw , c) for every keyword; hence, its storage cost is O(W log D).
Table H contains H3 (DB(w)) for every keyword; hence, its storage cost is
O(W κ). Therefore, the total storage cost for a client is O(W (log D + κ)).
The computational cost is O(aw ) in the Search phase and O(1) in the Update
phase, where aw is the number of times that the queried keyword w was his-
torically added to the database. Also, the communication cost is O(nw ) in the
Search phase and O(1) in the Update phase, where nw is the size of the search
result set.
There are several naive approaches to reduce the extra O(W κ) storage cost
of Σoφoς-. For example, the client could encrypt H3 (DB(w)) and store it at the
server instead of keeping it locally. Then, in the Search phase, the client receives
the ciphertext of H3 (DB(w)) together with DB(w), decrypts the ciphertext, and can
check the validity of DB(w). Thus, the storage cost can be the same as in Σoφoς.
However, in the Update phase, an additional round is necessary to receive the
ciphertext of H3 (DB(w)), because the client does not memorize it and H3 (DB(w))
must be updated. Therefore, this naive approach is not very good from the
viewpoint of round complexity.
4 Our Scheme
In this section, we show the protocol of our scheme. It is based on Σoφoς, but
achieves verifiability in a different way from Σoφoς-.
In our scheme, we do not use any client-local verification table like table H
in Σoφoς-, but use a "tag" binding the keyword and the index as a verifier.
Specifically, the client sends a verifier Ver_{c+1}^(w) as well as (UT_{c+1}^(w), e_{c+1}^(w))
to the server in the Update phase, and checks whether Ver_{c+1}^(w) is correctly
bound with DB(w) in the Search phase. We note that the client does not have to
keep Ver_{c+1}^(w) after sending it; rather, the client receives the verifier
corresponding to the indexes. We use a PRF to generate the verifier, and
unforgeability of the verifier is guaranteed by the pseudo-randomness of the PRF.
However, if the client receives Ver_{c+1}^(w) for the nw indexes matching w in each
Search phase, the communication complexity increases by nw PRF values and the
client needs to compute nw PRF values. Hence, we use an algebraic PRF (APRF)
AF with closed-form efficiency. An APRF is a special type of PRF whose range
forms an Abelian group such that group operations are efficiently computable.
In addition, certain algebraic operations on these outputs can be computed
significantly more efficiently if one possesses the salt of the PRF that was used to
generate them. By Definition 2, since Π_{i=1}^{nw} [AF(s, zi)]^{hi(x)} can be
efficiently computed by CFEval_{h,z}(x, s), the server can just compute and send
Π_{i=1}^{nw} AF(s, zi) to the client, and the client can just compute CFEval_{h,z}(x, s)
with running time sublinear in nw to check the validity of the verifier, where
h(x) = (1, . . . , 1).
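To make closed-form efficiency concrete, here is a toy numerical sketch assuming the algebraic PRF shape AF(s, i) = g^(a·b^i) in a group of prime order (in the spirit of the DDH-based constructions cited above; the shape and parameters are illustrative assumptions, not the paper's exact instantiation):

```python
# Toy group: subgroup of order q_ord = 11 in Z_23*, generated by g = 4 (demo only).
p_mod, q_ord, g = 23, 11, 4

a, b = 5, 3          # APRF salt s = (a, b), with b != 1 mod q_ord

def aprf(i):
    # Assumed APRF shape AF(s, i) = g^(a * b^i) (illustrative).
    return pow(g, a * pow(b, i, q_ord) % q_ord, p_mod)

def cfeval(n):
    # Closed form: prod_{i=1..n} AF(s, i) = g^(a * b * (b^n - 1) / (b - 1)),
    # computed with O(log n) work instead of n PRF evaluations.
    geo = b * (pow(b, n, q_ord) - 1) * pow(b - 1, -1, q_ord) % q_ord
    return pow(g, a * geo % q_ord, p_mod)

n = 4
prod = 1
for i in range(1, n + 1):
    prod = prod * aprf(i) % p_mod     # what the server would send
print(prod, cfeval(n))                # both values agree
```

The server's work is linear in n (it holds only the outputs), while the salt holder collapses the product into one exponentiation, which is exactly the asymmetry the verifier check exploits.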
This limits the increase in communication complexity to only one group element,
and the computational cost for the client is bounded sublinearly in nw.
For example, we can use the APRF based on the decisional Diffie-Hellman (DDH)
assumption [15] or on the strong DDH assumption [14].
– Setup(DB):
1. KS ∈R {0, 1}κ
2. KV ∈R {0, 1}κ
3. (sk, pk) ← KeyGen(1κ )
4. W, T ← empty tables
5. store DB to W and T according to Update phase
6. return (KS , KV , sk) as secret key K and W as the state of the client σ
7. return T as encrypted database EDB
– Search(w, W, T):
  • Client:
    1. K_w ← F(K_S, w)
    2. (ST_c^{(w)}, c^{(w)}) ← W[w]
Verifiable and Forward Secure Dynamic Searchable Symmetric Encryption 499
    3. if (ST_c^{(w)}, c^{(w)}) = ⊥
    4.   return ∅
    5. send (K_w, ST_c^{(w)}, c^{(w)}) to the server as trapdoor t^{(w)}
• Server:
    1. for i = c to 0 do
    2.   UT_i^{(w)} ← H_1(K_w, ST_i^{(w)})
    3.   (e_i^{(w)}, Ver_i^{(w)}) ← T[UT_i^{(w)}]
    4.   ind_i^{(w)} ← e_i^{(w)} ⊕ H_2(K_w, ST_i^{(w)})
    5.   ST_{i-1}^{(w)} ← π_pk(ST_i^{(w)})
    6. end for
    7. send (DB(w) = {ind_i^{(w)}}_{0≤i≤c}, Ver^{(w)} = ∏_{i=0}^{c} Ver_i^{(w)}) to the client
• Client:
    1. if |DB(w)| ≥ c + 1 or Ver^{(w)} ≠ CFEval_{h,{(i,w,ind_i^{(w)})}_{0≤i≤c}}(0, K_V)
    2.   return 0
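Step 5 of the server loop walks the state chain backwards with the public permutation π_pk, while only the client can extend it forwards. A minimal Python sketch with toy RSA parameters (ours, insecure, purely illustrative of the asymmetry, not the paper's instantiation):

```python
# Sigma-o-phi-o-s style state chain: the client advances the state with the
# RSA trapdoor (secret key); the server can only walk backwards with pk.
p, q = 1019, 1031
N, e = p * q, 3
d = pow(e, -1, (p - 1) * (q - 1))

def pi_pk(x):        # public, server-side: ST_{i-1} = pi_pk(ST_i)
    return pow(x, e, N)

def pi_inv_sk(x):    # secret, client-side: ST_{i+1} = pi^{-1}_sk(ST_i)
    return pow(x, d, N)

ST0 = 123456
states = [ST0]
for _ in range(5):                 # the client performs 5 updates
    states.append(pi_inv_sk(states[-1]))

# Given only the latest state, the server recovers all earlier states.
cur = states[-1]
recovered = [cur]
for _ in range(5):
    cur = pi_pk(cur)
    recovered.append(cur)
assert recovered == states[::-1]
```

This is what lets the trapdoor t^{(w)} = (K_w, ST_c^{(w)}, c^{(w)}) unlock all c + 1 entries of the keyword's history while keeping future states unpredictable to the server.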
4.3 Correctness
According to the protocol, for any search query, an honest server can return
the true answer (DB(w), Ver^{(w)}), and the client always accepts the answer except
with the probability that some collision on UT occurs. If a collision of UT for
distinct inputs to H_1 occurs, the server cannot find the correct encrypted index
and verifier. However, since H_1 is a random oracle, the collision probability is
negligible in κ.
The storage cost for a client is O(W log D). It is the same as in Σoφoς, whereas Σoφoς
is not secure against malicious servers.
The computational cost and communication cost are asymptotically the same
as in Σoφoς-. The exact additional communication cost is only one group element
(i.e., 160 bits for 80-bit security) in both the Update phase and the Search phase.
Also, the exact additional computational cost for the client is one APRF com-
putation in the Update phase and a computation sublinear in n_w in the Search
phase. Therefore, our scheme is still efficient even in exact costs.
References
1. Song, D.X., Wagner, D., Perrig, A.: Practical techniques for searches on encrypted
data. In: IEEE Symposium on Security and Privacy 2000, pp. 44–55 (2000)
2. Curtmola, R., Garay, J.A., Kamara, S., Ostrovsky, R.: Searchable symmetric
encryption: improved definitions and efficient constructions. In: ACM Conference
on Computer and Communications Security 2006, pp. 79–88 (2006)
3. Kamara, S., Papamanthou, C., Roeder, T.: Dynamic searchable symmetric encryp-
tion. In: ACM Conference on Computer and Communications Security 2012, pp.
965–976 (2012)
4. Kurosawa, K., Ohtaki, Y.: UC-secure searchable symmetric encryption. In:
Keromytis, A.D. (ed.) FC 2012. LNCS, vol. 7397, pp. 285–298. Springer, Heidelberg
(2012). https://doi.org/10.1007/978-3-642-32946-3 21
5. Kurosawa, K., Ohtaki, Y.: How to update documents Verifiably in searchable sym-
metric encryption. In: Abdalla, M., Nita-Rotaru, C., Dahab, R. (eds.) CANS 2013.
LNCS, vol. 8257, pp. 309–328. Springer, Cham (2013). https://doi.org/10.1007/
978-3-319-02937-5 17
6. Kurosawa, K., Sasaki, K., Ohta, K., Yoneyama, K.: UC-secure dynamic searchable
symmetric encryption scheme. In: Ogawa, K., Yoshioka, K. (eds.) IWSEC 2016.
LNCS, vol. 9836, pp. 73–90. Springer, Cham (2016). https://doi.org/10.1007/978-
3-319-44524-3 5
7. Islam, M.S., Kuzu, M., Kantarcioglu, M.: Access pattern disclosure on searchable
encryption: ramification, attack and mitigation. In: NDSS 2012 (2012)
8. Cash, D., Grubbs, P., Perry, J., Ristenpart, T.: Leakage-abuse attacks against
searchable encryption. In: ACM Conference on Computer and Communications
Security 2015, pp. 668–679 (2015)
9. Zhang, Y., Katz, J., Papamanthou, C.: All your queries are belong to us: the power
of file-injection attacks on searchable encryption. In: USENIX Security Symposium
2016, pp. 707–720 (2016)
10. Chang, Y.-C., Mitzenmacher, M.: Privacy preserving keyword searches on remote
encrypted data. In: Ioannidis, J., Keromytis, A., Yung, M. (eds.) ACNS 2005.
LNCS, vol. 3531, pp. 442–455. Springer, Heidelberg (2005). https://doi.org/10.
1007/11496137 30
11. Stefanov, E., Papamanthou, C., Shi, E.: Practical dynamic searchable encryption
with small leakage. In: NDSS 2014 (2014)
12. Bost, R., Fouque, P.A., Pointcheval, D.: Verifiable dynamic symmetric searchable
encryption: optimality and forward security. In: IACR Cryptology ePrint Archive
2016 (2016)
13. Bost, R.: Σoφoς: forward secure searchable encryption. In: ACM Conference on
Computer and Communications Security 2016, pp. 1143–1154 (2016)
14. Benabbas, S., Gennaro, R., Vahlis, Y.: Verifiable delegation of computation over
large datasets. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 111–131.
Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22792-9 7
15. Naor, M., Reingold, O.: Number-theoretic constructions of efficient pseudo-random
functions. In: FOCS 1997, pp. 458–467 (1997)
Improved Automatic Search Tool
for Bit-Oriented Block Ciphers
and Its Applications
1 Introduction
Finding different types of distinguishers is the key step in evaluating the security of
block ciphers, and automatic search methods are the main choice. Most of the
early automatic search methods were based on special algorithms implemented
from scratch in a general-purpose programming language. This kind of method
may be more efficient in some specific cases, but it is much more difficult to
implement. Recently, the search problem has instead been described as a SAT, MILP, or CP
model which can be automatically solved with the corresponding solvers. Among
them, the automatic search method based on MILP is simple and practical, and it
has become a popular tool.
The MILP method was first proposed by Mouha et al. [1], who used it for
counting the minimum number of differentially (or linearly) active s-boxes for word-
oriented block ciphers. At Asiacrypt 2014, Sun et al. [2] proposed an extended
framework for bit-oriented block ciphers.
© Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 502–508, 2018.
https://doi.org/10.1007/978-3-319-89500-0_43
The key idea of [2] is to extract the
inequalities from the H-representation of the convex hull of all possible differ-
ential patterns of the s-box. The linear inequalities describing the differential
properties of s-boxes of up to 5 bits can be obtained using the SAGE software [3]
and a greedy algorithm. Recently, tools using the MILP method to search for
integral distinguishers based on the division property [4] and for impossible differentials
[5] have also been proposed. Usually, the MILP instances are described in
the LP format and solved with the optimizer Gurobi [6], currently the most effi-
cient commercial solver. The MILP method is powerful, but it has
some inherent drawbacks. In this paper, we mainly reduce the scale of MILP
models and accelerate the search for (related-key) differential characteristics and
impossible differentials for bit-oriented block ciphers.
Our Contributions. We propose the OPB file format to describe the MILP
models. Compared to the LP file format, it is more concise and more suitable
for constructing models for bit-oriented block ciphers. By setting the Gurobi parame-
ter MIPFocus appropriately, the solution time can be greatly reduced.
For the impossible-differential search, we give simple linear inequalities for the
differential propagation of modular addition that ignore the differen-
tial probability. This helps reduce the number of variables and constraints in the
MILP models and speeds up the searches. As applications, we give the exact lower
bounds on the number of related-key differential active s-boxes for LBlock and
the impossible differentials for the SPECK family.
Organization. The remainder of the paper is organized as follows. In Sect. 2,
we give a brief introduction to the automatic search tools based on MILP and
Gurobi for bit-oriented block ciphers, and then propose some techniques
to improve the tools for (related-key) differentials and impossible differentials.
As applications, we search for the exact lower bounds on the number of related-
key differential active s-boxes of LBlock and for the impossible differentials of the
SPECK family in Sect. 3. We conclude in Sect. 4.
Constraints. For the XOR operation, the bit-level input differences are a, b
and the bit-level output difference is c. Introducing a dummy variable d_⊕, the
constraints are:

  d_⊕ ≥ a,  d_⊕ ≥ b,  d_⊕ ≥ c
  a + b + c ≥ 2d_⊕                (2)
  a + b + c ≤ 2
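These constraints can be sanity-checked by enumeration: the feasible (a, b, c) patterns, over all choices of the dummy variable d_⊕, should be exactly those with a ⊕ b = c. A small Python check (our own, not from the paper):

```python
from itertools import product

def satisfies(a, b, c, d):
    # The XOR constraints (2) with dummy variable d.
    return d >= a and d >= b and d >= c and a + b + c >= 2 * d and a + b + c <= 2

# Patterns (a, b, c) feasible for some d in {0, 1}.
feasible = {(a, b, c) for a, b, c in product((0, 1), repeat=3)
            if any(satisfies(a, b, c, d) for d in (0, 1))}
valid_xor = {(a, b, c) for a, b, c in product((0, 1), repeat=3) if a ^ b == c}
assert feasible == valid_xor
```

For example, (1, 0, 0) is cut off because d_⊕ ≥ a forces d_⊕ = 1 and then a + b + c ≥ 2 fails, while (1, 1, 1) violates a + b + c ≤ 2.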
We propose a new file format, OPB, to describe the MILP models for bit-oriented
block ciphers. Gurobi can solve models in a variety of file formats, such as MPS,
LP, OPB and so on. Among them, the MPS format is the most common, and the LP
format is more readable than MPS. The common way to describe the MILP
models is the LP file format, using the Python or C++ language. The OPB format
is used to store pseudo-Boolean satisfaction and pseudo-Boolean optimization
models, which contain only Boolean variables (0 or 1). Hence the OPB file format is
more suitable for building MILP models for bit-oriented block ciphers. Compared
to the LP file format, the OPB format is more concise and easier to read and write.
The keywords and contents of the two file formats differ, as shown in
Table 1. First, the OPB format does not need to declare the variables and their
types, because all variables default to Boolean. In
addition, the constraints of the differential propagation of the XOR operation
are easy to describe in this format. Let the bit-level input differences be a, b and
the bit-level output difference be c. Then the constraint is:

  c − a − b + 2ab = 0    (5)
Improved Automatic Search Tool for Bit-Oriented Block Ciphers 505
  *.lp           *.opb
  Minimize       min: (objective);
  Subject to     (constraints)
  (constraints)
  Binary
  (variables)
  End
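As a hypothetical illustration (the helper names are ours, not the paper's tool), constraint (5) can be emitted directly in OPB syntax, since pseudo-Boolean constraints may contain products of Boolean variables:

```python
def xor_constraint(a, b, c):
    # c - a - b + 2ab = 0 in OPB syntax; "+2 a b" is the product term 2*a*b.
    return f"+1 {c} -1 {a} -1 {b} +2 {a} {b} = 0;"

lines = ["min: +1 x2;",                  # toy objective: minimize the output bit
         xor_constraint("x0", "x1", "x2")]
opb_model = "\n".join(lines)
print(opb_model)
```

A real model generator would emit one such line per XOR gate of the cipher, with no variable declarations needed.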
3 Applications
3.1 Application to LBlock
LBlock is a lightweight block cipher designed by Wu and Zhang at ACNS 2011 [10].
Since the rotation amount in the key schedule of LBlock is 29 bits, we can only
use the bit-oriented MILP model in the related-key differential search. The models
based on the LP format and the OPB format are solved respectively. The models
described in the OPB format are smaller than those in the LP format, so the solution
time is usually faster. By setting the parameter MIPFocus = 1, the solution
time of the models is further shortened. The results are shown in Table 2. We
improved the results for 9 to 11 rounds of LBlock, showing that the exact lower
bounds for 9/10/11 rounds are 7/9/11 active s-boxes. In [11], Sun et al. needed about 4
days to find the 11-round exact lower bound on the number of differential active
s-boxes of LBlock in the related-key model; we only needed about 2 days. To
the best of our knowledge, the 12-round exact lower bound on the number of
differential active s-boxes of LBlock is obtained here for the first time, using about 3 weeks.
The computations were performed on a PC (Intel(R) Core(TM) i3-4160 CPU,
3.60 GHz, 4.00 GB RAM, 4 cores, Windows 7) with the optimizer Gurobi 7.0.1.
Table 2. The exact lower bounds of the number of differential active s-boxes for round-
reduced variants of LBlock in the related-key model
limited the input and output differences to only 1 active bit. The results of the
experiments are shown in Table 3. The input or output difference is expressed
by the position of its non-zero bit; the position of the leftmost bit is 0.
The computations were performed on a PC (Intel(R) Core(TM) i7-7500U CPU,
2.70 GHz, 8.00 GB RAM, 4 cores, Windows 10) with the optimizer Gurobi 7.0.1.
Table 3. Summary of impossible differentials on the SPECK family
4 Conclusion
Our work provides a new OPB file format to describe the MILP models for bit-
oriented block ciphers, and accelerates the search by setting the parameter
MIPFocus = 1. In addition, we give a system of simple linear inequalities
for the propagation of differential patterns through the modular addition used in the
impossible-differential search. We applied our techniques to LBlock and SPECK.
508 L. Li et al.
Acknowledgments. The authors would like to thank all anonymous referees for their
valuable comments, which greatly improved the manuscript. This work is supported by
the National Natural Science Foundation of China (No. 61672509, No. 61232009) and
the National Cryptography Development Fund (MMJJ20170101).
References
1. Mouha, N., Wang, Q., Gu, D., Preneel, B.: Differential and linear cryptanalysis
using mixed-integer linear programming. In: Wu, C.-K., Yung, M., Lin, D. (eds.)
Inscrypt 2011. LNCS, vol. 7537, pp. 57–76. Springer, Heidelberg (2012). https://
doi.org/10.1007/978-3-642-34704-7 5
2. Sun, S., Hu, L., Wang, P., Qiao, K., Ma, X., Song, L.: Automatic security eval-
uation and (related-key) differential characteristic search: application to SIMON,
PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In: Sarkar, P.,
Iwata, T. (eds.) ASIACRYPT 2014, Part I. LNCS, vol. 8873, pp. 158–178. Springer,
Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8 9
3. Stein, W., et al.: Sage: Open source mathematical software (2008)
4. Xiang, Z., Zhang, W., Bao, Z., Lin, D.: Applying MILP method to searching inte-
gral distinguishers based on division property for 6 lightweight block ciphers. In:
Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016, Part I. LNCS, vol. 10031, pp.
648–678. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53887-
6 24
5. Sasaki, Y., Todo, Y.: New impossible differential search tool from design and crypt-
analysis aspects. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017, Part III.
LNCS, vol. 10212, pp. 185–215. Springer, Cham (2017). https://doi.org/10.1007/
978-3-319-56617-7 7
6. Gurobi Optimization: Gurobi optimizer reference manual (2013). http://www.
gurobi.com
7. Fu, K., Wang, M., Guo, Y., Sun, S., Hu, L.: MILP-based automatic search algo-
rithms for differential and linear trails for speck. In: Peyrin, T. (ed.) FSE 2016.
LNCS, vol. 9783, pp. 268–288. Springer, Heidelberg (2016). https://doi.org/10.
1007/978-3-662-52993-5 14
8. Cui, T., Jia, K., Fu, K., et al.: New automatic search tool for impossible differ-
entials and zero-correlation linear approximations. Cryptology ePrint Archive,
Report 2016/689 (2016). https://eprint.iacr.org/2016/689
9. Lee, H.C., Kang, H.C., Hong, D., et al.: New impossible differential characteristic
of SPECK64 using MILP. Cryptology ePrint Archive, Report 2016/1137 (2016).
https://eprint.iacr.org/2016/1137
10. Wu, W., Zhang, L.: LBlock: a lightweight block cipher. In: Lopez, J., Tsudik, G.
(eds.) ACNS 2011. LNCS, vol. 6715, pp. 327–344. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-21554-4 19
11. Sun, S., Hu, L., Wang, M., et al.: Towards finding the best characteristics of some
bit-oriented block ciphers and automatic enumeration of (related-key) differential
and linear characteristics with predefined properties. Cryptology ePrint Archive,
Report 2014/747 (2014). https://eprint.iacr.org/2014/747
12. Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.:
The SIMON and SPECK families of lightweight block ciphers. Cryptology ePrint
Archive, Report 2013/543 (2013). http://eprint.iacr.org/2013/543
Hypercubes and Private Information
Retrieval
Anirban Basu(B) , Rui Xu, Juan Camilo Corena, and Shinsaku Kiyomoto
1 Introduction
2 Related Work
The problem of hiding the index of a retrieval operation on a database from the
server which actually holds the database was investigated by Rivest et al. [2],
Blakley and Meadows [3], Abadi et al. [4,5], and Beaver and Feigenbaum [6].
The seminal work on private information retrieval by Chor et
al. [7,8] builds upon the above. PIR can be roughly grouped into two categories:
information-theoretic PIR and computational PIR. The initial proposals of
Chor et al. [7,8] assume k non-communicating servers storing the database and
can resist computationally unbounded malicious servers. Later, the weaker
notion of computational PIR [9], which aims only at providing privacy against
computationally bounded adversaries, emerged so as to relax the critical assump-
tion of more than one non-communicating server.
The work closest to our scheme is by Chang [1], which uses a 2-D hyper-
cube and its generalisations into higher dimensions for single-server private
information retrieval with O(n) communication complexity. That work is more
computationally efficient on the server side than ours, but requires more computa-
tion (than ours) on the client side. Chang's work uses the Damgård-Jurik cryp-
tosystem, which avoids the need for a larger homomorphic cryptosystem at every
hypercube dimension reduction beyond the previous one. The main difference
between our work and [1] is that we propose a version of PIR in which we can
greatly reduce the computational complexity at the cost of a measurable privacy
loss, which is explained in Sect. 4.2.
(3) somewhat homomorphic (e.g., allowing a number of additions and one mul-
tiplication), and (4) fully homomorphic.
The Paillier public-key cryptosystem [10], satisfying semantic security against
chosen-plaintext attacks (the IND-CPA requirement), and its variant, the Damgård-
Jurik cryptosystem [11], have practical implementations, and both exhibit only
additively homomorphic properties: (a) the encryption of the sum of two plain-
text messages m_1 and m_2 is the modular product of their individual cipher-
texts, i.e., E(m_1 + m_2) = E(m_1) · E(m_2); and (b) the encryption of the prod-
uct of a plaintext message m_1 and a plaintext multiplicand π is the
modular exponentiation of the ciphertext of m_1 with π as the exponent, i.e.,
E(m_1 · π) = E(m_1)^π.
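Both properties can be seen in a minimal, insecure toy sketch of Paillier in Python (the parameters and helper names are our own, not those of the implementation evaluated later):

```python
import math, random

p, q = 293, 433                          # toy primes (insecure)
n, n2 = p * q, (p * q) ** 2
g = p * q + 1                            # the standard choice g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def enc(m):
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:          # r must be a unit mod n
            return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    L = (pow(c, lam, n2) - 1) // n       # L(x) = (x - 1) / n
    return L * pow(lam, -1, n) % n

m1, m2, pi = 1234, 5678, 7
assert dec(enc(m1) * enc(m2) % n2) == (m1 + m2) % n   # property (a)
assert dec(pow(enc(m1), pi, n2)) == (m1 * pi) % n     # property (b)
```

Property (b) is exactly the operation used below to multiply an encrypted selection bit by a plaintext table entry.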
remember that only one component amongst v_{a,k} is one. Let us suppose v_{a,3} = 1,
which means E_a(v_{a,3})^{t_{1,3}} is non-zero for the first row of T. Therefore, the
homomorphic sum of the homomorphic products for the first row will produce
T_{1,a} = E_a(v_{a,3})^{t_{1,3}}, which when decrypted will result in t_{1,3}. However, decryption
is not done at this stage. If we repeat this for every row in T (with V_a assuming
that v_{a,3} = 1) and obtain the homomorphic sums per row, we generate a column
vector as follows:

  T_a = ( T_{1,a} = E_a(v_{a,3})^{t_{1,3}},
          T_{2,a} = E_a(v_{a,3})^{t_{2,3}},
          T_{3,a} = E_a(v_{a,3})^{t_{3,3}},
          . . . ,
          T_{d,a} = E_a(v_{a,3})^{t_{d,3}} )^T    (2)
If we homomorphically multiply each element in T_a with V_b and homo-
morphically add the resulting components, we get T_b = E_b(v_{b,1})^{T_{1,a}} · E_b(v_{b,2})^{T_{2,a}} ·
E_b(v_{b,3})^{T_{3,a}} · · · E_b(v_{b,d})^{T_{d,a}}, but again, only one of the v_{b,k} is non-zero. Let us assume
that v_{b,2} = 1. Therefore, T_b = E_b(v_{b,2})^{T_{2,a}}, because all the other components
are effectively zero in the plaintext domain. Thus, our 2-D square matrix has been
reduced to a point in the encrypted domain, i.e., T_b = E_b(v_{b,2})^{T_{2,a}}. If we now run
the decryption D_b(T_b) first, we effectively obtain T_{2,a}, since v_{b,2} = 1. Running
the decryption D_a(T_{2,a}), we obtain t_{2,3}, which is exactly the point located
by setting v_{b,2} = 1 and v_{a,3} = 1. Note that for simplicity, we did not
describe the shuffling of the components of both vectors, because it is related
to ensuring randomisation and not to the hypercube reduction process. The above
process of hypercube reduction can easily be generalised to dimensions higher
than λ = 2. If the encryption function for reducing dimension i to i−1 is denoted
by E_i, then the ciphertext space of E_i must be smaller than the plaintext space of
E_{i−1}. In the above example, E_i = E_a and E_{i−1} = E_b. This constraint on the cryp-
tosystems illustrates the fact that with higher dimensions, one would require
multiple cryptosystems with large key sizes.
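Stripped of encryption, the underlying hypercube bookkeeping is base-d digit decomposition: a flat index gives one coordinate, i.e., one selection vector, per dimension, and λ successive row selections recover the element. A plaintext Python sketch (the names are ours):

```python
def to_coords(index, d, lam):
    """Split a flat index into lam base-d coordinates, least significant first."""
    coords = []
    for _ in range(lam):
        coords.append(index % d)
        index //= d
    return coords

d, lam = 4, 2
db = list(range(d ** lam))                        # flat database of n = d^lam items
cube = [db[r * d:(r + 1) * d] for r in range(d)]  # reshape into a d x d square

target = 11
col, row = to_coords(target, d, lam)              # 11 -> column 3, row 2
assert cube[row][col] == target                   # two selections reach the item
```

Here target = 11 plays the role of t_{2,3} in the running example (v_{a,3} = 1, v_{b,2} = 1); in the protocol, each selection is carried out homomorphically under its own cryptosystem rather than by direct indexing.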
Loss of Privacy. Following the strategy for quantifying the loss of privacy
in [12], we use Shannon's entropy to measure how much privacy is lost by
using homomorphic encryption in the m lower dimensions only. The entropy
is a measure of uncertainty in a random variable X, and is defined as H(X) =
−Σ_x p_X(x) log p_X(x). Let H(V) denote the uncertainty of the vector V, where
only one element is 1 and the rest are 0. Since the elements in the vector can
either be 0 or 1, and the entire vector can only have one element that is 1, we
can write that for a d-element vector, H(V) = −Σ_{i=1}^{d} (1/d) log(1/d) = log d. Thus,
for λ such independent vectors, the total entropy is λ log d. If only m such
vectors are encrypted, then we can quantify the loss in privacy in terms of
entropy as P_loss = (λ − m) log d, leaving us with the residual privacy
P_residual = m log d.
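This accounting is easy to check numerically; a small illustration of ours, using base-2 logarithms so that privacy is measured in bits:

```python
import math

def privacy_split(d, lam, m):
    """Entropy accounting: lam selection vectors of d elements, m encrypted."""
    per_vector = math.log2(d)                      # H(V) = log d per vector
    return (lam - m) * per_vector, m * per_vector  # (P_loss, P_residual)

loss, residual = privacy_split(d=1024, lam=3, m=2)
assert (loss, residual) == (10.0, 20.0)  # one plaintext vector leaks 10 bits
```

With d = 1024 and λ = 3, encrypting only m = 2 of the three selection vectors leaks log₂ 1024 = 10 bits of the 30-bit index while 20 bits remain hidden.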
5 Evaluation
In the performance evaluation of cryptographic primitives shown in Table 1, we
have used an open-source implementation of the Paillier cryptosystem. The
performance of this implementation on a 64-bit MacBook Pro running macOS
Sierra 10.12.5 and Java 1.8.0_121-b13 on a 2.9 GHz Intel Core i5 with 16 GB of
2133 MHz LPDDR3 RAM is given in Table 1. The plaintexts and integer mul-
tiplicands were chosen as random integers of bit lengths 256, 512 and 1024, respec-
tively. Notice that these bit lengths are half the size of the public keys of the
tested cryptosystems, because our implementation supports negative integers by
dividing the plaintext space into halves, with the upper half reserved for positive
integers and the lower half for negative ones.
Table 1. Comparison of the performances, in terms of time, of a Java implementation
of the Paillier cryptosystem with different bit lengths for the public key (i.e. modulus n).
6 Conclusions
In this paper, we have proposed a computationally private information retrieval
method based on the concept of hypercubes. We have also shown that in order
to make our CPIR scheme efficient, we may need to make limited use of homo-
morphic cryptosystems with a quantifiable loss of privacy.
One avenue of future work includes testing the proposed system (both ver-
sions: one without privacy loss and one with measurable privacy loss) for scal-
ability when picking one token from a large number of tokens in a database.
References
1. Chang, Y.-C.: Single database private information retrieval with logarithmic com-
munication. In: Wang, H., Pieprzyk, J., Varadharajan, V. (eds.) ACISP 2004.
LNCS, vol. 3108, pp. 50–61. Springer, Heidelberg (2004). https://doi.org/10.1007/
978-3-540-27800-9 5
2. Rivest, R.L., Adleman, L., Dertouzos, M.L.: On data banks and privacy homomor-
phisms. Found. Secur. Comput. 4(11), 171–181 (1978)
3. Blakley, G., Meadows, C.: A database encryption scheme which allows the compu-
tation of statistics using encrypted data. In: 1985 IEEE Symposium on Security
and Privacy, p. 116. IEEE (1985)
4. Abadi, M., Feigenbaum, J., Kilian, J.: On hiding information from an oracle. J.
Comput. Syst. Sci. 39(1), 21–50 (1989)
5. Beaver, D., Feigenbaum, J., Kilian, J., Rogaway, P.: Locally random reductions:
improvements and applications. J. Cryptol. 10(1), 17–36 (1997)
6. Beaver, D., Feigenbaum, J.: Hiding instances in multioracle queries. In: Choffrut,
C., Lengauer, T. (eds.) STACS 1990. LNCS, vol. 415, pp. 37–48. Springer, Heidel-
berg (1990). https://doi.org/10.1007/3-540-52282-4 30
7. Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval.
In: Proceedings of the 36th Annual Symposium on Foundations of Computer Sci-
ence, pp. 41–50. IEEE (1995)
8. Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval.
J. ACM 45(6), 965–982 (1998)
9. Chor, B., Gilboa, N.: Computationally private information retrieval. In: Proceed-
ings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pp.
304–313. ACM (1997)
10. Paillier, P.: Public-key cryptosystems based on composite degree residuosity
classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238.
Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X 16
11. Damgård, I., Jurik, M.: A generalisation, a simplification and some applications
of paillier’s probabilistic public-key system. In: Kim, K. (ed.) PKC 2001. LNCS,
vol. 1992, pp. 119–136. Springer, Heidelberg (2001). https://doi.org/10.1007/3-
540-44586-2 9
12. Coney, L., Hall, J.L., Vora, P.L., Wagner, D.: Towards a privacy measurement
criterion for voting systems. In: Proceedings of the 2005 National Conference on
Digital Government Research, pp. 287–288. Digital Government Society of North
America (2005)
A Multi-client Dynamic Searchable
Symmetric Encryption System
with Physical Deletion
Lei Xu1,2 , Chungen Xu1(B) , Joseph K. Liu2 , Cong Zuo2 , and Peng Zhang3
1
School of Science, Nanjing University of Science and Technology, Nanjing, China
[email protected], [email protected]
2
Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
{joseph.liu,cong.zuo1}@monash.edu
3
ATR Key Laboratory of National Defense Technology,
College of Information Engineering, Shenzhen University, Shenzhen, China
[email protected]
1 Introduction
Cloud storage is a new concept that extends and develops in the concept of cloud
computing, which collects data storage and service access functions through the
combination of cluster application, network technology or distributed file sys-
tem, and collects many different types of storage devices in the network together
through application software to work together. It is an emerging network storage
technique, which has many good qualities. For one thing it makes all the storage
resources be integrated together to achieve data storage management automa-
tion and intelligence, for another, it improve the storage efficiency and flexible
expansion through the visualization technology to solve the waste of storage
space, reduce the operating costs [1–4]. Due to its properties of flexible manage-
ment and low rental prices, many users and businesses choose to put their own
data in the cloud.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 516–528, 2018.
https://doi.org/10.1007/978-3-319-89500-0_45
With the promotion of cloud services, the industry soon found that while
cloud storage brings convenience, it also reveals some shortcomings,
and the biggest obstacle to the promotion of cloud services is the security of
the data. Users suspect that the cloud service cannot provide the corre-
sponding security support for their data, which hinders the transfer of more data
and business platforms. To solve the problem mentioned above, we need
to satisfy the following two conditions. Integrity and confidentiality: the
cloud storage server should ensure that the data and operations in the cloud
suffer no malicious or non-malicious loss, destruction, leakage, or illegal use.
Access and privacy: when users access sensitive data, the system can prevent
potential rivals from inferring the user's behavior through the user's access pattern. At
present, the main means to solve such problems is cryptographic tech-
niques. Users encrypt their sensitive data before
uploading it to the cloud to protect its confidentiality from illegal adversaries.
This method is the most straightforward and simplest, but it is not practical
in real scenarios. After a long period of research, people found
that searchable encryption is a good tool to solve this problem. In this paper,
we focus on how to realize dynamic data confidentiality and privacy-preserving
retrieval control in the cloud.
number of users. Dong et al. [18] constructed a multi-user system based on proxy
re-encryption techniques, where each user has its own unique key to encrypt,
search, and decrypt data. Thus, the scheme needs a trusted server to manage
keys. At the same time, there are many recent systems based on ABE, in
which users use attribute sets to define their search rights [19–22]. Wang et al.
[19] achieve fine-grained access control for authorized users with different access
rights using a standard CP-ABE without key sharing. In 2016, Wang et al. [20] pro-
posed an efficient multi-user searchable attribute-based encryption scheme with
attribute revocation and grant for cloud storage; in the scheme, the attribute revoca-
tion and grant processes of users are delegated to a proxy server. In 2015, Rompay
et al. [23] introduced a third party, named a proxy, that performs an algorithm
to transform a single user query into one query per targeted document. In this
way, the server cannot access the content of a query or its result, which achieves
query privacy.
1.3 Organization
The rest of this paper is organized as follows. In Sect. 2, we describe the definition
of an MC-DSSE scheme and give some hardness assumptions. In Sects. 3 and 4,
we propose a novel DSSE scheme with multi-client support and give its security
proof. Section 5 gives its communication and computation costs. Finally, we end
the paper with a brief conclusion.
2 Preliminaries
In this section, we first review the definition of the multi-client dynamic search-
able symmetric encryption with keyword search, and then introduce some hard-
ness problems with its complexity assumption related to our security proof.
We say that a multi-client DSSE scheme is IND-CKA2 secure with the leakage
functions above if the probability |Pr[Real_A(k) = 1] − Pr[Ideal_{A,S}(k) = 1]| is
negligible in the security parameter k.
Definition 3 (Strong RSA Problem) [25]. Let p, q be two k-bit prime
numbers, and set n = pq. Choose g ∈ Z*_n randomly. We say that an efficient
algorithm A solves the strong RSA problem if it receives as input the tuple (n, g)
and outputs a pair (z, e) with e > 1 such that z^e = g mod n.
Let H: {0, 1}* → {0, 1}^{2k+1} and G: {0, 1}* → {0, 1}^{3k+1} be two cryptographic hash
functions. Choose two big prime integers p, q, and pick k_1, k_2 ∈ {0, 1}^k ran-
domly; then output the master key MK = (p, q, k_1, k_2, g) and the public key
PK = (n = pq, F, G, H). Finally, run Algorithm 1 to generate the EDB,
send it to the server, and keep TP secret.
– Server: Store the encrypted database EDB.
ClientKGen(MK, w):
– Client: Assuming that a legitimate client wishes to perform searches over the key-
words w = (w_1, w_2, . . . , w_n), he sends w to the owner to apply for his private
key for the keywords w.
– Data owner: The data owner generates a corresponding private key as:

  sk_w = (sk_{w,1}, sk_{w,2}, sk_{w,3}) ← (k_1, k_2, g^{1/∏_{j=1}^{n} w_j} mod n)
to the server.
– Server: Take the encrypted database EDB = (D_W, D_F, D_{Tid}) as input, set
L_id = F_{k1}(g^{1/id} mod n), and execute the following algorithm to delete the
expected file.
Search((skw ), EDB):
and send ST_{w_i} = (F_{k1}(g^{1/w_i} mod n), F_{k2}(g^{1/w_i} mod n)) to the server;
– Server: Take EDB = (D_W, D_F) and the token ST_{w_i} = (F_{k1}(g^{1/w_i} mod n),
F_{k2}(g^{1/w_i} mod n)) as inputs, initialize an empty set I, a temporary index-
data pair (L_w^t = NULL, D_w^t = NULL) and a temporary pointer P_w^t =
NULL, set L_w = F_{k1}(g^{1/w_i}), and do the following steps:
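Although the full algorithms are not shown in this excerpt, the RSA-based key mechanism above (an aggregate client key g^{1/(w_1 w_2 ··· w_n)} mod n from which the per-keyword bases g^{1/w_i} mod n used in the search tokens are derived) can be illustrated with toy, insecure parameters; all numbers and names below are our own assumptions:

```python
p, q = 1019, 1031                        # toy primes (insecure)
n, phi = p * q, (p - 1) * (q - 1)
g = 7
w = [3, 7, 11]                           # toy keyword identifiers, coprime to phi

# Data owner (knows phi): aggregate key g^(1/(w1*w2*w3)) mod n.
prod = w[0] * w[1] * w[2]
agg = pow(g, pow(prod, -1, phi), n)

# Client (does not know phi): derive g^(1/w2) by raising agg to w1*w3,
# since (w1*w3)/(w1*w2*w3) = 1/w2 in the exponent.
derived = pow(agg, w[0] * w[2], n)
assert derived == pow(g, pow(w[1], -1, phi), n)
```

The derived value g^{1/w_i} mod n is exactly the quantity fed to F_{k1} and F_{k2} when forming the search token ST_{w_i}, and without the aggregate key a client cannot compute it for a non-authorized keyword.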
4 Security Analysis
In this section, we show that our proposed protocol is IND-CKA2 secure against
the adaptive server and the client, one after another, as in [24], up to some leakage
functions. Before starting our proof, we need a simulator S to respond to the queries
from A, which is defined in Sect. 2, taking the following leakage functions as
input:
1. L_Setup = |DB|: after running the Setup algorithm, one can count the num-
ber of file-keyword pairs in DB according to the size of EDB.
2. L_AddKeyword = New(id, w): when running the AddKeyword algorithm,
one learns the generated ciphertexts New(id, w) by comparing with the
former database.
3. L_DeleteFile = (Old(id), New(id)): when deleting a selected file id, one
learns all deleted ciphertexts of the file whose identifier is id, and which of
these ciphertexts were simulated by protocol Setup or AddKeyword.
4. L_Search = (DB(w), Old(w), New(w)): when searching for a file which contains
the keyword w, one learns all matched ciphertexts and their parent files
in DB(w), and which of the first parts of these ciphertexts were generated by
protocol Setup or AddKeyword.
Lemma 1. Suppose that there exists an adversary A that runs protocol Setup
to get the corresponding encrypted database from S, with the leakage function
L_Setup = |DB|. Then A cannot distinguish the above simulated EDB from a real
one.
Lemma 2. Suppose that H and G are random oracles. Then for any polynomial-time adversary A there exists an algorithm S_AddKeyword such that A cannot distinguish its output from the real one generated in game Real_A(k).

Lemma 3. Suppose that H and F_{k_1} are random oracles. Then for any polynomial-time adversary A there exists an algorithm S_Search such that A cannot distinguish its output from the real one generated in game Real_A(k).

Lemma 4. Suppose that G and F_{k_1} are random oracles. Then for any polynomial-time adversary A there exists an algorithm S_DeleteFile such that A cannot distinguish its output from the real one generated in game Real_A(k).
Assume that there exists an adversarial client A who can generate a valid search token for some non-authorized keyword w′, and can therefore obtain the correct value g^{1/w′} mod n. In this case, we can use A to construct an efficient algorithm B that solves the strong RSA problem with non-negligible probability via the Euclidean algorithm. By the properties of the RSA function, unless a client can compute the correct value g^{1/w′} mod n, no one can generate a valid search token for a non-authorized keyword w′.
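The Euclidean-algorithm step of such strong-RSA reductions is commonly known as Shamir's trick; the following toy sketch (ours, with illustrative parameters) shows how an algorithm B can extract a b-th root of u from a relation v^b ≡ u^a (mod n) with gcd(a, b) = 1:

```python
# Toy sketch (ours) of the extended-Euclidean step used in strong-RSA
# reductions: from v^b = u^a (mod n) with gcd(a, b) = 1, recover u^{1/b}.
def egcd(a: int, b: int):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def bth_root(u: int, v: int, a: int, b: int, n: int) -> int:
    g, x, y = egcd(b, a)          # x*b + y*a == 1
    assert g == 1
    # (u^x * v^y)^b = u^{xb} * (v^b)^y = u^{xb + ay} = u  (mod n)
    return pow(u, x, n) * pow(v, y, n) % n

# toy instance: n = 11*17, u = 2; the Carmichael value lam = 80 is used
# only to fabricate a consistent v -- the solver itself never needs it
n, u, a, b, lam = 187, 2, 5, 3, 80
v = pow(u, a * pow(b, -1, lam) % lam, n)   # v^b == u^a (mod n)
r = bth_root(u, v, a, b, n)
assert pow(r, b, n) == u
```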
Table 1. The communication and computation costs of some classical retrieval schemes
computation cost of the exponential operation. Table 1 lists some classical searchable encryption schemes similar to ours.
From the table, we can see that our searchable encryption achieves a balance between diversified functionality and communication cost. The size of EDB stays at O(|DB|), similar to the scheme proposed by Xu et al. [24], and we also realize multi-client functionality without much additional computation cost.
6 Conclusion
We construct an efficient and practical multi-client symmetric searchable encryption scheme with a physical deletion property from the RSA function in the random oracle model, and prove its security using the strong RSA assumption and four attack lemmas. The scheme gives a general method to extend a single-reader searchable encryption scheme to multiple readers. We also present the detailed communication and computation costs of the proposed scheme, and show that it is more efficient than other classical schemes by comparing running times with some classical searchable encryption schemes in each phase.
References
1. Liu, J.K., Au, M.H., Susilo, W., et al.: Secure sharing and searching for real-time
video data in mobile cloud. IEEE Netw. 29(2), 46–50 (2015)
2. Baek, J., Vu, Q.H., Liu, J.K., et al.: A secure cloud computing based framework
for big data information management of smart grid. IEEE Trans. Cloud Comput.
3(2), 233–244 (2015)
3. Wang, S., Zhou, J., Liu, J.K., et al.: An efficient file hierarchy attribute-based
encryption scheme in cloud computing. IEEE TIFS 11(6), 1265–1277 (2016)
4. Wang, S., Liang, K., Liu, J.K., et al.: Attribute-based data sharing scheme revisited
in cloud computing. IEEE TIFS 11(8), 1661–1673 (2016)
5. Chor, B., Kushilevitz, E., Goldreich, O., et al.: Private information retrieval. J.
ACM 45, 965–981 (1998)
6. Golle, P., Staddon, J., Waters, B.: Secure conjunctive keyword search over
encrypted data. In: Jakobsson, M., Yung, M., Zhou, J. (eds.) ACNS 2004. LNCS,
vol. 3089, pp. 31–45. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-
540-24852-1 3
7. Liu, C., Zhu, L., Wang, M., et al.: Search pattern leakage in searchable encryption:
attacks and new construction. Inf. Sci. 265, 176–188 (2014)
8. Liu, J., Lai, J., Huang, X.: Dual trapdoor identity-based encryption with keyword
search. Soft. Comput. 21(10), 2599–2607 (2015)
9. Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Roşu, M.-C., Steiner, M.: Highly-
scalable searchable symmetric encryption with support for boolean queries. In:
Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 353–373.
Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4 20
10. Jarecki, S., Jutla, C., Krawczyk, H., et al.: Outsourced symmetric private informa-
tion retrieval. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer
and Communications Security, pp. 875–888. ACM, Berlin (2013)
11. Liang, K., Huang, X., Guo, F., Liu, J.K.: Privacy-preserving and regular language
search over encrypted cloud data. IEEE TIFS 11(10), 2365–2376 (2016)
12. Kasra Kermanshahi, S., Liu, J.K., Steinfeld, R.: Multi-user cloud-based secure key-
word search. In: Pieprzyk, J., Suriadi, S. (eds.) ACISP 2017. LNCS, vol. 10342, pp.
227–247. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60055-0 12
13. Yang, X., Lee, T.T., Liu, J.K., et al.: Trust enhancement over range search for
encrypted data. In: Trustcom, pp. 66–73. IEEE, New York (2016)
14. Zuo, C., Macindoe, J., Yang, S., et al.: Trusted Boolean search on cloud using
searchable symmetric encryption. In: Trustcom, pp. 113–120. IEEE, New York
(2016)
15. Liang, K., Su, C., Chen, J., Liu, J.K.: Efficient multi-function data sharing and
searching mechanism for cloud-based encrypted data. In: Proceedings of the 11th
ACM on Asia CCS, pp. 83–94. ACM (2016)
16. Curtmola, R., Garay, J., Ostrovsky, R.: Searchable symmetric encryption: improved
definitions and efficient constructions. In: CCS 2006, pp. 79–88. ACM, New York
(2006)
17. Bao, F., Deng, R.H., Ding, X., Yang, Y.: Private query on encrypted data in
multi-user settings. In: Chen, L., Mu, Y., Susilo, W. (eds.) ISPEC 2008. LNCS,
vol. 4991, pp. 71–85. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-
540-79104-1 6
18. Dong, C., Russello, G., Dulay, N.: Shared and searchable encrypted data for
untrusted servers. In: Atluri, V. (ed.) DBSec 2008. LNCS, vol. 5094, pp. 127–143.
Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70567-3 10
19. Wang, Q., Zhu, Y., Luo, X.: Multi-user searchable encryption with fine-grained
access control without key sharing. In: International Conference on Advanced Com-
puter Science Applications and Technologies, pp. 119–125. IEEE (2014)
20. Wang, S., Zhang, X., Zhang, Y.: Efficiently multi-user searchable encryption
scheme with attribute revocation and grant for cloud storage. PLoS ONE 11(11),
e0167157 (2016)
21. Wang, Y., Wang, J., Sun, S.-F., Liu, J.K., Susilo, W., Chen, X.: Towards multi-user
searchable encryption supporting Boolean query and fast decryption. In: Okamoto,
T., Yu, Y., Au, M.H., Li, Y. (eds.) ProvSec 2017. LNCS, vol. 10592, pp. 24–38.
Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68637-0 2
528 L. Xu et al.
22. Cui, H., Deng, R.H., Liu, J.K., Li, Y.: Attribute-based encryption with expressive
and authorized keyword search. In: Pieprzyk, J., Suriadi, S. (eds.) ACISP 2017.
LNCS, vol. 10342, pp. 106–126. Springer, Cham (2017). https://doi.org/10.1007/
978-3-319-60055-0 6
23. Van Rompay, C., Molva, R., Önen, M.: Multi-user searchable encryption in the
cloud. In: Lopez, J., Mitchell, C.J. (eds.) ISC 2015. LNCS, vol. 9290, pp. 299–316.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23318-5 17
24. Xu, P., Liang, S., Wang, W., Susilo, W., Wu, Q., Jin, H.: Dynamic searchable
symmetric encryption with physical deletion and small leakage. In: Pieprzyk, J.,
Suriadi, S. (eds.) ACISP 2017. LNCS, vol. 10342, pp. 207–226. Springer, Cham
(2017). https://doi.org/10.1007/978-3-319-60055-0 11
25. Sun, S.-F., Liu, J.K., Sakzad, A., Steinfeld, R., Yuen, T.H.: An efficient non-
interactive multi-client searchable encryption with support for Boolean queries.
In: Askoxylakis, I., Ioannidis, S., Katsikas, S., Meadows, C. (eds.) ESORICS 2016.
LNCS, vol. 9878, pp. 154–172. Springer, Cham (2016). https://doi.org/10.1007/
978-3-319-45744-4 8
High-Performance Symmetric
Cryptography Server with GPU
Acceleration
Abstract. With more and more sensitive and private data transferred on the Internet, various security protocols have been developed to secure end-to-end communication. In practice, however, applying these protocols degrades the overall performance of the system, with the frequently used symmetric cryptographic operations on the server side as the bottleneck. In this contribution, we present a high-performance symmetric cryptography server. Firstly, the symmetric algorithm SM4 is carefully scheduled on GPUs, including an instruction-level implementation and variable-location improvements. Secondly, optimization methods are provided to speed up the inefficient data transfer between CPU and GPU. Finally, the overall server architecture is adapted for mass data encryption, delivering 15.96 Gbps of data encryption over the network, 1.23 times that of the fastest existing symmetric cryptographic server. Furthermore, the server can be boosted by 2.02 times with a high-speed pre-calculation technique for long-term-key applications such as IPSec VPN gateways.
1 Introduction
Cloud computing, e-commerce, online banking, and other Internet services are developing rapidly, and more and more sensitive and private data is transferred on the
W. Cheng—This work was partially supported by the National 973 Program of China
under award No. 2014CB340603 and the National Cryptography Development Fund
under award No. MMJJ20170213.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 529–540, 2018.
https://doi.org/10.1007/978-3-319-89500-0_46
530 W. Cheng et al.
Internet. SSL, TLS, IPSec, and other security protocols have been developed to support secure and reliable end-to-end communication. Under these protocols, servers and clients first use key exchange protocols to negotiate a symmetric key, and then use the symmetric key to encrypt the communication content. From 2011 to 2015, the total amount of global data increased more than tenfold (from 0.7 ZB to 8.6 ZB), while CPU performance grew only about threefold; CPU development can hardly meet the expanding demand.
In order to satisfy the need for symmetric computing power, we present a high-performance symmetric cryptography server. Our server can be deployed in cloud computing, database encryption, end-to-end encrypted communication, and other applications. It works as a proxy and offloads the complex and onerous symmetric computation from the original server.
To build the high-performance symmetric cryptography server, we implemented a high-speed SM4 kernel, used a GPU card as an SM4 algorithm accelerator, and optimized the network service based on the characteristics of the GPU card. Our work progressively optimizes the following three aspects.
1. Speeding up the SM4 kernel. We used CUDA's PTX (Parallel Thread Execution) instructions to implement the bitwise exclusive OR and circular shift operations in the algorithm. At the same time, we adjusted the order of the plain-text in global memory and modified its access method to increase the plain-text access rate. The S-Box and round keys were also carefully arranged. On an NVIDIA GeForce GTX 1080, our SM4 kernel is able to encrypt 535.68 Gbit of data per second, 12.59 times the speed of the fastest existing SM4 implementation by Martínez-Herrera et al. [12].
2. Enhancing the overall throughput of the GPU card. When GPUs are used as accelerators, data transfer between GPUs and CPUs limits the overall throughput. We took advantage of multi-stream parallelism and overlapped data transfer with the calculation process. Finally, the overall throughput of the GPU card is enhanced to 76.89 Gbps, and our optimization methods use 85.4% of the GTX 1080's PCI-E bandwidth.
3. Optimizing the network service. Because the GPU card performs weakly with a single thread and strongly with multiple threads, we designed a queue to cache network service requests, so that the GPU card can handle multiple encryption requests in the queue at the same time. The server's peak throughput over the network is 15.96 Gbps. For the case of a single key, as in large IPSec VPN gateways, we designed a memory management framework and used a pre-calculation technique to decrease data copy operations and reduce the degree of coupling. With these optimizations, the server's peak throughput reaches 32.23 Gbps, 2.48 times faster than SSLShader [6].
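The request queue in the third item can be modeled roughly as follows (our own simplification with invented names; on the real server each drained batch becomes one GPU kernel launch):

```python
# Toy model (ours) of batching network requests for a many-thread GPU:
# requests accumulate while the GPU is busy, and each pass drains the
# whole queue and processes it as a single batch.
from collections import deque

queue = deque()

def submit(request: bytes) -> None:
    queue.append(request)

def gpu_batch_encrypt(encrypt_one):
    """Drain all pending requests and process them together."""
    batch = []
    while queue:
        batch.append(queue.popleft())
    # the real server hands `batch` to the GPU as one kernel launch
    return [encrypt_one(r) for r in batch]

for i in range(5):
    submit(bytes([i]) * 16)
ciphertexts = gpu_batch_encrypt(lambda r: bytes(x ^ 0xAA for x in r))
assert len(ciphertexts) == 5 and not queue
```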
2 Related Work
The SM4 algorithm is the Chinese standard symmetric cipher for data protection; it was first declassified in 2006 and standardized in 2012. Researchers have done a great deal of work on its security and on attack methods [2,3]. The Chinese government is also vigorously promoting the SM4 algorithm as a standard for data protection. However, the heavy cryptographic computation of SM4 limits its application scenarios, and there are few studies addressing this problem.
Originally dedicated graphics processors, GPUs were long confined to the field of computer graphics [1]. After CUDA was introduced, GPUs became widely used for general-purpose computing, and many symmetric algorithms were scheduled on GPUs for better performance. Manavski et al. [11] took the lead in using CUDA to accelerate AES; they followed the Rijndael reference implementation and identified the optimal location for storing the T-tables to enhance benchmark performance, and their work was followed by most teams. Harrison et al. completed AES-CTR on CUDA-enabled GPUs [5]; while optimizing the algorithm's calculation rate, this work also contributed guidance on scheduling serial and parallel execution of block ciphers on GPUs. On a Tesla P100, Nishikawa et al. [13] increased the speed of AES-ECB to 605.9 Gbps. Beyond AES, the optimization of other symmetric algorithms on GPUs has also been extensively studied, including DES [10], Blowfish, IDEA, CAST-5, Camellia [4], and MD5-RC4 [8].
In addition to symmetric cryptography, GPUs have also been widely applied to asymmetric cryptography, including RSA [6,16] and ECC [17]. GPU-based cryptographic servers are also growing rapidly. Using GPUs as general-purpose SSL accelerators, SSLShader accelerated SSL cryptographic operations, handling 29K SSL TPS and achieving 13 Gbps of bulk encryption throughput [6]. Guess [15] was dedicated equipment for signature generation and verification, capable of 8.71 × 10^6 operations per second (OPS) for signature generation or 9.29 × 10^5 OPS for verification.
3 Background
Compared with CPUs, modern GPUs devote most of their transistors to arithmetic processing units, with much less data cache and flow control. This unique hardware design is specialized for manipulating computer graphics, and it is also well suited to compute-intensive operations and highly parallel computation. CUDA is a parallel computing platform created and first introduced by NVIDIA.
[Figure: the GPU's streaming multiprocessors (each with CUDA cores, registers, and shared memory), the constant and texture memory caches, and global memory, attached to the host CPU and its memory.]
Fig. 1. NVIDIA GeForce GTX 1080's architecture and how it works in the system
4 Implementation Architecture
In this work, our main contributions are in these aspects: (i) algorithm-level optimizations for scheduling SM4 encryption on GPUs; (ii) optimizations for data transfer between CPUs and GPUs; (iii) optimizing the overall performance of the server.
For the SM4 kernel, naively porting the CPU algorithm to a GPU would waste most GPU computational resources and degrade server performance. In this part, we describe our approaches and design choices to maximize the performance of the SM4 kernel on GPUs; the key lies in the rational use of GPU storage resources and in reducing the consumption of GPU computational resources.
The plain-text of SM4 encryption is divided into four 32-bit blocks. With the corresponding round key, the round function uses these four blocks to generate a new 32-bit block. The generated block and the last three blocks are the input to the next round function, and the first block will never be used again. In order to reduce register usage, we use the first block of the original input to store the generated block. SM4 encryption consists of 32 round functions, and the entire encryption process originally required 36 32-bit registers. With this trick, only four 32-bit registers are used in the entire process.
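The register-reuse trick can be sketched as follows (our own Python model, not the paper's CUDA code; F is a placeholder for SM4's real T-transformation, but the four-variable rotation pattern is the point):

```python
# Sketch (ours) of the 4-register trick: each round produces one new
# 32-bit word from (x0, x1, x2, x3, rk), and the oldest word x0 is never
# needed again, so the new word overwrites it.
MASK = 0xFFFFFFFF

def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK

def F(x0, x1, x2, x3, rk):
    # placeholder round function; real SM4 applies the S-Box and a linear
    # transform to (x1 ^ x2 ^ x3 ^ rk) before XORing with x0
    return x0 ^ rotl32((x1 ^ x2 ^ x3 ^ rk) * 0x9E3779B9 & MASK, 13)

def encrypt_block(x, round_keys):
    x0, x1, x2, x3 = x                 # the only four "registers" we need
    for rk in round_keys:
        x0, x1, x2, x3 = x1, x2, x3, F(x0, x1, x2, x3, rk)  # overwrite oldest
    return [x3, x2, x1, x0]            # SM4 outputs the final state reversed

pt = [0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210]
rks = list(range(32))
ct = encrypt_block(pt, rks)
assert encrypt_block(ct, rks[::-1]) == pt      # reversed keys invert
```

Because the structure is a generalized Feistel network, running the same routine with the round keys reversed decrypts, which gives a convenient sanity check even with a placeholder round function.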
We then optimized the implementation of the round function. The round function carries out bitwise exclusive ORs, left circular shifts, and non-linear permutations on its 32-bit inputs. The PTX instruction xor.b32 is used to compute the bitwise exclusive OR of two 32-bit units. The left circular shift is more complicated to implement; we used the three instructions shl.b32, shr.b32, and or.b32.
For the non-linear permutations, we build a 256-byte S-Box; the S-Box must be queried four times to implement the non-linear permutation of a 32-bit block, so we carefully considered its arrangement. The naive way is to store the S-Box in global memory, but global memory's access rate is slow, which degrades SM4 kernel performance. Besides global memory, GPUs have 32-bit registers, constant memory, and shared memory, which can also store the S-Box. Through comparative tests, we find that the SM4 kernel performs best when the S-Box is stored in constant memory. GPUs have a special on-chip cache for constant memory; after data in constant memory has been loaded into the on-chip cache, the program gets the data directly from the cache rather than from device memory. Registers are also an option in certain circumstances: compared with constant memory, an SM4 kernel that stores the S-Box in registers resists the key-recovery timing attack of [7], although it compromises on performance.
After optimizing the round function, we also need a reasonable arrangement of the data for the SM4 kernel. During the encryption process, there are three kinds of data: plain-text, round keys, and the S-Box. As mentioned above, the S-Box is stored in constant memory. The arrangement of round keys varies with the application. When the application uses multiple keys, the round keys differ per GPU thread, so each thread must derive its own round keys and store them in global memory. When only a single key is used, we use the CPU to derive the round keys in advance and store them in shared memory for faster access. The plain-text is stored in global memory, because the other storage spaces on GPUs are not suitable. Registers are very limited, and normally
the plain-text is quite large; using the 64K 32-bit registers to store plain-text degrades the SM4 kernel. Every GPU thread accesses different plain-text, so shared memory is not a good choice; and plain-text is not a constant value, so constant memory is not suitable either.
Since plain-text is stored in global memory, which has lower bandwidth than the other memory spaces, we then optimized the inefficient access to the plain-text in global memory. We used coalesced access to improve access efficiency. Coalesced access needs two conditions: (i) the size of the data accessed by a thread each time must be 4, 8, or 16 bytes; (ii) the memory space accessed by a warp must be contiguous. We used the INT, INT2, and INT4 data types provided by the CUDA API to access global memory; INT, INT2, and INT4 access 4, 8, and 16 bytes of global memory respectively. To meet the second condition, we adjust the order of the data blocks, and every GPU thread accesses global memory in a special pattern. Under normal circumstances, each piece of plain-text encrypted by the same key is processed by one GPU thread, and multiple pieces are combined into one data block and then transferred to the GPU's global memory. In this case, every thread accesses a contiguous memory space, but each time a warp performs a memory access it touches multiple discrete data fragments, as shown in Fig. 2(a). To benefit from coalesced access, we adjust the order of the data blocks. First, every piece of plain-text is split into several blocks; the block size depends on the type used to access global memory, with INT, INT2, and INT4 corresponding to 4-byte, 8-byte, and 16-byte blocks respectively. Second, we rearrange these blocks so that the first blocks of all pieces are combined in their natural order, and likewise for the other blocks, as shown in Fig. 2(b). Each time a GPU thread accesses global memory, the next hop of its pointer must also point to the corresponding block. Under this arrangement, the memory space accessed by a warp is contiguous, and each thread processes its own data.
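The rearrangement can be modeled on the host side roughly as follows (hypothetical helper names; in the real system the layout is chosen so that thread t of a warp reads block j of piece t from consecutive addresses):

```python
# Sketch (ours) of the block-major data layout from Fig. 2(b): each piece
# of plain-text is split into fixed-size blocks, and the first blocks of
# all pieces are stored together, then all second blocks, and so on.
def interleave(pieces, block_size):
    """pieces: equal-length byte strings, one per GPU thread."""
    nblocks = len(pieces[0]) // block_size
    out = bytearray()
    for j in range(nblocks):                 # block index (the "hop")
        for p in pieces:                     # thread index within the warp
            out += p[j * block_size:(j + 1) * block_size]
    return bytes(out)

def deinterleave(buf, npieces, block_size):
    """Invert interleave(), recovering the per-thread pieces."""
    nblocks = len(buf) // (npieces * block_size)
    pieces = [bytearray() for _ in range(npieces)]
    for j in range(nblocks):
        for t in range(npieces):
            off = (j * npieces + t) * block_size
            pieces[t] += buf[off:off + block_size]
    return [bytes(p) for p in pieces]

pieces = [bytes([i]) * 32 for i in range(8)]
assert deinterleave(interleave(pieces, 16), 8, 16) == pieces
```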
device memory, which is accessed via 32-, 64-, or 128-byte memory transactions [14]. When a thread accesses global memory, the data must first be read into the cache by generating 32-, 64-, or 128-byte memory transactions, and the thread then gets the data from the cache. As the size of the memory transactions is fixed, more data than actually demanded will be cached. When the conditions of coalesced access are met, most threads get their data directly from the cache without generating additional memory transactions. The results also show that INT4 is the best choice.

Finally, we set the SM4 kernel's parameters to 20 blocks with 1024 GPU threads per block, so that the SM4 kernel maximizes the use of the GPU's computational resources.
To achieve the highest throughput of the GPU card, the parameters of the SM4 kernel are changed. We reduce the threads per block from 1024 to 512, every stream
holds only one block, and the stream number is 10. This is because our SM4 kernel is much faster than the data transfer; reducing the number of blocks and the number of threads per block improves the efficiency of parallel execution among the streams.

After our optimization, the throughput of the GPU card reaches 76.89 Gbps. Compared with the SM4 kernel, this is much slower. The test tool provided by the NVIDIA CUDA Toolkit shows that the PCI-E bandwidth of our target platform (GeForce GTX 1080) is about 90 Gbps, so our optimization uses 85.4% of the bandwidth, which is satisfactory.
In the single-key case, we have further optimizations. Based on the feature that CTR mode supports pre-calculation, we use the GPU to generate cipher streams and store them in host memory. When the CPU receives a service request,
it simply encrypts the plain-text with the cipher streams and sends the cipher-text back. In the multiple-key case, before plain-text is encrypted it is copied three times: pushed into the queue, popped out of the queue, and copied to a piece of contiguous pinned host memory. In the single-key case, with our optimizations, plain-text is encrypted directly with no other copy operation. At the same time, the CPU and GPU work independently, so the system's degree of coupling is greatly reduced. It is worth mentioning that this technique can be applied to enhance the overall performance of GPU servers for other symmetric algorithms in counter mode, such as AES-CTR and DES-CTR.
Figure 4 shows how our optimizations work in detail. First, we designed a memory management framework. In the framework, there is a pinned memory block array; each pinned memory block is pre-allocated and specially designed to store 512 × 16 bytes of IVs and the generated cipher streams. Every block has a unique ID, and all these IVs and cipher streams can be accessed by querying the ID. In the initialization process, the CPU fills the IVs with random numbers and pushes the IDs into the empty ID pool. After initialization is completed, the GPU begins to generate cipher streams: it pulls IDs from the empty ID pool and uses the IVs in the corresponding blocks to generate cipher streams. When the cipher streams are filled, these IDs are pushed into the full ID pool. While the GPU is generating cipher streams, the CPU gets an ID from the full ID pool and deals with network requests. When it receives a request, it generates the cipher-text with the cipher streams and sends the encrypted data back. If the cipher streams are used up, the CPU refills the IVs, pushes the ID back to the empty ID pool, and gets another ID from the full ID pool.
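The pool workflow can be sketched as follows (our own simplification with invented names; a SHA-256 counter stands in for the real SM4-CTR cipher stream, and in the real server the producer and consumer run concurrently):

```python
# Sketch (ours) of the pre-calculation workflow of Fig. 4: a "GPU"
# producer fills keystream buffers for IDs taken from the empty pool,
# and the "CPU" consumer pops a full ID and merely XORs request data
# with the ready keystream.
import hashlib
from collections import deque

BLOCK = 512 * 16                       # bytes of keystream per pool entry

def keystream(iv: bytes, nbytes: int) -> bytes:
    out = bytearray()
    ctr = 0
    while len(out) < nbytes:
        out += hashlib.sha256(iv + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:nbytes])

class StreamPool:
    def __init__(self, nblocks):
        self.iv = {i: i.to_bytes(16, "big") for i in range(nblocks)}
        self.buf = {}
        self.empty, self.full = deque(range(nblocks)), deque()

    def gpu_fill(self):                # producer: runs while the CPU serves
        while self.empty:
            i = self.empty.popleft()
            self.buf[i] = keystream(self.iv[i], BLOCK)
            self.full.append(i)

    def cpu_encrypt(self, plaintext: bytes) -> bytes:
        i = self.full.popleft()        # real code blocks if no stream is ready
        ks = self.buf[i]
        self.empty.append(i)           # real code refills the IV before reuse
        return bytes(p ^ k for p, k in zip(plaintext, ks))

pool = StreamPool(4)
pool.gpu_fill()
ct = pool.cpu_encrypt(b"example plaintext!")
```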
5 Performance Assessment
In this section, we evaluate our SM4 kernel and compare it with other implementations. We also test the capabilities of our symmetric cryptography server and compare it with other servers. Our server platform was equipped with one Intel E5-2699 v3 CPU, 8 GB of memory, and one NVIDIA GTX 1080 card.
[Figure: server throughput (Gbps) versus TCP connection number, with curves for ECB, CBC, and CTR modes: (a) multiple keys, 64-byte packets; (b) multiple keys, 1440-byte packets; (c) multiple keys, 4096-byte packets; (d) single key under CTR mode, for 64-, 1440-, and 4096-byte packets.]
6 Conclusion
In this work, we have presented how to use GPUs to accelerate SM4 encryption. On a GeForce GTX 1080, the speed of our SM4 kernel reaches 535.68 Gbps, 12.59 times faster than the fastest existing SM4 implementation. We have sped up the inefficient data transfer between GPUs and CPUs, and our optimization uses 85.4% of the GTX 1080's PCI-E bandwidth. We have also shown the potential of GPUs for enhancing the performance of symmetric cryptography servers. Our symmetric cryptography server is capable of providing data encryption under ECB, CBC, and CTR modes. For the case of a single long-term key, as in IPSec VPN gateways, the throughput of our server reaches 32.23 Gbps, 2.48 times faster than SSLShader [6].

For the SM4 symmetric algorithm, the kernel rate is much higher than the data transfer rate. While the GPU performs SM4 operations, other algorithms could run at the same time to avoid wasting the GPU's computational resources. The inefficiency of the Linux TCP/IP stack limits the potential of our server, and Intel's DPDK seems to be a possible solution. These issues are left for future work.
References
1. Bolz, J., Farmer, I., Grinspun, E., Schröder, P.: Sparse matrix solvers on the
GPU: conjugate gradients and multigrid. ACM Trans. Graph. (TOG) 22(3), 917–
924 (2003)
2. Erickson, J., Ding, J., Christensen, C.: Algebraic cryptanalysis of SMS4: Gröbner
basis attack and SAT attack compared. In: Lee, D., Hong, S. (eds.) ICISC 2009.
LNCS, vol. 5984, pp. 73–86. Springer, Heidelberg (2010). https://doi.org/10.1007/
978-3-642-14423-3 6
3. Etrog, J., Robshaw, M.J.B.: The cryptanalysis of reduced-round SMS4. In: Avanzi,
R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 51–65. Springer,
Heidelberg (2009). https://doi.org/10.1007/978-3-642-04159-4 4
4. Gilger, J., Barnickel, J., Meyer, U.: GPU-acceleration of block ciphers in the
OpenSSL cryptographic library. In: Gollmann, D., Freiling, F.C. (eds.) ISC 2012.
LNCS, vol. 7483, pp. 338–353. Springer, Heidelberg (2012). https://doi.org/10.
1007/978-3-642-33383-5 21
5. Harrison, O., Waldron, J.: Practical symmetric key cryptography on modern graph-
ics hardware. In: USENIX Security Symposium, vol. 2008 (2008)
6. Jang, K., Han, S., Han, S., Moon, S.B., Park, K.: SSLShader: cheap SSL acceler-
ation with commodity processors. In: NSDI (2011)
7. Jiang, Z.H., Fei, Y., Kaeli, D.: A complete key recovery timing attack on a GPU. In:
2016 IEEE International Symposium on High Performance Computer Architecture
(HPCA), pp. 394–405. IEEE (2016)
8. Li, C., Wu, H., Chen, S., Li, X., Guo, D.: Efficient implementation for MD5-RC4
encryption using GPU with CUDA. In: 3rd International Conference on Anti-
counterfeiting, Security, and Identification in Communication. ASID 2009, pp. 167–
170. IEEE (2009)
9. Liu, F., Ji, W., Hu, L., Ding, J., Lv, S., Pyshkin, A., Weinmann, R.-P.: Analysis
of the SMS4 block cipher. In: Pieprzyk, J., Ghodosi, H., Dawson, E. (eds.) ACISP
2007. LNCS, vol. 4586, pp. 158–170. Springer, Heidelberg (2007). https://doi.org/
10.1007/978-3-540-73458-1 13
10. Luken, B.P., Ouyang, M., Desoky, A.H.: AES and DES encryption with GPU. In:
ISCA PDCCS, pp. 67–70 (2009)
11. Manavski, S.A.: CUDA compatible GPU as an efficient hardware accelerator for
AES cryptography. In: IEEE International Conference on Signal Processing and
Communications. ICSPC 2007, pp. 65–68. IEEE (2007)
12. Martı́nez-Herrera, A.F., Mancillas-López, C., Mex-Perera, C.: GCM implementa-
tions of Camellia-128 and SMS4 by optimizing the polynomial multiplier. Micro-
process. Microsyst. 45, 129–140 (2016)
13. Nishikawa, N., Amano, H., Iwai, K.: Implementation of bitsliced AES encryption
on CUDA-enabled GPU. In: Yan, Z., Molva, R., Mazurczyk, W., Kantola, R. (eds.)
NSS 2017. LNCS, vol. 10394, pp. 273–287. Springer, Cham (2017). https://doi.org/
10.1007/978-3-319-64701-2 20
14. NVIDIA. CUDA C Programming Guide 8.0 (2017). http://docs.nvidia.com/cuda/
cuda-c-programming-guide/index.html#introduction
15. Pan, W., Zheng, F., Zhao, Y., Zhu, W.-T., Jing, J.: An efficient elliptic curve
cryptography signature server with GPU acceleration. IEEE Trans. Inf. Forensics
Secur. 12(1), 111–122 (2017)
16. Zheng, F., Pan, W., Lin, J., Jing, J., Zhao, Y.: Exploiting the floating-point com-
puting power of GPUs for RSA. In: Chow, S.S.M., Camenisch, J., Hui, L.C.K.,
Yiu, S.M. (eds.) ISC 2014. LNCS, vol. 8783, pp. 198–215. Springer, Cham (2014).
https://doi.org/10.1007/978-3-319-13257-0 12
17. Zheng, F., Pan, W., Lin, J., Jing, J., Zhao, Y.: Exploiting the potential of GPUs
for modular multiplication in ECC. In: Rhee, K.-H., Yi, J.H. (eds.) WISA 2014.
LNCS, vol. 8909, pp. 295–306. Springer, Cham (2015). https://doi.org/10.1007/
978-3-319-15087-1 23
An Experimental Study of Kannan’s
Embedding Technique for the Search
LWE Problem
1 Introduction
Nowadays many post-quantum cryptographic schemes, such as fully homomorphic encryption and lattice-based signature schemes, base their security on some
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 541–553, 2018.
https://doi.org/10.1007/978-3-319-89500-0_47
542 Y. Wang et al.
hard lattice problems such as the learning with errors (LWE) problem, the short integer solution (SIS) problem, and so on [9,16,18]. The LWE problem was introduced by Regev in 2005; it derives from "learning parity with noise" by lifting the modulus and making the probability distribution of the "error" concrete [18]. As an average-case lattice problem, LWE is proven to be as hard as certain worst-case lattice problems such as GapSVP and SIVP [18], which allows one to build many provably secure lattice-based cryptographic schemes. The hardness of LWE is related to three critical parameters: the length n of the secret vector, the modulus q, and the deviation σ of the error vectors. Some theoretical analyses of the hardness of LWE have been given for lattice-based attacks [14,15] and BKW-type attacks [11], but concrete parameters based on experiments have rarely been published. For practical applications, however, it is indispensable to estimate concrete LWE parameters from sufficient experiments. In this work, we focus on the more practical lattice-based attack. First, the LWE problem can be seen as a particular bounded distance decoding (BDD) instance on a q-ary lattice: for a given lattice and a target vector within a reasonable bound of the lattice points, BDD asks for the lattice vector closest to the target.
There are two main methods for processing a BDD instance. One is to reduce the
lattice basis first and then search for the secret vector with Babai’s NearestPlane
algorithm [4] or its variants [14,15]. In particular, in [14], Liu and Nguyen
intermingle the short error vector with an enumeration search tree, which makes the
attack more efficient. The other procedure is to reduce BDD to the unique shortest
vector problem (unique-SVP) by Kannan’s embedding technique [10]. This procedure
increases the lattice dimension by one, adding the target vector and a so-called
embedding factor M to the new basis. With a proper parameter setting, the short
error vector is usually a component of the shortest vector in the new lattice, so
there is a big gap between the shortest vector and the second shortest vector in the
new lattice, which lets a lattice reduction algorithm or a search algorithm find the
shortest one more efficiently. Since both methods call an SVP solver as a subroutine,
their complexity grows exponentially as the dimension increases.
In order to assess the hardness of the LWE problem in practice, TU Darmstadt, in
alliance with UC San Diego, published a new platform, the “Darmstadt LWE
Challenge” [5,21]. The LWE Challenge provides LWE samples of increasing hardness
for researchers to test their solving algorithms.
In this work, we apply the embedding technique to the LWE problem, using the
state-of-the-art progressive BKZ algorithm [2]. The LWE instances used in our
experiments are sampled from the Darmstadt LWE Challenge. From our experiments,
we find that the algorithm is more efficient when the embedding factor M is closer
to 1. We also give a preliminary analysis of the proper dimension m of LWE samples
to use in the attack, as a function of the secret length n. In particular, for
n ≥ 55 and fixed σ/q = 0.005, our implementation of the embedding technique with
progressive BKZ is more efficient than Xu et al.’s implementation of the
enumeration algorithm in [21,23]. Finally, we obtained the record for the case
(70, 0.005) in the Darmstadt LWE Challenge using our extrapolated setting of m,
which took 32.73 single-core hours.
Roadmap. Section 2 recalls notation and background on lattices, the LWE problem,
and BKZ reduction algorithms. We introduce Kannan’s embedding technique in
Sect. 3. Our experimental results and a preliminary analysis of the relevant
parameter settings in Kannan’s embedding technique are shown in Sect. 4. Finally,
we give some conclusions in Sect. 5.
2 Preliminaries
Note that our definition of rHF depends on the given basis and on the output
(b1 , . . . , bn ) of the lattice algorithm.
Gaussian Heuristic. Given an n-dimensional lattice L and a continuous (and
usually convex) set S ⊂ R^n, the Gaussian heuristic estimates that the number of
points in S ∩ L is approximately vol(S)/vol(L).
λ1 (L) ≈ (Γ(n/2 + 1) · vol(L))^(1/n) / √π
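As a quick numerical check, the estimate above is easy to compute directly (a small illustrative sketch; the function name is ours):

```python
import math

def gaussian_heuristic(n: int, volume: float) -> float:
    """lambda_1(L) ~ (Gamma(n/2 + 1) * vol(L))^(1/n) / sqrt(pi)."""
    return (math.gamma(n / 2 + 1) * volume) ** (1.0 / n) / math.sqrt(math.pi)

# e.g. a 100-dimensional q-ary lattice of volume q^(m-n) = 2^100:
lam1 = gaussian_heuristic(100, 2.0 ** 100)
```

For growing n, the Γ-term makes the estimate grow like √(n/(2πe)) times the normalized volume vol(L)^(1/n).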
There are four parameters in the LWE problem: the number of samples m ∈ Z, the
length n ∈ Z of the secret vector, the modulus q ∈ Z, and the standard deviation
σ ∈ R>0 of the discrete Gaussian distribution (denoted by Dσ) on Z. Uniformly
sample a matrix A ∈ Zq^(m×n) and a secret vector s ∈ Zq^n, and sample a relatively
small perturbation vector e ∈ Zq^m from the Gaussian distribution Dσ. The LWE
distribution Ψ is constructed from pairs (A, b ≡ As + e (mod q)) ∈ (Zq^(m×n), Zq^m)
sampled as above. The search LWE problem is, given a pair (A, b) sampled
from the LWE distribution Ψ, to compute the pair (s, e).
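The definitions above can be made concrete with a toy sampler for Ψ (an illustrative sketch; the parameter values are ours and far below cryptographic size, and a rounded continuous Gaussian stands in for the discrete Gaussian Dσ):

```python
import random

def sample_lwe(m, n, q, sigma, rng):
    """Sample (A, s, e, b) with b = A s + e (mod q)."""
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    s = [rng.randrange(q) for _ in range(n)]
    # rounded continuous Gaussian as a stand-in for the discrete Gaussian D_sigma
    e = [round(rng.gauss(0, sigma)) for _ in range(m)]
    b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]
    return A, s, e, b

A, s, e, b = sample_lwe(m=10, n=4, q=97, sigma=0.005 * 97, rng=random.Random(1))
# The search LWE problem: given only (A, b), recover (s, e).
```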
L(A,q) = {y ∈ Zq^m | y ≡ Ax (mod q) for some x ∈ Z^n}
The volume of the derived q-ary lattice is q^(m−n), which will be used in Sect. 3.4.
We give explanations for each step of Algorithm 1 below.
Step 1. We follow the method in Sect. 2.2.3 to construct and compute the
HNF basis BHNF of the q-ary lattice L(A,q) = {v ∈ Z^m | v ≡ Ax (mod q), x ∈ Z^n}.
Step 2. This step is the key point of the embedding technique: expand the q-ary
basis BHNF ∈ Z^(m×m) by one dimension, embedding the target vector b and
an embedding factor M into the new basis B′ ∈ Z^((m+1)×(m+1)).
Step 3. At this step, we process the new basis B′ with lattice algorithms. After
the reduction, we obtain the error vector e from the output shortest vector w,
since e = b − BHNF·u and w = B′·(u, 1)ᵀ = (e, M)ᵀ for some u ∈ Z^m. In our work,
we use the progressive BKZ reduction at this step [2].
Step 4. Simply obtain the secret vector s by Gaussian elimination.
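Steps 1–2 can be sketched as follows. For brevity we work with a generating set of the q-ary lattice rather than its HNF basis, and verify that the short vector (e, M) indeed lies in the embedded lattice (an illustrative sketch with toy parameters of our choosing):

```python
import random

rng = random.Random(7)
m, n, q, M = 8, 3, 97, 3

# A toy LWE instance b = A s + e (mod q) with a small error vector e.
A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
s = [rng.randrange(q) for _ in range(n)]
e = [rng.choice([-2, -1, 0, 1, 2]) for _ in range(m)]
As = [sum(A[i][j] * s[j] for j in range(n)) for i in range(m)]
b = [(As[i] + e[i]) % q for i in range(m)]

# Steps 1-2: generators of L(A,q) (columns of A and q*unit vectors),
# extended by one dimension with the embedded row (b, M).
gens = [[A[i][j] for i in range(m)] + [0] for j in range(n)]
gens += [[q if i == k else 0 for i in range(m)] + [0] for k in range(m)]
gens.append(b + [M])

# The short vector (e, M) is an integer combination of the generators:
c = [(b[i] - e[i] - As[i]) // q for i in range(m)]  # exact by construction
w = list(gens[-1])
for j in range(n):
    w = [w[i] - s[j] * gens[j][i] for i in range(m + 1)]
for k in range(m):
    w = [w[i] - c[k] * gens[n + k][i] for i in range(m + 1)]
assert w == e + [M]  # the short vector the reduction step should recover
```

In Step 3, a real attack hands the (m+1)-dimensional basis to a reduction algorithm such as progressive BKZ, which is expected to output this w as its shortest vector.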
(1) In the embedding procedure of Step 3, if the output vector w of the lattice
algorithm satisfies

‖w‖ ≤ √(‖e‖² + M²) ≈ √(2m)·σ / (M·q^(m−n))^(1/(m+1)),    (1)

here ‖e‖ ≈ √m·σ, then the answer is correct with high probability.
(2) There is a gap between the shortest vector and the linearly independent
second shortest vector in L(B′); namely, we have to solve a unique-SVP instance in
this lattice. The size of the embedding factor M can affect the gap in some sense,
and we will discuss it in Sect. 3.3.
(3) Since we do not know the exact value of ‖e‖, we cannot terminate by
condition (1). ‖w‖ ≤ √(‖e‖² + M²) is the condition for a reduction or point
suggest that the choice for the embedding factor M ∈ N is ‖e‖. If M is bigger,
there is a lower chance of solving the LWE problem, since the gap in the
unique-SVP instance becomes smaller. However, if M is too small, there may exist
a vector v ∈ L(B′) such that ‖v + c·(b, M)ᵀ‖ < ‖w‖ = ‖(e, M)ᵀ‖ for some c ∈ Z,
according to [1]. In our experiments we observe the runtime of the attack with M
increasing from 1. A proper choice of M enlarges the gap in the γ-unique-SVP
instance transformed from BDD to within a reduction algorithm’s capability,
estimated by the root Hermite factor

δ = rHF(b1 , . . . , bn ) = (‖b1‖ / vol(L)^(1/n))^(1/n).
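The root Hermite factor above is straightforward to evaluate for a given reduced basis (a small sketch; the inputs are placeholders for values taken from an actual reduction):

```python
def root_hermite_factor(b1_norm: float, volume: float, n: int) -> float:
    """delta = (||b1|| / vol(L)^(1/n))^(1/n)."""
    return (b1_norm / volume ** (1.0 / n)) ** (1.0 / n)

# An ideally reduced 100-dimensional basis with ||b1|| = vol^(1/100) gives
# delta = 1; practical BKZ reductions land around delta = 1.010..1.025
# (cf. the delta range used in Table 1).
delta = root_hermite_factor(b1_norm=4.0, volume=2.0 ** 200, n=100)
```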
Fig. 1. Runtime for cases (n, α) with fixed bases and increasing embedding factor M .
and plot it in Fig. 2. The fitted relation between the optimal m and n is the linear function
Table 1. Experimental runtime for each (n, α = 0.005) case with parameter δ in the
range [1.010, 1.011, . . . , 1.025].
Fig. 2. The runtime of the embedding technique on the Darmstadt LWE Challenge (n, α =
0.005) cases: the stars and the solid curve denote our average experimental runtime
for the optimal m and its fitting, respectively; the dotted line is the fitting of the
smallest runtime for each optimal-m case, which is heuristically regarded as the lower
bound in our work; the red crosses are Xu et al.’s records at the LWE Challenge
website; the hollow circles are our experimental results for (n, α) = (70, 0.005).
(Color figure online)
and we plot the fitting line in Fig. 2. Note that we take quadratic formulas for the
estimations in (4) and (6), since the state-of-the-art extreme pruning enumeration
runs in time 2^(O(n²)−0.5n) as the subroutine of progressive BKZ.
Furthermore, in Table 2, we estimate the necessary dimension m and the
relevant runtime of the embedding technique for solving the LWE Challenge cases
n ≥ 75, α = 0.005, using the progressive BKZ algorithm. Our estimation depends on
the fitting functions (4) and (5).
Moreover, from Fig. 2 we can see that Xu et al.’s LWE Challenge records for
α = 0.005 stopped at n = 65 because of the overwhelming runtime and low success
probability [22]. Our implementation of the embedding technique with progressive
BKZ can solve the LWE Challenge instances more efficiently than Xu et al.’s
enumeration implementation for n ≥ 55.
For the case (n = 70, α = 0.005), we computed the extrapolated m ≈ 239
(corresponding to δ = 1.010) from function (5). We then used δ = 1.010, 1.011,
1.012, 1.013, and two Darmstadt LWE Challenge cases, those with δ = 1.011 and
1.012, were successfully solved with m = 233 and 223 in 2^16.8 s and 2^18.2 s,
respectively. We plot them in Fig. 2; they lie between the two fitting curves and
close to the runtime of the estimated FittingLog2 (Lower Bound).
5 Conclusions
In this paper, we studied an algorithm for solving the LWE problem using
Kannan’s embedding technique. In particular, we sampled LWE instances from the
Darmstadt LWE Challenge and applied the progressive BKZ algorithm to reduce the
embedded bases. From our experiments with fixed relative error size
α = σ/q = 0.005, we observed that the algorithm tends to be more efficient when
the embedding factor M is closer to 1. We also illustrated the relation between
the dimension m of the q-ary lattice L(A,q) in an LWE instance, the length n of
the secret vector s, and the runtime of the algorithm. Furthermore, Xu et al.’s
LWE Challenge records for α = 0.005 stopped at n = 55 because of the overwhelming
runtime, while our experimental results show that for n ≥ 55 the embedding
technique with progressive BKZ can solve the LWE Challenge instances more
efficiently than Xu et al.’s implementation of Liu–Nguyen’s enumeration
algorithm. Finally, our LWE Challenge record for the (n, α) = (70, 0.005) case
succeeded in 2^16.8 s (32.73 single-core hours), which also lies within the
bounds of our fitting curves.
References
1. Albrecht, M.R., Fitzpatrick, R., Göpfert, F.: On the efficacy of solving LWE by
reduction to unique-SVP. In: Lee, H.-S., Han, D.-G. (eds.) ICISC 2013. LNCS,
vol. 8565, pp. 293–310. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-
12160-4 18
2. Aono, Y., Wang, Y., Hayashi, T., Takagi, T.: Improved progressive BKZ algorithms
and their precise cost estimation by sharp simulator. In: Fischlin, M., Coron, J.-S.
(eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 789–819. Springer, Heidelberg
(2016). https://doi.org/10.1007/978-3-662-49890-3 30
3. The progressive BKZ code. http://www2.nict.go.jp/security/pbkzcode/
4. Babai, L.: On Lovász’ lattice reduction and the nearest lattice point problem. In:
Mehlhorn, K. (ed.) STACS 1985. LNCS, vol. 182, pp. 13–20. Springer, Heidelberg
(1985). https://doi.org/10.1007/BFb0023990
5. Buchmann, J., Büscher, N., Göpfert, F., Katzenbeisser, S., Krämer, J., Micciancio,
D., Siim, S., Vredendaal, C., Walter, M.: Creating cryptographic challenges using
multi-party computation: the LWE challenge. In: AsiaPKC 2016, pp. 11–20 (2016)
6. Chen, Y., Nguyen, P.Q.: BKZ 2.0: better lattice security estimates. In: Lee, D.H.,
Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 1–20. Springer, Heidel-
berg (2011). https://doi.org/10.1007/978-3-642-25385-0 1
7. Domich, P., Kannan, R., Trotter, L.: Hermite normal form computation using
modulo determinant arithmetic. Math. Oper. Res. 12, 50–59 (1987)
8. Gama, N., Nguyen, P.Q.: Predicting lattice reduction. In: Smart, N. (ed.) EURO-
CRYPT 2008. LNCS, vol. 4965, pp. 31–51. Springer, Heidelberg (2008). https://
doi.org/10.1007/978-3-540-78967-3 3
9. Gentry, C., Peikert, C., Vaikuntanathan, V.: Trapdoors for hard lattices and new
cryptographic constructions. In: STOC 2008, pp. 197–206 (2008)
10. Kannan, R.: Minkowski’s convex body theorem and integer programming. Math.
Oper. Res. 12(3), 415–440 (1987)
11. Kirchner, P., Fouque, P.-A.: An improved BKW algorithm for LWE with applica-
tions to cryptography and lattices. In: Gennaro, R., Robshaw, M. (eds.) CRYPTO
2015. LNCS, vol. 9215, pp. 43–62. Springer, Heidelberg (2015). https://doi.org/10.
1007/978-3-662-47989-6 3
12. Lenstra, A.K., Lenstra Jr., H.W., Lovász, L.: Factoring polynomials with rational
coefficients. Math. Ann. 261(4), 515–534 (1982)
13. Lyubashevsky, V., Micciancio, D.: On bounded distance decoding, unique shortest
vectors, and the minimum distance problem. In: Halevi, S. (ed.) CRYPTO 2009.
LNCS, vol. 5677, pp. 577–594. Springer, Heidelberg (2009). https://doi.org/10.
1007/978-3-642-03356-8 34
14. Liu, M., Nguyen, P.Q.: Solving BDD by enumeration: an update. In: Dawson, E.
(ed.) CT-RSA 2013. LNCS, vol. 7779, pp. 293–309. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-36095-4 19
15. Lindner, R., Peikert, C.: Better key sizes (and attacks) for LWE-based encryption.
In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 319–339. Springer, Hei-
delberg (2011). https://doi.org/10.1007/978-3-642-19074-2 21. Decoding Radius
and DMT Optimality, ISIT2011, pp. 1106–1110 (2011)
16. Micciancio, D., Regev, O.: Lattice-based cryptography. In: Bernstein, D.J.,
Buchmann, J., Dahmen, E. (eds.) Post-Quantum Cryptography 2009, pp. 147–
191. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-540-88702-7 5
17. Victor Shoup’s NTL library. http://www.shoup.net/ntl/
18. Regev, O.: On lattices, learning with errors, random linear codes, and cryptogra-
phy. In: STOC 2005, pp. 84–93 (2005)
19. Schnorr, C.P.: Lattice reduction by random sampling and birthday methods. In:
Alt, H., Habib, M. (eds.) STACS 2003. LNCS, vol. 2607, pp. 145–156. Springer,
Heidelberg (2003). https://doi.org/10.1007/3-540-36494-3 14
20. Schnorr, C.P., Euchner, M.: Lattice basis reduction: improved practical algorithms
and solving subset sum problems. Math. Program. 66, 181–199 (1994)
21. TU Darmstadt Learning With Errors Challenge. https://www.latticechallenge.org/lwe_challenge/challenge.php
22. Xu, R.: Private communication (2017)
23. Xu, R., Yeo, S.L., Fukushima, K., Takagi, T., Seo, H., Kiyomoto, S., Henricksen,
M.: An experimental study of the BDD approach for the search LWE problem. In:
Gollmann, D., Miyaji, A., Kikuchi, H. (eds.) ACNS 2017. LNCS, vol. 10355, pp.
253–272. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61204-1 13
Cloud and E-commerce Security
A Security-Enhanced vTPM 2.0 for Cloud
Computing
1 Introduction
Security is currently the key factor restricting the development of cloud computing.
In the cloud computing environment, protecting the integrity of the cloud
infrastructure is a basic requirement of cloud security. Trusted computing has been
considered a feasible way to protect the integrity of cloud infrastructure.
However, in a cloud computing environment, many virtual machines may be
running on a single physical machine, and it is difficult to use a hardware TPM
(Trusted Platform Module) to build a trusted virtual execution environment.
Therefore, the vTPM has been put forward and used in the cloud [13, 14].
IBM designed and implemented a vTPM system on a virtualized hardware platform
[11]. They virtualized the Trusted Platform Module by extending the standard TPM
command set to support vTPM lifecycle management and enable trust establishment in
the virtualized environment; hence, each virtual machine instance gets its own
unique virtual TPM. However, the vTPM still faces some key challenges in the cloud.
Firstly, the current vTPM lacks a mechanism to ensure the security of the vTPM
itself. A vTPM is an emulated software TPM; lacking physical hardware protection,
it is subject to greater security threats than a hardware TPM. Furthermore, the
physical TPM cannot provide runtime protection for the vTPM, because its NVRAM is
usually very small and cannot support multiple virtual machines. In addition, the
physical TPM incurs a large performance overhead when multiple vTPMs run at the
same time.
Secondly, the current vTPM cannot support the TPM 2.0 specification. The
architecture of TPM 2.0 differs from that of TPM 1.2; for example, the keys of
TPM 2.0 are generated from three persistent hierarchies, and TPM 2.0 can support
all kinds of cryptographic algorithms by incorporating an algorithm identifier.
Hence we need to design a vTPM based on the TPM 2.0 architecture so as to improve
on vTPM 1.2.
Aiming at these problems, we propose a security-enhanced vTPM 2.0 system that
supports the TCG TPM 2.0 specification and whose keys and private data are
protected using SGX keys and enclaves. To the best of our knowledge, this is the
first time a vTPM 2.0 based on KVM (Kernel-based Virtual Machine) has been
proposed and implemented. In our system, we also propose a vTPM 2.0 key
distribution and protection mechanism based on a KMC (Key Management Center)
[20, 21]. Our approach achieves the same key hierarchy as a physical TPM. In
addition, it avoids the problem that the new physical platform must regenerate the
certificate for the vTPM and rebuild the trust binding on every migration.
Moreover, the basic seeds of the vTPM can be backed up by the KMC; when the vTPM
is damaged, its keys and data can easily be recovered. We implement our system on
KVM with a Skylake CPU and evaluate vTPM 2.0 performance. The results show that
the SGX-enhanced vTPM incurs about 20% additional overhead compared with a
vTPM 2.0 without security protection.
The remainder of this paper is organized as follows. Section 2 reviews related
work. Section 3 introduces the background of TPM 2.0 and SGX. Section 4 describes
the design of the security-enhanced vTPM 2.0. Section 5 proposes the vTPM key
distribution and protection mechanism. The implementation of the security-enhanced
vTPM 2.0 is described in Sect. 6. Section 7 presents the evaluation of vTPM 2.0.
Section 8 concludes the paper.
2 Related Work
Shortcomings of the TCG TPM 1.2 architecture [25] have been found; for example,
its cipher algorithms are not flexible. Hence the TCG has proposed the TPM 2.0
specification [1–5], which was published as an ISO standard in 2015 [22]. Scarlata
et al. [18] present a system framework, based on the Xen hypervisor, that supports
a variety of TPM models and different security properties.
A vTPM is a virtualized TPM, generally used in virtualized environments such as
cloud computing platforms to build a trusted computing base. The most important
work on TPM virtualization is by Berger et al. [9] of IBM, who designed and
implemented a vTPM system on a virtualized hardware platform. They virtualized the
Trusted Platform Module by extending the standard TPM command set to support
vTPM lifecycle management and enable trust establishment in the virtualized
environment. England and Loeser [15] extended the hypervisor with a vPCR (virtual
PCR) and a TPM context manager, a resource virtualization that allows guest
operating systems to share the hardware TPM. But since the number of virtual
machines on a physical machine is not fixed, their approach meets a performance
bottleneck due to the limited memory space of the TPM. In addition, Yang et al.
[16] designed the Ng-vTPM framework, in which the EK and SRK are produced by the
physical TPM. This approach protects the keys’ security to some extent, but once
the physical TPM is damaged, the keys of the vTPM can never be used or recovered.
Yan et al. [17] propose a security enhancement named vTSE. The scheme utilizes the
physical memory isolation feature of SGX to protect the code and data of vTPM
instances, but it does not consider vTPM key recovery and cannot support TPM 2.0.
Current work supports only TPM 1.2. In our work, we design and implement
vTPM 2.0 on KVM. In addition, we provide runtime protection for the code and
private data of vTPM 2.0 using SGX. Furthermore, the vEK (virtual EK) and vSRK
(virtual SRK) of vTPM 2.0 are generated by a trusted party, the KMC, and are bound
to the VM UUID. Therefore, the keys are easy to recover once damaged.
3 Background
TPM that are authorized. Besides the traditional password and HMAC authentication
methods, an authentication method based on policy authorization has been added; it
allows one or more authorization policies to be used, which enhances key security.
Last but not least, TPM 2.0 enhances robustness. In the TPM 2.0 specification,
important objects can be sealed to a PCR (platform configuration register) value
approved by a particular signer instead of to a particular PCR value. Additionally,
the platform hierarchy allows an OEM (Original Equipment Manufacturer) to use the
functions of the TPM directly in the BIOS without considering OS support.
3.2 SGX
The Software Guard Extensions (SGX) [26, 27] were introduced in 2013 by the Intel
Corporation. SGX protects a portion of the application’s memory space, placing
code and data in a container that Intel calls an enclave [12, 28]. Once the
protected part of the application is loaded into the enclave, SGX protects it from
external software such as the OS, drivers, BIOS, hypervisor, and System Management
Mode (SMM). Moreover, when the process terminates, its enclaves are destroyed and
the runtime data and code protected in the enclave disappear. SGX also provides a
seal function to encrypt data and store it on permanent media, so that it can be
restored into the enclave when needed again.
In addition to the security attributes of memory isolation and protection, the
SGX architecture also supports attestation. In SGX, attestation [30] proves the
identity of the platform; two kinds are supported: local attestation between
enclaves and remote attestation by a third party.
reset of the NVRAM, the initialization of TPM components, and the processing of
commands. The Platform module implements the creation of the NVRAM file, the
setting of locality, and the management of the TPM power state. The Crypto Engine
module packages the implementation of the cryptographic algorithms provided by the
TPM, realized by calling interfaces provided by OpenSSL.
Because the Libtpms 2.0 module undertakes the core functions of vTPM 2.0, its
code and processing need to be protected. We isolate the module in an SGX enclave.
When Libtpms 2.0 is loaded, the SGX enclave is created and the Libtpms 2.0 program
is measured to validate its integrity. If its integrity has not been tampered
with, the Libtpms 2.0 code is executed in the enclave EPC (enclave page cache).
Hence the program is protected at runtime, and only code inside the enclave can
access it. The untrusted part of vTPM 2.0, such as the vTPM 2.0 management module,
can call the functions of Libtpms 2.0 only through enclave calls (ecalls) and out
calls (ocalls).
The NVRAM acts as the TPM’s memory. Since the vTPM lacks the isolated physical
NVRAM of a hardware TPM, its NVRAM is designed as a separate file that saves the
keys, PCR values, seeds, and other private data. When a vTPM 2.0 device is
created, an NVRAM file is also created. Because the important data of the vTPM is
saved in the NVRAM file, it is vital to vTPM security; hence, we leverage SGX
sealing to protect the NVRAM. To preserve secret data in an enclave for future
use, SGX offers a sealing function: sealing encrypts data inside an enclave and
stores it on a permanent medium such as a hard disk drive, so that the data can be
used the next time. When sealing data, two options are available: sealing to the
current enclave, using the current version of the enclave measurement
(MRENCLAVE), or sealing to the enclave author, using the identity of the enclave
author (MRSIGNER). In this work, we use both mechanisms. The private data of the
NVRAM is sealed with the seal key generated in the corresponding enclave. When the
NVRAM file is loaded into RAM, it is unsealed and isolated in an enclave.
Therefore, software other than Libtpms 2.0, including the OS, drivers, BIOS, and
hypervisor, cannot access the NVRAM data.
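The two sealing policies can be modeled abstractly as key derivation from a platform secret plus either measurement. The sketch below is not the SGX SDK API; it only illustrates the identity binding, with HMAC-based key derivation and a toy stream construction of our own (a real implementation uses the hardware sealing key and AES-GCM with nonces):

```python
import hashlib, hmac

PLATFORM_SECRET = b"fused-cpu-secret"  # stands in for the CPU's sealing root

def seal_key(measurement: bytes) -> bytes:
    """Derive a sealing key bound to MRENCLAVE or MRSIGNER (model only)."""
    return hmac.new(PLATFORM_SECRET, measurement, hashlib.sha256).digest()

def seal(data: bytes, measurement: bytes) -> bytes:
    key = seal_key(measurement)
    # toy keystream; real SGX sealing uses AES-GCM with fresh nonces
    stream = hashlib.sha256(key + b"stream").digest() * (len(data) // 32 + 1)
    ct = bytes(a ^ b for a, b in zip(data, stream))
    tag = hmac.new(key, ct, hashlib.sha256).digest()
    return tag + ct

def unseal(blob: bytes, measurement: bytes) -> bytes:
    key = seal_key(measurement)
    tag, ct = blob[:32], blob[32:]
    if not hmac.compare_digest(tag, hmac.new(key, ct, hashlib.sha256).digest()):
        raise ValueError("wrong enclave identity: unseal refused")
    stream = hashlib.sha256(key + b"stream").digest() * (len(ct) // 32 + 1)
    return bytes(a ^ b for a, b in zip(ct, stream))

nvram = b"vEK, vSRK, PCRs, seeds"
blob = seal(nvram, b"MRENCLAVE:libtpms2.0")
assert unseal(blob, b"MRENCLAVE:libtpms2.0") == nvram  # same enclave: ok
```

An unseal attempt with a different measurement (a different enclave, or MRSIGNER instead of MRENCLAVE) derives a different key, fails the tag check, and is refused.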
562 J. Wang et al.
The tpm2driver generally provides the interfaces to access a TPM 2.0 hardware
device. tpm_tis emulates the hardware interface of the TIS (TPM Interface
Specification) in QEMU and implements the interfaces to call Libtpms 2.0. SeaBIOS
also plays an important part in the process of creating a VM on the KVM platform.
Apart from implementing the whole standard calling interface of a typical x86
hardware BIOS, SeaBIOS is extended to support the TPM by initializing the vTPM 2.0
when creating a VM. This includes allocating a fixed virtual memory address
through which the vTPM communicates with the operating system, and resetting all
the registers of the vTPM.
When a VM sends a TPM command, the tpm2driver of the VM first talks to the
tpm_tis frontend emulated by QEMU to deliver the TPM request. The tpm_tis
frontend in QEMU then delivers the request to the Libtpms 2.0 driver, which calls
the Libtpms 2.0 shared library to process the TPM command and return the results.
This method places no limit on the number of VMs (as long as the hardware
resources permit). All a user needs to do is configure, in each VM, an exclusive
NVRAM used to save all the persistent state and data of the vTPM 2.0. During VM
migration, the corresponding NVRAM is migrated along with the VM, and the VM can
then continue to use vTPM 2.0 resources on the new platform.
When vTPM 2.0 is protected using an SGX enclave, its keys and PCRs are encrypted
by the SGX-capable CPU. Once a virtual machine with a vTPM 2.0 device is migrated,
the vTPM 2.0 device also needs to be migrated. However, the SGX keys cannot be
migrated; hence, the trust chain between the vTPM and the physical CPU is broken
during vTPM migration. In addition, the current method cannot support key
recovery: once the vTPM is damaged, its keys are lost. To solve these problems, we
propose a vTPM 2.0 key distribution and protection mechanism based on the KMC and
Intel SGX.
In our method, the primary seeds of vTPM 2.0, namely the EPS (Endorsement
Primary Seed), SPS (Storage Primary Seed), and PPS (Platform Primary Seed), are
generated by the KMC and then distributed to the vTPM 2.0 over an encrypted,
secure channel. The primary seeds are encrypted and saved in the KMC, and this
process is carried out safely inside an enclave. Meanwhile, the primary seeds in a
virtual machine are encrypted with the SGX key on the host. Once the physical CPU
or the vTPM is damaged, the KMC can recover the primary seeds and then recover the
vTPM keys. The key distribution and protection process of vTPM 2.0 is shown in
Fig. 2. The encrypted communication channel is established using the SGX remote
attestation feature; this requires a special quoting enclave, which generates a
credential reflecting the enclave and platform status. When the KMC wants to
authenticate a VM, the VM first executes the EREPORT instruction to generate a
REPORT structure, then uses the report key of the quoting enclave to generate a
MAC, which is sent to the quoting enclave along with the REPORT. The quoting
enclave packs them into a QUOTE structure and signs it with EPID. Finally, the
quoting enclave sends the QUOTE and signature to the KMC.
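The EREPORT-then-QUOTE flow can be modeled in a few lines. This is not the real SGX instruction set or EPID scheme; the key names and the HMAC stand-ins for the report MAC and the EPID signature are ours, for illustration only:

```python
import hashlib, hmac

REPORT_KEY = b"hw-derived-report-key"  # known to hardware and quoting enclave (model)
EPID_KEY = b"platform-epid-group-key"  # stand-in for the EPID signing key

def ereport(measurement: bytes, user_data: bytes):
    """Model of EREPORT: a REPORT structure MACed with the report key."""
    report = measurement + b"|" + user_data
    mac = hmac.new(REPORT_KEY, report, hashlib.sha256).digest()
    return report, mac

def quoting_enclave(report: bytes, mac: bytes):
    """Verify the local-attestation MAC, then sign a QUOTE for the verifier."""
    if not hmac.compare_digest(mac, hmac.new(REPORT_KEY, report, hashlib.sha256).digest()):
        raise ValueError("REPORT not produced on this platform")
    quote = b"QUOTE|" + report
    return quote, hmac.new(EPID_KEY, quote, hashlib.sha256).digest()

report, mac = ereport(b"vm-enclave-measurement", b"kmc-nonce")
quote, sig = quoting_enclave(report, mac)
# The KMC verifies sig (with the EPID group public key in the real scheme).
```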
1. A virtual machine communicates with the TSS (Trusted Software Stack) and calls
an API (Application Programming Interface) to request the primary seeds during
its startup process, with its UUID (Universally Unique Identifier) as a
parameter of the request.
2. To get the primary seeds, the TSS sends a message to the KMC, using the KMC
address in its configuration.
3. The KMC selects an asymmetric key pair (e.g., RSA) from the key protection
system as the protection key and returns the protection public key to the TSS.
4. The TSS calls TPM2_Load to load the protection public key into the vTPM, then
calls TPM2_RSA_Encrypt to encrypt the request, sends the ciphertext to the KMC,
and requests the KMC to send the basic seeds.
5. After the KMC receives the encrypted request, it uses the TPM interface to
decrypt the ciphertext with the private key. The KMC then generates a symmetric
key; the symmetric key and the UUID are used as parameters of the random number
generator to generate the basic seeds for the virtual machine.
6. The KMC stores the basic seeds, the symmetric key, and the UUID in its
database, encrypts the basic seeds and other information with the protection
key, and sends the ciphertext back to the vTPM TSS.
7. The TSS calls TPM2_RSA_Decrypt to decrypt the ciphertext and returns the basic
seeds to the virtual machine.
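The seed derivation in steps 5–7 can be sketched as a keyed pseudorandom function over the VM's UUID; the paper does not specify the generator, so the HMAC-SHA256 construction and seed labels below are our stand-ins:

```python
import hashlib, hmac, uuid

def derive_basic_seeds(sym_key: bytes, vm_uuid: uuid.UUID):
    """KMC side of step 5: (symmetric key, UUID) parameterize the generator.
    Returns EPS/SPS/PPS-like 32-byte seeds (labels are illustrative)."""
    labels = [b"EPS", b"SPS", b"PPS"]
    return {lab.decode(): hmac.new(sym_key, vm_uuid.bytes + lab,
                                   hashlib.sha256).digest()
            for lab in labels}

key = hashlib.sha256(b"kmc-symmetric-key").digest()
vm = uuid.UUID("12345678-1234-5678-1234-567812345678")
seeds = derive_basic_seeds(key, vm)
# Deterministic: the KMC can regenerate the same seeds for recovery (Sect. 5).
assert derive_basic_seeds(key, vm) == seeds
```

Because the derivation is deterministic in (key, UUID), the KMC only needs its database entry to regenerate a damaged vTPM's seeds, which is exactly the recovery property claimed above.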
Compared with previous methods, our approach achieves the same key hierarchy as a
physical TPM. In addition, it avoids the problem that the new physical platform
must regenerate the certificate for the vTPM and rebuild the trust bindings on
every migration. Moreover, the basic seeds of the vTPM can be backed up by the
KMC; when the vTPM is damaged, its keys and data can be recovered.
sends the public key to QEMU. QEMU then sends the VM UUID, encrypted with the
public key, to the KMC through a secure channel. The KMC creates an AES key using
its local crypto chip. The AES key and UUID are used as parameters of the random
number generator to generate the primary seeds for the virtual machine.
Meanwhile, the basic seeds, UUID, and AES key are encrypted by the local crypto
chip and stored in the KMC database. The encrypted seed is returned to QEMU, and
with it the Libtpms 2.0 module in QEMU creates the vEK, vSRK, and other root keys
for the vTPM.
For a vTPM, QEMU allocates a memory file, named NVRAM, to save its nonvolatile
data; the vEK, vSRK, and other root keys are saved to this file. To protect the
keys, the NVRAM file is sealed and isolated by SGX keys and enclaves, and the keys
are also backed up to the KMC. In addition, Libtpms 2.0 is compiled into a static
library so that it can be loaded and run in the SGX enclave. We also add ecalls
and ocalls to the vTPM management module and the Libtpms 2.0 module to implement
the communication between them.
7 Evaluation
7.2 Migration
We also carried out single-VM and multiple-VM live migration tests [19]. The
migration channel is encrypted using SSH with RSA public keys; we record the start
and end times and compute the time cost of each migration. In addition to the
normal time needed for migration, the migration time of a VM with the
SGX-enhanced vTPM also includes four parts: (1) unsealing the NVRAM from the
enclave; (2) migrating the vTPM state; (3) decrypting the NVRAM on the destination
host; (4) sealing the NVRAM with the new SGX. A VM without a vTPM does not include
these four parts.
For a single-VM migration, the VM image is Ubuntu 14.04 64-bit and the hardware
resources allocated to the VM are 1 VCPU, 1024 MB RAM, and a 20 GB disk. For the
concurrent migration of multiple VMs (ten units), the hardware resources allocated
to each VM are 1 VCPU, 1024 MB RAM, and a 6 GB disk.
8 Conclusion
Acknowledgment. This work is sponsored by the National Basic Research Program of China
(973 Program) under Grant No. 2014CB340600, the National Natural Science Foundation of
China under Grants No. 61402342, 61173138, and 61103628, and a collaborative research
project with Huawei Technologies Co., Ltd.
SDAC: A New Software-Defined Access
Control Paradigm for Cloud-Based
Systems
1 Introduction
In distributed cloud systems, a tenant can provision resources from different
cloud infrastructures. Given the multi-tenant, highly dynamic, and heterogeneous
nature of cloud environments, each tenant is expected to protect its users and
resources with an effective access control model. However, best practice in
access control for cloud-based systems usually relies on pre-defined access
control models, e.g., Mandatory Access Control (MAC) and Role-Based Access
Control (RBAC), while tenant-specific, user-customized access control models
remain unavailable.
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 570–581, 2018.
https://doi.org/10.1007/978-3-319-89500-0_49
2 Related Work
– Entity E = {ei }, which can be used as subjects or objects in the access control
model, and they can be either users or cloud resources.
– Information I: the related security properties of each entity, e.g., user roles,
file types. One entity can be assigned several types of properties, called
categories, each of which has a set of values, e.g., I = InfoCategory ×
CatScope, where InfoCategory is the set of types of security properties, and
CatScope is the set of potential values for each category.
– Data (D) is a complete set of values for each category of subjects,
objects, and actions, i.e., D = (SubjectD, ObjectD, ActionD, Instruction), where
SubjectD ⊆ SubjectMD × CatScope, ObjectD ⊆ ObjectMD × CatScope, and
ActionD ⊆ ActionMD × CatScope.
– Rule (R) specifies user privileges by using the category values of subjects, objects,
and actions to determine the instructions to be triggered, i.e., R : SubjectD ×
ObjectD × ActionD → Instruction. As aforesaid, three types of instruction
are identified: (1) authorization decision (grant or deny); (2) policy update, which
modifies the category values of an entity; and (3) policy chain, which routes the
request to another policy.
576 R. He et al.
– Perimeter (P) is a set of entities (subjects, objects, actions) to be protected.
As each SMPolicy is applied to one particular scope, we need to define its
perimeter by identifying the entities involved in this scope, i.e., P =
(S, O, A), where S, O, A ⊆ E.
– Entity-Data Assignment (EDAss) establishes a many-to-many relationship
between entities and related data by assigning category values
to each entity. Formally, EDAss = (SubjectDataAss, ObjectDataAss,
ActionDataAss), where SubjectDataAss ⊆ S × SubjectD, ObjectDataAss
⊆ O × ObjectD, and ActionDataAss ⊆ A × ActionD.
where {e} × DataAss denotes fetching all attributes of entity e from its entity-data
assignment.
1. Through the meta-data interface, the admin creates categories for subjects,
objects, and actions, then defines meta-rules, which specify the categories to be
used to build rules. The meta-data (MD), together with the meta-rules (MR),
constitutes a customized access control model.
2. The admin creates an access control policy based on this access control model
by specifying the values for each category and creating rules (instructions)
based on these values and the meta-rules.
3. The admin identifies the subjects, objects, and actions to be protected
by this SMPolicy, and finally assigns values to each category of subjects, objects,
and actions.
Meta-data MD:
– SubjectMD = (subject-security-level)
– ObjectMD = (object-security-level)
– ActionMD = (action-type)
Meta-rule MR:
– SubjectCategory = (subject-security-level)
– ObjectCategory = (object-security-level)
– ActionCategory = (action-type)
– Instruction = (AuthzDecision)
Data D:
Rule R:
– r = ((subject-security-level, [high]), (object-security-level, [medium]),
(action-type, [vm-action]), (instruction, [grant]))
– r = ((subject-security-level, [high, medium]), (object-security-level, [low]),
(action-type, [vm-action]), (instruction, [grant]))
Perimeter P :
– S: {user0 , user1 }
– O: {vm0 , vm1 }
– A: {start-vm, stop-vm}
Entity-Data Assignment EDAss:
– SubjectDataAss = ((user0 , high), (user1 , medium))
– ObjectDataAss = ((vm0 , medium), (vm1 , low))
– ActionDataAss = ((start-vm, vm-action), (stop-vm, vm-action))
In this MLS example, a user can start or stop a VM if and only if his or her
security level is strictly higher than that of the VM. For example, user0 can
manipulate vm0 and vm1, while user1 can only manipulate vm1.
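The example policy above can be sketched as a small evaluation function. This is a hypothetical illustration of the semantics only, not the SDAC implementation; the names (LEVELS, authorize) are ours, while the entity-data assignments mirror the example exactly.

```python
# Hypothetical sketch of the MLS SMPolicy instantiated above.

LEVELS = {"low": 0, "medium": 1, "high": 2}

subject_data = {"user0": "high", "user1": "medium"}              # SubjectDataAss
object_data = {"vm0": "medium", "vm1": "low"}                    # ObjectDataAss
action_data = {"start-vm": "vm-action", "stop-vm": "vm-action"}  # ActionDataAss

def authorize(subject, obj, action):
    """Return 'grant' iff the subject's security level is strictly higher
    than the object's and the action falls in the protected category."""
    if action_data.get(action) != "vm-action":
        return "deny"
    if LEVELS[subject_data[subject]] > LEVELS[object_data[obj]]:
        return "grant"
    return "deny"

print(authorize("user0", "vm0", "start-vm"))  # grant (high > medium)
print(authorize("user1", "vm0", "stop-vm"))   # deny (medium is not > medium)
```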
4 Experiments
Our SDAC prototype is deployed on three HA (High-Availability) OpenStack
clusters; one serves as the master platform, while the other two run as slave platforms.
Each cluster is equipped with five servers (Intel E5-2680, 48 cores, 251 GB RAM),
of which three are controller nodes and two are compute nodes. A sixth server is set
up as a security node running SDAC. Specifically, SDAC is implemented
on a micro-service architecture, which means that both the Access Control
Manager in the control plane and the Access Control Daemon in the policy plane
are implemented as sets of containers.
Throughput of the Policy Engine. One key metric for evaluating the
capability of an access control policy engine (PDP) is throughput, i.e., the number
of authorization requests per second that it can handle. In this evaluation, we
set SMPolicy to a basic RBAC with 10 users, 5 roles, and 10 objects.
We gradually increased the number of requests and observed the throughput of
the policy engine. As shown in Fig. 3, as the request frequency was adjusted from 1 to
20 requests per second, the average throughput reached its limit of 4.1 requests
per second. It is worth noting that, thanks to the micro-service
architecture, an identical SMPolicy container is automatically launched
when the number of requests exceeds the throughput of the policy engine.
SMPolicy Chaining Overhead. SDAC allows several SMPolicies to be
chained together to meet specific policy requirements, which inevitably incurs
some overhead. In this experiment, we configured one tenant (10 users, 5 roles,
and 10 objects) to compare the authorization overhead between
(1) RBAC0 implemented by a single policy and (2) RBAC0 realized by
chaining two SMPolicies together. The results are shown in Fig. 4. In case (1), the
throughput was around 4 requests per second, while in case (2) it was 2.9 requests
per second. It can be concluded that the extra overhead introduced by
policy chaining is around 32%.
decreased dramatically when the number of users grew beyond 50, as shown
in Fig. 5. The worst case was 0.5 requests per second when the number of users
reached 1500.
Scalability with a Varying Number of Tenants. To evaluate scalability, we
configured each tenant with 10 users, 5 predefined roles, and 10 VM objects.
As shown in Fig. 6, the throughput of the policy engine varied from 5.7 requests
per second (one tenant) to 4.5 requests per second (10 tenants), showing only slight
degradation. The reason is that SDAC is implemented on a micro-service
architecture, in which the SMPolicies of each tenant run in dedicated, independent
containers, enabling SDAC to scale freely across multiple tenants.
5 Conclusion
A Cross-Modal CCA-Based Astroturfing
Detection Approach
1 Introduction
With the rapid development of the Internet, and especially the popularity of
online shopping, the way people live and purchase goods has changed profoundly.
However, product comment sections contain large numbers of astroturfers posting
false comments, which may distort users' views and steer public opinion [1,2].
Because the network is virtual, it is difficult for consumers to select the
best-quality goods among the many kinds of products from pictures alone.
Online shopping has become a part of people's lives in recent years, and although
consumers enjoy its convenience, they tend to rely on the comments attached to
goods when deciding what to buy; to improve credibility, sales, and product
popularity, most merchants therefore use astroturfers to post inflated praise.
Such astroturfed comments are likely to mislead the
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 582–592, 2018.
https://doi.org/10.1007/978-3-319-89500-0_50
purchasers and lead them to choose goods incorrectly; the existence of false
comments seriously undermines the reference value of comment information and
greatly misleads the judgment of potential consumers. Therefore, in order to create
a good online shopping environment and protect the interests of consumers,
detecting online astroturfing is very important [3].
At present, most comments on shopping websites such as Taobao combine
text and pictures. Astroturfers who post comments with pictures in online
shopping can roughly be divided into two categories. In the first, astroturfers
tend to post similar comments directly and, for convenience, reuse the merchant's
original product pictures in their comments; their images are almost identical,
the words in their text comments are similar, the word repetition rate is high,
and the overall meaning of the comments is roughly the same. In addition, the
pictures selected or captured by users may differ in resolution, format, and so
on, so image recognition alone yields low picture similarity and makes such
astroturfing hard to detect. Combining pictures and text can express the overall
meaning of the comments and improve the measured similarity between them.
As a first step, we can use the CCCA model to combine text and pictures
and transform one modality into the other. Hence, we label water-army accounts
that publish similar text and images as "astroturfing 1", and we use the CCCA
model to detect them.
In the second category, many astroturfers have not actually bought the goods and
casually post unrelated pictures, so the pictures in their comments are inconsistent
with the corresponding goods. Their text comments are similar while their picture
comments are irrelevant, so the comment pictures have low similarity to the
product pictures. Hence, we label water-army accounts that publish similar text
but different images as "astroturfing 2", and we use an image similarity algorithm
to detect them.
The rest of this paper is organized as follows. Section 2 discusses the
related work in this field. Section 3 presents our astroturfing detection methods
based on the CCCA model. Section 4 gives the experiments and results. Finally,
Sect. 5 concludes.
2 Related Work
The study of astroturfing has made great progress in recent years. According
to the features used, astroturfing identification methods can be divided into
three categories: content-based, behavior-based, and multiple-feature-based.
Content-Based Approaches. Content-based approaches rely on comment
similarity and linguistic features to extract comments with similar content
and discover false reviewers. By analyzing the sentiment tendency of comment
text, false comments issued by astroturfers can be found. Ott et al.
[2] studied deceptive opinion spam that has been deliberately written to sound
584 X. Bai et al.
authentic, and verified that textual features of a comment can be used
to identify false comments. Duh et al. [4] found, by analyzing the tendencies
of review text, that astroturfers release false comments that deviate from
normal user comments.
Behavior-Based Approaches. Behavior-based approaches exploit behavioral
traits of astroturfers, such as bursts of concentrated comments, extreme ratings,
and very early product reviews. Lim et al. [5] identify several characteristic behaviors
of review spammers and model these behaviors so as to detect them: they
analyzed a large number of product reviews on Amazon, extracted similar
comments, and proposed scoring methods to measure each reviewer's degree of
spam. Mukherjee et al. [6] propose a novel angle on the problem by modeling
spamicity as latent; they build classifiers from users and their published comments,
using the characteristics of astroturfers to distinguish them from ordinary
users.
Multiple-Feature-Based Approaches. Multiple-feature-based approaches
combine the content and behavior characteristics of astroturfers, exploiting
manually labeled astroturfing samples and credibility propagation theory from
communication research. Lu et al. [7] combine astroturfer characteristics and
content characteristics in an annotated factor graph model, identifying unknown
accounts from manually tagged astroturfing samples via credibility propagation.
Mukherjee et al. [8] first proposed astroturfing identification methods for the
e-commerce field: they use the content of the comments to produce
candidate groups, and then find astroturfers according to their characteristics.
Thus, the traditional astroturfing identification methods in the field of
e-commerce mainly rely on content similarity and textual features to find false
commentators, which is simple but inaccurate. In this work, we combine text
with pictures and use a cross-modal CCA method to identify astroturfing in
e-commerce.
The comment data obtained from the Taobao website cannot be used directly as
experimental data, so we preprocess it. Since each comment is a paragraph
of text, it must be converted into a multi-dimensional eigenvector.
First, we extract the keywords in the comments and split a text comment
into a number of words, then use these words to represent a document [9].
Text keyword extraction is implemented with the TextRank algorithm, whose
specific steps are as follows:
(1) We split the crawled text comment T into complete sentences. (2) For each
sentence, we apply word segmentation and part-of-speech tagging, filter out
stop words, and retain only the specified parts of speech (nouns, verbs,
adjectives) as candidate keywords. (3) We construct a candidate keyword graph
G = (V, E), where V is the node set composed of the candidate keywords
generated in step 2, and edges are built from co-occurrence: an edge connects
two nodes only when their corresponding words co-occur within a window of
length K, where K is the window size, i.e., at most K consecutive words. (4)
Using formula (1) below, we iteratively propagate the weight of each node until
convergence.
R(wi) = λ Σ_{j: wj→wi} ( e(wj, wi) / O(wj) ) R(wj) + (1 − λ) / |V|    (1)
where R(w) denotes the PageRank value of node w; O(wj) denotes the total weight
of the outgoing edges of wj; e(wj, wi) denotes the weight of the edge from wj to
wi; and λ denotes the smoothing (damping) factor.
(5) The node weights are sorted in descending order, and the T most important
words are taken as candidate keywords. (6) The T most important words from step
5 are marked in the original text; if adjacent words form phrases, they
are merged into multi-word keywords.
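Formula (1) and steps (3)–(5) can be sketched as a plain Python iteration. This is an illustrative implementation only: the window handling, the damping factor λ = 0.85, and the iteration count are common defaults of ours, not parameters stated in the paper.

```python
# Illustrative TextRank iteration for formula (1). Tokenization and POS
# filtering from steps (1)-(2) are assumed already done, so input is a
# list of candidate-keyword tokens.
from collections import defaultdict

def textrank(tokens, k=3, lam=0.85, iters=50):
    # Step (3): weighted co-occurrence edges within a window of k words.
    edges = defaultdict(float)              # e(wj, wi): edge weight
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + k]:
            if v != w:
                edges[(w, v)] += 1.0
                edges[(v, w)] += 1.0
    vocab = sorted(set(tokens))
    out_weight = defaultdict(float)         # O(wj): total outgoing weight
    for (wj, _), e in edges.items():
        out_weight[wj] += e
    # Step (4): iterate R(wi) = lam * sum_j e(wj,wi)/O(wj) * R(wj) + (1-lam)/|V|.
    r = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        r = {wi: lam * sum(edges[(wj, wi)] / out_weight[wj] * r[wj]
                           for wj in vocab if (wj, wi) in edges)
                 + (1 - lam) / len(vocab)
             for wi in vocab}
    # Step (5): nodes sorted by weight, most important first.
    return sorted(r, key=r.get, reverse=True)
```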
The gradients and gradient directions of the image are calculated in the
horizontal and vertical directions, respectively, mainly to capture contour and
texture information while further weakening the interference of lighting.
The gradient direction of the pixel (x, y) in the image is:
α(x, y) = tan⁻¹( Gy(x, y) / Gx(x, y) )    (7)
(3) The image is divided into several small cells, and the gradient histogram of
    each cell is computed. Several cells make up a block, and the eigenvectors
    of all the cells in a block are concatenated to obtain the HOG eigenvector of
    that block.
(4) The HOG eigenvectors of all the blocks in the image are concatenated to
    obtain the HOG feature vector of the image. This is the final multi-
    dimensional feature vector used for classification.
Finally, the image feature vector has the form Si = (IS1i, IS2i, . . ., ISni).
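Steps (1)–(4) of the HOG extraction can be sketched as follows. This is illustrative only: the cell size, bin count, and the simplification of treating each block as a single cell are our assumptions, not the paper's parameters.

```python
# A minimal HOG-style extractor following steps (1)-(4).
import numpy as np

def hog_features(img, cell=8, bins=9):
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]       # horizontal gradient
    gy[1:-1, :] = img[2:, :] - img[:-2, :]       # vertical gradient
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180   # unsigned orientation, cf. eq. (7)
    feats = []
    for r in range(0, img.shape[0] - cell + 1, cell):
        for c in range(0, img.shape[1] - cell + 1, cell):
            m = mag[r:r + cell, c:c + cell].ravel()
            a = ang[r:r + cell, c:c + cell].ravel()
            # Magnitude-weighted orientation histogram of one cell.
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))  # normalize
    return np.concatenate(feats)                 # final feature vector Si
```

On a 16 × 16 image with these defaults this yields 2 × 2 cells × 9 bins = 36 dimensions.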
them to the label of "astroturfing 1". Then, the CCA algorithm is exploited for
cross-modal learning on each pair of text and image comments, and
a classification model is obtained. Finally, in the test part, we compare the
image similarity between the pictures in the comments of the test data set and
the sample pictures of products provided by merchants. If the similarity score is
less than 0.3, the comment is suspected to be the second type of astroturfing, and
the user's ID is output. Otherwise, the text comment and all picture comments
are projected into the common feature subspace o using the space projection
functions ϕT and ϕI, and the K-nearest-neighbor algorithm is used to find the
closest category in the trained model; finally, the results are output.
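The two-branch decision described above can be sketched as follows. This is a hypothetical sketch: similarity and cca_classify are stand-ins for the image-matching score and the CCA-subspace k-NN classifier; only the 0.3 threshold comes from the text.

```python
# Hypothetical routing logic for the two detector branches.
def detect(comment, product_images, similarity, cca_classify):
    # Best match between any comment picture and any merchant sample picture.
    score = max(similarity(img, ref)
                for img in comment["images"]
                for ref in product_images)
    if score < 0.3:
        # Pictures unrelated to the product: suspected type-2 astroturfer.
        return ("astroturfing 2", comment["user_id"])
    # Otherwise classify the (text, images) pair in the common CCA subspace.
    label = cca_classify(comment["text"], comment["images"])
    return (label, comment["user_id"])
```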
the AUC value can be used to evaluate the quality of a binary classifier.
In this paper, we use the ROC curve and the AUC value to evaluate the
classification accuracy of our experiment. The ROC curve of the experimental
results is shown in Fig. 4. According to the ROC curve over the whole test
dataset, the accuracy of our detection method is 89.5%.
The ROC curves of the three product types are shown in Fig. 5. The three
curves represent the three product types: the yellow one represents clothing, the
green one shoes, and the blue one bags. The AUC value is 0.9143 for clothing,
0.8762 for shoes, and 0.8236 for bags, so detection of clothing astroturfing has
the highest accuracy. Hence, astroturfers may prefer to publish their comments
in the clothing class.
As shown in Fig. 6, the precision rate equals the recall rate at a value of
about 0.8. This validates that the proposed cross-modal astroturfing detection
method performs well.
5 Conclusion
In this paper, we proposed a cross-modal CCA model to detect astroturfing in
online shopping. To verify our method, we conducted an experiment on a Taobao
dataset containing comments on manufactured products. We first extract text
and image features and use an image similarity algorithm to detect astroturfers
who post pictures of goods irrelevant to the samples. Then, we use the CCA
algorithm for cross-modal learning on each pair of text and image comments,
mapping the text and images from their respective natural spaces into a common
CCA space, and use this model to detect astroturfers who publish pictures
almost identical to the samples. Experimental results demonstrate that the
proposed method performs well. As part of our future work, we will explore
more astroturfing features beyond shopping websites and research further
approaches to detecting astroturfing.
References
1. Stringhini, G., Kruegel, C., Vigna, G.: Detecting spammers on social networks. In:
Proceedings of the 26th Annual Computer Security Applications Conference, pp.
1–9. ACM (2010)
2. Ott, M., Choi, Y., Cardie, C., Hancock, J.T.: Finding deceptive opinion spam by
any stretch of the imagination. In: Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies, vol. 1,
pp. 309–319. Association for Computational Linguistics (2011)
3. Chen, C., Wu, K., Srinivasan, V., Zhang, X.: Battling the internet water army:
detection of hidden paid posters. In: 2013 IEEE/ACM International Conference
on Advances in Social Networks Analysis and Mining (ASONAM), pp. 116–120.
IEEE (2013)
4. Duh, A., Štiglic, G., Korošak, D.: Enhancing identification of opinion spammer
groups. In: Proceedings of International Conference on Making Sense of Converging
Media, p. 326. ACM (2013)
5. Lim, E.-P., Nguyen, V.-A., Jindal, N., Liu, B., Lauw, H.W.: Detecting product
review spammers using rating behaviors. In: Proceedings of the 19th ACM Inter-
national Conference on Information and Knowledge Management, pp. 939–948.
ACM (2010)
6. Mukherjee, A., Kumar, A., Liu, B., Wang, J., Hsu, M., Castellanos, M., Ghosh,
R.: Spotting opinion spammers using behavioral footprints. In: Proceedings of the
19th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 632–640. ACM (2013)
7. Lu, Y., Zhang, L., Xiao, Y., Li, Y.: Simultaneously detecting fake reviews and
review spammers using factor graph model. In: Proceedings of the 5th Annual
ACM Web Science Conference, pp. 225–233. ACM (2013)
8. Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer
reviews. In: Proceedings of the 21st International Conference on World Wide Web,
pp. 191–200. ACM (2012)
9. Peng, L., Bin, W., Zhiwei, S., Yachao, C., Hengxun, L.: Tag-TextRank: a webpage
keyword extraction method based on tags. J. Comput. Res. Dev. 49(11), 2344–2351
(2012)
10. Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., Yu, K., Cao, L., Huang, T.: Large-
scale image classification: fast feature extraction and SVM training. In: 2011 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1689–1696.
IEEE (2011)
11. Mizuno, K., Terachi, Y., Takagi, K., Izumi, S., Kawaguchi, H., Yoshimoto, M.:
Architectural study of hog feature extraction processor for real-time object detec-
tion. In: 2012 IEEE Workshop on Signal Processing Systems (SiPS), pp. 197–202.
IEEE (2012)
12. Pereira, J.C., Coviello, E., Doyle, G., Rasiwasia, N., Lanckriet, G.R.G., Levy, R.,
Vasconcelos, N.: On the role of correlation and abstraction in cross-modal multi-
media retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 521–535 (2014)
13. Rasiwasia, N., Pereira, J.C., Coviello, E., Doyle, G., Lanckriet, G.R.G., Levy, R.,
Vasconcelos, N.: A new approach to cross-modal multimedia retrieval. In: Proceed-
ings of the 18th ACM International Conference on Multimedia, pp. 251–260. ACM
(2010)
14. Wang, K., He, R., Wang, W., Wang, L., Tan, T.: Learning coupled feature spaces
for cross-modal matching. In: Proceedings of the IEEE International Conference
on Computer Vision, pp. 2088–2095 (2013)
15. Ranjan, V., Rasiwasia, N., Jawahar, C.V.: Multi-label cross-modal retrieval. In:
Proceedings of the IEEE International Conference on Computer Vision, pp. 4094–
4102 (2015)
Security Protocols
Secure and Efficient Two-Factor
Authentication Protocol Using RSA
Signature for Multi-server Environments
1 Introduction
User authentication schemes are essential for implementing secure communication
because they provide mutual authentication. Two-factor authentication
protocols are widely used to ensure secure communication between a remote
client and a server, and designing a secure and robust two-factor
authentication protocol is a critical task. To ensure the security of client-server
communication in the single-server environment, many authentication protocols
using the RSA cryptosystem [1–3], hash functions [1,4,5], chaotic maps [6,7],
and elliptic curves [8] have been proposed. However, most of these protocols
cannot be used in multi-server environments. To address this issue, many
authentication protocols for
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 595–605, 2018.
https://doi.org/10.1007/978-3-319-89500-0_51
596 Z. Xu et al.
Table 1. Notations

Symbol  Meaning
Ui      User
ASj     Application server
RC      Registration center
IDi     Identity of Ui
IDj     Identity of ASj
g       A generator g ∈ Zn∗
e       Public key of RC
d       Private key of RC
a, r    Random numbers selected by Ui in the authentication phase
it can compute c1 = m^{ej1} mod n and c2 = m^{ej2} mod n. Since gcd(ej1, ej2) = 1,
there exist r and s such that r·ej1 + s·ej2 = 1, so m = m^{r·ej1 + s·ej2} =
(m^{ej1})^r · (m^{ej2})^s = c1^r · c2^s mod n. Therefore, the adversary can
recover m from c1, c2, r, and s.
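The attack can be demonstrated with toy parameters (illustrative only; real moduli are far larger, and the variable names are ours):

```python
# Common-modulus attack: two public exponents e1, e2 share the modulus n,
# and gcd(e1, e2) = 1 lets an adversary recover m from the two ciphertexts
# without any private key.

def egcd(a, b):
    """Extended Euclid: returns (g, r, s) with r*a + s*b = g = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = egcd(b, a % b)
    return (g, y, x - (a // b) * y)

n, e1, e2 = 3233, 17, 5              # toy modulus 61 * 53; gcd(17, 5) = 1
m = 42
c1, c2 = pow(m, e1, n), pow(m, e2, n)

g, r, s = egcd(e1, e2)
assert g == 1 and r * e1 + s * e2 == 1
# A negative exponent means exponentiating the modular inverse instead.
t1 = pow(c1, r, n) if r >= 0 else pow(pow(c1, -1, n), -r, n)
t2 = pow(c2, s, n) if s >= 0 else pow(pow(c2, -1, n), -s, n)
recovered = (t1 * t2) % n
print(recovered)                     # 42, i.e. m = c1^r * c2^s mod n
```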
3 Proposed Protocol
We put forward a secure authentication scheme for multi-server environments
that withstands the above-mentioned security issues. Our scheme consists
of three phases: the application-server registration phase, the user registration
phase, and the verification phase.
– ASj selects two large primes pj and qj, and computes nj = pj × qj and φ(nj) =
(pj − 1)(qj − 1).
– ASj chooses a public key ej (1 < ej < φ(nj)) such that gcd(φ(nj), ej) = 1.
Then it computes its private key dj ≡ ej^{−1} mod φ(nj).
– Finally, ASj chooses its identity IDj and sends ej, nj, IDj to RC
securely.
– RC computes Cerj = h(ej || IDj || nj)^d and sends Cerj to ASj.
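The registration phase above can be walked through with toy numbers. All parameters here are hypothetical stand-ins (the primes, RC's key pair (e, d) and modulus, the identity string, and SHA-256 standing in for the unspecified hash h), far too small for real use.

```python
# Toy sketch of the application-server registration phase.
import hashlib

# AS_j key generation.
p_j, q_j = 61, 53
n_j = p_j * q_j
phi_j = (p_j - 1) * (q_j - 1)
e_j = 17                             # gcd(e_j, phi_j) = 1
d_j = pow(e_j, -1, phi_j)            # d_j = e_j^{-1} mod phi(n_j)

# RC's own RSA parameters (toy stand-ins for e, d, and RC's modulus).
n_rc = 53 * 59
d_rc = 2011
e_rc = pow(d_rc, -1, 52 * 58)        # public exponent matching d_rc

# RC signs the certificate Cer_j = h(e_j || ID_j || n_j)^d mod n_rc.
id_j = "AS-1"
digest = int.from_bytes(
    hashlib.sha256(f"{e_j}||{id_j}||{n_j}".encode()).digest(), "big") % n_rc
cer_j = pow(digest, d_rc, n_rc)

# Anyone holding RC's public key can check the certificate.
assert pow(cer_j, e_rc, n_rc) == digest
```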
4 Security Analysis
4.1 Security Proof
Theorem 1. Suppose an adversary A has advantage AdvP^{ake}(A) against our
scheme, running in time t and making qsend Send queries, qexe Execute queries,
and qh Hash queries. Let l be the security length and D the password space. Then:

AdvP^{ake}(A) ≤ qh^2 / 2^{l−1} + (qsend + qexe)^2 / p + 2 qh · AdvG^{DLP}(t)
+ qsend / 2^{l−2} + 2 qsend / |D|,

where AdvG^{DLP}(t) denotes the advantage of solving the DLP problem in
probabilistic polynomial time t.
– ASj: Search for (∗, k) in Lh. The game terminates if this entry does not
exist; otherwise, compute Aj = k^{dj}.
– Ui: Compute Aj^{ej} and check whether Aj^{ej} = h(Ti). If it holds, Ui searches
for (ej, nj, Cerj, Aj) in the send list.
If A guesses the parameter k successfully without making hash queries, this game
succeeds. Therefore, we have |Pr[Suc3] − Pr[Suc2]| ≤ qsend / 2^l.
Game G5 : We design this game to simulate the discrete logarithm problem. The
security of our protocol depends solely on the discrete logarithm problem: Ri =
h(PIDi)^r mod nj.
5 Performance Analysis
In this section, we compare the proposed scheme with recent authentication
schemes [18,21,23–25] in terms of security and performance (as shown
in Table 2). We use some time complexities to evaluate the computational cost.
Th denotes the cost time for one-way hash operation. Tsym denotes the execution
time for symmetric key encryption/decryption operation. Te denotes the running
time for exponentiation operation. Tm denotes the execution time for modular
multiplication operation.
We have implemented the various cryptographic operations with the MIRACL
C/C++ library [26] on a personal computer with 4 GB of memory running the
Windows 7 operating system, using Visual C++ 2008, a 1024-bit cyclic
group, AES for symmetric encryption/decryption, a 160-bit prime field Fp and
the SHA-1 hash function. The execution times of these operations are: 0.0004 ms,
0.1303 ms, 0.0147 ms and 1.8269 ms.

Table 2. Comparison of security attributes for the schemes of [18], [23], [24], [25], [21] and our scheme.
The compared properties are user anonymity, resistance to the stolen smart card attack, the
impersonation attack, the replay attack, the denial of service attack, session key verification, the
man-in-the-middle attack and common modulus attacks. All schemes provide user anonymity and
resist the stolen smart card, replay and denial of service attacks; among the compared schemes,
one fails against impersonation, one against man-in-the-middle, one ([21]) against common modulus
attacks, and four lack session key verification. Our scheme resists all of the listed attacks.

Table 3. Comparison of computation cost at the user side and the server side
Table 4. Comparison of execution time at the user side and the server side
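Given the unit times above, the total cost of a scheme is a weighted sum over its operation counts. The mapping of the four listed times to Th, Tsym, Te and Tm follows the order in which the operations were introduced, which is an assumption on our part, and the operation counts in the example are hypothetical, not taken from Table 3.

```python
# Unit times (ms); the order-to-operation mapping is assumed from the text.
unit_ms = {"Th": 0.0004, "Tsym": 0.1303, "Te": 0.0147, "Tm": 1.8269}

def estimate_ms(counts):
    """Total execution time for a given multiset of operations."""
    return sum(unit_ms[op] * k for op, k in counts.items())

# Hypothetical cost of 5*Th + 2*Te (illustrative counts only):
print(round(estimate_ms({"Th": 5, "Te": 2}), 4))  # 0.0314
```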
From Table 2, we find that our proposed scheme withstands the known attacks,
in particular common modulus and server impersonation attacks, while providing
properties such as user anonymity and mutual authentication. From Tables 3 and 4,
we find that the computational cost of our proposed scheme is lower than that of
the schemes in [18,23] and nearly equal to that of the schemes in [21,24,25].
6 Conclusion
In this paper, we cryptanalyzed Amin et al.'s scheme and found that their
protocol is susceptible to a common modulus attack. We then presented a secure
and efficient two-factor authentication protocol that employs RSA signatures for
multi-server environments, and we argued informally that it withstands
different cryptographic attacks. Our proposed scheme is suitable for
deployment on various low-power smart cards, and in particular in mobile
computing networks.
References
1. Amin, R., Biswas, G.P.: An improved RSA based user authentication and session
key agreement protocol usable in TMIS. J. Med. Syst. 39(8), 1–14 (2015)
2. Giri, D., Maitra, T., Amin, R., Srivastava, P.D.: An efficient and robust RSA-based
remote user authentication for telecare medical information systems. J. Med. Syst.
39(1), 1–9 (2015)
3. Amin, R., Biswas, G.P.: Remote access control mechanism using Rabin public
key cryptosystem. In: Mandal, J.K., Satapathy, S.C., Sanyal, M.K., Sarkar, P.P.,
Mukhopadhyay, A. (eds.) Information Systems Design and Intelligent Applications.
AISC, vol. 339, pp. 525–533. Springer, New Delhi (2015). https://doi.org/10.1007/
978-81-322-2250-7 52
4. Amin, R., Biswas, G.P.: Cryptanalysis and design of a three-party authenticated
key exchange protocol using smart card. Arab. J. Sci. Eng. 40(11), 1–15 (2015)
5. Islam, S.K.H., Biswas, G.P., Choo, K.K.R.: Cryptanalysis of an improved
smartcard-based remote password authentication scheme. Inf. Sci. Lett. 3(1), 35
(2014)
6. Hafizul Islam, S.K., Khan, M.K., Obaidat, M.S., Bin Muhaya, F.T.: Provably
secure and anonymous password authentication protocol for roaming service in
global mobility networks using extended chaotic maps. Wirel. Pers. Commun.
84(3), 1–22 (2015)
7. Hafizul Islam, S.K.: Design and analysis of a three party password-based authenti-
cated key exchange protocol using extended chaotic maps. Inf. Sci. Int. J. 312(C),
104–130 (2015)
8. Amin, R., Biswas, G.P.: A secure three-factor user authentication and key agree-
ment protocol for TMIS with user anonymity. J. Med. Syst. 39(8), 1–19 (2015)
9. Hafizul Islam, S.K.: Design and analysis of an improved smartcard-based remote
user password authentication scheme. Int. J. Commun. Syst. 29(11), 1708–1719
(2016)
10. Hafizul Islam, S.K.: A provably secure id-based mutual authentication and key
agreement scheme for mobile multi-server environment without ESL attack. Wirel.
Pers. Commun. 79(3), 1975–1991 (2014)
11. Amin, R., Biswas, G.P.: A secure light weight scheme for user authentication and
key agreement in multi-gateway based wireless sensor networks. Ad Hoc Netw. 36,
58–80 (2016)
12. Liao, Y.-P., Wang, S.-S.: A secure dynamic ID based remote user authentica-
tion scheme for multi-server environment. Comput. Stand. Interfaces 31(1), 24–29
(2009)
13. Hsiang, H.-C., Shih, W.-K.: Improvement of the secure dynamic ID based remote
user authentication scheme for multi-server environment. Comput. Stand. Inter-
faces 31(6), 1118–1123 (2009)
14. Lee, C.-C., Lin, T.-H., Chang, R.-X.: A secure dynamic ID based remote user
authentication scheme for multi-server environment using smart cards. Expert
Syst. Appl. 38(11), 13863–13870 (2011)
15. Truong, T.-T., Tran, M.-T., Duong, A.-D.: Robust secure dynamic ID based remote
user authentication scheme for multi-server environment. In: Murgante, B., Misra,
S., Carlini, M., Torre, C.M., Nguyen, H.-Q., Taniar, D., Apduhan, B.O., Gervasi,
O. (eds.) ICCSA 2013. LNCS, vol. 7975, pp. 502–515. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-39640-3 37
16. Sood, S.K., Sarje, A.K., Singh, K.: A secure dynamic identity based authentication
protocol for multi-server architecture. J. Netw. Comput. Appl. 34(2), 609–618
(2011)
17. Li, X., Xiong, Y., Ma, J., Wang, W.: An efficient and security dynamic identity
based authentication protocol for multi-server architecture using smart cards. J.
Netw. Comput. Appl. 35(2), 763–769 (2012)
18. Pippal, R.S., Jaidhar, C.D., Tapaswi, S.: Robust smart card authentication scheme
for multi-server architecture. Wirel. Pers. Commun. 72(1), 729–745 (2013)
19. He, D., Chen, J., Shi, W., Khan, M.K.: On the security of an authentication scheme
for multi-server architecture. Int. J. Electr. Secur. Digit. Forensics 5(3/4), 288–296
(2013)
20. Arshad, H., Rasoolzadegan, A.: Design of a secure authentication and key agree-
ment scheme preserving user privacy usable in telecare medicine information sys-
tems. J. Med. Syst. 40(11), 237 (2016)
21. Amin, R., Islam, S.K., Khan, M.K., et al.: A two-factor RSA-based robust authen-
tication system for multiserver environments. Secur. Commun. Netw. 2017, 15 p.
(2017). Article no. 5989151
22. Ding, W., Ping, W.: Two birds with one stone: two-factor authentication with
security beyond conventional bound. IEEE Trans. Dependable Secure Comput.
PP(99), 1 (2016)
23. Yeh, K.-H.: A provably secure multi-server based authentication scheme. Wirel.
Pers. Commun. 79(3), 1621–1634 (2014)
24. Wei, J., Liu, W., Hu, X.: Cryptanalysis and improvement of a robust smart card
authentication scheme for multi-server architecture. Wirel. Pers. Commun. 77(3),
2255–2269 (2014)
25. Li, X., Niu, J., Kumari, S., Liao, J., Liang, W.: An enhancement of a smart card
authentication scheme for multi-server architecture. Wirel. Pers. Commun. Int. J.
80(1), 175–192 (2015)
26. Lili, X., Fan, W.: Cryptanalysis and improvement of a user authentication scheme
preserving uniqueness and anonymity for connected health care. J. Med. Syst.
39(2), 10 (2015)
Authenticated Group Key Agreement
Protocol Without Pairing
1 Introduction
A group key agreement (GKA) protocol ensures establishment of a common ses-
sion key among the group members that remains unknown to outsiders. The
GKA protocol is applicable in various real world communication networks such
as ad-hoc networks, wireless sensor networks and body area networks, where
devices are involved in sharing common secret data over an open channel. There
are numerous other real life examples of GKA including distributed computa-
tions, video conferencing, multi-user games, etc. The key establishment protocols
can be categorized into two sets: key transport protocols and key agreement pro-
tocols. In the former, the session key is derived by one of the powerful nodes
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 606–618, 2018.
https://doi.org/10.1007/978-3-319-89500-0_52
and then the key is transferred securely to all the members of the group. In the
latter, a common session key is derived by all the members by interactive partic-
ipation in an agreement protocol. Moreover, group key agreement protocols can
be further categorized into balanced and imbalanced protocols. All the participants
in a balanced GKA share the same computing burden, whereas in imbalanced
protocols a powerful node verifies all the received signatures. As established by
Bellare and Rogaway (CRYPTO 1993) [2], authentication is an essential security
requirement for key exchange protocols; otherwise, a man-in-the-middle
(MITM) adversary renders the protocol vulnerable to impersonation attacks. More
precisely, we provide here a construction of an authenticated group key agreement
(AGKA) protocol, relying on PKI-based signatures for this purpose.
Motivated by Shamir's idea of an identity-based (ID) cryptosystem [20],
we deploy our scheme in the ID-based setting to avoid the certificate-management
overhead of a classical PKI setup.
After the seminal work of Diffie and Hellman [10], there have been extensive
efforts to convert their two-party key exchange protocol into multi-party key
exchange protocols [6,14,21]. Among the most notable works, Joux's one-round
three-party key agreement protocol [15] is considered a significant contribution
toward practical GKA protocols due to its use of pairings. Based on
Joux's work [15], Barua et al. [1] presented multi-party key agreement
protocols in two flavors: an unauthenticated one based on ternary trees and an
authenticated one based on bilinear maps. Unfortunately, their protocols are secure
against passive adversaries only. The first provable security model for authenticated key
exchange (AKE) security was introduced by Bresson et al. [3–5], but their protocol
requires O(n) rounds, which is very expensive. Further, Katz and Yung
improved the model in [17] and proposed a scalable compiler that transforms
any unauthenticated GKA into an authenticated one. Later, Katz and Shin [16]
modeled insider security in GKA protocols. In 2009, Gorantla et al. [11] proposed
a security model, which we call the GBG model, that addresses forward
secrecy and key compromise impersonation resilience (KCIR) for GKA protocols,
taking into account authenticated key exchange (AKE) security and mutual
authentication (MA) security. Their model was revisited and enhanced by Zhao
et al. [29] in 2011. They extended the GBG model into a stronger model, which we
call the EGBG model, addressing both the leakage of the long-term
secret key and the leakage of the ephemeral key independently.
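The two-party-to-multi-party lineage sketched above is well illustrated by the Burmester–Desmedt construction [6], in which parties arranged in a ring broadcast g^ri and then combine the quotients of their neighbours' values so that everyone derives the same key. The sketch below uses a tiny hypothetical prime field (the parameters p and g are insecure, chosen only for illustration), and omits the authentication layer that this paper adds.

```python
import secrets

# Toy group parameters (a real instance uses a large safe prime).
p, g = 1019, 2

def bd_round1(n):
    # Each party i picks a random exponent r_i and broadcasts z_i = g^{r_i}
    r = [secrets.randbelow(p - 2) + 1 for _ in range(n)]
    z = [pow(g, ri, p) for ri in r]
    return r, z

def bd_round2(r, z):
    # Each party broadcasts X_i = (z_{i+1} / z_{i-1})^{r_i}
    n = len(r)
    return [pow(z[(i + 1) % n] * pow(z[(i - 1) % n], -1, p) % p, r[i], p)
            for i in range(n)]

def bd_key(i, r, z, X):
    # K_i = z_{i-1}^{n r_i} * X_i^{n-1} * X_{i+1}^{n-2} * ... * X_{i+n-2}
    n = len(r)
    K = pow(z[(i - 1) % n], n * r[i], p)
    for j in range(1, n):
        K = K * pow(X[(i + j - 1) % n], n - j, p) % p
    return K

r, z = bd_round1(4)
X = bd_round2(r, z)
keys = {bd_key(i, r, z, X) for i in range(4)}
print(len(keys))  # 1: every participant derives the same group key
```

All parties end up with g^(r1·r2 + r2·r3 + ... + rn·r1); without authentication of the broadcasts, however, an active MITM adversary can partition the group, which motivates the authenticated variants discussed in this paper.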
The authenticated ID-based GKA protocol was first formalized by Choi
et al. [7] in 2004, but their scheme was found to be vulnerable to an insider colluding
attack [28]. In 2007, Kyung-Ah [18] showed that the scheme in [7] is vulnerable to
another insider colluding attack and improved the protocol. Unfortunately, none
of these AGKA protocols achieves perfect forward secrecy, which ensures that even
if the long-term secret keys of all participants are compromised, all earlier shared
secrets remain unrevealed. In 2011, Wu et al. [27] presented
a provably secure ID-based AGKE protocol from pairings, providing forward
secrecy and security against insider attacks. Later, Wu et al. [26] presented
the first revocable ID-based AGKE (RID-AGKE) protocol, which is provably
608 G. Sharma et al.
secure and can also resist malicious participants. The major limitation of the
existing literature is that it does not consider ephemeral key leakage [29]. In 2015,
Teng et al. [22] presented the first ID-based AGKA protocol secure in the EGBG model.
Their protocol claims MA security with KCIR and achieves full forward secrecy.
However, it requires an extensive number of pairing operations (2n^2 − 2n), which
is inefficient for practical implementations, especially on low-power devices. Moreover,
the session key in their construction is the concatenation of values ki, where the ki are
randomly chosen strings of length k. Therefore, leakage of the randomness reveals the
session key and hence, to the best of our knowledge, there is no existing protocol
secure in the EGBG model. Our AGKA protocol does not use any pairing operation
and is therefore well suited for computationally constrained settings,
especially implementations with limited resources. We have proved its
security in the EGBG model. Moreover, our efficiency analysis shows
that our scheme is more efficient in terms of computation and operation time
than existing similar schemes.
The rest of the paper is organized as follows: in Sect. 2, we introduce the necessary
definitions, the corresponding hardness assumption, the AGKA protocol and the security
model for AGKA. The proposed AGKA scheme is described in Sect. 3. The
security analysis and the efficiency comparison are presented in Sects. 4 and 5,
respectively, followed by the conclusion in Sect. 6.
– Execute(ΠUi ): At any time, the adversary A can query the complete transcript
of an honest execution among users selected by the adversary.
– Send(ΠUi , m): During the normal execution of the protocol, this query
returns the reply generated by instance ΠUi on input message m.
– RevealKey(ΠUi ): Once the instance has accepted, this query outputs the
group session key.
– Corrupt(Ui ): This query models the revelation of the long-term secret key. A
participant is honest iff the adversary A has made no Corrupt query on it.
– EphemeralKeyReveal(ΠUi ): This query models the revelation of the ephemeral
key of participant Ui for instance ΠUi .
– Test(ΠUi ): This query can be made only once during the execution of the
protocol π. The challenger responds with a session key.
Challenge: During the Test query, the challenger selects a bit b uniformly at random
from {0, 1} and returns the real session key if b = 0 or a random value if b = 1.
Guess: A outputs its guess b′ for b.
The adversary succeeds in breaking the security if b′ = b. We denote this event
by SuccA and define A's advantage as AdvA (1^k) := |2 Pr[SuccA ] − 1|.
Definition 4 (AKE-Security). Let Aake be an adversary against AKE-
security. It is allowed to make queries to the Execute, Send, RevealKey, Corrupt
and EphemeralKeyReveal oracles. It is allowed to make a single Test query
to an instance ΠUi at the end of the phase and is given the challenge session key
sk_{ch,b} (depending on the bit b). Finally, Aake outputs a bit b′ and wins the game if
(1) b′ = b and (2) the instance ΠUi is fresh at the end of the game. The advantage
of Aake is AdvAake = |2 Pr[SuccAake ] − 1|. The protocol is called AKE-secure
if the adversary's advantage AdvAake is negligible.
Definition 5 (MA-Security with Outsider KCIR). Let Ama,out be an outsider
adversary against MA-security. Let pidiU be the set of identities of the participants
in the group with whom ΠUi wishes to establish a session key, and let sidiU denote
the session id of an instance ΠUi . Ama,out is allowed to make queries to the Execute,
Send, RevealKey, Corrupt and EphemeralKeyReveal oracles. Ama,out breaks the
MA-security with outsider KCIR notion if at some point there is an uncorrupted
instance ΠUi with key skiU and another party U′, uncorrupted when
ΠUi accepts, such that there are no insiders in pidiU and one of the following
conditions holds:
– there is no instance ΠjU′ with (pidjU′ , sidjU′ ) = (pidiU , sidiU ), or
– there is an instance ΠjU′ with (pidjU′ , sidjU′ ) = (pidiU , sidiU ) which has accepted
with skjU′ ≠ skiU .
Key Computation (Kji , sid, pid): Upon receiving <Kji , mask, σi , Ti >, each
user verifies the received signature σi .
Each user Ui then computes k̃j = H(Uji ) ⊕ Kji . Similarly, k̃n can be computed
using mask. Note that Uij = li Lj = li lj P = lj li P = lj Li = Uji . Each user
Ui checks the correctness of kj as H(kj ) = H(k̃j ) for (1 ≤ j ≤ n, j ≠ i) and
computes the session identity sid = H(k1 ) || H(k2 ) || . . . || H(k̃n ). Finally,
the session key is computed as sk = H(k1 || k2 || . . . || k̃n || sid || pid).
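The symmetry Uij = li·Lj = lj·Li is what lets user i unmask the contribution kj that user j blinded with H(Uji). The paper works with elliptic-curve scalar multiplication; the sketch below substitutes modular exponentiation over a toy prime field to show the same mechanism, with H replaced by a truncated SHA-256 (both substitutions are illustrative assumptions).

```python
import hashlib
import secrets

p, g = 2**61 - 1, 3  # illustrative group; the paper uses elliptic-curve points

def H(*parts):
    # Toy stand-in for the paper's hash H: 128-bit integer output
    data = b"|".join(str(x).encode() for x in parts)
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")

# Ephemeral values of users i and j (L_i = g^{l_i} mirrors L_i = l_i * P)
li, lj = secrets.randbelow(p), secrets.randbelow(p)
Li, Lj = pow(g, li, p), pow(g, lj, p)

# The pairwise value is symmetric: U_ij = L_j^{l_i} = L_i^{l_j} = U_ji
U_ij, U_ji = pow(Lj, li, p), pow(Li, lj, p)
assert U_ij == U_ji

# User j masks its random contribution k_j; user i unmasks it with U_ij
kj = secrets.randbits(128)
K_ji = H(U_ji) ^ kj           # sent by j
kj_recovered = H(U_ij) ^ K_ji  # computed by i
print(kj_recovered == kj)  # True
```

After unmasking every kj, each user feeds the concatenation of the contributions (plus sid and pid) into H to derive the same session key, exactly as in the equation above.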
4 Security Analysis
In our security proof, we cover the most recent security notions, namely AKE and
MA security, where the latter covers impersonation attacks. In particular, we
consider key compromise impersonation (KCI) resilience against insider and
outsider adversaries as discussed in [11]. An outsider adversary may
compromise the long-term private keys of all parties except one. An outsider
adversary is successful in a KCI attack if it can impersonate an uncorrupted
instance (in our case, one whose ephemeral key is uncompromised) of an
uncorrupted party to an uncorrupted instance of any of the corrupted parties.
The adversary's goal is to break the confidentiality of the session key and to
break MA-security. An adversary is called an insider adversary if it succeeds
in corrupting a party and participating in a protocol session representing the
corrupted party. An insider adversary is successful in breaking KCI security if
it succeeds in impersonating an uncorrupted instance of an uncorrupted party A
to another uncorrupted instance of another party B. The only goal of an insider
adversary is to break MA-security.
Theorem 1. We first show that our protocol is AKE-secure under the hardness
of the CDH problem and under the condition that the underlying signature scheme Σ is
UF-CMA secure and H1 , H2 , H3 , H are random oracles. The advantage of Aake
is upper bounded by the following term:

AdvAake ≤ 2 ( n^2 · Adv_{A,Σ}^{CMA} + (qs + qe + qH1 + qH2 + qH3 + qH)^2 / 2^λ + qs^2 / 2^λ
+ n·qs·qH1·qH2·qH3·qH · Adv_A^{CDH} + (qs·qH1·qH2·qH3 + qH) / 2^λ ),

where n is the number of participants.
Proof. We prove the theorem via the game-hopping technique. Let Ei be the event
that Aake wins the i-th AKE-security game, and let ρi be the advantage
of Aake in game i; we set ρi = |2 Pr[Ei ] − 1|. Our use of the game-hopping
technique is motivated by [9]. If an event E occurs during Aake's execution and is
detectable by the simulator, then E is independent of Ei . We say that two games
Gamei , Gamei+1 are identical unless an event E occurs such that Pr[Ei+1 | E] = 1/2.
Game0: This game is the same as the original AKE-security game. The advantage
is given by AdvAake = |2 Pr[E0 ] − 1| = ρ0 .
Game1: This game is the same as Game0 except that the simulation fails
if the event 'Forge' occurs, which means that Aake issues a Send query with
(mi , σi ), where user Ui is not corrupted and mi was not output by a previous
instance of Ui . According to the AKE-security definition for KCI attacks, Aake
can corrupt up to n − 1 parties, but it cannot modify messages at the instances
of corrupted users. If Forge occurs, it can be used to forge signatures for a given
public key as follows: the public key is assigned to one party, while the
other n − 1 parties behave normally according to the protocol. Since the
n − 1 parties are corrupt, the secret keys of those n − 1 parties are known. The
only secret key, the one corresponding to the public key of the UF-CMA game,
can be handled by queries to the signing oracle, which is available from the
underlying signature scheme. The probability that Acma
does not corrupt a given party is 1/n, so AdvAcma ≥ (1/n) Pr[Forge], i.e.,
Pr[Forge] ≤ n · AdvAcma .
Game2: This game is the same as the previous one, but the simulation fails if the
event 'Collision' occurs, i.e., one of the random oracles produces
a collision. For H1 this happens when H1 (ID′i , Ri ) = H1 (IDi , Ri ), with |I| possible
values for IDi , where I is the identity space, or H1 (IDi , R′i ) = H1 (IDi , Ri ),
with q possible values for Ri , or H1 (ID′i , R′i ) = H1 (IDi , Ri ) with q|I| possible
variations for (IDi , Ri ). Since the input of the second hash function consists
of 3 entries, there are 6 + 1 options for a possible collision, with in total q^2 |I|
possible values over all options. Analogously, there are 3 options for
a collision of H3 with in total λ!|q − 1| possible values, and 1 collision option for
the H function with λ! possible values in total. Each Execute and Send
query requires a query to one of the random oracles, such that the number
Proof. The proof uses the game-hopping technique as in the proof of the
previous theorem. Since the descriptions of the games are very similar to the
already presented proof of AKE-security, we only sketch it here and state the
final result. The game sequence ends after the third game; the Forge,
Collision and Repeat events are handled during the three games. If the
game does not abort, it follows that all honest parties from pidi compute the
same secret key, such that Pr[E3 ] = 0.
5 Efficiency Analysis
In this section, we compare the efficiency of our proposed AGKA protocol with
some recent ID-based GKA protocols [22,24–27] (Table 1).
From the above table, it is clear that our proposed AGKA protocol does not require any
expensive operation such as pairing and is hence very efficient compared to the
existing schemes. To the best of our knowledge, ours is the first pairing-free
balanced AGKA protocol secure in the strong EGBG security model.
6 Conclusion
References
1. Barua, R., Dutta, R., Sarkar, P.: Extending Joux’s protocol to multi party key
agreement. In: Johansson, T., Maitra, S. (eds.) INDOCRYPT 2003. LNCS, vol.
2904, pp. 205–217. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-
540-24582-7 15
2. Bellare, M., Rogaway, P.: Entity authentication and key distribution. In: Stinson,
D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 232–249. Springer, Heidelberg
(1994). https://doi.org/10.1007/3-540-48329-2 21
3. Bresson, E., Chevassut, O., Pointcheval, D.: Provably authenticated group Diffie-
Hellman key exchange — the dynamic case. In: Boyd, C. (ed.) ASIACRYPT 2001.
LNCS, vol. 2248, pp. 290–309. Springer, Heidelberg (2001). https://doi.org/10.
1007/3-540-45682-1 18
4. Bresson, E., Chevassut, O., Pointcheval, D.: Dynamic group Diffie-Hellman key
exchange under standard assumptions. In: Knudsen, L.R. (ed.) EUROCRYPT
2002. LNCS, vol. 2332, pp. 321–336. Springer, Heidelberg (2002). https://doi.org/
10.1007/3-540-46035-7 21
5. Bresson, E., Chevassut, O., Pointcheval, D., Quisquater, J.-J.: Provably authenti-
cated group Diffie-Hellman key exchange. In: Proceedings of the 8th ACM Confer-
ence on Computer and Communications Security, pp. 255–264. ACM (2001)
6. Burmester, M., Desmedt, Y.: A secure and efficient conference key distribution
system. In: De Santis, A. (ed.) EUROCRYPT 1994. LNCS, vol. 950, pp. 275–286.
Springer, Heidelberg (1995). https://doi.org/10.1007/BFb0053443
7. Choi, K.Y., Hwang, J.Y., Lee, D.H.: Efficient ID-based group key agreement with
bilinear maps. In: Bao, F., Deng, R., Zhou, J. (eds.) PKC 2004. LNCS, vol.
2947, pp. 130–144. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-
540-24632-9 10
8. Debiao, H., Jianhua, C., Jin, H.: An ID-based proxy signature schemes without
bilinear pairings. Ann. Telecommun.-annales des télécommunications 66(11–12),
657–662 (2011)
9. Dent, A.W.: A note on game-hopping proofs. IACR Cryptology ePrint Archive
2006:260 (2006)
10. Diffie, W., Hellman, M.: New directions in cryptography. IEEE Trans. Inf. Theory
22(6), 644–654 (1976)
11. Gorantla, M.C., Boyd, C., González Nieto, J.M.: Modeling key compromise imper-
sonation attacks on group key exchange protocols. In: Jarecki, S., Tsudik, G. (eds.)
PKC 2009. LNCS, vol. 5443, pp. 105–123. Springer, Heidelberg (2009). https://
doi.org/10.1007/978-3-642-00468-1 7
12. Hess, F.: Efficient identity based signature schemes based on pairings. In: Nyberg,
K., Heys, H. (eds.) SAC 2002. LNCS, vol. 2595, pp. 310–324. Springer, Heidelberg
(2003). https://doi.org/10.1007/3-540-36492-7 20
13. Horng, S.-J., Tzeng, S.-F., Pan, Y., Fan, P., Wang, X., Li, T., Khan, M.K.: b-
SPECS+: batch verification for secure pseudonymous authentication in vanet.
IEEE Trans. Inf. Forensics Secur. 8(11), 1860–1875 (2013)
14. Ingemarsson, I., Tang, D., Wong, C.: A conference key distribution system. IEEE
Trans. Inf. Theory 28(5), 714–720 (1982)
15. Joux, A.: A one round protocol for tripartite Diffie–Hellman. In: Bosma, W. (ed.)
ANTS 2000. LNCS, vol. 1838, pp. 385–393. Springer, Heidelberg (2000). https://
doi.org/10.1007/10722028 23
16. Katz, J., Shin, J.S.: Modeling insider attacks on group key-exchange protocols.
In: Proceedings of the 12th ACM Conference on Computer and Communications
Security, pp. 180–189. ACM (2005)
17. Katz, J., Yung, M.: Scalable protocols for authenticated group key exchange.
In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 110–125. Springer,
Heidelberg (2003). https://doi.org/10.1007/978-3-540-45146-4 7
18. Kyung-Ah, S.: Further analysis of ID-based authenticated group key agreement
protocol from bilinear maps. IEICE Trans. Fundam. Electron. Commun. Comput.
Sci. 90(1), 295–298 (2007)
19. Schnorr, C.P.: Efficient identification and signatures for smart cards. In: Brassard,
G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 239–252. Springer, New York (1990).
https://doi.org/10.1007/0-387-34805-0 22
20. Shamir, A.: Identity-based cryptosystems and signature schemes. In: Blakley, G.R.,
Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 47–53. Springer, Heidelberg
(1985). https://doi.org/10.1007/3-540-39568-7 5
21. Steiner, M., Tsudik, G., Waidner, M.: Key agreement in dynamic peer groups.
IEEE Trans. Parallel Distrib. Syst. 11(8), 769–780 (2000)
22. Teng, J., Wu, C., Tang, C., Tian, Y.: A strongly secure identity-based authenticated
group key exchange protocol. Sci. China Inf. Sci. 58(9), 1–12 (2015)
23. Wei, F., Wei, Y., Ma, C.: Attack on an ID-based authenticated group key exchange
protocol with identifying malicious participants. IJ Netw. Secur. 18(2), 393–396
(2016)
24. Wu, T.-Y., Tsai, T.-T., Tseng, Y.-M.: A provably secure revocable ID-based
authenticated group key exchange protocol with identifying malicious participants.
Sci. World J. 2014 (2014)
25. Wu, T.-Y., Tseng, Y.-M.: Towards ID-based authenticated group key exchange
protocol with identifying malicious participants. Informatica 23(2), 315–334 (2012)
26. Wu, T.-Y., Tseng, Y.-M., Tsai, T.-T.: A revocable ID-based authenticated group
key exchange protocol with resistant to malicious participants. Comput. Netw.
56(12), 2994–3006 (2012)
27. Wu, T.-Y., Tseng, Y.-M., Yu, C.-W.: A secure ID-based authenticated group key
exchange protocol resistant to insider attacks. J. Inf. Sci. Eng. 27(3), 915–932
(2011)
28. Zhang, F., Chen, X.: Attack on an ID-based authenticated group key agreement
scheme from PKC 2004. Inf. Process. Lett. 91(4), 191–193 (2004)
29. Zhao, J., Gu, D., Gorantla, M.C.: Stronger security model of group key agree-
ment. In: Proceedings of the 6th ACM Symposium on Information, Computer and
Communications Security, pp. 435–440. ACM (2011)
Network Security
Machine Learning for Black-Box Fuzzing
of Network Protocols
1 Introduction
Fuzzing is one of the most effective techniques for finding security vulnerabilities
in an application by repeatedly testing it with modified, or fuzzed, inputs. State-of-
the-art fuzzing techniques can be divided into two main types: (1) black-box
fuzzing [1] and (2) white-box fuzzing [2]. Black-box fuzzing is used to find security
vulnerabilities in closed-source applications, while white-box fuzzing targets
open-source applications. For proprietary protocols, whose specification
and implementation code are unavailable, black-box fuzzing is the only method
that can be applied. There are two kinds of black-box fuzzing: (1) mutation-based
fuzzing and (2) generation-based fuzzing. Mutation-based fuzzing requires no
knowledge of the protocol under test; it modifies an existing corpus of seed
inputs to generate test cases. In contrast, generation-based fuzzing requires an
input model that specifies the message format of the protocol in order to generate
test cases. It has been shown that generation-based fuzzing performs much better
than mutation-based fuzzing [3]. However, the input model for
generation-based fuzzing cannot be provided if neither the specification nor the
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 621–632, 2018.
https://doi.org/10.1007/978-3-319-89500-0_53
622 R. Fan and Y. Chang
2 Preliminaries
The RNN processes the input sequence in a series of time stamps. For a
particular time stamp t, the hidden state ht and the output yt at that time
stamp are given by Eqs. 1 and 2.
ht = f (ht−1 , xt ) (1)
yt = φ(ht ) (2)
In Eq. 1, f is a non-linear activation function, such as sigmoid or tanh,
which introduces non-linearity into the network, and φ in Eq. 2 is a
function such as softmax that computes the output probability distribution over
a given vocabulary conditioned on the current hidden state. RNNs can learn a
probability distribution over a character sequence (x1 , x2 , ..., xt−1 ) by training
to predict the next character xt in the sequence.
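Equations 1 and 2 can be made concrete with a minimal numpy step. The weight-matrix parameterization (Wxh, Whh, Why) is an assumed standard form of f and φ, not taken from the paper, and the weights here are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
V, HID = 5, 8  # vocabulary size and hidden width (illustrative values)

# Randomly initialised weights; a trained model would learn these.
Wxh = rng.normal(0, 0.1, (HID, V))
Whh = rng.normal(0, 0.1, (HID, HID))
Why = rng.normal(0, 0.1, (V, HID))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def rnn_step(h_prev, x_onehot):
    h = np.tanh(Whh @ h_prev + Wxh @ x_onehot)  # Eq. (1): h_t = f(h_{t-1}, x_t)
    y = softmax(Why @ h)                        # Eq. (2): y_t = phi(h_t)
    return h, y

h = np.zeros(HID)
for ch in [0, 3, 1]:               # a toy character sequence
    x = np.eye(V)[ch]
    h, y = rnn_step(h, x)
print(round(float(y.sum()), 6))    # 1.0: a probability distribution over V
```

Training adjusts the weights so that y assigns high probability to the character that actually follows the observed prefix.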
In theory, RNNs are fully capable of handling long-term dependencies,
where predictions need more context. Unfortunately, in practice, RNNs
become unable to connect the relevant information in cases such as the one
shown in Fig. 3, where the distance between the relevant information and the
place where it is needed becomes very large.
Long short-term memory networks (LSTMs) are a special kind of RNN, explicitly
designed to avoid the long-term dependency problem. They also have the form of
a chain of repeating neural network modules, but instead of a single neural
network layer, the repeating module has a different structure, as Fig. 4 shows.
The horizontal line crossing the top of Fig. 4 is the cell state, which
is the key component of LSTMs. LSTMs are able to remove or add information to the
cell state via structures called gates, which are composed of a sigmoid neural net
layer and a pointwise multiplication operation.
Each box in Fig. 5 represents a cell of the RNN, in our method an LSTM cell.
The encoder and decoder can share weights or, as is more common, use different
sets of parameters. We train the seq2seq model on a corpus of network recordings,
treating each message as a sequence of characters. Before training, we
concatenate all the messages into a single file.
3 Methodology
The main idea of our method is to learn a generative input model over the set of
network protocol messages. We use a seq2seq model, which has historically
proved very successful at many automatic tasks such as speech recognition
and machine translation. Traditional n-gram based approaches are limited to
contexts of finite length, while the seq2seq model is able to learn contexts of
arbitrary length to predict the next sequence of characters. The seq2seq model
can be trained in an unsupervised mode to learn a generative input model, which
can then be used to generate test cases.
Before training the seq2seq model, we need to preprocess the corpus. First, we
count the distinct characters in the corpus and sort them in a list by
frequency of occurrence. Then we build a dictionary that maps each character
(key) to its position in the list (value). Finally, we create a tensor file in
which every character is replaced by its dictionary value. The main purpose of
preprocessing is to calculate the number of batches Nb:
Nb = St / (Sb * Ls)    (3)
where St is the size of the tensor file, Sb is the size of one batch (set to 50
by default), and Ls is the length of each sequence in a batch.
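The preprocessing steps and Eq. (3) can be sketched as follows. The function and variable names (`preprocess`, `vocab`, `tensor`) are ours, and the small FTP-style corpus is hypothetical, not the paper's dataset.

```python
from collections import Counter

def preprocess(corpus, batch_size=50, seq_length=10):
    """Rank characters by frequency, map each character to its rank,
    and compute the number of batches Nb = St / (Sb * Ls)."""
    # Sort distinct characters by descending frequency of occurrence.
    counts = Counter(corpus)
    chars = sorted(counts, key=counts.get, reverse=True)
    vocab = {ch: rank for rank, ch in enumerate(chars)}
    # The "tensor file": every character replaced by its rank in the list.
    tensor = [vocab[ch] for ch in corpus]
    num_batches = len(tensor) // (batch_size * seq_length)
    return vocab, tensor, num_batches

# Hypothetical corpus of concatenated FTP messages (length 2800 characters).
corpus = "USER anonymous\r\nPASS guest\r\n" * 100
vocab, tensor, n_b = preprocess(corpus, batch_size=50, seq_length=10)
```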
After preprocessing, we train the seq2seq model in an unsupervised learning
mode. Because the training dataset has no labels, we cannot accurately
determine how well the trained models perform. We instead train several models
for different numbers of epochs, where an epoch is one pass of the learning
algorithm over the complete training dataset. We train the seq2seq models Ms
as shown in Algorithm 1 with five different numbers of epochs Ne: 10, 20, 30,
40 and 50. We use an LSTM model with 2 hidden layers, each consisting of 128
hidden states.
Ip is the initial path, where the checkpoint files are stored. Ns is the number
of training steps after which an intermediate result is saved; the default
setting is 1000.
626 R. Fan and Y. Chang
We use the trained seq2seq model to generate new protocol messages. At the
beginning of the fuzzing, we always connect to the server and use the received
message as the initial sequence Is. We then ask the seq2seq model to generate
a sequence until it outputs a protocol message terminator, such as CRLF in
FTP. Depending on the sampling strategy, there are three different ways to
generate messages; we now detail the three sampling strategies used in our
experiments.
Max at Each Step: In this sampling strategy, we pick the most probable
character in the predicted probability distribution. This strategy generates
protocol messages that are most likely to be well-formed, but that very
property makes it unsuitable for fuzzing, because fuzzing needs test cases
that deviate from well-formed messages.
Sample at Each Step: In this sampling strategy, we do not always pick the most
probable next character; instead, we sample from the predicted probability
distribution. As a result, this strategy generates diverse new protocol
messages that combine the various templates the seq2seq model has learnt from
the protocol messages. Due to sampling, the generated protocol messages are
not always well-formed, which is of great use for fuzzing.
Sample on Spaces: This sampling strategy combines the two strategies described
above. It uses the most probable predicted character when the last character
of the input sequence is not a space, and it samples from the distribution, as
in the second strategy, when the input sequence ends with a space. This
strategy generates more well-formed protocol messages than the second
strategy.
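The three sampling strategies can be sketched as below. This is an illustrative reconstruction, not the authors' code: `toy_model` stands in for the trained seq2seq predictor (it returns a character-to-probability dictionary), and the generation loop stops at a CRLF terminator as described in the text.

```python
import random

def pick_max(dist):
    """'Max at each step': take the most probable next character."""
    return max(dist, key=dist.get)

def pick_sample(dist, rng):
    """'Sample at each step': draw from the predicted distribution."""
    chars, weights = zip(*dist.items())
    return rng.choices(chars, weights=weights, k=1)[0]

def pick_sample_on_spaces(dist, last_char, rng):
    """'Sample on spaces': sample only after a space, else take the max."""
    return pick_sample(dist, rng) if last_char == " " else pick_max(dist)

def generate(model, seed, strategy, rng, terminator="\r\n", max_len=200):
    """Extend the sequence until a protocol terminator (CRLF in FTP)."""
    seq = seed
    while not seq.endswith(terminator) and len(seq) < max_len:
        dist = model(seq)
        if strategy == "max":
            seq += pick_max(dist)
        elif strategy == "sample":
            seq += pick_sample(dist, rng)
        else:
            seq += pick_sample_on_spaces(dist, seq[-1], rng)
    return seq

# Hypothetical stand-in for the trained model: mostly predicts CRLF next.
def toy_model(seq):
    if seq.endswith("\r"):
        return {"\n": 1.0}
    return {"\r": 0.9, "x": 0.1}

msg = generate(toy_model, "USER anonymous", "max", random.Random(0))
```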
Machine Learning for Black-Box Fuzzing of Network Protocols 627
4 Experimental Evaluation
4.1 Experiment Setup
In this section, we present the results of fuzzing experiments with two FTP
applications, WarFTPD 1.65 and Serv-U build 4.0.0.4. We deploy these two FTP
applications on two servers running Windows Server 2003. The seq2seq models
are trained on a personal computer running Ubuntu 16.04. We implement a client
program that communicates with the FTP server, using the test cases generated
by the trained seq2seq model as input. If the program detects any error
reports from the FTP server, it records the error messages in an error log,
and we can then validate whether the recorded error messages are indeed able
to trigger vulnerabilities. Moreover, it would also be feasible to implement a
server program of the protocol to fuzz client applications.
We use three working standards to evaluate fuzzing effectiveness:
Coverage: A basic demand shared by random and more advanced grammar-based
fuzzers is that the instruction coverage should be as high as possible. In the
case of our method, the fuzzer can fuzz both ends of the communication, but
its coverage depends heavily on the network recordings.
Bugs: During the fuzzing process, we use the tool AppVerifier to monitor the
running FTP server. AppVerifier is a free runtime monitoring tool that can
catch memory-corruption bugs such as buffer overflows, and it is widely used
for fuzzing on Windows.
Performance Comparison: For performance comparison, we record statistics when
our fuzzer and the existing fuzzers Sulley and SPIKE run against Serv-U build
4.0.0.4. The statistics include Times, Time and Speed. Times is the number of
test cases sent, Time is the number of minutes taken to find the bug, and
Speed is the number of test cases sent per second.
4.2 Corpus
We extracted about 10,000 messages for WarFTPD and 36,000 messages for Serv-U
from network recordings. Most of the network recordings were generated by
normal access to the FTP server, and part of the traffic was generated by
Sulley. Sulley is used to improve instruction coverage, because normal access
may not exercise some less commonly used commands such as MDTM, which
retrieves the modification time of a remote file.
These 10,000 messages for WarFTPD and 36,000 messages for Serv-U, which
contain both client- and server-side data, form the training corpus for the
seq2seq model used in this work. We generate protocol messages using the
trained seq2seq model, but the input data for the FTP server must be
transferred over the network; therefore we implement a client program to send
the generated messages to the FTP server.
4.3 Results
In order to obtain a reasonable explanation of coverage results, we select the
network recordings of normal access to ftp server, and measure their coverage
of the ftp application, to be used as a baseline for following experiments. When
training the seq2seq model, an important parameter is the number of epochs.
The results obtained after training the seq2seq model with 10, 20, 30, 40 and
50 epochs are reported here.
Coverage. Figure 6(a) and (b) show the instruction coverage obtained with
sample at each step and sample on spaces from 10 to 50 epochs for
WarFTPD and Serv-U, respectively. The figures also show the coverage obtained
with the corresponding baseline.
We observe the following:
– The coverage for sample at each step and sample on spaces is above
the baseline coverage for most epoch settings.
– The trend of the coverage for WarFTPD and Serv-U from 10 to 50 epochs is
quite unstable and unpredictable.
– The best coverage for both sample at each step and sample on
spaces is obtained with 40 epochs.
Bugs. Another working standard is, of course, the number of bugs found. We
tested our method on the two FTP applications WarFTPD and Serv-U, and after an
experiment of nearly four days we found almost all of the known
vulnerabilities in these two applications, as Table 1 shows. One SMNT
buffer-overflow vulnerability in Serv-U was not found, owing to the
incompleteness of the network traffic used to train the seq2seq model.
A drawback of our method is that the generation of test cases by the seq2seq
model takes a lot of time. However, Sulley and SPIKE can only be used when the
specification of the protocol is available, whereas our method can fuzz
proprietary network protocols whose specification and implementation code are
both unavailable.
5 Related Work
Protocol Reverse Engineering. Over a decade ago, reverse engineering a network
protocol was a tedious, time-consuming and manual task. Nowadays, plenty of
methods have been proposed for automating the process of protocol reverse
engineering. The methods fall into two branches: on the one hand, methods that
utilize the protocol implementation [16,17], and on the other hand, those that
extract the protocol specification from network recordings only.
The Protocol Informatics Project [18] uses a bioinformatics method to
implement byte-sequence alignment of similar message formats. The Discoverer
tool [19] presents a recursive clustering approach over tokenized messages.
Biprominer [20] and ProDecoder [21], presented by Wang et al., focus on binary
protocols; they retrieve statistically relevant keywords and their sequencing.
Based on data-mining techniques, AutoReEngine [22] reveals keywords and their
positions within messages. Extracting a protocol specification is particularly
difficult when network security staff have no access to the protocol
implementation code and only network recordings are available. These
approaches provide a first means of automatically identifying message field
boundaries and formats, but unfortunately they cannot relate variable fields
across temporal states.
Protocol Fuzzing. Fuzzing is one of the most effective techniques for
uncovering security flaws in applications by generating test cases in an
automated way. Two types of fuzzing can be distinguished here: (1) black-box
fuzzing [1], in which a tester can only observe the inputs and outputs of an
application, and (2) white-box fuzzing [2], which allows the tester to inspect
the implementation code (either binary or source code) and, for instance,
take advantage of static code analysis and symbolic execution. This
classification obviously applies to protocol fuzzing as well. Most well-known
black-box random fuzzers today, e.g., Peach [11] and SPIKE [12], support
generation-based fuzzing; they can be used to fuzz a protocol implementation
when the specification of the protocol is available, but can do no more when
the protocol is unknown. Only a few approaches, such as AutoFuzz [13] and
PULSAR [14], can fuzz a protocol when both the specification and the
implementation code are unavailable; both infer the protocol state machine
and message formats from network traffic alone.
6 Conclusion
It is a challenging problem of computer security to find vulnerabilities in
the implementations of proprietary protocols. To the best of our knowledge,
this is the first attempt to do black-box protocol fuzzing using a
neural-network learning algorithm, which can find vulnerabilities in protocol
implementations even when neither the code nor the specification is
available. We presented and evaluated algorithms with different sampling
strategies to automatically learn a generative model of protocol messages.
Although we have applied our method to very common network protocols, it can
also find vulnerabilities in less common implementations, such as those in
embedded devices and industrial control systems. Moreover, for future work we
are considering adding some form of reinforcement learning to guide the
fuzzing process with coverage feedback from the application.
References
1. Sutton, M., Greene, A., Amini, P.: Fuzzing: Brute Force Vulnerability Discovery.
Pearson Education, London (2007)
2. Godefroid, P., Levin, M.Y., Molnar, D.A., et al.: Automated whitebox fuzz testing.
In: NDSS, vol. 8, pp. 151–166 (2008)
3. Miller, C., Peterson, Z.N.: Analysis of mutation and generation-based fuzzing.
Technical report, Independent Security Evaluators (2007)
4. Sotirov, A.I.: Automatic vulnerability detection using static source code analysis.
Ph.D. thesis, University of Alabama (2005)
5. Chess, B., McGraw, G.: Static analysis for security. IEEE Secur. Priv. 2(6), 76–79
(2004)
6. Godefroid, P., Kiezun, A., Levin, M.Y.: Grammar-based whitebox fuzzing. In:
ACM Sigplan Notices, vol. 43, pp. 206–215. ACM (2008)
7. Cadar, C., Godefroid, P., Khurshid, S., Păsăreanu, C.S., Sen, K., Tillmann, N.,
Visser, W.: Symbolic execution for software testing in practice: preliminary assess-
ment. In: Proceedings of the 33rd International Conference on Software Engineer-
ing, pp. 1066–1071. ACM (2011)
8. Cadar, C., Sen, K.: Symbolic execution for software testing: three decades later.
Commun. ACM 56(2), 82–90 (2013)
9. Schwartz, E.J., Avgerinos, T., Brumley, D.: All you ever wanted to know about
dynamic taint analysis and forward symbolic execution (but might have been afraid
to ask). In: 2010 IEEE Symposium on Security and Privacy (SP), pp. 317–331.
IEEE (2010)
10. Amini, P., Portnoy, A.: Sulley: pure Python fully automated and unattended
fuzzing framework (2013)
11. Eddington, M.: Peach fuzzing platform. In: Peach Fuzzer, p. 34 (2011)
12. Spike fuzzing platform. http://www.immunitysec.com/resourcesfreesoftware.shtml
13. Gorbunov, S., Rosenbloom, A.: Autofuzz: automated network protocol fuzzing
framework. IJCSNS 10(8), 239 (2010)
14. Gascon, H., Wressnegger, C., Yamaguchi, F., Arp, D., Rieck, K.: Pulsar: stateful
black-box fuzzing of proprietary network protocols. In: Thuraisingham, B., Wang,
X.F., Yegneswaran, V. (eds.) SecureComm 2015. LNICST, vol. 164, pp. 330–347.
Springer, Cham (2015). https://doi.org/10.1007/978-3-319-28865-9_18
15. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk,
H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for
statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
16. Comparetti, P.M., Wondracek, G., Kruegel, C., Kirda, E.: Prospex: protocol spec-
ification extraction. In: 2009 30th IEEE Symposium on Security and Privacy, pp.
110–125. IEEE (2009)
17. Caballero, J., Yin, H., Liang, Z., Song, D.: Polyglot: automatic extraction of pro-
tocol message format using dynamic binary analysis. In: Proceedings of the 14th
ACM Conference on Computer and Communications Security, pp. 317–329. ACM
(2007)
18. Beddoe, M.: The protocol informatics project (2004)
19. Cui, W., Kannan, J., Wang, H.J.: Discoverer: automatic protocol reverse engineer-
ing from network traces. In: USENIX Security Symposium, pp. 1–14 (2007)
20. Wang, Y., Li, X., Meng, J., Zhao, Y., Zhang, Z., Guo, L.: Biprominer: automatic
mining of binary protocol features. In: 2011 12th International Conference on Par-
allel and Distributed Computing, Applications and Technologies (PDCAT), pp.
179–184. IEEE (2011)
21. Wang, Y., Yun, X., Shafiq, M.Z., Wang, L., Liu, A.X., Zhang, Z., Yao, D.,
Zhang, Y., Guo, L.: A semantics aware approach to automated reverse engineer-
ing unknown protocols. In: 2012 20th IEEE International Conference on Network
Protocols (ICNP), pp. 1–10. IEEE (2012)
22. Luo, J.Z., Yu, S.Z.: Position-based automatic reverse engineering of network pro-
tocols. J. Netw. Comput. Appl. 36(3), 1070–1077 (2013)
A Novel Semantic-Aware Approach
for Detecting Malicious Web Traffic
1 Introduction
The detection results include not only anomaly scores but also summary
information about malicious activities, which makes the detection results more
understandable. We evaluate our approach on a manually labeled dataset
covering four different websites. It shows that our method is effective in
discovering various malicious web users, with an average precision of 90.8%
and an average recall of 92.9%. Furthermore, we apply our approach to a large
dataset with more than 136 million web traffic logs from a web hosting service
provider, where 3,995 unique malicious IPs are detected involving hundreds of
websites. Notably, the semantic representation of malicious visits can help
webmasters or security analysts understand the malice intuitively, which helps
improve network defense strategies and identify compromised sites.
We organize this paper as follows: background and related work are introduced
in Sect. 2. We present our approach and give details of the anomaly-score
computation in Sect. 3. The evaluation of our approach and results in the wild
are presented in Sect. 4. Finally, we discuss and conclude the paper in
Sect. 5.
for a webmaster, may attract plenty of visits from bots and humans to the
website, accompanied by a large volume of abusing traffic. Requests in
web-abusing traffic usually do not contain information harmful to web
applications, but the resources they ask for are never seen in previous
traffic.
Recently, semantic analysis has become more popular in the area of network
security. Zhang et al. [11] proposed SBotScope to analyze large-scale
malicious bot queries received by a known search engine. Liao et al. [15]
introduced a technique, semantic inconsistency search, to detect illicit
advertising content. For the detection of malicious web traffic, semantic
analysis can help avoid excessive dependence on structural information, which
requires strong background knowledge of specific attack techniques.
3 Methodology
To generically detect various kinds of malicious web traffic, we exploit the
literal similarity among requests from normal visits and introduce a novel
semantic-aware detection approach. Since our detection is conducted at the
level of a single visit, it is crucial to characterize users' dynamic
web-visiting activities. In our approach, a user's visiting profile is
represented by two word sets, and the anomaly score of each word is computed
using a modified TF-IDF algorithm. Furthermore, to avoid interference from
normal but infrequent words, a global normal-word dictionary is automatically
generated. The anomaly score of a user is derived from the scores of the two
word sets. We then derive a dynamic threshold to classify abnormal users. The
concept of our methodology is shown in Fig. 2.
The normal word dictionary is a word set automatically derived from the words
of all users. It is based on the assumption that a word is more likely to be
normal if it occurs in more visits. Furthermore, even if a word is infrequent,
it is probably normal if it is structurally similar to normal words. We use
n-grams to measure structural similarity; empirically we choose 4-grams.
To derive the dictionary set Dnw, first a global word set Gw = {(wi : mwi)} is
maintained, in which mwi is the number of users whose requests contain the
word wi, and the number of total users is M. Then, we define a percentage
threshold Tfw to distinguish frequent words: a word wi is normal if it is
within the first Tfw percent of words in Gw in descending order of mwi. All
the frequent words are put into Dnw, and their n-gram items are put into
NGnw. Any other word in Gw is added to Dnw if more than half of its n-gram
items are in NGnw.
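The derivation of Dnw above can be sketched as follows. This is our illustrative reading of the algorithm, using the paper's empirical 4-gram choice; the function names and the toy word lists are ours, not the paper's data.

```python
from collections import Counter

def ngrams(word, n=4):
    """The n-gram items of a word (n = 4, the paper's empirical choice)."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def build_normal_dictionary(user_words, t_fw=0.45, n=4):
    """D_nw: words in the top t_fw fraction (ranked by how many users
    contain them) are frequent, hence normal; an infrequent word is also
    accepted when more than half of its n-gram items occur among the
    n-grams of the frequent words."""
    # G_w: word -> number of users whose requests contain it.
    g_w = Counter()
    for words in user_words:
        g_w.update(set(words))
    ranked = sorted(g_w, key=g_w.get, reverse=True)
    cutoff = max(1, int(len(ranked) * t_fw))
    d_nw = set(ranked[:cutoff])
    ng_nw = set().union(*(ngrams(w, n) for w in d_nw))
    for word in ranked[cutoff:]:
        items = ngrams(word, n)
        if sum(g in ng_nw for g in items) * 2 > len(items):
            d_nw.add(word)
    return d_nw

# Hypothetical per-user word sets: "logins" is rare but 4-gram-similar
# to the frequent word "login", so it is accepted as normal.
users = [["index", "login", "style"],
         ["index", "login", "style"],
         ["index", "login"],
         ["index", "logins"],
         ["index", "zzz"]]
d_nw = build_normal_dictionary(users)
```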
The computation of the inverse document frequency is the same for both files:
idf(wbi, Gw) = log(M / mwbi).    (2)
Since F1 contains words from resource identifiers, some words may occur
repeatedly. Directly using the number of times a word occurs in F1 may amplify
the score of a frequent word. We therefore compute the term frequency for F1
and F2 separately, as follows:
tf(w1i, F1) = 1 + n1i / Nr,    (3)
and
tf(w2i, F2) = 1 + log(n2i).    (4)
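Equations (2)-(4) combine into the per-word anomaly score. The sketch below is ours: the helper names and the toy counts are hypothetical, and the paper's zero-score conditions (listed next) are only noted in a comment, not implemented.

```python
import math

def idf(word, g_w, num_users):
    """Eq. (2): inverse document frequency over the global word set G_w."""
    return math.log(num_users / g_w[word])

def tf_f1(count, n_r):
    """Eq. (3): term frequency for F1 (words from resource identifiers),
    damped by N_r so repeated words do not dominate."""
    return 1 + count / n_r

def tf_f2(count):
    """Eq. (4): log-damped term frequency for F2."""
    return 1 + math.log(count)

def anomaly_score_f1(word, count, n_r, g_w, num_users):
    """Modified TF-IDF score of a word in F1. (The paper additionally
    sets the score to 0 for words meeting its whitelist conditions.)"""
    return tf_f1(count, n_r) * idf(word, g_w, num_users)

# Toy numbers: a word seen by 2 of 1000 users scores far higher than
# one seen by 900 of 1000, for the same term frequency.
g_w = {"select": 2, "index": 900}
rare = anomaly_score_f1("select", 3, 10, g_w, 1000)
common = anomaly_score_f1("index", 3, 10, g_w, 1000)
```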
If a word satisfies any of the following conditions, its anomaly score is
directly set to 0.
sets are derived from W1 and W2, where MW1 = {(w1i : a1i) | a1i ≥ a1i+1} and
MW2 is defined analogously. Based on this semantic representation, an overview
of a malicious user's web activities can be identified directly, without the
need to review the raw traffic logs.
4.1 Dataset
The labeled dataset DL involves four different websites, named Site A to
Site D. We collected traffic logs from the four sites over seven different
days and labeled them manually, with several free web-security log analyzers,
including Apache-scalp [17] and 360 xingtu [18], as auxiliary tools. Among a
total of 43,504 IPs, we found 376 malicious IPs, more than two-thirds of which
were not detected by our auxiliary tools. The summary of DL is shown in
Table 1. The average ratio of malicious IPs to the total for each day is
listed in the last column. Across all malicious IPs, the number of web
requests in their malicious traffic ranges from one to more than ten thousand.
Table 1. The summary of DL.

No. | Website | Site type | Date range | #Requests | #IPs   | #MIPs | Ratio
1   | Site A  | Aspx      | Apr. 01–07 | 27,544    | 998    | 170   | 13.2%
2   | Site B  | Php       | May 17–23  | 32,272    | 2,479  | 49    | 2.1%
3   | Site C  | Java      | May 17–23  | 115,440   | 7,759  | 59    | 0.8%
4   | Site D  | Html      | May 17–23  | 386,725   | 32,268 | 98    | 0.3%
The details of the malicious IPs of the four sites are illustrated in Fig. 4.
Figure 4(a) shows the occurrence of malicious IPs on each day, and Fig. 4(b)
presents the number of malicious IPs for three types of malice. Notably, the
number of malicious IPs for Site A greatly increases in the last two days,
while the other three sites remain almost stable. At the same time, web abuse
occurs only in the traffic of Site A, and for the other sites web scanning is
relatively in the majority, which is in line with expectations. This is
because Site A was compromised on the fifth day by attackers who exploited
the PUT-method vulnerability and uploaded several web shells, such as
Fig. 4. (a) Malicious IPs in each day; (b) malicious IPs in different types
of malice.
4.2 Evaluation
For DL, we use two metrics, Precision (P) and Recall (R), to measure the
effectiveness of our methodology. Given the numbers of True Positives (TP),
False Positives (FP) and False Negatives (FN), the precision is calculated as
P = TP/(TP + FP), and the recall is R = TP/(TP + FN).
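The two metrics can be computed directly; the counts in the example below are hypothetical, not taken from the paper's tables.

```python
def precision_recall(tp, fp, fn):
    """P = TP/(TP+FP), R = TP/(TP+FN), as used to score each group."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical group: 24 malicious IPs correctly flagged, 2 false alarms,
# 2 malicious IPs missed.
p, r = precision_recall(tp=24, fp=2, fn=2)
```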
Many existing works use the True Positive Rate (TPR) and False Positive Rate
(FPR) as metrics to evaluate an anomaly detection method. However, in our
labeled dataset DL, the numbers of negative samples of the four websites vary
greatly, which may cause too much deviation in FPR.
With the detection window set to one day and Tfw for the normal word
dictionary set to 45%, the dataset is separated into 28 groups. The detection
results for each group are listed in Table 2. In the results, the precision
of 12 groups reaches 100%, and 18 groups achieve 100% recall. Overall, the
average precision achieved by our approach is 90.7%, and the average recall
is 92.9%. It is worth noting that the recall is only 15.1% for the seventh
day of Site A. The reason is that, among the 264 IPs of Site A on Day 7,
78 IPs queried uploaded malicious pages, which pollutes the normal word
dictionary.
642 J. Yang et al.
Actually, our approach is mainly intended to provide a direct and effective
way to analyze web traffic after the fact. For analysts, the number of
malicious IPs on Day 6 increasing to roughly five times the average of the
previous four days is an obvious indicator of a possible web compromise. In
such a situation, the normal word dictionaries generated in the following
detection windows are no longer trustworthy and should be replaced by
dictionaries from previous detection windows. Here, we replace the normal word
dictionary of Day 7 with that of Day 1, and the recall rises to 96.5% with
100% precision.
For DU, we also set the detection window to one day. To reduce false positives
as much as possible, we set Tfw to 30% and filter out the FQDNs whose number
of visiting IPs in a detection window is less than 20. As a result, 1,413
FQDNs and 969,731 distinct IP addresses are left altogether.
In total, we find 3,995 unique malicious IPs involving 782 attacked FQDNs.
Figure 5(a) shows that for almost 90% of the attacked FQDNs there are no more
than 50 distinct malicious IPs. However, about two in five FQDNs are attacked
on more than 10 days. This indicates that, for the websites in DU, web attacks
occur frequently but not in bursts.
The top six malicious IPs, sorted by cumulative anomaly score, are presented
in Table 3. In the last column, we list the top abnormal words of each user.
From the words, the intention of each attacker can be identified directly. The
first two IPs attacked four different sites on different days, and the words
show that they used the same tool to carry out targeted web scans. The next
three IPs attacked more than two hundred FQDNs each; the difference is that
*.*.40.135 and *.*.153.20 carried out web scans persistently, while
*.*.154.104 was active for only one day. Different from the first five
malicious web scanners, the last one is an attacker who intended to discover
vulnerabilities of the target website with an automatic web penetration tool,
since the total number of requests sent from the IP is more than 1 million.
From its typical words, it is obvious that
Fig. 5. (a) CDF of MIPs for attacked FQDNs; (b) days for attacked FQDNs.
the malicious IP conducted at least large amounts of SQL injection attacks.
Since the site contains only static web resources, the injection strings were
added to the resource identifiers of the requests and consequently occur in
MW1.
attacked FQDNs and find that their popular abnormal words are almost the same
as the above. However, there is one exception, which contains a unique word,
m.yonjizz.com. By querying it in a search engine, we find that it is an online
adult-video site. More than twenty malicious IPs visited the FQDN with
requests like /n/M.yonjizz.com/szh/1 and /l/Www.58porn.com/es/1. Such
malicious requests are not found in the traffic of other FQDNs in DU, which
may be a clue that the site was possibly compromised.
References
1. StopBadware and CommTouch: Compromised Websites: An Owner’s Per-
spective. https://www.stopbadware.org/files/compromised-websites-an-owners-
perspective.pdf
2. Alrwais, S., Yuan, K., Alowaisheq, E., Liao, X., Oprea, A., Wang, X., Li, Z.: Catch-
ing predators at watering holes: finding and understanding strategically compro-
mised websites. In: Proceedings of the 32nd Annual Conference on Computer Secu-
rity Applications, pp. 153–166. ACM (2016)
3. Li, F., Ho, G., Kuan, E., Niu, Y., Ballard, L., Thomas, K., Bursztein, E., Paxson,
V.: Remedying web hijacking: notification effectiveness and webmaster comprehen-
sion. In: Proceedings of the 25th International Conference on World Wide Web,
pp. 1009–1019. ACM (2016)
4. Xie, G., Hang, H., Faloutsos, M.: Scanner hunter: understanding http scanning
traffic. In: Proceedings of the 9th ACM Symposium on Information, Computer
and Communications Security, pp. 27–38. ACM (2014)
5. Kruegel, C., Vigna, G.: Anomaly detection of web-based attacks. In: Proceedings
of the 10th ACM Conference on Computer and Communications Security, pp.
251–261. ACM (2003)
6. Valeur, F., Mutz, D., Vigna, G.: A learning-based approach to the detection of
SQL attacks. In: Proceedings of the Conference on Detection of Intrusions and
Malware and Vulnerability Assessment (DIMVA), pp. 123–140 (2005)
7. Robertson, W., Vigna, G., Kruegel, C., Kemmerer, R.A.: Using generalization and
characterization techniques in the anomaly-based detection of web attacks. In:
Annual Network and Distributed System Security Symposium (NDSS) (2006)
8. Song, Y., Keromytis, A.D., Stolfo, S.J.: Spectrogram: a mixture-of-Markov-chains
model for anomaly detection in web traffic. In: Annual Network and Distributed
System Security Symposium (NDSS) (2009)
9. Krueger, T., Gehl, C., Rieck, K., Laskov, P.: TokDoc: a self-healing web application
firewall. In: Proceedings of the 2010 ACM Symposium on Applied Computing, pp.
1846–1853. ACM (2010)
10. Lampesberger, H., Winter, P., Zeilinger, M., Hermann, E.: An on-line learning
statistical model to detect malicious web requests. In: SecureComm, pp. 19–38
(2011)
11. Zhang, J., Xie, Y., Yu, F., Soukal, D., Lee, W.: Intention and origination: an
inside look at large-scale bot queries. In: Annual Network and Distributed System
Security Symposium (NDSS) (2013)
12. Canali, D., Balzarotti, D.: Behind the scenes of online attacks: an analysis of
exploitation behaviors on the web. In: Annual Network and Distributed System
Security Symposium (NDSS) (2013)
13. Starov, O., Dahse, J., Ahmad, S.S., Holz, T., Nikiforakis, N.: No honor among
thieves: a large-scale analysis of malicious web shells. In: Proceedings of the 25th
International Conference on World Wide Web, pp. 1021–1032. ACM (2016)
14. FireEye. Detecting and Defeating the China Chopper Web Shell. https://
www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/rpt-
china-chopper.pdf
15. Liao, X., Yuan, K., Wang, X., Pei, Z., Yang, H., Chen, J., Duan, H., Du, K.,
Alowaisheq, E., Alrwais, S., Xing, L., Beyah, R.: Seeking nonsense, looking for
trouble: efficient promotional-infection detection through semantic inconsistency
search. In: IEEE Symposium on Security and Privacy, pp. 707–723 (2016)
16. Paxson, V.: Bro: a system for detecting network intruders in real-time. In: Pro-
ceedings of 7th USENIX Security Symposium (1998)
17. Apache-scalp. https://github.com/nanopony/apache-scalp
18. 360 Xingtu. http://wangzhan.360.com/Activity/xingtu
An Active and Dynamic Botnet Detection
Approach to Track Hidden Concept Drift
1 Introduction
Botnets are among the most significant threats to Internet security. Nowadays,
botnets keep evolving and are composed not only of compromised computers but
also of a large variety of IoT devices, including smartphones, IP cameras,
routers, printers, DVRs and so on. With enormous cumulative bandwidth and
computing capability, botnets have become the most important and powerful tool
for cheap and fast large-scale network attacks on the Internet [1].
According to an AV-Test [2] report, on average over 390,000 new malware
samples are detected every day. The enormous volume of new malware variants
renders manual analysis inefficient and time-consuming. Nowadays, machine
learning has been widely deployed as a core component of botnet detection
systems [3–6] and has achieved good detection results.
However, with financial motivation, attackers keep evolving their evasion
techniques to bypass machine-learning detection. Currently, more and more
well-crafted botnets exploit concept drift, a vulnerability of machine
learning, to accelerate the decay of the detection model. Machine-learning
algorithms assume that the underlying botnet data distribution is stable
across the training and testing datasets. Well-crafted concept drift attacks
gradually and stealthily introduce changes into the malware data distribution
to mislead machine-learning models, for example through new communication
channels [7–11], mimicry attacks [12,13], gradient descent attacks [12,13],
poison attacks [14], and so on. Building change-resistant and self-renewing
learning models against advanced evasion techniques is thus very important
for botnet detection systems.
Existing solutions use passive, periodic model retraining to mitigate concept
drift attacks. However, the interval between two retrainings is hard to
choose, because frequent retraining is inefficient while infrequent retraining
leads to untrusted predictions in some periods. In addition, supervised
retraining requires the manual labelling of all new samples. The labelling is
based on traditional coarse-grained, fixed thresholds, which are not sensitive
to hidden and gradual changes of the underlying data distribution. A
stationary learning model's confidential values are critical for detection
performance, so if the learning model's algorithm and parameters are stolen by
adversaries [15], retraining is no longer useful against sudden concept drift
attacks, which can be crafted quickly and easily given full knowledge of the
detection model.
In this paper, we present an active and dynamic botnet detection approach that enhances the traditional horizontal correlation detection model. Compared with traditional models, this model can actively detect hidden concept drift attacks and dynamically evolve to track the trend of the latest botnet concept.
In particular, this paper makes the following contributions:
– To the best of our knowledge, we are the first to present an active and dynamic learning approach for botnet detection that actively tracks the trend of hidden botnet concept drift and accordingly evolves the learning model dynamically to mitigate model aging.
– We extend the traditional passive decision method, which uses a coarse-grained threshold to check whether a bottom line has been crossed. In contrast, we introduce fine-grained p-values as an indicator to actively identify hidden concept drift before detection performance starts to degrade.
– In traditional detection models the confidential values, such as model parameters, are fixed, so retraining is the only way to combat concept drift attacks. We introduce DRIFT assessment and feature reweighting to dynamically tune model parameters to follow the trend of the current botnet concept.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the architecture of our active and dynamic botnet detection approach and describes each of its components. Section 4 presents the experiments performed to assess the recognition of concept drift in the underlying data distribution and the model's self-renewal. In Sect. 5, we discuss limitations and future work, and in Sect. 6 we summarize our results.
648 Z. Wang et al.
2 Related Works
Machine learning (ML) is now widely used as a core component of botnet detection systems. ML assumes that the underlying data distribution is stable across both the training and the testing dataset. By exploiting this assumption, many well-crafted evasion approaches, known as concept drift attacks, have been proposed to evade or mislead ML models [16]. As shown in Fig. 1, every step of the ML process is a potential part of the concept drift attack surface, and with different levels of knowledge of the target ML system, attackers can launch various concept drift attacks [17]. Arce [18] pointed out that machine learning itself can be the weakest link in the security chain.
Fig. 1. Machine learning process and the corresponding concept drift attack surface: stealthy channel, mimicry attack, gradient descent attack, polymorphic/metamorphic techniques, model stealing, and poisoning attack
Poisoning attacks work by introducing carefully crafted noise into the training data. Biggio et al. [14] proposed poisoning attacks that merge the benign and malicious clusters, rendering the learning model unusable.
Botnet problems are therefore not stable but change over time. Machine-learning-based botnet detectors are designed under the assumption that training and testing data follow the same distribution, which makes them vulnerable to concept drift, in which the underlying data distribution changes over time. One mitigation approach is to recognize and react to recent concept changes before the model ages. Demontis et al. [3] proposed an adversary-aware approach that proactively anticipates the attacker. Deo et al. [22] presented a probabilistic predictor that assesses the underlying classifier and retrains the model when it recognizes concept drift. Transcend [23] is a framework that identifies model aging in vivo during deployment, before performance starts to degrade. In this paper we present an active and dynamic botnet detection approach that actively detects the trend of hidden concept drift attacks and dynamically evolves the learning model to mitigate model aging.
(Figure: the detection pipeline of horizontal correlation, p-value (APV) prediction, DRIFT assessment, and feature reweighting.)
Training Dataset. The choice of training dataset directly affects the quality of the detection model. The CTU botnet capture dataset is stored in files in the binetflow format, in which each row represents a network behavior and each column a behavior feature. Depending on the granularity, network behaviors can be abstracted to different levels, such as packets, netflows, traces and hosts. In this work the netflow is the basic data unit of the training datasets, and we abstract netflows into traffic traces by grouping together the netflows that share the same source IP address, destination IP address, destination port and protocol.
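The grouping rule above can be sketched in a few lines (the field names `src_ip`, `dst_ip`, `dst_port` and `proto` are illustrative; the actual binetflow column names differ):

```python
from collections import defaultdict

def group_into_traces(netflows):
    """Group netflows into traffic traces keyed by
    (source IP, destination IP, destination port, protocol)."""
    traces = defaultdict(list)
    for flow in netflows:
        key = (flow["src_ip"], flow["dst_ip"], flow["dst_port"], flow["proto"])
        traces[key].append(flow)
    return dict(traces)

flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "1.2.3.4", "dst_port": 80, "proto": "tcp", "bytes": 120},
    {"src_ip": "10.0.0.1", "dst_ip": "1.2.3.4", "dst_port": 80, "proto": "tcp", "bytes": 400},
    {"src_ip": "10.0.0.2", "dst_ip": "1.2.3.4", "dst_port": 53, "proto": "udp", "bytes": 60},
]
# Two distinct traces: one with two netflows, one with a single netflow.
traces = group_into_traces(flows)
```

Per-trace features (the volume and time-related statistics listed below) are then computed over each group.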
Preprocessing. Before the training phase we preprocess the data: we filter out noisy data and transform the features by scaling each feature to a given range. In this work the range of each feature on the training dataset is set to [0, 1] at initialization time. To make the data cleaner and more usable, we filter the datasets by whitelisting common Internet services, such as Microsoft Update and Google, as well as known online movie and music traffic identified by its communication pattern.
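The scaling step can be sketched as a minimal min-max transform fitted on the training dataset (a sketch of the described preprocessing, not the authors' code):

```python
def fit_min_max(rows):
    """Learn per-feature min and max from the training dataset."""
    mins = [min(col) for col in zip(*rows)]
    maxs = [max(col) for col in zip(*rows)]
    return mins, maxs

def scale(row, mins, maxs):
    """Scale one feature vector into [0, 1] using the training ranges."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, lo, hi in zip(row, mins, maxs)
    ]

train = [[10.0, 0.5], [20.0, 1.5], [30.0, 1.0]]
mins, maxs = fit_min_max(train)
assert scale([20.0, 1.0], mins, maxs) == [0.5, 0.5]
```

Note that the ranges are fixed at initialization time, so test-window traces are scaled with the training ranges rather than their own.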
Set                    | Features
Volume features        | Average of sent bytes; standard deviation of sent bytes; average of received bytes; standard deviation of received bytes
Time-related features  | Average of duration; standard deviation of duration; average of receive interval; standard deviation of receive interval; connection frequency
3.3 P-Value
Fig. 3. The conformal learning component calculates the APV for each time window
The change in the concept drift score (CDS) between different time windows reflects the change of the underlying botnet data distribution over time, and can identify gradual, moderate drift.
If the CDS in the latest time window increases, the current concept of the underlying botnet data distribution differs from the old concept learnt from the previous time windows, indicating that the detection model is under a concept drift attack. The decay of threshold-based detection performance, however, may not be observed immediately when concept drift appears: only when the variation of the underlying data distribution exceeds the threshold boundary does the detection model start to make poor decisions. If the CDS does not increase in the new time window, the distribution of botnet traces in the current window shows no significant concept drift.
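The exact CDS computation is defined by the paper's conformal learning component; as an illustrative assumption, a conformal-style score can be built from nearest-neighbour nonconformity and p-values, with the CDS taken as the share of new-window traces whose p-value falls below a significance level `alpha` (both the nonconformity measure and `alpha` are assumptions here):

```python
def nonconformity(trace, reference):
    """Nonconformity score: distance to the nearest reference trace."""
    return min(sum((a - b) ** 2 for a, b in zip(trace, ref)) ** 0.5
               for ref in reference)

def p_value(trace, reference):
    """Fraction of reference traces at least as nonconforming as `trace`
    (leave-one-out scores for the reference window)."""
    scores = [nonconformity(r, [x for x in reference if x is not r])
              for r in reference]
    s = nonconformity(trace, reference)
    return sum(1 for sc in scores if sc >= s) / len(scores)

def concept_drift_score(new_window, old_window, alpha=0.2):
    """CDS sketch: share of new traces that look unlike the old concept."""
    return sum(1 for t in new_window
               if p_value(t, old_window) < alpha) / len(new_window)

old = [[0.1, 0.1], [0.12, 0.09], [0.11, 0.11], [0.09, 0.1]]
drifted = [[0.9, 0.9], [0.85, 0.95]]
assert concept_drift_score(drifted, old) > concept_drift_score(old[:2], old)
```

A rising CDS across consecutive windows is what signals that the model is drifting away from the live botnet concept.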
When concept drift is found in the latest time window, we use the DRIFT algorithm to evaluate the contribution of each feature in the current window and to identify the features affected by the drift, as shown in Algorithm 2. DRIFT[i] represents the effect of the i-th feature on the average distance between two botnet traces in different time windows; if DRIFT[i] increases, the concept drift affects the i-th feature.
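Algorithm 2 itself is not reproduced in this excerpt; the following sketch matches the description above, taking DRIFT[i] as the average per-feature contribution to the distance between traces drawn from the two windows (an assumption about the exact form):

```python
def drift_scores(tw1, tw2):
    """DRIFT[i] sketch: average absolute difference of feature i over
    every pair of traces drawn from the two time windows."""
    n_features = len(tw1[0])
    scores = [0.0] * n_features
    pairs = 0
    for a in tw1:
        for b in tw2:
            for i in range(n_features):
                scores[i] += abs(a[i] - b[i])
            pairs += 1
    return [s / pairs for s in scores]

tw1 = [[0.1, 0.5], [0.2, 0.5]]
tw2 = [[0.9, 0.5], [0.8, 0.5]]
# Feature 0 drifts strongly between windows; feature 1 not at all.
scores = drift_scores(tw1, tw2)
assert scores[0] > scores[1]
```

Features with large scores are the ones the concept drift attack has shifted, and they are the candidates for reweighting.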
3.6 Model Self-Renewal
When concept drift is recognized, we reweight the affected features according to their DRIFT scores to dynamically update the model before the drift accumulates into radical drift. The formula for calculating a new weight based on the DRIFT score is:
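The printed formula did not survive extraction here. As a stand-in assumption (not the authors' formula), one simple scheme consistent with the text, down-weighting features in proportion to their DRIFT scores and renormalizing, is:

```python
import math

def reweight(weights, drift):
    """Down-weight features in proportion to their DRIFT score,
    then renormalize so the weights sum to 1 (illustrative only,
    not the paper's formula)."""
    raw = [w * math.exp(-d) for w, d in zip(weights, drift)]
    total = sum(raw)
    return [r / total for r in raw]

old_weights = [0.25, 0.25, 0.25, 0.25]
drift = [0.0, 0.0, 2.0, 0.0]        # feature 2 hit by concept drift
new_weights = reweight(old_weights, drift)
assert new_weights[2] < old_weights[2]      # drifted feature demoted
assert abs(sum(new_weights) - 1.0) < 1e-9   # still a proper weighting
```

Any scheme that shrinks the influence of drifted features while preserving a normalized weighting would play the same role in the pipeline.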
4 Experiment
In this paper we use the public CTU botnet dataset, provided by the Malware Capture Facility project1 , for our experiments. The project captures long-lived real botnet traffic and generates labeled netflow files that are public for malware research; the traffic dataset runs from 2011 to the present. We aim to recognize concept drift between different variants of the same family, and select 6 botnet families that have more than 2 variants for these experiments, as shown in Table 2.
All file names of CTU botnet captures share the prefix "CTU-Malware-Capture-Botnet", and each capture has a unique suffix; in Table 2, only the suffixes are listed to save space. Each family has multiple variants, and the capture times of the variants and the time span of each family differ.
We cut the time span of each family into 2 time windows, tw1 and tw2, so there are 12 time windows in total, with 2 disjoint windows per family. According to the time order of the variants, tw2 of each family contains only its latest variant, while all other variants are grouped into tw1. To recognize
1 Garcia, Sebastian. Malware Capture Facility Project. Retrieved from https://stratosphereips.org.
An Active and Dynamic Botnet Detection Approach 655
the hidden concept drift between different time windows, we take three steps: first, visualize the botnet data distribution in a two-dimensional figure; second, split the two-dimensional space into small grids and compute the significance levels of all grids; third, calculate the concept drift score from the botnet data distribution and the significance levels.
The data distribution of each family is shown in Fig. 4. We use the tSNE algorithm [25] for dimensionality reduction: tSNE visualizes high-dimensional datasets by mapping high-dimensional points into two or three dimensions while preserving the distance structure, so that points close in the high-dimensional space remain close in the low-dimensional space.
We split the 2-dimensional tSNE space into 30 × 30 grids, giving 900 grids in total for each family, as shown in Fig. 4, and calculate the average p-value (APV) of every grid. The APV represents the significance of each grid: a grid with a high APV is important for describing the characteristics of the botnet family. In Fig. 4, squares denote the grids common to both time windows, triangles denote grids belonging only to tw1, and circles denote grids belonging only to tw2.
Fig. 4. The drift of data distribution and significance levels of each family.
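The grid and APV computation can be sketched as follows, where each trace carries its 2-D tSNE coordinate and a p-value (coordinates assumed normalized to [0, 1); a sketch, not the authors' code):

```python
def grid_apv(points, n=30):
    """points: list of (x, y, p_value) with x, y in [0, 1).
    Returns {(row, col): average p-value of the traces in that cell}."""
    cells = {}
    for x, y, p in points:
        key = (min(int(y * n), n - 1), min(int(x * n), n - 1))
        cells.setdefault(key, []).append(p)
    return {key: sum(ps) / len(ps) for key, ps in cells.items()}

pts = [(0.01, 0.01, 0.9), (0.02, 0.02, 0.7), (0.99, 0.99, 0.1)]
apv = grid_apv(pts)
assert abs(apv[(0, 0)] - 0.8) < 1e-9   # two traces share the corner cell
assert apv[(29, 29)] == 0.1
```

Comparing which cells are populated in tw1 versus tw2, and how their APVs differ, gives the per-grid view of drift that Fig. 4 visualizes.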
Table 3. The feature DRIFT scores and new weights of Dridex
Table 4. The feature DRIFT scores and new weights of Yakes
The CDSs for the families Dynamer, Taobao and Cridex are 0.121, 0.216 and 0.238, which indicates gradual, moderate concept drift in their latest time windows. For the families Yakes and Dridex, the CDSs are 0.994 and 0.998, indicating radical concept drift in the latest time windows.
After recognizing concept drift, we assess its effect on each predictive feature and then dynamically calculate new weights for all features to track the trend of the underlying drift. We use the DRIFT algorithm to assess the effect of the concept drift on each predictive feature: the DRIFT score represents the distance between the observed traces in tw2 and the traces in tw1. Following Algorithm 2, we update the weights of all features as shown in Tables 3 and 4.
Figure 6 shows the changes in the time window APVs. Note that the time window APV differs from the grid APV: the time window APV is the average p-value of all traces captured in a time window, while the grid APV is the average p-value of the traces in a small grid. After feature reweighting, the latest time window APVs of the families Yakes and Dridex increase dramatically, which means the latest concept becomes more consistent with the previous concept. Note that the real underlying botnet data distribution does not change; only the
Fig. 5. The concept drift scores of each family.
Fig. 6. The APV of time windows before and after feature reweighting.
model's observation perspective changes. From the new perspective, the new botnet variant looks more similar to the known variants.
5 Discussion
References
1. Antonakakis, M., April, T., Bailey, M., Bernhard, M., Bursztein, E., Cochran, J.,
Durumeric, Z., Halderman, J.A., Invernizzi, L., Kallitsis, M., Kumar, D., Lever, C.,
Ma, Z., Mason, J., Menscher, D., Seaman, C., Sullivan, N., Thomas, K., Zhou, Y.:
Understanding the mirai botnet. In: 26th USENIX Security Symposium (USENIX
Security 2017), Vancouver, BC. USENIX Association, August 2017
2. AV-Test: Malware statistics, September 2017. https://www.av-test.org/en/
statistics/malware/
3. Demontis, A., Melis, M., Biggio, B., Maiorca, D., Arp, D., Rieck, K., Corona, I.,
Giacinto, G., Roli, F.: Yes, machine learning can be more secure! A case study on
android malware detection. IEEE Trans. Dependable Secure Comput. (2017)
4. García, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet
detection methods. Comput. Secur. 45, 100–123 (2014)
5. García, S., Zunino, A., Campo, M.: Survey on network-based botnet detection meth-
ods. Secur. Commun. Netw. 7, 878–903 (2014)
6. Ye, Y., Li, T., Adjeroh, D., Iyengar, S.S.: A survey on malware detection using
data mining techniques. ACM Comput. Surv. 50, 41:1–41:40 (2017)
7. Zeng, Y., Shin, K.G., Hu, X.: Design of SMS commanded-and-controlled and P2P-
structured mobile botnets. In: Proceedings of the Fifth ACM Conference on Secu-
rity and Privacy in Wireless and Mobile Networks, WISEC 2012, New York, NY,
USA, pp. 137–148. ACM (2012)
8. Singh, K., Sangal, S., Jain, N., Traynor, P., Lee, W.: Evaluating Bluetooth as
a medium for botnet command and control. In: Kreibich, C., Jahnke, M. (eds.)
DIMVA 2010. LNCS, vol. 6201, pp. 61–80. Springer, Heidelberg (2010). https://
doi.org/10.1007/978-3-642-14215-4 4
9. Krombholz, K., Hobel, H., Huber, M., Weippl, E.: Advanced social engineering
attacks. J. Inf. Secur. Appl. 22, 113–122 (2015). Special Issue on Security of Infor-
mation and Networks
10. Yin, T., Zhang, Y., Li, S.: DR-SNBOT: a social network-based botnet with strong
destroy-resistance. In: IEEE International Conference on Networking, Architec-
ture, and Storage, pp. 191–199 (2014)
11. Kartaltepe, E.J., Morales, J.A., Xu, S., Sandhu, R.: Social network-based bot-
net command-and-control: emerging threats and countermeasures. In: Proceedings
of Applied Cryptography and Network Security, International Conference, ACNS
2010, Beijing, China, 22–25 June 2010, pp. 511–528 (2010)
12. Šrndić, N., Laskov, P.: Practical evasion of a learning-based classifier: a case study.
In: Proceedings of the 2014 IEEE Symposium on Security and Privacy, SP 2014,
Washington, DC, USA, pp. 197–211. IEEE Computer Society (2014)
13. Biggio, B., Pillai, I., Rota Bulò, S., Ariu, D., Pelillo, M., Roli, F.: Is data clustering
in adversarial settings secure? In: Proceedings of the 2013 ACM Workshop on
Artificial Intelligence and Security, AISec 2013, New York, NY, USA, pp. 87–98.
ACM (2013)
14. Biggio, B., Rieck, K., Ariu, D., Wressnegger, C., Corona, I., Giacinto, G., Roli, F.:
Poisoning behavioral malware clustering. In: Proceedings of the 2014 Workshop
on Artificial Intelligent and Security Workshop, AISec 2014, New York, NY, USA,
pp. 27–36. ACM (2014)
15. Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine
learning models via prediction APIs. In: 25th USENIX Security Symposium
(USENIX Security 16), Austin, TX, pp. 601–618. USENIX Association (2016)
16. Kantchelian, A., Afroz, S., Huang, L., Islam, A.C., Miller, B., Tschantz, M.C.,
Greenstadt, R., Joseph, A.D., Tygar, J.D.: Approaches to adversarial drift. In:
Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security,
AISec 2013, New York, NY, USA, pp. 99–110. ACM (2013)
17. Šrndić, N., Laskov, P.: Practical evasion of a learning-based classifier: a case study.
In: Proceedings of the 35th IEEE Symposium on Security and Privacy (S&P), San
Jose, CA, May 2014
18. Arce, I.: The weakest link revisited. IEEE Secur. Priv. 1, 72–76 (2003)
19. Singh, K., Srivastava, A., Giffin, J., Lee, W.: Evaluating emails feasibility for botnet
command and control. In: IEEE International Conference on Dependable Systems
and Networks with FTCS and DCC, Anchorage, AK, pp. 376–385. IEEE, June
2008
20. Wagner, D., Soto, P.: Mimicry attacks on host-based intrusion detection systems.
In: Proceedings of the 9th ACM Conference on Computer and Communications
Security, CCS 2002, New York, NY, USA, pp. 255–264. ACM (2002)
21. Smutz, C., Stavrou, A.: Malicious PDF detection using metadata and structural
features. In: Proceedings of the 28th Annual Computer Security Applications Con-
ference, ACSAC 2012, New York, NY, USA, pp. 239–248. ACM (2012)
22. Deo, A., Dash, S.K., Suarez-Tangil, G., Vovk, V., Cavallaro, L.: Prescience: proba-
bilistic guidance on the retraining conundrum for malware detection. In: Proceed-
ings of the 2016 ACM Workshop on Artificial Intelligence and Security, AISec 2016,
New York, NY, USA, pp. 71–82. ACM (2016)
23. Jordaney, R., Sharad, K., Dash, S.K., Wang, Z., Papini, D., Nouretdinov, I., Cav-
allaro, L.: Transcend: detecting concept drift in malware classification models. In:
Proceedings of the 26th USENIX Security Symposium (USENIX Security 2017)
(2017)
24. Tegeler, F., Fu, X., Vigna, G., Kruegel, C.: Botfinder: finding bots in network traffic
without deep packet inspection. In: Proceedings of the 8th International Conference
on Emerging Networking Experiments and Technologies (CoNEXT 2012), France,
pp. 349–360. ACM, New York, December 2012
25. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn.
Res. 9, 2579–2605 (2008)
Statically Defend Network Consumption
Against Acker Failure Vulnerability
in Storm
1 Introduction
Before streaming computing platforms were developed, many Internet companies facing real-time big data processing problems usually set up network channels and multiple worker nodes themselves to process messages in real time. However, this approach could no longer meet data processing requirements such as not losing data, scaling up the cluster, and ease of operation. The appearance of Storm [2] solved these problems: Storm can process in real time the massive data generated on social platforms. At present there are many stream data computing systems, such as Storm, S4 [12], Spark Streaming [3], TimeStream [4] and Kafka [1]. S4 and Kafka implement
c Springer International Publishing AG, part of Springer Nature 2018
S. Qing et al. (Eds.): ICICS 2017, LNCS 10631, pp. 661–673, 2018.
https://doi.org/10.1007/978-3-319-89500-0_56
662 W. Qian et al.
high availability through a passive waiting strategy, while Storm, Spark Streaming and TimeStream achieve high availability via an upstream backup strategy [6,11]. Compared with other stream processing platforms, Storm is the most widely used platform in industry in terms of system architecture, application interface, supported languages and high availability. Storm has performance advantages, but it also raises some security problems.
At present, academia and industry mainly focus on the security and privacy issues of batch processing platforms, such as information stealing, decision interference and denial-of-service attacks [5,7,16]. Many solutions have been proposed for Hadoop, such as authentication, authorization, differential privacy and the trusted computing base of the TCG (Trusted Computing Group). These solutions cover the secure hardware environment, trusted data processing platform, data encryption and secure computing process [8,13–15]. However, such complete solutions are inevitably not optimal and need to be improved and tested in practice. Moreover, security issues in big data environments are complex and diverse, and different computing frameworks may require different solutions. Compared with batch processing platforms, stream processing platforms mainly focus on real-time behavior and reliability; unfortunately, little attention has been paid to their security vulnerabilities.
The reliability mechanism in Storm is simply designed and contains security vulnerabilities. In this paper we focus on security vulnerabilities of the Storm platform. By analyzing the source code and experimental results, we verify that Acker failure and message retransmission cause excessive network consumption. Furthermore, we propose a protection scheme that examines malicious code statically, and we design the corresponding experiments in Storm. Our contributions can be summarized as follows:
2 Background
Our work relates to the Acker in Storm and to stream grouping. In this section we succinctly introduce the background of these fields.
In Storm, the reliability mechanism traces each message emitted by a Spout with the help of the Acker Bolt. A tuple tree is a directed acyclic logical structure formed by the source tuples emitted by the Spout and the new tuples emitted by Bolts. Within a timeout limit, Acker Bolt tasks XOR every tupleId in a tuple tree (uniquely identified by its msgId) and then check whether the XOR result is zero. If the result is zero, the tuple tree has been processed successfully; otherwise, the tuple tree is considered to have failed. More specifically, the reliability mechanism is implemented as follows:
– When sending a source tuple, the Spout specifies a msgId (the unique RootId identifying a tuple tree) and a tupleId for the source tuple. The Spout then acknowledges the source tuple and sends ⟨RootId, tupleId⟩ to the Acker Bolt.
– After successfully processing a received tuple (uniquely identified by tupleIdrec), a Bolt sends one or more new tuples anchored to the received tuple, specifying a random tupleIdnew for each new tuple. It then acknowledges the received tuple and sends ⟨RootId, tupleIdrec ⊕ tupleIdnew⟩ to the Acker Bolt.
– The Acker Bolt XORs all received acknowledgement messages that belong to the same RootId. If the XOR result is zero, the Acker Bolt acknowledges that the source tuple tagged with RootId has been processed completely and sends an ack to the Spout. Otherwise, after the timeout, the Acker Bolt sends fail to the Spout and the source tuple is judged to have failed.
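The XOR bookkeeping described above can be simulated in a few lines (a sketch of the protocol, not Storm's implementation; the tuple ids are arbitrary):

```python
class Acker:
    """Tracks one XOR accumulator per RootId, as the Acker Bolt does."""
    def __init__(self):
        self.pending = {}

    def ack(self, root_id, value):
        # XOR the acknowledgement value into this tuple tree's accumulator.
        self.pending[root_id] = self.pending.get(root_id, 0) ^ value

    def is_complete(self, root_id):
        # Every tupleId appears exactly twice (once when emitted, once
        # when acked), so a fully processed tree XORs to zero.
        return self.pending.get(root_id, 0) == 0

acker = Acker()
root, t0 = 1, 0xAAAA            # Spout emits source tuple t0
acker.ack(root, t0)             # Spout: <RootId, tupleId>

t1 = 0x1234                     # Bolt emits t1 anchored to t0
acker.ack(root, t0 ^ t1)        # Bolt: <RootId, t0 XOR t1>
acker.ack(root, t1)             # downstream consumer finally acks t1

assert acker.is_complete(root)  # every id appeared twice: XOR is zero
```

The XOR trick lets the Acker Bolt track an arbitrarily large tuple tree with a single integer per RootId.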
In this section, we point out the problems and challenges of the current reliability mechanism in Storm and present an attack model based on the existing vulnerability. By analyzing the source code of the reliability mechanism, we find that both reliable and unreliable topologies can run in Storm, because Storm's developers provide programmers with a flexible API. However, message consistency is vital in some business scenarios, such as bank deposit, transfer and remittance business.
When programmers design a topology, Storm provides the Spout and Bolt components with basic interfaces and abstract classes, including IComponent, ISpout, IBolt, IRichSpout, IRichBolt and IBasicBolt, as well as BaseComponent, BaseRichSpout, BaseRichBolt and BaseBasicBolt. The BaseBasicBolt class implements the IBasicBolt interface, which acknowledges received tuples automatically. A programmer should inherit the BaseBasicBolt abstract class when designing a reliable Bolt, and the BaseRichBolt abstract class when designing an unreliable Bolt. By analyzing the source code of the acker mechanism, we point out some existing vulnerabilities in Storm as follows:
If the XOR result is still not equal to zero after the extended time, the Spout cannot trigger the ack() function to release the messages, which results in wasted resources.
Attack Target. If the Acker Bolt does not receive the acknowledgment message from a Bolt, Acker failure and message retransmission create a vulnerability of excessive network consumption. If this vulnerability is exploited by an attacker who poses as a legitimate user in the cluster and submits a malicious topology, it not only consumes network resources in the cluster but also degrades the efficiency of normal users' topologies.
The attack model of the malicious topology is shown in Fig. 1. The malicious topology implements a reliable Spout, a reliable Bolt1 and an unreliable Bolt2 that inherits IRichBolt; Bolt2's tasks anchor to the received tuples but never acknowledge them. The adversary runs the malicious topology as follows:
3. Bolt1 emits new tuples and anchors them to the received tuple. Then Bolt1 calls the ack() function to acknowledge all received tuples and sends tupleIdrec and tupleIdnew to the Acker Bolt.
4. Bolt2 pulls every tuple from Bolt1 using the global grouping method and does not call the ack() function to acknowledge the received tuples.
5. After the timeout limit, the XOR value of the RootId in the Acker Bolt is not zero, so the source tuple is judged to have failed.
6. The Spout calls its fail() function and re-transmits the failed message from the Kafka message queue. As the previous steps repeat, the failed message keeps failing all the time.
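The effect of the attack on the accumulator can be simulated directly: withholding Bolt2's ack leaves the XOR value nonzero, so every retransmitted copy fails again (hypothetical tuple ids, not Storm code):

```python
def run_tuple_tree(bolt2_acks):
    """XOR accumulator for one RootId; Bolt2 withholds its ack when
    bolt2_acks is False, mimicking the malicious unreliable Bolt2."""
    t0, t1 = 0xBEEF, 0xCAFE
    acc = 0
    acc ^= t0            # Spout acks the source tuple
    acc ^= t0 ^ t1       # Bolt1 acks t0 and anchors new tuple t1
    if bolt2_acks:
        acc ^= t1        # an honest Bolt2 would ack t1
    return acc

assert run_tuple_tree(True) == 0        # normal topology: success
assert run_tuple_tree(False) != 0       # malicious Bolt2: permanent failure
# Each retry replays the same tree, so the message fails forever and the
# Spout/Kafka retransmissions consume network bandwidth indefinitely.
```

Because the failure is deterministic, the retransmission loop never terminates, which is exactly the network-consumption vulnerability exploited by MT2 and MT3 below.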
4 Experimental Evaluation
We use the Ganglia monitoring tool to view the resource occupancy in the cluster, and the Storm UI to view the operation of the malicious topology and the normal user topology, comparing the average processing time of the normal user's topology with and without the malicious program running. We design three kinds of topology: a general topology (GT1) and two malicious topologies (MT2 and MT3). GT1 consists of a reliable Spout, a reliable Bolt1 and a reliable Bolt2; MT2 and MT3 each consist of a reliable Spout, a reliable Bolt1 and an unreliable Bolt2. The difference between GT1 and MT2 is whether Bolt2 is a reliable component; the difference between MT2 and MT3 is that the stream grouping method in MT3 is global grouping.
Network Consumption. To compare the network consumption of GT1 and the malicious topologies, we submitted the three topologies to a Storm 0.10.0 cluster. Figure 2 shows the total network and memory consumption when only GT1 runs in the Storm cluster, and Fig. 3 shows the network consumption when only a malicious topology, MT2 or MT3, runs. There are significant differences between GT1 and the malicious topologies. When GT1 runs for one hour in the Storm cluster, the input network consumption
Fig. 2. The network and memory consumption when only running GT1 in Storm
and the number of executors to twelve for each benchmark. Six of the benchmarks are reliable topologies. By counting and comparing the memory, CPU and network consumption of each benchmark at runtime, as shown in Table 1, we find that Slidwindow consumes the most network resources and Multiplelogger the least; the in-network consumption of Slidwindow is 1.7× the in-network overhead of Multiplelogger. In the subsequent malicious topology attacks, we direct a malicious topology against the most network-dependent topology, namely Slidwindow.
Table 1. Network, CPU and memory consumption statistics about examples in Storm
From the previous analysis of the interfaces, we can see that Storm provides a secure and reliable anchoring mechanism for Spouts and Bolts, including the reliable IBasicBolt interface and BaseBasicBolt abstract class for Bolts; we therefore do not need to design a secure Spout or Bolt interface ourselves. In addition, Ganglia monitors the entire Storm cluster from the perspective of the application layer. Note that if Ganglia detects excessive consumption of network resources, the abnormal behavior may not be caused by acker failure and message retransmission in malicious programs.
Based on the above analysis, we design an offline static detection scheme against acker failure in Storm, as shown in Fig. 5. If a topology's detection result is legal, the topology is processed in real time. The offline static detection against acker failure in Storm works as follows:
Fig. 5. The overview of offline static detection against acker failure in Storm
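As an illustrative heuristic (an assumption about the scheme, since the paper's exact detection rules are not shown here), the decompiled Java source of each Bolt can be scanned for the anchoring-without-acknowledgement pattern: a class extends BaseRichBolt, emits anchored tuples (two-argument emit), but never calls ack():

```python
import re

def looks_malicious(java_source):
    """Flag Bolts that anchor emitted tuples but never acknowledge them
    (illustrative rule, not the paper's exact rule set)."""
    is_rich_bolt = "extends BaseRichBolt" in java_source
    # emit(anchorTuple, values): a two-argument emit means anchoring
    anchors = re.search(r"\.emit\s*\(\s*\w+\s*,", java_source) is not None
    acks = ".ack(" in java_source
    return is_rich_bolt and anchors and not acks

bad_bolt = """
public class EvilBolt extends BaseRichBolt {
    public void execute(Tuple input) {
        collector.emit(input, new Values(input.getString(0)));
        // acknowledgement deliberately omitted
    }
}
"""
good_bolt = bad_bolt.replace("// acknowledgement deliberately omitted",
                             "collector.ack(input);")
assert looks_malicious(bad_bolt)
assert not looks_malicious(good_bolt)
```

Running such a scan offline, before the topology is submitted, keeps the check out of the real-time processing path.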
5.3 Performance
To analyze a topology, its executable .jar package is decompiled into Java source files. Then, through the ModelGoon plug-in, we draw the class
and function relationship graph. In this section we only test the decompilation result and obtain a usable relationship graph. We conduct the code detection offline, so it does not affect Storm's real-time performance.
6 Conclusion
Our work still needs further study to improve the detection of acker failure. First, we only consider static detection; Dromard et al. [9] and Wang et al. [17] proposed different anomaly detection methods that are worth drawing on to improve the current static detection scheme. Second, during the detection of malicious programs, the selected features are anchoring and acknowledgement; our next step is to select different features and use an SVM (Support Vector Machine) to detect anomalies, as in [10].
References
1. Apache Kafka. http://kafka.apache.org
2. Apache Storm. http://storm.apache.org
3. Spark Streaming. http://spark.apache.org/streaming
4. TimeStream. https://github.com/TimeStream/timestream
5. Alguliyev, R., Imamverdiyev, Y.: Big data: big promises for information security.
In: IEEE International Conference on Application of Information and Communi-
cation Technologies, pp. 1–4 (2014)
6. Aritsugi, M., Nagano, K.: Recovery processing for high availability stream pro-
cessing systems in local area networks. In: TENCON 2010–2010 IEEE Region 10
Conference, pp. 1036–1041 (2010)
7. Bertino, E., Ferrari, E.: Big data security and privacy. In: IEEE International
Congress on Big Data, pp. 757–761 (2015)
8. Dinh, T.T.A., Saxena, P., Chang, E.C., Ooi, B.C., Zhang, C.: M2R: enabling
stronger privacy in MapReduce computation (2015)
9. Dromard, J., Roudiere, G., Owezarski, P.: Online and scalable unsupervised net-
work anomaly detection method. IEEE Trans. Netw. Serv. Manag. PP(99), 1
(2017)
10. Khaokaew, Y., Anusas-Amornkul, T.: A performance comparison of feature selec-
tion techniques with SVM for network anomaly detection. In: International Sym-
posium on Computational and Business Intelligence, pp. 85–89 (2016)
11. Nagano, K., Itokawa, T., Kitasuka, T., Aritsugi, M.: Exploitation of backup nodes
for reducing recovery cost in high availability stream processing systems. In: Four-
teenth International Database Engineering & Applications Symposium, pp. 61–63
(2010)
12. Neumeyer, L., Robbins, B., Nair, A., Kesari, A.: S4: distributed stream computing
platform. In: IEEE International Conference on Data Mining Workshops, pp. 170–
177 (2011)
13. Ohrimenko, O., Costa, M., Fournet, C., Gkantsidis, C., Kohlweiss, M., Sharma,
D.: Observing and preventing leakage in MapReduce (2015)
14. Roy, I., Setty, S.T.V., Kilzer, A., Shmatikov, V., Witchel, E.: Airavat: security and
privacy for MapReduce. In: Usenix Symposium on Networked Systems Design and
Implementation, NSDI 2010, 28–30 April 2010, San Jose, CA, USA, pp. 297–312
(2010)
15. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzzi-
ness Knowl. Based Syst. 10(05), 557–570 (2002)
16. Takabi, H., Joshi, J.B.D., Ahn, G.J.: Security and privacy challenges in cloud
computing environments. IEEE Secur. Priv. 8(6), 24–31 (2010)
17. Wang, Z., Yang, J., Zhang, H., Li, C., Zhang, S., Wang, H.: Towards online anomaly
detection by combining multiple detection methods and storm. In: Network Oper-
ations and Management Symposium, pp. 804–807 (2016)
Pollution Attacks Identification in Structured
P2P Overlay Networks
1 Introduction
Structured P2P systems have grown increasingly popular in recent years as a means of communication, resource sharing, distributed computing and the development of collaborative applications. They provide a self-organizing architecture for large-scale applications, and have therefore been subjected to extensive analysis and careful design to ensure scalability and efficiency [1].
However, recent research [2] has focused on creating efficient search algorithms that can be used to build more complex systems, without considering how to deal with pollution attacks. These attacks occur when a polluter peer adds decoys to a data object (content pollution), alters the metadata (metadata pollution), or falsifies indexes (index poisoning), resulting in a wide range of polluted objects propagating through the system.
Pollution is one of the major issues affecting structured p2p networks. A study conducted in the KAD network to quantify content pollution showed that two thirds of the contents are polluted [18].
In this paper, our goal is to deal with pollution attacks in structured p2p systems using a supervision and detection process. The remainder of this paper is organized as follows. In the next section, we provide some background information about pollution attacks. Section 3 reviews related work. Section 4 details our contributions: we describe our proposed solution and its underlying ideas. In Sect. 5, we present details of the simulation steps and the measurements used to assess the effectiveness of our supervision and detection process. Section 6 concludes the paper and outlines further directions.
2 Pollution Attacks
Pollution attacks damage targeted objects and dispatch them in the network. In this way, contaminated objects are distributed through the sharing overlay, breaking the trust between users during object exchange.
An object is considered polluted if its content does not fit the description presented to the user. Pollution attacks can be classified into three categories: content pollution, metadata pollution and index poisoning.
Content pollution occurs when a malicious node adds decoys to a data object. The attacker can easily generate multiple false copies of objects that share the same content key by exploiting weaknesses of the hash functions used. In this way, the transmission quality decreases significantly [3].
Metadata pollution occurs when a polluter node alters the metadata of an object. Nodes that download objects based on this metadata will thus obtain corrupted ones [4], and may unintentionally store contaminated objects in their index tables. Metadata pollution is very similar to content pollution in terms of malicious intent: in both strategies, the polluter node tries to poison the content of the object to make it unusable, and uses its own resources to share contaminated objects in the overlay.
To find the location of desired objects, structured p2p systems use indexes. A polluter node tries to falsify these indexes by inserting massive numbers of false records. Consequently, when a user attempts to download an object under a randomly generated identifier, the sharing system fails to locate the associated object.
A polluter node typically tries to poison the indexes of the most popular objects. When other nodes download these objects, they obtain wrong or nonexistent ones, and then connect directly to the victim nodes. In this case, other nodes cannot obtain services from the victim nodes, because the poisoned downloads have occupied the allowed connections [5].
Index poisoning directly attacks the structure of the overlay. First, a polluter node can generate a random content key that does not point anywhere in the network. Moreover, it can generate multiple identities based on invalid IP addresses or unavailable port numbers, and publish keys that point to one of its camouflaged identities [6].
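As a concrete illustration, the two poisoned-record patterns just described (keys that point nowhere, and camouflaged identities behind invalid endpoints) can both be caught by a simple endpoint sanity check. The sketch below is ours, not from the paper; the function name and the `known_peers` set are assumptions:

```python
import ipaddress

def is_suspicious_record(ip, port, known_peers):
    """Return True if a published index record looks poisoned: a
    malformed or unroutable endpoint, an unavailable port number,
    or an endpoint no live peer is known to occupy."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return True            # malformed IP: likely a camouflaged identity
    if addr.is_unspecified or addr.is_multicast:
        return True            # address cannot receive unicast traffic
    if not 0 < port <= 65535:
        return True            # unavailable port number
    return (ip, port) not in known_peers   # key points nowhere in the overlay
```

In practice `known_peers` would be replaced by a liveness probe of the advertised endpoint, but the filtering logic is the same.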
676 Z. Trifa et al.
3 Related Work
Several research efforts have addressed pollution attacks in structured p2p overlay networks. In this section, we describe a range of mechanisms proposed to attenuate these attacks.
Feng and Dai [14] have proposed LIP, a ranking approach based on the lifetime and the popularity of objects; it relies on two detectors that filter log files to identify contaminated objects. Zhang et al. [15] have proposed InfoRanking, a mechanism that tries to mitigate pollution attacks by ranking content items. It is based on the observation that malicious peers provide numerous fake versions of the same information items in order to avoid blacklisting. Shin and Reeves [16] have proposed Winnowing, a distributed-hash-table-based anti-pollution scheme that aims to reduce the decoy index records held by DHT nodes in the system. Qi et al. [17] have proposed a reputation system combining peer reputation and object reputation. They calculate the reputation of shared objects from the reputation of the voting peers. Thus, an honest peer, who uploads unpolluted objects and actively votes on objects, obtains a higher reputation, while a malicious peer, who uploads polluted objects, sees its reputation reduced.
Collaborative Techniques. In these techniques, a user downloads objects from neighbors it trusts completely. If a user starts receiving contaminated objects from a trusted friend, it stops accepting objects and signals the presence of malicious users. These approaches allow users to locate their friends using an instant presence detection process [18].
3.4 Discussion
Pollution attacks remain one of the major challenges to overcome, especially in the context of structured p2p overlay networks. Unfortunately, reputation techniques have not been effective in preventing or reducing such attacks. This is due to the complexity of deploying such mechanisms in autonomous and complex systems, which are penalized when peers cast bad votes. Besides, peer reputation mechanisms only consider the reputation of object providers, while object reputation mechanisms only consider the reputation of shared objects. These mechanisms rely on identification-based approaches to malicious nodes. Their major drawbacks are the high computational cost of verifying all chunks and the communication overhead due to the number of messages exchanged between monitor nodes and the nodes participating in the system.
We note that none of the proposed solutions is applied to structured p2p overlay networks, and none is qualified to protect them in real time. In the next section, we propose a new monitoring tool that can detect and isolate polluter nodes in real time. Our system is based on identifying polluter nodes by monitoring the published messages.
4 Contributions
In this section, we present our approach to monitoring polluter nodes. It aims to detect a wide range of polluter nodes that provide polluted objects.
Infiltration Process. We place a monitor peer close to each suspicious polluter node spotted by the detection process, for its best exploration. This enables us to control all the published and queried messages. At the start of the infiltration process, the monitor node introduces itself into the overlay in the following two steps. First, it initializes the monitor node M and places it next to the target node in the ID space:
ID_M = min d(SP_i, M_j)    (1)
In the same way, M verifies the location of each object ID received, to guarantee that the IP address does not belong to the blacklist. Finally, M verifies the content ID of each object through verification of the packet information, to determine whether the object is polluted or not.
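Equation (1) selects, among candidate positions, the monitor identifier closest to the suspicious node on the identifier ring. A minimal sketch of this placement rule, assuming a Chord-style ring and a toy 16-bit identifier space (the function names are ours, not the paper's):

```python
RING_BITS = 16                 # illustrative; Chord uses 160-bit identifiers
RING_SIZE = 2 ** RING_BITS

def ring_distance(a, b, m=RING_SIZE):
    """Clockwise distance from identifier a to identifier b on the ring."""
    return (b - a) % m

def place_monitor(suspicious_id, candidate_ids):
    """Choose the candidate monitor identifier minimizing the distance
    d(SP_i, M_j) to the suspicious node, as in Eq. (1)."""
    return min(candidate_ids, key=lambda c: ring_distance(suspicious_id, c))
```

For example, `place_monitor(100, [500, 101, 60000])` picks 101, the identifier one step clockwise from the suspicious node.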
Algorithm 1. Monitoring publish message process
For each publish message (content, keyword):
    /* Publish content message verification */
    If (publish message = publish content):
        /* M stores (@IP, port number, object ID, content ID) */
        /* M verifies the content ID location */
        If (IP address of the destination ∈ list_suspicious_polluter_nodes):
            Call isolation process
        Else:
            Call publish keyword message verification
        End If
    Else:
        /* Publish keyword message verification */
        If (publish message = publish keyword):
            /* M stores (@IP sender, port number, keyword ID, list of object IDs) */
            For each object:
                /* M stores (@IP, port number, object ID, content ID) */
                /* M verifies the keyword ID location */
                If (IP address of the destination ∈ list_suspicious_polluter_nodes):
                    Call isolation process
                Else:
                    /* Verify the location of each object */
                    For each object ID:
                        If (IP address of the destination ∈ list_suspicious_polluter_nodes):
                            Call isolation process
                        Else:
                            /* M verifies the content key, searches for the content ID
                               and verifies the packet information */
                            If (the packet information does not match the packet
                                information in the database):
                                /* the object is considered polluted */
                                Call isolation process
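The control flow of Algorithm 1 can be condensed into a short executable sketch. This is our interpretation, not the authors' code; the message field names (`dest_ip`, `objects`, `content_id`, `packet_info`) are assumptions:

```python
def monitor_publish(msg, suspicious_ips, content_db):
    """Decide whether a publish message must trigger the isolation
    process, following Algorithm 1: blacklisted destinations and
    packet-information mismatches both lead to isolation.
    content_db maps a content ID to its expected packet information."""
    if msg["dest_ip"] in suspicious_ips:
        return "isolate"                       # destination is blacklisted
    for obj in msg.get("objects", []):
        if obj["dest_ip"] in suspicious_ips:
            return "isolate"                   # object location is blacklisted
        # compare the packet information against the monitor's database
        if content_db.get(obj["content_id"]) != obj["packet_info"]:
            return "isolate"                   # mismatch: object is polluted
    return "accept"
```

A publish-content message carries no object list, so only its destination is checked; a publish-keyword message is checked object by object, mirroring the nested loops of the algorithm.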
Second, each neighbor of S receiving the alert message performs three actions: it verifies the authenticity of the alert message; it marks S as a polluter node; and it stores the message in an alert buffer to prevent other nodes from accepting or forwarding any message from or to S until S is removed from the overlay. Finally, M proceeds to the isolation process: it redirects all messages destined to S to other nodes, drops all messages forwarded by S, and removes S from its neighbor list.
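The neighbor-side reaction and the isolation step described above can be sketched as follows; the class and message field names are illustrative, not from the paper:

```python
class NeighborState:
    """Sketch of a neighbor's reaction to an alert about polluter S."""
    def __init__(self, neighbors):
        self.neighbors = set(neighbors)
        self.alert_buffer = []
        self.polluters = set()

    def on_alert(self, alert, authentic):
        """Verify the alert, mark S as a polluter, buffer the alert,
        and remove S from the neighbor list."""
        if not authentic(alert):       # reject unauthenticated alerts
            return
        s = alert["polluter"]
        self.polluters.add(s)          # mark S as a polluter node
        self.alert_buffer.append(alert)
        self.neighbors.discard(s)      # remove S from the neighbor list

    def relay(self, msg):
        """Drop any message coming from or destined to an isolated node."""
        if msg["src"] in self.polluters or msg["dst"] in self.polluters:
            return None
        return msg
```

The `authentic` callback stands in for whatever signature scheme the alert messages use, which the paper does not specify.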
5 Evaluation
We considered a network with 500 nodes, 4 zones and 5 simulated hours. In the first step, each node joins the network. We assumed that 50% of the nodes are honest and 50% are polluters. After the stabilization phase, nodes perform random operations every 60 s, such as publishing and searching for objects. Honest nodes publish unpolluted objects, while polluter nodes publish polluted objects and claim to be the source of all requested objects. Figures 3 and 4 present the evolution of the number of successfully and futilely published objects.
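A toy version of this setup can be scripted in a few lines. The node count, honest/polluter split, duration and operation interval come from the paper; everything else (the publish/search split, the bookkeeping) is an assumption for illustration:

```python
import random

def simulate(n_nodes=500, honest_ratio=0.5, sim_hours=5, op_interval=60, seed=7):
    """Toy re-creation of the evaluation setup: 500 nodes, half honest,
    one random operation every 60 simulated seconds for 5 hours.
    Only published objects are counted; searches are ignored."""
    rng = random.Random(seed)
    roles = ["honest"] * int(n_nodes * honest_ratio)
    roles += ["polluter"] * (n_nodes - len(roles))
    counts = {"unpolluted": 0, "polluted": 0}
    for _ in range(sim_hours * 3600 // op_interval):   # 300 operation rounds
        node = rng.choice(roles)
        if rng.random() < 0.5:                          # publish; else search
            counts["unpolluted" if node == "honest" else "polluted"] += 1
    return counts

counts = simulate()
```

With a 50/50 split and random operations, polluted and unpolluted publications come out roughly balanced, which is the "no protection" baseline of Fig. 3.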
Fig. 3. The evolution of the number of successfully and futilely published objects (without any protection)
Fig. 4. The evolution of the number of successfully and futilely published objects (with protection)
Figure 3 depicts the evolution of the number of successfully and futilely published objects without any protection, while Fig. 4 displays the same numbers after activation of the protection process. In Fig. 3 we can notice that the number of successfully published objects is very high compared with the number of failed ones. Indeed, honest and polluter nodes publish objects in a random manner, and the lack of a monitoring and control mechanism explains the high number of polluted and unpolluted objects published successfully. Moreover, the malicious behavior of polluter nodes and the high complexity of building the routing tables explain the number of futilely published objects.
In Fig. 4, we can notice that the number of successfully published objects decreases remarkably, while the number of futilely published objects increases. This is due to the activation of the supervision and detection process. Finally, we note that the supervision curve shows many variations; this is due to the integration of the monitoring peers in the network and to the varying behavior of the malicious peers. Besides, the dynamic nature of these peers causes considerable change in the structure of the network.
Figure 5 presents the evolution of the number of monitored peers versus the evolution of the number of polluter peers when using a network with 4 zones. We can observe that the number of monitored peers rises exponentially, owing to the fact that the number of peers connected to our monitor peers increases over the duration of the experiment. We can also notice that the number of detected suspicious and polluter peers increases with the detection process. The high level of participation in the network makes polluter peers supervised and tracked by our tracking process.
Fig. 5. The evolution of the number of monitored peers vs. the evolution of the number of detected polluter peers
Finally, we present the evolution of the false negatives and false positives to assess the effectiveness of our monitoring process. Figure 6 depicts the evolution of the number of false negatives relative to the evolution of the number of suspicious peers. A false negative refers to a failure to detect a polluter peer that is present in the system. We can notice that the number of false negatives decreases significantly as the number of suspicious peers evolves.
Figure 7 shows the evolution of the number of false positives relative to the evolution of the number of detected peers. A false positive occurs when the detector peers mistakenly flag an honest peer as being malicious. We can notice that the number of false positives is very low.
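These two error types are the standard confusion-matrix rates. The helper below recomputes them from raw counts; the counts in the example are hypothetical, chosen only so that the detection level matches the roughly 92% reported for the experiments:

```python
def detection_rates(tp, fp, fn, tn):
    """Confusion-matrix rates for a polluter detector:
    tp = polluters correctly flagged, fp = honest peers wrongly flagged,
    fn = polluters missed, tn = honest peers correctly left alone."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # false positive rate
    fnr = fn / (fn + tp) if (fn + tp) else 0.0   # false negative rate
    detection = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, fnr, detection

# hypothetical counts for illustration: 100 polluters, 100 honest peers
fpr, fnr, detection = detection_rates(tp=92, fp=3, fn=8, tn=97)
```

Note that detection rate and false negative rate are complementary: detecting 92% of polluters is the same statement as an 8% false negative rate.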
In summary, the validation experiments show that our supervision and tracking process detects close to 92% of polluter nodes, which proves the effectiveness of our methodology.
6 Conclusion
In this paper, we presented pollution attacks in structured p2p overlay networks. We have shown that these attacks are among the major problems affecting such systems: they waste network resources and annoy users with contaminated objects, damaging the contents of target objects and dispatching them in the network so that contaminated objects are distributed through the sharing system. We have proposed a new monitoring process based on three steps. The first step identifies suspicious nodes. The second step supervises all messages of suspicious nodes and their neighbors in order to identify polluter nodes and to invoke the last step, the isolation process. Finally, we have implemented our methodology on the PeerfactSim.KOM simulator using the Chord protocol.
As future work, we plan to implement our solution on a real distributed hash table such as KAD, and to refine both the solution and the corresponding features in order to move further towards a secure overlay network.
References
1. Maurya, R.K., Pandey, S., Kumar, V.: A survey of peer-to-peer networks. J. Adv. Res.
Comput. Commun. Eng. (2016)
2. Liang, J., Kumar, R., Xi, Y., Ross, K.W.: Pollution in p2p file sharing systems. In:
Proceedings of IEEE INFOCOM, Miami, FL, March 2005
3. Chawla, S.: Content pollution in P2P system. J. Inf. Comput. Technol. 3(8), 841–844 (2013)
4. Chen, C.S., et al.: Application of fault-tolerant mechanism to reduce pollution attacks in
peer-to-peer networks. J. Distrib. Sensor Netw. 10(7), 792407 (2014)
5. Locher, T., Mysicka, D., Schmid, S., Wattenhofer, R.: Poisoning the Kad network. In: Kant,
K., Pemmaraju, S.V., Sivalingam, K.M., Wu, J. (eds.) ICDCN 2010. LNCS, vol. 5935,
pp. 195–206. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11322-2_22
6. Liang, J., Naoumov, N., Ross, K.W.: The index poisoning attack in p2p file sharing systems.
In: Proceedings of IEEE INFOCOM, April 2006
7. Shi, J., Liang, J., You, J.: Measurements and understanding of the KaZaA P2P network.
J. Current Trends High Perform. Comput. Appl. (2005)
8. Liang, J., Naoumov, N., Ross, K.W.: Efficient blacklisting and pollution-level estimation in
P2P file-sharing systems. In: Cho, K., Jacquet, P. (eds.) AINTEC 2005. LNCS, vol. 3837,
pp. 1–21. Springer, Heidelberg (2005). https://doi.org/10.1007/11599593_1
9. Kamvar, S.D., Schlosser, M.T., Garcia-Molina, H.: The eigentrust algorithm for reputation
management in p2p networks. In: Proceedings of the International Conference on WWW,
Budapest, Hungary, pp. 640–651 (2003)
10. Costa, C., Soares, V., Almeida, J., Almeida, V.: Fighting pollution dissemination in
peer-to-peer networks. In: Proceedings of the ACM SAC Conference, Seoul,
Korea, pp. 1586–1590 (2007)
11. Vieira, A.B., et al.: SimplyRep: a simple and effective reputation system to fight pollution
in P2P live streaming. J. Comput. Netw. 57(4), 1019–1036 (2013)
12. Meng, X.-F., Tan, J.: Field theory based anti-pollution strategy in P2P networks. In: Yu, Y.,
Yu, Z., Zhao, J. (eds.) CSEEE 2011. CCIS, vol. 159, pp. 107–111. Springer, Heidelberg
(2011). https://doi.org/10.1007/978-3-642-22691-5_19
13. Walsh, K., Sirer, E.G.: Fighting peer-to-peer SPAM and decoys with object reputation. In:
Proceedings of the International Conference on P2PECON, Philadelphia, August 2005
14. Feng, Q., Dai, Y.: LIP: a lifetime and popularity based ranking approach to filter out fake files
in p2p file sharing systems. In: Proceedings of the International Conference on IPTPS,
February 2007
15. Zhang, P., Fotiou, N., Helvik, B.E., Marias, G.F., Polyzos, G.C.: Analysis of the effect of
InfoRanking on content pollution in P2P systems. J. Secur. Commun. Netw. 7(4), 700–713
(2014)
16. Shin, K., Reeves, D.S.: Winnowing: protecting P2P systems against pollution through
cooperative index filtering. J. Netw. Comput. Appl. 31(1), 72–84 (2012)
17. Qi, M., Guo, Y., Yan, H.: A reputation system with anti-pollution mechanism in P2P file
sharing systems. J. Distrib. Sensor Netw. 5, 44–48 (2009)
18. Montassier, G., Cholez, T., Doyen, G., Khatoun, R., Chrisment, I., et al.: Content pollution
quantification in large P2P networks: a measurement study on KAD. In: Proceedings of 11th
IEEE International Conference on Peer-to-Peer Computing, Japan, August 2011
19. Wang, Q., Vu, L., Nahrstedt, K., Khurana, H.: Identifying malicious nodes in
network-coding-based peer-to-peer streaming networks. In: Proceedings of the International
Conference on IEEE INFOCOM (2010)
20. Gaeta, R., Grangetto, M.: Identification of malicious nodes in peer-to-peer streaming: a
belief propagation based technique. J. IEEE Trans. Parallel Distrib. Syst. 24(10), 1994–2003
(2013)
21. Gaeta, R., Grangetto, M., Bovio, L.: DIP: Distributed Identification of Polluters in P2P live
streaming. J. ACM Trans. Multimedia Comput. 10(3), 24 (2014)
22. Graffi, K.: PeerfactSim.KOM – a peer-to-peer system simulator: experiences and lessons
learned. In: Proceedings of IEEE International Conference on Peer-to-Peer Computing
(2011)
23. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable
peer-to-peer lookup service for Internet applications. In: Proceedings of ACM SIGCOMM,
San Diego, California (2001)