Data Security and Privacy Protection For Cloud Storage


SPECIAL SECTION ON EMERGING APPROACHES TO CYBER SECURITY

Received June 28, 2020, accepted July 12, 2020, date of publication July 16, 2020, date of current version July 29, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3009876

Data Security and Privacy Protection for Cloud Storage: A Survey
PAN YANG¹, NAIXUE XIONG² (Senior Member, IEEE), AND JINGLI REN¹
¹Henan Academy of Big Data, School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China
²Department of Mathematics and Computer Science, Northeastern State University, Tahlequah, OK 74464, USA

Corresponding author: Jingli Ren ([email protected])


This work was supported in part by the National Natural Science Foundation of China under Grant 11771407, in part by the Chinese
Academy of Engineering Advisory Project under Grant 2020-ZD-16, in part by the MOST Innovation Method Project under
Grant 2019IM050400, and in part by the Key Discipline Construction Projects of Zhengzhou University under Grant XKZDQY202004.

ABSTRACT New development trends, including the Internet of Things (IoT), smart cities, enterprise digital transformation and the world's digital economy, are at the crest of the tide. The massive data being generated puts continuously growing pressure on data storage, which drives the rapid development of the entire storage market. By providing data storage and management, cloud storage systems have become an indispensable part of the new era. Currently, governments, enterprises and individual users are actively migrating their data to the cloud. Such a huge amount of data can create enormous wealth. However, it also increases risks such as unauthorized access, data leakage, sensitive information disclosure and privacy disclosure. Although there are some studies on data security and privacy protection, there is still a lack of systematic surveys on the subject for cloud storage systems. In this paper, we make a comprehensive review of the literature on data security and privacy issues, data encryption technology, and applicable countermeasures in cloud storage systems. Specifically, we first give an overview of cloud storage, including its definition, classification, architecture and applications. Secondly, we give a detailed analysis of the challenges and requirements of data security and privacy protection in cloud storage systems. Thirdly, data encryption technologies and protection methods are summarized. Finally, we discuss several open research topics of data security for cloud storage.

INDEX TERMS Cloud storage, data security, cryptography, access control, privacy protection.

I. INTRODUCTION
With the rise of the Internet of Things (IoT), the number of information sensing devices connected to the Internet is increasing, realizing the interconnection among people, devices and “things”. A forecast by IDC [80] estimates that there will be 41.6 billion IoT devices, or “things”, in 2025, generating 79.4 zettabytes (ZB) of data. Moreover, researchers remain committed to improving the efficiency of data collection by IoT devices; see [59], [79]. This unprecedented amount of data is generated and hosted on cloud service provider platforms [78]. Owing to the high-performance, scalable and reliable datacenters of the cloud, many smart city applications and services will be hosted in the cloud. Therefore, smart city residents and service providers can rely on cloud services to host, build and/or deploy their smart city services and applications [39]. Besides, the pay-as-you-go model leads most traditional enterprises to actively migrate data to the cloud. The cloud is not only the destination of workloads but also provides efficient operation practices, which give enterprises higher agility and flexibility. This has promoted both enterprise digital transformation and network modernization [19]. In 2019, the Digital Economy Report released by the United Nations emphasized that the digital economy is becoming an important driving force for economic development. According to incomplete statistics, the digital economy accounts for 4.5% to 15.5% of world GDP [25]. Cloud computing is conducive to promoting the deep integration of the Internet, big data, artificial intelligence and the real economy, and is at the core of accelerating the construction of a modern economic system. According to Gartner, Inc. [34], the worldwide public cloud service market will grow by 17% in 2020, reaching $266.4 billion, up from $227.8 billion in 2019. Taken together, cloud applications remain the mainstream.

The associate editor coordinating the review of this manuscript and approving it for publication was Luis Javier Garcia Villalba.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 131723
P. Yang et al.: Data Security and Privacy Protection for Cloud Storage: A Survey

Cloud storage is essentially a cloud computing system that allows users to store and share data on the Internet. Its advantages include practically unlimited data storage space; convenient, safe and efficient file access; offsite backup; and low cost of use. In practical applications, cloud storage can be divided into five categories: public, personal, private, hybrid and community cloud storage. In a public cloud, enterprises outsource their data storage to cloud storage providers (for instance, AWS and Alibaba Cloud) without having to deploy infrastructure or maintain servers. The data can be accessed only by authorized users. The advantages of the public cloud, such as flexibility, scalability and cost savings, attract many small and medium enterprises. Personal cloud, also known as mobile cloud storage, is essentially a branch of the public cloud; the difference is that it provides public cloud storage services to individual users. In a private cloud, enterprises need to deploy the cloud storage infrastructure themselves and arrange professional staff to manage and maintain servers. This gives the private cloud higher security than the public cloud and keeps control of the data in the hands of the enterprise itself, but the cost increases dramatically. This storage model is more suitable for large enterprises with large amounts of valuable and sensitive data. Hybrid cloud is a combination of public cloud and private cloud that combines the advantages of both: enterprises can store valuable and sensitive data in the private cloud and other data in the public cloud. The appeal of this storage model continues to grow. Community cloud, a cloud storage mode that has emerged in recent years, is well suited to the medical and financial industries. It provides cloud services for several businesses in a specific community. Usually these businesses have the same concerns or need to work together on some projects. Infrastructure construction and server management can be jointly undertaken by the community cloud members or outsourced to a third party.

From the perspective of storage architecture, the major cloud platforms typically offer three broad classes of storage: block storage, file storage and object storage [47]. 1) Cloud block storage, represented by Storage Area Networks (SAN), in essence provides a virtualized SAN with logical volume management, provisioned via a simplified web services interface. 2) File storage, also referred to as file-level or file-based storage, is normally associated with Network Attached Storage (NAS) technology [73]. Through the file system, file storage manages data sharing and access to the stored data more flexibly than block storage. Massive data brings a series of challenges to enterprises, such as storage expansion, data sharing, efficient transmission, cost and data security. When data storage reaches the PB level, the limitations of NAS and SAN directly increase later equipment maintenance costs, and they cannot fully meet enterprises' requirements for the reliability, availability, security and other indicators of mass data storage, which makes object storage more important. 3) Object storage, such as AWS S3, is optimized for storing large volumes of unstructured data.

Cloud storage is based on virtualization infrastructure and is similar to cloud computing in terms of accessible interfaces, scalability and metered resources. It consists of four layers [116], which can be summarized as follows: 1) The storage layer, the basic part of cloud storage, is made up of storage devices and unified storage device management. 2) The primary management layer is the core part of cloud storage, and also the most challenging part. 3) The application interface layer is the most flexible part of cloud storage. 4) The last one is the access layer. From this point of view, cloud storage supplies data access services including data storage, data computation, authentication, and access control. Owing to these characteristics of cloud storage, data security and privacy issues inevitably arise in this process. The requirements of data security in cloud storage mainly concern the following aspects [8], [61], [93], [94], [108]:
• Data Confidentiality: Data confidentiality means preventing active attacks by unauthorized parties on users' data, and ensuring that the information received by the data receiver is completely consistent with the information sent by the sender. That is to say, only authorized people are entitled to access and obtain the data. Imagine your bank account. You should be able to access it, of course, and employees at the bank who are helping you with a transaction should be able to access it, but no one else should. Once the data is accessed by others, confidentiality is compromised, and this is irreversible.
• Data Integrity: Data integrity is the reliability of the data, that is, the data cannot be arbitrarily tampered with or replaced. For example, when you shop online on Amazon, integrity is violated if someone can change the items in your cart without your authorization. The absence of data integrity can pose serious security issues.
• Data Availability: Data availability emphasizes that data can be accessed normally at any time; namely, users can access, download, or modify data in the cloud as soon as they need it.
• Fine-Grained Access Control.
• Secure Data Sharing in Dynamic Group.
• Leakage-Resistant.
• Complete Data Deletion: When users no longer use cloud storage, they can completely delete the data outsourced to the cloud server and confirm that the data has been completely destroyed, instead of being cheated by malicious cloud service providers.
• Privacy Protection: While users enjoy the convenience of cloud storage, the cloud storage providers capture their private information, such as personal identity, location, and data that is sensitive for the enterprise. Privacy security mechanisms are used to guarantee that these data remain secret against curious adversaries and malicious employees of cloud service providers.
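As a minimal sketch of the data integrity requirement listed above, a data owner can keep a keyed digest of a record before outsourcing it and recompute the digest after download. The example uses HMAC-SHA-256 from the Python standard library; the key, the record contents and the function names `make_tag` and `verify_tag` are hypothetical and chosen only for illustration. It detects modification; it does not by itself provide confidentiality.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, data: bytes) -> bytes:
    """Return an HMAC-SHA-256 tag that the owner keeps locally before upload."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_tag(key: bytes, data: bytes, tag: bytes) -> bool:
    """Recompute the tag over downloaded data and compare in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)


key = secrets.token_bytes(32)           # hypothetical owner-side secret key
original = b"cart: 1 x textbook"        # hypothetical outsourced record
tag = make_tag(key, original)

tampered = b"cart: 9 x textbook"        # what a malicious party might return
print(verify_tag(key, original, tag))   # unchanged data passes the check
print(verify_tag(key, tampered, tag))   # modified data fails the check
```

Because the tag depends on a secret key, a party that alters the stored record cannot produce a matching tag without that key.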


With the further centralization of data and the increase of data volume, it becomes difficult to secure data in cloud storage. Therefore, how to ensure that users and their information resources are not exposed will be a major concern of cloud service providers and scholars for a long time. However, the existing information security methods no longer meet the information security requirements of the big data era, and security threats will gradually become the bottleneck restricting the development of big data technology. In fact, data storage security in cloud storage includes static data security and dynamic data security. Static data security ensures the security of data at rest on the cloud storage system, while dynamic data security ensures integrity and confidentiality during data transmission. Data is transmitted through the IP network in cloud storage, so security threats on the traditional network, such as data destruction, data theft, data tampering and denial of service, also exist in the cloud storage system and affect the safe storage of data. In a cloud storage system, users' data may be distributed across multiple servers, and each server may be shared by multiple users, which increases the risk of unauthorized access. Complex encryption algorithms are not friendly to resource-limited users, so ensuring that such algorithms can run on users' own devices is a practical problem. In addition, users' devices are very likely to be subjected to side-channel attacks. In summary, data security and privacy preservation in cloud storage systems mainly face the following challenges:
• Fine-grained data access control.
• Malicious cloud service providers may return incorrect integrity audit results.
• Side channel attack.
• Malicious cloud service providers do not comply with customers' requests to completely delete data in the cloud.
• Privacy-preserving.

Although cloud storage has been developed for many years, it is still very important in the Internet of Things, smart cities and the digital economy. Data security and privacy protection in cloud storage are still of great importance, which inspires us to present this review. We make a comprehensive review of the literature on data security and privacy issues, data encryption technology, and applicable countermeasures in cloud storage systems. The main contributions of this paper are as follows:
• We first give an overview of cloud storage, including its definition, classification, architecture and applications.
• We give a detailed analysis of data security and privacy issues and mechanisms in cloud storage systems.
• Data encryption technologies and protection methods are summarized. These correspond to the security requirements we mentioned earlier.
• We discuss several open research topics of data security for cloud storage.

The remainder of this article is organized as follows. Section II and Section III present the cryptography-based techniques and the state of the art in data security and privacy preservation, respectively. In Section IV, we discuss open research directions for cloud storage. Finally, we draw our conclusion in Section V.

II. DATA ENCRYPTION TECHNOLOGY
When data is outsourced to the cloud, its security is vulnerable. Encryption is an effective technique to protect data security. The essence of data encryption is to transform the original plaintext file or data, by means of some algorithm, into a string of unreadable code, which is usually called ciphertext. Even if someone intercepts the ciphertext, he/she cannot use it to recover the original content, which effectively protects the confidentiality of the data and prevents the data from being tampered with. Users who are authorized to access the data can decrypt the file with the corresponding private key, and then update or modify it. Encryption is divided into symmetric encryption and asymmetric encryption. Symmetric encryption uses one secret key to both encrypt and decrypt data. However, before using symmetric encryption, users need to agree on a shared key, which is very inconvenient when multiple users share files. By comparison, asymmetric encryption, also known as public key encryption, is more convenient. Public key encryption uses a pair of keys: the public key can be disclosed to others for encrypting files, while the private key is used for decrypting the ciphertext. In this section, we present some encryption technologies that are widely applied in cloud storage systems.

A. IBE: IDENTITY-BASED ENCRYPTION
In the traditional PKI (Public Key Infrastructure), in order to confirm that the identity information is consistent with the public key used for encryption, the sender needs to authenticate the identity information of the receiver through a trusted third-party Certificate Authority (CA) before encrypting a file with the public key. This process may significantly increase the sender's workload when he wants to share data with multiple receivers. To solve this problem, the concept of identity-based cryptography was proposed by Shamir [68] in 1984. The idea is to associate the user's identity information with the public key, so that there is no need to verify the receiver's certificate before encryption. In 2001, Boneh and Franklin [12] formally gave the definition and security model of Identity-Based Encryption (IBE), and applied a bilinear map to construct a secure IBE scheme in their seminal paper. In such a system, Alice is a sender who wants to send an encrypted message to Bob. A Private Key Generator (PKG), a trusted third party, is required to generate the corresponding public key and private key. First, in order to encrypt the message, Alice uses the receiver's unique identity information (Bob's e-mail: [email protected]) to generate the public key from PKG. Then Alice sends the encrypted message to Bob. The receiver Bob contacts the PKG and authenticates himself to obtain the corresponding private key. Fig. 1 shows how identity-based encryption works. Soon afterwards, many scholars improved IBE.
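The IBE flow above (PKG setup, key extraction for an identity, encryption to an identity, decryption) can be written compactly. The following is a sketch of the basic pairing-based construction as it is commonly presented for the Boneh and Franklin scheme [12]; the symbols $P$, $s$, $P_{pub}$, $H_1$, $H_2$, $r$ and the group notation are introduced here for illustration and do not appear in the text above.

```latex
\begin{aligned}
\textbf{Setup:}\quad & e:\mathbb{G}_1\times\mathbb{G}_1\to\mathbb{G}_2,\quad P\in\mathbb{G}_1,\quad s\in\mathbb{Z}_q^{*}\ \text{(master key)},\quad P_{pub}=sP\\
\textbf{Extract:}\quad & Q_{ID}=H_1(ID),\qquad d_{ID}=s\,Q_{ID}\\
\textbf{Encrypt:}\quad & r\in\mathbb{Z}_q^{*},\qquad C=(U,V)=\bigl(rP,\ M\oplus H_2\bigl(e(Q_{ID},P_{pub})^{r}\bigr)\bigr)\\
\textbf{Decrypt:}\quad & M=V\oplus H_2\bigl(e(d_{ID},U)\bigr)
\end{aligned}
```

Decryption recovers $M$ because bilinearity gives $e(d_{ID},U)=e(sQ_{ID},rP)=e(Q_{ID},P)^{sr}=e(Q_{ID},P_{pub})^{r}$. Under these assumptions, the sender needs only the receiver's identity string and the public parameters $(P, P_{pub})$ to encrypt, while the private key $d_{ID}$ can be issued only by the PKG, which holds $s$.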


FIGURE 1. Identity-based encryption.

Boneh and Boyen [10] obtained the chosen security of an IBE system under the standard model, and fully secure IBE schemes were studied in [11], [37], [85].

The revocation algorithm of a revocable IBE usually takes the public parameter PP, user ID, revocation list RL, revocation time t and state st as input, and outputs the updated revocation list. See Algorithm 1.

Algorithm 1 Revoke
Input: PP, ID, RL, t, st
Output: The updated RL
1: RL ← RL ∪ {(ID, t)}
2: return RL

Reference [12] proposed the first IBE scheme with revocation of public keys. By defining the public key as “ID + validity period”, the receiver is allowed to use the private key to decrypt only within a certain period. After the validity period has expired, the receiver needs to apply to the PKG for an updated private key to obtain decryption permission again. Once someone's public key is revoked, the PKG will not update the private key for him or her. No matter how many times the private key is updated, only the receiver needs to interact with the PKG, while the sender does not. This scheme greatly improves the practicality of identity-based encryption. In 2015, Li et al. [51] improved the result of [12] by introducing outsourced computation into IBE revocation, and gave the security definition of outsourced revocable IBE for the first time. In this scheme, the PKG no longer undertakes the task of key update, except for sending a private key for decryption to the user at the beginning. This private key contains an identity component $IK[ID]$ and a time component $TK[ID]_{T_i}$, where $T_i$ means that $TK[ID]_{T_i}$ is valid during the period $T_i$. The Key Update Cloud Service Provider (KU-CSP) is responsible for updating the time components for users who are not revoked. KU-CSP stops updating $T_i$ for a revoked user as soon as he/she submits a revocation application to the PKG. Boldyreva et al. [9] used binary trees to manage identities for efficient revocation.

When a user's identity is revoked, the data owner usually updates the ciphertext to ensure that the user can no longer access either the previously available data or the subsequently shared data. This involves a decryption, re-encryption and upload process, which not only increases the exposure of the private key but also increases the computing cost and time cost for the data owner. To solve this problem, Wei et al. [90] defined a revocable-storage IBE that provides both “forward security” and “backward security” and can also resist private key exposure. In this scheme, each ID is randomly assigned to a leaf node. An unrevoked user has a node $\theta \in \mathrm{Path}(\eta) \cap \mathrm{KUNodes}(BT, RL, T)$ in a certain period $T$, which allows the user to obtain the decryption key by re-randomizing the private key $(\theta, SK_{ID,\theta})$ and the update key $(\theta, KU_{T,\theta})$, while a revoked user cannot obtain the decryption key without $\theta$. Lee [50] found that when a ciphertext is updated from period $T$ to period $T+1$, its plaintext cannot be recovered with the decryption key at time $T+1$, and improved the scheme with the method in [49].

B. ABE: ATTRIBUTE-BASED ENCRYPTION
In an identity-based encryption scheme, an identity is a meaningful string that is unique to each user. However, the flexibility of IBE runs into bottlenecks when the ciphertext is to be legally accessed by multiple users. In 2005, Sahai and Waters [67] proposed fuzzy identity-based encryption for the first time, which is the origin of attribute-based encryption (ABE). Different from identity-based encryption, in attribute-based encryption the identity is replaced by a set of attributes, and only users whose attribute set matches the access policy can access the encrypted data. Generally, an ABE algorithm consists of four parts:
1) Setup phase, also known as the system initialization phase, in which the pertinent security parameters are input and the corresponding public parameters (PK) and master key (MK) are generated;
2) KeyGen stage, namely the key generation stage, in which data owners submit their own attributes to the system to obtain the private key associated with the attributes;
3) Encryption phase, in which the data owner encrypts the data with his/her public key, obtains the ciphertext (CT) and sends it to the receiver or to the public cloud;
4) Decryption phase, in which decryption users obtain the ciphertext and decrypt it with their own private key SK.

ABE is promising for providing fine-grained access control over encrypted files in data sharing applications, in that the data owner can specify who can access the encrypted data. It is mainly divided into two categories: Key-Policy Attribute-Based Encryption (KP-ABE) and Ciphertext-Policy Attribute-Based Encryption (CP-ABE).

In 2006, Goyal and Pandey [40] developed KP-ABE. In the KP-ABE system, each ciphertext is associated with a set of attributes, while the user's private key is related to an access policy over the attributes. For instance, C1 is a ciphertext encrypted by a set of attributes (“Student”, “Applied Mathematics”) (see Fig. 2). The access policy of


user 1 is “(‘Department of Mathematics’) OR (‘Student’ AND ‘Applied Mathematics’)”. Obviously, the attributes contained in the ciphertext C1 satisfy the access policy of user 1, so he has the privilege to decrypt C1. User 2, in contrast, can decrypt ciphertexts with attributes (“Department of Mathematics”, “Student”) OR (“Department of Mathematics”, “Basic Mathematics”), but not C1. In the same way, user 3 cannot decrypt C1 either.

FIGURE 2. KP-ABE in cloud.

In 2007, Bethencourt et al. [7] provided the first construction of CP-ABE. In CP-ABE, the policy is embedded in the ciphertext, and the data owner can define the access policy to determine which attributes a person must hold to access the ciphertext. A user's private key is related to the corresponding set of attributes. From a mathematical point of view, an access structure can be seen as a monotonic “access tree”, whose internal nodes are threshold gates and whose leaves describe attributes. For example, a sensitive file is encrypted under the access policy “(‘President’) OR (‘Student’ AND ‘Department of Mathematics’) OR (‘Professor’)”, which implies that only someone with attributes (“President”) or (“Student”, “Department of Mathematics”) or (“Professor”) can access the file (see Fig. 3). Cheung and Newport [21] presented an improved scheme based on [7], which is proved to be CPA secure and CCA secure under the Decisional Bilinear Diffie-Hellman (DBDH) assumption.

FIGURE 3. CP-ABE in cloud.

The attributes of a user may change for various reasons; for instance, one may transfer from one job to another. A change of attributes means that one may no longer be qualified to access data for which one was previously authorized. In addition, malicious behavior by some authorized users (such as colluding with hackers) may compromise the confidentiality and privacy of the data, causing losses for the data owner. Therefore, secure revocation in ABE is necessary. Existing revocation schemes can be divided into indirect revocation (see [3], [9], [58], [98]) and direct revocation (see [71], [107]). In indirect schemes, the trusted authority periodically interacts with non-revoked users and updates the decryption key for them, while a revoked user's decryption key becomes invalid. This implies an indirect revocation. Xu et al. [98] drew on the idea of revocation in [9], [67]. Namely, the decryption key consists of two parts, a long-term secret key and an update key, and the update key needs to be updated regularly. The difference is that the attribute set is divided into two disjoint sets, each of which is combined with the master key to generate a secret key. The two secret keys are different and have the property of re-randomization, so that decryption key exposure resistance can be achieved. Besides, a tree-based data structure is introduced to reduce the computational burden on the key generation centre.

On the other hand, in direct revocation schemes, the trusted authority generates a revocation list including all revoked users, which is public to every user. The data owner specifies the revoked users directly in the ciphertext so that none of the listed users can decrypt this ciphertext, even if their attributes (or access policies) match the access policy (or attribute set) embedded in the ciphertext. Shi et al. [71] presented a KP-ABE scheme with direct revocation and verifiable ciphertext delegation. In their scheme, the trusted authority revokes users by updating the revocation list, without any interaction with non-revoked users. After receiving the new revocation list, the third party (such as the cloud service provider) updates the ciphertext using public information, which ensures that the new ciphertext cannot be decrypted by revoked users. Finally, any authorized auditor has the privilege to verify whether the third party has updated the ciphertext correctly. This scheme not only prevents revoked users from decrypting the new ciphertext, but also provides a verification function for data owners to ensure that the ciphertext has been updated under the new revocation list. In 2016, Ma et al. [60] improved [71]. With the technique from [64], they achieve a large universe construction, where the size of the attribute universe is not limited and can be exponentially large, and new attributes can be added to the system. Xiong et al. [96] proposed a CP-ABE scheme that combines direct revocation, partially hidden policy and outsourced decryption.
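The access-tree matching and the direct revocation check described above can be sketched in a few lines. This is an illustration of the policy logic only, under the assumption that OR and AND are modelled as 1-of-n and n-of-n threshold gates; it performs no cryptography (in a real ABE scheme the match is enforced by how keys and ciphertexts are constructed). The names `Gate`, `satisfies`, `can_decrypt` and the user IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple, Union


@dataclass(frozen=True)
class Gate:
    """k-of-n threshold gate: OR is k = 1, AND is k = number of children."""
    k: int
    children: Tuple["Node", ...]


Node = Union[str, Gate]  # a leaf is simply an attribute name


def OR(*children: Node) -> Gate:
    return Gate(1, children)


def AND(*children: Node) -> Gate:
    return Gate(len(children), children)


def satisfies(node: Node, attrs: FrozenSet[str]) -> bool:
    """Return True if the attribute set satisfies the monotone access tree."""
    if isinstance(node, str):
        return node in attrs
    return sum(satisfies(child, attrs) for child in node.children) >= node.k


def can_decrypt(policy: Node, attrs: FrozenSet[str],
                user_id: str, revoked: Set[str]) -> bool:
    """Direct revocation: a listed user is rejected even if the policy matches."""
    return user_id not in revoked and satisfies(policy, attrs)


# KP-ABE example from the text: the policy sits in user 1's key,
# the attribute set sits in ciphertext C1.
user1_policy = OR("Department of Mathematics", AND("Student", "Applied Mathematics"))
c1_attrs = frozenset({"Student", "Applied Mathematics"})
print(satisfies(user1_policy, c1_attrs))

# CP-ABE example from the text: the policy sits in the ciphertext.
file_policy = OR("President", AND("Student", "Department of Mathematics"), "Professor")
revoked = {"u7"}  # hypothetical public revocation list
print(can_decrypt(file_policy, frozenset({"Professor"}), "u2", revoked))
print(can_decrypt(file_policy, frozenset({"Professor"}), "u7", revoked))
print(can_decrypt(file_policy, frozenset({"Student"}), "u3", revoked))
```

The same `satisfies` function serves both variants; only the side that carries the policy and the side that carries the attribute set are swapped between KP-ABE and CP-ABE.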


In general, key revocation alone does not prevent users from using an old private key to decrypt previously accessible ciphertext. To restrict illegal access by revoked users, the data owner has to update the access policy or re-encrypt the ciphertext. When data is dynamically shared among many people, this approach is obviously inefficient. To solve this problem, the concept of revocable storage was proposed, which supports both key revocation and ciphertext update. In 2012, Sahai et al. [66] presented a practical revocable-storage attribute-based encryption, where the database regularly updates the stored ciphertext with the available public information, and any revoked user loses access privileges after the ciphertext is updated. Recently, Wei et al. [89] considered secure sharing and dynamic access revocation of electronic health record (EHR) data in the public cloud. Both forward security and backward security [90] are obtained simultaneously.

In the existing ABE schemes, a large number of attributes leads to large access policies, and the ciphertext size of most ABE schemes increases with the complexity of the access policy. As a result, ciphertext redundancy increases significantly, which not only causes expensive computation when users have to decrypt the ciphertext on a local device, but also increases users' workload. This is especially unfriendly for resource-constrained users. To solve this problem, many ABE schemes have been proposed to reduce the burden on resource-constrained users, for example by outsourcing computation to cloud service providers [45], [53], designing ciphertexts of constant size, compacting the policy [83] and improving policy management [87]. More concretely, Li et al. [53] presented an outsourced KP-ABE scheme with efficient query processing, which implements outsourced key-issuing and outsourced decryption. The data owner uploads the ciphertext with a keyword set to the storage cloud service provider. Users submit a trapdoor for a keyword such as “book” to the cloud service provider to request a keyword search. After receiving the client's request, the cloud service provider immediately performs partial decryption and keyword search on the ciphertext, and returns the matching results to the user. Outsourced decryption enables users to save a lot of computing resources while maintaining the confidentiality of data. Using a trapdoor instead of the plaintext keyword for query processing prevents the cloud service provider from using cookie records to pry into users' privacy and preferences. Wang et al. [84] compact the access policy through a greedy compacting algorithm, so that ciphertext redundancy is reduced as the policy scale decreases. Multiple users share the public policy nodes. By introducing a flexible factor and an overlap factor, the policy-computing efficiency and the compact ratio are analyzed. Policy compacting fundamentally addresses the ciphertext redundancy caused by large policies, which is of great significance for improving the performance of ABE schemes. In order to improve the scalability of CP-ABE, Wang et al. [83] designed a scalable access policy based on the idea of a blocked linear secret sharing scheme (BLSSS), which has lower storage, computation and communication overhead. A comparison of the ABE schemes mentioned above is shown in Table 1.

C. HOMOMORPHIC ENCRYPTION
Although the identity-based encryption and attribute-based encryption introduced earlier can guarantee the confidentiality of data in the cloud to a certain extent, they have some drawbacks. If a user needs to update his encrypted files stored in the cloud, he has two options. One is to modify the ciphertext in the cloud. However, after the modified ciphertext is decrypted, it will usually become meaningless garbled code, which damages the data. The other is to update the decrypted file and send the newly encrypted file to the cloud. This is complex and cumbersome. If the file contains a large amount of data, the process of downloading, decrypting and encrypting not only takes a lot of time, but also places a high demand on the computing power of the user's local device. In addition, the transmission from the local device to the cloud also brings the risk of data leakage. Homomorphic encryption shows great superiority in solving this problem. Homomorphic encryption is a kind of public key encryption that allows users to perform certain algebraic operations on ciphertext and still obtain an encrypted result, such that the result after decryption is consistent with the result of the same operation on the plaintext. With Fig. 4 and the table, it is easier to understand how homomorphic encryption works in the cloud. The data owner encrypts the file with homomorphic encryption and sends it to the cloud server. Authorized users can decrypt the ciphertext with the corresponding private keys. If user 2 wants to perform some specific operations on the ciphertext, the only thing he needs to do is send the functions corresponding to the operations to the cloud server. The server obtains the operands, performs the operation without decrypting the ciphertext, and returns the encrypted result to user 2. Homomorphic encryption effectively protects the security of outsourced data.

FIGURE 4. Homomorphic encryption in cloud.

From the point of view of mathematics, homomorphic encryption embodies the concept of homomorphism [32].

131728 VOLUME 8, 2020


P. Yang et al.: Data Security and Privacy Protection for Cloud Storage: A Survey

TABLE 1. Comparison of ABE schemes.

map between sets $A$ and $A^{*}$ with the composition operations $\circ$ and $\bullet$, respectively. Let $a, b, c \in A$ with $c = a \circ b$, and let $a^{*} = f(a)$, $b^{*} = f(b)$, $c^{*} = f(c) \in A^{*}$. Under these assumptions, $f(a \circ b) = f(a) \bullet f(b)$. Suppose that the homomorphism $f(\cdot)$ is a one-to-one mapping that represents the encryption procedure, and that $A$ is the data set consisting of our data stored in the cloud. Then $f^{-1}$, the inverse of $f$ with $a = f^{-1}(a^{*})$, $b = f^{-1}(b^{*})$, $c = f^{-1}(c^{*})$, is the decryption procedure, and the composition operations are the specific types of computation carried out on ciphertext. The working principle of homomorphic encryption is shown in Table 2.

According to the computing power on ciphertext, homomorphic encryption can be divided into three categories: Partially Homomorphic Encryption (PHE, also known as semi-homomorphic encryption), Somewhat Homomorphic Encryption (SHE) and Fully Homomorphic Encryption (FHE).

PHE means that only one type of operation can be performed on ciphertext, either addition or multiplication, but not both. To support additive homomorphism on ciphertext, a classical additive homomorphic encryption scheme was proposed by Paillier [63]. A fast decryption scheme based on Paillier homomorphic encryption was presented by El Makkaoui et al. [30]. The unique feature of


TABLE 2. Mapping representation of homomorphic encryption.
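As a concrete instance of the mapping summarized in Table 2, the following minimal Python sketch implements a toy version of Paillier’s additive scheme [63] and checks that multiplying two ciphertexts decrypts to the sum of the plaintexts, that is, $f(a \circ b) = f(a) \bullet f(b)$ with $\circ$ taken as addition modulo $n$ and $\bullet$ as multiplication modulo $n^2$. The function names (keygen, encrypt, decrypt, add_encrypted), the choice g = n + 1 and the tiny primes are illustrative assumptions and are not taken from the surveyed papers; a real deployment would use much larger primes and a vetted library. Python 3.8 or later is assumed for the modular inverse via pow.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1              # simplifying choice (assumption of this sketch)
    mu = pow(lam, -1, n)   # inverse of lam modulo n; correct because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pk, m: int) -> int:
    """Encrypt integer m (0 <= m < n): c = g^m * r^n mod n^2."""
    n, g = pk
    n2 = n * n
    while True:            # draw random r in [1, n) that is coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pk, sk, c: int) -> int:
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, _ = pk
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(pk, c1: int, c2: int) -> int:
    """Homomorphic addition: multiply the ciphertexts modulo n^2."""
    n, _ = pk
    return (c1 * c2) % (n * n)


# The "server" adds two values without ever seeing them in the clear.
pk, sk = keygen(1009, 1013)
a, b = 1234, 4321
c_sum = add_encrypted(pk, encrypt(pk, a), encrypt(pk, b))
print(decrypt(pk, sk, c_sum) == a + b)
```

In this sketch only the holder of sk can read the result, while add_encrypted needs nothing but the public key, which mirrors the role of the cloud server in Fig. 4.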

this scheme is that the private key is used for encrypting and decrypting files, and the evaluation key is used for performing computation (additive homomorphism) on the encrypted files. For multiplicative homomorphism, [31] gives an ElGamal homomorphic model. An additive homomorphic encryption model based on elliptic curve encryption with ElGamal has also been described. The interesting thing about this model is that it does not encrypt the plaintext directly. Instead, the plaintext is first converted to an integer, then mapped by an encoding function to points on an elliptic curve, and finally the points are encrypted. When decrypting, the encrypted points are first converted back to an integer, and then the corresponding plaintext is calculated.

An SHE scheme supports both addition and multiplication, although the number of multiplications that can be performed is limited. Most SHE schemes can perform mixed addition and multiplication on data encrypted under the same public key. Zhang [112] presented an SHE scheme that allows multiple users to cooperate on data encrypted under their respective public keys. Since different users encrypt their data with different public keys, it is not feasible to operate on the ciphertexts directly. Therefore, the ciphertexts must first be re-encrypted in the same way. Addition and multiplication can then be performed on the re-encrypted ciphertexts, and each user involved can decrypt the computed result using their own private key, which corresponds to the public key used for the first-level encryption. Quantum cryptography was introduced into an SHE scheme in [75] to obtain unconditional security and efficient query on ciphertext, and the proposed scheme is a symmetric encryption scheme. Multi-user training of machine learning models on encrypted data has also been studied in recent years. In this case, the functions used to learn the model are generally continuous functions, which need to be approximated by polynomial functions. Generally speaking, the higher the degree of the polynomial, the smaller the approximation error, but the greater the noise and the longer it takes to compute on the encrypted data. To solve this problem, in [77] the degree of the approximating polynomial is set within an appropriate interval, and the resulting noise is kept within a threshold value. When the noise reaches the threshold value, the server reports the calculated results (ciphertext) to the client. The advantage of this model is that the client only needs to decrypt and view the returned results, while the server handles the whole calculation process.

With FHE, addition and multiplication can be performed together on the encrypted data, and the number of operations is unlimited. FHE has been on the right track since the first FHE scheme based on ideal lattices was proposed in [36]. In order to weaken the hardness assumption, Brakerski and Vaikuntanathan [16] proposed an FHE scheme based on learning with errors (LWE). First, relinearization was introduced to achieve SHE without involving ideals. Then, in order to obtain FHE from SHE, the dimension-modulus reduction technique was proposed to remove the hardness assumption used in [36]. Brakerski et al. [15] constructed a more efficient leveled homomorphic encryption scheme, in which the bootstrapping procedure exists only to optimize performance. Inspired by the knowledge of scale, [14] reduces the noise of ciphertext multiplication in LWE-based FHE schemes without modulus switching. In order to make multiplication natural for ciphertexts, Gentry et al. [38] introduced the approximate eigenvector method, in which ciphertexts are matrices. In addition, they also obtained identity-based FHE and attribute-based FHE. Cheon et al. [20] proposed an RLWE-based FHE scheme that supports floating-point calculation, where rescaling is the core technique. With rescaling, if the plaintext is divided by an integer, the corresponding ciphertext and the pre-inserted errors are divided by the same integer, and the errors remain bounded. This ensures that the ciphertext modulus increases linearly rather than exponentially. Although decryption only approximates the original plaintext, its accuracy can be predicted by rounding, which is similar to approximate floating-point calculation. Although this scheme implements many primary operations on encrypted floating-point real values, it does not support size comparison of given floating-point values. In order to solve this problem, Moon and Lee [62] introduced the TFHE [22] algorithm on the basis of [20], and obtained a higher-performance comparison operation.

D. SEARCHABLE ENCRYPTION
Most people choose to store data in the cloud because of the unlimited space and flexible service of cloud storage. To ensure data security, users typically encrypt data before uploading it to the cloud. As mentioned earlier, this ensures the confidentiality of the data. But anyone who wants to search for an encrypted file uploaded to the cloud will encounter some trouble. Since the data is encrypted in the cloud, users cannot search the encrypted files directly. There are two solutions to this problem. One is that the user downloads the encrypted files, decrypts the ciphertext locally, and then searches for the keyword over the plaintext. This method is secure but inefficient. If the retrieved file contains massive data, it will consume a lot of computing resources and time. Another solution is to decrypt the ciphertext in


the cloud and search the plaintext on the cloud server. However, this solution will expose the contents of these files, which seriously threatens data security and users’ privacy. Therefore, how to enable users to search securely for specific keywords over encrypted files in the cloud has become the concern of many scholars [5], [46], [48], [52], [76]. Searchable encryption is a cryptographic primitive that allows authorized users to retrieve ciphertext in the cloud by some means (such as keyword query). Its feature is to ensure that the cloud server returns the encrypted data files of interest to users without learning the ciphertext content. In terms of the encryption method, searchable encryption can be divided into Searchable Symmetric Encryption (SSE) and Public Key Encryption with Keyword Search (PEKS).

SSE is a kind of searchable encryption based on symmetric cryptography. Much of the literature focuses on designing mechanisms for searching over encrypted data. Specifically, in 2000, Song et al. [72] designed a practical searchable encryption technique, which implements keyword-based queries over a whole document using the XOR operation. In this scheme, each word $w_i$ in the document is encrypted with the same secret key, and the encrypted $w_i$ is written as $W_i$. The ciphertext $C_i$ is obtained by XORing $W_i$ with a pseudo-random term generated by the data owner. To search for the word $w_i$, the cloud server XORs $W_i$ with every $C_j$ and returns the matching $C_i$ to the data owner. Obviously, the search time increases linearly with the length of the encrypted document. In order to improve the efficiency of searchable encryption and make the files matched by keywords better satisfy the interests of users, Wang et al. [82] proposed a ranked searchable symmetric encryption scheme, in which documents retrieved by single-keyword search are ranked by relevance. In this scheme, order-preserving symmetric encryption was introduced to obtain higher efficiency. With the popularity and growth of outsourced data, it is necessary to allow multiple keywords in search requests. Cao et al. [18] proposed a secure multi-keyword ranked search over encrypted data. They use coordinate matching to retrieve as many documents as possible, and measure the relevance between documents and keywords using inner product similarity. In order to reduce retrieval failures caused by misspelling, Fu et al. [33] improved multi-keyword searchable encryption by adding fuzzy search functionality. Their core technique is to represent each keyword by a uni-gram vector. With this, a misspelled word can be matched to the highly similar correct word by computing their Euclidean distance. Recently, research [104], [111] on multi-keyword search in the multi-owner model has enriched searchable symmetric encryption. In Yin et al.’s scheme, a group of data owners secretly share two $l$-bit primes $q_1, q_2 \in \mathbb{Z}_q$ with $q = q_1 \cdot q_2$, where $q_1$ is used by data owners to encrypt the security index, and $q_2$ is kept by the authorized data user to encrypt the query keywords. They predefine the keyword dictionary $KD = \{w_1, w_2, \cdots, w_n\}$, in which each keyword has its own fixed position. Data owner $D_i$ extracts keywords $W_{i,j} = \{w_1, w_4\}$ from data file $F_{i,j}$ and calculates the security index $I_{i,j} = (g^{h(w_1)+q_1 \cdot sk}, R_2, R_3, g^{h(w_4)+q_1 \cdot sk}, R_5, \cdots, R_n)$. This design avoids the risk that the number of keywords in each file is leaked. In addition, Du et al. [28] proposed a multi-client SSE scheme supporting boolean queries. Their solution not only allows the data owner to dynamically update someone’s query permission without affecting others’ normal use of data, but also reduces the interaction between users and owners.

Searchable encryption based on public key cryptography is called PEKS. In 2004, Boneh et al. [13] designed a Public Key Encryption with Keyword Search (PEKS) algorithm, which implements searchable encryption on email encrypted under a public key. In this scheme (see Fig. 5), Bob sends the encrypted message $E(M)$ and the PEKS values $\mathrm{PEKS}(pk, w_i)$, $i = 1, 2, \cdots, n$, related to the keywords in the message $M$, to the email server. Alice sends the trapdoor $T_w$ of a specified keyword (such as “urgent”) to the server, so that the server can check whether there is an $i \in \{1, 2, \cdots, n\}$ such that $w_i = w$. During the whole process, the PEKS values do not reveal any email content except the specified keywords. After that, Baek et al. [4] improved Boneh et al.’s scheme and constructed an effective PEKS scheme that removes the need for a secure channel. However, these solutions only address searchable encryption with few keywords, and they lack practicability for the huge amount of data in the cloud with many keywords. Most existing searchable encryption schemes selectively retrieve encrypted files by keyword search over the ciphertext, while ensuring security protection and retrieval privileges over the encrypted files for both data owners and users. However, users sometimes need to store a large number of keys to decrypt the ciphertext files and generate trapdoors, and they have to submit a massive number of trapdoors to search for a keyword over a large number of files. Verifiable searchable encryption has been designed [74] to ensure the privacy of keywords and handle the threat from a semi-honest but curious server. Generally, users have to store a large number of keys to generate trapdoors

FIGURE 5. PEKS in [13].
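The XOR-based symmetric search of Song et al. [72] described earlier in this subsection can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors’ exact construction: HMAC-SHA256 stands in for both the deterministic word encryption that yields $W_i$ and the pseudo-random function, the per-word check key is derived from the whole of $W_i$, and every function and key name (prf, word_token, encrypt_document, server_search) is hypothetical. The pseudo-random term has the form (s, F(s)), so after XORing a ciphertext block with the queried $W_i$ the server can recognize a match without learning the word itself.

```python
import hashlib
import hmac


def prf(key: bytes, msg: bytes, n: int = 16) -> bytes:
    """Pseudo-random function: HMAC-SHA256 truncated to n bytes."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:n]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def word_token(k_word: bytes, k_check: bytes, word: str):
    """Return (W, k): the deterministic stand-in for the encrypted word
    and the check key the server needs to test for that word."""
    w_enc = prf(k_word, word.encode(), 32)
    return w_enc, prf(k_check, w_enc)


def encrypt_document(keys, words):
    """Data owner side: C_i = W_i XOR (s_i || F_{k_i}(s_i))."""
    k_word, k_check, k_stream = keys
    blocks = []
    for i, word in enumerate(words):
        w_enc, k_i = word_token(k_word, k_check, word)
        s_i = prf(k_stream, i.to_bytes(8, "big"))   # position-dependent stream value
        blocks.append(xor(w_enc, s_i + prf(k_i, s_i)))
    return blocks


def server_search(blocks, w_enc: bytes, k_i: bytes):
    """Server side: XOR every block with W and keep positions whose
    result has the form (s, F_k(s)). Runs in time linear in the document."""
    hits = []
    for pos, block in enumerate(blocks):
        pad = xor(block, w_enc)
        s, tag = pad[:16], pad[16:]
        if hmac.compare_digest(tag, prf(k_i, s)):
            hits.append(pos)
    return hits


keys = (b"word-key-0123456", b"check-key-012345", b"stream-key-01234")
document = "the urgent report is urgent".split()
ciphertext = encrypt_document(keys, document)
print(server_search(ciphertext, *word_token(keys[0], keys[1], "urgent")))
```

The loop in server_search makes the linear search cost mentioned in the text visible: every block of the document is touched once per query.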


and decrypt the ciphertext. It is a big challenge for users to manage their keys. Key-aggregate searchable encryption schemes [23] have been proposed to reduce the number of keys for users. Recently, Wang et al. [88] proposed an efficient verifiable key-aggregate keyword searchable encryption (EVKAKSE) system model. In this scheme, the data owner uploads encrypted files and the related encrypted potential keywords to the cloud server. The data owner then sends users an aggregate key, which allows them to search the encrypted files by keyword directly, decrypt the ciphertext and verify the safety and practicality of the retrieved results. Next, to perform keyword search over shared files, users have to generate an aggregate trapdoor using this aggregate key. With the aggregate key, users can perform keyword search over the authorized files. Furthermore, this scheme protects the keyword, its ciphertext and the submitted trapdoor from being determined by a semi-honest but curious cloud server or a malicious cloud server. No insider attacker can compute a valid user aggregate key from the trapdoor.

In this section, we summarize four encryption technologies commonly used in cloud storage, which ensure the confidentiality of data in the cloud. From the perspective of access control, IBE embeds “identity” into the public key and private key, which gives IBE great advantages in protecting the private data of a single user or a small number of users, such as encrypting e-mail. In addition, IBE is also applied to proxy re-encryption (such as [24]) to obtain lightweight encryption schemes, so that users with limited resources are no longer burdened by complex computation when decrypting. Compared with IBE, ABE, as a fuzzy identity encryption, has higher scalability. ABE allows the data owner to use users’ attributes as a medium to specify the legitimate users, and obtains efficient fine-grained access control. Because the length of the ciphertext increases with the number of user attributes, decryption might require heavy computation. In order to solve this problem, the combination of ABE and IBE (for example [35]) can not only obtain fine-grained access control, but also reduce the computation and communication cost during the decryption phase. In addition to access control, homomorphic encryption provides the ability to perform predefined operations on ciphertext and searchable encryption provides the ability to retrieve ciphertext, which increases the user’s control over data and attracts more potential users.

III. PRESENT RESEARCH FOCUS
In the following part, we provide an introduction to state-of-the-art research on data security and privacy protection in cloud storage systems.

A. ONE-TO-MANY ENCRYPTION
The high scalability and unlimited expansion of cloud storage attract more and more users and organizations to share their data in the cloud. Some data owners upload data to the cloud for their own use through the Internet at any time, regardless of location and time constraints, such as private cloud storage. If the data is only for personal use, encryption can largely ensure the confidentiality of private data. When it comes to sharing data with multiple parties (such as organizations or groups), a one-to-many data sharing mode (one data owner, multiple data users) is more suitable. The data owner gives access to a specific group by designing a fine-grained access control scheme. In this case, collusion resistance and tamper resistance are worthy of deep consideration. In this section, we investigate the literature on one-to-many encryption and review it from three aspects: preset cooperative access control for designated multiple users, fuzzy multi-party shared access control to deal with emergencies, and secure access control for dynamic multiple groups.

It is common sense that a lock that can only be opened by many different keys together is much more secure than a lock that can be opened by a single key. For enterprises or organizations, the confidentiality of some encrypted files can be strongly guaranteed if the access policy requires multiple employees with different attribute sets to obtain access permission through cooperation, where the access request is denied if even one of them is absent. Xue et al. [100] proposed a controlled collaboration access control scheme, which improved the model of [7]. In their scheme, a set of translation nodes is inserted into the policy tree by the data owner, a translation value is added to the ciphertext by the cloud server and a translation key is embedded into the secret key by the PKG, all of which are designed to make multi-user collaborative access feasible. The data owner can remove the translation nodes to cancel the privilege of cooperative access. Their scheme can effectively prevent malicious deletion and modification of important files by a single enterprise employee. Collusion resistance also prevents illegal access to confidential data by unqualified users.

In order to realize temporary access authorization in the process of cross-domain data sharing, Yang et al. [101] presented a self-adaptive access control system with secure deduplication. They considered how to enable unqualified doctors to access and decrypt the electronic medical records of a patient in an emergency (such as when the patient is in a coma), so as to provide more accurate treatment plans for the patient. In such a scheme, the electronic medical records and the physiological parameters detected by wearable devices in real time are encrypted and transmitted to the public cloud server by the data owner (usually the patient), who pre-sets a break-glass key to decrypt the data mentioned above, a password for generating the key, and a list of people who know the password. A person on the list interacts with the cloud servers using the password to generate the break-glass key, which temporarily allows unauthorized medical workers to access the patient’s electronic medical records. A traditional access control system only allows qualified users to access encrypted data legally, which is fatal for patients who need emergency treatment, in that not all doctors are qualified to access the data. Their system solves the problem of temporary access authorization


TABLE 3. Comparison of relevant schemes on data confidentiality in cloud storage.

in electronic medical record sharing. It not only ensures the confidentiality of the patient’s data, but also allows originally unauthorized doctors to access the patient’s data legally.

Personal health data is collected by intelligent wearable devices or by hospitals, and can help doctors gain a comprehensive understanding of patients’ conditions. In order to protect privacy, data owners choose to encrypt the data and upload it to the cloud. Many data owners, hospitals, health institutions, etc. form a cloud data sharing system with multiple groups. Each participant in the system is assigned to a specific group. Users can access the shared data only when they satisfy two conditions: 1) they belong to the specified dynamic group; 2) their attributes meet the predefined access policy. To achieve secure data sharing in this setting, Xiong et al. [95] considered data sharing involving multiple dynamic groups. They put forward a secure attribute-based broadcast encryption scheme, which realizes data sharing among multiple groups and supports offline and online computing functionalities. In addition, attributes in the access policy are anonymous to protect users’ privacy.

B. DATA INTEGRITY
With cloud storage services, more and more users outsource their data to the cloud and share it with others. Ensuring data integrity remains a top priority for data security. Since outsourced data is often kept in unknown places, how to detect whether the data remains intact without downloading it has become a concern. In order to check integrity, existing solutions include the provable data possession (PDP) model proposed by Ateniese et al. [2] and the proof of retrievability (POR) model presented by Shacham and Waters [69]. Furthermore, outsourced data integrity auditing schemes have been proposed to guarantee the integrity of the data stored in the cloud. Generally speaking, data integrity auditing can be broadly divided into two categories [86], namely private auditing and public auditing. In the former, only the data owner can audit the integrity of the outsourced data. Although private auditing schemes are secure and efficient, they require substantial computing and network resources for auditing. Once data owners are unavailable due to network failures or limited computing resources, private audits cannot be performed. In public auditing, the data owner can delegate the audit to an independent third party auditor (TPA), so both the data owner and the TPA can verify the integrity of outsourced data. Compared with private auditing, a public auditing scheme is not affected by the owner’s network and resources. Even if the owner cannot confirm the correctness of the data, the third-party auditor can still perform the auditing task. Because of this fault tolerance, public auditing schemes have been presented in a lot of literature [42], [43], [56], [70], [86], [109].

In 2017, Shen et al. [70] proposed an efficient public auditing protocol based on conventional public key infrastructure (PKI)-based cryptography. In their model, global and sampling verification is proposed to address the issue that the data owner may doubt that the cloud has stored their data securely, while the cloud service provider may worry about being wrongly accused by its users during their cooperation. Data dynamics are made more efficient by a novel dynamic structure consisting of a doubly linked info table and a location array, with which data update and batch auditing are easier to implement. Furthermore, to improve the practicability of their model, they established public auditing and blockless verification, which support public verifiability and prevent data leakage to cloud service providers and auditors during auditing.

Since key management in PKI-based schemes is more complex than in ID-based cryptosystems, resource-constrained users are more likely to prefer the latter. In 2016, an identity (ID)-based public auditing scheme based on homomorphic ID-based signatures was designed by Zhang and Dong [109] for cloud storage systems, which implements batch auditing in the multi-user setting and prevents forgery attacks, replace attacks and replay attacks from an untrusted cloud

server. The ID-based protocol simplifies key management, and public auditing with batch auditing lightens the load on auditors and users. Their model made a great contribution to saving communication and computation overhead.

In 2019, Li et al. [56] formalized data integrity auditing based on fuzzy identity-based cryptography. Interestingly, they addressed the key management issue by bringing biometric-based identities into traditional publicly verifiable RDIC protocols, which allows the TPA or users to verify data integrity without retrieving the entire dataset.

In 2018, He et al. [43] presented a certificateless provable data possession (CL-PDP) scheme. This scheme implements remote data integrity auditing for cloud-based smart grid data management systems. Specifically, the data owner can delegate a third-party auditor to verify the integrity and detect modification of the data. The verifier is allowed to audit the integrity of a large amount of data belonging to different users simultaneously. Furthermore, during this period, curious auditors cannot learn the content of the verified data, that is, data confidentiality is ensured. For other certificateless public auditing schemes, see He et al. [42], [43] and Wang et al. [81].

Wang et al. [86] provided a lightweight certificate-based public/private auditing scheme in 2020. It is a certificate-based PDP scheme built on asymmetric pairings to minimize storage space and communication cost, and it is secure against both the public key replacement adversary and the malicious certifier adversary. In their scheme, the audit phase is divided into PrivateVerify and PublicVerify, which correspond to private auditing and public auditing, respectively. Since data owners have more information than auditors, the former execute PrivateVerify more efficiently when data integrity auditing is required. If the data owner is not available, the auditor can execute PublicVerify directly.

C. DATA DELETION
Users’ data is typically distributed across multiple cloud servers, which may be shared by users who do not know each other. If a user wants to delete a file in local storage, the safest way is to burn or shred it, but this is obviously not feasible for files in the cloud. In the cloud, users need to entrust cloud service providers to delete unnecessary files. Usually the cloud service deletes the file by logical deletion. Logical deletion essentially hides the corresponding data rather than actually deleting it. This may result in the user’s privacy being exposed to others. On the other hand, cloud service providers may also only pretend to delete data and cheat users due to business interests. Therefore, how to verify that the data has been deleted safely is an important part of protecting data security in the data life cycle.

A hash function is a one-way function that maps data to fixed-length values, known as hash values. Generally, the domain of a hash function is larger than its range of hash values, so it is difficult to invert a hash value. Hashing is mainly used for authentication and public auditing. In recent years, due to the characteristics of hash functions, hash algorithms have also been used to prove whether cloud service providers can delete data irrecoverably according to user requirements; among them, the Merkle Hash Tree is very popular.

To assure data deletion, Xue et al. [99] proposed an efficient attribute revocation scheme based on the Merkle Hash Tree. Once the cloud server receives a deletion request from a user, it re-encrypts the corresponding files using the re-encryption key generated by the trusted authority. At the same time, according to the attribute revocation, a new root of the Merkle Hash Tree is sent to the data owner so that he can verify that the data has been deleted successfully. In addition to data deletion validation, other users can still use cloud services normally while one user’s data is being deleted.

In 2019, Yang et al. [102] presented a fine-grained data deletion scheme to prevent malicious tampering with data by cloud servers and hackers, as well as incomplete data deletion by cloud service providers. A rank-based Merkle Hash Tree chain is introduced to check, on behalf of the user, whether a data block has been altered or deleted.

D. LEAKAGE-RESILIENT
A side channel attack allows an adversary to break a cryptographic scheme by collecting information leaked by the encryption algorithm. Under normal circumstances, the user downloads the ciphertext and decrypts it on a local device. The attacker uses a side channel attack (for example, monitoring the electromagnetic radiation emitted by the computer screen, monitoring the power consumption of electronic devices or recording the sound of the user’s keystrokes) to obtain part of the information about the user’s decryption key. In order to handle this situation, the concept of leakage resilience has been introduced into cryptographic schemes (for instance, [6], [65]). Among them, the study of memory leakage is the most extensive. Memory leakage is a strong leakage model that includes secret key leakage. Once the private key is revealed, the encryption scheme becomes invalid. Although side channel attacks are limited by physical distance, with the development of unmanned aerial vehicles (UAVs) and intelligent mobile devices, side channel attacks will become easier and cheaper.

Existing leakage models can usually be divided into four categories: 1) The bounded retrieval model [29]. In this model, $f$ is an arbitrary polynomial-time computable leakage function with a bounded output value. Leakage resilience can be obtained by designing a secret key whose size is longer than the output of $f$; 2) The bounded leakage model [1]. In this model, $f$ is a polynomial-time computable leakage function with a given bounded output value, which is generally regarded as the minimum entropy of the secret key; 3) The auxiliary input memory model [27]. The premise of this model is that it is hard to recover the secret key no matter how much information is leaked. With it, unbounded output length is allowed for the leakage function $f$; 4) The continuous leakage model [17]. Different from the previous three models, the leakage function here can have continuous output,


and the output is bounded within each bounded time period, while the total amount of output can be unbounded. In this setting, the security of the encryption scheme is guaranteed by periodically updating the private key, while the public key does not need to be updated. Specifically, suppose the initial public key is $pk$ and the initial private key is $sk_1$. After the attacker has continuously obtained a bounded amount of information about $sk_1$, $sk_1$ is updated to $sk_2$. At this point, $sk_1$ is no longer valid for decryption, so the information the attacker collected about it becomes useless. Consequently, even if the attacker collects an unbounded amount of information drawn from different parts of $sk_1, sk_2, sk_3, \cdots$, it is still hard to recover the decryption key.
Hu et al. [44] proposed a CCA-secure public-key encryption scheme that is resilient to continuous leakage and tampering attacks by updating the private key. They did not obtain this result directly. They first achieved CCA security in the continuous memory leakage (CML) model, and then introduced a one-time lossy filter to obtain CCA security in the continuous key-leakage and tampering (CLT) model. In the bounded leakage model, the amount of leakage is bounded within a certain period, for example when information is intercepted by an attacker through a bounded side-channel attack. Sometimes, however, leakage occurs continuously with each invocation of the cryptosystem. In that case, the amount of private-key leakage is limited between two consecutive private key updates, while the total leakage may be arbitrarily large.
Zhang et al. [115] presented a continuous leakage-resilient identity-based encryption scheme (CLR-IBE) for big data storage systems in cloud computing, which protects data security against partial secret key leakage in the continuous leakage model. In this scheme, the secret keys are updated periodically. By defining the leakage ratio $l/|sk|$, where $l$ denotes the size of the leakage and $|sk|$ the size of the private key, they proved that their scheme tolerates a high leakage ratio of 1/3. Recently, Li et al. [55] proposed a hierarchical attribute-based encryption scheme that is continuously resilient to leakage of both the master key and the private key. When the leakage length of the master key and the private key is bounded, the scheme is secure in the standard model. When the attribute universe is consistent with the attribute set of depth K, the master key should be re-randomized, and the key update algorithm is started at this point. Leakage is tolerated during the update process, and the amount of leakage is logarithmically related to the security parameter. As long as the key is updated regularly and the key's secret information is not leaked in the process, continuous leakage resilience is obtained. This scheme has the same leakage ratio as [115].

E. PRIVACY-PRESERVING
The convenience and scalability of cloud storage systems attract more and more individual and enterprise users to outsource their data to cloud service providers. However, there is a risk of privacy disclosure. For instance, Electronic Health Records (EHRs), including patients’ medical records, are stored in the cloud, which not only makes it easier for patients to seek medical advice in different hospitals, but also helps doctors provide more accurate treatment plans based on the records. Once sensitive information, such as identity information and home address, is leaked or tampered with, irreparable harm is caused to patients. Besides, identity and attribute leakage also threatens the privacy of data owners and authorized users. Due to the diversity of cloud data, conventional privacy-preserving mechanisms are unable to provide comprehensive privacy protection in the cloud. Therefore, protection schemes [114], [115], [117] for sensitive information privacy, identity privacy, attribute privacy, etc. have been developed to achieve more specific privacy protection.
CP-ABE schemes play a pivotal role in implementing data sharing and fine-grained access control. The ciphertext can be decrypted only if the private key generated from the user’s attributes matches the access policy embedded in the ciphertext. In a general CP-ABE scheme, access policies are stored in the cloud in plaintext. Nevertheless, access policies and attribute sets sometimes contain sensitive information about data owners and the users authorized to share data, so the attribute privacy of data owners and users is easily exposed by the predefined access policies. Zhang et al. [114] designed an anonymous CP-ABE access control system with collusion resistance for resource-limited users. To protect attribute privacy, the access policy is hidden in the ciphertext by encrypting a symmetric key. In such a system, authorized users learn nothing about the access policy determined by the data owner, even if they can successfully access and decrypt the ciphertext using their distributed attribute private keys. Xiong et al. [95] proposed a group-oriented ABE model to satisfy the requirement for one-to-many data sharing. In this scheme, the data owner first sends the encrypted files, the hidden access policy and the set of authorized users’ identities to the cloud. The attributes of authorized receivers are protected from exposure by fully hiding the access policy before the encrypted data is uploaded to the cloud.
To verify the correctness of data stored in cloud storage with low computing and communication costs, public auditing schemes have been proposed so that both the third-party auditor (TPA) and the data owner can perform the auditing task. However, when TPAs check the integrity of data, they may be curious about the identity of the audited user and other sensitive information. This may cause users’ identity privacy to be disclosed to hackers or sold to illegal organizations. Therefore, the protection of identity privacy is of great significance. When TPAs audit the correctness of remote data, the joining, leaving and revocation of members in a dynamic group, together with the TPAs’ curiosity, can lead to the disclosure of members’ identity information. To address this problem, Yu et al. [105] developed an identity-privacy-preserving public auditing protocol. In this protocol, multiple users in a dynamic group negotiate to share a public-secret key pair so


that TPAs can perform data auditing without any knowledge of users’ identities. Furthermore, since the target group secret key is generated by a hash function, a user who joins the group can only learn the information produced after joining, not earlier information, and anyone who leaves the group can no longer obtain information produced after leaving. Therefore, the privacy of the private key is also protected. In the framework designed by Yang et al. [103], the more members a data sharing group has, the lower the probability that the auditor obtains a member’s identity. Besides, the group manager can trace and disclose dishonest members to reduce the threat of tampering with shared data.

TABLE 4. Comparison of representative schemes on leakage-resilience.

In response to malicious attacks from untrusted cloud service providers, Zhang and Zhao [110] drew on the idea of the chameleon hash algorithm to hide the real public keys of the data owner by generating dynamic public keys. This prevents the cloud server from obtaining or computing the data owner’s identity.
To defend against threats from both a malicious cloud server and the TPA, Zhang et al. [113] put forward a conditional identity privacy protection mechanism. The scheme is mainly used to protect the identity privacy and sensitive information of patients in EHRs. It uses public auditing to ensure the integrity of patients’ data and to prevent malicious cloud service providers from returning incorrect audit reports. The PKG generates an anonymous identity with validity period T from the patient’s real identity by a well-defined computation. Based on the hardness assumption, no adversary can learn the patient’s real identity.

IV. OPEN ISSUES AND THE POSSIBLE DEVELOPMENT
A. PRIVACY-PRESERVING MACHINE LEARNING IN CLOUD STORAGE
Machine learning is very popular and widely used in areas such as data mining, medical diagnosis, DNA sequencing and image recognition. Recently, more and more government departments (such as the Ministry of Transport and the Department of Public Security) and medical institutions have migrated massive amounts of valuable data to the cloud. Taking the Ministry of Transport as an example, if these data can be fully mined, it will help reduce road traffic congestion and traffic accidents and predict the future 24-hour speed of a road section. Furthermore, joint data analysis by the Ministry of Transport and the Department of Public Security is also conducive to reducing criminal incidents in public places. Therefore, the combination of machine learning and the cloud has become a new focus.
However, two problems remain: 1) Departments that do not trust each other may refuse to share data in order to protect their own data security. 2) Faced with massive cloud data, users with limited resources may not be able to carry out effective data mining and model training because of the high computation and communication costs, and outsourcing the model training computation to the cloud increases the risk of leaking key parameters of their own models. Although there is some research on cloud-based machine learning, for example machine learning with public auditing [41], a machine learning training and classification scheme based on homomorphic encryption [54], and homomorphic deep learning [57], the efficiency and security of these schemes are not satisfactory.
To address the above challenges, we see two future research directions.
1) Design a more secure privacy protection scheme to ensure that sensitive information in shared data is hidden, especially for data involving highly sensitive information such as government data and medical data.
2) Design efficient and secure outsourced privacy protection schemes that support more machine learning algorithms (such as incremental learning).
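As a rough illustration of why homomorphic encryption is attractive for the first problem above (departments that do not trust each other), the following sketch uses the textbook Paillier cryptosystem [63] with the common simplification g = n + 1. Two parties encrypt their local values, an untrusted cloud multiplies the ciphertexts, and only the key holder can decrypt the sum. This is not the construction used in [41], [54] or [57]. The function names are hypothetical, and the tiny hard-coded primes are an assumption made purely for readability, so the code offers no real security.

```python
import math
import secrets


def keygen(p=293, q=433):
    """Toy Paillier key generation (illustrative primes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g is fixed to n + 1 (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


def decrypt(n, priv, c):
    """Recover the plaintext with the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


if __name__ == "__main__":
    n, priv = keygen()
    c_a = encrypt(n, 1200)  # value held by one department (hypothetical)
    c_b = encrypt(n, 345)   # value held by another department (hypothetical)
    c_sum = add_encrypted(n, c_a, c_b)  # computed by the cloud on ciphertexts only
    print(decrypt(n, priv, c_sum))
```

The cloud in this sketch never sees 1200 or 345, only ciphertexts. Additive schemes of this kind support sums and scalar multiples, which is one reason training more general models over encrypted data remains costly.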


B. POST-QUANTUM ENCRYPTION
In recent years, with the rapid development of blockchain, the Internet of Things and quantum computing, the world’s attention to data security and privacy has risen to an unprecedented level, which places more and higher requirements on data security and data privacy protection. At present, the security of public key cryptography depends on mathematical problems (such as the discrete logarithm problem and the factorization of large integers) that are difficult to solve with traditional computers and classical algorithms. Shor’s algorithm, proposed in 1994, directly threatens RSA and related algorithms. Recently, the research and development of quantum computers has become the focus of many commercial companies. Although it is not clear when a practical quantum computer will be realized, some quantum computers have been built. For example, Honeywell recently announced a quantum computer with a quantum volume of 64.
Post-quantum cryptography is a new generation of cryptography that can resist attacks by quantum computers on existing cryptography. The current research and open issues concerning the main post-quantum encryption algorithms are as follows.
1) The authentication mechanism of hash-based signature algorithms is the Merkle hash tree, whose security relies on the collision resistance of the hash function. Merkle hash trees are applied to integrity auditing, data deletion [99], [102], etc. Because hash-based constructions rely on a tree structure, only digital signature constructions exist at present, and there are very few public key encryption systems.
2) Lattice-based algorithms can realize cryptographic constructions such as encryption, digital signatures, attribute-based encryption and homomorphic encryption, whose security depends on the hardness of lattice problems. At the same security level, lattice-based algorithms have smaller public keys, faster computation and higher security than hash-based ones. Recently, lattice cryptography constructions based on LWE (learning with errors) [14], [16], [26] and RLWE (ring-LWE) [20] have developed rapidly. For instance, Wei et al.’s revocable-storage IBE [90] is based on bilinear pairings. Their scheme performs well but cannot resist quantum attacks. Lattice-based revocable storage still needs further exploration.

V. CONCLUSION
In this paper, we give a detailed survey of data security and privacy preservation in cloud storage systems. First, from the outstanding performance of the cloud in the digital economy, enterprise digital transformation, the Internet of Things and other fields, we confirm that cloud computing and cloud storage will remain mainstream. We then analyze eight elements of data security in cloud storage systems: data confidentiality, data integrity, data availability, fine-grained access control, secure data sharing in dynamic groups, leakage resistance, complete data deletion and privacy protection. Next, we introduce the encryption principles of IBE, ABE, homomorphic encryption and searchable encryption, and the research directions of new encryption models. Data encryption technologies and protection methods are summarized, corresponding to the security requirements mentioned above. Finally, we put forward several open research topics on data security for cloud storage.

REFERENCES
[1] A. Akavia, S. Goldwasser, and V. Vaikuntanathan, ‘‘Simultaneous hardcore bits and cryptography against memory attacks,’’ in Proc. TCC, San Francisco, CA, USA, 2009, pp. 474–495.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, ‘‘Provable data possession at untrusted stores,’’ in Proc. ACM CCS, New York, NY, USA, 2007, pp. 598–609.
[3] N. Attrapadung and H. Imai, ‘‘Attribute-based encryption supporting direct/indirect revocation modes,’’ in Proc. IMACC, Cirencester, U.K., Dec. 2009, pp. 278–330.
[4] J. Baek, R. Safavi-Naini, and W. Susilo, ‘‘Public key encryption with keyword search revisited,’’ in Proc. ICCSA, Perugia, Italy, 2008, pp. 1249–1259.
[5] M. Bellare, A. Boldyreva, and A. O’Neill, ‘‘Deterministic and efficiently searchable encryption,’’ in Proc. CRYPTO, Santa Barbara, CA, USA, 2007, pp. 535–552.
[6] F. Berti, O. Pereira, T. Peters, and F. X. Standaert, ‘‘On leakage-resilient authenticated encryption with decryption leakages,’’ IACR Trans. Symmetric Cryptol., vol. 2017, no. 3, pp. 271–293, 2017.
[7] J. Bethencourt, A. Sahai, and B. Waters, ‘‘Ciphertext-policy attribute-based encryption,’’ in Proc. IEEE Symp. Secur. Privacy, Berkeley, CA, USA, May 2007, pp. 321–334.
[8] T. Bhatia and A. K. Verma, ‘‘Data security in mobile cloud computing paradigm: A survey, taxonomy and open research issues,’’ J. Supercomput., vol. 73, no. 6, pp. 2558–2631, Jun. 2017.
[9] A. Boldyreva, V. Goyal, and V. Kumar, ‘‘Identity-based encryption with efficient revocation,’’ in Proc. ACM CCS, Alexandria, VA, USA, 2008, pp. 417–426.
[10] D. Boneh and X. Boyen, ‘‘Efficient selective-ID secure identity based encryption without random oracles,’’ in Proc. Adv. Cryptol. (Eurocrypt), Interlaken, Switzerland, vol. 3027, 2004, pp. 223–238.
[11] D. Boneh and X. Boyen, ‘‘Secure identity-based encryption without random oracles,’’ in Proc. CRYPTO, vol. 3152. Berlin, Germany: Springer, 2004, pp. 443–459.
[12] D. Boneh and M. Franklin, ‘‘Identity-based encryption from the Weil pairing,’’ in Proc. CRYPTO, vol. 2139. Berlin, Germany: Springer, 2001, pp. 213–229.
[13] D. Boneh, G. Di Crescenzo, R. Ostrovsky, and G. Persiano, ‘‘Public key encryption with keyword search,’’ in Proc. EUROCRYPT, vol. 3027. Berlin, Germany: Springer, 2004, pp. 506–522.
[14] Z. Brakerski, ‘‘Fully homomorphic encryption without modulus switching from classical GapSVP,’’ in Proc. CRYPTO, Santa Barbara, CA, USA, 2012, pp. 868–886.
[15] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, ‘‘(Leveled) fully homomorphic encryption without bootstrapping,’’ ACM Trans. Comput. Theory, vol. 6, no. 3, pp. 1–36, Jul. 2014.
[16] Z. Brakerski and V. Vaikuntanathan, ‘‘Efficient fully homomorphic encryption from (standard) LWE,’’ SIAM J. Comput., vol. 43, no. 2, pp. 831–871, Jan. 2014.
[17] Z. Brakerski, Y. T. Kalai, J. Katz, and V. Vaikuntanathan, ‘‘Overcoming the hole in the bucket: Public-key cryptography resilient to continual memory leakage,’’ in Proc. IEEE 51st Annu. Symp. Found. Comput. Sci. (FOCS), Las Vegas, NV, USA, Oct. 2010, pp. 501–510.
[18] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, ‘‘Privacy-preserving multi-keyword ranked search over encrypted cloud data,’’ IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 222–233, Jan. 2014.
[19] B. Casemore, ‘‘Network modernization: Essential for digital transformation and multicloud,’’ IDC, Framingham, MA, USA, White Paper US45603019, Nov. 2019.
[20] J. H. Cheon, A. Kim, M. Kim, and Y. Song, ‘‘Homomorphic encryption for arithmetic of approximate numbers,’’ in Proc. Int. Conf. Theory Appl. Cryptol. Inf. Secur., Hong Kong, 2017, pp. 409–437.


[21] L. Cheung and C. Newport, ‘‘Provably secure ciphertext policy ABE,’’ in Proc. 14th ACM CCS, Alexandria, VA, USA, 2007, pp. 456–465.
[22] I. Chillotti, N. Gama, M. Georgieva, and M. Izabachène, ‘‘TFHE: Fast fully homomorphic encryption over the torus,’’ J. Cryptol., vol. 33, no. 1, pp. 34–91, Jan. 2020.
[23] B. Cui, Z. Liu, and L. Wang, ‘‘Key-aggregate searchable encryption (KASE) for group data sharing via cloud storage,’’ IEEE Trans. Comput., vol. 65, no. 8, pp. 2374–2385, Aug. 2016.
[24] H. Deng, Z. Qin, Q. Wu, Z. Guan, and Y. Zhou, ‘‘Flexible attribute-based proxy re-encryption for efficient data sharing,’’ Inf. Sci., vol. 511, pp. 94–113, Feb. 2020.
[25] Digital Economy Report, document UN Symbol: UNCTAD/DER/2019, United Nations Conference on Trade and Development, Geneva, Switzerland, 2019.
[26] C. Dong, K. Yang, J. Qiu, and Y. Chen, ‘‘Outsourced revocable identity-based encryption from lattices,’’ Trans. Emerg. Telecommun. Technol., vol. 30, no. 11, p. e3529, Nov. 2019.
[27] Y. Dodis, Y. T. Kalai, and S. Lovett, ‘‘On cryptography with auxiliary input,’’ in Proc. 41st Annu. ACM Symp. Theory Comput., 2009, pp. 621–630.
[28] L. Du, K. Li, Q. Liu, Z. Wu, and S. Zhang, ‘‘Dynamic multi-client searchable symmetric encryption with support for Boolean queries,’’ Inf. Sci., vol. 506, pp. 234–257, Jan. 2020.
[29] S. Dziembowski, ‘‘Intrusion-resilience via the bounded-storage model,’’ in Proc. TCC, vol. 3876. Berlin, Germany: Springer, 2006, pp. 207–224.
[30] K. El Makkaoui, A. Ezzati, A. Beni-Hssane, and S. Ouhmad, ‘‘Fast Cloud–Paillier homomorphic schemes for protecting confidentiality of sensitive data in cloud computing,’’ J. Ambient Intell. Humanized Comput., vol. 11, no. 6, pp. 2205–2214, Jun. 2020, doi: 10.1007/s12652-019-01366-3.
[31] T. Elgamal, ‘‘A public key cryptosystem and a signature scheme based on discrete logarithms,’’ IEEE Trans. Inf. Theory, vol. 31, no. 4, pp. 469–472, Jul. 1985.
[32] Fully Homomorphic Encryption: Cloud Security. Accessed: Feb. 2020. [Online]. Available: https://www.sciencedirect.com/topics/computer-science/fully-homomorphic-encryption
[33] Z. Fu, X. Wu, C. Guan, X. Sun, and K. Ren, ‘‘Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement,’’ IEEE Trans. Inf. Forensics Security, vol. 11, no. 12, pp. 2706–2716, Dec. 2016.
[34] Gartner: Gartner Forecasts Worldwide Public Cloud Revenue to Grow 17% in 2020. Accessed: Feb. 2020. [Online]. Available: https://www.gartner.com/en/newsroom/press-releases/2019-11-13-gartner-forecasts-worldwide-public-cloud-revenue-to-grow-17-percent-in-2020
[35] C. Ge, W. Susilo, L. Fang, J. Wang, and Y. Shi, ‘‘A CCA-secure key-policy attribute-based proxy re-encryption in the adaptive corruption model for dropbox data sharing system,’’ Designs, Codes Cryptogr., vol. 86, no. 11, pp. 2587–2603, Nov. 2018.
[36] C. Gentry, ‘‘Fully homomorphic encryption using ideal lattices,’’ in Proc. STOC, Bethesda, MD, USA, May 2009, pp. 169–178.
[37] C. Gentry, ‘‘Practical identity-based encryption without random oracles,’’ in Proc. EUROCRYPT, vol. 4004. Berlin, Germany: Springer, 2006, pp. 445–464.
[38] C. Gentry, A. Sahai, and B. Waters, ‘‘Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based,’’ in Proc. Annu. Cryptol. Conf. Berlin, Germany: Springer, 2013, pp. 75–92.
[39] A. Gharaibeh, M. A. Salahuddin, S. Jahed Hussini, A. Khreishah, I. Khalil, M. Guizani, and A. Al-Fuqaha, ‘‘Smart cities: A survey on data management, security, and enabling technologies,’’ IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2456–2501, 4th Quart., 2017.
[40] V. Goyal, O. Pandey, A. Sahai, and B. Waters, ‘‘Attribute-based encryption for fine-grained access control of encrypted data,’’ in Proc. 13th ACM Conf. Comput. Commun. Secur., Alexandria, VA, USA, 2006, pp. 89–98.
[41] A. Hassan, R. Hamza, H. Yan, and P. Li, ‘‘An efficient outsourced privacy preserving machine learning scheme with public verifiability,’’ IEEE Access, vol. 7, pp. 146322–146330, Oct. 2019.
[42] D. He, N. Kumar, S. Zeadally, and H. Wang, ‘‘Certificateless provable data possession scheme for cloud-based smart grid data management systems,’’ IEEE Trans. Ind. Informat., vol. 14, no. 3, pp. 1232–1241, Mar. 2018.
[43] D. He, S. Zeadally, and L. Wu, ‘‘Certificateless public auditing scheme for cloud-assisted wireless body area networks,’’ IEEE Syst. J., vol. 12, no. 1, pp. 64–73, Mar. 2018. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.584.1010&rep=rep1&type=pdf
[44] C. Hu, R. Yang, P. Liu, T. Li, and F. Kong, ‘‘A countermeasure against cryptographic key leakage in cloud: Public-key encryption with continuous leakage and tampering resilience,’’ J. Supercomput., vol. 75, no. 6, pp. 3099–3122, Jun. 2019.
[45] J. Hur and D. K. Noh, ‘‘Attribute-based access control with efficient revocation in data outsourcing systems,’’ IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1214–1221, Jul. 2011.
[46] Y. H. Hwang and P. J. Lee, ‘‘Public key encryption with conjunctive keyword search and its extension to a multiuser system,’’ in Proc. Int. Conf. Pairing-Based Cryptogr. Berlin, Germany: Springer, 2007, pp. 2–22.
[47] IBM. Block. Accessed: Feb. 2020. [Online]. Available: https://www.ibm.com/cloud/learn/block-storage
[48] S. Kamara, C. Papamanthou, and T. Roeder, ‘‘Dynamic searchable symmetric encryption,’’ in Proc. ACM CCS, Raleigh, NC, USA, 2012, pp. 965–976.
[49] K. Lee, S. G. Choi, D. H. Lee, J. H. Park, and M. Yung, ‘‘Self-updatable encryption: Time constrained access control with hidden attributes and better efficiency,’’ in Proc. Adv. Cryptol.-ASIACRYPT. Berlin, Germany: Springer, 2013, pp. 235–254.
[50] K. Lee, ‘‘Comments on ‘Secure data sharing in cloud computing using revocable-storage identity-based encryption,’’’ IEEE Trans. Cloud Comput., early access, Feb. 13, 2020, doi: 10.1109/TCC.2020.2973623.
[51] J. Li, J. Li, X. Chen, C. Jia, and W. Lou, ‘‘Identity-based encryption with outsourced revocation in cloud computing,’’ IEEE Trans. Comput., vol. 64, no. 2, pp. 425–437, Feb. 2015.
[52] J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou, ‘‘Fuzzy keyword search over encrypted data in cloud computing,’’ in Proc. IEEE INFOCOM, San Diego, CA, USA, Mar. 2010, pp. 1–5.
[53] J. Li, X. Lin, Y. Zhang, and J. Han, ‘‘KSF-OABE: Outsourced attribute-based encryption with keyword search function for cloud storage,’’ IEEE Trans. Services Comput., vol. 10, no. 5, pp. 715–725, Sep. 2017.
[54] J. Li, X. Kuang, S. Lin, X. Ma, and Y. Tang, ‘‘Privacy preservation for machine learning training and classification based on homomorphic encryption schemes,’’ Inf. Sci., vol. 526, pp. 166–179, Jul. 2020.
[55] J. Li, Q. Yu, and Y. Zhang, ‘‘Hierarchical attribute based encryption with continuous leakage-resilience,’’ Inf. Sci., vol. 484, pp. 113–134, May 2019.
[56] Y. Li, Y. Yu, G. Min, W. Susilo, J. Ni, and K.-K.-R. Choo, ‘‘Fuzzy identity-based data integrity auditing for reliable cloud storage systems,’’ IEEE Trans. Dependable Secure Comput., vol. 16, no. 1, pp. 72–83, Jan. 2019.
[57] P. Li, J. Li, Z. Huang, T. Li, C.-Z. Gao, S.-M. Yiu, and K. Chen, ‘‘Multi-key privacy-preserving deep learning in cloud computing,’’ Future Gener. Comput. Syst., vol. 74, pp. 76–85, Sep. 2017.
[58] X. Liang, R. Lu, X. Lin, and X. Shen, ‘‘Ciphertext policy attribute based encryption with efficient revocation,’’ Univ. Waterloo, Waterloo, ON, Canada, Tech. Rep., 2010, vol. 2, p. 8. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.584.1010&rep=rep1&type=pdf
[59] X. Liu, S. Zhao, A. Liu, N. Xiong, and A. V. Vasilakos, ‘‘Knowledge-aware proactive nodes selection approach for energy management in Internet of Things,’’ Future Gener. Comput. Syst., vol. 92, pp. 1142–1156, Mar. 2019.
[60] H. Ma, T. Peng, and Z. Liu, ‘‘Directly revocable and verifiable key-policy attribute-based encryption for large universe,’’ Int. J. Netw. Secur., vol. 19, no. 2, pp. 272–284, Mar. 2017.
[61] M. B. Mollah, M. A. K. Azad, and A. Vasilakos, ‘‘Security and privacy challenges in mobile cloud computing: Survey and way ahead,’’ J. Netw. Comput. Appl., vol. 84, pp. 38–54, Apr. 2017.
[62] S. Moon and Y. Lee, ‘‘An efficient encrypted floating-point representation using HEAAN and TFHE,’’ Secur. Commun. Netw., vol. 2020, pp. 1–18, 2020, Art. no. 1250295, doi: 10.1155/2020/1250295.
[63] P. Paillier, ‘‘Public-key cryptosystems based on composite degree residuosity classes,’’ in Advances in Cryptology (EUROCRYPT), vol. 1592. Berlin, Germany: Springer, 1999, pp. 223–238.
[64] Y. Rouselakis and B. Waters, ‘‘Practical constructions and new proof methods for large universe attribute-based encryption,’’ in Proc. ACM Conf. Comput. Commun. Secur. Berlin, Germany, 2013, pp. 463–474.
[65] O. Ruan, Y. Zhang, M. Zhang, J. Zhou, and L. Harn, ‘‘After-the-fact leakage-resilient identity-based authenticated key exchange,’’ IEEE Syst. J., vol. 12, no. 2, pp. 2017–2026, Jun. 2018.
[66] A. Sahai, H. Seyalioglu, and B. Waters, ‘‘Dynamic credentials and ciphertext delegation for attribute-based encryption,’’ in Proc. CRYPTO, Berlin, Germany: Springer, 2012, pp. 199–217.


[67] A. Sahai and B. Waters, ‘‘Fuzzy identity-based encryption,’’ in Proc. [89] J. Wei, X. Chen, X. Huang, X. Hu, and W. Susilo, ‘‘RS-HABE:
24th Annu. Int. Conf. Theory Appl. Cryptograph. Techn. Berlin, Germany: Revocable-storage and hierarchical attribute-based access scheme
Springer, 2005, pp. 457–473. for secure sharing of e-health records in public cloud,’’ IEEE
[68] A. Shamir, ‘‘Identity-based cryptosystems and signature schemes,’’ in Trans. Dependable Secure Comput., early access, Oct. 21, 2019,
Proc. Workshop Theory Appl. Cryptograph. Techn. Berlin, Germany: doi: 10.1109/TDSC.2019.2947920.
Springer, 1984, pp. 47–53. [90] J. Wei, W. Liu, and X. Hu, ‘‘Secure data sharing in cloud computing
[69] H. Shacham and B. Waters, ‘‘Compact proofs of retrievability,’’ in Proc. using revocable-storage identity-based encryption,’’ IEEE Trans. Cloud
ASIACRYPT, Melbourne, VIC, Australia, 2008, pp. 90–107. Comput., vol. 6, no. 4, pp. 1136–1148, Oct./Dec. 2018.
[70] J. Shen, J. Shen, X. Chen, X. Huang, and W. Susilo, ‘‘An efficient public [91] L. Wu, B. Liu, and W. Lin, ‘‘A dynamic data fault-tolerance mechanism
auditing protocol with novel dynamic structure for cloud data,’’ IEEE for cloud storage,’’ in Proc. EIDWT, Xi’an, China, Sep. 2013, pp. 95–99.
Trans. Inf. Forensics Security, vol. 12, no. 10, pp. 2402–2415, Oct. 2017. [92] Z. Wu, N. Xiong, W. Han, Y. N. Huang, C. Y. Hu, Q. Gu, and B. Hang,
[71] Y. Shi, Q. Zheng, J. Liu, and Z. Han, ‘‘Directly revocable key-policy ‘‘A fault-tolerant method for enhancing reliability of services composition
attribute-based encryption with verifiable ciphertext delegation,’’ Inf. Sci., application in WSNs based on BPEL,’’ Int. J. Distrib. Sensor Netw., vol. 9,
vol. 295, pp. 221–231, Feb. 2015. no. 3, Mar. 2013, Art. no. 493678.
[72] D. Song, D. Wagner, and A. Perrig, ‘‘Practical techniques for searches [93] Z. Xia, N. N. Xiong, A. V. Vasilakos, and X. Sun, ‘‘EPCBIR: An efficient
on encrypted data,’’ in Proc. IEEE SP, Berkeley, CA, USA, May 2000, and privacy-preserving content-based image retrieval scheme in cloud
pp. 44–55. computing,’’ Inf. Sci., vol. 387, pp. 195–204, May 2017.
[73] Spectralogic. Comparing File (NAS) and Block (SAN) Storage. [94] Z. Xia, T. Shi, N. N. Xiong, X. Sun, and B. Jeon, ‘‘A privacy-preserving
Accessed: Mar. 1, 2020. [Online]. Available: https://edge.spectralogic. handwritten signature verification method using combinational features
com/?&fuseaction=home.displayFile&DocID=4630 and secure KNN,’’ IEEE Access, vol. 6, pp. 46695–46705, Aug. 2018.
[95] H. Xiong, H. Zhang, and J. Sun, ‘‘Attribute-based privacy-preserving data
[74] W. Sun, B. Wang, N. Cao, M. Li, W. Lou, Y. T. Hou, and H. Li, ‘‘Verifiable
sharing for dynamic groups in cloud computing,’’ IEEE Syst. J., vol. 13,
privacy-preserving multi-keyword text search in the cloud supporting
no. 3, pp. 2739–2750, Sep. 2019.
similarity-based ranking,’’ IEEE Trans. Parallel Distrib. Syst., vol. 25,
no. 11, pp. 3025–3035, Nov. 2014. [96] H. Xiong, Y. Zhao, L. Peng, H. Zhang, and K.-H. Yeh, ‘‘Partially policy-
hidden attribute-based broadcast encryption with secure delegation in
[75] X. Sun, T. Wang, Z. Sun, P. Wang, J. Yu, and W. Xie, ‘‘An efficient
edge computing,’’ Future Gener. Comput. Syst., vol. 97, pp. 453–461,
quantum somewhat homomorphic symmetric searchable encryption,’’
Aug. 2019.
Int. J. Theor. Phys., vol. 56, no. 4, pp. 1335–1345, Apr. 2017.
[97] S. Xu, G. Yang, Y. Mu, and X. Liu, ‘‘A secure IoT cloud storage system
[76] S. Tahir, S. Ruj, Y. Rahulamathavan, M. Rajarajan, and C. Glackin,
with fine-grained access control and decryption key exposure resistance,’’
‘‘A new secure and lightweight searchable encryption scheme over
Future Gener. Comput. Syst., vol. 97, pp. 284–294, Aug. 2019.
encrypted cloud data,’’ IEEE Trans. Emerg. Topics Comput., vol. 7, no. 4, pp. 530–544, Oct. 2019.
[77] H. Takabi, E. Hesamifard, and M. Ghasemi, ‘‘Privacy preserving multi-party machine learning with homomorphic encryption,’’ in Proc. NIPS, Barcelona, Spain, 2016, pp. 1–5.
[78] Y.-Y. Teing, A. Dehghantanha, K.-K.-R. Choo, and L. T. Yang, ‘‘Forensic investigation of P2P cloud storage services and backbone for IoT networks: BitTorrent sync as a case study,’’ Comput. Electr. Eng., vol. 58, pp. 350–363, Feb. 2017.
[79] H. Teng, Y. Liu, A. Liu, N. N. Xiong, Z. Cai, T. Wang, and X. Liu, ‘‘A novel code data dissemination scheme for Internet of Things through mobile vehicle of smart cities,’’ Future Gener. Comput. Syst., vol. 94, pp. 351–367, May 2019.
[80] According to a New IDC Forecast. The Growth in Connected IoT Devices Is Expected to Generate 79.4 ZB of Data in 2025. Accessed: Mar. 1, 2020. [Online]. Available: https://www.idc.com/getdoc.jsp?containerId=prUS45213219
[81] B. Wang, B. Li, H. Li, and F. Li, ‘‘Certificateless public auditing for data integrity in the cloud,’’ in Proc. IEEE CNS, National Harbor, MD, USA, Oct. 2013, pp. 136–144.
[82] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, ‘‘Secure ranked keyword search over encrypted cloud data,’’ in Proc. ICDCS, Genova, Italy, 2010, pp. 253–262.
[83] J. Wang, C. Huang, N. N. Xiong, and J. Wang, ‘‘Blocked linear secret sharing scheme for scalable attribute based encryption in manageable cloud storage system,’’ Inf. Sci., vol. 424, pp. 1–26, Jan. 2018.
[84] J. Wang, N. N. Xiong, J. Wang, and W.-C. Yeh, ‘‘A compact ciphertext-policy attribute-based encryption scheme for the information-centric Internet of Things,’’ IEEE Access, vol. 6, pp. 63513–63526, 2018.
[85] B. Waters, ‘‘Efficient identity-based encryption without random oracles,’’ in Advances in Cryptology—EUROCRYPT, vol. 3494. Berlin, Germany: Springer, 2005, pp. 114–127.
[86] F. Wang, L. Xu, K.-K.-R. Choo, Y. Zhang, H. Wang, and J. Li, ‘‘Lightweight certificate-based public/private auditing scheme based on bilinear pairing for cloud storage,’’ IEEE Access, vol. 8, pp. 2258–2271, 2020.
[87] X. Wang, T. Luo, and J. Li, ‘‘A more efficient fully homomorphic encryption scheme based on GSW and DM schemes,’’ Secur. Commun. Netw., vol. 2018, pp. 1–14, Dec. 2018, doi: 10.1155/2018/8706940.
[88] X. Wang, X. Cheng, and Y. Xie, ‘‘Efficient verifiable key-aggregate keyword searchable encryption for data sharing in outsourcing storage,’’ IEEE Access, vol. 8, pp. 11732–11742, 2020.
[98] S. Xu, G. Yang, and Y. Mu, ‘‘Revocable attribute-based encryption with decryption key exposure resistance and ciphertext delegation,’’ Inf. Sci., vol. 479, pp. 116–134, Apr. 2019.
[99] L. Xue, Y. Yu, Y. Li, M. H. Au, X. Du, and B. Yang, ‘‘Efficient attribute-based encryption with attribute revocation for assured data deletion,’’ Inf. Sci., vol. 479, pp. 640–650, Apr. 2019.
[100] Y. Xue, K. Xue, N. Gai, J. Hong, D. S. L. Wei, and P. Hong, ‘‘An attribute-based controlled collaborative access control scheme for public cloud storage,’’ IEEE Trans. Inf. Forensics Security, vol. 14, no. 11, pp. 2927–2942, Nov. 2019.
[101] Y. Yang, X. Zheng, W. Guo, X. Liu, and V. Chang, ‘‘Privacy-preserving smart IoT-based healthcare big data storage and self-adaptive access control system,’’ Inf. Sci., vol. 479, pp. 567–592, Apr. 2019.
[102] C. Yang, Q. Chen, and Y. Liu, ‘‘Fine-grained outsourced data deletion scheme in cloud computing,’’ Int. J. Electron. Inf. Eng., vol. 11, no. 2, pp. 81–98, Dec. 2019.
[103] G. Yang, J. Yu, W. Shen, Q. Su, Z. Fu, and R. Hao, ‘‘Enabling public auditing for shared data in cloud storage supporting identity privacy and traceability,’’ J. Syst. Softw., vol. 113, pp. 130–139, Mar. 2016.
[104] H. Yin, Z. Qin, J. Zhang, L. Ou, F. Li, and K. Li, ‘‘Secure conjunctive multi-keyword ranked search over encrypted cloud data for multiple data owners,’’ Future Gener. Comput. Syst., vol. 100, pp. 689–700, Nov. 2019.
[105] Y. Yu, Y. Mu, J. Ni, J. Deng, and K. Huang, ‘‘Identity privacy-preserving public auditing with dynamic group for secure mobile cloud storage,’’ in Proc. NSS, New York, NY, USA, 2015, pp. 28–40.
[106] Z. Yu, M. H. Au, Q. Xu, R. Yang, and J. Han, ‘‘Towards leakage-resilient fine-grained access control in fog computing,’’ Future Gener. Comput. Syst., vol. 78, pp. 77–763, Jan. 2018.
[107] F. Zhang, Q. Li, and H. Xiong, ‘‘Efficient revocable key-policy attribute based encryption with full security,’’ in Proc. IEEE CIS, Guangzhou, China, Nov. 2012, pp. 477–481.
[108] J. Zhang, B. Chen, Y. Zhao, X. Cheng, and F. Hu, ‘‘Data security and privacy-preserving in edge computing paradigm: Survey and open issues,’’ IEEE Access, vol. 6, pp. 18209–18237, Mar. 2018.
[109] J. Zhang and Q. Dong, ‘‘Efficient ID-based public auditing for the outsourced data in cloud storage,’’ Inf. Sci., vols. 343–344, pp. 1–14, May 2016.
[110] J. Zhang and X. Zhao, ‘‘Efficient chameleon hashing-based privacy-preserving auditing in cloud storage,’’ Cluster Comput., vol. 19, no. 1, pp. 47–56, Mar. 2016.


[111] W. Zhang, Y. Lin, S. Xiao, J. Wu, and S. Zhou, ‘‘Privacy preserving ranked multi-keyword search for multiple data owners in cloud computing,’’ IEEE Trans. Comput., vol. 65, no. 5, pp. 1566–1577, May 2016.
[112] Z. Wei, ‘‘A pairing-based homomorphic encryption scheme for multi-user settings,’’ Int. J. Technol. Hum. Interact., vol. 12, no. 2, pp. 72–82, Apr. 2016.
[113] X. Zhang, J. Zhao, C. Xu, H. Li, H. Wang, and Y. Zhang, ‘‘CIPPPA: Conditional identity privacy-preserving public auditing for cloud-based WBANs against malicious auditors,’’ IEEE Trans. Cloud Comput., early access, Jul. 10, 2019, doi: 10.1109/TCC.2019.2927219.
[114] Y. Zhang, X. Chen, J. Li, D. S. Wong, H. Li, and I. You, ‘‘Ensuring attribute privacy protection and fast decryption for outsourced data security in mobile cloud computing,’’ Inf. Sci., vol. 379, pp. 42–61, Feb. 2017.
[115] Y. Zhang, M. Yang, D. Zheng, P. Lang, A. Wu, and C. Chen, ‘‘Efficient and secure big data storage system with leakage resilience in cloud computing,’’ Soft Comput., vol. 22, no. 23, pp. 7763–7772, Aug. 2018.
[116] D. Zhe, W. Qinghong, S. Naizheng, and Z. Yuhan, ‘‘Study on data security policy based on cloud storage,’’ in Proc. IEEE 3rd Int. Conf. Big Data Secur. Cloud (BigDataSecurity), IEEE Int. Conf. High Perform. Smart Comput. (HPSC), IEEE Int. Conf. Intell. Data Secur. (IDS), Beijing, China, May 2017, pp. 145–149.
[117] F. Yundong, W. Xiaoping, and W. Jiasheng, ‘‘Multi-authority attribute-based encryption access control scheme with hidden policy and constant length ciphertext for cloud storage,’’ in Proc. DSC, Shenzhen, China, Jun. 2017, pp. 205–212.

PAN YANG received the B.S. and M.S. degrees in pure mathematics from Zhengzhou University, Zhengzhou, China, in 2017 and 2019, respectively, where she is currently pursuing the Ph.D. degree in applied mathematics with the School of Mathematics and Statistics. Her current research interests include data security, differential equations, and data science.

NAIXUE XIONG (Senior Member, IEEE) received the Ph.D. degree from Wuhan University and the Ph.D. degree from the Japan Advanced Institute of Science and Technology. He is currently an Associate Professor (5th year) with the Department of Mathematics and Computer Science, Northeastern State University, OK, USA. He has published over 200 international journal articles and over 100 international conference papers. His research interests include cloud computing, security and dependability, parallel and distributed computing, networks, and optimization theory. He has been serving as an Associate Editor for the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, and Information Science, and an Editorial Member for over ten international journals.

JINGLI REN received the Ph.D. degree in applied mathematics from the Beijing Institute of Technology, in 2004. She became a Professor with the School of Mathematics and Statistics, Zhengzhou University, in 2006, where she is currently the Deputy Dean of the Henan Academy of Big Data. She is also a Humboldt Scholar of Germany and a Distinguished Professor of Henan Province. She has published over 80 international journal articles. Her research interests include applied mathematics, applied statistics, and data science. She has conducted four projects of the National Natural Science Foundation of China, one Alexander von Humboldt Fellowship for Experienced Researchers, and five provincial projects.
