

2017 3rd IEEE International Conference on Computer and Communications

A Blockchain-based Process Provenance for Cloud Forensics

Yong Zhang, Songyang Wu*, Bo Jin, Jiaying Du


The Third Research Institute of Ministry of Public Security
Shanghai 201204, China
e-mail: [email protected], [email protected]

Abstract-Because of the distributed nature of cloud computing, cloud data may be stored and processed across jurisdictional borders. In such cases, collecting digital evidence from the cloud requires multiparty cooperation. However, as far as we know, there is still no available technical approach for improving the trustworthiness of the interaction records of all stakeholders in cloud forensics. This work proposes a process provenance scheme, which provides proof of existence and privacy preservation for process records by using blockchain and cryptographic group signatures. The process provenance enhances the trustworthiness of the chain of custody for cloud forensics.

Keywords-cloud forensics; process provenance; blockchain

I. INTRODUCTION
With the rapid popularization of cloud computing, cloud forensics has become a research focus. However, compared with traditional digital investigation, cloud forensics faces more complex challenges. The National Institute of Standards and Technology (NIST) outlined the cloud computing forensic science challenges in “Draft NISTIR 8006” [1], where challenge 23 states that secure provenance for data capture is a useful method for ensuring the chain of custody. We argue that “process provenance” is also vital for the chain of custody. Because of the distributed nature of cloud computing, cloud data may be stored and processed across jurisdictional borders. In such a situation, collecting digital evidence from the cloud requires multiparty cooperation, rather than involving only the investigator and the cloud service provider (CSP). For example, as shown in Figure 3, investigator Alice from jurisdictional area A finds that the target data evidence is maintained by a CSP, Bob, from jurisdictional area B; Alice then has to request the administrative department, Plod, of jurisdictional area B to collaborate on the cloud forensics. After Bob collects the data, the electronic evidence may be submitted to Alice through Plod. In this case, the detailed process records of evidence requests and submissions are essential for the chain of custody. However, as far as we know, there is still no available technical approach for improving the trustworthiness of the interaction records of all stakeholders in cloud forensics.
Therefore, this work proposes a process provenance scheme for the aforementioned requirement, where multi-participant data collection is performed. The process provenance provides powerful proof of existence and anti-tampering for process records. The process provenance enhances the trustworthiness of the chain of custody.
The two most important security requirements of our scheme are anti-tampering and privacy preservation. Blockchain [2] technology has attracted interest because it maintains a distributed public ledger. The decentralized architecture of a blockchain makes tampering with the public ledger extremely challenging. This work leverages the blockchain to build the anti-tampering module. Privacy preservation is another vital demand in judicial forensics. The proposed scheme also adopts appropriate mechanisms to protect the confidentiality of the process records and the anonymity of the interacting parties, which ensures that no sensitive information leaks during the blockchain-based process provenance operation.

II. RELATED WORKS
NIST’s Draft NISTIR 8006 [1] aggregates, categorizes and discusses the forensics challenges faced by digital investigation in a cloud-computing ecosystem. The document states that forensics challenges cannot be solved by technology, law, or organizational principles alone.
Various solutions have been proposed by researchers for mitigating the challenges in cloud computing forensics. Dykstra et al. designed and implemented digital forensic tools (FROST) for the OpenStack cloud computing platform [3]. FROST supports IaaS and provides trustworthy forensic acquisition of virtual disks, API logs, and guest firewall logs. However, such work does not treat the CSP as a potentially dishonest party. Wu et al. introduced the notion of a secure logging monitor service for cloud forensics [4], which is deployed in the cloud and generates integrity proofs of cloud logs in real time. Alex et al. proposed a forensics framework for cloud computing [5]. Their solution addresses the data collection issues by introducing a centralized forensic server and a forensic layer (a forensic monitoring plane) outside the cloud infrastructure, so the scheme need not depend on the cloud service provider for collecting forensic data.
Secure provenance schemes [6, 7] that record the ownership and process history of data objects in the cloud environment have also been proposed for cloud forensics. To preserve privacy and provide data forensics in cloud computing, secure provenance must satisfy basic security requirements, including unforgeability and conditional privacy preservation. Liang et al. proposed ProvChain [8], an architecture to collect and verify cloud data provenance by embedding the data into blockchain transactions, whose security features include tamper-proofing and user privacy.
The aforementioned schemes focus on data extraction and data integrity in cloud forensics; they do not consider the trustworthiness of the process records of all stakeholders' interactions in cloud computing forensics.
In this work, we introduce the blockchain-based process provenance to create proofs of existence of the process records. Using the blockchain [2, 9] and group signatures [10], the proposed scheme achieves the goals of anti-tampering and privacy preservation.

978-1-5090-6352-9/17/$31.00 ©2017 IEEE

Figure 1. System overview

III. OUR APPROACH

A. System Overview
Figure 1 outlines the architecture of our process provenance built on a blockchain. The provenance system provides the ability to audit the process records (“submission_list” in Figure 1) for cloud forensics. The process provenance achieves the following objectives:
• Tamper-proof forensic data submission provenance. As soon as collected forensic data is submitted, a process record including the submission list and the digital signatures of both interacting parties is sent to the provenance auditor. After a while, the provenance auditor anchors the collected process records into a blockchain network; every process record is then bound to a blockchain receipt and cannot be tampered with later.
• Privacy preservation. Our process record stores only hash values of the submitted forensic data files. We use group signatures to implement the privacy preservation objective: the two interacting parties of a forensic data submission session sign the “submission_list” using their group signature keys, so that a third party cannot link the process record to the sender and the receiver.
The critical components in the process provenance are described as follows:
• Submission list. The list file can be considered as the process record. The list contains the hash values of the submitted forensics data files and the (group) signatures of the sender and the receiver.
• Receiver. A receiver (usually an investigator) sends a request to the sender for collecting the forensic data from the cloud and receives the forensic data from the sender. After the corresponding process record has been anchored on the blockchain, the receiver retrieves the process record receipts from the provenance auditor.
• Sender. A sender (usually a cloud service provider or an investigator) collects the forensics data and sends the collected data set to the receiver. After the submission is confirmed by the receiver, the sender posts the process record to the provenance auditor.
• Provenance auditor (PA). The PA receives process records from senders. Once a certain number of records has been collected, the PA anchors these data into the blockchain network and stores the corresponding blockchain receipt. The PA also uses the blockchain receipt to generate a record receipt for each process record.
• Certification Authority (CA). The CA is not shown in Figure 1. The CA is responsible for maintaining the group signature system.

B. Group Signature
Group signature [10], which provides anonymity for signers, is an essential building block in our scheme. To preserve privacy, we use the short group signature of Boneh et al. [10] as our digital signature scheme. We briefly describe the major algorithms of the group signature as follows:
• KeyGen(n). This randomized algorithm takes as input a parameter n, the number of members of the group. It selects the bilinear group pair [10] and sets up the group public key gpk, the group manager private key gmsk and each user’s private key gsk[i].
• Sign(gpk, gsk[i], M). Given a group public key gpk, a user’s key gsk[i] and a message M, compute the signature σ.


• Verify(gpk, M, σ). Given a group public key gpk, a message M, and a group signature σ, verify that σ is a valid signature on M.
• Open(gpk, gmsk, M, σ). This algorithm is used for tracing a signature to a signer. It takes as input a group public key gpk, the corresponding group manager private key gmsk, a message M and a signature σ. It recovers the signer’s identity value.
The CA in our architecture is responsible for the algorithms KeyGen() and Open().

IV. IMPLEMENTATION
In this section, we first describe the setup of the whole system, and then discuss the implementation of steps 1-8 that are illustrated in Figure 1.

A. System Setup
The CA is responsible for the key management in the system. The CA runs the algorithm KeyGen() to compute gmsk, gpk, and gsk[i] for each user, including the PA. The CA then issues the key gpk as the public key, sends gsk[i] to each user over a confidential channel and keeps gmsk secret.
To anchor data in blockchain networks, we use the Chainpoint protocol v2 [9], which collates multiple data records into a Merkle tree and publishes the Merkle root in the blockchain (such as Bitcoin or Ethereum) using a single transaction. The PA is responsible for collating process records and publishing a “data stub” into the blockchain following the Chainpoint protocol. The PA also needs to provide a blockchain receipt for each data record as the proof of existence.

B. Steps 1-8
Steps 1 and 2 are the common request and response for collecting cloud data evidence. In fact, most of the work in these two steps, such as the transfer of forensics data files, is performed outside the system. Our system only stores the corresponding submission list.
The submission list is the major material of the process record. This document records the hash values of the transferred forensics data files and the group signatures of the sender and the receiver. Because the signatures are anonymous, we designed two encrypted nonce fields for verifying the counterpart during the interaction session. The main fields of the submission list are illustrated in Figure 2.

Figure 2. Submission list

Step 2. The sender first writes the file hash values and a timestamp into the list, then uses the receiver’s public key to encrypt a random nonce1 and puts the encrypted value into the list. The sender computes a group signature on a message M that includes the hash value, timestamp and encrypted nonce1 fields, and stores this group signature in the file.
Step 3. The receiver first checks the received forensics data files against the hash values recorded in the submission list. If the check passes, the receiver confirms the submission list. The receiver decrypts the encrypted nonce1, computes nonce2 = nonce1 + 1, and encrypts nonce2 using the sender’s public key. Then the receiver computes a group signature on this document and sends the submission list back to the sender. The sender can verify that the submission list was indeed signed by the receiver by checking nonce2.
Step 4. For the sake of privacy preservation, the sender broadcasts the confirmed submission list. After receiving a confirmed submission list, the PA first verifies the received document by checking the group signatures, then computes this document’s hash and collates it into a Merkle tree.
Step 5-6. The Merkle tree maintained by the PA hosts a number of submission list records. Following the Chainpoint protocol, the PA publishes the Merkle root and anchors this value in the blockchain network using a single transaction. After the transaction is confirmed in the blockchain, the PA can fetch the block ID through the transaction ID.
Step 7. The PA constructs a blockchain receipt for each submission list record. We adopt the “Blockchain Receipt Standard” of the Chainpoint protocol [9]. The receipt contains, at minimum, the submission list hash, the Merkle tree path proof, the Merkle root, and the transaction ID.
Step 8. The receiver retrieves blockchain receipts. For privacy protection, the receiver can request all blockchain receipts of the newest Merkle tree and store only those receipts that are of interest to him.

C. Process Records Validation
Because a submission record and its blockchain receipt are generated at each node of the forensic data collection process, all stakeholders in the cloud forensics can verify the process record by linking the group signatures in the submission list to the sender and the receiver with the help of the CA (the Open() algorithm mentioned in Section III).
To validate a blockchain receipt, we first query the corresponding transaction in the blockchain network, then we compute a new Merkle root from the Merkle path proof and compare the new Merkle root with the public Merkle root anchored in the transaction.
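The collation of submission list hashes into a Merkle tree and the receipt validation described in Section IV.C can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a simple pairwise tree in which an unpaired node is promoted unchanged; the function names and the receipt field names (target_hash, proof, merkle_root, tx_id) are hypothetical and do not reproduce the exact Chainpoint receipt format. Querying the blockchain transaction is replaced by passing in the anchored root directly.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every tree level, leaves first. An unpaired node is promoted unchanged."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur, nxt = levels[-1], []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(sha256(cur[i] + cur[i + 1]))
            else:
                nxt.append(cur[i])
        levels.append(nxt)
    return levels


def make_proof(levels, index):
    """Collect (side, sibling_hash) pairs on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            side = "left" if sibling < index else "right"
            proof.append((side, level[sibling]))
        index //= 2
    return proof


def root_from_proof(leaf, proof):
    """Recompute the Merkle root from a leaf hash and its path proof."""
    node = leaf
    for side, sibling in proof:
        node = sha256(sibling + node) if side == "left" else sha256(node + sibling)
    return node


def validate_receipt(receipt, anchored_root):
    """Check the recomputed root against the receipt and the root read from the chain."""
    recomputed = root_from_proof(receipt["target_hash"], receipt["proof"])
    return recomputed == receipt["merkle_root"] == anchored_root


# Usage: three placeholder submission lists collated by the PA.
records = [b"submission list A", b"submission list B", b"submission list C"]
leaves = [sha256(r) for r in records]
levels = build_levels(leaves)
root = levels[-1][0]  # the value the PA would anchor in a single transaction
receipt = {
    "target_hash": leaves[2],
    "proof": make_proof(levels, 2),
    "merkle_root": root,
    "tx_id": "<hypothetical transaction ID>",
}
```

In this sketch a receipt is accepted only when the root recomputed from the path proof equals both the root stored in the receipt and the root read from the anchoring transaction, so changing any byte of a submission list changes its leaf hash and makes validation fail.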


Figure 3. A sample process of forensic data collection

V. ANALYSIS
In this section, we analyze the proposed scheme in terms of performance and security, and discuss its limitations.

A. Computation and Storage Overhead
The Sign() and Verify() algorithms of the group signature are relatively expensive, although each takes less than 1 second on a personal computer. However, considering that these algorithms are executed only once or twice on each node in the whole forensics data collection process, the time overhead is negligible.
An encrypted nonce costs 256 bytes (2048-bit RSA). According to [10], the group signature length is under 200 bytes; the timestamp is an 8-byte field, and a file hash value is 32 bytes long. With two encrypted nonces, two group signatures, one timestamp and n file hashes, the length of a submission list is thus (920 + n × 32) bytes. If we assume that the length is under 2 KB, a 1 TB storage disk can contain more than 500 million submission documents.
According to [9], a blockchain receipt is usually shorter than 1024 bytes, so a 1 TB storage disk can contain more than 1 billion receipts.
The PA only needs to store the newest Merkle tree and all blockchain receipts. The receiver node only stores the submission records and receipts related to it. Thus a 1 TB storage disk on each node is sufficient.

B. Security
Privacy preservation is guaranteed by the group signature scheme [10], which was proved secure under the Strong Diffie-Hellman assumption in the random oracle model. The inherent anti-tampering property of blockchains ensures that our process records can be reliably verified and are extremely difficult to tamper with.

C. Limitations
One major limitation is that the privacy of our scheme also depends on the trustworthiness of the CA; we have to assume that the CA is a trusted third party.
Another limitation is that the operation of the whole system relies on a single PA node, which must honestly perform its own tasks (anchoring data to the blockchain). A single point of failure will affect the operation of the entire system.
In future work, we will try to remove central nodes such as the CA and the PA. By designing consultation mechanisms, the functions of the CA and the PA would be carried out jointly by all nodes in the system.

VI. CONCLUSIONS
In this paper, we proposed the process provenance, which provides powerful proof of existence and privacy preservation for process records by using blockchain and cryptographic group signatures. The process provenance enhances the trustworthiness of the chain of custody for cloud forensics. However, the major limitation of our scheme is the existence of central management nodes (CA and PA): the system depends too heavily on the honest behavior of these central nodes. Addressing this is part of our future work.

ACKNOWLEDGMENT
This work was supported by the National Natural Science Foundation of China (No.61402117) and Project of The Third Research Institute of Ministry of Public Security (No.C17355).

REFERENCES
[1] NISTIR D. 8006 (2014) NIST Cloud Computing Forensic Science Challenges, accessed at http://csrc.nist.gov/publications/drafts/nistir-8006/draft_nistir_8006.pdf. Gary Palmer (2001), “A Road Map for Digital Forensic Research”[R]. Technical Report DTR-T001-01, DFRWS, Report From the.
[2] Swan M. Blockchain: Blueprint for a new economy[M]. O'Reilly Media, Inc., 2015.
[3] Dykstra J, Sherman A T. Design and implementation of FROST: Digital forensic tools for the OpenStack cloud computing platform[J]. Digital Investigation, 2013, 10(8): S87–S95.
[4] Wu S, Zhang Y. Secure logging monitor service for cloud forensics[C]//Communication Technology (ICCT), 2015 IEEE 16th International Conference on. IEEE, 2015: 757-762.
[5] Alex M E, Kishore R. Forensics framework for cloud computing[J]. Computers & Electrical Engineering, 2017.
[6] Lu R, Lin X, Liang X, et al. Secure provenance: the essential of bread and butter of data forensics in cloud computing[C]//Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security. ACM, 2010: 282-292.
[7] Wu S, Zhang Y. Secure Provenance for Data Forensics with Efficient Revocation of Anonymous Credentials in Cloud Computing[J]. Wireless Personal Communications, 2016, 90(3): 1497-1517.
[8] Liang X, Shetty S, Tosh D, et al. ProvChain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability[C]//Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE Press, 2017: 468-477.
[9] Chainpoint: A scalable protocol for anchoring data in the blockchain and generating blockchain receipts, http://www.chainpoint.org/.
[10] Boneh D, Boyen X, Shacham H. Short group signatures[C]//Crypto. 2004, 3152: 41-55.


