Big Data Privacy
Abstract
Big data is a term used for very large data sets that have a more varied and complex structure. These characteristics usually correlate with additional difficulties in storing, analyzing and applying further procedures or extracting results. Big data analytics is the term used to describe the process of researching massive amounts of complex data in order to reveal hidden patterns or identify secret correlations. However, there is an obvious contradiction between the security and privacy of big data and its widespread use. This paper focuses on privacy and security concerns in big data, differentiates between privacy and security, and describes privacy requirements in big data. It covers uses of privacy through existing methods such as HybrEx, k-anonymity, T-closeness and L-diversity and their implementation in business. A number of privacy-preserving mechanisms have been developed for privacy protection at different stages (for example, data generation, data storage, and data processing) of the big data life cycle. The goal of this paper is to provide a major review of the privacy preservation mechanisms in big data and to present the challenges for existing mechanisms. The paper also presents recent techniques of privacy preserving in big data, such as hiding a needle in a haystack, identity based anonymization, differential privacy, privacy-preserving big data publishing and fast anonymization of big data streams, and discusses privacy and security aspects of healthcare in big data. A comparative study between various recent techniques of big data privacy is given as well.
Background
Big data [1, 2] specifically refers to data sets that are so large or complex that traditional data processing applications are not sufficient. It is the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. Due to recent technological development, the amount of data generated by the internet, social networking sites, sensor networks, healthcare applications, and many other sources is drastically increasing day by day. This enormous measure of data, produced from various sources in multiple formats at very high speed [3], is referred to as big data. The term big data [4, 5] is defined as “a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery and analysis”. On the basis of this definition, the properties of big data are reflected by the 3V’s: volume, velocity and variety. Later studies pointed out that the 3V definition is insufficient to explain the big data we face now; thus veracity, validity, value, variability, venue, vocabulary, and vagueness were added to complement the explanation of big data [6]. A common theme of big data is that the data are diverse, i.e., they may contain text, audio, image, video, etc. This diversity of data is signified by variety.
In order to ensure big data privacy, various mechanisms have been developed in recent years. These mechanisms can be grouped based on the stages of the big data life cycle [7] (Fig. 1): data generation, data storage, and data processing. In the data generation phase, access restriction and data falsification techniques are used for the protection of privacy. The approaches to privacy protection in the data storage phase are chiefly based on encryption procedures. Encryption based techniques can be further divided into Identity Based Encryption (IBE), Attribute Based Encryption (ABE) and storage path encryption. In addition, to protect sensitive information, hybrid clouds are utilized, where sensitive data are stored in a private cloud. The data processing phase incorporates Privacy Preserving Data Publishing (PPDP) and knowledge extraction from the data. In PPDP, anonymization techniques such as generalization and suppression are utilized to protect the privacy of data. Knowledge extraction mechanisms can be further divided into clustering, classification and association rule mining based techniques. While clustering and classification split the input data into various groups, association rule mining based techniques find the useful relationships and trends in the input data [8]. To handle the diverse dimensions of big data in terms of volume, velocity, and variety, efficient and effective frameworks are needed to process the expansive measure of data arriving at very high speed from various sources. Big data passes through multiple phases during its life cycle.
Fig. 1 The big data life cycle: the stages of data generation, data storage, and data processing are shown
As of 2012, 2.5 quintillion bytes of data were created daily. The volumes of data are vast, the generation speed of data is fast and the data/information space is global [9]. Lightweight incremental algorithms should be considered that are capable of achieving robustness, high accuracy and minimum pre-processing latency. For example, in the case of mining, a lightweight feature selection method using Swarm Search and Accelerated PSO can be used in place of traditional classification methods [10]. Further ahead, the Internet of Things (IoT) will lead to the connection of all the things that people care about in the world, due to which much more data will be produced than today [11]. Indeed, IoT is one of the major driving forces for big data analytics [9].
In today’s digital world, where large amounts of information are stored as big data, the analysis of these databases can provide opportunities to solve big problems of society, such as healthcare. Smart energy big data analytics is also a very complex and challenging topic that shares many common issues with generic big data analytics. Smart energy big data are extensively involved with physical processes, where data intelligence can have a huge impact on the safe operation of systems in real time [12]. Such analysis can also be useful for marketing and other commercial companies to grow their business. However, as these databases contain personal information, providing direct access to researchers and analysts is risky: the privacy of individuals can be leaked, which causes harm and is also illegal. The paper is based on research not restricted to a specific timeline; as the references suggest, the papers reviewed range from as old as 1998 to papers published in 2016. The number of papers retrieved from the keyword-based search can be verified from the presence of references based on those keywords. “Privacy and security concerns” section discusses privacy and security concerns in big data, and “Privacy requirements in big data” section covers the privacy requirements in big data. “Big data privacy in data generation phase”, “Big data privacy in data storage phase” and “Big data privacy preserving in data processing” sections discuss big data privacy in the data generation, data storage, and data processing phases. “Privacy preserving methods in big data” section covers the privacy preserving techniques using big data. “Recent techniques of privacy preserving in big data” section presents some recent techniques of big data privacy and a comparative study between these techniques.
Privacy
Information privacy is the privilege to have some control over how personal information is collected and used. Information privacy is the capacity of an individual or group to stop information about themselves from becoming known to people other than those they give the information to. One serious user privacy issue is the identification of personal information during transmission over the Internet [13].
Security
Security is the practice of defending information and information assets through the use of technology, processes and training from: unauthorized access, disclosure, disruption, modification, inspection, recording, and destruction.
Enforcing privacy in big data analytics platforms requires:

1. The specification of privacy policies managing the access to data stored in target big data platforms,
2. The generation of enforcement monitors for these policies, and
3. The integration of the generated monitors into the target analytics platforms.

Enforcement techniques proposed for traditional DBMSs appear inadequate for the big data context due to the strict execution requirements needed to handle large data volumes, the heterogeneity of the data, and the speed at which data must be analysed.
Businesses and government agencies are generating and continuously collecting large amounts of data. The current increased focus on substantial sums of data will undoubtedly create opportunities and avenues to understand the processing of such data over numerous varying domains. But the potential of big data comes with a price: the users’ privacy is frequently at risk. Ensuring conformance to privacy terms and regulations is constrained in current big data analytics and mining practices. Developers should be able to verify that their applications conform to privacy agreements and that sensitive information is kept private regardless of changes in the applications and/or privacy regulations. To address these challenges, we identify a need for new contributions in the areas of formal methods and testing procedures, including new paradigms for privacy conformance testing applied to the four areas of the ETL (Extract, Transform, and Load) process, as shown in Fig. 2 [15, 16].
Fig. 2 Big data architecture and testing areas: the four areas of the ETL (Extract, Transform, and Load) process targeted by the new paradigms for privacy conformance testing are shown here
1. Pre-Hadoop process validation This step represents the data loading process. At this step, the privacy specifications characterize the sensitive pieces of data that can uniquely identify a user or an entity. Privacy terms can likewise indicate which pieces of data can be stored and for how long. Schema restrictions can take place at this step as well.
2. Map-reduce process validation This step converts the loaded data into a usable format for analysis; privacy terms need to be verified while the data is transformed and aggregated by map-reduce jobs.
3. ETL process validation Similar to the loading step, privacy verification is needed when the aggregated data is transformed and loaded into the data warehouse.
4. Reports testing Reports are another form of queries, conceivably with higher visibility and a wider audience. Privacy terms that characterize ‘purpose’ are fundamental to check that sensitive data is not reported, with the exception of specified uses.
Big data privacy in data generation phase
Data generation can be classified into active data generation and passive data generation. By active data generation, we mean that the data owner will give the data to a third party [17], while passive data generation refers to circumstances in which the data are produced by the data owner’s online actions (e.g., browsing) and the data owner may not be aware that the data are being gathered by a third party. The risk of privacy violation during data generation can be minimized either by restricting access or by falsifying data.
1. Access restriction If the data owner thinks that the data may uncover sensitive information which is not supposed to be shared, it can refuse to provide such data. If the data owner is providing the data passively, a few measures can be taken to ensure privacy, such as anti-tracking extensions, advertisement or script blockers and encryption tools.
2. Falsifying data In some circumstances, it is not possible to prevent access to sensitive data. In that case, the data can be distorted using certain tools before it is obtained by a third party, so that the true information cannot easily be revealed.
Big data privacy in data storage phase
The conventional security mechanisms to protect data can be divided into four categories: file level data security schemes, database level data security schemes, media level security schemes and application level encryption schemes [20]. Responding to the 3V nature of big data analytics, the storage infrastructure ought to be scalable and able to be configured dynamically to accommodate various applications. One promising technology to address these requirements is storage virtualization, empowered by the emerging cloud computing paradigm [21]. Storage virtualization is a process in which numerous network storage devices are combined into what appears to be a single storage device. SecCloud is one of the models for data security in the cloud that jointly considers both data storage security and computation auditing security [22]. However, it offers only a limited discussion of data privacy when data is stored in the cloud.
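To make application level encryption concrete, the following minimal sketch (an illustration only, assuming Python with the third-party cryptography package; the record content and key handling are hypothetical) encrypts a record on the private side so that only ciphertext ever reaches an outside storage provider:

```python
# Minimal sketch: application level encryption before data leaves for
# cloud storage, using the third-party "cryptography" package.
from cryptography.fernet import Fernet

# Hypothetical setup: the key stays on-premises (e.g., in the private
# cloud); only ciphertext is handed to the public storage provider.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b'{"patient_id": 4711, "diagnosis": "hypertension"}'
token = cipher.encrypt(record)   # ciphertext, safe to store remotely

# Later, an authorized on-premises consumer recovers the plaintext.
assert cipher.decrypt(token) == record
```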
De-identification
De-identification [29, 30] is a traditional technique for privacy-preserving data mining, where, in order to protect individual privacy, data should first be sanitized with generalization (replacing quasi-identifiers with less particular but semantically consistent values) and suppression (not releasing some values at all) before release for data mining. To mitigate the threats from re-identification, the concepts of k-anonymity [29, 31, 32], l-diversity [30, 31, 33] and t-closeness [29, 33] have been introduced to enhance traditional privacy-preserving data mining. De-identification is a crucial tool in privacy protection and can be migrated to privacy-preserving big data analytics. Nonetheless, as an attacker can possibly obtain more external information to assist re-identification in big data, we have to be aware that big data can also increase the risk of re-identification. As a result, de-identification alone is not sufficient for protecting big data privacy. Privacy-preserving big data analytics is still challenging due to either the issues of flexibility and effectiveness or the risks of re-identification. De-identification becomes more feasible for privacy-preserving big data analytics if efficient privacy-preserving algorithms are developed to help mitigate the risk of re-identification.
There are three privacy-preserving methods of de-identification, namely k-anonymity, l-diversity and t-closeness. Some common terms are used in the privacy field of these methods:

Identifier An attribute, such as name or social security number, that uniquely identifies an individual.
Quasi-identifier A set of attributes, such as age, gender and zip code, that can be combined with external data to re-identify an individual.
Sensitive attribute Private information, such as disease or salary, that should not be linkable to an individual.
Equivalence class A set of records that are indistinguishable from each other with respect to the quasi-identifiers.
K-anonymity
A release of data is said to have the k-anonymity property if the information for each individual contained in the release cannot be distinguished from at least k − 1 other individuals in the data. There are six attributes along with ten records in this data. There are two regular techniques for accomplishing k-anonymity for some value of k:

1. Suppression In this technique, certain values of the attributes are replaced by an asterisk ‘*’; all or some values of a column may be replaced.
2. Generalization In this technique, individual values of attributes are replaced with a broader category; for example, the value ‘19’ of the attribute ‘Age’ may be replaced by ‘≤ 20’ (Table 3).

Table 3 2-anonymity with respect to the attributes ‘Age’, ‘Gender’ and ‘State of domicile’
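As a minimal sketch (assuming Python with pandas; the table content is a small hypothetical stand-in for Table 3), k-anonymity can be checked by grouping records on the quasi-identifiers and requiring every group to contain at least k records:

```python
# Sketch: checking k-anonymity; every combination of quasi-identifier
# values must occur in at least k records.
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    counts = df.groupby(quasi_identifiers).size()
    return bool((counts >= k).all())

df = pd.DataFrame({
    "Age":     ["<=30", "<=30", "31-40", "31-40"],
    "Gender":  ["F", "F", "M", "M"],
    "State":   ["MP", "MP", "MP", "MP"],
    "Disease": ["Flu", "Cancer", "Flu", "Asthma"],  # sensitive attribute
})
print(is_k_anonymous(df, ["Age", "Gender", "State"], k=2))  # True
```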
L-diversity
It is a form of group based anonymization that is utilized to safeguard privacy in data sets by reducing the granularity of data representation. This reduction is a trade-off that results in some loss of effectiveness of data management or mining algorithms in order to gain some privacy. The l-diversity model (Distinct, Entropy, Recursive) [29, 31, 34] is an extension of the k-anonymity model, which diminishes the granularity of data representation using methods including generalization and suppression so that any given record maps onto at least k different records in the data. The l-diversity model handles some of the weaknesses of the k-anonymity model, in which protecting identities to the level of k individuals is not equal to protecting the corresponding sensitive values that were generalized or suppressed, particularly when the sensitive values within a group exhibit homogeneity. The l-diversity model promotes intra-group diversity for sensitive values in the anonymization mechanism. The problem with this method is that it depends upon the range of the sensitive attribute: if one wants to make data l-diverse but the sensitive attribute has fewer distinct values than required, fictitious data must be inserted. This fictitious data improves security but may cause problems during analysis. The l-diversity method is also subject to skewness and similarity attacks [34] and thus cannot prevent attribute disclosure.
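Reusing the hypothetical table and quasi-identifiers from the k-anonymity sketch above, distinct l-diversity can be checked by counting the distinct sensitive values inside each equivalence class:

```python
# Sketch: distinct l-diversity; each equivalence class (records sharing
# the same quasi-identifier values) needs >= l distinct sensitive values.
def is_l_diverse(df, quasi_identifiers, sensitive, l):
    distinct = df.groupby(quasi_identifiers)[sensitive].nunique()
    return bool((distinct >= l).all())

# The 2-anonymous table above is also 2-diverse in "Disease": both
# equivalence classes contain two distinct diseases.
print(is_l_diverse(df, ["Age", "Gender", "State"], "Disease", l=2))  # True
```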
T-closeness
It is a further refinement of l-diversity group based anonymization that is used to preserve privacy in data sets by decreasing the granularity of a data representation. This reduction is a trade-off that results in some loss of adequacy of data management or mining algorithms in order to gain some privacy. The t-closeness model (Equal/Hierarchical distance) [29, 33] extends the l-diversity model by treating the values of an attribute distinctly, taking into account the distribution of data values for that attribute: an equivalence class has t-closeness if the distance between the distribution of the sensitive attribute in the class and its distribution in the whole table is no more than a threshold t.
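As an illustration of the equal-distance variant (not the authors' implementation; attribute values are hypothetical), the Earth Mover's Distance between two categorical distributions with an equal ground distance reduces to the total variation distance, half the L1 difference:

```python
# Sketch: t-closeness with the equal-distance metric, where EMD between
# categorical distributions equals the total variation distance.
from collections import Counter

def distribution(values):
    total = len(values)
    return {v: c / total for v, c in Counter(values).items()}

def equal_distance_emd(p, q):
    support = set(p) | set(q)
    return sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support) / 2

def satisfies_t_closeness(class_values, table_values, t):
    return equal_distance_emd(distribution(class_values),
                              distribution(table_values)) <= t

table_diseases = ["Flu", "Flu", "Cancer", "Asthma", "Flu", "Cancer"]
class_diseases = ["Flu", "Cancer"]   # one equivalence class
print(satisfies_t_closeness(class_diseases, table_diseases, t=0.2))  # True
```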
Table 4 Existing de-identification privacy preserving measures and their limitations in big data
HybrEx
The HybrEx (hybrid execution) model uses public clouds only for an organization's non-sensitive data and computation, while sensitive data and computation remain in the private cloud. The four categories in which HybrEx MapReduce enables new kinds of applications that utilize both public and private clouds are as follows:

1. Map hybrid The map phase is executed in both the public and the private clouds, while the reduce phase is executed in only one of the clouds, as shown in Fig. 3a.
Fig. 3 The four categories of HybrEx MapReduce: a map hybrid, b vertical partitioning, c horizontal partitioning, d hybrid

2. Vertical partitioning It is shown in Fig. 3b. Map and reduce tasks are executed in the public cloud using public data as input, shuffling intermediate data amongst themselves and storing the result in the public cloud; the same work is done in the private cloud with private data. The jobs are processed in isolation.
3. Horizontal partitioning The map phase is executed at public clouds only, while the reduce phase is executed at a private cloud, as can be seen in Fig. 3c.

4. Hybrid As shown in Fig. 3d, the map phase and the reduce phase are executed on both public and private clouds. Data transmission among the clouds is also possible.
Integrity check models of full integrity and quick integrity checking are suggested as well. The problem with HybrEx is that it does not deal with the keys generated at the public and private clouds in the map phase, and that it considers only the cloud as an adversary.
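The following conceptual sketch (plain Python standing in for Hadoop, with hypothetical data; not the HybrEx implementation) mimics the map hybrid pattern of Fig. 3a: the map phase runs separately over the public and private partitions, and a single cloud performs the reduce:

```python
# Conceptual sketch of the "map hybrid" pattern: map work runs on each
# partition in its own cloud; the reduce runs in one cloud only.
from collections import defaultdict

def run_map(records, mapper):
    return [pair for record in records for pair in mapper(record)]

def word_count_mapper(line):
    return [(word, 1) for word in line.split()]

public_data  = ["open report text", "open text"]   # public cloud
private_data = ["confidential patient report"]     # private cloud

# Map phase executes in both clouds, each over its own partition.
intermediate = (run_map(public_data, word_count_mapper)
                + run_map(private_data, word_count_mapper))

# Reduce phase executes in a single (here: the private) cloud.
totals = defaultdict(int)
for word, count in intermediate:
    totals[word] += count
print(dict(totals))
```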
Privacy-preserving aggregation
Privacy-preserving aggregation [38] is built on homomorphic encryption and is used as a popular data collecting technique for event statistics. Given a homomorphic public key encryption algorithm, different sources can use the same public key to encrypt their individual data into ciphertexts [39]. These ciphertexts can be aggregated, and the aggregated result can be recovered with the corresponding private key. However, aggregation is purpose-specific: privacy-preserving aggregation can protect individual privacy in the phases of big data collection and storage, but because of its inflexibility it cannot run complex data mining to exploit new knowledge. As such, privacy-preserving aggregation is insufficient for big data analytics.
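A minimal sketch of such additively homomorphic aggregation, assuming the third-party python-paillier package (phe) and hypothetical sensor readings, is:

```python
# Sketch: additively homomorphic aggregation with Paillier encryption,
# using the third-party python-paillier package (pip install phe).
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Each source encrypts its individual value with the shared public key.
readings = [3, 7, 12]
ciphertexts = [public_key.encrypt(r) for r in readings]

# The aggregator adds ciphertexts without seeing any plaintext value.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = encrypted_total + c

# Only the holder of the private key recovers the aggregate statistic.
print(private_key.decrypt(encrypted_total))  # 22
```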
Differential privacy
Differential privacy [40] is a technology that provides researchers and database analysts a facility to obtain useful information from databases that contain personal information without revealing the personal identities of the individuals. This is done by introducing a minimum distortion into the information provided by the database system. The distortion introduced is large enough to protect privacy, yet small enough that the information provided to the analyst is still useful. Earlier, some techniques had been used to protect privacy, but they proved to be unsuccessful. Differential privacy (DP) provides a solution to this problem, as shown in Fig. 4. In DP, analysts are not provided direct access to the database containing personal information. An intermediary piece of software, also called the privacy guard, is introduced between the database and the analyst to protect privacy.
Fig. 4 Differential privacy: a privacy guard mediates between the analyst and the database containing personal information
Step 1 The analyst makes a query to the database through the intermediary privacy guard.
Step 2 The privacy guard takes the query from the analyst and evaluates this query, together with earlier queries, for privacy risk.
Step 3 After evaluation of the privacy risk, the privacy guard gets the answer from the database.
Step 4 The guard adds some distortion to the answer according to the evaluated privacy risk and finally provides it to the analyst.
The amount of distortion added to the pure data is proportional to the evaluated privacy risk. If the privacy risk is low, the distortion added is small enough not to affect the quality of the answer, but still large enough to protect individual privacy in the database; if the privacy risk is high, more distortion is added.
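A standard concrete instance of this idea is the Laplace mechanism: for a count query, whose sensitivity is 1, adding Laplace noise with scale sensitivity/epsilon yields epsilon-differential privacy. The sketch below (assuming Python with NumPy; the counts are hypothetical) shows how a smaller epsilon, i.e., a stricter privacy budget, means more distortion:

```python
# Sketch: the Laplace mechanism as a simple privacy guard for counts.
import numpy as np

def private_count(true_count, epsilon, sensitivity=1.0):
    # Perturb the exact answer before releasing it to the analyst.
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

true_answer = 42  # e.g., number of patients with a given condition
print(private_count(true_answer, epsilon=0.5))  # heavy noise, strong privacy
print(private_count(true_answer, epsilon=5.0))  # light noise, weak privacy
```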
Identity based anonymization
To meet these objectives, Intel created an open architecture for anonymization [41] that allowed a variety of tools to be utilized for both de-identifying and re-identifying web log records. In the process of implementing this architecture, Intel found that enterprise data has properties different from the standard examples in the anonymization literature [43]. This work showed that big data techniques could yield benefits in the enterprise environment even when working on anonymized data. Intel also found that, despite masking obvious Personal Identification Information like usernames and IP addresses, the anonymized data was vulnerable to correlation attacks. They explored the trade-offs of correcting these vulnerabilities and found that User Agent (Browser/OS) information strongly correlates to individual users. This is a case study of an anonymization implementation in an enterprise, describing requirements, implementation, and experiences encountered when utilizing anonymization to protect privacy in enterprise data analysed using big data techniques. The investigation of the quality of anonymization used k-anonymity based metrics. Intel used Hadoop to analyse the anonymized data and acquire valuable results for the Human Factors analysts [44, 45]. At the same time, Intel learned that anonymization needs to be more than simply masking or generalizing certain fields: anonymized datasets need to be carefully analysed to determine whether they are vulnerable to attack.
Fig. 5 Intel’s open architecture for de-identifying and re-identifying web log records
Hiding a needle in a haystack
In Fig. 6, the service provider adds a dummy item as noise to the original transaction data collected by the data provider. Subsequently, a unique code is assigned to the dummy and the original items. The service provider maintains the code information so as to filter out the dummy item after the extraction of frequent item sets by an external cloud platform. The Apriori algorithm is performed by the external cloud platform using the data sent by the service provider. The external cloud platform returns the frequent item sets and support values to the service provider. The service provider filters out the frequent item sets affected by the dummy item using the code, extracting the correct association rules from the frequent item sets without the dummy item. The process of extracting the association rules is not a burden to the service provider, considering that the amount of calculation required for extracting the association rules is small.
Fig. 6 Overview of the process of association rule mining: the service provider adds a dummy item as noise to the original transaction data collected by the data provider
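The following conceptual sketch (hypothetical items and codes; simple pair counting stands in for a full Apriori implementation) illustrates the flow: the provider codes the items, injects a dummy, lets the untrusted platform count frequent pairs, and then filters out any item set containing the dummy code:

```python
# Conceptual sketch of the dummy-item ("needle in a haystack") scheme.
from itertools import combinations
from collections import Counter

transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk"}]
code = {item: f"I{i}"
        for i, item in enumerate(sorted(set().union(*transactions)))}
DUMMY = "I_DUMMY"

# Service provider side: code the items and add the dummy as noise.
coded = [{code[item] for item in t} | {DUMMY} for t in transactions]

# Untrusted cloud side: count frequent pairs over the coded data.
pair_counts = Counter(frozenset(pair)
                      for t in coded
                      for pair in combinations(sorted(t), 2))

# Service provider side: drop item sets touching the dummy, then decode.
decode = {v: k for k, v in code.items()}
for pair, support in pair_counts.items():
    if DUMMY not in pair and support >= 2:
        print({decode[c] for c in pair}, support)   # {'bread', 'milk'} 2
```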
Despite the fact that k-anonymity can prevent identity attacks, it fails to protect against attribute disclosure attacks because of the lack of diversity in the sensitive attribute within the equivalence class. The l-diversity model mandates that each equivalence class must have at least l well-represented sensitive values. It is common for large data sets to be processed with distributed platforms such as the MapReduce framework [51, 52] in order to distribute a costly process among multiple nodes and accomplish considerable performance improvement. Therefore, in order to resolve this inefficiency, improvements of privacy models are introduced.
MapReduce-based l-diversity
The extension of the privacy model from k-anonymity to l-diversity requires the integration of sensitive values into either the output keys or values of the mapper. Thus, the pairs generated by mappers and combiners need to be appropriately modified. Unlike the mapper in k-anonymity, the mapper in l-diversity receives both the quasi-identifiers and the sensitive attribute as input [50].
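A minimal sketch of this mapper/reducer shape (plain Python standing in for the MapReduce framework; the records and attribute names are hypothetical) is:

```python
# Sketch: an l-diversity style mapper keys each record by its generalized
# quasi-identifiers and carries the sensitive value, so the reducer can
# test distinct l-diversity per equivalence class.
from collections import defaultdict

def mapper(record):
    *quasi_ids, sensitive = record   # last field is the sensitive attribute
    yield tuple(quasi_ids), sensitive

def reducer(key, sensitive_values, l=2):
    return key, len(set(sensitive_values)) >= l

records = [("<=30", "F", "MP", "Flu"), ("<=30", "F", "MP", "Cancer")]
groups = defaultdict(list)          # stands in for the shuffle phase
for record in records:
    for key, value in mapper(record):
        groups[key].append(value)
for key, values in groups.items():
    print(reducer(key, values))     # (('<=30', 'F', 'MP'), True)
```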
Fast anonymization of big data streams
Anonymizing big data streams raises the following challenges:

1. Unlike static data, data streams need real-time processing, and the existing k-anonymity approaches are NP-hard, as proved.
2. For the existing static k-anonymization algorithms to reduce information loss, data must be repeatedly scanned during the anonymization procedure; the same process is impossible in data stream processing.
3. The scale of the data streams that need to be anonymized in some applications is increasing tremendously.

Data streams have become so large that anonymizing them is becoming a challenge for existing anonymization algorithms.
To cope with the first and second aforementioned challenges, the FADS algorithm was chosen, as it is the best choice for data stream anonymization. But it has two main drawbacks:

1. The FADS algorithm handles tuples sequentially, so it is not suitable for big data streams.
2. Some tuples may remain in the system for quite a while and are only discharged when a specified threshold comes to an end.
This work provided three contributions. First, parallelism is utilized to expand the effectiveness of the FADS algorithm and make it applicable to big data stream anonymization. Second, a simple proactive heuristic, estimated round-time, is proposed to prevent the publishing of a tuple after its expiration. Third, experimental results illustrate that FAST is more efficient and effective than FADS and other existing algorithms, while it noticeably diminishes the information loss and cost metric during the anonymization process.
Proactive heuristic
In FADS, a new parameter is considered that represents the maximum delay tolerable for an application. This parameter is called expiration-time. To avert a tuple being published after its expiration-time has passed, a simple heuristic, estimated-round-time, is defined. In FADS there is no check of whether a tuple can remain longer in the system or not; as a result, some tuples are published after expiration. This issue violates the real-time condition of a data stream application and also increases the cost metric notably.
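The sketch below (illustrative only, not the FAST or FADS code; parameter values are hypothetical) captures the heuristic: a tuple is published early once its remaining lifetime drops below the estimated duration of one more anonymization round:

```python
# Sketch: the estimated-round-time heuristic for stream anonymization.
import time

class ProactivePublisher:
    def __init__(self, expiration_time):
        self.expiration_time = expiration_time   # max tolerable delay (s)
        self.estimated_round_time = 0.0          # running average (s)

    def update_round_time(self, observed, alpha=0.3):
        # Exponential moving average of observed round durations.
        if self.estimated_round_time == 0.0:
            self.estimated_round_time = observed
        else:
            self.estimated_round_time = (alpha * observed
                                         + (1 - alpha) * self.estimated_round_time)

    def must_publish_now(self, arrival_ts):
        # Publish if the tuple cannot survive one more round.
        remaining = self.expiration_time - (time.time() - arrival_ts)
        return remaining <= self.estimated_round_time

publisher = ProactivePublisher(expiration_time=5.0)
publisher.update_round_time(observed=1.2)
# A tuple that arrived 4 s ago has ~1 s left: publish it now.
print(publisher.must_publish_now(arrival_ts=time.time() - 4.0))  # True
```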
Privacy and security aspects of healthcare in big data
A comprehensive survey of the different tools and techniques used in pervasive healthcare has been presented in a disease-specific manner. It covered the major diseases and disorders that can be quickly detected and treated with the use of technology, such as fatal and non-fatal falls, Parkinson’s disease, cardiovascular disorders, stress, etc. Different pervasive healthcare techniques are available to address those diseases and many other permanent handicaps, like blindness, motor disabilities and paralysis, and a plethora of pervasive healthcare products is commercially available. This provides an understanding of the various aspects of pervasive healthcare with respect to different diseases [63].
Adoption of big data in healthcare significantly increases security and patient privacy
concerns. At the outset, patient information is stored in data centres with varying levels of
security. Traditional security solutions cannot be directly applied to large and inherently
diverse data sets. With the increase in popularity of healthcare cloud solutions, complexity in
securing massive distributed Software as a Service (SaaS) solutions increases with varying
data sources and formats. Hence, big data governance is necessary prior to exposing data to
analytics.
Data governance
1. Analysing security risks and predicting threat sources in real time is of utmost need in the burgeoning healthcare industry.

2. Invasion of patient privacy is a growing concern in the domain of big data analytics.
Further challenges concern the quality of health data:

1. Health data is usually collected from different sources with totally different set-ups and database designs, which makes the data complex, dirty, with a lot of missing data and different coding standards for the same fields.

2. Although problematic handwriting is no longer an issue in EHR systems, the data collected via these systems is not mainly gathered for analytical purposes and contains many issues (missing data, incorrectness, miscoding) due to clinicians’ workloads, user interfaces that are not user friendly, and the absence of validity checks by humans [66].
Data sharing and privacy
1. Since health data contains personal health information (PHI), there are legal difficulties in accessing the data due to the risk of invading privacy.

2. Health data can be anonymized using masking and de-identification techniques and disclosed to researchers on the basis of a legal data sharing agreement [67].

3. If the data is anonymized too much with the aim of protecting privacy, it loses its quality and is no longer useful for analysis; striking a balance between the privacy-protection elements (anonymization, sharing agreement, and security controls) is essential in order to obtain data that is usable for analytics.
Relying on predictive models
1. There should not be unrealistic expectations of the constructed data mining models; every model has a limited accuracy.

2. The underlying mathematics of almost all data mining techniques is complex and not easily understandable for non-technical people; thus, clinicians and epidemiologists have usually preferred to continue working with traditional statistical methods.

3. It is essential for the data analyst to be familiar with the different techniques, and also with the different accuracy measurements, in order to apply multiple techniques when analysing a specific dataset.
Summary on recent approaches used in big data privacy
In this section, a summary of recent approaches used in big data privacy is given. Table 5 is presented here, comprising different papers, the methods they introduce, their focus and their demerits. It presents an overview of the work done till now in the field of big data privacy.
References
1. Abadi DJ, Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Stonebraker M, Tatbul N, Zdonik SB. Aurora: a new model and architecture for data stream management. VLDB J. 2003;12(2):120–39.
5. Gantz J, Reinsel D. Extracting value from chaos. In: Proc on IDC
IView. 2011. p. 1–12.
6. Tsai C-W, Lai C-F, Chao H-C, Vasilakos AV. Big data analytics: a
survey. J Big Data Springer Open J. 2015.
7. Mehmood A, Natgunanathan I, Xiang Y, Hua G, Guo S. Protection of big data privacy. IEEE Access. 2016.
8. Jain P, Pathak N, Tapashetti P, Umesh AS. Privacy preserving processing of data decision tree based on sample selection and singular value decomposition. In: 9th international conference on information assurance and security (IAS). 2013.
9. Qin Y, et al. When things matter: a survey on data-centric internet of
things. J Netw Comp Appl. 2016;64:137–53.
51. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. In: OSDI. 2004.
52. Lämmel R. Google’s MapReduce programming model revisited. Sci Comput Program. 2008;70(1):1–30.
68. Wu X. Data mining with big data. IEEE Trans Knowl Data Eng.
2014;26(1):97–107.
Authors’ contributions
PJ performed the primary literature review and analysis for this manuscript work. MG
worked with PJ to develop the article framework and focus, and MG also drafted the
manuscript. NK introduced this topic to PJ. MG and NK revised the manuscript for important
intellectual content and have given final approval of the version to be published. All authors
read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Author information
Affiliations
1. Computer Science Department, MANIT, Bhopal, India
Priyank Jain, Manasi Gyanchandani & Nilay Khare
Corresponding author
Correspondence to Priyank Jain.