CURATE: Scaling-Up Differentially Private Causal Graph Discovery

Bhattacharjee, Payel; Tandon, Ravi

doi:10.3390/e26110946

Open AccessArticle

CURATE: Scaling-Up Differentially Private Causal Graph Discovery

by

Payel Bhattacharjee

^*,†

and

Ravi Tandon

^†

Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, USA

^*

Author to whom correspondence should be addressed.

^†

This paper is an extended version of our published paper: Bhattacharjee, P.; Tandon, R. Adaptive Privacy for Differentially Private Causal Graph Discovery. In Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024, London, UK, 22–25 September 2024.

Entropy 2024, 26(11), 946; https://doi.org/10.3390/e26110946

Submission received: 9 September 2024 / Revised: 1 November 2024 / Accepted: 3 November 2024 / Published: 5 November 2024

(This article belongs to the Special Issue Information-Theoretic Security and Privacy)

Download

Browse Figures

Versions Notes

Abstract

:

Causal graph discovery (CGD) is the process of estimating the underlying probabilistic graphical model that represents the joint distribution of features of a dataset. CGD algorithms are broadly classified into two categories: (i) constraint-based algorithms, where the outcome depends on conditional independence (CI) tests, and (ii) score-based algorithms, where the outcome depends on optimized score function. Because sensitive features of observational data are prone to privacy leakage, differential privacy (DP) has been adopted to ensure user privacy in CGD. Adding the same amount of noise in this sequential-type estimation process affects the predictive performance of algorithms. Initial CI tests in constraint-based algorithms and later iterations of the optimization process of score-based algorithms are crucial; thus, they need to be more accurate and less noisy. Based on this key observation, we present CURATE (CaUsal gRaph AdapTivE privacy), a DP-CGD framework with adaptive privacy budgeting. In contrast to existing DP-CGD algorithms with uniform privacy budgeting across all iterations, CURATE allows for adaptive privacy budgeting by minimizing error probability (constraint-based), maximizing iterations of the optimization problem (score-based) while keeping the cumulative leakage bounded. To validate our framework, we present a comprehensive set of experiments on several datasets and show that CURATE achieves higher utility compared to existing DP-CGD algorithms with less privacy leakage.

Keywords:

differential privacy; causal graph discovery; adaptive privacy budgeting

1. Introduction

Causal Graph Discovery (CGD) enables estimation of the partially connected directed acyclic graph (DAG) that represents the underlying joint probability distribution of the features of an observational dataset. CGD is an important part of causal inference [1] and is widely used in various disciplines, including biology [2], genetics [3], drug discovery, ecology [4], curriculum design [5], finance, and banking [6].

Overview of Causal Graph Discovery (CGD): The process of estimating the causal graph from observational data relies on the execution of causal graph discovery algorithms. CGD algorithms are broadly classified into two categories: constraint-based algorithms and score-based algorithms. Constraint-based algorithms, including the PC algorithm (named after the authors Peter and Clark) [1], Fast Causal Inference (FCI) algorithm [7], and their variants [8], estimate the causal graph in two phases: first, in the skeleton phase, the algorithm starts with a fully connected graph, then updates the graph based on statistical conditional independence (CI) test results and returns a partially-connected undirected graph. To determine conditional independence, a variety of test statistics can be used, such as the G-test [9] or

χ^{2}

-test [10], as well as correlation coefficients such as Kendall’s Tau [11] or Spearman’s Rho [12]. In the second orientation phase, the algorithm orients the undirected edges based on the CI test results obtained in the skeleton phase and returns the estimated causal graph. Constraint-based algorithms are theoretically guaranteed to converge to the complete partial directed acyclic graph (CPDAG) under certain conditions, including the correctness of the CI tests, causal sufficiency, Markov assumptions, etc. On the other hand, score-based algorithms estimate the causal graph from observational datasets by optimizing a score function. The algorithm assigns relevance scores such as the Bayesian Dirichlet equivalent uniform (BDe(u)) score [13], Bayesian Gaussian equivalent (BGe) score [14], Bayesian information criterion (BIC [15]), and minimum description length (MDL) [16] to all potential candidate graphs derived from the dataset and uses them to estimate the best graph. This method enables score-based algorithms to avoid the need for a large amount of CI tests. In a recent work, NOTEARS [17] converted the traditional combinatorial problem to a continuous optimization problem in order to estimate the DAG. However, these algorithms are computationally more expensive, as they must enumerate and score each and every conceivable graph among all of the variables provided.

Privacy Threats and Differentially Private CGD: CGD algorithms often work with real-world datasets that may contain sensitive and private information about participants, including social and demographic details, credit histories, medical conditions, etc. Thus, releasing the causal graph itself or the intermediate statistical conditional independence (CI) test results may lead ot privacy leakage. Recent work [18] has demonstrated membership inference threats through probabilistic graphical models. Several recent works have adopted the notion of differential privacy (DP) [19] in the context of CGD to ensure a certain level of user privacy.

For instance, existing constraint-based differentially private CGD (DP-CGD) algorithms incorporate several differential privacy techniques to perturb the CI test statistic, including the Laplace Mechanism (PrivPC) [20], Exponential Mechanism (EM-PC) [21], and Sparse Vector Technique (SVT-PC) [20]. For score-based algorithms, NOLEAKS [22] adopted the Gaussian Mechanism to perturb the gradient of the optimization problem. However, existing algorithms rely on the method of adding the same amount of noise to each iteration of the estimation process. As shown in Figure 1 and discussed in Section 3, the CI tests in constraint-based CGD can be highly interdependent. If an edge between two variables is deleted by a CI test, then the conditional interdependence (conditioned on any other subset of features) is never checked in later iterations. Furthermore, this issue also impacts the scalability of private CGD, as the total privacy leakage blows up for datasets with a large number of features (

d > > 1

). Meanwhile, differentially private score-based algorithms such as NOLEAKS [22] optimize the objective function to obtain the adjacency matrix of the estimated DAG. Because this optimization technique utilizes noisy gradients of the objective function, adding the same amount of noise may lead to higher convergence times, as the optimal point may be missed by the algorithm during noise addition. To prevent the algorithm from missing optima and speed up converge, the later iterations of the optimization process should ideally be less noisy.

Overview of the Proposed CURATE Framework: The aforementioned observations bring forth the important point of adaptive privacy budgeting for both constraint-based and score-based differentially private CGD algorithms. For constraint-based algorithms, the initial CI tests are more crucial, as decisions regarding edge deletion are never checked in the later iterations. Within the scope of score-based algorithms, the later iterations in optimization are more critical. This motivates the idea of adaptive privacy budgeting. Given a total privacy budget, such budgeting can reduce the risk of errors propagating to subsequent iterations and improve the scalability of constraint-based algorithms. On the other hand, score-based algorithms ideally have less noise and more accuracy in the later iterations. Intuitively, allocating a higher privacy budget to later iterations of the optimization process should help to reduce the risk of missing the optima of the objective function. In this paper, we present an adaptive privacy budgeting framework called CURATE (CaUsal gRaph AdapTivE privacy) for both constraint- and score-based CGD algorithms in differentially private environments. The main contributions of this paper are summarized as follows:

Our proposed CURATE framework scales up the utility of the CGD process using adaptive privacy budget allocation. Within the scope of constraint-based DP-CGD algorithms, the constraint-based CURATE algorithm optimizes privacy budgets for each order of CI test (CI tests of the same order have the same privacy budget) in a principled manner, with the goal of minimizing the surrogate for the total probability of error. By allocating adaptive (and often comparatively higher) privacy budgets to the initial CI tests, CURATE ensures better overall predictive performance with less total leakage compared to the existing constraint-based DP-CGD algorithms.
We present a score-based CURATE algorithm which allows for adaptive budgeting to maximize the number of iterations given a fixed privacy budget ( $ϵ_{Total}$ ). The score-based CURATE algorithm uses a functional causal model-based optimization approach that allocates a higher privacy budget to later iterations. The privacy budget is incremented as a function of iterations, helping our score-based CURATE to achieve better utility in comparison to existing works.
We present extensive experimental results on six public CGD datasets to compare the predictive performance of our proposed CURATE framework with existing DP-CGD algorithms. Our experimental results show that CURATE ensures better predictive performance with less leakage by orders of magnitude. The average required number of CI tests in constraint-based CURATE is also significantly less than that of existing constraint-based DP-CGD algorithms.

2. Preliminaries on CGD and DP

In this section, we review the notion of causal graph discovery and provide a brief overview of both constraint-based algorithms (canonical PC) and FCM-based algorithms (NOTEARS, NOLEAKS) along with the description of differential privacy [19,23].

Definition 1

(Probabilistic Graphical Model). Given a joint probability distribution

P (F_{1}, \dots, F_{d})

of d random variables, the graphical model

G^{*}

with V vertices (

v_{1}, \dots, v_{d}

) and

E \subseteq V \times V

edges is known as a Probabilistic Graphical Model (PGM) if the joint distribution decomposes as

\begin{matrix} P (F_{1}, \dots, F_{d}) = \prod_{F_{a} \in {F_{1}, \dots, F_{d}}} P (F_{a} | P a (F_{a})), \end{matrix}

where

P a (F_{a})

represents the direct parents of the node

F_{a}

. A PGM relies on the assumption that probabilistic independence (

F_{a} ⫫ {}_{p}F_{b} | S

) ⇒ graphical independence (

v_{a} ⫫ {}_{G}v_{b} | S

) [24].

Definition 2

(Causal Graph Discovery). Given a dataset

D

with a collection of n i.i.d. samples

(x_{1}, \dots, x_{n})

drawn from a joint probability distribution

P (F_{1}, \dots, F_{d})

, where

x_{i}

is a d-dimensional vector representing the d features/variables of the

i^{t h}

sample (user), the method of estimating the PGM

(G^{*})

from

D

is known as causal graph discovery (CGD) [20].

Definition 3

(

ϵ, δ

)-Differential Privacy [19,23,25]). For all pairs of neighboring datasets

D

and

D^{'}

that differ by a single element, i.e.,

| | D - D^{'} {| |}_{1} \leq 1

, a randomized algorithm

M

with an input domain of D and output range

R

is considered to be

(ϵ, δ)

-differentially private if

\forall S \subseteq R

:

\begin{matrix} P [M (D) \in S] \leq e^{ϵ} P [M (D^{'}) \in S] + δ . \end{matrix}

Differentially private CGD algorithms have adopted the exponential mechanism [21], Laplace mechanism, sparse vector technique [20], and Gaussian mechanism [22] to ensure DP.

Definition 4

(

l_{k}

-Sensitivity). For two neighboring datasets

D

and

D^{'}

, the

l_{k}

-sensitivity of a function

f (\cdot)

is defined as

\begin{matrix} Δ_{k} (f) = max_{D, D^{'} \in R, | D, D^{'} | \leq 1} | | f (D) - f (D^{'}) {| |}_{k} . \end{matrix}

For instance, the Laplace mechanism perturbs the CI test statistic

f (\cdot)

with Laplace noise proportional to the

l_{1}

-sensitivity of the function

f (\cdot)

, whereas the Gaussian mechanism adds noise proportional to the

l_{2}

-sensitivity to guarantee DP. Ideally, the classical Gaussian mechanism uses

ϵ \leq 1

for (

ϵ, δ

) DP guarantees; however, this condition may not be sufficient in all CGD scenarios [22]. Therefore, the score-based DP algorithm in [22] uses the analytical Gaussian mechanism [26].

Definition 5

(Analytical Gaussian Mechanism [26]). For a function

f : X \leftarrow R^{d}

with

l_{2}

-sensitivity

Δ_{2}

and privacy parameters

ϵ \geq 0

and

δ \in [0, 1]

, the Gaussian output perturbation mechanism

A (x) = f (x) + Z

with

Z N (0, σ^{2} I)

is

(ϵ, δ)

-DP if and only if

Φ (\frac{Δ_{2}}{2 σ} - \frac{ϵ σ}{Δ_{2}}) - e^{ϵ} Φ (\frac{- Δ_{2}}{2 σ} - \frac{ϵ σ}{Δ_{2}}) \leq δ,

(1)

where Φ is the CDF of the Gaussian distribution.

Overview of Constraint-Based Algorithms: Canonical constraint-based CGD algorithms such as the PC algorithm [1] work in two phases: a skeleton phase followed by an orientation phase. In the skeleton phase, the algorithm starts with a fully connected graph (

G

) and prunes it by conducting a sequence of conditional independence (CI) tests. The CI tests in PC are order-dependent, and the order of a test represents the cardinality of the conditioning set S of features. In order-

(i)

tests, all connected node pairs (

v_{a}, v_{b}

) in

G

are tested for statistical independence conditioned on the set S. The conditioning set S is chosen such that

S \subseteq {A d j (G, v_{a}) \ v_{b}}

, where

A d j (G, v)

represents the adjacent vertices of node v in graph

G

. The edge between the node pairs (

v_{a}, v_{b}

) is deleted if they both pass the order-

(i)

CI test, after which no further testing is performed for statistical independence conditioned on set S with

| S | > i

. The remaining edges in

G

are then tested for independence in order-

(i + 1)

CI tests conditioned on a set S with

| S | = (i + 1)

. This process of CI testing continues until all connected node pairs in

G

are tested by conditioning on set S of size (

d - 2

). At the end of this phase, the PC algorithm returns the skeleton graph. Next, in the orientation phase, the algorithm orients the edges based on the separation set S of one independent node pair (

v_{a}, v_{b}

) without introducing cyclicity in

G

[1,21], as shown in Figure 1. In this two-step process, privacy leakage only occurs in the skeleton phase, as this is when the algorithm directly interacts with the dataset

D

. Therefore, the existing literature focuses on effectively privatizing CI tests subject to the notion of differential privacy [19,23], which can ensure that the presence/absence of a user will not significantly change the estimated causal graph.

Overview of Score-Based Algorithms: Score-based algorithms estimate the DAG that optimizes a predefined score function. Due to their combinatorial acyclicity constraints, learning DAGs from data is NP-hard [27]. To address this issue, the NOTEARS score-based CGD algorithm [17] proposes a continuous optimization problem with an acyclicity constraint to estimate the DAG from observational data, eliminating the need to search over the combinatorial space of DAGs. From a group of DAGs, the DAG is selected which optimizes a predefined score function

score (\cdot)

while satisfying the acyclicity constraints. Given an observational dataset

D

with n i.i.d. samples and d features

F = (F_{1}, F_{2}, \dots, F_{d})

, the algorithm estimates (mimics) the data generation process

f_{i} (\cdot)

for every i-th feature/variable by minimizing the loss function. Essentially, the adjacency matrix W that represents the edges of the graph

G

is modeled with the help of a functional causal model (FCM). FCM-based methods represent every i-th variable

F_{i}

of the dataset

D

as a function of its parents

Pa (F_{i})

, and represent the added noise Z as follows:

\begin{matrix} F_{i} = f_{i} (Pa (F_{i})) + Z . \end{matrix}

The key idea behind FCM-based CGD is to estimate the weight vector

w_{i}

for each variable

F_{i}

given its parents

Pa (F_{i})

. Therefore, each variable

F_{i}

can be represented as a weighted combination of its parents and noise Z as

F_{i} = w_{i}^{T} F + Z

. The optimization process of estimating the weight vector

w_{i}

is based on the idea of minimizing the squared loss function

ℓ (W, D) = \frac{1}{2 n} | | D - {D W | |}_{F}^{2}

, where W is the associated adjacency matrix of the dataset

D

and n is the number of samples. The loss function indicates how well the adjacency matrix W captures the dependencies (causal structure) in dataset

D

. The goal is to minimize the loss

ℓ (W, D)

in order to find the optimal adjacency matrix W. The squared loss

ℓ (W, D)

is defined over

R^{d \times d}

, while the minimizer of

ℓ (W, D)

recovers the true directed acyclic graph on finite samples with high probability [17]. In the optimization process, the algorithm also uses a penalty function

{λ | | W | |}_{1}

that penalizes dense graphs. The detailed working mechanism of FCM-based CGD algorithms is described in Section 3.

Sensitivity Analysis and Composition of DP: For the class of constraint-based algorithms, an edge between nodes (

v_{a}, v_{b}

) from the estimated graph

G

is deleted conditioned on set S if (

f_{v_{a}, v_{b} | S} (D) > T

), where

f_{v_{a}, v_{b} | S} (\cdot)

is the test statistic and T is the test threshold. Thus, the structure of the estimated causal graph depends on the nature of

f (\cdot)

and the threshold (T). Additionally, the amount of added noise in DP-CGD is proportional to the

l_{k}

-sensitivity (

Δ_{k}

) of the test statistic

f_{v_{a}, v_{b} | S} (\cdot)

. Therefore, to maximize the predictive performance, test statistics that have lower sensitivity with respect to sample size n are preferred. Through analysis, we have observed that the

l_{1}

-sensitivity of the Kendall’s

τ

test statistic can be bounded as

Δ_{1} \leq \frac{C}{\sqrt{n}}

, where C is a constant obtained through the analysis presented in Appendix A.2). Notably, any of the other CI test statistics mentioned in Section 1 can be used in the constraint-based CURATE framework. The class of score-based algorithms focuses on optimizing a score function to estimate the causal graph. These algorithms often rely on gradient-based methods, with the gradient of the objective function frequently being clipped and perturbed to preserve privacy. As mentioned in [22], the

l_{2}

-sensitivity of the clipped gradient can be bounded as

Δ_{2} \leq \frac{d s}{n}

, where s is the clipping threshold. In [22], the authors further exploited the properties of the dataset and adjacency matrix, allowing the

l_{2}

of the gradient to be be assigned an upper bound of

Δ_{2} \leq \frac{\sqrt{d (d - 1) s}}{n}

. Composition is a critical tool in DP-CGD, since differentially private CGD algorithms discover the causal graph in an iterative process. Constraint-based CGD algorithms run a sequence of interdependent tests, while score-based algorithms optimize a predefined score function in an iterative manner. Therefore, the total privacy leakage can be calculated using basic composition [19,23,28,29], advanced composition [25,29], optimal composition [30], adaptive composition [31], or moments accounting [32]. Within the scope of this paper, we consider basic composition (also known as sequential composition) and advanced composition. Because each order-i conditional independence (CI) test in the constraint-based CURATE algorithm has the same privacy budget and failure probability, we apply advanced composition to calculate the per-order privacy leakage. However, as the constraint-based and score-based CURATE algorithms have different privacy budgets over different iterations (orders), we apply basic composition to calculate the total leakage.

3. Adaptive Differential Privacy in Causal Graph Discovery

This section presents the main idea of this paper, our CURATE adaptive privacy budgeting framework. In Section 3.1, we demonstrate the adaptive privacy budgeting mechanism for constraint-based algorithms. We introduce and explain the basic optimization problem that enables the allocation of the adaptive privacy budget through all the iterations (orders) of the CI tests. In Section 3.2, we present adaptive privacy budget allocation for score-based algorithms. We introduce adaptivity while ensuring differential privacy (DP) during evaluation of the weighted adjacency matrix. This section provides the theoretical foundation behind the adaptive privacy budget allocation mechanism in the context of DP-CGD.

3.1. Adaptive Privacy Budget Allocation with Constraint-Based CURATE Algorithm

In this section, we present CURATE for enabling adaptive privacy budgeting while minimizing the error probability. As the CI tests in constraint-based CGD algorithms are highly interdependent, predicting the total number of CI tests in CGD before the execution of the tests is difficult. The number of order-

(i)

CI tests

t_{i}

enables the framework to approximate the per-order privacy budgets for later iterations

ϵ_{i}, \dots, ϵ_{d - 2}

based on the total remaining privacy budget

ϵ_{Total}^{(i)}

. A naive data-agnostic way to choose an upper bound on

t_{i}

is

t_{i} \leq (\binom{d}{2}) \times (\binom{d - 2}{i})

, where

(\binom{d}{2})

represents the number of ways to select an edge from the edges of a fully connected graph (the way of selecting an edge between two connected nodes out of d nodes) and

(\binom{d - 2}{i})

refers to the selection of the conditioning set S with cardinality

| S | = i

. However, this upper bound is too large, and does not depend on the outcome of the previous iteration. A better approximation of

t_{i}

is always possible given the outcome of the previous iteration. As DP is immune to postprocessing [23], releasing the number edges

e_{i + 1}

after executing order-

(i)

differentially private CI tests will preserve differential privacy. For instance, the possible number of order-(

i + 1

) CI tests can always be upper-bounded as

t_{i + 1} \leq e_{i + 1} \times (\binom{d - 2}{i + 1})

, where

e_{i + 1}

represents the remaining edges after the order-

(i)

tests. After studying both methods, we have observed that

t_{i + 1} \leq e_{i + 1} \times (\binom{d - 2}{i + 1})

is a better estimate of

t_{i + 1}

, as

e_{i} \leq (\binom{d}{2}), \forall i \in {0, d - 2}

. Given the outcome of graph

G

of the order-(

i - 1

) tests, with edges

e_{i}

and a total (remaining) privacy budget of

ϵ_{Total}^{(i)}

, we assign privacy budgets

(ϵ_{i}, \dots, ϵ_{d - 2})

. As every order-

(i)

CI test in CURATE is

(ϵ_{i}, δ)

-DP with DP failure probability

δ, δ^{'} > 0

, the total leakage in order

(i)

is calculated with advanced composition [25] as

ϵ_{curate}^{(i)} = t_{i} ϵ_{i}^{2} + \sqrt{2 log (\frac{1}{δ^{'}}) t_{i} ϵ_{i}^{2}}

, while the total failure probability in DP is calculated as

δ_{c u r a t e}^{(i)} = (δ^{'} + t_{i} δ)

. However, because different orders have different privacy budgets, the total privacy leakage in CURATE is calculated through basic composition [25] as

\sum_{j = 0}^{d - 2} ϵ_{curate}^{(j)} = \sum_{j = 0}^{d - 2} (t_{j} ϵ_{j}^{2} + \sqrt{2 t_{j} log (\frac{1}{δ^{'}}) ϵ_{j}^{2}})

, while the cumulative failure probability of CURATE is

\sum_{j = 0}^{d - 2} δ_{c u r a t e}^{(j)}

(refer to Figure 2). Therefore, given the outcome of the order-

(i - 1)

tests, the total leakage in CURATE must satisfy

\sum_{j = i}^{d - 2} (t_{j} ϵ_{j}^{2} + \sqrt{2 t_{j} log (\frac{1}{δ^{'}}) ϵ_{j}^{2}}) \leq ϵ_{Total}^{(i)}

, where

t_{j} = e_{j} \times (\binom{d - 2}{j})

and

\sum_{j = 0}^{d - 2} δ_{c u r a t e}^{(j)} \leq δ_{Total}

. Moreover, we enforce

ϵ_{i} \geq ϵ_{i + 1} \geq \dots \geq ϵ_{d - 2}

to ensure that the initial CI tests receive a higher privacy budget.

DP-CI Test in CURATE: The differentially private order-

(i)

CI test with privacy budget

ϵ_{i}

for variables

(v_{a}, v_{b}) \in G

conditioned on a set of variables S is defined as follows:

If $\hat{f} > T (1 + β_{2}) \Rightarrow$ delete edge ( $v_{a}, v_{b}$ )
Else, if $\hat{f} < T (1 - β_{1}) \Rightarrow$ keep edge ( $v_{a}, v_{b}$ )
Else, keep the edge with probability $\frac{1}{2}$

where

\hat{f} : = f_{v_{a}, v_{b} | S} (D) + Lap (\frac{Δ}{ϵ_{i}})

,

Lap (\frac{Δ}{ϵ_{i}})

is the Laplace noise,

Δ

denotes the

l_{1}

-sensitivity of the test statistic, T denotes the threshold, and

(β_{1}, β_{2})

denote the margins. In order to keep the utility high, we would ideally like to pick

(ϵ_{i}, ϵ_{i + 1}, \dots, ϵ_{d - 2})

that minimizes the error probability

P [E] = P [G \neq G^{*}]

, where

G^{*}

is the true causal graph and

G

is the estimated causal graph. Unfortunately, we do not have access to

G^{*}

; thus, in this paper we instead propose using a surrogate for the error by considering type-I and type-II errors relative to the unperturbed (non-private) statistic. A relative type-I error relative to the unperturbed CI test occurs when the private algorithm retains the edge and the unperturbed test statistic deletes the edge

(f_{v_{a}, v_{b} | S} (D) > T)

, while a relative type-II error occurs when the algorithm deletes an edge and the unperturbed test statistic keeps that edge

(f_{v_{a}, v_{b} | S} (D) < T)

. The next Lemma provides the upper bounds on the relative type-I and type-II error probabilities in CURATE.

Lemma 1.

For some

c_{1}, c_{2} \in (0, 1)

and non-negative test threshold margins

(β_{1}, β_{2})

, the relative type-I (

P [E_{1}^{i}]

) and type-II (

P [E_{2}^{i}]

) errors in order-

(i)

CI tests in CURATE with privacy budget

ϵ_{i}

and

l_{1}

-sensitivity Δ can be bounded as follows:

\begin{matrix} P [E_{1}^{i}] \leq \underset{q_{i}^{(1)}}{\underset{︸}{\frac{c_{1}}{2} + \frac{1}{2} e^{(- \frac{T β_{1} ϵ_{i}}{Δ})}}}, P [E_{2}^{i}] \leq \underset{q_{i}^{(2)}}{\underset{︸}{\frac{c_{2}}{2} + \frac{1}{2} e^{(- \frac{T β_{2} ϵ_{i}}{Δ})}}} . \end{matrix}

The proof of Lemma 1 is presented in the Appendix A. Because adding Laplace noise to the test statistic introduces randomness into the hypothesis testing process, we have designed threshold margins to precisely bound the error probabilities. As shown in Figure 3, we decide whether to retain or delete an edge based on the noisy test statistic

\hat{f} (\cdot)

. The margins

β_{1}

and

β_{2}

can also be optimized and adaptively chosen based on dataset characteristics. For instance, if the noisy test statistics lie far from the threshold T, a larger margin can be allowed and higher values for

β_{1}

and

β_{2}

can be selected; conversely, if the noisy test statistics are near the threshold, then it is necessary to reduce the margins in order to make better use of hypothesis testing. The main objective of CURATE is to adaptively allocate privacy budgets for order-

(i)

CI tests by minimizing the total relative error. The leakage in DP-CGD depends on the number of CI tests, and the number of CI tests depends in turn upon the number of edges in the estimated graph

G

. As the number of edges in the true graph is unknown, we use

P [E_{1}^{i}] + P [E_{2}^{i}]

as a surrogate for the total error probability

P [E]

. Given the outcome of order-(

i - 1

) tests, the algorithm can make a type-I error by preserving an edge that is not present in the true graph until order

d - 2

. If such an edge is present after the order-(

i - 1

) tests, the probability of a type-I error at the end of order

d - 2

can be represented as

\prod_{j = i}^{d - 2} q_{j}^{(1)}

, as the addition of independent noise to each CI test enables the framework to bound the probability of error in each order independently, and the total error probability at the end of order

(d - 2)

is the cumulative error made by the algorithm in every order

(j)

. Similarly, the probability of keeping an edge which is present in the ground truth after order-

(i - 1)

tests can be represented as

\prod_{j = i}^{d - 2} (1 - q_{j}^{(2)})

; therefore, the total type-II error can be represented as

(1 - (\prod_{j = i}^{d - 2} (1 - q_{j}^{(2)})))

. Given the outcome of order-

(i - 1)

CI tests

G

, this allows us to construct the main objective function of this paper. The objective function that we propose to minimize is

\begin{matrix} \prod_{j = i}^{d - 2} q_{j}^{(1)} + (1 - (\prod_{j = i}^{d - 2} (1 - q_{j}^{(2)}))) . \end{matrix}

(2)

Because the number of edges in the true graph is unknown, we propose to minimize (2) as a surrogate for the error probability.

Optimization for Privacy Budget Allocation: By observing the differentially private outcome of order-(

i - 1

) CI tests (the remaining edges

e_{i}

in graph

G

), CURATE optimizes for

\bar{ϵ} = {ϵ_{i}, . ., ϵ_{d - 2}}

(the privacy budgets for subsequent order-

(i)

tests and beyond) while minimizing the objective function as described in (2). Formally, we define the following optimization problem in CURATE, denoted as

O P T (ϵ_{Total}^{(i)}, e_{i}, i)

:

\underset{O P T (ϵ_{Total}^{(i)}, e_{i}, i)}{\underset{︸}{arg {min}_{\bar{ϵ}} \prod_{j = i}^{d - 2} q_{j}^{(1)} + (1 - (\prod_{j = i}^{d - 2} (1 - q_{j}^{(2)})))}} s . t . \{\begin{matrix} \sum_{j = i}^{d - 2} \underset{total leakage in order - (j)}{\underset{︸}{(t_{j} ϵ_{j}^{2} + \sqrt{2 log (\frac{1}{δ^{'}}) t_{j} ϵ_{j}^{2}})}} \leq ϵ_{Total}^{(i)} \\ ϵ_{j} \geq ϵ_{j + 1} . \end{matrix}

(3)

Given the outcome of order-

(i - 1)

tests, the above optimization function

O P T (ϵ_{Total}^{(i)}, e_{i}, i)

takes the following inputs: (a) the remaining total budget

ϵ_{Total}^{(i)}

, (b) the remaining edges

e_{i}

in the output graph

G

after all order-

(i - 1)

tests, and (c) the order index, i.e., order i. The function then optimizes and outputs the privacy budgets

ϵ_{i}, \dots, ϵ_{d - 2}

for the remaining order tests while satisfying the two constraints mentioned in (3). Because the optimization problem in (3) is difficult to solve in a closed form, in our experiments we used sequential least squares programming (SLSQP) to optimize the objective function.

Constraint-Based CURATE Algorithm: Next, we present the constraint-based CURATE as Algorithm 1, which enables adaptive privacy budget allocation for each order-i conditional independence test by solving the optimization problem in (3). The constraint-based algorithm in CURATE uses the optimization function

O P T (\cdot)

recursively to adaptively observe chosen per-iteration privacy budgets. Given the total privacy budget

ϵ_{Total}^{(i)}

for an order-i test,

O P T (\cdot)

calculates the remaining privacy budget for order-

(i + 1)

CI tests based on the number

t_{i}

of order-i CI tests:

\begin{matrix} \underset{budget for order - (i + 1)}{\underset{︸}{ϵ_{Total}^{(i + 1)}}} = \underset{budget for order - i}{\underset{︸}{ϵ_{Total}^{(i)}}} - \underset{actual leakage in order - i}{\underset{︸}{(t_{i} ϵ_{i}^{2} + ϵ_{i} \sqrt{2 t_{i} log (\frac{1}{δ^{'}})})}} . \end{matrix}

Initially, the remaining budget for order-0 CI tests is equal to the assigned total privacy budget, i.e.,

ϵ_{Total}^{(0)} = ϵ_{Total}

, and the edges in the complete graph

G_{0}

can be expressed as

e_{0} = (\binom{d}{2})

. In order 0, CURATE solves for

ϵ_{0}, \dots, ϵ_{d - 2}

using the function

O P T (ϵ_{Total}^{(0)}, e_{0}, 0)

. After completion of all order-0 CI tests, the algorithm calculates the remaining budget for order-1 CI tests as

ϵ_{Total}^{(1)} = ϵ_{Total}^{(0)} - (t_{0} ϵ_{0}^{2} + ϵ_{0} \sqrt{2 t_{0} log (\frac{1}{δ^{'}})})

; by observing the remaining edges

e_{1}

, it then solves for the next set of privacy budgets

ϵ_{1}, \dots, ϵ_{d - 2}

. This process is then recursively applied for all

i \in {0, 1, \dots, d - 2}

, corresponding to all order-i tests.

Algorithm 1: Constraint-based CURATE Algorithm

Sub-sampling has also been adopted by several recent works on DP-CGD [20,22]. As sub-sampling amplifies differential privacy [26], we can also readily incorporate sub-sampling parameters within the optimization frameworks of both constraint-based and score-based CURATE.

3.2. Adaptive Privacy Budget Allocation with Score-Based CURATE Algorithm

In this subsection, we present the adaptive and nonuniform privacy budget allocation mechanism for the class of score-based algorithms. This mechanism is based on the functional causal model (FCM) idea. Traditional score-based algorithms estimate the causal graph that optimizes a predefined score function, such as the Bayesian Dirichlet equivalent uniform (BDe(u)) score [13], Bayesian Gaussian equivalent (BGe) score [14], Bayesian information criterion (BIC) [15], or minimum description length (MDL) [16]. These methods are agnostic to the underlying true distribution of the data. There is a line of work in the literature that aims to extract more accurate underlying distributions from observational data through a functional causal model (FCM). Given an observational dataset

D

with

(x_{1}, \dots, x_{n})

i.i.d. samples and d number of features (

F = {F_{1}, \dots, F_{d}}

), FCM-based methods mimic the data generation process

f_{i} (\cdot)

to obtain feature

F_{i}

as a function of its parents

P a (F_{i})

and added noise Z, as follows:

\begin{matrix} F_{i} = f_{i} (Pa (F_{i})) + Z . \end{matrix}

It is worth mentioning that the added noise Z is independent of

P a (F_{i})

and depends on the sensitivity of the deterministic function

f_{i} (\cdot)

. Because traditional score-based algorithms impose combinatorial acyclicity constraints while learning DAG from observational data, the estimation process becomes NP-hard [27]. To address this, the non-private FCM-based NOTEARS algorithm [17], introduces a continuous optimization problem which optimizes a score function

score (W)

as follows:

\begin{matrix} min_{W \in R^{d x d}} score (W) subject to h (W) = 0 \end{matrix}

(4)

where the score function

score (\cdot) : R^{d x d} \to R

is the combination of the squared loss function and a penalization function. Briefly, the score function is defined as

\begin{matrix} score (W, α) & = \underset{objective function}{\underset{︸}{ℓ (W; D) + {λ | | W | |}_{1}}} + \underset{quadratic penalty}{\underset{︸}{\frac{ρ}{2} {| h (W) |}^{2}}} + \underset{Lagrangian multiplier}{\underset{︸}{α h (W)}}, \end{matrix}

(5)

where

ρ > 0

is a penalty parameter,

α

is the Lagrange multiplier, and

{λ | | W | |}_{1}

is a non-smooth penalizing term for dense graphs. The algorithm imposes the acyclicity constraint with

h : R^{d x d} \to R

, where

h (\cdot)

is a smooth function over real matrices [17]. The acyclicity constraint is defined by the function

h (W)

as

\begin{matrix} h (W) = tr (e^{W \circ W}) - d = 0, \end{matrix}

where ∘ is the Hadamard product and

e^{W \circ W}

is the matrix exponential of

W \circ W

. The acyclicity constraint

h (W)

is a non-convex function and has a gradient

\nabla h (W) = {(e^{W \circ W})}^{T} \circ 2 W

[17]. For a given dataset

D \in R^{n x d}

with n i.i.d. samples of feature vector

F = (F_{1}, \dots, F_{d})

, let

D

denote a discrete space of DAGs

G = (V, E)

on d nodes. The objective of the NOTEARS algorithm [17] is to model

(F_{1}, \dots, F_{d})

via FCM. The

j^{t h}

feature is defined by

F_{j} = w_{j}^{T} F + Z

, where

F = (F_{1}, \dots, F_{d})

is a feature vector and

Z = (z_{1}, \dots, z_{d})

is an added noise vector.

Differentially Private Score-Based CGD Algorithms: The optimization problem mentioned in (5) is non-private; therefore, releasing the gradient of the optimization problem is prone to privacy leakage. To address this privacy concern, the DP-preserving score-based CGD algorithm NOLEAKS [22] adopts the notion of differential privacy (DP) in the optimization process. To ensure differential privacy for the released gradient

(\hat{\nabla F})

, the Jacobian of this optimization process is clipped with a certain clipping threshold

(s)

and perturbed with Gaussian noise

N (0, σ^{2} I_{dxd})

.

Unlike constraint-based CGD algorithms, the later iterations are more critical than the initial ones when minimizing the score function

score (W, α)

. The optimization process in score-based CGD algorithms uses gradient-based methods such as stochastic gradient descent (SGD). In differentially private SGD (DP-SGD), the privacy budget

ϵ_{Total}

, DP failure probabilities

δ, δ^{'}

, and step size

η

are key factors that influence the path to finding the minima. Given a fixed privacy budget, adding the same amount of noise at each step can cause oscillations near the local minima; thus, introducing more noise in the initial iterations and gradually reducing it in subsequent iterations helps to achieve faster convergence. Therefore, initial iterations of the optimization process may handle more noise; however, as the algorithm tends to converge to the optima, the amount of added noise needs to be reduced for better convergence. This adaptivity in terms of added noise also ensures less chance of missing the optima. Motivated by this crucial fact, we introduce adaptivity to this setting and describe our proposed framework in the next section. As the NOLEAKS algorithm perturbs the Jacobian matrix through the Gaussian noise with the same noise parameter (privacy budget) to guarantee DP, the main difference between the existing NOLEAKS differential privacy framework and our proposed score-based CURATE framework is the per-iteration adaptive privacy budget increment during the perturbation of the Jacobian matrix.

Adaptive Privacy Budgeting for Score-Based Algorithms: We observe some room for improvement in terms of adaptive privacy budget allocation for differentially private FCM-based CGD algorithms. Intuitively, the later steps/iterations in the optimization of (5) are more crucial compared to the initial ones, as the later iterations are closer to the optima. Recent works, including [33,34,35], have proposed adaptive privacy budget allocation mechanisms for gradient-based optimization problems that adaptively allocate privacy budgets for each iteration in the optimization process. In our proposed score-based framework, we aim to implement adaptive privacy budget allocation for each iteration and increment the privacy budget as a function of the iterations. Therefore, our goal is to select an adaptive privacy budgeting mechanism for score-based algorithms that allocates less privacy budget to the initial iterations compared to the later ones. Intuitively, privacy budgets can be incremented additively, multiplicatively, or exponentially. Next, we analyze these three different methods of incrementing the privacy budget as functions of the initial privacy budget

ϵ_{0}

and number of iterations i, and present some experimental results to highlight the method that achieves better F1-score.

In this paper, we analyze the performance of three different privacy budget increment mechanisms. Next, we brieflydemonstrate these mechanisms. First, in the additive increment

ϵ_{i} = ϵ_{0} (1 + \frac{i}{I})

scheme, the privacy budget of the

i^{t h}

iteration is defined as a linear function of the initial budget

ϵ_{0}

, current iteration i, and total number of iterations I. Next, in the exponential increment

ϵ_{i} = ϵ_{0} \times exp (\frac{i}{I})

scheme, the budget of the

i^{t h}

iteration is incremented as a function of

exp (\frac{i}{I})

. Finally, the multiplicative increment

ϵ_{i} = ϵ_{0}^{(1 + \frac{i}{I})}

scheme increments

ϵ_{i}

multiplicatively as a function of

ϵ_{0}^{\frac{i}{I}}

.

Lemma 2.

Given a total privacy budget of

ϵ_{Total}

and initial privacy budget

ϵ_{0}

, it is possible to execute a total possible number of iterations

I_{add} = \frac{ϵ_{Total} + \frac{ϵ_{0}}{2}}{ϵ_{0} + \frac{ϵ_{0}}{2}}

(with additive increment),

I_{\exp} = \frac{ϵ_{Total}}{ϵ_{0} \times exp (1)}

(with exponential increment), and

I_{mul} = \frac{log (ϵ_{0})}{log (1 - \frac{ϵ_{0} (1 - ϵ_{0})}{ϵ_{Total}})}

for

ϵ_{0} < 1

and

I_{mul} = \frac{log (ϵ_{0})}{log (1 + \frac{ϵ_{0} (ϵ_{0} - 1)}{ϵ_{Total}})}

for

ϵ_{0} > 1

(with multiplicative increment).

Remark 1.

Combining the multiplicative and additive increments can be used to improve the number of iterations for different privacy regimes based on the total privacy budget and exhausted privacy budget.

Lemma 2 shows explicit dependence of the total number of possible iterations on the total privacy budget

ϵ_{Total}

) and initial privacy budget

ϵ_{0}

. The privacy budget incrementing methods can be readily adopted with the exponential mechanism, Laplace mechanism, and sparse vector technique. Figure 4 shows the maximum possible number of iterations with different adaptive methods given a fixed initial privacy budget

ϵ_{0}

and total privacy budget

ϵ_{Total}

. We also observe that the multiplicative method executes noticeably more iterations compared to the additive and exponential methods in higher-privacy regimes (i.e.,

ϵ_{0} < 1

). The number of iterations directly influences the performance of the optimization process, as it follows a step-wise gradient-based method. Based on the total privacy budget

ϵ_{Total}

, if the process terminates before reaching the optimum, the algorithm will suffer from suboptimal performance. Therefore, the goal is to run as many iterations as possible based on the total and initial privacy budget. As we aim to achieve better performance by executing more iterations given a total privacy budget

ϵ_{Total}

, in this paper we follow the multiplicative method for incrementing the per-iteration privacy budget.

Score-based CURATE Algorithm: Next, we present the adaptive private minimization technique used in score-based CURATE in Algorithm 2.

Algorithm 2: Adaptive Priv-Minimize

The function clip(·) denotes the clipping of the true gradient

\nabla F

using a clipping threshold s, and can be mathematically represented as

\hat{\nabla F} = \nabla F / max (1, \frac{{| | \nabla F | |}_{2}}{s})

[32]. Clipping the gradient ensures that the gradient is bounded by the clipping threshold s. We use the Priv-Linesearch feature adopted from the NOLEAKS algorithm [22], by which the algorithm aims to investigate the optimal step size

η

. The score-based CURATE algorithm essentially utilizes the FCM-based model for CGD and allows for adaptive privacy budgeting through the optimization process. The score-based CURATE algorithm follows a similar FCM-based framework to the non-private NOTEARS algorithm and differentially private NOLEAKS algorithm; however, our proposed framework enables adaptive privacy budget allocation for each iteration through the Adaptive Priv-Minimize function.

Remarks on the Score-Based CURATE Algorithm: Because the score-based CURATE algorithm follows an FCM-based workflow that is similar to the non-private NOTEARS and differentially private NOLEAKS algorithms, it achieves polynomial complexity in terms of the feature/variable size d. For small datasets and with less leakage, it achieves better and more meaningful causal graphs compared to constraint-based algorithms. However, due to the non-convex nature of the optimization problem, similar to NOTEARS and NOLEAKS algorithms, the score-based CURATE algorithm does not guarantee convergence to global optima. Nonetheless, we observed in our experiments that the score-based algorithms provided better privacy guarantees in regimes with lower total privacy (

ϵ_{Total} \leq 1

) compared to the differentially private constraint-based algorithms.

4. Results and Discussion

Data Description and Test Parameters: We compared the predictive performance of our proposed CURATE framework with non-private PC [1], EM-PC [21], SVT-PC, Priv-PC [20], and NOLEAKS [22] on six public CGD datasets [2,36,37,38,39]. Figure 5 presents the detailed description of the datasets along with the predictive performance of the non-private PC algorithm. For the experimental results, we considered the probability of failure in differential privacy

δ^{'} = 10^{- 12}

, as the safe choice for

δ^{'}

is

δ^{'} \leq n^{- 1.5}

, where n is the total number of participants/samples in the dataset. In each of the six CGD datasets, the total number of samples was

(n) = 100 k = 10^{5}

; thus, we considered the value of

δ^{'} = 10^{- 12} \leq n^{- 1.5}

. The total privacy budget ranged from

10^{- 2}

to

10^{2}

, and we varied the total budget in this range to observed the performance of the CGD algorithms in high-, moderate-, and low-privacy regimes. The initial privacy parameter was calculated based on Equation (3). The test threshold

(T)

was set as

0.05

, the subsampling rate

(q)

was

1.0

, and we used Kendall’s

τ

as a CI testing function for the constraint-based private algorithms. To run the experiments, we used a high-performance computing (HPC) system with one node and one CPU with 5 GB of RAM. The code for the constraint-based and score-based CURATE algorithm is available at https://github.com/PayelBhattacharjee14/cgdCURATE, accessed on 8 September 2024.

Evaluation Metric: For the scope of our experiments, we measured the predictive performance of the CGD algorithms in terms of the F1-score, which indicates the similarity between the estimated graph

G

and ground truth

G^{*}

. Let the ground truth be represented by the graph

G^{*} = (V, E^{*})

and the estimated graph be represented by

G = (V, E)

. The edges in the true graph are denoted as

E^{*}

and the edges in the estimated graph are denoted as

E

. We can denote the precision as

\frac{E \cap E^{*}}{E}

and recall as

\frac{E \cap E^{*}}{E^{*}}

; then, the F1-score (utility) of the CGD algorithm is defined as

F 1 = \frac{2 \times Precision \times Recall}{Precision + Recall} .

Privacy vs. Utility Tradeoff: There is a privacy–utility tradeoff in differential privacy-preserving CGD. Through comprehensive experimental results on six public CGD datasets, we observed that the private algorithms required higher privacy leakage to achieve the same predictive performance as their non-private counterparts.

The experimental results presented in Figure 6 show that with adaptive privacy budget allocation and minimization of total probability of error, CURATE outperforms the existing private CGD algorithms, including EM-PC [21], SVT-PC, Priv-PC [20], and NOLEAKS [22]. Figure 6 presents the mean F1-score and its standard deviation for 50 consecutive runs on the Cancer, Earthquake, Survey, Asia, Sachs, and Child datasets for different privacy regimes. The number of features in the dataset also impacts the performance of the CGD algorithms. Notably, for the Cancer, Earthquake, and Survey datasets, score-based CURATE achieves the highest F1-score with a total leakage of less than 1.0. As the number of features increases, CURATE and the other CGD algorithms tend to leak more in order to achieve the best F1-score. For the Sachs and Child datasets, CURATE achieves the highest F1-score with

ϵ_{Total} > 1.0

. In addition, we observe that constraint-based CURATE achieves better utility (F1-score) with less total leakage compared to the existing constraint-based DP-CGD algorithms, including EM-PC [17], Priv-PC, and SVT-PC [20]. Therefore, adaptive privacy budgeting scales up utility in DP-CGD.

Computational Complexity of DP-CGD Algorithms: The reliability of an algorithm also depends on its computational complexity. In private CGD, score-based and constraint-based algorithms have different computational complexities. As mentioned by the authors of [8], score-based algorithms are computationally expensive because they need to enumerate and assign scores to each possible output graph. For instance, NOLEAKS uses the quasi-Newton, method which has high computational and space complexity [22]; on the other hand, EM-PC is computationally slow, as the utility function used in the exponential mechanism is computationally expensive [20]. Priv-PC adopts SVT and the Laplace mechanism to ensure DP, whereas constraint-based CURATE optimizes privacy budgets

\bar{ϵ}

in an online setting and then adopts the Laplace mechanism to privatize CI tests. This makes CURATE less computationally expensive compared to the existing constraint-based DP-CGD algorithms.

Comparison of Number of CI Tests: The total number of CI tests executed by a differentially private CDG algorithm directly affects the privacy and utility tradeoff of the algorithm. The total number of CI tests in private constraint-based CGD algorithms directly influences the total amount of leakage, as each CI test is associated with some amount of privacy leakage. Privacy leakage can be provably reduced by efficient and accurate CI testing. In the constraint-based CURATE algorithm, privacy budgets are allocated by minimizing a surrogate for the total probability of the error. Intuitively, this decreases the total leakage of CURATE, as the adaptive choice of privacy budgets makes the initial CI tests more accurate. Therefore, CURATE tends to run a smaller number of CI tests compared to other differentially private algorithms. We confirm this intuition in the results presented in Figure 7. It can be observed that the number of CI tests in EM-PC, SVT-PC, and Priv-PC are comparatively large relative to CURATE and the non-private counterpart PC algorithm [1].

Running Time Comparison: In this subsection of the paper, we provide a running time comparison between adaptive and non-adaptive score-based and constraint-based differentially private CGD algorithms. Due to their complexity, score-based CGD algorithms tend to consume more time compared to constraint-based algorithms. Figure 8 compares the running times of the differentially private CGD algorithms for 50 consecutive iterations. As shown in the figure, the constraint-based CURATE algorithm speeds up the process of DP-CGD compared to the Priv-PC and EM-PC algorithms, while the score-based CURATE algorithm achieves better predictive performance compared to the NOLEAKS algorithm with a similar amount of execution time. The adaptivity of these DP-CGD algorithms allows them to converge faster and reduces their overall execution time.

5. Conclusions

This paper proposes CURATE, a differentially private causal graph discovery framework that scales up privacy via adaptive privacy budget allocation for both constraint-based and score-based CGD environments. Constraint-based CURATE is based on the key idea of minimizing a surrogate for the total probability of error in CGD, which ensures a better privacy–utility tradeoff. Score-based CURATE allows a higher number of iterations and faster convergence of the optimization problem through adaptive budgeting, thereby guaranteeing better utility with less leakage. In our experiments, we observed that the average number of CI tests required with constraint-based CURATE is similar to the number of CI tests required by the non-private PC algorithm. Our results show that CURATE outperforms existing private CGD algorithms and achieves better utility. In addition, the leakage of the proposed framework using adaptive privacy budgeting is smaller by orders of magnitude. In addition, there are several interesting open research directions for future work: (i) an adaptive gradient-clipping mechanism could be implemented for the score-based algorithm; (ii) as our proposed score-based framework uses the resulting pruned graph, the per-iteration privacy budget could be designed based on the outcome of the previous iteration; and (iii) the outcomes of previous noisy tests could be used to tune hyperparameters such as the test threshold, margins, and clipping threshold.

6. Remarks

A part of this work, our constraint-based CURATE algorithm has been submitted and accepted to the 2024 IEEE International Workshop on Machine Learning for Signal Processing (IEEE MLSP 2024). This article is a revised and expanded version of a paper titled Adaptive Privacy for Differentially Private Causal Graph Discovery which we presented at the IEEE MLSP 2024 conference in London, UK, on 25 September 2024.

Author Contributions

Conceptualization, P.B. and R.T.; Methodology, P.B.; Software, P.B.; Validation, P.B. and R.T.; Formal analysis, P.B.; Resources, R.T.; Writing—original draft, P.B.; Writing—review & editing, R.T.; Visualization, P.B.; Supervision, R.T.; Project administration, R.T.; Funding acquisition, R.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the NSF grants CAREER 1651492, CCF-2100013, CNS-2209951, CNS-1822071, CNS-2317192, and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing under Award Number DE-SC-ERKJ422 and NIH Award R01-CA261457-01A1.

Data Availability Statement

We have used publicly available datasets for our experiments, and they are cited within the article. The code and used datasets are also available in CURATE GitHub repository as mentioned in the Experimental Results section.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Proof of Lemma 1

In this section, we present the proof of Lemma 1. For every order-i conditional independence (CI) test, we have a privacy budget of

ϵ_{i}

. Given a CI test statistic

f (D)

with

l_{1}

-sensitivity

Δ_{1}

, threshold T, and margins

(β_{1}, β_{2})

, we perturb the test statistic using Laplace noise, defined as

Z = L a p (\frac{Δ_{1}}{ϵ_{i}})

, and check for conditional independence between

(v_{a}, v_{b}) \in G

conditioned on S as follows:

If $f (D) + Z > T (1 + β_{2}) \Rightarrow$ delete edge $(v_{a}, v_{b})$
If $f (D) + Z < T (1 - β_{1}) \Rightarrow$ keep edge $(v_{a}, v_{b})$
Else keep edge ( $v_{a}, v_{b}$ ) with probability $\frac{1}{2}$ .

For simplicity of notation, we define

f_{v_{a}, v_{b} | S} (D) : = f (D)

.

Type-I Error: We now analyze the type-I error relative to the unperturbed CI test, i.e., when the private algorithm keeps the edge given that the unperturbed test statistic deletes the edge

f (D) > T

. In other words, this can be written as

P (E_{1}^{i}) = P (Error | f (D) > T)

. We note that the error event occurs only for cases (b) and (c). We can bound the relative type-I error as follows:

\begin{matrix} P (E_{1}^{i}) = P (Error | f (D) > T) \\ \leq \frac{1}{2} (P (f (D) + Z \in [T (1 - β_{1}), T (1 + β_{2})] | f (D) > T)) \end{matrix}

(A1)

\begin{matrix} + P (f (D) + Z < T (1 - β_{1}) | f (D) > T) \end{matrix}

(A2)

\begin{matrix} \leq \frac{c_{1}}{2} + P (f (D) + Z < T (1 - β_{1}) | f (D) > T) \leq \frac{c_{1}}{2} + \frac{1}{2} exp (\frac{- T β_{1} ϵ_{i}}{Δ_{1}}) \end{matrix}

(A3)

where the last inequality follows from the Laplacian tail bound and the fact that

f (D) > T

, and we have defined

c_{1}

as

c_{1} : = P (f (D) + Z \in [T (1 - β_{1}), T (1 + β_{2})] | f (D) > T)

.

The upper bound on

P [f (D) + Z < T (1 - β_{1}) | f (D) > T]

is obtained from the Laplace tail bound as follows:

\begin{matrix} P [f (D) + Z < T (1 - β_{1}) | f (D) > T] = P [Z < T (1 - β_{1}) - f (D)] \\ = \frac{1}{2} exp (\frac{T - T β_{1} - f (D)}{Δ_{1} / ϵ_{i}}) \leq \frac{1}{2} exp (\frac{T - T β_{1} - T}{Δ_{1} / ϵ_{i}}) \\ = \frac{1}{2} exp (\frac{- T β_{1} ϵ_{i}}{Δ_{1}}) . \end{matrix}

(A4)

Type-II Error: Next, we analyze the type-II error relative to the unperturbed CI test, i.e., when the differentially private algorithm deletes an edge given that the unperturbed CI test statistic keeps the edge (

f (D) < T

). Mathematically,

P [E_{2}^{i}] = P (Error | f (D) < T) .

The type-II error occurs only for cases (a) and (c). Therefore, we can bound the type-II error as follows:

\begin{matrix} P (E_{2}^{i}) = P (Error | f (D) < T) \\ \leq \frac{1}{2} (P (f (D) + Z \in [T (1 - β_{1}), T (1 + β_{2})] | f (D) < T)) \end{matrix}

(A5)

\begin{matrix} + P (f (D) + Z > T (1 + β_{2}) | f (D) < T) \end{matrix}

(A6)

\begin{matrix} \leq \frac{c_{2}}{2} + P (f (D) + Z > T (1 + β_{2}) | f (D) < T) \end{matrix}

(A7)

\begin{matrix} \leq \frac{c_{2}}{2} + \frac{1}{2} exp (\frac{- T β_{2} ϵ_{i}}{Δ_{1}}) \end{matrix}

(A8)

where the last inequality follows from the Laplacian tail bound and the fact that

f (D) < T

, and we have defined

c_{2}

as

c_{2} : = P (f (D) + Z \in [T (1 - β_{1}), T (1 + β_{2})] | f (D) < T)

. The probability

P [f (D) + Z > T (1 + β_{2}) | f (D) < T]

can also be upper bounded as

\begin{matrix} P [f (D) & + Z > T (1 + β_{2}) | f (D) < T] = P [Z > T (1 + β_{2}) - f (D)] \end{matrix}

\begin{matrix} = \frac{1}{2} exp (- \frac{T + T β_{2} - f (D)}{Δ_{1} / ϵ_{i}}) \leq \frac{1}{2} exp (- \frac{T + T β_{2} - T}{Δ_{1} / ϵ_{i}}) \end{matrix}

(A9)

\begin{matrix} = \frac{1}{2} exp (\frac{- T β_{2} ϵ_{i}}{Δ_{1}}) . \end{matrix}

(A10)

This concludes the proof of Lemma 1.

Appendix A.2. Sensitivity Analysis of Weighted Kendall’s τ

Conditional independence (CI) tests in causal graph discovery (CGD) measure the dependence of one variable (

v_{a}

) on another (

v_{b}

) conditioned on a set of variables. Let the CI test statistic for connected variable pairs (

v_{a}, v_{b}

) in graph

G

be

τ (D)

for dataset

D

and

τ (D^{'})

for dataset

D^{'}

. For large samples, the test statistic

τ (\cdot)

follows a Gaussian distribution. Therefore, the sensitivity can be defined as follows:

\begin{matrix} Δ_{1} (Φ (τ (D))) & = sup_{D \neq D^{'}} | Φ (τ (D)) - Φ (τ (D^{'})) | \leq Δ (Φ (\cdot)) \times Δ (τ (\cdot)) \\ = sup_{D \neq D^{'}} \frac{| Φ (τ (D)) - Φ (τ (D^{'})) |}{| τ (D) - τ (D^{'}) |} \times | τ (D) - τ (D^{'}) | \\ \leq L_{Φ} \times sup | τ (D) - τ (D^{'}) | \end{matrix}

(A11)

where

{sup}_{D \neq D^{'}} | τ (D) - τ (D^{'}) |

is the

l_{1}

-sensitivity of the CI test statistic for datasets

D

and

D^{'}

, with

Φ

as the PDF of the standard normal distribution. Because

Φ (\cdot)

is differentiable, the Lipschitz constant (

L_{Φ}

) can be upper bounded as

L_{Φ} \leq \frac{1}{\sqrt{2 π}}

. Thus, the sensitivity can easily be calculated using the sensitivity of the weighted test statistic.

$l_{1}$ -Sensitivity Analysis: For large sample sizes (

n > > 1

), Kendall’s

τ

test statistic follows a Gaussian distribution with zero mean and variance

\frac{2 (2 n + 5)}{9 n (n - 1)}

, where n is the number of i.i.d. samples. Given a dataset

D

with d features, the conditional dependence of between variables (

v_{a}, v_{b}

) conditioned on set S can be measured with Kendall’s

τ

as a CI test statistic. For instance, the data can be split into k bins according to the unique values of set S. For each

i^{t h}

bin, the test statistic

τ_{i}

is calculated; the weighted average of all

τ_{i}

values represents the test statistic for the entire dataset. The weighted average [20] is defined as

τ = \frac{\sum_{i = 1}^{k} w_{i} τ_{i}}{\sqrt{\sum_{i = 1}^{k} w_{i}}}

, where

w_{i}

is the inverse of the variance

w_{i} = \frac{9 n_{i} (n_{i} - 1)}{2 (2 n_{i} + 5)}

.

As we perturb the p-value obtained from this weighted test statistic, we need to observe the

l_{1}

-sensitivity of p-value. For the scope of this paper, we consider the Lipschitz constant of Gaussian distribution while calculating the sensitivity.

The weighted average

τ

essentially follows the standard normal distribution, i.e.,

τ \sim N (0, 1)

. Hence, the

l_{1}

-sensitivity of p-value can be defined as

\begin{matrix} Δ_{1} & = | Φ (τ (D) - Φ (τ (D^{'}) | = \frac{| Φ (τ (D) - Φ (τ (D^{'}) |}{| τ (D) - τ (D^{'}) |} \times | τ (D) - τ (D^{'}) | \\ \leq L_{Φ} | τ (D) - τ (D^{'}) | \leq \frac{1}{\sqrt{2 π}} | τ (D) - τ (D^{'}) | . \end{matrix}

(A12)

The sensitivity of this weighted Kendall’s

τ

can be expressed as

\begin{matrix} Δ_{1} (τ) & = max_{| D^{'} - D | \leq 1} | τ (D^{'}) - τ (D) | \leq Δ_{1} (τ_{i}) Δ_{1} (w_{i}) . \end{matrix}

The sensitivity of

τ_{i}

depends upon the number of elements

n_{i}

and

Δ_{1} (τ_{i}) \leq \frac{2}{n_{i} - 1}

[20]. The sensitivity of weights

Δ (w_{i})

can be represented as follows:

\begin{matrix} Δ_{1} (w_{i}) & \leq |\frac{w_{i}^{'}}{\sqrt{\sum_{i \neq j}^{k} w_{j} + w_{i}^{'}}} - \frac{w_{i}}{\sqrt{\sum_{j = 1}^{k} w_{j}}}| \leq |\frac{\frac{9 n_{i} (n_{i} + 1)}{2 (2 (n_{i} + 1) + 5)}}{\sqrt{\sum_{j = 1}^{k} w_{j} + w_{i}^{'}}}| - |\frac{\frac{9 n_{i} (n_{i} - 1)}{2 (2 n_{i} + 5)}}{\sqrt{\sum_{j = 1}^{k}} w_{j}}| . \end{matrix}

(A13)

We can provide an upper bound on Equation (A13) through the triangle inequality, and the sensitivity of the weight can be bounded as

\begin{matrix} Δ_{1} (w_{i}) \leq \sqrt{\frac{2}{n}} (|\frac{9 n_{i} (n_{i} + 1)}{2 (2 (n_{i} + 1) + 5)}| - |\frac{9 n_{i}^{2}}{2 (2 n_{i} + 5)}|) . \end{matrix}

(A14)

The sensitivity

Δ (τ)

essentially depends upon the number of elements in the

i^{t h}

bin (the bin that changes due to the addition or removal of a single user). For a dataset with a block size of at least size c and

k c \approx n

, the overall sensitivity for the p-value can be bounded with Equations (A12) and (A14) as follows:

\begin{matrix} Δ_{1} & \leq \frac{1}{\sqrt{2 π}} \times \frac{2}{n_{i} - 1} \times \sqrt{\frac{2}{n}} \times (|\frac{9 n_{i} (n_{i} + 1)}{2 (2 (n_{i} + 1) + 5)}| - |\frac{9 n_{i}^{2}}{2 (2 n_{i} + 5)}|) \\ = \frac{2}{\sqrt{n π}} (\frac{|\frac{9 n_{i} (n_{i} + 1)}{2 (2 (n_{i} + 1) + 5)}| - |\frac{9 n_{i}^{2}}{2 (2 n_{i} + 5)}|}{n_{i} - 1}) . \end{matrix}

(A15)

This concludes the

l_{1}

-sensitivity analysis of the weighted Kendall’s

τ

coefficient.

Appendix A.3. Proof of Lemma 2

Finally, we analyze the methods adopted for the class of score-based algorithms. Below, we present the proof of Lemma 2. The main objective is to derive the relationship between the total privacy leakage

ϵ_{Total}

, number of iterations I, and the initial privacy budget

ϵ_{0}

for the score-based CURATE algorithm. We first demonstrate the possible number of iterations for the additive, multiplicative, and exponential incrementing methods for the score-based CURATE algorithm.

Additive Incrementing Method: This method increments the privacy budget for each iteration

ϵ_{i}

as a function of the current number of iterations i, total assigned privacy budget

ϵ_{Total}

, and initial privacy budget

ϵ_{0}

. Mathematically, for every

i^{t h}

iteration, this method increments the privacy budget for each iteration as

ϵ_{i} = ϵ_{0} (1 + \frac{i}{I_{add}})

. Given a total privacy budget

ϵ_{Total}

, initial privacy budget

ϵ_{0}

, and number of iterations

I_{add}

, we can define

ϵ_{Total}

as

\begin{matrix} ϵ_{Total} = \frac{I_{add}}{2} [2 ϵ_{0} + (I_{add} - 1) \frac{ϵ_{0}}{I_{add}}] \\ I_{add} = \frac{ϵ_{Total} + \frac{ϵ_{0}}{2}}{ϵ_{0} + \frac{ϵ_{0}}{2}} . \end{matrix}

Exponential Incrementing Method: This method increments the per-iteration privacy budget as an exponential function of the initial budget

ϵ_{0}

and current number of iterations i. For every

i^{t h}

iteration, the exponential incrementing method defines the privacy budget as

ϵ_{i} = ϵ_{0} \times exp (\frac{i}{I_{\exp}})

. Given a total privacy budget

ϵ_{Total}

, initial privacy budget

ϵ_{0}

, and possible number of iterations

I_{\exp}

, we define

ϵ_{Total}

as

\begin{matrix} exp (0) \leq exp (1 / I_{\exp}) \leq \dots \leq exp (I_{\exp} / I_{\exp}) \\ \sum_{i = 0}^{I_{\exp}} exp (\frac{i}{I_{\exp}}) \leq I_{\exp} exp (1) . \end{matrix}

To maintain the total privacy budget of

ϵ_{Total}

, we can define the relationship between

I_{\exp}, ϵ_{Total}

, and

ϵ_{0}

as

\begin{matrix} ϵ_{Total} \geq I_{\exp} \times exp (1) \times ϵ_{0} \\ I_{\exp} \leq \frac{ϵ_{Total}}{ϵ_{0} \times exp (1)} . \end{matrix}

Multiplicative Incrementing Method: This method enables the algorithm to increment the per-iteration privacy budget

ϵ_{i}

as a multiplicative function of the initial budget

ϵ_{0}

and the current number of iterations. For every

i^{t h}

iteration, the per-iteration privacy budget

ϵ_{i}

is defined as

ϵ_{i} = ϵ_{0}^{(1 + \frac{i}{I_{mul}})}

. In this method, the possible number of iterations

I_{mul}

depends on the value of the factor

ϵ_{0}^{1 / I_{mul}}

. If

ϵ_{0}^{1 / I_{mul}} \leq 1

, then

ϵ_{0} \leq 1

, which indicates a high-privacy regime; otherwise, there is a low-privacy regime where

ϵ_{0}^{1 / I_{mul}} \geq 1

and

ϵ_{0} \geq 1

. For the high-privacy regime

ϵ_{0} \leq 1

, we define the total leakage

ϵ_{Total}

as

\begin{matrix} ϵ_{Total} & = \frac{ϵ_{0} (1 - ϵ_{0}^{\frac{1}{I_{mul}} \times I_{mul}})}{1 - ϵ_{0}^{1 / I_{mul}}} = \frac{ϵ_{0} (1 - ϵ_{0})}{1 - ϵ_{0}^{1 / I_{mul}}} \\ I_{mul} & = \frac{log (ϵ_{0})}{log (1 - \frac{ϵ_{0} (1 - ϵ_{0})}{ϵ_{Total}})} . \end{matrix}

(A16)

For the case with initial privacy budget

ϵ_{0} > 1

, we can derive the expression of

I_{mul}

as follows:

\begin{matrix} ϵ_{Total} & = \frac{ϵ_{0} (ϵ_{0}^{\frac{1}{I_{mul}} \times I_{mul}} - 1)}{ϵ_{0}^{\frac{1}{I_{mul}}} - 1} = \frac{ϵ_{0} (ϵ_{0} - 1)}{ϵ_{0}^{\frac{1}{I_{mul}}} - 1} \\ I_{mul} & = \frac{log (ϵ_{0})}{log (\frac{ϵ_{0} (ϵ_{0} - 1)}{ϵ_{Total}} + 1)} . \end{matrix}

(A17)

This concludes the proof of Lemma 2.

References

Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search; Springer: New York, NY, USA, 1993; Volume 81. [Google Scholar] [CrossRef]
Sachs, K.; Perez, O.; Pe’er, D.; Lauffenburger, D.A.; Nolan, G.P. Causal protein-signaling networks derived from multiparameter single-cell data. Science 2005, 308, 523–529. [Google Scholar] [CrossRef] [PubMed]
Zhang, B.; Gaiteri, C.; Bodea, L.G.; Wang, Z.; McElwee, J.; Podtelezhnikov, A.A.; Zhang, C.; Xie, T.; Tran, L.; Dobrin, R.; et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 2013, 153, 707–720. [Google Scholar] [CrossRef] [PubMed]
Kimmel, K.; Dee, L.E.; Avolio, M.L.; Ferraro, P.J. Causal assumptions and causal inference in ecological experiments. Trends Ecol. Evol. 2021, 36, 1141–1152. [Google Scholar] [CrossRef] [PubMed]
Cordero, J.M.; Cristóbal, V.; Santín, D. Causal inference on education policies: A survey of empirical studies using PISA, TIMSS and PIRLS. J. Econ. Surv. 2018, 32, 878–915. [Google Scholar] [CrossRef]
Atanasov, V.A.; Black, B.S. Shock-based causal inference in corporate finance and accounting research. Crit. Financ. Rev. 2016, 5, 207–304. [Google Scholar] [CrossRef]
Spirtes, P. An Anytime Algorithm for Causal Inference. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, PMLR, Key West, FL, USA, 4–7 January 2001; pp. 278–285. [Google Scholar]
Nogueira, A.R.; Pugnana, A.; Ruggieri, S.; Pedreschi, D.; Gama, J. Methods and tools for causal discovery and causal inference. WIREs Data Min. Knowl. Discov. 2022, 12, e1449. [Google Scholar] [CrossRef]
Mcdonald, J.H. Handbook of Biological Statistics; Sparky House Publishing: Baltimore, MD, USA, 2014. [Google Scholar]
McHugh, M.L. The Chi-square test of independence. Biochem. Medica 2013, 23, 143–149. [Google Scholar] [CrossRef]
Kendall, M.G. A New Measure of Rank Correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
Spearman, C. The proof and measurement of association between two things. By C. Spearman, 1904. Am. J. Psychol. 1987, 100, 441–471. [Google Scholar] [CrossRef] [PubMed]
Heckerman, D.; Geiger, D.; Chickering, D.M. Learning Bayesian networks: The combination of knowledge and statistical data. Mach. Learn. 1995, 20, 197–243. [Google Scholar] [CrossRef]
Kuipers, J.; Moffa, G.; Heckerman, D. Addendum on the scoring of Gaussian directed acyclic graphical models. Ann. Statist. 2014, 42, 1689–1691. [Google Scholar] [CrossRef] [PubMed]
Maxwell Chickering, D.; Heckerman, D. Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables. Mach. Learn. 1997, 29, 181–212. [Google Scholar] [CrossRef]
Bouckaert, R.R. Probabilistic network construction using the minimum description length principle. In European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty; Springer: Berlin/Heidelberg, Germany, 1993; pp. 41–48. [Google Scholar]
Zheng, X.; Aragam, B.; Ravikumar, P.K.; Xing, E.P. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
Murakonda, S.K.; Shokri, R.; Theodorakopoulos, G. Quantifying the privacy risks of learning high-dimensional graphical models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Virtual, 13–15 April 2021; pp. 2287–2295. [Google Scholar]
Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our Data, Ourselves: Privacy Via Distributed Noise Generation. In Advances in Cryptology—EUROCRYPT 2006; Lecture Notes in Computer Science; Vaudenay, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 486–503. [Google Scholar] [CrossRef]
Wang, L.; Pang, Q.; Song, D. Towards practical differentially private causal graph discovery. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 5516–5526. [Google Scholar]
Xu, D.; Yuan, S.; Wu, X. Differential Privacy Preserving Causal Graph Discovery. In Proceedings of the Computer Science and Computer Engineering Faculty Publications and Presentations, Washington, DC, USA, 1–4 August 2017. [Google Scholar] [CrossRef]
Ma, P.; Ji, Z.; Pang, Q.; Wang, S. NoLeaks: Differentially Private Causal Discovery Under Functional Causal Model. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2324–2338. [Google Scholar] [CrossRef]
Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography; Lecture Notes in Computer Science; Halevi, S., Rabin, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 265–284. [Google Scholar] [CrossRef]
Zanga, A.; Ozkirimli, E.; Stella, F. A survey on causal discovery: Theory and practice. Int. J. Approx. Reason. 2022, 151, 101–129. [Google Scholar] [CrossRef]
Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends^® Theor. Comput. Sci. 2013, 9, 211–407. [Google Scholar] [CrossRef]
Balle, B.; Barthe, G.; Gaboardi, M. Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
Chickering, D.M. Learning Bayesian networks is NP-complete. In Learning from Data: Artificial Intelligence and Statistics V; Springer: New York, NY, USA, 1996; pp. 121–130. [Google Scholar]
Dwork, C.; Lei, J. Differential privacy and robust statistics. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09, New York, NY, USA, 31 May–2 June 2009; pp. 371–380. [Google Scholar] [CrossRef]
Dwork, C.; Rothblum, G.N.; Vadhan, S. Boosting and Differential Privacy. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, Las Vegas, NV, USA, 23–26 October 2010; pp. 51–60. [Google Scholar] [CrossRef]
Kairouz, P.; Oh, S.; Viswanath, P. The Composition Theorem for Differential Privacy. IEEE Trans. Inf. Theory 2017, 63, 4037–4049. [Google Scholar] [CrossRef]
Rogers, R.M.; Roth, A.; Ullman, J.; Vadhan, S. Privacy Odometers and Filters: Pay-as-you-Go Composition. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29. [Google Scholar]
Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318. [Google Scholar] [CrossRef]
Lee, J.; Kifer, D. Concentrated differentially private gradient descent with adaptive per-iteration privacy budget. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1656–1665. [Google Scholar]
Zhang, X.; Ding, J.; Wu, M.; Wong, S.T.; Van Nguyen, H.; Pan, M. Adaptive privacy preserving deep learning algorithms for medical data. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 1169–1178. [Google Scholar]
Chen, L.; Yue, D.; Ding, X.; Wang, Z.; Choo, K.K.R.; Jin, H. Differentially private deep learning with dynamic privacy budget allocation and adaptive optimization. IEEE Trans. Inf. Forensics Secur. 2023, 18, 4422–4435. [Google Scholar] [CrossRef]
Korb, K.B.; Nicholson, A.E. Bayesian Artificial Intelligence; CRC Press: Boca Raton, FL, USA, 2010. [Google Scholar]
Bernardo, J.M.; Berger, J.O.; Dawid, A.P.; Smith, A.F.M.; Bernardo, J.M.; Berger, J.O.; Dawid, A.P.; Smith, A.F.M. (Eds.) Bayesian Statistics 4: Proceedings of the Fourth Valencia International Meeting: Dedicated to the memory of Morris H. DeGroot, 1931–1989: April 15–20, 1991; Oxford University Press: Oxford, UK; New York, NY, USA, 1992. [Google Scholar]
Lauritzen, S.L.; Spiegelhalter, D.J. Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. J. R. Stat. Soc. Ser. (Methodol.) 1988, 50, 157–224. [Google Scholar] [CrossRef]
Scutari, M.; Denis, J.B. Bayesian Networks: With Examples in R; Chapman and Hall/CRC: New York, NY, USA, 2014. [Google Scholar] [CrossRef]

Figure 1. Generic workflow of constraint-based CGD algorithms, showing the skeleton orientation phases. The skeleton phase starts with a fully connected graph consisting of d nodes, where d is the number of features/variables and

k_{i}

is the maximum number of CI tests in order i. The sequence and number of tests in any order i are dependent on the outcomes of the order

(i - 1)

tests. Notably, the skeleton phase is prone to privacy leakage.

Figure 1. Generic workflow of constraint-based CGD algorithms, showing the skeleton orientation phases. The skeleton phase starts with a fully connected graph consisting of d nodes, where d is the number of features/variables and

k_{i}

is the maximum number of CI tests in order i. The sequence and number of tests in any order i are dependent on the outcomes of the order

(i - 1)

tests. Notably, the skeleton phase is prone to privacy leakage.

Figure 2. The composition mechanism in constraint-based CURATE across all orders of CI tests. For every order (i), the total privacy leakage is calculated with advanced composition, as the privacy budgets and failure probabilities for all order-

(i)

tests are the same. The total leakage across all orders is then calculated by constraint-based CURATE using basic composition.

Figure 2. The composition mechanism in constraint-based CURATE across all orders of CI tests. For every order (i), the total privacy leakage is calculated with advanced composition, as the privacy budgets and failure probabilities for all order-

(i)

tests are the same. The total leakage across all orders is then calculated by constraint-based CURATE using basic composition.

Figure 3. The threshold marginalization mechanism adopted in the constraint-based CURATE algorithm. The margins

β_{1}, β_{2}

allow for additional flexibility during hypothesis testing with noisy CI tests.

Figure 3. The threshold marginalization mechanism adopted in the constraint-based CURATE algorithm. The margins

β_{1}, β_{2}

allow for additional flexibility during hypothesis testing with noisy CI tests.

Figure 4. Possible number of iterations I given a total amount of privacy budget

ϵ_{Total}

and initial privacy budget

ϵ_{0}

. For varied total privacy budget

ϵ_{Total} = 0.1, ϵ_{Total} = 1.0

,

ϵ_{Total} = 10.0

and different initial budget

ϵ_{0} < < 1.0

and

ϵ_{0} > 1.0

, it can be observed that the multiplicative method executes more iterations in the high-privacy regime (i.e.,

ϵ_{0} < < 1.0

).

Figure 4. Possible number of iterations I given a total amount of privacy budget

ϵ_{Total}

and initial privacy budget

ϵ_{0}

. For varied total privacy budget

ϵ_{Total} = 0.1, ϵ_{Total} = 1.0

,

ϵ_{Total} = 10.0

and different initial budget

ϵ_{0} < < 1.0

and

ϵ_{0} > 1.0

, it can be observed that the multiplicative method executes more iterations in the high-privacy regime (i.e.,

ϵ_{0} < < 1.0

).

Figure 5. Dataset description and CGD results for the non-private PC algorithm [1] on six public CGD datasets with Kendall’s

τ

CI test statistic. The results were obtained with the following parameters: subsampling rate

= 1.0

, test threshold

= 0.05)

.

Figure 5. Dataset description and CGD results for the non-private PC algorithm [1] on six public CGD datasets with Kendall’s

τ

CI test statistic. The results were obtained with the following parameters: subsampling rate

= 1.0

, test threshold

= 0.05)

.

Figure 6. Part (a) presents the performance evaluation results of the differentially private CGD algorithms (EM-PC [21], SVT-PC, Priv-PC [20], NOLEAKS [22], and both score-based and constraint-based CURATE) in terms of total leakage vs. F1 score on six public CGD datasets: Cancer, Earthquake, Survey, Asia, Sachs, and Child. Part (b) presents the mean and standard deviation of the F1-score for 50 consecutive runs and for three privacy regimes (

ϵ_{Total} = 0.1

,

ϵ_{Total} = 5.0

,

ϵ_{Total} = 10.0

).

Figure 6. Part (a) presents the performance evaluation results of the differentially private CGD algorithms (EM-PC [21], SVT-PC, Priv-PC [20], NOLEAKS [22], and both score-based and constraint-based CURATE) in terms of total leakage vs. F1 score on six public CGD datasets: Cancer, Earthquake, Survey, Asia, Sachs, and Child. Part (b) presents the mean and standard deviation of the F1-score for 50 consecutive runs and for three privacy regimes (

ϵ_{Total} = 0.1

,

ϵ_{Total} = 5.0

,

ϵ_{Total} = 10.0

).

Figure 7. Average number of CI tests needed to achieve the maximum F1-score with a comparatively large amount of total leakage (

ϵ_{Total} = 1.0

) on the Cancer, Earthquake, Survey, Asia, Sachs, and Child datasets. The average CI tests of CURATE converges to that of the non-private PC algorithm, whereas EM-PC [17], Priv-PC, and SVT-PC [20] tend to run more CI tests.

Figure 7. Average number of CI tests needed to achieve the maximum F1-score with a comparatively large amount of total leakage (

ϵ_{Total} = 1.0

) on the Cancer, Earthquake, Survey, Asia, Sachs, and Child datasets. The average CI tests of CURATE converges to that of the non-private PC algorithm, whereas EM-PC [17], Priv-PC, and SVT-PC [20] tend to run more CI tests.

Figure 8. Running time comparison (in seconds) of differentially private constraint-based and score-based algorithms on six public CGD datasets: Cancer, Earthquake, Survey, Asia, Sachs, and Child for 50 consecutive iterations.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bhattacharjee, P.; Tandon, R. CURATE: Scaling-Up Differentially Private Causal Graph Discovery. Entropy 2024, 26, 946. https://doi.org/10.3390/e26110946

AMA Style

Bhattacharjee P, Tandon R. CURATE: Scaling-Up Differentially Private Causal Graph Discovery. Entropy. 2024; 26(11):946. https://doi.org/10.3390/e26110946

Chicago/Turabian Style

Bhattacharjee, Payel, and Ravi Tandon. 2024. "CURATE: Scaling-Up Differentially Private Causal Graph Discovery" Entropy 26, no. 11: 946. https://doi.org/10.3390/e26110946

APA Style

Bhattacharjee, P., & Tandon, R. (2024). CURATE: Scaling-Up Differentially Private Causal Graph Discovery. Entropy, 26(11), 946. https://doi.org/10.3390/e26110946

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

CURATE: Scaling-Up Differentially Private Causal Graph Discovery

Abstract

1. Introduction

2. Preliminaries on CGD and DP

3. Adaptive Differential Privacy in Causal Graph Discovery

3.1. Adaptive Privacy Budget Allocation with Constraint-Based CURATE Algorithm

3.2. Adaptive Privacy Budget Allocation with Score-Based CURATE Algorithm

4. Results and Discussion

5. Conclusions

6. Remarks

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A

Appendix A.1. Proof of Lemma 1

Appendix A.2. Sensitivity Analysis of Weighted Kendall’s τ

Appendix A.3. Proof of Lemma 2

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI