CURATE: Scaling-Up Differentially Private Causal Graph Discovery
Abstract
1. Introduction
- Our proposed CURATE framework scales up the utility of the CGD process through adaptive privacy budget allocation. Among constraint-based DP-CGD algorithms, the constraint-based CURATE algorithm optimizes the privacy budget for each order of conditional independence (CI) tests (all CI tests of the same order share the same privacy budget) in a principled manner, with the goal of minimizing a surrogate for the total probability of error. By allocating adaptive (and often comparatively higher) privacy budgets to the initial CI tests, CURATE achieves better overall predictive performance with less total leakage than existing constraint-based DP-CGD algorithms.
- We present a score-based CURATE algorithm that uses adaptive budgeting to maximize the number of iterations under a fixed privacy budget. The algorithm follows a functional causal model-based optimization approach and allocates a higher privacy budget to later iterations: the per-iteration budget is incremented as a function of the iteration, which helps score-based CURATE achieve better utility than existing works.
- We present extensive experimental results on six public CGD datasets, comparing the predictive performance of the proposed CURATE framework with existing DP-CGD algorithms. The results show that CURATE achieves better predictive performance with orders of magnitude less leakage. The average number of CI tests required by constraint-based CURATE is also significantly lower than that of existing constraint-based DP-CGD algorithms.
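The two budgeting ideas in the contributions above can be illustrated with a minimal sketch. It is not the optimization that CURATE solves: the geometric decay, the linear increment, and all function and parameter names (`per_order_budgets`, `decay`, `increasing_schedule`, `step`) are assumptions chosen for illustration. The sketch only shows (i) one shared budget per CI-test order, with total leakage accumulated under basic sequential composition, and (ii) a per-iteration budget that grows until a fixed total is exhausted.

```python
from typing import List


def per_order_budgets(eps_first: float, decay: float, max_order: int) -> List[float]:
    """Hypothetical schedule: order-0 CI tests get eps_first, and each
    higher order gets `decay` times the budget of the previous order."""
    return [eps_first * decay ** k for k in range(max_order + 1)]


def total_leakage(budgets: List[float], tests_per_order: List[int]) -> float:
    """Basic sequential composition: the budgets of all executed CI tests
    add up. All tests of one order share the same budget."""
    return sum(eps * n for eps, n in zip(budgets, tests_per_order))


def increasing_schedule(eps_total: float, eps_start: float, step: float) -> List[float]:
    """Hypothetical score-based schedule: the per-iteration budget starts at
    eps_start and grows by `step` each iteration; iterations are added while
    the cumulative budget stays within eps_total."""
    if eps_start <= 0 or step < 0:
        raise ValueError("eps_start must be positive and step non-negative")
    schedule: List[float] = []
    spent = 0.0
    eps = eps_start
    while spent + eps <= eps_total:
        schedule.append(eps)
        spent += eps
        eps += step
    return schedule


if __name__ == "__main__":
    budgets = per_order_budgets(eps_first=1.0, decay=0.5, max_order=2)
    print(budgets, total_leakage(budgets, tests_per_order=[3, 2, 1]))
    print(increasing_schedule(eps_total=1.0, eps_start=0.125, step=0.125))
```

Under this toy schedule, giving early orders a larger budget costs more per test but, as the text argues, errors in the initial CI tests are the ones a constraint-based method can least afford. In CURATE itself, the budgets come from minimizing the error surrogate rather than from a fixed decay factor.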
2. Preliminaries on CGD and DP
3. Adaptive Differential Privacy in Causal Graph Discovery
3.1. Adaptive Privacy Budget Allocation with Constraint-Based CURATE Algorithm
- If delete edge ()
- Else, if keep edge ()
- Else, keep the edge with probability
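The three-way rule above (delete, keep, or keep with some probability) can be sketched as follows. The conditions and the probability are not recoverable from the text here, so everything specific in this sketch is an assumption: the thresholds `lower` and `upper`, the direction of the comparisons, and the parameter `keep_prob` are hypothetical placeholders, not the values used in Algorithm 1.

```python
import random


def decide_edge(noisy_stat: float, lower: float, upper: float,
                keep_prob: float, rng: random.Random) -> bool:
    """Return True to keep the edge and False to delete it.

    Assumed convention (hypothetical): a noisy test statistic above `upper`
    deletes the edge, one below `lower` keeps it, and anything in between is
    resolved by a biased coin that keeps the edge with probability keep_prob.
    """
    if noisy_stat > upper:
        return False  # confident region: delete edge
    if noisy_stat < lower:
        return True   # confident region: keep edge
    # Uncertain region: randomized decision.
    return rng.random() < keep_prob


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for stat in (0.5, 1.5, 2.5):
        print(stat, decide_edge(stat, lower=1.0, upper=2.0, keep_prob=0.5, rng=rng))
```

Passing the random generator explicitly keeps the randomized branch testable; the margins between `lower` and `upper` are where the added privacy noise makes the test outcome unreliable.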
Algorithm 1: Constraint-based CURATE Algorithm
3.2. Adaptive Privacy Budget Allocation with Score-Based CURATE Algorithm
Algorithm 2: Adaptive Priv-Minimize
4. Results and Discussion
5. Conclusions
6. Remarks
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Proof of Lemma 1
- If delete edge
- If keep edge
- Else keep edge () with probability .
Appendix A.2. Sensitivity Analysis of Weighted Kendall’s τ
Appendix A.3. Proof of Lemma 2
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Bhattacharjee, P.; Tandon, R. CURATE: Scaling-Up Differentially Private Causal Graph Discovery. Entropy 2024, 26, 946. https://doi.org/10.3390/e26110946