Causal Bandits with General Causal Models and Interventions

Zirui Yan, Dennis Wei, Dmitriy A Katz, Prasanna Sattigeri, Ali Tajer
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4609-4617, 2024.

Abstract

This paper considers causal bandits (CBs) for the sequential design of interventions in a causal system. The objective is to optimize a reward function via minimizing a measure of cumulative regret with respect to the best sequence of interventions in hindsight. The paper advances the results on CBs in three directions. First, the structural causal models (SCMs) are assumed to be unknown and drawn arbitrarily from a general class $\mathcal{F}$ of Lipschitz-continuous functions. Existing results are often focused on (generalized) linear SCMs. Second, the interventions are assumed to be generalized soft with any desired level of granularity, resulting in an infinite number of possible interventions. The existing literature, in contrast, generally adopts atomic and hard interventions. Third, we provide general upper and lower bounds on regret. The upper bounds subsume (and improve) known bounds for special cases. The lower bounds are generally hitherto unknown. These bounds are characterized as functions of the (i) graph parameters, (ii) eluder dimension of the space of SCMs, denoted by $\mathrm{dim}(\mathcal{F})$, and (iii) the covering number of the function space, denoted by $\mathrm{cn}(\mathcal{F})$. Specifically, the cumulative achievable regret over horizon $T$ is $\mathcal{O}(K d^{L-1}\sqrt{T\,\mathrm{dim}(\mathcal{F}) \log(\mathrm{cn}(\mathcal{F}))})$, where $K$ is related to the Lipschitz constants, $d$ is the graph’s maximum in-degree, and $L$ is the length of the longest causal path. The upper bound is further refined for special classes of SCMs (neural network, polynomial, and linear), and their corresponding lower bounds are provided.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-yan24a,
  title     = {Causal Bandits with General Causal Models and Interventions},
  author    = {Yan, Zirui and Wei, Dennis and A Katz, Dmitriy and Sattigeri, Prasanna and Tajer, Ali},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {4609--4617},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/yan24a/yan24a.pdf},
  url       = {https://proceedings.mlr.press/v238/yan24a.html}
}
Endnote
%0 Conference Paper
%T Causal Bandits with General Causal Models and Interventions
%A Zirui Yan
%A Dennis Wei
%A Dmitriy A Katz
%A Prasanna Sattigeri
%A Ali Tajer
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-yan24a
%I PMLR
%P 4609--4617
%U https://proceedings.mlr.press/v238/yan24a.html
%V 238
APA
Yan, Z., Wei, D., A Katz, D., Sattigeri, P. & Tajer, A. (2024). Causal Bandits with General Causal Models and Interventions. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4609-4617. Available from https://proceedings.mlr.press/v238/yan24a.html.