A Safe First-Order Method for Pricing-Based Resource Allocation in Safety-Critical Networks

Berkay Turan, Spencer Hutchinson, Mahnoosh Alizadeh

B. Turan, S. Hutchinson, and M. Alizadeh are with the Dept. of ECE, UCSB, Santa Barbara, CA, USA. This work is supported by NSF grant #1847096. E-mails: [email protected], [email protected], [email protected]

Abstract

We introduce a novel algorithm for solving network utility maximization (NUM) problems that arise in resource allocation schemes over networks with known safety-critical constraints, where the constraints form an arbitrary convex and compact feasible set. Inspired by applications where customers’ demand can only be affected through posted prices and real-time two-way communication with customers is not available, we require an algorithm to generate “safe prices”. This means that at no iteration should the realized demand in response to the posted prices violate the safety constraints of the network. Thus, in contrast to existing distributed first-order methods, our algorithm, called safe pricing for NUM (SPNUM), is guaranteed to produce feasible primal iterates at all iterations. At the heart of the algorithm lie two key steps that must go hand in hand to guarantee safety and convergence: 1) applying a projected gradient method on a shrunk feasible set to get the desired demand, and 2) estimating the price response function of the users and determining the price so that the induced demand is close to the desired demand. We ensure safety by adjusting the shrinkage to account for the error between the induced demand and the desired demand. In addition, by gradually reducing the amount of shrinkage and the step size of the gradient method, we prove that the primal iterates produced by the SPNUM achieve a sublinear static regret of ${\cal O}(\log{(T)})$ after $T$ time steps.

I Introduction

Many applications falling within the scope of resource allocation over networks, e.g., power distribution systems [1], congestion control in data networks [2, 3, 4], wireless cellular networks [5], and congestion control in urban traffic networks [6], deal with a multi-user optimization problem that falls under the general umbrella of network utility maximization (NUM) problems. The shared goal in these problems is to safely and efficiently allocate the shared resources to the users, where safety refers to satisfying the constraints of the system that depend on the resource allocation of all the users, and efficiency refers to the total utility of the users for a given resource allocation.

In NUM problems, the user-specific utility functions are assumed to be private to the users and therefore a centralized solution is not possible. Accordingly, distributed optimization methods have become suitable tools thanks to the separable structure of NUM problems [7, 8]. The idea is to decompose the main problem into sub-problems that can be solved by the individual users. The solutions of the sub-problems are then used to solve the main problem [9, 10], and this has been advocated for use in different applications, e.g., [11, 2]. Among the two main types of decomposition methods, primal decomposition methods correspond to a direct allocation of the resources by a central coordinator and solve the primal problem, whereas dual decomposition methods based on the Lagrangian dual problem [12] correspond to resource allocation via pricing and solve the dual problem [7]. Due to the structure of NUM problems, the latter approach has been widely adopted in the literature [7, 13, 14]. Additionally, it gives users the freedom of determining their own demand based on pricing-type signals.

Although there is extensive literature on pricing algorithms based on dual decomposition, the majority of studies focus on linear constraints [13, 14, 15, 16, 17], or on non-linear constraints with the assumption of separability and full user knowledge of these constraints [18, 19, 20]. Furthermore, none of the aforementioned studies propose an iterative pricing algorithm that induces resource demand satisfying the hard constraints of the problem during the iterative optimization process. Instead, these studies only provide bounds on the infeasibility amount of the resource demand (e.g., [14, 16]). Our preliminary work in [17] is an exception, but it is limited to problems with linear inequality constraints characterized by binary matrices. Thus, for resource allocation systems with safety-critical constraints, existing pricing-based solutions can only be realized after convergence to a near-feasible point. Implementing such solutions therefore requires a negotiation process through a two-way communication network, which can be considered impractical in many applications.

The research presented in this paper is motivated by network resource allocation applications in safety-critical systems, where a real-time two-way communication channel with the users is not available. One particularly relevant example of this type of application can be seen in the context of pricing-based electricity demand response. When attempting to change the users’ demand through posted prices, users determine their own electricity consumption to minimize their electricity bill, and no further control on the users’ demand is feasible. As such, these prices must be set such that the realized demand does not violate the physical constraints of the electric grid [21]. This is necessary to ensure the safe and reliable operation of the grid, as violating these physical constraints could have serious reliability implications. As such, many previous works to determine prices either directly solicit all the users’ preferences and solve for the prices centrally [7, 22], or employ distributed optimization methods that require back-and-forth communication with the users to converge to an optimal and grid-safe price [7, 11]. Both categories of methods have proven to be hard to implement in practical setups, motivating new research on solutions that do not require active customer engagement and still retain safe grid operations [23, 24]. In light of this motivating example, the solution we devise to determine a pricing-based solution for NUM involves a number of key considerations:

1. The users themselves determine their own resource demand in response to the prices, with the actual demand only becoming observable ex-post.

2. No negotiation or back-and-forth communication with the users is allowed, and no adjustment (curtailment) of demand is feasible, rendering existing works based on distributed optimization to determine prices inapplicable.

3. The safety-critical hard constraints of the system must not be violated by the users’ resource demand at any time, even when their price response is unknown.

Accordingly, the main challenge this paper aims to overcome is how to determine the prices for resources such that:

1. No preference solicitation or negotiation with the users is required.

2. The induced resource demand of the users at every iteration always satisfies the constraints of the system (i.e., guaranteed primal feasibility).

3. The induced resource demand of the users is efficient, i.e., the total utility earned by the users is maximized (measured through regret bounds).

To this end, in this paper, we develop an iterative pricing algorithm to solve NUM problems with arbitrary convex and compact feasible sets, called safe pricing for NUM (SPNUM). We design our algorithm based solely on the realized demand in response to prices and communicate to the users only the prices for the resources at each iteration. Our contributions can be summarized as follows:

• We introduce a novel algorithm, the SPNUM, for solving NUM problems with arbitrary convex and compact feasible sets through pricing. Our algorithm iteratively designs prices and allows users the freedom of determining their own decision variable based on prices according to their own profit maximization problem (without imposing any iterative variable update rule on the users).

• We characterize a principled way to choose algorithm parameters to guarantee feasible primal iterates at all iterations. Furthermore, we prove that the static regret incurred by the feasible primal iterates produced by the SPNUM, i.e., the cumulative gap between the optimal objective value and the objective function evaluated at the primal iterates, up to time $T$ is bounded by ${\cal O}(\log{(T)})$.

• We numerically evaluate our algorithm to support our theoretical findings and compare its performance to existing first-order distributed methods for NUM problems.

To the best of the authors’ knowledge, no previous work has studied pricing algorithms for NUM problems on arbitrary convex feasible sets that are unknown to the users, even without consideration of safety. While primal-dual algorithms [25, 26, 27, 28] can handle non-separable arbitrary convex feasible sets, they rely on a primal update rule users need to follow in order to converge as opposed to maximizing their own profit based on observed prices. To this end, our contributions extend beyond safety, since SPNUM solves NUM problems on arbitrary convex feasible sets by iteratively designing prices and allowing the users to determine their own resource demand according to their own profit maximization problem.

The primal feasibility and the regret guarantees of the SPNUM result from a combination of two ingredients: 1) given prices and demand at a given instant, we apply a projected gradient method on a shrunk feasible set to get the next desired demand, and 2) we estimate the price response function of the users around the current prices and determine the next prices so that the induced demand is close to the desired demand. To ensure the algorithm behaves as a projected gradient method, the induced demand must be in the strict interior of the feasible set. The algorithm operates on a shrunk feasible set to account for the error between induced and desired demand, and gradually reduces shrinkage and step size to converge to the optimal solution.

Related work: Besides dual (sub)gradient methods, a few other branches of literature study a similar problem to ours. We highlight how those lines of work do not meet our particular design criteria and what differentiates our work from them. Additional details on distributed optimization algorithms and their classifications can be found in the surveys [29, 30].

1. Primal-dual methods: Primal-dual methods tackle multi-user optimization problems with arbitrary convex global constraints by applying a projected gradient descent/ascent on the primal/dual variables of the Lagrangian [25, 26, 27, 28]. The dual variables are updated using the aggregate resource demand information of the users and can be used for pricing of the resources. Therefore the update rule for the dual variables meets our design goals. However, the primal variables, i.e., the resource demand of the users, are updated by applying one step of gradient descent instead of solving for the profit-maximizing optimal demand in response to prices. Accordingly, these algorithms do not resemble the selfish profit-maximizing behavior of the users we adopt in this paper.

2. Projected gradient methods: The main goal of the projected gradient methods is to maintain feasibility by projecting the primal variables on the feasible convex set after each update step. Scholars have extensively studied the convergence properties of the projected gradient methods under different assumptions [9, 31, 32]. On the other hand, the main challenge brought by our setup is that the primal variables are controlled solely by the users and cannot be manipulated (e.g., projected). Even though we can determine a feasible desired resource allocation by means of a projected gradient method, the prices that induce such resource demand are unknown due to the privacy of the utility functions, which brings unique challenges not addressed by the previous literature.

3. Interior point methods: Interior point methods are commonly used to solve inequality-constrained problems by using barrier functions to convert them into a sequence of equality-constrained problems, which are then solved using Newton’s method [33]. While producing feasible iterates, the use of Newton’s method requires the Hessian, which is often not available in practical applications, such as demand response without two-way communications. To address this limitation, previous works such as [34] and [35] have proposed feasible interior point methods that approximate the Hessian using first or second-order information exchange. However, these methods do not match the profit maximization rule we would like to preserve in this paper, which allows users to freely determine their resource consumption in response to posted prices. Closest to our setup and design goals in this paper would be [36, 37], where separable optimization problems with linear constraints are considered. While [36] proposes a Newton-like dual update that approximates the Hessian using first-order information, only the asymptotic convergence of the algorithm is proven and the feasibility of primal iterates is not guaranteed. [37] proposes an interior point method using Lagrangian dual decomposition with theoretical guarantees, but requires the exact Hessian for dual updates.

4. Constrained Online Convex Optimization: The constrained online convex optimization literature (e.g., [38, 39, 40]) aims to minimize regret while establishing bounds on the constraint violation by employing iterative update algorithms on the primal variables. A common method in constrained online convex optimization literature is updating the primal variables (i.e., resource demand of the users) directly using the gradient of the objective function as the feedback. On the contrary, our setup only allows us to update the prices and get the primal variables (i.e., resource demand of the users) as feedback afterward, where the resource demand is determined by the users according to their own profit-maximization problem. This introduces a novel challenge because both the regret and the constraints are evaluated on the primal variables, and we somehow need to set the prices such that 1) the induced demand is in the feasible region and 2) the regret incurred by the induced demand is minimized.

Paper Organization: The remainder of the paper is organized as follows. In Section II, we formalize the problem setup. In Section III, we describe the SPNUM (Algorithm 1) and in Section IV, we prove its feasibility and regret guarantees. In Section V, we provide a numerical study demonstrating the efficacy of the SPNUM.

Notation and Basic Definitions: We denote the set of real numbers by ${\mathbb{R}}$ and the set of non-negative real numbers by ${\mathbb{R}}_{+}$. For vectors, $\|\cdot\|$ denotes the standard Euclidean norm and $\|\cdot\|_{p}$ denotes the $p$-norm. For matrices, $\|\cdot\|$ denotes the matrix norm. Given a positive integer $n$, $[n]$ denotes the set of integers $\{1,2,\dots,n\}$. For two vectors $x,y\in{\mathbb{R}}^{d}$, $\langle x,y\rangle$ denotes the inner product of $x$ and $y$. Given a vector $x=[x_{1}^{\top},~x_{2}^{\top},~\dots,~x_{n}^{\top}]^{\top}\in{\mathbb{R}}^{d}$, $x_{i}\in{\mathbb{R}}^{d_{i}}$ denotes the $i$’th block of $x$. For a matrix $A\in\mathbb{R}^{m\times n}$, $A_{j}$ denotes the $j$’th row of $A$ and $A_{:,j}$ denotes the $j$’th column of $A$. Given a matrix ${A}\in{\mathbb{R}}^{m\times m}$, $\textnormal{diag}(A)\in{\mathbb{R}}^{m}$ is the vector of the diagonal entries of $A$, $\kappa(A)$ is the condition number of $A$, and $\sigma_{\min}(A)$ / $\sigma_{\max}(A)$ are the minimum/maximum singular values of $A$. Given a function $f:{\cal X}\subseteq{\mathbb{R}}^{d}\rightarrow{\mathbb{R}}$, $\nabla f$ denotes the gradient of $f$, $\nabla^{k}f$ denotes the $k$’th order gradient of $f$, and $\textnormal{dom}f$ denotes the domain ${\cal X}$ of $f$. Given two vectors $x,y\in{\mathbb{R}}^{m}$, $x\leq y$ implies element-wise inequality. Given a set ${\cal X}\subset\mathbb{R}^{d}$, ${\cal X}^{\textnormal{int}}$ denotes the interior of ${\cal X}$. Given a convex and compact set ${\cal X}\subset\mathbb{R}^{d}$ and a point $x\in{\mathbb{R}}^{d}$, ${\Pi}_{\cal X}(x)$ denotes the Euclidean projection of ${x}$ onto ${\cal X}$. We denote the closed and the open Euclidean ball with radius $r$ centered at the origin as $\bar{\mathcal{B}}(r)$ and ${\cal B}(r)$, respectively. $I_{d}$ denotes the identity matrix of size $d$, $\bm{1}_{d}$ denotes the vector of all 1’s with dimension $d$, and $e_{i}$ denotes the unit vector with $1$ in the $i$’th dimension and $0$ everywhere else. The nomenclature can be found in the Appendix.

Definition 1.

A differentiable function $f(\cdot)$ is said to be $\bm{\mu}$ -strongly concave over the domain ${\cal X}$ if there exists $\mu>0$ such that

$$\langle\nabla f(x_{2})-\nabla f(x_{1}),x_{1}-x_{2}\rangle\geq\mu\|x_{1}-x_{2}\|^{2} \tag{1}$$

holds for all $x_{1},x_{2}\in\cal X$ .

Definition 2.

A differentiable function $f(\cdot)$ is said to be $\bm{L}$ -smooth over the domain ${\cal X}$ if there exists $L>0$ such that

$$\|\nabla f(x_{1})-\nabla f(x_{2})\|\leq L\|x_{1}-x_{2}\| \tag{2}$$

holds for all $x_{1},x_{2}\in\cal X$ .

Definition 3.

A function $f(\cdot)$ is said to be $\bm{M}$ -Lipschitz continuous over the domain ${\cal X}$ if there exists $M>0$ such that

$$\|f(x_{1})-f(x_{2})\|\leq M\|x_{1}-x_{2}\| \tag{3}$$

holds for all $x_{1},x_{2}\in\cal X$ .

II Problem Setup

We study the standard NUM problem [2], where the goal is to allocate resources to $n$ users subject to a set of coupling constraints such that the total utility of the users is maximized. It can be formulated as the following optimization problem:


$$\begin{aligned}\underset{x\in\textnormal{dom}f\subseteq\mathbb{R}^{d}}{\max}\quad & f(x)=\sum_{i=1}^{n}f_{i}(x_{i}) && \text{(4a)}\\ \textnormal{s.t.}\quad & x\in{\cal X}, && \text{(4b)}\end{aligned}$$

where $f_{i}(\cdot)$ is the concave utility function of user $i$ that depends on the $d_{i}$ -dimensional vector of resource consumption, denoted by $x_{i}\in\textnormal{dom}f_{i}\subseteq{\mathbb{R}}^{d_{i}}$ , and ${\cal X}\subset\mathbb{R}^{d}$ is the convex and compact set of feasible resource allocations. We also have $\sum_{i\in[n]}d_{i}=d$ , $\textnormal{dom}f=\prod_{i\in[n]}\textnormal{dom}f_{i}$ , and define $\bar{d}=\max_{i\in[n]}d_{i}$ .

For all users $i\in[n]$, we define the set ${\cal X}_{i}=\{x_{i}\in{\mathbb{R}}^{d_{i}}:\exists x\in{\cal X}\textnormal{ s.t. }x_{i}\textnormal{ is the }i\textnormal{'th block of }x\}$ as the set of values that user $i$’s resource demand vector can take in the aggregate feasible set ${\cal X}$. Note that since ${\cal X}$ is convex and compact, ${\cal X}_{i}$ is convex and compact, ${\forall i\in[n]}$. Furthermore, if $x\in{\cal X}$, then $x_{i}\in{\cal X}_{i}$, and if $x\in{\cal X}^{\textnormal{int}}$, then $x_{i}\in{\cal X}_{i}^{\textnormal{int}}$, both of which hold by definition. We make the following assumptions on the feasible set ${\cal X}$ and on the utility functions over ${\cal X}_{i}$, $\forall i\in[n]$.

Assumption 1.

The feasible set ${\cal X}$ is a subset of $\textnormal{dom}f$, i.e., ${\cal X}\subseteq\textnormal{dom}f$. The diameter of the feasible set ${\cal X}$ is bounded by $R$, i.e., $\|x-y\|\leq R$, $\forall x,y\in{\cal X}$. The interior of ${\cal X}$ is nonempty, i.e., there exists a vector $\tilde{x}\in{\cal X}^{\textnormal{int}}$.

Assumption 2.

For all $i\in[n]$ , the utility function $f_{i}(\cdot)$ is $\mu$ -strongly concave, $L$ -smooth, $M$ -Lipschitz continuous, and has $\beta$ -smooth gradient over ${\cal X}_{i}$ .

Example 1 (Utility function).

For instance, take $f_{i}(x_{i})=f_{\alpha}(x_{i})$ to be an $\alpha$-fair utility function (see [41]) and let ${\cal X}_{i}=[\underline{x}_{i},\bar{x}_{i}]$ with $\underline{x}_{i}>0$. We have that $\nabla f_{i}(x_{i})\leq 1/\underline{x}_{i}^{\alpha}$, $-\alpha/\underline{x}_{i}^{\alpha+1}\leq\nabla^{2}f_{i}(x_{i})\leq-\alpha/\bar{x}_{i}^{\alpha+1}$, and $\alpha(\alpha+1)/\bar{x}_{i}^{\alpha+2}\leq\nabla^{3}f_{i}(x_{i})\leq\alpha(\alpha+1)/\underline{x}_{i}^{\alpha+2}$, $\forall x_{i}\in{\cal X}_{i}$. Therefore, $f_{i}(x_{i})$ is $\alpha/\bar{x}_{i}^{\alpha+1}$-strongly concave, $\alpha/\underline{x}_{i}^{\alpha+1}$-smooth, and $1/\underline{x}_{i}^{\alpha}$-Lipschitz continuous, and has $\alpha(\alpha+1)/\underline{x}_{i}^{\alpha+2}$-smooth gradient over ${\cal X}_{i}$.
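The constants in Example 1 can be computed directly. The sketch below assumes the standard $\alpha$-fair form $f_{\alpha}(x)=x^{1-\alpha}/(1-\alpha)$ for $\alpha\neq 1$ and $\log x$ for $\alpha=1$ (the source defers the definition to [41]); the function names are hypothetical and chosen only for illustration.

```python
import math


def alpha_fair_utility(x, alpha):
    """Standard alpha-fair utility (assumed form; the source defers to [41])."""
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)


def alpha_fair_constants(alpha, x_lo, x_hi):
    """Constants of Assumption 2 on [x_lo, x_hi] with x_lo > 0, as in Example 1."""
    mu = alpha / x_hi ** (alpha + 1)                   # strong concavity
    L = alpha / x_lo ** (alpha + 1)                    # smoothness
    M = 1.0 / x_lo ** alpha                            # Lipschitz constant
    beta = alpha * (alpha + 1) / x_lo ** (alpha + 2)   # smoothness of the gradient
    return mu, L, M, beta


# Logarithmic utility (alpha = 1) on the interval [1, 2].
mu, L, M, beta = alpha_fair_constants(alpha=1.0, x_lo=1.0, x_hi=2.0)
```

Note how all four constants degrade as the lower bound `x_lo` approaches zero, which is why the example requires $\underline{x}_{i}>0$.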

Under Assumption 2, the objective function (4a) is strongly concave with coefficient $\mu$ . Accordingly, the convex optimization problem (4) has a unique solution denoted by $x^{\star}$ and an optimal objective value denoted by $f^{\star}$ .

Since $f_{i}(\cdot)$ are private to the users, (4) cannot be solved centrally. Therefore, distributed optimization methods based on the dual decomposition framework have been proposed in the literature (e.g., [7] for the case when ${\cal X}$ is a polytope) in order to incentivize selfish users with private utility functions to follow the optimal global solution. The common high-level idea is to divide the main problem into subproblems that can be solved by the individual users upon observing a pricing signal, and iteratively design prices $\{p^{0},p^{1},\dots\}$ to converge to the optimal resource allocation vector $x^{\star}$ . In this framework, upon observing a price $p_{i}\in{\mathbb{R}}^{d_{i}}$ , each user $i\in[n]$ determines their own decision variable according to their own profit maximization problem:

$$g_{i}(p_{i})=\underset{x_{i}\in\textnormal{dom}f_{i}}{\operatorname*{arg\,max}}~f_{i}(x_{i})-\langle p_{i},x_{i}\rangle. \tag{5}$$

We call $g_{i}(\cdot)$ the price response function of user $i$ and let $g(p)=[g_{1}(p_{1})^{\top},~g_{2}(p_{2})^{\top},~\dots,~g_{n}(p_{n})^{\top}]^{\top}$ be the concatenated vector of price responses given a price vector $p\in{\mathbb{R}}^{d}$.
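The structure of the price response can be made explicit under two extra assumptions that are not stated in the text: the maximizer in (5) lies in the interior of $\textnormal{dom}f_{i}$, and $f_{i}$ is twice continuously differentiable. Under these assumptions, a short derivation sketch is:

```latex
\begin{align*}
\nabla f_{i}\big(g_{i}(p_{i})\big) &= p_{i}
  && \text{(first-order optimality for (5))} \\
\nabla^{2} f_{i}\big(g_{i}(p_{i})\big)\,\nabla g_{i}(p_{i}) &= I_{d_{i}}
  && \text{(differentiate both sides with respect to } p_{i}\text{)} \\
\nabla g_{i}(p_{i}) &= \big[\nabla^{2} f_{i}\big(g_{i}(p_{i})\big)\big]^{-1}
\end{align*}
```

Whenever $g_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}$, $\mu$-strong concavity and $L$-smoothness place the eigenvalues of the Hessian between $-L$ and $-\mu$, so under the same assumptions the Jacobian of $g_{i}$ is negative definite with singular values in $[1/L,1/\mu]$.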

In the next section, we propose an algorithm to iteratively design $p^{t},~{}\forall t\geq 1$ , that produce feasible primal solutions, i.e., $x^{t}\in{\cal X},~{}\forall t\geq 1$ , where $x_{i}^{t}=g_{i}(p_{i}^{t})$ is determined by user $i$ through (5). In addition, the algorithm should produce primal iterates that result in a sublinear static regret per user, which is measured by

$$R(T)=\frac{1}{n}\sum_{t=1}^{T}\left(f^{\star}-f(x^{t})\right). \tag{6}$$
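As a small illustration of the regret measure in (6), the sketch below evaluates it for a made-up sequence of objective values. The helper name `static_regret` and the $1/t$ gap sequence are assumptions for illustration only; they are not results from the paper.

```python
import math


def static_regret(f_star, iterate_values, n_users):
    """Per-user static regret (6): (1/n) * sum_t (f_star - f(x^t))."""
    return sum(f_star - value for value in iterate_values) / n_users


# Hypothetical run: the optimality gap at step t decays like 1/t.
T = 1000
f_star = 0.0
values = [f_star - 1.0 / t for t in range(1, T + 1)]
regret = static_regret(f_star, values, n_users=1)
log_bound = 1.0 + math.log(T)  # harmonic sums grow logarithmically in T
```

A gap that decays like $1/t$ accumulates to a harmonic sum, which is one way a logarithmic regret can arise.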

It is worthwhile to highlight that even without the safety criterion, the literature on distributed optimization methods does not provide a distributed solution based on pricing to (4) with any type of convergence guarantees. Existing works in the literature 1) utilize a pricing algorithm based on the dual decomposition framework but consider linear constraints [13, 14, 15, 16, 17] or non-linear and separable constraints known by the users [18, 19, 20], or 2) solve the Lagrangian dual problem by primal-dual methods [25, 26, 27, 28], which restrict the users to follow a primal update method that cannot be enforced in the setting where users only care about maximizing their own profit dictated by (5). Therefore, a pricing algorithm that induces a sequence of primal iterates converging to the optimal solution of (4) with general convex and compact feasible sets ${\cal X}$ is novel in the distributed optimization literature.

Additionally, we note that the definition of regret in (6) quantifies the difference between the efficiencies of the optimal resource allocation and the proposed algorithm up to time $T$ . When the primal iterates $\{x^{t}\}_{t\in[T]}$ are in the feasible set ${\cal X}$ , users’ resource demand can actually be realized through the posted prices without waiting for the convergence of the algorithm, and therefore regret is a meaningful measure. On the other hand, although the above sum is computable for many of the existing works mentioned earlier (e.g., [14, 15] with linear constraints), they do not guarantee feasible primal iterates but only establish bounds on the amount of constraint violation at a given iteration $t$ . Therefore, solutions are only realizable after convergence to a near-feasible point for resource allocation systems with safety-critical constraints. As such, they can be viewed as complex negotiations with users over what their potential demand would be in response to different prices in order to converge to the optimal price, which renders regret a less meaningful measure. By incorporating primal feasibility into our design goals, we aim to continually allocate resources to the users through posted prices during the iterative optimization process and measure the overall efficiency of this process through regret.

III Safe Pricing Algorithm for NUM

In this section, we describe the price update algorithm we propose, called Safe Pricing for NUM (SPNUM), that produces feasible primal iterates satisfying a sublinear regret. To do so, we will use some definitions and results from [42] regarding the geometric properties of convex and compact sets. While the primary focus of [42] centers on a linear stochastic bandit setup that bears little resemblance to the NUM setup under study, the definitions of the shrunk set outlined in the former are applicable to the present context as well.

III-A Geometric Properties of the Feasible Set

The main ingredient that ensures the safety of SPNUM is that it operates on a shrunk feasible set, which is formally defined as follows:

Definition 4.

For a compact set $\mathcal{X}\subset\mathbb{R}^{d}$ and a positive scalar $\Delta\in{\mathbb{R}}_{+}$, we define the shrunk version of $\mathcal{X}$ as $\mathcal{X}_{\Delta}:=\{x\in\mathcal{X}:x+v\in\mathcal{X},\forall v\in\bar{\cal B}(\Delta)\}$.

Example 2.

(Shrunk polytope) Let $A\in{\mathbb{R}}^{m\times d}$ and $\mathcal{X}=\{x\in\mathbb{R}^{d}:Ax\leq c\}$ be a polytope. The shrunk version of ${\cal X}$ is given by ${\cal X}_{\Delta}=\{x\in\mathbb{R}^{d}:A_{j}^{\top}x\leq c_{j}-\Delta\|A_{j}\|,~j\in[m]\}$.
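The formula in Example 2 is easy to apply numerically. The following sketch tightens each row of a polytope by $\Delta\|A_{j}\|$; the function names and the specific polytope are hypothetical and serve only as an illustration.

```python
import math


def shrink_polytope(A, c, delta):
    """Right-hand side of the shrunk polytope {x : A x <= c_shrunk}."""
    return [cj - delta * math.sqrt(sum(a * a for a in row))
            for row, cj in zip(A, c)]


def contains(A, c, x, tol=1e-12):
    """Check A x <= c row by row."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= cj + tol
               for row, cj in zip(A, c))


# Hypothetical example: the box [-1, 1]^2 plus one coupling constraint.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [3.0, 4.0]]
c = [1.0, 1.0, 1.0, 1.0, 5.0]
c_shrunk = shrink_polytope(A, c, delta=0.25)
```

In this example the point $(0.6, 0.6)$ is feasible for the original polytope but lies outside the shrunk one, because it is closer than $0.25$ to the coupling constraint.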

Remark 1.

If ${\cal X}$ is convex and compact, then ${\cal X}_{\Delta}$ is also convex and compact. To see this, note that $\mathcal{X}_{\Delta}$ can equivalently be defined using Minkowski subtraction. The Minkowski subtraction of sets $A,B\subseteq\mathbb{R}^{d}$ is defined as $A\ominus B:=\{x\in\mathbb{R}^{d}:x+b\in A,~\forall b\in B\}$, or equivalently, $A\ominus B=\bigcap_{b\in{B}}(A-b)$. Therefore, $\mathcal{X}_{\Delta}=\mathcal{X}\ominus\bar{\mathcal{B}}(\Delta)$ is an intersection of convex and closed sets and hence is convex and closed [43, Section 3.1]. By Definition 4, ${\cal X}_{\Delta}$ is a subset of ${\cal X}$, and is therefore bounded. A closed and bounded convex set is convex and compact.

Given the above definition of the shrunk version of a set, one can consider the maximum shrinkage that a set can withstand while still being nonempty. We introduce the maximum shrinkage of a set in the following definition.

Definition 5.

For a compact set $\mathcal{X}\subset\mathbb{R}^{d}$ , we define the maximum shrinkage of $\mathcal{X}$ , as $H_{\mathcal{X}}:=\sup\{\Delta:\mathcal{X}_{\Delta}\neq\emptyset\}$ .
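Two simple sets, chosen here purely for illustration (they do not appear in the source), show how Definitions 4 and 5 work together:

```latex
% Euclidean ball of radius r centered at c:
\mathcal{X} = \{x : \|x - c\| \le r\}
\;\Longrightarrow\;
\mathcal{X}_{\Delta} = \{x : \|x - c\| \le r - \Delta\} \ \ (\Delta \le r),
\qquad H_{\mathcal{X}} = r.

% Axis-aligned box:
\mathcal{X} = \prod_{k=1}^{d} [l_{k}, u_{k}]
\;\Longrightarrow\;
\mathcal{X}_{\Delta} = \prod_{k=1}^{d} [l_{k} + \Delta,\; u_{k} - \Delta],
\qquad H_{\mathcal{X}} = \min_{k} \frac{u_{k} - l_{k}}{2}.
```

In both cases the shrunk set becomes empty once $\Delta$ exceeds $H_{\mathcal{X}}$, so any safety margin has to stay below this value.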

III-B Description of the Algorithm

Algorithm 1 Safe Pricing for NUM

1: Input: $p^{0}$, $\Delta^{t}$, $\gamma^{t}$, $\eta^{t}$

2: (Initialization stage):

3: Each user $i\in[n]$ receives $p_{i}^{0}$ and $p_{i}^{-t}=p_{i}^{0}+\eta^{0}e_{1+\mathrm{mod}(t,d_{i})}$, $\forall t\in[d_{i}]$, and solves

$$x_{i}^{t}=g_{i}(p_{i}^{t}),~t=-d_{i},-d_{i}+1,\dots,0. \tag{7}$$

4: For all $i\in[n]$, estimate the Jacobian of $g_{i}$ as:

$$\hat{\nabla}g_{i}^{0}=\left[\frac{x_{i}^{-d_{i}}-x_{i}^{0}}{\eta^{0}},\dots,~\frac{x_{i}^{-1}-x_{i}^{0}}{\eta^{0}}\right] \tag{8}$$

5: for $t=0,1,\dots$ do

6: (Update stage)

7: Compute $\hat{x}^{t+1}=\Pi_{{\cal X}_{\Delta^{t}}}(x^{t}+\gamma^{t}p^{t})$

8: Set ${p}_{i}^{t+1}=p_{i}^{t}+[\hat{\nabla}g_{i}^{t}]^{-1}(\hat{x}_{i}^{t+1}-x_{i}^{t})$, for all $i\in[n]$

9: Each user $i\in[n]$ receives $p_{i}^{t+1}$ and solves

$$x_{i}^{t+1}=g_{i}(p_{i}^{t+1}) \tag{9}$$

10: (Sampling stage)

11: Each user $i\in[n]$ receives $p_{i}^{t+1,s}=p_{i}^{t+1}+\eta^{t+1}e_{1+\mathrm{mod}(t,d_{i})}$ and solves

$$x_{i}^{t+1,s}=g_{i}(p_{i}^{t+1,s}) \tag{10}$$

12: For each user $i\in[n]$:

$$[\hat{\nabla}g_{i}^{t}]_{:,1+\mathrm{mod}(t,d_{i})}\leftarrow(x_{i}^{t+1,s}-x_{i}^{t+1})/{\eta^{t+1}} \tag{11}$$

$$\hat{\nabla}g_{i}^{t+1}=\hat{\nabla}g_{i}^{t} \tag{12}$$

13: end for
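To make the flow of Algorithm 1 concrete, here is a minimal sketch on a toy instance. Everything about the instance is an assumption made for illustration: two users with scalar demand and utilities $f_{i}(x_{i})=a_{i}\log x_{i}$, a Euclidean-ball feasible set, and the schedules $\Delta^{t}$, $\gamma^{t}$, $\eta^{t}$ proportional to $1/(t+1)$. These schedules are not the parameter choices derived in Section IV.

```python
import math

# Toy instance (all values are illustrative assumptions, not from the paper):
# f_i(x_i) = A[i] * log(x_i) and X = {x : ||x - CENTER|| <= RADIUS}.
A = (1.0, 2.0)
CENTER = (2.0, 2.0)
RADIUS = 0.5


def price_response(a, p):
    """Best response (5) for f(x) = a*log(x): argmax_x a*log(x) - p*x = a/p."""
    return a / p


def project_onto_shrunk_ball(y, delta):
    """Euclidean projection onto X_delta, the ball of radius RADIUS - delta."""
    rad = RADIUS - delta
    diff = [yi - ci for yi, ci in zip(y, CENTER)]
    dist = math.hypot(*diff)
    if dist <= rad:
        return list(y)
    return [ci + rad * di / dist for ci, di in zip(CENTER, diff)]


def spnum(num_iters=200, delta0=0.2, gamma0=0.1, eta0=0.01):
    """Run the sketch and return every demand vector the users realize."""
    # Initial prices chosen so that the initial demand is the ball center.
    p = [a / ci for a, ci in zip(A, CENTER)]
    x = [price_response(a, pi) for a, pi in zip(A, p)]
    demands = [list(x)]
    # Initialization stage: one finite-difference sample per user (d_i = 1).
    xs = [price_response(a, pi + eta0) for a, pi in zip(A, p)]
    demands.append(xs)
    jac = [(xsi - xi) / eta0 for xsi, xi in zip(xs, x)]
    for t in range(num_iters):
        delta, gamma = delta0 / (t + 1), gamma0 / (t + 1)
        eta_next = eta0 / (t + 2)
        # Step 7: gradient step (p^t plays the role of the gradient),
        # projected onto the shrunk set.
        y = [xi + gamma * pi for xi, pi in zip(x, p)]
        x_hat = project_onto_shrunk_ball(y, delta)
        # Step 8: invert the linear model of each price response.
        p = [pi + (xh - xi) / ji for pi, xh, xi, ji in zip(p, x_hat, x, jac)]
        # Step 9: users respond to the posted prices.
        x = [price_response(a, pi) for a, pi in zip(A, p)]
        demands.append(list(x))
        # Steps 11-12: perturb the prices and refresh the Jacobian estimates.
        xs = [price_response(a, pi + eta_next) for a, pi in zip(A, p)]
        demands.append(xs)
        jac = [(xsi - xi) / eta_next for xsi, xi in zip(xs, x)]
    return demands


demands = spnum()
max_dist = max(math.hypot(d[0] - CENTER[0], d[1] - CENTER[1]) for d in demands)
```

With these particular values, `max_dist` stays below `RADIUS`, meaning that every realized demand, including the sampling responses, is feasible. With scalar users the Jacobian estimate is a single number that is refreshed at every iteration; for $d_{i}>1$ only one column would be refreshed per iteration, as in Step 12.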

The proposed method, called safe pricing for NUM (SPNUM) and outlined in Algorithm 1, consists of two stages at each iteration: 1) update stage (Step 6) and 2) sampling stage (Step 10). The update stage proceeds similarly to a projected gradient method on the primal iterates while designing prices that induce realized iterates close to a desired iterate. The sampling stage estimates the Jacobians of the price response functions of the users, which are used during the update stage.

In the update stage, the algorithm first determines a desired next iterate $\hat{x}^{t+1}$ in Step 7. However, because the primal variables are not directly controllable, prices that induce $x^{t+1}$ that is close to $\hat{x}^{t+1}$ have to be determined at Step 8. Accordingly, at the heart of the update stage lie two key steps:

1. At iteration $t$, the central coordinator observes $x^{t}$ and determines the next desired iterate $\hat{x}^{t+1}$ by means of a projected gradient ascent step in Step 7. Step 7 is indeed a gradient ascent step because if $x^{t}\in{\cal X}^{\textnormal{int}}$, then $x_{i}^{t}\in{\cal X}_{i}^{\textnormal{int}}$, which implies that $p_{i}^{t}=\nabla f_{i}(x_{i}^{t})$ by Assumption 2 and the first-order optimality condition for (5). Therefore, $p^{t}=\nabla f(x^{t})$. In addition, projection is performed onto a shrunk set ${\cal X}_{\Delta^{t}}$, where $\Delta^{t}$ controls the amount of shrinkage at time $t$. This is the key ingredient to ensure the safety of the algorithm because the uncertainty in the price response functions will cause the actual induced iterate $x^{t+1}$ in response to the price vector $p^{t+1}$ to deviate from the desired iterate $\hat{x}^{t+1}$. By adding this safety margin to the constraint, we can ensure safety if $x^{t+1}-\hat{x}^{t+1}\in\bar{\cal B}(\Delta^{t})$, i.e., $\|x^{t+1}-\hat{x}^{t+1}\|\leq\Delta^{t}$. Finally, by utilizing a diminishing safety margin sequence $\{\Delta^{t}\}_{t\geq 0}$, we can ensure convergence to the optimal solution of (4).

Once the desired next iterate $\hat{x}^{t+1}$ is determined, the central coordinator has to determine $p_{i}^{t+1}$ that would ideally induce $\hat{x}_{i}^{t+1}$ , $\forall i\in[n]$ . However, the price response function is unknown to the central coordinator, and therefore an exact solution is not possible. Instead, the central coordinator makes a linear approximation of the price response function using the Jacobian estimate of $g_{i}$ , $\forall i\in[n]$ . In particular, the central coordinator keeps an estimate of the Jacobian denoted by $\hat{\nabla}g_{i}^{t}$ initialized in Steps 3 and 4 of the algorithm, which is constructed by varying the price vector along each dimension and estimating the gradient using the difference equation. This results in the following linear approximation of the price response function around $p_{i}^{t}$ :

$$\hat{g}_{i}(p)=x_{i}^{t}+\hat{\nabla}g_{i}^{t}(p-p_{i}^{t}). \tag{13}$$

By setting $p={p}_{i}^{t+1}$ and $\hat{g}_{i}({p}_{i}^{t+1})=\hat{x}_{i}^{t+1}$, and rearranging, we get the price update rule in Step 8. This requires $\hat{\nabla}g_{i}^{t}$ to be an invertible matrix, which will be proven in Section IV.
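The price update can be illustrated for a single user with a two-dimensional demand. The sketch below assumes a hypothetical quadratic utility $f(x)=b^{\top}x-\tfrac{1}{2}x^{\top}Qx$ over $\mathbb{R}^{2}$, chosen because its price response $g(p)=Q^{-1}(b-p)$ is linear, so the linear model (13) is exact. All names and numbers are illustrative assumptions.

```python
def solve2(m, v):
    """Solve the 2x2 linear system m z = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - v[0] * m[1][0]) / det]


Q = [[2.0, 0.5], [0.5, 1.0]]  # positive definite, so f is strongly concave
B = [4.0, 3.0]


def response(p):
    """g(p) = argmax_x b.x - 0.5 x'Qx - p.x = Q^{-1} (b - p)."""
    return solve2(Q, [B[0] - p[0], B[1] - p[1]])


def estimate_jacobian(p, eta):
    """Finite-difference Jacobian: column k from perturbing price k by eta."""
    x0 = response(p)
    cols = []
    for k in range(2):
        q = list(p)
        q[k] += eta
        xk = response(q)
        cols.append([(xk[r] - x0[r]) / eta for r in range(2)])
    # cols[k] holds column k; return the matrix in row-major form.
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]


def price_update(p, x, x_hat, jac):
    """Step 8: p_new = p + jac^{-1} (x_hat - x)."""
    step = solve2(jac, [x_hat[0] - x[0], x_hat[1] - x[1]])
    return [p[0] + step[0], p[1] + step[1]]


p = [1.0, 1.0]
x = response(p)
x_hat = [x[0] + 0.1, x[1] - 0.05]      # desired next demand
jac = estimate_jacobian(p, eta=1e-3)
p_new = price_update(p, x, x_hat, jac)
x_new = response(p_new)                # demand actually induced
```

Because the response is linear here, `x_new` matches `x_hat` up to rounding. For a nonlinear response the two differ, which is the error that the shrinkage $\Delta^{t}$ has to absorb.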

After determining $p^{t+1}$ and $x^{t+1}$, the algorithm proceeds to the sampling stage to update the Jacobian estimates. To achieve this, the central coordinator varies the price vector $p_{i}^{t+1}$ along the dimension $1+\mathrm{mod}(t,d_{i})$ in Step 11 for user $i\in[n]$, resulting in a sampling price of ${p_{i}^{t+1,s}}$. The response is observed and denoted as $x_{i}^{t+1,s}$. The difference between $x_{i}^{t+1,s}$ and $x_{i}^{t+1}$ divided by the amount of price variation serves as an estimate of the gradient of the price response function along the $1+\mathrm{mod}(t,d_{i})$’th principal axis, which becomes the $1+\mathrm{mod}(t,d_{i})$’th column of the Jacobian estimate $\hat{\nabla}g_{i}^{t+1}$ in Step 12. It is worthwhile to highlight that for a user $i$, the error between $\hat{x}_{i}^{t+1}$ and $x_{i}^{t+1}$ has two sources: 1) the difference between the estimated Jacobian and the actual Jacobian, i.e., $\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t})$, and 2) the higher-order terms not captured by the linear approximation, i.e., the first-order Taylor remainder $R_{1}(p)=g_{i}(p)-g_{i}(p_{i}^{t})-\nabla g_{i}(p_{i}^{t})(p-p_{i}^{t})$.
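The two error sources can be separated explicitly. The following is a sketch that writes $R_{1}$ for the first-order Taylor remainder of $g_{i}$ around $p_{i}^{t}$ and uses $x_{i}^{t}=g_{i}(p_{i}^{t})$ together with the linear model (13):

```latex
\begin{align*}
x_{i}^{t+1}
  &= g_{i}(p_{i}^{t+1})
   = x_{i}^{t} + \nabla g_{i}(p_{i}^{t})\,(p_{i}^{t+1}-p_{i}^{t}) + R_{1}(p_{i}^{t+1}), \\
\hat{x}_{i}^{t+1}
  &= \hat{g}_{i}(p_{i}^{t+1})
   = x_{i}^{t} + \hat{\nabla} g_{i}^{t}\,(p_{i}^{t+1}-p_{i}^{t}), \\
x_{i}^{t+1}-\hat{x}_{i}^{t+1}
  &= \underbrace{\big[\nabla g_{i}(p_{i}^{t})-\hat{\nabla} g_{i}^{t}\big]
     (p_{i}^{t+1}-p_{i}^{t})}_{\text{Jacobian estimation error}}
   + \underbrace{R_{1}(p_{i}^{t+1})}_{\text{linearization error}}.
\end{align*}
```

Both terms shrink when the price step $p_{i}^{t+1}-p_{i}^{t}$ is small, which is consistent with the algorithm reducing the step size and the shrinkage together.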

It is necessary that there exists an initial price vector $p^{0}$ such that the demand vectors in response to the initial sampling prices in (7) are in ${\cal X}^{\textnormal{int}}$ so that the algorithm can proceed as described above. Since this has to hold before getting any feedback from the users, we make the following assumption:

Assumption 3.

There exists a known price vector $p^{0}$ such that $g(p^{0})\in{\cal X}^{\textnormal{int}}$ and for all $i\in[n]$ , $x_{i}^{-d_{i}}\in{\cal X}_{i}^{\textnormal{int}}$ .

The above assumption guarantees that the initial demand vectors in (7) are in ${\cal X}^{\textnormal{int}}_{i},~{}\forall i\in[n]$ and therefore the initial Jacobian estimation is meaningful.

Remark 2.

One way to satisfy Assumption 3 is to choose $\eta^{0}$ such that ${\cal X}_{\frac{\sqrt{n}\eta^{0}}{\mu}}$ is non-empty and $p^{0}$ such that $g(p^{0})\in{\cal X}_{\frac{\sqrt{n}\eta^{0}}{\mu}}$ , which is proven in Appendix -D.

Remark 3.

For network resource allocation systems, the historical price response of the users can be used to choose a price point in history where the induced demand was in the feasible set. However, if additional assumptions allow us to exploit the structure of the feasible set, we can utilize systematic methods. For instance, if the feasible set ${\cal X}$ is a polytope of the form ${\cal X}=\{x:Ax\leq c\}$ , where $A_{ij}\geq 0$ , then one way to find safe prices is to set the prices very high and gradually reduce them, since low demand promotes safety. Indeed, this is the method we use in our preliminary work [17] (where $A$ was a binary matrix) to determine initial prices.

In the next section, we characterize a principled way to choose parameters $\Delta^{t}$ , $\gamma^{t}$ , and $\eta^{t}$ in order to produce feasible primal iterates. Additionally, we prove that the regret incurred by the iterates produced by Algorithm 1 is ${\cal O}(\log(T))$ after $T$ iterations, and the last iterate converges to the optimal solution at the rate ${\cal O}(\log(T)/T)$ .

IV Feasibility and Regret Analysis

In order to prove the safety and the regret guarantees of our algorithm, we will need to bound the distance between a point $x\in{\cal X}$ and its projection $\Pi_{{\cal X}_{\Delta}}(x)$ onto the shrunk set ${\cal X}_{\Delta}$ . The following definition from [42] formalizes this notion, called the sharpness of a set, which is defined as the maximum distance from any point in a set to its projection onto the shrunk version of that set.

Definition 6.

For a convex and compact set $\mathcal{X}\subset\mathbb{R}^{d}$ , we define the sharpness of $\mathcal{X}$ as

\mathrm{Sharp}_{\mathcal{X}}(\Delta)\vcentcolon=\sup_{x\in{\cal X}}\|\Pi_{{\cal X}_{\Delta}}(x)-x\|,

(14)

for all non-negative $\Delta$ such that $\mathcal{X}_{\Delta}$ is nonempty.

The following proposition establishes a bound on the sharpness of convex and compact sets as a linear function of ${\Delta}$ :

Proposition 1.

[42, Corollary 11] For a convex, compact set $\mathcal{X}\subset\mathbb{R}^{d}$ with non-empty interior, we have that $\mathrm{Sharp}_{\mathcal{X}}(\Delta)\leq\Gamma_{\mathcal{X}}\Delta$ where $\Gamma_{\mathcal{X}}\geq 1$ is a constant that depends only on the geometry and the dimension of $\mathcal{X}$ .

Example 3 (Sharpness of a polytope [42]).

Let $\mathcal{X}=\{x\in\mathbb{R}^{d}:Ax\leq c\}$ be a polytope with a nonempty interior. Define $\mathcal{I}_{A}$ to refer to the collection of all sets of $d$ indices such that for each $\{i_{1},i_{2},...,i_{d}\}\in\mathcal{I}_{A}$ the vectors $A_{i_{1}},A_{i_{2}},...,A_{i_{d}}$ are linearly independent. For each $\ell\in\mathcal{I}_{A}$ where $\ell=\{i_{1},i_{2},...,i_{d}\}$ , we define $A^{\ell}=[A_{i_{1}}^{\top}\ A_{i_{2}}^{\top}\ ...\ A_{i_{d}}^{\top}]^{\top}$ . We have that $\mathrm{Sharp}_{\mathcal{X}}(\Delta)\leq\sqrt{d}K_{\mathcal{X}}\Delta$ , where $K_{\mathcal{X}}:=\max_{\ell\in\mathcal{I}_{A}}\kappa(A^{\ell})$ .
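The bound in Example 3 can be evaluated by brute force for small polytopes. The sketch below enumerates all sets of $d$ rows of $A$ , skips the linearly dependent ones, and returns $\sqrt{d}K_{\mathcal{X}}$ . It assumes that $\kappa(\cdot)$ denotes the 2-norm condition number, and the function name is hypothetical. The enumeration is combinatorial in the number of constraints, so it is only meant for illustration.

```python
import itertools
import numpy as np

def polytope_sharpness_constant(A, tol=1e-10):
    """Return sqrt(d) * max condition number over invertible d-row submatrices of A."""
    m, d = A.shape
    worst = 0.0
    for rows in itertools.combinations(range(m), d):
        sub = A[list(rows), :]
        # Skip index sets whose rows are linearly dependent.
        if np.linalg.matrix_rank(sub, tol=tol) < d:
            continue
        worst = max(worst, np.linalg.cond(sub))   # 2-norm condition number
    return np.sqrt(d) * worst

# Axis-aligned box {x : -1 <= x_1, x_2 <= 1} written as A x <= c.
A_box = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
gamma_box = polytope_sharpness_constant(A_box)
```

For the box, every independent pair of rows is orthonormal up to sign, so $K_{\mathcal{X}}=1$ and the bound evaluates to $\sqrt{2}$ .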

Example 4 (Sharpness of a ball in ${\mathbb{R}}^{d}$ ).

Let ${\cal X}=\{x\in\mathbb{R}^{d}:(x-x_{0})^{\top}(x-x_{0})\leq r^{2}\}$ be a ball in ${\mathbb{R}}^{d}$ with radius $r$ centered at $x_{0}$ . We have that $\mathrm{Sharp}_{\mathcal{X}}(\Delta)=\Delta$ .
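Example 4 can be checked numerically. The sketch below assumes that the shrunk set ${\cal X}_{\Delta}$ of a ball of radius $r$ is the concentric ball of radius $r-\Delta$ (the definition of the shrunk set is given earlier in the paper and is not restated here). It projects sampled points of ${\cal X}$ onto that smaller ball and records the largest displacement.

```python
import numpy as np

def project_onto_ball(x, center, radius):
    """Euclidean projection of x onto the ball {y : ||y - center|| <= radius}."""
    offset = x - center
    norm = np.linalg.norm(offset)
    if norm <= radius:
        return x.copy()
    return center + offset * (radius / norm)

rng = np.random.default_rng(0)
d, r, delta = 3, 1.0, 0.2
center = np.zeros(d)

# Sample 500 boundary points and 500 interior points of the ball of radius r.
dirs = rng.normal(size=(1000, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
radii = np.concatenate([np.full(500, r), r * rng.uniform(size=500)])
points = center + dirs * radii[:, None]

dists = [np.linalg.norm(project_onto_ball(x, center, r - delta) - x) for x in points]
max_dist = max(dists)
```

The maximum is attained at boundary points, which move a distance of exactly $\Delta$ , in agreement with $\mathrm{Sharp}_{\mathcal{X}}(\Delta)=\Delta$ .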

Although we do not specify a closed-form expression of $\Gamma_{\cal X}$ for a general convex and compact set $\cal X$ , it relates to the sharpness of polytopes that are contained in ${\cal X}$ , which have closed-form bounds as given by Example 3. We refer the reader to [42] (Proposition 10) for a detailed discussion.

The next lemma characterizes the regularity properties of $g_{i}(p_{i})$ over the set of prices that induce a resource demand in ${\cal X}_{i}^{\textnormal{int}}$ for a user $i\in[n]$ . This property is crucial for our analysis and for the feasibility of the algorithm, as we need to show that the inverse of the matrix $\hat{\nabla}g_{i}^{t}$ for the price update rule in Step 8 is a valid operation.

Lemma 1.

Let ${\cal P}_{i}=\{p_{i}\in{\mathbb{R}}^{d_{i}}:g_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}\}$ be the set of prices that induce a resource demand in ${\cal X}_{i}^{\textnormal{int}}$ for a user $i\in[n]$ . Over ${\cal P}_{i}$ , $g_{i}(p_{i})$ is bijective, $1/\mu$ -Lipschitz continuous, and $\beta/\mu^{3}$ -smooth. Accordingly, $g_{i}(p_{i})$ is invertible and $\nabla g_{i}(p_{i})=[\nabla^{2}f_{i}(g_{i}(p_{i}))]^{-1}$ .

The proof of Lemma 1 can be found in Appendix -E. Lemma 1 establishes that the true Jacobian of the price response function for user $i$ is invertible because it corresponds to the inverse of the Hessian of the strongly concave utility function of user $i$ . However, this does not imply that the estimated Jacobian $\hat{\nabla}g_{i}^{t}$ is invertible, since it is constructed by finite-difference gradient approximation. The next lemma states that the estimated Jacobian $\hat{\nabla}g_{i}^{t}$ is close enough to $\nabla g_{i}(p_{i}^{t})$ , which allows us to lower-bound its minimum singular value and therefore guarantees invertibility under an appropriate choice of algorithm parameters.

Lemma 2.

Let $\gamma^{t}=1/(\mu(t+\tau))$ , $\Delta^{t}=\Delta/(t+\tau)^{2}$ , and $\eta^{t}=\mu\Delta^{t-1}/(4\sqrt{n})$ for some $\Delta>0$ and

\tau=\max\Big\{2,\ 2\bar{d}-1,\ 1+{2\mu\Delta\Gamma_{\cal X}}/({M\sqrt{n}}),\ \sqrt{{\Delta}/{{H}_{\cal X}}},\ {L\beta M\sqrt{\bar{d}}\left(\mu+32L\Gamma_{\cal X}\sqrt{n}(\bar{d}-1)\right)}/({2\mu^{4}\Gamma_{\cal X}})\Big\}.

(15)

Suppose that at iteration $t$ , $x^{k}\in{{\cal X}}_{\frac{\eta^{k}\sqrt{n}}{\mu}}^{\textnormal{int}}$ , $\forall k\in[\max\{t-\bar{d}+1,0\},t]$ . Then, the following holds for all $i\in[n]$ :

\displaystyle\|\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t})\|\leq e_{i}^{t},

(16)

where

e_{i}^{t}=\frac{2\beta\sqrt{d_{i}}}{\mu^{3}}\left(\eta^{t}+2L(d_{i}-1)(M\sqrt{n}\gamma^{t}+2\Delta^{t}\Gamma_{\cal X})\right)\leq\frac{1}{2L}.

(17)

Accordingly, $\sigma_{\min}(\hat{\nabla}g_{i}^{t})\geq\frac{1}{2L}$ and therefore $\hat{\nabla}g_{i}^{t}$ is invertible.

The proof of Lemma 2 can be found in Appendix -A. Lemma 2 characterizes a principled way to choose the algorithm parameters with respect to a free parameter $\Delta$ in order to bound the difference between $\hat{\nabla}g_{i}^{t}$ and $\nabla g_{i}(p^{t})$ . In the following subsections, we will first characterize the choice of $\Delta$ that guarantees primal feasibility at all iterations and then prove the regret and convergence guarantees of Algorithm 1 under this choice of parameters.
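The parameter choices in Lemma 2 translate directly into code. The following sketch evaluates $\tau$ from (15), the schedules $\gamma^{t}$ , $\Delta^{t}$ , $\eta^{t}$ , and the error bound $e_{i}^{t}$ from (17). The function names and the numerical constants in the usage lines are hypothetical, and `H` stands for the constant $H_{\cal X}$ , which is defined earlier in the paper and treated here as a given input.

```python
import math

def tau_from_eq15(Delta, mu, L, M, beta, n, d_bar, Gamma, H):
    """Evaluate the maximum in (15); H stands for H_X from the paper."""
    last = (L * beta * M * math.sqrt(d_bar)
            * (mu + 32 * L * Gamma * math.sqrt(n) * (d_bar - 1))
            / (2 * mu**4 * Gamma))
    return max(2.0,
               2 * d_bar - 1,
               1 + 2 * mu * Delta * Gamma / (M * math.sqrt(n)),
               math.sqrt(Delta / H),
               last)

def schedules(t, Delta, tau, mu, n):
    """Return (gamma^t, Delta^t, eta^t) as chosen in Lemma 2."""
    gamma_t = 1.0 / (mu * (t + tau))
    Delta_t = Delta / (t + tau) ** 2
    Delta_prev = Delta / (t - 1 + tau) ** 2        # Delta^{t-1}
    eta_t = mu * Delta_prev / (4 * math.sqrt(n))
    return gamma_t, Delta_t, eta_t

def jacobian_error_bound(t, d_i, Delta, tau, mu, L, M, beta, n, Gamma):
    """Evaluate e_i^t from (17)."""
    gamma_t, Delta_t, eta_t = schedules(t, Delta, tau, mu, n)
    inner = eta_t + 2 * L * (d_i - 1) * (M * math.sqrt(n) * gamma_t
                                         + 2 * Delta_t * Gamma)
    return 2 * beta * math.sqrt(d_i) / mu**3 * inner

# Hypothetical constants for a scalar-demand instance (d_i = d_bar = 1).
mu, L, M, beta, n, d_bar, Gamma, H, Delta = 1.0, 1.25, 2.0, 0.0909, 4, 1, 1.0, 1.0, 1.0
tau = tau_from_eq15(Delta, mu, L, M, beta, n, d_bar, Gamma, H)
gamma0, Delta0, eta0 = schedules(0, Delta, tau, mu, n)
e0 = jacobian_error_bound(0, 1, Delta, tau, mu, L, M, beta, n, Gamma)
```

For these illustrative constants, $\tau=2$ and $e_{i}^{0}$ falls well below $1/(2L)$ . Here $\Delta$ is set by hand, whereas the feasibility analysis below fixes it through a specific formula.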

IV-A Feasibility Analysis

The following proposition characterizes the choice of the parameters $\Delta^{t}$ , $\gamma^{t}$ , and $\eta^{t}$ to ensure feasible primal iterates:

Proposition 2.

Let $\gamma^{t}=1/(\mu(t+\tau))$ , $\Delta^{t}=\Delta/(t+\tau)^{2}$ , and $\eta^{t}=\mu\Delta^{t-1}/(4\sqrt{n})$ , where $\tau$ is given by (15) and

\Delta={\beta LMn^{3/2}(6L+\sqrt{d}(\mu/\sqrt{n}+32L(\bar{d}-1)))}/{\mu^{5}}.

(18)

Then for all $t\geq 0$ , $\|\hat{x}^{t+1}-x^{t+1}\|<3\Delta^{t}/4$ and $\|x^{t+1}-x^{t+1,s}\|\leq\Delta^{t}/4$ . Accordingly, for all $t\geq 0$ , the iterates $x^{t}$ and $x^{t,s}$ produced by Algorithm 1 are feasible and in the strict interior of the feasible set, i.e., $x^{t}\in{\cal X}_{\frac{\eta^{t}\sqrt{n}}{\mu}}^{\textnormal{int}}$ and $x^{t,s}\in{\cal X}^{\textnormal{int}},~{}\forall t\geq 1$ .

The proof of Proposition 2 can be found in Appendix -B. Given that under Proposition 2, $x^{t}$ for all $t\geq 1$ are feasible and therefore implementable, the static regret (6) is a valid choice of performance metric. Next, we prove that the regret of Algorithm 1 is ${\cal O}(\log(T))$ and the primal variables converge to the optimal solution at the rate ${\cal O}(\log(T)/T)$ .

IV-B Regret and Convergence Analysis

As our algorithm alternates between executing one update and one sampling stage, after $T$ iterations it will have executed $T/2$ update stages and $T/2$ sampling stages. In this case, the regret per user is fairly calculated as:

R(T)=\frac{1}{n}\sum_{t=1}^{T/2}(f(x^{\star})-f(x^{t})+f(x^{\star})-f(x^{t,s})).

(19)
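For concreteness, (19) can be evaluated from logged iterates as in the sketch below. The function name and the toy objective in the usage lines are hypothetical. The total utility `f` and the optimum `x_star` are assumed to be available for evaluation purposes only, since the algorithm itself never uses them.

```python
import numpy as np

def per_user_regret(f, x_star, x_updates, x_samples, n):
    """Evaluate (19) from update-stage iterates x^t and sampling-stage iterates x^{t,s}."""
    f_star = f(x_star)
    total = sum((f_star - f(x)) + (f_star - f(xs))
                for x, xs in zip(x_updates, x_samples))
    return total / n

# Toy check with f(x) = -||x - 1||^2, whose maximizer is the all-ones vector.
def f(x):
    return -float(np.sum((x - 1.0) ** 2))

x_star = np.ones(2)
regret = per_user_regret(f, x_star, [np.zeros(2)], [np.ones(2)], n=2)
```

In the toy check the single update-stage iterate loses $2$ units of utility and the sampling-stage iterate loses none, so the per-user regret with $n=2$ is $1$ .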

The following theorem establishes an upper bound on the regret incurred by the primal iterates produced by Algorithm 1, and on the squared distance between the last iterate $x^{T/2}$ and the optimal solution $x^{\star}$ :

Theorem 1.

Let $p^{0}$ , $\gamma^{t}$ , $\Delta^{t}$ , and $\eta^{t}$ be chosen as in Proposition 2. Then for all $t\geq 0$ , the iterates produced by Algorithm 1 are feasible. Furthermore, the regret $R(T)$ for $T\geq 2$ satisfies

R(T)\leq{\cal O}(\log(T)(1+\Delta\Gamma_{\cal X}/n)),

(20)

where ${\cal O}(\cdot)$ hides other constants. In addition, the last primal iterate $x^{T/2}$ satisfies

\|x^{T/2}-x^{\star}\|^{2}\leq{\cal O}(\log(T)/T).

(21)

Proof outline: Since the update stage of the algorithm proceeds similarly to a projected gradient method, the proof is similar to that of a projected gradient ascent for strongly concave functions. We have an additional error term due to $\|x^{t+1}-\hat{x}^{t+1}\|$ , which is ${\cal O}(\Delta^{t})$ . The error term impacts the regret of the update stages as ${\cal O}(\sum_{t=1}^{T/2}\Delta^{t}/\gamma^{t})$ , which results in an additive ${\cal O}(\log(T)\Delta\Gamma_{\cal X}/n)$ term. For the regret of the sampling stages, we note that the prices for the sampling stages are set by varying the prices of the update stages by $\eta^{t}$ . Therefore, we can upper bound the sum of the regret of all sampling stages by the regret of the update stages plus a constant additive term of $\Delta M/(4\sqrt{n})$ .
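To see where the logarithm comes from under the parameter choices of Proposition 2, note that the ratio $\Delta^{t}/\gamma^{t}$ decays like $1/t$ . The following worked bound is a sketch for intuition only, and it omits the constants (such as $\Gamma_{\cal X}$ and $n$ ) that the complete proof tracks:

```latex
\frac{\Delta^{t}}{\gamma^{t}}
  =\frac{\Delta}{(t+\tau)^{2}}\cdot\mu(t+\tau)
  =\frac{\mu\Delta}{t+\tau},
\qquad
\sum_{t=1}^{T/2}\frac{\mu\Delta}{t+\tau}
  \leq\mu\Delta\int_{0}^{T/2}\frac{ds}{s+\tau}
  =\mu\Delta\log\!\left(1+\frac{T}{2\tau}\right).
```

The inequality uses that $1/(s+\tau)$ is decreasing in $s$ , so each summand is at most the integral over the preceding unit interval.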

The complete proof of Theorem 1 and the explicit constants of (20) can be found in Appendix -C. According to Theorem 1, Algorithm 1 produces feasible solutions that achieve a sublinear regret of ${\cal O}(\log(T))$ . Furthermore, the primal variables induced by the prices converge to the optimal solution at the rate ${\cal O}(\log(T)/T)$ .

Remark 4.

When $d_{i}=1,~{}\forall i\in[n]$ , $\Delta={\cal O}(\beta n^{3/2})$ and $R(T)={\cal O}(\log(T)(1+\sqrt{n}\beta\Gamma_{\cal X}))$ .

In the next section, we numerically demonstrate our theoretical results about the primal variables induced by Algorithm 1 and compare its performance to existing pricing algorithms.

V Numerical Studies

In this section, we demonstrate the efficacy of SPNUM via three numerical studies: 1) a benchmarking study to compare SPNUM’s convergence and feasibility performance to existing pricing methods that solve the NUM problem, 2) a toy NUM problem with a non-linear feasible set to demonstrate the success of SPNUM on non-linear feasible sets, and 3) a parameter study to demonstrate how the regret depends on the second order smoothness parameter $\beta$ , sharpness parameter ${\Gamma}_{\cal X}$ , and the number of users $n$ .

V-A Benchmarking Study

In this study, our aim is to compare the safety and convergence performance of SPNUM to the existing algorithms on feasible sets characterized by linear inequalities, i.e., ${\cal X}=\{x\in{\mathbb{R}}^{d}:Ax\leq c\}$ . We compare SPNUM to DG [16], which can achieve a linear convergence rate, and SDGM [17], which can provide safety when $A$ is a binary matrix.

We have implemented all algorithms on two types of $A$ matrices: 1) $A$ is a binary matrix and 2) $A$ is a real matrix. For both cases, we randomly generated a collection of 50 networks with a random number of users $n$ taking (integer) values in range $[5,20]$ , and a random number of constraints $m$ taking values in the interval $[5,10]$ (generated independently). Inspired by [16], for all users $i\in[n]$ , we let the utility function be $f_{i}(x_{i})=-0.5(x_{i}-3)^{2}-x_{i}-\theta_{i}\log(1+e^{x_{i}})$ , where $\theta_{i}$ is sampled uniformly from $[0,1]$ for each network configuration (we shifted the quadratic term by $3$ to ensure that the optimal solution is on the boundary of the feasible set). We set $\textnormal{dom}f_{i}=[0,1]$ for all $i\in[n]$ . For each network configuration, we first randomly generated a matrix $\hat{A}$ by sampling $m\times n$ Bernoulli random variables for the binary matrix case, and by sampling $m\times n$ random variables from the continuous uniform distribution in $[-1,1]$ for the real matrix case. We then let $A=[\hat{A}^{\top}~{}I_{n}]^{\top}$ . For the binary case, we let $c=\bm{1}_{m+n}$ , and for the real case, we let $c=[0.1\bm{1}_{m}^{\top}~{}\bm{1}_{n}^{\top}]^{\top}$ . (Footnote 2: For SPNUM, we additionally include the constraints $x\geq 0$ in ${\cal X}$ to satisfy Assumption 1. For the other algorithms, this is not needed.)

We note that ${\cal X}_{i}\subseteq[0,1],~{}\forall i\in[n]$ . Within ${\cal X}_{i}$ , using bounds on $\theta_{i}$ and computing the derivatives of $f_{i}$ , we get $M=2$ , $L=5/4$ , $\mu=1$ , $\beta=\sinh(1)/(2(1+\cosh(1))^{2})\approx 0.0909$ . Finally, from Example 3 we have ${\Gamma}_{\cal X}\leq\sqrt{n}\kappa(A)$ .
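The constants quoted above can be reproduced with a few lines of Python. The sketch relies on the fact that the third derivative of $\log(1+e^{x})$ equals the second derivative of the logistic function, and it takes $\theta_{i}=1$ as the worst case. The variable names and the grid resolution are arbitrary choices made for this illustration.

```python
import math

def sigmoid_second_derivative(x):
    """Second derivative of the logistic function 1/(1+exp(-x)),
    which equals the third derivative of log(1 + exp(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s) * (1.0 - 2.0 * s)

# Closed form quoted in the text, i.e. |f_i'''| at x = 1 with theta_i = 1.
beta_closed_form = math.sinh(1.0) / (2.0 * (1.0 + math.cosh(1.0)) ** 2)

# Grid maximum of |f_i'''| over [0, 1] with theta_i = 1.
grid = [k / 1000.0 for k in range(1001)]
beta_grid = max(abs(sigmoid_second_derivative(x)) for x in grid)

# Curvature: f_i''(x) = -1 - theta_i * s(x) * (1 - s(x)) with 0 <= s(1-s) <= 1/4,
# so |f_i''| lies between mu = 1 and L = 1 + 1/4.
mu_bound = 1.0
L_bound = 1.0 + 0.25
```

The grid maximum is attained at $x=1$ and agrees with the closed-form value of approximately $0.0909$ .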

For each configuration, we initialized the dual variables and prices to induce $x_{i}^{0}=\eta^{0}/\mu,~{}\forall i\in[n]$ , and ran all three algorithms for a horizon of $T=1000$ . We demonstrate the results for the binary matrix case and the real matrix case in Figure 1(a) and Figure 1(b), respectively. In Figure 1(a) we observe that 1) while DG converges the fastest, it is not safe, 2) SDGM and SPNUM converge slower but are safe, and 3) SDGM converges faster than SPNUM because it is designed specifically for this setting. On the other hand, in Figure 1(b) we observe that 1) SDGM does not provide safety and convergence when $A$ is a real matrix, as its assumptions do not hold anymore (note that the plot for $\|x^{t}-x^{\star}\|^{2}$ flattens for SDGM), 2) SPNUM successfully provides safety and convergence.

V-B SPNUM on Non-linear Feasible Set

This study aims to demonstrate numerically the regret and safety guarantees of SPNUM on a problem with a feasible set characterized by non-linear inequalities. We select the feasible set ${\cal X}=\{x\in{\mathbb{R}}^{d}:\|x\|\leq 1\}$ as the unit ball in ${\mathbb{R}}^{d}$ centered at the origin. At the beginning of each run, we sample the number of users $n$ as an integer from the range $[5,20]$ uniformly at random. For all $i\in[n]$ , we let the utility function be $f_{i}(x_{i})=-0.5(x_{i}-y_{i})^{2}-x_{i}-\theta_{i}\log(1+e^{x_{i}})$ , where $\theta_{i}$ is sampled uniformly from $[0,1]$ and $y_{i}$ is sampled uniformly from $[-2,2]$ at random at the beginning of each run.

Noting that ${\cal X}_{i}=[-1,1]$ , using bounds on $\theta_{i}$ and $y_{i}$ and computing the derivatives of $f_{i}$ , we get $M=4+e/(1+e)$ , $L=5/4$ , $\mu=1$ , $\beta=\sinh(1)/(2(1+\cosh(1))^{2})\approx 0.0909$ . Finally, from Example 4 we have ${\Gamma}_{\cal X}=1$ .

We initialized the prices to induce $x_{i}^{0}=\eta^{0}/\mu,~{}\forall i\in[n]$ , and ran SPNUM 100 times for a horizon of $T=50$ . The results are illustrated in Figure 3. The figure shows that 1) the regret of SPNUM grows as ${\cal O}(\log(t))$ , 2) SPNUM guarantees feasible iterates at all iterations, and 3) the primal iterates produced by SPNUM converge to the optimal solution.

V-C Impact of Sharpness on Regret

In this study, our aim is to support our theoretical results about SPNUM with numerical examples. In particular, we study the impact of the sharpness parameter ${\Gamma}_{\cal X}$ and the number of users $n$ on regret through $\beta$ . We set $d_{i}=1$ , in which case $R(T)={\cal O}(\log(T)(1+\beta\sqrt{n}\Gamma_{\cal X}))$ as stated in Remark 4. For each user $i$ , we set $f_{i}(x_{i})=\theta_{i}(\cos(\omega(x_{i}-1))/\omega^{2}-10(x_{i}-2)^{2}-x_{i}\sin(\omega)/\omega)$ , where $\theta_{i}$ is sampled uniformly from $[1,2]$ . This particular choice of $f_{i}$ allows us to control $\beta$ while keeping the other parameters constant by simply choosing $\omega$ . Using the bounds on $\theta_{i}$ and computing the derivatives of $f_{i}$ , we get $M=40$ , $L=42$ , $\mu=19$ , and $\beta=2\omega$ .

In order to have control over the sharpness parameter ${\Gamma}_{\cal X}$ , we study linear constraints of the form ${\cal X}=\{x\in\mathbb{R}^{n}:x\geq 0,Ax\leq c\}$ , where $A_{ij}=(1-K)/(1+K(n-1))$ if $i\neq j$ , and $A_{ii}=1$ . This choice of $A$ allows us to parameterize the feasible set as a function of the condition number $K$ , since $\kappa(A)=K$ and ${\Gamma}_{\cal X}=\sqrt{n}\kappa(A)$ . Finally, since $f_{i}$ is increasing over ${\cal X}_{i}$ , the optimal solution is given by $x^{\star}=\bm{1}_{n}$ .
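The claim $\kappa(A)=K$ is easy to verify numerically. The matrix has the form $(1-a)I+a\bm{1}\bm{1}^{\top}$ with $a=(1-K)/(1+K(n-1))$ , so its eigenvalues are $Kn/(1+K(n-1))$ (with multiplicity $n-1$ ) and $n/(1+K(n-1))$ , whose ratio is $K$ when $K\geq 1$ . The sketch below assumes that $\kappa$ is the 2-norm condition number, and the function name is hypothetical.

```python
import numpy as np

def constraint_matrix(n, K):
    """Matrix with unit diagonal and off-diagonal entries (1-K)/(1+K(n-1))."""
    a = (1.0 - K) / (1.0 + K * (n - 1))
    return (1.0 - a) * np.eye(n) + a * np.ones((n, n))

# Sweep the same (n, Gamma_X) grid as in the study, with K = Gamma_X / sqrt(n).
results = {}
for n in (2, 4, 8, 16):
    for Gamma in (4.0, 8.0, 16.0, 32.0):
        K = Gamma / np.sqrt(n)
        results[(n, Gamma)] = (K, np.linalg.cond(constraint_matrix(n, K)))
```

Every entry of `results` pairs the target $K$ with the computed condition number, and the two agree up to floating-point error.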

For $n\in\{2,4,8,16\}$ , $\omega\in\{0.001,0.1\}$ , and $K\in\{4/\sqrt{n},8/\sqrt{n},16/\sqrt{n},32/\sqrt{n}\}$ , we randomly sampled 10 sets of $\{\theta_{i}\}_{i\in[n]}$ , initialized $p_{i}^{0}$ so that $x_{i}^{0}=\eta^{0}/\mu,~{}\forall i\in[n]$ , and ran SPNUM for a horizon of $T=500$ . Note that this corresponds to configurations with $\beta\in\{0.002,0.2\}$ and $\Gamma_{\cal X}\in\{4,8,16,32\}$ . We plot the regret for each configuration in Figure 2. The results indicate that 1) when $\beta$ is small, $\Gamma_{\cal X}$ and $n$ have little impact on the regret, and 2) when $\beta$ is large, regret grows with ${\Gamma}_{\cal X}$ and $n$ as the term proportional to $\sqrt{n}\beta\Gamma_{\cal X}$ dominates.

VI Conclusion

In this work, we introduced a novel algorithm, called safe pricing for NUM (SPNUM), for solving resource allocation problems over networks with arbitrary convex and compact feasible sets in a distributed fashion. Our algorithm iteratively designs prices for resources and allows the users to determine their own resource demand in response to prices according to their own profit maximization problem. The prices produced by SPNUM ensure that the induced demand satisfies the constraints of the system during the optimization process, which promotes safety. This is done by: 1) shrinking the constraint set and applying a projected gradient method to the primal variables to determine the updated desired demand, and 2) determining the prices that would induce the desired demand by estimating the price response function of the users using historical data. By carefully controlling the amount of shrinkage to account for the error in the estimated price response, we ensure the safety of the algorithm. In addition, we have proven that the regret incurred by SPNUM is ${\cal O}(\log(T))$ , and the primal variables converge to the optimal solution at the rate of ${\cal O}(\log(T)/T)$ .

References

[1] P. Samadi, A.-H. Mohsenian-Rad, R. Schober, V. W. Wong, and J. Jatskevich, “Optimal real-time pricing algorithm based on utility maximization for smart grid,” in 2010 First IEEE International Conference on Smart Grid Communications. IEEE, 2010, pp. 415–420.
[2] F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, “Rate control for communication networks: shadow prices, proportional fairness and stability,” Journal of the Operational Research society, vol. 49, no. 3, pp. 237–252, 1998.
[3] Y. Li, F. Song, J. Liu, X. Xie, and E. Tian, “Software defined event-triggering control for large-scale networked systems subject to stochastic cyber attacks,” IEEE Transactions on Control of Network Systems, 2023.
[4] J. Liu, E. Gong, L. Zha, X. Xie, and E. Tian, “Outlier-resistant recursive security filtering for multirate networked systems under fading measurements and round-robin protocol,” IEEE Transactions on Control of Network Systems, 2023.
[5] M. Chiang and J. Bell, “Balancing supply and demand of bandwidth in wireless cellular networks: utility maximization over powers and rates,” in IEEE INFOCOM 2004, vol. 4. IEEE, 2004, pp. 2800–2811.
[6] N. Mehr, J. Lioris, R. Horowitz, and R. Pedarsani, “Joint perimeter and signal control of urban traffic via network utility maximization,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2017, pp. 1–6.
[7] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, 2006.
[8] I. Necoara, V. Nedelcu, and I. Dumitrache, “Parallel and distributed optimization methods for estimation and control in networks,” Journal of Process Control, vol. 21, no. 5, pp. 756–766, 2011.
[9] D. P. Bertsekas, “Nonlinear programming,” Journal of the Operational Research Society, vol. 48, no. 3, pp. 334–334, 1997.
[10] D. Bertsekas and J. Tsitsiklis, Parallel and distributed computation: numerical methods. Athena Scientific, 2015.
[11] N. Li, L. Chen, and S. H. Low, “Optimal demand response based on utility maximization in power networks,” in 2011 IEEE power and energy society general meeting. IEEE, 2011, pp. 1–8.
[12] N. Z. Shor, Minimization methods for non-differentiable functions. Springer Science & Business Media, 2012, vol. 3.
[13] A. Nedić and A. Ozdaglar, “Approximate primal solutions and rate analysis for dual subgradient methods,” SIAM Journal on Optimization, vol. 19, no. 4, pp. 1757–1780, 2009.
[14] A. Beck, A. Nedić, A. Ozdaglar, and M. Teboulle, “An $o(1/k)$ gradient method for network resource allocation problems,” IEEE Transactions on Control of Network Systems, vol. 1, no. 1, pp. 64–73, 2014.
[15] I. Necoara and V. Nedelcu, “Rate analysis of inexact dual first-order methods application to dual decomposition,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1232–1243, 2013.
[16] ——, “On linear convergence of a distributed dual gradient algorithm for linearly constrained separable convex problems,” Automatica, vol. 55, pp. 209–216, 2015.
[17] B. Turan and M. Alizadeh, “Safe dual gradient method for network utility maximization problems,” in 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 6953–6959.
[18] A. Simonetto and H. Jamali-Rad, “Primal recovery from consensus-based dual decomposition for distributed convex optimization,” Journal of Optimization Theory and Applications, vol. 168, pp. 172–197, 2016.
[19] A. Falsone, K. Margellos, S. Garatti, and M. Prandini, “Dual decomposition for multi-agent distributed optimization with coupling constraints,” Automatica, vol. 84, pp. 149–158, 2017.
[20] I. Notarnicola and G. Notarstefano, “Constraint-coupled distributed optimization: a relaxation and duality approach,” IEEE Transactions on Control of Network Systems, vol. 7, no. 1, pp. 483–492, 2019.
[21] J. S. Vardakas, N. Zorba, and C. V. Verikoukis, “A survey on demand response programs in smart grids: Pricing methods and optimization algorithms,” IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 152–178, 2014.
[22] K. Christakou, D.-C. Tomozei, J.-Y. Le Boudec, and M. Paolone, “Ac opf in radial distribution networks–part ii: An augmented lagrangian-based opf algorithm, distributable via primal decomposition,” Electric Power Systems Research, vol. 150, pp. 24–35, 2017.
[23] N. Tucker, A. Moradipari, and M. Alizadeh, “Constrained thompson sampling for real-time electricity pricing with grid reliability constraints,” IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 4971–4983, 2020.
[24] M. Lubin, Y. Dvorkin, and S. Backhaus, “A robust approach to chance constrained optimal power flow with renewable generation,” IEEE Transactions on Power Systems, vol. 31, no. 5, pp. 3840–3849, 2015.
[25] M. Zhu and S. Martinez, “On distributed convex optimization under inequality and equality constraints,” IEEE Transactions on Automatic Control, vol. 57, no. 1, pp. 151–164, 2011.
[26] J. Koshal, A. Nedić, and U. V. Shanbhag, “Multiuser optimization: Distributed algorithms and error analysis,” SIAM Journal on Optimization, vol. 21, no. 3, pp. 1046–1081, 2011.
[27] K. Sakurama and M. Miura, “Distributed constraint optimization on networked multi-agent systems,” Applied Mathematics and Computation, vol. 292, pp. 272–281, 2017.
[28] B. Turan, C. A. Uribe, H.-T. Wai, and M. Alizadeh, “Resilient primal–dual optimization algorithms for distributed resource allocation,” IEEE Transactions on Control of Network Systems, vol. 8, no. 1, pp. 282–294, 2020.
[29] T. Yang, X. Yi, J. Wu, Y. Yuan, D. Wu, Z. Meng, Y. Hong, H. Wang, Z. Lin, and K. H. Johansson, “A survey of distributed optimization,” Annual Reviews in Control, vol. 47, pp. 278–305, 2019.
[30] Y. Zheng and Q. Liu, “A review of distributed optimization: Problems, models and algorithms,” Neurocomputing, vol. 483, pp. 446–459, 2022.
[31] P. H. Calamai and J. J. Moré, “Projected gradient methods for linearly constrained problems,” Mathematical programming, vol. 39, no. 1, pp. 93–116, 1987.
[32] E. S. Levitin and B. T. Polyak, “Constrained minimization methods,” USSR Computational mathematics and mathematical physics, vol. 6, no. 5, pp. 1–50, 1966.
[33] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
[34] P. Armand, J. C. Gilbert, and S. Jan-Jégou, “A feasible bfgs interior point algorithm for solving convex minimization problems,” SIAM Journal on Optimization, vol. 11, no. 1, pp. 199–222, 2000.
[35] E. Wei, A. Ozdaglar, and A. Jadbabaie, “A distributed newton method for network utility maximization,” in 49th IEEE Conference on Decision and Control (CDC). IEEE, 2010, pp. 1816–1821.
[36] S. Athuraliya and S. H. Low, “Optimization flow control with newton-like algorithm,” Telecommunication Systems, vol. 15, no. 3, pp. 345–358, 2000.
[37] I. Necoara and J. Suykens, “Interior-point lagrangian decomposition method for separable convex optimization,” Journal of Optimization Theory and Applications, vol. 143, no. 3, pp. 567–588, 2009.
[38] H. Guo, X. Liu, H. Wei, and L. Ying, “Online convex optimization with hard constraints: Towards the best of two worlds and beyond,” Advances in Neural Information Processing Systems, vol. 35, pp. 36 426–36 439, 2022.
[39] S. Mannor, J. N. Tsitsiklis, and J. Y. Yu, “Online learning with sample path constraints.” Journal of Machine Learning Research, vol. 10, no. 3, 2009.
[40] H. Yu, M. Neely, and X. Wei, “Online convex optimization with stochastic constraints,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[41] J. Mo and J. Walrand, “Fair end-to-end window-based congestion control,” IEEE/ACM Transactions on networking, vol. 8, no. 5, pp. 556–567, 2000.
[42] S. Hutchinson, B. Turan, and M. Alizadeh, “The impact of the geometric properties of the constraint set in safe optimization with bandit feedback,” arXiv preprint arXiv:2305.00889, 2023.
[43] R. Schneider, Convex bodies: the Brunn–Minkowski theory. Cambridge university press, 2014, no. 151.
[44] Y. Nesterov and B. T. Polyak, “Cubic regularization of newton method and its global performance,” Mathematical Programming, vol. 108, no. 1, pp. 177–205, 2006.

-A Proof of Lemma 2

Firstly we note that by the choice of $\tau\geq\sqrt{\Delta/H_{\cal X}}$ , we can ensure that $\Delta^{t}\leq H_{\cal X}$ and that ${\cal X}_{\Delta^{t}}$ is non-empty. Next, we show that $e_{i}^{t}\leq 1/(2L),~{}\forall t\geq 0$ . Note that $e_{i}^{t}$ is decreasing with $t$ , and therefore is maximized for $t=0$ :

e_{i}^{t}\leq e_{i}^{0}={2\beta\sqrt{d_{i}}}\left(\eta^{0}+2L(d_{i}-1)(M\sqrt{n}\gamma^{0}+2\Delta^{0}\Gamma_{\cal X})\right)/{\mu^{3}}

(22)

For $\tau\geq{2\mu\Delta\Gamma_{\cal X}}/({M\sqrt{n}})$ and $d_{i}\leq\bar{d}$ , we get:

e_{i}^{0}\leq\frac{2\beta\sqrt{\bar{d}}}{\mu^{3}\tau}\left(\frac{M}{8\Gamma_{\cal X}}+\frac{4L(\bar{d}-1)M\sqrt{n}}{\mu}\right)

(23)

={\beta M\sqrt{\bar{d}}\left(\mu+32L\Gamma_{\cal X}\sqrt{n}(\bar{d}-1)\right)}/({4\mu^{4}\Gamma_{\cal X}\tau}).

(24)

Next, using $\tau\geq{L\beta M\sqrt{\bar{d}}\left(\mu+32L\Gamma_{\cal X}\sqrt{n}(\bar{d}-1)\right)}/({2\mu^{4}\Gamma_{\cal X}})$ :

e_{i}^{t}\leq e_{i}^{0}\leq 1/(2L).

(25)

We will prove the lemma by induction: if $\|\hat{\nabla}g_{i}^{k}-\nabla g_{i}(p_{i}^{k})\|\leq e_{i}^{k}$ holds for $k\in[\max\{0,t-d_{i}+1\},t-1]$ , then it holds for $k=t$ . Using the Cauchy-Schwarz inequality:

\|\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t})\|\leq\sqrt{d_{i}}\max_{j\in[d_{i}]}\|[\hat{\nabla}g_{i}^{t}]_{:,j}-[\nabla g_{i}(p_{i}^{t})]_{:,j}\|.

(26)

For a given $j\in[d_{i}]$ , by construction of $\hat{\nabla}g_{i}^{t}$ we have

[\hat{\nabla}g_{i}^{t}]_{:,j}=({g_{i}(p_{i}^{\ell_{j}}+\eta^{\ell_{j}}e_{j})-g_{i}(p_{i}^{\ell_{j}})})/{\eta^{\ell_{j}}},

(27)

for some $\ell_{j}\in[\max\{0,t-d_{i}+1\},t]$ . Using the Taylor series expansion, we can rewrite the above as:

[\hat{\nabla}g_{i}^{t}]_{:,j}=[\nabla g_{i}(p_{i}^{\ell_{j}})]_{:,j}+R_{1}/\eta^{\ell_{j}},

(28)

where $\|R_{1}\|\leq\beta(\eta^{\ell_{j}})^{2}/(2\mu^{3})$ follows from [44, Lemma 1] using $\beta/\mu^{3}$ -smoothness of $g_{i}$ . Accordingly,

\|\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t})\|\leq\sqrt{d_{i}}\max_{j\in[d_{i}]}\|[\nabla g_{i}(p_{i}^{\ell_{j}})]_{:,j}-[\nabla g_{i}(p_{i}^{t})]_{:,j}\|+\sqrt{d_{i}}\beta\eta^{\ell_{j}}/(2\mu^{3})

(29)

\leq\max_{\ell_{j}\in[\max\{0,t-d_{i}+1\},t]}\frac{\beta\sqrt{d_{i}}}{\mu^{3}}\|p_{i}^{\ell_{j}}-p_{i}^{t}\|+\frac{\sqrt{d_{i}}\beta\eta^{\ell_{j}}}{2\mu^{3}},

(30)

where we used

\|[\nabla g_{i}(p_{i}^{\ell_{j}})]_{:,j}-[\nabla g_{i}(p_{i}^{t})]_{:,j}\|\leq\|\nabla g_{i}(p_{i}^{\ell_{j}})-\nabla g_{i}(p_{i}^{t})\|,

(31)

for all $j\in[d_{i}]$ , and smoothness of $g_{i}$ . Furthermore, note that for $\tau\geq 2\bar{d}-1$ , $\eta^{t-d_{i}+1}\leq 4\eta^{t}$ and therefore for $t=0$ we have

\|\hat{\nabla}g_{i}^{0}-\nabla g_{i}(p^{0})\|\leq 2\sqrt{d_{i}}\beta\eta^{0}/\mu^{3}\leq e_{i}^{0}.

(32)

Accordingly, the statement holds for $t=0$ , which covers the base case. For $t>0$ , we continue from (30) and bound $\|p_{i}^{\ell_{j}}-p_{i}^{t}\|$ as

\|p_{i}^{\ell_{j}}-p_{i}^{t}\|\leq\sum_{k=\ell_{j}}^{t-1}\|p_{i}^{k}-p_{i}^{k+1}\|

(33)

=\sum_{k=\ell_{j}}^{t-1}\|[\hat{\nabla}g_{i}^{k}]^{-1}(\hat{x}_{i}^{k+1}-x_{i}^{k})\|

(34)

\leq\sum_{k=\ell_{j}}^{t-1}\|[\hat{\nabla}g_{i}^{k}]^{-1}\|\|\hat{x}_{i}^{k+1}-x_{i}^{k}\|.

(35)

The following two lemmas, whose proofs can be found in Appendices -F and -G, bound each of the terms in the above summation:

Lemma 3.

Suppose that $\|\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t})\|\leq 1/(2L)$ for some $t$ . Then $\sigma_{\min}(\hat{\nabla}g_{i}^{t})\geq 1/(2L)$ and $\|[\hat{\nabla}g_{i}^{t}]^{-1}\|\leq 2L$ .
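One way to see why Lemma 3 is plausible is the following sketch, which is not the proof given in the appendix. It assumes $p_{i}^{t}\in{\cal P}_{i}$ so that Lemma 1 applies, and it takes $\|\cdot\|$ to be the spectral norm. By Lemma 1 the true Jacobian is the inverse of a Hessian whose eigenvalues have magnitude between $\mu$ and $L$ , so its singular values are at least $1/L$ . Weyl's perturbation inequality for singular values then gives:

```latex
\sigma_{\min}(\hat{\nabla}g_{i}^{t})
  \geq\sigma_{\min}(\nabla g_{i}(p_{i}^{t}))
      -\|\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t})\|
  \geq\frac{1}{L}-\frac{1}{2L}=\frac{1}{2L},
\qquad
\|[\hat{\nabla}g_{i}^{t}]^{-1}\|
  =\frac{1}{\sigma_{\min}(\hat{\nabla}g_{i}^{t})}\leq 2L.
```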

Lemma 4.

For all $t\geq 0$ , if $x^{t}\in{\cal X}^{\textnormal{int}}$ , then for a user $i\in[n]$ the following holds:

\|\hat{x}_{i}^{t+1}-x_{i}^{t}\|\leq M\sqrt{n}\gamma^{t}+\Delta^{t}\Gamma_{\cal X}.

(36)

Using Lemmas 3 and 4, we get

\max_{\ell_{j}\in[\max\{0,t-d_{i}+1\},t]}\|p_{i}^{\ell_{j}}-p_{i}^{t}\|

(37)

\leq\max_{\ell_{j}\in[\max\{0,t-d_{i}+1\},t]}2L\sum_{k=\ell_{j}}^{t-1}\left(M\sqrt{n}\gamma^{k}+\Delta^{k}\Gamma_{\cal X}\right)

(38)

\leq 2L(t-\ell_{\min})(M\sqrt{n}\gamma^{\ell_{\min}}+\Delta^{\ell_{\min}}\Gamma_{\cal X}),

(39)

where $\ell_{\min}=\max\{0,t-d_{i}+1\}$ . Lastly, note that $t-\ell_{\min}\leq d_{i}-1$ , $\gamma^{\ell_{\min}}/\gamma^{t}\leq 2$ , and $\Delta^{\ell_{\min}}/\Delta^{t}\leq 4$ for $\tau\geq 2\bar{d}-1$ , which gives the final result.
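The two ratio bounds used in this last step can be checked numerically. The sketch below is only a sanity check under assumed schedules: $\gamma^{t}=1/(\mu(t+\tau))$, which is consistent with (47)-(48), and $\Delta^{t}=\Delta/(t+\tau)^{2}$, which is consistent with (51)-(52). It takes the worst case $d_{i}=\bar{d}$ and the smallest admissible shift $\tau=2\bar{d}-1$; the function names and default constants are illustrative and do not appear in the paper.

```python
def gamma(t, tau, mu=1.0):
    """Assumed step size gamma^t = 1 / (mu * (t + tau))."""
    return 1.0 / (mu * (t + tau))


def shrinkage(t, tau, delta0=1.0):
    """Assumed shrinkage Delta^t = delta0 / (t + tau)^2."""
    return delta0 / (t + tau) ** 2


def worst_ratios(d_bar, horizon=2000):
    """Largest gamma and Delta ratios between iterations l_min and t."""
    tau = 2 * d_bar - 1  # smallest shift allowed by tau >= 2 * d_bar - 1
    worst_gamma = 0.0
    worst_delta = 0.0
    for t in range(horizon):
        l_min = max(0, t - d_bar + 1)
        worst_gamma = max(worst_gamma, gamma(l_min, tau) / gamma(t, tau))
        worst_delta = max(worst_delta, shrinkage(l_min, tau) / shrinkage(t, tau))
    return worst_gamma, worst_delta


# Worst-case ratios for a few illustrative values of d_bar.
results = {d_bar: worst_ratios(d_bar) for d_bar in (1, 2, 5, 20)}
for d_bar, (g_ratio, d_ratio) in results.items():
    print(d_bar, g_ratio, d_ratio)
```

Under these assumptions the largest ratio occurs at $t=\bar{d}-1$, where $\gamma^{0}/\gamma^{t}=(3\bar{d}-2)/(2\bar{d}-1)<2$, and the $\Delta$ ratio is its square.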

-B Proof of Proposition 2

We prove by induction that if, at iteration $t$, $x^{k}\in{{\cal X}}_{\frac{\sqrt{n}\eta^{k}}{\mu}}^{\textnormal{int}}$ for all $k\in[\max\{t-\bar{d}+1,0\},t]$, then $x^{t+1}\in{\cal X}^{\textnormal{int}}_{\frac{\sqrt{n}\eta^{t+1}}{\mu}}$. The base case $x^{0}\in{\cal X}^{\textnormal{int}}_{\frac{\sqrt{n}\eta^{0}}{\mu}}$ holds by Assumption 3. This also ensures that $x^{t+1,s}\in{\cal X}^{\textnormal{int}}$ by the choice of $\Delta^{t}$ and $\eta^{t}$. Therefore, we assume that $x^{k}\in{\cal X}^{\textnormal{int}}_{\frac{\sqrt{n}\eta^{k}}{\mu}}$ for all such $k$. Note that $\hat{x}^{t+1}\in{\cal X}^{\textnormal{int}}$ by definition.

For all $i\in[n]$ , we consider a modified utility function $\tilde{f}_{i}(x_{i})$ , which is equal to $f_{i}(x_{i})$ if $x_{i}\in{\cal X}_{i}$ , and an $L$ -smooth, $\mu$ -strongly concave extension with $\beta$ -smooth gradient beyond the set ${\cal X}_{i}$ . Accordingly, $\textnormal{dom}\tilde{f}_{i}=\mathbb{R}^{d_{i}}$ , and $\tilde{f}_{i}$ is $L$ -smooth and $\mu$ -strongly concave over ${\mathbb{R}}^{d_{i}}$ with $\beta$ -smooth gradient.

Using the modified utility function, we define the modified price response function

{\tilde{g}}_{i}(p_{i})=\underset{x_{i}\in{\mathbb{R}}^{d_{i}}}{\operatorname*{arg\,max}}\ \tilde{f}_{i}(x_{i})-\langle x_{i},p_{i}\rangle.

(40)

The following lemma, whose proof can be found in Appendix -H, characterizes the regularity properties of $\tilde{g}_{i}(p_{i})$, $\forall i\in[n]$, under Assumption 2:

Lemma 5.

For all $i\in[n]$, let $\tilde{g}_{i}(p_{i})$ be the modified price response function in (40). Then, $\tilde{g}_{i}(p_{i})$ is bijective, $1/\mu$-Lipschitz continuous, and $\beta/\mu^{3}$-smooth over ${\mathbb{R}}^{d_{i}}$. Furthermore, let ${\cal P}_{i}=\{p_{i}\in{\mathbb{R}}^{d_{i}}:g_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}\}$. The following hold true:

1. If $\tilde{g}_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}$, then $p_{i}\in{\cal P}_{i}$.

2. If $p_{i}\in{\cal P}_{i}$, then $\tilde{g}_{i}(p_{i})=g_{i}(p_{i})$.

For each user $i\in[n]$ , we let $\tilde{x}_{i}^{t+1}=\tilde{g}_{i}(p_{i}^{t+1})$ and we rearrange the price update rule:

\tilde{x}_{i}^{t+1}-\hat{x}_{i}^{t+1}=\tilde{x}_{i}^{t+1}-x_{i}^{t}-\hat{\nabla}g_{i}^{t}(p_{i}^{t+1}-p_{i}^{t}).

(41)

We can also write the Taylor expansion of the modified price response function $\tilde{g}_{i}(p)$ around $p_{i}^{t}$ :

\tilde{g}_{i}(p_{i}^{t+1})-\tilde{g}_{i}(p_{i}^{t})=\nabla\tilde{g}_{i}(p_{i}^{t})(p_{i}^{t+1}-p_{i}^{t})+R_{1}.

(42)

We replace $\tilde{g}_{i}(p_{i}^{t})=g_{i}(p_{i}^{t})=x_{i}^{t}$ and $\nabla\tilde{g}_{i}(p_{i}^{t})=\nabla g_{i}(p_{i}^{t})$ (since $p_{i}^{t}\in{\cal P}_{i}$ ) and plug the above equation into (41):

\tilde{x}_{i}^{t+1}-\hat{x}_{i}^{t+1}=(\nabla g_{i}(p_{i}^{t})-\hat{\nabla}g_{i}^{t})(p_{i}^{t+1}-p_{i}^{t})+R_{1}.

(43)

To bound the norm of the above equation, we use Lemma 2 to bound the norm of the first term and [44, Lemma 1] to bound the second term:

\|\tilde{x}_{i}^{t+1}-\hat{x}_{i}^{t+1}\|\leq e_{i}^{t}\|p_{i}^{t+1}-p_{i}^{t}\|+\frac{\beta}{2\mu^{3}}\|p_{i}^{t+1}-p_{i}^{t}\|^{2}.

(44)

Rearranging the price update rule and using Lemmas 2 and 4 we can bound the norm of the price change:

	$\displaystyle\|p_{i}^{t+1}-p_{i}^{t}\|\leq\|[\hat{\nabla}g_{i}^{t}]^{-1}\|\|\hat{x}_{i}^{t+1}-x_{i}^{t}\|$			(45)
		$\displaystyle\leq 2L(M\sqrt{n}\gamma^{t}+\Delta^{t}\Gamma_{\cal X}).$		(46)

Note that both upper bounds for $e_{i}^{t}$ and $\|p_{i}^{t+1}-p_{i}^{t}\|$ are decreasing with $t$ . We can bound $e_{i}^{t}$ using $\tau>\frac{2\mu\Delta\Gamma_{\cal X}}{M\sqrt{n}}$ and $1\leq\Gamma_{\cal X}$ as:

	$\displaystyle e_{i}^{t}$	$\displaystyle<{\beta M\sqrt{d_{i}n}(\mu/\sqrt{n}+32L(\bar{d}-1))}/({4\mu^{4}(t+\tau)})$		(47)
		$\displaystyle={\beta M\sqrt{d_{i}n}\gamma^{t}(\mu/\sqrt{n}+32L(\bar{d}-1))}/({4\mu^{3}}),$		(48)

and further upper bound $\|p_{i}^{t+1}-p_{i}^{t}\|$ as

\|p_{i}^{t+1}-p_{i}^{t}\|\leq 3LM\sqrt{n}\gamma^{t}.

(49)

Plugging the above bounds and $\gamma^{t}$ into (44):

\begin{split}\|\tilde{x}_{i}^{t+1}-\hat{x}_{i}^{t+1}\|&<\frac{3\beta LM^{2}n}{4\mu^{5}(t+\tau)^{2}}\Big(6L\\ &+\sqrt{d_{i}}\left(\mu/\sqrt{n}+32L(\bar{d}-1)\right)\Big).\end{split}

(50)

Next, using the Cauchy-Schwarz inequality, we bound $\|\tilde{x}^{t+1}-\hat{x}^{t+1}\|$ as

	$\displaystyle\begin{split}\|\tilde{x}^{t+1}-\hat{x}^{t+1}\|&<\frac{3\beta LM^{2}n^{3/2}}{4\mu^{5}(t+\tau)^{2}}\Big(6L\\ &+\sqrt{d}\left(\mu/\sqrt{n}+32L(\bar{d}-1)\right)\Big)\end{split}$			(51)
		$\displaystyle={3\Delta^{t}}/{4},$		(52)

where we used the definition of $\Delta^{t}$ and $\sum_{i\in[n]}\sqrt{d_{i}}\leq\sqrt{dn}$. By the definition of a shrunk set and $\Delta^{t}/4=\frac{\sqrt{n}\eta^{t+1}}{\mu}$, this establishes that $\tilde{x}^{t+1}\in{\cal X}^{\textnormal{int}}_{\frac{\sqrt{n}\eta^{t+1}}{\mu}}$. Furthermore, let $\tilde{x}_{i}^{t+1,s}=\tilde{g}_{i}(p_{i}^{t+1,s})$. Using $1/\mu$-Lipschitz continuity of $\tilde{g}_{i}(p_{i})$:

\|\tilde{x}_{i}^{t+1,s}-\tilde{x}_{i}^{t+1}\|\leq{\Delta^{t}}/(4\sqrt{n}),

(53)

and $\|\tilde{x}^{t+1,s}-\tilde{x}^{t+1}\|\leq\Delta^{t}/4$ . Accordingly, we have

\|\tilde{x}^{t+1,s}-\hat{x}^{t+1}\|<\Delta^{t},

(54)

which establishes that $\tilde{x}^{t+1,s}\in{\cal X}^{\textnormal{int}}$. Lastly, note that if $\tilde{x}^{t+1},\tilde{x}^{t+1,s}\in{\cal X}^{\textnormal{int}}$, then for all $i\in[n]$, $\tilde{x}_{i}^{t+1},\tilde{x}_{i}^{t+1,s}\in{\cal X}_{i}^{\textnormal{int}}$, or equivalently, $\tilde{g}_{i}(p_{i}^{t+1}),\tilde{g}_{i}(p_{i}^{t+1,s})\in{\cal X}_{i}^{\textnormal{int}}$. Using Lemma 5, we have that $p_{i}^{t+1},p_{i}^{t+1,s}\in{\cal P}_{i}$, $\forall i\in[n]$. Hence, $\tilde{g}_{i}(p_{i}^{t+1})=g_{i}(p_{i}^{t+1})$ and $\tilde{x}_{i}^{t+1}=x_{i}^{t+1}$, as well as $\tilde{g}_{i}(p_{i}^{t+1,s})=g_{i}(p_{i}^{t+1,s})$ and $\tilde{x}_{i}^{t+1,s}=x_{i}^{t+1,s}$ for all $i\in[n]$, which proves the proposition.

-C Proof of Theorem 1

We denote the regret incurred by the update stage as $R_{u}(T)=\sum_{t=1}^{T/2}f(x^{\star})-f(x^{t})$ and the regret incurred by the sampling stage as $R_{s}(T)=\sum_{t=1}^{T/2}f(x^{\star})-f(x^{t,s})$. Let $y^{t+1}=x^{t}+\gamma^{t}p^{t}$. By Lemma 5, we know that $p^{t}=\nabla f(x^{t})$, $\forall t\geq 0$, since $x^{t}\in{\cal X}^{\textnormal{int}}$ by Proposition 2. For $t\geq 1$, we write using strong concavity:

	$\displaystyle f(x^{\star})-f(x^{t})\leq\langle-\nabla f(x^{t}),x^{t}-x^{\star}\rangle-\frac{\mu}{2}\|x^{t}-x^{\star}\|^{2}$	(55)
	$\displaystyle=\frac{1}{\gamma^{t}}\langle x^{t}-y^{t+1},x^{t}-x^{\star}\rangle-\frac{\mu}{2}\|x^{t}-x^{\star}\|^{2}$	(56)
$\displaystyle\begin{split}&=\frac{1}{2\gamma^{t}}\left(\|x^{t}-y^{t+1}\|^{2}+\|x^{t}-x^{\star}\|^{2}-\|y^{t+1}-x^{\star}\|^{2}\right)\\ &\quad-\frac{\mu}{2}\|x^{t}-x^{\star}\|^{2}.\end{split}$		(57)

Next, we bound the $\|y^{t+1}-x^{\star}\|^{2}$ term using Theorem 2 as follows:

	$\displaystyle\|y^{t+1}-x^{\star}\|^{2}\geq\|\Pi_{{\cal X}_{\Delta^{t}}}(y^{t+1})-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\|^{2}$	(58)
	$\displaystyle=\|\hat{x}^{t+1}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\|^{2}$	(59)
	$\displaystyle=\|\hat{x}^{t+1}-x^{t+1}+x^{t+1}-x^{\star}+x^{\star}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\|^{2}$	(60)
	$\displaystyle=\|\hat{x}^{t+1}-x^{t+1}\|^{2}+\|x^{t+1}-x^{\star}\|^{2}+\|x^{\star}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\|^{2}$
	$\displaystyle+2\langle\hat{x}^{t+1}-x^{t+1},x^{t+1}-x^{\star}\rangle+2\langle x^{t+1}-x^{\star},x^{\star}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\rangle$
	$\displaystyle+2\langle x^{\star}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star}),\hat{x}^{t+1}-x^{t+1}\rangle$	(61)
$\displaystyle\begin{split}&\geq\|x^{t+1}-x^{\star}\|^{2}-2\|\hat{x}^{t+1}-x^{t+1}\|\|x^{t+1}-x^{\star}\|\\ &-2\|x^{t+1}-x^{\star}\|\|x^{\star}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\|\\ &-2\|x^{\star}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\|\|\hat{x}^{t+1}-x^{t+1}\|\end{split}$		(62)
$\displaystyle\geq\|x^{t+1}-x^{\star}\|^{2}-2\Delta^{t}R(\Gamma_{\cal X}+3/4)-\frac{3}{2}(\Delta^{t})^{2}\Gamma_{\cal X}$		(63)
	$\displaystyle\vcentcolon=\|x^{t+1}-x^{\star}\|^{2}-C^{t},$	(64)

where the last inequality uses $\|{x}^{t+1}-\hat{x}^{t+1}\|<3\Delta^{t}/4$, given by Proposition 2, and Proposition 1 to bound $\|x^{\star}-\Pi_{{\cal X}_{\Delta^{t}}}(x^{\star})\|$. Plugging this into (57) and using $\|x^{t}-y^{t+1}\|=\gamma^{t}\|p^{t}\|\leq M\sqrt{n}\gamma^{t}$:

\begin{split}f(x^{\star})-f(x^{t})\leq&\frac{M^{2}n\gamma^{t}}{2}-\frac{\mu}{2}\|x^{t}-x^{\star}\|^{2}+\frac{C^{t}}{2\gamma^{t}}\\ &+\frac{1}{2\gamma^{t}}(\|x^{t}-x^{\star}\|^{2}-\|x^{t+1}-x^{\star}\|^{2}).\end{split}

(65)

Summing from $t=1$ to $T/2$ telescopes the $\|x^{t}-x^{\star}\|^{2}$ terms:

	$\displaystyle\begin{split}&nR_{u}(T)\leq\frac{M^{2}n\log(T/2)}{2\mu}+\frac{\mu\tau}{2}\|x^{1}-x^{\star}\|^{2}\\ &\quad+\sum_{t=2}^{T/2}\left(\frac{1}{2\gamma^{t}}-\frac{1}{2\gamma^{t-1}}-\frac{\mu}{2}\right)\|x^{t}-x^{\star}\|^{2}\\ &\quad-\frac{1}{2\gamma^{T/2}}\|x^{T/2+1}-x^{\star}\|^{2}+\sum_{t=1}^{T/2}\frac{C^{t}}{2\gamma^{t}}\end{split}$			(66)
		$\displaystyle\leq\frac{M^{2}n\log(T/2)}{2\mu}+\frac{\mu\tau}{2}\|x^{1}-x^{\star}\|^{2}+\sum_{t=1}^{T/2}\frac{C^{t}}{2\gamma^{t}}.$		(67)

Finally, note that $C^{t}={\cal O}(1/t^{2})$ because it consists of terms in $\Delta^{t}$ and $(\Delta^{t})^{2}$. Therefore, we can use the bounds ${\sum}_{t=1}^{T/2}\frac{1}{t+\tau}\leq{\sum}_{t=1}^{T/2}\frac{1}{t+2}\leq\log(T/2)$ and, for $k\geq 2$, $\sum_{t=1}^{T/2}\frac{1}{(t+2)^{k}}\leq 1$ to show that:

\sum_{t=1}^{T/2}\frac{C^{t}}{2\gamma^{t}}\leq\mu\Delta R(3/4+\Gamma_{\cal X})\log(T/2)+{3\mu\Delta^{2}\Gamma_{\cal X}}/4.

(68)
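The two elementary sum bounds behind (68) are easy to confirm numerically. The following sketch is only an illustration: the helper name `shifted_sum` and the chosen horizons are hypothetical, and the check starts at $T/2=2$ because $\log(T/2)=0$ at $T/2=1$, where the logarithmic bound cannot hold.

```python
import math


def shifted_sum(n_terms, power=1, shift=2):
    """Return sum_{t=1}^{n_terms} 1 / (t + shift)^power."""
    return sum(1.0 / (t + shift) ** power for t in range(1, n_terms + 1))


# Illustrative values of T/2.
horizons = [2, 10, 100, 10_000]

# Bound used for the Delta^t term: sum 1/(t+2) <= log(T/2).
log_bound_holds = all(shifted_sum(n) <= math.log(n) for n in horizons)

# Bound used for the (Delta^t)^2 term: sum 1/(t+2)^k <= 1 for k >= 2.
power_bound_holds = all(
    shifted_sum(n, power=k) <= 1.0 for n in horizons for k in (2, 3, 4)
)

print(log_bound_holds, power_bound_holds)
```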

Plugging (68) into (67) and dividing both sides by $n$, we get the regret incurred by the update stages. For the sampling stages, we note that due to the strong concavity of $f$

f(x^{t})-f(x^{t,s})\leq\langle\nabla f(x^{t,s}),x^{t}-x^{t,s}\rangle\leq M\sqrt{n}\frac{\Delta^{t-1}}{4}.

(69)

Accordingly $f(x^{\star})-f(x^{t,s})\leq f(x^{\star})-f(x^{t})+M\sqrt{n}\Delta^{t-1}/4$ . Summing from $t=1$ to $T/2$ , we get

nR_{s}(T)=nR_{u}(T)+\frac{M}{4}\sum_{t=1}^{T/2}\Delta^{t-1}\leq nR_{u}(T)+\frac{\Delta M\sqrt{n}}{4},

(70)

which gives the final result as

R(T)\leq 2R_{u}(T)+{\Delta M}/({4\sqrt{n}}).

(71)

To get the convergence result, we rearrange (65):

	$\displaystyle\begin{split}\|x^{t+1}-x^{\star}\|^{2}&\leq\|x^{t}-x^{\star}\|^{2}(1-\mu\gamma^{t})+M^{2}n(\gamma^{t})^{2}\\ &+C^{t}+2\gamma^{t}(f(x^{t})-f(x^{\star}))\end{split}$			(72)
	$\displaystyle\leq$	$\displaystyle\|x^{t}-x^{\star}\|^{2}(1-\mu\gamma^{t})+M^{2}n(\gamma^{t})^{2}+C^{t}.$		(73)

An inequality of this form holds for all $t\geq 0$. Applying it recursively from $t=0$ to $t=T/2-1$, so that the terms generated at iteration $t$ are multiplied by $\prod_{i=t+1}^{T/2-1}(1-\mu\gamma^{i})$, we get:

	$\displaystyle\begin{split}&\|x^{T/2}-x^{\star}\|^{2}\leq\|x^{0}-x^{\star}\|^{2}\prod_{t=0}^{T/2-1}(1-\mu\gamma^{t})\\ &+M^{2}n\sum_{t=0}^{T/2-1}(\gamma^{t})^{2}\prod_{i=t+1}^{T/2-1}(1-\mu\gamma^{i})\\ &+\sum_{t=0}^{T/2-1}C^{t}\prod_{i=t+1}^{T/2-1}(1-\mu\gamma^{i})\end{split}$			(74)
	$\displaystyle\begin{split}&\leq\|x^{0}-x^{\star}\|^{2}\frac{\tau-1}{\tau-1+T/2}+\frac{M^{2}n\log(T/2)}{\mu^{2}(T/2+\tau-1)}\\ &+\frac{2R(3/4+\Gamma_{\cal X})\Delta\log(T/2)}{(T/2+\tau-1)}+\frac{3\Delta^{2}\Gamma_{\cal X}}{2(T/2+\tau-1)},\end{split}$			(75)

which completes the proof.
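Two algebraic facts used in this proof can be verified exactly with rational arithmetic. The sketch below assumes the step size $\gamma^{t}=1/(\mu(t+\tau))$, consistent with (47)-(48); the constants and function names are illustrative and not taken from the paper. It checks that the coefficient multiplying $\|x^{t}-x^{\star}\|^{2}$ in (66) vanishes, and that the product in the first term of (74) collapses to the factor $(\tau-1)/(\tau-1+T/2)$ appearing in (75).

```python
from fractions import Fraction


def gamma(t, tau, mu):
    """Assumed step size gamma^t = 1 / (mu * (t + tau)), as an exact fraction."""
    return 1 / (mu * Fraction(t + tau))


def contraction_product(n_steps, tau, mu):
    """Exact value of prod_{t=0}^{n_steps-1} (1 - mu * gamma^t)."""
    product = Fraction(1)
    for t in range(n_steps):
        product *= 1 - mu * gamma(t, tau, mu)
    return product


mu = Fraction(1, 2)  # illustrative strong concavity constant
tau = 5              # illustrative shift
n_steps = 50         # plays the role of T/2

product = contraction_product(n_steps, tau, mu)
closed_form = Fraction(tau - 1, tau - 1 + n_steps)

# Coefficient 1/(2 gamma^t) - 1/(2 gamma^{t-1}) - mu/2 from the sum in (66).
telescoping_coefficients = [
    1 / (2 * gamma(t, tau, mu)) - 1 / (2 * gamma(t - 1, tau, mu)) - mu / 2
    for t in range(2, n_steps + 1)
]

print(product, closed_form)
```

Exact fractions are used instead of floats so that the equalities can be tested without tolerances.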

-D Proof of Remark 2

For a user $i\in[n]$ , using the modified price response function $\tilde{g}_{i}(p_{i})$ introduced in the proof of Proposition 2, we have that

\|\tilde{x}_{i}^{-t}-x_{i}^{0}\|\leq{\eta^{0}}/{\mu},~\forall t\in[-d_{i},-1],

(76)

which implies that $\tilde{x}_{i}^{-t}\in{\cal X}_{i}^{\textnormal{int}}$ because $x^{0}\in{\cal X}^{\textnormal{int}}_{\frac{\eta^{0}\sqrt{n}}{\mu}}$ . As such, $\tilde{x}_{i}^{-t}=x_{i}^{-t}$ and $p_{i}^{-t}=\nabla f_{i}(x_{i}^{-t})$ .

BERKAY TURAN is pursuing the Ph.D. degree in Electrical and Computer Engineering at the University of California, Santa Barbara. He received the B.Sc. degree in Electrical and Electronics Engineering as well as the B.Sc. degree in Physics from Boğaziçi University, Istanbul, Turkey, in 2018. The overarching goal of his research is to design network control, optimization, and learning frameworks to promote efficiency and resiliency in societal-scale cyber-physical systems.

SPENCER HUTCHINSON received the B.S. degree in electrical engineering from Colorado School of Mines in 2021. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of California, Santa Barbara, CA, USA. His research interests include the design and analysis of optimization and learning algorithms for the control of human-cyber-physical systems.

MAHNOOSH ALIZADEH is an associate professor of Electrical and Computer Engineering at the University of California Santa Barbara. She received the B.Sc. degree (’09) in Electrical Engineering from Sharif University of Technology and the M.Sc. (’13) and Ph.D. (’14) degrees in Electrical and Computer Engineering from the University of California Davis. From 2014 to 2016, she was a postdoctoral scholar at Stanford University. Her research interests are focused on designing network control, optimization, and learning frameworks to promote efficiency and resiliency in societal-scale cyber-physical systems. Dr. Alizadeh is a recipient of the NSF CAREER award.

-E Proof of Lemma 1

By definition, $f_{i}(x_{i})$ is strongly concave over ${\cal X}_{i}$ , therefore the optimization problem ${\max}_{x\in{\textnormal{dom}f_{i}}}f_{i}(x_{i})-\langle x_{i},p_{i}\rangle$ is strongly concave and has a unique solution for $p_{i}\in{\cal P}_{i}$ . Since ${\cal X}_{i}\subseteq\textnormal{dom}f_{i}$ by Assumption 1, the optimal solution is in the interior of $\textnormal{dom}f_{i}$ . Therefore the first-order optimality condition implies that the optimal solution $g_{i}(p_{i})$ satisfies

p_{i}=\nabla f_{i}(g_{i}(p_{i})),

(77)

which implies that $\nabla{f}_{i}$ is surjective for $p_{i}\in{\cal P}_{i}$. We also know that the gradient of a strongly concave function is injective. (To see this, suppose that $x_{1}\neq x_{2}$ and therefore $\|x_{1}-x_{2}\|>0$. If $\nabla f(x_{1})=\nabla f(x_{2})$, then (1) results in $0\geq\mu\|x_{1}-x_{2}\|^{2}$, which is a contradiction, so $x_{1}=x_{2}$ must hold.) Therefore, $\nabla{f}_{i}$ is bijective and invertible and ${g}_{i}(p_{i})=\nabla{f}_{i}^{-1}(p_{i})$, which also proves that $g_{i}(p_{i})$ is bijective. By the inverse function theorem, we get that:

\nabla{g}_{i}(p_{i})=[\nabla^{2}{f}_{i}({g}_{i}(p_{i}))]^{-1}.

(78)

Since ${f}_{i}$ is $L$-smooth and $\mu$-strongly concave, the inverse of its Hessian has eigenvalues in $[-1/\mu,-1/L]$, which results in

\|\nabla{g}_{i}(p_{i})\|=\|[\nabla^{2}{f}_{i}({g}_{i}(p_{i}))]^{-1}\|\leq 1/\mu,

(79)

proving the Lipschitz property of ${g}_{i}(p_{i})$ . To show smoothness, we let $x_{i}^{1}={g}_{i}(p_{i}^{1})$ and $x_{i}^{2}={g}_{i}(p_{i}^{2})$ and write:

	$\displaystyle\|\nabla{g}_{i}(p_{i}^{1})-\nabla{g}_{i}(p_{i}^{2})\|=\|[\nabla^{2}{f}_{i}(x_{i}^{1})]^{-1}-[\nabla^{2}{f}_{i}(x_{i}^{2})]^{-1}\|$		(80)
	$\displaystyle=\|[\nabla^{2}{f}_{i}(x_{i}^{1})]^{-1}(\nabla^{2}{f}_{i}(x_{i}^{2})-\nabla^{2}{f}_{i}(x_{i}^{1}))[\nabla^{2}{f}_{i}(x_{i}^{2})]^{-1}\|$		(81)
	$\displaystyle\leq{\beta}\|x_{i}^{1}-x_{i}^{2}\|/{\mu^{2}}\leq{\beta}\|p_{i}^{1}-p_{i}^{2}\|/{\mu^{3}},$		(82)

where the last inequality uses $1/\mu$ -Lipschitz continuity of ${g}_{i}(p_{i})$ , which proves $\beta/\mu^{3}$ -smoothness of ${g}_{i}(p_{i})$ .
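As a concrete scalar illustration of (77)-(79), consider the hypothetical utility $f(x)=\log(1+x)$ restricted to ${\cal X}_{i}=[0,3]$. This choice is made here for illustration only; on this interval $f''(x)=-1/(1+x)^{2}\in[-1,-1/16]$, so $L=1$ and $\mu=1/16$, and the price response is $g(p)=1/p-1$ for $p\in[1/4,1]$. The sketch below checks the first-order condition $p=f'(g(p))$, compares a finite-difference derivative of $g$ with the inverse Hessian, and confirms that $|g'(p)|\in[1/L,1/\mu]$.

```python
# Hypothetical scalar utility f(x) = log(1 + x) on X_i = [0, 3].
MU = 1.0 / 16.0   # strong concavity constant of f on [0, 3]
L_SMOOTH = 1.0    # smoothness constant of f on [0, 3]


def f_prime(x):
    return 1.0 / (1.0 + x)


def f_second(x):
    return -1.0 / (1.0 + x) ** 2


def price_response(p):
    """g(p) = argmax_x f(x) - p * x, i.e. the solution of f'(x) = p."""
    return 1.0 / p - 1.0


def central_difference(fn, p, h=1e-6):
    return (fn(p + h) - fn(p - h)) / (2.0 * h)


prices = [0.25, 0.4, 0.5, 0.8, 1.0]  # prices whose demand lies in [0, 3]
rows = []
for p in prices:
    x = price_response(p)
    rows.append((
        p,
        f_prime(x),                             # should equal p, as in (77)
        central_difference(price_response, p),  # numerical derivative of g
        1.0 / f_second(x),                      # inverse Hessian, as in (78)
    ))

for row in rows:
    print(row)
```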

-F Proof of Lemma 3

Note that for $p_{i}^{t}\in{\cal P}_{i}$, $\nabla g_{i}(p_{i}^{t})=[\nabla^{2}f_{i}(g_{i}(p_{i}^{t}))]^{-1}$ is symmetric by Schwarz’s theorem, since $\nabla^{2}f_{i}(g_{i}(p_{i}))$ is $\beta$-Lipschitz continuous for $p_{i}\in{\cal P}_{i}$. Accordingly, the minimum singular value of $\nabla g_{i}(p_{i}^{t})$ is equal to the smallest absolute eigenvalue of $[\nabla^{2}f_{i}(g_{i}(p_{i}^{t}))]^{-1}$, which is at least $1/L$ because the eigenvalues lie in $[-1/\mu,-1/L]$, i.e., $\sigma_{\min}(\nabla g_{i}(p_{i}^{t}))\geq 1/L$. This implies that if $\|\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t})\|\leq 1/(2L)$ holds, then

	$\displaystyle\sigma_{\min}(\hat{\nabla}g_{i}^{t})=\underset{\|x\|=1}{\min}\|\hat{\nabla}g_{i}^{t}x\|$		(83)
	$\displaystyle=\underset{\|x\|=1}{\min}\|\nabla g_{i}(p_{i}^{t})x+(\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t}))x\|$		(84)
	$\displaystyle\geq\underset{\|x\|=1}{\min}\|\nabla g_{i}(p_{i}^{t})x\|-\underset{\|x\|=1}{\max}\|(\hat{\nabla}g_{i}^{t}-\nabla g_{i}(p_{i}^{t}))x\|$		(85)
	$\displaystyle\geq 1/L-1/(2L)=1/(2L),$		(86)

which implies that $\|[\hat{\nabla}g_{i}^{t}]^{-1}\|=1/\sigma_{\min}(\hat{\nabla}g_{i}^{t})\leq 2L$ .
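The perturbation argument in (83)-(86) can be illustrated on a small example. The sketch below uses hypothetical constants $L=2$ and $\mu=0.5$, a symmetric $2\times 2$ Jacobian with eigenvalues $-0.5$ and $-2$ (so they lie in $[-1/\mu,-1/L]$), and an arbitrary estimation error whose spectral norm is below $1/(2L)$. None of these numbers come from the paper; the closed-form singular values are specific to the $2\times 2$ case.

```python
import math


def singular_values_2x2(m):
    """Return (sigma_max, sigma_min) of a 2x2 matrix.

    The eigenvalues of M^T M are obtained from its trace, which is the
    squared Frobenius norm of M, and its determinant, which is det(M)^2.
    """
    (a, b), (c, d) = m
    trace = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(trace * trace - 4.0 * det * det, 0.0))
    sigma_max = math.sqrt((trace + disc) / 2.0)
    sigma_min = math.sqrt(max((trace - disc) / 2.0, 0.0))
    return sigma_max, sigma_min


L_SMOOTH = 2.0  # hypothetical smoothness constant

# True Jacobian: symmetric with eigenvalues -0.5 and -2.
jacobian = [[-1.25, 0.75], [0.75, -1.25]]
# Hypothetical estimation error with spectral norm below 1 / (2 L).
error = [[0.10, -0.15], [0.05, 0.10]]
estimate = [
    [jacobian[r][c] + error[r][c] for c in range(2)] for r in range(2)
]

_, sigma_min_true = singular_values_2x2(jacobian)
error_norm, _ = singular_values_2x2(error)
_, sigma_min_estimate = singular_values_2x2(estimate)
inverse_norm = 1.0 / sigma_min_estimate  # spectral norm of the inverse

print(sigma_min_true, error_norm, sigma_min_estimate, inverse_norm)
```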

-G Proof of Lemma 4

To bound $\|\hat{x}_{i}^{t+1}-x_{i}^{t}\|$ , we will use the following as an auxiliary result:

Theorem 2.

[43, Theorem 1.2.1] Let ${\cal X}$ be a convex and compact set in $\mathbb{R}^{d}$ . Then, the metric projection onto ${\cal X}$ is contracting, that is,

\|\Pi_{\cal X}(x)-\Pi_{\cal X}(y)\|\leq\|x-y\|,~\forall x,y\in{\mathbb{R}}^{d}.

Using the above result, we bound $\|\hat{x}_{i}^{t+1}-x_{i}^{t}\|$ as:

	$\displaystyle\|\hat{x}_{i}^{t+1}-x_{i}^{t}\|\leq\|\hat{x}^{t+1}-x^{t}\|$		(87)
	$\displaystyle=\|\Pi_{{\cal X}_{\Delta^{t}}}(x^{t}+p^{t}\gamma^{t})-\Pi_{{\cal X}_{\Delta^{t}}}(x^{t})+\Pi_{{\cal X}_{\Delta^{t}}}(x^{t})-x^{t}\|$		(88)
	$\displaystyle\leq\|\Pi_{{\cal X}_{\Delta^{t}}}(x^{t}+p^{t}\gamma^{t})-\Pi_{{\cal X}_{\Delta^{t}}}(x^{t})\|+\|\Pi_{{\cal X}_{\Delta^{t}}}(x^{t})-x^{t}\|$		(89)
	$\displaystyle\leq\|p^{t}\gamma^{t}\|+\Delta^{t}\Gamma_{\cal X}\leq M\sqrt{n}\gamma^{t}+\Delta^{t}\Gamma_{\cal X},$		(90)

where we used $\|p_{i}^{t}\|=\|\nabla f_{i}(x_{i}^{t})\|\leq M$ since $x_{i}^{t}\in{\cal X}_{i}^{\textnormal{int}}$ , and Proposition 1.
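The contraction property of Theorem 2, which drives the step from (89) to (90), can be observed on a simple convex and compact set. The sketch below uses a box $[-1,1]^{4}$ as a stand-in for ${\cal X}_{\Delta^{t}}$, since its Euclidean projection is a coordinate-wise clip; the set, the dimension, and the sampling range are illustrative assumptions rather than anything used in the paper.

```python
import math
import random


def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (coordinate-wise clip)."""
    return [min(max(xi, lo), hi) for xi in x]


def distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


random.seed(0)  # fixed seed so the check is reproducible
worst_gap = -math.inf
for _ in range(1000):
    x = [random.uniform(-3.0, 3.0) for _ in range(4)]
    y = [random.uniform(-3.0, 3.0) for _ in range(4)]
    # A positive gap would violate ||P(x) - P(y)|| <= ||x - y||.
    gap = distance(project_box(x), project_box(y)) - distance(x, y)
    worst_gap = max(worst_gap, gap)

print(worst_gap)
```

A random search of this kind cannot prove the inequality; it only fails to find a counterexample, which is what Theorem 2 predicts.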

-H Proof of Lemma 5

The first part of the lemma follows from the same steps as in Lemma 1 for $p_{i}\in{\mathbb{R}}^{d_{i}}$ instead of $p_{i}\in{\cal P}_{i}$ , and using $\tilde{f}_{i}$ and $\tilde{g}_{i}$ instead of $f_{i}$ and $g_{i}$ .

Next, we prove the second part of the lemma. For the first statement, given a $p_{i}\in{\mathbb{R}}^{d_{i}}$ , suppose that $\tilde{g}_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}$ . This implies that there exists $x_{i}\in{\cal X}_{i}^{\textnormal{int}}$ that satisfies $\nabla\tilde{f}_{i}(x_{i})=p_{i}$ . Since $\tilde{f}_{i}(x_{i})=f_{i}(x_{i})$ for $x_{i}\in{\cal X}_{i}^{\textnormal{int}}$ , the same $x_{i}$ solves the optimization problem in (5), which implies $g_{i}(p_{i})=\tilde{g}_{i}(p_{i})$ . Therefore, $g_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}$ , which proves $p_{i}\in{\cal P}_{i}$ by definition.

To prove the second statement, note that if $p_{i}\in{\cal P}_{i}$ , then $g_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}$ . Since ${\cal X}_{i}\subseteq\textnormal{dom}f_{i}$ by Assumption 1, the first order optimality condition of (5) implies that there exists $x_{i}=g_{i}(p_{i})\in{\cal X}_{i}^{\textnormal{int}}$ such that $\nabla f_{i}(x_{i})=p_{i}$ . The same $x_{i}$ solves the optimization problem (40), since $f_{i}(x_{i})=\tilde{f}_{i}(x_{i})$ for $x_{i}\in{\cal X}_{i}^{\textnormal{int}}$ . The optimal solution to (40) has to be unique due to strong concavity, therefore it must hold true that $\tilde{g}_{i}(p_{i})=g_{i}(p_{i})$ .

Nomenclature

$n$	Number of users
$f_{i}$	Utility function of user $i$
$\nabla f_{i}$	Gradient of $f_{i}$
$f$	Sum of the $n$ $f_{i}$ ’s
$\nabla f$	Gradient of $f$
$x_{i}$	Resource demand vector of user $i$
$x$	Concatenated resource demand of $n$ $x_{i}$ ’s
$x^{\star}$	Optimal solution
$f^{\star}$	Optimal objective value
$d_{i}$	Dimension of $x_{i}$
$\bar{d}$	Highest dimension of $x_{i}$ among the users
$\textnormal{dom}f_{i}$	Domain of $f_{i}$
$\cal X$	Feasible set
${\cal X}_{i}$	The set of values that user $i$ ’s resource demand vector can take in the feasible set ${\cal X}$
${\cal X}^{\textnormal{int}},{\cal X}_{i}^{\textnormal{int}}$	Interiors of sets ${\cal X}$ , ${\cal X}_{i}$
$R$	Upper bound on the diameter of ${\cal X}$
$\mu$	Strong concavity constant for all $f_{i}$
$L$	Smoothness constant for all $f_{i}$
$M$	Lipschitz constant for all $f_{i}$
$\beta$	Smoothness constant for all $\nabla f_{i}$
$p_{i}$	Resource price vector for user $i$
$p$	Concatenated resource price vector of $n$ $p_{i}$ ’s
$g_{i}$	Price response function of user $i$
$g$	Concatenated price response function of $n$ $g_{i}$ ’s
$R(T)$	Regret incurred after $T$ iterations
$t$	Iteration index
$p_{i}^{t}$	Price vector of user $i$ at iteration $t$
$p^{t}$	Concatenated price vector of $n$ $p_{i}^{t}$ ’s
$x_{i}^{t}$	Resource demand vector of user $i$ at iteration $t$
$x^{t}$	Concatenated resource demand vector of $n$ $x_{i}^{t}$ ’s
$\bar{\mathcal{B}}(r)$	Closed Euclidean ball with radius $r$ centered at origin
${\cal B}(r)$	Open Euclidean ball with radius $r$ centered at origin
${\cal X}_{\Delta}$	Shrunk version of ${\cal X}$ by an amount ${\Delta}$
$H_{\cal X}$	Maximum shrinkage of ${\cal X}$
$\hat{x}_{i}^{t}$	Desired resource demand vector of user $i$ at iteration $t$
$\hat{x}^{t}$	Desired concatenated resource demand vector of $n$ $\hat{x}_{i}^{t}$ ’s
$\hat{\nabla}g_{i}^{t}$	Jacobian estimate of user $i$ ’s price response function at iteration $t$
$p_{i}^{t,s}$	Resource price for user $i$ at sampling stage of iteration $t$
$x_{i}^{t,s}$	Resource demand of user $i$ at sampling stage of iteration $t$
$\Delta^{t}$	Amount of shrinkage of the feasible set ${\cal X}$ at iteration $t$
$\gamma^{t}$	Step-size of the algorithm at the update stage
$\eta^{t}$	The amount of price variation at the sampling stage
$\tau$	Constant shift in the denominator of $\gamma^{t}$ and $\Delta^{t}$
$\mathrm{Sharp}_{\mathcal{X}}$	Sharpness of ${\cal X}$
$\Gamma_{\cal X}$	Upper bound on the sharpness constant of ${\cal X}$
${\cal P}_{i}$	Set of prices that induce a resource demand in ${\cal X}_{i}$ for user $i$
