On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks

Bingkun Lai1, Jiayi He1, Jiawen Kang1, Gaolei Li2, Minrui Xu3, Tao Zhang4, Shengli Xie1

1 School of Automation, Guangdong University of Technology, Guangzhou, China
2 School of Electronics Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
3 School of Computer Science and Engineering, Nanyang Technological University, Singapore
4 School of Software Engineering, Beijing Jiaotong University, Beijing, China

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants No. 62102099, No. U22A2054, the Pearl River Talent Recruitment Program under Grant 2021QN02S643, the Talent Fund of Beijing Jiaotong University under Grant 2023XKRC050, the National Funded Postdoctoral Research Program under Grant GZC20230223, and Guangzhou Basic Research Program under Grant 2023A04J1699, and is also supported by Energy Research Test-Bed and Industry Partnership Funding Initiative, Energy Grid (EG) 2.0 programme, DesCartes and MOE Tier 1 (RG87/22). Corresponding author: Jiawen Kang (e-mail: [email protected])

Abstract

Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in mobile edge networks, with applications such as the metaverse and the Industrial Internet of Things. Federated learning is a promising technique for training GAI models in mobile edge networks because the training data are distributed across devices. However, training large GAI models such as generative diffusion models in mobile edge networks incurs a considerable communication cost. Additionally, the substantial energy consumption of training diffusion-based models, together with the limited resources of edge devices and the complexity of network environments, makes it challenging to improve the training efficiency of GAI models. To address these challenges, we propose an on-demand quantized energy-efficient federated diffusion approach for mobile edge networks. Specifically, we first design a dynamic quantized federated diffusion training scheme that considers the various demands of the edge devices. Then, we study an energy efficiency problem based on specific quantization requirements. Numerical results show that our proposed method significantly reduces system energy consumption and transmitted model size compared to both baseline federated diffusion and fixed quantized federated diffusion methods while maintaining reasonable quality and diversity of the generated data.

Index Terms:

Federated Diffusion, Energy Efficient, Generative AI, Generative Diffusion, On-demand Quantization.

I Introduction

As the carrier of content flow, mobile edge networks have become an essential foundation of next-generation applications such as the Metaverse [1] and the Industrial Internet of Things. Generative models such as GANs [2] have demonstrated excellent performance in trajectory prediction [3], education [4], healthcare [5], and other scenarios involving the Internet of Things and the Internet of Vehicles. Therefore, more creative generative diffusion models are expected to be deployed in mobile edge networks for next-generation application scenarios such as 6G communication networks [6] and vehicular metaverses [7], [8]. To deploy generative diffusion models in mobile edge networks, distributed training schemes based on federated learning, called federated diffusions [9], [10], have been proposed. These models enable mobile edge networks to achieve higher productivity and efficiency in next-generation application scenarios.

During the training phase of federated diffusions, the model needs to be transferred between the server and edge devices at each training step to update the global model [11]. This is not a problem when training traditional AI models using federated learning due to their small number of model parameters. However, generative diffusion models are usually large, leading to significant energy expenditure during the federated training process [12]. Therefore, reducing training energy costs is crucial for improving overall operational efficiency in mobile edge networks [13].

Recent research on federated diffusions has primarily focused on improving their task performance [9], including efforts to raise the quality and diversity of the generated content. However, studies that aim to optimize the overall training cost of these models remain limited in depth and scope. The authors in [14], [15] studied the problem of energy-efficient resource allocation for FL over wireless communication networks. They derived the energy consumption models for FL based on a convergence rate analysis. The authors in [16] explored post-training quantization techniques for diffusion models, allowing direct quantization into 8 bits without significant performance degradation. However, existing works fail to take into account the substantial training costs of diffusion models or the trade-off between performance and efficiency in the context of complex generative diffusion models [17].

To explore the deployment of a green generative diffusion model in mobile edge networks, we propose a dynamic quantization scheme for transmitting models during federated diffusion training. Firstly, we compress the diffusion models using a quantization scheme before transmission. We then study an energy consumption optimization problem and its solution. The performance of our proposed scheme is evaluated through simulations on the DDPM [17] model. Our main contributions can be summarized as follows:

• We design a new and environmentally friendly federated generative diffusion framework that utilizes a dynamic method for parameter quantization and training in mobile edge networks.

• We formulate an optimization problem for resource allocation in dynamic quantized federated diffusion, aiming to minimize total energy consumption while maintaining commendable performance.

• Numerical results demonstrate the effectiveness of our proposed method compared to other baseline methods, particularly in terms of energy efficiency and sample quality.

The rest of the paper is organized as follows. The system model and the proposed on-demand quantized federated diffusion framework are introduced in Section II. Next, we study the energy efficiency optimization problem in Section III. Finally, we show the simulation results in Section IV and discuss the conclusion and future work in Section V.

II System Model

As shown in Fig. 1, we consider a mobile edge network scenario where a central server and $k$ edge devices collaborate to train a diffusion model using federated learning. Because diffusion models are large, training them in federated learning scenarios can be exceptionally energy-intensive. To reduce the transmission cost of training, a model compression method named stochastic quantization [18] is applied before the model parameters are transmitted from the edge devices to the edge server for aggregation. Additionally, we take into account the varying quantization level needs of edge devices, keeping the quantization flexible enough to accommodate different device requirements. Furthermore, considering the heterogeneous nature of edge devices and their varying resource capacities, an energy optimization problem is formulated to further minimize the energy consumption during federated diffusion training. After training is done, the server can use the final global diffusion model for efficient and high-quality content generation. The learning process for each round of iteration is as follows:

• Step 1: Given different quantization requirements, the central server determines the optimal strategy for each edge device to balance computing and communication resources based on the resource status of different devices.

• Step 2: The edge devices then perform local diffusion computation and transmission according to the optimal strategy.

• Step 3: After receiving all local diffusion models from the edge devices, the central server uses an aggregation scheme (such as FedAvg [19]) to merge the local diffusion models into a new global diffusion model and sends it back to the edge devices for the next round of training.
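The aggregation in Step 3 can be sketched as follows. This is a minimal illustration of FedAvg-style weighted averaging, assuming each local model has already been flattened into a NumPy vector and that devices are weighted by local dataset size; the function name `fedavg` and the flat-vector layout are choices made for this sketch, not details taken from the paper.

```python
import numpy as np

def fedavg(local_weights, num_samples):
    """Weighted average of flattened local models (FedAvg-style).

    local_weights: list of 1-D arrays, one per edge device (same length).
    num_samples:   local dataset sizes, used as aggregation weights.
    """
    sizes = np.asarray(num_samples, dtype=np.float64)
    coeffs = sizes / sizes.sum()                  # weights sum to 1
    stacked = np.stack(local_weights)             # (num_devices, num_params)
    return np.tensordot(coeffs, stacked, axes=1)  # (num_params,)

# Two hypothetical devices holding 1 and 3 samples respectively.
global_w = fedavg([np.array([1.0, 2.0]), np.array([3.0, 6.0])], [1, 3])
print(global_w)
```

In the quantized setting, the vectors passed in would be the quantized local models received by the server.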

We now introduce quantization, a promising method for compressing neural networks. Stochastic quantization can be used in the federated learning process to significantly reduce energy consumption with minimal impact on model performance. Since the resources of edge devices are often limited, we propose quantizing the local diffusion model before uploading it to the server, as in Fig. 1, to reduce the cost of transmitting a comparatively large model. To train the federated diffusion model with a quantization scheme, we first define the stochastic quantization function as $Q(\cdot)$ . Given the local diffusion weight ${\boldsymbol{w}}_{k}$ , the quantized weight can be expressed as $\hat{\boldsymbol{w}}_{k}=Q({\boldsymbol{w}}_{k})$ . Let $|{\boldsymbol{w}}^{[n]}_{k}|$ denote the absolute value of the $n$-th element of ${\boldsymbol{w}}_{k}$ . The stochastic quantization function is defined as

Refer to caption — Figure 1: On-demand quantized federated diffusion framework

Q({\boldsymbol{w}}_{k})=a\cdot \mathrm{sgn}({\boldsymbol{w}}_{k})\cdot\begin{cases}q^{l+1}&\text{w.p. }\frac{|{\boldsymbol{w}}^{[n]}_{k}|-aq^{l}}{a(q^{l+1}-q^{l})}\\ q^{l}&\text{w.p. }\frac{aq^{l+1}-|{\boldsymbol{w}}^{[n]}_{k}|}{a(q^{l+1}-q^{l})}\end{cases}

(1)

Here, $a$ is the scale factor and $\mathrm{sgn}(\cdot)$ denotes the sign function, which returns the sign of each element of ${\boldsymbol{w}}_{k}$ . Moreover, $[q^{l},q^{l+1}]$ is the quantization interval such that for any $|{\boldsymbol{w}}^{[n]}_{k}|$ we have $\frac{|{\boldsymbol{w}}^{[n]}_{k}|}{a}\in[q^{l},q^{l+1}]$ . With the given quantization level $L_{k}$ , $q^{l}$ can be calculated as

\begin{split}q^{l}=\frac{l(\boldsymbol{w}^{max}_{k}-\boldsymbol{w}^{min}_{k})}{a(q^{l+1}-q^{l})}+\frac{\boldsymbol{w}^{min}_{k}}{a}\end{split}

(2)

where $\boldsymbol{w}^{max}_{k}$ and $\boldsymbol{w}^{min}_{k}$ represent the maximum and minimum values of the non-zero elements $|{\boldsymbol{w}}^{[n]}_{k}|$ , respectively.
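To make the rounding rule of (1) concrete, the following is a minimal NumPy sketch. It assumes the knots are evenly spaced between the smallest and largest non-zero magnitudes, with `num_levels` intervals and the scale factor absorbed into the knot values; this spacing, the function name `stochastic_quantize`, and the handling of zero entries are assumptions of the sketch rather than the authors' implementation.

```python
import numpy as np

def stochastic_quantize(w, num_levels, rng):
    """Randomly round each |w_n| to one of its two neighbouring knots.

    The upper knot is chosen with probability proportional to the distance
    from the lower knot, so the quantizer is unbiased: E[Q(w)] = w.
    """
    mag = np.abs(w)
    nonzero = mag > 0
    if not nonzero.any():
        return np.zeros_like(w, dtype=np.float64)
    w_min, w_max = mag[nonzero].min(), mag[nonzero].max()
    if w_max == w_min:                       # a single magnitude: nothing to round
        return np.asarray(w, dtype=np.float64).copy()
    step = (w_max - w_min) / num_levels      # assumed uniform knot spacing
    idx = np.clip(np.floor((mag - w_min) / step), 0, num_levels - 1)
    lower = w_min + idx * step               # lower knot of each interval
    prob_up = (mag - lower) / step           # probability of rounding up
    round_up = rng.random(mag.shape) < prob_up
    quantized = lower + round_up * step
    return np.where(nonzero, np.sign(w) * quantized, 0.0)

rng = np.random.default_rng(0)
weights = rng.normal(size=8)
print(stochastic_quantize(weights, num_levels=16, rng=rng))
```

Because each value moves to an adjacent knot, the per-element error never exceeds one knot spacing, which is what links the quantization level to the error bound used later.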

Next, we formulate the computation and communication models of the proposed scheme. Let $f_{k}$ represent the computation frequency of local client $k$ , and let $D_{k}$ denote the data size of the local dataset. The computation time for training the diffusion model is expressed by

\displaystyle\begin{split}{T^{cmp}_{k}}=\frac{I_{k}{D_{k}}{C}}{f_{k}}\end{split}

(3)

where $I_{k}$ and $C$ denote the local iteration times in each communication round and the workload of local diffusion training, respectively. Following that, given the energy coefficient $\tau_{k}$ , the energy consumption of client $k$ is estimated by

\displaystyle\begin{split}{E^{cmp}_{k}}={\tau_{k}}{{f_{k}}^{2}{I_{k}}{D_{k}}{C}}\end{split}

(4)

In the distributed diffusion setting, each edge device uploads the local diffusion model in order to generate a better global model. Moreover, the local diffusion model is quantized as $\hat{\boldsymbol{w}}_{k}$ for efficiency improvement. To this end, we adopt a frequency division multiple access (FDMA) transmission scheme [20] for quantized local diffusion transmission. Therefore, with the transmission power $P_{k}$ , the uplink transmission rate of client $k$ is deduced by

\displaystyle\begin{split}{r_{k}}={B\log_{2}{(1+\frac{|h|^{2}d^{-\eta}P_{k}}{BN_{0}})}}\end{split}

(5)

Here, $B$ and $N_{0}$ denote the bandwidth and the noise power spectral density, respectively, while $d$ is the distance between the client and the server. Meanwhile, $h$ and $\eta$ represent the Rayleigh channel coefficient and the path-loss exponent, respectively. Subsequently, given the updated model size $M_{k}$ and quantization level $L_{k}$ , the time spent by client $k$ to transmit the local model to the server is

\displaystyle\begin{split}{T^{com}_{k}}=\frac{M_{k}L_{k}}{r_{k}}\end{split}

(6)

Thus, the corresponding energy consumption is calculated by

\displaystyle\begin{split}{E^{com}_{k}}=P_{k}{T^{com}_{k}}\end{split}

(7)
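The cost model of (3)-(7) can be evaluated numerically as in the sketch below. The default values are taken from the simulation settings reported in Section IV, but the unit conversions are assumptions of this sketch: $C$ is read as $3.25\times 10^{6}$ cycles, $N_{0}$ is converted from dBm/MHz to W/Hz, and $M_{k}=37$ M is used as a plain number. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class EdgeDevice:
    local_iters: int = 1                 # I_k
    data_size: int = 512                 # D_k
    cycles_per_sample: float = 3.25e6    # C, assumed to mean 3.25 MCycles
    tau: float = 1e-26                   # energy coefficient tau_k
    bandwidth_hz: float = 50e6           # B
    channel_gain: float = 1e-3           # |h|^2
    distance_m: float = 45.0             # d
    pathloss_exp: float = 3.76           # eta
    noise_psd: float = 10 ** (-95 / 10) * 1e-3 / 1e6  # N_0 in W/Hz (assumed conversion)
    model_size: float = 37e6             # M_k (unit assumed)

    def workload(self):
        return self.local_iters * self.data_size * self.cycles_per_sample

    def comp_time(self, freq_hz):        # Eq. (3)
        return self.workload() / freq_hz

    def comp_energy(self, freq_hz):      # Eq. (4)
        return self.tau * freq_hz ** 2 * self.workload()

    def rate(self, power_w):             # Eq. (5)
        snr = (self.channel_gain * self.distance_m ** (-self.pathloss_exp)
               * power_w / (self.bandwidth_hz * self.noise_psd))
        return self.bandwidth_hz * math.log2(1.0 + snr)

    def comm_time(self, power_w, level):  # Eq. (6), as printed: M_k * L_k / r_k
        return self.model_size * level / self.rate(power_w)

    def comm_energy(self, power_w, level):  # Eq. (7)
        return power_w * self.comm_time(power_w, level)

dev = EdgeDevice()
print(dev.comp_time(1e9), dev.comp_energy(1e9))
print(dev.comm_time(0.1, 8), dev.comm_energy(0.1, 8))
```

Raising $f_{k}$ shortens the computation time but increases the computation energy quadratically, which is the trade-off the optimization in Section III exploits.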

III Energy Efficiency Optimization

Before discussing the energy efficiency problem of the proposed method, and similar to the work in [15], we first present the following assumption and theorem. Here, $\delta_{k}$ represents the device-specific bound demanded by edge device $k$ ; a smaller $\delta_{k}$ indicates that the resources of edge device $k$ are relatively insufficient, which leads to a lower quantization level strategy:

Assumption 1.

The expectation of the squared norm of the local weight uploaded by edge devices is bounded: for any uploaded weight, ${\mathbb{E}}\|{\boldsymbol{w}}_{k}\|^{2}\leq\delta_{k}$ .

Theorem 1.

Based on Assumption 1, the squared local weight quantization error ${\Delta}_{k}$ is bounded by:

\displaystyle\begin{split}{\Delta}_{k}={\mathbb{E}}\|{\boldsymbol{w}}_{k}-\hat{\boldsymbol{w}}_{k}\|^{2}\leq\frac{\delta_{k}}{2L^{2}_{k}}.\end{split}

(8)

Through this theorem, we can easily obtain the quantization level of each heterogeneous edge device for its own demand, and thereby construct our energy consumption optimization model, which we discuss in detail in the next subsection.
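As a small illustration of how the bound (8) translates a demand into a quantization level, the sketch below inverts $\delta_{k}/(2L_{k}^{2})\leq\Delta_{k}$ for the smallest integer $L_{k}$. The function name `quantization_level` and the rounding up to an integer are assumptions made for this example.

```python
import math

def quantization_level(delta_k, max_error):
    """Smallest integer L_k such that delta_k / (2 * L_k**2) <= max_error."""
    if delta_k <= 0 or max_error <= 0:
        raise ValueError("delta_k and max_error must be positive")
    return max(1, math.ceil(math.sqrt(delta_k / (2.0 * max_error))))

# A device with a larger bound delta_k needs more levels for the same error target.
for delta in (1.0, 4.0):
    print(delta, quantization_level(delta, max_error=2 ** -13))
```

A device with a tighter error target or a larger bound therefore ends up with a higher level, and with a correspondingly larger payload to transmit.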

III-A Problem Formulation

As a consequence of variations in resource capabilities among edge devices, variability exists in the required quantization level demands. In simpler terms, each device has its distinct upper bound for quantization error. Leveraging Theorem 1, we establish the energy minimization problem within the confines of this quantization error constraint as follows:

(P1) $\min\limits_{P_{k},f_{k},L_{k}}\;(E^{cmp}_{k}+E^{com}_{k})$ (9)
subject to:
$T_{k}^{cmp}+T_{k}^{com}\leq{T_{k}^{max}},\ \forall k$ (9a)
${\mathbb{E}}\|{\boldsymbol{w}}_{k}-\hat{\boldsymbol{w}}_{k}\|^{2}\leq\frac{\delta_{k}}{2L^{2}_{k}},\ \forall k$ (9b)
$P^{min}_{k}\leq P_{k}\leq P^{max}_{k},\ \forall k$ (9c)
$f_{k}^{min}\leq f_{k}\leq f_{k}^{max},\ \forall k$ (9d)

With the unique ${\delta}_{k}$ given by the different bound requirements of local edge devices, we can always obtain the optimal solution $L^{*}_{k}=\sqrt{\frac{{\delta}_{k}}{2{\Delta}_{k}}}$ for the optimization problem. As a result, we simplify P1 as

(P2) $\min\limits_{P_{k},f_{k}}\;(E^{cmp}_{k}+E^{com}_{k})$ (10)
subject to:
$T_{k}^{cmp}+T_{k}^{com}\leq{T_{k}^{max}},\ \forall k$ (10a)
$P^{min}_{k}\leq P_{k}\leq P^{max}_{k},\ \forall k$ (10b)
$f_{k}^{min}\leq f_{k}\leq f_{k}^{max},\ \forall k$ (10c)

Following that, we transform P2 into a more tractable form by introducing two intermediate variables $\theta_{k}>0$ and $\pi_{k}>0$ , which represent the fractions of the maximum time budget $T_{k}^{max}$ that client $k$ spends on computation and communication, respectively, such that

\displaystyle\begin{split}{\theta_{k}}{T_{k}^{max}}=\frac{I_{k}{D_{k}}{C}}{f_{k}}\text{,}\\ {\pi_{k}}{T_{k}^{max}}=\frac{M_{k}\log_{2}{(L_{k})}}{r_{k}}\end{split}

(11)

Here, the lower bounds of $\theta_{k}$ and $\pi_{k}$ can be easily acquired given the optimal $L_{k}$ :

\displaystyle\begin{split}{\theta_{k}^{min}}=\frac{I_{k}{D_{k}}{C}}{f_{k}^{max}T_{k}^{max}}\text{,}\\ {\pi^{min}_{k}}=\frac{M_{k}\log_{2}{(L^{*}_{k})}}{{BT_{k}^{max}\log_{2}{(1+\frac{|h|^{2}d^{-\eta}P^{max}_{k}}{BN_{0}})}}}\end{split}

(12)

Furthermore, the total energy consumption of client $k$ during the fine-tuning process can be rewritten in the following form

\displaystyle\begin{split}{E_{k}}&=E^{cmp}_{k}+E^{com}_{k}\\ &=\frac{\tau I^{3}_{k}D^{3}_{k}C^{3}_{k}}{\theta_{k}^{2}(T_{k}^{max})^{2}}+\frac{N_{0}BT_{k}^{max}}{\lvert h\rvert^{2}d^{-\eta}}\big(2^{\frac{M_{k}\log_{2}{(L^{*}_{k})}}{\pi_{k}BT_{k}^{max}}}-1\big)\end{split}

(13)
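As a worked step, the computation term of (13) can be recovered from (3) and (4). The derivation below assumes that the computation time occupies the fraction $\theta_{k}$ of the budget, i.e. $T_{k}^{cmp}=\theta_{k}T_{k}^{max}$, a reading that is consistent with the lower bound in (12); it uses $\tau_{k}$ and $C$ as defined for (4).

```latex
\begin{aligned}
T_k^{cmp} = \theta_k T_k^{max} = \frac{I_k D_k C}{f_k}
  &\;\Longrightarrow\; f_k = \frac{I_k D_k C}{\theta_k T_k^{max}},\\
E_k^{cmp} = \tau_k f_k^{2} I_k D_k C
  &= \tau_k \left(\frac{I_k D_k C}{\theta_k T_k^{max}}\right)^{2} I_k D_k C
   = \frac{\tau_k I_k^{3} D_k^{3} C^{3}}{\theta_k^{2}\,(T_k^{max})^{2}}.
\end{aligned}
```

The same substitution with $f_{k}\leq f_{k}^{max}$ gives $\theta_{k}\geq\theta_{k}^{min}$, which is the first bound in (12).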

Thus, we convert problem P2 into the following form

(P3) $\min\limits_{\theta_{k},\pi_{k}}\;E_{k}$ (14)
subject to:
$\theta_{k}+\pi_{k}=1,\ \forall k$ (14a)
$\theta_{k}^{min}\leq\theta_{k},\ \forall k$ (14b)
$\pi_{k}^{min}\leq\pi_{k},\ \forall k$ (14c)

Through this basic form, we can readily acquire the numerical solution for the original problem. In the next subsection, we will present the solution to address the current matter.

III-B Solution

It can be easily proved that problem P3 is a convex problem, which can be effectively solved by applying the Karush-Kuhn-Tucker (KKT) conditions [21]. With the optimal energy optimization solution, we can decide the final resource allocation scheme for the federated diffusion. The Lagrange function of P3 is as follows:

\displaystyle\begin{split}{\boldsymbol{L}}(\theta_{k},\pi_{k},{\nu}_{k},\zeta_{k}^{\theta},\zeta_{k}^{\pi})=\frac{\tau I^{3}_{k}D^{3}_{k}C^{3}_{k}}{\theta_{k}^{2}(T_{k}^{max})^{2}}\\ +\frac{N_{0}BT_{k}^{max}}{\lvert h\rvert^{2}d^{-\eta}}\big(2^{\frac{M_{k}\log_{2}{(L^{*}_{k})}}{\pi_{k}BT_{k}^{max}}}-1\big)+{\nu}_{k}(\theta_{k}+\pi_{k}-1)\\ +\zeta_{k}^{\theta}(\theta_{k}^{min}-\theta_{k})+\zeta_{k}^{\pi}(\pi_{k}^{min}-\pi_{k})\end{split}

(15)

Here, ${\nu}_{k}$ is the Lagrange multiplier associated with the equality constraint (14a), while ${\zeta}_{k}^{\theta}$ and $\zeta_{k}^{\pi}$ denote the Lagrange multipliers for the inequality constraints (14b) and (14c), respectively. To reach optimality for problem P3, we derive the necessary conditions from the Lagrange function as follows:

\left\{\begin{array}{c}{\text{constraints (14a)-(14c)}}\\ -\frac{2\tau I^{3}_{k}D^{3}_{k}C^{3}_{k}}{\theta_{k}^{3}(T_{k}^{max})^{2}}+{\nu}_{k}-\zeta_{k}^{\theta}=0\\ \frac{N_{0}BT_{k}^{max}}{\lvert h\rvert^{2}d^{-\eta}}\left(2^{\frac{M_{k}\log_{2}{(L^{*}_{k})}}{\pi_{k}BT_{k}^{max}}}-1-\frac{\ln{(2)}M_{k}\log_{2}{(L^{*}_{k})}}{\pi_{k}BT_{k}^{max}}\right)+{\nu}_{k}-\zeta_{k}^{\pi}=0\\ \zeta_{k}^{\theta}(\theta_{k}^{min}-\theta_{k})-\zeta_{k}^{\pi}(\pi_{k}^{min}-\pi_{k})=0\\ \end{array}\right.

(16)

Based on (16), there exist two cases that satisfy constraint (14b) concerning variable $\theta_{k}$ . If $\theta_{k}>\theta_{k}^{min}$ , the optimal solution $\theta^{*}_{k}$ is obtained by

\displaystyle\begin{split}\theta^{*}_{k}=\sqrt[3]{\frac{2\tau I^{3}_{k}D^{3}_{k}C^{3}_{k}}{\nu_{k}(T_{k}^{max})^{2}}}\end{split}

(17)

Otherwise, we always have $\theta^{*}_{k}=\theta_{k}^{min}$ . Similarly to $\theta_{k}$ , when $\pi_{k}>\pi_{k}^{min}$ , the optimal $\pi^{*}_{k}$ is acquired the same way. Given the equality Lagrange multiplier $\nu_{k}$ , we have

\displaystyle\begin{split}\Phi(\pi^{0}_{k})=\frac{N_{0}BT_{k}^{max}}{\lvert h\rvert^{2}d^{-\eta}}\big(2^{\frac{M_{k}\log_{2}{(L^{*}_{k})}}{\pi^{0}_{k}BT_{k}^{max}}}-1-\frac{\ln{(2)}M_{k}\log_{2}{(L^{*}_{k})}}{\pi^{0}_{k}BT_{k}^{max}}\big)\\ +{\nu}_{k}=0\end{split}

(18)

where $\pi^{0}_{k}$ is the zero point of function $\Phi(\pi_{k})$ . In general, the optimal solution of $\theta_{k}$ and $\pi_{k}$ can be acquired by

\displaystyle\begin{split}\theta_{k}={\max}\{\sqrt[3]{\frac{2\tau I^{3}_{k}D^{3}_{k}C^{3}_{k}}{\nu_{k}(T_{k}^{max})^{2}}},\theta^{min}_{k}\},\pi_{k}={\max}\{\pi^{0}_{k},\pi^{min}_{k}\}\end{split}

(19)

It is worth mentioning that seeking the problem's optimal solution directly can be rather intricate, which is why we employ binary search to find the optimal Lagrange multiplier $\nu_{k}$ . Using this multiplier value, the optimal values of $\theta_{k}$ and $\pi_{k}$ are computed based on (19). To be specific, given the search range of $\nu_{k}$ and the error tolerance $\lambda$ , the optimal $\nu_{k}$ is the one that satisfies constraint (14a). More details are provided in Algorithm 1. Finally, the overall algorithm of the proposed method is shown as Algorithm 2.

Input: $\nu^{min}_{k}$ , $\nu^{max}_{k}$ , $\pi^{min}_{k}$ , $\pi^{max}_{k}$ , and $\lambda$

Output: The optimal Lagrange multiplier $\nu^{*}_{k}$

while $|\nu^{max}_{k}-\nu^{min}_{k}|>\lambda$ do
    $\nu_{k}=(\nu^{max}_{k}+\nu^{min}_{k})/2$ ;
    Calculate $\theta^{*}_{k}$ ;
    Search for $\pi^{*}_{k}$ ;
    if $\theta^{*}_{k}+\pi^{*}_{k}\leq 1$ then $\nu^{max}_{k}=\nu_{k}$ else $\nu^{min}_{k}=\nu_{k}$ ;
end while
return $\nu^{*}_{k}=\nu_{k}$

while $|\pi^{max}_{k}-\pi^{min}_{k}|>\lambda$ do
    $\pi_{k}=(\pi^{max}_{k}+\pi^{min}_{k})/2$ ;
    Calculate $\Phi(\pi_{k})$ ;
    if $\Phi(\pi_{k})>0$ then $\pi^{max}_{k}=\pi_{k}$ else $\pi^{min}_{k}=\pi_{k}$ ;
end while
return $\pi^{*}_{k}=\pi_{k}$

Algorithm 1 Binary Search
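The nested search of Algorithm 1 can be sketched in Python as below. Several things here are assumptions of the sketch and not the authors' code: $\Phi$ is obtained by differentiating the communication energy written as $P_{k}\pi_{k}T_{k}^{max}$ (Eq. (7) with transmit time $\pi_{k}T_{k}^{max}$) and adding $\nu_{k}$; the constants `c` $=N_{0}B/(|h|^{2}d^{-\eta})$ and `a` (payload bits divided by $BT_{k}^{max}$) are illustrative numbers; and all function and key names are hypothetical.

```python
import math

def theta_from_nu(nu, tau, work, t_max, theta_min):
    """Stationary point of the computation term, clipped at theta_min (cf. (19))."""
    unclipped = (2.0 * tau * work ** 3 / (nu * t_max ** 2)) ** (1.0 / 3.0)
    return max(unclipped, theta_min)

def phi(pi, nu, c, a, t_max):
    """Assumed Phi: d/dpi [c * t_max * pi * (2**(a/pi) - 1)] + nu (increasing in pi)."""
    x = a / pi
    return c * t_max * (2.0 ** x * (1.0 - math.log(2.0) * x) - 1.0) + nu

def pi_from_nu(nu, c, a, t_max, pi_min, pi_max, tol):
    """Inner bisection: root of Phi, which settles on a bound if no root lies inside."""
    lo, hi = pi_min, pi_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, nu, c, a, t_max) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def search_multiplier(p, nu_lo=1e-9, nu_hi=1e3, tol=1e-9):
    """Outer bisection on nu until theta + pi meets the budget constraint (14a)."""
    while nu_hi - nu_lo > tol:
        nu = 0.5 * (nu_lo + nu_hi)
        theta = theta_from_nu(nu, p["tau"], p["work"], p["t_max"], p["theta_min"])
        pi = pi_from_nu(nu, p["c"], p["a"], p["t_max"], p["pi_min"], p["pi_max"], tol)
        if theta + pi <= 1.0:
            nu_hi = nu      # budget not exhausted: the multiplier is too large
        else:
            nu_lo = nu
    nu = 0.5 * (nu_lo + nu_hi)
    theta = theta_from_nu(nu, p["tau"], p["work"], p["t_max"], p["theta_min"])
    pi = pi_from_nu(nu, p["c"], p["a"], p["t_max"], p["pi_min"], p["pi_max"], tol)
    return nu, theta, pi

def total_energy(theta, pi, p):
    """Computation plus communication energy under the same assumptions."""
    e_cmp = p["tau"] * p["work"] ** 3 / (theta ** 2 * p["t_max"] ** 2)
    e_com = p["c"] * p["t_max"] * pi * (2.0 ** (p["a"] / pi) - 1.0)
    return e_cmp + e_com

# Illustrative values only (work = I_k * D_k * C cycles, 15 s budget).
params = dict(tau=1e-26, work=1.664e9, t_max=15.0, c=0.026, a=0.4,
              theta_min=0.111, pi_min=0.05, pi_max=1.0)
nu_opt, theta_opt, pi_opt = search_multiplier(params)
print(nu_opt, theta_opt, pi_opt, total_energy(theta_opt, pi_opt, params))
```

Both loops halve their interval on every pass, so the number of iterations grows only logarithmically with the ratio of the search range to the tolerance $\lambda$.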

Input: pre-trained model ${\boldsymbol{w}}^{0}$ ; variance schedule $\{\beta\}$ ; iteration $I$ ; sample step $T$ ; error bound $\delta_{k}$

Output: global model ${\boldsymbol{w}}^{I}$

for $i=0$ to $I$ do
    $K\leftarrow$ Select $K$ devices from the edge device pool;
    Calculate the optimal resource allocation strategy based on Algorithm 1;
    for $k$ in $K$ parallel do
        Initialize local model ${\boldsymbol{w}}^{i}_{k}\leftarrow{\boldsymbol{w}}^{i}$ ;
        Sample a mini-batch of original images $x_{0}$ from local dataset $D_{k}$ ;
        $t\sim Uniform(\{1,...,T\})$ ;
        $\epsilon\sim\mathcal{N}(0,I)$ ;
        Diffuse $x_{0}\rightarrow x_{t}\approx\epsilon$ by: $x_{t}=\sqrt{\overline{\alpha}_{t}}x_{0}+\sqrt{1-\overline{\alpha}_{t}}\epsilon$ ;
        Take a gradient descent step by minimizing: $||\epsilon-F_{{\boldsymbol{w}}^{i}_{k}}(x_{t},t)||^{2}$ ;
        Quantize ${\boldsymbol{w}}^{i}_{k}$ based on stochastic quantization;
        Upload the quantized model $\hat{\boldsymbol{w}}^{i+1}_{k}$ to the server.
    end for
    Aggregate the uploaded models into the new global model ${\boldsymbol{w}}^{i+1}$ (e.g., FedAvg [19]);
end for

Algorithm 2 Quantized Federated Diffusion
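The local diffusion step of Algorithm 2 can be illustrated with the NumPy sketch below, which covers the forward noising $x_{0}\rightarrow x_{t}$ and the noise-prediction objective. The linear $\beta$ schedule from $10^{-4}$ to $0.02$ is an assumption (the paper only refers to a variance schedule $\{\beta\}$), the loss is averaged instead of summed, and the denoiser $F$ is replaced by a placeholder array because no network is defined here.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def diffuse(x0, t, alpha_bar, rng):
    """x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with t in 1..T."""
    eps = rng.standard_normal(x0.shape)
    a_t = alpha_bar[t - 1]
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
    return x_t, eps

def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between the true and predicted noise."""
    return float(np.mean((eps - eps_pred) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(1000)
x0 = rng.uniform(-1.0, 1.0, size=(4, 3, 32, 32))    # hypothetical mini-batch of images
t = int(rng.integers(1, 1001))                       # t ~ Uniform({1, ..., T})
x_t, eps = diffuse(x0, t, alpha_bar, rng)
print(t, noise_prediction_loss(eps, np.zeros_like(eps)))  # zeros stand in for F(x_t, t)
```

In a real client, the placeholder prediction would come from the local denoising network, and the resulting weights would then be quantized and uploaded.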

IV Numerical Results

IV-A Simulation Settings

To simulate the practical case of federated diffusion in mobile edge networks, we fine-tuned a DDPM [17] pre-trained on CIFAR10 [22] using the GTSRB [23] dataset. The dataset is divided into 10 subsets for 10 edge devices to perform federated learning. We fine-tuned the federated model for 1000 epochs for performance evaluations. The number of sample steps is set to 1000 for image generation. For computation and communication hyper-parameters { ${I_{k}},{D_{k}},{C},{f_{k}^{max}},{\tau_{k}}$ } and { $B,{|h|^{2}},d,{\eta},{N_{0}},{M_{k}}$ }, the default settings are { $1,512,3.25$ MCycles $,10^{9},10^{-26}$ } and { $50$ MHz $,0.001$ W $,45$ m $,3.76,-95$ dBm/MHz $,37$ M}.

IV-B Performance Evaluations

Fig. 2 illustrates the performance and energy consumption of the proposed algorithm. We employ the Fréchet Inception Distance (FID) [24] to assess the quality of the images generated by the model. A lower FID value indicates a higher degree of similarity between the distribution of the generated dataset and that of the original dataset. To make the evaluation more precise, we generate as many images as there are original images. Our method takes into consideration distinct quantization error constraints customized for heterogeneous edge devices and then solves the energy minimization problem. The quantization levels range from 6 bits to 8 bits, a range normally associated with a substantial reduction in energy consumption while maintaining good performance. We compare against the baseline method, FedAvg, and against fixed quantization methods with 6-bit, 7-bit, and 8-bit quantization levels. Our results show that our approach surpasses the more economical 8-bit quantization scheme in terms of both performance and cost-efficiency. Note that the compared methods do not specifically optimize energy consumption; by default they use $50\%$ of the time budget for computation and the other $50\%$ for communication.
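For reference, the FID between real and generated images is commonly computed from Gaussian statistics of Inception features. In the expression below, $\mu_{r},\Sigma_{r}$ and $\mu_{g},\Sigma_{g}$ denote the feature mean and covariance of the real and generated sets; these symbols are introduced here for illustration and do not appear elsewhere in the paper.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2}
  + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

The distance is zero when the two feature distributions share the same mean and covariance, which is why lower values indicate generated images closer to the originals.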

Fig. 3 illustrates the convergence of the proposed binary search algorithm on the energy optimization problem. Specifically, for various quantization level requirements, the optimal solution minimizing the energy cost is found after about 20 search iterations. Furthermore, we study the impact of two hyperparameters, the time budget and the communication distance. The proposed method converges well under the different settings. As the time budget decreases, the energy consumed by the system increases. Likewise, as the communication distance between the edge devices and the central server grows, the energy expenditure increases. Fig. 4 compares our method with other approaches across various time budgets. As the time budget increases from 13s to 18s, the system's energy cost decreases, and our solution consistently outperforms the baseline approach. This indicates that our method can adapt to parameter settings within certain ranges.

V Conclusion and Future Work

In this paper, we first design a dynamic quantized federated diffusion training considering each edge device’s demand. Subsequently, our study turns towards addressing the challenge of energy efficiency, taking into account the unique constraint imposed by quantization demand. Our simulation results demonstrate that our proposed method outperforms both the baseline federated diffusion approach and fixed quantized federated diffusion in substantially reducing system energy consumption and transmitted model size. Remarkably, this reduction is achieved without compromising the reasonable quality and diversity of the generated data, underscoring the effectiveness of our approach.

To achieve the benefits of efficient federated generative diffusion, several open and challenging issues remain. For distributed diffusion models, efficient sampling remains an unsolved problem, primarily due to the distinctive characteristics inherent to diffusion itself. Unlike conventional AI models, the inference phase of diffusion entails a substantial energy outlay, particularly in the denoising sampling steps. This heightened energy consumption may be unmanageable for certain edge devices. Consequently, further study is needed to improve sampling efficiency for diffusion, particularly in distributed edge intelligence scenarios.

References

[1] B. Mao, Y. Liu, J. Liu, and N. Kato, “AI-assisted edge caching for metaverse of connected and automated vehicles: Proposal, challenges, and future perspectives,” IEEE Vehicular Technology Magazine, vol. 18, no. 4, pp. 66–74, 2023.
[2] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
[3] S. Choi, J. Kim, and H. Yeo, “Trajgail: Generating urban vehicle trajectories using generative adversarial imitation learning,” Transportation Research Part C: Emerging Technologies, vol. 128, p. 103091, 2021.
[4] G. Cooper, “Examining science education in chatgpt: An exploratory study of generative artificial intelligence,” Journal of Science Education and Technology, vol. 32, no. 3, pp. 444–452, 2023.
[5] J. Kang, J. Wen, D. Ye, B. Lai, T. Wu, Z. Xiong, J. Nie, D. Niyato, Y. Zhang, and S. Xie, “Blockchain-empowered federated learning for healthcare metaverses: User-centric incentive mechanism with optimal data freshness,” IEEE Transactions on Cognitive Communications and Networking, 2023.
[6] B. Mao, X. Zhou, J. Liu, and N. Kato, “Digital twin satellite networks towards 6G: Motivations, challenges, and future perspectives,” IEEE Network, pp. 1–1, 2023.
[7] J. Kang, J. He, H. Du, Z. Xiong, Z. Yang, X. Huang, and S. Xie, “Adversarial attacks and defenses for semantic communication in vehicular metaverses,” IEEE Wireless Communications, vol. 30, no. 4, pp. 48–55, 2023.
[8] X. Luo, J. Wen, J. Kang, J. Nie, Z. Xiong, Y. Zhang, Z. Yang, and S. Xie, “Privacy attacks and defenses for digital twin migrations in vehicular metaverses,” IEEE Network, 2023.
[9] F. V. S. Jothiraj and A. Mashhadi, “Phoenix: A federated generative diffusion model,” arXiv preprint arXiv:2306.04098, 2023.
[10] M. de Goede, “Training diffusion models with federated learning: A communication-efficient model for cross-silo federated image generation,” 2023.
[11] B. Mao, J. Liu, Y. Wu, and N. Kato, “Security and privacy on 6g network edge: A survey,” IEEE Communications Surveys & Tutorials, vol. 25, no. 2, pp. 1095–1127, 2023.
[12] X. Huang, P. Li, H. Du, J. Kang, D. Niyato, D. I. Kim, and Y. Wu, “Federated learning-empowered AI-generated content in wireless networks,” arXiv preprint arXiv:2307.07146, 2023.
[13] J. Wen, J. Kang, M. Xu, H. Du, Z. Xiong, Y. Zhang, and D. Niyato, “Freshness-aware incentive mechanism for mobile AI-generated content (aigc) networks,” in 2023 IEEE/CIC International Conference on Communications in China (ICCC), pp. 1–6, 2023.
[14] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, “Energy efficient federated learning over wireless communication networks,” IEEE Transactions on Wireless Communications, vol. 20, no. 3, pp. 1935–1949, 2020.
[15] P. Li, G. Cheng, X. Huang, J. Kang, R. Yu, Y. Wu, M. Pan, and D. Niyato, “Snowball: Energy efficient and accurate federated learning with coarse-to-fine compression over heterogeneous wireless edge devices,” IEEE Transactions on Wireless Communications, 2023.
[16] X. Meng and Y. Kabashima, “Quantized compressed sensing with score-based generative models,” in The Eleventh International Conference on Learning Representations, 2022.
[17] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
[18] R. Chen, L. Li, K. Xue, C. Zhang, M. Pan, and Y. Fang, “Energy efficient federated learning over heterogeneous mobile devices via joint design of weight quantization and wireless transmission,” IEEE Transactions on Mobile Computing, 2022.
[19] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics, pp. 1273–1282, PMLR, 2017.
[20] H. G. Myung, J. Lim, and D. J. Goodman, “Single carrier fdma for uplink wireless transmission,” IEEE vehicular technology magazine, vol. 1, no. 3, pp. 30–38, 2006.
[21] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
[22] A. Krizhevsky, G. Hinton, et al., “Learning multiple layers of features from tiny images,” 2009.
[23] S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, “Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark,” in International Joint Conference on Neural Networks, no. 1288, 2013.
[24] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10684–10695, 2022.