A Green Multi-Attribute Client Selection for Over-The-Air Federated Learning: A Grey-Wolf-Optimizer Approach

Maryam Ben Driss, Department of Computer Science, University of Quebec at Montreal, Montreal, H2L 2C4, Canada
Essaid Sabir, Department of Science and Technology, TÉLUQ, University of Quebec, Montreal, H2S 3L4, Canada
Halima Elbiaze, Department of Computer Science, University of Quebec at Montreal, Montreal, H2L 2C4, Canada
Abdoulaye Baniré Diallo, Department of Computer Science, University of Quebec at Montreal, Montreal, H2L 2C4, Canada
Mohamed Sadik, NEST Research Group, LRI Lab, ENSEM, Hassan II University of Casablanca, Casablanca, Morocco
(2024)
Abstract.

Federated Learning (FL) has gained attention across various industries for its capability to train machine learning models without centralizing sensitive data. While this approach offers significant benefits such as privacy preservation and decreased communication overhead, it presents several challenges, including deployment complexity and interoperability issues, particularly in heterogeneous scenarios or resource-constrained environments. Over-the-air (OTA) FL was introduced to tackle these challenges by disseminating model updates without necessitating direct device-to-device connections or centralized servers. However, OTA-FL brought forth limitations associated with heightened energy consumption and network latency. In this paper, we propose a multi-attribute client selection framework employing the grey wolf optimizer (GWO) to strategically control the number of participants in each round and optimize the OTA-FL process while considering accuracy, energy, delay, reliability, and fairness constraints of participating devices. We evaluate the performance of our multi-attribute client selection approach in terms of model loss minimization, convergence time reduction, and energy efficiency. In our experimental evaluation, we assessed and compared the performance of our approach against the existing state-of-the-art methods. Our results demonstrate that the proposed GWO-based client selection outperforms these baselines across various metrics. Specifically, our approach achieves a notable reduction in model loss, accelerates convergence time, and enhances energy efficiency while maintaining high fairness and reliability indicators.

Over-The-Air Federated Learning; Client Selection; Grey Wolf Optimizer; Convergence Speed; Energy Efficiency; Reliability; Fairness.
CCS Concepts: Computing methodologies → Machine learning.

1. Introduction

Artificial intelligence (AI) can transform many aspects of human society. With applications spanning healthcare, education, finance, transportation, and beyond, AI’s capacity to analyze extensive datasets, predict outcomes, and automate tasks stands poised to enhance efficiency, accuracy, and the overall quality of life. However, traditional machine learning (ML) in massive and sensitive environments faces several challenges caused by the nature of large-scale datasets, distributed data sources, and their constraints such as data privacy, limited resources, and network heterogeneity. To address these issues, federated learning (FL) is a promising approach for training ML models in which devices collaborate to create and improve a shared model while preserving users’ privacy and reducing communication overhead (Driss et al., 2023; Luzón et al., 2024). Instead of sending raw data to a central server for aggregation, each device maintains its dataset, trains a local model, and sends model updates or gradients to the server, which aggregates these updates and then sends the refined model back to the individual devices. This process iterates until the global model reaches the desired accuracy. Over-the-air federated learning (OTA-FL) (Xiao et al., 2024) is a specific implementation of FL that uses wireless communication channels for transmitting model updates, as illustrated in Fig. 1, which greatly reduces the cost of communicating model updates from the edge devices.

Implementing OTA-FL in heterogeneous scenarios, where clients have different data distributions, limited bandwidth, and less reliable network conditions, faces several challenges including limited computing capabilities, data quality, and fairness between FL agents. Thus, the client selection step is crucial, and the set of participants in each training round is a key factor in addressing these challenges and enhancing the learning process (Khan et al., 2020; Azimi-Abarghouyi and Fodor, 2024). By strategically choosing clients based on their data quality and computational capabilities, the FL system can effectively navigate communication constraints, privacy concerns, and other challenges. Additionally, the client selection process is essential for ensuring the efficiency of model update distribution and the quality of the aggregated global model (Mayhoub and M. Shami, 2024). It ensures that the devices involved make valuable contributions to the shared learning, improving the overall effectiveness of FL algorithms. By carefully selecting which devices participate in the collaborative learning process, we can maximize the impact of each contribution, leading to better model performance and more reliable results.

Figure 1. Over-The-Air (OTA) federated learning process.

In this paper, we introduce an optimization problem focused on crafting a multi-attribute client selection framework using the grey wolf algorithm. Our goal is to consider various criteria, including model accuracy, communication cost, resource reliability, and fairness among FL clients. This framework aims to strategically select clients based on these attributes, optimizing the overall performance and efficiency of the FL process. The remainder of this paper is organized as follows: Section II presents background and reviews previous research employing various techniques and criteria for client selection, outlining our novel contribution. Section III provides the system model and defines the objective function. Section IV elaborates on the GWO mechanism for client selection in the learning process. Section V presents experimental results and insights gleaned from our study. Finally, Section VI concludes the paper by summarizing key findings and suggesting avenues for future research.

2. Background

2.1. Federated Learning

FL is designed to train models across a network of decentralized devices while keeping data private. Individual devices collaboratively create a shared model without the need to share their raw data with a central server (see Fig.1). The process involves the following steps:

  1. Initialization: A central server generates an initial global model for the selected devices.

  2. Local training: Each device trains a local model using its dataset, which is not transmitted to the central server.

  3. Local model update: Each device produces a model update or gradient based on the differences between its local and global models.

  4. Aggregation: The central server collects the updates from all participants and refines the global model through aggregation (e.g., averaging).

  5. Global model update: The refined global model is then sent back to all participating devices.

  6. Iteration: The process of local training, local model update, aggregation, and global model update is repeated over multiple iterations. As each iteration progresses, the overall model’s performance improves.
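The six steps above can be sketched as a toy simulation. This is only an illustrative sketch, not the training setup used in this paper: it assumes a one-parameter linear model, plain gradient descent as the local trainer, and sample-weighted averaging of local weights (FedAvg-style) as the aggregation rule. The names `make_client`, `local_update`, and `federated_round` are hypothetical.

```python
import random

def make_client(n_samples, rng):
    """Hypothetical client dataset: noiseless samples of y = 3x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    return [(x, 3.0 * x) for x in xs]

def local_update(global_w, data, lr=0.1, epochs=5):
    """Step 2-3: start from the global model and run a few local
    gradient-descent epochs on the squared error of y = w * x."""
    w = global_w
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, client_datasets):
    """Steps 2-5: local training on every client, then aggregation by
    averaging the local models weighted by local dataset size."""
    total = sum(len(d) for d in client_datasets)
    local_ws = [local_update(global_w, d) for d in client_datasets]
    return sum(len(d) * w for d, w in zip(client_datasets, local_ws)) / total

rng = random.Random(0)
clients = [make_client(n, rng) for n in (20, 50, 30)]

w = 0.0                      # step 1: initialization
for _ in range(50):          # step 6: iteration
    w = federated_round(w, clients)
print(f"learned weight: {w:.4f}")  # approaches the true slope of 3
```

Raw samples never leave `local_update`; only the scalar model is exchanged, which is the privacy argument made in the text.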

OTA-FL is a promising concept that allows clients to share the same spectral resources by transmitting their local model updates and aggregating these models over the air in a "one-time" manner. This enables the efficient sharing of model updates over the air without relying on traditional wired networks (Zhu et al., 2024). The benefits of OTA-FL are:

  • Efficiency: OTA-FL can be more efficient in wireless environments, where wired connections might not be available or practical.

  • Scalability: It allows for the scalability of FL to a large number of devices, especially in settings where devices are mobile or have limited connectivity.
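To make the "one-time" aggregation concrete, the following sketch models the wireless channel as an adder: all clients transmit at once, and the server receives the element-wise sum of their updates plus noise. It is a strongly idealized illustration that assumes real-valued analog transmission, perfectly compensated channel gains, and additive Gaussian noise. The function `ota_aggregate` and its parameters are hypothetical and not taken from this paper.

```python
import random

def ota_aggregate(updates, noise_std=0.01, rng=None):
    """Idealized over-the-air aggregation (assumption: unit channel gains).

    The channel superposes the simultaneously transmitted updates, so the
    server observes their element-wise sum plus Gaussian receiver noise,
    then rescales the sum to an average.
    """
    rng = rng or random.Random()
    k = len(updates)
    dim = len(updates[0])
    received = [
        sum(u[d] for u in updates) + rng.gauss(0.0, noise_std)
        for d in range(dim)
    ]
    return [r / k for r in received]

# Two clients, two model parameters, noiseless channel for illustration.
avg = ota_aggregate([[1.0, 2.0], [3.0, 4.0]], noise_std=0.0)
print(avg)
```

In this model the server uses the channel once per round regardless of the number of clients, which is the source of the efficiency and scalability benefits listed above.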

While FL has significant advantages, it presents several challenges including communication overhead, resource constraints, and deployment complexity (Wen et al., 2023). Therefore, client selection has been introduced as a strategy to limit the number of communicating parties at every process step.

2.2. Client selection

A client selection strategy determines which devices participate in each round of model training and contribute their updates to the central server, with the aim of improving FL quality and balancing the need for privacy, data diversity, and model performance. However, randomly sampling clients in each training round may not fully exploit the local updates from heterogeneous clients, resulting in lower model accuracy, slower convergence rate, and degraded fairness (Fu et al., 2023; Smestad and Li, 2023). Thus, client selection must consider various constraints for the following reasons:

  • Resource Constraints: Mobile devices participating in OTA-FL often have limited computational resources, battery life, and bandwidth. Client selection strategies need to take account of these constraints to minimize energy consumption, reduce latency, and optimize resource utilization.

  • Dynamic network conditions: OTA-FL operates in dynamic wireless environments where network conditions can vary significantly. Client selection algorithms must adapt to changes in connectivity, signal strength, and device availability to maintain efficient and reliable communication.

  • Scalability and distributed computation: As OTA-FL deployments scale to accommodate numerous devices and data sources, the client selection framework must be scalable and capable of distributed computation to handle the computational and communication overhead.

2.3. Grey Wolf Optimizer

The GWO is a metaheuristic optimization algorithm inspired by the social behavior of grey wolves in nature. It was introduced in 2014 (Mirjalili et al., 2014). The algorithm simulates the grey wolves’ leadership hierarchy and hunting mechanisms to solve optimization problems. The GWO models the social hierarchy of a wolf pack with alpha, beta, delta, and omega wolves representing different solutions: the alpha, beta, and delta wolves are the three best solutions and guide the search, while the remaining omega wolves update their positions relative to these leaders, which helps maintain diversity in the pack. Following three leaders rather than a single best solution mimics real wolf pack cooperation and aids in escaping local optima. Exploration and exploitation are balanced through the coefficient vectors of the movement equations. Positions of wolves are iteratively updated based on movement equations inspired by hunting and social behaviors, contributing to the algorithm’s convergence towards the global optimum in the search space. The basic steps of the GWO can be summarized as follows:

  1. Initialization: Initialize a population of wolves, representing potential solutions to the optimization problem.

  2. Objective function: Evaluate the objective function for each wolf in the population.

  3. Update positions: Update the positions of wolves based on the movement equations inspired by wolf behavior.

  4. Boundary handling: Ensure that the updated positions of wolves remain within the defined search space.

  5. Selection: Update the alpha, beta, and delta wolves based on their fitness values.

  6. Iteration: Repeat the process until a stopping criterion is met, such as a maximum number of iterations or a desired level of convergence.
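The steps above can be sketched as a compact continuous GWO. The sketch follows the standard update rule of (Mirjalili et al., 2014), with coefficients A = 2a·r1 − a and C = 2·r2, where a decreases linearly from 2 towards 0. It is a minimal illustration on a toy objective, not the client selection procedure of this paper. As a simplifying assumption, the three leaders are re-selected from the current pack at every iteration, and the names `gwo` and `sphere` are hypothetical.

```python
import random

def gwo(objective, dim, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` over the box [lo, hi]^dim with a basic GWO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: random initial pack.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(n_iter):
        # Steps 2 and 5: evaluate the pack and select the three leaders.
        ranked = sorted(wolves, key=objective)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        if objective(alpha) < best_val:
            best_pos, best_val = alpha, objective(alpha)
        a = 2.0 * (1.0 - t / n_iter)  # shrinks from 2 towards 0
        # Step 3: move every wolf towards the leaders.
        for w in wolves:
            for d in range(dim):
                candidates = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a   # |A| > 1 favors exploration
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - w[d])
                    candidates.append(leader[d] - A * dist)
                # Step 4: clip the averaged position to the search space.
                w[d] = min(hi, max(lo, sum(candidates) / 3.0))
    # Account for the pack produced by the last update.
    final = min(wolves, key=objective)
    if objective(final) < best_val:
        best_pos, best_val = list(final), objective(final)
    return best_pos, best_val

def sphere(x):
    """Toy objective with its global minimum at the origin."""
    return sum(v * v for v in x)

best_pos, best_val = gwo(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

Applying this scheme to client selection would additionally require mapping continuous wolf positions to a discrete set of selected clients, which this sketch does not attempt.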

The GWO has been applied to various optimization problems in engineering, economics, and other fields (Faris et al., 2018; Makhadmeh et al., 2023) due to its benefits, such as:

  • Global Search Capability: GWO exhibits a robust global search capability, making it well-suited for optimization tasks where finding a global optimum is crucial. This characteristic is particularly advantageous in complex problem spaces with multiple peaks.

  • Fast Convergence Rate: The algorithm is known for its fast convergence, allowing it to reach near-optimal solutions quickly. This is advantageous in scenarios where computational resources are limited, and rapid decision-making is required.

  • Ease of Implementation: GWO’s simplicity facilitates straightforward implementation, making it accessible to practitioners with varying levels of expertise. The ease of implementation expedites the integration of GWO into diverse applications.

3. Related Work and Our Contribution

Recent advances in FL have focused on various aspects, including communication efficiency, privacy preservation, and robustness to adversarial attacks. However, the challenge of optimal client selection remains underexplored. Effective client selection is crucial for fast convergence, accurate models, fairness, and efficient communication. This section presents a literature review focused on optimizing client selection through various methods and highlights our contribution to this area.

Ref | Year | Accuracy | Energy | Delay | Reliability | Fairness | Model | Implementation
(AbdulRahman et al., 2020) | 2020 | | | | | | Dynamic programming | Traditional FL
(Ruan et al., 2021) | 2021 | | | | | | Dynamic programming | Traditional FL
(Zheng et al., 2021) | 2021 | | | | | | Dynamic programming | Traditional FL
(Zhang et al., 2023) | 2023 | | | | | | Dynamic programming | Traditional FL
(Huang et al., 2022) | 2022 | | | | | | Multi-armed bandit | Traditional FL
(Qu et al., 2022) | 2022 | | | | | | Multi-armed bandit | Traditional FL
(Zhu et al., 2022) | 2023 | | | | | | Multi-armed bandit | Traditional FL
(Huang et al., 2020) | 2023 | | | | | | Multi-armed bandit | Traditional FL
(Shi et al., 2023) | 2023 | | | | | | Multi-armed bandit | Traditional FL
(Kang and Ahn, 2023) | 2023 | | | | | | Genetic algorithm | Traditional FL
(Chahoud et al., 2023) | 2023 | | | | | | Genetic algorithm | Traditional FL
Our article | 2024 | | | | | | Grey wolf optimizer | Over-the-air FL
Table 1. Related existing works on heuristic algorithm-based client selection.

3.1. Random Selection

This client selection method randomly selects a subset of clients to participate in the FL process. Its implementation is simple, but it may lead to uneven data distribution and performance. The work in (Nishio and Yonetani, 2019) mitigates this problem and performs FL while actively managing clients based on their resource conditions: the randomly selected clients are asked to send their resource information, which is then used to determine which of them go on to complete the FL process. However, this approach presents several challenges, such as building and maintaining client trust and ensuring high data quality.

3.2. Learning-based Selection

Some papers implement client selection using ML techniques, where a central model predicts which clients provide high-quality updates. For instance, reinforcement learning is deployed to improve client selection performance through an agent that learns a client selection policy (Cheng et al., 2023). The authors in (Mohamed et al., 2024) introduced a clustering-based client selection framework that decreases the communication costs of training FL models by reducing both the number of training devices in every round and the number of rounds required to reach convergence. Another learning-based approach is proposed in (Zou et al., 2024): a comprehensive framework for client selection in FL based on the concept of value-of-information (VoI), which measures how valuable a client is for the global model aggregation. Its VoI estimator uses reinforcement learning to learn the relationship between VoI and various heterogeneous factors of clients. The authors of (Wang et al., 2020) designed a framework that intelligently chooses the client devices to participate in each round to counterbalance the bias introduced by non-IID data and to speed up convergence. A client sampling method was proposed in (Wu et al., 2023) to select relevant clients and mitigate the impact of low-quality data on the training process. Although learning-based selection allows for adaptive client selection strategies, it is computationally intensive, requires additional training, and may be sensitive to the quality of the initial model.

3.3. Heuristic Algorithm-based Selection

Some methods formulate the client selection strategy as a mathematical optimization problem. Then, clients are selected using mathematical methods such as the dynamic programming model in (Zheng et al., 2021), where the authors proposed a framework to balance the trade-off between the energy consumption of the edge clients and the learning accuracy of FL. The authors in (Abouzahir et al., 2023) proposed a predictive quality of service paradigm that allows devices to self-adjust their power allocation to maintain reliability and latency within the tolerated range of the URLLC application. In (Zhang et al., 2023), the authors proposed a delay-constrained client selection framework for heterogeneous FL in intelligent transportation systems to improve model performance in terms of accuracy, training time, and transmission time. The multi-armed bandit (MAB) model is used in (Huang et al., 2022) for hierarchical FL in MEC networks, estimating the participation probability of each client from the following information: wireless channel state, local computing resources, and previous performance. The authors of (Qu et al., 2022) also formulated client selection as an MAB problem to design a selection framework where the network operator learns the number of successful participating clients to improve the training performance under the limited budget of each edge server. Contextual combinatorial MAB is used in (Shi et al., 2023) to formulate a client selection problem that boosts volatile FL by speeding up model convergence, promoting model accuracy, and reducing energy consumption. The authors in (Zhu et al., 2022) leveraged the MAB framework and the virtual queue technique in Lyapunov optimization to conduct client selection with a fairness guarantee in the asynchronous FL framework. In (Huang et al., 2020), it was found that fairness criteria play a critical role in the FL training process: a fairer client selection strategy can lead to higher final accuracy, though it may come at the cost of some training efficiency. The authors of (Kang and Ahn, 2023) proposed a client selection method using a genetic algorithm, which enables faster central model training at a lower cost based on the client’s cost and the result of its local update. A dynamic and multicriteria scheme for client selection is developed in (Chahoud et al., 2023) to offer more volume and heterogeneity of data in the FL process using a genetic algorithm.

3.4. Our contributions

3.4.1. Multi-attribute client selection:

Based on related works (see Table 1), certain selection methods choose the clients with the best performance or the most resources. As a result, clients with low resource capacity are unable to participate in the training process, and their datasets are ignored. This leads to biased and unfair selection, which ultimately results in an underfitting of the learned global model for those low-level clients. Moreover, in some proposed methods, clients train their local models but the server then does not aggregate them, which wastes client energy. While existing works have primarily focused on accuracy and cost criteria for client selection, it is imperative to take into account other attributes such as reliability, fairness, privacy preservation, and energy efficiency. By incorporating these additional dimensions, we can foster more equitable participation, protect user privacy, and optimize resource utilization in OTA-FL systems, enhancing their overall performance and global model generalization.

3.4.2. Integration of GWO with client selection:

Given the scalability and efficiency requirements of OTA-FL, the GWO holds significant promise for optimizing client selection strategies. By leveraging GWO’s global search capability and fast convergence rate, we can design client selection algorithms that effectively balance exploration and exploitation. Additionally, GWO’s ease of implementation makes it well-suited for deployment in distributed environments with resource-constrained devices. Furthermore, GWO’s ability to maintain diversity in the population of wolves can address the challenge of heterogeneity among FL clients. By ensuring that the client selection process considers diverse attributes and characteristics of participating devices, we can enhance the performance and robustness of OTA-FL systems. As shown in Table 1, GWO has not been applied to optimize client selection to enhance FL model’s accuracy, cost, energy efficiency, reliability, and fairness. This notable absence of GWO-based approaches in existing literature underscores a significant research gap and provides compelling motivation to explore its potential in this context. Our contributions are summarized below:

  • Offering a multi-attribute client selection framework that follows the "select then train" method. It balances accuracy with energy, delay, reliability, and fairness criteria to tackle OTA-FL challenges such as security risks, limited computational capability, and unstable networks.

  • Adopting the grey wolf algorithm to choose the set of eligible clients to join the learning process.

  • Evaluating the proposed approach and analyzing the FL model performance in terms of accuracy, convergence time, and energy efficiency.

4. Multi-Attribute Client Selection

We consider an FL framework consisting of a single base station and $n$ clients $N=\{1,2,\cdots,n\}$. Each client $i$ possesses local data, denoted as $D_i$. For each communication round, the server aims to learn a global model with the data $D_i$ distributed across the selected clients.

To model the FL problem, we define the weight vector $w$ to capture the parameters of the global model. We introduce the loss function $l(w,x_j,y_j)$, which captures the FL performance over input vector $x_j$ and output $y_j$ for each $D_i$. Since we consider a classification problem, the categorical cross-entropy is used as the loss function. The total loss function of client $i$ is written as (Yang et al., 2020):

$$F_i(w)=\frac{1}{D_i}\sum_{j=1}^{D_i} l(w,x_j,y_j). \tag{1}$$

The FL training problem can be formulated as follows:

$$\min F(w)=\sum_{i=1}^{n}\frac{D_i}{D}F_i(w), \tag{2}$$

where $D=\sum_{i=1}^{n}D_i$ is the total number of data samples of all clients.
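As a small numerical illustration of Eqs. (1) and (2), the sketch below computes each client's local loss as the mean of its per-sample losses and the global objective as the data-size-weighted sum of local losses. The per-sample loss values are made up for illustration, and the function names `local_loss` and `global_loss` are hypothetical.

```python
def local_loss(per_sample_losses):
    """Eq. (1): F_i(w) is the mean per-sample loss on client i."""
    return sum(per_sample_losses) / len(per_sample_losses)

def global_loss(clients_losses):
    """Eq. (2): F(w) = sum_i (D_i / D) * F_i(w)."""
    total = sum(len(losses) for losses in clients_losses)
    return sum(len(losses) / total * local_loss(losses)
               for losses in clients_losses)

# Illustrative per-sample losses for two clients with D_1 = 2 and D_2 = 1.
clients_losses = [[0.0, 2.0], [4.0]]
print(local_loss(clients_losses[0]))   # 1.0
print(global_loss(clients_losses))     # (2/3) * 1.0 + (1/3) * 4.0
```

Because of the $D_i/D$ weights, the global objective coincides with the average loss over the pooled samples of all clients, so clients with more data have proportionally more influence.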

4.1. Delay

To implement FL over wireless networks, wireless devices must train the model locally and transmit their results over wireless links. However, this computation and transmission introduce a delay that impacts the overall FL performance. Therefore, it is crucial to optimize the delay for efficient FL implementation.

4.1.1. Computation Delay

The computation delay is determined by the type of learning model and the desired learning accuracy $\epsilon_i$. The computation time needed for processing at user $i$ is (Yang et al., 2020):

$$\tau_i^c=\frac{C_i D_i}{f_i}\,\upsilon_i\log_2\left(\frac{1}{\epsilon_i}\right), \tag{3}$$

where $\upsilon_i\log_2(1/\epsilon_i)$ is the number of local iterations required for client $i$ to reach the desired accuracy $\epsilon_i$, $C_i$ (cycles/bit) is the number of CPU cycles required for computing one sample data at user $i$, and $f_i$ is the computation capacity of user $i$, which is measured by the number of CPU cycles per second.
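Eq. (3) can be evaluated directly, as in the sketch below. The numerical values are illustrative assumptions rather than parameters from this paper, and the function name `computation_delay` is hypothetical.

```python
import math

def computation_delay(C, D, f, upsilon, eps):
    """Eq. (3): tau_c = (C * D / f) * upsilon * log2(1 / eps).

    C: CPU cycles needed per unit of data, D: local data size,
    f: CPU cycles per second, upsilon: iteration constant,
    eps: desired local accuracy (0 < eps < 1).
    """
    local_iterations = upsilon * math.log2(1.0 / eps)
    return C * D / f * local_iterations

# Illustrative values: 1e4 cycles per data unit, 500 data units,
# a 1 GHz CPU, upsilon = 2 and eps = 0.25, giving 4 local iterations.
tau_c = computation_delay(C=1e4, D=500, f=1e9, upsilon=2, eps=0.25)
print(f"{tau_c:.3f} s")  # 0.020 s
```

The delay grows linearly with the data size and only logarithmically with the inverse of the target accuracy, while a faster CPU (larger `f`) reduces it proportionally.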

Table 2. Main notations used in this paper.
Notation | Meaning
$n$ | Number of clients
$D_i$ | Data samples collected by client $i$
$D$ | Total data samples of all users
$w$ | Global model parameter vector
$x_j$ | Input vector for each data sample $j$
$y_j$ | Output vector for each data sample $j$
$l(w,x_j,y_j)$ | Total loss function for client $i$
$F_i(w)$ | Local objective function
$F(w)$ | Global objective function
$\tau_i^c$ | Computation delay
$\tau_i^t$ | Transmission delay
$\tau_i$ | Delay requirement
$e_i^c$ | Computation energy
$e_i^t$ | Transmission energy
$e_i$ | Energy consumption requirement
$\epsilon_i$ | The desired learning accuracy
$C_i$ | Computation capacity required (CPU cycles per bit)
$f_i$ | Computation capacity of client $i$ (CPU cycles per second)
$r_i$ | Transmission rate
$b_i$ | Bandwidth allocated to user $i$
$g_i$ | Channel gain between user $i$ and the BS
$p_i$ | Transmit power of user $i$
$N_0$ | Power spectral density of the Gaussian noise
$M(w)$ | FL model size
$T$ | Total number of communication rounds
$\zeta_i$ | Energy consumption factor of client $i$
$MTBF_i$ | Mean time between failures
$\rho_i(t)$ | Reliability of client $i$
$\rho$ | Reliability requirement
$m_i$ | Number of failures
$c_i$ | Required minimum selection fraction for client $i$
$\mathbf{X}_p$ | Position of the prey
$\mathbf{X}$ | Position of the wolf
$\mathbf{A}$ and $\mathbf{C}$ | GWO coefficient vectors
$\mathbf{d}$ | Distance between the wolf and the prey
$num\_clients$ | Number of selected clients

4.1.2. Transmission Delay

After local computation, all users upload their local FL parameters to the server. The quality of the wireless channel is the primary factor that determines the transmission rate in each round, which is given by:

$$r_i=b_i\log_2\left(1+\frac{g_i p_i}{N_0 b_i}\right), \tag{4}$$

where $b_i$ represents the bandwidth allocated to user $i$, $p_i$ is the transmit power of user $i$, $g_i$ is the channel gain between user $i$ and the BS, and $N_0$ is the power spectral density of the Gaussian noise.

The model size, denoted $M(w)$, determines the transmission time between the client and the server. The model transmission time is calculated using the following formula:

(5) τit=M(w)ri.superscriptsubscript𝜏𝑖𝑡𝑀𝑤subscript𝑟𝑖\tau_{i}^{t}=\frac{M(w)}{r_{i}}.italic_τ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT = divide start_ARG italic_M ( italic_w ) end_ARG start_ARG italic_r start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG .
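As an illustration, Eqs. (4) and (5) can be evaluated with the following minimal Python sketch. The function names, the units, and the numerical values are illustrative assumptions and are not taken from our experimental setup.

```python
import math

def transmission_rate(bandwidth_hz, power_w, channel_gain, noise_psd):
    """Eq. (4): achievable uplink rate r_i in bit/s."""
    snr = channel_gain * power_w / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

def transmission_delay(model_size_bits, rate_bps):
    """Eq. (5): time needed to upload a model of M(w) bits at rate r_i."""
    return model_size_bits / rate_bps

# Hypothetical client: 1 MHz of bandwidth, 0.1 W transmit power,
# channel gain 1e-8, noise PSD 1e-15 W/Hz, and a 1 Mbit model.
rate = transmission_rate(1e6, 0.1, 1e-8, 1e-15)
delay = transmission_delay(1e6, rate)
print(f"rate = {rate:.3e} bit/s, delay = {delay:.3f} s")
```

With these values the signal-to-noise ratio is close to 1, so the rate is roughly equal to the allocated bandwidth and the upload takes about one second.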

4.2. Energy

Energy is a critical factor to consider when deploying FL. It motivates energy-efficient ML algorithms, optimized communications, low-power hardware accelerators, and energy-aware scheduling strategies. Balancing the benefits of FL with the energy constraints of participating devices is crucial for its widespread adoption and long-term sustainability. The energy consumption of each client $i$ is the sum of the energy used to train the model on the client's device and the energy used to transmit the local model from the device to the server.

4.2.1. Computation Energy

The computing resources consumed by model training depend on the size of the local data $D_i$. The corresponding computation energy is expressed as (Chen et al., 2020):

(6) $e_i^c = \zeta_i f_i^2 \cdot \tau_i^c f_i = \zeta_i f_i^2 C_i D_i \upsilon_i \log_2\left(\frac{1}{\epsilon_i}\right),$

where $\zeta_i$ is the energy consumption coefficient depending on the chip of client $i$'s device. Note that, since the server has a continuous power supply, we do not consider the energy consumption of the server in our problem.

4.2.2. Transmission Energy

The energy consumption of client $i$ in model transmission is expressed as:

(7) $e_i^t = p_i \tau_i^t = p_i \frac{M(w)}{r_i}.$
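The two energy terms in Eqs. (6) and (7) can be sketched in Python as follows. The parameter names are our own labels for the symbols in the equations, and the numerical values are purely illustrative assumptions.

```python
import math

def computation_energy(zeta, freq_hz, c_i, d_i, upsilon, eps):
    """Eq. (6): e_i^c = zeta_i * f_i^2 * C_i * D_i * upsilon_i * log2(1 / eps_i)."""
    return zeta * freq_hz ** 2 * c_i * d_i * upsilon * math.log2(1.0 / eps)

def transmission_energy(power_w, model_size_bits, rate_bps):
    """Eq. (7): e_i^t = p_i * M(w) / r_i."""
    return power_w * model_size_bits / rate_bps

# Hypothetical device: zeta = 1e-28, 1 GHz CPU, C_i = 1e4, D_i = 1000,
# upsilon_i = 1, local accuracy eps_i = 0.5, 0.1 W radio, 1 Mbit model, 1 Mbit/s link.
e_comp = computation_energy(1e-28, 1e9, 1e4, 1000, 1.0, 0.5)
e_tx = transmission_energy(0.1, 1e6, 1e6)
print(f"computation: {e_comp:.4f} J, transmission: {e_tx:.4f} J, total: {e_comp + e_tx:.4f} J")
```

The total per-round energy of a client is the sum of the two terms, which is the quantity bounded by the energy constraint of the problem formulation.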

4.3. Reliability

Choosing clients capable of completing local training is crucial. Reliability is a maintenance metric used to measure performance, safety, and equipment design, especially for critical or complex assets. The reliability of the client's device ensures the trustworthiness, stability, and efficiency of the FL process. It allows FL systems to make informed decisions regarding the participation of clients, data quality, and security, which results in better model performance and a more dependable and robust learning process (Park et al., 2022). The reliability of a client $i$ is computed from its mean time between failures (MTBF), which refers to the average time between two failures and is defined as follows (Sharma and Kaur, 2023):

(8) $MTBF_i = \frac{\tau_i^c}{m_i},$

where $m_i$ is the number of failures. The client reliability, or the probability of operating without failure for a time $t$, is denoted by $\rho_i(t)$:

(9) $\rho_i(t) = e^{-t / MTBF_i}.$

A more reliable client device is less likely to fail in the near term. This, in turn, reduces the risk of losing the training data or the local model due to unintentional shutdown and network instability.
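A minimal Python sketch of Eqs. (8) and (9) is given below. Treating a client with no recorded failure as having an infinite MTBF is our own assumption for the sketch; the equations themselves do not cover that case.

```python
import math

def mtbf(operating_time, num_failures):
    """Eq. (8): mean time between failures.

    A client with no recorded failure is given an infinite MTBF
    (an assumption made for this sketch only).
    """
    if num_failures == 0:
        return math.inf
    return operating_time / num_failures

def reliability(t, mtbf_value):
    """Eq. (9): probability of operating without failure for a time t."""
    return math.exp(-t / mtbf_value)

# Hypothetical client: 100 s of operation with 4 failures, evaluated over t = 5 s.
print(reliability(5.0, mtbf(100.0, 4)))
```

A larger MTBF gives a reliability closer to 1 for the same horizon $t$, which is why reliable devices are favored by the reliability constraint.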

4.4. Fairness

During the FL process, the client selection method often prioritizes devices with low latency. However, this bias towards speed may be unfair to clients with high data quality: a local dataset that is larger and whose distribution is more similar to the global distribution plays a more important role, and the corresponding clients should participate in more communication rounds. Therefore, it is important to consider the fairness constraint to avoid an overabundance of relevant clients (Smestad and Li, 2023). The fairness constraint "tells" each client how many communication rounds it should participate in (Xia et al., 2020). We introduce the following constraint on a minimum selection fraction for each client $i$ (Li et al., 2019):

(10) $\frac{1}{T}\sum_{t=1}^{T} E[a_i(t)] \geq c_i,$

where $E[\cdot]$ is the expectation operator and $c_i \in (0,1)$ is the minimum fraction of communication rounds in which client $i$ must be chosen. $T$ is the total number of rounds and $a_i(t)$ is a binary indicator, with $a_i(t) = 1$ if client $i$ is selected in round $t$, and $a_i(t) = 0$ otherwise.
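In practice, the constraint in Eq. (10) can be monitored from the selection history. The following Python sketch replaces the expectation by the empirical average over the observed rounds, which is an assumption of the sketch; the history and the thresholds are hypothetical.

```python
def selection_fractions(history):
    """Empirical selection fraction of each client.

    history[t][i] is the indicator a_i(t): 1 if client i was selected
    in round t, and 0 otherwise.
    """
    num_rounds = len(history)
    num_clients = len(history[0])
    return [sum(row[i] for row in history) / num_rounds for i in range(num_clients)]

def fairness_satisfied(history, min_fraction):
    """Check the empirical version of Eq. (10) for every client."""
    fractions = selection_fractions(history)
    return [frac >= c for frac, c in zip(fractions, min_fraction)]

# Hypothetical history: 4 rounds, 3 clients, each required in at least half of the rounds.
history = [[1, 0, 1],
           [1, 0, 0],
           [1, 1, 0],
           [0, 0, 1]]
print(selection_fractions(history))
print(fairness_satisfied(history, [0.5, 0.5, 0.5]))
```

In this example the second client is selected in only one round out of four and therefore violates its minimum selection fraction.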

4.5. Problem Formulation

Our approach involves a "select then train" client selection method, in which the server invites clients who meet the constraints of accuracy, energy, delay, reliability, and fairness to participate in the FL algorithm. We formulate the problem, whose goal is to minimize the loss function of the FL algorithm by optimizing the various wireless parameters, as follows:

(11) $\min F(w) = \frac{1}{D}\sum_{i=1}^{n}\sum_{j=1}^{D_i} l(w, x_{ji}, y_{ji})$
(11a) $\text{s.t.} \quad \tau_i^c + \tau_i^t \leq \tau_i, \quad \forall i \in N$
(11b) $0 < e_i^c + e_i^t \leq e_i, \quad \forall i \in N$
(11c) $\rho_i(t) \geq \rho, \quad \forall i \in N$
(11d) $\frac{1}{T}\sum_{t=1}^{T} E[a_i(t)] \geq c_i, \quad \forall i \in N$
(11e) $\epsilon_{min} \leq \epsilon_i \leq 1, \quad \forall i \in N$
(11f) $0 \leq f_i \leq f_i^{max}, \quad \forall i \in N$
(11g) $0 \leq p_i \leq p_i^{max}, \quad \forall i \in N$
(11h) $\sum_{i=1}^{n} b_i \leq B, \quad \forall i \in N$
(11i) $0 \leq c_i \leq 1, \quad \forall i \in N$

where $\gamma_T$ is the maximum delay to join the FL system, $\gamma_E$ is the energy consumption of the FL algorithm, and $\gamma_R$ is the minimum reliability needed to participate in the FL process.

Constraint (11a) indicates that the execution time of the local tasks and transmission time for all clients should not exceed the maximum completion time for the whole FL algorithm. (11b) is the energy consumption constraint to perform the learning task. Constraint (11c) is the client's device reliability condition for joining the FL algorithm. Constraint (11d) is the fairness constraint to participate in the FL algorithm. The local accuracy constraint is given by (11e). Constraints (11f) and (11g) respectively represent the maximum local computation capacity and average transmit power limits of all clients. Due to the limited bandwidth of the system, we have (11h), where $B$ is the total bandwidth. Constraint (11i) is the fraction of communication rounds required to ensure a fair selection.
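To make the per-client constraints concrete, the following Python sketch checks (11a), (11b), and (11c) for a single client profile. The `ClientProfile` fields and the threshold names `max_delay`, `max_energy`, and `min_reliability` are hypothetical labels for the right-hand sides of these constraints; the values are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ClientProfile:
    tau_c: float  # computation delay (s)
    tau_t: float  # transmission delay (s)
    e_c: float    # computation energy (J)
    e_t: float    # transmission energy (J)
    rho: float    # reliability in [0, 1]

def is_eligible(client, max_delay, max_energy, min_reliability):
    """Return True if the client meets constraints (11a), (11b), and (11c)."""
    delay_ok = client.tau_c + client.tau_t <= max_delay        # (11a)
    energy_ok = 0 < client.e_c + client.e_t <= max_energy      # (11b)
    reliability_ok = client.rho >= min_reliability             # (11c)
    return delay_ok and energy_ok and reliability_ok

# Hypothetical clients checked against hypothetical thresholds.
fast = ClientProfile(tau_c=2.0, tau_t=1.0, e_c=0.5, e_t=0.1, rho=0.95)
slow = ClientProfile(tau_c=8.0, tau_t=4.0, e_c=0.5, e_t=0.1, rho=0.95)
print(is_eligible(fast, max_delay=10.0, max_energy=1.0, min_reliability=0.9))
print(is_eligible(slow, max_delay=10.0, max_energy=1.0, min_reliability=0.9))
```

The fairness constraint (11d) is defined over the whole selection history rather than a single round, so it is not part of this per-round check.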

5. Grey Wolf Optimizer-Based client selection

5.1. Federated Learning Algorithm

Our FL system is described in Algorithm 1. It is divided into two parts, one executed by the server and the other by the clients. The server first initializes the global model parameters with random values and then coordinates the successive rounds of execution. At each round, the server selects the set of clients using Algorithm 2 and sends each of them, in parallel, a copy of the training model. To fine-tune this copy, each client performs a series of gradient descent steps using its own data. After training, each client sends the weights and biases of its local model back to the server. The server aggregates the updates from all clients and starts a new round.

Algorithm 1 OTA-FL with Multi-Attribute Client Selection
Base Station Side:
Initialize the global model $W_0$
for $t \leftarrow 0$ to $T$ do
     Select client set $\mathcal{C}$ using Algorithm 2
     Broadcast $W_t$ to selected clients (i.e., $\mathcal{C}$).
     Receive the over-the-air aggregated global model $W_{t+1}$.
end for
Selected Client Side:
At each round $t$:
Receive current global model $W_t$.
Train local model and produce model update $W_{t+1}^c$.
Send $W_{t+1}^c$ to the server.

The number of selected clients is determined dynamically in each round based on several factors:

  • The number of clients available in each round.

  • The total available bandwidth $B$: Each client's bandwidth requirement $b_i$ is considered to avoid exceeding $B$.

  • The computation and energy: The computational power and energy availability of both the server and the clients are considered to avoid overburdening any participant.

5.2. Client Selection Algorithm

The GWO is a metaheuristic algorithm inspired by the social hierarchy and hunting behaviors of grey wolves in nature. It leverages these natural processes to search efficiently for optimal solutions in complex optimization problems, with the advantages of few parameters, simple principles, and easy implementation (Mirjalili et al., 2014). In this work, we employ the grey wolf model to optimize the client selection problem (Eq. 11), wherein a wolf represents a set of clients that are eligible to join the learning process (see Fig. 2).

Assume that there are $S$ solutions (sets of clients) in the search space. GWO classifies these solutions according to the objective function (Eq. 11) into four categories: the best solution is alpha ($\alpha$), the second-best is beta ($\beta$), the third-best is delta ($\delta$), and the remaining solutions are omega ($\omega$). The best three solutions $(\alpha, \beta, \delta)$ are used to guide the other solutions $(\omega)$ through the search space. During the optimization, there are three main phases of hunting behavior: encircling, hunting, and attacking, which are detailed in the following subsections.

Figure 2. The wolf in the GWO is the set of clients in the FL process: the selected clients are shown in bold pictures, while transparent pictures represent clients that have not been selected.

5.2.1. Encircling Phase

The grey wolves start hunting by creating a circle around the prey. The mathematical model of the encircling phase is developed using the following equations:

(12) $\mathbf{X}(t+1) = \mathbf{X}_p(t) - \mathbf{A} \times \mathbf{d}.$

The distance $\mathbf{d}$ between the wolf and the prey is calculated by the following equation:

(13) $\mathbf{d} = |\mathbf{C} \times \mathbf{X}_p(t) - \mathbf{X}(t)|,$

where $t$ is the current iteration, $\mathbf{X}_p$ is the position of the prey, and $\mathbf{X}$ is the position of the wolf. $\mathbf{A}$ and $\mathbf{C}$ are coefficient vectors defined as follows:

(14) $\mathbf{A} = 2\mathbf{a} \times \mathbf{r}_1 - \mathbf{a},$
(15) $\mathbf{C} = 2\mathbf{r}_2.$

The components of $\mathbf{a}$ are linearly decreased from 2 to 0 over iterations and can be calculated by:

(16) $a = 2 - t \times 2 / max_{itr},$

where $max_{itr}$ is the maximum number of iterations. $\mathbf{r}_1$ and $\mathbf{r}_2$ are random vectors in $[0,1]$.
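The encircling update of Eqs. (12) to (16) can be sketched in Python as follows. We read the products between vectors as component-wise products, as in the standard GWO of Mirjalili et al. (2014); this reading, the function names, and the example positions are assumptions of the sketch.

```python
import random

def linear_a(t, max_itr):
    """Eq. (16): a decreases linearly from 2 to 0 over the iterations."""
    return 2.0 - t * 2.0 / max_itr

def encircle(x, x_prey, a, rng):
    """One encircling step, applied component by component."""
    new_x = []
    for xj, pj in zip(x, x_prey):
        coef_a = 2.0 * a * rng.random() - a   # Eq. (14), with r1 drawn in [0, 1)
        coef_c = 2.0 * rng.random()           # Eq. (15), with r2 drawn in [0, 1)
        d = abs(coef_c * pj - xj)             # Eq. (13)
        new_x.append(pj - coef_a * d)         # Eq. (12)
    return new_x

rng = random.Random(0)
wolf, prey = [0.2, 0.9, 0.4], [0.5, 0.5, 0.5]
for t in (0, 25, 50):
    print(t, linear_a(t, 50), encircle(wolf, prey, linear_a(t, 50), rng))
```

At the last iteration $a = 0$, so the coefficient in Eq. (14) vanishes and the updated position coincides with the prey position.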

5.2.2. Hunting Phase

During the hunting phase, the three most promising solutions, denoted by $(\alpha, \beta, \delta)$, are obtained. The other search agents $(\omega)$ update their positions by moving towards the average of the three best-known positions, since these three have better knowledge about the optimal location of the prey. The following equations are used, with $i \in \{\alpha, \beta, \delta\}$:

(17) $\mathbf{X}_i(t+1) = \mathbf{X}_i(t) - \mathbf{a}_i \times \mathbf{d}_i,$

where $\mathbf{d}_i$ is estimated using the following:

(18) $\mathbf{d}_i = \left|\mathbf{C}_i \times \mathbf{X}_i(t) - \mathbf{X}(t)\right|.$

Let $p_i$ be the positive weight associated with wolf $i \in \{\alpha, \beta, \delta\}$ such that $\sum_i p_i = 1$. Given the positions of wolves $\alpha$, $\beta$, and $\delta$, a good estimate of the average position of the optimal solution at round $t$ is given by:

(19) $\mathbf{X}(t+1) = \sum_{i \in \{\alpha, \beta, \delta\}} p_i \cdot \mathbf{X}_i(t+1).$
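A minimal Python sketch of the hunting update in Eqs. (17) to (19) is given below. We assume that the coefficient multiplying $\mathbf{d}_i$ in Eq. (17) is drawn as in Eq. (14) and that products are component-wise, following the standard GWO; the leader positions and the equal weights are hypothetical.

```python
import random

def hunt(x, leaders, weights, a, rng):
    """Move wolf x towards the weighted average of the leader-guided positions.

    leaders: dict mapping 'alpha', 'beta', 'delta' to position lists.
    weights: dict mapping the same keys to positive weights summing to 1.
    """
    new_x = [0.0] * len(x)
    for name, leader in leaders.items():
        for j, (xj, lj) in enumerate(zip(x, leader)):
            coef_a = 2.0 * a * rng.random() - a         # drawn as in Eq. (14) (assumption)
            coef_c = 2.0 * rng.random()                 # drawn as in Eq. (15)
            d = abs(coef_c * lj - xj)                   # Eq. (18)
            new_x[j] += weights[name] * (lj - coef_a * d)  # Eqs. (17) and (19)
    return new_x

leaders = {"alpha": [1.0, 0.0], "beta": [0.0, 1.0], "delta": [1.0, 1.0]}
weights = {"alpha": 1 / 3, "beta": 1 / 3, "delta": 1 / 3}
print(hunt([0.3, 0.7], leaders, weights, a=1.0, rng=random.Random(0)))
```

When $a = 0$ the random perturbation disappears and the new position is exactly the weighted average of the three leaders.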

5.2.3. Attacking Phase

GWO finishes hunting by attacking the prey when it stops moving. To model the approach to the prey, we use Eq. (16): the parameter $a$ balances exploration and exploitation, and its value decreases linearly from 2 to 0 over the iterations. Consequently, by Eq. (14), the parameter $A$ takes a random value in the interval $[-a, a]$. The wolves take a random position when $A > 1$ or $A < -1$, and are forced to move towards the prey when $-1 \leq A \leq 1$.

Algorithm 2 Grey Wolf Optimizer-Based client Selection
Initialize the grey wolf population $\mathbf{X}$
Initialize a, A, and C
Calculate the fitness of each search agent
$X_\alpha$ = the best search agent
$X_\beta$ = the second best agent
$X_\delta$ = the third best search agent
while $t < max_{itr}$ do
     for each search agent do
          Randomly initialize $r_1$ and $r_2$
          Update the position of the current search agent using Eq. (19)
     end for
     Update a, A, and C
     Calculate the fitness of all search agents
     Update $X_\alpha$, $X_\beta$, and $X_\delta$
     $t = t + 1$
end while
return $X_\alpha$ $\triangleright$ Best solution: Set of clients to join the FL
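For illustration only, the loop of Algorithm 2 can be sketched in a few lines of Python. This is not our implementation: the threshold rule that maps a continuous position to a client set, the per-client `cost` vector (a hypothetical normalized mix of loss, delay, energy, and unreliability), the equal leader weights, and the clipping to $[0,1]$ are all assumptions made to keep the sketch self-contained.

```python
import random

def decode(position, threshold=0.5):
    """Map a wolf position to a client set (assumed threshold rule)."""
    return [j for j, v in enumerate(position) if v > threshold]

def fitness(position, cost):
    """Hypothetical fitness: mean cost of the selected clients (lower is better)."""
    chosen = decode(position)
    if not chosen:
        return float("inf")  # an empty selection is never acceptable
    return sum(cost[j] for j in chosen) / len(chosen)

def gwo_select(cost, num_wolves=10, max_itr=50, seed=0):
    """Return the client indices encoded by the best wolf found."""
    rng = random.Random(seed)
    n = len(cost)
    wolves = [[rng.random() for _ in range(n)] for _ in range(num_wolves)]
    best = min(wolves, key=lambda w: fitness(w, cost))[:]
    for t in range(max_itr):
        ranked = sorted(wolves, key=lambda w: fitness(w, cost))
        leaders = [w[:] for w in ranked[:3]]          # alpha, beta, delta (copies)
        a = 2.0 - t * 2.0 / max_itr                   # Eq. (16)
        for w in wolves:
            for j in range(n):
                pulled = 0.0
                for leader in leaders:
                    coef_a = 2.0 * a * rng.random() - a   # Eq. (14)
                    coef_c = 2.0 * rng.random()           # Eq. (15)
                    d = abs(coef_c * leader[j] - w[j])    # Eq. (18)
                    pulled += leader[j] - coef_a * d      # Eq. (17)
                # Eq. (19) with equal weights 1/3, clipped to the unit interval
                w[j] = min(1.0, max(0.0, pulled / 3.0))
        candidate = min(wolves, key=lambda w: fitness(w, cost))
        if fitness(candidate, cost) < fitness(best, cost):
            best = candidate[:]
    return decode(best)

# Hypothetical per-client costs for 8 clients.
costs = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
print(gwo_select(costs, seed=1))
```

The sketch keeps the best wolf seen so far, so the returned set is never worse, under this hypothetical fitness, than the best set in the initial population.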

The multi-attribute client selection is provided in Algorithm 2. First, the base station initializes the GWO parameters by randomly setting the positions of the wolves within the defined problem bounds, ensuring diversity in the initial population. The $X_\alpha$, $X_\beta$, and $X_\delta$ wolves, representing the best solutions found, are initially set to zero vectors and updated as the algorithm progresses. The coefficient $a$ decreases linearly from 2 to 0 over the iterations, balancing exploration and exploitation. $A$ and $C$, used in the position update formulas, are derived from $a$ and the random values $r_1$ and $r_2$. Second, the GWO calculates the score of the best clients based on the lowest loss value, lowest computation and transmission delay, lowest energy consumption, highest reliability, and fairness. The best score value is sent to the BS from each set of clients. The algorithm tracks the best positions for the $X_\alpha$, $X_\beta$, and $X_\delta$ wolves based on their fitness, updating these whenever a better solution is discovered. We set $max_{itr}$ to 50, consistent with common practice in the literature, to ensure effective exploration and exploitation of the search space. Finally, the local models are trained by the clients in the best-scoring set $X_\alpha$ and sent to the base station for aggregation via OTA communication.

6. Experimental Investigation

To assess the effectiveness of the proposed multi-attribute client selection algorithm for FL systems, we conducted experiments to analyze the performance of the global model and investigate the effects of delay, energy consumption, reliability, and fairness constraints. In this section, we offer a comparative analysis between our solution and several existing methods, including dynamic programming, multi-armed bandit, and genetic algorithms. Our evaluation is based on the MNIST, CIFAR-10, and Fashion MNIST datasets, with test loss, test accuracy, energy consumption, training time, reliability, and fairness as key metrics.

6.1. Experimental setup

Under our problem formulation, the "select then train" method is effectively implemented using client metadata and historical performance data. Although the objective function is dependent on the model parameters and local datasets, the selection process leverages surrogate metrics derived from client profiles, which include computational capabilities, data size, and distribution summaries. These profiles are updated periodically and shared with the server, allowing it to make informed decisions without direct access to the local data. We implement our FL model using the following datasets:

  • MNIST, comprising 60,000 28 x 28 images of handwritten digits from 0 to 9.

  • CIFAR-10, which includes 60,000 32 x 32 color images in 10 classes, with 6,000 images per class.

  • Fashion MNIST, consisting of 70,000 28 x 28 grayscale images of 10 different categories of clothing items.

These datasets are distributed among 50 clients to train the FL model, and each client possesses unique hardware parameters, metadata, and statistics. This includes information such as data size, historical performance, device capabilities, network conditions, and previous training outcomes. This setup enables us to compute the total delay using Eq. (3) and Eq. (5), the total energy using Eq. (6) and Eq. (7), reliability using Eq. (9), and fairness using Eq. (10).

Our experiments ran on the Google Colab cloud platform with a T4 GPU, using Python 3, TensorFlow 2.3.0, and Keras 2.4.3 for code development. We used a convolutional neural network (CNN) for our classification problems, trained with stochastic gradient descent (SGD) to accelerate training. Our study aims to enhance FL performance using GWO (Algorithm 2) by selecting clients capable of achieving optimal scores in prediction accuracy, delay, energy consumption, reliability, and fairness.

6.2. Comparison Scheme

In our previous work (Driss et al., 2024), we compared random client selection, loss-aware client selection, and our multi-attribute client selection using the MNIST dataset. This comparison demonstrated the effectiveness of our multi-attribute method in achieving superior performance metrics compared to simpler client selection techniques. The experimental results indicate that the proposed multi-attribute client selection can reduce energy consumption by up to 43% compared to the random client selection method. Additionally, our multi-attribute method outperforms the loss-aware method in terms of time reduction, computational efficiency, and energy consumption. These ablation experiments demonstrate that a comprehensive approach considering various attributes for client selection such as delay, energy, fairness, and reliability is more effective for FL systems.

Table 3. Global accuracy and energy efficiency under different client selection methods.

| Client Selection Method | Accuracy (%) | Energy Efficiency (%/J) |
|---|---|---|
| Random selection (3 clients) | 68 | 0.46 |
| Random selection (5 clients) | 73 | 0.41 |
| Loss-aware client selection | 92 | 0.80 |
| Our multi-attribute client selection | 98 | 1.08 |

To further validate our approach, we conduct experiments on larger datasets with a larger number of clients. We compare the proposed solution with the following approaches:

  • Dynamic programming (DP): a classic algorithm for solving the knapsack problem, an optimization problem that involves selecting a subset of items from a given set, each with a weight and a value. The objective is to maximize the total value of the chosen items while ensuring that the cumulative weight does not surpass a specified capacity (Assi and Haraty, 2018).

  • Genetic algorithm (GA): an evolutionary optimization technique inspired by natural selection and genetics. It mimics natural evolution, where individuals with higher fitness are more likely to survive and reproduce, leading to the emergence of better solutions over generations (Thengade and Dondal, 2012).

  • Multi-armed bandit (MAB): a classic problem in probability theory and decision-making, often used in the context of optimization and resource allocation. The name originates from the idea of a gambler facing multiple slot machines (the "bandits"), each with a potentially different payoff probability, who needs to decide which machine to play to maximize the total reward over time (Burtini et al., 2015). This creates a trade-off between exploring less-tried machines and exploiting those known to pay well. We implemented the upper confidence bound (UCB) algorithm, which addresses this trade-off by selecting options based on a combination of their average rewards and the uncertainty, or confidence interval, around those rewards. The UCB algorithm assigns a score to each option that includes an upper bound term to encourage exploring less-tried options (Ottens et al., 2017).
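To make the UCB baseline concrete, the following Python sketch shows the widely used UCB1 rule on simulated Bernoulli arms. The choice of the UCB1 variant, the exploration constant, and the reward probabilities are assumptions for illustration; the sketch is not the exact baseline code used in our experiments.

```python
import math
import random

def ucb1_select(counts, means, t, c=2.0):
    """Pick the arm with the highest score: mean reward plus exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play every arm once before using the scores
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def run_ucb1(reward_probs, rounds, seed=0):
    """Simulate UCB1 on Bernoulli arms; return pull counts and mean rewards."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    means = [0.0] * len(reward_probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental average
    return counts, means

counts, means = run_ucb1([0.2, 0.5, 0.8], rounds=1000, seed=0)
print(counts, [round(m, 2) for m in means])
```

In a client selection setting, each arm would correspond to a client or a set of clients and the reward to a score built from the selection attributes; this mapping is also an assumption of the sketch.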

6.3. Experimental Results

To demonstrate the efficiency of our client selection approach in OTA-FL, we analyze the FL model using the MNIST, CIFAR-10, and Fashion MNIST classification problems. This analysis evaluates various performance metrics, including the global model accuracy, loss probability, convergence time, energy consumption, energy efficiency, reliability, and fairness. Furthermore, we compare the outcomes of our multi-attribute client selection employing GWO against those of other established methods, namely dynamic programming, multi-armed bandit, and genetic algorithms.

Table 4. Global performance of our OTA-FL system under different client selection schemes

| Multi-attribute client selection method | Dataset | Accuracy (%) | Loss Probability | Convergence Time (s) | Total Energy (J) | Energy Efficiency (%/J) |
|---|---|---|---|---|---|---|
| Genetic algorithm | MNIST | 96.31 | 0.0505 | 15415 | 12000 | 0.008 |
| Genetic algorithm | CIFAR-10 | 72.25 | 0.045 | 26960 | 26100 | 0.0027 |
| Genetic algorithm | Fashion MNIST | 82.73 | 0.034 | 22600 | 23400 | 0.0035 |
| Multi-armed bandit | MNIST | 98.15 | 0.0185 | 15029 | 12800 | 0.0076 |
| Multi-armed bandit | CIFAR-10 | 75.68 | 0.040 | 25200 | 27760 | 0.0027 |
| Multi-armed bandit | Fashion MNIST | 85.36 | 0.032 | 21300 | 24700 | 0.0034 |
| Dynamic programming | MNIST | 98.07 | 0.0193 | 11422 | 11920 | 0.0082 |
| Dynamic programming | CIFAR-10 | 75.15 | 0.043 | 24340 | 26350 | 0.0028 |
| Dynamic programming | Fashion MNIST | 84.49 | 0.033 | 20600 | 23600 | 0.0035 |
| Grey wolf optimizer | MNIST | 98.43 | 0.0173 | 11200 | 11800 | 0.0084 |
| Grey wolf optimizer | CIFAR-10 | 77.78 | 0.039 | 23100 | 24500 | 0.0031 |
| Grey wolf optimizer | Fashion MNIST | 86.25 | 0.031 | 19500 | 22300 | 0.0044 |

Figure 3. Test accuracy under different client selection schemes for MNIST dataset: (a) 15 clients; (b) 50 clients.

Accuracy: The accuracy metric indicates the overall performance of the FL model across all clients. It represents the proportion of correctly predicted instances in the entire dataset. Table 4 and Fig. 3 show that client selection using the GWO and the MAB achieves the highest accuracy (98.43% and 98.15% on MNIST, respectively). This indicates that these two selection strategies effectively utilize client resources and data contributions to improve model accuracy. The other two methods, based on the DP and GA algorithms, achieve slightly lower accuracy (98.07% and 96.31% on MNIST, respectively), which still indicates robust performance in predicting image labels.

Loss Probability: The loss probability offers valuable insight into the uncertainty surrounding model predictions, reflecting the likelihood of erroneous predictions or misclassifications. In our analysis, client selection using the MAB algorithm and the GWO resulted in the lowest loss probabilities (0.0185 and 0.0173 on MNIST, respectively; see Table 4), which underscores a high degree of confidence in their predictions. As depicted in Fig. 4(b), the loss probability decreases consistently over the iterations on the MNIST dataset, further affirming the effectiveness and reliability of the applied methodologies.

Figure 4. Loss probability under different client selection schemes for MNIST dataset: (a) 15 clients; (b) 50 clients.

Convergence Time: The GWO exhibits a gradual decrease in execution time (Fig. 5(b)), indicating efficient convergence towards optimal client selections over epochs, with stabilization once a steady state is reached. Similarly, the multi-armed bandit approach shows a steady reduction in execution time, reflecting efficient convergence and a balanced exploration-exploitation strategy. In contrast, the dynamic programming and genetic algorithms show more variability, with execution times fluctuating throughout the epochs, which suggests differing convergence behaviors and computational requirements. The convergence times in Table 4 reveal notable differences in computational efficiency across the client selection methods. Our GWO-based solution is the most time-efficient method on all three datasets, converging in 11200 seconds on MNIST; in the 15-client setting, its total execution time is 6925 seconds (Table 7). This indicates that the GWO-based multi-attribute client selection approach requires the least amount of time to complete the FL process compared to the other methods.

Figure 5. Convergence time under different client selection schemes for MNIST dataset: (a) 15 clients; (b) 50 clients.

Energy Efficiency: Energy consumption varies notably across the client selection methods in our FL system. As seen in Table 4, the GWO-based multi-attribute client selection consistently demonstrates the lowest energy consumption, maintaining a constant energy level across all communication rounds. This suggests that the GWO-based approach preserves energy efficiency by consistently selecting optimal clients without significant fluctuations. In comparison, the DP-based and GA-based approaches exhibit higher energy consumption across all epochs. The multi-armed bandit approach also shows consistent, but higher, energy consumption. These results indicate that the GWO-based approach is the most energy-efficient among the methods evaluated, highlighting its potential for reducing energy costs in our OTA-FL system.

Instantaneous Energy Efficiency (IEE): To assess the energy efficiency at round $t$ of our scheme and analyze how it compares with existing literature, we introduce the following energy efficiency indicator:

(20) $IEE(t) \triangleq \frac{\text{Accuracy}(t)}{\text{Energy}(t)} \quad \text{(\%/joule)}.$

Global Energy Efficiency (GEE): Similarly, we assess the global energy efficiency of our schemes using the following energy efficiency indicator:

(21) $GEE \triangleq \frac{\text{Global Accuracy}}{\text{Total Energy}} = \frac{\text{Accuracy}(T)}{\sum\limits_{t=1}^{T} \text{Energy}(t)} \quad \text{(\%/joule)}.$

The evolution of the instantaneous energy efficiency indicator under different client selection schemes is illustrated in Fig. 6(b), which reveals several insights. First, with our multi-attribute client selection using the GWO, the IEE metric increases consistently over communication rounds, suggesting a stable and improving energy performance. This indicates that the GWO-based method optimizes client selection with minimal energy consumption. In contrast, selection using the DP algorithm yields slightly lower but relatively stable IEE values across iterations, indicating moderate energy efficiency. The genetic algorithm approach shows a gradual increase in IEE over communication rounds, suggesting a less stable but still moderate energy performance. Finally, the multi-armed bandit approach displays the lowest IEE values, with fluctuations across iterations, indicating relatively poor energy efficiency compared to the other methods. Overall, the GWO-based method stands out as the most energy-efficient approach, with a global energy efficiency of 0.0084 %/joule on the MNIST dataset (Table 4), followed by the DP and GA methods, while the multi-armed bandit-based method appears to be the least energy-efficient in this context.
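To make the two indicators concrete, the following sketch evaluates the IEE and GEE definitions on the MNIST rows of Table 4. Since per-round energy values are not listed in the table, the total energy is passed as a single-element sequence; the function names and the dictionary layout are assumptions made for this illustration.

```python
from typing import Dict, Sequence, Tuple


def iee(accuracy_t: float, energy_t: float) -> float:
    """Instantaneous energy efficiency at one round, in %/joule."""
    if energy_t <= 0:
        raise ValueError("energy must be positive")
    return accuracy_t / energy_t


def gee(final_accuracy: float, energy_per_round: Sequence[float]) -> float:
    """Global energy efficiency: final accuracy over the energy summed across rounds."""
    total_energy = sum(energy_per_round)
    if total_energy <= 0:
        raise ValueError("total energy must be positive")
    return final_accuracy / total_energy


# MNIST rows of Table 4: method -> (accuracy in %, total energy in joules).
table4_mnist: Dict[str, Tuple[float, float]] = {
    "Genetic algorithm": (96.31, 12000.0),
    "Multi-armed bandit": (98.15, 12800.0),
    "Dynamic programming": (98.07, 11920.0),
    "Grey wolf optimizer": (98.43, 11800.0),
}

# Only the total energy is reported, so it is treated as one aggregate "round".
gee_by_method = {name: gee(acc, [energy]) for name, (acc, energy) in table4_mnist.items()}
ranking = sorted(gee_by_method, key=gee_by_method.get, reverse=True)
```

With these inputs, `ranking` orders the methods as GWO, DP, GA, and MAB, which matches the ordering discussed above for the MNIST dataset.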

Figure 6. Instantaneous energy efficiency under different client selection schemes for MNIST dataset: (a) 15 clients; (b) 50 clients.

Reliability: The reliability metric in our FL system is crucial for ensuring the trustworthiness, stability, and efficiency of the FL process, especially when dealing with critical or complex assets. It measures the ability of clients to complete local training tasks reliably over time, thereby impacting performance, safety, and equipment design. The results in Table 5 indicate that our multi-attribute client selection using the GWO achieves the highest average reliability (0.70), followed by dynamic programming (0.60), the genetic algorithm (0.55), and the multi-armed bandit method, which has the lowest average reliability (0.53). The GWO-based method also has the best worst-case reliability, at 0.42. These results imply that the GWO method not only achieves higher average reliability but also maintains better worst-case reliability than the other client selection methods in our system, indicating its superiority in selecting reliable devices for participation in the OTA-FL process.

Table 5. Average and worst reliability and fairness under different client selection schemes for MNIST dataset.

| Multi-attribute client selection method | Reliability (average value) | Reliability (worst value) | Fairness (average value) | Fairness (worst value) |
|---|---|---|---|---|
| Multi-armed bandit | 0.53 | 0.34 | 0.92 | 0.68 |
| Genetic algorithm | 0.55 | 0.32 | 0.84 | 0.60 |
| Dynamic programming | 0.60 | 0.36 | 0.86 | 0.64 |
| Grey wolf optimizer | 0.70 | 0.42 | 0.89 | 0.65 |

Fairness: Our fairness constraint aims to ensure equitable treatment of all participating clients regarding their contribution to communication rounds. The results presented in Table 5 highlight the strong emphasis on fairness across all four multi-attribute client selection methods in our OTA-FL system, with average fairness scores ranging from 0.84 to 0.92. Multi-attribute client selection using the multi-armed bandit achieves the highest average fairness score of 0.92, indicating effective balancing of client participation in communication rounds and a fair distribution of workload among clients. The GWO-based method follows closely with an average score of 0.89. The DP-based and GA-based methods, although slightly lower in average fairness (0.86 and 0.84, respectively), still exhibit satisfactory performance in most cases, showcasing the overall robustness of our fairness constraint across different client selection techniques.

Table 6. Global performance of our OTA-FL system under two selection methods for MNIST dataset.

| Our multi-attribute client selection method | Convergence time (s) | Accuracy (%) | Loss probability | Total energy (joule) | Energy efficiency (%/joule) |
|---|---|---|---|---|---|
| Train then Select | 14000 | 98.00 | 0.0180 | 12564 | 0.0078 |
| Select then Train | 11200 | 98.43 | 0.0173 | 11800 | 0.0084 |

Select then Train vs. Train then Select: There are two primary strategies for client selection. The first selects clients before training: it optimizes the selection process before model training starts, potentially reducing communication overhead and ensuring that only the most important clients participate in the learning process. The second selects clients after training: local models are first trained on all available clients, and a subset is then selected based on performance or contribution to the global model. The results presented in Table 6 indicate that selecting clients before training leads to faster convergence, higher accuracy, lower loss probability, and improved energy efficiency compared to training local models and then selecting clients. Additionally, this pre-selection of FL participants preserves the clients' privacy and enhances security by minimizing the opportunity for unauthorized access to client data during the selection process. As a result, the "Select then Train" method not only improves performance but also reinforces the confidentiality and integrity of client data, making it a more robust and privacy-preserving approach in FL environments.
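The structural difference between the two strategies can be sketched as follows. This is only a schematic under stated assumptions: the `Client` fields, the scores, and the `local_train` stand-in are hypothetical and do not reproduce our selection criteria or training procedure. The point of the sketch is the number of local training runs each strategy triggers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Client:
    name: str
    attribute_score: float  # hypothetical score available before training (e.g., energy, delay)
    update_quality: float   # hypothetical quality of the local update, known only after training


def local_train(client: Client, log: List[str]) -> float:
    """Stand-in for local training: records the call and returns the update quality."""
    log.append(client.name)
    return client.update_quality


def select_then_train(clients: List[Client], k: int) -> Tuple[List[str], int]:
    """Pick the top-k clients from pre-training attributes, then train only those."""
    log: List[str] = []
    chosen = sorted(clients, key=lambda c: c.attribute_score, reverse=True)[:k]
    for client in chosen:
        local_train(client, log)
    return [c.name for c in chosen], len(log)


def train_then_select(clients: List[Client], k: int) -> Tuple[List[str], int]:
    """Train every client first, then keep the k best local updates."""
    log: List[str] = []
    trained = [(local_train(c, log), c) for c in clients]
    chosen = sorted(trained, key=lambda pair: pair[0], reverse=True)[:k]
    return [c.name for _, c in chosen], len(log)


# Hypothetical pool of four clients, two of which participate per round.
pool = [
    Client("c1", 0.9, 0.6),
    Client("c2", 0.4, 0.8),
    Client("c3", 0.7, 0.5),
    Client("c4", 0.2, 0.3),
]
stt_names, stt_runs = select_then_train(pool, k=2)
tts_names, tts_runs = train_then_select(pool, k=2)
```

In this toy example, "Select then Train" triggers two local training runs while "Train then Select" triggers four, which illustrates why pre-selection can save time and energy.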

6.4. System scalability

Table 7. Global performance of our approach under different numbers of clients.

| Number of clients | Accuracy (%) | Convergence time (s) | Energy efficiency (%/joule) | Average reliability | Average fairness |
|---|---|---|---|---|---|
| 15 clients | 98.16 | 6925 | 0.016 | 0.67 | 0.87 |
| 50 clients | 98.43 | 19500 | 0.0044 | 0.70 | 0.89 |

To analyze the scalability of our multi-attribute client selection using the GWO, we evaluate performance metrics with varying numbers of clients. As the number of clients increases, accuracy improves (Fig. 3), indicating that the model benefits from learning from a more diverse set of client data, which enhances generalization. This improvement demonstrates that more clients contribute valuable and varied data, leading to a better-performing model. The energy efficiency also shows significant improvement (Fig. 6(b)), indicating that while more clients require more resources, the utilization of these resources becomes more effective. However, convergence time rises significantly with more clients (Fig. 5(b)). This increase is primarily due to the additional communication overhead and computational complexity associated with aggregating updates from a larger number of clients. Furthermore, the reliability and fairness of the model improve as clients are added, demonstrating that a larger number of clients leads to a more stable and equitable distribution of resources and benefits. In summary, scaling up the number of clients enhances model accuracy, energy efficiency, reliability, and fairness at the price of increased complexity. In practice, most applications only require acceptable accuracy, which can be reached with a moderate number of clients while keeping the complexity at acceptable levels.

6.5. Discussion & Insights

By leveraging the multi-attribute client selection using GWO, we aim to optimize the process of choosing clients based on multiple attributes crucial to the success of OTA-FL. One key aspect is the ability of our solution to select participants based on their proficiency in providing informative updates. In OTA-FL, the quality of model updates plays a pivotal role in the overall learning process. Clients that provide insightful and relevant updates contribute significantly to the effectiveness and generalization of the global model (see Table 4). The GWO-based approach helps us identify and prioritize clients with a higher potential for delivering informative contributions, thereby enriching the learning experience. Moreover, the GWO assists in striking a balance between informative updates and the associated communication costs.

Table 8. Exploration, exploitation, and complexity of each client selection method.

| Client selection method | Ref | Exploration vs. exploitation | Complexity |
|---|---|---|---|
| Multi-attribute MAB-based client selection | (Mannor and Tsitsiklis, 2004) | Balancing exploration (trying different arms to learn their rewards) and exploitation (favoring clients with higher expected rewards based on past observations) | $\mathcal{O}\left(n \cdot T \cdot \log n\right)$ |
| Multi-attribute GA-based client selection | (Nopiah et al., 2010) | Balancing exploration (diversity in clients) and exploitation (selecting and refining promising sets of clients through crossover and mutation) | $\mathcal{O}\left(n \cdot T\right)$ |
| Multi-attribute DP-based client selection | (Ezugwu et al., 2019) | Focusing on exploitation, as the goal is to find the best combination of clients given the energy constraint | $\mathcal{O}\left(n \cdot T \cdot E\_dim\right)$ |
| Multi-attribute GWO-based client selection | (Mirjalili et al., 2014) | Balancing exploration (searching for new promising sets of clients) and exploitation (exploiting known promising sets) through the movement of wolves towards better solutions | $\mathcal{O}\left(n \cdot T\right)$ |

Table 8 shows that each multi-attribute client selection method balances exploration and exploitation differently to optimize performance. The MAB-based method explores various client combinations while favoring those with higher expected rewards; its computational complexity carries a logarithmic factor in the number of selected clients ($n$, i.e., $num\_clients$) on top of the linear dependence on $n$ and on the number of communication rounds $T$. GA-based selection emphasizes diversity and the refinement of promising solutions through crossover and mutation, exhibiting linear complexity in the number of selected clients and communication rounds. The DP-based selection primarily exploits the best client combination within resource constraints, and its complexity additionally scales with the dimensionality of the required energy consumption ($E\_dim$). Finally, the GWO-based method combines exploration and exploitation by iteratively refining client selections, with a complexity similar to that of the GA-based method.
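The "movement of wolves towards better solutions" mentioned in Table 8 can be sketched with the canonical GWO position update of Mirjalili et al. (2014), in which each wolf moves toward the average of three positions guided by the three best solutions found so far (alpha, beta, delta). The sketch below minimizes a continuous toy objective; it does not reproduce our multi-attribute fitness function or the encoding of client subsets, and all parameter values are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo_minimize(
    fitness: Callable[[Sequence[float]], float],
    dim: int,
    n_wolves: int = 20,
    n_iters: int = 100,
    lower: float = -5.0,
    upper: float = 5.0,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Canonical grey wolf optimizer; returns (best position, best fitness, history)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    leaders: List[Tuple[float, List[float]]] = []  # best three solutions found so far
    history: List[float] = []

    def update_leaders() -> None:
        nonlocal leaders
        scored = sorted(((fitness(w), list(w)) for w in wolves), key=lambda p: p[0])
        leaders = sorted(leaders + scored[:3], key=lambda p: p[0])[:3]
        history.append(leaders[0][0])

    for it in range(n_iters):
        update_leaders()
        a = 2.0 * (1.0 - it / n_iters)  # decreases from 2 to 0: exploration -> exploitation
        for w in wolves:
            for d in range(dim):
                estimate = 0.0
                for _, leader in leaders:  # alpha, beta, delta
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * distance
                # Average of the three leader-guided positions, clipped to the bounds.
                w[d] = min(upper, max(lower, estimate / len(leaders)))
    update_leaders()
    best_fitness, best_position = leaders[0]
    return best_position, best_fitness, history


def sphere(x: Sequence[float]) -> float:
    """Toy objective (assumption for this example): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f, trace = gwo_minimize(sphere, dim=3, n_iters=50)
```

Large values of `a` early on let |A| exceed 1, which pushes wolves away from the leaders (exploration), while small values later pull them toward the leaders (exploitation). Because the leaders store the best solutions found so far, the recorded best fitness never increases from one iteration to the next.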

Analyzing the robustness of our multi-attribute client selection approach to factors such as noise, outliers, and changes in network conditions provides valuable insights into its reliability and resilience in real-world scenarios. Noise in client data, arising from measurement errors or inconsistencies, can potentially impact the performance of client selection algorithms by introducing inaccuracies or biases. For future work, we aim to evaluate the ability of this approach to handle noisy data effectively, either by incorporating noise-reduction techniques or by adapting selection criteria to account for variability in data quality.

7. Conclusion

We proposed a multi-attribute client selection framework utilizing the GWO to strategically manage the number of participants in each round and enhance the OTA-FL process. Our framework effectively optimizes several critical factors, including accuracy, energy consumption, delay, reliability, and fairness of participating devices. Experimental results validate the robustness and scalability of our approach. Compared to state-of-the-art methods, our framework ensures that our FL system is not only more accurate and fair but also significantly more energy-efficient and responsive. This makes our approach particularly well-suited to meet the demands of modern applications, where efficient and equitable system performance is essential.

References

  • AbdulRahman et al. (2020) Sawsan AbdulRahman, Hanine Tout, Azzam Mourad, and Chamseddine Talhi. 2020. FedMCCS: Multicriteria client selection model for optimal IoT federated learning. IEEE Internet of Things Journal 8, 6 (2020), 4723–4735.
  • Abouzahir et al. (2023) Saad Abouzahir, Essaid Sabir, Halima Elbiaze, and Mohamed Sadik. 2023. Federated Power Control for Predictive QoS in 5G and Beyond: A Proof of Concept for URLLC. In NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium. IEEE, USA, 1–7.
  • Assi and Haraty (2018) Maram Assi and Ramzi A Haraty. 2018. A survey of the knapsack problem. In 2018 International Arab Conference on Information Technology (ACIT). IEEE, USA, 1–6.
  • Azimi-Abarghouyi and Fodor (2024) Seyed Mohammad Azimi-Abarghouyi and Viktoria Fodor. 2024. Scalable Hierarchical Over-the-Air Federated Learning. IEEE Transactions on Wireless Communications 23, 8 (2024), arXiv–2211.
  • Burtini et al. (2015) Giuseppe Burtini, Jason Loeppky, and Ramon Lawrence. 2015. A survey of online experiment design with the stochastic multi-armed bandit. arXiv preprint arXiv:1510.00757 49, 8 (2015), 1–49.
  • Chahoud et al. (2023) Mario Chahoud, Hani Sami, Azzam Mourad, Safa Otoum, Hadi Otrok, Jamal Bentahar, and Mohsen Guizani. 2023. ON-DEMAND-FL: A Dynamic and Efficient Multi-Criteria Federated Learning Client Deployment Scheme. IEEE Internet of Things Journal 10, 18 (2023), 1–15.
  • Chen et al. (2020) Mingzhe Chen, Zhaohui Yang, Walid Saad, Changchuan Yin, H Vincent Poor, and Shuguang Cui. 2020. A joint learning and communications framework for federated learning over wireless networks. IEEE Transactions on Wireless Communications 20, 1 (2020), 269–283.
  • Cheng et al. (2023) Zhipeng Cheng, Xuwei Fan, Ning Chen, Minghui Liwang, Lianfen Huang, and Xianbin Wang. 2023. Learning-based client selection for multiple federated learning services with constrained monetary budgets. ICT Express 9, 6 (2023), 1059–1064.
  • Driss et al. (2024) Maryam Ben Driss, Essaid Sabir, Halima Elbiaze, Abdoulaye Baniré Diallo, and Mohamed Sadik. 2024. GWO-Boosted Multi-Attribute Client Selection for Over-The-Air Federated Learning. In 2024 20th International Conference on the Design of Reliable Communication Networks (DRCN). IEEE, Canada, 62–69.
  • Driss et al. (2023) Maryam Ben Driss, Essaid Sabir, Halima Elbiaze, and Walid Saad. 2023. Federated Learning for 6G: Paradigms, Taxonomy, Recent Advances and Insights. arXiv preprint arXiv:2312.04688 32 (2023), 1–31.
  • Ezugwu et al. (2019) Absalom E Ezugwu, Verosha Pillay, Divyan Hirasen, Kershen Sivanarain, and Melvin Govender. 2019. A comparative study of meta-heuristic optimization algorithms for 0–1 knapsack problem: Some initial results. IEEE Access 7 (2019), 43979–44001.
  • Faris et al. (2018) Hossam Faris, Ibrahim Aljarah, Mohammed Azmi Al-Betar, and Seyedali Mirjalili. 2018. Grey wolf optimizer: a review of recent variants and applications. Neural computing and applications 30 (2018), 413–435.
  • Fu et al. (2023) Lei Fu, Huanle Zhang, Ge Gao, Mi Zhang, and Xin Liu. 2023. Client selection in federated learning: Principles, challenges, and opportunities. IEEE Internet of Things Journal 10, 24 (2023), 5265–5275.
  • Huang et al. (2022) Tiansheng Huang, Weiwei Lin, Li Shen, Keqin Li, and Albert Y Zomaya. 2022. Stochastic client selection for federated learning with volatile clients. IEEE Internet of Things Journal 9, 20 (2022), 20055–20070.
  • Huang et al. (2020) Tiansheng Huang, Weiwei Lin, Wentai Wu, Ligang He, Keqin Li, and Albert Y Zomaya. 2020. An efficiency-boosting client selection scheme for federated learning with fairness guarantee. IEEE Transactions on Parallel and Distributed Systems 32, 7 (2020), 1552–1564.
  • Kang and Ahn (2023) Dongseok Kang and Chang Wook Ahn. 2023. GA Approach to Optimize Training Client Set in Federated Learning. IEEE Access 11 (2023), 85489–85500.
  • Khan et al. (2020) Latif U Khan, Shashi Raj Pandey, Nguyen H Tran, Walid Saad, Zhu Han, Minh NH Nguyen, and Choong Seon Hong. 2020. Federated learning for edge networks: Resource optimization and incentive mechanism. IEEE Communications Magazine 58, 10 (2020), 88–93.
  • Li et al. (2019) Fengjiao Li, Jia Liu, and Bo Ji. 2019. Combinatorial sleeping bandits with fairness constraints. IEEE Transactions on Network Science and Engineering 7, 3 (2019), 1799–1813.
  • Luzón et al. (2024) M Victoria Luzón, Nuria Rodríguez-Barroso, Alberto Argente-Garrido, Daniel Jiménez-López, Jose M Moyano, Javier Del Ser, Weiping Ding, and Francisco Herrera. 2024. A tutorial on federated learning from theory to practice: Foundations, software frameworks, exemplary use cases, and selected trends. IEEE/CAA Journal of Automatica Sinica 11, 4 (2024), 824–850.
  • Makhadmeh et al. (2023) Sharif Naser Makhadmeh, Mohammed Azmi Al-Betar, Iyad Abu Doush, Mohammed A Awadallah, Sofian Kassaymeh, Seyedali Mirjalili, and Raed Abu Zitar. 2023. Recent advances in Grey Wolf Optimizer, its versions and applications. IEEE Access 12 (2023), 1–38.
  • Mannor and Tsitsiklis (2004) Shie Mannor and John N Tsitsiklis. 2004. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research 5, Jun (2004), 623–648.
  • Mayhoub and M. Shami (2024) Samara Mayhoub and Tareq M. Shami. 2024. A Review of Client Selection Methods in Federated Learning. Archives of Computational Methods in Engineering 31, 2 (2024), 1129–1152.
  • Mirjalili et al. (2014) Seyedali Mirjalili, Seyed Mohammad Mirjalili, and Andrew Lewis. 2014. Grey wolf optimizer. Advances in engineering software 69 (2014), 46–61.
  • Mohamed et al. (2024) Aissa Hadj Mohamed, Allan M de Souza, Joahannes B D Da Costa, Leandro A Villas, and Julio Cesar Dos Reis. 2024. CCSF: Clustered Client Selection Framework for Federated Learning in non-IID Data. In Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing (UCC ’23). Association for Computing Machinery, New York, NY, USA, Article 36, 8 pages. https://doi.org/10.1145/3603166.3632563
  • Nishio and Yonetani (2019) Takayuki Nishio and Ryo Yonetani. 2019. Client selection for federated learning with heterogeneous resources in mobile edge. In ICC 2019-2019 IEEE international conference on communications (ICC). IEEE, USA, 1–7.
  • Nopiah et al. (2010) ZM Nopiah, MI Khairir, Shahrum Abdullah, MN Baharin, and A Arifin. 2010. Time complexity analysis of the genetic algorithm clustering method. In Proceedings of the 9th WSEAS international conference on signal processing, robotics and automation, ISPRA, Vol. 10. ACM, USA, 171–176.
  • Ottens et al. (2017) Brammert Ottens, Christos Dimitrakakis, and Boi Faltings. 2017. DUCT: An upper confidence bound approach to distributed constraint optimization problems. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 5 (2017), 1–27.
  • Park et al. (2022) Junha Park, Jiseon Moon, Taekyoon Kim, Peng Wu, Tales Imbiriba, Pau Closas, and Sunwoo Kim. 2022. Federated learning for indoor localization via model reliability with dropout. IEEE Communications Letters 26, 7 (2022), 1553–1557.
  • Qu et al. (2022) Zhe Qu, Rui Duan, Lixing Chen, Jie Xu, Zhuo Lu, and Yao Liu. 2022. Context-aware online client selection for hierarchical federated learning. IEEE Transactions on Parallel and Distributed Systems 33, 12 (2022), 4353–4367.
  • Ruan et al. (2021) Yichen Ruan, Xiaoxi Zhang, and Carlee Joe-Wong. 2021. How valuable is your data? optimizing client recruitment in federated learning. In 2021 19th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt). IEEE, USA, 1–8.
  • Sharma and Kaur (2023) Mradula Sharma and Parmeet Kaur. 2023. Reliable federated learning in a cloud-fog-IoT environment. The Journal of Supercomputing 79, 14 (2023), 1–24.
  • Shi et al. (2023) Fang Shi, Weiwei Lin, Lisheng Fan, Xiazhi Lai, and Xiumin Wang. 2023. Efficient client selection based on contextual combinatorial multi-arm bandits. IEEE Transactions on Wireless Communications 22, 8 (2023), 5265–5277.
  • Smestad and Li (2023) Carl Smestad and Jingyue Li. 2023. A Systematic Literature Review on Client Selection in Federated Learning. arXiv preprint arXiv:2306.04862 10 (2023), 2–11.
  • Thengade and Dondal (2012) Anita Thengade and Rucha Dondal. 2012. Genetic algorithm–survey paper. In MPGI national multi conference. Citeseer, UQA, 7–8.
  • Wang et al. (2020) Hao Wang, Zakhary Kaplan, Di Niu, and Baochun Li. 2020. Optimizing federated learning on non-iid data with reinforcement learning. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications. IEEE, USA, 1698–1707.
  • Wen et al. (2023) Jie Wen, Zhixia Zhang, Yang Lan, Zhihua Cui, Jianghui Cai, and Wensheng Zhang. 2023. A survey on federated learning: challenges and applications. International Journal of Machine Learning and Cybernetics 14, 2 (2023), 513–535.
  • Wu et al. (2023) Wentai Wu, Ligang He, Weiwei Lin, and Carsten Maple. 2023. FedProf: Selective federated learning based on distributional representation profiling. IEEE Transactions on Parallel and Distributed Systems 34, 6 (2023), 1942–1953.
  • Xia et al. (2020) Wenchao Xia, Tony QS Quek, Kun Guo, Wanli Wen, Howard H Yang, and Hongbo Zhu. 2020. Multi-armed bandit-based client scheduling for federated learning. IEEE Transactions on Wireless Communications 19, 11 (2020), 7108–7123.
  • Xiao et al. (2024) Bingnan Xiao, Xichen Yu, Wei Ni, Xin Wang, and H Vincent Poor. 2024. Over-the-air federated learning: Status quo, open challenges, and future directions. Fundamental Research 16 (2024), arXiv–2307.
  • Yang et al. (2020) Zhaohui Yang, Mingzhe Chen, Walid Saad, Choong Seon Hong, Mohammad Shikh-Bahaei, H Vincent Poor, and Shuguang Cui. 2020. Delay minimization for federated learning over wireless communication networks. arXiv preprint arXiv:2007.03462 7, 1 (2020), arXiv–2007.
  • Zhang et al. (2023) Weiwen Zhang, Yanxi Chen, Yifeng Jiang, and Jianqi Liu. 2023. Delay-Constrained Client Selection for Heterogeneous Federated Learning in Intelligent Transportation Systems. IEEE Transactions on Network Science and Engineering 11, 1 (2023), 85489–85500.
  • Zheng et al. (2021) Jingjing Zheng, Kai Li, Eduardo Tovar, and Mohsen Guizani. 2021. Federated learning for energy-balanced client selection in mobile edge computing. In 2021 International Wireless Communications and Mobile Computing (IWCMC). IEEE, USA, 1942–1947.
  • Zhu et al. (2022) Hongbin Zhu, Yong Zhou, Hua Qian, Yuanming Shi, Xu Chen, and Yang Yang. 2022. Online client selection for asynchronous federated learning with fairness consideration. IEEE Transactions on Wireless Communications 22, 4 (2022), 2493–2506.
  • Zhu et al. (2024) Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, and Khaled B Letaief. 2024. Over-the-Air Federated Learning and Optimization. IEEE Internet of Things Journal 31, 0 (2024), 2322–2353.
  • Zou et al. (2024) Yifei Zou, Shikun Shen, Mengbai Xiao, Peng Li, Dongxiao Yu, and Xiuzhen Cheng. 2024. Value of Information: A Comprehensive Metric for Client Selection in Federated Edge Learning. IEEE Trans. Comput. 73, 4 (jan 2024), 1152–1164. https://doi.org/10.1109/TC.2024.3355777