Exploiting Machine Learning for Intelligent Reflecting Surfaces in Next-Gen Networks: A Comprehensive Survey

Salim El Ghalbzouri, Houda Chafnaji
Communications Systems Department, INPT, Rabat, Morocco
Email: [email protected], [email protected]

Abstract—In anticipation of next-generation networks, Intelligent Reflecting Surfaces (IRS) are poised to reshape wireless communications. With the increasing reliance on high-frequency waves in 6G, the challenges of signal blockage and reduced long-range efficacy have come to the forefront. IRS emerges as a promising, cost-effective solution for augmenting network coverage and capacity through intelligent optimization of the propagation environment. The escalating complexity of configuring and managing IRS operations, coupled with the proliferation of user-infrastructure interactions, underscores the need for a systematic approach. This paper explores the foundational principles of IRS and highlights the pivotal role of Machine Learning (ML) in realizing their potential. By bridging these technologies, we aim both to stimulate interdisciplinary research and to provide valuable avenues for advancing this rapidly evolving field.

Index Terms—Intelligent reflecting surface, machine learning, 6G communications, wireless networks.

I. INTRODUCTION

Over the last few years, we have witnessed the successful deployment of 5G technology, which provides unprecedented data rates and low latency and enables a myriad of applications. As we advance into the era of 6G, the ongoing quest for high-speed communication continues to shape the vision of wireless network development. There is particular interest in the substantial yet untapped spectrum of higher frequency bands. However, these higher frequencies bring additional difficulties, such as greater vulnerability to obstacles and higher path loss. Beyond these challenges, the inherent limitations of current technologies are becoming increasingly evident, such as high energy consumption and interference in dense urban areas. This necessitates the exploration of novel paradigms that optimize wireless communications by creating a more intelligent and cost-effective propagation environment.

One such paradigm is the advent of Intelligent Reflecting Surfaces (IRS), which are set to revolutionize the wireless communication environment [1]. An IRS is built from a meta-surface of passive elements and can adaptively manipulate electromagnetic wave propagation. This manipulation plays a crucial role in addressing various propagation challenges, setting the stage for enhanced wireless network performance. With this technology, IRS contributes to establishing robust and pervasive wireless connectivity, a key aspect in the progression towards 6G networks [2], [3].

Due to the complexity and dynamic nature of wireless settings, standard optimization methodologies frequently fail to manage IRS adequately. Hence, incorporating cutting-edge technologies such as Machine Learning (ML) [4]–[8] is crucial to fully realizing the potential of IRS. ML has emerged as a key tool for IRS optimization, offering solutions for Channel State Information (CSI) acquisition, beamforming, resource allocation, and more [9]. ML's capacity to learn from data and adapt to changes unlocks a new range of possibilities in IRS-driven wireless networks, hastening the achievement of the 6G vision [10].

Several studies have addressed this topic, but many of them have only scratched the surface of the potential that IRS offers when combined with ML approaches. Our study therefore expands on the groundwork established by earlier research, aiming to offer a more thorough examination while pushing the limits of current research.

Our key contributions in this paper include:
• An in-depth examination of the IRS architecture.
• A comprehensive analysis of the relevance of ML methods to IRS systems, including a comparative evaluation of their advantages and limitations.
• Emphasis on the significance of interdisciplinary studies and practical applications within this dynamically evolving domain.
• A review of potential challenges and future research directions.

In this survey, we conduct a thorough analysis of the interplay between IRS and ML within modern wireless networks. Section II establishes the fundamental framework of IRS, shedding light on concepts critical to their functionality. Section III presents a detailed comparative analysis, revealing how IRS and ML collaboratively shape the forefront of wireless network research. Section IV reviews the emerging challenges and future research prospects in this field, identifying potential hurdles and outlining promising paths for further exploration. The paper concludes with a synthesis of our insights and perspectives.
An IRS fundamentally alters wireless signal propagation through an array of passive elements known as meta-atoms [11], [12]. The operation of a single meta-atom can be represented as

$$Y_{\mathrm{IRS}} = e^{j\phi_i} X_i, \tag{1}$$

where $Y_{\mathrm{IRS}}$ denotes the signal reflected by the $i$-th meta-atom, $X_i$ represents the wave incident on that meta-atom, and $\phi_i$ is the phase shift it induces. This equation captures the essence of IRS functionality: the IRS manipulates the phase of incident electromagnetic waves so that they interfere constructively or destructively at specific points. This tailors the wavefront for optimal signal propagation.

The meta-surface, an integral component of the IRS, is a planar structure composed of numerous such elements. Each element can independently adjust the phase (and amplitude) of incoming waves. The reflection coefficients of the $N$ elements are collected in

$$\mathbf{z} = [\beta_1 e^{j\theta_1}, \beta_2 e^{j\theta_2}, \ldots, \beta_N e^{j\theta_N}]^T. \tag{2}$$

Here, $\beta_i$ and $\theta_i$ represent the amplitude and phase adjustments at the $i$-th element, respectively. This capability enables the IRS to actively reshape wireless channels for improved communication quality.

In the context of network infrastructure, the interaction of the IRS with base stations (BSs) and access points (APs) is pivotal. The IRS enhances signal quality and extends coverage by strategically redirecting signals. This process can be represented as

$$S_{\mathrm{IRS}} = F_{\mathrm{IRS}}(S_{\mathrm{BS/AP}}, \boldsymbol{\Psi}), \tag{3}$$

where $S_{\mathrm{IRS}}$ is the IRS-modified signal, $S_{\mathrm{BS/AP}}$ is the signal from the BSs/APs, and $\boldsymbol{\Psi}$ denotes the IRS configuration matrix. This model highlights how the IRS can focus signals towards intended user terminals, thereby maximizing efficiency and minimizing interference.

Furthermore, IRS plays a vital role in network resource management. By modulating signal trajectories, IRS helps balance the traffic load across BSs and APs. Such strategic signal management is key to ensuring optimal network performance and efficient capacity utilization.

B. STRUCTURE OF IRS

The architecture of intelligent reflecting surfaces is characterized by its multi-layered composition and an intelligent controller. The outermost layer faces the incident signal. It consists of a multitude of reconfigurable metallic patches printed on a dielectric substrate, which directly manipulate the incoming electromagnetic signals. Beneath this layer lies a conductive surface, typically copper, which serves as a ground plane to minimize energy leakage during the reflection process. The innermost layer is a network of control circuits, which dynamically modulates the reflective properties of the surface elements.

Fig. 1. Architecture of IRS

A pivotal element in this structure is the use of positive-intrinsic-negative (PIN) diodes, which control the reflective state of each patch. The operation of these diodes can be modeled as a switch:

$$S_{ij}(t) = \begin{cases} 1, & \text{if the PIN diode is ON at time } t,\\ 0, & \text{if the PIN diode is OFF at time } t, \end{cases} \tag{4}$$

where $S_{ij}(t)$ denotes the state of the $j$-th diode in the $i$-th patch at time $t$. This binary operation allows dynamic modulation of the surface, altering its reflective properties in real time to achieve the desired signal manipulation.

The intelligent controller, an integral part of the IRS, orchestrates the operation of these diodes and thus determines the overall reflective behavior of the IRS. It also acts as a communication gateway to other network components, ensuring that the IRS response is aligned with network demands.

In our system model, we consider a streamlined configuration in which a base station (BS) communicates with a user equipment (UE) in an IRS-enhanced network, as illustrated in Fig. 2. The BS is equipped with a linear array of $M$ antennas, and the IRS is composed of $N$ reflective elements, each capable of modulating incident signals. The signal received at the UE, including the influence of the IRS, is expressed as [13]

$$y_{\mathrm{irs}} = \mathbf{H}_{\mathrm{user}} \boldsymbol{\Psi} \mathbf{H}_{\mathrm{base}} \mathbf{x} + n, \tag{5}$$

where $\mathbf{x} \in \mathbb{C}^{M\times 1}$ denotes the signal transmitted by the BS. The matrix $\mathbf{H}_{\mathrm{base}} \in \mathbb{C}^{N\times M}$ represents the channel from the BS to the IRS, while $\mathbf{H}_{\mathrm{user}} \in \mathbb{C}^{1\times N}$ describes the channel from the IRS to the UE. Noise at the UE is modeled as $n \sim \mathcal{CN}(0, \sigma^2)$. The matrix $\boldsymbol{\Psi} = \mathrm{diag}(\mathbf{z}) \in \mathbb{C}^{N\times N}$ captures the phase shifts (and amplitudes) applied by the IRS, where $\mathbf{z} \in \mathbb{C}^{N\times 1}$ is the reflection vector in (2).
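The following minimal sketch makes the signal model concrete by evaluating Eq. (5) with the reflection vector of Eq. (2). It uses plain Python lists and no external libraries. It also includes an optional direct term corresponding to $\mathbf{H}_{\mathrm{direct}}$ in the model extended with the BS-UE link. The function names (`reflection_vector`, `received_signal`, `cn_noise`) are hypothetical and are not taken from [13]. Because $\boldsymbol{\Psi}$ is diagonal, the matrix product reduces to a sum over the $N$ elements.

```python
import cmath
import random


def reflection_vector(betas, thetas):
    """Eq. (2): z_i = beta_i * exp(j * theta_i) for each of the N IRS elements."""
    return [b * cmath.exp(1j * t) for b, t in zip(betas, thetas)]


def cn_noise(sigma2, rng=random):
    """One sample of n ~ CN(0, sigma2): real and imaginary parts each have variance sigma2 / 2."""
    s = (sigma2 / 2) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def received_signal(h_user, z, h_base, x, h_direct=None, noise=0j):
    """Eq. (5): y = H_user diag(z) H_base x + n, plus H_direct x when h_direct is given.

    h_user:   length-N list, the 1 x N IRS-to-UE channel
    h_base:   N rows of length M, the N x M BS-to-IRS channel
    x:        length-M list, the signal transmitted by the BS
    h_direct: optional length-M list, the 1 x M BS-to-UE channel
    """
    y = noise
    for i, z_i in enumerate(z):
        incident = sum(h * s for h, s in zip(h_base[i], x))  # signal arriving at element i
        y += h_user[i] * z_i * incident                       # reflected by element i toward the UE
    if h_direct is not None:
        y += sum(h * s for h, s in zip(h_direct, x))          # direct BS-to-UE path
    return y


# Two elements, one BS antenna, unit channels: equal phases add constructively,
# while a pi offset makes the two reflections cancel (destructive interference).
h_user, h_base, x = [1, 1], [[1], [1]], [1]
y_in_phase = received_signal(h_user, reflection_vector([1, 1], [0, 0]), h_base, x)
y_opposed = received_signal(h_user, reflection_vector([1, 1], [0, cmath.pi]), h_base, x)
y_noisy = received_signal(h_user, reflection_vector([1, 1], [0, 0]), h_base, x,
                          noise=cn_noise(0.01, random.Random(1)))
```

The element-wise loop mirrors the physics: each meta-atom re-radiates its own incident signal with its own coefficient $z_i$. With NumPy, the same quantity would be `h_user @ np.diag(z) @ h_base @ x`.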
Fig. 2. System model

To account for the direct link between the BS and the UE, we extend the model to include the direct channel path:

$$y_{\mathrm{irs}} = \mathbf{H}_{\mathrm{user}} \boldsymbol{\Psi} \mathbf{H}_{\mathrm{base}} \mathbf{x} + \mathbf{H}_{\mathrm{direct}} \mathbf{x} + n. \tag{6}$$

Here, $\mathbf{H}_{\mathrm{direct}} \in \mathbb{C}^{1\times M}$ characterizes the direct channel from the BS to the UE. This extended model supports a comprehensive analysis of signal reception that covers both the IRS-mediated path and the direct path. Such an analysis is essential for optimizing IRS deployment in practical network environments.

III. UNRAVELING THE SYNERGY BETWEEN IRS AND MACHINE LEARNING: A COMPARATIVE STUDY

In this section, we present an analytical review of machine learning techniques applied in IRS systems. Our focus is twofold: to assess the performance of ML integration in IRS across various metrics, and to identify the challenges inherent in such applications. This analysis, grounded in a thorough literature review, is essential for understanding the nuanced effects of ML methodologies on IRS performance. The surveyed works are summarized in Table I.

A. CHANNEL ESTIMATION

Channel estimation is fundamental to IRS operation, since accurate knowledge of the dynamic state of the wireless channels is crucial for refining signal quality [14]. Research has increasingly adopted machine learning, particularly Deep Learning (DL), to improve the efficacy of such systems.

Considerable attention has been directed towards employing DL in IRS-enhanced massive Multiple-Input Multiple-Output (MIMO) systems [15]–[17]. In one notable approach, each user leverages a dedicated deep network [15] that uses received pilot signals to jointly estimate the direct and cascaded channels. This method can be expressed as

$$\mathbf{Y}_{\mathrm{DL}} = F_{\mathrm{DL}}(\mathbf{X}_{\mathrm{pilot}}, \boldsymbol{\Theta}_{\mathrm{DL}}), \tag{7}$$

where $\mathbf{Y}_{\mathrm{DL}}$ represents the estimated channels, $\mathbf{X}_{\mathrm{pilot}}$ denotes the pilot signals, and $\boldsymbol{\Theta}_{\mathrm{DL}}$ encapsulates the deep network parameters. This method's advantage is its robustness to minor user location shifts of up to 4 degrees. However, its reliance on extensive training data can be a limiting factor in rapidly changing network environments.

In [16], a novel channel estimation strategy with low training overhead was introduced, employing a hybrid passive/active IRS configuration. This method uses a Complex-Valued Denoising Convolutional Neural Network (CV-DnCNN) and significantly reduces the Normalized Mean Square Error (NMSE), defined as

$$\mathrm{NMSE} = \mathbb{E}\left[\frac{\|\mathbf{H}_{\mathrm{actual}} - \mathbf{H}_{\mathrm{estimated}}\|_F^2}{\|\mathbf{H}_{\mathrm{actual}}\|_F^2}\right]. \tag{8}$$

The advantage of this approach lies in its efficient noise reduction, though it may face challenges in environments with highly variable noise characteristics. The deep residual learning method explored in [17] uses a Deep Residual Network (DRN)-based Minimum Mean Square Error (MMSE) estimator derived from the Bayesian MMSE criterion, and it outperforms traditional MMSE estimators. Its strength lies in handling complex channel conditions, yet it requires substantial computational resources.

Further work addresses the challenge of channel extrapolation. An Ordinary Differential Equation (ODE)-based CNN method is proposed in [18] for extrapolating cascaded channels. It gains efficiency through spatial sampling and selective IRS element activation, which improves resource utilization. Additionally, [19] explores a CNN-based model that predicts complete channel characteristics from partially active elements, showing its potential for optimizing IRS-assisted systems in simulations. The authors of [20] introduce a two-phase channel estimation method using passive/active IRS structures. In the first phase, a CNN model determines the strengths and positions of the non-zero entries of sparse channels, an approach more efficient than traditional compressed sensing (CS) methods. The second phase reconstructs the channel with the least squares (LS) method, based on the identified sparse components. Simulation results show that their method has lower complexity than existing ones for systems of manageable size.

In [21], channel estimation in IRS-enhanced MIMO Orthogonal Frequency Division Multiplexing (OFDM) systems is treated as an image super-resolution problem using a proposed network named SRDnNet. This approach achieves an NMSE improvement of over 10 dB. The results also indicate that at least eight pilots are needed for reliable channel estimation at a signal-to-noise ratio (SNR) of 5 dB. While this method excels at reducing data size and computation cost, its performance is highly dependent on the quality of the training data.

Federated learning (FL), explored in [22] for IRS-assisted massive MIMO systems, requires at least 5-bit quantization and an SNR of 15 dB for reliable performance. FL's robustness to information loss of up to 5% shows its potential for real-world applications, though its performance can degrade slightly under imperfect label conditions.
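The learning-based estimators above are usually compared with classical estimators. As one such reference point, the sketch below shows least-squares (LS) estimation of the per-element cascaded channel. This is the LS principle that [20] applies in its second phase. The sketch also computes the normalized squared error $\|\mathbf{h}-\hat{\mathbf{h}}\|^2/\|\mathbf{h}\|^2$ for one realization. It rests on several illustrative assumptions that are not the designs of [15]–[22]:

- a single-antenna BS and UE;
- no direct link;
- unit pilot symbols;
- $N$ pilot slots in which the IRS cycles through the rows of an $N\times N$ DFT matrix.

All function names are hypothetical.

```python
import cmath
import random


def dft_patterns(n):
    """Hypothetical training design: in pilot slot t, the IRS applies row t of an N x N DFT matrix."""
    return [[cmath.exp(-2j * cmath.pi * t * i / n) for i in range(n)] for t in range(n)]


def pilot_observations(h, patterns, sigma2=0.0, rng=random):
    """Received pilots y_t = sum_i v[t][i] * h_i + n_t with a unit pilot symbol.

    h_i is the cascaded coefficient of element i (IRS-to-UE gain times BS-to-IRS gain)
    and n_t ~ CN(0, sigma2).
    """
    s = (sigma2 / 2) ** 0.5
    return [
        sum(v * hi for v, hi in zip(row, h)) + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        for row in patterns
    ]


def ls_cascaded(y, patterns):
    """LS estimate h_hat = (V^H V)^-1 V^H y; for the DFT design V^H V = N I, so h_hat = V^H y / N."""
    n = len(patterns)
    return [sum(patterns[t][i].conjugate() * y[t] for t in range(n)) / n for i in range(n)]


def nmse(h_true, h_est):
    """Normalized squared error ||h - h_hat||^2 / ||h||^2 for one channel realization."""
    err = sum(abs(a - b) ** 2 for a, b in zip(h_true, h_est))
    return err / sum(abs(a) ** 2 for a in h_true)


rng = random.Random(0)
N = 8
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
V = dft_patterns(N)
h_clean = ls_cascaded(pilot_observations(h, V), V)  # noiseless pilots: exact recovery
h_noisy = ls_cascaded(pilot_observations(h, V, sigma2=0.1, rng=rng), V)
print(f"NMSE noiseless: {nmse(h, h_clean):.2e}, NMSE with noise: {nmse(h, h_noisy):.2e}")
```

The DFT patterns are chosen because their columns are orthogonal, which turns the LS solution into a scaled correlation with no matrix inversion. The training overhead still grows linearly with $N$. Reducing that overhead is precisely what the sparsity-aware and learning-based methods surveyed here aim to do. Averaging `nmse` over many channel draws gives the expectation used in the NMSE metric.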

TABLE I
References | Aspect of IRS | ML technique | Models
[14]–[22] | Channel estimation | DL, SL, UL, SSL, RL, FL | SRDnNet, CV-DnCNN, ODE-CNN, CsiNet, DreL, KGNet
[22]–[26], [28]–[33] | Beamforming | DL, SL, UL, RL, FL | DNN, DQN, LPSNet, DQN
[34]–[38], [41], [42] | Resource management | DL, UL, SSL, FL, DRL | AirFL, DNN, D-DQN, MDQN, DDPG, DL-MDC, CNN
[43], [44] | Energy efficiency | DL, DRL | DDPG, LSTM-based ESN, D3QN

Despite this progress, challenges remain, such as pilot contamination and adaptability to changes in user location. These challenges drive continued research into and improvement of ML methods for IRS channel estimation.

B. BEAMFORMING / PHASE SHIFT CONFIGURATION

Beamforming, a key technique in IRS operations, adjusts signal phases and amplitudes to target specific receivers, enhancing network efficiency. The high dimensionality and complexity of these systems make them well suited to machine learning.

Huang et al. [23] and Gong et al. [24] independently used Deep Reinforcement Learning (DRL) for beamforming optimization in IRS-assisted systems. Huang et al. developed a DRL-based framework for IRS-assisted MIMO that tackles large-dimension optimization problems with low implementation complexity and adapts to various system settings [23]. This framework can be represented as

$$\mathrm{Optimization}_{\mathrm{DRL}} = \arg\max_{\theta} R(\theta), \tag{9}$$

where $R(\theta)$ is the reward function under policy parameters $\theta$. Gong et al. introduced an optimization-driven DRL framework that improves learning efficiency and reward performance compared with traditional model-free DRL methods [24]. However, this approach may require extensive training data, a potential limitation in dynamic IRS environments.

The authors of [25] developed an efficient DRL-based framework for phase shift design in the IRS-aided downlink of multiple-input single-output (MISO) wireless communication systems. This approach outperformed the fixed-point iteration algorithm and closely approached the upper bound given by the Semidefinite Relaxation (SDR) algorithm, while significantly reducing computation time:

$$\mathrm{Performance}_{\mathrm{DRL}} \approx \mathrm{Upper\ Bound}_{\mathrm{SDR}}, \qquad T_{\mathrm{DRL}} \ll T_{\mathrm{SDR}}, \tag{10}$$

where $T_{\mathrm{DRL}}$ and $T_{\mathrm{SDR}}$ denote the respective computation times.

A Deep Deterministic Policy Gradient (DDPG) algorithm was proposed for phase shift design in Non-Orthogonal Multiple Access (NOMA) downlink networks, exploiting the strong fitting capabilities of neural networks [26]. The DDPG algorithm combines a Q-network (critic) with a deterministic policy network (actor) and showed notable performance improvements.

A different approach is presented in [27]. There, a nature-inspired optimization technique, the Harris Hawks Optimizer (HHO), jointly optimizes transmit beamforming at the Access Point (AP) and passive reflect beamforming at the IRS. The goal is to maximize the user's received signal power in an IRS-aided MISO wireless network. By integrating the constraints into the fitness function, the HHO algorithm tackles the formulated non-convex optimization problem. The simulation results of this study show that the HHO-based scheme achieves outcomes comparable to, if not better than, those of existing algorithms. This illustrates the potential of nature-inspired algorithms in complex wireless network scenarios. The method can be represented as

$$\mathrm{Optimization}_{\mathrm{HHO}} = \mathrm{HHO}(\text{Fitness Function}, \text{Constraints}), \tag{11}$$

where the fitness function is designed to maximize the received signal power subject to the system's constraints. The success of the HHO-based approach in simulations indicates its viability as a robust alternative to traditional optimization methods in IRS-aided networks.

To exploit group phase shifts and channel correlation, [28] proposed grouping IRS elements based on the expected group phase shift and the correlation between adjacent Reflecting Elements (REs). Aygul et al. [29] and Khan et al. [30] pioneered the use of deep learning models for phase and channel optimization in IRS-assisted systems. Aygul et al. introduced an algorithm that leverages channel correlation to boost data rates [29]. The detector of Khan et al., trained offline, demonstrated improved Bit Error Rate (BER) performance and adapted to dynamic channel conditions [30]. A Q-learning-based algorithm aimed at minimizing network latency was suggested in [31]. Additionally, a Generative Adversarial Network-Distributed Deep Reinforcement Learning (GAN-DDRL) approach was proposed for multi-IRS networks [31]. It jointly optimizes beamforming, phase shifts, and IRS locations, with a focus on enhancing 6G network performance.

Song et al. [32] employed unsupervised learning for joint active and passive beamforming design in IRS-aided multi-user MISO systems. They achieved competitive performance with lower complexity, which facilitates real-time beamforming configuration. A Zeroth-Order Stochastic Gradient Ascent (ZoSGA) method was introduced for model-free optimal beamforming in passive IRS-assisted networks, showing that ZoSGA can learn near-optimal passive IRS beamformers from effective CSI [33]. In [34], a low-complexity DL-based approach for joint beamforming design in IRS-aided MIMO multiple-antenna eavesdropper (MIMOME) systems performed close to conventional Alternating Optimization (AO) algorithms. This indicates its potential for complex network scenarios.
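The learned beamformers above are typically benchmarked against model-based designs. The sketch below gives a minimal, non-ML reference for the single-antenna special case of Eq. (6). It assumes $M = 1$ and unit-amplitude reflections ($\beta_i = 1$), so each element contributes a scalar cascaded coefficient $c_i$. The received power $|h_d + \sum_i c_i e^{j\theta_i}|^2$ is then maximized in closed form by aligning every reflected path with the direct path, $\theta_i = \arg(h_d) - \arg(c_i)$. This is a standard result for single-user single-antenna links, not the method of any of [23]–[34]. The `quantize` step is an added assumption that mimics the discrete phase levels of practical elements such as PIN-diode-based patches. All function names are hypothetical.

```python
import cmath
import random


def aligned_phases(h_direct, cascaded):
    """Closed-form optimum for unit-amplitude elements: theta_i = arg(h_d) - arg(c_i),
    so every reflected path c_i * exp(j * theta_i) has the same phase as h_d."""
    ref = cmath.phase(h_direct)
    return [ref - cmath.phase(c) for c in cascaded]


def quantize(thetas, bits):
    """Round each phase to the nearest of 2**bits equally spaced levels (b-bit phase control)."""
    step = 2 * cmath.pi / 2 ** bits
    return [step * round(t / step) for t in thetas]


def channel_gain(h_direct, cascaded, thetas):
    """Received power per unit transmit power: |h_d + sum_i c_i * exp(j * theta_i)|^2."""
    eff = h_direct + sum(c * cmath.exp(1j * t) for c, t in zip(cascaded, thetas))
    return abs(eff) ** 2


rng = random.Random(0)
N = 16
h_d = complex(rng.gauss(0, 1), rng.gauss(0, 1))                   # direct BS-UE coefficient
c = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]  # cascaded coefficients

best = aligned_phases(h_d, c)
gain_opt = channel_gain(h_d, c, best)
gain_1bit = channel_gain(h_d, c, quantize(best, 1))
gain_rand = channel_gain(h_d, c, [rng.uniform(0, 2 * cmath.pi) for _ in range(N)])
upper = (abs(h_d) + sum(abs(ci) for ci in c)) ** 2  # triangle-inequality bound, attained by alignment
print(f"aligned: {gain_opt:.1f}, 1-bit: {gain_1bit:.1f}, random: {gain_rand:.1f}, bound: {upper:.1f}")
```

With multiple BS antennas or multiple users, no such closed form exists in general. That is where the SDR, AO, and learning-based designs discussed above become necessary.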
C. RESOURCE MANAGEMENT

Resource management is a critical aspect of IRS-supported wireless communication systems, involving the efficient use of resources to optimize system performance. Integrating machine learning into this domain has yielded notable improvements. It has shown the flexibility of these techniques in complex scenarios and moved closer to the full exploitation of IRS in diverse wireless communication setups.

In [35], a three-step Convex Optimization (CO)-based algorithm was developed to enhance IRS reflection and user connectivity in multi-user networks. The algorithm improved data rates by up to 30% over standard IRS-user connections and by up to 100% over non-IRS scenarios. The relative improvement is expressed as

$$\mathrm{Improvement}_{\mathrm{rate}} = \frac{\mathrm{Data\ rate}_{\mathrm{CO}} - \mathrm{Data\ rate}_{\mathrm{Baseline}}}{\mathrm{Data\ rate}_{\mathrm{Baseline}}}. \tag{12}$$

The researchers also simplified this algorithm by transforming the IRS-user connection problem into a regression problem and applying machine learning to approximate its optimal solutions. The resulting ML-based IRS controller achieved similar data rates with significantly lower computation time than the CO-based method.

Wu et al. [36] explored resource allocation optimization in IRS-enhanced OFDM systems using a hybrid Multiple Deep Q-Networks (MDQN)-DDPG framework, which handles mixed discrete and continuous actions. They also introduced a hybrid Dueling Double Deep Q-Network-Twin Delayed Deep Deterministic policy gradient (D3QN-TD3) algorithm to improve convergence speed and stability in spectrum-sharing scenarios. Simulation results highlighted the effectiveness of these algorithms, which achieved higher transmission rates than benchmark schemes. Ranjan et al. [37] proposed a Gradient Ascent (GA) algorithm for IRS coefficient optimization in OFDM systems, focusing on sum rate maximization. This approach reformulates the complex-valued phase coefficient optimization as a more manageable real-valued problem, improving its operational feasibility.

Al Hammadi et al. [38] introduced a deep Q-learning system for spectral efficiency maximization in a Time Division Multiple Access (TDMA)-based IRS-assisted Visible Light Communication (VLC) network. The system works in two stages. First, it allocates IRS mirrors using the Maximum Possible Fairness (MPF) algorithm. Then, a Deep Q-Learning (DQL) algorithm optimizes segment orientation, Light-Emitting Diode (LED) power allocation, and LED-user associations. The DQL-MPF algorithm outperformed the baselines and converged quickly. Perdana et al. [41] examined adaptive user pairing in multi-IRS-aided massive MIMO-NOMA systems, addressing Spectral Efficiency (SE) maximization under BS power limits and user-specific Quality of Service (QoS) requirements. They proposed real-time optimization using deep learning to predict optimal solutions from UE locations and channel gains.

Recent advancements in IRS technology are comprehensively addressed in [39], focusing on the IRSE-6G vision. The key challenges identified include:
• developing novel network architectures with multiple IRSs;
• characterizing the limits of IRS-empowered communications;
• designing trade-off solutions between connectivity, energy efficiency, safety, and accuracy.
This work represents a holistic advancement in IRS integration within 6G networks. In [40], a novel IRS self-sensing system is proposed. It uses the IRS controller to transmit probing signals and dedicated sensors for precise location/angle estimation. The system applies the MUltiple SIgnal Classification (MUSIC) algorithm for accurate Direction-of-Arrival (DOA) estimation and optimizes the IRS reflection matrix. Together, these highlight the system's efficacy in autonomous localization tasks, marking a significant step forward in IRS technology.

The Intelligent Spectrum Learning (ISL) algorithm, introduced in [42], used CNNs for interference management in IRS-aided multi-user uplink networks. This approach enabled IRSs to detect interfering signals directly. It coupled this capability with a distributed control algorithm that optimizes the Signal-to-Interference-plus-Noise Ratio (SINR) by dynamically configuring IRS elements. The ISL-aided IRS approach showed enhanced performance in simulations. Lastly, Lu et al. [43] proposed an IRS-based hybrid precoding architecture for Terahertz (THz) communication. It maximizes the user sum rate while accounting for the discrete phase shifts of IRS-based hybrid precoding. They introduced the Deep Learning based Multiple Discrete Classification (DL-MDC) hybrid precoding algorithm, which significantly reduced runtime compared with traditional methods while maintaining sum-rate performance.

These advancements in resource management for IRS-assisted systems demonstrate the critical role of machine learning in optimizing complex network scenarios. However, challenges remain in terms of scalability and real-world implementation complexity, driving continued research and innovation in this field.

D. ENERGY EFFICIENCY

This subsection examines the role of machine learning in optimizing energy consumption in IRS systems, a critical factor in advancing sustainability in wireless networks.

The study in [44] introduced a novel strategy to enhance energy efficiency in BS operations with IRS support. It formulates an optimization problem that maximizes the average energy utilization. The BS can dynamically adjust its transmit power, the IRS phase-shift angles, and the reflector states. Deep reinforcement learning enables the BS to configure the network adaptively in uncertain wireless environments. The reported gain is

$$\text{Efficiency Improvement} = 21.1\%, \tag{13}$$

indicating significant enhancements in energy efficiency and harvested energy (up to twice the baseline amount). However, the reliance on deep learning necessitates substantial computational resources for training and optimization.

In [45], the focus shifts to joint deployment, phase shift control, power allocation, and dynamic decoding order in IRS-enhanced wireless systems using Non-Orthogonal Multiple Access (NOMA). The study introduced a long short-term memory (LSTM)-based Echo State Network (ESN) and a D3QN-based algorithm. The D3QN algorithm uses double Q-learning and a decaying ε-greedy policy. After adequate training, it achieved better performance and faster convergence than DQN:

$$\text{Convergence Rate}_{\mathrm{D3QN}} > \text{Convergence Rate}_{\mathrm{DQN}}. \tag{14}$$

Meanwhile, the LSTM-based ESN algorithm effectively balanced prediction accuracy and computational complexity. This approach significantly outperformed benchmark methods in energy efficiency, showing the potential of integrating advanced machine learning techniques into IRS system design.

Overall, these studies illustrate the potential of machine learning to transform energy optimization in IRS-assisted wireless networks. While promising, these ML-based approaches remain complex and computationally demanding, underscoring the need for further research and development in this domain.

In this section, we examine the challenges and future research directions arising from the interplay between IRS and ML in next-generation wireless networks. This investigation reveals both the areas that require resolution and the transformative potential on the horizon.

• Robustness in Varied Conditions: Ensuring consistent ML model performance across diverse operational scenarios is challenging. Developing adaptive learning algorithms that can update model parameters in real time is a potential solution.

B. FUTURE RESEARCH OPPORTUNITIES

The integration of AI in IRS within 6G networks opens several research avenues:
• Intelligent Network Configuration: Exploring AI-driven methods for real-time, dynamic IRS network adaptation, focusing on adaptive control and optimization strategies.
• Next-Generation Beamforming: Investigating AI-enhanced beamforming techniques, especially those using deep reinforcement learning, to improve network efficiency and interference management.
• Resource Optimization in ML Models: Developing techniques for resource-efficient ML, such as lightweight neural networks and on-device AI, to mitigate computational constraints in IRS systems.
• Advancing Explainable AI: Enhancing explainable AI (XAI) [46] for IRS systems to provide insight into AI decision-making processes, which is crucial for critical communication applications.
• O-RAN Integration: Researching the integration of IRS with the O-RAN architecture [47], focusing on innovative network management and optimization strategies.
