Deep Learning-Based Structural Health Monitoring

Automation in Construction 161 (2024) 105328

Contents lists available at ScienceDirect

Automation in Construction
journal homepage: www.elsevier.com/locate/autcon

Review

Deep learning-based structural health monitoring


Young-Jin Cha a,*, Rahmat Ali a, John Lewis a, Oral Büyüköztürk b
a Department of Civil Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada
b Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

ARTICLE INFO

Keywords:
Deep learning
Automation
Digital twin
Damage identification
Defect detection
Infrastructure monitoring
Physics-informed

ABSTRACT

This article provides a comprehensive review of deep learning-based structural health monitoring
(DL-based SHM). It encompasses a broad spectrum of DL theories and applications including
nondestructive approaches; computer vision-based methods, digital twins, unmanned aerial vehicles
(UAVs), and their integration with DL; vibration-based strategies including sensor fault and data
recovery methods; and physics-informed DL approaches. Connections between traditional machine
learning and DL-based methods as well as relations of local to global approaches including their
extensive integrations are established. The state-of-the-art methods, including their advantages
and limitations are presented. The review draws on current literature on the topic, also providing
a synergistic analysis leading to the understanding of the evolution of DL as a basis for
presenting the future research and development needs. Our overall finding is that despite the
rapid progression of digital technology along with the progression of DL, the DL-based SHM appears
to be in its infant stages with enormous potential for future developments to bring the SHM
technology to a common practical use with wide scope applications, performance reliability, cost,
and degree of automation. It is anticipated that this review paper will serve as a basic resource
for readers seeking comprehensive and holistic understanding of the subject matter.

1. Introduction
2. Deep learning algorithms
2.1. Various learning modes
2.2. Core deep learning operators
2.2.1. Convolution
2.2.2. Pooling
2.2.3. Depthwise and pointwise convolutions
2.2.4. Dilated/atrous convolution
2.2.5. Atrous spatial pyramid pooling
2.2.6. Upsampling
2.2.7. Activation functions
2.2.8. Residual/skipped connection
2.2.9. Attention modules
2.3. Various DL networks
2.3.1. CNN
2.3.2. Faster R-CNN
2.3.3. Mask R-CNN
2.3.4. VGG net
2.3.5. Residual net
2.3.6. DenseNet
2.3.7. Deep autoencoder
2.3.8. FCN
2.3.9. UNet

* Corresponding author.
E-mail address: [email protected] (Y.-J. Cha).

https://doi.org/10.1016/j.autcon.2024.105328
Received 25 August 2023; Received in revised form 3 January 2024; Accepted 6 February 2024
Available online 8 March 2024
0926-5805/© 2024 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
2.3.10. RNNs
2.3.11. Generative adversarial networks
2.3.12. Transformer
2.3.13. Various DL frameworks
3. Approaches and applications for SHM with deep learning
3.1. NDT-based imaging techniques for subsurface damage detection
3.1.1. Thermography
3.1.2. Eddy current techniques
3.1.3. GPR
3.1.4. Ultrasonic method
3.1.5. Acoustic emission
3.1.6. Summary of NDT based methods
3.2. Computer vision-based approaches for surface damage identification
3.2.1. Damage classification
3.2.2. Bounding box level damage detection
3.2.3. Pixel level damage segmentation
3.2.4. Summary of computer vision based approaches
3.3. Integration of DL and UAV
3.3.1. Summary of integration of DL and UAV
3.4. Digital twin with DL for SHM
3.4.1. Application of 3D scanning for 3D reconstruction
3.4.2. Application of 2D photogrammetry for 3D reconstruction
3.4.3. Application of DL for damage identification using 3D reconstruction
3.4.4. Summary of digital twin and 3D reconstruction
3.5. Vibration-based approaches
3.5.1. Supervised approaches
3.5.2. Measured data recovery or prediction methods
3.5.3. Sensor fault detection approaches
3.5.4. Unsupervised approaches
3.5.5. Summary of vibration based approaches
3.6. Physics-informed DL networks
3.6.1. Major approaches within physics-informed deep learning
3.6.2. Applications in SHM
3.6.3. Summary of physics-informed deep learning
4. Evolution of deep learning-based SHM: A summary
5. Future research directions
6. Conclusion
Statement
CRediT authorship contribution statement
Declaration of competing interest
References

1. Introduction

SHM of civil infrastructures is an extensive field that focuses on detecting structural defects,
monitoring structural conditions, and assessing a structure's safety based on long-term data
available from various types of sensors installed in the structural systems. SHM is an essential
process for maintenance, anomaly detection, and improving the serviceability of civil
infrastructures through the repair and disaster management of structures. Civil infrastructures
gradually deteriorate due to continuous use, thereby losing their designed function. This
unavoidable process increases the importance of regular and early maintenance.

At the early stages of SHM, researchers discovered, through vibration testing and continuous
measurement of the audio-frequency dynamic modulus and damping of specimens subjected to tensile
loading, that structural damage often manifests as changes in structural properties such as
strength (stiffness) and damping [1]. These changes in structural properties also lead to
alterations in dynamic modal properties (i.e., natural frequencies, mode shapes, and damping). To
identify these changes and assess damage, different global and local damage identification methods
have been developed.

Global methods focus on detecting changes in the modal properties of the monitored system, as
these changes significantly affect the dynamic vibration characteristics of the structure. On the
other hand, local methods involve visual inspections and localized nondestructive testing (NDT)
methods. Typically, local methods rely on information obtained from global methods to further
assess the severity of localized damage.

Global methods are commonly referred to as vibration-based methods because they heavily rely on
vibration measurements. Within the vibration-based methods, three branches exist: data-driven
methods, physics model-based methods, and hybrid methods. Most of these techniques aim to
calculate modal properties in both the intact (baseline) and damaged structures using measured
vibrations or developed finite element models (FEM). For instance, Vandiver [2] extracted natural
frequencies from measured acceleration time series, and Yuen [3] employed mode shapes for
structural damage identification.

Subsequently, numerous approaches have been proposed in vibration measurement-based damage
detection methods to extract damage-sensitive features from measured vibrations. These
damage-sensitive features encompass not only changes in modal properties and their subsequent
processing but also spatio-temporal features within the vibration time series for various
applications, including offshore structures, civil bridges, buildings, and rotating mechanical
systems.

As a result, the traditional damage identification process involves three main stages:
measurement, feature extraction, and classification using various machine learning (ML) and
classification methods. Different signal processing techniques, such as Fourier transformation
[4], wavelet analysis [5], auto-regressive moving average [6], Hilbert
transform [7], etc., have been employed to extract these damage-sensitive features.

Extensive investigations of traditional ML and classification methods have also been conducted to
classify the extracted features as the structure transitions from an intact state to a damaged
state. Some of the core and pioneering applications include artificial neural networks (ANN) [8],
Bayesian probabilistic approaches [9], fuzzy logic [10], simple genetic algorithm (GA) [11],
multiobjective GA [12], and support vector machines (SVM) [13].

However, damage identification relying on modal properties and damage-sensitive features extracted
from various signal processing techniques may not be effective in real-world environments, where
various noises and uncertainties, including temperature changes, can be present [14]. While
several studies have attempted to address these challenges using algorithms like curve fitting
methods, these approaches have limitations in detecting small but critical damage. As a result,
these traditional methods tend to work well only for simple structures or idealized numerical FEMs
with minimal measurement errors and uncertainties.

In contrast, computer vision-based approaches have gained significant interest because they offer
explicit, clear visual evidence of damage within images [15]. Various image processing techniques
have been utilized for crack detection in red, green, and blue (RGB) images [16], including fast
Haar transform, FFT, Sobel edge detector, and Canny edge detector. Moreover, different image
processing techniques have been employed for the extraction of damage-sensitive features related
to different types of structural damage in RGB images, such as Hough transform [17] and wavelet
analysis [18].

Furthermore, computer vision-based approaches have been developed for vibration measurements
[19–22] and strain measurements [23]. These methods enable the measurement of strain and reliable
structural responses, such as displacements and accelerations, without requiring a physical
stationary reference point. The measured responses are then used as input for vibration-based
damage detection.

However, even with computer vision-based approaches, there is a need to formulate or manually
extract damage-sensitive features. Additionally, these approaches often detect only one type of
damage, and their performance is highly dependent on the lighting conditions of the images.
Blurry, shadowed, or unevenly lit images can negatively impact the performance of traditional
image processing techniques in extracting damage-sensitive features, which is exacerbated by the
limited classification ability of traditional ML methods.

As a result, for over three decades, the field of SHM has faced a complex and longstanding
challenge in formulating and extracting damage-sensitive features that are robust to ambient
temperatures, noise, and changes in lighting conditions. Furthermore, there is a growing demand to
develop more efficient and robust ML algorithms capable of classifying the extracted
damage-sensitive features accurately.

To overcome the fundamental limitations and difficulties associated with manually extracting
damage-sensitive features and classifying them using traditional ML methods, Cha et al. [24]
proposed a deep learning (DL)-based approach for damage detection utilizing a deep convolutional
neural network (CNN). Deep learning methods are representation-learning techniques that involve
multiple levels of representation, achieved by combining simple yet nonlinear modules that
automatically extract features from raw input data and pass them to deeper modules as higher-level
representations [25]. Deep learning has demonstrated successful applications in various domains,
such as speech recognition, object detection, genomics, and many others.

Due to its nature, DL can automatically extract robust multilevel damage-sensitive features from
raw input images by training on large labeled datasets. The CNN architecture in Cha et al. [24]
was designed to detect concrete cracks using a defined size of a sliding window to localize the
detected cracks in RGB images. The results were quite successful, with a detection accuracy of
97%. Despite variations in RGB image conditions, such as blurriness, spot-lighting, shadows, and
out-of-focus elements, among other challenges, the trained CNN still detected cracks,
demonstrating its ability to address various environmental and sensing uncertainties and noise.

Comparative studies with traditional edge detection-based crack detection methods further
validated the superior performance of this approach. To extend this method of damage detection, a
faster region-based convolutional neural network (faster R-CNN) [26] was employed to
simultaneously detect and localize multiple different types of damage using variable sizes of
bounding boxes [27].

Numerous studies have been performed using a variety of DL methods and different types of sensors,
including both contact and non-contact ones. NDT methods that utilize various imaging techniques
and computer vision have also been employed, sometimes in combination with UAV systems. Given the
rapidly growing field of DL-based SHM and the abundance of published articles on various
subtopics, a comprehensive review and summary is needed to identify the types of DL that have been
developed, where they have been implemented, the achievements made, and the difficulties and
limitations encountered. Moreover, it is necessary to assess the state-of-the-art in different
subtopics and explore future directions.

Therefore, this paper presents a detailed literature review of various subtopics in a synergistic
way related to DL-based SHM, offering helpful guidelines for researchers and practitioners
interested in this field, especially those new to the topic. The ultimate goal of this paper is to
emphasize the importance of SHM and contribute to improving the safety and integrity of
structures.

In this paper, Section 1 provides a summary of various traditional SHM approaches with manual
feature extractions and conventional ML methods and highlights their limitations, leading to the
initiation of DL-based SHM as a solution to overcome these limitations. Section 2 presents a
theoretical review of various DL algorithms and concepts. Section 3 offers extensive technical
reviews of various DL-based SHM approaches and applications, ranging from computer vision-based
approaches, vibration-based approaches, hybrid approaches (e.g., physics-informed approaches),
NDT, and data prediction and reconstruction of measurements, to rendering structures with 3D
digital twin reconstruction and holistic damage mapping. Section 4 traces the evolution of
DL-based SHM approaches and summarizes them cohesively. Section 5 suggests future directions, and
Section 6 provides conclusions.

2. Deep learning algorithms

DL networks are loosely modeled on the structure of the human brain. These networks involve
learning or recognizing patterns and extracting features from the training data using multiple
hidden layers. Each hidden layer in the DL architecture receives an input, which passes through
subsequent layers and provides the output in the final layer. The advantage of DL lies in its
ability to automatically extract features during the training process, enabling the deep neural
network to fulfill its purpose. As the training data increases, the accuracy of this network also
improves. This computationally intensive process became feasible due to technological
advancements and improved performance of graphics processing units (GPUs) and their parallel
computing capabilities.

As mentioned above, in recent years, DL has taken a leading role in the field of SHM. This shift
is a response to the mounting challenges posed by modern civil infrastructure. With sensors
generating vast and intricate datasets, DL offers a powerful solution, automatically extracting
valuable insights. Moreover, DL's integration of computer vision has expanded the capabilities of
SHM, enabling the analysis of data ranging from one-dimensional (1D) data such as vibration or
strain measurements to 4D data such as RGB videos for damage assessment. These developments,
combined with hardware advances and user-friendly frameworks, have democratized DL's application
in SHM, making it a vital tool for enhancing structural safety.

In this section, we discuss several different learning modes in Section 2.1 and fundamental DL
operators in Section 2.2, and explore their applications in various DL networks for DL-based SHM
in Section 2.3. The details are described in the subsections.
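
As a concrete picture of the layered computation described at the start of this section, the following is a minimal sketch of a forward pass, in which each hidden layer receives an input, transforms it, and hands the result to the next layer until the final layer provides the output. It is an illustration only, not a method from the reviewed literature: the function names, the hand-picked weights, and the choice of ReLU as the hidden-layer activation are all assumptions made for this example.

```python
def relu(values):
    """Apply the ReLU activation element-wise (assumed activation for this sketch)."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the weights of output unit j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass an input vector through each (weights, biases) layer in turn.

    ReLU follows every hidden layer; the final layer's output is returned as-is.
    """
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # hidden layers only
            activations = relu(activations)
    return activations


# Hypothetical hand-picked parameters: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # hidden layer
    ([[2.0, 1.0]], [0.5]),                     # output layer
]
output = forward([3.0, 1.0], layers)
```

In a trained network the weights and biases would be learned from data rather than chosen by hand; the sketch only shows how an input propagates layer by layer to the output.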

2.1. Various learning modes

DL networks are the featured learning tool, mostly considered a


branch of ML. The learning modes of deep neural networks are classified
into supervised, unsupervised, semi-supervised, and reinforcement
learning, as shown in Fig. 1.
Fig. 2. Convolution operation.
In supervised learning modes, the learnable parameters of the
network are updated through backpropagation by the calculation of loss
this section, various fundamental and popular DL operators that can be
function that expresses the gap between actual output and ground truth
used with any supervised, unsupervised, semi-supervised, and rein­
training data or labeled target values. To reduce the calculated gap, the
forced modes of learning networks for SHM are reviewed.
learnable parameters are updated through backpropagation based on
chain rule. Therefore, in this supervised mode, the networks have a
2.2.1. Convolution
chance to learn the patterns of the target objects through the training
A convolution performs dot product operations (i.e., element-by-
using labeled data. Therefore, due to nature of this learning mode, the
element multiplications between the input and the filter), and the
network usually requires a large amount of labeled data to train it. The
resulting products are then summed up, as depicted in Fig. 2. The filter is
majority of the DL networks used in SHM are included in these super­
also referred to as the receptive field or kernel.
vised modes for damage detection and localization problmes using
Fig. 2 illustrates the overall process of the convolution operation. The
measured vibrations and crack detection and segmentation problems.
subarray size is equivalent to the size of the receptive field (i.e., filter),
However, collecting ground truth data reflecting various damage
while the receptive field size (i.e., width and height) is always equal to or
scenarios from real structures is challenging. Each piece of civil infra­
smaller than the input feature arrays, and the depth of the filter is the
structure has unique boundary conditions and distinct characteristics, including material properties and corresponding dynamic behaviors. As a result, data collected from various damage scenarios for supervised learning may not be applicable to other structures for training [28]. Therefore, unsupervised learning has been employed [28–31].
In unsupervised learning, the network analyzes and clusters unlabeled datasets. These models discover hidden patterns in the data without any external human intervention. For SHM applications, the networks are trained solely on data from the baseline structures. This unsupervised mode of DL can overcome the difficulties of supervised learning. A well-trained network, using only data from the baseline structure, can effectively reproduce the input data. If the trained network fails to accurately reproduce the input, the input data can be considered outliers compared to the training data. One of the representative unsupervised DL networks is the autoencoder [29–31].
In semi-supervised learning, both labeled and unlabeled data are used in the training dataset. This mode is applicable when only a limited amount of ground truth labeled data is available, or when the effort needed to prepare a large amount of ground truth data for training must be reduced. The supervised and unsupervised modes are therefore implemented together to achieve the goal of the network designer.
In reinforcement learning, the algorithm makes a sequence of decisions, learning from rewards obtained through initially random trial and error, to achieve a goal in a complex or uncertain environment. Trial-and-error procedures are adopted to solve a problem, and rewards or penalties are given for the actions taken. The primary goal in reinforcement learning is to maximize the reward.

Fig. 1. Various learning modes.

2.2. Core deep learning operators
DL networks can largely be broken down into an input layer, hidden layers, and an output layer, where the hidden and output layers are composed of various DL operators to extract features efficiently. In

same as the input or feature map. Another critical hyperparameter is the stride, which determines how the kernel slides over the input array in both the width and height directions. A larger stride reduces the computational cost because the kernel is applied fewer times. However, a larger stride may also lose essential features of the input data.

2.2.2. Pooling
A pooling operator is employed to reduce the dimensions of the input feature maps and summarize the features present in the corresponding regions. There are three common types of pooling operators: max, average, and min pooling. Average pooling calculates the average over the region of the feature map covered by the filter, as illustrated in Fig. 3. Conversely, max pooling selects the maximum element from the region of the feature map covered by the filter.

Fig. 3. Pooling layer.

2.2.3. Depthwise and pointwise convolutions
The normal convolution layer involves numerous parameters that increase the overall computational cost and may also raise the risk of overfitting. To address these issues, depthwise and pointwise convolutions have been utilized as replacements for the normal convolution. A depthwise convolution (DWC) applies a single convolution filter to each input channel separately, as illustrated in Fig. 4. A pointwise convolution (PWC), on the other hand, processes the input data in the same way as the normal convolution but with a 1 × 1 kernel, as depicted in Fig. 4. When combined, DWC and PWC form what is known as depthwise separable convolution (DWSC). The advantage of DWSC over the normal convolution lies in its
ability to reduce the computational cost.

Fig. 4. Depthwise separable convolution (a) DWC (b) PWC.

2.2.4. Dilated/atrous convolution
The dilated convolution is a commonly used technique to increase the receptive field without adding to the computational cost. The size of the receptive field, as shown in Fig. 5, can be expanded by inserting empty cells between the kernel elements; the spacing is referred to as the dilation rate.
The normal convolution has a dilation rate of 1. Any convolution used with a dilation rate greater than 1 is considered a dilated/atrous convolution. Fig. 5 displays the dilation rates, which are based on the gaps between consecutive pixels. Various studies, such as DeepLabv3 [32] and SDDNet [33], have employed different dilation rates in convolution layers to generate multiscale feature representations.

Fig. 5. Dilation examples (reproduced from [69]).

2.2.5. Atrous spatial pyramid pooling
Atrous spatial pyramid pooling (ASPP) [32,33] is composed of multiple dilated/atrous separable convolutions (see Fig. 5), average pooling, and PWC. As shown in Fig. 6, ASPP utilizes parallel atrous convolutions with different rates to extract multiscale features effectively. The output features from all the atrous convolutions with different rates are concatenated and passed through another PWC for feature fusion. ASPP is employed primarily to avoid the computational cost that would be incurred if only normal convolutions were used to capture the feature information surrounding the target object, with the aim of improving segmentation or detection.

Fig. 6. Atrous spatial pyramid pooling.

2.2.6. Upsampling
The upsampling layer is employed to restore the feature map to the original spatial size of the input data and is typically used in the decoder of an end-to-end deep autoencoder. Its purpose is to increase the dimension of the input feature map, and it is commonly used in various segmentation models. The upsampling layer can utilize different techniques, including reversed max pooling or bilinear upsampling. With the max pooling strategy, the pooling is reversed, with each element surrounded by zeros to achieve upsampling, as depicted in Fig. 7. The right half of the figure illustrates the upsampling operation where the nearest neighbor approach is adopted.

Fig. 7. Upsampling.

Fig. 8. Nonlinear activation functions (reproduced from [69]).

2.2.7. Activation functions
DL is deployed to solve nonlinear problems, making it imperative to introduce nonlinearities in DL networks. To model the nonlinearity between the input and output of these networks, various activation functions have been developed, and they also play a crucial role in preventing overfitting. Fig. 8 illustrates some commonly used nonlinear
activation functions in networks for SHM.
The most prevalent activation function in DL is the rectified linear unit (ReLU). ReLU effectively deactivates unnecessary learnable parameters by filtering out negative values. Since DL networks often have numerous learnable parameters that are not specifically optimized for the given problem, ReLU performs especially well for image classification tasks. Initially, the input of the network is all positive, and depending on the filter values, some features can become negative. ReLU can control and handle these negative values. Leaky ReLU and parametric ReLU (PReLU) were also developed to handle negative values; rather than simply deactivating them, they retain a scaled negative response, which is useful when the DL network has an appropriate number of learnable parameters. Tanh is another activation function used to consider the nonlinearity of both negative and positive values together. The Swish nonlinear activation function is well-suited for light DL networks.

2.2.8. Residual/skipped connection
The residual connection provides a pathway to transfer feature data from a given layer in the DL architecture to later layers, bypassing certain intermediate layers. For this reason, these connections are also known as skipped connections, as depicted in Fig. 9(b), in comparison to the network without skipped connections, i.e., the feed-forward network (FFN), as shown in Fig. 9(a). The residual connection first applies identity mapping to x, which is then followed by an elementwise addition, F(x) + x. The purpose of this skipped connection is twofold: it preserves various important multilevel features and concurrently prevents the loss of significant features that might occur during the consecutive processing of the hidden layers.

Fig. 9. Residual/skipped connection.

2.2.9. Attention modules
The attention module has become crucial for capturing global and long-range dependencies [34]. This module learns the relationship between one pixel and all other pixels in the input features. Attention has found significant applications in image analysis and natural language processing. Usually, the attention module receives a feature map from the previous convolution layer in the encoder and from the transposed convolution layer in the decoder. These feature maps (x) are further processed by three pointwise convolutions: Query Q(x), Key K(x), and Value V(x). Several operations are performed among the query, key, and value, followed by an elementwise summation operation to generate the final feature map.
Through these operations, the original feature map is self-refined, identifying core features from the original features. As a result, self-attention is also known as intra-attention. Self-attention extracts important features from the input feature map within the network and uses them to capture long-range dependencies. For a detailed explanation of the attention module, refer to Fig. 10.
Multi-head attention utilizes multiple parallel attention mechanisms, known as attention heads, to capture different levels and phases of features. Each attention head attends to a different subset of the input feature map. Consequently, multi-head attention can effectively extract both local and global features. The outputs from the multiple attention heads are combined to generate a comprehensive representation of the input feature map.
Channel attention mechanisms comprise two pooling operations: average pooling and max pooling, as illustrated in Fig. 11. The feature map with dimensions H × W × D can undergo both average and max pooling. The outcomes then undergo distinct squeeze and expansion operations. Average pooling is applied during the squeeze operation, while reproduction occurs during excitation. Following this, the reproduced results also pass through a multi-layer perceptron separately. Eventually, the outcomes are subjected to element-wise summation, which is followed by the application of a ReLU activation function. Subsequently, the results of these individual operations are summed and element-wise multiplied with the original feature map, thereby refining it throughout the process.

Fig. 10. Self-attention module (reproduced from [69]).
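
The query, key, and value processing described for the self-attention module can be sketched in a few lines. This is a minimal illustration rather than the module shown in Fig. 10: the text only states that "several operations" are performed among Q, K, and V, so the scaled dot-product with a softmax used here is an assumption borrowed from the transformer literature [49]. The feature map is assumed to be flattened to N positions × C channels, so that each pointwise convolution reduces to a matrix product, and the names `self_attention`, `w_q`, `w_k`, and `w_v` are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def self_attention(x, w_q, w_k, w_v):
    """Self-attention over a flattened feature map.

    x            : N positions x C channels
    w_q, w_k, w_v: C x C projection matrices, standing in for the three
                   pointwise convolutions Q(x), K(x), V(x)
    Assumption: scaled dot-product scores followed by a softmax.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    # Each row holds the weights relating one position to all positions.
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    attended = matmul(weights, v)
    # Elementwise summation with the input yields the refined feature map.
    refined = [[a + b for a, b in zip(ra, rx)] for ra, rx in zip(attended, x)]
    return refined, weights
```

Each row of `weights` sums to one, so every output position is the input feature plus a weighted average of the value vectors at all positions, which is how a single layer can relate distant pixels.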

Fig. 11. Channel attention module.
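
Several of the cost claims made earlier in Section 2.2 (for stride, pooling, depthwise separable convolution, and dilation) can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, biases are ignored in the parameter counts, square kernels and non-overlapping pooling windows are assumed, and the output-size formula is the standard one for a convolution with zero padding.

```python
def effective_kernel_size(k, dilation=1):
    """Span covered by a k x k kernel once gaps are inserted between its elements."""
    return dilation * (k - 1) + 1


def conv_output_size(n, k, stride=1, padding=0, dilation=1):
    """Output length along one axis of a convolution over an input of length n."""
    return (n + 2 * padding - effective_kernel_size(k, dilation)) // stride + 1


def standard_conv_params(k, c_in, c_out):
    """Weights in a normal k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a DWC (one k x k filter per input channel) plus a 1 x 1 PWC."""
    return k * k * c_in + c_in * c_out


def pool2d(fmap, size, mode="max"):
    """Non-overlapping max or average pooling over a 2D list of numbers."""
    if mode == "max":
        reduce = max
    else:
        reduce = lambda window: sum(window) / len(window)
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            reduce([fmap[r + i][c + j] for i in range(size) for j in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]
```

Under these assumptions, a 3 × 3 layer mapping 64 channels to 128 channels needs 73,728 weights as a normal convolution and 8,768 as a DWSC, roughly 8.4 times fewer. Moving the stride from 1 to 2 on a 224-pixel axis reduces the output from 222 to 111 positions, and a dilation rate of 2 widens the same 3 × 3 kernel to a 5 × 5 footprint without adding any weights.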


Fig. 12. Popular benchmark networks.

Fig. 13. First CNN architecture for concrete crack detection (reproduced from [24]).

2.3. Various DL networks

Using the DL operators described above, many advanced DL algorithms have been developed to overcome the limitations of traditional ML methods. Fig. 12 showcases the most popular architectures from the last 10 years.
These influential research papers and architectures include LeNet [35], AlexNet [36], ZFNet [37], Generative Adversarial Network (GAN) [38], GoogleNet [39], VGGNet [40], R-CNN [41], ResNet [42], UNet [43], FCN [44], fast R-CNN [45], faster R-CNN [26], YOLO [46], DeepLabv3 [32], SegNet [47], DenseNet [48], Transformer [49], Mask R-CNN [50], ENet [51], PANet [52], YOLOv4 [53], and YOLOv7 [243]. Some of the most famous DL methods have also been adopted for DL-based SHM. In this section, we review some core DL algorithms that have been implemented to solve SHM problems.

2.3.1. CNN
The CNN is the most representative early DL network, originally developed to recognize handwritten zip code digits provided by the U.S. Postal Service [54]. The connectivity patterns between neurons in a CNN are inspired by biological processes and resemble the organization of the animal visual cortex, which has proven effective in numerous image recognition problems. CNNs efficiently extract features from input images and, owing to their sparsely connected nature and pooling process, require fewer computations. Additionally, CNNs can potentially differentiate a large number of classes. These advantages make CNNs an efficient method for image recognition.
The main issue with CNNs was the requirement for a large amount of labeled data, but this challenge was overcome through pretraining for transfer learning using well-annotated databases such as ImageNet [55], the CIFAR-10 and CIFAR-100 datasets, and the MNIST Database [68].
The CNN architecture consists of convolution, pooling, and fully connected layers. Cha et al. [24] designed a new CNN for crack detection, shown in Fig. 13 as a representative CNN architecture. The number and sequence of layers vary depending on the type of data and the desired accuracy. The size of the filter is usually significantly smaller than the input size. The pooling layer uses a downsampling operation to decrease the dimensions of the input and help reduce the computational cost of the architecture.

2.3.2. Faster R-CNN
The faster R-CNN was proposed by Ren et al. [26] and consists of two
stages: the region proposal network (RPN) and the fast R-CNN, as shown in Fig. 14. For multiple object detection and localization, a region-based CNN (R-CNN) [41] was designed, taking an input and object proposals from selective searches [56], and then a CNN was used for extracting features. The R-CNN significantly improved accuracy compared to CNN-based methods. However, it was computationally costly and time-consuming due to three separate training processes: a CNN, a regressor, and SVMs.
To address the limitations of R-CNNs, Girshick [45] designed the fast R-CNN, which achieved better performance in terms of accuracy and speed. Despite the improvement, the accuracy and speed of the fast R-CNN were still limited by the time-consuming external selective search method. To solve these issues, Ren et al. [26] developed the faster R-CNN by incorporating an RPN and fast R-CNN to improve training accuracy. The RPN proposes candidate bounding boxes, and the fast R-CNN extracts features from these candidate boxes using region of interest (RoI) pooling, followed by classification and bounding box regression. The faster R-CNN increases accuracy by end-to-end training of the network and by sharing features between the RPN and fast R-CNN. The faster R-CNN provides real-time object detection and has found applications in various fields. It was first implemented by Cha et al. [27] to detect multiple different structural damages.

Fig. 14. The schematic architecture of the faster R-CNN. C: convolution, R: ReLU, P: pooling.

2.3.3. Mask R-CNN
The Mask R-CNN [50] is an extended version of the faster R-CNN, achieved by adding one more branch, as shown in Fig. 15. This additional branch predicts the object mask in parallel with the existing branch for bounding box regression. While the faster R-CNN provides two outputs, namely a class label and a bounding box offset, the Mask R-CNN includes a third branch known as a fully convolutional network (FCN) that provides the object mask. The Mask R-CNN introduces pixel-to-pixel alignment, which was not present in the faster R-CNN. The faster R-CNN extracts features with coarse spatial quantization, leading to misalignment between the features and RoI. This misalignment affects pixel-to-pixel mask predictions far more than classification tasks. To address this issue, the Mask R-CNN proposes a quantization-free layer called RoIAlign [57]. RoIAlign is achieved by implementing bilinear interpolation instead of rough quantization.

Fig. 15. Mask R-CNN.

2.3.4. VGG Net
VGG Net [40] achieved first place in localization and second place in classification tasks in the 2014 ImageNet challenge. The main contribution of the VGG model was replacing large kernel filters with small (3 × 3) convolution filters applied one after the other, which resulted in significant improvements over previous models. The VGG model comprises convolution layers and max pooling layers, as depicted in Fig. 16. Simonyan and Zisserman [40] evaluated the model's performance on different datasets, namely VOC-2007, VOC-2012, Caltech-101, and Caltech-256, for classification tasks. The model has two variants, VGG-16 and VGG-19. VGG-16 and VGG-19 achieved a mean average precision (mAP) of 89.3% on VOC-2007 and 89.0% on VOC-2012. Additionally, the VGG model surpassed state-of-the-art models, including GoogleNet [39], in localization tasks.

Fig. 16. VGG Net.

2.3.5. Residual net
The residual network (ResNet) [42] introduced a solution to the network performance degradation that occurs when the number of layers increases. It is well known that deeper networks are difficult to train and suffer from degradation due to the vanishing gradient phenomenon. While adding more layers to the network may seem like a way to handle more complex tasks, it also introduces new challenges. The ResNet provides empirical evidence that simply adding layers to the network saturates both training and test accuracy. In the ResNet model, the shortcut connection (i.e., skip connection) increases neither parameters nor computational complexity (e.g., the computational cost of ResNet-152 is lower than that of the VGG model, even though the ResNet-152 model is eight times deeper than the VGG model).
The ResNet model introduces a shortcut connection, which prevents
the loss of essential features through the deep layer operations. Deeper models tend to suffer from accuracy degradation, and the shortcut connections help to reduce this degradation. The layers bridged by such a shortcut connection are referred to as a residual block. Multiple residual blocks can exist, and these nonlinear residual blocks can adapt to other layers. Depending on the number of layers, the authors proposed five different networks, namely ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. Fig. 17 shows the architecture of ResNet-34.

Fig. 17. ResNet.

2.3.6. DenseNet
In the DenseNet [48], each layer in the network connects with every other layer in a feed-forward style. CNNs tend to be more accurate and efficient when the connections between layers are dense, whereas traditional CNN approaches only connect each layer to the subsequent layer. The key benefits of DenseNet include mitigating the vanishing gradient, encouraging feature reuse, reducing the number of parameters, and improving feature propagation.
Fig. 18 illustrates the different connections in the dense block, showcasing the reuse of features in the dense network. The DenseNet achieved higher accuracy with lower computation cost when evaluated on several object recognition tasks (i.e., CIFAR-10, CIFAR-100, and ImageNet [55]). The procedure of concatenating various feature maps in different layers increases the diversity of the input features of subsequent layers, further improving accuracy.

Fig. 18. DenseNet.

2.3.7. Deep autoencoder
The autoencoder is an unsupervised DL tool that performs repeated backpropagation to train itself to reconstruct its own input. It mainly comprises two parts, the encoder and the decoder, consisting of an input layer, hidden layers, and an output layer, as shown in Fig. 19.
The encoder's hidden layer learns feature representations from the input, while the decoder's hidden layer tries to reconstruct the input at the output layer. During the training process, the autoencoder is updated by reducing the reconstruction error. However, a typical single-layer autoencoder is not sufficient to extract representative features from raw data. Therefore, a deep autoencoder was first developed by Hinton and Salakhutdinov [58].

Fig. 19. Deep autoencoder.

2.3.8. FCN
The FCN was developed by Long et al. [44] and is a successful technique for semantic segmentation. The FCN approach reuses existing CNN models, initially designed for classification, for segmentation. Classification models such as AlexNet [36], GoogleNet [39], and VGG-16 [40] were modified by changing the fully connected layers into convolution layers, as shown in Fig. 20.
Therefore, the FCN does not have any fully connected layers, which carry many learnable parameters. Consequently, it processes the input more quickly than networks that have fully connected layers. To produce per-pixel dense output, Long et al. [44] used a strided upsampling convolution called deconvolution.

Fig. 20. FCN.

2.3.9. UNet
The UNet architecture was developed by Ronneberger et al. [43] for pixel-wise medical image segmentation as one of the FCNs. The UNet model was initially used to detect HeLa cells in light microscopic images. The network architecture consists of an encoder, a decoder (representing the downsampling and upsampling paths, respectively), and skip connections, as shown in Fig. 21.
The downsampling path uses convolution and pooling layers to extract features from the input image. The upsampling path uses transposed convolution to upsample the feature map and output the desired size. Skip connections are employed between the up- and downsampling paths, which add local information to the global information during upsampling.

2.3.10. RNNs
Recurrent neural networks (RNNs) have become successful tools for modeling sequential data, such as sound and text, to capture spatiotemporal features. They have also been combined with CNNs for computer vision tasks. While CNNs are successful in interpreting visual data that is not in sequence, RNNs are popular for interpreting temporal or sequential data, using data points in a sequence to make better predictions. The approaches used for training RNNs include backpropagation through time and real-time recurrent learning [59].
Fig. 22. Feed-forward network and RNN.
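
The difference between the two networks in Fig. 22 can be illustrated with a minimal recurrence. This is a sketch of a simple (Elman-style) RNN cell, not any specific model cited in this review: the update rule h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h) and the names `rnn_step`, `run_rnn`, `w_xh`, `w_hh`, and `b_h` are assumptions chosen for illustration.

```python
import math


def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One recurrent update: the new state mixes the current input with the previous state."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(w_xh[i], x_t))
            + sum(w * h for w, h in zip(w_hh[i], h_prev))
            + b_h[i]
        )
        for i in range(len(b_h))
    ]


def run_rnn(sequence, w_xh, w_hh, b_h):
    """Process a sequence step by step, carrying the hidden state forward."""
    h = [0.0] * len(b_h)  # the internal state starts empty
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states
```

Setting `w_hh` to zeros removes the loop, so identical inputs at different time steps produce identical outputs, as in a feed-forward network. With a nonzero `w_hh`, the same input produces a different state at each step because the earlier inputs are remembered.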

Fig. 21. UNet.

An RNN maps the complete history of the previous input to target vectors and stores the memory of the previous input in the internal state of the network. The RNN develops connections between units from a directed cycle and can generate memories of arbitrary-length sequences of input patterns [264]. A feed-forward network utilizes a fixed context length to predict the next word in a sequence, whereas a standard RNN considers all previous words.
In a traditional DL network, inputs and outputs are independent of each other; however, in an RNN, the output depends on the prior elements in the sequence. The information in a feed-forward network flows in one direction, from the input through hidden layers to the output. The feed-forward model has no memory of the input it received. On the other hand, the information runs through a loop in an RNN, which makes a decision by considering the current and previous input. Fig. 22 (a) and (b) show the feed-forward network and RNN, respectively.
To address some of the issues with traditional RNNs, the long short-term memory (LSTM) [60] architecture was proposed. Additionally, Gers et al. [61] used LSTM and gated RNN (GRNN) to avoid backpropagating errors from an exploding or vanishing gradient. The proposed ReNet was further combined with CNN and used for segmentation.

2.3.11. Generative adversarial networks
The GAN was first proposed by Goodfellow et al. [38]. A GAN consists of two parts: a generator, which generates synthetic data, and a discriminator, which distinguishes whether the input data is from the model distribution or from the data distribution. The generator's goal is to develop samples that are very close to the real data to fool the discriminator. Fig. 23 shows a typical GAN architecture, which consists of a generator and a discriminator.
GANs have been implemented for various purposes to generate data and have gained significant attention in the DL field for their ability to learn complex data distributions. The uniqueness of GANs is that they do not make any assumptions about the distributions and can generate synthetic samples that look similar to the real ones. However, there is a scarcity of data in the field for training such networks.

Fig. 23. Typical GAN structure.

2.3.12. Transformer
The transformer is a feed-forward neural network-based model that utilizes an attention mechanism, specifically multi-head self-attention (MHA), which is considered the most crucial component of the transformer network [62]. This attention mechanism determines the relevant portion of the input feature map that the network should focus on while generating the output feature map. In addition to self-attention, the transformer employs other techniques like positional encoding and layer normalization in its encoder-decoder architecture, as depicted in Fig. 24.
Put simply, the transformer serves as the main model, while attention operates as a key operator within the transformer model. The transformer finds applications in various domains, including natural language processing (NLP), sound signals, and images. When the transformer network receives an input, it converts it into two sequences: a sequence of vector embeddings and a sequence of positional encodings. In essence, the inputs and outputs are first transformed into dense vectors. Assigning each word in a sentence a specific position in the sequence is vital, as the model lacks the recurrence of an RNN that would otherwise retain the order in which the input sequence was received [49].

2.3.13. Various DL frameworks
All of the aforementioned DL operators and deep learning networks can be implemented using existing frameworks. For instance, Python-based frameworks such as TensorFlow, PyTorch, Caffe, Theano, and Keras, as well as MATLAB's deep learning and computer vision toolboxes, have played pivotal roles. The choice of a platform for implementing DL models depends on various factors. Python stands out due to its widespread availability and extensive support through numerous libraries. In contrast, MATLAB, while a powerful tool, is not freely accessible.

3. Approaches and applications for SHM with deep learning

DL has found extensive applications in two broad branches of SHM: local and global methods, as depicted in Fig. 25. Local monitoring methods employ NDT-based imaging techniques and computer vision
techniques, while global monitoring methods generally involve vibration-based approaches combined with computer vision techniques. Computer vision approaches can be used for both methods because images can be utilized to detect structural surface damage, such as cracks and corrosion, as part of the local method. Additionally, they are employed to measure structural vibrations for global vibration-based approaches.

Fig. 24. Typical transformer.

Fig. 25. Classification of DL-based SHM.

Section 3.1 discusses the utilization of DL with NDT-based imaging techniques, such as thermography, radiography, eddy current, ground-penetrating radar (GPR), interferometry, ultrasonic, and acoustic emission methods, to detect subsurface damages, including delamination, internal cracks, internal corrosion, debonding, and honeycombing, among others.
Section 3.2 examines DL utilizing computer vision techniques, primarily utilizing RGB cameras as visual sensors to identify surface damages. These damages encompass cracks in concrete and steel elements, corrosion, deflection, loose bolts, potholes, spalling, and other issues.
Section 3.3 discusses the integration of DL with computer vision and other imaging techniques, either mounted on UAVs to detect structural damages or combined with light detection and ranging (LIDAR) to collect vision data of structures and detect internal and external structural damages. Section 3.4 discusses digital twins through 3D image reconstruction, where damages detected through DL methods can be presented.
Section 3.5 reviews vibration-based methods, categorizing them into data-driven, physics model, and hybrid approaches. Data-driven approaches use measured structural responses or signals obtained from various sensors, including those mounted on UAVs. These methods typically employ supervised DL models to detect and localize damages by learning latent features of different damage scenarios. Some physics model approaches suggest using finite element models to generate training data in a supervised mode.
Section 3.5.1 reviews these supervised methods, while Sections 3.5.2 and 3.5.3 discuss their applications to data recovery and sensor fault detection problems. However, due to the unique dynamic characteristics of each large-scale structure, collecting a sufficient number of diverse damage scenarios for training DL models is a significant challenge, resulting in limited data sharing among structures. Consequently, Section 3.5.4 reviews unsupervised DL training modes that have been developed to address this issue. Section 3.6 discusses physics-informed
methods.

3.1. NDT-based imaging techniques for subsurface damage detection

NDT methods have been evolving for more than three decades, using various approaches such as thermography, radiography, eddy current techniques, GPR, ultrasonic, and acoustic methods. These methods were developed to image the interiors of construction materials without causing any damage. Notably, Büyüköztürk [63] has presented the principles of various concrete imaging techniques. In this section, an extensive literature review is conducted on these various imaging techniques, along with their applications to DL methods for internal damage inspection.

3.1.1. Thermography
NDT techniques, such as passive thermography with DL, have been used to identify internal or subsurface damage in structures. Ali and Cha [64] proposed a method that uses passive thermography in combination with a deep inception neural network (DINN) to detect and localize internal defects in steel structural members of a bridge. The authors collected 2000 images of both damaged and intact surfaces and used 200 of these images to test the performance of the DINN, as presented in Table 1.
The robustness of the network was tested on 200 new testing images. The DINN achieved an accuracy of 96.00% and a specificity of 97.79%. The testing results of the DINN were validated using the ultrasonic pulse velocity test method. Yang et al. [65] used a faster R-CNN to detect cracks in infrared images of steel sheets. An electric heating device was used as a source for external excitation. A total of 3000 labeled images were collected to develop a databank. The developed method achieved a mAP of 92.41% and an accuracy of 95.54% when tested on 125 new images.
Recently, Sen et al. [71] proposed a data-driven approach called multi-component deconvolution interferometry for predicting the seismic response of structural systems. The proposed approach uses deconvolution interferometry to develop a surrogate model that aids in both dynamic characterization and accurate response prediction. The model takes into account different types of uncertainties, such as noise in the measurements, variations in temperature and humidity, and vibrations caused by human activity. The effectiveness of the proposed method was demonstrated through its application to field monitoring data obtained from structures in the Groningen area of the Netherlands. These structures had limited sensor deployment, and the data were collected over a period averaging 10 months.
Pozzer et al. [72] conducted a study on segmenting concrete defects using different deep convolutional neural models in thermographic and regular images. VGG 16, ResNet 18, ResNet 50, and MobileNetV2 [73] were employed to identify various concrete anomalies, including spalling, delamination, cracks, and patches, from images captured at different distances and viewpoints. The MobileNetV2 model performed well in detecting damages in thermal images, achieving a high recall of 96.50%, accuracy of 74.20%, and F1 score of 85.20% by detecting 164 of the 170 defects.
Kulkarni et al. [74] similarly employed infrared thermography (IRT) images to identify sub-pavement voids resulting from deteriorating culverts. These IRT images were captured via UAVs and subsequently analyzed using EfficientDet [75]. This detection method identified pavement regions containing internal voids through bounding boxes. The accuracy of the obtained results was corroborated by GPR tests, thus confirming the veracity of the ground truth data.
One obstacle for these approaches is the difficulty of collecting and establishing enough ground truth data for various internal defects and damages to train DL methods. Additionally, these processes are quite expensive compared to general RGB camera-based vision sensors. To address the issue of limited data availability, Ali and Cha [69] developed an attention-based GAN (AGAN) to generate synthetic thermal image data for internal damage segmentation in concrete using only a limited number of ground truth data that were established by validation through experiments. The generated synthetic data were used to train the internal damage segmentation network (IDSNet). This lightweight IDSNet was designed to efficiently segment subsurface damage in a pixel-wise manner, and it outperforms state-of-the-art (SOTA) DL networks, such as Attention UNet [69], UNet++ [76], and DeepLabV3+ [77]. The IDSNet successfully segmented internal damage (delamination, voids, cracks, debonding, and honeycombing) in concrete bridge and parkade structures in Winnipeg, Canada.
Here, the AGAN adopted a self-attention mechanism, including global attention, to extract the core features of the internal defects. Additionally, IDSNet employed depth-wise asymmetric, dilated, and separable dilated convolutions to effectively extract features of internal defects with lower computational cost.
Numerous studies have utilized thermography combined with DL for internal damage identification. The level of detection, particularly in terms of depth within the concrete material, holds significant importance. Mac et al. [78] demonstrated the accurate detection of delamination at depths of 7.0 cm or less in concrete girders using the active thermography method. On the other hand, other studies [79,80] found that active thermography has a maximum detection depth of only 1.5 cm for plaster and 5 cm for reinforced concrete.
Moreover, Ali and Cha [69] found that damages measuring 9 cm × 9 cm were easily detectable when their depth was 4.5 cm or less from the surface of the slabs, and that damages measuring 15 cm × 15 cm can be detected when the depth is less than 7 cm. Cotič et al. [81] showed that defects with a size of 10 cm × 10 cm can be detected when the depth of defects is 8 cm or less. However, it is important to note that the maximum depth of detection using thermography can be affected by the specific materials being analyzed and the calibration of the instruments.

3.1.2. Eddy current techniques
Eddy current techniques have been used for non-destructive evaluation of conductive materials for several years. For example, De Alcantara et al. [82] utilized eddy current for corrosion assessment of reinforcing bars in concrete. Recent advancements in DL techniques have shown promising results in improving the accuracy and efficiency of eddy current testing for SHM applications. For example, Fu et al. [83] proposed an end-to-end 1D CNN model that automatically classifies and analyzes pulsed eddy current signals. The proposed model outperformed other methods, including Gaussian process decision tree and SVM, in terms of accuracy and error. The CNN model achieved 91% accuracy,

Table 1
Imaging-based subsurface damage identification.
Network Structure Type & device Train Val. Test Input size mIoU (%) Acc. (%)

DINN (Ali and Cha [64]) Steel Internal defects, infrared 1600 4,00 2,00 640 × 480 – 96.00
Faster R-CNN (Yang et al. [65]) Steel Crack, thermography 3000 – 125 – – 95.41
Deep Residual net (Ahmed et al. [66]) Bridge deck Rebar, GPR 9805 1528 – – 90.20 98.71
Segnet (Yang et al. [67]) Tunnel Internal defects, GPR 104,000 10,400 10,400 256 × 128 89.50 93.00
AGAN, IDSNet (Ali and Cha [69]) Bridge, Parkade Internal defects, infrared 820 – 84 640 × 480 90.00 95.20
Improved Mask R-CNN (Liu et al. [70]) Pavement GPR 3430 429 429 1024 × 1024 70.10 –

12
Y.-J. Cha et al. Automation in Construction 161 (2024) 105328

which is higher than the accuracy achieved by Gaussian process (53.6%), decision tree (87.2%), and SVM (61.9%).

Miao et al. [84] proposed a method for online defect recognition of narrow overlap welds based on a two-stage recognition model by combining continuous wavelet transform (CWT) with CNN. The proposed method generates a 2D time-frequency diagram from the 1D eddy current signal through CWT and then uses CNN to detect and classify defects. The accuracy of the proposed method is 96.94%, nearly 10% higher than the traditional method, i.e., SVM or K-Nearest Neighbor, and the average detection time is 2.4 s, making it suitable for online operation.

Alvarenga et al. [85] proposed an embedded system for online detection and classification of rail surface defects based on eddy current technology. They developed a method to interpret eddy current signals by analyzing their wavelet transforms using a CNN. The proposed technique detects, classifies, and maps the defects on the track's surface to facilitate workers in carrying out maintenance processes efficiently. Field tests were conducted on rail anomalies grouped into three classes, i.e., squats, welds, and joints. The results showed a classification efficiency of 98%, surpassing traditional methods, i.e., nearest neighbors (92%), decision tree (81%), and AdaBoost (82%). The use of a CNN to analyze eddy current signals shows great promise for future applications in the railway industry.

Meng et al. [86] combined a 1D residual CNN with eddy current testing for evaluation of defect depth. The 1D residual CNN generated a dataset of 48,000 scans from 18 defects of different depths. The DL model with 38 layers achieved an accuracy of 93.58% in discriminating surface defects in steel. Note that the 1D residual CNN and 1D CNN are completely different networks. A 1D CNN uses only 1D convolutions throughout the entire CNN architecture to keep 1D features; therefore, it has to sum up all the output features coming from the 1D convolutions of the previous layers. This may cause the loss of some important features.

The eddy current method has certain limitations that must be considered. One major limitation is its limited effectiveness for non-conductive materials. Additionally, any conductive component in the vicinity can easily influence the resulting eddy current, rendering it unsuitable for complex geometries and edges. Consequently, false readings and decreased accuracy may occur [87].

3.1.3. GPR
The traditional manual methods for analyzing non-destructive imaging GPR data, which involve using high-frequency electromagnetic waves to detect and locate subsurface damage, can be both time-consuming and subjective. Additionally, these methods can lead to inconsistencies in the results obtained from different analysts. To overcome these limitations, researchers have started to explore the use of DL techniques to automatically analyze GPR data for SHM purposes.

For instance, Dinh et al. [88] proposed an algorithm for automatic localization and detection of rebars from GPR data of concrete bridge decks. The proposed methodology involves the integration of conventional image processing techniques and a deep CNN. The algorithm involves two key steps: first, image processing techniques such as migration, normalized cross-correlation, and thresholding are utilized to identify pixels containing potential rebar peaks. Second, windowed images surrounding the identified pixels are extracted and classified by a trained CNN. The performance of the proposed system was analyzed by applying it to GPR data from 26 bridge decks, and it demonstrated an overall accuracy rate of 95.75% to 99.60%.

Ahmed et al. [66] used a deep residual network and unsupervised K-means clustering for rebar detection and localization. The data were collected using GPR from nine different bridges. They found that several factors affect the performance of rebar detection and localization systems, including noise, reflections, and visual quality of rebar profiles. In their study, they tested the performance of various architectures, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, DenseNet-121, and DenseNet-161. Among these architectures, DenseNet-161 achieved the highest validation accuracy of 97.19%, while ResNet-18 had the lowest accuracy of 80.60%.

Xu et al. [89] utilized a 1D CNN with GPR to detect concrete pavement distress. Their proposed method achieved a testing accuracy of 96.30%, which surpassed the accuracy of conventional methods such as AdaBoost (90.12%) and SVM (82.50%). Zhang et al. [90] utilized the ResNet18-YOLOv2 model with GPR for automatic void detection in airport runways. They generated a dataset of 811 voids from 10 airport runways and used it to detect voids. The model achieved a precision of 90.98% and F1 score of 87.59%. The authors also used conventional data augmentation techniques to increase the size of the dataset, resulting in improved performance of the model with a precision of 92.82% and F1 score of 90.05%.

The pixel-wise segmentation of internal damage using GPR data and DL has been investigated by researchers. Yang et al. [67] employed a SegNet combined with the Lovász-Softmax loss function to detect tunnel lining internal defects non-destructively using GPR synthetic data. The authors concluded that due to GPR signal dissimilarity and morphological differences, direct application of CNNs for defect segmentation is challenging. However, by using the SegNet model with the cross-entropy Lovász-Softmax function, they achieved a mean precision of 93% and mIoU of 90%.

Liu et al. [70] introduced a vertical crack segmentation method for asphalt pavement using GPR in conjunction with an enhanced Mask R-CNN. In comparison to the original Mask R-CNN, their approach incorporates supplementary operations, specifically, bottom-up reverse side connections, within the feature pyramid module. Moreover, they employ the ResNet 101 architecture as the foundational backbone for the feature extraction encoder. For the training, validation, and testing phases, GPR images of dimensions 1024 × 1024 were utilized. The achieved result was 70.1% mIoU at a processing speed of 4.2 frames per second (FPS).

The depth of penetration for GPR is influenced by various factors, such as the scanned medium, transmitted frequency, material, and radiated power of the electromagnetic waves. In general, higher frequencies result in shallower penetration depths but offer higher resolution. For instance, a frequency of 2.6 GHz can typically penetrate up to 0.3 m, while lower frequencies such as 400 MHz can reach depths of around 2 m.

It is essential to carefully consider the selection of GPR frequency in relation to the expected depth of investigation and the desired resolution, as they are dependent on the characteristics of the subsurface materials. The correct selection of GPR equipment can significantly impact the accuracy and effectiveness of subsurface investigations; thus, understanding the limitations and capabilities of GPR systems is crucial for optimal utilization in subsurface studies. For instance, Solla et al. [79] detected delamination up to a depth of 40 cm using a ProEx system with a high-frequency antenna of 2.3 GHz.

3.1.4. Ultrasonic method
Recently, the combination of ultrasonic testing and DL has emerged as a promising approach for SHM. For instance, Melville et al. [91] proposed a 2D CNN for structural damage detection in thin metal plates using ultrasonic guided waves. A dataset of full wavefield scans of undamaged and damaged plates made of different materials was used to train the deep network. The proposed method outperforms the 62% accuracy achieved by SVM, achieving an accuracy of 99.98%.

Tran et al. [92] employed a deep CNN with a VGG architecture and laser ultrasonic technique to detect and estimate the looseness of bolted joints. They outperformed the K-nearest neighbor method and the SVM. Additionally, they compared the model's performance with and without data augmentation, finding that data augmentation reduced the problem of overfitting. The proposed method achieved the lowest mean absolute error of 1.55.

Pyle et al. [93] utilized CNNs based on AlexNet and VGG-19 to characterize cracks in pipe inspection using ultrasonic images. They
generated training data for the CNN model using hybrid finite element and ray-based simulation, consisting of 999 experimental images and 25,624 simulation-based images. The proposed CNN model was compared with the commonly used 6 dB drop method for sizing defects in ultrasonic images. The CNN model demonstrated a high level of accuracy, achieving a 97% confidence level in sizing defects with a length ranging from 1 to 5 mm, with an error margin of only ±1 mm. In contrast, the 6 dB drop method only achieved a 48% confidence level under the same conditions. The study concluded that the proposed DL method was significantly more accurate than the traditional 6 dB drop method in sizing defects in ultrasonic images.

Arbaoui et al. [94] employed CNNs (specifically AlexNet and ResNet50) to detect cracks in concrete structures using ultrasonic testing combined with multiresolution analysis. The proposed method involved three steps: NDT of the concrete specimen using ultrasonic signals to detect the presence of internal defects (step 1); multiresolution analysis through a discrete wavelet transform to generate a scalogram, which aids in localizing the crack in space and at each resolution (step 2); and a CNN that extracts relevant features from the scalogram image for identifying and classifying internal defects as either cracks or non-cracks (step 3). The AlexNet approach achieved an accuracy of 91.82%, while the ResNet50 approach achieved an accuracy of 96.91%.

Rautela and Gopalakrishnan [242] proposed a convolutional and RNN model using ultrasonic guided waves for real-time, continuous observation of structures to detect abnormal behavior. The proposed DL model achieved a mean absolute percentage error (MAPE) of 8.50 for damage localization, while the random forest model achieved a MAPE of 4.32, and the SVM model achieved a MAPE of 26.76. For predicting damage length, the proposed DL model had a MAPE of 11.25, the random forest model had a MAPE of 60.39, and the SVM model had a MAPE of 91.07.

Ewald et al. [95] employed a CNN to develop an automated structural diagnostics system for aircraft maintenance using Lamb waves. To process the sensor data collected from the structures, the researchers transformed the one-dimensional time-domain signals into two-dimensional time-frequency representations. This transformation was crucial to obtain distinct images for each crack condition, which were then fed into a ConvNet designed to capture spectrotemporal features.

The time required to investigate an area of concrete using ultrasonic pulse velocity depends on various factors, such as the thickness of the concrete, the number of measurements required, and the setup time for the equipment [64]. However, the investigation time can be significantly longer for thicker concrete structures or larger scan grids. Ali and Cha [64] conducted ultrasonic pulse velocity tests that took approximately two hours to gather data for a small area of 530 cm² and prepared contour maps for damage detection. Advanced equipment with automated scanning capabilities can reduce the investigation time, and the data analysis and interpretation time can also vary depending on the complexity of the structure and the required level of analysis.

3.1.5. Acoustic emission
Acoustic emission has been extensively used in the field of SHM, similar to other NDT methods. For instance, Han et al. [96] proposed a 2D CNN to detect acoustic emission crack signals in concrete structures. They generated a dataset of acoustic emission signals emitted from concrete specimens, and noise signals were generated by artificial activities. Low-frequency acoustic emission sensors were used to acquire signals from laboratory-scale specimens, and physical model tests were conducted to upscale from centimeter-scale specimens to meter-scale model foundations. The proposed model achieved a minimum detection accuracy of 88%.

Zhang et al. [97] used a CNN (ResNet18) to improve the classification accuracy of several damage-induced acoustic emission signals in ultra-high-performance concrete. They generated an acoustic emission dataset, and the acoustic emission sources were classified into five categories: matrix cracking, fiber debonding, fiber sliding, fiber scraping, and matrix spalling. Five different CNN networks were used: GoogleNet, ResNet18, ENet, MobileNetV2, and ResNet50. The ResNet18 achieved a maximum accuracy of 93.94%, which was superior to the other networks.

Guo et al. [98] proposed a hierarchical deep convolutional regression network (i.e., 1D-ResNet) for impact source localization of acoustic emission signals for SHM. The proposed method considered combining both fail-safe mechanisms and pristine models to achieve high accuracy in impact localization results for both a simple homogeneous plate and a complex inhomogeneous plate with geometric features. Data augmentation and transfer learning techniques were utilized to train the fail-safe model without the need for additional experimental data.

The depth at which defects can be detected using AE techniques depends on various factors, such as the type and size of the defect, the material being tested, and the characteristics of the AE sensors used. To detect deeper defects, a higher frequency AE sensor may be needed, as higher frequency waves have shorter wavelengths, enabling them to penetrate deeper into the material. Additionally, increasing the energy or stress applied to the material can generate larger AE signals, making them easier to detect. In the AE technique, when energy is rapidly released from a localized source within a stressed material, it generates elastic stress waves that can cause deformation and damage. When there is no further propagation of the crack, the AE technique will not detect the already existing crack, but it is still possible that residual stress waves may be detectable using AE sensors.

Another limitation of the AE method is that unwanted noise and complex signal processing result in errors [87]. While surface damage detection has been extensively studied, subsurface damage detection has received less attention. Using DL techniques with NDT methods is a promising approach for identifying and localizing subsurface defects in structures, but further research is required to improve accuracy and reliability.

3.1.6. Summary of NDT based methods
In imaging technique-based NDT for local monitoring, various methods such as thermography, eddy current, GPR, ultrasonic, and acoustic emission methods have been investigated using various DL networks to inspect internal damages of structures.

Table 2 presents a comprehensive summary of various NDT methods, encompassing their suitability for different materials, associated benefits and limitations, suitable structures for application, and specific areas of application. It is worth mentioning that these highlights are not exhaustive and may vary depending on the material being tested and other factors. Thermography-based methods, although effective in detecting surface and subsurface defects, have limitations in detecting deeper defects. Solla et al. [79] found that thermography can only detect defects up to 5 cm deep in concrete, while Mac et al. [78] reported a detection limit of 7 cm.

Ali and Cha [69] observed that thermography can identify damages with an area of 15 cm × 15 cm up to a depth of 7 cm from the surface. Passive thermography methods are also dependent on weather conditions as they rely on solar energy, as highlighted by Cotič et al. [81]. Active thermography offers an alternative approach to subsurface damage detection, but only a limited number of methods exist for pixelwise segmentation of interior damages. Therefore, accurate quantification of detected damages remains limited.

AE and ultrasonic techniques, while effective for surface-breaking or near-surface defects, require surface preparation, special training, and noise filtering waveguides in high noise areas [87]. GPR emerges as a suitable method for detecting features at greater depths, with penetration depths reaching several meters depending on the scanned subsurface media. Thermography offers a larger coverage area, while the speed of the camera can reach 20 to 30 FPS. However, all methods, including GPR, thermography, AE, ultrasonics, and eddy current techniques, have limitations and are subject to factors such as the tested material and instrument calibration.
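
Several of the studies reviewed in this section convert a 1D sensor signal into a 2D time-frequency image before passing it to a CNN (for example, Miao et al. [84] apply a CWT to eddy current signals). The following is a minimal pure-Python sketch of that preprocessing step only. It assumes a complex Morlet mother wavelet with center frequency `w0 = 5` and a direct-summation CWT; the function names, the wavelet choice, the scale grid, and the synthetic sinusoid are illustrative assumptions and are not taken from any of the cited papers.

```python
import cmath
import math


def morlet(t, w0=5.0):
    """Complex Morlet wavelet (assumed mother wavelet; normalization constant omitted)."""
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, scales, w0=5.0):
    """Return |CWT| as a list of rows: one row per scale, one column per sample."""
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for b in range(n):  # b is the time shift of the wavelet, in samples
            acc = 0j
            for k in range(n):
                acc += signal[k] * morlet((k - b) / s, w0).conjugate()
            row.append(abs(acc) / math.sqrt(s))
        rows.append(row)
    return rows


# Synthetic stand-in for a sensor trace: a sinusoid at 0.05 cycles per sample.
signal = [math.sin(2 * math.pi * 0.05 * k) for k in range(128)]
scales = [4, 8, 16, 32, 64]
scalogram = cwt_scalogram(signal, scales)

# The strongest mid-signal response is expected near the scale
# w0 / (2 * pi * f) = 5 / (2 * pi * 0.05), i.e. roughly 16 samples.
center = len(signal) // 2
best_scale = max(zip(scales, scalogram), key=lambda pair: pair[1][center])[0]
print("dominant scale:", best_scale)
```

The resulting `scalogram` (scales by time) is the kind of 2D array that would be rendered as an image and fed to a CNN classifier. In practice a dedicated wavelet library and a finer scale grid would likely be used instead of this O(n²) loop.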


Table 2
Summary of NDT techniques.
NDT technique | Suitability | Benefits | Limitations | Applications
Thermography | Concrete internal damage, composites, steel, debonding, insulation systems | Non-contact, detects heat anomalies, large areas | Surface sensitivity, limited depth penetration | Structures, electrical systems, aerospace applications
Eddy Current Techniques | Conductive materials | Surface and near-surface flaw detection | Limited depth penetration, material conductivity issues | Aerospace, automotive, electronics
GPR | Concrete, soil, pavement, geological formations | Detects subsurface variations, deeper defects | Limited penetration in certain materials but good for deeper depths in other materials | Construction, geophysics, utility mapping
Ultrasonic Testing | Metals, composites, concrete, weld inspections, corrosion mapping | Accurate depth sizing, flaw detection | Requires access to the material | Manufacturing, aerospace, oil and gas
Acoustic Methods | Concrete, masonry, other materials | Non-destructive, real-time monitoring | Limited depth penetration, sensitivity to small defects, noise interference | Composites, concrete

Most research papers utilize pre-existing DL networks for their respective tasks; for example, Pozzer et al. [72] used VGG16, ResNet18, ResNet50, and MobileNetV2 with thermographic images to detect concrete defects. Overall, the methods used for interior damage detection have evolved over time, from detection at the image level using traditional CNNs [88], to detection at the bounding box level using faster R-CNN [26] with various ResNet series and DenseNets, and eventually to segmentation at the pixel level by Ali and Cha [69].

However, there are only a few methods available that can segment internal damage on a pixel level, and none of them can conduct quantification of the detected damage. Moreover, there is a lack of information on the processing time with respect to the area size, and the reliability and accuracy of these methods need to be improved. Therefore, future studies should focus on addressing these issues and providing more comprehensive evaluations of NDE techniques in terms of accuracy, processing time, and monitoring area size.

3.2. Computer vision-based approaches for surface damage identification

Cha et al. [24] demonstrated the feasibility of automatically extracting damage-sensitive features from RGB image inputs using CNN, leading to numerous subsequent studies on damage identification utilizing various DL algorithms. The swift progression of computer vision and DL, along with improvements in camera resolution, computational capabilities, and SHM, have expedited the adoption of vision-based damage detection. One benefit of these techniques is their use of contactless cameras, which can be easily integrated with UAVs, Unmanned Ground Vehicles (UGVs), or vehicle-mounted systems for data collection.

Generally, the development of computer vision-based damage detection using DL algorithms involves three stages. The first is image classification-based damage identification, the second is bounding box level object detection-based damage identification, and the third is pixelwise segmentation. These stages are discussed in Sections 3.2.1–3.2.3 respectively. Each approach has been extensively reviewed in the context of various infrastructure types, including bridges, buildings, dams, pavements, tunnels, and sewer systems. Performance comparisons are provided in the tables in each section. However, it's crucial to remember that many of the methods were trained and tested on different datasets. Consequently, these tables should be interpreted as indicative rather than definitive, and it would be misleading to conclude that one method is superior to another based solely on these performance comparisons. Further details on the different structures are provided in the subsequent subsections.

3.2.1. Damage classification
Table 3 presents some key activities of DL for damage classification. In this context, classification means that the DL model categorizes the input image as either a damaged image or intact, or it identifies different types of damages. Consequently, the entire input image is grouped into one of the desired categories as the output of the DL process as shown in Fig. 26.

For instance, Cha et al. [24] developed a CNN composed of four convolution layers, two pooling layers, a dropout layer with ReLU activation function, a fully connected layer, and SoftMax to classify intact and crack images. The CNN utilized a sliding window concept to screen large input images and localize the damage. The sliding window size in this study was 256 × 256 × 3 pixel resolution. As a result, a total of 40,000 images, with the size of the sliding window, were prepared from 277 images of dimensions 4928 × 3264 × 3 and used for training and validation. The well-trained CNN was then tested on another set of 55 images, with dimensions 5888 × 3584 × 3, achieving an accuracy of 97%.

Furthermore, the CNN's performance was compared with traditional edge detection-based crack detection methods, such as Sobel and Canny edge detection. The CNN consistently demonstrated superior performance, significantly outperforming the traditional methods, even under challenging conditions like shadows, blurriness, and strong spot lighting. Notably, the designed CNN was integrated with autonomous UAVs for crack detection in GPS-denied areas, effectively simulating scenarios beneath a bridge deck or indoors, by Kang and Cha [99].

The concept of image classification in DL has been applied to various civil infrastructure applications, including buildings [24], sewer systems [100], and pavements [108], to detect various types of damages, such as cracks in concrete members of these structures, obstacles, joint openings, faults, debris, and silty conditions in sewers, and tile deteriorations as presented in Table 3.

For instance, Li et al. [101] proposed a ResNet-18 model with a hierarchical SoftMax approach for sewer line defect detection with imbalanced data, a condition that otherwise reduces the overall performance of the network. The hierarchical approach supervised the learning process at different levels during training, with the high-level task focused on differentiating between normal images and images with defects, while the low-level task determined the probability of defects within the image. The final classification results were determined using the chain rule of conditional probability.

Hassan et al. [100] proposed an AlexNet model for defect classification in CCTV videos obtained from underground CCTV models. Data augmentation techniques, such as transformation, flipping, rotation, translation, and deformation, were used. The original AlexNet model, initially used for the classification of 1000 natural objects in the ImageNet dataset [55], was modified and fine-tuned with transfer learning for sewer defect detection.

There is also an effort to improve CNN performance through the application of probabilistic approaches. For example, Chen and Jahanshahi [102] proposed the NB-CNN, which utilizes a CNN and a Naive Bayes data fusion scheme to enhance the performance of crack classification. Adam and Sathesh [103] also employed a hybrid approach for accurate crack identification in RGB images of concrete structures. They combined a CNN with an SVM classifier and proposed a noise reduction technique to minimize classifier errors. The authors compared their method with a single-type classifier [104], a statistical method, and
image processing with FFT and demonstrated that their developed hybrid approach outperforms these methods in terms of accuracy.

Table 3
Damage classification.
Network | Structure | Damage | Train | Val. | Test | Input size | Prec. (%) | Acc. (%) | CBS
CNN (Cha et al. [24]) | Building | Crack | 222 | 55 | 55 | 5888 × 3584 | – | 97.00 | No
AlexNet (Hassan et al. [100]) | Sewer | Joint open & faulty, debris silty | 24,137 | – | 1200 | 256 × 256 | – | 96.50 | No
DCNN (Li et al. [101]) | Sewer | Deposit settlement, breakage, shape change, obstacles, water level stag, joint offset, and deformation | 7933 | 1043 | 1002 | 224 × 224 | – | 83.20 | No
NB-CNN (Chen and Jahanshahi [102]) | Building | Crack | 4260 | 1065 | – | 120 × 120 | 98.30 | – | No
CNN (Perez et al. [105]) | Building | Mold | 1890 | 382 | 782 | 224 × 224 | 90.00 | – | No
CNN (Xu et al. [106]) | Bridge | Crack | 4856 | – | 1213 | 224 × 224 | – | 96.3 | No
Meta learning CNN (Guo et al. [107]) | Building | Crack | 20,909 | 350 | 350 | 500 × 500 | 84.00 | – | No
CNN VGG-19 (Rao et al. [108]) | Pavement | Crack | 36,000 | 12,000 | 12,000 | 256 × 256 | 97.90 | 95.00 | No
CNN VGG-16 (Kung et al. [109]) | Building | Tile deterioration | 5112 | 568 | – | 224 × 224 | – | 86.00 | No

Fig. 26. Schematic view of damage classification using sliding window in CNN.

It should be noted that civil infrastructures are typically located in complex background scenes (CBSs), making it a challenging problem for the developed DL models to detect damages solely based on the CBSs. Among these studies, there is no method that considered CBSs in their training or testing datasets. Additionally, the CNN used a sliding window technique to localize the detected damage within the input image, as shown in Fig. 26. However, due to the fixed size of the window, determining the optimal size becomes challenging due to various sizes of different damages, camera sensor and lens types, and different object and camera distances. Therefore, bounding box level object detection methods were adopted to detect damages.

3.2.2. Bounding box level damage detection
Bounding box level object detection methods of DL can resolve the limitations of the fixed sizes of sliding window techniques in damage detection and localization problems. For example, the faster R-CNN provides flexible sizes of bounding boxes to localize the detected damages in input images as shown in Fig. 27.

Cha et al. [27] established various damage datasets and trained the faster R-CNN with four separate steps. The trained faster R-CNN achieved an accuracy of 89.7% in detecting structural damage in bridges by considering CBS. Bridge structures are exposed to vehicular loading and extreme weather conditions that deteriorate their structural integrity, leading to complete failure or physical changes that can be represented as damage, such as cracks, corrosion, loosened bolts, settlement, deflections, excessive vibration, internal defects, spalling, and delamination.

There is a considerable number of applications of bounding box level DL networks using R-CNN, faster R-CNN, single-shot multi-box detector (SSD) [110], DINN [111], and various YOLO series from v1 to v8. Example types of damage detected using bounding box level damage detection and localization are delamination [27], peeled paint [27], ceiling damage [112], steel cracks [113], steel corrosion [113], and loosened bolts [113].

Additionally, Yeum et al. [114] also employed AlexNet to identify and localize welded connections of steel bridges as the target area for inspection using input images collected from a UAV. Zhang et al. [115] introduced a different approach for detecting multiple types of concrete damage in highway bridges by utilizing YOLOv3 [116]. Li et al. [110] used transfer learning from a convolution-based autoencoder to SSD for training on images for post-disaster building damage detection due to Hurricane Sandy. It showed that transfer learning improved damage detection performance by approximately 10%.

Cheng and Wang [117] proposed the use of faster R-CNN for defect detection in sewer lines, utilizing 3000 images from CCTV videos for model training. The authors achieved a robustness measurement of 83% mAP, and suggested that increasing dataset size, adjusting filter dimensions, and adding convolution layers and stride values could enhance performance.

Table 4 provides a summary of various articles, classified according to material type, subtype, damage type, network architecture, and references. Damage detection at the bounding box level is superior in terms of localization compared to image classification based approaches (see Section 3.2.1), and it is significantly less expensive to establish data compared to pixelwise segmentation approaches, which will be discussed in the following section. Hence, this method is extensively used in SHM. However, it is still insufficient for performing damage quantification, which is the final step of SHM's reliability analysis.


Fig. 27. Schematic view of bounding box level damage detection method.

3.2.3. Pixel level damage segmentation
To quantify the detected damage, researchers have devised several methods for segmenting damage at the pixel level. These methods utilize DL techniques to identify defects and damages on a per-pixel basis, as depicted in Fig. 28. This approach is more precise than merely identifying approximate bounding box positions of the damage. Termed semantic segmentation, this technique aims to pinpoint damaged pixels within an image frame. The primary advantage of pixel-level damage segmentation lies in its capacity to provide a more detailed means of ascertaining a defect's shape, size, and location. This, in turn, enables researchers to more accurately identify and quantify the extent of damage.
The majority of semantic segmentation networks adopt an encoder-decoder architecture. This structure serves to extract damage features and restore the original spatial dimensions of the input image. It achieves this by marking the object in a pixelwise manner, thereby presenting the detected damage pixel by pixel. Numerous comprehensive studies have been conducted in this field, with only a subset of these methods being presented in Table 5.
The majority of this field of study has centered around adapting existing publicly available networks originally developed for tasks unrelated to structural damage segmentation. For instance, networks such as SegNet, UNet, FCN, Mask R-CNN, DeepLabV3+, and PANet [52], often combined with faster R-CNN, ResNet series, DenseNet series, and VGG series, have been repurposed as the foundational architecture for these methods. To illustrate, Dong et al. [119] proposed a pixel-wise segmentation network for crack and spalling based on the SegNet algorithm. The team integrated the focal loss (FL) function into the FL-SegNet model for detecting crack and spalling defects.
PANet was combined with the A* algorithm for crack width and length calculation [120]. The proposed approach was compared to UNet, mask R-CNN [50], and DeepCrack [121], achieving an mIoU of 50.28%, surpassing mask R-CNN by 2%, DeepCrack by 19%, and UNet by 23%. Zhao et al. [122] employed Mask R-CNN for tunnel crack segmentation. Liu and Wang [123] developed a crack segmentation approach based on UNet by integrating VGG19, InceptionResNetV2, and ENetb3. Ji et al. [124] also harnessed DeepLabV3+ for crack segmentation. Similarly, Xi et al. [125] created YDRSNet by integrating DeepLabV3+ and YOLOv5 to address gear fitting measurement challenges. In addition to identifying cracks, DL techniques have been employed to identify other defects, including concrete spalling and internal damage. Beckman et al. [126] applied a DL network to depth camera data for volumetric damage quantification, using faster R-CNN with a depth camera to detect spalling in the depth stream, achieving an average precision of over 90%. In summary, numerous studies have been conducted on SHM with different goals in mind. These goals include improving network performance by focusing on enhancing metrics such as mIoU, F-score, recall, and precision. There is a wealth of research exploring these existing network-based and hybrid segmentation methods for damages.
Kang et al. [127] introduced a computer vision-based hybrid approach for detecting cracks and determining their thickness and length. Their method employed a modified faster R-CNN for crack detection and a modified tubularity flow field algorithm for conducting pixel-level crack segmentation within bounding boxes obtained from faster R-CNN. Additionally, they utilized a modified distance transform method to predict the length and thickness of cracks. These approaches achieved an average precision of 95%, an Intersection over Union (IoU) of 83%, and an accuracy of 93%, respectively.
Employing these existing networks proves advantageous in streamlining the implementation of DL for SHM. Nevertheless, these networks often carry undue complexity because their original objectives differ from structural damage segmentation. For example, ResNet-101 and UNet consist of 44.5 million and 31 million learnable parameters, respectively. These extensive networks require significant labeled data for training, incurring high computational costs and rendering real-time processing of normal videos (e.g., 1000 × 500 × 3) unfeasible. Several studies have aimed to refine these networks to enhance performance.
For instance, Huang et al. [128] proposed a DL method for defect segmentation in tunnels. They developed a new moving tunnel inspection tool to collect data from tunnels that contained cracks and leaks, which were used to develop the dataset. The data were then used to train the FCN model, separately for crack and leakage detection. The study defined different defect conditions, such as crack only, leakage only, and overlapping and non-overlapping crack-leakage. The authors utilized two-stream algorithms, namely sliding window operations (SWO) and resizing interpolation operations (RIO), for defect recognition. The SWO was used for recognizing cracks, and RIO was used for detecting leakage.

Table 4
Bounding box level damage detection.
Network Structure Damage Train Val. Test Input size Prec. (%) Acc. (%) CBS

Faster R-CNN (Cha et al. [27]) Bridge Crack, delamination, steel bolt corrosion 1653 – 713 500 × 375 89.70 – Yes
SSD (Li et al. [110])
DINN (Ali et al. [111]) Tunnel Crack 3600 300 96 640 × 480 – 94.00 No
Faster R-CNN Inception (Semwal et al. [112]) Building Ceiling 5800 – 100 768 × 1024 – 89.53 No
Faster R-CNN & ResNet101 (Ali et al. [113]) Steel bridge Crack, loosened bolt, corrosion 2971 – 322 1280 × 720 93.31 92.16 No
AlexNet (Yeum et al. [114]) Truss Steel crack 520 256 × 256 94.13 No
YOLOv3 (Zhang et al. [115]) Bridge Crack, spalling, exposed bar, pop-out 1764 – 441 1280 × 960 – 80.0 No
Faster R-CNN (Cheng and Wang [117])
SSD MobileNets (Perez and Tah [118]) Building Crack 700 – 176 224 × 224 53.00 61.00 No
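
The precision, recall, F1-score, and mIoU values quoted in this section and in Tables 4 and 5 all derive from counts of true positives (TP), false positives (FP), and false negatives (FN), as in Eqs. (1)–(4). The following is a minimal sketch of that arithmetic in plain Python. The tiny flattened masks and all function names are hypothetical illustrations, not data or code from any reviewed study, and classes with an empty union are skipped by assumption.

```python
def precision(tp, fp):
    """Eq. (2): fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Eq. (1): fraction of actual positives that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Eq. (3): harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mean_iou(per_class_counts):
    """Eq. (4): average of TP / (TP + FP + FN) over classes.

    per_class_counts is a list of (tp, fp, fn) tuples, one per class.
    Classes with an empty union are skipped (an assumption of this sketch).
    """
    ious = [tp / (tp + fp + fn) for tp, fp, fn in per_class_counts if (tp + fp + fn)]
    return sum(ious) / len(ious) if ious else 0.0


def confusion_counts(pred, truth, cls):
    """Count (TP, FP, FN) for one class over flattened label sequences."""
    tp = sum(p == cls and t == cls for p, t in zip(pred, truth))
    fp = sum(p == cls and t != cls for p, t in zip(pred, truth))
    fn = sum(p != cls and t == cls for p, t in zip(pred, truth))
    return tp, fp, fn


# Hypothetical 2 x 4 masks flattened row by row: 1 = crack, 0 = background.
truth = [0, 1, 1, 0, 0, 1, 0, 0]
pred = [0, 1, 0, 0, 1, 1, 0, 0]

crack = confusion_counts(pred, truth, cls=1)
background = confusion_counts(pred, truth, cls=0)

print("crack precision:", precision(crack[0], crack[1]))
print("crack recall:   ", recall(crack[0], crack[2]))
print("crack F1-score: ", f1_score(*crack))
print("mIoU:           ", mean_iou([crack, background]))
```

Because the IoU denominator includes both FP and FN, a single score penalizes over-segmentation and missed pixels together, which is the property the text cites in favor of mIoU.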


Fig. 28. Schematic view of pixel wise damage segmentation method.

Table 5
Pixel level damage segmentation.
Network Structure Damage Train Val. Test Input size mIoU (%) Prec. (%) Acc. (%) FPS CBS

CrackNet (Zhang et al. [130]) Pavement Crack 1800 – 200 1024 × 512 90.00 – No
FPHBN (Yang et al. [131]) Pavement Crack 1896 – 908 480 × 320 56.00 – – 5.10 No
FCN (Wang et al. [132]) Dam Crack 1200 150 150 608 × 608 60.15 – – – Yes
SDDNet (Choi and Cha [33]) Building Crack 160 – 40 1024 × 512 84.60 – – 36 Yes
SegNet (Dong et al. [119]) Tunnel Crack 226 – 72 256 × 256 81.53 – 69.86 – Yes
FCN (Hoskere et al. [133]) Pavement Crack 1200 160 320 288 × 288 79.80 – 91.70 – No
Mask RCNN (Zhao et al. [122]) Tunnel Crack 3622 906 503 800 × 800 – 98.46 – 9.34 No
Improved PANet (Zhao et al. [120]) Tunnel Crack 572 143 79 3000 × 3000 50.26 80.95 – No
DeepCrack (Liu et al. [121]) Tunnel Crack 572 143 79 3000 × 3000 45.46 31.11 – No
DcsNet (Pang et al. [134]) Building Crack 250 50 200 256 × 256 56.60 – – 101 No
STRNet (Kang and Cha [135]) Building Crack 1203 – 545 1280 × 720 92.60 91.70 – 49.2 Yes
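
The FPS values in Table 5 were measured at different input sizes, so they are not directly comparable. One rough way to put them on a common footing is pixel throughput, that is, FPS multiplied by the number of input pixels. This normalization is an assumption of the sketch below and not a metric used by the reviewed papers; inference time does not scale exactly linearly with pixel count. The FPS and input sizes are taken from the rows of Table 5 that report both.

```python
def megapixels_per_second(fps, width, height):
    """Rough throughput: frames per second times pixels per frame, in MP/s."""
    return fps * width * height / 1e6


# (FPS, width, height) as listed in Table 5.
table5 = {
    "FPHBN": (5.10, 480, 320),
    "SDDNet": (36, 1024, 512),
    "Mask R-CNN": (9.34, 800, 800),
    "DcsNet": (101, 256, 256),
    "STRNet": (49.2, 1280, 720),
}

throughput = {name: megapixels_per_second(*row) for name, row in table5.items()}
ranked = sorted(throughput, key=throughput.get, reverse=True)

for name in ranked:
    print(f"{name:10s} {throughput[name]:6.2f} MP/s")
```

Under this normalization, SDDNet and STRNet rank above DcsNet even though DcsNet reports the highest raw FPS, which illustrates why FPS should be reported together with the test input size.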

The developed method exhibited superior performance when compared with the region growing algorithm (RGA) and adaptive thresholding algorithm (ATA). The proposed method's maximum error rate was 0.8%, significantly less than RGA (7.1%) and ATA (18.4%).
Jang et al. [129] adapted SegNet [47], originally an encoder-decoder network with 26 convolution layers and 10 pooling layers, for concrete crack segmentation. The modified SegNet reduced convolution layers to 20 and pooling layers to 8, optimizing it for binary classification. This approach additionally quantified detected cracks by determining their width, length, and shape through Euclidean distance transformation.
However, the majority of the above-mentioned methods still remain computationally intensive, rendering them generally unsuitable for real-time processing of inspection videos with a minimum frame rate of 30 FPS. While Pang et al. [134] achieved an FPS of 101, this was achieved using a very small input image size (288 × 288), which is not cost-efficient for monitoring large-scale infrastructures. Furthermore, many of these methods do not consider CBS, instead focusing on pure concrete or pavement surfaces.
In this context, Choi and Cha [33] took the initial step by proposing a deep convolutional encoder-decoder, integrated with ASPP and depthwise separable convolution (DWSC), specifically tailored for concrete crack segmentation in CBS. To develop their distinct DenSep module for crack segmentation, they harnessed DWSC and applied the PWC concept, followed by DWSC in reverse order, effectively reducing computational costs. Their innovation laid the foundation for SDDNet, which adopted an ASPP concept in a modified manner for crack segmentation. SDDNet, with 0.16 M learnable parameters, enabled real-time processing of videos at 36 FPS with substantial RGB input images (1024 × 512).
Kang and Cha [135] introduced STRNet, an entirely new DL network. This network employed a combination of global and self-attention mechanisms, coarse upsampling, a focal-Tversky loss function, and learnable nonlinear swish activation functions. Performance evaluations, conducted against Attention UNet, CrackSegNet, DeepLabV3+, FPHBN, and UNet++, showed that STRNet achieved the highest precision, recall, F1 score, and mIoU, reaching 91.7%, 92.7%, 92.2%, and 92.6%, respectively. Furthermore, with 2.1 M learnable parameters, STRNet's processing speed was found to be more than adequate for real-time operations, clocking in at over 49 FPS even with large input image sizes (1280 × 720, 1024 × 512).
Many evaluation metrics are available. Precision and recall evaluate a classification model based on its ability to correctly identify true positives and avoid false positives, and its ability to correctly identify true positives while minimizing false negatives, respectively. The F1-score combines recall and precision by giving them equal weight, but it can be challenging to intuitively grasp its values. In contrast, the mIoU metric takes into account false positives and false negatives and provides a more intuitive and comprehensive assessment of the model's performance. Therefore, it is currently considered the most suitable metric (Kang and Cha [135]; Müller et al. [136]). Fig. 29 visualizes mIoU, which measures the ratio of the intersection area between the predicted and ground truth masks to their union area.

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{1} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3} \]

\[ \mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i + FN_i} \tag{4} \]

Fig. 29. Visualized IoU metric.

3.2.4. Summary of computer vision based approaches
Computer vision with DL approaches has been extensively reviewed


in Section 3.2 for surface damage identification dependent on structure types such as bridges, buildings, pavements, dams, tunnels, and sewer systems. The initial objective of such research was to design robust classifiers that can detect damage and minimize the effects of measurement noises, such as shadows, spotlighting, and blurs (Cha et al. [24]).
The next stage in this direction was the localization of damage, which aimed to define the most probable position of detected damage and identify multiple types of damage using bounding boxes (Cha et al. [27]; Zhang et al. [115]). This research was then extended to more precise, pixel-level damage segmentation using various models, such as those of Choi and Cha [33], Dong et al. [119], Kang and Cha [135], and Liu et al. [121]. Each of these three approaches has its own advantages and limitations, which are summarized in Table 6.
While there have been a considerable number of studies on the application of DL approaches for the segmentation of concrete and pavement cracks in SHM, some limitations have been noted. Many of these methods are based on existing networks that were originally developed for different purposes, resulting in a large number of learnable parameters that may make them less suitable for real-time processing of large input images. The size of input images can affect several factors, including the computational resources required for image processing and the accuracy of the methods [137].
A larger input size can result in more detailed representations of the structure and, therefore, improved accuracy, but it may also increase the computational resources required to process the images and the size of the image files. Additionally, the number of data samples and the separation of training, validation, and testing data can also impact the results and generalizability of the methods.
When developing computer vision-based damage segmentation methods for real-world SHM applications, it is important to consider the CBS that are commonly encountered, rather than just pure concrete and pavement surfaces. This is necessary to ensure more robust damage identification methods that can operate in versatile visual environments. To address the limitations of existing crack segmentation algorithms, researchers have developed new and highly efficient approaches for identifying cracks in concrete.
Several networks, such as SDDNet [33], FCN [132], and STRNet [135], have been specifically developed for damage segmentation and have demonstrated good performance in complex environments (as described in Sections 3.2.2 and 3.2.3). Additionally, it is important to report a model's FPS together with the size of the test input images used in the FPS calculation.
Advanced DL operations, including ASPP, dilated convolution, depthwise separable convolution, nonlinear activation functions, concatenation (rather than just summation of multilevel features), residual connections, and various attention mechanisms [135], are valuable because they enable more efficient and accurate feature extraction and integration, ultimately improving the performance of the damage segmentation model while maintaining real-time processing speed for relatively large input images.

3.3. Integration of DL and UAV

The progress made in DL has significantly enhanced the identification of damage; however, monitoring and gathering data on large-scale structures remains a major challenge. Conventional approaches to monitoring dam structures, such as physically climbing the dam wall or using cranes and telescopes, are risky, costly, and time-consuming. To address this issue, researchers have focused on integrating computer vision-based methods into damage detection, including the use of DL with UAVs.
Table 7 highlights some significant activities of DL with UAV for damage detection in various structures. For example, Wu et al. [138] employed a UAV equipped with DL algorithms to detect crack damage in bridges, as well as distress in concrete pavement; Kim et al. [139] merged an AlexNet network with UAV technology for concrete crack detection; and Rau et al. [140] employed a multi-rotary UAV and performed object-based image analysis for bridge crack detection.
Feng et al. [141] collected data from the dam surface using a DJI Mavic 2 UAV along a predefined path. A dataset of 100 raw images (5472 × 3648) was generated and manually cropped to develop datasets for training and testing the network. The network's performance was compared with SegNet, UNet, FCN, and ResNet-152. The method achieved an accuracy of 80.45%, precision of 80.13%, and F-measure of 79.16% for crack segmentation. The proposed method also achieved an mIoU of 66.76%, which was 2% higher than SegNet, 4% higher than UNet, 11% higher than FCN, and 19% higher than ResNet-152.
Arjoune et al. [143] applied an instance segmentation and clustering approach to energy assessment in built environments and used a UAV to collect thermal images from a building envelope. Samma et al. [144] incorporated road image data collected using a UAV into a pretrained CNN for damage detection. Shi et al. [142] used high-resolution images obtained via a UAV and combined them with an FCN for bridge damage segmentation and corrosion detection in bridge structure images. Yu et al. [145] employed the YOLOv4-FPM model for UAV-based crack detection on a bridge. The authors used focal loss as a loss function, which improved the accuracy of crack detection in CBS. Tian et al. [146] designed a UAV to collect data from a power grid line, and the collected data were used for dataset preparation and data augmentation with CycleGAN.
Despite the usefulness of the methods discussed previously, they all require manual control of UAVs. Additionally, controlling UAVs in areas with poor GPS coverage, such as the undersides of bridges, tunnels, or building structures, can be challenging. To address these issues, Kang and Cha [99] pioneered the development of an autonomous navigation system and flight method for UAVs. This approach was integrated with CNN [24] to detect concrete cracks in GPS-denied environments. They utilized an ultrasonic beacon system (UBS) to localize UAVs. The damage detection CNN showed that, even though the images were slightly blurry due to vibrations from the UAV wing motors, it could detect concrete cracks satisfactorily.
Ali et al. [113] extended the autonomous method to multiple damage detection by modifying faster R-CNN and applied it to detect damage on a small prototype steel bridge and a real parking structure. The autonomous UAV successfully followed predefined trajectories, and real-time damage detection was achieved from the UAV

Table 6
CV based method comparative analysis.
Network | Applications in SHM | Advantage | Limitation
Image classification | Damage classification within images | Less computational cost and convenient to establish ground truth dataset | Difficulties in damage localization and quantification
Bounding box level detection | Damage detection within images | Better damage localization and quantification than image classification approaches, and convenient to establish ground truth dataset | Still less effective than pixelwise segmentation methods in damage localization and quantification
Pixelwise segmentation | Damage segmentation within images | High accuracies in damage location and quantification; through an object specific design of networks, real-time, efficient processing is possible (ex., SDDNet, STRNet) | More computational cost if existing heavy segmentation networks used, and tedious labeling of ground truth data
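
Section 3.2.4 names depthwise separable convolution among the operations that keep purpose-built networks such as SDDNet and STRNet light enough for real-time use. The arithmetic below is a minimal sketch of the standard parameter-count comparison between an ordinary convolution and a depthwise separable one. The layer sizes are hypothetical and are not taken from SDDNet or STRNet, bias terms are omitted, and the function names are illustrative.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
kernel, c_in, c_out = 3, 64, 128
standard = standard_conv_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)
ratio = separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"ratio:     {ratio:.3f}")
```

The ratio equals 1/c_out + 1/kernel², so for 3 × 3 kernels and wide layers the separable form needs roughly one ninth of the weights, which is consistent with the small parameter counts reported for SDDNet (0.16 M) relative to general-purpose networks.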


Table 7
UAV application for SHM.
Network Type & task Train Val. Test Input size mIoU Prec. Acc. Auto.

Faster R-CNN with UAV (Kang and Cha [99]) Building (Crack) 40K – – 2304 × 1280 – – 96.60 Yes
CNN with UAV (Wu et al. [138]) Multiple damages 2288 – – – – – 79.00 No
R-CNN and UAV (Kim et al. [139]) Crack quantification 384 384 – 256 × 256 – – – No
Deep CNN (Feng et al. [141]) Crack 404 50 50 608 × 608 66.76 80.31 – No
FCN with UAV (Shi et al. [142]) Corrosion 9728 – 2719 224 × 224 – 88.68 – No
Mask R-CNN (Arjoune et al. [143]) Heat loss 100 K – – 512 × 640 66.00 53.00 – No
Pre-trained CNN (Samma et al. [144]) Road damage 422 – 107 224 × 224 – – 97.20 No
YOLOv4-FPM (Yu et al. [145]) Crack detection 2768 – – 1000 × 1000 – 97.60 – No
Faster R-CNN (Ali et al. [113]) corrosion, cracks, loosened bolts 2673 297 322 1280 × 720 92.16 93.11 – Yes
CycleGAN (Tian et al. [146]) Power line rust 9080 – – 256 × 256 – 66.80 – No
Obstacle avoiding UAV (Waqas et al. [147]) Crack detection 1203 – 599 1024 × 512 92.50 – – Yes

videos. Fig. 30 illustrates the flowchart of the real-time UAV-driven multiple damage mapping in GPS-denied environments. The blue solid line in Fig. 30 represents the training process, including data collection, dataset preparation, and network training. The red dashed line indicates the input of real-time testing videos into the trained model for damage detection.
As shown in Fig. 30, data is collected from a target structure using an autonomous UAV equipped with a camera. One notable contribution of this study is the detection of extremely thin steel cracks, ranging from 0.010 to 0.015 m in length and 0.00009 to 0.0002 m in thickness.
Waqas et al. [147] proposed a method for improving the performance of autonomously flying UAVs by utilizing fiducial marker-based localization. Their comparative study demonstrated that the use of fiducial markers is more accurate, stable, and robust for UAV localization compared to the use of UBSs for SHM. Furthermore, their autonomous flight method enables obstacle avoidance within the planned trajectory of autonomous UAVs. This capability enhances the reliability and practicality of applying UAVs to real-world problems.

3.3.1. Summary of integration of DL and UAV
The utilization of computer vision sensors aided by DL for damage detection in civil structures presents a clear challenge due to the potential expenses associated with installing numerous cameras and their future maintenance. As a solution, UAVs have been increasingly embraced to address these concerns. A spectrum of DL networks has been explored, ranging from image-level classification to pixel-wise segmentation, for detecting various types of damages in diverse civil structures. Researchers have focused on integrating DL with UAVs [139–145].
Despite these advancements, most of these methods rely on manually controlled UAVs, and poor GPS coverage makes it difficult to control UAVs in the undersides of bridges, tunnels, or building structures. To overcome these challenges, Kang and Cha [99] proposed a combination of DL and autonomous UAVs for concrete crack detection in GPS-denied environments, using a UBS to control autonomous flight. Furthermore, Ali et al. [113] extended this method and used it for multiple damage detection on a small prototype steel bridge, and Waqas et al. [147] used fiducial markers as the localization method to overcome the limitation of UBSs and integrated an obstacle avoidance method.
The different implementations of UAVs, ranging from manual control to full automation, are evaluated and summarized in Table 8, highlighting their respective benefits and limitations. Nevertheless, the practical application of these methods to actual civil structures remains limited, primarily because they are still at the experimental testing stage.

3.4. Digital twin with DL for SHM

The use of 3D reconstruction techniques has attracted significant attention in diverse fields. While 3D reconstruction can create digital twin models, the methods discussed in Sections 3.2 and 3.3 mostly focus on using 2D images for identifying structural damage. However, a challenge faced by 2D approaches is the lack of depth information, which is crucial for comprehensively understanding the overall damage status of a structure of interest. A more holistic understanding of damage and structural status is necessary for systematic monitoring and maintenance. Therefore, 3D reconstruction is currently receiving significant attention.
In SHM, several general steps involve using 3D reconstruction with DL-based damage identification, as depicted in Fig. 31. The first step is data collection, which employs 2D photogrammetry with monocular or binocular cameras, or various types of 3D scanners, including depth cameras and light detection and ranging (LiDAR). Depending on the collected data types, the approaches for generating 3D reconstruction models differ. Ma and Liu [148] summarized two distinct methods for 3D reconstruction, forming the basis for the enhanced process shown in Fig. 31, specifically tailored for SHM purposes.
When 2D photogrammetry is used for data collection, the collected data follows the general steps of structure from motion (SfM), dense 3D reconstruction, preprocessing, and point cloud, mesh, or geometric model generation, based on the objectives. In SHM, the common choice to date is the use of mesh models with real textures [124,149]. In contrast, 3D scanned data skips the steps of SfM and dense 3D reconstruction, proceeding directly to preprocessing, as illustrated in Fig. 31.
The generated models are used in two different ways for damage identification. The first approach involves directly identifying structural damage using DL networks from the collected 2D or 3D data, updating the damage's sizes and locations on the generated 3D models [150]. The second approach [151] uses the generated 3D model as input for DL networks to identify damage, as shown in Fig. 31, with the identified

Fig. 30. Multiple damage mapping using UAVs in real time in a GPS-denied environment.


Table 8
Comparison of DL with UAV applications.
Control | Application in SHM | Advantage | Limitations | Authors
Manual UAV | Manual localization of UAV and detected damage | Cost-effective and efficient monitoring | Limited to accessible areas; GPS dependence and require skilled pilots | Kim et al. [139]; Rau et al. [140]
Autonomous UAVs with ultrasonic beacon | Automatic mapping of detected damage; concrete crack detection in GPS-denied environments | Autonomous flight control; enhanced accessibility; good for GPS denied environment | Limited practical application; experimental stage | Kang and Cha [99]
Autonomous UAVs with ultrasonic beacon | Automatic mapping of detected damage; damage detection in steel and concrete structure | Autonomous flight control; enhanced accessibility; real time | Additional beacons are required if pier walls obstruct UAV flight; vulnerable to various magnetic fields | Ali et al. [113]
Autonomous obstacle avoidance with fiducial marker | Crack detection in concrete and avoiding obstacles | Collision avoidance of UAV; more accurate and robust localization of UAV and detected damages through geo-tagging | More robust obstacle avoiding methods required | Waqas et al. [147]

damages being mapped onto the model. However, Ding et al. [152] recently indicated that the first approach yields significantly better results, as the resolution of the generated 3D model is inadequate for detecting minor structural damage, such as thin cracks. The mapped damage from the 3D model can be integrated with building information models (BIMs) [153] for further structural analysis and maintenance.
To develop a 3D model, the collected data from 2D photogrammetry undergoes processing through SfM. SfM includes feature extraction, feature matching, device motion estimation, and absolute scale recovery. Representative traditional features include, but are not limited to, multi-scale descriptors (MSD) [154], scale-invariant feature transform (SIFT) [155], features from accelerated segment test (FAST) [156], and speeded-up robust features (SURF) [157].
Using the extracted features, feature matching is performed to match feature points in each image pair. For this feature matching, various methods have been extensively used, including approximate nearest neighbors (ApNN), random sample consensus (RANSAC) [158], optimized random sampling algorithm (OSRA) [159], closed constraint adjustment (CCA) [160], and the fast library for approximate nearest neighbors (FLANN) [161]. ApNN predicts the Euclidean distance of feature points, and false matches from ApNN can be removed by RANSAC or by OSRA, which is an updated version of RANSAC.
As depicted in Fig. 31, the subsequent step is motion estimation of the data collection device, which aims to determine camera parameters, including intrinsic parameters (such as focal length and radial distortion) and extrinsic parameters (rotation and translation) for each image pair, utilizing epipolar geometry in conjunction with the features extracted in the previous step. Epipolar geometry represents a pair of images captured from distinct angles at different locations, establishing the relationship between points in 3D space and their corresponding observations, as demonstrated in Fig. 32 [162].
Epipolar geometry employs essential and fundamental matrices to signify the correspondence between matched feature points within the image pair. Utilizing these matrices, the most common approaches for determining camera intrinsic and extrinsic parameters include the five-point algorithm (FPA) [163], the eight-point algorithm (EPA) [164], and the direct linear transform (DLT) [165].
Using the feature points obtained from feature extraction and feature matching, along with the camera parameters derived from the epipolar geometry concept, a sparse 3D reconstruction can be achieved using the triangulation algorithm [166]. This algorithm determines the 3D locations of points in the global coordinate system (GCS), relative to the local coordinate systems (LCS).
With the sparse 3D reconstruction generated by SfM and the camera parameters from the device motion estimation, combined with additional images, a dense 3D reconstruction becomes feasible through well-

Fig. 31. 3D reconstruction in SHM using a TLS approach.
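
The device motion estimation step described in this section rests on the epipolar constraint: for a correctly matched pair of normalized image points x1 and x2, the essential matrix E = [t]x R satisfies x2ᵀ E x1 = 0. The sketch below checks this numerically in plain Python. The camera pose, the 3D points, and all function names are hypothetical values chosen for illustration, and the example assumes noise-free, calibrated (normalized) image coordinates; it is not the procedure of any specific cited algorithm.

```python
import math


def cross_matrix(t):
    """Skew-symmetric matrix [t]x such that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_matrix(rotation, translation):
    """E = [t]x R for a second camera whose coordinates are X2 = R X1 + t."""
    return matmul(cross_matrix(translation), rotation)


def project(point):
    """Normalized image coordinates (x/z, y/z, 1) of a 3D point in the camera frame."""
    x, y, z = point
    return [x / z, y / z, 1.0]


def epipolar_residual(E, x1, x2):
    """x2^T E x1, which is zero for a correct match under noise-free geometry."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))


# Hypothetical relative pose: small rotation about the y-axis plus a translation.
theta = 0.1
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [-0.5, 0.0, 0.1]
E = essential_matrix(R, t)

# A hypothetical 3D point seen by both cameras, and a different point for contrast.
X1 = [0.3, -0.2, 4.0]
X_other = [0.3, 0.5, 4.0]
X2 = [a + b for a, b in zip(matvec(R, X1), t)]
X2_other = [a + b for a, b in zip(matvec(R, X_other), t)]

residual_true = epipolar_residual(E, project(X1), project(X2))
residual_wrong = epipolar_residual(E, project(X1), project(X2_other))

print("correct match residual:  ", residual_true)
print("incorrect match residual:", residual_wrong)
```

Outlier-rejection schemes such as RANSAC can use the size of this residual to separate consistent matches from false ones when estimating E from noisy correspondences.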


optical laser triangulation system as a 3D model generation method for


concrete damage detection by fusing a linear laser ray generator and a
sports camera. Beckman et al. [126] presented a method for quantifying
the volume of concrete spalling using a depth camera. This method
utilizes RANSAC for fitting the concrete surface by integrating faster R-
CNN.
The spatial coordinate data precision obtained using LiDAR is
remarkably high, ranging from 1 to 10 mm [176]. However, the use of a
laser scanner can be limited in certain fields due to its high cost and
challenges related to diffused reflection on natural vegetation and soil
surfaces, leading to penetration issues [177]. Additionally, installing
LiDAR equipment onsite can be time-consuming, expensive, and
disruptive to facility operations.

known approaches such as semi-global matching (SGM) [161], and
clustering multi-view stereo (CMVS) [167].
Typically, the generated dense 3D models, whether from 2D or 3D
measured data, contain noise, outliers, or redundancy. Consequently,
further processing, such as registration, noise filtering, outlier
removal, or downsampling, or combinations thereof, can be performed to
enhance the quality of the 3D model, as illustrated in Fig. 31.
Representative methods include the iterative closest point (ICP)
algorithm [168] for registration, manual and automatic noise filtering,
RANSAC for outlier removal, and the point spacing strategy (PSS) [169]
for downsampling. Ultimately, 3D mesh models, with or without real
textures, are generated based on the intended purpose.

3.4.1. Application of 3D scanning for 3D reconstruction
The field of 3D reconstruction models for SHM comprises two distinct
groups of studies: 2D photogrammetry-based reconstruction and 3D laser
scanning-based reconstruction. Recent years have seen considerable
efforts to enhance and automate point cloud segmentation and modeling
in the context of 3D laser scanning-based reconstruction. For instance,
a fully automated point cloud segmentation technique based on LiDAR was
proposed for SHM in masonry arch bridges by Riveiro et al. [170]. This
method employed principal component analysis to segment bridge
components such as piers, arches, and spandrel walls.
Xu et al. [153] developed terrestrial laser scanning (TLS) methods
combined with image processing techniques. These methods were used for
assessing cracks in bridges and for a 3D reconstruction approach to
evaluate surface damage, such as cracks, surface voids, and
honeycombing in prefabricated elements.
Researchers have proposed various TLS data collection strategies to
efficiently scan structures of interest. These include the top-down
approach by Lu et al. [171], the divide-and-conquer method by Zhang
et al. [104], and the hierarchical scanning approach suggested by Jia
and Lichti [172]. Lu et al. [171] proposed a top-down method for
detecting structural components in RC bridges from point cloud data,
which achieved high performance and reduced computational costs by
segmenting pier caps, girders, and deck assemblies and merging
overlapping segments into labeled point clusters. However, these TLS
data collection methods have limitations, such as difficulties in
handling complex geometries and small girder spacing, which require
further investigation. Besides data collection strategies, factors such
as environmental conditions, geometric characteristics of the target
structure, and the reflective properties of surfaces also impact data
quality. The quality of the laser scanning equipment is also crucial
[173].
Jiang et al. [174] introduced a 3D model reconstruction method for
construction site layout planning using UAV images, involving general
SfM and bundle block adjustment algorithms to determine camera
parameters and optimize the 3D model. Hua et al. [175] proposed an

3.4.2. Application of 2D photogrammetry for 3D reconstruction

Fig. 32. Epipolar geometry.

To overcome the limitations of 3D scanning-based 3D model
reconstruction, 2D photogrammetry can serve as an alternative technique
to acquire point cloud data by arranging 2D multi-view images, using
methods such as multi-view stereo (MVS) [178] and structure from motion
(SfM) [179].
Although photogrammetry is more cost-effective than laser scanning, it
suffers from lower accuracy, generating errors ranging from centimeters
to meters [176]. However, recent advancements in UAV and camera quality
have significantly improved photogrammetry. For example, Zhao et al.
[151] developed a 3D reconstruction model based on UAV image data
collected under manual control for monitoring dam structures. They used
SfM to match features of multiple images and create a high-precision 3D
dam model, detecting various damages (spalling, collapse, crack) on the
generated 3D model. The reported differences between actual and
measured dam lengths ranged from 0.01 to 0.2 m, indicating the high
accuracy of their model.
Cheng et al. [180] proposed a near-real-time gradual generation of 3D
models using UAV images, spatially linking sequential images for image
localization and suitable stereopair selection for disaster areas. The
study used binary robust invariant scalable keypoints (BRISKs) [181]
for feature extraction, and SfM was used for 3D model generation.

3.4.3. Application of DL for damage identification using 3D reconstruction
Recently, studies have integrated deep learning with damage
identification in combination with 3D reconstruction models, as
presented in Table 9. For example, Ji et al. [124] introduced a general
deep autoencoder to segment multiclass structural damages in 3D
reconstructed point clouds using 3D LiDAR for tunnel structures. To
improve the quality of the 3D model, data cleaning (manual, using the
CloudCompare software), data normalization (median value of input
points), and data decomposition and feature fusion were implemented.
For data decomposition and feature fusion, 4D information of the point
cloud is denoted as (x, y, z, I), where I stands for the intensity of
brightness of a point.

Table 9
3D reconstruction for SHM.
Network | Data, devices | Type & task | Train / Val. / Test | Input size | mIoU (%) | Prec. (%) | Acc. (%) | Texture
Deep autoencoder (Ji et al. [124]) | 3D, LiDAR | Tunnel damages | 20.7 Mp / 5.1 Mp / 6.5 Mp | – | 84.7 | – | – | Yes
CNN-FCN (Chaiyasarn et al. [182]) | 2D, UAV (M) | Crack segmentation | 30,000 / 3245 / 811 | 150 × 150 | – | 99.88 | – | Yes
UNet (Chen et al. [150]) | 2D, UAV (M) | Crack segmentation | | 250 × 250 | 58.40 | | | No
ADUNet (Deng et al. [183]) | 2D binocular | Crack segmentation | | 224 × 224 | 61.45 | 72.16 | – | No
Improved Mask R-CNN (Fang et al. [184]) | 2D, floating capsule robot (M) | Crack segmentation | 1307 / 437 | 1250 × 1080 | – | – | – | No
YOLOv5 + transformer (Idjaton et al. [185]) | 2D, UAV (M) | Spalling | 2307 / 552 / 1096 | 256 × 256 | – | – | 81.00 | Yes
2D: photogrammetry data, 3D: 3D laser scanned data, (M): manual
control, Mp: million points, Texture: real texture of the developed 3D
model.

Chaiyasarn et al. [182] proposed a crack damage mapping method by
applying a CNN-FCN crack segmentation approach to a 3D real texture
model. They designed a two-step crack segmentation process where a CNN
was applied to classify input patch images containing cracks, and
images with cracks were fed into an FCN for segmentation.
Zhao et al. [186] proposed a dam damage detection method using UAV and
3D reconstruction. The collected data from UAVs were used to generate a
3D reconstruction model. Although UAVs have their own GPS positioning
function, the researchers further improved the accuracy of the
localization of the UAV camera by integrating ground control points and
GPS data. Ground control points were placed on the concrete dam crest,
because placing them on arbitrary surfaces of the dam makes it
inconvenient to obtain the accurate measurements in the global
coordinate system that are needed to enhance the accuracy of the 3D
models. For damage detection, they proposed the YOLOv5s_HSC model by
integrating a swin transformer
and coordinate attention module.
Fang et al. [184] proposed an improved Mask R-CNN by adopting a split
attention module for sewer system crack segmentation using images from
floating capsule robots. 3D model reconstruction was also performed to
localize the detected damage based on the location of the floating
capsule robot. Visual odometry and simultaneous localization and
mapping (SLAM) were employed for 3D model reconstruction. For feature
extraction, oriented FAST and rotated BRIEF (ORB), including an
oriented multi-scale FAST, was used, and for feature matching, RANSAC
was utilized.
Chen et al. [150] introduced a novel computational approach for
detecting and reconstructing concrete defects from geotagged UAV
images. This approach involves aligning the images onto a semantic-rich
BIM using a bundle registration algorithm. UNet was used to segment
cracks using 2D input images, achieving a mean intersection over union
(mIoU) of 58.4%. This method significantly reduces false discovery
rates and improves the accuracy of defect reconstruction.
In a related development, Zhao et al. [187] introduced a DL-based
framework named Structure-PoseNet, which consists of CompNet and
ParaNet for 3D structure displacement measurement using a monocular
camera. CompNet segments the structural components of interest and
predicts the structure pose parameters, which are then used to recover
the 3D information of the structure through ParaNet, ultimately
calculating displacement for all pixels. The calculated displacement
exhibited an error within 2%. The researchers validated the proposed
method through experiments involving shaking table tests and real-world
examples.
Zhuge et al. [188] proposed a method that utilizes multi-UAVs to
measure bridge deflection by employing visual information from the
UAVs. Their method includes a DL-based center extraction technique and
a feature point tracking algorithm based on scale-invariant feature
registration. The method was validated through simulation and
experimental results, showing a root mean square error of less than
0.5 mm.
Ding et al. [152] compared the crack widths that can be detected with
two different 3D reconstruction methods, SLAM and SfM. In the
comparison, the SfM-based 3D reconstruction showed better resolution,
measuring cracks 0.17 mm thick, whereas SLAM could detect only cracks
more than 10 mm thick. Deng et al. [183] proposed a binocular
video-based 3D reconstruction method using a binocular visual
simultaneous localization and mapping method and a crack segmentation
method (ADUNet), where the segmented cracks are mapped onto the 3D
model to measure their length.

3.4.4. Summary of digital twin and 3D reconstruction
The application of digital twins through 3D reconstruction techniques
for visualizing civil structures for SHM purposes is in its nascent
stage. Initial attempts at 2D reconstruction have been limited due to
the lack of depth information, leading to the development of 3D
reconstruction techniques, primarily through photogrammetry or 3D laser
scanning.
Significant progress has been made in 3D model reconstruction and
damage detection using DL networks, especially with the 3D laser
scanning approach. This method offers the advantage of obtaining
high-precision spatial coordinate data directly from the scanning
devices without the general SfM process. However, it often proves
expensive, is cumbersome to mount on UAVs, and faces challenges from
diffused signal reflection on natural vegetation and soil surfaces.
These limitations hinder the widespread adoption of 3D scanning-based
3D reconstruction.
Consequently, 2D photogrammetry-based 3D reconstruction with robots
(UAVs, capsule robots) is more commonly applied to various structures,
including buildings, bridges, pipelines, tunnels, and construction
sites. In this approach, structural damages are detected using various
DL networks in two distinct ways. Either damage is first detected and
segmented in the raw 2D images, and then the localized damages are
mapped onto 3D models, or vice versa. Based on an extensive review, it
is challenging to identify a clearly superior method at this point,
given the limited number of case studies available and the emerging
nature of this topic.
The 2D and 3D model reconstructions from various devices and methods
are compared and summarized, along with their respective advantages and
limitations, in Table 10. Additionally, a majority of implemented DL
networks are publicly available networks such as the YOLO series, Mask
R-CNN, UNet, etc., with improvements often achieved by integrating
various attention modules.

Table 10
Comparison of various 3D model reconstruction methods.
Method | Applications in SHM | Advantage | Limitations
3D Laser Scanning | 3D model reconstruction and damage detection | High-precision spatial data directly from scanners | Costly, cumbersome for UAVs, signal reflection issues
2D Photogrammetry | 3D reconstruction with robots (UAVs, capsule robots) | More common for various structures | Limited case studies, emerging field
DL Networks for 3D Reconstruction | Detecting and mapping damages from 2D to 3D or vice versa | Utilizes publicly available networks with improvements | Emerging nature of the field, limited case studies

3.5. Vibration-based approaches
Within the realm of SHM, vibration-based approaches stand as a global
method. These approaches make use of recorded structural vibration
responses to assess and identify damages within structures. They tap
into the dynamic tendencies of structures, capturing vibrations
triggered by external forces or inherent structural attributes.
Vibration data acquired from sensors like accelerometers or strain
gauges furnishes valuable insights into the well-being of the
infrastructure component under scrutiny. Measured structural vibration
responses are typically 1D, and their collections are 2D data that can
be analyzed using CNNs. In recent years, various DL architectures have
been applied to vibration data, utilizing supervised, semi-supervised,
and unsupervised approaches. By employing advanced operations and DL
techniques, these approaches aim to enhance the analysis of vibration
data and enable effective detection of structural defects.

3.5.1. Supervised approaches
In supervised approaches, two different methods are available for
damage detection. The first method involves preprocessing to convert
the vibration input into 2D data or 3D images, which are then used as
inputs for DL models to identify damages. The second method uses the
measured vibration data directly as the input to DL models for damage
detection. As an example of the first method, Fallahian et al. [189]
introduced an ensemble classification method that combines a deep
neural network with coupled sparse coding for structural damage
detection. This ensemble classification approach utilizes five mode
shapes and principal component analysis for feature extraction from
frequency response functions, which were calculated from the finite
element analysis (FEA) model, considering temperature variations.
Bao et al. [190] proposed a computer vision-based method for detecting
anomalies in SHM systems. The method involves data visualization and
training a deep neural network for data anomaly classification.
Time-series signals are transformed into grayscale images by splitting
them into sections and plotting them as figures. These images are then
saved as files and used as inputs for the neural network. Labeled
training data and techniques such as stacked autoencoders and greedy
layer-wise training are used to train the network, which can then
detect potential anomalies in large amounts of SHM data.
Yu et al. [191] proposed a CNN-based technique for assessing and
localizing damages in a laboratory five-level benchmark structure using
vibration signals as input. The structure was equipped with smart
isolators and subjected to seismic loading conditions. The measured
acceleration responses were converted to the frequency domain through
FFT, and the 2D frequency responses were input to a traditional CNN
architecture to extract damage-sensitive vibration features. The
developed CNN-based method was trained and validated using 19 different
conditions, including intact, single, and multiple damaged conditions.
The DL-based approach automatically extracts features from raw signals
for damage localization and performs well in vibration-based damage
detection in structures.
Azimi and Pekcan [192] also presented a CNN-based approach for SHM,
which incorporated transfer learning techniques to improve the
performance. They used three different datasets coming from different
types of sensors (traditional acceleration sensors, self-powered
sensors, and highly compressed data). The study demonstrated the
effective use of extremely compressed data with only three parameters
and reported mean accuracy scores of 90–100%.
Guo et al. [193] developed a DL model to extract damage features from
mode shape data, using a novel approach that employed a CNN with a
multiscale module to extract features at various levels. This
multiscale approach, based on the inception network, was effective in
reducing the interference of contaminated data. The researchers also
incorporated a residual learning module to speed up network convergence
and a global average pooling layer to minimize computational costs. The
study considered noise interference, multiple damage, and missing data
and evaluated the proposed method on a dataset from simulation and two
laboratory experiment datasets. To minimize training requirements
without compromising accuracy, the study utilized a transferring
parameter approach.
Morgantini et al. [194] presented an approach for structural damage
assessment that involves the extraction of damage-sensitive features in
the quefrency domain. The quefrency domain offers simplification and
dimensionality reduction compared to the frequency domain, and the
method derives the cepstral coefficients of the structural acceleration
response for use in a damage assessment strategy. The proposed method
was validated through numerical simulations and experimental data,
demonstrating the effectiveness of the damage assessment strategy.
Won et al. [195] proposed a CNN for structural damage identification
that involves two steps: data normalization for preprocessing and CNN
for damage localization. Normalized acceleration signals were used as
input to the CNN to identify damage locations in structural members.
The proposed CNN automatically extracts features from time series
measurements, allowing for efficient and accurate detection of damage
locations.
Abdeljaber et al. [196] proposed a 1D CNN to detect structural damage
on a small-scale four-bay grid steel structure equipped with 15
accelerometers placed at different locations to capture accelerations
in the east-west and north-south directions. The proposed method
successfully detected induced structural damages. Here, the 1D CNN
refers to a specific type of CNN in which all the filters and feature
maps in the hidden layers exist in a 1D format. This very simplified
CNN can reduce the computational cost. However, the features extracted
throughout the hidden layers must then be aggregated to maintain the 1D
domain, instead of being concatenated. This approach might entail the
risk of losing essential features and constraining the range of viable
operations, because nonlinear relationships exist between the degree of
damage and the resultant reduction in structural properties and
behaviors, as well as their dynamic characteristics [197].
A comprehensive exploration was conducted to identify a deep CNN
architecture capable of extracting features from nonlinear and
nonstationary data to mitigate various environmental noises by Cha
et al. [198] and extensive construction noises by Mostafavi and Cha
[199]. These investigations demonstrated that addressing these
nonlinear and nonstationary data demands more advanced deep learning
operators. These operators encompass DWSC, various attention
mechanisms, nonlinear activation functions, LSTM, and RNN. They are
employed instead of a simple array of 1D convolutions that preserves
the 1D format of the feature maps throughout the hidden layers.
Numerous studies have been conducted on this supervised mode of
training deep learning networks to detect and localize structural
damages. However, the fundamental issue and limitation of this approach
is the difficulty of collecting, from real structures, ground truth
data encompassing the various damage scenarios required to train the
model, because the data generated from an FE model is not enough to
detect minor damage at critical locations [29].
Moreover, unlike computer vision-based surface damage detection that
can utilize image data from different structures to train the network,
this vibration-based method learns the dynamic characteristics of
specific structures through measured vibrations encompassing various
damage scenarios. Because each structure possesses unique mechanical
properties and dynamic characteristics, including different boundary
conditions and sometimes even distinct environmental conditions, the
data collected from one structure may not be usable to train damage
detection networks for other structures [29].

3.5.2. Measured data recovery or prediction methods
SHM involves the use of sensors to evaluate the condition of
structures. However, technical faults in sensors or communication
systems can lead to permanent or temporary data loss, impacting the
accuracy of the evaluation of structural conditions. Despite the
widespread use of sensors such as strain gauges and accelerometers,
data loss is a common
problem in SHM processes due to various factors, such as environmental
conditions, sensor aging, and faults in the sensor itself [200],
potentially leading to false decisions. Replacing faulty sensors is the
best solution, but it can be costly and time-consuming.
To address this challenge, Oh et al. [201] proposed a CNN-based
reconstruction method for data loss recovery in SHM. The technique aims
to recover structural response data when it is missing due to faulty
sensors or other issues. The method involves installing multiple
sensors in a structure and using them to measure vibration data. If the
data from one of the sensors is lost, a CNN is trained using the data
from the remaining sensors to produce the missing data. The loss is
calculated using the mean squared error (MSE) between the predicted and
actual missing strain data. Multiple CNNs are constructed, each
corresponding to the number of assumed faulty sensors.
This approach demonstrates the potential to mitigate data loss in SHM
and improve the accuracy of structural evaluation. The CNN is trained
to generate the data missing due to faulty sensors, where the input
data is fitted with the output data. However, one major limitation of
the proposed method is its reliance on the availability of pre-fault
data, which may not always be practical.
Furthermore, the method assumes that failed sensors remain in a failed
state throughout the data recovery process, so some data actually
measured by sensors with only partial data loss is not used to recover
the lost data. In addition, the approach is computationally expensive,
which could limit its applicability to larger-scale systems. Finally,
the proposed method has only been tested on a specific type of
structure equipped with strain sensors, and may not be directly
applicable to other types of structures or sensor configurations.
Data loss can have a negative impact on the overall evaluation of
structural conditions in SHM. To address this issue, Lei et al. [202]
proposed a method that utilizes a GAN with an encoder-decoder network
for the generator and an adversarial discriminator. This generator is
designed to reconstruct lost signals by extracting features from the
remaining functional sensor data and generating realistic hypotheses
for the lost signals. The discriminator then provides feedback to
improve the reconstruction accuracy of the generator. This method was
tested on two case studies, a numerical study using collected
acceleration data and a practical study using measured strains. The
results demonstrated the effectiveness and efficiency of the proposed
method in reconstructing lost signals.
Jiang et al. [203] developed a data-driven neural semantic recovery
framework for SHM that transforms data recovery into a conditional
probability modeling problem using fully deep CNNs. The framework was
evaluated using long-term measured acceleration data of a pedestrian
bridge and demonstrated excellent recovery accuracy and robustness even
with a high loss ratio. In addition to data recovery, recent studies
have addressed the problem of limited data and class imbalance [204].
Li et al. [205] proposed an approach for anomaly detection in SHM data,
which consisted of two steps: training dataset balancing and data
anomaly detection. In the first step, they developed a data-level
technique to generate anomalous data by analyzing three typical classes
of data anomalies, i.e., missing, trend, and outlier, and extracting
their rough features. This generated a training dataset where each
class of anomalies was equally represented. In the second step, they
introduced a specific transformer-enhanced densely connected neural
network (TDNet) for anomaly detection.
The TDNet incorporated a modified DenseNet architecture along with a
Transformer encoder. The DenseNet facilitated the extraction of
detailed and global features from the response data, while the
Transformer encoder, specifically the multi-head attention mechanism,
helped capture long-distance correlations among the time series data.
For the input to the network, the authors used time-domain structural
responses. These responses were directly fed into the TDNet for feature
learning without any manual feature transformation. The length of the
input segments was set to 1024 sampling points, ensuring that they
carried sufficient structural vibration information while meeting the
size requirement for data compression through downsampling. They
developed a framework that leveraged a data-level technique and a
transformer-enhanced densely connected neural network for anomaly
detection in SHM data, achieving a remarkable performance in data
anomaly detection tasks.

3.5.3. Sensor fault detection approaches
Sensor fault detection is a critical application of DL in SHM as faulty
sensors can significantly compromise the accuracy of SHM systems. Early
detection and diagnosis of sensor faults are, therefore, necessary to
ensure reliable performance. In this regard, similar concepts of data
recovery and prediction methods have been applied to the sensor fault
detection problem.
Liu et al. [206] proposed a 1D CNN and wavelet clustering method for
sensor fault detection and diagnosis in the air temperature control
loop of the air handling unit. Pan et al. [207] used a deep CNN to
detect sensor and actuator faults in robot joints, including offset
error, sensor and actuator malfunction, and gain errors. The deep CNN
method was compared with conventional methods such as LeNet-5, ANNs,
SVM, and long short-term memory networks.
Junior et al. [208] utilized a multi-head CNN to detect six different
types of faults in an electric motor. They employed the multi-head CNN
model to deal with each single sensor individually, facilitating the
extraction of features. Two accelerometers were deployed to collect
vibration data in two different directions.
Jana et al. [209] applied a CNN and convolutional autoencoder to detect
faults and categorize their type in a shear-type structure and a
laboratory-scale model arch bridge. The models were trained to learn
patterns in sensor data indicative of faults. When presented with
sensor data from a new sample, the models used their learned patterns
to identify faulty sensors and classify the type of fault present.
The autoencoder was useful for this task as it is designed to learn
low-dimensional representations of high-dimensional data. The model
demonstrated 100% accuracy in localizing faulty sensors and 98.70%
accuracy in detecting fault types. However, it is important to note
that while many sensor fault detection methods achieved high accuracy,
i.e., more than 98.00%, in experimental models, it is crucial to test
these methods in real large-scale structures. It is also worth noting
that sensor fault detection is a challenging problem, particularly in
large-scale structures or complex systems where multiple sources of
noise and interference may exist.

3.5.4. Unsupervised approaches
Training DL networks in a supervised manner for tasks such as damage
detection, lost data recovery, or sensor fault detection requires an
extensive dataset with accurately labeled ground truth data,
encompassing a wide range of damage scenarios. Nevertheless, gathering
such data for structural applications presents a notable hurdle. To
overcome the limitations of the supervised methods discussed in Section
3.5.1, several unsupervised ML approaches have been adopted for
structural damage detection.
Fig. 33 provides a comprehensive illustration of the general approach
for unsupervised DL in damage detection, where the blue colored solid
line represents training data from the intact structure condition, and
the red colored dashed line indicates testing data from an unknown
structure condition.
Rafiei and Adeli [30] proposed an unsupervised deep autoencoder trained
on acceleration data obtained from an undamaged structure. The trained
network proficiently reconstructed input responses, subsequently
employing these reconstructed responses to extract damage-sensitive
features from acceleration data sourced from structures with unknown
health conditions. The proposed technique demonstrated notable
accuracy, validated through both experimental and numerical studies.
For a 12-story numerical building model and a miniature
laboratory-scale bridge, the method achieved mean average accuracies
of 97.40% and 91.00%, respectively. It effectively detected severe
near-collapse damage, though improvements are necessary for identifying
lighter or moderate damage levels.
Wang and Cha [29] introduced an unsupervised deep autoencoder approach
for damage detection. This method involved reconstructing input signals
and comparing them to reconstructed signals of damage responses,
utilizing three damage-sensitive features and applying a one-class SVM.
Performance testing took place on a miniature laboratory-scale steel
bridge and a 12-story numerical building model. Comprehensive
comparative studies encompassed the Mahalanobis distance technique and
Rafiei and Adeli's method [30]. The Mahalanobis distance technique
exhibited an average damage detection accuracy approximately 5% lower
than that achieved with the one-class SVM. Wang and Cha's [29] method
displayed superior effectiveness in detecting light damage in the form
of bolt loosening in a steel bridge when compared to Rafiei and Adeli's
[30] approach.
Silva et al. [210] proposed an unsupervised two-level feature
extraction method by applying a stacked autoencoder. The stacked
autoencoder learned the compressed representation of modal properties
such as natural frequencies, modal shapes, and modal coordinates. The
proposed approach showed improved performance when tested on a dataset
from the Z24 bridge.
Recently, Sony et al. [211] proposed an LSTM network for defect
detection and localization in structures. The study utilized vibration
sensor data as input to the LSTM network, and the class probabilities
of the model output were used to determine the locations of the damage.
The proposed LSTM method was validated using data from QUGS and
outperformed the 1D CNN method on the Z24 bridge dataset.
Soleimani-Babakamali et al. [212] proposed a framework using GANs for
unsupervised novelty detection in SHM. The discriminator network in the
GAN served as the novelty detector, and the generator provided data to
fine-tune the detection threshold. The authors evaluated the framework
on two benchmark datasets and demonstrated promising results with 95%
novelty detection accuracy. The use of CycleGAN in SHM has been
explored to address data scarcity and for undamaged-to-damaged domain
translation applications.
Sajedi and Liang [213] proposed a novel vibration-based method by
combining transfer learning with deep generative Bayesian optimization
(DGBO) for optimal sensor placement (OSP) in SHM. Their approach
utilized generative models, including the conditional variational
autoencoder and a surrogate neural network, to efficiently optimize
expensive objective functions. The method was validated on two case
studies for damage location and severity predictions using
vibration-based SHM data, demonstrating that DGBO outperforms genetic
algorithms with the same number of function evaluations and can reduce
the

experiments on an eight-degree-of-freedom shear type discrete model
using several damage scenarios. The second study acquired time
histories of acceleration responses collected by several sensors
installed on an actual bridge during damage progression, thereby
validating the method through simulation and on a real structure. The
main finding of the study was that using the power cepstral
coefficients as the inputs and outputs of the proposed generalized
autoencoder helps reduce network complexity and overfitting.

3.5.5. Summary of vibration-based approaches
As described in Section 3.5, this category involves DL applications
that aim to monitor the overall behavior and health of the entire
structure as a whole. Instead of focusing on specific localized
regions, these approaches analyze the collective information from
multiple sensors distributed throughout the structure. The objective is
to capture global structural behavior, identify global patterns or
trends, and assess the overall health and integrity of the structure.
In global monitoring, vibration data obtained from computer vision
sensors and contact sensors have been analyzed using various DL
techniques, such as supervised approaches, measured data recovery and
prediction methods using DL, as well as sensor fault detection
approaches and unsupervised approaches.
Many studies have employed supervised learning methods for damage
detection and localization. However, these methods require labeled
data, which can be difficult and time-consuming to acquire. Moreover,
the unique dynamic behaviors of different structures mean that
vibration data from one structure may not be applicable to another
[28,29]. This has led to the development of unsupervised approaches
that do not require labeled data. Recently, unsupervised DL techniques
have been used for damage detection [29–31].
Several DL methods, such as CNN-based reconstruction and GANs, have
been proposed for measured data recovery. Oh et al. [201], Lei et al.
[202], and Jiang et al. [203] have demonstrated the effectiveness of
these methods. Recent studies have also addressed the problems of
limited data and class imbalance in SHM, as noted by Gao et al. [204].
Sensor fault detection approaches using DL, such as 1D CNN and wavelet
clustering, have also been developed to diagnose and detect faults in
various systems. Pan et al. [207], Junior et al. [208], and Jana et al.
[209] have shown that these techniques detect faults accurately. The
integration of DL-based sensor fault detection approaches with
vibration-based approaches can enhance the overall effectiveness of
SHM, providing more comprehensive and accurate information about the
health of a structure.
The ultimate purposes of data recovery and sensor fault detection are
for SHM. However, the limitations of the data recovery and sensor fault
number of accelerometers without compromising performance. The detection methods discussed above are that they cannot consider dam­
paper concludes that DGBO, combined with transfer learning and age that occurred during the data loss or sensor faults due to nature of
generative models, is a scalable solution for addressing the high- superivsed learning. The aforementioned vibration-based methods are
dimensional challenge in OSP for large-scale civil infrastructure and summarized in Table 11, along with their specific applications and
can extend beyond OSP to semantic damage segmentation. corresponding advantages and limitations.
Recently, Li et al. [214] incorporated a generalized autoencoder
method with statistical pattern recognition for structural damage 3.6. Physics-informed DL networks
assessment. They conducted two studies to evaluate the performance of
the proposed method. The first study involved conducting simulation While data-driven DL models have shown success in solving specific
problems, they may perform poorly in cases with noisy data or limited
data availability. Furthermore, while DL networks have demonstrated
remarkable success in numerous applications, their potential for scien­
tific discovery remains in its early stages. Therefore, it is imperative to
combine physical laws and knowledge with DL models to enhance their
performance. As a solution, physics-informed DL leverages existing
knowledge derived from physical, mathematical, empirical, and obser­
vational understanding to enhance the performance of DL networks. By
integrating prior knowledge into the training process, physics-informed
DL can yield better predictions for generalization and extrapolation
tasks, particularly in the presence of noisy, missing, or outlier data
Fig. 33. Unsupervised DL approach. [215].

Table 11
Comparison of various vibration-based methods.
Network | Applications in SHM | Advantage | Limitations
Supervised learning | Damage detection and localization | Effective with labeled data | Difficult/infeasible to collect data for various damage scenarios
Unsupervised learning | Damage detection | Does not require labeled data | May not be possible for damage quantification
CNN-based reconstruction and GANs | Measured data recovery | Effective data recovery; addresses imbalances | Requires well-structured data; limited data and class imbalance
Sensor fault detection with 1D CNN and wavelet clustering | Fault detection in various systems; sensor fault detection | Lightweight architecture of 1D CNN | Inability to account for damage occurring during data loss or sensor faults; poor performance related to nonlinear and nonstationary data
Advanced DL operator based networks | Damage detection and localization, fault detection, data recovery, sound noise cancellation | Capture highly nonlinear and nonstationary features (Cha et al. [198]; Mostafavi and Cha [199]) | Requires sophisticated design of the networks for the specific problems

Physics-informed DL (PIDL) approaches are a subset of hybrid approaches in that they combine traditional, data-driven approaches with physics model–based approaches for structural damage detection [216]. The only difference is that the physics-informed, DL–based method utilizes DL instead of traditional data-driven approaches. Put succinctly, DL-based methods are those that use statistical methods within deep learning, and physics-based approaches are those which aim to represent the governing physical laws of a system within the domain of time and space and that embed causation and generalization. In this section, the basic theory of PIDL will be discussed, along with various applications of PIDL in SHM and related disciplines.
PIDL is best applied in situations where there exists either a partial or full physical model [e.g., a partial differential equation (PDE)], as well as observational data of the system (i.e., a set of inputs and outputs). PIDL can be applied to enhance the physical model and create more accurate predictions. PIDL is most appropriately applied to inverse problems and poorly-understood problems [215]. Inverse problems refer to those in which a portion of the PDE, such as a coefficient or a boundary condition, is reconstructed from partial knowledge of the solution of the PDE. A clear example of an inverse problem would be damage detection [217].
Solving inverse problems is often prohibitively expensive, and thus a data-driven approach is often beneficial. Poorly-understood problems are a suitable application for PIDL or other statistical methods for the simple reason that neural networks are universal approximators of continuous functions [218]. Therefore, PIDL can be easily applied to problems in which portions of the physical problem are incomplete and can thus be approximated with neural networks.
The usage of PIDL constitutes a merging of “white-box” and “black-box” models, where the former refers to the analysis of a system through the characterization of an underlying mechanism, and the latter refers to a model based on relating outputs to inputs but placing no requirement on the actual internal physics at work [219]. A breakdown of the paradigm between white- vs. black-box models is shown in Fig. 34.

Fig. 34. Categories of physics-informed DL: Black-box vs. white-box modeling. On the black-box end of the spectrum, we see data-driven methods such as DL, ML, and other forms of statistical analysis. On the white-box end of the spectrum we observe analytical methods such as those which use partial differential equations. Physics-informed deep learning exists on the continuum between these two methods.

There have been a series of comprehensive reviews and studies of PIDL that have come out in recent years [215,219]. In Karniadakis et al. [215], a broad review of physics-informed machine learning (PIML) was conducted, covering basic concepts, approaches, and recent applications in topics such as fluid flow, plasma dynamics, and quantum chemistry. In Cross et al. [219], PIML is discussed in the context of Bayesian learning and in an SHM context. In this section, we focus specifically on physics-informed deep learning problems.

3.6.1. Major approaches within physics-informed deep learning
We break down two major approaches to PIDL in SHM by classifying their architectures in terms of the physics-informed loss function and the physics-informed architecture. The most common approach in PIDL is the construction of a physics-informed loss function. A loss function can be defined as the mathematical function that relates the predicted values of the model with the measured, physical values of the system (also known as the ground truth). A physics-informed architecture can be defined as an architecture in which specific components of the network are defined as either data-driven or physics-based, such as having physics-informed layers or nodes.
The general format of the physics-informed loss function is provided in Eq. (5).

$$L_{\mathrm{total}} = w_{\mathrm{DL}} L_{\mathrm{DL}} + w_{\mathrm{phy}} L_{\mathrm{phy}} \tag{5}$$

where $L_{\mathrm{total}}$ is the loss of the entire system, $L_{\mathrm{DL}}$ is the loss of the deep learning model, and $L_{\mathrm{phy}}$ is the loss associated with the physics of the system. $L_{\mathrm{phy}}$ acts as a regularization term for the system. The weights for the deep learning component and the physics-based model are $w_{\mathrm{DL}}$ and $w_{\mathrm{phy}}$, respectively.
These parameters can be adjusted through hyperparameter tuning to increase the accuracy of the unified model. A wide variety of loss functions exist within the literature for deep learning models and are selected based on the type of problem and makeup of the dataset, such as the IoU function, which is frequently used in computer vision applications, or the Focal-Tversky loss, which is frequently used in problems in which the dataset is unbalanced. In PIDL models, the loss function is typically defined as the mean-squared error (MSE) loss. The approach of a physics-informed loss function is shown in Fig. 35.
As can be seen from Fig. 35, there are two major streams that run in sequence. The first is the data-driven stream through deep learning, which takes in an input in terms of x and t, which is then processed through a sequence of layers in a neural network. The output of this neural network is u, an approximate solution at (x, t). This variable constitutes the rest of the partial differential equation along with its partial derivatives with respect to x and t. In the data-driven stream, u is evaluated through a loss such as the MSE loss, as stated previously. In the physics-informed stream, the partial derivatives of u are evaluated and the PDE residual is calculated as the loss for the
physics-informed stream. The two losses are then weighted through the weights $w_{\mathrm{DL}}$ and $w_{\mathrm{phy}}$, and optimization is done through stochastic gradient descent of the learnable parameters, θ.
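As a minimal sketch of how the weighted loss in Eq. (5) can be evaluated, the Python snippet below combines a data term (MSE at sensor samples) with a physics term (MSE of an equation residual). Everything specific in it is an illustrative assumption rather than something taken from the reviewed studies: the governing equation (an undamped single-degree-of-freedom oscillator, u'' + ω²u = 0), the central finite difference that stands in for the automatic differentiation a neural network would provide, and the names `physics_informed_loss`, `meas_idx`, `w_dl`, and `w_phy`.

```python
import numpy as np


def physics_informed_loss(u_pred, t, u_meas, meas_idx, omega, w_dl=1.0, w_phy=1.0):
    """Return (total, data, physics) losses for a candidate response u_pred(t).

    Assumed physics: u'' + omega**2 * u = 0 on a uniform time grid t.
    """
    # Data-driven term: MSE between prediction and measurements at sensor samples.
    loss_dl = np.mean((u_pred[meas_idx] - u_meas) ** 2)

    # Physics term: MSE of the equation residual at interior grid points.
    # A central finite difference replaces automatic differentiation here.
    dt = t[1] - t[0]
    u_tt = (u_pred[2:] - 2.0 * u_pred[1:-1] + u_pred[:-2]) / dt**2
    residual = u_tt + omega**2 * u_pred[1:-1]
    loss_phy = np.mean(residual**2)

    # Weighted sum, mirroring the structure of Eq. (5).
    loss_total = w_dl * loss_dl + w_phy * loss_phy
    return loss_total, loss_dl, loss_phy


# Hypothetical usage: sparse "sensor" samples of a 1 Hz oscillation.
t = np.linspace(0.0, 2.0, 2001)
omega = 2.0 * np.pi
meas_idx = np.arange(0, t.size, 200)
u_meas = np.cos(omega * t[meas_idx])

consistent = np.cos(omega * t)          # satisfies both the data and the physics
inconsistent = np.cos(1.5 * omega * t)  # wrong frequency, violates both

loss_ok = physics_informed_loss(consistent, t, u_meas, meas_idx, omega)
loss_bad = physics_informed_loss(inconsistent, t, u_meas, meas_idx, omega)
print("consistent candidate:  ", loss_ok)
print("inconsistent candidate:", loss_bad)
```

In a trainable model the candidate response would be the network output and the total loss would be minimized with respect to θ; the sketch only shows how the two terms are formed and weighted, and how the physics term penalizes a candidate even at times where no measurement exists.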
In physics-informed architectures, the physics are embedded into the actual neurons or layers themselves and are representative of physical phenomena, of which an example can be found in Yucesan and Viana [220], with their modeling of bearing faults in wind turbines. A breakdown of different types of physics-informed deep learning architectures is given in Fig. 36.

Fig. 36. Types of physics-informed deep learning architectures and potential applications within SHM.
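The idea of embedding physics into individual nodes can be sketched as a recurrent damage-accumulation cell in which one node is a closed-form physical law and another is a small neural network. The snippet below is a hypothetical illustration only and is not the model of any cited study: the power-law damage increment, its constants `c` and `m`, the one-hidden-layer MLP, and all function names are assumptions chosen to keep the example short.

```python
import numpy as np


def physics_increment(load, c=1e-3, m=3.0):
    """Physics-based node: assumed power-law damage increment per time step."""
    return c * load**m


def data_driven_increment(features, W1, b1, W2, b2):
    """Data-driven node: one-hidden-layer MLP for the poorly understood mechanism.

    The softplus output keeps the learned increment non-negative.
    """
    hidden = np.tanh(features @ W1 + b1)
    z = hidden @ W2 + b2
    return np.logaddexp(0.0, z)  # softplus(z) = log(1 + exp(z))


def hybrid_cell(damage_prev, load, features, params):
    """One recurrent step: the state accumulates both increments."""
    return (
        damage_prev
        + physics_increment(load)
        + data_driven_increment(features, *params)
    )


def rollout(loads, feature_seq, params, damage0=0.0):
    """Unroll the hybrid cell over a sequence and return the damage history."""
    damage = damage0
    history = []
    for load, features in zip(loads, feature_seq):
        damage = hybrid_cell(damage, load, features, params)
        history.append(damage)
    return np.array(history)


# Hypothetical usage with untrained (random) weights for the data-driven node.
rng = np.random.default_rng(0)
params = (
    rng.normal(size=(2, 4)),  # W1: 2 input features -> 4 hidden units
    np.zeros(4),              # b1
    rng.normal(size=4),       # W2: 4 hidden units -> scalar output
    0.0,                      # b2
)
loads = np.full(10, 0.5)
feature_seq = rng.normal(size=(10, 2))
damage_history = rollout(loads, feature_seq, params)
print(damage_history)
```

Only the weights of the data-driven node would be trained in such a design, while the physics node stays fixed; this is what makes the architecture specific to a problem in which the known and unknown mechanisms can be cleanly separated.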
Abbreviations in Fig. 36: SL: spline learning, MLP: multi-layer perceptron, DAEM: deep autoencoder based energy method, DMM: deep Markov model.

Fig. 35. Physics-informed loss function. In this particular formulation, the output of the deep learning network is used as input to the physics-based network, in which the output is used to evaluate the partial differential equation (PDE). A general form of a PDE is given. Each of the relevant partial derivatives must be calculated with respect to the input variables. Losses are calculated from the data-driven stream and the physics-based stream, where the two losses are unified through a weighting function to calculate the final loss of the network, which is then backpropagated through the network to update the learnable parameters.

3.6.2. Applications in SHM
Some examples of PIDL in SHM are in structural reliability analysis and fault diagnostics, as seen in Yucesan and Viana [220]. In [220], a physics-informed architecture approach was used in determining bearing remaining useful life (RUL) in wind turbines using visual grease inspections. The architecture developed was a unique architecture containing physics-informed nodes (for the bearing damage increment) and data-driven nodes (for the grease damage increment). In this paper, the known physical model comes from the degradation of bearing fatigue damage, and the unknown portion comes from the grease degradation, whose physics are poorly understood and which was therefore represented with deep learning. Thus, two failure modes exist and are joined using a physics-informed architecture where explicit nodes in the architecture constitute the physics-based component and the data-driven component, respectively.
Closer to the field of SHM, in Vega et al. [221] a unique approach was applied to damage prognosis of miter gates in navigational locks. In this approach the inputs to the model were historical visual inspection data, knowledge of human observation errors, along with different SHM data, in which the architecture was integrated with a Markov transition matrix. The model also incorporated physics-based finite element models. The model showed improvement with respect to human error in the prognostic estimation of the miter gate and also showed that the RUL estimate was significantly reduced using the physics-informed approach.
In Zhang et al. [222], a physics-guided loss function was used to evaluate the divergence between the output of a DL model and a finite element model update procedure. The results were evaluated on a pedestrian steel bridge model and an experimental study with a three-story building model, where it was found that performance was improved in terms of generality and consistency of the damage detection results.
As has been shown, PIDL can be used for a variety of tasks within SHM. It can also be used for response prediction, fault diagnosis, and damage quantification. In Di Lorenzo et al. [223], a physics-informed neural network was used in structural health diagnosis, evaluated through a series of case studies involving heat transfer and elasticity. Another recent contribution was in Li et al. [224], in which a physics-guided neural network was used to evaluate the finite deformation of elastic plates, which are traditionally evaluated through the Föppl–von Kármán (FvK) equations. A physics-informed loss function was used for this problem, with the total potential energy method providing the best results in terms of adjustment of hyperparameters and computational complexity.
In Bazmara et al. [225], a physics-informed neural network was applied to the analysis of nonlinear bending of a 3D functionally graded beam. The model was shown to evaluate the nonlinear bending of a beam 37 times faster than the traditional numerical method. Haghighat et al. [234] found that their physics-informed neural network led to accurate solutions in the domain of linear elastic materials. In contrast with previous PIDL networks, the authors used a multi-network model (5 networks in total). The authors found that their solution was better than contemporary finite element analysis (FEA) solutions and converged with fewer than 100 data points.
System identification of structural systems is very important in the context of SHM. Liu and Meidani [226] used a physics-informed neural network, PIDynNet, to predict nonlinear structural responses. The authors used a physics-informed loss function to achieve this goal. The model was capable of predicting the responses of the system for ground excitations not seen by the model during training. In Yuan et al. [217], a physics-informed loss function was proposed for displacement field construction for a system with sparse sensor placement.
Li et al. [227] proposed a physics-guided DL framework for predictive modeling of bridge vortex-induced vibrations using field monitoring data. The study resulted in a data-driven model for accurately predicting
vortex-induced vibration responses under real natural winds.
Sun et al. [228] proposed a physics-informed spline learning technique to solve nonlinear dynamic equation problems involving noisy and limited measured data. The proposed method consists of three stages: pretraining, sparsity-promoting direction optimization, and post-tuning. Ultimately, a physics-informed loss function was used. The method's performance was compared with two approaches, i.e., Python sparse identification of nonlinear dynamics (PySINDy) and genetic-programming–based symbolic regression.
Another valuable application of PIDL is in metamodeling of nonlinear structures. Eshkevari et al. [229] developed DynNet. The authors used a ResNet-based architecture for nonlinear dynamic response prediction of multi-degree of freedom (MDOF) systems. Physics-based constraints were imposed onto the architecture which prevented the model from increasing beyond the trust-region. A purely data-driven example of deep learning for nonlinear seismic response can be found in Huang and Chen [230]. A number of different DL-based architectures were investigated, such as the 1D-CNN, LSTM, and MLP. It was found that the 1D-CNN produced the best results, with the worst being the LSTM architecture.
A physics-informed loss function and multi-end autoencoder was introduced in Ni et al. [233] for seismic response estimation of MDOF systems. The authors found that, compared with a purely data-driven approach, their PIDL-based architecture significantly improved the results.
A physics-informed loss function was proposed in Rojas et al. [235] for damage quantification, which yielded satisfactory results. The authors found that their PIDL-based approach worked better than the PDE-constrained optimization method and was robust to noise in the dataset. In Zhu et al. [236], a novel application of a physics-informed architecture was used for optimal pressure sensor placement in wind engineering applications. A physics-based architecture based on high-fidelity computational fluid dynamics equations was used to generate a pressure model, with the MLP-based architecture taking in the pressure model and outputting an optimal sensor layout. A surrogate model was also developed to learn the average velocity and full-field pressure from the high-fidelity data. In Pereira and Glisic [237], a miscellaneous architecture was used to predict the 2D normal strain field in concrete structures. A data-driven model was used to predict the temperature field and rheological effects, and the physics model was constructed through an analytical model of the strain in the structure; the prediction accuracy for the temperature and total strain was considered good.

3.6.3. Summary of physics-informed deep learning
There exists a wide variety of approaches to physics-informed deep learning, such as those through physics-informed loss functions and physics-informed architectures. Each approach can be radically different from the other, and ultimately the paradigm is extremely versatile in how it incorporates the physics-based model with the data-driven approach in deep learning. This is clearly seen in the physics-informed loss function in Eq. (5), where the weighting parameters can be used to increase or diminish the relative contributions of the physics-based or DL-based components in the final prediction.
The most common approach to PIDL is the approach with a physics-informed loss function. Ultimately, the paradigm of PIDL is very promising and powerful in its predictive potential. There are many advantages to PIDL-based approaches, such as improved model convergence times, improved model runtimes, and generalizability.
Future work in PIDL is needed in regard to establishing a uniform approach to deep learning model design, as is the case in many forms of deep learning, in the sense that there is no strong unified procedure for selecting a statistical model prior to simulation.
A summary of some recent applications in physics-informed deep learning is given in Table 12. The applications are broken down in terms of their type of architecture, the domain within physics that they apply to, and the type of architecture within the framework given in this review paper. It can be clearly observed from Table 12 that the most common form of architecture is the FCN/ANN/DNN/MLP. From a DL perspective, this makes sense insofar as the vast majority of PIDL problems are of lower computational complexity, i.e., the dimensionality of the input-output pairs is low. Therefore, lower computational complexity networks such as FCN networks are suitable.

Table 12
Physics-informed deep learning.
Network | Domain application | Type of architecture
Multi-layer feedforward ANN (Yuan et al. [217]) | Displacement response reconstruction | Physics-informed loss function
DNN / ANN (Haghighat et al. [234]) | Response prediction (elastodynamics) | Physics-informed loss function
LSTM (Zhang et al. [232]) | Response prediction (nonlinear seismic) | Physics-informed architecture and loss function
RNN (Yucesan & Viana [220]) | Bearing RUL | Physics-informed architecture
FCN (Rojas et al. [235]) | Damage quantification | Physics-informed loss function
ANN (Li et al. [224]) | Response prediction (elastic plates) | Physics-informed loss function
ResNet (Eshkevari et al. [229]) | Response prediction (nonlinear seismic) | Physics-informed architecture
1D-CNN/LSTM/FCN (Huang and Chen [230]) | Response prediction (nonlinear seismic, subway station) | Purely data-driven
CNN (Ni et al. [233]) | Response prediction (linear and nonlinear seismic) | Physics-informed loss function
FCN (Pereira and Glisic [237]) | Response prediction (rheological) | Physics-informed architecture
FCN (Bazmara et al. [225]) | Response prediction (3D functional material) | Physics-informed loss function
FCN (Di Lorenzo et al. [223]) | Structural health diagnosis | Physics-informed loss function
FCN (Zhu et al. [236]) | Optimal sensor layout (wind engineering systems) | Physics-informed architecture
FCN (Liu and Meidani [226]) | Response prediction (nonlinear seismic) | Physics-informed loss function

Table 13 summarizes the major approaches found within PIDL, along with their relative advantages and limitations. Also included are the boundaries of the spectrum that defines PIDL approaches, that of pure physics-modelled architectures and purely data-driven architectures. The majority of PIDL-based architectures contain physics-informed loss functions. Physics-informed loss functions tend to be more generalizable when compared to physics-informed architectures, such as those with physics-informed layers or nodes.
Physics-informed architectures are well-suited to problems in which there is a clear delineation between physics-modelled components and more poorly understood components that require modeling with a data-driven approach, resulting in solutions that are problem and domain-specific. All PIDL-based architectures have the same limitation of not having a unified approach to defining the data-driven part of the architecture, which defines a key direction for future research.
In terms of applications within SHM and the real world, physics-informed deep learning, and to a lesser extent data-driven methods, are still in their infancy relative to some of the mathematical theories these methods are paired with in PIDL. The vast majority of PIDL within SHM has been in response reconstruction of nonlinear behavior, including those of seismic responses, which have been modelled with both physics-informed architectures and physics-informed loss functions. Thus, in summary, we can clearly see that PIDL's main real-world applications are found in reconstructing data that is lacking, which has some underlying mathematical model that can assist in improving the accuracy of the data, as seen in Zhu et al. [236], as well as in reducing computational complexity of large models and expediting runtime for solution generation, as seen in Huang and Chen [230], as well as in solving problem formulations that have partial differential equations
that are either impossible or difficult to solve, as seen in Guo et al. [231].

Table 13
Comparison of various physics-informed methods.
Type of architecture | Applications in SHM | Advantage | Limitations
Pure physics-modelled architecture | All domains with a mathematical model describing physical phenomena | Most accurate | Limited to physical phenomena in which a full mathematical model is present
Physics-informed loss function | Structural health diagnosis, fatigue crack growth prediction, response reconstruction | Highly generalizable, most common PIDL method | Loss of physical meaning in data-driven component, no standard architecture for data-driven component
Physics-informed architecture | RUL prediction, optimal sensor layout, response prediction | Problem and domain-specific | Less generalizable, no unified approach to overall architecture definition
Purely data-driven | All domains with measurable data containing input-output pairs | Most generalizable, straightforward implementation | Accuracy dependent on quality and scope of data, no standard architecture

4. Evolution of deep learning-based SHM: A summary

The remarkable ability of DL to extract multiple robust features from diverse data sources obtained through various sensors and sensing approaches has generated substantial research interest in the field of DL-based SHM. Section 2 provided core DL operations mainly used for establishing hidden layers of various DL networks, and Section 3 presented a comprehensive review of relevant studies and methodologies encompassing local and global monitoring. Through these extensive reviews covering broad areas with various subtopics, the evolution of DL networks used in SHM has been summarized, as shown in Fig. 37.

Fig. 37. Evolution of DL methods in SHM.

The initial DL network employed for SHM was a relatively shallow CNN, that is, an encoder with softmax, with eight hidden layers, designed for addressing binary classification tasks [24]. This network was utilized to classify input images as either cracked or intact. Numerous crack image classification methods based on this initial CNN design have been developed, including the detection of other types of damage such as deformation [101], building mold [105], and tile deterioration [109]. Integrating UAVs with CNNs was also explored with manual flight control of the UAVs [109,139,140]. Furthermore, any imaging measurement from NDT methods could be inputted into this CNN prototype for classification of input images, such as thermography [64], eddy current [84], GPR [88], ultrasonic testing [91], and acoustic emission [96].
CNNs were also applied to vibration-based damage detection in a supervised manner, aiming to determine if measured vibrations originated from damaged structures. In this application, 1D vibration data or collections of these 1D vibration data were converted to 2D using initial processing such as the FFT [191] and wavelet transformation [84]. The converted 2D data could be inputted to CNNs, and the input was classified into two groups. Through these initial applications, it was demonstrated that CNNs can extract features that are robust against various noises in images, such as blurring [24], spot lighting [24], shadowing [24], and sensor noises in vibration data [203,209]. These simple CNN or encoder based methods were integrated with autonomous UAVs [99] and manual UAVs [139] for computer vision based damage detection.
As the next step, deeper DL networks with object detection methods, such as Faster R-CNN [27], SSD [118], and an early version of YOLO [115,116], were applied to deal with more complex problems, including multiple damage detection and localization. These methods evolved to detect multiple different damages with higher versions of the YOLO series [125,186], but all these networks only have the encoder to extract features for bounding box level damage detection and localization. Many studies still actively employ these bounding box level damage localization methods through the integration of autonomous UAVs [113] for civil structures and damage localization, and some of them used manual UAVs in digital twins [151]. These methods can localize damage at the bounding box level for NDT [72,94] and computer vision-based damage detection [27].
Similarly, some supervised vibration-based damage identification using DL could localize the damage based on the sensor location [191,192]. A significant number of studies have been conducted using these deep encoder-based DL networks because they provide accurate localization and quantification of detected damages in images and any kind of 1D measured data. However, due to fundamental limitations of supervised learning [28,29], some unsupervised learning of DL was introduced for SHM. For example, damage detection using unsupervised mode by Wang and Cha [29], and lost data recovery [201] have been developed.
In the case of computer vision-based damage identification, deep autoencoder-decoder based methods were realized as pixel-wise damage segmentation. The initial versions of deep autoencoders, such as FCN [132] and UNet [123], were implemented to pixel-wise segment structural damages in images. Encoders are designed to extract features from the input data, and decoders are also designed following the encoder to present the extracted features in the spatially recovered size of the input data. These fully convolutional networks have only convolution operations in hidden layers without fully connected layers, and they use pointwise convolution to reduce the spatial sizes of the extracted feature map without parameter-rich fully connected layers. This design may provide faster training and testing speeds for the networks.
However, all the initially used deep autoencoders that are publicly
available were originally designed for more than 20 object segmentation tasks, making their direct application to SHM problems unnecessarily heavy, with more than 20 million tunable weights. As a result, real-time processing was also impossible with large input images. Therefore, object-specific deep autoencoders have recently been developed for SHM.

For example, CrackNet [130], SDDNet [33], and STRNet [135] were designed with some advanced operations such as ASPP and DWDC (Dilated-Wise Dense Connection). These two modules were modified and employed in the architecture of SDDNet to enable real-time processing for crack segmentation in CBS, even with relatively large input image sizes (1024 × 512). SDDNet has only 0.16 million learnable parameters, significantly fewer than other publicly available networks, which have at least 20 million parameters. This reduction in parameters also reduces the amount of ground truth data required for training the network.

STRNet was also designed with only 2 million learnable parameters by adopting global and local attention modules, coarse upsampling, nonlinear activation functions (learnable swish activation functions), and a focal-Tversky loss function. The network demonstrated state-of-the-art performance in segmenting cracks on CBS, achieving an mIoU of 92.6% and a frame rate of 49.2 FPS using an input size of 1280 × 720. This performance was compared to recently developed advanced networks such as Attention UNet (34.8 million parameters), CrackSegNet (12.4 million), DeepLabV3+ (59 million), FPHBN (5.9 million), and UNet++ (26.9 million). STRNet was integrated into an obstacle-avoiding autonomous UAV system with fiducial marker (ArUco)-based localization for GPS-denied areas by Waqas et al. [147].

Recently, IDSNet [69] was also designed using thermal image inputs [238]. Furthermore, GANs have recently been implemented to generate datasets for training the developed DL networks (i.e., AGAN [69]) to overcome the deficiency of ground truth datasets for internal damages.

IDSNet has only 0.085 million learnable parameters, making it an extremely lightweight DL network applied to SHM problems. It is composed of various advanced operations, including global and self-attention modules, depth-wise asymmetric convolution, depth-wise dilated convolution, and depth-wise separable dilated convolution. IDSNet also achieved state-of-the-art performance, with an mIoU of 90% and a frame rate of 74.27 FPS at an input image size of 640 × 480, compared to UNet++, DeepLabV3+, and Attention UNet.

In the case of vibration-based damage identification approaches, advanced networks with the aforementioned operations, as well as LSTM and RNN layers, have been developed to extract and generate nonlinear and nonstationary spatiotemporal features to cancel various construction noises. Examples include DNoiseNet [198] and CsNNet [199]. These networks were required to overcome the fundamental limitations in the feature extraction capability of pure 1D CNNs, which keep the dimensions of filters and feature maps as 1D throughout the hidden layers and may therefore lose key features in the complex nonlinear or nonstationary data problems encountered in actual applications.

All these various efforts in damage identification in SHM are converging towards visualization and mapping of the identified damages through high-resolution digital twins using 3D reconstruction techniques. These techniques include 2D photogrammetry [187] and 3D laser scanning data [124].

It should be noted that no single method can solve all problems, and hybrid approaches can provide more comprehensive solutions. Hybrid approaches, combining multiple techniques using local methods and global methods with various DL algorithms, have been utilized to address the limitations of individual methods for accurate damage detection and assessment of structures, as outlined in each subsection of Section 3. These hybrid approaches have demonstrated high accuracy and robustness in various applications, such as crack detection [127] and response prediction [215,220]. However, their performance heavily depends on the quality and availability of data, and the complexity of combining different techniques can result in higher computational costs.

5. Future research directions

The comprehensive literature review of local and global monitoring approaches reveals that numerous advanced works have been conducted, and concurrently, many researchers have expressed interest in exploring further enhancements. In the case of local monitoring, despite various NDT imaging techniques implemented with DL methods, there is still huge room for improvement and innovation. Fig. 38 shows some research directions for the development of more advanced DL networks for internal damage segmentation at the pixel level, leading eventually to damage quantification.

Only a few studies have focused on pixel-level segmentation of internal damages, and more detailed qualitative and quantitative investigations should be carried out on the sizes and depths of damages, as well as the processing time of proposed DL methods. Furthermore, multiple damage segmentation methods should be investigated for NDT and computer vision-based local damage monitoring. Complex scenes should also be considered for computer vision-based approaches to ensure the applicability of models to real structures. All of these should be achieved through lightweight, efficient DL architectures for real-time processing.

As performance evaluation metrics, mIoU or more efficient evaluation metrics should be developed for a more intuitive understanding of the level of damage by considering false positives and false negatives. Also, it is essential to determine the reasonable computing speed of the developed DL network, especially considering the large scale of civil infrastructure monitoring. The duration of monitoring directly affects the cost and feasibility of the process. Hence, in computer science and engineering literature, floating-point operations per second (FLOPS) are commonly used to evaluate the computational speed of DL networks. However, for FLOPS to be an effective measure, it is essential to specify the size of the input image or data.

In the realm of global monitoring, several DL networks have been developed, with 1D CNN being a popular choice. These networks maintain the input dimensions, including all features, as 1D throughout each hidden layer, reducing computational costs and improving processing speed. However, the simplicity of 1D CNN in terms of operations limits its ability to effectively handle complex and real-world problems, especially those involving highly nonlinear or nonstationary data (Cha et al. [198]; Mostafavi and Cha [199]).

Therefore, more advanced DL operators such as various attention algorithms, transformers, depthwise separable convolutions, ASPP, LSTM, RNN, and nonlinear activation functions should be further adopted and investigated, as discussed for the computer vision approach. Additionally, to address the scarcity of ground truth data, challenges in data collection, and data imbalances, various training methodologies should be investigated, such as unsupervised, semi-supervised, self-supervised, and reinforcement learning.

Although supervised modes have demonstrated effectiveness, they face limitations due to the inherent challenge of gathering a diverse array of vibration data encompassing various damage scenarios. Despite the development of certain unsupervised DL methods, additional research remains imperative, particularly within domains like sensor fault detection and data recovery. Furthermore, integrated approaches are needed to simultaneously address damage identification, data recovery, sensor fault detection, and compensation.

Additionally, ensuring the robustness of DL models to varying environmental conditions, sensor noise, and structural changes is an ongoing challenge. Researchers and practitioners must address these complex challenges through data curation, efficient computing infrastructure, interpretable deep learning architectures, energy-efficient deployment strategies, and robust model training techniques to maximize the benefits of DL in SHM while mitigating its limitations.
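
The role of false positives and false negatives in mIoU, raised in this section, can be made concrete with a small calculation. The following is a minimal sketch in plain Python, not the evaluation code of any network cited here; the function names and the toy 2 × 4 masks are hypothetical and chosen only for illustration. Per-class IoU is computed as TP / (TP + FP + FN), and mIoU is the average over the classes present.

```python
def class_iou(pred, target, cls):
    """IoU for one class: TP / (TP + FP + FN), or None if the class never appears."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == cls and t == cls:
                tp += 1  # pixel correctly labelled as cls
            elif p == cls:
                fp += 1  # predicted cls, ground truth is something else
            elif t == cls:
                fn += 1  # ground truth is cls, prediction missed it
    union = tp + fp + fn
    return tp / union if union else None


def mean_iou(pred, target, classes):
    """Average the per-class IoU over the classes that actually occur."""
    scores = [s for s in (class_iou(pred, target, c) for c in classes) if s is not None]
    if not scores:
        raise ValueError("none of the requested classes occur in the masks")
    return sum(scores) / len(scores)


# Hypothetical 2 x 4 masks: 0 = background, 1 = crack.
target = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
pred = [[0, 0, 1, 0],
        [0, 1, 1, 1]]

crack_iou = class_iou(pred, target, 1)  # 3 TP, 1 FP, 1 FN
miou = mean_iou(pred, target, [0, 1])
print(f"crack IoU = {crack_iou:.2f}, mIoU = {miou:.2f}")
```

In this toy case one false positive and one false negative out of four crack pixels already lower the crack IoU to 0.60, which shows how strongly the metric penalizes both error types for thin damage regions.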

Fig. 38. Future research directions.
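
The remark in this section that computational cost is only meaningful when the input size is stated can be illustrated with a back-of-the-envelope calculation. The sketch below is an assumption-laden illustration, not a description of any cited network: it counts weights and multiply-accumulate operations (MACs, a common proxy for floating-point operation counts) for a single stride-1, same-padded layer, ignores bias terms, and uses a hypothetical layer width of 64 channels. The two resolutions are the 640 × 480 and 1280 × 720 input sizes mentioned in the text.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def layer_macs(params, height, width):
    """MACs for a stride-1, same-padded layer: every weight is applied once per output pixel."""
    return params * height * width


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernel.
C_IN, C_OUT, K = 64, 64, 3

standard = conv_params(C_IN, C_OUT, K)
separable = separable_params(C_IN, C_OUT, K)

# Same layer evaluated at two input resolutions (height, width).
macs_small = layer_macs(standard, 480, 640)
macs_large = layer_macs(standard, 720, 1280)

print(f"standard weights:  {standard}")
print(f"separable weights: {separable}")
print(f"weight reduction:  {standard / separable:.1f}x")
print(f"MACs at 640x480:   {macs_small}")
print(f"MACs at 1280x720:  {macs_large}")
```

Under these assumptions the weight count is independent of resolution, while the operation count triples when moving from 640 × 480 to 1280 × 720, so two networks can only be compared on operation counts or frame rates when their input sizes are reported alongside.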

Based on an extensive literature review of various areas of DL-based SHM, several common aspects that require further investigation have been identified. One key aspect is the need to establish high-quality big data for both local and global monitoring. The primary challenge in developing DL-based SHM is the availability of reliable data to train networks that can effectively solve real-world problems. Therefore, implementing broad-range data collection and labeling processes for NDT imaging techniques, computer vision, and vibration-based approaches is crucial. Despite the availability of datasets such as concrete crack detection [239], segmentation [240], and road damage detection and classification (including pothole segmentation with the Pothole600 Dataset) [241], there is a need for more comprehensive datasets that encompass various CBS, different structure types and environmental conditions, and diverse damage types and severities.

Data imbalance issues are common in both local and global methods, and there is a need to develop methods to resolve data imbalances between intact and damaged datasets, as well as size imbalances between damage size and image frame size.

Moreover, understanding aspects relevant to damage severity, such as damage thickness, depth, length, location, orientation, area, and volume, is critical. However, collecting data from damaged structures for global monitoring is quite challenging compared to collecting data from intact structures. Using existing architectures for detection is comparatively impractical under real-world conditions.

Exploring various learning modes for both approaches becomes imperative to address data deficiency concerns. These modes encompass self-supervised, semi-supervised, unsupervised, and meta-learning methods for deep learning networks. Through these different learning modes, the lack of data or poor data quality can be partially overcome.

Additionally, the computational requirements of deep learning can be demanding, requiring access to sufficient computing resources. Moreover, the interpretability of DL models remains a concern, especially in safety-critical applications like SHM, where understanding the model's decision-making process is crucial. Furthermore, the deployment of DL models in real-world SHM scenarios may encounter challenges related to energy efficiency, as resource-constrained devices in remote or inaccessible locations may struggle to meet the computational demands of DL algorithms.

Numerous advanced DL models have been developed in the field of computer science, with some of them being implemented in SHM problems. Moreover, in recent times, researchers have developed objective-specific DL models for SHM to enable real-time processing and improve performance by using various DL operators and their modifications. However, there is still a lack of profound understanding regarding the specific contributions and roles of each operation and hidden layers. Therefore, there is a need for further investigation into more comprehensible DL models.

Although limited autonomous navigation methods for UAVs have been developed for the SHM context, more extensive investigations are required to develop more reliable and efficient autonomous monitoring systems with robotics. Inspection data and results collected from actual structures must be converted into virtual form for further analysis.

The integration of local and global monitoring through digital twin technology represents a promising avenue for future research. Despite some hybrid methods being proposed, the majority of investigations into these monitoring approaches have occurred separately and independently. However, as the field of SHM continues to evolve, it is becoming increasingly apparent that a more holistic approach is necessary.

By integrating local and global monitoring approaches, a more comprehensive and accurate understanding of a structure's health can be achieved, enabling more effective management and maintenance strategies. Achieving this level of integration will require advancements in areas such as 3D reconstruction, BIM, and digital twin approaches. By leveraging these technologies, it is possible to achieve more accurate monitoring and management of structural damage, track the severity of the damage, and develop robust repair strategies, resulting in safer and more resilient structures.

6. Conclusion

In recent years, the swift progression of digital technology, along with the adoption of DL, has profoundly influenced the automation of SHM. DL techniques have unlocked the potential for analyzing extensive datasets and intricate system behaviors through automated feature extraction. This unique and revolutionary capability has been widely applied to diverse facets of SHM. However, there has been a notable absence of a comprehensive discourse on this topic in any single article. Thus, this article provides an in-depth review of DL's core theories, algorithms, and their wide-ranging applications in SHM, from local to global methods.

Local methods encompass five NDT approaches and computer vision-based techniques, including those using LIDAR. In contrast, global methods include data-driven strategies, such as supervised and unsupervised learning, sensor fault detection, data recovery methods, physics-based models, and hybrid methods that merge physics-informed strategies. Additionally, this study delves into how these local and global methods integrate with digital twins, incorporating both manual and autonomous UAVs to further automation.

Through a meticulous review of current literature on the topic, this article delineates the differences between traditional machine learning
and DL in SHM, outlines the cutting-edge methods within each subfield of local and global strategies, and discusses their respective advantages and disadvantages. It also examines the integration of DL techniques, digital twins, and UAVs, traces the evolution of DL in SHM, and identifies potential directions for future research. These insights confirm that DL-based SHM is still in its nascent stages, with vast potential for more sophisticated applications and developments to enhance SHM performance across various subfields, in terms of cost, accuracy, practicality, reliability, sustainability, and degree of automation.

It is anticipated that this article will serve both current professionals and newcomers in swiftly gaining a comprehensive and nuanced understanding of the domain, thereby fostering a deep and extensive knowledge base that paves the way for future research and innovation.

Statement

This manuscript is our original unpublished work; neither the manuscript nor any variation of it has been submitted to another publication previously, and there is no conflict of interest.

CRediT authorship contribution statement

Young-Jin Cha: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Rahmat Ali: Writing – original draft, Visualization, Investigation, Formal analysis, Data curation. John Lewis: Writing – original draft, Visualization, Validation, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Oral Büyüköztürk: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The research presented in this paper was partially supported by an Mitacs L2M (Application Ref.: IT33874), NSERC Discovery grant (RGPIN-04120-2022), and a CFI JELF grant (37394).

References

[1] J.M. Lifshitz, A. Rotem, Determination of reinforcement unbonding of composites by a vibration technique, J. Compos. Mater. 3 (1969) 412–423, https://doi.org/10.1177/002199836900300305.
[2] J.K. Vandiver, Detection of structural failure on fixed platforms by measurement of dynamic response, J. Pet. Technol. (1977) 305–310, https://doi.org/10.4043/2267-MS. March.
[3] M.M.F. Yuen, A numerical study of the eigenparameters of a damaged cantilever, J. Sound Vib. 103 (3) (1985) 301–310, https://doi.org/10.1016/0022-460X(85)90423-7.
[4] P.F. Rizos, N. Aspragathos, A.D. Dimarogonas, Identification of crack location and magnitude in a cantilever beam from the vibration modes, J. Sound Vib. 138 (3) (1990) 381–388, https://doi.org/10.1016/0022-460X(90)90593-O.
[5] W.J. Wang, P.D. McFadden, Application of orthogonal wavelets to early gear damage detection, Mech. Syst. Signal Process. 9 (5) (1995) 497–507, https://doi.org/10.1006/mssp.1995.0038.
[6] J. Rhim, S.W. Lee, A neural network approach for damage detection and identification of structures, Comput. Mech. 16 (6) (1995) 437–443, https://doi.org/10.1007/BF00370565.
[7] N.E. King, Detection of Structural Nonlinearity using Hilbert Transform procedures, The University of Manchester (United Kingdom), 1994, https://doi.org/10.1016/j.ymssp.2016.06.008.
[8] X.J. Wu, J. Ghaboussi, J.H. Garret, Use of neural networks in detection of structural damage, Comput. Struct. 42 (4) (1992) 649–659, https://doi.org/10.1016/0045-7949(92)90132-J.
[9] G.V. Garcia, N. Stubbs, K. Butler, Relative performance evaluation of pattern recognition models for nondestructive damage detection, in: Smart Structures and Materials 1996: Smart Systems for Bridges, Structures, and Highways 2719, SPIE, 1996, April, pp. 25–35, https://doi.org/10.1117/12.238846.
[10] J.T.P. Yao, H.G. Natke, Damage detection and reliability evaluation of existing structures, Struct. Saf. 15 (1–2) (1994) 3–16, https://doi.org/10.1016/0167-4730(94)90049-3.
[11] F.L.K. Wan, Genetic Algorithms, their Applications and Models in Nonlinear Systems Identification (Doctoral dissertation, University of British Columbia), 1991, https://doi.org/10.1016/S1000-9361(11)60206-9. Retrieved on November 17, 2023.
[12] Y.J. Cha, O. Buyukozturk, Structural damage detection using modal strain energy and hybrid multiobjective optimization, Comput. Aided Civ. Inf. Eng. 30 (5) (2015) 347–358, https://doi.org/10.1111/mice.12122.
[13] K. Worden, A.J. Lane, Damage identification using support vector machines, Smart Mater. Struct. 10 (3) (2001) 540–547. https://doi-org.uml.idm.oclc.org/10.1088/0964-1726/10/3/317.
[14] C.R. Farrar, W.E. Baker, T.M. Bell, K.M. Cone, T.W. Darling, T.A. Duffey, A. Eklund, A. Migliori, Dynamic characterization and damage detection in the I-40 bridge over the Rio Grande, in: Technical Report LA-12767-MS, Los Alamos National Laboratory, Los Alamos, NM, 1994, https://doi.org/10.2172/10158042. Retrieved on November 17, 2023.
[15] G.A. Stephen, J.M.W. Brownjohn, C.A. Taylor, Measurements of static and dynamic displacement from visual monitoring of the Humber Bridge, Eng. Struct. 15 (3) (1993) 197–208, https://doi.org/10.1016/0141-0296(93)90054-8.
[16] I. Abdel-Qader, O. Abudayyeh, M.E. Kelly, Analysis of edge-detection techniques for crack identification in bridges, J. Comput. Civ. Eng. 17 (4) (2003) 255–263, https://doi.org/10.1061/(ASCE)0887-3801(2003)17:4(255).
[17] Y.J. Cha, K. You, W. Choi, Vision-based detection of loosened bolts using the Hough transform and support vector machines, Autom. Constr. 71 (2016) 181–188, https://doi.org/10.1016/j.autcon.2016.06.008.
[18] S. Patsias, W.J. Staszewskiy, Damage detection using optical measurements and wavelets, Struct. Health Monit. 1 (1) (2002) 5–22, https://doi.org/10.1177/147592170200100102.
[19] P.F. Luo, Y.J. Chao, M.A. Sutton, W.H. Peters, Accurate measurement of three-dimensional deformations in deformable and rigid bodies using computer vision, Exp. Mech. 33 (1993) 123–132, https://doi.org/10.1007/BF02322488.
[20] J.G. Chen, N. Wadhwa, Y.J. Cha, F. Durand, W.T. Freeman, O. Buyukozturk, Modal identification of simple structures with high-speed video using motion magnification, J. Sound Vib. 345 (2015) 58–71, https://doi.org/10.1016/j.jsv.2015.01.024.
[21] Y.J. Cha, J.G. Chen, O. Büyüköztürk, Output-only computer vision based damage detection using phase-based optical flow and unscented Kalman filters, Eng. Struct. 132 (2017) 300–313, https://doi.org/10.1016/j.engstruct.2016.11.038.
[22] D. Feng, M.Q. Feng, E. Ozer, Y. Fukuda, A vision-based sensor for noncontact structural displacement measurement, Sensors 15 (7) (2015) 16557–16575, https://doi.org/10.3390/s150716557.
[23] Z. Wang, H. Kieu, H. Nguyen, M. Le, Digital image correlation in experimental mechanics and image registration in computer vision: similarities, differences and complements, Opt. Lasers Eng. 65 (2015) 18–27, https://doi.org/10.1016/j.optlaseng.2014.04.002.
[24] Y.J. Cha, W. Choi, O. Büyüköztürk, Deep learning-based crack damage detection using convolutional neural networks, Comput. Aided Civ. Inf. Eng. 32 (5) (2017) 361–378, https://doi.org/10.1111/mice.12263.
[25] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444, https://doi.org/10.1038/nature14539.
[26] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, Adv. Neural Inf. Proces. Syst. 28 (2015), https://doi.org/10.1109/TPAMI.2016.2577031.
[27] Y.J. Cha, W. Choi, G. Suh, S. Mahmoudkhani, O. Büyüköztürk, Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types, Comput. Aided Civ. Inf. Eng. 33 (9) (2018) 731–747, https://doi.org/10.1111/mice.12334.
[28] Y.J. Cha, Z. Wang, Unsupervised novelty detection–based structural damage localization using a density peaks-based fast clustering algorithm, Struct. Health Monit. 17 (2) (2018) 313–324, https://doi.org/10.1177/1475921717691260.
[29] Z. Wang, Y.J. Cha, Unsupervised deep learning approach using a deep auto-encoder with a one-class support vector machine to detect damage, Struct. Health Monit. 20 (1) (2021) 406–425, https://doi.org/10.1177/1475921720934051.
[30] M.H. Rafiei, H. Adeli, A novel unsupervised deep learning model for global and local health condition assessment of structures, Eng. Struct. 156 (2018) 598–607, https://doi.org/10.1016/j.engstruct.2017.10.070.
[31] Z. Wang, Y.J. Cha, Unsupervised machine and deep learning methods for structural damage detection: a comparative study, Eng. Rep. e12551 (2022), https://doi.org/10.1002/eng2.12551.
[32] L.C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A.L. Yuille, Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE Trans. Pattern Anal. Mach. Intell. 40 (4) (2017) 834–848, https://doi.org/10.1109/TPAMI.2017.2699184.
[33] W. Choi, Y.J. Cha, SDDNet: real-time crack segmentation, IEEE Trans. Ind. Electron. 67 (9) (2019) 8016–8025, https://doi.org/10.1109/TIE.2019.2945265.
[34] J. Cheng, L. Dong, M. Lapata, Long Short-Term Memory-Networks for Machine Reading. arXiv preprint. arXiv:1601.06733, 2016, https://doi.org/10.48550/arxiv.1601.06733.
[35] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324, https://doi.org/10.1109/5.726791.
[36] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, Adv. Neural Inf. Proces. Syst. 25 (2012), https://doi.org/10.1145/3065386.
[37] M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: European Conference on Computer Vision, Springer, Cham, 2014, September, pp. 818–833, https://doi.org/10.48550/arxiv.1311.2901.
[38] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, Y. Bengio, Generative adversarial nets, Adv. Neural Inf. Proces. Syst. 27 (2014), https://doi.org/10.1145/3422622.
[39] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9, https://doi.org/10.1109/CVPR.2015.7298594.
[40] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint. arXiv:1409.1556, 2014, https://doi.org/10.48550/arxiv.1409.1556.
[41] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587, https://doi.org/10.1109/CVPR.2014.81.
[42] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
[43] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, 2015, October, pp. 234–241, https://doi.org/10.48550/arxiv.1505.04597.
[44] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440, https://doi.org/10.1109/CVPR.2015.7298965.
[45] R. Girshick, Fast r-cnn, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448, https://doi.org/10.1109/ICCV.2015.169.
[46] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788, https://doi.org/10.1109/CVPR.2016.91.
[47] V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: a deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (12) (2017) 2481–2495, https://doi.org/10.1109/TPAMI.2016.2644615.
[48] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708, https://doi.org/10.48550/arxiv.1608.06993.
[49] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, I. Polosukhin, Attention is all you need, Adv. Neural Inf. Proces. Syst. 30 (2017), https://doi.org/10.48550/arxiv.1706.03762.
[50] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969, https://doi.org/10.1109/ICCV.2017.322.
[51] M. Tan, Q. Le, Efficientnet: Rethinking model scaling for convolutional neural networks, in: International Conference on Machine Learning, PMLR, 2019, May, pp. 6105–6114, https://doi.org/10.48550/arxiv.1905.11946.
[52] S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768, https://doi.org/10.1109/CVPR.2018.00913.
[53] A. Bochkovskiy, C.Y. Wang, H.Y.M. Liao, Yolov4: Optimal speed and accuracy of object detection. arXiv preprint. arXiv:2004.10934.
[54] Y. LeCun, L.D. Jackel, L. Bottou, C. Cortes, J.S. Denker, H. Drucker, V. Vapnik, Learning algorithms for classification: a comparison on handwritten digit recognition, Neural Networks: Stat. Mech. Perspect. 261 (276) (1995) 2, https://doi.org/10.1109/ICPR.1994.576879.
[55] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, June, pp. 248–255, https://doi.org/10.1109/CVPR.2009.5206848.
[56] J.R. Uijlings, K.E. Van De Sande, T. Gevers, A.W. Smeulders, Selective search for object recognition, Int. J. Comput. Vis. 104 (2013) 154–171, https://doi.org/10.1007/s11263-013-0620-5.
[57] A. Arnab, P.H. Torr, Pixelwise instance segmentation with a dynamically instantiated network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 441–450, https://doi.org/10.1109/CVPR.2017.100.
[58] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507, https://doi.org/10.1126/science.1127647.
[59] H. Jaeger, Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the “Echo State Network” Approach, 2002, https://doi.org/10.1016/j.cosrev.2009.03.005.
[60] M. Sundermeyer, R. Schlüter, H. Ney, LSTM neural networks for language processing, Interspeech 2012 (2012) 194–197, https://doi.org/10.1109/TASLP.2015.2400218.
[61] F.A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: continual prediction with LSTM, Neural Comput. 12 (10) (2000) 2451–2471, https://doi.org/10.1162/089976600300015015.
[62] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, N. Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint. arXiv:2010.11929, 2020, https://doi.org/10.48550/arXiv.2010.11929.
[63] O. Büyüköztürk, Imaging of concrete structures, NDT & E Int. 31 (4) (1998) 233–243, https://doi.org/10.1016/S0963-8695(98)00012-7.
[64] R. Ali, Y.J. Cha, Subsurface damage detection of a steel bridge using deep learning and uncooled micro-bolometer, Constr. Build. Mater. 226 (2019) 376–387, https://doi.org/10.1016/j.conbuildmat.2019.07.293.
[65] J. Yang, W. Wang, G. Lin, Q. Li, Y. Sun, Y. Sun, Infrared thermal imaging-based crack detection using deep learning, IEEE Access 7 (2019) 182060–182077, https://doi.org/10.1109/ACCESS.2019.2958264.
[66] H. Ahmed, H.M. La, K. Tran, Rebar detection and localization for bridge deck inspection and evaluation using deep residual networks, Autom. Constr. 120 (2020) 103393, https://doi.org/10.1016/j.autcon.2020.103393.
[67] S. Yang, Z. Wang, J. Wang, A.G. Cohn, J. Zhang, P. Jiang, L. Nie, Q. Sui, Defect segmentation: mapping tunnel lining internal defects with ground penetrating radar data using a convolutional neural network, Constr. Build. Mater. 319 (2022) 125658, https://doi.org/10.1016/j.conbuildmat.2021.125658.
[68] L. Deng, The mnist database of handwritten digit images for machine learning research [best of the web], IEEE Signal Process. Mag. 29 (6) (2012) 141–142, https://doi.org/10.1109/MSP.2012.2211477.
[69] R. Ali, Y.J. Cha, Attention-based generative adversarial network with internal damage segmentation using thermography, Autom. Constr. 141 (2022) 104412, https://doi.org/10.1016/j.autcon.2022.104412.
[70] Z. Liu, J.K. Yeoh, X. Gu, Q. Dong, Y. Chen, W. Wu, D. Wang, Automatic pixel-level detection of vertical cracks in asphalt pavement based on GPR investigation and improved mask R-CNN, Autom. Constr. 146 (2023) 104689, https://doi.org/10.1016/j.autcon.2022.104689.
[71] D. Sen, J. Long, H. Sun, X. Campman, O. Buyukozturk, Multi-component deconvolution interferometry for data-driven prediction of seismic structural response, Eng. Struct. 241 (2021) 112405, https://doi.org/10.1016/j.engstruct.2021.112405.
[72] S. Pozzer, E. Rezazadeh Azar, F. Dalla Rosa, Z.M. Chamberlain Pravia, Semantic segmentation of defects in infrared thermographic images of highly damaged concrete structures, J. Perform. Constr. Facil. 35 (1) (2021) 04020131, https://doi.org/10.1061/(ASCE)CF.1943-5509.0001541.
[73] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, Mobilenetv2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520, https://doi.org/10.1109/CVPR.2018.00474.
[74] N.N. Kulkarni, K. Raisi, N.A. Valente, J. Benoit, T. Yu, A. Sabato, Deep learning augmented infrared thermography for unmanned aerial vehicles structural health monitoring of roadways, Autom. Constr. 148 (2023) 104784, https://doi.org/10.1016/j.autcon.2023.104784.
[75] M. Tan, R. Pang, Q.V. Le, Efficientdet: Scalable and efficient object detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10781–10790, https://doi.org/10.1109/CVPR42600.2020.01079.
[76] Z. Zhou, M.M.R. Siddiquee, N. Tajbakhsh, J. Liang, UNet++: redesigning skip connections to exploit multiscale features in image segmentation, IEEE Trans. Med. Imaging 39 (6) (2019) 1856–1867, https://doi.org/10.1109/TMI.2019.2959609.
[77] A. Ji, X. Xue, Y. Wang, X. Luo, W. Xue, An integrated approach to automatic pixel-level crack detection and quantification of asphalt pavement, Autom. Constr. 114 (2020) 103176, https://doi.org/10.1016/j.autcon.2020.103176.
[78] V.H. Mac, J. Huh, N.S. Doan, G. Shin, B.Y. Lee, Thermography-based deterioration detection in concrete bridge girders strengthened with carbon fiber-reinforced polymer, Sensors 20 (11) (2020) 3263, https://doi.org/10.3390/s20113263.
[79] M. Solla, S. Lagüela, N. Fernández, I. Garrido, Assessing rebar corrosion through the combination of nondestructive GPR and IRT methodologies, Remote Sens. 11 (14) (2019) 1705, https://doi.org/10.3390/rs11141705.
[80] I. Garrido, M. Solla, S. Lagüela, N. Fernández, IRT and GPR techniques for moisture detection and characterisation in buildings, Sensors 20 (22) (2020) 6421, https://doi.org/10.3390/s20226421.
[81] P. Cotič, D. Kolarič, V.B. Bosiljkov, V. Bosiljkov, Z. Jagličić, Determination of the applicability and limits of void and delamination detection in concrete structures using infrared thermography, NDT & E Int. 74 (2015) 87–93, https://doi.org/10.1016/j.ndteint.2015.05.003.
[82] N.P. De Alcantara Jr, F.M. Da Silva, M.T. Guimarães, M.D. Pereira, Corrosion assessment of steel bars used in reinforced concrete structures by means of eddy current testing, Sensors 16 (1) (2015) 15, https://doi.org/10.3390/s16010015.
[83] X. Fu, C. Zhang, X. Peng, L. Jian, Z. Liu, Towards end-to-end pulsed eddy current classification and regression with CNN, in: 2019 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), IEEE, 2019, May, pp. 1–5, https://doi.org/10.1109/I2MTC.2019.8826858.
[84] R. Miao, Y. Gao, L. Ge, Z. Jiang, J. Zhang, Online defect recognition of narrow overlap weld based on two-stage recognition model combining continuous wavelet transform and convolutional neural network, Comput. Ind. 112 (2019) 103115, https://doi.org/10.1016/j.compind.2019.07.005.
[85] T.A. Alvarenga, A.L. Carvalho, L.M. Honorio, A.S. Cerqueira, L.M. Filho, R.A. Nobrega, Detection and classification system for rail surface defects based on Eddy current, Sensors 21 (23) (2021) 7937, https://doi.org/10.3390/s21237937.
[86] T. Meng, Y. Tao, Z. Chen, J.R.S. Avila, Q. Ran, Y. Shao, W. Yin, Depth evaluation for metal surface defects by eddy current testing using deep residual convolutional neural networks, IEEE Trans. Instrum. Meas. 70 (2021) 1–13, https://doi.org/10.1109/TIM.2021.3117367.
[87] S.S. Khedmatgozar Dolati, N. Caluk, A. Mehrabi, S.S. Khedmatgozar Dolati, Non-destructive testing applications for steel bridges, Appl. Sci. 11 (20) (2021) 9757, https://doi.org/10.3390/app11209757.
[88] K. Dinh, N. Gucunski, T.H. Duong, An algorithm for automatic localization and detection of rebars from GPR data of concrete bridge decks, Autom. Constr. 89 (2018) 292–298, https://doi.org/10.1016/j.autcon.2018.02.017.
[89] J. Xu, J. Zhang, W. Sun, Recognition of the typical distress in concrete pavement based on GPR and 1D-CNN, Remote Sens. 13 (12) (2021) 2375, https://doi.org/10.3390/rs13122375.
[90] J. Zhang, Y. Lu, Z. Yang, X. Zhu, T. Zheng, X. Liu, W. Li, Recognition of void defects in airport runways using ground-penetrating radar and shallow CNN, Autom. Constr. 138 (2022) 104260, https://doi.org/10.1016/j.autcon.2022.104260.
[91] J. Melville, K.S. Alguri, C. Deemer, J.B. Harley, Structural damage detection using deep learning of ultrasonic guided waves, in: AIP Conference Proceedings Vol. 1949, AIP Publishing LLC, 2018, April, p. 230004, https://doi.org/10.1121/1.5042240. No. 1.
[92] D.Q. Tran, J.W. Kim, K.D. Tola, W. Kim, S. Park, Artificial intelligence-based bolt loosening diagnosis using deep learning algorithms for laser ultrasonic wave propagation data, Sensors 20 (18) (2020) 5329, https://doi.org/10.3390/s20185329.
[93] R.J. Pyle, R.L. Bevan, R.R. Hughes, R.K. Rachev, A.A.S. Ali, P.D. Wilcox, Deep learning for ultrasonic crack characterization in NDE, IEEE Trans. Ultrason. Ferroelectr. Freq. Control 68 (5) (2020) 1854–1865, https://doi.org/10.1109/TUFFC.2020.3045847.
[94] A. Arbaoui, A. Ouahabi, S. Jacques, M. Hamiane, Concrete cracks detection and monitoring using deep learning-based multiresolution analysis, Electronics 10 (15) (2021) 1772, https://doi.org/10.3390/electronics10151772.
[95] V. Ewald, R.S. Venkat, A. Asokkumar, R. Benedictus, C. Boller, R.M. Groves, Perception modelling by invariant representation of deep learning for automated structural diagnostic in aircraft maintenance: a study case using DeepSHM, Mech. Syst. Signal Process. 165 (2022) 108153, https://doi.org/10.1016/j.ymssp.2021.108153.
[96] G. Han, Y.M. Kim, H. Kim, T.M. Oh, K.I. Song, A. Kim, T.H. Kwon, Auto-detection of acoustic emission signals from cracking of concrete structures using convolutional neural networks: upscaling from specimen, Expert Syst. Appl. 186 (2021) 115863, https://doi.org/10.1016/j.eswa.2021.115863.
[97] R. Zhang, X. Yan, L. Guo, Deep learning-based classification of damage-induced acoustic emission signals in UHPC, Constr. Build. Mater. 356 (2022) 129285, https://doi.org/10.1016/j.conbuildmat.2022.129285.
[98] S. Guo, H. Ding, Y. Li, H. Feng, X. Xiong, Z. Su, W. Feng, A hierarchical deep convolutional regression framework with sensor network fail-safe adaptation for acoustic-emission-based structural health monitoring, Mech. Syst. Signal Process. 181 (2022) 109508, https://doi.org/10.1016/j.ymssp.2022.109508.
[99] D. Kang, Y.J. Cha, Autonomous UAVs for structural health monitoring using deep learning and an ultrasonic beacon system with geo-tagging, Comput. Aided Civ. Inf. Eng. 33 (10) (2018) 885–902, https://doi.org/10.1111/mice.12375.
[100] S.I. Hassan, L.M. Dang, I. Mehmood, S. Im, C. Choi, J. Kang, Y.S. Park, H. Moon, Underground sewer pipe condition assessment based on convolutional neural networks, Autom. Constr. 106 (2019) 102849, https://doi.org/10.1016/j.autcon.2019.102849.
[101] D. Li, A. Cong, S. Guo, Sewer damage detection from imbalanced CCTV inspection data using deep convolutional neural networks with hierarchical classification, Autom. Constr. 101 (2019) 199–208, https://doi.org/10.1016/j.autcon.2019.01.017.
[102] F.C. Chen, M.R. Jahanshahi, NB-CNN: deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion, IEEE Trans. Ind. Electron. 65 (5) (2017) 4392–4400, https://doi.org/10.1109/TIE.2017.2764844.
[103] E.E.B. Adam, A. Sathesh, Construction of accurate crack identification on concrete structure using hybrid deep learning approach, J. Innovat. Image Proc. (JIIP) 3 (02) (2021) 85–99, https://doi.org/10.36548/jiip.2021.2.002.
[104] C. Zhang, V.S. Kalasapudi, P. Tang, Rapid data quality oriented laser scan planning for dynamic construction environments, Adv. Eng. Inform. 30 (2) (2016) 218–232, https://doi.org/10.1016/j.aei.2016.03.004.
[105] H. Perez, J.H. Tah, A. Mosavi, Deep learning for detecting building defects using convolutional neural networks, Sensors 19 (16) (2019) 3556, https://doi.org/10.3390/s19163556.
[106] H. Xu, X. Su, Y. Wang, H. Cai, K. Cui, X. Chen, Automatic bridge crack detection using a convolutional neural network, Appl. Sci. 9 (14) (2019) 2867, https://doi.org/10.3390/app9142867.
[107] J. Guo, Q. Wang, Y. Li, P. Liu, Façade defects classification from imbalanced dataset using meta learning-based convolutional neural network, Comput. Aided Civ. Inf. Eng. 35 (12) (2020) 1403–1418, https://doi.org/10.1111/mice.12578.
[108] A.S. Rao, T. Nguyen, M. Palaniswami, T. Ngo, Vision-based automated crack detection using convolutional neural networks for condition assessment of infrastructure, Struct. Health Monit. 20 (4) (2021) 2124–2142, https://doi.org/10.1177/1475921720965445.
[109] R.Y. Kung, N.H. Pan, C.C. Wang, P.C. Lee, Application of deep learning and unmanned aerial vehicle on building maintenance, Adv. Civil Eng. 2021 (2021), https://doi.org/10.1155/2021/5598690.
[110] Y. Li, W. Hu, H. Dong, X. Zhang, Building damage detection from post-event aerial imagery using single shot multibox detector, Appl. Sci. 9 (6) (2019) 1128, https://doi.org/10.3390/app9061128.
[111] R. Ali, J. Zeng, Y.J. Cha, Deep learning-based crack detection in a concrete tunnel structure using multispectral dynamic imaging, in: Smart Structures and NDE for Industry 4.0, Smart Cities, and Energy Systems 11382, SPIE, 2020, pp. 12–19, https://doi.org/10.1117/12.2557900.
[112] A. Semwal, R.E. Mohan, L.M.J. Melvin, P. Palanisamy, C. Baskar, L. Yi, S. Pookkuttath, B. Ramalingam, B. Ramalingam, False ceiling detection and mapping using a deep learning framework and the teleoperated reconfigurable ‘Falcon’ robot, Sensors 22 (1) (2022) 262, https://doi.org/10.3390/s22010262.
[113] R. Ali, D. Kang, G. Suh, Y.J. Cha, Real-time multiple damage mapping using autonomous UAV and deep faster region-based neural networks for GPS-denied structures, Autom. Constr. 130 (2021) 103831, https://doi.org/10.1016/j.autcon.2021.103831.
[114] C.M. Yeum, J. Choi, S.J. Dyke, Automated region-of-interest localization and classification for vision-based visual assessment of civil infrastructure, Struct. Health Monit. 18 (3) (2019) 675–689, https://doi.org/10.1177/1475921718765419.
[115] C. Zhang, C.C. Chang, M. Jamshidi, Concrete bridge surface damage detection using a single-stage detector, Comput. Aided Civ. Inf. Eng. 35 (4) (2020) 389–409, https://doi.org/10.1111/mice.12500.
[116] J. Redmon, A. Farhadi, Yolov3: An Incremental Improvement. arXiv preprint. arXiv:1804.02767, 2018, https://doi.org/10.48550/arXiv.1804.02767.
[117] J.C. Cheng, M. Wang, Automated detection of sewer pipe defects in closed-circuit television images using deep learning techniques, Autom. Constr. 95 (2018) 155–171, https://doi.org/10.1016/j.autcon.2018.08.006.
[118] H. Perez, J.H. Tah, Deep learning smartphone application for real-time detection of defects in buildings, Struct. Control. Health Monit. 28 (7) (2021) e2751, https://doi.org/10.1002/stc.2751.
[119] Y. Dong, J. Wang, Z. Wang, X. Zhang, Y. Gao, Q. Sui, P. Jiang, A deep-learning-based multiple defect detection method for tunnel lining damages, IEEE Access 7 (2019) 182643–182657, https://doi.org/10.1109/ACCESS.2019.2931074.
[120] S. Zhao, D. Zhang, Y. Xue, M. Zhou, H. Huang, A deep learning-based approach for refined crack evaluation from shield tunnel lining images, Autom. Constr. 132 (2021) 103934, https://doi.org/10.1016/j.autcon.2021.103934.
[121] Y. Liu, J. Yao, X. Lu, R. Xie, L. Li, DeepCrack: a deep hierarchical feature learning architecture for crack segmentation, Neurocomputing 338 (2019) 139–153, https://doi.org/10.1016/j.neucom.2019.01.036.
[122] S. Zhao, D.M. Zhang, H.W. Huang, Deep learning–based image instance segmentation for moisture marks of shield tunnel lining, Tunn. Undergr. Space Technol. 95 (2020) 103156, https://doi.org/10.1016/j.tust.2019.103156.
[123] F. Liu, L. Wang, UNet-based model for crack detection integrating visual explanations, Constr. Build. Mater. 322 (2022) 126265, https://doi.org/10.1016/j.conbuildmat.2021.126265.
[124] A. Ji, A.W.Z. Chew, X. Xue, L. Zhang, An encoder-decoder deep learning method for multi-class object segmentation from 3D tunnel point clouds, Autom. Constr. 137 (2022) 104187, https://doi.org/10.1016/j.autcon.2022.104187.
[125] D. Xi, Y. Qin, S. Wang, YDRSNet: An integrated Yolov5-Deeplabv3+ real-time segmentation network for gear pitting measurement, J. Intell. Manuf. (2021) 1–15, https://doi.org/10.1007/s10845-021-01876-y.
[126] G.H. Beckman, D. Polyzois, Y.J. Cha, Deep learning-based automatic volumetric damage quantification using depth camera, Autom. Constr. 99 (2019) 114–124, https://doi.org/10.1016/j.autcon.2018.12.006.
[127] D. Kang, S.S. Benipal, D.L. Gopal, Y.J. Cha, Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning, Autom. Constr. 118 (2020) 103291, https://doi.org/10.1016/j.autcon.2020.103291.
[128] H.W. Huang, Q.T. Li, D.M. Zhang, Deep learning based image recognition for crack and leakage defects of metro shield tunnel, Tunn. Undergr. Space Technol. 77 (2018) 166–176, https://doi.org/10.1016/j.tust.2018.04.002.
[129] K. Jang, Y.K. An, B. Kim, S. Cho, Automated crack evaluation of a high-rise bridge pier using a ring-type climbing robot, Comput. Aided Civ. Inf. Eng. 36 (1) (2021) 14–29, https://doi.org/10.1111/mice.12550.
[130] A. Zhang, K.C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Liu, Q. Joshua, C. Chen, Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network, Comput. Aided Civ. Inf. Eng. 32 (10) (2017) 805–819, https://doi.org/10.1016/j.aei.2016.03.004.
[131] F. Yang, L. Zhang, S. Yu, D. Prokhorov, X. Mei, H. Ling, Feature pyramid and hierarchical boosting network for pavement crack detection, IEEE Trans. Intell. Transp. Syst. 21 (4) (2019) 1525–1535, https://doi.org/10.1109/TITS.2019.2910595.
[132] S. Wang, H. Zhang, H. Wang, B. Chen, Y. Li, C. Chen, Combination of point-cloud model and FCN for dam crack detection and scale calculation, in: 2019 Chinese Automation Congress (CAC), IEEE, 2019, pp. 5859–5862, https://doi.org/10.1109/CAC48633.2019.8996699.
[133] V. Hoskere, Y. Narazaki, T.A. Hoang, B.F. Spencer Jr., MaDnet: multi-task semantic segmentation of multiple types of structural materials and damage in images of civil infrastructure, J. Civ. Struct. Heal. Monit. 10 (5) (2020) 757–773, https://doi.org/10.1007/s13349-020-00409-0.
[134] J. Pang, H. Zhang, H. Zhao, L. Li, DcsNet: a real-time deep network for crack segmentation, SIViP 16 (4) (2022) 911–919, https://doi.org/10.1007/s11760-021-02034-w.
[135] D. Kang, Y.J. Cha, Efficient attention-based deep encoder and decoder for automatic crack segmentation, Struct. Health Monit. 14759217211053776 (2022), https://doi.org/10.1177/14759217211053776.
[136] D. Müller, I. Soto-Rey, F. Kramer, Towards a guideline for evaluation metrics in medical image segmentation, BMC. Res. Notes 15 (1) (2022) 1–8, https://doi.org/10.1186/s13104-022-06096-y.
[137] P. Balasubramanian, V. Kaushik, S.Y. Altamimi, M. Amabili, M. Alteneiji, Comparison of neural networks based on accuracy and robustness in identifying impact location for structural health monitoring applications, Struct. Health Monit. 22 (1) (2023) 417–432, https://doi.org/10.1177/14759217221098569.
[138] W. Wu, M.A. Qurishee, J. Owino, I. Fomunung, M. Onyango, B. Atolagbe, Coupling deep learning and UAV for infrastructure condition assessment automation, in: 2018 IEEE International Smart Cities Conference (ISC2), IEEE, 2018, September, pp. 1–7, https://doi.org/10.1109/ISC2.2018.8656971.
[139] I.H. Kim, H. Jeon, S.C. Baek, W.H. Hong, H.J. Jung, Application of crack identification techniques for an aging concrete bridge inspection using an unmanned aerial vehicle, Sensors 18 (6) (2018) 1881, https://doi.org/10.3390/s18061881.
[140] J.Y. Rau, K.W. Hsiao, J.P. Jhan, S.H. Wang, W.C. Fang, J.L. Wang, Bridge crack detection using multi-rotary UAV and object-base image analysis, in: The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 42, 2017, p. 311, https://doi.org/10.5194/isprs-archives-XLII-2-W6-311-2017.
[141] C. Feng, H. Zhang, H. Wang, S. Wang, Y. Li, Automatic pixel-level crack detection on dam surface using deep convolutional network, Sensors 20 (7) (2020) 2069, https://doi.org/10.3390/s20072069.
[142] J. Shi, J. Dang, R. Zuo, Bridge damage cropping-and-stitching segmentation using fully convolutional network based on images from UAVs, in: Bridge Maintenance, Safety, Management, Life-Cycle Sustainability and Innovations, CRC Press, 2021, pp. 264–270, https://doi.org/10.1201/9780429279119-32.
[143] Y. Arjoune, S. Peri, N. Sugunaraj, A. Biswas, D. Sadhukhan, P. Ranganathan, An instance segmentation and clustering model for energy audit assessments in built environments: a multi-stage approach, Sensors 21 (13) (2021) 4375, https://doi.org/10.3390/s21134375.
[144] H. Samma, S.A. Suandi, N.A. Ismail, S. Sulaiman, L.L. Ping, Evolving pre-trained CNN using two-layers optimizer for road damage detection from drone images, IEEE Access 9 (2021) 158215–158226, https://doi.org/10.1109/ACCESS.2021.3131231.
[145] Z. Yu, Y. Shen, C. Shen, A real-time detection approach for bridge cracks based on YOLOv4-FPM, Autom. Constr. 122 (2021) 103514, https://doi.org/10.1016/j.autcon.2020.103514.
[146] Y. Tian, Y. Chen, W. Diming, Y. Shaoguang, M. Wandeng, W. Chao, X. Chunmei, Y. Long, Augmentation method for anti-vibration hammer on power transimission line based on CycleGAN, Int. J. Image Data Fusion 1-20 (2022), https://doi.org/10.1080/19479832.2022.2033855.
[147] A. Waqas, D. Kang, Y.J. Cha, Deep learning-based obstacle-avoiding autonomous UAVs with fiducial marker-based localization for structural health monitoring, Struct. Health Monit. 14759217231177314 (2023), https://doi.org/10.1177/14759217231177314.
[148] Z. Ma, S. Liu, A review of 3D reconstruction techniques in civil engineering and their applications, Adv. Eng. Inform. 37 (2018) 163–174, https://doi.org/10.1016/j.aei.2018.05.005.
[149] G. Shen, L. Lei, Z. Li, S. Cai, L. Zhang, P. Cao, X. Liu, Deep reinforcement learning for flocking motion of multi-uav systems: learn from a digital twin, IEEE Internet Things J. 9 (13) (2021) 11141–11153, https://doi.org/10.1109/JIOT.2021.3127873.
[150] J. Chen, W. Lu, J. Lou, Automatic concrete defect detection and reconstruction by aligning aerial images onto semantic-rich building information model, Comput. Aided Civ. Inf. Eng. (2022), https://doi.org/10.1111/mice.12928.
[151] S. Zhao, F. Kang, J. Li, C. Ma, Structural health monitoring and inspection of dams based on UAV photogrammetry with image 3D reconstruction, Autom. Constr. 130 (2021) 103832, https://doi.org/10.1016/j.autcon.2021.103832.
[152] W. Ding, H. Yang, K. Yu, J. Shu, Crack detection and quantification for concrete structures using UAV and transformer, Autom. Constr. 152 (2023) 104929, https://doi.org/10.1016/j.autcon.2023.104929.
[153] Z. Xu, R. Kang, R. Lu, 3D reconstruction and measurement of surface defects in prefabricated elements using point clouds, J. Comput. Civ. Eng. 34 (5) (2020) 04020033, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000920.
[154] A. Witkin, Scale-space filtering: A new approach to multi-scale description, in: ICASSP'84. IEEE International Conference on Acoustics, Speech, and Signal Processing, 9, IEEE, 1984, March, pp. 150–153, https://doi.org/10.1109/ICASSP.1984.1172729.
[155] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004) 91–110, https://doi.org/10.1023/B:VISI.0000029664.99615.94.
[156] E. Rosten, T. Drummond, Machine learning for high-speed corner detection, in: Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7–13, 2006. Proceedings, Part I 9, Springer, Berlin Heidelberg, 2006, pp. 430–443, https://doi.org/10.1007/11744023_34.
[157] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF), Comput. Vis. Image Underst. 110 (3) (2008) 346–359, https://doi.org/10.1016/j.cviu.2007.09.014.
[158] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395, https://doi.org/10.1145/358669.358692.
[159] P. Rodriguez-Gonzalvez, D. Gonzalez-Aguilera, G. Lopez-Jimenez, I. Picon-Cabrera, Image-based modeling of built environment from an unmanned aerial system, Autom. Constr. 48 (2014) 44–52, https://doi.org/10.1016/j.autcon.2014.08.010.
[160] L. Zheng, M. Yu, M. Song, A. Stefanidis, Z. Ji, C. Yang, Registration of long-strip terrestrial laser scanning point clouds using ransac and closed constraint adjustment, Remote Sens. 8 (4) (2016) 278, https://doi.org/10.3390/rs8040278.
[161] A. Khaloo, D. Lattanzi, Hierarchical dense structure-from-motion reconstructions for infrastructure condition assessment, J. Comput. Civ. Eng. 31 (1) (2017) 04016047, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000616.
[162] K. Hata, S. Savarese, CS231A Course Notes 3: Epipolar geometry, 2017, http://web.stanford.edu/class/cs231a/course_notes/03-epipolar-geometry.pdf.
[163] D. Nistér, An efficient solution to the five-point relative pose problem, IEEE Trans. Pattern Anal. Mach. Intell. 26 (6) (2004) 756–770, https://doi.org/10.1109/TPAMI.2004.17.
[164] R.I. Hartley, In defense of the eight-point algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 19 (6) (1997) 580–593, https://doi.org/10.1109/34.601246.
[165] Z. Zhang, Determining the epipolar geometry and its uncertainty: a review, Int. J. Comput. Vis. 27 (1998) 161–195, https://doi.org/10.1023/A:1007941100561.
[166] R.I. Hartley, P. Sturm, Triangulation, Comput. Vis. Image Underst. 68 (2) (1997) 146–157, https://doi.org/10.1006/cviu.1997.0547.
[167] Y.F. Liu, S. Cho, B.F. Spencer Jr., J.S. Fan, Concrete crack assessment using digital image processing and 3D scene reconstruction, J. Comput. Civ. Eng. 30 (1) (2016) 04014124, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000446.
[168] Z. Shang, Z. Shen, Single-pass inline pipeline 3D reconstruction using depth camera array, Autom. Constr. 138 (2022) 104231, https://doi.org/10.1016/j.autcon.2022.104231.
[169] H. Son, C. Kim, C. Kim, Fully automated as-built 3D pipeline extraction method from laser-scanned data based on curvature computation, J. Comput. Civ. Eng. 29 (4) (2015) B4014003, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000401.
[170] B. Riveiro, M.J. DeJong, B. Conde, Automated processing of large point clouds for structural health monitoring of masonry arch bridges, Autom. Constr. 72 (2016) 258–268, https://doi.org/10.1016/j.autcon.2016.02.009.
[171] R. Lu, I. Brilakis, C.R. Middleton, Detection of structural components in point clouds of existing RC bridges, Comput. Aided Civ. Inf. Eng. 34 (3) (2019) 191–212, https://doi.org/10.1111/mice.12407.
[172] F. Jia, D.D. Lichti, A model-based design system for terrestrial laser scanning networks in complex sites, Remote Sens. 11 (15) (2019) 1749, https://doi.org/10.3390/rs11151749.
[173] Q. Wang, M.K. Kim, Applications of 3D point cloud data in the construction industry: a fifteen-year review from 2004 to 2018, Adv. Eng. Inform. 39 (2019) 306–319, https://doi.org/10.1016/j.aei.2019.02.007.
[174] W. Jiang, Y. Zhou, L. Ding, C. Zhou, X. Ning, UAV-based 3D reconstruction for hoist site mapping and layout planning in petrochemical construction, Autom. Constr. 113 (2020) 103137, https://doi.org/10.1016/j.autcon.2020.103137.
[175] L. Hua, Y. Lu, J. Deng, Z. Shi, D. Shen, 3D reconstruction of concrete defects using optical laser triangulation and modified spacetime analysis, Autom. Constr. 142 (2022) 104469, https://doi.org/10.1016/j.autcon.2022.104469.
[176] D. Moon, S. Chung, S. Kwon, J. Seo, J. Shin, Comparison and utilization of point cloud generated from photogrammetry and laser scanning: 3D world model for smart heavy equipment planning, Autom. Constr. 98 (2019) 322–331, https://doi.org/10.1016/j.autcon.2018.07.020.
[177] S.K. Nouwakpo, M.A. Weltz, K. McGwire, Assessing the performance of structure-from-motion photogrammetry and terrestrial LiDAR for reconstructing soil surface microtopography of naturally vegetated plots, Earth Surf. Process. Landf. 41 (3) (2016) 308–322, https://doi.org/10.1002/esp.3787.
[178] X. Gu, Z. Fan, S. Zhu, Z. Dai, F. Tan, P. Tan, Cascade cost volume for high-resolution multi-view stereo and stereo matching, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2495–2504, https://doi.org/10.1109/CVPR42600.2020.00257.
[179] J.L. Schonberger, J.M. Frahm, Structure-from-motion revisited, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4104–4113, https://doi.org/10.1109/CVPR.2016.445.
[180] M.L. Cheng, M. Matsuoka, W. Liu, F. Yamazaki, Near-real-time gradually expanding 3D land surface reconstruction in disaster areas by sequential drone imagery, Autom. Constr. 135 (2022) 104105, https://doi.org/10.1016/j.autcon.2021.104105.
[181] M.L. Cheng, M. Matsuoka, An enhanced image matching strategy using binary-stream feature descriptors, IEEE Geosci. Remote Sens. Lett. 17 (7) (2019) 1253–1257, https://doi.org/10.1109/LGRS.2019.2943237.
[182] K. Chaiyasarn, A. Buatik, H. Mohamad, M. Zhou, S. Kongsilp, N. Poovarodom, Integrated pixel-level CNN-FCN crack detection via photogrammetric 3D texture mapping of concrete structures, Autom. Constr. 140 (2022) 104388, https://doi.org/10.1016/j.autcon.2022.104388.
[183] L. Deng, T. Sun, L. Yang, R. Cao, Binocular video-based 3D reconstruction and length quantification of cracks in concrete structures, Autom. Constr. 148 (2023) 104743, https://doi.org/10.1016/j.autcon.2023.104743.
[184] X. Fang, Q. Li, J. Zhu, Z. Chen, D. Zhang, K. Wu, K. Ding, Q. Li, Sewer defect instance segmentation, localization, and 3D reconstruction for sewer floating capsule robots, Autom. Constr. 142 (2022) 104494, https://doi.org/10.1016/j.autcon.2022.104494.
[185] K. Idjaton, R. Janvier, M. Balawi, X. Desquesnes, X. Brunetaud, S. Treuillet, Detection of limestone spalling in 3D survey images using deep learning, Autom. Constr. 152 (2023) 104919, https://doi.org/10.1016/j.autcon.2023.104919.
[186] S. Zhao, F. Kang, J. Li, Concrete dam damage detection and localisation based on YOLOv5s-HSC and photogrammetric 3D reconstruction, Autom. Constr. 143 (2022) 104555, https://doi.org/10.1016/j.autcon.2022.104555.
[187] J. Zhao, F. Hu, Y. Xu, W. Zuo, J. Zhong, H. Li, Structure-PoseNet for identification of dense dynamic displacement and three-dimensional poses of structures using a
monocular camera, Comput. Aided Civ. Inf. Eng. 37 (6) (2022) 704–725, https://doi.org/10.1111/mice.12761.
[188] S. Zhuge, X. Xu, L. Zhong, S. Gan, B. Lin, X. Yang, X. Zhang, Noncontact deflection measurement for bridge through a multi-UAVs system, Comput. Aided Civ. Inf. Eng. 37 (6) (2022) 746–761, https://doi.org/10.1111/mice.12771.
[189] M. Fallahian, F. Khoshnoudian, S. Talaei, Application of couple sparse coding ensemble on structural damage detection, Smart Struct. Syst. 21 (1) (2018) 001–014, https://doi.org/10.12989/sss.2018.21.1.00.
[190] Y. Bao, Z. Tang, H. Li, Y. Zhang, Computer vision and deep learning–based data anomaly detection method for structural health monitoring, Struct. Health Monit. 18 (2) (2019) 401–421, https://doi.org/10.1177/1475921718757405.
[191] Y. Yu, C. Wang, X. Gu, J. Li, A novel deep learning-based method for damage identification of smart building structures, Struct. Health Monit. 18 (1) (2019) 143–163, https://doi.org/10.1177/1475921718804132.
[192] M. Azimi, G. Pekcan, Structural health monitoring using extremely compressed data through deep learning, Comput. Aided Civ. Inf. Eng. 35 (6) (2020) 597–614, https://doi.org/10.1111/mice.12517.
[193] T. Guo, L. Wu, C. Wang, Z. Xu, Damage detection in a novel deep-learning framework: a robust method for feature extraction, Struct. Health Monit. 19 (2) (2020) 424–442, https://doi.org/10.1177/1475921719846051.
[194] M. Morgantini, R. Betti, L. Balsamo, Structural damage assessment through features in quefrency domain, Mech. Syst. Signal Process. 147 (2021) 107017, https://doi.org/10.1016/j.ymssp.2020.107017.
[195] J. Won, J.W. Park, S. Jang, K. Jin, Y. Kim, Automated structural damage identification using data normalization and 1-dimensional convolutional neural network, Appl. Sci. 11 (6) (2021) 2610, https://doi.org/10.3390/app11062610.
[196] O. Abdeljaber, O. Avci, M.S. Kiranyaz, B. Boashash, H. Sodano, D.J. Inman, 1-D CNNs for structural damage detection: verification on a structural health monitoring benchmark data, Neurocomputing 275 (2018) 1308–1317, https://doi.org/10.1016/j.neucom.2017.09.069.
[197] Z.K. Peng, Z.Q. Lang, C. Wolters, S.A. Billings, K. Worden, Feasibility study of structural damage detection using NARMAX modelling and nonlinear output frequency response function based analysis, Mech. Syst. Signal Process. 25 (3) (2011) 1045–1061, https://doi.org/10.1016/j.ymssp.2010.09.014.
[198] Y.J. Cha, A. Mostafavi, S.S. Benipal, DNoiseNet: deep learning-based feedback active noise control in various noisy environments, Eng. Appl. Artif. Intell. 121 (2023) 105971, https://doi.org/10.1016/j.engappai.2023.105971.
[199] A. Mostafavi, Y.J. Cha, Deep learning-based active noise control on construction sites, Autom. Constr. (2023), https://doi.org/10.1016/j.autcon.2023.104885.
[200] L. Sun, Z. Shang, Y. Xia, S. Bhowmick, S. Nagarajaiah, Review of bridge structural health monitoring aided by big data and artificial intelligence: from condition assessment to damage detection, J. Struct. Eng. 5 (2020) 04020073, https://doi.org/10.1061/(ASCE)ST.1943-541X.0002535.
[201] B.K. Oh, B. Glisic, Y. Kim, H.S. Park, Convolutional neural network–based data recovery method for structural health monitoring, Struct. Health Monit. 19 (6) (2020) 1821–1838, https://doi.org/10.1177/1475921719897571.
[202] X. Lei, L. Sun, Y. Xia, Lost data reconstruction for structural health monitoring using deep convolutional generative adversarial networks, Struct. Health Monit. 20 (4) (2021) 2069–2087, https://doi.org/10.1177/1475921720959226.
[203] K. Jiang, Q. Han, X. Du, Lost data neural semantic recovery framework for structural health monitoring based on deep learning, Comput. Aided Civ. Inf. Eng. 37 (9) (2022) 1160–1187, https://doi.org/10.1111/mice.12850.
[204] Y. Gao, P. Zhai, K.M. Mosalam, Balanced semisupervised generative adversarial network for damage assessment from low-data imbalanced-class regime, Comput. Aided Civ. Inf. Eng. 36 (9) (2021) 1094–1113, https://doi.org/10.1111/mice.12741.
[205] J. Li, W. Chen, G. Fan, Structural health monitoring data anomaly detection by transformer enhanced densely connected neural networks, Smart Struct. Syst. 30 (6) (2022) 613–626, https://doi.org/10.12989/sss.2022.30.6.613.
[206] J. Liu, M. Zhang, H. Wang, W. Zhao, Y. Liu, Sensor fault detection and diagnosis method for AHU using 1-D CNN and clustering analysis, Comput. Intell. Neurosci. 2019 (2019), https://doi.org/10.1155/2019/5367217.
[213] S. Sajedi, X. Liang, Deep generative Bayesian optimization for sensor placement in structural health monitoring, Comput. Aided Civ. Inf. Eng. 37 (9) (2022) 1109–1127, https://doi.org/10.1111/mice.12799.
[214] L. Li, M. Morgantini, R. Betti, Structural damage assessment through a new generalized autoencoder with features in the quefrency domain, Mech. Syst. Signal Process. 184 (2023) 109713, https://doi.org/10.1016/j.ymssp.2022.109713.
[215] G.E. Karniadakis, I.G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Physics-informed machine learning, Nature Reviews Physics 3 (6) (2021) 422–440, https://doi.org/10.1038/s42254-021-00314-5.
[216] E. Ghorbani, O. Buyukozturk, Y.J. Cha, Hybrid output-only structural system identification using random decrement and Kalman filter, Mech. Syst. Signal Process. 144 (2020) 106977, https://doi.org/10.1016/j.ymssp.2020.106977.
[217] F.G. Yuan, S.A. Zargar, Q. Chen, S. Wang, Machine learning for structural health monitoring: challenges and opportunities, Sens. Smart Struct. Technol. Civil, Mech. Aerospace Syst. 2020 (11379) (2020) 1137903, https://doi.org/10.1021/jacs.1c09259.
[218] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. 2 (4) (1989) 303–314, https://doi.org/10.1007/BF02551274.
[219] E.J. Cross, S.J. Gibson, M.R. Jones, D.J. Pitchforth, S. Zhang, T.J. Rogers, Physics-informed machine learning for structural health monitoring, in: Structural Health Monitoring Based on Data Science Techniques, Springer, Cham, 2022, pp. 347–367, https://doi.org/10.1007/978-3-030-81716-9_17.
[220] Y.A. Yucesan, F.A. Viana, Hybrid physics-informed neural networks for main bearing fatigue prognosis with visual grease inspection, Comput. Ind. 125 (2021) 103386, https://doi.org/10.1016/j.compind.2020.103386.
[221] M.A. Vega, Z. Hu, T.B. Fillmore, M.D. Smith, M.D. Todd, A novel framework for integration of abstracted inspection data and structural health monitoring for damage prognosis of miter gates, Reliab. Eng. Syst. Saf. 211 (2021) 107561, https://doi.org/10.1016/j.ress.2021.107561.
[222] Z. Zhang, C. Sun, H. Li, B.F. Spencer, Structural damage identification via physics-guided machine learning: a methodology integrating pattern recognition with finite element model updating, Struct. Health Monit. 20 (4) (2021) 1675–1688, https://doi.org/10.1177/1475921720927488.
[223] D. Di Lorenzo, V. Champaney, J.Y. Marzin, C. Farhat, F. Chinesta, Physics informed and data-based augmented learning in structural health diagnosis, Comput. Methods Appl. Mech. Eng. 414 (2023) 116186, https://doi.org/10.1016/j.cma.2023.116186.
[224] W. Li, M.Z. Bazant, J. Zhu, A physics-guided neural network framework for elastic plates: comparison of governing equations-based and energy-based approaches, Comput. Methods Appl. Mech. Eng. 383 (2021) 113933, https://doi.org/10.1016/j.cma.2021.113933.
[225] M. Bazmara, M. Silani, M. Mianroodi, Physics-informed neural networks for nonlinear bending of 3D functionally graded beam, Structures 49 (2023) 152–162, https://doi.org/10.1016/j.istruc.2023.01.115.
[226] T. Liu, H. Meidani, Physics-informed neural networks for system identification of structural systems with a multiphysics damping model, J. Eng. Mech. 149 (10) (2023) 04023079, https://doi.org/10.1061/JENMDT.EMENG-7060.
[227] S. Li, S. Laima, H. Li, Physics-guided deep learning framework for predictive modeling of bridge vortex-induced vibrations from field monitoring, Phys. Fluids 33 (3) (2021) 037113, https://doi.org/10.1063/5.0048909.
[228] F. Sun, Y. Liu, H. Sun, Physics-Informed Spline Learning for Nonlinear Dynamics Discovery. arXiv preprint. arXiv:2105.02368, 2021, https://doi.org/10.48550/arxiv.2105.02368.
[229] S.S. Eshkevari, M. Takáč, S.N. Pakzad, M. Jahani, DynNet: physics-based neural architecture design for nonlinear structural response modeling and prediction, Eng. Struct. 229 (2021) 111582, https://doi.org/10.1016/j.engstruct.2020.111582.
[230] P. Huang, Z. Chen, Deep learning for nonlinear seismic responses prediction of subway station, Eng. Struct. 244 (2021) 112735, https://doi.org/10.1016/j.engstruct.2021.112735.
[207] J. Pan, L. Qu, K. Peng, Sensor and actuator fault diagnosis for robot joint based on [231] H. Guo, X. Zhuang, T. Rabczuk, A Deep Collocation Method for the Bending
deep CNN, Entropy 23 (6) (2021) 751, https://doi.org/10.3390/e23060751. Analysis of Kirchhoff plate. arXiv preprint. arXiv:2102.02617, 2021, https://doi.
[208] R.F.R. Junior, I.A. dos Santos Areias, M.M. Campos, C.E. Teixeira, L.E.B. da Silva, org/10.32604/cmc.2019.06660.
G.F. Gomes, Fault detection and diagnosis in electric motors using 1d [232] R. Zhang, Z. Chen, S. Chen, J. Zheng, O. Büyüköztürk, H. Sun, Deep long short-
convolutional neural networks with multi-channel vibration signals, term memory networks for nonlinear structural seismic response prediction,
Measurement 190 (2022) 110759, https://doi.org/10.1016/j. Comput. Struct. 220 (2019) 55–68, https://doi.org/10.1016/j.
measurement.2022.110759. compstruc.2019.05.006.
[209] D. Jana, J. Patil, S. Herkal, S. Nagarajaiah, L. Duenas-Osorio, CNN and [233] P. Ni, L. Sun, J. Yang, Y. Li, Multi-end physics-informed deep learning for seismic
convolutional autoencoder (CAE) based real-time sensor fault detection, response estimation, Sensors 22 (10) (2022) 3697, https://doi.org/10.3390/
localization, and correction, Mech. Syst. Signal Process. 169 (2022) 108723, s22103697.
https://doi.org/10.1016/j.ymssp.2021.108723. [234] E. Haghighat, M. Raissi, A. Moure, H. Gomez, R. Juanes, A Deep Learning
[210] M.F. Silva, A. Santos, R. Santos, E. Figueiredo, J.C. Costa, Damage-sensitive Framework for Solution and Discovery in Solid Mechanics. arXiv preprint. arXiv:2
feature extraction with stacked autoencoders for unsupervised damage detection, 003.02751, 2020, https://doi.org/10.48550/arxiv.2003.02751.
Struct. Control. Health Monit. 28 (5) (2021) e2714, https://doi.org/10.1002/ [235] C.J. Rojas, M.L. Bitterncourt, J.L. Boldrini, Parameter Identification for a Damage
stc.2714. Model using a Physics informed Neural Network. arXiv preprint. arXiv:2
[211] S. Sony, S. Gamage, A. Sadhu, J. Samarabandu, Vibration-based multiclass 107.08781.
damage detection and localization using long short-term memory networks, [236] Q. Zhu, Z. Zhao, J. Yan, Physics-informed machine learning for surrogate
Structures 35 (2022, January) 436–451. Elsevier, https://doi.org/10.1016/j.is modeling of wind pressure and optimization of pressure sensor placement,
truc.2021.10.088. Comput. Mech. 71 (3) (2023) 481–491, https://doi.org/10.1007/s00466-022-
[212] M.H. Soleimani-Babakamali, R. Sepasdar, K. Nasrollahzadeh, I. Lourentzou, 02251-1.
R. Sarlo, Toward a general unsupervised novelty detection framework in [237] M. Pereira, B. Glisic, Physics-informed data-driven prediction of 2D normal strain
structural health monitoring, Comput. Aided Civ. Inf. Eng. 37 (9) (2022) field in concrete structures, Sensors 22 (19) (2022) 7190, https://doi.org/
1128–1145, https://doi.org/10.1111/mice.12812. 10.3390/s22197190.


[238] R. Ali, J. Zeng, M. Kavgic, Y.J. Cha, Heat loss detection using thermal imaging by a small UAV prototype, in: Smart Structures and NDE for Industry 4.0, Smart Cities, and Energy Systems 11382, SPIE, 2020, April, pp. 82–90, https://doi.org/10.1117/12.2557902.
[239] Eric Bianchi, Matthew Hebdon, Concrete Crack Conglomerate Dataset, University Libraries, Virginia Tech, 2021, https://doi.org/10.7294/16625056.v1. Dataset.
[240] Çağlar Fırat Özgenel, Concrete crack Segmentation Dataset, Mendeley Data, V1, 2019, https://doi.org/10.17632/jwsn7tfbrp.1. Retrieved on Nov 27, 2023.
[241] Pothole600 Dataset. Available at: https://sites.google.com/view/pothole-600/dataset. Retrieved on November 25, 2023.
[242] M. Rautela, S. Gopalakrishnan, Ultrasonic guided wave based structural damage detection and localization using model assisted convolutional and recurrent neural networks, Expert Syst. Appl. 167 (2021) 114189, https://doi.org/10.1016/j.eswa.2020.114189.
[243] C.Y. Wang, A. Bochkovskiy, H.Y.M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464–7475, https://doi.org/10.1109/CVPR52729.2023.00721.
