
Deep Learning for Anomaly Detection in Time-Series Data: An Analysis of Techniques, Review of Applications, and Guidelines for Future Research
Usman Ahmad Usmani^a, Izzatdin Abdul Aziz^a, Jafreezal Jaafar^a, Junzo Watada^b
^a Universiti Teknologi Petronas, Persiaran UTP, 32610 Seri Iskandar, Perak, Malaysia
^b Waseda University, 1-chōme-104 Totsukamachi, Shinjuku City, Tokyo 169-8050, Japan

Abstract
Industries are generating massive amounts of data due to increased automation and intercon-
nectedness. As data from various sources becomes more available, the extraction of relevant
information becomes crucial for understanding complex systems’ behavior and performance.
The growing volume and complexity of time-series data in diverse industries have created a
demand for effective anomaly detection methods. Detecting anomalies in multivariate time-
series data presents unique challenges, such as the presence of multiple correlated variables
and intricate relationships among them. Traditional approaches often fall short in detecting
anomalies, making deep learning methods a promising solution. This review article pro-
vides a comprehensive analysis of different deep learning techniques for anomaly detection
in time-series data, examining their applicability across various industries and discussing the
associated challenges. The article emphasizes the significance of deep learning in detecting
anomalies and offers valuable insights to inform decision-making processes. Furthermore, it
proposes recommendations for model developers, advocating for the development of hybrid
models that combine different deep learning techniques and the exploration of attention
mechanisms in Recurrent Neural Networks (RNNs). These recommendations aim to over-
come the challenges associated with deep learning-based anomaly detection in multivariate
time-series data.
Keywords: Deep Learning, Time Series Analysis, Anomaly Detection, Multivariate
Time-Series, Automation, Univariate Time Series

1. Introduction
Industries are generating an unprecedented amount of data from various sources, ranging
from sensors, machines, and production lines to automated systems. Data owners collect

Email addresses: [email protected] (Usman Ahmad Usmani), [email protected] (Izzatdin Abdul Aziz), [email protected] (Jafreezal Jaafar), [email protected] (Junzo Watada)

May 22, 2024


and store this data in databases for further analysis. With the advent of Industry 4.0,
sensors are being deployed extensively to collect real-time data on various parameters like
temperature, pressure, humidity, and vibrations, among others. This data can be analyzed
using advanced analytics techniques such as machine learning (ML) and deep learning (DL)
to identify patterns and anomalies. [1][2][3] The insights gained from analyzing this data can
help industries make informed decisions, optimize production processes, reduce downtime,
and improve overall efficiency. Predictive maintenance, which uses the data collected from
sensors to schedule maintenance before a breakdown occurs, has emerged as a game-changer
for many industries, reducing costs and increasing uptime. [4][5] The data collected from
sensors can also be used to monitor energy consumption and optimize resource usage, making
industries more sustainable.
Time-series data has been studied for a long time in academia due to its widespread
applications in various fields such as finance, economics, engineering, and social sciences.
Time series analysis is a statistical technique used to analyze time-dependent data, where
observations are recorded at regular intervals over time. [6][7][8] The primary objective of
time series analysis is to identify patterns, trends, and anomalies in the data, which can
then be used to make predictions about future values. Well-known examples of time series
analysis include stock market analysis, weather forecasting, and economic forecasting. In
stock market analysis, time series analysis is used to predict future stock prices based on
past performance. In weather forecasting, time series analysis is used to predict weather
patterns based on historical data. In economic forecasting, time series analysis is used to
predict future trends in economic indicators such as GDP, inflation, and unemployment rates.
Other examples of time series analysis include predicting sales trends in retail, forecasting
energy demand, and predicting website traffic. [9][10] Time series analysis has also been used
in medical research to analyze data from clinical trials and in signal processing to analyze
time-dependent signals.
Anomaly detection refers to the process of identifying patterns in data that deviate from
the expected or normal behavior. Anomalies are often referred to as outliers, novelties, or
abnormalities in the data. Classical methods for anomaly detection include statistical meth-
ods, such as z-score, and distance-based methods, such as k-nearest neighbors. Detecting
anomalies in time-series data is challenging because time series data is sequential and ex-
hibits temporal dependencies. [11][12] Anomalies in time-series data can manifest in various
ways, such as sudden changes in the mean, periodic changes in the variance, and temporal
correlations between data points. These patterns can be subtle and challenging to detect
using classical methods.
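For reference, the classical z-score baseline mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a 1-D NumPy series; the 3-sigma threshold and the synthetic data are illustrative choices, not values from the surveyed literature.

```python
import numpy as np

def zscore_anomalies(series, threshold=3.0):
    """Flag points whose absolute z-score exceeds a threshold (classical baseline)."""
    z = np.abs((series - series.mean()) / series.std())
    return np.where(z > threshold)[0]

# Example: a well-behaved series with one injected spike at index 500.
data = np.concatenate([np.random.normal(0.0, 1.0, 500), [12.0]])
print(zscore_anomalies(data))  # should report index 500
```

As the surrounding text notes, such point-wise statistics ignore temporal dependencies, which is precisely where they break down on time-series data.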
To address these challenges, various approaches have been proposed in the literature. One direction for anomaly detection in time-series data is to develop customized machine learning (ML) and deep learning (DL) models. Instead of relying solely on pre-existing algorithms, practitioners can apply feature engineering, hyperparameter optimization, and model customization to enhance the performance of an anomaly detection system. Tailoring these models to the specific problem and dataset gives more control over the detection process. Experimenting with clustering algorithms and developing custom change-point detection algorithms can further improve the accuracy of anomaly identification. Finally, ensemble-based methods, which combine multiple anomaly detection algorithms, have also been proposed to improve detection accuracy.
DL methods are capable of learning complex representations of data by using multiple
layers of artificial neural networks. These neural networks are trained on large datasets to
learn patterns and relationships within the data, allowing them to detect anomalies that may
be difficult to detect using traditional ML techniques. One of the most popular DL methods
used for anomaly detection is the autoencoder. [13][14] An autoencoder is an unsupervised
learning algorithm that is trained to reconstruct input data from a compressed represen-
tation. During training, the autoencoder learns to encode the input data into a lower-
dimensional space and decode it back into the original high-dimensional space. Anomalies
are detected when the reconstruction error is high, indicating that the autoencoder is not
able to accurately reconstruct the input data.
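As a rough sketch of this reconstruction-error scheme, the following example trains a small dense autoencoder on windows of normal data and flags test windows whose reconstruction error exceeds a percentile threshold. The window size, layer widths, training data, and 99th-percentile cutoff are all illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

window = 32
# Stand-in for windows of normal time-series data, shape (n_samples, window).
x_train = np.random.normal(0.0, 1.0, (1000, window)).astype("float32")

# Encode into a low-dimensional bottleneck, then decode back to the input space.
autoencoder = keras.Sequential([
    keras.layers.Input(shape=(window,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(4, activation="relu"),   # compressed representation
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(window),                 # reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=10, batch_size=64, verbose=0)

# Score new windows by reconstruction error; high error suggests an anomaly.
x_test = np.random.normal(0.0, 1.0, (100, window)).astype("float32")
errors = np.mean((x_test - autoencoder.predict(x_test)) ** 2, axis=1)
anomalies = np.where(errors > np.percentile(errors, 99))[0]
```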
Another DL method used for anomaly detection is the recurrent neural network (RNN),
which is a type of neural network that is designed to process sequential data. RNNs are
trained on sequences of data and can learn temporal dependencies between data points. This
makes them useful for detecting anomalies in time-series data, such as financial transactions
or sensor readings. In addition to autoencoders and RNNs, other DL methods that have
been used for anomaly detection include CNNs, generative adversarial networks (GANs),
and variational autoencoders (VAEs). [15][16][17] Overall, DL methods have shown great
promise in detecting anomalies in a wide range of applications, including fraud detection,
intrusion detection, and medical diagnosis, among others. However, it’s important to note
that these methods require large amounts of labeled training data and can be computa-
tionally expensive to train and deploy. Additionally, care must be taken to ensure that the
model is not overfitting to the training data and is generalizable to new data.
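To make the RNN variant concrete, here is a minimal sketch of a prediction-error detector: an LSTM is trained to forecast the next value, and points with unusually large forecast errors are flagged. The toy sine signal, lookback length, and 3-sigma rule are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

lookback = 20
series = np.sin(np.linspace(0, 100, 2000)).astype("float32")  # toy signal

# Build (lookback window -> next value) training pairs from the sequence.
X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
y = series[lookback:]
X = X[..., None]  # shape (samples, timesteps, features)

model = keras.Sequential([
    keras.layers.Input(shape=(lookback, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)

# Points whose one-step forecast error is unusually large are candidate anomalies.
errors = np.abs(model.predict(X, verbose=0).ravel() - y)
anomalies = np.where(errors > errors.mean() + 3 * errors.std())[0]
```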
The objective of this study is to conduct a comprehensive review of the current state-
of-the-art deep learning (DL)-based anomaly detection methods for time-series data. The
focus is to explore how these methods are specifically designed to capture the intricate
interrelationships among variables in multivariate time series data. This review aims to delve
deeper into the techniques used to learn the temporal context of the data and effectively
identify anomalies. By examining and analyzing these DL-based approaches, this paper
aims to contribute to a better understanding of the advancements in anomaly detection
methodologies for time-series data. Furthermore, we will provide a detailed analysis of the
strengths and weaknesses of these methods, and their suitability for different applications.
Through this study, we aim to provide practitioners with a set of guidelines for selecting and
applying the most appropriate DL-based anomaly detection methods for their specific use
cases. By offering insights into the latest advancements in this field, we hope to contribute
to the further development and improvement of DL-based anomaly detection techniques in
time-series data.
The rest of the article is organized as follows: In Section II, we discuss the basics of
anomaly detection and time series. In Section III, several industrial application scenarios are discussed, along with the limitations that have rendered key conventional approaches obsolete in modern applications. Section IV presents a comprehensive review of studies focusing on DL for anomaly detection across various types of data. Firstly, we discuss the application of DL algorithms
for univariate and multivariate data, specifically modeling them for anomaly detection in
Section V. Then, we delve into the comparative reviews of these algorithms. Furthermore,
we explore how these studies have utilized time-series data in unsupervised feature learning
algorithms or have contributed to advancements in feature learning algorithms to address
related challenges. We analyze how current anomaly detection algorithms define inter-
correlations between data, explain the modeling of time-series data, and identify anomalous
criteria. In Section VI, we present a set of general guidelines to assist model developers in
selecting appropriate models for specific problems and conditions. Then, in Section VII, we
draw our final conclusions based on the findings and discussions presented throughout the
paper.

2. Background
2.1. Anomalies in Time Series Data
Anomalous data refers to observations that do not follow the expected patterns in a
dataset. Anomalies can have various definitions and interpretations, depending on the con-
text and nature of the data. Hawkins [18] introduced a widely accepted definition of an
outlier as an observation that exhibits a significant deviation from other observations, im-
plying that it was generated by a distinct mechanism. While this definition is commonly
used in the literature, alternative definitions also exist. [19][20] In the realm of time series
data, anomalies can be categorized into various types depending on the nature of the de-
viation they represent. These classifications help provide a more nuanced understanding of
different anomaly patterns and guide the development of specialized detection techniques
for each type.
Point Anomaly: One type is point anomaly, which refers to a single data point that is
significantly different from the rest of the data points. For example, in a temperature time
series data, a point anomaly could be a sudden spike or dip in temperature that does not
follow the usual seasonal trend.
Contextual Anomaly: Another type of anomaly is contextual anomaly, where a data
point is not anomalous by itself but becomes anomalous when compared to other data points
in its context. For example, in a credit card transaction dataset, a purchase of $500 might
not be anomalous by itself, but if the usual spending pattern of the cardholder is around
$50, then this purchase becomes a contextual anomaly.
Collective Anomaly: Collective anomalies are groups of data points that deviate from
the expected behavior of a time series. They can be detected using statistical methods, ML
algorithms, or rule-based systems. Identifying anomalies is crucial in domains like fraud
detection and predictive maintenance, and the choice of detection method depends on the
data, dataset size, and application. Other types of anomalies can also occur in datasets
apart from those mentioned. Here are some of the other types of anomalies:
Missing Anomaly: These occur when there are missing values in the dataset. Missing
values can be caused by various factors, such as equipment failure, human error, or data
corruption.
Minor Anomaly: These refer to data points that deviate slightly from the expected
pattern but are not significant enough to be considered as outliers. Minor anomalies are not
noticeable on their own but can accumulate over time and affect the overall data quality.
Outliers: Outliers are data points that deviate significantly from the expected pattern
and are often caused by measurement errors, data entry errors, or rare events. [21] Outliers
can be either point anomalies or collective anomalies, depending on whether they occur as
a single data point or a group of data points.
Square Anomaly: These are anomalies that occur in image or video data and refer to
a group of pixels that have different properties than their surrounding pixels. [22] Square
anomalies can be caused by image compression, camera malfunction, or image manipulation.
Trends: A trend is an anomaly that occurs when there is a long-term change in the
pattern of the data. [23] Trends can be either increasing or decreasing and can be caused
by various factors, such as changes in the environment, economic conditions, or population
dynamics.
Normal Anomaly: A normal-anomaly pattern maintains stable amplitude and frequency over time steps, describing behavior that is generally expected or typical (normal) but occasionally exhibits significant deviations (anomalies) from that expected behavior.
Figure 1 provides an overview of anomaly classification in time series data, illustrating
diverse types of anomalies with examples. It aids in comprehending anomaly characteristics
and manifestations, assisting in anomaly detection and handling in data analysis and ML.

2.2. Properties of Time Series Data


Time-series data is a type of data that is collected over time and has a natural temporal
ordering. This type of data is common in various domains, such as finance, economics,
medicine, and engineering. Time-series data has some unique properties that distinguish it
from other types of data. Here are some of the properties of time-series data:
Time-dependent: Time-series data is dependent on time, meaning that each data
point is associated with a specific time stamp. The time interval between data points can
be regular or irregular, depending on the nature of the data. [24] The time-dependency of
time-series data makes it ideal for analyzing trends, patterns, and seasonal variations.
Auto-Correlation: Time-series data often exhibits auto-correlation, which means that
the value of a data point is related to the values of the previous data points. [25] Auto-
correlation can be positive or negative, depending on whether the previous values are pos-
itively or negatively related to the current value. Auto-correlation can be used to identify
patterns and trends in the data.
Seasonality: Seasonality refers to the recurring patterns that occur at fixed intervals in
time-series data. For example, sales of winter clothes may increase during the winter season
and decrease during the summer season. [26] Seasonality can be identified using various
methods, such as visual inspection, statistical tests, and ML algorithms.
Dimensionality: Dimensionality refers to the number of variables or dimensions in the
time-series data. Time-series data can be univariate, meaning that it contains only one
variable over time, or multivariate, meaning that it contains multiple variables over time.



Figure 1: Anomaly Classification in Time Series Data: Types and Examples

[27] Multivariate time-series data can be more complex to analyze than univariate time-series
data, as there can be interactions between the different variables over time.
Non-Stationarity: Non-Stationarity in time series data occurs when statistical proper-
ties change over time due to factors like trends or seasonality. Analyzing and modeling such
data can be challenging as many models assume stationarity. [28] Methods like differencing
and detrending can transform nonstationary data, while multivariate data may require tech-
niques like vector autoregression, depending on the problem and available resources. Table 1 presents an overview of the challenges involved in DL-based anomaly detection for time-series data, along with process descriptions and insights. This overview aims to shed light on the complexities and difficulties that arise when employing DL techniques for the detection of anomalies in time-series data. Understanding these challenges helps practitioners navigate the intricacies of applying DL models to identify anomalies in time-series datasets.
[Table 1 about here.]
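As a concrete illustration of the differencing transformations mentioned above, the short pandas sketch below removes a linear trend with a first difference and a seasonal pattern with a lagged difference; the lag of 12 assumes monthly data with yearly seasonality.

```python
import numpy as np
import pandas as pd

# Toy nonstationary series: a linear trend plus noise.
s = pd.Series(np.arange(100, dtype=float) + np.random.normal(0.0, 1.0, 100))

diffed = s.diff().dropna()      # first difference removes a linear trend
seasonal = s.diff(12).dropna()  # lag-12 difference removes yearly seasonality in monthly data
```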
Temporality: Temporality plays a vital role in time-series analysis, as it captures the sequential nature of data points and allows for the identification of dependencies and correlations over time. A time series is a chronological collection of observations, indexed by date and essential for understanding trends and patterns. [29] Data collection begins at one point (say, t = 1) and ends at another (say, t = T), yielding the observations $(y_1, y_2, \ldots, y_T)$, gathered at equidistant time intervals as shown in Equation 1:

$$T = (t_{d_0}, t_{d_1}, \ldots, t_{d_t}), \quad d \in \mathbb{N}^+, \ t \in \mathbb{N} \tag{1}$$
Trend: A time series with a non-constant mean that either increases or decreases over time is considered to have a trend, which can be linear or non-linear. Trends show significant changes in the value of a series over time and typically last a few weeks before fading, without repeating. Similarly, newly released music tends to be popular for a brief period before dissipating, but there is a high likelihood of it becoming popular again soon [31]. A deterministic trend can be created as shown in Equation 2:

$$y_t = \beta_0 + \beta_1 t + \eta_t \tag{2}$$

where $\eta_t$ is an ARIMA process. Differencing both sides yields Equation 3:

$$\nabla y_t = \beta_1 + \eta'_t, \quad \text{i.e.,} \quad y_t = y_{t-1} + \beta_1 + \eta'_t \tag{3}$$

where $\eta'_t$ is an ARIMA process.
Stationarity: Stationarity refers to a crucial property of stochastic processes where the distribution of a finite sub-sequence of random variables remains unchanged when shifted along the time index axis. It is formally defined as strict-sense stationarity and is fundamental to understanding stochastic processes [32].
Seasonality: Seasonality refers to regular, patterned changes in time series data influenced by factors like time of year or day of the week. [33] It repeats within a year and has a predetermined length. In economic sectors, seasonality captures year-to-year patterns and fluctuations, aiding in understanding and predicting industry behavior. The seasonal component equation is as follows in Equation 4:

$$s_t = \gamma^{*}(1-\alpha)(y_t - \ell_{t-1} - b_{t-1}) + [1 - \gamma^{*}(1-\alpha)] s_{t-m} \tag{4}$$

which is similar to the smoothing equation for the seasonal component shown earlier, with $\gamma = \gamma^{*}(1-\alpha)$, so that $0 \le \gamma^{*} \le 1$ translates to $0 \le \gamma \le 1-\alpha$.
Cyclic: Time series can exhibit cyclic patterns when they display increases and declines
without a predetermined time. These oscillations, lasting two years or more, are different
from seasonal behavior which has a continuous period related to a calendar component. [34]
Cycles have longer average durations and more variable amplitudes compared to seasonal
patterns.
Concept Drift: Concept drift refers to changing behavior in the target variable or
feature-target relationships over time, impacting model accuracy. [35] Adapting to concept
drift involves monitoring, adjusting, and periodically retraining models to mitigate its effects,
especially in complex phenomena influenced by human activities like sales prediction and
socio-economic processes.
Level: The mean of a time series corresponds to its level. In the case of a patterned time series, the level is typically described as dynamic, reflecting its changing nature. Conversely, a stationary time series maintains constant characteristics over time, allowing for formal representation and analysis. Stationary time series exhibit consistent properties throughout, including a fixed mean, variance, and autocorrelation structure. Recognizing the distinction between changing levels in patterned time series and the stationary nature of others facilitates a more systematic understanding of the data. [36] If the joint distribution of $(x_t, \ldots, x_{t+s})$ is the same for all $t$, then $X_t$ is a stationary time series. A stationary time series $x_1, \ldots, x_T$ therefore has the following features: there is no trend, since the mean is constant; the variance remains constant, indicating a consistent spread of data points; and the autocorrelation remains unchanged throughout, indicating a persistent relationship between data points and their lagged values.
White Noise: Noise in time-series data refers to random variations or errors, often caused by measurement or sampling errors. Removing or reducing noise is crucial for accurate analysis. White noise is a type of noise with a constant mean and finite variance, where the PACF and ACF are both zero, indicating no dependency between timestamps. Many theoretical models assume the white noise to be Gaussian: $\varepsilon \sim N(0, \sigma^2)$. A univariate time series consists of scalar observations collected regularly, such as monthly CO2 concentrations, while a multivariate time series includes multiple variables per timestamp, expressed as $X_t = (x_{1,t}, x_{2,t}, \ldots, x_{d,t})$ with $X_t \in \mathbb{R}^d$. In a $p$-th order VAR model, $X_t$ is thus a $(d \times 1)$ vector, as shown in Equation 5:

$$X_t = c + \Theta_1 X_{t-1} + \Theta_2 X_{t-2} + \ldots + \Theta_p X_{t-p} + \varepsilon_t \tag{5}$$

where each $\Theta_i$ is a $d \times d$ coefficient matrix. As in the AR model for univariate time series, $\varepsilon_t$ is white noise.

3. Applications
Industries leverage digital technologies like AI, IoT, and big data analytics to enhance
competitiveness. Integration of digital components with real-world phenomena using sen-
sors and data analytics enables proactive strategies for predictive maintenance, reducing
downtime and improving efficiency. Anomaly detection powered by ML detects abnormal-
ities, minimizing downtime, enhancing efficiency, and enabling informed decisions across
sectors like finance, manufacturing, healthcare, and retail. Through these discussions, we
aim to highlight the versatility and value of anomaly detection in addressing unique industry
requirements and improving operational outcomes.

3.1. Cloud Computing


3.1.1. Network Anomaly Detection
ML algorithms have been used extensively for network anomaly detection, which is the
process of identifying unusual behavior or traffic on a computer network that may indicate a
security threat, a malfunctioning system, or a misconfiguration. Network anomaly detection
is a challenging task due to the volume and complexity of network traffic data, the diversity of
network protocols and services, and the evolving nature of cyber threats. [38][39] Traditional
rule-based and signature-based methods for detecting network anomalies are often limited
in their ability to identify unknown or zero-day attacks and may generate a large number
of false alarms. ML algorithms, on the other hand, can learn patterns and behaviors from
large amounts of network traffic data and detect anomalies that deviate significantly from
normal traffic patterns. There are several ML algorithms that have been used for network
anomaly detection, including:
Unsupervised learning algorithms can identify anomalous behavior based on statistical
models of normal network behavior. Supervised learning algorithms can classify network
traffic as normal or anomalous based on a predefined set of features. [40][41] ML algorithms
have shown promise in detecting network anomalies such as denial of service attacks, port
scanning, and malware infections. However, limitations exist, including the need for labeled
data, potential false positives, and difficulty in detecting sophisticated attacks that mimic
normal network behavior. To address these challenges, researchers are exploring unsupervised learning approaches that identify anomalies based on deviations from normal behavior. Incorporating additional security measures like intrusion detection systems and firewalls can further enhance network security. Continued research and development in ML algorithms will improve anomaly detection capabilities, reducing these limitations and strengthening network security. [42][43]

3.2. Unmanned aerial vehicles


Unmanned aerial vehicles (UAVs), or drones, use ML algorithms to detect anomalies for
surveillance and security purposes. [44] Sensors and cameras on UAVs capture data about
the environment, which is preprocessed and features are extracted to identify normal and
abnormal behavior. ML algorithms are trained on labeled datasets and then applied in real-
time to new UAV data for anomaly detection using techniques like deep neural networks,
decision trees, and support vector machines. [45] The algorithm processes the incoming data
and identifies any behavior that deviates from the learned patterns of normal behavior. If
an anomaly is detected, an alert is triggered to alert security personnel. Finally, the system
may incorporate feedback from security personnel to improve the accuracy of the algorithm
over time. For example, if a false alarm is triggered, the algorithm can be adjusted to reduce
the likelihood of similar false alarms in the future. This can help improve the efficiency and
effectiveness of security systems by reducing the need for human monitoring and quickly
identifying potential threats.

3.3. Smart Manufacturing


ML algorithms play a vital role in smart manufacturing for anomaly detection, predictive
maintenance, quality control, process optimization, and cybersecurity. These algorithms
analyze real-time data from sensors and equipment to identify deviations, predict equipment
failures, detect defects, optimize processes, and ensure cybersecurity. [47] Their ability to
analyze data in real-time enables timely actions to maintain product quality, safety, and
efficiency in smart manufacturing environments.

3.3.1. Mechanical Equipment


ML algorithms play a crucial role in detecting anomalies in mechanical equipment, pre-
dicting failures, and minimizing downtime by analyzing real-time sensor data. They enable
predictive maintenance by identifying patterns that indicate potential machine failures, al-
lowing operators to take preventive action. ML algorithms also facilitate condition mon-
itoring, fault detection, and diagnosis by analyzing sensor data to identify anomalies and
diagnose issues. [48] Additionally, they contribute to quality control efforts by detecting
anomalies in product quality, recommending corrective action, and enhancing equipment re-
liability. Real-time analysis provided by ML algorithms helps optimize production processes,
reduce downtime, and improve overall efficiency in mechanical equipment operations.

3.3.2. Logistics Automation System


ML algorithms play a crucial role in anomaly detection for logistics automation systems,
helping identify unusual patterns or events that can disrupt the supply chain. Algorithms
like isolation forest, SVM, clustering, and neural networks are trained on historical data
to detect anomalies in real-time. By proactively identifying and resolving potential issues,
ML algorithms enhance operational efficiency, cost reduction, and customer satisfaction in
logistics automation systems.

3.3.3. Infrastructure Facilities


ML algorithms are increasingly employed for anomaly detection in infrastructure facilities
like power plants and transportation networks. These algorithms are trained on historical
data from sensors and cameras to learn normal behavior, enabling them to identify deviations
as anomalies. Random forest and k-nearest neighbor algorithms are commonly used for this
purpose. By proactively detecting and resolving issues, ML algorithms prevent failures,
reduce downtime, and enhance safety in infrastructure facilities.

3.4. Smart Energy Management


ML algorithms play a crucial role in anomaly detection for smart energy management
systems, optimizing energy consumption and efficiency. Trained on historical data from sen-
sors, ML algorithms identify and resolve deviations from expected behavior, reducing waste,
preventing equipment failures, and enhancing system performance. [49][50] ML algorithms
used for anomaly detection in smart energy management systems depend on the data type
and application. Data sources can include sensors like temperature and occupancy sensors
in building energy management systems or smart meters and weather forecasts in smart grid
systems. ML algorithms optimize energy consumption, reduce waste, prevent failures, and
enhance system performance. Anomaly detection plays a critical role in safeguarding the
power grid infrastructure from cyber threats, monitoring and identifying abnormal activities
throughout the interconnected system.

3.4.1. Manufactured Gas


ML algorithms are crucial for anomaly detection in aging manufactured gas infrastructure
to ensure safety and prevent accidents. Historical data from sensors like gas flow meters,
pressure sensors, and temperature sensors train these algorithms to learn the system’s normal
behavior. ML algorithms create boundaries around normal data and identify deviations as
anomalies. Hidden Markov Model (HMM) and Principal Component Analysis (PCA) are
popular algorithms used for this purpose. ML algorithms enhance reliability, safety, and
accident prevention in manufactured gas systems by detecting anomalies in real-time.

3.4.2. Electric Power


ML algorithms are widely used in electric power systems for anomaly detection, aim-
ing to prevent power outages and equipment failures. The availability of extensive data
from smart meters and sensors has fueled the popularity of ML algorithms for this purpose.
These algorithms are trained on historical data from voltage and current sensors to learn
the normal behavior of the system. They can then detect deviations from this behavior in
real-time as anomalies. SVM, RNN, and PCA are examples of ML algorithms commonly
employed in anomaly detection for power systems. SVM creates a boundary around normal
data, identifying data points outside this boundary as anomalies. RNN captures temporal
dynamics to detect anomalies over time, while PCA identifies patterns and deviations from
these patterns. The data sources used for anomaly detection depend on the specific appli-
cation, such as voltage and current sensors in power grids or wind speed and turbine output
sensors in wind farms. Overall, ML algorithms improve system reliability, prevent power
outages, and reduce maintenance costs.

3.4.3. Treated Water


ML algorithms are extensively used for anomaly detection in treated water systems to
prevent water quality issues and equipment failures. The availability of data from water
quality sensors has fueled the popularity of ML algorithms in this field. By training on
historical data from sensors like turbidity, pH, and chlorine, ML algorithms can learn the
normal behavior of the treated water system. They can then detect any deviations from this
behavior in real-time as anomalies. SVM is a popular ML algorithm for this purpose. Over-
all, ML algorithms ensure the quality and reliability of treated water systems by detecting
anomalies and facilitating timely corrective actions for delivering safe water to consumers. [51]
Anomaly detection in treated water systems relies on various data sources depending on
the application. Drinking water treatment plants use sensors like turbidity, pH, and chlorine
sensors, while wastewater treatment plants utilize sensors such as dissolved oxygen, BOD,
and COD sensors. ML algorithms for anomaly detection in treated water systems aid in
preventing water quality issues, enhancing safety, and optimizing system efficiency.

3.5. Structural Health Monitoring


Structural Health Monitoring (SHM) involves monitoring structures like buildings and
bridges to detect damage or degradation. ML algorithms, such as GMM, HMM, and PCA,
are used for anomaly detection in SHM by learning normal behavior from sensor data. [52]
These algorithms identify deviations from expected behavior as anomalies in real-time. Sen-
sors like strain gauges, accelerometers, and temperature sensors are used to collect data.
ML-based anomaly detection in SHM enhances safety, reduces maintenance costs, and pre-
vents structural failure.



3.6. Challenges of Classical Approaches
Time series analysis plays a crucial role in understanding and forecasting temporal data.
Classical approaches have limitations in capturing complex patterns and detecting anomalies
due to assumptions of linearity, stationarity, and specific underlying models. Advanced uni-
variate and multivariate techniques leverage statistical methods to capture intricacies and
interdependencies within time series data. ML approaches offer powerful tools for anomaly
detection, prediction, and uncovering hidden patterns. Challenges with classical approaches
include limited capacity for capturing non-linear relationships, assumptions about data dis-
tribution and structure, and difficulties in handling missing data and outliers. Integrating
modern approaches, like DL, can address these challenges and improve time series analysis.
Time/frequency Domain Analysis: Time/frequency domain analysis is widely used
in signal processing, communications, acoustics, and image processing. The choice between
time and frequency domain depends on the signal’s characteristics and analysis goals. Time
domain analysis focuses on short-term variations, while frequency domain analysis reveals
long-term patterns. Time domain analysis analyzes signal properties based on the sequence
of values ordered by time, while frequency domain analysis decomposes the signal into
frequency components using techniques like Fourier transforms, enabling the identification
of periodic patterns. The challenges of time/frequency domain analysis include:
Non-Stationarity: Time series data is often non-stationary, which means that its sta-
tistical properties, such as mean and variance, may change over time. This can make it
difficult to apply classical time/frequency domain analysis methods, which assume that the
data is stationary.
Missing Data: Time series data may contain missing values or gaps, which can affect
the accuracy of the analysis. Classical time/frequency domain analysis methods often require
complete and continuous data, and dealing with missing data is a challenge.
Nonlinear Relationships: Time series data may contain complex nonlinear relation-
ships that are difficult to model using classical time/frequency domain analysis methods.
Nonlinear relationships may require more sophisticated techniques, such as ML methods, to
accurately capture.
Complexity: Time series data may be complex and high-dimensional, which can make
it difficult to analyze using classical time/frequency domain analysis methods. Analyzing
complex data often requires advanced techniques, such as multivariate time series analysis,
dimensionality reduction, and clustering.
Interpretability: Classical time/frequency domain analysis methods can provide in-
terpretable results, which can be valuable in many applications. However, these methods
may not be able to capture all the nuances in the data and may not provide a complete
understanding of the underlying processes.
Autocorrelation: Time series data often exhibits autocorrelation, which means that
the value of the series at one time point is related to the values at previous time points.
This violates the assumption of independence that many statistical models rely on, and it
requires specialized modeling techniques to account for autocorrelation.
Seasonality: Time series data may exhibit seasonal patterns, which can make it difficult
to capture the underlying trends and relationships in the data. Seasonal patterns can be
accounted for using seasonal-trend decomposition (STL) models or other specialized models.
Outliers: Time series data may contain outliers or extreme values, which can affect the
accuracy of the model. Identifying and handling outliers is an important challenge in time
series modeling.
Model selection: There are many statistical models available for time series data, each
with its own strengths and weaknesses. Choosing the appropriate model for a given dataset
and problem is a challenge, and it requires a good understanding of the assumptions and
limitations of each model.
Forecasting: Time series modeling aims to forecast future values, but this is challenging
due to complex patterns and limited data. Combining statistical models, ML, and DL
can overcome these challenges, capturing intricate patterns and improving accuracy and
interpretability.
Dimensionality: Time series data is often high-dimensional, with many variables mea-
sured at each time point. This can make it difficult to compute distances or similarities
between pairs of time series, and it may require dimensionality reduction techniques to be
applied first.
Time warping: Time series data may exhibit time warping, which means that the
patterns in the data may be shifted or scaled in time. This can make it difficult to compute
distances or similarities between pairs of time series, as direct comparisons may not be
possible without first aligning the time series. [53]
Noise: Time series data may contain noise or measurement error, which can affect the
accuracy of distance-based models. It is important to identify and remove noise or outliers
from the data before applying distance-based models.
Interpretability: Distance-based models may provide accurate predictions or identify
patterns in time series data, but the results may not always be interpretable. It may be
difficult to understand why certain time series are similar or different, or to explain the
underlying factors driving the patterns in the data.
Predictive Model: A predictive model is a type of statistical model used for time series
data analysis that aims to predict future values of the series. Predictive models use historical
data to identify patterns and relationships in the data, and then use this information to make
predictions about future values.
Seasonality: Time series data may exhibit seasonal patterns, which can make it difficult
to capture the underlying trends and relationships in the data. Seasonal patterns can be
accounted for using seasonal decomposition of time series (STL) models or other specialized
models.
Overfitting: Predictive models may overfit the data, which means that they capture the
noise or random variation in the data rather than the underlying patterns and relationships.
Overfitting can lead to inaccurate predictions and a lack of generalizability to new data.
Model selection: There are many predictive models available for time series data, each
with its own strengths and weaknesses. Choosing the appropriate model for a given dataset
and problem is a challenge, and it requires a good understanding of the assumptions and
limitations of each model.
Forecasting horizon: The forecasting horizon refers to the length of time over which
the predictive model is expected to make accurate predictions. The longer the forecasting
horizon, the more difficult it is to make accurate predictions, as the underlying patterns and
relationships in the data may change over time.
Scalability: Traditional approaches may not be scalable for large-scale time-series
datasets. As the size of the data increases, the computational and storage requirements
of these approaches may become prohibitive.
Complex Patterns: Anomalies in time-series data may exhibit complex patterns and
relationships that are difficult to capture using traditional approaches. These patterns may
be nonlinear, dynamic, or spatially correlated, which can make them challenging to model.
Labeling: Anomaly detection in time-series data often requires labeled data for training
and evaluation. However, labeling time-series data can be time-consuming and expensive,
especially for large-scale datasets.
In the preceding sections, we have discussed the limitations of classical time series anal-
ysis approaches and the need for more advanced methods to capture complex patterns and
detect anomalies. In the upcoming sections, we will delve into the exploration of univariate
and multivariate approaches in time series analysis. These approaches leverage statistical
techniques to analyze individual variables and the interdependencies among multiple vari-
ables in time series data. By examining the strengths and considerations of both univariate
and multivariate approaches, we aim to provide a comprehensive understanding of their ap-
plications and effectiveness in capturing the dynamics and anomalies present in time series
datasets.

3.6.1. Univariate Approaches


Univariate approaches focus on analyzing individual variables in isolation, aiming to
derive insights and patterns specific to each variable without considering their relationships
or interactions with other factors. The following are the univariate approaches:

AutoRegressive Model. An autoregressive (AR) model is a statistical time series model that
predicts future values based on previous observations in the same series. In an AR model,
the current value of the series is assumed to be a linear combination of its past values, with
the coefficients of the past values known as autoregressive parameters. The order of the
model, denoted as AR(p), indicates the number of past values used in the prediction, where
’p’ represents the order. The model assumes that the residual errors are uncorrelated and
have constant variance. The parameters of the AR model are estimated using techniques
such as maximum likelihood estimation or least squares. In this linear model, the present value $X_t$ (dependent variable) of the stochastic process is determined by a finite set of past values (independent variables) of length $p$, along with an associated error term $\varepsilon_t$, as shown in Equation 6:

$$X_t = \sum_{i=1}^{p} a_i X_{t-i} + c + \varepsilon_t \tag{6}$$
AR models (AutoRegressive) of order p assume data stationarity and use least-squares
regression to calculate coefficients a1 ,..., ap , and c. However, they have limitations such as
difficulty capturing nonlinear dynamics, challenges in selecting the optimal model order, and
vulnerability to overfitting with limited observations. [54] They also require complete data



without missing values and have limited incorporation of external variables. Evaluation of
AR models should consider data characteristics and application requirements.
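A minimal residual-based sketch of this idea, assuming statsmodels, fits an AR(p) model and flags timestamps whose residuals are far from zero; the lag order and 3-sigma rule are illustrative choices.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

series = np.random.normal(0.0, 1.0, 500)  # stand-in for a stationary series

fit = AutoReg(series, lags=5).fit()       # AR(5): estimates a_1..a_5 and c
resid = fit.resid

# Residuals far from zero relative to their spread are candidate anomalies.
anomalies = np.where(np.abs(resid) > 3 * resid.std())[0]
```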

Moving Averages Model (MA). In the AR model, $X_t$ is a linear combination of the last $p$ observations $x_t, x_{t-1}, \ldots, x_{t-p}$. In the MA model, $X_t$ is instead a linear combination of the last $q$ prediction errors $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$, as shown in Equation 7:

$$X_t = \sum_{i=1}^{q} a_i \varepsilon_{t-i} + \mu + \varepsilon_t \tag{7}$$

The errors $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ are unknown at the start of fitting the MA model, whereas the prior values $x_t, x_{t-1}, \ldots, x_{t-p}$ are known in the AR model. Because MA models have no closed-form solution, fitted models must be optimized consecutively, requiring iterative solving with a non-linear estimating approach [55]. MA models excel at short-term fluctuations
but struggle with long-term trends, complex patterns, outliers, non-stationary data, and
incorporating external variables. Addressing these limitations and enhancing forecasting
accuracy necessitate careful consideration and potential modifications.

Autoregressive Moving Averages Model. The combination of AR and MA is another model for univariate time series that is often utilized in practice. The most recent $p$ observations and $q$ errors both influence the ARMA(p, q) model's time series, as shown in Equation 8:

$$X_t = \sum_{i=1}^{p} a_i X_{t-i} + \sum_{i=1}^{q} b_i \varepsilon_{t-i} + \varepsilon_t \tag{8}$$

The process $\{X_T\}$ is an ARMA(p, q) process if $\{X_T\}$ is stationary. The $a_i$ and $b_i$ parameters are estimated by fitting the model to the data. In addition to the $p$ and $q$ parameters, the time series $x_0, \ldots, x_T$ is differenced as follows for $d = 1$, as shown in Equation 9:

$$y_i = x_i - x_{i-1}, \quad \forall i \in \{1, \ldots, T\} \tag{9}$$

Anomalies are discovered after the ARIMA model has been fitted by calculating the difference between predicted and actual points. In exponential smoothing, by contrast, $Y_{t+1}$ is a weighted average of the previous data points, determined by the rate $\alpha$, which controls the exponential decline of the weights. [56]
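The same residual check can be sketched for the differenced (ARIMA) case; statsmodels, the (2, 1, 1) order, and the 3-sigma cutoff are all assumptions for illustration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

series = np.cumsum(np.random.normal(0.0, 1.0, 500))  # integrated, nonstationary series

fit = ARIMA(series, order=(2, 1, 1)).fit()  # d=1 applies first differencing internally
resid = fit.resid

# Large gaps between predicted and actual points mark candidate anomalies.
anomalies = np.where(np.abs(resid) > 3 * resid.std())[0]
```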

Simple Exponential Smoothing (SES). SES [21] is a kind of exponential smoothing. SES produces projections from previous time series data, assigning exponentially higher weights to the most recent observations, using a non-linear approach as shown in Equation 10:

$$X_{t+1} = \beta X_t + \beta(1-\beta) X_{t-1} + \beta(1-\beta)^2 X_{t-2} + \ldots + \beta(1-\beta)^N X_{t-N} \tag{10}$$

Exponential smoothing utilizes large numbers of data points, and Double and Triple Exponential Smoothing extend SES to trending and non-stationary data: Double Exponential Smoothing handles trend with an additional parameter, while Triple Exponential Smoothing also accounts for seasonality with $\gamma$. However, SES itself has limitations: it cannot handle trend or seasonality, lacks adaptability, offers limited incorporation of external factors, and is sensitive to the initial parameter selection. Accurate forecasting requires careful evaluation of data characteristics and exploration of alternative methods.
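A brief sketch of SES-based detection, assuming statsmodels: one-step-ahead smoothed values serve as predictions, and large deviations are flagged. The smoothing level is estimated from the data here; the threshold is an illustrative choice.

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

series = np.random.normal(10.0, 1.0, 200)   # stand-in for a level series

fit = SimpleExpSmoothing(series).fit()      # smoothing level estimated from the data
errors = np.abs(series - fit.fittedvalues)  # one-step-ahead prediction errors

anomalies = np.where(errors > errors.mean() + 3 * errors.std())[0]
```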

Time series Outlier Detection using Prediction Confidence Interval (PCI). An approach for estimating the likelihood of a prediction is the Prediction Confidence Interval (PCI) [57]. This method uses a succession of non-linearly weighted prior data points to predict the next data item. A threshold is then used to determine whether a data point is normal or anomalous; the method establishes an upper and lower bound for identifying anomalies as shown in Equation 11:

$$PCI = \hat{Y}_t \pm t_{\alpha, 2k-1} \cdot s \sqrt{1 + \frac{1}{n}} \tag{11}$$

where $t_{\alpha, 2k-1}$ is the $p$-th percentile of the Student's t-distribution with $2k-1$ degrees of freedom, $s$ is the standard deviation of the model residuals, and $k$ is the window size used to calculate $s$. Anomaly detection in time series using PCI has limitations and requires careful consideration for accurate results.
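A rough sketch of Equation 11's interval test follows, using a plain window mean as a stand-in for the non-linearly weighted predictor of [57]; the window size k and significance level are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def pci_anomalies(series, k=20, alpha=0.05):
    """Flag points falling outside a rolling prediction confidence interval."""
    flags = []
    for t in range(k, len(series)):
        window = series[t - k:t]
        pred = window.mean()            # stand-in predictor for the next point
        s = window.std(ddof=1)          # residual standard deviation over the window
        t_crit = stats.t.ppf(1 - alpha / 2, df=2 * k - 1)
        half_width = t_crit * s * np.sqrt(1 + 1 / k)
        if abs(series[t] - pred) > half_width:
            flags.append(t)
    return flags
```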

3.6.2. Multivariate Approaches


Multivariate Approaches encompass sophisticated analytical techniques that consider
the relationships and interactions between multiple variables to gain a comprehensive un-
derstanding of complex systems and phenomena.

Average of anomaly scores of multiple univariate time series. Anomaly scores are computed
for each univariate time series, with the average score representing the overall anomaly score
for the multivariate time series. However, this approach overlooks relationships between vari-
ables, fails to aggregate outlier information, lacks consistent scaling, and is computationally
complex. To improve accuracy, consider evaluating data characteristics, exploring multi-
variate anomaly detection techniques, and addressing limitations such as interdependencies
and context-specific weighting.

Anomaly Detection by Projection. A common approach to anomaly detection is converting


multivariate time series into univariate ones, with techniques such as subspace monitoring,
kernel matrix similarity, kurtosis coefficients, and iterative outlier removal. [59][60][61] The
method introduced by Sun et al. [62] serves as a foundation, including the difference be-
tween consecutive windows of multivariate time series in the univariate representation. The
$\delta_{t-1,t}$ differential between $V_t$ and $V_{t-1}$ is the most important quantity. It can be computed from the eigenvalue problem of the matrix in Equation 12:

$$V_{t-1}^{T} V_{t-1} V_{t}^{T} V_{t} \tag{12}$$

We calculate anomalies based on their methodology, where the two subspaces have an equal number of principal components. Equation 13 represents this calculation:

$$\mathrm{sim}(V_t, V_{t-1}) = \frac{\sum_{i=1}^{k} \lambda_{V_t, i}}{\sum_{i=1}^{k} \lambda_{V_{t-1}, i}} \tag{13}$$

where $\lambda_{V_t, i}$ is the eigenvalue of the $i$-th principal component of $V_t$.


Anomaly detection by projection has limitations such as lack of interpretability, sensitiv-
ity to parameters and projection methods, difficulty in setting thresholds, challenges with
complex anomalies, and computational complexity, which can be addressed through careful
evaluation, consideration of alternative techniques, and alignment with data and application
requirements.

Detecting Outliers in Time Series Using PCI. Multivariate time series are treated similarly
to univariate time series, where the PCI approach is applied independently to each univariate
time series. However, using PCI for outlier detection has drawbacks, including reliance on
accurate prediction models, sensitivity to assumptions, challenges in setting appropriate
thresholds, and computational complexity. Careful consideration of these limitations is
necessary to ensure accurate and reliable outlier detection in time series using PCI.

Vector-based Autoregressive Model (VAR). The vector autoregressive model (VAR) is the multivariate extension of the AR model. A VAR model rests on the notion that prior timestamps and the values of the other variables influence each timestamp. A multivariate timestamp is expressed as $X_t = (x_{1,t}, x_{2,t}, \ldots, x_{d,t})$ such that $X_t \in \mathbb{R}^d$; as a result, $X_t$ is a $(d \times 1)$ vector, as shown in Equation 14:

$$X_t = c + \Theta_1 X_{t-1} + \Theta_2 X_{t-2} + \ldots + \Theta_p X_{t-p} + \varepsilon_t \tag{14}$$

As in the AR model for univariate time series, $\varepsilon_t$ is white noise.
Anomalies in a multivariate time series are detected by fitting a p-th order VAR model and
examining the deviation $\varepsilon$. Krishnan et al. [66] use the Euclidean or Mahalanobis distance to
measure the deviation between observed and predicted timestamps. However, VAR models
have drawbacks such as assumptions of linearity and stationarity, data requirements, com-
putational intensity, limited handling of exogenous variables, and challenges in coefficient
interpretation. Careful model selection and interpretation are crucial for accurate time series
analysis with VAR models.
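A compact sketch of this deviation check, assuming statsmodels: fit a VAR(p), then score each residual vector with the Mahalanobis distance, in the spirit of Krishnan et al. [66]. The random data, lag order, and threshold are stand-ins.

```python
import numpy as np
from statsmodels.tsa.api import VAR

X = np.random.normal(0.0, 1.0, (500, 3))  # stand-in multivariate series (T x d)

fit = VAR(X).fit(maxlags=2)               # illustrative lag order p
resid = fit.resid                         # epsilon_t for each modeled timestamp

# Mahalanobis distance of each residual vector from the origin.
cov_inv = np.linalg.inv(np.cov(resid.T))
d = np.sqrt(np.einsum("ij,jk,ik->i", resid, cov_inv, resid))
anomalies = np.where(d > d.mean() + 3 * d.std())[0]
```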

Vector AutoRegression Moving-Average (VARMA). The VARMA algorithm is the multivariate version of the ARMA algorithm. A VARMA process is made up of a $p$-th order vector AR model (VAR) and a $q$-th order vector MA model (VMA), exactly like a univariate ARMA process. If $X_t$ is a $(p, q)$-VARMA model, it is expressed as shown in Equation 15:

$$X_t = \sum_{l=1}^{p} \Theta_l X_{t-l} + \sum_{m=1}^{q} \Phi_m \varepsilon_{t-m} + \varepsilon_t \tag{15}$$
After fitting the VARMA model, anomalies are detected using the error value εt, similar
to the VAR method. ML algorithms are employed for anomaly detection without assum-
ing a prior generating model, gaining attention beyond the statistical community. [67][68]
However, VARMA models have drawbacks such as computational complexity, overfitting,
challenges in ensuring stationarity and incorporating exogenous variables, sensitivity to out-
liers, and difficulties in coefficient interpretation. Considering these limitations is crucial for
obtaining reliable results with VARMA models.

3.6.3. Machine Learning Approaches


K-Means Clustering – Subsequence Time-Series Clustering (STSC). STSC is a research approach that combines the K-means clustering algorithm with the concept of subsequence
clustering for analyzing time series data. In this method, time series data is divided into
subsequences or segments, and the K-means algorithm is applied to cluster these subse-
quences based on their similarity. [69][70] The STSC approach enables the identification of
patterns, trends, and anomalies within time series data by grouping similar subsequences
together. By leveraging the K-means clustering algorithm, which iteratively assigns data



points to clusters based on their proximity to cluster centroids, STSC facilitates the discov-
ery of meaningful patterns and insights within complex time series datasets. This research
approach offers a powerful tool for analyzing and clustering time series data, allowing re-
searchers to gain a deeper understanding of temporal relationships and structures present
in the data. Given the time series $\{X_T\} = (x_1, x_2, \ldots, x_T)$, a slide length $\gamma$, and a window length $w$, a collection of sub-sequences is formed as shown in Equation 16:

$$S = \big( (y_0, y_1, \ldots, y_w),\ (y_\gamma, y_{1+\gamma}, \ldots, y_{w+\gamma}),\ \ldots,\ (y_{T-w}, y_{T-w+1}, \ldots, y_T) \big) \tag{16}$$

The distance is calculated between each subsequence $s \in S$ and its closest centroid, yielding the sequence $\varepsilon$ used to find anomalies, as shown in Equation 17:

$$\varepsilon = (e_0, e_1, \ldots, e_{|S|}) \tag{17}$$

where $e_i$, for $i \in \{0, \ldots, |S|\}$, is given by Equation 18:

$$e_i = \min_{c \in C} d(s_i, c) \tag{18}$$
The strategy involves utilizing the distance function (such as Euclidean distance for
univariate data) to calculate the error value of each sliding window, and identifying abnormal
windows based on a provided threshold. However, determining an appropriate value for the
number of clusters (k) is challenging. [71][72] Clustering algorithms, including K-means and
hierarchical clustering, have been shown to produce ineffective results for time series data,
leading to unresolved problems. STSC, based on K-means clustering, has limitations such as
sensitivity to initial centroids, challenges in determining the number of clusters, overlooking
time-series dynamics, lack of robustness to outliers and noise, assuming equal-sized clusters,
and difficulties in interpreting cluster assignments. [73][74] Exploring alternative clustering
approaches is recommended to address these limitations effectively.
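A minimal STSC sketch with scikit-learn, following Equations 16-18: windows are extracted with a slide length, clustered with K-means, and scored by distance to the nearest centroid. Window length, slide, k, and the percentile threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def stsc_scores(series, w=32, gamma=1, k=8):
    """Score each sliding window by its distance to the nearest K-means centroid."""
    windows = np.stack([series[i:i + w] for i in range(0, len(series) - w, gamma)])
    km = KMeans(n_clusters=k, n_init=10).fit(windows)
    # Distance of each window to its assigned centroid = the error sequence e_i.
    return np.linalg.norm(windows - km.cluster_centers_[km.labels_], axis=1)

series = np.sin(np.linspace(0, 60, 2000))
scores = stsc_scores(series)
anomalous_windows = np.where(scores > np.percentile(scores, 99))[0]
```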

Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN is a density-based clustering algorithm for anomaly detection that, unlike STSC or CBLOF, takes data density into account. It categorizes points into Core, Border, and Anomaly groups based on the parameters $\varepsilon$ (the neighborhood radius) and $\mu$ (the minimum number of neighbors). [77][78] Before a point can be categorized, its $\varepsilon$-neighborhood must first be determined. The local outlier factor of $x$ is then calculated as shown in Equation 19:

$$\mathrm{LOF}(x) = \frac{\sum_{y \in N_k(x)} \mathrm{distance}(x, y)}{|N_k(x)| \cdot \mathrm{LRD}(x)} \tag{19}$$

However, DBSCAN struggles with varying densities, is sensitive to the choice of distance metric, may have difficulty identifying clusters of different shapes and sizes in high-dimensional data, is computationally expensive for large datasets, and treats outliers and noise as separate clusters.
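In practice, DBSCAN-based detection can be sketched directly with scikit-learn, which labels noise points, the Anomaly group above, with -1; the eps and min_samples values correspond to $\varepsilon$ and $\mu$ and are illustrative here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.normal(0.0, 1.0, (500, 2))  # stand-in windowed features

# eps is the neighborhood radius, min_samples the minimum-neighbor count (mu).
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
anomalies = np.where(labels == -1)[0]     # DBSCAN marks noise points as -1
```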

Isolation Forest. Isolation Forest (iForest) is an ML technique for detecting anomalies in


temporal data by creating an ensemble of binary trees called iTrees. [81][82] Anomalies
are identified based on their likelihood of being isolated at the root of an iTree, and the
system selects anomalous candidates from regions with shorter average path lengths. Ding et al. [51] calculated the anomaly score $S(x, w)$ by averaging the path length in each scenario, allowing the establishment of a threshold through supervised learning for anomaly detection on the test set, as shown in Equation 20:

$$S(x, w) = \frac{2}{1 + \exp(-E(h(x)))}, \qquad E(h(x)) = \frac{1}{L} \sum_{i=1}^{L} h_i(x) \tag{20}$$

where $E(h(x))$ is the average of $h(x)$ over a set of $L$ iTrees and $h_i(x)$ is the path length of $x$ in the $i$-th iTree.
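A short sketch using scikit-learn's IsolationForest rather than the custom scoring of Equation 20; the ensemble size and contamination rate are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.normal(0.0, 1.0, (1000, 4))  # stand-in feature windows

forest = IsolationForest(n_estimators=100, contamination=0.01).fit(X)
scores = -forest.score_samples(X)          # higher score = shorter average path = more anomalous
anomalies = np.where(forest.predict(X) == -1)[0]
```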
One-Class Support Vector Machines (OC-SVM). The first linear supervised support vector machine (SVM) approach was developed in 1963 and was later improved with kernel methods for non-linear classification [83]. The OC-SVM technique, a semi-supervised method, identifies anomalies by classifying test data as similar or dissimilar to normal data, based on a model fitted only on normal data. To apply OC-SVM to time series, Storch et al. [84] suggest using time-delay embedding to unfold the time series into a phase space, producing windows of length w that are collected into a vector set [85]: \{Y_T\} = (y_1, y_2, \ldots, y_T) becomes ((y_1, \ldots, y_w), (y_2, \ldots, y_{w+1}), \ldots, (y_p, \ldots, y_{w+p-1})). The model is trained additively to get around this limitation, and a Taylor approximation of the loss function is employed to make it optimizable in Euclidean space, as shown in Equation 21:
L^{(t)} = \sum_i \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)    (21)
where g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}) is the first derivative of the loss with respect to the previous prediction and h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}) is the second derivative. One-Class Support Vector Machines can be challenging to tune due to hyperparameter selection and may struggle with imbalanced datasets and complex overlapping regions. [85][86][87] Nevertheless, OC-SVM remains widely used and effective for one-class classification and anomaly detection tasks.
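A minimal sketch of the time-delay-embedding approach with scikit-learn's OneClassSVM follows; the window length w and the nu and gamma hyperparameters are illustrative choices, and the model is fitted on normal data only:

import numpy as np
from sklearn.svm import OneClassSVM

def delay_embed(x, w):
    """Unfold a univariate series into w-length windows (time-delay embedding)."""
    return np.array([x[i:i + w] for i in range(len(x) - w + 1)])

train = np.sin(np.linspace(0, 40 * np.pi, 4000))   # normal data only
test = np.sin(np.linspace(0, 10 * np.pi, 1000))
test[400:420] = 0.0                                # anomalous flat segment

w = 32
clf = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale").fit(delay_embed(train, w))
pred = clf.predict(delay_embed(test, w))           # -1 marks dissimilar windows
print(np.where(pred == -1)[0])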
Local Outlier Factor (LOF). LOF is typically used to find anomalies in univariate time series, but Luo et al. [88] applied the same approach to multivariate time series. The researchers conducted their investigation using 16 sensor values from a marine time-series data collection. Each element x_i \in X_T is therefore a multidimensional vector used to compute the anomaly score: for d larger than one, x_i \in R^d, and the windowed input is built as shown in Equation 22:
B(w, t) = (x_t^1, \ldots, x_t^d, \ldots, x_{t+w}^1, \ldots, x_{t+w}^d)    (22)
Local Outlier Factor (LOF) can be sensitive to high-dimensional data, the choice of
neighborhood size parameter, and datasets with varying densities or imbalanced distribu-
tions. However, it is still widely used and effective, particularly in lower-dimensional datasets
with relatively uniform densities.
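A minimal multivariate LOF sketch with scikit-learn is shown below, using synthetic 16-dimensional "sensor" vectors in place of the marine dataset; the neighborhood size is an illustrative assumption:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 16))           # 16 "sensor" values per timestamp
X[900:905] += 5.0                         # a few anomalous timestamps

# LOF compares each point's local density with that of its k neighbors
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)               # -1 marks low-density outliers
scores = -lof.negative_outlier_factor_    # LOF(x); values well above 1 are anomalous
print(np.where(labels == -1)[0])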
Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN has also been used to find inconsistencies in multivariate temporal data: Fahim et al. [89] apply DBSCAN to discover anomalies in enterprise applications by leveraging multivariate time series, and Wang et al. [90] employed an isolation forest to find anomalies in multivariate temporal data. The principle is the same for multivariate and univariate time series. The notation \{X_T\} is used to represent a multivariate time series with x_i \in X_T and x_i \in R^N. A w-width sliding window is defined as shown in Equation 23:
X_W := (W_1, W_2, \ldots, W_p) = ((x_1, \ldots, x_w), \ldots, (x_p, \ldots, x_T))    (23)
The isolation forest then determines the anomaly score \tau_{x_j} of x_j as shown in Equation 24:
\tau_{x_j} = \frac{1}{wN} \sum_{i=1}^{wN} h(x_{i,j})    (24)
The training segment of the time-series dataset can be used to recalculate a threshold.

One-Class Support Vector Machine (OC-SVM). Kashiwao et al. [91] employed OC-SVM to identify anomalies in multivariate temporal data. They use a sliding window of width w to split the multivariate time series \{X_T\} into windows, as shown in Equation 25:
X_W := (W_1, W_2, \ldots, W_p) = ((x_1, \ldots, x_w), \ldots, (x_p, \ldots, x_T))    (25)
Each element of the processed data, represented by xj ∈ XW , contains w timestamps
of N dimensions. After processing the data, features such as minimum, maximum, mean,
median, standard deviation, average crossings, and squared error are calculated for each
dimension. These features are then normalized before being used as input for the OC-SVM,
and the multivariate time series is thereby reduced to a univariate representation. Before DL became popular, a variety of mathematical and statistical models were built to analyze time-series data and were widely used across industries; the preceding subsections reviewed several of these standard strategies along with challenges that remain unsolved. Table 2 presents a comprehensive overview of the different types of anomaly detection techniques along with the corresponding algorithms used.

[Table 2 about here.]

4. Deep Learning for Anomaly Detection


Deep anomaly detection using neural networks has emerged as a promising approach
for detecting anomalies in different data domains. Challenges include achieving high recall
rates for rare and complex anomalies, handling high-dimensional and non-independent data,
and limited labeled anomaly data. DL methods offer solutions by leveraging labeled data,
extracting complex structures, and providing explanations for detected anomalies. In video
anomaly detection, DL models extract features from video frames and pre-trained models
improve anomaly classification. Convolutional autoencoders and graph anomaly detection
techniques further enhance feature learning and clustering-based measures. The availability
of pre-trained models makes deep anomaly detection accessible across domains. [97] By
addressing the challenges, harnessing the capabilities of DL, and incorporating domain-
specific techniques, the field of deep anomaly detection holds great promise for accurate and
effective anomaly detection in complex and diverse data scenarios.

4.1. Anomaly Criteria


In anomaly detection, various criteria and methods can be employed to identify devia-
tions and outliers in large datasets. Statistical methods compare observed data with sta-
tistical properties of training data, such as Gaussian-based techniques or Tukey’s method.
Dissimilarity measures utilize distance or similarity measures like Euclidean distance, cosine
similarity, or Mahalanobis distance to determine the variation between model-derived values
and the given data. [98][99] Ensemble approaches use multiple models to improve detection
performance, dynamically selecting models and leveraging prediction errors. Probabilistic
models, combined with DL, estimate accurate output distributions to account for uncer-
tainty and enhance anomaly detection. These criteria provide different perspectives and can
be chosen based on the specific characteristics and requirements of the data.

4.1.1. Mahalanobis Distance


Mahalanobis distance is used in multivariate data analysis to determine the distance between a point and a distribution, taking the covariances of the variables into account. [100][101] It is preferred over Euclidean distance for skewed data because it normalizes the variables through the covariance matrix, which converts the distributions to standard normal distributions with uncorrelated variables. [102][104] The calculation assumes tied covariance in the representation space and is motivated by an induced generative classifier with class priors P(t = c) = \beta_c / \sum_{c'} \beta_{c'} and class-conditional Gaussian distributions with tied covariance, N(x; \mu_c, \Sigma). The posterior distribution P(t = c | x) can then be represented as shown in Equation 26:
P(t = c \mid x) = \frac{\exp(\mu_c^{\top} \Sigma^{-1} x - \frac{1}{2} \mu_c^{\top} \Sigma^{-1} \mu_c + \log \beta_c)}{\sum_{c'} \exp(\mu_{c'}^{\top} \Sigma^{-1} x - \frac{1}{2} \mu_{c'}^{\top} \Sigma^{-1} \mu_{c'} + \log \beta_{c'})}    (26)
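The following short sketch computes the Mahalanobis distance directly from an estimated mean and covariance; the two test points illustrate how correlated variables are normalized, even though the Euclidean distance would treat them identically:

import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance of x from a distribution with mean mu and
    covariance cov; the inverse covariance normalizes correlated variables."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(5)
train = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
mu, cov = train.mean(axis=0), np.cov(train, rowvar=False)

print(mahalanobis(np.array([1.0, 1.0]), mu, cov))   # along the correlation: small
print(mahalanobis(np.array([1.0, -1.0]), mu, cov))  # against it: much larger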

4.2. Modeling Deep Learning Models for Univariate and Multivariate Time Series Data
Neural networks have gained interest in time series forecasting and analysis due to their
performance in computer vision tasks and their ability to learn from data without assump-
tions about the data creation process. Researchers have compared neural networks to tra-
ditional methods like ARIMA for time series analysis and forecasting. [105][106] Neural
networks have also been applied to anomaly detection in both univariate and multivariate
time series data. [107][108][109]

4.2.1. Univariate Techniques


Multi-Layer Perceptron (MLP). The Multi-Layer Perceptron (MLP) is a fully-connected
neural network commonly used for time series classification and anomaly detection. It
captures complex patterns and relationships in time series data and can be trained to predict
expected values. The prediction error serves as an anomaly score, where larger errors indicate
a higher likelihood of anomalies. MLPs excel in capturing local and global dependencies in
data, thanks to their fully-connected structure and non-linear activation functions. This
makes them powerful tools for accurate classification and reliable anomaly detection in
various domains. Thus, if \{X_T\} is a time series, x_i a data point, w the window length, and f the MLP function, the forecast is given by Equation 27:
x_{t+1} = f(x_{t-w}, \ldots, x_t), \quad \forall t \in \{w, \ldots, T\}    (27)
If the anomaly threshold is δ, then x_{i+1} is flagged as an anomaly when Equation 28 holds:
|f(x_{i-w}, \ldots, x_i) - x_{i+1}| > \delta    (28)
Hyperparameters in MLP, such as depth, width, window length, learning rate, and op-
timization function, are tuned using techniques like random search or advanced algorithms
using the training set of time series data to obtain the appropriate value for δ. [110].
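A compact sketch of Equations 27-28 using scikit-learn's MLPRegressor follows; the architecture, window length, and percentile rule for δ are illustrative assumptions rather than the tuned settings discussed above:

import numpy as np
from sklearn.neural_network import MLPRegressor

def make_xy(x, w):
    """Pair each window (x_{t-w}, ..., x_{t-1}) with its next value x_t."""
    X = np.array([x[i:i + w] for i in range(len(x) - w)])
    return X, x[w:]

rng = np.random.default_rng(6)
series = np.sin(np.linspace(0, 60 * np.pi, 6000)) + 0.05 * rng.normal(size=6000)
w = 32
X, y = make_xy(series, w)

f = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0).fit(X, y)
errors = np.abs(f.predict(X) - y)       # prediction error as anomaly score
delta = np.percentile(errors, 99.5)     # threshold tuned on training errors
print(np.where(errors > delta)[0])      # timestamps flagged by Equation 28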

Convolutional Neural Networks. Deep CNNs are utilized for time series anomaly detection,
leveraging convolution and pooling layers to capture local patterns and reduce parameters
for faster training. Recent advancements like DeepAnT have further improved anomaly
identification by using CNNs to forecast time series and assess prediction errors. Batch
Normalization is employed for regularization, and the model’s architecture varies based on
the dataset for optimal performance. As with the MLP, x_{i+1} is classified as an anomaly when Equation 29 holds:
|f(x_{i-w}, \ldots, x_i) - x_{i+1}| > \delta    (29)
In addition to the MLP's hyperparameters, CNNs require choosing the network design (including Batch Normalization, Dropout, or Max Pooling layers), the number and size of the kernels in each convolution layer, and the depth of the convolutional stack.
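The sketch below outlines a DeepAnT-style convolutional forecaster in PyTorch under assumed layer sizes; it is a structural illustration of two conv/pool blocks plus a dense head, not the published architecture:

import torch
import torch.nn as nn

class ConvForecaster(nn.Module):
    """Minimal CNN forecaster: two conv/pool blocks followed by a dense
    layer that predicts the next value from a w-length window."""
    def __init__(self, w=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.head = nn.Linear(16 * (w // 4), 1)

    def forward(self, x):                   # x: (batch, 1, w)
        z = self.features(x).flatten(1)
        return self.head(z).squeeze(-1)     # next-value prediction

# |f(x_{i-w},...,x_i) - x_{i+1}| > delta then flags the window (Equation 29)
model = ConvForecaster(w=32)
window = torch.randn(8, 1, 32)              # batch of 8 windows
print(model(window).shape)                  # torch.Size([8])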
Long Short Term Memory Network. Long Short-Term Memory (LSTM) networks, a type
of recurrent neural network (RNN), are highly effective for anomaly detection in time series
data. LSTMs capture long-term dependencies, allowing them to model complex relationships
and handle sequences of varying lengths. By training on normal data, LSTMs learn the
patterns and dynamics of the time series. During inference, the LSTM predicts future data
points, and anomalies are detected based on the magnitude of the prediction error. LSTMs
excel at detecting anomalies occurring over extended periods and accommodate variable-
length sequences, making them powerful tools for accurate anomaly detection in diverse
applications. Each prediction's error value is then computed as shown in Equation 30:
e^{(i)} = (e_1, \ldots, e_l) = (f(x_{i-l}^{l}) - x_{i+1}, f(x_{i-l+1}^{l-1}) - x_{i+1}, \ldots, f(x_i^{1}) - x_{i+1})    (30)
where f(x_j^{k}) denotes the k-step-ahead prediction produced from the window ending at x_j, so that x_{i+1} accumulates l prediction errors made at successive earlier times.

Gated Recurrent Unit (GRU). The GRU, a simpler alternative to the LSTM, combines the input and forget gates, outputs the entire state vector, and performs comparably with lower computational requirements. In an autoencoder configuration, the network minimizes the reconstruction objective shown in Equation 31:
\min_{\theta, \phi} \| (x_i, x_{i+1}, \ldots, x_{i+w}) - (x'_i, x'_{i+1}, \ldots, x'_{i+w}) \|_2^2, \quad \forall i \in \{1, \ldots, T - w\}    (31)
where (x'_i, \ldots, x'_{i+w}) is the decoder's reconstruction of the input window.
Anomaly detection uses an anomaly threshold, where xj is identified as anomalous if
ej > δ, and autoencoders are effective for semi-supervised learning, learning a latent space
of normal data points and detecting deviations when anomalies are introduced.
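A minimal GRU autoencoder sketch in PyTorch is given below; the hidden size is an assumption, and the per-window mean squared reconstruction error plays the role of e_j in the thresholding rule above:

import torch
import torch.nn as nn

class GRUAutoencoder(nn.Module):
    """Sequence autoencoder with GRU cells; the reconstruction error of
    Equation 31 serves as the anomaly score e_j."""
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                      # x: (batch, w, n_features)
        _, h = self.encoder(x)                 # latent summary of the window
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        dec, _ = self.decoder(z)
        return self.out(dec)                   # reconstruction x'

model = GRUAutoencoder()
x = torch.randn(8, 32, 1)
recon = model(x)
e = ((recon - x) ** 2).mean(dim=(1, 2))        # per-window error; e_j > delta flags x_j
print(e.shape)                                  # torch.Size([8])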

4.2.2. Multivariate Time-Series Data


Multi-Layer Perceptron. In multivariate time series anomaly detection, an MLP is used for
multivariate forecasting, and anomalous timestamps are identified by measuring the diver-
gence between the forecast and real values using a distance function. Different approaches
include using separate MLPs for each time series or a single MLP with a multivariate input.
[117] Given x_i \in \{X_T\}, the j-th MLP predicts x_{i,j}, where the input is (x_{i-w,1}, \ldots, x_{i-w,d}, x_{i-w+1,1}, \ldots, x_{i-w+1,d}, \ldots, x_{i-1,1}, \ldots, x_{i-1,d}). The anomaly score is calculated as shown in Equation 32:
e_i = \| M((x_{i-w}, \ldots, x_{i-1})) - x_i \|    (32)
where M denotes the MLP forecast. We can then compute the probability distribution of the e_i for all i \in \{1, \ldots, T\} and determine an appropriate δ using the τ-percentile of that distribution. Finally, let x_i \in \{X_{TEST}\} be a timestamp in the test dataset; we mark it as an anomaly if and only if \| M((x_{i-w}, \ldots, x_{i-1})) - x_i \| > \delta.

Convolution Neural Network (CNN). CNNs are applied for anomaly detection in multivari-
ate time series, where the time series is split into univariate series and fed into separate
convolution blocks, with the resulting channels combined and classified using an MLP; an-
other approach involves using a single CNN with a sliding window and two convolution
blocks to make a multidimensional prediction. [118] To determine whether x_i is an anomaly, the Euclidean distance between the prediction and the real value is computed as shown in Equation 33:
e_i = \sqrt{(y_i - \bar{y}_i)^2}    (33)
This value is used as the anomaly score. Using the training data, a threshold δ can be computed by analyzing the distribution of e_i, ∀i ∈ [1, T]. In online anomaly detection, a sequence is considered abnormal if it contains an abnormal data point, an approach that combines autoencoder and SVM methods. The probability distribution of errors can also be computed to classify a sequence as anomalous based on a pre-computed δ. Table 3 summarizes the pros and cons of the anomaly detection models, aiding selection of the most suitable model based on strengths, limitations, and performance metrics such as precision, recall, and F1-score.

[Table 3 about here.]

Long Short Term Memory. LSTM networks are also used to detect anomalies in multivariate
time series. The approach is very similar to the method used for univariate time series.
Carletti et al. [82] used the same approach for multivariate time series. Let \{X_T\} \in R^{T \times D} be a multivariate time series with T timestamps (x_1, \ldots, x_T), where each timestamp x_i is a D-dimensional vector (x_i^1, x_i^2, \ldots, x_i^D). The error for a multivariate timestamp x_i is computed as shown in Equation 34:
e^{(i)} = (e_1^{(i)}, \ldots, e_l^{(i)}) = (f(x_{i-l}^{l}) - x_{i+1}, f(x_{i-l+1}^{l-1}) - x_{i+1}, \ldots, f(x_i^{1}) - x_{i+1})    (34)
For anomaly detection in multivariate time series, the same approach as in univariate se-
ries is applied: fitting the error vector to a multivariate Gaussian distribution and estimating
parameters using Maximum Likelihood Estimation (MLE), where a threshold δ determines
if a data point is an anomaly. GRU models can be used similarly to LSTM, with GRU
cells replacing LSTM cells. Table 4 summarizes the parameters used in the anomaly detec-
tion method, providing important details for replicating experiments and understanding the
model’s configuration.

[Table 4 about here.]

4.3. Comparative Reviews


In this section, we present the experimental performances of different methods applied
to real-world datasets for time-series anomaly detection. By evaluating these methods on
actual data, we aim to assess their effectiveness in detecting anomalies in time-series data
and provide insights into their performance in practical scenarios. The results obtained
from these experiments shed light on the strengths and limitations of each method, enabling
researchers and practitioners to make informed decisions when choosing an appropriate
anomaly detection technique for their specific applications.

4.3.1. Experimental Setup


To compare and evaluate the performances of the methods presented, we utilize the fol-
lowing public time-series datasets. These datasets are widely used in the research community
and provide diverse real-world scenarios for testing the effectiveness of anomaly detection
algorithms. By using these datasets, we ensure a standardized benchmark for assessing the
capabilities and limitations of each method, facilitating fair comparisons and enabling mean-
ingful insights into their performance characteristics. The selection of these datasets aims
to cover a range of domains and capture various types of anomalies commonly encountered
in time-series data analysis. Table 5 provides an overview of the key attributes and descrip-
tions of the datasets used in the experiments. Each dataset is listed along with its specific
attributes, such as the number of instances, the number of features, and the presence of
anomalies or ground truth labels. This table helps researchers and practitioners understand
the characteristics and properties of the datasets used in anomaly detection experiments. By
examining the attributes, such as the data size and the availability of ground truth labels,
it becomes easier to assess the suitability of each dataset for specific research objectives and
evaluate the generalizability of the proposed anomaly detection methods across different
data domains.
[Table 5 about here.]
The Secure Water Treatment (SWaT) [124] dataset is a widely used public dataset
for evaluating anomaly detection methods in critical infrastructure systems. It represents
a real-world water treatment plant system, providing a valuable resource for assessing the
performance of anomaly detection algorithms in this domain. The SWaT dataset captures
the behavior of different components and processes within the water treatment plant, in-
cluding pumps, valves, tanks, and chemical dosing systems. It contains a large number of
time-series measurements recorded at high frequency, covering a wide range of operating
conditions and scenarios. With labeled anomalies, the SWaT dataset enables the evaluation
of detection accuracy and the performance of different algorithms in identifying normal and
anomalous behavior.
The Soil Moisture Active Passive (SMAP) [125] dataset is a publicly available
dataset derived from the satellite mission launched by NASA. It provides high-resolution
measurements of soil moisture content, vegetation water content, and freeze/thaw state of
the land surface on a global scale. The dataset is used in diverse fields such as hydrology,
agriculture, weather forecasting, and climate studies, enabling researchers to investigate soil
moisture dynamics, analyze land-atmosphere interactions, and improve our understanding
of climate processes. The SMAP dataset plays a crucial role in advancing Earth science
research by providing valuable insights into the Earth’s hydrological cycle and ecosystem
dynamics.
The Water Distribution (WADI) [127] dataset is a valuable resource for studying
and optimizing water distribution systems. It contains real-time sensor measurements from a
large-scale water distribution network, enabling researchers to develop algorithms for predict-
ing water demand, detecting anomalies, and optimizing network operations. By leveraging
the WADI dataset, researchers and water utility professionals can gain insights to improve
system efficiency and reduce water losses, contributing to better water management prac-
tices.
The Mars Science Laboratory Rover (MSL) [128] is a remarkable robotic mission
launched by NASA to explore the surface of Mars. Equipped with advanced scientific instru-
ments, the MSL rover is designed to conduct in-depth investigations and gather critical data
about the Martian environment. Its primary objective is to assess the planet’s habitability
and investigate its geological history. The MSL rover, named Curiosity, has proven to be a
technological marvel, capable of traversing rough terrain, collecting samples, and conducting
experiments to better understand the planet’s composition and potential for supporting life.
The mission has significantly contributed to our knowledge of Mars and paved the way for
future exploration endeavors.
The SMD dataset, or Server Machine Dataset, [126] is a widely used benchmark
dataset for anomaly detection in server machine data. It contains time-series data col-
lected from servers, encompassing various metrics such as CPU utilization, memory usage,
disk I/O, and network traffic. The dataset includes both normal and anomalous instances,
making it suitable for evaluating the performance of different anomaly detection algorithms.
Researchers leverage the SMD dataset to benchmark their methods, comparing them against
state-of-the-art techniques and advancing anomaly detection in server monitoring scenarios.
In the field of anomaly detection, numerous research studies have investigated the per-
formance of various methods using datasets outlined in Table 6. Whenever possible, we have
relied on the reported performances from these studies. However, in cases where such data
was unavailable, we conducted our own experiments to obtain the necessary performance
metrics. This approach ensures a comprehensive assessment of the anomaly detection meth-
ods and facilitates a more robust evaluation of their effectiveness. Precision, Recall, and
F1-score are widely used evaluation metrics in various fields, including anomaly detection.
These metrics provide a quantitative assessment of the performance of an anomaly detection
system.
Precision: Precision is a metric used to assess the accuracy of anomaly identification
within a system. It evaluates the proportion of anomalies that are correctly identified among
all instances classified as anomalies. The calculation involves dividing the number of true
positives (anomalies correctly identified) by the sum of true positives and false positives
(instances incorrectly classified as anomalies). In essence, precision reveals the system’s
ability to pinpoint anomalies accurately and minimize false alarms.
Recall: Recall, alternatively referred to as sensitivity or true positive rate, gauges the
system’s effectiveness in detecting anomalies within the dataset. It quantifies the proportion
of actual anomalies correctly identified by the system among all the anomalies present. The
calculation involves dividing the number of true positives (anomalies correctly identified) by
the sum of true positives and false negatives (anomalies missed by the system). In essence,
recall provides insights into how well the system captures and identifies existing anomalies
in the dataset.
F1-score: The F1-score is a harmonic mean of precision and recall, providing a balanced
measure that considers both metrics. It combines precision and recall into a single value
to assess the overall performance of the anomaly detection system. F1-score is calculated
as 2 * (precision * recall) / (precision + recall). A high F1-score indicates a good balance
between precision and recall, indicating a robust anomaly detection system.
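The arithmetic behind these three metrics is summarized in the short snippet below, with illustrative counts of true positives, false positives, and false negatives:

def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics exactly as defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g., 40 anomalies caught, 10 false alarms, 20 anomalies missed
print(precision_recall_f1(tp=40, fp=10, fn=20))  # (0.8, 0.666..., 0.727...)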
Optimal performance in anomaly detection relies on tuning key hyperparameters such
as threshold, window size, feature selection, algorithm-specific parameters, and data pre-
processing, as they influence the trade-offs, pattern capture, feature relevance, algorithm
behavior, and data quality, with techniques like grid search and random search aiding in
exploring combinations for maximum detection performance. Selecting suitable hyperpa-
rameter values is crucial for optimal results in anomaly detection tasks.

4.4. Result and Analysis


The table presents the results of anomaly detection accuracy for several methods across
different datasets. The evaluation metrics include precision, recall, F1-score, and AUC (Area
Under the Curve) for each method on the SWaT, WADI, and SMD datasets.
In the SWaT dataset, MERLIN [128] achieved a precision of 0.656, recall of 0.2547, F1-
score of 0.6175, and an AUC of 0.3669. LSTMNDT performed slightly better with a precision
of 0.778, recall of 0.5109, F1-score of 0.714, and an AUC of 0.6167. DAGMM [130] showed
significantly higher accuracy with a precision of 0.9933, recall of 0.6879, F1-score of 0.8436,
and an AUC of 0.8128. OmniAnomaly [131] also demonstrated good performance with a
precision of 0.9782, recall of 0.6957, F1-score of 0.8467, and an AUC of 0.8131. MADGAN
achieved a precision of 0.9593, recall of 0.6957, F1-score of 0.8463, and an AUC of 0.8065.
USAD had the highest precision of 0.9977, recall of 0.6879, F1-score of 0.846, and an AUC
of 0.8143.
Moving to the WADI dataset, the results showed generally lower performance across
all methods. MERLIN [128] achieved a precision of 0.3669, recall of 0.0636, F1-score of
0.7669, and an AUC of 0.5912. LSTMNDT demonstrated better accuracy with a precision
of 0.6167, recall of 0.0138, F1-score of 0.7823, and an AUC of 0.6721. DAGMM [129]
achieved a precision of 0.8128, recall of 0.076, F1-score of 0.9981, and an AUC of 0.8563.
OmniAnomaly [130] showed a precision of 0.8131, recall of 0.3158, F1-score of 0.6541, and
an AUC of 0.8198. MADGAN [132] achieved a precision of 0.8065, recall of 0.2233, F1-score
of 0.9124, and an AUC of 0.8026. USAD [133] demonstrated a precision of 0.8143, recall of
0.1873, F1-score of 0.8296, and an AUC of 0.8723.
Lastly, in the SMD dataset, the results showed relatively higher accuracy for most meth-
ods. MERLIN [128] achieved a precision of 0.5912, recall of 0.1174, F1-score of 0.5941, and
an AUC of 0.5681. LSTMNDT [129] demonstrated a precision of 0.6721, recall of 0.0271,
F1-score of 0.8712, and an AUC of 0.674. DAGMM [130] achieved a precision of 0.8563,
recall of 0.1412, F1-score of 0.6731, and an AUC of 0.7561. OmniAnomaly [131] showed a
precision of 0.8198, recall of 0.426, F1-score of 0.9812, and an AUC of 0.9142. MADGAN
[132] achieved a precision of 0.8026, recall of 0.3588, F1-score of 0.9311, and an AUC of
0.8813. USAD [133] demonstrated a precision of 0.8723, recall of 0.3056, F1-score of 0.7323,
and an AUC of 0.7892. Overall, the results indicate varying levels of performance across
the different datasets and methods. DAGMM [130], OmniAnomaly [131], and USAD [133]
generally demonstrated higher accuracy in terms of precision, recall, F1-score, and AUC.
It’s important to note that the WADI dataset had relatively lower recall and F1-score values
across all methods, suggesting challenges in accurately detecting anomalies in that particular
dataset.
Anomaly detection in time-series data often requires capturing temporal dependencies
and inter-correlations between attributes. Methods like LSTMNDT [129], DAGMM [130],
OmniAnomaly [131], MADGAN [132], and MTADGAT [134] leverage DL or generative
models to effectively capture complex temporal patterns and detect anomalies based on
abnormal temporal behavior. Efficient processing of long sequences is achieved through
parallel processing techniques. These models excel in handling high-dimensional datasets
and capturing inter-dependencies between attributes. However, the performance may vary
depending on dataset characteristics and anomaly nature. Evaluating precision, recall, and
F1-score on multiple datasets provides valuable insights into the effectiveness of different
methods, aiding researchers and practitioners in selecting appropriate anomaly detection
techniques.
[Table 6 about here.]

5. Recommendations for Model Developers


Effective anomaly detection is essential in today’s data-driven world to ensure data in-
tegrity, reliability, and security. Practitioners must consider factors like data understanding,
method evaluation, parameter selection, diverse dataset testing, monitoring, result inter-
pretation, scalability, and staying updated with research advancements. By following these
guidelines, anomaly detection enables practitioners to manage and respond to anomalies,
maintaining accurate analyses and informed decision-making. Real-time detection methods
further enhance the ability to promptly address anomalies, providing valuable insights and
enabling timely actions for mitigating negative impacts.
Understand your data: To effectively use anomaly detection methods, practitioners
need to have a deep understanding of their data, including its characteristics, distribution,
and potential anomalies. This understanding can help inform the selection of an appropriate
anomaly detection method and help set relevant parameters. For example, if the data
contains seasonality, a method that can capture seasonal patterns may be more appropriate
than a method that cannot.
Evaluate multiple methods: There are many anomaly detection methods available,
each with their own strengths and weaknesses. To ensure that practitioners select the most
appropriate method for their data, it’s important to evaluate multiple methods and compare
their performance. This can involve testing the methods on a range of datasets that contain
different types of anomalies and assessing the methods’ precision, recall, and F1-score, among
other metrics.
Select appropriate parameters: Most anomaly detection methods have parameters
that need to be set based on the data being analyzed. Selecting appropriate parameters
is important to ensure that the method is effective. For example, in isolation forests, the
number of trees and the maximum depth of each tree can affect the accuracy of the method.
Practitioners should test the method with a range of parameter values and select the values
that provide the best results.
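As an illustration of this guideline, the sketch below sweeps two isolation-forest parameters with a simple grid search on labeled synthetic data; in scikit-learn the tree depth is controlled indirectly through max_samples, and all values shown are assumptions:

import numpy as np
from itertools import product
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (950, 3)), rng.normal(6, 1, (50, 3))])
y = np.array([0] * 950 + [1] * 50)             # ground-truth anomaly labels

best = None
for n_trees, max_samples in product([50, 100, 200], [0.5, 0.8, 1.0]):
    clf = IsolationForest(n_estimators=n_trees, max_samples=max_samples,
                          contamination=0.05, random_state=0).fit(X)
    pred = (clf.predict(X) == -1).astype(int)  # -1 means anomaly
    f1 = f1_score(y, pred)
    if best is None or f1 > best[0]:
        best = (f1, n_trees, max_samples)
print(best)                                    # best F1 and its parameter pair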
Test on multiple datasets: To ensure that an anomaly detection method is robust and
effective, it’s important to test it on multiple datasets. This can help identify any limitations
of the method and provide insights into its generalizability. Practitioners should test the
method on datasets that contain different types of anomalies, as well as datasets that are
structurally different from the training dataset.
Monitor and update: Anomaly detection is an ongoing process, and the data being
analyzed may change over time. It’s important to monitor the results of anomaly detection
methods regularly and update them as necessary. This can involve retraining the method
with new data, adjusting parameters, or selecting a different method altogether. [138]
Interpret results carefully: Anomaly detection methods can provide useful insights
into data, but it’s important to interpret the results carefully and not jump to conclusions
without further investigation. For example, an anomaly detection method may identify a
data point as an anomaly, but further investigation may reveal that the data point was valid
and that the method’s parameters need to be adjusted.
Consider scalability: Anomaly detection methods can be computationally expensive,
especially for large datasets. Practitioners should consider the scalability of a method when
selecting it and implementing it in practice. For example, if a method is too computationally
expensive to run on large datasets, practitioners may need to consider methods that are more
scalable, even if they are less accurate.
Keep up-to-date with research: Anomaly detection is a rapidly evolving field, and
new methods are constantly being developed. Practitioners should stay up-to-date with
the latest research and incorporate new methods as appropriate. This can involve reading
research papers, attending conferences, and participating in online communities dedicated
to anomaly detection. [139] Overall, these guidelines can help practitioners effectively select,
implement, and monitor anomaly detection methods to identify and address anomalies in
their data.
Real-time detection methods are algorithms that can identify anomalies as soon as they
occur, enabling practitioners to respond quickly to prevent or mitigate any negative impacts.
These methods can be particularly useful in situations where timely action is required, such
as in cybersecurity, finance, or manufacturing. [140][141] To effectively use these real-time
detection methods, practitioners should follow some general guidelines:

• Define the problem and the objectives of the anomaly detection system.
• Collect high-quality data that represents the system and its normal behavior.
• Preprocess and transform the data to ensure that it is suitable for the chosen detection
method.
• Select an appropriate detection method that matches the problem and the data char-
acteristics.

• Train the detection model on the historical data and validate its performance using
appropriate metrics.

• Integrate the detection model into the system and continuously monitor its perfor-
mance.

• Take appropriate actions when an anomaly is detected, such as alerting the responsible
personnel, triggering a response system, or shutting down the system.

By following these guidelines, practitioners can effectively use real-time detection methods
for anomaly detection and respond promptly to any incidents that occur. Here are some
novel guidelines for model developers to evaluate and fine-tune anomaly detection models:
1. Incorporate real-world feedback: Collect feedback from domain experts or end-
users about the anomalies detected by the model. This feedback can help refine the model’s
performance and improve its ability to detect relevant anomalies. [142]
2. Monitor model drift: Continuously monitor the performance of the model over
time and detect any drift or changes in the data that may affect its performance. Regularly
retrain the model to keep it up-to-date with the latest data. [143]
3. Use multiple evaluation metrics: Use a combination of evaluation metrics, such
as precision, recall, F1-score, ROC-AUC, and precision-recall curves, to assess the model’s
performance. This will help identify any weaknesses in the model’s performance and enable
targeted improvements. [143]
4. Use synthetic data: Create synthetic data sets that include a variety of anomalous
behavior that may not be present in the real-world data. This can help improve the model’s
ability to detect and classify anomalies that it may not have encountered in the training
data. [144]
5. Incorporate external data sources: Incorporate external data sources, such as
weather or social media data, into the model’s training data. This can help the model identify
anomalous behavior that may be related to external factors not present in the original data
set. [145]
6. Consider ensemble methods: Consider using ensemble methods, such as stacking or bagging, to improve the model's performance. These methods can help overcome the limitations of individual models and improve the overall accuracy of the anomaly detection system. [146]
Denoising techniques enhance accuracy and efficiency in anomaly detection by removing noise and disturbances from time-series data, allowing algorithms to focus on the underlying patterns and anomalies of interest. The following guidelines apply:
1. Understand the Noise Characteristics: Gain a deep understanding of the noise
present in the time-series data. Different types of noise can be encountered, such as ran-
dom noise, measurement errors, outliers, or systematic noise due to specific sources. [147]
By identifying and characterizing the noise, practitioners can select appropriate denoising
techniques that effectively mitigate the specific noise sources.
2. Preprocessing Techniques: Apply preprocessing techniques to clean and preprocess
the data before denoising. This may include handling missing values, normalizing or scaling
the data, or handling outliers. Preprocessing ensures that denoising techniques operate on
reliable and consistent data, leading to more accurate anomaly detection results.
3. Select Suitable Denoising Algorithms: Explore various denoising algorithms to select the most suitable one for the specific characteristics of the time-series data. Common denoising techniques include moving averages, median filtering, low-pass filters, wavelet transforms, and autoencoders (a minimal sketch of the first two appears after this list). Each algorithm has its own advantages and assumptions, so practitioners should consider the noise characteristics, computational efficiency, and the preservation of underlying patterns when choosing a denoising technique.
4. Trade-Off between Noise Removal and Pattern Preservation: Strive for a
balance between noise removal and preserving the important patterns and anomalies in
the data. Aggressive denoising may unintentionally remove valuable information, making it
challenging to distinguish anomalies from the denoised data. [148] Experiment with different
denoising parameters and evaluate the impact on anomaly detection performance to find the
optimal trade-off.
5. Evaluate Denoising Impact: Assess the impact of denoising on the anomaly
detection task. Evaluate the performance of the anomaly detection algorithm both with
and without denoising to determine if denoising improves the accuracy, precision, recall, or
other relevant metrics. It is essential to ensure that the denoising process does not introduce
biases or distortions that may hinder the anomaly detection process.
6. Adaptive Denoising: Consider adaptive denoising techniques that dynamically
adjust the denoising process based on the data characteristics or the specific anomaly de-
tection task. Adaptive denoising methods can automatically adjust denoising parameters,
thresholds, or filtering techniques based on the noise levels or underlying patterns present in
the data. [149] This flexibility helps in handling varying noise intensities or changing data
distributions.
7. Iterative Approach: Adopt an iterative approach by combining denoising and
anomaly detection in a feedback loop. Start with an initial denoising step, followed by
anomaly detection on the denoised data. [150] Analyze the detected anomalies and reassess
the denoising process to refine it further. Iteratively refine both denoising and anomaly
detection steps to achieve better results.
8. Domain Knowledge Integration: Incorporate domain knowledge and expert in-
sights into the denoising process. Domain experts can provide valuable guidance in under-
standing the noise sources, identifying relevant features, and selecting appropriate denoising
techniques that align with the domain-specific characteristics and requirements.
By following these guidelines, practitioners can effectively employ denoising techniques
to enhance the accuracy and reliability of anomaly detection in time-series data, enabling
the identification of meaningful anomalies while minimizing the impact of noise and distur-
bances.
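As a minimal illustration of guideline 3 above, the snippet below implements two of the simplest denoisers, a moving average and a median filter; the window size k is an assumption and directly embodies the noise-removal versus pattern-preservation trade-off of guideline 4:

import numpy as np

def moving_average(x, k=5):
    """Simple moving-average denoiser; larger k removes more noise but can
    also smooth away short anomalies."""
    return np.convolve(x, np.ones(k) / k, mode="same")

def median_filter(x, k=5):
    """Median filtering is more robust to impulsive noise and outliers."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + k]) for i in range(len(x))])

rng = np.random.default_rng(9)
x = np.sin(np.linspace(0, 8 * np.pi, 800)) + 0.3 * rng.normal(size=800)
print(moving_average(x).shape, median_filter(x).shape)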

6. Conclusions
The amount of data created by different industrial and commercial companies is increas-
ing. Anomaly identification in time series is crucial in sectors such as manufacturing, finance,
and aerospace. It’s hard to examine time series anomalies at random because of the expo-
nential growth in time series dimension and quantity, and any time series anomaly can lead
to serious consequences. Thus, the time series anomaly detection model’s accuracy must
be improved further. Deep neural networks are the most accurate models, and researchers
must continue to strive toward this goal. This research provides an overview of the anomaly
detection in time series data, as well as examples from recent real-world applications. We
then highlight the challenges of the classical anomaly detection approaches and then give
an overview of the DL methods for anomaly detection. We also evaluate the state-of-the-art
deep anomaly detection techniques for time series using a range of benchmark datasets. By
studying the model’s characteristics and comparing each model on public datasets, this re-
search shows the suitable application circumstances for each model, as well as the benefits
and disadvantages of each model in the application of time series anomaly detection. While
ML is able to assist with some of the concerns identified, the absence of pre-labeled data re-
mains a challenge for these methods. Finally, we go through some of the research issues that
might arise while developing novel anomaly detection systems, both for case-specific and
more general approaches and give the necessary recommendations for the model developers.

7. References
1. Wang, R., Nie, K., Wang, T., Yang, Y. and Long, B., 2020, January. Deep learning for
anomaly detection. In Proceedings of the 13th international conference on web search
and data mining (pp. 894-896).
2. Al-amri, R., Murugesan, R.K., Man, M., Abdulateef, A.F., Al-Sharafi, M.A. and Alka-
htani, A.A., 2021. A review of machine learning and deep learning techniques for
anomaly detection in IoT data. Applied Sciences, 11(12), p.5320.
3. May Petry, L., Soares, A., Bogorny, V., Brandoli, B. and Matwin, S., 2020. Challenges
in vessel behavior and anomaly detection: From classical machine learning to deep
learning. In Advances in Artificial Intelligence: 33rd Canadian Conference on Artificial
Intelligence, Canadian AI 2020, Ottawa, ON, Canada, May 13–15, 2020, Proceedings
33 (pp. 401-407). Springer International Publishing.
4. Bekmez, S., Afandiyev, A., Dede, O., Karaismailoğlu, E., Demirkiran, H.G. and Yazici,
M., 2019. Is magnetically controlled growing rod the game changer in early-onset
scoliosis? A preliminary report. Journal of Pediatric Orthopaedics, 39(3), pp.e195-
e200.
5. Koubaa, A., Boulila, W., Ghouti, L., Alzahem, A. and Latif, S., 2023. Exploring
ChatGPT capabilities and limitations: A critical review of the nlp game changer.
6. Kirchgässner, G., Wolters, J. and Hassler, U., 2012. Introduction to modern time
series analysis. Springer Science & Business Media.

7. Jiang, F., Zhao, Z. and Shao, X., 2023. Time series analysis of COVID-19 infection
curve: A change-point perspective. Journal of econometrics, 232(1), pp.1-17.
8. Graham, F.F., Kim, A.H.M., Baker, M.G., Fyfe, C. and Hales, S., 2023. Associations
between meteorological factors, air pollution and Legionnaires’ disease in New Zealand:
Time series analysis. Atmospheric Environment, 296, p.119572.
9. Alam, M.S., Murshed, M., Manigandan, P., Pachiyappan, D. and Abduvaxitovna, S.Z.,
2023. Forecasting oil, coal, and natural gas prices in the pre-and post-COVID scenar-
ios: Contextual evidence from India using time series forecasting tools. Resources
Policy, 81, p.103342.
10. Amiri, S.S., Mueller, M. and Hoque, S., 2023. Investigating the application of a
commercial and residential energy consumption prediction model for urban Planning
scenarios with Machine Learning and Shapley Additive explanation methods. Energy
and Buildings, 287, p.112965.
11. Xia, F., Chen, X., Yu, S., Hou, M., Liu, M. and You, L., 2023. Coupled Attention
Networks for Multivariate Time Series Anomaly Detection. IEEE Transactions on
Emerging Topics in Computing.
12. Tian, Z., Zhuo, M., Liu, L., Chen, J. and Zhou, S., 2023. Anomaly detection using
spatial and temporal information in multivariate time series. Scientific Reports, 13(1),
p.4400.
13. Yan, S., Shao, H., Xiao, Y., Liu, B. and Wan, J., 2023. Hybrid robust convolutional au-
toencoder for unsupervised anomaly detection of machine tools under noises. Robotics
and Computer-Integrated Manufacturing, 79, p.102441.
14. Yan, H., Liu, Z., Chen, J., Feng, Y. and Wang, J., 2023. Memory-augmented skip-
connected autoencoder for unsupervised anomaly detection of rocket engines with
multi-source fusion. ISA transactions, 133, pp.53-65.
15. Chira, D., Haralampiev, I., Winther, O., Dittadi, A. and Liévin, V., 2023, Febru-
ary. Image super-resolution with deep variational autoencoders. In Computer Vi-
sion–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings,
Part II (pp. 395-411). Cham: Springer Nature Switzerland.
16. Zilelioglu, H., Khodabandelou, G., Chibani, A. and Amirat, Y., 2023. Semi-Supervised
Generative Adversarial Networks with Temporal Convolutions for Human Activity
Recognition. IEEE Sensors Journal.
17. Das, A.K., Patidar, V. and Naskar, R., 2023, March. Artificial Synthesis of Single
Person Videos through Motion Transfer using Cycle Generative Adversarial Networks
and Machine Learning. In 2023 9th International Conference on Advanced Computing
and Communication Systems (ICACCS) (Vol. 1, pp. 105-111). IEEE.
18. Hawkins, D.M., 1980. Identification of outliers (Vol. 11). London: Chapman and
Hall.
19. Wu, H.S., 2016, December. A survey of research on anomaly detection for time series.
In 2016 13th International Computer Conference on Wavelet Active Media Technology
and Information Processing (ICCWAMTIP) (pp. 426-431). IEEE.

20. Liu, J., Yang, Z. and Song, Y., 2023. A two-stage anomaly detection framework:
Towards low omission rate in industrial vision applications. Advanced Engineering
Informatics, 55, p.101822.
21. Alimohammadi, H. and Chen, S.N., 2022. Performance evaluation of outlier detection
techniques in production timeseries: A systematic review and meta-analysis. Expert
Systems with Applications, 191, p.116371.
22. Isensee, L.J., Detzel, D.H.M., Pinheiro, A. and Piazza, G.A., 2023. Extreme stream-
flow time series analysis: trends, record length, and persistence. Journal of Applied
Water Engineering and Research, 11(1), pp.40-53.
23. Osmanoğlu, B., Sunar, F., Wdowinski, S. and Cabral-Cano, E., 2016. Time series
analysis of InSAR data: Methods and trends. ISPRS Journal of Photogrammetry and
Remote Sensing, 115, pp.90-102.
24. Alarid-Escudero, F., Krijkamp, E., Enns, E.A., Yang, A., Hunink, M.M., Pechli-
vanoglou, P. and Jalal, H., 2023. A tutorial on time-dependent cohort state-transition
models in r using a cost-effectiveness analysis example. Medical Decision Making,
43(1), pp.21-41.
25. Jin, H., Dong, G.M., Wu, H.Y., Yang, Y.R., Huang, M.Y., Wang, M.Y. and Yang,
R.J., 2023. Identification of adulterated milk based on auto-correlation spectra. Spec-
trochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 286, p.121987.
26. Love, D.C., Asche, F., Gephart, J.A., Zhu, J., Garlock, T., Stoll, J.S., Anderson,
J., Conrad, Z., Nussbaumer, E.M., Thorne-Lyman, A.L. and Bloem, M.W., 2023.
Identifying opportunities for aligning production and consumption in the US fisheries
by considering seasonality. Reviews in Fisheries Science & Aquaculture, 31(2), pp.259-
273.
27. Feldner-Busztin, D., Firbas Nisantzis, P., Edmunds, S.J., Boza, G., Racimo, F.,
Gopalakrishnan, S., Limborg, M.T., Lahti, L. and de Polavieja, G.G., 2023. Deal-
ing with dimensionality: the application of machine learning to multi-omics data.
Bioinformatics, 39(2), p.btad021.
28. Zhou, W., Feng, Z., Xu, Y.F., Wang, X. and Lv, H., 2022. Empirical Fourier decom-
position: An accurate signal decomposition method for nonlinear and non-stationary
time series analysis. Mechanical Systems and Signal Processing, 163, p.108155.
29. Simpson, R.B., Kulinkina, A.V. and Naumova, E.N., 2022. Investigating seasonal pat-
terns in enteric infections: a systematic review of time series methods. Epidemiology
& Infection, pp.1-25.
30. Tsilingeridis, O., Moustaka, V. and Vakali, A., 2023. Design and development of a
forecasting tool for the identification of new target markets by open time-series data
and deep learning methods. Applied Soft Computing, 132, p.109843.
31. Wang, J., Li, C., Li, L., Huang, Z., Wang, C., Zhang, H. and Zhang, Z., 2023. In-
SAR time-series deformation forecasting surrounding Salt Lake using deep transformer
models. Science of The Total Environment, 858, p.159744.
32. Zou, Z., Careem, M., Dutta, A. and Thawdar, N., 2023. Joint spatio-temporal pre-
coding for practical non-stationary wireless channels. IEEE Transactions on Commu-
nications, 71(4), pp.2396-2409.
33. Zhang, X., Chau, T.K., Chow, Y., Fernando, T. and Iu, H.H.C., 2023. A Novel
Sequence to Sequence Data Modelling Based CNN-LSTM Algorithm for Three Years
Ahead Monthly Peak Load Forecasting. IEEE Transactions on Power Systems.
34. Eskandari, H., Imani, M. and Parsa Moghaddam, M., 2023. Best-tree wavelet packet
transform bidirectional GRU for short-term load forecasting. The Journal of Super-
computing, pp.1-33.
35. Zhang, T., Yan, G., Ren, M., Cheng, L., Li, R. and Xie, G., 2023. Dynamic transfer
soft sensor for concept drift adaptation. Journal of Process Control, 123, pp.50-63.
36. Wang, M., Liu, M., Zhang, D., Qi, J., Fu, W., Zhang, Y., Rao, Q., Bakhshipour,
A.E. and Tan, S.K., 2023. Assessing and optimizing the hydrological performance of
Grey-Green infrastructure systems in response to climate change and non-stationary
time series. Water Research, 232, p.119720.
37. Zhang, J., Guo, F., Hao, K., Huang, B. and Chen, L., 2023. Identification of Errors-in-
Variable System With Heteroscedastic Noise and Partially Known Input Using Varia-
tional Bayesian. IEEE Transactions on Industrial Informatics.
38. Geng, R., Fang, C., Guo, S., Kang, D., Lyu, B., Zhu, S. and Cheng, P., 2023. Flow-
Pinpoint: Localizing Anomalies in Cloud-client Services for Cloud Providers. IEEE
Transactions on Cloud Computing.
39. Javaheri, D., Gorgin, S., Lee, J.A. and Masdari, M., 2023. Fuzzy Logic-Based DDoS
Attacks and Network Traffic Anomaly Detection Methods: Classification, Overview,
and Future Perspectives. Information Sciences.
40. Xiao, C., Xu, X., Lei, Y., Zhang, K., Liu, S. and Zhou, F., 2023. Counterfactual
Graph Learning for Anomaly Detection on Attributed Networks. IEEE Transactions
on Knowledge and Data Engineering.
41. Liu, Y., Guo, Z., Liu, J., Li, C. and Song, L., 2023. OSIN: Object-Centric Scene In-
ference Network for Unsupervised Video Anomaly Detection. IEEE Signal Processing
Letters, 30, pp.359-363.
42. Goyal, A., Mandal, M., Hassija, V., Aloqaily, M. and Chamola, V., 2023. Captiono-
maly: A Deep Learning Toolbox for Anomaly Captioning in Social Surveillance Sys-
tems. IEEE Transactions on Computational Social Systems.
43. Usmani, U.A., Watada, J., Jaafar, J., Aziz, I.A. and Roy, A., 2012. A Deep Learning
Algorithm to Monitor Social Distancing in Real-Time Videos: A Covid-19 Solution. In
Interpretable Cognitive Internet of Things for Healthcare (pp. 73-90). Cham: Springer
International Publishing.
44. Anidjar, O.H., Barak, A., Ben-Moshe, B., Hagai, E. and Tuvyahu, S., 2023. A Stetho-
scope for Drones: Transformers-Based Methods for UAVs Acoustic Anomaly Detection.
IEEE Access, 11, pp.33336-33353.
45. Ma, Y., Al Islam, M.N., Cleland-Huang, J. and Chawla, N.V., 2023. Detecting anoma-
lies in small unmanned aerial systems via graphical normalizing flows. IEEE Intelligent
Systems.
46. Huang, S., Liu, Y., Fung, C., Wang, H., Yang, H. and Luan, Z., 2023. Improv-
ing Log-Based Anomaly Detection by Pre-Training Hierarchical Transformers. IEEE
Transactions on Computers.
47. Shan, N., Xu, X., Bao, X., Xu, C., Zhu, G. and Wu, E.Q., 2023. Multisensor Anomaly
Detection and Interpretable Analysis for Linear Induction Motors. IEEE Transactions
on Intelligent Transportation Systems.
48. Shang, Z., Zhao, Z., Yan, R. and Chen, X., 2023. Core loss: Mining core samples
efficiently for robust machine anomaly detection against data pollution. Mechanical
Systems and Signal Processing, 189, p.110046.
49. Mogos, A.S., Liang, X. and Chung, C.Y., 2023. Distribution Transformer Failure Pre-
diction for Predictive Maintenance Using Hybrid One-Class Deep SVDD Classification
and Lightning Strike Failures Data. IEEE Transactions on Power Delivery.
50. Wei, P. and Li, H.X., 2023. Spatiotemporal Entropy for Abnormality Detection and
Localization of Li-ion Battery Packs. IEEE Transactions on Industrial Electronics.
51. Ding, C., Sun, S. and Zhao, J., 2023. MST-GAT: A multimodal spatial–temporal
graph attention network for time series anomaly detection. Information Fusion, 89,
pp.527-536.
52. Shan, N., Xu, X., Bao, X., Xu, C., Zhu, G. and Wu, E.Q., 2023. Multisensor Anomaly
Detection and Interpretable Analysis for Linear Induction Motors. IEEE Transactions
on Intelligent Transportation Systems.
53. Dong, C., Tao, J., Chao, Q., Yu, H. and Liu, C., 2023. Subsequence Time Se-
ries Clustering-Based Unsupervised Approach for Anomaly Detection of Axial Piston
Pumps. IEEE Transactions on Instrumentation and Measurement, 72, pp.1-12.
54. Zheng, Y., Jin, M., Liu, Y., Chi, L., Phan, K.T. and Chen, Y.P.P., 2021. Generative
and contrastive self-supervised learning for graph anomaly detection. IEEE Transac-
tions on Knowledge and Data Engineering.
55. Usmani, U.A., Happonen, A. and Watada, J., 2023, June. Secure Integration of
ioT-Enabled Sensors and Technologies: Engineering Applications for Humanitarian
Impact. In 2023 5th International Congress on Human-Computer Interaction, Opti-
mization and Robotic Applications (HORA) (pp. 1-10). IEEE.
56. Zhou, S., Huang, X., Liu, N., Zhou, H., Chung, F.L. and Huang, L.K., 2023. Improving
Generalizability of Graph Anomaly Detection Models via Data Augmentation. IEEE
Transactions on Knowledge and Data Engineering.
57. He, D., Kim, J., Shi, H. and Ruan, B., 2023. Autonomous anomaly detection on
traffic flow time series with reinforcement learning. Transportation Research Part C:
Emerging Technologies, 150, p.104089.
58. Jézéquel, L., Vu, N.S., Beaudet, J. and Histace, A., 2023. Efficient anomaly detection
using self-supervised multi-cue tasks. IEEE Transactions on Image Processing.
59. Gao, L., Wang, D., Zhuang, L., Sun, X., Huang, M. and Plaza, A., 2023. BS3LNet: A
new blind-spot self-supervised learning network for hyperspectral anomaly detection.
IEEE Transactions on Geoscience and Remote Sensing, 61, pp.1-18.
60. Park, S., Jung, S., Jung, S., Rho, S. and Hwang, E., 2021. Sliding window-based
LightGBM model for electric load forecasting using anomaly repair. The Journal of
Supercomputing, 77, pp.12857-12878.

61. Usmani, U.A., Roy, A., Watada, J., Jaafar, J. and Aziz, I.A., 2022. Enhanced rein-
forcement learning model for extraction of objects in complex imaging. In Intelligent
Computing: Proceedings of the 2021 Computing Conference, Volume 1 (pp. 946-964).
Springer International Publishing.
62. Sun, S., Liu, J. and Li, W., 2023. Spatial Invariant Tensor Self-Representation Model
for Hyperspectral Anomaly Detection. IEEE Transactions on Cybernetics.
63. Abolmasoumi, A.H., Farahani, A. and Mili, L., 2023. Robust Particle Filter Design
With an Application to Power System State Estimation. IEEE Transactions on Power
Systems.
64. Xing, P. and Li, Z., 2023. Visual anomaly detection via partition memory bank module
and error estimation. IEEE Transactions on Circuits and Systems for Video Technol-
ogy.
65. Usmani, U.A. and Jaafar, J., 2022, November. Machine Learning in Healthcare: Cur-
rent Trends and the Future. In International Conference on Artificial Intelligence
for Smart Community: AISC 2020, 17–18 December, Universiti Teknologi Petronas,
Malaysia (pp. 659-675). Singapore: Springer Nature Singapore.
66. Krishnan, J., Money, R., Beferull-Lozano, B. and Isufi, E., 2023, June. Simplicial
Vector Autoregressive Model For Streaming Edge Flows. In ICASSP 2023-2023 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp.
1-5). IEEE.
67. Chen, L., Chen, D., Shang, Z., Wu, B., Zheng, C., Wen, B. and Zhang, W., 2023.
Multi-scale adaptive graph neural network for multivariate time series forecasting.
IEEE Transactions on Knowledge and Data Engineering.
68. Usmani, U.A., Happonen, A. and Watada, J., 2023, June. Human-Centered Artifi-
cial Intelligence: Designing for User Empowerment and Ethical Considerations. In
2023 5th International Congress on Human-Computer Interaction, Optimization and
Robotic Applications (HORA) (pp. 01-05). IEEE.
69. Miranda-Pascual, À., Guerra-Balboa, P., Parra-Arnau, J., Forné, J. and Strufe, T.,
2023. SoK: Differentially Private Publication of Trajectory Data. Proceedings on
Privacy Enhancing Technologies, 2, pp.496-516.
70. Chao, Q., Gao, H., Tao, J., Liu, C., Wang, Y. and Zhou, J., 2022. Fault diagnosis of
axial piston pumps with multi-sensor data and convolutional neural network. Frontiers
of Mechanical Engineering, 17(3), p.36.
71. Vijay, R.K. and Nanda, S.J., 2023. Earthquake pattern analysis using subsequence
time series clustering. Pattern Analysis and Applications, 26(1), pp.19-37.
72. Köhne, J., Henning, L. and Gühmann, C., 2023. Autoencoder-Based Iterative Mod-
eling and Multivariate Time-Series Subsequence Clustering Algorithm. IEEE Access,
11, pp.18868-18886.
73. Min, H. and Lee, J.G., 2023, February. Temporal Convolutional Network-Based Time-
Series Segmentation. In 2023 IEEE International Conference on Big Data and Smart
Computing (BigComp) (pp. 269-276). IEEE.
74. Wang, G., 2023. Time Series Symbolization Method for the Data Mining K-Means
Algorithm. Discrete Dynamics in Nature and Society, 2023.
75. Liu, Y., Zhou, Y., Yang, K. and Wang, X., 2023. Unsupervised Deep Learning for IoT
Time Series. IEEE Internet of Things Journal.
76. De Luca, G. and Pizzolante, F., 2023. Time series clustering from road transport CO2
emission. International Journal of Environmental Studies, pp.1-16.
77. Huang, C., Qi, X., Zheng, J., Zhu, R. and Shen, J., 2023. A maritime traffic route
extraction method based on density-based spatial clustering of applications with noise
for multi-dimensional data. Ocean Engineering, 268, p.113036.
78. Zhou, S., Zhou, J. and Chen, S., 2023. Outlier identification and group satisfaction
of rating experts: density-based spatial clustering of applications with noise based
on multi-objective large-scale group decision-making evaluation. Economic Research-
Ekonomska Istraživanja, 36(1), pp.562-592.
79. Guo, Z., Liu, H., Shi, H., Li, F., Guo, X. and Cheng, B., 2023. KD-Tree-Based
Euclidean Clustering for Tomographic SAR Point Cloud Extraction and Segmentation.
IEEE Geoscience and Remote Sensing Letters.
80. Jiao, J., Wang, X., Wei, T. and Zhang, J., 2023. An adaptive fuzzy c-mean noise image
segmentation algorithm combining local and regional information. IEEE Transactions
on Fuzzy Systems.
81. Xu, H., Pang, G., Wang, Y. and Wang, Y., 2023. Deep isolation forest for anomaly
detection. IEEE Transactions on Knowledge and Data Engineering.
82. Carletti, M., Terzi, M. and Susto, G.A., 2023. Interpretable Anomaly Detection with
DIFFI: Depth-based feature importance of Isolation Forest. Engineering Applications
of Artificial Intelligence, 119, p.105730.
83. Zheng, Y., Wang, S. and Chen, B., 2023. Multikernel Correntropy Based Robust Least
Squares One-Class Support Vector Machine. Neurocomputing, p.126324.
84. Storch, M., de Lange, N., Jarmer, T. and Waske, B., 2023. Detecting Historical
Terrain Anomalies With UAV-LiDAR Data Using Spline-Approximation and Support
Vector Machines. IEEE Journal of Selected Topics in Applied Earth Observations and
Remote Sensing, 16, pp.3158-3173.
85. Kurup, A.R., Summers, A., Bidram, A., Reno, M.J. and Martínez-Ramón, M., 2023.
Ensemble models for circuit topology estimation, fault detection and classification in
distribution systems. Sustainable Energy, Grids and Networks, 34, p.101017.
86. Wan, T.H., Tsang, C.W., Hui, K. and Chung, E., 2023. Anomaly detection of train
wheels utilizing short-time Fourier transform and unsupervised learning algorithms.
Engineering Applications of Artificial Intelligence, 122, p.106037.
87. Kayhan, V.O., Agrawal, M. and Shivendu, S., 2023. Cyber threat detection: Unsuper-
vised hunting of anomalous commands (UHAC). Decision Support Systems, p.113928.
88. Luo, Y., Liu, Y., Yang, W., Zhou, J. and Lv, T., 2023. Distributed filtering algorithm
based on local outlier factor under data integrity attacks. Journal of the Franklin
Institute.
89. Fahim, A., Adaptive Density-Based Spatial Clustering of Applications with Noise
(ADBSCAN) for Clusters of Different Densities.

90. Wang, J., Wang, Y., Wang, J., Shen, Y. and Zhao, X., 2023, March. Design of an
Image Segmentation System Based on Hierarchical Density-Based Spatial Clustering
of Applications with Noise for Off-Road Unmanned Ground Vehicles. In Proceedings
of 2022 International Conference on Autonomous Unmanned Systems (ICAUS 2022)
(pp. 3163-3175). Singapore: Springer Nature Singapore.
91. Kashiwao, T., Tanoue, H., Shiraishi, N., Misaki, Y., Ando, T., Tanaka, D. and Ikeda,
K., 2023. A One-Class SVM-Based Approach for Crossing-Gate Rod Breakage Detec-
tion in a Railway Telemeter System. IEEJ Transactions on Electrical and Electronic
Engineering, 18(4), pp.648-650.
92. Ouyang, T. and Zhang, X., 2023. Fuzzy rule-based anomaly detectors construction
via information granulation. Information Sciences, 622, pp.985-998.
93. Duan, P., Kang, X., Ghamisi, P. and Li, S., 2023. Hyperspectral remote sensing
benchmark database for oil spill detection with an isolation forest-guided unsupervised
detector. IEEE Transactions on Geoscience and Remote Sensing.
94. Bellaj, K., Boujena, S. and Benmir, M., 2023, April. Dermatologist-Level Classifi-
cation of Skin Cancer with Level Set Method and Isolation Forest. In Proceedings
of the 3rd International Conference on Electronic Engineering and Renewable Energy
Systems: ICEERE 2022, 20-22 May 2022, Saidia, Morocco (pp. 37-45). Singapore:
Springer Nature Singapore.
95. AbuAlghanam, O., Alazzam, H., Alhenawi, E.A., Qatawneh, M. and Adwan, O., 2023.
Fusion-based anomaly detection system using modified isolation forest for internet of
things. Journal of Ambient Intelligence and Humanized Computing, 14(1), pp.131-145.
96. Cheng, X., Zhang, M., Lin, S., Zhou, K., Zhao, S. and Wang, H., 2023. Two-stream
Isolation Forest Based on Deep Features for Hyperspectral Anomaly Detection. IEEE
Geoscience and Remote Sensing Letters.
97. Landauer, M., Onder, S., Skopik, F. and Wurzenberger, M., 2023. Deep learning for
anomaly detection in log data: A survey. Machine Learning with Applications, 12,
p.100470.
98. Dong, C., Tao, J., Chao, Q., Yu, H. and Liu, C., 2023. Subsequence Time Se-
ries Clustering-Based Unsupervised Approach for Anomaly Detection of Axial Piston
Pumps. IEEE Transactions on Instrumentation and Measurement, 72, pp.1-12.
99. Chen, X., Xia, Y., Sun, Y., Wu, L., Chen, X., Chen, X. and Zhang, X., 2023. Silent
Speech Recognition Based on High-Density Surface Electromyogram Using Hybrid
Neural Networks. IEEE Transactions on Human-Machine Systems.
100. Zhao, J., Deng, F., Zhu, J. and Chen, J., 2023. Searching Density-increasing Path to
Local Density Peaks for Unsupervised Anomaly Detection. IEEE Transactions on Big
Data.
101. Liu, B., Tan, P.N. and Zhou, J., 2022, June. Unsupervised anomaly detection by robust
density estimation. In Proceedings of the AAAI Conference on Artificial Intelligence
(Vol. 36, No. 4, pp. 4101-4108).
102. Ren, X., Wang, Y., Huang, Y., Mustafa, M., Sun, D., Xue, F., Chen, D., Xu, L. and
Wu, F., 2023. A CNN-Based E-Nose Using Time Series Features for Food Freshness
Classification. IEEE Sensors Journal, 23(6), pp.6027-6038.
103. Zhu, J., Deng, F., Zhao, J., Liu, D. and Chen, J., 2023. UAED: Unsupervised Abnor-
mal Emotion Detection Network Based on Wearable Mobile Device. IEEE Transac-
tions on Network Science and Engineering.
104. Wang, D., Zhuang, L., Gao, L., Sun, X., Huang, M. and Plaza, A., 2023. PDB-
SNet: Pixel-shuffle Down-sampling Blind-Spot Reconstruction Network for Hyper-
spectral Anomaly Detection. IEEE Transactions on Geoscience and Remote Sensing.
105. Liu, Z., Wang, C., Yang, X., Zhang, N., Liu, F. and Zhang, B., 2023. Time Series
Multi-Step Forecasting Based on Memory Network for the Prognostics and Health
Management in Freight Train Braking System. IEEE Transactions on Intelligent
Transportation Systems.
106. Madicar, N., Sivaraks, H., Rodpongpun, S. and Ratanamahatana, C.A., 2013, January.
Parameter-free subsequences time series clustering with various-width clusters. In 2013
5th International Conference on Knowledge and Smart Technology (KST) (pp. 150-
155). IEEE.
107. Ni, L., Li, J., Xu, H., Wang, X. and Zhang, J., 2023. Fraud feature boosting mechanism
and spiral oversampling balancing technique for credit card fraud detection. IEEE
Transactions on Computational Social Systems.
108. Konstantinou, T. and Hatziargyriou, N., 2023. Day-Ahead Parametric Probabilistic
Forecasting of Wind and Solar Power Generation using Bounded Probability Distri-
butions and Hybrid Neural Networks. IEEE Transactions on Sustainable Energy.
109. Zhu, Y., Zhang, Y., Han, K. and Hu, J., 2023. A Nonlinear Function Logic Computing
Architecture with Low Switching Activity. IEEE Transactions on Circuits and Systems
II: Express Briefs.
110. Hua, Z., Zhou, B., Or, S.W., Zhang, J., Li, C. and Wei, J., 2023. Robust Emergency
Preparedness Planning for Resilience Enhancement of Energy-Transportation Nexus
Against Extreme Rainfalls. IEEE Transactions on Industry Applications.
111. Zhang, C., Dai, W., Isoni, V. and Sourin, A., 2023. Automated Anomaly Detection
for Surface Defects by Dual Generative Networks With Limited Training Data. IEEE
Transactions on Industrial Informatics.
112. Peng, S.J., Fan, Y., Cheung, Y.M., Liu, X., Cui, Z. and Li, T., 2023. Towards Efficient
Cross-Modal Anomaly Detection Using Triple-Adaptive Network and Bi-Quintuple
Contrastive Learning. IEEE Transactions on Emerging Topics in Computational In-
telligence.
113. Zhao, P., Cao, D., Wang, Y., Chen, Z. and Hu, W., 2023. Gaussian Process-aided
Transfer Learning for Probabilistic Load Forecasting against Anomalous Events. IEEE
Transactions on Power Systems.
114. Yu, H., Kang, C., Xiao, Y. and Ting, Y., 2023. Network Intrusion Detection Method
Based on Hybrid Improved Residual Network blocks and Bidirectional Gated Recur-
rent Units. IEEE Access.
115. Liu, F., Zhou, F. and Ma, L., 2023. An Automatic Detection Framework for Electrical
Anomalies in Electrified Rail Transit System. IEEE Transactions on Instrumentation
and Measurement, 72, pp.1-13.
116. Ren, Z., Li, X., Wu, X., Peng, J., Chen, K., Tan, Q. and Shi, C., 2023. MTGAE:
Graph Autoencoder with Mirror TCN for Traffic Anomaly Detection.
117. Gu, M., Zhang, Y., Wen, Y., Ai, G., Zhang, H., Wang, P. and Wang, G., 2023. A
lightweight convolutional neural network hardware implementation for wearable heart
rate anomaly detection. Computers in Biology and Medicine, p.106623.
118. Kim, M., Yu, J., Kim, J., Oh, T.H. and Choi, J.K., 2023. An Iterative Method for
Unsupervised Robust Anomaly Detection Under Data Contamination. IEEE Trans-
actions on Neural Networks and Learning Systems.
119. Zhou, X., Niu, S., Li, X., Zhao, H., Gao, X., Liu, T. and Dong, J., 2023. Spa-
tial–contextual variational autoencoder with attention correction for anomaly detec-
tion in retinal OCT images. Computers in Biology and Medicine, 152, p.106328.
120. Guha, D., Chatterjee, R. and Sikdar, B., 2023. Anomaly Detection Using LSTM-
Based Variational Autoencoder in Unsupervised Data in Power Grid. IEEE Systems
Journal.
121. Wang, Y., Yuan, X., Lin, Y., Gu, J. and Zhang, M., 2023. A Semi-Supervised Multi-
Scale Deep Adversarial Model for Fan Anomaly Detection. IEEE Transactions on
Consumer Electronics.
122. Saha, S., Haque, A. and Sidebottom, G., 2023. Analyzing the Impact of Outlier Data
Points on Multi-Step Internet Traffic Prediction using Deep Sequence Models. IEEE
Transactions on Network and Service Management.
123. Mogos, A.S., Liang, X. and Chung, C.Y., 2023. Distribution Transformer Failure Pre-
diction for Predictive Maintenance Using Hybrid One-Class Deep SVDD Classification
and Lightning Strike Failures Data. IEEE Transactions on Power Delivery.
124. Sun, Y., Chen, T., Nguyen, Q.V.H. and Yin, H., 2023. TinyAD: Memory-efficient
anomaly detection for time series data in Industrial IoT. IEEE Transactions on Indus-
trial Informatics.
125. Jain, P., Jain, S., Zaïane, O.R. and Srivastava, A., 2021. Anomaly detection in resource
constrained environments with streaming data. IEEE Transactions on Emerging Top-
ics in Computational Intelligence, 6(3), pp.649-659.
126. Shan, N., Xu, X., Bao, X., Xu, C., Zhu, G. and Wu, E.Q., 2023. Multisensor Anomaly
Detection and Interpretable Analysis for Linear Induction Motors. IEEE Transactions
on Intelligent Transportation Systems.
127. Park, J., Kim, B. and Kim, H., 2023, February. MENDEL: Time series anomaly detec-
tion using transfer learning for industrial control systems. In 2023 IEEE International
Conference on Big Data and Smart Computing (BigComp) (pp. 261-268). IEEE.
128. Kong, L., Yu, J., Tang, D., Song, Y. and Han, D., 2023. Multivariate Time Series
Anomaly Detection with Generative Adversarial Networks Based on Active Distortion
Transformer. IEEE Sensors Journal.
129. Nakamura, T., Mercer, R., Imamura, M. and Keogh, E., 2023. MERLIN++:
parameter-free discovery of time series anomalies. Data Mining and Knowledge Dis-
covery, pp.1-40.

130. Yan, H., Li, F., Chen, J., Liu, Z., Wang, J., Feng, Y. and Zhang, X., 2023. A Graph
Embedded In Graph Framework With Dual-sequence Input For Efficient Anomaly
Detection of Complex Equipment Under Insufficient Samples. Reliability Engineering
& System Safety, p.109418.
131. Qin, K., Xu, M., Muhammad, B.A. and Han, J., MTAD-RF: Multivariate Time-series
Anomaly Detection based on Reconstruction and Forecast.
132. Zhang, Z., Zhao, L., Cai, D., Feng, S., Miao, J., Guan, Y., Tao, H. and Cao, J.,
2022, November. Time Series Anomaly Detection for Smart Grids via Multiple Self-
Supervised Tasks Learning. In 2022 IEEE International Conference on Knowledge
Graph (ICKG) (pp. 392-397). IEEE.
133. Li, Y., Peng, X., Wu, Z., Yang, F., He, X. and Li, Z., 2023. M3GAN: A masking
strategy with a mutable filter for multidimensional anomaly detection. Knowledge-
Based Systems, p.110585.
134. Koren, O., Koren, M. and Peretz, O., 2023. A procedure for anomaly detection and
analysis. Engineering Applications of Artificial Intelligence, 117, p.105503.
135. Tao, H., Miao, J., Zhao, L., Zhang, Z., Feng, S., Wang, S. and Cao, J., 2023. HAN-
CAD: hierarchical attention network for context anomaly detection in multivariate
time series. World Wide Web, pp.1-16.
136. Wang, Z., Qian, K., Liu, H., Hu, B., Schuller, B.W. and Yamamoto, Y., 2023. Explor-
ing interpretable representations for heart sound abnormality detection. Biomedical
Signal Processing and Control, 82, p.104569.
137. Gao, Y., Wang, X., He, X., Liu, Z., Feng, H. and Zhang, Y., 2023, February. Alle-
viating Structural Distribution Shift in Graph Anomaly Detection. In Proceedings of
the Sixteenth ACM International Conference on Web Search and Data Mining (pp.
357-365).
138. Tang, C., Xu, L., Yang, B., Tang, Y. and Zhao, D., 2023. GRU-Based Interpretable
Multivariate Time Series Anomaly Detection in Industrial Control System. Computers
& Security, p.103094.
139. Şengönül, E., Samet, R., Abu Al-Haija, Q., Alqahtani, A., Alturki, B. and Alsulami,
A.A., 2023. An Analysis of Artificial Intelligence Techniques in Surveillance Video
Anomaly Detection: A Comprehensive Survey. Applied Sciences, 13(8), p.4956.
140. Asha, S., Shanmugapriya, D. and Padmavathi, G., 2023. Malicious insider threat
detection using variation of sampling methods for anomaly detection in cloud environ-
ment. Computers and Electrical Engineering, 105, p.108519.
141. Kim, B., Alawami, M.A., Kim, E., Oh, S., Park, J. and Kim, H., 2023. A comparative
study of time series anomaly detection models for industrial control systems. Sensors,
23(3), p.1310.
142. Wang, X., Yao, Z. and Papaefthymiou, M., 2023. A real-time electrical load forecasting
and unsupervised anomaly detection framework. Applied Energy, 330, p.120279.
143. Wang, Y., Liu, T., Zhou, J. and Guan, J., 2023. Video anomaly detection based on
spatio-temporal relationships among objects. Neurocomputing, 532, pp.141-151.

144. Zipfel, J., Verworner, F., Fischer, M., Wieland, U., Kraus, M. and Zschech, P., 2023.
Anomaly detection for industrial quality assurance: A comparative evaluation of un-
supervised deep learning models. Computers & Industrial Engineering, 177, p.109045.
145. Panja, S., Patowary, N., Saha, S. and Nag, A., 2023, January. Anomaly Detection
in IoT Using Extended Isolation Forest. In Artificial Intelligence: First International
Symposium, ISAI 2022, Haldia, India, February 17-22, 2022, Revised Selected Papers
(pp. 3-14). Cham: Springer Nature Switzerland.
146. Zeiser, A., Özcan, B., van Stein, B. and Bäck, T., 2023. Evaluation of deep unsuper-
vised anomaly detection methods with a data-centric approach for on-line inspection.
Computers in Industry, 146, p.103852.
147. Cui, L., Zhang, Q., Shi, Y., Yang, L., Wang, Y., Wang, J. and Bai, C., 2023. A method
for satellite time series anomaly detection based on fast-DTW and improved-KNN.
Chinese Journal of Aeronautics, 36(2), pp.149-159.
148. Xu, H., Pang, G., Wang, Y. and Wang, Y., 2023. Deep isolation forest for anomaly
detection. IEEE Transactions on Knowledge and Data Engineering.
149. Lei, X., Xia, Y., Wang, A., Jian, X., Zhong, H. and Sun, L., 2023. Mutual information
based anomaly detection of monitoring data with attention mechanism and residual
learning. Mechanical Systems and Signal Processing, 182, p.109607.
150. Abusitta, A., de Carvalho, G.H., Wahab, O.A., Halabi, T., Fung, B.C. and Al
Mamoori, S., 2023. Deep learning-enabled anomaly detection for IoT systems. In-
ternet of Things, 21, p.100656.

Author biography

Usman Ahmad Usmani was born in Aligarh, India, in April 1993. He is currently pursuing
the Ph.D. degree in computer science at Universiti Teknologi PETRONAS, Malaysia. He
has worked as a Research Assistant at IIT Kanpur and as a Researcher with Massey
University, New Zealand. He founded a social network named Zamber, which has been
covered in around 14 national newspapers. His research interests include artificial
intelligence, computer vision, computer security, wearable sensors, and cloud computing.

Jafreezal Jaafar (Senior Member, IEEE) received the Ph.D. degree from The University
of Edinburgh, U.K., in 2009. He is currently an Associate Professor and the former Head
of the Computer and Information Sciences Department at Universiti Teknologi PETRONAS,
Malaysia. His main research areas include big data analytics, soft computing, and
software engineering. He has secured a number of research projects from industry and
government agencies. Based on his publication track record, he has been appointed as
chief editor and reviewer for several journals, and as chair, technical chair, and
committee member for several international conferences. He is also active in the IEEE
Computer Society, Malaysia Chapter, where he served as an Executive Committee Member in
2016 and 2017.

List of Tables
1   Challenges in DL-Based Anomaly Detection for Time-Series Data: Process Descriptions and Insights . . . . . 45
2   Type of Anomaly Detection Technique and the Algorithms Used . . . . . 46
3   Pros and Cons of the Anomaly Detection Models Used . . . . . 47
4   Parameters Used in the Anomaly Detection Methods . . . . . 48
5   Overview of Key Attributes and Description of the Datasets Used in the Experiments . . . . . 49
6   Anomaly detection accuracy in terms of precision, recall, AUC, and F1-score on five datasets with ground-truth anomalies . . . . . 50

Table 1: Challenges in DL-Based Anomaly Detection for Time-Series Data: Process Descriptions and Insights

Data source: lack of public datasets. The number of data points in the available public datasets [4, 5] is insufficient.
Data source: unbalanced dataset. The classifier becomes biased when the fraction of anomalous points is too low.
Data preprocessing: noise and missing values. Model performance is hampered by noise and missing values in real-world data.
Feature learning: complex contextual information. Real-world data carries temporal-spatial and external semantics that are difficult to extract jointly.
Feature learning: seasonal shift. Seasonality is present in real-world data.
Feature learning: trend change. Real-world data has a trend component.
Feature learning: concept drift. Real-world data suffers from concept drift, causing the trained model's performance to deteriorate.
Anomaly detection: high computational cost. Online detection requires minimal latency, so many computationally expensive models are inapplicable.
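
The preprocessing challenges in Table 1 (noise and missing values) are typically mitigated before any detector is trained. The following is a minimal sketch of one common recipe using pandas; the file name, column names, gap limit, smoothing window, and training cutoff date are illustrative assumptions, not settings prescribed in this review.

import pandas as pd

# Hypothetical sensor log; the file and column names are placeholders.
df = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"], index_col="timestamp")

# Fill short gaps by time-based interpolation; longer gaps stay NaN and are dropped.
df = df.interpolate(method="time", limit=5).dropna()

# Suppress high-frequency noise with a centered rolling median.
df = df.rolling(window=5, center=True).median().dropna()

# Z-score normalization using statistics from the training period only, to avoid leakage.
train = df.loc[:"2023-06-30"]
df_norm = (df - train.mean()) / train.std()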

Table 2: Type of Anomaly Detection Technique and the Algorithms Used.

Type                    Idea            Reference                                          Algorithm
Rule based              Thresholding    [92]                                               Box plot
Machine learning based  Clustering      [8, 9, 10, 11, 12]; [10, 11, 12]                   KNN; DBSCAN
Machine learning based  Classification  [16, 17, 18]; [94, 95, 96, 22]                     OC-SVM, MC-SVM; Isolation Forest
Machine learning based  Prediction      [2, 17]; [23]; [14]                                ARIMA; FB-Prophet; Holt-Winters
Deep learning based     Regression      [25, 30, 31, 32, 33, 34, 35, 36]                   (not specified)
Deep learning based     Regression      [28, 44]                                           TCN
Deep learning based     Regression      [26, 37, 38, 39, 40, 41, 42]                       LSTM, Attention-LSTM, GRU-LSTM
Deep learning based     Regression      [46, 47, 48, 49]                                   HTM
Deep learning based     Compression     [51, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65]   AE, CNN-AE, LSTM-AE, RNN-AE
Deep learning based     Compression     [52, 66, 67, 68, 69]                               VAE, CNN-VAE, GGM-VAE, LSTM-VAE
Deep learning based     Compression     [53, 70, 71, 72]                                   GAN
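
As a concrete instance of the classification-based detectors listed in Table 2, the minimal sketch below fits an Isolation Forest and a one-class SVM with scikit-learn. The synthetic data and contamination rate are assumptions for illustration; the RBF kernel, nu = 0.01, and gamma = 0.5 mirror the OC-SVM settings reported later in Table 4.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))                     # stand-in for normal sensor windows
X_test = np.vstack([rng.normal(size=(95, 5)),
                    rng.normal(loc=6.0, size=(5, 5))])   # five injected anomalies

iso = IsolationForest(contamination=0.05, random_state=0).fit(X_train)
ocsvm = OneClassSVM(kernel="rbf", nu=0.01, gamma=0.5).fit(X_train)

# Both predict +1 for inliers and -1 for anomalies.
print((iso.predict(X_test) == -1).sum(), "anomalies flagged by Isolation Forest")
print((ocsvm.predict(X_test) == -1).sum(), "anomalies flagged by OC-SVM")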

Table 3: Pros and Cons of the Anomaly Detection Models Used

AutoEncoder [119]. Pros: flexible approach to modeling complex non-linear patterns in data. Cons: does not support variational inference; requires a large dataset for training.
Variational AutoEncoder [120]. Pros: supports variational inference. Cons: requires a large amount of training data; training can take a while.
GAN (BiGAN) [121]. Pros: supports variational inference. Cons: a large quantity of training data and a longer training period (epochs) are required to get reliable results [9].
Sequence-to-Sequence Model [122]. Pros: appropriate for data with temporal components (e.g., discretized time-series data); supports variational inference. Cons: slow inference; training may be time-consuming.
One-Class SVM [123]. Pros: does not need a vast quantity of data; quick to train; quick inference time. Cons: limited capability to capture complicated correlations within data.

Table 4: Parameters Used in the Anomaly Detection Methods.

PCA [29]. Encoder: N/A. Decoder: N/A. Other parameters: 2-component PCA.
OC-SVM [42]. Encoder: N/A. Decoder: N/A. Other parameters: kernel: RBF; outlier fraction: 0.01; gamma: 0.5; anomaly score taken as the distance from the decision boundary.
Autoencoder [34]. Encoder: 2 hidden layers [15, 7]. Decoder: 2 hidden layers [15, 7]. Other parameters: latent dimension: 2; batch size: 256; loss: mean squared error.
Variational Autoencoder [61]. Encoder: 2 hidden layers [15, 7]. Decoder: 2 hidden layers [15, 7]. Other parameters: latent dimension: 2; batch size: 256; loss: mean squared error + KL divergence.
Sequence-to-Sequence Model [72]. Encoder: 1 hidden layer [10]. Decoder: 1 hidden layer [20]. Other parameters: bidirectional LSTMs; batch size: 256; loss: mean squared error.
Bidirectional GAN [56]. Encoder: 2 hidden layers [15, 7]. Generator (decoder): 2 hidden layers [15, 7]. Discriminator: 2 hidden layers [15, 7]. Other parameters: latent dimension: 32; loss: binary cross-entropy; learning rate: 0.1.
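
A minimal sketch of the autoencoder configuration in Table 4 (encoder hidden layers [15, 7], latent dimension 2, batch size 256, mean-squared-error loss), assuming TensorFlow/Keras and a 51-dimensional input (e.g., SWaT's attribute count from Table 5). The placeholder training data, epoch count, and 99th-percentile threshold rule are illustrative conventions, not the settings used in the experiments.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 51                                    # illustrative input dimensionality

inputs = tf.keras.Input(shape=(n_features,))
h = layers.Dense(15, activation="relu")(inputs)    # encoder hidden layers [15, 7]
h = layers.Dense(7, activation="relu")(h)
z = layers.Dense(2)(h)                             # latent dimension: 2
h = layers.Dense(7, activation="relu")(z)          # decoder mirrors the encoder
h = layers.Dense(15, activation="relu")(h)
outputs = layers.Dense(n_features)(h)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")  # loss: mean squared error

X_train = np.random.rand(4096, n_features).astype("float32")  # placeholder for normal data
autoencoder.fit(X_train, X_train, batch_size=256, epochs=10, verbose=0)

# Reconstruction error serves as the anomaly score; the threshold choice is illustrative.
errors = np.mean((X_train - autoencoder.predict(X_train, verbose=0)) ** 2, axis=1)
threshold = np.percentile(errors, 99)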

Table 5: Overview of Key Attributes and Description of the Datasets Used in the Experiments

SWaT [124]: 51 attributes or variables collected from a water treatment testbed (7.2 GB). Includes pre-labeled anomalies, where specific instances are flagged as either normal or anomalous. Training set: 480,600; testing set: 430,800.
SMAP [125]: Covers a global extent with a total size of approximately several terabytes. May or may not include pre-labeled anomalies. Training set: 135,183; testing set: 427,617.
Server Machine Dataset [126]: Multivariate and sequential, with 7,062,606 instances. Includes pre-labeled anomalies, where specific instances of network traffic are flagged as either normal or anomalous. Training set: 708,405; testing set: 708,420.
Water Distribution (WADI) [127]: Data collected from a real industrial control system (ICS) deployed in a water treatment facility. Includes pre-labeled anomalies, where specific instances of system behavior are classified as either normal or anomalous based on known attack scenarios. Training set: 1,027,531; testing set: 160,705.
Mars Science Laboratory Rover (MSL) [128]: A substantial volume of data collected by the rover. Does not include pre-labeled anomalies, since its primary focus is scientific exploration and data collection rather than anomaly detection. Training set: 58,317; testing set: 73,729.
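
These datasets are typically consumed as fixed-length sliding windows drawn from the train/test partitions above. A minimal windowing sketch, assuming NumPy arrays already loaded from the raw files; the window length, stride, and placeholder array sizes are arbitrary choices for illustration (the real partitions have the row counts listed in Table 5).

import numpy as np

def sliding_windows(data: np.ndarray, window: int = 100, stride: int = 10) -> np.ndarray:
    # Stack overlapping windows of shape (window, n_features).
    n = (len(data) - window) // stride + 1
    return np.stack([data[i * stride : i * stride + window] for i in range(n)])

# Placeholders standing in for raw partitions, e.g., SWaT's 51 attributes.
train_raw = np.random.rand(10_000, 51)
test_raw = np.random.rand(8_000, 51)

X_train = sliding_windows(train_raw)   # shape: (n_windows, 100, 51)
X_test = sliding_windows(test_raw)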

Table 6: Anomaly detection accuracy in terms of precision (P), recall (R), AUC, and F1-score (F1) on five datasets with ground-truth anomalies.

SWaT and WADI:
Method             SWaT P   SWaT R   SWaT AUC   SWaT F1   WADI P   WADI R   WADI AUC   WADI F1
MERLIN [128]       0.6560   0.2547   0.6175     0.3669    0.0636   0.7669   0.5912     0.1174
LSTM-NDT [129]     0.7780   0.5109   0.7140     0.6167    0.0138   0.7823   0.6721     0.0271
DAG [130]          0.9900   0.6879   0.8436     0.8128    0.0760   0.9981   0.8563     0.1412
OmniAnomaly [131]  0.9700   0.6957   0.8467     0.8131    0.3158   0.6541   0.8198     0.4230
MAD-GAN [132]      0.9593   0.6957   0.8463     0.8065    0.2233   0.9124   0.8026     0.3588
USAD [133]         0.9977   0.6879   0.8460     0.8143    0.1873   0.8296   0.8723     0.3056
MTAD-GAT [134]     0.9718   0.6957   0.8464     0.8109    0.2818   0.8012   0.8821     0.4169
CAE-M [135]        0.9697   0.6957   0.8464     0.8101    0.2782   0.7918   0.8728     0.4117
GDN [136]          0.9591   0.6957   0.8462     0.8101    0.2912   0.7931   0.8777     0.4260
GRN-50 [137]       0.9970   0.5920   0.8780     0.7380    0.9650   0.2497   0.7832     0.3981

SMD, SMAP, and MSL:
Method             SMD P    SMD R    SMD F1    SMAP P   SMAP R   SMAP F1   MSL P    MSL R    MSL F1
MERLIN [128]       0.5941   0.8531   0.7012    0.4421   0.5111   0.4712    0.5681   0.6740   0.6213
LSTM-NDT [129]     0.8712   0.7881   0.8311    0.7161   0.9887   0.8301    0.8601   0.9761   0.9123
DAG [130]          0.6731   0.8451   0.7523    0.6331   0.9812   0.7812    0.7561   0.9801   0.8512
OmniAnomaly [131]  0.9812   0.9441   0.9612    0.7592   0.9981   0.8513    0.9142   0.8891   0.9013
MAD-GAN [132]      0.9311   0.9436   0.9632    0.7701   0.9761   0.8613    0.8813   0.9791   0.9312
USAD [133]         0.7323   0.7812   0.7625    0.7921   0.9831   0.8812    0.7892   0.9742   0.8713
MTAD-GAT [134]     0.9472   0.9781   0.9621    0.8000   0.9912   0.8913    0.9361   0.9813   0.9614
CAE-M [135]        0.9811   0.9832   0.9821    0.8321   0.9946   0.9012    0.9212   0.9913   0.9514
GDN [136]          0.9824   0.9862   0.9512    0.8401   0.9932   0.9013    0.9041   0.9923   0.9526
GRN-50 [137]       0.9839   0.9641   0.9865    0.8261   0.9951   0.9012    0.9180   0.9932   0.9611
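
For reference, the per-dataset figures above are point-wise metrics computed from ground-truth labels and a thresholded anomaly score. A minimal sketch of the computation with scikit-learn, using toy labels and scores; the 0.5 threshold is an illustrative convention.

import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])                   # ground-truth point labels
scores = np.array([0.1, 0.3, 0.8, 0.7, 0.2, 0.4, 0.1, 0.6])   # per-point anomaly scores
y_pred = (scores >= 0.5).astype(int)                          # illustrative fixed threshold

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
auc = roc_auc_score(y_true, scores)   # AUC is computed from raw scores, not thresholded labels
print(f"P={precision:.4f}  R={recall:.4f}  F1={f1:.4f}  AUC={auc:.4f}")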
