Time - Series - Data 2024 05 22 05 16
Abstract
Industries are generating massive amounts of data due to increased automation and intercon-
nectedness. As data from various sources becomes more available, the extraction of relevant
information becomes crucial for understanding complex systems’ behavior and performance.
The growing volume and complexity of time-series data in diverse industries have created a
demand for effective anomaly detection methods. Detecting anomalies in multivariate time-
series data presents unique challenges, such as the presence of multiple correlated variables
and intricate relationships among them. Traditional approaches often fall short in detecting
anomalies, making deep learning methods a promising solution. This review article pro-
vides a comprehensive analysis of different deep learning techniques for anomaly detection
in time-series data, examining their applicability across various industries and discussing the
associated challenges. The article emphasizes the significance of deep learning in detecting
anomalies and offers valuable insights to inform decision-making processes. Furthermore, it
proposes recommendations for model developers, advocating for the development of hybrid
models that combine different deep learning techniques and the exploration of attention
mechanisms in Recurrent Neural Networks (RNNs). These recommendations aim to over-
come the challenges associated with deep learning-based anomaly detection in multivariate
time-series data.
Keywords: Deep Learning, Time Series Analysis, Anomaly Detection, Multivariate
Time-Series, Automation, Univariate Time Series
1. Introduction
Industries are generating an unprecedented amount of data from various sources, ranging
from sensors, machines, and production lines to automated systems. Data owners collect
2. Background
2.1. Anomalies in Time Series Data
Anomalous data refers to observations that do not follow the expected patterns in a
dataset. Anomalies can have various definitions and interpretations, depending on the con-
text and nature of the data. Hawkins [18] introduced a widely accepted definition of an
outlier as an observation that exhibits a significant deviation from other observations, im-
plying that it was generated by a distinct mechanism. While this definition is commonly
used in the literature, alternative definitions also exist [19][20]. In the realm of time series
data, anomalies can be categorized into various types depending on the nature of the de-
viation they represent. These classifications help provide a more nuanced understanding of
different anomaly patterns and guide the development of specialized detection techniques
for each type.
Point Anomaly: One type is point anomaly, which refers to a single data point that is
significantly different from the rest of the data points. For example, in a temperature time
series data, a point anomaly could be a sudden spike or dip in temperature that does not
follow the usual seasonal trend.
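A minimal sketch of point-anomaly detection on such a series, using a simple z-score rule; the temperature values and threshold below are illustrative assumptions, not data from this article:

```python
from statistics import mean, stdev

def point_anomalies(series, z_thresh=3.0):
    """Flag indices whose z-score (distance from the mean in
    standard deviations) exceeds the threshold."""
    mu, sigma = mean(series), stdev(series)
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > z_thresh]

# A temperature series with one sudden spike; a lower threshold is used here
# because the spike itself inflates the standard deviation on small samples.
temps = [20.1, 20.4, 19.8, 20.0, 35.0, 20.2, 19.9, 20.3]
print(point_anomalies(temps, z_thresh=2.0))  # [4]
```

The inflation of the mean and standard deviation by the anomaly itself is one reason robust statistics (median, MAD) are often preferred in practice.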
Contextual Anomaly: Another type of anomaly is contextual anomaly, where a data
point is not anomalous by itself but becomes anomalous when compared to other data points
in its context. For example, in a credit card transaction dataset, a purchase of $500 might
not be anomalous by itself, but if the usual spending pattern of the cardholder is around
$50, then this purchase becomes a contextual anomaly.
Collective Anomaly: Collective anomalies are groups of data points that deviate from
the expected behavior of a time series. They can be detected using statistical methods, ML
algorithms, or rule-based systems. Identifying anomalies is crucial in domains like fraud
detection and predictive maintenance, and the choice of detection method depends on the
data, dataset size, and application. Other types of anomalies can also occur in datasets
apart from those mentioned. Here are some of the other types of anomalies:
Missing Anomaly: These occur when there are missing values in the dataset. Missing
values can be caused by various factors, such as equipment failure, human error, or data
corruption.
Minor Anomaly: These refer to data points that deviate slightly from the expected
pattern but are not significant enough to be considered outliers. Minor anomalies are not
noticeable on their own but can accumulate over time and affect the overall data quality.
Outliers: Outliers are data points that deviate significantly from the expected pattern
and are often caused by measurement errors, data entry errors, or rare events [21]. Outliers
can be either point anomalies or collective anomalies, depending on whether they occur as
a single data point or a group of data points.
Square Anomaly: These are anomalies that occur in image or video data and refer to
a group of pixels that have different properties than their surrounding pixels [22]. Square
anomalies can be caused by image compression, camera malfunction, or image manipulation.
Trends: A trend is an anomaly that occurs when there is a long-term change in the
pattern of the data [23]. Trends can be either increasing or decreasing and can be caused
by various factors, such as changes in the environment, economic conditions, or population
dynamics.
Normal Anomaly: A normal-anomaly pattern has stable amplitude and frequency over time steps, implying behavior that is generally expected or typical (normal) but occasionally exhibits significant deviations (anomalies) from that expected behavior.
Figure 1 provides an overview of anomaly classification in time series data, illustrating
diverse types of anomalies with examples. It aids in comprehending anomaly characteristics
and manifestations, assisting in anomaly detection and handling in data analysis and ML.
Multivariate time-series data can be more complex to analyze than univariate time-series data, as there can be interactions between the different variables over time [27].
Non-Stationarity: Non-Stationarity in time series data occurs when statistical proper-
ties change over time due to factors like trends or seasonality. Analyzing and modeling such
data can be challenging, as many models assume stationarity [28]. Methods like differencing
and detrending can transform nonstationary data, while multivariate data may require tech-
niques like vector autoregression, depending on the problem and available resources. Table
1 presents an overview of the challenges involved in DL-based anomaly detection for time-
series data, along with process descriptions and insights. This overview aims to shed light
on the complexities and difficulties that arise when employing DL techniques for the detec-
tion of anomalies in time-series data. By understanding these challenges, one can better
navigate the intricacies of applying DL models to identify anomalies in time-series datasets,
thus reducing dependency on external research for guidance.
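As a small illustration of the differencing transform mentioned above, applied to a made-up linear trend:

```python
def difference(series, lag=1):
    """First differencing: subtract the value `lag` steps back,
    removing a linear trend (lag=1) or a seasonal pattern (lag=m)."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

trended = [5.0 + 2.0 * t for t in range(6)]  # deterministic linear trend
print(difference(trended))  # constant series: the trend is removed
```

A second differencing pass handles quadratic trends; seasonal differencing uses `lag` equal to the season length.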
[Table 1 about here.]
Temporality- Temporality plays a vital role in time-series analysis, as it captures the sequential nature of data points and allows dependencies and correlations over time to be identified. A time series is a chronological collection of observations, indexed by date, that is essential for understanding trends and patterns [29]. Data collection starts on one date (say, t = 1) and ends on another (say, t = T), giving observations (y1, y2, ..., yT) gathered at equidistant time intervals, as shown in Equation 1:

T = (td0, td1, ..., tdt), d ∈ N+, t ∈ N (1)
Trend- A time series with a non-constant mean that either increases or decreases over
time is considered to have a trend, which can be linear or non-linear. Trends show significant
changes in the value of a series over time and typically last a few weeks before fading,
without repeating. Similarly, newly released music tends to be popular for a brief period
before dissipating, but there is a high likelihood of it becoming popular again soon [31]. The equation below creates a deterministic trend, as shown in Equation 2:

yt = β0 + β1 t + ηt (2)

where ηt is an ARIMA process. Differencing both sides gives Equation 3:

y′t = β1 + η′t , equivalently yt = yt−1 + β1 + η′t (3)

where η′t is an ARIMA process.
Stationarity- Stationarity refers to a crucial property of stochastic processes where the
distribution of a finite sub-sequence of random variables remains unchanged when shifted
along the time index axis. It is formally defined as strict-sense stationarity and is fundamental in understanding stochastic processes [32].
Seasonality- Seasonality refers to regular, patterned changes in time series data influenced by factors like time of year or day of the week [33]. It repeats within a year and has a predetermined length. In economic sectors, seasonality
captures year-to-year patterns and fluctuations, aiding in understanding and predicting in-
dustry behavior. The equation for the seasonal component equation is as follows in Equation
4:
st = γ∗(1 − α)(yt − lt−1 − bt−1) + [1 − γ∗(1 − α)] st−m (4)

which matches the smoothing equation for the seasonal component, with γ = γ∗(1 − α); the usual restriction 0 ≤ γ∗ ≤ 1 translates to 0 ≤ γ ≤ 1 − α.
Cyclic-Time series can exhibit cyclic patterns when they display increases and declines
without a predetermined time. These oscillations, lasting two years or more, are different
from seasonal behavior, which has a fixed period related to a calendar component [34].
Cycles have longer average durations and more variable amplitudes compared to seasonal
patterns.
Concept Drift- Concept drift refers to changing behavior in the target variable or
feature-target relationships over time, impacting model accuracy. [35] Adapting to concept
drift involves monitoring, adjusting, and periodically retraining models to mitigate its effects,
especially in complex phenomena influenced by human activities like sales prediction and
socio-economic processes.
Level- The mean of a time series corresponds to its level. In the case of a patterned time
series, the level is typically described as dynamic, reflecting its changing nature. Conversely,
a stationary time series maintains constant characteristics over time, allowing for formal rep-
resentation and analysis. Stationary time series exhibit consistent properties throughout,
including a fixed mean, variance, and autocorrelation structure. Recognizing the distinction
between changing levels in patterned time series and the stationary nature of others facili-
tates a more systematic understanding of the data, reducing the need for extensive external
research. [36] If the distribution of (xt ,..., xt+s ) is the same for all s, then Xt is a stationary
time series. A stationary time series x1 ,..., xT has the following features, according to the
fact that there is no trend in the time series since the mean is constant. The variance of the
time series remains constant, indicating a consistent spread of data points. Additionally,
the autocorrelation remains unchanged throughout the time series, indicating a persistent
relationship between data points and their lagged values.
White Noise-Noise in time-series data refers to random variations or errors, often caused
by measurement or sampling errors. Removing or reducing noise is crucial for accurate
analysis. White noise is a type of noise with a constant mean and finite variance, where
the PACF and ACF are both zero, indicating no dependency between timestamps. Many
theoretical models assume the white noise to be Gaussian: ε ∼ N(0, σ²). A univariate time series consists of scalar observations collected regularly, such as monthly CO2 concentrations, while a multivariate time series includes multiple variables per timestamp, expressed as Xt = (x1,t, x2,t, ..., xd,t) with Xt ∈ R^d. A p-th order VAR model then treats Xt as a (d×1) vector, as shown in Equation 5:

Xt = c + O1 Xt−1 + O2 Xt−2 + ... + Op Xt−p + εt (5)

where each Oi is a d×d coefficient matrix and, as in the univariate AR model, εt is white noise.
3. Applications
Industries leverage digital technologies like AI, IoT, and big data analytics to enhance
competitiveness. Integration of digital components with real-world phenomena using sen-
sors and data analytics enables proactive strategies for predictive maintenance, reducing
downtime and improving efficiency. Anomaly detection powered by ML detects abnormal-
ities, minimizing downtime, enhancing efficiency, and enabling informed decisions across
sectors like finance, manufacturing, healthcare, and retail. Through these discussions, we
aim to highlight the versatility and value of anomaly detection in addressing unique industry
requirements and improving operational outcomes.
AutoRegressive Model. An autoregressive (AR) model is a statistical time series model that
predicts future values based on previous observations in the same series. In an AR model,
the current value of the series is assumed to be a linear combination of its past values, with
the coefficients of the past values known as autoregressive parameters. The order of the
model, denoted as AR(p), indicates the number of past values used in the prediction, where
’p’ represents the order. The model assumes that the residual errors are uncorrelated and
have constant variance. The parameters of the AR model are estimated using techniques
such as maximum likelihood estimation or least squares. In this linear model, the present
value Xt (dependent variable) of the stochastic process is determined by a finite set of past
values (independent variables) of length p, along with an associated error term ε, as shown in Equation 6:

Xt = a1 Xt−1 + a2 Xt−2 + ... + ap Xt−p + c + εt (6)

AR (AutoRegressive) models of order p assume data stationarity and use least-squares regression to estimate the coefficients a1, ..., ap and c. However, they have limitations, such as difficulty capturing nonlinear dynamics, challenges in selecting the optimal model order, and vulnerability to overfitting with limited observations; they also require complete data [54].
Moving Averages Model (MA). In the AR model, Xt is a linear combination of the last p observed values xt−1, ..., xt−p, which are known at prediction time. In the MA model, Xt is instead a linear combination of the last q prediction errors, as shown in Equation 7:

Xt = a1 εt−1 + a2 εt−2 + ... + aq εt−q + µ + εt (7)

Unlike the past observations in the AR model, the errors εt, εt−1, ..., εt−q are not directly observed; because MA models have no closed-form solution, their parameters must be estimated iteratively with a non-linear optimization approach [55]. MA models excel at short-term fluctuations
but struggle with long-term trends, complex patterns, outliers, non-stationary data, and
incorporating external variables. Addressing these limitations and enhancing forecasting
accuracy necessitate careful consideration and potential modifications.
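Since MA parameters require iterative estimation, a simpler illustration is to simulate the MA(q) process of Equation 7 directly; the coefficients and mean below are illustrative assumptions:

```python
import random

def simulate_ma(coeffs, mu, n, seed=0):
    """Simulate X_t = a_1*eps_{t-1} + ... + a_q*eps_{t-q} + mu + eps_t."""
    q = len(coeffs)
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + q)]  # white-noise shocks
    return [mu + eps[t + q]
            + sum(a * eps[t + q - 1 - i] for i, a in enumerate(coeffs))
            for t in range(n)]

series = simulate_ma([0.6, 0.3], mu=10.0, n=500)
print(abs(sum(series) / len(series) - 10.0) < 0.5)  # sample mean stays near mu
```

Because only the last q shocks influence each value, the autocorrelation of an MA(q) series cuts off after lag q, which is the standard diagnostic for choosing q.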
Simple Exponential Smoothing (SES). SES [21] is a form of exponential smoothing. SES forecasts from previous time series data, assigning exponentially higher weights to the most recent observations, as shown in Equation 10:

Xt+1 = βXt + β(1 − β)Xt−1 + β(1 − β)² Xt−2 + . . . + β(1 − β)^N Xt−N (10)

Exponential Smoothing (ES) extends to Double and Triple Exponential Smoothing for trending and non-stationary data: Double Exponential Smoothing handles trend with an additional parameter, while Triple Exponential Smoothing also models seasonality with γ. SES itself has limitations: it cannot capture trend or seasonality, is sensitive to initial parameter selection, lacks adaptability, and offers limited means of incorporating external factors. Accurate forecasting requires careful evaluation of data characteristics and exploration of alternative methods.
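The weighted sum in Equation 10 is equivalent to a simple recursion, sketched below; the smoothing-parameter values are illustrative assumptions:

```python
def ses_forecast(series, beta):
    """One-step-ahead SES forecast: the newest observation gets weight beta,
    older ones decay geometrically by (1 - beta)."""
    level = series[0]                  # initialize with the first observation
    for x in series[1:]:
        level = beta * x + (1 - beta) * level
    return level

print(ses_forecast([10.0, 10.0, 10.0], beta=0.3))  # constant series stays at 10
print(ses_forecast([1.0, 2.0, 9.0], beta=1.0))     # beta=1 keeps only the last value
```

The recursion avoids storing the full history, which is why SES is popular for streaming settings despite its modeling limitations.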
Time series Outlier Detection using Prediction Confidence Interval (PCI). The Prediction Confidence Interval (PCI) [57] is an approach for estimating the uncertainty of a prediction. This method predicts the next data point from a succession of non-linearly weighted prior data points; a threshold then determines whether a data point is normal or anomalous. The method establishes an upper and lower bound for identifying anomalies, as shown in Equation 11:

PCI = Yt ± t_{α,2k−1} · s · √(1 + 1/n) (11)

where t_{α,2k−1} is the p-th percentile of Student's t-distribution with 2k−1 degrees of freedom, s is the model residual standard deviation, and k is the window size used to calculate s. Anomaly detection in time series using PCI has limitations and requires careful consideration for accurate results.
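The thresholding rule of Equation 11 can be sketched as follows; the critical value is a hard-coded illustrative number rather than a computed t-percentile:

```python
import math

def pci_is_anomaly(y_obs, y_pred, s, n, t_crit):
    """Flag y_obs when it falls outside y_pred +/- t_crit * s * sqrt(1 + 1/n)."""
    half_width = t_crit * s * math.sqrt(1.0 + 1.0 / n)
    return abs(y_obs - y_pred) > half_width

# With residual std s=1 and t_crit=2, the half-width is about 2.05:
print(pci_is_anomaly(10.3, 10.0, s=1.0, n=20, t_crit=2.0))  # False
print(pci_is_anomaly(15.0, 10.0, s=1.0, n=20, t_crit=2.0))  # True
```

In a full implementation the critical value would come from the t-distribution at the chosen confidence level rather than being fixed.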
Average of anomaly scores of multiple univariate time series. Anomaly scores are computed
for each univariate time series, with the average score representing the overall anomaly score
for the multivariate time series. However, this approach overlooks relationships between vari-
ables, fails to aggregate outlier information, lacks consistent scaling, and is computationally
complex. To improve accuracy, consider evaluating data characteristics, exploring multi-
variate anomaly detection techniques, and addressing limitations such as interdependencies
and context-specific weighting.
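The averaging scheme described above reduces to a per-timestamp mean over the univariate score series; the scores below are made-up numbers:

```python
def average_scores(per_variable_scores):
    """Combine univariate anomaly scores into one multivariate score
    by averaging across variables at each timestamp."""
    return [sum(col) / len(col) for col in zip(*per_variable_scores)]

# Two variables, three timestamps: a single variable's spike at t=2
# is diluted by the averaging, one of the limitations noted above.
scores = [[0.1, 0.2, 0.9],
          [0.1, 0.1, 0.1]]
print(average_scores(scores))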
Detecting Outliers in Time Series Using PCI. Multivariate time series are treated similarly
to univariate time series, where the PCI approach is applied independently to each univariate
time series. However, using PCI for outlier detection has drawbacks, including reliance on
accurate prediction models, sensitivity to assumptions, challenges in setting appropriate
thresholds, and computational complexity. Careful consideration of these limitations is
necessary to ensure accurate and reliable outlier detection in time series using PCI.
Vector based Autoregressive Model (VAR). The VAR model is the vector-based extension of the AR model to multivariate time series. It is based on the notion that each timestamp is influenced by prior timestamps and by the values of the other variables. In this model, a multivariate timestamp is expressed as Xt = (x1,t, x2,t, ..., xd,t) with Xt ∈ R^d. As a result, Xt is a (d×1) vector, as shown in Equation 14:

Xt = c + Θ1 Xt−1 + Θ2 Xt−2 + . . . + Θp Xt−p + εt (14)
In the same manner as the AR-Model is for univariate time series, εt is white noise.
Anomalies in a multivariate time series are detected by fitting a p-th order VAR model and
examining the deviation ε. Krishnan et al. [66] use the Euclidean or Mahalanobis distance to
measure the deviation between observed and predicted timestamps. However, VAR models
have drawbacks such as assumptions of linearity and stationarity, data requirements, com-
putational intensity, limited handling of exogenous variables, and challenges in coefficient
interpretation. Careful model selection and interpretation are crucial for accurate time series
analysis with VAR models.
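A sketch of the residual-based detection described above, using a known VAR(1) coefficient matrix; the coefficients and the injected anomaly are illustrative assumptions (in practice Θ and c are estimated from data):

```python
import math

def var1_residuals(X, Theta, c):
    """eps_t = X_t - (c + Theta @ X_{t-1}) for t = 1..T-1 (known VAR(1))."""
    d = len(c)
    res = []
    for t in range(1, len(X)):
        pred = [c[i] + sum(Theta[i][j] * X[t - 1][j] for j in range(d))
                for i in range(d)]
        res.append([X[t][i] - pred[i] for i in range(d)])
    return res

def euclidean(v):
    """Deviation score: Euclidean norm of the residual vector."""
    return math.sqrt(sum(x * x for x in v))

Theta = [[0.5, 0.0], [0.0, 0.5]]
c = [1.0, 1.0]
X = [[0.0, 0.0], [1.0, 1.0], [1.5, 1.5], [5.0, 5.0]]  # last point injected
scores = [euclidean(r) for r in var1_residuals(X, Theta, c)]
print(scores)  # the injected timestamp has a much larger deviation
```

Replacing the Euclidean norm with the Mahalanobis distance, as in [66], additionally accounts for correlations between the residual dimensions.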
One-Class Support Vector Machine (OC-SVM). Kashiwao et al. [91] employed OC-SVM to
identify anomalies in multivariate temporal data. They use a sliding window of width w that splits the time series data {XT} into overlapping windows, as shown in Equation 25:

XW := (W1, W2, ..., Wp) = ((x1, ..., xw), ..., (xp, ..., xT)) (25)
Each element of the processed data, represented by xj ∈ XW , contains w timestamps
of N dimensions. After processing the data, features such as minimum, maximum, mean,
median, standard deviation, average crossings, and squared error are calculated for each
dimension. These features are then normalized before being used as input for the OC-SVM,
and the multivariate time series is transformed into a univariate time series. Before DL
became popular, people built a number of mathematical and statistical models to analyze
time series data, and these models were widely used in a variety of industries. We now look
at a few standard strategies as well as some challenges that are yet to be solved. Table 2
presents a comprehensive overview of different types of anomaly detection techniques along
with the corresponding algorithms used.
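The window-and-feature preprocessing step described for the OC-SVM above can be sketched as follows; only a subset of the listed features is shown, and a univariate series is used for brevity:

```python
from statistics import mean, pstdev

def window_features(series, w):
    """Slide a width-w window over the series and summarize each window
    with (min, max, mean, std) features, as input for a one-class model."""
    feats = []
    for i in range(len(series) - w + 1):
        win = series[i:i + w]
        feats.append((min(win), max(win), mean(win), pstdev(win)))
    return feats

print(window_features([1.0, 2.0, 3.0, 4.0], w=2))
# three windows: (1,2), (2,3), (3,4), each reduced to four features
```

As in [91], the feature vectors would then be normalized before training the one-class classifier on normal data.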
4.2. Modeling Deep Learning Models for Univariate and Multivariate Time Series Data
Neural networks have gained interest in time series forecasting and analysis due to their
performance in computer vision tasks and their ability to learn from data without assump-
tions about the data creation process. Researchers have compared neural networks to traditional methods like ARIMA for time series analysis and forecasting [105][106]. Neural networks have also been applied to anomaly detection in both univariate and multivariate time series data [107][108][109].
Gated Recurrent Unit (GRU) . GRU, a simpler alternative to LSTM, combines input and
forget gates, outputs the entire state vector, and performs comparably with lower computational requirements. The autoencoder computes the following, as shown in Equation 32:

min_{θ,φ} ||(xi, xi+1, ..., xi+w) − (x′i, x′i+1, ..., x′i+w)||₂² , ∀i ∈ {1, . . . , T − w} (32)
Anomaly detection uses an anomaly threshold, where xj is identified as anomalous if
ej > δ, and autoencoders are effective for semi-supervised learning, learning a latent space
of normal data points and detecting deviations when anomalies are introduced.
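The reconstruction-error rule above (flag xj when ej > δ) can be sketched with the trained autoencoder abstracted as a function; the stand-in `reconstruct` and the threshold are illustrative assumptions:

```python
def detect(windows, reconstruct, delta):
    """Flag a window anomalous when its reconstruction error exceeds delta."""
    flags = []
    for win in windows:
        recon = reconstruct(win)
        err = sum((a - b) ** 2 for a, b in zip(win, recon))  # squared L2 error
        flags.append(err > delta)
    return flags

# Stand-in for a trained (GRU) autoencoder that reproduces the normal level.
reconstruct = lambda w: [1.0] * len(w)
windows = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0]]
print(detect(windows, reconstruct, delta=0.5))  # [False, True]
```

Because the autoencoder is trained only on normal windows, anomalous windows reconstruct poorly, which is what the error threshold exploits.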
Convolution Neural Network (CNN). CNNs are applied for anomaly detection in multivari-
ate time series, where the time series is split into univariate series and fed into separate
convolution blocks, with the resulting channels combined and classified using an MLP; an-
other approach involves using a single CNN with a sliding window and two convolution
blocks to make a multidimensional prediction [118]. To determine whether xi is an anomaly, the Euclidean distance between the prediction and the real value is computed, as shown in Equation 33:

ei = √((yi − ȳi)²) (33)
This value is used as an anomaly score. Using the training data, a threshold δ can
be computed, analyzing the distribution of ei ,∀i ∈ [1, T]. In online anomaly detection, a
sequence is considered abnormal if an abnormal data point is present, combining autoencoder
and SVM methods. Probability distribution of errors can also be computed to classify a
sequence as anomaly based on a pre-computed δ. Table 3 summarizes the pros and cons of
the anomaly detection models, aiding in the selection of the most suitable model based on
strengths, limitations, and performance metrics like precision, recall, and F1-score.
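A sketch of computing the anomaly score of Equation 33 and deriving δ from the training-error distribution; the three-sigma rule and the toy numbers are illustrative choices:

```python
from statistics import mean, stdev

def anomaly_scores(y_true, y_pred):
    """Per-point Euclidean distance between observation and prediction."""
    return [abs(a - b) for a, b in zip(y_true, y_pred)]

def fit_threshold(train_errors, k=3.0):
    """delta = mean + k*std of the training-set errors (three-sigma rule)."""
    return mean(train_errors) + k * stdev(train_errors)

train_errors = [0.1, 0.2, 0.15, 0.1, 0.2]
delta = fit_threshold(train_errors)
test_scores = anomaly_scores([1.0, 2.0, 9.0], [1.1, 2.1, 2.0])
print([e > delta for e in test_scores])  # only the badly missed point is flagged
```

Analyzing the full empirical distribution of ei, as the text suggests, allows quantile-based thresholds instead of the Gaussian three-sigma assumption.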
Long Short Term Memory. LSTM networks are also used to detect anomalies in multivariate
time series. The approach is very similar to the method used for univariate time series.
Carletti et al. [82] used the same approach for multivariate time series. Let {XT} ∈ R^{T×D} be a multivariate time series with T timestamps (x1, ..., xT), where each timestamp xi is a D-dimensional vector (x1i, x2i, ..., xDi). The error for a multivariate timestamp xi is computed as shown in Equation 34:

e(i) = (f(x_{i−l}^{l}) − xi+1, f(x_{i−l+1}^{l−1}) − xi+1, . . . , f(x_{i}^{1}) − xi+1) (34)
For anomaly detection in multivariate time series, the same approach as in univariate se-
ries is applied: fitting the error vector to a multivariate Gaussian distribution and estimating
parameters using Maximum Likelihood Estimation (MLE), where a threshold δ determines
if a data point is an anomaly. GRU models can be used similarly to LSTM, with GRU
cells replacing LSTM cells. Table 4 summarizes the parameters used in the anomaly detec-
tion method, providing important details for replicating experiments and understanding the
model’s configuration.
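A 2-D sketch of the Gaussian MLE fit and Mahalanobis-style scoring described above; the error vectors are made up, whereas in the method they come from the LSTM prediction errors:

```python
def fit_gaussian(errors):
    """MLE mean vector and covariance matrix of 2-D error vectors."""
    n = len(errors)
    mu = [sum(e[k] for e in errors) / n for k in (0, 1)]
    cov = [[sum((e[i] - mu[i]) * (e[j] - mu[j]) for e in errors) / n
            for j in (0, 1)] for i in (0, 1)]
    return mu, cov

def mahalanobis_sq(e, mu, cov):
    """Squared Mahalanobis distance d^T cov^-1 d for a 2-D vector."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [e[0] - mu[0], e[1] - mu[1]]
    return (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))

normal_errors = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
mu, cov = fit_gaussian(normal_errors)
print(mahalanobis_sq((0.0, 0.0), mu, cov))  # in-distribution: small
print(mahalanobis_sq((3.0, 3.0), mu, cov))  # far outside: exceeds any sensible delta
```

A point is flagged when its distance exceeds the threshold δ chosen on held-out normal data; for D > 2 the explicit inverse would be replaced by a general linear solver.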
• Define the problem and the objectives of the anomaly detection system.
• Collect high-quality data that represents the system and its normal behavior.
• Preprocess and transform the data to ensure that it is suitable for the chosen detection
method.
• Select an appropriate detection method that matches the problem and the data char-
acteristics.
• Train the detection model on the historical data and validate its performance using
appropriate metrics.
• Integrate the detection model into the system and continuously monitor its perfor-
mance.
• Take appropriate actions when an anomaly is detected, such as alerting the responsible
personnel, triggering a response system, or shutting down the system.
By following these guidelines, practitioners can effectively use real-time detection methods
for anomaly detection and respond promptly to any incidents that occur. Here are some
novel guidelines for model developers to evaluate and fine-tune anomaly detection models:
1. Incorporate real-world feedback: Collect feedback from domain experts or end-
users about the anomalies detected by the model. This feedback can help refine the model’s
performance and improve its ability to detect relevant anomalies. [142]
2. Monitor model drift: Continuously monitor the performance of the model over
time and detect any drift or changes in the data that may affect its performance. Regularly
retrain the model to keep it up-to-date with the latest data. [143]
3. Use multiple evaluation metrics: Use a combination of evaluation metrics, such
as precision, recall, F1-score, ROC-AUC, and precision-recall curves, to assess the model’s
performance. This will help identify any weaknesses in the model’s performance and enable
targeted improvements. [143]
4. Use synthetic data: Create synthetic data sets that include a variety of anomalous
behavior that may not be present in the real-world data. This can help improve the model’s
ability to detect and classify anomalies that it may not have encountered in the training
data. [144]
5. Incorporate external data sources: Incorporate external data sources, such as
weather or social media data, into the model’s training data. This can help the model identify
anomalous behavior that may be related to external factors not present in the original data
set. [145]
6. Consider ensemble methods: Consider using ensemble methods, such as stacking
or bagging, to improve the model’s performance. These methods can help overcome the
limitations of individual models and improve the overall accuracy of the anomaly detection
system [146].

Denoising techniques enhance accuracy and efficiency in anomaly detection by removing noise and disturbances from time-series data, allowing algorithms to focus on the underlying patterns and anomalies of interest. Here are guidelines for applying them:
1. Understand the Noise Characteristics: Gain a deep understanding of the noise
present in the time-series data. Different types of noise can be encountered, such as ran-
dom noise, measurement errors, outliers, or systematic noise due to specific sources. [147]
By identifying and characterizing the noise, practitioners can select appropriate denoising
techniques that effectively mitigate the specific noise sources.
2. Preprocessing Techniques: Apply preprocessing techniques to clean and preprocess
the data before denoising. This may include handling missing values, normalizing or scaling
the data, or handling outliers. Preprocessing ensures that denoising techniques operate on
reliable and consistent data, leading to more accurate anomaly detection results.
3. Select Suitable Denoising Algorithms: Explore various denoising algorithms to
select the most suitable one for the specific characteristics of the time-series data. Common
denoising techniques include moving average, median filtering, low-pass filters, wavelet trans-
forms, or autoencoders. Each algorithm has its advantages and assumptions, so practitioners
should consider the noise characteristics, computational efficiency, and the preservation of
underlying patterns when choosing a denoising technique.
4. Trade-Off between Noise Removal and Pattern Preservation: Strive for a
balance between noise removal and preserving the important patterns and anomalies in
the data. Aggressive denoising may unintentionally remove valuable information, making it
challenging to distinguish anomalies from the denoised data. [148] Experiment with different
denoising parameters and evaluate the impact on anomaly detection performance to find the
optimal trade-off.
5. Evaluate Denoising Impact: Assess the impact of denoising on the anomaly
detection task. Evaluate the performance of the anomaly detection algorithm both with
and without denoising to determine if denoising improves the accuracy, precision, recall, or
other relevant metrics. It is essential to ensure that the denoising process does not introduce
biases or distortions that may hinder the anomaly detection process.
6. Adaptive Denoising: Consider adaptive denoising techniques that dynamically
adjust the denoising process based on the data characteristics or the specific anomaly de-
tection task. Adaptive denoising methods can automatically adjust denoising parameters,
thresholds, or filtering techniques based on the noise levels or underlying patterns present in
the data. [149] This flexibility helps in handling varying noise intensities or changing data
distributions.
7. Iterative Approach: Adopt an iterative approach by combining denoising and
anomaly detection in a feedback loop. Start with an initial denoising step, followed by
anomaly detection on the denoised data. [150] Analyze the detected anomalies and reassess
the denoising process to refine it further. Iteratively refine both denoising and anomaly
detection steps to achieve better results.
8. Domain Knowledge Integration: Incorporate domain knowledge and expert in-
sights into the denoising process. Domain experts can provide valuable guidance in under-
standing the noise sources, identifying relevant features, and selecting appropriate denoising
techniques that align with the domain-specific characteristics and requirements.
By following these guidelines, practitioners can effectively employ denoising techniques
to enhance the accuracy and reliability of anomaly detection in time-series data, enabling
the identification of meaningful anomalies while minimizing the impact of noise and distur-
bances.
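Two of the denoising techniques listed above (moving average and median filtering) can be sketched as follows; the window sizes and the toy series are illustrative choices:

```python
from statistics import median

def moving_average(series, w):
    """Smooth by averaging each width-w window; dampens random noise."""
    return [sum(series[i:i + w]) / w for i in range(len(series) - w + 1)]

def median_filter(series, w):
    """Replace each width-w window by its median; robust to spike outliers."""
    return [median(series[i:i + w]) for i in range(len(series) - w + 1)]

noisy = [1.0, 1.0, 9.0, 1.0, 1.0]           # a single spike
print(moving_average(noisy, w=3))            # spike is spread across windows
print(median_filter(noisy, w=3))             # spike is removed entirely
```

The median filter removes the spike while the moving average only dilutes it, illustrating the trade-off between noise removal and pattern preservation discussed in guideline 4: if the spike were a genuine anomaly, the median filter would hide it from the detector.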
7. References
1. Wang, R., Nie, K., Wang, T., Yang, Y. and Long, B., 2020, January. Deep learning for
anomaly detection. In Proceedings of the 13th international conference on web search
and data mining (pp. 894-896).
2. Al-amri, R., Murugesan, R.K., Man, M., Abdulateef, A.F., Al-Sharafi, M.A. and Alka-
htani, A.A., 2021. A review of machine learning and deep learning techniques for
anomaly detection in IoT data. Applied Sciences, 11(12), p.5320.
3. May Petry, L., Soares, A., Bogorny, V., Brandoli, B. and Matwin, S., 2020. Challenges
in vessel behavior and anomaly detection: From classical machine learning to deep
learning. In Advances in Artificial Intelligence: 33rd Canadian Conference on Artificial
Intelligence, Canadian AI 2020, Ottawa, ON, Canada, May 13–15, 2020, Proceedings
33 (pp. 401-407). Springer International Publishing.
4. Bekmez, S., Afandiyev, A., Dede, O., Karaismailoğlu, E., Demirkiran, H.G. and Yazici,
M., 2019. Is magnetically controlled growing rod the game changer in early-onset
scoliosis? A preliminary report. Journal of Pediatric Orthopaedics, 39(3), pp.e195-
e200.
5. Koubaa, A., Boulila, W., Ghouti, L., Alzahem, A. and Latif, S., 2023. Exploring ChatGPT capabilities and limitations: A critical review of the NLP game changer.
6. Kirchgässner, G., Wolters, J. and Hassler, U., 2012. Introduction to modern time
series analysis. Springer Science & Business Media.
Author biography
Anomaly detection: High computational cost. Because online mode requires minimal latency, many current costly computational models are inapplicable.

GAN (BiGAN) [121]. Pros: variational inference is supported. Cons: to get reliable results, a large quantity of training data and a longer training period (more epochs) are required [9].

Sequence-to-Sequence Model [122]. Pros: appropriate for data having temporal components (e.g., discretized time series data); variational inference is supported. Cons: slow inference; training may be time-consuming.

One Class SVM [123]. Pros: does not need a vast quantity of data; quick to train; quick inference time. Cons: limited capability to capture complicated correlations within the data.
Variational Autoencoder [61]. Encoder: 2 hidden layers [15, 7]. Decoder: 2 hidden layers [15, 7]. Latent dimension: 2; batch size: 256; loss: mean squared error + KL divergence.

Sequence-to-Sequence Model [72]. Encoder: 1 hidden layer [10]. Decoder: 1 hidden layer [20]. Bidirectional LSTMs; batch size: 256; loss: mean squared error.

Bidirectional GAN [56]. Encoder: 2 hidden layers [15, 7]; generator: 2 hidden layers [15, 7]; discriminator: 2 hidden layers [15, 7]. Latent dimension: 32; loss: binary cross-entropy; learning rate: 0.1.
Anomaly [131]
MAD-GAN [132]: 0.9593 0.6957 0.8463 0.8065 0.2233 0.9124 0.8026 0.3588 0.9311 0.9436 0.9632 0.7701 0.9761 0.8613 0.8813 0.9791 0.9312
USAD [133]: 0.9977 0.6879 0.846 0.8143 0.1873 0.8296 0.8723 0.3056 0.7323 0.7812 0.7625 0.7921 0.9831 0.8812 0.7892 0.9742 0.8713
MTAD-GAT [134]: 0.9718 0.6957 0.8464 0.8109 0.2818 0.8012 0.8821 0.4169 0.9472 0.9781 0.9621 0.8000 0.9912 0.8913 0.9361 0.9813 0.9614
CAE-M [135]: 0.9697 0.6957 0.8464 0.8101 0.2782 0.7918 0.8728 0.4117 0.9811 0.9832 0.9821 0.8321 0.9946 0.9012 0.9212 0.9913 0.9514
GDN [136]: 0.9591 0.6957 0.8462 0.8101 0.2912 0.7931 0.8777 0.426 0.9824 0.9862 0.9512 0.8401 0.9932 0.9013 0.9041 0.9923 0.9526
GRN-50 [137]: 0.997 0.592 0.878 0.738 0.965 0.2497 0.7832 0.3981 0.9839 0.9641 0.9865 0.8261 0.9951 0.9012 0.918 0.9932 0.9611