Article
Time-Series Neural Network: A High-Accuracy Time-Series
Forecasting Method Based on Kernel Filter and Time Attention
Lexin Zhang 1, Ruihan Wang 1, Zhuoyuan Li 1, Jiaxun Li 1, Yichen Ge 1, Shiyun Wa 2, Sirui Huang 1 and Chunli Lv 1,*
Abstract: This research introduces a novel high-accuracy time-series forecasting method, namely the
Time Neural Network (TNN), which is based on a kernel filter and time attention mechanism. Taking
into account the complex characteristics of time-series data, such as non-linearity, high dimensionality,
and long-term dependence, the TNN model is designed and implemented. The key innovations of
the TNN model lie in the incorporation of the time attention mechanism and kernel filter, allowing
the model to allocate different weights to features at each time point, and extract high-level features
from the time-series data, thereby improving the model’s predictive accuracy. Additionally, an
adaptive weight generator is integrated into the model, enabling the model to automatically adjust
weights based on input features. Mainstream time-series forecasting models such as Recurrent Neural
Networks (RNNs) and Long Short-Term Memory Networks (LSTM) are employed as baseline models
and comprehensive comparative experiments are conducted. The results indicate that the TNN
model significantly outperforms the baseline models in both long-term and short-term prediction
tasks. Specifically, the RMSE, MAE, and R2 reach 0.05, 0.23, and 0.95, respectively. Remarkably, even
for complex time-series data that contain a large amount of noise, the TNN model still maintains a high prediction accuracy.

Keywords: time-series forecasting; deep learning; time-series neural network; time attention

Citation: Zhang, L.; Wang, R.; Li, Z.; Li, J.; Ge, Y.; Wa, S.; Huang, S.; Lv, C. Time-Series Neural Network: A High-Accuracy Time-Series Forecasting Method Based on Kernel Filter and Time Attention. Information 2023, 14, 500. https://doi.org/10.3390/info14090500

Academic Editors: Binbin Yong and Francesco Camastra

Received: 1 August 2023; Revised: 3 September 2023; Accepted: 7 September 2023; Published: 13 September 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
With the rapid development of global financial markets, the stock market has increasingly become a significant choice for investors. In the stock market, the accuracy of stock price prediction directly influences investors' decisions and is crucial for the health and stability of economic activities. However, stock price prediction poses a formidable challenge. Stock prices are influenced by numerous factors, including but not limited to macroeconomic conditions, company performance reports, market sentiment, and even global political dynamics. The interweaving of these factors causes stock prices to exhibit a high degree of uncertainty and non-linearity, which adds significant difficulty to forecasting.
In recent years, deep learning has made considerable contributions in fields such as agriculture [1,2], healthcare [3,4], energy usage [5], and finance [6]. This development provides a new solution for the time-series prediction problem. Neural network models have gradually been widely used in stock price prediction due to their advantages in processing non-linear data and capturing long-distance dependencies. Nevertheless, most existing prediction models based on neural networks often overlook a critical issue: the temporal attributes of stock prices and their importance. In reality, the impact of past price trends on future prices is not equal; recent price changes often have a more significant effect on future price predictions.
In recent decades, many researchers and practitioners have tried to predict stock prices us-
ing various methods, including time-series-based prediction methods [7,8], machine learning-
based prediction methods [9,10], deep learning-based prediction methods [11–14], and so on.
However, due to the characteristics of stock prices, such as non-linearity, high noise, and vari-
ability, it is often difficult to achieve the desired prediction results with these methods [15–21].
To address this problem, a time-series neural network method based on Kernel Fil-
ter and Time Attention is proposed in this paper, both of which are novel applications
developed by the authors for achieving higher accuracy in stock price prediction. Firstly,
the Kernel Filter is incorporated into the neural network model to effectively extract the
features of time-series data, especially in handling data with noise. This is a novel applica-
tion aiming to improve upon existing filtering techniques in neural networks. By applying
Kernel Filter, it is possible to capture the underlying trends of stock prices more accurately
and eliminate irrelevant noise interference, thereby enhancing the accuracy of predictions.
Secondly, a novel Time Attention mechanism is designed that assigns higher weights to
recent data, a unique approach developed to extend the capabilities of existing attention
mechanisms in capturing the temporal characteristics of stock prices. The advantage of this
approach is that it can more effectively capture the dynamics of recent prices, which often
serves as a crucial factor in predicting future prices. With these two innovative designs,
the proposed model considers the characteristics of time-series data, effectively extracts
data features, and pays more attention to recent data, thereby achieving higher accuracy in
stock price prediction.
In addition to introducing these innovative techniques, a series of carefully designed
experiments was conducted to measure the model’s performance against established
benchmarks in the field, such as RNN and LSTM. Our findings confirm that the TNN
stands up exceptionally well when challenged with various forecasting tasks, making it
particularly suitable for predicting stock prices. Notably, the model’s performance remains
robust even when applied to noisy, complex time-series data. Detailed evaluations and
comparisons are presented in the subsequent sections, reaffirming the model’s superior
predictive power with noteworthy metrics.
In the future, there are plans to further optimize the model and verify it on more
financial datasets, with the aim of further enhancing the model’s generalization ability and
prediction accuracy.
2. Related Work
Time-series forecasting has continually served as a research hotspot in the field of
finance, with its core premise being to decipher patterns from historical data to predict
future price fluctuations. To tackle this issue, researchers have implemented a variety of
machine learning methods, which include both traditional machine learning methods and
deep learning techniques.
Linear regression is one of the simplest traditional methods, modeling the output as a linear function of the inputs:

y = aX + b (1)
where X denotes the input variables, y the output variables, and a and b the model parame-
ters to be learned. However, as stock prices are influenced by a multitude of factors, the
inherent laws are often non-linear. Consequently, the linear regression model struggles to
capture this complexity.
Support Vector Machine, a common method for both classification and regression,
operates by finding an optimal hyperplane to separate the data, thereby achieving the goal
of prediction. For regression problems, the form of SVM is as follows:
f(X) = ⟨w, φ(X)⟩ + b (2)

where φ(X) represents the feature mapping of the input variables X, w and b are the model parameters to be learned, and ⟨w, φ(X)⟩ denotes the inner product of w and φ(X). Although
SVM can handle non-linear problems, its high computational complexity when applied to
high-dimensional and large-scale datasets proves to be a substantial obstacle.
Traditional machine learning methods like SVM, Random Forests, and Decision Trees
often encounter several limitations in the context of stock price prediction. SVMs [25], while
effective for linearly separable problems, struggle with handling high dimensionality and
require substantial tuning, including the choice of an appropriate kernel function for non-
linear financial time-series data. Random Forests [26], although they offer an improvement
over Decision Trees by ensemble learning, still suffer from high computational complexity
and can underperform when dealing with highly noisy and volatile markets. Decision
Trees, on the other hand, are simple to implement and interpret but are prone to overfitting,
especially when grappling with the complex, noisy, and erratic nature of stock markets.
These methods often require manual feature engineering and generally fail to capture the
intricate, non-linear patterns and long-term dependencies that are inherent to financial
time-series data.
Among deep learning techniques, the Recurrent Neural Network (RNN) processes a sequence step by step, maintaining a hidden state that summarizes the history seen so far. Its basic formulation is:

ht = σ(Whh ht−1 + Wxh xt + bh) (3)
yt = Why ht + by (4)
where xt is the input, ht the hidden state, yt the output, σ the activation function, Whh ,
Wxh , and Why the weight parameters, and bh and by the bias parameters. Although RNNs
can handle sequence data, they suffer from vanishing and exploding gradients in long
sequences, making it challenging to capture long-term dependencies.
LSTM, an improved RNN, introduces a gating mechanism to resolve the issue of
long-term dependencies. The basic formula of LSTM is as follows:
ft = σ(Wf [ht−1, xt] + bf)
it = σ(Wi [ht−1, xt] + bi)
C̃t = tanh(WC [ht−1, xt] + bC)
Ct = ft ∗ Ct−1 + it ∗ C̃t (5)
ot = σ(Wo [ht−1, xt] + bo)
ht = ot ∗ tanh(Ct)
where f t , it , and ot are the forget gate, input gate, and output gate, respectively, Ct is
the cell state, σ is the sigmoid function, tanh is the tanh function, ∗ represents element-
wise multiplication, and [ht−1 , xt ] denotes the concatenation of ht−1 and xt . While LSTM
exhibits commendable performance in certain tasks, it also encounters several issues, such
as possessing numerous parameters, high computational complexity, and difficulty in
dealing with discontinuous and irregular time-series data.
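To make the recurrence in Equation (5) concrete, the sketch below implements one LSTM step gate by gate in PyTorch. It is purely expository: the layer sizes are placeholders, and in practice the baselines in this work would presumably rely on an optimized built-in such as torch.nn.LSTM rather than this hand-written cell.

```python
import torch
import torch.nn as nn

class LSTMCellSketch(nn.Module):
    """Illustrative transcription of the gate equations in Equation (5)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Each gate acts on the concatenation [h_{t-1}, x_t].
        self.forget_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.input_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.cell_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.output_gate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x_t, h_prev, c_prev):
        hx = torch.cat([h_prev, x_t], dim=-1)       # [h_{t-1}, x_t]
        f_t = torch.sigmoid(self.forget_gate(hx))   # forget gate
        i_t = torch.sigmoid(self.input_gate(hx))    # input gate
        c_tilde = torch.tanh(self.cell_gate(hx))    # candidate cell state
        c_t = f_t * c_prev + i_t * c_tilde          # updated cell state
        o_t = torch.sigmoid(self.output_gate(hx))   # output gate
        h_t = o_t * torch.tanh(c_t)                 # updated hidden state
        return h_t, c_t
```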
While RNNs and LSTMs have been popular for time-series forecasting, including
stock price prediction, they also come with their own sets of challenges. RNNs [29,30], for
example, are prone to issues like vanishing and exploding gradients when handling long
sequences, making them less effective for capturing long-term dependencies in stock price
data. LSTMs, designed to mitigate some of these issues, are computationally expensive
and still might require substantial parameter tuning for optimal performance [31,32].
Additionally, both RNNs and LSTMs can be sensitive to hyperparameter settings, making
them less robust when applied to the highly volatile and noisy nature of stock markets.
In summary, both traditional machine learning methods and deep learning techniques
come with their respective advantages and drawbacks. In this work, a new time-series
neural network is proposed, integrating Kernel Filter and Time Attention mechanisms,
aiming to resolve the issues present in the aforementioned methods when applied to stock
price prediction.
For data preprocessing, potential outliers in each feature were first identified using the sample mean and standard deviation:

μ = (1/N) ∑_{i=1}^{N} xi
σ = √((1/N) ∑_{i=1}^{N} (xi − μ)²) (6)
Here, N represents the number of samples, xi represents the value of a single sample,
and µ and σ are the sample mean and standard deviation, respectively. After identifying
potential outliers, the median of the corresponding feature was used for replacement. The
median is robust to outlier perturbation and thus serves as a reliable measure for this
purpose. In our dataset, the actual number of outliers identified and replaced was 1.38% of
the total number of data points.
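As a minimal sketch of this outlier step, the snippet below flags values that deviate from the column mean by more than a fixed multiple of the standard deviation (Equation (6)) and substitutes the column median. The threshold of three standard deviations and the pandas-based implementation are assumptions for illustration; the paper does not state the exact cut-off used.

```python
import pandas as pd

def replace_outliers_with_median(df: pd.DataFrame, columns, k: float = 3.0) -> pd.DataFrame:
    """Replace values further than k standard deviations from the mean with the column median."""
    cleaned = df.copy()
    for col in columns:
        mu = cleaned[col].mean()
        sigma = cleaned[col].std(ddof=0)                     # population std, as in Equation (6)
        outlier_mask = (cleaned[col] - mu).abs() > k * sigma # flag potential outliers
        cleaned.loc[outlier_mask, col] = cleaned[col].median()
    return cleaned

# Example: prices = replace_outliers_with_median(prices, ["Open", "High", "Low", "Close", "Vol"])
```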
3.2.3. Normalization
After treating outliers, we also addressed missing values in the data. In our dataset, a
total of 218 missing values were observed across “Open”, “High”, “Low”, “Close”, and
“Vol” prices. These missing values were handled using normalization. Given that the scales
and value ranges of different features may vary, inputting them directly into the model
might impact the model’s learning performance. Through normalization, the value range of
all features could be adjusted to a unified interval, avoiding the model’s over-dependence
on features with large values. Min-max normalization, also known as linear normalization,
was adopted. The formula is as follows:
xnorm = (x − xmin) / (xmax − xmin) (8)
Here, xnorm is the normalized value, x is the original value, and xmin and xmax are the
minimum and maximum values of the sample, respectively. By systematically addressing
outliers and missing values, and by normalizing the feature scales, the data became more
suitable for model learning. This contributes to improving the learning performance of the
model, thereby enhancing the accuracy of stock price prediction.
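A simple sketch of the min-max step in Equation (8) is given below. Keeping the per-feature minima and maxima is an implementation detail assumed here so that model outputs can later be mapped back to the original price scale.

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame, columns):
    """Rescale each feature to [0, 1] following Equation (8)."""
    scaled = df.copy()
    stats = {}
    for col in columns:
        x_min, x_max = scaled[col].min(), scaled[col].max()
        scaled[col] = (scaled[col] - x_min) / (x_max - x_min)  # x_norm = (x - x_min) / (x_max - x_min)
        stats[col] = (x_min, x_max)                            # kept so predictions can be de-normalized
    return scaled, stats
```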
The overall architecture of the TNN is shown in Figure 1. The network structure of TNN primarily comprises an input layer,
hidden layers, and an output layer. The input layer receives raw time-series data, the
hidden layers process these data using the Kernel Filter and Time Attention mechanism,
and the output layer produces prediction results.
In the hidden layers of TNN, a multi-layer design is employed. This is because deep
neural networks, through their multi-layer structure, can learn high-level features and
abstract patterns from input data, which is beneficial for enhancing the model’s predictive
performance. Specifically, the Kernel Filter is first used in the hidden layers to extract
local patterns from the time-series data. Then, the Time Attention mechanism assigns
weights to these patterns. Finally, the weighted features are passed to the next layer for
further processing.
TNN has significant distinctions from regular deep neural networks (DNNs) when
handling time-series data. Firstly, TNN designs the Kernel Filter and Time Attention
operators, which can better handle the characteristics of time-series data, while regular
DNNs often overlook these characteristics. Secondly, the number of layers and the network
structure of TNN are optimized for time-series prediction tasks, while regular DNNs
usually adopt a general network structure, which may not effectively handle time-series
prediction tasks. Lastly, TNN can adaptively assign weights for each feature, while regular
DNNs generally assume that all features are equally important. Furthermore, since the
network structure of TNN is optimized for time-series prediction tasks, TNN can make
more effective use of computational resources when handling large-scale time-series data,
thereby enhancing prediction efficiency.
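To make the data flow concrete, the sketch below wires the two operators together in PyTorch in the order described above (Kernel Filter, then Time Attention, then an output head). The layer widths, kernel size, activation, and the final weighted-sum aggregation are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class TNNSketch(nn.Module):
    """Illustrative pipeline: Kernel Filter -> Time Attention -> output layer."""
    def __init__(self, in_features, n_kernels=32, kernel_size=5):
        super().__init__()
        # Kernel Filter: convolution sliding along the temporal dimension.
        self.kernel_filter = nn.Conv1d(in_features, n_kernels, kernel_size)
        # Time Attention weight generator: scores the feature map at each time step.
        self.weight_generator = nn.Linear(n_kernels, 1)
        # Output layer producing the prediction.
        self.head = nn.Linear(n_kernels, 1)

    def forward(self, x):                                       # x: (batch, time, features)
        h = torch.relu(self.kernel_filter(x.transpose(1, 2)))   # (batch, n_kernels, time')
        h = h.transpose(1, 2)                                    # (batch, time', n_kernels)
        alpha = torch.sigmoid(self.weight_generator(h))         # one weight per time step
        context = (alpha * h).sum(dim=1)                        # weighted aggregation over time (assumed)
        return self.head(context)                               # predicted next value

# Example usage on a dummy batch of 8 windows, 60 steps, 5 features (Open/High/Low/Close/Vol):
# model = TNNSketch(in_features=5)
# y_hat = model(torch.randn(8, 60, 5))   # shape (8, 1)
```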
In a convolutional neural network (CNN), convolution kernels slide across the spatial dimensions to extract local features from the input image. Drawing inspiration from this idea, the Kernel Filter was designed to
slide along the temporal dimension, thereby extracting local patterns from time-series data.
The design carries several significant implications. Firstly, by applying convolutions
along the temporal dimension, local patterns within time-series data can be captured, such
as short-term fluctuations in stock prices. Secondly, different kernels can extract diverse
features, enabling the model to understand time-series data from multiple perspectives.
Finally, the convolution operation possesses the attribute of parameter sharing, implying
that the same patterns can be sought throughout the entire time-series, an operation
unachievable with traditional fully connected neural networks.
Within the TNN model, the Kernel Filter is responsible for the preliminary extraction
of features from the input time-series data. Specifically, the model receives a segment
of time-series data as input, which is initially processed by the Kernel Filter, resulting
in a set of feature maps {h1 , h2 , . . . , ht }. These feature maps not only encapsulate local
patterns of the time-series data but also preserve the temporal order of the data, providing
abundant information for subsequent Time Attention. Its advantages in the tasks are clear.
Firstly, since it can extract local patterns from time-series data, it enables the model to
capture short-term fluctuations in stock prices, which is not achievable by traditional fully
connected neural networks. Secondly, due to the parameter sharing attribute of the Kernel
Filter, the model can seek the same patterns throughout the entire time-series, which is
of significant importance for understanding and predicting stock prices. For a series of
time-series data { x1 , x2 , . . . , xt }, a group of Kernel Filters {k1 , k2 , . . . , k n } is defined; each
kernel k i is a convolution kernel that can perform convolution on the input data along the
temporal dimension, extracting local patterns from it. Each kernel k i consists of a group of
weights {w1i , w2i , . . . , wid } and a bias term bi , where d is the size of the kernel. The operation
of the Kernel Filter can be represented by the following formula:
hit = f( ∑_{j=1}^{d} wij · xt−j+1 + bi ) (9)
In this formula, hit represents the output of the ith kernel at time t, and f is the
activation function. When using the Kernel Filter, the size d and quantity n of the kernel
need to be determined first. The size d of the kernel determines the time range of the
patterns that can be captured, and the quantity n determines the diversity of the features
that can be extracted. Then, at each time point t, each kernel convolves the input data to
obtain the corresponding feature map hit . Finally, all feature maps are integrated to obtain
the final output of the Kernel Filter. Overall, the Kernel Filter extracts local patterns in time-
series data through convolution operations, providing rich information for subsequent
time-series prediction. Its design originates from the convolution operation of CNN,
inherits the advantages of convolution operation in feature extraction, and overcomes the
deficiencies of fully connected neural networks in handling time-series data, which is of
great significance for the task.
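As a minimal sketch under stated assumptions, Equation (9) can be transcribed directly with sliding windows. The helper below uses PyTorch tensors and, up to the ordering of the weights inside each window, does the same work as a standard 1D convolution along the time axis.

```python
import torch

def kernel_filter(x, weights, biases, activation=torch.relu):
    """Direct transcription of Equation (9).
    x: (time,) input series; weights: (n_kernels, d); biases: (n_kernels,).
    Returns feature maps of shape (n_kernels, time - d + 1)."""
    d = weights.shape[1]
    windows = x.unfold(0, d, 1)                     # sliding windows of length d along time
    h = activation(windows @ weights.t() + biases)  # h_t^i = f(sum_j w_j^i * x_{t-j+1} + b_i)
    return h.t()

# Example with n = 3 kernels of size d = 4 on a 250-step series:
# h = kernel_filter(torch.randn(250), torch.randn(3, 4), torch.zeros(3))   # shape (3, 247)
```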
The core of Time Attention is a weight generator g that enables the model to pay more attention to the time points that have a larger impact on the prediction results when processing time-series data. For the feature ht at time t, the attention weight is computed as

αt = σ(g(ht)) (10)

where g is the weight generator, implemented as a small neural network, and σ is the sigmoid function defined above.
Through this approach, distinct weights can be allocated to the features at each time
point, enabling the model to focus on the features having a significant impact on the
prediction outcomes. When employing Time Attention, the structure and parameters of
the weight generator are first determined. Subsequently, the weight generator is utilized
to generate a weight for each time point feature. This weight is then used to weight the
feature, resulting in a weighted feature. Finally, all the weighted features are integrated to
attain the final output of Time Attention.
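A minimal sketch of this procedure is shown below, assuming the weight generator g is a two-layer network and that σ in Equation (10) is the sigmoid; the exact generator architecture used in the paper is not specified, so these choices are illustrative.

```python
import torch
import torch.nn as nn

class TimeAttentionSketch(nn.Module):
    """Weight each time step's feature vector by alpha_t = sigma(g(h_t))."""
    def __init__(self, feature_dim, hidden_dim=16):
        super().__init__()
        # g: small weight generator mapping a feature vector to a scalar score (assumed architecture).
        self.g = nn.Sequential(nn.Linear(feature_dim, hidden_dim),
                               nn.Tanh(),
                               nn.Linear(hidden_dim, 1))

    def forward(self, h):                  # h: (batch, time, feature_dim) feature maps
        alpha = torch.sigmoid(self.g(h))   # Equation (10): one weight per time point
        return alpha * h, alpha            # weighted features and the weights themselves
```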
Time Attention exhibits notable advantages in the tasks at hand. Firstly, by assigning
different weights to the features at each time point, the model is better equipped to focus
on the features that have a larger impact on the prediction results, contributing to an
enhanced prediction accuracy of the model. Secondly, by introducing a weight generator,
the model can auto-adjust the weights based on input features, thus granting the model
improved adaptability. Finally, due to Time Attention’s superior consideration of the
order and continuity of time, the model, when dealing with time-series data, possesses
clear advantages over traditional fully connected neural networks. Compared to other
attention mechanisms, Time Attention embodies important distinctions. To begin with,
Time Attention is specifically designed for time-series data, providing a better consideration
of the order and continuity of time, often overlooked by other attention mechanisms.
Additionally, the weight generator in Time Attention is a small neural network capable
of auto-adjusting weights based on the input features, granting Time Attention superior
adaptability. In essence, by assigning distinct weights to the features at each time point,
Time Attention enables the model to focus more effectively on the features impacting
the prediction outcomes, as will be discussed in Section 4. The design originates from the
concept of the Attention Mechanism, inheriting its advantages in weight distribution while
overcoming the shortcomings of traditional attention mechanisms when handling time-
series data. Within the TNN model, Time Attention significantly enhances the model’s
prediction accuracy and provides superior adaptability.
The experiments were conducted on a computing platform equipped with an Intel Core i9 processor, 64 GB of RAM, and an NVIDIA RTX 3090 GPU. The software environment consisted of Python 3.8, PyTorch 1.9, and various other supporting Python libraries. For the TNN models, we employed three filters: Squeeze-and-Excitation Networks (SENet), the Convolutional Block Attention Module (CBAM), and the Kalman Filter. It is essential to note that there are no "official" third-party libraries available for these filters; therefore, we implemented these algorithms from scratch in PyTorch by carefully referencing their respective original papers [34–36].
Firstly, the Root Mean Square Error (RMSE) measures the deviation between the predicted and actual values:

RMSE = √((1/n) ∑_{i=1}^{n} (yi − ŷi)²)

Here, n is the total number of samples, yi is the actual value of the ith sample, and ŷi
is the model’s prediction for the ith sample. In this work, RMSE can accurately evaluate
the accuracy of the model’s prediction of time-series data. The smaller the RMSE value,
the smaller the bias between the model’s predictions and the actual results, indicating a
higher prediction accuracy of the model. It is worth noting that RMSE gives more weight
to larger errors, so if the model’s predictions have large deviations, the RMSE value will
increase accordingly.
Secondly, the Mean Absolute Error (MAE) is another evaluation metric for model
prediction capabilities. Unlike RMSE, MAE pays more attention to the average bias between
the model’s predictions and the actual results rather than the variance of the predictions.
Its formula is as follows:
MAE = (1/n) ∑_{i=1}^{n} |yi − ŷi| (13)
In this formula, n is the total number of samples, yi is the actual value of the ith
sample, and ŷi is the model’s prediction for the ith sample. In this research task, using MAE
can more intuitively reflect the average deviation between the model’s predictions and
the actual results. The smaller the MAE value, the smaller the bias between the model’s
predictions and the actual results, indicating a higher prediction accuracy of the model.
Compared to RMSE, MAE gives equal weight to all errors, so when the model’s predictions
have large deviations, the MAE value will be relatively smaller.
The Mean Absolute Percentage Error (MAPE) is an evaluation metric that provides a
percentage-based representation of the errors between predicted and actual values, offering
an easy-to-interpret scale of accuracy. The formula for calculating MAPE is as follows:
MAPE = (1/n) ∑_{i=1}^{n} |(yi − ŷi) / yi| (14)
In this formula, n is the total number of samples, yi is the actual value of the ith sample,
and ŷi is the model’s prediction for the ith sample. The smaller the MAPE value, the
higher the model’s prediction accuracy. One of the advantages of using MAPE is that it
provides an easily interpretable percentage error, enabling the performance of the model to
be understood in a straightforward manner.
Finally, the Coefficient of Determination (R2 ) is an evaluation metric that reflects the
correlation between the model’s predictions and the actual results. The closer R2 is to 1, the
higher the correlation between the model’s predictions and the actual results. Its formula is
as follows:
R2 = 1 − [∑_{i=1}^{n} (yi − ŷi)²] / [∑_{i=1}^{n} (yi − ȳ)²] (15)
In this formula, n is the total number of samples, yi is the actual value of the ith sample,
ŷi is the model’s prediction for the ith sample, and ȳ is the mean of the actual values. In this
research task, R2 can be used to evaluate the correlation between the model’s predictions
and the actual results. The higher the R2 value, the higher the correlation between the
model’s predictions and the actual results, indicating a higher prediction accuracy of
the model.
In summary, through the above model evaluation metrics, the predictive capabilities of
the model can be comprehensively evaluated from different angles. RMSE focuses more on
the variance of the model’s predictions, MAE focuses more on the average bias between the
model’s predictions and the actual results, MAPE offers a percentage-based representation
of the model’s prediction errors, providing an easily interpretable scale for assessing the
model’s accuracy, and R2 evaluates the correlation between the model’s predictions and the
actual results. These metrics are important tools for measuring model prediction capabilities
and can effectively help in understanding and improving the predictive performance of
the model.
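For reference, the four metrics can be computed in a few lines of NumPy, as sketched below. The MAPE here is reported as a percentage and assumes no zero actual values, details the paper leaves unstated.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE, and R2 as defined in the evaluation-metric section."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))                                    # penalizes large errors more
    mae = np.mean(np.abs(err))                                           # average absolute deviation
    mape = np.mean(np.abs(err / y_true)) * 100.0                         # percentage error (assumed scale)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # agreement with actual values
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```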
[Figure: stock price (y-axis, approximately 75–110) versus time-series index (x-axis, 0–250), overlaying the ground truth with the values predicted by TNN.]
Figure 3. Ground truth and the predicted values by TNN.
A deeper analysis of the experimental results from the characteristics and mathematical
theories of each model was subsequently undertaken. Firstly, the Linear Regression model
exhibited the poorest predictive performance among all models. This is likely because stock
price fluctuations are influenced by numerous factors, including macroeconomic conditions,
company financials, and market sentiment, among others. The complex and non-linear
relationships between these factors make it challenging for linear models to accurately
capture these relationships. The Decision Tree and Random Forest models were next in the
analysis, both being tree-structured models with excellent interpretability. However, their
predictive performance left room for improvement. This might be due to the fact that while
they can handle non-linear relationships, they struggle with time-series data as they cannot
capture time dependency within the data. SVM, a model based on margin maximization,
demonstrated better predictive performance than Linear Regression, Decision Tree, and
Random Forest. However, its performance still lagged behind RNN, LSTM, and TNN. This
might be because while SVM can handle non-linear problems, it may struggle with the
“curse of dimensionality” when dealing with high-dimensional, complex time-series data.
Lastly, the RNN and LSTM models, both types of recurrent neural networks, are especially
adept at handling time-series data. RNN and LSTM can capture time dependency in data,
hence performing better in stock price prediction than other models. Particularly, LSTM,
due to its inherent design advantages, can overcome the gradient vanishing and exploding
problems faced by RNN, thereby demonstrating slightly superior predictive performance.
Three configurations were compared in the experiment: a TNN model devoid of filters, a TNN
model equipped with a Kalman Filter [36], and a TNN model furnished with a Kernel Filter.
As illustrated in Table 3, the TNN model with no filter yielded RMSE, MAE, and R2
values of 0.26, 0.51, and 0.82, respectively. These findings indicate that the TNN model
can produce satisfactory predictions even without any filter, primarily due to the inherent
advantages of the TNN model structure. This model is capable of attributing different
weights to features at each timestamp, thereby enabling a focused approach towards
features with significant impacts on the predictions. However, in the absence of a filter, the
model may encounter challenges when handling complex, noisy time-series data, leading
to a potential compromise in the predictive performance. In contrast, the TNN model
employing the Kalman Filter demonstrated RMSE, MAE, and R2 values of 0.12, 0.34, and
0.91, respectively, revealing a marked enhancement in predictive performance with the
inclusion of the Kalman Filter. The Kalman Filter, characterized as a linear filter, can
accurately estimate system states amidst noisy data, consequently boosting the model’s
predictive accuracy to a certain extent. However, being linear, the Kalman Filter might
fall short when faced with complex non-linear time-series data. Lastly, the TNN model
incorporating the Kernel Filter exhibited RMSE, MAE, and R2 values of 0.05, 0.23, and 0.95,
respectively. Evidently, the introduction of the Kernel Filter further improved the predictive
performance of the TNN model, outperforming the other two model configurations across
all metrics. This superior performance primarily results from the impressive capabilities
of the Kernel Filter. Compared to the Kalman Filter, the Kernel Filter can handle not only
noisy data but also non-linear time-series data effectively. Its proficiency in extracting
higher-level features from time-series data surpasses that of the Kalman Filter, as shown in
Figure 4. Therefore, the introduction of the Kernel Filter enables the TNN model to achieve
higher prediction accuracy when dealing with complex time-series data.
Figure 4. Comparison of different filters on RNN, LSTM, Transformer [37], and ours. The orange line
denotes the performance for different models with Kernel Filter, while the blue one is that without
Kernel Filter.
5. Conclusions
The theme of this research is centered on a high-accuracy time-series forecasting
method known as the TNN, which is based on a Kernel Filter and Time Attention mecha-
nism. Forecasting analysis of time-series data is a crucial task in various domains. Neverthe-
less, high-precision time-series forecasting remains a challenge due to inherent complexities
such as non-linearity, high dimensionality, and long-term dependencies. To overcome these
challenges, a novel Time Neural Network model has been designed and implemented in
this study.
The major innovation of the TNN model involves the introduction of a Time Attention
mechanism and a Kernel Filter. The Time Attention mechanism allows the model to allocate
different weights to the features at each time point, enabling the model to focus more on
features that have a significant impact on the forecasting results. Meanwhile, the Kernel
Filter is used to extract high-level features from time-series data, thereby improving the
prediction accuracy of the model. In addition, an adaptive weight generator is incorpo-
rated into the model, allowing it to automatically adjust the weights according to the
input features.
In the experimental section, several mainstream time-series forecasting models, in-
cluding RNN and LSTM, were adopted as baseline models, and exhaustive comparative
experiments were conducted. The results demonstrate that the TNN model significantly
outperforms the baseline models, regardless of whether the forecasting tasks are short-term
or long-term. Importantly, even for complex time-series data containing a large amount of
noise, the TNN model is still capable of maintaining high prediction accuracy. Ablation
experiments validated the crucial contribution of the Time Attention mechanism and Kernel
Filter to the performance of the model. When either the Time Attention mechanism or
Kernel Filter is removed, a significant decline in the predictive performance of the model is
evident, further underscoring the importance of these two components in the model.
Despite the excellent performance of the TNN model in the experiments, certain
limitations remain. These include the need for enhanced noise data processing capabilities,
flexibility in dealing with different time slice spans, and comprehensive handling of the
interrelatedness among features. Future work will focus on improving and deepening the
approach to address these issues.
In conclusion, the TNN model proposed in this study provides a novel solution for
time-series forecasting. By incorporating a Time Attention mechanism and Kernel Filter,
the model demonstrates superior forecasting performance and adaptability when dealing
with complex time-series data. Despite some existing limitations, it is believed that through
future improvements and in-depth exploration, the TNN model can play a greater role
in the field of time-series forecasting, offering more accurate and reliable predictions for
real-world problem solving.
Author Contributions: Conceptualization, L.Z.; Methodology, L.Z. and R.W.; Software, R.W. and
J.L.; Validation, J.L. and S.H.; Formal analysis, Z.L. and Y.G.; Investigation, Z.L. and Y.G.; Resources,
Y.G.; Data curation, L.Z. and S.H.; Writing—original draft, L.Z., R.W., Z.L., J.L., Y.G., S.W. and C.L.;
Writing—review & editing, S.W., S.H. and C.L.; Visualization, R.W. and J.L.; Supervision, C.L.; Project
administration, S.W. and C.L.; Funding acquisition, C.L. All authors have read and agreed to the
published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant
number 61202479.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Zhang, Y.; Wang, H.; Xu, R.; Yang, X.; Wang, Y.; Liu, Y. High-Precision Seedling Detection Model Based on Multi-Activation Layer
and Depth-Separable Convolution Using Images Acquired by Drones. Drones 2022, 6, 152. [CrossRef]
2. Lin, X.; Wa, S.; Zhang, Y.; Ma, Q. A dilated segmentation network with the morphological correction method in farming area
image Series. Remote Sens. 2022, 14, 1771. [CrossRef]
3. Zhang, Y.; Liu, X.; Wa, S.; Liu, Y.; Kang, J.; Lv, C. GenU-Net++: An Automatic Intracranial Brain Tumors Segmentation Algorithm
on 3D Image Series with High Performance. Symmetry 2021, 13, 2395. [CrossRef]
4. Zhang, Y.; He, S.; Wa, S.; Zong, Z.; Lin, J.; Fan, D.; Fu, J.; Lv, C. Symmetry GAN Detection Network: An Automatic One-Stage
High-Accuracy Detection Network for Various Types of Lesions on CT Images. Symmetry 2022, 14, 234. [CrossRef]
5. Maarif, M.R.; Saleh, A.R.; Habibi, M.; Fitriyani, N.L.; Syafrudin, M. Energy Usage Forecasting Model Based on Long Short-Term
Memory (LSTM) and eXplainable Artificial Intelligence (XAI). Information 2023, 14, 265. [CrossRef]
6. Huo, H.; Guo, J.; Yang, X.; Lu, X.; Wu, X.; Li, Z.; Li, M.; Ren, J. An Accelerated Method for Protecting Data Privacy in Financial
Scenarios Based on Linear Operation. Appl. Sci. 2023, 13, 1764. [CrossRef]
7. Manfre Jaimes, D.; Manuel Zamudio López, M.; Zareipour, H.; Quashie, M. A Hybrid Model for Multi-Day-Ahead Electricity
Price Forecasting considering Price Spikes. Forecasting 2023, 5, 499–521. [CrossRef]
8. Ampountolas, A. Comparative Analysis of Machine Learning, Hybrid, and Deep Learning Forecasting Models: Evidence from
European Financial Markets and Bitcoins. Forecasting 2023, 5, 472–486. [CrossRef]
9. Sedai, A.; Dhakal, R.; Gautam, S.; Dhamala, A.; Bilbao, A.; Wang, Q.; Wigington, A.; Pol, S. Performance Analysis of Statistical,
Machine Learning and Deep Learning Models in Long-Term Forecasting of Solar Power Production. Forecasting 2023, 5, 256–284.
[CrossRef]
10. Wood, M.; Ogliari, E.; Nespoli, A.; Simpkins, T.; Leva, S. Day Ahead Electric Load Forecast: A Comprehensive LSTM-EMD
Methodology and Several Diverse Case Studies. Forecasting 2023, 5, 297–314. [CrossRef]
11. Mishra, A.; Dasgupta, A. Supervised and Unsupervised Machine Learning Algorithms for Forecasting the Fracture Location in
Dissimilar Friction-Stir-Welded Joints. Forecasting 2022, 4, 787–797. [CrossRef]
12. Papadimitriou, T.; Gogas, P.; Athanasiou, A.F. Forecasting Bitcoin Spikes: A GARCH-SVM Approach. Forecasting 2022, 4, 752–766.
[CrossRef]
13. Fianu, E.S. Analyzing and Forecasting Multi-Commodity Prices Using Variants of Mode Decomposition-Based Extreme Learning
Machine Hybridization Approach. Forecasting 2022, 4, 538–564. [CrossRef]
14. Carrillo, J.A.; Nieto, M.; Velez, J.F.; Velez, D. A New Machine Learning Forecasting Algorithm Based on Bivariate Copula
Functions. Forecasting 2021, 3, 355–376. [CrossRef]
15. Yasrab, R.; Jiang, W.; Riaz, A. Fighting Deepfakes Using Body Language Analysis. Forecasting 2021, 3, 303–321. [CrossRef]
16. May, M.C.; Albers, A.; Fischer, M.D.; Mayerhofer, F.; Schäfer, L.; Lanza, G. Queue Length Forecasting in Complex Manufacturing Job Shops. Forecasting 2021, 3, 322–338. [CrossRef]
17. Rezazadeh, A. A Generalized Flow for B2B Sales Predictive Modeling: An Azure Machine-Learning Approach. Forecasting 2020,
2, 267–283. [CrossRef]
18. Claveria, O. Forecasting with Business and Consumer Survey Data. Forecasting 2021, 3, 113–134. [CrossRef]
19. Shah, V.H. Machine learning techniques for stock prediction. Found. Mach. Learn. Spring 2007, 1, 6–12.
20. Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695. [CrossRef]
21. Murphy, K.P. Probabilistic Machine Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2022.
22. Maulud, D.; Abdulazeez, A.M. A review on linear regression comprehensive in machine learning. J. Appl. Sci. Technol. Trends
2020, 1, 140–147. [CrossRef]
23. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998,
13, 18–28. [CrossRef]
24. Wan, A.; Dunlap, L.; Ho, D.; Yin, J.; Lee, S.; Jin, H.; Petryk, S.; Bargal, S.A.; Gonzalez, J.E. NBDT: Neural-backed decision trees.
arXiv 2020, arXiv:2004.00221.
25. Kurani, A.; Doshi, P.; Vakharia, A.; Shah, M. A comprehensive comparative study of artificial neural network (ANN) and support
vector machines (SVM) on stock forecasting. Ann. Data Sci. 2023, 10, 183–208. [CrossRef]
26. Ma, Y.; Han, R.; Fu, X. Stock prediction based on random forest and LSTM neural network. In Proceedings of the 2019 19th
International Conference on Control, Automation and Systems (ICCAS), Jeju, Republic of Korea, 15–18 October 2019; pp. 126–130.
27. Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent neural network regularization. arXiv 2014, arXiv:1409.2329.
28. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
29. Zhao, J.; Zeng, D.; Liang, S.; Kang, H.; Liu, Q. Prediction model for stock price trend based on recurrent neural network.
J. Ambient. Intell. Humaniz. Comput. 2021, 12, 745–753. [CrossRef]
30. Zhu, Y. Stock price prediction using the RNN model. J. Phys. Conf. Ser. 2020, 1650, 032103. [CrossRef]
31. Swathi, T.; Kasiviswanath, N.; Rao, A.A. An optimal deep learning-based LSTM for stock price prediction using twitter sentiment
analysis. Appl. Intell. 2022, 52, 13675–13688. [CrossRef]
32. Ma, Q. Comparison of ARIMA, ANN and LSTM for stock price prediction. In Proceedings of the E3S Web of Conferences,
Chongqing, China, 20–22 November 2020; Volume 218, p. 01026.
33. Li, Y.; Zou, C.; Berecibar, M.; Nanini-Maury, E.; Chan, J.C.W.; Van den Bossche, P.; Van Mierlo, J.; Omar, N. Random forest
regression for online capacity estimation of lithium-ion batteries. Appl. Energy 2018, 232, 197–210. [CrossRef]
34. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [CrossRef]
35. Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. CoRR 2018. Available online: http://xxx.lanl.gov/abs/1807.06521 (accessed on 6 September 2023).
36. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [CrossRef]
37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (accessed on 6 September 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.