Electronics 12 02598

electronics
Article
Spatial-Temporal Self-Attention Transformer Networks for
Battery State of Charge Estimation
Dapai Shi 1,2,†, Jingyuan Zhao 3,*,†, Zhenghong Wang 2, Heng Zhao 4, Junbin Wang 5, Yubo Lian 5
and Andrew F. Burke 3,*
1 Hubei Longzhong Laboratory, Hubei University of Arts and Science, Xiangyang 441000, China
2 Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle,
Hubei University of Arts and Science, Xiangyang 441053, China
3 Institute of Transportation Studies, University of California, Davis, CA 95616, USA
4 College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China
5 BYD Automotive Engineering Research Institute, Shenzhen 518118, China
* Correspondence: [email protected] (J.Z.); [email protected] (A.F.B.)
† These authors contributed equally to this work.
Abstract: Over the past ten years, breakthroughs in battery technology have dramatically propelled the evolution of electric vehicle (EV) technologies. For EV applications, accurately estimating the state-of-charge (SOC) is critical for ensuring safe operation and prolonging the lifespan of batteries, particularly under complex loading scenarios. Despite progress in this area, modeling and forecasting the evolution of multiphysics and multiscale electrochemical systems under realistic conditions using first-principles and atomistic calculations remains challenging. This study proposes a solution by designing a specialized Transformer-based network architecture, called Bidirectional Encoder Representations from Transformers for Batteries (BERTtery), which uses only time-resolved battery data (i.e., current, voltage, and temperature) as input to estimate SOC. To enhance the Transformer model’s generalization, it was trained and tested under a wide range of working conditions, including diverse aging conditions (ranging from 100% to 80% of the nominal capacity) and varying temperature windows (from 35 °C to −5 °C). To ensure the model’s effectiveness, its performance was rigorously tested at the pack level, which makes it possible to translate cell-level predictions into real-life problems involving hundreds of cells connected in series. The best models achieve a test root mean square error (RMSE) of less than 0.5 and an average percentage error (APE) of approximately 0.1%, with maximum absolute errors (MAE) of 2% on the test dataset, accurately estimating SOC under dynamic operating and aging conditions with widely varying operational profiles. These results demonstrate the power of the self-attention Transformer-based model to predict the behavior of complex multiphysics and multiscale battery systems.

Keywords: lithium-ion battery; SOC; deep learning; estimation; transformer; electric vehicle

Citation: Shi, D.; Zhao, J.; Wang, Z.; Zhao, H.; Wang, J.; Lian, Y.; Burke, A.F. Spatial-Temporal Self-Attention Transformer Networks for Battery State of Charge Estimation. Electronics 2023, 12, 2598. https://doi.org/10.3390/electronics12122598
Academic Editors: Alon Kuperman and Alessandro Lampasi
Received: 4 May 2023
Revised: 6 June 2023
Accepted: 7 June 2023
Published: 8 June 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Electronics 2023, 12, 2598. https://doi.org/10.3390/electronics12122598 https://www.mdpi.com/journal/electronics

1. Introduction

Vehicle electrification is considered an important decarbonization pathway for climate change mitigation [1]. Global electric vehicle (EV) sales have been escalating steadily, from fewer than 10,000 units in 2010 to more than 10 million units in 2022, with cumulative sales surpassing 20 million [2]. In total, billions of lithium-ion batteries are used as energy storage devices in today’s EVs. In EV applications, cell performance is highly dependent on operating conditions. Under abuse conditions, such as over-charging [3] or over-discharging [4], the battery voltage can move beyond its safe operating window, which can accelerate degradation and increase the risk of battery failure after long-term incubation. Accurate estimation of battery state-of-charge (SOC) under various operating conditions is critical for effective battery management in both large-scale EVs [5] and photovoltaic-assisted applications [6]. However, accurate SOC estimation faces multiple sources of uncertainty, including complex physico-chemical mechanisms, significant cell-to-cell variation, and dynamic operating conditions. These challenges are exacerbated in cases involving uncertain aging conditions, noisy data, and missing initial/boundary conditions, such as those found in EV field applications.
1.1. Current Methods for SOC Estimation

The traditional methods for battery SOC estimation can be classified by their underlying mechanisms into several categories, including Coulomb counting methods [7,8], open circuit voltage (OCV)-based estimation [9,10], filter-based algorithms [11–13], and model-based estimation [14,15]. Despite relentless progress, there is always a trade-off between the computational cost and the accuracy of model-based predictions for online SOC estimation.
Coulomb (ampere-hour) counting methods provide a simple, straightforward estimation method based on the definition of SOC. Owing to their low computational complexity, Coulomb counting methods have been widely used for online SOC estimation in the EV industry. However, this method generally achieves limited accuracy and poor robustness because of an unknown initial SOC, capacity degradation, and current sensor drift [16]. In addition, energy losses during charging and discharging, as well as self-discharge, cause errors to accumulate further. OCV-based methods are also commonly used for SOC estimation because of the stable, monotonic relationship between OCV and SOC. Cells with the same chemistry and cell design show very little variation in their SOC–OCV relationship, so in practice this relationship can be mapped into a look-up table under different test conditions. However, this can be a time-consuming process, especially when capacity degradation [17] and working temperature [18] are taken into account. In addition, OCV-based methods can only describe the electrode potential difference under open-circuit conditions. Because of slow diffusion, a lithium-ion battery requires a long rest, generally a few hours under most operating conditions, to reach a stable electrode potential. Such a requirement greatly limits the utility and prediction accuracy of this method for EV applications. A recent study introduced an efficient methodology for determining the OCV–SOC curve for lithium-ion batteries under dynamic temperature conditions to improve model generalizability [19]. Considering the variability of the OCV–SOC curve with temperature and battery age, the research proposed a multi-output Gaussian process (MOGP) model utilizing current–voltage data, thereby bypassing the need for direct OCV measurement or estimation. This model efficiently captures correlations across various temperatures and constructs an accurate OCV–SOC curve for a specific temperature, significantly reducing prediction errors. This pioneering technique provides enhanced SOC estimation precision, paving the way for a more pragmatic and accurate SOC determination approach under diverse operating conditions.
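As a minimal sketch of the two classical approaches above, the following Python snippet combines an OCV look-up (used once, after a long rest, to initialize SOC) with Coulomb counting, SOC(t) = SOC(0) − η/(3600·Q) ∫ I dt. The OCV–SOC table, the 100 Ah capacity, the coulombic efficiency η, and the sign convention (positive current means discharge) are illustrative assumptions, not values from this study.

```python
import numpy as np

# Hypothetical OCV-SOC look-up table (illustrative values, not measured data).
SOC_POINTS = np.array([0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0])
OCV_POINTS = np.array([3.00, 3.45, 3.55, 3.65, 3.75, 3.90, 4.00, 4.15])  # volts


def soc_from_ocv(ocv_v: float) -> float:
    """Invert the monotonic OCV-SOC table by linear interpolation.

    Only meaningful after a long rest, when the terminal voltage is close to OCV.
    """
    return float(np.interp(ocv_v, OCV_POINTS, SOC_POINTS))


def coulomb_count(soc0: float, current_a, dt_s: float, capacity_ah: float,
                  eta: float = 1.0) -> np.ndarray:
    """Ampere-hour integration of SOC; positive current means discharge (assumed)."""
    charge_as = np.cumsum(eta * np.asarray(current_a, dtype=float) * dt_s)  # A*s removed
    return soc0 - charge_as / (capacity_ah * 3600.0)


if __name__ == "__main__":
    soc0 = soc_from_ocv(3.75)                       # rested OCV -> initial SOC of 0.6
    soc = coulomb_count(soc0, [50.0] * 360, 1.0, 100.0)
    print(round(soc[-1], 4))                        # 5 Ah removed from 100 Ah -> 0.55

    # A small current-sensor bias integrates into an ever-growing SOC error.
    biased = coulomb_count(soc0, [50.5] * 360, 1.0, 100.0)
    print(round(biased[-1] - soc[-1], 5))
```

The second call shows why drift matters: the SOC error from a constant 0.5 A bias grows linearly with time, and nothing in pure Coulomb counting corrects it.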
Closed-loop filter-based algorithms have been widely developed over the past decade to tackle uncertainties and disturbances through feedback correction. Filter-based SOC estimation has two components: a battery voltage model and a filter algorithm, such as the Kalman filter family [20], the particle filter [21], or the H-infinity filter [22]. As the voltage model within the filter, a first- or second-order equivalent circuit model (ECM) is widely used for online EV applications. High-order ECMs [23] and physics-based models (PBMs) [24] achieve higher voltage accuracy at the cost of computational complexity. One common PBM is the pseudo-two-dimensional (P2D) model. This model provides deeper insights into the internal dynamics of batteries. However, the complexity of its governing equations and its high computational cost make the P2D model less practical for online applications. Additionally, traditional PBMs do not consider detailed material information, which is vital for understanding battery degradation behavior. To manage the computational demand, a primary strategy involves simplifying the PBMs. However, such approximations must still retain sufficient physical information to accurately predict battery behavior. A widely studied model that adopts this simplified approach is the single-particle model (SPM). This model operates under the key assumptions that each electrode is represented by a spherical particle and that the potential and concentration effects in the solution phase are disregarded. These approximations contribute to a significant reduction in computational time. Nonetheless, the SPM falls short in accuracy when applied to high-rate simulations.
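To make the filter-based approach concrete, here is a minimal sketch of an extended Kalman filter (EKF) built on a first-order (one RC pair) ECM, with state [SOC, V1] and measurement V = OCV(SOC) − V1 − R0·I. The cell parameters, the linear OCV curve, and the noise settings are hypothetical values chosen only to keep the example self-contained; a practical filter would use an identified, nonlinear OCV curve and fitted ECM parameters.

```python
import numpy as np

# Hypothetical first-order ECM parameters (illustrative only).
Q_AH, R0, R1, C1 = 100.0, 0.002, 0.0015, 20000.0  # Ah, ohm, ohm, farad


def ocv(soc):
    """Hypothetical linear OCV curve; real cells need a measured, nonlinear curve."""
    return 3.2 + 0.9 * soc


def d_ocv(soc):
    """Derivative of the OCV curve with respect to SOC (linearization for the EKF)."""
    return 0.9


def simulate(currents, soc0, dt=1.0):
    """Generate 'measured' terminal voltages from the same 1-RC model (noise-free)."""
    a = np.exp(-dt / (R1 * C1))
    soc, v1 = soc0, 0.0
    socs, volts = [], []
    for i_k in currents:
        soc -= dt * i_k / (3600.0 * Q_AH)          # Coulomb counting step
        v1 = a * v1 + R1 * (1.0 - a) * i_k         # RC polarization voltage
        socs.append(soc)
        volts.append(ocv(soc) - v1 - R0 * i_k)     # terminal voltage
    return np.array(socs), np.array(volts)


def ekf_soc(currents, voltages, dt=1.0, soc0=0.5, p0=0.1, q=1e-7, r=1e-4):
    """EKF over state x = [SOC, V1]; positive current means discharge (assumed)."""
    a = np.exp(-dt / (R1 * C1))
    F = np.array([[1.0, 0.0], [0.0, a]])
    B = np.array([-dt / (3600.0 * Q_AH), R1 * (1.0 - a)])
    x = np.array([soc0, 0.0])
    P = np.diag([p0, 1e-4])
    Qn = np.diag([q, q])
    estimates = []
    for i_k, v_k in zip(currents, voltages):
        # Predict: propagate state and covariance through the ECM.
        x = F @ x + B * i_k
        P = F @ P @ F.T + Qn
        # Update: correct the state with the voltage residual (feedback correction).
        H = np.array([d_ocv(x[0]), -1.0])
        v_pred = ocv(x[0]) - x[1] - R0 * i_k
        S = H @ P @ H + r
        K = P @ H / S
        x = x + K * (v_k - v_pred)
        P = (np.eye(2) - np.outer(K, H)) @ P
        estimates.append(x[0])
    return np.array(estimates)


if __name__ == "__main__":
    currents = np.full(600, 50.0)                  # 0.5 C constant discharge for 10 min
    true_soc, measured_v = simulate(currents, soc0=0.8)
    est = ekf_soc(currents, measured_v, soc0=0.5)  # deliberately wrong initial SOC
    print(f"true {true_soc[-1]:.4f}  estimated {est[-1]:.4f}")
```

Unlike open-loop Coulomb counting, the voltage feedback pulls a wrong initial SOC toward the true value; the price is that the result is only as good as the ECM and OCV curve inside the filter.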
Due to the availability of high-throughput computing and open-source software, data-driven and machine learning-based approaches have been successful in helping scientists and engineers in the energy storage realm [25–30]. Machine learning techniques play crucial roles in modeling and forecasting the dynamics of multiphysics and multiscale battery systems within the framework of Industry 4.0 [31]. Particularly, deep learning enables the creation of computational models that consist of multiple processing layers, which can learn data representations with various levels of abstraction. Through the backpropagation algorithm, deep learning uncovers complex structures in large datasets and guides a machine to adjust its internal parameters that compute representations in each layer from the previous layer’s representations. In prediction tasks, the top layers of deep learning models heighten critical features while filtering out unnecessary variations. This layered approach of enhancing and reducing data helps to extract vital patterns, resulting in accurate prediction. This technique has emerged as a promising alternative, with particular advantages in determining cell states [32,33]. Figure 1 illustrates the balance between prediction accuracy and anticipated computational cost for the aforementioned methods.
Figure 1. Trade-off between prediction accuracy and expected computational cost (Every model presents a unique blend of strengths and obstacles. Machine Learning Models: These models harness computational power and large datasets to capture complex, non-linear battery dynamics. They offer an effective balance between prediction accuracy and computational cost, which is especially beneficial for determining cell states. PBM: These models, such as the P2D model, provide deeper insights into the internal dynamics of batteries. ECMs: Widely used with filter-based algorithms, ECMs offer a more straightforward approach to SOC estimation. High-order ECMs can achieve higher voltage accuracy, but at the cost of increased computational complexity. Simplified Physical Models: Models such as the SPM reduce computational demand by simplifying the physics. However, they may compromise accuracy, particularly in high-rate simulations).
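The layered computation and backpropagation described above can be illustrated with a deliberately tiny example: a one-hidden-layer network trained by full-batch gradient descent, with the backward pass written out by hand using the chain rule. The toy target (a sine curve), the layer width, the learning rate, and the epoch count are arbitrary assumptions; this is a generic illustration, not a battery model.

```python
import numpy as np


def train_mlp(x, y, hidden=16, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer tanh network on mean squared error; return the loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: each layer computes its representation from the previous layer's.
        h = np.tanh(x @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the loss gradient from the output layer to the input layer.
        g_pred = 2.0 * err / len(x)
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)     # derivative of tanh
        g_W1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient-descent update of the internal parameters.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return losses


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
    y = np.sin(3.0 * x)
    losses = train_mlp(x, y)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Deep learning frameworks automate exactly this backward pass; writing it out once makes clear that "adjusting internal parameters" means following the chain-rule gradient layer by layer.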
In recent years, a multitude of innovative machine learning methods and deep neural networks have been developed to estimate SOC in EV applications. These proposals are designed to substantially improve model accuracy, thus contributing to more precise and efficient energy management within the burgeoning field of electric vehicles. One such example is the use of convolutional neural networks (CNNs). Innovative research has centered on crafting a universal SOC estimator capable of addressing variations in battery type and sensor noise [34]. That study put forward a closed-loop paradigm based on a deep convolutional neural network (DCNN), using transfer learning and pruning techniques for swift adaptation to distinct scenarios. The proposed model showed its effectiveness across diverse battery types and stages of aging, achieving root mean square errors (RMSE) below 2.47% by adjusting the final layers. Compared with CNNs, recurrent neural networks (RNNs) offer significant benefits for tasks that involve sequential inputs and time-series data. RNNs process a sequence one element at a time and preserve, in their hidden units, a state vector that holds crucial information about the history of the sequence. The concept becomes apparent when the outputs of the hidden units at discrete time steps are viewed as the outputs of neurons in a deep, multilayered network, which illuminates how backpropagation can train RNNs. Specialized RNNs, known as long short-term memory (LSTM) networks, introduce a structure called a memory cell, which includes three gate types (input, forget, and output) that control the memory cell’s information flow. A recent study introduced a fusion network combining a multi-dimensional residual shrinkage network (MRSN) with an LSTM, enhancing SOC estimation in lithium-ion batteries [35]. The combined network efficiently manages multi-dimensional interactions and noise interference, and it precludes data leakage by using a sequence-to-point processing strategy. Further advancements in SOC estimation techniques for lithium-ion batteries involve an LSTM-RNN augmented with extended input and constrained output (EI-LSTM-CO) [36]. This model includes an additional input, the sliding-window average voltage, and an ampere-hour integration-based state flow approach for output constraint. These enhancements significantly improved SOC estimation performance by curbing output volatility. The encouraging results underscore the potential of the EI-LSTM-CO for real-world SOC estimation. In addition, a multi-forward-step SOC prediction method based on LSTM has demonstrated its effectiveness for battery systems in real-world EV applications. The developed Weather-Vehicle-Driver analysis method considers how drivers’ actions and the weather affect a battery system’s performance in real-world operating circumstances. The proposed dropout technique and correlation analysis prevent the LSTM from overfitting and efficiently select the best parameters prior to training. Additionally, by using LSTM and multiple linear regression algorithms, a joint-prediction strategy was applied to achieve dual control of prediction accuracy and prediction horizon. It offers an opportunity to control the number of LSTM prediction steps while ensuring acceptable prediction accuracy by using the one-forward-step prediction accuracy of linear regression as the accuracy benchmark [37]. To capture temporal dependencies in both forward and backward directions, a bidirectional LSTM neural network was used for SOC estimation [38]. Moreover, the bidirectional LSTM layers are stacked to improve, layer by layer, the prediction of the non-linear and dynamic relationship between the input parameters and cell SOC. Compared to LSTM, the gated recurrent unit (GRU) employs a simpler structure with a low-dimensional non-linear manifold and has received a great deal of attention for the prediction of battery conditions. For example, an RNN with GRU was applied to estimate the cell SOC from measured time-series signals, including current, voltage, and temperature [39]. The proposed method improves estimation accuracy over traditional feed-forward neural networks by making use of previous SOCs and measurements. To determine the SOC of lithium batteries, a single-hidden-layer GRU-RNN-based momentum-optimized algorithm was investigated [40]. To prevent oscillation of the weight change and to increase the training speed, the current weight-change direction is a compromise between the gradient direction at the current instant and that at historical times. The GRU-RNN-based momentum algorithm offers tools to obtain battery SOC estimates and the related estimation errors by tuning the noise variances, the number of epochs, and the number of hidden-layer neurons. In a recent study, the GRU-RNN was applied to pre-estimate battery SOC, and an adaptive Kalman filter (AKF) was used to smooth the output of the GRU model to obtain the final results [41].
In the proposed framework, it is not necessary to construct an intricate battery model, because the GRU-RNN model is well suited to establishing the non-linear mapping between the measured battery variables (voltage, current, and temperature) and SOC over the entire temperature range. Moreover, since the AKF processes the outputs of the GRU-RNN, there is more flexibility in designing the network’s hyperparameters, which saves computational cost. The enhanced noise-adaptive algorithm not only makes it easier to choose the initial noise covariance but also makes the proposed GRU-AKF more adaptable to complex loading scenarios. In line with recent advancements, an SOC estimation approach for lithium-ion batteries was introduced that utilizes a deep feed-forward neural network (DFFNN), optimized through an attention mechanism relevant to stochastic weight algorithms (RAS) [42]. This strategy efficiently extracts pertinent features from input data and updates the weights and biases, addressing gradient issues and extending the DFFNN’s applicability across a range of operating conditions. Additionally, it implements a shifting-step unscented Kalman filter (SUKF) for the adaptive adjustment of the error covariance, thus providing robustness against spontaneous error noise. This strategy has been verified to deliver precise SOC estimates with low error metrics in trials, indicating its potential for managing batteries in electric vehicles.
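The three-gate memory cell described above can be written in a few lines. The following NumPy sketch implements one forward step of a standard LSTM cell and runs it over a short sequence of (current, voltage, temperature) samples; the random weights, the gate ordering inside the stacked weight matrices, the hidden size, and the input values are illustrative assumptions, and no training is performed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate order [i, f, o, g] is assumed."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content enters the cell
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell content is kept
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell is exposed
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c = f * c_prev + i * g         # memory cell update
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c


def run_lstm(xs, hidden=8, seed=0):
    """Run an untrained LSTM over a (T, D) sequence and return the final hidden state."""
    rng = np.random.default_rng(seed)
    D = xs.shape[1]
    W = rng.normal(0.0, 0.1, (4 * hidden, D))
    U = rng.normal(0.0, 0.1, (4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h


if __name__ == "__main__":
    # Hypothetical normalized samples: columns are current, voltage, temperature.
    seq = np.column_stack([np.sin(np.linspace(0, 3, 50)),
                           np.linspace(1.0, 0.8, 50),
                           np.full(50, 0.25)])
    print(run_lstm(seq).shape)
```

The additive update c = f·c_prev + i·g is what lets an LSTM carry information over many steps: when the forget gate is near one and the input gate near zero, the cell content passes through almost unchanged.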
Collectively, these research findings demonstrate that RNNs are effective in modelling sequential and time-series data. However, training them has proven difficult. The backpropagated gradients either grow or shrink at each time step, so over long sequences they usually explode or vanish, which hampers prediction tasks that require learning long sequences. In addition, because RNNs process sequences step by step, they make limited use of parallelization across multiple timescales.
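A back-of-the-envelope calculation shows why this happens. For the simplified scalar linear recurrence h_t = w·h_{t−1} (an assumption that ignores nonlinearities and inputs), the gradient of h_T with respect to h_0 is w^T, so it shrinks or grows geometrically with sequence length:

```python
def gradient_through_time(w: float, steps: int) -> float:
    """d h_T / d h_0 for the scalar recurrence h_t = w * h_{t-1}: one factor of w per step."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule contributes one Jacobian factor per time step
    return grad


if __name__ == "__main__":
    for w in (0.9, 1.0, 1.1):
        print(w, [f"{gradient_through_time(w, t):.3g}" for t in (10, 100, 1000)])
```

With w = 0.9 the factor after 100 steps is about 2.7 × 10⁻⁵, whereas with w = 1.1 it is about 1.4 × 10⁴. In a real RNN the role of w is played by the recurrent Jacobian, and whether its repeated products shrink or grow roughly decides which regime applies.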
The attention-based Transformer model [43], which is primarily employed in natural language processing, has recently made ground-breaking advances in time-series prediction. Over the past few years, some researchers have estimated SOC with promising results using the encoder-decoder structure, the self-attention mechanism, and the sequence-to-sequence method. Unlike conventional RNNs, the Transformer model can be computed in parallel, which permits faster training and better use of GPU resources. For example, encoder-based Transformer neural networks have been demonstrated to be a powerful tool to estimate battery SOC in a self-supervised, data-driven manner without considerable domain expertise to design features or adaptive filtering [44]. To explore the current and voltage data separately, a two-encoder architecture was developed, in which each encoder is composed of one linear layer and two identical encoder layers [45]. The outputs of the encoders were then concatenated into a single sequence and used as the inputs for the decoder. Moreover, an immersion and invariance adaptive observer was proposed to reduce the oscillations of the Transformer prediction. A self-attention Transformer model has also demonstrated remarkable power in achieving accurate co-estimation of battery states [46]. Self-supervised Transformer neural networks open new avenues for assimilating representations derived from observational data. These networks offer a gradation of abstractions, thereby simplifying the incorporation of attention mechanisms, an essential feature in the data processing pipeline. Their integration with a synergistic cloud-edge computing framework, combined with the versatility of deep learning, substantially augments the predictive power of these networks. Such an approach ultimately helps to capture and decode long-range spatio-temporal dependencies that span diverse scales, thus enhancing the accuracy of analyses and predictions. Table 1 presents a comprehensive comparison of the merits and demerits associated with these techniques, particularly in the context of battery SOC estimation. This balanced evaluation provides a clear understanding of the applicability and potential challenges of each method in real-world settings.
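The core operation behind this parallelism is scaled dot-product attention, softmax(QKᵀ/√d_k)V, computed for all time steps at once rather than step by step. A minimal single-head NumPy sketch over a window of (current, voltage, temperature) samples follows; the random projection matrices and the dimensions are assumptions for illustration, and this shows the generic mechanism of [43], not the specific BERTtery architecture.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a (T, D) sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # (T, T): every step scores every other step
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ V, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, d_k = 10, 3, 8                        # 10 time steps; current, voltage, temperature
    X = rng.normal(size=(T, D))
    Wq, Wk, Wv = (rng.normal(scale=D ** -0.5, size=(D, d_k)) for _ in range(3))
    out, attn = self_attention(X, Wq, Wk, Wv)
    print(out.shape, attn.shape)
```

Because the (T, T) score matrix is a single matrix product, every time step is processed simultaneously; the cost is memory and compute that grow quadratically with window length.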
Electronics 2023, 12, 2598 6 of 23
Table 1. Advantages and disadvantages of the common methods used for battery SOC estimation.
Methods | Advantages | Disadvantages
Ampere-hour Counting | Low computational complexity, straightforward method | Susceptible to errors, depends heavily on initial SOC
Open Circuit Voltage | Simple, easy to implement | Not suitable for real-time SOC, requires resting state
Model-Based Estimation | Can be used for online applications, low computational demand | Limited accuracy, requires careful parameterization
Physics-Informed Methods | Provides insights into the internal battery dynamics | Complex equations, high computational cost
Filter-based Methods | Capable of handling noise and estimation uncertainty | Requires accurate system model, might be computationally heavy
Machine Learning | Can handle complex relationships, potential for high accuracy | Needs a large amount of data, requires training phase
1.2. Contributions and Structure of the Work

In this study, we have meticulously designed a custom Transformer network archi-
tecture. This specific construct is aimed at accurately predicting the state of charge (SOC)
of a battery under real-world operating conditions, thereby eliminating the need for prior
knowledge of the underlying physical principles. Time-series data, in this case, are under-
stood as a sequential aggregation of samples, observations, and unique features mapped
over a temporal dimension. When compiled at a predetermined sampling interval, these
data points aggregate into time-series datasets, serving as a valuable source of analyti-
cal information. The contributions of this study, embodying innovation, rigor, and the
knowledge gained, can be summarized as follows:
(1) The specialized Transformer model, termed Bidirectional Encoder Representations from Transformers for Batteries (BERTtery), offers an effective tool to learn the non-linear relationship between SOC and input time-series data (e.g., current, voltage, and temperature), and to uncover intricate structures.
(2) For efficient implementation of the Transformer, it is beneficial to create models and algorithms that account for different operating conditions, such as charging and discharging processes. Consequently, the encoder network converts observational data into a token-level representation, where each feature in the sequence is replaced with fixed-length positional and operational encodings.
(3) A variable-length sliding window has been designed to produce predictions adhering to the underlying physico-chemical (thermodynamic and kinetic) principles. The sliding window aids in enriching the network with temporal memory, enabling BERTtery to generalize well beyond the training samples and to better exploit temporal structures in long-term time-series data.
(4) For real-world applications, the accuracy of model performance is essential. Therefore, we have collected data covering a diverse range of operating conditions and aging states from field applications to test the generalization capabilities of the machine learning model.
(5) We devised a dual-encoder-based architecture to preserve the symplectic structure of the underlying multiphysics battery system. The channel-wise and temporal-wise encoders pave the way for broader exploration and capture epistemic uncertainty across multiple timescales, facilitating the assimilation of long-term time-series data while considering the influence of past states or forcing variables.
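Contributions (2) and (3) can be illustrated with a short preprocessing sketch: overlapping windows cut from a multichannel time series (with the SOC at the last step as the target), the standard sinusoidal positional encoding from the original Transformer, and a per-sample operating-mode label. The window lengths, the three-way rest/discharge/charge labeling, and its current threshold are hypothetical choices; the text does not specify how BERTtery's operational encoding is defined.

```python
import numpy as np


def sliding_windows(features, soc, window):
    """Stack overlapping (window, F) slices; the target is the SOC at each window's last step."""
    n = len(features) - window + 1
    X = np.stack([features[i:i + window] for i in range(n)])
    y = soc[window - 1:]
    return X, y


def sinusoidal_positions(length, d_model):
    """Sinusoidal positional encoding: sin on even dimensions, cos on odd dimensions."""
    pos = np.arange(length)[:, None]
    dim = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (dim // 2)) / d_model)
    return np.where(dim % 2 == 0, np.sin(angle), np.cos(angle))


def operation_mode(current, threshold=0.5):
    """Hypothetical operating-mode label: 0 = rest, 1 = discharge (I > thr), 2 = charge (I < -thr)."""
    current = np.asarray(current)
    return np.where(current > threshold, 1, np.where(current < -threshold, 2, 0))


if __name__ == "__main__":
    t = np.arange(100)
    current = np.where(t < 60, 20.0, -10.0)            # discharge, then charge
    features = np.column_stack([current, 3.7 - 0.002 * t, np.full(100, 25.0)])
    soc = np.linspace(0.9, 0.7, 100)
    for window in (10, 20):                             # variable window lengths
        X, y = sliding_windows(features, soc, window)
        print(window, X.shape, y.shape)
    print(sinusoidal_positions(20, 16).shape, np.bincount(operation_mode(current)))
```

In a full model, the positional encoding and an embedding of the mode label would be added to the projected features of each window before it enters the encoder; drawing the window length per batch is one simple way to realize a variable-length window.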
In the subsequent sections, we initially outline the machine learning pipeline, which
includes data generation and the implementation of the self-attention Transformer model.
Our specialized Transformer neural networks consist of three key components: embedding,
a two-tower structure, and a gating mechanism. The selection of hyperparameters is also
briefly discussed. Following this, we use field data to train and evaluate the Transformer model across a broad range of operating and aging conditions at both the cell and pack levels. We then discuss potential applications for real-world electric vehicle (EV) usage. Considering the fast-paced advancements in this field, we conclude by providing an outlook that includes reflections on the model’s current limitations.
2. Materials and Methods

2.1. Data Generation
Transferring academic advancements to commercial applications can be a challenging
task, even with open data sharing. This is mainly due to reproducibility issues resulting
from the gap between laboratory settings and end-use scenarios. The high-dimensional
parameter space that parameterizes the state of charge (SOC) of lithium-ion batteries
presents a significant challenge to probe, given the diverse aging mechanisms, numerous
capacity fade processes, and manufacturing uncertainties involved.
To address this challenge, we collected two comprehensive datasets from real-world electric vehicle (EV) applications. As shown in Table 2, Group #A comprises three lithium-ion cells with widely varying state-of-health (SOH), ranging from 100% to 80%, while Group #B comprises one large-scale battery pack after eight consecutive months of service under realistic conditions. All charging–discharging data were collected under varied, random charging and discharging conditions, with commercial cell balancing and thermal management. By deliberately varying the aging conditions, we generated a dataset that captures a wide range of SOH, from approximately 100% to 80% of nominal capacity. Although the cell temperature is controlled for safety reasons in real-world applications, it can still vary by up to 45 °C due to the large amount of heat generated during charge and discharge. In this study, we probed discharging rates ranging from 0.1 C to 5 C (pulse power for acceleration) and multi-step charging rates ranging from 0.5 C to 1.5 C.
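For readers less familiar with C-rate notation, the arithmetic behind these ranges is simple: the current equals the C-rate multiplied by the capacity in ampere-hours, and SOH is commonly taken as the ratio of present to nominal capacity. The sketch below applies this to the 105 Ah cell listed in Table 2 and to a series pack; the 84 Ah measured capacity and the 3.7 V cell voltage are hypothetical values used only to illustrate the calculation.

```python
def c_rate_to_current(c_rate: float, capacity_ah: float) -> float:
    """Current in amperes for a given C-rate: I = C-rate x capacity (Ah)."""
    return c_rate * capacity_ah


def state_of_health(measured_capacity_ah: float, nominal_capacity_ah: float) -> float:
    """SOH as the ratio of present capacity to nominal capacity."""
    return measured_capacity_ah / nominal_capacity_ah


def series_pack_voltage(cell_voltages) -> float:
    """Cells in series carry the same current, and their voltages add."""
    return float(sum(cell_voltages))


if __name__ == "__main__":
    cap = 105.0                                               # one of the cell capacities in Table 2
    for c in (0.1, 0.5, 1.5, 5.0):
        print(f"{c} C -> {c_rate_to_current(c, cap):.1f} A")  # 5 C on 105 Ah is 525 A
    print(state_of_health(84.0, cap))                         # hypothetical 84 Ah -> 0.8
    print(series_pack_voltage([3.7] * 92))                    # 92 cells in series (Table 2)
```

The series relation is also why cell-level SOC estimates matter at pack level: every cell in the string sees the same current, so the weakest cell limits the usable capacity of the whole pack.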
Table 2. Datasets used for machine learning modelling.
Datasets | Entity | Cell Specification | SOH | Operating Temperature Window
Group A (Cell level) | 5 large-scale NMC cells | 105 Ah, 115 Ah and 135 Ah | 100%, 90% and 80% | −5 °C to 40 °C
Group B (Pack level) | 1 battery pack | 92 NMC cells in-series | 8 consecutive months of service time in an EV | 10 °C to 35 °C
Despite significant advancements in battery state estimation research, a prevalent gap
remains between the simulated models and their real-world applicability. This disconnect
arises due to the complex nature of lithium-ion batteries and the diverse range of operat-
ing conditions they encounter in real-world scenarios, which are often oversimplified or
overlooked in simulation-based studies.
The authors of this study address this gap by amassing comprehensive datasets that
depict the true behavior of lithium-ion batteries under a wide variety of real-world op-
erating conditions. These datasets are not limited to idealized or laboratory conditions
but encompass a broad spectrum of real-world scenarios, thus presenting a more realistic
representation of battery performance. The introduction of these detailed and representa-
tive datasets paves the way for the development and validation of more accurate, robust,
and reliable predictive models for battery diagnosis and prognosis. By employing these
datasets, researchers can better understand the multifaceted dynamics of lithium-ion bat-
teries in real-world scenarios and, consequently, enhance the transferability of academic
advancements to commercial applications. This, in turn, facilitates the creation of effec-
tive battery management strategies, ultimately extending the lifespan and improving the
safety of lithium-ion batteries in practical applications. For a comprehensive exploration of
the disparity between laboratory testing and real-world applications, please refer to the
detailed discussion presented in [47].
2.2. Transformer-Based Neural Network
Recently, Transformer models have been increasingly applied across diverse areas of time-series analysis. Transformers address the complexities of such data using self-attention mechanisms and positional encodings. These strategies allow them to attend to the individual data samples while also capturing their order within the sequence. The Transformer's structure is designed to identify relationships between different input segments. This is achieved by adding positional information to these segments and comparing segments through a dot product operation. For a comprehensive treatment of the algorithm and its mathematics, please refer to [48]. The proposed Transformer model (Figure 2) consists of four main modules: a dual-embedding module, a two-tower encoder module, a sequence prediction module, and a gating module. Our Transformer model relates to BERT (bidirectional encoder representations from transformers) as follows:
(i) BERTtery adopts the BERT methodology for self-supervised pretraining and employs the Transformer as the model backbone.
(ii) Although our embedding and encoder structure differs from BERT in several ways, it is tailored to capture specific knowledge of the battery domain.
(iii) We used two embeddings: a positional embedding and an operational embedding.
(iv) Two encoders, a channel-wise encoder and a temporal-wise encoder, were designed to capture long-range spatio-temporal features automatically.
Figure 2. The framework of the BERTtery (Two encoding techniques are devised to capture the position of the battery operational profiles within the sequence. To optimally leverage the time-series data of the cell, a two-tower structure was employed, incorporating both a channel encoder and a time-step encoder. A gating mechanism serves as a robust and straightforward means to amalgamate the outputs of the two encoder towers. In our self-attention multi-head Transformer model, query, key, and value matrices play a crucial role in determining the level of attention each part of the input sequence should receive. These matrices serve to identify and weigh the importance of specific patterns within the sequence, enabling the model to focus on critical details during prediction).
2.2.1. Normalization
The self-attention mechanism can be conceptualized as a procedure consisting of two
stages. Initially, a normalized dot product is computed among all pairs of input vectors
present in a specific input sequence. This normalization is accomplished through the
application of the softmax operator, which can be expressed as:
$$\omega_{ij} = \mathrm{softmax}\left(x_i^{T} x_j\right) = \frac{e^{x_i^{T} x_j}}{\sum_{k} e^{x_i^{T} x_k}} \qquad (1)$$
where $x_i$ denotes the $i$-th input segment, $\sum_{j=1}^{n} \omega_{ij} = 1$, and $1 \le i, j \le n$.

In the second stage, a new representation $z_i$ is computed for each input segment $x_i$. This representation is a weighted sum of all segments $\{x_j\}_{j=1}^{n}$ in the input, using the weights from Eq. (1):
$$z_i = \sum_{j=1}^{n} \omega_{ij} x_j, \quad \forall\, 1 \le i \le n \qquad (2)$$
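The two stages in Eqs. (1) and (2) can be sketched in NumPy as a parameter-free self-attention over a short sequence. This is a minimal illustration assuming the inputs are already normalized vectors; it has no learned projections (those enter in Eq. (5)), and the function names are ours rather than part of BERTtery.

```python
import numpy as np


def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    a = a - a.max(axis=axis, keepdims=True)  # shifting by the max leaves the result unchanged
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def basic_self_attention(x: np.ndarray):
    """Parameter-free self-attention following Eqs. (1)-(2).

    x: array of shape (n, d); row i is the input segment x_i.
    Returns z with shape (n, d) and the weight matrix omega with shape (n, n).
    """
    scores = x @ x.T                 # scores[i, j] = x_i^T x_j
    omega = softmax(scores, axis=1)  # Eq. (1): normalize over j, so each row sums to 1
    z = omega @ x                    # Eq. (2): z_i = sum_j omega_ij x_j
    return z, omega


# Toy example: 5 time steps of normalized (current, voltage, temperature) values
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
z, omega = basic_self_attention(x)
print(z.shape, omega.sum(axis=1))
```

Each output row is a convex combination of the input rows, which is why the weights in each row of `omega` sum to one.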
2.2.2. Embedding
To encode the position of the battery operational profiles in the sequence, we used both a positional embedding and an operational (charging and discharging) embedding. The operational embedding is designed to produce a sequence-level representation of battery data under different energy storage mechanisms. A sine-cosine encoding method was used in this study for both absolute and relative positional embeddings:
$$PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_x}\right) \qquad (3)$$
$$PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_x}\right) \qquad (4)$$
where $pos$ is the position in the sequence, $d_x$ is the embedding dimension, and $2i$ and $2i+1$ index the even and odd dimensions, respectively. This position embedding can reflect both absolute and relative position information of the cell states.
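Eqs. (3) and (4) translate directly into code. The sketch below builds the full sine-cosine table for a sequence; the choice of d_x = 64 in the usage example mirrors the model dimension reported in Section 2.2.5, while adding the table to a projected input (as in the original Transformer) is our assumption about how it is combined.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_x: int) -> np.ndarray:
    """Eqs. (3)-(4): PE[pos, 2i] = sin(pos / 10000^(2i/d_x)), PE[pos, 2i+1] = cos(same angle)."""
    pe = np.zeros((seq_len, d_x))
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    two_i = np.arange(0, d_x, 2)[None, :]          # even dimension indices 0, 2, 4, ...
    angle = pos / np.power(10000.0, two_i / d_x)   # (seq_len, ceil(d_x / 2))
    pe[:, 0::2] = np.sin(angle)                    # even dimensions
    pe[:, 1::2] = np.cos(angle[:, : d_x // 2])     # odd dimensions (also handles odd d_x)
    return pe


pe = sinusoidal_positional_encoding(seq_len=100, d_x=64)
# A projected input x_proj of shape (100, 64) would then be combined as x_proj + pe.
print(pe.shape)
```

Because each dimension pair rotates at a fixed frequency, the encoding of position pos + k is a linear function of the encoding of pos, which is what lets the model reason about relative offsets.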
(i) Positional Encoding
BERTtery uses positional encoding to mark the position of each token in the sequence. In electrochemical systems, positional encoding plays an important role, because the underlying mechanisms are reflected in subtle variations of the measured parameters (current, voltage, and temperature) over long length and time scales. Over time, the charge storage behavior of a cell can change significantly under irregular cycling patterns and varying operating conditions. Adding time information to the input embedding therefore improves the performance of the learning algorithm by helping it capture long-range dependencies and interactions in the sequential data.
(ii) Operational Encoding
In addition to positional encoding, a battery operational (working condition) encoding was introduced to improve the performance of the learning algorithm, and dropout is then applied to enhance generalization and robustness [49]. Batteries in daily use undergo distinct operating conditions, such as discharging, charging, and resting/idle periods, and operational encoding helps the learning algorithm predict battery behavior accurately across them. These operating conditions significantly alter the cells' behavior and underlying physical (thermodynamic and kinetic) properties, so each state calls for a distinct model interpretation. By integrating operational encoding, we account for the differing behaviors during these states and provide an enriched representation of the input data.
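The section does not spell out how the operational state is encoded. One plausible sketch, assuming the state is inferred from the sign of the current (positive meaning charge) with a hypothetical rest tolerance, maps each time step to a vector from a three-row lookup table and applies inverted dropout with p = 0.1 (the dropout rate listed in Section 2.2.5). The names `operating_mode` and `operational_embedding` and the mode ids are hypothetical.

```python
import numpy as np

# Hypothetical mode ids; the paper names the states but not an encoding scheme.
DISCHARGE, CHARGE, REST = 0, 1, 2


def operating_mode(current: np.ndarray, rest_tol: float = 0.05) -> np.ndarray:
    """Label each time step from the current sign.

    Assumes positive current = charging; |I| <= rest_tol (A) counts as rest/idle.
    """
    mode = np.full(current.shape, REST, dtype=int)
    mode[current > rest_tol] = CHARGE
    mode[current < -rest_tol] = DISCHARGE
    return mode


def operational_embedding(mode, table, drop_p=0.1, rng=None, training=True):
    """Look up one d-dimensional vector per time step, then apply inverted dropout."""
    emb = table[mode]  # (seq_len, d)
    if training and drop_p > 0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(emb.shape) >= drop_p
        emb = emb * keep / (1.0 - drop_p)  # rescale so the expected value is unchanged
    return emb


rng = np.random.default_rng(0)
table = rng.normal(scale=0.02, size=(3, 64))            # stands in for a learned (3, d) matrix
current = np.array([-2.0, -1.5, 0.0, 0.01, 1.2, 3.0])   # made-up current samples in A
mode = operating_mode(current)
emb = operational_embedding(mode, table, training=False)
print(mode, emb.shape)
```

The resulting per-step vectors have the same width as the positional encoding, so the two can be summed with the projected input.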
2.2.3. Two-Tower Structure

A two-tower architecture with channel-wise and temporal-wise encoders was developed for multivariate time-series regression. Each encoder block is composed of a multi-head self-attention layer and a feed-forward network connected back-to-back, with a residual connection and a normalization layer around each of the two sub-layers. Residual connections are a simple and effective technique for stabilizing training and improving the accuracy of deep neural networks, and layer normalization speeds up convergence, substantially reducing training time. Compared with a traditional single-tower architecture, the two-tower model can capture finer electrochemical parameter changes and hidden representations, which may reflect an early stage of aging or the open-circuit relaxation process. Capturing both step-wise (temporal) and channel-wise (spatial) information provides powerful tools for learning the evolution of non-linear, multiscale, and multiphysics systems with inhomogeneous degradation behavior, considerably advancing the capabilities of SOC estimation under different aging and operating conditions.
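The block structure described above (self-attention and a feed-forward network, each wrapped in a residual connection followed by normalization) can be sketched as follows. This assumes the post-normalization ordering of the original Transformer, a single attention head for brevity, and a hypothetical feed-forward width of 128; these details are not specified in this section.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (learnable gain/bias omitted)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def encoder_block(x, p):
    """One post-LN encoder block. x: (seq_len, d); p: dict of weights. Single head for brevity."""
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v @ p["Wo"]
    h = layer_norm(x + attn)  # sub-layer 1: self-attention, then Add & Norm
    ffn = np.maximum(0.0, h @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]  # position-wise ReLU FFN
    return layer_norm(h + ffn)  # sub-layer 2: feed-forward, then Add & Norm


def init_params(d, d_ff, rng):
    """Random stand-ins for learned weights, scaled by 1/sqrt(fan-in)."""
    s = d ** -0.5
    return {
        "Wq": rng.normal(scale=s, size=(d, d)),
        "Wk": rng.normal(scale=s, size=(d, d)),
        "Wv": rng.normal(scale=s, size=(d, d)),
        "Wo": rng.normal(scale=s, size=(d, d)),
        "W1": rng.normal(scale=s, size=(d, d_ff)),
        "b1": np.zeros(d_ff),
        "W2": rng.normal(scale=d_ff ** -0.5, size=(d_ff, d)),
        "b2": np.zeros(d),
    }


rng = np.random.default_rng(0)
x = rng.normal(size=(20, 64))                           # 20 tokens, model dimension 64
layers = [init_params(64, 128, rng) for _ in range(4)]  # four stacked blocks (Section 2.2.5)
h = x
for p in layers:
    h = encoder_block(h, p)
print(h.shape)
```

Because the residual path carries the input around each sub-layer unchanged, the stack can be deepened without the signal vanishing, which is the stabilizing effect the text refers to.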
The core of the Transformer neural network is the multi-head self-attention mechanism, which is made up of several scaled dot-product attention functions (heads) and enables the model to capture significant information in a sequence.
The query $q_i$, key $k_i$, and value $v_i$ vectors corresponding to input $x_i$ are obtained as:
$$q_i = W_q x_i, \quad k_i = W_k x_i, \quad v_i = W_v x_i \qquad (5)$$
where $W_q, W_k \in \mathbb{R}^{d_k \times d}$ and $W_v \in \mathbb{R}^{d_v \times d}$ are learnable weight matrices. The output vectors $\{z_i\}_{i=1}^{n}$ are then computed as:
$$z_i = \sum_{j} \mathrm{softmax}\left(q_i^{T} k_j\right) v_j \qquad (6)$$
Note that the weight assigned to the value vector $v_j$ depends on the similarity between the query vector $q_i$ at the $i$-th position and the key vector $k_j$ at the $j$-th position. The magnitude of the dot product tends to grow with the dimension of the query and key vectors. Because the softmax function saturates for large inputs, the scores are divided by the square root of the query/key dimension, denoted by $d_q$:
$$z_i = \sum_{j} \mathrm{softmax}\left(\frac{q_i^{T} k_j}{\sqrt{d_q}}\right) v_j \qquad (7)$$
In matrix form, the self-attention mechanism can be succinctly expressed as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \qquad (8)$$
where $Q$, $K$, and $V$ are the query, key, and value matrices, respectively, and $d_k$ is the dimension of the key vectors (equal to $d_q$ in Eq. (7)).
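Eqs. (5) through (8) can be checked against each other numerically. The sketch below, using random stand-in weights and dimensions of our choosing, projects inputs with column-vector weights as in Eq. (5), evaluates the matrix form of Eq. (8), and confirms that one row matches the explicit per-position sum of Eq. (7).

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Eq. (8): softmax(Q K^T / sqrt(d_k)) V, with q_i, k_i, v_i stored as rows."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights


rng = np.random.default_rng(0)
n, d, d_k, d_v = 6, 8, 4, 4
X = rng.normal(size=(n, d))        # row i is the input x_i
W_q = rng.normal(size=(d_k, d))    # Eq. (5): q_i = W_q x_i
W_k = rng.normal(size=(d_k, d))
W_v = rng.normal(size=(d_v, d))
Q, K, V = X @ W_q.T, X @ W_k.T, X @ W_v.T  # row i of Q equals (W_q x_i)^T

Z, A = scaled_dot_product_attention(Q, K, V)

# Cross-check Eq. (7) for a single position i as an explicit sum over j
i = 2
s = np.array([Q[i] @ K[j] / np.sqrt(d_k) for j in range(n)])
w = np.exp(s - s.max())
w /= w.sum()
z_i = sum(w[j] * V[j] for j in range(n))
print(np.allclose(Z[i], z_i))
```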
Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions; with a single attention head, averaging inhibits this.
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O} \qquad (9)$$
$$\text{where } \mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q}, K W_i^{K}, V W_i^{V}\right) \qquad (10)$$
and $h$ is the number of attention heads.
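A compact sketch of Eqs. (9) and (10), using the eight heads and model dimension of 64 listed in Section 2.2.5. Splitting the model dimension evenly across heads (d_head = 64 / 8 = 8) is the standard convention and an assumption here; the weights are random stand-ins.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Eq. (8): scaled dot-product attention."""
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V


def multi_head_attention(Q, K, V, W_Q, W_K, W_V, W_O):
    """Eqs. (9)-(10).

    W_Q, W_K, W_V: lists of per-head projections, each of shape (d_model, d_head).
    W_O: output projection of shape (h * d_head, d_model).
    """
    heads = [attention(Q @ wq, K @ wk, V @ wv) for wq, wk, wv in zip(W_Q, W_K, W_V)]
    return np.concatenate(heads, axis=-1) @ W_O


rng = np.random.default_rng(0)
n, d_model, h = 10, 64, 8
d_head = d_model // h  # assumption: even split of the model dimension across heads
X = rng.normal(size=(n, d_model))
W_Q = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_head)) for _ in range(h)]
W_K = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_head)) for _ in range(h)]
W_V = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_head)) for _ in range(h)]
W_O = rng.normal(scale=d_model ** -0.5, size=(h * d_head, d_model))
out = multi_head_attention(X, X, X, W_Q, W_K, W_V, W_O)  # self-attention: Q = K = V = X
print(out.shape)
```

Each head sees its own low-dimensional projection of the sequence, so different heads can weight different time steps without their attention patterns being averaged into one.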

(i) Temporal-Wise Encoder

The two-tower architecture, featuring a temporal self-attention encoder, is employed in this study for its ability to learn long-term dependencies in time-series data. This design is particularly advantageous for extracting implicit features across a broad spectrum of charging and discharging activities. Combining the self-attention mechanism with positional encoding not only reduces computational cost but also improves the analysis of the current data samples within the sequence. Furthermore, the dual-encoder design opens avenues for modeling temporal evolutionary patterns, allowing for precise estimation of the state of the multiphysics battery system and prediction of its future development. A notable strength of the Transformer model is its combination of stacked self-attention and point-wise feed-forward layers. This architectural choice lets the model recognize fine-scale features, increasing its prediction accuracy and operational efficiency.
(ii) Channel-Wise Encoder
In the two-tower architecture, channel-wise attention plays a crucial role in capturing channel features extracted along the temporal dimension. By calculating attention weights (scores), this mechanism amplifies the contribution of informative channels while diminishing the impact of less significant ones, yielding a more nuanced and accurate representation of the data. The channel-wise encoder uses masked multi-head attention to capture spatial correlations among both proximate and remote charging/discharging dynamics. Modeling these spatial dependencies also offers the potential to broaden diagnostic techniques. Because this process takes into account the continuity and periodicity of time-series data, it can provide deeper insights into the temporal patterns and variations inherent in the battery's performance, giving a more comprehensive and dynamic understanding of battery operation.
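How the two towers tokenize the same input is not detailed in this section. A minimal sketch, assuming the step-wise/channel-wise split used in Gated Transformer Networks: the temporal tower treats each time step as a token (embedding its current, voltage, and temperature readings), while the channel tower transposes the window so that each channel's history becomes a token. The resulting attention maps have the shapes of the step-wise and channel-wise maps in Figure 4b,c. Window length 30 and width 16 are hypothetical, weights are random stand-ins, and the attention mask mentioned above is omitted.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def attention_map(tokens, W_q, W_k):
    """Row-normalized self-attention weights between the tokens of one tower."""
    q, k = tokens @ W_q, tokens @ W_k
    return softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)


rng = np.random.default_rng(0)
T, C, d = 30, 3, 16  # window length, channels (current, voltage, temperature), model width
window = rng.normal(size=(T, C))


def proj(n_in, n_out):
    # random stand-in for a learned projection, scaled by 1/sqrt(fan-in)
    return rng.normal(scale=n_in ** -0.5, size=(n_in, n_out))


# Temporal-wise tower: one token per time step, embedding its C readings
step_tokens = window @ proj(C, d)        # (T, d)
# Channel-wise tower: one token per channel, embedding its T-sample history
channel_tokens = window.T @ proj(T, d)   # (C, d)

step_map = attention_map(step_tokens, proj(d, d), proj(d, d))        # (T, T), cf. Figure 4b
channel_map = attention_map(channel_tokens, proj(d, d), proj(d, d))  # (C, C), cf. Figure 4c
print(step_map.shape, channel_map.shape)
```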
2.2.4. Gating Mechanism

The gating mechanism is a practical and straightforward method for merging the outputs of the two encoder towers, so that the representations learned by both towers are integrated efficiently. For this purpose, a linear layer followed by a softmax operation (a normalized exponential function) was implemented. This arrangement functions like a multinomial logistic regression and generates the final prediction results. These techniques streamline the prediction process and are intended to improve the accuracy and reliability of the results, because the model can draw on the full range of information captured by both encoders, leading to more robust and precise estimations.
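A minimal sketch of such a gate, modeled on the gating used in Gated Transformer Networks for multivariate time series (an assumption; the exact layer shapes in BERTtery are not given here). The flattened outputs of the two towers are concatenated, a linear layer plus softmax produces two weights that sum to one, and each tower's features are scaled by its weight. The final `W_out` regression head, mapping the gated features to one SOC value per time step, is hypothetical.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def gate_towers(h_step, h_channel, W_g, b_g):
    """Merge the two tower outputs with a learned two-way softmax gate.

    h_step: (T, d) temporal-tower output; h_channel: (C, d) channel-tower output.
    W_g: (T*d + C*d, 2); b_g: (2,). Returns the gated feature vector and the gate weights.
    """
    s, c = h_step.ravel(), h_channel.ravel()
    g = softmax(np.concatenate([s, c]) @ W_g + b_g)  # linear layer + softmax: two weights summing to 1
    return np.concatenate([g[0] * s, g[1] * c]), g


rng = np.random.default_rng(0)
T, C, d = 30, 3, 16
h_step = rng.normal(size=(T, d))
h_channel = rng.normal(size=(C, d))
n_feat = T * d + C * d
W_g = rng.normal(scale=n_feat ** -0.5, size=(n_feat, 2))
b_g = np.zeros(2)
features, g = gate_towers(h_step, h_channel, W_g, b_g)

# Hypothetical regression head: one SOC value per time step in the window
W_out = rng.normal(scale=n_feat ** -0.5, size=(n_feat, T))
soc_pred = features @ W_out
print(g, soc_pred.shape)
```

The gate lets the model learn, from the data, how much to trust each tower rather than fixing their relative contribution in advance.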
2.2.5. Hyperparameter Determination

In the present research, we conducted a thorough exploration of how various hyperparameters affect SOC estimation for large-scale, real-world EV batteries (Figure 3). The variables we studied include the number of attention heads, the embedding size, and the number of layers in the self-attention Transformer model. Each of these affects the model's capacity to learn a variety of attention patterns and complex representations. Other key hyperparameters, such as the learning rate, which determines the speed of learning, and the positional encoding method, which affects how temporal relationships are represented, were also considered. Further variables, such as the dropout rate, batch size, and weight initialization technique, were evaluated for their influence on the learning performance and efficiency of the model. These hyperparameters were fine-tuned with careful consideration of model performance, computational expenditure, and the specific requirements of our task. Below are the details of our chosen configurations:
(i) The model dimension in both the channel-wise and temporal-wise encoders was set at 64, enabling it to capture rich feature information.
(ii) We used four layers in both the channel-wise and temporal-wise encoders, with a batch size of 384, balancing learning capability and computational cost.
(iii) The multi-head attention in each layer was set to eight heads, allowing the model to focus on multiple input features simultaneously.
(iv) We conducted 1300 training epochs to ensure thorough learning.
(v) A dropout rate of 0.1 was applied as a regularization technique to prevent the model from overfitting.
(vi) We employed the Adam optimizer for loss minimization, setting the initial learning rate at 2 for faster convergence.
(vii) Gradient clipping with a value of 1 was used to prevent the gradient values from becoming too large, known as the exploding gradients problem.
(viii) A weight decay rate of 0.0001 was chosen to provide additional regularization.
(ix) Batch normalization was implemented to accelerate learning and stabilize the neural network.
Figure 3. Hyperparameters of self-attention Transformer model.
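The configuration listed above can be collected into a single immutable object, which keeps experiments reproducible and makes the head-size arithmetic explicit. This is a sketch with illustrative field names, not the authors' code. The learning rate is left unset because its exact value is not recoverable from the text, and whether the clipping threshold applies to the gradient norm or to individual values is not stated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BERTteryConfig:
    """Hyperparameters from Section 2.2.5 (field names are illustrative)."""
    d_model: int = 64                      # (i) model dimension of both encoders
    num_layers: int = 4                    # (ii) layers per encoder tower
    batch_size: int = 384                  # (ii)
    num_heads: int = 8                     # (iii) attention heads per layer
    epochs: int = 1300                     # (iv)
    dropout: float = 0.1                   # (v)
    optimizer: str = "adam"                # (vi)
    learning_rate: Optional[float] = None  # (vi) exact value not recoverable from the text
    grad_clip: float = 1.0                 # (vii) norm- vs value-clipping not specified
    weight_decay: float = 1e-4             # (viii)
    batch_norm: bool = True                # (ix)

    def __post_init__(self) -> None:
        # Standard multi-head attention splits d_model evenly across heads (an assumption here).
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")

    @property
    def d_head(self) -> int:
        return self.d_model // self.num_heads


cfg = BERTteryConfig()
print(cfg.d_head)
```

With 64 dimensions and eight heads, each head attends in an 8-dimensional subspace.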
3. Results
3.1. Model Performance
We leveraged battery time-series charging–discharging data by pre-training a two-tower Transformer encoder to extract dense vector representations of the multivariate time series. In this study, we initially pre-trained the Transformer model using observational data from tens of cells that were randomly collected throughout their operational lifetime. These data, recorded by onboard sensors at a sampling interval of 10 s, were input into the Transformer model. The model's output, in turn, is the corresponding SOC estimate for each of these sampling points. The proposed method can be applied directly to transient data while preserving prediction accuracy, which obviates the need for a steady-state detector and allows for very large time steps with high accuracy. The Transformer architecture is designed to handle large data volumes, dynamic loading operations, and high correlations between the data points in each sliding window, taking into account the high-dimensional stochastic dynamics and probability distributions of industry-scale time-series data in physical problems. We found that the Transformer model provides efficient, easy-to-implement, meshless implementations for the kind of pattern identification associated with persistently positive connectivity between these regions across the sliding window (Figure 4).
Figure 4. Self-attention Transformer model of non-equilibrium electrochemical system characteristics. (a) Sliding window for monitoring and analyzing dynamic voltage, current, and temperature. (b,c) Attention maps for the step-wise and channel-wise encoders, respectively.
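The sliding-window input described above can be sketched as follows, assuming the 10 s samples are arranged as a (T, 3) array of current, voltage, and temperature with one SOC label per sample. The window length of 30 samples (5 min) and unit stride are hypothetical; the section does not state the values used.

```python
import numpy as np


def sliding_windows(signals: np.ndarray, soc: np.ndarray, window: int, stride: int = 1):
    """Cut a (T, C) measurement array and its (T,) SOC labels into overlapping windows.

    Returns X with shape (num_windows, window, C) and y with shape (num_windows, window),
    so every sampling point in a window carries its own SOC target.
    """
    T = signals.shape[0]
    if soc.shape[0] != T:
        raise ValueError("signals and soc must have the same length")
    if T < window:
        raise ValueError("sequence is shorter than the window")
    starts = range(0, T - window + 1, stride)
    X = np.stack([signals[s:s + window] for s in starts])
    y = np.stack([soc[s:s + window] for s in starts])
    return X, y


# One hour of made-up data at the 10 s sampling interval: 360 samples x 3 channels
rng = np.random.default_rng(0)
signals = rng.normal(size=(360, 3))   # columns: current, voltage, temperature
soc = np.linspace(90.0, 60.0, 360)    # placeholder SOC trajectory in %
X, y = sliding_windows(signals, soc, window=30, stride=1)  # 30 samples = 5 min (hypothetical)
print(X.shape, y.shape)
```

Overlapping windows (stride 1) give many training examples per trace; a stride equal to the window length would instead produce non-overlapping segments.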
The attention mechanism is a fundamental component of the Transformer model that lends it the power to handle sequences of data. The attention mapping is typically performed through what is known as multi-head attention. This mechanism allows the model to focus on different parts of the input sequence for each element in the output sequence. It provides a weighted combination of all input positions for each output position, wherein the weights denote the relevance, or attention, the model pays to each input element when generating a specific output element. Multi-head attention calculates the compatibility or similarity score between different positions in the sequence through a dot product, which is then scaled and passed through a softmax function to yield the attention weights. These weights are then used to create a weighted sum of the input values, allowing the model to focus on certain inputs while generating specific outputs.
Current rates, temperature, and aging conditions are three important factors for validating the generalization performance of an SOC estimation model. Therefore, a wide range of cell aging conditions and operating voltage/current/temperature windows were adopted to train and test the data-driven model, improving accuracy and enhancing generalization. In this context, our investigation extends across a total of 5 Li-ion cells and 1 large-scale battery pack, as presented in Table 2. We randomly divided the model development dataset into two distinct sets, namely the training and testing sets. These sets feature random real-world application scenarios, adding a layer of practical complexity to the investigation. The estimation results are summarized in Table 3.

Table 3. The test errors over the cell and pack dataset.
Datasets                   RMSE     APE     MAE       Operating Conditions
Cell_1                     0.4857   0.59%   1.6507%   dynamic temperatures, −4 °C to 4 °C
Cell_2                     0.4356   0.71%   1.3208%   dynamic temperatures, 0 °C to 35 °C
Cell_3                     0.4047   0.67%   1.1275%   aging conditions, 100% SOH
Cell_4                     0.4046   0.60%   0.9461%   aging conditions, 90% SOH
Cell_5                     0.4218   0.41%   1.0836%   aging conditions, 80% SOH
Battery pack, Cell_V_max   0.4033   0.95%   1.4876%   Pack level, 20 °C to 25 °C, ~97.5% SOH
Battery pack, Cell_V_min   0.4497   0.88%   1.7525%   Pack level, 20 °C to 25 °C, ~97.5% SOH
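The error metrics in Table 3 are not defined in this section. As a sketch assuming the standard definitions, RMSE and MAE over a sequence of SOC estimates can be computed as below; APE is left out because its exact definition (for example, which quantity the percentage is taken relative to) is not given here. The SOC values in the example are made up.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(err)))


# Made-up reference and estimated SOC values in %
soc_true = [50.0, 60.0, 70.0, 80.0]
soc_est = [51.0, 59.0, 72.0, 80.0]
print(rmse(soc_true, soc_est), mae(soc_true, soc_est))
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE when both are computed on the same errors in the same units.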
3.1.1. Cell Level SOC Estimation at Dynamic Temperatures
A prime objective in developing new algorithms is that they function robustly on field data, where missing or noisy data, outliers, and other inconsistencies can drastically affect model performance. For predictive accuracy and robust estimation, temperature uncertainty, scattered sensor measurements, and sensor drift emerge as significant considerations when designing model architectures and novel training algorithms. This holds particularly true for real-world applications, where practical constraints and dynamic environmental factors come into play.
We trained and tested the proposed Transformer algorithm in two dynamic operating temperature windows to examine its performance under varying conditions; Figures 5 and 6 show these temperature windows. This process not only tests the robustness of the algorithm against temperature fluctuations but also gauges its adaptability and the consistency of its performance under dynamic conditions. The results show the robustness of the BERTtery model and its ability to handle imperfect data and temperature uncertainties efficiently. The validation process thus supports the BERTtery model's resilience and adaptability, affirming its applicability and potential in practical, real-world scenarios.
Figure 5. SOC estimation at the operating temperature window of −4 to 4 °C for Cell_1. (a) Voltage profile. (b) Current profile. (c) Temperature profile. (d) SOC estimation. (e) Estimation error.
Figure 6. SOC estimation at the operating temperature window of 0 to 35 °C for Cell_2. (a) Voltage profile. (b) Current profile. (c) Temperature profile. (d) SOC estimation. (e) Estimation error.
3.1.2. Cell Level SOC Estimation at Different Aging Conditions
Aging is an intrinsic property of lithium-ion batteries that significantly influences their performance and lifespan. Degradation phenomena, such as the loss of lithium inventory (LLI) and the loss of active material (LAM), make SOC estimation considerably more challenging for batteries under varying aging conditions.
The Transformermodel
The model leverages additional information gleaned from the relation-
TheTransformer
Transformer modelleverages leverages additional
additional information
information gleaned
gleanedfrom fromthe the
relationship
relation-
ship
betweenbetweenSOC SOC SOC
and andand
input input data
datadata
across across different
different aging
aging conditions.
conditions. This ability to adapt
ship between input across different aging conditions.This Thisability
abilityto to adapt
adapt
to
to changes brought on by aging increases the model’s accuracy and its effectiveness in real-world scenarios. A battery is typically considered to have reached its end-of-life when its full charge capacity diminishes to 80% of the nominal value, a key threshold in battery manufacturing. Our training and testing cover this entire spectrum, allowing us to understand the performance of the BERTtery model in a range of scenarios reflecting the service life of batteries. This process is divided into three groups, each representing a different stage in the battery life, as illustrated in Figures 7–9. In essence, by evaluating the model’s performance under dynamic aging conditions, we examine an often overlooked but crucial aspect of battery SOC estimation. This helps ensure that our model remains robust, adaptable, and accurate across the full lifespan of a battery, thereby enhancing its practical applicability in real-world applications.
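The aging spectrum above can be made concrete by expressing SOH as the ratio of the present full-charge capacity to the nominal capacity, with end-of-life reached at 80%. The sketch below assumes that definition. The function names, the bin edges used to label the three aging stages (about 100%, 90%, and 80% SOH), and the example capacities are all hypothetical and are not taken from the study's data.

```python
def state_of_health(full_capacity_ah: float, nominal_capacity_ah: float) -> float:
    """SOH in percent: present full-charge capacity relative to the nominal capacity."""
    if nominal_capacity_ah <= 0:
        raise ValueError("nominal capacity must be positive")
    return 100.0 * full_capacity_ah / nominal_capacity_ah


def is_end_of_life(soh_percent: float, threshold: float = 80.0) -> bool:
    """End-of-life is reached once SOH drops to the threshold (80% by convention)."""
    return soh_percent <= threshold


def aging_group(soh_percent: float) -> str:
    """Hypothetical bins around the three aging stages tested (100%, 90%, 80% SOH)."""
    if soh_percent > 95.0:
        return "fresh (~100% SOH)"
    if soh_percent > 85.0:
        return "mid-life (~90% SOH)"
    return "near end-of-life (~80% SOH)"


if __name__ == "__main__":
    nominal = 150.0  # Ah, hypothetical nominal capacity
    for measured in (150.0, 135.0, 120.0):
        soh = state_of_health(measured, nominal)
        print(f"{measured:.0f} Ah -> SOH {soh:.1f}%, {aging_group(soh)}, EOL: {is_end_of_life(soh)}")
```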
Figure 7. SOC estimation at an aging condition of 100% SOH for Cell_3. (a) Voltage profile. (b) Current profile. (c) Temperature profile. (d) SOC estimation. (e) Estimation error.

Figure 8. SOC estimation at an aging condition of 90% SOH for Cell_4. (a) Voltage profile. (b) Current profile. (c) Temperature profile. (d) SOC estimation. (e) Estimation error.

Figure 9. SOC estimation at an aging condition of 80% SOH for Cell_5. (a) Voltage profile. (b) Current profile. (c) Temperature profile. (d) SOC estimation. (e) Estimation error.
3.1.3. SOC Estimation at Pack Level
The intricate operation of a lithium-ion battery rests upon a multitude of factors, such as diffusion pathways, electron/ion transport, various phase transformations, reversible and irreversible electrochemical redox reactions, charge-transfer reactions, and several material-dependent elements. However, these processes become far more complex in practical applications, where hundreds or even thousands of lithium-ion cells are interconnected in a series-parallel architecture to provide sufficient power and energy. Pack design modifications, environmental conditions, and loading scenarios are a few among many factors that can significantly impact the overall performance of the battery system. Ambient temperature variations, cell packaging alterations, and batch-to-batch and cell-to-cell inconsistencies originating from differing synthesis conditions, electrolyte wetting procedures, and mechanical properties can lead to substantial deviations in the predicted outcomes. These complexities emphasize the importance of evaluating predictive models in practical applications. After all, it is the real-world efficacy of these models that determines their value. Accordingly, we further scrutinize the Transformer model’s performance by applying it to one large-scale battery pack operating under dynamic conditions. Figure 10 presents these tests. To present the estimation concisely, only the cells with the maximum and minimum voltage are depicted in the plots.
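Selecting the cells to plot can be done over the whole record or at each time step. The sketch below assumes the pack log is a NumPy array of shape `(time_steps, n_cells)` holding cell voltages. It ranks cells by their mean voltage over the record, which is only one possible criterion; the paper does not state which one it used. The function names and the voltage values are hypothetical.

```python
import numpy as np


def extreme_voltage_cells(cell_voltages: np.ndarray) -> tuple[int, int]:
    """Return (index of highest-voltage cell, index of lowest-voltage cell).

    cell_voltages has shape (time_steps, n_cells); cells are ranked by their
    mean voltage over the record (an assumed selection criterion).
    """
    if cell_voltages.ndim != 2:
        raise ValueError("expected a (time_steps, n_cells) array")
    mean_v = cell_voltages.mean(axis=0)
    return int(np.argmax(mean_v)), int(np.argmin(mean_v))


def per_step_extremes(cell_voltages: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-time-step indices of the max- and min-voltage cells (these may change over time)."""
    return cell_voltages.argmax(axis=1), cell_voltages.argmin(axis=1)


if __name__ == "__main__":
    v = np.array([
        [3.70, 3.72, 3.68],
        [3.65, 3.69, 3.62],
        [3.60, 3.66, 3.61],
    ])  # hypothetical 3-cell record, volts
    print(extreme_voltage_cells(v))
    print(per_step_extremes(v))
```

In this toy record, cell 2 has the lowest mean voltage, but cell 0 is the lowest at the last step. Whether to plot fixed cells or the per-step extremes therefore changes what the figure shows.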
Figure 10. SOC estimation at the pack level. (a) Voltage profile. (b) Current profile. (c) Temperature profile. (d) SOC estimation. (e) Estimation error.
In this respect, the validation process goes beyond mere algorithmic scrutiny and extends into a comprehensive examination of the model’s adaptability to intricate, multifactorial, and dynamic conditions. As the reliance on lithium-ion batteries in practical applications continues to increase, the need for sophisticated, robust, and reliable predictive models grows correspondingly. It is at this critical juncture of theoretical models and practical applications that the true value of a predictive model is ascertained, ultimately contributing to the continuous evolution and optimization of battery technology.

3.2. Model Training and Evaluation
Numerous stochastic processes are involved in the instantiation of deep learning models. All experiments were run with a predetermined seed value to guarantee the uniformity and repeatability of the results. Unlabeled input sequence vectors were used in the pre-training stage to train the model. The metrics used in the loss function and model evaluation are described as follows.

3.2.1. Loss Function
The Transformer model was trained using an end-to-end approach, and the choice of loss function is crucial in guiding this process. The loss function quantifies how far the model’s predictions deviate from the actual values and serves as the criterion that the learning algorithm seeks to minimize. The mean squared error (MSE) in regression problems can be expressed as:

$$\mathcal{L}_{\mathrm{MSE}}(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2} \qquad (11)$$

where $y_i$ and $\hat{y}_i$ are the observed and estimated values, respectively, of the $i$-th sample, and $N$ is the total number of samples in the dataset.
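Eq. (11) and its gradient, which gradient-based optimizers consume, take only a few lines to write. The following is a minimal NumPy sketch for illustration, not the authors' training code, and the example SOC values are hypothetical.

```python
import numpy as np


def mse_loss(y_true, y_pred) -> float:
    """Eq. (11): L_MSE = (1/N) * sum_i (y_i - y_hat_i)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))


def mse_grad(y_true, y_pred) -> np.ndarray:
    """Gradient of Eq. (11) with respect to the predictions: (2/N) * (y_hat - y)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 2.0 * (y_pred - y_true) / y_true.size


if __name__ == "__main__":
    soc_true = [0.50, 0.60, 0.70]  # hypothetical SOC labels (fractions)
    soc_pred = [0.50, 0.70, 0.40]  # hypothetical model outputs
    print(mse_loss(soc_true, soc_pred))  # ≈ 0.0333 (errors 0, 0.1, 0.3 squared and averaged)
    print(mse_grad(soc_true, soc_pred))
```

The 0.3 error contributes nine times as much to the loss as the 0.1 error. This is the "amplifying larger discrepancies" property of squaring.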
Mean squared error (MSE) is frequently chosen for regression tasks, mainly due to its simplicity, its computational efficiency, and its emphasis on larger discrepancies, which squaring amplifies. It is differentiable, which is advantageous for gradient-based optimization methods such as gradient descent, and it is a common yardstick for gauging the performance of regression models. Here, MSE quantifies the deviation between the predicted SOC and the actual values. To minimize this loss, the Adam optimizer [50] was used with a user-defined learning rate. Adam adapts the step size for each model parameter during training, which typically yields smoother and more efficient convergence.
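The sketch below combines the three training ingredients named in this section: a fixed random seed, the MSE objective, and the Adam update rule (bias-corrected first and second moment estimates with a per-parameter adaptive step). It is written in plain NumPy for a toy linear model. The model, the data, the function name, and the hyperparameter values (learning rate, β1, β2, ε, step count) are assumptions for illustration and do not reflect the BERTtery configuration. In practice, a framework optimizer such as `torch.optim.Adam` or `tf.keras.optimizers.Adam` would be used instead.

```python
import numpy as np


def adam_fit_linear(X, y, lr=0.01, steps=2000, beta1=0.9, beta2=0.999, eps=1e-8, seed=42):
    """Fit y ≈ X @ w + b by minimizing the MSE with the Adam update rule.

    A toy stand-in for the Transformer: the point is the seeded initialization,
    the MSE objective, and the Adam update, not the model itself.
    """
    rng = np.random.default_rng(seed)            # fixed seed -> repeatable initialization
    n, d = X.shape
    params = rng.normal(scale=0.1, size=d + 1)   # [w_1, ..., w_d, b]
    m = np.zeros_like(params)                    # first-moment (mean) estimate
    v = np.zeros_like(params)                    # second-moment (uncentered variance) estimate
    Xb = np.hstack([X, np.ones((n, 1))])         # append a bias column
    for t in range(1, steps + 1):
        residual = Xb @ params - y
        grad = 2.0 * Xb.T @ residual / n         # gradient of the MSE
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        params -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    final_mse = float(np.mean((Xb @ params - y) ** 2))
    return params, final_mse


if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    X = data_rng.uniform(size=(200, 3))          # hypothetical normalized features
    y = X @ np.array([0.3, -0.2, 0.1]) + 0.5     # hypothetical noise-free target
    _, mse_before = adam_fit_linear(X, y, steps=0)
    params, mse_after = adam_fit_linear(X, y)
    print(mse_before, mse_after, params)
```

Running `adam_fit_linear` twice with the same `seed` returns identical parameters. This is the repeatability that a predetermined seed is meant to provide; GPU kernels in real frameworks may need additional determinism settings.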
3.2.2. Evaluation Metrics

In this study, three metrics were adopted to evaluate the performance of the SOC estimation model: the root mean square error (RMSE), the maximum absolute error (MAE), and the average percentage error (APE). Note that MAE here denotes the maximum, not the mean, absolute error. (a) RMSE is a widely used metric for evaluating prediction accuracy. It is the square root of the average of the squared differences between the predicted SOC values and the corresponding ground-truth values. (b) MAE is the maximum absolute difference between the predicted and true SOC values, and therefore indicates the worst-case prediction error. (c) APE quantifies the average percentage difference between the predicted and true SOC values, providing a measure of the relative error in the predictions.
These evaluation metrics were chosen to capture different aspects of the model’s performance. RMSE and MAE focus on absolute error, while APE provides insight into relative error. By considering all three metrics, researchers can assess the accuracy, worst-case behavior, and relative performance of the SOC estimation model, facilitating a comprehensive evaluation of its effectiveness in capturing battery SOC.
Let $y_i^{*}$ be the observed SOC, $\hat{y}_i^{*}$ the predicted SOC, and $n$ the total number of observations. RMSE is then calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{y}_i^{*} - y_i^{*}}{y_i^{*}}\right)^{2}} \qquad (12)$$
The maximum absolute error (MAE) is given by:

$$\mathrm{Error}_{\max} = \max_{1 \le i \le n} \left|\hat{y}_i^{*} - y_i^{*}\right| \times 100 \qquad (13)$$
The average percentage error (APE) is defined as:

$$\mathrm{Error}_{\mathrm{APE}} = \frac{1}{n}\sum_{i=1}^{n}\frac{\hat{y}_i^{*} - y_i^{*}}{y_i^{*}} \times 100 \qquad (14)$$
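Eqs. (12)–(14) translate directly into array code. The sketch follows the equations as written here. Eq. (12) divides each error by the observed SOC, so it behaves as a relative RMSE. The factor of 100 in Eq. (13) is read as converting fractional SOC to percentage points, which is an assumption. The function names and the example values are hypothetical.

```python
import numpy as np


def _as_arrays(y_true, y_pred):
    """Convert (observed, predicted) inputs to float arrays of equal shape."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    if y.shape != y_hat.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return y, y_hat


def relative_rmse(y_true, y_pred) -> float:
    """Eq. (12): sqrt((1/n) * sum(((y_hat - y) / y) ** 2)); observed SOC must be nonzero."""
    y, y_hat = _as_arrays(y_true, y_pred)
    return float(np.sqrt(np.mean(((y_hat - y) / y) ** 2)))


def max_abs_error(y_true, y_pred) -> float:
    """Eq. (13): max_i |y_hat_i - y_i| * 100, i.e. percentage points if SOC is a fraction."""
    y, y_hat = _as_arrays(y_true, y_pred)
    return float(np.max(np.abs(y_hat - y)) * 100.0)


def average_percentage_error(y_true, y_pred) -> float:
    """Eq. (14): (1/n) * sum((y_hat - y) / y) * 100; signed, so errors can cancel."""
    y, y_hat = _as_arrays(y_true, y_pred)
    return float(np.mean((y_hat - y) / y) * 100.0)


if __name__ == "__main__":
    soc_obs = [0.50, 0.80]   # hypothetical observed SOC
    soc_est = [0.51, 0.78]   # hypothetical estimates
    print(relative_rmse(soc_obs, soc_est))
    print(max_abs_error(soc_obs, soc_est))
    print(average_percentage_error(soc_obs, soc_est))
    # Symmetric over- and under-estimates cancel in the signed APE:
    print(average_percentage_error([0.5, 0.5], [0.55, 0.45]))
```

As written, Eq. (14) is signed. A +10% error and a −10% error therefore average to roughly zero, as the last line shows. Placing an absolute value inside the sum would give the mean absolute percentage error instead.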
3.3. Model Development and Applications

In this research, we used MATLAB for handling and manipulating the EV battery field data, whereas Python, together with open-source deep learning libraries such as TensorFlow and PyTorch, was employed for constructing the Transformer model. Our computing infrastructure consists of an Intel Core i7-4790K CPU clocked at 4.00 GHz, 32 GB of RAM, and an Nvidia GeForce RTX 3090 GPU. Machine learning models can considerably enhance predictive capability, especially for long-range spatial dependencies spanning various time scales, while reducing computational costs. However, the computational and storage limitations of current on-board microcontroller units (MCUs) necessitate offline model pretraining for optimal performance.
The model’s deployment comprises two phases: offline pretraining (training and testing) and online application. We used a private cloud for offline training, which had previously been used to develop multiple machine learning techniques for assessing battery state of health (SOH) and state of safety (SOS). Details of the data generation, methodology, and cloud framework can be found in [26,27,51]. The embedded software of the battery management system (BMS) is updated or calibrated using over-the-air (OTA) technology, enabling Software as a Service (SAAS) for connected EVs, as shown in Figure 11.
Figure 11. Over-the-air technology for the remote software update.
4. Discussion and Outlook
Machine learning methods, particularly deep learning [52], offer promising avenues for advancing our understanding and management of multiphysics and multiscale battery systems, pushing the boundaries of efficiency and accuracy. In the pursuit of sustainable and digitalized energy systems, these models play a pivotal role. They demonstrate superior capabilities in extracting meaningful insights from high-dimensional and complex data, thus facilitating accurate predictions and shorter training times. However, certain challenges require careful consideration. Real-life observational data, which often include time series, laboratory data, and field data, are frequently scarce, noisy, and not directly accessible for certain variables of interest. It is therefore crucial to leverage specialized network architectures or kernel-based regression networks that generalize well beyond limited data and adapt to dynamic operating conditions and different aging levels.
As battery technologies rapidly evolve with new cell chemistries and architectures, predictive models must adapt swiftly. Variability within the same battery chemistry, caused by factors such as manufacturing processes, cell packaging, and equipment differences, compounds this challenge. Models that can efficiently accommodate these variables while maintaining high accuracy will undoubtedly garner greater attention. Moreover, two families of approaches can improve model generalization and accuracy: domain adaptation techniques that learn from diverse data sources, and hybrid modeling approaches that combine physics-based and data-driven models. A contemporary manifestation of this hybrid learning paradigm is the development of physics-informed neural networks (PINNs) [53]. This nascent category of deep learning algorithms integrates data with mathematical models, including partial differential equations (PDEs), even in instances where specific physics principles are omitted or not factored in. The future of cloud battery management systems [54,55] relies heavily on tackling these challenges, which should lead to more precise and trustworthy predictive models across various applications. An in-depth investigation into the extent of generalization of these transformations is crucial, identifying the range of observations for which one model can reliably map to another. Equally critical is defining the boundaries of this generalization, that is, the point beyond which these models fail to transform or calibrate in relation to each other.
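One way to picture how a physics-informed network blends data with a differential equation is to add a penalty on the residual of a governing equation to the data-fit term. The sketch below assumes the simplest battery relation, the Coulomb-counting ODE dSOC/dt = −I/(3600·Q), with Q in Ah, I in A, and discharge current taken as positive. It discretizes this ODE with forward Euler steps. A full PINN would instead differentiate the network output with respect to time through automatic differentiation, and it could embed richer electrochemical PDEs. The functions, the weighting `lam`, and all example values (including the 10 s step and the 2 Ah cell) are hypothetical.

```python
import numpy as np


def coulomb_residual(soc_pred, current_a, dt_s, capacity_ah):
    """Per-step residual of a forward-Euler Coulomb-counting model.

    Assumes discharge current is positive, so SOC[k+1] - SOC[k] = -I[k] * dt / (3600 * Q).
    """
    soc_pred = np.asarray(soc_pred, dtype=float)
    current_a = np.asarray(current_a, dtype=float)
    predicted_step = np.diff(soc_pred)
    physics_step = -current_a[:-1] * dt_s / (3600.0 * capacity_ah)
    return predicted_step - physics_step


def physics_informed_loss(soc_pred, current_a, dt_s, capacity_ah, soc_obs=None, lam=1.0):
    """Hypothetical PINN-style objective: data MSE + lam * mean squared physics residual."""
    physics = float(np.mean(coulomb_residual(soc_pred, current_a, dt_s, capacity_ah) ** 2))
    data = 0.0
    if soc_obs is not None:  # labelled SOC is optional; physics alone still constrains the output
        diff = np.asarray(soc_pred, dtype=float) - np.asarray(soc_obs, dtype=float)
        data = float(np.mean(diff ** 2))
    return data + lam * physics


if __name__ == "__main__":
    dt, q = 10.0, 2.0                          # 10 s steps, 2 Ah cell (hypothetical)
    current = np.full(6, 2.0)                  # constant 2 A discharge
    consistent = 1.0 - np.arange(6) * 2.0 * dt / (3600.0 * q)
    frozen = np.ones(6)                        # ignores the drawn charge
    print(physics_informed_loss(consistent, current, dt, q))
    print(physics_informed_loss(frozen, current, dt, q))
```

The physics term penalizes predictions that violate charge conservation even where no SOC label exists. This property is what lets such models work with scarce labelled data.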
Addressing these challenges is paramount in advancing towards a more sustainable and digitalized energy landscape, where the role of machine learning in battery management becomes increasingly crucial. This continual evolution also paves the way for the convergence of digital technologies with sustainable energy systems, shaping the future of the energy sector.
5. Conclusions
Deep learning has revolutionized the field of machine learning by allowing computa-
tional models composed of multiple processing layers to learn data representations with
multiple levels of abstraction. By leveraging the backpropagation algorithm, deep learning
uncovers intricate structures in large datasets, indicating how a machine should adjust its
internal parameters to compute the representation in each layer from the representation in
the previous layer. Transformer models employ a multi-head attention mechanism, making
them proficient in handling time-series data. They simultaneously capture the context (both
prior and succeeding) of each sequence element. The use of multiple attention heads
facilitates the analysis of different representational subspaces, enhancing the probing of
diverse relevance aspects among input elements within time series data. This capability
allows machines to be fed with raw time-series data and to automatically discover the
representations and extract temporal features required for classification or regression. In
this study, we showcase a bespoke two-tower Transformer neural network technique for
predicting the SOC of lithium-ion batteries, using field data from practical electric vehicle
(EV) applications. This model leverages the multi-head self-attention mechanism, which is
instrumental in achieving precise predictions. This mechanism excels at discerning and em-
phasizing critical data points while simultaneously mitigating the influence of less relevant
information. This model’s unique advantage is its ability to be trained solely on battery
time-series data, effectively eliminating the need for laborious feature engineering. The
strength of this approach lies in its adaptability to the dynamic nature of battery data, aided
by a 10 s sampling interval, enabling the capture of battery states amidst fluctuating
operating conditions. The self-attention mechanism also allows the model to focus on
varying sequence lengths and dependencies, making it particularly effective in dealing
with the temporal nature of battery data. Furthermore, the two-tower architecture ensures
that the model can learn intricate correlations, maximizing the extraction of relevant infor-
mation. This study underscores the potential of integrating machine learning tools with
sparse sensor measurements, pushing the frontiers of battery state estimation in complex,
Author Contributions: Methodology, formal analysis, investigation, J.Z.; software, validation, J.Z.
and J.W.; writing—original draft, D.S. and J.Z.; writing—review and editing, J.Z., A.F.B. and H.Z.;
visualization, J.Z. and Z.W.; supervision, resources, project administration, A.F.B. and Y.L.; funding
acquisition, D.S. and Y.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Independent Innovation Projects of the Hubei Longzhong Laboratory (grant number 2022ZZ-24), the Central Government to Guide Local Science and Technology Development Fund Projects of Hubei Province (grant number 2022BGE267), the Basic Research Type of Science and Technology Planning Projects of Xiangyang City (grant number 2022ABH006759), and the Hubei Superior and Distinctive Discipline Group of “New Energy Vehicle and Smart Transportation” (grant number XKTD072023).
Data Availability Statement: The data could not be shared due to confidentiality.
Conflicts of Interest: The authors declare no competing interests.
Abbreviations
AKF Adaptive Kalman filter
APE Average percentage error
BERTtery Bidirectional encoder representations from transformers for batteries
CNN Convolutional neural network
DFFNN Deep feed-forward neural network
ECM Equivalent circuit model
EVs Electric vehicles
GRU Gated recurrent unit
LAM Loss of active material
LLI Loss of lithium inventory
LSTM Long short-term memory
MAE Maximum absolute error
MCU Microcontroller unit
MSE Mean squared error
OCV Open circuit voltage
OTA Over-the-air
P2D Pseudo-two-dimensional
PBM Physics-based model
PINNs Physics-informed neural networks
RMSE Root mean square error
RNNs Recurrent neural networks
SAAS Software as a service
SOC State of charge
SOH State of health
SOS State of safety
SPM Single particle model
References
1. Crabtree, G. The coming electric vehicle transformation. Science 2019, 366, 422–424. [CrossRef] [PubMed]
2. Global Plug-In Electric Car Sales in October 2022 Increased by 55%. Available online: https://insideevs.com/news/625651/global-plugin-electric-car-sales-october2022/ (accessed on 19 August 2022).
3. Mao, N.; Zhang, T.; Wang, Z.; Cai, Q. A systematic investigation of internal physical and chemical changes of lithium-ion batteries
during overcharge. J. Power Sources 2022, 518, 230767. [CrossRef]
4. Zhang, G.; Wei, X.; Chen, S.; Zhu, J.; Han, G.; Dai, H. Unlocking the thermal safety evolution of lithium-ion batteries under
shallow over-discharge. J. Power Sources 2022, 521, 230990. [CrossRef]
5. Dai, H.; Wei, X.; Sun, Z.; Wang, J.; Gu, W. Online cell SOC estimation of Li-ion battery packs using a dual time-scale Kalman
filtering for EV applications. Appl. Energy 2012, 95, 227–237. [CrossRef]
6. Tostado-Véliz, M.; Kamel, S.; Hasanien, H.M.; Arévalo, P.; Turky, R.A.; Jurado, F. A stochastic-interval model for optimal
scheduling of PV-assisted multi-mode charging stations. Energy 2022, 253, 124219. [CrossRef]
7. Ng, K.S.; Moo, C.S.; Chen, Y.P.; Hsieh, Y.C. Enhanced coulomb counting method for estimating state-of-charge and state-of-health
of lithium-ion batteries. Appl. Energy 2009, 86, 1506–1511. [CrossRef]
8. Wang, S.L.; Xiong, X.; Zou, C.Y.; Chen, L.; Jiang, C.; Xie, Y.X.; Stroe, D.I. An improved coulomb counting method based on dual
open-circuit voltage and real-time evaluation of battery dischargeable capacity considering temperature and battery aging. Int. J.
Energy Res. 2021, 45, 17609–17621. [CrossRef]
9. Lee, S.; Kim, J.; Lee, J.; Cho, B.H. State-of-charge and capacity estimation of lithium-ion battery using a new open-circuit voltage
versus state-of-charge. J. Power Sources 2008, 185, 1367–1373. [CrossRef]
10. Pattipati, B.; Balasingam, B.; Avvari, G.V.; Pattipati, K.R.; Bar-Shalom, Y. Open circuit voltage characterization of lithium-ion
batteries. J. Power Sources 2014, 269, 317–333. [CrossRef]
11. Peng, J.; Luo, J.; He, H.; Lu, B. An improved state of charge estimation method based on cubature Kalman filter for lithium-ion
batteries. Appl. Energy 2019, 253, 113520. [CrossRef]
12. Lim, K.; Bastawrous, H.A.; Duong, V.H.; See, K.W.; Zhang, P.; Dou, S.X. Fading Kalman filter-based real-time state of charge
estimation in LiFePO4 battery-powered electric vehicles. Appl. Energy 2016, 169, 40–48. [CrossRef]
13. Sepasi, S.; Ghorbani, R.; Liaw, B.Y. A novel on-board state-of-charge estimation method for aged Li-ion batteries based on model
adaptive extended Kalman filter. J. Power Sources 2014, 245, 337–344. [CrossRef]
14. Xiong, R.; Tian, J.; Shen, W.; Sun, F. A novel fractional order model for state of charge estimation in lithium-ion batteries. IEEE
Trans. Veh. Technol. 2018, 68, 4130–4139. [CrossRef]
15. Zhang, C.; Allafi, W.; Dinh, Q.; Ascencio, P.; Marco, J. Online estimation of battery equivalent circuit model parameters and state
of charge using decoupled least squares technique. Energy 2018, 142, 678–688. [CrossRef]
16. Meng, J.; Ricco, M.; Luo, G.; Swierczynski, M.; Stroe, D.I.; Stroe, A.I. An overview and comparison of online implementable SOC
estimation methods for lithium-ion battery. IEEE Trans. Ind. Appl. 2017, 54, 1583–1591. [CrossRef]
17. Marongiu, A.; Nußbaum, F.G.W.; Waag, W.; Garmendia, M.; Sauer, D.U. Comprehensive study of the influence of aging on
the hysteresis behavior of a lithium iron phosphate cathode-based lithium ion battery–An experimental investigation of the
hysteresis. Appl. Energy 2016, 171, 629–645. [CrossRef]
18. Fleckenstein, M.; Bohlen, O.; Roscher, M.A.; Bäker, B. Current density and state of charge inhomogeneities in Li-ion battery cells
with LiFePO4 as cathode material due to temperature gradients. J. Power Sources 2011, 196, 4769–4778. [CrossRef]
19. Fan, K.; Wan, Y.; Wang, Z.; Jiang, K. Time-efficient identification of lithium-ion battery temperature-dependent OCV-SOC curve
using multi-output Gaussian process. Energy 2023, 268, 126724. [CrossRef]
20. Shrivastava, P.; Soon, T.K.; Idris, M.Y.I.B.; Mekhilef, S. Overview of model-based online state-of-charge estimation using Kalman
filter family for lithium-ion batteries. Renew. Sustain. Energy Rev. 2019, 113, 109233. [CrossRef]
21. Ye, M.; Guo, H.; Xiong, R.; Yu, Q. A double-scale and adaptive particle filter-based online parameter and state of charge estimation
method for lithium-ion batteries. Energy 2018, 144, 789–799. [CrossRef]
22. Xiong, R.; Yu, Q.; Lin, C. A novel method to obtain the open circuit voltage for the state of charge of lithium ion batteries in
electric vehicles by using H infinity filter. Appl. Energy 2017, 207, 346–353. [CrossRef]
23. Lai, X.; Zheng, Y.; Sun, T. A comparative study of different equivalent circuit models for estimating state-of-charge of lithium-ion
batteries. Electrochim. Acta 2018, 259, 566–577. [CrossRef]
24. Liu, Y.; Ma, R.; Pang, S.; Xu, L.; Zhao, D.; Wei, J.; Huangfu, Y.; Gao, F. A nonlinear observer SOC estimation method based on
electrochemical model for lithium-ion battery. IEEE Trans. Ind. Appl. 2020, 57, 1094–1104. [CrossRef]
25. Roman, D.; Saxena, S.; Robu, V.; Pecht, M.; Flynn, D. Machine learning pipeline for battery state-of-health estimation. Nat. Mach.
Intell. 2021, 3, 447–456. [CrossRef]
26. Zhao, J.; Ling, H.; Liu, J.; Wang, J.; Burke, A.F.; Lian, Y. Machine learning for predicting battery capacity for electric vehicles.
eTransportation 2023, 15, 100214. [CrossRef]
27. Zhao, J.; Ling, H.; Wang, J.; Burke, A.F.; Lian, Y. Data-driven prediction of battery failure for electric vehicles. iScience
2022, 25, 104172. [CrossRef]
28. Correa-Baena, J.P.; Hippalgaonkar, K.; Duren, J.V.; Jaffer, S.; Chandrasekhar, V.R.; Stevanovic, V.; Wadia, C.; Guha, S.; Buonassisi, T.
Accelerating materials development via automation, machine learning, and high-performance computing. Joule 2018, 2, 1410–1420.
[CrossRef]
29. Severson, K.A.; Attia, P.M.; Jin, N.; Perkins, N.; Jiang, B.; Yang, Z.; Chen, M.H.; Aykol, M.; Herring, P.K.; Fraggedakis, D.; et al.
Data-driven prediction of battery cycle life before capacity degradation. Nat. Energy 2019, 4, 383–391. [CrossRef]
30. Zhao, J.; Burke, A.F. Electric Vehicle Batteries: Status and Perspectives of Data-Driven Diagnosis and Prognosis. Batteries 2022, 8, 142.
31. Zhao, J.; Burke, A.F. Battery prognostics and health management for electric vehicles under industry 4.0. J. Energy Chem. 2023, in press.
32. Zheng, Y.; Ouyang, M.; Han, X.; Lu, L.; Li, J. Investigating the error sources of the online state of charge estimation methods for lithium-ion batteries in electric vehicles. J. Power Sources 2018, 377, 161–188.
33. Aykol, M.; Herring, P.; Anapolsky, A. Machine learning for continuous innovation in battery technologies. Nat. Rev. Mater. 2020, 5, 725–727.
34. Wang, Q.; Ye, M.; Wei, M.; Lian, G.; Li, Y. Deep convolutional neural network based closed-loop SOC estimation for lithium-ion batteries in hierarchical scenarios. Energy 2023, 263, 125718.
35. Quan, R.; Liu, P.; Li, Z.; Li, Y.; Chang, Y.; Yan, H. A multi-dimensional residual shrinking network combined with a long short-term memory network for state of charge estimation of Li-ion batteries. J. Energy Storage 2023, 57, 106263.
36. Chen, J.; Zhang, Y.; Wu, J.; Cheng, W.; Zhu, Q. SOC estimation for lithium-ion battery using the LSTM-RNN with extended input and constrained output. Energy 2023, 262, 125375.
37. Hong, J.; Wang, Z.; Chen, W.; Wang, L.Y.; Qu, C. Online joint-prediction of multi-forward-step battery SOC using LSTM neural networks and multiple linear regression for real-world electric vehicles. J. Energy Storage 2020, 30, 101459.
38. Bian, C.; He, H.; Yang, S. Stacked bidirectional long short-term memory networks for state-of-charge estimation of lithium-ion batteries. Energy 2020, 191, 116538.
39. Yang, F.; Li, W.; Li, C.; Miao, Q. State-of-charge estimation of lithium-ion batteries based on gated recurrent neural network. Energy 2019, 175, 66–75.
40. Jiao, M.; Wang, D.; Qiu, J. A GRU-RNN based momentum optimized algorithm for SOC estimation. J. Power Sources 2020, 459, 228051.
41. Chen, J.; Zhang, Y.; Li, W.; Cheng, W.; Zhu, Q. State of charge estimation for lithium-ion batteries using gated recurrent unit recurrent neural network and adaptive Kalman filter. J. Energy Storage 2022, 55, 105396.
42. Takyi-Aninakwa, P.; Wang, S.; Zhang, H.; Yang, X.; Fernandez, C. A hybrid probabilistic correction model for the state of charge estimation of lithium-ion batteries considering dynamic currents and temperatures. Energy 2023, 273, 127231.
43. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
44. Hannan, M.A.; How, D.N.; Lipu, M.H.; Mansor, M.; Ker, P.J.; Dong, Z.Y.; Sahari, K.S.; Tiong, S.K.; Muttaqi, K.M.; Mahlia, T.I.; et al. Deep learning approach towards accurate state of charge estimation for lithium-ion batteries using self-supervised transformer model. Sci. Rep. 2021, 11, 19541.
45. Shen, H.; Zhou, X.; Wang, Z.; Wang, J. State of charge estimation for lithium-ion battery using Transformer with immersion and invariance adaptive observer. J. Energy Storage 2022, 45, 103768.
46. Shi, D.; Zhao, J.; Wang, Z.; Zhao, H.; Eze, C.; Wang, J.; Lian, Y.; Burke, A.F. Cloud-Based Deep Learning for Co-Estimation of Battery State of Charge and State of Health. Energies 2023, 16, 3855.
47. Sulzer, V.; Mohtat, P.; Aitio, A.; Lee, S.; Yeh, Y.T.; Steinbacher, F.; Khan, M.U.; Lee, J.W.; Siegel, J.B.; Stefanopoulou, A.G.; et al. The challenge and opportunity of battery lifetime prediction from field data. Joule 2021, 5, 1934–1955.
48. Ahmed, S.; Nielsen, I.E.; Tripathi, A.; Siddiqui, S.; Rasool, G.; Ramachandran, R.P. Transformers in time-series analysis: A tutorial. arXiv 2022, arXiv:2205.01138.
49. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
50. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–13.
51. Zhao, J.; Nan, J.; Wang, J.; Ling, H.; Lian, Y.; Burke, A.F. Battery Diagnosis: A Lifelong Learning Framework for Electric Vehicles. In Proceedings of the 2022 IEEE Vehicle Power and Propulsion Conference (VPPC), Merced, CA, USA, 1–4 November 2022; pp. 1–6.
52. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
53. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440.
54. Shi, D.; Zhao, J.; Eze, C.; Wang, Z.; Wang, J.; Lian, Y.; Burke, A.F. Cloud-Based Artificial Intelligence Framework for Battery Management System. Energies 2023, 16, 4403.
55. Tran, M.K.; Panchal, S.; Khang, T.D.; Panchal, K.; Fraser, R.; Fowler, M. Concept review of a cloud-based smart battery management system for lithium-ion batteries: Feasibility, logistics, and functionality. Batteries 2022, 8, 19.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Academic Editor: Alon Kuperman

Received: 4 May 2023

Electronics 2023, 12, 2598. https://doi.org/10.3390/electronics12122598 https://www.mdpi.com/journal/electronics

sources of uncertainty, including complex physicochemical mechanisms, significant cell-

1.1. Current Methods for SOC Estimation

In the recent technological era, a multitude of innovative machine learning methods

Methods Advantages Disadvantages

1.2. Contributions and Structure of the Work

Considering the fast-paced advancements in this field, we conclude by providing an outlook

2. Materials and Methods

Table 2. Datasets used for machine learning modelling.

Despite significant advancements in battery state estimation research, a prevalent gap

2.2. Transformer-Based Neural Network

where $x_i$ represents the $i$-th input segment, $\sum_{j=1}^{n} \omega_{ij} = 1$, and $1 \le i, j \le n$.
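To make the normalization constraint concrete, the pure-Python sketch below computes self-attention weights $\omega_{ij}$ and the weighted outputs $\sum_j \omega_{ij} x_j$. It assumes the weights are a softmax over scaled dot-product scores between segments, as in the standard Transformer [43], and it omits learned projections; the function names and the toy segment values are hypothetical and serve only to show why each row of weights sums to 1.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs):
    """Self-attention over n input segments without learned projections.

    xs: list of n segments, each a list of d floats.
    Returns (outputs, weights) with outputs[i] = sum_j weights[i][j] * xs[j].
    Assumption: weights[i][j] = softmax over j of (x_i . x_j) / sqrt(d).
    """
    n, d = len(xs), len(xs[0])
    weights = []
    for i in range(n):
        scores = [
            sum(a * b for a, b in zip(xs[i], xs[j])) / math.sqrt(d)
            for j in range(n)
        ]
        weights.append(softmax(scores))  # each row sums to 1
    outputs = [
        [sum(weights[i][j] * xs[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
    return outputs, weights


# Hypothetical normalized (current, voltage, temperature) segments
segments = [[0.5, 0.8, 0.2], [0.4, 0.7, 0.2], [-0.3, 0.6, 0.3]]
outputs, weights = self_attention(segments)
print([round(sum(row), 6) for row in weights])  # [1.0, 1.0, 1.0]
```

Because the softmax divides each exponentiated score by the sum over $j$, every weight is positive and every row sums to 1, which is exactly the constraint $\sum_{j=1}^{n} \omega_{ij} = 1$.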

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_x}}\right) \qquad (3)$$

$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_x}}\right) \qquad (4)$$
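A minimal pure-Python sketch of Eqs. (3) and (4) follows. The loop variable `col` stands for the even index $2i$, `d_x` is the embedding dimension $d_x$, and rows are built for positions 0 to `num_positions - 1`; the function name is hypothetical. In the standard Transformer [43], such encodings are added element-wise to the embedded inputs so that the attention layers can distinguish time steps.

```python
import math


def positional_encoding(num_positions, d_x):
    """Sinusoidal positional encoding following Eqs. (3)-(4).

    Returns a num_positions x d_x table: column 2i holds
    sin(pos / 10000^(2i/d_x)) and column 2i+1 holds cos of the same angle.
    """
    table = []
    for pos in range(num_positions):
        row = [0.0] * d_x
        for col in range(0, d_x, 2):  # col plays the role of 2i
            angle = pos / (10000 ** (col / d_x))
            row[col] = math.sin(angle)
            if col + 1 < d_x:  # guard for odd d_x
                row[col + 1] = math.cos(angle)
        table.append(row)
    return table


pe = positional_encoding(num_positions=4, d_x=6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

The sine and cosine share one angle per column pair, and the wavelength grows geometrically with the column index, so low columns vary quickly across positions while high columns vary slowly.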

2.2.3. Two-Tower Structure

The matrices $W_q$ and $W_k$ of dimension $\mathbb{R}^{d \times d_k}$, as well as $W_v$ of dimension $\mathbb{R}^{d \times d_v}$, embody

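$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\, W^{O} \qquad (9)$$

where

$$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q}, K W_i^{K}, V W_i^{V}\right) \qquad (10)$$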
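The pure-Python sketch below implements Eqs. (9) and (10). It assumes `Attention` is scaled dot-product attention, $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V$, as in the standard Transformer [43]. Matrices are plain lists of rows, each head receives its own projection triple $(W_i^{Q}, W_i^{K}, W_i^{V})$, and the concatenated head outputs are mapped back by $W^{O}$. All helper names and the toy weights in the usage lines are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an m x k matrix by a k x p matrix (both lists of rows)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise, numerically stable softmax."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def multi_head(q, k, v, heads, w_o):
    """Eqs. (9)-(10). heads: list of (W_i^Q, W_i^K, W_i^V) projection triples."""
    head_outputs = [
        attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
        for wq, wk, wv in heads
    ]
    # Concat(head_1, ..., head_n): place each head's row outputs side by side
    concat = [sum((h[r] for h in head_outputs), []) for r in range(len(q))]
    return matmul(concat, w_o)


# Hypothetical toy example: 3 time steps, model dimension 2, two 1-D heads
x = [[1.0, 2.0], [0.5, -1.0], [0.0, 1.0]]
heads = [
    ([[1.0], [0.0]], [[1.0], [0.0]], [[1.0], [0.0]]),  # head 1 sees feature 0
    ([[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [1.0]]),  # head 2 sees feature 1
]
w_o = [[1.0, 0.0], [0.0, 1.0]]
out = multi_head(x, x, x, heads, w_o)  # self-attention: Q = K = V = x
print(len(out), len(out[0]))  # 3 2
```

With two one-dimensional heads, the concatenated output has width 2, so a 2 × 2 $W^{O}$ maps it back to the model dimension; in general $W^{O}$ has $n \cdot d_v$ rows.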

(i) Temporal-Wise Encoder

2.2.4. Gating Mechanism

2.2.5. Hyperparameter Determination

Figure 4. Self-attention Transformer

The attention mechanism is a fundamental component of the Transformer model

random real-world application scenarios, adding a layer of practical complexity to the

Battery pack, Cell_V_max

model’s performance by employing it on one large-scale battery pack operating under

3.2.2. Evaluation Metrics

The maximum absolute error (MAE) is given by:

$$\mathrm{Error}_{\max} = \max_i \left| \hat{y}_i^{*} - y_i^{*} \right| \times 100 \qquad (13)$$
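As a sketch of Eq. (13), assuming $\hat{y}_i^{*}$ and $y_i^{*}$ are estimated and reference SOC values expressed as fractions in [0, 1] (which is why the result is multiplied by 100), the metric reduces to a single pass over paired samples. The function name is hypothetical.

```python
def max_error_percent(y_pred, y_true):
    """Eq. (13): largest absolute SOC error across samples.

    Assumes SOC values are fractions in [0, 1]; multiplying by 100
    expresses the result in percentage points.
    """
    if not y_pred or len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must be non-empty and of equal length")
    return max(abs(p - t) for p, t in zip(y_pred, y_true)) * 100


print(round(max_error_percent([0.80, 0.52, 0.31], [0.81, 0.50, 0.30]), 6))  # 2.0
```

Here the second sample sets the maximum (0.52 against 0.50), giving an error of 2 percentage points even though the other two samples are off by only 1.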

The average percentage error (APE) is defined as:

3.3. Model Development and Applications

4. Discussion and Outlook

Addressing these challenges is paramount in advancing towards a more sustainable

AKF Adaptive Kalman filter
